
INTERSPEECH 2012 Speaker Trait Challenge

Homepage of the INTERSPEECH 2012 Speaker Trait Challenge - Personality, Likability, Pathology.

 

Speaker Trait Challenge, Interspeech 2012

Personality, Likability, Pathology



Last updated:

19 September 2012

Last addition:

Winners added.


Organisers:

Björn Schuller (TUM, Germany)
Stefan Steidl (FAU Erlangen-Nuremberg, Germany)
Anton Batliner (FAU Erlangen-Nuremberg, Germany)
Elmar Nöth (FAU Erlangen-Nuremberg, Germany)
Alessandro Vinciarelli (University of Glasgow, UK)
Felix Burkhardt (Deutsche Telekom, Germany)
Rob van Son (Netherlands Cancer Institute / University of Amsterdam, The Netherlands)

Sponsored by:

HUMAINE Association
Telekom Innovation Laboratories



Officially started:

19 December 2011



*FINISHED* Winners of the INTERSPEECH 2012 Speaker Trait Challenge:
  • The Personality Sub-Challenge Prize is awarded to:
    Alexei V. Ivanov and Xin Chen
    Modulation Spectrum Analysis for Speaker Personality Trait Recognition
  • The Likability Sub-Challenge Prize is awarded to:
    Claude Montacié and Marie-José Caraty
    Pitch and Intonation Contribution to Speakers’ Traits Classification
  • The Pathology Sub-Challenge Prize is awarded to:
    Jangwon Kim, Naveen Kumar, Andreas Tsiartas, Ming Li and Shrikanth S. Narayanan
    Intelligibility Classification of Pathological Speech Using Fusion of Multiple Subsystems

    The organisers congratulate the winners and thank all participants for their outstanding contributions. Overall, the best result in the first two Sub-Challenges was reached by fusion of the participants' contributions.

    If you are interested in using the corpora of the Challenge outside this event, please contact the database owners (Alessandro Vinciarelli, Felix Burkhardt, Rob van Son) for new licenses and download.





Deadlines:

1 April 2012 First paper submission to INTERSPEECH 2012
18 June 2012 Final result upload
22 June 2012 Camera ready paper




Baseline results are available:

Download the Paper on the Challenge.




Call for Participation as PDF.

*Get started:* License agreement for the dataset download (All Sub-Challenges).

The Challenge

Whereas the first open comparative challenges in the field of paralinguistics targeted more "conventional" phenomena such as emotion, age, and gender, there still exists a multiplicity of not yet covered, but highly relevant speaker states and traits. In the last instalment, we focused on speaker states, namely sleepiness and intoxication. Consequently, we now want to focus on speaker traits: the INTERSPEECH 2012 Speaker Trait Challenge broadens the scope by addressing three less researched speaker traits: the computational analysis of personality, likability, and pathology in speech. Apart from intelligent and socially competent future agents and robots, main applications are found in the medical domain and surveillance.

For these Challenge tasks, the SPEAKER PERSONALITY CORPUS (SPC), the SPEAKER LIKABILITY DATABASE (SLD), and the NKI CCRT (Concomitant Chemo Radiation Treatment) SPEECH CORPUS (NCSC), with a high diversity of speakers of different personality and likability and with genuine pathologies, will be provided by some of the organisers. The first, SPC, consists of 2 hours of French speech from 330 speakers labelled by 11 judges with standardised personality assessment tests, and will serve to evaluate features and algorithms for the estimation of speakers' personality traits in the popular "Big Five" OCEAN dimensions (openness, conscientiousness, extraversion, agreeableness, and neuroticism). The second, SLD, is based on the aGender corpus as employed in the INTERSPEECH 2010 Paralinguistic Challenge. Likability annotations by 32 labellers were added for 800 speakers in perfect age class and gender balance for roughly 1 hour of speech. Finally, NCSC provides 3 hours of Dutch speech from 40 speakers with head and neck cancer (tumours located in the vocal tract and larynx) recorded before and at various times after treatment. NCSC was created within the scope of an unrestricted research grant of Atos Medical, Sweden. The corpora feature further rich annotation such as speaker meta-data, orthographic transcript, phonemic transcript, segmentation, and multiple annotation tracks. All three are given with distinct definitions of test, development, and training partitions, incorporating speaker independence as needed in most real-life settings. Benchmark results of the most popular approaches will be provided as in previous years.

In these respects, the INTERSPEECH 2012 Speaker Trait Challenge shall help bridge the gap between excellent research on paralinguistic information in spoken language and the low compatibility of results.

Three Sub-Challenges are addressed:

  • In the Personality Sub-Challenge, the personality of a speaker has to be determined from acoustics, potentially including linguistics, as above or below average for each of the five OCEAN personality dimensions.
  • In the Likability Sub-Challenge, the likability of a speaker's voice has to be determined by a suitable learning algorithm and acoustic features. While the annotation provides likability in multiple levels, only two classes have to be recognised: likability above or below average.
  • In the Pathology Sub-Challenge, the intelligibility of a speaker has to be determined by a suitable classification algorithm and acoustic features.

The measures of competition will be Unweighted Accuracy and Area Under the Receiver Operating Characteristic Curve (AUC). Transcriptions of the training and development sets will be available. All Sub-Challenges allow contributors to find their own features with their own machine learning algorithm. However, a standard feature set will be provided per corpus that may be used. Participants will have to stick to the definition of training, development, and test sets. They may report on results obtained on the development set, but have only five trials (per Sub-Challenge) to upload their results on the test sets, whose labels are unknown to them. Each participation will be accompanied by a paper presenting the results that undergoes peer-review and has to be accepted for the conference in order to participate in the Challenge. The organisers reserve the right to re-evaluate the findings, but will not participate themselves in the Challenge. Participants are encouraged to compete in all Sub-Challenges.
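For orientation, the two measures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organisers' scoring code: "Unweighted Accuracy" is read here as the mean of the per-class recalls, and AUC is computed as the probability that a positive instance receives a higher score than a negative one. The function names and the toy labels are invented for this example.

```python
from collections import defaultdict


def unweighted_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally
    regardless of how many instances it has (assumed reading of the measure)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def auc(y_true, scores, positive):
    """Fraction of (positive, negative) pairs in which the positive
    instance has the higher score; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy labels and scores, invented for illustration (not Challenge data).
y_true = ["L", "L", "L", "NL"]
y_pred = ["L", "L", "NL", "NL"]
scores = [0.9, 0.8, 0.4, 0.3]  # posterior of class "L"

ua = unweighted_accuracy(y_true, y_pred)
area = auc(y_true, scores, positive="L")
```

Because each class contributes equally, a classifier that always predicts the majority class reaches only 0.5 on a two-class task under this reading.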

Overall, contributions using the provided or an equivalent database are sought in (but not limited to) the following areas:

  • Participation in the Personality Sub-Challenge
  • Participation in the Likability Sub-Challenge
  • Participation in the Pathology Sub-Challenge
  • Novel features and algorithms for the analysis of speaker traits
  • Unsupervised learning methods for speaker trait analysis
  • Perception studies, additional annotation and feature analysis on the given sets
  • Context exploitation in speaker trait assessment

The results of the Challenge will be presented at Interspeech 2012 in Portland, Oregon.
Prizes will be awarded to the Sub-Challenge winners.
If you are interested and planning to participate in the Speaker Trait Challenge, or if you want to be kept informed about the Challenge, please send the organisers an e-mail to indicate your interest.


To get started: Please obtain the License Agreement to get a password and further instructions for the download of the datasets. Please fill it out for the dataset(s) you wish to obtain, then sign and fax it accordingly.
After downloading the data you can directly start your experiments with the training and development sets. Once you have found your best method you should write your paper for the Special Event. At the same time you can compute your results per instance and Sub-Challenge task on the test set and upload them. We will then let you know the corresponding performance result.

Paper on the Challenge: The introductory Paper on the Challenge (will be provided here soon) provides extensive descriptions and baseline results. All participants will be asked to avoid repetitions of Challenge, data, or feature descriptions in their submissions, but to include the following citation:

B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, B. Weiss: “The Interspeech 2012 Speaker Trait Challenge”, Proc. Interspeech 2012, ISCA, Portland, OR, USA, 2012.

Result Submission: To participate in the Challenge, upload your results on the Test partitions in the WEKA ARFF file format, using the templates below (detailed information on the file format can also be found here). Note that exactly one prediction has to be provided for each instance and all instances have to be covered. A total of five uploads per participating site and Sub-Challenge is allowed; these may be used until 18 June 2012. Result confidences are optional. However, if provided, they should be in the interval [0.000;1.000] and sum to 1 over the classes of each instance. The idea behind this is that we want to combine all final predictions of the participants in a ROVER or Ensemble Learning experiment to provide an impression of obtainable performance with “combined efforts”.
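Before using one of the five uploads, it may be worth checking a result list against the constraints just stated. The following Python sketch is only an illustration: the function name, the row layout, and the tolerance of 0.0015 (to allow for rounding to three decimals) are assumptions made here, not part of the official upload system.

```python
def check_predictions(rows, expected_files, classes, tol=0.0015):
    """Return a list of problems; an empty list means the rows look fine.

    rows: iterable of (filename, label, posteriors), where posteriors is
    None (confidences omitted) or one value per class.
    """
    problems = []
    seen = set()
    for filename, label, posteriors in rows:
        # Exactly one prediction per instance.
        if filename in seen:
            problems.append(f"{filename}: more than one prediction")
        seen.add(filename)
        if label not in classes:
            problems.append(f"{filename}: unknown label {label!r}")
        if posteriors is None:
            continue  # confidences are optional
        if len(posteriors) != len(classes):
            problems.append(f"{filename}: expected {len(classes)} confidences")
        if any(p < 0.0 or p > 1.0 for p in posteriors):
            problems.append(f"{filename}: confidence outside [0, 1]")
        if abs(sum(posteriors) - 1.0) > tol:
            problems.append(f"{filename}: confidences do not sum to 1")
    # All instances have to be covered.
    for missing in sorted(set(expected_files) - seen):
        problems.append(f"{missing}: no prediction")
    return problems


# Toy rows, invented for illustration.
rows = [
    ("test_001.wav", "L", (0.986, 0.014)),
    ("test_002.wav", "NL", (0.309, 0.691)),
    ("test_002.wav", "L", (0.700, 0.200)),
]
expected = ["test_001.wav", "test_002.wav", "test_003.wav"]
problems = check_predictions(rows, expected, classes=("L", "NL"))
```

In this toy example the check flags the duplicated instance, the confidences that do not sum to 1, and the instance without a prediction.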

Result files need to be uploaded here. A username and password are sent to all participants. The results on the Test partitions can be seen directly after upload. Results on the Develop partitions will need to be calculated by yourself.

Templates (Confidences are optional! The files can be made with any text editor):

Personality Sub-Challenge:

Please use the 2-class format as given in the example below:

@relation your_affiliation_IS2012_STC_Personality

@attribute filename string
@attribute openness {O,NO}
@attribute conscientiousness {C,NC}
@attribute extroversion {E,NE}
@attribute agreeableness {A,NA}
@attribute neuroticism {N,NN}
@attribute posterior_openness_O numeric
@attribute posterior_openness_NO numeric
@attribute posterior_conscientiousness_C numeric
@attribute posterior_conscientiousness_NC numeric
@attribute posterior_extroversion_E numeric
@attribute posterior_extroversion_NE numeric
@attribute posterior_agreeableness_A numeric
@attribute posterior_agreeableness_NA numeric
@attribute posterior_neuroticism_N numeric
@attribute posterior_neuroticism_NN numeric

@data
test_001.wav,O,NC,NE,A,NN,0.971,0.029,0.164,0.836,0.259,0.741,0.832,0.168,0.111,0.889
test_002.wav,NO,NC,E,A,N,0.341,0.659,0.089,0.911,0.742,0.258,0.817,0.183,1.000,0.000
...
test_201.wav,NO,C,NE,NA,NN,0.498,0.502,0.679,0.321,0.049,0.951,0.407,0.593,0.293,0.707


Likability Sub-Challenge:

Please use the 2-class format as given in the example below:

@relation your_affiliation_IS2012_STC_Likability

@attribute filename string
@attribute likability {L,NL}
@attribute posterior_L numeric
@attribute posterior_NL numeric

@data
test_001.wav,L,0.986,0.014
test_002.wav,NL,0.309,0.691
...
test_228.wav,L,0.791,0.209


Pathology Sub-Challenge:

Please use the 2-class format as given in the example below:

@relation your_affiliation_IS2012_STC_Pathology

@attribute filename string
@attribute intelligibility {I,NI}
@attribute posterior_I numeric
@attribute posterior_NI numeric

@data
test_001.wav,NI,0.039,0.961
test_002.wav,NI,0.253,0.747
...
test_739.wav,I,0.927,0.073
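
The files can be made with any text editor, but they can equally be generated by a script. As a minimal sketch, assuming predictions are held as tuples in Python, the Likability layout could be written as follows. The function name and the example values are made up for illustration; the relation name uses an underscore instead of a space, as in the Personality template, since unquoted ARFF names should not contain spaces.

```python
import io


def write_likability_arff(out, affiliation, predictions):
    """Write predictions in the layout of the Likability template.

    predictions: iterable of (filename, label, posterior_L, posterior_NL).
    """
    out.write(f"@relation {affiliation}_IS2012_STC_Likability\n\n")
    out.write("@attribute filename string\n")
    out.write("@attribute likability {L,NL}\n")
    out.write("@attribute posterior_L numeric\n")
    out.write("@attribute posterior_NL numeric\n\n")
    out.write("@data\n")
    for filename, label, p_l, p_nl in predictions:
        # Three decimals, matching the precision used in the template.
        out.write(f"{filename},{label},{p_l:.3f},{p_nl:.3f}\n")


# Writing to an in-memory buffer here; an opened text file works the same way.
buffer = io.StringIO()
write_likability_arff(buffer, "your_affiliation", [
    ("test_001.wav", "L", 0.986, 0.014),
    ("test_002.wav", "NL", 0.309, 0.691),
])
arff_text = buffer.getvalue()
```

The other two templates differ only in the relation name, the attribute list, and the number of columns per row.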


Paper Submission (all participants): Please be reminded that a paper submission is mandatory for participation in the Challenge. However, paper contributions within the scope are also welcome if the authors do not intend to participate in the Challenge itself. In any case, please submit your paper by 1 April 2012 (and final results by 18 June 2012) using the standard style info and length limits, and submit to the regular submission system. However, you should choose only this Special Event (INTERSPEECH 2012 Speaker Trait Challenge) in the field 'Special sessions, special events, or show & tell ONLY'. Please further note that

  • The deadline for submission of the papers and results is the INTERSPEECH 2012 paper submission deadline: 1 April 2012. Remaining result upload trials can be saved for new Challenge results until 18 June 2012.
  • The papers will undergo the normal review process.
  • Papers shall not repeat the descriptions of database, labels, partitioning etc. of the SPC, SLD, and NCSC corpora but cite the introductory paper (cf. also above).

    B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, B. Weiss: “The Interspeech 2012 Speaker Trait Challenge”, Proc. Interspeech 2012, ISCA, Portland, OR, USA, 2012.

  • Participants may contribute in all Sub-Challenges at a time.
  • A development set will allow for tests and results to be reported by the participants apart from their results on the official test set.
  • Papers may well report additional results on other databases.
  • An additional publication is planned that summarizes all results of the Challenge and the combination of results by ROVER or ensemble techniques. However, this publication is expected to appear after INTERSPEECH 2012.


Frequently asked questions: you might find an answer to your questions here:
  • Q: Do I have to include test set results for the paper submission?
    A: If you already have these, it will be informative for the reviewers. However, these only need to be included in the final camera ready version.
  • Q: How is likability introduced to the observers?
    A: By written instruction just before the perception test. This is the text (translated from German):
    "Dear Participant,
    This is a listening test. You will hear x sentences. Please rate for each of them how likable you find the speaker. Please try not to be influenced in your rating by the telephone-like quality of the transmission. You can listen to each sentence more than once, but please deliver your rating quickly, representing your first impression."
  • Q: Can we submit one paper to Interspeech describing participation in several Sub-Challenges?
    A: This is possible. Obviously the paper should read well and contain all necessary details, as it needs to pass peer-review as a condition for successful Challenge participation.
  • Q: Do you have any document explaining the meaning of each phonemic symbol as used in the Pathology Sub-Challenge transcripts?
    A: Here you go: SAMPA symbols and YAPA symbols.
  • Q: In the provided database there is an EWE value. What is the meaning of this value?
    A: The Evaluator Weighted Estimator (EWE) introduces evaluator-dependent (i.e., rater-dependent or labeller-dependent, if you will) weights to compute a weighted mean over the raters rather than weighting each rater equally. The idea behind this is that raters are subject to a certain amount of disturbance during their rating process. EWE is based on the cross-correlation between individual and mean responses and has repeatedly been shown to lead to better results than without weighting. For details, please refer to reference [7] in the Paper on the Challenge.
  • Q: Is there a document that describes how the data for the Personality and Likability Sub-Challenge have been gathered?
    A: Please refer to references [3] and [5] in the Paper on the Challenge. At present, these are the only further documents in this respect.
  • Q: Can we have richer data of the observers/observations, so that we can also look at interobserver statistics?
    A: Items not available in the current download package will only become available after the Challenge.
  • Q: My results are below the baseline - does it make sense to submit?
    A: Of course it does. We do not know whether the baseline will be surpassed, and different experiences with the tasks on the same dataset will be of interest. Please remember that all submissions to the Challenge go through the normal reviewing process. Although it is very likely that the reviewers know the baselines and take them into account, the criteria are the usual ones, i.e. scientific quality; surpassing any baseline, be it the one given for this Challenge or another one known from the literature, is just one of the criteria. A paper reporting results above the baseline, but poorly written, runs a high risk of *not* being accepted; in contrast, a paper which is well written and contributes to our knowledge, but with results below the baseline, has a high chance of being accepted.
  • Q: When is the deadline for submitting the results?
    A: You will need to submit results by 18 June 2012 prior to camera ready paper submission to INTERSPEECH as a result on test needs to be included in your final paper version if you want to compete for the Sub-Challenge awards. All of the five result submissions per Sub-Challenge and participant can be saved for submission as late as 18 June 2012.
  • Q: We have not received any login information to upload our classification results. What should we do?
    A: Please contact Stefan Steidl directly. He will create your account as soon as possible. In a very few cases, different persons who we assumed to be of the same team signed the license agreements for several of the three corpora. We then created only one account and sent the login information to only one of them. If you are actually in two separate teams, please inform us and we will create a separate account for you.
  • Q: Will others be able to see my results when I upload them?
    A: No, they will not. Only you will see your results and you will have them right away.
  • Q: How will the data be distributed to the participants? Are you sending the test data at the same time with training and development partitions?
    A: Yes, we do. However, labels are only given for training and development partitions. Further, we are not sending the data - you will need to download the data. Please first download, print, and sign the license agreements - one per data set once the agreements are available here - and scan and mail or fax these to the addresses given on the agreements. You will then receive an email with download instructions.
  • Q: What is the format required for submissions? Scores, predicted labels for the test samples, ... ?
    A: Yes, indeed: Per instance of the test set the identifier of the instance needs to be given followed by the predicted label. In addition, you can optionally provide a score - we will use this score information in our fusion of all participants' results. The submission of results will open soon - more information will be given at that time.
  • Q: I saw that the results for the challenge needed to be submitted in arff file format, but I am not familiar with this kind of file, how do I need to write the results file, is there any program that I can use or should I just write it like any other txt file?
    A: There is no special program - just use any text editor.
  • Q: Can I use additional databases within the Challenge?
    A: You can - as long as these are well known and accessible to everybody.
  • Q: May I participate in several Sub-Challenges?
    A: Yes, of course. Every site may participate in every Sub-Challenge and the five upload trials are per Sub-Challenge. The number of Sub-Challenges a site participates in is determined by the results included in the paper at the time of submission. Please send us an additional mail to let us know which Sub-Challenges you participate in.
  • Q: Do I have to submit a paper in order to participate?
    A: Yes, the submission of a paper is mandatory. Please make sure to select the special session during the submission procedure. The prizes will be awarded during INTERSPEECH 2012 in Portland.
  • Q: May I include results on other corpora in my paper?
    A: Yes, of course. As long as it fits the general focus these are of course very welcome.
  • Q: What are the other formalities with respect to the paper of mine?
    A: Please make sure to reference the official Paper on the Challenge and avoid repetition of the general data and Challenge description.
  • Thank you and welcome to the Challenge!
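
As a supplement to the EWE answer above, the weighting idea can be sketched in Python. This is a simplified illustration and not the procedure used to label the Challenge data: each rater's weight is taken here as the Pearson correlation between that rater's ratings and the plain mean over all raters, and negative weights are clipped to zero, which is an assumption of this sketch. Function names and ratings are invented; please refer to the Paper on the Challenge for the actual definition.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def evaluator_weighted_estimator(ratings):
    """ratings[k][n] is the rating given by rater k to item n."""
    n_items = len(ratings[0])
    # Plain (equally weighted) mean rating per item.
    item_means = [mean(r[n] for r in ratings) for n in range(n_items)]
    # One weight per rater: agreement with the mean responses.
    # Clipping negative correlations to zero is an assumption of this sketch.
    weights = [max(pearson(r, item_means), 0.0) for r in ratings]
    total = sum(weights)
    if total == 0:
        return item_means  # no usable weights, fall back to the plain mean
    return [
        sum(w * r[n] for w, r in zip(weights, ratings)) / total
        for n in range(n_items)
    ]


# Toy ratings, invented for illustration: the third rater contradicts the others.
ratings = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [4, 3, 2, 1],
]
ewe = evaluator_weighted_estimator(ratings)
```

In this toy example the third rater correlates negatively with the mean responses and therefore receives zero weight, so the estimate follows the two raters who agree.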



More Information will follow on a regular basis.


