


Emotion Challenge, Interspeech 2009

Feature, Classifier, and Open Performance Comparison for Non-Prototypical Spontaneous Emotion Recognition


Bjoern Schuller (Technische Universitaet Muenchen, Germany)
Stefan Steidl (FAU Erlangen-Nuremberg, Germany)
Anton Batliner (FAU Erlangen-Nuremberg, Germany)

Sponsored by:

HUMAINE Association
Deutsche Telekom Laboratories

Final Results and "Lessons Learnt".

Call for Papers as PDF.
*Get started:* License agreement for the dataset download.
*Read about it:* Paper on the challenge to be cited.
*Participate:* Result submission is now open.
*FAQ:* Frequently asked questions now summarised and updated.

*Closed:* The submission of results for competition is now closed. However, each participating site may still upload results for their own experiments until their maximum of 25 uploads has been reached. The labels of the test set are now available within the general release of the FAU AIBO corpus.
Results were presented and awards granted during the session and the closing session at INTERSPEECH 2009. We would like to thank all the participants for their great contributions!

The Challenge

The young field of emotion recognition from voice has recently gained considerable interest in Human-Machine Communication, Human-Robot Communication, and Multimedia Retrieval. Numerous studies over the last decade have tried to improve features and classifiers. However, in contrast to related speech processing tasks such as Automatic Speech and Speaker Recognition, practically no standardised corpora and test conditions exist for comparing performance under exactly the same conditions. Instead, a multiplicity of evaluation strategies, such as cross-validation or percentage splits without a proper instance definition, prevents exact reproducibility. Furthermore, to address more realistic use cases, the community urgently needs more spontaneous and less prototypical data.

In these respects, the INTERSPEECH 2009 Emotion Challenge is intended to help bridge the gap between excellent research on human emotion recognition from speech and the low comparability of its results: the organisers provide the FAU Aibo Emotion Corpus of spontaneous, emotionally coloured speech, together with benchmark results for the two most popular approaches. Nine hours of speech (51 children) were recorded at two different schools. This allows for a clear definition of speaker-independent test and training partitions, as needed in most real-life settings. The corpus further provides a uniquely detailed transcription of the spoken content, including word boundaries, non-linguistic vocalisations, emotion labels, units of analysis, etc.

Three sub-challenges are addressed at two degrees of difficulty, using either five or two non-prototypical emotion classes (including a garbage model):

  • The Open Performance Sub-Challenge allows contributors to find their own features with their own classification algorithm. However, they will have to stick to the definition of test and training sets.
  • In the Classifier Sub-Challenge, participants may use a large set of standard acoustic features provided by the organisers in the well-known ARFF file format (WEKA) for classifier tuning. Features may be sub-sampled, altered, and combined (e.g. by standardisation or analytical functions), the training may be bootstrapped, several classifiers may be combined, e.g. by ROVER or Ensemble Learning, and side tasks such as gender may be learned. However, the audio files may not be used in this task.
  • In the Feature Sub-Challenge, participants are encouraged to upload their individual best features per unit of analysis with a maximum of 100 per contribution following the example of provided feature files. These features will then be tested by the organisers with equivalent settings in one classification task, and pooled together in a feature selection process. In particular, novel, high-level, or perceptually adequate features are sought-after.

The labels of the test set will be unknown, and all learning and optimisation must be based only on the training material. However, each participant may upload instance predictions up to 25 times to receive the confusion matrix and results. The upload format is instance and prediction, optionally with additional probabilities per class. This allows a final fusion of all participants' results, e.g. by ROVER or meta-classification, to demonstrate the maximum potential of combined efforts. As the classes are unbalanced, the primary measure to optimise is unweighted average (UA) recall, and secondly weighted average (WA) recall (i.e. accuracy). The organisers will not take part in the sub-challenges but provide baselines. Participants are encouraged to compete in multiple sub-challenges.
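The difference between the two measures can be illustrated with a short sketch; the confusion-matrix values below are made up for illustration, and this is not an official scoring script:

```python
import numpy as np

def ua_wa_recall(cm):
    """Compute unweighted (UA) and weighted (WA) average recall.

    cm: square confusion matrix, rows = reference class, cols = prediction.
    """
    cm = np.asarray(cm, dtype=float)
    per_class_recall = cm.diagonal() / cm.sum(axis=1)  # recall per reference class
    ua = per_class_recall.mean()                       # every class weighted equally
    wa = cm.diagonal().sum() / cm.sum()                # i.e. overall accuracy
    return ua, wa

# Hypothetical 2-class confusion matrix (rows/cols: NEG, IDL)
cm = [[400, 100],
      [600, 1400]]
ua, wa = ua_wa_recall(cm)
print(f"UA recall: {ua:.3f}, WA recall: {wa:.3f}")
```

On unbalanced data such as this, a classifier can reach a high WA simply by favouring the majority class, which is why UA recall is the primary measure.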

Overall, contributions using the provided or an equivalent database are sought in (but not limited to) the following areas:

  • Participation in any of the sub-challenges
  • Speaker adaptation for emotion recognition
  • Noise/coding/transmission robust emotion recognition
  • Effects of prototyping on performance
  • Confidences in emotion recognition
  • Contextual knowledge exploitation

The results of the Challenge will be presented at a Special Session of INTERSPEECH 2009 in Brighton, UK.
Prizes will be awarded to the sub-challenge winners and a best paper.
If you are interested and planning to participate in the Emotion Challenge, or if you want to be kept informed about the Challenge, please send the organisers an e-mail to indicate your interest.

To get started: Obtain the license agreement here to get a password and further instructions for the dataset download. Please fill it out, sign it, and fax it accordingly.

Paper on the Challenge: The introductory paper on the challenge is now ready for download here. This paper provides extensive descriptions and baseline results. All participants are asked to avoid repeating the challenge, data, or feature descriptions in their submissions, but to include the following citation:

Schuller, B.; Steidl, S.; Batliner, A.: “The Interspeech 2009 Emotion Challenge”, Interspeech (2009), ISCA, Brighton, UK, 2009.

Background Information: You can now also download the book describing the data-set in detail from the password restricted area (after signing and submitting the license agreement).

Paper Submission (all participants): Each contribution in the challenge shall be accompanied by a paper submitted to the special session "INTERSPEECH 2009 Emotion Challenge" with the following conditions:

  • The deadline for submission of the papers and results is the INTERSPEECH 2009 paper submission deadline: 17th of April 2009. Note that this date will not be extended.
  • The papers will undergo the normal review process.
  • Papers shall not repeat the descriptions of database, labels, partitioning etc. of the FAU Aibo emotion corpus but cite the introductory paper (cf. above).
  • Participants may contribute to all sub-challenges at once.
  • Papers may well report additional results on other databases.
  • An additional publication is planned that summarises all results of the challenge and their combination by ROVER or ensemble techniques. However, this publication is expected to appear after INTERSPEECH 2009.
Submission of Results (Open Performance and Classifier Sub-Challenges): These are the templates for uploading your results in the WEKA ARFF file format (detailed information on the file format can also be found here) for the 2- and 5-class tasks. Note that exactly one prediction has to be provided for each instance. A total of 25 uploads per participant is allowed. Class probabilities are optional; however, if provided, they should lie in the interval [0.000;1.000] and sum to 1 over the classes of each instance. The idea behind this is that we want to combine all final predictions of the participants in a ROVER or Ensemble Learning experiment to give an impression of the performance obtainable with "combined efforts".

*New*: Result files need to be uploaded here. User accounts and passwords are sent to the registered participants by email.

@relation MySite_ISEC09_2-class
@attribute file_id string
@attribute assigned_class {NEG,IDL}
@attribute probability_IDL numeric
@attribute probability_NEG numeric


@relation MySite_ISEC09_5-class
@attribute file_id string
@attribute assigned_class {A,E,N,P,R}
@attribute probability_E numeric
@attribute probability_A numeric
@attribute probability_N numeric
@attribute probability_R numeric
@attribute probability_P numeric
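
Before uploading, it may be worth verifying the constraints stated above: exactly one known class label per instance, and optional probabilities in [0,1] that sum to 1. A minimal sketch for the 5-class template follows; the instance ID and values are hypothetical, and this is not an official validation tool:

```python
# Sanity check for one @data row of a 5-class result file.
CLASSES = ("A", "E", "N", "P", "R")

def check_row(file_id, assigned_class, probs, tol=1e-3):
    """Validate one row: known class label, probabilities in [0,1] summing to 1."""
    assert assigned_class in CLASSES, f"unknown class {assigned_class!r}"
    assert all(0.0 <= p <= 1.0 for p in probs), "probability outside [0,1]"
    assert abs(sum(probs) - 1.0) <= tol, "probabilities do not sum to 1"
    return True

# Example row as it might appear after @data (hypothetical instance ID):
# 'Test_01_001_00',N,0.05,0.10,0.70,0.10,0.05
check_row("Test_01_001_00", "N", [0.05, 0.10, 0.70, 0.10, 0.05])
```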


Result Submission (Feature Sub-Challenge): For the feature files, please follow the example of the ARFF files provided by the organisers. Here, one file for train and one for test have to be sent. Only one feature file per participant with a maximum of 100 features may be uploaded. Please use descriptive names for your features and zip the files. A readme with a detailed explanation of the features may additionally be attached. Once ready and zipped, please submit your feature file as a mail attachment.

Transliteration: If you want to use the transliteration of the test partition, the following notes may be of interest:
  • The lexicon file is plain ASCII
  • "*" in the first column denotes non-words/fragments
  • "**" in the transcription denotes verbal noise that cannot be transcribed. However, in the lexicon the one line with "**" is a dummy that should be deleted.
  • The coding is SAMPA. Each character represents one phone, apart from ":" - this characterizes lengthening of the left adjacent vowel, e.g.: a: e: u: etc.
  • We denote glottal stop at vocalic word onset with "?" - this can be found within compounds as well
  • We denote syllable boundary with "|"
  • We denote word stress position with "'"
  • If your ASR cannot deal with these phenomena, you can simply delete these characters: ? | '
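
Stripping these markers while keeping the lengthening mark ":" can be done in one line; a minimal sketch (the example transcription is hypothetical):

```python
def strip_markers(sampa):
    """Remove glottal stop '?', syllable boundary '|', and word-stress marker "'"
    from a SAMPA transcription; the lengthening mark ':' is kept intact."""
    return "".join(ch for ch in sampa if ch not in "?|'")

print(strip_markers("?a|'le:"))  # hypothetical entry -> "ale:"
```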

Frequently asked questions: you might find an answer to your questions here:
  • Q: May I participate in several sub-challenges?
    A: Yes, of course. Every site may participate in every sub-challenge. In how many sub-challenges one site participates is decided at the moment you submit your paper, by the results you include. Please send us an additional mail to indicate which challenges you participate in. However, only 25 uploads of results are allowed per site. By this we aim to prevent brute-force uploads that could reveal the actual test-set labelling.
  • Q: Do I have to submit a paper in order to participate?
    A: Yes, the submission of a paper is mandatory. Please make sure to select the special session during the submission procedure. The prizes will be awarded during the Special Session in Brighton.
  • Q: What are the deadlines?
    A: The only deadline is the official INTERSPEECH paper submission deadline on Friday, 17 April 2009. Note that this is a strict deadline and no prolongation is to be expected as stressed by the organisers.
  • Q: May I include results on other corpora in my paper?
    A: Yes, of course. As long as it fits the general focus these are of course very welcome.
  • Q: What are the other formalities with respect to the paper of mine?
    A: Please make sure to reference the official paper on the challenge and avoid repetition of the general data and challenge description.
  • Q: Why are there cases that are attributed to different classes in the 5-class and 2-class problem (e.g. Neutral in the 5-class task, but NEGative in the 2-class task)?
    A: That is okay. We do not map the labels of the 5-class task onto the 2-class task; instead, in both cases we generate the chunk labels from the original word-based labels (five per word). Thus, for some rare mixed cases this ambivalence does occur.
  • Q: The DC component in some .wav files is not zero (e.g. Train_08_000_00.wav). Why?
    A: The non-zero DC component in some audio files is a recording artifact and does not come from manual editing of the files. Unfortunately, we do not have a list of the affected files.
Thank you and welcome to the challenge!

Last updated: November 27th, 2009