

Home page of the INTERSPEECH 2011 Speaker State Challenge.


Speaker State Challenge, Interspeech 2011

Intoxication and Sleepiness

Last updated:

06 September 2011

Last addition:

Winners of the Challenge


Björn Schuller (TUM, Germany)
Stefan Steidl (ICSI, USA)
Anton Batliner (FAU Erlangen-Nuremberg, Germany)
Florian Schiel (BASSS/LMU, Germany)
Jarek Krajewski (University of Wuppertal, Germany)

Sponsored by:

HUMAINE Association
Bavarian Archive for Speech Signals

Officially started:

02 February 2011

*FINISHED* Winners of the INTERSPEECH 2011 Speaker State Challenge:
  • The Intoxication Sub-Challenge Prize is awarded to:
    Intoxicated Speech Detection by Fusion of Speaker Normalized Hierarchical Features and GMM Supervectors
  • The Sleepiness Sub-Challenge Prize is awarded to:
    Speaker State Classification Based on Fusion of Asymmetric SIMPLS and Support Vector Machines

    The organisers congratulate the winners and thank all participants for their outstanding contributions. Overall, the best result was reached by fusing the participants' results.

    If you are interested in using the corpora of the Challenge outside this event, please contact Florian Schiel for the ALC corpus and Jarek Krajewski for the SLC corpus licenses and download.

Call for Participation as PDF.

*Get started:* License agreement ALC corpus for the dataset download (Intoxication Sub-Challenge).
*Get started:* License agreement SLC corpus for the dataset download (Sleepiness Sub-Challenges).
*Read about it:* Paper on the Challenge to be cited.

The Challenge

While the first open comparative challenges in the field of paralinguistics targeted more "conventional" phenomena such as emotion, age, and gender, there still exists a multiplicity of highly relevant speaker states and traits that have not yet been covered. Thus, the INTERSPEECH 2011 Speaker State Challenge broadens the scope by addressing two less researched speaker states while focusing on the crucial application domain of security and safety: the computational analysis of intoxication and sleepiness in speech. Apart from intelligent and socially competent future agents and robots, main applications are found in the medical domain and in surveillance in high-risk environments such as driving, steering or controlling. The INTERSPEECH 2011 theme "Speech science and technology for real life" is reflected not only in these everyday application scenarios in general, but also in particular in the conditions of the Challenge, such as naturalistic paralinguistic phenomena and no pre-selection of instances.

For these Challenge tasks, the ALCOHOL LANGUAGE CORPUS (ALC) and the SLEEPY LANGUAGE CORPUS (SLC) with genuine intoxicated and sleepy speech will be provided by the organisers. The first consists of 39 hours of speech from a gender-balanced set of 154 speakers, and will serve to evaluate features and algorithms for the estimation of speaker intoxication. The second features 21 hours of speech recordings of 99 subjects, annotated in 10 different levels of sleepiness. The verbal material varies in complexity, ranging from sustained vowel phonation to natural communication. The corpora further feature detailed speaker metadata, orthographic and phonemic transcripts, segmentation, and multiple annotation tracks. Both are given with distinct definitions of test, development, and training partitions, incorporating speaker independence as needed in most real-life settings. Benchmark results will be provided.

In these respects, the INTERSPEECH 2011 Speaker State Challenge shall help bridge the gap between excellent research on paralinguistic information in spoken language and the low compatibility of results.

Two Sub-Challenges are addressed:

  • In the Intoxication Sub-Challenge, alcoholisation of speakers has to be determined as a two-class classification task: alcoholised for a blood alcohol concentration (BAC) exceeding 0.5 per mill, or non-alcoholised. The Challenge measure will be the unweighted average recall of these two classes to better compensate for imbalance between classes. However, for the training and development partitions the actual BAC, ranging from 0.28 to 1.75 per mill, is also provided. It may be used as additional information for model construction, or for reporting more precise results on the development partition in submitted papers.
  • In the Sleepiness Sub-Challenge, sleepiness of speakers has to be determined by a suitable algorithm and acoustic features. While the annotation provides sleepiness in ten levels, only two classes have to be recognised: sleepy for a level exceeding seven, or non-sleepy. Again, the full information is provided for the training and development partitions, and the Challenge measure is the unweighted average recall of the two classes.
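
As an illustration of the Challenge measure, here is a minimal Python sketch of the unweighted average recall, together with the two class thresholds described above. It is not the organisers' scoring code. The function names are hypothetical, and the class names NAL/AL and NSL/SL are taken from the result templates further down this page.

```python
from collections import defaultdict


def unweighted_average_recall(true_labels, predicted_labels):
    """Mean of the per-class recalls, so each class counts equally
    no matter how many instances it has."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("need two non-empty label lists of equal length")
    total = defaultdict(int)    # instances per true class
    correct = defaultdict(int)  # correctly predicted instances per true class
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def intoxication_label(bac_per_mill):
    """Two-class label: alcoholised (AL) for a BAC exceeding 0.5 per mill."""
    return "AL" if bac_per_mill > 0.5 else "NAL"


def sleepiness_label(level):
    """Two-class label: sleepy (SL) for a level exceeding seven."""
    return "SL" if level > 7 else "NSL"


# A classifier that always answers NAL gets 75% accuracy on this toy data,
# but its unweighted average recall is only (1.0 + 0.0) / 2.
truth = ["NAL", "NAL", "NAL", "AL"]
always_nal = ["NAL"] * 4
print(unweighted_average_recall(truth, always_nal))
```

The toy data shows why the measure compensates for class imbalance: always predicting the majority class cannot score above chance level.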

Both Sub-Challenges allow contributors to find their own features with their own classification algorithm. However, a standard feature set will be given per corpus that may be used. The labels of the test set will be unknown, and participants will have to stick to the definition of training, development, and test sets. They may report on results obtained on the development set, but are limited to five trials for uploading their results on the test set, whose labels are unknown to them. Participation has to be accompanied by a paper presenting the results, which undergoes peer review. Only contributions with an accepted paper will be eligible for participation in the Challenge. The organisers reserve the right to re-evaluate the findings, but will not participate themselves in the Challenge. Participants are encouraged to compete in both Sub-Challenges.

Overall, contributions using the provided or an equivalent database are sought in (but not limited to) the following areas:

  • Participation in the Intoxication Sub-Challenge
  • Participation in the Sleepiness Sub-Challenge
  • Novel features and algorithms for the analysis of speaker state
  • Cross-corpus and cross-task feature genericity analysis
  • Exploitation of speaker trait meta-information in speaker state analysis

The results of the Challenge will be presented at Interspeech 2011 in Florence, Italy.
Prizes will be awarded to the Sub-Challenge winners.
If you are interested and planning to participate in the Speaker State Challenge, or if you want to be kept informed about the Challenge, please send the organisers an e-mail to indicate your interest.

To get started: Please obtain the license agreements (Alcohol Language Corpus) (Sleepy Language Corpus) to get a password and further instructions for the download of the datasets. Please fill these out, sign them, and fax them accordingly. After downloading the data you can directly start your experiments with the training and development sets. Once you have found your best method you should write your paper for the Special Event. At the same time you can compute your results per instance and Sub-Challenge task on the test set and upload them. We will then let you know the corresponding performance result.

Paper on the Challenge: The introductory Paper on the Challenge provides extensive descriptions and baseline results. All participants are asked to avoid repeating descriptions of the Challenge, data, or features in their submissions, and to include the following citation instead:

Schuller, B.; Steidl, S.; Batliner, A.; Schiel, F.; Krajewski, J.: “The Interspeech 2011 Speaker State Challenge”, Interspeech (2011), ISCA, Florence, Italy, 2011.

Result Submission: Should you want to participate in the Challenge, you have to upload your results on the Test partitions in the WEKA ARFF file format; templates are given below (detailed information on the file format can also be found here). Note that exactly one prediction has to be provided for each instance and all instances have to be covered. A total of 5 uploads per participant and Sub-Challenge is allowed - these may be used until the INTERSPEECH camera ready paper submission deadline. Result confidences are optional. However, if provided, they should be in the interval [0.000;1.000] and sum up to 1 over classes per instance. The idea behind this is that we want to combine all final predictions of the participants in a ROVER or Ensemble Learning experiment to provide an impression of obtainable performance with “combined efforts”.

Result files need to be uploaded here. A username and password are sent to all participants. The results on the Test partitions can be seen directly after upload. Results on the Develop partitions will need to be calculated by yourself.

Templates (Confidences are optional! The files can be made with any text editor):

Intoxication Sub-Challenge:

Please use the 2-class format as given in the example below from the baseline:


@relation MySite_ISSSC_Intoxication_Sub-Challenge
@attribute name string
@attribute assigned_class {NAL,AL}
@attribute confidence_NAL numeric
@attribute confidence_AL numeric


Sleepiness Sub-Challenge:

Please use the 2-class format as given in the example below from the baseline:


@relation MySite_ISSSC_Sleepiness_Sub-Challenge
@attribute name string
@attribute assigned_class {NSL,SL}
@attribute confidence_NSL numeric
@attribute confidence_SL numeric
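
For illustration only, a result file following the templates above could be written with a short Python helper such as the sketch below. The templates show only the header, so the layout of the rows after `@data` (quoted instance name, assigned class, confidences in the order of the header attributes) is an assumption, as are the helper name `build_result_arff` and the instance names in the usage example. The checks mirror the rules stated under Result Submission; whether all test instances are covered still has to be checked against your own instance list.

```python
import math


def build_result_arff(relation, classes, predictions):
    """Return the text of a result file for one Sub-Challenge.

    classes:     class names in template order, e.g. ("NAL", "AL")
    predictions: list of (instance_name, assigned_class, confidences);
                 confidences maps class name -> float, or is None for
                 every instance if confidences are left out.
    """
    with_conf = bool(predictions) and predictions[0][2] is not None
    lines = [
        "@relation " + relation,
        "@attribute name string",
        "@attribute assigned_class {" + ",".join(classes) + "}",
    ]
    if with_conf:
        lines += ["@attribute confidence_" + c + " numeric" for c in classes]
    lines += ["", "@data"]

    seen = set()
    for name, label, conf in predictions:
        if name in seen:  # exactly one prediction per instance
            raise ValueError("more than one prediction for " + name)
        seen.add(name)
        if label not in classes:
            raise ValueError("unknown class " + label)
        if (conf is not None) != with_conf:
            raise ValueError("give confidences for all instances or for none")
        row = ["'" + name + "'", label]  # quoting the name is an assumption
        if with_conf:
            values = [conf[c] for c in classes]
            if any(v < 0.0 or v > 1.0 for v in values):
                raise ValueError("confidence outside [0, 1] for " + name)
            if not math.isclose(sum(values), 1.0, abs_tol=1e-3):
                raise ValueError("confidences do not sum to 1 for " + name)
            row += ["%.3f" % v for v in values]
        lines.append(",".join(row))
    return "\n".join(lines) + "\n"


# Hypothetical instance names; use the identifiers of the test partition.
print(build_result_arff(
    "MySite_ISSSC_Intoxication_Sub-Challenge",
    ("NAL", "AL"),
    [("example_0001.wav", "AL", {"NAL": 0.25, "AL": 0.75}),
     ("example_0002.wav", "NAL", {"NAL": 0.9, "AL": 0.1})],
))
```

The same helper covers the Sleepiness Sub-Challenge by passing the relation name of that template and the classes ("NSL", "SL").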


Paper Submission (all participants): Please be reminded that a paper submission is mandatory for participation in the Challenge - however, paper contributions within the scope are also welcome if the authors do not intend to participate in the Challenge itself. In any case, please prepare your paper using the standard style info and length limits, and submit it to the regular submission system. However, you should choose only this Special Event (INTERSPEECH 2011 Speaker State Challenge) in the field 'Special sessions, special events, or show & tell ONLY'. Please further note that

  • The deadline for submission of the papers and results is the INTERSPEECH 2011 paper submission deadline: 31 March 2011. While the paper submitted at this time needs to provide a first result on test, remaining result upload trials can be saved for new Challenge results until the camera ready deadline.
  • The papers will undergo the normal review process.
  • Papers shall not repeat the descriptions of database, labels, partitioning etc. of the ALC and SLC corpora but cite the introductory paper (cf. also above).

    Schuller, B.; Steidl, S.; Batliner, A.; Schiel, F.; Krajewski, J.: “The Interspeech 2011 Speaker State Challenge”, Interspeech (2011), ISCA, Florence, Italy, 2011.

  • Participants may contribute in both Sub-Challenges at a time.
  • A development set will allow for tests and results to be reported by the participants apart from their results on the official test set.
  • Papers may well report additional results on other databases.
  • An additional publication is planned that summarizes all results of the Challenge and their combination by ROVER or ensemble techniques. However, this publication is expected to appear after INTERSPEECH 2011.

Frequently asked questions: you might find an answer to your questions here:
  • Q: My results are below the baseline - does it make sense to submit?
    A: Of course it does. We do not know whether the baseline will be surpassed, and different experiences with the tasks on the same dataset will be of interest. Please remember that all submissions to the challenge go through the normal reviewing process. Although it is very likely that the reviewers do know - and take into account - the baselines, the criteria are the usual ones, i.e. scientific quality; surpassing any baseline - be this the one given for this challenge, or another one known from the literature - is just one of the criteria. A paper reporting results above the baseline, but poorly written, runs a high risk of *not* being accepted; in contrast, a paper which is well written and contributes to our knowledge, but with results below the baseline, has a good chance of being accepted.
  • Q: When is the deadline for submitting the results?
    A: You will need to submit results by 09 June 2011 prior to camera ready paper submission to INTERSPEECH as a result on test needs to be included in your final paper version if you want to compete for the Sub-Challenge awards. All of the 5 result submissions can be saved for submission as late as 09 June 2011.
  • Q: We have not received any login information to upload our classification results. What should we do?
    A: Please contact Stefan Steidl directly. He will create your account as soon as possible. In very few cases, the license agreements for the two corpora were signed by different persons whom we assumed to be of the same team. We then created only one account and sent the login information to only one of them. If you are actually in two separate teams, please inform us and we will create a separate account for you.
  • Q: Will others be able to see my results when I upload them?
    A: No, they will not. Only you will see your results and you will have them right away.
  • Q: How will the data be distributed to the participants? Are you sending the test data at the same time with training and development partitions?
    A: Yes, we do. However, labels are only given for training and development partitions. Further, we are not sending the data - you will need to download the data. Please first download, print, and sign the license agreements - one per data set (ALC and SLC) - and scan and mail or fax these to the addresses given on the agreements. You will then receive an email with download instructions.
  • Q: What is the format required for submissions? Scores, predicted labels for the test samples, ... ?
    A: Yes, indeed: Per instance of the test set the identifier of the instance needs to be given followed by the predicted label. In addition, you can optionally provide a score - we will use this score information in our fusion of all participants' results. The submission of results will open soon - more information will be given at that time.
  • Q: You measure the BAC in “per mill”?
    A: The BAC is measured in permille BAC by volume. This is the standard way of measuring BAC in most central and eastern European countries. However, in Australia, Canada, and the United States BAC is usually measured in percent BAC by volume - thus, the range of the ALC corpus corresponds to 0.028 to 0.175 per cent. There are other ways of measuring BAC, as in Great Britain, where basis points by volume are used, or permille BAC by mass (Scandinavia), or parts per million.
  • Q: It is not obvious whether the 4k feature vector relates to a frame, a segment or a recording.
    A: Indeed - we will add a note in the final version of the paper. It relates to a recording.
  • Q: In Section 3 of the paper on the Challenge, you define a segment boundary by reference to “the signal contour’s value”, but I don’t think you state in the paper what the signal contour or its value is.
    A: Signal contour refers to the low-level descriptor (LLD) contour, i.e., all samples of a specific low-level descriptor in the analysed segment / the time series of low-level descriptor values after simple moving average low-pass filtering (3 frames). In the supra-segmental approach all functionals (e.g. mean, standard deviation, linear regression, and also the number of peaks and segments found in this time series of values) are applied to this time series, which we refer to as a contour. Specifically for the "Segment" functional we assign the beginning of a new segment to the sample which is more than 20% larger (or smaller) than a short-term average of the previous samples, thus having segment boundaries at points in time where the signal rapidly rises.
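
The last answer can be sketched in Python as follows. This is only a rough illustration under stated assumptions, not the code used to extract the Challenge feature set: the centred smoothing window, the length of the short-term average (`history=5`), restarting that average at each new segment, and the function names are all choices made for this example. Only the 3-frame moving average and the 20% threshold come from the answer above.

```python
def smooth(values, width=3):
    """Simple moving average over `width` frames (centred window assumed;
    frames at the edges average over the frames that are available)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def segment_starts(contour, history=5, threshold=0.2):
    """Indices at which a new segment begins: the sample differs by more
    than `threshold` (relative) from the mean of the preceding samples.

    The mean covers at most `history` samples and never reaches back past
    the start of the current segment (both are assumptions of this sketch).
    """
    starts = []
    segment_begin = 0
    for i in range(1, len(contour)):
        previous = contour[max(segment_begin, i - history): i]
        average = sum(previous) / len(previous)
        if abs(contour[i] - average) > threshold * abs(average):
            starts.append(i)
            segment_begin = i
    return starts


# A contour that jumps from 1.0 to 2.0 at frame 6 yields one boundary there.
contour = [1.0] * 6 + [2.0] * 6
print(segment_starts(contour))  # [6]
```

In the supra-segmental approach, the number of boundaries found this way on the smoothed contour of one low-level descriptor would then serve as one functional value for the recording.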

  • Thank you and welcome to the Challenge!

    More Information will follow on a regular basis.
