Table 1: Multimodal databases (Note: in Column 2 ‘audiovisual’ refers to speech and face unless otherwise indicated)
|
Identifier |
Modalities |
Emotional content |
Emotion elicitation methods |
Size |
Nature of material |
Language |
| HUMAINE Database Stage 1 (described in full and downloadable from www.emotion-research.net/download/pilot-db/) |
Audiovisual + gesture: .avi files readable in ANVIL (see www.dfki.de/~kipp/anvil/); data files containing emotion labels, gesture labels, speech labels and FAPS, all readable in ANVIL |
Wide range, labelled for emotional content at 2 levels (ii) Time-aligned emotion labels assigned using the TRACE set of programs designed at Queen's University Belfast. Raters use a mouse to trace their perceived impression of the speaker's emotional state in the clip continuously over time on a one-dimensional axis (e.g. intensity, activation/arousal, valence, power). Traces are available on seven dimensions. |
Naturalistic (including some clips from Belfast Naturalistic Database, Castaway Reality TV) and induced material (from Belfast Sensitive Artificial Listener, Belfast Spaghetti Data, Belfast Activity Data, EmoTaboo) |
50 clips ranging from 5 seconds to 3 minutes (16 clips currently available with ethical clearance and labelled for emotional content, emotional context and some signs of emotion: speech signs for all 16, and gesture and FAPS for one exemplar clip) |
Mainly interactive discourse |
English and some French and Hebrew clips |
|
Belfast Naturalistic Database (Douglas-Cowie et al 2000, 2003) |
Audio-visual |
Wide range |
Natural: 10-60 sec long ‘clips’ taken from television chat shows, current affairs programmes and interviews conducted by research team |
125 subjects; 31 male, 94 female |
Interactive unscripted discourse |
English |
|
Geneva Airport Lost Luggage Study (Scherer & Ceschi 1997; 2000) |
Audio-visual |
Anger, good humour, indifference, stress, sadness |
Natural: unobtrusive videotaping of passengers at Geneva airport lost luggage counter followed up by interviews with passengers |
109 subjects |
Interactive unscripted discourse |
|
|
Chung (Chung 2000) |
Audio-visual |
Joy, neutrality, sadness (distress) |
Natural: television interviews in which speakers talk on a range of topics including sad and joyful moments in their lives |
77 subjects; 61 Korean speakers, 6 Americans |
Interactive unscripted discourse |
English and Korean |
|
SMARTKOM www.phonetik.uni-muenchen.de/Bas/BasMultiModaleng.html#SmartKom |
Audio-visual (+ gestures) |
Joy, gratification, anger, irritation, helplessness, pondering, reflecting, surprise, neutral |
Human machine in WOZ scenario: solving tasks with system |
224 speakers; 4/5 minute sessions |
Interactive discourse |
German |
|
Amir et al. (Amir et al, 2000) |
Audio + physiological (EMG, GSR, heart rate, temperature, speech) |
Anger, disgust, fear, joy, neutrality, sadness |
Induced: subjects asked to recall personal experiences involving each of the emotional states |
140 subjects; 60 Hebrew speakers, 1 Russian speaker |
Non interactive, unscripted discourse |
Hebrew, Russian |
|
SALAS database (http://www.image.ntua.gr/ermis/ IST-2000-29319, D09) |
Audio-visual |
Wide range of emotions/emotion related states but not very intense |
Induced: subjects talk to artificial listener & emotional states are changed by interaction with different personalities of the listener |
Pilot study of 20 subjects |
Interactive discourse; subjects unscripted, machine scripted |
English (Greek version being developed) |
|
ORESTEIA database (McMahon et al. 2003) |
Audio + physiological (some visual data too) |
Stress, irritation, shock |
Induced: subjects encounter various problems while driving (deliberately positioned obstructions, dangers, annoyances ‘on the road’) |
29 subjects, 90 min sessions per subject |
Non interactive speech: giving directions, giving answers to mental arithmetic etc |
English |
|
Belfast Boredom database (Cowie et al. 2003) |
Audio-visual |
Boredom |
Induced |
12 subjects: 30 minutes each |
Non interactive speech: naming objects on computer screen |
English |
|
XM2VTSDB multi-modal face database http://www.ee.surrey.ac.uk/Research/VSSP/xm2vtsdb/ |
Audio-visual |
None |
n/a |
295 subjects; video |
High quality colour images, 32 kHz 16-bit sound files, video sequences and a 3D model + profiles (one left-profile and one right-profile image per person, per session, a total of 2,360 images), 4 scripted sentences |
English |
|
ISLE project corpora (http://isle.nis.sdu.dk/, IST project IST-1999-10647) |
Audio-visual + gesture |
None |
n/a |
unclear |
|
|
|
Polzin (Polzin, 2000) |
Audio-visual (though only audio channel used) |
Anger, sadness, neutrality (other emotions as well, but in insufficient numbers to be used) |
Acted: sentence length segments taken from acted movies |
Unspecified number of speakers. Segment numbers: 1586 angry, 1076 sad, 2991 neutral |
Scripted |
English |
|
Banse and Scherer (Banse and Scherer 1996) |
Audio-visual (visual info used to verify listener judgements of emotion) |
Anger (hot), anger (cold), anxiety, boredom, contempt, disgust, elation, fear (panic), happiness, interest, pride, sadness, shame |
Acted: actors were given scripted eliciting scenarios for each emotion, then asked to act out the scenario. |
12 (6 male, 6 female) |
Scripted: 2 semantically neutral sentences (nonsense sentences composed of phonemes from Indo-European languages) |
German |
Table 2: Speech databases
|
Identifier |
Emotional content |
Emotion elicitation methods |
Size |
Nature of material |
Language |
|
TALKAPILLAR (Beller, 2005) |
Neutral, happiness, question, positive and negative surprise, anger, fear, disgust, indignation |
Contextualised acting: actors asked to read semantically neutral sentences in range of emotions, but practised on emotionally loaded sentences beforehand to get in the right mood |
1 actor reading 26 semantically neutral sentences for each emotion (each repeated 3 times at different activation levels: low, middle, high) |
Non interactive and scripted |
French |
|
Reading-Leeds database (Greasley et al., 1995; Roach et al., 1998, Stibbard 2001) |
Range of full blown emotions |
Natural: Unscripted interviews on radio/television in which speakers are asked by interviewers to relive emotionally intense experiences |
Around 4½ hours of material |
Interactive unscripted discourse |
English |
|
France et al. (France et al., 2000) |
Depression, suicidal state, neutrality |
Natural: therapy sessions & phone conversations. Post therapy evaluation sessions were also used to elicit speech for the control subjects |
115 subjects: 48 females, 67 males. Female sample: 10 controls (therapists), 17 dysthymic, 21 major depressed. Male sample: 24 controls (therapists), 21 major depressed, 22 high risk suicidal |
Interactive unscripted discourse |
English |
|
Campbell CREST database, ongoing (Campbell 2002; see also Douglas-Cowie et al. 2003) |
Wide range of emotional states and emotion-related attitudes |
Natural: volunteers record their |
Target - 1000 hrs over 5 years |
Interactive unscripted discourse |
English, Japanese, Chinese |
|
Capital Bank Service and Stock |
Mainly negative - fear, anger, stress |
Natural: call center human-human interactions |
Unspecified (still being labelled) |
Interactive unscripted discourse |
English |
|
SYMPAFLY (as used by Batliner et al. 2004b) |
Joyful, neutral, emphatic, surprised, ironic, helpless, touchy, angry, panic |
Human machine dialogue system |
110 dialogues, 29,200 words (i.e. tokens, not vocabulary) |
Naïve users book flights using machine dialogue system |
German |
|
DARPA Communicator corpus (as used by Ang et al. 2002; see Walker et al. 2001) |
Frustration, annoyance |
Human machine dialogue system |
Extracts from recordings of simulated interactions with a call centre, average length about 2.75 words; 13187 utterances in total, of which 1750 are emotional: 35 unequivocally frustrated, 125 predominantly frustrated, 405 unequivocally frustrated or annoyed, 1185 predominantly frustrated or annoyed |
Users called systems built by various sites and made air travel arrangements over the phone |
English |
|
AIBO (Erlangen database) (Batliner et al. 2004a) |
Joyful, surprised, emphatic, helpless, touchy (irritated), angry, motherese, bored, reprimanding, neutral |
Human machine: interaction with robot |
German: 51 children, 51,393 words (i.e. tokens, not vocabulary); English (Birmingham): 30 children, 5,822 words (i.e. tokens, not vocabulary) |
Task directions to robot |
German |
|
Fernandez et al. (Fernandez et al. 2000, 2003) |
Stress |
Induced: subjects give verbal responses to maths problems in simulated driving context |
Data reported from 4 subjects |
Unscripted numerical answers to mathematical questions |
English |
|
Tolkmitt and Scherer (Tolkmitt and Scherer, 1986) |
Stress (both cognitive & emotional) |
Induced: 2 types of stress (cognitive and emotional) were induced through slides. Cognitive stress induced through slides containing logical problems; emotional stress induced through slides of human bodies showing skin disease/accident injuries |
60 (33 male, 27 female) |
Partially scripted: subjects made 3 vocal responses to each slide within a 40 sec presentation period: a numerical answer followed by 2 short statements. The start of each was scripted and subjects filled in the blank at the end, e.g. ‘Die Antwort ist Alternative …’ |
German |
|
Iriondo et al. (Iriondo et al., 2000) |
Desire, disgust, fury, fear, joy, surprise, sadness |
Contextualised acting: subjects asked to read passages written with appropriate emotional content |
8 subjects reading paragraph length passages (20-40mmsec each) |
Non interactive and scripted |
Spanish |
|
Mozziconacci (Mozziconacci, 1998) Note: database recorded at IPO for SOBU project 92EA. |
Anger, boredom, fear, disgust, guilt, happiness, haughtiness, indignation, joy, rage, sadness, worry, neutrality |
Contextualised acting: actors asked to read semantically neutral sentences in range of emotions, but practised on emotionally loaded sentences beforehand to get in the right mood |
3 subjects reading 8 semantically neutral sentences (each repeated 3 times) |
Non interactive and scripted |
Dutch |
|
McGilloway (McGilloway, 1997; Cowie and Douglas-Cowie, 1996) |
Anger, fear, happiness, sadness, neutrality |
Contextualised acting: subjects asked to read passages written in appropriate emotional tone and content for each emotional state |
40 subjects reading 5 passages each |
Non interactive and scripted |
English |
|
Belfast Structured Database: an extension of the McGilloway database above (Douglas-Cowie et al. 2000) |
Anger, fear, happiness, sadness, neutrality |
Contextualised acting: subjects read 10 McGilloway-style passages and 10 other passages (scripted versions of naturally occurring emotion in the Belfast Naturalistic Database) |
50 subjects reading 20 passages |
Non interactive and scripted |
English |
|
Danish Emotional Speech Database (Engberg et al., 1997) |
Anger, happiness, sadness, surprise, neutrality |
Acted |
4 subjects read 2 words, 9 sentences & 2 passages in a range of emotions |
Scripted (material not emotionally coloured) |
Danish |
|
Groningen ELRA corpus number S0020 (www.icp.inpg.fr/ELRA) |
Database only partially oriented to emotion |
Acted |
238 subjects reading 2 short texts |
Scripted |
Dutch |
|
Berlin database (Kienast & Sendlmeier 2000; Paeschke & Sendlmeier 2000) |
Anger-hot, boredom, disgust, fear-panic, happiness, sadness-sorrow, neutrality |
Acted |
10 subjects (5 male, 5 female) reading 10 sentences each |
Scripted (material selected to be semantically neutral) |
German |
|
Pereira (Pereira, 2000) |
Anger (hot), anger (cold), happiness, sadness, neutrality |
Acted |
2 subjects reading 2 utterances each |
Scripted (1 emotionally neutral sentence, one 4-digit number), each repeated |
English |
|
van Bezooijen (van Bezooijen, 1984) |
Anger, contempt, disgust, fear, interest, joy, sadness, shame, surprise, neutrality |
Acted |
8 (4 male, 4 female) reading 4 phrases |
Scripted (semantically neutral phrases) |
Dutch |
|
Abelin (Abelin 2000) |
Anger, disgust, dominance, fear, joy, sadness, shyness, surprise |
Acted |
1 subject |
Scripted (semantically neutral phrase) |
Swedish |
|
Yacoub et al. (2003) (data from LDC, www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2002S28) |
15 emotions: neutral, hot anger, cold anger, happy, sadness, disgust, panic, anxiety, despair, elation, interest, shame, boredom, pride, contempt |
Acted |
2433 utterances from 8 actors |
