
HUMAINE Emotion Annotation and Representation Language (EARL): Proposal

Version 0.4.0, 30 June 2006


This report proposes a syntax for an XML-based language for representing and annotating emotions in technological contexts. In contrast to existing markup languages, where emotion is often represented in an ad-hoc way as part of a specific language, we propose a language aiming to be usable in a wide range of use cases, including corpus annotation as well as systems capable of recognising or generating emotions. We describe the scientific basis of our choice of emotion representations and the use case analysis through which we have determined the required expressive power of the language. We illustrate core properties of the proposed language using examples from various use case scenarios.


1 Introduction

Representing emotional states in technological environments is necessarily based on some representation format. Ideally, such an Emotion Annotation and Representation Language (EARL) should be standardised to allow for data exchange, re-use of resources, and to enable system components to work together smoothly.

As there is no agreed model of emotion, creating such a unified representation format is difficult. In addition, the requirements coming from different use cases vary considerably. We have nevertheless formulated a first suggestion, leaving much freedom to the user to “plug in” their preferred emotion representation. The possibility to map one representation to another will make the format usable in heterogeneous environments where no single emotion representation can be used.

2 Different Descriptive Schemes for Emotions

A unified theory or model of emotional states currently does not exist [1]. Out of the range of existing types of descriptions, we focus on three that may be relevant when annotating corpora, or that may be used in different components of an emotion-oriented technological system.

Categorical representations are the simplest and most wide-spread, using a word to describe an emotional state. Such category sets have been proposed on different grounds, including evolutionarily basic emotion categories [2]; most frequent everyday emotions [3]; application-specific emotion sets [4]; or categories describing other affective states, such as moods or interpersonal stances [5].

Dimensional descriptions capture essential properties of emotional states, such as arousal (active/passive) and valence (negative/positive) [6]. Emotion dimensions can be used to describe general emotional tendencies, including low-intensity emotions.

Appraisal representations [7] characterise emotional states in terms of the detailed evaluations of eliciting conditions, such as their familiarity, intrinsic pleasantness, or relevance to one’s goals. Such detail can be used to characterise the cause or object of an emotion as it arises from the context, or to predict emotions in AI systems [8,9].

3 Use Cases and Requirements for an Emotion Annotation and Representation Language

In order to ensure that the expressive power of the representation language will make it suitable for a broad range of future applications, the design process for EARL began with a collection of use cases from members of HUMAINE. This list of use cases for emotional representations comprises i) manual annotation of the emotional content of (multimodal) databases, ii) affect recognition systems and iii) affective generation systems such as speech synthesizers or embodied conversational agents (ECAs). On the basis of these use cases and the survey of theoretical models of emotions, a first list of requirements for EARL was compiled, which subsequently underwent discussion and refinement by a considerable number of HUMAINE participants.

Among the different use cases, the annotation of databases poses the most refined and extended list of requirements, which also covers the requirements raised in systems for recognition or generation.

In the simplest case, text is marked up with categorical labels only. More complex use cases comprise time-varying encoding of emotion dimensions [6], independent annotation of multiple modalities, or the specification of relations between emotions occurring simultaneously (e.g. blending, masking) [3].

EARL is thus required to provide means for encoding the following types of information.

Emotion descriptor. No single set of labels can be prescribed, because there is no agreement – neither in theory nor in application systems – on the types of emotion descriptors to use, and even less on the exact labels that should be used. EARL has to provide means for using different sets of categorical labels as well as emotion dimensions and appraisal-based descriptors of emotion.

Intensity of an emotion, to be expressed in terms of numeric values or discrete labels.

Regulation types, which encode a person’s attempt to regulate the expression of her emotions (e.g., simulate, hide, amplify).

Scope of an emotion label, which should be definable by linking it to a time span, a media object, a bit of text, a certain modality etc.

Combination of multiple emotions appearing simultaneously. Both the co-occurrence of emotions as well as the type of relation between these emotions (e.g. dominant vs. secondary emotion, masking, blending) should be specified.

Probability, which expresses the labeller’s degree of confidence in the emotion label provided.

In addition to the information types included in the list of requirements, a number of additional items were discussed. Roughly, these can be grouped into information about the person (i.e. demographic data but also personality traits), the social environment (e.g., social register, intended audience), communicative goals, and physical environment (e.g. constraints on movements due to physical restrictions). Though the general usefulness of many of these information types is undisputed, they are intentionally not part of the currently proposed EARL specification. If needed, they have to be specified in domain-specific coding schemes that embed EARL. It was decided to draw the line rather strictly and concentrate on the encoding of emotions in the first place, in order to ensure a small but workable representation core to start with. The main rationale for this restrictive approach was to first provide a simple language for encoding emotional states proper, and to leave out the factors that may have led to the actual expression of this state. Thus, EARL only encodes the fact that a person is, e.g., trying to hide certain feelings, but not the fact that this is due to a specific reason such as social context. Clearly, more discussion is needed to refine the limits of what should be part of EARL.

4 Proposed Realisation in XML

We propose an extensible, XML-based language to annotate and represent emotions. It can easily be integrated into other markup languages, allows for mapping between different emotion representations, and can easily be adapted to specific applications.

Our proposal shares certain properties with existing languages such as APML [10], RRL [8], and the EmoTV coding scheme [3], but was re-designed from scratch to account for the requirements compiled from theory and use cases. We used XML Schema Definition (XSD) to specify the EARL grammar, which allows us to define abstract datatypes and extend or restrict these to specify a particular set of emotion categories, dimensions or appraisals.

The following sections will present some core features of the proposed language, using illustrations from various types of data annotation. A structured specification table is given in Appendix A; the actual XML Schema, formally defining the language, is included as Appendix B.

4.1 Simple emotions

In EARL, emotion tags can be simple or complex. A simple <emotion> uses attributes to specify the category, dimensions and/or appraisals of one emotional state. Emotion tags can enclose text, link to other XML nodes, or specify a time span using start and end times to define their scope.

One design principle for EARL was that simple cases should look simple. For example, annotating text with a simple “pleasure” emotion results in a simple structure:

<emotion category="pleasure">Hello!</emotion>

Annotating the facial expression in a picture file face12.jpg with the category “pleasure” is simply:

<emotion xlink:href="face12.jpg" category="pleasure"/>

This “stand-off” annotation, using a reference attribute, can be used to refer to external files or to XML nodes in the same or a different annotation document in order to define the scope of the represented emotion.

In uni-modal or multi-modal clips, such as speech or video recordings, a start and end time can be used to determine the scope:

<emotion start="0.4" end="1.3" category="pleasure"/>

Besides categories, it is also possible to describe a simple emotion using emotion dimensions or appraisals:

<emotion xlink:href="face12.jpg" arousal="-0.2" valence="0.5" power="0.2"/>
<emotion xlink:href="face12.jpg" suddenness="-0.8" intrinsic_pleasantness="0.7" goal_conduciveness="0.3" relevance_self_concerns="0.7"/>

EARL is designed to give users full control over the sets of categories, dimensions and/or appraisals to be used in a specific application or annotation context (see below).

Information can be added to describe various additional properties of the emotion: an emotion intensity; a probability value, which can be used to reflect the (human or machine) labeller’s confidence in the emotion annotation; a number of regulation attributes, to indicate attempts to suppress, amplify, attenuate or simulate the expression of an emotion; and a modality, if the annotation is to be restricted to one modality.

For example, an annotation of a face showing obviously simulated pleasure of high intensity:

<emotion xlink:href="face12.jpg" category="pleasure" simulate="1.0" intensity="0.9"/>

In order to clarify that it is the face modality in which a pleasure emotion is detected with moderate probability, we can write:

<emotion xlink:href="face12.jpg" category="pleasure" modality="face" probability="0.5"/>

In combination, these attributes allow for a detailed description of individual emotions that do not vary in time.
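As a rough illustration of how a consumer might read such an annotation, the following sketch parses the simulated-pleasure example with Python's standard library and separates the scope reference from the descriptive attributes. The wrapper element and the helper name `describe` are assumptions made for this sketch; the xlink prefix is declared explicitly because a parser needs it.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# The <earl> wrapper follows the stand-off example in section 7.1; the
# xlink prefix must be declared for the fragment to parse.
DOC = """
<earl xmlns:xlink="http://www.w3.org/1999/xlink">
  <emotion xlink:href="face12.jpg" category="pleasure"
           simulate="1.0" intensity="0.9"/>
</earl>
"""

def describe(emotion):
    """Return the scope reference and the remaining attributes separately."""
    attrs = dict(emotion.attrib)
    # ElementTree stores namespaced attributes as "{namespace-uri}localname".
    href = attrs.pop("{%s}href" % XLINK, None)
    return href, attrs

root = ET.fromstring(DOC)
href, attrs = describe(root.find("emotion"))
print(href, attrs)
```

Attribute values arrive as strings; converting them to numbers is left to the application, since the value ranges depend on the EARL dialect in use.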

4.2 Complex emotions

A <complex-emotion> describes one state composed of several aspects, for example because two emotions co-occur, or because of a regulation attempt, where one emotion is masked by the simulation of another one.

For example, to express that an expression could be either pleasure or friendliness, one could annotate:

<complex-emotion xlink:href="face12.jpg">
  <emotion category="pleasure" probability="0.5"/>
  <emotion category="friendliness" probability="0.5"/>
</complex-emotion>

The co-occurrence of a major emotion of “pleasure” with a minor emotion of “worry” can be represented as follows.

<complex-emotion xlink:href="face12.jpg">
  <emotion category="pleasure" intensity="0.7"/>
  <emotion category="worry" intensity="0.5"/>
</complex-emotion>

Simulated pleasure masking suppressed annoyance would be represented:

<complex-emotion xlink:href="face12.jpg">
  <emotion category="pleasure" simulate="0.8"/>
  <emotion category="annoyance" suppress="0.5"/>
</complex-emotion>

The numeric values for “simulate” and “suppress” indicate the amount of regulation going on, on a scale from 0 (no regulation) to 1 (strongest regulation possible).  The above example corresponds to strong indications of simulation while the suppression is only halfway successful.
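A tool reading such annotations may want to check that regulation values stay on the 0 to 1 scale. The sketch below is one possible check, not part of the specification. The attribute names "amplify" and "attenuate" are assumed by analogy with "simulate" and "suppress", and `check_regulation` is a hypothetical helper.

```python
# "simulate" and "suppress" appear in the examples; "amplify" and
# "attenuate" are assumed attribute names for the other regulation types.
REGULATION_ATTRS = ("suppress", "amplify", "attenuate", "simulate")

def check_regulation(attrs):
    """Return regulation attributes as floats, rejecting values outside [0, 1]."""
    result = {}
    for name in REGULATION_ATTRS:
        if name in attrs:
            value = float(attrs[name])
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside the 0..1 scale")
            result[name] = value
    return result

print(check_regulation({"category": "pleasure", "simulate": "0.8"}))
print(check_regulation({"category": "annoyance", "suppress": "0.5"}))
```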

If different emotions are to be annotated for different modalities in a multi-modal clip, there are in principle two choices. On the one hand, it is possible to describe them as different aspects of one complex emotion, and thus share the same scope:

<complex-emotion xlink:href="clip23.avi">
  <emotion category="pleasure" modality="face"/>
  <emotion category="worry" modality="voice"/>
</complex-emotion>

On the other hand, when scope is defined in terms of start and end times, it seems unlikely that different emotional expressions start and end at exactly the same moments in different modalities. Such expressions in the different modalities can then be described as separate events, each with its own temporal scope:

<emotion start="0" end="1.9" category="pleasure" modality="face"/>
<emotion start="0.4" end="1.3" category="worry" modality="voice"/>

Both options for modality-specific annotation co-exist, and it remains to be seen which is more useful.

It may be important to note explicitly that complex patterns of overlapping emotions can easily be represented using multiple, independent <emotion> or <complex-emotion> tags. Consider for example the case that during an expression of pleasure, a complex expression that could be worry or boredom appears, they co-exist for some time, and after the pleasure expression ends, the worry-or-boredom expression continues:

<emotion start="0" end="1.9" category="pleasure"/>
<complex-emotion start="1.4" end="3.3">
  <emotion category="worry" probability="0.5"/>
  <emotion category="boredom" probability="0.5"/>
</complex-emotion>

In this sense, the constraint of a <complex-emotion> referring to a single scope is not a limitation in expressivity, but simply means that several elements need to be used if more complex patterns are to be described.
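Because each element carries its own scope, the period during which two annotations co-exist can be derived rather than annotated. A minimal sketch, assuming scopes are given as (start, end) pairs in the same time unit; `overlap` is a hypothetical helper name:

```python
def overlap(a, b):
    """Return the (start, end) span shared by two scopes, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None

# Scopes taken from the example above.
pleasure = (0.0, 1.9)
worry_or_boredom = (1.4, 3.3)
print(overlap(pleasure, worry_or_boredom))  # (1.4, 1.9)
```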

4.3 Annotating time-varying signals

Two modes are envisaged for describing emotions that vary over time. They correspond to the types of annotation tools used for labelling emotional databases. The Anvil [11] approach consists in assigning a (possibly complex) label to a time span in which a property is conceptualised as constant. This can be described with the start and end attributes presented above.

The Feeltrace [6] approach consists in tracing a small number of dimensions continuously over time. In EARL, we propose to specify such time-varying attributes using embedded <samples> tags.

For example, a curve annotated with Feeltrace describing a shift from a neutral state to an active negative state would be realised using two <samples> elements, one for each dimension:

<emotion start="2" end="2.7">
  <samples value="arousal" rate="10">
    0 .1 .25 .4 .55 .6 .65 .66
  </samples>
  <samples value="valence" rate="10">
    0 -.1 -.2 -.25 -.3 -.4 -.4 -.45
  </samples>
</emotion>
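To recover a time stamp for each sample, a reader of this format needs a convention relating the rate to the scope. The sketch below assumes that times are in seconds, that the rate is in samples per second, and that sample i lies i/rate seconds after the start time. These are assumptions of this sketch (they happen to place the eighth sample exactly at the end time 2.7), and `timed_samples` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<emotion start="2" end="2.7">
  <samples value="arousal" rate="10">
    0 .1 .25 .4 .55 .6 .65 .66
  </samples>
  <samples value="valence" rate="10">
    0 -.1 -.2 -.25 -.3 -.4 -.4 -.45
  </samples>
</emotion>
"""

def timed_samples(emotion):
    """Map each traced attribute name to a list of (time, value) pairs."""
    start = float(emotion.get("start"))
    curves = {}
    for samples in emotion.findall("samples"):
        rate = float(samples.get("rate"))
        values = [float(v) for v in samples.text.split()]
        # Assumption: sample i lies i / rate seconds after the start time.
        curves[samples.get("value")] = [
            (round(start + i / rate, 6), v) for i, v in enumerate(values)
        ]
    return curves

curves = timed_samples(ET.fromstring(FRAGMENT))
print(curves["arousal"][-1])
```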

The output of more recent descendants of Feeltrace, which can be used to annotate various regulations or appraisals, can be represented in the same way. A sudden drop in the appraisal “consonant with expectation” can be described:

<emotion start="2" end="2.7">
  <samples value="consonant_with_expectation" rate="10">
    .9 .9 .7 .4 .1 -.3 -.7 -.75
  </samples>
</emotion>

An emotion of anger being increasingly suppressed over time can similarly be described:

<emotion start="2" end="2.7" category="anger">
  <samples value="suppress" rate="10">
    .1 .2 .3 .4 .4 .5 .6 .6
  </samples>
</emotion>

This relatively simple set of XML elements addresses many of the collected requirements.

5 A family of EARL Dialects: XML Schema design

Our suggested solution to the dilemma that no agreed emotion representation exists is to clearly separate the definition of an EARL document’s structure from the concrete emotion labels allowed, in a modular design. Each concrete EARL dialect is defined by combining a base XML schema, which defines the structure, and three XML schema “plugins”, containing the definitions for the sets of emotion categories, dimensions and appraisal tags, respectively. Different alternatives for each of these plugins exist, defining different sets of category labels, dimensions and appraisals. Figure 1 illustrates the idea.


Figure 1: Illustration of how different sets of emotion descriptors are combined to form a concrete EARL dialect

For example, to allow emotions to be described by a core set of 27 categories describing everyday emotions in combination with two emotion dimensions, the EARL dialect would combine the base schema with the corresponding plugins for the 27 categories and the two dimensions, and the “empty set” plugin for appraisals. Another EARL dialect, describing emotions in terms of four application-specific categories, would combine the base schema with an application-specific category plugin and two “empty set” plugins for dimensions and appraisals.

Even though EARL will provide users with the freedom to define their own emotion descriptor plugins, a default set of categories, dimensions and appraisals will be proposed, which can be used if there are no strong reasons for doing otherwise.
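The modular design can be mimicked outside XML Schema to get a feel for it. The following sketch models a dialect as a fixed set of structural attributes plus three pluggable sets. The class `Dialect`, the constant `BASE_ATTRS` and the method `problems` are hypothetical, and the list of structural attributes is taken from the examples in section 4 rather than from the actual base schema. The three-class category set is the one used in the recognition use case of section 7.4.

```python
from dataclasses import dataclass

# Attributes assumed to belong to the base structure; collected from the
# examples in section 4, not from the actual base schema.
BASE_ATTRS = frozenset({
    "category", "intensity", "probability", "modality", "start", "end",
    "simulate", "suppress",
})

@dataclass(frozen=True)
class Dialect:
    categories: frozenset
    dimensions: frozenset = frozenset()   # "empty set" plugin by default
    appraisals: frozenset = frozenset()   # "empty set" plugin by default

    def problems(self, attrs):
        """List the reasons why an attribute dict is not valid in this dialect."""
        found = []
        category = attrs.get("category")
        if category is not None and category not in self.categories:
            found.append(f"unknown category: {category}")
        allowed = BASE_ATTRS | self.dimensions | self.appraisals
        for name in attrs:
            if name not in allowed:
                found.append(f"unknown attribute: {name}")
        return found

# A dialect with three application-specific categories and two empty plugins.
posneuneg = Dialect(categories=frozenset({"positive", "neutral", "negative"}))
print(posneuneg.problems({"category": "positive", "probability": "0.6"}))
print(posneuneg.problems({"category": "pleasure", "arousal": "0.7"}))
```

Swapping in a different category set or adding a dimension set changes which annotations are accepted without touching the structural part, which is the point of the plugin design.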

6 Inventories of categories, dimensions and appraisals

Even though the EARL design is “pluggable” in the sense just described, some degree of standardisation should be attempted by proposing inventories of categories, dimensions and appraisals that can be used by default, i.e. when no application-specific requirements demand different inventories.

Such default inventories should be in line with contemporary theory as much as possible; work in HUMAINE WP3 appears most promising in view of emotion-oriented computing.

6.1 Categories

Douglas-Cowie, Cox et al. (see HUMAINE deliverable D5f) propose a list of 48 emotion categories consolidated from various sources. The table is reproduced as Table 1, and the 48 categories are proposed as the default category set for EARL.

Negative & forceful: Anger, Annoyance, Contempt, Disgust, Irritation
Negative & not in control: Anxiety, Embarrassment, Fear, Helplessness, Powerlessness, Worry
Negative thoughts: Doubt, Envy, Frustration, Guilt, Shame
Negative & passive: Boredom, Despair, Disappointment, Hurt, Sadness
Agitation: Shock, Stress, Tension
Positive & lively: Amusement, Delight, Elation, Excitement, Happiness, Joy, Pleasure
Caring: Affection, Empathy, Friendliness, Love
Positive thoughts: Courage, Hope, Pride, Satisfaction, Trust
Quiet positive: Calm, Content, Relaxed, Relieved, Serene
Reactive: Interest, Politeness, Surprise

Table 1: Consolidated list of emotion categories as proposed in HUMAINE deliverable D5f.

6.2 Dimensions

When emotions are described in terms of emotion dimensions, the literature has most frequently found two or three dimensions of decreasing relevance in terms of the proportion of, e.g., similarity ratings explained (see e.g. [12] for an overview). The most important dimension is related to a valenced evaluation in terms of positive vs. negative, pleasurable vs. unpleasurable. The second dimension is usually related to the overall state of activity or arousal, from active to passive. The third dimension, which is slightly less frequently used, is related to the degree of control or social power that an individual has in a situation, i.e. high vs. low control or, when the focus is on the social relationship, dominant vs. submissive.

Naming conventions vary. We suggest using the following names, for reasons of clarity, limited ambiguity and wide-spread use.

  arousal
  valence
  power

6.3 Appraisals

The following list is a flattened representation tentatively formulated based on Klaus Scherer's work [13]. Feedback from users is appreciated to determine the use of this list, and possible needs for adjustment.

  suddenness
  familiarity
  predictability
  intrinsic_pleasantness
  relevance_self_concerns
  relevance_relationship_concerns
  relevance_social_order_concerns
  goal_outcome_probability
  consonant_with_expectation
  goal_conduciveness
  goal_urgency
  cause_agent_self
  cause_agent_other
  cause_motive_intentional
  event_controllability
  agent_power
  goal_adjustment_possible
  standards_compatibility_external
  standards_compatibility_internal

 


7 Examples of applying EARL in different use cases

In the following, a number of different use cases are mentioned as examples for the use of EARL representations. These use cases are illustrations from various potential application areas, and are not intended in any way to be exhaustive. Other use cases certainly exist, and in the use cases mentioned, many more uses of EARL can be conceived.

7.1 Use case 0: Annotating text

Text can be annotated in three ways:

a) By directly enclosing the text:

<text xmlns:earl="http://emotion-research.net/earl/040/default">
This is a text with some
<earl:emotion category="excitement" intensity="0.3">quite exciting
</earl:emotion>
examples.
</text>


b) By referring to an element in the local document:

<text xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:earl="http://emotion-research.net/earl/040/default">
<earl:emotion category="excitement" intensity="0.3" xlink:href="#sent1"/>
<sentence id="sent1">
This is a text with some quite exciting examples.
</sentence>
</text>


c) Stand-off annotation, referring to a different document:

text34.xml:

<text>
<sentence id="sent1">
This is a text with some quite exciting examples.
</sentence>
</text>


emotion34.xml:

<earl xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://emotion-research.net/earl/040/default">
<emotion category="excitement" intensity="0.3" xlink:href="text34.xml#sent1"/>
</earl>
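Resolving such a reference on the consumer side might look as follows. This is a sketch only: the two documents are held in strings, and a dictionary stands in for loading files from disk. The helper `resolve` is hypothetical, and it assumes that the fragment after "#" names the value of an id attribute, as in the examples above.

```python
import xml.etree.ElementTree as ET

TEXT34 = """
<text>
<sentence id="sent1">
This is a text with some quite exciting examples.
</sentence>
</text>
"""

EMOTION34 = """
<earl xmlns:xlink="http://www.w3.org/1999/xlink"
      xmlns="http://emotion-research.net/earl/040/default">
<emotion category="excitement" intensity="0.3" xlink:href="text34.xml#sent1"/>
</earl>
"""

EARL = "{http://emotion-research.net/earl/040/default}"
XLINK = "{http://www.w3.org/1999/xlink}"

def resolve(emotion, documents):
    """Return the element an emotion's xlink:href points to, or None."""
    filename, _, fragment = emotion.get(XLINK + "href").partition("#")
    for element in documents[filename].iter():
        if element.get("id") == fragment:
            return element
    return None

# Stand-in for reading text34.xml from disk.
documents = {"text34.xml": ET.fromstring(TEXT34)}
emotion = ET.fromstring(EMOTION34).find(EARL + "emotion")
sentence = resolve(emotion, documents)
print(emotion.get("category"), "->", sentence.text.strip())
```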

7.2 Use case 1a: Annotating multi-level audio-visual databases with ANVIL

For labelling time stretches with global or modality-specific emotion, a number of setups differing in complexity can be distinguished. For readability, the top-level <earl> element and the namespace definitions are omitted here where this causes no ambiguity.

a) Simplest case: One global label

<emotion category="pleasure" start="0.5" end="1.02"/>


b) Global and modality-specific labelling

The global label is complemented with, or replaced by, one or several modality-specific labels:

<emotion category="pleasure" probability="0.4" start="0.5" end="1.02"/>
<emotion modality="voice" category="pleasure" probability="0.9" start="0.5" end="1.02"/>
<emotion modality="face" category="neutral" probability="0.5" start="0" end="2"/>
<emotion modality="text" probability="0.4" start="0.5" end="1.02" arousal="-0.5" valence="0.1"/>


c) Two labels (co-occurring emotions)

A "major" emotion can be annotated using a higher intensity than a "minor" emotion.

<complex-emotion start="0.5" end="1.02">
  <emotion category="pleasure" intensity="0.7"/>
  <emotion category="worry" intensity="0.5"/>
</complex-emotion>


d) One emotion simulated, the other suppressed

<complex-emotion start="0.5" end="1.02">
  <emotion category="pleasure" simulate="1.0"/>
  <emotion category="worry" suppress="1.0"/>
</complex-emotion>

 

7.3 Use case 1b: Continuous annotation of emotional tone with Feeltrace

A series of sample points is represented, usually as a global assessment:

<emotion start="0.387" end="0.416">
  <samples rate="333" value="arousal">
    -0.00256 -0.00256 -0.00256 -0.00256 -0.00256 -0.00256 -0.00256 -0.00256 -0.00256 -0.00256
  </samples>
  <samples rate="333" value="valence">
    0.023 0.023 0.0281 0.0281 0.0281 0.0332 0.0332 0.0383 0.0383 0.0383
  </samples>
</emotion>

7.4 Use case 2a: Emotion recognition/classification of multi-modal input

a) annotation directly with EARL

Let us assume that an emotion recognition algorithm analyses a given multimodal video clip “clip123.avi”. The algorithm analyses individual modalities separately and provides probabilities for different emotion classes. Let us assume for example that the set of classes used in the classifier is: "positive", "neutral", "negative", defined in an EARL dialect with its own namespace, e.g. http://emotion-research.net/earl/040/posneuneg.

Probabilities are encoded in the "probability" attribute.

Then we would have, for example:

<complex-emotion xmlns="http://emotion-research.net/earl/040/posneuneg" xlink:href="clip123.avi">

  <complex-emotion modality="biosignal">
    <emotion category="positive" probability="0.1"/>
    <emotion category="neutral" probability="0.4"/>
    <emotion category="negative" probability="0.05"/>
  </complex-emotion>

  <complex-emotion modality="face">
    <emotion category="positive" probability="0.6"/>
    <emotion category="neutral" probability="0.1"/>
    <emotion category="negative" probability="0.3"/>
  </complex-emotion>

  <complex-emotion modality="voice">
    <emotion category="positive" probability="0.1"/>
    <emotion category="neutral" probability="0.3"/>
    <emotion category="negative" probability="0.1"/>
  </complex-emotion>

</complex-emotion>

In this example, we are trying to recognise emotion from biosignal, face and voice; there is relatively clear evidence in favor of an emotion from the face, probably a positive one; and inconclusive evidence from the biosignal and the voice. In a further "merging" step, these individual results could then be combined, for example by averaging each class over all modalities:

<complex-emotion xmlns="http://emotion-research.net/earl/040/posneuneg"
                 xlink:href="clip123.avi">
    <emotion category="positive" probability="0.267"/>
    <emotion category="neutral" probability="0.267"/>
    <emotion category="negative" probability="0.15"/>
</complex-emotion>

So the emotion is either neutral or positive.
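The merged figures can be reproduced with a few lines of code. The sketch below assumes that "global averaging" means the unweighted mean of each class probability over the three modalities; `merge_by_averaging` is a hypothetical helper, and the input values are copied from the per-modality annotation above.

```python
def merge_by_averaging(per_modality):
    """Average each class probability over all modalities (unweighted mean)."""
    classes = next(iter(per_modality.values())).keys()
    count = len(per_modality)
    return {
        c: round(sum(p[c] for p in per_modality.values()) / count, 3)
        for c in classes
    }

per_modality = {
    "biosignal": {"positive": 0.1, "neutral": 0.4, "negative": 0.05},
    "face":      {"positive": 0.6, "neutral": 0.1, "negative": 0.3},
    "voice":     {"positive": 0.1, "neutral": 0.3, "negative": 0.1},
}
print(merge_by_averaging(per_modality))
# {'positive': 0.267, 'neutral': 0.267, 'negative': 0.15}
```

Note that the per-modality values do not sum to 1, so neither do the averages; a recogniser that needs a proper distribution would have to normalise in a further step.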


b) Using EARL within an EMMA document

EMMA is a W3C working draft for an "Extensible MultiModal Annotation markup language", http://www.w3.org/TR/emma. The general purpose of EMMA is to represent information automatically extracted from a user's input by an interpretation component.

EARL can easily be integrated into EMMA via EMMA's extensible content representation mechanism.

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:earl="http://emotion-research.net/earl/040/posneuneg">
    <emma:interpretation id="int1">
      <earl:complex-emotion>
        <earl:complex-emotion modality="biosignal">
          <earl:emotion category="positive" probability="0.1"/>
          <earl:emotion category="neutral" probability="0.4"/>
          <earl:emotion category="negative" probability="0.05"/>
        </earl:complex-emotion>

        <earl:complex-emotion modality="face">
          <earl:emotion category="positive" probability="0.6"/>
          <earl:emotion category="neutral" probability="0.1"/>
          <earl:emotion category="negative" probability="0.3"/>
        </earl:complex-emotion>

        <earl:complex-emotion modality="voice">
          <earl:emotion category="positive" probability="0.1"/>
          <earl:emotion category="neutral" probability="0.3"/>
          <earl:emotion category="negative" probability="0.1"/>
        </earl:complex-emotion>

      </earl:complex-emotion>
    </emma:interpretation>
</emma:emma>

The result of the "merging" step could also be represented using EARL within EMMA, e.g. as follows:

<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:earl="http://emotion-research.net/earl/040/posneuneg">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" probability="0.267">
      <earl:emotion category="positive"/>
    </emma:interpretation>
    <emma:interpretation id="int2" probability="0.267">
      <earl:emotion category="neutral"/>
    </emma:interpretation>

    <emma:interpretation id="int3" probability="0.15">
      <earl:emotion category="negative"/>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

 

7.5 Use case 2b: Emotion generation in ECAs

In the generation of an ECA dialogue act, first an appraisal component predicts a suitable emotion for the expression to be generated. The emotion is about something, which is encoded in some semantic representation of the world.

In this example, we specify the emotion in a modified version of the rich representation language (RRL) defined in the NECA project, and the emotion refers to an object from the semantic representation based on DRS (discourse representation structure).

Situation: The simulated salesperson tells the customer that the car is environmentally friendly. When predicting the customer’s emotional response, the customer appraisal model evaluates this as unexpected but very pleasant news caused by the other agent.

<rrl xmlns:earl="http://emotion-research.net/earl/040/default">
...
<dialogueAct id="d_5">
  <semanticContent>
    <unaryCond id="c_11" arg="car_1" pred="nonpolluting"/>
    ...
  </semanticContent>
  ...
</dialogueAct>
<dialogueAct id="d_6">
  ...
  <earl:emotion predictability="0.1" intrinsic_pleasantness="0.9" cause_agent_other="0.9"/>
  ...
</dialogueAct>
...
</rrl>

After text generation, the emotion would be applied to the generated sentence. In addition, let us assume that a mapping to a set of emotion categories and dimensions occurs:

<rrl xmlns:earl="http://emotion-research.net/earl/040/default" xmlns:xlink="http://www.w3.org/1999/xlink">
...
<dialogueAct id="d_6">
  ...
  <sentence id="s_6">Now that's nice to hear!</sentence>
  <earl:emotion xlink:href="#s_6" category="delight" intensity="0.9"
                predictability="0.1" intrinsic_pleasantness="0.9" cause_agent_other="0.9"
                arousal="0.7" valence="0.8"/>
</dialogueAct>
...
</rrl>

8 Mapping emotion representations

The reason why EARL provides for the use of different emotion representations is that no preferred representation has yet emerged for all types of use. Instead, the most profitable representation to use depends on the application. Still, it may be necessary to convert between different emotion representations, e.g. to enable components in a multi-modal generation system to work together even though they use different emotion representations [8].

For that reason, EARL will be complemented with a mechanism for mapping between emotion representations. From a scientific point of view, it will not always be possible to define such mappings. For example, the mapping between categories and dimensions will only work in one direction. Emotion categories, understood as short labels for complex states, can be located on emotion dimensions representing core properties; but a position in emotion dimension space is ambiguous with respect to many of the specific properties of emotion categories, and can thus only be mapped to generic super-categories. Guidelines for defining scientifically meaningful mappings will be provided.
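The asymmetry between the two directions can be made concrete with a toy sketch. Everything here is hypothetical: the lookup table contains a single entry that reuses the arousal and valence values from the "delight" example in section 7.5, and the region labels returned by the reverse function are placeholders, not proposed super-categories. A real mapping would have to be grounded in empirical data.

```python
# Category -> dimension coordinates. The single entry reuses the values from
# the "delight" example in section 7.5; it is not an empirical result.
CATEGORY_TO_DIMENSIONS = {
    "delight": {"arousal": 0.7, "valence": 0.8},
}

def to_dimensions(category):
    """Forward direction: a category has one position in dimension space."""
    return CATEGORY_TO_DIMENSIONS[category]

def to_super_category(arousal, valence):
    """Reverse direction: a position only determines a generic region."""
    activity = "active" if arousal >= 0 else "passive"
    polarity = "positive" if valence >= 0 else "negative"
    return f"{polarity}/{activity}"

point = to_dimensions("delight")
print(to_super_category(**point))  # positive/active
```

Going forward and then back yields only the generic region, not "delight": the specific properties of the category are lost, which is why the mapping works in one direction only.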

9 Outlook

We have presented the expressive power of the EARL specification as it is currently conceived. Some specifications are still suboptimal, such as the representation of the start and end times. Other aspects may be missing but will be required by users, such as the annotation of the object of an emotion or the situational context. The current design choices can be questioned, e.g. more clarity could be gained by replacing the current flat list of attributes for categories, dimensions and appraisals with a substructure of elements. On the other hand, this would increase the annotation overhead, especially for simple annotations, which in practice may be the most frequently used. An iterative procedure of comment and improvement is needed before this language is likely to stabilise into a form suitable for a broad range of applications.

We are investigating opportunities for promoting the standardisation of EARL as a recommended representation format for emotional states in technological applications.


10 References

1. Scherer, K. et al. (2005). Proposal for exemplars and work towards them: Theory of emotions. HUMAINE deliverable D3e, http://emotion-research.net/deliverables

2. Ekman, P. (1999). Basic emotions. In Tim Dalgleish and Mick J. Power (Ed.), Handbook of Cognition & Emotion (pp. 301–320). New York: John Wiley.

3. Douglas-Cowie, E., Devillers, L., Martin, J-C., Cowie, R., Savvidou, S., Abrilian, S., & Cox, C. (2005). Multimodal Databases of Everyday Emotion: Facing up to Complexity. In Proc. InterSpeech, Lisbon, September 2005.

4. Steidl, S., Levit, M., Batliner, A., Nöth, E., & Niemann, H. (2005). "Of all things the measure is man" - automatic classification of emotions and inter-labeler consistency. ICASSP 2005, International Conference on Acoustics, Speech, and Signal Processing, March 19-23, 2005, Philadelphia, U.S.A., Proceedings (pp. 317–320).

5. Scherer, K.R. (2000). Psychological models of emotion. In J. C. Borod (Ed.), The Neuropsychology of Emotion (pp. 137–162). New York: Oxford University Press.

6. Cowie, R., Douglas-Cowie, E., Savvidou, S., McMahon, E., Sawey, M., & Schröder, M. (2000). 'FEELTRACE': An instrument for recording perceived emotion in real time. ISCA Workshop on Speech and Emotion, Northern Ireland (pp. 19–24).

7. Ellsworth, P.C., & Scherer, K. (2003). Appraisal processes in emotion. In Davidson R.J. et al. (Ed.), Handbook of Affective Sciences (pp. 572-595). Oxford New York: Oxford University Press.

8. Krenn, B., Pirker, H., Grice, M., Piwek, P., Deemter, K.v., Schröder, M., Klesen, M., & Gstrein, E. (2002). Generation of multimodal dialogue for net environments. Proceedings of Konvens. Saarbrücken, Germany.

9. Aylett, R.S. (2004). Agents and affect: why embodied agents need affective systems. Invited paper, 3rd Hellenic Conference on AI, Samos, May 2004. Springer Verlag, LNAI 3025 (pp. 496–504).

10. de Carolis, B., Pelachaud, C., Poggi, I., & Steedman, M. (2004). APML, a Mark-up Language for Believable Behavior Generation. In H. Prendinger (Ed.), Life-like Characters. Tools, Affective Functions and Applications. Springer.

11. Kipp, M. (2004). Gesture Generation by Imitation - From Human Behavior to Computer Character Animation. Boca Raton, Florida: Dissertation.com.

12. Schröder, M. (2004). Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis (Ph.D thesis). Vol. 7 of Phonus, Research Report of the Institute of Phonetics, Saarland University. Online at: http://www.dfki.de/~schroed.

13. Scherer, K.R. (1984). On the nature and function of emotion: a component process approach. In Klaus R. Scherer and Paul Ekman (Ed.), Approaches to emotion (pp. 293–317). Hillsdale, NJ: Erlbaum.

