
IEEE TAP Special Issue on Multimodal Processing in Speech-based Interactions


Special Issue of The IEEE Transactions on Audio, Speech and Language Processing on Multimodal Processing in Speech-based Interactions

What: Deadline
When: 15 December 2007

Recently there has been increasing research interest in jointly processing audio and visual information related to human activities, and in extending technological developments in individual modalities for human-computer interaction to multimodal processing, in order to improve robustness and naturalness. For example, we have witnessed significant research activity devoted to extending traditional, unimodal speech recognition to audio-visual speech recognition by incorporating the speaker's lip motion. Text-to-speech synthesis has been migrating towards audio-visual speech synthesis involving head, facial, and lip motions. Speech databases for technology evaluation have evolved from single-modality, broadcast-news-type audio towards multimodal recordings of complex human interactions in contexts such as meeting rooms, in support of a multitude of far-field multimodal technologies. Speaker authentication has been migrating towards multimodality by incorporating biometric traits such as facial images, videos, and fingerprints. Furthermore, we have witnessed the emergence of major research programs in the area, such as the European Union funded efforts on multimodal interfaces and interaction, as well as multimodal technology evaluation campaigns by NIST and the VACE community (Rich Transcription, CLEAR, etc.).
Joint processing of audio, visual, pen, and other gestural input offers a means to improve the naturalness and robustness of user interfaces that can automatically recognize human identity, intent, and activity in pervasive computing environments. Illustrative technologies include speaker and speech recognition, person localization, source separation, media synthesis, and media content mining. A critical factor in the effectiveness of multimodal processing in speech-based interactions is the robust integration, or fusion, of information from multiple modalities. This special issue invites researchers to submit original and unpublished work that focuses on the multidisciplinary field of multimodal processing in speech-based interactions. We solicit papers on topics including, but not limited to, the following:
(a) Multimodal speaker recognition (identification and verification);
(b) Audio-visual speech recognition;
(c) Audio-visual speech synthesis;
(d) Multimodal fusion methodologies;
(e) Audio-visual open microphone engagement;
(f) Multimodal processing in media retrieval;
(g) Multimodal corpora and resources;
(h) Multimodality in spoken dialog interfaces;
(i) User-centered and adaptive multimodal interfaces.



Proposed Schedule:

Submission deadline: 15 December 2007

Notification of acceptance: 15 April 2008

Final manuscript due: 1 June 2008

Tentative publication date: 1 September 2008



Guest Editors:

Dr. Gerasimos Potamianos, IBM T.J. Watson Research Center, gpotam@us.ibm.com

Dr. Helen Meng, Chinese University of Hong Kong, hmmeng@se.cuhk.edu.hk

Dr. Sharon Oviatt, Adapx Inc., oviatt@adapx.com

Dr. Gerhard Rigoll, Technical University of Munich, rigoll@tum.de

