TY - GEN
T1 - Robust user context analysis for multimodal interfaces
AU - Dey, Prasenjit
AU - Selvaraj, Muthuselvam
AU - Lee, Bowon
PY - 2011
Y1 - 2011
AB - Multimodal interfaces that enable natural interaction through multiple modalities such as touch, hand gestures, speech, and facial expressions represent a paradigm shift in human-computer interfaces. Their aim is to allow rich and intuitive multimodal interaction similar to human-to-human communication. From the multimodal system's perspective, apart from the input modalities themselves, user context information such as states of attention and activity, and the identities of interacting users, can greatly improve the interaction experience. For example, when sensors such as cameras (webcams, depth sensors, etc.) and microphones are always on and continuously capturing signals in their environment, user context information is very useful for distinguishing genuine system-directed activity from ambient speech and gesture activity in the surroundings, and for distinguishing the "active user" from among a set of users. Information about user identity may be used to personalize the system's interface and behavior - e.g. the look of the GUI, modality recognition profiles, and information layout - to suit the specific user. In this paper, we present a set of algorithms and an architecture that performs audiovisual analysis of user context using sensors such as cameras and microphone arrays, integrating components for lip activity and audio direction detection (speech activity), face detection and tracking (attention), and face recognition (identity). The proposed architecture allows the component data flows to be managed and fused with low latency, low memory footprint, and low CPU load, since such a system is typically required to run continuously in the background and report events of attention, activity, and identity in real time to consuming applications.
KW - human-computer interaction
KW - multimodal systems
KW - speech
KW - user context
UR - http://www.scopus.com/inward/record.url?scp=83455197109&partnerID=8YFLogxK
U2 - 10.1145/2070481.2070498
DO - 10.1145/2070481.2070498
M3 - Conference contribution
AN - SCOPUS:83455197109
SN - 9781450306416
T3 - ICMI'11 - Proceedings of the 2011 ACM International Conference on Multimodal Interaction
SP - 81
EP - 88
BT - ICMI'11 - Proceedings of the 2011 ACM International Conference on Multimodal Interaction
T2 - 2011 ACM International Conference on Multimodal Interaction, ICMI'11
Y2 - 14 November 2011 through 18 November 2011
ER -