9780123748256

Presents state-of-the-art methods for multimodal signal processing, analysis, and modelling

Jean-Philippe Thiran received his PhD from the Université catholique de Louvain (UCL) in 1997. He is Assistant Professor at the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland, where he is responsible for the image analysis group. Dr Thiran's current scientific interests include image segmentation, multimodal signal processing and medical image analysis.

Ferran Marqués is Full Professor in the TSC Department of the Universitat Politècnica de Catalunya (UPC), where he lectures on digital signal and image processing. He has previously held posts at EPFL and the University of Southern California. He received his PhD from UPC in December 1992.

Hervé Bourlard is Director of the Idiap Research Institute, Full Professor at the Swiss Federal Institute of Technology at Lausanne (EPFL), and Director of a National Centre of Competence in Research on 'Interactive Multimodal Information Management'. His current interests mainly include statistical pattern classification, signal processing, multi-channel processing, artificial neural networks, and applied mathematics.

Preface	p. xiii
Introduction	p. 1
Signal Processing, Modelling and Related Mathematical Tools	p. 5
Statistical Machine Learning for HCI	p. 7
Introduction	p. 7
Introduction to Statistical Learning	p. 8
Types of Problem	p. 8
Function Space	p. 9
Loss Functions	p. 10
Expected Risk and Empirical Risk	p. 10
Statistical Learning Theory	p. 11
Support Vector Machines for Binary Classification	p. 13
Hidden Markov Models for Speech Recognition	p. 16
Speech Recognition	p. 17
Markovian Processes	p. 17
Hidden Markov Models	p. 18
Inference and Learning with HMMs	p. 20
HMMs for Speech Recognition	p. 22
Conclusion	p. 22
References	p. 23
Speech Processing	p. 25
Introduction	p. 26
Speech Recognition	p. 28
Feature Extraction	p. 28
Acoustic Modelling	p. 30
Language Modelling	p. 33
Decoding	p. 34
Multiple Sensors	p. 35
Confidence Measures	p. 37
Robustness	p. 38
Speaker Recognition	p. 40
Overview	p. 40
Robustness	p. 43
Text-to-Speech Synthesis	p. 44
Natural Language Processing for Speech Synthesis	p. 44
Concatenative Synthesis with a Fixed Inventory	p. 46
Unit Selection-Based Synthesis	p. 50
Statistical Parametric Synthesis	p. 53
Conclusions	p. 56
References	p. 57
Natural Language and Dialogue Processing	p. 63
Introduction	p. 63
Natural Language Understanding	p. 64
Syntactic Parsing	p. 64
Semantic Parsing	p. 68
Contextual Interpretation	p. 70
Natural Language Generation	p. 71
Document Planning	p. 72
Microplanning	p. 73
Surface Realisation	p. 73
Dialogue Processing	p. 74
Discourse Modelling	p. 74
Dialogue Management	p. 77
Degrees of Initiative	p. 80
Evaluation	p. 81
Conclusion	p. 85
References	p. 85
Image and Video Processing Tools for HCI	p. 93
Introduction	p. 93
Face Analysis	p. 94
Face Detection	p. 95
Face Tracking	p. 96
Facial Feature Detection and Tracking	p. 98
Gaze Analysis	p. 100
Face Recognition	p. 101
Facial Expression Recognition	p. 103
Hand-Gesture Analysis	p. 104
Head Orientation Analysis and FoA Estimation	p. 106
Head Orientation Analysis	p. 106
Focus of Attention Estimation	p. 107
Body Gesture Analysis	p. 109
Conclusions	p. 112
References	p. 112
Processing of Handwriting and Sketching Dynamics	p. 119
Introduction	p. 119
History of Handwriting Modality and the Acquisition of Online Handwriting Signals	p. 121
Basics in Acquisition, Examples for Sensors	p. 123
Analysis of Online Handwriting and Sketching Signals	p. 124
Overview of Recognition Goals in HCI	p. 125
Sketch Recognition for User Interface Design	p. 128
Similarity Search in Digital Ink	p. 133
Summary and Perspectives for Handwriting and Sketching in HCI	p. 138
References	p. 139
Multimodal Signal Processing and Modelling	p. 143
Basic Concepts of Multimodal Analysis	p. 143
Defining Multimodality	p. 145
Advantages of Multimodal Analysis	p. 148
Conclusion	p. 151
References	p. 152
Multimodal Information Fusion	p. 153
Introduction	p. 153
Levels of Fusion	p. 156
Adaptive versus Non-Adaptive Fusion	p. 158
Other Design Issues	p. 162
Conclusions	p. 165
References	p. 165
Modality Integration Methods	p. 171
Introduction	p. 171
Multimodal Fusion for AVSR	p. 172
Types of Fusion	p. 172
Multistream HMMs	p. 174
Stream Reliability Estimates	p. 174
Multimodal Speaker Localisation	p. 178
Conclusion	p. 181
References	p. 181
A Multimodal Recognition Framework for Joint Modality Compensation and Fusion	p. 185
Introduction	p. 186
Joint Modality Recognition and Applications	p. 188
A New Joint Modality Recognition Scheme	p. 191
Concept	p. 191
Theoretical Background	p. 191
Joint Modality Audio-Visual Speech Recognition	p. 194
Signature Extraction Stage	p. 196
Recognition Stage	p. 197
Joint Modality Recognition in Biometrics	p. 198
Overview	p. 198
Results	p. 199
Conclusions	p. 203
References	p. 204
Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions	p. 207
Introduction	p. 208
Setting the Stage: Concepts and Projects	p. 208
Metadata versus Annotations	p. 209
Examples of Large Multimodal Collections	p. 210
Capturing and Recording Multimodal Data	p. 211
Capture Devices	p. 211
Synchronisation	p. 212
Activity Types in Multimodal Corpora	p. 213
Examples of Set-ups and Raw Data	p. 213
Reference Metadata and Annotations	p. 214
Gathering Metadata: Methods	p. 215
Metadata for the AMI Corpus	p. 216
Reference Annotations: Procedure and Tools	p. 217
Data Storage and Access	p. 219
Exchange Formats for Metadata and Annotations	p. 219
Data Servers	p. 221
Accessing Annotated Multimodal Data	p. 222
Conclusions and Perspectives	p. 223
References	p. 224
Multimodal Human-Computer and Human-to-Human Interaction	p. 229
Multimodal Input	p. 231
Introduction	p. 231
Advantages of Multimodal Input Interfaces	p. 232
State-of-the-Art Multimodal Input Systems	p. 234
Multimodality, Cognition and Performance	p. 237
Multimodal Perception and Cognition	p. 237
Cognitive Load and Performance	p. 238
Understanding Multimodal Input Behaviour	p. 239
Theoretical Frameworks	p. 240
Interpretation of Multimodal Input Patterns	p. 243
Adaptive Multimodal Interfaces	p. 245
Designing Multimodal Interfaces that Manage Users' Cognitive Load	p. 246
Designing Low-Load Multimodal Interfaces for Education	p. 248
Conclusions and Future Directions	p. 250
References	p. 251
Multimodal Output: Facial Motion, Gestures and Synthesised Speech Synchronisation	p. 257
Introduction	p. 257
Basic AV Speech Synthesis	p. 258
The Animation System	p. 260
Coarticulation	p. 263
Extended AV Speech Synthesis	p. 264
Data-Driven Approaches	p. 267
Rule-Based Approaches	p. 269
Embodied Conversational Agents	p. 270
TTS Timing Issues	p. 272
On-the-Fly Synchronisation	p. 272
A Priori Synchronisation	p. 273
Conclusion	p. 274
References	p. 274
Interactive Representations of Multimodal Databases	p. 279
Introduction	p. 279
Multimodal Data Representation	p. 280
Multimodal Data Access	p. 283
Browsing as Extension of the Query Formulation Mechanism	p. 283
Browsing for the Exploration of the Content Space	p. 287
Alternative Representations	p. 292
Evaluation	p. 292
Commercial Impact	p. 293
Gaining Semantic from User Interaction	p. 294
Multimodal Interactive Retrieval	p. 294
Crowdsourcing	p. 295
Conclusion and Discussion	p. 298
References	p. 299
Modelling Interest in Face-to-Face Conversations from Multimodal Nonverbal Behaviour	p. 309
Introduction	p. 309
Perspectives on Interest Modelling	p. 311
Computing Interest from Audio Cues	p. 315
Computing Interest from Multimodal Cues	p. 318
Other Concepts Related to Interest	p. 320
Concluding Remarks	p. 322
References	p. 323
Index	p. 327
Table of Contents provided by Ingram. All Rights Reserved.

Multimodal Signal Processing

0123748259
