International Speech Communication Association
INTERSPEECH 2006 and
9th International Conference on Spoken Language Processing
INTERSPEECH 2006 - ICSLP
September 17-21, 2006 Pittsburgh, Pennsylvania, USA
Volume 1 of 5
Printed from e-media with permission by:
Curran Associates, Inc. 57 Morehouse Lane
Red Hook, NY 12571 www.proceedings.com
ISBN: 978-1-60423-449-7
Some format issues inherent in the e-media version may also appear in this print version.
TABLE OF CONTENTS
VOLUME I
LANGUAGE MODELING FOR SPOKEN DIALOG SYSTEMS
Robust Interpretation in Dialogue by Combining Confidence Scores with Contextual Features............................................................................................................................ 1
Matthew Purver, Florin Ratiu, Lawrence Cavedon
A Clustering Approach to Semantic Decoding .................................................. 5
Hui Ye, Steve Young
A Bootstrapping Approach for Developing Language Model of New Spoken Dialogue Systems by Selecting Web Texts...................................................................................... 9
Teruhisa Misu, Tatsuya Kawahara
Phoneme-to-Grapheme Mapping for Spoken Inquiries to the Semantic Web .................................................. 13
Axel Horndasch, Elmar Noth, Anton Batliner, Volker Warnke
Bootstrapping Language Models for Dialogue Systems .................................................. 17
Karl Weilhammer, Matthew N. Stuttle, Steve Young
Question Answering with Discriminative Learning Algorithms .................................................. 21
Junlan Feng
FEATURE ENHANCEMENT FOR ROBUST ASR
Feature Normalization Using Smoothed Mixture Transformations .................................................. 25
Patrick Kenny, Vishwa Gupta, G. Boulianne, Pierre Ouellet, Pierre Dumouchel
Stochastic Vector Mapping-Based Feature Enhancement Using Prior Model and Environment Adaptation for Noisy Speech Recognition.............................................................. 29
Chia-Hsin Hsieh, Chung-Hsien Wu, Jun-Yu Lin
A Framework for Robust MFCC Feature Extraction Using SNR-Dependent Compression of Enhanced Mel Filter Bank Energies ................................................................... 33
Babak Nasersharif, Ahmad Akbari
Coupling Particle Filters with Automatic Speech Recognition for Speech Feature Enhancement..................................................................................................................................... 37
Friedrich Faubel, Matthias Wolfel
Extension and Further Analysis of Higher Order Cepstral Moment Normalization (HOCMN) for Robust Features in Speech Recognition................................................................. 41
Chang-Wen Hsu, Lin-Shan Lee
An Improved Mel-Wiener Filter for Mel-LPC Based Speech Recognition .................................................. 45
Md. Babul Islam, Hiroshi Matsumoto, Kazumasa Yamamoto
DIALOG AND DISCOURSE
A Stochastic Approach for Dialog Management Based on Neural Networks .................................................. 49
Lluis F. Hurtado, David Griol, Encarna Segarra, Emilio Sanchis
Discourse Structure and Speech Recognition Problems .................................................. 53
Mihai Rotaru, Diane J. Litman
A TextTiling Based Approach to Topic Boundary Detection in Meetings .................................................. 57
Satanjeev Banerjee, Alexander I. Rudnicky
An User-Centered Development of an Intuitive Dialog Control for Speech-Controlled Music Selection in Cars................................................................................................. 61
Stefan Schulz, Hilko Donker
Doing Research on a Deployed Spoken Dialogue System: One Year of Let's Go! Experience ......................................................................................................................................... 65
Antoine Raux, Dan Bohus, Brian Langner, Alan W. Black, Maxine Eskenazi
Detecting Question-Bearing Turns in Spoken Tutorial Dialogues .................................................. 69
Jackson Liscombe, Jennifer J. Venditti, Julia Hirschberg
THE SPEECH SEPARATION CHALLENGE
A Computational Auditory Scene Analysis System for Robust Speech Recognition ....................................................................................................................................... 73
Soundararajan Srinivasan, Yang Shao, Zhaozhang Jin, Deliang Wang
CASA Based Speech Separation for Robust Speech Recognition .................................................. 77
Runqiang Han, Pei Zhao, Qin Gao, Zhiping Zhang, Hao Wu, Xihong Wu
Enhancement of Harmonic Content of Speech Based on a Dynamic Programming Pitch Tracking Algorithm ................................................................................................................. 81
Mark R. Every, Philip J. B. Jackson
Recent Advances in Speech Fragment Decoding Techniques .................................................. 85
Jon Barker, Andre Coy, Ning Ma, Martin Cooke
Speech Recognition Using Factorial Hidden Markov Models for Separation in the Feature Space.................................................................................................................................... 89
Tuomas Virtanen
Combining Missing-Feature Theory, Speech Enhancement and Speaker-Dependent/-Independent Modeling for Speech Separation.......................................................... 93
Ji Ming, Timothy J. Hazen, James R. Glass
Super-Human Multi-Talker Speech Recognition: The IBM 2006 Speech Separation Challenge System ............................................................................................................................. 97
T. Kristjansson, J. Hershey, P. Olsen, S. Rennie, Ramesh Gopinath
Modified Phase Opponency Based Solution to the Speech Separation Challenge .................................................. 101
Om D. Deshmukh, Carol Y. Espy-Wilson
MULTILINGUAL AND MULTI-ACCENT PROCESSING
The 2006 RWTH Parliamentary Speeches Transcription System .................................................. 105
J. Loof, M. Bisani, Ch. Gollan, G. Heigold, Bjorn Hoffmeister, Ch. Plahl, Ralf Schluter, Hermann Ney
Multilingual Non-Native Speech Recognition Using Phonetic Confusion-Based Acoustic Model Modification and Graphemic Constraints......................................................... 109
G. Bouselmi, D. Fohr, I. Illina, Jean-Paul Haton
Automatic Speech Recognition of Cantonese-English Code-Mixing Utterances .................................................. 113
Joyce Y. C. Chan, P. C. Ching, Tan Lee, Houwei Cao
The ICSI+ Multilingual Sentence Segmentation System .................................................. 117
M. Zimmerman, Dilek Hakkani-Tur, J. Fung, N. Mirghafori, L. Gottlieb, Elizabeth Shriberg, Yang Liu
Cross-Language Evaluation of Voice-to-Phoneme Conversions for Voice-Tag Application in Embedded Platforms ............................................................................................. 121
Yan Ming Cheng, Changxue Ma, Lynette Melnar
A Multi-Space Distribution (MSD) Approach to Speech Recognition of Tonal Languages ....................................................................................................................................... 125
Huanliang Wang, Yao Qian, Frank K. Soong, Jian-Lai Zhou, Jiqing Han
Comparison of Acoustic Modeling Techniques for Vietnamese and Khmer ASR .................................................. 129
Viet Bac Le, Laurent Besacier
Multi-Accent Chinese Speech Recognition .................................................. 133
Yi Liu, Pascale Fung
Comparative Analysis of Formants of British, American and Australian Accents .................................................. 137
Seyed Ghorshi, Saeed Vaseghi, Qin Yan
Automatic Initial/Final Generation for Dialectal Chinese Speech Recognition .................................................. 141
Linquan Liu, Thomas Fang Zheng, Wenhu Wu
Maximum Entropy Modeling for Diacritization of Arabic Text .................................................. 145
Ruhi Sarikaya, Ossama Emam, Imed Zitouni, Yuqing Gao
Comparison of Slovak and Czech Speech Recognition Based on Grapheme and Phoneme Acoustic Models ............................................................................................................ 149
Slavomir Lihan, Jozef Juhar, Anton Cizmar
CORPORA, ANNOTATION, AND ASSESSMENT METRICS I
Integrating Festival and Windows .................................................. 153
Rhys James Jones, Ambrose Choy, Briony Williams
Measuring the Acceptable Word Error Rate of Machine-Generated Webcast Transcripts....................................................................................................................................... 157
Cosmin Munteanu, Gerald Penn, Ron Baecker, Elaine Toms, David James
Analyzing Reusability of Speech Corpus Based on Statistical Multidimensional Scaling Method................................................................................................................................ 161
Goshu Nagino, Makoto Shozakai
Redundancy and Productivity in the Speech Technology Lexicon - Can We Do Better? .................................................. 165
Susan Fitt, Korin Richmond
Word Intelligibility Estimation of Noise-Reduced Speech .................................................. 169
Takeshi Yamada, Masakazu Kumakura, Nobuhiko Kitawaki
Exploring the Unknown - Collecting 1000 Speakers Over the Internet for the Ph@ttSessionz Database of Adolescent Speakers .................................................. 173
Christoph Draxler
A New Single-Ended Measure for Assessment of Speech Quality .................................................. 177
Timothy Murphy, Dorel Picovici, Abdulhussain E. Mahdi
Speech Technology for Minority Languages: The Case of Irish (Gaelic) .................................................. 181
Ailbhe Ni Chasaide, John Wogan, Brian O Raghallaigh, Aine Ni Bhriain, Eric Zoerner, Harald Berthelsen, Christer Gobl
Further Investigations on the Relationship Between Objective Measures of Speech Quality and Speech Recognition Rates in Noisy Environments.................................. 185
Francisco Jose Fraga, Carlos Alberto Ynoguti, Andre Godoi Chiovato
Non-Intrusive Speech Quality Assessment with Low Computational Complexity .................................................. 189
Volodya Grancharov, David Y. Zhao, Jonas Lindblom, W. Bastiaan Kleijn
Using Speech Recognition Technique for Constructing a Phonetically Transcribed Taiwanese (Min-Nan) Text Corpus .......................................................................... 193
Min-Siong Liang, Ren-Yuan Lyu, Yuang-Chin Chiang
SloParl - Slovenian Parliamentary Speech and Text Corpus for Large Vocabulary Continuous Speech Recognition .................................................. 197
Andrej Zgank, Tomaz Rotovnik, Matej Grasic, Marko Kos, Damjan Vlaj, Zdravko Kacic
An Annotation Scheme for Agreement Analysis .................................................. 201
Siew Leng Toh, Fan Yang, Peter A. Heeman
SPEECH CODING
Signal Modification Incorporating Perceptual Weighting Filter .................................................. 205
Joon-Hyuk Chang, Woohyung Lim, Nam Soo Kim
Enhanced Dynamic Codebook Reordering for Advanced Quantizer Structures .................................................. 209
Jani Nurminen
An Efficient Segment-Based Speech Compression Technique for Hand-Held TTS Systems ........................................................................................................................................... 213
Chang-Heon Lee, Sung-Kyo Jung, Thomas Eriksson, Won-Suk Jun, Hong-Goo Kang
An Unified Unit-Selection Framework for Ultra Low Bit-Rate Speech Coding .................................................. 217
Ramasubramanian V., Harish D.
Efficient VQ Techniques and General Noise Shaping in Noise Feedback Coding .................................................. 221
Jes Thyssen, Juin-Hwey Chen
Classified Comfort Noise Generation for Efficient Voice Transmission .................................................. 225
Yasheng Qian, Wei-Shou Hsu, Peter Kabal
Integration of a CELP Coder in the ARDOR Universal Sound Codec .................................................. 229
Balazs Kovesi, Dominique Massaloux, David Virette, Julien Bensa
Two Stage Transform Vector Quantization of LSFs for Wideband Speech Coding .................................................. 233
Saikat Chatterjee, T. V. Sreenivas
Comparison of Prediction Based LSF Quantization Methods Using Split VQ .................................................. 237
Saikat Chatterjee, T. V. Sreenivas
High-Rate Data Embedding in Unvoiced Speech .................................................. 241
Konrad Hofbauer, Gernot Kubin
Pitch Resynchronization While Recovering from a Late Frame in a Predictive Speech Decoder .............................................................................................................................. 245
Kyle D. Anderson, Philippe Gournay
SPEECH ENHANCEMENT I
A Novel Environment-Dependent Speech Enhancement Method with Optimized Memory Footprint............................................................................................................................ 249
Suhadi Suhadi, Sorel Stan, Tim Fingscheidt
Weighted Codebook Mapping for Noisy Speech Enhancement Using Harmonic-Noise Model ..................................................................................................................................... 253
Esfandiar Zavarehei, Saeed Vaseghi, Qin Yan
MMSE Estimation of Complex-Valued Discrete Fourier Coefficients with Generalized Gamma Priors ............................................................................................................ 257
J. Jensen, R. C. Hendriks, J. S. Erkelens, R. Heusdens
Automatic Removal of Typed Keystrokes from Speech Signals .................................................. 261
Amarnag Subramanya, Michael L. Seltzer, Alex Acero
Lattice LP Filtering for Noise Reduction in Speech Signals .................................................. 265
Erhard Rank, Gernot Kubin
Speech Enhancement Using Modified Phase Opponency Model .................................................. 269
Om D. Deshmukh, Carol Y. Espy-Wilson
ASR OTHER I
Computer-Assisted Closed-Captioning of Live TV Broadcasts in French .................................................. 273
G. Boulianne, J.-F. Beaumont, M. Boisvert, J. Brousseau, P. Cardinal, C. Chapdelaine, M. Comeau, Pierre Ouellet, F. Osterrath
On the Use of Morphological Analysis for Dialectal Arabic Speech Recognition .................................................. 277
Mohamed Afify, Ruhi Sarikaya, Hong-Kwang Jeff Kuo, Laurent Besacier, Yuqing Gao
Recognition of Classroom Lectures in European Portuguese .................................................. 281
Isabel Trancoso, Ricardo Nunes, Luis Neves, Ceu Viana, Helena Moniz, Diamantino Caseiro, Ana Isabel Mata
Investigating Automatic Decomposition for ASR in Less Represented Languages .................................................. 285
Thomas Pellegrini, Lori Lamel
Automatic Transcription of Somali Language .................................................. 289
Abdillahi Nimaan, Pascal Nocera, Jean-Francois Bonastre
Analysis of Overlaps in Meetings by Dialog Factors, Hot Spots, Speakers, and Collection Site: Insights for Automatic Speech Recognition..................................................... 293
Ozgur Cetin, Elizabeth Shriberg
MODELING PROSODIC FEATURES
Combining Acoustic, Lexical, and Syntactic Evidence for Automatic Unsupervised Prosody Labeling ................................................................................................... 297
Sankaranarayanan Ananthakrishnan, Shrikanth Narayanan
On the Correlation Between Energy and Pitch Accent in Read English Speech .................................................. 301
Andrew Rosenberg, Julia Hirschberg
Corpus-Based Generation of Fundamental Frequency Contours Using Generation Process Model and Considering Emotional Focuses ................................................................. 305
Keikichi Hirose, Yasufumi Asano, Nobuaki Minematsu
Prosodic Boundaries in Czech: An Experiment Based on Delexicalized Speech .................................................. 309
Tomas Dubeda
Totally Data-Driven Intonation Prediction Model Using a Novel F0 Contour Parametric Representation ............................................................................................................ 313
Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao
A Comparison of Inter-Transcriber Reliability for Two Systems of Prosodic Annotation: RaP (Rhythm and Pitch) and ToBI (Tones and Break Indices) ............................. 317
Laura Dilley, Mara Breen, Marti Bolivar, John Kraemer, Edward Gibson
SPOKEN INFORMATION RETRIEVAL
Saliency Parsing for Automated Directory Assistance .................................................. 321
Issac Alphonso, Shuangyu Chang
Open-Vocabulary Spoken Document Retrieval Based on New Subword Models and Subword Phonetic Similarity.................................................................................................. 325
Kohei Iwata, Yoshiaki Itoh, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-Wook Lee
Improved Topic Classification Over Maximum Entropy Model Using K-Norm Based New Objectives.................................................................................................................... 329
Xiang Li, Ea-Ee Jan, Cheng Wu, David Lubensky
Efficient Interactive Retrieval of Spoken Documents with Key Terms Ranked by Reinforcement Learning................................................................................................................. 333
Yi-Cheng Pan, Jia-Yu Chen, Yen-Shin Lee, Yi-Sheng Fu, Lin-Shan Lee
Discriminative Named Entity Recognition of Speech Data Using Speech Recognition Confidence................................................................................................................. 337
Katsuhito Sudoh, Hajime Tsukada, Hideki Isozaki
Using Latent Semantic Indexing for Morph-Based Spoken Document Retrieval .................................................. 341
Ville T. Turunen, Mikko Kurimo
FRONT-END METHODS FOR ASR
Feature Combination Using Linear Discriminant Analysis and Its Pitfalls .................................................. 345
Ralf Schluter, Andras Zolnay, Hermann Ney
Discriminant Linear Processing of Time-Frequency Plane .................................................. 349
Fabio Valente, Hynek Hermansky
Automatic Speech Recognition Experiments with Articulatory Data .................................................. 353
Esmeralda Uraga, Thomas Hain
Speech Recognition with Phonological Features: Some Issues to Attend .................................................. 357
Frederik Stouten, Jean-Pierre Martens
Multi-Source Far-Distance Microphone Selection and Combination for Automatic Transcription of Lectures............................................................................................................... 361
Matthias Wolfel, Christian Fugen, Shajith Ikbal, John W. McDonough
Statistical Analysis and Performance of DFT Domain Noise Reduction Filters for Robust Speech Recognition .......................................................................................................... 365
Colin Breithaupt, Rainer Martin
Normalization of the Inter-Frame Information Using Smoothing Filtering .................................................. 369
L. Garcia, Jose C. Segura, Carmen Benitez, Javier Ramirez, Angel De La Torre
Comparative Study on Contributions of Pitch-Synchronization and Peak-Amplitude Towards Robustness Issue of ASR............................................................................ 373
Muhammad Ghulam, Junsei Horikawa, Tsuneo Nitta
Phoneme Recognition Based on Fisher Weight Map to Higher-Order Local Auto-Correlation ....................................................................................................................................... 377
Yasuo Ariki, Shunsuke Kato, Tetsuya Takiguchi
Data-Driven Design of Front-End Filter Bank for Lombard Speech Recognition .................................................. 381
Hynek Boril, Petr Fousek, Petr Pollak
Optimization of Class Weights for LDA Feature Transformations .................................................. 385
Andrej Ljolje
LDA Based Feature Estimation Methods for LVCSR .................................................. 389
Janne Pylkkonen
Robust Feature Extraction Based on Spectral Peaks of Group Delay and Autocorrelation Function and Phase Domain Analysis.............................................................. 393
G. Farahani, S. M. Ahadi, M. Mehdi Homayounpour
Frequency Warping by Linear Transformation of Standard MFCC .................................................. 397
Sankaran Panchapagesan
LANGUAGE AND DIALECT RECOGNITION
Automatic Language Identification Using Wavelets .................................................. 401
Ana Lilia Reyes-Herrera, Luis Villasenor-Pineda, Manuel Montes-Y-Gomez
Minimum Classification Error Training of Hidden Markov Models for Acoustic Language Identification.................................................................................................................. 405
Josef G. Bauer, Ekaterina Timoshenko
Unsupervised Adaptation for Acoustic Language Identification .................................................. 409
Ekaterina Timoshenko, Josef G. Bauer
Low Complexity LID Using Pruned Pattern Tables of LZW .................................................. 413
S. V. Basavaraja, T. V. Sreenivas
Improved Language Identification Using Support Vector Machines for Language Modeling .......................................................................................................................................... 417
Xi Yang, Lu-Feng Zhai, Manhung Siu, Herbert Gish
Recent Advances in Phonotactic Language Recognition Using Binary-Decision Trees................................................................................................................................................. 421
Jiri Navratil
Fusion of Phonotactic and Prosodic Knowledge for Language Identification .................................................. 425
Chi-Yueh Lin, Hsiao-Chuan Wang
Vector-Based Spoken Language Recognition Using Output Coding .................................................. 429
Haizhou Li, Bin Ma, Rong Tong
Basque-Spanish Language Identification Using Phone-Based Methods .................................................. 433
Victor G. Guijarrubia, M. Ines Torres
The Role of Prosody in the Perception of US Native English Accents .................................................. 437
Ayako Ikeno, John H. L. Hansen
Perceptual Identification and Phonetic Analysis of 6 Foreign Accents in French .................................................. 441
Bianca Vieru-Dimulescu, Philippe Boula De Mareuil
Gaussian Mixture Selection and Data Selection for Unsupervised Spanish Dialect Classification ................................................................................................................................... 445
Rongqing Huang, John H. L. Hansen
SPOKEN DIALOG SYSTEMS I
Dynamic Extension of a Grammar-Based Dialogue System: Constructing an All-Recipes Knowing Robot................................................................................................................. 449
Petra Gieselmann, Alex Waibel
Scalable and Portable Web-Based Multimodal Dialogue Interaction with Geographical Databases ................................................................................................................ 453
Alexander Gruenstein, Stephanie Seneff, Chao Wang
System- versus User-Initiative Dialog Strategy for Driver Information Systems .................................................. 457
Chantal Ackermann, Marion Libossek
Have We Met? MDP Based Speaker ID for Robot Dialogue .................................................. 461
Filip Krsmanovic, Curtis Spencer, Daniel Jurafsky, Andrew Y. Ng
Prominent Words as Anchors for TRP Projection .................................................. 465
R. J. J. H. Van Son, Wieneke Wesseling, Louis C. W. Pols
Learning Multi-Goal Dialogue Strategies Using Reinforcement Learning with Reduced State-Action Spaces ....................................................................................................... 469
Heriberto Cuayahuitl, Steve Renals, Oliver Lemon, Hiroshi Shimodaira
Pitch Range and Pause Duration as Markers of Discourse Hierarchy: Perception Experiments..................................................................................................................................... 473
Jorg Mayer, Ekaterina Jasinskaja, Ulrike Kolsch
Radiobot-CFF: A Spoken Dialogue System for Military Training .................................................. 477
Antonio Roque, Anton Leuski, Vivek Rangarajan, Susan Robinson, Ashish Vaswani, Shrikanth Narayanan, David Traum
Is Voice Quality Enough? - Study on How the Situation and User's Awareness Influence the Utterance Features .................................................. 481
Shinya Yamada, Toshihiko Itoh, Kenji Araki
Development of Slovak GALAXY/VoiceXML Based Spoken Language Dialogue System to Retrieve Information from the Internet ....................................................................... 485
Jozef Juhar, Stanislav Ondas, Anton Cizmar, Milan Rusko, Gregor Rozinaj, Roman Jarina
LINTest: A Development Tool for Testing Dialogue Systems .................................................. 489
Lars Degerstedt, Arne Jonsson
SPEAKER CHARACTERIZATION AND RECOGNITION I
Improving the Characterization of the Alternative Hypothesis via Kernel Discriminant Analysis for Likelihood Ratio-Based Speaker Verification.................................. 493
Yi-Hsiang Chao, Wei-Ho Tsai, Hsin-Min Wang, Ruei-Chuan Chang
A Discriminative Method for Speaker Verification Using the Difference Information ...................................................................................................................................... 497
Zhenchun Lei, Yingchun Yang, Zhaohui Wu
A Multiclass Framework for Speaker Verification Within an Acoustic Event Sequence System ........................................................................................................................... 501
Nicolas Scheffer, Jean-Francois Bonastre
Speaker Cluster Based GMM Tokenization for Speaker Recognition .................................................. 505
Bin Ma, Donglai Zhu, Rong Tong, Haizhou Li
Intra-Speaker Variability Compensation in Speaker Verification with Limited Enrolling Data.................................................................................................................................. 509
Claudio Garreton, Nestor Becerra Yoma, Carlos Molina, Fernando Huenupan
Speaking Faces for Face-Voice Speaker Identity Verification .................................................. 513
Girija Chetty, Michael Wagner
SYSTEM COMBINATION
A Study on Lattice Rescoring with Knowledge Scores for Automatic Speech Recognition ..................................................................................................................................... 517
Sabato Marco Siniscalchi, Jinyu Li, Chin-Hui Lee
Cross-System Adaptation and Combination for Continuous Speech Recognition: The Influence of Phoneme Set and Acoustic Front-End............................................................. 521
Sebastian Stuker, Christian Fugen, Susanne Burger, Matthias Wolfel
Generating Complementary Systems for Speech Recognition .................................................. 525
C. Breslin, M. J. F. Gales
VOLUME II
Investigations of Issues for Using Multiple Acoustic Models to Improve Continuous Speech Recognition .................................................................................................. 529
Rong Zhang, Alexander I. Rudnicky
A New Framework for System Combination Based on Integrated Hypothesis Space................................................................................................................................................ 533
I-Fan Chen, Lin-Shan Lee
Frame Based System Combination and a Comparison with Weighted ROVER and CNC................................................................................................................................................... 537
Bjorn Hoffmeister, Tobias Klein, Ralf Schluter, Hermann Ney
INTERPRETING PROSODIC VARIATION
Towards an Integrated Understanding of Speaking Rate in Conversation .................................................. 541
Jiahong Yuan, Mark Liberman, Christopher Cieri
Prosody of Interrogative and Affirmative Sentences in Vietnamese Language: Analysis and Perceptive Results................................................................................................... 545
Minh Quang Vu, Do Dat Tran, Eric Castelli
Intonational Cues to Student Questions in Tutoring Dialogs .................................................. 549
Jennifer J. Venditti, Julia Hirschberg, Jackson Liscombe
Testing the Effect of Audiovisual Cues to Prominence via a Reaction-Time Experiment....................................................................................................................................... 553
Emiel Krahmer, Marc Swerts
Effect of Genre, Speaker, and Word Class on the Realization of Given and New Information ...................................................................................................................................... 557
Agustin Gravano, Julia Hirschberg
Word Order and Tonal Shape in the Production of Focus in Short Finnish Utterances........................................................................................................................................ 561
Martti Vainio, Juhani Jarvikivi, Stefan Werner
ARTICULATORY MODELING
Modeling Sensory-to-Motor Mappings Using Neural Nets and a 3D Articulatory Speech Synthesizer ........................................................................................................................ 565
Bernd J. Kroger, Peter Birkholz, Jim Kannampuzha, Christiane Neuschaefer-Rube
Semi-Automatic Extraction of Vocal Tract Movements from Cineradiographic Data................................................................................................................................................... 569
Julie Fontecave, Frederic Berthommier
Towards Continuous Speech Recognition Using Surface Electromyography .................................................. 573
Szu-Chen Jou, Tanja Schultz, Matthias Walliczek, Florian Kraft, Alex Waibel
A Trajectory Mixture Density Network for the Acoustic-Articulatory Inversion Mapping ........................................................................................................................................... 577
Korin Richmond
Articulatory Features for "Meeting" Speech Recognition .................................................. 581
Florian Metze
Training of Coarticulation Models Using Dominance Functions and Visual Unit Selection Methods for Audio-Visual Speech Synthesis ............................................................. 585
Zdenek Krnoul, Milos Zelezny, Ludek Muller, Jakub Kanis
ACOUSTIC MODELING I – TRAINING AND TOPOLOGIES
Phone Recognition Analysis for Trajectory HMM .................................................. 589
Le Zhang, Steve Renals
Discriminative Kernel-Based Phoneme Sequence Recognition .................................................. 593
Joseph Keshet, Shai Shalev-Shwartz, Samy Bengio, Yoram Singer, Dan Chazan
Combining Phonetic Attributes Using Conditional Random Fields .................................................. 597
Jeremy Morris, Eric Fosler-Lussier
Discriminative MLE Training Using a Product of Gaussian Likelihoods .................................................. 601
T. Nagarajan, Douglas O'Shaughnessy
State-Level Variable Modeling for Phoneme Classification .................................................. 605
Hao-Zheng Li, Douglas O'Shaughnessy
A Time-Synchronous Phonetic Decoder for a Long-Contextual-Span Hidden Trajectory Model ............................................................................................................................. 609
Xiaolong Li, Li Deng, Dong Yu, Alex Acero
Analysis of HMM Temporal Evolution for Automatic Speech Recognition and Utterance Verification ..................................................................................................................... 613
Marta Casar, Jose A. R. Fonollosa
Improvements to Bucket Box Intersection Algorithm for Fast GMM Computation in Embedded Speech Recognition Systems ................................................................................ 617
Min Tang, Aravind Ganapathiraju
Forward-Backwards Training of Hybrid HMM/BN Acoustic Models .......................................... 621
Konstantin Markov, Satoshi Nakamura
A Comparative Study of Gaussian Selection Methods in Large Vocabulary Continuous Speech Recognition .................................................................................................. 625
Dirk Gehrig, Thomas Schaaf
A Successive State and Mixture Splitting for Optimizing the Size of Models in Speech Recognition........................................................................................................................ 629
Soo-Young Suk, Seong-Jun Hahm, Ho-Youl Jung, Hyun-Yeol Chung
Improved Source Modeling and Predictive Classification for Channel Robust Speech Recognition........................................................................................................................ 633
Valentin Ion, Reinhold Haeb-Umbach
ACOUSTIC SIGNAL SEGMENTATION AND CLASSIFICATION
Automatic English Stop Consonants Classification Using Wavelet Analysis and Hidden Markov Models ................................................................................................................... 637
Marco Kuhne, Roberto Togneri
Single Frame Selection for Phoneme Classification................................................................... 641
Tingyao Wu, Dirk Van Compernolle, Jacques Duchateau, Hugo Van Hamme
On the Relation Between Maximum Spectral Transition Positions and Phone Boundaries ...................................................................................................................................... 645
Sorin Dusan, Lawrence Rabiner
Objective Estimation of Suicidal Risk Using Vocal Output Characteristics............................. 649
T. Yingthawornsuk, H. Kaymaz Keskinpala, D. France, D. M. Wilkes, R. G. Shiavi, R. M. Salomon
A Wavelet-Based Parameterization for Speech/Music Segmentation ....................................... 653
E. Didiot, I. Illina, O. Mella, D. Fohr, Jean-Paul Haton
Distance Measure Between Gaussian Distributions for Discriminating Speaking Styles................................................................................................................................................ 657
Goshu Nagino, Makoto Shozakai
Bayesian Networks for Phonetic Classification Using Time-Scale Features ........................... 661
Franz Pernkopf, Tuan Van Pham
Fast and Effective Retraining on Contrastive Vocal Characteristics with Bidirectional Long Short-Term Memory Nets .............................................................................. 665
Nicole Beringer
Exploiting Dendritic Autocorrelogram Structure to Identify Spectro-Temporal Regions Dominated by a Single Sound Source........................................................................... 669
Ning Ma, Phil Green, Andre Coy
Locating Phone Boundaries from Acoustic Discontinuities Using a Two-Staged Approach ......................................................................................................................................... 673
Pairote Leelaphattarakij, Proadpran Punyabukkana, Atiwong Suchato
Investigation on Rescoring Using Minimum Verification Error (MVE) Detectors .................... 677
Qiang Fu, Biing-Hwang Juang
Generalization of the Minimum Classification Error (MCE) Training Based on Maximizing Generalized Posterior Probability (GPP).................................................................. 681
Qiang Fu, Antonio Moreno-Daniel, Biing-Hwang Juang, Jian-Lai Zhou, Frank K. Soong
Unsupervised Detection of Whispered Speech in the Presence of Normal Phonation......................................................................................................................................... 685
Michael A. Carlin, Brett Y. Smolenski, Stanley J. Wenndt
Friends and Enemies: A Novel Initialization for Speaker Diarization ........................................ 689
Xavier Anguera, Chuck Wooters, Javier Hernando
LINGUISTICS, PHONOLOGY, AND PHONETICS I
Acoustic Cues for the Classification of Regular and Irregular Phonation................................ 693
Kushan Surana, Janet Slifka
Realizations and Representations of Thai Tones in Monomoraic Syllables ............................ 697
Rattima Nitisaroj
Measuring and Comparing Vowel Qualities in a Dutch Spontaneous Speech Corpus.............................................................................................................................................. 701
Irene Jacobi, Louis C. W. Pols, Jan Stroop
Phonetic Research on Accented Chinese in Three Dialectal Regions: Shanghai, Wuhan and Xiamen ......................................................................................................................... 705
Aijun Li, Qiang Fang, Ziyu Xiong
Pronunciation Variation Modeling for Mandarin with Accent .................................................... 709
Chi Zhang, Ji Wu, Xi Xiao, Zuoying Wang
Specificity and Generalizability of Spontaneous Phonetic Imitation ........................................ 713
Kuniko Y. Nielsen
On the Sufficiency of Automatic Phonetic Transcriptions for Pronunciation Variation Research.......................................................................................................................... 717
Christophe Van Bael, Hans Van Halteren
Automatic Detection of Voice Onset Time Contrasts for Use in Pronunciation Assessment ..................................................................................................................................... 721
Abe Kazemzadeh, Joseph Tepperman, Jorge Silva, Hong You, Sungbok Lee, Abeer Alwan, Shrikanth Narayanan
Unfilled Pauses in Japanese Sentences Read Aloud by Non-Native Learners........................ 725
Hiroko Hirano, Goh Kawai, Keikichi Hirose, Nobuaki Minematsu
Detection of Quotations and Inserted Clauses and Its Application to Dependency Structure Analysis in Spontaneous Japanese............................................................................. 729
Ryoji Hamabe, Kiyotaka Uchimoto, Tatsuya Kawahara, Hitoshi Isahara
Chinese Input Method Based on Reduced Mandarin Phonetic Alphabet ................................. 733
Chun-Han Tseng, Chia-Ping Chen
Thesaurus Expansion Using Similar Word Pairs from Patent Documents............................... 737
Yoshimi Suzuki, Fumiyo Fukumoto
Low-Resource Autodiacritization of Abjads for Speech Keyword Search ............................... 741
Patrick Schone
SPEECH TRANSLATION
Building an English-Iraqi Arabic Machine Translation System for Spoken Utterances with Limited Resources .............................................................................................. 745
Jason Riesa, Behrang Mohit, Kevin Knight, Daniel Marcu
A Phrase-Level Machine Translation Approach for Disfluency Detection Using Weighted Finite State Transducers............................................................................................... 749
Sameer Maskey, Bowen Zhou, Yuqing Gao
Improving Phrase-Based Korean-English Statistical Machine Translation.............................. 753
Jonghoon Lee, Donghyeon Lee, Gary Geunbae Lee
A Hybrid Phrase-Based/Statistical Speech Translation System................................................ 757
David Stallard, Fred Choi, Kriste Krstovski, Prem Natarajan, Rohit Prasad, Shirin Saleem
High-Quality Speech Translation in the Flight Domain............................................................... 761
Chao Wang, Stephanie Seneff
Optimizing Components for Handheld Two-Way Speech Translation for an English-Iraqi Arabic System .......................................................................................................... 765
Roger Hsiao, Ashish Venugopal, Thilo Kohler, Ying Zhang, Paisarn Charoenpornsawat, Andreas Zollmann, Stephan Vogel, Alan W. Black, Tanja Schultz, Alex Waibel
ACOUSTIC MODELING II – ADAPTION
Distant-Talking Continuous Speech Recognition Based on a Novel Reverberation Model in the Feature Domain ......................................................................................................... 769
Armin Sehr, Marcus Zeller, Walter Kellermann
Robust Feature Space Adaptation for Telephony Speech Recognition ................................... 773
Xin Lei, Jon Hamaker, Xiaodong He
A Simulated-Data Adaptation Technique for Robust Speech Recognition .............................. 777
Nattanun Thatphithakkul, Boontee Kruatrachue, Chai Wutiwiwatchai, Sanparith Marukatat, Vataya Boonpiam
A New HMM Adaptation Approach for the Case of a Hands-Free Speech Input in Reverberant Rooms ........................................................................................................................ 781
Hans-Gunter Hirsch, Harald Finster
A Vector Space Approach to Environment Modeling for Robust Speech Recognition ..................................................................................................................................... 785
Yu Tsao, Chin-Hui Lee
Subspace Modeling and Selection for Noisy Speech Recognition ........................................... 789
Jen-Tzung Chien, Chuan-Wei Ting
EMOTIONAL SPEECH AND SPEAKER STATE
Recognition of Interest in Human Conversational Speech ........................................................ 793
Bjorn Schuller, Niels Kohler, Ronald Muller, Gerhard Rigoll
Using System and User Performance Features to Improve Emotion Detection in Spoken Tutoring Dialogs................................................................................................................ 797
Hua Ai, Diane J. Litman, Kate Forbes-Riley, Mihai Rotaru, Joel Tetreault, Amruta Purandare
Real-Life Emotions Detection with Lexical and Paralinguistic Cues on Human-Human Call Center Dialogs ............................................................................................................ 801
Laurence Devillers, Laurence Vidrascu
Real vs. Acted Emotional Speech ................................................................................................. 805
Janneke Wilting, Emiel Krahmer, Marc Swerts
Emotion Recognition in Spontaneous Speech Using GMMs ..................................................... 809
Daniel Neiberg, Kjell Elenius, Kornel Laskowski
Personality Factors in Human Deception Detection: Comparing Human to Machine Performance..................................................................................................................... 813
Frank Enos, Stefan Benus, Robin L. Cautin, Martin Graciarena, Julia Hirschberg, Elizabeth Shriberg
SPEECH AND LANGUAGE IN EDUCATION
Developing an Automatic Assessment Tool for Children's Oral Reading ................................ 817
Leen Cleuren, Jacques Duchateau, Alain Sips, Pol Ghesquiere, Hugo Van Hamme
Prototyping a Call System for Students of Japanese Using Dynamic Diagram Generation and Interactive Hints................................................................................................... 821
Christopher Waple, Yasushi Tsubota, Masatake Dantsuji, Tatsuya Kawahara
A Multilingual Embodied Conversational Agent for Tutoring Speech and Language Learning ......................................................................................................................... 825
Dominic W. Massaro, Ying Liu, Trevor H. Chen, Charles Perfetti
Classroom Success of an Intelligent Tutoring System for Lexical Practice and Reading Comprehension ............................................................................................................... 829
Michael Heilman, Kevyn Collins-Thompson, Jamie Callan, Maxine Eskenazi
Assessing the Reading Level of Web Pages ............................................................................... 833
Sarah E. Petersen, Mari Ostendorf
Is ASR Accurate Enough for Automated Reading Tutors, and How Can We Tell? ................. 837
Jack Mostow
Development of a Program for Self Assessment of Japanese Pronunciation by English Learners ............................................................................................................................. 841
Chiharu Tsurutani, Yutaka Yamauchi, Nobuaki Minematsu, Dean Luo, Kazutaka Maruyama, Keikichi Hirose
Pronunciation Verification of Children's Speech for Automatic Literacy Assessment ..................................................................................................................................... 845
Joseph Tepperman, Jorge Silva, Abe Kazemzadeh, Hong You, Sungbok Lee, Abeer Alwan, Shrikanth Narayanan
Computer Aided Pronunciation Learning System Using Speech Recognition Techniques ...................................................................................................................................... 849
Sherif Mahdy Abdou, Salah Eldeen Hamid, Mohsen Rashwan, Abdurrahman Samir, Ossama Abdel-Hamid, Mostafa Shahin, Waleed Nazih
SPEECH PERCEPTION I
An Information Theoretic Tool for Investigating Speech Perception ........................................ 853
Bryce Lobdell, Jont B. Allen
An Adaptive Sampling Procedure for Speech Perception Experiments................................... 857
Geoffrey Stewart Morrison
Disentangling Gestural and Auditory Contrast Accounts of Compensation for Coarticulation .................................................................................................................................. 861
Navin Viswanathan, James S. Magnuson, Carol A. Fowler
The Role of Positional Probability in the Segmentation of Cantonese Speech ....................... 865
Michael C. W. Yip
Nasality Perception of Vowels in Different Language Background .......................................... 869
Shahina Haque, Tomio Takara
Steady-State Suppression in Reverberation: A Comparison of Native and Nonnative Speech Perception ....................................................................................................... 873
Nao Hodoshima, Dawn Behne, Takayuki Arai
Effect of Dynamic Information of Formants on Discrimination of English Vowels in Consonantal Contexts by Japanese Listeners ........................................................................ 877
Akiyo Joto
Native and Nonnative Audio-Visual Perception of English Fricatives in Quiet and Cafe-Noise Backgrounds ............................................................................................................... 881
Yue Wang, Dawn Behne, Haisheng Jiang, Chad Danyluck
Perceptive and Acoustic Measurement of Average Speaking Pitch of Female and Male Speakers in German Radio News......................................................................................... 885
Sven Grawunder, Ines Bose, Birgit Hertha, Franziska Trauselt, Lutz Christian Anders
Effects of Frequency Shifts on Perceived Naturalness and Gender Information in Speech ............................................................................................................................................. 889
Peter F. Assmann, Sophia Dembling, Terrance M. Nearey
Influence of Pause Length on Listeners' Impressions in Simultaneous Interpretation ................................................................................................................................... 893
Hitomi Tohyama, Shigeki Matsubara
New Measures to Chart Toddlers' Speech Perception and Language Development: A Test of the Lexical Restructuring Hypothesis ................................................. 897
Iris-Corinna Schwarz, Denis Burnham
Perception of Fundamental Frequency in Cochlear Implant Patients....................................... 901
Angel De La Torre, Cristina Roldan, Manuel Sainz
SPEAKER CHARACTERIZATION AND RECOGNITION II
Significance of Formants from Difference Spectrum for Speaker Identification ..................... 905
Kishore Prahallad, Sudhakar Varanasi, Ranganatham Veluru, Bharat Krishna M., Debashish S. Roy
Using Genetic Algorithms to Weight Acoustic Features for Speaker Recognition ................. 909
Maider Zamalloa, German Bordel, Luis Javier Rodriguez, Mikel Penagarikano, Juan Pedro Uribe
Missing Feature Theory with Soft Spectral Subtraction for Speaker Verification ................... 913
Michael T. Padilla, Thomas F. Quatieri, Douglas A. Reynolds
Prosodic Features for Speaker Verification ................................................................................. 917
Leena Mary, Yegnanarayana B.
Unsupervised Learning of HMM Topology for Text-Dependent Speaker Verification....................................................................................................................................... 921
Ming Liu, Thomas S. Huang
On the Use of Jacobian Adaptation in Real Speaker Verification Applications....................... 925
Jan Anguita, Javier Hernando
A Novel Framework of Text-Independent Speaker Verification Based on Utterance Transform and Iterative Cohort Modeling .................................................................................... 929
Ming Liu, Huazhong Ning, Thomas S. Huang, Zhengyou Zhang
A Cohort - UBM Approach to Mitigate Data Sparseness for In-Set/Out-of-Set Speaker Recognition ...................................................................................................................... 933
Vinod Prakash, John H. L. Hansen
Analysis of Lombard Effect Under Different Types and Levels of Noise with Application to In-Set Speaker ID Systems.................................................................................... 937
Vaishnevi S. Varadarajan, John H. L. Hansen
Reducing Speech Coding Distortion for Speaker Identification ................................................ 941
Alan McCree
A Text-Prompted Distributed Speaker Verification System Implemented on a Cellular Phone and a Mobile Terminal .......................................................................................... 945
Tsuneo Kato, Hisashi Kawai
Automatic Detection of Irregular Phonation in Continuous Speech ......................................... 949
Srikanth Vishnubhotla, Carol Y. Espy-Wilson
SPEECH PRODUCTION, PHYSIOLOGY, AND PATHOLOGY I
Effects of Word Frequency on the Acoustic Durations of Affixes............................................. 953
Mark Pluymaekers, Mirjam Ernestus, R. Harald Baayen
A Noninvasive, Low-Cost Device to Study the Velopharyngeal Port During Speech and Some Preliminary Results ........................................................................................ 957
Xiaochuan Niu, Alexander B. Kain, Jan P. H. Van Santen
Characterization of Cued Speech Vowels from the Inner Lip Contour ..................................... 961
Noureddine Aboutabit, Denis Beautemps, Laurent Besacier
Modelling Aspiration Noise During Phonation Using the LF Voice Source Model .................. 965
Christer Gobl
A Simulation Based Parameter Optimization for a Coarticulation Model ................................. 969
Jianguo Wei, Xugang Lu, Jianwu Dang
Multivariate Analysis of Frame-Based Acoustic Cues of Dysperiodicities in Connected Speech.......................................................................................................................... 973
A. Kacha, Francis Grenez, Jean Schoentgen
Effects of Midline Tongue Piercing on Spectral Centroid Frequencies of Sibilants ............... 977
Tom Kovacs, Donald S. Finan
Assessment of Articulatory Sub-Systems of Dysarthric Speech Using an Isolated-Style Phoneme Recognition System............................................................................................. 981
P. Vijayalakshmi, M. R. Reddy, Douglas O'Shaughnessy
Respiratory/Laryngeal Interactions During Sustained Vowel Production in Children............................................................................................................................................ 985
Donald S. Finan, Carol A. Boliek
Acoustic Characterization of Children with Speech Delay......................................................... 989
H. Timothy Bunnell, James B. Polikoff
Study of Time and Frequency Variability in Pathological Speech and Error Reduction Methods for Automatic Speech Recognition ............................................................ 993
Oscar Saz, Antonio Miguel, Eduardo Lleida, Alfonso Ortega, Luis Buera
FORMANT ESTIMATION
Tracking of Involuntary Formant Frequency Variations and Application to Parkinsonian Speech...................................................................................................................... 997
Laurence Cnockaert, Jean Schoentgen, Pascal Auzou, Canan Ozsancak, Francis Grenez
All-Pole Model Estimation of Vocal Tract on the Frequency Domain ..................................... 1001
Luis Weruaga, Amar Al-Khayat
HMM-Based MAP Prediction of Voiced and Unvoiced Formant Frequencies from Noisy MFCC Vectors..................................................................................................................... 1005
Jonathan Darch, Ben Milner
Extracting Formants from Short Segments of Speech Using Group Delay Functions ....................................................................................................................................... 1009
Anand Joseph M., Guruprasad S., Yegnanarayana B.
Tracking of Visible Vocal Tract Resonances (VVTR) Based on Kalman Filtering ................. 1013
I Yucel Ozbek, Mubeccel Demirekler
Wavelet Ridge Track Interpretation in Terms of Formants....................................................... 1017
Salma Chaari, Kais Ouni, Noureddine Ellouze
LANGUAGE PROCESSING BEYOND AND BELOW THE WORD-LEVEL
Unsupervised Segmentation of Words into Morphemes --- Morpho Challenge 2005 Application to Automatic Speech Recognition................................................................. 1021
Mikko Kurimo, Mathias Creutz, Matti Varjokallio, Ebru Arisoy, Murat Saraclar
Lattice Extension and Rescoring Based Approaches for LVCSR of Turkish......................... 1025
Ebru Arisoy, Murat Saraclar
Exploiting Semantic Relations for a Spoken Language Understanding Application ............ 1029
Catherine Kobus, Geraldine Damnati, Lionel Delphin-Poulat, Renato De Mori
Sentence Boundary Detection of Spontaneous Japanese Using Statistical Language Model and Support Vector Machines........................................................................ 1033
Yuya Akita, Masahiro Saikou, Hiroaki Nanjo, Tatsuya Kawahara
Compact N-Gram Models by Incremental Growing and Clustering of Histories.................... 1037
Sami Virpioja, Mikko Kurimo
Opinion Mining in a Telephone Survey Corpus ......................................................................... 1041
Nathalie Camelin, Geraldine Damnati, Frederic Bechet, Renato De Mori
SPOKEN DIALOG SYSTEMS II
A User Simulator Based on VoiceXML for Evaluation of Spoken Dialog Systems................ 1045
Akinori Ito, Keisuke Shimada, Motoyuki Suzuki, Shozo Makino
User Expectations and Real Experience on a Multimodal Interactive System....................... 1049
Kristiina Jokinen, Topi Hurtig
Detecting Anger in Automated Voice Portal Dialogs ................................................................ 1053
F. Burkhardt, J. Ajmera, Roman Englert, J. Stegmann, W. Burleson
VOLUME III
Evaluation of a Spoken Dialogue System with Usability Tests and Long-Term Pilot Studies: Similarities and Differences................................................................................. 1057
Markku Turunen, Jaakko Hakulinen, Anssi Kainulainen
CHAT: A Conversational Helper for Automotive Tasks ............................................................ 1061
F. Weng, S. Varges, B. Raghunathan, F. Ratiu, H. Pon-Barry, B. Lathrop, Q. Zhang, H. Bratt, T. Scheideck, K. Xu, M. Purver, R. Mishra, A. Lien, M. Raya, S. Peters, Y. Meng, J. Russell, L. Cavedon, E. Shriberg, H. Schmidt, R. Prieto
User Simulation for Spoken Dialogue Systems: Learning and Evaluation ............................ 1065
Kallirroi Georgila, James Henderson, Oliver Lemon
CORPORA, ANNOTATION, AND ASSESSMENT METRICS II
Conversational Quality Estimation Model for Wideband IP-Telephony Services .................. 1069
Hitoshi Aoki, Atsuko Kurashima, Akira Takahashi
The Vocal Joystick Data Collection Effort and Vowel Corpus ................................................. 1073
Kelley Kilanski, Jonathan Malkin, Xiao Li, Richard Wright, Jeff A. Bilmes
Comparison of the ITU-T P.85 Standard to Other Methods for the Evaluation of Text-to-Speech Systems .............................................................................................................. 1077
Dmitry Sityaev, Katherine Knill, Tina Burrows
An Annotation Scheme for Complex Disfluencies .................................................................... 1081
Peter A. Heeman, Andy McMillin, J. Scott Yaruss
Automatic Phonetic Transcription of Large Speech Corpora: A Comparative Study .............................................................................................................................................. 1085
Christophe Van Bael, Lou Boves, Henk Van Den Heuvel, Helmer Strik
Examining Knowledge Sources for Human Error Correction .................................................. 1089
Yongmei Shi, Lina Zhou
ROBUSTNESS AND ADAPTATION FOR ASR
An Integrated Solution for Error Concealment in DSR Systems Over Wireless Channels ........................................................................................................................................ 1093
Antonio M. Peinado, Angel M. Gomez, Victoria Sanchez, Jose L. Perez-Cordoba, Antonio J. Rubio
Interleaving and MMSE Estimation with VQ Replicas for Distributed Speech Recognition Over Lossy Packet Networks................................................................................. 1097
Angel M. Gomez, Antonio M. Peinado, Victoria Sanchez, Jose L. Carmona, Antonio J. Rubio
Noise-Robust Speech Recognition of Conversational Telephone Speech ............................ 1101
Shaughnessy
Lost Speech Reconstruction Method Using Speech Recognition Based on Missing Feature Theory and HMM-Based Speech Synthesis................................................... 1105
Shingo Kuroiwa, Satoru Tsuge, Fuji Ren
Speaker Adaptation Using Evolutionary-Based Linear Transform ......................................... 1109
Sid-Ahmed Selouani, Douglas O'Shaughnessy
A Speaker Adaptation Algorithm Using Principal Curves in Noisy Environments................ 1113
Jingying Wang, Zuoying Wang
Limitations of MLLR Adaptation with Spanish-Accented English: An Error Analysis ......................................................................................................................................... 1117
Constance Clarke, Daniel Jurafsky
Issues with Uncertainty Decoding for Noise Robust Speech Recognition ............................ 1121
H. Liao, M. J. F. Gales
Vector Taylor Series Based Joint Uncertainty Decoding ......................................................... 1125
Haitian Xu, Luca Rigazio, David Kryze
A Maximum Likelihood Training Approach to Irrelevant Variability Compensation Based on Piecewise Linear Transformations ............................................................................ 1129
Qiang Huo, Donglai Zhu
Speaker Clustered Regression-Class Trees for MLLR Adaptation ......................................... 1133
Arindam Mandal, Mari Ostendorf, Andreas Stolcke
Robust Speech Recognition Over Mobile Networks Using Combined Weighted Viterbi Decoding and Subvector Based Error Concealment .................................................... 1137
Zheng-Hua Tan, Paul Dalsgaard, Borge Lindberg
Speaker Adaptation of Trajectory HMMs Using Feature-Space MLLR.................................... 1141
Heiga Zen, Yoshihiko Nankaku, Keiichi Tokuda, Tadashi Kitamura
Feature and Model Space Speaker Adaptation with Full Covariance Gaussians .................. 1145
Daniel Povey, George Saon
MULTIMODAL, TRANSLATION AND INFORMATION RETRIEVAL
Linguistic Tuple Segmentation in Ngram-Based Statistical Machine Translation................. 1149
Adria De Gispert, Jose B. Marino
Sentence Boundary Detection Using Sequential Dependency Analysis Combined with CRF-Based Chunking ........................................................................................................... 1153
Takanobu Oba, Takaaki Hori, Atsushi Nakamura
Sequence Classification for Machine Translation..................................................................... 1157
Srinivas Bangalore, Patrick Haffner, Stephan Kanthak
Two-Stage Vocabulary-Free Spoken Document Retrieval --- Subword Identification and Re-Recognition of the Identified Sections................................................... 1161
Yoshiaki Itoh, Takayuki Otake, Kohei Iwata, Kazunori Kojima, Masaaki Ishigame, Kazuyo Tanaka, Shi-Wook Lee
Design and Performance Analysis of a Factoid Question Answering System for Spontaneous Speech Transcriptions ......................................................................................... 1165
Mihai Surdeanu, David Dominguez-Sal, Pere R. Comas
Performance Improvement of Dialog Speech Translation by Rejecting Unreliable Utterances...................................................................................................................................... 1169
Toshiyuki Takezawa, Tohru Shimizu
Cross-Lingual Dialog Model for Speech to Speech Translation .............................................. 1173
Emil Ettelaie, Panayiotis G. Georgiou, Shrikanth Narayanan
A Robust Fusion Method for Multilingual Spoken Document Retrieval Systems Employing Tiered Resources ...................................................................................................... 1177
Murat Akbacak, John H. L. Hansen
Recent Advances of IBM's Handheld Speech Translation System ......................................... 1181
Weizhong Zhu, Bowen Zhou, Charles Prosser, Pavel Krbec, Yuqing Gao
QASR: Question Answering Using Semantic Roles for Speech Interface.............................. 1185
Svetlana Stenchikova, Dilek Hakkani-Tur, Gokhan Tur
Towards a Multimodal Topic Tracking System for a Mobile Robot ......................................... 1189
Jan F. Maas, Britta Wrede, Gerhard Sagerer
Edge-Splitting in a Cumulative Multimodal System, for a No-Wait Temporal Threshold on Information Fusion, Combined with an Under-Specified Display.................... 1193
Edward C. Kaiser, Paulo Barthelmess
Joint Interpretation of Input Speech and Pen Gestures for Multimodal Human-Computer Interaction.................................................................................................................... 1197
Pui-Yu Hui, Helen M. Meng
ADVANCES IN ACOUSTIC SEGMENTATION
Voice Activity Detector Based on Enhanced Cumulant of LPC Residual and On-Line EM Algorithm ........................................................................................................................ 1201
David Cournapeau, Tatsuya Kawahara, Kenji Mase, Tomoji Toriyama
A Constrained Baum-Welch Algorithm for Improved Phoneme Segmentation and Efficient Training........................................................................................................................... 1205
David Huggins-Daines, Alexander I. Rudnicky
Infinite Models for Speaker Clustering ....................................................................................... 1209
Fabio Valente
The Segmentation of Multi-Channel Meeting Recordings for Automatic Speech Recognition ................................................................................................................................... 1213
John Dines, Jithendra Vepa, Thomas Hain
Minimum Boundary Error Training for Automatic Phonetic Segmentation............................ 1217
Jen-Wei Kuo, Hsin-Min Wang
Dynamic Evidence Models in a DBN Phone Recognizer .......................................................... 1221
William Schuler, Tim Miller, Stephen Wu, Andrew Exley
ACOUSTIC MODELING III – LVCSR
The IBM 2006 Speech Transcription System for European Parliamentary Speeches ....................................................................................................................................... 1225
B. Ramabhadran, Olivier Siohan, L. Mangu, G. Zweig, M. Westphal, H. Schulz, A. Soneiro
Advances in Lecture Recognition: The ISL RT-06S Evaluation System ................................. 1229
Christian Fugen, Matthias Wolfel, John W. McDonough, Shajith Ikbal, Florian Kraft, Kornel Laskowski, Mari Ostendorf, Sebastian Stuker, Kenichi Kumatani
Investigation on Mandarin Broadcast News Speech Recognition .......................................... 1233
Mei-Yuh Hwang, Xin Lei, Wen Wang, Takahiro Shinozaki
Improved Tone Modeling for Mandarin Broadcast News Speech Recognition...................... 1237
Xin Lei, Manhung Siu, Mei-Yuh Hwang, Mari Ostendorf, Tan Lee
Prosodic Modeling in Large Vocabulary Mandarin Speech Recognition ............................... 1241
Jui-Ting Huang, Lin-Shan Lee
Experiments on Chinese Speech Recognition with Tonal Models and Pitch Estimation Using the Mandarin Speecon Data .......................................................................... 1245
Ying Sun, Daniel Willett, Raymond Brueckner, Rainer Gruhn, Dirk Buhler
LINGUISTICS, PHONOLOGY, AND PHONETICS II
A Model of the Regularities Underlying Speaker Variation: Evidence from Hybrid Synthesis ....................................................................................................................................... 1249
Susan R. Hertz
Pauses as a Tool to Ensure Rhythmic Wellformedness........................................................... 1253
Augustin Speyer
Factors Affecting Speakers' Choice of Fillers in Japanese Presentations............................. 1256
Michiko Watanabe, Yasuharu Den, Keikichi Hirose, Shusaku Miwa, Nobuaki Minematsu
Developing Consistent Pronunciation Models for Phonemic Variants................................... 1260
Marelie Davel, Etienne Barnard
Grapheme-to-Phoneme Conversion Using Automatically Extracted Associative Rules for Korean TTS System ..................................................................................................... 1264
Jinsik Lee, Seungwon Kim, Gary Geunbae Lee
Example-Based Grapheme-to-Phoneme Conversion for Thai ................................................. 1268
Paisarn Charoenpornsawat, Tanja Schultz
SPEECH AND VISUAL PROCESSING
Visual Correlates to Prominence in Several Expressive Modes .............................................. 1272
Jonas Beskow, Bjorn Granstrom, David House
How Auditory and Visual Prosody is Used in End-of-Utterance Detection ............................ 1276
Pashiera Barkhuysen, Emiel Krahmer, Marc Swerts
The Importance of Different Facial Areas for Signalling Visual Prominence ......................... 1280
Marc Swerts, Emiel Krahmer
Visual Speech Segmentation and Speaker Recognition for Transcription of TV News ............................................................................................................................................... 1284
Josef Chaloupka
HMM-Based Continuous Sign Language Recognition Using a Fast Optical Flow Parameterization of Visual Information ...................................................................................... 1288
G. Cortes, L. Garcia, Carmen Benitez, Jose C. Segura
Audio-Visual Speech Recognition in the Presence of a Competing Speaker ........................ 1292
Xu Shao, Jon Barker
TTS I
Expressive Prosody for Unit-Selection Speech Synthesis....................................................... 1296
Volker Strom, Robert A. J. Clark, Simon King
Cues for Hesitation in Speech Synthesis ................................................................................... 1300
Rolf Carlson, Kjell Gustafson, Eva Strangert
Multi-Domain Text-to-Speech Synthesis by Automatic Text Classification ........................... 1304
Francesc Alias, Joan Claudi Socoro, Xavier Sevillano, Ignasi Iriondo, Xavier Gonzalvo
Phrase Break Prediction Using Logistic Generalized Linear Model........................................ 1308
Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao
Joint Prosodic and Segmental Unit Selection Speech Synthesis ........................................... 1312
Robert A. J. Clark, Simon King
Phonetically Enriched Labeling in Unit Selection TTS Synthesis ........................................... 1316
Yeon-Jun Kim, Ann K. Syrdal, Alistair Conkie, Mark C. Beutnagel
Further Developments in LSM-Based Boundary Training for Unit Selection TTS ................. 1320
Jerome R. Bellegarda
A Style Control Technique for Speech Synthesis Using Multiple Regression HSMM ............................................................................................................................................. 1324
Takashi Nose, Junichi Yamagishi, Takao Kobayashi
Acoustic Model Training Based on Linear Transformation and MAP Modification for HSMM-Based Speech Synthesis ........................................................................................... 1328
Katsumi Ogata, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi
Improving Arabic HMM Based Speech Synthesis Quality........................................................ 1332
Ossama Abdel-Hamid, Sherif Mahdy Abdou, Mohsen Rashwan
FarsBayan: A Unit Selection Based Farsi Speech Synthesizer ............................................... 1336
M. Mehdi Homayounpour, Majid Namnabat
Amharic Speech Synthesis Using Cepstral Method with Stress Generation Rule ................ 1340
Tadesse Anberbir, Tomio Takara
Automatic Syllable-Pattern Induction in Statistical Thai Text-to-Phone Transcription ................................................................................................................................. 1344
Ausdang Thangthai, Chatchawarn Hansakunbuntheung, Rungkarn Siricharoenchai, Chai Wutiwiwatchai
Development of Prototype Text-to-Speech Systems for Northern Sotho............................... 1348
H. J. Oosthuizen, S. T. Phihlela, M. J. D. Manamela
Identify Language Origin of Personal Names with Normalized Appearance Number of Web Pages .................................................................................................................. 1352
Jiali You, Yining Chen, Min Chu, Yong Zhao, Jinlin Wang
SPECIAL POPULATIONS – LEARNERS, AGED, CHALLENGED
Observations of the Spoken Language Acquisition Process Based on a Multimodal Infant Behavior Corpus ............................................................................................ 1356
Ryo Tsuji, Tomohiko Kasami, Shogo Ishikawa, Shinya Kiriyama, Yoichi Takebayashi, Shigeyoshi Kitazawa
Infants' Ability to Extract Verbs from Continuous Speech....................................................... 1360
Ellen Marklund, Francisco Lacerda
Category Formation and the Role of Spectral Quality in the Perception and Production of English Front Vowels ........................................................................................... 1363
Ricardo A. H. Bion, Paola Escudero, Andreia S. Rauber, Barbara O. Baptista
Productions in Bilinguism, Early Foreign Language Learning and Monolinguism: A Prosodic Comparison ............................................................................................................... 1367
Ranka Bijeljac-Babic, Christelle Dodane, Sabine Metta, Claire Gerard
Training Native English Speakers to Identify Japanese Vowel Length with Fast Rate Sentences ............................................................................................................................. 1371
Yukari Hirata, Elizabeth Whitehurst, Emily Cullings, Jacob Whiton, Carol Glenn
Formant-Based English Vowel Assessment for Chinese in Taiwan ....................................... 1375
Jiang-Chun Chen, Wei-Tang Hsu, J.-S. Roger Jang, Ren-Yuan Lyu, Yuang-Chin Chiang
Substitute Sounds for Ventriloquism and Speech Disorders .................................................. 1379
Jorg Metzner, Marcel Schmittfull, Karl Schnell
Automatic Mandarin Pronunciation Scoring for Native Learners with Dialect Accent ............................................................................................................................................ 1383
Si Wei, Qing-Sheng Liu, Yu Hu, Ren-Hua Wang
Quick Individual Fitting Methods of Simplified Hearing Compensation for Elderly People............................................................................................................................................. 1387
Kengo Fujita, Tsuneo Kato, Hisashi Kawai
An Online Adaptive Filtering Algorithm for the Vocal Joystick ............................................... 1391
Xiao Li, Jonathan Malkin, Susumu Harada, Jeff A. Bilmes, Richard Wright, James Landay
Speaking Aid System for Total Laryngectomees Using Voice Conversion of Body Transmitted Artificial Speech ...................................................................................................... 1395
Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
A Spanish Speech to Sign Language Translation System for Assisting Deaf-Mute People............................................................................................................................................. 1399
R. San-Segundo, R. Barra, L. F. D'Haro, J. M. Montero, R. Cordoba, J. Ferreiros
Potential Relevance of Audio-Visual Integration in Mammals for Computational Modeling ........................................................................................................................................ 1403
Eeva Klintfors, Francisco Lacerda
Finding the Gaps: Applying a Connectionist Model of Word Segmentation to Noisy Phone-Recognized Speech Data ...................................................................................... 1407
C. Anton Rytting
SPEECH ENHANCEMENT II
Single Channel Speech Enhancement by Frequency Domain Constrained Optimization and Temporal Masking .......................................................................................... 1411
Wen Jin, Michael Scordilis
Speech Enhancement Based on Residual Noise Shaping ....................................................... 1415
Jong Won Shin, Seung Yeol Lee, Hwan Sik Yun, Nam Soo Kim
Quality Improvement of Telephone Speech by Artificial Bandwidth Expansion --- Listening Tests in Three Languages........................................................................................... 1419
Hannu Pulakka, Laura Laaksonen, Paavo Alku
Role of Phase Estimation in Speech Enhancement .................................................................. 1423
Benjamin J. Shannon, Kuldip K. Paliwal
Speech Enhancement Based on Spectral Estimation from Higher-Lag Autocorrelation ............................................................................................................................. 1427
Benjamin J. Shannon, Kuldip K. Paliwal, Climent Nadeu
Noise Update Modeling for Speech Enhancement: When Do We Do Enough? ..................... 1431
Nitish Krishnamurthy, John H. L. Hansen
Mapping Neural Networks for Bandwidth Extension of Narrowband Speech........................ 1435
A. Shahina, Yegnanarayana B.
Decision Directed Constrained Iterative Speech Enhancement .............................................. 1439
Amit Das, John H. L. Hansen
Adaptive Filtering for Attenuating Musical Noise Caused by Spectral Subtraction .............. 1443
Takahiro Murakami, Yoshihisa Ishida
Evaluation of Objective Measures for Speech Enhancement .................................................. 1447
Yi Hu, Philipos C. Loizou
Performance Analysis of Various Single Channel Speech Enhancement Algorithms for Automatic Speech Recognition ......................................................................... 1451
Myung-Suk Song, Chang-Heon Lee, Hong-Goo Kang
SPEAKER CHARACTERIZATION AND RECOGNITION III
Highly Noise Robust Text-Dependent Speaker Recognition Based on Hypothesized Wiener Filtering .................................................................................................... 1455
Ramasubramanian V., Deepak Vijaywargiay, Praveen Kumar V.
Speaker Identification Under Noisy Environments by Using Harmonic Structure Extraction and Reliable Frame Weighting .................................................................................. 1459
Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Enhancing the Performance of a GMM-Based Speaker Identification System in a Multi-Microphone Setup ............................................................................................................... 1463
Andreas Stergiou, Aristodemos Pnevmatikakis, Lazaros C. Polymenakos
Discriminative Adaptation for Speaker Verification .................................................................. 1467
C. Longworth, M. J. F. Gales
Within-Class Covariance Normalization for SVM-Based Speaker Recognition ..................... 1471
Andrew O. Hatch, Sachin Kajarekar, Andreas Stolcke
A New Set of Features for Text-Independent Speaker Identification ...................................... 1475
Carol Y. Espy-Wilson, Sandeep Manocha, Srikanth Vishnubhotla
ROBUST ASR
Rapid Speaker Adaptation Using Regression-Tree Based Spectral Peak Alignment....................................................................................................................................... 1479
Shizhen Wang, Xiaodong Cui, Abeer Alwan
Physiologically-Motivated Synchrony-Based Processing for Robust Automatic Speech Recognition...................................................................................................................... 1483
Chanwoo Kim, Yu-Hsiang Chiu, Richard M. Stern
Sub-Word Unit Based Non-Audible Speech Recognition Using Surface Electromyography......................................................................................................................... 1487
Matthias Walliczek, Florian Kraft, Szu-Chen Jou, Tanja Schultz, Alex Waibel
Individual On-Line Variance Adaptation of Frequency Filtered Parameters for Robust ASR ................................................................................................................................... 1491
Jesus Vicente-Pena, Fernando Diaz-De-Maria, Bastiaan Kleijn
Recent Progress on the Discriminative Region-Dependent Transform for Speech Feature Extraction......................................................................................................................... 1495
Bing Zhang, Spyros Matsoukas, Richard Schwartz
Improved Warping-Invariant Features for Automatic Speech Recognition............................ 1499
Jan Rademacher, Matthias Wachter, Alfred Mertins
SPEECH PERCEPTION II
Effects of Featural Similarity and Overlap Position on Lexical Confusions and Overt Similarity Judgments ......................................................................................................... 1503
Sarah C. Creel, Delphine Dahan, Daniel Swingley
Word Structure and Tone Perception in Mandarin .................................................................... 1507
Hansjorg Mixdorff, Yu Hu
Identification of Regional Accents in French: Perception and Categorization ...................... 1511
Cecile Woehrling, Philippe Boula De Mareuil
Consonant and Vowel Confusions in Speech-Weighted Noise ............................................... 1515
Sandeep Phatak, Jont B. Allen
Accident -- Execute: Increased Activation in Nonnative Listening ......................................... 1519
Mirjam Broersma
Estimation of the Quality Dimension "Directness/Frequency Content" for the Instrumental Assessment of Speech Quality............................................................................. 1523
Kirstin Scholz, Marcel Waltermann, Lu Huo, Alexander Raake, Sebastian Moller, Ulrich Heute
SPEECH SUMMARIZATION
Summarization Evaluation for Text and Speech: Issues and Approaches............................. 1527
Ani Nenkova
Summarization of Spontaneous Conversations ........................................................................ 1531
Xiaodan Zhu, Gerald Penn
Perplexity Based Linguistic Model Adaptation for Speech Summarisation........................... 1535
Pierre Chatain, Edward Whittaker, Joanna Mrozinski, Sadaoki Furui
Multi-Layered Summarization of Spoken Document Archives by Information Extraction and Semantic Structuring.......................................................................................... 1539
Lin-Shan Lee, Sheng-Yi Kong, Yi-Cheng Pan, Yi-Sheng Fu, Yu-Tsun Huang
Soundbite Detection in Broadcast News Domain ..................................................................... 1543
Sameer Maskey, Julia Hirschberg
Dialogue Act Compression via Pitch Contour Preservation .................................................... 1547
Gabriel Murray, Steve Renals
ACOUSTIC MODELING IV
Manifold HLDA and Its Application to Robust Speech Recognition ....................................... 1551
Toshiaki Kubo, Tetsuji Ogawa, Tetsunori Kobayashi
Time-Dependent Cross-Probability Model for Multi-Environment Model Based Linear Normalization..................................................................................................................... 1555
Luis Buera, Eduardo Lleida, Juan A. Nolazco-Flores, Antonio Miguel, Alfonso Ortega
SPAM and Full Covariance for Speech Recognition ................................................................. 1559
Daniel Povey
The Use of Bayesian Network for Incorporating Accent, Gender and Wide-Context Dependency Information .............................................................................................................. 1563
Sakriani Sakti, Konstantin Markov, Satoshi Nakamura
Integrating Phonetic Boundary Discrimination Explicitly into HMM Systems ....................... 1567
Yu Wang, Eric Fosler-Lussier
Robust Acoustic-Based Syllable Detection ............................................................................... 1571
Zhimin Xie, Partha Niyogi
A Tone Recognition Framework for Continuous Mandarin Speech ........................................ 1575
Lei He, Jie Hao
Pronunciation Variant-Based Multi-Path HMMs for Syllables .................................................. 1579
Annika Hamalainen, Louis Ten Bosch, Lou Boves
VOLUME IV
A New State-Dependent Phonetic Tied-Mixture Model with Head-Body-Tail Structured HMM for Real-Time Continuous Phoneme Recognition System .......................... 1583
Junho Park, Hanseok Ko
Conversion from Phoneme Based to Grapheme Based Acoustic Models for Speech Recognition...................................................................................................................... 1587
Andrej Zgank, Zdravko Kacic
Phone Vector DHMM to Decode a Phone Recognizer's Output ............................................... 1591
Bong-Wan Kim, Dae-Lim Choi, Yongnam Um, Yong-Ju Lee
Combining Multiple-Sized Sub-Word Units in a Speech Recognition System Using Baseform Selection ........................................................................................................... 1595
T. Nagarajan, P. Vijayalakshmi, Douglas O'Shaughnessy
Local Transformation Models for Speech Recognition ............................................................ 1598
Antonio Miguel, Eduardo Lleida, Alfons Juan, Luis Buera, Alfonso Ortega, Oscar Saz
LARGE VOCABULARY SPEECH RECOGNITION
Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News ............................................................................................................................ 1602
Toru Imai, Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma
Automatic Alignment and Error Correction of Human Generated Transcripts for Long Speech Recordings............................................................................................................. 1606
Timothy J. Hazen
Improving Speech Recognition Accuracy with Multi-Confidence Thresholding ................... 1610
Shuangyu Chang
Conceptual Decoding from Word Lattices: Application to the Spoken Dialogue Corpus MEDIA ............................................................................................................................... 1614
Christophe Servan, Christian Raymond, Frederic Bechet, Pascal Nocera
Improving the Performance of Out-of-Vocabulary Word Rejection by Using Support Vector Machines............................................................................................................. 1618
Shilei Huang, Xiang Xie, Jingming Kuang
Robust Phone Lattice Decoding.................................................................................................. 1622
Kris Demuynck, Dirk Van Compernolle, Hugo Van Hamme
Imperfect Transcript Driven Speech Recognition ..................................................................... 1626
Benjamin Lecouteux, Georges Linares, Pascal Nocera, Jean-Francois Bonastre
New Improvements in Decoding Speed and Latency for Automatic Captioning ................... 1630
Jian Xue, Rusheng Hu, Yunxin Zhao
Colloquial Iraqi ASR for Speech Translation ............................................................................. 1634
Shirin Saleem, Rohit Prasad, Prem Natarajan
Reducing Computation on Parallel Decoding Using Frame-Wise Confidence Scores ............................................................................................................................................ 1638
Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda
Posterior Based Keyword Spotting with a priori Thresholds................................................... 1642
Hamed Ketabdar, Jithendra Vepa, Samy Bengio, Herve Bourlard
A Multi-Pass Error Detection and Correction Framework for Mandarin LVCSR .................... 1646
Zhengyu Zhou, Helen M. Meng, Wai Kit Lo
Continual On-Line Monitoring of Czech Spoken Broadcast Programs .................................. 1650
Jan Nouza, Jindrich Zdansky, Petr Cerva, Jan Kolorenc
SPEECH/NOISE/MUSIC SEGMENTATION
Fast SVM Training Based on the Choice of Effective Samples for Audio Classification ................................................................................................................................. 1654
Shilei Zhang, Hongchen Jiang, Shuwu Zhang, Bo Xu
Online Speaker Change Detection by Combining BIC with Microphone Array Beamforming ................................................................................................................................. 1658
Joerg Schmalenstroeer, Reinhold Haeb-Umbach
Speech/Non-Speech Discrimination Combining Advanced Feature Extraction and SVM Learning ................................................................................................................................ 1662
Javier Ramirez, Pablo Yelamos, J. M. Gorriz, Jose C. Segura, L. Garcia
Cooperation Between Global and Local Methods for the Automatic Segmentation of Speech Synthesis Corpora ...................................................................................................... 1666
Safaa Jarifi, Dominique Pastor, Olivier Rosec
Speaker Independent Voiced-Unvoiced Detection Evaluated in Different Speaking Styles.............................................................................................................................................. 1670
Martin Heckmann, Marco Moebus, Frank Joublin, Christian Goerick
Robust Speaker Diarization for Meetings: ICSI RT06s Evaluation System ............................ 1674
Xavier Anguera, Chuck Wooters, Jose M. Pardo
A Multipitch Tracker for Monaural Speech Segmentation........................................................ 1678
Andre Coy, Jon Barker
Novel Entropy Based Moving Average Refiners for HMM Landmarks.................................... 1682
Rahul Chitturi, Mark Hasegawa Johnson
Two-Microphone Voice Activity Detection in the Presence of Coherent Interference.................................................................................................................................... 1686
Gibak Kim, Nam Ik Cho
On a Greedy Learning Algorithm for dPLRM with Applications to Phonetic Feature Detection.......................................................................................................................... 1690
Tor Andre Myrvoll, Tomoko Matsui
PITCH ESTIMATION
Improving Glottal Waveform Estimation Through Rank-Based Glottal Quality Assessment ................................................................................................................................... 1694
Elliot Moore II, Juan Torres
A Pitch Marks Filtering Algorithm Based on Restricted Dynamic Programming .................. 1698
Francesc Alias, Carlos Monzo, Joan Claudi Socoro
Analysis of Nonmodal Phonation Using Minimum Entropy Deconvolution ........................... 1702
Nicolas Malyska, Thomas F. Quatieri
An Automatic Singing Skill Evaluation Method for Unknown Melodies Using Pitch Interval Accuracy and Vibrato Features ..................................................................................... 1706
Tomoyasu Nakano, Masataka Goto, Yuzuru Hiraga
A Spectral-Temporal Method for Pitch Tracking ....................................................................... 1710
Stephen A. Zahorian, Princy Dikshit, Hongbing Hu
Pitch Determination Using Aligned AMDF ................................................................................. 1714
M. Shahidur Rahman, Hirobumi Tanaka, Tetsuya Shimamura
ACOUSTIC MODELING V – NOVEL APPROACHES
Syllable-Length Path Mixture Hidden Markov Models with Trajectory Clustering for Continuous Speech Recognition........................................................................................... 1718
Yan Han, Lou Boves
Acoustic Modeling for Spoken Dialogue Systems Based on Unsupervised Utterance-Based Selective Training............................................................................................ 1722
Tobias Cincarek, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
GMM-Based Acoustic Modeling for Embedded Speech Recognition ..................................... 1726
Christophe Levy, Georges Linares, Jean-Francois Bonastre
Boosting HMM Performance with a Memory Upgrade .............................................................. 1730
Mathias De Wachter, Kris Demuynck, Dirk Van Compernolle
An Integrated Approach to Improve Speech Recognition Rate for Non-Native Speakers ........................................................................................................................................ 1734
Y. Deng, X. Li, C. Kwan, R. Xu, B. Raj, Richard M. Stern, D. Williamson
Bayesian Decision Tree State Tying for Conversational Speech Recognition ...................... 1738
Rusheng Hu, Yunxin Zhao
CORPUS-BASED SYNTHESIS
Feature Extraction for Spectral Continuity Measures in Concatenative Speech Synthesis ....................................................................................................................................... 1742
Barry Kirkpatrick, Darragh O'Brien, Ronan Scaife
Decision Tree-Based Training of Probabilistic Concatenation Models for Corpus-Based Speech Synthesis ............................................................................................................. 1746
Shinsuke Sakai, Tatsuya Kawahara
Constructing Stylistic Synthesis Databases from Audio Books ............................................. 1750
Yong Zhao, Di Peng, Lijuan Wang, Min Chu, Yining Chen, Peng Yu, Jun Guo
Expanding Phonetic Coverage in Unit Selection Synthesis Through Unit Substitution from a Donor Voice ................................................................................................. 1754
Alistair Conkie, Ann K. Syrdal
Unifying Unit Selection and Hidden Markov Model Speech Synthesis ................................... 1758
Paul Taylor
CLUSTERGEN: A Statistical Parametric Synthesizer Using Trajectory Modeling ................. 1762
Alan W. Black
SPOKEN DIALOG TECHNOLOGY R&D
Cluster-Based User Simulations for Learning Dialogue Strategies ........................................ 1766
Verena Rieser, Oliver Lemon
Prompt Selection with Reinforcement Learning in an AT&T Call Routing Application..................................................................................................................................... 1770
Charles Lewis, Giuseppe Di Fabbrizio
Developing Speech Dialogs for Multimodal HMIs Using Finite State Machines .................... 1774
Silke Goronzy, Raquel Mochales, Nicole Beringer
Development of Advanced Dialog Systems with PATE ............................................................ 1778
Norbert Pfleger, Jan Schehl
A Joint Intention-Based Dialogue Engine .................................................................................. 1782
Rajah Annamalai Subramanian, Philip Cohen
MeMo: Towards Automatic Usability Evaluation of Spoken Dialogue Services by User Error Simulations ................................................................................................................. 1786
Sebastian Moller, Roman Englert, Klaus Engelbrecht, Verena Hafner, Anthony Jameson, Antti Oulasvirta, Alexander Raake, Norbert Reithinger
MODELING SPEAKER EMOTIONAL STATE
Synthesizing Breathiness in Natural Speech with Sinusoidal Modelling ............................... 1790
Brett Matthews, Raimo Bakis, Ellen Eide
Voice GMM Modelling for FESTIVAL/MBROLA Emotive TTS Synthesis................................. 1794
Mauro Nicolao, Carlo Drioli, Piero Cosi
EmoVoice: A System to Generate Emotions in Speech ........................................................... 1798
Joao P. Cabral, Luis C. Oliveira
Real-Time Synthesis of Chinese Visual Speech and Facial Expressions Using MPEG-4 FAP Features in a Three-Dimensional Avatar ............................................................. 1802
Zhiyong Wu, Shen Zhang, Lianhong Cai, Helen M. Meng
Modeling the Acoustic Correlates of Expressive Elements in Text Genres for Expressive Text-to-Speech Synthesis ........................................................................................ 1806
Hongwu Yang, Helen M. Meng, Lianhong Cai
Automatic Emotion Recognition of Speech Signal in Mandarin.............................................. 1810
Sheng Zhang, P. C. Ching, Fanrang Kong
Feature Analysis for Emotion Recognition from Mandarin Speech Considering the Special Characteristics of Chinese Language .................................................................... 1814
Yi-Hao Kao, Lin-Shan Lee
Timing Levels in Segment-Based Speech Emotion Recognition ............................................ 1818
Bjorn Schuller, Gerhard Rigoll
Analyzing Dialogue Data for Real-World Emotional Speech Classification ........................... 1822
Ryuichi Nisimura, Souji Omae, Hideki Kawahara, Toshio Irino
Evolving Emotional Prosody ....................................................................................................... 1826
Cecilia Ovesdotter Alm, Xavier Llora
Vocal Emotion Recognition with Cochlear Implants................................................................. 1830
Xin Luo, Qian-Jie Fu, John J. Galvin III
Emotion Detection in Infants' Cries Based on a Maximum Likelihood Approach ................. 1834
S. Matsunaga, S. Sakaguchi, M. Yamashita, S. Miyahara, S. Nishitani, K. Shinohara
"Yeah Right": Sarcasm Recognition for Spoken Dialogue Systems....................................... 1838
Joseph Tepperman, David Traum, Shrikanth Narayanan
Identification of Confusion and Surprise in Spoken Dialog Using Prosodic Features ......................................................................................................................................... 1842
Rohit Kumar, Carolyn P. Rose, Diane J. Litman
Analysis and Detection of Speech Under Sleep Deprivation ................................................... 1846
Tin Lay Nwe, Haizhou Li, Minghui Dong
Language, Gender, Speaking Style and Language Proficiency as Factors Influencing the Autonomous Vocalic Filler Production in Spontaneous Speech.................. 1850
Ioana Vasilescu, Martine Adda-Decker
LANGUAGE MODELING AND ASR APPLICATIONS
How to Handle Gender and Number Agreement in Statistical Language Models? ............... 1854
Caroline Lavecchia, Kamel Smaili, Jean-Paul Haton
Prosodic Features for a Maximum Entropy Language Model .................................................. 1858
Oscar Chan, Roberto Togneri
Language Model Adaptation with a Word List and a Raw Corpus .......................................... 1862
Shinsuke Mori
Topic-Based Language Modeling with Dynamic Bayesian Networks ..................................... 1866
Pascal Wiggers, Leon J. M. Rothkrantz
Speech Recognition of Foreign Out-of-Vocabulary Words Using a Hierarchical Language Model............................................................................................................................ 1870
Hirofumi Yamamoto, Genichiro Kikui, Satoshi Nakamura, Yoshinori Sagisaka
Language Modeling of Chinese Personal Names Based on Character Units for Continuous Chinese Speech Recognition ................................................................................. 1874
Xinhui Hu, Hirofumi Yamamoto, Genichiro Kikui, Yoshinori Sagisaka
A Syllable Based Continuous Speech Recognizer for Tamil ................................................... 1878
Lakshmi A., Hema A. Murthy
Spontaneous Thai Speech Recognition ..................................................................................... 1882
Monika Woszczyna, Paisarn Charoenpornsawat, Tanja Schultz
Acoustic Analysis and Automatic Recognition of Spontaneous Children's Speech ............ 1886
M. Gerosa, D. Giuliani, Shrikanth Narayanan
Speech and Speech Recognition During Dictation Corrections.............................................. 1890
Keith Vertanen
Comparison of Keyword Spotting Methods for Searching in Speech .................................... 1894
Lubos Smidl, Josef V. Psutka
Automatic Generation of Statistical Language Models for Interactive Voice Response Applications ................................................................................................................ 1898
Mithun Balakrishna, Cyril Cerovic, Dan Moldovan, Ellis Cave
Call Analysis with Classification Using Speech and Non-Speech Features .......................... 1902
Yun-Cheng Ju, Ye-Yi Wang, Alex Acero
SPOKEN LANGUAGE UNDERSTANDING
A Spoken Language Understanding Approach Using Successive Learners ......................... 1906
Wei-Lin Wu, Ru-Zhan Lu, Hui Liu, Feng Gao
Conversational Help Desk: Vague Callers and Context Switch ............................................... 1910
Osamuyimen Stewart, Juan Huerta, Ea-Ee Jan, Cheng Wu, Xiang Li, David Lubensky
Integrating Spoken Dialog and Question Answering: The Ritel Project ................................. 1914
Sophie Rosset, Olivier Galibert, Gabriel Illouz, Aurelien Max
Rapid Simulation-Driven Reinforcement Learning of Multimodal Dialog Strategies in Human-Robot Interaction......................................................................................................... 1918
Thomas Prommer, Hartwig Holzapfel, Alex Waibel
Software Architectures for Incremental Understanding of Human Speech ........................... 1922
Gregory Aist, James Allen, Ellen Campana, Lucian Galescu, Carlos A. Gomez Gallo, Scott C. Stoness, Mary Swift, Michael Tanenhaus
Lingua Machinae --- An Unorthodox Proposal .......................................................................... 1926
Florian Schiel, Christoph Draxler, Marion Libossek
Evaluation of Content Presentation Strategies for an In-car Spoken Dialogue System ........................................................................................................................................... 1930
Heather Pon-Barry, Fuliang Weng, Sebastian Varges
On Designing Context Sensitive Language Models for Spoken Dialog Systems .................. 1934
Vaibhava Goel, Ramesh Gopinath
Using SVM and Error-Correcting Codes for Multiclass Dialog Act Classification in Meeting Corpus ............................................................................................................................. 1938
Yang Liu
A Multilingual Expectations Model for Contextual Utterances in Mixed-Initiative Spoken Dialogue ........................................................................................................................... 1942
Hartwig Holzapfel, Alex Waibel
Dynamic Help Generation by Estimating User's Mental Model in Spoken Dialogue Systems ......................................................................................................................................... 1946
Yuichiro Fukubayashi, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Dialog Act Tagging with Support Vector Machines and Hidden Markov Models................... 1950
Dinoj Surendran, Gina-Anne Levow
SEGMENTATION AND VAD
Noise Robust Model-Based Voice Activity Detection ............................................................... 1954
Angel De La Torre, Javier Ramirez, Carmen Benitez, Jose C. Segura, L. Garcia, Antonio J. Rubio
Auto-Segmentation Based VAD for Robust ASR ...................................................................... 1958
Yu Shi, Frank K. Soong, Jian-Lai Zhou
Improved Speech Activity Detection Using Cross-Channel Features for Recognition of Multiparty Meetings ............................................................................................ 1962
Kofi Boakye, Andreas Stolcke
Evaluation of Voice Activity Detection by Combining Multiple Features with Weight Adaptation ........................................................................................................................ 1966
Yusuke Kida, Tatsuya Kawahara
Voice Activity Detection in Personal Audio Recordings Using Autocorrelogram Compensation ............................................................................................................................... 1970
Keansub Lee, Daniel P. W. Ellis
Discriminating Speech and Non-Speech with Regularized Least Squares ............................ 1974
Ryan Rifkin, Nima Mesgarani
TECHNOLOGIES FOR SPECIFIC POPULATIONS: LEARNERS AND CHALLENGED
Automatic Grammar Correction for Second-Language Learners............................................ 1978
John Lee, Stephanie Seneff
ASR-Based Corrective Feedback on Pronunciation: Does It Really Work? ........................... 1982
Ambra Neri, Catia Cucchiarini, Helmer Strik
Evaluating Prosody of Mandarin Speech for Language Learning ........................................... 1986
Minghui Dong, Haizhou Li, Tin Lay Nwe
Spoken Language Technologies Applied to Digital Talking Books ........................................ 1990
Isabel Trancoso, Carlos Duarte, Antonio Serralheiro, Diamantino Caseiro, Luis Carrico, Ceu Viana
Building an English Speech Synthesis System from a Japanese ALS Patient's Voice............................................................................................................................................... 1994
Akemi Iida, Jun Ito, Shimpei Kajima, Tsutomu Sugawara
Multi-Modal System ICANDO: Intellectual Computer AssistaNt for Disabled Operators ....................................................................................................................................... 1998
Alexey Karpov, Andrey Ronzhin, Alexandre Cadiou
THE PROSODY OF TURN-TAKING AND DIALOG ACTS
User Responses to Prosodic Variation in Fragmentary Grounding Utterances in Dialog ............................................................................................................................................. 2002
Gabriel Skantze, David House, Jens Edlund
Analysis of Prosodic and Linguistic Cues of Phrase Finals for Turn-Taking and Dialog Acts .................................................................................................................................... 2006
Carlos Toshinori Ishi, Hiroshi Ishiguro, Norihiro Hagita
VOLUME V
From Reaction to Prediction: Experiments with Computational Models of Turn-Taking............................................................................................................................................. 2010
David Schlangen
On Speaker-Specific Prosodic Models for Automatic Dialog Act Segmentation of Multi-Party Meetings ..................................................................................................................... 2014
Jachym Kolar, Elizabeth Shriberg, Yang Liu
A Case Study in the Identification of Prosodic Cues to Turn-Taking: Back-Channeling in Arabic .................................................................................................................... 2018
Nigel G. Ward, Yaffa Al Bayyari
/nailon/ --- Software for Online Analysis of Prosody ..................................................... 2022
Jens Edlund, Mattias Heldner
TTS II
Conditional Random Fields for Hierarchical Segment Selection in Text-to-Speech Synthesis ....................................................................................................................................... 2026
Christian Weiss, Wolfgang Hess
Corpus Design Based on the Kullback-Leibler Divergence for Text-to-Speech Synthesis Application................................................................................................................... 2030
Aleksandra Krul, Geraldine Damnati, Francois Yvon, Thierry Moudenc
HMM-Based Unit Selection Using Frame Sized Speech Segments ......................................... 2034
Zhen-Hua Ling, Ren-Hua Wang
The Target Cost Formulation in Unit Selection Speech Synthesis ......................................... 2038
Paul Taylor
Unit Selection and Its Relation to Symbolic Prosody: A New Approach ................................ 2042
Daniel Tihelka, Jindrich Matousek
Minimum Generation Error Criterion for Tree-Based Clustering of Context Dependent HMMs .......................................................................................................................... 2046
Yi-Jian Wu, Wu Guo, Ren-Hua Wang
Selective-LPC Based Representation of STRAIGHT Spectrum and Its Applications in Spectral Smoothing .................................................................................................................. 2050
Heng Kang, Wenju Liu
Towards a Comprehensive Investigation of Factors Relevant to Peak Alignment Using a Unit Selection Corpus .................................................................................................... 2054
Matthias Jilka, Bernd Mobius
Six Approaches to Limited Domain Concatenative Speech Synthesis................................... 2058
Robert J. Utama, Ann K. Syrdal, Alistair Conkie
From Pre-Recorded Prompts to Corporate Voices: On the Migration of Interactive Voice Response Applications...................................................................................................... 2062
V. Fischer, S. Kunzmann
Automatic Speech Segmentation with Multiple Statistical Models ......................................... 2066
Seung Seop Park, Jong Won Shin, Nam Soo Kim
Evaluation of Perceptual Quality of Control Point Reduction in Rule-Based Synthesis ....................................................................................................................................... 2070
Kimmo Parssinen, Marko Moberg
Segment Connection Networks for Corpus-Based Speech Synthesis ................................... 2074
Geert Coorman
SPEAKER CHARACTERIZATION AND RECOGNITION IV
Detection of a Third Speaker in Telephone Conversations...................................................... 2078
Uchechukwu O. Ofoegbu, Ananth N. Iyer, Robert E. Yantorno, Stanley J. Wenndt
Improvement Speaker Clustering Using Global Similarity Features....................................... 2082
Konstantin Biatov, Joachim Kohler
Voting for Two Speaker Segmentation ....................................................................................... 2086
Balakrishnan Narayanaswamy, Rashmi Gangadharaiah, Richard M. Stern
Unsupervised Model Adaptation for Speaker Verification ....................................................... 2090
Alexandre Preti, Jean-Francois Bonastre
A Quality Measure Method Using Gaussian Mixture Models and Divergence Measure for Speaker Identification ............................................................................................. 2094
Rong Zheng, Shuwu Zhang, Bo Xu
Gammatone Auditory Filterbank and Independent Component Analysis for Speaker Identification................................................................................................................... 2098
Yushi Zhang, Waleed H. Abdulla
Study on Speaker Verification on Emotional Speech ............................................................... 2102
Wei Wu, Thomas Fang Zheng, Ming-Xing Xu, Huan-Jun Bao
On the Fusion of Prosody, Voice Spectrum and Face Features for Multimodal Person Verification ....................................................................................................................... 2106
M. Farrus, A. Garde, P. Ejarque, J. Luque, Javier Hernando
An MRI Based Study of the Acoustic Effects of Sinus Cavities and Its Application to Speaker Recognition ................................................................................................................ 2110
Tarun Pruthi, Carol Y. Espy-Wilson
Speaker Verification with Non-Audible Murmur Segments ...................................................... 2114
Mariko Kojima, Tomoko Matsui, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano
Automatic Recognition of Speakers' Age and Gender on the Basis of Empirical Studies ........................................................................................................................................... 2118
Christian Muller
Text-Independent Speaker Identification in Birds ..................................................................... 2122
E. J. S. Fox, J. D. Roberts, M. Bennamoun
Automatic Acoustic Identification of Insects Inspired by the Speaker Recognition Paradigm ........................................................................................................................................ 2126
Ilyas Potamitis, Todor Ganchev, Nikos Fakotakis
MULTICHANNEL SPEECH ENHANCEMENT/SPEECH PERCEPTION
Improved Hybrid Microphone Array Post-Filter by Integrating a Robust Speech Absence Probability Estimator for Speech Enhancement ....................................................... 2130
Junfeng Li, Masato Akagi, Yoiti Suzuki
Soft Decision Combining for Dual Channel Noise Reduction.................................................. 2134
Timo Gerkmann, Rainer Martin
An Improved Affine Projection Algorithm Based Crosstalk Resistant Adaptive Noise Canceller ............................................................................................................................. 2138
Guo Chen, Vijay Parsa
An Optimum Microphone Array Post-Filter for Speech Applications ..................................... 2142
Stamatis Leukimmiatis, Dimitrios Dimitriadis, Petros Maragos
Multi-Microphone Periodicity Function for Robust F0 Estimation in Real Noisy and Reverberant Environments................................................................................................... 2146
Federico Flego, Maurizio Omologo
A New Dual-Microphone Speech Enhancement Method for Oriented Noises........................ 2150
H. R. Abutalebi, M. Pourahmadi, M. R. Aghabozorgi
50 Years Late: Repeating Miller-Nicely 1955 .............................................................................. 2154
Andrew Lovitt, Jont B. Allen
New 20-Word Lists for Word Intelligibility Test in Japanese.................................................... 2158
Shuichi Sakamoto, Tadahiro Yoshikawa, Shigeaki Amano, Yoiti Suzuki, Tadahisa Kondo
Sparseness and Speech Perception in Noise ............................................................................ 2162
Guoping Li, Mark E. Lutman
An Assessment of Automatic Speech Recognition as Speech Intelligibility Estimation in the Context of Additive Noise .............................................................................. 2166
Wei M. Liu, John S. D. Mason, Nicholas W. D. Evans, Keith A. Jellyman
Underlying Quality Dimensions of Modern Telephone Connections ...................................... 2170
Marcel Waltermann, Kirstin Scholz, Alexander Raake, Ulrich Heute, Sebastian Moller
An ERB Loudness Pattern Based Objective Speech Quality Measure ................................... 2174
Guo Chen, Vijay Parsa, Susan Scollie
DIARIZATION IN ASR
A Spectral Clustering Approach to Speaker Diarization........................................................... 2178
Huazhong Ning, Ming Liu, Hao Tang, Thomas S. Huang
BINSEG: An Efficient Speaker-Based Segmentation Technique ............................................. 2182
Jindrich Zdansky
Multi-Stream Speaker Diarization Systems for the Meetings Domain..................................... 2186
Ascension Gallardo-Antolin, Xavier Anguera, Chuck Wooters
Improved Performance Evaluation of Speech Event Detectors............................................... 2190
Carla Lopes, Fernando Perdigao
Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features and Inter-Channel Time Differences ........................................................................... 2194
Jose M. Pardo, Xavier Anguera, Chuck Wooters
Low-Complexity and Efficient Classification of Voiced/Unvoiced/Silence for Noisy Environments ................................................................................................................................ 2198
Tuan Van Pham, Gernot Kubin
LANGUAGE MODEL ADAPTATION, REFINEMENT, AND EVALUATION
Unsupervised Language Model Adaptation Based on Automatic Text Collection from WWW ..................................................................................................................................... 2202
Motoyuki Suzuki, Yasutomo Kajiura, Akinori Ito, Shozo Makino
Unsupervised Language Model Adaptation Using Latent Semantic Marginals ..................... 2206
Yik-Cheung Tam, Tanja Schultz
Unsupervised Language Model Adaptation for Mandarin Broadcast Conversation Transcription ................................................................................................................................. 2210
David Mrva, Philip C. Woodland
Language Model Adaptation for Tiny Adaptation Corpora....................................................... 2214
Dietrich Klakow
Pronunciation Dependent Language Models............................................................................. 2218
Andrej Ljolje
Improving Perplexity Measures to Incorporate Acoustic Confusability ................................. 2222
Amit Anil Nanavati, Nitendra Rajput
SPEECH PRODUCTION, PHYSIOLOGY, AND PATHOLOGY II
Voice Source Correlates of Prosodic Features in American English: A Pilot Study ............. 2226
Markus Iseli, Yen-Liang Shue, Melissa A. Epstein, Patricia Keating, Jody Kreiman, Abeer Alwan
On Speech Variation and Word Type Differentiation by Articulatory Feature Representations ............................................................................................................................ 2230
Louis Ten Bosch, R. Harald Baayen, Mirjam Ernestus
A Study of Emotional Speech Articulation Using a Fast Magnetic Resonance Imaging Technique ....................................................................................................................... 2234
Sungbok Lee, Erik Bresch, Jason Adams, Abe Kazemzadeh, Shrikanth Narayanan
Reconstructing Tongue Movements from Audio and Video .................................................... 2238
Hedvig Kjellstrom, Olov Engwall, Olle Balter
New Considerations for Vowel Nasalization Based on Separate Mouth-Nose Recording ...................................................................................................................................... 2242
Gang Feng, Cyril Kotenkoff
An Acoustic and Articulatory Study of Lombard Speech: Global Effects on the Utterance........................................................................................................................................ 2246
Maeva Garnier, Lucie Bailly, Marion Dohen, Pauline Welby, Helene Loevenbruck
VOICE MORPHING
Improving the Performance of HMM-Based Voice Conversion Using Context Clustering Decision Tree and Appropriate Regression Matrix Format ................................... 2250
Long Qin, Yi-Jian Wu, Zhen-Hua Ling, Ren-Hua Wang
Map-Based Adaptation for Speech Conversion Using Adaptation Data Selection and Non-Parallel Training ............................................................................................................ 2254
Chung-Han Lee, Chung-Hsien Wu
Novel Method for Data Clustering and Mode Selection with Application in Voice Conversion .................................................................................................................................... 2258
Jani Nurminen, Jilei Tian, Victor Popa
Text-Independent Cross-Language Voice Conversion ............................................................. 2262
David Sundermann, Harald Hoge, Antonio Bonafonte, Hermann Ney, Julia Hirschberg
Maximum Likelihood Voice Conversion Based on GMM with STRAIGHT Mixed Excitation ....................................................................................................................................... 2266
Yamato Ohtani, Tomoki Toda, Hiroshi Saruwatari, Kiyohiro Shikano
Improving Body Transmitted Unvoiced Speech with Statistical Voice Conversion .............. 2270
Mikihiro Nakagiri, Tomoki Toda, Hideki Kashioka, Kiyohiro Shikano
An HMM-Based Singing Voice Synthesis System..................................................................... 2274
Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Voice Conversion Based on Mixtures of Factor Analyzers ...................................................... 2278
Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda
Efficient Gaussian Mixture Model Evaluation in Voice Conversion ........................................ 2282
Jilei Tian, Jani Nurminen, Victor Popa
Constrained Structural Maximum a posteriori Linear Regression for Average-Voice-Based Speech Synthesis................................................................................................... 2286
Yuji Nakano, Makoto Tachibana, Junichi Yamagishi, Takao Kobayashi
Frequency Warping Based on Mapping Formant Parameters ................................................. 2290
Zhi-Wei Shuang, Raimo Bakis, Slava Shechtman, Dan Chazan, Yong Qin
Automatic Phonetic Segmentation by Using a SPM-Based Approach for a Mandarin Singing Voice Corpus.................................................................................................. 2294
Cheng-Yuan Lin, J.-S. Roger Jang
A Comparison of Singing Evaluation Algorithms ..................................................................... 2298
Partha Lal
ASR OTHER II
Improving Speech Recognition of Two Simultaneous Speech Signals by Integrating ICA BSS and Automatic Missing Feature Mask Generation ................................. 2302
Ryu Takeda, Shun'Ichi Yamamoto, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno
Missing-Feature Reconstruction for Band-Limited Speech Recognition in Spoken Document Retrieval ...................................................................................................................... 2306
Wooil Kim, John H. L. Hansen
Incremental Learning of MAP Context-Dependent Edit Operations for Spoken Phone Number Recognition in an Embedded Platform ............................................................ 2310
Hahn Koo, Yan Ming Cheng
Development and Evaluation of Speech Database in Automotive Environments for Practical Speech Recognition Systems................................................................................ 2314
Yasunari Obuchi, Nobuo Hataoka
An Effective and Efficient Utterance Verification Technology Using Word N-Gram Filler Models .................................................................................................................................. 2318
Dong Yu, Yun-Cheng Ju, Alex Acero
An Efficient Bispectrum Phase Entropy-Based Algorithm for VAD ........................................ 2322
J. M. Gorriz, Javier Ramirez, C. G. Puntonet, Jose C. Segura
Two-Step Unsupervised Speaker Adaptation Based on Speaker and Gender Recognition and HMM Combination ........................................................................................... 2326
Petr Cerva, Jan Nouza, Jan Silovsky
CENSREC2: Corpus and Evaluation Environments for In Car Continuous Digit Speech Recognition...................................................................................................................... 2330
Satoshi Nakamura, Masakiyo Fujimoto, Kazuya Takeda
Detection of Word Fragments in Mandarin Telephone Conversation ..................................... 2334
Cheng-Tao Chu, Yun-Hsuan Sung, Yuan Zhao, Daniel Jurafsky
A DTW-Based Dissimilarity Measure for Left-to-Right Hidden Markov Models and Its Application to Word Confusability Analysis ......................................................................... 2338
Qiang Huo, Wei Li
Multi-Flow Block Interleaving Applied to Distributed Speech Recognition Over IP Networks ........................................................................................................................................ 2342
Angel M. Gomez, Juan J. Ramos-Munoz, Antonio M. Peinado, Victoria Sanchez
Moving Speech Recognition from Software to Silicon: The In Silico Vox Project................. 2346
Edward C. Lin, Kai Yu, Rob A. Rutenbar, Tsuhan Chen
A Study on Detection Based Automatic Speech Recognition ................................................. 2350
Chengyuan Ma, Yu Tsao, Chin-Hui Lee
Novel Time Domain Multi-Class SVMs for Landmark Detection .............................................. 2354
Rahul Chitturi, Mark Hasegawa Johnson
PROSODY
Towards Automatic Parameter Extraction of Command-Response Model for Cantonese ...................................................................................................................................... 2358
Raymond W. M. Ng, Tan Lee, Wentao Gu
A Model for the f0 Reset in Corpus-Based Intonation Approaches ........................................ 2362
Francisco Campillo, Jan P. H. Van Santen, Eduardo R. Banga
Generating German Intonation with a Trainable Prosodic Model............................................ 2366
Gerard Bailly, Jan Gorisch
Incorporating Second-Order Information into Two-Step Major Phrase Break Prediction for Korean ................................................................................................................... 2370
Seungwon Kim, Jinsik Lee, Byeongchang Kim, Gary Geunbae Lee
Totally Data-Driven Duration Modeling Based on Generalized Linear Model for Mandarin TTS ................................................................................................................................ 2374
Lifu Yi, Jian Li, Xiaoyan Lou, Jie Hao
Segmental Duration Modeling in Turkish ................................................................................... 2378
Ozlem Ozturk, Tolga Ciloglu
Lexical Stress in Continuous Speech Recognition ................................................................... 2382
Rogier C. Van Dalen, Pascal Wiggers, Leon J. M. Rothkrantz
Improving Tone Recognition with Combined Frequency and Amplitude Modelling ............. 2386
Siwei Wang, Gina-Anne Levow
Latent Prosodic Modeling (LPM) for Speech with Applications in Recognizing Spontaneous Mandarin Speech with Disfluencies.................................................................... 2390
Che-Kuang Lin, Lin-Shan Lee
Tone Recognition of Continuous Speech of Standard Chinese Using Neural Network and Tone Nucleus Model .............................................................................................. 2394
Keikichi Hirose, Hui Hu, Xiaodong Wang, Nobuaki Minematsu
Prosodic Feature Generation for Back-Channel Prediction ..................................................... 2398
Thamar Solorio, Olac Fuentes, Nigel G. Ward, Yaffa Al Bayyari
On the Sufficiency and Redundancy of Pitch for TRP Projection ........................................... 2402
Wieneke Wesseling, R. J. J. H. Van Son, Louis C. W. Pols
DISCRIMINATIVE TRAINING
Hypothesis Spaces for Minimum Bayes Risk Training in Large Vocabulary Speech Recognition...................................................................................................................... 2406
Matthew Gibson, Thomas Hain
Minimum Divergence Based Discriminative Training ............................................................... 2410
Jun Du, Peng Liu, Frank K. Soong, Jian-Lai Zhou, Ren-Hua Wang
Solving Large Margin Estimation of HMMs via Semidefinite Programming .......................... 2414
Xinwei Li, Hui Jiang
Use of Incrementally Regulated Discriminative Margins in MCE Training for Speech Recognition...................................................................................................................... 2418
Dong Yu, Li Deng, Xiaodong He, Alex Acero
Soft Margin Estimation of Hidden Markov Model Parameters ................................................. 2422
Jinyu Li, Ming Yuan, Chin-Hui Lee
Discriminative Models for Spoken Language Understanding.................................................. 2426
Ye-Yi Wang, Alex Acero
SPEECH SYNTHESIS
Evaluating a Virtual Speech Cuer................................................................................................ 2430
G. Gibert, Gerard Bailly, F. Elisei
Intelligibility of Machine Translation Output in Speech Synthesis.......................................... 2434
Laura Mayfield Tomokiyo, Kay Peterson, Alan W. Black, Kevin A. Lenzo
A Technique for Controlling Voice Quality of Synthetic Speech Using Multiple Regression HSMM......................................................................................................................... 2438
Makoto Tachibana, Takashi Nose, Junichi Yamagishi, Takao Kobayashi
Learning from Errors in Grapheme-to-Phoneme Conversion .................................................. 2442
Tatyana Polyakova, Antonio Bonafonte
Eigenvoice Conversion Based on Gaussian Mixture Model .................................................... 2446
Tomoki Toda, Yamato Ohtani, Kiyohiro Shikano
Generating Time-Constrained Audio Presentations of Structured Information..................... 2450
Brian Langner, Rohit Kumar, Arthur Chan, Lingyun Gu, Alan W. Black
MULTIMODAL PROCESSING
Multimodal Authentication Using Qualitative Support Vector Machines................................ 2454
F. Alsaade, A. Ariyaeeinia, L. Meng, A. Malegaonkar
Adaptive Multimodal Fusion by Uncertainty Compensation.................................................... 2458
Vassilis Pitsikalis, Athanassios Katsamanis, George Papandreou, Petros Maragos
Effects of Familiarity with Faces and Voices on Second-Language Speech Processing: Components of Memory Traces............................................................................. 2462
Debra M. Hardison
Automatic Metadata Generation and Video Editing Based on Speech and Image Recognition for Medical Education Contents ............................................................................ 2466
Satoshi Tamura, Koji Hashimoto, Jiong Zhu, Satoru Hayamizu, Hirotsugu Asai, Hideki Tanahashi, Makoto Kanagawa
Analysis of Correlation Between Audio and Visual Speech Features for Clean Audio Feature Prediction in Noise .............................................................................................. 2470
Ibrahim Almajai, Ben Milner, Jonathan Darch
TDA: A New Trainable Trajectory Formation System for Facial Animation............................ 2474
Oxana Govokhina, Gerard Bailly, Gaspard Breton, Paul Bagshaw
SPEECH ANALYSIS
Modeling of Speech Signals Based on Bessel-Like Orthogonal Transform .......................... 2478
Giorgio Biagetti, Paolo Crippa, Claudio Turchetti
Glottal Closure and Opening Detection for Flexible Parametric Voice Coding...................... 2482
Pamornpol Jinachitra
Independent Components for Acoustic Modeling..................................................................... 2486
Jan Trmal, Jan Vanek, Ludek Muller, Jan Zelinka
Pitch-Scale Modification Using the Modulated Aspiration Noise Source............................... 2490
Daryush Mehta, Thomas F. Quatieri
Max-Gabor Analysis and Synthesis of Spectrograms .............................................................. 2494
Tony Ezzat, Jake Bouvrie, Tomaso Poggio
Monitoring of the Natural Voice Variations in Open and Closed Phases with Frequency Warped ARMA Modeling ........................................................................................... 2498
Pedro J. Quintana-Morales, Juan L. Navarro-Mesa, Antonio G. Ravelo-Garcia, Fernando D. Lorenzo-Garcia
Speech Analyzer Using a Joint Estimation Model of Spectral Envelope and Fine Structure ........................................................................................................................................ 2502
Hirokazu Kameoka, Jonathan Le Roux, Nobutaka Ono, Shigeki Sagayama
An Investigation of Manifold Learning for Speech Analysis .................................................... 2506
Andrew Errity, John McKenna
An Incremental Algorithm for Signal Reconstruction from Short-Time Fourier Transform Magnitude ................................................................................................................... 2510
Jake Bouvrie, Tony Ezzat
Automatic Assignment of Anchoring Points on Vowel Templates for Defining Correspondence Between Time-Frequency Representations of Speech Samples ............... 2514
Toru Takahashi, Masashi Nishi, Toshio Irino, Hideki Kawahara
Nonlinear Dynamical Invariants for Speech Recognition......................................................... 2518
S. Prasad, S. Srinivasan, M. Pannuri, G. Lazarou, Joseph Picone
ADVANCES IN NOISY ASR
Exploiting Polynomial-Fit Histogram Equalization and Temporal Average for Robust Speech Recognition ........................................................................................................ 2522
Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen
Missing Data Mask Models with Global Frequency and Temporal Constraints..................... 2526
Sebastien Demange, Christophe Cerisara, Jean-Paul Haton
Multi-Stream ASR: An Oracle Perspective ................................................................................. 2530
Hemant Misra, Jithendra Vepa, Herve Bourlard
A Weight Estimation Method Using LDA for Multi-Band Speech Recognition ...................... 2534
Koji Iwano, Kaname Kojima, Sadaoki Furui
Powered Cepstral Normalization (P-CN) for Robust Features in Speech Recognition ................................................................................................................................... 2538
Chang-Wen Hsu, Lin-Shan Lee
Robust Automatic Speech Recognition for Accented Mandarin in Car Environments ................................................................................................................................ 2542
Pei Ding, Lei He, Xiang Yan, Jie Hao
A Robust Feature Extraction Based on the MTF Concept for Speech Recognition in Reverberant Environment ........................................................................................................ 2546
Xugang Lu, Masashi Unoki, Masato Akagi
Clean Speech Feature Estimation Based on Soft Spectral Masking ....................................... 2550
Young Joon Kim, Woohyung Lim, Nam Soo Kim
Robust Speech Recognition by Modifying Clean and Telephone Feature Vectors Using Bidirectional Neural Network............................................................................................ 2554
Mansoor Vali, Seyyed Ali Seyyed Salehi, Kazem Karimi
Silence Energy Normalization for Robust Speech Recognition in Additive Noise Environment .................................................................................................................................. 2558
Chung-Fu Tai, Jeih-Weih Hung
Handling Convolutional Noise in Missing Data Automatic Speech Recognition................... 2562
Maarten Van Segbroeck, Hugo Van Hamme
Noisy Speech Recognition Based on Selection of Multiple Noise Suppression Methods Using Noise GMMs........................................................................................................ 2566
Norihide Kitaoka, Souta Hamaguchi, Seiichi Nakagawa
Using Posterior-Based Features in Template Matching for Speech Recognition ................. 2570
Guillermo Aradilla, Jithendra Vepa, Herve Bourlard
Hypothesis-Based Feature Combination of Multiple Speech Inputs for Robust Speech Recognition in Automotive Environments ................................................................... 2574
Yasunari Obuchi, Nobuo Hataoka
SOURCE SEPARATION AND LOCALIZATION
Continuous Time-Frequency Masking Method for Blind Speech Separation with Adaptive Choice of Threshold Parameter Using ICA ................................................................ 2578
Zbynek Koldovsky, Jan Nouza, Jan Kolorenc
Multistage Convolutive Blind Source Separation for Speech Mixture .................................... 2582
Yanxue Liang, Ichiro Hagiwara
Detection and Separation of Speech Events in Meeting Recordings...................................... 2586
Futoshi Asano, Jun Ogata
Audio Person Tracking in a Smart-Room Environment............................................................ 2590
Alberto Abad, Carlos Segura, Dusan Macho, Javier Hernando, Climent Nadeu
Tracking and Beamforming for Multiple Simultaneous Speakers with Probabilistic Data Association Filters ............................................................................................................... 2594
Tobias Gehrig, Ulrich Klee, John W. McDonough, Shajith Ikbal, Matthias Wolfel, Christian Fugen
Modeling the Precedence Effect for Binaural Sound Source Localization in Noisy and Echoic Environments ............................................................................................................ 2598
Martin Heckmann, Tobias Rodemann, Bjorn Scholling, Frank Joublin, Christian Goerick
Using a Differential Microphone Array to Estimate the Direction of Arrival of Two Acoustic Sources.......................................................................................................................... 2602
Fotios Talantzis, Anthony G. Constantinides, Lazaros C. Polymenakos
Speaker Localization Based on Oriented Global Coherence Field.......................................... 2606
Alessio Brutti, Maurizio Omologo, Piergiorgio Svaizer
Performance Evaluation of Three Features for Model-Based Single Channel Speech Separation Problem ........................................................................................................ 2610
M. H. Radfar, R. M. Dansereau, A. Sayadiyan
Single-Channel Speech Separation Using Sparse Non-Negative Matrix Factorization .................................................................................................................................. 2614
Mikkel N. Schmidt, Rasmus K. Olsson
Adaptive Speech Enhancement for Speech Separation in Diffuse Noise............................... 2618
Rong Hu, Yunxin Zhao
A Probabilistic Graphical Model for Microphone Array Source Separation Using Rich Pre-Trained Source Models................................................................................................. 2622
H. T. Attias
Geometrically Constrained Permutation-Free Source Separation in an Undercomplete Speech Unmixing Scenario .............................................................................. 2626
Erik Visser
Highly Directional Multi-Beam Audio Loudspeaker .................................................................. 2630
Dirk Olszewski, Klaus Linhard
Author Index