TravellingWave Technology
 

Background

Facilitating text input into computers and handheld devices is a work in progress. Well-known solutions include: (a) mobile triple-tapping, (b) ambiguous/unambiguous text prediction (e.g., T9, SureType), (c) mini-QWERTY keyboards (BlackBerry), (d) on-screen soft-key displays (touch-screen devices), (e) handwriting recognition, and (f) optical character recognition.

In theory, "Speech-to-Text" would be a natural alternative: one could simply speak into a computer or device and have the text appear on the screen. The speech-to-text problem, often called "the holy grail of computing", has historically been plagued by problems including: (a) infinite language perplexity, (b) background and channel noise, (c) varied pronunciations, (d) unacceptable speaker-training methods, and (e) lack of intuitive error correction. In practice, commercial speech recognition has succeeded only in limited command-and-control applications, such as call-center automation, where the lexicon is small.

Voice Powered Text Prediction™ by TravellingWave

TravellingWave has taken an innovative, patent-pending approach that uses voice recognition to enhance text prediction. Specifically, Voice Powered Text Prediction technology predicts words using the acoustics (speech) produced by a user in addition to the letters the user types; traditional predictive text input systems rely only on letters. The result is faster and simpler prediction of text. The technology builds on top of existing text input methods. Thus, in situations where speaking is undesirable, Voice Powered Text Prediction simply becomes standard text prediction. That is, the worst case becomes today's best case. The overall system uses the following modules:

VoicePredict Multimodal Platform (patent-pending) fuses hand-, finger-, or stylus-based inputs with the microphone's speech input, in real time, to produce voice powered text prediction. Additionally, if a user decides not to speak, or if background noise conditions are not suitable for optimum speech recognition, the system automatically falls back to plain text prediction using keypad inputs, yielding a speech interface that is close to 100% reliable.

Frequency Localized Temporal Speech Processing (proprietary) is based on the company's RAGs (Rao-Aronov-Garafutdinov Speech-Processing) algorithm, which in turn is based on published research (1-3) on compact features modeling the travelling wave phenomena in the human cochlea. This module extracts modulation information from speech, as opposed to traditional power spectrum analysis. This enables the VoicePredict system to function robustly in noisy environments.

Acoustic and Language Modeling (patent-pending) techniques exploit the multimodal user interface and are optimized for the mobile text input problem.
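
To make the idea of fusing typed letters with speech more concrete, the following is a minimal illustrative sketch in Python. It is not TravellingWave's implementation. The toy lexicon, the `predict` function, the `acoustic_scores` input, and the log-score combination are all assumptions made for this example. Words matching the typed prefix are ranked by word frequency combined with a speech likelihood when one is available; with no speech input, the ranking reduces to plain text prediction.

```python
from math import log

# Hypothetical toy lexicon: word -> relative frequency.
# This stands in for a real language model and is assumed for illustration.
LEXICON = {
    "meet": 0.30,
    "meeting": 0.25,
    "me": 0.20,
    "method": 0.15,
    "melt": 0.10,
}


def predict(prefix, acoustic_scores=None, top_n=3):
    """Rank lexicon words that start with the typed prefix.

    acoustic_scores is an optional dict mapping word -> likelihood in (0, 1],
    as a speech recognizer might produce (hypothetical input). When it is
    None or empty, ranking uses word frequency alone, i.e. the function
    falls back to plain text prediction.
    """
    candidates = [w for w in LEXICON if w.startswith(prefix)]
    floor = 1e-6  # likelihood assumed for words the recognizer did not score

    def score(word):
        s = log(LEXICON[word])  # language-model evidence
        if acoustic_scores:
            s += log(acoustic_scores.get(word, floor))  # speech evidence
        return s

    return sorted(candidates, key=score, reverse=True)[:top_n]


# Typing "me" without speaking: ranked by frequency only.
print(predict("me"))  # ['meet', 'meeting', 'me']

# Typing "me" while saying "method": speech evidence reorders the list.
print(predict("me", {"method": 0.8, "melt": 0.1, "meet": 0.05}))
# ['method', 'meet', 'melt']
```

In this sketch the typed letters narrow the candidate set and the speech scores only reorder it, so a missing or unusable speech signal never produces a worse list than letters alone.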



REFERENCES:

(1) Research supported by the National Science Foundation under Small Business Innovation Research Phase-I, Phase-IB, and Phase-II grants; currently active
(2) "On Decomposing Speech into Modulated Components", Ashwin Rao and Ramdas Kumaresan, IEEE Transactions on Speech and Audio Processing, May 2000
(3) "Model Based Approach to Envelope and Positive Instantaneous Frequency Estimation of Signals", Ramdas Kumaresan and Ashwin Rao, Journal of the Acoustical Society of America, March 1999

 
© 2009 - TravellingWave Inc. All rights reserved.