Background
Speech-to-text seems like a natural alternative: one could simply speak into a device and have the text appear on the screen. Many attempts have been made to solve the speech-to-text problem, which has been called “the holy grail of computing”. Unfortunately, so-called "dictation" approaches have had limited success because of (a) the effectively infinite language perplexity of the problem, (b) background and channel noise, (c) varied pronunciations, (d) unacceptable speaker-training requirements, and (e) the lack of intuitive error correction.
Predictive Speech-to-Text™ by TravellingWave
• VoicePredict Multimodal Platform (patent-pending) handles very large vocabularies and uses information from one mode (for instance, letters typed on a keypad) to improve recognition of spoken words. The system combines information from multiple modes into a simple, fast input method.
• Frequency Localized Temporal Speech Processing (proprietary) is based on the company's RAGs (Rao-Aronov-Garafutdinov Speech-Processing) algorithm, which in turn builds on published research (1-3) on compact features that model the travelling-wave phenomenon in the human cochlea. This module extracts modulation information from speech, rather than relying on traditional power-spectrum analysis. This enables the VoicePredict system to function robustly in noisy environments.
• VoicePredict Hands-Free-Eyes-Free Platform (patent-pending) combines our unique Alphabet speech recognition with Predictive Speech-to-Text and Text-to-Speech feedback. TravellingWave believes this to be the world's first and only hands-free and/or eyes-free text input method with near-100% task-completion accuracy.
• Spellation (patent-pending) technology is based on the philosophy of extracting redundant acoustic information: it requires users to speak and spell as they dictate. This constraint seems minimal, especially in the absence of a full-size keyboard, and the approach yields significant improvements in recognition accuracy compared to standard dictation approaches.
• Acoustic and Language Modeling (proprietary): unlike standard dictated documents (letters, memos, and medical/legal reports), text messages (SMS, email, IM) tend to be very short. An SMS message, for instance, is limited to 140-160 characters. Our techniques may be viewed as an optimization of standard modeling techniques for the messaging application.
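The multimodal idea in the first item, using keypad input to narrow down what the recognizer could have heard, can be illustrated with a small sketch. It is hypothetical and is not TravellingWave's VoicePredict method, whose details are not given here. It assumes the recognizer returns an N-best list of (word, log-score) pairs and that the user types on a standard phone keypad; the names `fuse` and `key_sequence` and the example scores are invented for illustration.

```python
"""Hypothetical sketch: fuse phone-keypad key presses with a speech
recognizer's N-best list. Not TravellingWave's VoicePredict algorithm."""

# Standard ITU-T E.161 letter assignments on a phone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}

# Reverse map: letter -> keypad digit.
LETTER_TO_KEY = {ch: key for key, letters in KEYPAD.items() for ch in letters}


def key_sequence(word: str) -> str:
    """Return the digits a user would press to type `word` (non-letters are skipped)."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower() if ch in LETTER_TO_KEY)


def fuse(nbest: list[tuple[str, float]], keys: str) -> list[tuple[str, float]]:
    """Keep only recognizer hypotheses consistent with the keys typed so far.

    nbest: (word, log_score) pairs from a speech recognizer (assumed format).
    keys:  digits pressed so far, e.g. "4" for a word starting with g, h, or i.
    Returns the consistent hypotheses, highest score first.
    """
    consistent = [(w, s) for w, s in nbest if key_sequence(w).startswith(keys)]
    return sorted(consistent, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Acoustically, "yellow" scores best, but a single "4" key press rules it out.
    nbest = [("hello", -4.1), ("yellow", -3.9), ("jello", -4.5), ("fellow", -5.0)]
    print(fuse(nbest, "4"))  # [('hello', -4.1)]
```

In this sketch a single ambiguous key press is enough to resolve acoustically confusable words, which is one plausible reading of how typed letters could "enhance speech recognition of spoken words"; a real system would more likely combine the two sources as weighted scores than use a hard filter.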
REFERENCES:
(1) Research supported by the National Science Foundation under Small Business Innovation Research (SBIR) Phase I, Phase IB, and Phase II grants; currently active.
(2) Ashwin Rao and Ramdas Kumaresan, "On Decomposing Speech into Modulated Components," IEEE Transactions on Speech and Audio Processing, May 2000.
(3) Ramdas Kumaresan and Ashwin Rao, "Model Based Approach to Envelope and Positive Instantaneous Frequency Estimation of Signals," Journal of the Acoustical Society of America, March 1999.
© 2008 - TravellingWave Inc. All rights reserved.