
Multimodal Technology underlying VoicePredict

TravellingWave has taken an innovative approach that combines redundant information from two input modes, the keyboard and the microphone, to significantly improve the accuracy of both voice recognition and text prediction. Specifically, the technology underlying VoicePredict predicts words using the user's speech in addition to the letters the user types. By contrast, traditional predictive text input systems rely only on letters, and speech-to-text systems rely only on speech.
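The idea of combining the two modes can be illustrated with a minimal sketch. This is not TravellingWave's actual algorithm: the function name `rank_candidates`, the example words, and the acoustic log-scores are all hypothetical, and a real recognizer would compute the scores from the speech signal. The sketch simply assumes that the typed letters narrow the candidate list and the speech ranks what remains.

```python
def rank_candidates(typed_prefix, acoustic_log_scores):
    """Rank words that match the typed letters by their acoustic score.

    acoustic_log_scores maps each word to a hypothetical log-likelihood
    of the spoken audio given that word (higher means a better match).
    """
    prefix = typed_prefix.lower()
    # Keyboard mode: keep only words consistent with the typed letters.
    matches = [
        (word, score)
        for word, score in acoustic_log_scores.items()
        if word.lower().startswith(prefix)
    ]
    # Speech mode: order the surviving words by acoustic score, best first.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in matches]


# Made-up scores for four acoustically similar words.
scores = {"meeting": -2.0, "meaning": -2.3, "eating": -2.1, "melting": -4.0}
ranked = rank_candidates("me", scores)
```

In this example, speech alone would rank "eating" above "meaning", while letters alone could not separate "meeting", "meaning" and "melting". Using both removes "eating" and still orders the remaining words.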

Acoustic Modeling

VoicePredict combines traditional acoustic modeling techniques, which represent phonemes with statistical models, with acoustic-phonetic modeling. In VoicePredict, the latter incorporates spectrally localized temporal features in conjunction with features such as phonetic durations, syllable boundaries and formant energies.

Language Modeling

VoicePredict adapts its language model based on the frequency of word usage. New words that are not in the large built-in dictionary (tens of thousands of words) are learned on the fly. VoicePredict currently employs unigram language models, meaning that it does not rely on sentence context. The unigram modeling techniques make use of VoicePredict's inherent multimodality, resulting in an extremely robust language model.
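A frequency-adapted unigram model of the kind described above can be sketched as follows. This is an illustrative assumption, not VoicePredict's implementation: the class `AdaptiveUnigramModel` and its methods are hypothetical names, and the tiny starting dictionary stands in for the much larger built-in one.

```python
from collections import Counter


class AdaptiveUnigramModel:
    """Hypothetical unigram model: each word is scored by its own usage
    count only, with no dependence on neighbouring words."""

    def __init__(self, initial_counts):
        # Seed the model with dictionary words and their starting counts.
        self.counts = Counter({w.lower(): c for w, c in initial_counts.items()})

    def observe(self, word):
        # Adapt to usage; a word not yet in the dictionary is added here.
        self.counts[word.lower()] += 1

    def probability(self, word):
        # Relative frequency of the word among all observed usage.
        total = sum(self.counts.values())
        return self.counts[word.lower()] / total if total else 0.0

    def predict(self, prefix, k=3):
        # Return up to k words starting with the prefix, most frequent
        # first; ties are broken alphabetically.
        prefix = prefix.lower()
        matches = [w for w in self.counts if w.startswith(prefix)]
        return sorted(matches, key=lambda w: (-self.counts[w], w))[:k]


model = AdaptiveUnigramModel({"the": 5, "then": 2})
for _ in range(3):
    model.observe("thermos")  # new word, learned on the fly
suggestions = model.predict("the")
```

After the user enters "thermos" three times, it overtakes "then" in the suggestions for the prefix "the", which shows how usage frequency alone reshapes the predictions.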


Copyright TravellingWave Inc 2010. All Rights Reserved.
