Multimodal Technology underlying VoicePredict
TravellingWave has taken an innovative approach that combines redundant information from multiple modes, namely the keyboard and
the microphone, to significantly improve the accuracy of both voice recognition and text prediction. Specifically, the technology underlying VoicePredict predicts words from the speech a user produces in addition to the letters
the user types, whereas traditional predictive text input systems rely on letters alone and speech-to-text systems rely on speech alone.
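To illustrate why combining the two modes helps, here is a minimal sketch, not TravellingWave's actual method, of one common way to fuse them. It assumes the typed letters act as a prefix constraint on the vocabulary and that each surviving word is scored by adding its log unigram probability to an acoustic log-likelihood reported by a speech recognizer. The function name `predict_word`, the `lm_weight` parameter and all probabilities and scores below are hypothetical.

```python
import math


def predict_word(typed, lm_prob, acoustic_loglik, lm_weight=1.0):
    """Rank dictionary words using typed letters and speech together.

    typed: letters typed so far, treated as a prefix constraint.
    lm_prob: hypothetical dict word -> unigram probability (must be > 0).
    acoustic_loglik: hypothetical dict word -> log-likelihood of the spoken
        audio given that word, as a speech recognizer might report it.
    lm_weight: hypothetical weight balancing text evidence against speech.
    """
    # Keyboard evidence: only words consistent with the typed letters survive.
    candidates = [w for w in lm_prob if w.startswith(typed)]

    # Speech evidence: score the survivors; words with no acoustic score get -inf.
    def score(w):
        return lm_weight * math.log(lm_prob[w]) + acoustic_loglik.get(w, -math.inf)

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    lm = {"the": 0.5, "that": 0.2, "this": 0.2, "dog": 0.1}
    ac = {"the": -12.0, "that": -4.0, "this": -9.0, "dog": -3.0}
    print(predict_word("th", lm, ac))  # ['that', 'this', 'the']
```

In this toy data, letters alone would favor the most frequent word "the", and speech alone would favor "dog"; the combination picks "that", because each mode rules out errors the other would make.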
Acoustic Modeling
VoicePredict combines traditional acoustic modeling techniques (statistical models of phonemes) with
acoustic-phonetic modeling. In VoicePredict, the latter uses spectrally localized temporal features in conjunction
with features such as phonetic durations, syllable boundaries and formant energies.
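The document does not specify how VoicePredict computes its spectrally localized temporal features. As one hedged illustration of the general idea, the sketch below tracks the energy at a single frequency frame by frame using the Goertzel algorithm, which evaluates one DFT bin cheaply. Applied at estimated formant frequencies, the same function would yield a crude formant-energy track. The 8 kHz sample rate, 25 ms frames (200 samples) and 10 ms hop (80 samples) are assumptions, as are the function names.

```python
import math


def goertzel_power(frame, freq_hz, sample_rate):
    """Power of `frame` at the DFT bin nearest `freq_hz` (Goertzel algorithm)."""
    n = len(frame)
    k = round(n * freq_hz / sample_rate)  # nearest DFT bin to the target frequency
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Equals |X[k]|^2 for the frame's DFT.
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def band_energy_track(signal, freq_hz, sample_rate, frame_len=200, hop=80):
    """Energy at one frequency, frame by frame: a spectrally localized temporal feature."""
    return [
        goertzel_power(signal[start:start + frame_len], freq_hz, sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(800)]
    print(band_energy_track(tone, 1000, fs))  # large values: energy present at 1 kHz
    print(band_energy_track(tone, 3000, fs))  # near-zero values: no energy at 3 kHz
```

A feature vector built from several such tracks, sampled at different frequencies, captures how energy in localized spectral regions evolves over time rather than a single whole-spectrum snapshot per frame.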
Language Modeling
VoicePredict adapts its language model according to how frequently words are used. New words that are not in the large built-in
dictionary (tens of thousands of words) are learnt on the fly. Currently, VoicePredict employs unigram language models,
meaning that it does not rely on sentence context. The unigram modeling techniques exploit VoicePredict's inherent multimodality, resulting in an extremely robust language model.
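The adaptive behaviour described above can be sketched as a count-based unigram model. This is a minimal sketch under stated assumptions, not VoicePredict's implementation: the class name `AdaptiveUnigramLM`, the seed counts standing in for the built-in dictionary, the unsmoothed relative-frequency probabilities and the alphabetical tie-break are all hypothetical choices.

```python
from collections import Counter


class AdaptiveUnigramLM:
    """Unigram model whose counts grow with use; unseen words are learnt on the fly."""

    def __init__(self, dictionary_counts):
        # Hypothetical seed counts standing in for the built-in dictionary.
        self.counts = Counter(dictionary_counts)

    def prob(self, word):
        # Relative frequency, no smoothing (an assumption for brevity).
        total = sum(self.counts.values())
        return self.counts[word] / total if total else 0.0

    def observe(self, word):
        # Each accepted word bumps its count; a new word enters with count 1.
        self.counts[word] += 1

    def complete(self, prefix, n=3):
        # No sentence context: ranking depends only on the prefix and word frequency.
        matches = [(w, c) for w, c in self.counts.items() if w.startswith(prefix)]
        matches.sort(key=lambda wc: (-wc[1], wc[0]))  # most frequent first, then A-Z
        return [w for w, _ in matches[:n]]


if __name__ == "__main__":
    lm = AdaptiveUnigramLM({"travel": 5, "train": 3, "trade": 2})
    print(lm.complete("tra"))  # ['travel', 'train', 'trade']
    lm.observe("trademark")    # a word outside the dictionary is learnt
    for _ in range(4):
        lm.observe("train")    # frequent use promotes "train"
    print(lm.complete("tra"))  # ['train', 'travel', 'trade']
```

Because a unigram model ignores the preceding words, it needs no sentence history and can be updated with a single counter increment per word.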
Copyright TravellingWave Inc 2010. All Rights Reserved.