Thursday, September 30, 2010

[tcwycfav] Textual representation of speech recognition

Consider a one-way communication channel where the sender can only speak and the receiver can only read text (or view some other visual representation of the speech).

Asking the speaker to speak naturally on one end and expecting perfectly transcribed text on the other is asking too much of automatic speech recognition.

First idea: transmit the ambiguity and let the human receiver resolve it. Transcribe to IPA, or display a spectrogram.

Second idea: require the speaker to use a modified language that is easier for speech recognition. We should be able to do better than the "baseline" of spelling everything with the NATO phonetic alphabet, or of speaking in Morse code.
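As a rough illustration of the NATO-alphabet baseline, here is a minimal Python sketch. It assumes the speaker spells each word letter by letter and says a separator word, "break", between words; that separator and the `spell`/`unspell` helper names are my own hypothetical choices, not part of any standard.

```python
# Hypothetical helpers illustrating the NATO-alphabet "baseline".
# Each letter becomes a long, acoustically distinct code word, so the
# recognizer only has to choose among a tiny closed vocabulary.
NATO = {
    "A": "Alfa", "B": "Bravo", "C": "Charlie", "D": "Delta", "E": "Echo",
    "F": "Foxtrot", "G": "Golf", "H": "Hotel", "I": "India", "J": "Juliett",
    "K": "Kilo", "L": "Lima", "M": "Mike", "N": "November", "O": "Oscar",
    "P": "Papa", "Q": "Quebec", "R": "Romeo", "S": "Sierra", "T": "Tango",
    "U": "Uniform", "V": "Victor", "W": "Whiskey", "X": "X-ray",
    "Y": "Yankee", "Z": "Zulu",
}
# Reverse lookup for the receiving side (case-insensitive).
FROM_NATO = {word.lower(): letter for letter, word in NATO.items()}

# Assumption: the speaker says this (non-NATO) word between words.
WORD_BREAK = "break"


def spell(text: str) -> list[str]:
    """Turn text into the sequence of code words the speaker would say."""
    spoken = []
    for i, word in enumerate(text.split()):
        if i > 0:
            spoken.append(WORD_BREAK)
        # Characters outside A-Z (digits, punctuation) are simply dropped.
        spoken.extend(NATO[ch] for ch in word.upper() if ch in NATO)
    return spoken


def unspell(spoken: list[str]) -> str:
    """Recover lowercase text from recognized code words; unknown words become '?'."""
    words, current = [], []
    for token in spoken:
        t = token.lower()
        if t == WORD_BREAK:
            words.append("".join(current))
            current = []
        else:
            current.append(FROM_NATO.get(t, "?"))
    words.append("".join(current))
    return " ".join(words).lower()


print(spell("hi there"))
# → ['Hotel', 'India', 'break', 'Tango', 'Hotel', 'Echo', 'Romeo', 'Echo']
print(unspell(spell("hi there")))  # → hi there
```

The receiver's decoder only has to distinguish 26 code words plus the separator, which is the whole appeal; the cost is that spelling is far slower than ordinary speech, which is why it is only a baseline to beat.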

The winning solution will probably be a combination of both.

There are humans at both ends, whose brains are adept at learning new languages and doing powerful context-sensitive error correction.
