The need for language aids is pervasive in today's world. Millions of individuals with language and speech challenges require additional support for language understanding and learning. Currently, however, these needs are not being met because there are not enough skilled teachers, interpreters, and professionals to give them the one-on-one attention that they need. Lipreading (more accurately called speechreading, because it involves more than just the lips) allows deaf and hard-of-hearing individuals to perceive and understand oral language and even to speak. Speechreading seldom disambiguates all of the spoken input, however, and other techniques have been used to provide a richer input. The proposed activity will develop a real-time system to automatically detect robust characteristics of auditory speech and to transform these continuous acoustic features into continuous supplementary visible features. This information, combined with watching the speaker's face, provides enough information for a person with limited hearing to perceive and understand what is being said. This new technology will allow the design of a wearable computing device that displays these supplementary visible features on a pair of eyeglasses. The system does not require any learning on the part of the talker. It is perceptually and linguistically motivated because it is based directly on acoustic and phonetic properties of speech and gives continuous rather than only categorical information.

The technology we are proposing would be ideally suited to wearable computing, so that a user could have a face-to-face conversation while carrying a microphone and wearing a pair of simple glasses, which could also be fitted with the person's normal eye prescription if necessary. The wearable product would process primitive characteristics of the speech signal such as voicing (the presence of energy at the fundamental frequency, as heard in vowel sounds); frication (high-frequency, noise-like energy characteristic of various consonants such as s, z, and sh); and nasality (a unique resonance characteristic, as in m, n, and ng). These characteristics would be recognized in real time, and the output delivered simultaneously to a pair of eyeglasses to illuminate small colored spots on the sides of the glasses.
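As a rough illustration of the frame-by-frame processing described above, the following Python sketch maps one short frame of audio to continuous cue intensities for voicing and frication. It is a minimal sketch under stated assumptions, not the detection method of the proposed system: the use of RMS energy and zero-crossing rate as stand-ins for the acoustic analysis, the function names frame_features and cue_intensities, and all threshold values are hypothetical. Nasality is omitted because detecting it would require spectral analysis beyond this simple example.

```python
import math
import random


def frame_features(samples):
    """Return (rms_energy, zero_crossing_rate) for one frame of audio samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("a frame needs at least two samples")
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return rms, crossings / (n - 1)


def cue_intensities(samples, rms_floor=0.01, rms_ceiling=0.5, zcr_ceiling=0.5):
    """Map a frame to continuous cue strengths in the range [0, 1].

    All three thresholds are assumed values chosen for illustration only.
    """
    rms, zcr = frame_features(samples)
    if rms < rms_floor:
        # Treat very quiet frames as silence: no cue is shown.
        return {"voicing": 0.0, "frication": 0.0}
    level = min(1.0, rms / rms_ceiling)      # overall loudness of the frame
    noisiness = min(1.0, zcr / zcr_ceiling)  # many sign changes suggest noise-like energy
    return {
        "voicing": level * (1.0 - noisiness),
        "frication": level * noisiness,
    }


# Synthetic 20 ms frames at an assumed 16 kHz sampling rate.
SAMPLE_RATE = 16000
FRAME_LENGTH = 320
rng = random.Random(0)

# A low-frequency tone stands in for a voiced, vowel-like sound.
vowel_like = [0.5 * math.sin(2 * math.pi * 120 * t / SAMPLE_RATE)
              for t in range(FRAME_LENGTH)]
# White noise stands in for a fricative such as s or sh.
fricative_like = [rng.uniform(-0.3, 0.3) for _ in range(FRAME_LENGTH)]

print(cue_intensities(vowel_like))
print(cue_intensities(fricative_like))
```

The outputs are graded values rather than on/off decisions, reflecting the emphasis on continuous rather than only categorical information. In a wearable device, each value could drive the brightness of one colored spot on the glasses.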

The proposed research holds much promise because people naturally integrate auditory and visual information. In addition, the proposed system does not replace auditory information with the supplementary cues but rather supplements the auditory speech that is normally available to the listener. This strategy is particularly effective because auditory and visual speech are complementary: the acoustic characteristics that are robust in the signal and fairly easy to recognize automatically are exactly those that are not visible on the face. This fortunate complementarity makes it more likely that the system can succeed at automatically recognizing the robust acoustic characteristics and simultaneously presenting them visually.

The proposed technology qualifies as a transparent information appliance that adds perceptual and cognitive resources to the listener. We have developed a requirements analysis, a conceptual design, and possible physical designs for this appliance. It consists of a very affordable, noninvasive device that is seamlessly integrated with normal dress, adding a pair of glasses (which might be worn regardless) and a handheld unit about the size of a mobile phone, iPod, or handheld computer. The augmented-reality device is also available for use 24/7 and requires very little maintenance.

2009 Update:

The Problem. The need for language aids is pervasive in today’s world. There are millions of individuals who have language and speech challenges, and these individuals require additional support for communication and language learning.

Solution. Because many individuals have a limited ability to hear speech, we propose to supplement the sound of speech and speechreading with an additional informative visual input. Acoustic characteristics of the speech will be transformed into readily perceivable visual characteristics, which will be simultaneously displayed on the speechreader’s eyeglasses. These acoustic features provide important linguistic information not directly observed on the face and are transformed into visual cues intended to enhance intelligibility and ease of comprehension.

Intellectual merit: The proposed research will advance the state of the art in human-machine interaction, speech, machine learning, and assistive technologies. The proposed activity will advance engineering research and speech science by developing a real-time system to automatically detect and track robust characteristics of auditory speech and to transform these continuous acoustic features into continuous supplementary visible features.

Broader impacts: The proposed activity will benefit society by providing a research and theoretical foundation for a system that would be naturally available to almost all individuals at a very low cost. It does not require automatic speech recognition, and will always be more accurate than a system that depends on it, regardless of the advances or lack of advances in speech recognition technology. It does not require literate users because no written information is presented, as would be the case in a captioning system. It is age-independent in that it might be used by toddlers, adolescents, and throughout the life span. It is functional for all languages because all languages share the corresponding acoustic characteristics. It would provide significant help for people with hearing aids and cochlear implants, and it would be beneficial for many individuals with language challenges and even for children learning to read.