Text Spotting: Helping Blind Persons Read Signs

Text Spotting aims to understand how a blind person can use mobile vision to access information about the environment.

“It is easy—and dangerous—for the blind to get lost in unfamiliar urban environments,” said Roberto Manduchi from his Computer Vision Lab at UC Santa Cruz. Signs may be everywhere, especially in public spaces, but if you cannot read them, they are of little benefit. The professor of computer engineering is developing technologies to help blind people navigate safely, including a tool that scans nearby areas for text, zeros in on that text to get a high-quality image of it, and then reads it aloud.

“iPhones are pervasive in the blind community because they have long had good voice controls. And, of course, everyone else is using smartphones, so the blind person does not stand out,” notes Manduchi. His project on Robust Text Spotting, supported by a 2012 CITRIS Seed Grant, is built on an iPhone platform and does not require any special hardware, but it does require powerful software.


The software works as follows: if a blind user thinks valuable text may be posted in an area but is unsure exactly where it is, the user moves the phone around while it continuously takes photos. The program shoots and analyzes up to ten frames per second as its user moves through an environment. When it recognizes something that looks like text, it informs the user and helps guide him closer to it.

Manduchi worked on this part of the project with Stefano Carpin, a roboticist and associate professor of computer science at UC Merced. Carpin and Manduchi developed a system for guiding a robot toward text so that it could take a photograph of it with sufficient resolution to “read” it and convert it to audio.

Solving the problem for a robot was hardly trivial, but guiding blind persons toward text turned out to be even trickier, says Manduchi. For one thing, it is much harder for a person who cannot see to keep to a straight line while moving a phone toward text on a sign. “You might be just three feet away from your target and easily start pointing in the wrong direction. The camera does not see the target anymore, so as far as the user is concerned, he might as well be two miles away.”

To help prevent this, the text-spotting tool keeps track of the text’s location and uses a feedback system to guide the user toward it. “The feedback system is going to be extremely important,” says Manduchi.

Two feedback methods are currently available for conveying information through a smartphone to a blind person: acoustic and tactile. While spoken directions (up, down, to the right, etc.) from the smartphone will help guide a blind user toward a sign, they can be both cumbersome and, when repeated too often, “very obnoxious,” Manduchi says. “We have to be careful that the feedback is not annoying,” he says. “There is an art to this.”

The final tool will probably deploy a combination of feedback methods: voice, vibration, and other sounds. A series of tones, for example, could grow closer together as the user zeros in on the text.
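As an illustration only, the tone idea could be sketched as a mapping from pointing error to the pause between beeps. The function name, the 45-degree cutoff, and the interval range below are assumptions made for this sketch, not details of Manduchi's system.

```python
def beep_interval(error_deg, max_error_deg=45.0,
                  min_interval=0.1, max_interval=1.0):
    """Return the pause between beeps, in seconds.

    All parameters are hypothetical: error_deg is how far the camera
    points away from the detected text, in degrees.
    """
    # Clamp the error so that anything beyond the cutoff sounds the same.
    clamped = max(0.0, min(abs(error_deg), max_error_deg))
    fraction = clamped / max_error_deg
    # Small error -> short pause (rapid beeps); large error -> long pause.
    return min_interval + fraction * (max_interval - min_interval)


if __name__ == "__main__":
    for error in (40, 20, 5, 0):
        print(error, "degrees off ->", round(beep_interval(error), 2), "s between beeps")
```

A linear mapping is the simplest choice; a real tool would likely tune the curve with blind users so the feedback stays informative without becoming annoying.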

Finally, the smartphone tool reads the text. This is probably the simplest of the jobs, since OCR (optical character recognition) is already so advanced. But the text must be properly framed so that important content is not lost. A user who captures only the last two words of a sign that reads “Do Not Cross Here” could be steered into grave danger. To avoid such mistakes, Manduchi is writing algorithms that enable the device to recognize the boundaries of a block of text and guide the camera to a position that captures the block in its entirety.
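One simple way to picture the framing check, offered as a sketch rather than as Manduchi's algorithm: if the bounding box of the detected text touches an edge of the camera frame, part of the sign is probably cut off on that side. The function name, the pixel margin, and the box format (left, top, right, bottom, with the origin at the top-left corner) are assumptions made for this example.

```python
def framing_hints(box, frame_w, frame_h, margin=5):
    """Suggest which way to move the camera so the text fits the frame.

    box is a hypothetical (left, top, right, bottom) bounding box in pixels,
    with y increasing downward. An empty list means the text is fully framed.
    """
    left, top, right, bottom = box
    hints = []
    # A box that reaches an edge suggests the text continues past that edge.
    if left <= margin:
        hints.append("left")
    if right >= frame_w - margin:
        hints.append("right")
    if top <= margin:
        hints.append("up")
    if bottom >= frame_h - margin:
        hints.append("down")
    return hints


if __name__ == "__main__":
    print(framing_hints((100, 100, 300, 200), 640, 480))  # fully inside the frame
    print(framing_hints((0, 100, 300, 200), 640, 480))    # cut off on the left
```

If the box touches two opposite edges at once, panning cannot help, and the user would need to step back instead.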

Manduchi is also trying to reduce the vulnerability of the users of his tools. Right now, the system’s users must hold the iPhone out from their bodies, which can be like dangling bait for thieves. One way of addressing this problem could be Google Glass, which would allow users to scan the world for signs simply by moving their heads from side to side. “The location of Google Glass on the head is probably the best place I can imagine for a text-spotting camera,” says Manduchi. “Even without seeing, we can control our head position pretty well. Also, wearing smart glasses (assuming sighted people also adopt them in significant numbers) will be less conspicuous than waving a camera out in front of your body.”

The guidance technology for the tool is still being developed, but if things stay on track, says Manduchi, the entire device could be operational within a year. The project makes an excellent case for the value of interdisciplinary, multi-campus technology research in the interest of society, one that CITRIS is proud to support.

by Gordy Slack