Putting Humans into the Visual Equation

(Above: It would be difficult for even a powerful computer to recognize James Davis in all of these photos. For a human, the task is trivial.)

by Gordy Slack

Machine vision has come a long way in the past couple of decades, but reliable high-accuracy systems that conduct small-scale image analysis remain elusive, says James Davis, associate professor of computer science at UC Santa Cruz. As an expert in the field, Davis frequently fields calls from entrepreneurs seeking advice: “We have this great business idea,” they say, “but we need to work out one little technical detail.” Then they go on to explain that the computer must compare photos taken of cell samples, say, to stored images of different pathological cells. Or conduct a web search for all on-sale swimsuits resembling the one on the cover of the current issue of Vanity Fair.

These may all be good ideas, says Davis, but they assume that computer vision is about twenty years more advanced than it is. “It just doesn’t work as well as they need it to in order to get reliably accurate results.”

Given the growing power of computer algorithms, people are often surprised by the lagging capability of machine vision. After all, says Davis, matching photos is a trivial job for almost any human. “You don’t need to be educated, specially trained, or even literate.”

Of course, businesses manipulating and comparing images could hire humans to match up web images for them—and some businesses do. The rise of microwork marketplaces such as Mechanical Turk, Samasource, and CrowdFlower has created new pools of potential labor for these jobs and platforms for reaching them. But conducting purely human searches of big databases would be expensive and laborious. If there are 100,000 images of faces in a database and the task requires finding two belonging to the same person, a human would have to look at all the images. Even if an employee were paid only one cent per image, it would cost $1,000 to conduct the search.
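The cost of that all-human approach scales linearly with the size of the database. A minimal sketch of the arithmetic (the function name and flat per-image rate are illustrative, not taken from any real platform):

```python
def human_search_cost(num_images, cents_per_image=1):
    """Worst-case cost, in dollars, of a purely human search
    that must examine every image in the database once."""
    return num_images * cents_per_image / 100

# 100,000 face images at one cent apiece:
print(human_search_cost(100_000))  # prints 1000.0
```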

“That’s not going to be just another tab in Photoshop!” says Davis.

But an inexpensive automated system could do parts of the search quickly and efficiently, zeroing in on a small set of fairly high-quality results, though it would probably come up with a few dead-wrong conclusions, says Davis. If that set of results were forwarded to humans, they could confirm the matches or decide ambiguous cases. Davis terms people recruited to do such work Human Processing Units (HPUs).

The trick, he adds, may be to hybridize the two approaches: have the computer algorithms do the initial searches and then, at the right moment, plug human employees into the equation “to look at those results, validate the positives, and weed out the duds.” The hard part, and the one Davis is trying to tackle with help from a $50,000 CITRIS seed grant, is making the interface between the automated, algorithmic part of the job and the human part seamless. Roughly, engineers would write computer code as usual, but some lines of that code would say, “ask a person on platform A whether x is an image of the same thing as y.”
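A rough sketch of that hybrid pipeline, using hypothetical stand-ins: `crude_similarity` for the cheap vision algorithm and `ask_human` for the call a microwork platform would service (neither is a real API):

```python
def crude_similarity(a, b):
    """Toy stand-in for a vision algorithm: images are feature
    vectors in [0, 1]; similarity is 1 minus mean absolute difference."""
    return 1 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def machine_candidates(query, database, threshold=0.8):
    """Cheap automated pass: a small set of likely matches,
    probably including a few duds."""
    return [img for img in database
            if crude_similarity(query, img) >= threshold]

def hybrid_search(query, database, ask_human):
    """Forward the machine's short list to humans (HPUs), who
    validate the positives and weed out the duds by answering
    a simple yes/no question about each candidate."""
    return [img for img in machine_candidates(query, database)
            if ask_human(query, img)]
```

In a deployed system, `ask_human` would post a paid yes/no task to a marketplace and await the worker's answer; here it can be any function returning True or False.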

Unlike their digital counterparts, HPUs “can be late, can decide to go faster than high-quality work requires, do not behave deterministically, and they do not come with instructions,” says Davis. His research examines just how these human qualities can be accounted for, quantified, and incorporated into algorithms.

While pondering this problem, Davis also considered one of an entirely different sort: How, he wondered, could the rewards of high-tech economies be spread more evenly around the world so that technological progress benefits large sectors of the world’s population and not just a lucky few?

What if people in some of the poorest parts of the world could be employed to do visual matching work, possibly even on their own simple cell phones? They would only need to respond with a “yes” or a “no” to questions like, Are these two images of the same person? Or, Is this image a face or a wheel? They could be paid a small amount for each visual matching task, say a penny per image, and be woven into computer algorithms that could recruit them on a moment’s notice from anywhere around the world. They don’t necessarily need to know exactly who their employer is, or even the nature of the project they are working on. All they need to do is determine whether or not an image meets a given criterion. Yes or no. Thumbs up or thumbs down.

“If one could create a library of clearly described human visual identification routines that had, say, 99% accuracy, then all kinds of applications all around the world could plug into them to get or authenticate results.”
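The article doesn't say how such routines would reach 99% accuracy from individually fallible workers, but a common crowdsourcing technique is to put the same yes/no question to several HPUs and take a majority vote. A sketch of how accuracy compounds, under the idealized assumption that workers err independently:

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent workers, each
    individually correct with probability p, answers correctly (n odd)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# Five workers who are each right 90% of the time already clear 99%:
print(round(majority_accuracy(0.9, 5), 4))  # prints 0.9914
```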

(Above: Ming-Hsuan Yang is a machine vision specialist developing super-resolution systems.)

Davis’s collaborator and co-PI on the project, Ming-Hsuan Yang, assistant professor of engineering at UC Merced and also a specialist in machine vision, wants to employ Davis’s paradigm to speed up and strengthen his own research. Yang and his students are developing “super-resolution” machine vision systems that can zoom in on and clarify an image, amplifying certain characteristics and minimizing others. To zoom in on a human being, basically, the system must determine whether each pixel in a digital image should be characterized as human or non-human. “The pixels designating human characteristics get strengthened and the non-human ones diminished.”

“You see this kind of thing in science fiction,” he says. “But we’re doing the real thing.” Yang has developed several algorithms to do this, and he needs to test the results on real human eyes to see how well they work.

“We want to learn how human perception of these images works,” says Yang. “Given the same input we generate several outputs and then we ask people to evaluate and score them. Which ones do they prefer and why?”

There are not nearly enough students to evaluate Yang’s huge sets of images. Automatically sending them to third-party human testers who can rate them would be an inexpensive and efficient way, he says, to test his results and explore the effectiveness of different approaches.

As Yang and Davis’s research in machine vision progresses, it will eventually obviate the need for HPUs; the machines will one day be adept enough to do all of these jobs by themselves. But for now, keen and versatile human perception may be both necessary to do the job right and a way to bring paid job opportunities to remote and impoverished communities.