Reasoning about a visual scene from a photograph is an inherently ambiguous task because an image in itself does not carry enough information to disambiguate the world that it is depicting. Yet, humans have no problem understanding a photograph, seamlessly inferring a plethora of information about the physical space of the scene, the depicted objects and their relationships within the scene, rough scene illumination, cues about surface orientations and material properties, and even hints about the geographic location. This remarkable feat is largely due to the vast prior visual experience that humans bring to bear on the task. How can we help computers do the same?
Research in my lab over the past decade has focused on the use of large amounts of visual data, both labeled and unlabeled, as a way of injecting visual experience into the task of computational image understanding. In this talk, I will first show some examples of the power of Big Visual Data to address complex visual tasks with surprisingly simple algorithms. I will then describe our data-driven techniques for gaining a deeper understanding of the scene by parsing the image into its constituent elements to infer information about its 3D geometric, photometric, and semantic properties. Applications of our techniques will be demonstrated for several practical tasks including single-view 3D reconstruction, object detection, visual geo-location, and image-based computer graphics.