eScholarship — Open Access Publications from the University of California
UC San Diego Electronic Theses and Dissertations
Leveraging Human Perception and Computer Vision Algorithms for Interactive Fine-Grained Visual Categorization

Abstract

Fine-grained categorization has emerged in recent years as a problem of great interest to the computer vision community, given its wide range of applications including species identification for animals, plants, and insects, as well as classification of man-made objects such as vehicle makes and models and architectural styles. The goal of fine-grained categorization is to distinguish between subcategories (e.g., Pembroke Welsh Corgi, Shiba Inu) that belong to the same entry-level category (e.g., Dog). As fine-grained categories are often visually similar, a general-purpose computer vision algorithm for basic-level category recognition is often ineffective in the fine-grained case. Moreover, fine-grained categories are typically recognizable only by experts (e.g., the average person cannot recognize a Myrtle Warbler, a species of bird), while a layperson can immediately recognize entry-level categories like motorcycles or cats. While fine-grained categorization is difficult for both humans and machines, we combine their respective strengths to create an effective human-in-the-loop classification system. These types of systems integrate machine vision algorithms with user feedback at test time in order to interactively arrive at the correct answer. Incorporating user input drives up recognition accuracy to levels sufficient for practical applications; at the same time, computer vision reduces the amount of human interaction required. Moreover, we are able to incrementally improve our models and algorithms while providing a useful service to users. In this dissertation, we explore two paradigms for interactive categorization. The first relies on a comprehensive vocabulary of semantic parts and attributes to discriminate categories. A bird species recognition system, for example, may request feedback from the user regarding a particular image, such as "Where is the beak?" or "Is the wing blue?" 
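The question-asking loop described above can be sketched as Bayesian evidence fusion: the vision model supplies a prior over classes, each user answer reweights that prior, and the next question is chosen greedily by expected information gain. This is a minimal illustrative sketch, not the dissertation's exact formulation; all function names are hypothetical, and it assumes binary yes/no attribute questions with known per-class attribute probabilities.

```python
import numpy as np

def posterior_over_classes(vision_probs, answers, p_attr_given_class):
    """Fuse vision scores with user attribute answers via Bayes' rule.

    vision_probs: (C,) prior from the vision model, p(class | image)
    answers: dict mapping attribute index -> bool (user's yes/no responses)
    p_attr_given_class: (A, C) matrix of p(attribute present | class)
    """
    log_post = np.log(vision_probs)
    for a, yes in answers.items():
        p_yes = p_attr_given_class[a]            # p(answer = yes | class)
        log_post += np.log(p_yes if yes else 1.0 - p_yes)
    post = np.exp(log_post - log_post.max())     # stabilized softmax-style norm
    return post / post.sum()

def next_question(vision_probs, answers, p_attr_given_class, asked):
    """Greedily pick the unasked attribute whose answer is expected to
    reduce class-posterior entropy the most (expected information gain)."""
    post = posterior_over_classes(vision_probs, answers, p_attr_given_class)

    def entropy(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    h0 = entropy(post)
    best, best_gain = None, -1.0
    for a in range(p_attr_given_class.shape[0]):
        if a in asked:
            continue
        p_yes = float(post @ p_attr_given_class[a])  # predicted answer dist.
        gain = h0
        for ans, w in ((True, p_yes), (False, 1.0 - p_yes)):
            if w <= 0:
                continue
            trial = dict(answers)
            trial[a] = ans
            gain -= w * entropy(posterior_over_classes(
                vision_probs, trial, p_attr_given_class))
        if gain > best_gain:
            best, best_gain = a, gain
    return best
```

Each user answer multiplies the class posterior by a likelihood term, so a few well-chosen questions can sharply concentrate probability on the correct species even when the vision prior alone is ambiguous.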
Semantic vocabulary-based methods, however, present challenges in scalability and require experts with the necessary domain knowledge, who can be a scarce resource. The second paradigm we present eliminates the need for such a vocabulary; instead, it is based on perceptual similarity metrics learned from human-provided similarity comparisons. By leveraging these continuous embedded similarity spaces, we exploit a vastly more powerful representation that can be readily applied to other basic-level categories.
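A common way to learn such a similarity space from human comparisons of the form "is item i more similar to j than to k?" is a triplet embedding. The sketch below uses a simple hinge loss with plain gradient descent; it illustrates the general technique rather than the dissertation's specific method, and the function name and hyperparameters are illustrative assumptions.

```python
import numpy as np

def learn_embedding(n_items, triplets, dim=2, lr=0.05, margin=1.0,
                    epochs=200, seed=0):
    """Embed items in R^dim so that for each triplet (i, j, k), meaning
    'i is more similar to j than to k', item i ends up closer to j than
    to k by at least `margin` in squared distance (hinge loss)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n_items, dim))
    for _ in range(epochs):
        for i, j, k in triplets:
            d_ij = X[i] - X[j]
            d_ik = X[i] - X[k]
            # Hinge is active when ||i-j||^2 + margin > ||i-k||^2;
            # take a gradient step on the violated constraint.
            if d_ij @ d_ij + margin > d_ik @ d_ik:
                X[i] -= lr * 2 * (d_ij - d_ik)
                X[j] += lr * 2 * d_ij
                X[k] -= lr * 2 * d_ik
    return X
```

Because the loss depends only on relative distances, the learned space needs no named parts or attributes: crowd workers answering comparison queries are enough, which is what lets this paradigm scale to categories where expert vocabularies are unavailable.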
