eScholarship
Open Access Publications from the University of California

UC San Diego

UC San Diego Electronic Theses and Dissertations

Analysis of head pose, faces, and eye dynamics in images and videos : a multilevel framework and algorithms

Abstract

This study investigates the fundamental problems of extracting and analyzing head- and face-related visual cues at multiple levels. At the coarse level, the problem of head pose estimation is studied; at the fine level, the problems of 1) facial feature detection and localization, especially of eye features, and 2) eye dynamics, including tracking and blink detection, are studied. Algorithmic frameworks for solving these problems and their experimental evaluations are presented. We first describe our contributions to detailed-level visual cue analysis, including facial feature detection, eye tracking, and blink detection. Following that, head pose estimation for images is discussed. Face super-resolution algorithms are also presented as a potential solution for obtaining more visual detail. Facial feature detection is solved in a general object detection framework, and its performance on eye localization is presented. A dependency distance based on the features' empirical mutual information is used to cluster the features, so that a binary tree can be formed to represent the structure of the object feature space. The binary tree representation partitions the object feature space into compact feature subspaces in a coarse-to-fine manner. In each compact feature subspace, independent component analysis (ICA) is used to obtain the independent sources, whose probability density functions (PDFs) are modeled by Gaussian mixtures. When this representation is applied to object detection, a sub-window scans the entire image, and each resulting image patch is examined using Bayesian criteria to determine whether an object is present. After the eyes are automatically located with the binary tree-based probability learning, interactive particle filters are used to simultaneously track the eyes and detect blinks. The eye blink pattern, an important visual cue for both attentiveness analysis and fatigue indication, can thereby be obtained.
The particle filters use classification-based observation models, in which the posterior probabilities are evaluated by logistic regressions in tensor subspaces. Extensive experiments evaluate the performance in two respects: 1) blink detection rate and the accuracy of blink duration, measured in number of frames; 2) eye tracking accuracy. The experimental setup for obtaining the benchmark data used in the tracking accuracy evaluation is also presented. A marker-based commercial motion capture system is used to provide the ground truth. The experimental evaluation demonstrates the capability of this approach. Besides detailed facial feature analysis, coarser-level analysis, namely head pose estimation, also plays an important role, for example in human-computer interaction systems. In this work the problem is formulated as a multi-class classification problem. We propose using a subspace analysis in wavelet space followed by a geometric structural analysis to solve the problem of classification with imperfectly aligned face images. Different subspace techniques are compared for a fundamental understanding of the structure of the head pose space. Most head and face visual cue analysis approaches require an image resolution high enough to extract sufficient visual detail. However, due to physical constraints, the resolution of the input videos might be limited. Therefore, super-resolution is proposed to reconstruct more visual detail about facial features. We first use an inter-pixel interference elimination approach, which is a general approach applicable to arbitrary images. Although reasonable reconstruction can be obtained with low aliasing artifacts, the magnification factors are still limited. An identity-dependent regression model in subspaces is proposed as an alternative, with which high magnification factors can be obtained.
The relevance model between the low-resolution face images and their high-resolution counterparts is obtained and used to inversely reconstruct the high-resolution face images. Occluded low-resolution images can also be reconstructed using un-occluded training samples. A multi-level analysis of head- and face-related visual cues is very important for a smart human-interface system. We present possible solutions in this work, and extensive experimental trials are carried out to evaluate and validate the proposed approaches. Building on the knowledge gained with these approaches, the recognition of attention and behavior can be addressed by a semantic interpretation of such visual cues, which indicates the direction for further study.
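
The feature-clustering step mentioned in the abstract (a dependency distance derived from the features' empirical mutual information, used to form a binary tree over the features) can be illustrated with a small sketch. The abstract does not define the distance or the tree-building procedure, so everything below is an assumption made for illustration: the distance 1 - I(X;Y)/H(X,Y), the bottom-up average-linkage merging, the restriction to discrete features, and all function names are hypothetical and not taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(columns):
    """Empirical joint entropy (in nats) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(c / n * math.log(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy([x]) + entropy([y]) - entropy([x, y])


def dependency_distance(x, y):
    """Assumed distance (not from the thesis): 1 - I(X;Y) / H(X,Y).

    Near 0 for features that determine each other, near 1 for
    empirically independent features.
    """
    h_xy = entropy([x, y])
    if h_xy == 0.0:
        return 0.0  # both features are constant
    return 1.0 - mutual_information(x, y) / h_xy


def build_binary_tree(features):
    """Merge features bottom-up (average linkage) into a nested-tuple binary tree.

    `features` maps a feature name to its list of discrete observations.
    """
    names = list(features)
    dist = {
        frozenset(pair): dependency_distance(features[pair[0]], features[pair[1]])
        for pair in combinations(names, 2)
    }
    # Each cluster is (subtree, list of member feature names).
    clusters = [(name, [name]) for name in names]

    def linkage(a, b):
        pairs = [dist[frozenset((i, j))] for i in a[1] for j in b[1]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two clusters with the smallest average dependency distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


if __name__ == "__main__":
    # Toy data: "b" mirrors "a", "d" copies "c", and "a" is independent of "c".
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [1 - v for v in a]
    c = [0, 1, 0, 1, 0, 1, 0, 1]
    print(build_binary_tree({"a": a, "b": b, "c": c, "d": list(c)}))
```

In this toy data the strongly dependent features are grouped together before the weakly dependent groups are joined, which is the coarse-to-fine grouping the abstract describes. Real image features are continuous and would need discretization or a different mutual information estimator, which this sketch does not cover.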
