TY - GEN
T1 - Action unit detection with segment-based SVMs
AU - Simon, Tomas
AU - Nguyen, Minh Hoai
AU - De La Torre, Fernando
AU - Cohn, Jeffrey F.
PY - 2010
Y1 - 2010
N2 - Automatic facial action unit (AU) detection from video is a long-standing problem in computer vision. Two main approaches have been pursued: (1) static modeling - typically posed as a discriminative classification problem in which each video frame is evaluated independently; (2) temporal modeling - frames are segmented into sequences and typically modeled with a variant of dynamic Bayesian networks. We propose a segment-based approach, kSeg-SVM, that incorporates benefits of both approaches and avoids their limitations. kSeg-SVM is a temporal extension of the spatial bag-of-words. kSeg-SVM is trained within a structured output SVM framework that formulates AU detection as a problem of detecting temporal events in a time series of visual features. Each segment is modeled by a variant of the BoW representation with soft assignment of the words based on similarity. Our framework has several benefits for AU detection: (1) both dependencies between features and the length of action units are modeled; (2) all possible segments of the video may be used for training; and (3) no assumptions are required about the underlying structure of the action unit events (e.g., i.i.d.). Our algorithm finds the best k-or-fewer segments that maximize the SVM score. Experimental results suggest that the proposed method outperforms state-of-the-art static methods for AU detection.
UR - https://www.scopus.com/pages/publications/77955991302
DO - 10.1109/CVPR.2010.5539998
M3 - Conference contribution
AN - SCOPUS:77955991302
SN - 9781424469840
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 2737
EP - 2744
BT - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
T2 - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Y2 - 13 June 2010 through 18 June 2010
ER -