Skip to main navigation Skip to search Skip to main content

From semantic categories to fixations: A Novel Weakly-supervised Visual-auditory Saliency Detection Approach

  • Guotao Wang
  • , Chenglizhao Chen
  • , Dengping Fan
  • , Aimin Hao
  • , Hong Qin
  • Beihang University
  • Qingdao University
  • Inception Institute of Artificial Intelligence
  • Chinese Academy of Medical Sciences
  • Pengcheng Laboratory

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

46 Scopus citations

Abstract

Thanks to the rapid advances in the deep learning techniques and the wide availability of large-scale training sets, the performances of video saliency detection models have been improving steadily and significantly. However, the deep learning based visual-audio fixation prediction is still in its infancy. At present, only a few visual-audio sequences have been furnished with real fixations being recorded in the real visual-audio environment. Hence, it would be neither efficiency nor necessary to re-collect real fixations under the same visual-audio circumstance. To address the problem, this paper advocate a novel approach in a weakly-supervised manner to alleviating the demand of large-scale training sets for visual-audio model training. By using the video category tags only, we propose the selective class activation mapping (SCAM), which follows a coarse-to-fine strategy to select the most discriminative regions in the spatial-temporal-audio circumstance. Moreover, these regions exhibit high consistency with the real human-eye fixations, which could subsequently be employed as the pseudo GTs to train a new spatial-temporal-audio (STA) network. Without resorting to any real fixation, the performance of our STA network is comparable to that of the fully supervised ones. Our code and results are publicly available at https://github.com/guotaowang/STANet.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
PublisherIEEE Computer Society
Pages15114-15123
Number of pages10
ISBN (Electronic)9781665445092
DOIs
StatePublished - 2021
Event2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 - Virtual, Online, United States
Duration: Jun 19 2021Jun 25 2021

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)1063-6919

Conference

Conference2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Country/TerritoryUnited States
CityVirtual, Online
Period06/19/2106/25/21

Fingerprint

Dive into the research topics of 'From semantic categories to fixations: A Novel Weakly-supervised Visual-auditory Saliency Detection Approach'. Together they form a unique fingerprint.

Cite this