Skip to main navigation Skip to search Skip to main content

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

  • Rui Dai
  • , Srijan Das
  • , Kumara Kahatapitiya
  • , Michael S. Ryoo
  • , Francois Bremond
  • Institut national de recherche en informatique et en automatique
  • Université Côte d'Azur
  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

94 Scopus citations

Abstract

Action detection is a significant and challenging task, especially in densely-labelled datasets of untrimmed videos. Such data consist of complex temporal relations including composite or co-occurring actions. To detect actions in these complex settings, it is critical to capture both shortterm and long-term temporal information efficiently. To this end, we propose a novel 'ConvTransformer' network for action detection: MS-TCT11Code/Models: https://github.com/dairui01/MS-TCT. This network comprises of three main components: (1) a Temporal Encoder module which explores global and local temporal relations at multiple temporal resolutions, (2) a Temporal Scale Mixer module which effectively fuses multi-scale features, creating a unified feature representation, and (3) a Classification module which learns a center-relative position of each action instance in time, and predicts frame-level classification scores. Our experimental results on multiple challenging datasets such as Charades, TSU and MultiTHUMOS, validate the effectiveness of the proposed method, which outperforms the state-of-the-art methods on all three datasets.

Original languageEnglish
Title of host publicationProceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PublisherIEEE Computer Society
Pages20009-20019
Number of pages11
ISBN (Electronic)9781665469463
DOIs
StatePublished - 2022
Event2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: Jun 19 2022Jun 24 2022

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume2022-June
ISSN (Print)1063-6919

Conference

Conference2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/TerritoryUnited States
CityNew Orleans
Period06/19/2206/24/22

Keywords

  • Action and event recognition
  • Behavior analysis

Fingerprint

Dive into the research topics of 'MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection'. Together they form a unique fingerprint.

Cite this