TY - GEN
T1 - Boundary-Aware Temporal Sentence Grounding with Adaptive Proposal Refinement
AU - Dong, Jianxiang
AU - Yin, Zhaozheng
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
N2 - Temporal sentence grounding (TSG) in videos aims to localize the temporal interval from an untrimmed video that is relevant to a given query sentence. In this paper, we introduce an effective proposal-based approach to solve the TSG problem. A Boundary-aware Feature Enhancement (BAFE) module is proposed to enhance the proposal feature with its boundary information, by imposing a new temporal difference loss. Meanwhile, we introduce a Boundary-aware Feature Aggregation (BAFA) module to aggregate boundary features and propose a Proposal-level Contrastive Learning (PCL) method to learn query-related content features by maximizing the mutual information between the query and proposals. Furthermore, we introduce a Proposal Interaction (PI) module with Adaptive Proposal Selection (APS) strategies to effectively refine proposal representations and make the final localization. Extensive experiments on Charades-STA, ActivityNet-Captions and TACoS datasets show the effectiveness of our solution. Our code is available at https://github.com/DJX1995/BAN-APR.
UR - https://www.scopus.com/pages/publications/85151046011
U2 - 10.1007/978-3-031-26316-3_38
DO - 10.1007/978-3-031-26316-3_38
M3 - Conference contribution
AN - SCOPUS:85151046011
SN - 9783031263156
T3 - Lecture Notes in Computer Science
SP - 641
EP - 657
BT - Computer Vision – ACCV 2022 - 16th Asian Conference on Computer Vision, Proceedings
A2 - Wang, Lei
A2 - Gall, Juergen
A2 - Chin, Tat-Jun
A2 - Sato, Imari
A2 - Chellappa, Rama
PB - Springer Science and Business Media Deutschland GmbH
T2 - 16th Asian Conference on Computer Vision, ACCV 2022
Y2 - 4 December 2022 through 8 December 2022
ER -