Abstract
Temporal Sentence Grounding (TSG) in videos aims to localize the temporal interval in an untrimmed video that is semantically relevant to a given query sentence. To balance the heavy annotation burden against grounding performance, we propose a new task, Weakly Semi-supervised Temporal Sentence Grounding with Points (WSS-TSG-P), in which the dataset comprises a small number of video-sentence pairs fully annotated with start and end timestamps (full labels) and a large number of pairs weakly annotated with a single timestamp (point labels). Under this setting, we first introduce a point-to-moment regressor that converts point annotations into pseudo moment labels. To train a regressor that yields reliable pseudo moment labels, we propose a point-guided feature aggregation module that aggregates cross-modal representations based on the prototype feature at the given point position. We further perform regressor self-training and design pseudo label generation strategies that exploit both full and point annotations. All heterogeneous labels (full, pseudo moment, and point labels) are then used to train a TSG backbone. Finally, we propose a novel point-guided group contrastive learning method that constructs reliable positive and negative sets and re-weights pseudo moment labels to further improve model performance. Extensive experiments on benchmark datasets verify that the proposed method outperforms other semi-supervised learning methods and bridges the performance gap between weakly-supervised and fully-supervised learning methods in TSG.
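As a rough illustration of what aggregating features around the prototype at the point position could look like, the sketch below pools clip-level features with softmax weights derived from their similarity to the feature at the annotated point. This is an assumption made purely for illustration, not the module described in the paper: the function name `point_guided_aggregate`, the dot-product similarity, and the `temperature` parameter are all hypothetical.

```python
import math


def point_guided_aggregate(features, point_idx, temperature=1.0):
    """Hypothetical sketch: pool clip features by similarity to the point feature.

    features   : list of equal-length float lists, one per video clip
    point_idx  : index of the clip containing the annotated point
    temperature: softmax temperature (an illustrative assumption)
    """
    # The feature at the annotated point acts as the prototype.
    prototype = features[point_idx]

    # Dot-product similarity between every clip feature and the prototype.
    scores = [
        sum(a * b for a, b in zip(f, prototype)) / temperature
        for f in features
    ]

    # Numerically stable softmax over the clip axis.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of the clip features gives the aggregated representation.
    dim = len(prototype)
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return pooled, weights


# Toy example: clips 0 and 2 resemble the point clip, clip 1 does not.
toy_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pooled, weights = point_guided_aggregate(toy_features, point_idx=0)
```

In this toy setup the clips that resemble the point clip receive larger weights, so the pooled vector is dominated by content near the annotated point. A learned module would typically replace the raw dot product with trained projections.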
| Original language | English |
|---|---|
| Pages (from-to) | 2268-2278 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 28 |
| DOIs | |
| State | Published - 2026 |
Keywords
- Temporal sentence grounding (TSG)
- point annotations
- weakly semi-supervised learning
Fingerprint
Dive into the research topics of 'Weakly Semi-Supervised Temporal Sentence Grounding in Videos With Point Annotations'. Together they form a unique fingerprint.