TY - GEN
T1 - Can Hallucination Correction Improve Video-Language Alignment?
AU - Zhao, Lingjun
AU - Xie, Mingyang
AU - Cascante-Bonilla, Paola
AU - Daumé, Hal
AU - Lee, Kwonjoon
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
N2 - Large Vision-Language Models often generate hallucinated content that is not grounded in their visual inputs. While prior work focuses on mitigating hallucinations, we instead explore leveraging hallucination correction as a training objective to improve video-language alignment. We introduce HACA, a self-training framework that learns to correct hallucinations in descriptions that do not align with the video content. By identifying and correcting inconsistencies, HACA enhances the model's ability to align video and textual representations for spatio-temporal reasoning. Our experimental results show consistent gains in video-caption binding and text-to-video retrieval tasks, demonstrating that hallucination-correction-inspired tasks serve as an effective strategy for improving vision-language alignment.
AB - Large Vision-Language Models often generate hallucinated content that is not grounded in their visual inputs. While prior work focuses on mitigating hallucinations, we instead explore leveraging hallucination correction as a training objective to improve video-language alignment. We introduce HACA, a self-training framework that learns to correct hallucinations in descriptions that do not align with the video content. By identifying and correcting inconsistencies, HACA enhances the model's ability to align video and textual representations for spatio-temporal reasoning. Our experimental results show consistent gains in video-caption binding and text-to-video retrieval tasks, demonstrating that hallucination-correction-inspired tasks serve as an effective strategy for improving vision-language alignment.
UR - https://www.scopus.com/pages/publications/105028596815
U2 - 10.18653/v1/2025.findings-acl.1314
DO - 10.18653/v1/2025.findings-acl.1314
M3 - Conference contribution
AN - SCOPUS:105028596815
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 25636
EP - 25646
BT - Findings of the Association for Computational Linguistics
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
PB - Association for Computational Linguistics (ACL)
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
Y2 - 27 July 2025 through 1 August 2025
ER -