Skip to main navigation Skip to search Skip to main content

Synthetic Audio Helps for Cognitive State Tasks

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Automatically recognizing a human’s complete cognitive state from text is a difficult task; from text, a model has to recognize a combination of concepts including belief, emotion, common ground, sentiment, and intention. Humans do not only track and update cognitive state from the meaning of words and sentences, but also from paralinguistic cues such as prosody. The NLP community has broadly focused on text-only approaches to cognitive state tasks, but audio can provide vital missing information. We posit that text-to-speech (TTS) models learn to track aspects of cognitive state in order to produce naturalistic audio, and that the signal audio models implicitly identify is orthogonal to the information that language models exploit. We present Synthetic Audio Data fine-tuning (SAD), a framework where we show that seven tasks related to cognitive state modeling benefit from multimodal training on both text and zero-shot synthetic audio data from an off-the-shelf TTS system. We show an improvement over the text-only modality when adding synthetic audio data to text-only corpora. Furthermore, on tasks and corpora that do contain gold audio, we show our SAD framework achieves competitive performance using text and synthetic audio compared to text and gold audio.

Original languageEnglish
Title of host publication2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics
Subtitle of host publicationProceedings of the Conference Findings, NAACL 2025
EditorsLuis Chiruzzo, Alan Ritter, Lu Wang
PublisherAssociation for Computational Linguistics (ACL)
Pages1701-1708
Number of pages8
ISBN (Electronic)9798891761957
DOIs
StatePublished - 2025
Event2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025 - Albuquerque, United States
Duration: Apr 29 2025May 4 2025

Publication series

Name2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Proceedings of the Conference Findings, NAACL 2025

Conference

Conference2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, NAACL 2025
Country/TerritoryUnited States
CityAlbuquerque
Period04/29/2505/4/25

Fingerprint

Dive into the research topics of 'Synthetic Audio Helps for Cognitive State Tasks'. Together they form a unique fingerprint.

Cite this