
Prosody Analysis of Audiobooks

  • Charuta Pethe
  • Bach Pham
  • Felix D. Childress
  • Yunting Yin
  • Steven Skiena
  • Stony Brook University
  • Earlham College

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recent advances in text-to-speech (TTS) have made it possible to generate natural-sounding audio from text. However, audiobook narrations involve dramatic vocalizations and intonations by the reader, with greater reliance on emotions, dialogues, and descriptions in the narrative. Using our dataset of 93 aligned book-audiobook pairs, we present improved models to predict prosody (pitch, volume, and rate of speech) from narrative text using language modeling. Our predicted prosody attributes correlate much better with human audiobook readings than those produced by a state-of-the-art commercial TTS system: our predicted pitch shows a higher correlation with human reading for 22 of the 24 books in the test set, and our predicted volume is more similar to human reading for 23 of the 24 books. Finally, we present a human evaluation study to quantify the extent to which people prefer prosody-enhanced audiobook readings over default commercial TTS systems.
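The per-book comparison summarized above (for example, a higher pitch correlation with human reading for 22 of 24 test books) can be illustrated with a small sketch. It assumes Pearson correlation as the similarity measure and one prosody value per aligned text unit; the abstract does not state the exact metric or granularity, and the function names and the field names `human`, `model`, and `baseline` are hypothetical.

```python
import math
from typing import Mapping, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(vx * vy)


def count_wins(books: Sequence[Mapping[str, Sequence[float]]]) -> tuple[int, int]:
    """Count books where the model correlates better with the human reading.

    Each book maps the hypothetical keys 'human', 'model', and 'baseline'
    to equal-length lists holding one prosody value (e.g. mean pitch) per
    aligned text unit. Returns (wins, total_books).
    """
    wins = 0
    for book in books:
        r_model = pearson(book["model"], book["human"])
        r_base = pearson(book["baseline"], book["human"])
        # Strictly greater: ties are not counted as wins for the model.
        if r_model > r_base:
            wins += 1
    return wins, len(books)
```

Running `count_wins` once per attribute (pitch, volume, rate) over the test books would yield counts of the form "22 of 24"; treating ties as losses for the model is a choice made in this sketch, not something the abstract specifies.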

Original language: English
Title of host publication: Proceedings - 2025 19th International Conference on Semantic Computing, ICSC 2025
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 217-221
Number of pages: 5
ISBN (Electronic): 9798331524265
State: Published - 2025
Event: 19th International Conference on Semantic Computing, ICSC 2025 - Hybrid, Laguna Hills, United States
Duration: Feb 3 2025 to Feb 5 2025

Publication series

Name: Proceedings - IEEE International Conference on Semantic Computing, ICSC
ISSN (Print): 2325-6516
ISSN (Electronic): 2472-9671

Conference

Conference: 19th International Conference on Semantic Computing, ICSC 2025
Country/Territory: United States
City: Hybrid, Laguna Hills
Period: 02/3/25 to 02/5/25

Keywords

  • character embedding
  • prosody attribute prediction
  • text to speech

