
rHDP: An Aspect Sharing-Enhanced Hierarchical Topic Model for Multi-Domain Corpus

  • Yitao Zhang
  • Changxuan Wan
  • Keli Xiao
  • Qizhi Wan
  • Dexi Liu
  • Xiping Liu

  • Jiangxi University of Finance and Economics
  • East China Jiaotong University
  • Jiangxi Key Laboratory of Data and Knowledge Engineering

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

Learning topic hierarchies from a multi-domain corpus is crucial in topic modeling as it reveals valuable structural information embedded within documents. Despite the extensive literature on hierarchical topic models, effectively discovering inter-topic correlations and differences among subtopics at the same level in the topic hierarchy, obtained from multiple domains, remains an unresolved challenge. This article proposes an enhanced nested Chinese restaurant process (nCRP), nCRP+, by introducing an additional mechanism based on Chinese restaurant franchise (CRF) for aspect-sharing pattern extraction in the original nCRP. Subsequently, by employing the distribution extracted from nCRP+ as the prior distribution for topic hierarchy in the hierarchical Dirichlet processes (HDP), we develop a hierarchical topic model for multi-domain corpus, named rHDP. We describe the model with the analogy of Chinese restaurant franchise based on the central kitchen and propose a hierarchical Gibbs sampling scheme to infer the model. Our method effectively constructs well-established topic hierarchies, accurately reflecting diverse parent-child topic relationships, explicit topic aspect sharing correlations for inter-topics, and differences between these shared topics. To validate the efficacy of our approach, we conduct experiments using a renowned public dataset and an online collection of Chinese financial documents. The experimental results confirm the superiority of our method over the state-of-the-art techniques in identifying multi-domain topic hierarchies, according to multiple evaluation metrics.
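For readers unfamiliar with the prior that the abstract builds on, the following is a minimal sketch of the standard nested Chinese restaurant process (nCRP), not the authors' nCRP+ or rHDP. It only draws one root-to-leaf path per document from the prior; it does no Gibbs sampling and has no aspect-sharing mechanism. The names `crp_choose`, `sample_ncrp_paths`, and the concentration parameter `gamma` are illustrative assumptions and do not come from the paper.

```python
import random
from collections import defaultdict


def crp_choose(counts, gamma, rng):
    """One Chinese restaurant process draw.

    Returns the index of an existing table with probability proportional
    to its customer count, or len(counts) (a new table) with probability
    proportional to gamma.
    """
    total = sum(counts) + gamma
    r = rng.random() * total
    acc = 0.0
    for k, c in enumerate(counts):
        acc += c
        if r < acc:
            return k
    return len(counts)


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one path through a tree of the given depth for each document.

    A node is identified by the tuple of child indices leading to it from
    the root, so the root is () and every returned path has depth - 1 entries.
    """
    rng = random.Random(seed)
    # Maps a node to the visit counts of its children (hypothetical bookkeeping).
    child_counts = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()
        for _ in range(depth - 1):
            counts = child_counts[node]
            k = crp_choose(counts, gamma, rng)
            if k == len(counts):
                counts.append(0)  # open a new child (a new subtopic)
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths


if __name__ == "__main__":
    for path in sample_ncrp_paths(num_docs=5, depth=3, gamma=1.0, seed=1):
        print(path)
```

Documents that share a path prefix share the topics along that prefix, which is how the nCRP induces a topic hierarchy. According to the abstract, nCRP+ adds a Chinese restaurant franchise-based mechanism on top of this process to extract aspect-sharing patterns; that mechanism is not reproduced here.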

Original language: English
Article number: 71
Journal: ACM Transactions on Information Systems
Volume: 42
Issue number: 3
State: Published - Dec 29 2023

Keywords

  • Chinese restaurant franchise
  • Hierarchical topic model
  • aspect sharing pattern
  • hierarchical Dirichlet processes
