
Dirichlet aggregation: Unsupervised learning towards an optimal metric for proportional data

  • Peking University

Research output: Contribution to conference › Paper › peer-review

9 Scopus citations

Abstract

Proportional data (normalized histograms) occur frequently in many areas and can be abstracted mathematically as points in a geometric simplex. A proper distance metric on this simplex is important in many applications, including classification and information retrieval. In this paper, we develop a novel framework to learn an optimal metric on the simplex. Major features of our approach include: 1) flexibility to handle correlations among bins/dimensions; 2) wide applicability, not limited to ad hoc backgrounds; and 3) a "real" global solution, in contrast to existing traditional local approaches. The technical essence of our approach is to fit a parametric distribution to the observed empirical data in the simplex. The distribution is parameterized by affinities between simplex vertices, which are learned by maximizing the likelihood of the observed data. These affinities then induce a metric on the simplex, defined as the earth mover's distance equipped with ground distances derived from the simplex vertex affinities.
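
To make the last step of the abstract concrete, the following is a minimal sketch of an earth mover's distance between two normalized histograms under a given ground-distance matrix, solved as a transportation linear program with SciPy. It is not the paper's implementation. The abstract does not say how ground distances are derived from vertex affinities, so the helper `affinity_to_ground` (distance = 1 - affinity) and the example affinity matrix are purely hypothetical stand-ins, and the affinity-learning step itself is not shown.

```python
import numpy as np
from scipy.optimize import linprog


def affinity_to_ground(affinity):
    """Hypothetical mapping from vertex affinities in [0, 1] to ground distances.

    Assumption for illustration only: distance = 1 - affinity, zero on the diagonal.
    """
    ground = 1.0 - np.asarray(affinity, dtype=float)
    np.fill_diagonal(ground, 0.0)
    return ground


def emd(p, q, ground):
    """Earth mover's distance between histograms p and q (each summing to 1)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    ground = np.asarray(ground, dtype=float)
    n = p.size
    # Flow variables f[i, j] >= 0, flattened row-major: f[i, j] -> index i * n + j.
    cost = ground.reshape(-1)
    # Mass leaving bin i must equal p[i]: sum_j f[i, j] = p[i].
    row_sums = np.kron(np.eye(n), np.ones((1, n)))
    # Mass arriving at bin j must equal q[j]: sum_i f[i, j] = q[j].
    col_sums = np.kron(np.ones((1, n)), np.eye(n))
    # Drop the last constraint: it is implied by the others when both
    # histograms have the same total mass.
    a_eq = np.vstack([row_sums, col_sums])[:-1]
    b_eq = np.concatenate([p, q])[:-1]
    result = linprog(cost, A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun


# Made-up affinities: bins 0 and 1 are strongly related, bin 2 is not.
example_affinity = np.array([
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])
example_ground = affinity_to_ground(example_affinity)

near = emd([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], example_ground)  # mass moves between related bins
far = emd([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], example_ground)   # mass moves to an unrelated bin
```

Under these assumed affinities, `near` comes out smaller than `far`, which is the sense in which a metric of this kind can account for correlations among bins, unlike a bin-by-bin distance that treats all bins as equally far apart.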

Original language: English
Pages: 959-966
Number of pages: 8
State: Published - 2007
Event: 24th International Conference on Machine Learning, ICML 2007 - Corvallis, OR, United States
Duration: Jun 20 2007 to Jun 24 2007

Conference

Conference: 24th International Conference on Machine Learning, ICML 2007
Country/Territory: United States
City: Corvallis, OR
Period: 06/20/07 to 06/24/07
