
Simple neologism based domain independent models to predict year of authorship

  • University of California at Santa Barbara
  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

8 Scopus citations

Abstract

We present domain-independent models that date documents based only on neologism usage patterns. The models capture how neologism usage changes over time, provide insight into the temporal locality of word usage over a span of 150 years, and generalize across domains such as News, Fiction, and Non-Fiction with competitive performance. Notably, by modeling only the distribution of usage counts over neologisms (the model is agnostic to the particular words themselves), we achieve competitive performance with only 200 input features, several orders of magnitude fewer than state-of-the-art models, some of which use 200K features.
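The idea of a word-agnostic feature vector can be illustrated with a small sketch. It is not the authors' implementation: this page does not describe the exact features or model, so the binning by first-use decade, the lexicon `FIRST_USE_YEAR` (with made-up years), and the function name `neologism_histogram` are all assumptions chosen for illustration. The sketch only shows how a text can be reduced to a distribution of usage counts that discards the identity of each word.

```python
# Hypothetical lexicon: word -> year the word first came into use.
# The years are illustrative placeholders, not attested dates.
FIRST_USE_YEAR = {
    "telephone": 1876,
    "radio": 1907,
    "television": 1926,
    "internet": 1986,
    "blog": 1999,
}


def neologism_histogram(tokens, first_use_year, start=1850, end=2000, bin_width=10):
    """Return the share of neologism tokens falling in each first-use period.

    Word identities are dropped: the output only records how many tokens
    belong to words introduced in each period (an assumed binning scheme).
    """
    n_bins = (end - start) // bin_width
    counts = [0] * n_bins
    for tok in tokens:
        year = first_use_year.get(tok.lower())
        # Skip words that are not in the lexicon or fall outside the span.
        if year is None or not (start <= year < end):
            continue
        counts[(year - start) // bin_width] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


tokens = "The radio and the television and the radio".split()
features = neologism_histogram(tokens, FIRST_USE_YEAR)
```

A vector like `features` could then be passed to any standard regressor or classifier to predict a year; the choice of model and the number of bins are left open here.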

Original language: English
Title of host publication: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings
Editors: Emily M. Bender, Leon Derczynski, Pierre Isabelle
Publisher: Association for Computational Linguistics (ACL)
Pages: 202-212
Number of pages: 11
ISBN (Electronic): 9781948087506
State: Published - 2018
Event: 27th International Conference on Computational Linguistics, COLING 2018 - Santa Fe, United States
Duration: Aug 20 2018 → Aug 26 2018

Publication series

Name: COLING 2018 - 27th International Conference on Computational Linguistics, Proceedings

Conference

Conference: 27th International Conference on Computational Linguistics, COLING 2018
Country/Territory: United States
City: Santa Fe
Period: 08/20/18 → 08/26/18
