Perception-oriented online news extraction

  • City University of New York

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

12 Scopus citations

Abstract

This paper presents a novel online news extraction approach based on human perception. The approach simulates how a human perceives and identifies online news content. It first detects news areas based on the content function, space continuity, and formatting continuity of news information. It then identifies detailed news content based on the position, format, and semantics of the detected news areas. Experimental results show that our approach achieves much better performance (on average, an F1 value above 99%) than previous approaches such as the Tree Edit Distance and Visual Wrapper based approaches. Furthermore, our approach does not assume the existence of Web templates in the tested Web pages, as the Tree Edit Distance based approach requires, nor does it need training sets, as the Visual Wrapper based approach does. The success of our approach demonstrates the strength of the perception-oriented Web information extraction methodology and represents a promising approach for automatic information extraction from sources whose presentation is designed for humans.
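The F1 value reported in the abstract is the harmonic mean of precision and recall. The abstract does not say how extracted content is matched against the reference, so the following is only a minimal sketch that assumes a token-level comparison; the function name `f1_score` and the whitespace tokenization are illustrative choices, not taken from the paper.

```python
from collections import Counter


def f1_score(extracted: str, reference: str) -> float:
    """Token-level F1 between extracted text and a hand-labelled reference.

    Assumption: tokens are whitespace-separated words. The paper may
    measure overlap at a different granularity.
    """
    ext_tokens = extracted.split()
    ref_tokens = reference.split()
    if not ext_tokens or not ref_tokens:
        return 0.0

    # Multiset intersection: a shared token counts only as often as it
    # appears in both the extraction and the reference.
    overlap = sum((Counter(ext_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(ext_tokens)  # share of extracted tokens that are correct
    recall = overlap / len(ref_tokens)     # share of reference tokens that were recovered
    return 2 * precision * recall / (precision + recall)
```

Under this definition, an extraction that also pulls in surrounding page text (navigation, advertisements) loses precision, while one that truncates the article loses recall, so a high F1 requires both tight and complete extraction.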

Original language: English
Title of host publication: JCDL'08
Subtitle of host publication: Proceedings of the 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008
Pages: 363-366
Number of pages: 4
State: Published - 2008
Event: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08 - Pittsburgh, PA, United States
Duration: Jun 16 2008 to Jun 20 2008

Publication series

Name: Proceedings of the ACM International Conference on Digital Libraries

Conference

Conference: 8th ACM/IEEE-CS Joint Conference on Digital Libraries 2008, JCDL'08
Country/Territory: United States
City: Pittsburgh, PA
Period: 06/16/08 to 06/20/08

Keywords

  • Information extraction
  • Online news
  • Web
