
Automatic discovery of semantic structures in HTML documents

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

35 Scopus citations

Abstract

Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively unexplored problem. By exploiting a key observation that semantically related items in HTML documents exhibit spatial locality, we develop an algorithm for automatically partitioning them into tree-like semantic structures which expose the implicit schema.
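The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that "spatial locality" can be approximated by adjacency in the DOM: runs of neighbouring sibling elements with the same tag shape are grouped together, and the grouping is applied recursively to yield a tree. All names here (`Node`, `TreeBuilder`, `signature`, `partition`) are hypothetical and chosen for the example.

```python
from html.parser import HTMLParser
from itertools import groupby

# Elements that never have a closing tag (a partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree from HTML using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()


def signature(node):
    """Tag shape of a subtree: the tag plus the shapes of its children."""
    return (node.tag, tuple(signature(child) for child in node.children))


def partition(node):
    """Recursively group runs of adjacent children that share a tag shape."""
    groups = []
    for sig, run in groupby(node.children, key=signature):
        items = [partition(child) for child in run]
        groups.append({"tag": sig[0], "count": len(items), "items": items})
    return {"tag": node.tag, "text": node.text, "groups": groups}


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Two adjacent <li><b>..</b></li> items share a shape; the third differs.
doc = "<ul><li><b>A</b></li><li><b>B</b></li><li><i>C</i></li></ul>"
tree = partition(parse(doc))
```

In this sketch the two structurally identical list items fall into one group and the third into another, which is a crude stand-in for the kind of tree-like partition the abstract describes. A real system would need a more tolerant similarity measure than exact shape equality.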

Original language: English
Title of host publication: Proceedings - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Publisher: IEEE Computer Society
Pages: 245-249
Number of pages: 5
ISBN (Electronic): 0769519601
DOIs
State: Published - 2003
Event: 7th International Conference on Document Analysis and Recognition, ICDAR 2003 - Edinburgh, United Kingdom
Duration: Aug 3 2003 - Aug 6 2003

Publication series

Name: Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
Volume: 2003-January
ISSN (Print): 1520-5363

Conference

Conference: 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Country/Territory: United Kingdom
City: Edinburgh
Period: 08/3/03 - 08/6/03
