Parsing without a grammar: Making sense of unknown file formats

  • Stony Brook University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The thousands of specialized structured file formats in use today present a substantial barrier to freely exchanging information between applications programs. We consider the problem of deducing such basic features as the whitespace characters, bracketing delimiter symbols, and self-delimiter characters of a given file format from one or more example files. We demonstrate that for sufficiently large example files, we can typically identify the basic features of interest.

Original language: English
Title of host publication: Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
Pages: 195-202
Number of pages: 8
State: Published - 2003
Event: 3rd IEEE International Conference on Data Mining, ICDM '03 - Melbourne, FL, United States
Duration: Nov 19 2003 → Nov 22 2003

Publication series

Name: Proceedings - IEEE International Conference on Data Mining, ICDM
ISSN (Print): 1550-4786

Conference

Conference: 3rd IEEE International Conference on Data Mining, ICDM '03
Country/Territory: United States
City: Melbourne, FL
Period: 11/19/03 → 11/22/03
