TY - GEN
T1 - Automatic discovery of semantic structures in HTML documents
AU - Mukherjee, Saikat
AU - Yang, Guizhen
AU - Tan, Wenfang
AU - Ramakrishnan, I. V.
N1 - Publisher Copyright: © 2003 IEEE.
PY - 2003
Y1 - 2003
N2 - Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively unexplored problem. By exploiting a key observation that semantically related items in HTML documents exhibit spatial locality, we develop an algorithm for automatically partitioning them into tree-like semantic structures which expose the implicit schema.
AB - Template-driven HTML documents possess an implicit, fixed schema denoting concepts and their relationships in a hierarchical fashion. Discovering this schema remains a relatively unexplored problem. By exploiting a key observation that semantically related items in HTML documents exhibit spatial locality, we develop an algorithm for automatically partitioning them into tree-like semantic structures which expose the implicit schema.
UR - https://www.scopus.com/pages/publications/84945930296
U2 - 10.1109/ICDAR.2003.1227667
DO - 10.1109/ICDAR.2003.1227667
M3 - Conference contribution
AN - SCOPUS:84945930296
T3 - Proceedings of the International Conference on Document Analysis and Recognition, ICDAR
SP - 245
EP - 249
BT - Proceedings - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
PB - IEEE Computer Society
T2 - 7th International Conference on Document Analysis and Recognition, ICDAR 2003
Y2 - 3 August 2003 through 6 August 2003
ER -