Web Page Segmentation for Non Visual Skimming
Résumé
Web page segmentation aims to break a page into smaller blocks, in which contents with coherent semantics are kept together. Examples of tasks targeted by such a technique are advertisement detection or main content extraction. In this paper, we study different seg-mentation strategies for the task of non visual skimming. For that purpose, we consider web page segmentation as a clustering problem of visual elements, where (1) all visual elements must be clustered, (2) a fixed number of clusters must be discovered, and (3) the elements of a cluster should be visually connected. Therefore, we study three different algorithms that comply to these constraints: K-means, F-K-means, and Guided Expansion. Evaluation shows that Guided Expansion evidences statistically-relevant results in terms of compactness and separateness, and satisfies more logical constraints when compared to the other strategies.
Fichier principal
PACLIC_33_2019-Web_Page_Segmentation_for_Non_Visual_Skimming.pdf (670.83 Ko)
Télécharger le fichier
Origine | Fichiers produits par l'(les) auteur(s) |
---|
Loading...