Title: BERyL : unified approach to web block classification
Author: Kravchenko, Andrey
ISNI: 0000 0004 6498 4838
Awarding Body: University of Oxford
Current Institution: University of Oxford
Date of Award: 2015
Web blocks such as navigation menus, advertisements, headers and footers are key components of web pages: they define not only the appearance of a page but also the way in which humans interact with its different parts. For machines, however, classifying and interacting with these blocks is a surprisingly hard task. Yet web block classification has varied applications in wrapper induction, assistance for visually impaired people, mobile web browsing, web page topic clustering and web search. Our system for web block classification, BERyL, classifies web blocks automatically through a combination of machine learning and declarative, model-driven feature extraction based on Datalog rules. BERyL uses refined feature sets for the classification of individual blocks to achieve accurate classification for all of the block types that we have observed so far. This high accuracy comes from carefully selected features, some of which are tuned to a specific block type. At the same time, BERyL avoids the high cost of feature engineering by taking a model-driven rather than programmatic approach to feature extraction. This not only reduces the time spent on feature engineering; the declarative approach also allows for semi-automatic optimisation of the feature extraction system. BERyL also employs a holistic approach to web block classification, in which individual blocks are considered within the context of their web page, supported by domain-specific knowledge representation rules. We validate these claims for a broad range of web blocks in an extensive evaluation.
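The combination described in the abstract (features declared as rules, then fed to a learned classifier) can be illustrated with a minimal Python sketch. This is not BERyL's implementation: the thesis uses Datalog rules, whereas here the rules are plain Python predicates kept in a table, and the classifier is a toy 1-nearest-neighbour over Hamming distance. All names (`FEATURE_RULES`, `extract_features`, `classify`), the block properties and the thresholds are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: declarative feature rules + a toy classifier.
# Nothing here is taken from BERyL itself; names and thresholds are assumptions.

# Each feature is declared as data: a name mapped to a condition over a block
# and the page it belongs to. New features are added by extending the table,
# without changing the extraction code.
FEATURE_RULES = {
    "many_links": lambda block, page: block["link_count"] >= 5,
    "near_top": lambda block, page: block["y"] < 0.2 * page["height"],
    "near_bottom": lambda block, page: block["y"] > 0.8 * page["height"],
    "has_external_iframe": lambda block, page: block["external_iframe"],
}


def extract_features(block, page):
    """Evaluate every declared rule and return a 0/1 feature vector."""
    return tuple(int(rule(block, page)) for rule in FEATURE_RULES.values())


def hamming(a, b):
    """Number of positions at which two feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(features, training):
    """Return the label of the closest training example (1-nearest-neighbour)."""
    return min(training, key=lambda example: hamming(features, example[0]))[1]


# Made-up labelled blocks on a page that is 1000 units tall.
page = {"height": 1000}
labelled_blocks = [
    ({"link_count": 8, "y": 50, "external_iframe": False}, "navigation_menu"),
    ({"link_count": 6, "y": 950, "external_iframe": False}, "footer"),
    ({"link_count": 1, "y": 400, "external_iframe": True}, "advertisement"),
]
training = [(extract_features(b, page), label) for b, label in labelled_blocks]

# An unseen block: many links, positioned near the top of the page.
unseen = {"link_count": 7, "y": 100, "external_iframe": False}
predicted = classify(extract_features(unseen, page), training)
print(predicted)
```

Because each rule receives the page as well as the block, position-based features such as `near_top` depend on page context, which loosely mirrors the holistic, page-level view mentioned in the abstract.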
Supervisor: Gottlob, Georg
Sponsor: Not available
Qualification Name: Thesis (Ph.D.)
Qualification Level: Doctoral
EThOS ID:
DOI: Not available