Use this URL to cite or link to this record in EThOS:
Title: Extracting structured data from Web query result pages
Author: Weng, Daiyue
ISNI:       0000 0004 6060 1414
Awarding Body: Queen's University Belfast
Current Institution: Queen's University Belfast
Date of Award: 2016
Availability of Full Text:
Access from EThOS:
A rapidly increasing number of Web databases are now become accessible via their HTML form- based query interfaces only. Comparing various services or products from a number of web sites in a specific domain is time-consuming and tedious. There is a demand for value-added Web applications that integrate data from multiple sources. To facilitate the development of such applications, we need to develop techniques for automating the process of providing integrated access to a multitude of database-driven Web sites, and integrating data from their underlying databases. This presents three challenges, namely query form extraction, query form matching and translation, and Web query result extraction. In this thesis, 1 focus on Web query result extraction, which aims to extract structured data encoded in semi-structured HTML pages, and return extracted data in relational tables. 1 begin by reviewing the existing approaches for Web query result extraction. 1 categorize them based on their degree of automation, i.e. manual, semi-automatic and fully automatic approaches. For each category, every approach will be described in terms of its technical features, followed by an analysis listing the advantages and limitations of the approach. The literature review leads to my proposed approaches, which resolve the Web data extraction problem, i.e. Web data record extraction, Web data alignment and Web data annotation. Each approach is presented in a chapter which includes the methodology, experiment and related work. The last chapter concludes the thesis.
Supervisor: Not available Sponsor: Not available
Qualification Name: Thesis (Ph.D.) Qualification Level: Doctoral
EThOS ID:  DOI: Not available