Annotating the Semantic Web
Today's web has evolved into a huge repository of rich multimedia content for human consumption. Its exponential growth has pushed the amount of available information far beyond what any human can manage, causing the problem of information overload. Because of this, the creators of the web (10) proposed using computer agents to process these large amounts of data. To do so, they planned to extend the current web so that it becomes understandable by computer programs. This new web is referred to as the Semantic Web. Given the sheer size of the web, extending it requires a collective effort, and for that effort to happen, tools simple enough for non-experts to use must be available.

This thesis first proposes a methodology which semi-automatically labels semantic entities in web pages. The user first provides some initial examples. The tool then learns to reproduce these examples and generalises over them using Adaptive Information Extraction (AIE) techniques. Once its performance comes close enough to the user's, it takes over and processes the remaining documents autonomously (a minimal sketch of this loop is given below).

The second methodology goes a step further and attempts to gather semantically typed information from web pages fully automatically. It starts from the assumption that semantics are already available all over the web: by combining a number of freely available resources (such as databases) with AIE techniques, most of this information can be extracted automatically (see the second sketch below).

These techniques will certainly not solve all the problems brought about by the advent of the Semantic Web; they are intended as a step towards making the Semantic Web a reality.
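The learn-then-takeover loop of the first methodology can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the thesis's actual design: the learner interface (`train`, `annotate`), the `user.correct` callback, the `doc.id` attribute, the use of F1 as the agreement measure, and the 0.9 takeover threshold are all illustrative choices.

```python
def f1(predicted, expected):
    """F1 agreement between two sets of (start, end, label) annotations."""
    predicted, expected = set(predicted), set(expected)
    if not predicted or not expected:
        return 0.0
    tp = len(predicted & expected)
    precision = tp / len(predicted)
    recall = tp / len(expected)
    return 2 * precision * recall / (precision + recall) if tp else 0.0


def annotate_corpus(documents, user, learner, threshold=0.9):
    """Annotate documents, handing control to the learner once its
    suggestions agree closely enough with the user's corrections."""
    annotations = {}
    autonomous = False
    for doc in documents:
        suggested = learner.annotate(doc)
        if autonomous:
            # The learner has taken over: accept its output unchecked.
            annotations[doc.id] = suggested
            continue
        confirmed = user.correct(doc, suggested)   # user fixes the suggestions
        learner.train(doc, confirmed)              # generalise over the example
        annotations[doc.id] = confirmed
        autonomous = f1(suggested, confirmed) >= threshold
    return annotations
```

The key design point the sketch captures is that the user stays in the loop only while the tool's suggestions still disagree with the user's corrections; once agreement crosses the threshold, the remaining documents are processed autonomously.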
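The second methodology rests on the idea that existing resources already encode semantics that can seed automatic extraction. The sketch below shows only that seeding step: a toy gazetteer stands in for a freely available database, and exact matches against it produce typed annotations that an AIE learner could then generalise from. The gazetteer contents and the `seed_annotations` helper are invented for illustration and do not name the resources actually used in the thesis.

```python
import re

GAZETTEER = {  # stand-in for a real external resource such as a database
    "person": ["Tim Berners-Lee", "James Hendler"],
    "organisation": ["W3C", "University of Sheffield"],
}

def seed_annotations(text):
    """Return (start, end, label) spans for every known entity in text."""
    spans = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(re.escape(name), text):
                spans.append((m.start(), m.end(), label))
    return sorted(spans)

page = "Tim Berners-Lee founded the W3C to steward the web's evolution."
for start, end, label in seed_annotations(page):
    print(f"{label}: {page[start:end]!r}")
```

Run on the sample sentence, this prints a person span for "Tim Berners-Lee" and an organisation span for "W3C"; such seed annotations are what would let the fully automatic pipeline bootstrap without any manual examples.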