Web resource re-discovery: personal resource storage and retrieval on the World Wide Web
This thesis examines Web resource re-discovery: locating previously visited material so that it can be used again. As the Web continues to grow, new tools for managing references to useful information must be developed to support the ever-growing number of users. Retro, a personal information storage and retrieval system, is a prototype of such a tool. An examination of current practice identified two primary tools in use. The first, global indexes, was shown to be inadequate: such indexes do not have access to the full content of the Web, and therefore cannot fully support re-discovery. The second, hotlists, requires manual intervention, which disrupts the primary task of reading and understanding the content. To avoid the problems associated with resource discovery systems, and to enable the creation of automatic hotlists, Retro moves document indexing to the user's desktop. Recording and comparing Web content presented several problems. Personal Web proxies were used to intercept the address and content of every visited page. Possible uses of proxy hierarchies to provide shared Web memories are discussed. The content of HTML pages was extracted into summaries using a two-stage SGML parsing technique. Only 13% of documents were valid, indicating that such tools must be used with care. Analysis of Retro in a limited real-world environment indicated document re-use at a level sufficient to support the creation of automatic hotlists. Such lists are a useful supplement to existing tools. Projected requirements for personal index storage over twelve months averaged 15 Mbytes for the Retro filter, which is within acceptable limits for modern desktop computers. Aliases, identified as a serious potential threat to re-discovery tools, were found in 1% of recorded material. The evidence demonstrates that the Retro tools provide a useful supplementary environment for re-discovery, and indicates that further research to improve and extend the system is desirable.
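The idea of a desktop index feeding an automatic hotlist can be illustrated with a short sketch. It is not Retro's implementation: in place of the two-stage SGML parsing described above, it uses the tolerant `html.parser` module from Python's standard library, and it takes page content as a plain string rather than intercepting it through a proxy. The names `SummaryParser`, `summarise`, `PersonalIndex`, and the `min_visits` threshold are all hypothetical and chosen for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class SummaryParser(HTMLParser):
    """Collects the title and visible text of a page, tolerating invalid markup."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than zero while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and stylesheet content
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def summarise(html, max_words=30):
    """Return (title, first max_words words of visible text) for one page."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    words = " ".join(parser.text_parts).split()
    return title, " ".join(words[:max_words])


class PersonalIndex:
    """Hypothetical in-memory index keyed by URL (assumption: no alias handling)."""

    def __init__(self):
        self.summaries = {}
        self.visits = Counter()

    def record(self, url, html):
        # Called once per visited page; the latest summary replaces any earlier one.
        self.summaries[url] = summarise(html)
        self.visits[url] += 1

    def hotlist(self, min_visits=2):
        # Pages visited at least min_visits times, most visited first.
        return [
            (url, self.summaries[url][0])
            for url, count in self.visits.most_common()
            if count >= min_visits
        ]
```

Because the index is keyed by URL alone, two addresses that serve the same document would be counted as separate pages here, which is the alias problem the abstract identifies.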