Annotating Web Search Results

Manthalu, Sam (2014-12-15)

Thesis

With many millions of pages, the Web has become an enormous information source. This information takes the form of documents, images, and videos as well as text. With data at this scale, finding the right information is a common problem. Users often have to search the Web for the content they want with the help of search engines. Searching can be done manually through available platforms like Google or automatically by web crawlers. Since the Web is largely unstructured, search results can include varying types of information relating to the same query. Sometimes these results cannot be directly analyzed to meet a specific interpretation need. The search result records (SRRs) returned from the Web following manual or automatic queries are in the form of web pages that hold results obtained from underlying databases. Such results can further be used in many applications, such as data collection and price comparison. Thus, there is a need to make the SRRs machine-processable. To achieve that, it is important that the SRRs are annotated in a meaningful fashion. Annotation adds value to the SRRs: the collected data can be stored for further analysis, and the collection becomes easier to read and understand. Annotation also prepares the data for visualization. SRRs bearing the same concepts are grouped together, making it easier to compare, analyze, and browse the collection. The purpose of this research is to find out how search results from the Web can be automatically annotated and restructured to allow for data visualization for users in a specific domain of discourse. A case study application is implemented that uses a web crawler to retrieve web pages about any topic in the public health domain. This research is a continuation of the work done by Mr. Emanuel Onu in the project “Proposal of a Tool to Enhance Competitive Intelligence on the Web”.
