Title | Clustering Web documents from the Internet |
Author | Li, Xiaoge |
Call Number | AIT Thesis no. CS-99-2 |
Subject(s) | World Wide Web (Information retrieval system) |
Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science, School of Engineering and Technology |
Publisher | Asian Institute of Technology |
Abstract | Users of Web search engines are often forced to sift through the long ordered list of documents returned by the engines. The information retrieval community has explored document clustering as an alternative method of organizing retrieval results, but clustering has not yet been implemented on major search engines. This work presents a novel methodology for displaying Web search results in an ordered form, together with a browsing interface for exploring the resulting ordered map of the document space. The method, based on Self-Organizing Maps, performs a completely automatic and unsupervised full-text analysis of the search results. The data are collected from the Yahoo! and MetaCrawler search engines. The Web documents are represented by their word frequency histograms, and the high dimensionality is reduced with a stop-word list. Selecting an appropriate term weighting method is important for obtaining good maps. The results show that each document is mapped onto a grid point, with a link from this point to the document database. The documents are ordered on the grid according to their contents, and neighboring documents can be browsed readily. |
Year | 1999 |
Type | Thesis |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Computer Science (CS) |
Chairperson(s) | Sadananda, Ramakoti |
Examination Committee(s) | Devadason, Francis Jawahar;Jian-Guo, Zhang |
Scholarship Donor(s) | Employee Student |
Degree | Thesis (M.Sc.) - Asian Institute of Technology, 1999 |
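
The pipeline outlined in the abstract (word frequency histograms, a stop-word list, term weighting, and a Self-Organizing Map that assigns each document to a grid point) can be illustrated with a minimal sketch. This is not the thesis implementation: the stop list, the TF-IDF weighting, the 3x3 grid, the learning-rate and radius schedules, and all function names (`histogram`, `vectorize`, `train_som`, `best_unit`) are assumptions chosen for illustration, and the sample documents are invented.

```python
import math
import random
import re
from collections import Counter

# Illustrative stop list (assumption); a real one would be much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}

def histogram(text):
    """Word-frequency histogram with stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def vectorize(docs):
    """Unit-length TF-IDF vectors; TF-IDF is one possible term weighting."""
    hists = [histogram(d) for d in docs]
    vocab = sorted(set().union(*hists))
    n = len(docs)
    idf = {w: math.log(n / sum(1 for h in hists if w in h)) + 1.0 for w in vocab}
    vectors = []
    for h in hists:
        v = [h[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_unit(weights, v):
    """Grid coordinate of the unit whose weight vector is closest to v."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], v))

def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small rectangular SOM with a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = 1.0 - t / epochs                      # shrinks from 1 toward 0
        lr = 0.5 * frac                              # decaying learning rate
        radius = max(rows, cols) / 2.0 * frac + 0.5  # shrinking neighborhood
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over docs
            br, bc = best_unit(weights, v)
            for (r, c), w in weights.items():
                g = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * g * (v[i] - w[i])   # pull unit toward document
    return weights

# Invented sample documents standing in for search results.
docs = [
    "neural networks for pattern recognition",
    "self organizing neural networks and maps",
    "cooking recipes for pasta and sauce",
    "pasta sauce recipes with tomato",
]
vocab, vectors = vectorize(docs)
som = train_som(vectors)
placement = [best_unit(som, v) for v in vectors]  # one grid point per document
print(placement)
```

Each entry of `placement` is the grid point a document is mapped to; a browsing interface of the kind the abstract mentions would link each grid point back to the documents stored there.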