AIT Asian Institute of Technology

Clustering Web documents from the Internet

Author: Li, Xiaoge
Call Number: AIT Thesis no. CS-99-2
Subject(s): World Wide Web (Information retrieval system)
Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science, School of Engineering and Technology
Publisher: Asian Institute of Technology
Abstract: Users of Web search engines are often forced to sift through the long ordered list of documents returned by the engines. The information retrieval community has explored document clustering as an alternative method of organizing retrieval results, but the technique has not yet been deployed on major search engines. In this work, we present a novel methodology for the ordered display of web search results, together with a browsing interface for exploring the resulting ordered map of the document space. The method, based on Self-Organizing Maps, performs a completely automatic and unsupervised full-text analysis of the web search results. The data are collected from the Yahoo! and MetaCrawler search engines. The web documents are represented by their word frequency histograms, and the high dimensionality is reduced by a "word stop list". Selecting appropriate term weighting methods is important for obtaining good maps. The results show that each document is mapped onto a grid point, with a link from this point to the document database. The documents are ordered on the grid according to their contents, and neighboring documents can be browsed readily.
Year: 1999
Type: Thesis
School: School of Engineering and Technology (SET)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Computer Science (CS)
Chairperson(s): Sadananda, Ramakoti
Examination Committee(s): Devadason, Francis Jawahar; Jian-Guo, Zhang
Scholarship Donor(s): Employee Student
Degree: Thesis (M.Sc.) - Asian Institute of Technology, 1999
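
The pipeline outlined in the abstract (stop-word removal, word frequency histograms, a Self-Organizing Map, and a document-to-grid mapping) can be illustrated with a minimal sketch. This is not the thesis implementation: the stop list, the sample documents, the 3x3 grid, the learning-rate and radius schedules, and the function names `histogram`, `best_unit`, and `train_som` are all assumptions made for illustration, and plain L2-normalized term frequency stands in for whatever term weighting the thesis actually selected.

```python
import math
import random
from collections import Counter

# Illustrative stop list (assumption); a real one would be far longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "for", "on"}

def histogram(text, vocab):
    """Word-frequency vector over vocab, stop words removed, L2-normalized."""
    counts = Counter(w for w in text.lower().split() if w not in STOP_WORDS)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def best_unit(weights, vec):
    """Grid coordinate whose weight vector is closest (squared Euclidean) to vec."""
    return min(weights,
               key=lambda pos: sum((a - b) ** 2 for a, b in zip(weights[pos], vec)))

def train_som(vectors, rows=3, cols=3, epochs=200, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(epochs):
        frac = 1.0 - t / epochs                      # decays from 1 towards 0
        lr = 0.5 * frac                              # learning rate (assumed schedule)
        radius = max(rows, cols) / 2.0 * frac + 0.5  # neighborhood radius
        for vec in vectors:
            br, bc = best_unit(weights, vec)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # Gaussian neighborhood
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights

# Hypothetical search results standing in for pages returned by a search engine.
docs = {
    "doc1": "neural network learning for the classification of images",
    "doc2": "learning in a neural network",
    "doc3": "cheap flights and hotel booking",
    "doc4": "hotel booking for cheap travel",
}
vocab = sorted({w for text in docs.values()
                for w in text.lower().split() if w not in STOP_WORDS})
vectors = {name: histogram(text, vocab) for name, text in docs.items()}
som = train_som(list(vectors.values()))
# Each document is mapped onto one grid point, as the abstract describes.
doc_map = {name: best_unit(som, vec) for name, vec in vectors.items()}
```

Here `doc_map` plays the role of the link from grid points to the document database: a browsing interface could list, for each grid coordinate, the documents assigned to it and to its neighbors.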

