AIT Asian Institute of Technology

Computing a coarse grained linguistic database using WordNet

Author: Kanjana Jiamjitvanich
Call Number: AIT Thesis no. IM-09-02
Subject(s): Subject headings--Databases; WordNet

Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Information Management, School of Engineering and Technology
Publisher: Asian Institute of Technology
Series Statement: Thesis ; no. IM-09-01
Abstract: WordNet is an electronic lexical database for English that stores lemmas and exceptional forms of words, word senses and sense glosses, semantic relations between word senses (e.g. hypernym, holonym), syntactic relations between words (e.g. antonym), and other information related to the structure and use of the language. In WordNet, word senses are represented as synsets, which are sets of words with synonymous meaning in a particular context. WordNet synsets and holonym/hypernym relations are used in many natural language processing tasks, such as word sense disambiguation, in SWeb, the semantic web application project of the University of Trento. A known problem of WordNet is that its sense definitions are too fine-grained, whereas ordinary users discriminate among fewer word senses. Moreover, many applications that use WordNet data would benefit if word senses were distinguished at a more coarse-grained level and if some very rarely used senses were dropped from the database. In SWeb, the WordNet data is stored in a relational database handled by a component called Controlled Vocabulary (CV). The goal of this thesis is to define the appropriate level of granularity of word senses given the requirements of the SWeb project, and to develop an algorithm (Coarsealgo) that computes a coarse-grained version of WordNet to improve the background knowledge currently used in SWeb. Coarsealgo achieves the highest score on the performance measure for every part of speech and performs best at finding polysemy and correctly grouping similar senses.
Year: 2009
Corresponding Series Added Entry: Asian Institute of Technology. Thesis ; no. IM-09-02
Type: Thesis
School: School of Engineering and Technology (SET)
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Information Management (IM)
Chairperson(s): Vilas Wuwongse
Examination Committee(s): Vatcharaporn Esichaikul; Janecek, Paul
Scholarship Donor(s): RTG Fellowship
Degree: Thesis (M.Eng.) - Asian Institute of Technology, 2009

