Computing a coarse grained linguistic database using WordNet | |
Author | Kanjana Jiamjitvanich |
Call Number | AIT Thesis no.IM-09-02 |
Subject(s) | Subject headings--Databases WordNet |
Note | A thesis submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Information Management, School of Engineering and Technology |
Publisher | Asian Institute of Technology |
Series Statement | Thesis ; no. IM-09-02 |
Abstract | WordNet is an electronic lexical database for English which stores lemmas and exceptional forms of words, word senses and sense glosses, semantic relations between word senses (e.g. hypernym, holonym), syntactic relations between words (e.g. antonym), and other information related to the structure and use of the language. In WordNet, word senses are represented as synsets, which are sets of words with synonymous meaning in a particular context. WordNet synsets and holonym/hypernym relations are used in many natural language processing tasks, such as word sense disambiguation, in SWeb, the semantic web application project of the University of Trento. A known problem of WordNet is that its sense definitions are too fine-grained, whereas ordinary users discriminate among fewer word senses. Moreover, many applications which use WordNet data would benefit if word senses were distinguished at a more coarse-grained level and if some very rarely used senses were dropped from the database altogether. In SWeb, the WordNet data is stored in a relational database handled by a component called Controlled Vocabulary (CV). The goal of this thesis is to define the appropriate level of granularity of word senses given the requirements defined in the SWeb project, and to develop an algorithm (Coarsealgo) which computes a coarse-grained version of WordNet to improve the background knowledge currently used in SWeb. Coarsealgo achieves the highest score on the performance measure for every part of speech, and it performs best at finding polysemy and grouping similar senses correctly. |
Year | 2009 |
Corresponding Series Added Entry | Asian Institute of Technology. Thesis ; no. IM-09-02 |
Type | Thesis |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Information Management (IM) |
Chairperson(s) | Vilas Wuwongse; |
Examination Committee(s) | Vatcharaporn Esichaikul;Janecek, Paul; |
Scholarship Donor(s) | RTG Fellowship; |
Degree | Thesis (M.Eng.) - Asian Institute of Technology, 2009 |