Identifying duplicate questions on Quora | |
Author | Akhileshwar, Chennu |
Call Number | AIT RSPR no.IM-17-12 |
Subject(s) | Machine learning--Technique |
Note | A research study submitted in partial fulfillment of the requirements for the degree of Master of Engineering in Information Management, School of Engineering and Technology |
Publisher | Asian Institute of Technology |
Series Statement | Research studies project report ; no. IM-17-12 |
Abstract | Determining whether two questions ask the same thing can be challenging, as word choice and sentence structure may vary significantly. Some natural language processing techniques have had only limited success in separating related questions from duplicate ones. Quora is a very good source that helps users exchange knowledge, and it also faces this problem of duplicate questions. Quora treats the similar-question problem as important because it wants to provide a good experience for both question seekers and writers. Using a data set of question pairs provided by Quora on Kaggle, we extract features from the data set using methods such as common word share, Jaccard Similarity Coefficient, Cosine Similarity, and TF-IDF. After extracting the features, we use machine learning algorithms to build a model from the training data. We then use this model to obtain the final values for the test data set. |
Year | 2017 |
Corresponding Series Added Entry | Asian Institute of Technology. Research studies project report ; no. IM-17-12 |
Type | Research Study Project Report (RSPR) |
School | School of Engineering and Technology (SET) |
Department | Department of Information and Communications Technologies (DICT) |
Academic Program/FoS | Information Management (IM) |
Chairperson(s) | Sumanta Guha; |
Examination Committee(s) | Phan Minh Dung;Bohez, Erik L.J.; |
Scholarship Donor(s) | Asian Institute of Technology Fellowship; |
Degree | Research studies project report (M. Eng.) - Asian Institute of Technology, 2017 |
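
The feature extraction step named in the abstract (common word share, Jaccard similarity coefficient, and cosine similarity over TF-IDF weights) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the report's actual implementation: the tokenizer, the exact common-word-share formula, the smoothed IDF variant, and all function names (`tokenize`, `common_word_share`, `jaccard`, `idf_weights`, `tfidf_vector`, `cosine`, `pair_features`) are hypothetical choices made for this example.

```python
import math
from collections import Counter

PUNCT = "?.,!\"'()"


def tokenize(text):
    # Assumption: lowercase, split on whitespace, strip surrounding punctuation.
    words = (w.strip(PUNCT) for w in text.lower().split())
    return [w for w in words if w]


def common_word_share(q1, q2):
    # Assumed definition: shared unique words counted once per question,
    # divided by the total number of unique words in both questions.
    a, b = set(tokenize(q1)), set(tokenize(q2))
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(q1, q2):
    # Jaccard similarity coefficient: |A intersect B| / |A union B|.
    a, b = set(tokenize(q1)), set(tokenize(q2))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def idf_weights(corpus):
    # Smoothed inverse document frequency (one common variant, assumed here).
    docs = [set(tokenize(q)) for q in corpus]
    n = len(docs)
    df = Counter(w for d in docs for w in d)
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf_vector(question, idf):
    # Sparse TF-IDF vector as a dict; words unseen in the corpus
    # fall back to a weight of 1.0 (an assumption for this sketch).
    tf = Counter(tokenize(question))
    return {w: count * idf.get(w, 1.0) for w, count in tf.items()}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(q1, q2, corpus):
    # Collect the three features for one question pair.
    idf = idf_weights(corpus)
    return {
        "common_word_share": common_word_share(q1, q2),
        "jaccard": jaccard(q1, q2),
        "tfidf_cosine": cosine(tfidf_vector(q1, idf), tfidf_vector(q2, idf)),
    }
```

Calling `pair_features(q1, q2, corpus)` for each row of a question-pair data set would give one feature dictionary per pair, which could then be passed to whichever classifier is chosen for training. Computing the IDF weights once over the whole corpus, rather than inside `pair_features`, would be the more efficient design for a large data set.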