AIT Asian Institute of Technology

Part of speech masking effect on vision-language representation learning

Author: Pasit Tiwawongrut
Call Number: AIT Thesis no. DSAI-25-07
Subject(s): Artificial intelligence; Natural language processing (Computer science)
Note: A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Data Science and Artificial Intelligence
Publisher: Asian Institute of Technology
Abstract: Vision-language (VL) models have shown promising performance across multiple tasks in both zero-shot and fine-tuning setups. Most studies use masked language modeling (MLM) as a pre-training task, applying random masking to image caption tokens. However, random token masking is not an optimal strategy for training VL models, and effective masking strategies in VL remain underexplored. In this work, we investigate the effects of part-of-speech (POS) masking, since each POS category contributes differently to sentence meaning. We pre-train models with different POS masking strategies and evaluate each model on image-text retrieval, image-text matching, and visual question answering (VQA) tasks. Our findings contribute to a deeper understanding of how POS masking influences model performance, providing insights that can lead to more effective pre-training strategies for future VL models. Our experiments show that the choice of masked tokens matters. For retrieval tasks, masking simpler tokens such as determiners leads to higher accuracy than masking nouns, suggesting that freeing the model from predicting harder words can improve overall alignment. On the VALSE benchmark, selective POS masking consistently outperforms random masking. The VQA results show that content-word masking helps most with fine-grained understanding. Even POS categories that perform less well in retrieval still add value in VQA, showing that different POS categories support different aspects of cross-modal learning. We also confirm that models trained with MLM consistently outperform those trained without it, especially on downstream tasks.
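The abstract contrasts random token masking with POS-based masking for MLM pre-training. The Python sketch below illustrates that difference on a single caption. It is not the thesis's implementation: the `[MASK]` token, the 30% masking ratio, the hand-assigned Universal POS tags (a real pipeline would obtain them from a tagger, e.g. spaCy's `token.pos_`), and the helper names `masking_budget`, `random_mask`, and `pos_mask` are all hypothetical, chosen only for illustration.

```python
import random
from typing import List, Sequence, Set, Tuple

MASK = "[MASK]"  # assumed BERT-style mask token; the thesis's tokenizer is not specified


def masking_budget(n_tokens: int, ratio: float) -> int:
    """Hypothetical helper: positions to mask, at least one so each caption yields an MLM target."""
    return min(n_tokens, max(1, round(n_tokens * ratio)))


def _apply_mask(tokens: Sequence[str], positions: List[int]) -> List[str]:
    """Return a copy of tokens with the given positions replaced by MASK."""
    masked = list(tokens)
    for i in positions:
        masked[i] = MASK
    return masked


def random_mask(tokens: Sequence[str], ratio: float,
                rng: random.Random) -> Tuple[List[str], List[int]]:
    """Baseline: mask a uniformly random subset of positions, ignoring POS."""
    n = masking_budget(len(tokens), ratio)
    positions = sorted(rng.sample(range(len(tokens)), n))
    return _apply_mask(tokens, positions), positions


def pos_mask(tokens: Sequence[str], tags: Sequence[str], target_pos: Set[str],
             ratio: float, rng: random.Random) -> Tuple[List[str], List[int]]:
    """POS masking: only tokens whose tag is in target_pos are eligible.

    Uses the same budget as random_mask, capped by the number of eligible
    tokens, so the two strategies differ only in which tokens are hidden.
    """
    candidates = [i for i, tag in enumerate(tags) if tag in target_pos]
    n = min(len(candidates), masking_budget(len(tokens), ratio))
    positions = sorted(rng.sample(candidates, n))
    return _apply_mask(tokens, positions), positions


# Usage on one caption; the tags are hand-assigned for illustration only.
caption = ["a", "dog", "catches", "the", "red", "frisbee"]
tags = ["DET", "NOUN", "VERB", "DET", "ADJ", "NOUN"]

masked, targets = pos_mask(caption, tags, {"DET"}, ratio=0.3, rng=random.Random(0))
print(masked)                         # ['[MASK]', 'dog', 'catches', '[MASK]', 'red', 'frisbee']
print([caption[i] for i in targets])  # ['a', 'the']
```

Holding the masking budget fixed is one way to make a POS strategy comparable with the random baseline, so that only the choice of masked tokens changes; the abstract does not state whether the thesis matches budgets in this way.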
Year: 2025
Type: Thesis
School: School of Engineering and Technology
Department: Department of Information and Communications Technologies (DICT)
Academic Program/FoS: Data Science and Artificial Intelligence (DSAI)
Chairperson(s): Chaklam Silpasuwanchai
Examination Committee(s): Chantri Polprasert; Attaphongse Taparugssanagorn
Scholarship Donor(s): AIT Scholarship
Degree: Thesis (M. Sc.) - Asian Institute of Technology, 2025

