Exploring Referring Image Segmentation for Vision–Language Integration

by Gary Tam

Departments: Computer Science
Description: In this MSc project, students will work on Referring Image Segmentation, where the goal is to segment a specific object in an image given a natural language description (for example, “the person in the blue shirt standing on the left”). This vision–language task enables more natural and precise human–AI interaction in scenarios such as human–robot collaboration, assistive technologies, and interactive image editing. Students are expected to carry out a thorough literature survey of recent state-of-the-art methods, implement and evaluate selected models, and report both quantitative and qualitative results. Under supervision, students will critically analyze these approaches and may explore guided improvements or adaptations based on their experimental findings. For further information, interested students may contact Dr. Gary K. L. Tam.
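As an illustration of what the quantitative part of the evaluation might involve, the sketch below computes intersection-over-union (IoU) scores between predicted and ground-truth masks. The project description does not name specific metrics, so the choice of mean IoU, overall IoU, and precision at IoU thresholds is an assumption based on what is commonly reported for this task. The names `intersection_and_union` and `summarize` are hypothetical, and masks are represented as plain nested lists of 0/1 values to keep the example dependency-free; a real pipeline would more likely use NumPy or PyTorch arrays.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical stand-in for an image-sized array: rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def intersection_and_union(pred: Mask, target: Mask) -> tuple[int, int]:
    """Count the pixels in the intersection and union of two binary masks."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += int(bool(p) and bool(t))
            union += int(bool(p) or bool(t))
    return inter, union


def summarize(
    preds: Sequence[Mask],
    targets: Sequence[Mask],
    thresholds: Sequence[float] = (0.5, 0.7, 0.9),
) -> dict[str, float]:
    """Return mean IoU, overall IoU, and precision at each IoU threshold."""
    per_sample: list[float] = []
    total_inter = total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        # Assumed convention: two empty masks count as perfect agreement.
        per_sample.append(inter / union if union else 1.0)
        total_inter += inter
        total_union += union
    if not per_sample:
        raise ValueError("at least one prediction is required")
    n = len(per_sample)
    report = {
        # Mean IoU: average of the per-sample IoU scores.
        "mIoU": sum(per_sample) / n,
        # Overall IoU: pooled intersection divided by pooled union.
        "oIoU": total_inter / total_union if total_union else 1.0,
    }
    for th in thresholds:
        # Fraction of samples whose IoU reaches the threshold.
        report[f"P@{th}"] = sum(iou >= th for iou in per_sample) / n
    return report


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    targets = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
    print(summarize(preds, targets))
```

Mean IoU weights every expression equally, while overall IoU is dominated by large objects, so reporting both (alongside qualitative examples) gives a fuller picture than either alone.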
Preparation: Our project lab provides access to an RTX 3070 GPU, which should be sufficient for most tasks but may be limiting for larger models or more demanding training runs. Students who have access to an RTX 3090, 4090, or even 5090 GPU may see significantly faster inference and training. Alternatively, students can use Google Colab or other external platforms for additional processing power, though these may come with time and cost constraints. Students should plan accordingly and manage resources carefully to avoid project delays and unexpected expenses.
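For resource planning, a rough back-of-the-envelope check can indicate whether a model is likely to fit on a given GPU. The sketch below is only an approximation under stated assumptions: 4 bytes per parameter for fp32 inference weights, roughly 16 bytes per parameter for fp32 training with an Adam-style optimizer (weights, gradients, and two optimizer moments), and 8 GB of memory on the lab's GPU. Activation memory, which depends on image resolution and batch size, is not modelled, so a headroom factor is applied. The function names and the 200-million-parameter model are hypothetical.

```python
GIB = 1024 ** 3

# Assumed per-parameter costs in bytes (activations are NOT included).
BYTES_INFERENCE_FP32 = 4   # weights only
BYTES_TRAINING_ADAM = 16   # weights + gradients + two Adam moments, all fp32


def estimate_gib(num_params: int, bytes_per_param: int) -> float:
    """Approximate memory for parameters and optimizer state, in GiB."""
    return num_params * bytes_per_param / GIB


def likely_fits(
    num_params: int,
    vram_gib: float,
    bytes_per_param: int,
    headroom: float = 0.5,
) -> bool:
    """Check the estimate against a fraction of VRAM.

    The remaining fraction is left for activations, the CUDA context,
    and framework overhead; 0.5 is an arbitrary, conservative choice.
    """
    return estimate_gib(num_params, bytes_per_param) <= vram_gib * headroom


if __name__ == "__main__":
    lab_gpu_gib = 8.0           # assumed memory of the lab GPU
    model_params = 200_000_000  # hypothetical model size
    for label, cost in [
        ("inference", BYTES_INFERENCE_FP32),
        ("training", BYTES_TRAINING_ADAM),
    ]:
        need = estimate_gib(model_params, cost)
        ok = likely_fits(model_params, lab_gpu_gib, cost)
        print(f"{label}: ~{need:.2f} GiB, likely fits: {ok}")
```

An estimate like this is only a first filter; measuring actual peak memory on a small batch before committing to a long training run remains the more reliable check.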
Project Categories: Artificial Intelligence (AI), Data Science
Project Keywords: Computer Vision, Machine Learning


Level of Studies

Level 6 (Undergraduate Year 3): yes
Level 7 (Masters): yes
Level 8 (PhD): yes