Do current state-of-the-art generative models understand gaze? (ZI)

by Gary Tam

Departments Zienkiewicz Institute for Modelling, Data and AI
Description This project explores whether generative AI models can understand gaze and integrate it as a key feature. By evaluating state-of-the-art models such as SAM (Segment Anything Model) and SAM2, which focus on image segmentation and object recognition, we will investigate how these models handle visual cues and whether gaze data can be used to improve prompt generation for downstream applications (e.g., referring image segmentation, gaze understanding). Specifically, the objective is to understand the underlying mechanisms of these models, such as their use of image segmentation and attention mechanisms, and how they process and interpret visual information. We will also explore other generative AI models for visual scenes, such as DALL·E 2 (which generates images from textual descriptions), DeepDream (which emphasizes pattern recognition and abstraction in images), and CLIP (which connects visual and textual data), to assess how these architectures could be adapted for gaze integration. If none of these techniques performs well or can handle gaze-based prompts, your task will be to analyze the limitations of these models, explore potential reasons for those limitations, and consider approaches to address them.
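As a starting point for the gaze-to-prompt idea described above, the following is a minimal sketch of how gaze fixations might be turned into point prompts for a promptable segmentation model. It is an illustration under stated assumptions, not the project's method: the function name `gaze_to_point_prompts` is hypothetical, and the input format (fixations normalized to [0, 1] with the origin at the top-left of the image) is assumed, since eye trackers differ in what they report.

```python
from typing import List, Tuple


def gaze_to_point_prompts(
    fixations: List[Tuple[float, float]],
    image_width: int,
    image_height: int,
) -> Tuple[List[List[int]], List[int]]:
    """Map normalized gaze fixations to pixel-space point prompts.

    fixations: (x, y) pairs in [0, 1], origin at the top-left (assumed format).
    Returns (point_coords, point_labels), where label 1 marks a foreground point.
    """
    point_coords: List[List[int]] = []
    for x, y in fixations:
        # Clamp to the image so off-screen tracker samples stay valid.
        x = min(max(x, 0.0), 1.0)
        y = min(max(y, 0.0), 1.0)
        # Scale to pixels; a fixation at exactly 1.0 maps to the last pixel.
        px = min(int(x * image_width), image_width - 1)
        py = min(int(y * image_height), image_height - 1)
        point_coords.append([px, py])
    # Treat every fixation as "the user is looking at the target object".
    point_labels = [1] * len(point_coords)
    return point_coords, point_labels


coords, labels = gaze_to_point_prompts([(0.5, 0.5), (1.2, -0.1)], 640, 480)
print(coords, labels)
```

With the `segment_anything` package, lists like these could be wrapped in NumPy arrays and passed as the `point_coords` and `point_labels` arguments of `SamPredictor.predict`, which takes (x, y) pixel coordinates with label 1 for foreground and 0 for background. Whether raw fixations make good prompts is exactly the kind of question the project is meant to investigate.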
Preparation Our project lab provides access to a 3070 GPU, which should be sufficient for most tasks but may struggle with larger models or complex training processes. If students have access to a 3090 or 4090 GPU, it could significantly speed up inference or training. Alternatively, students can use Google Colab or other external platforms for additional processing power, though these may come with time and cost constraints. Students should plan accordingly and manage resources carefully to avoid project delays and unexpected expenses.
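When planning around GPU limits, a rough back-of-the-envelope check can help decide early whether a model is worth trying locally. The sketch below estimates the memory needed for model weights alone (parameter count times bytes per parameter) and compares it with a GPU budget. The function names, the 8 GiB figure used in the example, the parameter counts, and the 50% headroom for activations and framework overhead are all assumptions for illustration; real usage depends on the model, input size, and framework.

```python
def weights_memory_gib(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Approximate memory for model weights alone, in GiB (4 bytes = fp32)."""
    return num_parameters * bytes_per_parameter / (1024 ** 3)


def fits_on_gpu(
    num_parameters: int,
    bytes_per_parameter: int,
    gpu_memory_gib: float,
    headroom_fraction: float = 0.5,
) -> bool:
    """Return True if the weights use at most headroom_fraction of GPU memory.

    The remaining memory is left for activations, inputs, and framework
    overhead; the 0.5 default is an arbitrary, conservative assumption.
    """
    budget_gib = gpu_memory_gib * headroom_fraction
    return weights_memory_gib(num_parameters, bytes_per_parameter) <= budget_gib


# Hypothetical example: a 1-billion-parameter model in fp16 (2 bytes each)
# on a GPU assumed to have 8 GiB of memory.
print(round(weights_memory_gib(1_000_000_000, 2), 2))
print(fits_on_gpu(1_000_000_000, 2, 8.0))
```

Training typically needs several times more memory than inference because gradients and optimizer state are stored alongside the weights, so a model that passes this check for inference may still be too large to fine-tune on the same card.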
Project Categories Architectures/Networks, Artificial Intelligence (AI), Data Science, Visual Computing
Project Keywords Computer Vision, Image Processing, Machine Learning, Neural Networks


Level of Studies

Level 6 (Undergraduate Year 3) yes
Level 7 (Masters) yes
Level 8 (PhD) yes