Yaxin Luo
Vision & Multimodal Foundation Models
Abu Dhabi, UAE · MBZUAI
Research Interests
My long-term goal is to develop intelligent machines capable of perceiving, understanding, and creating multimodal content (e.g., videos). My interests include multimodal machine learning, vision foundation models, and efficient algorithms for foundation models. Recently, I have been focusing on physically aware learning for vision models and on analyzing LLM pretraining data.
Education
PhD in Machine Learning, MBZUAI
PhD Advisors: Prof. Zhiqiang Shen and Prof. Ivan Laptev
BSc in General Engineering (Machine Learning), Technical University of Denmark
Bachelor thesis advisor: Prof. Dimitrios Papadopoulos
BSc in Mathematics, University of Edinburgh
Overall grade for courses taken: UK First-Class; withdrew on 19 Mar 2021 (changed major and country)
Work Experience
Research Assistant, MBZUAI
- Analyzing LLM generalization ability on pure vision tasks using only image data
- Exploring reasoning in MLLMs
Publications (first author only)
ICLR 2025 — γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models.
γ-MoD is a plug-and-play approach that replaces redundant dense layers with Mixture-of-Depth (MoD) layers to reduce computation while maintaining performance.
ECCV 2024 — APL: Anchor-based Prompt Learning for One-stage Weakly Supervised Referring Expression Comprehension.
Introduces an Anchor-based Prompt Encoder (APE) that fuses position, color, and category prompts into anchor features, improving weakly supervised vision–language alignment with auxiliary text reconstruction and visual alignment losses. Achieves SOTA on RefCOCO and ReferIt.
Other Experience
IEEE Cybermatics Congress 2024 — Conference Local Team Member (Aug 2024)
Served as a conference helper and as session chair for the Smart Data workshop.
SciSec 2024 — Conference Helper (Aug 2024)
Institute of Social Science Survey, Peking University — Summer Internship (Jul 2019–Sep 2019)
Internship in a public psychological healthcare project led by the Ministry of Civil Affairs, China.
Other Skills
HPC (Slurm) · DeepSpeed · Distributed Computing · Embedded Systems · Signal Processing & Acoustics