Yaxin Luo
Native Multimodal Foundation Models & Autonomous Agents
Yaxin.Luo@mbzuai.ac.ae · luoyaxin999@gmail.com · +971 585699266 (UAE) · +86 17882057622 (China) · Website · Google Scholar
Abu Dhabi, UAE · MBZUAI
Research Interests
My research vision centers on building Autonomous Agents that can perceive, reason, and act in both digital and physical environments. To achieve this, I focus on developing Native Multimodal Foundation Models with unified understanding, reasoning, generation, and agentic capabilities. These models serve as the core intelligence that enables agentic systems to execute complex real-world tasks (e.g., computer and device use, autonomous research, and continual self-improvement).
Education
PhD in Machine Learning, MBZUAI
Advisors: Prof. Zhiqiang Shen & Prof. Mohsen Guizani
Bachelor of General Engineering (specialized in Machine Learning), Technical University of Denmark
Bachelor thesis advisor: Prof. Dimitrios Papadopoulos · 2800 Kongens Lyngby, Denmark
Bachelor of Mathematics, University of Edinburgh
Overall grade in completed courses: UK First-Class; withdrew 19 Mar 2021 (changed major and country)
Working Experience
Research Assistant, MBZUAI
- Analyzed LLM generalization ability on pure vision tasks using only image data
- Explored reasoning in multimodal large language models (MLLMs)
Conference/Journal Publications
NeurIPS 2025 — Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Luo Y, Li Z, Liu J, Cui J, Zhao X, Shen Z. arXiv:2505.24878 (May 30, 2025).
NeurIPS 2025 — FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation
Cui J, Bi X, Luo Y, Zhao X, Liu J, Shen Z. arXiv:2506.24125 (2025).
ACL 2025 (Main) — DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
Chen J, Myrzakhan A, Luo Y, Khan HM, Bsharat SM, Shen Z. arXiv:2506.01954 (2025).
CVPR 2025 — DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
Chen X, Luo Y, Luo G, Ji J, Ding H, Zhou Y. CVPR 2025, pp. 14347–14357.
ICLR 2025 — γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Luo Y, Luo G, Ji J, Zhou Y, Sun X, Shen Z, Ji R. arXiv:2410.13859 (2024).
ECCV 2024 — APL: Anchor-Based Prompt Learning for One-Stage Weakly Supervised Referring Expression Comprehension
Luo Y, Ji J, Chen X, Zhang Y, Ren T, Luo G. ECCV 2024 (Sep 2024), Springer Nature Switzerland, pp. 198–215.
Other Experience
IEEE Cybermatics Congress 2024 — Conference Local Team Member (Aug 2024)
Conference helper and session chair for the Smart Data workshop.
SciSec 2024 — Conference Helper (Aug 2024)
Institute of Social Science Survey, Peking University — Summer Internship (Jul 2019–Sep 2019)
Interned on a public psychological healthcare project led by the Ministry of Civil Affairs, China.
Other Skills
HPC (Slurm) · DeepSpeed · Distributed Computing · Embedded Systems · Signal Processing & Acoustics · Bioinformatics (basic)