Yaxin Luo

Native Multimodal Foundation Models & Autonomous Agents
Yaxin.Luo@mbzuai.ac.ae · luoyaxin999@gmail.com · +971 585699266 (UAE) · +86 17882057622 (China) · Website · Google Scholar
Abu Dhabi, UAE · MBZUAI

Research Interests

My research vision centers on building Autonomous Agents that can perceive, reason, and act in both digital and physical environments. To achieve this, I focus on developing Native Multimodal Foundation Models with unified understanding, reasoning, generation, and agentic capabilities. These models serve as the core intelligence that enables agentic systems to execute complex real-world tasks (e.g., computer and device use, autonomous research, and continual self-improvement).

Education

PhD in Machine Learning, MBZUAI
Aug 2025–Jun 2029 (expected)
Advisors: Prof. Zhiqiang Shen & Prof. Mohsen Guizani
Bachelor of General Engineering (specialized in Machine Learning), Technical University of Denmark
Sep 2021–Mar 2025
Bachelor thesis advisor: Prof. Dimitrios Papadopoulos · 2800 Kongens Lyngby, Denmark
Bachelor of Mathematics, University of Edinburgh
Sep 2020–Mar 2021 (withdrew)
Overall grade for completed courses: UK First-Class. Withdrew 19 Mar 2021 to change major and country.

Work Experience

Research Assistant, MBZUAI
Jan 2025–Aug 2025
  • Analyzed LLM generalization ability on pure vision tasks using only image data
  • Explored reasoning in multimodal large language models (MLLMs)

Conference/Journal Publications

NeurIPS 2025 — Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Luo Y, Li Z, Liu J, Cui J, Zhao X, Shen Z. arXiv:2505.24878 (May 30, 2025).
NeurIPS 2025 — FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation
Cui J, Bi X, Luo Y, Zhao X, Liu J, Shen Z. arXiv:2506.24125 (2025).
ACL 2025 (Main) — DRAG: Distilling RAG for SLMs from LLMs to Transfer Knowledge and Mitigate Hallucination via Evidence and Graph-based Distillation
Chen J, Myrzakhan A, Luo Y, Khan HM, Bsharat SM, Shen Z. arXiv:2506.01954 (2025).
CVPR 2025 — DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
Chen X, Luo Y, Luo G, Ji J, Ding H, Zhou Y. CVPR 2025, pp. 14347–14357.
ICLR 2025 — γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Luo Y, Luo G, Ji J, Zhou Y, Sun X, Shen Z, Ji R. arXiv:2410.13859 (2024).
ECCV 2024 — APL: Anchor-Based Prompt Learning for One-Stage Weakly Supervised Referring Expression Comprehension
Luo Y, Ji J, Chen X, Zhang Y, Ren T, Luo G. ECCV 2024 (Sep 2024), Springer Nature Switzerland, pp. 198–215.

Other Experience

IEEE Cybermatics Congress 2024 — Conference Local Team Member (Aug 2024)
Conference helper and session chair for the Smart Data workshop.
SciSec 2024 — Conference Helper (Aug 2024)
Institute of Social Science Survey, Peking University — Summer Internship (Jul 2019–Sep 2019)
Interned on a public psychological healthcare project led by the Ministry of Civil Affairs, China.

Other Skills

HPC (Slurm) · DeepSpeed · Distributed Computing · Embedded Systems · Signal Processing & Acoustics · Bioinformatics (basic)
