Hi there! 👋 I am Yaxin Luo.
About Me
Hello! I am a first-year Machine Learning PhD student at MBZUAI, advised by Prof. Zhiqiang Shen, Prof. Ivan Laptev, and Dr. Fabio Pizzati. I also work closely with my friend Xiaofu Chen.
Previously, I received my Bachelor's degree from the Technical University of Denmark, where I was fortunate to be supervised by Prof. Dim P. Papadopoulos.
Recently, I have been focusing on physics-aware learning for vision models and on analyzing LLM pretraining data.
My research interests span:
- Multimodal Foundation Model / World Model: Developing native multimodal foundation models that can perform understanding, reasoning, and generation tasks across video, language, and speech. These models will serve as the core intelligence (the "brain") for Embodied AI, Robotics, and many other applications. (My long-term research interest and belief)
- Reinforcement Learning: I study reinforcement learning on top of pretrained and SFT-initialized models to move beyond imitation and unlock new capabilities in generative modeling and robotics, including training agents inside learned world-model environments.
- Data-centric Machine Learning: Beyond the perspectives of models and algorithms, I also enjoy analyzing and understanding training data, improving data quality, compressing data for training efficiency, and building efficient, scalable data pipelines for curating or synthesizing high-quality training data for foundation models.
News
🚀 OpenCaptchaWorld released and expanded to double the dataset size!
Selected Publications
(* indicates equal contribution)
For a full and up-to-date publication list, please refer to my Google Scholar page.
OpenCaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
APL: Anchor-Based Prompt Learning for One-Stage Weakly Supervised Referring Expression Comprehension
γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension