About Me

I am a researcher at Alibaba Tongyi Lab, working on agentic reinforcement learning, agent self-evolution systems, and LLM post-training techniques. I received my Ph.D. from the School of Computer Science at Peking University in 2024.

My main research interests include:

Agentic RL: Reinforcement learning algorithms for large language models in dialogue and agent scenarios, including tool use, deep research, and software engineering
Self-Evolving Agent Systems: Self-evolution, continual learning, and autonomous improvement mechanisms for intelligent agents
LLM Post-training: Post-training and fine-tuning techniques for large language models, addressing learning stability and efficiency in long-horizon task scenarios

Hiring: I am looking for researchers and interns interested in agentic RL, agent self-evolution, and LLM post-training techniques. Please feel free to contact me.

Featured Work

AgentEvolver: Towards Efficient Self-Evolving Agent System
Y Zhai, S Tao, C Chen, A Zou, Z Chen, Q Fu, S Mai, L Yu, J Deng, Z Cao, et al.
arXiv preprint arXiv:2511.10395, 2025
[arXiv] [Code]
An efficient self-evolving agent learning system that enables agents to autonomously improve through self-questioning, self-navigating and self-attributing. Open-sourced on ModelScope.
We further present AgentEvolver Game Arena, a multi-agent social game platform featuring Avalon and Diplomacy, designed to study models’ social reasoning capabilities and develop efficient reinforcement learning methods for ultra-long context scenarios.

Selected Publications

Asterisk (*) indicates co-first author. For a complete list, please visit my Google Scholar profile.

ThinkFL: Self-Refining Failure Localization for Microservice Systems via Reinforcement Fine-Tuning
L Zhang*, Y Zhai*, T Jia, C Duan, S Yu, J Gao, B Ding, Z Wu, Y Li
ACM Transactions on Software Engineering and Methodology (TOSEM), 2026

AgentFM: Role-Aware Failure Management for Distributed Databases with LLM-Driven Multi-Agents
L Zhang*, Y Zhai*, T Jia, X Huang, C Duan, Y Li
Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering (FSE), 2025

Provoking Multi-modal Few-Shot LVLM via Exploration-Exploitation In-Context Learning
C Chen*, Y Zhai*, Y Zhao, J Gao, B Ding, J Li
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2025

CuES: A Curiosity-driven and Environment-grounded Synthesis Framework for Agentic RL
S Mai*, Y Zhai*, Z Chen, C Chen, A Zou, S Tao, Z Liu, B Ding
arXiv preprint arXiv:2512.01311, 2025
[arXiv]

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models
L Pan, S Tao, Y Zhai, Z Fu, L Fang, M He, L Zhang, Z Liu, B Ding, A Liu, et al.
arXiv preprint arXiv:2512.09675, 2025

MicroRemed: Benchmarking LLMs in Microservices Remediation
L Zhang, Y Zhai, T Jia, C Duan, M He, L Pan, Z Liu, B Ding, Y Li
arXiv preprint arXiv:2511.01166, 2025

Auto-Rubric: Learning to Extract Generalizable Criteria for Reward Modeling
L Xie, S Huang, Z Zhang, A Zou, Y Zhai, D Ren, K Zhang, H Hu, B Liu, et al.
arXiv preprint arXiv:2510.17314, 2025

Population-Based Evolutionary Gaming for Unsupervised Person Re-identification
Y Zhai, P Peng, M Jia, S Li, W Chen, X Gao, Y Tian
International Journal of Computer Vision, 2023
[arXiv]

Dynamic Belief for Decentralized Multi-Agent Cooperative Learning
Y Zhai, P Peng, C Su, Y Tian
International Joint Conference on Artificial Intelligence (IJCAI), 2023

Stabilizing Visual Reinforcement Learning via Asymmetric Interactive Cooperation
Y Zhai, P Peng, Y Zhao, Y Huang, Y Tian
IEEE International Conference on Computer Vision (ICCV), 2023

Simoun: Synergizing Interactive Motion-Appearance Understanding for Vision-Based Reinforcement Learning
Y Huang, P Peng, Y Zhao, Y Zhai, H Xu, Y Tian
IEEE International Conference on Computer Vision (ICCV), 2023

AD-Cluster: Augmented Discriminative Clustering for Domain Adaptive Person Re-identification
Y Zhai, S Lu, Q Ye, X Shan, J Chen, R Ji, Y Tian
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
[PDF]

Multiple Expert Brainstorming for Domain Adaptive Person Re-identification
Y Zhai, Q Ye, S Lu, M Jia, R Ji, Y Tian
European Conference on Computer Vision (ECCV), 2020
[arXiv] [Code]

Education

Ph.D. in Computer Science, Peking University, 2018-2024
B.E. in Communication Engineering, Beijing University of Posts and Telecommunications, 2014-2018

Honors and Awards

Alibaba Star Program
Huawei’s Top Minds Program
National Scholarship (2020)
Outstanding Graduate, Peking University (2024)

Yunpeng Zhai (翟云鹏)
