Jiayu (Mila) Wang

Contact: milawang [at] cs [dot] wisc [dot] edu

I am Jiayu (pronunciation: “Jee-ah-yü Wahng”), a fourth-year PhD candidate in Computer Sciences at UW-Madison. I am fortunate to be advised by Prof. Aws Albarghouthi and Prof. Fred Sala (Sprocket Lab).

I am passionate about building efficient and intelligent agentic systems. My recent work focuses on evaluating deep research agents, understanding what reinforcement learning actually improves in LLM reasoning, and cost-effective LLM adaptation.

I’m always happy to discuss research, answer questions, or just chat! Feel free to reach out through my socials.

Outside of research, I love playing tennis 🎾 and try to get on the court as often as I can, usually 4–5 times a week.

I am currently on the 2025–2026 job market! Feel free to reach out if you think there might be a good fit!

news

Jan 27, 2026 LiveResearchBench, a live benchmark for deep research paired with a comprehensive evaluation suite that closely aligns with human judgment across six dimensions, has been accepted to ICLR 2026. Check out our paper, code, and dataset! 🎉🎉
Sep 18, 2025 SPARKLE has been accepted to NeurIPS 2025. Check out our paper, code, dataset, and checkpoints! 🎉🎉
Jun 5, 2025 🚀 The SPARKLE preprint is now live on arXiv! Reinforcement learning has driven impressive gains in LLM reasoning, but what exactly does RL improve? SPARKLE answers this question with a fine-grained evaluation framework that dissects reasoning into plan-following, problem decomposition, and knowledge use. The results are surprising: explicit plans can actually hurt on the hardest problems, yet RL-tuned models remain far more robust and flexible in handling them. We also find that RL clearly improves knowledge integration. And we push back on a common myth: hard problems can be useful for RL, even when they seem unrewarding. SPARKLE shows how to turn those tough cases into real training signal.
Apr 30, 2025 🚀 The COSMOS preprint is now available on arXiv! With training-time and test-time adaptation strategies for LLMs exploding in number, finding the best one can feel like a wild goose chase. COSMOS makes it easy by predicting performance and cost accurately and efficiently, so you don’t have to burn GPU hours testing every option. Smarter choices, fewer experiments.
Apr 23, 2025 I passed my qualifying exam!

selected publications (*equal contribution)

2025

  1. LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
    Jiayu Wang*, Yifei Ming*, Riya Dulepet, Qinglin Chen, Austin Xu, Zixuan Ke, Frederic Sala, Aws Albarghouthi, Caiming Xiong, and Shafiq Joty
    In Proceedings of the 14th International Conference on Learning Representations (ICLR 2026)
  2. Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
    Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, and Frederic Sala
    In Proceedings of the 39th Annual Conference on Neural Information Processing Systems (NeurIPS 2025)
  3. COSMOS: Predictable and Cost-Effective Adaptation of LLMs
    Jiayu Wang, Aws Albarghouthi, and Frederic Sala
    arXiv preprint, Apr 2025

2024

  1. Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
    Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, and Neel Joshi
    In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
  2. Grammar-Aligned Decoding
    Kanghee Park*, Jiayu Wang*, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni
    In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)