Jiayu (Mila) Wang

Contact: milawang [at] cs [dot] wisc [dot] edu

I am Jiayu (pronunciation: “Jee-ah-yü Wahng”), a fourth-year PhD candidate in Computer Sciences at UW-Madison. I am fortunate to be advised by Prof. Aws Albarghouthi and Prof. Fred Sala (Sprocket Lab).

I am passionate about building efficient and intelligent agentic systems. My recent work focuses on:

I’m always happy to discuss research, answer questions, or just chat! Feel free to reach out through my socials (see RHS →).

Outside of research, I love playing tennis🎾 and try to get on the court as often as I can—usually 4–5 times a week.

I am currently on the 2025–2026 job market! Feel free to reach out if you think there might be a good fit!

news

Sep 18, 2025 SPARKLE has been accepted to NeurIPS 2025! Check out our paper, code, dataset, and checkpoints! 🎉🎉
Jun 5, 2025 🚀 SPARKLE preprint is now live on arXiv! Reinforcement learning has driven impressive gains in LLM reasoning—but what exactly does RL improve? SPARKLE answers this question with a fine-grained evaluation framework that dissects reasoning into plan-following, problem decomposition, and knowledge use. The results are surprising: explicit plans can actually hurt on the hardest problems, yet RL-tuned models remain far more robust and flexible in handling them. We also find clear gains in how RL enhances knowledge integration. And we push back on a common myth: hard problems can be useful for RL—even when they seem unrewarding. SPARKLE shows how to turn those tough cases into real training signal.
Apr 30, 2025 🚀 COSMOS preprint is now available on arXiv! With training-time and test-time adaptation strategies for LLMs exploding in number, figuring out the best one can feel like a wild goose chase. COSMOS makes it easy — predicting performance and cost accurately and efficiently so you don’t have to burn GPU hours testing every option. Smarter choices, fewer experiments.
Apr 23, 2025 I passed my qualifying exam!
Dec 9, 2024 Attended NeurIPS 2024 in Vancouver and presented two papers:
  • SpatialEval: We took a fresh look at how language models and vision-language models handle spatial reasoning. The twist? We tested them on our new benchmark SpatialEval across TQA, VQA, and VTQA. Found some pretty surprising results!
  • GAD: Ever wondered if constrained decoding changes how LLMs actually behave? We proved it does, and proposed the first solution to the distribution-distortion problem.

selected publications (*equal contribution)

2025

  1. LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
    Jiayu Wang*, Yifei Ming*, Riya Dulepet, Qinglin Chen, Austin Xu, Zixuan Ke, Frederic Sala, Aws Albarghouthi, Caiming Xiong, and Shafiq Joty
    Oct 2025
  2. Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
    Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, and Frederic Sala
    In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Jun 2025
  3. COSMOS: Predictable and Cost-Effective Adaptation of LLMs
    Jiayu Wang, Aws Albarghouthi, and Frederic Sala
    Apr 2025

2024

  1. Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
    Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, and Neel Joshi
    In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Jun 2024
  2. Grammar-Aligned Decoding
    Kanghee Park*, Jiayu Wang*, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni
    In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), May 2024