Jiayu (Mila) Wang

Contact: milawang [at] cs [dot] wisc [dot] edu

I am Jiayu (pronunciation: “Jee-ah-yü Wahng”), a fourth-year PhD candidate in Computer Sciences at UW-Madison. I am fortunate to be advised by Prof. Aws Albarghouthi and Prof. Fred Sala (Sprocket Lab).

I am passionate about building efficient and intelligent agentic systems. My recent work focuses on:

I’m always happy to discuss research, answer questions, or just chat! Feel free to reach out through my socials (see RHS →).

Outside of research, I love playing tennis🎾 and try to get on the court as often as I can—usually 4–5 times a week.

I am currently on the 2025–2026 job market! Feel free to reach out if you think there might be a good fit!

news

Sep 18, 2025 SPARKLE has been accepted to NeurIPS 2025! Check out our paper, code, dataset, and checkpoints! 🎉🎉
Jun 5, 2025 🚀 SPARKLE preprint is now live on arXiv! Reinforcement learning has driven impressive gains in LLM reasoning—but what exactly does RL improve? SPARKLE answers this question with a fine-grained evaluation framework that dissects reasoning into plan-following, problem decomposition, and knowledge use. The results are surprising: explicit plans can actually hurt on the hardest problems, yet RL-tuned models remain far more robust and flexible in handling them. We also find clear gains in how RL enhances knowledge integration. And we push back on a common myth: hard problems can be useful for RL—even when they seem unrewarding. SPARKLE shows how to turn those tough cases into real training signal.
Apr 30, 2025 🚀 COSMOS preprint is now available on arXiv! With training-time and test-time adaptation strategies for LLMs exploding in number, figuring out the best one can feel like a wild goose chase. COSMOS makes it easy — predicting performance and cost accurately and efficiently so you don’t have to burn GPU hours testing every option. Smarter choices, fewer experiments.
Apr 23, 2025 I passed my qualifying exam!
Dec 9, 2024 Attended NeurIPS 2024 in Vancouver and presented two papers:
  • SpatialEval: We took a fresh look at how language models and vision-language models handle spatial reasoning. The twist? We tested them on our new benchmark SpatialEval across TQA, VQA, and VTQA. Found some pretty surprising results!
  • GAD: Ever wondered if constrained decoding changes how LLMs actually behave? We proved it does, and proposed the first solution to the distribution-distortion problem.

selected publications (*equal contribution)

2025

  1. LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
    Jiayu Wang*, Yifei Ming*, Riya Dulepet, Qinglin Chen, Austin Xu, Zixuan Ke, Frederic Sala, Aws Albarghouthi, Caiming Xiong, and Shafiq Joty
    Oct 2025
  2. Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning
    Jiayu Wang, Yifei Ming, Zixuan Ke, Caiming Xiong, Shafiq Joty, Aws Albarghouthi, and Frederic Sala
    In Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), Jun 2025
  3. COSMOS: Predictable and Cost-Effective Adaptation of LLMs
    Jiayu Wang, Aws Albarghouthi, and Frederic Sala
    Apr 2025

2024

  1. Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models
    Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Yixuan Li, and Neel Joshi
    In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), Jun 2024
  2. Grammar-Aligned Decoding
    Kanghee Park*, Jiayu Wang*, Taylor Berg-Kirkpatrick, Nadia Polikarpova, and Loris D’Antoni
    In Proceedings of the 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), May 2024