Jun 5, 2025 | 🚀 SPARKLE preprint is now live on arXiv! Reinforcement learning has driven impressive gains in LLM reasoning—but what exactly does RL improve? SPARKLE answers this question with a fine-grained evaluation framework that dissects reasoning into plan-following, problem decomposition, and knowledge use. The results are surprising: explicit plans can actually hurt on the hardest problems, yet RL-tuned models remain far more robust and flexible in handling them. We also find that RL delivers clear gains in knowledge integration. And we push back on a common myth: hard problems can be useful for RL—even when they seem unrewarding. SPARKLE shows how to turn those tough cases into real training signal. |
Apr 30, 2025 | 🚀 COSMOS preprint is now available on arXiv! With training-time and test-time adaptation strategies for LLMs exploding in number, figuring out the best one can feel like a wild goose chase. COSMOS makes it easy — predicting performance and cost accurately and efficiently so you don’t have to burn GPU hours testing every option. Smarter choices, fewer experiments. |
Apr 23, 2025 | I passed my qualifying exam! |
Dec 9, 2024 | Attended NeurIPS 2024 in Vancouver and presented two papers. SpatialEval: a fresh look at how language models and vision-language models handle spatial reasoning, evaluated on our new benchmark SpatialEval across TQA, VQA, and VTQA—with some pretty surprising results! GAD: ever wondered whether constrained decoding changes how LLMs actually behave? We proved it does, and came up with the first solution to the resulting distribution distortion problem. |
Sep 25, 2024 | Two first/co-first authored papers are accepted to NeurIPS 2024! 🎉🎉 |