I am a second-year PhD student studying computer science at UC Santa Cruz, advised by Chenguang Wang. Previously, I completed my Bachelor’s degree in Computer Science and Mathematics, as well as a Master’s degree in Computer Science, both at Washington University in St. Louis. You can find my CV here.
My current research focuses on LLM post-training, agentic AI, and scaling test-time compute for hard-to-verify tasks. When I’m not doing research, I’m often rock climbing 🧗.
selected publications
See the full list of publications. (*) denotes equal contribution.
-
rLLM: A Framework for Post-Training Language Agents
Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, and 5 more authors
2025
-
Weak Discriminative Verification Enables Strong Test-time Scaling
Kyle Montgomery*, Sijun Tan*, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, and Chenguang Wang
In Workshop on Efficient Reasoning at NeurIPS 2025, 2025
-
LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
Sai Kolasani, Maxim Saplin, Nicholas Crispino, Kyle Montgomery, Jared Davis, Matei Zaharia, Chi Wang, and Chenguang Wang
In Workshop on Foundations of Reasoning in Language Models at NeurIPS 2025, 2025
-
VMDT: Decoding the Trustworthiness of Video Foundation Models
Yujin Potter*, Zhun Wang*, Nicholas Crispino*, Kyle Montgomery*, Alexander Xiong*, Ethan Chang, Francesco Pinto, Yuqi Chen, and 6 more authors
In Advances in Neural Information Processing Systems, 2025
-
Humanity’s Last Exam
Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, and 1101 more authors
2025
-
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, and Ion Stoica
In Proceedings of the Thirteenth International Conference on Learning Representations, 2025
-
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
-
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang
In Proceedings of the Forty-first International Conference on Machine Learning, 2024