Kyle Montgomery

I am a second-year PhD student studying computer science at UC Santa Cruz, advised by Chenguang Wang. Previously, I completed my Bachelor’s degree in Computer Science and Mathematics, as well as a Master’s degree in Computer Science, both at Washington University in St. Louis. You can find my CV here.

My current research focuses on LLM post-training, agentic AI, and scaling test-time compute for hard-to-verify tasks. I am also a project lead for rLLM. When not engaging in research, I’m frequently rock climbing 🧗.

selected publications

See the full list of publications. (*) denotes equal contribution.

rLLM Blog

rLLM v0.2: RL Training over General Agentic Programs

Sijun Tan, Kyle Montgomery, and the rLLM Team

Oct 2025

Blog Code

Preprint

Weak Discriminative Verification Enables Strong Test-time Scaling

Kyle Montgomery*, Sijun Tan*, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, and Chenguang Wang

Oct 2025

A previous version of this work was accepted to the workshop on Efficient Reasoning at NeurIPS 2025.

PDF Blog Code

FoRLM@NeurIPS2025

LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess

Sai Kolasani, Maxim Saplin, Nicholas Crispino, Kyle Montgomery, Jared Davis, Matei Zaharia, Chi Wang, and Chenguang Wang

In Workshop on Foundations of Reasoning in Language Models at NeurIPS 2025, Sep 2025

PDF Code

NeurIPS2025

VMDT: Decoding the Trustworthiness of Video Foundation Models

Yujin Potter*, Zhun Wang*, Nicholas Crispino*, Kyle Montgomery*, Alexander Xiong*, Ethan Chang, Francesco Pinto, Yuqi Chen, and 6 more authors

In Advances in Neural Information Processing Systems, Sep 2025

PDF Code

rLLM Blog

rLLM: A Framework for Post-Training Language Agents

Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, and 5 more authors

Jul 2025

Blog Code

ICLR2025

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, and Ion Stoica

In Proceedings of the Thirteenth International Conference on Learning Representations, Apr 2025

PDF Code

ACL2024

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Aug 2024

PDF Code

ICML2024

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang

In Proceedings of the Forty-first International Conference on Machine Learning, Jul 2024

PDF Code