publications | Kyle Montgomery

2025

rLLM Blog

rLLM v0.2: RL Training over General Agentic Programs

Sijun Tan, Kyle Montgomery, and the rLLM Team

Oct 2025

Blog Code

ER@NeurIPS2025

Budget-aware Test-time Scaling via Discriminative Verification

Kyle Montgomery^*, Sijun Tan^*, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, and Chenguang Wang

In Workshop on Efficient Reasoning at NeurIPS 2025, Oct 2025

PDF Blog Code

FoRLM@NeurIPS2025

LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess

Sai Kolasani, Maxim Saplin, Nicholas Crispino, Kyle Montgomery, Jared Davis, Matei Zaharia, Chi Wang, and Chenguang Wang

In Workshop on Foundations of Reasoning in Language Models at NeurIPS 2025, Sep 2025

PDF Code

NeurIPS2025

VMDT: Decoding the Trustworthiness of Video Foundation Models

Yujin Potter^*, Zhun Wang^*, Nicholas Crispino^*, Kyle Montgomery^*, Alexander Xiong^*, Ethan Chang, Francesco Pinto, Yuqi Chen, and 6 more authors

In Advances in Neural Information Processing Systems, Sep 2025

PDF Code

KnowFM@ACL2025

Predicting Task Performance with Context-aware Scaling Laws

Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, and Chenguang Wang

In Knowledgeable Foundation Models at ACL 2025, Aug 2025

PDF Code

rLLM Blog

rLLM: A Framework for Post-Training Language Agents

Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, and 5 more authors

Jul 2025

Blog Code

ICLR2025

JudgeBench: A Benchmark for Evaluating LLM-Based Judges

Sijun Tan^*, Siyuan Zhuang^*, Kyle Montgomery^*, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, and Ion Stoica

In Proceedings of the Thirteenth International Conference on Learning Representations, Apr 2025

PDF Code

Preprint

Humanity’s Last Exam

Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, and 1101 more authors

Jan 2025

PDF

2024

ACL2024

Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning

Eric Pasewark^*, Kyle Montgomery^*, Kefei Duan, Dawn Song, and Chenguang Wang

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Aug 2024

PDF Code

ICML2024

Agent Instructs Large Language Models to be General Zero-Shot Reasoners

Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang

In Proceedings of the Forty-first International Conference on Machine Learning, Jul 2024

PDF Code