2025

Notion
rLLM: A Framework for Post-Training Language Agents
Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, and 5 more authors
2025

ER@NeurIPS2025
Weak Discriminative Verification Enables Strong Test-time Scaling
Kyle Montgomery*, Sijun Tan*, Yuqi Chen, Siyuan Zhuang, Tianjun Zhang, Raluca Ada Popa, and Chenguang Wang
In Workshop on Efficient Reasoning at NeurIPS 2025, 2025

FoRLM@NeurIPS2025
LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
Sai Kolasani, Maxim Saplin, Nicholas Crispino, Kyle Montgomery, Jared Davis, Matei Zaharia, Chi Wang, and Chenguang Wang
In Workshop on Foundations of Reasoning in Language Models at NeurIPS 2025, 2025

NeurIPS2025
VMDT: Decoding the Trustworthiness of Video Foundation Models
Yujin Potter*, Zhun Wang*, Nicholas Crispino*, Kyle Montgomery*, Alexander Xiong*, Ethan Chang, Francesco Pinto, Yuqi Chen, and 6 more authors
In Advances in Neural Information Processing Systems, 2025

KnowFM@ACL2025
Predicting Task Performance with Context-aware Scaling Laws
Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, and Chenguang Wang
In Knowledgeable Foundation Models at ACL 2025, 2025

Preprint
Humanity’s Last Exam
Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, and 1101 more authors
2025

ICLR2025
JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, Willian Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, and Ion Stoica
In Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

ACL2024
Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024

ICML2024
Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang
In Proceedings of the Forty-first International Conference on Machine Learning, 2024