2025

[Notion] rLLM: A Framework for Post-Training Language Agents
Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, and 5 more authors
2025
[Blog] [Code]

[Preprint] Weak Discriminative Verification Enables Strong Test-time Scaling
K Montgomery*, S Tan*, Y Chen, S Zhuang, T Zhang, R Popa, and C Wang
2025
[PDF] [Code]

[Preprint] LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
S Kolasani, M Saplin, N Crispino, K Montgomery, J Davis, M Zaharia, C Wang, and C Wang
2025
[PDF] [Code]

[NeurIPS2025] VMDT: Decoding the Trustworthiness of Video Foundation Models
Y Potter*, Z Wang*, N Crispino*, A Xiong*, K Montgomery*, F Pinto, E Chang, Y Chen, and 6 more authors
In Advances in Neural Information Processing Systems, 2025
[PDF] [Code]

[KnowFM@ACL2025] Predicting Task Performance with Context-aware Scaling Laws
Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, and Chenguang Wang
In Knowledgeable Foundation Models at ACL 2025, 2025
[PDF] [Code]

[Preprint] Humanity’s Last Exam
Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, and 1101 more authors
2025
[PDF]

[ICLR2025] JudgeBench: A Benchmark for Evaluating LLM-Based Judges
Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, Willian Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, and Ion Stoica
In Proceedings of the Thirteenth International Conference on Learning Representations, 2025
[PDF] [Code]

2024

[ACL2024] Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang
In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
[PDF] [Code]

[ICML2024] Agent Instructs Large Language Models to be General Zero-Shot Reasoners
Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang
In Proceedings of the Forty-first International Conference on Machine Learning, 2024
[PDF] [Code]