publications

Publications in reverse chronological order. (*) denotes equal contribution.

2025

  1. Notion
    rLLM: A Framework for Post-Training Language Agents
    Sijun Tan, Michael Luo, Colin Cai, Tarun Venkat, Kyle Montgomery, Aaron Hao, Tianhao Wu, Arnav Balyan, and 5 more authors
    2025
  2. Preprint
    Weak Discriminative Verification Enables Strong Test-time Scaling
    K Montgomery*, S Tan*, Y Chen, S Zhuang, T Zhang, R Popa, and C Wang
    2025
  3. Preprint
    LLM CHESS: Benchmarking Reasoning and Instruction-Following in LLMs through Chess
    S Kolasani, M Saplin, N Crispino, K Montgomery, J Davis, M Zaharia, C Wang, and C Wang
    2025
  4. NeurIPS2025
    VMDT: Decoding the Trustworthiness of Video Foundation Models
    Y Potter*, Z Wang*, N Crispino*, A Xiong*, K Montgomery*, F Pinto, E Chang, Y Chen, and 6 more authors
    In Advances in Neural Information Processing Systems, 2025
  5. KnowFM@ACL2025
    Predicting Task Performance with Context-aware Scaling Laws
    Kyle Montgomery, David Park, Jianhong Tu, Michael Bendersky, Beliz Gunel, Dawn Song, and Chenguang Wang
    In Knowledgeable Foundation Models at ACL 2025, 2025
  6. Preprint
    Humanity’s Last Exam
    Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, and 1101 more authors
    2025
  7. ICLR2025
    JudgeBench: A Benchmark for Evaluating LLM-Based Judges
    Sijun Tan*, Siyuan Zhuang*, Kyle Montgomery*, William Y. Tang, Alejandro Cuadron, Chenguang Wang, Raluca Ada Popa, and Ion Stoica
    In Proceedings of the Thirteenth International Conference on Learning Representations, 2025

2024

  1. ACL2024
    Re-Tuning: Overcoming the Compositionality Limits of Large Language Models with Recursive Tuning
    Eric Pasewark*, Kyle Montgomery*, Kefei Duan, Dawn Song, and Chenguang Wang
    In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, 2024
  2. ICML2024
    Agent Instructs Large Language Models to be General Zero-Shot Reasoners
    Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, and Chenguang Wang
    In Proceedings of the Forty-first International Conference on Machine Learning, 2024