The First Workshop on Test-time Scaling and Reasoning Models
(ScalR @ COLM 2025)

October 10, 2025, Montreal, Canada

About The Workshop

The ScalR workshop focuses on the emerging challenges and opportunities in test-time scaling and reasoning for large language models. As models continue to grow in size and capability, understanding how to effectively scale their performance and enhance their reasoning abilities during inference has become increasingly important.
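
As a toy illustration, the simplest form of test-time scaling is best-of-N sampling: draw several candidate answers from a model and keep the one a scorer ranks highest, trading extra inference compute for quality. The Python sketch below uses hypothetical generate and score functions as stand-ins for a real LLM sampler and reward model; it is a minimal sketch, not a prescribed method.

    # Best-of-N sampling: a minimal sketch of test-time scaling.
    import random
    from typing import Callable

    def best_of_n(prompt: str,
                  generate: Callable[[str], str],
                  score: Callable[[str, str], float],
                  n: int = 8) -> str:
        """Sample n candidate answers; return the highest-scoring one."""
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))

    # Toy stand-ins (hypothetical, not a real API) so the sketch runs.
    toy_generate = lambda p: random.choice(["42", "41", "forty-two"])
    toy_score = lambda p, a: float(a == "42")  # a verifiable reward
    print(best_of_n("What is 6 * 7?", toy_generate, toy_score))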

Where & When

Palais des Congrès
Montreal, Canada

October 10, 2025

Call For Papers

Submissions and Review Process

We invite researchers and practitioners to submit non-archival short papers (up to 4 pages, excluding references and appendices) describing novel ideas, preliminary results, or negative results related to test-time scaling and reasoning models. Paper submissions will be hosted on OpenReview. Submissions should follow the COLM submission guidelines and must be anonymized for double-blind review. All submissions must be in PDF format; please use the LaTeX style files provided by COLM.

Important Dates

  • Submission Deadline: June 23, 2025 (Submit on OpenReview)
  • Accept/Reject Notification: July 24, 2025

Topics of Interest

Topics of interest include, but are not limited to:

  • Novel test-time algorithms for reasoning, planning, alignment, or agentic tasks
  • Test-time scaling techniques for LLM agents
  • Innovations in model training (algorithms, data, architecture) that facilitate more efficient and robust test-time scaling
  • Test-time scaling for reasoning tasks with verifiable and non-verifiable rewards (illustrated in the sketch after this list)
  • Novel techniques for training and utilization of outcome and process reward models in test-time scaling scenarios
  • Evaluation: benchmarks, simulation environments, evaluation protocols and metrics, and human-in-the-loop evaluation of test-time scaling methods
  • Theoretical foundations of test-time scaling
  • Test-time scaling techniques for multi-modal reasoning
  • Studies on the faithfulness, trustworthiness, and other safety aspects of large reasoning models
  • Applications of LLM test-time scaling: healthcare, robotics, embodiment, chemistry, education, databases, and beyond, with extra encouragement for less-studied domains
  • Societal implications of LLM test-time scaling: bias, equity, misuse, jobs, climate change, and beyond
  • Test-time scaling for everyone: multilinguality, multiculturalism, and inference-time adaptation to new values
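
To make the verifiable-rewards topic concrete, below is a minimal self-consistency (majority-voting) sketch: sample several reasoning chains, extract each final answer, and return the most common one. It is an illustrative baseline only; sample_chain and extract_answer are hypothetical placeholders for a real LLM call and answer parser.

    # Self-consistency: a simple test-time scaling baseline for tasks
    # with a checkable final answer. More samples -> more compute ->
    # (typically) higher accuracy.
    import random
    from collections import Counter
    from typing import Callable

    def self_consistency(prompt: str,
                         sample_chain: Callable[[str], str],
                         extract_answer: Callable[[str], str],
                         n: int = 16) -> str:
        """Majority-vote over answers extracted from n sampled chains."""
        answers = [extract_answer(sample_chain(prompt)) for _ in range(n)]
        return Counter(answers).most_common(1)[0][0]

    # Toy usage with hypothetical stand-ins so the sketch runs end to end.
    chains = ["... so the answer is 12", "... so the answer is 12",
              "... so the answer is 13"]
    print(self_consistency("What is 3 * 4?",
                           lambda p: random.choice(chains),
                           lambda c: c.rsplit(" ", 1)[-1]))  # usually "12"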

Invited Speakers

Aviral Kumar

Carnegie Mellon University

Xuezhi Wang

Google DeepMind

Nathan Lambert

Allen Institute for AI

Lewis Tunstall

Hugging Face

Azalia Mirhoseini

Stanford University

Schedule

Coming Soon

Organizers

Muhammad Khalifa

University of Michigan

PhD candidate focusing on constrained generation and test-time techniques for reasoning. Experienced in organizing local events including NLP@Michigan Day.

Yunxiang Zhang

University of Michigan

PhD candidate researching enhancement of knowledge and reasoning capabilities of LLMs for scientific discovery.

Lifan Yuan

UIUC

PhD student focusing on scalable solutions for LLM post-training and inference in reasoning.

Shivam Agarwal

UIUC

PhD student researching LLM post-training methods and inference scaling.

Hao Peng

Assistant Professor, UIUC

Research focuses on post-pretraining methods, long-context efficiency, and using LLMs for scientific discovery. Recipient of multiple paper awards and industry recognitions.

Sean Welleck

Assistant Professor, CMU

Leads the Machine Learning, Language, and Logic (L3) Lab. Research focuses on LLMs, reasoning, and AI for mathematics. NeurIPS Outstanding Paper Award and NAACL Best Paper Award recipient.

Sewon Min

Assistant Professor, UC Berkeley

Research expertise in LLMs and NLP, with focus on retrieval augmentation and data-centric approaches. Multiple paper awards at ACL, NeurIPS, and ICLR.

Honglak Lee

Professor, University of Michigan

Research spans deep learning, representation learning, and reinforcement learning. Recent work focuses on efficient methods for planning, reasoning, and learning for LLM-based agents. IEEE AI's 10 to Watch, NSF CAREER Award recipient.

Lu Wang

Associate Professor, University of Michigan

Research focuses on enhancing factuality and reasoning capabilities of language models. Extensive experience organizing workshops at major NLP conferences. Program co-chair for NAACL 2025.