FormulaOne

by AAI

A benchmark of novel, expert-level algorithmic problems over graphs that demand deep dynamic programming and logical reasoning. The Shallow and Deeper tiers span moderate to challenging problems, while the Deepest tier is research-level.


All models were sampled with their highest available reasoning settings and a maximum token budget. We also gave the models a diverse few-shot prompt that is highly supportive of FormulaOne problems: its examples are drawn from a broad array of categories and cover many of the subtle details of state design and maintenance.