Benchmarks
Standardized challenge suites built from real games and real player behavior, designed to measure model performance consistently over time. GameLab Benchmarks target core dimensions of intelligence: planning, probabilistic reasoning, imperfect-information handling, and long-horizon strategy.

Because they are grounded in real gameplay data and environments, these benchmarks provide stable, reproducible comparisons across models and over time. They let researchers and organizations evaluate progress against real-world decision-making complexity rather than isolated synthetic tasks, making them directly relevant to next-generation AI systems.
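As a rough illustration of what "reproducible comparison against real play" can mean, the sketch below scores a policy by how often it matches logged human decisions. All names and the agreement metric are assumptions for this example, not GameLab's actual API or methodology:

```python
import random
from dataclasses import dataclass

# Hypothetical structures: none of these names come from GameLab's API.
@dataclass
class Episode:
    game: str            # which game the episode comes from
    states: list         # observed game states, in order
    human_actions: list  # the action a real player took at each state

def evaluate(policy, episodes):
    """Score a policy by agreement with logged human decisions.

    Action-agreement rate is one simple, reproducible metric
    (an assumption here, not GameLab's published metric).
    """
    matches = total = 0
    for ep in episodes:
        for state, human_action in zip(ep.states, ep.human_actions):
            if policy(state) == human_action:
                matches += 1
            total += 1
    return matches / total if total else 0.0

# Toy usage: a random baseline policy over three possible actions.
episodes = [Episode("chess-endgames", ["s1", "s2"], ["a", "b"])]
baseline = lambda state: random.choice(["a", "b", "c"])
print(f"agreement: {evaluate(baseline, episodes):.2f}")
```

Because the episodes are fixed logs rather than live opponents, the same evaluation can be rerun against any model, which is what makes comparisons across systems and over time stable.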
Bespoke Datasets
GameLab creates new game experiences for partners, tailored to capture specific behaviors and signals, enabling rapid collection of high-quality, large-scale datasets from real players.
Human Gameplay Data
Large-scale datasets capturing real human decisions across hundreds of games. These structured logs provide high-quality signals for studying strategic reasoning, long-horizon planning, and decision-making under uncertainty at scale.
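For illustration only, a structured log entry of the kind described above might be modeled as follows. The field names and schema are assumptions for this sketch, not GameLab's actual data format:

```python
from dataclasses import dataclass
from typing import Any

# Illustrative schema; field names are assumptions, not GameLab's format.
@dataclass
class GameplayEvent:
    game_id: str           # which game the event belongs to
    player_id: str         # anonymized player identifier
    turn: int              # position within the match (preserves long-horizon order)
    state: dict[str, Any]  # observable game state at decision time
    action: str            # the decision the player actually made
    timestamp_ms: int      # when the decision was made

event = GameplayEvent(
    game_id="poker-v2",
    player_id="p-1042",
    turn=7,
    state={"pot": 320, "hand": ["Ah", "Kd"], "to_call": 40},
    action="raise",
    timestamp_ms=1_700_000_000_000,
)
```

Keeping the state observed at decision time alongside the chosen action is what makes such logs usable for studying decision-making under uncertainty: each record pairs what the player could see with what they did.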
