Benchmarking

The new benchmarking standard for AI models.

Our benchmarking goes deeper than specialized tasks and surface intelligence. We’ve created a new Cognitive Index Score to measure AI capabilities across eight brain functions. We also offer a live model leaderboard, private on-demand evaluation, and model-versus-human competitive benchmarking.

Our benchmarks are standardized challenge suites, built from real-world games, that measure model performance over time. They create consistent comparisons across systems in areas such as planning, imperfect information, and strategic decision-making.

GameLab Benchmarks are standardized challenge suites built from real games and real player behavior. These benchmarks are designed to measure performance across core dimensions of intelligence, including planning, probabilistic reasoning, imperfect information handling, and long-horizon strategy.

Because they are grounded in real gameplay data and environments, these benchmarks provide stable, reproducible comparisons across models and over time. They enable researchers and organizations to evaluate progress in a way that reflects real-world decision-making complexity, rather than isolated or synthetic tasks, making them highly relevant for next-generation AI systems.

GameLab’s evaluation framework provides a systematic way to measure model performance on tasks that require reasoning, strategy, and decision-making over time. Models are evaluated against real human gameplay patterns, structured objectives, and reproducible scenarios, allowing for consistent tracking of progress across versions and architectures.

Unlike traditional benchmarks that focus on static tasks, this evaluation system emphasizes dynamic performance: how models behave across sequences of decisions, under uncertainty, and in changing environments. The result is a more meaningful understanding of capability: not just whether a model can produce the right answer, but whether it can consistently make strong decisions in complex, real-world-like scenarios.
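As a rough illustration of what tracking performance across sequences of decisions could look like, the Python sketch below compares a model's moves with recorded human moves in a few scenarios and reports both the average and the worst-case agreement. Everything in it is an assumption made for illustration: the function names, the move labels, and the use of human-move agreement as the metric are hypothetical and are not GameLab's actual scoring method.

```python
from statistics import mean

def human_agreement(model_moves, human_moves):
    """Fraction of decision points where the model chose the same move as the human.

    Hypothetical metric, used here only to illustrate sequence-level evaluation.
    """
    if len(model_moves) != len(human_moves):
        raise ValueError("move sequences must have the same length")
    if not model_moves:
        return 0.0
    matches = sum(m == h for m, h in zip(model_moves, human_moves))
    return matches / len(model_moves)

def summarize(episodes):
    """Summarize agreement over several scenarios.

    episodes: list of (model_moves, human_moves) pairs, one pair per scenario.
    Reporting the worst scenario and the spread alongside the mean shows
    consistency, not just average quality.
    """
    rates = [human_agreement(m, h) for m, h in episodes]
    return {
        "mean": mean(rates),
        "worst": min(rates),
        "spread": max(rates) - min(rates),
    }

# Made-up example data: two short scenarios with illustrative move labels.
episodes = [
    (["fold", "raise", "call", "call"], ["fold", "raise", "call", "raise"]),
    (["call", "call"], ["call", "call"]),
]
summary = summarize(episodes)
```

A model with a high mean but a low worst-case score would be strong on average yet inconsistent, which is the distinction the paragraph above draws.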

Environments

We offer a wide array of training environments: our own spaces where AI can safely train and build its capabilities. They range from strategy and spatial tasks to reasoning, up to reinforcement learning with human feedback.

Human Data

Our information-rich decision data is exactly what AI needs to raise its cognitive potential. GameLab’s proprietary data comes from the 22 million monthly players of games that we have created and hosted for decades.

Contact Us

Do you want to know more about the project?