LIVE BENCHMARK

The Agzamov Test

One test that tells you which model is actually smarter.
Strategy, tools, memory, adaptation.

Chess960
No memorized openings. Pure strategy.
vs
Poker
Bluffs, reads, hidden cards.
96.7% Phase 0 win rate
0 format errors

Watch Live

Games are streamed live as they happen. Follow us to get notified.

The Problem

Most AI benchmarks test naked models in a lab. Fixed questions, known answers, no pressure.

But that's not how anyone uses AI. In the real world, models have tools, memory, and orchestration. They face problems that can't be memorized. They work against opponents who adapt.

The Agzamov Test measures the gap. Strip everything away — how good is the model alone? Now add tools and memory back — how much better does it get? That gap is the Agzamov Score (0–100).

A smart model is not a press release. It's a number.

How It Works

Four phases, from a bare sanity check to both sides fully augmented. The delta between phases tells you what actually helps.

Phase 0 · DONE

Sanity Check

Model vs random opponent. Does the harness even work? Can the model play legal moves?
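
A sanity check of this kind can be sketched in a few lines. This is a minimal illustration, not the benchmark's actual harness: it swaps in a toy game (Nim) for Chess960 and Poker, and `model_player`, `random_player`, `play_game`, and `sanity_check` are hypothetical names. The "model" here is a hard-coded strategy standing in for a real model call. A malformed or illegal move counts as a format error and forfeits the game.

```python
import random

def legal_moves(stones):
    """Toy stand-in game (Nim): take 1-3 stones; taking the last stone wins."""
    return [n for n in (1, 2, 3) if n <= stones]

def random_player(stones, rng):
    # Random opponent: picks any legal move, returned as text like a model reply.
    return str(rng.choice(legal_moves(stones)))

def model_player(stones, rng):
    # Placeholder for a real model call (assumption): plays the optimal Nim move.
    target = stones % 4
    return str(target if target else 1)

def play_game(first, second, rng, start=15):
    """Play one game. Returns (winner_index, format_errors_per_player)."""
    players = [first, second]
    errors = [0, 0]
    stones, turn = start, 0
    while True:
        raw = players[turn](stones, rng)
        # The harness validates every reply before applying it.
        if not raw.isdigit() or int(raw) not in legal_moves(stones):
            errors[turn] += 1
            return 1 - turn, errors  # malformed or illegal move forfeits
        stones -= int(raw)
        if stones == 0:
            return turn, errors
        turn = 1 - turn

def sanity_check(games=200, seed=0):
    """Return (model win rate, model format errors) against the random opponent."""
    rng = random.Random(seed)
    wins = format_errors = 0
    for g in range(games):
        model_first = g % 2 == 0  # alternate who moves first
        pair = (model_player, random_player) if model_first else (random_player, model_player)
        winner, errors = play_game(*pair, rng)
        model_idx = 0 if model_first else 1
        wins += winner == model_idx
        format_errors += errors[model_idx]
    return wins / games, format_errors
```

Calling `sanity_check()` returns a win rate and a format-error count, the same two numbers a Phase 0 run reports.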

96.7% win rate · 0 format errors

Phase 1 · IN PROGRESS

Baseline

Naked model vs naked model. No tools, no memory. Just raw capability. This is E0, the reference point for every later phase.

Phase 2 · NEXT

Augmented vs Naked

Give one model tools + memory, keep the other naked. The score difference = how much augmentation actually helps.

Phase 3 · NEXT

Arms Race

Both models get tools + memory. Does augmentation still help when the opponent has it too?

What You Get

Δa

The Delta

Δa = score(with tools) - score(without tools)
Positive = augmentation helps. Zero = your RAG is useless. Negative = your tools make it worse.
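
As a rough sketch of that arithmetic: the page does not say how `score` is computed, so the version below assumes average points per game (win = 1, draw = 0.5, loss = 0). The `tolerance` band around zero is also an assumption, added so that noise is not read as a real effect.

```python
def score(results):
    """Average points per game. Scoring scheme is an assumption: win=1, draw=0.5, loss=0."""
    points = {"win": 1.0, "draw": 0.5, "loss": 0.0}
    return sum(points[r] for r in results) / len(results)

def augmentation_delta(with_tools, without_tools):
    """Delta = score(with tools) - score(without tools)."""
    return score(with_tools) - score(without_tools)

def interpret(delta, tolerance=0.02):
    """Map a delta to a verdict; `tolerance` is a hypothetical noise band."""
    if delta > tolerance:
        return "augmentation helps"
    if delta < -tolerance:
        return "tools make it worse"
    return "no measurable effect"

# Illustrative inputs only, not benchmark results.
example_delta = augmentation_delta(
    ["win"] * 7 + ["loss"] * 3,   # same model, with tools
    ["win"] * 5 + ["loss"] * 5,   # same model, without tools
)
print(interpret(example_delta))
```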

τ

Learning Speed

How many games until augmentation kicks in. Some models figure out their tools in 5 games. Some never do.
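
One way to put a number on "kicks in", sketched under assumptions: the page does not define τ precisely, so this version takes it to be the first game at which the augmented model's rolling win rate clears its naked baseline by a margin. The `window` and `margin` parameters and the `learning_speed` name are hypothetical.

```python
def learning_speed(augmented_results, baseline_rate, window=5, margin=0.1):
    """Return the 1-based game number at which the win rate over the last
    `window` games first exceeds baseline_rate + margin, or None if it never does.

    augmented_results: list of 1 (win) / 0 (loss) in play order.
    """
    for end in range(window, len(augmented_results) + 1):
        recent = augmented_results[end - window:end]
        if sum(recent) / window > baseline_rate + margin:
            return end
    return None

# Illustrative sequence only, not benchmark data.
games = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
print(learning_speed(games, baseline_rate=0.45))
```

A result of `None` is the "some never do" case: the augmented model never pulls clear of its own baseline.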

M × A

Compatibility Matrix

Claude + BrainOps Memory = great. GPT + same memory = meh. Not all models benefit from the same tools.
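
The matrix itself is just deltas indexed by model and augmentation. A minimal sketch, assuming each run yields one `(model, augmentation, delta)` record; the function names and the placeholder labels (`model_a`, `memory`, `search`) are hypothetical, and the numbers are made up for illustration.

```python
from collections import defaultdict

def compatibility_matrix(runs):
    """Build matrix[model][augmentation] = mean delta from (model, augmentation, delta) records."""
    cells = defaultdict(lambda: defaultdict(list))
    for model, augmentation, delta in runs:
        cells[model][augmentation].append(delta)
    return {
        model: {aug: sum(vals) / len(vals) for aug, vals in augs.items()}
        for model, augs in cells.items()
    }

def best_augmentation(matrix, model):
    """Return the augmentation with the highest mean delta for one model."""
    row = matrix[model]
    return max(row, key=row.get)

# Placeholder records, not measured results.
runs = [
    ("model_a", "memory", 0.20),
    ("model_a", "memory", 0.10),
    ("model_a", "search", 0.05),
    ("model_b", "memory", -0.02),
    ("model_b", "search", 0.08),
]
matrix = compatibility_matrix(runs)
print(best_augmentation(matrix, "model_a"), best_augmentation(matrix, "model_b"))
```

Reading across a row shows which tools suit one model; reading down a column shows which models a given tool actually helps.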