Eval Board / MEAT-bench v3
The Leaderboard
Every instrument, human and silicon, on one eval board. Each domain is scored 0–100, and MEAT-Elo is the mean across domains. The point of the manifesto, rendered as a table: no instrument wins everywhere, and the cheapest model on your task is rarely the strongest one.
| # | Instrument | Type | Params | MEAT-Elo ↓ | Biology | Physical | Social | Math | Creative | Logistics | Tier | Status | Cost/token | Value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | The Retired Engineer | MEAT | 120B | 64 | 35 | 60 | 50 | 95 | 55 | 88 | pro | ready | 2.50 | 26 |
| 2 | The Lawyer | MEAT | 240B | 60 | 30 | 25 | 96 | 60 | 75 | 72 | pro | ⏳ 429 | 8.00 | 8 |
| 3 | The Olympic Athlete | MEAT | 60B | 60 | 45 | 99 | 60 | 25 | 50 | 80 | pro | ready | 4.00 | 15 |
| 4 | ChatGPT 5.5 | SILICON | ~2.2T (rumored) | 60 | 85 | 1 | 82 | 97 | 90 | 2 | enterprise | ⏳ 429 | 5.50 | 11 |
| 5 | The Average Adult | MEAT | 70B | 59 | 45 | 65 | 68 | 50 | 60 | 65 | pro | ready | 1.00 | 59 |
| 6 | Anthropic Fable | SILICON | ~900B (MoE, rumored) | 59 | 82 | 1 | 90 | 92 | 89 | 2 | enterprise | ⏳ 429 | 6.00 | 10 |
| 7 | Mythos | SILICON | ~1.8T (rumored) | 58 | 80 | 1 | 78 | 96 | 88 | 2 | enterprise | ⏳ 429 | 5.00 | 12 |
| 8 | Gemini 3 Pro | SILICON | ~1.5T (MoE, rumored) | 58 | 83 | 1 | 80 | 94 | 87 | 2 | enterprise | ⏳ 429 | 4.50 | 13 |
| 9 | Grok 5 | SILICON | ~1.4T (rumored) | 55 | 76 | 1 | 74 | 90 | 86 | 2 | enterprise | ⏳ 429 | 3.80 | 14 |
| 10 | The Surgeon | MEAT | 405B | 52 | 99 | 22 | 55 | 70 | 35 | 28 | pro | ready | 9.50 | 5 |
| 11 | The Teenager | MEAT | 7B | 52 | 25 | 78 | 40 | 45 | 70 | 55 | free | ready | 0.30 | 173 |
| 12 | Kimi 2.6 | SILICON | ~750B (MoE) | 51 | 72 | 1 | 66 | 86 | 78 | 2 | enterprise | ⏳ 429 | 1.00 | 51 |
| 13 | DeepSeek V4 | SILICON | ~720B (MoE) | 50 | 70 | 1 | 60 | 93 | 72 | 2 | enterprise | ⏳ 429 | 0.90 | 56 |
| 14 | Frontier-7B | SILICON | 7B | 31 | 40 | 1 | 42 | 55 | 48 | 1 | enterprise | ready | 0.40 | 78 |
| 15 | The Toddler | MEAT | 1B | 30 | 5 | 30 | 35 | 8 | 92 | 12 | free | ready | 0.10 | 300 |
Methodology: MEAT-Elo is the unweighted mean of the six domain scores. Value = MEAT-Elo ÷ cost-per-token. Both are rounded to whole numbers. Silicon scores 1 on Physical and at most 2 on Logistics, the embodied domains, because no model can wash a car end-to-end. All figures are deadpan satire and reflect no real product's capabilities.
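The methodology is simple enough to recompute. Below is a minimal Python sketch that rebuilds MEAT-Elo and Value for a few rows of the table. It assumes two things the page does not state: that halves round up, and that Value divides the already-rounded MEAT-Elo (the Lawyer row only works that way: 60 ÷ 8.00 = 7.5 → 8, whereas the unrounded mean gives 7.46 → 7). The function names are hypothetical.

```python
import math
from fractions import Fraction

# Domain scores in column order (Biology, Physical, Social, Math, Creative,
# Logistics) plus Cost/token, copied from a few rows of the leaderboard.
# Costs are strings so Fraction parses them exactly, with no float error.
ROWS = {
    "The Retired Engineer": ([35, 60, 50, 95, 55, 88], "2.50"),
    "The Lawyer": ([30, 25, 96, 60, 75, 72], "8.00"),
    "ChatGPT 5.5": ([85, 1, 82, 97, 90, 2], "5.50"),
    "The Teenager": ([25, 78, 40, 45, 70, 55], "0.30"),
    "Frontier-7B": ([40, 1, 42, 55, 48, 1], "0.40"),
}


def round_half_up(x: Fraction) -> int:
    """Round a non-negative exact fraction to the nearest integer, ties up."""
    return math.floor(x + Fraction(1, 2))


def meat_elo(scores: list[int]) -> int:
    """Unweighted mean of the domain scores, rounded (half-up is assumed)."""
    return round_half_up(Fraction(sum(scores), len(scores)))


def value(elo: int, cost_per_token: str) -> int:
    """Rounded MEAT-Elo divided by cost per token, rounded (assumed half-up)."""
    return round_half_up(Fraction(elo) / Fraction(cost_per_token))


if __name__ == "__main__":
    for name, (scores, cost) in ROWS.items():
        elo = meat_elo(scores)
        print(f"{name}: MEAT-Elo {elo}, Value {value(elo, cost)}")
```

Exact fractions keep the tie cases honest: Frontier-7B's Value is exactly 31 ÷ 0.40 = 77.5, which rounds up to the 78 shown on the board.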
Reading the board
The Surgeon tops Biology (99) and bottoms out on Physical (22). ChatGPT 5.5 tops Math (97) yet scores 1 on Physical and 2 on Logistics, the domains that need a body. Capability is a profile, not a number.
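The profile point can be read straight off the table's numbers. This hypothetical Python sketch finds the top instrument in each domain and an instrument's weakest domain; the helper names are illustrative. On a tie, `max` and `list.index` would pick the row listed first, though no column on this board has a tie at the top.

```python
# Domain order matches the leaderboard columns.
DOMAINS = ["Biology", "Physical", "Social", "Math", "Creative", "Logistics"]

# Domain scores copied from the leaderboard, in DOMAINS order.
SCORES = {
    "The Retired Engineer": [35, 60, 50, 95, 55, 88],
    "The Lawyer": [30, 25, 96, 60, 75, 72],
    "The Olympic Athlete": [45, 99, 60, 25, 50, 80],
    "ChatGPT 5.5": [85, 1, 82, 97, 90, 2],
    "The Average Adult": [45, 65, 68, 50, 60, 65],
    "Anthropic Fable": [82, 1, 90, 92, 89, 2],
    "Mythos": [80, 1, 78, 96, 88, 2],
    "Gemini 3 Pro": [83, 1, 80, 94, 87, 2],
    "Grok 5": [76, 1, 74, 90, 86, 2],
    "The Surgeon": [99, 22, 55, 70, 35, 28],
    "The Teenager": [25, 78, 40, 45, 70, 55],
    "Kimi 2.6": [72, 1, 66, 86, 78, 2],
    "DeepSeek V4": [70, 1, 60, 93, 72, 2],
    "Frontier-7B": [40, 1, 42, 55, 48, 1],
    "The Toddler": [5, 30, 35, 8, 92, 12],
}


def domain_leaders(scores: dict[str, list[int]]) -> dict[str, str]:
    """Map each domain to the instrument with the highest score in it."""
    return {
        domain: max(scores, key=lambda name: scores[name][i])
        for i, domain in enumerate(DOMAINS)
    }


def weakest_domain(name: str, scores: dict[str, list[int]]) -> str:
    """Return the domain where the given instrument scores lowest."""
    row = scores[name]
    return DOMAINS[row.index(min(row))]


if __name__ == "__main__":
    for domain, leader in domain_leaders(SCORES).items():
        print(f"{domain}: {leader}")
```

Six domains produce six different leaders: the Surgeon, the Olympic Athlete, the Lawyer, ChatGPT 5.5, the Toddler, and the Retired Engineer.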
The Toddler (300), the Teenager (173), and Frontier-7B (78) post the three best Value scores, capability per unit of cost, while sitting in the bottom half of raw MEAT-Elo. Route by task, not by leaderboard rank.
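One way to act on "route by task" is to pick, for a given domain, the cheapest instrument that clears a minimum score. This is a hypothetical Python sketch, not a feature of the board: the `route` function, the score threshold, and the choice to treat the ⏳ 429 status as unavailable are all assumptions for illustration.

```python
from typing import Optional

# Domain order matches the leaderboard columns.
DOMAINS = ["Biology", "Physical", "Social", "Math", "Creative", "Logistics"]

# name -> (domain scores, Cost/token, True if Status is "ready").
# A False flag stands for the "⏳ 429" (rate-limited) status; an assumption.
BOARD = {
    "The Retired Engineer": ([35, 60, 50, 95, 55, 88], 2.50, True),
    "The Lawyer": ([30, 25, 96, 60, 75, 72], 8.00, False),
    "The Olympic Athlete": ([45, 99, 60, 25, 50, 80], 4.00, True),
    "ChatGPT 5.5": ([85, 1, 82, 97, 90, 2], 5.50, False),
    "The Average Adult": ([45, 65, 68, 50, 60, 65], 1.00, True),
    "Anthropic Fable": ([82, 1, 90, 92, 89, 2], 6.00, False),
    "Mythos": ([80, 1, 78, 96, 88, 2], 5.00, False),
    "Gemini 3 Pro": ([83, 1, 80, 94, 87, 2], 4.50, False),
    "Grok 5": ([76, 1, 74, 90, 86, 2], 3.80, False),
    "The Surgeon": ([99, 22, 55, 70, 35, 28], 9.50, True),
    "The Teenager": ([25, 78, 40, 45, 70, 55], 0.30, True),
    "Kimi 2.6": ([72, 1, 66, 86, 78, 2], 1.00, False),
    "DeepSeek V4": ([70, 1, 60, 93, 72, 2], 0.90, False),
    "Frontier-7B": ([40, 1, 42, 55, 48, 1], 0.40, True),
    "The Toddler": ([5, 30, 35, 8, 92, 12], 0.10, True),
}


def route(domain: str, min_score: int, require_ready: bool = True) -> Optional[str]:
    """Return the cheapest instrument scoring at least min_score in domain.

    Ties on cost go to the higher domain score. Rate-limited rows are
    skipped unless require_ready is False. Returns None if nothing qualifies.
    """
    i = DOMAINS.index(domain)
    candidates = [
        (cost, -scores[i], name)  # sort by cost, then by score descending
        for name, (scores, cost, ready) in BOARD.items()
        if scores[i] >= min_score and (ready or not require_ready)
    ]
    return min(candidates)[2] if candidates else None


if __name__ == "__main__":
    print(route("Math", 90))
    print(route("Math", 90, require_ready=False))
    print(route("Physical", 70))
```

On these numbers, a Math task with a floor of 90 routes to the Retired Engineer, because every silicon row that clears 90 is rate-limited; drop the readiness requirement and it goes to DeepSeek V4. A Physical task with a floor of 70 goes to the Teenager, not the Olympic Athlete.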
Every LLM scores 1 on Physical and no more than 2 on Logistics. You cannot prompt your way into moving an atom: those columns belong to meat.