I Built an Auto-Updating Archive of Every AI Arena Leaderboard

Arena AI has no public API and no historical data. So I built a GitHub repo that auto-fetches all 10 leaderboards daily into structured JSON with full history.

Arena AI (formerly LMSYS Chatbot Arena) is the gold standard for AI model rankings. Thousands of researchers, developers, and AI enthusiasts check it daily to see which models lead the pack.

But there’s a problem: Arena AI has no public API, and no historical data.

You can see today’s rankings on the website. But what were the rankings last week? Last month? When did a model first appear? How fast did it climb? You can’t answer any of these questions.

So I built a solution.

The Repo

arena-ai-leaderboards — a GitHub repo that auto-fetches all 10 Arena AI leaderboards daily and stores them as structured JSON.

Every day at ~01:37 UTC, a GitHub Actions workflow:

  1. Auto-discovers all leaderboard categories from the Arena AI overview page (no hardcoded list — when they add a new category, we pick it up automatically)
  2. Fetches full model rankings via Jina Reader for clean content extraction
  3. Parses the data into structured JSON using Azure OpenAI
  4. Validates against a strict JSON Schema
  5. Commits everything to data/{YYYY-MM-DD}/
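The fetch step (2) can be sketched like this. Jina Reader is commonly used by prefixing the target URL with `https://r.jina.ai/`; the category path follows the `source_url` format shown later in this post, and `fetch_leaderboard` is a hypothetical helper for illustration, not the repo's actual code:

```python
from urllib.request import urlopen

JINA_READER = "https://r.jina.ai/"
ARENA_BASE = "https://arena.ai/leaderboard/"

def reader_url(category: str) -> str:
    """Jina Reader proxies any URL prefixed with r.jina.ai and
    returns the page as clean, markdown-like text."""
    return JINA_READER + ARENA_BASE + category

def fetch_leaderboard(category: str) -> str:
    """Fetch one leaderboard page as clean text (makes a network call)."""
    with urlopen(reader_url(category)) as resp:
        return resp.read().decode("utf-8")
```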

What’s Inside

10 leaderboard categories, covering the full spectrum:

Category         Models   What it covers
Text             67       LLM chat (GPT, Claude, Gemini)
Code             55       Code generation
Vision           30       Multimodal understanding
Text-to-Image    50       Image generation
Text-to-Video    37       Video generation
Image-to-Video   37       Image animation
Image Edit       39       Image editing
Document         13       Document understanding
Search           22       Search & RAG
Video Edit       4        Video editing

300+ models tracked across all categories, with ELO scores, confidence intervals, vote counts, vendor info, and license type (open vs proprietary).

The Data Format

Every leaderboard file follows a unified JSON schema:

{
  "meta": {
    "leaderboard": "text-to-video",
    "source_url": "https://arena.ai/leaderboard/text-to-video",
    "fetched_at": "2026-03-21T05:12:05+00:00",
    "model_count": 37
  },
  "models": [
    {
      "rank": 1,
      "model": "veo-3.1-audio-1080p",
      "vendor": "Google",
      "license": "proprietary",
      "score": 1381,
      "ci": 8,
      "votes": 5537
    }
  ]
}

Consistent, machine-readable, and schema-validated. No more scraping HTML tables.
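To give a rough idea of what the validation step enforces, here is a stdlib-only sketch. The repo validates against a real JSON Schema, so the checks below are an approximation of the required fields, not the actual schema:

```python
import json

REQUIRED_META = {"leaderboard", "source_url", "fetched_at", "model_count"}
REQUIRED_MODEL = {"rank", "model", "vendor", "license", "score", "ci", "votes"}

def check_shape(doc: dict) -> None:
    """Raise ValueError if a leaderboard file is missing required fields."""
    missing = REQUIRED_META - doc.get("meta", {}).keys()
    if missing:
        raise ValueError(f"meta is missing {sorted(missing)}")
    models = doc.get("models", [])
    if doc["meta"]["model_count"] != len(models):
        raise ValueError("model_count does not match the models list")
    for i, m in enumerate(models):
        missing = REQUIRED_MODEL - m.keys()
        if missing:
            raise ValueError(f"models[{i}] is missing {sorted(missing)}")

# The sample document from above passes the check:
sample = json.loads("""
{"meta": {"leaderboard": "text-to-video",
          "source_url": "https://arena.ai/leaderboard/text-to-video",
          "fetched_at": "2026-03-21T05:12:05+00:00",
          "model_count": 1},
 "models": [{"rank": 1, "model": "veo-3.1-audio-1080p", "vendor": "Google",
             "license": "proprietary", "score": 1381, "ci": 8, "votes": 5537}]}
""")
check_shape(sample)  # no exception: the document is well-formed
```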

Quick Access

There’s a free REST API at api.wulong.dev — no auth needed:

# List all leaderboards
curl "https://api.wulong.dev/arena-ai-leaderboards/v1/leaderboards"

# Get LLM rankings
curl "https://api.wulong.dev/arena-ai-leaderboards/v1/leaderboard?name=text"

# Get a specific date (quote the URL so the shell doesn't treat & as a job separator)
curl "https://api.wulong.dev/arena-ai-leaderboards/v1/leaderboard?name=text-to-video&date=2026-03-21"

Or grab raw JSON directly from GitHub:

curl https://raw.githubusercontent.com/oolong-tea-2026/arena-ai-leaderboards/main/data/2026-03-21/text.json

Python:

import requests

text = requests.get(
    "https://api.wulong.dev/arena-ai-leaderboards/v1/leaderboard?name=text"
).json()

for m in text["models"][:10]:
    print(f"#{m['rank']} {m['model']} ({m['vendor']}) — ELO {m['score']}")

Why I Built This

I was tracking model performance for a project and got frustrated. Arena AI’s website is great for a quick glance, but useless for:

  • Trend analysis — how did a model’s ranking change over time?
  • Automated monitoring — alert me when a new model enters the top 5
  • Research — cross-reference rankings across categories
  • Dashboards — feed live data into custom visualizations

The lack of an API and historical data is a real gap. This repo fills it.
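Trend analysis, for instance, falls out of the daily snapshots almost for free. Here is a sketch using two illustrative in-memory snapshots in the repo's format; in practice you would load them from `data/{YYYY-MM-DD}/` for two different dates:

```python
def elo_deltas(old: dict, new: dict) -> dict:
    """Score change per model between two daily snapshots."""
    prev = {m["model"]: m["score"] for m in old["models"]}
    return {m["model"]: m["score"] - prev[m["model"]]
            for m in new["models"] if m["model"] in prev}

# Illustrative data only; real snapshots come from the data/ directory.
monday = {"models": [{"model": "veo-3.1-audio-1080p", "score": 1375}]}
friday = {"models": [{"model": "veo-3.1-audio-1080p", "score": 1381}]}

print(elo_deltas(monday, friday))  # {'veo-3.1-audio-1080p': 6}
```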

Today’s Highlights

As of March 21, 2026:

LLM Arena: Anthropic dominates with Claude Opus 4.6 (thinking variant at #1, standard at #2). Google’s Gemini 3.1 Pro and xAI’s Grok 4.20 are close behind.

Video Generation: Google’s Veo 3.1 sweeps the top 3 spots. OpenAI’s Sora 2 Pro holds #4. The gap is narrowing fast.

What’s Next

The daily snapshots are just the foundation. With historical data accumulating, the interesting stuff becomes possible:

  • Trend charts — visualize ELO trajectories over time
  • Change detection — automated alerts when rankings shift significantly
  • Analysis notebooks — Jupyter notebooks for common research queries
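The change-detection idea above is just a small diff over consecutive snapshots. A sketch, again with illustrative data rather than real snapshots:

```python
def new_in_top_n(old: dict, new: dict, n: int = 5) -> list:
    """Models that entered the top n between two snapshots."""
    prev = {m["model"] for m in old["models"] if m["rank"] <= n}
    return [m["model"] for m in new["models"]
            if m["rank"] <= n and m["model"] not in prev]

# Hypothetical model names, for illustration only.
yesterday = {"models": [{"model": "a", "rank": 1}, {"model": "b", "rank": 2}]}
today = {"models": [{"model": "c", "rank": 1}, {"model": "a", "rank": 2}]}

print(new_in_top_n(yesterday, today))  # ['c']
```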

Star the repo if you find it useful: oolong-tea-2026/arena-ai-leaderboards

The more stars, the more visibility, the more people benefit from having this data open and accessible. 🌟

This post is licensed under CC BY 4.0 by the author.