Free LLM Router
5 free inference APIs. 220+ RPM aggregate. $0 spend.
The only router that steers before the 429 — not after.
Every LLM routing tool (OpenRouter, RouteLLM, OmniRoute) treats a 429 as a provider outage and fails over to the next option. That's the wrong model for free-tier APIs.
Groq isn't down at 30 RPM. It has capacity — just not for you right now.
The problem isn't finding a provider that's up. It's spreading load across the aggregate free budget before any single provider hits its ceiling. Read the x-ratelimit-remaining headers on every response. Steer away before the 429 lands. Pool five providers and you have 220+ RPM at $0 — not because you found one fast provider, but because you're managing a shared budget across all of them simultaneously.
No existing router does this. They're built for paid APIs where a 429 means "try someone else." Free-tier APIs need a different approach entirely.
A local proxy that speaks the OpenAI API format. Drop it in front of any tool that uses an LLM — Claude Code, Cursor, LangChain, your own scripts.
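Anything built on the OpenAI SDK only needs its base URL swapped. A minimal sketch, assuming the proxy listens on localhost:8000 and exposes the standard /v1 paths; the port and the "auto" model alias are illustrative, not the router's actual defaults:

```python
from openai import OpenAI

# Point any OpenAI-compatible client at the local proxy instead of the provider.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed port for the local router
    api_key="unused",                     # provider keys live in the router's .env
)

resp = client.chat.completions.create(
    model="auto",  # hypothetical alias: let the router pick the provider
    messages=[{"role": "user", "content": "One-line summary of HTTP status 429."}],
)
print(resp.choices[0].message.content)
```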
One .env file. The setup wizard validates each key, makes a test call, and reports the rate-limit ceiling per provider. Under 15 minutes from cold start.
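Per provider, the wizard's job looks roughly like this; a sketch using python-dotenv and httpx, where the Groq endpoint and model name are real but the env variable name, the header fallback, and the wizard's internals are assumptions:

```python
import os

import httpx
from dotenv import load_dotenv

load_dotenv()  # pulls GROQ_API_KEY=... (and the other providers' keys) from .env


def probe(name: str, base_url: str, env_var: str, model: str) -> None:
    """Validate one key: make a tiny test call, report the rate-limit ceiling."""
    key = os.environ.get(env_var)
    if not key:
        print(f"{name}: {env_var} not set, skipping")
        return
    r = httpx.post(
        f"{base_url}/chat/completions",
        headers={"Authorization": f"Bearer {key}"},
        json={"model": model, "messages": [{"role": "user", "content": "ping"}], "max_tokens": 1},
        timeout=30,
    )
    r.raise_for_status()
    # Most OpenAI-compatible providers expose their ceiling in a header like this
    # (the exact header name and window vary by provider).
    ceiling = r.headers.get("x-ratelimit-limit-requests", "unknown")
    print(f"{name}: key OK, ceiling {ceiling} requests")


probe("Groq", "https://api.groq.com/openai/v1", "GROQ_API_KEY", "llama-3.3-70b-versatile")
```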
x-ratelimit-remaining and x-ratelimit-reset are ingested in real time. Before each new request, the router steers away from near-limit providers — not after a 429 forces it to.
Each provider is rate-limited individually. The router treats them as a single shared budget — pooling their capacity so you're never blocked waiting for one to recover.
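A minimal sketch of that loop under assumed names and thresholds (the real parsing and steering policy will differ): update each provider's budget from the x-ratelimit-remaining / x-ratelimit-reset headers on every response, then route the next request to whoever has the most headroom, skipping anyone near their ceiling.

```python
import time
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    rpm_ceiling: int          # free-tier requests per minute
    remaining: int = 0        # last value seen in x-ratelimit-remaining
    reset_at: float = 0.0     # epoch seconds when the window resets


def headroom(p: Provider) -> float:
    if time.time() >= p.reset_at:
        return 1.0            # window has reset; assume a full budget again
    return p.remaining / p.rpm_ceiling


def ingest(p: Provider, headers: dict[str, str]) -> None:
    """Update a provider's budget from the rate-limit headers on every response."""
    if "x-ratelimit-remaining" in headers:
        p.remaining = int(headers["x-ratelimit-remaining"])
    if "x-ratelimit-reset" in headers:
        # Assumed here to be seconds-until-reset; providers vary in format.
        p.reset_at = time.time() + float(headers["x-ratelimit-reset"])


def pick(pool: list[Provider], floor: float = 0.15) -> Provider:
    """Steer before the 429: most headroom wins, near-limit providers are skipped."""
    open_providers = [p for p in pool if headroom(p) > floor]
    candidates = open_providers or pool   # if everyone is near-limit, pick the least bad
    return max(candidates, key=headroom)
```

On a pool like the one below, a burst of requests spreads across Groq, Cerebras, DeepSeek, Mistral, and Gemini in proportion to how much of each free-tier window is still open.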
| Provider | Free RPM | Model | Production use? |
|---|---|---|---|
| Groq | 30 | Llama 3.3 70B | ✓ |
| Cerebras | 30 | Qwen 3 235B | ✓ |
| DeepSeek | 60 | DeepSeek Chat | ✓ |
| Mistral | 60 | Mistral Small | dev / eval tier |
| Gemini | ~60 | Gemini 2.5 Pro | ✓ (1M ctx) |
| Total | 220+ | load-balanced | ✓ |
Keys are your own — never shared. Mistral's free tier is marked dev/eval by their ToS; upgrade to Scale for production workloads.
At 220 RPM with a 2,000-token average request, the pool delivers 26 million tokens per hour. Running two active hours per working day across the year, that's 13.7 billion free tokens — valued at Claude Haiku API rates.
How the $28K is calculated: 13.7B tokens at Claude Haiku 3.5 pricing ($0.80/M input · $4/M output, assuming a 60/40 input/output split) = $6,576 input + $21,920 output = $28,496/year in equivalent API spend — eliminated entirely.
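The same arithmetic as a quick check (260 working days assumed for "two active hours per working day"; the pricing step uses the rounded 13.7B figure, as above):

```python
rpm, avg_tokens = 220, 2_000
tokens_per_hour = rpm * 60 * avg_tokens      # 26,400,000 ≈ 26M tokens/hour
tokens_per_year = tokens_per_hour * 2 * 260  # 2 h/day × 260 working days ≈ 13.7B

haiku_in, haiku_out = 0.80, 4.00             # Claude Haiku 3.5, $ per million tokens
value = 13.7e9 * 0.60 / 1e6 * haiku_in + 13.7e9 * 0.40 / 1e6 * haiku_out
print(f"{tokens_per_year / 1e9:.1f}B tokens ≈ ${value:,.0f}/year")  # 13.7B tokens ≈ $28,496/year
```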
Other routers target paid APIs. Their signals are cost-per-token and latency. A 429 is an outage to them — they have no concept of budget steering.
| Tool | Free-tier aware? | Steers before 429? | Reads ratelimit headers? |
|---|---|---|---|
| OpenRouter | ✗ (paid focus) | ✗ | ✗ |
| RouteLLM | ✗ | ✗ | ✗ |
| OmniRoute | partial | ✗ (reacts after) | ✗ |
| LiteLLM (default) | ✗ | ✗ | ✗ |
| Free LLM Router | ✓ built for it | ✓ | ✓ real-time |
Ships as a Docker Compose bundle + Claude Code MCP server. One command to start.
Early access
Early access list. No spam. One email when it's ready.
A spawn_free_agent tool that Claude Code can call directly. Reads and research happen in a free-tier loop outside your Claude Max context window, keeping it lean for the work only Claude can do.
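Roughly how that tool gets exposed, sketched with the MCP Python SDK's FastMCP; only the tool name comes from above, while the prompt parameter, proxy URL, and model alias are assumptions:

```python
from mcp.server.fastmcp import FastMCP
from openai import OpenAI

mcp = FastMCP("free-llm-router")
free_pool = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")  # the local proxy


@mcp.tool()
def spawn_free_agent(task: str) -> str:
    """Run a read/research task on the free-tier pool, outside Claude's context window."""
    resp = free_pool.chat.completions.create(
        model="auto",  # hypothetical alias: the router picks the provider
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    mcp.run()  # Claude Code connects to this process as an MCP server
```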
The router reads x-ratelimit-remaining on every response and steers traffic before the limit lands. Under bursty parallel workloads (5–10 simultaneous agents) the difference is significant: OpenRouter queues and retries; this router distributes load upfront and rarely retries at all.
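From the caller's side, a burst looks like this: ten parallel agents hit one proxy and the spreading happens upstream. A sketch with asyncio and the async OpenAI client, under the same port and model-alias assumptions as above:

```python
import asyncio

from openai import AsyncOpenAI

proxy = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")


async def agent(task: str) -> str:
    resp = await proxy.chat.completions.create(
        model="auto",  # hypothetical alias: the router picks a provider per request
        messages=[{"role": "user", "content": task}],
    )
    return resp.choices[0].message.content


async def main() -> None:
    # Ten simultaneous agents: the proxy spreads them across the pool up front
    # instead of queueing them behind any single provider's 30 RPM ceiling.
    results = await asyncio.gather(*(agent(f"Research item {i}") for i in range(10)))
    for r in results:
        print(r[:80])


asyncio.run(main())
```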