Free LLM Router

Stop hitting rate limits.
Pool them.

5 free inference APIs. 220+ RPM aggregate. $0 spend.
The only router that steers before the 429 — not after.

220+ free RPM
26M tokens / hour
~$28K annual equivalent
$0 spend
Join the waitlist

01 · The insight

A 429 isn't a failure.
It's a budget signal.

Every LLM routing tool (OpenRouter, RouteLLM, OmniRoute) treats a 429 as a provider outage and fails over to the next option. That's the wrong model for free-tier APIs.

Groq isn't down at 30 RPM. It has capacity — just not for you right now.

The problem isn't finding a provider that's up. It's spreading load across the aggregate free budget before any single provider hits its ceiling. Read the x-ratelimit-remaining headers on every response. Steer away before the 429 lands. Pool five providers and you have 220+ RPM at $0 — not because you found one fast provider, but because you're managing a shared budget across all of them simultaneously.
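The steering loop itself is small. Here is a minimal sketch of budget-aware selection, assuming per-provider state kept fresh from those headers; the provider names, numbers, and headroom threshold are illustrative, not the shipped defaults:

```python
import time

# Hypothetical per-provider budget snapshot, refreshed from the
# x-ratelimit-remaining / x-ratelimit-reset headers on every response.
budgets = {
    "groq":     {"remaining": 4,  "limit": 30, "reset_at": time.time() + 22},
    "cerebras": {"remaining": 19, "limit": 30, "reset_at": time.time() + 40},
    "deepseek": {"remaining": 41, "limit": 60, "reset_at": time.time() + 12},
}

HEADROOM = 0.15  # steer away once a provider is within 15% of its ceiling

def pick_provider(budgets: dict) -> str:
    """Pick the provider with the most remaining budget, skipping any that
    are near their limit -- steering before a 429, not after."""
    now = time.time()
    candidates = []
    for name, b in budgets.items():
        # A provider whose window has reset is back at full budget.
        remaining = b["limit"] if now >= b["reset_at"] else b["remaining"]
        if remaining / b["limit"] > HEADROOM:
            candidates.append((remaining, name))
    if not candidates:
        raise RuntimeError("whole pool is near its ceiling; wait for a reset")
    return max(candidates)[1]

print(pick_provider(budgets))  # -> "deepseek" (41 of 60 left; groq is skipped)
```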

No existing router does this. They're built for paid APIs where a 429 means "try someone else." Free-tier APIs need a different approach entirely.

02 · How it works

One endpoint.
Five providers. Zero config.

A local proxy that speaks the OpenAI API format. Drop it in front of any tool that uses an LLM — Claude Code, Cursor, LangChain, your own scripts.
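From the client side, that means pointing any OpenAI-compatible client at the proxy instead of the provider. A sketch of what usage could look like; the local port and the pooled model alias are assumptions for illustration, not documented defaults:

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # the local router, not a provider API
    api_key="unused",  # real provider keys live in the router's own .env
)

resp = client.chat.completions.create(
    model="free-pool",  # hypothetical alias; the router picks the provider
    messages=[{"role": "user", "content": "Summarise this ticket: ..."}],
)
print(resp.choices[0].message.content)
```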

01 · Add your five free API keys
One .env file. The setup wizard validates each key, makes a test call, and reports the rate limit ceiling per provider. Under 15 minutes from cold start.
02 · The router reads rate-limit headers on every response
x-ratelimit-remaining and x-ratelimit-reset are ingested in real time. Before each new request, the router steers away from near-limit providers, not after a 429 forces it to (see the header-ingestion sketch after these steps).
03 · Load spreads across the pool automatically
The aggregate ceiling rises as providers recover. Run 10 parallel research agents. Send 200 requests per minute. The pool absorbs it. No manual switching, no quota management.
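As referenced in step 02, here is a sketch of the header ingestion, assuming OpenAI-style x-ratelimit-* header names with a seconds-based reset value; real providers vary in both names and units, so actual parsing is provider-specific:

```python
import time

# Per-provider budget state; a real router would track token budgets too.
state = {"groq": {"remaining": 30, "reset_at": 0.0}}

def ingest_ratelimit_headers(state: dict, provider: str, headers: dict) -> None:
    """Update one provider's budget from the rate-limit headers on a response."""
    remaining = headers.get("x-ratelimit-remaining-requests")
    reset = headers.get("x-ratelimit-reset-requests")  # assumed: seconds to reset
    if remaining is not None:
        state[provider]["remaining"] = int(remaining)
    if reset is not None:
        state[provider]["reset_at"] = time.time() + float(reset)

# Example: one response from an assumed "groq" upstream updates the budget.
ingest_ratelimit_headers(state, "groq", {
    "x-ratelimit-remaining-requests": "7",
    "x-ratelimit-reset-requests": "18.5",
})
print(state["groq"])  # remaining: 7, reset_at: ~18.5s from now
```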

03 · The pool

220+ RPM for free.
Across five providers.

Each provider is rate-limited individually. The router treats them as a single shared budget — pooling their capacity so you're never blocked waiting for one to recover.

Provider    Free RPM   Model            Production use?
Groq        30         Llama 3.3 70B    ✓
Cerebras    30         Qwen 3 235B      ✓
DeepSeek    60         DeepSeek Chat    ✓
Mistral     60         Mistral Small    dev / eval tier
Gemini      ~60        Gemini 2.5 Pro   ✓ (1M ctx)
Total       220+       load-balanced

Keys are your own — never shared. Mistral's free tier is marked dev/eval by their ToS; upgrade to Scale for production workloads.

04 · What it's worth

13 billion tokens.
~$28,000 of equivalent API spend. Free.

At 220 RPM with a 2,000-token average request, the pool delivers 26 million tokens per hour. Running two active hours per working day across the year, that's 13.7 billion free tokens — valued at Claude Haiku API rates.

440K tokens / minute · at 2,000 tokens/request × 220 RPM
13.7B tokens / year · at 2 active hours/day, 260 working days
~$28K annual equivalent · at Claude Haiku 3.5 API rates

How the $28K is calculated: 13.7B tokens at Claude Haiku 3.5 pricing ($0.80/M input · $4/M output, assuming a 60/40 input/output split) = $6,576 input + $21,920 output = $28,496/year in equivalent API spend — eliminated entirely.
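The same arithmetic as a runnable check (the copy above prices the rounded 13.7B figure, giving $28,496; pricing the exact token count gives about $28,554, and both round to ~$28K):

```python
# Stated assumptions: 220 RPM, 2,000 tokens/request, 2 active hours/day,
# 260 working days, Claude Haiku 3.5 at $0.80/M input and $4/M output,
# with a 60/40 input/output split.
rpm, tokens_per_req = 220, 2_000
tokens_per_min = rpm * tokens_per_req             # 440,000
tokens_per_hour = tokens_per_min * 60             # 26.4M
tokens_per_year = tokens_per_hour * 2 * 260       # 13,728,000,000

input_cost = tokens_per_year * 0.6 / 1e6 * 0.80   # ~$6,589
output_cost = tokens_per_year * 0.4 / 1e6 * 4.00  # ~$21,965
print(f"{tokens_per_year / 1e9:.1f}B tokens ≈ ${input_cost + output_cost:,.0f}/yr")
# -> 13.7B tokens ≈ $28,554/yr
```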

05 · vs alternatives

Nothing else is built
for free-tier pooling.

Other routers target paid APIs. Their signals are cost-per-token and latency. A 429 is an outage to them — they have no concept of budget steering.

Tool                 Free-tier aware?   Steers before 429?   Reads ratelimit headers?
OpenRouter           ✗ (paid focus)     ✗                    ✗
RouteLLM             ✗                  ✗                    ✗
OmniRoute            partial            ✗ (reacts after)     ✗
LiteLLM (default)    ✗                  ✗                    ✗
Free LLM Router      ✓ built for it     ✓                    ✓ real-time

06 · Use cases

Built for agentic workloads.
Claude Code context offloading
Delegate research, classification, and summarisation tasks to free agents. Reads happen outside your Claude Max budget — at $0.
Parallel agent fan-out
Spawn 10 simultaneous research agents. The pool absorbs the burst across providers without hitting any single ceiling.
Autonomous pipelines
n8n, LangGraph, CrewAI: drop the router in as a free-tier fallback before your paid model calls (see the sketch after this list).
Bulk classification & extraction
Process thousands of documents, classify tickets, extract structured data — at 220 RPM, for free.
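The fallback pattern from the pipelines item, sketched with the OpenAI client. The local port, pooled model alias, and paid fallback model are illustrative assumptions, not product defaults:

```python
from openai import OpenAI

free = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")
paid = OpenAI()  # reads OPENAI_API_KEY from the environment

def complete(prompt: str) -> str:
    """Try the free pool first; spend paid tokens only when the pool fails."""
    try:
        resp = free.chat.completions.create(
            model="free-pool",  # hypothetical pooled alias
            messages=[{"role": "user", "content": prompt}],
            timeout=30,
        )
        return resp.choices[0].message.content
    except Exception:
        # Whole free pool exhausted or unreachable: fall back to a paid model.
        resp = paid.chat.completions.create(
            model="gpt-4o-mini",  # illustrative paid fallback
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
```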

Ships as a Docker Compose bundle + Claude Code MCP server. One command to start.

Early access

Get it when it ships.
Shaped by your use case.

Early access list. No spam. One email when it's ready.

~50 spots · free during beta


07 · FAQ

Do I need accounts on all five providers?
You need at least two to see a benefit; three or more is recommended. The setup wizard walks you through each provider's signup page, validates your key with a test call, and tells you your rate limit ceiling. Most providers give free API access in under five minutes.
Is this reselling API access? Does it violate provider ToS?
No. The router is local software — you run it with your own API keys. Each request is authenticated directly with the provider using your credentials. This is the same model as LiteLLM, Cursor, or any API client. We audited all five provider agreements; building products that call the API is permitted everywhere.
What's the quality like on free models?
The default pool (Groq Llama 3.3 70B, Cerebras Qwen 3 235B, DeepSeek Chat) is Haiku-tier: good for research, classification, extraction, summarisation, and codegen. It's not a replacement for Sonnet or Opus on strategic reasoning — it's a complement. Reserve Claude Max for things only Claude can do well.
Does it work with Claude Code?
Yes — there's an MCP server included. It exposes a spawn_free_agent tool that Claude Code can call directly. Reads and research happen in a free-tier loop outside your Claude Max context window, keeping it lean for the work only Claude can do.
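For flavour, a minimal sketch (not the shipped server) of how such a tool could be exposed with the official Python MCP SDK; route_to_free_pool is a hypothetical stand-in for the call into the router:

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("free-llm-router")

def route_to_free_pool(task: str) -> str:
    # Placeholder: a real implementation would POST to the local proxy here.
    return f"(free-pool result for: {task!r})"

@mcp.tool()
def spawn_free_agent(task: str) -> str:
    """Run a read/research task on the free pool, outside the Claude Max
    context window, and return only the result."""
    return route_to_free_pool(task)

if __name__ == "__main__":
    mcp.run()  # stdio transport, the default for Claude Code MCP servers
```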
How is this different from just using OpenRouter's free models?
OpenRouter reacts to a 429 and moves on. This router reads x-ratelimit-remaining on every response and steers traffic before the limit lands. Under bursty parallel workloads — 5–10 simultaneous agents — the difference is significant: OpenRouter queues and retries; this router distributes upfront and rarely retries at all.
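To picture the bursty case, a sketch of a 10-agent burst fired at the pooled endpoint, reusing the illustrative localhost address and model alias from the earlier examples:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="unused")

async def agent(question: str) -> str:
    resp = await client.chat.completions.create(
        model="free-pool",  # hypothetical pooled alias
        messages=[{"role": "user", "content": question}],
    )
    return resp.choices[0].message.content

async def main() -> None:
    questions = [f"Research subtopic {i}" for i in range(10)]
    # One burst of 10 simultaneous requests; the router spreads it across
    # providers instead of queueing behind a single provider's ceiling.
    answers = await asyncio.gather(*(agent(q) for q in questions))
    print(len(answers), "answers")

asyncio.run(main())
```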