How much does Claude Sonnet cost per million tokens?

Claude Sonnet 4 costs $3 per million input tokens and $15 per million output tokens (as of 2026). Claude Haiku 4 costs $0.80 input / $4 output per million tokens. Claude Opus 4 costs $15 input / $75 output per million tokens. Prices are set by Anthropic and subject to change.

How much does GPT-4o cost per million tokens?

GPT-4o costs $2.50 per million input tokens and $10 per million output tokens (as of 2026). GPT-4o mini costs $0.15 input / $0.60 output per million tokens. Prices are set by OpenAI and subject to change.

What is the cheapest AI model for high-volume tasks?

For high-volume, lower-complexity tasks, Gemini 2.0 Flash, GPT-4o mini, and Claude Haiku are currently the cheapest frontier-quality options — all under $1 per million input tokens. The right choice depends on the quality bar your task requires. AnyForge's intent-based routing automatically sends each call to the cheapest model that meets the required quality tier.

How can I reduce my AI API costs?

The most effective lever is tier routing: classify each LLM call by complexity (fast vs smart) and send simple calls to cheaper models (Haiku, GPT-4o mini, Gemini Flash) while reserving frontier models (Claude Sonnet/Opus, GPT-4o, Gemini Pro) for tasks that need them. This typically cuts bills 40–70% with no quality loss. AnyForge Control automates this routing via intent-based classification.

What does AI token routing actually save?

For a team making 1 million LLM calls per day split roughly 70% simple / 30% complex, moving from a flat GPT-4o or Claude Sonnet strategy to a tier-routed strategy (cheap model for simple, frontier for complex) typically saves 50–65% on model costs. The AnyForge cost calculator lets you plug in your own numbers to see the exact savings for your workload.

How does AnyForge Control route LLM calls?

AnyForge Control classifies each incoming prompt by intent: is this a simple lookup, a generation task, or a complex reasoning problem? It routes to the cheapest model in the appropriate quality tier. You can configure routing by rules, regex patterns, or embedding-based semantic similarity. The router is a 5-minute drop-in proxy — SDK-compatible with OpenAI, Anthropic, and Gemini.

AnyForge · The AI governance index

What is your model choice actually costing you?

Most teams default to a frontier model for everything. Smart Routing — Haiku for the bulk, Sonnet for the middle, Opus only where it earns its keep — usually cuts the bill by half or more. Move the sliders.

Team scale · how big is your AI-using engineering org

Pick a preset to populate per-workload daily volumes below. Each row stays editable — adjusting any row switches this to Custom.

Current default model · what you use today

The model your team currently sends every call to. Drives the ungoverned (red) bar — your baseline.

Routing mix · what AnyForge would route to

The models that fill each tier when routing is on. Drives the With AnyForge Control (gold) bar. Default is the Anthropic ladder; mix providers freely to match what you would route to.

Reflex · 70%

Production · 20%

Reasoning · 10%

Annual savings · all workloads

$58K

By workload: $15K chat / rag · $28K single agent · $15K multi-agent crew.

On the multi-agent row specifically: $13K/yr from routing alone, plus $1,895/yr from Crew on top. Crew's biggest savings — HIL gates preventing entire bad runs from completing — aren't modelled here; the calculator under-counts Crew by design until we have measured data.

Your AI portfolio

— set the daily volume per workload to match your stack. Defaults are typical for an active product.

1 call / task · 4.0K in / 500 out per call

Chat / RAG

Daily tasks

$1,950/mo ungoverned

Ungoverned$1,950

With AnyForge Control$702

save $1,248/mo from routing

One question, one answer. Customer chat, document Q&A, simple summarisation. Daily volume = end-user requests, not engineer count — a small SaaS chatbot or internal RAG sees ~1–5K/day.

8 calls / task · 6.0K in / 800 out per call

Single agent

Daily tasks

$3,600/mo ungoverned

Ungoverned$3,600

With AnyForge Control$1,296

save $2,304/mo from routing

One agent loops through plan → tool → result → next step. Cursor, Claude Code, Cline, browser-use. Daily volume = agent invocations across your dev team — ~10 devs × 30 invocations/day ≈ 300/day for an active engineering org; small pilots run 50–100, large teams 500+.

30 calls / task · 8.0K in / 1.0K out per call

Multi-agent crew

Daily tasks

$1,755/mo ungoverned

Ungoverned$1,755

With AnyForge Control$632

save $1,123/mo from routing

+ Crew$474

+$158/mo from Crew on top

= $1,281/mo total saved with Control + Crew

Architect → Engineer → QA → Compliance with HIL gates. The AnyForge Crew shape. Daily volume = orchestrated runs/day — active Crew shops run ~30/day (≈ one per PR); pilots run 5–10, scaled production crosses 100/day.

Bars share one scale across rows and bands. Intent-based routing assumes a 70% / 20% / 10% split across reflex / production / reasoning tiers, filled by Claude Haiku 4.5 · Claude Sonnet 4.6 · Claude Opus 4.7. The Crew band credits a midpoint 25% reduction on routed cost (typically 15%–35% depending on cache hit rate) — applied only to multi-agent because Crew is the multi-agent product. Click the i badge on each band for the mechanisms. Pricing as of 2026-05-01; verify against live API docs before publishing.

AnyForge Platform Fee Calculator

What you actually pay AnyForge

Three columns: same workload running on raw provider API (“DIY direct”), through AnyForge Control proxy, and on AnyForge Crew with managed service. AnyForge's harness reduces tokens ~30% on the same work, and the platform fee is $0.30 / 1M of what flows. Numbers update live; every value is derived from the canonical pricing config.

Team size · 10 devs

Adoption pattern

Claude Code as primary IDE, throughout the day

BYOK provider

Sonnet 4 with heavy prompt caching

DIY direct

Raw provider API. No routing, partial caching, no portable memory or audit.

Tokens / month9.4B

Model spend (BYOK)$20.2K

paid to provider

AnyForge fee—

Total / month

$20.2K

AnyForge proxy

Same workload through AnyForge Control: smart routing + caching cuts tokens ~30%, lower per-token rate.

Tokens / month6.6B

Model spend (BYOK)$9.9K

paid to provider

AnyForge fee$2.0K

$0.30 / 1M

Total / month

$11.9K

$8.3K saved vs DIY

AnyForge Crew

Pick the Full Crew adoption preset to see the Crew + Managed Service breakdown.

Tokens / month6.6B

Model spend (BYOK)$9.9K

paid to provider

AnyForge fee$2.0K

$0.30 / 1M

Total / month

$11.9K

Estimates based on industry-typical token volumes for adoption levels and blended provider rates with caching. AnyForge platform fee: $0.30 / 1M, provider-agnostic. DIY baseline assumes the same workload running on raw provider APIs without routing or shared caching (~30% more tokens, higher per-token effective rate). Self-hosted provider shows $0 model spend — you pay AnyForge for the orchestration, not the inference. Crew Managed Service is a separate contract; floor shown for the selected team size.

Model ladder · 2026 pricing

Last updated: May 1, 2026

Pick the cheapest model that earns its keep.

Three tiers, organised by capability and price band. Reference list — set what your own routing mix uses up in the calculator. The calculator launches with the Anthropic ladder (Haiku 4.5 / Sonnet 4.6 / Opus 4.7) as a recognisable starting point — Anthropic and OpenAI both ship full three-tier stacks, so mix providers freely. Prices are dollars per million tokens.

Tier · reflex

Reflex

Classification, routing, extraction, simple summarisation

ModelProviderInput / 1MOutput / 1M

Claude Haiku 4.5Anthropic$1.00$5.00

GPT-5.4 NanoOpenAI$0.20$1.25

Gemini 3 FlashGoogle$0.50$3.00

Tier · production

Production

Drafting, RAG, customer chat, tool-using agents

ModelProviderInput / 1MOutput / 1M

Claude Sonnet 4.6Anthropic$3.00$15.00

GPT-5.4OpenAI$2.50$15.00

Gemini 3.1 ProGoogle$2.00$12.00

Tier · reasoning

Reasoning

Complex coding, multi-step planning, contract analysis

ModelProviderInput / 1MOutput / 1M

Claude Opus 4.7Anthropic$5.00$25.00

GPT-5.5OpenAI$5.00$30.00

GPT-5.4 ProOpenAI$30.00$180

Routes & agent frameworks

Same model, different routes — does cost actually move?

Per-token list price for the same model is essentially identical across native API, Bedrock, Vertex, and Azure. The interesting cost differences are commitment-driven — EDP credits, CUDs, MACC — and aren't generic. The exception is OpenRouter, which adds a structural ~5% markup. The calculator above does not model commitment-derived discounts because they depend on your existing cloud spend.

RouteHostsPer-token priceReal savings come from

Anthropic native

source →

Claude only

Published list price

Volume tiers, prompt caching API, batch API (50% off), latest features ship here first

AWS Bedrock

source →

Claude, Cohere, Mistral, AI21, Llama, Amazon Titan

Same as native list price for each model

AWS EDP credits, Provisioned Throughput, region availability, no Anthropic features that have not been ported

Google Vertex AI

source →

Claude, Gemini, PaLM, third-party

Same as native list for each model

GCP Committed Use Discounts, regional residency, Gemini-only features (Google Search grounding, etc.)

Azure (AI Foundry / Azure OpenAI)

source →

OpenAI, Llama, Mistral, Cohere, Phi — no Anthropic, no Gemini

Same as OpenAI list; sometimes a small premium

Microsoft EA / MACC commitments, PTUs, enterprise procurement, US-Government regions

OpenRouter

source →

Anthropic, OpenAI, Google, open-source — many providers behind one API

Native list + ~5% markup

One-API convenience, no commitment, useful for spike traffic — worse for steady-state cost

Agent frameworks — what does each lock you into?

Vendor agent frameworks (Bedrock Agents, Vertex Agent Engine, Azure AI Agents, OpenAI Assistants) lock you into the underlying cloud or provider. AnyForge sits in the bring-your-own-harness row — governance applies regardless of route, so the savings story above holds whichever cloud you commit to.

FrameworkWorks withRoute-agnosticNotes

Bring your own harness (AnyForge)

Any LLM, any route

yes

Raw API calls, your own agent loop. Governance applies regardless of route. The same Crew runs on Bedrock, Vertex, or native APIs.

AWS Bedrock Agents

Any model in Bedrock

Action groups, Knowledge Bases, multi-agent collaboration. Tied to AWS console, IAM, observability. Lock-in to AWS.

Google Vertex Agent Builder / Engine

Gemini natively, Anthropic via Vertex, others

GCP-shaped. Tooling, eval harness, deployment all assume GCP IAM and services.

Azure AI Foundry Agent Service

OpenAI + Llama + Mistral via Azure

Azure-only. No Anthropic, no Gemini. Useful for shops standardised on MS stack.

OpenAI Assistants / Anthropic vendor SDKs

That vendor only

Vendor-native agent primitives. Tied to one provider; cannot route across.

From spreadsheet to production

Production AI workloads need Smart Routing — not vibes.

AnyForge does this for you: routing as policy, role-bounded crews, an audit trail your CFO can actually query.

Book a demo →