AnyForge · The AI governance index

What is your model choice actually costing you?

Most teams default to a frontier model for everything. Smart Routing — Haiku for the bulk, Sonnet for the middle, Opus only where it earns its keep — usually cuts the bill by half or more. Move the sliders.

Pick a preset to populate per-workload daily volumes below. Each row stays editable — adjusting any row switches this to Custom.

The model your team currently sends every call to. Drives the ungoverned (red) bar — your baseline.

Routing mix · what AnyForge would route to

The models that fill each tier when routing is on. Drives the With AnyForge Control (gold) bar. Default is the Anthropic ladder; mix providers freely to match what you would route to.

Annual savings · all workloads

$58K

By workload: $15K Chat / RAG · $28K single agent · $15K multi-agent crew.

On the multi-agent row specifically: $13K/yr from routing alone, plus $1,895/yr from Crew on top. Crew's biggest savings, human-in-the-loop (HIL) gates that stop entire bad runs before they complete, aren't modelled here; the calculator under-counts Crew by design until we have measured data.
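The annual figures are the monthly savings from the workload rows on this page multiplied by twelve. The sketch below only illustrates that roll-up; the dictionary layout and function name are hypothetical, and the inputs are the rounded monthly values shown on the page, so results can differ from the headline by a dollar or two.

```python
# Monthly savings in USD, copied from the per-workload rows on this page.
# "crew" is the extra saving credited to Crew on top of routing.
MONTHLY_SAVINGS = {
    "chat / rag": {"routing": 1248, "crew": 0},
    "single agent": {"routing": 2304, "crew": 0},
    "multi-agent crew": {"routing": 1123, "crew": 158},
}


def annual_savings(monthly):
    """Return (per-workload annual savings, grand total), using 12 months."""
    per_workload = {
        name: 12 * (parts["routing"] + parts["crew"])
        for name, parts in monthly.items()
    }
    return per_workload, sum(per_workload.values())


per_workload, total = annual_savings(MONTHLY_SAVINGS)
for name, value in per_workload.items():
    print(f"{name}: ${value:,}/yr")
print(f"total: ${total:,}/yr")
```

With these inputs the total comes to $57,996/yr, which the page rounds to $58K. The Crew component is 12 × $158 = $1,896; the page's $1,895 presumably comes from unrounded monthly values.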

Your AI portfolio

Set the daily volume per workload to match your stack. Defaults are typical for an active product.

1 call / task · 4.0K in / 500 out per call

Chat / RAG

$1,950/mo ungoverned

Ungoverned: $1,950
With AnyForge Control: $702

save $1,248/mo from routing

One question, one answer. Customer chat, document Q&A, simple summarisation. Daily volume = end-user requests, not engineer count — a small SaaS chatbot or internal RAG sees ~1–5K/day.

8 calls / task · 6.0K in / 800 out per call

Single agent

$3,600/mo ungoverned

Ungoverned: $3,600
With AnyForge Control: $1,296

save $2,304/mo from routing

One agent loops through plan → tool → result → next step. Cursor, Claude Code, Cline, browser-use. Daily volume = agent invocations across your dev team — ~10 devs × 30 invocations/day ≈ 300/day for an active engineering org; small pilots run 50–100, large teams 500+.

30 calls / task · 8.0K in / 1.0K out per call

Multi-agent crew

$1,755/mo ungoverned

Ungoverned: $1,755
With AnyForge Control: $632

save $1,123/mo from routing

+ Crew: $474

+$158/mo from Crew on top

= $1,281/mo total saved with Control + Crew

Architect → Engineer → QA → Compliance with HIL gates. The AnyForge Crew shape. Daily volume = orchestrated runs/day — active Crew shops run ~30/day (≈ one per PR); pilots run 5–10, scaled production crosses 100/day.

Bars share one scale across rows and bands. Intent-based routing assumes a 70% / 20% / 10% split across reflex / production / reasoning tiers, filled by Claude Haiku 4.5 · Claude Sonnet 4.6 · Claude Opus 4.7. The Crew band credits a midpoint 25% reduction on routed cost (typically 15%–35% depending on cache hit rate), applied only to multi-agent because Crew is the multi-agent product. Click the i badge on each band for the mechanisms. Pricing as of 2026-05-01; verify against live API docs before publishing.
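The arithmetic behind the bars can be sketched in a few lines of Python. This is a reconstruction, not AnyForge's calculator code. Prices, call shapes, the 70/20/10 split and the 25% Crew midpoint come from this page. The Opus 4.7 baseline, the 30-day month and the daily volumes of 2,000 / 300 / 30 tasks are assumptions, picked because they reproduce the figures shown.

```python
# USD per 1M tokens (input, output), from the model ladder on this page.
PRICES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.7": (5.00, 25.00),
}
# Reflex / production / reasoning split stated on the page.
ROUTING_SPLIT = {"haiku-4.5": 0.70, "sonnet-4.6": 0.20, "opus-4.7": 0.10}
CREW_REDUCTION = 0.25   # midpoint of the stated 15%-35% range
DAYS_PER_MONTH = 30     # ASSUMPTION: not stated on the page

# (calls per task, input tokens per call, output tokens per call, tasks per day)
# ASSUMPTION: the tasks-per-day values are inferred, not quoted from the page.
WORKLOADS = {
    "chat / rag": (1, 4000, 500, 2000),
    "single agent": (8, 6000, 800, 300),
    "multi-agent crew": (30, 8000, 1000, 30),
}


def call_cost(model, tokens_in, tokens_out):
    """USD cost of one call at list price."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def monthly_costs(workload, baseline="opus-4.7"):
    """Return (ungoverned, routed) monthly USD cost for one workload.

    ASSUMPTION: the ungoverned baseline sends every call to `baseline`.
    """
    calls, tokens_in, tokens_out, per_day = WORKLOADS[workload]
    calls_per_month = calls * per_day * DAYS_PER_MONTH
    ungoverned = calls_per_month * call_cost(baseline, tokens_in, tokens_out)
    routed = calls_per_month * sum(
        share * call_cost(model, tokens_in, tokens_out)
        for model, share in ROUTING_SPLIT.items()
    )
    return ungoverned, routed


def with_crew(routed):
    """Apply the Crew band's midpoint reduction to a routed cost."""
    return routed * (1 - CREW_REDUCTION)


for name in WORKLOADS:
    ungoverned, routed = monthly_costs(name)
    print(f"{name}: ungoverned ${ungoverned:,.0f}, routed ${routed:,.0f}")
```

Under these assumptions the routed cost is always 36% of the Opus baseline, because the blended rate (0.7 × $1 + 0.2 × $3 + 0.1 × $5 = $1.80 input, $9.00 output) is 36% of Opus's $5 / $25. Applying `with_crew` to the multi-agent routed cost gives about $474, matching the Crew band.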

AnyForge Platform Fee Calculator

What you actually pay AnyForge

Three columns: the same workload running on the raw provider API (“DIY direct”), through the AnyForge Control proxy, and on AnyForge Crew with managed service. AnyForge's harness reduces tokens ~30% on the same work, and the platform fee is $0.30 per 1M tokens that flow through it. Numbers update live; every value is derived from the canonical pricing config.

Claude Code as primary IDE, throughout the day

Sonnet 4 with heavy prompt caching

DIY direct

Raw provider API. No routing, partial caching, no portable memory or audit.

Tokens / month: 9.4B
Model spend (BYOK): $20.2K (paid to provider)
AnyForge fee: none
Total / month: $20.2K

AnyForge proxy

Same workload through AnyForge Control: smart routing + caching cuts tokens ~30%, lower per-token rate.

Tokens / month: 6.6B
Model spend (BYOK): $9.9K (paid to provider)
AnyForge fee: $2.0K ($0.30 / 1M)
Total / month: $11.9K ($8.3K saved vs DIY)

AnyForge Crew

Pick the Full Crew adoption preset to see the Crew + Managed Service breakdown.

Tokens / month: 6.6B
Model spend (BYOK): $9.9K (paid to provider)
AnyForge fee: $2.0K ($0.30 / 1M)
Total / month: $11.9K

Estimates based on industry-typical token volumes for adoption levels and blended provider rates with caching. AnyForge platform fee: $0.30 / 1M tokens, provider-agnostic. The DIY baseline assumes the same workload running on raw provider APIs without routing or shared caching, so it does not get the ~30% token reduction and pays a higher effective per-token rate. A self-hosted provider shows $0 model spend: you pay AnyForge for the orchestration, not the inference. Crew Managed Service is a separate contract; the floor shown is for the selected team size.
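As a rough check on the columns above, the platform fee and the saving versus DIY can be worked out as follows. This is a sketch, not the calculator's implementation. The function names are hypothetical, and the two model-spend figures are taken from the page as inputs because the blended per-token rates behind them are not published here.

```python
PLATFORM_FEE_PER_MILLION = 0.30  # USD per 1M tokens, from the page
TOKEN_REDUCTION = 0.30           # "~30%" on the page, so approximate


def platform_fee(tokens):
    """AnyForge fee in USD for a given monthly token count."""
    return tokens / 1_000_000 * PLATFORM_FEE_PER_MILLION


def compare(diy_tokens, diy_model_spend, proxy_model_spend):
    """Compare DIY direct with the AnyForge proxy for one month.

    diy_model_spend and proxy_model_spend are inputs (USD), not derived.
    """
    proxy_tokens = diy_tokens * (1 - TOKEN_REDUCTION)
    fee = platform_fee(proxy_tokens)
    proxy_total = proxy_model_spend + fee
    return {
        "proxy_tokens": proxy_tokens,
        "fee": fee,
        "proxy_total": proxy_total,
        "saved": diy_model_spend - proxy_total,
    }


result = compare(
    diy_tokens=9.4e9, diy_model_spend=20_200, proxy_model_spend=9_900
)
for key, value in result.items():
    print(f"{key}: {value:,.0f}")
```

With the page's inputs this gives about 6.58B tokens, a fee of about $1,974, a total of about $11,874 and a saving of about $8,326, in line with the rounded 6.6B, $2.0K, $11.9K and $8.3K shown.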

Model ladder · 2026 pricing

Last updated: May 1, 2026

Pick the cheapest model that earns its keep.

Three tiers, organised by capability and price band. This is a reference list; set your own routing mix in the calculator above. The calculator launches with the Anthropic ladder (Haiku 4.5 / Sonnet 4.6 / Opus 4.7) as a recognisable starting point. Anthropic and OpenAI both ship full three-tier stacks, so mix providers freely. Prices are dollars per million tokens.

Tier · reflex

Reflex

Classification, routing, extraction, simple summarisation

Model | Provider | Input / 1M | Output / 1M
Claude Haiku 4.5 | Anthropic | $1.00 | $5.00
GPT-5.4 Nano | OpenAI | $0.20 | $1.25
Gemini 3 Flash | Google | $0.50 | $3.00

Tier · production

Production

Drafting, RAG, customer chat, tool-using agents

Model | Provider | Input / 1M | Output / 1M
Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00
GPT-5.4 | OpenAI | $2.50 | $15.00
Gemini 3.1 Pro | Google | $2.00 | $12.00

Tier · reasoning

Reasoning

Complex coding, multi-step planning, contract analysis

Model | Provider | Input / 1M | Output / 1M
Claude Opus 4.7 | Anthropic | $5.00 | $25.00
GPT-5.5 | OpenAI | $5.00 | $30.00
GPT-5.4 Pro | OpenAI | $30.00 | $180.00
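To compare models within a tier for a given call shape, the ladder can be held in a small lookup table and ranked by per-call cost. The sketch below uses the list prices above; the `LADDER` structure and helper names are hypothetical. It ranks on price only, so whether the cheapest model "earns its keep" on quality still has to be judged separately.

```python
# USD per 1M tokens (input, output), copied from the three tier tables.
LADDER = {
    "reflex": {
        "Claude Haiku 4.5": (1.00, 5.00),
        "GPT-5.4 Nano": (0.20, 1.25),
        "Gemini 3 Flash": (0.50, 3.00),
    },
    "production": {
        "Claude Sonnet 4.6": (3.00, 15.00),
        "GPT-5.4": (2.50, 15.00),
        "Gemini 3.1 Pro": (2.00, 12.00),
    },
    "reasoning": {
        "Claude Opus 4.7": (5.00, 25.00),
        "GPT-5.5": (5.00, 30.00),
        "GPT-5.4 Pro": (30.00, 180.00),
    },
}


def cost_per_call(prices, tokens_in, tokens_out):
    """USD cost of one call given (input, output) prices per 1M tokens."""
    price_in, price_out = prices
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


def cheapest(tier, tokens_in, tokens_out):
    """Return (model name, per-call cost) for the cheapest model in a tier."""
    name, prices = min(
        LADDER[tier].items(),
        key=lambda item: cost_per_call(item[1], tokens_in, tokens_out),
    )
    return name, cost_per_call(prices, tokens_in, tokens_out)


# Example: the Chat / RAG call shape from the calculator (4.0K in / 500 out).
for tier in LADDER:
    model, cost = cheapest(tier, 4000, 500)
    print(f"{tier}: {model} at ${cost:.4f} per call")
```

For the 4.0K in / 500 out shape this picks GPT-5.4 Nano, Gemini 3.1 Pro and Claude Opus 4.7 for the reflex, production and reasoning tiers respectively.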

Routes & agent frameworks

Same model, different routes — does cost actually move?

Per-token list price for the same model is essentially identical across native API, Bedrock, Vertex, and Azure. The interesting cost differences are commitment-driven — EDP credits, CUDs, MACC — and aren't generic. The exception is OpenRouter, which adds a structural ~5% markup. The calculator above does not model commitment-derived discounts because they depend on your existing cloud spend.

Route | Hosts | Per-token price | Real savings come from
Anthropic native | Claude only | Published list price | Volume tiers, prompt caching API, batch API (50% off), latest features ship here first
AWS Bedrock | Claude, Cohere, Mistral, AI21, Llama, Amazon Titan | Same as native list price for each model | AWS EDP credits, Provisioned Throughput, region availability; Anthropic features that have not been ported are unavailable
Google Vertex AI | Claude, Gemini, PaLM, third-party | Same as native list for each model | GCP Committed Use Discounts, regional residency, Gemini-only features (Google Search grounding, etc.)
Azure (AI Foundry / Azure OpenAI) | OpenAI, Llama, Mistral, Cohere, Phi; no Anthropic, no Gemini | Same as OpenAI list; sometimes a small premium | Microsoft EA / MACC commitments, PTUs, enterprise procurement, US-Government regions
OpenRouter | Anthropic, OpenAI, Google, open-source; many providers behind one API | Native list + ~5% markup | One-API convenience, no commitment, useful for spike traffic; worse for steady-state cost
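The two generic price effects in the route table, OpenRouter's ~5% markup and the 50% batch discount listed under the native Anthropic route, can be sketched as a simple multiplier on list-price spend. This is an illustration with hypothetical names, not a pricing model. Commitment discounts (EDP, CUD, MACC) and Azure's occasional premium are deliberately left out because the page gives no generic figure for them.

```python
# Multiplier on list-price spend per route, from the table above.
ROUTE_MULTIPLIER = {
    "native": 1.00,
    "bedrock": 1.00,
    "vertex": 1.00,
    "azure": 1.00,       # "sometimes a small premium" is not modelled
    "openrouter": 1.05,  # "~5%" on the page, so approximate
}
BATCH_DISCOUNT = 0.50    # batch API discount listed for the native route


def route_spend(list_price_spend, route, batch_share=0.0):
    """Monthly USD spend on a route, given spend at native list price.

    batch_share is the fraction of spend that can move to the batch API.
    ASSUMPTION: the batch discount applies on the native route only,
    because that is the only row where the table lists it.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    if batch_share and route != "native":
        raise ValueError("batch discount is only modelled for 'native'")
    spend = list_price_spend * ROUTE_MULTIPLIER[route]
    return spend * (1 - BATCH_DISCOUNT * batch_share)


print(route_spend(10_000, "openrouter"))
print(route_spend(10_000, "native", batch_share=0.4))
```

On $10,000 of list-price spend, the OpenRouter markup adds about $500 a month, while moving 40% of native traffic to batch removes about $2,000.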

Agent frameworks — what does each lock you into?

Vendor agent frameworks (Bedrock Agents, Vertex Agent Engine, Azure AI Agents, OpenAI Assistants) lock you into the underlying cloud or provider. AnyForge sits in the bring-your-own-harness row — governance applies regardless of route, so the savings story above holds whichever cloud you commit to.

Framework | Works with | Route-agnostic | Notes
Bring your own harness (AnyForge) | Any LLM, any route | yes | Raw API calls, your own agent loop. Governance applies regardless of route. The same Crew runs on Bedrock, Vertex, or native APIs.
AWS Bedrock Agents | Any model in Bedrock | no | Action groups, Knowledge Bases, multi-agent collaboration. Tied to AWS console, IAM, observability. Lock-in to AWS.
Google Vertex Agent Builder / Engine | Gemini natively, Anthropic via Vertex, others | no | GCP-shaped. Tooling, eval harness, deployment all assume GCP IAM and services.
Azure AI Foundry Agent Service | OpenAI + Llama + Mistral via Azure | no | Azure-only. No Anthropic, no Gemini. Useful for shops standardised on MS stack.
OpenAI Assistants / Anthropic vendor SDKs | That vendor only | no | Vendor-native agent primitives. Tied to one provider; cannot route across.

From spreadsheet to production

Production AI workloads need Smart Routing — not vibes.

AnyForge does this for you: routing as policy, role-bounded crews, an audit trail your CFO can actually query.

Book a demo →