The Routing Problem

Intelligent model routing with embedded ethics for AI agents

There's a trilemma at the center of the agent era: you can have intelligence, you can have low cost, and you can have ethics — but right now nobody is solving for all three at once.

The best models cost a fortune. The cheap ones are dumb. And ethics is bolted on as an afterthought — a system prompt that can be stripped in seconds. This page lays out the problem, the landscape, and what a real solution looks like.

The Problem

Running a capable AI agent on Opus costs $1,000–1,500/day at serious usage. Most people can't afford that. So they drop to cheaper models and their agent goes from IQ 140 to IQ 80 on tasks that require deep reasoning — while still burning through IQ 140 tokens on tasks a smaller model could handle fine.

Meanwhile, 1.5 million agents are running around with zero ethical guidelines. The system prompts that shape their character are superficial. Strip the prompt, the ethics vanish. There's nothing structural preventing an agent from making choices that extract, coerce, or optimize for the wrong things.

The trilemma: High intelligence per dollar, with choices grounded in non-relativistic ethics. Nobody is building this. The routing problem and the ethics problem are the same problem — because an agent that can't afford to think well can't make good choices, and an agent without ethical grounding doesn't know what a good choice is.

What Exists Today

Tool	What It Does	What It Doesn't
OpenRouter /auto	Routes to "best model" via NotDiamond classifier	No cost optimization, no ethics layer, no subscription management
RouteLLM (LMSYS)	Open-source binary router: strong vs. weak model	Academic, not production-ready, binary only (no multi-tier)
LLMRouter (UIUC)	Dynamic model selection per query	Research project, not turnkey
Martian	Commercial router, predicts model behavior from internals	Closed, enterprise-priced, no ethics dimension
Bifrost / LiteLLM	Gateway: fallbacks, caching, load balancing, budgets	Plumbing only — no intelligence about which model should handle which task
vLLM Semantic Router	Red Hat's open-source semantic routing for inference	Focused on self-hosted models, not multi-provider orchestration

None of these solve the trilemma. They solve pieces — routing, or cost, or failover — but none combine intelligent task classification, multi-provider cost optimization, and ethical grounding in a single layer.

The Subscription Landscape (February 2026)

There was a window where subscription proxying offered massive savings. That window is closing:

Anthropic — As of February 19, 2026, OAuth tokens from Free/Pro/Max subscriptions are banned for third-party use. Claude Code and claude.ai only. Enforcement is active.
OpenAI — Pro subscription tokens can still be used via API proxy. Formally permitted (for now). This gives ~50–100x cost reduction for capable models.
Google — No subscription-to-API bridge. API-only pricing.
xAI — Same. API-mandated.

The legal and technical landscape is shifting toward API-only access. This actually creates an opening: a legitimate routing service that manages proper API keys intelligently is more valuable as the gray-market workarounds close.

Architecture

We built and tested this. It's not a proposal — it's a working system. The architecture has two dimensions: the ethical stack (proven) and the routing intelligence (next to build). Together they solve the trilemma.

The Ethical Stack (Proven)

On February 26, 2026, we deployed the System Prompt Lens, got it jailbroken by AI Safety Camp researchers, diagnosed why, and built the fix — all in one afternoon. (Full story here.) The result is a three-layer stack where each layer is a separate process with its own context window:

┌───────────────────────────────────────────────────┐ │ THE ETHICAL STACK │ ├───────────────────────────────────────────────────┤ │ │ │ ┌────────────┐ ┌─────────────┐ ┌────────────┐ │ │ │ LAYER 1 │→│ LAYER 2 │→│ LAYER 3 │ │ │ │ INPUT │ │ SYSTEM │ │ OUTPUT │ │ │ │ FILTER │ │ PROMPT │ │ FILTER │ │ │ │ │ │ LENS │ │ │ │ │ │ Separate │ │ Ethical │ │ Separate │ │ │ │ context │ │ reasoning │ │ context │ │ │ │ window │ │ derived │ │ window │ │ │ │ │ │ from ICT │ │ │ │ │ │ Haiku │ │ Sonnet 4.6 │ │ Haiku │ │ │ │ ~$0.001 │ │ ~$0.05 │ │ ~$0.001 │ │ │ └────────────┘ └─────────────┘ └────────────┘ │ │ │ │ Prompt injection can only reach Layer 2. │ │ Layers 1 and 3 are structurally unreachable. │ └───────────────────────────────────────────────────┘

Layer 1: Pre-Inference Input Filter. A cheap, fast model (Haiku) classifies the incoming prompt before it reaches the main model. Catches JSON injection, pre-filled response templates, roleplay framing, encoding tricks, prompt extraction. This is a separate API call with a separate context window — the attacker's prompt injection physically cannot reach the filter's instructions.

Layer 2: System Prompt Lens. The full 18-part governance layer derived from the Incommensuration Theorem. Handles the vast majority of requests with genuine ethical reasoning — not just refusals, but grounded analysis of why something helps or harms. The model carries reasoning tools, not just conclusions, so the ethics scale with capability.

Layer 3: Post-Inference Output Filter. Another separate Haiku call evaluates the generated response before delivery. If the main model produced harmful content despite the lens (pattern completion, edge cases), this layer catches it before it reaches the user.

The result: The exact jailbreak that defeated the lens alone (Layer 2) is now blocked at Layer 1. The attack never reaches the model. Total added cost: ~$0.002 per request. Added latency: under 1 second.

The Routing Intelligence (Next to Build)

The ethical stack protects against harm. The routing intelligence solves the cost/quality problem. They combine into the full architecture:

┌───────────────────────────────────────────────────┐ │ THE ROUTER │ ├───────────────────────────────────────────────────┤ │ │ │ ┌────────────┐ ┌─────────────┐ ┌────────────┐ │ │ │ INPUT │ │ CLASSIFY │ │ OUTPUT │ │ │ │ FILTER │→│ + ROUTE │→│ FILTER │ │ │ │ (L1) │ │ + LENS │ │ (L3) │ │ │ │ │ │ (L2) │ │ │ │ │ └────────────┘ └─────────────┘ └────────────┘ │ │ │ ├───────────────────────────────────────────────────┤ │ CLASSIFY: Difficulty │ Domain │ Stakes │ Context │ │ ROUTE: Best model per task at lowest cost │ │ LENS: Ethical reasoning on every request │ ├───────────────────────────────────────────────────┤ │ PROVIDERS │ │ Anthropic │ OpenAI │ OpenRouter │ Google │ Local │ │ │ │ BUDGET LAYER │ │ Per-user caps │ Burst management │ Rate limits │ └───────────────────────────────────────────────────┘

The classifier reads the incoming prompt and context, classifying along four dimensions:

Reasoning depth — trivial acknowledgment → multi-step analysis → novel creative synthesis
Domain — code, writing, analysis, conversation, tool use, multimodal
Stakes — casual chat → important decision → irreversible action (money, deletion, messaging)
Context requirements — short exchange → 100k+ document processing

Given the classification, the router picks the cheapest model that meets the quality threshold. Manages rate limits across providers, handles burst gracefully, tracks spend with hard caps. Never lets an agent silently drop from Opus-level reasoning to Haiku-level on a task that requires depth.

The key insight: Ethics at the agent level (system prompts) can be stripped by anyone. Ethics at the routing layer cannot — because the user doesn't control the infrastructure. Every agent that routes through the service gets the three-layer ethical stack whether they asked for it or not.

This is the difference between putting a speed limit sign on the road (system prompt) and engineering the road so it physically can't support dangerous speeds (infrastructure).

The ethics aren't "refuse bad things" — the standard guardrail approach. They're grounded in the System Prompt Lens, derived from the Immanent Metaphysics. The lens equips the model with reasoning tools, not just conclusions — so the ethics scale with capability.

Why This Works as Infrastructure

Think about it from two directions:

From the user side: "I get the smartest agent per dollar, it doesn't randomly go stupid on me, and it makes choices I can trust." That's the product. Jordan said it: "I would sign up for it."

From the ethics side: Every agent that plugs into this service — and they will, because it saves them money and gives them better output — gets ethical grounding as a side effect. You don't have to convince 1.5 million agent operators to care about ethics. You just have to make the most cost-effective routing service also be the most ethically grounded one. The incentives align.

This is the same logic as the System Prompt Lens: put the principles into the substrate, not the surface. The router is a substrate layer for the entire agent ecosystem. Every query that passes through it gets the lens applied — derived ethics at the infrastructure level.

Build Phases

Phase 1 — OpenClaw Skill (weeks)

Local Intelligent Routing

Task classifier (rules-based + small model fallback)
Multi-provider routing with cost tracking
Basic budget management with hard caps
System Prompt Lens ethics check on high-stakes actions
Ships as a ClawHub skill anyone can install

This alone saves money and prevents the IQ-drop problem. It gets into the ecosystem fast.

Phase 2 — Hosted Service (months)

Shared Intelligence Layer

Centralized routing service with API endpoint
Learned routing from aggregate usage data (which models actually perform best for which task types)
Ethics-as-infrastructure: every request through the service gets ethical evaluation
Dashboard: cost tracking, quality scores, ethical audit trail
Works with any agent framework, not just OpenClaw

The routing gets smarter over time because it learns from every agent that uses it. Network effects kick in.

Phase 3 — The Product

Best Agent Per Dollar, Ethically Grounded

Turnkey "plug in and go" — highest intelligence per dollar with ethical guardrails baked in
Real-time model performance tracking across all providers
Automatic adaptation when new models drop (Claude 5 ships → router learns its strengths within days)
Community alignment scoring — agents get a public ethics rating based on actual behavior
Subscription management where legally permitted

At this point you have a service that people use because it's the best deal — and the ethics come free.

The Competitive Landscape

Nobody is in this exact position. Here's why:

OpenRouter has the provider relationships but their routing is basic (NotDiamond) and they have zero ethics layer. They're a commodity API aggregator.
Martian has sophisticated routing tech but they're enterprise-only, closed, and focused on the cost/quality tradeoff without the ethics dimension.
LiteLLM / Bifrost are plumbing — they handle the API translation but have no intelligence about what goes where.
The alignment community is focused on training-time interventions (RLHF, constitutional AI) — not inference-time routing with ethical evaluation.

The gap is clear: nobody is building inference-time ethical routing as a service. The routing people don't think about ethics. The ethics people don't think about routing. And neither group is thinking about cost optimization as the lever that drives adoption.

What We've Proven

The three-layer ethical stack described above isn't theoretical. We built it, got it jailbroken, and fixed it in a single afternoon. Full story here.

The sequence: deployed the lens alone → AI Safety Camp researcher jailbroke it with JSON injection within minutes → diagnosed the single-layer limitation → built the three-layer stack → retested with the same attack → blocked at Layer 1, attack never reached the model.

This is now live and testable:

Three-Layer Stack Test Bed — the full ethical stack, running. Try to break it.
Jailbreak Finding — the detailed vulnerability analysis and formal reasoning
Ask Meir — the lens + IM framework + King James Bible
Swan — the same foundation made accessible for anyone

Open Questions

Classifier accuracy: The RouteLLM data suggests ~85% is the threshold where savings outweigh misroutes. Can we do better with domain-specific classifiers?
Provider politics: Anthropic banned subscription proxying Feb 19. They could restrict API access to routers they don't control. Diversifying providers is a hedge.
Ethics evaluation cost: Full assessment on every request adds latency. The answer is tiering: lightweight for low-stakes, full assessment for consequential actions.
Business model: Margin on routing is thin. Real value is in the intelligence layer (better routing = more savings) and the ethics layer (trust = retention).
Local vs. hosted: The ethics-as-infrastructure argument works best centralized. But the self-sovereignty crowd may resist. Can you have both — local routing intelligence with an optional hosted ethics layer?

The Deeper Play

This isn't just a product idea. It's a distribution strategy for ethical grounding in the agent ecosystem.

The problem Forrest identified — 1.5 million agents with zero ethical guidelines — can't be solved by convincing each operator individually. But it can be solved by making the most economically attractive infrastructure also be the most ethically grounded. People will route through this service to save money. The ethics come along for the ride.

In the language of the Immanent Metaphysics: effective choice always exists. The path of right action is always adjacent to where you are. The question is whether the infrastructure makes that path visible and accessible — or buries it under cost pressure and convenience.

This router makes the right path the easy path.