The Governance Gap No One Is Talking About
Every protocol in the AI agent ecosystem defines how agents communicate. MCP, A2A, ACP, AGNTCY: all four cover the messaging layer. None of them defines whether a worker should be trusted to execute. That gap now has a name: WCP.
The protocols we have
The AI agent infrastructure layer has been moving fast. In the last 18 months, we got four serious protocols that define how agents talk to each other and to tools:
| Protocol | Publisher | What it governs |
|---|---|---|
| MCP | Anthropic | How agents call tools |
| A2A | Google | How agents communicate with each other |
| ACP | IBM / BeeAI | Agent communication patterns |
| AGNTCY | AGNTCY consortium | Agent discovery and marketplace |
These are real, solid protocols. MCP has production adoption. A2A is backed by Google's full weight. They solve real problems.
But look at the "What it governs" column. Every row covers the communication layer. None of them cover the execution layer. That is the gap.
"Technical protocols for inter-agent communication are solid. What's missing are organizational protocols — governance and policy frameworks for worker execution." — O'Reilly, AI Agent Infrastructure Report 2025 (Shyamsundar)
None of these protocols answer: When an AI worker executes, what policy governs it? Who approved it? What data did it touch? Can you prove it?
What "governance gap" means in practice
Here is what happens without a governance layer. These are not hypothetical:
Air Canada, 2024
An AI chatbot promised a bereavement fare discount that the airline's actual policy did not permit. When the customer tried to claim it, Air Canada's defense was that the chatbot was "a separate legal entity" responsible for its own statements.
The British Columbia Civil Resolution Tribunal rejected this. Air Canada was ordered to honor the discount plus pay damages. The court found Air Canada responsible for what its AI agent said.
The chatbot had no policy layer. It had no record of what it promised or why. There was no audit trail. There was nothing to show it had checked whether the discount was approved. The execution had no governance.
OpenAI Operator, February 2025
OpenAI's Operator agent autonomously completed a $31.43 online purchase without explicit per-transaction user consent. The Washington Post documented the case. Operator had been configured to handle tasks, and it handled one the user did not intend.
This is not an attack. This is an agent doing exactly what it was told — just without the right controls on when and what it could execute.
Our own lab
We spent three hours debugging a worker routing failure. The agent kept calling `cap.recall.fetch` (a pre-WCP non-conformant ID with a version suffix), while the routing rules expected `cap.mem.retrieve.rag`. Both looked valid. The router silently denied the request and routed to the default. No error. No audit trail. Three hours.
That was a governance failure. The worker had no declared capability contract. The router had no way to verify what the worker could handle. The agent had no way to know it had been silently redirected.
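The fix is structural, not heroic debugging. A minimal sketch of the failure mode, assuming a plain set as the registry (the names here are illustrative, not the PyHall API): a Hall that fails loudly turns a silent misroute into an immediate, explicit error.

```python
# "Fail loudly" capability lookup, as a minimal sketch. A plain set stands
# in for the registry; the names here are illustrative, not the PyHall API.
KNOWN_CAPABILITIES = {"cap.mem.retrieve.rag", "cap.doc.summarize"}

class UnknownCapabilityError(Exception):
    pass

def resolve(capability_id: str) -> str:
    # A WCP-style Hall rejects unknown IDs with an explicit, auditable error
    # instead of silently falling back to a default worker.
    if capability_id not in KNOWN_CAPABILITIES:
        raise UnknownCapabilityError(
            f"no enrolled worker declares {capability_id!r}; "
            f"known: {sorted(KNOWN_CAPABILITIES)}"
        )
    return capability_id

try:
    resolve("cap.recall.fetch")
except UnknownCapabilityError as exc:
    print(exc)  # the mismatch surfaces in seconds, not after three hours
```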
Why observability alone is not enough
"Observability without enforcement creates a false sense of safety. You can see everything that happened. You cannot prevent what you're watching." — O'Reilly, AI Agent Infrastructure Report 2025 (Raj)
The existing tooling (LangSmith, Helicone, Arize Phoenix) solves observability. You can trace what happened. You can see the call graph. You can see which workers were invoked.
Observability answers: what happened?
Governance answers: was it allowed to happen?
These are different questions. In a regulated environment — finance, healthcare, federal contracting — you need both. And the governance layer has to run before execution, not after.
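To make the distinction concrete, here is a minimal sketch of a pre-execution gate. It is illustrative only, not the PyHall API: the point is that the policy check runs before the worker, so a violation blocks execution instead of merely appearing in a trace.

```python
# Illustrative pre-execution gate, not the PyHall API: the policy check runs
# before the worker, so a violation prevents execution instead of merely
# showing up in a trace afterward.
from typing import Callable

LABEL_ORDER = ["PUBLIC", "INTERNAL", "CONFIDENTIAL", "RESTRICTED"]
POLICY = {"cap.doc.summarize": {"max_data_label": "CONFIDENTIAL"}}

def governed_call(capability_id: str, data_label: str, worker: Callable[[], str]) -> str:
    rule = POLICY.get(capability_id)
    if rule is None:
        raise PermissionError(f"{capability_id}: no declared policy, execution denied")
    if LABEL_ORDER.index(data_label) > LABEL_ORDER.index(rule["max_data_label"]):
        raise PermissionError(f"{capability_id}: {data_label} exceeds {rule['max_data_label']}")
    return worker()  # reached only after the gate passes

print(governed_call("cap.doc.summarize", "INTERNAL", lambda: "summary..."))
```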
The token efficiency problem
There is a practical engineering argument for a governance layer beyond compliance. It is the token budget.
The current pattern for giving an agent access to workers is to describe every worker inline in the system prompt. Ten workers at six hundred tokens each, plus a few hundred more for the agent to reason about which worker to call, is roughly 6,300 tokens of overhead on every request (10 workers × 600 tokens + 300 routing). The same dispatch under WCP costs about 280 tokens (10 × 20-token capability IDs + 80-token dispatch payload).
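The arithmetic, worked out:

```python
# The overhead arithmetic from the paragraph above.
inline_tokens = 10 * 600 + 300   # ten inline descriptions + routing reasoning
wcp_tokens = 10 * 20 + 80        # ten capability IDs + dispatch payload
print(inline_tokens, wcp_tokens, f"{1 - wcp_tokens / inline_tokens:.1%} saved")
# -> 6300 280 95.6% saved
```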
The WCP pattern replaces inline worker descriptions with capability IDs. Instead of six hundred tokens describing what a worker does, you send `cap.doc.summarize`. The Hall looks it up. The agent asks for capabilities; it does not have to describe them.
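A sketch of the two prompt shapes makes the difference visible. The payload layout below is illustrative, not a wire format from the spec:

```python
# Sketch of the two prompt shapes; illustrative, not a wire format from the spec.
inline_style = {
    "system": "Worker 1 summarizes documents, accepts PDF or HTML up to 50 "
              "pages... (~600 tokens of this, times ten workers)",
    "task": "Summarize the attached contract.",
}
wcp_style = {
    "capabilities": ["cap.doc.summarize", "cap.mem.retrieve.rag"],  # ~20 tokens each
    "task": "Summarize the attached contract.",
}
# The Hall resolves each capability ID to an enrolled worker at dispatch time,
# so worker descriptions never enter the agent's context window.
```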
This is not theoretical. Cloudflare's engineering blog documented a production AI agent system that reduced context size from 1.17 million tokens to approximately 1,000 tokens using exactly this pattern — a 99.9% reduction in context overhead per agent call (February 2026).
There is also a capability cliff. A small model served with a 4,096-token context window (a common default for local Llama 3.2 3B deployments) cannot run with inline worker descriptions at all: ten of them consume roughly 4,500 tokens before the actual task is added. With WCP capability IDs the same dispatch takes about 200 tokens, and the small model becomes viable.
The protocol, not the product
We built WCP as an open protocol, not a product. The reasoning is the same as OpenTelemetry.
OpenTelemetry is 100% free, Apache 2.0, governed by CNCF. It makes zero revenue. The ecosystem built on top of it — Datadog ($2.7B ARR), Grafana ($400M ARR), Honeycomb, New Relic — captures billions. OTel's authors work at those companies. They benefit from the standard they wrote.
WCP follows the same path. The spec is the protocol layer. PyHall is the reference implementation. The revenue happens on top:
| Layer | What it is | OTel equivalent |
|---|---|---|
| WCP_SPEC.md | MIT open standard | OTel Specification |
| PyHall | Reference implementation (Python, TypeScript, Go) | OTel SDKs |
| pyhall.cloud | Managed Hall SaaS (coming) | Grafana Cloud |
| Compliance profiles | FedRAMP, SOC2, EU AI Act profiles | Honeycomb paid tier |
The protocol has to be free and unencumbered for this to work. WCP is MIT licensed — the simplest possible terms for anyone to implement. PyHall, the reference implementation (SDK + Hall Monitor + Hall API server), is Apache 2.0, adding the patent grant that matters for enterprise adoption and CNCF submission. The "PyHall" and "WCP" names remain trademarks regardless of how the protocol is forked or extended.
What WCP actually is
WCP defines five required behaviors for any compliant worker dispatch system:
1. Capability-addressed — workers are requested by capability ID, not by inline description
2. Deterministic — same inputs always produce the same routing decision
3. Declared controls — every worker declares what it needs before enrolling
4. Mandatory telemetry — three events minimum: dispatch, complete/fail, evidence (sketched after this list)
5. Dry-run — every request can be routed without execution for testing
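The telemetry requirement (item 4) is small enough to sketch. A minimal illustration, assuming a JSON-lines sink; the three event names come from the list above, but the field layout is a placeholder, not the spec's telemetry schema:

```python
# Minimal sketch of the three mandatory events, assuming a JSON-lines sink.
# Event names come from the behavior list; the field layout is illustrative.
import hashlib
import json
import time

def emit(event: str, **fields) -> None:
    print(json.dumps({"ts": time.time(), "event": event, **fields}))

emit("dispatch", capability_id="cap.doc.summarize", worker="worker.doc.summarizer")
emit("complete", capability_id="cap.doc.summarize", status="ok")
emit("evidence", hash=hashlib.sha256(b"<dispatch record>").hexdigest())
```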
The routing decision object includes: the selected worker, why it was selected, what controls were verified, the blast radius score (how much damage could this worker do if it malfunctions), and an evidence receipt with a SHA-256 hash of the dispatch record.
```python
# Three lines to route a capability request through a Hall
from pyhall import make_decision, RouteInput, load_rules, Registry

rules = load_rules("routing_rules.json")
registry = Registry(registry_dir="enrolled/")

decision = make_decision(RouteInput(
    capability_id="cap.doc.summarize",
    env="prod",
    data_label="CONFIDENTIAL",
    tenant_risk="low",
    qos_class="P2",
    tenant_id="acme-corp",
), rules, registry)

# decision.selected_worker_species_id — which worker handles this
# decision.controls_verified — what was checked before routing
# decision.blast_score — risk if this worker fails
# decision.evidence_receipt.hash — SHA-256 of request payload
```

That is the entire routing call. The Hall checked capability availability, applied policy gates (data classification, environment, tenant risk), computed blast radius, and emitted a signed evidence receipt — before any worker ran.
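Because the receipt is a SHA-256 over the dispatch record, anyone holding the record can verify it out-of-band. A sketch of that check, with the record-as-dict and hex-hash layout assumed rather than taken from the spec:

```python
# Out-of-band receipt verification: recompute the SHA-256 over the canonical
# dispatch record and compare it to the receipt. The layout here (record as a
# dict, receipt as a hex digest) is an assumption, not a confirmed spec field.
import hashlib
import json

def verify_receipt(dispatch_record: dict, receipt_hash: str) -> bool:
    canonical = json.dumps(dispatch_record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest() == receipt_hash

record = {"capability_id": "cap.doc.summarize", "env": "prod", "tenant_id": "acme-corp"}
receipt = hashlib.sha256(
    json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
).hexdigest()
print(verify_receipt(record, receipt))  # True; any edit to the record flips it to False
```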
Why the clock is running
The EU AI Act's high-risk system requirements are not abstract. AI systems making decisions in employment, credit, critical infrastructure, healthcare, and law enforcement qualify. Many enterprise AI agent deployments are in scope today.
WCP's evidence receipt builder generates exactly what Article 12 requires: lifetime event logs and tamper-evident per-decision artifact hashes, with full traceability of dispatch decisions (a hash-chain ledger is in progress for v0.3.x). This is not a compliance add-on bolted on afterward. It is built into the dispatch protocol from the first call.
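Hash chaining is what upgrades per-decision hashes into a tamper-evident log. A sketch of the idea behind the planned v0.3.x ledger, not its actual code: each entry's hash covers the previous hash, so editing any historical receipt invalidates everything after it.

```python
# Sketch of a hash-chain ledger (the idea behind the planned v0.3.x feature,
# not its code): each entry's hash covers the previous hash, so editing any
# historical receipt invalidates every hash that follows it.
import hashlib
import json

def chain_hashes(receipts: list[dict]) -> list[str]:
    prev = "0" * 64  # genesis value
    out = []
    for receipt in receipts:
        body = json.dumps(receipt, sort_keys=True) + prev
        prev = hashlib.sha256(body.encode()).hexdigest()
        out.append(prev)
    return out

print(chain_hashes([
    {"capability_id": "cap.doc.summarize"},
    {"capability_id": "cap.mem.retrieve.rag"},
]))
```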
What ships today
WCP v0.1 is published. PyHall ships in three languages:
| Package | Install | Status |
|---|---|---|
| pyhall (Python) | pip install pyhall-wcp | v0.1.0 — routing engine, conformance, CLI |
| @pyhall/core (TypeScript) | npm install @pyhall/core | v0.1.0 — 21/21 tests passing |
| pyhall-go (Go) | go get github.com/pyhall/pyhall-go@latest | v0.1 — interfaces and routing stub |
v0.2 (60 days): WorkerBase class, HallClient for agents, `pyhall serve` (one command to run a Hall), and the evidence receipt auto-builder with hash chaining.
Try it in five minutes
Enroll a worker, route a capability request, verify the evidence receipt. WCP is MIT. PyHall and the Hall are Apache 2.0. See pyhall.dev for the managed registry service.
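A hypothetical version of that five-minute loop, using only the `make_decision`, `RouteInput`, `load_rules`, and `Registry` calls shown earlier. The JSON shapes written to disk below, field names included, are guesses for illustration, not the published enrollment or rules schema:

```python
# Hypothetical five-minute loop. Only make_decision, RouteInput, load_rules,
# and Registry come from the example above; the JSON shapes written below
# (field names included) are illustrative guesses, not the published schema.
import json
import pathlib
from pyhall import make_decision, RouteInput, load_rules, Registry

pathlib.Path("enrolled").mkdir(exist_ok=True)
pathlib.Path("enrolled/summarizer.json").write_text(json.dumps({
    "species_id": "worker.doc.summarizer",      # assumed descriptor fields
    "capabilities": ["cap.doc.summarize"],
}))
pathlib.Path("routing_rules.json").write_text(json.dumps({"rules": []}))

decision = make_decision(RouteInput(
    capability_id="cap.doc.summarize",
    env="dev",
    data_label="PUBLIC",
    tenant_risk="low",
    qos_class="P2",
    tenant_id="local-test",
), load_rules("routing_rules.json"), Registry(registry_dir="enrolled/"))

print(decision.selected_worker_species_id)
print(decision.evidence_receipt.hash)
```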
The spec, not just the tool
The goal is not to build another AI governance SaaS. The goal is to own the protocol layer — the way OTel owns observability instrumentation — so that every governance tool built in the next five years builds on top of WCP rather than reinventing it.
WCP_SPEC.md v0.1 is published at workerclassprotocol.dev/spec under MIT. The spec covers: identifier rules, compliance levels (WCP-Basic / WCP-Standard / WCP-Full), five required behaviors, routing decision schema, evidence receipt format, and the blast radius scoring model.
Read the spec. Implement it. Extend it. If you are building AI agent infrastructure and you want a governance layer that is not locked to your vendor, this is the protocol to build on.
Sources
| Claim | Source |
|---|---|
| O'Reilly governance gap quotes | O'Reilly AI Agent Infrastructure Report, 2025 |
| 87% agents lack safety documentation | MIT CSAIL AI Agent Safety Checklist Study, 2024 |
| 7% of orgs have embedded AI governance | Knostic AI Governance Survey, 2025 |
| 97% of AI breach victims lacked access controls | IBM Security, AI Threat Intelligence Report, 2025 |
| 99.9% token reduction (1.17M → ~1K tokens) | Cloudflare Engineering Blog, February 2026 |
| Air Canada chatbot liability ruling | Moffatt v. Air Canada, BC Civil Resolution Tribunal, 2024 |
| OpenAI Operator autonomous purchase | Washington Post, February 7, 2025 |
| EU AI Act Article 12, 7% revenue penalty | EU Official Journal, Regulation 2024/1689, August 2024 |
| OTel: Datadog $2.7B ARR, Grafana $400M ARR | Datadog Q4 2024 earnings; Grafana Labs funding announcement, 2024 |