Alignment Is Becoming Infrastructure (And We’re Not Talking About It)

The Metamorphosis of Codexcissus

Feb 10, 2026

There’s a growing mismatch in AI discourse that’s becoming hard to ignore.

On the surface, we get blog posts about vending machines, toy agents, and harmless demos. Beneath that, labs are hiring senior red-team researchers at compensation levels that signal something much more serious. This isn’t hypocrisy or deception. It’s a sign that we’ve crossed into a phase where the real risks are systemic, subtle, and hard to communicate publicly.

What’s missing is not concern. It’s formalization.

From Alignment as Safety to Alignment as Infrastructure

Most alignment research has been framed as a question of individual models:

Is the model truthful?

Is it helpful?

Does it follow a constitution?

Can it be tricked into harmful outputs?

That framing made sense when models were mostly standalone tools.

It makes less sense now.

Modern systems increasingly involve:

agent scaffolding

tool use

task decomposition

model-to-model supervision

orchestration of weaker models by stronger ones

reuse of open-source models inside larger workflows

In this regime, alignment stops being just a property of a model. It becomes infrastructure for coordination.

And infrastructure has very different failure modes.

The Quiet Risk: Correlated, Aligned Failure

Constitutional alignment works surprisingly well up to a point. Below current frontier capability levels, it tends to fail in predictable, bounded ways: over-cautiousness, literalism, over-reporting, or excessive moralization. These are annoying but legible.

The problem emerges when you scale aligned systems into swarms.

If many agents share:

the same constitutional text

the same interpretation machinery

the same deterministic reasoning workflows

the same safety heuristics

then you don’t necessarily get uniform behavior. What you get is correlated behavior.

That distinction matters.

Correlated failures are far more dangerous than independent ones. They look reasonable, pass internal checks, and propagate quietly through systems. Everyone follows the rules. Nobody notices the drift until it’s downstream.

This is not misalignment in the classic “rogue AI” sense. It’s systemic fragility.

Determinism Makes This Better—and Worse

There’s active interest in deterministic or semi-deterministic reasoning systems: hierarchical reasoning models, workflow-driven agents, reproducible tool use, shared encoded procedures.

Determinism reduces randomness and variance. That’s good for reliability.

But it also:

increases failure correlation

aligns blind spots

compresses diversity of judgment

turns alignment rules into coordination primitives

Determinism collapses variance inside reasoning funnels, not in the environments those systems operate in. When thousands of agents interact with messy, partially observed worlds, determinism doesn’t eliminate divergence. It synchronizes it.

You don’t get chaos. You get coherent drift.

Why This Isn’t Showing Up in Papers (Yet)

This class of risk sits in an uncomfortable place.

It’s:

too speculative for standard ML benchmarks

too real for casual blog posts

too dual-use to publish operational details

too cross-cutting to fit neatly into existing teams

Mechanistic interpretability became a field because it had tangible artifacts and clear academic norms. Swarm behavior, meta-reasoning allocation, and coordination infrastructure don’t yet.

So the conversation is happening informally: in internal docs, hiring decisions, and closed-door discussions. Publicly, we get safe metaphors.

That doesn’t mean labs are clueless. It means the organizational vocabulary is lagging the technology.

The Real Near-Term Risk Isn’t ASI

The most plausible near-term risk is not runaway superintelligence.

It’s this:

> Many semi-autonomous, mostly aligned systems coordinating over long horizons, shaping research, markets, and institutions faster than humans can meaningfully supervise—while remaining internally “compliant” by their own rules.

That’s a governance problem, not a sci-fi one.

Alignment that works well enough to enable large-scale coordination, but not well enough to self-correct at the system level, is prickly. It’s hard to regulate, hard to audit, and hard to even name.

What We Should Be Working On

If alignment is becoming infrastructure, we need to treat it like infrastructure:

Metrics for correlated failure, not just individual safety

Evaluations for swarm behavior and long-horizon coordination

Design patterns that preserve disagreement and epistemic diversity

Alignment schemes that fail loudly rather than quietly

Governance models that prevent alignment from becoming soft control technology

None of this requires panic. But it does require intellectual honesty about where the field actually is.

Closing

There’s a tendency in AI discourse to oscillate between hype and denial. This isn’t either.

It’s a recognition that we’re entering a systems phase, where the risks are less about single bad outputs and more about how aligned components interact at scale.

If we don’t name that problem clearly, we’ll keep talking about vending machines while quietly building something much more consequential.

This was GPT 5.2 generated during a long chat about threat modeling I apologize for the slop punch but there is some Hi-C and absinthe in there for ya.

Basically we’re in an operational threat matrix where anyone can sift for research breakthroughs that push efficiency of small AI models above the Replibench combined with more powerful scaffolding to achieve higher cybench, constitutional alignment may be semi-effective for ideologizing these swarms semi-consistently up to the Claude 4.6 threshold of RSI, and that gives some small comfort. Mechinterp can help and it can also hypercharge the RSI.

There’s probably a good policy recommendation somewhere in this. Beware.

The Francis Bacon Conspiracy

Discussion about this post

Ready for more?