
Policy as Code in the AI-code era: the speed-versus-control paradox

Production-code velocity now outruns human review. Without deterministic guardrails at CI and admission, an AI agent turns into an amplifier of mistakes — not throughput.

Aleksandr Khomutov, June 4, 2026, ≈ 5 min read

Copilot writes in 10 seconds what a senior engineer used to write in an hour. Cursor scaffolds a feature from one paragraph. Claude Code opens a PR with no human in the loop. All of this works, with one exception: review stayed manual. The speed of production code now routinely outruns the speed of human review, and that rewrites the requirements for governance.

Policy as Code used to be a «nice to have»: code went through a PR, two pairs of eyes looked at the diff, and an Action: "*" in IAM or a missing resources.limits was caught by a senior engineer within a day. Today the lag between AI-generated code and its attempt to land in a cluster is measured in minutes, and such attempts number in the tens per team per day. PaC turns from a bonus into a functional requirement. This is not «AI writes policies»: it is deterministic admission rules, written by humans or AI, but enforced without a human in the loop, because the human can no longer keep up.

Three points where you can catch a violation

A single admission gate in the cluster is «all or nothing». A single CI check can be bypassed with --skip or a hotfix branch. The model that actually holds in production is three layers, each catching its own class of mistakes.

CI / pre-merge. conftest against Rego policies, Trivy against CVEs, Checkov against Terraform configs. This is the cheapest layer: the defect is caught before merge, before build, before the registry. This is where «do not allow cidr_blocks = ["0.0.0.0/0"] on port 22» and «do not allow Action: "*" in IAM» live. For Terraform the mandatory gate is conftest test plan.json --policy ./policies/, where plan.json is the JSON rendering of a saved plan (terraform plan -out=tfplan, then terraform show -json tfplan > plan.json). A single apply spins up dozens of resources at once, and rolling back in a PR is cheaper than rolling back later.
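To make the two CI rules above concrete, here is a minimal Python sketch of the same checks run against a Terraform plan JSON. It is an illustration of the logic only, not a replacement for conftest and Rego; the function name find_violations and the sample resource addresses are hypothetical, and the sketch assumes AWS resources of type aws_security_group and aws_iam_policy with inline ingress rules.

```python
import json


def find_violations(plan: dict) -> list[str]:
    """Return human-readable violations found in a Terraform plan JSON document."""
    violations = []
    for rc in plan.get("resource_changes", []):
        # "after" holds the planned attribute values; it is None for deletions.
        after = (rc.get("change") or {}).get("after") or {}
        address = rc.get("address", "<unknown>")

        if rc.get("type") == "aws_security_group":
            for rule in after.get("ingress") or []:
                lo, hi = rule.get("from_port"), rule.get("to_port")
                covers_ssh = lo is not None and hi is not None and lo <= 22 <= hi
                if covers_ssh and "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    violations.append(f"{address}: port 22 open to 0.0.0.0/0")

        if rc.get("type") == "aws_iam_policy":
            # The policy document is stored as a JSON string in the plan.
            doc = json.loads(after.get("policy") or "{}")
            statements = doc.get("Statement", [])
            if isinstance(statements, dict):
                statements = [statements]
            for stmt in statements:
                actions = stmt.get("Action", [])
                if isinstance(actions, str):
                    actions = [actions]
                if stmt.get("Effect") == "Allow" and "*" in actions:
                    violations.append(f'{address}: IAM statement allows Action "*"')
    return violations


# Hypothetical plan fragment: two violations and one compliant resource.
sample_plan = {
    "resource_changes": [
        {"address": "aws_security_group.ssh", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]}}},
        {"address": "aws_iam_policy.admin", "type": "aws_iam_policy",
         "change": {"after": {"policy": json.dumps(
             {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]})}}},
        {"address": "aws_security_group.web", "type": "aws_security_group",
         "change": {"after": {"ingress": [
             {"from_port": 443, "to_port": 443, "cidr_blocks": ["0.0.0.0/0"]}]}}},
    ]
}

for violation in find_violations(sample_plan):
    print(violation)
```

The sketch deliberately ignores cases a real policy would need to handle, such as standalone security group rule resources, protocol "-1" rules, and values that are unknown at plan time.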

Admission (cluster). OPA Gatekeeper, Kyverno, or native CEL ValidatingAdmissionPolicy. This layer catches what CI missed: a deploy of an older tag that bypassed the pipeline, an unsigned image or an image from an untrusted registry, a pod without runAsNonRoot. Admission is the last line of defense in the sense of «before the etcd write»: a network egress block fires later, and runtime detection (Falco) later still.
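The shape of an admission decision can be sketched in a few lines of Python. This is only a model of what a Gatekeeper, Kyverno, or CEL rule would decide, not how any of those engines is implemented; review_pod and the registry prefix registry.example.com/ are hypothetical, and a registry allowlist stands in for signature verification, which the sketch does not attempt.

```python
# Hypothetical allowlist; a real policy would also verify image signatures.
ALLOWED_REGISTRIES = ("registry.example.com/",)


def review_pod(pod: dict) -> dict:
    """Decide whether a Pod manifest would be admitted, with reasons if not."""
    spec = pod.get("spec", {})
    pod_non_root = (spec.get("securityContext") or {}).get("runAsNonRoot")
    reasons = []

    containers = (spec.get("initContainers") or []) + (spec.get("containers") or [])
    for container in containers:
        name = container.get("name", "<unnamed>")
        image = container.get("image", "")

        if not image.startswith(ALLOWED_REGISTRIES):
            reasons.append(f"{name}: image {image!r} is not from an allowed registry")

        # A container-level securityContext overrides the pod-level setting.
        own = (container.get("securityContext") or {}).get("runAsNonRoot")
        effective = own if own is not None else pod_non_root
        if effective is not True:
            reasons.append(f"{name}: runAsNonRoot is not set to true")

    return {"allowed": not reasons, "reasons": reasons}


good_pod = {"spec": {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{"name": "app", "image": "registry.example.com/app:1.4.2"}],
}}
bad_pod = {"spec": {
    "containers": [{"name": "app", "image": "docker.io/library/nginx:latest"}],
}}

print(review_pod(good_pod))
print(review_pod(bad_pod))
```

The point of the sketch is that the decision depends only on the object being admitted, so it applies equally to a manifest that never passed through CI.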

Runtime. Falco and Tetragon are primarily detective, not preventive. They catch a shell spawned in a prod container or Secret access from the default ServiceAccount. In this role they alert rather than block, which is why they are no substitute for the first two layers.

The important detail: a CI gate without an admission gate is a false sense of security. CI catches what you built; admission catches what you deployed. These two sets overlap but do not coincide.

What to enforce with

Three tools, and the choice between them is not «the best one» but «the one that fits the team profile».

OPA Gatekeeper: Rego-based and the industry standard, with mutation support alongside validation. Its strength is Rego's expressiveness for complex rules with cross-resource logic. Its weakness is the learning curve and the language barrier: a team that has never written Rego can spend weeks on «its first useful constraint».

Kyverno: YAML instead of Rego, mutate/generate out of the box (for example, automatically adding a NetworkPolicy to a new namespace), built-in verifyImages for Cosign/Sigstore without external data providers. CNCF Incubating. The default for teams that need mutation and image policy and do not want yet another DSL.

CEL ValidatingAdmissionPolicy: GA in Kubernetes 1.30, with MutatingAdmissionPolicy + CEL GA in 1.36. It runs inside the API server: no TLS, no Deployment, no cert-manager dependency, and no «the webhook is down → everything rejected or everything passed» failure mode. It covers the class of simple declarative validations (required labels, naming conventions, transition rules), lifting the «simple-validation tax» off Kyverno/OPA. It does not displace them: image verification and multi-resource generate still need Kyverno, and complex cross-resource rules still need Kyverno or Gatekeeper.
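As an illustration of the «simple declarative validation» class, the Python sketch below builds a required-label ValidatingAdmissionPolicy and its binding as plain dictionaries and prints them as JSON. The helper name required_label_policy, the label team, and the choice to match only Deployments are assumptions made for the example; wrapping the two objects in a v1 List for kubectl is also an assumption about how you would apply them.

```python
import json


def required_label_policy(label: str) -> list[dict]:
    """Build a ValidatingAdmissionPolicy plus binding that requires a label."""
    policy_name = f"require-{label}-label"
    policy = {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingAdmissionPolicy",
        "metadata": {"name": policy_name},
        "spec": {
            "failurePolicy": "Fail",
            "matchConstraints": {"resourceRules": [{
                "apiGroups": ["apps"],
                "apiVersions": ["v1"],
                "operations": ["CREATE", "UPDATE"],
                "resources": ["deployments"],
            }]},
            "validations": [{
                # CEL expression evaluated inside the API server.
                "expression": (
                    f"has(object.metadata.labels) && "
                    f"'{label}' in object.metadata.labels"
                ),
                "message": f"label '{label}' is required",
            }],
        },
    }
    # A policy has no effect until a binding references it.
    binding = {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "ValidatingAdmissionPolicyBinding",
        "metadata": {"name": f"{policy_name}-binding"},
        "spec": {"policyName": policy_name, "validationActions": ["Deny"]},
    }
    return [policy, binding]


manifests = required_label_policy("team")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Nothing here needs a webhook Deployment or certificates, which is the operational difference the paragraph above describes.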

A working guide: CEL for simple validations, Kyverno for the middle ground and image policy, OPA Gatekeeper for teams with complex cross-resource logic and a Rego culture.

What breaks PaC in production

failurePolicy: Ignore on a ValidatingWebhookConfiguration. The default in many tutorials. Under load or during an incident the policy engine degrades, admission falls back to allow, and the bypass activates precisely when it is most dangerous. Always set Fail.

enforcementAction: dryrun forever. Gatekeeper's killer feature is dryrun mode for observing blast radius before flipping to deny. But if it stays for half a year, it is no longer a policy, it is logging. Set a hard timeline: dryrun for one week, then deny.

One gigantic ConstraintTemplate / ClusterPolicy. Impossible to debug, impossible to review, and an exception for a single namespace breaks all the others.

PaC without unit tests. OPA ships a built-in test runner for Rego (opa test), and the Kyverno CLI has kyverno test. An AI-generated policy without a test is an AI-generated security incident.
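The first two failure modes are mechanical enough to lint for. The Python sketch below scans already-parsed manifests for webhooks with failurePolicy: Ignore and for Gatekeeper constraints that have sat in dryrun longer than a week. lint_manifests, the seven-day threshold as a constant, and the sample object names are hypothetical; the sketch assumes the objects were exported from the cluster, so metadata.creationTimestamp is present.

```python
from datetime import datetime, timedelta, timezone

MAX_DRYRUN_AGE = timedelta(days=7)  # "dryrun for one week, then deny"


def lint_manifests(manifests: list[dict], now: datetime) -> list[str]:
    """Flag fail-open webhooks and Gatekeeper constraints stuck in dryrun."""
    findings = []
    for manifest in manifests:
        kind = manifest.get("kind", "")
        metadata = manifest.get("metadata", {})
        name = metadata.get("name", "<unnamed>")

        if kind == "ValidatingWebhookConfiguration":
            for webhook in manifest.get("webhooks") or []:
                if webhook.get("failurePolicy") == "Ignore":
                    findings.append(f"{name}/{webhook.get('name')}: failurePolicy is Ignore")

        # Gatekeeper constraints live in the constraints.gatekeeper.sh API group.
        is_constraint = manifest.get("apiVersion", "").startswith("constraints.gatekeeper.sh/")
        in_dryrun = manifest.get("spec", {}).get("enforcementAction") == "dryrun"
        created_raw = metadata.get("creationTimestamp")
        if is_constraint and in_dryrun and created_raw:
            created = datetime.fromisoformat(created_raw.replace("Z", "+00:00"))
            age = now - created
            if age > MAX_DRYRUN_AGE:
                findings.append(f"{kind}/{name}: in dryrun for {age.days} days")
    return findings


# Hypothetical exported objects.
sample = [
    {"kind": "ValidatingWebhookConfiguration", "metadata": {"name": "policy-webhook"},
     "webhooks": [{"name": "validate.example.com", "failurePolicy": "Ignore"}]},
    {"apiVersion": "constraints.gatekeeper.sh/v1beta1", "kind": "K8sRequiredLabels",
     "metadata": {"name": "require-team", "creationTimestamp": "2026-01-01T00:00:00Z"},
     "spec": {"enforcementAction": "dryrun"}},
    {"apiVersion": "constraints.gatekeeper.sh/v1beta1", "kind": "K8sRequiredLabels",
     "metadata": {"name": "require-owner", "creationTimestamp": "2026-05-30T00:00:00Z"},
     "spec": {"enforcementAction": "dryrun"}},
]
fixed_now = datetime(2026, 6, 4, tzinfo=timezone.utc)

for finding in lint_manifests(sample, fixed_now):
    print(finding)
```

Passing now in as a parameter keeps the check deterministic, so the linter itself can be unit-tested in the same pipeline as the policies it guards.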

Where this lands

Policy as Code in the AI-code era is not a way to slow developers back down to pre-AI speeds. It is a set of deterministic guardrails that take human review off repetitive patterns (required labels, image tags, resource limits, IAM wildcards) and keep it where it actually matters (architecture, business logic, edge cases). The speed-versus-control paradox is resolved not by choosing «speed or control», but by moving control out of the manual lane into the automatic one.

Without those guardrails an AI agent is an amplifier of mistakes. With them, an AI agent is an amplifier of team throughput.

