Production patterns

This is the operating manual: repo layout, environment promotion, plan review, drift detection, secrets, state recovery, imports, naming, and agent-assisted workflows.

The problem

Production IaC is about predictable change. The important parts are small state boundaries, clear ownership, reviewed plans, safe secrets, recoverable state, and explicit promotion.

A good operator knows how to make a change, inspect the plan, back out when needed, and identify who must approve it.

Boring is the goal.
Review the plan, not just the pull request.
Recovery procedures are part of the system.

Vocabulary

Think of each root as a production database with code attached. It has an owner, backend, environment, credentials, promotion path, and incident procedure. If nobody owns the root, nobody owns the scope of change.

Promotion should be explicit. Dev, staging, and prod may share modules, but they should not accidentally share state. A prod apply should be a deliberate event with a saved plan or reviewed CI output.

Root: deployable boundary.
Module: reusable API.
Environment: separate inputs and usually separate state.
Plan: review artifact.
State: recovery-critical data.
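
The rule that environments may share modules but should not accidentally share state can be checked mechanically. The following is a minimal sketch, assuming a hand-maintained inventory that maps each root to its state location; the root names, bucket names, and the (bucket, key) shape are hypothetical and not taken from any particular tool.

```python
from collections import defaultdict

# Hypothetical inventory: root name -> (state bucket, state key).
# The last two entries collide on purpose to show what the check reports.
ROOTS = {
    "network-dev": ("org-state-dev", "network/state"),
    "network-prod": ("org-state-prod", "network/state"),
    "app-staging": ("org-state-prod", "app/state"),
    "app-prod": ("org-state-prod", "app/state"),
}


def find_shared_state(roots):
    """Return {state location: [root names]} for locations used by more than one root."""
    by_location = defaultdict(list)
    for name, location in roots.items():
        by_location[location].append(name)
    return {
        location: sorted(names)
        for location, names in by_location.items()
        if len(names) > 1
    }


if __name__ == "__main__":
    for location, names in find_shared_state(ROOTS).items():
        print(f"shared state at {location}: {', '.join(names)}")
```

Run in CI, a check like this turns "should not accidentally share state" into a failing build instead of a convention.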

How the loop works

Use a repo layout that makes ownership obvious. Put reusable modules in one area and live roots in another. Each live root should have backend config, provider config, variables, outputs, and a README explaining how to plan, apply, import, and recover.
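
The layout rule above can be audited with a short script. This is a sketch under stated assumptions: the file names in REQUIRED_FILES are illustrative (the text lists the required pieces, not what the files are called), and it assumes every directory directly under the live area is one root.

```python
from pathlib import Path

# Assumed file names, one per required piece: backend config, provider config,
# variables, outputs, and the README. Adjust to the repo's real conventions.
REQUIRED_FILES = ("backend.tf", "providers.tf", "variables.tf", "outputs.tf", "README.md")


def missing_files(root_dir, required=REQUIRED_FILES):
    """Return the required files that are absent from one root directory."""
    root = Path(root_dir)
    return [name for name in required if not (root / name).is_file()]


def audit_live_roots(live_dir, required=REQUIRED_FILES):
    """Map each root directory under live_dir to the files it is missing.

    Roots with nothing missing are left out of the report.
    """
    report = {}
    for child in sorted(Path(live_dir).iterdir()):
        if child.is_dir():
            gaps = missing_files(child, required)
            if gaps:
                report[child.name] = gaps
    return report
```

An empty report means every live root carries the minimum set of files. It says nothing about whether the README actually explains how to plan, apply, import, and recover, which still needs a human read.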

Secrets should flow through secret managers, CI secrets, managed identities, or platform bindings. IaC should create the secret container and its access rules; secret values should never be committed.

Schedule drift detection for important roots.
Document import commands next to the resource being imported.
Practice state recovery before an incident.
Use naming conventions as code, not as oral tradition.
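
"Naming conventions as code" can be as small as one validated pattern that CI runs against every proposed name. The sketch below assumes a hypothetical convention of team, service, and environment with an optional suffix, and a 63-character limit; neither comes from the text, so substitute the real policy.

```python
import re

# Assumed convention for illustration: <team>-<service>-<env>[-<suffix>],
# lowercase letters and digits, env limited to dev, staging, or prod.
NAME_PATTERN = re.compile(
    r"^(?P<team>[a-z][a-z0-9]*)"
    r"-(?P<service>[a-z][a-z0-9]*)"
    r"-(?P<env>dev|staging|prod)"
    r"(?:-(?P<suffix>[a-z0-9]+))?$"
)
MAX_LENGTH = 63  # assumed limit; many platforms impose their own


def check_name(name):
    """Return a list of problems with a resource name; an empty list means it passes."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not NAME_PATTERN.match(name):
        problems.append("does not match <team>-<service>-<env>[-<suffix>]")
    return problems


if __name__ == "__main__":
    for candidate in ("payments-api-prod", "Payments_API", "payments-api-qa"):
        print(candidate, check_name(candidate) or "ok")
```

Keeping the pattern in one module means a policy change is a reviewed diff rather than a message someone has to remember.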

Common mistakes

Agents can produce plausible IaC quickly. That is useful, but the output needs review. Agents may choose broad permissions, outdated arguments, wrong scopes, weak names, or destructive replacements. The human operator owns the credentials and the production effect.

Another production trap is importing existing infrastructure without finishing the cleanup. An import is not done until the next plan is a no-op, or every remaining diff is explained and intentional.

Never apply agent-written IaC without reading the plan.
Never let an import leave permanent unexplained drift.
Never approve a destructive plan because the code diff looks harmless.
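
The three rules above all come down to reading the plan itself, and part of that read can be automated. This sketch assumes a Terraform-style JSON plan (the kind produced by `terraform show -json` on a saved plan), where each entry in `resource_changes` carries an `address` and a `change.actions` list. The sample data is made up for illustration.

```python
import json

# Made-up sample shaped like a Terraform-style JSON plan.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {"address": "example_bucket.logs", "change": {"actions": ["no-op"]}},
    {"address": "example_role.app", "change": {"actions": ["update"]}},
    {"address": "example_db.main", "change": {"actions": ["delete", "create"]}}
  ]
}
""")


def summarize_plan(plan):
    """Group resource addresses by the kind of change the plan proposes."""
    summary = {"create": [], "update": [], "replace": [], "delete": []}
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        address = rc["address"]
        if "delete" in actions and "create" in actions:
            summary["replace"].append(address)  # destroyed and recreated
        elif "delete" in actions:
            summary["delete"].append(address)
        elif "create" in actions:
            summary["create"].append(address)
        elif "update" in actions:
            summary["update"].append(address)
        # "no-op" and "read" entries change nothing and are skipped.
    return summary


def is_noop(summary):
    """True when the plan proposes no changes, as it should after a finished import."""
    return not any(summary.values())


def is_destructive(summary):
    """True when any resource would be deleted or replaced."""
    return bool(summary["replace"] or summary["delete"])
```

A CI job could require explicit human approval whenever `is_destructive` is true, and fail an import branch until `is_noop` holds or each remaining address is explained in the pull request. The script narrows what the reviewer must read; it does not replace the review.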

Working pattern

Use agents for draft work: module scaffolds, provider argument lookup, README generation, import command preparation, and test ideas. Use humans for decisions: provider permissions, state movement, destructive changes, naming policy, production approval, and rollback strategy.

A mature workflow is fast because the dangerous parts are standardized. The operator loop is branch, edit, format, validate, plan, review, apply, verify, and leave a no-op plan behind.
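
The automated part of that loop can be standardized in a small runner. This is a sketch, assuming the Terraform CLI and a saved plan file named `tfplan`; both are assumptions, and branch, edit, and review remain human steps. The `approve` callback stands in for whatever review gate the team uses.

```python
import subprocess

# Assumed Terraform CLI commands for the automated steps.
# "verify" uses -detailed-exitcode, which exits 2 when changes remain,
# so a leftover diff is reported as a failure of the no-op check.
STEPS = [
    ("format", ["terraform", "fmt", "-check", "-recursive"]),
    ("validate", ["terraform", "validate"]),
    ("plan", ["terraform", "plan", "-out=tfplan"]),
    ("apply", ["terraform", "apply", "tfplan"]),
    ("verify", ["terraform", "plan", "-detailed-exitcode"]),
]


def run_step(cmd):
    """Run one command and return its exit code."""
    return subprocess.run(cmd).returncode


def run_loop(approve, run=run_step):
    """Run the steps in order, stopping at the first failure.

    approve: callable returning True once a human has reviewed the saved plan.
    Returns (completed step names, stop reason or None).
    """
    done = []
    for name, cmd in STEPS:
        if name == "apply" and not approve():
            return done, "plan not approved"
        code = run(cmd)
        if code != 0:
            return done, f"{name} failed with exit code {code}"
        done.append(name)
    return done, None
```

Passing `run` as a parameter keeps the loop testable without credentials, and putting the approval check immediately before apply means the plan that was reviewed is the plan that runs.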

Every root gets a README.
Every production apply gets an auditable plan.
Every import ends with a no-op or explained diff.
Every manual emergency fix becomes follow-up IaC work.

The review

Does each root have a clear scope of change?
Can a new operator recover state without guessing?
Are naming conventions enforced in code instead of remembered in chat?
Can an agent help draft changes without being allowed to approve risk?