
Policy as Code (PaC) means writing your governance rules (required tags, allowed regions, encryption standards, cost limits) as version-controlled files that are evaluated automatically against every Terraform or OpenTofu plan. The decision that matters most is not which tool you pick but the enforcement level you assign each policy: the same Rego file behaves completely differently as an advisory warning, a soft-mandatory gate with overrides, or a hard-mandatory block, and most of the rollout pain we see in Scalr's support queue traces back to getting that assignment wrong.
This guide covers the full implementation path for Terraform and OpenTofu environments: native HCL validation, OPA, Sentinel, Checkov, and tfsec; CI/CD integration; policy lifecycle management; and the failure modes teams actually hit in production.
Policy as Code means writing your governance policies in a programmable, machine-readable language instead of manual processes or human-readable documents. These policies cover security, compliance, cost management, and operational standards, and you write them in formats like JSON, YAML, or domain-specific languages (DSLs) like:
Once you've written them, you treat policies like application code: version-controlled, tested, audited, and deployed automatically. That turns governance from something reactive and manual into something that runs on its own.
Misconfigurations are a notorious weak link in cloud security. PaC lets you write your security standards down (data encryption, least-privilege access, benchmarks like CIS) and apply them automatically across every deployment. Catching problems this early helps you head off vulnerabilities and stay compliant with regulations like PCI DSS, HIPAA, GDPR, and SOC 2.
Applying policy by hand is a recipe for inconsistency and drift. Different teams read the same guidelines differently, and you end up with unpredictable results. PaC gives you one source of truth for operational rules, applied the same way from dev through production. That cuts down on manual mistakes and gives you more reliable infrastructure.
Cloud bills climb fast without governance. PaC keeps spend in check by enforcing rules on resource sizing, requiring cost-allocation tags, restricting deployments to cheaper regions, and blocking wasteful configurations. Those automatic checks can save you real money without slowing teams down.
Audits hurt less when your policies are version-controlled and your enforcement logs are complete. PaC gives you the audit trail and documentation regulators ask for, so you spend less time proving compliance by hand.
A mature PaC setup has a few connected pieces that feed into each other:
Policy Definition You write policies in a structured, machine-readable format and store them in version control. A clear policy spells out what's allowed and what isn't (e.g., "All S3 buckets must have versioning enabled"). Version control gives you history, lets people collaborate, and keeps everything auditable.
Policy Enforcement A policy doesn't do anything until something enforces it. PaC tools check configurations or plans against your policies and then act, anywhere from blocking a deployment to firing an alert. That can happen at a few stages: a developer's laptop, the CI/CD pipeline, or a platform-level gate.
Policy Testing You wouldn't ship application code without testing it, and policies are no different. Testing makes sure a policy works and doesn't accidentally block legitimate work. Try it out in a controlled environment first.
Policy Monitoring and Auditing Monitoring tracks configurations, logs violations, and produces compliance reports. That data is the feedback loop you use to refine policies over time. And again, version-controlled policies plus enforcement logs make audits a lot less painful and easier to defend.
Terraform can validate configurations right inside your HCL. These native features are a first line of defense: they let module authors enforce contracts and catch basic errors early. For the full input-variable validation reference, see our Terraform variables and outputs guide.
Terraform allows custom validation rules using precondition and postcondition blocks:
Preconditions are evaluated before processing an object (resource, data source, or output). They validate assumptions before an action is taken:
resource "aws_instance" "example" {
ami = var.ami_id
instance_type = var.instance_type
lifecycle {
precondition {
condition = can(regex("^ami-", var.ami_id))
error_message = "The provided AMI ID must start with 'ami-'."
}
precondition {
condition = var.instance_type == "t3.micro" || var.instance_type == "t2.micro"
error_message = "Only t3.micro or t2.micro instance types are allowed."
}
}
}Postconditions are evaluated after an object has been processed. They verify guarantees about the resulting state:
resource "aws_s3_bucket" "example" {
bucket = "my-unique-bucket-12345"
lifecycle {
postcondition {
condition = self.versioning[0].enabled == true
error_message = "S3 bucket versioning was not enabled as expected."
}
}
}Introduced in Terraform v1.5.0, check blocks offer validation focused on overall state rather than individual resource lifecycles:
check "all_s3_buckets_have_logging" {
assert {
condition = alltrue([
for bucket in aws_s3_bucket.all : bucket.logging != null
])
error_message = "Not all S3 buckets have server access logging configured."
}
}When an assertion fails, a check block usually just warns instead of stopping the run, which makes it a good fit for ongoing validation and health checks. Terraform Cloud can validate these checks continuously.
They're useful, but Terraform's built-in features only go so far:
Once you outgrow these limits, you need a dedicated PaC tool to govern properly.
Open Policy Agent (OPA) is an open-source, CNCF-graduated, general-purpose policy engine that decouples policy decision-making from policy enforcement. For a hands-on intro, see OPA Series Part 1: Open Policy Agent and Terraform.
The core workflow involves three steps:
Evaluate with OPA: Feed the JSON plan and your policies to OPA
conftest test --policy ./policy/ tfplan.jsonConvert to JSON: OPA evaluates JSON data, so convert the binary plan
terraform show -json tfplan.binary > tfplan.jsonGenerate a Terraform Plan: Create a binary Terraform execution plan
terraform plan -out=tfplan.binaryOPA produces a decision (allow, deny, or a set of violations), which your CI/CD pipeline acts upon.
Policies in OPA are written in Rego, a declarative language built for querying complex, hierarchical data and expressing policies:
package terraform.aws.s3_versioning
# This rule generates a message for each S3 bucket
# being created without versioning enabled
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.mode == "managed"
action := resource.change.actions[_]
action == "create"
not resource.change.after.versioning
msg := sprintf("S3 bucket '%s' must have versioning enabled.", [resource.address])
}
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_s3_bucket"
resource.mode == "managed"
action := resource.change.actions[_]
action == "create"
resource.change.after.versioning
not resource.change.after.versioning[0].enabled
msg := sprintf("S3 bucket '%s' has versioning but it is not enabled.", [resource.address])
}In this example:
package terraform.aws.s3_versioning organizes the policydeny[msg] { ... } defines rules that generate violation messagesinput.resource_changes[_] iterates through proposed resource changessprintf() creates readable violation messagesconftest is a CLI tool that bundles OPA and simplifies policy testing:
conftest test --policy ./policy/ tfplan.jsonOutput shows whether policies pass or lists violations:
FAIL - tfplan.json - terraform.aws.s3_versioning - S3 bucket 'aws_s3_bucket.my_bucket' must have versioning enabled.
Advantages:
Considerations:
Sentinel is HashiCorp's embedded Policy as Code framework, tightly integrated into Terraform Cloud and Terraform Enterprise.
Sentinel policies are evaluated as an integral part of the run workflow, specifically after terraform plan and before terraform apply. Sentinel receives rich context:
With all that context, you can make pretty sophisticated policy decisions.
Policies are written in HSL (HashiCorp Sentinel Language), a dynamically typed language designed for approachability:
import "tfplan/v2" as tfplan
# Function to find all EC2 instances being created or updated
find_ec2_instances = func() {
instances = {}
for tfplan.resource_changes as address, rc {
if rc.type is "aws_instance" and
(rc.change.actions contains "create" or rc.change.actions contains "update") {
instances[address] = rc
}
}
return instances
}
# Rule: All instances must have the 'Owner' tag
all_instances_have_owner_tag = rule {
all find_ec2_instances() as _, instance {
instance.change.after.tags contains "Owner"
}
}
# Main policy rule
main = rule {
all_instances_have_owner_tag
}Sentinel policies support three enforcement levels:
Advantages:
Considerations:
OPA and Sentinel work on Terraform plans (the intended state), but static analysis tools read your raw HCL (the defined state) to catch common misconfigurations early.
tfsec is an open-source static analysis tool specifically designed to find security misconfigurations in Terraform code.
How it works:
.tf files directly without requiring Terraform initialization or plan generationStrengths:
Note: tfsec development is being consolidated into Aqua Security's Trivy, a broader container and IaC security scanner.
Checkov, by Bridgecrew (now Palo Alto Networks), is a broader static analysis tool supporting Terraform, CloudFormation, Kubernetes, Helm, Dockerfile, and more.
Capabilities:
.tf filesOne caution on severity-based gating. A public-sector team we worked with at Scalr wanted Checkov to fail runs on MEDIUM and HIGH findings while letting LOW findings pass, and tried --skip-check LOW and then --soft-fail-on LOW. Neither flag had any effect; every LOW finding still failed the run. The reason is that when Checkov runs offline, its built-in checks carry no severity metadata at all: severity is None unless you authenticate against the Prisma Cloud API, so a severity-based threshold matches nothing. The configuration that actually worked listed exact check IDs instead, --soft-fail-on CKV2_AWS_41,CKV_AWS_79,CKV_AWS_126, which exits 0 only when every failing check appears on that list. If you are building tiered enforcement on open-source Checkov, plan to maintain explicit check-ID lists rather than severity tiers.
See related articles for deeper coverage:
Advantages:
Considerations:
Scalr is a Terraform/OpenTofu management platform with native Policy as Code integration, enabling organizations to implement governance at scale. Scalr includes OPA enforcement in its per-run pricing, with no per-user fees.
Scalr integrates with established open-source policy frameworks:
Scalr has native OPA integration. Policies written in Rego evaluate Terraform and OpenTofu run data with fine-grained control. Scalr expects Rego policies to define a deny set where each item represents a policy violation. For deeper structural patterns, see OPA Series Part 2: OPA Logic and Structure for Scalr and Part 3: How to Analyze the JSON Plan; for ready-made starters, Part 4: Simple Policies for Scalr.
For static analysis and vulnerability scanning, Scalr integrates Checkov to identify misconfigurations before infrastructure is provisioned. Scans occur before Terraform initialization.
Scalr OPA policies enforce at two distinct stages:

Policies evaluated before the Terraform plan is generated. The tfrun input data includes:

Policies evaluated after plan generation. The tfplan input data includes all tfrun data plus the full JSON representation of the plan, allowing inspection of proposed resource changes.
Hard Mandatory:
Soft Mandatory:
Advisory:
Two practical notes from teams running this in production. First, enforcement levels are set per policy, not per environment. If you want the same tag policy hard-mandatory in production but soft-mandatory in dev, today that means maintaining two policy groups with duplicated Rego and a different scalr-policy.hcl in each. One customer standardizing tag enforcement ended up duplicating an entire policy directory to vary a single enforcement_level line; budget for that structure up front if your environments need different strictness.
Second, advisory mode is more useful than it looks. A customer in a regulated industry had a FinOps team that wanted alerts whenever large EC2 instances appeared in dev plans, but explicitly never wanted to block a developer. Rather than fighting over enforcement, they shipped it as an advisory policy and wired the violation events into EventBridge for email notification. The policy never gates a run, and the FinOps team gets exactly the signal they asked for. Not every policy needs to be a gate; some are sensors.

Scalr promotes a GitOps model for policy management:
scalr-policy.hclExample scalr-policy.hcl:
version = "v1"
policy "terraform_min_version" {
enabled = true
enforcement_level = "hard-mandatory"
}
policy "limit_modules" {
enabled = false
enforcement_level = "soft-mandatory"
}
policy "workspace_name_convention" {
enabled = true
enforcement_level = "advisory"
}Scalr makes it easy to reuse OPA code across multiple policies by pointing a policy group at a shared-functions directory that every policy in the group can import. Functions and imports are standard Rego, so the same helpers don't have to be copied into every policy. Lots of policies end up sharing the same OPA code, and that's exactly where importing helps. For example, you can put shared helper functions in a utility file:
utils.rego:
package utils
array_contains(arr, elem) {
arr[_] = elem
}
get_basename(path) = basename {
arr := split(path, "/")
basename := arr[count(arr)-1]
}That file can then be imported into any policy:
main.rego:
package terraform
import input.tfplan as tfplan
import data.utils
not_allowed_provider = ["null"]
deny[reason] {
resource := tfplan.resource_changes[_]
action := resource.change.actions[count(resource.change.actions) - 1]
utils.array_contains(["create", "update"], action)
provider_name := utils.get_basename(resource.provider_name)
utils.array_contains(not_allowed_provider, provider_name)
reason := sprintf("%s: provider type %q is not allowed", [resource.address, provider_name])
}In Scalr, the OPA administrator sets the path to the shared functions directory in the policy group configuration, and the import data.utils statement pulls the utility file into any policy that references it. No more duplicated code, and policies are easier to maintain as you scale.
Shared functions come with one easy-to-miss configuration detail. A team we worked with at Scalr during an environment-wide OPA rollout had two policies in one repository: enforce_aws_tag set to advisory and enforce_tf_min_version set to hard-mandatory in scalr-policy.hcl. Whenever the advisory tag policy failed, the hard-mandatory version policy failed with it, blocking applies that should have sailed through with a warning. Both policies passed locally and in Docker against openpolicyagent/opa:1.3.0, so the team burned hours on the wrong hypotheses: swapping OPA versions (1.0.0, 1.4.2, 1.6.0), rewriting rules between the deny[msg] if and deny contains msg if syntaxes, and asking the reasonable question of how the platform knows which deny set belongs to which policy (answer: the policy names in scalr-policy.hcl must match the .rego filenames).
The actual cause was none of that. The policy group had the shared-functions option enabled but pointed at a directory with no shared functions in it. With that option set, the per-policy enforcement separation collapsed, so every policy's violations were evaluated together and the strictest enforcement level won. Clearing the functions dropdown in the policy group configuration fixed it immediately. The takeaway: only enable the shared-functions path when you actually have shared code, and when two policies seem entangled, check the policy group configuration before you start rewriting Rego.
Terraform gave us one declarative language for infrastructure automation, and OPA does the same for policy automation. You can enforce OPA policies in Terraform and OpenTofu, but also across Kubernetes, CI/CD pipelines, API gateways, and plenty of other systems. So your team learns one policy language (Rego) and applies it everywhere, which makes life easier for both DevOps and SecOps.
Example 1: Cost Control
package terraform
import input.tfrun as tfrun
deny[reason] {
cost = tfrun.cost_estimate.proposed_monthly_cost
cost > 100
reason := sprintf("Plan exceeds $100 limit: $%.2f", [cost])
}Example 2: S3 Bucket Security
package terraform
import input.tfplan as tfplan
deny[reason] {
r = tfplan.resource_changes[_]
r.mode == "managed"
r.type == "aws_s3_bucket"
r.change.after.acl == "public"
reason := sprintf("S3 bucket %s must not be PUBLIC", [r.address])
}Example 3: Multi-Cloud Instance Types
package terraform
import input.tfplan as tfplan
allowed_types = {
"aws": ["t2.nano", "t2.micro"],
"azurerm": ["Standard_A0", "Standard_A1"],
"google": ["n1-standard-1", "n1-standard-2"]
}
instance_type_key = {
"aws": "instance_type",
"azurerm": "vm_size",
"google": "machine_type"
}
get_basename(path) = basename {
arr := split(path, "/")
basename := arr[count(arr)-1]
}
get_instance_type(resource) = instance_type {
provider_name := get_basename(resource.provider_name)
instance_type := resource.change.after[instance_type_key[provider_name]]
}
deny[reason] {
resource := tfplan.resource_changes[_]
instance_type := get_instance_type(resource)
provider_name := get_basename(resource.provider_name)
not array_contains(allowed_types[provider_name], instance_type)
reason := sprintf("%s: instance type %q not allowed", [resource.address, instance_type])
}Scalr provides the Scalr/sample-tf-opa-policies GitHub repository with example policies covering:
| Feature | OPA/conftest | Sentinel | tfsec | Checkov |
|---|---|---|---|---|
| Language | Rego | HSL | Built-in/JSON/YAML/Rego | Built-in/Python/YAML |
| Evaluation Type | Plan-based | Plan/State/Config-based | Static HCL | Static HCL + Plan |
| Learning Curve | Moderate | Easy | Low | Low |
| Vendor | Open-source | HashiCorp | Open-source | Palo Alto Networks |
| Integration | CI/CD tools | TFC/TFE native | CLI/pre-commit | CLI/pre-commit/IDE |
| Expressiveness | Very high | High | Medium | Medium |
| Customization | Highly flexible | Good | Good | Good |
Choose OPA/conftest when:
Choose Sentinel when:
Choose tfsec when:
Choose Checkov when:
The teams who do this best usually stack a layered approach:
For the broader CI/CD context that this section plugs into, see our CI/CD and GitOps for Terraform & OpenTofu pillar.
Integrate tools into pre-commit hooks to catch issues before code is pushed:
# .pre-commit-config.yaml
repos:
- repo: https://github.com/terraform-linters/pre-commit-terraform
rev: v1.81.0
hooks:
- id: terraform_fmt
- id: terraform_validate
- id: tfsec
- id: checkovBenefits: Fastest feedback loop; developers catch issues immediately on their machines.
Trigger automated checks when a PR is opened:
# .github/workflows/terraform-checks.yml
name: Terraform Validation
on: [pull_request]
jobs:
tfsec:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: aquasecurity/[email protected]
with:
working_directory: ./terraform/
checkov:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: bridgecrewio/checkov-action@master
with:
directory: ./terraform/Benefits: Enforces coding standards; catches static misconfigurations before merge; provides PR feedback.
After terraform plan, convert to JSON and validate against organizational policies:
# In CI/CD pipeline
terraform plan -out=tfplan.binary
terraform show -json tfplan.binary > tfplan.json
conftest test --policy ./policies/ tfplan.json
# Exit with failure if violations found
if [ $? -ne 0 ]; then
echo "Policy violations detected"
exit 1
fiBenefits: Validates intended state with full context; catches issues requiring resolved values; final gate before apply.
Prevent deployment of non-compliant infrastructure:
# Plan-time validation failure blocks this stage
if [ "$PLAN_VALIDATION_STATUS" == "FAILED" ]; then
echo "Cannot proceed to apply due to policy violations"
exit 1
fi
terraform apply tfplan.binaryBenefits: Hard enforcement; prevents non-compliant infrastructure from being deployed.
name: Terraform Deployment
on:
pull_request:
paths:
- 'terraform/**'
push:
branches:
- main
paths:
- 'terraform/**'
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
with:
terraform_version: 1.6.0
- name: Terraform Format
run: terraform fmt -check -recursive ./terraform/
- name: Terraform Validate
run: |
cd terraform
terraform init -backend=false
terraform validate
- name: Run tfsec
uses: aquasecurity/[email protected]
with:
working_directory: ./terraform/
- name: Run Checkov
uses: bridgecrewio/checkov-action@master
with:
directory: ./terraform/
framework: terraform
plan:
needs: validate
runs-on: ubuntu-latest
if: github.event_name == 'pull_request'
steps:
- uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
- name: Terraform Plan
run: |
cd terraform
terraform init
terraform plan -out=tfplan.binary
- name: Convert Plan to JSON
run: |
cd terraform
terraform show -json tfplan.binary > tfplan.json
- name: Setup conftest
run: |
wget https://github.com/open-policy-agent/conftest/releases/download/v0.49.0/conftest_0.49.0_Linux_x86_64.tar.gz
tar xf conftest_0.49.0_Linux_x86_64.tar.gz
- name: Run Policy Checks
run: |
cd terraform
../conftest test --policy ../policies/ tfplan.json
apply:
needs: plan
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- uses: actions/checkout@v3
- name: Setup Terraform
uses: hashicorp/setup-terraform@v2
- name: Terraform Apply
run: |
cd terraform
terraform init
terraform apply -auto-approveCollaborate Across Teams Define policy requirements with security, compliance, operations, and development teams to ensure policies are practical and effective.
Document Thoroughly For each policy, document:
Modularize and Reuse Write reusable policy functions and modules:
# reusable function for tagging checks
required_tags(resource) {
resource.change.after.tags["Owner"]
resource.change.after.tags["Environment"]
resource.change.after.tags["CostCenter"]
}
# use the function across multiple resource checks
deny[msg] {
resource := input.resource_changes[_]
resource.type == "aws_instance"
not required_tags(resource)
msg := sprintf("%s missing required tags", [resource.address])
}A common authoring mistake is assuming the policy engine will load arbitrary data files sitting next to your .rego code. One customer building workspace-level access policies kept a workspace_configs.json file (an allow_users/deny_users map per workspace) in the policy repository alongside the Rego, expecting to reference it as external data. It never loaded: a hosted OPA integration typically evaluates only the policy input document, not repository data files, so the JSON sat there inert while the policies evaluated against an empty map.
The fix was to encode the data as Rego itself. They moved the map into a shared package:
package common.access
access_map := {
"prod-network": {"allow_users": ["[email protected]"]},
"prod-db": {"deny_users": ["[email protected]"]}
}
ws_access(ws) = access {
access := access_map[ws]
} else = {} { true }Then imported data.common.access.ws_access into the main package terraform policy, matched input.tfrun.workspace.name against the map, and checked the requesting user via created_by.email and created_by.teams, with the policy declared as policy_type = "workspace" in a v2 scalr-policy.hcl. An alternate pattern, useful when the data changes more often than the policies: inject the map through non-sensitive shell variables and read it from input.tfrun.shell_variables. Either way, the rule of thumb is that everything a policy needs must arrive through the input document or be compiled into the policy bundle. There is no third channel.
Unit Test Individual Rules OPA supports testing with _test.rego files:
package terraform.aws.s3_versioning
test_s3_bucket_without_versioning {
deny["S3 bucket 'aws_s3_bucket.bad' must have versioning enabled."] with input as {
"resource_changes": [
{
"type": "aws_s3_bucket",
"mode": "managed",
"address": "aws_s3_bucket.bad",
"change": {
"actions": ["create"],
"after": {}
}
}
]
}
}
test_s3_bucket_with_versioning {
deny[_] == false with input as {
"resource_changes": [
{
"type": "aws_s3_bucket",
"mode": "managed",
"address": "aws_s3_bucket.good",
"change": {
"actions": ["create"],
"after": {
"versioning": [{"enabled": true}]
}
}
}
]
}
}Run tests:
opa test policies/Use Representative Mock Data Create realistic tfplan.json snippets based on actual plans from your infrastructure.
Automate Testing in CI Integrate policy tests into your policy repository's CI pipeline so every change triggers tests.
Store in Git All policies, tests, and documentation belong in Git:
policy-repository/
├── policies/
│ ├── security/
│ │ ├── s3_encryption.rego
│ │ ├── iam_policies.rego
│ │ └── s3_encryption_test.rego
│ ├── cost/
│ │ ├── instance_size_limits.rego
│ │ └── instance_size_limits_test.rego
│ └── compliance/
│ ├── tagging.rego
│ └── tagging_test.rego
├── scalr-policy.hcl
├── README.md
└── CHANGELOG.md
Use Semantic Versioning Tag releases in Git using MAJOR.MINOR.PATCH:
Gradual Rollout Strategy Introduce new policies in advisory or soft-mandatory mode first. Monitor impact, fix false positives, then escalate to hard-mandatory enforcement.
Version Timeline Example
Continuous Improvement Policies aren't static. Regularly review and update based on:
Monitor Policy Performance Track:
Establish Policy Review Cadence Schedule quarterly or semi-annual reviews to retire obsolete policies, refine overly broad ones, and add new governance requirements.
Plan for the Enforcement Layer Failing The policy engine is itself infrastructure, and it can break in ways your policies never anticipated. One customer running self-hosted agents had every run in their policy-targeted environments start failing, sometimes with Policy Check task contains checks for multiple OPA version: [...], more often with a generic agent error that pointed nowhere. Their policies had not changed; a platform regression had. The emergency workaround was the uncomfortable one: detach every environment from every policy group so teams could keep shipping, which meant running with governance off until the regression was fixed (about a day). That failure mode deserves a runbook. Decide in advance who is authorized to disable enforcement during an outage, how long governance-off is acceptable, and how you re-verify compliance for anything applied during the window. The default failure mode of a broken policy gate is either "nobody ships" or "nothing is checked," and you want to choose deliberately rather than at 2 a.m.
When policies wrongly flag compliant resources, the problem lies with the policy, not the infrastructure:
Example: Policy flagging S3 buckets as unversioned when versioning is managed elsewhere:
# BEFORE: Overly broad
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_s3_bucket"
not r.change.after.versioning
msg := sprintf("%s must have versioning", [r.address])
}
# AFTER: Refined to exclude lifecycle rules
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_s3_bucket"
not r.change.after.versioning
not r.change.after.lifecycle # exclude if managed by lifecycle
msg := sprintf("%s must have versioning", [r.address])
}The inverse problem, a "false negative" that turns out to be a pass, wastes time too. One customer writing deny-list Checkov policies built custom YAML checks (CKV2_* IDs with cond_type: resource and operator: not_exists) to block azuread_* resources and azurerm_role_assignment outside designated workspaces. The checks lived in .checkov/custom_policies/, and because runs produced no output for them, the team concluded the policies "weren't loading." They tried --external-checks-dir, pointed at an external checks repository, and started converting the YAML to Python before opening a support case. Nothing was broken: a not_exists check only produces output when an offending resource actually appears in the plan, so silence was the pass. The lesson generalizes. Before debugging a policy that "isn't firing," write a deliberately violating configuration and confirm the policy catches it. A policy with no failing test case has never been proven to load.
Establish a formal, audited process for exceptions:
Exception Policy Template
Exception Request:
Resource: aws_instance.legacy_system
Policy: instance_size_limit
Reason: Legacy system requires t3.xlarge for compatibility
Duration: 90 days (until Q3 migration)
Approved by: [email protected]
Approved date: 2026-02-11
Review date: 2026-05-12Some tools allow in-code suppressions for legitimate exceptions:
Checkov skip comments:
# Suppress specific checks
resource "aws_s3_bucket" "legacy" {
# checkov:skip=CKV_AWS_21:This bucket intentionally allows public read for legacy reasons. Review by 2026-06-01.
bucket = "legacy-public-bucket"
acl = "public-read"
}Guidelines for Suppressions:
Stop treating security as one central team's job and make it shared.
Security and compliance belong to developers, operations, and security together, not to one gatekeeping team. To make that real, give developers the tools, training, and documentation they need to understand and fix violations themselves, and get teams collaborating on policy design and refinement rather than designing rules in isolation. And when PaC drives a genuine infrastructure improvement, say so publicly. Visible wins are how you keep people bought in.
Cross-functional Policy Definition Involve:
Regular Policy Reviews Schedule monthly or quarterly meetings to:
Invest in Team Development Provide comprehensive training on:
Create Documentation Develop and maintain:
Avoid Boiling the Ocean Don't try to implement comprehensive governance immediately:
Clear Error Messages Provide actionable violation messages:
# POOR: Vague message
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_instance"
not r.change.after.tags["Owner"]
msg := "Missing tags"
}
# GOOD: Actionable message with remediation
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_instance"
not r.change.after.tags["Owner"]
msg := sprintf(
"%s: missing required 'Owner' tag. Add tags = { Owner = \"your-name\" } to fix.",
[r.address]
)
}Documentation Links in Violations Include documentation and policy justification:
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_instance"
r.change.after.instance_type == "t3.2xlarge"
msg := sprintf(
"%s: instance type t3.2xlarge exceeds limits. See policy at https://wiki.company.com/policies/instance-sizing",
[r.address]
)
}Developer Feedback Channels
Demonstrate ROI Present metrics showing value:
Address Concerns Proactively address common objections:
Start with Pilot Launch PaC with a single team or environment to:
Policy Principles
Anti-patterns to Avoid
# ANTI-PATTERN: Too broad, many false positives
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_instance"
msg := "EC2 instances must follow org standards"
}
# BETTER: Specific, testable
deny[msg] {
r = input.resource_changes[_]
r.type == "aws_instance"
not r.change.after.monitoring[0].enabled
msg := sprintf("%s: detailed monitoring must be enabled", [r.address])
}Evaluate Policy Impact Monitor CI/CD performance:
Optimize Heavy Policies For policies that evaluate many resources:
# INEFFICIENT: Evaluates all resources every time
deny[msg] {
r = input.resource_changes[_]
# ... complex conditions ...
}
# EFFICIENT: Pre-filter resources first
deny[msg] {
resources := [r | r := input.resource_changes[_]; r.type == "aws_instance"]
count(resources) > 0
r := resources[_]
# ... conditions on r ...
}Use OPA for policy consistency across infrastructure tools:
# Single policy engine for Kubernetes and Terraform
package compliance.pod_security
deny[msg] {
# This rule can evaluate both k8s pods and terraform containers
container := input.containers[_]
not container.securityContext.runAsNonRoot
msg := sprintf("Container %s must run as non-root", [container.name])
}Generate compliance artifacts from policy data:
#!/bin/bash
# Generate compliance report from policy violations
conftest test --output json --policy ./policies/ tfplan.json > violations.json
# Convert to compliance report format
python3 << 'EOF'
import json
import sys
from datetime import datetime
with open('violations.json') as f:
violations = json.load(f)
report = {
'timestamp': datetime.now().isoformat(),
'summary': {
'total_violations': len(violations),
'high_severity': len([v for v in violations if v.get('severity') == 'high']),
'medium_severity': len([v for v in violations if v.get('severity') == 'medium']),
'low_severity': len([v for v in violations if v.get('severity') == 'low']),
},
'violations': violations
}
print(json.dumps(report, indent=2))
EOFPutting Policy as Code into practice is mostly a sequencing problem. Start by figuring out where your current governance actually leaks: the gaps and pain points you already know about, plus the ones that show up once you look. Pick tools that fit the platform you already run rather than the ones with the longest feature list (OPA, Sentinel, or the static analyzers, depending on where your runs happen). Then write the few policies that address your worst problems first, and expand from there instead of trying to cover everything at once.
The enforcement only sticks if it lives in the pipeline. Wire policy checks into CI/CD at the stages that matter (pre-commit, PR, plan-time, and the platform gate before apply) so a violation is caught where the developer can fix it. And the part that is easy to skip: keep the people side moving alongside the tooling. Policies that security, compliance, operations, and developers all had a hand in writing get followed; policies handed down by one team get worked around.
For more detail on specific tools and approaches, see:
