
Your CI/CD pipeline is a loaded gun. You know drift happens — what's easy to underestimate is what your automation does the instant it meets a change it didn't make. Put -auto-approve in the chain and "reconcile to code" can mean destroying live production to get there. This is that failure mode in detail — how a routine commit turns an out-of-band change into an outage — plus the detection telemetry worth building before it does, for both Terraform and OpenTofu.
Infrastructure drift, or configuration drift, occurs when the actual, live state of your deployed infrastructure diverges from the intended state defined in your Infrastructure as Code configuration files and state. Your code no longer accurately represents what's running in your cloud environment.
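In a Terraform context, drift is the difference between what your configuration and state file say should exist and what the cloud provider's APIs report is actually running.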
Here's how a single manual click can take down a production database for an hour.
Imagine a team that can't restore a week-old backup of an Azure Cosmos DB instance: it's on Periodic backup, configured to retain only a day of backups. An operator opens the Azure Portal, switches the account to Continuous backup (30-day point-in-time restore), and hits Save. Azure accepts it. The console goes green. Problem solved, or so it looks.
What they don't realize: Azure makes the Periodic-to-Continuous upgrade irreversible. There is no path back to Periodic.
Weeks later, an unrelated, application-only PR merges and triggers a routine terraform apply -auto-approve. Terraform refreshes state, sees that the live account now reports Continuous while the configuration still declares Periodic, and tries to reconcile the difference. Because the provider can't move the account back to Periodic in place, the only plan it can compute to make reality match the code is to destroy and recreate the live production database. The pipeline executes instantly. Engineers scramble to cancel the GitHub Actions runner, but the API call has already reached Azure Resource Manager. Production is down for an hour until Azure Support recovers the data.
This is a one-way door attribute: some cloud settings, once changed, have no API path back. Pair one with -auto-approve and Terraform's only way to reconcile the drift is to destroy and recreate the resource. The drift didn't cause the outage — the automated reconciliation of it did.
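One way to stop -auto-approve from walking through a one-way door is to inspect the saved plan before applying it. A minimal guard, assuming the pipeline saves its plan with terraform plan -out=tfplan and exports it with terraform show -json tfplan > plan.json: it scans the resource_changes array of Terraform's JSON plan output and refuses to continue if any action is a delete, which is how both a destroy and a forced replacement show up. The file names and the plan_gate.py script name are placeholders.

```python
import json
import sys


def destructive_changes(plan: dict) -> list:
    """Return the addresses of resources the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement appears as ["delete", "create"] or ["create", "delete"]
        if "delete" in actions:
            flagged.append(rc["address"])
    return flagged


def gate(plan_path: str) -> int:
    """Pipeline exit code: 0 if the plan is safe to auto-apply, 1 if not."""
    with open(plan_path) as f:
        plan = json.load(f)
    flagged = destructive_changes(plan)
    for address in flagged:
        print(f"BLOCKED: plan would destroy or replace {address}")
    return 1 if flagged else 0


# Only act as a CLI when given a plan file: python plan_gate.py plan.json
if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(gate(sys.argv[1]))
```

Applying the saved plan afterwards (terraform apply tfplan) applies exactly what was checked instead of re-planning under -auto-approve. A blocked run then goes to a human, which is the whole point: the Cosmos DB replacement is precisely the case that needs one.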
Drift isn't usually malicious; it creeps in through everyday operational realities:
Manual console changes ("ClickOps") are the most common cause. Engineers make quick changes directly in cloud provider consoles to fix urgent issues or test something, bypassing the IaC workflow entirely. Emergency security patches, performance tuning, and debugging often trigger manual changes. It's not always a modification, either: a subnet or IAM role deleted by hand in the console shows up on the next plan as a resource Terraform wants to recreate.
Multiple tools managing the same resources without proper coordination cause conflicting changes. Terraform provisions a server while Ansible later modifies its network configuration independently, or a security tool like AWS Config or Security Hub auto-remediates a finding — re-enabling S3 default encryption, resetting a wide-open security group — and writes straight to the cloud API without ever touching Terraform's state. (More on why that specific collision is so vicious in the remediation section below.)
Critical incidents sometimes necessitate immediate manual changes to restore service. If these aren't backported to the IaC code, they become persistent drift that diverges further over time.
Operations teams or developers run custom scripts to modify resources outside the purview of the primary IaC tool, often without documentation or version control.
Team members unfamiliar with IaC principles might make direct changes, underestimating the cascading impact on infrastructure consistency.
Drift doesn't require anyone touching the cloud at all. When a VCS access token expires, webhook processing stops — pull requests stop triggering plans, merges keep landing in Git, and nothing reaches the cloud. Code and reality diverge with every commit, and there's no failed run to alert on because no runs are happening. Across Scalr's own fleet we measured 121 broken VCS connections in a single 30-day window, 79 of them fully broken providers on paid accounts. Drift by silence is one of the most common patterns in Scalr's support queue: the GitOps pipeline dies without an error anyone sees, and the gap only surfaces weeks later when someone runs a plan by hand. Monitor your VCS connection health the same way you monitor the infrastructure itself.
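Because a dead pipeline produces silence rather than a failed run, the alert has to fire on absence. A minimal sketch of that check, assuming you can look up two timestamps per workspace: the newest commit on the tracked branch and the start of the most recent run (fetching them depends on your VCS and platform, so that part is left out). The one-hour threshold is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def pipeline_looks_dead(
    last_commit_at: datetime,
    last_run_at: Optional[datetime],
    now: Optional[datetime] = None,
    max_wait: timedelta = timedelta(hours=1),
) -> bool:
    """True when the newest commit has waited too long without triggering a run."""
    now = now or datetime.now(timezone.utc)
    # A run that started after the newest commit means webhooks are still flowing
    if last_run_at is not None and last_run_at >= last_commit_at:
        return False
    # The newest commit has no run yet: alert once it has waited past max_wait
    return now - last_commit_at > max_wait
```

Schedule the check independently of the VCS connection it watches, for example from a plain cron job; if it were triggered by the same webhook, it would go quiet along with everything else.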
Auto-scaling groups replace instances, managed databases perform automated maintenance, and cloud providers change default settings (forcing TLS 1.2 on storage accounts, flipping default S3 encryption). These provider-initiated changes alter resource configurations dynamically, and Terraform may then try to downgrade them back to your now-stale config.
Ignoring drift introduces serious business risks:
| Risk | Impact |
|---|---|
| Security Gaps | Drift can undo carefully configured security settings (altered firewall rules, S3 bucket policies, IAM permissions), inadvertently opening vulnerabilities to attack |
| Compliance Violations | Unauthorized changes can breach PCI DSS, HIPAA, SOC 2, or GDPR requirements, resulting in failed audits and potential fines |
| Budget Blowouts | Unmanaged resources or unintended scaling lead to surprise cost increases and operational overhead in tracking "ghost" infrastructure |
| Stability & Reliability | When code isn't the source of truth, troubleshooting becomes guesswork, leading to unpredictable behavior and downtime |
| Reduced Agility | When teams stop trusting that the code matches reality, they hesitate to deploy, which slows delivery and increases deployment friction |
Terraform and OpenTofu provide foundational tools for detecting drift. These native commands are your first line of defense — but they only work as well as your state file and remote backend setup allow.
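The terraform plan command is your primary drift detection tool. When executed, Terraform reads the current state file, refreshes it by querying the real infrastructure through provider APIs, compares the refreshed state against your configuration, and outputs the changes needed to reconcile the two: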
```text
$ terraform plan
An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_s3_bucket.example will be updated in-place
  ~ resource "aws_s3_bucket" "example" {
        id = "my-example-bucket"
      ~ versioning {
          ~ enabled = false -> true
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
The ~ symbol marks an in-place update. Here it signals drift: the live bucket has versioning disabled while your code declares it enabled, so Terraform proposes changing it back.
While terraform plan implicitly performs a refresh, you can run terraform refresh as a standalone command. This updates your state file to reflect the real-world state of resources without making any changes to your infrastructure.
```shell
terraform refresh
```

Important: the standalone refresh command is deprecated. Terraform deprecated terraform refresh because its default behavior is unsafe if provider credentials are misconfigured, and OpenTofu carries the same deprecation for tofu refresh. Instead, use terraform apply -refresh-only or tofu apply -refresh-only, which perform the same refresh but let you review the changes before committing them to state.

```shell
# Recommended approach (works for both Terraform and OpenTofu)
terraform apply -refresh-only
tofu apply -refresh-only
```

For CI/CD pipeline integration, use the -detailed-exitcode flag:
```shell
terraform plan -detailed-exitcode
# Returns:
# 0 - No changes (no drift)
# 1 - Error occurred
# 2 - Changes present (drift, or code changes not yet applied)
```

That exit code is the whole detection mechanism, and you don't need an expensive SaaS scanner to act on it. Run the plan on a cron, catch exit code 2, and pipe the raw plan output straight to Slack:
````python
import subprocess

import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

# -input=false keeps a cron run from hanging on an interactive prompt
result = subprocess.run(
    ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
    capture_output=True,
    text=True,
)

# 0 = no changes, 1 = error, 2 = changes present (drift or unapplied code)
if result.returncode == 2:
    # Post only the tail of the plan to stay within Slack's message limits
    requests.post(SLACK_WEBHOOK, json={
        "text": f":rotating_light: *Drift detected*\n```{result.stdout[-3000:]}```"
    }, timeout=10)
elif result.returncode == 1:
    requests.post(SLACK_WEBHOOK, json={
        "text": f":warning: *Drift check failed*\n```{result.stderr[-3000:]}```"
    }, timeout=10)
````

A cron entry running this against each workspace gives you the bones of a detection system for the cost of a few lines of Python. Where it stops scaling (credentials, state locks, per-workspace config) is exactly what the platform section below covers.
While essential, native commands have significant limitations:
The state-file dependency creates failures that plan will never warn you about. AWS EventBridge Rules, for example, don't have a naturally stateful API model. If two separate Terraform workspaces deploy distinct resources but happen to share the same rule name, they overwrite each other's parameters. Each workspace's terraform plan keeps reporting "No changes" — because each state file believes it owns the rule — while the live configuration gets stomped back and forth out-of-band. The thing that's supposed to detect drift is structurally blind to it.
False negatives happen at the dashboard layer too. A Scalr customer came to us after their drift dashboard reported no drift on a workspace where a live terraform plan showed 7 resources pending destruction. The detector and the plan were comparing different things — and the discrepancy only surfaced because someone happened to run a plan manually. Whatever you use for drift detection, know exactly what it compares (Git vs. last applied state vs. live cloud API) and when it last ran. A green badge tells you the detector found nothing the last time it looked, with the inputs it had; it does not tell you the next apply is safe.
A nonzero plan isn't always evidence of an out-of-band change. A team we worked with at Scalr during a Terraform Cloud migration ran a controlled experiment: they built fresh infrastructure in TFC, applied until the plan showed zero pending changes, confirmed zero drift, then migrated the workspace by moving the state file. The first post-migration plan reported 11 changes — on infrastructure that was verifiably drift-free both before and after the move, with nobody touching the cloud in between. The "drift" was an artifact of the state handoff between tools, not a real divergence. When a plan lights up immediately after a migration, refactor, or provider upgrade, read the diff before treating it as drift: tooling transitions generate phantom changes, and reverting them can do real damage.
Native commands can only reason about resources in your state file. Everything else — manually created resources, another team's stack, shadow infrastructure — is invisible. The Infrastructure Coverage Ratio quantifies that blind spot:
ICR = (R_managed / R_total) × 100
R_managed = resources defined in your version-controlled IaC
R_total = total active resources scanned live via cloud provider APIs
A low ICR means a large share of your footprint can be changed out-of-band and never show up in a plan at all. Track it over time: a falling ICR is a leading indicator that ClickOps is outrunning your codebase.
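The arithmetic is trivial; the real work is collecting comparable resource identifiers from both sides. A minimal sketch, assuming you already have one set of IDs from your state files and one from a live inventory scan (how you gather them, and the sample IDs below, are illustrative):

```python
def infrastructure_coverage_ratio(managed_ids: set, live_ids: set) -> float:
    """ICR = (R_managed / R_total) x 100."""
    if not live_ids:
        raise ValueError("no live resources scanned; ICR is undefined")
    # R_managed: IaC-defined resources that actually exist live, so stale
    # entries for already-deleted resources don't inflate coverage
    r_managed = len(managed_ids & live_ids)
    r_total = len(live_ids)
    return 100 * r_managed / r_total


managed = {"vpc-0a1", "sg-0b2", "i-0c3"}
live = {"vpc-0a1", "sg-0b2", "i-0c3", "i-0d4", "my-shadow-bucket"}
print(f"ICR: {infrastructure_coverage_ratio(managed, live):.0f}%")
```

Here three of the five live resources are under IaC, so the ICR is 60%. Intersecting with the live set is a design choice: counting everything in state would credit you for coverage of resources that no longer exist.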
Catching drift is reactive; stopping it from happening is cheaper. Prevention takes both technical controls and team habits. Layer in Policy as Code to block guardrail-violating changes before they reach apply, and pair it with IaC security scanning on every PR.
Make Git your single source of truth. All infrastructure changes must flow through pull requests with required reviews before being applied. This creates an audit trail, ensures all changes are codified, and makes rollback a matter of reverting a commit.
Key practices:
Mature platform teams treat production as having exactly one writer: the pipeline. A human writing directly to prod is an exception that has to be accounted for.
They don't pretend the exception never happens; incidents force someone into the console eventually. So they bound it. Manual access goes through short-lived break-glass credentials, every session is logged, and there's a hard reconciliation window — whatever you changed by hand has to land back in HCL before the clock runs out, or it gets flagged. The manual change is allowed; leaving it un-codified is what trips the alarm.
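Once break-glass sessions are logged, the reconciliation window is easy to enforce mechanically. A minimal sketch, assuming each session record carries a start time and, once the change is back in code, the time it was reconciled; the record shape, the 48-hour window, and the engineer names are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class BreakGlassSession:
    engineer: str
    started_at: datetime
    reconciled_at: Optional[datetime] = None  # when the manual change landed in HCL


def overdue(sessions: List[BreakGlassSession], now: datetime,
            window: timedelta = timedelta(hours=48)) -> List[BreakGlassSession]:
    """Sessions whose manual changes were not codified inside the window."""
    flagged = []
    for s in sessions:
        deadline = s.started_at + window
        if s.reconciled_at is None:
            # Still open: only a problem once the clock has run out
            if now > deadline:
                flagged.append(s)
        elif s.reconciled_at > deadline:
            # Codified, but late: still worth flagging for review
            flagged.append(s)
    return flagged
```

Flagging late reconciliations as well as open ones keeps the window meaningful; otherwise a change codified three weeks later would look identical to one codified the same afternoon.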
In practice:
"Just lock down IAM write permissions in prod and your drift problems disappear" ignores org reality. If 40+ engineers are used to ClickOps portal access, revoking it isn't a config change — it's a political project. Most platform teams are 12 to 18 months from a full console lockdown, so they live in a hybrid state where manual edits and declarative code constantly collide. One writer is the goal; break-glass access with a tight reconciliation window is how you survive the years before you get there.
Regularly schedule drift detection to catch unauthorized changes quickly. Detection frequency depends on your risk tolerance and operational tempo.
Scheduling strategies:
Schedule around your runner capacity, too. Drift checks consume the same execution slots as production plans and applies. A customer in APAC had their scheduled drift detection fire in the middle of the business day and saturate all 5 of their concurrent run slots for about 30 minutes — every normal plan and apply queued behind the drift sweep until it finished. The fix was unglamorous: move the schedule to 3 AM local time and budget capacity for the sweep. If your drift checks run while engineers are shipping, you've traded drift risk for deployment latency.
Define and enforce policies automatically using Open Policy Agent (OPA) or Sentinel. Policies are checked before terraform apply runs, preventing non-compliant changes.
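```rego
# Example OPA policy, evaluated against the JSON plan from terraform show -json
package terraform.aws.s3

import rego.v1

deny contains msg if {
	# Bind one resource change so every check below applies to the same bucket
	rc := input.resource_changes[_]
	rc.type == "aws_s3_bucket"
	rc.change.after != null # skip deletions
	# Unset blocks can be absent or an empty list in plan JSON; treat both as missing
	encryption := object.get(rc.change.after, "server_side_encryption_configuration", [])
	count(encryption) == 0
	msg := sprintf("%s: S3 buckets must have server-side encryption configured.", [rc.address])
}
```

This policy blocks creation of S3 buckets without server-side encryption, closing off a common source of drift and security violations. In AWS provider v4 and later, the inline encryption argument is deprecated in favor of the separate aws_s3_bucket_server_side_encryption_configuration resource, so adapt the check to the provider version you run.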
Once drift is detected, you have two main philosophies and multiple tactical approaches. Making the wrong call can be worse than the drift itself: blindly reverting an emergency scaling event could cause an outage, while ignoring a security group modification could leave you exposed.
Prioritize your Terraform code as the source of truth. Run terraform apply to revert the infrastructure to match your coded state. This is the right call when drift is unauthorized or unintentional: someone opened a security group port that shouldn't be open, or a manual change broke the expected configuration.
When to revert:
When NOT to revert:
Process:
The risk with reverting is timing. If you revert automatically at 3 AM without understanding the context, you might undo an emergency change that's keeping production running.
The 2 AM SSH reflex. During a high-severity outage, an on-call engineer couldn't reach a critical server. They opened the AWS console and edited a security group to allow SSH on port 22 from 0.0.0.0/0, got back in, resolved the incident, and went back to sleep. The hotfix was never written into the HCL or even logged in Jira. Three months later, a routine deployment ran terraform plan, saw the unauthorized port-22 rule, and did exactly what it was built to do: it closed it. By then the team had wired that open port into downstream admin automation — so closing it triggered a fresh production outage that took half a day to trace back to a security group nobody remembered touching.
Why we don't auto-remediate. The industry reflex is "detect drift → auto-apply to fix it." On paper it's clean; in practice an automated revert can't tell an attack from an emergency change that's the only thing keeping the service up. It gets worse when another system is the one writing: a security tool (AWS Config, Security Hub) auto-remediates a finding, writes to the cloud API without updating state, and on the next run Terraform flags the security fix as drift and tries to revert it back to the non-compliant HCL. Now your pipeline and your security automation are in a continuous, resource-burning tug-of-war, each undoing the other — a loop other teams have hit with AWS Config. That auto-remediation collision loop is why platforms like Scalr deliberately keep a human in the loop — surfacing the drift and the three paths, and letting an engineer decide, rather than blindly applying.
Accept the drifted state as the new desired state. Update your Terraform .tf files to match the actual infrastructure. Suitable for intentional changes like emergency hotfixes that need codification — and as other teams have learned the hard way, the code has to be aligned to the live state before the next apply, or Terraform reverts the fix.
When to align:
Process:
For simple attribute changes, updating the .tf file and running terraform plan to confirm zero diff is straightforward. For resources that were created outside Terraform entirely, you'll need terraform import to bring them into state.
Not all drift requires action. Some changes are expected, temporary, or managed by other systems. The key is to acknowledge them deliberately rather than leaving them as unreviewed noise in your drift reports.
When to ignore:
The danger of ignoring drift is that it becomes habitual. If your team starts dismissing all drift alerts, you'll miss the one that actually matters. Good drift hygiene means reviewing every detection, making an explicit decision, and documenting why you chose to ignore it.
When you detect drift, run through these questions in order:
1. Is the change security-sensitive? If an IAM policy, security group, encryption setting, or access control was modified, treat it as high-priority. Revert immediately unless you can confirm the change was authorized and intentional.
2. Was the change intentional? Check with your team. If someone made a deliberate change during an incident or as part of a planned activity, the right path is usually to align your code rather than revert.
3. Is the change still needed? Emergency scaling events are intentional but temporary. If the incident is resolved and the extra capacity isn't needed, revert. If it is, align your code.
4. Is it managed by another system? Auto-scaling groups, Kubernetes operators, and other automation tools legitimately modify resources. If another system is authoritative for that resource attribute, consider using lifecycle { ignore_changes } in your Terraform configuration to prevent false positives going forward.
5. Can you explain why you're ignoring it? If you can't articulate a clear reason, don't ignore it. The inability to explain drift is itself a signal that something unexpected happened.
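The order of the questions matters, and encoding it makes sure every detection ends in an explicit decision. A sketch, assuming the answer to each question is already known (in practice a human supplies them, not a script); the DriftFinding fields and the returned labels are hypothetical names:

```python
from dataclasses import dataclass


@dataclass
class DriftFinding:
    security_sensitive: bool     # IAM, security groups, encryption, access control
    confirmed_intentional: bool  # deliberate and authorized, confirmed with the team
    still_needed: bool           # the live value is still required
    other_system_owns_it: bool   # autoscaler, operator, tagging automation, etc.


def triage(f: DriftFinding) -> str:
    """Walk the five questions in order; return revert, align, ignore or investigate."""
    # 1. Unconfirmed security-sensitive drift is reverted immediately
    if f.security_sensitive and not f.confirmed_intentional:
        return "revert"
    # 2-3. Deliberate change: codify it if still needed, otherwise revert
    if f.confirmed_intentional:
        return "align" if f.still_needed else "revert"
    # 4. Another system is authoritative: ignore it explicitly (ignore_changes)
    if f.other_system_owns_it:
        return "ignore"
    # 5. Nobody can explain the change, which is itself the signal: don't ignore it
    return "investigate"
```

Note that the fallthrough is "investigate", never "ignore": ignoring has to be earned by a concrete reason, exactly as question 5 demands.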
When drift represents intentional changes that should be captured, record the live values in your state file without modifying infrastructure, then update your .tf code to match; otherwise the next plan will propose reverting them:

```shell
# Terraform - validate non-destructive changes first
terraform plan -target=aws_instance.example
# Update state to match actual infrastructure
terraform apply -refresh-only
```

When drift represents unauthorized changes, generate and apply a plan to revert:
```shell
# Create specific target plan
terraform plan -target=aws_security_group.web_sg -out=tf.plan
# Review the plan carefully
terraform show tf.plan
# Apply if correct
terraform apply tf.plan
```

When resources were created outside Terraform, import them to bring them under IaC management:
```shell
# Import existing resource
terraform import aws_s3_bucket.data bucket-name
```

```hcl
# For Terraform 1.5+, use import blocks
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}
```

Not all drift requires immediate attention. Establish a prioritization framework based on business impact and risk:
| Type | Priority | Example | Approach |
|---|---|---|---|
| Security-critical | P0 | Modified security groups, IAM policies | Immediate remediation |
| Business-critical | P1 | Changes to production databases, load balancers | Scheduled remediation |
| Configuration drift | P2 | Instance type changes, tag modifications | Batch remediation |
| Informational | P3 | Comment changes, cosmetic differences | Document for next update |
You don't need invented job titles for this. In practice, drift duties sit with the platform team and run on a rotation: whoever holds the drift pager that week triages new detections against the priority table above, decides revert/align/ignore, and makes sure any manual change gets reconciled into HCL before its reconciliation window closes. Ownership of a given workspace's drift follows whoever owns that workspace's code; the rotation just guarantees someone is always looking.
While native Terraform commands provide a foundation, mature IaC management platforms offer significantly enhanced capabilities. The difference changes the operational model fundamentally. For a hands-on look at scheduled drift checks, see how to set up scheduled drift detection; for the integrated platform approach, see our deep dive into Scalr's platform architecture.
Before reaching for a platform, look at what manual drift detection actually takes at scale. The shell script running terraform plan -detailed-exitcode is the easy part. The hard part is everything around it:
Credential management. Every drift check needs valid cloud provider credentials. For AWS, that means IAM roles or access keys for each account. For multi-cloud setups, you're managing credentials across AWS, Azure, GCP, and whatever else you run. These credentials need rotation, and when they expire every check fails with an authentication error; unless your script alerts on failures as well as on drift, detection quietly stops.
State locking. When your drift detection script runs terraform plan, it acquires a state lock. If a developer triggers a real plan at the same time, one of them fails. At scale, this contention becomes a real problem: your drift checks start interfering with production deployments.
Per-workspace configuration. Each Terraform workspace needs its own script invocation with the right backend config, variable files, and provider configuration. Adding a new workspace means updating your drift detection setup. Teams inevitably forget, and new workspaces go unmonitored.
Alerting and reporting. A cron job that prints "DRIFT DETECTED" to a log file isn't actionable. You need to parse the plan output, send structured alerts, track which workspaces have unresolved drift, and give someone a way to act on the findings. This is effectively building a dashboard from scratch.
Maintenance. Terraform CLI updates, provider version changes, backend configuration changes, and CI/CD platform migrations all break drift detection scripts. These scripts are never anyone's primary responsibility, so they break unnoticed and stay broken for weeks.
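Two of those costs can at least be softened in a script. A minimal sketch, assuming every root module lives in its own directory under one repository and reusable modules sit under a modules/ directory (both conventions are assumptions): discovering workspaces by scanning for .tf files covers a new directory on the next run without anyone editing a list, and -lock-timeout makes the drift check wait for a held state lock rather than failing at once.

```python
import subprocess
from pathlib import Path
from typing import List


def discover_workspaces(root: Path, skip=("modules", ".terraform")) -> List[Path]:
    """Every directory under root holding .tf files, so new workspaces are found automatically."""
    found = set()
    for tf_file in root.rglob("*.tf"):
        rel_parts = tf_file.relative_to(root).parts
        # Skip reusable modules and Terraform's local module/provider cache
        if any(part in skip for part in rel_parts):
            continue
        found.add(tf_file.parent)
    return sorted(found)


def check_drift(workspace: Path) -> int:
    """Init and plan one workspace; return the plan's -detailed-exitcode."""
    init = subprocess.run(["terraform", f"-chdir={workspace}", "init", "-input=false"])
    if init.returncode != 0:
        return 1  # treat a failed init like a failed plan
    # -lock-timeout waits for a held state lock instead of failing immediately
    plan = subprocess.run([
        "terraform", f"-chdir={workspace}", "plan",
        "-detailed-exitcode", "-input=false", "-lock-timeout=5m",
    ])
    return plan.returncode
```

Looping with for ws in discover_workspaces(Path("infra")) and feeding each exit code into the Slack alert from earlier gives you org-wide coverage. It does nothing for the developer whose own plan collides with a running drift check, which is why scheduling sweeps outside working hours still matters.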
If you're running fewer than 10 workspaces, all in one cloud provider, with a small team, the manual approach is fine. Beyond that, the operational cost justifies a platform.
Scalr is an Infrastructure as Code management platform providing drift detection, reporting, and remediation options for both Terraform and OpenTofu environments. It treats drift detection as a first-class platform feature rather than something you bolt on with scripts.
Scalr employs flexible detection strategies:
This dual-source comparison catches more deviations than plan-based detection alone.
In Scalr, drift detection is enabled per environment, not per workspace. When you turn it on for your production environment and set a daily schedule, every workspace in that environment is automatically covered. New workspaces inherit the drift detection policy the moment they're created, so there's no separate configuration step to forget.
This is a meaningful architectural difference. Manual approaches require explicit opt-in per workspace. Scalr's environment-level model is opt-out -- you'd have to deliberately exclude a workspace from drift detection. The default is coverage, not gaps.
That coverage now extends to Terragrunt run-all workspaces. Multi-module stacks orchestrated through run-all are checked on the same environment schedule as your standard Terraform and OpenTofu workspaces, so drift in a Terragrunt-managed deployment surfaces in the Drift Detection tab alongside everything else, with no separate tooling for your Terragrunt stacks.
To set it up:
There's no script to maintain, no credentials to manage separately, and no state lock conflicts: Scalr coordinates drift checks with regular runs automatically.
Scalr deliberately does not provide fully automated remediation. The platform requires explicit user intervention, prioritizing safety and deliberate action:
The advantage of handling remediation through a platform rather than CLI commands is visibility. When an engineer reverts drift from Scalr, the action is logged, attributed to a user, and visible to the team. When someone runs terraform apply from their laptop to fix drift, nobody else knows it happened.
| Capability | Manual (cron/CI) | Scalr |
|---|---|---|
| Setup per workspace | Script + config per workspace | None (inherits from environment) |
| New workspace coverage | Manual opt-in (often forgotten) | Automatic |
| Credential management | Separate service accounts | Reuses existing credentials |
| State lock handling | Contention with prod runs | Coordinated automatically |
| Alerting | Build your own | Native Slack integration |
| Org-wide visibility | Build your own dashboard | Built-in dashboards |
| Remediation | Separate CLI step | Integrated in same UI |
| Audit trail | CI/CD logs (if retained) | Full audit of detections + actions |
| Maintenance burden | Scripts break with TF/provider updates | Zero (managed by platform) |
| Best for | <10 workspaces, single cloud | 10+ workspaces, multi-team |
The inflection point is usually around 15-20 workspaces, or when a second team starts managing infrastructure. At that point, the operational cost of maintaining per-workspace scripts, managing credentials, and aggregating alerts exceeds the cost of adopting a platform.
Several tools address drift detection, each with distinct philosophies and strengths.
| Feature | Scalr | env0 | Terramate | Driftive | Snyk IaC |
|---|---|---|---|---|---|
| Primary Focus | User-controlled drift mgmt | AI-powered analysis | Orchestration + auto-remediate | Notification-first detection | Unmanaged resources |
| Scheduled Detection | Yes (Native) | Yes (Native) | Yes (CI/CD config) | Manual/scripted | Yes (Integrated) |
| Unmanaged Resources | Not prioritized | Not prioritized | Limited | Limited | Yes (Primary) |
| Remediation | Ignore/Sync/Revert | Auto-policies & more | Automated reconcile | Manual via notifications | Manual |
| OpenTofu Support | Yes (Founding member) | Yes (Founding member) | Yes | Yes | Unconfirmed |
| Reporting & Alerts | UI/Dashboard/Slack | UI/Notifications/AI | Cloud UI/Slack | Slack/GitHub Issues | CLI/Snyk UI |
| Best For | Control-focused orgs | Deep analysis needs | High automation orgs | OSS/self-hosted | Shadow IT concerns |
For large AWS or multi-cloud environments, manual detection becomes impractical. Implement automated, scaled detection:
Schedule regular drift detection in your CI/CD pipeline:
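```yaml
# GitHub Actions example
name: Terraform Drift Detection

on:
  schedule:
    - cron: '0 8 * * *'  # Daily at 08:00 UTC (Actions schedules run in UTC)

jobs:
  detect_drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false  # pass terraform's exit codes through untouched

      # Cloud credentials (for example via OIDC) must be configured before init

      - name: Terraform Init
        run: terraform init -input=false

      - name: Check Drift
        run: |
          # Actions runs bash with -e, so capture the exit code instead of
          # letting a non-zero plan abort the step before it can be checked
          set +e
          terraform plan -detailed-exitcode -input=false
          code=$?
          set -e
          if [ "$code" -eq 1 ]; then
            exit 1  # plan errored: fail the job so someone notices
          elif [ "$code" -eq 2 ]; then
            echo "Drift detected!"
            # Send notification to Slack/email
          fi
```

For AWS Organizations, a service control policy (SCP) can block manual changes to Terraform-managed resources: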
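```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyOutOfBandChangesToTerraformManaged",
      "Effect": "Deny",
      "Action": [
        "ec2:ModifyInstanceAttribute",
        "rds:ModifyDBInstance"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {"aws:ResourceTag/ManagedBy": "Terraform"},
        "ArnNotLike": {"aws:PrincipalArn": "arn:aws:iam::*:role/terraform-pipeline"}
      }
    }
  ]
}
```

This policy denies direct modification of resources tagged ManagedBy = Terraform for every principal except the pipeline role, preventing drift at the source. The terraform-pipeline role name is a placeholder for your own; without an exception like this, the SCP would also block Terraform's own applies.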
Problem: Team members make emergency changes via the AWS console
Prevention:
Recovery:
terraform import operations to bring resources under IaC

Problem: Auto Scaling Groups and managed services modify resources automatically (one of the most common drift questions on r/terraform)
Solution:
Use lifecycle blocks to ignore expected changes:
```hcl
resource "aws_autoscaling_group" "web" {
  # ... other arguments ...

  lifecycle {
    # The autoscaler owns capacity; don't report its changes as drift
    ignore_changes = [desired_capacity]
  }
}
```

Watch for false positives. Not everything a detector flags is real drift. Tools that compare raw cloud state against your last apply can surface provider-managed read-only fields that terraform plan never shows as a diff. One team kept getting paged because their detector flagged latest_restorable_time on an aws_db_instance every scan: a timestamp AWS bumps on its own, not something anyone changed. Acknowledge fields like this once and filter them out; if you treat provider-generated noise as drift, people start ignoring the report, and then they miss the change that matters.
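Filtering known noise is straightforward once it's written down. A sketch, assuming drift entries shaped like the resource_drift records in Terraform's JSON plan output (terraform show -json), each with an address, a type, and change.before / change.after objects; the KNOWN_NOISE allowlist is a hypothetical example seeded with the field from that story, and only top-level attributes are compared:

```python
from typing import Dict, List

# Hypothetical allowlist: (resource type, attribute) pairs the provider updates itself
KNOWN_NOISE = {
    ("aws_db_instance", "latest_restorable_time"),
}


def changed_attributes(before: dict, after: dict) -> set:
    """Top-level attribute names whose values differ between two snapshots."""
    keys = set(before) | set(after)
    return {k for k in keys if before.get(k) != after.get(k)}


def meaningful_drift(entries: List[dict], noise=KNOWN_NOISE) -> Dict[str, List[str]]:
    """Drop acknowledged noise; return {address: [changed attributes]}."""
    report = {}
    for entry in entries:
        change = entry.get("change", {})
        attrs = changed_attributes(change.get("before") or {}, change.get("after") or {})
        attrs = {a for a in attrs if (entry["type"], a) not in noise}
        if attrs:
            report[entry["address"]] = sorted(attrs)
    return report
```

Keeping the allowlist in version control next to the Terraform code means acknowledging a noisy field is itself a reviewed change rather than a silent mute.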
Problem: terraform apply operations fail midway, leaving partial state
Solution:
-target carefully with state locking

Implement recovery procedures:

```shell
terraform apply -refresh-only       # Synchronize state
terraform plan -detailed-exitcode   # Validate state
```

Problem: Other systems modify AWS resources independently
A common version: a platform team ran org-wide automation that stamped every AWS resource with cost-allocation tags (prefixes like auto: and cpf-). Those tags lived in the cloud but never in the Terraform, so every plan reported drift on resources nobody had touched in code — and a blind apply would have stripped the tags their finance team depended on for chargeback. This is the textbook ignore, don't revert case: the out-of-band change is intentional and authoritative, so the fix is to make Terraform blind to it rather than fight it. The AWS provider can ignore whole tag families at the source:
```hcl
provider "aws" {
  ignore_tags {
    key_prefixes = ["auto:", "cpf-"]
  }
}
```

Solution:
ignore_tags (or scoped ignore_changes) for fields another system legitimately owns

run-all Cross-Stack Drift

Problem: With run-all, applying one stack re-plans its dependencies, and shared provider configuration can surface as drift across stack boundaries
A team with separate VPC and VPN stacks shared a provider default_tags block that set managedBy. The VPC stack was clean on its own, but running the dependent VPN stack via run-all re-planned the VPC and reported drift — proposing to remove the managedBy tag from VPC resources. The change was never real; it was an artifact of cross-stack re-planning.
Solution:
run-all evaluates dependencies, so a clean stack can show drift only when reached through a dependent one
ignore_changes rather than letting a run-all reconcile it away

For compliance-grade evidence of who-changed-what, layer Terraform audit logs into the workflows below.
Leadership & Documentation:
Team Training:
Combine native commands with platform-based detection:
terraform plan

Create decision trees for different drift types:
Prevention is vastly more efficient than remediation:
Maintain visibility into drift patterns:
Go deeper: For a hands-on implementation guide, see how to set up scheduled drift detection.
Drift is inevitable; losing track of it isn't. Get terraform plan running on a schedule this week so you know your baseline. Decide your three remediation paths (revert, align, ignore) before you're staring at a 3 AM alert, not during it. Prevent what you can with GitOps discipline and policy as code. And once you're past a dozen or so workspaces, stop hand-maintaining drift scripts and let a platform coordinate detection, alerting, and remediation in one place. Scheduled drift detection is built into Scalr, where drift-detection runs don't count against the run allowance and the free tier includes up to 50 runs per month.
Everything else builds on knowing where you stand. Start there.
