Terraform Drift Detection: How to Prevent and Remediate

Learn how to manage Terraform drift, automated drift detection, safe remediation options, and the tools to keep your infrastructure secure.

Infrastructure drift is a critical issue for any team using Terraform. It occurs when the real-world state of your infrastructure no longer matches the configuration defined in your code. This discrepancy can introduce security vulnerabilities, cause application failures, and lead to unexpected costs. This post explains what drift is, how to prevent it, and how Scalr can help you detect, remediate, and report on it.

What is Terraform Drift?

In simple terms, Terraform drift is the difference between your infrastructure's actual state and what your Terraform configuration files say it should be. The state file is Terraform's source of truth, but if changes are made to the infrastructure outside of a Terraform workflow (for example, through a cloud provider's web console), the state file becomes outdated and no longer reflects reality.

Why is Drift a Problem?

Drift introduces several challenges to infrastructure management. When your infrastructure deviates from its intended state, inconsistencies arise, potentially leading to unexpected and undesirable behavior. This can also create security vulnerabilities, as manual changes might bypass established security protocols. Ultimately, managing and updating an infrastructure becomes significantly more complex when its actual state doesn't align with its Terraform configuration, hindering operational efficiency.

How Drift Happens and How to Prevent It

Drift is often the result of manual changes. Common scenarios include:

  • Emergency Fixes: A security engineer might manually update a firewall rule to address an immediate threat.
  • Testing and Debugging: A developer could temporarily scale up a resource to test performance and forget to revert the change.
  • Lack of Access: A team member without Terraform access might make a change through the cloud provider's UI.

To be proactive and prevent drift, teams should:

  • Enforce GitOps: All infrastructure changes should be initiated through a pull request and reviewed before being applied. This creates an audit trail and ensures all changes are codified.
  • Restrict Manual Access: Limit who can make manual changes in your cloud environment. Use Role-Based Access Control (RBAC) to enforce the principle of least privilege.
  • Implement Continuous Checks: Regularly schedule drift detection to catch any unauthorized changes quickly.

How to Detect Drift

Terraform provides built-in commands and dedicated features to help you identify when your actual infrastructure deviates from your defined configuration. Regular checks are key to catching unintended changes before they escalate.

  • terraform plan: This command refreshes Terraform's view of your real infrastructure, compares it with your configuration, and shows any differences without changing anything.
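  • terraform refresh: This command updates your Terraform state file to match the current state of your infrastructure. It is deprecated in recent Terraform versions in favor of terraform apply -refresh-only, which shows the detected changes and asks for approval before writing them to state.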
  • Third-party tools: Several tools and platforms, such as Scalr, driftctl, and Terraform Cloud, can help you detect and manage drift.
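
The plan-based check above can be scripted. The sketch below wraps terraform plan -detailed-exitcode, whose documented exit codes are 0 (no changes), 1 (error), and 2 (changes present). The function names, the workdir parameter, and the choice to add -refresh-only by default are assumptions for illustration. Without -refresh-only, exit code 2 also covers configuration changes that have simply not been applied yet.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {0: "in_sync", 1: "error", 2: "changes_detected"}


def interpret_plan_exit(code: int) -> str:
    """Translate a -detailed-exitcode result into a short status label."""
    return EXIT_MEANINGS.get(code, "unknown")


def build_plan_command(binary: str = "terraform", refresh_only: bool = True) -> list:
    """Assemble the plan command; `binary` could also be "tofu"."""
    cmd = [binary, "plan", "-detailed-exitcode", "-input=false", "-no-color"]
    if refresh_only:
        # Report only changes made outside Terraform, not pending config edits.
        cmd.append("-refresh-only")
    return cmd


def check_drift(workdir: str, binary: str = "terraform") -> str:
    """Run the plan in `workdir` (a hypothetical path) and return a status label."""
    result = subprocess.run(build_plan_command(binary), cwd=workdir)
    return interpret_plan_exit(result.returncode)
```

A scheduler such as cron or a CI pipeline could call check_drift for each working directory and alert when the result is "changes_detected".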

When Terraform detects drift during a plan, it prints a notice like the following, followed by the affected resources:

Note: Objects have changed outside of Terraform
Terraform detected the following changes made outside of Terraform since the
last "terraform apply" which may have affected this plan:
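
That notice is written for humans. For automation, a saved plan can be rendered as JSON with terraform show -json, and the JSON plan representation includes a resource_drift list describing changes made outside Terraform. The sketch below extracts the affected addresses. The sample document is hand-written and heavily abbreviated for illustration, not captured output, and the resource name in it is hypothetical.

```python
import json


def drifted_resources(plan_json: str) -> list:
    """Return (address, actions) pairs from a JSON plan's resource_drift list."""
    plan = json.loads(plan_json)
    drifted = []
    # The key may be absent when no out-of-band changes were found.
    for entry in plan.get("resource_drift", []):
        actions = entry.get("change", {}).get("actions", [])
        drifted.append((entry["address"], actions))
    return drifted


# Hand-written, abbreviated sample; not real Terraform output.
SAMPLE_PLAN = json.dumps({
    "resource_drift": [
        {
            "address": "aws_security_group.web",
            "change": {"actions": ["update"]},
        }
    ],
})

print(drifted_resources(SAMPLE_PLAN))
```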

How Scalr Detects Drift

Note: Scalr does not charge for drift runs, and the drift feature is included on all tiers.

Scalr has a built-in, automated drift detection feature. You can enable it at the environment level and schedule it to run as frequently as you need. When a drift check runs, Scalr compares the current state of your infrastructure with the last known good state from your Terraform code. If any discrepancies are found, Scalr flags the workspace as "drifted."

Drift detected in Scalr

Remediation Options in Scalr

When drift is detected, Scalr provides several options for remediation. We believe in a user-controlled approach rather than automatic remediation, because auto-remediation can be risky. An automatic rollback could unintentionally discard a critical, emergency change.

Remediate the drift in Scalr

Scalr empowers you with the following choices:

  • Ignore: If the change was intentional, you can simply acknowledge and ignore the drift.
  • Sync State: For expected changes, you can sync the state file. This updates the state to match the new reality, so Terraform's record of your infrastructure is accurate again.
  • Revert Infrastructure: For unauthorized changes, Scalr can generate a plan to revert the infrastructure to its last known good configuration. You can review this plan and then approve its application.
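
Outside Scalr, the three choices have rough plain-CLI analogues. The mapping below is an illustrative sketch, not a description of how Scalr implements these options: the Remediation enum, the function name, and the revert.tfplan file name are hypothetical, while the commands themselves are standard Terraform CLI invocations.

```python
from enum import Enum


class Remediation(Enum):
    IGNORE = "ignore"
    SYNC_STATE = "sync_state"
    REVERT = "revert"


def cli_steps(choice: Remediation) -> list:
    """Return the Terraform commands that roughly correspond to a choice."""
    if choice is Remediation.IGNORE:
        # Acknowledge only; neither state nor infrastructure is changed.
        return []
    if choice is Remediation.SYNC_STATE:
        # Accept the real-world values into state without touching resources.
        return [["terraform", "apply", "-refresh-only"]]
    # REVERT: save a plan for review, then apply exactly that plan.
    return [
        ["terraform", "plan", "-out=revert.tfplan"],
        ["terraform", "apply", "revert.tfplan"],
    ]
```

After a state sync, the configuration still describes the old values, so the next regular plan will propose reverting the change unless the code is updated to match.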

Drift Notifications

Immediate awareness is key to managing drift. Scalr integrates directly with Slack to send real-time notifications whenever drift is detected. These alerts are sent to a designated channel, ensuring that the right team members are immediately informed of any unauthorized changes to your infrastructure.

Get notifications directly in Slack
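
Teams that run their own drift checks outside Scalr can produce a similar alert with a Slack incoming webhook, which accepts a JSON body with a text field. This is a minimal sketch and not Scalr's integration; the workspace name, resource address, and function names are placeholders, and the webhook URL must come from your own Slack app.

```python
import json
import urllib.request


def build_drift_message(workspace: str, addresses: list) -> dict:
    """Build the JSON body for a Slack incoming webhook."""
    lines = [f"Drift detected in workspace `{workspace}`:"]
    lines += [f"- {addr}" for addr in addresses]
    return {"text": "\n".join(lines)}


def send_to_slack(webhook_url: str, message: dict) -> int:
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Placeholder values; nothing is sent until send_to_slack is called.
message = build_drift_message("prod-network", ["aws_security_group.web"])
```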

Drift Reports

Visibility is crucial for managing drift at scale. Scalr allows you to build comprehensive drift reports through its operational dashboards. These dashboards can be configured at both the account and environment levels.

Build drift reporting in Scalr

You can create a custom view that filters to show only workspaces with drift. This gives you a quick, at-a-glance report of all drifted resources across your organization. From this central dashboard, you can drill down into specific workspaces to investigate the drift and decide on the appropriate remediation action. This reporting capability helps platform and security teams maintain a constant watch over the state of their infrastructure.

Drift Reports for Platform Teams

For platform engineering teams, Terraform drift reports are not just a useful tool; they are essential for governance and oversight. While these teams may not execute daily terraform apply commands, they are ultimately responsible for the stability, security, and cost-effectiveness of the entire cloud estate. Terraform drift reports provide a high-level, aggregate view of infrastructure health, allowing platform engineers to identify which teams or workspaces are consistently deviating from their codified configurations. This visibility is critical for enforcing best practices, identifying potential security risks from unmanaged resources, and understanding the true state of the infrastructure without having to inspect individual pipelines or workspaces.

This is why Scalr built a "stale workspace" report, which includes drift as well. If you're doing proper GitOps, you likely want your Terraform workspaces to execute runs regularly. The stale workspace report highlights workspaces that have not had a run executed in 7, 14, 30, 60, or 90 days. From the report, platform teams can execute runs to determine the actual state of the infrastructure and check for drift, security vulnerabilities, or unwanted costs.

Stale workspace reports help identify drift and vulnerabilities
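
The bucketing behind such a report can be sketched in a few lines. The example below groups workspaces by the largest of the 7, 14, 30, 60, and 90 day thresholds they have reached. It is an illustration only: the function names and input shape are hypothetical, and treating "exactly N days" as stale is an assumption rather than Scalr's documented behavior.

```python
from datetime import date
from typing import Optional

# Thresholds from the report described above, largest first.
THRESHOLDS_DAYS = (90, 60, 30, 14, 7)


def stale_bucket(last_run: date, today: date) -> Optional[int]:
    """Return the largest threshold (in days) the workspace has reached."""
    idle_days = (today - last_run).days
    for threshold in THRESHOLDS_DAYS:
        if idle_days >= threshold:
            return threshold
    return None  # a run happened within the last 7 days


def stale_report(last_runs: dict, today: date) -> dict:
    """Group workspace names by the staleness threshold they fall under."""
    report = {}
    for name, last_run in sorted(last_runs.items()):
        bucket = stale_bucket(last_run, today)
        if bucket is not None:
            report.setdefault(bucket, []).append(name)
    return report
```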

Scalr Drift Demo

More of a visual learner? Check out this short demo of drift detection in Scalr:

Open Source Options

Prefer an open-source tool for drift detection? Consider the following options:

driftctl

Originally created by CloudSkiff and now part of Snyk, driftctl is a command-line tool that scans your cloud environments (AWS, GCP, Azure, and GitHub) and compares the actual state of resources with your Terraform state files. It is designed to detect, track, and alert on drift, and it specializes in identifying unmanaged resources: those that exist in your cloud account but are not tracked in your IaC. This helps you increase your code coverage and spot potential security blind spots.

OpenTofu

OpenTofu is an open-source fork of Terraform and serves as a drop-in replacement, meaning it has the same core drift detection capabilities. By running the tofu plan command, you can see a proposed execution plan that highlights any differences between your configuration files and the infrastructure's current state. While this requires manual execution or simple scripting for automation, it is a fundamental and widely used method for identifying drift directly within the IaC workflow.