Terragrunt: The Complete Guide for Terraform and OpenTofu Users

Learn Terragrunt basics, modular Terraform, DRY configs, and automation in this beginner-friendly step-by-step guide.

Sebastian Stadil · March 6, 2026 · Updated June 11, 2026

Key takeaways

Terragrunt is a thin wrapper around Terraform and OpenTofu that removes duplicated backend configs, variable definitions, and module blocks through include blocks and generated remote_state configuration.
Keep include chains to 2-3 levels and always pass an explicit file name to find_in_parent_folders(). Deep hierarchies obscure where settings originate and add HCL parsing overhead.
run-all behaves differently from per-module runs: dependency plans can use stale state, skipped units emit no planfile, and aggregate change detection in orchestration layers can disagree with per-unit logs.
Restrict mock_outputs with mock_outputs_allowed_terraform_commands = ["validate", "plan"] so placeholder values are never consumed during an apply.
Git-dependent functions like get_repo_root() fail on remote runners that work from a code tarball instead of a git clone; a marker file plus find_in_parent_folders() is the portable alternative.
Decide your unit-to-workspace mapping before you scale. Retrofitting it later means migrating state across workspaces.

What is Terragrunt?

Terragrunt is a thin wrapper around Terraform and OpenTofu that helps teams maintain DRY (Don't Repeat Yourself) configurations at scale. It doesn't replace Terraform or OpenTofu. It runs on top of them and gives you a framework for managing many modules, environments, and regions without the repetitive boilerplate that comes with large-scale deployments.

As infrastructure grows across multiple environments and cloud accounts, keeping the same configuration in sync across dozens or hundreds of modules gets tedious and error-prone. Terragrunt gives you patterns and tooling to cut that repetition without losing clarity or control.

Key Components

Terragrunt centers on a few critical concepts:

Modules: The actual Terraform/OpenTofu code (your infrastructure definitions)
Configurations: Terragrunt's HCL-based configuration files (terragrunt.hcl) that manage module instantiation
Dependencies: Mechanisms to orchestrate execution order and share outputs between modules
Remote State: Automated backend configuration for storing and managing Terraform state

Why Use Terragrunt?

Terragrunt solves a few of the main problems teams run into when scaling Terraform across multiple environments:

1. Reducing Configuration Repetition

In vanilla Terraform, running the same module across development, staging, and production means repeating backend configurations, variable definitions, and module blocks. Terragrunt pulls these together through inheritance and templating:

# Parent configuration (env.hcl)
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-company-tfstate-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-company-tfstate-lock-${get_aws_account_id()}"
  }
}

Child configurations inherit this automatically, so you don't repeat it across your whole infrastructure codebase.

2. Managing Multiple Environments

Terragrunt lets you layer variables and override them per environment. You can deploy a single component (like a VPC module) across several environments with only small configuration differences:

# Child configuration inherits parent settings
include "root" {
  path = find_in_parent_folders("env.hcl")
}
 
inputs = {
  environment = "production"
  instance_count = 10
  cidr_block = "10.0.0.0/16"
}

3. Orchestrating Dependencies

Dependencies between modules are awkward in vanilla Terraform. Terragrunt gives you explicit dependency management: it works out the execution order for you and shares outputs between modules without you running Terraform output commands by hand:

dependency "vpc" {
  config_path = "../vpc"
}
 
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}

4. Simplifying State Management

Terragrunt generates the backend configuration for you and, for the S3 backend, can create the state bucket and the DynamoDB lock table if they don't exist yet. That takes the manual setup off your plate and keeps state storage patterns consistent across your organization.
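As a rough illustration of what that generation produces: given the remote_state block from section 1 in a root file at the top of the project, a unit at environments/dev/vpc would end up with a backend.tf along these lines. The account ID 123456789012 is a placeholder, and the exact formatting and header comment depend on your Terragrunt version.

```hcl
# backend.tf (written by Terragrunt; header comment omitted)
terraform {
  backend "s3" {
    bucket         = "my-company-tfstate-123456789012"
    dynamodb_table = "my-company-tfstate-lock-123456789012"
    encrypt        = true
    # path_relative_to_include() resolved to the unit's path below the root file
    key            = "environments/dev/vpc/terraform.tfstate"
    region         = "us-east-1"
  }
}
```

Because the key is derived from the directory path, moving a unit to a different folder changes where Terragrunt looks for its state.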

Installation and Setup

Prerequisites

Terraform or OpenTofu installed (v0.12.26 or later for Terraform)
Terragrunt binary available in your PATH
Cloud provider credentials configured (AWS, Azure, GCP, etc.)

Installing Terragrunt

macOS (via Homebrew):

brew install terragrunt

Linux (substitute a real release tag for v0.50.x):

wget https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.x/terragrunt_linux_amd64
chmod +x terragrunt_linux_amd64
sudo mv terragrunt_linux_amd64 /usr/local/bin/terragrunt

Windows: Download the binary from the Terragrunt releases page and add it to your PATH.

Project Structure

A typical Terragrunt project follows this organization:

project/
├── terragrunt.hcl                 # Root configuration
├── components/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── eks/
│   │   ├── main.tf
│   │   └── ...
│   └── rds/
│       └── ...
├── environments/
│   ├── dev/
│   │   ├── terragrunt.hcl
│   │   ├── vpc/
│   │   │   └── terragrunt.hcl
│   │   └── eks/
│   │       └── terragrunt.hcl
│   ├── staging/
│   └── production/
└── common/
    ├── common.hcl
    ├── region.hcl
    └── env.hcl

DRY Configurations: The include and find_in_parent_folders Pattern

Understanding include Blocks

The include block is how Terragrunt does code reuse. It lets child configurations inherit and extend parent configurations:

# Child configuration
include "root" {
  path = find_in_parent_folders()
}
 
# Additional configuration merged with parent
inputs = {
  tags = {
    Environment = "staging"
  }
}

The find_in_parent_folders Challenge

find_in_parent_folders() searches up the directory tree for a configuration file, but it gets finicky once you stray from standard naming. Best practice has shifted to naming the parent file explicitly:

# More explicit and reliable
include "root" {
  path = find_in_parent_folders("env.hcl")
}

Avoiding the Deep Include Chain

Deep include hierarchies cut repetition, but they make it harder to see where a setting came from and they add HCL parsing overhead at scale. Keep include chains to 2-3 levels maximum:

# Root level (env.hcl)
remote_state { ... }
terraform { ... }
 
# Mid level (region.hcl)
include "root" {
  path = find_in_parent_folders("env.hcl")
}
locals { ... }
 
# Child level (terragrunt.hcl)
include "region" {
  path = find_in_parent_folders("region.hcl")
}
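A flatter variant is sketched here, on the assumption that your Terragrunt version supports multiple labeled include blocks and the expose attribute. The unit includes both parents directly, so every setting is one hop from its source. In this sketch region.hcl holds only locals, and the aws_region local is hypothetical.

```hcl
# Unit level (terragrunt.hcl)
include "root" {
  path = find_in_parent_folders("env.hcl")
}

include "region" {
  path   = find_in_parent_folders("region.hcl")
  expose = true # makes region.hcl's parsed config readable as include.region
}

inputs = {
  # Assumes region.hcl defines: locals { aws_region = "us-east-1" }
  aws_region = include.region.locals.aws_region
}
```

Reading the value through include.region makes its origin visible in the unit file itself, which helps when tracing where a setting came from.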

Common Anti-Pattern: Copy-Pasting Configurations

When teams don't understand include and locals, they tend to copy-paste terragrunt.hcl files instead, which throws away the whole point of Terragrunt. Use include and locals to avoid duplication.

Dependencies and Run Order

Explicit Dependency Declaration

Terragrunt analyzes dependency blocks to determine execution order when using run-all commands:

# Module: app/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
}
 
dependency "rds" {
  config_path = "../rds"
}
 
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
  db_endpoint = dependency.rds.outputs.endpoint
}

Terragrunt will ensure VPC and RDS are applied before the app module.
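When a unit only needs ordering and reads no outputs from the other unit, a dependencies block expresses that without an output lookup. A small sketch; the ../iam path is a hypothetical sibling unit:

```hcl
# Module: app/terragrunt.hcl
dependencies {
  # Ordered before this unit during multi-unit runs;
  # none of its outputs are read here.
  paths = ["../iam"]
}
```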

Mock Outputs for Planning

Mock outputs enable planning without applying dependencies first:

dependency "vpc" {
  config_path = "../vpc"
 
  mock_outputs = {
    vpc_id = "vpc-mock123"
    subnet_ids = ["subnet-mock1", "subnet-mock2"]
  }
}

The danger is that whatever commands you let the mock outputs reach will treat them as real values. One of our community members ran the first apply of a new stack before its dependencies had been applied, and the modules took the mocks as inputs. Infrastructure got created with placeholder values baked in. After a retry, their report read "looks like the second run worked," followed shortly by "I spoke too soon, second run also passed mock inputs to a module." The fix is to restrict mocks to read-only commands so an apply can never see them:

dependency "vpc" {
  config_path = "../vpc"
 
  mock_outputs = {
    vpc_id = "vpc-mock123"
    subnet_ids = ["subnet-mock1", "subnet-mock2"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

The run-all Complexity

run-all plan and run-all apply execute across multiple modules, and they behave differently from single-module runs in ways you need to know before relying on them:

run-all plan: If a dependency has unapplied changes, dependent modules' plans use the old state
run-all destroy: Destroys the modules under the working directory in reverse dependency order; modules outside that directory that depend on them are left in place
run-all show -json: Outputs concatenated JSON (not valid single JSON document), making parsing difficult

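For safety, use flags to control parallelism and error handling. The commands below use the older run-all and --terragrunt- spellings; newer Terragrunt releases write them as run --all with unprefixed flags such as --parallelism, so check terragrunt --help for the form your version accepts: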

# Sequential execution for critical operations
terragrunt run-all apply --terragrunt-parallelism 1
 
# Continue on errors with caution
terragrunt run-all plan --terragrunt-ignore-dependency-errors

Running run-all through an orchestration platform adds a wrinkle: one "run" now wraps many planfiles, and the layer that aggregates them can disagree with the per-unit logs. A Scalr customer watched the console output of a terragrunt run --all show one unit with three pending changes while the platform's change detection reported no changes and never offered an apply. When the summary and the logs conflict, trust the logs and inspect the individual planfiles.

Conditional skips interact badly with the same aggregation. Another customer gated a disaster-recovery module with skip = get_env("ENABLE_FAILOVER_INFRA", "false") != "true" ? true : false. Locally this did exactly what you'd expect; remotely the run failed with Plan operation failed — Error: Terraform plan operation generated no planfile for failover-infra unit. A skipped unit emits no planfile, and an orchestrator that enumerates units up front treats the missing file as a failure rather than a skip. If you gate units with skip, confirm how your runner handles units that produce no plan before relying on it.
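Newer Terragrunt releases deprecate the skip attribute in favor of an exclude block. Here is a hedged sketch of the same gate in that form, assuming your version supports exclude. It states the condition directly instead of wrapping it in a ternary. An excluded unit still produces no planfile, so the runner caveat above applies either way.

```hcl
# failover-infra/terragrunt.hcl
exclude {
  # Excluded unless ENABLE_FAILOVER_INFRA is exactly "true".
  if      = get_env("ENABLE_FAILOVER_INFRA", "false") != "true"
  actions = ["plan", "apply"]
}
```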

Performance Optimization

Terragrunt's --dependency-fetch-output-from-state flag speeds up dependency resolution for S3 backends by reading state directly instead of invoking terraform output:

terragrunt run-all plan --dependency-fetch-output-from-state

This can significantly reduce execution time in large deployments.

Remote State Management

Automatic Backend Configuration

Terragrunt generates backend configurations automatically based on remote_state blocks, eliminating manual backend setup:

# Root configuration generates backend.tf
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-tfstate-${get_aws_account_id()}-us-east-1"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-tfstate-lock-${get_aws_account_id()}"
    s3_bucket_tags = {
      owner = "infra-team"
      name  = "Terraform State"
    }
  }
}

Limitations and Workarounds

Automatic backend creation has limitations:

S3 auto-created log buckets may lack stringent security defaults
disable_init can inadvertently disable all backend initialization
Fine-grained IAM or KMS key customization isn't fully supported
Azure backends don't support auto-creation; storage accounts must be pre-created

If you need tight security controls in production, it's worth managing the state infrastructure separately instead of letting Terragrunt auto-generate it.
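One way to do that, assuming the bucket and lock table already exist and are managed elsewhere, is to write the backend file with a generate block instead of remote_state. Terragrunt then only renders backend.tf and does not try to create or update the state resources. The bucket and table names below are placeholders.

```hcl
# Root configuration: render backend.tf without managing the state resources
generate "backend" {
  path      = "backend.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  backend "s3" {
    bucket         = "my-company-tfstate-prod"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-company-tfstate-lock"
  }
}
EOF
}
```

The trade-off is that bucket policy, encryption keys, and logging become your responsibility, which is usually the point in a locked-down production account.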

Multi-Account State Management

For cross-account deployments, explicitly configure role assumptions:

remote_state {
  backend = "s3"
  config = {
    bucket         = "terraform-state-account-b"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    role_arn       = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
    region         = "us-east-1"
  }
}

Environment Management

Hierarchical Variable Inheritance

Terragrunt lets you define variables in layers, so you set broad defaults and then override them per environment:

# Root: common variables
inputs = {
  project_name = "myapp"
  team = "platform"
  enable_monitoring = true
}
 
# Region level: region-specific overrides
include "root" { path = find_in_parent_folders("root.hcl") }
inputs = {
  aws_region = "us-east-1"
}
 
# Environment level: environment-specific configuration
include "region" { path = find_in_parent_folders("region.hcl") }
inputs = {
  environment = "production"
  instance_count = 20
  enable_high_availability = true
}

Using locals for Computed Values

Locals enable complex computations and conditionals:

locals {
  environment = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  region_config = read_terragrunt_config(find_in_parent_folders("region.hcl"))
 
  instance_type = local.environment.inputs.environment == "production" ? "t3.large" : "t3.medium"
  tags = merge(
    local.environment.inputs.common_tags,
    { Environment = local.environment.inputs.environment }
  )
}
 
inputs = {
  instance_type = local.instance_type
  tags = local.tags
}
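For the lookups above to resolve, env.hcl has to define those keys under inputs. A minimal sketch of what it might contain; the tag names and values are illustrative:

```hcl
# env.hcl (hypothetical contents)
inputs = {
  environment = "production"
  common_tags = {
    Project   = "myapp"
    ManagedBy = "terragrunt"
  }
}
```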

Multi-Region and Multi-Account Patterns

Organizing infrastructure across regions and accounts:

environments/
├── us-east-1/
│   ├── region.hcl
│   ├── dev/
│   │   ├── terragrunt.hcl
│   │   ├── vpc/
│   │   └── eks/
│   └── prod/
├── eu-west-1/
└── ap-southeast-1/

Each region can have region-specific configurations while inheriting organization-wide defaults.

Deciding Your Unit-to-Workspace Mapping

If you run Terragrunt through any platform that has a workspace or stack concept, settle the unit-to-workspace mapping before you scale, because every option has a cost. A team adopting Terragrunt at enterprise scale (Azure, OpenTofu 1.10.6, Terragrunt 1.0.2, one unit per concern) hit the dilemma directly. Run-all with one workspace per environment kept the workspace count flat but gave up per-unit state and run features. One workspace per unit kept full granularity but multiplied into a workspaces-times-environments explosion; in their words, "manually managing workspaces, run triggers, and cross-workspace dependencies per environment is not sustainable." Their resolution was to manage the workspaces themselves as code through a provider, so the mapping scales with the unit count instead of being hand-maintained. Whichever direction you take, deciding late means migrating state.

Common Issues and Solutions

1. Performance Regressions (v0.50.15+)

Releases from v0.50.15 onward showed significant slowdowns (15x in some cases), attributed to O(n²) complexity in locals evaluation:

Symptom: terragrunt run-all plan taking 8+ minutes for 30-50 modules

Solution:

Downgrade to v0.48.x or earlier if using affected versions
Avoid reading parent terragrunt configs repeatedly; cache results in locals
Upgrade to the latest version once a release with the performance patches is available

2. Path Resolution Failures

Incorrect path references are a leading cause of Terragrunt issues:

# Incorrect - breaks when executed from different directories
locals {
  config_path = "./config.yaml"  # ❌ Relative path doesn't work
}
 
# Correct - uses Terragrunt execution context
locals {
  config_path = "${get_parent_terragrunt_dir()}/config.yaml"  # ✓
}

Key functions:

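get_terragrunt_dir(): Directory containing the current unit's terragrunt.hcl
get_parent_terragrunt_dir(): Directory containing the parent file pulled in through include
find_in_parent_folders(): Searches upward from the current directory and returns the path of the first matching file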

Git-dependent helpers are a separate trap. Two customers hit the same wall independently: get_repo_root() resolved fine on their laptops, then died in remote runs with Unsuitable value: value must be known. The cause is the same in both cases: remote runners typically upload a code tarball rather than performing a git clone, so there is no .git directory for the function to anchor on. The workaround one community member landed on: commit a unique marker file at the repo root and resolve paths with find_in_parent_folders("marker-file") instead. Treat any function that assumes a git checkout as non-portable between workstations and CI.
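The marker-file workaround can be sketched as follows. The file name .repo-root and the common/common.hcl path are hypothetical; any uniquely named file committed at the repository root would serve.

```hcl
locals {
  # ".repo-root" is a hypothetical empty marker file committed at the repository root.
  # find_in_parent_folders() returns the file's path, so dirname() yields the root directory.
  repo_root = dirname(find_in_parent_folders(".repo-root"))

  # Hypothetical shared file, resolved without any git metadata.
  common = read_terragrunt_config("${local.repo_root}/common/common.hcl")
}
```

Because the lookup depends only on files that ship in the tarball, it should behave the same on a laptop and on a remote runner.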

3. CI/CD Integration Challenges

GitHub Actions:

- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0
    terraform_wrapper: false  # Critical for Terragrunt

Atlantis integration requires a custom Docker image that includes Terragrunt:

FROM runatlantis/atlantis:latest
RUN curl -L https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_linux_amd64 \
    -o /usr/local/bin/terragrunt && \
    chmod +x /usr/local/bin/terragrunt

Note: Terraform Cloud (TFC) doesn't natively support Terragrunt; organizations using TFC must build custom runners or wrapper scripts.

In Scalr's support queue, the most confusing CI failures trace back to version skew between the binary you tested with and the binary the pipeline actually ran. A platform engineering team we worked with at Scalr had pinned Terragrunt v0.93.4 and OpenTofu v1.10.6, yet every run died with nothing more than Error: Failed terraform init (exit 1). Output: Improperly formatted hcl files:. The giveaway was in their agent logs: the init phase was invoking terragrunt run-all hclvalidate and terragrunt run-all hclfmt --check --diff (commands removed in Terragrunt 0.90), so an older binary on the runner was executing that phase. The actual trigger was a single line, inputs = { vpc_environment = values.vpc_environment }: the values keyword was added in 0.78, recent versions accept it, and the stale binary rejected it as malformed HCL. When syntax that validates locally fails in CI, check which binary version executed each phase before you touch the code.
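One guard against this kind of skew is to declare version constraints in the Terragrunt configuration itself. This is a sketch, not a guarantee: it assumes the stale binary gets far enough to evaluate these attributes, and the bounds simply mirror the versions named in the example above. To my understanding, terraform_version_constraint is checked against whichever binary Terragrunt invokes, Terraform or OpenTofu.

```hcl
# terragrunt.hcl (or a parent file every unit includes)
# Adjust the bounds to your own pinned versions.
terragrunt_version_constraint = ">= 0.93.0"
terraform_version_constraint  = ">= 1.10.0"
```

Where it works, a binary outside the range should fail with a version error instead of a misleading complaint about malformed HCL.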

4. State Locking Race Conditions

Parallel execution can trigger state lock failures:

Error: Error locking state: Error acquiring the state lock
ConditionalCheckFailedException: The conditional request failed

Solution: Reduce parallelism for operations with shared dependencies:

# Sequential execution for safety
terragrunt run-all apply --terragrunt-parallelism 1
 
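# Or limit Terraform's own resource-level concurrency from terragrunt.hcl
# (extra_arguments must sit inside a terraform block)
terraform {
  extra_arguments "serial_locking" {
    commands  = ["apply", "destroy"]
    arguments = ["-parallelism=1"]
  }
}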

5. Cross-Account AWS Authentication

AWS assume role configurations require explicit setup:

remote_state {
  backend = "s3"
  config = {
    # bucket, key, and region omitted for brevity
    role_arn = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
  }
}
 
terraform {
  extra_arguments "assume_role" {
    commands = get_terraform_commands_that_need_vars()
    env_vars = {
      AWS_ROLE_ARN = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
    }
  }
}
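Terragrunt also has a top-level iam_role attribute that asks Terragrunt itself to assume a role before it invokes Terraform or OpenTofu. A minimal sketch, reusing the placeholder ARN from above; whether this or the env_vars approach fits depends on how your credentials are issued:

```hcl
# terragrunt.hcl
# ACCOUNT_A and ResourceRole are the same placeholders used above.
iam_role = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
```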

6. Memory Usage at Scale

Large deployments can consume excessive RAM due to repeated config parsing:

Symptoms: 16GB+ RAM usage for 50 modules

Solutions:

Use provider cache to avoid re-downloading: --terragrunt-provider-cache
Minimize deep include chains and repeated read_terragrunt_config() calls
Consider reducing module count through composition

Terragrunt vs Atmos Comparison

Both tools help you orchestrate Terraform, but they go about it very differently:

Aspect	Terragrunt	Atmos
Config Language	HCL	YAML
Structure	File hierarchy-based	Component + Stack-based
Inheritance	include blocks, find_in_parent_folders	Deep YAML merging, imports
Learning Curve	Moderate	Steeper (new paradigm)
Flexibility	High (for experienced teams)	High (structured approach)
Tooling Integration	Minimal; mostly CLI	Broader Cloud Posse ecosystem
Community	Large, mature	Growing, Cloud Posse-centric
Operational Overhead	Self-managed	Self-managed (more extensive)

Atmos is particularly strong for teams invested in Cloud Posse components and infrastructure patterns. Terragrunt is better for teams already using Terraform and wanting minimal tooling overhead.

Best Practices for 2026

1. Keep Configurations Simple and Shallow

Avoid deep include hierarchies. Maintain no more than 3 levels:

root.hcl (remote_state, terraform block)
  └── region.hcl (region-specific settings)
      └── terragrunt.hcl (module instantiation)

2. Use explicit find_in_parent_folders()

Always specify the file name:

include "root" {
  path = find_in_parent_folders("root.hcl")  # Never the bare find_in_parent_folders()
}

3. Minimize Locals Re-evaluation

Cache file reads to avoid repeated parsing:

locals {
  root_config = read_terragrunt_config(find_in_parent_folders("root.hcl"))
}
 
inputs = merge(
  local.root_config.inputs,
  { environment = "prod" }
)

4. Implement Provider Caching

For multi-module operations, use provider caching:

terragrunt run-all plan --terragrunt-provider-cache

5. Use Mock Outputs Carefully

Mock outputs help with planning but can mask dependency issues. Validate mocks match reality, and always set mock_outputs_allowed_terraform_commands = ["validate", "plan"] so an apply can never consume a placeholder:

dependency "vpc" {
  config_path = "../vpc"
 
  mock_outputs = {
    vpc_id = "vpc-mock123"  # Update when actual IDs change
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

6. Enforce Sequential Execution for Destructive Operations

Prevent race conditions and unexpected cascading destruction:

terragrunt run-all destroy --terragrunt-parallelism 1

7. Document Variable Inheritance

Clearly document where variables originate and can be overridden. Use comments:

# From root: project_name, team
# From region: aws_region, availability_zones
# Override here: instance_count, environment
inputs = {
  instance_count = 5
}

8. Monitor and Alert on Performance

Track terragrunt run-all execution times and alert on regressions. Performance degradation often indicates configuration or version issues.

9. Consider Platform Alternatives for Enterprise Use Cases

For organizations requiring:

Tight RBAC and governance
Integrated cost estimation and policy enforcement
Managed CI/CD and GitOps workflows
Unified environment promotion
Built-in team collaboration

Platforms like Scalr, Terraform Cloud, or Env0 give you these out of the box, so you carry less operational overhead than you would running Terragrunt yourself.

10. Test Module Compositions Independently

While Terragrunt orchestrates modules, test each module in isolation to catch issues early:

cd components/vpc
terraform init
terraform plan

This ensures modules remain reusable and independently testable.

Is Terragrunt worth it?

Terragrunt gives you DRY configuration patterns, dependency orchestration, and environment management that vanilla Terraform handles poorly at scale. It also adds a complexity layer of its own that you have to manage.

The teams that get the most out of it keep their configurations simple and documented, understand how their setup performs, adopt Terragrunt gradually instead of applying it to every project at once, watch for performance regressions while staying current with releases, and switch to a platform when the operational overhead stops being worth it.

For a lot of teams, Terragrunt hits the right balance between flexibility and structure. For others, a managed IaC platform is the simpler path. Scalr is free up to 50 runs per month if you want to test that path against your Terragrunt setup. If Terraform Cloud's lack of native Terragrunt support is what pushed you to look elsewhere, our guide to selecting a Terraform Cloud alternative walks through how the managed platforms compare. Weigh your team's needs, existing expertise, and operational constraints when deciding whether Terragrunt is the right fit.

Terragrunt runs the same way on top of either engine. If you are still choosing the underlying tool, see OpenTofu vs Terraform.

Frequently asked questions

What is Terragrunt and what problem does it solve?

Terragrunt is a thin wrapper around Terraform and OpenTofu that keeps configurations DRY at scale. Instead of repeating backend configs, variable definitions, and module blocks across dozens of environments, child configurations inherit from parent files via include blocks, and remote state backends are generated automatically.

Does Terraform Cloud support Terragrunt?

No, Terraform Cloud does not natively support Terragrunt. Organizations using TFC must build custom runners or wrapper scripts to execute Terragrunt commands.

Why do mock_outputs cause problems in Terragrunt?

If a dependency has not been applied yet, its mock_outputs can be consumed as real values, including during apply, which creates infrastructure with placeholder values baked in. Restrict mocks to read-only commands with mock_outputs_allowed_terraform_commands = ["validate", "plan"].

Why does Terragrunt work locally but fail in CI?

The two most common causes are version skew (a stale Terragrunt binary in the CI image rejecting syntax your local version accepts) and git-dependent functions like get_repo_root() failing because remote runners upload a code tarball with no .git directory. Pin and verify binary versions in CI, and use find_in_parent_folders() with a marker file instead of git-based path helpers.

How many levels of include should a Terragrunt hierarchy have?

Keep include chains to 2-3 levels maximum: for example root.hcl, region.hcl, then the unit's terragrunt.hcl. Deeper hierarchies obscure where settings originate and add HCL parsing overhead at scale.

How should Terragrunt units map to workspaces in an orchestration platform?

There are two main options: one workspace per environment running run-all (simpler, but you lose per-unit state and run features), or one workspace per unit (full granularity, but the workspace count multiplies by environment). Teams at scale typically manage the workspaces themselves as code via a provider so the mapping does not have to be hand-maintained.

About the author

Sebastian Stadil, CEO at Scalr

Sebastian Stadil is the CEO of Scalr with 15+ years of DevOps experience. He started with AWS in 2004 and advised early Microsoft Azure and Google Cloud.