
Terragrunt is a thin, language-agnostic wrapper around Terraform and OpenTofu that helps teams maintain DRY (Don't Repeat Yourself) configurations at scale. It doesn't replace Terraform. It runs on top of Terraform and gives you a framework for managing many modules, environments, and regions without the repetitive boilerplate that comes with large-scale Terraform deployments.
As infrastructure grows across multiple environments and cloud accounts, keeping the same configuration in sync across dozens or hundreds of modules gets tedious and error-prone. Terragrunt gives you patterns and tooling to cut that repetition without losing clarity or control.
Terragrunt centers on a few critical concepts:
Configuration files (terragrunt.hcl) that manage module instantiation
Terragrunt solves a few of the main problems teams run into when scaling Terraform across multiple environments:
In vanilla Terraform, running the same module across development, staging, and production means repeating backend configurations, variable definitions, and module blocks. Terragrunt pulls these together through inheritance and templating:
# Parent configuration (env.hcl)
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-company-tfstate-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-company-tfstate-lock-${get_aws_account_id()}"
  }
}
Child configurations inherit this automatically, so you don't repeat it across your whole infrastructure codebase.
Terragrunt lets you layer variables and override them per environment. You can deploy a single component (like a VPC module) across several environments with only small configuration differences:
# Child configuration inherits parent settings
include "root" {
  path = find_in_parent_folders()
}
inputs = {
  environment    = "production"
  instance_count = 10
  cidr_block     = "10.0.0.0/16"
}
Dependencies between modules are awkward in vanilla Terraform. Terragrunt gives you explicit dependency management: it works out the execution order for you and shares outputs between modules without you running Terraform output commands by hand:
dependency "vpc" {
  config_path = "../vpc"
}
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
Terragrunt generates backend configuration for you, including the S3 buckets and DynamoDB tables for state locking. That takes the manual setup off your plate and keeps state storage patterns consistent across your organization.
macOS (via Homebrew):
brew install terragrunt
Linux:
# Replace v0.50.x with the exact release tag you want to install
wget https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.x/terragrunt_linux_amd64
chmod +x terragrunt_linux_amd64
sudo mv terragrunt_linux_amd64 /usr/local/bin/terragrunt
Windows: Download the binary from the Terragrunt releases page and add it to your PATH.
A typical Terragrunt project follows this organization:
project/
├── terragrunt.hcl              # Root configuration
├── components/
│   ├── vpc/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── eks/
│   │   ├── main.tf
│   │   └── ...
│   └── rds/
│       └── ...
├── environments/
│   ├── dev/
│   │   ├── terragrunt.hcl
│   │   ├── vpc/
│   │   │   └── terragrunt.hcl
│   │   └── eks/
│   │       └── terragrunt.hcl
│   ├── staging/
│   └── production/
└── common/
    ├── common.hcl
    ├── region.hcl
    └── env.hcl
The include block is how Terragrunt does code reuse. It lets child configurations inherit and extend parent configurations:
# Child configuration
include "root" {
  path = find_in_parent_folders()
}
# Additional configuration merged with parent
inputs = {
  tags = {
    Environment = "staging"
  }
}
find_in_parent_folders() searches up the directory tree for a configuration file, but it gets finicky once you stray from standard naming. Best practice has shifted to naming the parent file explicitly:
# More explicit and reliable
include "root" {
  path = find_in_parent_folders("env.hcl")
}
Deep include hierarchies cut repetition, but they make it harder to see where a setting came from and they add HCL parsing overhead at scale. Keep include chains to 2-3 levels maximum:
# Root level (env.hcl)
remote_state { ... }
terraform { ... }
# Mid level (region.hcl)
include "root" {
  path = find_in_parent_folders("env.hcl")
}
locals { ... }
# Child level (terragrunt.hcl)
include "region" {
  path = find_in_parent_folders("region.hcl")
}
When teams don't understand include and locals, they tend to copy-paste terragrunt.hcl files instead, which throws away the whole point of Terragrunt. Use include and locals to avoid duplication.
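As a minimal sketch of sharing values through include rather than copy-pasting them, a child can set expose = true on its include block and read the parent's locals. The common_tags local is a hypothetical name assumed to be defined in env.hcl; it does not appear in the configurations above.

```hcl
# Child terragrunt.hcl (sketch). Assumes env.hcl defines a hypothetical local:
#   locals { common_tags = { Team = "platform" } }
include "root" {
  path   = find_in_parent_folders("env.hcl")
  expose = true # makes the parent's parsed config readable as include.root
}

inputs = {
  # Start from the shared tags and add only what differs in this unit
  tags = merge(include.root.locals.common_tags, {
    Environment = "staging"
  })
}
```

The child file then carries only its own differences, which is the property copy-pasted terragrunt.hcl files lose.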
Terragrunt analyzes dependency blocks to determine execution order when using run-all commands:
# Module: app/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
}
dependency "rds" {
  config_path = "../rds"
}
inputs = {
  vpc_id      = dependency.vpc.outputs.vpc_id
  db_endpoint = dependency.rds.outputs.endpoint
}
Terragrunt will ensure VPC and RDS are applied before the app module.
Mock outputs enable planning without applying dependencies first:
dependency "vpc" {
  config_path = "../vpc"
  mock_outputs = {
    vpc_id     = "vpc-mock123"
    subnet_ids = ["subnet-mock1", "subnet-mock2"]
  }
}
The danger is that whatever commands you let the mock outputs reach will treat them as real values. One of our community members ran the first apply of a new stack before its dependencies had been applied, and the modules took the mocks as inputs. Infrastructure got created with placeholder values baked in. After a retry, their report read "looks like the second run worked," followed shortly by "I spoke too soon, second run also passed mock inputs to a module." The fix is to restrict mocks to read-only commands so an apply can never see them:
dependency "vpc" {
  config_path = "../vpc"
  mock_outputs = {
    vpc_id     = "vpc-mock123"
    subnet_ids = ["subnet-mock1", "subnet-mock2"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
The run-all plan and run-all apply commands execute across multiple modules, so understanding their behavior is critical.
For safety, use flags to control parallelism and error handling:
# Sequential execution for critical operations
terragrunt run-all apply --terragrunt-parallelism 1
# Continue on errors with caution
terragrunt run-all plan --terragrunt-ignore-dependency-errors
Running run-all through an orchestration platform adds a wrinkle: one "run" now wraps many planfiles, and the layer that aggregates them can disagree with the per-unit logs. A Scalr customer watched the console output of a terragrunt run --all show one unit with three pending changes while the platform's change detection reported no changes and never offered an apply. When the summary and the logs conflict, trust the logs and inspect the individual planfiles.
Conditional skips interact badly with the same aggregation. Another customer gated a disaster-recovery module with skip = get_env("ENABLE_FAILOVER_INFRA", "false") != "true" ? true : false. Locally this did exactly what you'd expect; remotely the run failed with Plan operation failed — Error: Terraform plan operation generated no planfile for failover-infra unit. A skipped unit emits no planfile, and an orchestrator that enumerates units up front treats the missing file as a failure rather than a skip. If you gate units with skip, confirm how your runner handles units that produce no plan before relying on it.
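For reference, the gate from that report can be written more compactly, because the comparison already evaluates to a boolean. This is only a sketch: the unit path and the root.hcl file name are assumptions, and the shorter form does not change how a remote runner treats a skipped unit.

```hcl
# failover-infra/terragrunt.hcl (hypothetical unit path)
include "root" {
  path = find_in_parent_folders("root.hcl") # assumed parent file name
}

# True unless ENABLE_FAILOVER_INFRA is exactly "true", so the unit is skipped by default
skip = get_env("ENABLE_FAILOVER_INFRA", "false") != "true"
```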
Terragrunt's --dependency-fetch-output-from-state flag speeds up dependency resolution for S3 backends by reading state directly instead of invoking terraform output:
terragrunt run-all plan --dependency-fetch-output-from-state
This can significantly reduce execution time in large deployments.
Terragrunt generates backend configurations automatically based on remote_state blocks, eliminating manual backend setup:
# Root configuration generates backend.tf
locals {
  # Region is declared once and reused in the bucket name and backend config
  aws_region = "us-east-1"
}
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-tfstate-${get_aws_account_id()}-${local.aws_region}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "my-tfstate-lock-${get_aws_account_id()}"
    s3_bucket_tags = {
      owner = "infra-team"
      name  = "Terraform State"
    }
  }
}
Automatic backend creation has limitations:
disable_init can inadvertently disable all backend initialization
If you need tight security controls in production, it's worth managing the state infrastructure separately instead of letting Terragrunt auto-generate it.
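One way to keep Terragrunt out of state-bucket provisioning is to render the backend file with a generate block instead of remote_state, so the bucket and lock table are created and secured elsewhere. A sketch, assuming pre-provisioned resources with the hypothetical names shown:

```hcl
# Root configuration: only writes backend.tf; it does not create the bucket or the table.
# "my-tfstate-prebuilt" and "my-tfstate-lock" are hypothetical, pre-existing resources.
generate "backend" {
  path      = "backend.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  backend "s3" {
    bucket         = "my-tfstate-prebuilt"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-tfstate-lock"
  }
}
EOF
}
```

The trade-off is that bucket policy, encryption, and lock-table settings become the responsibility of whatever provisions those resources.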
For cross-account deployments, explicitly configure role assumptions:
remote_state {
  backend = "s3"
  config = {
    bucket   = "terraform-state-account-b"
    key      = "${path_relative_to_include()}/terraform.tfstate"
    role_arn = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
    region   = "us-east-1"
  }
}
Terragrunt lets you define variables in layers, so you set broad defaults and then override them per environment:
# Root: common variables
inputs = {
  project_name      = "myapp"
  team              = "platform"
  enable_monitoring = true
}
# Region level: region-specific overrides
include "root" { path = find_in_parent_folders("root.hcl") }
inputs = {
  aws_region = "us-east-1"
}
# Environment level: environment-specific configuration
include "region" { path = find_in_parent_folders("region.hcl") }
inputs = {
  environment              = "production"
  instance_count           = 20
  enable_high_availability = true
}
Locals enable complex computations and conditionals:
locals {
  environment   = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  region_config = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  instance_type = local.environment.inputs.environment == "production" ? "t3.large" : "t3.medium"
  tags = merge(
    local.environment.inputs.common_tags,
    { Environment = local.environment.inputs.environment }
  )
}
inputs = {
  instance_type = local.instance_type
  tags          = local.tags
}
Organizing infrastructure across regions and accounts:
environments/
├── us-east-1/
│   ├── region.hcl
│   ├── dev/
│   │   ├── terragrunt.hcl
│   │   ├── vpc/
│   │   └── eks/
│   └── prod/
├── eu-west-1/
└── ap-southeast-1/
Each region can have region-specific configurations while inheriting organization-wide defaults.
If you run Terragrunt through any platform that has a workspace or stack concept, settle the unit-to-workspace mapping before you scale, because every option has a cost. A team adopting Terragrunt at enterprise scale (Azure, OpenTofu 1.10.6, Terragrunt 1.0.2, one unit per concern) hit the dilemma directly. Run-all with one workspace per environment kept the workspace count flat but gave up per-unit state and run features. One workspace per unit kept full granularity but multiplied into a workspaces-times-environments explosion; in their words, "manually managing workspaces, run triggers, and cross-workspace dependencies per environment is not sustainable." Their resolution was to manage the workspaces themselves as code through a provider, so the mapping scales with the unit count instead of being hand-maintained. Whichever direction you take, deciding late means migrating state.
Recent versions show significant slowdowns (15x in some cases) due to O(n²) complexity in locals evaluation:
Symptom: terragrunt run-all plan taking 8+ minutes for 30-50 modules
Solution:
Incorrect path references are a leading cause of Terragrunt issues:
# Incorrect - breaks when executed from different directories
locals {
  config_path = "./config.yaml" # ❌ Relative path doesn't work
}
# Correct - uses Terragrunt execution context
locals {
  config_path = "${get_parent_terragrunt_dir()}/config.yaml" # ✓
}
Key functions:
get_terragrunt_dir(): Directory of the current module's terragrunt.hcl
get_parent_terragrunt_dir(): Directory of the included parent configuration
find_in_parent_folders(): Search up the directory tree for a named file
Git-dependent helpers are a separate trap. Two customers hit the same wall independently: get_repo_root() resolved fine on their laptops, then died in remote runs with Unsuitable value: value must be known. The cause is the same in both cases: remote runners typically upload a code tarball rather than performing a git clone, so there is no .git directory for the function to anchor on. The workaround one community member landed on: commit a unique marker file at the repo root and resolve paths with find_in_parent_folders("marker-file") instead. Treat any function that assumes a git checkout as non-portable between workstations and CI.
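The marker-file workaround might look like the following sketch. The file name .repo-root is a hypothetical choice (any unique, committed file at the repository root would do), and config.yaml is reused from the earlier path example.

```hcl
locals {
  # ".repo-root" is a hypothetical empty marker file committed at the repository root.
  # find_in_parent_folders returns the path to that file; dirname strips the file name.
  repo_root = dirname(find_in_parent_folders(".repo-root"))

  # Resolve shared files relative to the marker instead of calling get_repo_root()
  config_path = "${local.repo_root}/config.yaml"
}
```

Because the lookup depends only on files present in the uploaded source, it should resolve the same way from a git clone and from a tarball.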
GitHub Actions:
- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0
    terraform_wrapper: false # Critical for Terragrunt
Atlantis integration requires a custom Docker image with Terragrunt:
FROM runatlantis/atlantis:latest
RUN curl -L https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_linux_amd64 \
    -o /usr/local/bin/terragrunt && \
    chmod +x /usr/local/bin/terragrunt
Note: Terraform Cloud (TFC) doesn't natively support Terragrunt; organizations using TFC must build custom runners or wrapper scripts.
In Scalr's support queue, the most confusing CI failures trace back to version skew between the binary you tested with and the binary the pipeline actually ran. A platform engineering team we worked with at Scalr had pinned Terragrunt v0.93.4 and OpenTofu v1.10.6, yet every run died with nothing more than Error: Failed terraform init (exit 1). Output: Improperly formatted hcl files:. The giveaway was in their agent logs: the init phase was invoking terragrunt run-all hclvalidate and terragrunt run-all hclfmt --check --diff (commands removed in Terragrunt 0.90), so an older binary on the runner was executing that phase. The actual trigger was a single line, inputs = { vpc_environment = values.vpc_environment }: the values keyword was added in 0.78, recent versions accept it, and the stale binary rejected it as malformed HCL. When syntax that validates locally fails in CI, check which binary version executed each phase before you touch the code.
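One guard against this kind of skew is the terragrunt_version_constraint attribute, which asks Terragrunt to compare its own version against a constraint when it parses the configuration. A sketch, assuming the 0.78 floor mentioned above is the right minimum for your code; whether every CI phase evaluates it depends on which commands your runner invokes.

```hcl
# Refuse to run on a Terragrunt binary older than the release that introduced `values`.
# The 0.78.0 floor comes from the example above; adjust it to the features you rely on.
terragrunt_version_constraint = ">= 0.78.0"
```

Where the constraint is evaluated, a stale binary should stop with an explicit version error instead of a misleading HCL formatting failure.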
Parallel execution can trigger state lock failures:
Error: Error locking state: Error acquiring the state lock
ConditionalCheckFailedException: The conditional request failed
Solution: Reduce parallelism for operations with shared dependencies:
# Sequential execution for safety
terragrunt run-all apply --terragrunt-parallelism 1
# Or configure in terragrunt.hcl (extra_arguments must sit inside a terraform block)
terraform {
  extra_arguments "serial_locking" {
    commands  = ["apply", "destroy"]
    arguments = ["-parallelism=1"]
  }
}
AWS assume role configurations require explicit setup:
remote_state {
  config = {
    role_arn = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
  }
}
terraform {
  extra_arguments "assume_role" {
    commands = get_terraform_commands_that_need_vars()
    env_vars = {
      AWS_ROLE_ARN = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
    }
  }
}
Large deployments can consume excessive RAM due to repeated config parsing:
Symptoms: 16GB+ RAM usage for 50 modules
Solutions:
Enable provider caching with --terragrunt-provider-cache
Reduce repeated read_terragrunt_config() calls
Terragrunt and Atmos both help you orchestrate Terraform, but they go about it very differently:
| Aspect | Terragrunt | Atmos |
|---|---|---|
| Config Language | HCL | YAML |
| Structure | File hierarchy-based | Component + Stack-based |
| Inheritance | include blocks, find_in_parent_folders | Deep YAML merging, imports |
| Learning Curve | Moderate | Steeper (new paradigm) |
| Flexibility | High (for experienced teams) | High (structured approach) |
| Tooling Integration | Minimal; mostly CLI | Broader Cloud Posse ecosystem |
| Community | Large, mature | Growing, Cloud Posse-centric |
| Operational Overhead | Self-managed | Self-managed (more extensive) |
Atmos is particularly strong for teams invested in Cloud Posse components and infrastructure patterns. Terragrunt is better for teams already using Terraform and wanting minimal tooling overhead.
Avoid deep include hierarchies. Maintain no more than 3 levels:
root.hcl (remote_state, terraform block)
└── region.hcl (region-specific settings)
    └── terragrunt.hcl (module instantiation)
Always specify the file name:
include "root" {
  path = find_in_parent_folders("root.hcl") # Never the bare find_in_parent_folders()
}
Cache file reads to avoid repeated parsing:
locals {
  root_config = read_terragrunt_config(find_in_parent_folders("root.hcl"))
}
inputs = merge(
  local.root_config.inputs,
  { environment = "prod" }
)
For multi-module operations, use provider caching:
terragrunt run-all plan --terragrunt-provider-cache
Mock outputs help with planning but can mask dependency issues. Validate mocks match reality, and always set mock_outputs_allowed_terraform_commands = ["validate", "plan"] so an apply can never consume a placeholder:
dependency "vpc" {
  config_path = "../vpc"
  mock_outputs = {
    vpc_id = "vpc-mock123" # Update when actual IDs change
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
Prevent race conditions and unexpected cascading destruction:
terragrunt run-all destroy --terragrunt-parallelism 1
Clearly document where variables originate and can be overridden. Use comments:
# From root: project_name, team
# From region: aws_region, availability_zones
# Override here: instance_count, environment
inputs = {
  instance_count = 5
}
Track terragrunt run-all execution times and alert on regressions. Performance degradation often indicates configuration or version issues.
For organizations requiring:
Platforms like Scalr, Terraform Cloud, or Env0 give you these out of the box, so you carry less operational overhead than you would running Terragrunt yourself.
While Terragrunt orchestrates modules, test each module in isolation to catch issues early:
cd components/vpc
terraform init
terraform plan
This ensures modules remain reusable and independently testable.
Terragrunt gives you DRY configuration patterns, dependency orchestration, and environment management that vanilla Terraform handles poorly at scale. It also adds a complexity layer of its own that you have to manage.
The teams that get the most out of it keep their configurations simple and documented, understand how their setup performs, adopt Terragrunt gradually instead of applying it to every project at once, watch for performance regressions while staying current with releases, and switch to a platform when the operational overhead stops being worth it.
For a lot of teams, Terragrunt hits the right balance between flexibility and structure. For others, a managed IaC platform is the simpler path. Scalr is free up to 50 runs per month if you want to test that path against your Terragrunt setup. If Terraform Cloud's lack of native Terragrunt support is what pushed you to look elsewhere, our guide to selecting a Terraform Cloud alternative walks through how the managed platforms compare. Weigh your team's needs, existing expertise, and operational constraints when deciding whether Terragrunt is the right fit.
Terragrunt runs the same way on top of either engine. If you are still choosing the underlying tool, see OpenTofu vs Terraform.
