
Terragrunt is a thin wrapper around Terraform and OpenTofu that helps teams keep configurations DRY (Don't Repeat Yourself) at scale. Rather than replacing Terraform, it adds a framework for managing multiple modules, environments, and regions without the repetitive boilerplate that typically accompanies large Terraform deployments.
At its core, Terragrunt addresses a fundamental problem: as infrastructure complexity grows across multiple environments and cloud accounts, managing identical configurations across dozens or hundreds of modules becomes tedious and error-prone. Terragrunt introduces patterns and tooling to abstract away this repetition while maintaining clarity and control.
Terragrunt centers on a few critical concepts:
- Configuration files (terragrunt.hcl) that manage module instantiation
Terragrunt solves several fundamental challenges that teams encounter when scaling Terraform:
In vanilla Terraform, managing the same module across development, staging, and production environments means repeating backend configurations, variable definitions, and module blocks. Terragrunt consolidates these through inheritance and templating:
# Parent configuration (env.hcl)
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-company-tfstate-${get_aws_account_id()}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-company-tfstate-lock-${get_aws_account_id()}"
  }
}
Child configurations inherit this automatically, eliminating duplication across your entire infrastructure codebase.
Terragrunt enables hierarchical variable inheritance and environment-specific overrides. A single component (like a VPC module) can be deployed across multiple environments with minimal configuration differences:
# Child configuration inherits parent settings
include "root" {
  path = find_in_parent_folders()
}
inputs = {
  environment    = "production"
  instance_count = 10
  cidr_block     = "10.0.0.0/16"
}
Dependencies between modules are complex in vanilla Terraform. Terragrunt provides explicit dependency management, automatically determining execution order and sharing outputs between modules without manual Terraform output commands:
dependency "vpc" {
  config_path = "../vpc"
}
inputs = {
  vpc_id = dependency.vpc.outputs.vpc_id
}
Terragrunt automates backend configuration generation, including creation of S3 buckets and DynamoDB tables for state locking. This removes manual setup overhead and ensures consistent state storage patterns across your organization.
macOS (via Homebrew):
brew install terragrunt
Linux:
# Replace v0.50.x with a concrete release tag before running
wget https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.x/terragrunt_linux_amd64
chmod +x terragrunt_linux_amd64
sudo mv terragrunt_linux_amd64 /usr/local/bin/terragrunt
Windows: Download the binary from the Terragrunt releases page and add it to your PATH.
A typical Terragrunt project follows this organization:
project/
├── terragrunt.hcl # Root configuration
├── components/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── eks/
│ │ ├── main.tf
│ │ └── ...
│ └── rds/
│ └── ...
├── environments/
│ ├── dev/
│ │ ├── terragrunt.hcl
│ │ ├── vpc/
│ │ │ └── terragrunt.hcl
│ │ └── eks/
│ │ └── terragrunt.hcl
│ ├── staging/
│ └── production/
└── common/
├── common.hcl
├── region.hcl
└── env.hcl
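As a minimal sketch of how one of these files ties the layout together, assuming the tree above, environments/dev/vpc/terragrunt.hcl could point at the shared module through a terraform block. The relative source path and the input values are illustrative assumptions rather than part of the layout itself:

```hcl
# environments/dev/vpc/terragrunt.hcl (illustrative)

# Pull in shared settings from the nearest parent terragrunt.hcl
# (environments/dev/terragrunt.hcl in the tree above).
include "root" {
  path = find_in_parent_folders()
}

# Instantiate the reusable module that lives under components/.
terraform {
  source = "../../../components/vpc"
}

# Values passed to the module's input variables (hypothetical names).
inputs = {
  environment = "dev"
  cidr_block  = "10.0.0.0/16"
}
```

Keeping the Terraform code under components/ and only the per-environment wiring under environments/ is what lets the same module be reused unchanged across dev, staging, and production.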
The include block is Terragrunt's primary mechanism for code reuse. It allows child configurations to inherit and extend parent configurations:
# Child configuration
include "root" {
  path = find_in_parent_folders()
}
# Additional configuration merged with parent
inputs = {
  tags = {
    Environment = "staging"
  }
}
find_in_parent_folders() searches up the directory tree for a configuration file, but this can be finicky with non-standard naming conventions. Best practice has shifted to explicitly specifying the parent file name:
# More explicit and reliable
include "root" {
  path = find_in_parent_folders("env.hcl")
}
While layered includes reduce repetition, they can obscure where settings originate and add HCL parsing overhead at scale. Keep the layering to 2-3 levels at most. Note that Terragrunt has historically rejected nested includes (an included file that itself contains an include block), so each layer is included directly from the child configuration:
# Root level (env.hcl)
remote_state { ... }
terraform { ... }
# Mid level (region.hcl)
locals { ... }
# Child level (terragrunt.hcl) includes both layers directly
include "root" {
  path = find_in_parent_folders("env.hcl")
}
include "region" {
  path = find_in_parent_folders("region.hcl")
}
Despite DRY principles, teams often copy-paste terragrunt.hcl files when include and locals aren't understood. This negates Terragrunt's benefits. Always leverage include and locals to avoid duplication.
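One way to avoid copy-pasting shared values is to expose the parent's locals to the child. The sketch below assumes a parent file named root.hcl and hypothetical local names (project_name, common_tags); it relies on the include block's expose attribute, so check it against the Terragrunt version in use:

```hcl
# root.hcl (hypothetical shared values)
locals {
  project_name = "myapp"
  common_tags = {
    Team = "platform"
  }
}

# Child terragrunt.hcl
include "root" {
  path   = find_in_parent_folders("root.hcl")
  expose = true # makes the parent's parsed config readable as include.root
}

inputs = {
  project_name = include.root.locals.project_name
  tags = merge(
    include.root.locals.common_tags,
    { Environment = "staging" }
  )
}
```

The shared values are defined once in the parent, and each child states only what differs.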
Terragrunt analyzes dependency blocks to determine execution order when using run-all commands:
# Module: app/terragrunt.hcl
dependency "vpc" {
  config_path = "../vpc"
}
dependency "rds" {
  config_path = "../rds"
}
inputs = {
  vpc_id      = dependency.vpc.outputs.vpc_id
  db_endpoint = dependency.rds.outputs.endpoint
}
Terragrunt will ensure VPC and RDS are applied before the app module.
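When a module only needs to run after another one and does not read any of its outputs, a dependencies block can express the ordering on its own. This is a small sketch that reuses the directory names from the example above:

```hcl
# app/terragrunt.hcl: ordering only, no outputs consumed
dependencies {
  paths = ["../vpc", "../rds"]
}
```

Because no outputs are read, this form avoids output lookups on the upstream modules while still constraining run-all ordering.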
Mock outputs enable planning without applying dependencies first:
dependency "vpc" {
  config_path = "../vpc"
  mock_outputs = {
    vpc_id     = "vpc-mock123"
    subnet_ids = ["subnet-mock1", "subnet-mock2"]
  }
}
However, be aware that mock values are placeholders rather than real resource IDs, so a plan built on them can differ from what apply actually does.
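To reduce the risk of mock values leaking into a real change, the mocks can be restricted to read-only commands. The following sketch reuses the mock values above and adds the mock_outputs_allowed_terraform_commands attribute; the choice of validate and plan is an assumption about which commands are safe in a given workflow:

```hcl
dependency "vpc" {
  config_path = "../vpc"

  mock_outputs = {
    vpc_id     = "vpc-mock123"
    subnet_ids = ["subnet-mock1", "subnet-mock2"]
  }

  # Mocks are substituted only for these commands; other commands,
  # such as apply, require real outputs from the dependency.
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
```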
The run-all plan and run-all apply commands execute across multiple modules, so it is critical to understand how they behave.
For safety, use flags to control parallelism and error handling:
# Sequential execution for critical operations
terragrunt run-all apply --terragrunt-parallelism 1
# Continue on errors with caution
terragrunt run-all plan --terragrunt-ignore-dependency-errors
Terragrunt's --dependency-fetch-output-from-state flag (spelled --terragrunt-fetch-dependency-output-from-state in releases that use the --terragrunt- flag prefix) speeds up dependency resolution for S3 backends by reading state directly instead of invoking terraform output:
terragrunt run-all plan --dependency-fetch-output-from-state
This can significantly reduce execution time in large deployments.
Terragrunt generates backend configurations automatically based on remote_state blocks, eliminating manual backend setup:
# Root configuration generates backend.tf
locals {
  aws_region = "us-east-1"
}
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-tfstate-${get_aws_account_id()}-${local.aws_region}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = local.aws_region
    encrypt        = true
    dynamodb_table = "my-tfstate-lock-${get_aws_account_id()}"
    s3_bucket_tags = {
      owner = "infra-team"
      name  = "Terraform State"
    }
  }
}
Automatic backend creation has limitations:
- disable_init can inadvertently disable all backend initialization
For production environments requiring tight security controls, consider managing state infrastructure separately outside of Terragrunt's auto-generation.
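A minimal sketch of that separation, assuming the bucket and lock table were already created by a dedicated, tightly controlled process: Terragrunt still generates backend.tf, but is told not to modify the bucket. The resource names are hypothetical, and disable_bucket_update should be verified against the documentation for the Terragrunt version in use:

```hcl
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "my-tfstate-prod"      # hypothetical, pre-provisioned
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-tfstate-lock-prod" # hypothetical, pre-provisioned

    # Ask Terragrunt not to update settings on the existing bucket.
    disable_bucket_update = true
  }
}
```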
For cross-account deployments, explicitly configure role assumptions:
remote_state {
  backend = "s3"
  config = {
    bucket   = "terraform-state-account-b"
    key      = "${path_relative_to_include()}/terraform.tfstate"
    role_arn = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
    region   = "us-east-1"
  }
}
Terragrunt supports layered variable definition, allowing broad defaults with environment-specific overrides:
# Root (root.hcl): common variables
inputs = {
  project_name      = "myapp"
  team              = "platform"
  enable_monitoring = true
}
# Region level (region.hcl): region-specific overrides
inputs = {
  aws_region = "us-east-1"
}
# Environment level (terragrunt.hcl): includes both layers, then adds environment-specific configuration
include "root" { path = find_in_parent_folders("root.hcl") }
include "region" { path = find_in_parent_folders("region.hcl") }
inputs = {
  environment              = "production"
  instance_count           = 20
  enable_high_availability = true
}
Locals enable complex computations and conditionals:
locals {
  environment   = read_terragrunt_config(find_in_parent_folders("env.hcl"))
  region_config = read_terragrunt_config(find_in_parent_folders("region.hcl"))
  instance_type = local.environment.inputs.environment == "production" ? "t3.large" : "t3.medium"
  tags = merge(
    local.environment.inputs.common_tags,
    { Environment = local.environment.inputs.environment }
  )
}
inputs = {
  instance_type = local.instance_type
  tags          = local.tags
}
Organizing infrastructure across regions and accounts:
environments/
├── us-east-1/
│ ├── region.hcl
│ ├── dev/
│ │ ├── terragrunt.hcl
│ │ ├── vpc/
│ │ └── eks/
│ └── prod/
├── eu-west-1/
└── ap-southeast-1/
Each region can have region-specific configurations while inheriting organization-wide defaults.
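A sketch of how a module might pick up its region from the layout above, assuming region.hcl holds a single aws_region local and that a vpc module sits under each environment (both assumptions for illustration):

```hcl
# environments/us-east-1/region.hcl
locals {
  aws_region = "us-east-1"
}

# environments/us-east-1/dev/vpc/terragrunt.hcl
locals {
  # Parse the nearest region.hcl found by walking up the directory tree.
  region_vars = read_terragrunt_config(find_in_parent_folders("region.hcl"))
}

inputs = {
  aws_region = local.region_vars.locals.aws_region
}
```

Copying the dev directory under eu-west-1 would then change the region without editing any module configuration.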
Some recent Terragrunt versions have been reported to show significant slowdowns (15x in some cases), attributed to O(n²) complexity in locals evaluation:
Symptom: terragrunt run-all plan taking 8+ minutes for 30-50 modules
Solution:
Incorrect path references are a leading cause of Terragrunt issues:
# Incorrect - breaks when executed from different directories
locals {
  config_path = "./config.yaml" # ❌ Relative path doesn't work
}
# Correct - uses Terragrunt execution context
locals {
  config_path = "${get_parent_terragrunt_dir()}/config.yaml" # ✓
}
Key functions:
- get_terragrunt_dir(): Current module's directory
- get_parent_terragrunt_dir(): Parent directory in hierarchy
- find_in_parent_folders(): Search up for named file
GitHub Actions:
- uses: hashicorp/setup-terraform@v2
  with:
    terraform_version: 1.5.0
    terraform_wrapper: false # Critical for Terragrunt
Atlantis integration requires a custom Docker image with Terragrunt:
FROM runatlantis/atlantis:latest
# Recent Atlantis images run as a non-root user, so switch to root for the install
USER root
RUN curl -L https://github.com/gruntwork-io/terragrunt/releases/download/v0.50.0/terragrunt_linux_amd64 \
    -o /usr/local/bin/terragrunt && \
    chmod +x /usr/local/bin/terragrunt
USER atlantis
Note: Terraform Cloud (TFC) doesn't natively support Terragrunt; organizations using TFC must build custom runners or wrapper scripts.
Parallel execution can trigger state lock failures:
Error: Error locking state: Error acquiring the state lock
ConditionalCheckFailedException: The conditional request failed
Solution: Reduce parallelism for operations with shared dependencies:
# Sequential execution for safety
terragrunt run-all apply --terragrunt-parallelism 1
# Or configure in terragrunt.hcl (extra_arguments must sit inside a terraform block)
terraform {
  extra_arguments "serial_locking" {
    commands  = ["apply", "destroy"]
    arguments = ["-parallelism=1"]
  }
}
AWS assume role configurations require explicit setup:
remote_state {
  config = {
    role_arn = "arn:aws:iam::ACCOUNT_B:role/TerraformRole"
  }
}
terraform {
  extra_arguments "assume_role" {
    commands = get_terraform_commands_that_need_vars()
    env_vars = {
      AWS_ROLE_ARN = "arn:aws:iam::ACCOUNT_A:role/ResourceRole"
    }
  }
}
Large deployments can consume excessive RAM due to repeated config parsing:
Symptoms: 16GB+ RAM usage for 50 modules
Solutions:
- Enable --terragrunt-provider-cache
- Reduce repeated read_terragrunt_config() calls
While both Terragrunt and Atmos enhance Terraform orchestration, they differ significantly in approach:
| Aspect | Terragrunt | Atmos |
|---|---|---|
| Config Language | HCL | YAML |
| Structure | File hierarchy-based | Component + Stack-based |
| Inheritance | include blocks, find_in_parent_folders | Deep YAML merging, imports |
| Learning Curve | Moderate | Steeper (new paradigm) |
| Flexibility | High (for experienced teams) | High (structured approach) |
| Tooling Integration | Minimal; mostly CLI | Broader Cloud Posse ecosystem |
| Community | Large, mature | Growing, Cloud Posse-centric |
| Operational Overhead | Self-managed | Self-managed (more extensive) |
Atmos is particularly strong for teams invested in Cloud Posse components and infrastructure patterns. Terragrunt is better for teams already using Terraform and wanting minimal tooling overhead.
Avoid deep include hierarchies. Maintain no more than 3 levels:
root.hcl (remote_state, terraform block)
└── region.hcl (region-specific settings)
└── terragrunt.hcl (module instantiation)
Always specify the file name:
include "root" {
  path = find_in_parent_folders("root.hcl") # Not just find_in_parent_folders()
}
Cache file reads to avoid repeated parsing:
locals {
  root_config = read_terragrunt_config(find_in_parent_folders("root.hcl"))
}
inputs = merge(
  local.root_config.inputs,
  { environment = "prod" }
)
For multi-module operations, use provider caching:
terragrunt run-all plan --terragrunt-provider-cache
Mock outputs help with planning but can mask dependency issues. Validate mocks match reality:
mock_outputs = {
  vpc_id = "vpc-mock123" # Update when actual IDs change
}
Prevent race conditions and unexpected cascading destruction:
terragrunt run-all destroy --terragrunt-parallelism 1
Clearly document where variables originate and can be overridden. Use comments:
# From root: project_name, team
# From region: aws_region, availability_zones
# Override here: instance_count, environment
inputs = {
  instance_count = 5
}
Track terragrunt run-all execution times and alert on regressions. Performance degradation often indicates configuration or version issues.
For organizations requiring:
Platforms like Scalr, Terraform Cloud, or Env0 provide these capabilities out-of-the-box, reducing operational overhead compared to self-managed Terragrunt.
While Terragrunt orchestrates modules, test each module in isolation to catch issues early:
cd components/vpc
terraform init
terraform plan
This ensures modules remain reusable and independently testable.
Terragrunt is a powerful tool for managing Terraform at scale, offering DRY configuration patterns, dependency orchestration, and environment management that vanilla Terraform struggles with. However, it introduces its own complexity layer that requires careful management.
The best outcomes come from teams that:
For many organizations, Terragrunt strikes the right balance between flexibility and structure. For others, managed IaC platforms provide a more streamlined path forward. Evaluate your team's needs, existing expertise, and operational constraints when deciding whether Terragrunt is the right fit.
