Terraform Configuration Ingestion

Avoid duplicate Terraform code using Terraform variable files, modules or yaml/json configuration files.

Brendan Thompson, May 2, 2022

Key takeaways

Duplicated Terraform code across environments can be eliminated by ingesting configuration three ways: variable files, higher-order modules, or YAML/JSON configuration files.
Variable files (.tfvars) require converting static assignments to input variables, then supplying a separate tfvars file per environment passed in via the CI/CD pipeline.
Turning code into a higher-order module lets you version and control what each environment consumes, and the module itself can be driven by variable or configuration files.
YAML or JSON config files, read with yamldecode or jsondecode, co-locate multiple projects and environments in one place but carry blast-radius, validation, and structure tradeoffs.

One problem I see pretty regularly with Terraform codebases is a lot of repeated code. That repeated code is usually there to deploy environments or multiple instances of the same codebase. Picture some networking and a virtual machine that hosts a web application. The code gets copied and pasted into different folders or repositories, with some (usually minor) configuration changed. If you had three environments, you now have three copies of the codebase to maintain and keep in sync, which puts extra and unnecessary strain on your engineers. So how would I fix this?

The answer might seem easy, but in my experience it ends up being a little more complicated. It starts with removing the repeated, often near-identical code. Realistically your non-production environments should be identical to production, just at a reduced scale. You can solve all of this by ingesting configuration into your Terraform codebase or module. You wouldn't keep three copies of your application's code, now would you?

In my mind there are three ways that this can be achieved:

Terraform variable files
Higher-order modules
YAML/JSON configuration files

Let's walk through each one to see how it solves the repeated-code problem. But first, let's write the code you'd use in the copy-and-paste scenario, then improve on it to fit each ingestion approach.

First we set up our providers.

provider.tf

provider "azurerm" {
  features {}
}

Next we will set up all our networking infrastructure. As you can see, we are statically assigning all of our values here.

network.tf

resource "azurerm_resource_group" "this" {
  name     = format("rg-%s", local.name_suffix)
  location = "australiaeast"
}
 
resource "azurerm_virtual_network" "this" {
  name                = format("vn-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  address_space = ["10.0.0.0/23"]
}
 
resource "azurerm_subnet" "this" {
  name                 = format("sn-%s", local.name_suffix)
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
 
  address_prefixes = ["10.0.1.0/24"]
}
 
resource "azurerm_public_ip" "this" {
  name                = format("pip-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  allocation_method = "Dynamic"
}
 
resource "azurerm_network_security_group" "this" {
  name                = format("nsg-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
}
 
resource "azurerm_subnet_network_security_group_association" "this" {
  subnet_id                 = azurerm_subnet.this.id
  network_security_group_id = azurerm_network_security_group.this.id
}
 
resource "azurerm_network_interface" "this" {
  name                = format("nic-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  ip_configuration {
    name                          = "config"
    subnet_id                     = azurerm_subnet.this.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.this.id
  }
}

Then we create our naming suffix as a local variable, along with the SSH key and the virtual machine.

main.tf

locals {
  name_suffix = "aue-dev-blt"
}
 
resource "tls_private_key" "this" {
  algorithm = "RSA"
  rsa_bits  = 4096
}
 
resource "azurerm_linux_virtual_machine" "this" {
  name                = replace(format("vm-%s", local.name_suffix), "-", "")
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  size           = "Standard_F2"
  admin_username = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.this.id
  ]
 
  admin_ssh_key {
    username   = "adminuser"
    public_key = tls_private_key.this.public_key_openssh
  }
 
  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }
 
  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }
}

Finally we will output our private key so that we can access the server!

outputs.tf

output "ssh_private_key" {
  value     = tls_private_key.this.private_key_openssh
  sensitive = true
}

Warning! Do not do this in any real-world scenario. The private key should be stored in a secrets vault of some description!
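As a rough sketch of what that could look like, the key can be written to Azure Key Vault instead of being exposed as an output. This assumes a Key Vault already exists; the vault name, resource group name, and secret name below are hypothetical placeholders and not part of the original example.

```hcl
# Assumption: a Key Vault called "kv-aue-dev-blt" already exists in a resource
# group called "rg-shared". Both names are hypothetical placeholders.
data "azurerm_key_vault" "this" {
  name                = "kv-aue-dev-blt"
  resource_group_name = "rg-shared"
}

# Store the generated private key as a secret rather than as an output.
resource "azurerm_key_vault_secret" "ssh_private_key" {
  name         = "ssh-private-key"
  value        = tls_private_key.this.private_key_openssh
  key_vault_id = data.azurerm_key_vault.this.id
}
```

Keep in mind that a key generated by tls_private_key is still recorded in the Terraform state, so the state itself needs to be protected as well.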

How Do You Use Terraform tfvars Files to Remove Duplication?

To use tfvars files we first need input variables, so the first thing to change is removing all that static assignment and using variables instead!

variables.tf

variable "location" {
  type        = string
  description = <<EOT
  (Required)  The Azure region where the resources will be deployed.
  EOT
 
  validation {
    condition = contains(
      ["australiaeast", "australiasoutheast"],
      var.location
    )
    error_message = "Err: invalid Azure location provided."
  }
}
 
variable "environment" {
  type        = string
  description = <<EOT
  (Required)  The environment short name for the resources.
  EOT
 
  validation {
    condition = contains(
      ["dev", "uat", "prd"],
      var.environment
    )
    error_message = "Err: invalid environment provided."
  }
}
 
variable "project_identifier" {
  type        = string
  description = <<EOT
  (Required)  The identifier for the project, 4 character maximum
  EOT
 
  validation {
    condition     = length(var.project_identifier) <= 4
    error_message = "Err: project identifier cannot be longer than 4 characters."
  }
}
 
variable "network" {
  type = object({
    address_space = string
    subnets = list(
      object({
        role          = string
        address_space = string
      })
    )
  })
  description = <<EOT
  (Required)  Network details
  EOT
}

Above we have set up variables for our location, environment, project_identifier and the network. In a real-world scenario we would likely pass in a lot of the virtual machine configuration as well, but to keep the code simple we will do the bare minimum. With those inputs in place, we have to update the codebase to use them!
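The network variable has no validation of its own. If you wanted some, one possible sketch is below. It is a variant of the network variable above, not the original definition, and it assumes that checking each address space parses as a CIDR block is enough for your needs.

```hcl
variable "network" {
  type = object({
    address_space = string
    subnets = list(
      object({
        role          = string
        address_space = string
      })
    )
  })
  description = <<EOT
  (Required)  Network details
  EOT

  # cidrhost() fails on a malformed prefix; can() turns that failure into false.
  validation {
    condition     = can(cidrhost(var.network.address_space, 0))
    error_message = "Err: network address_space must be a valid CIDR block."
  }

  # alltrue() requires every subnet prefix to pass the same check.
  validation {
    condition = alltrue([
      for s in var.network.subnets : can(cidrhost(s.address_space, 0))
    ])
    error_message = "Err: every subnet address_space must be a valid CIDR block."
  }
}
```

This only checks the format of each prefix. It does not check that the subnets actually sit inside the virtual network's address space.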

main.tf

locals {
  location_map = {
    "australiaeast"      = "aue"
    "australiasoutheast" = "aus"
  }
 
  name_suffix = lower(format(
    "%s-%s-%s",
    local.location_map[var.location],
    var.environment,
    var.project_identifier
  ))
}
 
resource "azurerm_resource_group" "this" {
  name     = format("rg-%s", local.name_suffix)
  location = var.location
}
 
resource "tls_private_key" "this" {
  algorithm = "RSA"
  rsa_bits  = 4096
}
 
resource "azurerm_linux_virtual_machine" "this" {
  name                = replace(format("vm-%s", local.name_suffix), "-", "")
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  size           = "Standard_F2"
  admin_username = "adminuser"
  network_interface_ids = [
    azurerm_network_interface.this.id
  ]
 
  admin_ssh_key {
    username   = "adminuser"
    public_key = tls_private_key.this.public_key_openssh
  }
 
  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }
 
  source_image_reference {
    publisher = "Canonical"
    offer     = "UbuntuServer"
    sku       = "16.04-LTS"
    version   = "latest"
  }
}

As you can see above, our locals block is a little more interesting now. We use the location input variable to select the short name for the location, which we then use when naming our resources. The suffix is now built from our input variables, which means it's extremely dynamic. Changing our environment, for instance, becomes a breeze!
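To see what the locals resolve to, you could temporarily add a couple of outputs. These outputs are hypothetical debugging aids and not part of the original codebase. The values in the comments assume location = "australiaeast", environment = "dev" and project_identifier = "blt".

```hcl
# Hypothetical debugging outputs, not part of the original codebase.
output "name_suffix" {
  # location_map["australiaeast"] is "aue", so the suffix is "aue-dev-blt"
  value = local.name_suffix
}

output "vm_name" {
  # replace() strips the dashes from "vm-aue-dev-blt", giving "vmauedevblt"
  value = azurerm_linux_virtual_machine.this.name
}
```

Note that the suffix comes out the same as the one we hard-coded in the copy-and-paste version, only now it is derived from the inputs.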

network.tf

resource "azurerm_virtual_network" "this" {
  name                = format("vn-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  address_space = [var.network.address_space]
}
 
resource "azurerm_subnet" "this" {
  for_each = {
    for v in var.network.subnets :
    v.role => v.address_space
  }
 
  name                 = format("sn-%s-%s", local.name_suffix, each.key)
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
 
  address_prefixes = [each.value]
}
 
resource "azurerm_public_ip" "this" {
  name                = format("pip-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  allocation_method = "Dynamic"
}
 
resource "azurerm_network_security_group" "this" {
  name                = format("nsg-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
}
 
resource "azurerm_subnet_network_security_group_association" "this" {
  for_each = azurerm_subnet.this
 
  subnet_id                 = each.value.id
  network_security_group_id = azurerm_network_security_group.this.id
}
 
resource "azurerm_network_interface" "this" {
  name                = format("nic-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  ip_configuration {
    name                          = "config"
    subnet_id                     = azurerm_subnet.this["iaas"].id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.this.id
  }
}

Our network configuration is now fed by the network input variable, which lets us create multiple subnets if we so desire!

Now that this is all in place, how do we consume this codebase? We do that with tfvars files: you would have one tfvars file for every environment or project that you wanted to build. An example of this is below:

dev.tfvars

location           = "australiaeast"
environment        = "dev"
project_identifier = "blt"
network = {
  address_space = "10.0.0.0/23"
  subnets = [
    {
      role          = "iaas"
      address_space = "10.0.1.0/24"
    }
  ]
}

prd.tfvars

location           = "australiaeast"
environment        = "prd"
project_identifier = "blt"
network = {
  address_space = "10.10.0.0/23"
  subnets = [
    {
      role          = "iaas"
      address_space = "10.10.1.0/24"
    }
  ]
}

We have two environments declared above. They completely change the configuration of the Terraform code without touching the code itself... no more copy and paste! Your CI/CD pipeline would then select the correct tfvars file and pass it into the Terraform run, for example with terraform plan -var-file=dev.tfvars. Now a change to the codebase flows through your environments more easily, and when you're testing or developing in the lower environments you can be fairly confident that what works in development will work in production too. That gives your engineers more confidence in the codebase, and the business as well.
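Because subnets is a list, an environment can also declare more than one. Here is a sketch of a hypothetical uat.tfvars with two subnets. The address ranges and the data role are made up for illustration.

```hcl
# uat.tfvars (hypothetical third environment)
location           = "australiaeast"
environment        = "uat"
project_identifier = "blt"
network = {
  address_space = "10.5.0.0/23"
  subnets = [
    {
      role          = "iaas"
      address_space = "10.5.0.0/24"
    },
    {
      # "data" is an illustrative role name
      role          = "data"
      address_space = "10.5.1.0/24"
    }
  ]
}
```

One subnet still has to use the iaas role, because the network interface looks up azurerm_subnet.this["iaas"] by that key. The network security group association uses for_each over the subnets, so it applies to both.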

Why Turn Your Terraform Code Into a Higher-Order Module?

Now that we've defined input variables for our code, we can go one step further and turn that code into a module. This lets us strictly version and control which version of our code each environment consumes. We could drive the module with either variable files or configuration files. For this example, we'll statically assign the inputs.

First things first, we will create the module itself. We will do this locally.

$ mkdir -p $(pwd)/modules/app-server

And now the relevant files for us to store our code:

$ touch $(pwd)/modules/app-server/{main,network,inputs,outputs}.tf

One thing you might notice straight away is the lack of a provider.tf file. We do not need one inside our module at all: when we call the module, we configure the provider at that level and the module inherits that configuration.
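What a module can usefully declare is which providers it depends on, without configuring them. A minimal sketch is below, assuming a hypothetical versions.tf file inside modules/app-server. The version constraints are illustrative and should be adjusted to whatever you have tested against.

```hcl
# modules/app-server/versions.tf (hypothetical file, not created above)
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.0" # illustrative constraint, not from the article
    }
    tls = {
      source  = "hashicorp/tls"
      version = ">= 4.0" # illustrative constraint, not from the article
    }
  }
}
```

The provider "azurerm" block with features {} still lives in the calling code, not in the module.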

For the sake of not making this post 10,000 lines, I won't include the code for the module itself. It would be identical to the code in the tfvars section above, just placed into the empty files we created a moment ago. Instead, I will show how we call the module.

main.tf

provider "azurerm" {
  features {}
}
 
module "dev_app_server" {
  source = "./modules/app-server"
 
  location           = "australiaeast"
  environment        = "dev"
  project_identifier = "blt"
  network = {
    address_space = "10.0.0.0/23"
    subnets = [
      {
        role          = "iaas"
        address_space = "10.0.1.0/24"
      }
    ]
  }
 
}
 
output "ssh_private_key" {
  value     = module.dev_app_server.ssh_private_key
  sensitive = true
}

The interface for the module is simple, and it's satisfied by passing through everything we declared as input variables. Modules give you a lot of control over what the consumers of your code are doing. Here the module is defined locally, but nothing stops us from storing it in git or a module registry like Terraform Cloud.
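As a sketch of the git option, the same module call can point at a repository and pin a tag with ref. The repository URL and tag below are hypothetical. Only the source line differs from the local example.

```hcl
module "dev_app_server" {
  # Hypothetical repository URL. The "//app-server" part selects a
  # subdirectory of the repository, and "ref" pins the call to a tag.
  source = "git::https://example.com/acme/terraform-modules.git//app-server?ref=v1.0.0"

  location           = "australiaeast"
  environment        = "dev"
  project_identifier = "blt"
  network = {
    address_space = "10.0.0.0/23"
    subnets = [
      {
        role          = "iaas"
        address_space = "10.0.1.0/24"
      }
    ]
  }
}
```

With this in place, dev could move to a newer tag while production stays on v1.0.0 until you are ready to promote the change.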

Creating a module also gives our consumers more self-service. Everything they need to create an app-server is defined here, and they know it satisfies organisational requirements like security controls.

How Does yamldecode Feed YAML or JSON Configuration Into Terraform?

The final way to pass configuration into our Terraform code is with configuration files. These are json or yaml files that we ingest using the yamldecode or jsondecode functions, which gives us dot access to the attributes.

First off let's look at the example yaml file that we will use:

config.yaml

project:
  blt:
    environments:
      dev:
        location: australiaeast
        network:
          address_space: 10.0.0.0/23
          subnets:
          - role: iaas
            address_space: 10.0.1.0/24
      prd:
        location: australiaeast
        network:
          address_space: 10.1.0.0/23
          subnets:
          - role: iaas
            address_space: 10.1.1.0/24

A lot of the keys here will look familiar, since they represent the input variables we were previously passing into our Terraform code. The advantage of the configuration file is that it can house multiple projects in a single file, with all their environments alongside. We could even turn the configuration file into a module, or a provider!

Let's look at the code now to see what's changed and how we consume this configuration file.

To start, we have a slimmed-down variables.tf. It now only holds what I call orientation variables, as they allow our Terraform code to orientate itself within the configuration file.

variables.tf

variable "environment" {
  type        = string
  description = <<EOT
  (Required)  The environment short name for the resources.
  EOT
 
  validation {
    condition = contains(
      ["dev", "uat", "prd"],
      var.environment
    )
    error_message = "Err: invalid environment provided."
  }
}
 
variable "project_identifier" {
  type        = string
  description = <<EOT
  (Required)  The identifier for the project, 4 character maximum
  EOT
 
  validation {
    condition     = length(var.project_identifier) <= 4
    error_message = "Err: project identifier cannot be longer than 4 characters."
  }
}

Next we have our main.tf, which holds the key to consuming our configuration file. We set up our environment through two local variables, raw_config and config.

raw_config: we read the config file into Terraform with the file function, then use yamldecode to convert the YAML into a value that Terraform can easily work with.
config: we pass our orientation variables into some lookup functions so that local.config is scoped to a single project and environment, for example the project blt in the dev environment.

main.tf

locals {
  raw_config = yamldecode(file("./config.yaml"))
  config = lookup(
    lookup(
      local.raw_config.project,
      var.project_identifier,
      null
    ).environments,
    var.environment,
    null
  )
 
  location_map = {
    "australiaeast"      = "aue"
    "australiasoutheast" = "aus"
  }
 
  name_suffix = lower(format(
    "%s-%s-%s",
    local.location_map[local.config.location],
    var.environment,
    var.project_identifier
  ))
}
 
...
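If you prefer JSON, the same pattern works with jsondecode. This is a sketch that assumes a hypothetical config.json with exactly the same shape as config.yaml. It uses index syntax instead of nested lookup calls, which is my substitution rather than the approach shown above.

```hcl
locals {
  # Assumes a hypothetical config.json with the same structure as config.yaml.
  raw_config = jsondecode(file("./config.json"))

  # Index syntax selects the project and the environment in one expression.
  config = local.raw_config.project[var.project_identifier].environments[var.environment]
}
```

Everything downstream, such as local.config.location and local.config.network.subnets, stays the same whichever format you choose.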

The last file is our network.tf. Again, I have truncated the unchanged or boring portions.

network.tf

resource "azurerm_virtual_network" "this" {
  name                = format("vn-%s", local.name_suffix)
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
 
  address_space = [local.config.network.address_space]
}
 
resource "azurerm_subnet" "this" {
  for_each = {
    for v in local.config.network.subnets :
    v.role => v.address_space
  }
 
  name                 = format("sn-%s-%s", local.name_suffix, each.key)
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
 
  address_prefixes = [each.value]
}
 
...

The part to notice above is how we now consume the subnets list from our local.config. The snippet below shows the difference between the for expression using our config and the earlier one using variables:

// Config
resource "azurerm_subnet" "this" {
  for_each = {
    for v in local.config.network.subnets :
    v.role => v.address_space
  }
  ...
}
 
// Variables
resource "azurerm_subnet" "this" {
  for_each = {
    for v in var.network.subnets :
    v.role => v.address_space
  }
  ...
}

The two look almost identical, so why bother with configuration files, I hear you ask? I've found them useful, especially in larger and more complex projects, because you can co-locate a lot of configuration in one place and then consume it from many. Say you pull all the networking out to be handled by a central networking component, but you still want information about that infrastructure. You can read it from the configuration file. Maybe you consume the configuration directly, or maybe it gives you enough to construct data sources. Either way it removes the dependency on the other workspace or codebase. Or say you have a cost center tag that needs to be on every resource in the blt project. Those resources might be scattered across any number of Terraform codebases, and with the configuration file you can update that value everywhere from a single place.
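To make the cost center idea concrete, here is a rough sketch. It assumes config.yaml gains a hypothetical cost_center key under the project, alongside environments, and it reuses local.name_suffix and local.config as defined earlier in main.tf.

```hcl
locals {
  raw_config = yamldecode(file("./config.yaml"))

  # Hypothetical key: assumes project.<identifier>.cost_center exists in
  # config.yaml. tostring() guards against YAML decoding it as a number.
  common_tags = {
    cost_center = tostring(local.raw_config.project[var.project_identifier].cost_center)
  }
}

resource "azurerm_resource_group" "this" {
  name     = format("rg-%s", local.name_suffix) # name_suffix as defined earlier
  location = local.config.location              # config as defined earlier
  tags     = local.common_tags
}
```

Any other codebase that reads the same file can build the same common_tags local, so changing the value in config.yaml updates it everywhere on the next run.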

There are, however, some things to consider when using this approach:

Blast Radius: something good can also be something bad. If someone were to mistakenly change a line of configuration, it would affect all code consuming that configuration.
Validation: as we are now consuming YAML or JSON, we need some level of validation throughout the codebase so that an invalid (e.g. null) value doesn't break the code. I think this is actually a benefit in the long run, it just takes some upfront development cost.
Structure: you must also consider the structure of your configuration and enforce it in some way for all your consumers, or you could potentially break things. I see this being solved through easy-to-consume documentation and/or the use of JSON/YAML schema files.
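On the validation point, one option is to wrap lookups in try and can so that a missing key does not fail the run with a confusing error. This is a minimal sketch. The default location is an assumption of mine, and config_found is a hypothetical helper name.

```hcl
locals {
  raw_config = yamldecode(file("./config.yaml"))

  # try() returns the first argument that evaluates without an error, so a
  # missing project, environment, or location key falls back to the default.
  location = try(
    local.raw_config.project[var.project_identifier].environments[var.environment].location,
    "australiaeast" # assumed default, not from the article
  )

  # can() returns true or false depending on whether the expression
  # evaluates without error, i.e. whether the entry exists in the file.
  config_found = can(
    local.raw_config.project[var.project_identifier].environments[var.environment]
  )
}
```

Silent defaults can hide mistakes in the configuration file, so I would only use them for settings that really are optional and fail loudly on everything else.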

Which Terraform Configuration Ingestion Method Should You Use?

We've gone through three ways of ingesting configuration into Terraform code. The most basic is the tfvars file, which pre-fills the required input variables and passes different instances of those inputs into the same Terraform code. A bit more advanced is the higher-order module, where you wrap your Terraform code in a module and provide an interface for your consumers to fill. Last is the configuration file, such as a YAML file. It gives you the most flexibility, and you can pair it with the higher-order module too.

There's no rule that says "you must use Method A every time because reason X". Each one offers different benefits, and you should pick whichever suits your situation.

Hopefully that gives you a useful starting point for getting configuration into your Terraform code and cutting down on duplication. As always, if you have feedback or want to know more, feel free to reach out!

You can follow Brendan @BrendanLiamT on Twitter.

About the author

Brendan Thompson, solutions engineer at Scalr

Brendan Thompson is a solutions engineer at Scalr, specializing in Terraform and cloud infrastructure.
