TrademarkTrademark
Features
Documentation

How to use Terraform or OpenTofu to Manage Datadog

A simple guide to using Datadog and Terraform
Ryan FeeApril 25, 2024
How to use Terraform or OpenTofu to Manage Datadog
Key takeaways
  • The Datadog Terraform provider lets you define monitors, downtimes, dashboards, and SLOs as code, authenticating with a Datadog API key and App key.
  • Use the datadog_dashboard_json resource with Terraform for expressions to programmatically generate dashboard widgets, since the standard datadog_dashboard resource is not well-suited for dynamic widgets.
  • To manage Datadog at scale, use reusable modules, separate state files per environment, and split monolithic state by team, service, or resource type.
  • Scalr provides a built-in Datadog integration that streams run events, sends metrics through the Datadog agent, and forwards all audit logs for analysis.

Datadog is a monitoring and analytics platform that teams use to watch cloud infrastructure and application performance, and to manage incidents when something breaks. Once you have a lot of monitors and dashboards, clicking through the UI to manage them stops scaling. Defining them as code with Terraform or OpenTofu fixes that.

The Datadog Terraform provider lets you define monitoring resources in the same code that provisions your infrastructure. This guide walks through how to use the provider, and then covers the integrations Scalr has with Datadog for monitoring your Terraform runs.

Why Use the Datadog Terraform Provider?

A few reasons to manage Datadog through the provider rather than the UI:

  1. Infrastructure as Code (IaC): By defining monitoring resources in Terraform configuration files, you treat monitoring infrastructure as part of your overall infrastructure codebase. This promotes consistency, reproducibility, and version control.
  2. Simplified Workflow: With Terraform, you can automate the provisioning, updating, and deletion of monitoring resources, reducing manual intervention and potential errors.
  3. Scalability: As your infrastructure grows, managing monitoring resources manually becomes cumbersome. Terraform's declarative syntax allows you to scale your monitoring setup effortlessly.
  4. Integration: Datadog's Terraform Provider works alongside other Terraform-supported providers, enabling you to incorporate monitoring into your existing infrastructure provisioning workflows, like an AWS account.

Getting Started

To start using the Datadog Terraform or OpenTofu Provider, follow these steps:

  1. Install Terraform or OpenTofu: If you haven't already, download and install Terraform from the official website ( https://developer.hashicorp.com/terraform or https://opentofu.org/ ).
  2. A Datadog account, which can be created here: https://www.datadoghq.com/
  3. Configure your Datadog API Key: Obtain your Datadog API key from the Datadog dashboard. This key will be used by Terraform to authenticate with Datadog.
  4. Configure Datadog App Key: Obtain your Datadog App key from the Datadog dashboard. This key will be used by Terraform to authenticate with Datadog.

Using the Datadog Provider: Examples

Define Required Providers and Provider Configuration

Start by configuring the Datadog Terraform provider in your Terraform code. Open your Terraform configuration file (commonly named main.tf) and add the following block:

Replace "your-datadog-api-key" with your Datadog API key and "your-datadog-app-key" with the application keys you generated earlier. The latest version of the official Datadog Terraform documentation can be found in the registry here.

Example: Create a Datadog Monitor

Here's a simple example demonstrating how to create a basic monitor using the Datadog Terraform Provider:

In this example, we define a monitor that triggers when the 5-minute load average on a host named "host0" exceeds 2.0.

Example: Create Downtime

Here's a simple example demonstrating how to set downtime in Datadog using the Datadog Terraform Provider:

Example: Pull a Monitor Datasource

In this example, we'll use the datadog_monitor datasource to pull details about the monitor that was created in a previous step:

This will return details about the monitor such as the message that the monitor has set as well as the thresholds.

Execute Terraform or Tofu Commands

Once you have created your Terraform or Tofu code for the Datadog resources or datasources, you can then execute it with the following commands:

For Terraform:

For OpenTofu:

Terraform or Tofu will initialize the Datadog provider and apply the changes to your Datadog environment. Upon a successful Terraform apply, the state file will be created.

Best Practices and Advanced Usage

Variables and Dynamic Configurations

Utilize Terraform variables to make your configurations more dynamic. Instead of hardcoding values, use variables to create reusable and flexible scripts.

These variables can be defined as environment variables to make it more dynamic and avoid having secrets in the code.

For example, you can specify the following before running the Terraform init to set the Datadog API key:

export TF_VAR_api-key=<your-api-key>

Remote State Management

Consider using remote state management to store your Terraform state files securely. Services like Scalr or AWS S3 can be configured as remote backends to store state files. Here is an example of connecting to Scalr:

Find out more about remote state management here.

Provider Summary

That covers the basic steps to configure the Terraform Datadog provider. The provider supports a lot more than monitors and downtimes: roles, logs, and integrations with clouds like AWS. The library.tf documentation for the Terraform Datadog provider is a good place to look for those resources and more examples.

Once monitoring lives in your Terraform workflow, it gets versioned, reviewed, and rolled out the same way as the rest of your infrastructure.

Scalr's Integration with Datadog

Scalr, a Terraform Automation and Collaboration platform, provides a built-in integration with Datadog. Scalr is featured in the Datadog catalog, so connecting the two products is straightforward. The sections below cover what the integration sends to Datadog and how to set each part up.

Events

The Scalr to Datadog integration for events can stream event details for Terraform and OpenTofu runs executed in Scalr. Datadog users can build reports based on the source of the run, whether it was from the Terraform CLI, a VCS provider like GitHub, or manually executed through the UI. Users can also track the result of the run, the execution time, and much more.

A screenshot of a Datadog Dashboard populated with metrics from Scalr

The events are sent through the Datadog API integration, see the official documentation here.

Metrics

Scalr will send metrics to Datadog for in-depth analysis and reporting such as queued runs, queue state, the number of environments, and workspace count. These metrics are visualized in their out-of-the-box dashboard to help correlate deployments with other infrastructure changes and to track trends within your pipeline. The metrics functionality is an agent-based integration, which means you must use the Datadog agent. See how to install the Datadog agent and enable the Scalr integration here.

Audit Logs

Scalr sends all of its audit logs to Datadog for further analysis. Audit logs allow you to get insights into all actions taken, who performed the action, how it was done, and more. The audit log feature can use the same Datadog connection that is used for events or a new one can be created.

For example, you may want to know how and when a Terraform run was discarded in the Scalr pipeline. Scalr will send the following data to Datadog:

Summary

There are two ways Datadog fits into a Terraform or OpenTofu setup. You can use the provider to manage Datadog itself, and you can use the Scalr integration to watch your Terraform and Tofu runs from inside Datadog. Most teams end up doing both.

For further reading, see our guide to monitoring Terraform runs in Datadog.

Advanced Usage: Dashboards, SLOs, and Resource Mapping

Defining a Dashboard

You can define entire dashboards in code. A common challenge, however, is dynamically generating widgets. The standard datadog_dashboard resource is not well-suited for this.

The solution is to use the datadog_dashboard_json resource, which accepts a JSON string. This allows you to use Terraform's for expressions to programmatically generate your widget definitions.

# variables.tf
variable "widget_configurations" {
  type = map(object({
    title = string
    query = string
  }))
  default = {
    "cpu_usage" = {
      title = "CPU Usage",
      query = "avg:system.cpu.user{*}"
    },
    "memory_usage" = {
      title = "Memory Usage",
      query = "avg:system.mem.used{*}"
    }
  }
}
 
# main.tf
resource "datadog_dashboard_json" "programmatic_dashboard" {
  dashboard = jsonencode({
    title       = "Programmatically Generated Dashboard"
    layout_type = "ordered"
    widgets = [
      for key, config in var.widget_configurations : {
        definition = {
          type  = "timeseries"
          title = config.title
          requests = [
            {
              q = config.query
            }
          ]
        }
      }
    ]
  })
}

Defining a Service Level Objective (SLO)

SLOs are worth keeping in code so they get reviewed like anything else. This example defines a 99.9% availability SLO based on the ratio of successful API requests to total requests.

resource "datadog_service_level_objective" "api_availability" {
  name        = "API Request Availability"
  type        = "metric"
  description = "99.9% of all API requests should be successful (non-5xx)."
  
  query {
    numerator   = "sum:trace.http.request.hits{env:prod,service:core-api,!status_code:5xx}.as_count()"
    denominator = "sum:trace.http.request.hits{env:prod,service:core-api}.as_count()"
  }
 
  thresholds {
    target    = 99.9
    timeframe = "30d"
    warning   = 99.95
  }
  
  tags = ["service:core-api", "env:prod", "slo:availability"]
}

Scaling Your Datadog Terraform Configuration

As your usage grows, managing a single, monolithic Terraform configuration becomes a bottleneck. To manage Datadog at scale, consider these additional best practices:

  1. Use Modules: Encapsulate common patterns into reusable modules. For example, create a "standard service" module that bundles monitors for latency, error rate, and saturation. This promotes consistency and reduces code duplication.
  2. Separate Environments: Use a directory-based structure to maintain separate state files for each environment (e.g., dev, staging, prod). This provides strong isolation and prevents changes in one environment from impacting another.
  3. Split Your State: The biggest challenge at scale is a slow, monolithic Terraform state file. Split your state into smaller, more manageable units. Common strategies include splitting by team, by service, or by Datadog resource type. This dramatically improves performance and reduces the blast radius of any single change.

Datadog Product to Terraform Resource Mapping

The following table serves as a quick reference for mapping common Datadog products to their corresponding Terraform resource types.

Datadog Product Primary Terraform Resource
Monitors datadog_monitor
Dashboards datadog_dashboard, datadog_dashboard_json
Synthetic Tests datadog_synthetics_test
Service Level Objectives (SLOs) datadog_service_level_objective
User Management datadog_user
Role-Based Access Control datadog_role
Log Indexes & Pipelines datadog_logs_index, datadog_logs_custom_pipeline

Table mapping Datadog features to corresponding Terraform provider resources

Key Sources

Terraform Datadog Provider Documentation: registry.terraform.io/providers/DataDog/datadog/latest/docs

Datadog API Documentation: docs.datadoghq.com/api/latest/

Terraform Documentation: developer.hashicorp.com/terraform/docs

Terraform Registry (Datadog Modules): registry.terraform.io/search/modules?q=datadog

Datadog Provider GitHub Repository: github.com/DataDog/terraform-provider-datadog

About the author
Ryan Feedirector of platform engineering at Scalr
Ryan Fee is the director of platform engineering at Scalr, with over 15 years of experience improving infrastructure experiences at companies large and small.