OPA
September 26, 2024

Everything you need to know about Open Policy Agent (OPA) and Terraform

By Ryan Fee

As organizations increasingly adopt Infrastructure as Code (IaC) methodologies, such as Terraform and OpenTofu, the need for policy enforcement mechanisms has grown in tandem. This is where the concept of "Policy as Code" comes into play, with Open Policy Agent (OPA) emerging as a powerful tool in this domain, especially when integrated with Terraform. In this blog post, we'll explore the synergy between OPA and Terraform, diving deep into what Policy as Code means, how OPA works, and its practical applications in Terraform workflows, such as Scalr’s pipeline.

Policy as Code

Policy as Code is an approach to defining and managing policies using the same practices and tools used for writing and maintaining software code. This method allows organizations to automate policy enforcement, making it an integral part of the development and deployment process.

Policy as code, a best practice in the DevOps world, reduces the human error that can leave organizations vulnerable. Automating policy enforcement minimizes the risk of manual mistakes when security rules are applied. It also leads to faster development and a better developer experience. Developers prefer to catch issues as early as possible, and when policies are integrated into the development workflow, security checks become part of that workflow instead of a source of deployment delays. Lastly, from a maintainer's perspective, policies can be quickly updated and rolled out with standard GitOps practices to adapt to new compliance requirements or security threats. Because policies are code, they should be stored in a VCS repository where they are versioned, tested, and audited.

Open Policy Agent (OPA) Basics

Open Policy Agent has become the de facto standard for policy as code across the industry. HashiCorp offers its own proprietary policy framework, Sentinel, for Terraform Cloud. However, OPA, an open-source policy engine originally created by Styra, has become the leading general-purpose solution. OPA was accepted into the CNCF in March 2018. It has seen enough adoption that even HashiCorp now supports it in Terraform Cloud pipelines.

Key OPA Features

  • Declarative Language: OPA uses Rego, a purpose-built declarative language for writing policies.
  • Unified Policy Framework: OPA can be used to enforce policies in Terraform, OpenTofu, Kubernetes, CI/CD pipelines, API gateways, and more.
  • Context-Aware Decisions: OPA can make decisions based on arbitrary data, allowing for complex, context-aware policies.
  • High Performance: OPA is designed to be lightweight and fast, capable of evaluating thousands of policies per second.
  • Extensibility: OPA can be easily integrated into various systems and workflows through its APIs.

How it Works

At its core, OPA operates on three main components:

  1. Policies: Written in Rego, these define the rules and logic for making decisions.
  2. Data: This is the context against which policies are evaluated. It can include configuration files, user information, or any other relevant data. In the Terraform case, it is the plan generated from the Terraform configuration files, which is passed to OPA as the input document.
  3. Queries: These are the questions asked of OPA, typically in the form "is this allowed?" or "what should this be?"

When a query is made to OPA, it evaluates the relevant policies against the provided data and returns a decision. This decision can be a simple yes/no or more complex structured data.
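The sketch below shows those three pieces in Rego, using the rego.v1 syntax (OPA 0.59 or later). The Rego module is the policy. The JSON passed at evaluation time is the input. Asking OPA for data.example.deny is the query. The package name example, the input.user.role field, and the allowed roles are hypothetical and are chosen only for illustration.

```rego
package example

import rego.v1

# Policy: a hypothetical rule that denies any user whose role is not
# in the allowed set.
allowed_roles := {"admin", "operator"}

deny contains msg if {
	not input.user.role in allowed_roles
	msg := sprintf("role %q is not allowed", [input.user.role])
}

# Input (input.json), hypothetical:
#   {"user": {"role": "guest"}}
#
# Query, evaluated from the command line:
#   opa eval --data policy.rego --input input.json "data.example.deny"
#
# Decision: a set with one message, because "guest" is not an allowed role.
```

Because deny is a set, a single decision can carry several messages at once. An empty set means the request passes.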

Terraform & Open Policy Agent

Terraform allows developers to define and provision infrastructure resources across various cloud providers. While Terraform itself focuses on resource management, it doesn't inherently provide robust policy enforcement capabilities. This is where OPA comes in, complementing Terraform by adding a layer of policy control to the infrastructure provisioning process.

The main use case for OPA with Terraform is to evaluate the Terraform plan and check for anything that might be non-compliant. A Terraform plan can be saved to a plan file and converted to JSON (for example, with terraform show -json), and that JSON is what OPA evaluates. An OPA policy can check for any data that is present, or absent, in the plan. Evaluating OPA policies against the plan lets developers catch issues before running a Terraform apply, so that non-compliant resources are never created. The standard workflow is as follows:

  • Define Policies: Write OPA policies in Rego that express your organization's requirements for infrastructure.
  • Integrate with Workflow: Integrate with Scalr, Terraform Cloud, or a CI/CD pipeline to run OPA checks at appropriate stages.
  • Provide Input: Feed plans, generated from the Terraform configuration files, into OPA as input data.
  • Evaluate Policies: Run OPA to evaluate your policies against the JSON representation of the Terraform plan.
  • Act on Results: Based on OPA's output, decide whether to proceed with applying changes or halt for policy violations.
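The same five steps can also be reproduced outside Scalr. This minimal sketch assumes the plan is exported with terraform show -json. In that raw JSON, resource_changes sits at the top level of the input, not under a tfplan key as it does in Scalr's input. The denied resource type and the allow rule are illustrative choices. The module uses the rego.v1 syntax (OPA 0.59 or later).

```rego
package terraform

import rego.v1

# Define policies: a hypothetical rule that forbids creating IAM users
# directly. Raw plan JSON puts resource_changes at the top level of input.
denied_types := {"aws_iam_user"}

deny contains msg if {
	some rc in input.resource_changes
	"create" in rc.change.actions
	rc.type in denied_types
	msg := sprintf("%s: creating %s is not allowed", [rc.address, rc.type])
}

# Act on results: a single boolean that a pipeline can gate on.
default allow := false

allow if count(deny) == 0

# Provide input and evaluate, e.g. in a CI job. The last command exits
# non-zero when any deny message is produced:
#   terraform plan -out=tfplan.binary
#   terraform show -json tfplan.binary > tfplan.json
#   opa eval --fail-defined --data policy.rego --input tfplan.json "data.terraform.deny[x]"
```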

Terraform Plan vs Terraform Run Information

When Scalr generates the input document for an OPA policy, it splits it into two sections: tfrun and tfplan.

The tfrun section contains all of the information about the actual run:

  • The source of the run: a VCS pull request, the Terraform CLI, the UI, etc.
  • For VCS-driven runs, the commit author.
  • The environment, workspace, tags, etc.
  • The estimated cost of the resources being created.
  • Whether the run is plan-only or will also execute a Terraform apply.
  • And more…

Because all of this extra tfrun context is included in the input, users can write more advanced logic to decide what to check during a run. The tfrun data is also available before a Terraform plan is executed, so policies that rely only on tfrun can run as pre-plan checks.
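The policy below is a sketch of that kind of run-aware logic. It enforces a cost limit only on runs that will actually apply changes. It assumes the plan-only flag is exposed as tfrun.is_dry; that field name is an assumption to check against the input Scalr generates. The tfrun.cost_estimate.proposed_monthly_cost field is the one used in this article's cost example. The limit of 100 is hypothetical, and the module uses the rego.v1 syntax.

```rego
package terraform

import input.tfrun as tfrun
import rego.v1

# Hypothetical limit, in dollars per month
max_monthly_cost := 100

# ASSUMPTION: tfrun.is_dry is true for plan-only runs and false for runs
# that will apply. Verify the field name against Scalr's tfrun document.
will_apply if tfrun.is_dry == false

# The cost limit only applies to runs that will apply changes;
# plan-only runs are left unrestricted.
deny contains msg if {
	will_apply
	cost := tfrun.cost_estimate.proposed_monthly_cost
	cost > max_monthly_cost
	msg := sprintf("proposed monthly cost $%v exceeds the $%v limit for apply runs", [cost, max_monthly_cost])
}
```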

The tfplan section contains all of the data that was generated from the Terraform code by running a Terraform plan, for example:

  • The region a resource is being deployed in.
  • The tags being applied to a resource.
  • Which resources are being created, updated, or deleted.
  • Which modules and providers are being called.
  • And more…
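A required-tags rule is an example of a check that needs only tfplan. This minimal sketch assumes a hypothetical organization standard: every created or updated resource that has a tags attribute must carry owner and cost-center keys. It reads the planned values from change.after in the Terraform plan JSON and uses the rego.v1 syntax.

```rego
package terraform

import input.tfplan as tfplan
import rego.v1

# Hypothetical organization standard: tag keys every taggable resource needs
required_tags := {"owner", "cost-center"}

deny contains msg if {
	some rc in tfplan.resource_changes
	some action in rc.change.actions
	action in {"create", "update"}

	# Planned tag values; undefined for resource types without a tags
	# attribute, which skips them entirely
	tags := rc.change.after.tags
	present := {key | some key, _ in tags}
	missing := required_tags - present
	count(missing) > 0

	msg := sprintf("%s is missing required tags: %v", [rc.address, sort(missing)])
}
```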

The tfrun section is more focused on how the run is happening, whereas the tfplan is more focused on what is being done in the Terraform code.
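A rule that combines both sections makes the split concrete: tfrun decides whether the rule applies, and tfplan supplies what is being changed. This minimal sketch uses the rego.v1 syntax. It assumes the environment name is exposed as tfrun.environment.name (verify this against Scalr's input) and uses production as a hypothetical protected environment. The delete check reads resource_changes from the Terraform plan JSON.

```rego
package terraform

import input.tfplan as tfplan
import input.tfrun as tfrun
import rego.v1

# ASSUMPTION: the environment name is exposed as tfrun.environment.name;
# "production" is a hypothetical environment name.
protected_environments := {"production"}

# tfrun answers "how and where is this run happening?"
in_protected_environment if tfrun.environment.name in protected_environments

# tfplan answers "what is being changed?": here, any resource being deleted,
# including the delete half of a replacement.
deny contains msg if {
	in_protected_environment
	some rc in tfplan.resource_changes
	"delete" in rc.change.actions
	msg := sprintf("%s would be deleted in environment %q", [rc.address, tfrun.environment.name])
}
```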

OPA Pre-Plan Checks

OPA policies can be executed in Scalr's pre-plan stage, when only the tfrun data is available for a policy to check against. For example, an organization may decide that runs in a specific environment cannot be CLI-based. The organization wants to catch this as early as possible to keep development fast, so it implements the rule as a pre-plan check.
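Here is a minimal sketch of such a pre-plan policy, in rego.v1 syntax. It references only tfrun, which is what allows it to run before a plan exists. The field names tfrun.source and tfrun.environment.name, the value "cli", and the environment name production are assumptions for illustration. Check them against the tfrun document Scalr generates for your runs.

```rego
package terraform

import input.tfrun as tfrun
import rego.v1

# ASSUMPTIONS: tfrun.source holds the origin of the run (e.g. "cli") and
# tfrun.environment.name holds the environment name; "production" is a
# hypothetical environment.
restricted_environments := {"production"}

# Only tfrun is referenced, so no plan data is needed to evaluate this rule.
deny contains msg if {
	tfrun.source == "cli"
	tfrun.environment.name in restricted_environments
	msg := sprintf("CLI-driven runs are not allowed in environment %q", [tfrun.environment.name])
}
```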

Example of a pre-plan policy check in Scalr

OPA Post-Plan Checks

Post-plan checks are used to evaluate the data generated from a Terraform plan, which describes the resources being created, changed, and deleted by the proposed changes. These checks help enforce security standards for those resources. For example, they can be used to prevent users from deploying public S3 buckets or to restrict which Terraform providers can be used.
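Here is a minimal sketch of the public S3 bucket case, in rego.v1 syntax. It assumes buckets are made public through a canned ACL on the AWS provider's aws_s3_bucket_acl resource. It does not inspect bucket policies or public access block settings, which a complete policy would also need to cover.

```rego
package terraform

import input.tfplan as tfplan
import rego.v1

# Canned ACLs that grant public access
public_acls := {"public-read", "public-read-write"}

deny contains msg if {
	some rc in tfplan.resource_changes
	rc.type == "aws_s3_bucket_acl"

	# Only resources that are being created or updated
	some action in rc.change.actions
	action in {"create", "update"}

	rc.change.after.acl in public_acls
	msg := sprintf("%s: public ACL %q is not allowed", [rc.address, rc.change.after.acl])
}
```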

Example of a post-plan policy check in Scalr

Open Policy Agent Impact Analysis

Scalr is the only product in the market that offers an impact analysis for OPA policies. An impact analysis is the OPA equivalent of running a Terraform plan. When a pull request is opened against an OPA policy that is currently active in Scalr, Scalr checks the OPA code in the pull request against all existing workspaces and returns the results to the OPA maintainer. This tells the maintainer what would happen if the new policy code were merged: whether a typo breaks the policy, whether some workspaces would start failing because of the change, or whether everything works correctly and there is nothing to worry about. This greatly helps operational excellence when working with OPA policies at scale.
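Impact analysis tests a policy change against real workspaces. A complementary local safeguard, separate from Scalr's feature, is OPA's own unit testing. The opa test command executes every rule whose name starts with test_, so a typo or logic regression fails before a pull request is opened. The sketch below uses the rego.v1 syntax and hand-written inputs. It tests a simplified version of this article's provider check that looks only at create actions.

```rego
package terraform

import input.tfplan as tfplan
import rego.v1

# Policy under test: a simplified provider denylist (create actions only)
not_allowed_providers := {"null"}

deny contains msg if {
	some rc in tfplan.resource_changes
	"create" in rc.change.actions

	# registry.terraform.io/hashicorp/null -> null
	parts := split(rc.provider_name, "/")
	provider := parts[count(parts) - 1]
	provider in not_allowed_providers

	msg := sprintf("%s: provider type %q is not allowed", [rc.address, provider])
}

# Unit tests: opa test runs every rule whose name starts with test_
test_null_provider_is_denied if {
	count(deny) == 1 with input as {"tfplan": {"resource_changes": [{
		"address": "null_resource.x",
		"provider_name": "registry.terraform.io/hashicorp/null",
		"change": {"actions": ["create"]}
	}]}}
}

test_aws_provider_is_allowed if {
	count(deny) == 0 with input as {"tfplan": {"resource_changes": [{
		"address": "aws_s3_bucket.b",
		"provider_name": "registry.terraform.io/hashicorp/aws",
		"change": {"actions": ["create"]}
	}]}}
}
```

Running opa test . in the policy directory reports whether each test_ rule passes or fails.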

Example policy impact analysis in Scalr

Enforcement Levels

In Scalr, the OPA maintainers have the option to set three different enforcement levels:

  • Advisory - If the policy is violated, users will get a warning message, but the Terraform run can continue.
  • Soft - If the policy is violated, the run will be stopped, and someone with the proper permissions can either approve or deny the run from continuing.
  • Hard - If the policy is violated, the run will be completely stopped, and the users will be notified of the reason.

The enforcement level is set outside of the Rego policy, in a file named scalr-policy.hcl. See Scalr's documentation for the format of that file.

OPA Policy Examples

Below are a few example OPA policies:

Pull Request Evaluation: Deny a run if the person who merged the pull request is also its author. This shows the value of having the tfrun section available to evaluate the source of the run. Only the tfrun section needs to be imported in this case:

Rego file:

package terraform

import input.tfrun as tfrun

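# Deny the run when a pull request was merged by its own author
deny["Merged by and PR author are the same person"] {
    # Only applies to VCS-driven runs that come from a pull request
    not is_null(tfrun.vcs)
    pr := tfrun.vcs.pull_request
    not is_null(pr)
    pr.merged_by == pr.author
}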

Limit Module Usage: If a specific resource type is being created, this policy requires that it be created through a specific Terraform module. This is helpful if a private module registry is being used and you want to ensure developers use modules from the registry. Only the tfplan section needs to be imported in this case:

Rego file:

package terraform

import input.tfplan as tfplan

# Map of resource types which must be created only using a module,
# with the corresponding module source
resource_modules = {
    "aws_db_instance": "terraform-aws-modules/rds/aws"
}

array_contains(arr, elem) {
  arr[_] = elem
}

# The resource is declared directly, outside of any module
deny[reason] {
    resource := tfplan.resource_changes[_]
    # Check every action, so replacements such as ["create", "delete"] are covered too
    action := resource.change.actions[_]
    array_contains(["create", "update"], action)
    module_source := resource_modules[resource.type]
    not resource.module_address
    reason := sprintf(
        "%s cannot be created directly. Module '%s' must be used instead",
        [resource.address, module_source]
    )
}

# The resource is declared in a module whose source is not the required one
deny[reason] {
    resource := tfplan.resource_changes[_]
    action := resource.change.actions[_]
    array_contains(["create", "update"], action)
    module_source := resource_modules[resource.type]
    # "module.db" or "module.db[0]" -> "db" (only the top-level module call is checked)
    parts := split(resource.module_address, ".")
    name_parts := split(parts[1], "[")
    module_name := name_parts[0]
    actual_source := tfplan.configuration.root_module.module_calls[module_name].source
    actual_source != module_source
    reason := sprintf(
        "%s must be created with '%s' module, but '%s' is used",
        [resource.address, module_source, actual_source]
    )
}

Limit Provider Usage: Your organization might ban certain providers, and you want to prevent developers from using them. This policy keeps a list of disallowed providers and checks every resource in the Terraform plan against it:

Rego file:

package terraform

import input.tfplan as tfplan

# Disallowed Terraform providers
not_allowed_provider = [
  "null"
]

array_contains(arr, elem) {
  arr[_] = elem
}

# "registry.terraform.io/hashicorp/aws" -> "aws"
get_basename(path) = basename {
    arr := split(path, "/")
    basename := arr[count(arr) - 1]
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    # Check creates and updates, including replacements; pure destroys are allowed
    action := resource.change.actions[_]
    array_contains(["create", "update"], action)

    # registry.terraform.io/hashicorp/aws -> aws
    provider_name := get_basename(resource.provider_name)
    array_contains(not_allowed_provider, provider_name)

    reason := sprintf(
        "%s: provider type %q is not allowed",
        [resource.address, provider_name]
    )
}

Limit Cost: Scalr integrates with Infracost to generate an estimated cost when deploying resources. The cost is calculated from information in the Terraform plan and then injected into the tfrun section of the policy input:

Rego file:

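package terraform

import input.tfrun as tfrun

# Maximum allowed proposed monthly cost, in dollars
max_monthly_cost := 5

deny[reason] {
    cost := tfrun.cost_estimate.proposed_monthly_cost
    cost > max_monthly_cost
    # %v formats whole-number and fractional costs alike
    reason := sprintf("Plan is too expensive: $%v, while up to $%v is allowed", [cost, max_monthly_cost])
}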

Limit Instance Type: This policy is a little more advanced as it shows how you can make decisions across multiple providers. In this case, OPA will check for instance types across AWS, Azure, and GCP:

Rego file:

package terraform

import input.tfplan as tfplan

# Allowed sizes by provider
allowed_types = {
    "aws": ["t2.nano", "t2.micro"],
    "azurerm": ["Standard_A0", "Standard_A1"],
    "google": ["n1-standard-1", "n1-standard-2"]
}

# Attribute name for instance type/size by provider
instance_type_key = {
    "aws": "instance_type",
    "azurerm": "vm_size",
    "google": "machine_type"
}

array_contains(arr, elem) {
  arr[_] = elem
}

get_basename(path) = basename {
    arr := split(path, "/")
    basename := arr[count(arr) - 1]
}

# Extracts the instance type/size
get_instance_type(resource) = instance_type {
    # registry.terraform.io/hashicorp/aws -> aws
    provider_name := get_basename(resource.provider_name)
    instance_type := resource.change.after[instance_type_key[provider_name]]
}

deny[reason] {
    resource := tfplan.resource_changes[_]
    instance_type := get_instance_type(resource)
    # registry.terraform.io/hashicorp/aws -> aws
    provider_name := get_basename(resource.provider_name)
    not array_contains(allowed_types[provider_name], instance_type)

    reason := sprintf(
        "%s: instance type %q is not allowed",
        [resource.address, instance_type]
    )
}

Scalr maintains a repository of example OPA policies that can be used by the OpenTofu and Terraform community.

Summary

The integration of Open Policy Agent with Terraform represents a powerful approach to implementing Policy as Code in infrastructure management. By leveraging OPA's flexible policy engine alongside Terraform, organizations can achieve a higher level of security, compliance, and governance when executing their Terraform configuration files.

By adopting OPA policy, teams can shift left on security and compliance, catching potential issues early in the development process. This not only reduces the risk of non-compliant infrastructure being deployed but also speeds up the development cycle by providing immediate feedback to developers to update their Terraform code.

Note: While this blog references Terraform, everything mentioned in here also applies to OpenTofu. New to OpenTofu? It is a fork of Terraform 1.5.7 as a result of the license change from MPL to BUSL by HashiCorp. OpenTofu is an open-source alternative to Terraform that is governed by the Linux Foundation. All features available in Terraform 1.5.7 or earlier are also available in OpenTofu. Find out the history of OpenTofu here.
