
Operating Terraform at scale is hard. As a platform team, you have a lot of moving parts to watch if you want developers to work on their own without slowing each other down. A few questions to ask yourself as your Terraform usage grows:
All of these questions revolve around reporting and visibility. If you can answer all of these in a matter of seconds, then you're likely going to have smooth operations.

I generally break this down into three areas. Operational dashboards show you the pipeline. Reporting covers current and historical information. Monitoring then catches the things you couldn't see in the dashboards or reports.
In terms of operations, you want to see how Terraform runs are progressing and, most importantly, to see them from a single place. Rather than wasting time jumping from workspace to workspace, you should be able to view the current runs in the context in which you are working.
If you're the owner of a specific application and operating within that environment, then you'll want to see only the runs for that environment.
If you're managing a platform that developers use for Terraform operations, then you'll want to see all of the runs across all environments and workspaces. That is where the Scalr run dashboards help.
The time this saves when searching for runs, assisting in an incident, or prioritizing runs is not trivial.
Now that you have a view into current operations, you also need to know how modules, providers, and Terraform versions are being used across the org. That picture helps you avoid technical debt, so you spend your energy building instead of maintaining.
Can you easily determine which Terraform versions, modules, or providers are used across your Terraform ecosystem? Do you know which sources developers are pulling modules or providers from? If the answer is no, then your organization is likely not operating at maximum efficiency.
Terraform Reports: Modules and Providers
With this information on hand, you can make sure your time goes to the areas that matter most to the business.
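To make the idea of a usage report concrete, here is a minimal Python sketch that counts Terraform versions and module sources across workspaces. The record shape (`name`, `terraform_version`, `modules`), the workspace names, and the module sources are all hypothetical and chosen for illustration; they are not the schema of Scalr or any other tool.

```python
from collections import Counter

# Hypothetical inventory records; the field names are assumptions for
# illustration, not the schema of any particular tool.
WORKSPACES = [
    {"name": "network-prod", "terraform_version": "1.5.7",
     "modules": ["registry.example.com/acme/vpc/aws"]},
    {"name": "app-prod", "terraform_version": "1.5.7",
     "modules": ["registry.example.com/acme/vpc/aws",
                 "git::https://example.com/acme/app.git"]},
    {"name": "app-dev", "terraform_version": "1.3.9",
     "modules": ["git::https://example.com/acme/app.git"]},
]


def usage_report(workspaces):
    """Count how many workspaces use each Terraform version and module source."""
    versions = Counter(ws["terraform_version"] for ws in workspaces)
    # set() ensures a module used twice in one workspace is counted once.
    modules = Counter(src for ws in workspaces for src in set(ws["modules"]))
    return versions, modules


versions, modules = usage_report(WORKSPACES)
for version, count in versions.most_common():
    print(f"Terraform {version}: {count} workspace(s)")
for source, count in modules.most_common():
    print(f"{source}: {count} workspace(s)")
```

Even a rough tally like this shows which versions are lagging and whether modules come from a registry or directly from Git.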
With operational dashboards in place and reports grading how well your Terraform platform is maintained, the next job is catching pipeline issues that aren't obvious. By streaming Terraform run events to a tool like Datadog, you will be able to quickly understand system-wide issues or build alerts to watch for run errors in some of the more critical workspaces.
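As a sketch of the alerting logic described here, the following Python snippet picks out errored runs in workspaces marked as critical. The event fields (`run_id`, `workspace`, `status`), the `errored` status value, and the workspace names are assumptions for illustration, not a documented event schema. In practice a rule like this would more likely live in a monitor in your monitoring tool than in a script.

```python
# Hypothetical set of workspaces considered critical.
CRITICAL_WORKSPACES = {"network-prod", "app-prod"}


def critical_failures(events, critical=CRITICAL_WORKSPACES):
    """Return the events for errored runs in workspaces marked critical."""
    return [
        e for e in events
        if e["status"] == "errored" and e["workspace"] in critical
    ]


# Hypothetical run events; field names and status values are assumed.
events = [
    {"run_id": "run-1", "workspace": "app-prod", "status": "applied"},
    {"run_id": "run-2", "workspace": "app-dev", "status": "errored"},
    {"run_id": "run-3", "workspace": "network-prod", "status": "errored"},
]

for event in critical_failures(events):
    print(f"ALERT: {event['run_id']} errored in {event['workspace']}")
```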
As you grow, you'll also want to understand how the pipeline is keeping up with demand. This can be seen in the run dashboards, but since you are not watching them all day, you will want to be alerted when a queue backs up due to a lack of resources. Using Scalr to stream events into Datadog allows you to alert not only on workspace-level issues, but also on the health of your overall pipeline.
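A queue-backlog check can be sketched along these lines. The `queued_at` field and the thresholds (more than 10 pending runs, or a run waiting longer than 15 minutes) are hypothetical values chosen for illustration; tune them to your own pipeline.

```python
from datetime import datetime, timedelta, timezone


def queue_backlog(pending_runs, now, max_depth=10, max_wait=timedelta(minutes=15)):
    """Summarize the pending queue and decide whether it warrants an alert.

    `pending_runs` is a list of dicts with a `queued_at` datetime
    (an assumed field name).
    """
    depth = len(pending_runs)
    # Longest time any pending run has been waiting; zero for an empty queue.
    oldest_wait = max(
        (now - r["queued_at"] for r in pending_runs), default=timedelta(0)
    )
    return {
        "depth": depth,
        "oldest_wait": oldest_wait,
        "alert": depth > max_depth or oldest_wait > max_wait,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pending = [
    {"run_id": "run-7", "queued_at": now - timedelta(minutes=20)},
    {"run_id": "run-8", "queued_at": now - timedelta(minutes=2)},
]
print(queue_backlog(pending, now))
```

Checking both queue depth and the oldest wait time catches two different failure modes: a burst of new runs, and a single run stuck behind exhausted capacity.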
New Feature: Datadog Integration
Adopting Terraform is not hard, but scaling it is. As you keep rolling it out, hold on to the three components that make scaling manageable: operational dashboards, reporting, and monitoring.
Set this up from the start and you'll catch issues or incidents that would have slipped by before, keeping your team running smoothly.
