Here at Ably, we’ve engineered a serverless WebSocket platform that makes it easy to reliably handle realtime data distribution to millions of web and mobile apps at the edge. Terraform is central to our infrastructure management. We use it to define and manage resources across a wide variety of providers including AWS, Snowflake, PagerDuty, and Grafana.
The ability to use the same workflow and tooling to control different providers is a major selling point for Terraform, and one of the reasons we switched from AWS CloudFormation.
We initially used HashiCorp’s Terraform Cloud (TFC) as our managed Terraform service. It gave us remote state management along with remote operations, so engineers were not deploying to production from their local machines.
As Ably grows, so does our infrastructure footprint, which leads to more and more Terraform workspaces and Terraform operations. We began noticing that runs were getting stuck in a queue. Digging into the details of our TFC plan, we found that it included only one runner. This was sufficient when we first started using Terraform, but it was time for an upgrade.
HashiCorp does not provide detailed pricing information on their website, so we set up a call with their sales team to find out more. What came back in the quotes led us to evaluate other providers.
When evaluating different Terraform service offerings, the key areas we looked at were:
Thankfully, Scalr also ticked all the other boxes, offering a 99.9% uptime SLA, 2-hour ticket response time, and a very easy-to-use calculator on their pricing page.
There were two stages to our switch from TFC to Scalr:
The goal throughout this process was to minimize any disruption to engineers and allow them to use TFC until the last second, a feat we accomplished with around 20 minutes of downtime at the end of it all.
First things first: access control to AWS. Scalr supports AWS IAM role delegation, which meant we could give Scalr access through temporary credentials and remove the AWS IAM user we had needed for TFC.
We used the Scalr Terraform Provider so that Scalr effectively manages itself. The only manual part was an initial bootstrap step, in which we created a workspace and a Scalr service account for the provider to use.
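Role delegation of this kind usually rests on an IAM trust policy that lets an external AWS account assume a role, guarded by an external ID. The Python sketch below builds such a policy document. The account ID and external ID are placeholder values, and `build_trust_policy` is a hypothetical helper, not the setup we actually used.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy that lets another AWS account assume a role.

    Both arguments are placeholders here; the real values would come from
    the service that is being granted access.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The external account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The external ID guards against the confused-deputy problem.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = build_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

Because the role is assumed through STS, the credentials handed out are short-lived, which is what makes a long-lived IAM user unnecessary.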
At the time of migration, Scalr only supported attaching one set of cloud credentials to an environment. As a result, we opted to have one environment per AWS account we deploy to. Scalr has since added provider configurations, which allow multiple sets of AWS credentials in the same environment. Had these been available, we would likely have grouped our workspaces by project rather than by AWS deployment account.
Now that all of the workspaces were available, it was time to move the state. Using both Scalr’s and TFC’s APIs, we created a bash script to migrate state across all workspaces in our repo:
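The core of a state migration like this is: download the current state file from the source workspace, then upload it to the target workspace as a new state version. The following is a minimal Python sketch of the upload payload, not the original bash script. It assumes the target API accepts a JSON:API body modeled on the Terraform Cloud state-versions format (`serial`, `lineage`, `md5`, and a base64-encoded `state`). The exact endpoint and field names should be checked against the Scalr API documentation, and `build_state_version_payload` and the workspace ID are hypothetical.

```python
import base64
import hashlib
import json


def build_state_version_payload(raw_state: bytes, workspace_id: str) -> dict:
    """Build a state-version upload body from a raw Terraform state file.

    The payload shape is an assumption modeled on the Terraform Cloud
    state-versions API; verify it against the target service's docs.
    """
    state = json.loads(raw_state)
    return {
        "data": {
            "type": "state-versions",
            "attributes": {
                # serial and lineage are copied from the state file itself,
                # so the target sees the same history as the source.
                "serial": state["serial"],
                "lineage": state["lineage"],
                # Checksum and encoding are computed over the exact bytes
                # that were downloaded, not over a re-serialized copy.
                "md5": hashlib.md5(raw_state).hexdigest(),
                "state": base64.b64encode(raw_state).decode("ascii"),
            },
            "relationships": {
                "workspace": {"data": {"type": "workspaces", "id": workspace_id}}
            },
        }
    }


if __name__ == "__main__":
    example = json.dumps(
        {"version": 4, "serial": 7, "lineage": "example-lineage", "resources": []}
    ).encode("utf-8")
    payload = build_state_version_payload(example, "ws-example")
    print(json.dumps(payload["data"]["attributes"], indent=2))
```

Keeping `serial` and `lineage` unchanged matters: Terraform uses them to decide whether a state file is a continuation of the one it already knows, so a migration that rewrote them could be rejected or could hide drift.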
The final preparation step was updating the backend configuration blocks of any workspace in our repo to use the new Scalr values. We did this in a separate branch to ensure a PR was ready to merge after migration.
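A bulk edit of this kind can be scripted. The Python sketch below rewrites the `hostname` and `organization` values of a `remote` backend block. It assumes that Scalr is addressed through the `remote` backend with the Scalr hostname and the environment ID in the `organization` field. The hostname, environment ID, and the `retarget_backend` helper are all illustrative placeholders.

```python
import re

EXAMPLE = '''terraform {
  backend "remote" {
    hostname     = "app.terraform.io"
    organization = "example-org"

    workspaces {
      name = "example-workspace"
    }
  }
}
'''


def retarget_backend(hcl: str, hostname: str, organization: str) -> str:
    """Replace the hostname and organization values in a backend block.

    This is a plain text substitution, not an HCL parser: it only changes
    attributes that are already present and keeps their alignment.
    """
    hcl = re.sub(
        r'(hostname\s*=\s*)"[^"]*"',
        lambda m: f'{m.group(1)}"{hostname}"',
        hcl,
    )
    hcl = re.sub(
        r'(organization\s*=\s*)"[^"]*"',
        lambda m: f'{m.group(1)}"{organization}"',
        hcl,
    )
    return hcl


if __name__ == "__main__":
    print(retarget_backend(EXAMPLE, "example.scalr.io", "env-example"))
```

Note that a backend block which omits `hostname` (relying on the default) would need the attribute added by hand, since the substitution only touches existing lines.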
On the go-live day, we locked all TFC workspaces, ran the script to migrate state once more, and executed plans on all Scalr workspaces one last time. After they had passed, we merged the PRs we had open and asked engineers to update any feature branches they were working on.
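Locking every workspace by hand does not scale, so this step is worth scripting too. As a hedged sketch, the Python below builds (but does not send) a lock request against the Terraform Cloud workspace lock endpoint. The workspace ID, token, and `build_lock_request` helper are placeholders for illustration.

```python
import json
import urllib.request

TFC_API = "https://app.terraform.io/api/v2"


def build_lock_request(
    workspace_id: str, token: str, reason: str
) -> urllib.request.Request:
    """Build a POST request that locks a single Terraform Cloud workspace."""
    body = json.dumps({"reason": reason}).encode("utf-8")
    return urllib.request.Request(
        url=f"{TFC_API}/workspaces/{workspace_id}/actions/lock",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/vnd.api+json",
        },
    )


if __name__ == "__main__":
    req = build_lock_request("ws-example", "example-token", "Migrating to Scalr")
    # Nothing is sent here; pass req to urllib.request.urlopen to execute it.
    print(req.get_method(), req.full_url)
```

Locking first ensures that no run can change state between the final state copy and the switch of backends, so the last migrated state is the true final one.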
It all went without a hitch!
We have been using Scalr for over three months now, and we’re very happy with the experience. As well as the product being reasonably priced, we have a great relationship with the Scalr team, who are always eager to hear product suggestions and help out with any questions.
Quite a few of the features we have asked for have now made their way into the product, such as git submodule support and pre-init custom hooks.
We look forward to all the features currently in development!
Ably gives you the capabilities to deliver the live experiences your customers demand without go-live delays, runaway costs, and unhappy users. Ably’s Serverless WebSocket platform reliably handles high-scale realtime data distribution to web and mobile apps at the edge, so engineering teams can focus on core product innovation without having to provision and maintain complex realtime infrastructure.
Developers at companies like HubSpot, Toyota, and Webflow use our APIs and global edge network to power things like business-critical live chat, food order delivery tracking, and document collaboration for more than 300 million people each month.
If this sounds like something you’d like to be part of, have a look at our open roles (all remote-first) and come join us.