Productboard: 'Why we decided to migrate to AWS' (Webinar Recap)
Serverless solutions are a growing trend. More companies are deciding to move their IT from previous solutions (whether an on-premises datacenter or platform-as-a-service type) to the cloud.
Why are they doing that? What are the benefits of running your infrastructure in the cloud? What does the migration process look like from a technical point of view? What should you look out for?
We discussed these topics with Tomáš Fedor and Tomáš Růžička from productboard - a Czech project management software developer. This fast-growing technology business recently decided to move their workloads from Heroku (Platform as a Service) to Amazon Web Services (AWS Cloud). They were kind enough to agree to share their migration story in a webinar.
This recap is for those of you who want to learn more from productboard’s first-hand experience, but prefer reading to listening.
productboard's infrastructure was a typical monolith that they ran on the Heroku platform (PaaS). This solution was good enough for the beginning but as the business grew, they needed to start thinking long-term about both organisational and infrastructure scaling. With the company's growth and the acquisition of corporate customers came much higher demands on the security of their product.
From the infrastructure point of view, it became clear that their monolithic approach was not sustainable anymore. Updating, scaling or developing new features independently from one another was a real challenge. All these combined noticeably restricted the efficiency of their development.
“When you have multiple people touching very close parts of the codebase, it restricts your development efficiency,” says Tomáš Růžička.
Giving developers more control
The company put great emphasis on improving the developer experience. One of the main aims of the new architecture was to give developers more control over, and more insight into, what is happening in each separate part of the system. They decided to follow the Infrastructure as Code principle, which makes it easy to share knowledge. They started with IAM roles and users, which gave them finer control over the level of access different people have to particular services.
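To illustrate how access levels can be captured as code, here is a minimal sketch that builds a read-only IAM policy document for a single S3 bucket. The helper function and the bucket name are hypothetical; the webinar did not show productboard's actual policies or the tooling they use to manage them.

```python
import json


def read_only_bucket_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting read-only access to one S3 bucket.

    The bucket name is supplied by the caller; nothing here reflects
    productboard's real resources.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # "example-reports-bucket" is a placeholder name.
    print(json.dumps(read_only_bucket_policy("example-reports-bucket"), indent=2))
```

Keeping a definition like this in version control means a change in access goes through the same review as any other code change.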
The technologies implemented
The first step before migrating to AWS was to find equivalents of technologies they were using that are available on the new platform.
Besides basic services such as EC2, S3, DynamoDB, and Route 53, they also adopted RDS, which lets them scale relational databases in the cloud with just a few clicks. They set up ACM (AWS Certificate Manager) to manage certificates, and ElastiCache, a fully managed in-memory data store that offers fast, secure performance.
First of all, they had to decide how to organise their infrastructure in AWS. They decided to split it into 3 main accounts: production, staging, and ops (operations).
“Production and staging are pretty simple, ops are used mostly for infrastructure purposes,” says Tomáš Fedor.
Next, they set up AWS Identity and Access Management (IAM), which enables them to manage access to AWS services and resources securely.
They run a Kubernetes cluster in each account to have better control over what is running where. Additionally, they run all their main infrastructure services such as Grafana, Prometheus, aws-iam-authenticator, and DataDog in those Kubernetes clusters. Orchestrating their containers in Kubernetes allows them to build, test and deploy new features much faster now.
Another big question was what kind of CI they were going to use. Heroku offers a very convenient solution for PRs (Pull Requests). productboard's Dev team is currently using CircleCI, which does not offer a comparable PR feature, so they need to find a different solution. They are considering GitLab, as it offers much more than CI and has better support for Kubernetes. Security is one of the key factors for productboard, and they are very impressed with the number of improvements GitLab has made in this field.
Running the proof of concept first for a zero downtime switch
On the application layer, they had to do quite a lot of steps.
Before they even launched Kubernetes, they had a proof-of-concept service in production. This allowed them to explore the optimal settings for running their infrastructure without disrupting the service. Once they decided on the final design, they started working on the monolith itself. The aim was to remove any Heroku dependencies. They went through the list of Heroku add-ons and found alternatives on AWS. They had a few Heroku-specific pieces of code, which they put behind feature flags so that the app could run simultaneously on AWS and Heroku during the transition.
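A feature flag of this kind can be sketched as follows. This is an illustration only: productboard's application is not written in Python, and the `FORCE_PLATFORM` override and `ELASTICACHE_URL` variable are hypothetical names. The sketch relies on two Heroku conventions: dynos have the `DYNO` environment variable set, and the Heroku Redis add-on exposes its connection string as `REDIS_URL`.

```python
import os


def running_on_heroku(env=None) -> bool:
    """Return True when the process appears to run on a Heroku dyno.

    FORCE_PLATFORM is a hypothetical override for this sketch; without it,
    the presence of Heroku's DYNO variable decides.
    """
    env = os.environ if env is None else env
    override = env.get("FORCE_PLATFORM")
    if override:
        return override.lower() == "heroku"
    return "DYNO" in env


def cache_url(env=None) -> str:
    """Pick the cache connection string for the current platform."""
    env = os.environ if env is None else env
    if running_on_heroku(env):
        # Heroku-specific branch: the add-on injects REDIS_URL.
        return env["REDIS_URL"]
    # AWS branch: ELASTICACHE_URL is an assumed name, not a standard variable.
    return env["ELASTICACHE_URL"]
```

With the platform check isolated in one function, the same build can be deployed to both environments, and the Heroku branch can be deleted once the migration is complete.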
Since they were not yet running any Docker containers in production, they had to dockerize the application. During this process, it was key to configure all the environment settings, database connections and so on. Moving closer to the infrastructure part of the migration, they decided to use Helm charts so they could easily configure the components based on the metrics they had seen in Heroku. This is where settings such as limits and thresholds were adjusted.
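As a rough sketch of turning observed metrics into limits, the function below derives a memory request and limit from a peak value measured on the old platform. The 25% headroom is an arbitrary illustrative default, not a figure from the webinar, and the output is simply shaped like the `resources` section of a Kubernetes container spec.

```python
import math


def memory_resources(peak_mib: float, headroom: float = 0.25) -> dict:
    """Derive Kubernetes memory requests and limits from an observed peak.

    peak_mib: peak memory seen on the old platform, in MiB.
    headroom: fraction added on top of the peak for the hard limit
              (0.25 is an assumed value for illustration).
    """
    if peak_mib <= 0:
        raise ValueError("peak_mib must be positive")
    request = math.ceil(peak_mib)
    limit = math.ceil(peak_mib * (1 + headroom))
    return {
        "resources": {
            "requests": {"memory": f"{request}Mi"},
            "limits": {"memory": f"{limit}Mi"},
        }
    }


if __name__ == "__main__":
    # 400 MiB is a made-up peak used only to show the output shape.
    print(memory_resources(400))
```

Values like these would normally live in a chart's values file, so they can be tuned per environment without touching the templates.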
Then came the deployment process itself. On Heroku, a lot of it comes built in. For example, the Heroku Ruby buildpack provided them with automatic migrations and asset precompilation. It was important to find a counterpart in the new setup. They decided to go with a combination of CI and an init container. Based on that, they worked out the optimal setup for the migration and how to upgrade all the running pods with zero downtime.
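One possible shape of such a setup is sketched below as a Kubernetes Deployment manifest built in Python. It assumes a Rails-style `rake db:migrate` task, assets precompiled during the CI image build, a hypothetical `/health` endpoint on port 3000, and the placeholder name `monolith`. None of these details were confirmed in the webinar.

```python
import json


def deployment_manifest(image: str, replicas: int = 3) -> dict:
    """Build a Deployment that migrates in an init container and rolls safely."""
    labels = {"app": "monolith"}  # placeholder name
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "monolith"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            # Never take an old pod down before a new one is ready.
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Runs to completion before the web container starts.
                    "initContainers": [
                        {
                            "name": "migrate",
                            "image": image,
                            "command": ["bundle", "exec", "rake", "db:migrate"],
                        }
                    ],
                    "containers": [
                        {
                            "name": "web",
                            "image": image,
                            # Traffic is routed only after this probe passes.
                            "readinessProbe": {
                                "httpGet": {"path": "/health", "port": 3000}
                            },
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment_manifest("registry.example.com/monolith:abc123"), indent=2))
```

Because every new pod runs the init container, migrations in a setup like this need to be safe to run more than once and compatible with the previous release, which keeps serving traffic during the rollout.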
Before the final switch, they ran an image on the cluster on AWS and monitored what was happening, looking out for any issues that may have arisen. Once they were confident that their design was performing well on staging, they ran stress-tests and set performance benchmarks to see what the thresholds were, and how it all compared to the Heroku version.
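A comparison of this kind can be as simple as checking a latency percentile from the new environment against the same percentile from the old one. The sketch below uses the nearest-rank method. The 10% tolerance and the function names are assumptions for illustration, not productboard's actual benchmark.

```python
def percentile(samples, pct: int) -> float:
    """Nearest-rank percentile of a non-empty list of samples (pct in 1..100)."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct/100 * n avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def within_budget(baseline, candidate, pct: int = 95, tolerance: float = 0.10) -> bool:
    """True if the candidate's percentile is at most `tolerance` worse than baseline."""
    return percentile(candidate, pct) <= percentile(baseline, pct) * (1 + tolerance)
```

Here `baseline` would hold response times measured on the Heroku version and `candidate` those from the AWS staging cluster under the same load.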
As for the front-end part of their application, it was nothing more than a simple switch of a CDN.
Now that they have all this figured out, they're thinking about the different ways to split the monolith. This is where the fun part starts. The challenge ahead of them is figuring out how to tackle this. What is the right bounded context for each of the different features?
At the moment, the whole project is in staging and they're planning to go to production at the end of July.
Was there anything during the process that surprised them?
“Yes. Since nobody on board has ever migrated a large-scale application running on production, we were not aware of all the little details that you might run into” - says Tomáš Růžička.
He further explains that Heroku fights for its customers and that it is not easy to leave when so many of their systems were completely dependent on it.
“In our case, Postgres, for example, couldn't be easily replicated with zero downtime. We actually needed to publicly expose our Redis instance inside AWS temporarily so that we could use our current application staging in Heroku but have the data already migrated to AWS. Exposed databases aren’t something you want to have in production.”
“For us, one of the most important benefits is that we have a scalable, secure solution that will help us deliver features to production fast.” - says Tomáš Fedor.
He also says that the AWS-based solution comes out considerably more cost-efficient than Heroku in terms of total cost.
“With the company expanding and our infrastructure growing, it will be cheaper for us to run it in AWS. We will also be able to meet the demands of our corporate customers such as private clouds, private databases, security certifications and so on” - he adds.