How to save on Kubernetes costs


Have you ever been surprised by the amount on the monthly bill for your Kubernetes services? You are not alone. It is time to do something about it.

This article gives tips on where to look for savings in Kubernetes (k8s) in general, and in the offerings of specific cloud providers: mainly AWS, and GCP where relevant.

Where to look?

A Kubernetes cluster consists of a Control plane and Worker nodes.

Rightsizing the Control plane could be a chapter on its own. Regarding the number of control plane nodes, the k8s docs say:

A 5-member etcd cluster can tolerate two-member failures, which is enough in most cases. Although larger clusters provide better fault tolerance, the write performance suffers because data must be replicated across more machines.

With a managed offering from a cloud provider, you don’t have to care about this much.
AWS EKS (like GKE on GCP) manages the master nodes for you: it creates them, installs the container runtime and the Kubernetes master processes, scales them automatically, and takes backups when needed. That makes AWS EKS a good fit for small teams that want to focus on deploying applications instead of taking care of the Control plane in detail.

When it comes to instance size, m3.medium master nodes on AWS and n1-standard-1 master nodes on GCP should be enough.
It still depends, however, on the type of workloads the cluster has to handle.

Now let’s look at the Worker nodes. To find out what drives our expenses up, we need to look at the cluster from several points of view.

Use of labels or tags

I personally like to start the opposite way from the approaches you may find on the web: with monitoring.

Whether you have one huge cluster or several smaller ones, it is difficult to choose where to start. Monitoring shows whether the higher costs are specific to a customer or an environment, or whether there is some other common pattern. In k8s we can use Labels for this. On AWS you can use Cost allocation tags.

These can be assigned automatically to AWS resources in order to track AWS costs in detail. Thus, tags can help you manage, identify, organize, search for, and filter the resources by owner, Stack (environment), Cost center, Application, etc.

However, AWS-generated cost allocation tags are not available for all AWS services. For the other services, you can apply user-defined cost allocation tags with the AWS Tag Editor.

As of August 2022 all EC2 instances which join an EKS cluster are automatically tagged with an AWS-generated cost allocation tag.

In GCP, the equivalent is Labels, which must not be confused with GCP Tags. Information about Labels is forwarded to the billing system.
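To illustrate why consistent labels matter, here is a minimal sketch of grouping costs by a label. The resource names, labels, and costs are made up for illustration; a real billing export will have its own format.

```python
from collections import defaultdict


def cost_by_label(resources, label_key):
    """Sum monthly cost per value of one label.

    Resources without the label end up in a separate "<unlabeled>" bucket,
    which is usually the first place to look for forgotten spend.
    """
    totals = defaultdict(float)
    for resource in resources:
        value = resource.get("labels", {}).get(label_key, "<unlabeled>")
        totals[value] += resource["monthly_cost"]
    return dict(totals)


# Hypothetical billing export (all values are illustrative assumptions).
resources = [
    {"name": "node-a", "labels": {"environment": "prod", "team": "payments"}, "monthly_cost": 310.0},
    {"name": "node-b", "labels": {"environment": "dev", "team": "payments"}, "monthly_cost": 120.0},
    {"name": "node-c", "labels": {"environment": "dev"}, "monthly_cost": 95.5},
    {"name": "volume-x", "labels": {}, "monthly_cost": 14.5},
]

per_environment = cost_by_label(resources, "environment")
per_team = cost_by_label(resources, "team")
```

The same data gives a different picture per label key, which is why it pays off to agree on a small, fixed set of keys (owner, environment, cost center, application) up front.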

Terminate unused resources

Downscaling development environments while they are not in use can be the first choice here. It may sound as if there is not much to save, so let’s look at an example:

Assume, in theory, a team that works 8 hours a day, 5 days a week. Out of the 168 hours in a week, the environment is used for 40 hours and sits idle for 128. That is quite a lot, so downscaling can help considerably.
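The arithmetic behind this example can be checked with a few lines of Python. The function name is only illustrative.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_usage(hours_per_day, days_per_week):
    """Return (used hours, idle hours, idle share) for one week."""
    used = hours_per_day * days_per_week
    idle = HOURS_PER_WEEK - used
    return used, idle, idle / HOURS_PER_WEEK


used, idle, idle_share = weekly_usage(hours_per_day=8, days_per_week=5)
print(f"used {used} h, idle {idle} h, idle share {idle_share:.0%}")
# prints: used 40 h, idle 128 h, idle share 76%
```

In other words, an environment that runs around the clock but is only used during working hours is idle for roughly three quarters of the time.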

Next, we should look for temporary resources that were created for testing. Some of them may have been forgotten, and without proper tagging they are easily overlooked. But who likes housekeeping? Annoying…

Kubernetes Janitor is the tool of choice for cleaning up clusters automatically. It can delete temporary deployments or only specified resources, and for each of them we can set the time frame after which it is deleted. On AWS, you can even remove EBS volumes, which are easily overlooked once detached. In addition, certain resources are only needed during business hours, so outside of them we can reduce the number of pods used by applications.

Kubernetes Downscaler lets us schedule systems to scale in or out at defined times, for example after business hours or on weekends. It also offers additional options, such as forcing uptime in exceptional situations.

Use auto-scaling

Kubernetes supports three types of autoscaling:

  • Horizontal Pod Autoscaler (HPA)
  • Vertical Pod Autoscaler (VPA)
  • Cluster Autoscaler

HPA watches the resource usage of pods and automatically changes the number of pod replicas to maintain a target level of utilization. To do that, it needs a source of metrics. When scaling is based on CPU usage, we can rely on the standard metrics-server.
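A simplified sketch of the replica calculation described in the Kubernetes HPA documentation (desired = ceil(current replicas × current metric / target metric)) could look like this. The function name, the replica bounds, and the example numbers are assumptions for illustration; the real controller also handles missing metrics, pod readiness, and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified HPA calculation: scale replicas by the metric ratio.

    If the ratio is within the tolerance of 1.0, no scaling happens.
    The result is clamped to the configured replica bounds.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Four replicas at 90% average CPU with a 60% target: scale out.
scale_out = desired_replicas(4, current_metric=90, target_metric=60)
# Four replicas at 30% average CPU with a 60% target: scale in.
scale_in = desired_replicas(4, current_metric=30, target_metric=60)
```

From a cost perspective the scale-in case is the interesting one: replicas that are no longer needed free up capacity that the node autoscaler can then remove.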

VPA adjusts the resource requests and limits of containers in the cluster. Keep in mind that VPA evicts (terminates) pods that need the new resource limits.

Among open-source tools, Karpenter can be used for node autoscaling.

On AWS, you need to run the Cluster Autoscaler, which has two main functions:

  • It looks for pods that cannot be scheduled because of insufficient resources and provisions additional nodes
  • It detects underutilized nodes and reschedules their pods onto other nodes
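The two functions above can be sketched in a very simplified form. This is not the Cluster Autoscaler's actual algorithm; the data structures are hypothetical, and the 0.5 threshold is an assumption modelled on the autoscaler's scale-down utilization threshold setting.

```python
def needs_scale_up(pods):
    """True if at least one pending pod could not be scheduled."""
    return any(
        pod["phase"] == "Pending" and pod.get("reason") == "Unschedulable"
        for pod in pods
    )


def underutilized_nodes(nodes, threshold=0.5):
    """Names of nodes whose requested CPU is below the given share of allocatable CPU."""
    return [
        node["name"]
        for node in nodes
        if node["requested_cpu_m"] / node["allocatable_cpu_m"] < threshold
    ]


# Hypothetical cluster snapshot (CPU in millicores).
pods = [
    {"name": "web-1", "phase": "Running"},
    {"name": "batch-7", "phase": "Pending", "reason": "Unschedulable"},
]
nodes = [
    {"name": "node-a", "requested_cpu_m": 3200, "allocatable_cpu_m": 4000},
    {"name": "node-b", "requested_cpu_m": 600, "allocatable_cpu_m": 4000},
]

scale_up = needs_scale_up(pods)
candidates = underutilized_nodes(nodes)
```

Note that the decision is based on resource requests, not on actual usage, which is one more reason to keep requests realistic.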

Resource requests optimization

Resource requests tell Kubernetes how much vCPU and memory a container needs, and that amount is then reserved on the worker nodes. Very often there is a difference between the requested resources and those actually used. This difference, or excess, is sometimes called “Slack”.
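As a minimal sketch, slack is simply requested minus used, summed over the pods. The pod names and numbers below are invented for illustration; in practice the usage figures would come from your metrics source.

```python
def pod_slack(pod):
    """Requested minus used, per resource, for a single pod."""
    return {
        resource: pod["requested"][resource] - pod["used"].get(resource, 0)
        for resource in pod["requested"]
    }


def total_slack(pods):
    """Sum the slack of all pods, per resource."""
    totals = {}
    for pod in pods:
        for resource, amount in pod_slack(pod).items():
            totals[resource] = totals.get(resource, 0) + amount
    return totals


# Hypothetical pods: CPU in millicores, memory in MiB.
pods = [
    {"name": "api", "requested": {"cpu_m": 500, "memory_mib": 1024},
     "used": {"cpu_m": 120, "memory_mib": 400}},
    {"name": "worker", "requested": {"cpu_m": 1000, "memory_mib": 2048},
     "used": {"cpu_m": 850, "memory_mib": 1900}},
]

slack = total_slack(pods)
```

Here the "api" pod accounts for most of the slack, so it would be the first candidate for lower requests.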

To find these excesses we can use the Kubernetes Resource Report tool.

The current version of the tool supports node costs only for AWS EC2 (all regions, On Demand, Linux) and GKE/GCP machine types (all regions, On Demand, without sustained discount).

Use different purchase options for Kubernetes workloads

On both AWS and GCP, on-demand instances are the most expensive option, so it is often better to use reserved instances or even Spot instances. Spot instances are available at a discount of up to 90% compared to On-Demand prices. They are the best choice for short jobs or stateless services that can be rescheduled quickly, without data loss.

To limit interruptions, we can reserve Spot instances for a fixed period of time and use the right workload management tools (such as Ocean from Spot by NetApp). Without these measures, running interruption-sensitive workloads on Spot instances is not recommended.

AWS provides an even more exciting way to run containers, called Fargate. If you are familiar with AWS ECS (Elastic Container Service), you know what I mean. Simply put, with Fargate you deploy containers without having to manage the infrastructure that hosts them. You pay only for the resources consumed from the moment the container image starts downloading until the pod is deleted, with a one-minute minimum charge.
First you must define a Fargate profile that specifies which pods run on Fargate when they are launched. In the pod specification you then just define the container image and how much vCPU and memory to provision for the pod.
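The effect of the one-minute minimum can be sketched as follows. The hourly rates are placeholders, not real AWS prices, and the function name is only illustrative; check the current Fargate pricing for your region before relying on any numbers.

```python
def fargate_cost(vcpu, memory_gb, duration_seconds,
                 vcpu_hour_rate, gb_hour_rate, minimum_seconds=60):
    """Estimate the cost of one pod, applying the one-minute minimum charge."""
    billed_seconds = max(duration_seconds, minimum_seconds)
    billed_hours = billed_seconds / 3600
    return billed_hours * (vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate)


# Placeholder rates (assumptions for illustration only).
VCPU_HOUR_RATE = 0.04
GB_HOUR_RATE = 0.004

# A 20-second job is billed like a 60-second one.
short_job = fargate_cost(1, 2, 20, VCPU_HOUR_RATE, GB_HOUR_RATE)
one_minute = fargate_cost(1, 2, 60, VCPU_HOUR_RATE, GB_HOUR_RATE)
one_hour = fargate_cost(1, 2, 3600, VCPU_HOUR_RATE, GB_HOUR_RATE)
```

For many very short-lived pods the minimum charge can add up, so batching tiny jobs into fewer, longer-running pods may be cheaper.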

On GCP, Cloud Run on a GKE cluster is the closest equivalent to AWS Fargate.

Cost Optimization Tools

Besides the methods mentioned above, we can use cloud cost optimization tools and platforms. To briefly mention some of them:

– allocates cloud costs at the level of namespace, label, and workload. It integrates cluster metrics with Grafana to create dashboards that show the cost per workload.

gMaestro – cloud agnostic; runs on GCP, Azure, AWS, as well as OpenShift and K8s. Installation is very easy: a single command creates a single Pod on your cluster. You can quickly apply its recommendations by downloading the provided YAML files, by patching Deployments, or through the UI.

– savings are achieved by automating Spot instances and reservations

Harness – a CI/CD solution that includes cloud management and BI tools for cost transparency and governance with limited automation

Apptio Cloudability – financial management solution for monitoring, allocating, and analyzing cloud costs

Cloudcheckr – tool that delivers cloud cost reporting, allocation, and optimization recommendations for manual implementation

Kubecost and Project Borealis

Kubecost allocates the total costs of a cluster across its applications, so application owners get an overview of their portion of the shared Kubernetes costs. It also provides rightsizing recommendations, both for the pods within the cluster and for the cluster itself.

Project Borealis is now called Armory - Continuous Deployments as a Service. Besides offering continuous deployment, it can also simplify cost allocation within Kubecost. Implementing this functionality requires you to apply labels or annotations on your application.

To integrate Kubecost and Armory Borealis, you can leverage a GitHub Actions configuration that will:

  • call the Kubecost API
  • leverage jq to extract the sizing recommendations from the response
  • patch recommendations into the application’s Kubernetes manifest
  • deploy the updated manifest using Armory Project Borealis
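The middle two steps, extracting the recommendations and patching them into the manifest, can be sketched in Python instead of jq. The shape of the response below is a hypothetical stand-in, not the real Kubecost API format, and the helper names are made up; only the Deployment path spec.template.spec.containers[].resources.requests follows the standard Kubernetes manifest layout.

```python
import copy


def extract_recommendations(response):
    """Map container name -> recommended requests (hypothetical response shape)."""
    return {
        item["containerName"]: item["recommendedRequest"]
        for item in response["recommendations"]
    }


def patch_manifest(manifest, recommendations):
    """Return a copy of a Deployment manifest with updated resource requests."""
    patched = copy.deepcopy(manifest)
    for container in patched["spec"]["template"]["spec"]["containers"]:
        recommended = recommendations.get(container["name"])
        if recommended:
            requests = container.setdefault("resources", {}).setdefault("requests", {})
            requests.update(recommended)
    return patched


# Hypothetical API response and a minimal Deployment manifest.
response = {"recommendations": [
    {"containerName": "api", "recommendedRequest": {"cpu": "150m", "memory": "256Mi"}},
]}
manifest = {"kind": "Deployment", "spec": {"template": {"spec": {"containers": [
    {"name": "api", "image": "example/api:1.0",
     "resources": {"requests": {"cpu": "500m", "memory": "1Gi"}}},
    {"name": "sidecar", "image": "example/sidecar:1.0"},
]}}}}

patched = patch_manifest(manifest, extract_recommendations(response))
```

Containers without a recommendation are left untouched, and the original manifest is not modified, so the change can be reviewed as a diff before it is deployed.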

Once this is set up, you have a GitHub Actions workflow that reads your sizing recommendations from Kubecost and deploys them using a Project Borealis pipeline, which ensures that your application remains healthy during the resize. GitHub Actions lets you choose whether this workflow runs after every commit, on a schedule, or only when triggered manually.


All in all, if you divide the whole thing into smaller steps, it’s not so difficult to achieve savings.

Of course, before applying any changes, even those suggested by the tools mentioned above, we need to understand what we are doing. But that’s obvious.

Besides checking from a cost point of view whether the applied changes meet expectations, it is good practice to observe the performance of the cluster to avoid unwanted underprovisioning.

🙏 I’d like to thank my friend Anton Vorobiev for his valuable suggestions and comments.
