by Marek Bartík

There are certain use cases where scaling horizontally based on CPU usage doesn’t work well. Let’s say you have a pool of consumer workers running in Kubernetes, pulling messages from a Pub/Sub subscription. When the queue is filling up, we want more workers so the messages get processed quickly. On the other hand, when the queue is empty, we don’t want to pay for a big worker pool that sits idle. With the Stackdriver metrics adapter running on GKE, we can easily autoscale the worker pool on the Pub/Sub backlog for minimum latency and maximum cost-effectiveness.

Autoscaling Deployments with External Metrics

This tutorial demonstrates how to automatically scale your GKE workloads based on metrics available in Stackdriver.

If you want to autoscale based on a metric exported by your Kubernetes workload, or on a metric attached to a Kubernetes object such as a Pod or Node, visit Autoscaling Deployments with Custom Metrics instead.

This example shows autoscaling based on the number of undelivered messages in a Cloud Pub/Sub subscription, but the instructions can be applied to any metric available in Stackdriver.

(Screenshot: Stackdriver Cloud Pub/Sub monitoring)


Provision GCP resources

We’ll use Terraform to provision all the necessary GCP resources; the cluster and node pool are defined in a separate Terraform file. Make sure to follow all the steps in the README to create a service account for Terraform with the permissions it needs to create these resources.

Then run:

terraform init
terraform plan -out planfile
terraform apply planfile

The Pub/Sub topic is named “echo” and its subscription “echo-read”. If terraform apply ran successfully, both are already provisioned:

resource "google_pubsub_topic" "echo" {
  name = "echo"
}

resource "google_pubsub_subscription" "echo" {
  name  = "echo-read"
  # Attach the subscription to the "echo" topic defined above
  topic = "${google_pubsub_topic.echo.name}"

  ack_deadline_seconds = 20
}
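Before moving on, it can be worth confirming that Terraform really created both resources. Here is a minimal sketch, assuming gcloud is authenticated against the same project Terraform used; the function name verify_pubsub is hypothetical, and the defaults are simply the names from this post.

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the topic and subscription exist.
# Assumes gcloud is authenticated against the project Terraform used.
verify_pubsub() {
  local topic="${1:-echo}" subscription="${2:-echo-read}"
  # Exits non-zero (and prints an error) if the topic does not exist
  gcloud pubsub topics describe "$topic" >/dev/null || return 1
  # Prints the full topic path and the ack deadline (20 with the config above)
  gcloud pubsub subscriptions describe "$subscription" \
    --format='value(topic,ackDeadlineSeconds)'
}
```

Run it as verify_pubsub, or pass other names as verify_pubsub my-topic my-sub.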


Deploy Stackdriver metrics adapter

Make sure you have kubectl installed and you can access the cluster.

Deploy the Stackdriver adapter:

kubectl create -f
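Once the adapter is running, a quick sanity check is to see whether it registered the external metrics API and can answer a query for our subscription. This is a sketch under assumptions: the HPA lives in the "default" namespace, the adapter exposes Stackdriver metric names with "/" replaced by "|" (my understanding of how the Stackdriver adapter names external metrics), and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the Stackdriver adapter after deployment.

# The adapter serves external metrics through this aggregated APIService;
# its AVAILABLE column should read True.
check_external_metrics_api() {
  kubectl get apiservice v1beta1.external.metrics.k8s.io
}

# Query the backlog of one subscription through the external metrics API.
# Assumption: metric name uses "|" instead of "/", and "=" in the label
# selector is URL-encoded as %3D.
query_backlog() {
  local namespace="${1:-default}" subscription="${2:-echo-read}"
  kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/${namespace}/pubsub.googleapis.com|subscription|num_undelivered_messages?labelSelector=resource.labels.subscription_id%3D${subscription}"
}
```

If query_backlog returns a JSON list with a value for echo-read, the HPA should be able to read the same metric.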


Deploy the HPA and deployment

Here’s what the HPA definition looks like:

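apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - external:
      # Stackdriver metric, with "/" replaced by "|"
      metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
      metricSelector:
        matchLabels:
          resource.labels.subscription_id: echo-read
      targetAverageValue: "2"  # desired undelivered messages per replica
    type: External
  scaleTargetRef:  # the Deployment defined in pubsub-deployment.yaml
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub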

We’ll autoscale between 1 and 5 replicas based on the external metric pubsub.googleapis.com|subscription|num_undelivered_messages of our echo-read subscription.

The target value is 2 undelivered messages per replica. What does that actually mean, though?

Example: let’s say my deployment is currently running 3 replicas and my queue grows from 6 to 8 undelivered messages.
That’s 8/3 ≈ 2.67 undelivered messages per replica.
That exceeds the target, so the HPA scales out to ceil(8/2) = 4 replicas, which gives 8/4 = 2 undelivered messages per replica and matches the desired targetAverageValue.
If I had 50 undelivered messages, the HPA would ask for ceil(50/2) = 25 replicas, but I’d get 5, as that’s my maximum.
If I had 0 undelivered messages, I’d have 1 replica, as that’s my minimum.
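To make that arithmetic concrete, here is a small Bash sketch of the replica calculation for an External metric with targetAverageValue, as I understand it from the Kubernetes HPA documentation: divide the total metric value by the target, round up, and clamp to minReplicas/maxReplicas. It also skips rescaling when the per-replica value is within 10% of the target, which I believe is the default HPA tolerance. The function name desired_replicas is hypothetical, it uses integer inputs only, and it ignores the extra delay the real controller applies before scaling in.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the HPA calculation for an External metric with
# targetAverageValue. Integer inputs only; assumes a 10% tolerance.
# Usage: desired_replicas CURRENT TOTAL TARGET MIN MAX
desired_replicas() {
  local current=$1 total=$2 target=$3 min=$4 max=$5
  local desired=$current
  # Ratio = total / (target * current). Rescale only if it falls outside
  # [0.9, 1.1], i.e. |10*total - 10*target*current| > target*current.
  local expected=$(( target * current ))
  local diff=$(( 10 * total - 10 * expected ))
  (( diff < 0 )) && diff=$(( -diff ))
  if (( diff > expected )); then
    # ceil(total / target) with integer arithmetic
    desired=$(( (total + target - 1) / target ))
  fi
  # Clamp to [minReplicas, maxReplicas]
  (( desired < min )) && desired=$min
  (( desired > max )) && desired=$max
  echo "$desired"
}
```

For example, desired_replicas 3 8 2 1 5 prints 4, desired_replicas 3 50 2 1 5 prints 5 and desired_replicas 3 0 2 1 5 prints 1, matching the walk-through above.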

The scaleTargetRef is a reference to the resource I’m autoscaling: the pubsub Deployment defined in pubsub-deployment.yaml.

Deploy the HPA together with the Deployment it is going to autoscale:

kubectl apply -f pubsub-hpa.yaml
kubectl apply -f pubsub-deployment.yaml
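Before generating load, I like to check that the Deployment is up and the HPA can actually read the metric. A minimal sketch, where the helper name check_hpa is hypothetical and the defaults assume both objects are called pubsub, as in the manifests above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: confirm the rollout and inspect the HPA.
check_hpa() {
  local hpa="${1:-pubsub}" deployment="${2:-pubsub}"
  # Blocks until the Deployment's pods are available, or fails after 120s
  kubectl rollout status "deployment/${deployment}" --timeout=120s || return 1
  # Shows the current vs. target metric value, conditions and recent events
  kubectl describe hpa "$hpa"
}
```

If the adapter isn’t serving the metric, the conditions and events in the describe output are usually where it shows up.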


Test it!

Publish some messages to the topic:

for i in {1..200}; do
  gcloud pubsub topics publish echo --message="Autoscaling #${i}"
done
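Publishing 200 messages with one sequential gcloud call each takes a while. If you want the backlog to build up faster, one option is to fan the calls out with xargs -P. This is a sketch: publish_messages is a hypothetical helper, and the parallelism of 8 is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: publish COUNT numbered messages to TOPIC,
# running up to PARALLEL gcloud calls at the same time.
publish_messages() {
  local topic="${1:-echo}" count="${2:-200}" parallel="${3:-8}"
  # seq emits 1..count; xargs -I{} substitutes each number into the message
  seq 1 "$count" | xargs -P "$parallel" -I{} \
    gcloud pubsub topics publish "$topic" --message="Autoscaling #{}"
}
```

publish_messages echo 200 8 sends the same 200 messages as the loop, just not in order.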

Then watch the cluster’s resources do their magic:

watch 'kubectl get pods; echo ; kubectl get hpa'
(Screenshots: scale-out on a saturated queue; scale-in on an empty queue)


This simple setup gives you pretty decent horizontal autoscaling. The ugly part is that you have to run the Stackdriver adapter yourself; at least the HPA controller is part of GKE and fully managed for you.
The other cool thing about the HPA is that you can use multiple metrics (even a mix of custom, external and CPU metrics) in the same HPA resource. The HPA computes a desired replica count for each metric and uses the highest one, so your deployment scales out as soon as any of them crosses its target.
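As an illustration, here is a sketch of the same HPA with a CPU metric added next to the Pub/Sub backlog, wrapped in a Bash function so it can be piped straight into kubectl. The helper name and the 60% CPU target are assumptions for illustration, not values from this post; CPU utilization is measured against the containers’ CPU requests, so the Deployment needs those set.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an autoscaling/v2beta1 HPA that scales on
# both the Pub/Sub backlog and CPU utilization (60% is an assumption).
hpa_manifest_with_cpu() {
  cat <<'EOF'
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: pubsub
spec:
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
      metricSelector:
        matchLabels:
          resource.labels.subscription_id: echo-read
      targetAverageValue: "2"
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 60
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsub
EOF
}
```

Apply it with hpa_manifest_with_cpu | kubectl apply -f - to replace the backlog-only HPA.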


Marek Bartík

Marek is a NoOps/NoCode enthusiast. He started as a C++ programmer while doing a master’s in Computer Systems and Networks, grew up in the SysAdmin era, and quickly realized that communication and collaboration are key. Nowadays he focuses on cloud architecture, microservices and Continuous Everything to solve business problems, not technical ones. Marek is passionate about DevOps and Cloud Native.
