
Istio: Multi-Cluster Federation and Hybrid Cloud

August 23, 2018

Your business is successful and you need to go global. How do you scale your app across multiple regions? How do you handle deployments to multiple clusters? How do you provide a great user experience with low latency and resilience while keeping costs low? One way is to use Kubernetes Federation: you deploy multiple clusters, join them in a federation and sync the API resources. Federation, however, is still immature and is not recommended for production use. If you go up another layer, you can manage the multi-cluster control plane with a service mesh like Istio.

What is a Kubernetes Federation?

Kubernetes Federation is an open source project that focuses on making it easy to manage multiple clusters. It does so by providing 2 major building blocks:

Sync resources across clusters: Federation provides the ability to keep resources in multiple clusters in sync. For example, you can ensure that the same app deployment exists in multiple clusters.
Cross-cluster discovery: Federation provides the ability to auto-configure DNS servers and load balancers with backends from all clusters.
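To make the two building blocks more concrete, here is a minimal sketch in Python. It is not the Federation API: the `Cluster` class, `sync_resources` and `discover_backends` are hypothetical names, and the clusters are plain in-memory objects. It only illustrates the idea of pushing one desired state to every cluster and collecting backends from all of them.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for one member cluster."""
    name: str
    # resource name -> spec, e.g. {"web": {"replicas": 3}}
    resources: dict = field(default_factory=dict)
    # service name -> list of backend addresses exposed by this cluster
    endpoints: dict = field(default_factory=dict)


def sync_resources(desired, clusters):
    """Create or update the desired resources in every cluster.

    Returns (cluster name, action, resource name) tuples for each change.
    """
    changes = []
    for cluster in clusters:
        for name, spec in desired.items():
            if name not in cluster.resources:
                cluster.resources[name] = dict(spec)
                changes.append((cluster.name, "create", name))
            elif cluster.resources[name] != spec:
                cluster.resources[name] = dict(spec)
                changes.append((cluster.name, "update", name))
    return changes


def discover_backends(service, clusters):
    """Collect the backends of one service from all clusters."""
    backends = []
    for cluster in clusters:
        backends.extend(cluster.endpoints.get(service, []))
    return backends


if __name__ == "__main__":
    sf = Cluster("sf", endpoints={"web": ["10.0.0.1"]})
    ny = Cluster("ny", resources={"web": {"replicas": 1}},
                 endpoints={"web": ["10.1.0.1"]})
    print(sync_resources({"web": {"replicas": 3}}, [sf, ny]))
    print(discover_backends("web", [sf, ny]))
```

Running the sync a second time with the same desired state returns no changes, which is the property you want from a control loop like this. A real control plane would also have to watch the clusters continuously, which is where the network cost discussed under the caveats comes from.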

Some other use cases that federation enables are:

High availability: By spreading the load across clusters and auto configuring DNS servers and load balancers, federation minimises the impact of cluster failure.
Avoiding provider lock-in: By making it easier to migrate applications across clusters, federation prevents cluster provider lock-in.

Figure: Kubernetes Federation with clusters in SF, NY and Berlin, showing the architecture of the system. Image from CoreOS: https://coreos.com/blog/kubernetes-cluster-federation.html

Federation is not helpful unless you have multiple clusters. Some of the reasons why you might want multiple clusters are:

Low latency: Having clusters in multiple regions minimises latency by serving users from the cluster that is closest to them.
Fault isolation: It might be better to have multiple small clusters rather than a single large cluster for fault isolation (for example: multiple clusters in different availability zones of a cloud provider).
Scalability: There are scalability limits to a single Kubernetes cluster (about 5000 nodes per cluster the last time I checked).
Hybrid cloud: You can have multiple clusters on different cloud providers or on-premises data centers.

Caveats

While there are a lot of attractive use cases for federation, there are also some caveats:

Increased network bandwidth and cost: The federation control plane watches all clusters to ensure that the current state is as expected. This can lead to significant network cost if the clusters are running in different regions on a cloud provider or on different cloud providers.
Reduced cross-cluster isolation: A bug in the Federation control plane can impact all clusters. This is mitigated by keeping the logic in the Federation control plane to a minimum: it delegates to the control plane of the individual Kubernetes clusters whenever it can. The design and implementation also err on the side of safety and of avoiding multi-cluster outages.
Maturity: The Federation project is relatively new and is not very mature. Not all resources are available and many are still alpha.

Hybrid cloud capabilities

Federations of Kubernetes Clusters can include clusters running on different cloud providers (e.g. Google Cloud, AWS), and on-premises (e.g. on OpenStack). Kubefed is the recommended way to deploy federated clusters.

Thereafter, your API resources can span different clusters and cloud providers.

Should I go with Kubernetes Federation?

Kubernetes Federation is currently considered alpha for many of its features, and there is no clear path to evolve the API to GA. I would not recommend using Kubernetes Federation for your production systems. Ingresses typically don’t work even when you are using a simple federation of k8s clusters from one public provider. Managing more ingresses in a hybrid cloud could be an awful pain.

Federation uses public DNS and IP addresses with an external LoadBalancer for cross-cluster service discovery, which is usually quite an expensive option. I didn’t find out how to make it work on a private network, as one cluster does not see the other cluster’s k8s services, only its pods. Moreover, development of the Kubernetes Federation project seems rather stale. 143 stars on GitHub? Seriously?

Multi-cluster federation with Istio

You want the same features as Kubernetes Federation with a more stable and mature solution? Check out Istio’s multi-cluster support.

Multi-cluster functions by enabling Kubernetes control planes running a remote configuration to connect to one Istio control plane. Once one or more remote Kubernetes clusters are connected to the Istio control plane, Envoy can then communicate with the single Istio control plane and form a mesh network across multiple Kubernetes clusters.

This guide describes how to install a multi-cluster Istio topology using the manifests and Helm charts provided within the Istio repository.

I made a GitHub repo for easy provisioning of the whole system on GCP, based on the previously mentioned guide.

Demo time

We will deploy the Bookinfo application to two GKE clusters. All the services will run in one cluster; only Reviews-3 will run in the other. We leverage GKE’s alias IPs feature, where pods in one cluster can communicate with pods in the other cluster using just private IPs on a private network.

Product page requests will be load balanced across all the reviews versions, even though one of them runs in a different cluster, in a different zone, region, continent…
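As a rough illustration of what that means for traffic, the sketch below spreads requests over three reviews endpoints, one of which lives in the second cluster. Everything here is an assumption for the sake of the example: the cluster names, the pod IPs, and the plain round-robin policy (the load-balancing policy of the real proxies is configurable).

```python
from collections import Counter
from itertools import cycle

# Hypothetical endpoints: (version, cluster, pod IP).
REVIEWS_ENDPOINTS = [
    ("v1", "cluster-1", "10.0.0.11"),
    ("v2", "cluster-1", "10.0.0.12"),
    ("v3", "cluster-2", "10.8.0.21"),  # reachable over private alias IPs
]


def round_robin(endpoints, n_requests):
    """Return the endpoint picked for each of n_requests, in rotation."""
    rotation = cycle(endpoints)
    return [next(rotation) for _ in range(n_requests)]


if __name__ == "__main__":
    picks = round_robin(REVIEWS_ENDPOINTS, 9)
    # Count how many requests each version received.
    print(Counter(version for version, _cluster, _ip in picks))
```

With this policy, one request in three crosses the cluster boundary, which is worth keeping in mind when the clusters sit in different regions.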

Figure: Istio demo.

As mentioned above, you typically don’t want your services to communicate cross-cluster to different zones/regions as it usually causes higher latency and network bandwidth fees. A typical use case would be if you had a central cluster close to your HQ - say in Frankfurt - and you had customers not only in Europe but in Brazil as well. You could deploy a smaller cluster to Brazil for public-facing frontend APIs and some subset of services and the rest (like payment-gateway APIs, databases, …) will run in Frankfurt only to save some costs.

You can autoscale the services in each cluster independently, depending on a local cluster’s traffic needs - there is no need for overprovisioning.
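To see why independent autoscaling avoids overprovisioning, here is a small sketch. It assumes each cluster scales with the rule documented for the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current replicas × current metric / target metric), and it ignores the autoscaler's tolerance and stabilisation behaviour. The cluster names, metric values and replica bounds are made up for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale proportionally to the metric, then clamp to the allowed range."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    target_cpu = 50  # hypothetical target average CPU utilisation, in percent
    # Hypothetical per-cluster state: (current replicas, current CPU percent).
    clusters = {"frankfurt": (4, 80), "brazil": (2, 20)}
    for name, (replicas, cpu) in clusters.items():
        print(name, desired_replicas(replicas, cpu, target_cpu))
```

Each cluster is evaluated only against its own traffic, so a busy cluster can scale up while a quiet one scales down at the same time.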

One disadvantage of this setup is that Istio’s ingress-gateway is deployed as a LoadBalancer only in the master cluster. That means all traffic is proxied through the master cluster, and even if your client is in Brazil, their request goes to Frankfurt and back to Brazil. You could possibly avoid this by deploying more Istio masters.
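A back-of-the-envelope sketch of that detour is below. The round-trip times are hypothetical placeholders, not measurements, and the model only adds up network round trips (client to ingress, ingress to backend) without any processing time.

```python
# Hypothetical round-trip times in milliseconds; replace with measurements.
RTT_MS = {
    frozenset({"brazil", "frankfurt"}): 200,
    frozenset({"brazil"}): 10,      # within the Brazil region
    frozenset({"frankfurt"}): 10,   # within the Frankfurt region
}


def rtt(a, b):
    """Round-trip time between two locations (symmetric lookup)."""
    return RTT_MS[frozenset({a, b})]


def request_latency(client, ingress, backend):
    """Network time for client -> ingress -> backend and back again."""
    return rtt(client, ingress) + rtt(ingress, backend)


if __name__ == "__main__":
    # Ingress only in the master cluster: Brazil -> Frankfurt -> Brazil.
    print(request_latency("brazil", "frankfurt", "brazil"))
    # Hypothetical alternative with an ingress in the local cluster.
    print(request_latency("brazil", "brazil", "brazil"))
```

Under these assumed numbers, the detour through the master cluster pays the intercontinental round trip twice, while a local ingress would pay it not at all for services that run in the Brazilian cluster.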

I will play with this a little bit more in the future. I’d like to use the Google HTTPS LoadBalancer with the Istio ingress-gateway and have all the frontends deployed to all clusters.

Conclusion

Even nowadays, with all the clouds, k8s and service meshes, multiple clusters are still hard. But it's 2018 and we can do better! Multi-cluster setups can benefit our business greatly. Kubernetes Federation might not be the perfect way to set up such an ecosystem, so take a look at Istio and see for yourself. It is definitely worth trying!

Source:

https://istio.io/docs/setup/kubernetes/multicluster-install/

FAQs

Q1: What is Kubernetes Federation and what are its two main functions?

Kubernetes Federation is an open-source project designed to make it easy to manage multiple clusters. Its two major building blocks are:

Sync resources across clusters: This allows you to keep resources, like application deployments, in sync across multiple clusters.
Cross-cluster discovery: This enables the automatic configuration of DNS servers and load balancers with backends from all connected clusters.

Q2: What are the reasons for running multiple Kubernetes clusters instead of one large one?

Running multiple clusters can provide several benefits:

Low latency: Serving users from the cluster geographically closest to them.
Fault isolation: Using multiple smaller clusters can contain failures and prevent a single large outage.
Scalability: It helps overcome the scalability limits of a single Kubernetes cluster (around 5000 nodes).
Hybrid cloud: It allows for managing clusters across different cloud providers and on-premises data centers.

Q3: What are the main drawbacks of using Kubernetes Federation?

The primary caveats of Kubernetes Federation are its immaturity (it is considered alpha), increased network bandwidth and cost from its control plane watching all clusters, and reduced cross-cluster isolation, where a bug in the control plane could impact all clusters.

Q4: Is Kubernetes Federation recommended for production environments?

No, it is not recommended for production systems. The project is considered alpha, and key features like Ingresses typically do not work well, especially in a hybrid cloud setup.

Q5: What alternative solution for managing multiple clusters does the article suggest?

Istio’s multi-cluster support is presented as a more stable and mature solution that provides features similar to those of Kubernetes Federation.

Q6: How does Istio's multi-cluster support work?

It works by enabling remote Kubernetes clusters to connect to a single, central Istio control plane. Once connected, the Envoy proxies in all clusters can communicate with the single control plane to form a mesh network that spans across the multiple Kubernetes clusters.

Q7: What is a major advantage of the described Istio multi-cluster setup on GKE?

A major advantage is the ability to leverage GKE’s alias IPs feature. This allows pods in one cluster to communicate with pods in another cluster using only private IPs on a private network, which avoids the expensive public DNS and external LoadBalancer approach used by Federation.

Q8: What is a disadvantage of the specific Istio multi-cluster topology described in the article's demo?

A disadvantage of the described setup is that the Istio ingress-gateway is deployed as a LoadBalancer in only the master cluster. This means all traffic is proxied through that single master cluster, which can introduce latency if a client in one region (e.g., Brazil) has their request routed through a master cluster in another region (e.g., Frankfurt).

Marek Bartík

Marek is a NoOps/NoCode enthusiast. Having started as a C++ programmer while doing a master's in Computer Systems and Networks, and having grown up in the SysAdmin era, he quickly realized that communication and collaboration are the key. Nowadays he focuses on Cloud Architecting, microservices and Continuous Everything to solve business problems, not technical ones. Marek is passionate about DevOps and Cloud Native.