Optimizing resource provisioning and performance in large microservice-based applications is difficult, especially when applications span multiple datacenters. Existing global load balancing systems use simple capacity-based heuristics. We present Service Layer Traffic Engineering (SLATE), a system that globally optimizes request routing for end-to-end latency and cost in applications deployed across multiple clusters (multi-zone, multi-region, or multi-continent). SLATE exploits global knowledge of all clusters, the multi-hop nature of microservice applications, request differentiation, and latency modeling. SLATE outperforms existing global load balancing systems (Meta’s ServiceRouter and Google’s Traffic Director) by up to 18.3× in average latency and reduces egress cost by up to 11.6× at the same average latency. Our system is completely transparent to the application and compatible with existing proxies, which makes it pluggable. The entire system can be plugged into a running deployment seamlessly, with a single command.
Four clusters are running in Oregon, Salt Lake City, Iowa, and South Carolina, and the Oregon and Iowa clusters suddenly become overloaded. Within a few seconds, SLATE calculates optimal request routing rules across all clusters, minimizing latency by accounting for how overloaded each cluster is and for inter-cluster network latency.
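The trade-off between cluster overload and inter-cluster network latency can be illustrated with a small sketch. This is not SLATE's optimizer: it is a simple greedy heuristic, and the function name `greedy_route`, the capacities, the demands, the pairwise latencies, and the queueing-style latency model are all assumptions chosen for illustration.

```python
import math

def greedy_route(demand, capacity, rtt_ms, service_ms=10.0, step=10):
    """Assign traffic in small chunks to the destination with the lowest
    estimated latency (network latency plus a load-dependent service time).
    A greedy heuristic for illustration, not a globally optimal solver."""
    clusters = list(capacity)
    load = {c: 0.0 for c in clusters}
    routes = {(s, d): 0.0 for s in clusters for d in clusters}
    remaining = dict(demand)

    def estimated_latency(src, dst, extra):
        new_load = load[dst] + extra
        if new_load >= capacity[dst]:
            return math.inf  # destination would be saturated
        utilization = new_load / capacity[dst]
        # Assumed model: service time grows sharply as utilization nears 1.
        return rtt_ms[(src, dst)] + service_ms / (1.0 - utilization)

    while any(r > 0 for r in remaining.values()):
        for src in clusters:  # round-robin over sources for fairness
            if remaining[src] <= 0:
                continue
            chunk = min(step, remaining[src])
            dst = min(clusters, key=lambda d: estimated_latency(src, d, chunk))
            if math.isinf(estimated_latency(src, dst, chunk)):
                raise ValueError("demand exceeds total capacity")
            routes[(src, dst)] += chunk
            load[dst] += chunk
            remaining[src] -= chunk
    return routes, load

# Hypothetical numbers: Oregon and Iowa receive more traffic than they can serve.
clusters = ["oregon", "salt-lake-city", "iowa", "south-carolina"]
capacity = {c: 1000 for c in clusters}  # requests per second
demand = {"oregon": 1500, "salt-lake-city": 200, "iowa": 1400, "south-carolina": 200}
pair_ms = {
    ("oregon", "salt-lake-city"): 15, ("oregon", "iowa"): 35,
    ("oregon", "south-carolina"): 60, ("salt-lake-city", "iowa"): 25,
    ("salt-lake-city", "south-carolina"): 45, ("iowa", "south-carolina"): 25,
}
rtt_ms = {(c, c): 0 for c in clusters}
for (a, b), ms in pair_ms.items():
    rtt_ms[(a, b)] = rtt_ms[(b, a)] = ms

routes, load = greedy_route(demand, capacity, rtt_ms)
for (src, dst), rps in sorted(routes.items()):
    if rps > 0:
        print(f"{src} -> {dst}: {rps:.0f} rps")
```

The sketch keeps traffic local while a cluster has headroom and spills over to nearby clusters only once the load-dependent term outweighs the network latency, which is the intuition behind weighing overload against inter-cluster latency.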
Different request types in each service often have vastly different call sizes. In the example application below, the call size of MP→DB is much larger than that of FR→MP. Additionally, the DB service is degraded or unavailable in us-west1-b. SLATE uses its multi-hop knowledge to automatically route requests in a way that minimizes cost as well as latency. The green arrow is the cost-optimal path, which crosses clusters on the service call that carries less data on average. The red arrow is a greedy, single-hop locality failover, which is the default failover mechanism in service meshes such as Istio.
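The difference between the two paths can be sketched by enumerating where each service in the FR→MP→DB chain could run and summing the bytes that cross a cluster boundary. The byte counts, the second cluster name `us-east1-b`, and the helper names below are hypothetical; the sketch only illustrates the multi-hop reasoning and is not SLATE's implementation.

```python
from itertools import product

def egress_bytes(placement, chain, call_bytes):
    """Bytes that cross a cluster boundary along the call chain."""
    total = 0
    for caller, callee in zip(chain, chain[1:]):
        if placement[caller] != placement[callee]:
            total += call_bytes[(caller, callee)]
    return total

def cheapest_placement(chain, call_bytes, available, ingress):
    """Brute-force search over all placements; the first service stays at ingress."""
    options = [[ingress]] + [available[s] for s in chain[1:]]
    candidates = (dict(zip(chain, combo)) for combo in product(*options))
    return min(candidates, key=lambda p: egress_bytes(p, chain, call_bytes))

def greedy_local_failover(chain, available, ingress):
    """Single-hop view: stay local whenever the next service is available locally."""
    placement = {chain[0]: ingress}
    for caller, callee in zip(chain, chain[1:]):
        here = placement[caller]
        placement[callee] = here if here in available[callee] else available[callee][0]
    return placement

chain = ["FR", "MP", "DB"]
# Hypothetical average call sizes in bytes: MP->DB is much larger than FR->MP.
call_bytes = {("FR", "MP"): 1_000, ("MP", "DB"): 100_000}
# DB is unavailable in us-west1-b; "us-east1-b" is an assumed second cluster.
available = {
    "FR": ["us-west1-b", "us-east1-b"],
    "MP": ["us-west1-b", "us-east1-b"],
    "DB": ["us-east1-b"],
}

best = cheapest_placement(chain, call_bytes, available, ingress="us-west1-b")
greedy = greedy_local_failover(chain, available, ingress="us-west1-b")
print("multi-hop:", best, egress_bytes(best, chain, call_bytes))
print("greedy:   ", greedy, egress_bytes(greedy, chain, call_bytes))
```

With these assumed sizes, the multi-hop search crosses clusters at FR→MP and pays for 1,000 bytes, while the greedy failover stays local until MP→DB and pays for 100,000 bytes.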
Not all requests to a service should be treated the same. In the example application, each service has two request types, /LU and /DA, where /DA is significantly more expensive than /LU. SLATE differentiates between them and reroutes a small number of /DA requests first, relieving the us-west1-b cluster more effectively and quickly.
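A rough sketch of why differentiation helps: to shed a fixed amount of work from an overloaded cluster, moving the most expensive request type first requires rerouting far fewer requests. The request rates, per-request costs, and the `plan_offload` helper are assumptions for illustration, not values or code from SLATE.

```python
def plan_offload(rates, cost_per_req, excess_work):
    """Pick requests to reroute, most expensive type first, until the
    excess work (in abstract work units per second) is relieved."""
    plan = {}
    remaining = excess_work
    for rtype in sorted(rates, key=lambda r: cost_per_req[r], reverse=True):
        if remaining <= 0:
            break
        movable = min(rates[rtype], remaining / cost_per_req[rtype])
        plan[rtype] = movable
        remaining -= movable * cost_per_req[rtype]
    return plan

# Hypothetical workload in us-west1-b: /DA is 10x as expensive as /LU.
rates = {"/LU": 800, "/DA": 100}           # requests per second
cost_per_req = {"/LU": 1.0, "/DA": 10.0}   # work units per request
excess_work = 500.0                        # work units per second over capacity

plan = plan_offload(rates, cost_per_req, excess_work)
moved_with_differentiation = sum(plan.values())

# Baseline without differentiation: reroute a uniform share of all requests.
total_rate = sum(rates.values())
total_work = sum(rates[r] * cost_per_req[r] for r in rates)
moved_without_differentiation = excess_work / (total_work / total_rate)

print(plan, moved_with_differentiation, moved_without_differentiation)
```

Under these assumed numbers, rerouting 50 /DA requests per second relieves the same work as rerouting 250 undifferentiated requests per second.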
We also conducted a survey in the Istio community on Kubernetes multi-cluster deployment patterns to understand the need for optimizing request routing. The survey consists of 18 questions about respondents' current multi-cluster deployments, the load balancing they use, and their interest in cross-cluster routing. The respondents are companies that deploy their services across multiple clusters (38 responses). You can find the full result here.