UIUC Service Layer Networking Workshop

Date: Thursday, October 26, 2023

Location: Virtual (Zoom)

About the Workshop

The workshop will be held virtually and will discuss ongoing research on microservices and service layer networking conducted in the research groups of Prof. Brighten Godfrey and Prof. Radhika Mittal at UIUC. The workshop invitees come from different sectors of industry, including service mesh products, cloud providers, and cluster operators. The goal of the workshop is to gather industry feedback on the ongoing research and to explore potential avenues for collaboration and technology transfer. We have provided a detailed agenda below:

Agenda

Time (Pacific Timezone) Description Presenter
11:00-11:10 PDT Overview Brighten Godfrey / Radhika Mittal
11:10-11:40 PDT Blackbox Request Tracing for Modern Cloud Applications
PDF (under review) | KubeCon'22 Talk | Slides
Sachin Ashok
Monitoring and debugging modern cloud-based applications is challenging, since even a single API call can involve many interdependent distributed microservices. To provide observability for such complex systems, distributed tracing frameworks track request flow across the microservice call tree. However, such solutions require instrumenting every component of the distributed application to add and propagate tracing headers, which has slowed adoption. This project explores whether request tracing can be achieved without any application instrumentation, which we refer to as request trace reconstruction. We present TraceWeaver, an optimization framework that incorporates readily available information from production settings (e.g., timestamps and call graphs) to reconstruct request traces with usefully high accuracy. Evaluation with (1) benchmark microservice applications and (2) a production microservice dataset demonstrates high accuracy. We also evaluate use cases for TraceWeaver, including A/B testing and finding performance outliers, with effective results. Finally, we discuss potential future approaches that could further improve TraceWeaver's accuracy and ease of adoption.
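As a rough illustration of the core idea, the sketch below (a toy example of our own, not TraceWeaver's actual algorithm) pairs parent and child spans purely by timestamp nesting, with no tracing headers; all span names and timestamps are made up:

```python
# Toy instrumentation-free trace reconstruction: match each parent span
# to the child-service span whose time interval nests inside it.

def reconstruct(parent_spans, child_spans):
    """parent_spans, child_spans: lists of (span_id, start, end).
    Returns {parent_id: child_id} pairing each parent with one nested child."""
    pairing = {}
    unused = sorted(child_spans, key=lambda s: s[1])  # sort by start time
    for pid, pstart, pend in sorted(parent_spans, key=lambda s: s[1]):
        for c in unused:
            cid, cstart, cend = c
            if pstart <= cstart and cend <= pend:  # child nests in parent
                pairing[pid] = cid
                unused.remove(c)
                break
    return pairing

parents = [("p1", 0, 10), ("p2", 12, 20)]
children = [("c1", 2, 8), ("c2", 13, 18)]
print(reconstruct(parents, children))  # {'p1': 'c1', 'p2': 'c2'}
```

Real settings are far harder (overlapping requests, clock skew, fan-out), which is where an optimization framework over timestamps and call graphs comes in.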
11:40-12:10 PDT SLATE: Service Layer Traffic Engineering Gangmuk Lim / Aditya Prerepa
Optimizing resource provisioning and performance in large microservice-based applications is difficult. Although container schedulers are part of the picture, another key component is request routing, which controls the real-time assignment of requests to microservices and the network communication among them. Today's request routing strategies are simplistic, but this hides a subtly tricky job, especially with deployments spanning regions or clusters, since routing decisions directly affect service load, latency, bandwidth cost, and the tradeoffs among these, all varying in real time. In this talk, we will present SLATE (Service Layer Traffic Engineering), a system that automatically optimizes the flow of requests in microservice-based clusters. We will demonstrate two specific use cases: (1) latency optimization when one cluster is overloaded, and (2) bandwidth cost optimization when microservices span multiple clusters in different regions. These problems require multi-hop decisions across the application's execution graph: where requests should cross cluster boundaries, which cluster they should be routed to, and what fraction and subset of requests should be routed there to minimize cost and latency. We will show that such a system has the opportunity to significantly improve request latency and cost, and identify the challenges in building it.
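To make the routing tradeoff concrete, here is a minimal toy model (not SLATE itself; the numbers and the queueing formula are assumptions) that searches for the fraction of requests an overloaded cluster should offload to a remote cluster, balancing local queueing delay against the fixed cross-cluster RTT:

```python
# Toy cross-cluster routing tradeoff: offloading relieves local queueing
# but pays a fixed remote round-trip; search for the best split.

def avg_latency(offload_frac, load, capacity, remote_rtt, base_ms=5.0):
    local = load * (1 - offload_frac)
    # Toy M/M/1-style delay that blows up as the local cluster saturates.
    local_ms = base_ms / max(1e-9, 1 - local / capacity) if local < capacity else float("inf")
    remote_ms = base_ms + remote_rtt
    return (1 - offload_frac) * local_ms + offload_frac * remote_ms

def best_split(load, capacity, remote_rtt, steps=1000):
    # Grid search over offload fractions in [0, 1].
    fracs = [i / steps for i in range(steps + 1)]
    return min(fracs, key=lambda f: avg_latency(f, load, capacity, remote_rtt))

split = best_split(load=950, capacity=1000, remote_rtt=40)
print(f"offload {split:.0%} of requests")
```

Even in this one-hop toy, the optimum is interior (offloading everything wastes the remote RTT; offloading nothing saturates the local cluster); the multi-hop, multi-service version is correspondingly harder.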
12:10-12:30 PDT Multi-party Load Balancing in the Cloud Talha Waheed
A typical cloud cluster is shared by multiple applications and services, where each service employs an independent load balancer to direct incoming requests to service instances depending on their load. These load balancers, operated individually by multiple parties (services) on a shared cluster, indirectly influence one another. We highlight the challenges associated with such multi-party load balancing. We present an empirical scenario where independent multi-party load-balancing decisions lead to 33% under-utilization and 12x higher request processing latency than what can ideally be achieved. We then show how simple techniques that leverage cross-service information can produce a near-optimal outcome for the above scenario. However, they break fairness guarantees, producing a less desirable outcome in a different scenario, and suffer from slow convergence. Our results spotlight multi-party load balancing as an open problem in need of a new solution. We outline potential directions for future research, aiming for a solution that combines the benefits of a centralized approach (e.g., fast convergence and multi-objective optimization) with those of decentralized techniques (e.g., fast response, scalability, and service-specific customizability).
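The coordination gap can be illustrated with a small simulation (assumed numbers, not the paper's experiment): a least-loaded balancer that sees only its own requests splits traffic evenly even when another party has already loaded one backend, while a balancer with cross-service visibility equalizes the true load:

```python
# Toy multi-party load balancing: least-loaded routing over a view of
# backend load that may or may not include other services' requests.

def place(n_requests, visible, true_load):
    """Route n_requests via least-loaded on `visible`; track `true_load`."""
    for _ in range(n_requests):
        b = min(range(len(visible)), key=lambda i: visible[i])
        visible[b] += 1
        true_load[b] += 1
    return true_load

background = [6, 0]                             # load placed by another service
blind = place(8, [0, 0], background[:])         # own-view only -> [10, 4]
aware = place(8, background[:], background[:])  # shared view  -> [7, 7]
print(blind, aware)  # [10, 4] [7, 7]
```

The "blind" balancer leaves one backend nearly idle while overloading the other; with the shared view, the same least-loaded rule reaches a balanced outcome.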
12:30-12:50 PDT Break
12:50-1:10 PDT Minimizing Latency and Maximizing Utilization Through Request Prioritization Gerard Matthew
Service mesh implementations lack a mechanism for prioritizing requests: all incoming requests are treated uniformly, causing higher-priority requests to experience degraded performance under high load. Operators often resort to over-provisioning their applications to stay ahead of bursty demand, or use load shedding to remain within latency thresholds. Requests contend for limited CPU, memory, and network resources. In this presentation, we propose an approach that leverages request priority to optimize resource allocation, thereby ensuring enhanced performance for higher-priority requests during high load. We present a prototype that focuses on prioritizing CPU resources through a combination of priority admission control, prioritized network connections, and priority-aware load balancing algorithms. Our preliminary results show that we are able to achieve near-optimal performance for high-priority requests under high load conditions. This allows operators to reduce over-provisioning of applications, lowering long-term operational costs while remaining within latency thresholds.
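A minimal sketch of one ingredient of this approach, priority-based dispatch (illustrative only, not the prototype): a queue that always serves the highest-priority pending request, so under overload the queueing delay is absorbed by low-priority work:

```python
import heapq

# Toy priority dispatcher: requests are served strictly by priority
# class, FIFO within a class.

class PriorityDispatcher:
    def __init__(self):
        self._q = []
        self._seq = 0  # FIFO tie-break within a priority class

    def submit(self, priority, request):
        # Lower number = higher priority (0 beats 1).
        heapq.heappush(self._q, (priority, self._seq, request))
        self._seq += 1

    def next_request(self):
        return heapq.heappop(self._q)[2] if self._q else None

d = PriorityDispatcher()
for prio, name in [(1, "batch-1"), (0, "user-1"), (1, "batch-2"), (0, "user-2")]:
    d.submit(prio, name)
order = [d.next_request() for _ in range(4)]
print(order)  # ['user-1', 'user-2', 'batch-1', 'batch-2']
```

A real system must also prioritize the resources behind the queue (CPU, connections), which is what the prototype's admission control and load balancing address.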
1:10-1:30 PDT Expressive Policies for Microservice Networks Karuna Grewal
Microservice-based application deployments need to enforce safety properties while serving requests. However, today such properties can be specified only in limited ways (primarily whether service A can call a certain API on service B), which can lead to overly permissive policies, the potential for illegitimate flow of information across microservices, or ad hoc policy implementations. We argue that a range of use cases require safety properties for the flow of requests across the whole microservice network, rather than only between adjacent hops. To begin to address this, we propose a system for declaring and deploying service tree policies, which are compiled down into declarative filters that are inserted into microservice deployment manifests. Building on ideas from automata theory, we use a lightweight enforcement mechanism based on dynamic monitors. A preliminary prototype shows that we can capture a wide class of policies, which we describe as case studies.
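As a rough sketch of monitor-based enforcement (the service names and the policy are hypothetical, not from the talk), a whole-path property such as "any request reaching db must earlier traverse auth" can be checked by a small automaton run over the request's service path:

```python
# Toy whole-path policy monitor: a two-state automaton over the sequence
# of services a request traverses.

def make_monitor(required, target):
    """Returns a predicate over service paths: target is only reachable
    after required has been visited."""
    def check(path):
        seen_required = False           # automaton state
        for svc in path:
            if svc == required:
                seen_required = True
            if svc == target and not seen_required:
                return False            # reject: target reached unguarded
        return True
    return check

policy = make_monitor(required="auth", target="db")
print(policy(["gateway", "auth", "orders", "db"]))  # True
print(policy(["gateway", "orders", "db"]))          # False
```

Note this property depends on the whole path, not any single hop: the orders-to-db edge is fine in the first path and illegitimate in the second, which per-hop allow rules cannot distinguish.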
1:30-2:00 PDT Industry attendee feedback and discussion