Home · Netflix/zuul Wiki https://github.com/Netflix/zuul/wiki
What is Zuul?
Zuul is the front door for all requests from devices and web sites to the backend of the Netflix streaming application. As an edge service application, Zuul is built to enable dynamic routing, monitoring, resiliency and security. It also has the ability to route requests to multiple Amazon Auto Scaling Groups as appropriate.
To dive right in: Getting Started 2.0
Why did we build Zuul?
The volume and diversity of Netflix API traffic sometimes result in production issues arising quickly and without warning. We need a system that allows us to rapidly change behavior in order to react to these situations.
Zuul uses a range of filter types that enable us to quickly and nimbly apply functionality to our edge service. These filters help us perform the following functions:
- Authentication and Security - identifying authentication requirements for each resource and rejecting requests that do not satisfy them.
- Insights and Monitoring - tracking meaningful data and statistics at the edge in order to give us an accurate view of production.
- Dynamic Routing - dynamically routing requests to different backend clusters as needed.
- Stress Testing - gradually increasing the traffic to a cluster in order to gauge performance.
- Load Shedding - allocating capacity for each type of request and dropping requests that go over the limit.
- Static Response handling - building some responses directly at the edge instead of forwarding them to an internal cluster.
- Multiregion Resiliency - routing requests across AWS regions in order to diversify our ELB usage and move our edge closer to our members.
For more details: How We Use Zuul At Netflix
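As an illustration of the load shedding function listed above, here is a minimal sketch in plain Java of allocating capacity per request type and dropping requests that exceed it. It does not use the Zuul filter API; the class `LoadShedder`, its methods, and the request type names are hypothetical, and the "capacity" is assumed to mean a cap on concurrent in-flight requests.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Zuul: caps in-flight requests per request type.
final class LoadShedder {
    private final Map<String, Integer> limits; // allocated capacity per request type
    private final Map<String, AtomicInteger> inFlight = new ConcurrentHashMap<>();

    LoadShedder(Map<String, Integer> limits) {
        this.limits = new HashMap<>(limits); // defensive copy
    }

    /** Returns true if the request may proceed, false if it should be dropped. */
    boolean tryAcquire(String requestType) {
        // Assumption: request types without an allocation get no capacity.
        int limit = limits.getOrDefault(requestType, 0);
        AtomicInteger counter =
                inFlight.computeIfAbsent(requestType, k -> new AtomicInteger());
        while (true) {
            int current = counter.get();
            if (current >= limit) {
                return false; // over the limit: shed this request
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Call once when a previously admitted request completes. */
    void release(String requestType) {
        AtomicInteger counter = inFlight.get(requestType);
        if (counter != null) {
            counter.updateAndGet(v -> Math.max(0, v - 1)); // never drop below zero
        }
    }
}
```

A filter built on this idea would call `tryAcquire` before forwarding a request to an origin, answer with an error response when it returns false, and call `release` once the origin call finishes.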
Overview
Zuul gives us a lot of insight, flexibility, and resiliency, in part by making use of other Netflix OSS components:
- Hystrix is used to wrap calls to our origins, which allows us to shed and prioritize traffic when issues occur.
- Ribbon is our client for all outbound requests from Zuul. It provides detailed information about network performance and errors, and handles software load balancing for even load distribution.
- Turbine aggregates fine-grained metrics in real time so that we can quickly observe and react to problems.
- Archaius handles configuration and gives us the ability to dynamically change properties.
Surgical Routing
We can create a filter to route a specific customer or device to a separate API cluster for debugging. Before Zuul, we used Hadoop to query billions of logged requests to find the several thousand requests we were interested in.
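The routing decision such a filter makes can be sketched in plain Java as follows. This is an assumption-laden illustration, not Zuul's filter API: the class `SurgicalRouter`, the cluster names, and the idea of matching on explicit customer and device ID sets are all hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical routing decision, not part of Zuul: sends selected customers
// or devices to a separate debug cluster and everyone else to the default.
final class SurgicalRouter {
    private final Set<String> debugCustomerIds;
    private final Set<String> debugDeviceIds;
    private final String defaultCluster;
    private final String debugCluster;

    SurgicalRouter(Set<String> debugCustomerIds, Set<String> debugDeviceIds,
                   String defaultCluster, String debugCluster) {
        this.debugCustomerIds = new HashSet<>(debugCustomerIds);
        this.debugDeviceIds = new HashSet<>(debugDeviceIds);
        this.defaultCluster = defaultCluster;
        this.debugCluster = debugCluster;
    }

    /** Picks the target cluster; either ID may be null if the request lacks it. */
    String chooseCluster(String customerId, String deviceId) {
        boolean customerMatch = customerId != null && debugCustomerIds.contains(customerId);
        boolean deviceMatch = deviceId != null && debugDeviceIds.contains(deviceId);
        return (customerMatch || deviceMatch) ? debugCluster : defaultCluster;
    }
}
```

Because only the matching requests reach the debug cluster, they can be inspected there directly instead of being searched for in logs afterwards.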
Stress Testing
We have an automated process that uses dynamic Archaius configurations within a Zuul filter to steadily increase the traffic routed to a small cluster of origin servers. As the instances receive more traffic, we measure their performance characteristics and capacity. This informs us of how many EC2 instances we will need to run at peak, whether our autoscaling policies need to be modified, and whether or not a particular build has the required performance characteristics to be pushed to production.
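A minimal sketch of the ramp-up idea, in plain Java, might look like the following. It is not the actual Netflix process: the class `TrafficRamp` is hypothetical, an `AtomicInteger` stands in for a dynamic Archaius property, and requests are assumed to carry a sequence number so that the split is deterministic.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical ramp controller, not part of Zuul. The AtomicInteger stands in
// for a dynamically updated configuration property.
final class TrafficRamp {
    private final AtomicInteger percent = new AtomicInteger(0);
    private final int maxPercent;

    TrafficRamp(int maxPercent) {
        if (maxPercent < 0 || maxPercent > 100) {
            throw new IllegalArgumentException("maxPercent must be in [0, 100]");
        }
        this.maxPercent = maxPercent;
    }

    /** Raises the diverted share by the given step, capped at maxPercent. */
    int stepUp(int increment) {
        return percent.updateAndGet(
                p -> Math.max(0, Math.min(maxPercent, p + increment)));
    }

    int currentPercent() {
        return percent.get();
    }

    /**
     * Decides whether a request goes to the cluster under test. Out of every
     * 100 consecutive sequence numbers, exactly currentPercent() are diverted.
     */
    boolean divert(long requestSequence) {
        return Math.floorMod(requestSequence, 100L) < percent.get();
    }
}
```

An automated driver would call `stepUp` at intervals while watching the test cluster's metrics, and stop raising the share once latency or error rates degrade.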
Multi-Region Resiliency
Zuul is central to our multi-region ELB resiliency project, which we call Isthmus. As part of Isthmus, Zuul is used to route requests from the west coast cloud region to the east coast, giving us multi-region redundancy in our ELBs for our critical domains.