A brief review of "VL2: A Scalable and Flexible Data Center Network", Sections 1-4

I mentioned in my very first blog post that I'm currently in a full-time Bible school in Anaheim, CA. Well, right now we are in between semesters; and what better way to spend your time than learning how virtualized cloud networks work?
During the seven-week interim, I decided to take a class on coursera.org called "Cloud Networking", taught by P. Brighten Godfrey and Ankit Singla from the University of Illinois at Urbana-Champaign. What a class! I thought I would sail right through it since I have experience working with OpenStack's networking project, Neutron, but I was wrong.

It's week three now, and passing the quizzes is only half the battle. The class has weekly research paper readings that include in-depth technical discussions and complex terminology. Even after working in the cloud space for four years and obtaining a CCNA, I have to confess that I really have to put my entire being into reading these papers; otherwise it's hard to grasp the concepts being presented.

One paper that really impressed me was "VL2: A Scalable and Flexible Data Center Network". It starts off by stating the need for data centers that can support high bisection bandwidth throughout the network topology; meaning that what is important these days is not so much the throughput of north-south traffic (the data entering and leaving the data center), but rather the east-west traffic (the traffic between servers):

"Unfortunately, the designs for today's data center network prevent agility in several ways. First, existing architectures do not provide enough capacity between the servers they interconnect."

Agility. That's a keyword throughout the paper, and the motivation behind the VL2 architecture. According to the paper, agility is defined as "the capacity to assign any server to any service."

Assigning any server to any service is casting a wide net, and new solutions need to be implemented if we are going to talk about this from the networking perspective. Unfortunately, traditional networking topologies strongly couple the location of an application/service to its address (more on this later).

(Figure 1 from "VL2: A Scalable and Flexible Data Center Network")
Section two gives some background on conventional architectures, illustrating why they are pretty much terrible at meeting the above requirement of agility. When the authors mention conventional architectures, they are talking about the traditional tree-like topology we are used to: the three-tiered design with access, aggregation, and core networking devices.

Section three was admittedly a hard one for me. It covers the measurements and how the authors quantified the traffic matrix (who sends how much data to whom, and when?) and churn (how often does the state of the network change due to changes in demand, switch/link failures and recoveries, etc.?). They say they studied production data centers of large cloud service providers, but did not mention who or where exactly. The rest of the section goes into traffic flow analysis, traffic matrix analysis, and failure characteristics. I could probably spend an entire week studying just this section. However, the key takeaway is that network traffic within their scope of study was unpredictable, and that the data "shows that the traffic pattern changes nearly constantly, with no periodicity that could help predict the future." This will be important later.

Section four gets into the meat of their software-defined networking solution. Now, yes, I know that this paper is old and that much has changed since it was published and presented, but I still appreciate the efforts that the ones who have gone before us put into making network virtualization possible.

They start off this section with how they handled the volatility mentioned in section three. The network is unpredictable. Now what? Well, why not just randomize the paths up to the intermediate switches? Oh, and hey... there exists a technique for that: Valiant Load Balancing (VLB). But no scheme is perfect. "While our mechanisms to realize VLB do not perfectly meet either of these conditions, we show in sec. 5.1 that our scheme's performance is close to the optimum." Close enough.
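The core of VLB is simple enough to sketch in a few lines. Here is a toy illustration (switch names and the helper function are hypothetical, not the paper's actual implementation): each flow is bounced off a randomly chosen intermediate switch, which spreads load across all available paths no matter what the traffic matrix looks like.

```python
import random

# Hypothetical switch names for illustration only.
INTERMEDIATE_SWITCHES = ["int-1", "int-2", "int-3", "int-4"]

def pick_path(src_tor: str, dst_tor: str) -> list:
    """VLB in miniature: detour each flow through a random
    intermediate switch rather than picking a 'best' path."""
    intermediate = random.choice(INTERMEDIATE_SWITCHES)
    return [src_tor, intermediate, dst_tor]

# Over many flows, traffic spreads roughly evenly across the
# intermediates, with no knowledge of demand required.
paths = [pick_path("tor-a", "tor-b") for _ in range(1000)]
```

The point is that randomization sidesteps the unpredictability problem from section three: if you can't forecast the traffic matrix, don't try, just spread everything.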

Another honorable mention in section four is separating names from locators. This goes back to solving the original request to have an agile network topology. Any server can host any service. By separating the name from the locator, a service is no longer tied to a specific part of the network topology. For example, if my service had the IP 192.168.1.25 in a /24 subnet, then it would be hard for me to migrate that service to another server that's in another subnet without changing the IP. So what if we could make the name of the service independent from where it is in the network topology?
That's exactly what VL2 did. They did it by way of a "layer 2.5 shim" that runs on every server. This shim layer invokes a directory system that provides the mappings between names and locations. As a packet leaves the server, the directory system is queried and a location for the destination is found. The packet is then encapsulated with that location and sent off toward the destination.
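The mechanism can be sketched with a plain dictionary standing in for the directory system. All the names, addresses, and functions below are hypothetical illustrations, not VL2's actual API: each service keeps a permanent application address (AA), and the directory maps it to the locator address (LA) of wherever the service currently lives.

```python
# Toy directory system: application address (AA) -> locator address (LA).
directory = {
    "10.0.0.25": "192.168.7.4",  # AA of a service -> LA of its current host
}

def encapsulate(packet: dict) -> dict:
    """Shim-layer send path: resolve the destination AA via the
    directory, then wrap the packet in an outer header addressed
    to the locator."""
    locator = directory[packet["dst_aa"]]
    return {"outer_dst_la": locator, "inner": packet}

def migrate(aa: str, new_la: str) -> None:
    """Moving a service only updates the directory entry;
    the service's AA never changes."""
    directory[aa] = new_la

before = encapsulate({"dst_aa": "10.0.0.25", "payload": "hello"})
migrate("10.0.0.25", "192.168.9.9")  # service moves to a new host
after = encapsulate({"dst_aa": "10.0.0.25", "payload": "hello"})
```

Notice that the sender's view never changes: it always addresses the AA, and only the outer locator differs after the migration. That indirection is what makes "any server can host any service" workable.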

I would highly recommend reading through section 4.2. It explains this "shim layer" in a more concise way, and covers the concepts used to handle different cases such as broadcast traffic and interaction with hosts on the internet.

Overall, I would say this paper is a must-read for those in the cloud networking space. I'm still learning and hope to one day reach a level of mastery where I can help architect such topologies. I feel like this class and these readings have been helping me tremendously! Maybe once I'm done with school I can apply what I have learned ;)
