I get asked about software-defined networking (SDN) quite often, and at this point I sort of expect that most people have no idea what it means. Sure, they can explain that SDN stands for Software Defined Networking, but then they quickly launch into a rant about the virtues of OpenFlow[1] and how managing physical switches through software APIs is all the rage. At this point I usually ask where they learned about SDN and why they think that managing switches with software, which by the way we’ve been doing for the past 20-plus years, is such a new concept. In other words, why would that make SDN important?
My history with SDN goes back to 2010, when August Schell was investigating SDN and specifically Nicira Networks. Shortly after learning about the technology Nicira was building, we became an early investor in the company. I was in briefings with Martin Casado, one of the founders of Nicira, and he explained his concept of virtual networks and what led him there. He told the story of how, as a network engineer, he was trying to figure out a way to better manage large data center networks. He lamented the fact that most network outages were caused by humans making configuration mistakes, and pointed out that network switches were each configured manually by a network engineer using a cryptic CLI that was easy to get wrong. This originally led him in the direction of creating a common protocol for configuring switches, and OpenFlow was born. OpenFlow incorporates a controller architecture used to manage a disparate set of network switches through a common API-driven protocol. According to interviews with Martin, OpenFlow was a byproduct of the open source NOX controller, which was built to enable controllers that could talk to and manage physical switches.

OpenFlow was a good idea, but it ran into some not-so-good problems along the way. One problem was that switch vendors competed with one another on unique features and thus required proprietary software to configure and manage their devices. Getting vendors to agree on a basic set of controls to enable OpenFlow proved very difficult and led to a lowest-common-denominator approach, which had the effect of minimizing the advanced features of many networking products. Another problem was that getting OpenFlow coded into the ASIC[2] chips of any vendor was a lengthy process, sometimes taking seventeen or eighteen months. This made development of OpenFlow very difficult, and it made updating and debugging problems in OpenFlow almost impossible.
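To make the controller-to-switch idea concrete, here is a minimal, hypothetical sketch of the kind of match/action rule a controller pushes down to a switch. The field names and structure are invented for illustration; this is not the actual OpenFlow wire format.

```python
# Toy model of an OpenFlow-style flow table: a controller installs rules,
# each pairing "match" criteria with an "action", and the switch applies
# the first matching rule to each packet.

def lookup(flow_table, packet):
    """Return the action of the first rule whose match fields are all
    satisfied by the packet; unknown traffic goes to the controller."""
    for rule in flow_table:
        if all(packet.get(k) == v for k, v in rule["match"].items()):
            return rule["action"]
    return {"type": "controller"}  # default: punt to the controller

flow_table = [
    {"match": {"dst_ip": "10.0.0.5", "tcp_port": 80},
     "action": {"type": "output", "port": 3}},
    {"match": {"dst_ip": "10.0.0.9"},
     "action": {"type": "drop"}},
]

print(lookup(flow_table, {"dst_ip": "10.0.0.5", "tcp_port": 80}))
# → {'type': 'output', 'port': 3}
```

The point of the model is the split this article keeps returning to: forwarding decisions live in a central controller's rules, while the switch just executes table lookups.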
Along the way, Martin realized that OpenFlow wasn’t the answer to his problem. While it was helpful, it wasn’t flexible enough, and that realization led to the development of Open vSwitch (OVS). OVS is a software-based switch that runs on the x86 compute platform. OVS brought the ability to develop, modify, debug, and ultimately change the switch at software speed. This was a huge accomplishment that offered much more flexibility and configurability of the network, plus it had the advantage of being software that could easily be changed. The advent of OVS brought with it new ideas and capabilities. One of those ideas was to create overlay networks, or tunnels, between each OVS endpoint. This is the basic concept Nicira leveraged to create virtual layer 2 networks. The aha moment came when they developed the controller cluster software that managed the relationships between the physical network connections to each OVS node and the overlay tunnels between OVS nodes used to connect VM workloads to a software-based network. The controller software became the secret sauce, and SDN was effectively born! The network became programmable by issuing API calls to the controllers, which in turn would direct OVS to create, modify, or delete virtual networks. This was a huge breakthrough and ultimately led to Nicira’s success and acquisition by VMware.
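A minimal sketch of what "API calls to the controllers" might look like from the client side. The endpoint path and field names below are invented for illustration; they are not the actual Nicira/NSX API.

```python
import json

# Hypothetical client-side view of a controller API: the caller describes
# a virtual L2 network declaratively, and the controller cluster would
# translate the request into overlay tunnel configuration on each OVS node.

def build_create_network_request(name, subnet, tenant):
    """Assemble the REST request a client might POST to a controller.
    Endpoint and body schema are illustrative, not a real API."""
    return {
        "method": "POST",
        "path": "/api/v1/virtual-networks",   # hypothetical endpoint
        "body": json.dumps({
            "name": name,
            "subnet": subnet,
            "tenant": tenant,
            "type": "overlay",  # tunnels between OVS endpoints, not a VLAN
        }),
    }

req = build_create_network_request("web-tier", "10.10.1.0/24", "tenant-42")
print(req["path"])  # → /api/v1/virtual-networks
```

The design point is that the caller never touches a switch CLI; the controller owns the mapping from this declarative request to per-node OVS state.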
According to Martin, the term SDN was coined in 2009 and at the time had a fairly specific meaning. That meaning has been lost over the years as many vendors use SDN to refer to anything networking related. That said, a Wikipedia article on SDN claims that one of the first SDN projects began as a project called GeoPlex at AT&T Labs around 1995. While all of this may be true, SDN as it exists today is really several things. Any solution that separates the control and data planes and provides a software-based API to enable automated configuration and management should probably be considered a bona fide SDN. This includes overlay technologies such as NSX and PlumGrid, Network Function Virtualization products from vendors such as Palo Alto Networks, Check Point, and Trend Micro (among others), and even switch hardware that is managed using OpenFlow.
In my opinion, and it’s just that, my own opinion, technologies like NSX make the most sense in virtualized data centers. There are a number of reasons why. For example, the simplicity with which one can create new network topologies that include L2-7 features entirely in software is a tremendous enabler. There’s no longer a need to make changes to top-of-rack switches to trunk VLANs for multiple tenant workloads. That process alone can take days or weeks to complete, as network engineers have to allocate subnets, create routing for those subnets, create trunk configurations on a cluster of hosts, and finally create port groups on the virtual switches of the virtualization stack. All of these steps become unnecessary with virtual networks. Then there’s the discussion around outages and what causes them. Most network outages occur due to human error when making configuration changes. With overlay technologies such as NSX, the goal is to severely limit the number of changes required to the underlay, or physical, network components. The idea is to get the underlay set up and performant, then leave it alone. All application-related networking changes are then made in the overlay network. When a change made to an application-specific overlay causes a problem or outage, the problem is contained to that one application rather than the entire network. And because these overlay configuration changes can all be automated, human error is minimized or eliminated.
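The manual provisioning steps described above (allocate a subnet, set up routing, trunk VLANs, create port groups) are exactly the kind of sequence that overlay automation collapses into a single validated operation. A minimal sketch, with an invented supernet and naming scheme, of what such automation might look like:

```python
import ipaddress

# Toy tenant-network allocator illustrating how automation replaces the
# hand-edited subnet/routing/trunk/port-group workflow. The 10.20.0.0/16
# supernet and naming scheme are invented for illustration.

class TenantNetworkAllocator:
    def __init__(self, supernet="10.20.0.0/16", prefix=24):
        # Carve fixed-size tenant subnets out of one supernet up front.
        self._free = list(ipaddress.ip_network(supernet).subnets(new_prefix=prefix))
        self._allocated = {}

    def provision(self, tenant, tier):
        """Allocate a subnet and derive consistent names in one step,
        refusing duplicates instead of silently misconfiguring."""
        key = (tenant, tier)
        if key in self._allocated:
            raise ValueError(f"{tenant}/{tier} already provisioned")
        subnet = self._free.pop(0)
        self._allocated[key] = subnet
        return {
            "tenant": tenant,
            "subnet": str(subnet),
            "gateway": str(next(subnet.hosts())),  # first usable host
            "port_group": f"pg-{tenant}-{tier}",   # deterministic naming
        }

alloc = TenantNetworkAllocator()
print(alloc.provision("acme", "web"))
# → {'tenant': 'acme', 'subnet': '10.20.0.0/24', 'gateway': '10.20.0.1', 'port_group': 'pg-acme-web'}
```

Because the allocator is the single source of truth for subnets and names, the fat-finger opportunities of four separate manual configs are reduced to one code path that can be reviewed and tested.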
[1] OpenFlow is a protocol that allows a server to tell network switches where to send packets. In a conventional network, each switch has proprietary software that tells it what to do. OpenFlow is designed to work with any compatible switch from multiple switch vendors.
[2] ASIC stands for Application-Specific Integrated Circuit, a chip that performs one very specific compute function. Network switches commonly employ ASICs for switching functions, which are very compute intensive.
Ron Flax
Vice President
Chief Technology Officer
@ronflax