Cloud Networking Designs by Lippis
A Two-Tier Network Model Emerges In The Cloud
So what's so different about the high performance data center and new cloud-computing environments that the three-tier model could be collapsed into two? In a word, it's "performance". In two words, it's "consistent performance" under heavy load. Performance demand is more critical in this market, with applications such as storage connect, high performance computing (HPC), video and extreme web 2.0 volumes requiring unique network attributes. Consider this: approximately 10 million servers are sold every year. In 2003, 20% of servers were sold into HPC and large public-facing Web sites, according to IDC. In 2009, that number will increase to 50% of server units sold into cloud and HPC environments. In short, high performance data center and new cloud-computing sites are becoming extremely server dense. Take server density on a scale we have not seen previously, add ultra application demand at load, and you have the requirements for a new kind of networking.
To deliver performance at scale and under load in a cloud computing data center equipped with tens to hundreds of thousands of servers delivering applications to millions of users, network performance has to be non-blocking, highly reliable and faultless, with low and predictable latency (sub-microsecond) for broadcast, multicast and unicast traffic types. In addition, the cloud network needs to be aware of application flows rather than rely on static addressing of devices, so that changes in applications, servers and storage can occur without re-configuring the network. Ten-gigabit Ethernet connections to servers, storage and between switches are the design direction now, which will scale up as the IEEE develops the 40GbE and 100GbE standards, expected to be ratified in 2010.
Meeting these requirements offers scale and optimization of servers, applications and storage elements, which allow millions of applications to randomly spin up and down with demand, much like the atomic behavior described by Brownian motion. In short, traffic profiles in this high performance and dense application environment are unpredictable. This is a key design criterion; that is, networks need to anticipate wild matrix flows with overlapping peaks and valleys and move these flows without dropping packets at microsecond latency between server and storage over the network.
Access Layer Becomes a Virtual Layer
So how is networking design changing to address these high performance requirements? First, the access layer in virtualized data centers is changing dramatically and disappearing as it is increasingly subsumed into servers, in the form of virtual switches and/or blade switches inside servers. A new wave of technology and intelligence is stretching the classic physical access layer into a new virtual access layer. In this new virtual access layer, switching in some cases takes place in a hypervisor virtual switching instance, and in other cases the network fabric is stretched to the rack level, ensuring a single point of management. Effectively, the classic access model of end-of-row (EoR), top-of-rack (ToR) and blade switching is evolving into a Distributed Access Fabric combining the advantages and benefits of the EoR and ToR models.
Secondly, network traffic in clouds is a matrix of overlapping flows, with web 2.0 and mash-ups driving massive server-to-server connections. Network latency becomes a fundamental limiting factor to application performance as the network becomes the bus connecting storage and computing. And as networking speeds increase to 40 Gbps, 100 Gbps and above, the boundaries between storage, networking and computing are being redefined, as virtualization is already starting to show.
Cloud Access and Cloud Core Make Up The Two-Tier Model
To accommodate these requirements, a two-tier network model is being considered, consisting of what I call a "Cloud Access" tier and a "Cloud Core" tier. The Cloud Access tier connects servers, while the Cloud Core consists of a series of non-blocking switches delivering mesh connectivity between non-blocking Cloud Access switches. The Cloud Core also connects storage and wide area services/routers to the cloud. Within both cloud tiers are switches that provide layer 2 and layer 3 services, giving the cloud architect the option of deploying all layer 2, all layer 3 or a hybrid, and a choice as to where to place the layer 2/layer 3 boundary. We reviewed cloud switches in Lippis Report 120 Research Note.
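The two-tier structure described above can be sketched in a few lines of Python. This is only an illustration, not a vendor design: the switch names, the tier sizes and the helper functions `build_two_tier` and `switch_path` are hypothetical, and the sketch assumes every Cloud Access switch has a link to every Cloud Core switch. It shows that a server-to-server flow crosses either one access switch or an access-core-access path, with no third tier involved.

```python
from itertools import product


def build_two_tier(num_access: int, num_core: int) -> dict[str, set[str]]:
    """Return an adjacency map where every access switch links to every core switch.

    Switch names (access0, core0, ...) are hypothetical labels for illustration.
    """
    links: dict[str, set[str]] = {}
    access = [f"access{i}" for i in range(num_access)]
    core = [f"core{j}" for j in range(num_core)]
    for a, c in product(access, core):
        links.setdefault(a, set()).add(c)
        links.setdefault(c, set()).add(a)
    return links


def switch_path(links: dict[str, set[str]], src: str, dst: str) -> list[str]:
    """List the switches a flow crosses between two access switches."""
    if src == dst:
        # Both servers hang off the same access switch: switched locally.
        return [src]
    # Any core switch adjacent to both access switches can carry the flow;
    # pick the first one alphabetically so the result is deterministic.
    shared_core = sorted(links[src] & links[dst])
    if not shared_core:
        raise ValueError(f"no core switch connects {src} and {dst}")
    return [src, shared_core[0], dst]


fabric = build_two_tier(num_access=4, num_core=2)
print(switch_path(fabric, "access0", "access0"))  # ['access0']
print(switch_path(fabric, "access0", "access3"))  # ['access0', 'core0', 'access3']
```

In a real fabric the choice among the shared core switches would be made by a load-balancing mechanism rather than by name order; the point here is only that the longest switch path is three devices.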
For example, layer 3 services may reside only in the Cloud Core or in both Cloud Access and Cloud Core, which is important for web 2.0 and mash-up based traffic flows. In this model there is no third tier through which traffic has to pass to accommodate server-to-server flows; traffic is either switched at Cloud Access or in the Cloud Core at less than 10 microseconds. Oversubscription needs to be carefully managed in a two-tier structure, ranging from 1.5:1 to 10:1 Access:Core.
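As a rough worked example of the Access:Core oversubscription ratio, the sketch below divides the server-facing bandwidth of one Cloud Access switch by its uplink bandwidth toward the Cloud Core. The port counts are hypothetical and not taken from any product mentioned here; the function names are likewise illustrative. Only the 1.5:1 to 10:1 range comes from the text.

```python
def oversubscription(server_ports: int, server_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of server-facing bandwidth to core-facing bandwidth on one access switch."""
    downstream = server_ports * server_gbps
    upstream = uplinks * uplink_gbps
    if upstream <= 0:
        raise ValueError("an access switch needs at least one uplink to the core")
    return downstream / upstream


def within_recommended_range(ratio: float, low: float = 1.5, high: float = 10.0) -> bool:
    """Check a ratio against the 1.5:1 to 10:1 Access:Core range cited in the text."""
    return low <= ratio <= high


# Hypothetical access switch: 40 x 10GbE server ports, 8 x 10GbE uplinks.
ratio = oversubscription(server_ports=40, server_gbps=10.0, uplinks=8, uplink_gbps=10.0)
print(f"{ratio:.1f}:1")                 # 5.0:1
print(within_recommended_range(ratio))  # True
```

A ratio of 1:1 would be fully non-blocking toward the core; higher ratios trade uplink cost against the risk of congestion when many servers on the same access switch burst at once.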
There are examples of a two-tier model in high performance data center applications. For example, the InfiniBand architecture describes a leaf and spine structure, which is also championed by Arista Networks. What is important about this market segment is that Ethernet switches based upon previous-generation ASICs and network operating system technologies may not be up to the performance task. Only two firms, Cisco and Arista, have developed new operating systems and hardware for this market.
While Cisco does not tout a two-tier architecture in its Data Center 3.0 program, its Nexus data center switches can clearly be configured in this form. For example, its high-end Nexus 7000 would occupy the Cloud Core while its Nexus 5000/2000 would occupy the Cloud Access tier. The Nexus 2000 provides GbE connections to servers while obtaining configuration and NX-OS services via 10GbE from the Nexus 5000 placed in end-of-row. The Nexus 2000 and 5000 may be two separate physical devices, but they are logically one, making up the Cloud Access tier. In this scenario the Nexus 2000 is a line extender, and I expect to see others introduce a similar approach, as it delivers the cabling efficiency of top-of-rack and the network management operational efficiency of end-of-row. The layer 2/layer 3 boundary resides in the Nexus 7000.
Arista Networks would deploy a series of its Arista 7148SX switches to construct the Cloud Core while having the option to deploy any of its three 10G switches in the Cloud Access tier, namely the 7148SX, 7184S, or 7124S. Arista's Extensible Operating System (EOS) is unique and purpose-built for self-healing resilience and open extensibility, designed specifically for cloud computing environments.
Over the next two quarters other networking companies will be announcing cloud-networking products, with most if not all based upon this two-tier model. Look for offerings from Force10, HP, Brocade and Juniper during 2009. Clearly there will be trailblazers and certain vertical market segments that will deploy the two-tier model sooner, with wider adoption after 2010 and into 2015. Also note that the two- and three-tier models will co-exist, with three-tier being the network architecture in building/campus networks and non-cloud/high performance data centers. But for the high-end cloud and high performance data centers, the two-tier model offers the required attributes of low latency, cost and packet throughput.