Lippis Report 153: Why Ethernet Will Be the Dominant Two-Tier High-End Data Center Network Fabric
In Lippis Report 151: A Two or Three Tier High-End Data Center Ethernet Fabric Architecture?, we detailed the new two-tier data center Ethernet fabric that is becoming conventional wisdom amongst business leaders of high-end data centers and cloud computing service providers. The networking industry is headed for a major innovation and competitive cycle fueled by a multi-billion-dollar addressable market for data center network fabrics. Over the last eighteen months, every major Ethernet infrastructure provider has announced or taken a position on two-tier network fabrics for high-end data centers. Companies such as Cisco, Arista Networks, Force10, Voltaire, HP/3Com, Juniper, Extreme, Brocade, BLADE Network Technology, et al. have announced network fabrics for data centers with two thousand or more servers, with or without storage enablement. In this Lippis Report Research Note, we review why Ethernet will be the network fabric of high-performance computing (HPC) and cloud computing deployments.
For high-end data centers, HPC plus private and public cloud computing networks connecting thousands of servers, a new set of requirements has emerged. Low latency and high performance are the two driving requirements. Yes, there are more, especially when the fabric needs to enable converged storage, but let’s focus on latency and performance for now. Traditional three-tier fabrics (server access, distribution and core), designed primarily for north-south, that is, client-server, traffic flows, utilized Spanning Tree Protocol (STP) and slower-speed Ethernet (100Mbps to 1Gbps). Thanks to web 2.0, mash-ups and social networking sites, east-west or server-to-server traffic flows have spiked, requiring networks to support both north-south and east-west flows.
As most network engineers know, STP was designed to avoid the loops that confuse Ethernet, which was originally designed as a bus topology. STP shuts down redundant links between switches to maintain a single loop-free path. Therefore, connecting access switches to distribution switches utilizing STP required network engineers to over-subscribe the links between switches, as only half of the bandwidth could be used. Oversubscription also created blocking of packets between endpoints. To avoid this design, nearly every major switch manufacturer offered link aggregation, that is, the ability to shut off STP and aggregate links between switches. While this was and is a benefit, the downside has been that vendors only offered the ability to aggregate two links, which still drove oversubscription and blocking.
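The oversubscription effect described above is simple arithmetic. The sketch below illustrates it with hypothetical port counts (48 x 1GbE server ports, two 10GbE uplinks), not figures from any vendor design:

```python
# Hypothetical sketch: how STP blocking inflates oversubscription.
# Assumed example: a leaf switch with 48 x 1GbE server ports and
# 2 x 10GbE redundant uplinks to the distribution layer.

def oversubscription_ratio(server_ports, server_speed_gbps,
                           uplinks, uplink_speed_gbps, usable_uplinks):
    """Ratio of downstream (server-facing) to usable upstream bandwidth."""
    downstream = server_ports * server_speed_gbps
    upstream = usable_uplinks * uplink_speed_gbps
    return downstream / upstream

# With STP, one of the two redundant uplinks is blocked:
with_stp = oversubscription_ratio(48, 1, 2, 10, usable_uplinks=1)  # 4.8:1
# With link aggregation, both uplinks forward traffic:
with_lag = oversubscription_ratio(48, 1, 2, 10, usable_uplinks=2)  # 2.4:1
print(with_stp, with_lag)
```

Even with both links aggregated, the fabric remains oversubscribed; eliminating blocking entirely requires matching uplink to downlink bandwidth, as in the two-tier designs below.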
Recently, industry players such as Cisco and Arista Networks have offered the ability to scale up aggregation of links from 16 to 32, while at the same time delivering multipathing, which allows packets to be forwarded across multiple links to arrive at their intended destination. Switch-processing capacity to support these massive inter-switch links has been increased too. These design changes, along with Ethernet’s innovation march, have ushered in the two-tier network design fabric option.
A two-tier fabric is designed with two kinds of switches: one that connects servers and a second that connects switches, creating a non-blocking, low-latency fabric. We use the term ‘leaf’ switch to denote server-connecting switches and ‘spine’ to denote switches that connect leaf switches. Together, a leaf and spine architecture creates the network fabric.
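The leaf/spine topology can be sketched in a few lines. This is a generic model of the architecture described above, not any vendor's implementation; the switch names and counts are illustrative:

```python
# Minimal sketch of a two-tier leaf/spine fabric: every leaf switch
# connects to every spine switch, forming a full mesh between tiers.

def build_fabric(num_leaves, num_spines):
    """Return the list of (leaf, spine) links in the fabric."""
    return [(f"leaf{l}", f"spine{s}")
            for l in range(num_leaves)
            for s in range(num_spines)]

links = build_fabric(num_leaves=4, num_spines=2)
# Any server reaches any other server in at most two switch hops
# (leaf -> spine -> leaf), so latency between endpoints is uniform.
print(len(links))  # 4 leaves x 2 spines = 8 links
```

The uniform two-hop path between any pair of servers is what gives the design its predictable latency, regardless of where a workload is placed.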
In late June 2010, Cisco announced its FabricPath Switching System (FSS) and its F-Series modules, which support 32 ports of auto-sensing 1/10GbE and are intended primarily for server access and aggregation. FabricPath provides a new level of bandwidth scale to connect Nexus switches and delivers a new fabric design option with unique attributes for IT architects and designers. FabricPath is an NX-OS innovation, meaning that its capabilities are embedded within the NX-OS network OS for the data center. FabricPath essentially is multipath Ethernet: a scheme that provides high throughput, reduced and more deterministic latency, and greater resiliency compared to traditional Ethernet.
FabricPath combines today’s layer 2, or Ethernet, networking attributes and enhances them with layer 3 capabilities. In short, FabricPath brings some of the capabilities available in routing into a traditional switching context. For example, FabricPath offers the benefits of layer 2 switching such as low cost, easy configuration and workload flexibility. What this means is that when IT needs to move VMs and/or applications around the data center to different physical locations, it can do so in a simple and straightforward manner without requiring VLAN, IP address or other network reconfiguration. In essence, FabricPath delivers plug-and-play capability, which has been an early design attribute of Ethernet. Further, the large broadcast domains and storms inherent in the layer 2 networks of the mid-1990s have been mitigated with technologies such as VLAN pruning, Reverse Path Forwarding, Time-to-Live, etc.
The layer 3 capabilities added to FabricPath deliver scalable bandwidth, allowing IT architects to build much larger layer 2 networks with very high cross-sectional bandwidth, eliminating the need for oversubscription. In addition, FabricPath affords high availability: it eliminates STP, which only allows one path and blocks all others, and replaces it with multiple paths between endpoints within the data center. This offers increased redundancy, as traffic has multiple paths by which to reach its final destination.
FabricPath employs routing techniques such as building a route table of the different nodes in a network. It possesses a routing protocol, which calculates paths that packets can traverse through the network. What is being added with FabricPath is the ability for the control plane, or the routing protocols, to know the topology of the network and choose different routes for traffic to flow. Not only can FabricPath choose different routes, it can use multiple routes simultaneously so traffic can span multiple routes at once. These layer 3 features enable FabricPath to use all links between switches to pass traffic, since STP, which would shut down redundant links to eliminate loops, is no longer used. This yields incremental levels of resiliency and bandwidth capacity, which is paramount as compute and virtualization density continue to rise, driving scale requirements up.
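One common way multipath fabrics spread traffic across several equal-cost routes while keeping each flow's packets in order is flow hashing. The sketch below illustrates the general idea only; it is not Cisco's FabricPath implementation, and the addresses and path names are hypothetical:

```python
# Illustrative flow-hashing sketch for multipath forwarding: hash a flow's
# endpoints to pick one of several equal-cost next hops. All links carry
# traffic, yet packets of a single flow always take the same path,
# preserving in-order delivery.
import hashlib

def multipath_next_hop(src, dst, paths):
    """Deterministically map a (src, dst) flow onto one of the paths."""
    digest = hashlib.sha256(f"{src}-{dst}".encode()).digest()
    return paths[int.from_bytes(digest[:4], "big") % len(paths)]

paths = ["spine1", "spine2", "spine3", "spine4"]
hop_a = multipath_next_hop("10.0.0.1", "10.0.1.9", paths)
hop_b = multipath_next_hop("10.0.0.1", "10.0.1.9", paths)
assert hop_a == hop_b  # the same flow always maps to the same spine
```

Different flows hash to different spines, so in aggregate the fabric uses all redundant links that STP would have blocked.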
Designing A 160 Tbps Data Center Fabric
As an example of how multi-link aggregation, the elimination of STP, high switching capacity and 10GbE connections create a highly scalable two-tier layer 2 Ethernet fabric, we use Cisco’s FSS and its F-Series module in the Nexus 7000. The following details the design of a 160 Tbps switching fabric with FabricPath and the F-Series module for high-performance data centers using Cisco’s Nexus 7000 switches. This architecture can support over 8,000 servers connected at 10GbE, or 4,000 servers dual-homed at 10GbE, with the attributes of being non-blocking, low latency (5 microseconds) and high bandwidth, plus reliability and simplicity of workload movement.
To build a 160 Tbps two-tier fabric, thirty-two Nexus 7018 switches populated with F-Series 10GbE modules would connect servers. These thirty-two switches are leaf switches. Each leaf chassis provides 256 10GbE ports to connect servers and another 256 10GbE ports to connect into spine switches. Each leaf is directly connected to each spine with sixteen FabricPath ports at 10GbE, for a total of 256 uplink 10GbE ports per leaf switch. There are sixteen spine switches, each accepting 512 10GbE FabricPath ports. A single leaf chassis connects 256 10GbE ports into the spine layer, equaling approximately 2.5 Tbps. Multiplying the thirty-two leaves’ contributions into the fabric yields 80 Tbps, and as Ethernet is full duplex, the total fabric switching capacity is 160 Tbps across all thirty-two leaf chassis. Because each leaf’s 256 server-facing 10GbE ports (approximately 2.5 Tbps) are matched by 256 uplink ports, sixteen FabricPath links to each of the sixteen spine switches, the fabric is non-blocking.
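The capacity arithmetic above can be reproduced in a few lines. All figures come from the text; note the exact products are 2.56, 81.92 and 163.84 Tbps, which the article rounds to 2.5, 80 and 160:

```python
# Reproducing the article's fabric-capacity arithmetic (1 Tbps = 1000 Gbps).
leaves = 32                    # Nexus 7018 leaf switches
spines = 16                    # spine switches
links_per_leaf_per_spine = 16  # 16 x 10GbE FabricPath links to each spine
port_speed_gbps = 10

uplink_ports_per_leaf = spines * links_per_leaf_per_spine        # 256 ports
per_leaf_tbps = uplink_ports_per_leaf * port_speed_gbps / 1000   # 2.56 Tbps
one_way_tbps = leaves * per_leaf_tbps                            # 81.92 Tbps
full_duplex_tbps = 2 * one_way_tbps                              # 163.84 Tbps

# Non-blocking check: each leaf's 256 server-facing ports are matched
# one-for-one by its 256 uplink ports into the spine layer.
server_ports_per_leaf = 256
assert server_ports_per_leaf == uplink_ports_per_leaf
print(full_duplex_tbps)
```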
As for layer 2 and layer 3 forwarding, the job of the spine is to forward packets from leaf switches at layer 2, creating a single switching tier between any pair of leaves. A key attribute of this architecture is that each set of 16-way FabricPath links uses Equal-Cost Multipathing (ECMP). 16-way FabricPath ECMP provides two benefits: 1) it delivers more paths for traffic to flow, which increases available bandwidth in the fabric, and 2) as the paths are distributed across all switches, route diversity distributes packet forwarding. In essence, 16-way FabricPath ECMP provides a very low latency, high bandwidth approach to supporting both north-south and east-west traffic flows simultaneously.
While the above is a Cisco deployment example, Arista’s new 7500 series of Ethernet switches supports 6 billion packets per second at wire speed. The 7500s can be configured into a massive two-tier network fabric thanks to their support of 32-port MLAG (Multi-Chassis Link Aggregation), affording the connection of 18,000 to 30,000 servers.
Ethernet continues to evolve. The IEEE recently ratified the 40 and 100 GbE standards, with vendors such as Force10, Cisco, Arista, Extreme, BLADE, Brocade, Voltaire, HP et al. announcing support and scheduling product delivery. While the above two-tier network examples provide the perspective of the large switch providers, below is the perspective of BLADE Network Technologies, a company focused on server connectivity.
BLADE Network Technologies believes that as Ethernet delivers new levels of speed and intelligence, it will be the dominant two-tier network fabric for high-end next-generation data centers.
For many applications, low latency is a key requirement, and latency is an area where two-tier networks excel. Studies of stock trading exchanges have shown that tens of milliseconds of delay in data delivery can represent a ten percent drop in revenues, and delays of even five microseconds per trade can cost hundreds of thousands of dollars. Industry-specific requirements for uncompressed data and end-to-end deterministic latency within tens of microseconds make attaining such performance even more difficult. These factors have combined to make raw switching speed a top priority, and today’s best-of-breed 10 Gigabit Ethernet switches can operate with under 700 nanoseconds of port-to-port latency while consuming a minuscule amount of power, equivalent to that of a standard light bulb.
As next-generation networks get flatter, driven by latency and bandwidth requirements, emerging Layer 2 technologies such as the IETF’s Transparent Interconnection of Lots of Links (TRILL) enable this trend. The idea behind TRILL is to replace spanning tree as the mechanism for finding loop-free trees within Layer 2 broadcast domains. Using a routing protocol to build forwarding trees within a Layer 2 broadcast domain provides the flexibility and efficiency to route Layer 2 traffic just as one would Layer 3 traffic, without the overhead associated with Layer 3 packet processing. TRILL will offer important features such as support for both broadcast and multicast, load splitting along multiple paths, support for multiple points of attachment, and no tangible delay in service after attachment.
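The core idea, replacing spanning tree with a shortest-path computation so every link stays usable, can be illustrated with a toy topology. This is a conceptual sketch, not the TRILL wire protocol or its IS-IS implementation; switch names are hypothetical:

```python
# Conceptual sketch of what a link-state approach buys over spanning tree:
# compute ALL shortest paths between two switches (unit link cost), so
# equal-cost paths can be load-split instead of one being blocked.
from collections import deque

def equal_cost_paths(graph, src, dst):
    """Breadth-first search returning every shortest path from src to dst."""
    paths, best = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # all remaining candidates are longer than the shortest
        node = path[-1]
        if node == dst:
            best = len(path)
            paths.append(path)
            continue
        for nbr in graph[node]:
            if nbr not in path:  # avoid loops
                queue.append(path + [nbr])
    return paths

# Two spines between two leaves -> two equal-cost paths, both usable.
topo = {"leaf1": ["spine1", "spine2"], "leaf2": ["spine1", "spine2"],
        "spine1": ["leaf1", "leaf2"], "spine2": ["leaf1", "leaf2"]}
print(equal_cost_paths(topo, "leaf1", "leaf2"))
# [['leaf1', 'spine1', 'leaf2'], ['leaf1', 'spine2', 'leaf2']]
```

Spanning tree would block one of the two spine paths; a routing protocol keeps both, which is precisely the load-splitting benefit the paragraph above describes.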
In the data center, bottlenecks are moving from the CPU and memory access to the I/O of the servers. Today’s multi-core servers can now sustain a great amount of traffic, requiring fast, flat networks, especially now that virtualization is widely deployed. Analysts have predicted that the 10G market will double year over year in 2010 and 2011. More servers using 10G increase the requirement for 40G and 100G in upstream networks. With 10G widely available and 40G coming online, Ethernet networks can enable data and storage traffic to share a single wire, using FCoE or iSCSI for example, and provide the raw speed that allows Ethernet, with its economies of scale, to supplant InfiniBand for HPC requirements.
The reason Ethernet will be the network fabric for high-end data center networks is that the vendor community continues to innovate and build upon this protocol. Ethernet’s innovations are many, going well beyond the obvious bandwidth increases from 10Mbps to 100Mbps, 1Gbps, 10Gbps, 40Gbps and 100Gbps. Link aggregation, multipathing and much more propel Ethernet’s relevance and suitability to new, challenging networking requirements.