April 24th, 2008
It is a common industry misconception that Throughput should be used as the measure of network efficiency. Throughput of a network is the number of packets exchanged over the network. It is easy to measure, as it is equivalent to the rate of packets over a link.
Goodput of a network can be defined as the number of packets exchanged between the applications using the network. It does not include TCP re-transmitted packets, which are counted in the throughput calculation.
In a congested network, one can find Throughput of 90% with Goodput of 30%. That is, most of the packets switched in the network are re-transmitted. As packets are re-transmitted due to packet loss, this typically indicates insufficient buffer space in the switching system in front of the congested links.
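To make the distinction concrete, here is a minimal Python sketch. The function name `link_efficiency` and the packet counts are hypothetical, chosen only to reproduce the 90% / 30% case above. It assumes both figures are expressed as a fraction of the link's packet capacity over the same interval.

```python
def link_efficiency(capacity_pkts, sent_pkts, retransmitted_pkts):
    """Return (throughput, goodput) as fractions of link capacity.

    Throughput counts every packet seen on the link; goodput counts
    only the packets that are not TCP re-transmissions.
    """
    if capacity_pkts <= 0:
        raise ValueError("capacity_pkts must be positive")
    if not 0 <= retransmitted_pkts <= sent_pkts:
        raise ValueError("retransmitted_pkts must be between 0 and sent_pkts")
    throughput = sent_pkts / capacity_pkts
    goodput = (sent_pkts - retransmitted_pkts) / capacity_pkts
    return throughput, goodput


# Hypothetical counts: 900 of 1000 possible packets were sent,
# and 600 of those 900 were re-transmissions.
throughput, goodput = link_efficiency(1000, 900, 600)
print(f"throughput={throughput:.0%} goodput={goodput:.0%}")
```

With these illustrative counts, two out of every three packets on the link are re-transmissions, so a link that looks busy delivers little useful data to the applications.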
Goodput is discussed in a paper written by Robert Morris called “TCP Behavior with Many Flows”. Morris argues that re-transmitted packets have a substantial and adverse impact on the network. The limits start to appear when the number of active TCP flows exceeds the switch buffer as measured in packets. In other words, throughput is only good when the packet loss rate due to queue overflow is low. Each lost packet consumes network resources before it is dropped, contributing to lowered efficiency in other parts of the network. Sufficiently high packet loss rates also cause long and unpredictable delays in time-out-based protocols such as TCP.
It is difficult to get into mathematics and calculations in a blog post, but if you are interested in further information and analysis of data center networks, you can find it in a recent white paper we published on this subject.
Posted in Ori Aruj (ori at dunenetworks.com) | No Comments »
December 14th, 2007
We would like to briefly discuss latency numbers and the ways they are measured.
Some vendors measure and list latency based on the results of tests done under no load. That is, they send a single packet or a few packets from one port to another port in the system while the rest of the ports are idle. As a result, one can find quotes of sub-1us latency in tests published by switch vendors. Measuring the latency of that same switch under a slightly different load and traffic pattern might yield results of up to 600us (as also evidenced by the IBM Zurich Research Labs paper). Therefore, many today believe that these no-load numbers are irrelevant for analyzing real network performance.
It seems that to analyze the performance of a real network, one should measure latency under loads ranging from zero to 100% and create a latency graph. A typical traffic pattern representing a real-life network environment should be full-mesh. In full-mesh, traffic is uniformly distributed from each port to all other ports in the system.
Both the absolute latency value and its variance under different loads in this latency graph are important factors in predicting computing and network performance.
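The measurement approach described above can be sketched in Python as follows. This is only an illustration: the helper names `full_mesh_matrix` and `latency_graph` are hypothetical, and the latency samples are placeholder values, not measurements from any switch.

```python
from statistics import mean, pvariance


def full_mesh_matrix(num_ports, load):
    """Offered load from each ingress port to each egress port.

    `load` is the fraction of line rate each port injects (0.0 to 1.0),
    spread uniformly over all other ports (no traffic to itself).
    """
    if num_ports < 2:
        raise ValueError("full-mesh needs at least two ports")
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    share = load / (num_ports - 1)
    return [[0.0 if src == dst else share for dst in range(num_ports)]
            for src in range(num_ports)]


def latency_graph(samples_by_load):
    """Map each offered load to (mean latency, latency variance)."""
    return {load: (mean(samples), pvariance(samples))
            for load, samples in sorted(samples_by_load.items())}


matrix = full_mesh_matrix(4, 0.9)

# Placeholder latency samples in microseconds, keyed by offered load.
graph = latency_graph({0.0: [1.0, 1.0], 0.5: [2.0, 4.0]})
for load, (avg, var) in graph.items():
    print(f"load={load:.0%} mean={avg}us variance={var}")
```

Under this traffic pattern every egress port also receives the same total load as each ingress port injects, so no single port is oversubscribed by the test itself.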
Posted in Ori Aruj (ori at dunenetworks.com) | No Comments »
November 20th, 2007
In this blog post, we would like to evaluate the use of limited-buffer switches (i.e. XAUI switches) as potential solutions for the changing requirements of the emerging data center switching market.
The data center market is clearly growing large enough to justify its own set of requirements, implemented in a new breed of product lines expected in 2008. These requirements emerged from the cross-pollination of Enterprise and HPC switches and include high throughput and lossless operation, reduced latency, dense and low-cost 10GE port machines, and high-speed 40GE/100GE interfaces.
To address this need, we have recently noticed a number of switch introductions based on Ethernet “XAUI switch” crossbar devices. The Ethernet XAUI switch device was originally designed for pizza box desktop applications. In the recently introduced systems, it is being cascaded in a CLOS structure in order to build larger switching systems. These new designs suffer from two main issues: (i) limited buffering and (ii) a problematic switching scheme.
We believe that this design, although relatively easy to implement, might lead to severe network issues. IBM Zurich Research Labs and Cisco seem to agree. Cisco shared their opinion earlier this year, and a blog posting named The Fallacy of Wire-Rate Switching in the Data Center provided an additional point of view on this issue.
IBM Zurich Research Labs published a paper in which they analyzed a similar design, that is, a CLOS interconnect using limited-buffer crossbar devices and a static routing scheme. Throughput, latency and lossless performance were evaluated and were far from optimal.
We believe that designing a system with static routing over a CLOS interconnect is an outdated approach. Cisco and Juniper are known to be using fabrics with dynamic routing. Even system vendors using internally static routing fabrics try to work around it using additional ASIC devices that wrap the internal limited fabric to address some of the issues presented above (See Woven).
As for buffering, it seems that a substantial buffer is required as long as TCP remains one of the key protocols in use, and even for non-TCP traffic.
Dune recently released a white paper that discusses these items in more detail. Send me an email if you are interested in reading more about it (ori at dunenetworks.com).
Posted in Ori Aruj (ori at dunenetworks.com) | No Comments »
November 7th, 2007
A little bit of background. XAUI is an important interface for 10 Gigabit Ethernet component and system implementers. It provides the low pin count and long board trace lengths that system vendors require to drive down port costs. XAUI supports 10Gbps by using four 3.125Gbps SerDes lanes, each lane encoding data with an 8B/10B code. The XAUI standard was created to reduce XGMII 10GE’s 72 pin solution to 16 pins, enabling higher density switching chips and optical transceivers.
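The lane arithmetic can be checked with a short Python sketch. The function name `payload_rate_gbps` is hypothetical; the only inputs are the figures quoted above (four lanes, 3.125Gbps per lane, 8B/10B coding, which carries 8 data bits in every 10 line bits).

```python
def payload_rate_gbps(lanes, lane_rate_gbps, data_bits=8, coded_bits=10):
    """Usable data rate after line-coding overhead is removed."""
    return lanes * lane_rate_gbps * data_bits / coded_bits


xaui_rate = payload_rate_gbps(lanes=4, lane_rate_gbps=3.125)
print(f"XAUI payload rate: {xaui_rate} Gbps")  # prints 10.0 Gbps
```

The aggregate line rate is 12.5Gbps, and the 20% coding overhead brings the usable rate down to the 10Gbps that the interface has to carry.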
Recently, we observed that the market is driving the support of multiple 10 Gigabit Ethernet ports to the limit. High levels of 10GE integration dictate that large numbers of XAUI interfaces be integrated in a single piece of silicon. This reaches a level where the silicon device is bound by its external interfaces. Many noticed that the number of links can easily be halved by using 6.25Gbps links, which are in any case embedded in all upcoming networking devices. The logic design supporting it is a simple mux/demux of the 4 x 3.125Gbps XAUI lanes into 2 x 6.25Gbps lanes. This paved the way to RXAUI.
Some people asked us: why RXAUI and not XFI?
Obviously, we believe that 10Gbps links are eventually the right way to go. However, we believe that going through an intermediate step is simple, does not contradict other schemes, and is welcome given today’s silicon reality.
Silicon to be introduced in 2008/9 typically has its technology selected in 2007. The present reality of silicon is:
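A rough Python sketch of the pin budget shows why halving the lane count matters. It assumes each lane uses one differential pair per direction (4 pins), which matches the 16-pin XAUI figure quoted earlier, and it assumes RXAUI keeps the same 8B/10B coding, since it is described as a mux/demux of XAUI. The function name `serdes_pins` and the 24-port device are hypothetical.

```python
PINS_PER_LANE = 4  # assumed: one differential pair each for transmit and receive


def serdes_pins(ports, lanes_per_port):
    """SerDes signal pins needed for a given number of 10GE ports."""
    return ports * lanes_per_port * PINS_PER_LANE


ports = 24  # hypothetical port count, for illustration only
xaui_pins = serdes_pins(ports, lanes_per_port=4)   # 4 x 3.125Gbps lanes
rxaui_pins = serdes_pins(ports, lanes_per_port=2)  # 2 x 6.25Gbps lanes

# Both interfaces carry the same payload once 8B/10B overhead is removed.
xaui_rate = 4 * 3.125 * 8 / 10
rxaui_rate = 2 * 6.25 * 8 / 10

print(f"XAUI:  {xaui_pins} pins, {xaui_rate} Gbps per port")
print(f"RXAUI: {rxaui_pins} pins, {rxaui_rate} Gbps per port")
```

For the same port count and the same per-port payload, the signal pin count drops by half, which is the constraint that binds a highly integrated device.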
1. XFI SerDes blocks are not widely available and the ones designed are too big to be integrated in large numbers into high scale ASIC solutions.
2. Most of the XFI blocks available do not operate well on long board traces, not to mention over backplane and fabric (they are not KR). This requires the ASIC vendor to use another type of SerDes link in the design.
3. ASIC vendors prefer to use a single type of SerDes in the device, minimizing ASIC design, characterization, debug and support issues. Many ASIC vendors use third-party SerDes IP and like to avoid multiple license/royalty payments.
4. Today’s FPGAs do not integrate XFI.
5. Last, RXAUI support is simple!
As a result, most ASSP vendors (PP, NP, TM, Fabrics) in design today do not integrate XFI and are using 6.25Gbps SerDes.
We expect that devices implementing RXAUI will be released starting in 2008, enabling highly dense 10GE systems for Enterprise and data center switches. We expect that vendors of ICs focused on the 10GE market, such as SFP+ Phy vendors and 10GBaseT Phy vendors, will lead the way.
BTW, the RXAUI spec that several vendors are using today is available on the Dune website.
Posted in Ori Aruj (ori at dunenetworks.com) | No Comments »
November 7th, 2007
This is the inaugural posting on The Switching Silicon Blog, and we hope it is the first of many to follow. We would like this posting to start off a series of useful and important discussions between experts in the fields of switching, traffic management and packet processing and the fields around them.
We encourage open discussion and sharing of all different views on a specific subject.
Our aim is to share concise blog postings that address points of interest in the industry, perhaps even create a bit of controversy that will get people interested and willing to contribute to the discussion.
Some of the blogs we find interesting are the Nyquist Capital blog and Cisco’s Data Center blog. We would love to hear about other blogs in this space that you think we will find of interest.
If you would like to propose or even author a blog posting, please email me (ori at dunenetworks.com).
Welcome!
Posted in Ori Aruj (ori at dunenetworks.com) | No Comments »