CCR Dell Cluster Achieves 70% of Peak Performance on LINPACK Benchmark


Release: November 1, 2002

Buffalo, NY. Updated every six months, the TOP500 list (www.top500.org) is the de facto standard ranking of supercomputers worldwide. Introduced by Prof. Jack Dongarra (University of Tennessee) in 1993, the list is rank ordered according to verifiable results of the LINPACK benchmark, a program that solves a dense system of linear equations. The intent of the LINPACK benchmark is to measure realistic peak performance (as opposed to theoretical peak performance) of the computer system under consideration, with a focus on large-scale computing, specifically a computer's ability to solve numerically intensive problems efficiently.
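
For readers unfamiliar with the benchmark, the short sketch below illustrates what LINPACK measures. It is written in Python for brevity (the actual benchmark codes are written in Fortran and C), uses a deliberately small, illustrative problem size, and converts the time to solve a dense system Ax = b into GFlop/s using the benchmark's nominal operation count of 2/3 n^3 + 2 n^2.

    # Illustrative only: time the solution of a dense linear system and
    # report GFlop/s using the LINPACK operation count (2/3 n^3 + 2 n^2).
    import time
    import numpy as np

    n = 2000                                   # small, illustrative size
    rng = np.random.default_rng(0)
    A = rng.random((n, n))
    b = rng.random(n)

    start = time.perf_counter()
    x = np.linalg.solve(A, b)                  # LU factorization + triangular solves
    elapsed = time.perf_counter() - start

    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    print(f"n = {n}: {elapsed:.3f} s, {flops / elapsed / 1e9:.2f} GFlop/s")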

The Center for Computational Research (CCR; www.ccr.buffalo.edu) at the University at Buffalo is the sixth largest supercomputing center in the world according to www.gapcon.com. As part of its bioinformatics initiative, CCR recently purchased a large cluster from Dell (www.dell.com). The cluster is composed of 300 Dell PowerEdge 2650 servers, each containing two Intel Pentium 4 processors running at 2.4GHz, 2GB of memory, and 146GB of disk. The nodes are connected via a high-performance, full-bisection-bandwidth (2 Gigabit/s) network from Myricom (www.myricom.com).

The cluster achieved a LINPACK score of 2004 GFlop/s (i.e., 2.004 TFlop/s), roughly 70% of its theoretical peak performance of 2880 GFlop/s. It is listed as the 22nd fastest machine in the world on the TOP500 list (www.top500.org) released at SC2002 (www.sc-2002.org) in November 2002, and is both the 4th academic supercomputer and the 4th Intel-based supercomputer on the list.
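
The quoted peak follows directly from the hardware counts above, under the assumption (not stated in the announcement itself) that each 2.4GHz processor can complete two double-precision floating-point operations per clock cycle:

    # Back-of-the-envelope check of the peak and efficiency figures, assuming
    # 2 double-precision flops per cycle per processor (e.g., via SSE2).
    nodes, procs_per_node = 300, 2
    clock_hz = 2.4e9
    flops_per_cycle = 2                        # assumption

    peak_gflops = nodes * procs_per_node * clock_hz * flops_per_cycle / 1e9
    print(f"Theoretical peak: {peak_gflops:.0f} GFlop/s")   # 2880 GFlop/s
    print(f"LINPACK efficiency: {2004 / peak_gflops:.1%}")  # roughly 70%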

A key to obtaining the highest possible LINPACK score on a given computer system is the computational kernel used in solving the large system of linear equations. The benchmark itself was run using HPL (www.netlib.org/benchmark/hpl), a portable and freely available implementation of the High Performance LINPACK benchmark. HPL relies on an underlying matrix-multiplication kernel, the optimization of which is critical to obtaining the best performance. For this kernel, the UB scientists used a highly tuned implementation developed by Kazushige Goto (Visiting Scientist, UT-Austin). This kernel currently achieves an extremely high level of performance relative to its competitors on the 32-bit Intel architecture found in CCR's Pentium 4 cluster. Details of the kernel can be found in the (recently submitted) paper at the author's site (www.cs.utexas.edu/users/flame/goto/).
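
The kernel in question is the double-precision matrix-matrix multiply (DGEMM) of the BLAS, where HPL spends most of its floating-point work. The rough illustration below calls whatever BLAS the local NumPy/SciPy installation is linked against (not necessarily Goto's kernel) and reports the sustained rate using the standard 2n^3 operation count:

    # Illustrative DGEMM timing; the BLAS actually invoked depends on how
    # NumPy/SciPy were built and is not necessarily the Goto kernel.
    import time
    import numpy as np
    from scipy.linalg.blas import dgemm

    n = 2000
    A = np.asfortranarray(np.random.rand(n, n))
    B = np.asfortranarray(np.random.rand(n, n))

    start = time.perf_counter()
    C = dgemm(alpha=1.0, a=A, b=B)             # C = A * B
    elapsed = time.perf_counter() - start

    flops = 2.0 * n**3                         # ~2 n^3 flops for n x n DGEMM
    print(f"DGEMM: {flops / elapsed / 1e9:.2f} GFlop/s")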

How useful is a 2 TFlop/s cluster? This cluster is not just a loosely connected network of Linux servers; it is a high-performance parallel computing engine capable of delivering on big science and engineering problems, as demonstrated by the high LINPACK score. This new capability will allow UB and affiliated researchers to pose, and solve, research questions that are well beyond the capability of small and mid-range computing facilities. So, while 10 scientists could each move forward doing evolutionary research on their own 30-node clusters, the same 10 scientists have the opportunity to do revolutionary science on a cost-effective 300-node cluster.