FNN: Flat Neighborhood Networks

Welcome to the home of FNN documents and software! Since the press releases on our clusters KLAT2 and KASY0, which have Flat Neighborhood Networks, many of you have been asking us for more information, access to the GA (genetic search algorithm) we developed to design FNNs, etc. This is where everything will be posted....

What is a FNN?

A FNN is a network that guarantees single-switch latency and full link bandwidth per PE (Processing Element) pair on a wide variety of parallel communication patterns. It does so using a flat topology of PEs connected to switches, where each switch forms a neighborhood of tightly coupled PEs. The switches are nominally connected only to PEs and not to each other. The key is that each PE in a FNN is connected to several switches, and is thus a member of several neighborhoods. The overall effect is that PE pairs can communicate with single-switch latency and full link bandwidth across the entire machine, even though each individual neighborhood does not encompass the entire machine.

There are two classes of FNNs based on which pairs of PEs have guaranteed latency and bandwidth:

Universal FNNs guarantee the latency and bandwidth for all PE pairs.
Sparse FNNs guarantee the latency and bandwidth for PE pairs involved in selected communication patterns.

Publications On FNNs

Publications on Sparse FNNs (SFNN) have their own page. The following publications discuss what we now call Universal FNNs:

H. G. Dietz and T. I. Mattox, "Inside The KLAT2 Supercomputer: The Flat Neighborhood Network & 3DNow!," Ars Technica, June 22, 2000.
H. G. Dietz and T. I. Mattox, "Compiler Techniques For Flat Neighborhood Networks," 13th International Workshop on Languages and Compilers for Parallel Computing 2000 (LCPC00), IBM T.J. Watson Research Center, Yorktown Heights, New York, USA, August 11, 2000. Preprints are available as PS and PDF versions for personal use only.
Thomas Hauser, Timothy I. Mattox, Raymond P. LeBeau, Henry G. Dietz and P. George Huang, "High-Cost CFD on a Low-Cost Cluster," Gordon Bell Price/Performance Finalist and regular paper in SC2000, Dallas, Texas, USA, November 4-10, 2000. Preprints are available as 13MB PS and 31MB PDF versions for personal use only. (There also is a 4MB PDF version that some PDF viewers don't like.)
H. G. Dietz and T. I. Mattox, "KLAT2's Flat Neighborhood Network," to appear in the Proceedings of the Extreme Linux track of the 4th Annual Linux Showcase (ALS2000), Atlanta, GA, USA, October 12, 2000. Preprints are available as 7.4MB PS and 2.3MB PDF versions for personal use only.

To Create Your Own Universal FNN Designs

The following form can be used to submit design parameters to a simplified version of the FNN Design GA (Genetic search Algorithm) that uses a CGI interface. This CGI is a somewhat improved version of our original FNN design CGI; it creates Universal FNN designs very quickly, so it's easy to play with different parameters. We hope to be posting software tools for Sparse FNN design in the near future....

In case you are wondering where the 96, 48, 3 parameters came from, they are the parameters that generate the network design used in Bunyip, from the Australian National University, Canberra. Although our KLAT2 was clearly the first machine designed to use a FNN, our Australian friends just happened to build a network that is a universal FNN at about the same time that we built ours. Thanks to our GA, KLAT2's network design offers substantially higher bandwidths at a lower cost (e.g., Bunyip's bisection bandwidth could be much higher -- try 96, 32, 4), but both machines clearly demonstrate the superior price/performance of FNNs. Incidentally, the Bunyip folks also are using SWAR technology; but they are hand-coding SSE while we've been using a variety of tools with 3DNow!. Small world, eh?

If you're just looking for a brief overview of the Universal FNN technology, read on. There is a separate page for the Sparse FNN technology.

Does The World Need Yet Another Network Topology?

One would think (well, we did ;-) that the latest round of Gb/s network hardware would have made the design of a high-bandwidth cluster network a trivial exercise. However, that isn't the case when the prices are considered:

When we invented FNNs in 2000, the cheapest of the Gb/s NICs available were PCI Ethernet cards priced under $300 each; now they are $50-$100. Prices have continued to drop. Prices on custom high-performance NICs (e.g., Myrinet) start at close to $1000 and have not been going down.
As of late 2002, 48-port 100Mb/s Fast Ethernet switches have dropped to less than $25/port. Gigabit Ethernet switches are starting to follow the same trend, with $100/port pricing in sight for switches up to about 48 ports. Wider switches with the needed performance are unlikely to become cheap in the near future. Thus, spanning a larger cluster would require a hierarchical switch fabric using multiple layers of switches, yielding higher cost, higher latency, and significantly lower bisection bandwidth (unless you use a "fat tree" or other scheme, which adds still more expense -- especially because cheap layer 2 Ethernet switches don't support those topologies).

In summary, the cost of the "obvious" Gb/s network for KLAT2's 66 single-processor nodes was OVER 30 TIMES the cost of the network we built for KLAT2. In fact, to match KLAT2's bisection bandwidth, a network built using Gb/s hardware would have cost even more. Gigabit Ethernet is getting cheaper, but obvious topologies just are not competitive with FNN performance. So, if you've got tons of money that you have to spend immediately, you can impress your friends by buying expensive custom network hardware that can use an obvious topology and still be competitive with FNN performance. Otherwise, read on.... ;-)

The Flat Neighborhood Concept

When no solution seems to work, it is time to rephrase the problem. We wanted to have the minimum possible latency between any pair of PCs. Clearly, for KLAT2's 66 nodes, you couldn't put 65 NICs in each machine to implement a direct connection... the next best thing would be to have just one switch delay between any two PCs. The problem then becomes that a 66-way switch that can handle communication at full wire-speed is not cheap.

In Fall 2002, you can buy a wire-speed 48-way 100Mb/s switch for about $800. When KLAT2 was built, it was $500 for 32 ports. If we ignore KLAT2's two "hot spare" processors, we could use 32 dual-processor PCs and channel bonding of multiple NICs (http://www.beowulf.org/). However, most dual-processor PCs have memory bandwidth problems and, even if we wanted to adopt that solution, dual-Athlon PCs are not yet widely available.

The "Flat Neighborhood" network topology came from the realization that it is sufficient for each PC to share at least one switch with every other PC -- the PCs do not all have to share the same switch. A switch defines a local network neighborhood, or subnet. If a PC has several NICs, it can belong to several neighborhoods. For two PCs to communicate directly, they simply use NICs that are in a neighborhood that the two PCs have in common. Coincidentally, this flat, interleaved arrangement of the switches results in spectacular bisection bandwidth -- approaching the same bisection bandwidth that we would have gotten if we had wire-speed switches that were wide enough to span the entire cluster! We even get the benefit that, because four NICs are available for simultaneous use in each PC, we bypass some of the I/O serialization that using IP would imply with a single Gb/s NIC (or channel-bonded set of NICs) under Linux.
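The switch-sharing requirement implies some simple counting limits, which the following Python sketch checks. It is only an illustration: the function names and parameter order are assumptions made here, and the tests are necessary conditions, not sufficient ones, so passing them does not prove that a wiring pattern exists. The 64-node, four-NIC figures come from the text; the 32-port switch width is assumed from the switch pricing quoted above.

```python
from math import ceil, comb


def min_switches(num_pcs: int, nics_per_pc: int, ports_per_switch: int) -> int:
    """Fewest switches that provide one port for every NIC."""
    return ceil(num_pcs * nics_per_pc / ports_per_switch)


def passes_counting_bounds(num_pcs: int, nics_per_pc: int, ports_per_switch: int) -> bool:
    """Necessary (not sufficient) conditions for every PC pair to share a switch."""
    # Through one NIC a PC meets at most (ports - 1) other PCs on that switch,
    # so all of its NICs together must be able to reach the other PCs.
    reach_ok = nics_per_pc * (ports_per_switch - 1) >= num_pcs - 1

    # A switch with p PCs attached covers at most C(p, 2) pairs; even with
    # every switch full, the total must cover all C(num_pcs, 2) pairs.
    switches = min_switches(num_pcs, nics_per_pc, ports_per_switch)
    pairs_ok = switches * comb(ports_per_switch, 2) >= comb(num_pcs, 2)

    return reach_ok and pairs_ok


if __name__ == "__main__":
    # (PCs, NICs per PC, ports per switch); the 32-port width is an assumption.
    for params in [(6, 2, 4), (64, 4, 32), (66, 1, 32)]:
        print(params, min_switches(*params), passes_counting_bounds(*params))
```

With one NIC per PC and 32-port switches, 66 PCs fail the first bound, which is the counting version of the observation that a single affordable switch cannot span the cluster.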

A Ludicrously Simple Example

The above example shows how one could construct a FNN for 6 PCs using just two NICs/PC and three 4-port switches. Note that every PC has at least one single-switch-latency path to every other PC; some PC pairs have more than one such path.
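A small example like this can be checked mechanically. The Python sketch below uses one possible wiring for 6 PCs, two NICs per PC, and three 4-port switches; the wiring table and the switch names A, B, and C are assumptions chosen for illustration and may not match the figure exactly.

```python
from itertools import combinations

# Hypothetical wiring: switch name -> PCs plugged into it (4 ports each).
WIRING = {
    "A": {0, 1, 2, 3},
    "B": {0, 1, 4, 5},
    "C": {2, 3, 4, 5},
}


def shared_switches(wiring, pc_a, pc_b):
    """Return the sorted names of the switches both PCs are plugged into."""
    return sorted(name for name, pcs in wiring.items() if pc_a in pcs and pc_b in pcs)


def nic_count(wiring, pc):
    """Number of NICs a PC needs, i.e. how many switches it is plugged into."""
    return sum(1 for pcs in wiring.values() if pc in pcs)


def is_universal(wiring):
    """True if every pair of PCs has at least one switch in common."""
    all_pcs = sorted(set().union(*wiring.values()))
    return all(shared_switches(wiring, a, b) for a, b in combinations(all_pcs, 2))


if __name__ == "__main__":
    print(is_universal(WIRING))           # True
    print(shared_switches(WIRING, 0, 1))  # ['A', 'B'] -- two single-switch paths
    print(shared_switches(WIRING, 0, 4))  # ['B'] -- exactly one
```

In this wiring every PC uses exactly two NICs, every switch uses all four ports, and pairs such as PC 0 and PC 1 have two single-switch paths while pairs such as PC 0 and PC 4 have only one.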

No Free Lunch

Unfortunately, Flat Neighborhood Networks introduce several interesting new problems. These problems are:

How to design a Flat Neighborhood Network. Unfortunately, only very small FNN wiring patterns can be designed by hand. We created a genetic algorithm (GA) that can search for an appropriate wiring pattern, also optimizing secondary properties of the network for specific types of communication traffic.
How to physically wire the network. This may seem like a trivial concern, but flat neighborhood designs do not necessarily have good wiring locality properties and, in the general case, are not regular (i.e., often have no symmetry).
How to perform basic routing between PCs. Most network hardware and software assumes a variety of network properties that FNNs violate. For example, if you ask PC #0 for the network address of PC #1, you do not get the same answer that you get if you ask PC #2 the same question.
How to take full advantage of extra bandwidth that is available for some (but not all) communication paths. What is needed is very similar to channel bonding; however, the standard Linux support for channel bonding works in a way that is incompatible with the flat neighborhood topology.

Well, those are not easy problems to solve, but we managed to solve all but the last within a week or so. The last one is a pain, which translates to "we are writing the paper on it." We will post more details, including the GA, soon....
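To make the basic routing problem concrete, here is a minimal Python sketch of per-PC address tables. It is not the routing software used on KLAT2: the 6-PC wiring, the rule of one subnet per switch, and the `10.0.<switch>.<pc + 1>` address scheme are all assumptions made up for this illustration.

```python
# Hypothetical wiring: switch (subnet) number -> PCs plugged into it.
WIRING = {
    0: {0, 1, 2, 3},
    1: {0, 1, 4, 5},
    2: {2, 3, 4, 5},
}


def nic_address(switch, pc):
    """Made-up addressing: each PC has one address per switch it is wired to."""
    return f"10.0.{switch}.{pc + 1}"


def address_table(wiring, src):
    """Map every other PC to the address that src should use to reach it.

    For each destination, the first switch (lowest number) shared with src
    is chosen, so the answer depends on which PC is asking.
    """
    table = {}
    for switch in sorted(wiring):
        pcs = wiring[switch]
        if src not in pcs:
            continue
        for dst in sorted(pcs):
            if dst != src and dst not in table:
                table[dst] = nic_address(switch, dst)
    return table


if __name__ == "__main__":
    print(address_table(WIRING, 2)[1])  # 10.0.0.2 (via switch 0)
    print(address_table(WIRING, 4)[1])  # 10.0.1.2 (via switch 1)
```

In this assumed wiring, PC 2 and PC 4 get different addresses for PC 1 because they reach it through different switches, which is the property that conventional name lookup does not expect. Using the extra shared switches for additional bandwidth is not attempted here.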

KLAT2's Flat Neighborhood Network

As you can see above, KLAT2 is large enough that none of the network design, construction, and routing issues are trivial. In fact, we ran the GA on Odie -- a cluster of four 600MHz Athlons that we built last year. However, here's the good news:

Around $8,100 for the complete network, including cables
Single-switch latency on all pairwise communications
Bisection bandwidth of over 25Gb/s -- better than the Gb/s options and essentially the same as if we had wire-speed full-cluster-width switches with 4-way-channel-bonding, which would have been very expensive
The FNN can be tuned to specific communication patterns; KLAT2's was partially tuned for the row/column patterns used in ScaLAPACK
We designed the FNN for 64 nodes and added one additional switch for multicast; however, by placing the two "hot spares" on that switch, they are able to logically replace any faulty machine with only a little extra latency and no rewiring
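The GA itself is not reproduced here. As a rough stand-in for the kind of search involved, the Python sketch below uses plain hill climbing rather than a genetic algorithm: it starts from a balanced wiring and swaps switch assignments between two PCs whenever that does not increase the number of PC pairs left without a common switch. All names and parameters are hypothetical, the secondary optimization for specific traffic patterns is omitted, and the search may stop before finding a universal design.

```python
import random
from itertools import combinations
from math import ceil


def uncovered_pairs(assign):
    """Count PC pairs that have no switch in common."""
    return sum(1 for a, b in combinations(range(len(assign)), 2)
               if not (assign[a] & assign[b]))


def initial_wiring(num_pcs, nics, num_switches):
    """Deal switch ports out cyclically so switch loads stay balanced."""
    return [{(pc * nics + j) % num_switches for j in range(nics)}
            for pc in range(num_pcs)]


def hill_climb(num_pcs, nics, num_switches, ports_per_switch, steps=20000, seed=0):
    """Return (assign, uncovered): assign[pc] is the set of switches pc uses."""
    if nics > num_switches:
        raise ValueError("a PC cannot use more NICs than there are switches")
    if ceil(num_pcs * nics / num_switches) > ports_per_switch:
        raise ValueError("not enough switch ports for all NICs")

    rng = random.Random(seed)
    assign = initial_wiring(num_pcs, nics, num_switches)
    best = uncovered_pairs(assign)

    for _ in range(steps):
        if best == 0:
            break
        a, b = rng.sample(range(num_pcs), 2)
        x = rng.choice(sorted(assign[a]))
        y = rng.choice(sorted(assign[b]))
        # Skip moves that would wire a PC to the same switch twice.
        if y in assign[a] or x in assign[b]:
            continue
        # Swap one port: a moves from x to y, b moves from y to x.
        # Each switch keeps the same number of PCs, so capacity still holds.
        assign[a].remove(x); assign[a].add(y)
        assign[b].remove(y); assign[b].add(x)
        score = uncovered_pairs(assign)
        if score <= best:
            best = score
        else:
            # Undo a move that made things worse.
            assign[a].remove(y); assign[a].add(x)
            assign[b].remove(x); assign[b].add(y)
    return assign, best


if __name__ == "__main__":
    wiring, left = hill_climb(num_pcs=8, nics=3, num_switches=6, ports_per_switch=4)
    print(left, wiring)
```

A returned count of zero means the wiring is a universal design for those parameters; any other value means the search ran out of steps, which is one reason a more capable search such as a GA is needed at KLAT2's scale.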
