In this project, you will write a C program using LAM-MPI to run on a cluster supercomputer. The input to your program will consist of pairs of integers: key value and datum. Your task is simply to sort the pairs by key value into increasing order and output the sorted result. The goal is simply to achieve the best possible speedup. Your sort need not be stable, i.e., it need not maintain elements with duplicate keys in their original order.
The input to your program will consist of a series of 32-bit binary integers (int values) in a completely raw binary format. If the data fit in memory, then, given the following C struct:
typedef struct { int key, datum; } item;
item a[MAXSORT];
Your program could read the entire key-datum pair array from the standard input by:
read(0, &a, (MAXSORT * sizeof(a[0])));
Of course, you should ensure that only one processor is reading the standard input. Your program's output is sent to the standard output in a very similar way:
write(1, &a, (MAXSORT * sizeof(a[0])));
To further simplify things, you will be given a few values for important constants:
MAXSORTMAXSORT is always a power of 2.
NPROCMAXMEMMAXSORT is
selected for 599 students to be such that all key-datum pairs
fit in memory of a single system, whereas 699 students will find
that MAXSORT may be too large. Note that the value
of MAXMEM is not necessarily the same as the system
limit enforced by Linux, but your program plus data memory use
should be kept within this bound.
The program sorthelp.c is provided to help you with this project. This helper code is actually four different programs, which you compile by:
gcc sorthelp.c -o binary -O2 ln -f binary check ln -f binary generate ln -f binary view
The above compilation trick gives you four different names for the same executable file... which looks at the name it was executed with in order to determine its function. This same trick is used for lots of unix programs; for example, gzip, gunzip, and zcat are actually the same program. Here's an overview of the functionality of the project helper:
Normally, all you should need is generate and check. However, the other programs allow you to do rather neat things in terms of using various unix tools. For example, the standard text-based unix sort utility can be used via a pipe like:
./generate 1000 |./view 1000 |sort -n |./binary 1000 |./check 1000
In other words, a very low-performance, sequential, simulation of your program for the case of MAXSORT being 1,000 is simply:
./view 1000 |sort -n |./binary 1000
Test your program using LAM on a single-processor system, e.g., in the 577 FPAT lab. Good test parameters (for cases that are small enough for you to examine by hand) are:
#define MAXSORT 1024 #define NPROC 4 #ifdef EE599 #define MAXMEM 32768 #else #define MAXMEM 4096 #endif
Notice that the memory space per logical processor is plenty for EE599, but only about half what you'd like for EE699.
Once you are confident that your code is functional, request access to KLAT2. At that time, you'll be given a specific test case to benchmark on KLAT2. Aside from performance tweaks, your code shouldn't need any changes to run on KLAT2. You do that using this set of instructions with the following parameters:
#define MAXSORT (24*1024*1024) #define NPROC 64 #ifdef EE599 #define MAXMEM (256*1024*1024) #else #define MAXMEM (128*1024*1024) #endif
So, the undergraduate and graduate students have the same problem to solve, but graduate students have less memory... Alzhiemer's kicks-in pretty early, eh?
What you will hand-in for this project is a brief "implementor's
notes" file (psort.html) and source and make files to
construct and run an executable named psort using LAM
MPI. In your write-up, be sure to document the speed you get
sorting the test case given to you to run on KLAT2.