Assignment 4: M-P-I, Open-M-P, N-B-o-d-y!

The N-body problem is simply to compute the trajectory of objects in space as driven by gravitational forces. Your task is simply to convert the sequential program nb.c into a parallel program using both MPI and OpenMP parallelism. To be specific, your program should carve the problem among 4 shared-nothing MPI PEs, each of which is actually 2 shared-everything OpenMP PEs -- for a total of 8 PE. I leave it up to you how you call IPROC and NPROC when you sort-of have two different values for them simultaneously. The display update code (call to display() in main) can remain purely sequential.

PS: Yes, the project title is to be sung to the Micky Mouse theme. ;-)

What To Do

Nbody is about N-squared computation of forces for each timestep. We'll make the results visible by having you update an output PPM file showing where each object is. The output file, nbody.ppm, will simply be mapped into one PE using mmap(). Thus, simply changing the mapped image will change the file. You can watch the animated progress simply by:

display -update 2 nbody.ppm

ImageMagick's display program knows how to read PPM files, and the above command simply says to keep refreshing the display every 2 seconds. It will not be the most fluid display, but pretty cool that you can do animation just by changing a memory-mapped PPM file, right?

The nb.c program itself is pretty straightforward, although it does need to be compiled with -lm to provide the sin, cos, atan2, and sqrt functions. Once compiled, it can be run like, for example:

./nb 32 2 nbody.ppm

This would mean there are 32 bodies, the PPM output is done by mapping nbody.ppm (and is sized by that file's initial content), and the PPM file is updated every 2 seconds. Feel free to redefine MAXBODIES to something greater than 1024 and recompile if you want to try really large numbers of bodies.

Inside nb.c, you'll see a loop in main. Executing clear_forces(), compute_forces(), compute_velocities(), and compute_positions() is where you can get parallel execution. It may take a little rewriting, but the idea is to have the work of updating the various body forces, velocities, and positions be done in parallel. The forces are a little tricky to compute in parallel, but veocities and positions are fundamentally computed without communication. Notice that macros are defined for accessing the information about each body; you can simply redefine these macros to change the data structure. For example, you could make the old and new x values not be interleaved by removing double x[2] from bodyType and adding:

double	x[2][MAXBODIES];
#define	X(B)	x[old][B]
#define	XN(B)	x[old^1][B]

In the above way, you can change the false sharing implied by your parallel data accesses.

Note that I am not expecting you to parallelize the display update. In fact, you can simply make display updates infrequent enough to be irrelevant, e.g., once every 120s. For comparative timing tests, you might also want to replace while (1) with a for executing a fixed number of iterations and give srand() a constant argument rather than time(0).

Due Dates, Submission Procedure, & Such

You will be submitting source code, a make file, and a simple HTML-formatted "implementor's notes" document called a4.html, which should explicitly describe what you did and the performance issues you dealt with. Please arrange your make file so that the executable file, which should be called a4, is compiled when one simply types make. All of the things needed should be packed into a single tar file for submission by a command like tar -zcvf a4.tgz files -- and that tar is all you submit. Do not include executable files, or other things that can be generated by make, in the tar you submit.

For full consideration, your project should be submitted no later than the final exam timeslot, Thursday, May 7, 2015, at 8AM. Submit your project tar file here:

Cluster/Multi-Core Computing