EE599/EE699 Using KLAT2

Assuming that you now have an account to log into, here's how you use KLAT2...

Logging Into KLAT2

You can't get there from where you are. You use KLAT2 in two steps, the first of which is logging into a machine behind the same firewall as KLAT2. The machine you should use is kaos.ee.engr.uky.edu, and the only way you should be able to log into it is via the ssh protocol (e.g., an ssh client, or even a Java-enabled WWW browser running MindTerm).

The second step is to actually get to KLAT2... which really means the KLAT2 node of your choice. This must be done using rlogin or rsh, for example:

rlogin k00

Now you're on KLAT2, but you should still be in the same directory you were in on kaos, because that's all cross-mounted using NFS (the Network File System).

If somebody else has KLAT2's nodes busy chugging away, you might want to come back and try again later. The easiest way to find out is simply:

uptime

Hopefully, all three numbers printed after "load average" are 0.00; if they're closer to 1.00, somebody else is probably running something. Another clue is the user count: if uptime reports more than 1 user, you can find out who else is on the system with:

who
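If you check nodes often, the uptime test above can be scripted. The following is a minimal POSIX-shell sketch, not a tool installed on KLAT2: the function names load_1min and looks_idle are hypothetical, the parsing assumes Linux-style "load average: a, b, c" output, and the 0.5 cutoff is an arbitrary illustrative threshold rather than any official policy.

```shell
# Hypothetical helpers (not installed on KLAT2): judge a node's load from
# uptime-style text on stdin, assuming Linux "load average: a, b, c" output.

# Print the 1-minute load average, e.g. "0.00".
load_1min() {
    sed -n 's/.*load average: *\([0-9.]*\).*/\1/p'
}

# Succeed if the 1-minute load is below a threshold (default 0.5, an
# arbitrary illustrative cutoff); return 2 if the text cannot be parsed.
looks_idle() {
    l1=$(load_1min)
    [ -n "$l1" ] || return 2
    awk -v load="$l1" -v max="${1:-0.5}" \
        'BEGIN { exit !(load + 0 < max + 0) }'
}

# Example on a node:
#   uptime | looks_idle && echo "looks free" || who
```

Passing a different first argument (e.g. looks_idle 0.1) tightens or loosens the cutoff.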

People who don't play nice (i.e., hog the machine for excessive periods) may find their processes killed and their accounts disabled. Tim Mattox and Hank Dietz (in that order) are the "enforcers" to talk to if you have a problem.

Running LAM-MPI

The first step is to "boot" the "Local Area Multicomputer" software which LAM-MPI uses to run jobs. This software is run per-user, so you need to run it even if somebody else is already running it on KLAT2. Also, because KLAT2's nodes don't have disks, you should make sure you don't leave it running when you log out... more on that later. To boot LAM-MPI from any node of KLAT2, type:

lamboot -b -v -l /klat2c.grp

You'll see a few messages (well, ok, one per node and then some), after which LAM-MPI is patiently waiting for you to give it a program to run. Of course, you can't actually see it waiting unless you do a ps -ax and find lamd running (which, incidentally, is also how you find out that somebody else has left lamd running) or, somewhat more prettily, type:

lamnodes

And you'll get a list of nodes you are running lamd on. If you don't have lamd running, lamnodes will tell you that too.

Compile your program using mpicc instead of cc or gcc. Notice that the LAM-MPI header file to include is just #include "mpi.h", not some LAM-dependent header name or a strange path; LAM-MPI is installed for all users on KLAT2. Unfortunately, this is where you need to remember that your files are NFS cross-mounted: a node may still be holding a stale cached copy of a freshly compiled executable. To force NFS to update every node's copy of the executable file blah, type:

lamexec N md5sum blah

In case you're wondering, the N part says to run this non-MPI program on all nodes in the multicomputer. If you don't see the same number printed by md5sum for each node, try running it again (although that shouldn't happen).
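Comparing the checksums by eye gets tedious across many nodes. A minimal sketch of an automated check follows; the function name same_checksum_everywhere is hypothetical, and it assumes that lamexec passes each node's md5sum output through to your terminal, one "checksum  filename" line per node.

```shell
# Hypothetical helper: succeed only if every non-blank line on stdin starts
# with the same checksum (md5sum prints "checksum  filename").
same_checksum_everywhere() {
    awk 'NF { seen[$1] = 1 }
         END {
             n = 0
             for (sum in seen) n++
             exit !(n == 1)    # 0 only if exactly one distinct checksum
         }'
}

# Example:
#   lamexec N md5sum blah | same_checksum_everywhere || echo "copies differ; run it again"
```

Empty input counts as a failure, so a lamexec that printed nothing will not be mistaken for a successful sync.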

Now you're ready to run blah:

mpirun -O N blah

The -O part tells LAM-MPI that the multicomputer is homogeneous, so it skips converting all data to and from a machine-independent format... KLAT2 is quite homogeneous, so you definitely don't want LAM-MPI wasting time on needless conversions.
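If you want a small program to try the compile, sync, and run sequence on before using your own code, the sketch below writes a conventional MPI "hello world" in C and builds it with mpicc. The function name make_hello and the file names hello.c and hello are illustrative choices; the C code uses only standard MPI calls (MPI_Init, MPI_Comm_rank, MPI_Comm_size, MPI_Finalize), and -O2 is an ordinary compiler flag that mpicc, as a compiler wrapper, should pass through.

```shell
# Illustrative helper: write a minimal MPI program and compile it with mpicc.
make_hello() {
    cat > hello.c <<'EOF'
#include <stdio.h>
#include "mpi.h"   /* plain "mpi.h", no LAM-specific path needed */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
EOF
    mpicc -O2 -o hello hello.c
}

# Example, after lamboot:
#   make_hello && lamexec N md5sum hello && mpirun -O N hello
```

Each process should print one line naming its rank, so the number of lines tells you how many processes mpirun started.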

You can mpirun things as many times as you like without restarting the system using lamboot, but when you are ready to log out, please type:

lamhalt

This keeps your copy of LAM-MPI from wasting memory on KLAT2's diskless nodes.
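Because forgetting lamhalt is easy, especially when a run fails or you interrupt it, one option is to wrap a whole session in a function that always halts LAM on the way out. This is a sketch, not a KLAT2-provided script: the function name run_lam_job is hypothetical, and it simply strings together the lamboot, lamexec, mpirun, and lamhalt commands described above, using a shell EXIT trap so that lamhalt runs however the job ends.

```shell
# Hypothetical wrapper: boot LAM, sync and run one program, always halt LAM.
# Usage: run_lam_job blah
run_lam_job() {
    prog=$1
    (
        lamboot -b -v -l /klat2c.grp || exit 1
        trap 'lamhalt' EXIT          # runs when this subshell exits
        trap 'exit 1' INT TERM       # turn Ctrl-C/TERM into a normal exit so EXIT fires
        lamexec N md5sum "$prog"     # make every node read the current NFS copy
        mpirun -O N "$prog"          # homogeneous nodes: skip data conversion
    )
}
```

Since this boots and halts LAM once per call, it suits a single run; for many runs in one session, typing the commands by hand avoids rebooting LAM each time.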