Assignment 1: Simple As 1, 2, 3, ...

The goal of this first project is to make sure you can access the system provided and use raw C code with system V shared memory to do something in parallel. Yeah, stone knives and bearskins.... Hopefully, this will not only get you thinking in terms of parallel vs. sequential execution, but also will help you realize that not everything is better in parallel.

To Begin

For this project, you are simply writing a program that uses fork() and a native x86 atomic xchg instruction (shared va System V shared memory) to impose a partial order on processes printing-out little messages.

The Point Of The Project

It's a trivial first project. Relax. The point is mostly that you can actually write and run a trivial C program, and secondarily that you learn a bit about the problems that flow from the generality of MIMD.

Functional Requirements

Your program is to simply do the somewhat-parallel equivalent of what the sequential program nseq.c does:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
  int nproc = atoi(argv[1]);
  int iproc;

  for (iproc=0; iproc<nproc; ++iproc) {
    printf("I am PE%d\n", iproc);
  }

  for (iproc=0; iproc<nproc; ++iproc) {
    printf("I am still PE%d\n", iproc);
  }

  return(0);
}

The catch is that I want you to have each of the logical processing elements (PEs) print its own message -- in other words, you must use fork to create a separate process for each PE requested. Easy, right? In fact, they can even use printf().

Well, not quite that easy. You see, unlike running on deterministic SIMD hardware, having each of these MIMD PEs printing its messages independently means you would very likely see the output ordered differently every time you run the program. Now, I don't care what order PEs print in... except every PE must print the "I am PE" message before any PE prints the "I am still PE" message. Technically, we call that a partial ordering.

Thus, you'll need to do some communications to order the printing. In effect, I'm asking you to implement a barrier synchronization of all PEs between printing the first and second messages. Of course, when all PEs are done, the program should terminate... but not nefore then. The last thing your program should do is have PE0 print the message "All done".

In fact, I'm going to give you the framework in the form of the somewhat-too-parallel program npar.c:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/shm.h>
#include <sys/wait.h>

volatile struct shared {
  int lock;
  /* OTHER SHARED STUFF GOES HERE... */
} *shared;

inline static int
xchg(register int reg, volatile int * volatile obj)
{
  /* Atomic exchange instruction */
__asm__ __volatile__ ("xchgl %1,%0"
                      :"=r" (reg), "=m" (*obj)
                      :"r" (reg), "m" (*obj));
  return(reg);
}


inline static void
lock(void)
{
  /* Atomic spin lock... */
  while (xchg(1, &(shared->lock))) ;  
}

inline static void
unlock(void)
{
  /* Unlock... */
  shared->lock = 0;
}

int
main(int argc, char **argv)
{
  register int i, shmid;
  register int iproc = 0;
  register int nproc = 0;

  /* Allocate System V shared memory */
  shmid = shmget(IPC_PRIVATE,
                 sizeof(struct shared),
                 (IPC_CREAT | 0600));
  shared = ((volatile struct shared *) shmat(shmid, 0, 0));
  shmctl(shmid, IPC_RMID, 0);

  /* Initialize... */
  shared->lock = 0;

  /* MAYBE SOMTHING FOR YOU TO INITIALIZE HERE? */

  /* Make the processes... */
  if ((argc != 2) ||
      ((nproc = atoi(argv[1])) < 1) ||
      (nproc > 8)) {
	fprintf(stderr, "%d is not a reasonable number of processes\n", nproc);
	exit(1);
  } else {
    /* Children are 1..nproc-1 */
    while (++iproc < nproc) {
      /* Fork a child */
      if (!fork()) break;
    }
    /* Parent is 0 */
    if (iproc >= nproc) iproc = 0;
  }

  printf("I am PE%d\n", iproc);

  /* DO SOMETHING CLEVER HERE */

  printf("I am still PE%d\n", iproc);

  /* Parent waits for all children to have died */
  if (iproc == 0) {
    for (i=1; i<nproc; ++i) wait(NULL);
    printf("All Done\n");
  }

  /* Check out */
  return(0);
}

Your task is simply to replace the RED-colored comments with your own synchronization stuff using the System V shared memory segment allocated above. Your code should use lock() and unlock() to ensure that any compound accesses to data in the shared memory are atomic. For example:

lock(); shared->thing = (shared->thing * 10) + iproc; unlock();

would ensure that shared->thing ends-up with the correct value despite the fact that a load, multiply, add, and store must function together as a single atomic operation (which hardware certainly does not directly implement).

What You Should Find

Remember, the idea in this project is not to enforce a complete ordering -- that would constrain events to happen sequentially, sacrificing parallelism. Even once you've modified the above code, you should see that the order in which PEs print will vary from one run to another.

In case you're wondering why output isn't mixed at the character level, it's because printf() does smart buffering that issues a single system write() call when it sees '\n' being printed to a device that the system believes is a terminal. You'll see very different behavior if you redirect the output of npar.c as it appears above to a file, because buffers will not be flushed until each process terminates. If you want to avoid this "when does the buffer flush?" ambiguity, you can simply cause an explicit flush by executing fflush(stdout); when you want to force the buffer to be written.

Due Dates

The recommended due date for this assignment is before class, Wednesday, September 18, 2019. This submission window will close when class begins on Friday, September 20, 2019. You may submit as many times as you wish, but only the last submission that you make before class begins on Friday, September 20, 2019 will be counted toward your course grade.

Note that you can ensure that you get at least half credit for this project by simply submitting a tar of an "implementor's notes" document explaining that your project doesn't work because you have not done it yet. Given that, perhaps you should start by immediately making and submitting your implementor's notes document? (I would!)

Submission Procedure

For each project, you will be submitting a tarball (i.e., a file with the name ending in .tar or .tgz) that contains all things relevant to your work on the project. Minimally, each project tarball includes the source code for the project and a semi-formal "implementors notes" document as a PDF named notes.pdf (this is following the same guidelines used for CPE480). In this project, your implementor's notes must describe the synchronization algorithm you used. The tarball also may include test cases, sample output, a make file, etc., but should not include any files that are built by your Makefile (e.g., no binary executables). For this particular project, name the C source file nbar.c.

Submit your tarball below. The file can be either an ordinary .tar file created using tar cvf file.tar yourprojectfiles or a compressed .tgz file file created using tar zcvf file.tgz yourprojectfiles. Be careful about using * as a shorthand in listing yourprojectfiles on the command line, because if the output tar file is listed in the expansion, the result can be an infinite file (which is not ok).

Your account is ... the alphanumeric ID you use with UK stuff, all uppercase
Your password is ... the thing I sent you in an email

Your section is EE599-002 (undergrad) EE699-002 (grad)


EE599/699 GPU & Multi-Core Computing