The Aggregate's `nodescape` Utility

Many tools can monitor the status of a cluster's nodes, and quite a few of them have graphical displays. However, those displays are generally quite artificial and in no way represent the physical layout of the nodes. In contrast, nodescape borrows from our Senscape concepts to color-tint an arbitrary image to show at a glance both the relative values of attributes and how recently that status has been updated. The image being tinted can be an artificially-created abstract one or it can be an actual photograph of the system, thus making the correspondence between the logical status and the physical nodes obvious.

Physically-Correct Cluster Status Images

The creation of a physically-correct status display starts with a photograph, or perhaps several photographs stitched into one. However, there is a little preparatory work needed to use a photo with nodescape.... Here are the basic steps:

Begin by making a good-quality digital photograph, in color or black & white, of the machine you'll be monitoring. The photo should expose the physical structure of the machine, so you may want to open front doors on rack mounts to make the individual boxes visible. There are no restrictions on the spatial properties of the photo, so feel free to make artistic use of perspective, fisheye lens distortion, etc. This base image should have enough resolution to be suitable for any display you might use; all our master images are at least 3000x3000 pixels.
Load the image into your favorite image editing tool (gimp works nicely) and do whatever editing you desire. You can freely add textual labels, logos, or other annotation. It is probably wise to avoid pure white and pure black for the areas that will be tinted by nodescape; lower contrast will make the tinting more visible. Write the image out as an 8-bits/color P6-format PPM file. For example, the current base image for NAK looks like:
In addition to the base image, nodescape needs a key image that tells it which pixels belong to which machine. The key image must be in P5 PGM format, a graymap in which pixels valued 0 correspond to node 0, those valued 1 belong to node 1, and so forth. The background should be pure white (255). This image is most easily created by selecting the region for each machine in sequence and coloring each appropriately. Note that this key fully determines where color tinting will be applied, so the top of the "k" in the label "nak" is white in the key to avoid coloring it even though it overlaps the image of one of the nodes. NAK is also an interesting example because the nodes are not racked in an obvious numerical order, but according to an order designed to minimize the number of cables between racks. The key image for NAK looks like:
Although not strictly necessary, you may want to produce lower-resolution images with resolution precisely matching that of the intended LCD panel. That will speed up the display processing. If you do this, be careful about how you scale the key image. Do not use interpolation of any kind on the key image, because a blended value between two node numbers would name some other node: scale by taking the value of the nearest pixel, which gimp calls interpolation "None."
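
To see why the key must be scaled without interpolation, here is a minimal Python sketch of nearest-pixel scaling. It is only an illustration, not code from nodescape: the function name scale_key_nearest and the list-of-rows representation of the key are our assumptions. Averaging the neighboring node values 0 and 2 would yield 1, which is a different node entirely; copying the nearest pixel can never invent a value.

```python
def scale_key_nearest(key, new_w, new_h):
    """Scale a key image by copying the nearest source pixel.

    key is a list of rows of gray values (node numbers, 255 = background).
    This representation is an assumption made for the sketch.
    """
    old_h = len(key)
    old_w = len(key[0])
    return [
        [key[(y * old_h) // new_h][(x * old_w) // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


# Tiny 4x4 key: node 0 at upper left, node 2 at upper right, white below.
key = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]

small = scale_key_nearest(key, 2, 2)
print(small)  # every value in small also appears in key
```

Every value in the scaled key is copied from the original, so the set of node numbers can shrink but never gain a bogus entry.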

There are a multitude of other tricks that can come in handy. For example, I think having the non-node portions of the base image somewhat blurry makes a nicer display. This can be done by selecting the white portion of the key image and simply blurring that region on a copy of the base image.

Using `nodescape`

Once you've got key and base images, there really isn't much to using them with nodescape. For example, creating continuously-updating images of various attributes of NAK is done by:

$ nodescape knak.pgm bnak.ppm

As UDP status messages arrive, nodescape will create and update a separate PPM status image named after each attribute for which it has been sent data. The updates are performed by keeping all the relevant images mapped into nodescape's memory using mmap(), and work is performed only as new packets arrive or age-step intervals expire without new packets, so nodescape has surprisingly little overhead.
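
The in-place update idea can be sketched in Python with the standard mmap module. This is a rough illustration of the technique, not nodescape's actual code; the helper names write_ppm and set_pixel and the file name loadavg.ppm are made up for the example. The point is that once the PPM is mapped, changing a pixel is just a three-byte store at a computed offset, with no need to rewrite the file.

```python
import mmap
import os
import tempfile


def write_ppm(path, width, height, rgb):
    """Write an 8-bit P6 PPM and return the offset of the first pixel byte."""
    header = b"P6\n%d %d\n255\n" % (width, height)
    with open(path, "wb") as f:
        f.write(header + rgb)
    return len(header)


def set_pixel(mapped, data_offset, width, x, y, color):
    """Overwrite one RGB pixel in place through the mapping."""
    pos = data_offset + 3 * (y * width + x)
    mapped[pos:pos + 3] = bytes(color)


# Start from a 2x2 mid-gray image (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "loadavg.ppm")
offset = write_ppm(path, 2, 2, bytes([128] * 12))

# Map the file and change only the pixel at x=1, y=0.
with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as m:
        set_pixel(m, offset, 2, 1, 0, (255, 0, 255))
        m.flush()

with open(path, "rb") as f:
    contents = f.read()
```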

The creation of color-tinted versions of the base image uses the overlay pixel coloring formula from gimp:

pixel = (base / MAX) * (base + ((2.0 * tint) / MAX) * (MAX - base));
if (pixel < 0) pixel = 0; else if (pixel > MAX) pixel = MAX;

This formula is applied separately to each of the R, G, and B color channels, with MAX set at the maximum pixel value (typically 255). As seen in the sample at the top of this page, the result is a very credible tinting that lets details of the base image show through nicely.
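
For a feel of what the formula does, here is a small Python transcription applied per channel. The function names are ours, and truncating the result to an integer is an assumption; the source only gives the formula and the clamp. Note that base values of 0 and MAX are left unchanged whatever the tint, which is why pure black and pure white areas don't show the tinting.

```python
MAX = 255  # maximum pixel value for 8-bit channels


def overlay_channel(base, tint, max_value=MAX):
    """Apply the overlay formula to one channel and clamp to [0, max_value]."""
    pixel = (base / max_value) * (
        base + ((2.0 * tint) / max_value) * (max_value - base)
    )
    if pixel < 0:
        pixel = 0
    elif pixel > max_value:
        pixel = max_value
    return int(pixel)  # truncation is an assumption of this sketch


def overlay_pixel(base_rgb, tint_rgb):
    """Apply the formula separately to the R, G, and B channels."""
    return tuple(overlay_channel(b, t) for b, t in zip(base_rgb, tint_rgb))


# Mid-gray base tinted magenta: red and blue brighten, green darkens.
print(overlay_pixel((128, 128, 128), (255, 0, 255)))
```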

So, how do we determine the tint color? Well, it's a little complicated:

Initially, any attribute image begins by writing a copy of the base image. However, this copy is immediately mapped and tinted, so it is only a copy for a moment....
Tinting is never applied for any pixel corresponding to a white pixel in the key image. Thus, these pixels will remain copies of the base pixels.
A node for which no data has ever been recorded is tinted solid magenta. This color is used because it is highly visible and, as a mix of red and blue, cannot occur as a pure color in a spectrum. Nodes that have never checked-in are thus easily spotted.
In the interest of simplicity, nodescape learns the range of values for each attribute by noting the minimum and maximum values seen. However, this implies that at least the first node to check in will set both the minimum and maximum. This special case is handled by assigning a tint of green whenever there is no difference between the minimum and maximum.
A node for which new (or sufficiently recent) data has been recorded and the value range is known is given a "pure" color from the spectrum in blue-to-green-to-red order. The spectrum is approximated by a piecewise linear mapping, but appears visually fairly smooth. Yes, we know this spectrum is really backward, but red is a more familiar indication of problems, which usually occur at higher values.
The nodescape tool is part of the Senscape.Org family; thus, it not only shows attribute values, but also their certainty or age. Normally, senscapes "gray out" old or unreliable data; however, for nodescape, old data usually means a sick node, so we want to make it stand out from the background of the image. Thus, pixels of nodes with old data are probabilistically tinted either magenta or blue-to-green-to-red. The fraction tinted magenta slowly increases to a maximum value as the data ages, but some pixels keep their spectrum tint so that the last known value is still discernible. You can see this effect for node 62 (the node next to the lower right corner) in the image at the top of this page.
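
The rules above can be pulled together in a short Python sketch. Treat it as one plausible reading rather than nodescape's actual code: the breakpoint at the middle of the range, the aging parameters (fresh_for, ramp, max_fraction), and all of the names are assumptions, since the page does not give the exact numbers.

```python
import random

MAGENTA = (255, 0, 255)
GREEN = (0, 255, 0)


class AttributeRange:
    """Learns the range of an attribute from the values seen so far."""

    def __init__(self):
        self.lo = None
        self.hi = None

    def note(self, value):
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)


def spectrum(value, rng):
    """Piecewise-linear blue-to-green-to-red color for value within rng."""
    if rng.lo is None:
        return MAGENTA  # nothing has ever been recorded
    if rng.hi == rng.lo:
        return GREEN  # range not yet known: special case
    t = (value - rng.lo) / (rng.hi - rng.lo)
    if t < 0.5:  # blue fades into green (assumed breakpoint)
        return (0, round(510 * t), round(255 * (1 - 2 * t)))
    return (round(255 * (2 * t - 1)), round(510 * (1 - t)), 0)


def magenta_fraction(age, fresh_for=10.0, ramp=60.0, max_fraction=0.75):
    """Fraction of a node's pixels tinted magenta; all parameters assumed."""
    if age <= fresh_for:
        return 0.0
    return min(max_fraction, max_fraction * (age - fresh_for) / ramp)


def pixel_tint(value, rng, age, rand=random.random):
    """Pick the tint for one pixel of a node; value None means never seen."""
    if value is None:
        return MAGENTA
    if rand() < magenta_fraction(age):
        return MAGENTA  # old data: this pixel is flagged
    return spectrum(value, rng)  # this pixel still shows the last value
```

Because max_fraction stays below 1, even a very stale node keeps some spectrum-tinted pixels, which is what keeps the last known value readable.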

The one issue with nodescape is that the output images are always PPM files, and programs like WWW browsers often don't understand how to display a PPM. The PBM tools, ImageMagick, etc., can be used to convert such an image to nearly any format desired -- and PNG is particularly appropriate -- but that will be a separate step. For example, a WWW server might have to use a CGI to actively convert from a PPM to a PNG when a status image is fetched. Alternatively, the PPM image could be regularly translated when it is copied to a WWW server. In any case, nodescape doesn't do this for you.

Using `epacsedon` (yeah, that's `nodescape` backwards)

The generic client for the nodescape server is called epacsedon. It's not a very complex program, really just a way to send status information to nodescape via UDP. However, running a client on every node is not always easy and any overhead is multiplied by the fact that every node has to do the same, so there are a variety of design decisions made to facilitate this use. The command line format is:

epacsedon server_address {node_number} ((property_name {property_value}) | @{delay})+

OK, that looks nasty. It really isn't too bad -- once you know what it does:

server_address: The hostname of the nodescape server
node_number: The node_number that this information is to be associated with. Explicitly specifying this allows status update messages to come from places other than the node whose status is being reported, which can be useful. However, usually each node reports its own status. If that's what you're doing, and the node hostname starts with an alphabetic character and has the decimal node number immediately after the alphabetics, epacsedon will parse the hostname to get the node number... so you don't specify a node number at all.
property_name {property_value}: The name of the property to report. This property_name must start with an alphabetic character. You can make up your own names, but epacsedon actually has a few built in. The loadavg is built in, using the last minute average. Similarly, any name that ends in : is essentially built in; what actually happens is that sensors -u is invoked and the output parsed for the first value matching the name given. For example, temp1_input: will get the first temperature reading from within the lm_sensors data, which is usually the lowest-numbered processor core temperature. Also note that the image generated by nodescape will be in a filename that doesn't have the : at the end, e.g., temp1_input.ppm. If the property isn't built in, the numeric property_value is sent as a double floating-point value. If the property isn't built in and no value is given, the value 0.0 is sent.
@{delay}: A single invocation of epacsedon can send many status updates. The @ arguments allow control of timing between such updates. For example, @2.7 would insert a delay of approximately 2.7 seconds before processing the next part of the command line. The @ by itself is special; it doesn't set a delay, but rather enables infinite repeating of the complete sequence of operations specified on the command line. Note that repeating doesn't make much sense unless the properties are all built in, because any other values will be unchanged across all repeats.
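
Two of the conveniences described above, getting the node number from the hostname and pulling a named value out of sensors -u output, are easy to picture in code. The Python below is only a sketch of the described behavior, not the epacsedon source; the function names are hypothetical, and the sample text is an illustrative stand-in for real sensors -u output, which varies by machine.

```python
import re


def node_number_from_hostname(hostname):
    """Return the decimal number right after the leading alphabetics, or None."""
    m = re.match(r"[A-Za-z]+(\d+)", hostname)
    return int(m.group(1)) if m else None


def first_sensor_value(sensors_output, name):
    """Return the first value whose label matches name (e.g. 'temp1_input:')."""
    for line in sensors_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(name):
            try:
                return float(stripped[len(name):])
            except ValueError:
                continue  # a label line with no number after it
    return None


def image_filename(name):
    """Name of the status image for a property; the trailing ':' is dropped."""
    return name.rstrip(":") + ".ppm"


# Illustrative sample only; real output depends on the hardware.
SAMPLE = """\
k10temp-pci-00c3
Adapter: PCI adapter
temp1:
  temp1_input: 41.500
  temp1_max: 70.000
"""

print(node_number_from_hostname("nak62"))          # node number for host nak62
print(first_sensor_value(SAMPLE, "temp1_input:"))  # first matching reading
print(image_filename("temp1_input:"))              # image written by nodescape
```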

It is highly likely that epacsedon will soon integrate a copy of helpme so that it can issue audio error messages when any property it checks is out of the normal range. The catch is that to do that it needs to know what the normal range is....

Other Random Uses

By now, you've probably realized that nodescape really has nothing to do with cluster supercomputers per se, but is really a generic way to dynamically tint areas of an image to show properties. For example, it can be used to color an abstract graph of computer network properties, show traffic conditions on a multitude of road sensors, etc.

Author Contact Info

If you have any questions or comments, contact:

Professor Hank Dietz, James F. Hardymon Chair in Networking
College of Engineering
Electrical and Computer Engineering Department
453 Anderson Hall
(Office 469 Anderson Tower, Lab 672 Anderson Tower)
Lexington, KY 40506-0046

Office Phone: (859) 257 4701
Lab Phone:    (859) 257 9695
Fax :         (859) 257 3092
Email:        hankd@engr.uky.edu
Home URL:     http://aggregate.org/hankd/

This page is: http://aggregate.org/NODESCAPE/

