The Aggregate's nodescape Utility
There are a multitude of tools that allow the status of a cluster's
nodes to be monitored, and quite a few of them have graphical displays.
However, the graphical displays are generally quite artificial, not
in any way representative of the physical layout of nodes. In contrast,
nodescape borrows from our Senscape concepts to color-tint an
arbitrary image to show at a glance both relative values of attributes
and how recently that status has been updated. The image being tinted
can be an artificially-created abstract one or it can be an actual
photograph of the system, thus making the correspondence between the
logical status and the physical nodes obvious.
Physically-Correct Cluster Status Images
The creation of a physically-correct status display starts with a photograph,
or perhaps several photographs stitched into one. However, there is a little
preparatory work needed to use a photo with nodescape. Here are
the basic steps:
-
Begin by making a good-quality digital photograph, in color or black &
white, of the machine you'll be monitoring. The photo should expose the
physical structure of the machine, so you may want to open front doors on
rack mounts to make the individual boxes visible. There are no
restrictions on the spatial properties of the photo, so feel free to make
artistic use of perspective, fisheye lens distortion, etc. This base image
should have enough resolution to be suitable for any display you might use;
all our master images are at least 3000x3000 pixels.
-
Load the image into your favorite image editing tool (gimp works
nicely) and do whatever editing you desire. You can freely add textual labels,
logos, or other annotation. It is probably wise to avoid pure white and pure
black for the areas that will be tinted by nodescape; lower contrast
will make the tinting more visible. Write the image out as an
8-bits/color P6-format PPM file.
For example, the current base image for NAK looks like:
-
In addition to the base image, nodescape needs a key image that tells
it which pixels belong to which machine. The key image must be in P5 PGM format,
a graymap in which pixels valued 0 correspond to node 0, those valued 1 belong
to node 1, and so forth. The background should be pure white (255). This image
is most easily created by selecting the region for each machine in sequence and
coloring each appropriately. Note that this key fully determines where color
tinting will be applied, so the top of the "k" in the label "nak" is white in
the key to avoid coloring it despite it overlapping with the image of one of
the nodes. NAK also is an interesting example because the nodes are not racked
in an obvious numerical order, but according to an order designed to minimize the
number of cables between racks. The key image for NAK looks like:
-
Although not strictly necessary, you may want to produce lower-resolution
images with resolution precisely matching that of the intended LCD panel.
That will speed up the display processing. If you do this, be careful about
how you scale the key image. Do not use interpolation of any
kind on the key image: scale by taking the value of the nearest pixel,
which gimp calls interpolation "None."
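To see why interpolation must be avoided, here is a minimal sketch of nearest-pixel scaling for a key image held in memory as one byte per pixel. The function name scale_key_nearest and the choice to sample at the center of each destination pixel are assumptions made for illustration; this is not taken from nodescape or gimp.

```c
#include <stddef.h>

/* Scale an 8-bit key graymap from sw x sh to dw x dh by nearest-pixel
   sampling. Every output value is copied from exactly one source pixel,
   so two node numbers are never averaged into a value that names a
   third node. */
void scale_key_nearest(const unsigned char *src, size_t sw, size_t sh,
                       unsigned char *dst, size_t dw, size_t dh)
{
    for (size_t y = 0; y < dh; ++y) {
        /* Map the center of each destination pixel back into the source. */
        size_t sy = ((2 * y + 1) * sh) / (2 * dh);
        for (size_t x = 0; x < dw; ++x) {
            size_t sx = ((2 * x + 1) * sw) / (2 * dw);
            dst[y * dw + x] = src[sy * sw + sx];
        }
    }
}
```

With interpolation, a pixel on the border between node 2 and the white background (255) could come out as, say, 128, and would then be tinted as if it belonged to node 128.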
There are a multitude of other tricks that can come in handy. For example,
I think having the non-node portions of the base image somewhat blurry makes
a nicer display. This can be done by selecting the white portion of the key image
and simply blurring that region on a copy of the base image.
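If you would rather script the blurring than do it by hand, something like the following sketch would work. It assumes the base image is in memory as interleaved 8-bit RGB and the key as one byte per pixel; blur_background and the 3x3 box filter are illustrative choices, not anything nodescape provides.

```c
#include <stddef.h>

#define KEY_BACKGROUND 255  /* white key pixels are not part of any node */

/* 3x3 box blur of the non-node pixels of an interleaved RGB image.
   `base` is the source and `out` a separate buffer of the same size;
   node pixels are copied through unchanged. */
void blur_background(const unsigned char *base, const unsigned char *key,
                     unsigned char *out, size_t w, size_t h)
{
    for (size_t y = 0; y < h; ++y) {
        for (size_t x = 0; x < w; ++x) {
            for (size_t c = 0; c < 3; ++c) {
                size_t i = (y * w + x) * 3 + c;
                if (key[y * w + x] != KEY_BACKGROUND) {
                    out[i] = base[i];
                    continue;
                }
                /* Average the neighbors that fall inside the image. */
                unsigned sum = 0, n = 0;
                for (size_t yy = (y ? y - 1 : 0); yy <= y + 1 && yy < h; ++yy)
                    for (size_t xx = (x ? x - 1 : 0); xx <= x + 1 && xx < w; ++xx) {
                        sum += base[(yy * w + xx) * 3 + c];
                        ++n;
                    }
                out[i] = (unsigned char)((sum + n / 2) / n);
            }
        }
    }
}
```

Note that this simple version lets node pixels bleed into the blurred background near node edges; a larger or repeated blur gives a softer result.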
Using nodescape
Once you've got key and base images, there really isn't much to using them
with nodescape. For example, creating continuously-updating images
of various attributes of NAK is done by:
$ nodescape knak.pgm bnak.ppm
As UDP status messages arrive, nodescape will create and update
a separate PPM status image named after each attribute for which it has been
sent data. The updates are performed by keeping all the relevant images
mapped into nodescape's memory using mmap(), and work is
performed only as new packets arrive or age-step intervals expire without
new packets, so nodescape has surprisingly little overhead.
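The mmap() approach can be sketched as follows, assuming a POSIX system. The helper names map_image and unmap_image are hypothetical, and this is not nodescape's actual source; it only shows how a shared mapping lets ordinary memory stores update the image file in place.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map an existing image file read/write and shared, so that stores into
   the returned buffer update the file itself. Returns NULL on failure;
   on success *len receives the file size in bytes. */
unsigned char *map_image(const char *path, size_t *len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_image(). */
void unmap_image(unsigned char *p, size_t len)
{
    munmap(p, len);
}
```

Because the mapping is MAP_SHARED, a program that reads the PPM file later sees whatever pixels were last stored, with no explicit write() call needed for each update.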
The creation of color-tinted versions of the base image uses the overlay
pixel coloring formula from gimp:
pixel = (base / MAX) * (base + ((2.0 * tint) / MAX) * (MAX - base));
if (pixel < 0) pixel = 0; else if (pixel > MAX) pixel = MAX;
This formula is applied separately to each of the R, G, and B color channels,
with MAX set at the maximum pixel value (typically 255).
As seen in the sample at the top of this page, the result is a very credible
tinting that lets details of the base image show through nicely.
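For concreteness, here is the overlay formula wrapped as a small C function. The names overlay_channel and overlay_rgb are ours, and rounding to the nearest integer level is an assumption; the formula as given does not say how the result is converted back to an integer pixel value.

```c
/* Overlay-tint one color channel; `base` and `tint` are in 0..max. */
int overlay_channel(int base, int tint, int max)
{
    double b = (double)base;
    double pixel = (b / max) * (b + ((2.0 * tint) / max) * (max - b));

    if (pixel < 0.0)
        pixel = 0.0;
    else if (pixel > max)
        pixel = max;
    return (int)(pixel + 0.5);  /* assumed: round to nearest level */
}

/* Apply the formula separately to each of R, G, and B with MAX = 255. */
void overlay_rgb(unsigned char px[3], const unsigned char tint[3])
{
    for (int c = 0; c < 3; ++c)
        px[c] = (unsigned char)overlay_channel(px[c], tint[c], 255);
}
```

A few values show the behavior: pure black and pure white base pixels are unchanged by any tint, which is why the base image should avoid them in node areas; a base of 100 becomes 39 under a tint of 0 and 161 under a tint of 255; a mid-level tint of 128 leaves it at 100.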
So, how do we determine the tint color? Well, it's a little complicated:
-
Initially, any attribute image begins by writing a copy of the base image.
However, this copy is immediately mapped and tinted, so it is only a copy
for a moment....
-
Tinting is never applied for any pixel corresponding to a white pixel in the
key image. Thus, these pixels will remain copies of the base pixels.
-
A node for which no data has ever been recorded is tinted solid magenta.
This color is used because it is highly visible and, as a mix of red and blue,
cannot occur as a pure color in a spectrum. Nodes that have never checked in
are thus easily spotted.
-
In the interest of simplicity, nodescape learns the range of values
for each attribute by noting the minimum and maximum values seen. However, this
implies that at least the first node to check in will set both the minimum and
maximum. This special case is handled by assigning a tint of green
whenever there is no difference between the minimum and maximum.
-
A node for which new (or sufficiently recent) data has been recorded and the value
range is known is given a "pure" color from the spectrum in
blue-to-green-to-red
order. The spectrum is approximated by a piecewise linear mapping,
but appears visually fairly smooth. Yes, we know this spectrum is really backward,
but red is a more familiar indication of problems, which usually occur at higher values.
-
The nodescape tool is part of the Senscape.Org
family; thus, it not only shows attribute values, but also their certainty or age.
Normally, senscapes "gray out" old or unreliable data; however, for nodescape,
old data usually means a sick node, so we want to make it stand out from the background
of the image. Thus, pixels of nodes with old data are probabilistically tinted either
magenta or
blue-to-green-to-red.
The fraction tinted magenta slowly increases to a maximum value
as the data ages, but some pixels remain so that the last known value is still discernible.
You can see this effect for node 62 (the node next to the lower right corner)
in the image at the top of this page.
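The tint rules above can be sketched in C roughly as follows. Everything specific here is an assumption for illustration: the type and function names, the spectrum breakpoint at the midpoint of the range, the linear aging ramp, and the constants FRESH_SECONDS, STALE_SECONDS, and MAX_MAGENTA. The text does not give nodescape's actual values.

```c
typedef struct { unsigned char r, g, b; } rgb_t;

static const rgb_t MAGENTA = {255, 0, 255};
static const rgb_t GREEN   = {0, 255, 0};

/* Learned value range for one attribute. */
typedef struct { double min, max; int seen; } range_t;

/* Widen the range to include a newly reported value. */
void range_note(range_t *r, double v)
{
    if (!r->seen) { r->min = r->max = v; r->seen = 1; }
    else if (v < r->min) r->min = v;
    else if (v > r->max) r->max = v;
}

/* Piecewise-linear blue-to-green-to-red spectrum over the learned range;
   green when the minimum and maximum do not yet differ. */
rgb_t spectrum(double v, const range_t *r)
{
    if (!r->seen) return MAGENTA;
    if (r->max <= r->min) return GREEN;
    double t = (v - r->min) / (r->max - r->min);
    if (t < 0.0) t = 0.0;
    if (t > 1.0) t = 1.0;
    rgb_t c = {0, 0, 0};
    if (t < 0.5) {            /* blue fades out as green fades in */
        c.g = (unsigned char)(510.0 * t + 0.5);
        c.b = (unsigned char)(255 - c.g);
    } else {                  /* green fades out as red fades in */
        c.r = (unsigned char)(510.0 * (t - 0.5) + 0.5);
        c.g = (unsigned char)(255 - c.r);
    }
    return c;
}

#define FRESH_SECONDS 10.0   /* assumed: no magenta before this age */
#define STALE_SECONDS 60.0   /* assumed: magenta fraction stops growing */
#define MAX_MAGENTA   0.75   /* assumed: some spectrum pixels always remain */

/* Fraction of a node's pixels tinted magenta for data of the given age. */
double magenta_fraction(double age)
{
    if (age <= FRESH_SECONDS) return 0.0;
    if (age >= STALE_SECONDS) return MAX_MAGENTA;
    return MAX_MAGENTA * (age - FRESH_SECONDS) / (STALE_SECONDS - FRESH_SECONDS);
}

/* Tint for one pixel of a node. `u` is a uniform random number in [0,1)
   drawn per pixel by the caller. */
rgb_t pixel_tint(int has_data, double value, const range_t *r,
                 double age, double u)
{
    if (!has_data) return MAGENTA;
    if (u < magenta_fraction(age)) return MAGENTA;
    return spectrum(value, r);
}
```

Passing the random number in as an argument keeps the sketch deterministic; a caller would draw a fresh value for every pixel so that the magenta appears as a scattered speckle over the last known color.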
The one issue with nodescape is that the output images are always PPM
files, and programs like WWW browsers often don't understand how to display a PPM.
The PBM tools, ImageMagick, etc., can be used to convert such an image to nearly
any format desired -- and PNG is particularly appropriate -- but that will be a
separate step. For example, a WWW server might have to use a CGI to actively
convert from a PPM to a PNG when a status image is fetched. Alternatively, the
PPM image could be regularly translated when it is copied to a WWW server. In
any case, nodescape doesn't do this for you.
Using epacsedon (yeah, that's nodescape backwards)
The generic client for the nodescape server is called epacsedon.
It's not a very complex program, really just a way to send status information
to nodescape via UDP. However, running a client on every node is not
always easy and any overhead is multiplied by the fact that every node has to
do the same, so there are a variety of design decisions made to facilitate this use.
The command line format is:
epacsedon server_address {node_number} ((property_name {property_value}) | @{delay})+
Ok, that looks nasty. It really isn't too bad -- once you know what it does:
- server_address
- The hostname of the nodescape server
- node_number
- The node_number that this information is to be associated with.
Explicitly specifying this allows status update messages to come from places
other than the node whose status is being reported, which can be useful.
However, usually each node reports its own status.
If that's what you're doing, and the node hostname starts with an alphabetic
character and has the decimal node number immediately after the alphabetics,
epacsedon will parse the hostname to get the node number...
so you don't specify a node number at all.
- property_name {property_value}
- The name of the property to report.
This property_name must start with an alphabetic character.
You can make up your own names, but epacsedon actually has a few
built-in.
The loadavg is built-in, using the last minute average.
Similarly, any name that ends in : is essentially built-in;
what actually happens is that sensors -u is invoked and the
output parsed for the first value matching the name given.
For example, temp1_input: will get the first temperature reading
from within the lm_sensors data, which is usually the lowest-numbered
processor core temperature.
Also note that the image generated by nodescape will be in a filename
that doesn't have the : at the end, e.g., temp1_input.ppm.
If the property isn't built-in, the numeric property_value
is sent as a double floating-point value.
If the property isn't built-in and no value is given,
the value 0.0 is sent.
- @{delay}
- A single invocation of epacsedon can send many status updates.
The @ arguments allow control of timing between such updates.
For example, @2.7 would insert a delay of approximately 2.7 seconds
before processing the next part of the command line.
The @ by itself is special;
it doesn't set a delay, but rather enables infinite repeating of the
complete sequence of operations specified on the command line.
Note that repeating doesn't make much sense unless the properties are all
built-in, because any other values will be unchanged across all repeats.
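A few of the argument-handling rules above are easy to picture in code. The following is only a sketch of the described behavior, not epacsedon's source; the function names are hypothetical, and the hostname "nak17" used in the comments is an invented example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Extract a node number from a hostname of the form <letters><digits>...,
   e.g. "nak17" gives 17. Returns -1 if the name does not fit the pattern. */
long node_from_hostname(const char *host)
{
    const char *p = host;
    if (!isalpha((unsigned char)*p))
        return -1;
    while (isalpha((unsigned char)*p))
        ++p;
    if (!isdigit((unsigned char)*p))
        return -1;
    return strtol(p, NULL, 10);
}

/* Find the first occurrence of `name` (including its trailing ':') in
   the text printed by "sensors -u" and parse the number that follows it.
   Returns 1 and stores the number in *value on success, 0 otherwise. */
int first_sensor_value(const char *text, const char *name, double *value)
{
    const char *hit = strstr(text, name);
    if (hit == NULL)
        return 0;
    const char *start = hit + strlen(name);
    char *end;
    double v = strtod(start, &end);
    if (end == start)
        return 0;
    *value = v;
    return 1;
}

/* Build the image file name for a property: drop a trailing ':' and
   append ".ppm". Returns 0 if `out` is too small, 1 otherwise. */
int image_name(const char *prop, char *out, size_t outsz)
{
    size_t n = strlen(prop);
    if (n > 0 && prop[n - 1] == ':')
        --n;
    if (n + 5 > outsz)   /* ".ppm" plus the terminating NUL */
        return 0;
    memcpy(out, prop, n);
    memcpy(out + n, ".ppm", 5);
    return 1;
}
```

So a node named like "nak17" would report as node 17 without being told, and the property temp1_input: would take the first temp1_input: line in the sensors -u output and end up in temp1_input.ppm on the server.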
It is highly likely that epacsedon will soon integrate a copy of
helpme so that it can issue audio error
messages when any property it checks is out of the normal range. The catch
is that to do that it needs to know what the normal range is....
Other Random Uses
By now, you've probably realized that nodescape really has nothing
to do with cluster supercomputers per se, but is really a generic way to
dynamically tint areas of an image to show properties. For example, it can be
used to color an abstract graph of computer network properties, show traffic
conditions on a multitude of road sensors, etc.
Author Contact Info
If you have any questions or comments, contact:
Professor Hank Dietz, James F. Hardymon Chair in Networking
College of Engineering
Electrical and Computer Engineering Department
453 Anderson Hall
(Office 469 Anderson Tower, Lab 672 Anderson Tower)
Lexington, KY 40506-0046
Office Phone: (859) 257 4701
Lab Phone: (859) 257 9695
Fax : (859) 257 3092
Email: hankd@engr.uky.edu
Home URL: http://aggregate.org/hankd/
This page is: http://aggregate.org/NODESCAPE/