TIK: Temporal Image format from Kentucky

H. G. Dietz
http://aggregate.org/hankd/

Department of Electrical and Computer Engineering
Center for Visualization & Virtual Environments
University of Kentucky, Lexington, KY 40506-0046

Initial release: July 7, 2016; Latest update: June 19, 2017


This document should be cited using something like the bibtex entry:

@techreport{tik20160707,
author={Henry Gordon Dietz},
title={TIK: Temporal Image Kentucky},
month={July},
year={2016},
institution={University of Kentucky},
howpublished={Aggregate.Org online technical report},
URL={http://aggregate.org/DIT/TIK/}
}


Introduction

There are a multitude of "portable" file formats for images and video, and TIK (pronounced "tick") is yet another. Why make another? Because TIK is not really a file format for containing images or video -- it is a format for compressed encoding of Time Domain Continuous Imaging (TDCI) pixel waveforms.

The primary paper on TIK is "TIK: a time domain continuous imaging testbed using conventional still images and video", as published by IS&T. There is also a local copy of the full paper and presentation slides. A related paper is "Temporal super-resolution for time domain continuous imaging" (slides, full paper).

Before describing TIK, it is useful to note that not only are there a multitude of image and video file formats, but many of them are extensible wrappers that could be used to hold TDCI data. For example, TIFF has already been extended to handle DNG, so why not use it for TIK too? Well, the answer is that we will. There will almost certainly be a version of TIK that uses a TIFF wrapper. At this writing, TDCI is still a very young concept, so we have given priority to making the simplest possible encoding for TIK to ease other people playing with TDCI... and TIFF isn't simple.

The Basic TIK Format

The Basic TIK format is a simple extension of the Netpbm PGM (Portable Gray Map) and PPM (Portable Pixel Map) formats. These formats are well known and widely recognized, yet have a much simpler directory structure than other formats. In fact, they use a trivial textual header followed by byte-aligned raw binary data. The key idea is that loading a TIK file into an image-editing tool that doesn't understand TDCI will result in an image representing the initial state of the TDCI encoding.

Thus, although we strongly recommend using file names that end in either .TIK or .tik, a TIK file is really just a structured subset of the PGM or PPM file structure:

  1. A three-character "magic number" of either "P6\n", for any RGB encoding (as a PPM), or "P5\n" for any other color encoding (as a PGM).
  2. One or more structured comments, each of which begins a new line starting with "# TIK " and ends with "\n". All the content of structured comments is parsed as TIK field names and values.
  3. Optional unstructured comments, starting with "# " but not having the first word be "TIK".
  4. The image width as an ASCII decimal integer string. This value is in units suitable for interpretation of the initial image data as either a PPM or PGM. For RGB pixels, it is literally the number of pixels. For other color encodings, it is generally the number of image samples per line. For example, the live-view CHDK color encoding uses UYVYYY bytes to encode four pixels, but the "P5\n" width would be the number of bytes in a line -- 6/4 the number of pixels.
  5. Whitespace -- any number of spaces, tabs (\t), newlines (\n), or carriage-returns (\r). Here, we prefer one space.
  6. The image height, in lines, as an ASCII decimal integer string.
  7. Whitespace. Here, we prefer one newline.
  8. The maximum value per color channel as an ASCII decimal integer string. This is primarily used to encode the number of bytes used to store each color channel value. Normally, a single byte is used; however, any value between 256 and 65535 (inclusive) will force two-byte encoding. The actual value is used by many tools to scale the channel values read, and TIK tools will do this scaling, but be aware that not all PPM or PGM tools do.
  9. A single newline.
  10. The raw binary image data, starting with a width*height array of samples, each of which encodes one value for PGM or three for PPM. Note that encoding of samples is according to the format rules given below.
  11. The TDCI encoded data. The encoding of this data is entirely determined by the structured comments in the header. In some formats, the encoded data will start with a single byte with the value 0. This is used to ensure that the following TDCI data will not be interpreted as a sequence of conventional PPM or PGM images in a single file. The ambiguity comes from the fact that tools like ffmpeg and display have the ability to generate/operate on streams of PPM or PGM images sent through a pipe without extra markers; the 0 byte ensures that a second "P6\n" or "P5\n" header will not be found, whereas without that byte, it is unlikely, but possible, data would be misinterpreted in that way. Note that it is not expected that more than one TIK TDCI encoding would reside within a single file.
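
The header layout above can be sketched as a small parser. This is only an illustration under stated assumptions, not the code used by the tik program: the function name parse_tik_header and the returned dictionary keys are hypothetical, and the sketch is slightly more lenient than the text (it accepts any run of spaces or tabs around "TIK").

```python
def parse_tik_header(data: bytes) -> dict:
    """Parse the textual header of a Basic TIK file (hypothetical helper)."""
    # Item 1: the three-character magic number.
    if data[:3] not in (b"P5\n", b"P6\n"):
        raise ValueError("missing P5/P6 magic number")
    pos = 3

    # Items 2-3: comment lines. Structured ones have "TIK" as first word.
    fields = []  # one list of words per structured comment
    while data[pos:pos + 1] == b"#":
        end = data.index(b"\n", pos)
        words = data[pos + 1:end].decode("ascii").split()
        if words and words[0] == "TIK":
            fields.append(words[1:])
        pos = end + 1

    # Items 4-8: width, height, and maximum value as ASCII decimal
    # integers, each preceded by optional whitespace.
    numbers = []
    while len(numbers) < 3:
        while data[pos:pos + 1] in (b" ", b"\t", b"\n", b"\r"):
            pos += 1
        start = pos
        while data[pos:pos + 1].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("expected an ASCII decimal integer")
        numbers.append(int(data[start:pos]))

    # Item 9: exactly one newline separates the header from the raw data.
    if data[pos:pos + 1] != b"\n":
        raise ValueError("expected a newline after the maximum value")

    width, height, maxval = numbers
    return {
        "magic": data[:2].decode("ascii"),
        "fields": fields,
        "width": width,
        "height": height,
        "maxval": maxval,
        # Values from 256 to 65535 force two-byte samples.
        "bytes_per_sample": 1 if maxval < 256 else 2,
        "data_offset": pos + 1,
    }
```

Everything from data_offset onward is the initial image followed by the TDCI encoded data; interpreting those bytes depends on the V structured comment.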

Header Structured Comments

The header in a basic TIK format file specifies everything about the encoded data using structured comments that begin with "#", one or more spaces or tabs, "TIK", one or more spaces or tabs, and then a series of space or tab separated words. Each word is either a decimal integer ASCII number or a keyword that does not start with a digit or a negative sign ("-"). There may be a variable number of words in a structured comment, but the first word defines how the other words will be treated, and the sequence ends at the end of the line.

The V structured comment must appear before any other structured comments in the file, normally as the second line of the file. However, here are the currently-defined structured comment types listed in alphabetical order:

B number
The time that the first frame begins at is number nanoseconds. This value is most useful for setting the times for separate video or frame sequences to be merged. Note that for Canon PowerShots under CHDK the camera time is measured in 1/1000s ticks, and using that directly rather than nanoseconds avoids 64-bit math in camera. The CHDK time can thus be output as the time in ticks followed by six zeros, e.g., a time of 56 ticks would become B 56000000.
E number
An EV (exposure value) to be used for approximately scaling pixel values into known luminances....
F number
The time per frame is number nanoseconds. This value may be computed from the framerate, FPS (frames per second), as ((int) ((1000000000.0 / FPS) + 0.5)).
G number
The gamma by which the samples should be decoded is number. Although some formats use a linear (gamma 1.0) encoding, typically, the dynamic range of image data is compressed. In truth, it usually is not compressed using a simple gamma value, but using an approximately correct gamma value when operating on pixel values (e.g., averaging values) helps preserve approximate linearity. For example, typical JPEG data is approximately scaled by pow(datum, 1.0/2.2), so a roughly linear scaled value can be obtained by pow(datum, 2.2). Such an encoding would be marked with 2200000, which may be computed from the decoding gamma as ((int) ((gamma * 1000000.0) + 0.5)). Note that the scaling here is by 1000000, not 1000000000 as is used for times.
R number numberXdiv numberYdiv
The capture used a rolling shutter scan with the time at which the pixel at coordinates X,Y is sampled being offset from the start time by ((X*number/numberXdiv)+(Y*number/numberYdiv)). If either of numberXdiv or numberYdiv is zero, it means that dimension suffers no delays; if negative, it means the corresponding axis is scanned in reverse order (large to small). Note that in some TIK file formats, this also changes the pixel walk order to match the rolling scan order. For the purposes of TIK files, a rolling shutter is any type of shutter that causes this type of temporal skew of sampling of pixel values based on X,Y coordinates. This happens with most electronic shutters, but also occurs using a mechanical focal plane shutter. Suppose you are capturing 10 frames/second using a DSLR with a focal plane shutter that has a flash sync of 1/100 second, but is set to expose for 1/50 second. The flash sync speed implies that it takes approximately 0.01s for the curtain to open on the far side of the sensor, so we would encode that starting with R 10000000 (i.e., ((int) ((1000000000.0 * 0.01) + 0.5))), meaning that rather than all pixels being sampled over the interval 0-0.02s, pixels are sampled at linearly-varying times as a function of their X,Y coordinates. For example, consider capturing a 320x240 image using a rolling shutter. "R 10000000 0 239" would specify that an entire line is read simultaneously, but line Y is read at time ((X*0)+(Y*10000000/239))ns offset from the frame start (note that Y would be going from 0 to 239). The scan order can be more complex; for example, "R 10000000 -76560 240" would mean that the pixels are sampled in the same line order just described, but within each line X coordinates are traversed in reverse order, so that pixel X,Y is sampled starting at (((319-X)*10000000/76560)+(Y*10000000/240))ns. Note that 76560 is 319*240. Even specifying scan orders with temporal gaps between lines or columns is possible using this notation.
T number
The shutter open time per frame is number nanoseconds. This value may be computed from the shutter speed, Tv (in seconds), as ((int) ((1000000000.0 * Tv) + 0.5)). The sum of this time and the last pixel's rolling shutter delay is assumed to always be no longer than the time per frame.
V version format_name ...
This must be the very first structured comment in the .tik file. The version is an 8-digit number specifying the standard compliance date of the TIK encoding. For example, 20160712 would mean that this file is formatted as specified by the TIK standard that was in effect on July 12, 2016. The format_name can be any of those described later in this document, some of which require additional arguments.
X number
The X dimension (width) of the image data is number. The units are those of pixel data blocks. For RGB data, each tuple of three values counts as one unit; for UYVYYY data, each tuple of six values counts as one unit.
Y number
The Y dimension (height) of the image data is number.
Z number
The Z dimension (maximum value) of the image data is number. For example, 100 would mean single-color-channel values are all between 0 and 100 inclusive. This can also be considered as setting the white point.
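
The numeric fields above can be computed as in the following sketch, which mirrors the rounding expressions given for F, T, G, and B, and evaluates the R rolling shutter offset for one pixel. All function names are hypothetical, and the handling of a negative divisor (reversing the axis as in the 319-X example) is an assumption generalized from the text's worked example.

```python
NS_PER_SECOND = 1_000_000_000.0
GAMMA_SCALE = 1_000_000.0


def frame_time_ns(fps: float) -> int:
    """F field: time per frame in nanoseconds, rounded to nearest."""
    return int(NS_PER_SECOND / fps + 0.5)


def shutter_time_ns(tv: float) -> int:
    """T field: shutter open time in nanoseconds for Tv in seconds."""
    return int(NS_PER_SECOND * tv + 0.5)


def gamma_field(gamma: float) -> int:
    """G field: decoding gamma scaled by one million."""
    return int(gamma * GAMMA_SCALE + 0.5)


def chdk_begin_ns(ticks: int) -> int:
    """B field from CHDK 1/1000s ticks: append six zeros."""
    return ticks * 1_000_000


def axis_delay_ns(coord: int, size: int, number: int, div: int) -> float:
    """Delay contributed by one axis of an R structured comment."""
    if div == 0:
        return 0.0                    # this axis suffers no delay
    if div < 0:
        coord = (size - 1) - coord    # axis scanned large to small
        div = -div
    return coord * number / div


def rolling_offset_ns(x, y, width, height, number, xdiv, ydiv) -> float:
    """Sampling time offset of pixel (x, y) from the frame start."""
    return (axis_delay_ns(x, width, number, xdiv)
            + axis_delay_ns(y, height, number, ydiv))
```

For the 320x240 example, rolling_offset_ns(5, 239, 320, 240, 10000000, 0, 239) evaluates to 10000000 ns for the last line, independent of X.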

Time Domain Continuous Image Data Format

Different (version, format_name) tuples in the TIK V structured comment can imply different encodings of the data. The encoding of a tuple (a, b) is equivalent to that specified for (version, format_name) where b is equal to format_name and version is the largest standard release value not greater than a. Valid releases are:

20160721 CONVERT pattern numberBegin numberEnd
This .tik file is just a header describing image data, and need not contain any pixel values. Instead, it specifies a pattern for naming one or more files, each of which holds image pixel data in any still image format that the ImageMagick convert tool can transform into a P6 file. The pattern is taken as a format string which is used with sprintf and one integer parameter to produce each file name. The integer parameter first has the value numberBegin, and is incremented by 1 each time a file is processed, ending with the last value not greater than numberEnd. For example, "CONVERT IMG%05u.JPG 1 4" would attempt to process the sequence of images IMG00001.JPG, IMG00002.JPG, IMG00003.JPG, and IMG00004.JPG; if any image cannot be opened, it will be skipped, but still counted against the framerate. For example, if there was no IMG00002.JPG, but a structured comment set F 1000000000, then the three other frames would be interpreted as spanning time intervals from 0..1s, 2..3s, and 3..4s (this behavior can be useful for processing timestamped surveillance still captures). The output from convert is a P6 file, so the header of this .tik file should also start with P6; however, treating this .tik header as a P6 image results in undefined behavior -- the file could hold any image, or could even claim the image dimensions are 0,0 with a maximum value of 0.
20160721 FFMPEG filename
This .tik file is just a header describing image data, and need not contain any pixel values. Instead, the pixel values are extracted using ffmpeg to decode the contents of the video in the file named filename (if filename is omitted, the video filename is assumed to be the next argument on the tik command line). In this way, TDCI information can be specified for arbitrary video files without needing to incorporate such information in the files. The output from ffmpeg is a stream of P6 files, so the header of this .tik file should also start with P6; however, treating this .tik header as a P6 image results in undefined behavior -- the file could hold any image, or could even claim the image dimensions are 0,0 with a maximum value of 0.
20160712 UYVYYY
The storage format is actually just a sequence of P5 images, each including a header, in sequence. That's it. There are just two issues. The first is that the headers after the first do not need to have the structured comments in them. The second is that the UYVYYY values used inside CHDK are unsigned for Y, but signed for U and V, and this is maintained in the TIK file. Thus bytes 0 and 2 of each group of 6 bytes are signed. The CHDK color conversion formulas depend on that:
R = min(max((((Y << 12) + (V * 5743) + 2048) >> 12), 0), 255)
G = min(max((((Y << 12) + (U * 1411) + (V * 2925) + 2048) >> 12), 0), 255)
B = min(max((((Y << 12) + (U * 7258) + 2048) >> 12), 0), 255)
20160712 RGB
The header and initial image are processed normally. Next, a single 0 byte is output. The image is scanned in an order determined either by the rolling shutter specification or in the default increasing Y, nesting increasing X, pixel order (which is also the standard order for PGM and PPM files... what we would encode as two non-negative numbers for the rolling shutter scan order R, where the non-zero Y value is at least width times larger than the X value). Treating a tuple of Red, Green, and Blue values as a pixel value, encode the number of pixels unchanged from their previously-recorded values (the span), then output the three raw pixel values. The span can be greater than the number of pixels in an image, representing entire frames with expected values. If the span is 0-127, it is output as a single byte; otherwise, the 7 least-significant bits of span are output in a byte ORed with 0x80, the remaining span is shifted right by 7 bits, and the process repeated until no 1 bits remain in span. The pixel values are output as one byte each for R, G, and B. Thus, a span of 257 pixels followed by R, G, B values of 0x11, 0x22, 0x33 would be encoded as the five bytes: 0x81, 0x02, 0x11, 0x22, 0x33 (257 is 0x101; its 7 least-significant bits, 0x01, ORed with 0x80 give 0x81, and the remaining span, 257 shifted right by 7 bits, is 0x02).
20160629 UYVYYY
This format is now deprecated. The header and initial image are processed normally. Next, a single 0 byte is output. The image is scanned in increasing Y, nesting increasing X, pixel order (which is also the standard order for PGM and PPM files... what we would encode as YX for rolling shutter scan order). Although a tuple of UYVYYY values really represents four pixels, it is treated as one unit value. Encode the number of units unchanged from their previously-recorded values (the span), then output the six-byte UYVYYY value. The span can be greater than the number of units in an image, representing entire frames with expected values. If the span is 0-127, it is output as a single byte; otherwise, the 7 least-significant bits of span are output in a byte ORed with 0x80, the remaining span is shifted right by 7 bits, and the process repeated until no 1 bits remain in span. Thus, a span of 257 units followed by UYVYYY values of 0x11, 0x22, 0x33, 0x44, 0x55, 0x66 would be encoded as the eight bytes: 0x81, 0x02, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66. A subtle point is that the UYVYYY values used inside CHDK are unsigned for Y, but signed for U and V; in the TIK file, U and V values are converted to unsigned by adding 128. Thus, one must subtract 128 from the encoded U and V values to recover U and V values suitable for use with the CHDK color conversion formulas:
R = min(max((((Y << 12) + (V * 5743) + 2048) >> 12), 0), 255)
G = min(max((((Y << 12) + (U * 1411) + (V * 2925) + 2048) >> 12), 0), 255)
B = min(max((((Y << 12) + (U * 7258) + 2048) >> 12), 0), 255)
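
The variable-length span encoding shared by the 20160712 RGB and 20160629 UYVYYY formats can be sketched as follows. This follows the prose rule literally: the 7 least-significant bits come first, and 0x80 flags that more span bytes follow. Under that reading, a span of 257 becomes the two bytes 0x81, 0x02. The function names are hypothetical, and the byte order is an assumption taken from the prose description rather than from the tik source.

```python
def encode_span(span: int) -> bytes:
    """Encode a span count, 7 bits per byte, least-significant bits first."""
    if span < 0:
        raise ValueError("span must be non-negative")
    out = bytearray()
    while span > 127:
        out.append((span & 0x7F) | 0x80)  # low 7 bits, flagged as continued
        span >>= 7
    out.append(span)                      # final byte has the 0x80 bit clear
    return bytes(out)


def decode_span(data: bytes, pos: int = 0):
    """Decode one span starting at pos; returns (span, next_pos)."""
    span = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        span |= (byte & 0x7F) << shift
        if byte < 0x80:                   # no continuation flag: done
            return span, pos
        shift += 7


def encode_rgb_record(span: int, rgb) -> bytes:
    """One RGB-format record: the span, then one byte each for R, G, B."""
    return encode_span(span) + bytes(rgb)
```

A decoder would call decode_span and then read the fixed-size sample that follows (three bytes for RGB, six for a UYVYYY unit) before looking for the next span.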

The tik Program

The demonstration program for TDCI imaging is called tik -- just like the file format. However, it does not just process .tik files. For example, it can be used to extract virtual exposures from conventional video files, etc. Much of that flexibility comes from use of ffmpeg to decode videos, or ImageMagick convert to convert still images from various formats, into sequences of PPMs that tik can process. Note that tik literally runs ffmpeg as a separate program, which means that either tool can be updated completely independently without the need to recompile from source code. At some point, tik might also mutate into a plugin for ffmpeg, but not right now... the same is true about having a graphical user interface.

tik Command Line Options

Like so many other video processing applications, the command line for tik is fairly complex. Note that number refers to an arbitrary floating-point number which can be specified directly or as 1/value notation where value is a floating-point number. Thus, 0.01 also can be written as 1/100. It is useful to note that the command line options are not identical to the fields in TIK structured comments; for example, TIK expresses framerate as an integer time per frame in nanoseconds, while the command line option uses a floating-point FPS (Frames Per Second) value.
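
As a minimal sketch of the number notation just described, the hypothetical helper below accepts either a plain floating-point value or the 1/value form. It is only an illustration; how tik itself parses its arguments is not specified here.

```python
def parse_number(text: str) -> float:
    """Parse 'value' or '1/value' as used for tik numeric options."""
    if text.startswith("1/"):
        return 1.0 / float(text[2:])  # reciprocal notation, e.g. 1/100
    return float(text)
```

With this helper, parse_number("1/100") and parse_number("0.01") yield the same value.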

Currently, the following options are recognized:

-anumber
Set the shutter angle to number (which can only be done after a framerate has been set using -f). In "old school" cinematography, a rotary disc shutter blade was synchronized to the advance of the film so that the film was covered during advance. This meant that the shutter was open for something less than 360 degrees of the shutter rotation cycle, and it became common to refer to this number as the shutter angle. Of course, the angle is merely specifying the shutter speed in seconds, Tv, relative to the FPS (Frames Per Second). The formula is Tv=(angle/360)/FPS. Typical cinematography used a shutter angle of about 180, so 24 FPS frames would each use a shutter speed of 1/48s. In modern video capture, it is common that Tv is determined entirely by the exposure conditions, and 24 FPS video might well use a Tv of 1/500s, or a shutter angle of less than 18 degrees -- often with obvious discontinuity of motion. For virtual exposures, tik allows setting any shutter angle greater than 0 degrees and aligns the exposures at the start of the frame time. Thus, 180 exposes during the first half of the frame time, 360 would make framerate and exposure time identical, and a number greater than 360 includes temporal data from after the end of the frame time.
-bnumber
Set the begin exposures time to number seconds from the start of the TDCI input. Use of this option implies that the goal is creation of one or more virtual exposures.
-efilename
Specify the error model to use as the one in the PPM file filename. An error model is expressed as a 256x256 RGB P6 PPM image, and can be edited using ordinary image editors, such as gimp. However, if filename is omitted, the -e command specifies that an error model image should be created from the input TDCI, which is assumed to be a static, constant, scene containing a plurality of color and brightness values. Note that these error models essentially account for all types of pixel value error, including photon shot noise, so using an error model tuned to your particular camera, and even specific tone curve and ISO settings, can produce higher-quality results. If no -e is specified and an error model is needed, a built-in default one will be used.
-fnumber
Set the FPS (Frames Per Second) to number.
-gnumber
Set the gamma of encoded image data to number. Gamma is a way to specify an exponential tonal remapping used to simulate the fact that human eyesight is logarithmically sensitive to light while sensors used in digital cameras have more linear response. Most image processing works best on data with linear gamma, and this is the default assumed for inputs if no gamma value is specified. However, typical JPEG or PPM image files are actually encoded with an effective gamma of roughly 2.2, so specifying this as -g2.2 will allow internal processing to better preserve approximate linearity. The actual gamma used in encoding images is often not a simple exponential function and the approximate value is not always 2.2 -- Apple preferred a gamma of 1.8. However, small errors in the gamma handling are generally not critical, as the gamma correction is only applied internal to tik: the pixel values output have precisely the same gamma as the ones input.
-i
Toggle interactive mode. In interactive mode, tik becomes rather verbose with output to stderr (file descriptor 2), but the status messages and progress indications can be reassuring and useful. By default, interactive mode is set only if stderr is a TTY device (a terminal).
-mfilename
This option is not yet implemented and may change. Set the exposure time map file name to filename. The values in the map image are taken to be fractions of the specified virtual shutter time for each exposure that each pixel should use. The virtual exposure for a pixel location is (Tv*G)/255 centered at the position in the Tv interval which is (Tv/2)+(((B-R)/255)*(Tv/2)) offset from the start. Thus, reddish colors are at the start of the interval and blueish ones are at the end; white would represent sampling the entire Tv interval.
-ninteger
Set the number of frames to process as integer. On input, the default is to process all frames. On output, the default is to process just a single frame.
-ofilename
Set the output file name to filename. There are suitable defaults if no name is specified. In the usual unix convention, specifying - as the filename will catenate all output to stdout.
-pnumber
Set the minimum acceptable probability that two pixel values are equivalent to number percent. This, combined with the error model, controls the statistical merging of pixel values in TIK files. In effect, this allows specification of bounds on considering a changed pixel value to be changed only by noise -- not a real change in scene appearance. The analogy is somewhat imprecise, but the usual values of 32, 5, and 0.3 would roughly correspond to accepting one, two, or three standard deviations as still being the same value within noise tolerance. The default value is 4.55, or two standard deviations. Smaller values might reduce TIK file sizes and can increase effective dynamic range, but at the expense of some loss of temporal accuracy.
-qnumber
Set the encoding quality to number percent. This controls a variety of internal mechanisms as well as directly setting the encoding quality for any JPEG output files. In general, higher values require more compute time and often will generate larger files. It is generally best to stay in the range from 75 to 100.
-tnumber
Set the shutter speed in seconds, Tv, to number. It does not make sense to use this option in the same command line as -a.
-v
The input is a video (frames) to be converted to a TIK TDCI format.
filename
Any filename given is taken to be an input to the program. Inputs in formats not directly recognized by tik are passed to ffmpeg, which decodes them and sends the results back to tik via a named pipe created by tik for that purpose. None of the files named will be modified in any way, but they may be opened and read more than once in a single execution of tik.
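
The rough correspondence between -p percentages and standard deviations can be checked with a short calculation. The sketch below assumes normally distributed noise and a two-sided tail, which is an interpretation rather than something stated for tik; the function name is hypothetical.

```python
import math


def percent_outside(sigmas: float) -> float:
    """Percent of a normal distribution lying beyond +/- sigmas."""
    return 100.0 * math.erfc(sigmas / math.sqrt(2.0))


# Print the tail percentage for one, two, and three standard deviations.
for k in (1, 2, 3):
    print(k, round(percent_outside(k), 2))
```

Under these assumptions, two standard deviations give about 4.55 percent, matching the default, while one and three standard deviations give values close to the rounded figures of 32 and 0.3.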

Examples Of tik Commands

The tik command line is complex enough that usage is not obvious. Thus, it is useful to show examples of common uses.

Create An Error Model

To create an error model file named myerrmod.ppm from a video named testchart.avi:

tik -e -omyerrmod.ppm testchart.avi

Encode A Video As A TIK File

Using an error model file named myerrmod.ppm, make a TIK file named myvideo.tik from a video named myvideo.mp4:

tik -emyerrmod.ppm -omyvideo.tik myvideo.mp4

Create JPEG Virtual Exposures From A TIK File

Use the TIK file named myvideo.tik to extract ten virtual exposures into files named mystill0.jpg through mystill9.jpg. Make the sequence of ten virtual exposures start 0.5 seconds into the TDCI, representing a sequence of frames at 24 FPS and a shutter angle of 90 degrees:

tik -n10 -omystill%d.jpg -b0.5 -f24 -a90 myvideo.tik
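
To make the timing of this command concrete, the sketch below lists the time interval each virtual exposure would cover, using Tv=(angle/360)/FPS and aligning each exposure at the start of its frame time. It assumes frame k begins at the -b time plus k/FPS; the function name is hypothetical and this is not code from tik.

```python
def virtual_exposures(begin: float, fps: float, angle: float, count: int):
    """Return (start, end) times in seconds for each virtual exposure."""
    tv = (angle / 360.0) / fps            # shutter speed from shutter angle
    windows = []
    for k in range(count):
        start = begin + k / fps           # exposure aligned to frame start
        windows.append((start, start + tv))
    return windows


# Intervals for: tik -n10 -omystill%d.jpg -b0.5 -f24 -a90 myvideo.tik
for k, (start, end) in enumerate(virtual_exposures(0.5, 24.0, 90.0, 10)):
    print(f"mystill{k}.jpg covers {start:.6f}s to {end:.6f}s")
```

Each exposure here lasts 1/96 second, one quarter of the 1/24 second frame time.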

Code

The following code is internally ready for testing:

