TIK: Temporal Image format from Kentucky
H. G. Dietz
http://aggregate.org/hankd/
Department of Electrical and Computer Engineering
Center for Visualization & Virtual Environments
University of Kentucky, Lexington, KY 40506-0046
Initial release: July 7, 2016; Latest update: June 19, 2017
This document should be cited using something like the bibtex entry:
@techreport{tik20160707,
author={Henry Gordon Dietz},
title={TIK: Temporal Image Kentucky},
month={July},
year={2016},
institution={University of Kentucky},
howpublished={Aggregate.Org online technical report},
URL={http://aggregate.org/DIT/TIK/}
}
Introduction
There are a multitude of "portable" file formats for images and
video, but TIK (pronounced "tick") is yet another. Why make
another? Because TIK is not really a file format for containing
images nor video -- it is a format for compressed encoding of
Time Domain Continuous Imaging (TDCI) pixel waveforms.
The primary paper on TIK is TIK: a time
domain continuous imaging testbed using conventional still images and
video, as published by IS&T. There is also a local copy of the full paper and presentation slides, as well as Temporal super-resolution for
time domain continuous imaging (slides, full
paper).
Before describing TIK, it is useful to note that not only are
there a multitude of image and video file formats, but many of
them are extensible wrappers that could be used to hold TDCI
data. For example, TIFF has already been extended to handle DNG,
so why not use it for TIK too? Well, the answer is that we
will. There will almost certainly be a version of TIK that
uses a TIFF wrapper. At this writing, TDCI is still a very young
concept, so we have given priority to making the simplest
possible encoding for TIK to ease other people playing with
TDCI... and TIFF isn't simple.
The Basic TIK Format
The Basic TIK format is a simple extension of the Netpbm
PGM (Portable Gray Map) and PPM (Portable Pixel Map) formats.
These formats are well known and widely recognized, yet have a
much simpler directory structure than other formats. In fact,
they use a trivial textual header followed by byte-aligned raw
binary data. The key idea is that loading a TIK file into an
image-editing tool that doesn't understand TDCI will result in
an image representing the initial state of the TDCI encoding.
Thus, although we strongly recommend using file names that end in
either .TIK or .tik, a TIK file is really just
a structured subset of the PGM or PPM file structure:
-
A three-character "magic number" of either "P6\n", for
any RGB encoding (as a PPM), or "P5\n" for any other
color encoding (as a PGM).
-
One or more structured comments, each of which begins a new line
starting with "# TIK " and ends with "\n".
All the content of structured comments is parsed as TIK field
names and values.
-
Optional unstructured comments, starting with "# " but
not having the first word be "TIK"
-
The image width as an ASCII decimal integer string. This value
is in units suitable for interpretation of the initial image
data as either a PPM or PGM. For RGB pixels, it is literally the
number of pixels. For other color encodings, it is generally the
number of image samples per line. For example, the live-view
CHDK color encoding uses UYVYYY bytes to encode four pixels, but
the "P5\n" width would be the number of bytes in a line
-- 6/4 the number of pixels.
-
Whitespace -- any number of spaces, tabs (\t), newlines
(\n), or carriage-returns (\r). Here, we prefer
one space.
-
The image height, in lines, as an ASCII decimal integer string.
-
Whitespace. Here, we prefer one newline.
-
The maximum value per color channel as an ASCII decimal integer
string. This is primarily used to encode the number of bytes
used to store each color channel value. Normally, a single byte
is used; however, any value between 256 and 65535 (inclusive)
will force two-byte encoding. The actual value is used by many
tools to scale the channel values read, and TIK tools will do
this scaling, but be aware that not all PPM or PGM tools do.
-
A single newline.
-
The raw binary image data, starting with a width*height array of
samples, each of which encodes one value for PGM or three for
PPM. Note that encoding of samples is according to the format
rules given below.
-
The TDCI encoded data. The encoding of this data is entirely
determined by the structured comments in the header. In some
formats, the encoded data will start with a single byte with the
value 0. This is used to ensure that the following TDCI data
will not be interpreted as a sequence of conventional PPM or PGM
images in a single file. The ambiguity comes from the fact that
tools like ffmpeg and display have the ability
to generate/operate on streams of PPM or PGM images sent through
a pipe without extra markers; the 0 byte ensures that a second
"P6\n" or "P5\n" header will not be found,
whereas without that byte, it is unlikely, but possible, data
would be misinterpreted in that way. Note that it is not
expected that more than one TIK TDCI encoding would reside
within a single file.
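The layout above can be sketched as a small header reader. This is only an illustration, not the tik implementation: the function name read_tik_header and the returned dictionary keys are hypothetical, and the sketch assumes the preferred layout in which every comment sits on its own line before the width.

```python
import io

def read_tik_header(stream):
    """Read a Basic TIK header from a binary stream positioned at byte 0."""
    magic = stream.read(3)
    if magic not in (b"P5\n", b"P6\n"):
        raise ValueError("expected a P5 or P6 magic number")
    tik_fields = []  # one list of words per "# TIK ..." structured comment
    numbers = []     # width, height, and maximum value, in that order
    while len(numbers) < 3:
        line = stream.readline()
        if not line:
            raise ValueError("header ended early")
        if line.startswith(b"#"):
            words = line[1:].split()
            # Structured comments need whitespace after "#" and "TIK" as first word.
            if line[1:2] in (b" ", b"\t") and words[:1] == [b"TIK"]:
                tik_fields.append([w.decode("ascii") for w in words[1:]])
            continue  # unstructured comments are skipped
        numbers.extend(int(token) for token in line.split())
    width, height, maxval = numbers[:3]
    samples = 3 if magic == b"P6\n" else 1      # values per width unit
    bytes_per_value = 1 if maxval < 256 else 2  # two bytes for 256..65535
    return {
        "magic": magic[:2].decode("ascii"),
        "tik_fields": tik_fields,
        "width": width,
        "height": height,
        "maxval": maxval,
        "initial_image_bytes": width * height * samples * bytes_per_value,
    }
```

After the call returns, the stream is positioned at the first byte of the initial image, so reading initial_image_bytes bytes leaves it at the start of the TDCI encoded data.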
Header Structured Comments
The header in a basic TIK format file specifies everything about
the encoded data using structured comments that begin with
"#", one or more spaces or tabs, "TIK", one or
more spaces or tabs, and then a series of space or tab separated
words. Each word is either a decimal integer ASCII number or a
keyword that does not start with a digit nor negative sign
("-"). There may be a variable number of words in a
structured comment, but the first word defines how the other
words will be treated, and the sequence ends at the end of
the line.
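As a rough illustration of that grammar, one structured comment line can be split into words as follows. The helper name parse_tik_comment is hypothetical; the sketch treats any word starting with a digit or "-" as a decimal integer and everything else as a keyword, which is our reading of the rule above.

```python
import re

# "#", one or more spaces or tabs, "TIK", one or more spaces or tabs, then words.
_TIK_LINE = re.compile(r"^#[ \t]+TIK[ \t]+(.*)$")

def parse_tik_comment(line):
    """Return the list of words in a structured comment, or None otherwise."""
    match = _TIK_LINE.match(line.rstrip("\r\n"))
    if match is None:
        return None  # unstructured comment or not a comment at all
    words = []
    for token in match.group(1).split():
        if token[0].isdigit() or token[0] == "-":
            words.append(int(token))  # decimal integer ASCII number
        else:
            words.append(token)       # keyword
    return words
```

The first element of the returned list is the word that determines how the remaining words are treated.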
The V structured comment must appear before any other
structured comments in the file, normally as the second line of
the file. However, here are the currently-defined structured
comment types listed in alphabetical order:
-
B number
-
The time that the first frame begins at is number
nanoseconds. This value is most useful for setting the times
for separate video or frame sequences to be merged. Note that
for Canon PowerShots under CHDK the camera time is measured in
1/1000s ticks, and using that directly rather than nanoseconds
avoids 64-bit math in camera. The CHDK time can thus be output
as the time in ticks followed by six zeros, e.g., a time of 56
ticks would become B 56000000.
-
E number
-
An EV (exposure value) to be used for approximately scaling
pixel values into known luminances....
-
F number
-
The time per frame is number nanoseconds. This value
may be computed from the framerate, FPS (frames per second), as
((int) ((1000000000.0 / FPS) + 0.5)).
-
G number
-
The gamma by which the samples should be decoded is
number. Although some formats use a linear (gamma 1.0)
encoding, typically, the dynamic range of image data is
compressed. In truth, it usually is not compressed using a
simple gamma value, but using an approximately correct gamma
value when operating on pixel values (e.g., averaging values)
helps preserve approximate linearity. For example, typical JPEG
data is approximately scaled by pow(datum, 1.0/2.2), so
a roughly linear scaled value can be obtained by pow(datum,
2.2). Such an encoding would be marked with
G 2200000, which may be computed from the decoding gamma
as ((int) ((gamma * 1000000.0) + 0.5)). Note that the
scaling here is by 1000000, not 1000000000 as is used for times.
-
R number numberXdiv numberYdiv
-
The capture used a rolling shutter scan with the time at which
the pixel at coordinates X,Y is sampled being offset from the
start time by
((X*number/numberXdiv)+(Y*number/numberYdiv)).
If either of numberXdiv or numberYdiv is zero,
it means that dimension suffers no delays; if negative, it means
the corresponding axis is scanned in reverse order (large to
small). Note that in some TIK file formats, this also changes
the pixel walk order to match the rolling scan order. For the
purposes of TIK files, a rolling shutter is any type of shutter
that causes this type of temporal skew of sampling of pixel
values based on X,Y coordinates. This happens with most
electronic shutters, but also occurs using a mechanical focal
plane shutter. Suppose you are capturing 10 frames/second using
a DSLR with a focal plane shutter that has a flash sync of 1/100
second, but is set to expose for 1/50 second. The flash sync
speed implies that it takes approximately 0.01s for the curtain
to open on the far side of the sensor, so we would encode that
starting with R 10000000 (i.e., ((int)
((1000000000.0 * 0.01) + 0.5))), meaning that rather than
all pixels being sampled over the interval 0-0.02s, pixels are
sampled at linearly-varying times as a function of their X,Y
coordinates. For example, consider capturing a 320x240 image
using a rolling shutter. "R 10000000 0 239" would
specify that an entire line is read simultaneously, but line Y
is read at time ((X*0)+(Y*10000000/239))ns offset from the frame
start (note that Y would be going from 0 to 239). The scan order
can be more complex; for example, "R 10000000 -76560
240" would mean that the pixels are sampled in the same
line order just described, but within each line X coordinates
are traversed in reverse order, so that pixel X,Y is sampled
starting at (((319-X)*10000000/76560)+(Y*10000000/240))ns. Note
that 76560 is 319*240. Even specifying scan orders with temporal
gaps between lines or columns is possible using this notation.
-
T number
-
The shutter open time per frame is number nanoseconds.
This value may be computed from the shutter speed, Tv (in
seconds), as ((int) ((1000000000.0 * Tv) + 0.5)). The
sum of this time and the last pixel's rolling shutter delay is
assumed to always be no longer than the time per frame.
-
V version format_name ...
-
This must be the very first structured comment in the
.tik file. The version is an 8-digit
number specifying the standard compliance date of the TIK
encoding. For example, 20160712 would mean that this
file is formatted as specified by the TIK standard that was in
effect on July 12, 2016. The format_name can be any of
those described later in this document, some of which require
additional arguments.
-
X number
-
The X dimension (width) of the image data is number.
The units are those of pixel data blocks. For RGB
data, each tuple of three values counts as one unit; for
UYVYYY data, each tuple of six values counts as one
unit.
-
Y number
-
The Y dimension (height) of the image data is number.
-
Z number
-
The Z dimension (maximum value) of the image data is number.
For example, 100 would mean single-color-channel values
are all between 0 and 100 inclusive. This can also be considered as
setting the white point.
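The numeric fields above can be computed with a few one-line helpers. The sketch below restates the formulas given for B, F, G, and T and gives one possible reading of the R offset rule; all function names are hypothetical, the image width and height parameters are our addition (needed to reverse an axis), and the offset is returned as a floating-point number of nanoseconds, whereas a real tool might well use integer math.

```python
def ns_from_seconds(seconds):
    """T field: a time in seconds as integer nanoseconds."""
    return int(seconds * 1000000000.0 + 0.5)

def frame_time_ns(fps):
    """F field: time per frame in nanoseconds for a given framerate."""
    return int(1000000000.0 / fps + 0.5)

def gamma_field(gamma):
    """G field: decoding gamma scaled by 1000000 (not 1000000000)."""
    return int(gamma * 1000000.0 + 0.5)

def begin_from_chdk_ticks(ticks):
    """B field: CHDK 1/1000s ticks written as nanoseconds (six zeros added)."""
    return ticks * 1000000

def rolling_offset_ns(x, y, number, xdiv, ydiv, width, height):
    """Sampling delay of pixel (x, y) under 'R number xdiv ydiv'."""
    def axis_delay(pos, div, size):
        if div == 0:
            return 0.0                # this dimension suffers no delay
        if div < 0:
            pos = (size - 1) - pos    # axis scanned from large to small
            div = -div
        return pos * number / div
    return axis_delay(x, xdiv, width) + axis_delay(y, ydiv, height)
```

For the 320x240 example with "R 10000000 0 239", rolling_offset_ns(0, 239, 10000000, 0, 239, 320, 240) gives 10000000.0, i.e., the last line is sampled 0.01s after the first.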
Time Domain Continuous Image Data Format
Different (version, format_name) tuples in the
TIK V structured comment can imply different encodings
of the data. The encoding of a tuple (a, b)
is equivalent to that specified for (version,
format_name) where b is equal to
format_name and version is the largest standard
release value not greater than a. Valid releases are:
-
20160721 CONVERT pattern numberBegin numberEnd
-
This .tik file is just a header describing image data,
and need not contain any pixel values. Instead, it specifies a
pattern for naming one or more files, each of which
holds image pixel data in any still image format that the
ImageMagick convert tool can transform into a
P6 file. The pattern is taken as a format
string which is used with sprintf and one integer
parameter to produce each file name. The integer parameter first
has the value numberBegin, and is incremented by 1 each
time a file is processed, ending with the last value not greater
than numberEnd. For example, "CONVERT IMG%05u.JPG
1 4" would attempt to process the sequence of images
IMG00001.JPG, IMG00002.JPG,
IMG00003.JPG, and IMG00004.JPG; if any image
cannot be opened, it will be skipped, but still counted against
the framerate. For example, if there was no
IMG00002.JPG, but a structured comment set F
1000000000, then the three other frames would be
interpreted as spanning time intervals from 0..1s, 2..3s, and
3..4s (this behavior can be useful for processing timestamped
surveillance still captures). The output from convert
is a P6 file, so the header of this .tik file
should also start with P6; however, treating this
.tik header as a P6 image results in undefined
behavior -- the file could hold any image, or could even claim
the image dimensions are 0,0 with a maximum value of 0.
-
20160721 FFMPEG filename
-
This .tik file is just a header describing image data,
and need not contain any pixel values. Instead, the pixel values
are extracted using ffmpeg to decode the contents of
the video in the file named filename (if
filename is omitted, the video filename is assumed to
be the next argument on the tik command line). In this
way, TDCI information can be specified for arbitrary video files
without needing to incorporate such information in the files.
The output from ffmpeg is a stream of P6
files, so the header of this .tik file should also
start with P6; however, treating this .tik
header as a P6 image results in undefined behavior --
the file could hold any image, or could even claim the image
dimensions are 0,0 with a maximum value of 0.
-
20160712 UYVYYY
-
The storage format is actually just a sequence of P5
images, each including a header, in sequence. That's it. There
are just two issues. The first is that the headers after the
first do not need to have the structured comments in them. The
second is that the UYVYYY values used inside CHDK are unsigned
for Y, but signed for U and V, and this is maintained in the
TIK file. Thus bytes 0 and 2 of each group of 6 bytes are
signed. The CHDK color conversion formulas depend on that:
R = min(max((((Y << 12) + (V * 5743) + 2048) >> 12), 0), 255)
G = min(max((((Y << 12) - (U * 1411) - (V * 2925) + 2048) >> 12), 0), 255)
B = min(max((((Y << 12) + (U * 7258) + 2048) >> 12), 0), 255)
-
20160712 RGB
-
The header and initial image are processed normally. Next, a
single 0 byte is output. The image is scanned in an order
determined either by the rolling shutter specification or in the
default increasing Y, nesting increasing X, pixel order (which
is also the standard order for PGM and PPM files... what we
would encode as two non-negative numbers for the rolling
shutter scan order R, where the non-zero Y value is at
least width times larger than the X value). Treating a tuple of
Red, Green, and Blue values as a pixel value, encode the number
of pixels unchanged from their previously-recorded values (the
span), then output the three raw pixel values. The span can be
greater than the number of pixels in an image, representing
entire frames with expected values. If the span is 0-127, it is
output as a single byte; otherwise, the 7 least-significant bits
of span are output in a byte ORed with 0x80, the remaining span
is shifted right by 7 bits, and the process repeated until no 1
bits remain in span. The pixel values are output as one byte
each for R, G, and B. Thus, a span of 257 pixels followed by
R, G, B values of 0x11, 0x22, 0x33 would be encoded as the five
bytes: 0x81, 0x02, 0x11, 0x22, 0x33.
-
20160629 UYVYYY
-
This format is now deprecated.
The header and initial image are processed normally. Next, a
single 0 byte is output. The image is scanned in increasing Y,
nesting increasing X, pixel order (which is also the standard
order for PGM and PPM files... what we would encode as
YX for rolling shutter scan order). Although a tuple
of UYVYYY values really represents four pixels, it is treated as
one unit value. Encode the number of units unchanged from their
previously-recorded values (the span), then output the six-byte
UYVYYY value. The span can be greater than the number of units
in an image, representing entire frames with expected values.
If the span is 0-127, it is output as a single byte; otherwise,
the 7 least-significant bits of span are output in a byte ORed
with 0x80, the remaining span is shifted right by 7 bits, and
the process repeated until no 1 bits remain in span. Thus, a
span of 257 units followed by UYVYYY values of 0x11, 0x22,
0x33, 0x44, 0x55, 0x66 would be encoded as the eight bytes:
0x81, 0x02, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66. A subtle point
is that the UYVYYY values used inside CHDK are unsigned for Y,
but signed for U and V; in the TIK file, U and V values are
converted to unsigned by adding 128. Thus, one must subtract 128
from the encoded U and V values to recover U and V values
suitable for use with the CHDK color conversion formulas:
R = min(max((((Y << 12) + (V * 5743) + 2048) >> 12), 0), 255)
G = min(max((((Y << 12) - (U * 1411) - (V * 2925) + 2048) >> 12), 0), 255)
B = min(max((((Y << 12) + (U * 7258) + 2048) >> 12), 0), 255)
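To make the span rule and the color conversion concrete, here is a minimal sketch that follows the byte-level span rule literally (7 low bits per byte, 0x80 set on every byte except the last) and decodes one record of the 20160629 UYVYYY format. All function names are hypothetical. Two points are assumptions on our part: that the four Y bytes of a unit are four adjacent pixels sharing one U and V, and that the green channel subtracts its U and V terms, as the usual BT.601-style conversion does.

```python
def encode_span(span):
    """Encode a non-negative span, least-significant 7 bits first."""
    out = bytearray()
    while span > 127:
        out.append((span & 0x7F) | 0x80)  # more bytes follow
        span >>= 7
    out.append(span)                      # final byte has the top bit clear
    return bytes(out)

def decode_span(data, pos=0):
    """Return (span, position of the first byte after the span)."""
    span = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        span |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            return span, pos

def clamp8(value):
    return min(max(value, 0), 255)

def yuv_to_rgb(y, u, v):
    """Fixed-point conversion; y is unsigned, u and v are signed."""
    r = clamp8(((y << 12) + v * 5743 + 2048) >> 12)
    g = clamp8(((y << 12) - u * 1411 - v * 2925 + 2048) >> 12)  # assumed signs
    b = clamp8(((y << 12) + u * 7258 + 2048) >> 12)
    return r, g, b

def decode_unit_record(data, pos=0):
    """Decode one (span, UYVYYY unit) record of the 20160629 format."""
    span, pos = decode_span(data, pos)
    u_raw, y0, v_raw, y1, y2, y3 = data[pos:pos + 6]
    u = u_raw - 128  # this format stores U and V with 128 added
    v = v_raw - 128
    pixels = [yuv_to_rgb(y, u, v) for y in (y0, y1, y2, y3)]
    return span, pixels, pos + 6
```

Under this reading, encode_span(257) yields the two bytes 0x81, 0x02. For the 20160712 UYVYYY format the same yuv_to_rgb would apply, but U and V would be read as signed bytes instead of subtracting 128.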
The tik Program
The demonstration program for TDCI imaging is called
tik -- just like the file format. However, it does not
just process .tik files. For example, it can be used to
extract virtual exposures from conventional video files, etc.
Much of that flexibility comes from use of ffmpeg to decode
videos, or ImageMagick convert to convert still images from
various formats, into sequences of PPMs that tik can
process. Note that tik literally runs ffmpeg
as a separate program, which means that either tool can be
updated completely independently without the need to recompile
from source code. At some point, tik might also mutate
into a plugin for ffmpeg, but not right now... the same
is true about having a graphical user interface.
tik Command Line Options
Like so many other video processing applications, the command line for
tik is fairly complex. Note that number refers to an
arbitrary floating-point number which can be specified directly or as
1/value notation where value is a floating-point
number. Thus, 0.01 also can be written as 1/100. It is
useful to note that the command line options are not identical to the fields
in TIK structured comments; for example, TIK expresses framerate as an integer
time per frame in nanoseconds, while the command line option uses a
floating-point FPS (Frames Per Second) value.
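A minimal sketch of how such a number argument might be read is shown below. The name parse_number is hypothetical, and the sketch accepts only the two forms described above (a plain floating-point value, or 1/value).

```python
def parse_number(text):
    """Parse a tik-style number: either 'value' or '1/value'."""
    if text.startswith("1/"):
        return 1.0 / float(text[2:])
    return float(text)

# Example: a command-line framerate expressed as the integer
# nanoseconds-per-frame value used by the TIK F structured comment.
fps = parse_number("24")
frame_ns = int(1000000000.0 / fps + 0.5)
```

Here parse_number("1/100") and parse_number("0.01") give the same value, and frame_ns is 41666667.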
Currently, the following options are recognized:
-
-anumber
-
Set the shutter angle to number (which can only be done after a
framerate has been set using -f). In "old school" cinematography, a
rotary disc shutter blade was synchronized to the advance of the film so that
the film was covered during advance. This meant that the shutter was open for
something less than 360 degrees of the shutter rotation cycle, and it became
common to refer to this number as the shutter angle. Of course, the angle is
merely specifying the shutter speed in seconds, Tv, relative to the FPS
(Frames Per Second). The formula is Tv=(angle/360)/FPS. Typical
cinematography used a shutter angle of about 180, so 24 FPS frames would each
use a shutter speed of 1/48s. In modern video capture, it is common that Tv
is determined entirely by the exposure conditions, and 24 FPS video might well
use a Tv of 1/500s, or a shutter angle of less than 18 degrees, often with
obvious discontinuity of motion. For virtual exposures, tik allows
setting any shutter angle greater than 0 degrees and aligns the exposures at
the start of the frame time. Thus, 180 exposes during the first half of the
frame time, 360 would make framerate and exposure time identical, and a number
greater than 360 includes temporal data from after the end of the frame time.
-
-bnumber
-
Set the time at which exposures begin to number seconds from the start of the
TDCI input. Use of this option implies that the goal is creation of one or
more virtual exposures.
-
-efilename
-
Specify the error model to use as the one in the PPM file filename.
An error model is expressed as a 256x256 RGB P6 PPM image, and can be edited
using ordinary image editors, such as gimp. However, if
filename is omitted, the -e command specifies that an error
model image should be created from the input TDCI, which is assumed to be a
static, constant, scene containing a plurality of color and brightness values.
Note that these error models essentially account for all types of pixel value
error, including photon shot noise, so using an error model tuned to your
particular camera, and even specific tone curve and ISO settings, can produce
higher-quality results. If no -e is specified and an error model is
needed, a built-in default one will be used.
-
-fnumber
-
Set the FPS (Frames Per Second) to number.
-
-gnumber
-
Set the gamma of encoded image data to number. Gamma is a way to
specify an exponential tonal remapping used to simulate the fact that human
eyesight is logarithmically sensitive to light while sensors used in digital
cameras have more linear response. Most image processing works best on data
with linear gamma, and this is the default assumed for inputs if no gamma
value is specified. However, typical JPEG or PPM image files are actually
encoded with an effective gamma of roughly 2.2, so specifying this as
-g2.2 will allow internal processing to better preserve approximate
linearity. The actual gamma used in encoding images is often not a simple
exponential function and the approximate value is not always 2.2 -- Apple
preferred a gamma of 1.8. However, small errors in the gamma handling are
generally not critical, as the gamma correction is only applied
internal to tik: the pixel values output have precisely the
same gamma as the ones input.
-
-i
-
Toggle interactive mode. In interactive mode, tik becomes rather
verbose with output to stderr (file descriptor 2), but the status
messages and progress indications can be reassuring and useful. By default,
interactive mode is set only if stderr is a TTY device (a terminal).
-
-mfilename
-
This option is not yet implemented and may change. Set
the exposure time map file name to filename. The values
in the map image are taken to be fractions of the specified
virtual shutter time for each exposure that each pixel should
use. The virtual exposure for a pixel location is (Tv*G)/255
centered at the position in the Tv interval which is
(Tv/2)+(((B-R)/255)*(Tv/2)) offset from the start.
Thus, reddish colors are at the start of the interval and
blueish ones are at the end; white would represent sampling
the entire Tv interval.
-
-ninteger
-
Set the number of frames to process as integer. On input, the default
is to process all frames. On output, the default is to process just a single
frame.
-
-ofilename
-
Set the output file name to filename. There are suitable defaults if
no name is specified. In the usual unix convention, specifying
- as the filename will catenate all output to
stdout.
-
-pnumber
-
Set the minimum acceptable probability that two pixel values are equivalent to
number percent. This, combined with the error model, controls the
statistical merging of pixel values in TIK files. In effect, this allows
specification of bounds on considering a changed pixel value to be changed
only by noise -- not a real change in scene appearance. The analogy is
somewhat imprecise, but the usual values of 32, 5, and 0.3 would roughly
correspond to accepting one, two, or three standard deviations as still being
the same value within noise tolerance. The default value is 4.55, or two
standard deviations. Smaller values might reduce TIK file sizes and can
increase effective dynamic range, but at the expense of some loss of temporal
accuracy.
-
-qnumber
-
Set the encoding quality to number percent. This controls a variety
of internal mechanisms as well as directly setting the encoding quality for
any JPEG output files. In general, higher values require more compute time and
often will generate larger files. It is generally best to stay in the range
from 75 to 100.
-
-tnumber
-
Set the shutter speed in seconds, Tv, to number. It does not make
sense to use this option in the same command line as -a.
-
-v
-
The input is a video (frames) to be converted to a TIK TDCI format.
-
filename
-
Any filename given is taken to be an input to the program. Inputs in
formats not directly recognized by tik are passed to ffmpeg,
which decodes them and sends the results back to tik via a named pipe
created by tik for that purpose. None of the files named will be modified
in any way, but they may be opened and read more than once in a single
execution of tik.
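The -p values quoted above (32, 5, and 0.3 percent, with a default of 4.55) can be related to standard deviations with a short calculation. The sketch below assumes a normal distribution and a two-sided test, which is only the rough analogy the option description mentions, not a statement about how tik applies its error model; the function name is hypothetical.

```python
import math

def percent_outside(sigmas):
    """Two-sided probability, in percent, that a normally distributed
    value lies more than `sigmas` standard deviations from its mean."""
    return 100.0 * math.erfc(sigmas / math.sqrt(2.0))

for k in (1, 2, 3):
    print(k, round(percent_outside(k), 2))
```

Under that assumption, two standard deviations correspond to about 4.55 percent, which matches the default.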
Examples Of tik Commands
The tik command line is complex enough that usage is not obvious.
Thus, it is useful to show examples of common uses.
Create An Error Model
To create an error model file named myerrmod.ppm from a video
named testchart.avi:
tik -e -omyerrmod.ppm testchart.avi
Encode A Video As A TIK File
Using an error model file named myerrmod.ppm, make a TIK file
named myvideo.tik from a video named myvideo.mp4:
tik -emyerrmod.ppm -omyvideo.tik myvideo.mp4
Create JPEG Virtual Exposures From A TIK File
Use the TIK file named myvideo.tik to extract ten virtual exposures
into files named mystill0.jpg through mystill9.jpg. Make the
sequence of ten virtual exposures start 0.5 seconds into the TDCI,
representing a sequence of frames at 24 FPS and a shutter angle of 90 degrees:
tik -n10 -omystill%d.jpg -b0.5 -f24 -a90 myvideo.tik
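As a sanity check on what that command asks for, the exposure intervals can be worked out from Tv=(angle/360)/FPS. The sketch below only does the arithmetic; the function name is hypothetical, and the assumption that %d in the output name expands to 0 through 9 is taken from the file names given in the example.

```python
def virtual_exposures(count, begin_s, fps, angle_deg):
    """Return (start, end) times in seconds for each virtual exposure.
    Each exposure is aligned to the start of its frame time."""
    tv = (angle_deg / 360.0) / fps
    return [(begin_s + i / fps, begin_s + i / fps + tv) for i in range(count)]

# -n10 -b0.5 -f24 -a90
intervals = virtual_exposures(10, 0.5, 24.0, 90.0)
names = ["mystill%d.jpg" % i for i in range(10)]
for name, (start, end) in zip(names, intervals):
    print(name, start, end)
```

Each exposure lasts 1/96 second, and successive exposures start 1/24 second apart, beginning 0.5 seconds into the TDCI.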
Code
The following code is internally ready for testing: