References: EE599/EE699 Program Optimization & Parallelization

All reference materials will be posted here.

HLL Languages & Coding Styles

The following are reference materials that describe programming languages, compiler issues, and coding styles....

C Language Standard
The August 3, 1998 draft of the C standard.
Writing Small And Fast Software
The set of slides (by Felix von Leitner) that we discussed in class.

Methods For Locating Relevant Code Sections

The following tools can help you locate the code sections that you should optimize. Always remember Amdahl's Law: if 1/Kth of the program is not affected by an improvement, the best speedup you can hope for is K.

time
The unix/linux/shell time command is very useful as a crude first pass in that only user time can be improved by normal means. Real (wall clock) and system (OS processor time spent on behalf of the user) times also are reported.
gprof
The GNU standard tool for detailed performance analysis. Best for determining which functions to optimize within a large program.
VTune
Intel's performance tuning software that uses the hardware performance registers. Available for both Windows and Linux, but only an evaluation period is free.
CodeAnalyst
AMD's answer to Intel's VTune, the good news is that it's free, the bad news is that it only works with Windows and Microsoft Visual C++.
PAPI
PAPI (Performance Application Programming Interface) provides a free and consistent interface and tools for use of performance counter hardware, primarily for a Linux environment.
Valgrind
A very nice simulation tool that tracks program behaviour using instruction-level simulation of unmodified executables using a taggged-memory system to tell you about references to uninitialized data, malloc/free activity, etc. Only real problem is that MMX/3DNow!/SSE are not supported.
KCachegrind
A detailed cache simulator using Valgrind. Asks hardware for actual cache parameters and simulates behavior; thus, there is no interference from the monitoring code.

Compiler-Like Software Tools

The following are software tools that can help you optimize your code. In one sense or another, most of these software tools are essentially compilers, but they operate with a conventional C compiler (typically, GCC)....

C-Mix
CMix is a "program specializer" or "partial evaluator" that takes C code, uses compile-time constants to simplify it (interprocedurally) by compile-time evaluation, and then generates C code for the "residual" runtime computation.
ATLAS (Automatically Tuned Linear Algebra Software)
ATLAS generates a machine-specific optimized C/F77 BLAS/LAPACK by generating many alternative matrix multiply codings, timing them, and keeping the fastest. The latest versions are 3DNow!/SSE/SSE2 aware.
SWAR: SIMD Within A Register
This is the home of everything we've been doing for MMX, 3DNow!, SSE, etc., including inline GCC interfaces and Scc, the SWARC module compiler.

Superoptimizers, Genetic Programming, Etc.

When you can't figure-out a better way, why not let the computer do it for you? These are techniques based on having the computer search for better implementations.

Superoptimizer: a look at the smallest program
Henry Massalin's 1987 paper that started this whole approach.
Superopt, the GNU Superoptimizer
The GNU superoptizier works for a variety of targets.
Aha! and PDF documentation
A RISC-oriented superoptimizer.
Denali: a goal-directed superoptimizer
A theorem-proving superoptimizer (code not available).
Genetic Programming
A method whereby programs to solve problems are evolved from an initial population of random programs. Lots of links here. John Koza is generally credited with founding this approach.

Hacks & Other Sneaky Programming Methods

The following are tricks of the programming trade that, although often ponderous in operation, can yield significant speedups for many common tasks.

The Aggregate Magic Algorithms
Our little collection of hacks and pointers to other people's hacks.
Hacker's Delight
A site describing the book by this title and providing materials very similar to those of The Aggregate Magic Algorithms site.
My comments/code for Bresenham linear interpolation
Here's a quickie overview of the incremental differential stuff....

Particular Processors & Instruction Sets

The following references describe in detail how particular instruction sets and architectural implementations work....

IA-32 Intel Architecture Software Developer's Manual (Part 1)
A good reference for the base IA32 architecture.
IA-32 Intel Architecture Software Developer's Manual (Part 2)
A good reference for the base IA32 instruction set.
AMD-K6 Processor Multimedia Extensions (MMX)
The original AMD documentation on MMX (a very nice summary).
AMD K6 Processor Code Optimization
This is the definitive reference for K6 (and K6-2) code tweaks.
AMD Athlon Processor, x86 Code Optimization Guide
This is the definitive reference for Athlon code tweaks.
Intel Pentium 4 and Intel Xeon Processor Optimization
This is the definitive reference for Pentium 4 code tweaks.
PowerPC Compiler Writer's Guide
The definitive reference for PowerPC code tweaks.
x86 optimization references
A collection of (old) pointers to documents about x86 optimizations.


Advanced Program Optimization & Parallelization.