References: EE599/EE699 Program Optimization & Parallelization
All reference materials will be posted here.
HLL Languages & Coding Styles
The following are reference materials that describe programming languages,
compiler issues, and coding styles....
- C Language Standard
- The August 3, 1998 draft of the C standard.
- Writing Small And Fast Software
- The set of slides (by Felix von Leitner) that we discussed in class.
Methods For Locating Relevant Code Sections
The following tools can help you locate the code sections that
you should optimize. Always remember Amdahl's Law:
if 1/Kth of the program is not affected by an improvement, the
best speedup you can hope for is K.
- time
- The unix/linux/shell time command is very useful as a crude
first pass in that only user time can be improved by normal means.
Real (wall clock) and system (OS processor time spent on
behalf of the user) times also are reported.
- gprof
- The GNU standard tool for detailed performance analysis.
Best for determining which functions to optimize within a large program.
- VTune
- Intel's performance tuning software that uses the hardware
performance registers. Available for both Windows and Linux, but
only an evaluation period is free.
- CodeAnalyst
- AMD's answer to Intel's VTune, the good news is that it's free,
the bad news is that it only works with Windows and Microsoft Visual C++.
- PAPI
- PAPI (Performance Application Programming Interface)
provides a free and consistent interface and tools for use of
performance counter hardware, primarily for a Linux environment.
- Valgrind
- A very nice simulation tool that tracks program behaviour
using instruction-level simulation of unmodified executables
using a taggged-memory system to tell you about references to
uninitialized data, malloc/free activity, etc. Only real problem
is that MMX/3DNow!/SSE are not supported.
- KCachegrind
- A detailed cache simulator using Valgrind. Asks hardware for
actual cache parameters and simulates behavior; thus, there is no
interference from the monitoring code.
Compiler-Like Software Tools
The following are software tools that can help you optimize your
code. In one sense or another, most of these software tools are
essentially compilers, but they operate with a conventional C
compiler (typically, GCC)....
- C-Mix
- CMix is a "program specializer" or "partial evaluator" that
takes C code, uses compile-time constants to simplify it
(interprocedurally) by compile-time evaluation, and then
generates C code for the "residual" runtime computation.
- ATLAS (Automatically Tuned Linear Algebra Software)
- ATLAS generates a machine-specific optimized C/F77
BLAS/LAPACK by generating many alternative matrix multiply
codings, timing them, and keeping the fastest. The latest
versions are 3DNow!/SSE/SSE2 aware.
- SWAR: SIMD Within A Register
- This is the home of everything we've been doing for MMX,
3DNow!, SSE, etc., including inline GCC interfaces and Scc, the
SWARC module compiler.
Superoptimizers, Genetic Programming, Etc.
When you can't figure-out a better way, why not let the computer
do it for you? These are techniques based on having the computer
search for better implementations.
- Superoptimizer: a look at the smallest program
- Henry Massalin's 1987 paper that started this whole approach.
- Superopt, the GNU Superoptimizer
- The GNU superoptizier works for a variety of targets.
- Aha! and
PDF documentation
- A RISC-oriented superoptimizer.
- Denali: a goal-directed superoptimizer
- A theorem-proving superoptimizer (code not available).
- Genetic Programming
- A method whereby programs to solve problems are evolved from
an initial population of random programs. Lots of links here.
John Koza is generally credited with founding this approach.
Hacks & Other Sneaky Programming Methods
The following are tricks of the programming trade that, although
often ponderous in operation, can yield significant speedups for
many common tasks.
- The Aggregate Magic Algorithms
- Our little collection of hacks and pointers to other people's hacks.
- Hacker's Delight
- A site describing the book by this title and providing materials very
similar to those of The Aggregate Magic Algorithms site.
- My comments/code for Bresenham linear interpolation
- Here's a quickie overview of the incremental differential stuff....
Particular Processors & Instruction Sets
The following references describe in detail how particular instruction
sets and architectural implementations work....
- IA-32 Intel Architecture Software Developer's Manual (Part 1)
- A good reference for the base IA32 architecture.
- IA-32 Intel Architecture Software Developer's Manual (Part 2)
- A good reference for the base IA32 instruction set.
- AMD-K6 Processor Multimedia Extensions (MMX)
- The original AMD documentation on MMX (a very nice summary).
- AMD K6 Processor Code Optimization
- This is the definitive reference for K6 (and K6-2) code tweaks.
- AMD Athlon Processor, x86 Code Optimization Guide
- This is the definitive reference for Athlon code tweaks.
- Intel Pentium 4 and Intel Xeon Processor Optimization
- This is the definitive reference for Pentium 4 code tweaks.
- PowerPC Compiler Writer's Guide
- The definitive reference for PowerPC code tweaks.
- x86 optimization references
- A collection of (old) pointers to documents about x86 optimizations.
Advanced Program Optimization & Parallelization.