README for libsse.

General Information:
	The home site for libsse is off the SWAR homepage at Purdue University:
		http://shay.ecn.purdue.edu/~swar

	libsse was written by Randy Fisher, who can currently be contacted at:
		rfisher@ecn.purdue.edu
	It is based on libmmx, which was written by Hank Dietz and Randy Fisher.
	Hank Dietz can currently be contacted at:
		hankd@ecn.purdue.edu

	Please include "libsse" in the subject line of any correspondence
	pertaining to the library.

	Please see the file "bug-reports" for information on reporting problems
	with the library.

	Please read the file INSTALL for information on building and installing
	the library, or the file UPGRADE for information on upgrading from an
	earlier version of libsse.

Introduction:

	Intel's SSE family of multimedia extensions to the x86 instruction set
	contains CPU instructions which allow a single operation to be applied
	to multiple data items simultaneously.  This data is stored in a
	"partitioned" floating-point (FP) register, meaning that the register
	is logically divided into multiple independent sections called
	"fields", each of which can hold a single datum.  For example, a
	128-bit register may be partitioned into four 32-bit fields, with the
	first consisting of bits 0 through 31, the second consisting of bits
	32-63, and so forth.

	Throughout this document, and all SWAR literature, the notation "AxB"
	will be used to indicate a register partitioning of A fields of B bits
	each.  "AxB" is read as "A by B".  For example, a 64-bit register can
	be partitioned as 4 fields of 16 bits each (4x16, read "4 by 16").  The
	notation "AxBf" indicates A fields of B bits each, containing
	floating-point data.

	Once the data has been stored in the partitioned register, SSE
	instructions can be used to operate simultaneously on all the fields of
	the register.  Most of these instructions are "non-interfering",
	meaning that their application to one field is independent of their
	application to any other field of the same register.  In this manner,
	a single operation can be applied to multiple data streams
	concurrently.  Thus, the SSE instructions treat the fields of a
	partitioned register as though they were equivalent registers on
	separate nodes of a SIMD parallel computer.  We refer to this type of
	processing as SWAR (SIMD Within A Register).
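
	The "non-interfering" behaviour described above can be sketched in
	portable C.  This is only a model of what a single packed SSE
	instruction does in hardware; the names vec4x32f and add_4x32f are
	hypothetical and are not part of libsse.

```c
#include <stddef.h>

/* A 4x32f partitioned value, modelled in portable C (no SSE needed).
 * Hypothetical type for illustration only; not a libsse type. */
typedef struct {
	float field[4];
} vec4x32f;

/* Add two partitioned values field by field.  Field i of the result
 * depends only on field i of each operand, so no field interferes
 * with any other. */
static vec4x32f add_4x32f(vec4x32f a, vec4x32f b)
{
	vec4x32f r;
	for (size_t i = 0; i < 4; i++)
		r.field[i] = a.field[i] + b.field[i];
	return r;
}
```

	On SSE hardware the loop collapses into one instruction that
	processes all four fields at once.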

	This library is intended to provide C function level support for the
	SSE instruction set.  It does so by providing a data type for
	partitioned registers, and functions which allow operands of this type
	to be passed to an SSE instruction and returned from it, loaded into
	SSE registers, and stored from SSE registers to memory.  All of the
	original SSE instructions are supported by the library.

	The libsse functions access SSE instructions via inline assembly and
	use the SSE support provided by the GNU assembler (gas) once this
	support is added.  It is possible to modify the library sources to
	work with earlier versions of the assembler; however, we suggest that
	you upgrade to a newer version of the assembler if possible.


The sse_t data type union:
	
	Each field holds a signed, 32-bit floating-point value.  libsse stores
	the contents of each register as an array of 4 single-precision
	elements.  This type is defined as the union "sse_t" in the header
	file "sse.h":

	typedef union {
		float	sf[4];	/* Single-precision (32-bit) value */
	} sse_t;

	Within an application, variables of type sse_t are declared as normal,
	and can be initialized by initializing the "elements" of the chosen
	partitioning as would be done for an array:
			sse_t a;
			sse_t c = {{456.45f, 98.6f, -12.3f, -4.2f}};

	Values may be set within the application by setting the elements of
	the chosen partitioning:
			c.sf[3] = 3.2; c.sf[2] = 2.7;
			c.sf[1] = 1.2; c.sf[0] = 0.0;
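
	As a minimal sketch of the two styles described above, the
	following repeats the union locally so that it compiles without
	"sse.h"; a real application would include the header instead.  The
	helpers make_sse() and sum_fields() are hypothetical and are not
	libsse functions.

```c
/* Local copy of the union shown above, so the sketch is
 * self-contained.  A real program would #include "sse.h". */
typedef union {
	float	sf[4];	/* Single-precision (32-bit) value */
} sse_t;

/* Hypothetical helper: initialize all four fields at declaration.
 * The outer braces are for the union, the inner ones for sf[]. */
static sse_t make_sse(float f0, float f1, float f2, float f3)
{
	sse_t v = {{f0, f1, f2, f3}};
	return v;
}

/* Hypothetical helper: read the fields back one element at a time. */
static float sum_fields(const sse_t *v)
{
	float total = 0.0f;
	for (int i = 0; i < 4; i++)
		total += v->sf[i];
	return total;
}
```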


Using libsse in an application:

	To use libsse in an application, the header file "sse.h" must be
	#included in the application before any variables of type sse_t are
	declared, and before the first occurrence of a libsse function.

	sse_ok() checks whether the processor supports SSE, returning 1 if it
	does and 0 otherwise.  A program can test this return value to skip
	SSE code, or to run alternative code, when the CPU does not support
	SSE.  mm_support() reports which multimedia extensions the processor
	supports, including MMX, Extended MMX, 3DNow!, and SSE, although this
	library currently supports only SSE.

	If SSE is supported, any of the other libsse functions may be called.
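
	The check-then-dispatch pattern might look like the sketch below.
	So that it compiles without libsse, it uses a stand-in named
	sse_ok_standin() built on the GCC/Clang builtin
	__builtin_cpu_supports(); this is an assumption made for
	illustration and is not how sse_ok() itself is implemented.  The
	names code_path and choose_path() are likewise hypothetical.

```c
/* Stand-in for sse_ok(): returns 1 if the CPU reports SSE, else 0.
 * NOT the libsse implementation; it relies on a GCC/Clang builtin
 * and reports 0 on other compilers or non-x86 targets. */
static int sse_ok_standin(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
	return __builtin_cpu_supports("sse") ? 1 : 0;
#else
	return 0;
#endif
}

/* Hypothetical dispatch: pick the SSE path only when it is safe. */
typedef enum { PATH_SCALAR = 0, PATH_SSE = 1 } code_path;

static code_path choose_path(int sse_available)
{
	return sse_available ? PATH_SSE : PATH_SCALAR;
}
```

	An application would evaluate choose_path(sse_ok()) once at startup
	and call libsse functions only on the PATH_SSE branch.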

	The state of the processor's Streaming SIMD Extensions unit may need
	to be saved before, and restored after, a context switch or an
	exception handler that uses the unit.  The FXSAVE and FXRSTOR
	instructions perform these tasks.

	See the document "functions" for complete descriptions of the libsse
	functions.


Using SSE_TRACE:
	Defining SSE_TRACE before the inclusion of sse.h, either in the source
	or on the compiler command line, enables the printing of trace
	information onto stderr.  This information should be useful for
	debugging and optimizing your code.
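
	The general technique of a compile-time trace switch can be
	sketched as follows.  The macro names DEMO_TRACE and DEMO_TRACE_MSG
	and the function demo_add() are hypothetical; this is not the code
	in sse.h and says nothing about the format of the real trace output.

```c
#include <stdio.h>

/* When DEMO_TRACE is defined (in the source before this point, or
 * with -DDEMO_TRACE on the compiler command line), each traced call
 * prints a line to stderr.  Otherwise the macro expands to nothing. */
#ifdef DEMO_TRACE
#define DEMO_TRACE_MSG(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define DEMO_TRACE_MSG(msg) ((void)0)
#endif

/* A traced operation: the result is the same either way. */
static float demo_add(float a, float b)
{
	DEMO_TRACE_MSG("demo_add");
	return a + b;
}
```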

