Next: 2.5.1 Infinity Up: 2 Numbers Previous: 2.4 Complex Numbers

2.5 Floating Point

Floating point numbers are commonly used to approximate real numbers. Floating point facilities are common in computer hardware, so most floating point operations can be performed very quickly.

There are many different floating point number systems [5, 49, 50, 35], although they are all very similar. A floating point number can be written as:

$$a \times b^c$$

where $a$, $b$, and $c$ are all in a finite subdomain of the integers.
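For $b = 2$, every finite Python `float` (an IEEE 754 double on essentially all current platforms) can be written exactly as $a \times 2^c$ with integer $a$ and $c$. The minimal sketch below recovers one such pair with the standard `math.frexp`. The function name `decompose` is our own, and stripping trailing zero bits from $a$ is a choice made here only to make the pair unique.

```python
import math


def decompose(x: float) -> tuple[int, int]:
    """Return integers (a, c) with x == a * 2**c exactly, with |a| as small as possible.

    Only finite floats are accepted; infinities and NaNs have no such form.
    """
    if not math.isfinite(x):
        raise ValueError("only finite numbers have the form a * 2**c")
    if x == 0.0:
        return 0, 0
    mant, exp = math.frexp(x)   # x == mant * 2**exp with 0.5 <= |mant| < 1
    a = int(mant * 2**53)       # exact: a double's significand has at most 53 bits
    c = exp - 53
    while a % 2 == 0:           # drop trailing zero bits so the pair is unique
        a //= 2
        c += 1
    return a, c


for x in (0.75, -6.0, 0.1):
    a, c = decompose(x)
    assert math.ldexp(a, c) == x  # a * 2**c reconstructs x exactly
    print(f"{x!r} = {a} * 2**{c}")
```

Running it shows, for example, that the double nearest 0.1 is exactly $3602879701896397 \times 2^{-55}$.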

All of the numbers in a particular floating point number system share a single choice of base $b$. The set of floating point numbers with $b = 2$ is denoted by $\mathbb{F}_2$. $\mathbb{F}_2$ is the system of choice for computer implementations since $a$ and $c$ are usually stored in binary.

Implementations usually represent $a$ and $c$ in a fixed number of bits. A common example is IEEE 754 [5] 64-bit double precision, where $a$ has a sign bit and 53 bits of magnitude (fifty-two stored explicitly, plus a leading bit that is implicit for normalized numbers), while the exponent is stored in 11 bits using a biased binary representation. Such a system is compactly expressed as $\mathbb{F}_2(53, -1022, 1023)$; the two remaining exponent values are reserved, one for zero and denormalized numbers and the other for infinities and NaNs. The floating point operations described below are required in IEEE 754 compliant numerical libraries.
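The double precision layout can be inspected directly. The sketch below assumes Python's `float` is an IEEE 754 binary64 value (true on essentially all current platforms) and uses the standard `struct` module to split a double into its sign bit, 11-bit biased exponent field and 52 stored fraction bits, then rebuilds an integer pair $(a, c)$ with $x = a \times 2^c$. The helper names `fields` and `significand_exponent` are hypothetical. Exponent field values 0 and 2047 are the two reserved values.

```python
import struct

BIAS = 1023       # binary64 exponent bias
FRAC_BITS = 52    # explicitly stored significand bits


def fields(x: float) -> tuple[int, int, int]:
    """Split a binary64 value into (sign bit, biased exponent field, fraction field)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> FRAC_BITS) & 0x7FF      # 11 bits
    fraction = bits & ((1 << FRAC_BITS) - 1)    # 52 bits
    return sign, exponent, fraction


def significand_exponent(x: float) -> tuple[int, int]:
    """Return integers (a, c) with x == a * 2**c, read straight from the bit fields."""
    sign, exponent, fraction = fields(x)
    if exponent == 0x7FF:
        raise ValueError("exponent field 2047 is reserved for infinities and NaNs")
    if exponent == 0:
        # Reserved value 0: zero or a denormalized number, no implicit leading bit.
        a, c = fraction, 1 - BIAS - FRAC_BITS
    else:
        # Normalized: the leading (53rd) bit is implicit and always 1.
        a, c = (1 << FRAC_BITS) | fraction, exponent - BIAS - FRAC_BITS
    return (-a if sign else a), c


print(fields(1.0))                # (0, 1023, 0): sign 0, biased exponent 1023, fraction 0
print(significand_exponent(1.0))
```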

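Formally, the system $\mathbb{F}_b(p, m, M)$ includes all numbers which may be expressed as $a \times b^c$ and satisfy:

$$|a| < b^p \;\wedge\; m - (p - 1) \le c \le M - (p - 1)$$

where $a$ and $c$ are integers. The subtraction present in the right conjunct shifts the "decimal place" so as to relate the exponent range with unity, that is, with the significand $a / b^{p-1}$ whose leading digit sits in the units place, rather than with the integer $a$ itself.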
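The two conditions can be checked mechanically. This sketch assumes they read $|a| < b^p$ and $m - (p-1) \le c \le M - (p-1)$ for integers $a$ and $c$; the function `in_system` is a hypothetical name. It tests one particular pair $(a, c)$: a number belongs to the system if *some* pair representing it passes. With the double precision parameters $b = 2$, $p = 53$, $m = -1022$, $M = 1023$, the extreme pairs reproduce the largest finite double and the smallest positive (denormalized) double.

```python
import math
import sys


def in_system(a: int, c: int, b: int, p: int, m: int, M: int) -> bool:
    """Check one pair (a, c) against |a| < b**p and m - (p-1) <= c <= M - (p-1)."""
    return abs(a) < b**p and m - (p - 1) <= c <= M - (p - 1)


# Double precision parameters: b = 2, p = 53, m = -1022, M = 1023.
B, P, LO, HI = 2, 53, -1022, 1023

largest = (B**P - 1, HI - (P - 1))   # a = 2**53 - 1, c = 971
smallest = (1, LO - (P - 1))         # a = 1,         c = -1074

assert in_system(*largest, B, P, LO, HI)
assert in_system(*smallest, B, P, LO, HI)

# The extreme pairs match the largest finite double and the smallest positive double.
assert math.ldexp(*largest) == sys.float_info.max
assert math.ldexp(*smallest) == 5e-324

# The test is on a pair, not a number: 2**53 fails as (2**53, 0) but passes as (1, 53).
assert not in_system(2**53, 0, B, P, LO, HI)
assert in_system(1, 53, B, P, LO, HI)
```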

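Another view of the floating point numbers is to imagine the numbers of $\mathbb{F}_b(p, m, M)$ as being described by $p$ base-$b$ digits multiplied by $b$ raised to an exponent between $m$ and $M$:

$$\pm\, d_0.d_1 d_2 \ldots d_{p-1} \times b^e, \qquad 0 \le d_i < b, \qquad m \le e \le M$$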
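Moving from the pair $(a, c)$ to the digit view is a change of bookkeeping: write $|a|$ as $p$ base-$b$ digits, padding with leading zeros, and take $e = c + p - 1$. The sketch below assumes the input already satisfies $|a| < b^p$; the function name `to_digits` is hypothetical, and base 10 is used in the examples only for readability.

```python
def to_digits(a: int, c: int, b: int, p: int) -> tuple[int, list[int], int]:
    """Rewrite a * b**c (with |a| < b**p) as (sign, [d_0, ..., d_{p-1}], e).

    The value equals sign * (d_0 . d_1 ... d_{p-1} in base b) * b**e, with e = c + p - 1.
    """
    if abs(a) >= b**p:
        raise ValueError("|a| must be less than b**p")
    sign = -1 if a < 0 else 1
    n = abs(a)
    digits = []
    for _ in range(p):        # peel off base-b digits, least significant first
        n, d = divmod(n, b)
        digits.append(d)
    digits.reverse()          # most significant digit d_0 first
    return sign, digits, c + p - 1


print(to_digits(125, -2, 10, 3))  # 1.25 * 10**0
print(to_digits(5, 0, 10, 3))     # 0.05 * 10**2, i.e. 5 written with leading zero digits
```

The second call shows that a number can be written with leading zero digits; the same value 5 also arises from $a = 500$, $c = -2$ as $5.00 \times 10^0$.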

Both describe the same system of numbers: reading the digits $d_0 d_1 \ldots d_{p-1}$ as the integer $|a|$ gives $e = c + p - 1$. The former description builds upon the preceding number systems, while the latter gels with one's common experience of performing calculations. In the latter view the roles of the precision $p$ and the exponent bounds $m$ and $M$ are clearer, as are other important floating point concepts, such as the distinction between normalized numbers, where $d_0 \neq 0$, and denormalized numbers, where $d_0 = 0$.
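Whether a nonzero number is normalized can be decided from any pair $(a, c)$ representing it. Replacing $(a, c)$ by $(a \cdot b, c - 1)$ leaves the value unchanged and lowers the digit-view exponent $e = c + p - 1$ by one, so the sketch below shifts until the leading digit is nonzero ($|a| \ge b^{p-1}$) or until $e$ reaches $m$; a number that still has $d_0 = 0$ at $e = m$ is denormalized. This assumes the input pair already satisfies the membership conditions, and the function name `normalize` is hypothetical.

```python
def normalize(a: int, c: int, b: int, p: int, m: int) -> tuple[int, int, str]:
    """Shift (a, c) toward normalized form without changing a * b**c, then classify it.

    Each step multiplies a by b and decrements c, lowering e = c + p - 1 by one.
    Shifting stops once d_0 != 0 (|a| >= b**(p-1)) or once e has reached m.
    """
    if a == 0:
        return 0, c, "zero"
    while abs(a) < b ** (p - 1) and c + p - 1 > m:
        a *= b
        c -= 1
    kind = "normalized" if abs(a) >= b ** (p - 1) else "denormalized"
    return a, c, kind


# A small decimal system with b = 10, p = 3, m = -2.
print(normalize(5, 0, 10, 3, -2))    # 5 becomes 5.00 * 10**0
print(normalize(1, -4, 10, 3, -2))   # 0.01 * 10**-2 cannot be shifted further
```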

Throughout this presentation the exact details of the underlying floating point system will not be important, so $\mathbb{F}$ will be used to denote any particular floating point system. The exact format used to store floating point numbers does not concern us either. The meticulous reader is encouraged to read one of [x,y,z] for details omitted in this brief exposé of floating point. We use for numerical examples.

Jeff Tupper

March 1996