Floating point numbers approximate real numbers, and operations on floating point numbers approximate the corresponding operations on real numbers. Consider the following addition operation:
When the implied real result of a floating point operation is not a floating point number, the result is rounded to a floating point number. The most common form of rounding is ``rounding to nearest'', where the result is rounded to the floating point number nearest to it. Using such rounding, the previous example would result in:
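A minimal Python sketch of rounding to nearest follows. It assumes Python's `float` is an IEEE 754 double and that the platform uses the default round-to-nearest mode (as mainstream CPython builds do); the operands 1.0 and 1e-20 are chosen here purely for illustration. `fractions.Fraction` converts each double exactly, so their sum is the implied real result.

```python
from fractions import Fraction

a, b = 1.0, 1e-20
exact = Fraction(a) + Fraction(b)  # implied real result, computed exactly
result = a + b                     # floating point addition, rounded to nearest

print(Fraction(result) == exact)   # False: the real sum is not a double
print(result)                      # 1.0: the double nearest the real sum
```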
Another form of rounding is ``upward rounding'', where the real result is rounded up to the smallest floating point number that is not less than it (that is, towards positive infinity). If the result is positive, it is rounded away from zero; if the result is negative, it is rounded towards zero. Using such rounding, the previous example would result in:
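Python's standard library offers no way to change the binary floating point rounding mode, so the sketch below simulates upward rounding of a sum instead (Python 3.9+ for `math.nextafter`; `add_round_up` is a hypothetical helper name). It assumes finite operands and a finite sum, and relies on the round-to-nearest sum lying next to the real sum: if it fell below the real result, the next double up is the upward-rounded answer.

```python
import math
from fractions import Fraction

def add_round_up(a: float, b: float) -> float:
    """Return a + b rounded upward (towards positive infinity).

    Assumes a, b and the rounded-to-nearest sum are finite.
    """
    nearest = a + b  # rounded to nearest: the real sum lies between this and an adjacent double
    if Fraction(nearest) < Fraction(a) + Fraction(b):
        # The nearest double is below the real sum; step up one double.
        return math.nextafter(nearest, math.inf)
    return nearest

print(add_round_up(1.0, 1e-20))    # 1.0000000000000002 (positive: away from zero)
print(add_round_up(-1.0, -1e-20))  # -1.0 (negative: towards zero)
```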
Another form of rounding is ``downward rounding'', where the real result is rounded down to the largest floating point number that is not greater than it (that is, towards negative infinity). If the result is positive, it is rounded towards zero; if the result is negative, it is rounded away from zero. Using such rounding, the previous example would result in:
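Downward rounding can be simulated the same way, as a sketch under the same assumptions (Python 3.9+, finite operands and sum, host rounding to nearest; `add_round_down` is a hypothetical helper name): if the round-to-nearest sum lies above the real sum, step down one double.

```python
import math
from fractions import Fraction

def add_round_down(a: float, b: float) -> float:
    """Return a + b rounded downward (towards negative infinity).

    Assumes a, b and the rounded-to-nearest sum are finite.
    """
    nearest = a + b  # rounded to nearest: the real sum lies between this and an adjacent double
    if Fraction(nearest) > Fraction(a) + Fraction(b):
        # The nearest double is above the real sum; step down one double.
        return math.nextafter(nearest, -math.inf)
    return nearest

print(add_round_down(1.0, 1e-20))    # 1.0 (positive: towards zero)
print(add_round_down(-1.0, -1e-20))  # -1.0000000000000002 (negative: away from zero)
```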
Numerical libraries provide three forms of rounding: rounding to nearest, upward rounding, and downward rounding. The default mode of rounding is rounding to nearest; when an explicit rounding mode is not specified, as was done earlier, rounding to nearest is assumed.
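As one concrete example of a library with selectable rounding and a round-to-nearest default, the sketch below uses Python's standard `decimal` module. It is decimal rather than binary floating point and is not the library the text refers to; its `ROUND_HALF_EVEN`, `ROUND_CEILING` and `ROUND_FLOOR` modes correspond to rounding to nearest, upward rounding, and downward rounding. The helper `divide` and the 5-digit precision are illustrative choices.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN,
                     getcontext, localcontext)

# The default context rounds to nearest (ties to even).
print(getcontext().rounding == ROUND_HALF_EVEN)  # True

def divide(x: int, y: int, rounding: str) -> Decimal:
    """Divide x by y to 5 significant digits using the given rounding mode."""
    with localcontext() as ctx:
        ctx.prec = 5
        ctx.rounding = rounding
        return Decimal(x) / Decimal(y)

for mode in (ROUND_HALF_EVEN, ROUND_CEILING, ROUND_FLOOR):
    print(mode, divide(2, 3, mode), divide(-2, 3, mode))
# ROUND_HALF_EVEN 0.66667 -0.66667
# ROUND_CEILING 0.66667 -0.66666
# ROUND_FLOOR 0.66666 -0.66667
```

The local context confines each mode change to one operation, mirroring the idea that an explicit mode applies only where it is requested and the default applies elsewhere.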
Although IEEE 754 requires that the results of the algebraic operators +, -, ×, ÷, and √ be correctly rounded (to the nearest floating point number under the default rounding mode), other operators are not so favoured. The following example illustrates what can happen with operators whose results are not guaranteed to be accurate to within one ULP (unit in the last place). With an implementation of such an operator that is guaranteed to be accurate only to within 40 ULPs, the following may occur:
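To make the ULP bound concrete, the Python sketch below (Python 3.9+; the helper names `ordered_bits`, `ulp_distance` and `ulp_bounds` are hypothetical) counts ULPs as the number of steps between adjacent doubles separating two values, which is one common convention and an assumption here. It shows that a result within 40 ULPs of a correctly rounded value of exactly 1.0 may be as large as 1 + 40 × 2^-52, so a function whose real results never exceed 1 could still return a value greater than 1 under such a guarantee.

```python
import math
import struct

def ordered_bits(x: float) -> int:
    """Map a finite double to an integer whose order matches the double's order."""
    (i,) = struct.unpack("<q", struct.pack("<d", x))  # reinterpret the 64 bits
    # Negative doubles have the sign bit set and would sort backwards; map them
    # to minus their magnitude bits so that -0.0 and 0.0 both become 0.
    return i if i >= 0 else -(i & 0x7FFF_FFFF_FFFF_FFFF)

def ulp_distance(x: float, y: float) -> int:
    """Number of steps between adjacent doubles needed to go from x to y."""
    return abs(ordered_bits(x) - ordered_bits(y))

def ulp_bounds(y: float, n: int) -> tuple[float, float]:
    """Smallest and largest doubles within n ULPs of y (no overflow assumed)."""
    lo = hi = y
    for _ in range(n):
        lo = math.nextafter(lo, -math.inf)
        hi = math.nextafter(hi, math.inf)
    return lo, hi

# Values a 40-ULP guarantee permits around a correctly rounded result of 1.0.
lo, hi = ulp_bounds(1.0, 40)
print(lo < 1.0 < hi)          # True
print(ulp_distance(1.0, hi))  # 40
```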
Computing directly with real numbers is currently infeasible, so floating point numbers are commonly used because of their computational advantages: they have a fixed size and are supported directly by hardware. Unfortunately, rounding means that the results returned are generally inexact.
Jeff Tupper | March 1996