Floating point numbers approximate real numbers. Operations with floating point numbers approximate corresponding operations with real numbers. Consider the following addition operation:
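As a stand-in illustration, the sketch below adds the doubles closest to 0.1 and 0.2 and uses exact rational arithmetic to check whether the implied real sum is itself a floating point number. The operands and variable names are assumptions chosen for demonstration, and IEEE 754 double precision (the usual Python `float`) is assumed.

```python
from fractions import Fraction

a, b = 0.1, 0.2                         # illustrative operands (assumed)

# Fraction(x) converts a float to the exact rational value it stores.
exact_sum = Fraction(a) + Fraction(b)   # implied real result, no rounding
float_sum = a + b                       # floating point result, rounded

# If the real sum were a floating point number, the two would agree.
representable = Fraction(float_sum) == exact_sum
print(representable)                    # False
```

Here the real sum needs one more significand bit than a double provides, so the floating point addition has to return a neighbouring value instead.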
When the implied real result of a floating point operation is not a floating point number, the result is rounded to a floating point number. The most common form of rounding is "rounding to nearest", where the result is rounded to the nearest floating point number. Using such rounding, the previous example would result in:
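One way to see rounding to nearest at work is to compare a computed sum with its two floating point neighbours. The sketch below is only illustrative: the operands 0.1 and 0.2 and the helper `distance` are assumptions, IEEE 754 doubles are assumed, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction

a, b = 0.1, 0.2                        # illustrative operands (assumed)
exact = Fraction(a) + Fraction(b)      # implied real result
result = a + b                         # rounded to nearest by the hardware

# The floating point numbers immediately below and above the result.
below = math.nextafter(result, -math.inf)
above = math.nextafter(result, math.inf)

def distance(x: float) -> Fraction:
    """Exact distance between a float and the implied real result."""
    return abs(Fraction(x) - exact)

# No neighbour is closer to the real result than the value returned.
print(distance(result) <= distance(below))   # True
print(distance(result) <= distance(above))   # True
print(result)                                # 0.30000000000000004
```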
Another form of rounding is "upward rounding", where the result is rounded up to the next larger floating point number. If the result is positive, it is rounded away from zero; if the result is negative, it is rounded towards zero. Using such rounding, the previous example would result in:
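Python's built-in `float` offers no simple switch for the rounding direction, so the following sketch uses the standard `decimal` module as a toy floating point format with three significant digits. The precision and the operands are assumptions picked to make the effect visible; `ROUND_CEILING` is the module's name for rounding towards positive infinity.

```python
from decimal import Context, Decimal, ROUND_CEILING

# A three-digit decimal floating point format that always rounds upward.
upward = Context(prec=3, rounding=ROUND_CEILING)

a, b = Decimal("1.23"), Decimal("0.00456")   # illustrative operands (assumed)

positive = upward.add(a, b)     # real result  1.23456, rounded away from zero
negative = upward.add(-a, -b)   # real result -1.23456, rounded towards zero

print(positive)   # 1.24
print(negative)   # -1.23
```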
Another form of rounding is "downward rounding", where the result is rounded down to the next smaller floating point number. If the result is positive, it is rounded towards zero; if the result is negative, it is rounded away from zero. Using such rounding, the previous example would result in:
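Downward rounding can be sketched in the same spirit with the standard `decimal` module acting as a toy three-digit floating point format. The precision and operands are assumptions chosen for illustration; `ROUND_FLOOR` is the module's name for rounding towards negative infinity.

```python
from decimal import Context, Decimal, ROUND_FLOOR

# A three-digit decimal floating point format that always rounds downward.
downward = Context(prec=3, rounding=ROUND_FLOOR)

a, b = Decimal("1.23"), Decimal("0.00456")   # illustrative operands (assumed)

positive = downward.add(a, b)     # real result  1.23456, rounded towards zero
negative = downward.add(-a, -b)   # real result -1.23456, rounded away from zero

print(positive)   # 1.23
print(negative)   # -1.24
```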
Numerical libraries provide three forms of rounding: rounding to nearest, upward rounding, and downward rounding. The default mode of rounding is rounding to nearest. When an explicit rounding mode is not specified, as was done earlier, rounding to nearest is assumed.
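The idea of a selectable rounding mode with a default can be sketched for IEEE 754 doubles as follows. The function `add` and its `mode` strings are hypothetical and do not belong to any particular library; the directed modes are emulated with exact rational arithmetic rather than by changing the hardware rounding mode. Finite operands and results are assumed, and `math.nextafter` requires Python 3.9 or later.

```python
import math
from fractions import Fraction

def add(a: float, b: float, mode: str = "nearest") -> float:
    """Hypothetical helper: a + b under an emulated rounding mode."""
    nearest = a + b                       # hardware result, rounded to nearest
    exact = Fraction(a) + Fraction(b)     # implied real result, computed exactly
    if mode == "nearest":
        return nearest
    if mode == "up":
        # Step up one float only if the nearest result fell below the real one.
        if Fraction(nearest) < exact:
            return math.nextafter(nearest, math.inf)
        return nearest
    if mode == "down":
        # Step down one float only if the nearest result landed above the real one.
        if Fraction(nearest) > exact:
            return math.nextafter(nearest, -math.inf)
        return nearest
    raise ValueError(f"unknown rounding mode: {mode!r}")

print(add(0.1, 0.2))           # 0.30000000000000004 (default: nearest)
print(add(0.1, 0.2, "up"))     # 0.30000000000000004
print(add(0.1, 0.2, "down"))   # 0.3
```

The upward and downward results are adjacent floating point numbers that bracket the real sum; when the real sum is itself a floating point number, all three modes return it unchanged.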
Although IEEE 754 requires that the algebraic operators +, -, ×, ÷, and √ are rounded to the nearest floating point number, other operators are not so favoured. The following example illustrates what can happen with operators whose results are not guaranteed to be accurate to within one ULP (Unit in the Last Place). With an implementation of such an operator that is only guaranteed to be accurate to within 40 ULPs, the following may occur:
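To give a feel for what an error of 40 ULPs means, the sketch below measures errors in ULPs against an exactly known real value. Everything here is an assumption made for illustration: the helper `ulp_error` is hypothetical, the reference value 1/3 is arbitrary, and the "sloppy" result is simulated by stepping 40 floating point numbers away from a correctly rounded one rather than taken from any real library. `math.ulp` and `math.nextafter` require Python 3.9 or later.

```python
import math
from fractions import Fraction

def ulp_error(approx: float, exact: Fraction) -> float:
    """Distance from approx to the real value exact, in ULPs of approx."""
    return float(abs(Fraction(approx) - exact) / Fraction(math.ulp(approx)))

exact = Fraction(1, 3)     # the real number 1/3
good = 1.0 / 3.0           # IEEE 754 division, correctly rounded

# Simulate a less accurate routine: move 40 floats above the good result.
sloppy = good
for _ in range(40):
    sloppy = math.nextafter(sloppy, math.inf)

print(ulp_error(good, exact) <= 0.5)   # True
print(ulp_error(sloppy, exact))        # close to 40
```

A result that far off is still within the stated guarantee, even though dozens of floating point numbers lie closer to the real value.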
Using real numbers directly in computations is currently infeasible. Floating point numbers are commonly used because of their computational advantages. Unfortunately, rounding causes the result returned to be inexact.
Jeff Tupper, March 1996