2.5.3 Rounding


Floating point numbers approximate real numbers, and operations on floating point numbers approximate the corresponding operations on real numbers. Consider an addition in which both operands are members of the set of floating point numbers but their exact real sum is not.
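As a concrete illustration, the following Python sketch builds one such addition. The operands are chosen for this sketch and are not the ones used in the original example; it also assumes IEEE 754 double precision, which is what Python's `float` uses on common platforms. `fractions.Fraction` carries out the real-number arithmetic exactly.

```python
from fractions import Fraction

# Illustrative operands (an assumption of this sketch): both are
# exactly representable as IEEE 754 doubles.
x = 1.0
y = 2.0 ** -60

# The implied real result, computed exactly with rational arithmetic.
exact_sum = Fraction(x) + Fraction(y)

# The floating point result of the same addition.
float_sum = x + y

# The exact sum needs 61 significant bits, more than the 53 a double
# provides, so converting it to a double and back cannot reproduce it.
sum_is_representable = Fraction(float(exact_sum)) == exact_sum

print("exact sum is a double:", sum_is_representable)
print("computed sum equals exact sum:", Fraction(float_sum) == exact_sum)
```

Both operands survive the conversion to `Fraction` unchanged, which confirms that they are floating point numbers, while the exact sum does not survive the conversion back.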

When the implied real result of a floating point operation is not a floating point number, the result is rounded to a floating point number. The most common form of rounding is "rounding to nearest", where the result is rounded to the nearest floating point number. Under this rounding, the previous example yields whichever floating point neighbour of the exact sum is closer.

Another form of rounding is "upward rounding", where the result is rounded up to a larger floating point number. If the result is positive, it is rounded away from zero; if the result is negative, it is rounded towards zero. Under this rounding, the previous example yields the floating point number immediately above the exact sum.

Another form of rounding is "downward rounding", where the result is rounded down to a smaller floating point number. If the result is positive, it is rounded towards zero; if the result is negative, it is rounded away from zero. Under this rounding, the previous example yields the floating point number immediately below the exact sum.

Numerical libraries provide three forms of rounding: rounding to nearest, upward rounding, and downward rounding. The default mode of rounding is rounding to nearest. When an explicit rounding mode is not specified, as was done earlier, rounding to nearest is assumed.
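The three forms of rounding can be sketched in Python as follows. The function name `round_to_float` and its `mode` strings are hypothetical names introduced for this sketch, not the interface of any particular numerical library. The sketch assumes Python 3.9 or later (for `math.nextafter`), assumes CPython's correctly rounded integer division, and ignores overflow.

```python
import math
from fractions import Fraction

def round_to_float(q, mode="nearest"):
    """Round the exact rational q to a double (hypothetical helper).

    mode is "nearest" (the default), "up", or "down".
    """
    # In CPython, int / int true division is correctly rounded,
    # so this is the double nearest to q.
    nearest = q.numerator / q.denominator
    if mode == "nearest":
        return nearest
    if mode == "up":
        # Keep the nearest double if it is already >= q,
        # otherwise step to the next double above it.
        if Fraction(nearest) >= q:
            return nearest
        return math.nextafter(nearest, math.inf)
    if mode == "down":
        # Keep the nearest double if it is already <= q,
        # otherwise step to the next double below it.
        if Fraction(nearest) <= q:
            return nearest
        return math.nextafter(nearest, -math.inf)
    raise ValueError(f"unknown rounding mode: {mode}")

# A positive value that is not a double, and its negation.
q = Fraction(1) + Fraction(1, 2 ** 60)
for value in (q, -q):
    print([round_to_float(value, m) for m in ("nearest", "up", "down")])
```

For the positive value, upward rounding moves away from zero and downward rounding moves towards zero; for the negative value the directions are reversed, matching the descriptions above.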

Although IEEE 754 requires that the algebraic operators +, -, ×, ÷, and square root are rounded to the nearest floating point number, other operators are not so favoured. The following example illustrates what can happen with operators whose results are not guaranteed to be accurate to within one ULP (Unit in the Last Place). With a sine implementation that is guaranteed to be accurate only to within 40 ULPs, the following may occur:

The actual value of the sine is bracketed by the results computed with downward rounding and with upward rounding. These brackets may be widely separated; with our example sine implementation they may differ by up to 80 ULPs. The result using "rounding to nearest" only guarantees that the true result will fall within the bracketed region.
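To give a feel for the size of such a bracket, here is a minimal Python sketch. It assumes that a computed value `y` is known only to lie within `k` ULPs of the true value, and it builds the enclosing bracket by stepping `k` doubles below and above `y`. The helper names `ulp_bracket` and `width_in_ulps` and the sample value are assumptions of this sketch, and `math.nextafter` and `math.ulp` require Python 3.9 or later.

```python
import math

def ulp_bracket(y, k):
    """Return (lo, hi): the doubles k representable steps below and above y."""
    lo = hi = y
    for _ in range(k):
        lo = math.nextafter(lo, -math.inf)
        hi = math.nextafter(hi, math.inf)
    return lo, hi

def width_in_ulps(lo, hi, y):
    """Width of the bracket, measured in units of ulp(y)."""
    return (hi - lo) / math.ulp(y)

# Sample computed value (illustrative only) with a 40 ULP error bound.
y = 0.75
lo, hi = ulp_bracket(y, 40)
print(lo, hi, width_in_ulps(lo, hi, y))
```

With an error bound of 40 ULPs on either side, the bracket spans 80 ULPs, which is the separation mentioned above.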

Using real numbers directly in computations is currently infeasible. Floating point numbers are commonly used because of their computational advantages. Unfortunately, rounding causes the result returned to be inexact.


Jeff Tupper

March 1996