7.2 Floats: IEEE 754 and Floating-Point Pitfalls

Floating-point numbers, as defined by the IEEE 754 standard, are the primary data type for representing real numbers in most programming languages, including Python. They are designed to cover a vast range of values, from the astronomically large to the infinitesimally small, but this range comes at a cost: precision. Understanding their internal representation is crucial to avoiding subtle and often critical bugs in numerical computations.

Internal Representation: Sign, Exponent, and Mantissa

A float is stored not as a decimal number but as a binary (base-2) fraction. A 64-bit (“double-precision”) float uses 1 bit for the sign, 11 bits for the exponent, and 52 bits for the significand (or mantissa). For normal numbers, the value is:

(-1) ** sign * (1 + fraction / 2 ** 52) * 2 ** (exponent - 1023)

Here fraction is the 52-bit significand field read as an integer, and exponent is the 11-bit field read as an integer, which is stored with a bias of 1023.

This binary fractional representation is the root cause of most floating-point pitfalls. Many simple decimal numbers, like 0.1, cannot be represented exactly as a finite binary fraction. Just as 1/3 becomes the repeating decimal 0.333... in base-10, 0.1 (1/10) becomes the repeating binary fraction 0.0001100110011.... Converting such a decimal literal in your code to the nearest representable binary floating-point number introduces a tiny rounding error right from the start.
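
To make the bit layout concrete, the following sketch pulls the three fields out of a double with the standard struct module and rebuilds the value from them. The helper name decode_double is hypothetical, and the reconstruction assumes a normal number (not zero, a subnormal, infinity, or NaN) with the usual exponent bias of 1023.

```python
import struct

def decode_double(x: float) -> tuple[int, int, int]:
    """Return the raw (sign, exponent, fraction) fields of a 64-bit float."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # top bit
    exponent = (bits >> 52) & 0x7FF      # next 11 bits, stored with a bias of 1023
    fraction = bits & ((1 << 52) - 1)    # low 52 bits
    return sign, exponent, fraction

sign, exponent, fraction = decode_double(0.1)
print(sign, exponent - 1023, hex(fraction))   # 0 -4 0x999999999999a

# Rebuild the value of a normal number from its three fields.
rebuilt = (-1) ** sign * (1 + fraction / 2**52) * 2.0 ** (exponent - 1023)
print(rebuilt == 0.1)                         # True
```

The repeating 9s in the fraction field are the binary pattern 1001 1001 ..., cut off and rounded up in the last place, which is where the initial rounding error of 0.1 comes from.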

The Classic Precision Pitfall

This inherent representation error manifests in seemingly bizarre ways during arithmetic operations. The most famous example is the failure of simple addition to produce an expected result.

# The classic floating-point surprise
result = 0.1 + 0.2
print(result)           # Output: 0.30000000000000004
print(result == 0.3)    # Output: False

# The error is already present in the stored value of 0.1
value = 0.1
print(f"{value:.20f}")  # Output: 0.10000000000000000555
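
If you want to see the stored value exactly rather than to 20 digits, one option is to convert the float to Decimal or Fraction, both of which preserve the binary value without further rounding. This is only an inspection aid, sketched here with standard-library calls.

```python
from decimal import Decimal
from fractions import Fraction

# Decimal(float) converts the binary value exactly, digit for digit.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# The same value as an exact ratio; the denominator is a power of two (2**55).
numerator, denominator = (0.1).as_integer_ratio()
print(numerator, denominator)   # 3602879701896397 36028797018963968

# The representation error of the literal 0.1, as an exact fraction.
error = Fraction(0.1) - Fraction(1, 10)
print(float(error))             # a tiny positive number, on the order of 1e-18
```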

Comparing Floats for Equality

Because of these representation errors, using the equality operator (==) to compare computed floats is one of the most common and dangerous mistakes. The result is deterministic, but two expressions that are mathematically equal can round differently, so the outcome depends on the exact computation path.

import math

a = 0.1 + 0.1 + 0.1
b = 0.3
print(a == b)          # Output: False

# A simple approach: check against a fixed absolute tolerance
print(abs(a - b) < 1e-9)          # Output: True

# The robust approach: math.isclose (Python 3.5+) combines a relative
# tolerance (default 1e-09) with an absolute tolerance (default 0.0)
print(math.isclose(a, b))          # Output: True
print(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9))  # Output: True
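
One caveat with math.isclose is worth a short sketch: its default absolute tolerance is 0.0, so a purely relative comparison against zero fails for any nonzero value. The tolerance 1e-12 below is an arbitrary choice for illustration; a sensible value depends on the scale of your data.

```python
import math

# Mathematically zero, but the rounding error of 0.1 + 0.2 survives.
residual = 0.1 + 0.2 - 0.3
print(residual)                                     # 5.551115123125783e-17

# Relative tolerance alone cannot succeed when the other operand is 0.0.
print(math.isclose(residual, 0.0))                  # False

# Supplying an absolute tolerance handles comparisons near zero.
print(math.isclose(residual, 0.0, abs_tol=1e-12))   # True
```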

The Problem of Catastrophic Cancellation

Catastrophic cancellation occurs when subtracting two nearly equal numbers. The subtraction itself is usually exact, but the leading digits cancel, so the rounding errors already present in the operands dominate what is left. The result may have far fewer significant digits than the original numbers, which is particularly devastating in numerical algorithms.

# Consider two numbers that are very close together
x = 1.000000000000001
y = 1.000000000000002

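# The true difference between the two decimal literals is 1e-15
true_diff = 1e-15

# Each literal was rounded to the nearest double (spacing near 1.0 is about 2.2e-16),
# and the subtraction exposes those rounding errors
computed_diff = y - x
print(f"Computed difference: {computed_diff}")          # Output: 8.881784197001252e-16
print(f"Relative error: {(computed_diff - true_diff) / true_diff}")  # About -0.11, an 11% error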
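
A common remedy is to rearrange the computation so the subtraction never happens. As one illustrative case (not the only technique), math.expm1 computes e**x - 1 directly, avoiding the cancellation that exp(x) - 1 suffers for small x. The reference value below is an assumption: it uses the first two Taylor terms, which are accurate far beyond double precision for x this small.

```python
import math

x = 1e-12

# Naive: exp(x) is rounded to a double near 1.0, then the leading digits cancel.
naive = math.exp(x) - 1.0

# Stable: expm1 evaluates e**x - 1 without forming the intermediate value near 1.0.
stable = math.expm1(x)

# Reference from the Taylor series x + x**2/2 (remaining terms are negligible here).
reference = x + x * x / 2

rel_err_naive = abs(naive - reference) / reference
rel_err_stable = abs(stable - reference) / reference
print(f"naive:  {naive!r}  relative error {rel_err_naive:.1e}")
print(f"stable: {stable!r}  relative error {rel_err_stable:.1e}")
```

On a typical platform the naive form loses roughly twelve of its sixteen significant digits, while the stable form stays accurate to about machine precision. math.log1p plays the same role for log(1 + x).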

Other Notable Pitfalls and Behaviors

Beyond precision, the IEEE 754 standard defines other special values and edge cases that developers must be aware of.

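Not a Number (NaN): This is a special value that represents an undefined or unrepresentable result, such as infinity minus infinity. (IEEE 754 also defines 0.0 / 0.0 as NaN, but Python raises ZeroDivisionError for that expression instead.) A crucial property of NaN is that it is not equal to anything, including itself.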

import math

nan_value = float('nan')
print(nan_value)               # Output: nan
print(nan_value == nan_value)  # Output: False (This is a defining trait of NaN)
print(math.isnan(nan_value))   # Output: True (The correct way to check)
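
The sketch below shows how NaN typically arises and spreads in plain Python: it can come out of arithmetic on infinities, it propagates through later operations, and it has to be filtered with math.isnan rather than with ==. The readings list is made-up sample data.

```python
import math

inf = float("inf")

# inf - inf has no meaningful value, so the result is NaN.
nan_from_arithmetic = inf - inf
print(math.isnan(nan_from_arithmetic))         # True

# Python does not return NaN for 0.0 / 0.0; it raises an exception.
raised = False
try:
    0.0 / 0.0
except ZeroDivisionError:
    raised = True
print(raised)                                  # True

# Once a NaN appears, it propagates through further arithmetic.
print(math.isnan(nan_from_arithmetic + 1.0))   # True

# Filter NaNs explicitly; an equality test would never match them.
readings = [1.5, float("nan"), 2.5]            # hypothetical sample data
clean = [r for r in readings if not math.isnan(r)]
print(clean)                                   # [1.5, 2.5]
```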

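Infinity: The standard also represents positive and negative infinity, which result from operations such as exceeding the maximum representable value. IEEE 754 additionally defines division of a nonzero number by zero as infinity, but Python raises ZeroDivisionError for float division by zero instead of returning inf.

positive_inf = float('inf')
negative_inf = float('-inf')
result_from_overflow = 1e308 * 10    # Exceeds the largest finite double (about 1.8e308)
print(positive_inf)                  # Output: inf
print(negative_inf)                  # Output: -inf
print(result_from_overflow)          # Output: inf
print(positive_inf > 1e308)          # Output: True (It is greater than any finite number)
# Note: 1.0 / 0.0 raises ZeroDivisionError in Python rather than producing inf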
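
Python is not entirely uniform about overflow, which the following sketch illustrates: float multiplication overflows silently to inf, while float division by zero and some math functions raise exceptions. It uses only sys.float_info and the math module.

```python
import math
import sys

largest = sys.float_info.max        # largest finite double, about 1.8e308
overflowed = largest * 2            # multiplication overflows silently to inf
print(math.isinf(overflowed))       # True
print(overflowed + 1 == overflowed) # True: inf absorbs finite values

# Float division by zero raises instead of returning inf.
try:
    1.0 / 0.0
except ZeroDivisionError as exc:
    division_message = str(exc)
print(division_message)

# Some math functions also raise on overflow rather than returning inf.
try:
    math.exp(1000)
except OverflowError as exc:
    overflow_message = str(exc)
print(overflow_message)
```

Checking results with math.isinf or math.isfinite after a long computation is a cheap way to catch the silent cases.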

Rounding and Aggregation Errors: Summing a large sequence of floats can compound small representation errors into a significant total error. The order of operations can affect the result due to the way rounding occurs at each step.

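# Accumulating the same value one million times.
# An explicit loop is used because the built-in sum() applies compensated
# summation to floats on Python 3.12+ and would hide the effect.
total_sum = 0.0
for _ in range(1000000):
    total_sum += 0.1
print(total_sum) # Output: about 100000.0000013 (not exactly 100000.0)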
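
Two standard ways to reduce accumulated error are math.fsum, which tracks the lost low-order bits exactly, and Kahan (compensated) summation, which carries a running correction term. The kahan_sum function below is a minimal illustrative sketch of the textbook algorithm, not a library API.

```python
import math

def kahan_sum(values):
    """Sum floats while carrying a correction term for lost low-order bits."""
    total = 0.0
    compensation = 0.0                      # running estimate of the rounding error
    for value in values:
        adjusted = value - compensation     # apply the correction from the last step
        new_total = total + adjusted        # low-order bits of adjusted may be lost here
        compensation = (new_total - total) - adjusted   # recover what was lost
        total = new_total
    return total

values = [0.1] * 1_000_000

# Naive accumulation, written as a loop so it does not depend on how sum() is implemented.
naive = 0.0
for value in values:
    naive += value

print(naive)                # drifts away from 100000.0
print(kahan_sum(values))    # much closer to 100000.0
print(math.fsum(values))    # 100000.0
```

math.fsum is usually the simplest choice when the data is already in a Python iterable; the compensated loop is useful when values arrive one at a time.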

Best Practices for Working with Floats

Avoid == and != on computed floats. Use a tolerance check with math.isclose() or abs(a - b) < tol, and test for NaN with math.isnan().
Be mindful of operations that can lead to catastrophic cancellation (subtracting similar numbers) or overflow/underflow (very large exponents).
Consider your data’s domain. If you are working with decimal fractions for financial or monetary calculations (where exactness is required), the Decimal class from the decimal module is a far better choice. If you are working with rational numbers (fractions), the Fraction class from the fractions module is appropriate.
Use libraries for advanced math. For complex numerical analysis, linear algebra, or statistics, rely on robust libraries like NumPy and SciPy, which implement numerically careful algorithms and offer additional floating-point types of other widths.
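
As a brief sketch of the exact alternatives mentioned above, the following uses Decimal for a money-style calculation and Fraction for rational arithmetic. The price, quantity, and tax rate are made-up values, and rounding half up to cents is an assumption; real rounding rules depend on the application.

```python
from decimal import Decimal, ROUND_HALF_UP
from fractions import Fraction

# Build Decimals from strings so the decimal digits are taken literally.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True

# Building from a float copies the float's binary rounding error.
print(Decimal(0.1) == Decimal("0.1"))                      # False

price = Decimal("19.99")          # hypothetical unit price
quantity = 3
tax_rate = Decimal("0.0825")      # hypothetical tax rate

subtotal = price * quantity
# Round the tax to whole cents with an explicit rounding mode.
tax = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(subtotal, tax)                                       # 59.97 4.95

# Fractions keep rational arithmetic exact.
third = Fraction(1, 3)
print(third + third + third == 1)                          # True
```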