7.8 The math and statistics Modules

The math and statistics modules of the Python standard library provide functions for mathematical and statistical operations beyond the basic arithmetic operators. The built-in numeric types handle fundamental arithmetic; these modules add well-tested implementations of the functions commonly needed in scientific computing, data analysis, engineering, and financial applications, with no third-party dependencies.

Core Mathematical Functions with math

The math module provides access to the underlying C library functions for floating-point mathematics. Its functions generally expect integer or float arguments and return float values. This makes it distinct from the cmath module, which is designed for complex numbers.

A critical aspect of using math is understanding its domain constraints. Many of its functions are undefined for certain inputs, and calling them with such inputs raises a ValueError. For example, math.sqrt(-1) raises ValueError because the square root of a negative number is not a real number; use cmath.sqrt(-1), which returns 1j, when a complex result is wanted.

import math

# Basic functions and constants
print(math.pi)        # Output: 3.141592653589793
print(math.e)         # Output: 2.718281828459045
print(math.sqrt(25))  # Output: 5.0

# Exponential and logarithmic functions
print(math.exp(1))    # Output: ~2.718... (e^1)
print(math.log(100, 10)) # Output: 2.0 (log base 10 of 100; math.log10(100) is usually more accurate)
print(math.log(math.e))   # Output: 1.0 (natural log of e)

# Trigonometric functions (work in radians, not degrees)
angle_in_radians = math.radians(60) # Convert 60 degrees to radians
print(math.sin(angle_in_radians))    # Output: ~0.866... (sin(60°))
print(math.degrees(math.pi / 2))     # Output: 90.0
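
The domain constraints described earlier can be handled explicitly. The following is a minimal sketch, not a recommended library pattern: safe_sqrt is a hypothetical helper introduced here for illustration, and it falls back to cmath only when math rejects the input.

```python
import cmath
import math

def safe_sqrt(x):
    """Hypothetical helper: real root when it exists, complex root otherwise."""
    try:
        return math.sqrt(x)
    except ValueError:
        # math.sqrt rejects negative input; cmath.sqrt accepts it
        return cmath.sqrt(x)

print(safe_sqrt(25))  # 5.0
print(safe_sqrt(-4))  # 2j

# Other domain violations raise ValueError as well
for call in (lambda: math.log(0), lambda: math.acos(2)):
    try:
        call()
    except ValueError as exc:
        print("ValueError:", exc)
```

Whether a silent fallback to complex numbers is appropriate depends on the application; in many programs letting the ValueError propagate is the safer choice.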

Working with Special Values: NaN and Infinity

The math module provides a consistent and portable way to work with special floating-point values like infinity (∞) and “Not a Number” (NaN). These are defined by the IEEE 754 standard. Always test for NaN with math.isnan() rather than the equality operator (==), because NaN is by definition not equal to anything, including itself. Infinities do compare equal to themselves, but math.isinf() and math.isfinite() express the intent more clearly and cover both signs.

import math

positive_inf = math.inf
negative_inf = -math.inf
not_a_num = math.nan

print(positive_inf > 10**100) # Output: True
print(not_a_num == not_a_num) # Output: False (This is why you need isnan)

# Correct way to check for NaN and Infinity
print(math.isnan(not_a_num))       # Output: True
print(math.isinf(positive_inf))    # Output: True
print(math.isfinite(42))           # Output: True
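
A common use of these checks is cleaning data before aggregating it, because NaN propagates through arithmetic. The sketch below assumes a made-up list of readings; finite_only is a hypothetical helper name, not part of the math module.

```python
import math

def finite_only(values):
    """Hypothetical helper: keep only values that are neither NaN nor infinite."""
    return [v for v in values if math.isfinite(v)]

readings = [1.5, math.nan, 2.5, math.inf, -math.inf, 4.0]  # made-up sample data
clean = finite_only(readings)
print(clean)  # [1.5, 2.5, 4.0]

# A single NaN poisons the whole sum
print(math.isnan(sum(readings[:3])))  # True
print(sum(clean))                     # 8.0
```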

Combinatorics and Number Theory

For working with integers, math provides efficient implementations of factorial and combinatorial functions. A common pitfall is passing an invalid argument to math.factorial(): a negative integer raises a ValueError, and a float raises a TypeError in Python 3.10 and later (earlier versions accepted integral floats such as 5.0 and raised ValueError for non-integral ones). The math.comb() and math.perm() functions were added in Python 3.8 and are more efficient and less error-prone than manually calculating combinations and permutations using factorials.

import math

# Factorial
print(math.factorial(5)) # Output: 120

# Combinations: ways to choose k items from n items without repetition, order doesn't matter.
n_choose_k = math.comb(10, 2) # Number of ways to choose 2 items from 10
print(n_choose_k) # Output: 45

# Permutations: ways to choose k items from n items without repetition, order matters.
n_permute_k = math.perm(10, 2) # Number of ways to arrange 2 items from 10
print(n_permute_k) # Output: 90
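
To illustrate why comb() and perm() are less error-prone, the sketch below compares them with the textbook factorial formulas. comb_via_factorials is a hypothetical helper written only for this comparison.

```python
import math

def comb_via_factorials(n, k):
    """Hypothetical helper: n choose k computed the long way, for comparison only."""
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Both approaches agree for valid input
print(math.comb(10, 2) == comb_via_factorials(10, 2))                # True
print(math.perm(10, 2) == math.factorial(10) // math.factorial(8))  # True

# When k > n, comb() returns 0, while the manual formula fails
print(math.comb(3, 5))  # 0
try:
    comb_via_factorials(3, 5)
except ValueError as exc:
    # factorial(3 - 5) is factorial of a negative number
    print("ValueError:", exc)
```

The manual version also builds three potentially huge intermediate integers, which comb() avoids.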

Statistical Analysis with the statistics Module

While the math module focuses on fundamental mathematical functions, the statistics module provides higher-level functions for calculating common statistical measures of central tendency and spread. It is designed to work with various iterables and numeric types, including int, float, Decimal, and Fraction. This module is ideal for applications where importing a heavy library like NumPy is unnecessary.

It helps to know the different functions for calculating a mean. mean() is the arithmetic average and preserves the input type where it can; fmean() (added in Python 3.8) is a faster version that always returns a float; geometric_mean() suits averaging growth rates or ratios, and harmonic_mean() suits averaging rates such as speeds.

import statistics
from fractions import Fraction

data = [1, 2, 2, 3, 4, 5, 5, 5, 6]

# Measures of central tendency
print(statistics.mean(data))       # Output: 3.666...
print(statistics.fmean(data))      # Output: 3.666...
print(statistics.median(data))     # Output: 4
print(statistics.mode(data))       # Output: 5

# Measures of spread
print(statistics.variance(data))   # Output: 3 (sample variance: 24 / (9 - 1))
print(statistics.stdev(data))      # Output: ~1.732 (sample standard deviation)

# Works with Fractions
fraction_data = [Fraction(1, 2), Fraction(3, 4), Fraction(1, 4)]
print(statistics.mean(fraction_data)) # Output: 1/2 (the value is Fraction(1, 2))
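
The less common means mentioned above are easiest to understand through small examples. The following sketch uses made-up growth factors and speeds; both functions require Python 3.8 or later for geometric_mean().

```python
import math
import statistics

# Made-up yearly growth factors: +10%, +20%, -5%
growth = [1.10, 1.20, 0.95]
avg_growth = statistics.geometric_mean(growth)
# Compounding the geometric mean over the same number of periods
# reproduces the total growth (up to floating-point rounding)
print(math.isclose(avg_growth ** len(growth), math.prod(growth)))  # True

# Average speed over two equal-distance legs driven at 40 and 60 km/h
speeds = [40, 60]
print(statistics.harmonic_mean(speeds))  # 48.0
print(statistics.mean(speeds))           # 50, which overstates the average speed
```

The arithmetic mean would be the right choice only if the two legs took equal time rather than covering equal distance.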

Precision and Data Type Considerations

A critical pitfall to avoid is mixing data types carelessly. The math module converts its arguments to float and returns floats, so passing a Decimal or Fraction silently discards the extra precision, and the float result cannot be added directly to a Decimal. The statistics module is more flexible, but understanding its return type is vital. For instance, statistics.mean() of integers returns an int when the result is a whole number and a float otherwise, while statistics.mean() of Fractions returns a Fraction and of Decimals returns a Decimal. If exact decimal arithmetic is required for financial calculations, it’s often better to use the Decimal type with its own methods or ensure the statistics functions are fed Decimal objects.

import math
import statistics
from decimal import Decimal

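# math converts its argument to float, so the Decimal's precision is lost
decimal_num = Decimal('10.1')
root = math.sqrt(Decimal('4'))   # 2.0, a float rather than a Decimal
# decimal_num + root would raise TypeError: Decimal and float cannot be mixed in arithmetic
# Correct approach: use Decimal's own methods (or convert explicitly).
result = decimal_num + Decimal('4').sqrt()
print(result) # Output: 12.1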

# statistics handles Decimals correctly
decimal_data = [Decimal('10.1'), Decimal('20.2'), Decimal('30.3')]
print(statistics.mean(decimal_data)) # Output: 20.2 (the value is Decimal('20.2'))
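
The return-type rules for statistics.mean() are easy to check directly. This is a small sketch with made-up inputs that prints the type and value of each result, with fmean() shown for contrast.

```python
import statistics
from decimal import Decimal
from fractions import Fraction

datasets = [
    [1, 2, 3],                           # ints with a whole-number mean
    [1, 2],                              # ints with a fractional mean
    [Fraction(1, 3), Fraction(2, 3)],
    [Decimal("0.10"), Decimal("0.20")],
]
for data in datasets:
    result = statistics.mean(data)
    print(type(result).__name__, result)
# int 2
# float 1.5
# Fraction 1/2
# Decimal 0.15

# fmean() always returns a float, whatever the input type
print(statistics.fmean([1, 2, 3]))  # 2.0
```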