Variance measures the spread of data points around their mean, quantifying how much values in a distribution deviate from the expected value.
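For a random variable $X$ with mean $\mu = E[X]$, variance is defined as:
$$\mathrm{Var}(X) = E\left[(X - \mu)^2\right]$$
This represents the expected squared deviation from the mean. For a discrete random variable with $n$ equally likely outcomes $x_1, \dots, x_n$:
$$\mathrm{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2$$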
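As a quick check of the definition, the sketch below computes the variance of a fair six-sided die using exact fractions. The die is an example chosen here for illustration, and the function name population_variance is an assumption, not something from the text above.

```python
from fractions import Fraction

def population_variance(outcomes):
    """Variance of a discrete variable whose outcomes are equally likely."""
    n = len(outcomes)
    mean = Fraction(sum(outcomes), n)
    # Average squared deviation from the mean.
    return sum((Fraction(x) - mean) ** 2 for x in outcomes) / n

die = [1, 2, 3, 4, 5, 6]          # illustrative outcomes (assumption)
die_variance = population_variance(die)
print(die_variance)               # 35/12
print(float(die_variance))        # about 2.9167
```

Exact fractions are used only so the result can be compared with a hand calculation; floats would work equally well in practice.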
Key properties:
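- Always non-negative ($\mathrm{Var}(X) \ge 0$)
- $\mathrm{Var}(aX + b) = a^2 \, \mathrm{Var}(X)$ for constants $a$ and $b$
- For independent variables $X$ and $Y$, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$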
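These properties can be checked exactly on small finite distributions. The sketch below assumes the standard forms Var(aX + b) = a^2 Var(X) and Var(X + Y) = Var(X) + Var(Y) for independent X and Y. The outcome lists, the constants, and the helper name variance are all hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Population variance of equally likely values, as an exact fraction."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / n

x = [1, 2, 3, 4, 5, 6]   # hypothetical outcomes of X
y = [0, 10]              # hypothetical outcomes of Y, independent of X
a, b = 3, 7              # arbitrary constants

# Non-negativity.
assert variance(x) >= 0

# Scaling multiplies the variance by a**2; shifting by b changes nothing.
scaled = [a * v + b for v in x]
assert variance(scaled) == a ** 2 * variance(x)

# Independence: every (X, Y) pair is equally likely, so enumerate them all.
sums = [xv + yv for xv, yv in product(x, y)]
assert variance(sums) == variance(x) + variance(y)
```

Enumerating every pair is what encodes independence here; for dependent variables the last assertion would generally fail.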
Sample Variance
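In statistics, sample variance estimates population variance from a finite sample. The unbiased estimator uses Bessel’s correction:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
where $\bar{x}$ is the sample mean. Dividing by $n - 1$ instead of $n$ corrects the bias that arises because deviations are measured from the sample mean rather than the true mean.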
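To see the effect of the correction, the following sketch averages both estimators over every possible sample of size 2 drawn with replacement from a tiny population. The population values and helper names are assumptions made for illustration. Dividing by n underestimates the true variance on average, while dividing by n - 1 recovers it.

```python
from fractions import Fraction
from itertools import product

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values, ddof=0):
    """Sum of squared deviations divided by (n - ddof)."""
    m = mean(values)
    return sum((Fraction(v) - m) ** 2 for v in values) / (len(values) - ddof)

population = [1, 2, 3, 4, 5, 6]   # hypothetical population
true_variance = variance(population)

# All 36 equally likely samples of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

avg_biased = mean([variance(s) for s in samples])            # divides by n
avg_unbiased = mean([variance(s, ddof=1) for s in samples])  # divides by n - 1

print(true_variance, avg_biased, avg_unbiased)  # 35/12 35/24 35/12
```

With samples of size 2 the uncorrected estimator is off by a factor of (n - 1)/n = 1/2 on average, which is exactly the factor the correction removes.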
In Python’s NumPy:
- np.var() computes population variance by default (ddof=0)
- Set ddof=1 for sample variance
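from numpy import array, var
# 2 x 6 matrix with identical rows
M = array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6]])
# Sample variance of each column (axis=0): six values, all 0 here
col_var = var(M, ddof=1, axis=0)
# Sample variance of each row (axis=1): two values, both 3.5
row_var = var(M, ddof=1, axis=1)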
Note
The choice between population and sample variance depends on whether you’re describing the entire population or estimating from a sample.
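Python's standard library draws the same distinction with two separate functions in the statistics module. A minimal sketch, with made-up data:

```python
import statistics

data = [1, 2, 3, 4, 5, 6]  # hypothetical measurements

# Treat data as the whole population: divide by n.
pop_var = statistics.pvariance(data)

# Treat data as a sample from a larger population: divide by n - 1.
sample_var = statistics.variance(data)

print(pop_var, sample_var)
```

For the same data the sample variance is always the larger of the two, and the gap shrinks as the number of observations grows.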