variance

Topics

descriptive statistics

probability

Variance measures the spread of data points around their mean, quantifying how much values in a distribution deviate from the expected value.

For a random variable $X$, variance is defined as:

$$\operatorname{Var}(X) = E[(X - E[X])^{2}]$$

This represents the expected squared deviation from the mean. For a discrete random variable with $n$ equally likely outcomes $x_{1}, \dots, x_{n}$ and mean $\mu = E[X]$:

$$\operatorname{Var}(X) = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu)^{2}$$
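
As a small worked sketch of the formula above, the snippet below computes the variance of a fair six-sided die directly from the definition. The die outcomes and the helper name `population_variance` are illustrative assumptions, not part of the original note.

```python
import numpy as np

# Outcomes of a fair six-sided die (illustrative data, assumed equally likely)
outcomes = np.array([1, 2, 3, 4, 5, 6], dtype=float)

def population_variance(x):
    """Mean squared deviation from the mean: (1/n) * sum((x_i - mu)^2)."""
    mu = x.mean()
    return ((x - mu) ** 2).mean()

# Squared deviations sum to 17.5, so the variance is 17.5 / 6 = 35 / 12
die_var = population_variance(outcomes)
```

The result should agree with `np.var(outcomes)`, since NumPy's default `ddof=0` uses the same $1/n$ normalisation.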

Key properties:

Always non-negative ( $\operatorname{Var}(X) \geq 0$ )
$\operatorname{Var}(aX + b) = a^{2} \operatorname{Var}(X)$ for constants $a, b$
For independent variables, $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$
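
These properties can be checked numerically. The sketch below assumes standard normal draws, a fixed seed, and arbitrary constants `a` and `b`, all chosen purely for illustration. The scaling property holds exactly (up to floating-point error) on any data set, while additivity holds only approximately on a finite sample, because the sample covariance of independent draws is close to zero but not exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

x = rng.normal(size=100_000)
y = rng.normal(size=100_000)  # drawn independently of x
a, b = 3.0, 5.0               # arbitrary constants chosen for illustration

# Scaling by a multiplies the variance by a^2; shifting by b changes nothing.
lhs_affine = np.var(a * x + b)
rhs_affine = a**2 * np.var(x)

# For independent samples the cross term 2 * Cov(x, y) is only near zero,
# so the two sides agree approximately rather than exactly.
lhs_sum = np.var(x + y)
rhs_sum = np.var(x) + np.var(y)
```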

Sample Variance

In statistics, sample variance estimates population variance from a finite sample. The unbiased estimator uses Bessel’s correction:

$$s^{2} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_{i} - \bar{x})^{2}$$

where $\bar{x}$ is the sample mean. The $n - 1$ denominator corrects the downward bias that arises because deviations are measured from the sample mean rather than the true population mean.
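
The effect of the $n - 1$ denominator can be seen in a small simulation. This is a sketch under assumed settings (normal data with variance 4, samples of size 5, a fixed seed), not a proof. Averaged over many samples, dividing by $n$ underestimates the true variance by a factor of $(n - 1)/n$, while dividing by $n - 1$ recovers it.

```python
import numpy as np

rng = np.random.default_rng(42)

true_var = 4.0    # population variance of N(0, 2^2)
n = 5             # small sample size, where the bias is most visible
trials = 200_000  # number of independent samples to average over

# Each row is one independent sample of size n
samples = rng.normal(loc=0.0, scale=2.0, size=(trials, n))

# Average each estimator over all trials
biased = samples.var(axis=1, ddof=0).mean()    # divides by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n - 1

# biased should land near true_var * (n - 1) / n = 3.2,
# unbiased near true_var = 4.0
```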

In Python’s NumPy:

np.var() computes population variance (by default)
Set ddof=1 for sample variance (default ddof=0)

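import numpy as np

# 2 x 6 matrix whose two rows are identical
M = np.array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 5, 6]])

# Sample variance down each column (axis=0): all zeros, since the rows match
col_var = np.var(M, ddof=1, axis=0)
# Sample variance along each row (axis=1): 3.5 for each row
row_var = np.var(M, ddof=1, axis=1)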

Note

The choice between population and sample variance depends on whether you’re describing the entire population or estimating from a sample.

variance of a vector