
Variance measures the spread of data points around their mean. Given a vector, we can compute its sample variance as follows:

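from numpy import array, var

M = array([1, 2, 2, 2, 5, 6])
n = len(M)

# center the vector by subtracting its mean
M_centered = M - M.mean()
# inner product of the centered vector with itself (sum of squared deviations)
dot_prod = M_centered @ M_centered

# sample variance: divide by n - 1
variance = dot_prod / (n - 1)
print(variance) # 4.0

# confirm against NumPy's built-in (ddof=1 selects the n - 1 divisor)
print(var(M, ddof=1)) # 4.0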

So basically,

  • center the vector $M$ around its mean to get $M_c = M - \bar{M}$
  • calc inner product: $M_c \cdot M_c$
  • variance: $s^2 = \frac{M_c \cdot M_c}{n - 1}$
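
For the example vector used in the code, the three steps can be worked out by hand. The notation here is an assumption for illustration: $\bar{M}$ is the mean of $M$, $M_c$ is the centered vector, and $s^2$ is the sample variance with $n = 6$.

```latex
\begin{aligned}
\bar{M} &= \frac{1 + 2 + 2 + 2 + 5 + 6}{6} = 3 \\
M_c &= M - \bar{M} = (-2,\, -1,\, -1,\, -1,\, 2,\, 3) \\
M_c \cdot M_c &= 4 + 1 + 1 + 1 + 4 + 9 = 20 \\
s^2 &= \frac{M_c \cdot M_c}{n - 1} = \frac{20}{5} = 4
\end{aligned}
```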

Similarly, the covariance between two vectors $X$ and $Y$ is found in the same way:

  • center $X$ and $Y$ around their respective means to get $X_c$ and $Y_c$
  • calc inner product between $X_c$ and $Y_c$
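
The covariance steps can be sketched the same way as the variance code. This is a minimal sketch, not a definitive implementation: the helper name `sample_covariance` and the second vector `y` are hypothetical, and it assumes the same n - 1 divisor used for the variance.

```python
import numpy as np


def sample_covariance(x, y):
    """Sample covariance of two equal-length 1-D vectors (n - 1 divisor).

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D vectors of equal length")
    n = x.size
    if n < 2:
        raise ValueError("need at least two observations")

    # center each vector around its own mean
    x_centered = x - x.mean()
    y_centered = y - y.mean()

    # inner product of the centered vectors, divided by n - 1
    return (x_centered @ y_centered) / (n - 1)


x = np.array([1, 2, 2, 2, 5, 6])
y = 2 * x  # made-up second vector: [2, 4, 4, 4, 10, 12]

cov_xy = sample_covariance(x, y)
print(cov_xy)  # 8.0

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry
# is the covariance between x and y and should agree with cov_xy
print(np.cov(x, y, ddof=1)[0, 1])
```

Passing the same vector twice, `sample_covariance(x, x)`, reduces to the variance computation, which is why the two procedures share the same steps.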