Topics

Covariance measures the joint variability of two random variables and describes how they change together. It is denoted as $\operatorname{Cov}(X, Y)$. It is the expected value of the product of each variable's deviation from its expected value:

$$\operatorname{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$$

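If you observe the above formula, setting $Y = X$ gives $E[(X - E[X])^2] = \operatorname{Var}(X)$, so variance is the covariance of a variable with itself.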

Intuition: Say we have a scatter plot of two variables $X$ and $Y$. A visual inspection gives us the idea that there’s a positive linear relationship between $X$ and $Y$. We can formalize this in another way by saying:

If a point's value $x_i$ is above the mean $\bar{x}$, the corresponding $y_i$ is also above the mean $\bar{y}$, and vice versa.

This formulation directly translates to the fact that the product $(x_i - \bar{x})(y_i - \bar{y})$ (a measure) will be positive for all such cases, and negative otherwise. Thus, the average measure over all $n$ datapoints will be:

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

This average measure will be positive and large if the positive linear relationship holds for most datapoints, i.e. strong positive association. On the flip side, if $X$ and $Y$ had a negative linear relationship, i.e. when one goes up, the other goes down, then this measure will be negative and large in magnitude, i.e. strong negative association.
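The intuition above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `mean_deviation_product` and the sample data are made up for illustration and are not part of the original text.

```python
import numpy as np

def mean_deviation_product(x, y):
    """Average of (x_i - mean(x)) * (y_i - mean(y)) over all datapoints."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    products = (x - x.mean()) * (y - y.mean())
    return products.mean()

# Illustrative data (hypothetical): y_up rises with x, y_down falls with x.
x = np.array([1, 2, 3, 4, 5])
y_up = np.array([2, 4, 6, 8, 10])
y_down = np.array([10, 8, 6, 4, 2])

# Points on the same side of both means give positive products.
pos = mean_deviation_product(x, y_up)
# Points on opposite sides of the two means give negative products.
neg = mean_deviation_product(x, y_down)
print(pos, neg)
```

The two results have the same magnitude and opposite signs, because `y_down` is simply `y_up` reversed.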

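The sign of the covariance indicates whether the variables tend to increase together (positive) or move in opposite directions (negative). A covariance of zero means there is no linear relationship between them; it does not, by itself, mean they are independent, although independent variables always have zero covariance.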
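One caveat is worth checking numerically: a covariance of zero only rules out a linear relationship. The sketch below, assuming NumPy and using made-up data, builds a `y` that is completely determined by `x` and yet has zero covariance with it.

```python
import numpy as np

# Hypothetical data: x is symmetric around zero and y is fully determined by x.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Average of the products of deviations from the means.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
print(cov_xy)
```

The positive and negative products cancel exactly, so the covariance vanishes even though knowing `x` tells you `y` precisely.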

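Similar to variance, we can calculate the sample covariance using Bessel’s correction, dividing by $n - 1$ instead of $n$:

$$s_{xy} = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$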

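# sample covariance of two vectors
from numpy import array, cov

x = array([1, 2, 3, 4, 5, 6, 7, 8, 9])
y = array([9, 8, 7, 6, 5, 4, 3, 2, 1])

# cov(x, y) returns the 2x2 covariance matrix; entry [0, 1] is the covariance of x and y.
# NumPy divides by n - 1 (Bessel's correction) by default.
Sigma = cov(x, y)[0, 1]
print(Sigma)  # -7.5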
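To make the effect of Bessel's correction concrete, here is a small sketch that computes both versions by hand on the same data and compares them with NumPy. The variable names `population_cov` and `sample_cov` are chosen here for illustration; `ddof=0` is the `numpy.cov` argument that switches the divisor from `n - 1` to `n`.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1], dtype=float)
n = len(x)

# Sum of the products of deviations from the means.
dev_sum = np.sum((x - x.mean()) * (y - y.mean()))

population_cov = dev_sum / n        # divide by n
sample_cov = dev_sum / (n - 1)      # divide by n - 1 (Bessel's correction)

# np.cov divides by n - 1 by default; ddof=0 divides by n instead.
numpy_sample = np.cov(x, y)[0, 1]
numpy_population = np.cov(x, y, ddof=0)[0, 1]

print(population_cov, sample_cov)
print(np.isclose(sample_cov, numpy_sample), np.isclose(population_cov, numpy_population))
```

The corrected estimate is larger in magnitude than the uncorrected one by a factor of `n / (n - 1)`, a gap that shrinks as the number of datapoints grows.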