If Independent Features Then { Multivariate Normal Distributions == Product of Univariate Normal Distributions }

Below is a simple mathematical proof showing that, for a multivariate Gaussian distribution with independent features, the probability density of a point can be computed as the product of the probability densities of the individual features, each modeled as a univariate Gaussian distribution.
$N(x; \mu, \Sigma)$
$= \frac{1}{\sqrt{(2\pi)^K|\Sigma|}}\exp\left( -\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) \right ) ~~~(1)$
$= \prod_{i=1}^{K}\left ( \frac{1}{\sqrt{2\pi\sigma_i^2}}e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}} \right ) ~~~(2)$

where

• $K$ = number of features
• $\mu$ = mean vector of size K
• $\Sigma$ = covariance matrix of size K × K
• $|\Sigma|$ = determinant of the covariance matrix
• $\sigma_i$ = standard deviation of feature i
• $\mu_i$ = mean value of feature i
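Before walking through the proof, the claim itself can be checked numerically. The sketch below, using arbitrary example values for $\mu$, $\sigma$, and $x$, compares equation 1 (via `scipy.stats.multivariate_normal`) against equation 2 (a product of univariate densities from `scipy.stats.norm`):

```python
# Numerical check of the claim, assuming a 3-feature example
# with a diagonal (independent-features) covariance matrix.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0, 0.5])      # mean vector, K = 3
sigma = np.array([0.8, 1.5, 2.0])    # per-feature standard deviations
Sigma = np.diag(sigma ** 2)          # diagonal covariance matrix

x = np.array([0.7, -1.0, 1.2])       # an arbitrary test point

# Equation 1: joint density from the multivariate normal
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# Equation 2: product of univariate normal densities
product = np.prod(norm.pdf(x, loc=mu, scale=sigma))

print(np.isclose(joint, product))  # the two densities agree
```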

The covariance matrix $\Sigma$ for a dataset with independent features is a diagonal matrix. For a diagonal matrix we can easily show that

1. The inverse of a diagonal matrix is the diagonal matrix of reciprocals of its diagonal elements, i.e.
$\begin{bmatrix} \sigma_1^2 & 0 & 0 & \cdot \\ 0 & \sigma_2^2 & 0 & \cdot \\ 0 & 0 & \sigma_3^2 & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & 0 & \cdot \\ 0 & \frac{1}{\sigma_2^2} & 0 & \cdot \\ 0 & 0 & \frac{1}{\sigma_3^2} & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}~~~(3)$
2. The determinant of a diagonal matrix is equal to the product of its diagonal elements, i.e.
$\begin{vmatrix} \sigma_1^2 & 0 & 0 & \cdot \\ 0 & \sigma_2^2 & 0 & \cdot \\ 0 & 0 & \sigma_3^2 & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{vmatrix} = \sigma^2_1 \times \sigma^2_2 \times \sigma^2_3 \times \cdots ~~~(4)$
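Both properties are easy to confirm numerically; the sketch below uses arbitrary illustrative variances:

```python
# Numerical check of properties (3) and (4) for a diagonal matrix.
import numpy as np

variances = np.array([0.5, 2.0, 3.5])  # arbitrary example variances
D = np.diag(variances)

# (3) the inverse is the diagonal matrix of reciprocals
inv_expected = np.diag(1.0 / variances)
print(np.allclose(np.linalg.inv(D), inv_expected))

# (4) the determinant is the product of the diagonal elements
print(np.isclose(np.linalg.det(D), np.prod(variances)))
```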

Using the above two properties of diagonal matrices, we can show that equation 1 is essentially the same as equation 2 when the features are independent. Let’s first tackle the normalizing factor $\frac{1}{\sqrt{(2\pi)^K|\Sigma|}}$ in equation 1. Since the determinant of a diagonal matrix is equal to the product of its diagonal elements, we can rewrite

$\frac{1}{\sqrt{(2\pi)^K|\Sigma|}}$

$= \frac{1}{\sqrt{(2\pi)^K\sigma_1^2\sigma_2^2\sigma_3^2\cdots\sigma^2_K}}$

$= \frac{1}{\sqrt{2\pi\sigma_1^2}}\times\frac{1}{\sqrt{2\pi\sigma_2^2}}\times\cdots\times\frac{1}{\sqrt{2\pi\sigma_K^2}}$

$= \prod_{i=1}^{K}\left ( \frac{1}{\sqrt{2\pi\sigma_i^2}}\right ) ~~~(5)$

Now let’s focus on the exponential part in equation 1. Using 3, we can show that

$(x-\mu)'\Sigma^{-1}(x-\mu)$

$= \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 & \cdot & \cdot \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & 0 & \cdot \\ 0 & \frac{1}{\sigma_2^2} & 0 & \cdot \\ 0 & 0 & \frac{1}{\sigma_3^2} & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}\begin{bmatrix} x_1-\mu_1\\ x_2-\mu_2\\ \cdot\\ \cdot\\ \end{bmatrix}$

$= \begin{bmatrix} \frac{x_1-\mu_1}{\sigma_1^2} & \frac{x_2-\mu_2}{\sigma_2^2} & \cdot & \cdot \end{bmatrix}\begin{bmatrix} x_1-\mu_1\\ x_2-\mu_2\\ \cdot\\ \cdot\\ \end{bmatrix}$

$=\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} + \cdots ~~~ (6)$
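Equation 6 can also be checked directly: with a diagonal covariance, the quadratic form reduces to a sum of per-feature squared deviations scaled by their variances. The values below are arbitrary:

```python
# Check equation (6): quadratic form vs. sum of scaled squared deviations.
import numpy as np

mu = np.array([1.0, -2.0, 0.5])       # example mean vector
sigma2 = np.array([0.64, 2.25, 4.0])  # example feature variances
x = np.array([0.7, -1.0, 1.2])        # example point

d = x - mu
quad_form = d @ np.diag(1.0 / sigma2) @ d  # (x-mu)' Sigma^{-1} (x-mu)
sum_form = np.sum(d ** 2 / sigma2)         # right-hand side of (6)

print(np.isclose(quad_form, sum_form))
```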

Now $e^{x+y}$ can be written as $e^x \times e^y$. Applying this to the full exponent, including the $-\frac{1}{2}$ factor, gives

$e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}$

$= e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}} \times e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}} \times \cdots$

$= \prod_{i=1}^{K}e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}} ~~~(7)$

Substituting 5 and 7 into equation 1, we get

$\frac{1}{\sqrt{(2\pi)^K|\Sigma|}}\exp\left( -\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) \right )$

$= \prod_{i=1}^{K}\left ( \frac{1}{\sqrt{2\pi\sigma_i^2}}\right ) \prod_{i=1}^{K}e^{\frac{-(x_i-\mu_i)^2}{2\sigma_i^2}}$

$= \prod_{i=1}^{K}\left( \frac{1}{\sqrt{2\pi\sigma_i^2}} e^{\frac{-(x_i-\mu_i)^2}{2\sigma_i^2}} \right)$

Hence proved.
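The whole derivation can be verified end to end by computing equation 1 directly from its definition (no library density functions) and comparing it against the product form in equation 2. The dimensions and random seed below are arbitrary:

```python
# End-to-end check: equation (1) computed from scratch vs. equation (2).
import numpy as np

K = 4
rng = np.random.default_rng(0)
mu = rng.normal(size=K)                  # random mean vector
sigma = rng.uniform(0.5, 2.0, size=K)    # random standard deviations
Sigma = np.diag(sigma ** 2)              # diagonal covariance matrix
x = rng.normal(size=K)                   # random test point

# Equation (1), evaluated directly from the definition
d = x - mu
quad = d @ np.linalg.inv(Sigma) @ d
joint = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** K * np.linalg.det(Sigma))

# Equation (2): product of univariate normal densities
product = np.prod(
    np.exp(-d ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)
)

print(np.isclose(joint, product))
```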

About Ritesh Agrawal

I am an applied researcher who enjoys anything related to statistics, large data analysis, data mining, machine learning and data visualization.