If Independent Features Then { Multivariate Normal Distributions == Product of Univariate Normal Distributions }

Below is a simple mathematical proof showing that, for a multivariate Gaussian distribution, if the features are independent then the probability density of a point can be computed as the product of the probability densities of the individual features, each modeled as a univariate Gaussian distribution.
N(x; \mu, \Sigma)
= \frac{1}{\sqrt{(2\pi)^K|\Sigma|}}\exp\left( -\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) \right ) ~~~(1)
= \prod_{i=1}^{K}\left ( \frac{1}{\sqrt{2\pi\sigma_i^2}}e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}} \right ) ~~~(2)

where

  • K = Number of Features
  • \mu= Mean vector of size K
  • \Sigma = Covariance matrix of size K \times K
  • |\Sigma| = Determinant of covariance matrix
  • \sigma_i = standard deviation of feature i
  • \mu_i = mean value for feature i
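
Before walking through the proof, here is a quick numerical sanity check of the claim itself. This is a minimal sketch assuming NumPy and SciPy are available; the mean vector, variances, and test point are arbitrary examples.

import numpy as np
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(0)
K = 4
mu = rng.normal(size=K)                  # mean vector of size K
sigma2 = rng.uniform(0.5, 2.0, size=K)   # variance of each feature
Sigma = np.diag(sigma2)                  # diagonal covariance = independent features
x = rng.normal(size=K)                   # an arbitrary test point

# Equation 1: joint multivariate normal density
joint = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

# Equation 2: product of univariate normal densities
product = np.prod(norm.pdf(x, loc=mu, scale=np.sqrt(sigma2)))

print(joint, product)  # the two values match
assert np.isclose(joint, product)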

The covariance matrix \Sigma for a dataset with independent features is a diagonal matrix. For a diagonal matrix we can easily show the following two properties (both are verified numerically after the list):

  1. The inverse of a diagonal matrix is the diagonal matrix of the reciprocals of its diagonal elements, i.e.
    \begin{bmatrix} \sigma_1^2 & 0 & 0 & \cdot \\ 0 & \sigma_2^2 & 0 & \cdot \\ 0 & 0 & \sigma_3^2 & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}^{-1} = \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & 0 & \cdot \\ 0 & \frac{1}{\sigma_2^2} & 0 & \cdot \\ 0 & 0 & \frac{1}{\sigma_3^2} & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}~~~(3)
  2. The determinant of a diagonal matrix is equal to the product of its diagonal elements, i.e.
    \begin{vmatrix} \sigma_1^2 & 0 & 0 & \cdot \\ 0 & \sigma_2^2 & 0 & \cdot \\ 0 & 0 & \sigma_3^2 & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{vmatrix} = \sigma^2_1 \times \sigma^2_2 \times \sigma^2_3 \times \cdots ~~~(4)
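
Neither property needs to be taken on faith; a short NumPy check (with arbitrary example variances) confirms both:

import numpy as np

sigma2 = np.array([1.5, 0.7, 2.3])  # arbitrary example variances
D = np.diag(sigma2)

# Property 1: inverse of a diagonal matrix = diagonal of reciprocals (equation 3)
assert np.allclose(np.linalg.inv(D), np.diag(1.0 / sigma2))

# Property 2: determinant of a diagonal matrix = product of diagonal elements (equation 4)
assert np.isclose(np.linalg.det(D), np.prod(sigma2))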

Using the above two properties of diagonal matrices we can show that equation 1 is essentially the same as equation 2 when the features are independent. Let’s first tackle the normalization term \frac{1}{\sqrt{(2\pi)^K|\Sigma|}} in equation 1. Since the determinant of a diagonal matrix is equal to the product of its diagonal elements, we can rewrite

\frac{1}{\sqrt{(2\pi)^K|\Sigma|}}

= \frac{1}{\sqrt{(2\pi)^K\sigma_1^2\sigma_2^2\sigma_3^2\cdots\sigma^2_K}}

= \frac{1}{\sqrt{2\pi\sigma_1^2}}\times\frac{1}{\sqrt{2\pi\sigma_2^2}}\times\cdots\times\frac{1}{\sqrt{2\pi\sigma_K^2}}

= \prod_{i=1}^{K}\left ( \frac{1}{\sqrt{2\pi\sigma_i^2}}\right ) ~~~(5)
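
Equation 5 is also easy to check numerically; a small sketch with arbitrary example variances:

import numpy as np

sigma2 = np.array([1.5, 0.7, 2.3])  # arbitrary example variances
K = len(sigma2)

lhs = 1.0 / np.sqrt((2 * np.pi) ** K * np.prod(sigma2))  # 1/sqrt((2*pi)^K |Sigma|)
rhs = np.prod(1.0 / np.sqrt(2 * np.pi * sigma2))         # product of 1/sqrt(2*pi*sigma_i^2)
assert np.isclose(lhs, rhs)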

Now let’s focus on the exponential part in equation 1. Using equation 3, we can show that

(x-\mu)'\Sigma^{-1}(x-\mu)

= \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 & \cdot & \cdot \end{bmatrix} \begin{bmatrix} \frac{1}{\sigma_1^2} & 0 & 0 & \cdot \\ 0 & \frac{1}{\sigma_2^2} & 0 & \cdot \\ 0 & 0 & \frac{1}{\sigma_3^2} & \cdot \\ \cdot & \cdot & \cdot & \cdot \end{bmatrix}\begin{bmatrix} x_1-\mu_1\\ x_2-\mu_2\\ \cdot\\ \cdot\\ \end{bmatrix}

= \begin{bmatrix} \frac{x_1-\mu_1}{\sigma_1^2} & \frac{x_2-\mu_2}{\sigma_2^2} & \cdot & \cdot \end{bmatrix}\begin{bmatrix} x_1-\mu_1\\ x_2-\mu_2\\ \cdot\\ \cdot\\ \end{bmatrix}

=\frac{(x_1-\mu_1)^2}{\sigma_1^2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} + \cdots ~~~ (6)
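
Equation 6 can be verified with NumPy as well, again with arbitrary example values:

import numpy as np

rng = np.random.default_rng(1)
K = 4
x = rng.normal(size=K)
mu = rng.normal(size=K)
sigma2 = rng.uniform(0.5, 2.0, size=K)

quad_form = (x - mu) @ np.diag(1.0 / sigma2) @ (x - mu)  # (x-mu)' Sigma^{-1} (x-mu)
sum_form = np.sum((x - mu) ** 2 / sigma2)                # sum of (x_i-mu_i)^2 / sigma_i^2
assert np.isclose(quad_form, sum_form)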

Now e^{x+y} can be written as e^x \times e^y. Thus

e^{-\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu)}

= e^{-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}} \times e^{-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}} \times \cdots

= \prod_{i=1}^{K}e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}} ~~~(7)
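
And the factorization of the exponential in equation 7 can be confirmed the same way, with arbitrary example values:

import numpy as np

rng = np.random.default_rng(2)
K = 4
x = rng.normal(size=K)
mu = rng.normal(size=K)
sigma2 = rng.uniform(0.5, 2.0, size=K)

z = (x - mu) ** 2 / sigma2
lhs = np.exp(-0.5 * np.sum(z))   # e^{-1/2 (x-mu)' Sigma^{-1} (x-mu)}
rhs = np.prod(np.exp(-0.5 * z))  # product of e^{-(x_i-mu_i)^2 / (2*sigma_i^2)}
assert np.isclose(lhs, rhs)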

Substituting 5 and 7 into equation 1, we get

\frac{1}{\sqrt{(2\pi)^K|\Sigma|}}\exp\left( -\frac{1}{2}(x-\mu)'\Sigma^{-1}(x-\mu) \right )

= \prod_{i=1}^{K}\left ( \frac{1}{\sqrt{2\pi\sigma_i^2}}\right ) \prod_{i=1}^{K}e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}}

= \prod_{i=1}^{K}\left( \frac{1}{\sqrt{2\pi\sigma_i^2}} e^{-\frac{(x_i-\mu_i)^2}{2\sigma_i^2}} \right)

This is exactly equation 2, hence proved.

 
