Let \(P\) be the probability distribution of the observed data.

Let \(Q\) be the probabilistic model that approximates the distribution of the observed data.

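KL divergence (Kullback-Leibler divergence) measures how much the model distribution \(Q\) differs from the data distribution \(P\). It is not a metric in the strict mathematical sense: it is asymmetric, so the divergence of \(Q\) from \(P\) generally differs from the divergence of \(P\) from \(Q\), and it does not satisfy the triangle inequality.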
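
For discrete distributions, the standard definition is \(D_{KL}(P \| Q) = \sum_i p_i \log \frac{p_i}{q_i}\). A minimal sketch of that sum in Python follows, assuming \(P\) and \(Q\) are given as lists of probabilities over the same outcomes. The function name `kl_divergence` and the example values are illustrative assumptions, not something defined in the text above.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(P || Q) in nats for two discrete distributions.

    p and q are sequences of probabilities over the same outcomes.
    Terms with p_i == 0 contribute nothing (the usual 0 * log 0 = 0 convention).
    If q_i == 0 where p_i > 0, the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")

    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # zero-probability outcomes under P add nothing
        if q_i == 0:
            return math.inf  # Q rules out an outcome that P allows
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical example values: P is the data distribution, Q the model.
p = [0.5, 0.5]
q = [0.9, 0.1]

forward = kl_divergence(p, q)  # divergence of Q from P
reverse = kl_divergence(q, p)  # divergence of P from Q
print(forward, reverse)
```

The two printed values differ, which illustrates the asymmetry. The result is zero only when the two distributions are identical.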