Let \(P\) be the probability distribution of the observed data.
Let \(Q\) be a probabilistic model that approximates the distribution of the observed data.
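The Kullback-Leibler (KL) divergence measures how much the approximating distribution \(Q\) diverges from the distribution \(P\). It is not a metric in the strict mathematical sense: it is asymmetric, so the divergence of \(Q\) from \(P\) generally differs from that of \(P\) from \(Q\), and it does not satisfy the triangle inequality.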
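For discrete distributions over the same outcomes, the standard definition is \(D_{KL}(P \,\|\, Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}\). The following is a minimal sketch of that sum in plain Python, not a reference implementation. The function name `kl_divergence` and the example probabilities are illustrative assumptions, and the natural logarithm is used, so the result is in nats.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(P || Q) in nats for two discrete distributions.

    `p` and `q` are sequences of probabilities over the same outcomes,
    in the same order. (Hypothetical helper, for illustration only.)
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")

    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            # By convention, a term with P(x) = 0 contributes nothing.
            continue
        if q_i == 0.0:
            # P puts mass on an outcome that Q rules out entirely.
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


# Illustrative values: P stands in for the observed data, Q for the model.
p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # divergence of Q from P
print(kl_divergence(q, p))  # roles swapped: a different value
print(kl_divergence(p, p))  # identical distributions give zero
```

Swapping the arguments changes the result, which illustrates that the KL divergence is not symmetric in \(P\) and \(Q\).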