Machine learning (ML) is an approach that learns patterns from data and leverages them to predict properties of unseen data. The same statement is valid for statistics, because the essence of ML theory comes from statistics. In this post, I will explain how to introduce probabilistic models and statistics into ML problems.
Context Modeling
To apply probability theory to real-world problems, we have to express all of the problem-related variables as random variables. The set of observed examples is called a dataset.
- When $N$ is the number of samples, the dataset is
  - $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ in a supervised setting
  - $\mathcal{D} = \{x_i\}_{i=1}^{N}$ in an unsupervised setting
For ease of notation, I will denote a data point as $x_i$, but it is straightforward to replace $x_i$ with $(x_i, y_i)$ in the supervised setting.
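As a concrete illustration (my own sketch, not from the original post; array shapes are illustrative), here is how such a dataset might be represented with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 100, 3                    # N samples, d features (illustrative sizes)
X = rng.normal(size=(N, d))      # unsupervised dataset: D = {x_i}
y = rng.integers(0, 2, size=N)   # adding labels gives D = {(x_i, y_i)}

# A supervised data point is the pair (x_i, y_i); an unsupervised one is x_i alone.
x_0, y_0 = X[0], y[0]
```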
Assumptions
We have to introduce some assumptions to bridge machine learning tasks and probability theory. These assumptions may vary from model to model, but the ones below are the most common and widely applied.
Data Distribution
We believe that there is a distribution from which the data are drawn. Estimating this distribution is the main objective of most machine learning problems. This distribution is known when we are dealing with synthetic data, but it is unknown for most real-world problems.
Parametrized Data Distribution
Searching over the space of all possible distributions is intractable. Therefore, we constrain the search space by assuming that the data distribution follows a certain functional form. This functional form is parametrized by model parameters $\theta$. Thanks to this assumption, we can efficiently explore the search space by optimizing $\theta$.
It is important to note that estimating $\theta$ is equivalent to estimating the data distribution $p(x \mid \theta)$. Therefore, ML methods focus on optimizing the model parameter $\theta$ based on a given dataset $\mathcal{D}$.
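As a minimal sketch (my addition, assuming a univariate Gaussian family with $\theta = (\mu, \sigma^2)$), estimating $\theta$ by maximum likelihood looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend the unknown data distribution is N(2.0, 0.5^2); in real problems it is unknown.
data = rng.normal(loc=2.0, scale=0.5, size=1000)

# Assumed functional form: p(x | theta) = N(x; mu, sigma^2).
# Under this assumption the maximum-likelihood estimate of theta is closed-form:
mu_hat = data.mean()
sigma2_hat = data.var()  # MLE variance uses the 1/N normalization

print(f"theta_hat = (mu={mu_hat:.3f}, sigma^2={sigma2_hat:.3f})")
```

Because the functional form pins down the whole distribution, estimating $(\mu, \sigma^2)$ here is exactly estimating the data distribution itself.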
Notations for the Model Parameter
You may encounter a few different notations for the model parameter $\theta$.
- $p(x)$: $\theta$ is unlikely to be of interest.
- $p_\theta(x)$: used to indicate that the probability distribution is parametrized.
- $p(x \mid \theta)$: $\theta$ is considered as a random variable.
  - This notation is used when we apply a Bayesian approach to the parameter $\theta$.
- $p(x; \theta)$: $\theta$ is considered as a fixed value, not having a statistical interpretation.
  - The conditioned variable $\theta$ in $p(x \mid \theta)$ and the fixed parameter $\theta$ in $p(x; \theta)$ do not have an intrinsic difference. Both are included in the density function with given/fixed values. However, authors differentiate them because $\theta$ in $p(x; \theta)$ does not have a statistical interpretation in that context.
i.i.d. Sampling
The data points in $\mathcal{D}$ are assumed to be independent and identically distributed (i.i.d.). Thanks to this assumption, the likelihood of $\mathcal{D}$ becomes a simple product of likelihoods for each data point.
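Written out (a standard identity, included here for completeness), the i.i.d. assumption gives

$$
p(\mathcal{D} \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta),
\qquad
\log p(\mathcal{D} \mid \theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)
$$

In practice we usually optimize the log form, since a sum of log-likelihoods is numerically better behaved than a long product.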
Latent Variable Model
A probability model might contain unobserved variables. We call them latent variables, denoted by $z$, and a probability model with latent variables is called a latent variable model. The joint distribution can be decomposed as follows:

$$
p(x, z \mid \theta) = p(x \mid z, \theta)\, p(z \mid \theta)
$$
Therefore, the model is composed of three variables:
- observed random variables $x$
- unobserved (latent) variables $z$
- model parameters $\theta$
We don't assume independence between these variables in general. We just believe that there are some underlying latent variables that control the observed events. Each observation may have its own latent variables, or all observations may be affected by shared latent variables; the sketch below illustrates the per-observation case.
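As an illustration (my own example, not from the original post), a Gaussian mixture is a classic latent variable model in which each observation $x_i$ has its own latent cluster assignment $z_i$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters theta of a 2-component Gaussian mixture.
weights = np.array([0.3, 0.7])   # p(z | theta): mixture weights
means = np.array([-2.0, 3.0])    # component means
stds = np.array([0.5, 1.0])      # component standard deviations

N = 5
z = rng.choice(len(weights), size=N, p=weights)  # latent: z_i ~ Categorical(weights)
x = rng.normal(loc=means[z], scale=stds[z])      # observed: x_i ~ N(mean_{z_i}, std_{z_i}^2)

print(z)  # cluster assignments: hidden in practice
print(x)  # the dataset we actually observe
```

In a real dataset we would only see $x$; the hidden $z$ is what makes $\{(x_i, z_i)\}$ the complete data and $\{x_i\}$ the incomplete data in the next subsection.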
Complete/Incomplete Data
The dataset $\{x_i\}_{i=1}^{N}$ is called the incomplete dataset because it only partially describes the statistical process that generated it. On the other hand, $\{(x_i, z_i)\}_{i=1}^{N}$ is called the complete dataset because it fully describes the statistical process.
Why Do We Introduce Latent Variables?
In most cases, the marginal likelihood $p(x \mid \theta)$ is intractable, but the complete data likelihood $p(x, z \mid \theta)$ is tractable. This is why we introduce latent variable models for complex problems.
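To spell this out (standard marginalization, written here for completeness): evaluating the marginal likelihood requires integrating the latent variables out,

$$
p(x \mid \theta) = \int p(x, z \mid \theta)\, dz
$$

and this integral (or the corresponding sum over exponentially many discrete latent configurations) is what usually makes it intractable, while evaluating the complete data likelihood $p(x, z \mid \theta)$ at a given pair $(x, z)$ is cheap.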
Notations
Throughout my posts, I'll use the notations below:
- $p(\theta)$: prior distribution (on the model parameter $\theta$)
- $p(\theta \mid X)$: posterior distribution (on the model parameter $\theta$)
- $X$: a set of observations
- $Z$: a set of latent variables
- $p(X \mid \theta)$: marginal likelihood / evidence
- $p(X, Z \mid \theta)$: joint likelihood / complete data likelihood
- $p(Z \mid X, \theta)$: posterior distribution on latent variables
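These notations are tied together by Bayes' rule; the following standard identities (added here for reference) show how:

$$
p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)},
\qquad
p(Z \mid X, \theta) = \frac{p(X, Z \mid \theta)}{p(X \mid \theta)}
$$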