Consider a statistical process where an outcome $y$ is a function of various
predictor variables $x_1, x_2, \ldots, x_p$. It may be desirable to explain this
process with a linear equation,

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \epsilon_i$$

where $\beta_0, \beta_1, \ldots, \beta_p$ are parameters that "explain" the relationship between
the predictor variables and the outcome variable. The residual term $\epsilon_i$ is a
random variable that accounts for the difference between the observed value $y_i$
and the output of the linear function $\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$.
The notation for this equation can be simplified by using a matrix $\mathbf{X}$ whose
columns are the different predictor variables $x_1, \ldots, x_p$ and whose
rows are the different observations in a dataset. The first column of $\mathbf{X}$ consists of
ones and corresponds to the intercept $\beta_0$:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$
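As a sketch of this matrix form (the data here are simulated for illustration), the OLS estimate $\hat{\boldsymbol{\beta}}$ can be computed directly with NumPy, with the column of ones supplying the intercept:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated dataset: 100 observations, 2 predictors
n = 100
x = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), x])   # first column of ones -> intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)  # residual noise

# Least-squares estimate of beta (solves the normal equations)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```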
OLS is powerful and adequate in many situations; however, there may be cases where
the assumptions of OLS modeling (normally distributed residuals $\epsilon_i$, etc.) are violated.
This is especially common in transportation engineering, where the outcome
variable is often discrete rather than continuous. In these cases, it is more common to use
maximum likelihood estimation (MLE).
In a linear model, we assume that the points $y_i$ follow a normal (Gaussian) probability
distribution, with mean $\mu$ and variance $\sigma^2$: $y_i \sim N(\mu, \sigma^2)$.
The equation of this probability density function is:

$$f(y_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)$$

What we want to find are the parameters $\mu$ and $\sigma$ that maximize this
probability for all points $y_i$. This is the "likelihood" function,

$$L(\mu, \sigma^2 \mid \mathbf{y}) = \prod_{i=1}^{n} f(y_i \mid \mu, \sigma^2)$$

For various reasons, it's easier to use the log of the likelihood function:

$$\log L = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \mu)^2$$
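A minimal sketch of evaluating this log-likelihood numerically (function and variable names are illustrative): the log-likelihood is simply the sum of the log density at each observed point, and it is larger when the assumed parameters are closer to the truth.

```python
import numpy as np

def normal_loglik(y, mu, sigma):
    """Log-likelihood of observations y under N(mu, sigma^2)."""
    n = len(y)
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            - np.sum((y - mu) ** 2) / (2 * sigma**2))

rng = np.random.default_rng(1)
y = rng.normal(loc=5.0, scale=2.0, size=500)  # true mu=5, sigma=2

# The log-likelihood is higher near the true parameters
print(normal_loglik(y, mu=5.0, sigma=2.0))
print(normal_loglik(y, mu=0.0, sigma=2.0))  # much lower
```

Working on the log scale also avoids the numerical underflow that comes from multiplying hundreds of small densities together.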
Most MLE programs work by having a computer attempt to find values of $\mu$
and $\sigma$ that maximize the value of this likelihood function. Note that for
a linear model ($\mu_i = \mathbf{x}_i^\top \boldsymbol{\beta}$), the MLE and OLS estimates of $\boldsymbol{\beta}$ are equivalent.
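This equivalence can be sketched numerically: the code below (names and simulated data are illustrative) maximizes the normal log-likelihood of a linear model with `scipy.optimize.minimize` (by minimizing its negative) and compares the result to the OLS estimate.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Simulated linear data: y = 1 + 2x + noise
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

def negloglik(params):
    """Negative normal log-likelihood with linear mean mu_i = x_i' beta."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)  # optimize log(sigma) to keep sigma positive
    mu = X @ beta
    return (0.5 * n * np.log(2 * np.pi * sigma**2)
            + np.sum((y - mu) ** 2) / (2 * sigma**2))

mle = minimize(negloglik, x0=np.zeros(3), method="BFGS")
beta_mle = mle.x[:2]

# OLS estimate for comparison: the two should agree closely
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_mle, beta_ols)
```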
MLE is best suited to problems where an analytical solution is difficult or does
not exist.