Model estimation

Model estimation is the use of statistical analysis techniques to find parameters that most likely explain observed data. Model estimation is a component of Model Calibration and Validation.

# Notation

Consider a statistical process where an outcome $y$ is a function of various predictor variables $x_1, x_2, \ldots, x_p$. It may be desirable to explain this process with a linear equation,

$$ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \ldots + \beta_p x_{pi} + \varepsilon_i $$

where $\beta_0, \beta_1, \ldots, \beta_p$ are parameters that "explain" the relationship between the predictor variables and the outcome variable. The residual term $\varepsilon_i$ is a random variable that accounts for the difference between the observed value $y_i$ and the output of the linear function evaluated at $x_{1i}, \ldots, x_{pi}$.

The notation for this equation can be simplified by using a matrix $X$ where the columns are the different predictor variables and the rows are different observations in a dataset. The first column consists of ones and corresponds to the intercept $\beta_0$:

$$ X = \begin{bmatrix} 1 & x_{11} & x_{21} & \ldots & x_{p1} \\ 1 & x_{12} & x_{22} & \ldots & x_{p2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \ldots & x_{pn} \end{bmatrix} $$

The linear equation above then becomes $y = X\beta + \varepsilon$. The purpose of model estimation is to find estimates $\hat{\beta}$ of the parameters $\beta$ that minimize the difference between the true observed response $y$ and the "fitted" response $\hat{y} = X\hat{\beta}$.
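As a concrete illustration (not from the original article), a design matrix for a small hypothetical dataset with two predictors might be assembled as follows; the variable names and values are invented for the example:

```python
import numpy as np

# Hypothetical dataset: n = 4 observations of p = 2 predictors.
x1 = np.array([2.0, 3.5, 5.0, 7.5])   # first predictor
x2 = np.array([1.0, 0.0, 1.0, 1.0])   # second predictor
y  = np.array([4.1, 5.8, 9.2, 12.3])  # observed outcomes

# Design matrix X: a leading column of ones for the intercept beta_0,
# followed by one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2])
print(X.shape)  # (4, 3): n rows, p + 1 columns
```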

# Ordinary Least Squares

Assume we have the linear equation $y = X\beta + \varepsilon$ and we want to find the estimates $\hat{\beta}$; one plausible method would be to find the values that minimize the sum of squared residuals, or the distance between $y$ and $\hat{y} = X\hat{\beta}$:

$$ SSE = (y - X\hat{\beta})^T (y - X\hat{\beta}) $$

If we take the derivative of this sum with respect to $\hat{\beta}$, set it equal to zero, and solve for $\hat{\beta}$, we arrive at the following estimator equation:

$$ \hat{\beta} = (X^T X)^{-1} X^T y $$

This estimator is referred to as the Ordinary Least Squares (OLS) estimator. If we assume that the residuals $\varepsilon$ are normally distributed with mean zero and variance $\sigma^2$, the variance of the OLS estimates is

$$ \operatorname{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1} $$
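A minimal sketch of this computation in Python, assuming the `X` and `y` arrays from the design-matrix example above; it implements the two formulas just given:

```python
import numpy as np

def ols(X, y):
    """Closed-form OLS: beta_hat = (X^T X)^{-1} X^T y.

    np.linalg.solve is used rather than an explicit inverse,
    which is the numerically safer way to solve the normal equations.
    """
    XtX = X.T @ X
    beta_hat = np.linalg.solve(XtX, X.T @ y)

    # Unbiased estimate of the residual variance sigma^2,
    # with n observations and k = p + 1 estimated parameters.
    n, k = X.shape
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)

    # Variance-covariance matrix of the estimates: sigma^2 (X^T X)^{-1}.
    var_beta = sigma2_hat * np.linalg.inv(XtX)
    return beta_hat, var_beta
```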

# Maximum Likelihood Estimation

OLS is powerful and adequate in many situations; however, there may be cases where the assumptions of OLS modeling (normally distributed residuals $\varepsilon$, etc.) are violated. This is especially common in transportation engineering, where the outcome variable is often discrete rather than continuous. In these cases, it is more common to use maximum likelihood estimation (MLE).

In a linear model, we assume that the points $y_i$ follow a normal (Gaussian) probability distribution with mean $X_i\beta$ (where $X_i$ is the $i$-th row of $X$) and variance $\sigma^2$: $y_i \sim N(X_i\beta, \sigma^2)$. The equation of this probability density function is:

$$ f(y_i \mid X_i, \beta, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - X_i\beta)^2}{2\sigma^2}\right) $$

What we want to find are the parameters $\beta$ and $\sigma^2$ that maximize this probability across all points $y_i$ simultaneously. This is the "likelihood" function, $\mathcal{L}$:

$$ \mathcal{L}(\beta, \sigma^2) = \prod_{i=1}^{n} f(y_i \mid X_i, \beta, \sigma^2) $$

Because a product of many small probabilities is numerically unstable and awkward to differentiate, it is easier to work with the log of the likelihood function, which turns the product into a sum:

$$\log(\mathcal{L}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i = 1}^{n}(y_i - X_i\beta)^2 $$

Most MLE programs work by having a computer numerically search for the values of $\beta$ and $\sigma^2$ that maximize the value of this likelihood function. Note that for a linear model $y = X\beta + \varepsilon$ with normally distributed residuals, the MLE and OLS estimates of $\beta$ are equivalent. MLE is most suitable for problems where an analytical solution is difficult or does not exist.
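A minimal sketch of such a numerical search, assuming the `X` and `y` arrays from the earlier examples and using `scipy.optimize.minimize` on the negative log-likelihood (optimizers conventionally minimize rather than maximize):

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, y):
    """Negative Gaussian log-likelihood for the linear model y = X beta + eps.

    params packs the k regression coefficients followed by log(sigma^2);
    optimizing log(sigma^2) keeps the variance positive without constraints.
    """
    k = X.shape[1]
    beta, log_sigma2 = params[:k], params[k]
    sigma2 = np.exp(log_sigma2)
    resid = y - X @ beta
    n = len(y)
    ll = (-n / 2 * np.log(2 * np.pi)
          - n / 2 * np.log(sigma2)
          - (resid @ resid) / (2 * sigma2))
    return -ll

# Start the search from zero coefficients and unit variance.
k = X.shape[1]
start = np.zeros(k + 1)
result = minimize(neg_log_likelihood, start, args=(X, y), method="BFGS")
beta_mle = result.x[:k]  # for this Gaussian linear model, matches the OLS estimates
```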