Linear Regression with OLS

Introduction to Linear Regression

Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. In its simplest form, linear regression assumes a linear relationship between the dependent variable y and one independent variable x:

    \[y = \beta_0 + \beta_1 x + \epsilon\]

where \beta_0 is the intercept, \beta_1 is the slope coefficient, and \epsilon is the error term. The goal of linear regression is to estimate the values of the parameters \beta_0 and \beta_1 that best fit the data.
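
A minimal Python sketch of this data-generating process (the parameter values beta0, beta1, and sigma below are arbitrary choices for illustration, not taken from the text):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical "true" parameters, chosen only for illustration
    beta0, beta1, sigma = 2.0, 0.5, 1.0

    n = 200
    x = rng.uniform(0, 10, size=n)        # independent variable
    eps = rng.normal(0, sigma, size=n)    # error term
    y = beta0 + beta1 * x + eps           # dependent variable from the linear model

These simulated x and y are reused in the later sketches.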

Assumptions of Linear Regression

Before estimating the parameters of a linear regression model, we need to check that the following assumptions are satisfied:

  1. Linearity: The relationship between the dependent variable y_i and the independent variables x_{i1}, x_{i2}, ..., x_{ik} is linear in the parameters \beta_0, \beta_1, ..., \beta_k, i.e.,

    \[y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_k x_{ik} + \epsilon_i\]

  2. Independence: The errors \epsilon_1, \epsilon_2, ..., \epsilon_n are independent and identically distributed (i.i.d.). This means that each error follows the same probability distribution and that no error carries information about any other.
  3. Homoscedasticity: The errors \epsilon_1, \epsilon_2, ..., \epsilon_n have the same variance \sigma^2, i.e., Var(\epsilon_i) = \sigma^2 for all i.
  4. No Autocorrelation: There is no correlation between the errors at different observation points, i.e., Cov(\epsilon_i, \epsilon_j) = 0 for all i \neq j.
  5. Exogeneity: The regressors x_{i1}, x_{i2}, ..., x_{ik} are uncorrelated with the errors \epsilon_i, i.e., Cov(x_{ij}, \epsilon_i) = 0 for all i and j. This assumption is also known as the “no omitted variables” assumption.

If these assumptions are not met, the results of the linear regression analysis may be biased or unreliable.
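
Some of these assumptions can be probed empirically from the residuals of a fitted model. The sketch below is one possible approach (using statsmodels and the simulated x and y from the earlier sketch): it runs a Breusch-Pagan test for heteroscedasticity and computes the Durbin-Watson statistic for first-order autocorrelation.

    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan
    from statsmodels.stats.stattools import durbin_watson

    # Assumes the simulated arrays x and y from the earlier sketch
    X = sm.add_constant(x)            # design matrix with an intercept column
    res = sm.OLS(y, X).fit()          # fit the model by OLS

    # Homoscedasticity: a small p-value suggests heteroscedasticity
    lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
    print("Breusch-Pagan p-value:", lm_pval)

    # No autocorrelation: values near 2 suggest no first-order autocorrelation
    print("Durbin-Watson statistic:", durbin_watson(res.resid))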

Ordinary Least Squares (OLS)

The most commonly used method to estimate the parameters of a linear regression model is Ordinary Least Squares (OLS). OLS estimates the parameters by minimizing the sum of squared errors:

    \[\hat{\beta}_{OLS} = \arg \min_{\beta} (y-X\beta)'(y-X\beta)\]

where \hat{\beta}_{OLS} is the vector of OLS estimators, y is the vector of observed values of the dependent variable, and X is the design matrix of the independent variables. The OLS estimators are given by:

    \[\hat{\beta}_{OLS} = (X'X)^{-1}X'y\]
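
A minimal NumPy sketch of this formula, reusing the simulated x and y from the earlier sketch (the normal equations are solved directly rather than explicitly inverting X'X, which is numerically more stable):

    import numpy as np

    # Assumes the simulated arrays x and y from the earlier sketch
    X = np.column_stack([np.ones_like(x), x])   # design matrix with an intercept column

    # Closed-form OLS: beta_hat solves (X'X) beta = X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)   # should be close to the true (beta0, beta1)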

To show that the OLS estimators are unbiased and consistent, we need to make some assumptions. Under the assumptions of the Gauss-Markov theorem, the OLS estimators are unbiased:

    \[E[\hat{\beta}_{OLS}] = \beta\]

where \beta is the true vector of coefficients. The OLS estimators are also consistent, meaning they converge in probability to the true coefficients as the sample size n grows:

    \[\hat{\beta}_{OLS} \to_p \beta \quad \text{as } n \to \infty\]

where \to_p denotes convergence in probability.

Derivation of OLS Estimators

To derive the OLS estimators, we start by writing the sum of squared errors in matrix form:

    \[S(\beta) = (y - X\beta)'(y - X\beta)\]

Expanding and simplifying, we get:

    \[S(\beta) = y'y - 2\beta'X'y + \beta'X'X\beta\]

To minimize this expression with respect to \beta, we take the derivative with respect to \beta and set it equal to zero:

    \[\frac{\partial S(\beta)}{\partial \beta} = -2X'y + 2X'X\beta = 0\]

Solving for \beta, we get the OLS estimators:

    \[\hat{\beta}_{OLS} = (X'X)^{-1}X'y\]
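
Setting the derivative to zero is equivalent to the normal equations X'(y - X\hat{\beta}_{OLS}) = 0, which can be checked numerically (reusing X, y, and beta_hat from the sketch above):

    # Assumes X, y, and beta_hat from the earlier sketch
    residuals = y - X @ beta_hat

    # The normal equations: X'(y - X beta_hat) should be (numerically) zero
    print(X.T @ residuals)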

Variance of OLS Estimators

The variance of the OLS estimators can be estimated using the formula:

    \[Var(\hat{\beta}_{OLS}) = \sigma^2 (X'X)^{-1}\]

where \sigma^2 is the variance of the error term \epsilon.

Since \sigma^2 is unknown, it is typically estimated using the residuals:

    \[\hat{\sigma}^2 = \frac{\hat{\epsilon}'\hat{\epsilon}}{n - k}\]

where \hat{\epsilon} = y - X\hat{\beta}_{OLS} is the vector of residuals and k is the number of estimated coefficients (including the constant term).

Substituting \hat{\sigma}^2 for \sigma^2 in the formula for the variance of the OLS estimators, we get:

    \[\widehat{Var}(\hat{\beta}_{OLS}) = \hat{\sigma}^2 (X'X)^{-1}\]

This expression gives an estimate of the variance-covariance matrix of the OLS estimators. Each diagonal element of this matrix represents the variance of the corresponding OLS estimator, while each off-diagonal element represents the covariance between two OLS estimators.

It is worth noting that this formula assumes that the error terms are homoscedastic (i.e., have equal variances) and that they are uncorrelated with each other and with the regressors. If these assumptions are violated, the formula for the variance-covariance matrix may need to be adjusted using techniques such as heteroscedasticity-robust standard errors or cluster-robust standard errors.
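
Putting these formulas together, the sketch below (reusing X, y, beta_hat, and residuals from the earlier sketches) computes \hat{\sigma}^2, the estimated variance-covariance matrix, and the implied standard errors:

    import numpy as np

    # Assumes X, y, beta_hat, and residuals from the earlier sketches
    n, k = X.shape    # k counts all estimated coefficients, including the intercept

    sigma2_hat = residuals @ residuals / (n - k)        # estimate of the error variance
    cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)  # estimated variance-covariance matrix

    std_errors = np.sqrt(np.diag(cov_beta_hat))         # standard errors of the coefficients
    print(std_errors)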

Derivation of Variance

To derive the variance of the OLS estimators, we need to start with the OLS estimator for \beta, which we know is given by:

    \[\hat{\beta}_{OLS} = (X'X)^{-1}X'y\]

where X is the matrix of regressors, y is the vector of the dependent variable, and (X'X)^{-1} is the inverse of the matrix product X'X.

Now, we want to find the variance of this estimator. The variance of the OLS estimator can be defined as:

    \[Var( \hat{\beta}_{OLS} )  = E\left[(\hat{\beta}_{OLS} - \beta)(\hat{\beta}_{OLS} - \beta)'\right]\]

where \beta is the true population parameter vector. Since y = X\beta + \epsilon, we have \hat{\beta}_{OLS} - \beta = (X'X)^{-1}X'\epsilon, so this expression becomes:

    \[Var(\hat{\beta}_{OLS}) = E\left[\left((X'X)^{-1}X'\epsilon\right)\left((X'X)^{-1}X'\epsilon\right)'\right]\]

where \epsilon is the vector of errors, i.e., \epsilon = y - X\beta.

Treating X as fixed (or conditioning on X), we can simplify this expression as follows:

    \begin{align*} Var(\hat{\beta}_{OLS}) &= E\left[\left((X'X)^{-1}X'\epsilon\right)\left((X'X)^{-1}X'\epsilon\right)'\right] \\ &= E\left[(X'X)^{-1}X'\epsilon\epsilon'X(X'X)^{-1}\right] \\ &= (X'X)^{-1}X'\,E(\epsilon\epsilon')\,X(X'X)^{-1} \\ &= (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1} \end{align*}

where we have used the fact that E(\epsilon) = 0 and E(\epsilon\epsilon') = \sigma^2 I, where I is the identity matrix of size n.

Therefore, the variance of the OLS estimators can be estimated as:

    \[\widehat{Var}(\hat{\beta}_{OLS}) = \hat{\sigma}^2 (X'X)^{-1}\]

where \hat{\sigma}^2 is an estimate of the variance of the errors, calculated as:

    \[\hat{\sigma}^2 = \frac{\hat{\epsilon}'\hat{\epsilon}}{n-k}\]

where \hat{\epsilon} = y - X\hat{\beta}_{OLS} is the vector of residuals, and k is the number of estimated coefficients (including the constant term).

Proof of Unbiasedness

To show that the OLS estimators are unbiased, we need to show that:

    \[E[\hat{\beta}_{OLS}] = \beta\]

where \beta is the true vector of coefficients.

We start with the OLS estimators:

    \[\hat{\beta}_{OLS} = (X'X)^{-1}X'y\]

Taking the expected value of both sides, we get:

    \[E[\hat{\beta}_{OLS}] = E[(X'X)^{-1}X'y]\]

Treating X as fixed (or conditioning on X), we can pull the nonrandom matrix (X'X)^{-1}X' outside the expectation:

    \[E[\hat{\beta}_{OLS}] = (X'X)^{-1}X'E[y]\]

Since y is generated from the model y = X\beta + \epsilon, we know that:

    \[E[y] = X\beta\]

Substituting this into the previous equation, we get:

    \[E[\hat{\beta}_{OLS}] = (X'X)^{-1}X'X\beta = \beta\]

Therefore, the OLS estimators are unbiased.
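
Unbiasedness can also be illustrated with a small Monte Carlo experiment: averaging the OLS estimates over many simulated samples should recover the true coefficients. The sketch below uses arbitrary illustrative parameter values:

    import numpy as np

    rng = np.random.default_rng(1)

    beta_true = np.array([2.0, 0.5])   # hypothetical true coefficients, for illustration
    n, n_sims = 100, 5000

    estimates = np.empty((n_sims, 2))
    for s in range(n_sims):
        x = rng.uniform(0, 10, size=n)
        X = np.column_stack([np.ones(n), x])
        y = X @ beta_true + rng.normal(0, 1.0, size=n)
        estimates[s] = np.linalg.solve(X.T @ X, X.T @ y)

    # The average estimate should be close to beta_true
    print(estimates.mean(axis=0))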

Proof of Consistency

To show that the OLS estimators are consistent, we need to show that they converge in probability to the true coefficients as the sample size n grows:

    \[\hat{\beta}_{OLS} \to_p \beta \quad \text{as } n \to \infty\]

We can rewrite the OLS estimators as:

    \[\hat{\beta}_{OLS} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \epsilon) = \beta + (X'X)^{-1}X'\epsilon\]

Scaling both factors by the sample size n, this becomes:

    \[\hat{\beta}_{OLS} = \beta + \left(\frac{1}{n} X'X\right)^{-1} \left(\frac{1}{n} X'\epsilon\right)\]

By the Law of Large Numbers, we know that:

    \[\frac{1}{n} X'X \to_p E[xx']\]

where x is the column vector of independent variables for a single observation. Under the assumption that E[xx'] is full rank, its inverse exists, and by the Continuous Mapping Theorem:

    \[\left(\frac{1}{n} X'X\right)^{-1} \to_p E[xx']^{-1}\]

Similarly, by the Law of Large Numbers and the exogeneity assumption, we have:

    \[\frac{1}{n} X'\epsilon \to_p E[x\epsilon] = 0\]

Combining these two limits with Slutsky's theorem gives:

    \[\hat{\beta}_{OLS} = \beta + \left(\frac{1}{n} X'X\right)^{-1}\left(\frac{1}{n} X'\epsilon\right) \to_p \beta + E[xx']^{-1} \cdot 0 = \beta\]

Therefore, the OLS estimators are consistent.
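
Consistency can likewise be illustrated by refitting the model on increasingly large simulated samples (again with arbitrary illustrative parameters): the distance between \hat{\beta}_{OLS} and the true \beta shrinks as n grows.

    import numpy as np

    rng = np.random.default_rng(2)

    beta_true = np.array([2.0, 0.5])   # hypothetical true coefficients, for illustration

    for n in [100, 1_000, 10_000, 100_000]:
        x = rng.uniform(0, 10, size=n)
        X = np.column_stack([np.ones(n), x])
        y = X @ beta_true + rng.normal(0, 1.0, size=n)
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        print(n, np.linalg.norm(beta_hat - beta_true))   # estimation error shrinks with n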

