Linear Regression #6: Heteroscedasticity

Tests for Heteroscedasticity

Inference based on OLS requires homoscedastic error terms, that is, error terms with constant variance. As before, graphical methods give a first overview: we plot the studentised residuals against the fitted values and examine how the data points are spread. If the spread grows (or shrinks) along the fitted values, heteroscedasticity may be present.
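A minimal sketch of this graphical check. It uses simulated data, because the model from the earlier parts is not repeated here; the data-generating process and the names `sim`, `fit`, `stud_res` and `fitted_vals` are assumptions made for illustration only.

```r
# Simulated data (assumption): the error spread grows with x
set.seed(1)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)
sim <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = sim)

# Studentised residuals and fitted values
stud_res <- rstudent(fit)
fitted_vals <- fitted(fit)

# Scatter plot with a reference line at zero
plot(fitted_vals, stud_res,
     xlab = "Fitted values", ylab = "Studentised residuals",
     main = "Studentised residuals vs. fitted values")
abline(h = 0, lty = 2)
```

A funnel shape that widens along the x-axis would hint at heteroscedasticity; a band of roughly constant width would not.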

White Test

The White test for heteroscedasticity has the following hypotheses:

\[H_0:\ \mathbb{E}\left[\varepsilon_i^2\middle|\ X_i\right]=\sigma^2=const\]
\[H_1:\ \mathbb{E}\left[\varepsilon_i^2\middle|\ X_i\right]=\sigma_i^2,\quad \sigma_i^2\neq\sigma_k^2\ \text{for some}\ i\neq k\]

The null hypothesis assumes that all error terms have the same constant variance. The alternative hypothesis, in contrast, assumes that the variances differ across observations and are therefore not constant.
With a regression model \(\small{ \hat{y}=X\hat{\beta} }\), the White test proceeds as follows:

Extract the residuals \(\small{ {\hat{\varepsilon}}_i }\) and square them: \(\small{ {\hat{\varepsilon}}_i^2 }\)
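Run the auxiliary regression of the squared residuals on the regressors: \(\small{ {\hat{\varepsilon}}_i^2=\beta_0+\beta_1x_1+\ldots+\beta_nx_n+u_i }\). The full White test also adds the squares and cross-products of the regressors; with the regressors in levels only, the procedure corresponds to the studentised Breusch-Pagan test.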
Idea: If there is heteroscedasticity, then \(\small{ \mathbb{E}\left[\varepsilon_i^2\middle|\ x_i\right] }\) depends on \(\small{ x_i }\) and thus \(\small{ x_i }\) influences \(\small{ \varepsilon_i^2 }\).
Let \(\small{ N }\) be the number of observations and \(\small{ R_{aux}^2 }\) the R-squared of the auxiliary regression model.
Calculate the test statistic \(\small{ \theta=N\cdot R_{aux}^2 }\). Under \(\small{ H_0 }\), \(\small{ \theta }\) asymptotically follows the \(\small{ \chi_q^2 }\)-distribution, where the degrees of freedom \(\small{ q }\) equal the number of regressors in the auxiliary regression (excluding the intercept). Reject \(\small{ H_0 }\) if \(\small{ \theta }\) exceeds the critical value of that distribution.
\(\small{ H_0:\theta\sim\chi_q^2 }\) or \(\small{ H_1:\theta\sim\mathcal{D}\neq\chi_q^2 }\)

R-Code: Manual White Test for Heteroscedasticity

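# Extract Squared Residuals (lr is the fitted OLS model)
res <- residuals(lr)
res2 <- res^2

# Run Auxiliary Regression
aux <- lm(res2 ~ df$X)

# Calculate Test Statistic
N <- length(res)
R2 <- summary(aux)$r.squared
theta <- N * R2

# Degrees of Freedom: regressors of the auxiliary regression (without intercept)
q <- length(coef(aux)) - 1

# Testing Result
p_value <- 1 - pchisq(theta, df = q)
if (p_value > 0.05) {
  print("No evidence of heteroscedasticity, do not reject H0!")
} else {
  print("Heteroscedasticity, reject H0!")
}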
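
The snippet above relies on the objects `lr` and `df` from the earlier parts of this series. The following self-contained sketch runs the same steps on simulated data and additionally includes the squared regressor in the auxiliary regression, as the full White test does. The data-generating process and all variable names are assumptions for illustration, not the data set used in this series.

```r
# Simulated heteroscedastic data (assumption, not the article's data set)
set.seed(42)
n <- 500
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)   # error sd grows with x
fit <- lm(y ~ x)

# Step 1: squared residuals
res2 <- residuals(fit)^2

# Step 2: auxiliary regression on x and x^2
aux <- lm(res2 ~ x + I(x^2))

# Step 3: test statistic N * R^2
theta <- n * summary(aux)$r.squared

# Step 4: degrees of freedom = auxiliary regressors without the intercept
q <- length(coef(aux)) - 1
p_value <- pchisq(theta, df = q, lower.tail = FALSE)

cat("theta =", round(theta, 2), " q =", q,
    " p-value =", signif(p_value, 3), "\n")
```

Because the simulated error variance depends strongly on `x`, the test should reject the null hypothesis here. With a single regressor there are no cross-products, so the auxiliary regression has two regressors and `q` equals 2.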

Conclusion

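If we detect heteroscedasticity, the Gauss-Markov assumptions are violated. The OLS coefficient estimates remain unbiased, but they are no longer efficient, and the usual standard errors are invalid. Hence, we cannot base inference on plain OLS regression, but have to use FGLS (or at least heteroscedasticity-robust standard errors) instead. The covariance matrix of the OLS estimator shows the difference: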

Heteroscedasticity

\[Var\left(\hat{\beta}\middle|\ X\right)\\=\left(X^\prime X\right)^{-1}\left[X^\prime \ Var\left(\varepsilon\middle|\ X\right)X\right]\left(X^\prime X\right)^{-1} \]

Homoscedasticity

\[Var\left(\hat{\beta}\middle|\ X\right)\\=\left(X^\prime X\right)^{-1}\left[X^\prime\sigma^2I_NX\right]\left(X^\prime X\right)^{-1}\\=\sigma^2\left(X^\prime X\right)^{-1} \]
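
To make the two covariance formulas and the FGLS remedy more concrete, here is a rough sketch on simulated data. It computes the classical covariance matrix, the sandwich form with \(\small{ Var\left(\varepsilon\middle|\ X\right) }\) replaced by a diagonal matrix of squared residuals (White's estimator, chosen here as one possible plug-in), and a simple FGLS fit. The variance model (log squared residuals linear in `x`), the data and all names are assumptions for illustration only.

```r
# Simulated heteroscedastic data (assumption)
set.seed(7)
n <- 500
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = x)
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- residuals(fit)
XtX_inv <- solve(crossprod(X))

# Homoscedastic formula: sigma^2 (X'X)^-1
sigma2 <- sum(e^2) / (n - ncol(X))
V_classic <- sigma2 * XtX_inv

# Sandwich formula with Var(eps | X) estimated by diag(e^2)
meat <- crossprod(X * e)                    # X' diag(e^2) X
V_sandwich <- XtX_inv %*% meat %*% XtX_inv

# FGLS: estimate a variance function, then refit with inverse-variance weights
var_fit <- lm(log(e^2) ~ x)
w <- 1 / exp(fitted(var_fit))
fgls <- lm(y ~ x, weights = w)

print(sqrt(diag(V_classic)))     # classical standard errors
print(sqrt(diag(V_sandwich)))    # sandwich standard errors
print(coef(summary(fgls)))       # FGLS estimates
```

The classical matrix computed by hand should coincide with `vcov(fit)`. How well FGLS works depends on how closely the assumed variance function matches the true one.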
