1 Welcome to Lab 5

Intended Learning Outcomes:

  • Produce report-quality scatterplots;

  • Calculate and interpret the sample correlation coefficient;

  • Fit linear regression models by using lm;

  • Interpret estimates of model parameters.

1.1 Correlation coefficient

In the lectures we learned how to assess the strength of a linear relationship between two random variables using the correlation coefficient. The population correlation measures the strength and direction of the linear relationship between two random variables \(X\) and \(Y\), and is defined as

\[\begin{equation} \rho(X,Y) =\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}, \tag{1.1} \end{equation}\]

and can be estimated by replacing each of \(\mathrm{Cov}(X,Y)\), \(\mathrm{Var}(X)\) and \(\mathrm{Var}(Y)\) by its unbiased estimator (the common factors of \(1/(n-1)\) cancel) to give

\[\begin{equation} r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2\sum_{i=1}^n(y_i-\bar{y})^2}}, \tag{1.2} \end{equation}\]

the sample correlation coefficient (\(-1 \leq r \leq 1\)).
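As a quick check of equation (1.2), the sketch below computes \(r\) directly from the sums \(S_{xy}\), \(S_{xx}\) and \(S_{yy}\) and compares it with R's built-in cor function. The data are simulated purely for illustration and are not part of the lab; the variable names (x, y, r_manual, r_builtin) are assumptions made for this example.

```r
# Simulated illustrative data (an assumption; not the lab's data set)
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Sums of squares and cross-products from equation (1.2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)

# Sample correlation coefficient computed by hand
r_manual <- Sxy / sqrt(Sxx * Syy)

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

# The two values should agree up to floating-point error
all.equal(r_manual, r_builtin)
```

The same value is obtained from cov(x, y) / sqrt(var(x) * var(y)), because the \(1/(n-1)\) factors in the numerator and denominator cancel.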

Given a sample of data, we can assess whether an observed sample correlation provides statistically significant evidence of a correlation between the variables in the wider population. To do this we perform a hypothesis test.
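One way such a test might be run in R is with the cor.test function, which by default tests the null hypothesis that the population Pearson correlation is zero against a two-sided alternative. The sketch below uses simulated data again (an assumption for illustration only), and the object name test is hypothetical.

```r
# Simulated illustrative data (an assumption; not the lab's data set)
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)

# Pearson correlation test; defaults: H0 is rho = 0, two-sided alternative
test <- cor.test(x, y)

test$estimate  # sample correlation coefficient r
test$p.value   # p-value for the test of rho = 0
test$conf.int  # 95% confidence interval for rho
```

A small p-value indicates that the sample provides evidence against a population correlation of zero.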