Lab 5 - Exploring relationships and understanding correlation
1 Welcome to Lab 5
Intended Learning Outcomes:
Produce scatterplots of report quality;
Calculate and interpret the sample correlation coefficient;
Fit linear regression models by using
lm
;Interpret estimates of model parameters.
1.1 Correlation coefficient
In the lectures we learned how to assess the strength of a linear relationship between random variables using the correlation coefficient. The population correlation is a measure of the magnitude of the strength of the relationship between two random variables \(X\) and \(Y\), and is defined as
\[\begin{equation} \rho(X,Y) =\frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\mathrm{Var}(Y)}}, \tag{1.1} \end{equation}\]
and can be estimated by replacing each of \(\mathrm{Cov}(X,Y)\), \(\mathrm{Var}(X)\) and \(\mathrm{Var}(Y)\) by their unbiased estimators to give
\[\begin{equation} r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n(x_i-\bar{x})^2\sum_{i=1}^n(y_i-\bar{y})^2}}, \tag{1.2} \end{equation}\]
the sample correlation coefficient (\(-1 \leq r \leq 1\)).
Given a sample of data, we can assess the statistical significance of the observed correlations between variables in the wider population. To do this we perform a hypothesis test.