4 Exercise 2: Chi-squared test for association
The Titanic sank on 15 April 1912. The data frame TITANIC3 in the PASWR2 package contains information on passenger class, sex, and survival, as well as several other variables.
4.1 Contingency tables
To analyse the associations between these variables, we first build contingency tables.
For example, a contingency table of passenger class versus survival can be created with the code:
TABLE1 <- xtabs(~ pclass + survived, data = TITANIC3)
TABLE1
##       survived
## pclass   0   1
##    1st 123 200
##    2nd 158 119
##    3rd 528 181
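Raw counts can be hard to compare because the classes differ in size. The sketch below is one way to turn the counts above into survival proportions per class. It re-enters the counts by hand so that it runs without the PASWR2 package; the names `counts` and `survival_rates` are illustrative and not part of the exercise.

```r
# Counts copied from the class-versus-survival table above
# (rows: passenger class, columns: survived 0 = no, 1 = yes).
counts <- as.table(matrix(c(123, 200,
                            158, 119,
                            528, 181),
                          nrow = 3, byrow = TRUE,
                          dimnames = list(pclass = c("1st", "2nd", "3rd"),
                                          survived = c("0", "1"))))

# Row proportions: each row sums to 1, so column "1" is the
# proportion of that class who survived.
survival_rates <- prop.table(counts, margin = 1)
round(survival_rates, 3)

# Row and column totals alongside the counts.
addmargins(counts)
```

With the real data, the same calls can be applied directly to a table produced by xtabs().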
Question: Using xtabs() and other tools you have learnt, such as subset() (or [] and $), subset your data and create contingency tables of:
- male passengers’ class versus survival, and
- female passengers’ class versus survival.
At the time of data collection in 1912, passenger gender was recorded as 'sex' and separated only into 'male' or 'female'.
TABLE2 <- xtabs(~ pclass + survived, data = subset(TITANIC3, sex == "male"))
TABLE3 <- xtabs(~ pclass + survived, data = subset(TITANIC3, sex == "female"))
# OR
MEN <- TITANIC3[TITANIC3$sex == "male", ]
TABLE2 <- xtabs(~ pclass + survived, data = MEN)
WOMEN <- TITANIC3[TITANIC3$sex == "female", ]
TABLE3 <- xtabs(~ pclass + survived, data = WOMEN)
# To view the tables
TABLE2
##       survived
## pclass   0   1
##    1st 118  61
##    2nd 146  25
##    3rd 418  75
TABLE3
##       survived
## pclass   0   1
##    1st   5 139
##    2nd  12  94
##    3rd 110 106
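A quick sanity check after subsetting is to confirm that the male and female tables add back up to the table for all passengers. The sketch below does this with the counts printed in this section, typed in by hand so that it runs on its own; `make_table`, `all_passengers`, `men` and `women` are hypothetical names used only for this illustration.

```r
# Helper (illustrative): build a 3 x 2 class-by-survival matrix from six counts.
make_table <- function(values) {
  matrix(values, nrow = 3, byrow = TRUE,
         dimnames = list(pclass = c("1st", "2nd", "3rd"),
                         survived = c("0", "1")))
}

# Counts copied from the printed tables in this section.
all_passengers <- make_table(c(123, 200, 158, 119, 528, 181))
men            <- make_table(c(118,  61, 146,  25, 418,  75))
women          <- make_table(c(  5, 139,  12,  94, 110, 106))

# Every cell of men + women should equal the corresponding overall cell.
tables_agree <- all(men + women == all_passengers)
tables_agree
```

If the check fails on real data, a likely cause is missing or unexpected values in the subsetting variable.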
4.2 Hypothesis testing
To test whether there is an association between class and survival, either for all passengers or for men and women separately, we can carry out chi-squared tests of association.
For all passengers, this can be done using:
chisq.test(TABLE1)
##
## Pearson's Chi-squared test
##
## data: TABLE1
## X-squared = 127.86, df = 2, p-value < 2.2e-16
Since the \(p\)-value is less than 0.05, we reject the null hypothesis of no association: there is evidence of a significant association between passenger class and survival for passengers generally.
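To see where the test statistic comes from, the sketch below recomputes it by hand from the class-versus-survival counts for all passengers given in section 4.1. It compares the observed counts with the counts expected if class and survival were independent. The counts are typed in directly so that the code runs without the PASWR2 package, and the object names (`observed`, `expected`, `x_squared`, `p_value`) are illustrative.

```r
# Observed counts for all passengers (rows: class, columns: survived 0/1).
observed <- matrix(c(123, 200,
                     158, 119,
                     528, 181),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(pclass = c("1st", "2nd", "3rd"),
                                   survived = c("0", "1")))

# Expected counts under independence: (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson's statistic: sum over all cells of (observed - expected)^2 / expected.
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

round(x_squared, 2)  # agrees with the X-squared value reported above
df
p_value

# The built-in test gives the same statistic for this table.
builtin <- chisq.test(observed)
```

The hand calculation also makes it easy to inspect `expected` and check that all expected counts are reasonably large, which is a common rule of thumb for the chi-squared approximation.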
Question: Test whether there is an association between class and survival for men, and for women, on the Titanic: