4 Exercise 2: Chi-squared test for association

The sinking of the Titanic occurred on the 15th of April 1912. The data frame TITANIC3 in the PASWR2 package contains information on passenger class, sex, and survival, as well as several other variables.

library(PASWR2)  # provides the TITANIC3 data frame
TITANIC3 <- TITANIC3  # copy the packaged data into the workspace

4.1 Contingency tables

To analyse the associations between these variables, we first build contingency tables.

For example, to build a contingency table of passenger class versus survival, we can use the code:

TABLE1 <- xtabs(~ pclass + survived, data = TITANIC3)
TABLE1
##       survived
## pclass   0   1
##    1st 123 200
##    2nd 158 119
##    3rd 528 181
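
Raw counts can be hard to compare because the classes differ in size, so it may help to convert each row to proportions. The sketch below is an illustration rather than part of the exercise: it rebuilds TABLE1 from the counts printed above (so it runs without PASWR2) and uses prop.table() with margin = 1. The name survival_rates is an assumption introduced here.

```r
# Rebuild TABLE1 from the counts printed above (assumption: typed in by hand
# so that this sketch does not depend on the PASWR2 package).
TABLE1 <- as.table(matrix(
  c(123, 200,
    158, 119,
    528, 181),
  nrow = 3, byrow = TRUE,
  dimnames = list(pclass = c("1st", "2nd", "3rd"), survived = c("0", "1"))
))

# margin = 1 divides each count by its row total, giving the proportion of
# each class that died (column "0") or survived (column "1").
survival_rates <- prop.table(TABLE1, margin = 1)
round(survival_rates, 3)
```

On these counts, roughly 62% of first-class, 43% of second-class, and 26% of third-class passengers survived, which already hints at the association tested in the next section.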

Question Using xtabs() and other tools you have learnt such as subset() (or [] and $), subset your data and create contingency tables of:

  1. male passengers’ class versus survival, and
  2. female passengers’ class versus survival.

At the time of data collection in 1912, passenger gender was recorded as 'sex' and separated only into 'male' or 'female'.

TABLE2 <- xtabs(~ pclass + survived, data = subset(TITANIC3, sex == "male"))
TABLE3 <- xtabs(~ pclass + survived, data = subset(TITANIC3, sex == "female"))

# OR

MEN <- TITANIC3[TITANIC3$sex=="male",]
TABLE2 <- xtabs(~ pclass + survived, data = MEN)

WOMEN <- TITANIC3[TITANIC3$sex=="female",]
TABLE3 <- xtabs(~ pclass + survived, data = WOMEN)

# To view the tables
TABLE2
##       survived
## pclass   0   1
##    1st 118  61
##    2nd 146  25
##    3rd 418  75
TABLE3
##       survived
## pclass   0   1
##    1st   5 139
##    2nd  12  94
##    3rd 110 106
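
A quick sanity check on the subsetting is that the male and female tables should add up, cell by cell, to the table for all passengers. The sketch below assumes the counts printed above and rebuilds the three tables by hand; make_table() and split_matches_total are hypothetical names introduced for this illustration, not part of the worksheet.

```r
# Hypothetical helper (not from the worksheet): builds a 3 x 2 table of
# class-by-survival counts from a vector given row by row.
make_table <- function(counts) {
  as.table(matrix(
    counts, nrow = 3, byrow = TRUE,
    dimnames = list(pclass = c("1st", "2nd", "3rd"), survived = c("0", "1"))
  ))
}

TABLE1 <- make_table(c(123, 200, 158, 119, 528, 181))  # all passengers
TABLE2 <- make_table(c(118,  61, 146,  25, 418,  75))  # men
TABLE3 <- make_table(c(  5, 139,  12,  94, 110, 106))  # women

# Every passenger is recorded as male or female, so the two subsets
# should sum to the overall table in every cell.
split_matches_total <- all(TABLE2 + TABLE3 == TABLE1)
split_matches_total
```

If the check fails on your own tables, the subsetting condition is the first thing to inspect.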

4.2 Hypothesis testing

To test whether there is an association between class and survival, either for all passengers or for men and women separately, we can carry out chi-squared tests of association.

This can be done using:

TEST1 <- chisq.test(TABLE1)
TEST1
## 
##  Pearson's Chi-squared test
## 
## data:  TABLE1
## X-squared = 127.86, df = 2, p-value < 2.2e-16

Because the \(p\)-value is less than 0.05, we reject the null hypothesis of independence at the 5% significance level: there is evidence of a significant association between class and survival for passengers generally.
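
To see where the test statistic comes from, it can be reproduced by hand. The sketch below assumes the counts shown in TABLE1 and stores them in a plain matrix called observed; the names expected, x_squared, df, and p_value are introduced here for illustration. Under independence, each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells.

```r
# Counts copied from TABLE1 above (assumption: entered by hand so that the
# sketch is self-contained).
observed <- matrix(
  c(123, 200,
    158, 119,
    528, 181),
  nrow = 3, byrow = TRUE,
  dimnames = list(pclass = c("1st", "2nd", "3rd"), survived = c("0", "1"))
)

# Expected counts under independence: row total * column total / grand total.
n <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Pearson's chi-squared statistic and its degrees of freedom.
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution with df degrees
# of freedom.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

c(x_squared = x_squared, df = df, p_value = p_value)
```

The statistic and degrees of freedom should agree with the chisq.test() output above (X-squared = 127.86, df = 2). No continuity correction is involved, since chisq.test() applies one only to 2 × 2 tables.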

Question Test whether there is an association between class and survival separately for men and for women on the Titanic:

TEST2 <- chisq.test(TABLE2)
TEST2
## 
##  Pearson's Chi-squared test
## 
## data:  TABLE2
## X-squared = 33.033, df = 2, p-value = 6.714e-08
TEST3 <- chisq.test(TABLE3)
TEST3
## 
##  Pearson's Chi-squared test
## 
## data:  TABLE3
## X-squared = 115.7, df = 2, p-value < 2.2e-16
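
Both \(p\)-values are below 0.05, so the same conclusion holds within each sex. As a follow-up, the object returned by chisq.test() also stores the expected counts and standardised residuals, which help check the large-sample approximation and show which cells drive the result. The sketch below is an illustration under the assumption that the tables hold the counts printed in Section 4.1; make_table() is a hypothetical helper introduced here.

```r
# Hypothetical helper (not from the worksheet): builds a 3 x 2 table of
# class-by-survival counts from a vector given row by row.
make_table <- function(counts) {
  as.table(matrix(
    counts, nrow = 3, byrow = TRUE,
    dimnames = list(pclass = c("1st", "2nd", "3rd"), survived = c("0", "1"))
  ))
}

TABLE2 <- make_table(c(118,  61, 146,  25, 418,  75))  # men
TABLE3 <- make_table(c(  5, 139,  12,  94, 110, 106))  # women

TEST2 <- chisq.test(TABLE2)
TEST3 <- chisq.test(TABLE3)

# A common rule of thumb asks for expected counts of at least 5 in every
# cell for the chi-squared approximation to be reasonable.
TEST2$expected
TEST3$expected

# Standardised residuals: large positive values mark cells with more
# passengers than independence predicts, large negative values fewer.
round(TEST2$stdres, 2)
round(TEST3$stdres, 2)
```

Note that the observed count of 5 for first-class women who died is not a problem in itself: the rule of thumb concerns the expected counts, not the observed ones.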