In this workshop, we illustrate both parametric and non-parametric techniques for comparing two samples. Our objective in this workshop will be to assess whether or not two samples can be regarded as being from the same population. We perform the parametric version first, which requires normality assumptions of the data, and then move on to the non-parametric form which requires no normality assumptions.
To start, create a new R script by clicking on the corresponding menu button (in the top left); and save it somewhere in your computer. You can now write all your code into this file, and then execute it in R using the menu button Run
or by simple pressing ctrl
\(+\) enter
.
\(~\)
Eriksen, Björnstad an Götestam (1986) studied a social skills training program for alcoholics. Twenty-three alcohol-dependent male inpatients at an alcohol treatment centre were randomly assigned to two groups. The 12 control group patients were given a traditional treatment programme. The 11 treatment group patients were given the traditional treatment, plus a class in social skills training (“SST”). The patients were monitored for one year at 2-week intervals, and their total alcohol intake over the year was recorded (in cl pure alcohol).
A: Control | 1042 | 1617 | 1180 | 973 | 1552 | 1251 | 1151 | 1511 | 728 | 1079 | 951 | 1391 |
B: SST | 874 | 389 | 612 | 798 | 1152 | 893 | 541 | 741 | 1064 | 862 | 213 |
c
function to enter these data into R as two vectors called A
for the control group, and B
for the treatment group.A
and B
using a side-by-side boxplot
. What conclusions can you draw about the two groups?hist
) of both samples. On the basis of these plots, do the samples look approximately Normally distributed?
## SOLUTION
# (a)
A <- c(1042, 1617, 1180, 973, 1552, 1251, 1151, 1511, 728, 1079, 951, 1391)
B <- c(874, 389, 612, 798, 1152, 893, 541, 741, 1064, 862, 213)
# (b)
summary(A)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 728 1025 1166 1202 1421 1617
summary(B)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 213.0 576.5 798.0 739.9 883.5 1152.0
# (c)
boxplot(A,B)
# (d)
par(mfrow=c(1,2))
hist(A)
hist(B)
A better graphical way to assess whether the data is distributed normally is to draw a normal quantile plot (or quantile-quantile, or QQ plot). We can do this in R using the qqnorm
function, where to draw the Normal quantile plot for a vector of data x
we use the command
qqnorm(x)
With this technique, the quantiles of the data (i.e. the ordered data values) are plotted against the quantiles which would be expected of a matching normal distribution of a normal distribution. If the data are normally distributed, then the points of the QQ plot will should lie on an approximately straight line. Deviations from a straight line suggest departures from the normal distribution.
qqnorm
. Do the samples look approximately Normally distributed? Do your conclusions agree with those made on the basis of the histograms?## SOLUTION
# (a)
par(mfrow=c(1,2))
hist(A)
qqnorm(A)
par(mfrow=c(1,2))
hist(B)
qqnorm(B)
# Kolmogorov-Smirnov test
ks.test(A,"pnorm", mean=mean(A), sd=sd(A))
##
## One-sample Kolmogorov-Smirnov test
##
## data: A
## D = 0.12213, p-value = 0.9842
## alternative hypothesis: two-sided
ks.test(B,"pnorm", mean=mean(B), sd=sd(B))
##
## One-sample Kolmogorov-Smirnov test
##
## data: B
## D = 0.13791, p-value = 0.9666
## alternative hypothesis: two-sided
# Shapiro-Wilk test
shapiro.test(A)
##
## Shapiro-Wilk normality test
##
## data: A
## W = 0.96414, p-value = 0.8409
shapiro.test(B)
##
## Shapiro-Wilk normality test
##
## data: B
## W = 0.96798, p-value = 0.8654
# Since all p-values are large, we do not reject the null hypothesis that
# the data are normally distributed.
\(~\)
We will use the independent sample \(t\)-test to compare the two samples (ref to Lecture 2).
The t.test
function, from the stats
package, produces a variety of t-tests. Unlike most statistical packages, the default assumes unequal variance of the two samples, however we can override this with the var.equal
argument.
# independent 2-sample t-test assuming equal variances and a pooled variance estimate
t.test(x,y,var.equal=TRUE)
# independent 2-sample t-test unequal variance and the Welsh df modification.
t.test(x,y)
t.test(x,y,var.equal=FALSE)
You can use the alternative="less"
or alternative="greater"
options to specify a one-sided test.
A paired-sample \(t\)-test is obtained from the same function by specifying paired=TRUE
and ensuring both samples are the same size:
# paired t-test
t.test(x,y,paired=TRUE)
Finally, a one-sample \(t\)-test can be performed to test the population mean \(\mu\) of the sample against a specific value by supplying only a single sample and a value for mu
:
# one sample t-test
t.test(y,mu=3) # Ho: mu=3
R Help: t.test
\(~\)
t.test
to compare these two samples A
and B
. Use the optional argument var.equal
to perform an equal-variance test (TRUE
).t
.# (a)
library(stats)
t.test(A,B, alternative = "two.sided", var.equal=TRUE)
##
## Two Sample t-test
##
## data: A and B
## t = 4.0076, df = 21, p-value = 0.0006381
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 222.3822 702.1329
## sample estimates:
## mean of x mean of y
## 1202.1667 739.9091
# (b)
t<-t.test(A,B, alternative = "two.sided", var.equal=TRUE)$statistic
t
## t
## 4.00757
# (c) we can also extract the p-value from the output summary
p<-t.test(A,B, alternative = "two.sided", var.equal=TRUE)$p.value
p #Since the p-value is less than 0.05 we reject H0
## [1] 0.0006380701
# (d)
t.test(A,B, alternative = "two.sided", var.equal=TRUE, conf.level = 0.95)$conf.int
## [1] 222.3822 702.1329
## attr(,"conf.level")
## [1] 0.95
\(~\)
We can also perform the significance test by comparing the value of the \(t\) test statistic to the critical value.
df
[qt]
(r_5_basicstats.html#dist) function to find the critical \(t\) value for the test of \(H_0: \mu_A=\mu_B\) against \(H_1: \mu_A\neq\mu_B\) at the 5% level of significance when we have df
degrees of freedom? Check you get the same value as below.
# (a)
n <- length(A)
m <- length(B)
df <- n+m-2
df
## [1] 21
# (b)
tcrit <- qt(0.975, df=df)
tcrit
## [1] 2.079614
\(~\)
t
, and hence perform the significance test.# (a)
t<-t.test(A,B, alternative = "two.sided", var.equal=TRUE)$statistic
t
## t
## 4.00757
t<-unname(t)
t
## [1] 4.00757
tcrit <- qt(0.975, df=df)
tcrit
## [1] 2.079614
# Since the absolute value of the test statistic is greater than the critical value, $t_{1-\alpha/2},df$, tcrit, we reject H0.
# (b)
pvalue <- 2*(1-pt(t,df))
pvalue
## [1] 0.0006380701
# (c)
# Strong evidence here to reject H_0 at 5% level, sample sizes are rather small
# (d)
# From the boxplots, this assumption does not look unreasonable though perhaps B is slightly less spread than A.
\(~\)
t.test
and the optional argument var.equal=FALSE
to perform an unequal-variance test.# (a)
t.test(A,B, alternative = "two.sided", var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: A and B
## t = 4.0013, df = 20.674, p-value = 0.0006644
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 221.7733 702.7418
## sample estimates:
## mean of x mean of y
## 1202.1667 739.9091
# (b)
## Results don't change much, looks like the equal-variance assumption was safe
\(~\)
Nonparametric methods do not assume any distribution, and try to make only the weakest possible assumptions about the distributions that describe our data and are often used in such situations. The nonparametric equivalent to the independent sample \(t\)-test is the (Wilcoxon) rank-sum test. Nonparametric tests do not assume knowledge of the distributions of the two groups, and are often based on the data ranks with tests comparing the data ranks of the two groups.
Recall that the rank-sum test statistic is the sum of the ranks of the first sample (of size \(n\)) in the combined sample (of size \(n+m\) where \(m\) is the size of the second sample), that is \[ W = \sum_{i=1}^n r_i. \]
The exact critical value for the rank-sum test \(W\) is a function of the sample sizes, \(n\) and \(m\), and the significance level, \(\alpha\) (see the statistical tables on Ultra). When the samples are both large, the distribution of the rank-sum statistic is approximately Normal, which allows us to use Normal tables to construct confidence intervals and test hypotheses.
Note: the reported value of the rank-sum test statistic using the R function wilcox.test
is \(W-\frac{n(n+1)}{2}\).
c
function to combine the two samples together into a larger vector composed of the elements of the A group followed by the B group.rank
function on the combined sample to rank all the observations; call this rks
.rks
should now be the ranks \(r_i\) for the A group in the combined sample. Extract these from the larger vector and save them as Aranks
.wilcox.test
function function and try it on A
and B
. The optional argument exact=TRUE
will compute the test exactly, whereas exact=FALSE
will use a Normal approximation. Do the results agree with your t-test results?
# (a)
AB <- c(A,B)
# (b)
rks <- rank(AB)
# (c)
## statistic for group A
Aranks <- rks[1:n]
# (d)
W <- sum(Aranks)
W
## [1] 195
# (e)
wilcox.test(A,B,exact=TRUE)
##
## Wilcoxon rank sum exact test
##
## data: A and B
## W = 117, p-value = 0.0009807
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(A,B,exact=TRUE)$statistic
## W
## 117
wilcox.test(A,B,exact=FALSE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: A and B
## W = 117, p-value = 0.001883
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(A,B,exact=FALSE)$statistic
## W
## 117
# we get the same value of the test statistic as in (d).
# since the p-values are less than 0.05 we reject H0,
# so we have the same conclusion as with the t-test.
\(~\)