9 Two-sample Tests
In the last chapter, we studied the basic concepts of hypothesis testing and discussed tests about the value of the mean of a single population. What if we want to compare the means of two populations? Such tests are called two-sample tests. They have a similar form to the tests we have seen so far, and they belong to a family generally called Wald-type tests.
9.1 Independent Two-Sample Tests
9.1.1 Two-sample Test
Suppose we have two populations \(X\) and \(Y\) \[X\sim(\mu_1,\sigma_1^2) \hspace{5mm} \text{and} \hspace{5mm} Y \sim(\mu_2, \sigma_2^2)\] and we want to compare \(\mu_1\) and \(\mu_2\).
Hence, the null hypothesis is \[H_0: \mu_1 = \mu_2 \hspace{5mm} \text{or} \hspace{5mm} H_0: \mu_1 - \mu_2 = 0\]
For the alternative hypotheses, we have three choices:
\(H_1: \mu_1 \ne \mu_2\) or equivalently \(H_1: \mu_1 - \mu_2 \ne 0\)
\(H_1: \mu_1 > \mu_2\) or equivalently \(H_1: \mu_1 - \mu_2 > 0\)
\(H_1: \mu_1 < \mu_2\) or equivalently \(H_1: \mu_1 - \mu_2 < 0\)
Example 9.1 In the rent example, I want to test whether undergraduate students are paying more than graduate students for rent.
If I let \(\mu_1\) be the mean rent that undergraduate students are paying and \(\mu_2\) be the mean rent that graduate students are paying, then my hypotheses are
\[H_0: \mu_1 = \mu_2 \hspace{5mm} \text{vs.} \hspace{5mm} H_1: \mu_1 > \mu_2\]
We may also be interested in testing a specific value of the difference of the two means \[H_0: \mu_1 - \mu_2 = \mu_0\] In the above, we were only considering the case \(\mu_0 = 0\), but we can choose any other value \(\mu_0\) for our test.
9.1.2 Distribution of the Difference of Two Normal Means
Suppose we have \(n_1\) sample data from \(X\) \[X_1, X_2, ..., X_{n_1} \overset{\text{iid}}{\sim} \mathcal{N}(\mu_1, \sigma_1^2)\] and \(n_2\) sample data from \(Y\) \[Y_1, Y_2, ..., Y_{n_2} \overset{\text{iid}}{\sim} \mathcal{N}(\mu_2, \sigma_2^2)\]
From the sample data, we can use the sample means \(\bar{X}_{n_1}\) and \(\bar{Y}_{n_2}\) to approximate \(\mu_1\) and \(\mu_2\). Evidence against the null hypothesis \(H_0: \mu_1 - \mu_2 = \mu_0\) would be that the difference \(\bar{X}_{n_1}-\bar{Y}_{n_2}\) is very different from \(\mu_0\). Therefore, in order to construct the test, we need to find the distribution of the difference of the two sample means \(\bar{X}_{n_1}-\bar{Y}_{n_2}\).
Now, suppose that data from \(X\) and from \(Y\) are sampled independently from each other, i.e., the collection of \(X_i\) does not depend on the collection of \(Y_j\) and vice versa, for all \(i = 1, 2, ..., n_1\) and \(j = 1, 2, ..., n_2\). Then, the expectation and variance formulas for the sample mean (Chapter 6) and the linearity property of the normal distribution (Chapter 5) give us \[\begin{align*} \bar{X}_{n_1} = \frac{1}{n_1}\left(X_1 + ... + X_{n_1}\right) & \sim \mathcal{N}\left(\mu_1, \frac{\sigma_1^2}{n_1}\right) \\ \bar{Y}_{n_2} = \frac{1}{n_2}\left(Y_1 + ... + Y_{n_2}\right) & \sim \mathcal{N}\left(\mu_2, \frac{\sigma_2^2}{n_2}\right) \\ \end{align*}\]
We can further derive the distribution of the difference of the two means by applying again the linearity property of normal distribution on \(\bar{X}_{n_1}\) and \(\bar{Y}_{n_2}\) and obtain
\[\bar{X}_{n_1} - \bar{Y}_{n_2} \sim \mathcal{N}\left(\mu_1 - \mu_2, \frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}\right)\]
Now, we can use this knowledge to construct tests for \(\mu_1 - \mu_2\).
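If you want to check this result numerically, here is a small R sketch (the parameter and sample-size values are made up purely for illustration) that simulates many pairs of samples and compares the empirical mean and variance of \(\bar{X}_{n_1} - \bar{Y}_{n_2}\) with the formula above.

```r
# Simulation check of the distribution of Xbar - Ybar (illustrative values only)
set.seed(1)
n1 <- 30; n2 <- 20
mu1 <- 5; sigma1 <- 2      # assumed population parameters for the sketch
mu2 <- 3; sigma2 <- 4

diffs <- replicate(10000, {
  mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2))
})

mean(diffs)                       # should be close to mu1 - mu2 = 2
var(diffs)                        # should be close to sigma1^2/n1 + sigma2^2/n2
sigma1^2 / n1 + sigma2^2 / n2     # theoretical variance
```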
9.1.3 When \(\sigma_1\) and \(\sigma_2\) are Known
Similar to the one-sample case in Section 8.4, when \(\sigma_1\) and \(\sigma_2\) are known, we can construct the two-sample test using the following pivotal quantity
\[\frac{(\bar{X}_{n_1} - \bar{Y}_{n_2}) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}} \sim \mathcal{N}(0,1)\]
Thus, to test \(H_0: \mu_1 - \mu_2 = \mu_0\), we use the test statistic
\[\frac{(\bar{X}_{n_1} - \bar{Y}_{n_2})-\mu_0}{\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}} \hspace{3mm} \overset{H_0: \mu_1 - \mu_2 = \mu_0}{\sim} \hspace{3mm} \mathcal{N}(0,1)\]
When we are testing \(H_0: \mu_1 = \mu_2\), i.e., \(\mu_1 - \mu_2 = 0\), we replace \(\mu_0\) in the above formula by 0. The procedure is the same as for the one-sample case; we are just using a different test statistic.
Example 9.2 Suppose we want to know whether undergraduate students are paying more than graduate students for rent. We randomly sample \(100\) undergraduate students and get an average rent of \(\$860\). Another random sample of \(70\) graduate students is collected and the average rent found is \(\$800\). It is known that the standard deviation of rent for undergraduate students is \(\$120\) and that for graduate students is \(\$100\). Conduct a test at the 5% significance level.
Solution: Let \(\mu_1\) be the average rent for undergraduate students and \(\mu_2\) be the average rent for graduate students.
Step 1: Hypotheses: \[H_0: \mu_1 = \mu_2 \hspace{5mm} \text{vs.} \hspace{5mm} H_1: \mu_1 > \mu_2\]
Step 2: From step 1, \(\mu_0 = 0\). The test statistic is \[t = \frac{\bar{x}_{n_1} - \bar{y}_{n_2}}{\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}} = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}}} = \frac{860 - 800}{\sqrt{\frac{120^2}{100} + \frac{100^2}{70}}} = 3.54\]
Step 3: This is a one-sided upper-tail test at \(\alpha = 0.05\) and known \(\sigma_1\) and \(\sigma_2\). So the critical value is \[c = z_{1-\alpha} = z_{0.95} = 1.645\]
Step 4: Since \(t = 3.54 > c = 1.645\) and this is a one-sided upper-tail test, we reject the null hypothesis at \(5\%\) level of significance. We conclude that at \(5\%\) significance level, the evidence in the data supports that undergraduate students are paying more than graduate students.
Notes: In Example 9.2, we see that only Step 2 changes; the other steps remain the same as when we are doing a one-sample test.
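If you have R available, the calculation in Example 9.2 can be reproduced from the summary statistics alone; the sketch below uses only the numbers given in the example (the variable names are ours).

```r
# Two-sample z test with known sigmas (Example 9.2)
xbar <- 860; ybar <- 800
sigma1 <- 120; sigma2 <- 100
n1 <- 100; n2 <- 70

t_obs <- (xbar - ybar) / sqrt(sigma1^2 / n1 + sigma2^2 / n2)
t_obs                    # about 3.54
qnorm(0.95)              # critical value 1.645 for the upper-tail test
1 - pnorm(t_obs)         # p-value, far below 0.05
```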
Exercise 9.1 Conduct the same test as Example 9.2 if I let \(\mu_0 = 100\) and I want to test \(H_0: \mu_1-\mu_2 = \mu_0\) vs \(H_1: \mu_1-\mu_2 > \mu_0\).
9.1.4 When \(\sigma_1\) and \(\sigma_2\) are Unknown and \(\sigma_1 = \sigma_2 = \sigma\)
Consider the case where we do not know \(\sigma_1\) and \(\sigma_2\) but somehow we know that the two populations have the same standard deviation.
Suppose \(S_1\) and \(S_2\) are the sample standard deviations, i.e., \[S_1^2 = \frac{1}{n_1-1}\sum_{i=1}^{n_1}(X_i - \bar{X}_{n_1})^2 \hspace{5mm} \text{and} \hspace{5mm} S_2^2 = \frac{1}{n_2-1}\sum_{i=1}^{n_2}(Y_i - \bar{Y}_{n_2})^2\]
We can try to “pool” \(S_1\) and \(S_2\) to get a single estimate for \(\sigma_1 = \sigma_2 = \sigma\) using the formula \[S_p = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}}\] This is called the pooled standard deviation.
Then, if our assumptions hold, it can be proved that \[\frac{(\bar{X}_{n_1} - \bar{Y}_{n_2})-\mu_0}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \hspace{3mm} \overset{H_0:\mu_1-\mu_2=\mu_0}{\sim} \hspace{3mm} t(n_1+n_2-2)\]
We can now use this test statistic to conduct tests for \(\mu_1-\mu_2\). Everything is the same as in the case where \(\sigma_1\) and \(\sigma_2\) are known; only the denominator of the test statistic changes, and the standard normal distribution becomes the \(t\) distribution with \(n_1+n_2-2\) degrees of freedom.
Example 9.3 Suppose we need to compare the performance of two call centers in terms of average call length and find out whether the difference is statistically significant or just due to chance.
We randomly select 30 calls from center 1 and 20 calls from center 2. The sample means are 122 seconds and 135 seconds respectively, and the sample standard deviations are 15 seconds and 20 seconds respectively. We know that the standard deviations of call lengths at the two centers are the same.
Conduct a hypothesis test at \(5\%\) level of significance.
Solution:
Step 1: Let \(\mu_1\) be the mean length of calls of center 1 and \(\mu_2\) be the mean length of calls of center 2. The hypotheses are \[H_0: \mu_1 = \mu_2 \hspace{5mm} \text{vs.} \hspace{5mm} H_1: \mu_1 \ne \mu_2\]
Step 2: From step 1, \(\mu_0 = 0\). The test statistic can be calculated as follows
\[s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(30-1)\times 15^2 + (20-1)\times 20^2}{30+20-2}} = 17.1543\] \[t = \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{122-135}{17.1543 \times \sqrt{\frac{1}{30} + \frac{1}{20}}} = -2.6252\]
Step 3: \(\alpha = 0.05\). Our test is two-sided, so we need the critical value from the \(t\) distribution \[c = t_{1-\alpha/2, n_1+n_2-2} = t_{0.975, 48} \approx t_{0.975, 40} = 2.0211\] Here we use \(df = 40\) in the \(t\)-table because it is the available value closest to \(df=48\).
Step 4: Because \(|t| = 2.6252 > c = 2.0211\), we reject the null hypothesis at \(5\%\) level of significance. We conclude that there is sufficient evidence in our data that the lengths of calls of the two centers are different.
Notes: When you cannot find the degrees of freedom in the \(t\)-table:
You can use the degrees of freedom closest to the one you want.
- For example, if you want \(df = 39\), you can use \(df = 40\) in the table.
If the degrees of freedom in the table are all far from yours, choose the smaller one, because that gives a higher critical value and the test will be more conservative.
- For example, if you want \(df = 48\), which is far from both \(40\) and \(60\) in the \(t\)-table, choose \(df = 40\) because it is more conservative.
You can always calculate the exact critical value in R using the function `qt()`. For example, `qt(0.975, df = 48)` outputs `2.010635`, which means \(P(X < 2.010635) = 0.975\) if \(X\) follows a \(t\)-distribution with 48 degrees of freedom.
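In fact, the whole calculation in Example 9.3 can be reproduced in R from the summary statistics (a sketch; the variable names are ours, and `t.test()` itself would require the raw call lengths, which we do not have).

```r
# Pooled two-sample t test from summary statistics (Example 9.3)
xbar <- 122; ybar <- 135
s1 <- 15; s2 <- 20
n1 <- 30; n2 <- 20

sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
t_obs <- (xbar - ybar) / (sp * sqrt(1/n1 + 1/n2))
c(sp = sp, t = t_obs)             # 17.1543 and -2.6252

df <- n1 + n2 - 2
qt(0.975, df = df)                # exact critical value 2.0106
2 * pt(-abs(t_obs), df = df)      # two-sided p-value
```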
9.1.5 When \(\sigma_1\) and \(\sigma_2\) are Unknown and \(\sigma_1 \ne \sigma_2\)
When you do not know whether \(\sigma_1\) and \(\sigma_2\) are the same, you can use the Welch procedure to approximate the distribution of the difference of the sample means of the two normal samples.
In particular, if we are testing \(H_0: \mu_1 - \mu_2 = \mu_0\), use the observed test statistic \[t = \frac{(\bar{x} - \bar{y}) - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\] and the \(t\)-distribution with degrees of freedom \[df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1-1)} + \frac{s_2^4}{n_2^2(n_2-1)}}\]
Example 9.4 Let us revisit Example 9.3 and suppose we do not know that the two standard deviations are equal. Then
Step 1: Same as in Example 9.3.
Step 2: The test statistic is
\[t = \frac{(\bar{x} - \bar{y}) - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{122-135}{\sqrt{\frac{15^2}{30} + \frac{20^2}{20}}} = -2.4790\]
Step 3: \(\alpha = 0.05\) and the test is two-sided, so the quantile we need is the \(1-0.05/2 = 0.975\)-quantile. The degrees of freedom we need to use are
\[df = \frac{\left(\frac{15^2}{30} + \frac{20^2}{20}\right)^2}{\frac{15^4}{30^2(30-1)} + \frac{20^4}{20^2(20-1)}} = 32.89 \approx 33\]
The critical value is then
\[c = t_{0.975, 33} \approx t_{0.975, 30} = 2.0423\]
Step 4: Because \(|t| = 2.4790 > c = 2.0423\), we reject the null hypothesis at the \(5\%\) level of significance and conclude that there is sufficient evidence in our data that the lengths of calls of the two centers are different.
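As a check, the Welch calculation in Example 9.4 can be reproduced in R from the same summary statistics (a sketch; variable names are ours).

```r
# Welch two-sample t test from summary statistics (Example 9.4)
xbar <- 122; ybar <- 135
s1 <- 15; s2 <- 20
n1 <- 30; n2 <- 20

se <- sqrt(s1^2 / n1 + s2^2 / n2)
t_obs <- (xbar - ybar) / se                     # about -2.4790

df <- (s1^2/n1 + s2^2/n2)^2 /
      (s1^4 / (n1^2 * (n1 - 1)) + s2^4 / (n2^2 * (n2 - 1)))
df                                              # about 32.89
qt(0.975, df = df)                              # exact critical value
2 * pt(-abs(t_obs), df = df)                    # two-sided p-value
```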
Notes: When conducting Welch's procedure by hand, we can use the approximation for the degrees of freedom \[df \approx \min(n_1-1, n_2-1)\] which means that we use the smaller of \(n_1-1\) and \(n_2-1\) as our degrees of freedom. In Example 9.4 above, \(n_1-1 = 29\) and \(n_2-1 = 19\), so we can use \(19\) degrees of freedom.
Exercise 9.2 If we use the approximation, will the test result change? Has the test become more conservative or more lenient?
9.1.6 Summary
Consider testing \(H_0: \mu_1 - \mu_2 = \mu_0\).
Case | Numerator of \(t\) | Denominator of \(t\) | Distribution of \(t\) | Degrees of freedom |
---|---|---|---|---|
\(\sigma_1, \sigma_2\) known | \((\bar{x}-\bar{y}) - \mu_0\) | \(\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\) | \(z\) | |
\(\sigma_1 = \sigma_2 = \sigma\) unknown | \((\bar{x}-\bar{y}) - \mu_0\) | \(\sqrt{\left(\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\right)\left(\frac{1}{n_1} + \frac{1}{n_2} \right)}\) | \(t\) | \(n_1 + n_2 - 2\) |
\(\sigma_1 \ne \sigma_2\) unknown | \((\bar{x}-\bar{y}) - \mu_0\) | \(\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) | \(t\) | \(\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1-1)} + \frac{s_2^4}{n_2^2(n_2-1)}}\) or \(\min(n_1-1, n_2-1)\) |
Notes:
When we do not know whether the two variances are the same or different, it is better to use Welch's procedure directly. When solving assessment problems by hand, you can use the approximation \(\min(n_1-1, n_2-1)\).
This way, the test will be more conservative, which means that our Type I error is small. However, this also carries the risk of increasing the Type II error (Figure 8.4).
9.2 Wald-type Tests and Confidence Intervals
We can see that in all of the tests we have learned so far, the test statistic has the same structure:
\[\frac{\hat{\theta} - \theta_0}{\text{sd}(\hat{\theta})}\]
where \(\theta\) is a parameter of interest, \(\theta_0\) is the hypothesized value, \(\hat{\theta}\) is the statistic that estimates the parameter of interest, and \(\text{sd}(\hat{\theta})\) is the standard deviation of that statistic. (Note that the statistic is unknown until we collect the data and its value is subject to chance, so the statistic itself has a standard deviation.)
Tests using this type of test statistic are called Wald-type tests. Wald-type tests whose test statistics follow a \(t\)-distribution are usually called \(t\)-tests, and the statistic is usually called the \(t\)-statistic.
Using the \(t\)-statistic, we can conduct tests and build confidence intervals for \(\theta\). As mentioned in Chapter 7, \[\text{Confidence level} = 1 - \alpha.\]
Example 9.5 For the case where \(\sigma_1\) and \(\sigma_2\) are known,
- \(\theta = \mu_1 - \mu_2\)
- \(\hat{\theta} = \bar{x} - \bar{y}\)
- \(\mathrm{sd}(\hat{\theta}) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\)
9.2.1 From Hypothesis Testing to Confidence Interval
Suppose that with significance level \(\alpha\) and a two-sided test, we have the critical value \(c\). Then a \((1-\alpha)100\%\) confidence interval for \(\theta\) can be built by
\[\hat{\theta} \pm c \times \text{sd}(\hat{\theta})\]
Example 9.6 In Example 9.4, the \((1-\alpha)100\% = 95\%\) confidence interval for \(\mu_1 - \mu_2\) is
\[\begin{align*} & (\bar{x} - \bar{y}) \pm c \times \text{Denominator of $t$} \\ & = (122-135) \pm 2.0423 \times \sqrt{\frac{15^2}{30} + \frac{20^2}{20}} \\ & = (-23.7099, -2.2901) \end{align*}\]
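In R, this interval can be computed directly from the pieces we already have (a sketch using the standard error and the critical value from Example 9.4; variable names are ours).

```r
# 95% Wald-type confidence interval for mu1 - mu2 (Example 9.6)
xbar <- 122; ybar <- 135
s1 <- 15; s2 <- 20
n1 <- 30; n2 <- 20

theta_hat <- xbar - ybar
se <- sqrt(s1^2 / n1 + s2^2 / n2)
crit <- 2.0423                        # critical value used in Example 9.4 (t-table)
theta_hat + c(-1, 1) * crit * se      # about (-23.71, -2.29)
```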
9.2.2 From Confidence Interval to Hypothesis Testing
Suppose we have a \((1-\alpha)100\%\) confidence interval \((a, b)\) for \(\theta\). Now we want to conduct a two-sided test \[H_0: \theta = \theta_0 \hspace{5mm} \text{vs} \hspace{5mm} H_1:\theta \ne \theta_0\] We just need to check whether \(\theta_0\) lies in the confidence interval. If \[a < \theta_0 < b\] then we do not reject the null hypothesis at the \(100\alpha\%\) level of significance; otherwise we reject the null hypothesis.
Example 9.7 In Example 9.6, we see that the hypothesized value \(\mu_0=0\) for \(\mu_1 - \mu_2\) does not lie in the \(95\%\) confidence interval. Therefore, if we conduct a two-sided test, we reject the null hypothesis at the \(100\alpha\% = 5\%\) level of significance.
9.3 \(t\)-test for Paired Data
In Section 9.1, we were discussing two populations that are independent of each other. What if the two samples are paired with each other?
Example 9.8 Some examples of paired random variables \(X\) and \(Y\)
\(X\): diameter of the left eye of a patient, \(Y\): diameter of the right eye of that patient
\(X\): age of the wife, \(Y\): age of the husband
\(X\): height of the twin who was adopted, \(Y\): height of the other twin.
In these examples, \(X\) and \(Y\) are paired and are clearly not independent.
For paired data, we have \(n_1 = n_2 = n\), i.e., the two samples have the same size and each \(X_i\) is paired with a specific \(Y_i\). Therefore, we can denote our data as \((X_i, Y_i)\) for \(i = 1, 2, ..., n\). In this case, we cannot use the techniques from Section 9.1 to conduct our tests, because \(X\) and \(Y\) are dependent on each other.
How do we remove the dependency? Note that comparing \(\mu_1\) and \(\mu_2\) is equivalent to investigating the difference between the two. Let \[D = X-Y\] be the difference of the two random variables \(X\) and \(Y\). For each pair of data \((X_i, Y_i)\) we have the corresponding difference \(D_i\). Then, testing \(H_0: \mu_1 - \mu_2 = \mu_0\) is equivalent to testing \(H_0: \mu_D = \mu_0\). Now, we are back to the one-sample tests we discussed in Chapter 8, with \(\mu\) replaced by \(\mu_D = \mu_1-\mu_2\).
The observed test statistic is
\[t = \frac{\bar{d} - \mu_0}{\mathrm{sd}(\bar{d})} = \frac{\bar{d} - \mu_0}{s_d/\sqrt{n}}\]
where
\(\bar{d}\) is the mean of the differences \(d_i = (x_i-y_i)\) calculated from our observed sample
\(s_d\) is the standard deviation of the differences \(d_i\)
we will use \(n_d-1\) degrees of freedom for our test, where \(n_d = n\) is the number of pairs we collected in our sample.
Example 9.9 A supermarket chain wants to know if its “buy one, get one free” campaign increases customer traffic enough to justify the cost of the program. For each of \(10\) stores it selects two days at random to run the test. On one of those days the program is in effect, and on the other it is not. The supermarket chain wants to test the hypothesis that there is no mean difference in traffic against the alternative that the program leads to a difference in mean traffic. The results from the \(10\) stores are presented in the table below.
Store # | Customer visits with Program | Customer visits without Program | Difference \(d_i\) |
---|---|---|---|
\(1\) | \(140\) | \(136\) | \(140-136 = 4\) |
\(2\) | \(233\) | \(235\) | \(233-235 = -2\) |
\(3\) | \(110\) | \(108\) | \(2\) |
\(4\) | \(42\) | \(35\) | \(7\) |
\(5\) | \(332\) | \(328\) | \(4\) |
\(6\) | \(135\) | \(135\) | \(0\) |
\(7\) | \(151\) | \(144\) | \(7\) |
\(8\) | \(33\) | \(39\) | \(-6\) |
\(9\) | \(178\) | \(170\) | \(8\) |
\(10\) | \(147\) | \(141\) | \(6\) |
Mean | \(150.1\) | \(147.1\) | \(\bar{d} = 150.1-147.1 = 3\) |
Std. dev | \(86.98\) | \(86.33\) | |
Using the information provided, test the null hypothesis of no difference in average traffic at a \(5\%\) level of significance.
Solution:
Let us start by summarizing the information given to us in the question:
The data is paired based on store.
In total we have \(10\) stores, so \(10\) pairs and \(n_d = 10\).
\(\bar{d} = 3\)
\(s_d = \sqrt{\frac{\sum_{i=1}^{10}(d_i-\bar{d})^2}{10-1}} = 4.5216\)
Step 1: Hypotheses: \[H_0: \mu_D = 0 \hspace{5mm} \text{vs} \hspace{5mm} H_1: \mu_D \ne 0\]
Step 2: Test statistic: \[t = \frac{\bar{d}-\mu_0}{\mathrm{sd}(\bar{d})} = \frac{3-0}{4.5216/\sqrt{10}} = 2.0981\]
Step 3: In this test, \(\alpha = 0.05\) and we are doing a two-sided test, the degree of freedom is \(n_d - 1 = 10 - 1 = 9\). So our critical value is \[c = t_{1-\alpha/2, df} = t_{0.975, 9} = 2.2622\]
Step 4: Since \(|t| = 2.0981 < c = 2.2622\), we do not reject the null hypothesis at the \(5\%\) level of significance. We conclude that there is not sufficient evidence in the data to claim that the program increases customer visits.
Example 9.10 In Example 9.9, a \(95\%\) confidence interval for the mean difference in customer visits with and without the program is
\[\bar{d} \pm c \times \mathrm{sd}(\bar{d}) = 3 \pm 2.2622 \times \frac{4.5216}{\sqrt{10}} = (-0.2346, 6.2345)\]
We see that the interval contains \(\mu_0 = 0\), so we do not reject \(H_0\) in a two-sided test. This is consistent with the conclusion we reached in Example 9.9.
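Because the raw store counts are given in the table, the paired test and the confidence interval can also be obtained directly with R's `t.test()`; the sketch below uses the data from Example 9.9.

```r
# Paired t test for the supermarket data (Examples 9.9 and 9.10)
with_program    <- c(140, 233, 110, 42, 332, 135, 151, 33, 178, 147)
without_program <- c(136, 235, 108, 35, 328, 135, 144, 39, 170, 141)

t.test(with_program, without_program, paired = TRUE)

# Equivalent one-sample test on the differences:
d <- with_program - without_program
t.test(d, mu = 0)     # t = 2.0981, df = 9, 95% CI roughly (-0.23, 6.23)
```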
9.4 \(t\)-test for Proportions
In Sections 9.1 and 9.3, we were interested in continuous data. But what if our data are binary?
Example 9.11 If we are interested in people’s height, then the data will be a specific number, and we can apply techniques from Section 9.1 and 9.3 to conduct tests. This type of data is continuous.
But suppose we are interested in whether Canadians take the Covid vaccine. Then, for each individual in our sample, the data we collect is whether they are vaccinated (yes) or not (no). This data can be further coded as either 1 or 0, respectively. This type of data is called binary.
When collecting binary data, we are usually interested in proportions. Here, it is the proportion of Canadians taking the Covid-19 vaccine. Therefore, the population proportion is our parameter of interest. However, the proportion is nothing new; it is just the mean of the binary data.
Recall the Bernoulli distribution we learned in Chapter 4, which also takes on the binary values \(0\) and \(1\). We can use the Bernoulli distribution to model our data. Let \(X\) be our binary random variable of interest. The population distribution follows a Bernoulli distribution: \[\mathbb{P}(X = x) = \begin{cases} p & \text{if } x=1 \\ 1-p & \text{if } x = 0 \end{cases}\] where \(p\) is the probability of 1s (successes/yeses), i.e., the probability that a person takes the Covid-19 vaccine. Similarly, \(1-p\) is the probability that a person does not take the vaccine. In a large population, \(p\), according to the frequentist view of probability in Chapter 3, is the population proportion of 1s (successes/yeses).
Since \(X \sim \text{Bernoulli}(p)\), from Chapter 4 we know \[X \sim (p, p(1-p))\] that is, \(\mathbb{E}(X)=p\) and \(\mathrm{var}(X)=p(1-p)\).
Suppose we collect the data \(X_1, X_2, ..., X_n \overset{\text{iid}}{\sim} \text{Bernoulli}(p)\) from the distribution of \(X\). We know the expectation and variance of \(X\): \(X \sim (p, p(1-p))\). Then, by the Central Limit Theorem, when \(n\) is large, we have the normal approximation \[\bar{X}_n = \frac{X_1 + X_2 +...+X_n}{n} = \frac{\text{number of 1s in the sample}}{n} = \hat{p} \overset{\cdot}{\sim} \mathcal{N}\left(p, \frac{p(1-p)}{n}\right)\] where \(\hat{p}\) is the proportion of yeses in the sample.
Now, we can use this approximation to conduct \(t\)-tests. We just need to replace \(\bar{x}\) by \(\hat{p}\) and \(s\) by \(\sqrt{\hat{p}(1-\hat{p})}\).
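A small R simulation (the values of \(p\) and \(n\) below are arbitrary, chosen only for illustration) shows how the sample proportion behaves approximately like a normal random variable when \(n\) is large.

```r
# Normal approximation for a sample proportion (illustrative p and n)
set.seed(2)
p <- 0.4; n <- 200

phat <- replicate(10000, mean(rbinom(n, size = 1, prob = p)))
mean(phat)          # close to p
var(phat)           # close to p * (1 - p) / n
p * (1 - p) / n     # theoretical variance from the approximation
hist(phat, breaks = 40, main = "Sampling distribution of the sample proportion")
```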
Notes:
When we conduct tests for proportions, always use the standard normal distribution \(z\) instead of the \(t\) distribution, because the approximation by the Central Limit Theorem gives a normal distribution.
The Central Limit Theorem works for large samples only. Conventionally, the tests should be conducted only when the number of 1s (yeses) and the number of 0s (nos) each exceed 15.
9.4.1 One-sample Test of Proportion
Suppose we are testing \(H_0: p = p_0\). Then, under the null hypothesis, we know the variance to be \(\sigma^2 = p_0(1-p_0)\). Now we can proceed similarly to the one-sample case with known variance.
Example 9.12 An importer of electronic goods is considering packaging a new, easy-to-read instruction booklet with DVD players. It wants to package this booklet only if it helps customers more than the current booklet. Previous tests found that only \(30\%\) of customers were able to program their DVD player.
An experiment with the new booklet found that \(16\) out of \(60\) customers were able to program their DVD player.
At a \(5\%\) level of significance does the data suggest that more than \(30\%\) of customers were able to program their DVD player with the new manual?
Solution:
Step 1: In this question, \(p_0 = 0.3\), so the hypotheses are \[H_0: p = 0.3 \hspace{5mm} \text{vs} \hspace{5mm} H_1: p > 0.3\]
Step 2: The test statistic is \[t = \frac{\bar{x} - \mu_0}{\sqrt{\sigma^2/n}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{\frac{16}{60}-0.3}{\sqrt{\frac{0.3(1-0.3)}{60}}} = -0.56\]
Step 3: The test is one-sided upper-tail and the significance level is \(\alpha = 0.05\), so the critical value is \[c = z_{1-\alpha} = z_{0.95} = 1.645\]
Step 4: Since \(-0.56 < 1.645\) and this is a one-sided upper-tail test, we do not reject the null hypothesis at 5% level of significance and we conclude that the data does not provide sufficient evidence to suggest that the proportion is greater than 30%.
Notes: For confidence intervals, we need to use \[s = \sqrt{\hat{p}(1-\hat{p})}\] instead of \(\sqrt{p_0(1-p_0)}\), because without a hypothesis test there is no null hypothesis, so we no longer have a value \(p_0\) to use for \(\sigma\).
Example 9.13 The \(95\%\) confidence interval for Example 9.12 is \[\hat{p} \pm z_{1-\alpha/2} \times \frac{s}{\sqrt{n}} = \frac{16}{60} \pm 1.96 \times \sqrt{\frac{\frac{16}{60}\frac{44}{60}}{60}} = (0.1548, 0.3786)\]
Note that we use \(z_{1-\alpha/2}\) instead of \(z_{1-\alpha}\) because a CI is always two-sided.
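Both the test in Example 9.12 and the confidence interval in Example 9.13 can be reproduced in R; the sketch below works with the normal approximation directly (variable names are ours).

```r
# One-sample test and CI for a proportion (Examples 9.12 and 9.13)
x <- 16; n <- 60; p0 <- 0.3
phat <- x / n

z_obs <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
z_obs                        # about -0.56
qnorm(0.95)                  # upper-tail critical value 1.645
1 - pnorm(z_obs)             # p-value, clearly above 0.05

# 95% confidence interval uses phat in the standard error
phat + c(-1, 1) * qnorm(0.975) * sqrt(phat * (1 - phat) / n)
```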
9.4.2 Two-sample Test of Proportion
Suppose we are testing \(H_0: p_1 - p_2 = 0\). Then, under the null hypothesis, the two variances are the same. We therefore pool the variances by using \[s_p = \sqrt{\hat{p}(1-\hat{p})} \hspace{5mm} \text{where} \hspace{5mm} \hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1+n_2}\] i.e., \(\hat{p}\) is the proportion of yeses in the two samples combined. The test then proceeds similarly.
Example 9.14 An experiment investigated the claim that taking vitamin C can help to prevent the common cold. Volunteers were randomly assigned to one of two groups: A group that received a \(1000\) mg/day supplement of vitamin C and a group that received a placebo. The response variable was whether or not the individual developed a cold during the cold season.
In the experiment, \(302\) out of \(407\) people receiving the VitC supplement got a cold, and \(335\) out of \(411\) people receiving the placebo got a cold.
Test at the \(5\%\) significance level whether people taking VitC are less likely to catch a cold.
Solution:
Step 1: Let \(p_1\) be the proportion of people who catch a cold in the VitC group and \(p_2\) be the proportion of people who catch a cold in the placebo group. Hypotheses: \[H_0: p_1 = p_2 \hspace{5mm} \text{vs} \hspace{5mm} H_1: p_1 < p_2\]
Step 2: Since under the null hypothesis the two variances are the same, we use the pooled estimate for \(\sigma\) and proceed similarly to the case of unknown \(\sigma_1 = \sigma_2 = \sigma\): \[\begin{align*} \frac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} & = \frac{\hat{p_1} - \hat{p_2}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \\ & = \frac{\frac{302}{407} - \frac{335}{411}}{\sqrt{\left(\frac{302+335}{407+411}\right)\left(1-\frac{302+335}{407+411}\right)\left(\frac{1}{407} + \frac{1}{411}\right)}} \\ & = -2.5173 \end{align*}\]
Step 3: Because this is a proportion, we need to use a \(z\)-distribution. A one-sided lower-tail test at \(\alpha = 0.05\) implies the critical value \[c = z_{\alpha} = z_{0.05} = -1.645\]
Step 4: Since \(t = -2.5173 < c = -1.645\), and this is a one-sided lower-tail test, we reject the null hypothesis at the \(5\%\) significance level. We conclude that the data provide sufficient evidence to claim that people taking VitC are less likely to catch a cold.
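The test statistic in Example 9.14 can be reproduced in R as follows (a sketch; we compute the \(z\)-statistic by hand rather than calling a built-in routine such as `prop.test()`, whose default chi-squared version with continuity correction differs slightly).

```r
# Pooled two-sample z test for proportions (Example 9.14)
x1 <- 302; n1 <- 407     # colds in the vitamin C group
x2 <- 335; n2 <- 411     # colds in the placebo group

p1 <- x1 / n1; p2 <- x2 / n2
p_pool <- (x1 + x2) / (n1 + n2)

z_obs <- (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))
z_obs                # about -2.52
qnorm(0.05)          # lower-tail critical value -1.645
pnorm(z_obs)         # p-value
```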
For confidence intervals, again, without a hypothesis test we do not have a null hypothesis (we are not sure whether it is correct or not), so we can no longer assume \(p_1 = p_2\). Hence, we cannot plug the pooled variance into our confidence interval.
We need to use the variance formula when \(\sigma_1 \ne \sigma_2\), i.e., \(\frac{\sigma^2_1}{n_1} + \frac{\sigma^2_2}{n_2}\).
We also need to use the two-sided critical value \(z_{1-\alpha/2}\).
So a \((1-\alpha)100\%\) confidence interval for \(p_1-p_2\) is
\[\hat{p}_1 - \hat{p}_2 \pm z_{1-\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
Exercise 9.4 What is the confidence interval for Example 9.14?
9.5 Summary of Wald-type hypothesis tests
9.5.1 Two-sided vs One-sided Tests
\(H_0\) | \(H_1\) | Quantile | Reject \(H_0\) | \(p\)-value |
---|---|---|---|---|
\(\mu = \mu_0\) | \(\mu \ne \mu_0\) | \(1-\alpha/2\) | \(|t| > c\) | \(\mathbb{P}(|T| > |t|)\) |
\(\mu = \mu_0\) | \(\mu > \mu_0\) | \(1-\alpha\) | \(t > c\) | \(\mathbb{P}(T > t)\) |
\(\mu = \mu_0\) | \(\mu < \mu_0\) | \(\alpha\) | \(t < c\) | \(\mathbb{P}(T < t)\) |
Note here that \(T\) is the test statistic and \(t\) is the observed test statistic.
9.5.2 Table of Wald-tests
Case | Test statistic \(t\) | Dist. of \(t\) | df | \((1-\alpha)100\%\) CI |
---|---|---|---|---|
1. \(H_0: \mu = \mu_0\) | | | | |
a. \(\sigma\) known | \(\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\) | \(z\) | | \(\bar{x} \pm z_{1-\alpha/2}\times \frac{\sigma}{\sqrt{n}}\) |
b. \(\sigma\) unknown | \(\frac{\bar{x} - \mu_0}{s/\sqrt{n}}\) | \(t\) | \(n-1\) | \(\bar{x} \pm t_{1-\alpha/2, df}\times \frac{s}{\sqrt{n}}\) |
2. \(H_0: \mu_1 - \mu_2 = \mu_0\) | | | | |
a. \(\sigma_1, \sigma_2\) known | \(\frac{(\bar{x} - \bar{y}) - \mu_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\) | \(z\) | | \((\bar{x} - \bar{y}) \pm z_{1-\alpha/2} \times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\) |
b. \(\sigma_1 = \sigma_2 = \sigma\) unknown | \(\frac{(\bar{x} - \bar{y}) - \mu_0}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\) (*) | \(t\) | \(n_1+n_2-2\) | \((\bar{x} - \bar{y}) \pm t_{1-\alpha/2, df} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\) |
c. \(\sigma_1, \sigma_2\) unknown | \(\frac{(\bar{x} - \bar{y}) - \mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\) | \(t\) | \(\frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1-1)} + \frac{s_2^4}{n_2^2(n_2-1)}}\) or \(\min(n_1-1, n_2-1)\) | \((\bar{x} - \bar{y}) \pm t_{1-\alpha/2, df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) |
3. \(H_0: \mu_1 - \mu_2 = \mu_D = \mu_0\) (paired two-sample) | \(\frac{\bar{d} - \mu_0}{s_d/\sqrt{n_d}}\) | \(t\) | \(n_d-1\) | \(\bar{d} \pm t_{1-\alpha/2, df}\times \frac{s_d}{\sqrt{n_d}}\) |
4. \(H_0: p = p_0\) | \(\frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\) | \(z\) | | \(\hat{p} \pm z_{1-\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) |
5. \(H_0: p_1 = p_2\) | \(\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}\) (**) | \(z\) | | \((\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}\) |
where
(*) \(s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}\)
(**) \(\hat{p} = \frac{n_1\hat{p}_1+n_2\hat{p}_2}{n_1+n_2}\).