Is it true that the average price of a cup of coffee depends on the size of the city you live in? It certainly seems reasonable that a cup of coffee would cost more, on average, in a large city than in a small one, but how do you tell if that is really true? A confidence interval for the difference of two means lets you estimate how big the difference is and how confident you can be in that estimate. So solve your coffee woes by reading further!
Confidence Interval for the Difference of Two Means with Known Standard Deviations
If you were only interested in the average coffee price in one city you could do a confidence interval for a population mean. In that case, in order to do a proper confidence interval you would need that:
Either the sample size is large enough (\(n \ge 30\)) or the population distribution is approximately normal.
The sample is random or it is reasonable to assume it is representative of the larger population.
If you know the population standard deviation, \(\sigma\), the confidence interval is given by
\[ \bar{x} \pm (z \text{ critical value})\left(\frac{\sigma}{\sqrt{n}}\right)\]
where \(\bar{x}\) is the sample mean.
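As a rough illustration of the one-mean formula above, here is a minimal Python sketch. The function names `z_critical` and `one_mean_z_interval`, and the numbers in the usage lines, are made up for this example; the \(z\) critical value is taken from the standard normal distribution in Python's standard library.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided z critical value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def one_mean_z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for one population mean when sigma is known."""
    margin = z_critical(confidence) * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical numbers, for illustration only (not from the coffee survey).
low, high = one_mean_z_interval(x_bar=4.00, sigma=1.00, n=36, confidence=0.95)
print(f"({low:.2f}, {high:.2f})")  # prints (3.67, 4.33)
```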
But here you have two different cities and you want to compare the average coffee price, so how do you construct the confidence interval? Let's start by listing some of the notation used going forward.
First the population notation:
\[ \begin{array}{lcc} & \text{Population } 1 & \text{Population } 2 \\ \hline \text{Population Mean} & \mu_1 & \mu_2 \\ \text{Population Standard Deviation} & \sigma_1 & \sigma_2 \end{array} \]
And now for the samples:
\[ \begin{array}{lcc} & \text{Sample from Population } 1 & \text{Sample from Population } 2 \\ \hline \text{Sample Size} & n_1 & n_2 \\ \text{Sample Mean} & \bar{x}_1 & \bar{x}_2 \\ \text{Sample Standard Deviation} & s_1 & s_2 \end{array} \]
Then the conditions for constructing a confidence interval for the difference of two means are:
The samples are independent.
Either both sample sizes are large enough (\(n_1 \ge 30\) and \(n_2 \ge 30\)) or both population distributions are approximately normal.
The samples are random or it is reasonable to assume that the samples are representative of the larger population.
These conditions don't change even if you don't know the population standard deviations.
Because the samples are independent and random, you know that
\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2\]
and that
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }.\]
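To see where the square root comes from, recall that the variances of independent quantities add, even when the quantities themselves are subtracted. Each sample mean has variance \(\sigma_i^2 / n_i\), so
\[ \sigma^2_{\bar{x}_1 - \bar{x}_2} = \sigma^2_{\bar{x}_1} + \sigma^2_{\bar{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} ,\]
and taking the square root gives the standard deviation of the difference in sample means.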
Then the confidence interval for the difference in the two population means is
\[\bar{x}_1 - \bar{x}_2 \pm (z \text{ critical value})\sqrt{\frac{\sigma_1^2}{n_1} +\frac{\sigma_2^2}{n_2} } .\]
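The interval formula above could be sketched in Python along these lines. The function name `two_mean_z_interval` and the numbers in the usage lines are hypothetical, chosen only to show how the pieces fit together.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_interval(x_bar1, x_bar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval for mu_1 - mu_2 with known population standard deviations."""
    # Two-sided z critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard deviation of the difference in sample means.
    standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = x_bar1 - x_bar2
    margin = z * standard_error
    return diff - margin, diff + margin


# Hypothetical numbers, for illustration only.
low, high = two_mean_z_interval(
    x_bar1=10, x_bar2=8, sigma1=3, sigma2=4, n1=36, n2=64, confidence=0.95
)
print(f"({low:.2f}, {high:.2f})")  # prints (0.61, 3.39)
```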
In general you aren't going to know what the population standard deviations are, but let's look at an example illustrating the use of the formulas.
You do a survey of \(40\) small town coffee shops and \(49\) big city coffee shops, and find that the mean price of a large cup of coffee is \(\$3.75\) in the small towns and \(\$4.50\) in the big cities. You also know that the population standard deviation is \(\$1.20\) in small towns and \(\$0.98\) in big cities.
Construct a \(99\%\) confidence interval for the difference of their two means, and draw conclusions from it.
Answer
It helps to lay out the information you have. Call the small towns Population \(1\) and the big cities Population \(2\). Then you know that
\[ \begin{array}{lll} n_1 = 40 & \bar{x}_1 = 3.75 & \sigma_1 = 1.20 \\ n_2 = 49 & \bar{x}_2 = 4.50 & \sigma_2 = 0.98 . \end{array}\]
You know that the \(z\) critical value for a \(99\%\) confidence interval is \(2.58\). Then calculating the confidence interval for the difference in the means,
\[\begin{align} & \bar{x}_1 - \bar{x}_2 \pm (z \text{ critical value})\sqrt{\frac{\sigma_1^2}{n_1} +\frac{\sigma_2^2}{n_2} } \\ & \qquad = 3.75-4.50 \pm 2.58 \sqrt{\frac{(1.20)^2}{40} +\frac{(0.98)^2}{49} } \\ & \qquad = -0.75 \pm 2.58\sqrt{0.036 + 0.0196} \\ & \qquad \approx -0.75 \pm 0.61 \\ & \qquad = (-1.36, -0.14) .\end{align}\]
Now what can you conclude from this? First, you can conclude that the method used to construct this interval estimate is successful in capturing the actual difference in the population means about \(99\%\) of the time.
More importantly, you can conclude with \(99\%\) confidence that the actual difference in the mean price of a large cup of coffee is between \(-\$1.36\) and \(-\$0.14\). Because both endpoints of the confidence interval are negative, you can estimate that the mean price of a large cup of coffee is between \(\$0.14\) and \(\$1.36\) lower in a small town than it is in a big city.
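If you want to check the arithmetic in this answer, a few lines of Python reproduce it. The variable names are just for this sketch; the numbers are the ones from the survey, and the \(z\) critical value is the rounded \(2.58\) used in the calculation.

```python
from math import sqrt

# Survey values: Population 1 = small towns, Population 2 = big cities.
n1, x_bar1, sigma1 = 40, 3.75, 1.20
n2, x_bar2, sigma2 = 49, 4.50, 0.98
z = 2.58  # rounded z critical value for 99% confidence

standard_error = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
margin = z * standard_error
diff = x_bar1 - x_bar2
low, high = diff - margin, diff + margin

print(f"margin = {margin:.2f}")               # prints margin = 0.61
print(f"interval = ({low:.2f}, {high:.2f})")  # prints interval = (-1.36, -0.14)
```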
Notice that in the previous example both ends of the confidence interval were negative. What happens if one end is negative and one end is positive? That means \(0\) is inside the confidence interval, so it is plausible that there is no difference between the two means.
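The three possible readings of an interval for \(\mu_1 - \mu_2\) can be summed up in a small helper. This is only a sketch: the function name `interpret_difference_interval` and the second pair of endpoints are made up for illustration.

```python
def interpret_difference_interval(low, high):
    """Describe what an interval for mu_1 - mu_2 suggests about the two means."""
    if high < 0:
        # Both endpoints negative: mean 1 is estimated to be lower.
        return "mean 1 is lower than mean 2"
    if low > 0:
        # Both endpoints positive: mean 1 is estimated to be higher.
        return "mean 1 is higher than mean 2"
    # Zero lies inside the interval.
    return "no difference between the means is plausible"


print(interpret_difference_interval(-1.36, -0.14))  # interval from the coffee example
print(interpret_difference_interval(-0.40, 0.25))   # hypothetical interval containing 0
```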
Confidence Interval for the Difference of Two Independent Population Means
If you don't know the population standard deviations, but you do know that your samples are independent (meaning that the members chosen for the first sample don't affect which members are chosen for the second sample), then you can calculate the confidence interval using the formula:
\[\bar{x}_1 - \bar{x}_2 \pm (t \text{ critical value})\sqrt{\frac{s_1^2}{n_1} +\frac{s_2^2}{n_2} } ,\]
where \(n_1\) and \(n_2\) are the sample sizes, \(s_1\) and \(s_2\) are the sample standard deviations, and \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means.
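Here is a minimal Python sketch of this formula. It rests on an assumption that the text above does not state: the degrees of freedom for the \(t\) critical value are found with the Welch-Satterthwaite approximation, which is one common choice for independent samples. The function names and the numbers are hypothetical, and the \(t\) critical value is passed in by hand (from a \(t\) table or software) because Python's standard library has no \(t\) distribution.

```python
from math import sqrt


def welch_degrees_of_freedom(s1, s2, n1, n2):
    """Welch-Satterthwaite approximation (an assumption of this sketch)."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


def two_mean_t_interval(x_bar1, x_bar2, s1, s2, n1, n2, t_crit):
    """Confidence interval for mu_1 - mu_2 using sample standard deviations."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = x_bar1 - x_bar2
    margin = t_crit * standard_error
    return diff - margin, diff + margin


# Hypothetical numbers, for illustration only.
df = welch_degrees_of_freedom(s1=2, s2=2, n1=16, n2=16)
print(f"degrees of freedom = {df:.0f}")  # prints degrees of freedom = 30

# 2.042 is the two-sided 95% t critical value for 30 degrees of freedom (from a t table).
low, high = two_mean_t_interval(10, 8, s1=2, s2=2, n1=16, n2=16, t_crit=2.042)
print(f"({low:.2f}, {high:.2f})")  # prints (0.56, 3.44)
```

Other conventions for the degrees of freedom exist, so check which one your course or software uses before comparing answers.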