

Society Investigating Mathematical Mind-Expanding Recreations

April 1998 Feature Presentation
Hypothesis Testing and the Chi-Squared Test of Independence
Alison Gibbs and Martin Van Driel

  • The Logic of Hypothesis Testing
  • The Heart of Statistics
  • The Chi-Squared Test of Independence
  • References
  • Problems
  • Answers to Problems

    The Logic of Hypothesis Testing: Beyond a Reasonable Doubt

    Steps for Carrying out a Statistical Hypothesis Test:

    Trial by Jury                       Statistical Hypothesis Testing
    ----------------------------------  ----------------------------------
    Prosecutor                          Statistician
    Trial                               Collection of data
    Jury decides on the verdict         Statistical test
    Assume defendant is innocent        Assume the null hypothesis is true
    Weigh the evidence provided by      Assess the evidence provided by
      testimony and exhibits, assuming    the data (as summarized in the
      defendant is innocent               test statistic), assuming null
                                          hypothesis is true
    Evidence against the defendant,     Calculate a p-value for the test
      assuming defendant is innocent      statistic, assuming null
                                          hypothesis is true
    Defendant found guilty if beyond    Reject the null hypothesis if
      a reasonable doubt                  p-value less than the
                                          significance level
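
    The steps above can be traced through a small example. This is a hypothetical illustration, not one from the presentation: suppose a coin lands heads 16 times in 20 tosses, the null hypothesis is that the coin is fair, and the significance level is 0.05. The sketch computes a one-sided p-value (the probability of 16 or more heads from a fair coin) with Python's standard library only; the function names are our own, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def upper_tail_p_value(heads, tosses, p_null=0.5):
    """P(X >= heads) when X ~ Binomial(tosses, p_null): the one-sided p-value."""
    return sum(comb(tosses, k) * p_null**k * (1 - p_null)**(tosses - k)
               for k in range(heads, tosses + 1))

def coin_test(heads, tosses, alpha=0.05):
    # Assume the null hypothesis is true: the coin is fair (p_null = 0.5).
    # The data are summarized by the test statistic: the number of heads.
    # Assess the evidence: how likely is a result at least this extreme under the null?
    p_value = upper_tail_p_value(heads, tosses)
    # Verdict: reject the null hypothesis if the p-value is below the significance level.
    return p_value, p_value < alpha

p_value, reject = coin_test(16, 20)
print(f"p-value = {p_value:.4f}, reject null: {reject}")
# p-value = 0.0059, reject null: True
```

    Like a jury that convicts only when the evidence is beyond a reasonable doubt, the test rejects the null hypothesis here because 16 or more heads would happen only about 0.6% of the time with a fair coin.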

    The Heart of Statistics

    Law of Large Numbers

    The mean of a random observation is its expected value, computed from its probability distribution. For example, if we toss a fair coin four times, the mean number of heads is 2. Now suppose we repeat the coin-tossing experiment 5 times and observe 4, 2, 0, 1 and 2 heads, respectively. Then the sample mean based on these 5 observations is
    $$\frac{4+2+0+1+2}{5} = \frac{9}{5} = 1.8 \text{ heads}$$

    The Law of Large Numbers states that as the sample size (number of observations) increases, the sample mean will approach the actual mean.
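
    A quick simulation illustrates the law with the four-toss experiment above. This is a minimal sketch that assumes Python's `random` module as a stand-in for a fair coin; the function names are our own.

```python
import random

def heads_in_four_tosses(rng):
    """Toss a fair coin four times and count the heads."""
    return sum(rng.random() < 0.5 for _ in range(4))

def sample_mean(n_experiments, seed=0):
    """Average number of heads over n_experiments repetitions of the four-toss experiment."""
    rng = random.Random(seed)
    total = sum(heads_in_four_tosses(rng) for _ in range(n_experiments))
    return total / n_experiments

# The sample mean should settle down near the true mean of 2 as n grows.
for n in (5, 100, 10_000, 100_000):
    print(f"{n:>7} experiments: sample mean = {sample_mean(n):.3f}")
```

    A run with a small number of experiments, like the five above, can land well away from 2; the law describes only what happens to the average in the long run.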

    Normal Distribution

    If a population with mean µ and standard deviation σ has a Normal Distribution, then about 68% of the observations lie within one standard deviation of the mean, about 95% lie within two standard deviations of the mean, and the mean is also the median (the distribution is symmetric and bell-shaped).
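
    The 68% and 95% figures are rounded normal-curve probabilities. As a sketch using Python's standard `math.erf`, the fraction of a Normal population within k standard deviations of the mean is erf(k/√2), whatever the values of µ and σ:

```python
import math

def fraction_within(k):
    """Fraction of a Normal population lying within k standard deviations of the mean.

    For a Normal distribution, P(|X - mu| < k * sigma) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

    So "95% within two standard deviations" is a rounding: exactly 95% lies within about 1.96 standard deviations of the mean.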

    Central Limit Theorem

    Given independent observations $x_1, x_2, \ldots, x_n$ from a common distribution (with finite variance), consider the sample mean

    $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

    The Central Limit Theorem states that as the sample size n increases, the distribution of $\bar{x}$ becomes closer to a normal distribution. Also, the distribution of the sum of the random observations, $\sum_{i=1}^{n} x_i$, becomes closer to a normal distribution.
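
    The theorem can be seen in a small simulation. This sketch is an illustration only: it assumes exponential observations (chosen because they are strongly skewed, so a single observation looks nothing like a normal variable), and it measures how far the standardized sample mean is from normal by the largest gap between the simulated and normal cumulative distribution functions on a grid of points. The function names are our own.

```python
import bisect
import math
import random

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_means(n, reps, rng):
    """Sample means of n exponential(1) observations, standardized.

    An exponential(1) observation has mean 1 and standard deviation 1,
    so the sample mean has mean 1 and standard deviation 1/sqrt(n).
    """
    means = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        means.append((xbar - 1.0) * math.sqrt(n))
    return means

def distance_from_normal(n, reps=5000, seed=1):
    """Largest gap between the simulated CDF of the standardized mean and the
    standard normal CDF, checked at z = -3.0, -2.9, ..., 3.0."""
    values = sorted(standardized_means(n, reps, random.Random(seed)))
    gap = 0.0
    for k in range(-30, 31):
        z = k / 10
        simulated = bisect.bisect_right(values, z) / reps
        gap = max(gap, abs(simulated - normal_cdf(z)))
    return gap

for n in (1, 5, 40):
    print(f"n = {n:2d}: largest gap from normal = {distance_from_normal(n):.3f}")
```

    The gap should shrink as n grows from 1 to 40; the exact numbers depend on the random seed and the number of repetitions.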

    The chi^2 Test of Independence

    We will be analyzing count data, for example, the number of women present this evening.

    The Background Theory

    The Calculations Behind the Test
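
    For reference, here is the standard Pearson calculation for an r × c table of observed counts $O_{ij}$. We are assuming this is the version intended here; it reproduces the statistics and degrees of freedom quoted in the Answers (1 degree of freedom for the 2 × 2 table in Problem 4, 10 for the 6 × 3 table in Problem 6).

    $$E_{ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{grand total}}$$

    $$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \text{degrees of freedom} = (r-1)(c-1)$$

    Under the null hypothesis of no relationship between the row and column variables, $X^2$ has approximately a chi^2 distribution with $(r-1)(c-1)$ degrees of freedom, and the p-value is the probability that such a variable is at least as large as the observed $X^2$. For Jackson's table in Problem 4, for example, the expected number of regular-season hits is 9864 × 2619 / 9962 ≈ 2593.2, compared with 2584 observed.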



    References

    The Chance Database.

    Mendenhall, W., Wackerly, D. and Scheaffer, R. Mathematical Statistics with Applications, 4th edition. PWS-Kent Publishing Company, Boston, 1990.

    Moore, D. and McCabe, G. Introduction to the Practice of Statistics, 2nd edition. W.H. Freeman and Company, New York, 1993.

    Morgan, L. Statistics Handbook for the TI-83. Texas Instruments Inc., 1997.

    Paulos, J. A. Innumeracy: Mathematical Illiteracy and its Consequences. Hill and Wang, New York, 1988.

    Rice, J. A. Mathematical Statistics and Data Analysis, 2nd edition. Wadsworth, Belmont, California, 1995.

    The SIMMS Project (Systemic Initiative for Montana Mathematics and Science). What Did You Expect, Big Chi? Simon and Schuster, Houston.


    Problems

    1. A life insurance company sells a term insurance policy to a 21-year-old male. The policy pays $100,000 if the insured dies within the next 5 years. The company collects a premium of $250 each year. There is a high probability that the man will live, and the insurance company will gain $1250 in premiums. But if he were to die, the company would lose almost $100,000! Why would the insurance company want to take on this much risk?

    2. In advertising for a study guide, the producers claim that students who use it do significantly better (p<0.05) than students who don't. What does this mean? Is there any reason you may not want to trust the producers' claim?

    3. A researcher is looking for evidence of extra-sensory perception. She tests 500 subjects, 4 of whom do significantly better (p<0.01) than random guessing. Should she conclude that these 4 have ESP?

    4. Did the baseball player Reggie Jackson earn the title "Mr. October"? In his 21-year career he had 2584 hits in 9864 regular-season at-bats. During the World Series, he had 35 hits in 98 at-bats. Is the improvement in his batting average during the World Series statistically significant?

    5. Does gender influence newspaper preference? Test the hypothesis that there is no relationship between gender and preferred Toronto daily newspaper for the data we've collected:
      Gender Globe Star Sun

    6. Here is some more data on Jane Austen and her imitator (from J. Rice, Mathematical Statistics and Data Analysis, 2nd ed.). The following table gives the frequency (number of occurrences) of the word "a" preceded by (PB) and not preceded by (NPB) the word "such", the word "and" followed by (FB) or not followed by (NFB) "I", and the word "the" preceded by and not preceded by "on".
      Words        Sense and Sensibility   Emma   Sanditon I   Sanditon II
      a PB such                       14     16            8             2
      a NPB such                     133    180           93            81
      and FB I                        12     14           12             1
      and NFB I                      241    285          139           153
      the PB on                       11      6            8            17
      the NPB on                     259    265          221           204
      Was Austen consistent in these habits of style from one work to another? Did her imitator successfully copy this aspect of her style?

    7. Was there bloc judging in the ice dance competition at the Olympics? Claims have been made that the decision had been determined by the judges before the Games even started. In particular, it has been claimed that the judges from the five Eastern bloc countries (Russia, Ukraine, Lithuania, Poland and the Czech Republic) agreed to support each other's competitors. Some also claim that France was part of the conspiracy. Could we test for this judging irregularity statistically? How? (Hint: it's not a chi^2 test!)

    Answers to Problems

    1. Hopefully many 21-year-olds buy policies from the insurance company. By the law of large numbers, the proportion of policyholders who die will be close to the small probability of death for a 21-year-old, so the premiums collected should more than cover the pay-outs.

    2. Assuming that there is no difference between the two groups of students, the probability of seeing a difference as great as or greater than the one observed is less than 0.05. Of course, we have no indication of how the producers of the study guide found students who used it and students who didn't. Perhaps their claim says more about the students who buy study guides.

    3. Even a person who is guessing randomly would do that well 1% of the time. So in a group of 500 people who are all guessing, we would expect about 500 × 0.01 = 5 of them to do that well, and finding 4 is not surprising. It's not likely that these 4 really do have ESP.

    4. We can test this by performing a chi^2 test of independence on the following table:
                          Hit   No hit
      Regular season     2584     7280
      World Series         35       63
      The test statistic has value 4.536 and, under the hypothesis of no relationship, has approximately a chi^2 distribution with 1 degree of freedom. The p-value is 0.033. Whether or not the null hypothesis should be rejected depends on the significance level. Assuming there's no relationship between Jackson's batting average and whether or not it's a World Series game, a difference as great as or greater than what Jackson accomplished would be observed about 3% of the time. Do you consider that highly unusual?

    5. Up to you!

    6. The test of Austen with herself (taking just the first three columns) has a test statistic of 23.287, which, under the null hypothesis of no relationship between work and word distribution, has approximately a chi^2 distribution with 10 degrees of freedom; the p-value is 0.0097. So it appears that Austen was not consistent in her use of these word combinations! So does it matter what the imitator did?

    7. An open question!
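
    The statistics and p-values in answers 4 and 6 can be checked with a short script. The sketch below is a minimal Python version assuming the standard Pearson statistic with no continuity correction (which matches the 4.536 reported for Jackson). The function names `chi_squared_test` and `chi2_sf` are our own, and the upper-tail probability is computed from a series for the regularized incomplete gamma function rather than with a statistics library.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for X with a chi-squared distribution on df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2 and z = x/2; the upper tail is 1 - P(a, z).
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi_squared_test(table):
    """Pearson chi-squared test of independence for a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Problem 4: Reggie Jackson (rows: regular season, World Series; columns: hit, no hit)
jackson = [[2584, 7280], [35, 63]]

# Problem 6: Austen's own work only (columns: Sense and Sensibility, Emma, Sanditon I)
austen = [
    [14, 16, 8],      # a PB such
    [133, 180, 93],   # a NPB such
    [12, 14, 12],     # and FB I
    [241, 285, 139],  # and NFB I
    [11, 6, 8],       # the PB on
    [259, 265, 221],  # the NPB on
]

for name, table in (("Jackson", jackson), ("Austen", austen)):
    stat, df, p = chi_squared_test(table)
    print(f"{name}: statistic = {stat:.3f}, df = {df}, p-value = {p:.4f}")
```

    Answer 4 reports 4.536 with p = 0.033, and answer 6 reports 23.287 with 10 degrees of freedom and p = 0.0097; the sketch should agree with these up to rounding. The same function can be applied to the full four-column Austen table or to the newspaper data in Problem 5.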
