A chi-squared test, also referred to as a \chi^2 test (or chi-square test), is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true. Chi-squared tests are often constructed from a sum of squared errors, or through the sample variance. Test statistics that follow a chi-squared distribution arise from an assumption of independent normally distributed data, which is valid in many cases due to the central limit theorem. A chi-squared test can then be used to reject the hypothesis that the data are independent.
A test in which this holds only asymptotically is also considered a chi-square test. Here the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough. The chi-squared test is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. Does the number of individuals or objects that fall in each category differ significantly from the number you would expect? Is this difference between the expected and observed counts due to sampling variation, or is it a real difference?
Contents

1 Examples of chi-square tests with samples
1.1 Pearson's chi-square test
1.2 Yates's correction for continuity
1.3 Other chi-square tests
2 Chi-squared test for variance in a normal population
3 Example chi-squared test for categorical data
4 Applications
5 See also
6 References
Examples of chi-square tests with samples
One test statistic that follows a chi-square distribution exactly is the test that the variance of a normally distributed population has a given value based on a sample variance. Such tests are uncommon in practice because the true variance of the population is usually unknown. However, there are several statistical tests where the chi-square distribution is approximately valid:
Pearson's chi-square test
Pearson's chi-square test is also known as the chi-square goodness-of-fit test or the chi-square test for independence. When the chi-square test is mentioned without any modifiers or other precluding context, this test is usually meant (for an exact test used in place of \chi^2, see Fisher's exact test).
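As an illustration, Pearson's statistic sums (observed − expected)²/expected over all categories. The minimal sketch below computes it for a goodness-of-fit setting using only the Python standard library. The function name `pearson_chi_square` and the die-roll counts are hypothetical, chosen only for the example.

```python
def pearson_chi_square(observed, expected):
    """Return Pearson's statistic: the sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 60 rolls of a die that is supposed to be fair,
# so each face is expected 60 / 6 = 10 times.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
stat = pearson_chi_square(observed, expected)
# Fully specified probabilities: k - 1 degrees of freedom for k categories.
print(f"chi-square = {stat:.2f} with {len(observed) - 1} degrees of freedom")
# prints: chi-square = 4.20 with 5 degrees of freedom
```

Here the statistic of 4.2 on 5 degrees of freedom is well below the 5% critical value of about 11.07, so these hypothetical counts give no evidence against fairness.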
Yates's correction for continuity
Using the chi-square distribution to interpret Pearson's chi-square statistic requires one to assume that the discrete probability of observed binomial frequencies in the table can be approximated by the continuous chi-square distribution. This assumption is not quite correct and introduces some error.
To reduce the error in approximation, Frank Yates suggested a correction for continuity that adjusts the formula for Pearson's chi-square test by subtracting 0.5 from the absolute difference between each observed value and its expected value in a 2 × 2 contingency table.^{[1]} This reduces the chi-square value obtained and thus increases its p-value.
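To make the effect of the correction concrete, the following sketch computes Pearson's statistic for a 2 × 2 table with and without Yates's correction, that is, summing (|O − E| − 0.5)²/E instead of (O − E)²/E. The table of counts is hypothetical. Flooring the corrected difference at zero is an assumption of this sketch, a safeguard some implementations apply so that the correction never overshoots.

```python
def chi_square_2x2(table, yates=False):
    """Pearson's chi-square statistic for a 2 x 2 table of observed counts.

    With yates=True, 0.5 is subtracted from each |O - E| before squaring
    (floored at zero, which is an assumption of this sketch).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(table[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


# Hypothetical 2 x 2 table of counts.
table = [[12, 5], [7, 16]]
print(f"uncorrected: {chi_square_2x2(table):.2f}")                  # 6.32
print(f"Yates-corrected: {chi_square_2x2(table, yates=True):.2f}")  # 4.81
```

As the text says, the corrected statistic is smaller, so its p-value is larger.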
Other chi-square tests
Chi-squared test for variance in a normal population
If a sample of size n is taken from a population having a normal distribution, then there is a result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a predetermined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of n product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-square distribution with n − 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T at a significance level of 5% is the interval from 9.59 to 34.17 (the 2.5th and 97.5th percentiles of the chi-square distribution with 20 degrees of freedom).
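The variance test can be sketched in standard-library Python, with T = Σ (x_i − x̄)² / σ0², where σ0² is the nominal variance. The standard library has no chi-square quantile function, so the sketch evaluates the chi-square CDF through the power series of the regularized incomplete gamma function and inverts it by bisection. The helper names (`chi2_cdf`, `chi2_ppf`, `variance_test`) are hypothetical. The numerical approach is a simple choice suited to moderate values, not a production-grade routine. The sketch reproduces the acceptance region quoted above for n = 21.

```python
import math


def chi2_cdf(x, k):
    """CDF of the chi-square distribution with k degrees of freedom.

    Evaluates the regularized lower incomplete gamma function P(k/2, x/2)
    by its power series; adequate for the moderate x and k used here.
    """
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_ppf(p, k):
    """Quantile of the chi-square distribution, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, k) < p:
        hi *= 2.0  # widen the bracket until it contains the quantile
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def variance_test(sample, sigma0_sq, alpha=0.05):
    """Two-sided test of H0: population variance equals sigma0_sq (normal data).

    Returns T, the acceptance region (lower, upper), and whether H0 is rejected.
    """
    n = len(sample)
    mean = sum(sample) / n
    t = sum((x - mean) ** 2 for x in sample) / sigma0_sq
    lower = chi2_ppf(alpha / 2, n - 1)
    upper = chi2_ppf(1 - alpha / 2, n - 1)
    return t, (lower, upper), not (lower <= t <= upper)


# Acceptance region for n = 21 (20 degrees of freedom) at the 5% level.
print(f"{chi2_ppf(0.025, 20):.2f} to {chi2_ppf(0.975, 20):.2f}")  # 9.59 to 34.17
```

In practice, a statistics library's chi-square quantile function would normally replace the hand-written helpers.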
Example chi-squared test for categorical data
Suppose there is a city of 1 million residents with four neighborhoods: A, B, C, and D. A random sample of 650 residents of the city is taken and their occupation is recorded as "blue collar", "white collar", or "no collar". The null hypothesis is that each person's neighborhood of residence is independent of the person's occupational classification. The data are tabulated as:

\begin{array}{lcccccc} & \text{A} & \text{B} & \text{C} & \text{D} & & \text{total} \\[6pt] \hline \text{White collar} & 90 & 60 & 104 & 95 & & 349 \\[6pt] \hline \text{Blue collar} & 30 & 50 & 51 & 20 & & 151 \\[6pt] \hline \text{No collar} & 30 & 40 & 45 & 35 & & 150 \\[12pt] \hline \text{total} & 150 & 150 & 200 & 150 & & 650 \end{array}
Let us take the proportion of the sample living in neighborhood A, 150/650, to estimate what proportion of the whole 1 million people live in neighborhood A. Similarly, we take 349/650 to estimate what proportion of the 1 million people are white-collar workers. By the assumption of independence under the hypothesis, we should "expect" the number of white-collar workers in neighborhood A to be

\frac{150}{650}\times\frac{349}{650}\times650 \approx 80.54.
Then in that "cell" of the table, we have

\frac{(\text{observed}-\text{expected})^2}{\text{expected}} = \frac{(90-80.54)^2}{80.54}.
The sum of these quantities over all of the cells is the test statistic. Under the null hypothesis, it has approximately a chi-square distribution whose number of degrees of freedom is

(\text{number of rows}-1)(\text{number of columns}-1) = (3-1)(4-1) = 6. \,
If the test statistic is improbably large according to that chisquare distribution, then one rejects the null hypothesis of independence.
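The whole calculation for the table above can be sketched as follows. Expected counts come from row total × column total / 650, the cell contributions are summed, and the degrees of freedom are (3 − 1)(4 − 1) = 6. The upper-tail probability uses a power series for the regularized incomplete gamma function, which is an implementation choice of this sketch. The function names are hypothetical.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variable with k d.o.f.

    Computed as 1 - P(k/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series (adequate for moderate x, k).
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    return 1.0 - total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def independence_test(table):
    """Pearson's chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row share x column share x n.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Rows: white collar, blue collar, no collar; columns: neighborhoods A, B, C, D.
table = [
    [90, 60, 104, 95],
    [30, 50, 51, 20],
    [30, 40, 45, 35],
]
stat, dof, p = independence_test(table)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p:.4f}")
# prints: chi-square = 24.57, dof = 6, p = 0.0004
```

The statistic comes out at about 24.57, with a p-value of roughly 0.0004. At a 5% significance level, this sample would therefore lead one to reject the null hypothesis of independence.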
A related issue is a test of homogeneity. Suppose that instead of giving every resident of each of the four neighborhoods an equal chance of inclusion in the sample, we decide in advance how many residents of each neighborhood to include. Then each resident has the same chance of being chosen as do all residents of the same neighborhood, but residents of different neighborhoods would have different probabilities of being chosen if the four sample sizes are not proportional to the populations of the four neighborhoods. In such a case, we would be testing "homogeneity" rather than "independence". The question is whether the proportions of blue-collar, white-collar, and no-collar workers in the four neighborhoods are the same. However, the test is done in the same way.
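For either sampling design, the calculation above can be summarized in general notation. The symbols here are introduced for convenience and do not come from the example: a table has r rows and c columns, observed counts O_{ij}, row totals R_i, column totals C_j and sample size N. The expected counts, the statistic and its degrees of freedom are then

E_{ij} = \frac{R_i}{N}\times\frac{C_j}{N}\times N = \frac{R_i C_j}{N}, \qquad \chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad \text{df} = (r-1)(c-1).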
Applications
In cryptanalysis, the chi-square test is used to compare the distribution of plaintext and (possibly) decrypted ciphertext. The candidate decryption with the lowest value of the statistic is, with high probability, the correct one.^{[2]}^{[3]} This method can be generalized for solving modern cryptographic problems.^{[4]}
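A minimal sketch of the idea for the simplest case, a Caesar (shift) cipher, follows. Each of the 26 possible decryptions is scored with Pearson's statistic against English letter frequencies, and the lowest score is taken as the likely key. The frequency table is an assumption: these are commonly quoted approximate values, not figures from the cited sources. The helper names `shift_text`, `chi_square_vs_english` and `crack_caesar` are hypothetical.

```python
from collections import Counter
from string import ascii_lowercase

# Approximate relative frequencies (%) of the letters a..z in English text.
# These commonly quoted reference values are an assumption of this sketch.
ENGLISH_FREQ = dict(zip(ascii_lowercase, [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
]))
TOTAL = sum(ENGLISH_FREQ.values())


def shift_text(text, k):
    """Caesar-shift each lowercase letter by k positions; keep other characters."""
    return "".join(
        chr((ord(ch) - ord("a") + k) % 26 + ord("a")) if ch in ENGLISH_FREQ else ch
        for ch in text
    )


def chi_square_vs_english(text):
    """Pearson's statistic comparing the letter counts of text with English."""
    counts = Counter(ch for ch in text if ch in ENGLISH_FREQ)
    n = sum(counts.values())
    if n == 0:
        return float("inf")
    return sum(
        (counts[letter] - n * pct / TOTAL) ** 2 / (n * pct / TOTAL)
        for letter, pct in ENGLISH_FREQ.items()
    )


def crack_caesar(ciphertext):
    """Try all 26 keys and keep the decryption with the lowest statistic."""
    ciphertext = ciphertext.lower()
    best = min(range(26), key=lambda k: chi_square_vs_english(shift_text(ciphertext, -k)))
    return best, shift_text(ciphertext, -best)


plaintext = ("it was the best of times it was the worst of times "
             "it was the age of wisdom it was the age of foolishness")
ciphertext = shift_text(plaintext, 3)
key, recovered = crack_caesar(ciphertext)
print(key, recovered == plaintext)  # 3 True
```

Short ciphertexts carry little frequency information, so the lowest score identifies the key only with high probability, not with certainty.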
See also
References

^ Yates, F. (1934). "Contingency tables involving small numbers and the χ² test". Supplement to the Journal of the Royal Statistical Society 1(2): 217–235. JSTOR 2983604.

^ "Chi-squared Statistic". Practical Cryptography. Retrieved 18 February 2015.

^ "Using Chi Squared to Crack Codes". IB Maths Resources. British International School Phuket.

^ Ryabko, B. Ya.; Stognienko, V. S.; Shokin, Yu. I. (2004). "A new test for randomness and its application to some cryptographic problems". Journal of Statistical Planning and Inference 123: 365–376. Retrieved 18 February 2015.

Weisstein, Eric W. "Chi-Squared Test". MathWorld.

Corder, G. W.; Foreman, D. I. (2014). Nonparametric Statistics: A Step-by-Step Approach. Wiley, New York. ISBN 9781118840313.

Greenwood, P. E.; Nikulin, M. S. (1996). A Guide to Chi-Squared Testing. Wiley, New York. ISBN 047155779X.

Nikulin, M. S. (1973). "Chi-squared test for normality". In: Proceedings of the International Vilnius Conference on Probability Theory and Mathematical Statistics, v. 2, pp. 119–122.

Bagdonavicius, V.; Nikulin, M. S. (2011). "Chi-squared goodness-of-fit test for right censored data". The International Journal of Applied Mathematics and Statistics, pp. 30–50.
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.