##### Subject 2. Test of Independence Using Contingency Table Data

When the analysis of categorical data involves more than one variable, two-way tables (also known as contingency tables) are used. A contingency table is a table in matrix format that displays the frequency distribution of the variables in terms of joint frequencies and marginal frequencies. Contingency tables are heavily used in survey research, business intelligence, engineering, and scientific research.

The table below shows the favorite leisure activities for 50 adults - 20 men and 30 women.

Entries in the "Total" row and "Total" column are called marginal frequencies. They represent the frequency distribution for each variable. Entries in the body of the table are called joint frequencies.
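As a minimal sketch of how marginal frequencies follow from joint frequencies, the snippet below sums a small two-way table by row and by column. The cell counts and the activity labels (`Dance`, `Sports`, `TV`) are hypothetical values chosen for illustration; only the totals of 20 men, 30 women, and 50 adults come from the text.

```python
# Hypothetical joint frequencies (illustrative only, not the original table).
# Only the row totals (20 men, 30 women) are taken from the text.
observed = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}

# Marginal frequencies of the row variable: sum each row across its columns.
row_totals = {row: sum(cells.values()) for row, cells in observed.items()}

# Marginal frequencies of the column variable: sum each column across rows.
col_totals = {}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

# The grand total is the sum of either set of marginal frequencies.
grand_total = sum(row_totals.values())

print("Row totals:", row_totals)
print("Column totals:", col_totals)
print("Grand total:", grand_total)
```

The values inside `observed` play the role of the joint frequencies, while `row_totals` and `col_totals` correspond to the "Total" column and "Total" row.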

An effect in a contingency table is a relationship between the row and column variables; that is, are the levels of the row variable distributed differently across the levels of the column variable? A significant result in this hypothesis test means that interpreting the cell frequencies is warranted. A non-significant result means that any differences in cell frequencies could be explained by chance.

Hypothesis tests may be performed on contingency tables in order to decide whether or not effects are present. These tests are based on a statistic called chi-square.

The procedure for testing the significance of a contingency table follows the same logic as other hypothesis tests: a statistic is computed and then compared to a model of what the world would look like if the experiment were repeated an infinite number of times when there were no effects. In this case, the statistic computed is the chi-square statistic. For the detailed steps, please refer to the textbook example.
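The textbook example is not reproduced here, but the standard computation can be sketched as follows. Under the "no effects" model, the expected frequency of each cell is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The function name `chi_square_statistic` and the counts are hypothetical and used for illustration only; the counts are consistent with 20 men and 30 women.

```python
def chi_square_statistic(observed):
    """Return the chi-square statistic for a two-way table of counts.

    `observed` is a list of rows, each row a list of joint frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are men and women, columns are three activities.
observed = [
    [2, 10, 8],   # men (total 20)
    [16, 6, 8],   # women (total 30)
]

stat = chi_square_statistic(observed)
print(f"chi-square = {stat:.2f}")  # about 10.30 for these illustrative counts
```

A larger statistic indicates that the observed cell frequencies depart further from what independence would predict.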

The chi-square statistic has (r - 1)(c - 1) degrees of freedom, where r is the number of categories of the first variable and c is the number of categories of the second variable.
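A short sketch of how the degrees of freedom feed into the decision: the snippet computes (r - 1)(c - 1) and compares a test statistic with the 5% critical value of the chi-square distribution. The helper names, the 2 × 3 table shape, and the statistic value of 10.30 are assumptions made for illustration; the critical values are standard table values at the 5% significance level.

```python
# Standard 5% critical values of the chi-square distribution, keyed by
# degrees of freedom (only a few entries are included in this sketch).
CRITICAL_VALUES_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def degrees_of_freedom(r, c):
    """Degrees of freedom for a table with r row and c column categories."""
    return (r - 1) * (c - 1)


def rejects_independence(statistic, r, c):
    """Return True if the statistic exceeds the 5% critical value."""
    df = degrees_of_freedom(r, c)
    return statistic > CRITICAL_VALUES_5_PERCENT[df]


# Hypothetical example: 2 row categories, 3 column categories,
# and an illustrative chi-square statistic of 10.30.
df = degrees_of_freedom(2, 3)
decision = rejects_independence(10.30, 2, 3)

print("Degrees of freedom:", df)
print("Reject independence at 5%:", decision)
```

With 2 degrees of freedom the 5% critical value is 5.991, so a statistic above that value would lead to rejecting the hypothesis that the two variables are independent.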