**Subject 3. Summarizing Data Using Frequency Distributions**

Very often, the data available is vast, leading to a situation where dealing with individual numbers becomes laborious and messy. In such circumstances, it is neater and more convenient to summarize results into what is known as a frequency table. The data in the display is called a frequency distribution.

An **interval**, also called a **class**, is a set of values within which an observation falls.

- Each interval has a lower limit and an upper limit.
- Intervals must be all-inclusive and non-overlapping.

A **frequency distribution** is a tabular display of data categorized into a small number of non-overlapping intervals. Note that:

- Each observation can only lie in one interval.
- Taken together, the intervals cover the whole population.
- The range for an interval is unique. This means a value (observation) can only fall into one interval.

It is important to consider the number of intervals to be used. If too few intervals are used, too much data may be summarized and we may lose important characteristics; if too many intervals are used, we may not summarize enough.

A frequency distribution is constructed by dividing the scores into intervals and counting the number of scores in each interval. The actual number of scores and the percentage of scores in each interval are displayed. This helps in the analysis of large amounts of statistical data, and it works with all types of measurement scales.

**Absolute frequency** is the actual number of observations in a given interval. **Relative frequency** is the result of dividing the absolute frequency of each return interval by the total number of observations. **Cumulative absolute frequency** and **cumulative relative frequency** are the results from cumulating the absolute and relative frequencies as we move from the first to the last interval.
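The four columns can be derived from the absolute frequencies alone. The following is a minimal sketch; the counts in `absolute` are hypothetical and do not come from the text.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for five intervals (not from the text).
absolute = [3, 7, 12, 6, 2]
total = sum(absolute)

# Relative frequency: each absolute frequency divided by the total.
relative = [count / total for count in absolute]

# Cumulative columns: running totals from the first to the last interval.
cumulative_absolute = list(accumulate(absolute))
cumulative_relative = [c / total for c in cumulative_absolute]

for row in zip(absolute, relative, cumulative_absolute, cumulative_relative):
    print("{:>3} {:>6.3f} {:>3} {:>6.3f}".format(*row))
```

The last entry of the cumulative absolute column equals the total number of observations, and the last entry of the cumulative relative column equals 1.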

Organizing data into a frequency distribution involves the following steps:

- Identify the highest and lowest values of the observations.
- Set up classes (groups into which data is divided). The classes must be mutually exclusive and of equal size.
- Add up the number of observations and assign each observation to its class.
- Count the number of observations in each class. This is called the class frequency.
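The steps above can be sketched as follows. This is only an illustration under stated assumptions: the `scores` list is hypothetical, the helper names `build_classes` and `class_frequencies` are invented for this example, and each class is treated as half-open (lower limit included, upper limit excluded) so that no value can fall into two classes.

```python
def build_classes(values, num_classes):
    """Return equal-width classes [lower, upper) covering all integer values."""
    lowest, highest = min(values), max(values)
    # Round the width up so the last class reaches past the highest value.
    width = (highest - lowest) // num_classes + 1
    return [(lowest + i * width, lowest + (i + 1) * width)
            for i in range(num_classes)]


def class_frequencies(values, classes):
    """Assign each observation to its class and count per class."""
    counts = [0] * len(classes)
    for v in values:
        for i, (lower, upper) in enumerate(classes):
            if lower <= v < upper:
                counts[i] += 1
                break
    return counts


# Hypothetical integer scores (not from the text).
scores = [52, 67, 71, 58, 90, 84, 76, 63, 69, 95, 55, 73, 81, 78, 66]
classes = build_classes(scores, 4)
counts = class_frequencies(scores, classes)
print(classes)
print(counts)
```

Because the classes are mutually exclusive and together cover the whole range, the class frequencies add up to the number of observations.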

The **relative frequency** for a class is calculated by dividing the number of observations in a class by the total number of observations and converting this figure to a percentage (multiplying the fraction by 100). Simply, relative frequency is the percentage of total observations falling within each interval. It is another way of analyzing data; it tells us, for each class, what proportion (or percentage) of data falls in that class.

Let's look at an example.

The following table shows the holding period returns of a portfolio of 40 stocks.

The highest HPR is 32% and the lowest one is -27%. Let's use 6 non-overlapping intervals, each with a width of 10%. The first interval starts at -27% and the last one ends at 33%. Therefore, the entire range of the HPRs is covered.
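As a quick check of the interval layout just described, the edges can be generated from the starting point, the width, and the number of intervals. This sketch uses only the figures quoted above (start at -27, width 10, six intervals); treating each interval as half-open is an assumption made here so that a value on an edge belongs to exactly one interval.

```python
lowest_edge, width, num_intervals = -27, 10, 6

# Seven edges define six adjacent intervals.
edges = [lowest_edge + i * width for i in range(num_intervals + 1)]

# Pair consecutive edges: each interval is [lower, upper) by assumption.
intervals = list(zip(edges[:-1], edges[1:]))

for lower, upper in intervals:
    print(f"{lower}% <= HPR < {upper}%")

# The lowest (-27%) and highest (32%) returns both fall inside the covered range.
covers_range = edges[0] <= -27 and 32 < edges[-1]
print(covers_range)
```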

*Hint: If, in an examination, your relative frequency column does not sum to 1 (or 100%), you know that you have made a mistake.*

#### Practice Question 1

For the grouped frequency distribution shown below, the class width is ______.

A. 18

B. 5

C. 6

Correct Answer: C

The class width is the distance from the lower class limit of one class to the lower class limit of the next class. Here, the class width is 28 - 22 = 6.

#### Practice Question 2

For the grouped frequency distribution shown below, the class boundaries for class #3 are ______.

A. 34 - 39

B. 33.5 - 39.5

C. 34.5 - 38.5

Correct Answer: B

The class boundaries lie midway between the upper limit of one class and the lower limit of the next (on a bar graph they are the edges of the bars). For class #3, halfway between 33 and 34 is 33.5 and halfway between 39 and 40 is 39.5. So, the class boundaries are 33.5 - 39.5.

#### Practice Question 3

For the grouped frequency distribution shown below, which of the following is true?

A. The minimum data value is 22 and the maximum data value is 57.

B. The number of scores is 29.

C. The class mark for class #1 is 25.

Correct Answer: B

The only true statement is that there are 29 scores. You cannot tell the maximum or minimum from observing a grouped frequency distribution. The class mark for class #1 is 24.5.
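The class width, class boundaries, and class mark used in Practice Questions 1 to 3 can be computed from the class limits. The table itself is not reproduced here, so the limits below are an assumption inferred from the explanations (lower limits 22 and 28, class #3 running from 34 to 39, class mark 24.5 for class #1); only the first three classes are listed. The function names are invented for this sketch.

```python
# Assumed class limits (lower, upper), inferred from the explanations.
classes = [(22, 27), (28, 33), (34, 39)]


def class_width(classes):
    """Distance between the lower limits of two consecutive classes."""
    return classes[1][0] - classes[0][0]


def class_boundaries(classes, index):
    """Boundaries sit halfway across the gap between adjacent class limits."""
    gap = classes[1][0] - classes[0][1]  # 1 for these integer limits
    lower, upper = classes[index]
    return lower - gap / 2, upper + gap / 2


def class_mark(classes, index):
    """Midpoint of the lower and upper class limits."""
    lower, upper = classes[index]
    return (lower + upper) / 2


print(class_width(classes))          # width shared by all classes
print(class_boundaries(classes, 2))  # class #3
print(class_mark(classes, 0))        # class #1
```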

#### Practice Question 4

A researcher has decided to create a frequency distribution using the following classes: 30-45, 45-60, 60-75, 76 & over

The selection of this set involves which of the following?

I. non-overlapping classes

II. open-ended classes

III. equal class intervals

Correct Answer: II

Note that "76 & over" is an open-ended class. Also, the classes are overlapping (e.g., the classes "30-45" and "45-60" have the point "45" in common).

#### Practice Question 5

When a researcher uses the classes 129-147, 147-165, and 165-183 to create a distribution, he is violating which of the following suggested practices?

A. Avoid the use of open-ended classes

B. Avoid the use of uneven or non-standard classes

C. Avoid the use of overlapping classes

Correct Answer: C

The classes are overlapping (e.g., the classes "147-165" and "165-183" have the point "165" in common). There are no open-ended classes and the classes have the same class interval.
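A check for overlapping classes is easy to automate. The sketch below assumes that both class limits are inclusive, which is how the classes in this question are read; `overlapping_pairs` is a name invented for this example, and the corrected limits in `fixed` are just one possible repair.

```python
def overlapping_pairs(classes):
    """Return adjacent class pairs that share at least one value.

    Each class is a (lower, upper) tuple with both limits inclusive.
    """
    ordered = sorted(classes)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] <= a[1]]


# Classes from the question: 147 and 165 each belong to two classes.
original = [(129, 147), (147, 165), (165, 183)]
print(overlapping_pairs(original))

# One possible repair: end each class one unit before the next one starts.
fixed = [(129, 146), (147, 164), (165, 182)]
print(overlapping_pairs(fixed))
```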

#### Practice Question 6

Frequency distributions are useful for ALL BUT which of the following objectives?

A. Investigation of characteristics of each observation

B. Summarization of data

C. Condensation of large sets into smaller sets

Correct Answer: A

Frequency distributions are also useful in illustrating the amount of variability in data.

#### Practice Question 7

Which of the following statements is not correct?

A. The sum of the absolute frequency is the total number of observations.

B. Frequency distributions do not work with ratio scales.

C. Each observation is contained in one and only one category.

Correct Answer: B

Frequency distributions are used to summarize rates of return. They do work with ratio scales. Frequency distributions can be used with any kind of measurement scale.

#### Practice Question 8

When grouping data with the purpose of constructing a frequency distribution, the suggested number of equal-sized classes for a data set with 210 observations is ______.

A. 6

B. 7

C. 8

Correct Answer: C

When constructing a frequency distribution, you may approximate the number of classes $k$ with the smallest integer $k$ such that $2^k \geq n$, where $n$ is the number of observations.

Therefore, we are looking for the smallest integer $k$ such that $2^k \geq 210$. Since $2^7 = 128 < 210$ and $2^8 = 256 \geq 210$, we obtain $k = 8$.
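The rule can be written as a short loop. This is a minimal sketch of the rule of thumb stated above; the function name `suggested_num_classes` is made up for this example.

```python
def suggested_num_classes(num_observations):
    """Smallest integer k such that 2**k >= num_observations."""
    if num_observations < 1:
        raise ValueError("need at least one observation")
    k = 0
    while 2 ** k < num_observations:
        k += 1
    return k


print(suggested_num_classes(210))  # 2**7 = 128 is too small, 2**8 = 256 is enough
```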

#### Practice Question 9

The figures below describe the total return rates (in percentages) for the past three years for 15 top-performing utilities mutual funds.

Biggest return: 15.1

Smallest return: 3.1

Median return: 9.5

What interval length should be used, if one is desired, to construct a frequency distribution with four equally spaced intervals?

A. 4

B. 3.75

C. 3

Correct Answer: C

1. Determine how many classes we need. Since $2^4 = 16 \geq 15$, we need 4 classes.

2. Interval: (15.1 - 3.1)/4 = 3
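The same arithmetic as a small sketch. The helper name `interval_width` is invented here, and the result is rounded before printing only to hide floating-point noise.

```python
def interval_width(largest, smallest, num_intervals):
    """Range of the data divided by the number of equal-width intervals."""
    return (largest - smallest) / num_intervals


width = interval_width(15.1, 3.1, 4)
print(round(width, 2))
```

With a width of exactly 3 and the first interval starting at 3.1, the last interval ends at 15.1, so in practice the upper limit of the last interval has to be treated as inclusive for the biggest return to be counted.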

#### Practice Question 10

A cumulative frequency distribution of days absent during a calendar year by employees of a manufacturing company is shown below.

How many employees were absent fewer than six days?

A. 46

B. 60

C. 91

Correct Answer: A

This is the difference between the cumulative numbers for the 0-2 group and the 6-8 group (60 - 14 = 46). There are 46 people between those two groups and they were absent fewer than six days.

#### Practice Question 11

A list of the percentages of the total number of cases observed at each score value or each subinterval of scores is ______.

A. a histogram

B. a relative frequency distribution

C. a cumulative frequency polygon

Correct Answer: B

#### Practice Question 12

A cumulative frequency distribution of days absent during a calendar year by employees of a manufacturing company is shown below.

How many employees were absent more than five days?

A. 22

B. 31

C. 14

Correct Answer: C

The cumulative number for class 6-8 will give us the number of people absent more than five days: 14.

#### Practice Question 13

A data set of 105 observations is organized in a relative frequency distribution into 9 classes. The sum of the relative frequencies across all the classes equals ______.

A. 1

B. 100

C. 105

Correct Answer: A

Relative class frequencies are proportions of the total, so across all classes they must sum to 100%, which is the same as summing to 1.00.

