Subject 1. The Nature of Statistics
Statistics can refer to numerical data (e.g., a company's average revenue for the past 20 years). It can also refer to methods of collecting, classifying, analyzing, and interpreting numerical data. Statistical methods provide a powerful set of tools for making decisions in business and other fields.
Statistics involves two different processes:
- Describing sets of data. Descriptive statistical methods can be used to describe the important aspects of data sets that have been collected. This reading will focus on the use of descriptive statistics to consolidate a mass of numerical data into useful information.
- Drawing conclusions (making estimates, judgments, predictions, etc.). Inferential statistical methods can be used to draw conclusions about a large group from a smaller group actually observed.
We use statistical methods to analyze data. Since the amount of information available may be vast, it may be extremely time-consuming and expensive to collect all the necessary data. For instance, suppose we are interested in the durability of tennis balls. Theoretically, in order to carry out an accurate assessment, we would need to collect large quantities of all different makes of tennis balls from all over the world. Clearly, this is not practical; aside from taking up lots of time, it would be cost-prohibitive to purchase all the balls we would need for our study. A more practical solution would be to use a sample.
A population consists of an entire set of objects, observations, or scores that have something in common. It comprises every possible member of the specified group. In our example above, the population of tennis balls consists of every tennis ball that has ever been manufactured anywhere in the world. This is a huge number of tennis balls. Another example of a population would be all males between the ages of 15 and 18.
A sample is a subset of a population. The sample consists of some of the members of the population. Since it is usually impractical (or too expensive or time-consuming) to test every member of a population, using data gathered from a sample of the population is typically the best approach available for describing that population.
In our example above, a sample might be a selection of 1,000 tennis balls of various makes collected from different sources. It would be a virtually impossible task to collect every possible tennis ball in the world; this sample size provides a manageable number to work with as well as a substantial amount of possible data.
Before we move on, there are several points worth noting:
- Don't be fooled by the word "population." This does not necessarily refer to people. As with the example above, we can have a population of tennis balls. A population can consist of anything, living or not.
- Although populations are often vast, they can also be of manageable size. For example, the population of even numbers between 1 and 9 would comprise the numbers 2, 4, 6 and 8. In this case, it is possible to sample the entire population and get accurate results. This is rare, however, and for your purposes, populations can generally be considered to be vast.
- In general, the bigger the sample, the better your results will be (because you are using data from more of the population for analysis). However, this point can present difficulties, as you will see when we study variance and standard deviation later.
- The ideal process would be to select a sample that is "representative" of the population (a sample that takes into account extreme values on both sides but contains many "average" values). In this way, the results that we get will be more meaningful. Because we frequently don't know about the exact values of a population (which is why we sample in the first place), we will never really know if our sample is truly representative or not. It's all we have to work with, however, so it's all we can use.
- Some populations are only hypothetical. Consider an experimenter interested in the possible effectiveness of a new teaching method for reading. He or she might define a population as the reading achievement scores that would result if all 6-year olds in the U.S. were taught with this new method. The population is hypothetical in the sense that there is not a group of students who have been taught using the new method; the population consists of the scores that would be obtained if they were taught with this method.
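The small population of even numbers between 1 and 9 mentioned above is convenient for a quick illustration. The following is a minimal sketch, not part of the original notes: it computes the mean of the entire population and the mean of every possible sample of size 2. The variable names and the choice of a sample size of 2 are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean

# Population: every even number between 1 and 9.
population = [n for n in range(1, 10) if n % 2 == 0]

# Because the population is tiny, we can measure every member directly.
population_mean = mean(population)

# Every possible sample of size 2 (an illustrative choice), with its mean.
sample_means = {pair: mean(pair) for pair in combinations(population, 2)}

for pair, sample_mean in sample_means.items():
    print(pair, sample_mean)
print("population mean:", population_mean)
```

The population mean is 5, while the individual sample means range from 3 to 7. A single small sample can therefore land some distance from the population value, which is why sample size and representativeness matter.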
Both large groups of data (populations) and smaller groups (samples) have values associated with them, such as the average of all values in a sample and the average of all population values. Values from a population are called parameters, and values from a sample are called statistics.
- A parameter is a numerical quantity measuring some aspect of a population of scores.
- The mean, for example, is a measure of central tendency.
- Greek letters are used to designate parameters.
- Parameters are rarely known and are usually estimated by statistics computed in samples.
- Populations can have many parameters, but investment analysts are usually only concerned with a few, such as the mean return or the standard deviation of returns.
- Estimates of these parameters taken from a sample are called statistics. Much of the field of statistics is devoted to drawing inferences from a sample concerning the value of a population parameter.
- Most commonly, statistics refers to numerical data such as a company's earnings per share or average returns over the past five years.
- Statistics can also refer to the process of collecting, organizing, presenting, analyzing, and interpreting numerical data for the purpose of making decisions.
Hint: One way to easily remember these terms is to recall that "population" and "parameter" both start with a "p," and "sample" and "statistic" both start with an "s."
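As an illustration of the notation, the mean can be written once as a parameter and once as a statistic. The symbols below follow a common convention and are an assumption here, since the notes do not define them: N is the number of members in the population, n is the number of members in the sample, the Greek letter mu denotes the population mean, and x-bar denotes the sample mean.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} X_i
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

The two formulas have the same form; the difference is whether the sum runs over every member of the population or only over the members of the sample.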
Inferential statistics generally require that sampling be random, although some types of sampling (such as those used in voter polling) instead seek to make the sample as representative of the population as possible by choosing a sample that resembles the population on the most important characteristics.
A typical statistical procedure:
- Define the population and identify the parameter(s) of interest.
- Draw a sample from the population.
- Determine the corresponding statistic(s) of the sample and use it (or them) to estimate the parameter(s) of the population.
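The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed method: the population is simulated (100,000 hypothetical durability scores drawn from a normal distribution with mean 50 and standard deviation 10), and the sample size of 1,000 echoes the tennis ball example. In practice the population mean would be unknown; it is computed here only so the estimate can be compared with it.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Step 1: define the population and the parameter of interest.
# Hypothetical population: 100,000 simulated durability scores.
population = [rng.gauss(50, 10) for _ in range(100_000)]
population_mean = mean(population)  # the parameter (normally unknown)

# Step 2: draw a simple random sample from the population.
sample = rng.sample(population, 1_000)

# Step 3: compute the statistic and use it to estimate the parameter.
sample_mean = mean(sample)

print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"population mean (parameter): {population_mean:.2f}")
```

With a random sample of this size, the sample mean should typically fall close to the population mean, though the two will rarely match exactly.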
Practice Question 1
Under which measurement scale is data most likely categorized without being ranked?
C. nominal
Correct Answer: C
Data is categorized, but not ranked, under a nominal scale. Under an ordinal scale, data is ranked, while under an interval scale, data is ranked and separated by equal intervals.