Statistics can refer to numerical data (e.g., a company's average revenue for the past 20 years). It can also refer to methods of collecting, classifying, analyzing, and interpreting numerical data. Statistical methods provide a powerful set of tools for making decisions in business and other fields.
Statistics involves two different processes:
We use statistical methods to analyze data. Since the amount of information available may be vast, collecting all the necessary data may be extremely time-consuming and expensive. For instance, suppose we are interested in the durability of tennis balls. Theoretically, in order to carry out an accurate assessment, we would need to collect large quantities of all different makes of tennis balls from all over the world. Clearly, this is not practical; aside from taking up lots of time, it would be cost-prohibitive to purchase all the balls we would need for our study. A more practical solution would be to use a sample.
A population consists of an entire set of objects, observations, or scores that have something in common. It comprises every possible member of the specified group. In our example above, the population of tennis balls consists of every tennis ball that has ever been manufactured anywhere in the world. This is a huge number of tennis balls. Another example of a population would be all males between the ages of 15 and 18.
A sample is a subset of a population. The sample consists of some of the members of the population. Since it is usually impractical (or too expensive or time-consuming) to test every member of a population, using data gathered from a sample of the population is typically the best approach available for describing that population.
In our example above, a sample might be a selection of 1,000 tennis balls of various makes collected from different sources. It would be a virtually impossible task to collect every possible tennis ball in the world; this sample size provides a manageable number to work with as well as a substantial amount of possible data.
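As a rough illustration of drawing such a sample, the sketch below builds a made-up population of tennis balls and selects 1,000 of them at random. The population size, the manufacturer names, and the variable names are all assumptions invented for this example; only the sample size of 1,000 comes from the text.

```python
import random

# Hypothetical population: 50,000 tennis balls, each with an ID and one of
# a few made-up manufacturer names (both are assumptions for illustration).
rng = random.Random(42)  # fixed seed so the example is repeatable
makes = ["MakeA", "MakeB", "MakeC", "MakeD"]
population = [(ball_id, rng.choice(makes)) for ball_id in range(50_000)]

# Simple random sample without replacement: every ball has the same chance
# of being selected, and no ball is selected twice.
sample = rng.sample(population, k=1_000)

print(f"population size: {len(population)}, sample size: {len(sample)}")
```

The sample is a subset of the population: every sampled ball also appears in the population list, but most of the population is never examined.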
Before we move on, there are several points worth noting:
Both large groups of data (populations) and smaller groups (samples) have values associated with them, such as the average of all values in a sample and the average of all population values. Values from a population are called parameters, and values from a sample are called statistics.
A statistic is defined as a numerical quantity (such as the mean) calculated from a sample. It has two different meanings.
Note that we will always know the exact composition of our sample, and by definition, we will always know the values within our sample. Ascertaining this information is the purpose of samples. Sample statistics will always be known, and can be used to estimate unknown population parameters.
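The relationship between a statistic and a parameter can be sketched with simulated data. In the example below, the durability scores, their distribution, and the sizes are assumptions chosen purely for illustration. In a real study the population mean would be unknown; it is computed here only because the population is simulated, which lets us see how close the sample estimate comes.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical population: durability scores for 100,000 tennis balls
# (normally distributed values are an assumption, not real data).
population = [rng.gauss(50.0, 10.0) for _ in range(100_000)]

# Parameter: a value describing the whole population.
population_mean = statistics.mean(population)

# Statistic: the same kind of value, computed from a sample only.
sample = rng.sample(population, k=1_000)
sample_mean = statistics.mean(sample)

# How far the estimate (statistic) is from the quantity it estimates.
estimation_error = sample_mean - population_mean

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
print(f"estimation error:            {estimation_error:.2f}")
```

The sample mean will generally not equal the population mean exactly, but with a random sample of this size it is typically close.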
Hint: One way to easily remember these terms is to recall that "population" and "parameter" both start with a "p," and "sample" and "statistic" both start with an "s."
Inferential statistics generally require that sampling be random, although some types of sampling (such as those used in voter polling) instead seek to make the sample as representative of the population as possible by choosing a sample that resembles the population on the most important characteristics.
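One common way to make a sample resemble the population on a chosen characteristic is proportional stratified sampling, sketched below. This is only one possible technique, not necessarily the one any particular poll uses. The age groups, their sizes, and the helper `stratified_sample` are hypothetical names and numbers made up for this example.

```python
import random
from collections import Counter

rng = random.Random(1)  # fixed seed so the example is repeatable

# Hypothetical voter population with a made-up age-group split (30/50/20 %).
population = (
    [("18-34", i) for i in range(3_000)]
    + [("35-64", i) for i in range(5_000)]
    + [("65+", i) for i in range(2_000)]
)


def stratified_sample(units, key, total, rng):
    """Randomly sample each stratum in proportion to its share of the units."""
    strata = {}
    for unit in units:
        strata.setdefault(key(unit), []).append(unit)
    chosen = []
    for members in strata.values():
        # Rounding can make the overall size differ slightly from `total`
        # when the proportions do not divide evenly.
        k = round(total * len(members) / len(units))
        chosen.extend(rng.sample(members, k))
    return chosen


sample = stratified_sample(population, key=lambda u: u[0], total=500, rng=rng)
counts = Counter(group for group, _ in sample)
print(dict(counts))
```

Within each group the selection is still random; the stratification only fixes how many members each group contributes, so the sample's age mix matches the population's.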
A typical statistical procedure: