Subject 7. Common Biases in Sampling Methods

As mentioned earlier, if a sample is poorly chosen, the conclusions drawn from it may be wrong.

Several types of bias can creep into samples. It is important to be aware of them and to be able to comment on their possible presence in the data where appropriate.

Data-snooping bias is the bias in the inference drawn as a result of prying into the empirical results of others to guide your own analysis.

Finding seemingly significant but in fact spurious patterns in data is a serious problem in financial analysis. Although it afflicts all non-experimental sciences, data-snooping is particularly problematic for financial analysis because of the large number of empirical studies performed on the same datasets. Given enough time, enough attempts, and enough imagination, almost any pattern can be teased out of any dataset. In some cases, these spurious patterns are statistically small, almost unnoticeable in isolation. But because small effects in financial calculations can often lead to very large differences in investment performance, data-snooping biases can be surprisingly substantial.
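To put a number on the "enough attempts" point, here is a minimal sketch of how quickly the chance of at least one false positive grows with the number of tests run. It assumes the tests are independent and use a fixed significance level, which is an idealization, since tests run on the same dataset are usually correlated. The function name prob_spurious_hit is hypothetical.

```python
def prob_spurious_hit(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests
    rejects a true null hypothesis at significance level `alpha`.

    Assumption: tests are independent, which real studies on a shared
    dataset generally are not.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** num_tests


# Chance of finding at least one "significant" pattern purely by luck.
for k in (1, 10, 50, 100):
    print(f"{k:>3} tests: {prob_spurious_hit(k):.1%}")
```

Under these assumptions, the chance of a spurious finding is about 40% after 10 tests and above 99% after 100.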

For example, after examining the empirical evidence from 1986 to 2002, Professor Minard concludes that a growth investment strategy produces superior investment performance. After reading about Professor Minard's study, Monica decides to research growth versus value investing using the same or related historical data. Monica's research is subject to data-snooping bias because, among other things, the pattern Professor Minard found may be spurious, and re-examining the same data would simply reproduce it.

The best way to avoid data-snooping bias is to examine new data. However, data-snooping bias is difficult to avoid because investment analysis is typically based on historical or hypothesized data.

Data-snooping bias can easily lead to data-mining bias.

Data-mining is the practice of finding forecasting models by extensive searching through databases for patterns or trading rules (i.e., repeatedly "drilling" in the same data until you find something). It has a very specific definition: continually mixing and matching the elements of a database until one "discovers" two or more data series that are highly correlated. Data-mining also refers more generically to any of a number of practices in which data can be tortured into confessing anything.

Two signs may indicate the existence of data-mining in research findings about profitable trading strategies:

  • Many of the variables actually used in the research are not reported. This may indicate that the researchers searched through many unreported variables.
  • There is no plausible economic theory available to explain why these strategies work.

To avoid data-mining, analysts should use out-of-sample data to test a potentially profitable trading rule. That is, analysts should test the trading rule on a data set other than the one used to establish the rule.
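The out-of-sample check can be sketched with a small simulation. This is an illustration under stated assumptions, not a real backtest: every "trading rule" here is pure noise with a true mean return of zero, and the function name mine_rules, the 5% monthly volatility, the 60/60 month split, and the seed are all hypothetical choices.

```python
import random
from statistics import mean


def mine_rules(num_rules=200, months=120, split=60, seed=7):
    """Pick the best in-sample rule from pure-noise candidates,
    then re-test that same rule on held-out months."""
    rng = random.Random(seed)
    # Each hypothetical rule is a series of monthly returns whose
    # true mean is zero, so any apparent edge is luck.
    rules = [
        [rng.gauss(0.0, 0.05) for _ in range(months)]
        for _ in range(num_rules)
    ]
    # "Mine" the first `split` months for the best-looking rule.
    in_sample = [mean(r[:split]) for r in rules]
    best = max(range(num_rules), key=lambda i: in_sample[i])
    return {
        "best_rule": best,
        "in_sample_mean": in_sample[best],
        # Evaluate the chosen rule on months it was not selected on.
        "out_of_sample_mean": mean(rules[best][split:]),
        "all_in_sample_means": in_sample,
    }


result = mine_rules()
print(f"in-sample mean monthly return:     {result['in_sample_mean']:.4f}")
print(f"out-of-sample mean monthly return: {result['out_of_sample_mean']:.4f}")
```

Because the winner is selected for its in-sample luck, its in-sample mean looks attractive by construction, while its out-of-sample mean has no reason to differ from zero. A rule that only works on the data used to find it is a warning sign of data-mining.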

Sample selection bias occurs when data availability leads to certain assets being excluded from the analysis. The same problem arises in survey research: the discrete choice method has become a popular tool for assessing the value of non-market goods, but the surveys used in these studies frequently suffer from high non-response rates, which can lead to significant bias in parameter estimates and in the estimate of the mean.

Survivorship bias is the most common type of sample selection bias. It occurs when studies are conducted on databases that have eliminated all companies that have ceased to exist (often due to bankruptcy). The findings from such studies will most likely be upwardly biased, since the surviving companies look better than those that no longer exist. For example, many mutual fund databases provide historical data only for funds that are currently in existence. As a result, funds that have ceased to exist due to closure or merger do not appear in these databases. Generally, funds that have ceased to exist have lower returns than the surviving funds. Therefore, an analysis of a mutual fund database with survivorship bias will overestimate the average mutual fund return because the database includes only the better-performing funds. Return data on stocks listed on an exchange are another example: it is difficult to collect information on delisted companies, and these companies have often performed poorly.
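The overestimate can be seen in a tiny worked example. The five funds and their returns below are made-up numbers chosen for easy arithmetic, and the field names are hypothetical; the point is only the comparison between the full universe and the survivors.

```python
from statistics import mean

# Hypothetical annual returns (percent). `alive` marks funds that
# still appear in a survivors-only database.
funds = [
    {"name": "Fund A", "return_pct": 8.0, "alive": True},
    {"name": "Fund B", "return_pct": 6.0, "alive": True},
    {"name": "Fund C", "return_pct": -4.0, "alive": False},   # closed
    {"name": "Fund D", "return_pct": 10.0, "alive": True},
    {"name": "Fund E", "return_pct": -10.0, "alive": False},  # merged away
]


def average_return(records):
    """Equal-weighted average return of the given fund records."""
    return mean(r["return_pct"] for r in records)


full_universe = average_return(funds)
survivors_only = average_return([f for f in funds if f["alive"]])

print(f"all funds:      {full_universe:.1f}%")   # 2.0%
print(f"survivors only: {survivors_only:.1f}%")  # 8.0%
```

Dropping the two defunct funds raises the measured average from 2.0% to 8.0%, even though no investor holding all five funds earned the higher figure.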

Look-ahead bias exists when studies assume that fundamental information is available when it is not. For example, researchers often assume that an investor had the prior year's annual earnings data in January; in reality, the data might not be released until March. This usually biases results upwards, because the tested strategy acts on information the market did not yet have.
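One way to guard against look-ahead bias in a backtest is to key every data point by its release date rather than by the period it describes. The sketch below assumes hypothetical earnings records and a hypothetical helper, latest_known_eps; the dates and EPS figures are invented for illustration.

```python
from datetime import date

# Hypothetical records: the fiscal year end, the date the figure was
# actually published, and earnings per share.
earnings = [
    {"fiscal_year_end": date(2019, 12, 31), "released": date(2020, 3, 10), "eps": 2.10},
    {"fiscal_year_end": date(2020, 12, 31), "released": date(2021, 3, 9), "eps": 2.60},
]


def latest_known_eps(records, as_of):
    """Return the most recent EPS published on or before `as_of`,
    or None if nothing had been released yet."""
    known = [r for r in records if r["released"] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r["fiscal_year_end"])["eps"]


# In mid-January 2021 the fiscal 2020 figure is not yet public, so a
# realistic test may only use the fiscal 2019 number.
print(latest_known_eps(earnings, date(2021, 1, 15)))  # 2.1
```

A test that used the 2.60 figure for a January 2021 decision would be trading on data that did not exist until March.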

Time period bias occurs when a test design is based on a time period that may make the results time-period specific. Even the worst performers have months or even years in which they look wonderful. After all, stopped clocks are right twice a day. To eliminate strategies that have just been lucky, research must encompass many years. However, if the time period is too long, the fundamental economic structure may have changed during the time frame, resulting in two data sets that reflect different relationships.
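A simple check for time-period dependence is to recompute the result over different windows and see how much it moves. The ten annual returns below are hypothetical and deliberately constructed so that the early years are strong and the later years weak; window_means is an illustrative helper, not a standard function.

```python
from statistics import mean

# Hypothetical annual returns (percent) for one strategy over ten years.
annual_returns = [12, 15, 9, 14, 10, -3, 2, -6, 1, -4]


def window_means(returns, window):
    """Mean return for every run of `window` consecutive years."""
    return [
        mean(returns[i:i + window])
        for i in range(len(returns) - window + 1)
    ]


five_year = window_means(annual_returns, 5)
print("best 5-year window: ", max(five_year))           # 12
print("worst 5-year window:", min(five_year))           # -2
print("full 10-year mean:  ", mean(annual_returns))     # 5
```

A study limited to the first five years would report a 12% average return, one limited to the last five would report -2%, and the full period gives 5%. Which of these is most informative depends on whether the underlying economic structure was stable across the whole span.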

User Contributed Comments 9

achu: DataSnoop,DataMining, SelectiveSampling (sample selection), SURVIVORSHIP, LookAhead, TimePeriod (broken clock) biases.
gill15: Really dont wanna do this section. What are the odds of it showing up. Lets see 18 units - 67 sections - I say about 8 topics within those 67 units so 500 total topics. Odds of not being on exam is (499/500) per question -- 240 questions - (499/500)^240 = 61.85% chance of not being on.

I'm Skipping. I'm already an actuary i dont even know why i'm here.
gill15: Look at the gill loser guy above....showing off with his math....oh wait that was me two months ago...

and now I`m here doing the section....lazy bastard....why didnt you just do it then..
robertucla: Gill15, Nervous breakdown?
schweitzdm: Data Snooping is briefly mentioned in two footnotes in the entire quant book.

SKIP that concept.
guest: data mining imp.
davidkhang: Data Snoop Doggy Dizzle....
cfastudypl: This section is rather interesting even for future research effort or work besides passing the CFA exams.
mtsimone: I'm puzzled by the comments. If you work w/ portfolio analysis this is immensely practical. But I guess if you want to simply pass the test, why not skip it...