AnalystNotes.com
  

10.i. describe the properties of Student's t-distribution and calculate and interpret its degrees of freedom;

10.j. calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size;
CFA Program Curriculum (2014), Volume 1, page 554.


Confidence intervals are typically constructed by using the following structure:

Confidence Interval = Point Estimate ± Reliability Factor × Standard Error

  • The point estimate is the value of a sample statistic used to estimate the population parameter.
  • Reliability factor is a number based on the sampling distribution of the point estimate and the degree of confidence (1 - α).
  • Standard error refers to the standard error of the sample statistic that is used to produce the point estimate.
Whatever the distribution of the population, the sample mean is always the point estimate used to construct the confidence intervals for the population mean. The reliability factor and the standard error, however, may vary depending on three factors:
  1. Distribution of population: normal or non-normal.
  2. Population variance: known or unknown.
  3. Sample size: large or small.
z-Statistic: a standard normal random variable

If a population is normally distributed with a known variance, z-statistic is used as the reliability factor to construct confidence intervals for the population mean.

In practice, the population standard deviation is rarely known. However, learning how to compute a confidence interval when the standard deviation is known is an excellent introduction to how to compute a confidence interval when the standard deviation has to be estimated.

Three values are used to construct a confidence interval for μ:

  1. The sample mean (m);
  2. The value of z (which depends on the level of confidence), and
  3. The standard error of the mean (σm).
The confidence interval has m for its center and extends a distance equal to the product of z and σm in both directions. Therefore, the formula for a confidence interval is:

m - z σm <= μ <= m + z σm

For a 100(1 - α)% confidence interval for the population mean, the z-statistic to be used is zα/2, the point of the standard normal distribution such that α/2 of the probability falls in the right-hand tail.

Effectively, 100(1 - α)% of the area falls in the center of the graph, symmetrically around the mean. This leaves a total of 100α% of the area in the two tails combined, or 100(α/2)% in each tail.

Commonly used reliability factors are as follows:

  • 90% confidence intervals: z0.05 = 1.645. α is 10%, with 5% in each tail.
  • 95% confidence intervals: z0.025 = 1.96. α is 5%, with 2.5% in each tail.
  • 99% confidence intervals: z0.005 = 2.575 (often rounded to 2.58). α is 1%, with 0.5% in each tail.
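
The reliability factors listed above can also be reproduced without a z table. The following is a minimal sketch using Python's standard library; the function name z_reliability_factor is chosen here for illustration and is not part of the original notes.

```python
from statistics import NormalDist


def z_reliability_factor(confidence: float) -> float:
    """Return z_(alpha/2) for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence
    # alpha/2 sits in the right-hand tail, so look up the (1 - alpha/2) quantile
    # of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_reliability_factor(level):.3f}")
```

The 99% value comes out near 2.576, which is why tables and texts show it variously as 2.575 or 2.58.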
Example

Assume that the standard deviation of SAT verbal scores in a school system is known to be 100. A researcher wishes to estimate the mean SAT score and compute a 95% confidence interval from a random sample of 10 scores.

The 10 scores are: 320, 380, 400, 420, 500, 520, 600, 660, 720, and 780. Therefore, m = 530, N = 10, and σm = 100 / 10^(1/2) = 31.62. The value of z for the 95% confidence interval is the number of standard deviations one must go from the mean (in both directions) to contain .95 of the scores.

It turns out that one must go 1.96 standard deviations from the mean in both directions to contain .95 of the scores. The value of 1.96 was found using a z table. Since each tail is to contain .025 of the scores, you find the value of z for which 1 - 0.025 = 0.975 of the scores are below. This value is 1.96.

All the components of the confidence interval are now known: m = 530, σm = 31.62, z = 1.96.
Lower limit = 530 - (1.96)(31.62) = 468.02
Upper limit = 530 + (1.96)(31.62) = 591.98

Therefore, 468.02 ≤ μ ≤ 591.98. This means that the experimenter can be 95% certain that the mean SAT in the school system is between 468 and 592. This also means if the experimenter repeatedly took samples from the population and calculated a number of different 95% confidence intervals using the sample information, on average 95% of those intervals would contain μ. Notice that this is a rather large range of scores. Naturally, if a larger sample size had been used, the range of scores would have been smaller.

The computation of the 99% confidence interval is exactly the same except that z = 2.576 (2.58 in most z tables) rather than 1.96 is used. With z = 2.576, the 99% confidence interval is: 448.54 <= μ <= 611.46. As it must be, the 99% confidence interval is wider than the 95% confidence interval.
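
The SAT example can be checked with a short script. This is a sketch under the example's own assumptions (normal population, known σ = 100); the function name z_confidence_interval is hypothetical, and the z value is taken from the standard normal quantile function rather than a printed table, so the last digit may differ slightly from hand calculations.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, confidence):
    """Confidence interval for the mean when the population sigma is known."""
    m = mean(sample)                      # point estimate
    se = sigma / sqrt(len(sample))        # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # reliability factor
    return m - z * se, m + z * se


scores = [320, 380, 400, 420, 500, 520, 600, 660, 720, 780]

for level in (0.95, 0.99):
    lower, upper = z_confidence_interval(scores, sigma=100, confidence=level)
    print(f"{level:.0%} interval: {lower:.2f} <= mu <= {upper:.2f}")
```

Raising the confidence level changes only the reliability factor, so the interval widens around the same center of 530.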

Summary of Computations

  1. Compute m = ∑X/N.
  2. Compute σm = σ/N^(1/2)
  3. Find z (1.96 for 95% interval; 2.58 for 99% interval)
  4. Lower limit = m - z σm
  5. Upper limit = m + z σm
  6. Lower limit <= μ <= Upper limit
Assumptions:
  1. Normal distribution
  2. σ is known
  3. Scores are sampled randomly and are independent
There are three other points worth mentioning here:
  1. The point estimate will always lie exactly at the midway mark of the confidence interval. This is because it is the "best" estimate for μ, and so the confidence interval expands out from it in both directions.
  2. The higher the percentage of confidence, the wider the interval will be. This is because as the percentage is increased, a wider interval is needed to give us a greater chance of capturing the unknown population value within that interval.
  3. The width of the confidence interval is always twice the part after the positive or negative sign, that is, twice the reliability factor x standard error. The width is simply the upper limit minus the lower limit.
It is very rare for a researcher wishing to estimate the mean of a population to already know its standard deviation. Therefore, the construction of a confidence interval almost always involves the estimation of both μ and σ.

Student's t-Distribution

When σ is known, the formula m - z σm <= μ <= m + z σm is used for a confidence interval. When σ is not known, sm = s/N^(1/2) (where s is the sample standard deviation and N is the sample size) is used as an estimate of σm. Whenever the standard deviation is estimated, the t rather than the normal (z) distribution should be used. The values of t are larger than the values of z, so confidence intervals when σ is estimated are wider than confidence intervals when σ is known. The formula for a confidence interval for μ when σ is estimated is:

m - t sm <= μ <= m + t sm

where m is the sample mean, sm is an estimate of σm, and t depends on the degrees of freedom and the level of confidence.

The t-distribution is a symmetrical probability distribution defined by a single parameter known as degrees of freedom (df). Each value for the number of degrees of freedom defines one distribution in this family of distributions. Like the standard normal distribution (i.e. the z-distribution), the t-distribution is symmetrical around its mean. The t-distribution has the following characteristics, several of which distinguish it from the standard normal distribution.

  • It is an estimated standardized normal distribution. When n gets larger, t approximates z (s approaches σ).
  • The mean is 0, and the distribution is bell-shaped.
  • There is not one t-distribution, but a family of t-distributions. All t-distributions have the same mean of 0. Standard deviations of these t-distributions differ according to the sample size, n.
  • The shape depends on degrees of freedom (n - 1). The t-distribution is less peaked than a standard normal distribution, and has fatter tails (i.e. more probability in the tails).
  • tα/2 tends to be greater than zα/2 for a given level of significance, α.
  • Its variance is v/(v-2) (for v > 2), where v = n-1. It is always bigger than 1. As v increases, the variance approaches 1.
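
To illustrate the last property, the short sketch below evaluates the variance formula v/(v-2) for a few degrees of freedom. The function name t_variance and the chosen df values are illustrative only.

```python
def t_variance(df: int) -> float:
    """Variance of a t-distribution with df degrees of freedom (requires df > 2)."""
    if df <= 2:
        raise ValueError("the variance formula v/(v-2) requires v > 2")
    return df / (df - 2)


for df in (3, 5, 10, 30, 100):
    print(f"df = {df:>3}: variance = {t_variance(df):.4f}")
```

The values fall from 3.0 at df = 3 toward 1 as df grows, which matches the fatter tails at small sample sizes and the convergence to the standard normal distribution.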

The value of t can be determined from a t table. The degrees of freedom for t is equal to the degrees of freedom for the estimate of σm which is equal to N-1.

[Table: excerpt of a t-table, not reproduced here]

Suppose the sample size (n) is 30 and the level of significance (α) is 5%. Then df = n - 1 = 29 and tα/2 = t0.025 = 2.045 (find the 29 df row, then move to the column for a two-tailed α of 0.05, which is the 0.025 column in a one-tailed table).

Example

Assume a researcher is interested in estimating the mean reading speed (number of words per minute) of high-school graduates and computing the 95% confidence interval. A sample of 6 graduates was taken and the reading speeds were: 200, 240, 300, 410, 450, and 600. For these data,

  • m = 366.6667
  • sm = 60.9736
  • df = 6-1 = 5
  • t = 2.571
Therefore, the lower limit is: m - (t) (sm) = 209.904 and the upper limit is: m + (t) (sm) = 523.430. Therefore, the 95% confidence interval is:
209.904 <= μ <= 523.430

Thus, the researcher can be 95% sure that the mean reading speed of high-school graduates is between 209.904 and 523.430.
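
The reading-speed figures can be reproduced as follows. This is a sketch only: the function name t_confidence_interval is hypothetical, and the t value of 2.571 (df = 5, 95% confidence) is entered by hand from a t table because Python's standard library has no t quantile function. Note that s is the sample standard deviation with N - 1 in the denominator, and sm is s divided by the square root of N.

```python
from math import sqrt
from statistics import mean, stdev


def t_confidence_interval(sample, t_value):
    """Confidence interval for the mean when sigma is estimated from the sample.

    t_value must be looked up for df = len(sample) - 1 and the chosen confidence.
    Returns (lower limit, upper limit, estimated standard error of the mean).
    """
    n = len(sample)
    m = mean(sample)        # point estimate
    s = stdev(sample)       # sample standard deviation, divides by n - 1
    sm = s / sqrt(n)        # estimated standard error of the mean
    return m - t_value * sm, m + t_value * sm, sm


speeds = [200, 240, 300, 410, 450, 600]
lower, upper, sm = t_confidence_interval(speeds, t_value=2.571)

print(f"s  = {stdev(speeds):.4f}")
print(f"sm = {sm:.4f}")
print(f"{lower:.3f} <= mu <= {upper:.3f}")
```

Here s is about 149.35 and sm about 60.97; confusing the two is a common source of mismatched answers.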

Summary of Computations

  1. Compute m = ∑X/N.
  2. Compute s
  3. Compute sm = s/N^(1/2)
  4. Compute df = N-1
  5. Find t for these df using a t table
  6. Lower limit = m - t sm
  7. Upper limit = m + t sm
  8. Lower limit <= μ <= Upper limit
Assumptions:
  1. Normal distribution
  2. Scores are sampled randomly and are independent
Discuss the issues surrounding selection of the appropriate sample size

It's all starting to become a little confusing. Which distribution do you use?

When a large sample (generally 30 or more observations) is used, a z table can be used to construct the confidence interval. It does not matter whether the population distribution is normal or whether the population variance is known. This is because the central limit theorem assures that when the sample is large, the distribution of the sample mean is approximately normal. However, the t-statistic is more conservative because it tends to be greater than the z-statistic, so using the t-statistic results in a wider confidence interval.

However, if there is only a small sample size, a t table has to be used to construct the confidence interval when the population distribution is normal and the population variance is not known.

If the population distribution is not normal, there is no way to construct a confidence interval from a small sample (even if the population variance is known).

Therefore, all else equal, you should try to select a sample of 30 or more. The larger the sample size, the more precise (narrower) the confidence interval.

In general, at least one of the following is needed:

  • A normal distribution for the population.
  • A sample size that is greater than or equal to 30.
If one or both of the above occur, then a z-table or t-table is used, depending on whether σ is known or unknown. If neither of the above occurs, then the question cannot be answered.

A summary of the situation is as follows:

  • If the population is normally distributed, and the population variance is known, use a z-score irrespective of sample size.
  • If the population is normally distributed, and the population variance is unknown, use a t-score irrespective of sample size.
  • If the population is not normally distributed, and the population variance is known, use a z-score only if n >= 30, otherwise it cannot be done.
  • If the population is not normally distributed, and the population variance is unknown, use a t-score only if n >= 30, otherwise it cannot be done.
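
The four summary rules can be written as a small decision function. This is only a sketch of the rules as stated in the summary; the function name and the returned labels are hypothetical, and the threshold of 30 is the rule of thumb used in these notes.

```python
def reliability_statistic(population_normal: bool, variance_known: bool, n: int) -> str:
    """Return which statistic the summary rules prescribe: 'z', 't', or 'not available'."""
    large_sample = n >= 30
    # A non-normal population with a small sample cannot be handled by these rules.
    if not population_normal and not large_sample:
        return "not available"
    # Otherwise the choice depends only on whether the variance is known.
    return "z" if variance_known else "t"


print(reliability_statistic(population_normal=True, variance_known=True, n=10))
print(reliability_statistic(population_normal=True, variance_known=False, n=10))
print(reliability_statistic(population_normal=False, variance_known=True, n=50))
print(reliability_statistic(population_normal=False, variance_known=False, n=10))
```

For a large sample with unknown variance the function returns "t", the more conservative choice, although a z-score is also acceptable in that case.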

User Comments
Posted by danlan:
Good summary.
Posted by akanimo:
summary

unknown variance - always use t score
known variance - always use z score

(note: if population is non-normal then n >= 30 for above to remain true .. else unsolvable)
Posted by DAS11:
Great notes.
Posted by surob:
Yeah, agreed. Good notes.
Good comments too. Thanks
Posted by olukayode:
where do we get these t tables from please
Posted by olukayode:
sm is the standard deviation for the sample, so u calculate the s.d for the sample given
Posted by StanleyMo:
Wow, good notes. :)

The table can be found in the appendix of the CFA curriculum books.
Posted by StanleyMo:
Another point to note:

However, the t-statistic is more conservative because the t-statistic tends to be greater than the z-statistic, and therefore using t-statistic will result in a wider confidence interval.
Posted by JKiro:
Does anyone know how the sample std deviation (s(m) = 60.9736) was computed?
Posted by ambar:
Using the mean calculated, calculate the standard deviation s = [∑(mean - observation)/n]^1/2
Using s, calculate s(m) = s/n^1/2 = 60.9736
Posted by verhuizing:
(1-a) % = confidence interval.
Suppose (1-a) = .95; then it is |0.025|0.95|0.025|
Za/2 = 0.05 = 1.65 = 90%
Za/2 = 0.025 = 1.96 = 95%
Za 0.005 = 2.58 = 99%
Posted by thekid:
Jkiro...

S= { Sum {(observation - mean)^2} / n -1 }^0.5 ....B/c SAMPLE standard deviation.
Posted by thekid:
Continuation from previous comment...

Jkiro...

"n-1" b.c sample standard deviation

Then use 'S' to solve for S(m)=s/(n)^0.5
Posted by jpducros:
please note that the higher the df, the more peaked the curve, but the THINNER the tails. This is different from a leptokurtic distribution, where the higher the peak, the FATTER the tails. I think a little comment in the curriculum would be useful.
Posted by EminYus:
t-distributions given on exam?
Posted by tankdan:
Can't figure out how to get s(m) = 60.97 using the formula above. I'm doing each (obs-mean)^2 and summing all the results for 111,533.34. Then dividing by 5 (N-1), which = 22306.67. Then taking the square root, which equals 149.35.

What am I missing?!
Posted by jjsiow:
I cannot get 60.97 either. Can someone please help explain?
Posted by johntan1979:
The sd from calculations is 149.354165.

Since the population sd is not known or given, we have to estimate the standard error of the mean by dividing the sample sd 149.354165 by the square root of sample size i.e. 6^1/2

giving you 60.973583