- CFA Exams
- 2021 CFA Level II Exam
- Study Session 2. Quantitative Methods (1)
- Reading 4. Introduction to Linear Regression
- Subject 1. Linear regression

### CFA Practice Question

Which of the following statements is true?

A. The regression line always passes through the point (average of x-values, average of y-values).

B. The closer the r-value is to zero, the closer the dots in a scatter plot are to a straight line.

C. The regression line must pass through at least one of the data points.

D. The scatter plot of x-y data results in dots that must fall on a straight line.

Correct Answer: A

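The regression line must pass through the point (average of the x-values, average of the y-values), because the least-squares intercept is calculated as b0 = Y-bar − b1 × X-bar; substituting X = X-bar into Ŷ = b0 + b1X therefore gives Ŷ = Y-bar. The regression line need not pass through any of the data points.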
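A quick numerical check of both claims in the explanation. This is a minimal sketch on made-up data: the values in `xs` and `ys` and the helper name `ols_normal_equations` are illustrative and not from the reading. It solves the two least-squares normal equations directly (without using the means), then tests whether the fitted line passes through (X-bar, Y-bar) and whether it passes through any observed point.

```python
import math


def ols_normal_equations(xs, ys):
    """Fit y = b0 + b1*x by solving the two least-squares normal equations.

    Hypothetical helper for illustration. The normal equations come from
    setting the partial derivatives of the sum of squared residuals to zero:
        sum(y)   = n*b0      + b1*sum(x)
        sum(x*y) = b0*sum(x) + b1*sum(x*x)
    They are solved here with Cramer's rule, without using the means directly.
    """
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # nonzero as long as the x-values are not all equal
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]
b0, b1 = ols_normal_equations(xs, ys)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# Claim 1: the fitted line passes through (x_bar, y_bar)
passes_through_means = math.isclose(b0 + b1 * x_bar, y_bar)
# Claim 2: it need not pass through any observed data point
hits_a_data_point = any(math.isclose(b0 + b1 * x, y) for x, y in zip(xs, ys))

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")                      # → b0 = 0.20, b1 = 1.00
print("passes through (x_bar, y_bar):", passes_through_means)  # → True
print("passes through any data point:", hits_a_data_point)     # → False
```

With this data the fitted line is Ŷ = 0.2 + 1.0X, which passes through (3, 3.2) even though every observed point lies 0.8 or 1.2 units away from the line.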

### User Contributed Comments

User | Comment |
---|---|
investoprenuer | Why? Can someone explain? |
davidt876 | Remember that you can find b0 (the y-intercept) by solving the equation using the average values of X and Y. You can only do that because you are relying on the fact that all regression lines pass through the point (X-bar, Y-bar). But that's not a real answer to the question "Why?". The point (X-bar, Y-bar) is also smack in the middle of any scatter plot, so it makes sense that the line with the least distance between it and the data points would have to pass through the centre point of all the data points. Not a very scientific explanation, but conceptually it helps me understand. Now, if you have a comfortable understanding of calculus, this is the real answer to "Why?": http://www.pmean.com/10/LeastSquares.html |
davidt876 | Actually, I found a good explanation by Silverfish here: https://stats.stackexchange.com/questions/123651/geometric-interpretation-of-multiple-correlation-coefficient-r-and-coefficient. In short: (1) the x-bar and y-bar of the actual and estimated values are equal; and (2) the linear regression line is a straight line, and the average of any set of points along a straight line MUST be another point along that line. At length: actual value = estimated value + error. We know that the errors (residuals) of all the data points around a regression line sum to 0.* So if you sum both sides of that equation over all data points, the error values sum to 0 and you're left with: sum of actual values = sum of estimated values. Both sums cover the same number of data points, so their arithmetic means must also be equal. This applies to both the y-values and the x-values. So it's less that the regression line has to pass through the average of the actual data points, and more that it has to pass through the average of its own estimated data points (which, of course, is the same point). The final question is why the regression line has to pass through the average of its own estimated data points. That's because the regression line is a straight line. Imagine a line between two points: if you find the midpoint of the x-values and of the y-values (their average), you have inevitably found a point on that line. This applies no matter how many points are on the line: if you average the x- and y-values of points on a line and plot the result, it will always be on the line. *For an explanation of why the errors sum to 0, check out the answer by Manuel S here: https://stats.stackexchange.com/questions/189584/why-do-residuals-in-linear-regression-always-sum-to-zero-when-an-intercept-is-in |