CFA Practice Question
Which statement is correct?
A. If the word 'CFA' appears many times in a document, while not appearing many times in others, it probably has a high TF-IDF (term frequency-inverse document frequency) value.
B. Words that are common in every document, such as 'this', 'what', and 'if', rank high in TF-IDF.
C. Any outliers should always be removed from a dataset in data wrangling.
Explanation: A is correct. A word that appears many times in one document but rarely in the others gets a high TF-IDF value, which signals that it is very relevant to that document.
B is incorrect. TF-IDF increases in proportion to the number of times a word appears in a document, but is offset by the number of documents that contain the word. Words that are common in every document, such as 'this', 'what', and 'if', rank low even though they may appear many times, since they don’t mean much to any document in particular. (Plain document frequency, the share of documents that contain the word, is the opposite: it is high for such common words.)
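To make the contrast between A and B concrete, here is a minimal Python sketch. It assumes one common textbook definition (term frequency within a single document, multiplied by the log of the inverse document frequency). The curriculum's exact formula may differ in detail, and the toy corpus and function names are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration only.
docs = [
    "this is what the cfa exam covers if you study the cfa curriculum",
    "this is what happens if you skip breakfast",
    "this is what to do if it rains",
]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)


def document_frequency(term):
    """Share of documents that contain the term at least once."""
    return sum(term in doc for doc in tokenized) / n_docs


def tf_idf(term, doc):
    """Term frequency in one document times log(1 / document frequency)."""
    df = document_frequency(term)
    if df == 0:
        return 0.0  # term never occurs in the corpus
    tf = Counter(doc)[term] / len(doc)
    return tf * math.log(1 / df)


for term in ("cfa", "this", "if"):
    print(term, document_frequency(term), tf_idf(term, tokenized[0]))
```

Under these assumptions, 'cfa' appears in only one of the three documents, so its document frequency is low and its TF-IDF in the first document is positive. 'this' and 'if' appear in every document, so their document frequency is 1 and their TF-IDF is zero.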
C is also incorrect. Outliers should be examined first, and a decision should then be made either to remove them (trimming) or to replace them with appropriate values (e.g., winsorization).
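The difference between trimming and winsorization can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the sample returns are made up, and the rule of treating the `k` most extreme observations in each tail as outliers is a stand-in for whatever judgment or percentile cutoff an analyst would actually apply after examining the data.

```python
def trim(values, k):
    """Drop the k smallest and k largest observations (requires 2 * k < len(values))."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


def winsorize(values, k):
    """Replace the k smallest and k largest observations with the nearest retained value."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    # Clip each value into [low, high]; the original order is preserved.
    return [min(max(v, low), high) for v in values]


# Hypothetical return series with one extreme value in each tail.
returns = [0.02, -0.01, 0.03, 0.01, 0.95, 0.00, -0.02, 0.02, 0.01, -0.80]

print(trim(returns, 1))       # two observations fewer than the input
print(winsorize(returns, 1))  # same length, extremes pulled in to the bounds
```

Trimming shrinks the dataset, while winsorization keeps every observation but caps the extreme ones, which is why the choice between them should follow an examination of the outliers and not be automatic.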
User Contributed Comments 3
User | Comment |
---|---|
SenanOM | The explanation says A is correct, but then it also says C is correct. Am I going crazy, or is this a mistake...? |
rtw1984 | Mistake indeed.. |
JNW1980 | This entire question bank on this section seems to have a lot of errors... |