CFA Practice Question
There are different methods for dealing with imbalanced datasets. They are:
I. Oversample minority class
II. Undersample majority class
III. Generate synthetic samples
Correct Answer: I, II and III
Imbalanced classes are a common problem in machine learning classification, where the ratio of observations across classes is disproportionate. Oversampling means adding more copies of minority-class observations. It can be a good choice when you don't have much data to work with.
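As a rough illustration of oversampling, the sketch below duplicates randomly chosen minority rows until the two classes are the same size. The function name `random_oversample`, the toy data, and the choice to balance to an exact 1:1 ratio are assumptions made for the example, not part of the question.

```python
import numpy as np

def random_oversample(X, y, minority_label, seed=0):
    """Duplicate randomly chosen minority rows until both classes match in size.

    Hypothetical helper for a binary problem; balancing to 1:1 is an assumption.
    """
    rng = np.random.default_rng(seed)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0:
        # Nothing to do: the "minority" class is not actually smaller.
        return X.copy(), y.copy()
    # Sample with replacement, since we need more rows than the minority has.
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra_idx])
    return X[keep], y[keep]

# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_over, y_over = random_oversample(X, y, minority_label=1)
print(np.bincount(y_over))  # [8 8]
```

Because the added rows are exact copies, no new information enters the dataset; the model simply sees the minority observations more often.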
Undersampling means removing some observations of the majority class. It can be a good choice when you have a lot of data (think millions of rows). A drawback is that it discards information that may be valuable, which could lead to underfitting and poor generalization to the test set.
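A minimal sketch of undersampling, under the same assumptions (a binary problem, balancing to 1:1, and a hypothetical helper named `random_undersample`): majority rows are dropped at random until the classes are the same size.

```python
import numpy as np

def random_undersample(X, y, majority_label, seed=0):
    """Keep all minority rows and a random subset of majority rows of equal size.

    Hypothetical helper for a binary problem; balancing to 1:1 is an assumption.
    """
    rng = np.random.default_rng(seed)
    majority_idx = np.flatnonzero(y == majority_label)
    minority_idx = np.flatnonzero(y != majority_label)
    if len(majority_idx) <= len(minority_idx):
        # Nothing to do: the "majority" class is not actually larger.
        return X.copy(), y.copy()
    # Sample without replacement, so each kept majority row appears once.
    kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
    keep = np.sort(np.concatenate([kept_majority, minority_idx]))
    return X[keep], y[keep]

# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_under, y_under = random_undersample(X, y, majority_label=0)
print(np.bincount(y_under))  # [2 2]
```

In this toy case six of the eight majority rows are thrown away, which shows the information loss that makes undersampling risky on small datasets.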
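Generating synthetic samples: new minority-class observations are created rather than copied. It is important to generate the new samples only in the training set, so that the test set contains only real observations and still measures how well the model generalizes to unseen data.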
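The explanation does not say how synthetic samples are produced. One common approach is SMOTE-style interpolation between a minority observation and one of its nearest minority neighbors, and the sketch below is a simplified version of that idea, not a full SMOTE implementation. The helper `synthesize_minority`, the value `k=3`, the toy data, and the fixed train/test split are all assumptions made for the example. The point to notice is the order of operations: the data is split first, and only the training portion is augmented.

```python
import numpy as np

def synthesize_minority(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbors (simplified SMOTE-style idea).
    """
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = np.random.default_rng(seed)
    k = min(k, n - 1)
    # Pairwise Euclidean distances among minority rows; ignore self-distance.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Pick a base row and one of its neighbors for each synthetic sample.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbors[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

# Toy data: 90 majority rows (label 0) followed by 10 minority rows (label 1).
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = np.array([0] * 90 + [1] * 10)

# Split FIRST (18 majority + 2 minority rows held out), then augment only
# the training set, so the test set contains real observations only.
test_idx = np.concatenate([np.arange(72, 90), np.arange(98, 100)])
train_mask = np.ones(len(y), dtype=bool)
train_mask[test_idx] = False
X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[test_idx], y[test_idx]

X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_new = synthesize_minority(X_min, n_new=n_new)

X_train_bal = np.vstack([X_train, X_new])
y_train_bal = np.concatenate([y_train, np.ones(len(X_new), dtype=int)])

print(np.bincount(y_train_bal))  # [72 72]
print(np.bincount(y_test))       # [18  2]
```

The test set keeps its original imbalance, which is what the model will face on unseen data.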