Subject 4. Unsupervised Machine Learning Algorithms

Principal Components Analysis

Principal components analysis (PCA) is often used to reduce a multidimensional data set to fewer dimensions for analysis. PCA retains the characteristics of the data set that contribute most to its variance by keeping the lower-order principal components (those that explain a large part of the variance in the data) and ignoring the higher-order ones (those that explain little of it). The low-order components often contain the most important aspects of the data.

Eigenvectors define the principal components (i.e., the new, mutually uncorrelated composite variables). Each eigenvalue measures the variance captured by its eigenvector; divided by the sum of all eigenvalues, it gives the proportion of total variance in the initial data that is explained by that eigenvector and its associated principal component.
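To make the link between eigenvectors, eigenvalues and explained variance concrete, here is a minimal sketch for the two-variable case only. It uses the closed-form eigen-decomposition of a 2x2 sample covariance matrix; the function name pca_2d and the sample numbers are illustrative assumptions, not part of the notes.

```python
import math


def pca_2d(xs, ys):
    """PCA for exactly two variables (illustrative sketch).

    Returns (eigenvalues, eigenvectors, explained), ordered from the
    component with the largest variance to the smallest.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum((x - mx) ** 2 for x in xs) / (n - 1)
    c = sum((y - my) ** 2 for y in ys) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    total = a + c  # total variance equals the sum of the eigenvalues
    if total == 0:
        raise ValueError("data has no variance")

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mid = (a + c) / 2
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [mid + half_gap, mid - half_gap]

    if b != 0:
        # (b, lam - a) solves (A - lam * I) v = 0; scale to unit length.
        eigenvectors = []
        for lam in eigenvalues:
            vx, vy = b, lam - a
            norm = math.hypot(vx, vy)
            eigenvectors.append((vx / norm, vy / norm))
    else:
        # Uncorrelated inputs: the original axes are the components.
        eigenvectors = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]

    # Proportion of total variance explained by each component.
    explained = [lam / total for lam in eigenvalues]
    return eigenvalues, eigenvectors, explained


# Hypothetical sample: two variables that move closely together.
values, vectors, shares = pca_2d([1.0, 2.0, 3.0, 4.0, 5.0],
                                 [2.1, 3.9, 6.2, 7.8, 10.1])
print(shares)
```

With strongly correlated inputs like these, nearly all of the variance should load on the first component, which is why the second can often be dropped.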

The challenge is to decide how many principal components to retain, as there is always a trade-off: retaining more components preserves more of the variance but reduces the dimensionality less. The main drawback of PCA is that the components cannot be easily labeled or directly interpreted.
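One common rule of thumb, which the notes do not prescribe, is to keep the smallest number of components whose cumulative share of variance reaches a chosen threshold. The sketch below assumes that rule; the function name components_to_retain and the 90% default are hypothetical choices for illustration.

```python
def components_to_retain(eigenvalues, threshold=0.90):
    """Smallest number of components whose cumulative variance share
    reaches `threshold` (an assumed rule of thumb, not a fixed standard)."""
    total = sum(eigenvalues)
    running = 0.0
    # Walk through the components from largest to smallest eigenvalue.
    for count, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running >= threshold * total:
            return count
    return len(eigenvalues)


# Hypothetical eigenvalues of a four-variable data set.
print(components_to_retain([5.0, 3.0, 1.0, 1.0], threshold=0.75))
```

Raising the threshold keeps more components and more variance; lowering it gives a more compact but lossier representation.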


Clustering focuses on sorting observations into groups (clusters) such that observations in the same cluster are more similar to each other than they are to observations in other clusters. Groups are formed based on a set of criteria that may or may not be pre-specified.

  • Cohesion: observations inside each cluster are similar.
  • Separation: observations in two different clusters are not similar.

Euclidean distance is the straight-line distance between two points and can be used to define "similarity": the smaller the distance, the more similar the observations. Once the distances are determined, the groups can be created.
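As a small illustration of the distance measure described above, the following sketch computes the Euclidean distance between two observations given as equal-length sequences of numbers. The function name euclidean_distance and the sample points are assumptions made for the example.

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two observations p and q."""
    if len(p) != len(q):
        raise ValueError("observations must have the same number of features")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical observations with two features each.
base = (1.0, 1.0)
near = (1.5, 1.2)
far = (6.0, 9.0)

# The nearer observation is treated as the more similar one.
print(euclidean_distance(base, near) < euclidean_distance(base, far))
```

In practice, features are usually put on a comparable scale first; otherwise the feature with the largest units dominates the distance.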

Two popular clustering approaches are discussed below.

K-means partitions observations into a fixed number (k) of non-overlapping clusters. Each cluster is characterized by its centroid, and each observation belongs to the cluster with the centroid to which that observation is closest.

The algorithm iterates, reassigning observations and recomputing centroids, until it settles on clusters that minimize intra-cluster distance and maximize inter-cluster distance. It runs fast and works well on large data sets. The final result, however, depends on the pre-determined number of clusters and on the initial locations of the centroids.
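The iterative process can be sketched in a few lines of standard-library Python. This is a simplified illustration, assuming random observations are used as the initial centroids and that the loop stops once assignments no longer change; the function name k_means, its parameters and the sample points are hypothetical.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Basic k-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen observations as centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None

    for _ in range(max_iter):
        # Assignment step: each observation joins its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels

        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # leave an empty cluster's centroid where it is
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))

    return labels, centroids


# Hypothetical data: two well-separated groups of observations.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(data, k=2)
print(labels)
print(centroids)
```

Changing seed changes the starting centroids. On less cleanly separated data, that alone can lead to a different final partition, which is the sensitivity to initial centroids noted above.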

Hierarchical clustering builds a hierarchy of clusters. The initial data set and the final set contain the same observations; only the grouping changes from level to level. Two main strategies are used to define the intermediate clusters.

Agglomerative (bottom-up) hierarchical clustering begins with each observation being its own cluster. Then, the algorithm finds the two closest clusters, defined by some measure of distance, and combines them into a new, larger cluster. This process is repeated until all the observations are clumped into a single cluster.
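The bottom-up procedure can be illustrated with a short sketch. The notes leave the distance measure open; this example assumes single linkage (the distance between two clusters is the distance between their closest members) and uses a brute-force search that is only suitable for small data sets. The function name agglomerative and the n_clusters stopping parameter are hypothetical.

```python
import math


def agglomerative(points, n_clusters=1):
    """Bottom-up clustering with single linkage (illustrative sketch).

    Returns the final clusters (lists of point indices) and the list of
    merges performed, each as (cluster_a, cluster_b, distance).
    """
    # Start with every observation in its own cluster.
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Assumed single linkage: distance between the closest pair.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the two closest clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best

        # Combine them into a new, larger cluster.
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)

    return clusters, merges


# Hypothetical observations; stop early at two clusters to inspect them.
data = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
clusters, merges = agglomerative(data, n_clusters=2)
print(clusters)
```

Leaving n_clusters at 1 runs the process to the end, and the recorded merges then describe the full hierarchy from single observations up to one cluster.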

Divisive (top-down) hierarchical clustering starts with all observations belonging to a single cluster. The observations are then divided into two clusters based on some measure of distance. The algorithm then progressively partitions the intermediate clusters into smaller clusters until each contains only one observation.
