Subject 1. Data Types

Data can be defined as a collection of numbers, characters, words, text, images, audio and video in a raw or organized format to represent facts or information.

Numerical versus Categorical Data

From a statistical perspective, data can be classified as numerical data and categorical data.

Numerical data are values that represent measured or counted quantities as numbers. They can be further split into two types:

  • Continuous data can be measured and can take on any numerical value in a specified range. Measurements often carry many decimal places, and (theoretically, at least) there are no gaps between permissible values (i.e., any value in the range can appear in the data set).

    Examples would include the height of a person and the time to complete an assignment. These values can be measured using sufficiently accurate tools to numerous decimal places.

  • Discrete data result from a counting process and therefore are limited to a finite number of values; that is, the permissible values can be listed and counted. There are distinct gaps between the values. Examples include the number of children in a family and the number of shares comprising an index.

Categorical data are values that describe a quality or characteristic of a group of observations and usually take only a limited number of values that are mutually exclusive. They can be further classified into nominal data and ordinal data.

Nominal data are categorical values that cannot be organized in a logical order.

  • Nominal measurement represents the weakest level of measurement.
  • It consists of assigning items to groups or categories.
  • No quantitative information is conveyed and no ordering (ranking) of the items is implied.
  • Nominal scales are qualitative rather than quantitative.

Religious preference, race, and sex are all examples of nominal scales. Another example is portfolio managers categorized by style as value or growth, coded 1 for value and 2 for growth; the codes are labels only and imply no ranking. Frequency distributions are usually used to analyze data measured on a nominal scale. The main statistic computed is the mode. Variables measured on a nominal scale are often referred to as categorical or qualitative variables.
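The frequency distribution and mode described above can be sketched in a few lines of Python. This is only an illustration: the list of manager styles is made-up data, and the names `styles`, `STYLE_CODES`, and `frequency` are assumptions, not part of the original notes.

```python
from collections import Counter

# Hypothetical style labels for eight portfolio managers (illustrative data only).
styles = ["value", "growth", "growth", "value",
          "growth", "value", "growth", "growth"]

# Numeric codes are labels only (1 = value, 2 = growth, as in the text);
# averaging them would not produce a meaningful number.
STYLE_CODES = {"value": 1, "growth": 2}
coded = [STYLE_CODES[s] for s in styles]

# Frequency distribution: the number of observations in each category.
frequency = Counter(styles)

# Mode: the category that occurs most often.
mode_style, mode_count = frequency.most_common(1)[0]

print(dict(frequency))         # {'value': 3, 'growth': 5}
print(mode_style, mode_count)  # growth 5
```

Counting and finding the mode are the only operations used here, which is consistent with nominal data carrying no quantitative or ordering information.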

Ordinal data are categorical values that can be logically ordered and ranked.

  • Measurements on an ordinal scale are categorized.
  • The various measurements are then ranked in their categories.
  • Measurements with ordinal scales are ordered with higher numbers representing higher values. The intervals between the numbers are not necessarily equal.

Example 1

On a 5-point rating scale measuring attitudes toward gun control, the difference between a rating of 2 and a rating of 3 may not represent the same difference as that between a rating of 4 and a rating of 5.

Example 2

Two categories might be value and growth. Within each category, the portfolio managers are ranked by performance on a scale from 1 to 10, with 1 being the best-performing and 10 the worst-performing manager.

There is no "true" zero point for ordinal scales, since the zero point is chosen arbitrarily. The lowest point on the rating scale in Example 1 was arbitrarily chosen to be 1. It could just as well have been 0 or -5.
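A short Python sketch may help show why ranking is meaningful for ordinal data while the starting point of the scale is not. The labels on the 5-point scale and the responses below are hypothetical, chosen only for illustration.

```python
# Hypothetical 5-point attitude scale; labels and responses are illustrative only.
SCALE = ["strongly oppose", "oppose", "neutral", "support", "strongly support"]

# Codes 1..5 record position on the scale, not a measured quantity.
RANK = {label: position for position, label in enumerate(SCALE, start=1)}

responses = ["support", "oppose", "strongly support", "neutral", "oppose"]

# Ranking is meaningful for ordinal data: sort by position on the scale.
ranked = sorted(responses, key=RANK.get)

# Shift every code so the lowest point is -5 instead of 1.
# The ordering is unchanged because the starting point is arbitrary.
SHIFTED = {label: rank - 6 for label, rank in RANK.items()}
ranked_shifted = sorted(responses, key=SHIFTED.get)

print(ranked)
# ['oppose', 'oppose', 'neutral', 'support', 'strongly support']
print(ranked == ranked_shifted)  # True
```

Only comparisons between codes are used; differences between codes are never computed, because the intervals on an ordinal scale are not necessarily equal.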

Cross-Sectional versus Time-Series versus Panel Data

Based on how they are collected, data can be categorized into three types.

Time-series data are a set of observations for a single observational unit collected over time, usually at discrete and equally spaced intervals. Examples: the daily closing price of a certain stock recorded over the last six weeks, or weekly ice cream sales during a holiday period at a seaside resort.

Cross-sectional data are observations that come from different observational units at a single point in time. The underlying population should consist of members with similar characteristics. For example, suppose you are interested in how much companies spend on research and development (R&D). Firms in some industries, such as retail, spend little on R&D, while firms in industries such as technology spend heavily on it. Therefore, it is inappropriate to summarize R&D data across all companies. Rather, analysts should summarize R&D data by industry and then analyze the data in each industry group.
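The advice to summarize by industry rather than across all companies can be illustrated with a minimal sketch. The company names and R&D figures are invented for the example, and the choice of "R&D as a percentage of sales" as the measure is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cross-section at a single date:
# (company, industry, R&D spending as a percentage of sales).
observations = [
    ("RetailCo A", "retail", 0.5),
    ("RetailCo B", "retail", 1.5),
    ("TechCo A", "technology", 14.0),
    ("TechCo B", "technology", 18.0),
]

# Group the observations by industry before summarizing.
by_industry = defaultdict(list)
for _company, industry, rd_pct in observations:
    by_industry[industry].append(rd_pct)

industry_means = {name: mean(values) for name, values in by_industry.items()}

# Pooled mean across all companies, shown only for contrast.
pooled_mean = mean(rd_pct for _, _, rd_pct in observations)

print(industry_means)  # {'retail': 1.0, 'technology': 16.0}
print(pooled_mean)     # 8.5
```

The pooled figure of 8.5 describes neither group well, which is the reason for analyzing each industry separately.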

Panel data are a mix of time-series and cross-sectional data, consisting of observations through time on one or more variables for multiple observational units.
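One way to see how the three types relate is to store a small panel and slice it both ways. The sketch below assumes a plain dictionary keyed by (unit, date); the tickers, dates, and prices are hypothetical, as are the helper names `time_series` and `cross_section`.

```python
# Hypothetical panel of closing prices keyed by (ticker, date).
# All values are illustrative only.
panel = {
    ("AAA", "2024-01-02"): 10.0,
    ("AAA", "2024-01-03"): 10.5,
    ("BBB", "2024-01-02"): 20.0,
    ("BBB", "2024-01-03"): 19.0,
}


def time_series(panel, unit):
    """Return one unit's observations across all dates, in date order."""
    return [(date, value)
            for (u, date), value in sorted(panel.items())
            if u == unit]


def cross_section(panel, date):
    """Return every unit's observation at a single date."""
    return {u: value for (u, d), value in panel.items() if d == date}


print(time_series(panel, "AAA"))
# [('2024-01-02', 10.0), ('2024-01-03', 10.5)]
print(cross_section(panel, "2024-01-03"))
# {'AAA': 10.5, 'BBB': 19.0}
```

Fixing the unit gives a time series, fixing the date gives a cross-section, and keeping both dimensions gives the panel.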

Structured versus Unstructured Data

Based on whether or not data are in a highly organized form, they can be classified into structured and unstructured types.

Structured data are highly organized in a pre-defined manner, usually with repeating patterns. Market data, fundamental data and analytical data are typical examples.

Unstructured data do not follow any conventionally organized form. Common types are text, audio, and video. They are typically considered alternative data because they are usually collected from unconventional sources such as individuals, business processes, and sensors.

Typically, unstructured data must first be transformed into structured data that financial models can process.
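As a rough illustration of that transformation, the sketch below converts free text into a table of word counts. This simple word-count ("bag-of-words") approach is an assumption chosen for the example, not a method named in these notes; the sample sentences and the helper `to_term_counts` are hypothetical.

```python
import re
from collections import Counter


def to_term_counts(text):
    """Turn free text into a record of lowercase word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical snippets of unstructured text (illustrative only).
documents = [
    "Revenue beat guidance; margins improved.",
    "Margins fell and revenue missed guidance.",
]

# Structured form: one row per document, one column per term.
vocabulary = sorted(set().union(*(to_term_counts(d) for d in documents)))
table = [[to_term_counts(d)[term] for term in vocabulary] for d in documents]

print(vocabulary)
# ['and', 'beat', 'fell', 'guidance', 'improved', 'margins', 'missed', 'revenue']
for row in table:
    print(row)
# [0, 1, 0, 1, 1, 1, 0, 1]
# [1, 0, 1, 1, 0, 1, 1, 1]
```

Each row has the same columns in the same order, which is the kind of pre-defined, repeating layout that characterizes structured data.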

User Contributed Comments

beatjeff: Nominal scale is the weakest value
achu: "NOIR": nominal (nothing but a label), ordinal (only a relative rank), interval, ratio.