8 August 2025

Mind on Statistics (6th. Ed) Chapter 2 - Turning Data into Information

by Arpon Sarker

Introduction

The objectives are:

Definitions

Raw Data: term used for numbers and category labels that have been collected but have not yet been processed.

Variable: a characteristic that can differ from one individual to the next

Observational Unit/Observation: a single individual entity in a study

Statistic: A summary measure computed from sample data

Parameter: A summary measure computed from the entire population

Distribution: describes how often the possible responses of a variable occur. This is either a frequency distribution (counts) or relative frequency distribution (percentages)
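As a minimal sketch of the two kinds of distribution, the snippet below tabulates counts and percentages for a categorical variable. The `responses` list is made-up illustrative data, not taken from the book.

```python
from collections import Counter

# Hypothetical raw data: category labels collected from ten respondents.
responses = ["yes", "no", "yes", "yes", "unsure", "no", "yes", "no", "yes", "unsure"]

# Frequency distribution: how many times each response occurs.
frequency = Counter(responses)

# Relative frequency distribution: each count as a percentage of all responses.
total = len(responses)
relative_frequency = {label: 100 * count / total for label, count in frequency.items()}

for label, count in frequency.most_common():
    print(f"{label}: {count} ({relative_frequency[label]:.0f}%)")
```

The relative frequencies always sum to 100%, which makes them easier to compare across samples of different sizes.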

Percentile: the kth percentile is a number that has k% of the data values at or below it and (100-k)% of the data values at or above it.
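To make the definition concrete, here is a small sketch that measures what percentage of a dataset lies at or below a given value. The helper name `percent_at_or_below` and the `scores` list are hypothetical, chosen only for illustration.

```python
def percent_at_or_below(data, value):
    """Return the percentage of data values that are at or below `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

# Hypothetical test scores for ten students.
scores = [52, 61, 67, 70, 74, 78, 81, 85, 90, 96]

# Eight of the ten scores are at or below 85, so 85 sits at about the 80th percentile.
print(percent_at_or_below(scores, 85))
```

Software packages use several slightly different interpolation rules for percentiles, so results on small datasets may not match this simple count exactly.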

Visual Summaries

Categorical Variables

Quantitative Variables

The distribution of a quantitative variable is described by the location, spread and shape of the data.

Numerical Summaries of Quantitative Variables

Location

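Mean: \(\bar{x}=\frac{\sum{x_i}}{n}\)

Median: the middle value in the ordered dataset (the average of the two middle values when \(n\) is even)

If the dataset is skewed, the mean is pulled towards the long tail, so the mean and median will generally not be equal.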
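The effect of skew on the two measures of location can be seen with a short sketch using Python's standard `statistics` module. Both datasets are invented for illustration; the second replaces the largest value with an extreme one.

```python
from statistics import mean, median

# Hypothetical datasets: the second has one extreme high value (right skew).
symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 4, 6, 8, 100]

for name, data in (("symmetric", symmetric), ("right-skewed", right_skewed)):
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

In the symmetric data the mean and median agree, while in the skewed data only the mean moves towards the extreme value.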

Spread

Range = high value - low value

Interquartile Range (IQR) = upper quartile \(Q_3\) - lower quartile \(Q_1\)
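The sketch below computes the range and IQR for a small made-up dataset. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data (excluding the middle value when the count is odd); other quartile conventions exist and can give slightly different answers. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumes at least two values; the middle value is left out of both
    halves when the number of values is odd.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)

# Hypothetical dataset of nine values.
data = [1, 3, 4, 7, 8, 10, 12, 15, 20]

data_range = max(data) - min(data)  # high value - low value
q1, q3 = quartiles(data)
iqr = q3 - q1                       # spread of the middle half of the data

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Unlike the range, the IQR ignores the most extreme values, so it is less sensitive to outliers.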

Standard Deviation

How to Handle Outliers

If the outlier is a legitimate data value and represents natural variability for the group and variables measured: do not discard it unless the goal is to study only a partial range of the possible values.

If a mistake was made while taking measurements or entering them into the computer: correct the value if possible and retain it; if it cannot be corrected, discard it.

If the individual in question belongs to a different group than the bulk of individuals measured: consider the reason for studying the data when deciding whether to discard the value or not.

Bell-Shaped Distributions and Standard Deviations

Sample standard deviation: \(s = \sqrt{\frac{\sum{(x_i-\bar{x})^2}}{n-1}}\)

Sample variance: \(s^2=\frac{\sum{(x_i-\bar{x})^2}}{n-1}\)
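As a worked sketch of these formulas, the code below computes the sample variance and standard deviation step by step and compares them with the `variance` and `stdev` functions from Python's standard `statistics` module, which also divide by \(n-1\). The dataset is invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical sample of eight values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
x_bar = mean(data)

# Sum of squared deviations from the sample mean.
sum_squared_deviations = sum((x - x_bar) ** 2 for x in data)

sample_variance = sum_squared_deviations / (n - 1)  # s^2
sample_sd = sqrt(sample_variance)                   # s

print(f"by formula: s^2 = {sample_variance:.4f}, s = {sample_sd:.4f}")
print(f"statistics: s^2 = {variance(data):.4f}, s = {stdev(data):.4f}")
```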

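Empirical Rule: for a bell-shaped distribution, approximately 68% of the values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.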

The Empirical Rule implies that, for bell-shaped data, the range from the minimum to the maximum data value spans about 4 to 6 standard deviations. For relatively large samples, this gives a rough idea of the value of the standard deviation: \(s \approx \frac{\text{range}}{6}\)
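The rule of thumb above can be checked on simulated bell-shaped data. The sketch below assumes the usual statement of the Empirical Rule (roughly 68%, 95% and 99.7% of values within 1, 2 and 3 standard deviations of the mean) and draws a hypothetical sample of 1000 values from a normal distribution with mean 100 and standard deviation 15; the seed, sample size and parameters are arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulated sample is reproducible

# Hypothetical bell-shaped sample: 1000 draws from a normal distribution.
sample = [random.gauss(100, 15) for _ in range(1000)]
x_bar = mean(sample)
s = stdev(sample)

def share_within(k):
    """Percentage of sample values within k standard deviations of the mean."""
    inside = sum(1 for x in sample if abs(x - x_bar) <= k * s)
    return 100 * inside / len(sample)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1f}%")

# How many standard deviations does the range span?
range_over_s = (max(sample) - min(sample)) / s
print(f"range / s = {range_over_s:.2f}")
print(f"range / 6 = {(max(sample) - min(sample)) / 6:.2f} vs s = {s:.2f}")
```

The exact percentages vary from sample to sample, which is why the range-based estimate of \(s\) should only be treated as a rough guide.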

A standardised score or z-score measures how far a value is from the mean in terms of standard deviations: \(z = \frac{x - \bar{x}}{s}\)
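A minimal sketch of the calculation, assuming the z-score is taken relative to the sample mean and sample standard deviation of the dataset itself. The helper name `z_score` and the data are hypothetical.

```python
from statistics import mean, stdev

def z_score(value, data):
    """Number of sample standard deviations that `value` lies from the sample mean."""
    return (value - mean(data)) / stdev(data)

# Hypothetical sample of eight values with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

for value in (2, 5, 9):
    print(f"z-score of {value}: {z_score(value, data):+.2f}")
```

A negative z-score means the value is below the mean, a positive one means it is above, and a z-score of 0 means the value equals the mean.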

tags: mathematics - statistics