Numerical Variables
A numerical or continuous variable (attribute) is one that may take on any value within a finite or infinite interval (e.g., height, weight, temperature, blood glucose). There are two types of numerical variables: interval and ratio. An interval variable has values whose differences are interpretable, but it does not have a true zero. A good example is temperature in degrees Celsius. Data on an interval scale can be added and subtracted but cannot be meaningfully multiplied or divided. For example, we cannot say that a day at 20 degrees Celsius is twice as hot as a day at 10 degrees Celsius. In contrast, a ratio variable has values with a true zero and can be added, subtracted, multiplied or divided (e.g., weight).
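The difference between the two scales can be illustrated with a small sketch. The two temperatures below are hypothetical, and the helper name `celsius_to_kelvin` is introduced only for this example. Kelvin is used because it measures the same quantity on a scale with a true zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert a Celsius reading to Kelvin, a scale with a true zero."""
    return c + 273.15


day_a_c, day_b_c = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale: the gap is the same
# whether it is computed in Celsius or in Kelvin.
diff_c = day_b_c - day_a_c
diff_k = celsius_to_kelvin(day_b_c) - celsius_to_kelvin(day_a_c)

# Ratios are not: the Celsius ratio depends on an arbitrary zero point.
ratio_c = day_b_c / day_a_c
ratio_k = celsius_to_kelvin(day_b_c) / celsius_to_kelvin(day_a_c)

print(f"difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"ratio in Celsius: {ratio_c:.2f}, ratio in Kelvin: {ratio_k:.3f}")
```

The Celsius ratio is 2, but on the Kelvin scale the second day is only a few percent warmer, which is why "twice as hot" is not a meaningful statement for an interval variable.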
Univariate Analysis - Numerical

| Statistics | Visualization | Equation | Description |
|---|---|---|---|
| Count | Histogram | N | The number of values (observations) of the variable. |
| Minimum | Box Plot | Min | The smallest value of the variable. |
| Maximum | Box Plot | Max | The largest value of the variable. |
| Mean | Box Plot | | The sum of the values divided by the count. |
| Median | Box Plot | | The middle value. An equal number of values lie below and above the median. |
| Mode | Histogram | | The most frequent value. There can be more than one mode. |
| Quantile | Box Plot | | A set of 'cut points' that divide a set of data into groups containing equal numbers of values (Quartile, Quintile, Percentile, ...). |
| Range | Box Plot | Max - Min | The difference between the maximum and the minimum. |
| Variance | Histogram | | A measure of data dispersion based on squared deviations from the mean. |
| Standard Deviation | Histogram | | The square root of the variance. |
| Coefficient of Variation | Histogram | | The standard deviation divided by the mean. |
| Skewness | Histogram | | A measure of symmetry or asymmetry in the distribution of data. |
| Kurtosis | Histogram | | A measure of whether the data are peaked or flat relative to a normal distribution. |
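The statistics listed in the table can be computed with Python's standard `statistics` module, as in the minimal sketch below. The sample values are hypothetical and are not taken from the Iris data. The variance shown is the sample variance (dividing by N - 1), and the skewness and kurtosis helpers use one common moment-based definition. Both choices are assumptions, and other tools may apply different sample-size corrections.

```python
import statistics as st

# Hypothetical sample for illustration; these are not the Iris values.
values = [4.9, 5.0, 5.0, 5.4, 5.8, 6.1, 6.3, 6.7, 7.0, 7.2]

n = len(values)                          # Count
minimum, maximum = min(values), max(values)
mean = st.mean(values)
median = st.median(values)
modes = st.multimode(values)             # a list, since several modes can tie
quartiles = st.quantiles(values, n=4)    # three cut points: Q1, Q2, Q3
value_range = maximum - minimum
variance = st.variance(values)           # sample variance (divides by N - 1)
std_dev = st.stdev(values)               # square root of the sample variance
coeff_var = std_dev / mean


def central_moment(data, k):
    """k-th central moment: the mean of (x - mean) ** k."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness; 0 for a perfectly symmetric sample."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


print(f"N={n} min={minimum} max={maximum} range={value_range:.2f}")
print(f"mean={mean:.3f} median={median:.3f} modes={modes}")
print(f"quartiles={[round(q, 3) for q in quartiles]}")
print(f"variance={variance:.3f} std={std_dev:.3f} cv={coeff_var:.1%}")
print(f"skewness={skewness(values):.3f} kurtosis={excess_kurtosis(values):.3f}")
```

`st.multimode` returns a list because, as the table notes, a variable can have more than one mode.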
Box plot and histogram for the "sepal length" variable from the Iris dataset.
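As a rough sketch of what these two plots summarize, the code below computes the ingredients of a box plot (minimum, quartiles, maximum) and of a histogram (counts per equal-width bin) without a plotting library. The values, the bin width of 0.5, and the starting edge of 4.0 are hypothetical choices for illustration, not the settings used for the figure.

```python
import statistics as st
from collections import Counter

# Hypothetical measurements; these are not the Iris values.
values = [4.4, 4.9, 5.0, 5.1, 5.4, 5.7, 5.8, 6.0, 6.3, 6.4, 6.9, 7.7]

# Box plot ingredients: minimum, three quartiles, maximum.
q1, q2, q3 = st.quantiles(values, n=4, method="inclusive")
five_number = (min(values), q1, q2, q3, max(values))

# Histogram ingredients: count the values falling into equal-width bins.
bin_width = 0.5   # assumed bin width
start = 4.0       # assumed left edge of the first bin
counts = Counter(int((v - start) // bin_width) for v in values)

# Print a text histogram, one row per non-empty bin.
for i in sorted(counts):
    lo = start + i * bin_width
    print(f"[{lo:.1f}, {lo + bin_width:.1f})  {'#' * counts[i]}")
print("five-number summary:", five_number)
```

The box plot conveys location and spread through the five-number summary, while the histogram shows the shape of the distribution, which is why the table pairs skewness and kurtosis with the histogram.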
Example:

Statistical analysis using Microsoft Excel (Iris.xls)
| Statistic | sepal length |
|---|---|
| Count | 150 |
| Minimum | 4.3 |
| Maximum | 7.9 |
| Mean | 5.84 |
| Median | 5.8 |
| Mode | 5 |
| Quartile 1 | 5.1 |
| Range | 3.6 |
| Variance | 0.69 |
| Standard Deviation | 0.83 |
| Coefficient of Variation | 14.2% |
| Skewness | 0.31 |
| Kurtosis | -0.55 |
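Several of the reported values are derived from others, so they can be cross-checked. The sketch below recomputes the range, standard deviation, and coefficient of variation from the rounded figures in the table. Because the inputs are already rounded, agreement is only expected to the displayed precision.

```python
import math

# Values as reported (rounded) in the table above.
minimum, maximum = 4.3, 7.9
mean = 5.84
variance = 0.69
std_dev = 0.83

value_range = maximum - minimum       # Range = Max - Min
std_from_var = math.sqrt(variance)    # standard deviation = sqrt(variance)
coeff_var = std_dev / mean            # coefficient of variation

print(f"range = {value_range:.1f}")                    # range = 3.6
print(f"sqrt(variance) = {std_from_var:.2f}")          # sqrt(variance) = 0.83
print(f"coefficient of variation = {coeff_var:.1%}")   # 14.2%
```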