The information in this list was taken from slides prepared by Professor Herman Bennett of the Harvard Kennedy School. This list will cover the following:

→What data are

» Observation, variable, population, sample

→How to graphically represent data

» Histograms and distributions

→How to describe data features in a single number

» Descriptive statistics: mean, variance, standard deviation, etc.

→How to describe the relationship between two variables

» Covariance and correlation

» Correlation and causality

A distribution with two peaks. | Bimodal Distribution | - | - | - |

A common tool used to summarize data by organizing it into r | Bins | - | - | - |

A variable that is itself not a number, but datasets represe | Categorical Variable | - | - | - |

Measures the symmetry of a data set around the mean. | Coefficient of Skewness | - | - | - |

Ratio of the standard deviation of the data set to the mean. | Coefficient of Variation | - | - | - |

Denoted by r, it allows comparisons between covariances by d | Correlation Coefficient | - | - | - |

The basic statistic to measure the relationship between two | Covariance | - | - | - |

A histogram that plots the fraction of observations in the d | Cumulative Frequency Distribution | - | - | - |

Numerical representations of phenomena we observe in the wor | Data | - | - | - |

The entire collection of observations and variables that are | Dataset | - | - | - |

A special case of categorical variable that takes the value | Dummy Variable | - | - | - |

A bar chart which displays the frequency of observations in a | Histogram | - | - | - |

The arithmetic average of all values of a variable in the da | Mean | - | - | - |

A common measure that best describes the middle or center of | Measures of Central Tendency | - | - | - |

Measures that provide information that describes individual | Measures of Dispersion | - | - | - |

The value of the variable for the middle observation when ob | Median | - | - | - |

The peak in a distribution. | Mode | - | - | - |

The most repeated value | Mode | - | - | - |

A variable that is itself a number (e.g. income, age, # of k | Numerical Variable | - | - | - |

The unit of analysis; e.g.: an individual, a political juris | Observation | - | - | - |

A statistical observation that is markedly different in valu | Outlier | - | - | - |

The collection of ALL relevant observations. | Population | - | - | - |

A collection of observations comprising only a subset of th | Sample | - | - | - |

Graphs each observation as one dot, using the X-axis for one | Scatter Plot | - | - | - |

Any number that is a function of the data | Statistic | - | - | - |

A distribution with symmetric tails. | Symmetric Distribution | - | - | - |

The diminishing ends of a distribution. | Tails | - | - | - |

A distribution with one peak. | Unimodal Distribution | - | - | - |

A characteristic recorded for each observation. | Variable | - | - | - |

The average of squared deviations from the mean | Variance | - | - | - |

Gives the number below which X% of the data lies. | Xth percentile | - | - | - |

Measures how far an observation is from the mean of the dist | Z-Score | - | - | - |
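Several of the terms above (mean, median, variance, coefficient of variation, Z-score, covariance, correlation coefficient) are easier to remember once computed by hand. The following is a minimal Python sketch, not taken from the slides. It assumes the population form of the variance (dividing by n), which matches the wording "the average of squared deviations from the mean"; many textbooks and libraries divide by n - 1 for samples instead. The function names and the two small data lists (`hours`, `scores`) are made up for illustration.

```python
import math


def mean(xs):
    # Arithmetic average of all values.
    return sum(xs) / len(xs)


def median(xs):
    # Middle value after sorting; average of the two middle values if n is even.
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def variance(xs):
    # Average of squared deviations from the mean (population form, divides by n).
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def std_dev(xs):
    # Standard deviation: square root of the variance.
    return math.sqrt(variance(xs))


def coefficient_of_variation(xs):
    # Ratio of the standard deviation to the mean.
    return std_dev(xs) / mean(xs)


def z_score(x, xs):
    # Distance of x from the mean, measured in standard deviations.
    return (x - mean(xs)) / std_dev(xs)


def covariance(xs, ys):
    # Average product of paired deviations from each variable's mean.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    # Covariance divided by both standard deviations, so r lies in [-1, 1].
    return covariance(xs, ys) / (std_dev(xs) * std_dev(ys))


# Hypothetical data: scores is an exact linear function of hours (2 * hours + 1).
hours = [2, 4, 4, 4, 5, 5, 7, 9]
scores = [5, 9, 9, 9, 11, 11, 15, 19]

print(mean(hours))                      # 5.0
print(median(hours))                    # 4.5
print(variance(hours))                  # 4.0
print(std_dev(hours))                   # 2.0
print(coefficient_of_variation(hours))  # 0.4
print(z_score(9, hours))                # 2.0
print(covariance(hours, scores))        # 8.0
print(correlation(hours, scores))       # 1.0
```

Because `scores` is a perfect linear function of `hours`, the correlation coefficient comes out to 1. The covariance (8.0) depends on the units of both variables, which is why the correlation coefficient is the easier number to compare across datasets.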

