What is Statistics?
All of these questions reflect recent headlines from various news sites and all of them depend upon statistics for their answers.
Statistics encompasses methods for data collection, analysis, and interpretation. Statistical methods influence our lives daily. They help to determine the medicines that are available to us, the ads that we get from the grocery store, and the websites our search engines refer us to. They are essential to weather forecasts, our insurance rates, and the quality control of the products we buy. We use them to keep track of our health, the economy, and social issues such as race and gender equality, crime rates, and poverty. We see them in the form of graphics and charts, means and percentages, conclusions and predictions in academic research, media reports, and advertisements.
StatsStuff covers basic statistical methods in the following areas:
- Data Collection: This involves collecting data in such a way as to reduce biases and to facilitate the production of accurate and reliable conclusions.
- Data Summary: These methods are used to create numerical and visual descriptions of data that enable us to grasp key information without looking at the data values in detail.
- Probability Theory: Probability is the mathematics that underlies the science of statistics.
- Statistical Inference: This is the process used to draw conclusions about large populations based on well-chosen subsets, or samples.
StatsStuff's purpose is to introduce basic ideas and concepts and to help site users to become more aware of how statistical methods influence their lives.
Statistics and Variation
The field of Statistics is largely concerned with describing variation in data, that is, differences between subjects in the group or groups being studied. Variation in data can arise from a number of sources:
- Natural variation: differences that exist inherently between subjects, such as differences in size, weight, or color.
- Induced variation: differences that exist because the researcher introduced them. For instance, a researcher induces variation in a drug study by giving some subjects a treatment and some a placebo.
- Measurement variation: differences due to inadequacies in the tools or methods we use to create measurements.
- Sampling variation: differences that exist due to the many possibilities for choosing a sample.
For a more in-depth description of sources of variation, see the GAISE report, page 11.
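Sampling variation in particular is easy to see in a small simulation. The sketch below is only an illustration under stated assumptions: the population (the whole numbers 1 through 100), the sample size, and the helper name `sample_means` are all hypothetical choices, not part of StatsStuff or the GAISE report. It draws several random samples from the same population and shows that each sample gives a somewhat different mean.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical population: the whole numbers 1 through 100.
population = list(range(1, 101))

means = sample_means(population, sample_size=10, num_samples=5)
print("Population mean:", statistics.mean(population))
print("Sample means:   ", means)
```

The population never changes, yet the sample means differ from one another and from the population mean. That spread is the sampling variation described in the list above.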
In Statistics, we describe the variation of a particular variable by its distribution. A distribution can describe a variable related to observed data (such as how much money each person in a sample spends on fast food) or to an experiment (such as the outcomes of a die roll). Essentially, a distribution indicates the possible values of a variable and their associated frequencies or probabilities. Distributions can be displayed as formulas, tables, or graphs. Distributions will be important throughout our discussion of statistics.
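For example, the distribution of the outcome of a fair six-sided die roll can be written as a formula:
$\small{P(\text{outcome} = x) = \frac{1}{6}}$ for $\small{x = 1, 2, 3, 4, 5, 6}$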
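The same die-roll distribution can also be displayed as a table. The following is a minimal sketch, assuming a fair six-sided die; the function names `die_distribution` and `empirical_frequencies` are hypothetical helpers introduced here for illustration. It lists the theoretical probability of each outcome next to the relative frequency observed in a batch of simulated rolls.

```python
import random
from collections import Counter
from fractions import Fraction

def die_distribution(sides=6):
    """Theoretical distribution of a fair die: each face has probability 1/sides."""
    return {x: Fraction(1, sides) for x in range(1, sides + 1)}

def empirical_frequencies(num_rolls, sides=6, seed=0):
    """Relative frequency of each face in num_rolls simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    counts = Counter(rng.randint(1, sides) for _ in range(num_rolls))
    return {x: counts[x] / num_rolls for x in range(1, sides + 1)}

theoretical = die_distribution()
observed = empirical_frequencies(num_rolls=6000)

# Print the distribution as a small table: value, probability, observed frequency.
print("x  P(x)    observed")
for x in theoretical:
    print(f"{x}  {float(theoretical[x]):.4f}  {observed[x]:.4f}")
```

The observed frequencies will generally sit close to, but not exactly at, 1/6. The formula gives the probabilities for the experiment, while the simulated rolls play the role of observed data.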