Suggested readings: Sections 1.1 and 1.2 of OpenIntro Statistics with Randomization and Simulation
Learning Objective (LO) 1. Identify variables as numerical or categorical.
If the variable is numerical, further classify it as continuous if it can take on any of an infinite number of values, or as discrete if it can take on only non-negative whole numbers.
If the variable is categorical, determine if it is ordinal based on whether or not the levels have a natural ordering.
LO 2. Define associated variables as variables that show some relationship with one another. Further categorize this relationship as positive or negative association, when possible.
LO 3. Define variables that are not associated as independent.
Test yourself: Give one example of each type of variable you have learned.
Suggested readings: Sections 1.3 and 1.5 of OpenIntro Statistics with Randomization and Simulation
LO 4. Identify the explanatory variable in a pair of variables as the variable suspected of affecting the other. However, note that labeling variables as explanatory and response does not guarantee that the relationship between the two is actually causal, even if there is an association identified between the two variables.
LO 5. Classify a study as observational or experimental, then determine and explain whether the study’s results can be generalized to the population and whether the results suggest correlation or causation between the quantities studied.
If random sampling has been employed in data collection, the results should be generalizable to the target population.
If random assignment has been employed in study design, the results suggest causality.
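The difference between random sampling and random assignment can be illustrated with a short Python sketch. The population of 1,000 ID numbers, the sample size of 40, and the even split into two groups are all hypothetical choices made only for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: ID numbers for a population of 1,000 units.
population = list(range(1000))

# Random sampling: each unit has the same chance of being selected,
# which is what supports generalizing results to the population.
sample = rng.sample(population, 40)

# Random assignment: shuffle the sampled units and split them into
# treatment and control groups, which is what supports causal conclusions.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:20], shuffled[20:]

print(len(sample), len(treatment), len(control))
```

A study can use either step without the other: an observational study may sample randomly but never assign treatments, while an experiment on volunteers assigns randomly but does not sample randomly.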
LO 6. Question confounding variables and sources of bias in a given study.
Test yourself: Describe when a study’s results can be generalized to the population at large and when causation can be inferred.
Explain why random sampling allows for generalizability of results.
Explain why random assignment allows for making causal conclusions.
Suggested reading: Section 1.6 of OpenIntro Statistics with Randomization and Simulation
LO 7. Use scatterplots for describing the relationship between two numerical variables, making sure to note the direction (positive or negative), form (linear or non-linear), and the strength of the relationship as well as any unusual observations that stand out.
LO 8. When describing the distribution of a numerical variable, mention its shape, center, and spread, as well as any unusual observations.
LO 9. Note that there are three commonly used measures of center and spread:
center: mean (the arithmetic average), median (the midpoint), mode (the most frequent observation).
spread: standard deviation (variability around the mean), range (max-min), interquartile range (middle 50% of the distribution).
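As a rough illustration, the measures of center and spread listed above can be computed with Python's standard `statistics` module. The data values are made up, and the quartiles use the module's default method; other software may use a slightly different quartile convention and report a different IQR.

```python
import statistics

# Hypothetical small data set (already sorted for readability).
data = [2, 3, 3, 4, 5, 6, 12]

# Measures of center
mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # midpoint of the sorted values
mode = statistics.mode(data)      # most frequent observation

# Measures of spread
sd = statistics.stdev(data)         # sample standard deviation
data_range = max(data) - min(data)  # max - min
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default method
iqr = q3 - q1                       # width of the middle 50%

print(mean, median, mode, round(sd, 2), data_range, iqr)
```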
LO 10. Identify the shape of a distribution as symmetric, right skewed, or left skewed, and unimodal, bimodal, multimodal, or uniform.
LO 11. Use histograms and box plots to visualize the shape, center, and spread of numerical distributions, and intensity maps for visualizing the spatial distribution of the data.
LO 12. Define a robust statistic (e.g. median, IQR) as a statistic that is not heavily affected by skewness and extreme outliers, and determine when such statistics are more appropriate measures of center and spread compared to other similar statistics.
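One way to see what "robust" means is to add a single extreme value to a data set and compare how far the mean and the median move. The sketch below uses invented income figures (in thousands) purely for illustration.

```python
import statistics

# Hypothetical incomes in thousands; the last value added is an extreme outlier.
incomes = [30, 32, 35, 38, 40]
with_outlier = incomes + [400]

# How much does each measure of center move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(incomes)
median_shift = statistics.median(with_outlier) - statistics.median(incomes)

print(round(mean_shift, 2), median_shift)
```

The mean moves by tens of thousands while the median barely changes, which is why the median (and the IQR) are usually preferred for skewed distributions or data with extreme outliers.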
Suggested reading: Section 1.7 of OpenIntro Statistics with Randomization and Simulation
LO 13. Use frequency tables and bar plots to describe the distribution of one categorical variable.
LO 14. Use side-by-side box plots for assessing the relationship between a numerical and a categorical variable.
Suggested reading: Section 3.1 of OpenIntro Statistics with Randomization and Simulation
Learning Objective (LO) 1. Define population proportion (parameter) and sample proportion (point estimate).
LO 2. Calculate the sampling variability of the proportion, the standard error, as $SE = \sqrt{\frac{p(1-p)}{n}}$, where $p$ is the population proportion. Note that when the population proportion $p$ is not known (almost always), this can be estimated using the sample proportion, $\hat{p}$.
LO 3. Recognize that the Central Limit Theorem (CLT) is about the distribution of point estimates, and that given certain conditions, this distribution will be nearly normal. In the case of the proportion, the CLT tells us that if the observations in the sample are independent and the sample size is sufficiently large (checked using the success/failure condition: $np \ge 10$ and $n(1-p) \ge 10$), then the distribution of the sample proportion will be nearly normal, centered at the true population proportion and with a standard error of $\sqrt{\frac{p(1-p)}{n}}$.
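The standard error and the success/failure check from LO 2 and LO 3 can be sketched in Python as follows. The function names are hypothetical, and the threshold of 10 expected successes and failures is assumed here as the usual textbook cutoff.

```python
import math

def proportion_se(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def success_failure_ok(p, n, threshold=10):
    """Check that expected successes and failures both reach the threshold.

    The default threshold of 10 is an assumption (the common textbook cutoff).
    """
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothetical example: a sample proportion of 0.6 from n = 100 observations,
# used in place of the unknown population proportion.
p_hat, n = 0.6, 100
se = proportion_se(p_hat, n)
print(round(se, 4), success_failure_ok(p_hat, n))
```

With a small proportion such as 0.02 and the same sample size, the check fails because only about 2 successes are expected, so the normal approximation would not be trusted.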
LO 4. Remember that confidence intervals are calculated as $\text{point estimate} \pm \text{margin of error}$ and test statistics are calculated as $\frac{\text{point estimate} - \text{null value}}{\text{standard error}}$.
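A minimal Python sketch of these two general recipes is shown below. It assumes the margin of error is a critical value times the standard error, with the critical value taken from the standard normal distribution (appropriate when the CLT conditions hold). The function names and the numbers in the example are hypothetical.

```python
from statistics import NormalDist

def confidence_interval(point_estimate, se, conf_level=0.95):
    """Return (lower, upper) for point estimate +/- margin of error.

    Assumes margin of error = z* x SE, with z* from the standard normal.
    """
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    margin = z_star * se
    return point_estimate - margin, point_estimate + margin

def standardized_statistic(point_estimate, null_value, se):
    """Test statistic: (point estimate - null value) / standard error."""
    return (point_estimate - null_value) / se

# Hypothetical numbers: point estimate 0.6, standard error 0.05, null value 0.5.
lower, upper = confidence_interval(0.6, 0.05)
z = standardized_statistic(0.6, 0.5, 0.05)
print(round(lower, 3), round(upper, 3), round(z, 2))
```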
Suggested reading: Section 3.2 of OpenIntro Statistics with Randomization and Simulation
Suggested reading: Section 4.1 of OpenIntro Statistics with Randomization and Simulation
LO 10. Use the $t$-distribution for inference on a single mean, difference of paired (dependent) means, and difference of independent means.
LO 11. Explain why the $t$-distribution helps make up for the additional variability introduced by using $s$ (sample standard deviation) in calculation of the standard error, in place of $\sigma$ (population standard deviation).
LO 12. Describe how the $t$-distribution is different from the normal distribution, and what “heavy tail” means in this context.
LO 13. Note that the $t$-distribution has a single parameter, degrees of freedom, and as the degrees of freedom increases this distribution approaches the normal distribution.
Suggested reading: Section 4.2 of OpenIntro Statistics with Randomization and Simulation
LO 17. Define observations as paired if each observation in one data set has a special correspondence or connection with exactly one observation in the other data set.
LO 18. Carry out inference for paired data by first subtracting the paired observations from each other, and then treating the set of differences as a new numerical variable on which to do inference (such as a confidence interval or hypothesis test for the average difference).
LO 19. Calculate the standard error of the difference between means of two paired (dependent) samples as $SE = \frac{s_{diff}}{\sqrt{n_{diff}}}$ and use this standard error in hypothesis testing and confidence intervals comparing means of paired (dependent) groups.
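The paired-data workflow in LO 18 and LO 19 can be sketched in Python as follows. The before/after measurements are invented, the null value of 0 (no average difference) is an assumption, and the degrees of freedom use the usual one-sample convention of the number of differences minus 1.

```python
import math
import statistics

# Hypothetical paired measurements on the same five subjects.
before = [12, 15, 11, 14, 13]
after = [14, 18, 12, 17, 14]

# Step 1: reduce the pairs to a single variable of differences.
diffs = [a - b for a, b in zip(after, before)]

# Step 2: treat the differences as one numerical sample.
mean_diff = statistics.mean(diffs)
s_diff = statistics.stdev(diffs)           # sample SD of the differences
se_diff = s_diff / math.sqrt(len(diffs))   # SE = s_diff / sqrt(n_diff)

# Step 3: test statistic against a null value of 0 (assumed here).
t_stat = (mean_diff - 0) / se_diff
df = len(diffs) - 1                        # assumed convention: n_diff - 1

print(diffs, mean_diff, round(se_diff, 3), round(t_stat, 3), df)
```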
LO 21. Recognize that a good interpretation of a confidence interval for the difference between two parameters includes a comparative statement (mentioning which group has the larger parameter).
Suggested reading: Section 4.3 of OpenIntro Statistics with Randomization and Simulation
LO 23. Calculate the standard error of the difference between means of two independent samples as $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ and use this standard error in hypothesis testing and confidence intervals comparing means of independent groups.
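For two independent samples, the standard error combines the variability of both groups: the square root of the sum of each sample variance divided by its sample size. The Python sketch below uses made-up data, a hypothetical function name, and an assumed null value of 0 for the difference in means.

```python
import math
import statistics

def se_independent(sample_1, sample_2):
    """SE of the difference in means: sqrt(s1^2 / n1 + s2^2 / n2)."""
    var_1 = statistics.variance(sample_1)  # sample variance s1^2
    var_2 = statistics.variance(sample_2)  # sample variance s2^2
    return math.sqrt(var_1 / len(sample_1) + var_2 / len(sample_2))

# Hypothetical independent groups (sizes do not need to match).
group_1 = [5, 7, 9, 11]
group_2 = [4, 6, 8]

point_estimate = statistics.mean(group_1) - statistics.mean(group_2)
se = se_independent(group_1, group_2)

# Test statistic against a null value of 0 (assumed here).
t_stat = (point_estimate - 0) / se

print(point_estimate, round(se, 3), round(t_stat, 3))
```

Unlike the paired case, the observations are not matched up, so the two groups are summarized separately before being combined.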
LO 24. Use a $t$-statistic, with degrees of freedom $df = \min(n_1 - 1, n_2 - 1)$, for inference for the difference in two independent means: