Suggested readings: Sections 1.1 and 1.2 of OpenIntro Statistics with Randomization and Simulation
Learning Objective (LO) 1. Identify variables as numerical and categorical.
If the variable is numerical, further classify it as continuous or discrete: a continuous variable can take on any value within a range (infinitely many possible values), whereas a discrete variable takes only distinct, countable values such as non-negative whole numbers. If the variable is categorical, determine whether it is ordinal, that is, whether its levels have a natural ordering.
LO 2. Define associated variables as variables that show some relationship with one another. Further categorize this relationship as positive or negative association, when possible.
LO 3. Define variables that are not associated as independent.
Test yourself: Give one example of each type of variable you have learned.
Suggested readings: Sections 1.3 and 1.5 of OpenIntro Statistics with Randomization and Simulation
LO 4. Identify the explanatory variable in a pair of variables as the variable suspected of affecting the other. However, note that labeling variables as explanatory and response does not guarantee that the relationship between the two is actually causal, even if there is an association identified between the two variables.
LO 5. Classify a study as observational or experimental, then determine and explain whether the study’s results can be generalized to the population and whether the results suggest correlation or causation between the quantities studied.
If random sampling has been employed in data collection, the results should be generalizable to the target population. If random assignment has been employed in study design, the results suggest causality.
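The difference between random sampling and random assignment can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the population of 1,000 ID numbers, the sample size of 100, and the seed are all assumptions made up for the example.

```python
import random

rng = random.Random(42)  # fixed seed (an arbitrary choice) so the sketch is reproducible

# Hypothetical population: 1,000 people identified by ID number.
population = list(range(1000))

# Random sampling: every member has the same chance of being selected,
# which is what supports generalizing results to the population.
sample = rng.sample(population, 100)

# Random assignment: shuffle the sampled subjects, then split them into
# treatment and control groups, which is what supports causal conclusions.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment = shuffled[:50]
control = shuffled[50:]

print(len(sample), len(treatment), len(control))
```

A study can use either step without the other: sampling decides who is in the study, assignment decides who receives which treatment.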
LO 6. Question confounding variables and sources of bias in a given study.
Test yourself: Describe when a study’s results can be generalized to the population at large and when causation can be inferred.
Explain why random sampling allows for generalizability of results.
Explain why random assignment allows for making causal conclusions.
Suggested reading: Section 1.6 of OpenIntro Statistics with Randomization and Simulation
LO 7. Use scatterplots for describing the relationship between two numerical variables, making sure to note the direction (positive or negative), form (linear or non-linear), and the strength of the relationship as well as any unusual observations that stand out.
LO 8. When describing the distribution of a numerical variable, mention its shape, center, and spread, as well as any unusual observations.
LO 9. Note that there are three commonly used measures of center and three of spread:
center: mean (the arithmetic average), median (the midpoint), mode (the most frequent observation).
spread: standard deviation (variability around the mean), range (max-min), interquartile range (middle 50% of the distribution).
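As a quick illustration, the six measures above can be computed with Python's standard `statistics` module. The data values are made up for the example, and the quartiles use the module's default method; other software may use a slightly different quartile convention and report a slightly different IQR.

```python
import statistics

# Hypothetical data set, invented for illustration.
data = [2, 3, 3, 5, 7, 8, 9, 12]

# Measures of center
mean = statistics.mean(data)      # arithmetic average: 49 / 8 = 6.125
median = statistics.median(data)  # midpoint of the two middle values: (5 + 7) / 2
mode = statistics.mode(data)      # most frequent observation: 3

# Measures of spread
sd = statistics.stdev(data)                   # sample standard deviation
data_range = max(data) - min(data)            # max - min: 12 - 2 = 10
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points (default method)
iqr = q3 - q1                                 # width of the middle 50%

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sd={sd:.2f}, range={data_range}, IQR={iqr}")
```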
LO 10. Identify the shape of a distribution as symmetric, right skewed, or left skewed, and unimodal, bimodal, multimodal, or uniform.
LO 11. Use histograms and box plots to visualize the shape, center, and spread of numerical distributions, and intensity maps for visualizing the spatial distribution of the data.
LO 12. Define a robust statistic (e.g. median, IQR) as a statistic that is not heavily affected by skewness and extreme outliers, and determine when such statistics are more appropriate measures of center and spread compared to other similar statistics.
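A small sketch of what robustness means in practice: add one extreme value to a data set and compare how far the mean and the median move. The income figures (in thousands) are hypothetical, chosen only to make the contrast visible.

```python
import statistics

# Hypothetical incomes in thousands, invented for illustration.
incomes = [30, 32, 35, 38, 40, 41, 45]
with_outlier = incomes + [1000]  # add a single extreme observation

# How much does each measure of center move when the outlier is added?
mean_shift = statistics.mean(with_outlier) - statistics.mean(incomes)
median_shift = statistics.median(with_outlier) - statistics.median(incomes)

print(f"mean shifts by {mean_shift:.1f}")
print(f"median shifts by {median_shift}")
```

The mean is pulled far toward the extreme value while the median barely changes, which is why the median (and likewise the IQR) is preferred for skewed distributions or data with extreme outliers.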
Suggested reading: Section 1.7 of OpenIntro Statistics with Randomization and Simulation
LO 13. Use frequency tables and bar plots to describe the distribution of one categorical variable.
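A frequency table, with a rough text version of a bar plot, might be built as follows. This is only a sketch: the survey responses are invented, and a real bar plot would normally be drawn with a plotting library rather than with `#` characters.

```python
from collections import Counter

# Hypothetical responses to a survey question, invented for illustration.
responses = ["yes", "no", "yes", "unsure", "yes", "no"]

counts = Counter(responses)   # frequency of each level
total = sum(counts.values())  # number of observations

# Print level, count, proportion, and a crude text bar, most frequent first.
for level, count in counts.most_common():
    proportion = count / total
    bar = "#" * count
    print(f"{level:<8}{count:>3} {proportion:6.2f} {bar}")
```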
LO 14. Use side-by-side box plots for assessing the relationship between a numerical and a categorical variable.
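A side-by-side box plot draws one five-number summary (minimum, first quartile, median, third quartile, maximum) per level of the categorical variable. The sketch below computes those summaries without drawing anything. The group names and scores are hypothetical, `five_number_summary` is a helper defined here for illustration, and the quartiles follow the default method of Python's `statistics.quantiles`.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


# Hypothetical exam scores (numerical) by section (categorical).
scores_by_group = {
    "A": [55, 60, 62, 65, 70, 71, 75],
    "B": [68, 72, 75, 78, 80, 85, 90],
}

# One summary per group: these are the values each box would display.
summaries = {
    group: five_number_summary(scores)
    for group, scores in scores_by_group.items()
}

for group, summary in summaries.items():
    print(group, summary)
```

Comparing the medians and the widths of the boxes (Q3 minus Q1) across groups is how such a plot is used to judge whether the numerical variable appears related to the categorical one.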