What factors make some people live longer than others? Are more expensive restaurants really better? Is voter fraud a problem? What data would you need to gather to answer these questions, and how would you measure that data to get your answer? Answering real questions in the world involves analyzing datasets, from sports stats to food sales to census information.

In Bootstrap:Data Science, students form their own questions about the world around them, analyze data using multiple methods, and write a research paper about their findings. The module covers functions, looping and iteration, data visualization, linear regression, and more. Social studies, science, and business teachers can use this module to help students make inferences from data. Math teachers can use it to introduce foundational concepts in statistics. The module is aligned to the Data standards in CS Principles.

The final project in Bootstrap:Data Science can be used as the Create Task for AP CS Principles!

We provide all of our materials free of charge, to anyone who is interested in using our lesson plans or student workbooks.

Lesson Plans

Introduction to Computational Data Science

Students are introduced to the Animals Dataset, learn about Tables, Categorical and Quantitative data, and consider the kinds of questions that can be asked about a dataset.

Starting to Program

Students begin to program in Pyret, learning about basic datatypes, operations, and value definitions.

Applying Functions

Students learn how to apply Functions, and how to interpret the information contained in a Contract: Name, Domain and Range. They then use this knowledge to explore more of the Pyret language.

Displaying Categorical Data

Students learn to apply functions to entire Tables, generating pie charts and bar charts. They then explore other plotting and display functions that are part of the Data Science library.

Data Displays and Lookups

Students continue to practice making different kinds of data displays, this time focusing less on programming and more on using displays to answer questions. They also learn how to extract individual rows from a table, and columns from a row.

Defining Functions

Students learn a structured approach to problem solving called the “Design Recipe”. They then use it to define functions that create images, and learn how to apply those functions to enhance their scatterplots.

Table Methods

Students learn about table methods, which allow them to order, filter, and build columns to extend the animals table.

Defining Table Functions

Students continue practicing the Design Recipe, writing helper functions to filter rows and build columns in the Animals Dataset, using Methods.

Method Chaining

Students continue practicing their Design Recipe skills, making lots of simple functions dealing with the Animals Dataset. Then they learn how to chain Methods together, and define more sophisticated subsets.

If-Expressions

Students build on their knowledge of the image-scatter-plot function, motivating the need for if-expressions in their programming toolkit. This drives deeper insight into subgroups within a population, and motivates the need for more advanced analysis.

Randomness and Sample Size

Students learn about random samples and statistical inference, as applied to the Animals Dataset. In the process, students get a light introduction to the role of sample size and the importance of statistical inference.

Grouped Samples

Students learn about grouped samples, and practice creating them from the Animals Dataset. In the process, they practice using the Design Recipe to create filter functions, and come up with questions they wish to explore.

Choosing Your Dataset

Students summarize their dataset by exploring the data and identifying categorical and quantitative columns, datatypes, and more. They also define a few sample rows, random subsets, and logical subsets.

Histograms

Students explore new visualizations in Pyret, this time focusing on the distribution in a quantitative dataset. Students are introduced to Histograms by comparing them to bar charts, and learn to construct them by hand and in Pyret.

Visualizing the “Shape” of Data

Students explore the concept of “shape”, using histograms to determine whether a dataset has skewness, and what the direction of the skewness means. They apply this knowledge to the Animals Dataset, and then to their own.

Measures of Center

Students learn different ways to report the center of a quantitative data set: mean, median and mode(s). After applying these concepts to a contrived dataset, they apply them to their own datasets and interpret the results.
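
As a rough illustration of the three measures named above, here is a minimal sketch in Python rather than Pyret, the language the lessons use. The `weights` list and the `measures_of_center` helper are made up for this example and are not part of the course materials.

```python
from statistics import mean, median, multimode


def measures_of_center(values):
    """Return the mean, median, and mode(s) of a list of numbers."""
    return {
        "mean": mean(values),
        "median": median(values),
        # multimode returns a list, because a dataset can have several modes
        "modes": multimode(values),
    }


# Hypothetical quantitative column with one large outlier.
weights = [1, 2, 2, 3, 12]
center = measures_of_center(weights)
print(center)
```

In this made-up column the single large value pulls the mean above the median, which is the kind of difference students are asked to interpret.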

Spread of a Data Set

Students learn how to evaluate the spread of a quantitative column using box plots, and explore how this offers a different perspective on shape from what can be achieved with a histogram. After applying these concepts to a contrived dataset, they apply them to their own datasets and interpret the results.

Checking Your Work

Students consider the concept of trust and testing — how do we know if a particular analysis is trustworthy?

Scatter Plots

Students investigate scatter plots as a method of visualizing the relationship between two quantitative variables.

Correlations

Students continue to interpret scatter plots, and think about direction and strength of linear relationships.

Linear Regression

Students compute the “line of best fit” using linear regression, and summarize linear relationships in a dataset.
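
For readers who want to see what a “line of best fit” computation can look like, the following is a minimal least-squares sketch in Python. It is an illustration under stated assumptions, not the course's Pyret code: the function name `line_of_best_fit` and the sample points are hypothetical.

```python
def line_of_best_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = line_of_best_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The sketch assumes at least two distinct x-values; if every x is the same, `sxx` is zero and no single best-fit line exists.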

Ethics and Privacy

Students consider ethical issues and privacy in the context of data science.

Threats to Validity

Students consider possible threats to the validity of their analysis.

All the lessons

This is a single page that contains all the lessons listed above.

Other Resources

Of course, there’s more to a curriculum than software and lesson plans! We also provide a number of resources to educators, including standards alignment, a complete student workbook, an answer key for the programming exercises and a forum where they can ask questions and share ideas.

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap:Data Science by Emmanuel Schanzer, Nancy Pfenning, Emma Youndtsmith, Jennifer Poole, Shriram Krishnamurthi, Joe Politz, Ben Lerner, and Dorai Sitaram is licensed under a Creative Commons 4.0 Unported License. Based on a work at www.BootstrapWorld.org. Permissions beyond the scope of this license may be available by contacting schanzer@BootstrapWorld.org.