Students look for linear relationships in demographic data about US states using scatter plots in Pyret. Emphasis is placed on forming hypotheses first and then testing them with scatter plots, rather than making plots without thinking about what they might show.
Lesson Goals |
Students will be able to…
|
Student-facing Lesson Goals |
|
Materials |
|
Key Points For The Facilitator |
|
Exploring the Data
Overview
Students explore relationships between columns in the State Demographics dataset and practice defining rows in Pyret.
Launch
Let’s think back to our Fitting Models lesson…
-
What kind of data does the age variable represent? What about pounds?
-
Both age and pounds are quantitative variables.
-
What kind of data visualization helped us to analyze the relationship between weight and adoption time?
-
A scatter plot, because it shows the relationship between two quantitative variables
-
When we fit a model to the scatter plot, what measure did we use to determine how well it fit the lizard data?
-
We used S, the standard deviation of the residuals, to measure how well the model fit.
-
When comparing models for a given dataset, the model with the lowest S makes predictions with the least error.
We’re going to be working with a dataset about the states in the US. Let’s pick a few states to keep an eye out for as we work.
-
What states should we focus on besides our own?
-
Our neighbors!
-
A state we’ve always wanted to visit!
-
Solicit other ideas…
The dataset we are going to be working with locates each state within a region of the United States. Cartographers aren’t in total agreement about how best to describe regions of the U.S.
-
What would you call the region we live in?
-
Examples: New England, West Coast, Southeast…
-
What other states are in this region?
-
Answers will vary…
Come to a consensus about which states your students will explore. When more students are looking into the same data, you’ll find much richer class discussions! If students aren’t familiar with neighboring states, here’s a useful map!
If your students strongly disagree with how the dataset categorizes what region your state is a part of, please let us know!
-
Open the State Demographics Starter File and save a copy that’s just for you. Then click "Run".
-
Turn to Exploring the States Dataset and take a minute to record your Notices and Wonders in the table at the top.
-
What did you Notice?
-
What did you Wonder?
-
Which column in this dataset will we generally use as our identifier column?
-
state
-
Which columns in this dataset are categorical?
-
region, pop-trend, poverty-rate
-
Which columns in this dataset have to do with wealth?
-
pct-in-poverty, poverty-rate, median-income, per-capita-income
-
Which columns in this dataset are about education levels?
-
pct-college-or-higher, pct-hs-or-higher
-
With a partner, complete Exploring the States Dataset.
-
What did you learn about defining rows in Pyret?
-
Example: x = row-n(states-table, 0) will make the name x have the value of the first row in the table (the index starts at zero!).
-
How would you define a name y to be the value of the second row in the table? The third?
-
y = row-n(states-table, 1) for the second row. Change the 1 to a 2 for the third.
-
Would a model built from two states with low median-income be likely to fit the rest of the data well? Why or why not?
-
No! This is a particular subset of the data with shared characteristics (also called a grouped sample) and is unlikely to be representative of the pattern in the full dataset.
In math, x = 4 will define a variable x to be the value 4.
Any time we see x after it’s been defined, we can substitute in the value of 4.
This works in Pyret, too. But in Pyret, values can be more than just numbers!
In this file, the variables alabama and alaska are defined as rows from the table.
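The idea of naming a row can be sketched in plain Pyret as follows. This is a minimal illustration, not the starter file itself: the small states-table below and its values are made up, and the row-n helper is an assumed stand-in for the function of the same name that the starter file provides (here it simply wraps Pyret's built-in .row-n table method).

```pyret
# Hypothetical stand-in for the starter file's states-table.
# The state names and numbers are made up for illustration only.
states-table = table: state, region, median-income
  row: "State A", "Region 1", 50000
  row: "State B", "Region 2", 60000
  row: "State C", "Region 1", 70000
end

# Assumed helper that mirrors the row-n(table, index) call used in the lesson.
# It wraps Pyret's built-in .row-n method on tables.
fun row-n(t, n):
  t.row-n(n)
end

# Define names for rows. Indexing starts at zero.
x = row-n(states-table, 0) # first row
y = row-n(states-table, 1) # second row
z = row-n(states-table, 2) # third row

# Once a name is defined, it can be used wherever the row could be used,
# for example to look up a value in one of its columns.
x["state"]
y["median-income"]
```

Just as x = 4 lets us substitute 4 wherever x appears, each name here can be substituted with the row it stands for.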
Debrief the rest of the page with students.
Investigate
-
With your partner, make a prediction: Identify two pairs of quantitative columns from the list in the Definitions Area of State Demographics Starter File that you think might have a relationship.
-
Record your reasoning in questions 1 and 2 of Looking for Patterns.
Exploring the States Dataset
The State Demographics Starter File has a lot of interesting data, and endless possible combinations of columns to explore. But randomly smashing columns together in a scatter plot is not the habit we want students to cultivate! Instead, make sure students are actually talking with their partners about why two columns may or may not be related.
Making sense: can students predict these relationships, and explain their thinking?
(If so, probably not worth having them spend time on more than one of them!)
-
pop-2010 vs. pop-2020
-
pop-2020 vs. num-households
-
num-housing-units vs. num-households
-
num-households vs. num-veterans
The District of Columbia: DC often shows up as an outlier or extreme value. But why?
The dataset is designed so that students will quickly begin searching for relationships between varying levels of education and income, and there are linear relationships in each of them. Here are a few relationships to spark students' interest.
-
pct-college-or-higher vs. pct-in-poverty
-
median-income vs. pct-college-or-higher
-
median-income vs. pct-home-owners
-
pct-college-or-higher vs. pct-home-owners
-
pct-home-owners vs. num-housing-units
-
median-income vs. per-capita-income
-
What columns did you decide might have relationships? Why?
-
Ideally students will have identified at least one pair of columns that connect income and education.
-
We can only look for relationships between quantitative columns, so make sure students are not trying to work with categorical columns.
-
Complete Looking for Patterns
-
As you work, keep an eye out for what you can learn about the states we decided to focus on.
-
How did your predictions compare to the scatter plots you made in Pyret?
-
Which columns appear to have the strongest relationships?
-
Answers will vary. Some contenders include:
-
positive relationship: pct-college-or-higher and per-capita-income
-
negative relationship: pct-in-poverty and median-income
-
strong, but not particularly interesting:
-
pop-2010 and pop-2020
-
per-capita-income and median-income
-
What did you learn about the states we decided to keep an eye out for?
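For facilitators who want to see the mechanics outside the starter file, here is a minimal sketch of plotting one quantitative column against another. It uses Pyret's general-purpose chart library rather than the plotting function supplied by the starter file, which is an assumption made to keep the example self-contained. The table name sample-table and all of its values are made up; only the column names come from the dataset.

```pyret
import chart as C

# Hypothetical stand-in table. These values are made up for illustration
# and do not come from the State Demographics dataset.
sample-table = table: state, pct-college-or-higher, median-income
  row: "State A", 25, 50000
  row: "State B", 30, 58000
  row: "State C", 35, 64000
  row: "State D", 40, 73000
end

# Extract the two quantitative columns as lists of numbers.
xs = sample-table.get-column("pct-college-or-higher")
ys = sample-table.get-column("median-income")

# Build a scatter plot series and label the chart and its axes.
series = C.from-list.scatter-plot(xs, ys)
chart-window = C.render-chart(series).title("Education vs. income (made-up data)").x-axis("pct-college-or-higher").y-axis("median-income")

# Show the chart.
chart-window.display()
```

Both columns must be quantitative. Passing a categorical column such as region would not give a meaningful scatter plot.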
Synthesize
-
Why did we use scatter plots for our exploration of this dataset?
-
Because we were looking for relationships between columns
-
Share your scatter plots with one another, perhaps by copying and pasting them into a shared document and then labeling them.
-
Did you and your classmates use similar words to describe the scatter plots you came up with? If so, what were they?
Note: Students will acquire the formal vocabulary that data scientists use to assess relationships in Building Linear Models, which is all about identifying form, direction, and strength.
These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927).
Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.