Exploring State Demographics

Students look for linear relationships in demographic data about US states using scatter plots in Pyret. The emphasis is on forming hypotheses first and then testing them with scatter plots, rather than making plots without first thinking about what they might show.

Lesson Goals

Students will be able to…

Represent data on two quantitative variables on a scatter plot and describe how the variables are related.
Read and interpret real world data presented in a scatter plot.
Identify situations that can be modeled by a linear relationship.

Student-facing Lesson Goals

Let’s explore demographic data from the US states and Washington, D.C.

Materials

Key Points For The Facilitator

Heads up: This lesson opens with an optional Q&A for those who have completed our Fitting Models lesson. If you haven’t taught that lesson, open with the states discussion that follows.
This lesson is the first of four in the linear models sequence, which establishes a structure we will revisit for exploring other kinds of relationships in data:
- identify possible relationships in a dataset
- build a model from samples
- fit a model
- improve upon the model

Preparation

If you are using our Google Slides, you will see the word "Optional" in the title of any slide that corresponds to an optional section of the lesson plan. Adjust the slides based on which portions of the lesson you will be doing with your students.
Bootstrap: Data Science introduces the concepts of Form, Direction and Strength in the lesson on Correlations. Some of our Algebra 2 teachers like to cover that material before teaching this lesson. Neither that lesson nor its terminology is a prerequisite for this lesson.

Exploring the Data

Overview

Students explore relationships between columns in the State Demographics dataset and practice defining rows in Pyret.

Launch

Let’s think back to our Fitting Models lesson…

What kind of data does the age variable represent? What about pounds?
Both age and pounds are quantitative variables.
What kind of data visualization helped us to analyze the relationship between weight and adoption time?
A scatter plot, because it shows the relationship between two quantitative variables.
When we fit a model to the scatter plot, what measure did we use to determine how well it fit the lizard data?
We used S — the Standard Deviation of the Residuals — to measure fitness.
When comparing models for a given dataset, the model with the lowest S makes predictions with the least error.

We’re going to be working with a dataset about the states in the US. Let’s pick a few states to keep an eye out for as we work.

What states should we focus on besides our own?
Our neighbors!
A state we’ve always wanted to visit!
Solicit other ideas…

The dataset we are going to be working with locates each state within a region of the United States. Cartographers aren’t in total agreement about how best to describe regions of the U.S.

What would you call the region we live in?
Examples: New England, West Coast, Southeast…
What other states are in this region?
Answers will vary…

Come to a consensus about which states your students will explore. When more students are looking into the same data, you’ll find much richer class discussions! If students aren’t familiar with neighboring states, here’s a useful map!

If your students strongly disagree with how the dataset categorizes what region your state is a part of, please let us know!

Open the State Demographics Starter File and save a copy that’s just for you. Then click "Run".
Turn to Exploring the States Dataset and take a minute to record your Notices and Wonders in the table at the top.

What did you Notice?
What did you Wonder?
Which column in this dataset will we generally use as our identifier column?
state
Which columns in this dataset are categorical?
region, pop-trend, poverty-rate
Which columns in this dataset have to do with wealth?
pct-in-poverty, poverty-rate, median-income, per-capita-income
Which columns in this dataset are about education levels?
pct-college-or-higher, pct-hs-or-higher

With a partner, complete Exploring the States Dataset.

What did you learn about defining rows in Pyret?
Example: x = row-n(states-table, 0) will make the name x have the value of the first row in the table (the index starts at zero!).
How would you define a name y to be the value of the second row in the table? The third?
y = row-n(states-table, 1) for the second row. Change the 1 to a 2 for the third.
Would a model built from two states with low median-income be likely to fit the rest of the data well? Why or why not?
No! This is a particular subset of the data with shared characteristics (also called a grouped sample) and is unlikely to be representative of the pattern in the full dataset.

In math, x = 4 will define a variable x to be the value 4.

Any time we see x after it’s been defined, we can substitute in the value of 4.

This works in Pyret, too. But in Pyret, values can be more than just numbers!

In this file, the variables alabama and alaska are defined as rows from the table.
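
The idea of naming values can be sketched as follows. This is a minimal illustration meant to be typed into the State Demographics Starter File after clicking "Run"; it assumes the file has already made states-table and row-n available, as in the examples above. The names four, first-row, and second-row are hypothetical and chosen only for this sketch.

```pyret
# A name can stand for a number...
four = 4
four + 1   # wherever `four` appears, Pyret substitutes the value 4

# ...or for an entire row of the table.
# ASSUMPTION: states-table and row-n are provided by the starter file.
# Row indexes start at 0, so index 0 is the first row and 1 is the second.
first-row = row-n(states-table, 0)
second-row = row-n(states-table, 1)

# Evaluating a name shows the value it stands for.
first-row
```

Descriptive names such as first-row are used here instead of x and y so that each definition says what it holds; either style works in Pyret.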

Debrief the rest of the page with students.

Investigate

With your partner, make a prediction: Identify two pairs of quantitative columns from the list in the Definitions Area of State Demographics Starter File that you think might have a relationship.
Record your reasoning in questions 1 and 2 of Looking for Patterns.

Exploring the States Dataset

The State Demographics Starter File has a lot of interesting data, and endless possible combinations of columns to explore. But randomly smashing columns together in a scatter plot is not the habit we want students to cultivate! Instead, make sure students are actually talking with their partners about why two columns may or may not be related.

Making sense: can students predict these relationships, and explain their thinking?
(If so, probably not worth having them spend time on more than one of them!)

pop-2010 vs. pop-2020
pop-2020 vs. num-households
num-housing-units vs. num-households
num-households vs. num-veterans

The District of Columbia: DC often shows up as an outlier or extreme value. But why?

The dataset is designed so that students will quickly begin searching for relationships between varying levels of education and income, and there are linear relationships in each of them. Here are a few relationships to spark students' interest.

pct-college-or-higher vs. pct-in-poverty
median-income vs. pct-college-or-higher
median-income vs. pct-home-owners
pct-college-or-higher vs. pct-home-owners
pct-home-owners vs. num-housing-units
median-income vs. per-capita-income

What columns did you decide might have relationships? Why?
Ideally students will have identified at least one pair of columns that connect income and education.
We can only look for relationships between quantitative columns, so make sure students are not trying to work with categorical columns.

Complete Looking for Patterns
As you work, keep an eye out for what you can learn about the states we decided to focus on.
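
For facilitators who want a reference while circulating, here is a sketch of what one student-made plot might look like. It assumes the starter file provides Bootstrap's scatter-plot function and that its arguments are, in order, the table, the column used to label each dot, the x-axis column, and the y-axis column. The column pairing is one of the suggestions listed above, not a required choice.

```pyret
# ASSUMPTION: run inside the State Demographics Starter File after
# clicking "Run", so that states-table and scatter-plot are available.
# Assumed argument order: table, label column, x column, y column.
scatter-plot(states-table, "state", "pct-college-or-higher", "median-income")
```

Using state as the label column should make it easier for students to find the states the class agreed to keep an eye on.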

How did your predictions compare to the scatter plots you made in Pyret?
Which columns appear to have the strongest relationships?
Answers will vary. Some contenders include:
positive relationship: pct-college-or-higher and per-capita-income
negative relationship: pct-in-poverty and median-income
strong, but not particularly interesting:
- pop-2010 and pop-2020
- per-capita-income and median-income
What did you learn about the states we decided to keep an eye out for?

Synthesize

Why did we use scatter plots for our exploration of this dataset?
Because we were looking for relationships between columns.
Share your scatter plots with one another. (Perhaps by copying and pasting scatter plots into a shared document and then labeling them?)
Did you and your classmates use similar words to describe the scatter plots you came up with? If so, what were they?

Note: Students will acquire the formal vocabulary that data scientists use to assess relationships in Building Linear Models, which is all about identifying form, direction, and strength.

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.