Grouped Samples

Standards in this Lesson

Common Core Math Standards

6.EE.B.6: Use variables to represent numbers and write expressions when solving a real-world or mathematical problem; understand that a variable can represent an unknown number, or, depending on the purpose at hand, any number in a specified set.
8.SP.A.1: Construct and interpret scatter plots for bivariate measurement data to investigate patterns of association between two quantities. Describe patterns such as clustering, outliers, positive or negative association, linear association, and nonlinear association.

CSTA Standards

2-AP-11: Create clearly named variables that represent different data types and perform operations on their values.
2-DA-08: Collect data using computational tools and transform the data to make it more useful and reliable.
2-DA-09: Refine computational models based on the data they have generated.

Oklahoma Standards

OK.3.AP.C.01: Create programs using a programming language that utilize sequencing, repetition, conditionals, and variables to solve a problem or express ideas both independently and collaboratively.
OK.6.D.1.3: Create and analyze box and whisker plots observing how each segment contains one quarter of the data.
OK.7.D.1.2: Use reasoning with proportions to display and interpret data in circle graphs (pie charts) and histograms. Choose the appropriate data display and know how to create the display using a spreadsheet or other graphing technology.
OK.8.DA.CVT.01: Develop, implement, and refine a process that utilizes computational tools to collect and transform data to make it more useful and reliable.
OK.8.DA.S.01: Analyze multiple methods of representing data and choose the most appropriate method for representing data.
OK.A1.D.1.1: Describe a data set using data displays, describe and compare data sets using summary statistics, including measures of central tendency, location, and spread. Know how to use calculators, spreadsheets, or other appropriate technology to display data and calculate summary statistics.
OK.L1.AP.A.01: Create a prototype that uses algorithms (e.g., searching, sorting, finding shortest distance) to provide a possible solution for a real-world problem.
OK.L1.IC.C.02: Test and refine computational artifacts to reduce bias and equity deficits.
OK.PA.A.2.2: Identify, describe, and analyze linear relationships between two variables.
OK.PA.D.1.1: Describe the impact that inserting or deleting a data point has on the mean and the median of a data set. Know how to create data displays using a spreadsheet and use a calculator to examine this impact.

Textbook Alignment

IM 7 Math™

IM.7.8.18: Comparing Populations Using Samples
IM.7.8.11: Comparing Groups

Practices in this Lesson

K12CS

P3: Recognizing and Defining Computational Problems

Science and Engineering

SEP.3: Planning and Carrying Out Investigations

Math

MP.3: Construct viable arguments and critique the reasoning of others
MP.2: Reason abstractly and quantitatively

(Also available in CODAP)

Students practice creating grouped samples (non-random subsets) and think about why it might sometimes be useful to answer questions about a dataset through the lens of one group or another.

Lesson Goals

Students will be able to…

Make grouped samples from a population

Student-facing Lesson Goals

Let’s combine what we know about sampling and filtering with creating displays.

Materials

Lesson Slides
Animals Starter File
Grouped Samples Starter File
Grouped Samples from the Animals Dataset
Displaying Data
Data Cycle: Analyzing Categorical Data
Samples from My Dataset
The Design Recipe
Classroom visual: Language Table

Preparation

All students should log into code.pyret.org (CPO) and open their saved "Animals Starter File". If they don’t have the file, they can open a new one from Animals Starter File.

Glossary

grouped sample: a non-random subset of individuals chosen from a larger set, where the individuals belong to a specific group
ratio: the relative sizes of two or more values

🔗Problems with a Single Population 10 minutes

Overview

This activity is all about grouped samples: Students make several non-random samples from the Animals Dataset, and see how each sample might answer the same question differently.

Launch

When looking at a scatter plot of animals, it looks like the amount an animal weighs may have something to do with how long it takes to be adopted.

(Image: a scatter plot with dots loosely clustered around a line with a positive slope)

But if we label the dots by animal, we notice every data point after 25 pounds belongs to a dog from the shelter! The cats are all clumped together in the lower weight range, making it hard to see how weeks to adoption may relate to a cat’s weight.

(Image: a scatter plot with images of each species of animal in place of the dots, loosely clustered around a line with a positive slope)

Investigate

Divide the class into groups of 3-4, with one student identified as the "reporter".

Looking at this scatter plot (above), does it make sense to analyze all the animals together? Why or why not?
- No. Every data point after 25 pounds belongs to a dog from the shelter. The cats are clumped in the lower weight range.
Are there some questions where it would be important to break up the population into species-specific populations? What are they?
- Sample response: Yes. If we want to know whether dogs or cats are more likely to be fixed, we would need to look at each species separately.
Are there some questions where it would be important to keep the whole population together? What are they?
- Sample response: Yes. If we want to know if, in general, young animals are adopted more quickly, we would look at the entire population.

Have the reporters share their findings with the class.

Synthesize

You’ve been handed a dataset from a country where half the people have access to amazing medical care, and the other half have no healthcare.

Why might it be important to look at a particular sample of a population?
- Sample response: Maybe we want to determine if emissions from a nearby factory impact the health of residents of one particular neighborhood.
Why is it sometimes bad to blindly take random samples?
- If we took a random sample of the population as a whole, we might think that they are generally middle-income and have average health. But if we ask the same question about the two groups separately, we would discover inequality hiding in plain sight!

🔗Grouped Samples 20 minutes

Launch

Depending on the question we’re asking, sometimes it makes more sense to ask about "just the cats" or "just the dogs". Averaging every animal together will give us an answer, but it may not be a useful answer.

Sometimes important facts about samples get lost if we mix them with the rest of the population!

Data Scientists define grouped samples of datasets, breaking them up into sub-groups that may be helpful in their analysis.

Earlier, you learned how to define values in Pyret. We can define Numbers, Strings, Images, and even rows:

name  = "Flannery"
age   = 16
logo  = star(50, "solid", "red")
sasha = animals-table.row-n(0)

Let’s use this skill to define Tables…

We already know how to define values, and how to filter a dataset. So let’s put those skills together to define a grouped sample of the dogs in the shelter:

dogs = animals-table.filter(is-dog)

The .filter method walks through the table one row at a time, and passes each row to the is-dog function. If is-dog produces true, .filter adds that row to a new table. Otherwise, it silently moves on to the next row. Finally, we define the name dogs to be the table produced by .filter.
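To see this behavior on a small scale, here is a minimal sketch meant to be run in a new, empty file. It uses a tiny hand-made table instead of the real Animals Dataset. The table name animals-sketch, most of its rows, and the body of is-dog are assumptions made for illustration; the starter file supplies its own definitions.

```pyret
# A tiny stand-in for the Animals Dataset (rows invented for illustration)
animals-sketch = table: name, species, age
  row: "Sasha", "cat", 1
  row: "Toggle", "dog", 3
  row: "Buddy", "lizard", 2
  row: "Fritz", "dog", 4
end

# Assumed definition: consumes a row, produces true if the animal is a dog
fun is-dog(r): r["species"] == "dog" end

# .filter keeps only the rows for which is-dog produces true
dogs = animals-sketch.filter(is-dog)

check:
  dogs.length() is 2
  is-dog(dogs.row-n(0)) is true
end
```

The original table is not changed: animals-sketch still has four rows, and dogs is a new table with two.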

Investigate

A “kitten” is an animal who is a cat and who is young. How would you define a table of just kittens?

Turn to Grouped Samples from the Animals Dataset, and see what code will compute whether or not an animal is a kitten.
Can you fill in the code for the other grouped samples?
When you’re done, try out your solutions in the Grouped Samples Starter File.
Make a bar chart showing the distribution of sex in the kittens sample, by typing bar-chart(kittens, "sex").
Make bar charts showing the sex column for every grouped sample. Which one best represents the distribution of sex for the whole population? Why?
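One possible way to build the kittens sample is sketched below, for a new, empty file. The table contents are invented, and the age cutoff for "young" is an assumption; the workbook page may use a different one.

```pyret
# A tiny stand-in for the Animals Dataset (rows invented for illustration)
animals-sketch = table: name, species, age
  row: "Sasha", "cat", 1
  row: "Mittens", "cat", 7
  row: "Toggle", "dog", 1
end

fun is-cat(r): r["species"] == "cat" end

# Assumption: "young" means under 2 years old
fun is-young(r): r["age"] < 2 end

# A kitten is an animal that is a cat AND is young
fun is-kitten(r): is-cat(r) and is-young(r) end

kittens = animals-sketch.filter(is-kitten)

check:
  kittens.length() is 1
  is-kitten(kittens.row-n(0)) is true
end
```

Building is-kitten out of two smaller functions means each condition can be tested and reused on its own, for example to define a sample of young dogs.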

Synthesize

How could we filter and sort a table?
How can we combine methods?
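As one illustration of combining methods, the sketch below (for a new, empty file) filters a table and then sorts the result. It assumes the .order-by table method, which takes a column name and a Boolean for ascending order; the table itself is invented.

```pyret
# A tiny stand-in for the Animals Dataset (rows invented for illustration)
animals-sketch = table: name, species, age
  row: "Fritz", "dog", 4
  row: "Sasha", "cat", 1
  row: "Toggle", "dog", 3
end

fun is-dog(r): r["species"] == "dog" end

# Filter first, then sort the resulting table by age (true means ascending)
dogs-by-age = animals-sketch.filter(is-dog).order-by("age", true)

youngest-dog = dogs-by-age.row-n(0)

check:
  dogs-by-age.length() is 2
  youngest-dog["name"] is "Toggle"
end
```

Because .filter produces a table, another table method can be applied directly to its result.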

🔗Displaying Samples 20 minutes

Overview

Students revisit the data display activity, now using the samples they created.

Launch

Making grouped and random samples is a powerful skill, which allows us to dig deeper than just making charts or asking questions about a whole dataset. Now that we know how to make grouped samples, we can make much more sophisticated displays!

Let’s start with a question: What’s the ratio of fixed to unfixed cats at the shelter? Let’s use the Data Cycle to get an answer, using our knowledge of grouped samples.

Ask Questions: This is an Arithmetic Question. We know it’s not a lookup question because there’s no ratio written somewhere in the table for us to read. Instead, we’ll have to count all the fixed cats and the unfixed cats, then compare the totals.

Consider Data: We know that we’ll need to count only the cats, and can ignore everything else. And once we’ve picked the rows for cats, the only column we want is the fixed column. This is a huge hint that we’ll need to filter the dataset!

Analyze Data: We could use a bar-chart or a pie-chart to do this analysis, but since we care more about the ratio ("2x as many fixed as unfixed") than the count ("20 fixed vs. 10 unfixed"), a pie chart is a better choice. We’ve decided what to make and we know which rows and columns we’re plotting, so the next step is to write the code!

Interpret Data: What did our displays tell us? In this case, we got a clear answer to our question. But perhaps that’s not the end of the story! We might have new questions about whether a higher percentage of dogs are spayed and neutered than cats, or whether it’s even possible to "fix" a tarantula. All of this belongs in our data story!
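The counting behind this question can also be sketched by hand, as a check on what a display shows. The example below is for a new, empty file and uses an invented table; the helper names is-fixed and is-unfixed are assumptions, not necessarily what the starter file defines.

```pyret
# A tiny stand-in for the Animals Dataset (rows invented for illustration)
animals-sketch = table: name, species, fixed
  row: "Sasha", "cat", false
  row: "Mittens", "cat", true
  row: "Felix", "cat", true
  row: "Toggle", "dog", true
end

fun is-cat(r): r["species"] == "cat" end
fun is-fixed(r): r["fixed"] end
fun is-unfixed(r): not(is-fixed(r)) end

# Grouped sample: only the cats
cats = animals-sketch.filter(is-cat)

# Count the rows in each sub-group of cats
fixed-cats = cats.filter(is-fixed).length()
unfixed-cats = cats.filter(is-unfixed).length()

# Ratio of fixed to unfixed cats
fixed-ratio = fixed-cats / unfixed-cats

check:
  fixed-cats is 2
  unfixed-cats is 1
  fixed-ratio is 2
end
```

In this made-up sample there are twice as many fixed cats as unfixed cats. The dog in the table is never counted, because the cats sample was defined first.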

Investigate

Complete Displaying Data, using what you’ve learned about samples to make more sophisticated data displays.
Complete Data Cycle: Analyzing Categorical Data.

Synthesize

What connections do you see between the "Consider Data" and "Analyze Data" steps?
How do we know when we need to filter? How do we know when we don’t?

🔗Your Analysis flexible

Overview

Students apply their knowledge of table methods, defining table functions, and the Design Recipe to create grouped samples for their dataset.

Launch

Are there grouped samples that you’d like to explore in your own dataset? Here are a few examples, taken from some of the sample datasets:

In the RI Schools dataset, it might be good to create grouped samples for public vs. charter schools.
In the Movies dataset, it might be valuable to create grouped samples for modern movies, and analyze them separately from older movies.
In the US Presidents dataset, it could be useful to make a grouped sample for each political party.
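As a rough sketch of the Movies idea, the example below (for a new, empty file) splits a table into modern and older movies. Everything here is hypothetical: the table name, the title and year columns, the rows, and the choice of 2000 as the cutoff for "modern". The real dataset may be organized differently.

```pyret
# Hypothetical stand-in for a Movies dataset; column names are assumed
movies-sketch = table: title, year
  row: "Movie A", 1975
  row: "Movie B", 2004
  row: "Movie C", 2019
end

# Assumption: "modern" means released in 2000 or later
fun is-modern(r): r["year"] >= 2000 end
fun is-older(r): not(is-modern(r)) end

# Two grouped samples that together cover the whole table
modern-movies = movies-sketch.filter(is-modern)
older-movies = movies-sketch.filter(is-older)

check:
  modern-movies.length() is 2
  older-movies.length() is 1
end
```

Defining is-older in terms of is-modern guarantees that every movie lands in exactly one of the two samples.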

Investigate

What grouped samples make sense for your dataset?

Sometimes a pair of fresh eyes is the best way to think about your work. Pair up so that everyone is working with someone from another group.
Talk with one another about your datasets and analysis thus far, then work together to come up with grouped samples you would like to explore.
Return to your research groups, and open to Samples from My Dataset.
Name these samples, and write the Pyret code to test an individual row from your dataset on Samples from My Dataset.
Turn to The Design Recipe, and use the Design Recipe to write the filter functions that you planned out on Samples from My Dataset. When the teacher has checked your work, type them into the Definitions Area and use the .filter method to define your new sample tables.

Synthesize

Have students share the grouped samples they created for their datasets. After each share-back, ask the class if they have suggestions for other possible grouped samples.

🔗Additional Exercises

Extra, blank design recipes are provided in the workbook.

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.