According to the US Census Bureau, the average American household earned more than $45,000 in 2003 - more than 3x the poverty line that year. Can we conclude that only a small percentage of Americans were in poverty that year?
Take two minutes to write down what you think on Page 7.
Invite an open discussion for a few minutes, then give students time to write down what they think.
Open the Unit 4 Starter File, click "Save a Copy", and then Run the program. Now that you are familiar with how tables organize data, it’s time to solve some problems with them. We already know how to evaluate an identifier once a program has been run: we type the identifier into the Interactions Area and hit "Enter" to see its value. For example, if we type the identifier presidents or nutrition into the Interactions Area, we see the corresponding table. There are some other identifiers defined here. What are their names?
You’ll notice that there’s a new table defined here as well, called countries. What columns are included in this table, and what do they tell us about each country?
The identifiers are a, b, and c, each of which is defined to be a different List.
Let’s take a look at one of these identifiers:
To make a list, we use square brackets and the list: constructor, followed by a comma-separated list of values. What is the type of a?
In the Interactions Area, try making a few lists for practice:
A list of all the days of the week
A list of the first 10 even numbers
A list of your favorite colors
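One possible set of answers to the practice prompts is sketched below. The identifier names (week-days, first-evens, favorite-colors) and the colors are made up for illustration, and the even numbers are assumed to start at 2 rather than 0.

```pyret
# Hypothetical answers to the practice prompts; all names are illustrative.
week-days = [list: "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
first-evens = [list: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]  # assumes counting starts at 2
favorite-colors = [list: "teal", "orange", "purple"]

# Every list can report how many elements it holds
first-evens.length()
```

The first and third lists hold Strings and the second holds Numbers, which is a useful lead-in to the question of what type a list has.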
Mean, Median, and Mode
(Time 30 minutes)
We encounter quantitative, 1-dimensional data all the time. Sometimes we have a list of temperatures for the day, and we want to know what the average is. Maybe we want to split a list of players into two teams, or find the most common birthday in our group of friends. All of these involve taking 1-dimensional data and asking questions about its "center", but there are several different kinds of center.
Have your students come up with other questions involving "center".
There are 3 ways to measure the "center" of a list of data: mean, median and mode. One of the most important questions we can ask about a column of quantitative data is: what is the average value?
Use your favorite method of teaching the concept of averages.
We calculate the mean by adding up each element in the list, and dividing by the number of elements in that list.
For example, the mean of the list [list: 1, 4, 5, 8, 2] is calculated by (1 + 4 + 5 + 8 + 2) / 5, which evaluates to 4.
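The same calculation can be spelled out step by step in Pyret. This is only a sketch of the arithmetic, not how the starter file's mean function is necessarily written; my-nums, my-sum, and my-size are illustrative names.

```pyret
# A hand-rolled version of the calculation above (illustrative names).
my-nums = [list: 1, 4, 5, 8, 2]
my-sum = my-nums.foldl(lam(elt, acc): elt + acc end, 0)  # adds up every element: 20
my-size = my-nums.length()                               # number of elements: 5
my-sum / my-size                                         # 20 / 5, which is 4
```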
Open your workbooks to Page 8 and practice calculating the mean of each list of Numbers by hand. DO NOT fill in the median and mode columns yet, even if you know how!
Notice that calculating the mean requires being able to add and divide, so the mean only makes sense for quantitative data. For example, the mean of a list of Presidents doesn’t make sense. The same goes for a list of ZIP codes: even though we can add and divide the numbers in ZIP codes, the result doesn’t correspond to some "center" ZIP code.
It would be nice if Pyret had a way for us to compute the mean of any List. What would that function be called?
Get students to give suggestions as to what the mean function should be called.
Type mean([list: 1, 2, 3]). What does this give us? Why?
Type each of the following programs into the interactions window, to check your work:
2, which is the mean of the numbers 1, 2 and 3.
This function takes a List of Numbers as input, and gives us the mean (a Number) as output. Write the contract for this function into your Contracts page as:
# mean :: List<Number> -> Number
Notice that we use List<Number> to describe "lists of numbers"!
The second measure of center is the median. The median is the "middle" value of a list, or a value that separates the top half of a list from the bottom half.
As an example, consider this list:
Here 2 is the median, because it separates the "top half" (all values greater than 2, which is just 3), and the "bottom half" (all values less than or equal to 2).
If students are not already familiar with median, we recommend the following "pencil and paper algorithm" for finding the median of a list:
Cross out the highest number in the list.
Cross out the lowest number in the list.
Repeat these steps until there is only one number left in the list. This number is the median. If there are two numbers left, take the mean of those numbers, for reasons explained in the next point.
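For teachers who want to show the cross-out procedure as code, here is a minimal sketch. The function cross-out-median is a hypothetical helper, not part of the starter file, and it assumes the list is non-empty. Sorting first means the lowest value is always at the front and the highest at the back.

```pyret
# cross-out-median is a hypothetical helper, not part of the starter file.
# Assumes the input list has at least one element.
fun cross-out-median(l :: List<Number>) -> Number:
  doc: "Cross out the lowest and highest values until 1 or 2 values remain"
  sorted = l.sort()
  n = sorted.length()
  if n == 1:
    sorted.first
  else if n == 2:
    # two values left: take their mean
    (sorted.first + sorted.last()) / 2
  else:
    # drop the lowest (first) and the highest (last) value, then repeat
    cross-out-median(sorted.drop(1).take(n - 2))
  end
end

cross-out-median([list: 3, 1, 2])     # one value left: 2
cross-out-median([list: 4, 1, 3, 2])  # two values left: (2 + 3) / 2
```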
For lists that have an even number of elements, this question is a little trickier.
There is no one number in the list separating the top half and the bottom half, because there are only 2 numbers! In this case, we take the mean of the two middle numbers. So here, the median is (2 + 3) / 2, which evaluates to 2.5.
If students are entirely unfamiliar with median, it may help them to work through several more examples of lists with even/odd sizes, before they return to the workbook assignment.
Return to your workbook and complete the column for median values.
Pyret has a function to compute the median of a list as well, with the contract:
# median :: List<Number> -> Number
Test your answers in the median column with the median function.
The third and last measure of center is the mode. The mode of a list is the element that appears most often in the list.
Here the mode is 2, since 2 appears more than any other number.
What is the mode of this list?
This list has multiple modes, 1 and 4, because they appear equally often, and more often than the other elements in the list.
Complete the final column, by calculating the mode for each example list.
For the examples in which a list has multiple modes, students should write in the smallest mode, because that is the behavior of the mode function in Pyret. mode can only return one Number, whereas modes returns a List<Number>.
There are two different functions provided by Pyret: mode, and modes.
Type each of these lines of code into the interactions window. What’s different about these two functions, when applied to the same List?
mode will return the smallest mode, which is a Number, but modes will return a List<Number> containing all of the modes. Their contracts are:
# mode :: List<Number> -> Number
# modes :: List<Number> -> List<Number>
Have students add these two contracts to their contract list.
Note that later, we will reveal that mode and modes can be used on Lists of Strings as well.
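To see the difference between the two functions on a tied list, something like the following could be typed into the Interactions Area. It assumes the Unit 4 Starter File has been run, so that mode and modes are defined as described above. The list tied is a made-up example in which 1 and 4 each appear twice.

```pyret
# Assumes the starter file has been run, so mode and modes are defined.
tied = [list: 1, 4, 4, 1, 3]  # hypothetical list in which 1 and 4 tie
mode(tied)   # a single Number: the smallest mode
modes(tied)  # a List<Number> holding every mode
```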
(Time 15 minutes)
In the last lesson, you learned how to extract a column from a table, turning it into a list. Now let’s use that knowledge to start asking questions about some of our datasets. Suppose we wanted to know what the average number of calories on the menu is. We’d need to first extract that column from the table, and then take the mean of the resulting list. We can write this using identifiers:
...or as a single expression, by combining the extract expression with mean:
Which style do you like better? Why?
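For reference, the two styles might look like the sketch below. It assumes the starter file defines the nutrition table and the mean function, and that the calorie column is named calories. The column name is a guess, so check the table's header before using it. The identifier calorie-list is illustrative.

```pyret
# Assumes the starter file defines nutrition and mean, and that the
# calorie column is named calories (a guess; check the table's header).

# Style 1: name the extracted column, then use the name
calorie-list = extract calories from nutrition end
mean(calorie-list)

# Style 2: one expression that extracts and averages in a single step
mean(extract calories from nutrition end)
```

Both styles compute the same value. The first gives the intermediate list a name that can be reused with median or mode, while the second is shorter when the list is needed only once.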
Turn to Page 9 in your workbooks and complete all of the questions. You may have to do some programming to answer some of these!
This exercise gives students more practice using Pyret to compute mean/median/mode. Students will also see firsthand that the median of the medians of many lists is not necessarily the same as the median of the combined list.
After all the students complete this workbook page, discuss the implications of this for the countries table. Taking the median of the median-life-expectancy column is an inaccurate measure of the median life expectancy of humans all over the world. The most accurate measure of median human life expectancy would require a table with every human as a row.
The punchline of this portion of the exercise is: don’t take the median of medians.
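A small made-up example can make the punchline concrete. The sketch below assumes the starter file's median function is available. The two lists are invented life expectancies for one large and one very small country, not data from the countries table.

```pyret
# Assumes the starter file has been run, so median is defined.
# Both lists are invented for illustration.
big-country = [list: 60, 60, 61, 62, 62, 63, 64]
tiny-country = [list: 80]

# Median of medians: treats both countries as equally important
median([list: median(big-country), median(tiny-country)])  # median of 62 and 80, which is 71

# Median of everyone: each person counts once
median(big-country.append(tiny-country))  # middle values are 62 and 62, so 62
```

The one person in the tiny country pulls the median of medians up to 71, even though seven of the eight people have a life expectancy of 64 or less.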
Which Measure is Best?
(Time 15 minutes)
By now, you may have noticed that the mean, median, and mode of a data set are rarely the same value. So, which one should we use, and when?
For each of the following example lists, discuss with the students the strengths and weaknesses of each measurement.
Imagine that a math teacher is tracking their students’ grades. Here are the students’ grades on the first test.
Which measure of center gives the best indication of how the class did?
Notice that the mean is well over 75, even though most of the students scored below 70! The mean here is more affected by outliers: those two 100s are bringing the average up. This is because the mean is calculated using every value in the list, while the median is calculated with at most 2 values from the list.
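The effect of outliers can be checked with a small list of the same shape. The grades below are hypothetical, not the list from the lesson, and the snippet assumes the starter file's mean and median functions are available.

```pyret
# Assumes the starter file has been run, so mean and median are defined.
# sample-grades is a hypothetical list: four scores below 70 and two 100s.
sample-grades = [list: 65, 66, 68, 69, 100, 100]
mean(sample-grades)    # (65 + 66 + 68 + 69 + 100 + 100) / 6 = 468 / 6 = 78
median(sample-grades)  # middle values are 68 and 69, so (68 + 69) / 2 = 68.5
```

Here the two 100s lift the mean to 78, while the median stays at 68.5, close to what most students actually scored.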
In general, here are some guidelines for when to use one measurement over the other:
If the data is unlikely to have values occurring multiple times (like with decimals, or with grades), do not use mode.
If the data is more "coarse grained", meaning the data is quantitative but there are only a small number of possible values each entry can take, then mode will be useful.
If the data is going to have lots of outliers, then median gives a better estimate of the center than mean.
Suppose we want to look at how much sodium is in our menu. Would taking the mean or median be more accurate? Why or why not?
Suppose we want to know how long the average person on Earth lives. Would taking the mean of median-life-expectancy give us the answer? Why or why not?
Make sure to save your work. Hit the Save button in the top left. This will save your program in the code.pyret.org folder within your Google Drive.
Use mean, median and mode with the household-income list. Do you think the "average household income" is still a good measure to use when talking about poverty? Why or why not? Take two minutes to write your answer on Page 7.
Have the class debrief their findings. Did anyone’s mind change after looking at the data? Is the data convincing or not? Why or why not?
(Time 20 minutes)
Take 10 minutes to answer question 4 in your Project Report.
Congratulations! You’ve just learned the basics of the Pyret programming language, and how to use that language to answer a data science question. Throughout this course, you’ll learn new and more powerful tools that will allow you to answer more complex questions, and in greater detail.
If your students are working in pairs/groups, make sure that each student has access to a version of the program. The student who saved the program to their Google Drive can share it with anyone by hitting the Publish button in the top left, choosing "Publish a new copy", then clicking the "Share Link" option. This will allow them to copy a link to the program and send it to their partners in an email or message.