Unit 1:   Intro to Computational Data Science

imageUnit 1Intro to Computational Data Science
Unit Overview

Students are introduced to the Animals Dataset, and learn about Table, Categorical and Quantitative data. They begin to program as well, and learn about Numbers, Strings, Types, Operations, Contracts, and Function Application.

English

add translation

Product Outcomes:
  • Students add columns to a Pyret table

Standards and Evidence Statements:

Standards with prefix BS are specific to Bootstrap; others are from the Common Core. Mouse over each standard to see its corresponding evidence statements. Our Standards Document shows which units cover each standard.

  • 6.SP.1-3: The student develops an understanding of statistical variability.

    • Recognize a statistical question as one that anticipates variability in the data related to the question and accounts for it in the answers

  • Data 3.1.3: Explain the insight and knowledge gained from digitally processed data by using appropriate visualizations, notations, and precise language.

    • Visualization tools and software can communicate information about data.

    • Tables, diagrams, and textual displays can be used in communicating insight and knowledge gained from data.

  • BS-PL.1: The student is familiar with declaring values and applying built-in functions using the programming language

    • representing (numeric, string, boolean, image, etc) values in the programming language

    • interpreting a function application and identifying its arguments

  • BS-PL.2: The student is comfortable using and writing Contracts for built-in functions

    • representing a function’s input and output using a contract

    • using a function by refering to its contract

Length: 95 Minutes
Glossary:
  • arguments: the inputs to a function; expressions for arguments follow the name of a function

  • categorical data: values or classifications that do not follow a numerical progression, and are not subject to the laws of arithmetic

  • contract: a statement of the name, domain, and range of a function

  • data row: part of a table showing information about a particular individual in a sample

  • data science: The study of using data to answer questions about the world

  • definitions area: the text box in the Editor, where definitions for values and functions are written

  • domain: the type of data that a function expects

  • editor: software in which you can write and evaluate code

  • error message: information from the computer about errors in code

  • function: a mathematical object that consumes inputs and produces an output

  • header: the titles of each column of a table, usually shown at the top

  • interactions area: the text box in the Editor, where we enter expressions to evaluate

  • programming language: a set of rules for writing code that a computer can evaluate

  • quantitative data: number values for which arithmetic makes sense (e.g. - taking the average of a column)

  • range: the type of data that a function produces

Materials:
    Preparation:
    • Computer for each student (or pair), with access to the internet

    • Student workbooks, and something to write with

    Types

    Functions

    Values



    Introduction

    Overview

    Learning Objectives

      Evidence Statementes

        Product Outcomes

          Materials

            Preparation

            • Computer for each student (or pair), with access to the internet

            • Student workbooks, and something to write with

            Introduction (Time 10 minutes)

            • IntroductionTake a minute to look at the opening questions we have prepared for you, and choose a topic that interests you.
              • Once you’ve selected your topic, choose a question you’d like to answer.

              • Spend one minute discussing your answer, and explaining why you answered the way you did. Do other students agree with you?

              • What information could you collect, to determine if your answer is right or not?

              Have students work in groups (no larger than 4), with each group choosing an Opening Question (or writing their own). After they’ve had time to discuss, have a few students share back what they talked about.

            • What’s the greatest movie of all time? Is Climate Change real? Who is the best quarterback? Is Stop-and-Frisk racially biased? These questions quickly turn into a discussion about data - how you assess it and how you interpret the results. In this course, you’ll learn how to use data to ask and answer questions like this. The process of learning from data is called Data Science. Data science techniques are used by scientists, business people, politicians, sports analysts, and hundreds of other different fields to ask and answer questions about data.

              You can motivate relevance of data science by using additional examples that relate to student interests. Here are a few:

            • We’ll use a programming language to investigate these questions. Just like any human language, programming languages have their own vocabulary and grammar that you will need to learn. The language you’ll be learning for data science is called Pyret.

              Set expectations for the class. This course is an introduction to data science, so some questions will be out of reach!

            The Animals Dataset

            Overview

            Learning Objectives

              Evidence Statementes

                Product Outcomes

                  Materials

                    Preparation

                    • Computer for each student (or pair), with access to the internet

                    • Student workbooks, and something to write with

                    The Animals Dataset (Time 20 minutes)

                    • The Animals DatasetOpen the Animals Spreadsheet in a new tab, or turn to Page 2. This is some data from an animal shelter, listing animals that have been adopted. We’ll be using this as an example throughout the course, but you’ll be applying what you learn to a dataset you choose as well.

                      • Turn to Page 3 in your Student Workbook. What do you notice about this dataset? Write down your observations in the first column.

                      • Sometimes, looking at data sparks questions. What do you wonder about this dataset? Write down your questions in the second column.

                      • There’s a third column, called "Question Type" - we’re going to return to that later, so you can ignore it for now.

                      • If you look at the bottom of the spreadsheet file, you’ll see that this document contains multiple sheets called "pets" and "README". Which sheet are we looking at?

                      • Each sheet contains a table. For our purposes, we only care about the animals table on the "pets" sheet.

                      Each student (or pair of students) should have a Google Account. Have students share back their noticings (statements) and wonderings (questions), and write them on the board.

                    • Data Science is all about using a smaller sample of data to make predictions about a larger population. It’s important to remember that tables are only a sampling of a larger population: this table describes some animals, but obviously it isn’t every animal in the world! Still, if we took the average age of the animals at this particular shelter, it might tell us something about the average age of animals in other shelters.

                    • There are two different kinds of data that come up in Data Science: Categorical and Quantitative. Categorical Data is used to classify, not measure. Categories aren’t subject to the laws of arithmetic. For example, we couldn’t ask if "cat" is more than "lizard", and it doesn’t make sense to find the "average ZIP code" in a list of addresses. We use Categorical Data to ask "which one"? When you look at a weather forecast, temperature is quantitative but whether it’s snowing or raining is categorical.

                      "Species" is a categorical variable, because we can ask questions like "which species does Mittens belong to?" What are some other categorical variables you see in this table?

                    • Quantitative Data is used to measure an amount of something, or to compare two pieces of data to see which is less or more. If we want to ask "how much" or "which is most", we’re talking about Quantitative Data.

                      What kind of data - categorical or quantitative - are the following columns?

                      • Hair color

                      • Age

                      • ZIP Code

                      • Date

                    • Sometimes it can be tricky to figure out if data is categorical or quantitative, because it depends on how that data is being used!

                      For each of the following questions, determine whether the data being used is categorical or quantitative.

                      • We’d like to sort a list of phone numbers by area code.

                      • We’d like to find out the average price of cars in a lot.

                      • We’d like to find out the most popular color for cars.

                      • We’d like to find out which puppy is the youngest.

                      • We’d like to find out which cats have been fixed.

                      • We want to know which people have a ZIP code of 02907.

                      The big idea here is that some data can be both categorical and quantitative – what matters is how we use it!

                    • On Page 3 in your Student Workbook, fill in the blanks for Questions 1 and 2.

                    • Open up the Animals Starter File in a new tab. Click "Connect to Google Drive" to sign into your Google account, and then click the "Save as" button. This will save a copy of the file into your own account, so that you can make changes and retrieve them later.

                    • image This screen is called the editor, and it looks something like the diagram you see here. There are a few buttons at the top, but most of the screen is taken up by two large boxes: the Definitions Area on the left and the Interactions Area on the right.

                      For now, we will only be writing programs in the Interactions area.

                      The Definitions Area is where programmers define values and functions that they want to keep, while the Interactions Area allows them to experiment with those values and functions. This is like writing function definitions on a blackboard, and having students use those functions to compute answers on scrap paper.

                    • The first few lines in the Definitions Area tell Pyret to import files from elsewhere, which contain tools we’ll want to use for this course. We’re importing a file called Bootstrap:Data Science, as well as files for working with Google Sheets, tables, and images:  

                    • After that, we see a line of code that defines shelter-sheet to be a spreadsheet. This table is loaded from Google Drive, so now Pyret can see the same spreadsheet you do. After that, we see the following code:   The first line (starting with #) is called a comment. Comments are notes for humans, which the computer ignores. The next line defines a new table. We call it animals-table, and we load it from the shelter-sheet defined above. You can see the names we are giving to each of the columns, called name, species, gender, age, fixed, legs, pounds and weeks. (We could use any names we want for these columns, but it’s always a good idea to pick names that make sense!)

                      Have students look back at the column names in the Google Sheet, and in the load-table function. Point out that they refer to the same columns, even though they have different names!

                    • Click "Run", and type animals-table into the Interactions Area to see what the table looks like in Pyret. Is it the same table you saw in Google Sheets? What is the same? What is different?

                      • In Data Science, every table is composed of cells, which are arranged in a grid of rows and columns.

                      • Most of the cells contain data, but the first row and first column are special.

                      • The first row is called the header row, which gives a unique name to each variable (or "column") in the table.

                      • The first column in the table is the identifier column, which contains a unique ID for each row. Often, this will be the name of the people or places in the table, or sometimes just an ID number.

                      How many variables are listed in the header row? What are they called? What is being used for the identifier column in this dataset?

                    • After the header, Pyret tables can have any number of data rows. Each data row has values for every column variable (nothing can be left empty!). A table can have any number of data rows, including zero, as in the table below:

                      name

                      species

                    Values and Operators

                    Overview

                    Learning Objectives

                    • Students learn about different types of values, and operators on those values.

                    Evidence Statementes

                      Product Outcomes

                      • Students add columns to a Pyret table

                      Materials

                        Preparation

                          Values and Operators (Time 20 minutes)

                          • Values and OperatorsPyret lets us use many different kinds of data. In this table, for example, you can see Numbers (the number of legs each animal has), Strings (the species of the animal), and Booleans (whether it is true or false than animal is fixed). Let’s get some practice playing with these Datatypes.

                            With your partner(s), go through the questions on Page 4. Talk about the answers to each question, and write down your answers when required.

                            Give students time to experiment, and then debrief as a group.

                          • By now you’ve discovered a number of important things about our programming language:

                            • Numbers and Strings evaluate to themselves.

                            • Anything in quotes is a String, even something like "42".

                            • Strings must have quotation marks on both sides.

                            • Operators like +, -, *, and / need spaces around them.

                            • Any time there is more than one operator being used, Pyret requires that you use parentheses.

                            • Types matter! We can add two Numbers or two Strings to one another, but we can’t add the Number 4 to the String "hello".

                          • You’ve also seen a few error messages here. Error messages are a way for Pyret to tell you what went wrong, and are a really helpful way of finding mistakes! You’ve seen errors for missing spaces around operators, missing quotation marks, and mismatched operators without parentheses. What other errors do you think there are?

                            • In 6 / 0 we know that you can’t divide any number by 0! In this case, Pyret obeys the same rules as humans, and gives an error.

                            • An unclosed quotation mark is a problem, but so is an unmatched parentheses. For example, you’ll get an error message if you type (2 + 2.

                          • As you’ve seen, operators like + and - behave exactly the way in Pyret that they do in math class: they add and subtract Numbers, and produce new Numbers! But what about operators like greater-than and less-than-or-equal?

                            • To identify if an animal’s gender is "male", we need to know if the value in that column is equal to the string "male".

                            • To sort the table by age, we need to know if one animal’s age is less than another’s.

                            • To filter the table to show only young animals, we need to know if an animal’s age is less than 2.

                          • Those come in handy when comparing quantitative data, and Pyret has them, too: Equals (==), less-than (<), greater-than (>), as well as greater-than-or-equal (>=) and less-than-or-equal (<=).

                            With your partner(s), complete Page 5. Talk about the answers to each question, and write down your answers when required.

                            Have students share back. Point out that all the same rules about parentheses, spaces, and types still applies!

                          • By using and and or, we can combine tests. For example, we might want to ask if a character in a videogame has run out of health points and if they have any more lives. We might want to know if someone’s ZIP Code puts them in Texas or New Mexico. When you go out to eat at a restaurant, you might ask what items on the menu have meat and cheese. We’ll use these Boolean operators in a lot of our Data Science work later on.

                            Have students play "true or false", in which they stand if you say something true, and sit if you say something false. Start simple ("I am wearing a hat"), and gradually get complex ("I am wearing a hat, and I am standing on one leg").

                          Applying Functions

                          Overview

                          Learning Objectives

                          • Students learn about Contracts, and how they are used in function applications

                          Evidence Statementes

                            Product Outcomes

                              Materials

                                Preparation

                                  Applying Functions (Time 30 minutes)

                                  • Applying FunctionsSo now you know about Numbers, Strings, Booleans and Operators - all of which behave just like they do in math. But what about functions? You may remember functions from algebra: .

                                    • What is the name of this function?

                                    • What will the expression evaluate to? ?

                                    • The values that we give to a function are called its arguments. How many arguments does expect?

                                    "Arguments" are the values passed into a function. This is subtly different from variables, which are the placeholders that get replaced with those values!

                                  • Pyret has lots of built-in functions, which we can use to write more interesting programs. They also work pretty much the same way they do in algebra! Let’s explore one of Pyret’s function, called num-sqrt. Type this line of code into the interactions area and hit Enter.  

                                    • What is the name of this function?

                                    • What did the expression num-sqrt(16) evaluate to?

                                    • Does the num-sqrt function produce Numbers? Strings? Booleans?

                                    • How many arguments does num-sqrt expect?

                                  • Of course, functions on a computer can do a lot more than make Numbers! Type this line of code into the interactions area and hit Enter.  

                                    • What is the name of this function?

                                    • What did the expression evaluate to?

                                    • How many arguments does triangle expect?

                                    • Does the triangle function produce Numbers? Strings? Booleans?

                                  • You’ve just created an example of a new Datatype, called an Image.
                                    • What are the types of the arguments triangle was expecting?

                                    • How does this output relate to the inputs?

                                    • Try making different triangles. Change the size and color! Try using "outline" for the second argument.

                                  • The triangle function consumes a Number and two Strings as input, and produces an Image. As you can imagine, there are many other functions for making images, each with a different set of arguments. For each of these functions, we need to keep track of three things:

                                    • Name - the name of the function, which we type in whenever we want to use it

                                    • Domain - the data we give to the function (names and Types!), written between parentheses and separated by commas

                                    • Range - the type of data the function produces

                                    Domain and Range are Types, not specific values. As a convention, we capitalize Types and keep names in lowercase. triangle works on many different Numbers, not just the 20 we used in the example above!

                                  • Can you see what is wrong with each of these expressions? Try copying them into Pyret, one at a time, and reading the error messages aloud.

                                    • triangle(20, "solid", "red"

                                    • triangle(20 "solid" "red")

                                    • triangle("20", "solid", "red")

                                    • triangle(20, "solid", "red", "striped")

                                    Explanations for each error message:

                                    • Pyret needs both parentheses around the arguments, so that it knows exactly where the expression begins and ends.

                                    • Arguments must be separated with a comma.

                                    • triangle expects the first argument to be a Number. "20" is a String.

                                    • triangle takes exactly three arguments. Functions must be called with the correct number of arguments.

                                  • These three parts make up a contract for each function. Let’s take a look at the Name, Domain, and Range of num-sqrt and triangle:   The first part of a contract is the function’s name. In this example, our functions are named num-sqrt and triangle.

                                    The second part is the Domain, or the names and types of arguments the function expects. triangle has a Number and two Strings as variables, representing the length of each side, the mode, and the color. We write name-type pairs with double-colons, with commas between each one.

                                    Finally, after the arrow goes the type of the Range, or the function’s output, which in this case is Image.

                                    Turn to the back of your workbook. We’ve given you the contracts for many Image-producing functions (as well as quite a few others!). Try using some of these contracts to make shapes.

                                  • Turn to the back of your workbook, and get some practice reading and using contracts! Make sure you try out the following functions:

                                    • text

                                    • circle

                                    • ellipse

                                    • star

                                  • Here’s the contract for another new function. Can you figure out how to use it in the Interactions Area?  

                                    The string s is printed n times, written as a single String.

                                  • Here’s an example of another function. Type it into the Interactions Area to see what it does. Can you figure out the contract, based on the example?  

                                    The contract is string-contains :: (s :: String, search :: String) -> Boolean. Be sure the names students come up with for the variables make sense!

                                  • Can you think of a situation when you’d want to consume a Table, and use that to produce an image? Have you ever seen any pictures created from tables of data?

                                    Give the class a minute to brainstorm.

                                  • The library included at the top of the file includes some helper functions that are useful for Data Science, which we will use throughout this course. Here is the contract for a function that does just that, and an example of using it:  
                                    • What is the Name of this function?

                                    • How many inputs are in its Domain?

                                    • Type the example into the Interactions Area.

                                    • What comes back?

                                  • In the Interactions Area, type pie-chart(animals-table, "species") and hit Enter. What happens? What happens when you hover over a slice of the pie? These plots are interactive! This allows us to experiment with the data before generating the final image.

                                    Hovering over a pie slice or bar reveals the value or percentage of the whole, and the label.

                                  • The function pie-chart consumes a Table of data, along with the name of a categorical column you want to display. The computer will go through the column, counting the number of times that each value appears. It will then create a pie slice for each value, with the size of the slice being the percentage of times it appears. In this example, we used our animals-table table as our dataset, and made a pie chart showing the distribution of species across the shelter.

                                  • Here is the contract for another function:  

                                    Use this function to make a bar chart showing the number of each gender across the shelter.

                                  • Do you think we could use any column? What about a quantitative column?

                                    Experiment with these two functions, passing in different column names for the label and data columns. If you get an error message, read it carefully! What do you think are the rules for what kinds of columns can be used by bar-chart and pie-chart?

                                  (Optional) Exploring other plots

                                  Overview

                                  Learning Objectives

                                    Evidence Statementes

                                      Product Outcomes

                                        Materials

                                          Preparation

                                            (Optional) Exploring other plots (Time 10 minutes)

                                            • (Optional) Exploring other plotsOPTIONAL: there are lots of other functions, for all different kinds of charts and plots. Even if you don’t know what these plots are for yet, see if you can use your knowledge of Contracts to figure out how to use them. What do you think they mean?

                                              • How many columns are needed to make a histogram?

                                              • Are histograms made from quantitative or categorical columns?

                                              • What do you think a histogram tells us about the data?

                                              • How many columns are needed to make a box-plot?

                                              • Are box-plotss made from quantitative or categorical columns?

                                              • What do you think a box-plot tells us about the data?

                                              • Can you answer the same questions for other plots?

                                            • Sometimes we want to summarize a categorical column in a Table, rather than a pie chart. For example, it might be handy to have a table that has a row for dogs, cats, lizards, and rabbits, and then the count of how many of each type there are.

                                            • Pyret has a function that does exactly this! Try typing this code into the Interactions Area:   What did we get back? count is a function that consumes a table and the name of a categorical column, and produces a new table with exactly the columns we want: the name of the category and the number of times that category occurs in the dataset. What are the names of the columns in this new table?

                                              • Use the count function to make a table showing the number of animals of each gender at the shelter.

                                              • Use the count function to make a table showing the number of animals that are fixed (or not) at the shelter.

                                            • Sometimes the dataset we have is already summarized in a table like this, and we want to make a chart from that. In this situation, we want to use the raw values in the summary table as-is: the size of the pie slice or bar is taken directly from the count column, and the label is taken directly from the value column. When we want to use the raw values as-is, we have another function:  

                                              Type this in and try it out. How would you make a bar chart based on the raw data?

                                            Closing

                                            Overview

                                            Learning Objectives

                                              Evidence Statementes

                                                Product Outcomes

                                                  Materials

                                                    Preparation

                                                      Closing (Time 5 minutes)

                                                      • ClosingToday you’ve learned about quantitative and categorical data. You’ve learned about Numbers, Strings, Booleans, and Images. You’ve learned about operators and functions, and how they can be used to make shapes, visually display data, and even transform tables!

                                                      • One of the other skills you’ll learn in this class is how to diagnose and fix errors. Some of these errors will be syntax errors: a missing comma, an unclosed string, etc. All the other errors are contract errors. If you see an error and you know the syntax is right, ask yourself these two questions:

                                                        • What is the function that is generating that error?

                                                        • What is the contract for that function?

                                                        • Is the function getting what it needs, according to its Domain?

                                                      • By learning to use values, operations and functions, you are now familiar with the fundamental concepts needed to write simple programs. You will have many opportunities to use these concepts in this course, by writing programs to answer data science questions.

                                                        Make sure to save your work, so you can go back to it later!