Unit 3: Exploring Datasets

Unit Overview

Students learn to prepare for analyzing a new dataset by considering logical subsets of that data. They begin with the Animals Dataset, and then apply what they’ve learned to a dataset of their own choosing. In the process, they practice using the Design Recipe to create filter functions, and come up with questions they wish to explore. The focus of this unit is categorical variables, and by the end students will know how to display categorical variables.

Product Outcomes:
  • Students choose a dataset they are interested in

Standards and Evidence Statements:

Standards with prefix BS are specific to Bootstrap; others are from the Common Core. Our Standards Document shows which units cover each standard.

  • Data 3.1.3: Explain the insight and knowledge gained from digitally processed data by using appropriate visualizations, notations, and precise language.

    Length: 95 Minutes

    Materials:
      Preparation:
      • Computer for each student (or pair), with access to the internet

      • Student workbooks, and something to write with

      Types   | Functions | Values
      Number  | num-sqrt, num-sqr | 4, -1.2, 2/3
      String  | string-repeat, string-contains | "hello", "91"
      Boolean | ==, <, >, <=, >=, string-equal | true, false
      Image   | triangle, circle, star, rectangle, ellipse, square, text, overlay, bar-chart, pie-chart, bar-chart-raw, pie-chart-raw | (images)
      Table   | count, .row-n, .order-by, .filter |



              Review (Time 10 minutes)

              • Open your saved animals-dataset file. You should have several functions defined:

                • is-fixed

                • gender

                • is-cat

                • is-young

                If you didn’t have a chance to type them in from your workbook, make sure you do!

                Take 10 minutes to write a function is-dog, then type it into the Definitions Area.

                      Making Subsets (Time 20 minutes)

                      • A lot of Data Science involves making predictions based on data. Suppose we want to survey Americans and try to predict who our next president will be. Obviously, it would take too long to ask everyone who they’re voting for! Instead, pollsters try to take a sample of Americans, and generalize the opinion of the sample to estimate how Americans as a whole feel.
                        • Would it be problematic to only call voters who are registered Democrats? To only call voters under 25? To only call regular churchgoers? Why or why not?

                        • Suppose we are interested in how women feel about a particular issue. Should we still make sure we’re surveying men, too? Why or why not?

                      • As you can see, sampling is a complicated issue! Depending on the question we want to answer, sometimes it makes sense to work with an entire dataset, and sometimes it makes sense to carve out a subset of the data (e.g., calling only women). In this Unit, we’ll practice what you learned about writing functions, and then use the .filter method to create subsets.

                      • Data Scientists don’t always know what the interesting questions are right away. So whenever they explore a dataset, one of the first things they do is define some logical subsets, just to have them handy later. Someone looking at our animals dataset might want to consider "just the lizards" or "just males". This also helps them reason about the data, without being biased by a particular question.

                        A "kitten" is an animal whose species == "cat" and whose age < 2. How would you make a subset of just kittens? Turn to Page 14, and see what code will compute whether or not an animal is a kitten. Can you fill in the code for the other subsets?
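A kitten test like the one described above might be sketched in Pyret as follows. The three-row table is a made-up stand-in so the example runs on its own; the column names species and age come from the definition above, while the table name animals and the function name is-kitten are assumptions for illustration.

```pyret
# A made-up stand-in for the Animals Dataset (hypothetical rows)
animals = table: name, species, age
  row: "Sasha", "cat", 1
  row: "Rex", "dog", 1
  row: "Felix", "cat", 16
end

# is-kitten :: Row -> Boolean
# Consumes a row, and checks that the species is "cat" AND the age is under 2
fun is-kitten(r):
  (r["species"] == "cat") and (r["age"] < 2)
end

examples:
  is-kitten(animals.row-n(0)) is true   # Sasha: a 1-year-old cat
  is-kitten(animals.row-n(1)) is false  # Rex: young, but a dog
  is-kitten(animals.row-n(2)) is false  # Felix: a cat, but 16 years old
end
```

Both conditions must hold for a row to count as a kitten, which is why the two tests are joined with and.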

                      • Sometimes we want to create a table that’s just a random sample of an existing table. Type the following code into the Definitions Area (left-hand side of your screen), and click "Run".  
                        • What do you get when you evaluate tiny-sample in the Interactions Area? small-sample?

                        • What is the contract for random-rows? What does the function do?
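As a rough sketch, definitions of the two samples might look like the lines below. This assumes the code is typed into the saved animals-dataset file, where the Bootstrap library (which provides random-rows) is already loaded and the full table is named animals-table. The table name, the argument order (table first, then number of rows), and the sample sizes are all assumptions for illustration.

```pyret
# Assumes the Bootstrap library and animals-table are already loaded in this file.
# random-rows is assumed to consume a Table and a Number, and produce a Table
# containing that many randomly chosen rows. The sizes 3 and 10 are illustrative.
tiny-sample = random-rows(animals-table, 3)
small-sample = random-rows(animals-table, 10)
```

Because the rows are chosen at random, clicking "Run" again is likely to produce different samples.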

                      • We already know how to define values, and how to filter a dataset. So let’s define some subsets, in addition to the random samples we just made:  
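Defining subsets with the .filter method might look like the sketch below. The three-row table is a made-up stand-in so the example runs on its own, and the bodies of is-cat and is-fixed are assumptions about how those functions were written in the workbook. The names cats and fixed-animals are also chosen just for illustration.

```pyret
# A made-up stand-in for the Animals Dataset (hypothetical rows)
animals = table: name, species, fixed
  row: "Sasha", "cat", false
  row: "Rex", "dog", true
  row: "Felix", "cat", true
end

# is-cat :: Row -> Boolean
fun is-cat(r): r["species"] == "cat" end

# is-fixed :: Row -> Boolean
fun is-fixed(r): r["fixed"] end

# Each subset is a new table holding only the rows that pass the test
cats = animals.filter(is-cat)
fixed-animals = animals.filter(is-fixed)

examples:
  cats.length() is 2
  fixed-animals.length() is 2
end
```

The original table is left unchanged, so the same table can be filtered many times to produce different subsets.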

                      • We can make a pie-chart showing how many of each species is in the shelter, by writing  

                        Which of our subsets do you think will give us the most accurate approximation of the original chart?   Compare the charts you get from each of these. Which one is the most representative of the whole population? Why?
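The comparison might be sketched as below. This assumes the Bootstrap library and animals-table are loaded as in the saved animals-dataset file, and that tiny-sample and small-sample have already been defined there with random-rows. The two-argument form of pie-chart (a table, then a column name) is an assumption about the library version in use.

```pyret
# Assumes animals-table, tiny-sample and small-sample already exist in this file,
# and that pie-chart consumes a Table and a column name (an assumption).
pie-chart(animals-table, "species")  # the whole shelter
pie-chart(tiny-sample, "species")    # a very small random sample
pie-chart(small-sample, "species")   # a somewhat larger random sample
```

Placing the charts side by side makes it easier to see which sample's proportions come closest to those of the full table.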

                            Choose Your Dataset (Time 20 minutes)

                                  Exploring Your Dataset (Time 40 minutes)

                                    • Look at the spreadsheet for your data. What do you notice? What do you wonder? Complete Page 15, making sure to have at least two Lookup Questions, two Compute Questions, and two Relate Questions.

                                    • In the Definitions Area, use random-rows to define at least three tables of different sizes: tiny-sample, small-sample, and medium-sample.

                                    • In the Definitions Area, use .row-n to define at least three values, representing different rows in your table.

                                    • Take a minute to think about subsets that might be useful for your dataset. Name these subsets and write the Pyret code to test an individual row from your dataset on Page 16.

                                    Have students share back.

                                  • Turn to Page 17, and use the Design Recipe to write the filter functions that you planned out on Page 16. When the teacher has checked your work, type them into the Definitions Area and use the .filter method to define your new subset tables.

                                  • Choose one categorical column from your dataset, and try making a bar chart or pie chart for the whole table. Now try making the same display for each of your subsets. Which subset is most representative of the entire column in the table?

                                    Have students share back. Encourage students to read their observations aloud, to make sure they get practice saying and hearing these observations.

                                          Closing (Time 5 minutes)

                                          • Congratulations! You’ve explored the Animals dataset, formulated your own questions, and begun to think critically about how questions and data shape one another. For the rest of this course, you’ll be learning new programming and Data Science skills, practicing them with the Animals dataset and then applying them to your own data.

                                            Have students share which dataset they chose, and pick one question they’re looking at.