Unit 6: Advanced Analysis

Unit Overview

Students continue practicing the Design Recipe and learn how to build and transform columns in a table. They also learn how to chain methods together and define more sophisticated subsets. Finally, they consider the concept of trust and testing: how do we know whether a particular analysis is trustworthy?

Product Outcomes:
  • Students define functions that sort, filter, or extend the animals table

Standards and Evidence Statements:

Standards with prefix BS are specific to Bootstrap; others are from the Common Core. Our Standards Document shows which units cover each standard.

  • Data 3.1.1: Use computers to process information, find patterns, and test hypotheses about digitally processed information to gain insight and knowledge. [P4]
  • BS-DR.1: The student is able to translate a word problem into a Contract and Purpose Statement
  • BS-DR.2: The student can derive test cases for a given contract and purpose statement
  • BS-DR.4: The student can solve word problems that involve data structures
  • BS-PL.3: The student is able to use the syntax of the programming language to define values and functions

Length: 70 Minutes

Materials:

Preparation:

Types | Functions | Values
Number | +, -, *, /, num-sqrt, num-sqr | 4, -1.2, 2/3
String | string-repeat, string-contains | "hello" "91"
Boolean | | true false
Image | triangle, circle, star, rectangle, ellipse, square, text, overlay, bar-chart, pie-chart, bar-chart-raw, pie-chart-raw, histogram | (images)
Table | count, .row-n, .order-by, .filter, mean, median, mode |



Review (Time 15 minutes)

  • Take a minute to look back at the opening questions you saw at the beginning of the class, and choose another one that interests you.

    Using what you know now, what information would you need to collect in order to answer it? What subsets would you need to create? What analysis would you need to perform?

    Debrief as a class.

    • What kinds of displays and charts have you learned about so far?
    • What does each kind of display tell us about a dataset?
    • When would you use each kind of display?

    Spend some time on this - let students discuss amongst themselves, and facilitate as necessary.

Chaining Methods (Time 30 minutes)

Learning Objectives
  • Students learn the syntax for chaining methods together

  • Table methods can be chained together, so that we can build, filter and order a Table. For example:

    This code takes the animals-table and builds a new column. According to our Contracts Page, .build-column produces a new Table, and that’s the Table whose .filter method we use. That method produces yet another Table, and we call that Table’s .order-by method. The Table that comes back from that is our final result.

    Suggestion: use different color markers to draw nested boxes around each part of the expression, showing where each Table came from.

  • It can be difficult to read code that has lots of method calls chained together, so we can add a line-break before each "." to make it more readable. Here’s the exact same code, written with each method on its own line:

  • Suppose we want to build a column and then use it to filter our table. If we use the methods in the wrong order (trying to filter by a column that doesn’t exist yet), we might wind up crashing the program. Even worse, the program might work, but produce results that are incorrect!

  • How well do you know your table methods? Complete Page 36 and Page 37 in your Student Workbook to find out.

    Have students discuss their answers.
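The chaining idea from this section can be sketched outside of Pyret as well. The block below is a minimal illustration in Python, not the Bootstrap library: the `Table` class, its `build_column`, `filter` and `order_by` methods, the `young` column, and the sample rows are all hypothetical stand-ins for the table methods named above. It shows that each method hands back a new Table for the next call to use, and what can happen when the methods are called in the wrong order.

```python
# A tiny, hypothetical Table class: a Python stand-in for the table methods
# named in this unit. It is NOT the Bootstrap library.
class Table:
    def __init__(self, rows):
        # Each row is a dict mapping column name -> value.
        self.rows = [dict(r) for r in rows]

    def build_column(self, name, fn):
        # Returns a NEW Table with one extra column computed from each row.
        return Table([{**r, name: fn(r)} for r in self.rows])

    def filter(self, keep):
        # Returns a NEW Table containing only the rows where keep(row) is true.
        return Table([r for r in self.rows if keep(r)])

    def order_by(self, name, ascending):
        # Returns a NEW Table sorted by the named column.
        return Table(sorted(self.rows, key=lambda r: r[name], reverse=not ascending))


# Made-up sample rows, for illustration only.
animals_table = Table([
    {"name": "Rex", "species": "dog", "age": 5},
    {"name": "Mittens", "species": "cat", "age": 1},
    {"name": "Tom", "species": "cat", "age": 7},
    {"name": "Boots", "species": "cat", "age": 3},
])


def is_young(r):
    # Hypothetical rule: an animal counts as "young" if it is under 4.
    return r["age"] < 4


def only_young(r):
    # Reads the "young" column, so that column must already exist.
    return r["young"]


# Each method returns a Table, so the next call applies to that result.
# A line break before each "." keeps the chain readable.
result = (
    animals_table
    .build_column("young", is_young)
    .filter(only_young)
    .order_by("age", True)
)
names_in_order = [r["name"] for r in result.rows]

# Wrong order: filtering by "young" before the column has been built fails.
try:
    animals_table.filter(only_young).build_column("young", is_young)
    wrong_order_error = None
except KeyError as err:
    wrong_order_error = err
```

In this sketch the original `animals_table` is never modified, which is why drawing a separate box around each intermediate Table matches what the code does.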

Confirming Analysis (Time 20 minutes)

Learning Objectives
  • Students learn how to define functions using Table Plans

Product Outcomes
  • Students define functions that sort, filter, or extend the animals table

  • Data Analysis is often used to make predictions based on some sample data. For example, we might look at the Animals Dataset and try to make predictions about other animal shelters based on that sample. But if the sample dataset doesn’t represent the full population, those predictions can be wrong - and sometimes, really, really wrong!

    • Uber and Google are making self-driving cars, which use artificial intelligence to interpret sensor data and make predictions about whether a car should speed up, slow down, or slam on the brakes. This AI is trained on a lot of sample data, which it learns from. What might be the problem if the sample data only included roads in California?

    • Law enforcement in many towns has started using facial-recognition software to automatically detect whether someone has a warrant out for their arrest. A lot of facial-recognition software, however, has been trained on sample data containing mostly white faces. As a result, it has gotten really good at telling white people apart, but often can’t tell the difference between people who aren’t white. Why might this be a problem?

    • Why might it be a bad thing to test medicines only on men (or only on women) before prescribing them to the general public?

  • A good Sample Table should be representative of the population, and relevant to what’s being analyzed.

    • A good Sample Table has at least the columns that matter: the ones we’ll be ordering or filtering by.

    • A good Sample Table has enough rows to be a representative sample of the dataset. If our dataset has a mix of dogs and cats, for example, we want at least one of each in this table.

    • A good Sample Table has rows in mostly random order, so that we’ll notice if our analysis winds up sorting them.

  • Sample Tables can also be used to verify that a certain analysis is correct. For example: suppose you’ve been given a function that is supposed to filter a table and show only the cats. If you test it on a Sample Table that only has cats to begin with, will that tell you whether or not the function works?

    You’ll need a table with cats and non-cats.

  • Suppose you have a function that takes in a table of animals and shows only the kittens. What would your Sample Table need to have in order to verify this function?

    You’ll need a table with cats and non-cats, including some cats under the age of 2 and some that are older.

  • Suppose you have a function that takes in a table of animals and shows only the kittens, sorted in ascending order by weight. What would your Sample Table need to have in order to verify this function?

    You’ll need a table with cats and non-cats, including some cats under the age of 2 and some that are older, with the rows ordered randomly.

  • Turn to Page 38 in your Student Workbook. On each page, you’ve been given a function called fixed-cats and a description of what it claims to do.

    List the names of the animals that you would use in a Sample Table to verify whether the function works as advertised. When you’ve finished, open the Trust-but-Verify Starter File. There are three versions of fixed-cats here. Are they all correct? If not, which ones are broken?

    Debrief with the class.

  • Turn to Page 39. Using the same Starter File, construct a Sample Table and figure out which (if any) of the functions are correct!

    Debrief with the class.
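The reasoning about Sample Tables in this section can be sketched in code. The block below is a hedged illustration in Python rather than the Starter File: the functions `only_cats_correct`, `only_cats_broken` and `kittens_by_weight`, the `pounds` column, and every row of data are hypothetical and are not the fixed-cats functions from the workbook. It compares a cats-only Sample Table with a mixed one to show which of the two can reveal a broken filter.

```python
# Two hypothetical versions of a "show only the cats" function.
def only_cats_correct(rows):
    return [r for r in rows if r["species"] == "cat"]


def only_cats_broken(rows):
    # Bug: returns every row without filtering at all.
    return list(rows)


# Hypothetical "kittens" function: cats under age 2, ascending by weight.
def kittens_by_weight(rows):
    kittens = [r for r in rows if r["species"] == "cat" and r["age"] < 2]
    return sorted(kittens, key=lambda r: r["pounds"])


def names(rows):
    return [r["name"] for r in rows]


# Sample Table 1: only cats (made-up data).
cats_only = [
    {"name": "Mittens", "species": "cat", "age": 1, "pounds": 6},
    {"name": "Tom", "species": "cat", "age": 7, "pounds": 11},
]

# Sample Table 2: cats and non-cats, kittens and older cats,
# with rows deliberately NOT in weight order (made-up data).
mixed = [
    {"name": "Tom", "species": "cat", "age": 7, "pounds": 11},
    {"name": "Mittens", "species": "cat", "age": 1, "pounds": 6},
    {"name": "Rex", "species": "dog", "age": 1, "pounds": 40},
    {"name": "Pip", "species": "cat", "age": 1, "pounds": 4},
]

# On the cats-only table both versions agree, so the bug stays invisible.
same_on_cats_only = names(only_cats_correct(cats_only)) == names(only_cats_broken(cats_only))

# On the mixed table the broken version lets the dog through.
correct_names = names(only_cats_correct(mixed))
broken_names = names(only_cats_broken(mixed))

# Because the input rows were not sorted by weight, a sorted result
# is visibly different from the input order.
kitten_names = names(kittens_by_weight(mixed))
```

Here the young dog matters as well: a kittens function that checked only age, and forgot to check species, would let Rex through on the mixed table.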

Closing (Time 5 minutes)

  • As our analysis gets more complex, method chaining is a great way to keep the code simple. But complex analysis also has more room for mistakes, so it’s critical to think about a Sample Table that allows us to trust that our code really does what it’s supposed to!