Unit 7:   Select and Order

imageUnit 7Select and Order
Unit Overview

Students are introduced to their first examples of operations that consume and produce tables, and learn how to select columns and order rows. They are also introduced to the beginnings of Table Plans, as a vehicle for thinking through compound queries.

Product Outcomes:
    Length: 90 minutes
    Glossary:
    • ascending: an order in which lower values appear first, and higher values appear after

    • descending: an order in which higher values appear first, and lower values appear after

    • order: An arrangement of data according to some pattern

    • outlier: an observation point that is distant from other observations, perhaps due to experimental error or measurement variability.

    • select: a query that creates a table containing only a subset of a specified table’s columns

    • table query: An expression that consumes tables and creates a new table object

    Materials:
      Preparation:

        Types

        Functions

        Number

        +, -, *, /, num-sqrt, num-sqr, mean, median, mode, modes

        String

        n/a

        Image

        draw-plot, draw-plots

        Series

        function-plot, scatter-plot, bar-chart, pie-chart, freq-bar-chart, histogram

        Table

        extract



        Review

        Overview

        Learning Objectives

          Evidence Statements

            Product Outcomes

              Materials

                Preparation

                  Review (Time 15 minutes)

                  • Take 15 minutes to answer question 9 in your Project Report.

                    See the Sample Project Report to see an exemplar of student work.

                  Select Queries

                  Overview

                  Learning Objectives

                    Evidence Statements

                      Product Outcomes

                        Materials

                          Preparation

                            Select Queries (Time 20 minutes)

                            • Open the Unit 7 Starter File, Save a Copy, and Run the program. Today we’re going to look at a new table, called movies.
                              • What was the domestic gross for "Big Hero 6"?

                              • What was the overseas gross for "Up"?

                              • What was the total gross for "Zootopia"?

                              • What year did "Maleficent come out?"

                              Suggestion: Give the class one question at a time, and time them for each one. Afterwards, ask them what made it hard to answer things quickly.

                            • You may have noticed that answering these questions requires a little extra work, because the columns we need are separated by a lot of uneccessary columns. Finding the year "Maleficent" came out requires looking at the title, year columns, but not any of the columns in between. It would be nice if we could choose only the columns we need, and put those into a new table.

                              Type movies-selected into the Interactions Window.

                            • When we want to answer questions about just a few columns in a table, we can remove the unnecessary ones with select. select is a table query: a special key word that will create a new table using information from a starting table. Table queries are called queries because they are used to ask questions using specific information in tables. select creates a table containing only the columns that the programmer specifies.

                               

                              Demonstrate the select query to the class, selecting different columns from the movies table.

                            • The select keyword will choose the column names given (in this case, title and year) from the specified table movies, and create a new table with just those columns. If you turn to the back of your student workbook, on Page 45, you’ll find a reference sheet for this and other queries. This is a great place to take notes on how each query is used and to sweat the details. For example, pay close attention to the comma that separates the column names!

                              In your own words, write down what a select query is for.

                              The benefits of using the select keyword are mostly for the programmer: having less columns does not make it significantly faster to perform operations on the table, but less columns does make it easier for humans to observe relations between columns (like in the examples above).

                            • When thinking about select queries, we ask ourselves are any columns unnecessary? If the answer is no, we have no work to do. But if the answer is yes, we can zoom in and think about which columns we want to keep. Suppose we wanted to make a brochure showing local restaurants and ratings. "Are any columns unnecessary?" The following code will select only the country and continent columns from the countries table, and bind the new table to the variable countries-selected:

                               

                              • In your Definitions Area, complete the next two exercises ("Countries and Continents" and "Fat Content Information"). Test your code by hitting the Run button and typing the new variable names into the Interactions Area.

                              • Complete Page 25 in your workbook, and complete the Word Problems requiring you to write and interpret select statements.

                              • Complete Page 26.

                              Encourage students to think of the interactions area as a playground: if a student needs to write a definition but don’t know exactly how, they should experiment with different expressions in the Interactions Area until they are satisfied. THEN they should bind that expression to a variable name in the Definitions Area.

                            Order Queries

                            Overview

                            Learning Objectives

                              Evidence Statements

                                Product Outcomes

                                  Materials

                                    Preparation

                                      Order Queries (Time 20 minutes)

                                      • Answer the following questions as fast as you can:

                                        • Which three foods have the most sodium?

                                        • Which 3 movies had the highest domestic gross ticket sales?

                                        • What are the 3 countries with the smallest population?

                                        Suggestion: Give the class one question at a time, and time them for each one. Afterwards, ask them what made it hard to answer things quickly.

                                      • Let’s look again at our favorite 4th grade class. This time, we have 3 columns: First name, Last name, and the students’ grades on the most recent test.

                                        first

                                        last

                                        test-grade

                                        "John"

                                        "Doe"

                                        82

                                        "Jane"

                                        "Smith"

                                        86

                                        "Javon"

                                        "Jackson"

                                        88

                                        "Angela"

                                        "Enriquez"

                                        76

                                        "Jack"

                                        "Thompson"

                                        81

                                        "Dominique"

                                        "Rodriguez"

                                        93

                                        "Sammy"

                                        "Carter"

                                        79

                                        "Andrea"

                                        "Garcia"

                                        91

                                        • Who had the highest score on the test?

                                        • What are the names of the top 4 scorers?

                                      • When we answer this question, we have to look through every row in the table to figure out what the top 4 scores are, then the names of the students with those scores. The next operation we will see is ordering. Ordering is sorting all of the rows in a table in a meaningful way, using one of the columns in particular. Data scientists will often order tables so that they are easier to search through.

                                      • This table has all of the same rows as the classroom table, but they are now ordered by test-grade in descending order. It is in descending order by test grade because the test grades start high at the top of the table, and get lower further in the table.

                                        first

                                        last

                                        test-grade

                                        "Dominique"

                                        "Rodriguez"

                                        93

                                        "Andrea"

                                        "Garcia"

                                        91

                                        "Javon"

                                        "Jackson"

                                        88

                                        "Jane"

                                        "Smith"

                                        86

                                        "John"

                                        "Doe"

                                        82

                                        "Jack"

                                        "Thompson"

                                        81

                                        "Sammy"

                                        "Carter"

                                        79

                                        "Angela"

                                        "Enriquez"

                                        76

                                      • The opposite order, in which values are INCREASING as you go further down the table, is called ascending. Here’s what the classroom table looks like ordered by the test-grade column in ascending order:

                                        first

                                        last

                                        test-grade

                                        "Angela"

                                        "Enriquez"

                                        76

                                        "Sammy"

                                        "Carter"

                                        79

                                        "Jack"

                                        "Thompson"

                                        81

                                        "John"

                                        "Doe"

                                        82

                                        "Jane"

                                        "Smith"

                                        86

                                        "Javon"

                                        "Jackson"

                                        88

                                        "Andrea"

                                        "Garcia"

                                        91

                                        "Dominique"

                                        "Rodriguez"

                                        93

                                      • Pyret lets you change the order of a table’s rows with the order query. This makes our questions a lot easier to answer quickly! For example, let’s think back to our "most sodium" question...

                                        Return to your Unit 7 template file. Type presidents-ordered into the interactions window.

                                        Here, movies are in alphabetic order by title, from A to Z. The expression for ordering the presidents table is the following:

                                         

                                        Demonstrate that this code can be crammed onto one line, but it may be more readable this way.

                                      • The key word order is followed by the name of the table we are ordering (in this case, nutrition), then a colon (:). The colon always comes in front of an expression using column names. In this case, we are ordering by the sodium column, in descending order.

                                        Change the keyword ascending in the definition of movies-ordered to descending. Then hit Run and type movies-ordered into the interactions window. What has changed?

                                        Now the table is in descending alphabetical order by the title column.

                                        What happens if we instead order the movies-ordered table by the studio column?

                                        It turns out you can order tables by columns Numbers AND Strings. A table in ascending order by a String means it is in alphabetical order, and descending means reverse alphabetical order.

                                        • Can we order the movies table by the second word in the title? If not, why?

                                        • In your definitions window, complete the next two exercises: "Salt Order" and "Population Order".

                                        • Complete the word problems on Page 27.

                                        • On page Page 45, write down what an order query is for.

                                        Test your code by hitting the Run button and typing the new variable names into the interactions window.

                                      • We can also order a table by more than one thing at a time! For example, suppose we wanted to show all the movies, sorted ascending by studio name and sorted alphabeticall by title within those groups?  

                                      • The order keyword allow us to reorganize the entries of a table so that we can more easily answer these kinds of questions:

                                        • What rows have the highest value in a particular column?

                                        • What rows have the lowest value in a particular column?

                                      Combining Queries

                                      Overview

                                      Learning Objectives

                                        Evidence Statements

                                          Product Outcomes

                                            Materials

                                              Preparation

                                                Combining Queries (Time 25 minutes)

                                                • These table operations are even more useful when they are combined:

                                                  • Type studio-and-gross into the interactions window.

                                                  • Find the definition of studio-and-gross in the Definitions Area. What table is this query using?

                                                  • Are the rows in the studio-and-gross table in any particular order?

                                                  This query does something you haven’t seem before: it uses the result of a prior query. We ordered our table by party and bound the result to movies-ordered, and now we’re selecting only the relevant columns from that table. Note that the new table is still sorted, even though we’ve dropped the column we used to sort!

                                                • This is another example of why query order matters. If we’d selected our columns first, there would be no way for us to achieve this ordering. That’s why, when combining queries, we always put order before select.

                                                • In this course, we’ll use Table Plans to help us set up our queries. One way to think about Table Plans is to view them as a "Recipe" for writing queries: by answering a series of questions about our data, we can gradually piece together the queries we need.

                                                  If your students are familiar with the Design Recipe from Bootstrap:Algebra or Bootstrap:Reactive, this is the time to make those connections!

                                                • We can also think of Table Plans as a kind of puzzle-solver, for getting us from one table to another.
                                                  • Turn to Page 28, and look at the start and end table.

                                                  • What queries will get us from one to the other? Use this 2-step Table Plan to figure out the order and select queries that will produce the final table. When you’ve finished, type these queries into the Definitions Area, under "Gross and Domestic".

                                                  • Complete Page 29. When you’ve finished, type these queries into the Definitions Area, under "Title and Year"

                                                • With all the commas, colons and end markers, it’s easy to make a small syntax error! Open the Select and Order Syntax Errors file, and see if you can fix all the bugs you find. Once you’re done, uncomment each query by removing the hash sign (#) and click Run.

                                                • Table Plans allow us to turn query-writing into a series of questions. By focusing on just one question at a time, we minimize our chances of making mistakes, and give ourselves a way to retrace our steps if something goes wrong. Being comfortable with Table Plans takes time, but once you’ve mastered them you’ll find it gets easier and easier to program complex table queries.

                                                Closing

                                                Overview

                                                Learning Objectives

                                                  Evidence Statements

                                                    Product Outcomes

                                                      Materials

                                                        Preparation

                                                          Closing (Time 10 minutes)

                                                          • At this point, you may have noticed that your dataset contains a few "extreme" points that skew the results of your analysis. Perhaps one movie was a box-office failure, or a war in one country is artificially dragging down their life-expectency. These extreme points are called outliers, and sometimes it’s best to remove them from your dataset in order to identify a pattern. Or perhaps you’ve decided you want to zoom in on a subset of the data, or break it apart and do an analysis on two different groups. Maybe you want to run a separate analysis for movies with female leads v. male leads, or analyze Democratic and Republican presidents separately. You don’t know how to do this - yet - but that’s exactly what we’ll cover in the next lesson!

                                                            Take 10 minutes to answer question 9 in your Project Report.

                                                            See the Sample Project Report to see an exemplar of student work.