name      age       shopping history   interest in game  buys game
Jan       teens     previous customer  no                no
Jose      teens     previous customer  no                no
Maribel   twenties  previous customer  no                yes
Noah      thirties  previous customer  no                yes
Sydney    thirties  previous customer  yes               yes
Mariana   thirties  new customer       yes               no
Rasula    twenties  new customer       yes               yes
Jillian   teens     previous customer  no                no
Ariella   teens     new customer       yes               yes
Isabela   thirties  previous customer  yes               yes
Danial    teens     previous customer  yes               yes
Kate      twenties  previous customer  no                yes
Taikhoom  twenties  previous customer  yes               yes
Peter     thirties  new customer       no                no

"Age" as the Root Node

The decision stump below splits the above training data by age and indicates whether the individuals in each group buy the game.

Figure: a one-level decision tree with "age" at the root node and three branches (teens, twenties, and thirties), each with a list of Y/N values beneath it.

Students won’t see the check marks; they will add them in question 3.

1 Where do the Y/N lists beneath each of the three branches come from?

2 What prediction will our current model (decision stump) make for each group?

  • people in their teens will / will not buy the game

  • people in their twenties will / will not buy the game

  • people in their thirties will / will not buy the game

3 Place checkmarks below each of the values in the stump leaves for which our prediction is correct.

4 Find the likelihood of a correct prediction for each age group. teens: ___% twenties: ___% thirties: ___%

5 How accurate is the current prediction across our entire dataset? ___ correct predictions out of 14 attempts (___% accuracy).
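
For checking work on questions 2 through 5, the stump can also be sketched in code. The Python below is only an illustration and is not part of the original worksheet: the names rows and stump are made up for this example, and the (age, buys game) pairs are copied by hand from the training table above. It predicts the most common outcome in each age group, then counts how many rows that prediction gets right.

```python
from collections import Counter, defaultdict

# (age, buys game) pairs copied from the training table, in row order.
rows = [
    ("teens", "no"), ("teens", "no"), ("twenties", "yes"), ("thirties", "yes"),
    ("thirties", "yes"), ("thirties", "no"), ("twenties", "yes"), ("teens", "no"),
    ("teens", "yes"), ("thirties", "yes"), ("teens", "yes"), ("twenties", "yes"),
    ("twenties", "yes"), ("thirties", "no"),
]


def stump(rows):
    """Return {age: (prediction, correct, total)} for a one-level tree on age."""
    leaves = defaultdict(list)
    for age, buys in rows:
        leaves[age].append(buys)

    summary = {}
    for age, outcomes in leaves.items():
        # The prediction is the majority outcome in the leaf. A tie would go
        # to whichever outcome appears first; this dataset has no ties.
        prediction, correct = Counter(outcomes).most_common(1)[0]
        summary[age] = (prediction, correct, len(outcomes))
    return summary


summary = stump(rows)
for age, (prediction, correct, total) in summary.items():
    print(f"{age}: predict {prediction!r}, "
          f"{correct}/{total} correct ({100 * correct / total:.0f}%)")

total_correct = sum(correct for _, correct, _ in summary.values())
print(f"overall: {total_correct}/{len(rows)} correct "
      f"({100 * total_correct / len(rows):.0f}%)")
```

Each printed line corresponds to one leaf of the stump, and the last line to question 5.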

Improving Our Prediction

We made our prediction without considering all of the columns in our training data. If we add another level to our tree, we might be able to improve our accuracy!

6 Before moving on to the second level of his decision tree, Ernie removed all of the rows for people in their twenties. Bert said, "I don’t think that’s a good idea! Why would we alter our dataset just because we’re starting the second level of the tree?" Explain Ernie’s (correct) decision to Bert.

7 We used "age" as our root node. What questions could we ask at our second-level decision node? [column] or [column]
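
One way to explore a second level, after answering questions 6 and 7 by hand, is sketched below in Python. This is an assumption-laden illustration rather than the worksheet's own method: the helper names split and majority_correct are invented here, shopping history is abbreviated to "previous" or "new", and the rows are copied by hand from the training table. For each age group that still contains a mix of outcomes, it tries each remaining column as the second-level question and counts how many rows a majority prediction would get right.

```python
from collections import Counter, defaultdict

COLUMNS = ("age", "shopping history", "interest in game", "buys game")

# Rows copied from the training table, in order (names omitted).
rows = [
    ("teens", "previous", "no", "no"),
    ("teens", "previous", "no", "no"),
    ("twenties", "previous", "no", "yes"),
    ("thirties", "previous", "no", "yes"),
    ("thirties", "previous", "yes", "yes"),
    ("thirties", "new", "yes", "no"),
    ("twenties", "new", "yes", "yes"),
    ("teens", "previous", "no", "no"),
    ("teens", "new", "yes", "yes"),
    ("thirties", "previous", "yes", "yes"),
    ("teens", "previous", "yes", "yes"),
    ("twenties", "previous", "no", "yes"),
    ("twenties", "previous", "yes", "yes"),
    ("thirties", "new", "no", "no"),
]
records = [dict(zip(COLUMNS, row)) for row in rows]


def split(records, column):
    """Group records by their value in the given column."""
    groups = defaultdict(list)
    for record in records:
        groups[record[column]].append(record)
    return dict(groups)


def majority_correct(records):
    """Count the rows that a majority-vote prediction gets right."""
    counts = Counter(record["buys game"] for record in records)
    return counts.most_common(1)[0][1]


second_level = {}
for age, group in split(records, "age").items():
    if len({record["buys game"] for record in group}) == 1:
        continue  # every row in this leaf agrees, so nothing is left to split
    for column in ("shopping history", "interest in game"):
        correct = sum(majority_correct(g) for g in split(group, column).values())
        second_level[(age, column)] = (correct, len(group))
        print(f"{age}, split on {column}: {correct}/{len(group)} correct")
```

Comparing the printed counts for the same age group shows which second-level question separates that group more cleanly.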

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, 1738598, 2031479, and 1501927). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.