(Also available in CODAP)
Students choose a dataset that interests them and explore it by creating and interpreting displays.
For this project, you will need to make a copy of the Dataset Exploration template. The template contains instructions and example text in blue. Be sure to delete all of the blue text before submitting!
On the title page, be sure to include (1) your name(s), (2) your teacher’s name, (3) the date, (4) a link to your published Pyret file, and (5) a link to the spreadsheet containing the dataset.
Assessment and Grading
We provide a student-facing Exploration Paper Rubric, but teachers should always feel free to edit and adapt it for their classroom.
About this Dataset
Explain why this dataset matters to you. Why would your family, friends, school, or neighborhood care about this dataset?
Why should a stranger care about this dataset?
Do you know where the dataset came from? Why was it collected? Was there any agenda behind the creation of this dataset?
How large is the dataset? What columns does it contain, and what do they represent? Are they categorical or quantitative? Are they Numbers, Strings, Booleans, or something else?
If you have any questions about this dataset, add them to the last section of the paper, entitled "What Questions Do You Have?"
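The questions above about dataset size and column types can also be checked mechanically. The following is a minimal sketch in Python, not part of the project's Pyret or CODAP workflow. The inlined dataset and the helper names `describe` and `is_number` are hypothetical, and the sketch assumes the dataset has at least one row.

```python
import csv
import io

# Hypothetical dataset, inlined so the sketch is self-contained
RAW = """name,species,age,fixed
Sasha,cat,1,false
Boo-boo,dog,11,true
Felix,cat,16,true
"""

def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def describe(csv_text):
    """Return the row count and a guessed type for each column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    types = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        if all(v.lower() in ("true", "false") for v in values):
            types[column] = "Boolean"
        elif all(is_number(v) for v in values):
            types[column] = "Number"
        else:
            types[column] = "String"
    return len(rows), types

row_count, column_types = describe(RAW)
print(row_count, column_types)
```

A guessed type is only a starting point. A Number column is not automatically quantitative: ZIP codes, for example, are stored as numbers but behave like categories.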
Some displays only work with one column, while others work with more than one. Some displays are designed for categorical data, and others for quantitative data.
For each of the columns in your dataset, consider what display would make the most sense. Make sure you’ve made at least one of the following: Pie Chart, Bar Chart, Histogram, Box Plot, Scatter Plot, Linear Regression Plot.
Include an image of each display, along with a title, the code you used to generate the image, and the columns you examined. After you’ve made the display, consider the following:
Is this display interesting?
What do you think it tells you?
It’s ok if you don’t know what every display means! Your goal here is to make an educated guess about what these displays might tell us about the dataset.
If any of these displays spark your curiosity, add your questions to the last section of the document.
Measures of Center and Spread
Select at least two quantitative columns from your dataset that you would like to focus on.
For each column, explore the mean, median, and modes.
For each column, compute the 5-Number Summary by making a box plot.
For each column, compute the standard deviation.
Copy each result into the summary table in the document.
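To show what these measures compute, here is a minimal sketch using Python's standard `statistics` module rather than the project's own tools. The `ages` column and the `summarize` helper are hypothetical. Two assumptions matter: the quartiles use the "inclusive" interpolation method, and the standard deviation is the sample standard deviation. Other tools may follow different conventions, so their quartiles or standard deviation can differ slightly from these.

```python
import statistics

def summarize(values):
    """Return measures of center and spread for one quantitative column."""
    ordered = sorted(values)
    # Quartile conventions vary between tools; "inclusive" is one common choice
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "modes": statistics.multimode(ordered),
        # 5-Number Summary: minimum, Q1, median, Q3, maximum
        "five_number": (ordered[0], q1, q2, q3, ordered[-1]),
        # Sample standard deviation (divides by n - 1)
        "stdev": statistics.stdev(ordered),
    }

# Hypothetical column: ages (in years) of animals at a shelter
ages = [1, 2, 2, 3, 5, 8, 13]
summary = summarize(ages)
for name, value in summary.items():
    print(name, value)
```

In this example the mean (about 4.86) sits well above the median (3), which hints that a few large values are pulling the mean upward. That is the kind of observation worth writing below the summary table.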
What do the center and spread of these columns tell you? Write your interpretation below the summary table.
Based on these measures, do you think these columns are related in some way? What other relationships might there be in the dataset? Add these questions about proposed relationships to the last section of the document.
Developing a Research Question
After looking at all of your accumulated questions, which ones are the most interesting?
Do you have any theories about what the answers might be?
Does this dataset have any grouped samples that would make sense to explore separately? (For example, it might make sense to explore dogs and cats separately.)
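Exploring grouped samples separately means splitting the rows by a categorical column and then summarizing each group on its own. The following is a minimal sketch in Python, assuming a hypothetical list of (species, age) rows. The `mean_by_group` helper is illustrative and not part of the project's tools.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (species, age in years)
animals = [("cat", 1), ("dog", 11), ("cat", 16), ("dog", 3)]

def mean_by_group(rows):
    """Group values by their category, then average each group separately."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}

group_means = mean_by_group(animals)
print(group_means)
```

If the groups have noticeably different centers, that difference may itself be a good research question.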
These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.