A bunch of new animals are coming to the shelter, and that means more data! Open the New Animals Dataset and take a careful look.

What do you Notice? What do you Wonder?

There are many different ways that data can be dirty!

  1. Missing Data - A column in which some cells contain data but others are left blank.

  2. Inconsistent Types - A column with inconsistent data types. For example, a years column where almost every cell is a Number, but one cell contains the string "5 years old".

  3. Inconsistent Units - A column with consistent data types, but inconsistent units. For example, a weight column where some entries are in pounds but others are in kilograms.

  4. Inconsistent Naming - Inconsistent spelling or capitalization causes entries that mean the same thing to be counted as different. For example, a species column where some entries are "cat" and others are "Cat" will not give us a full picture of the cats.
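
The four kinds of dirty data above can also be checked for with a short program. The following is only a minimal sketch in Python: the rows, the column names, and the helper functions (missing, types_in, units_in, naming_variants) are all made up for illustration and are not taken from the New Animals Dataset. It also assumes that each weight is written as a number followed by a unit.

```python
# Hypothetical rows for illustration only; NOT from the New Animals Dataset.
rows = [
    {"name": "Ada",  "species": "cat", "age": 3,             "weight": "8 lb"},
    {"name": "Bo",   "species": "Cat", "age": "5 years old", "weight": "4 kg"},
    {"name": "Cleo", "species": "dog", "age": None,          "weight": "40 lb"},
]


def is_blank(value):
    """A cell counts as blank if it is None or an empty string."""
    return value is None or value == ""


def missing(rows, column):
    """1. Missing Data: names of the rows whose cell in `column` is blank."""
    return [r["name"] for r in rows if is_blank(r[column])]


def types_in(rows, column):
    """2. Inconsistent Types: the set of data types found in `column`."""
    return {type(r[column]).__name__ for r in rows if not is_blank(r[column])}


def units_in(rows, column):
    """3. Inconsistent Units: the set of units found in `column`.

    Assumes each cell looks like "<number> <unit>", for example "8 lb".
    """
    return {r[column].split()[-1] for r in rows if not is_blank(r[column])}


def naming_variants(rows, column):
    """4. Inconsistent Naming: spellings that differ only by capitalization."""
    groups = {}
    for r in rows:
        value = r[column]
        if is_blank(value):
            continue
        groups.setdefault(value.strip().lower(), set()).add(value)
    # Keep only the groups that were written in more than one way.
    return {key: found for key, found in groups.items() if len(found) > 1}


print(missing(rows, "age"))              # rows with a blank age
print(types_in(rows, "age"))             # more than one type signals a problem
print(units_in(rows, "weight"))          # more than one unit signals a problem
print(naming_variants(rows, "species"))  # spellings that should be merged
```

A check like this only points to where the problems are. Deciding what to do with each problem row is still a judgment call.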

1 Which animals' row(s) have missing data?

2 Which column(s) have inconsistent types?

3 Which column(s) have inconsistent units?

4 Which column(s) have inconsistent naming?

5 If we want to analyze this data, what should we do with the rows for Tanner, Toni, and Lizzy?

6 If we want to analyze this data, what should we do with the rows for Chanel and Bibbles?

7 If we want to analyze this data, what should we do with the rows for Porche and Boss?

8 If we want to analyze this data, what should we do with the row for Niko?

9 If we want to analyze this data, what should we do with the rows for Mona, Rover, Susie Q, and Happy?

10 Sometimes data cleaning is straightforward. Sometimes the problem is evident but the solution is less certain. For which questions were you certain of your data cleaning suggestion? For which were you less certain? Why?

These materials were developed partly through support of the National Science Foundation (awards 1042210, 1535276, 1648684, and 1738598). Bootstrap by the Bootstrap Community is licensed under a Creative Commons 4.0 Unported License. This license does not grant permission to run training or professional development. Offering training or professional development with materials substantially derived from Bootstrap must be approved in writing by a Bootstrap Director. Permissions beyond the scope of this license, such as to run training, may be available by contacting contact@BootstrapWorld.org.