"Oh great, now we have to teach another new class?" (Exhausted teachers everywhere)
Even Code.org is starting to think about Data Science, and their curriculum manager asked four terrific questions on Facebook about the goals for a Data Science course. We've been actively supporting our Data Science course for a year now (and had been thinking about this problem for a year before that!), so we answered him and shared what we've learned. Now we'd like to share that knowledge publicly, for anyone who's thinking about making a K12 Data Science course.
Data Science uses math and programming, but it can't be a math and programming course. Sure, students will make use of datatypes, functions, iteration/loops, linear regression, measures of center and variation, etc. — but Data Science is all about turning questions into programs and making meaning from the results! Students should discuss threats to validity, learn to think carefully about outliers, and do a ton of talking and writing about their analysis.
Companies don't want just "coders". Every Engineer or CS major knows someone who can write a thousand lines of horrible, bug-riddled code that compiles and runs, but can't document their code or read a spec. Likewise, it would be a terrible mistake to teach kids to master Excel, R, or Python, but not how to interpret results and write about their findings.
A good Data Science class should be a good mix of math and programming, but the final project can't just be a program or a good-looking chart. Students should choose real research questions and write real research papers, using appropriate language and precision to explain their thinking.
Spreadsheets aren't enough. We could write a lot on this subject, but Jesse Adler already did. Ideally, you'd want a course that lets you start with spreadsheets and then seamlessly transition into programming. But don't worry about the tools right now: pick your learning goals first, then find the tools that help you get there.
Charts and Plots aren't enough. They're usually necessary, but they're never sufficient. Students need to touch the data — this is where things get messy, interesting, and important! Suppose disaggregating a dataset by gender shows strong correlations, where none were found to begin with; what does that mean? Maybe some outliers are confounding an analysis, but when we look at those outliers we find something essential that we overlooked! When we train teachers in Data Science, touching the data is where things get real: this is where it all comes down to being able to defend and explain their own thinking. That's a powerful realization!
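To make the disaggregation point concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the rows, the "A"/"B" group labels (standing in for whatever category you might split on), and the helper names `pearson` and `correlation`. It only shows how a dataset with no overall correlation can hide strong, opposite trends inside each group.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up rows for illustration only: (group, x, y)
rows = [
    ("A", 1, 2), ("A", 2, 3), ("A", 3, 5), ("A", 4, 6),
    ("B", 1, 6), ("B", 2, 5), ("B", 3, 3), ("B", 4, 2),
]

def correlation(table):
    """Correlation between the x and y columns of a list of rows."""
    return pearson([r[1] for r in table], [r[2] for r in table])

# Correlation over the whole table, then over each group separately.
overall = correlation(rows)
by_group = {g: correlation([r for r in rows if r[0] == g]) for g in ("A", "B")}

print(f"all rows: r = {overall:.2f}")
for group, r in by_group.items():
    print(f"group {group}: r = {r:.2f}")
```

In this toy data the overall correlation is essentially zero, while group A trends strongly upward and group B strongly downward. The numbers don't explain why that happens; that is the conversation students still have to have.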
The best tools get out of the way quickly, so you can teach the concepts instead of the language. A language for an introductory Data Science class should make it easy for students to dig into data, without spending a lot of time on syntax or special libraries. Does your language require that students learn about for-loops just to filter a dataset? Do you need to spend a week or so on "intro programming" before you can write your first query on real data? Are the error messages designed with young learners in mind? How much time are you spending teaching a language (Python, Snap, C, or Java...) vs. teaching Data Science?
A good Data Science class should use appropriate tools to do real analysis and manipulation. There's research out there on the importance of authenticity. Kids (and adults) need to get their hands dirty! A good language makes it possible to teach students how to sort, filter, and extend a dataset, visualize data in multiple representations, and do some simple programming.
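As a rough sketch of what "sort, filter, and extend" can look like in code, here is a small Python example. The table of towns, its column names, and the numbers are all made up for illustration, and this is just one of many ways to express these operations, not a recommendation of a particular language.

```python
# Made-up table: each row is a dict, each key is a column name.
towns = [
    {"name": "Ashford", "population": 12000, "area_km2": 30.0},
    {"name": "Birchwood", "population": 4500, "area_km2": 9.0},
    {"name": "Cedar Falls", "population": 30000, "area_km2": 120.0},
]

# Filter: keep only the rows that pass a test.
large = [t for t in towns if t["population"] > 10000]

# Sort: order the rows by one column, largest first.
by_population = sorted(towns, key=lambda t: t["population"], reverse=True)

# Extend: build new rows with an extra column computed from existing ones.
# The original table is left unchanged.
extended = [{**t, "density": t["population"] / t["area_km2"]} for t in towns]

print([t["name"] for t in large])
print([t["name"] for t in by_population])
print([(t["name"], t["density"]) for t in extended])
```

Even this short sketch leans on comprehensions, lambdas, and dict unpacking, which is exactly the kind of syntax overhead worth weighing when you choose a tool for beginners.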
We think that choosing between teaching a skillset and focusing on impact is a false dichotomy. Focusing on impact without teaching a specific skillset risks becoming an empty, feel-good class in which we all talk about data (and maybe make some charts), but don't actually do anything. Focusing on a skillset without connecting it to real impact will result in tons of kids learning how to load a CSV file and which commands create graphs...and they'll forget it all by the end of next summer. And even if they don't, who cares? Inert knowledge is where CS Education goes to die.
A good Data Science class should teach a good mix of skills, grounded in engaging, authentic projects kids care about.
Yes, it's CS. Teaching real, rigorous programming helps make this clear — yet another reason they need to touch the data, and why spreadsheets alone aren't enough.
But yes, it's also math. And that's ok! Math Education also has a problem with inert knowledge: people dislike math because it's rarely situated in real projects. They think "I'll never use this!", and it becomes an inauthentic exercise in symbol pushing. Rigor isn't the problem, but some folks in CS think that the solution is to push rigor as far away as possible for fear of "spoiling the fun".
But Math is a fundamental part of Data Science! It's not possible without it. Rather than tuck our tails between our legs, let's embrace Data Science as the ultimate answer to "when am I ever going to use math?" and be proud of it. At Bootstrap, we reach nearly 25,000 students every year — primarily in underserved schools where math phobia is high — and we've found that rigor is a very good thing when it's tied to a project that matters. Ask any child who just learned how to make toast or beat a videogame: they figured it out; they solved it; they know how to do it. That's what makes it fun! If there's no rigor, there's nothing to solve. Nothing to crack, and no feeling of "YES!!!!" when it all comes together. We'd argue that you can't have fun without rigor.
Is the earth warming? Is Tom Brady or Lionel Messi the G.O.A.T.? Does the school I go to matter more than the grades I get? Is stop-and-frisk racist? Who's got the best pizza in town? These are all questions that kids care about, and they can be answered with rigorous analysis and mountains of publicly-available data. This is awesome stuff, and we shouldn't shy away from the rigorous math.
A good Data Science class should fully embrace rigor, be it on the CS side or the math side of the equation. It should make it clear that life is messy, and that rigorous (and repeatable, and explainable!) analysis is how we get answers to the things that matter.
At Bootstrap, we know that there are a finite number of hours in the day and rooms in the building. A curriculum that has to be a standalone course will forever be relegated to "opt-in" status. Maybe the kids with the means and inclination will take it, but that's all. We think every child is a data scientist, so Data Science is for everyone, which is why we've created a curricular module that can be flexibly adapted and integrated into:
A good Data Science class shouldn't even have to be its own class! It should be tailored to the content domain, so it's applied in context to the classes where it makes sense. It should support and reinforce the business, statistics, or social studies class in which it's embedded, and in return it will be supported by putting Data Science where it should be: in context.