
The Bootstrap Blog

Data Science is Here

"Oh great, now we have to teach another new class?" (Exhausted teachers everywhere)

A lot of people are starting to talk about Data Science. Universities around the globe are creating specialized master's programs in Data Science, and job prospects for Data Scientists are even better than for Computer Scientists. It's only a matter of time before undergraduate majors in Data Science start cropping up, and within three to five years we'll see calls for middle and high schools to start teaching it too.

Even Code.org is starting to think about Data Science: their curriculum manager posted four terrific questions on Facebook about the goals for a Data Science course. We've been actively supporting our own Data Science course for a year now (and were thinking about this problem for a year before that!), so we answered him and shared what we've learned. We'd like to share that knowledge publicly for anyone who's thinking about building a K-12 Data Science course.

1. What should students learn? What should they produce?

Data Science uses math and programming, but it can't be a math and programming course. Sure, students will make use of datatypes, functions, iteration/loops, linear regression, measures of center and variation, etc., but Data Science is all about turning questions into programs and making meaning from the results! Students should discuss threats to validity, learn to think carefully about outliers, and do a ton of talking and writing about their analysis.

Companies don't want just "coders". Every Engineering or CS major knows someone who can write a thousand lines of horrible, bug-riddled code that compiles and runs, but who can't document that code or read a spec. Likewise, it would be a terrible mistake to teach kids to master Excel, R, or Python but not how to interpret results and write about their findings.

A good Data Science class should be a good mix of math and programming, but the final project can't just be a program or a good-looking chart. Students should choose real research questions and write real research papers, using appropriate language and precision to explain their thinking.

2. Can I just use spreadsheets? If not, what's the best coding language? Is it okay if students just make charts?

Spreadsheets aren't enough. We could write a lot on this subject, but Jesse Adler already did. Ideally, you'd want a course that lets you start with spreadsheets and then seamlessly transition into programming. But don't worry about the tools right now: pick your learning goals first, then find the tools that help you get there.

Charts and plots aren't enough. They're usually necessary, but they're never sufficient. Students need to touch the data: this is where things get messy, interesting, and important! Suppose disaggregating a dataset by gender reveals strong correlations where none appeared in the data as a whole; what does that mean? Maybe some outliers are confounding an analysis, but when we look at those outliers we find something essential that we overlooked! When we train teachers in Data Science, touching the data is where things get real, because this is where it all comes down to being able to defend and explain their own thinking. That's a powerful realization!
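To make the disaggregation surprise concrete, here is a minimal Python sketch (the post doesn't tie this idea to a language, so the choice is ours). Everything in it is hypothetical: the rows are made up, the labels "A" and "B" stand in for whatever grouping a real dataset has, and `pearson` is a hand-written Pearson correlation rather than a library call.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows: (group, hours_studied, score). Purely illustrative, not real data.
rows = [
    ("A", 1, 1), ("A", 2, 2), ("A", 3, 3), ("A", 4, 4),
    ("B", 1, 4), ("B", 2, 3), ("B", 3, 2), ("B", 4, 1),
]


def r_for(subset):
    """Correlation between the second and third columns of a list of rows."""
    return pearson([x for _, x, _ in subset], [y for _, _, y in subset])


# Correlation in the pooled data, ignoring groups.
overall = r_for(rows)

# Disaggregate: split the rows by their group label, then correlate each group.
by_group = defaultdict(list)
for row in rows:
    by_group[row[0]].append(row)
per_group = {g: r_for(subset) for g, subset in by_group.items()}

print(f"overall r = {overall:.2f}")
for g, r in sorted(per_group.items()):
    print(f"group {g}: r = {r:.2f}")
```

Run as written, it reports an overall r of 0.00, but r = 1.00 for group A and r = -1.00 for group B: the two groups' opposite trends cancel out in the pooled data. Spotting that is the easy part; explaining what it means is the Data Science.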

The best tools get out of the way quickly, so you can teach the concepts instead of the language. A language for an introductory Data Science class should make it easy for students to dig into data, without spending a lot of time on syntax or special libraries. Does your language require that students learn about for-loops just to filter a dataset? Do you need to spend a week or so on "intro programming" before you can write your first query on real data? Are the error messages designed with young learners in mind? How much time are you spending teaching a language (Python, Snap, C, or Java...) vs. teaching Data Science?

A good Data Science class should use appropriate tools to do real analysis and manipulation. There's research out there on the importance of authenticity. Kids (and adults) need to get their hands dirty! A good language makes it possible to teach students how to sort, filter, and extend a dataset, visualize data in multiple representations, and do some simple programming.
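As a rough illustration of "sort, filter, and extend" without hand-written loop bookkeeping, here is a hedged Python sketch using only built-ins (`filter`, `sorted`, `map`). The table, its columns, and the helper names `is_dog` and `add_kilos` are hypothetical; the point is that each operation is one function applied to a whole table.

```python
# A tiny, made-up table (hypothetical columns): one dict per row.
table = [
    {"name": "Ada", "species": "cat", "age": 1, "pounds": 6.5},
    {"name": "Rex", "species": "dog", "age": 3, "pounds": 48.0},
    {"name": "Bruno", "species": "dog", "age": 4, "pounds": 92.0},
    {"name": "Clover", "species": "rabbit", "age": 2, "pounds": 3.5},
]


def is_dog(row):
    """Predicate used to filter rows."""
    return row["species"] == "dog"


def add_kilos(row):
    """Extend: return a copy of the row with one new computed column.

    0.4536 is an approximate pounds-to-kilograms factor.
    """
    return {**row, "kilos": round(row["pounds"] * 0.4536, 1)}


dogs = list(filter(is_dog, table))                                   # filter rows
heaviest_first = sorted(table, key=lambda r: r["pounds"], reverse=True)  # sort rows
extended = list(map(add_kilos, table))                               # extend rows

print([r["name"] for r in dogs])            # ['Rex', 'Bruno']
print([r["name"] for r in heaviest_first])  # ['Bruno', 'Rex', 'Ada', 'Clover']
```

Note that `add_kilos` builds new rows instead of modifying the originals, so `table` is unchanged afterward; that keeps each step easy to reason about and to undo.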

3. Should we focus more on technical skills or the impact on society?

We think this is a false dichotomy. Focusing on impact without teaching a specific skillset risks becoming an empty, feel-good class in which we all talk about data (and maybe make some charts), but don't actually do anything. Focusing on a skillset without connecting it to real impact will result in tons of kids learning how to load a CSV file, and which commands create graphs...and they'll forget it by the end of next summer. And even if they don't, who cares? Inert knowledge is where CS Education goes to die.

A good Data Science class should teach a good mix of skills, grounded in engaging, authentic projects kids care about.

4. Is this more CS or Math? Won't the math scare students away?

Yes, it's CS. Teaching real, rigorous programming helps make this clear, which is yet another reason students need to touch the data and why spreadsheets alone aren't enough.

But yes, it's also math. And that's ok! Math Education also has a problem with inert knowledge: people dislike math because it's rarely situated in real projects. They think "I'll never use this!", and it becomes an inauthentic exercise in symbol pushing. Rigor isn't the problem, but some folks in CS think that the solution is to push rigor as far away as possible for fear of "spoiling the fun".

But Math is a fundamental part of Data Science! It's not possible without it. Rather than tuck our tails between our legs, let's embrace Data Science as the ultimate answer to "when am I ever going to use math?" and be proud of it. At Bootstrap, we reach nearly 25,000 students every year, primarily in underserved schools where math phobia is high, and we've found that rigor is a very good thing when it's tied to a project that matters. Ask any child who just learned how to make toast or beat a video game: they figured it out; they solved it; they know how to do it. That's what makes it fun! If there's no rigor, there's nothing to solve: nothing to crack, and no feeling of "YES!!!!" when it all comes together. We'd argue that you can't have fun without rigor.

Is the earth warming? Is Tom Brady or Lionel Messi the G.O.A.T.? Does the school I go to matter more than the grades I get? Is stop-and-frisk racist? Who's got the best pizza in town? These are all questions that kids care about, and they can be answered with rigorous analysis and mountains of publicly available data. This is awesome stuff, and we shouldn't shy away from the rigorous math.

A good Data Science class should fully embrace rigor, on both the CS side and the math side of the equation. It should make clear that life is messy, and that rigorous (and repeatable, and explainable!) analysis is how we get answers to the things that matter.

5. So where does this new course fit?

At Bootstrap, we know that there are a finite number of hours in the day and rooms in the building. A curriculum that has to be a standalone course will forever be relegated to "opt-in" status: maybe the kids with the means and inclination will take it, but that's all. We think every child is a data scientist, so Data Science is for everyone. That's why we've created a curricular module that can be flexibly adapted and integrated into many existing classes:

  • Computer Science teachers are always in search of good, interesting projects, and there's data for every interest a student might have! Our Data Science module assumes no prior programming background, yet it gets students working with real data right away. We think this is a great way to introduce computing and programming. In a few years, almost every CS1 course is going to have a significant Data Science component; you can get a head start!
  • The AP CS Principles course is structured around seven big ideas, and one of them is Data. A real Data Science module could easily be dropped into a CSP class, and it would go a whole lot deeper than those lessons do now. The resulting research paper and program could even be used for the Create Task! In fact, some schools are already doing this using our Data Science module.
  • Business classes often have students spend time learning to use spreadsheets. Students make charts, learn to program formulas, and write reports on their analysis of sales data, financial trends, etc. These classes could teach the exact same concepts using a Data Science module, doing the same activities but with some real programming in place of a spreadsheet.
  • Statistics classes have the math part down, but they have a reputation (undeserved, for many!) of being dry. Students learn about linear regression and r², but echoes of algebra remain: "When am I ever going to use this?" Data Science is the mechanism to put the math to use and the question to rest, just as our Bootstrap:Algebra module has done for tens of thousands of students every year.
  • Social Studies classes are what make us the most excited. Social Studies teachers don't get nearly enough respect from the STEM field, and we think that's a problem. These are the classes where students look at everything from immigration to national policy, and at the impact of everything from the Irish Potato Famine to the Electoral College. When we talk about laws, trade, or even climate, we're talking about data. And Social Studies teachers care deeply about making meaning from that data: they know why writing effectively about these subjects matters. One of our pilot classes was a Social Studies class, which explored the impact of poverty on academic performance. That's heavy stuff, but it was deeply personal and relevant to the students. Instead of only talking about poverty, they looked at the data to draw conclusions.

A good Data Science class shouldn't even have to be its own class! It should be tailored to the content domain, so it's applied in context to the classes where it makes sense. It should support and reinforce the business, statistics, or social studies class in which it's embedded, and in return it will be supported by putting Data Science where it should be: in context.

You can check out our Data Science module right now, and integrate it into your Computing, Business, Stats or Social Studies course!
Posted November 12th, 2018