Developing data science pedagogical content knowledge

Teaching is decision making

There’s a moment from teaching high school math in North Carolina that I remember especially vividly. It was an ordinary Thursday afternoon, and I felt exhausted. In the back corner a few girls were off-task, two boys were yelling out, asking to go to the bathroom for the 2nd time in the 50 minute period, the lesson I was teaching for the first time in my life wasn’t working, and my old whiteboard marker was nearly out of ink. How can I get the girls to get on-task? Do I let the boys go to the bathroom? I can’t possibly say no, can I? Am I soft for always saying yes? Should I re-explain sin waves again or let them get onto the worksheet? Is the worksheet too hard? Do I leave the room in search of a better whiteboard marker?

I paused and shaking my head subtly, thought to myself — holy cow, teaching involves making A LOT of decisions.

Since that day, in the back of my mind I’ve always thought of teaching as the sum of thousands, if not millions, of decisions. Better teachers make better decisions on average. Becoming better at teaching means making better decisions.

I’m now a PhD student in a school of education, focusing on teaching data science. I teach a course mostly for Masters students. For example, here’s an objective we cover: Students will be able to reshape a data set and explain the relative advantages of a dataset in different shapes.

I’ve never taught this objective before. How can I make good decisions in teaching this objective? How can I choose examples, explanations, and activities that help my students master this objective? What misconceptions might they develop?

Not so long ago, I thought I needed experiments to answer these questions. We could come up with a few different explanations and then randomize students to each of them to see which one works best. I’ve since realized that isn’t a good idea — if you list out of all of the decisions a teacher has to make, an experiment for each of them quickly becomes unwieldy.

In this old way of thinking, it’s bad if teachers make decisions based only on their intuition. I don’t think so anymore. With all of the decisions they make, teachers must make most of their decisions based on intuition.

It’s useful to break down a teacher’s intuition into two parts: general teaching intuition (should students work in groups?) and teaching intution related to a specific content objective (how should I explain reshaping data?).

Shulman named this pedagoglical content knowledge in his 1986 paper and described it as the following:

“that special amalgam of content and pedagogy that is uniquely the province of teachers, their own special form of professional understanding”

It’s useful perhaps to name the enemy. Most often it’s lack of empathy for what it’s like to be new to something. The teacher has seen a concept many times and can’t quite remember what it’s like to not get it or to have fragile understanding of prerequesite knowledge. This has been referred to as the expert blind spot or the curse of knowledge, and its why teachers are sometimes bad at knowing whats best for their students.

I see pedagogical content knowledge as the antidote to the expert blind spot, and as one of the essential ingredients to effective teaching. Here’s my question then:

How can teachers develop pedagogical content knowledge in data science?

For such a fundamental question, research offers surprisingly little practical advice.

How I developed pedagogical content knowledge in under an hour

Imagine you are teaching spread and gather for reshaping data next week.

You need to quickly build empathy for what it’s like to be a beginner having never seen these functions before. You need to understand the misconceptions that come up. And the explanations and exercises that help students move towards mastery. You can’t think your way there because you are in the expert blind spot.

I didn’t find that research offered useful advice. I had an idea though — I remembered a lesson from taking the startup garage course: Conduct user interviews with prototypes to generate discussion. The prototypes can be incredibly simple, even a piece of paper with just a few sketches on them.

This idea is similar to the idea of a think aloud protocol which educators have been using for many years. However, I’ve never seen think aloud protocols used to develop a teacher’s pedagogical content knowledge. Instead, student think alouds are typically done to validate assessment items (e.g. does the item require use of the targeted skill), and teacher think alouds are a teaching technique used to model how to approach a problem.

To test this idea, I built a prompt for the spread and gather functions. Here’s the first part (you can see the full prompt here):

The prompt

I printed out the following prompt and talked to students about it one at a time to better understand how to teaching reshaping data in R.

Marva has data on the number of hamsters, cages, and hamster wheels that a couple of his classmates have. The data is stored in a data frame called hamsters_data_frame. You can think of hamsters_data_frame as a small spreadsheet:

hamsters_data_frame
## # A tibble: 6 x 3
##   name   what     count
##   <chr>  <chr>    <dbl>
## 1 Amy    hamsters     2
## 2 Amy    cages        4
## 3 Amy    wheels       1
## 4 Bianca hamsters     8
## 5 Bianca cages        3
## 6 Bianca wheels       2

Marva wants to change the shape of his data. To do so, she ran the following code:

spread(data = hamsters_data_frame, key = what, value = count)
## # A tibble: 2 x 4
##   name   cages hamsters wheels
##   <chr>  <dbl>    <dbl>  <dbl>
## 1 Amy        4        2      1
## 2 Bianca     3        8      2

Question 1: Explain what Marva’s code did. Please explain in detail. You might even draw a picture…

Lessons learned from using the prompt

I listened as a few students responded to this prompt. I was shocked at how quickly my vision for teaching spread and gather developed. The first thing I noticed was the different words that students used to describe what spread does to a dataset: collapse, reshape, transpose, spreadsheet, arrange, wide, and long. The second thing I noticed is that each of the students I talked to either didn’t understand or misconception around the idea of a row identifier. One student actually asked this question directly: “How does R know to collapse the name column?”. (INPUT IMAGE OF RESPONSE)

There was also confusion on how the output dataset was sorted. Was Amy the first row because she appears first in the input dataset or because she is first in alphabetically?

The idea of a simplest possible unambiguous example (spue)

The students responses made me realize more generally what the goal of my prompt was. I wanted a simple example that made the spread function clear by looking at the input and output datasets. Similarly, if you are teaching someone multiplication for the first time, you probably aren’t going to show them 112 * 101 = 11312 first because it’s too complex. Similarly, you probably wouldn’t show them 2 * 2 = 4 because it’s difficult to distinguish from addition. The best first example for teaching multiplation is 2 * 3 = 6. It’s simple and unambiguous.

I haven’t heard a name for these first examples used to introduce a concept, so for lack of a better term, let’s call them simplest possible umambiguous examples (or spues).

Talking to students made me realize that a single example of spread isn’t sufficient for students to understand it. So instead I seek a set of spues such that for a student to create a mental model that inputs every example’s input to its output, there’s a very high likelihood they’ll have generated an unflawed mental model.

Set of spues for the spread function

Example 1: Basic use of spread

hamsters_data_frame
## # A tibble: 6 x 3
##   name   what     count
##   <chr>  <chr>    <dbl>
## 1 Zach   hamsters     2
## 2 Zach   cages        4
## 3 Zach   wheels       1
## 4 Bianca hamsters     8
## 5 Bianca cages        3
## 6 Bianca wheels       2
spread(data = hamsters_data_frame, key = what, value = count)
## # A tibble: 2 x 4
##   name   cages hamsters wheels
##   <chr>  <dbl>    <dbl>  <dbl>
## 1 Bianca     3        8      2
## 2 Zach       4        2      1

Example 2: Uh oh, duplicate row identifiers

In talking to students, this was the key misconception I observed students having. In spread, any columns not provided as the key or value columns are implicitly used to identify rows. This is very subtle and confusing for students. To make matters worse, other versions of reshape functions in R, stata, and other languages uusally have the user specify explicitly the row identifying variables.

hamsters_data_frame
## # A tibble: 7 x 3
##   name   what     count
##   <chr>  <chr>    <dbl>
## 1 Zach   hamsters     2
## 2 Zach   cages        4
## 3 Zach   wheels       1
## 4 Bianca hamsters     8
## 5 Bianca cages        3
## 6 Bianca wheels       2
## 7 Zach   hamsters     3
spread(data = hamsters_data_frame, key = what, value = count)
## Error: Each row of output must be identified by a unique combination of keys.
## Keys are shared for 2 rows:
## * 1, 7

Example 3: Fixing duplicate identifiers by adding a column

hamsters_data_frame <- 
   tibble(
      name = c("Zach", "Zach", "Zach", "Bianca", "Bianca", "Bianca", "Zach"),
      id = c(1, 1, 1, 2, 2, 2, 3),
      what = c("hamsters", "cages", "wheels", "hamsters", "cages", "wheels", "hamsters"),
      count = c(2, 4, 1, 8, 3, 2, 3)
   )
hamsters_data_frame
## # A tibble: 7 x 4
##   name      id what     count
##   <chr>  <dbl> <chr>    <dbl>
## 1 Zach       1 hamsters     2
## 2 Zach       1 cages        4
## 3 Zach       1 wheels       1
## 4 Bianca     2 hamsters     8
## 5 Bianca     2 cages        3
## 6 Bianca     2 wheels       2
## 7 Zach       3 hamsters     3
spread(data = hamsters_data_frame, key = what, value = count)
## # A tibble: 3 x 5
##   name      id cages hamsters wheels
##   <chr>  <dbl> <dbl>    <dbl>  <dbl>
## 1 Bianca     2     3        8      2
## 2 Zach       1     4        2      1
## 3 Zach       3    NA        3     NA

Example 4: Driving the point home with a common mistake

hamsters_data_frame <- 
   tibble(
      name = c("Zach", "Zach", "Zach", "Bianca", "Bianca", "Bianca", "Zach"),
      id = rnorm(7),
      what = c("hamsters", "cages", "wheels", "hamsters", "cages", "wheels", "hamsters"),
      count = c(2, 4, 1, 8, 3, 2, 3)
   )
hamsters_data_frame
## # A tibble: 7 x 4
##   name        id what     count
##   <chr>    <dbl> <chr>    <dbl>
## 1 Zach   -2.25   hamsters     2
## 2 Zach    0.762  cages        4
## 3 Zach   -0.886  wheels       1
## 4 Bianca  0.200  hamsters     8
## 5 Bianca  0.943  cages        3
## 6 Bianca -0.0935 wheels       2
## 7 Zach    0.180  hamsters     3
spread(data = hamsters_data_frame, key = what, value = count)
## # A tibble: 7 x 5
##   name        id cages hamsters wheels
##   <chr>    <dbl> <dbl>    <dbl>  <dbl>
## 1 Bianca -0.0935    NA       NA      2
## 2 Bianca  0.200     NA        8     NA
## 3 Bianca  0.943      3       NA     NA
## 4 Zach   -2.25      NA        2     NA
## 5 Zach   -0.886     NA       NA      1
## 6 Zach    0.180     NA        3     NA
## 7 Zach    0.762      4       NA     NA
### not sure what this is so just hiding it in a comment ###

# # A recipe for teaching effectively
# 
# a design class though of build something really simple and put it in front of people and it will generate discussion. 
# 
# So I wrote a super simple example of what spread does. And I gave it to students. They wrote their responses and we talked about those responses. 
# 
# I put myself in  this situation 
# 
# 
# Not so long ago, I thought of a teacher's intuition 
# 
# 
# 
# Teaching is hard. I Teach a data science course in this area. It seems to me that teachining is the sum of thousnda of individual decisions: asf aksdf adsfk asdkfj asdfk. 
# 
# 
# 
# one can think of teaching as thousands of decisions. millions maybe.
# 
# i had this idea of we need evidence. we need to test everything. but then i realized two things
# 
# (1) a lot of what we know from other stuff holds
# 
# (2) testing everything becomes unwieldy
# 
# so then using intuition isn't the enemy, teachers are going to use it. how can we help them develop it. researchers call this PCK. 
# 
# so how can we help teachers develop PCK? It's useful perhaps to name the enemy. The enemy is the expert blindspots. Or in other words lack of empathy for what its like to not understand something. 
# 
# Maybe I'm explaining something and I go too quickly or don't offer practice on it? Why? Because I don't fully appreciate what its like to not know and the practices necessary.
# 
# We need to help teachers develop this empathy or intuiton of PCK or whatever you want to call it.
# 
# But doing that is super hard. For what seems to be such an essential piece of a student's education, research on developing PCK isn't compelling. This paper provides a meta-study which I'd summarize as many studies have found ways of improving PCK by doing things like ..., but the studies aren't specific about what works. 
# 
# I'm interested in something like what if someone needs to teach a class of students how to reshape data in R (e.g. go from wide to long) next week. What can they do to prepare now? I believe the first thing they do should be to build an empathy for what it's like to not understand that concept.
# 
# I looked and I looked and I looked and I couldn't find any suggestions for what to build that empathy quickly. I remembered a lesson from a design class though of build something really simple and put it in front of people and it will generate discussion. 
# 
# So I wrote a super simple example of what spread does. And I gave it to students. They wrote their responses and we talked about those responses. 
# 
# Students used all sorts of vocabulary. Could have been reshape_spread or reshape_gather.
# 
# This student made me realize what I now believe is the key misconception or shortcoming in almost everyone's understanding. When you use the spread function you specify the key and and value columns and nothing else. What happens to all of those other columns? They implicitly become the row identifiers. Almost no one understand that. And other implementations of reshapes has one specify those explicitly. 
# 
# I also realized this idea of a SPUE in creating it. For example, if you were teaching multiplication to a student for the first time you'd... then you'd move onto the edge of cases of multiply by 1 and 0 and then larger numbers. You might even ask students why 2 x 2 might be confusing. 
# 
# It's hard to show everything, like this idea of implicit row identification, with just one SPUE. So maybe the idea I'm after is a set of SPUEs such that in order for students to learn how the inputs lead to all of the outputs, they must hold a nearly correct mental model with high probability. 
# 
# Consider these examples:
# 
# For students to fully grasp, they must see it orders. That these become row identifiers. That when a value is missing it fills it in with NA. That if there are multiple values for a cell it returns an error. That for additional values of the column they just keep getting spread out wide. 
# 
# So here's my rough draft suggestion for what I'd do if I had to teach a data science concept next week.
# 
# (1) be sure i fully understand it
# 
# (2) make some example of using it and then find students and conduct a talk aloud about it
# 
# (3) create what I'm calling a set of SPUEs.
# 
# I think if you get to that point the rest of what to do unfolds fairly well. You can use that set of SPUEs as inspiration for a short-lecture on how to use the method. It can also be the inspiration for exercises for students to work on in groups. And it can certainly be inspiration for writing assessment items on the concept.