Friday, September 7, 2012

Conflict and Discussion in basic descriptive statistics

Just a quick update: yesterday I was going to have a very boring "you should know this but let's review anyways" lesson on descriptive statistics. It didn't turn out that way.

First, I asked the class how many siblings each student has, and wrote the numbers on the board.
I asked them how to represent the data in a more presentable way, and we made a frequency table.
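
For readers who want to follow along in code, here is a minimal Python sketch of building such a table. The raw numbers from the board are not recorded in this post, so the `siblings` list is an assumption: it is reconstructed from the frequencies (1, 4, 6, 4, 2, 1 for 0 to 5 siblings) that the students' calculations further down imply. The function name `frequency_table` is made up for illustration.

```python
from collections import Counter

# Assumption: reconstructed raw data, one entry per student, chosen to match
# the frequencies 1, 4, 6, 4, 2, 1 implied by the calculations in the post.
siblings = [0] + [1] * 4 + [2] * 6 + [3] * 4 + [4] * 2 + [5]

def frequency_table(data):
    """Return rows of (value, frequency, relative frequency), sorted by value."""
    counts = Counter(data)
    n = len(data)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

table = frequency_table(siblings)
for value, freq, rel in table:
    print(f"{value} siblings: {freq} students ({rel:.0%})")
```

The third column is the relative frequency, which the class added to the table a little later in the lesson.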

I asked them "would it be OK if I erased the original data now that we have a frequency table showing the same information?" Bored yes from everyone. Evil grin from me.

After a column chart (with lots of students wanting to do a histogram instead, so some discussion on that) and a relative frequency column added to the table, the class suggested we find the mean of the number of siblings. Now is when the fun started.

Me: "Any suggestions?"
S1: "Add all the numbers 0-5 and divide by 18."
Me: OK, (0+1+2+3+4+5)/18 = 0.83.
S1, S2, S3: that can't be right. Most of us had more than 1 sibling, and this shows less than 1.
Me: well, if this isn't right, then discuss among yourselves what the mistake could be, and how we could fix it.

Now I don't know about other kids in other schools, but my kids ALWAYS have trouble finding the mean (and median) from a frequency table. It's like they immediately lose track of the meaning of the table. This time, some very interesting and silly approaches were developed.

S1: (0+1+2+3+4+5)/6 = 2.5
S2: (1+4+6+4+2+1)/18 = 1
S3: (1+4+6+4+2+1)/6 = 3

In some cases, students laughed at their own attempts. S2 did this when she realized she had just summed up all the students and divided by the number of students. S3 also realized his answer was too high to be reasonable, but needed prompting from me to see there was a conflict. S1, however, did not realize there was a conflict, and her answer seemed reasonable, too. So I stepped in and pointed out that she didn't take the frequency column into account at all, and that her answer would have been the same even if everyone in class had had 0 siblings.
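
The three attempts can be written out as a short Python sketch, which makes it easier to see why each one misses. The frequency column is inferred from the students' own sums, and the function names and the `everyone_zero` table are hypothetical, added only to illustrate the point about S1's method.

```python
values = [0, 1, 2, 3, 4, 5]   # number of siblings
freqs = [1, 4, 6, 4, 2, 1]    # students per value (inferred from the sums above)

def attempt_s1(values, freqs):
    """S1: average the sibling column and ignore the frequencies entirely."""
    return sum(values) / len(values)

def attempt_s2(values, freqs):
    """S2: number of students divided by number of students, so always 1."""
    return sum(freqs) / sum(freqs)

def attempt_s3(values, freqs):
    """S3: average of the frequency column, i.e. students per table row."""
    return sum(freqs) / len(freqs)

print(attempt_s1(values, freqs))  # 2.5
print(attempt_s2(values, freqs))  # 1.0
print(attempt_s3(values, freqs))  # 3.0

# Hypothetical class in which all 18 students have 0 siblings.
everyone_zero = [18, 0, 0, 0, 0, 0]
print(attempt_s1(values, everyone_zero))  # still 2.5
```

S1's function never reads `freqs`, which is exactly why it cannot tell the real class apart from a class of only children.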

After repeated attempts that led to conflicts of different kinds, I think that some kids started to realize the problem: they needed to somehow take the values in both columns into account. But how? Some kids came up with multiplying the siblings and the frequency, but it was only after I explained to the whole class how we could re-create the original data and then find the mean that the class understood (with a collective "Oooh!") what the method is and why it works.
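
Here is a minimal Python sketch of that idea, under the assumption that the frequency column was 1, 4, 6, 4, 2, 1 for 0 to 5 siblings (inferred from the students' sums, not stated outright in this post). It re-creates the original data from the table, finds the mean the long way, and compares it with multiplying each value by its frequency. The names `expand` and `mean_from_table` are made up for illustration.

```python
values = [0, 1, 2, 3, 4, 5]   # number of siblings
freqs = [1, 4, 6, 4, 2, 1]    # students per value (inferred, see above)

def expand(values, freqs):
    """Re-create the raw data: repeat each value as often as its frequency."""
    return [v for v, f in zip(values, freqs) for _ in range(f)]

def mean_from_table(values, freqs):
    """Shortcut: sum of value * frequency, divided by the number of students."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)

raw = expand(values, freqs)
mean_raw = sum(raw) / len(raw)
mean_table = mean_from_table(values, freqs)

print(round(mean_raw, 2), round(mean_table, 2))  # 2.28 2.28
```

Both routes give 41/18, because adding up 2 six times is the same as adding 2 × 6 once. The shortcut is just the re-created list with the repeated additions bundled together.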

Another conflict occurred when students were finding the median. They once again focused on only the sibling column or the frequency column, but more students this time used the original data and got that the median was 2. This solution was presented to the whole class. A moment later I asked the class how we could avoid writing out the original data ("what if there were 1000 students in this class?") and one student responded that we should average the middle numbers in the frequency column: (6+4)/2=5.
She, and other students, seemed unaware there was a conflict between this answer and the one they knew was right, because they'd gotten it from the original data. So I pointed it out and once again gave the class time to discuss other strategies to use the table. We were running out of time, however, so I wrapped it up rather too quickly by having one student explain his (correct) way of thinking.
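
One way to use the table without writing out the data is to walk down the frequency column with a running total until it reaches the middle position(s). The Python sketch below shows that approach; it is not necessarily the way the student explained it, and the frequencies for the class and for the 1000-student example are assumptions (the first inferred from the post, the second invented).

```python
def median_from_table(values, freqs):
    """Median from a frequency table; values must be sorted in ascending order."""
    n = sum(freqs)
    # 1-based positions of the middle observation(s); equal when n is odd.
    lo_pos, hi_pos = (n + 1) // 2, n // 2 + 1

    def value_at(position):
        running = 0  # cumulative frequency so far
        for v, f in zip(values, freqs):
            running += f
            if position <= running:
                return v
        raise ValueError("position is beyond the end of the data")

    return (value_at(lo_pos) + value_at(hi_pos)) / 2

values = [0, 1, 2, 3, 4, 5]
freqs = [1, 4, 6, 4, 2, 1]              # inferred class data, 18 students
print(median_from_table(values, freqs))  # 2.0

# Hypothetical table for a "class" of 1000 students.
big_freqs = [100, 200, 300, 250, 100, 50]
print(median_from_table(values, big_freqs))  # 2.0
```

With 18 students the middle positions are the 9th and 10th. The running totals are 1, 5, 11, so both land in the row for 2 siblings, which matches the answer from the original data.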

Lessons learned:

  • I wasn't aiming for a conflict and discussion feel to this class, hadn't planned any of it, but took the opportunities that presented themselves because I'd read up on this method the day before. It's nice to see that not all improvements in teaching need to be painstakingly planned.
  • Planning would have helped, however. For one, the data could have been such that all common mistakes produced answers that were clearly in conflict with the data. Then I would not have needed to tell the students there was a conflict, they would have noticed it themselves. 
  • A conflict very obvious to me may not be obvious to the students. Some teacher guidance, or carefully orchestrated group work, is therefore necessary to expose the conflicts and make them available for discussion. 
  • Multiple representations are a problem for students: on one hand, they can easily move from data to frequency table to column chart - but I shouldn't assume they can go the other direction or that they recognize when one representation is in conflict with another.
  • Students were very on-task and seemed more interested than usual. They had started this lesson expecting boring-ol-stats again, but then were lively and active throughout the lesson in a way I haven't seen from this group before.
  • Discussion took time. I let it. We were going to "cover" range and standard deviation this class, too, but that just didn't happen. On the other hand, maybe the students will now have a more solid understanding of frequency tables, which will allow us to not spend as much time on measures of spread.
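
On the planning point above, a rough Python sketch of how a data set could be checked in advance: it computes the answers the four mistaken methods from this lesson would give and lists the ones that land close to the true mean. Everything here is an assumption for illustration, including the function names, the 1.0 threshold for "close enough to go unnoticed", the inferred class frequencies, and the invented alternative table.

```python
def true_mean(values, freqs):
    """Correct mean from a frequency table."""
    return sum(v * f for v, f in zip(values, freqs)) / sum(freqs)

def common_mistakes(values, freqs):
    """Answers produced by the four mistaken methods seen in this lesson."""
    n = sum(freqs)
    return {
        "sum of values / students": sum(values) / n,
        "sum of values / rows": sum(values) / len(values),
        "sum of frequencies / students": sum(freqs) / n,
        "sum of frequencies / rows": sum(freqs) / len(freqs),
    }

def unnoticed_mistakes(values, freqs, threshold=1.0):
    """Mistakes whose answers fall within `threshold` of the true mean."""
    target = true_mean(values, freqs)
    return [name for name, answer in common_mistakes(values, freqs).items()
            if abs(answer - target) <= threshold]

values = [0, 1, 2, 3, 4, 5]
class_freqs = [1, 4, 6, 4, 2, 1]   # inferred from the students' sums
print(unnoticed_mistakes(values, class_freqs))
# ['sum of values / rows', 'sum of frequencies / rows']

skewed_freqs = [1, 0, 1, 2, 4, 10]  # invented data set, 18 students
print(unnoticed_mistakes(values, skewed_freqs))  # []
```

With the class data, the two mistakes it flags are the ones S1 and S3 made, the same two that needed a nudge from me before the conflict was seen.
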
Next class (Tuesday) I'll give a short diagnostic quiz to see whether students have retained how to find mean and median from frequency tables. More on that later.


  1. >the data could have been such that all common mistakes produced answers that were clearly in conflict with the data.

    But part of what made it work was that the data was theirs.

    I'm not sure planning could have made the data better.

    I like how this turned out.

  2. Wow! I love how you "storified" this. You should film it!

    I'm not a teacher, but are you sure about this "lesson learned":
    "the data could have been such that all common mistakes produced answers that were clearly in conflict with the data."
    Maybe what you needed was a second set of more obvious data to test theories against?

    I'm curious, how long did it take to write this superb blog post?

    Carl from BuzzMath

  3. Sue and Carl-Alexandre, thanks for the feedback! Yes, I also liked that the data was "their own", because it helps them get a feel for the numbers. I really like the suggestion to use a second set of data (provided by me, such as "I asked the other class and they had this many siblings") to test the students' methods against. Great idea!
    This didn't turn out to be a "quick update" like I intended. But maybe it took 30 minutes to write? I'm not sure.

    Carl-Alexandre, just out of curiosity: if you're not a teacher, what is your interest in reading teacher blogs?