Let’s say you have some new teaching approach or technique that you think will improve learning outcomes for your students. You try it and they improve. Excellent, we’re making progress.
But then someone asks, did you have a control group? In this article, we ask, what’s a control group? And why do you need one?
A control group is a group of students in an evaluation who don’t receive the new approach and are used as a benchmark against which to measure the results of the group who do receive the new approach.
At the CEN, we have commercial collaborators who provide educational interventions for struggling students, and they are sometimes puzzled by researchers’ insistence on control groups. If you have something you think works, you should use it on all the students, right? In fact, if you’re confident your approach works better than current practice, isn’t it almost unethical not to give it to all the students? If measuring how well students are doing takes additional time or resources, the students in the control group incur this cost without any benefit. That’s not fair, is it? The researchers’ obsession with control groups can seem inefficient and cumbersome.
So, let’s discuss the main reasons for using control groups – even if using them adds time, expense, and risk of unfairness to our evaluations. Here are three main reasons:
- You can kid yourself something works when it doesn’t.
- Students improve over time anyhow.
- Students can improve due to your new approach but not for the reason you think.
You can kid yourself something works when it doesn’t
Classrooms are complex contexts with lots going on, and if we are not systematic about evaluating which approaches are effective, it is easy to fool ourselves that something is working when it’s not, just by looking for what’s called confirmatory evidence. This is when we seek examples of situations that fit with what we want to be true, and ignore situations that don’t support our expectations. This is the usual way the brain works, and we have to train ourselves to seek out evidence that would potentially show our beliefs are wrong. History tells us that we don’t produce robust progress in understanding the most effective educational approaches if we rely solely on evidence generated from our own experience, on anecdote, or on word of mouth. Control groups are insurance against kidding ourselves something works when it doesn’t, because the group receiving our new approach (let’s call them the ‘intervention group’) at least have to improve more than a group of students not receiving it.
Students improve over time anyhow
Mostly, students improve over time anyway. Partly, this is due to the current educational approaches we’re using (sometimes called ‘teaching as usual’), partly because children develop over time. If we’re using a new approach across, say, the course of a term, we want to be confident that the intervention group are improving more than they would have done anyway. The most straightforward way to do this is to compare their improvement over the term with another similar group of students, say another class, who are just having teaching as usual.
Students can improve due to your new approach but not for the reason you think
If as a teacher you start to use a new approach, presumably there is something important you are doing differently that you think will lead to improved learning outcomes – some key ingredient. But when you use a new approach, sometimes you are more enthusiastic as a teacher. And sometimes, students respond positively just to doing something different. Novelty itself may produce better outcomes. Your new approach may be working, but not for the reason you think.
That’s potentially bad for two reasons. First, when the new approach becomes routine for you or the students, it will stop working. Second, anyone else who uses your new technique but who doesn’t share your enthusiasm (perhaps because they’ve merely been told to use the new approach) – or their students don’t share the enthusiasm – will find the approach doesn’t work.
To make progress in understanding what works, we need to rule out these transitory effects. Unfortunately, the teaching-as-usual control group won’t help us here. Instead, we need what’s called an active control. This group of students will receive some other new intervention, but one missing the key ingredient that you think will improve the target skill. If your intervention group improves but the active control group does not, that increases your confidence that the new approach is working for the reason you think it is – and that, so long as the key ingredient is included, your approach should work in other contexts, and might even be varied a bit to fit with those contexts.
For example, at the CEN, we co-designed with teachers a new computer game to improve 8–10-year-olds’ understanding of counterintuitive concepts in science and mathematics (e.g., that the Earth is round even though your experience of it is that it seems flat). Because we were concerned that using a novel computer game in science and mathematics classes might itself improve outcomes (perhaps by increasing motivation or attention), we had an active control group who played a different novel computer game designed to improve their socio-emotional skills. We predicted that each game should improve its target skill but, if we were correct, the counterintuitive-concept game shouldn’t improve socio-emotional skills, and vice versa. If we were wrong, the novel socio-emotional game would also improve maths and science skills – in which case, it would be the novelty of playing a game producing the effect, not the key ingredient we put into the maths and science game (inhibitory control – see here if you’re interested). Fortunately, we were correct!
So, we use a teaching-as-usual control group to check our new approach is having an effect, and an active control group to check our new approach is working for the reason we think it is.
Cost matters, too
When we assess a new approach, one of the considerations is how much it costs – in terms of time, money, resources, training and so forth. Resources are always limited to some extent. So we are often trading off how effective our new approach is with how much it costs. We want the biggest bang for the smallest buck! But note an important consequence here. If our intervention group improves exactly as much as our teaching-as-usual group, but the new approach is cheaper (or quicker, or easier for the students), you might still view it as better.
For example, GraphoGame Rime, a computer-assisted reading intervention for children struggling to learn to read, showed gains in reading skills no different from a teaching-as-usual control group. The teaching-as-usual group were children receiving a range of different types of phonics intervention, all administered one-to-one or to small groups of children by teachers or teaching assistants. Both groups were intended to receive a target of 12 hours of tuition. Another way of viewing this finding is that a computer-assisted reading intervention was as effective as the current provision of phonics tuition for struggling readers. GraphoGame Rime is an app that costs $2.99 per child to download, while 12 hours of teacher/teaching-assistant time is substantially more costly.
So even when there are no differences between our new approach and a teaching-as-usual control group, we still may think it a better choice. For practical decision making, time and money matter too!
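To make that trade-off concrete, here is a minimal sketch of a per-child cost comparison. The $2.99 app price comes from the example above; the staff hourly rate and the assumption that staff time is not shared across children are illustrative assumptions, not real figures.

```python
# Hypothetical per-child cost comparison between two equally effective
# interventions. Only the $2.99 app price comes from the example above;
# the $20/hour staff rate and one-to-one delivery are assumptions.

def cost_per_child(fixed_cost, staff_hours, staff_hourly_rate):
    """Total delivery cost per child: a fixed cost plus staff time."""
    return fixed_cost + staff_hours * staff_hourly_rate

app = cost_per_child(fixed_cost=2.99, staff_hours=0, staff_hourly_rate=0)
usual = cost_per_child(fixed_cost=0, staff_hours=12, staff_hourly_rate=20)

print(f"App: ${app:.2f} per child; teaching as usual: ${usual:.2f} per child")
```

If the measured learning gains are the same, the cheaper option wins on practical grounds – which is exactly the point of the GraphoGame Rime example.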
Other things researchers worry about
We won’t go into a lot of detail here, but when designing evaluations of educational techniques, there are other things researchers worry about, too. If you want to get into the nitty gritty, here are some of them!
- When comparing groups, say intervention vs. active control vs. teaching-as-usual, researchers want those groups to be performing similarly before the new approach is tried. That makes it easier to link changes to the effect of the intervention. The easiest way to do this is by randomly allocating students to each group – in practice, this means allocating classes to conditions, or schools to conditions.
- Researchers also worry about cross-contamination if intervention and control groups are in the same schools. Teachers may talk, or students, and some of the effects of the intervention may seep into the control group classrooms.
- Researchers worry about fidelity – that the new approach is being used in the same way, or to the same extent, in different classrooms and over the course of the intervention. Researchers will often ask teachers how they are using the approach, or how often, just to check. But there’s debate on this point, because teachers want new approaches to be robust to at least a little variation, so that they can adapt them to their own classroom contexts.
- Researchers worry about what they are going to measure – perhaps some target skill, measuring classes before the intervention and then afterwards, and assessing the level of change. But how long after the intervention? Immediately? How can you tell if the effects of the intervention persist? You’ll need to follow up some time later with another assessment. Researchers also want their measures to be reliable – for example, so that if different people measure the same thing (say, observing classroom behaviour), they will come up with the same scores. This is called inter-rater reliability, something researchers like!
- Researchers also worry about what’s called transfer. If you target a particular skill with your new approach, can you be confident that it will ‘transfer’ to also improve performance on standard educational assessments, like end of year tests or examinations? In psychological research on cognitive training, transfer to tasks very different from the one you are training on has proved hard to find.
- Researchers worry about who is carrying out the evaluation of the new approach. History has shown us that new approaches evaluated by the people who designed them often mysteriously yield more positive results than those evaluated by independent researchers without a vested interest in the result. These days, independent evaluation is preferred (in educational terms, you shouldn’t mark your own homework).
- Researchers prefer to evaluate one intervention at a time. That’s not necessarily the best for how you want to use techniques later, where you might want to combine all the best ones to get the biggest effect. But if you combine approach A and approach B, and the outcome is better than the control group, you won’t know which produced the effect. Was it A, B, or both? You would have to have one group receiving approach A, one receiving B, one receiving A and B, and one neither. That’s four groups before you’ve added an active control. And if you want to combine three approaches, A, B, and C, in order to know what each is contributing, you would need eight groups. Very quickly it gets out of hand. Hence, for simplicity, one at a time.
- Lastly, researchers are concerned about statistics. If, according to the measure you use, you find your new approach works better than your control group, how can you be confident the effect is big enough to be real, and not just due to noise in the measurement test (who was paying attention that day) or the particular students who ended up in each group (referred to as sampling variation)? To be confident that the effect is big enough to be real, you need to use statistics, which yield a probability that the improvement you see is real. Larger samples give you more confidence.
- Statistics can become complicated, because often you are deciding whether an improvement (the difference between a group’s scores before and after the intervention) is bigger in the intervention group than in the control group – a difference between differences! And then you may want to see if the new approach works better for some children than others (say, low-attaining students), which involves using things called ‘co-variates’ (here, level of attainment). This may get complicated, but statistics are always trying to answer the same question: do we think the effect we see – that your new approach seems to work better – is likely to be real?
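As a toy illustration of that ‘difference between differences’, here is a short sketch using only Python’s standard library. All the scores are invented for illustration, and the simple permutation test stands in for the more sophisticated statistics a real evaluation would use.

```python
import random

# Invented pre/post test scores for eight students per group.
intervention_pre  = [52, 48, 60, 55, 47, 58, 50, 53]
intervention_post = [61, 57, 68, 63, 55, 66, 58, 62]
control_pre       = [51, 49, 59, 54, 48, 57, 50, 52]
control_post      = [55, 52, 62, 58, 51, 60, 53, 56]

def gains(pre, post):
    """Per-student improvement from pre-test to post-test."""
    return [after - before for before, after in zip(pre, post)]

def mean(xs):
    return sum(xs) / len(xs)

g_int = gains(intervention_pre, intervention_post)
g_ctl = gains(control_pre, control_post)

# The "difference between differences": extra gain in the intervention group.
observed = mean(g_int) - mean(g_ctl)

# Permutation test: shuffle group labels many times and count how often a
# difference at least this large appears by chance alone.
random.seed(0)
pooled = g_int + g_ctl
n = len(g_int)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
        count += 1
p_value = count / trials

print(f"Extra gain in intervention group: {observed:.2f} points (p = {p_value:.4f})")
```

A small p-value says the extra gain is unlikely to be down to which students happened to end up in which group – the same question every statistical test in an evaluation is trying to answer.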
But if something works, it works, right?
You have a new teaching approach; you think your students have done better. Perhaps they do, too. The students are happy, the school is happy, parents are happy. Everyone’s happy. No control groups in sight. What’s wrong with this picture?
Well, yes. If you try something new, and it improves learning outcomes, hooray. You don’t need a control group.
But your new approach may not necessarily have worked for the reason you think it did. It may not work again for you. It may not work for anyone else. And if it involves additional time and resources and fails, this is wasteful. Encouraging others to use your approach may have teachers running round in circles with promising techniques never producing sustained benefits. In contrast, with systematic evaluation, teaching as a profession can gradually home in on the best approaches for particular students learning particular skills in particular contexts. This is the goal of the evidence-informed approach to education.
Michael Thomas 27/8/24