Randomized controlled trials (RCTs) have had a great decade. The stunning line-up of speakers who celebrated J-PAL’s tenth anniversary in Boston last December gives some indication of just how great. They are the shiny new tool of development policy, and a lot of them are pretty cool. Browsing through J-PAL’s library of projects, it’s easy to see how so many of them end up in top-notch academic journals.
So far, so good. But the ambition of RCTs is not just to provide a gold-standard measurement of impact. They aim to actually have an impact on the real world themselves. The scenario goes something like this: researchers investigate the effect of an intervention and use the findings to either get out of that mess quickly (if the intervention doesn’t work) or scale it up quickly (if it does). In the pursuit of this impact-seeker’s Nirvana, it’s easy to lose sight of a couple of things: an RCT is not the only way to evaluate impact, and evaluating impact is not the only way to use evidence for policy. Unfortunately, it is now surprisingly common to hear RCTs conflated with evidence-use, and evidence-use equated with the key ingredient for better public services in developing countries. The reality of evidence use is different.
Today’s rich countries didn’t get rich by using evidence systematically. This is a point that we recently discussed at a big World Bank – ODI conference on the (coincidental?) tenth anniversary of the WDR 2004. Lant Pritchett made the point best when he described Randomistas as engaging in a faith-based activity: nobody could accuse the likes of Germany, Switzerland, Sweden or the US of achieving human development by systematically scaling up what works.
What these countries do have in spades is people noisily demanding stuff, and governments giving it to them. In fact, some of the greatest innovations in providing health, unemployment benefits and pensions to poor people (and taking them to scale) happened because citizens seemed to want them, and giving them stuff seemed like a good way to shut them up. Ask Otto Bismarck. It’s not too much of a stretch to call this the history of public spending in a nutshell.
The big assumption of scaling up the evidence from RCTs is that politicians and officials actually care. The findings may convince international agencies (who may well have commissioned the research in the first place), but aid doesn’t drive government decisions, governments themselves do. And, except for the most aid-dependent countries, domestic taxes fund the public spending that the evidence is supposed to influence. Governments need a reason to care about public services and ways to improve their delivery. Not all of them do. It is politically naïve to think that better information is the binding constraint to better services in countries where better services aren’t a government priority. More likely, RCTs are like trees falling in a forest with nobody around to hear them fall. It’s like they never happened.
Quite a few of these arguments could be put under empirical scrutiny. For instance, we do not have a counterfactual for the experience of today’s rich countries. Perhaps services would be better still if governments had used better evidence in the past (or could be improved by doing so today). It would be worth taking a closer look at the use of impact evidence and how that tracks across countries. To my knowledge, not much comparative work has been done on this. In his recent blog post, Lant points out that RCTs have been known as a policy tool in the US since the 1970s but never caught on.
And what about development? At the WDR 2004+10 conference, Nora Lustig referred to what must be the most frequently cited case for impact and scaling up of an RCT: Mexico’s Progresa/Oportunidades. The story in a nutshell is that incoming presidential administrations in Mexico traditionally canned their predecessors’ flagship initiatives. In 2000, Progresa, an innovative conditional cash transfer (CCT) program, was one such flagship that seemed to be in acute danger of elimination. Evidence from an RCT-based evaluation helped persuade the incoming government to keep it (though under a different name). For CCT programs, the rest is history.
So have we seen a wave of policy-shaping RCTs being conducted in Mexico since 2000?
Well, not quite. The government did develop an institutional interest in evidence to inform social policy. A semi-independent national evaluation council (CONEVAL) was established to provide evidence to the government. A legal provision that every social program be annually evaluated for its impact was also passed. In practice, however, officials quickly found that RCTs are time consuming and very expensive. They turned for lessons to Chile, which had developed a sophisticated portfolio of evaluation tools.
Similarly to Chile, Mexico now uses a menu of options for evaluations. The bulk of these are systematic desk-based evaluations that look at issues of consistency and management, outputs against targets, and so forth. In Chile, one such ‘program evaluation’ cost about $12,000 (in 2003; inflation will have pushed it up accordingly). Compare that to RCTs, which are easily ten times more expensive, if not considerably more. Despite their relatively low cost, these desk-based evaluations are having a big impact on the efficiency of public spending, often resulting in profound reorganisations and budgetary re-allocations. Mexican and Chilean officials have found enough incremental work to be done to improve program delivery (‘how can we make this program a bit better, or cheaper, or less bureaucratic?’), so that big, potentially scalable innovation (‘does this work?’) plays only a marginal role.
The bottom line is that governments that care about impact have plenty of cheaper, timelier and more appropriate tools and options available to them than RCTs. That doesn’t mean RCTs shouldn’t be done, of course. And the evaluation of aid is a different matter altogether, where donors are free to be as inefficient about evidence-basing as they wish without burdening poor countries.
But for governments the choice of how to go about using systematic evidence is theirs to make. And it’s a tough capability to pick up. Many governments choose not to do it, and there’s no evidence that they suffer for it. It would be wrong for donors to suggest to low-income countries that RCTs are in any way critical for their public service capability. Better call them what they are: interesting, but marginal.