The power and limits of controlled experiments

The Freakonomics blog is running a fun contest asking readers to predict whether providing more factual evidence of impact increases or decreases donations. Dean Karlan, co-author with Jacob Appel of More Than Good Intentions, is running an experiment in partnership with Freedom from Hunger. Dean wants to understand whether sharing cold, hard facts with donors about the proven effectiveness of a business training program run by Freedom from Hunger increases or decreases donations.

As context, there is a well-documented study by Deborah Small, George Loewenstein, and Paul Slovic that tests how potential donors respond to generalized factual information about hardship versus the story of an individual girl in Mali named Rokia. The punchline is that the story of Rokia elicits about twice as much in donations as a brief with summary factual information about poverty.

Put another way: stories sell, facts don’t.  (remember, we think with our brains, but…)

Dean Karlan’s experiment is designed to test whether “story + facts” is more or less effective than “story” alone. To do so, Freedom from Hunger sent out two mailers. The control mailer tells only the story of Rita, and it starts:

Many people would have met Rita and decided she was too poor to repay a loan.  Five hungry children and a small plot of mango trees don’t count as collateral.  But Freedom from Hunger knows that women like Rita are ready to end hunger in their own families and their communities…

The treatment mailer has different copy:

In order to know that our programs work for people like Rita, we look for more than anecdotal evidence.  That is why we have coordinated with independent researchers to conduct scientifically rigorous impact studies of our programs.  In Peru they found that women who were offered our Credit with Education program had 16% higher profits in their businesses than those who were not, and they increased profits in bad months by 27%!  This is particularly important because it means our program helped women generate more stable incomes throughout the year.

These independent researchers used a randomized evaluation, the methodology routinely used in medicine, to measure the impact of our programs on things like business growth, children’s health, investment in education, and women’s empowerment.

The question is: which mailer will have a higher response?

My guess is that the first one wins, even though this mailer is going to repeat donors who have probably heard Rita’s story before. (It’s fair to guess that I’m wrong; otherwise, why would they have blogged the contest this way?)

Whether or not I’m right, I’d like to see a better-designed study. It feels misleading to me to describe this as testing “story” versus “story + facts.”  I’d instead say it’s testing “good letter” versus “only OK letter,” and if my take is right, the generalizability of these results will be low indeed.

This is on my mind because last week I had the chance to hear Esther Duflo speak about some of the examples from her book Poor Economics, co-written with Abhijit Banerjee. Many of them are highly compelling, particularly those in which the treatment being tested (e.g., de-worming) is clear and readily measurable.

But the risk of the current vogue for randomized controlled trials (very appropriately a hot and exciting topic in our field right now, with Esther and Abhijit as champions of high-quality, clear thinking) is that we overextend our definition of which “treatments” can meaningfully be assessed this way. One of the examples Esther cited in her talk was whether poor farmers were willing to pay enough for a weather insurance product to make it commercially viable. In this test, farmers were offered a relatively simple, straightforward product that would pay them a set amount if recorded rainfall at the weather station dropped below a certain level. The conclusion, as Esther described it in the talk (stated more strongly than in the book), was that farmers wouldn’t pay enough, and I heard her take this to mean that the insurance market for the poor might not be viable without significant subsidy.

I haven’t dug into the research, but having heard Esther’s description, I was left worried that here, as with Dean Karlan’s mailer study, we run a real risk of overreaching in the conclusions we draw. It may well be that market-based insurance for the poor doesn’t work; it may be that government needs to provide a subsidy. But it may also be that the study simply found low willingness to pay in a market with a limited track record of insurance, little history of or confidence in payouts, no competition, and almost no trust, which wouldn’t be in the least bit surprising.

What I’m getting at is that our approach to figuring out “what works” in poverty alleviation sometimes feels like predicting the future of the tablet market, back in the 1980s or 1990s, from intensive study of the Apple Newton and early tablet PCs. Assuming everything is static, there’s no market. But of course the whole point is NOT to let things stay static, and instead to create the development equivalent of the iPhone and the iPad through relentless innovation and a dogged unwillingness to fail.

This is an important point because, at some fundamental level, we must ask ourselves how much we believe in the power of innovation. How far do we push, prod, and experiment before we conclude that something does, or doesn’t, work? In the simple example of the Freedom from Hunger mailer, I’m betting that drastically better copy could use hard data to increase donations, or to increase them by more. In the insurance example, I’d want to see a lot more product development, market testing, and trust-building with smallholder farmers before drawing any broad conclusions. And so it goes across the board with all the major interventions in the fight against poverty, from microfinance to girls’ education to de-worming to food fortification to HIV/AIDS prevention (where, shockingly, male circumcision is proving to be a very effective way to slow the spread of HIV).

I don’t want to come out against testing, rigor, and “proof”; not at all. We need all of these things: the ability to ask tough questions, the willingness to let things go quickly when they’re not working, and the readiness to over-resource things that are working even if they contradict our initial assumptions. At the same time, our field, and specifically the injection of real innovation into it, is nascent enough that in most cases it feels too early to draw anything but narrow conclusions about what does and doesn’t work, where the poor are and are not willing to pay, and which interventions will have the greatest impact over time. We’ve seen this play out most recently and most vociferously in microfinance: first too-broad claims that it changes everything, then equally broad claims that it does nothing, when surely the right answer is that done right it can be valuable and done wrong it can be destructive. I’m sure we’ll see this same story play out time and time again, across interventions, across sectors, and across geographies.

This entry was posted in Innovation.

5 Responses to The power and limits of controlled experiments

  1. Rob says:

    Thanks for a very sensible post, Sasha. As they say, not everything that counts can be counted; not everything that can be counted, counts. Or to take it a step further, not everything that can be randomized, should be.

    Over-randomization and over-measurement (done badly) are a real risk in our sector. But in defense of the randomistas, the development sector has historically done a very poor job of finding out what low-income people really think, and what they really want. I don’t think field visit conversations with poor people (including pre-intervention and post-intervention M+E surveys) are sufficient listening devices. What I liked about Poor Economics and More Than Good Intentions is that the authors use randomized experiments to more accurately and effectively listen to the poor.

    Are randomized trials the only way to listen to the poor? Absolutely not. But I think they are useful tools and, maybe counter-intuitively, good listening devices.

  2. Tayo says:

    Great post, Sasha. I agree with Rob about the utility of randomized control testing to listen to the poor. This is an important insight; perhaps it even partially explains why the “story” approach works better in fundraising. With the Rokia example, I suspect that contextualizing a problem with a personal story makes the interchange feel more like a dialogue (talking and listening), instead of a monologue (talking OR listening, depending on which side of it you’re on). A conversation invites input and action; a monologue/directive does not.

    Also, although I’m only halfway through “Poor Economics”, what I’ve found both challenging and useful about the authors’ presentation is the way that rigorous fact-finding (in the form of RCTs here) pokes at commonly-held assumptions, stimulating a search for better questions, and presumably, better answers. This type of process can surely drive the type of “disruptive” innovation you describe, no? We may well disagree with the types of answers that RCTs provide (both in terms of quality and applicability). That’s good news, especially because the process leaves a trail of bread crumbs that enables well-reasoned tire-kicking. Perhaps that suggests that these answers are good enough to be worth debating, which could also spur the innovation process.

  3. Pingback: Terrified of success | Sasha Dichter's Blog

  4. Kate Lang says:

    I had similar concerns about the copy for the mailer. They do caution in the blog against over-generalizing from these results, but that doesn’t mean that people won’t do it.

    I think your point about analyzing the potential tablet market based only on past experiences with tablet computers is spot on. A good analysis in any situation should include both quantitative and qualitative data. RCTs are an important component to determining what works, but they should not be our only reference point.

    If every entrepreneur developed a product or business model based only on what verifiably worked in the past or what we verifiably know about the current opportunity/need, it would be a sad situation indeed.
