Most (Published) Research False?
So it would seem.
And once again we are indebted to Bandolier for some clear statistical thinking.
And for a quick journalistic overview of the issues, the “Health Report” is worth reading.
OK, few readers of this blog are into medical statistics and clinical trials. But the Bandolier article is very accessible and helps us hone our sense of what level of trust we might place in (reported) findings, including those from analytics.
The world of clinical trials and precisely stated hypotheses and objectives does not map directly to the commercial and marketing worlds: there we tend to be more interested in generating insights than in confirming or disconfirming hypotheses.
And estimation of effect sizes is a lesser priority: something we of course don’t see at all in qualitative research, only in quant research and choice modelling.
But is this really so?
A qual report de facto has to say something about “what is important”. It might heavily qualify the results by saying that they are from a small group of individuals, but nevertheless the decision maker has to do something with them: to trust, or not to trust.
And here is where a well-honed sense of caution needs to cut in.
It is easy to see how, in a simpler environment, the “Law of Initial Results” comes about.
The “Law of Initial Results”:
so often early promising results are followed by others that are less impressive. It is almost as if there is a law that states that first results are always spectacular, and subsequent ones are mediocre: the law of initial results.
(Bandolier goes on to cite the evidence for this).
Well, you can refer to the article and the simulations by Ioannidis.
But it is very easy to see one simple way this comes about: noteworthy/interesting/significant results are “published” (“presented” in the commercial world), others are not. The results that get through were selected partly for being lucky, so when they are repeated we get regression towards the mean.
The effect is even stronger if well-meaning researchers ignore unpromising results, then tweak the product (concept) and test again.
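The selection effect is easy to sketch in a few lines of Python. This is a toy simulation and not Bandolier's or Ioannidis's: the function name, the true effect, the noise level and the "worth presenting" threshold are all assumptions made up for illustration, and the tweak-and-retest loop is not modelled at all. Every study measures the same modest true effect with noise; only first results above the threshold get presented, and each presented study is then repeated once.

```python
import random
import statistics


def simulate_initial_vs_followup(n_studies=10_000, true_effect=0.2,
                                 noise_sd=1.0, threshold=1.0, seed=42):
    """Toy model of the "law of initial results" (all parameters are assumptions).

    Each study observes true_effect plus Gaussian noise. Only first results
    above `threshold` are "presented"; each presented study is then repeated.
    Returns (mean presented first result, mean follow-up result, number presented).
    """
    rng = random.Random(seed)
    presented = []
    followups = []
    for _ in range(n_studies):
        first = rng.gauss(true_effect, noise_sd)
        if first > threshold:  # the selection step: only striking results are shown
            presented.append(first)
            # The repeat draws from the same distribution, with no selection applied.
            followups.append(rng.gauss(true_effect, noise_sd))
    return statistics.mean(presented), statistics.mean(followups), len(presented)


if __name__ == "__main__":
    first_mean, followup_mean, n_presented = simulate_initial_vs_followup()
    print(f"presented: {n_presented} of 10000 studies")
    print(f"mean first result:     {first_mean:.2f}")
    print(f"mean follow-up result: {followup_mean:.2f}")
```

Nothing changed between the first run and the repeat except the selection, yet the presented first results sit well above the threshold while the follow-ups fall back towards the true effect.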
There is a very serious issue here about how we might possibly account for “prior research history”, but that is just too hard to even start on here. Let’s leave that one alone.
There is another angle on this.
Hypothesis Consistent Research
There is an interesting article in the “Journal of Personality and Social Psychology” (1995, vol. 68, no. 1, pp. 52-60):
“Hypothesis confirmation: the joint effect of positive test strategy and acquiescence response set” by Zuckerman, Knee, Hodgins and Miyake.
Hypothesis testers tend to ask hypothesis-consistent questions (i.e., they ask about features more likely under the hypothesis than under the alternative).
Targets tend to acquiesce (i.e., they provide more yes than no answers).
… the data generated were consistent with the hypothesis being tested. On the basis of these data, hypothesis testers drew inferences in line with the hypothesis they were testing. Because hypothesis testers derived their conclusions from hypothesis-confirming data, more diagnostic data resulted in a greater confirmation bias.
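The word “joint” matters in that title, and a little arithmetic shows why. What follows is my own toy model, not the authors': assume a target with no real leaning either way, a tester who asks hypothesis-consistent questions with probability `p_consistent_question`, and a target who answers yes with probability `p_yes`. A yes to a consistent question, or a no to an inconsistent one, counts as confirming. Both parameter names and all the numbers are assumptions for illustration.

```python
def confirming_share(p_consistent_question, p_yes):
    """Expected share of answers that look hypothesis-confirming (toy model).

    A "yes" to a hypothesis-consistent question confirms, and so does a "no"
    to a hypothesis-inconsistent question. The target has no true leaning.
    """
    yes_to_consistent = p_consistent_question * p_yes
    no_to_inconsistent = (1 - p_consistent_question) * (1 - p_yes)
    return yes_to_consistent + no_to_inconsistent


if __name__ == "__main__":
    # Acquiescent target, balanced questions: about 0.5, no bias.
    print(round(confirming_share(0.5, 0.7), 2))
    # Leading questions, target with no yes-bias: about 0.5, no bias.
    print(round(confirming_share(0.8, 0.5), 2))
    # Both together: about 0.62, the data now lean towards the hypothesis.
    print(round(confirming_share(0.8, 0.7), 2))
```

In this sketch neither habit distorts the data on its own; it takes the combination to tilt the evidence.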
Bringing it back home
Let’s just think about you and some research/analysis that has been presented to you.
Just how much should you trust it?
Well, assuming that you were not too deeply involved in the design-test/analyze-redesign-retest/reanalyze process, and that you have an oversight role of sorts, let me lightly paraphrase some of the Bandolier/Ioannidis guidelines:
§ If we accept evidence of poor quality, without validity, or where there are few … cases/participants, we are likely, often highly likely, to be misled.
§ If we concentrate on evidence of high quality, which is valid, and with large numbers, that will hardly ever happen … if instead of chasing some ephemeral statistical significance we concentrate our efforts where there is good prior evidence, our chances of getting the true result are better.
§ the smaller the studies conducted … the less likely the research findings are to be true.
§ the smaller the effect sizes … the less likely the research findings are to be true.
§ the greater the number … of tested relationships … the less likely the research findings are to be true.
§ the greater the flexibility in designs, definitions, outcomes, and analytical modes … the less likely the research findings are to be true.
§ the greater the financial and other interests and prejudices … the less likely the research findings are to be true.
§ the hotter … an analysis technique …, the less likely the research findings are to be true.
Sensible, no?
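Several of these guidelines fall out of one small formula. The blog text above does not quote it; I am bringing it in from Ioannidis's paper, where the chance that a “significant” finding is true (the positive predictive value, PPV) is (1 − β)R / (R − βR + α), with R the pre-study odds that a tested relationship is real, 1 − β the power and α the significance level. The sketch below just evaluates that expression; the function name and the two scenarios are illustrative assumptions, and it ignores the bias and multiple-team terms that the paper adds later.

```python
def ppv(prior_odds, power, alpha=0.05):
    """Positive predictive value of a "significant" finding.

    prior_odds: pre-study odds R that a tested relationship is real
    power:      1 - beta, the chance of detecting a real relationship
    alpha:      significance level (chance of a false positive)
    Evaluates (1 - beta) * R / (R - beta * R + alpha), with no bias term.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Assumed scenario: good prior evidence (odds 1:1), large study (power 0.8).
    print(f"confirmatory: {ppv(1.0, 0.8):.2f}")   # about 0.94
    # Assumed scenario: fishing among many relationships (odds 1:100), small study (power 0.2).
    print(f"exploratory:  {ppv(0.01, 0.2):.2f}")  # about 0.04
```

Small studies and small effects lower the power, and testing many speculative relationships lowers the prior odds; either way the share of true findings among the “significant” ones drops.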
Let me nail my colors to the mast here, lest the above be read as unthinking, partisan support for large-scale, rigorous, quantitative research surveys.
I don’t work in the clinical trials field; my work is commercial and creative, looking for an edge as often as not. Rather than being concerned with significance tests and publishable (and dull) results, I am interested in the progress of understanding and insights. I have a healthy disregard for “the real truth” and for excessive mensuration: we should not care too much about accuracy in measurement of phenomena we do not understand.
But given all that, given our relaxed and comfortable approach to research, exploratory investigations and data mining, given that we don’t have to be too concerned with evidence and truth but merely with insights and directions, there is still something to be learned about caution and trust from the above.