A Slim Contribution to Obesity Research

Accept as a working hypothesis that there is an obesity epidemic. The finger of suspicion has been pointed at a supposed increase in consumption of fast foods over the last few decades.

How might we research this? And why should we care, given that as a reader of this blog you are most likely interested in business-related analytics?

Well, it lets us showcase a research technique (retrospective dataset construction and modelling) and talk about how we might robustify the results against reasonable criticism (our own, or that of the nay-sayers who would like to dismiss the results by cherry-picking weaknesses). Read on.

We know that fast food outlets (FFOs) have been introduced at different rates in different parts of the country, presumably reflecting some head office modelling of population and likely profitability.

It is not unreasonable, although not central to our modelling, to assume that the first areas to be graced by FFOs were the lower socio-economic ones and those with high population density or high traffic .. where the land was cheapest. So, we have a number of possible covariates. If we measure and model their effects, our model becomes more robust to weak criticisms, more compelling and less easily dismissed.

So, we do a simple survey asking for

  • a) current weight
  • b) remembered access to FFOs at ages 10 and 15. By “access” I mean, e.g., the distance to the nearest FFO at that age. Obviously the questions need to be constructed with care to achieve some convergent validity and manage memory effects, but this is not a herculean task (as in, I don’t have the time to go into it now).
  • c) the answer set to such access questions obviously has to accommodate “can’t remember” and “it varied”, and it should also allow for some uncertainty on the respondent’s part (e.g. about 5 miles, but maybe it could have been up to 10 miles). These uncertainties will be dealt with in the modelling, but note that they are very small relative to the range we will induce by sampling .. if we interview people in WayOutWestVille, where we know there was never an FFO until 1993, their answers to the access question will all be “very low”; conversely, those in InnerWhitebread will mostly be “high”.
  • d) the questions can be made even more sophisticated than the above; we could even get a measure of “total childhood exposure to FFOs” should we wish. None of this has to be perfect or immune from criticism, just careful, sensible and forward-looking (to potential weak spots).
  • e) we could even ask directly about remembered consumption of FF .. not impossible, and we can deal with biased memory in the analysis
  • f) current consumption might be a useful covariate (and we could possibly do some matching with other/official surveys, to lend additional credibility)
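
To make the answer set in (c) concrete, here is one way a single access answer might be recorded. This is only a sketch: the names `AnswerKind`, `AccessAnswer`, `best_guess_miles` and `upper_bound_miles` are made up for illustration and are not part of any real survey instrument.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class AnswerKind(Enum):
    """Hypothetical answer categories for the access question."""
    DISTANCE = "distance"
    CANT_REMEMBER = "can't remember"
    VARIED = "it varied"


@dataclass(frozen=True)
class AccessAnswer:
    """One respondent's remembered FFO access at a given age (illustrative only)."""
    age: int
    kind: AnswerKind
    best_guess_miles: Optional[float] = None
    upper_bound_miles: Optional[float] = None

    def interval(self) -> Optional[Tuple[float, float]]:
        """Return (best guess, upper bound) in miles, or None if no distance was given."""
        if self.kind is not AnswerKind.DISTANCE or self.best_guess_miles is None:
            return None
        high = self.upper_bound_miles
        if high is None:
            high = self.best_guess_miles  # no stated uncertainty
        return (self.best_guess_miles, high)


# "About 5 miles, but maybe it could have been up to 10 miles"
answer = AccessAnswer(age=10, kind=AnswerKind.DISTANCE,
                      best_guess_miles=5.0, upper_bound_miles=10.0)
print(answer.interval())  # (5.0, 10.0)

# "Can't remember" carries no interval at all
print(AccessAnswer(age=15, kind=AnswerKind.CANT_REMEMBER).interval())  # None
```

Keeping the best guess and the upper bound separate means the modelling step can later decide how to treat the stated uncertainty, rather than having it collapsed into one number at data entry.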

OK, we have done the survey. Our sample includes people from areas in which we had a strong prior belief there would be high historical FFO access and people from areas in which we had a strong prior belief there would be extremely low FFO access. This induced variation is critical to the success of the modelling. We were also smart enough to record, for each individual, relevant covariates (age and gender, socioeconomic indicators, family history of obesity, predisposing medical conditions, etc.).

Enter the model. A simple linear model would be a good first cut (obesity = f(FFOAccess, covariates, ...)), and we would do all the obvious sensible things: examine it for outliers, use robust estimation, maybe consider a model tree or a gravity model.
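
A minimal sketch of that first cut, on entirely synthetic data. Everything here is an assumption made for illustration: the variable names, the use of BMI as the obesity measure, the effect sizes baked into the simulation, and plain ordinary least squares standing in for the robust estimation mentioned above.

```python
import numpy as np


def ols_t_stats(X, y):
    """Ordinary least squares; X must already contain an intercept column.

    Returns the coefficient vector and the matching t-statistics.
    """
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical covariance of beta
    return beta, beta / np.sqrt(np.diag(cov))


# Synthetic "survey": all numbers below are invented for the sketch.
rng = np.random.default_rng(0)
n = 500
ses = rng.normal(size=n)                        # hypothetical socioeconomic index
age = rng.uniform(25, 60, size=n)
ffo_access = -0.5 * ses + rng.normal(size=n)    # more access in lower-SES areas
bmi = (27 + 1.0 * ffo_access - 0.8 * ses
       + 0.05 * (age - 40) + rng.normal(scale=3, size=n))

# obesity = f(FFOAccess, covariates): intercept, access, then the covariates
X = np.column_stack([np.ones(n), ffo_access, ses, age])
beta, t = ols_t_stats(X, bmi)
print(f"FFO access coefficient: {beta[1]:.2f} (t = {t[1]:.1f})")
```

The coefficient on `ffo_access` is the effect of access AFTER adjusting for the covariates in the same design matrix, which is the quantity the rest of the argument leans on.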

Suppose then that we find the effect of FFOAccess is “very significant” AFTER having taken the covariates into account.

CASE PROVEN? Not quite. We now want to robustify against “reasonable criticism”. It is “reasonable” to say “people’s memories are faulty”. Sure, but how faulty? If we then modified the data set so that 30% of the respondents’ answers on FFO access were changed by a random amount (to be agreed), and the effect was STILL highly significant, would you believe the results then?
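
Here is roughly what that faulty-memory check could look like, again on synthetic data. The 30% share comes from the text; the size of the random change (`scale=1.0`), the 200 repetitions, the 1.96 cut-off and all names are assumptions standing in for the "to be agreed" part.

```python
import numpy as np


def access_t_stat(access, covariates, outcome):
    """OLS t-statistic for the access coefficient (intercept added here)."""
    n = len(outcome)
    X = np.column_stack([np.ones(n), access, covariates])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


def perturb(access, rng, share=0.30, scale=1.0):
    """Change a randomly chosen share of the answers by a random amount."""
    noisy = access.copy()
    k = int(round(share * len(access)))
    idx = rng.choice(len(access), size=k, replace=False)
    noisy[idx] += rng.normal(scale=scale, size=k)
    return noisy


# Synthetic data; the effect sizes are invented for the sketch.
rng = np.random.default_rng(1)
n = 500
ses = rng.normal(size=n)
access = -0.5 * ses + rng.normal(size=n)
bmi = 27 + 1.0 * access - 0.8 * ses + rng.normal(scale=3, size=n)

baseline_t = access_t_stat(access, ses, bmi)

# Refit on many independently perturbed copies of the access answers.
perturbed_t = np.array([access_t_stat(perturb(access, rng), ses, bmi)
                        for _ in range(200)])
share_significant = np.mean(np.abs(perturbed_t) > 1.96)

print(f"baseline t = {baseline_t:.1f}")
print(f"share of perturbed fits with |t| > 1.96: {share_significant:.2f}")
```

Repeating the perturbation many times, rather than once, shows how often the conclusion survives instead of whether it survived one lucky draw.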

Hmm, but maybe people’s memories are biased. OK. Let’s say that they are biased in the direction of the finding .. that currently obese people are more likely to (falsely) remember high FFOAccess in their childhood. OK, we do some simulations .. maybe we discount the FFO access for ALL these people by 20%. The effect is STILL significant.
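
The biased-memory check can be sketched the same way. The 20% discount is from the text; the rest is assumed for illustration: access is a hypothetical non-negative exposure score from 0 to 10, "currently obese" means BMI of 30 or more, and the data are synthetic.

```python
import numpy as np


def access_t_stat(access, covariates, outcome):
    """OLS t-statistic for the access coefficient (intercept added here)."""
    n = len(outcome)
    X = np.column_stack([np.ones(n), access, covariates])
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    resid = outcome - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se


# Synthetic data; the effect sizes are invented for the sketch.
rng = np.random.default_rng(2)
n = 500
ses = rng.normal(size=n)
access = rng.uniform(0, 10, size=n)      # hypothetical remembered-exposure score
bmi = 24 + 0.6 * access - 0.8 * ses + rng.normal(scale=3, size=n)

# Assume every currently obese respondent over-remembers access,
# and discount ALL of their answers by 20%.
obese = bmi >= 30
discounted = np.where(obese, 0.8 * access, access)

baseline_t = access_t_stat(access, ses, bmi)
discounted_t = access_t_stat(discounted, ses, bmi)

print(f"respondents discounted: {obese.sum()} of {n}")
print(f"t before discount = {baseline_t:.1f}, after = {discounted_t:.1f}")
```

Discounting every obese respondent is deliberately harsh: it hands the critic the strongest version of the objection, so a finding that survives it is harder to wave away.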

Believe me now? You say “well, they could have come from different backgrounds …” Umm, we measured that and included it in the model, so we have already adjusted for those effects.

At this point a “reasonable person”, having raised reasonable doubts and having had them evaluated, should start to be convinced. Or maybe the model will not withstand a reasonable implementation of reasonable doubt, and the conclusion cannot be maintained. But at least we know.

So, it is perfectly possible to go about analysis in such a way that we can defend it against those who, at the last minute, wish to pick holes and pull the legs out from under it by cherry-picking weak spots.

Pardon the mixed metaphors. I’ll get out of your way now.

PS. A nod to the Bayesians .. yes, one could have cast this in a Bayesian framework and used Bayesian regression and sensitivity analyses. See, for example, Chapter 6 of Jeff Gill’s recent “Bayesian Methods: A Social and Behavioral Sciences Approach”[1], and particularly the example of the 2000 US Election in Palm Beach County (a model of influences on spoiled ballots) and the subsequent sensitivity analyses.

[1] Bayesian Methods: A Social and Behavioral Sciences Approach
Author: Jeff Gill
Publication Date: January 2002
Publisher: CRC Press LLC
ISBN: 1584882883
