The Pro Bono Statistics blog has some excellent pieces on the new NEJM Iraq Family Health Survey (IFHS) estimate of violent mortality. PBS raises several issues. First, the author finds a correlation of 0.94 between governorate (province) population size and sample size, which apparently contradicts the published description of the sampling procedure.
PBS also finds fault with the way the IFHS dealt with missing data: extrapolating from Iraq Body Count (IBC) data for two governorates.
While the more detailed postings described above are important to read, I’ll reproduce here a summary of the five issues with the IFHS raised by PBS. Evidently, PBS is in the process of writing separate postings on each issue:
5 problems with the science of the IFHS study
Reviewing the IFHS study, I found 5 problems with its science. I believe that, taken together, those problems should be seen as grave (particularly the first three points, regarding the crucial role extrapolation plays in arriving at the study’s estimates, and regarding the assumed ratio of under-reporting). At the very least, they should be seen as putting the findings of the IFHS on equal or inferior footing to those of Burnham et al., rather than on superior footing due to the nominally large size of the IFHS sample.
I now give a brief abstract of the five problems. As I write a fuller description of each, I will add a link to it from the list here. Unless explicitly stated otherwise, death rates and counts below refer to violent deaths as defined by the IFHS authors.
1. Missing clusters and extrapolation using IBC numbers. The IFHS surveyors did not visit all of the clusters in their sample; areas judged too dangerous went unsurveyed. A minority of those gaps (in Nineveh and Wasit) seem to have been ignored, introducing potential bias. To fill in the rest of the gaps, the IFHS authors extrapolated from other areas: the mortality rate in all of Baghdad was calculated as a fixed factor times the mortality rate in a reference area, where the fixed factor was derived from Iraq Body Count data. The same method (with a different factor) was applied to all of Anbar as well.
It is important to note that these extrapolations determined the total number of deaths estimated for Baghdad and Anbar; any data collected within those areas was in effect ignored in calculating the death estimates. Thus the death counts in Baghdad and Anbar, which together account for over 60% of the deaths in the estimate of the total, are purely a matter of extrapolation and depend directly on the IBC extrapolation factors. To illustrate: the extrapolation factor used for Baghdad was 3.08. If instead the factor were 6, that would have added about 80,000 deaths to the estimate.
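To make the sensitivity concrete, here is a minimal sketch of the arithmetic. The extrapolated count scales linearly with the factor; the baseline Baghdad figure of 84,000 deaths at factor 3.08 is a hypothetical number, chosen only so that the calculation reproduces the "about 80,000 added" illustration above, not a figure from the paper.

```python
# Sensitivity of the extrapolated Baghdad estimate to the IBC-derived factor.
# The extrapolated count is linear in the factor, so rescaling is simple.

BAGHDAD_AT_308 = 84_000  # hypothetical estimate at factor 3.08 (assumption)
FACTOR_USED = 3.08
FACTOR_ALT = 6.0

def extrapolated(factor, baseline=BAGHDAD_AT_308, factor_used=FACTOR_USED):
    """Rescale the extrapolated death count to a different IBC factor."""
    return baseline * factor / factor_used

added = extrapolated(FACTOR_ALT) - BAGHDAD_AT_308
print(f"Using factor {FACTOR_ALT} instead of {FACTOR_USED} "
      f"adds about {added:,.0f} deaths")
```

Because the relationship is purely multiplicative, any error in the IBC-derived factor passes straight through to the headline estimate.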
The reliability of the IFHS estimate thus depends directly and substantially on certain properties holding for the IBC data (namely, coverage rates that are constant across space and across political characteristics). We have no reason to assume that those properties hold, and some reason to assume they don’t. The IFHS authors have apparently made no attempt to account for those issues – not even so much as factoring these uncertainties into the size of the confidence interval.
In addition, the extrapolation method is the reason for the close resemblance, emphasized by the IFHS authors, between the IBC and IFHS breakdown of deaths by area. This resemblance is an artifact rather than a feature of the raw data and should not be seen as showing coherence between IFHS and IBC.
2. The extrapolation procedure is problematic even if the IBC extrapolation factors are assumed accurate. The extrapolation basis is the death rate in 3 reference governorates (the paper does not say exactly which, describing them only as the “three provinces that contributed more than 4% each to the total number of deaths reported for the period from March 2003 through June 2006”). Most governorates were sampled with 3 x 18 = 54 clusters each; Nineveh was sampled with 72 clusters. Thus the estimate of deaths for Baghdad and Anbar (which, again, account for over 60% of the total) relies on at most 2 x 54 + 72 = 180 clusters. This number, much smaller than the nominal size of 971, is the dominant factor in determining the uncertainty of the estimate of the total (again, even if the extrapolation factor is assumed to be correct and known precisely).
This is why the width of the confidence interval for the IFHS study (about 120,000 deaths) is not much smaller than that of Burnham et al. (about 370,000), despite the fact that Burnham et al. used only 47 clusters.
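The effect of the reduced effective cluster count can be sketched with the usual 1/sqrt(n) scaling of sampling error. This treats all clusters as equally informative and ignores design effects, so it is only a rough illustration of why 180 effective clusters, rather than the nominal 971, drive the uncertainty:

```python
import math

# Rough 1/sqrt(clusters) scaling of sampling error. This ignores design
# effects and between-cluster variance differences; it only illustrates
# the order of magnitude of the lost precision.

NOMINAL_CLUSTERS = 971      # nominal IFHS sample
EFFECTIVE_CLUSTERS = 180    # at most 2 x 54 + 72 reference clusters

# Relative standard error scales like 1/sqrt(clusters), so the CI is
# wider by roughly this ratio than the nominal size would suggest:
widening = math.sqrt(NOMINAL_CLUSTERS / EFFECTIVE_CLUSTERS)
print(f"CI roughly {widening:.1f}x wider than the nominal size suggests")
```

The ratio comes out above 2, which is consistent with the observation that the IFHS interval is not dramatically tighter than Burnham et al.’s despite the much larger nominal sample.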
3. The IFHS does not account properly for uncertainty in under-reporting. In the same way that the IFHS estimate depends on the extrapolation factor, it depends on the assumed under-reporting factor. The justification for the factor used seems slim (I have not attempted to follow the reference given). Even accepting their assumptions (i.e., treating the proportion of deaths being reported as a normal variable with mean 0.65 and standard deviation of about 0.075), the authors fail to properly account for the uncertainty in the under-reporting in their calculation of the confidence interval of the estimated death rate. A proper accounting would increase the size of the confidence interval by about 25%.
4. In the IFHS paper, the heading “violent deaths” does not include certain types of injuries. I could not find this mentioned in the paper itself, but table 3 in the supplementary material and a statement by a WHO official indicate that car accidents and “unintentional injuries” are not included in the estimate. This may seem reasonable a priori regarding car accidents, and to a lesser extent regarding unintentional injuries. However, contrary to the statement, those two categories account for more than a third of the deaths by injury in the survey, and both have increased dramatically compared to pre-war rates. Under those circumstances, it appears unjustified to exclude these categories from the estimate. Including them would increase it by more than 50%.
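The arithmetic behind the "more than 50%" figure can be sketched as follows. The one-third share comes from the post above; the simplification that the published estimate covers exactly the remaining two-thirds of injury deaths is an assumption for illustration:

```python
# If the excluded categories (car accidents, "unintentional injuries")
# make up a fraction f > 1/3 of all deaths by injury, and the published
# estimate covers only the remaining 1 - f, then folding them back in
# raises the estimate by f / (1 - f).

excluded_share = 1 / 3                      # "more than a third", per the post
increase = excluded_share / (1 - excluded_share)
print(f"Including them raises the estimate by at least {increase:.0%}")
```

At exactly one third the increase is 50%, so any share above a third pushes it past 50%, matching the claim.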
5. The last point is more an indication of trouble (either in the methodology of the survey or in the way it is described in the paper) than a specific problem with the estimation. According to the description of the sampling method, 10 households were surveyed in each cluster, and there were (with few exceptions) 3 x 18 = 54 clusters per governorate. In such a set-up there should be no correlation between the number of people surveyed in each governorate and the size of the governorate’s population. However, table 2 in the supplementary material shows a strong correlation between those two figures. It seems the only way such a correlation could arise is the unlikely situation in which the population of a governorate is strongly correlated with its average household size.
January 20th, 2008