Reasons why the IHME model might be under-predicting coronavirus deaths in the USA

Most political elites in the United States right now seem to be invoking one epidemiological model from the Institute for Health Metrics and Evaluation (IHME) at the University of Washington.

Deborah Birx, leader of the White House coronavirus team, referring to the IHME forecasts.

Apparently there is a more complicated story about the forecasting models that have been considered by the US government task force, but elite messaging at this moment seems clearly converged on a model that sees roughly April 15th as the peak of destruction. And the main data source for that seems to be the graph above. A regularly updated version is available here.

As I write this, many people are attacking the projections for over-estimating the severity.

In this post, I want to articulate a few reasons for fearing that the model is under-estimating the severity. At the time I’m writing this, I honestly don’t know how much confidence to have in these concerns, so I want to articulate them and let others be the judge…

The IHME has a history of opaque and incorrect measurement

A 2018 article in BioMed Research International analyzed the IHME’s methodology in some of their past efforts. This article really, really does not inspire confidence…

Apparently the IHME is known for using opaque methods and refusing to share information in response to inquiries — a cardinal sin and huge red flag in scientific research of any kind that’s not private sector.

IHME reported 817,000 deaths between the ages of 5 and 15… In fact, when we look at the UN report data, the deaths are 164 million.

Did anyone else have to read that twice? This kind of discrepancy really makes you wonder what exactly is going on under the hood. But let’s give ‘em the benefit of the doubt and carry on…

Even more alarming to me is this:

IHME's methodology for measuring burden of disease has an unclear stage called “black box step.” In particular, only the Bayesian metaregression analysis and DisMod-MR were used to explain the YLD measurement method that should estimate the morbidities and the patients, but no specific method is described [2]. WHO requested sharing of data processing methods, but was informed of the inability to do so. For this, WHO researches were recommended to avoid collaborative work with IHME [8].

Does anyone else find this extremely troubling? I would really like to learn what possible reason the IHME, presumably a recipient of public funding, would have for declining to share methodological details. It is utterly insane to me that the US government would predicate its public forecasting on an institution that won’t even disclose basic methodological details.

The model assumes complete social distancing and, um, have you talked to your Grandfather on the phone lately?

Am I missing something? It seems obvious that America is not anywhere near “complete” social distancing. I talk on the phone with my family in NJ, the most badly hit state other than New York, and the vibe throughout my large working-class-to-middle-class family milieu is surprisingly nonchalant. I think most Americans are vaguely going along with public directions by now, but I really don’t see the average American taking it as seriously as would be needed to motivate completely rigorous social distancing.

So then the question becomes, how sensitive is the model to the social distancing assumption? Well, we are not allowed to know for some mysterious reason. So let’s try some stupidly crude back-of-the-napkin calculations. Deborah Birx said if there is zero social distancing, we would expect between 1.5 and 2.2 million deaths (presumably based on the Imperial model). If there is complete social distancing, we expect between 100,000 and 240,000 deaths (based on the IHME).

Extrapolating from that, if you think Americans are doing social-distancing at a 50% level of rigor, then just split the difference: About 1 million deaths.
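For anyone who wants to fiddle with the napkin math, here is a minimal sketch of that "split the difference" calculation. It assumes deaths scale linearly with the rigor of social distancing, which is my crude assumption and not anything the IHME or Imperial models claim. The function and constant names are made up for illustration.

```python
# Napkin math only: linear interpolation between the two scenarios quoted above.
# Linearity in "compliance" is an assumption of this sketch, not of either model.
NO_DISTANCING = (1_500_000, 2_200_000)   # range cited for zero social distancing
FULL_DISTANCING = (100_000, 240_000)     # range cited for complete social distancing


def midpoint(bounds):
    """Midpoint of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


def interpolated_deaths(compliance):
    """Deaths implied by a compliance level between 0 (none) and 1 (complete)."""
    if not 0.0 <= compliance <= 1.0:
        raise ValueError("compliance must be between 0 and 1")
    worst = midpoint(NO_DISTANCING)
    best = midpoint(FULL_DISTANCING)
    return worst + compliance * (best - worst)


print(f"{interpolated_deaths(0.5):,.0f}")  # 1,010,000
```

At 50% rigor the midpoints give roughly 1.01 million, which is where the "about 1 million" figure comes from.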

The IHME model assumes every state has stay-at-home orders

It’s a simple fact that some states still don’t have stay-at-home orders. How are these people using models with assumptions that are observably inconsistent with reality? Again, the whole thing just smells rotten, which is more troubling than any particular quantitative quibble.

The IHME model assumes every US state is responding like China did (?!?!)

“It’s a valuable tool, providing updated state-by-state projections, but it is inherently optimistic because it assumes that all states respond as swiftly as China,” said Dean, a biostatistician at University of Florida.

How is this real?

But there are still more reasons to fear this model is underestimating the coming destruction…

Americans are less likely to go to doctors and hospitals

Americans face higher out-of-pocket costs for their medical care than citizens of almost any other country, and research shows people forgo care they need, including for serious conditions, because of the cost barriers… in 2019, 33 percent of Americans said they put off treatment for a medical condition because of the cost; 25 percent said they postponed care for a serious condition. A 2018 study found that even women with breast cancer — a life-threatening diagnosis — would delay care because of the high deductibles on their insurance plan, even for basic services like imaging. (Vox)

This is important for two reasons.

First, it means that Americans are probably less likely to seek testing. If the model uses testing data as input, as it presumably does, then it will underestimate the problem.

But second, it means that Americans, on average, may go to doctors/hospitals later in the process of virus onset than the citizens of other countries. This actually has two implications: It would mean the model is underestimating the coming destruction because sick Americans are still hiding at home, but also the longer-term fatality rate may be higher than projected because Americans are less likely to seek and receive early-stage care that could save them.

Practical decisions should never be made on the basis of one model, anyway

Even if the model is the best possible model in the world, all statistical models are intrinsically characterized by what is called model uncertainty. You just never really know if you’re using the right model! All applied statisticians know this. For this reason, much applied data science leverages what are called ensemble methods. You run many models, and combine them in some way, if only averaging out their predictions.
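To make the ensemble idea concrete, here is a minimal sketch of the simplest possible version: averaging point forecasts from several models, with optional weights. The model names and numbers are toy placeholders I made up for illustration; they are not real forecasts from IHME, Imperial, or anyone else.

```python
from statistics import median


def ensemble_forecast(forecasts, weights=None):
    """Weighted average of point forecasts; equal weights if none are given."""
    if weights is None:
        weights = {name: 1.0 for name in forecasts}
    total_weight = sum(weights[name] for name in forecasts)
    weighted_sum = sum(forecasts[name] * weights[name] for name in forecasts)
    return weighted_sum / total_weight


# Toy numbers only, chosen so the arithmetic is easy to follow.
toy_forecasts = {"model_a": 100.0, "model_b": 200.0, "model_c": 600.0}

print(ensemble_forecast(toy_forecasts))   # 300.0
print(median(toy_forecasts.values()))     # 200.0
```

The median is shown alongside the mean because one wild model drags the mean around but barely moves the median; which combination rule is appropriate depends on how much you trust each model.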

So yea, who knows what the optimal forecast is, but personally I will wager that the worst day of deaths will see more deaths than the IHME point estimates predict.

If you think I’m missing anything, I would love to hear what.

And while I’m at it, why not throw out a numerical prediction just to hold myself accountable later? Personally my guess is that we will exceed 500,000 deaths, based on the reasoning above. I am not highly confident given the obviously informal nature of my reasoning, but I would bet a modest amount of money that the IHME model is under-predicting the coming destruction. I hope I’m wrong.

How to know if you’re talking to a wokescold: A scientific method for preventing IRL flame wars

[FYI: If you’re interested in data-blogging like this, I’m offering a little free course on it.]

We’ve all been there: You’ve had a couple drinks, you’re having fun talking with someone, then you blurt out a controversial opinion and everything goes belly up. Maybe your interlocutor scolds you, maybe they just walk away, or maybe nothing happens but there’s gossip a week later…

If you have controversial opinions, what you need is a method for knowing — in advance — whether your conversation partner can handle them. It needs to be simple and quick enough to be practical, but it needs to be scientific enough to offer real predictive validity.

It recently occurred to me that there exists a statistical technique that solves exactly this problem. It’s called recursive partitioning, and the practical tool it produces is called a decision tree. If you have data on public opinion and other demographic variables, you can use statistics to determine which chain of questions will give you the best guess about someone’s position on any given issue. If we create a decision tree to predict their position toward suppressing naughty opinions, then we have a simple, practical, and scientifically valid “life hack” for avoiding IRL flame wars.


I did this last week and the results are very interesting. If you’re interested in the statistical details, or you’d like to run the code yourself (perhaps on a different outcome variable), you can find all of that here. In this post, I’ll focus on the social and practical implications.

Here’s all you need to know about the stats. In this analysis, “being a wokescold” is proxied by whether or not someone thinks racist speakers should be allowed or disallowed. For possible predictor variables, I included a handful of variables that are reasonable to ask someone about or easy to observe yourself.


  • sex/gender = variable named sex
  • race = variable named race
  • left/right identification = variable named pol
  • family income = variable named realinc
  • college attendance = variable named college
  • word knowledge or verbal skill (proxy for IQ) = variable named wordsum

I then conducted recursive partitioning, which breaks the data down into the sequence of branches giving the most predictive traction over the outcome variable.
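If you have never seen recursive partitioning, the core idea fits in a few lines. The sketch below is not the code behind this analysis; it is a stripped-down illustration that assumes yes/no predictors only and uses Gini impurity as the splitting criterion (one common choice). The rows are a made-up toy dataset, not the survey data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, features, target):
    """Pick the yes/no feature whose split gives the lowest weighted impurity."""
    best = None
    for feature in features:
        yes = [r[target] for r in rows if r[feature]]
        no = [r[target] for r in rows if not r[feature]]
        if not yes or not no:
            continue  # this feature does not actually divide the rows
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[1]:
            best = (feature, score)
    return best


def grow(rows, features, target, depth=0, max_depth=2):
    """Recursively partition rows; leaves hold the share of rows with target == 1."""
    labels = [r[target] for r in rows]
    split = best_split(rows, features, target) if depth < max_depth else None
    if split is None or gini(labels) == 0.0:
        return sum(labels) / len(labels)
    feature = split[0]
    remaining = [f for f in features if f != feature]
    return {feature: {
        True: grow([r for r in rows if r[feature]], remaining, target, depth + 1, max_depth),
        False: grow([r for r in rows if not r[feature]], remaining, target, depth + 1, max_depth),
    }}


# Made-up toy rows: in this fake data, college perfectly predicts the outcome.
toy_rows = [
    {"college": True, "male": True, "suppress": 0},
    {"college": True, "male": False, "suppress": 0},
    {"college": False, "male": True, "suppress": 1},
    {"college": False, "male": False, "suppress": 1},
]
toy_tree = grow(toy_rows, ["college", "male"], "suppress")
print(toy_tree)  # {'college': {True: 0.0, False: 1.0}}
```

Real implementations also handle numeric cut points (for variables like realinc and wordsum) and prune branches that do not improve prediction enough, which this sketch skips.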


Figure 1 plots the resulting decision tree.

Figure 1

The graph is fairly intuitive, and if you’d like to understand the numbers better, see my more technical post over at jmrphy.net. Here I will give you a more concise and practical translation, resulting in a simple heuristic you can memorize.

If you meet a random person, there’s a 38% chance they’re a wokescold (defined as wanting to suppress racist speakers; one can debate this, but whatever, it’s a decent proxy).

The very first and most important question you can ask someone, to avoid a flame war, is: “Did you ever go to college?” If they say yes, the probability of them being a wokescold drops to 29% and that’s your best guess: They are probably not a wokescold. Nothing else will improve your guess from this point (at least from the variables we selected).

Now, many of you will say: But it’s the college-educated wokescolds one should be most afraid of! True. The limited utility of this analysis is also its primary social-scientific value: It reminds us that college-educated wokescolds remain a relatively minor anomaly, quantitatively speaking. Being educated still means you’re much more likely to support unsavory expression. It’s true that educated wokescolds are often the most dangerous landmines we’d like to tiptoe around, and unfortunately my particular analysis this week will not help you on this front. Fortunately, I have an alternative algorithm custom made for this use-case: If they went to college and they’re also a female with dyed hair, hold fire on your nuclear takes: They are probably a wokescold. Unless they’re Amber Frost.

If they never went to college, the next question you have to ask yourself is whether they're smart. You probably don't want to give them a vocabulary test, but conversation is pretty revealing. If they are smart, you infer they are not a wokescold (40% chance). If they are dumb, it's now a coin flip (50%).

Next, if they fall in that coin-flip group, what is their race? This you can probably guess yourself. If white, this bumps them very slightly toward not being wokescolds (48%). If non-white, this bumps them toward being wokescolds (57%). From here:

If they are white and male, there's a 45% chance they’re a wokescold so you infer they are not — and that’s your final guess. If they are white and female, you should see if their family is rich or not. If rich, they are slightly less likely than a coin flip to be a wokescold (46%); if poor, they're slightly more likely than a coin flip to be a wokescold (54%).

If they are dumb and non-white, there is a 57% chance they’re a wokescold and that’s your best guess.
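For the programmers in the audience, the walk-through above can be written as a tiny function. This is a sketch of my reading of the branches as described in this post, using the percentages quoted above; the function names are invented for illustration, and "smart" and "rich" are taken as yes/no judgment calls because the exact cut points are not given here.

```python
def wokescold_probability(college, smart, white, male, rich):
    """Walk the branches described above and return the leaf probability."""
    if college:
        return 0.29   # went to college: tree stops here
    if smart:
        return 0.40   # no college, but high verbal skill
    if not white:
        return 0.57   # no college, low verbal skill, non-white
    if male:
        return 0.45   # no college, low verbal skill, white male
    # no college, low verbal skill, white female: family income decides
    return 0.46 if rich else 0.54


def probably_wokescold(probability):
    """Best guess at a leaf: predict 'wokescold' only when it beats a coin flip."""
    return probability > 0.5


p = wokescold_probability(college=False, smart=False, white=True, male=False, rich=False)
print(p, probably_wokescold(p))  # 0.54 True
```

Note that once someone has been to college, none of the other arguments matter, which matches the point above that nothing else improves the guess on that branch.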

A heuristic you can memorize

(This only applies in America, mind you, the land of the free.)

  1. If they’re a female who signals creativity or virtue (e.g., dyed hair, bumper stickers), don’t share any edgy takes (this is post-hoc to the model, just a precaution in light of data limitations and researcher experience).
  2. If they went to college, they’re probably not a wokescold. You may gradually begin to share your edgy takes.
  3. If they did not go to college, but speak more intelligently than average, they are probably not a wokescold. You may gradually begin to share your edgy takes.

For all others, the safest decision rule is to not share edgy takes. Bonus rule only if you can master the above 3-step algorithm and you have an appetite for risk:

  4. If they are rich white people, you may gradually begin to share your edgy takes.

What about ideological identification?

The most intriguing result here, to my mind, is that ideological identification totally drops out — it appears to have no predictive power! As I wrote in my technical post:

[That ideological identification has no predictive power] is fascinating, given that many people today tend to think of speech suppression as a fashion on the educated Left! And it is, but that's only a highly visible minority. Political scientists would not be surprised by this result: We've long known that leftists and educated people are always more supportive of free expression (you just don't hear about those people in the media right now).


Please note that the model here does not provide especially satisfying statistical discrimination. It’s better than nothing, but one must still proceed carefully. Always begin by sharing mildly provocative takes, and watching your interlocutor’s reactions. Do not advance to nuclear takes until several acts of mild edgelording produce only smiles, laughter, or excited edgy reciprocity. With additional data and more sophisticated modeling, we may hope to derive more confident predictions for more ambitious social maneuvering. Until then, be careful.

The content of this website is licensed under a CREATIVE COMMONS ATTRIBUTION 4.0 INTERNATIONAL LICENSE.