In comments for my previous post about regression to the mean, someone asked a perfectly logical question: yes, a study subject’s cholesterol might be spiking on the day he’s screened, which means he’s enrolled in a study of a cholesterol-lowering drug even though he doesn’t actually have high cholesterol. But doesn’t it work in reverse as well? Wouldn’t that occurrence of random chance be offset by someone whose cholesterol happened to be lower than usual on screening day?
The short answer is: no, the two random variations don’t offset. Let’s assume the cutoff for being enrolled in a study of a cholesterol-lowering drug is 230. If the guy whose true average cholesterol is 215 happens to spike at 235 on screening day, he’ll be enrolled. Meanwhile, the guy whose true average is 245 but happens to have a cholesterol level of 225 on screening day won’t be enrolled.
For the long answer, let’s show randomness in action with a larger group — say, 100 people. It would be easy to choose cholesterol numbers to demonstrate my point, but that would be cheating. So we’ll fire up Excel and put it to work.
First, we’ll create a subject pool. From what I’ve read, the average cholesterol level among adults living in industrial societies is around 220 – that is, if they’re not on statins. (The average is lower in many countries now because almost everyone with “high” cholesterol is given a prescription.) So we’ll create a pool of 100 potential subjects whose cholesterol levels more or less mimic the real-world population.
To do that, I told Excel to give 10 of the subjects a random cholesterol number between 140 and 300, the next 20 subjects a random number between 170 and 270, another 20 subjects a random number between 190 and 250, and the final 50 subjects a random number between 205 and 235.
The result was a wide range of cholesterol numbers with some outliers, but with the majority clustering within a stone’s throw of 220. I called those numbers the Real TC. Then to toss random chance into the mix, I told Excel to adjust each number by a random number between -30 and 30. I called those numbers the Screen TC. That’s the total-cholesterol number the researchers would record as the baseline for each patient.
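If you’d rather reproduce the setup without Excel, here’s a rough sketch in Python. The seed value and exact draws are my own choices, so the output won’t match the spreadsheet numbers in this post, but the shape of the pool will be the same:

```python
import random

random.seed(42)  # hypothetical seed for reproducibility; not the original Excel draws

# Build a pool of 100 subjects whose "Real TC" mimics the distribution
# described above: a few wide-ranging outliers, most clustered near 220.
real_tc = (
    [random.uniform(140, 300) for _ in range(10)]
    + [random.uniform(170, 270) for _ in range(20)]
    + [random.uniform(190, 250) for _ in range(20)]
    + [random.uniform(205, 235) for _ in range(50)]
)

# "Screen TC": each subject's real number plus a random day-to-day
# swing of up to 30 points in either direction.
screen_tc = [tc + random.uniform(-30, 30) for tc in real_tc]

print(f"Average Real TC:   {sum(real_tc) / len(real_tc):.1f}")
print(f"Average Screen TC: {sum(screen_tc) / len(screen_tc):.1f}")
```

Run it and the two averages land close to each other and close to 220, just as in the spreadsheet.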
As you can see, the averages of both the Real TC and the Screen TC (the one adjusted by random chance) are very close to each other, and also very close to the average among non-medicated adults. So to a researcher, these screening numbers would look about right.
But they’re not. Thanks to our random variations, some people whose true average cholesterol is above 230 will have a screening number below 230, and some people whose true average cholesterol is below 230 will have a screening number above 230. In theory, the effect would be the same in both directions. Therefore it’s no problem, right?
Wrong!!! It’s a huge problem. The true high-cholesterol subjects who score below 230 solely because of random variations are excluded from the study. In the data shown above, that would happen with Patient #97.
Meanwhile, the true low-cholesterol subjects who score above 230 solely because of random variations are included in the study. In the data above, that would be the case with Patient #99. The end result is that people whose true cholesterol is lower than their screened cholesterol are overrepresented in the patient pool. Once again, let’s let the numbers tell the story.
Remember, only people with a total cholesterol of 230 or above on screening day are enrolled in the study. So among those enrolled, there can be three effects of the random variation: someone with true high cholesterol can be screened as having even higher cholesterol (H > H), someone with true high cholesterol can be screened as having lower cholesterol that’s still at or above the cutoff of 230 (H > L), or someone with true low cholesterol can be screened as having high cholesterol (L > H).
After our random screening (courtesy of Excel), 35 of the 100 people screened ended up in the study. There were 11 subjects with true high cholesterol whose screening number was even higher, eight with true high cholesterol whose screening number was lower, and 16 with true low cholesterol (below 230), but who were screened at 230 or above — and thus ended up in the study group. That means out of our 35 study subjects, 27 have true cholesterol that’s lower than the number assigned on screening day. And again, this is all due to nothing more than random variations interacting with a cutoff number for enrollment.
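Here’s the enrollment tally sketched in Python, using the same pool-building assumptions as before. Since the random draws differ, the counts won’t match my Excel run exactly, but the lopsidedness shows up all the same:

```python
import random

random.seed(42)  # hypothetical seed; counts will differ from the Excel run above

# Rebuild the 100-subject pool (Real TC) and the noisy screening numbers.
real_tc = ([random.uniform(140, 300) for _ in range(10)]
           + [random.uniform(170, 270) for _ in range(20)]
           + [random.uniform(190, 250) for _ in range(20)]
           + [random.uniform(205, 235) for _ in range(50)])
screen_tc = [tc + random.uniform(-30, 30) for tc in real_tc]

CUTOFF = 230

# Only subjects screened at or above the cutoff are enrolled.
enrolled = [(real, screen) for real, screen in zip(real_tc, screen_tc)
            if screen >= CUTOFF]

# Tally the three ways a subject can end up in the study.
h_to_h = sum(1 for real, screen in enrolled if real >= CUTOFF and screen >= real)
h_to_l = sum(1 for real, screen in enrolled if real >= CUTOFF and screen < real)
l_to_h = sum(1 for real, screen in enrolled if real < CUTOFF)

print(f"Enrolled: {len(enrolled)}")
print(f"H > H (true high, screened even higher):      {h_to_h}")
print(f"H > L (true high, screened lower but >= 230): {h_to_l}")
print(f"L > H (true low, screened at 230 or above):   {l_to_h}")
```

Notice that the H > H and L > H subjects both have a true cholesterol lower than their screening number, which is why that kind of subject dominates the study group.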
Are you with me so far? Good.
Now let’s assume regression to the mean kicks in. At the end of the study, people whose cholesterol was spiking on screening day return to their true, lower average, which we’ll call Final TC. Likewise, people whose cholesterol was artificially low on screening day return to their true, higher average – but there are fewer of them for the reasons I explained above. Here’s what our average numbers look like.
Between screening day and the final day, our group showed better than an 11-point drop in cholesterol on average. Woo-hoo! The drug works!
Uh, but wait … I didn’t treat these people with any drug. The effect is totally due to random variations spread evenly across the potential-subject pool. Yes, the variation was equal in both directions. But when the variation was sufficiently downward, someone with true high cholesterol was excluded from the study. When the variation was sufficiently upward, someone with true low cholesterol was enrolled. When everyone regressed to the mean, the statistical effect was a drop in average cholesterol (at least as measured) among those enrolled.
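The regression-to-the-mean step can be sketched the same way, assuming Final TC simply equals each enrollee’s Real TC. Again, this is a hypothetical Python stand-in for the Excel exercise, so the drop won’t be exactly 11 points, but it will be a drop:

```python
import random

random.seed(42)  # hypothetical seed; not the original Excel draws

# Rebuild the pool and noisy screening numbers as above.
real_tc = ([random.uniform(140, 300) for _ in range(10)]
           + [random.uniform(170, 270) for _ in range(20)]
           + [random.uniform(190, 250) for _ in range(20)]
           + [random.uniform(205, 235) for _ in range(50)])
screen_tc = [tc + random.uniform(-30, 30) for tc in real_tc]

# Enroll everyone screened at 230 or above.
enrolled = [(real, screen) for real, screen in zip(real_tc, screen_tc)
            if screen >= 230]

# At the end of the study everyone regresses to his true average,
# so Final TC is just Real TC.
avg_screen = sum(screen for _, screen in enrolled) / len(enrolled)
avg_final = sum(real for real, _ in enrolled) / len(enrolled)
print(f"Average Screen TC among enrollees: {avg_screen:.1f}")
print(f"Average Final TC among enrollees:  {avg_final:.1f}")
print(f"Apparent drop, no drug involved:   {avg_screen - avg_final:.1f}")
```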
Okay, let’s add one more twist. If the subjects enrolled were split evenly into a treatment group and a placebo group, then everything I described above should apply equally to both groups. We might see an artificial drop in cholesterol numbers across both groups, but no real difference between groups. So the researchers would have to report that the drug was no better than a placebo.
But as Chris Masterjohn pointed out in the article I linked in my previous post, that assumes we’re talking about studies with sufficiently large patient populations that are properly randomized. Let’s see what happens if we split our 35 enrollees into small groups.
To make sure I’m not cherry-picking, I took our study group and had Excel assign each subject a random number from 1 to 4. Then I divided the subjects by those numbers.
Just looking at groups 1 and 2 shows what can happen with small study groups. Suppose group 1 had been assigned the placebo and group 2 had been assigned the cholesterol-lowering drug. Look at the numbers. (Again, the Final TC here is just a return to each subject’s true average.)
Wow! The placebo group’s total cholesterol dropped just six points on average. But the treatment group’s total cholesterol dropped by an average of 19 points! Woo-hoo! The drug works!
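The random four-way split can be sketched like this (another hypothetical Python stand-in for Excel; your group sizes and drops will vary from run to run and from my tables):

```python
import random

random.seed(42)  # hypothetical seed; group makeup will differ from the tables above

# Rebuild the pool, screen with noise, and enroll at the 230 cutoff.
real_tc = ([random.uniform(140, 300) for _ in range(10)]
           + [random.uniform(170, 270) for _ in range(20)]
           + [random.uniform(190, 250) for _ in range(20)]
           + [random.uniform(205, 235) for _ in range(50)])
screen_tc = [tc + random.uniform(-30, 30) for tc in real_tc]
enrolled = [(real, screen) for real, screen in zip(real_tc, screen_tc)
            if screen >= 230]

# Assign each enrollee a random group number from 1 to 4, then compute
# each small group's average drop once everyone regresses to his true
# average (Final TC = Real TC, so the drop is screen minus real).
groups = {g: [] for g in (1, 2, 3, 4)}
for real, screen in enrolled:
    groups[random.randint(1, 4)].append(screen - real)

for g in sorted(groups):
    drops = groups[g]
    if drops:
        avg = sum(drops) / len(drops)
        print(f"Group {g}: {len(drops)} subjects, average drop {avg:.1f}")
```

With groups this small, the per-group averages can easily land several points apart even though no subject was treated with anything.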
But once again, I didn’t treat anyone. These numbers are all being produced by nothing more than random variations followed by a regression to the mean.
Now, if you’re a clever sort, you may already be typing your next comment … something along the lines of Oh, yeah, Mister Smarty Pants? Who says everyone would regress to the mean? If there are random variations on screening day, wouldn’t there also be random variations on the final testing day? Huh? HUH?!! And wouldn’t those cancel each other out?
Okay, let’s see for ourselves. Since our drug didn’t actually do anything (because we don’t actually have a drug), I took the true average cholesterol for each patient and, once again, told Excel to apply a random variation of between -30 and 30. That number became our new Final TC.
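In the same sketch form, the final twist just applies a fresh, independent swing of up to 30 points on the final testing day (hypothetical numbers again, so they won’t match my Excel tables):

```python
import random

random.seed(42)  # hypothetical seed; exact numbers will differ from the post's run

# Rebuild the pool, screen with noise, and enroll at the 230 cutoff.
real_tc = ([random.uniform(140, 300) for _ in range(10)]
           + [random.uniform(170, 270) for _ in range(20)]
           + [random.uniform(190, 250) for _ in range(20)]
           + [random.uniform(205, 235) for _ in range(50)])
screen_tc = [tc + random.uniform(-30, 30) for tc in real_tc]
enrolled = [(real, screen) for real, screen in zip(real_tc, screen_tc)
            if screen >= 230]

# Final TC is now the true average plus a fresh random swing of up to
# 30 points either way, same noise as screening day, drawn independently.
final_tc = [real + random.uniform(-30, 30) for real, _ in enrolled]
drops = [screen - final for (_, screen), final in zip(enrolled, final_tc)]
avg_drop = sum(drops) / len(drops)
print(f"Average apparent drop with a noisy final measurement: {avg_drop:.1f}")
```

The fresh noise averages out to zero, but the selection bias baked in on screening day doesn’t, so the group as a whole still shows an apparent drop.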
Adding random variation to the final measurement did, in fact, reduce the dramatic difference between groups 1 and 2.
Now it’s 12 points lower vs. 19 points lower. Not so impressive. But let’s suppose Group 3 had been our placebo group and Group 2 was still our treatment group.
Once again, we see a drop of just six points in the placebo group, but a drop of 19 points in the treatment group. Woo-hoo! If we make like a pharmaceutical marketer and express the difference in relative terms, our treatment was better than 200% more effective than the placebo!
But once again, there was no actual treatment effect whatsoever. The difference is entirely random. I didn’t have to throw out any outliers I don’t like or cherry-pick any data, either. I had Excel do all the random variations and group assignments for me to ensure that I couldn’t cherry-pick. All I did was choose a cutoff number for the study – total cholesterol of 230 or higher – and let randomness do the rest.
And yet, largely because of the small study size, I found an impressive difference between two of my groups – even after applying random variations to the final cholesterol measurement.
To repeat a quote from Chris Masterjohn:
And thus we see that many published research findings are false. Some of these false findings exist because we would inevitably expect by the laws of probability for a small handful of well conducted, thoroughly reported, and appropriately interpreted studies to uncover apparent truths that are really false simply by random chance. This emphasizes the need to look at the totality of the data. Some will be false because of regression to the mean. This emphasizes the need to critically evaluate the data in each study.
Makes you hope your doctor has an understanding of statistics and a desire to dig into the research before prescribing that wonder drug. But I wouldn’t bet on it.
If you enjoy my posts, please consider a small donation to the Fat Head Kids GoFundMe campaign.