
What is "sampling bias" and why it's important to understand

10-15 years ago, when the complete takeover of American universities by radical far-leftist activists was in full force (which was just a culmination of literally decades of more under-the-hood work by leftist professors and activists working at universities), one of their favorite talking points was that "one in five" women in universities experience sexual abuse of some kind. (And, unsurprisingly, this just got inflated more and more over the years as the takeover became more and more widespread, eventually becoming "one in four" and sometimes even "one in three", with literally no sources for any of these claims.)

In reality the rate of sexual abuse in American universities is actually lower than in the rest of the population, where it is more like one in 35, or something like that. (Of course this is still admittedly way, way too high, but nevertheless, it's nowhere even near that "one in five" number.)

The source of the "one in five" number was quickly revealed: It was just a completely unofficial random online poll made in some blog by some random nobody. There was literally zero academic rigor to this poll, and the author herself outright admitted it and in fact warned against using the result for anything serious, as it was absolutely unreliable. That still didn't stop the activists from repeating the factoid again and again over the years (even inflating the numbers over time, based on absolutely nothing).

The poll had several major and rather blatant problems. It had no safeguards of any kind against abuse (e.g. stopping the same person from voting multiple times, or stopping trolls from entering false answers), and the number of respondents was way too small to have any sort of reliability (I think it was in the low hundreds). And, most importantly, it likely suffered from sampling bias.

What is "sampling bias"?

It's when the set of things being sampled, for example people, does not reflect the population it is supposed to represent, being skewed in some unintended manner, and this isn't taken into account in the results. In other words, samples giving particular results are for one reason or another unintentionally overrepresented or underrepresented in the test.

In this particular case it's likely that, even assuming the number of trolls giving fake answers was minimal, women who had been abused were overrepresented among the participants. Why? Because such women are much more motivated to participate in such polls (as a way of having their experiences heard and making some kind of change). In other words, if word spreads that such a poll exists, women who have experienced abuse are much more likely to want to check it out than other women, which makes them overrepresented among the participants. Women who have never experienced any such thing may be less motivated to check it out and give answers.
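To get a feel for how strong this effect can be, here is a minimal sketch in Python. Every number in it is a made-up assumption chosen purely for illustration: a true rate of one in 35, a 1% chance that an unaffected person answers the poll, and an 8% chance that an affected person does. The function names are hypothetical too. Nothing here is data from any actual poll.

```python
import random


def expected_poll_rate(true_rate, p_respond_affected, p_respond_unaffected):
    """Share of respondents who are affected, given unequal response odds."""
    affected = true_rate * p_respond_affected
    unaffected = (1 - true_rate) * p_respond_unaffected
    return affected / (affected + unaffected)


def simulate_poll(population_size, true_rate,
                  p_respond_affected, p_respond_unaffected, seed=0):
    """Simulate a voluntary poll and return the rate seen among respondents."""
    rng = random.Random(seed)
    respondents = 0
    affected_respondents = 0
    for _ in range(population_size):
        affected = rng.random() < true_rate
        # Assumption: affected people are more motivated to take part.
        p_respond = p_respond_affected if affected else p_respond_unaffected
        if rng.random() < p_respond:
            respondents += 1
            if affected:
                affected_respondents += 1
    if respondents == 0:
        return float("nan")
    return affected_respondents / respondents


if __name__ == "__main__":
    true_rate = 1 / 35  # hypothetical true rate in the whole population
    print("true rate:      ", round(true_rate, 3))
    print("expected in poll:", round(expected_poll_rate(true_rate, 0.08, 0.01), 3))
    print("simulated poll:  ", round(simulate_poll(200_000, true_rate, 0.08, 0.01), 3))
```

With these assumed response probabilities the poll would be expected to report about 19%, ie. close to "one in five", even though the true rate in the simulated population is under 3% and nobody gave a single false answer. Different assumptions give different numbers, of course; the point is only that a difference in eagerness to participate is enough to distort the result heavily.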

It is, thus, extremely likely that the "one in five" figure was caused both by trolls and activists giving fake answers (which, as mentioned, was not controlled or restricted in any way) and by sampling bias among the genuine participants, who were more eager to participate because of their past experiences.

This is a problem that one should remember and not dismiss, even (and especially) with polls that confirm one's own views and convictions.

Yes, even if you are e.g. an anti-leftist activist and see a poll made by some other more popular anti-leftist activist on social media, you should still take the results with a grain of salt, as sampling bias can have a huge effect on the results of such polls. After all, people who follow anti-leftists on social media are extremely likely to agree with the author, so their responses to such polls will reflect that, and they might not reflect the population as a whole.

I sometimes see YouTube videos of exactly this, i.e. some YouTuber mentioning in a video that he made a poll asking about something, that he got tens of thousands of answers, what the results were, and how those results confirm whatever thing. While it's very tempting to believe those results, one should always remember the problem of sampling bias: It's likely that the vast majority of people who participated in the poll align politically and ideologically with the author, and thus the results will be heavily biased towards a particular side.
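The "tens of thousands of answers" part deserves a closer look, because a big number of answers makes a poll look trustworthy without fixing the bias. The following Python sketch assumes, purely hypothetically, that 90% of a channel's audience agrees with some statement while only 50% of the general population does. The function name and both percentages are invented for the example.

```python
import math
import random


def poll_audience(n_respondents, audience_agree_rate, seed=0):
    """Poll n people drawn only from the audience.

    Returns the observed share of agreement and the usual 95% margin of
    error, which measures random noise only and knows nothing about bias.
    """
    rng = random.Random(seed)
    agree = sum(rng.random() < audience_agree_rate for _ in range(n_respondents))
    share = agree / n_respondents
    margin = 1.96 * math.sqrt(share * (1 - share) / n_respondents)
    return share, margin


if __name__ == "__main__":
    population_agree_rate = 0.5  # hypothetical rate in the whole population
    audience_agree_rate = 0.9    # hypothetical rate among the followers
    for n in (100, 1_000, 50_000):
        share, margin = poll_audience(n, audience_agree_rate)
        print(f"n={n:>6}: poll says {share:.3f} +/- {margin:.3f}, "
              f"population is {population_agree_rate}")
```

As the number of answers grows, the margin of error shrinks, but the result settles on the audience's 90%, not on the population's 50%. More answers only make the poll more precise about the wrong group of people.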
