The funny thing about statistics is that you can manipulate raw data to support whatever result you want. Simply apply enough mental gymnastics and excuses to be selective about which data you count, and the results inflate accordingly.
Take this study, for example, which has been much touted in the media. The study tries to prove that sexism is rampant in online gaming. (Yes, my use of "tries to prove" rather than "concluded" is not an accident. It's quite obvious that the authors had a specific goal in mind when they designed the experiment.)
The experiment was to play online Halo 3 matches with three players (a neutral "control" player that does not send any voice messages, a "male" player and a "female" player) and see how many sexist comments each of them receives.
Of course the number of sexist comments was extremely low. That won't do. So how can they manipulate the data to inflate that percentage?
Well, firstly, and like all good science, let's remove the control from all consideration. Of course. That throws out a big bunch of data that would have lowered the percentage. Who needs controls anyway? That's just some fancy sciency stuff; what do they know?
Even after this, only about 1% of the players made any kind of sexist comment. (163 8-player games, meaning 7 other players besides the test player, which means about 1141 test subjects in total. 11 of them made any kind of sexist comment. That's a bit less than 1%.)
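For anyone who wants to check that back-of-the-envelope figure, here is a minimal sketch of the arithmetic. It assumes, as I did above, that every one of the 163 games was a full 8-player match and that each of the other 7 players counts as a separate test subject; the function name is just something I made up for illustration.

```python
def sexist_share(games, players_per_game, commenters):
    """Return (number of other players, fraction of them who made a sexist comment).

    Assumes every game is full and every other player is a distinct subject.
    """
    others = games * (players_per_game - 1)  # exclude the test player
    return others, commenters / others


# Figures quoted in the text: 163 games, 8 players each, 11 sexist commenters.
total, share = sexist_share(163, 8, 11)
print(total, f"{share:.2%}")  # prints: 1141 0.96%
```

If some of those players showed up in more than one game, the number of distinct subjects would be lower and the percentage somewhat higher, so treat this as a rough estimate rather than an exact figure.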
1% is still way too low. How to inflate this further? Well, let's just consider comments made by the players on the same team as the test player, and ignore the opposing players. And while we are at it, let's just remove the "male" test player from consideration as well. This way we can make the 11 sexist comments amount to 13% of all comments.
Then let's conclude that, as the study says, "sexism is rampant in online environments."
All of that from 11 sexist comments out of over a thousand players (and that was with the control samples removed from consideration).
Yeah, because 11 is such a statistically significant number.