A good, though sad, example of biased sampling came to light regarding the Boston University study of Chronic Traumatic Encephalopathy (CTE) in professional football players’ brains. (That’s American football for my non-US friends. You know… football that’s actually interesting to watch. :-) ) That study again hit the headlines when the analysis of the late Aaron Hernandez’s brain showed extreme CTE. (Hernandez was a tight end for the New England Patriots. He was convicted of one murder, acquitted of another, but then committed suicide.)
The study population in the most recent CTE paper represents a biased sample, as stated by the authors themselves. This means only the brains of self-selecting people who displayed neurological symptoms while living were studied. This is important because this sample was not a reflection of the general football population. The study was based on 202 brains out of the millions of people who’ve played football – 111 of which are former NFL players.
He then points out how the results of the study could be grossly misinterpreted:
So, when you hear “99 percent of football players had CTE,” that doesn’t mean that almost every football player will get CTE, and it doesn’t mean your child has a 99-percent chance of developing CTE if he or she plays football. It means 99 percent of a specifically selected study sample had some degree of CTE; not 99 percent of the general football population. This is an important distinction. (emphasis added)
The CTE study’s results are not meaningless. They just must be presented in the proper context. And unfortunately, reporters are unlikely to be schooled in how to perform analyses and present results.
Let’s put the lesson simply into the surveyor’s world: the results from a survey using a biased sample cannot legitimately be projected to the population as a whole.
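To make that lesson concrete, here is a minimal Python sketch. Everything in it is invented for illustration: the population size, the 10% rate, and the assumption that people with the trait are 100 times more likely to end up in the self-selected sample. None of these numbers are estimates of CTE prevalence.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100,000 people, roughly 10% of whom have some trait.
# Both numbers are made up for illustration; they are not CTE estimates.
POPULATION_SIZE = 100_000
TRUE_RATE = 0.10
population = [random.random() < TRUE_RATE for _ in range(POPULATION_SIZE)]


def share_with_trait(sample):
    """Return the fraction of a sample that has the trait."""
    return sum(sample) / len(sample)


# Random sample: every member has the same chance of being selected.
random_sample = random.sample(population, 200)

# Self-selected sample: people with the trait are ASSUMED to be 100 times
# more likely to be included. (random.choices draws with replacement,
# which is harmless for a sample this small.)
weights = [100 if has_trait else 1 for has_trait in population]
biased_sample = random.choices(population, weights=weights, k=200)

print(f"Population rate:    {share_with_trait(population):.1%}")
print(f"Random sample rate: {share_with_trait(random_sample):.1%}")
print(f"Biased sample rate: {share_with_trait(biased_sample):.1%}")
```

Under these assumptions the random sample lands near the population rate, while the self-selected sample reports a rate many times higher. The biased number is a true statement about that sample; it just says little about the population.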
Avoiding Biased Survey Samples
To avoid biased survey samples, ask yourself these two questions about the sampling process:
How was the invitation sample generated? It should be a random selection from the overall population.
(Four distinct approaches to probability sampling exist – simple random, stratified random, systematic, and cluster – but let’s just keep it simple and think about random samples.)
Is there something unique about who chose to respond? Non-response bias leads to a non-representative response sample.
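For the first question, generating the invitation sample, here is a rough sketch of two of the probability sampling approaches named above: simple random and systematic. The function names and the customer list are hypothetical, and a real survey tool may do this for you.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each has an equal chance."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member of the list."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n   # sampling interval
    start = rng.randrange(k)   # random start within the first interval
    return [population[start + i * k] for i in range(n)]


# Hypothetical customer list; the IDs are invented for illustration.
customers = [f"customer_{i:04d}" for i in range(1, 1001)]

invited_random = simple_random_sample(customers, 50, seed=1)
invited_systematic = systematic_sample(customers, 50, seed=1)

print(invited_random[:5])
print(invited_systematic[:5])
```

One caution on the systematic version: it assumes the list has no repeating pattern that lines up with the interval. If the list is sorted in a way that cycles, shuffle it first or use the simple random version.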
Let’s give an example. We’ve all gone to ecommerce websites that ask us to take surveys. The request to take the survey may in fact be presented randomly to those who visit the site, but how about the second point above: is there something unique about those who choose to respond?
Personally, I generally do NOT take these surveys. Why? First, I find them annoying, to say the least. I’m on the site to do some shopping. Get outta my face! Second, for the surveys I have taken, mostly by Foresee, I find the questionnaires to be poorly conceived (that is, invalid) and uninteresting to me as a respondent.
As a surveyor, I’m clearly not a typical invitee to take these surveys, but do you think the person who does take the survey is typical of all those who are invited? Hmmm…
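Here is a small simulation of that second question. The invitations are truly random, but I assume, purely for illustration, that happier visitors are more likely to respond. The satisfaction scores and response probabilities are made up; in real life the pattern might run the other way or favor both extremes.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical site visitors, each with a satisfaction score from 1 to 5.
visitors = [rng.randint(1, 5) for _ in range(50_000)]

# Step 1: invitations are random, so the invitation sample is fine.
invited = rng.sample(visitors, 5_000)

# Step 2: ASSUME the chance of responding rises with satisfaction,
# from 2% for a score of 1 up to 30% for a score of 5.
RESPONSE_PROBABILITY = {1: 0.02, 2: 0.05, 3: 0.10, 4: 0.20, 5: 0.30}
responded = [s for s in invited if rng.random() < RESPONSE_PROBABILITY[s]]

print(f"Mean satisfaction, all visitors: {mean(visitors):.2f}")
print(f"Mean satisfaction, invited:      {mean(invited):.2f}")
print(f"Mean satisfaction, responded:    {mean(responded):.2f}")
print(f"Response rate: {len(responded) / len(invited):.1%}")
```

The invited group tracks the visitor population closely because it was drawn at random. The response sample drifts upward because of who chose to answer, even though nothing was wrong with the invitations.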
US General Services Administration Survey
The US General Services Administration (GSA) provides another example. Years ago, I got a “GSA Schedule” that supposedly would simplify selling my survey training classes to US Government employees. Part of the application process required me to agree to a survey of past customers. I had to provide 20 names – of my own choosing – who would be asked to take a survey.
Wanna guess whom I selected? Yup, I picked the workshop attendees who were least happy with my class. No, of course not!! Nor did I randomly sample my past attendees (but I could have). I cherry picked people whom I knew loved the class. Stupid I’m not.
How many survey programs get good results by cherry picking respondents? A lot!
The CTE study is an example of cherry picking, but with legitimate reasons for this convenience sample. Again, I’m not trying to make light of CTE, but using it as a teaching example of how results should be properly presented.
As a final insult to credible research, that GSA-contracted survey company wanted me to pay them an annual fee to put their logo and stamp of approval on my training product, based upon their completely improperly done survey research.
Why do you think they wanted me to cherry pick respondents? Gotta love the government.
Presenting the Survey Results to Avoid Misrepresentation
Let me close by returning to how the professor said the CTE results should be presented. How many times do we hear some poll quoted as saying “57% of Americans feel…” No, 57% of those who took the survey – however it was done – felt…
Similarly, we should present findings from a customer or employee survey not as “65% of customers indicated…” but rather as “65% of those responding to the survey indicated…” Somewhere in the report, the methodology section should tell the reader how the sampling was performed.
If we’ve avoided biased survey samples, then the difference between those two statements is minimized.
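If your reports are generated from a script, the respondent-based wording can be built in. This is only a sketch: the function name, its parameters, and the counts in the example are hypothetical.

```python
def describe_result(share, finding, respondents, invited, sampling_method):
    """Phrase a finding in terms of respondents and state the method used."""
    response_rate = respondents / invited
    return (
        f"{share:.0%} of those responding to the survey {finding} "
        f"({respondents} responses from {invited} invitations, "
        f"a {response_rate:.0%} response rate; {sampling_method})."
    )


# Example with invented numbers.
statement = describe_result(
    share=0.65,
    finding="indicated they would buy again",
    respondents=412,
    invited=2000,
    sampling_method="invitees chosen by simple random sample",
)
print(statement)
```

Putting the response count, the invitation count, and the sampling method next to the headline number lets readers judge for themselves how far the result can be projected.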