How a Bad Survey Checklist Question Can Confuse Findings

Checklist questions are one of the more common survey question types, and they are also used heavily in data collection forms, which are a form of survey. And for good reason. You can get specific, actionable answers to a question – if the question is written correctly. A poorly designed checklist question can hide problems and confuse interpretation.

Ambiguous Questions: The Biggest Mistake in Survey Question Writing

Surveys are conducted to learn how some group feels. If the survey questions are flawed, then we don’t learn and may be misled. Ambiguous questions — questions whose phrasing leads to multiple interpretations — are the single biggest mistake made by survey designers. And perhaps a fatal one.

Survey Question Type Choice: More Than One Way to Skin That Importance Cat

Various survey question types can be used to measure something. The choices have trade-offs between analytical usefulness of the data and respondent burden.

The Collision of Biases: Some Things Are Just Hard To Measure in Polls & Surveys

A confluence of survey biases – response, interviewer & instrumentation – likely overwhelmed what the NY Times’ surveyors think they measured about people’s feelings about having a female presidential candidate.

The Anxious Survey Response Scale Returns

I’m anxious about your reaction to this article.

Unclear what I mean by that? That’s exactly the point. When designing survey questions and response scales for interval rating questions, it is critical to have “clarity of meaning” and “lack of ambiguity.” Without those you won’t be capturing valid, useful data, data that don’t suffer from instrumentation bias. “Anxious” is an anchor that has multiple meanings and thus should not be used in political surveys. Yet it is.

The Importance of Good Survey Question Wording: Even Pros Make Mistakes

Proper survey question wording is essential to generate valid, meaningful data for organizational decisions. A survey question wording bias will lead to misleading interpretations and bad decisions. Here we examine a Pew survey on use of mobile devices in public settings.

Good question phrasing is an art form, and even the pros can make mistakes. Here we’ll show a question wording bias example from a survey done by Pew Research Center. Ambiguity in question wording likely led to incorrect data and conclusions. It provides some useful lessons for all survey designers.

Survey Question Types: When to Use Each

Each of the five survey question types has potential value in a survey research project. This article presents brief definitions of each question type, the analysis you can do, and some key concerns with each question type.

Impact of Mobile Surveys — Tips for Best Practice

Summary: Mobile survey taking has shot up recently. When a survey invitation is sent via email with a link to a webform, up to a third of respondents take the survey on a smartphone or equivalent.  Survey designers must consider this when designing their surveys since some questions will not display properly on a mobile device.  This article presents the issue and some tips for good practice.

~ ~ ~

If you do any kind of design work that involves information displays on webpages, you know the challenges of designing when you don’t know the screen size that will be used. It makes you long for the days of green screens and VT 100s when the mouse was something that your cat killed and a trackpad was somewhere you ran.

Screen sizes and resolution always seem to be in flux as we’ve moved from desktops to laptops, netbooks, smartphones, tablets, and phablets.

As part of rewriting my Survey Guidebook, I have been talking with survey software vendors, and the dramatic rise of survey taking via smartphones is a big issue. Roughly one quarter to one third of survey submissions are coming via smartphones. Ouch!

[Image: survey-monkey]

The issue is: how does your survey render on a smartphone? Website designers have tackled this with responsive website design, and the same issues apply for surveyors. But the issue is perhaps more critical for surveyors.  While the webforms might be responsive to the size of the screen on which the survey will be displayed, the fact is some survey questions simply will not display well on a small screen.
For example I love — or loved — to set up my questions in what’s called a “grid” format (sometimes called table or matrix format). It’s very space efficient and reduces respondent burden.  However that question layout does not work well, if at all, on a phone screen, even a phablet.

Look at the nearby screen shot from a survey I took.  Notice how the anchors (the written descriptions) for the four response options crowd together. Essentially, any question where the response options are presented horizontally may not display well.

The webform rendering may have implications for the validity of the data collected.  If one person takes a survey with the questionnaire rendered for a 15 inch laptop screen while another person takes the “same” survey rendered for a 5-inch smartphone screen, are they taking the same survey?

We know that survey administration mode affects responses. Telephone surveys tend to have more scores towards the extremes of response scales. Will surveys taken by smartphones have some kind of response effects?  I am not aware of any research analyzing this, but I will be on the lookout for it or conduct that research myself.

So what are the implications for questionnaire design from smartphone survey taking?

First, we have to rethink the question design that we use. We may have to present questions that display the response options vertically as opposed to horizontally.  This is a major impact.  If you are going to use a horizontal display for an interval rating question, then you should use endpoint anchoring as opposed to having verbal descriptors for each scale point.  Endpoint anchoring may allow display of the response scale without cramped words. But make sure you have constant spacing or you’re compromising the scale’s interval properties!

Second, we have to simplify our questionnaires. We need to make them shorter. A survey that may be perfectly reasonable to take from a time perspective on the laptop with a table display will almost certainly feel longer to complete on the phone because of the amount of scrolling required.  While smartphone users are used to scrolling, there must be a limit to people’s tolerance.  A 30-question survey on a laptop might take 3 to 6 screens but take 30 screens’ worth of scrolling on a smartphone.  You might be tempted to put one question per screen to limit the scrolling.  However, the time to load each screen, even on a 4G network, may tax the respondent’s patience.

Third, beyond question design we should reconsider the question types we use, such as forced ranking and fixed sum.  Those are especially useful for trade-off analysis to judge what’s important to a respondent.  However, they would be very challenging to conduct on a small screen.  So, how do we conduct trade-off analysis on a smartphone?  I’m not sure.  Probably using a multiple choice question asking the respondent to choose the top two or three items.

Fourth, extraneous verbiage now becomes even more extraneous.  In my survey workshops I stress removing words that don’t add value. With smartphone rendering, doing so becomes absolutely essential. Introductions or instructions that would cover an entire screen on the laptop would simply drive away a smartphone respondent. Even the questions and the response options should be as brief as possible.  The downside is that we may be so brief as to not be clear, introducing ambiguity.

Fifth, the data should be analyzed first by the device on which the survey was taken. Are the scores the same? (There are statistical procedures for answering that question.) If not, the difference could be due to response effects caused by the device or a difference in the type of people who use each device.  Young people who have grown up with personal electronic devices are more likely to take surveys on a mobile device.  So are differences in scores between devices a function of respondents’ age or a function of the device and how it displays the survey (response effects)?  Without some truly scientific research, we won’t know the answer.

Not analyzing the data separately assumes the smartphone survey mode has no response effects. That could be a bad assumption. We made that same assumption about telephone surveys, and we now know that is wrong. Could organizations be making incorrect decisions based upon data collected via smartphones? We don’t know, but it’s a good question to ask.
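As a rough illustration of one such statistical procedure, the sketch below runs a two-sided permutation test on the difference in mean scores between two devices. Everything in it is an assumption for illustration: the function name, the device groupings, and the ratings are invented, and a permutation test is only one of several procedures that could be used here.

```python
import random
from statistics import mean

def permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in mean scores.

    Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(scores_a) - mean(scores_b)
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    extreme = 0
    for _ in range(n_resamples):
        # Shuffle the device labels and recompute the difference in means.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_resamples

# Hypothetical 1-to-5 ratings, invented for illustration only.
laptop_scores = [4, 5, 3, 4, 4, 5, 3, 4, 4, 5]
phone_scores = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]

diff, p_value = permutation_test(laptop_scores, phone_scores)
print(f"Mean difference: {diff:.2f}, p-value: {p_value:.3f}")
```

A small p-value would only say that the scores differ by device. It would not say whether the device or the kind of person using it is responsible, which is the age-versus-response-effect question raised above.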

In summary, the smartphone medium of interaction is a less rich visual communication medium than a laptop, just as telephone interviews are less rich since they lack the visual presentation of information. If we’re allowing surveys to be taken on smartphones, we must write our surveys so that they “work” on mobile devices — basically the lowest common denominator.  Ideally, the survey tool vendors will develop ways to better render a survey webform for smartphone users, but there are clearly limits and the above suggestions should be heeded.

Misleading (or Lying) With Survey Statistics

“What’s the objective of your survey program?” is the first issue I suggest people should consider when they create a survey program.  In fact, developing a Statement of Research Objectives is the first exercise in my Survey Design Workshops. The project step seems innocuous; the goal would be to capture information from customers, employees or some other stakeholder to create or improve products or services. Or is it? Is there some other goal or agenda — stated or unstated — that trumps that logical goal?

That real agenda may manifest itself in the survey questionnaire design. Disraeli said, “There are three types of lies: lies, damn lies, and statistics,” and survey questionnaire design affords us the opportunity to lie through the survey statistics it generates, or to be unwittingly misled as a result of decisions made during the survey questionnaire design process.

Let’s look at an example. Below is an image of the Ritz Carlton customer satisfaction survey, an event or transactional feedback survey, that you may have received if you stayed with them in the early part of this century. The survey was professionally developed for them. (The survey shown has been abridged; that is, some questions have been omitted. Also, the formatting is very close to, but not exactly the same as, the original survey. I took some minor liberties in translating from a paper form to a web page for readability’s sake.)

[Image: ritz-carlton-survey]

Ritz Carlton is the gold standard of customer service, and they are well known for their efforts to identify and correct any customer problems, though I had been personally disappointed in their service recovery efforts. One key purpose of transaction-driven surveys — surveys conducted after the conclusion of some event — is to identify specific customers in need of a service recovery act. A second purpose is to find aspects of the operation in need of improvement. Consider how well this survey serves those purposes. In a follow-up article, we’ll consider other flaws in the wording of the survey questions.

First, let’s look at the survey as a complaint identifier. Putting aside the issues of the scale design, the questions at the end of the survey capture how happy you were with the service recovery attempt. But what if you had a problem and you did NOT report it? Sure, you could use the Comments field, but no explicit interest is shown in your unreported problem. No suggestion is made that you should state the nature of the unreported problem so they could make amends in some way.

Next, let’s examine how well the survey instrument design captures those service attributes that had room for improvement. Notice the scale design that Ritz uses. The anchor for the highest point is Very Satisfied and the next highest point is Somewhat Satisfied, with mirror image on the lower end of the scale.

Consider your last stay at any hotel — or the Ritz if your budget has made you so inclined. Were your minimal expectations met for front-desk check-in, room cleanliness, etc.? My guess is that your expectations were probably met or close to met, unless you had one of those disastrous experiences. If your expectations were just met, then you would probably consider yourself “just satisfied.”

So, what point on the scale would you check? You were more than Somewhat Satisfied — after all, they did nothing wrong — therefore it’s very possible you’d choose the highest response option, Very Satisfied, despite the fact that you were really only  satisfied, an option not on the scale. To summarize, if your expectations were essentially just met, the choices described by the anchors may well lead you to use the highest response option.

In conference keynotes I draw a two-sided arrow with a midpoint mark and ask people where on that spectrum they would place their feelings if their expectations were just met for some product or service. Almost universally, they place themselves in the center at the midpoint marker or just barely above it.  In other words, most people view satisfaction — the point where expectations were just met — as a midpoint or neutral position. This is a positive position, but not an extremely positive position.

What if your expectations were greatly exceeded from an absolutely wonderful experience along one or more of these service attributes? What response option would you choose? Very Satisfied is the only real option.  So consider this: customers who were just satisfied would likely choose the same option as those who were ecstatic. (By the way, the arguments described here for the high end of the scale apply equally to the low end of the scale.)

Here’s the issue: this scale lacks good dispersal properties. Put in Six Sigma and Pareto Analysis terminology, it does NOT “separate the critical few from the trivial many.” (For you engineering types, the signal-to-noise ratio is very low.) The scale design, either intentionally or unwittingly, drives respondents to the highest response option. Further, it’s a truncated scale since those with extreme feelings are lumped in with those with only moderately strong feelings. We really learn very little about the range of feelings that customers have.

Is Ritz Carlton the only company who uses this practice? Of course not. After a recent purchase at Staples, I took their transactional survey. Its error was even more extreme. The anchors were Extremely Satisfied, then Somewhat Satisfied. I was satisfied with the help I got from the store associates; they met my expectations. Which option should I choose? I was not ecstatic, but I was pleased.

A well-designed scale differentiates respondents along the spectrum of feelings described in the anchors. Learning how to choose good anchors is vitally important to get actionable data. How actionable do you expect the data from these questions would be? I’ve never seen their data, but I would bet they get 90% to 95% of scores in the top two response options, so-called “Top Box scores.” I would not be surprised if they got 98% top box scores. If I were the Ritz, I would interpret any score other than a Very Satisfied to be a call to action. (Maybe they do that.) I would also bet that any hotel, even a Motel 6 or Red Roof Inn, would also get 90%+ top box scores using this scale. The dispersal properties of this scale are just that poor.
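To make the top box idea concrete, here is a minimal sketch of how such a score could be computed. The function name and the response counts are hypothetical, invented purely for illustration; they are not data from Ritz Carlton or any other company.

```python
from collections import Counter

def top_box_share(responses, top_options):
    """Fraction of responses that fall in the given top response options."""
    counts = Counter(responses)
    return sum(counts[option] for option in top_options) / len(responses)

# Hypothetical responses on the 4-point scale discussed above (counts invented).
responses = (
    ["Very Satisfied"] * 70
    + ["Somewhat Satisfied"] * 24
    + ["Somewhat Dissatisfied"] * 4
    + ["Very Dissatisfied"] * 2
)

top_two = top_box_share(responses, ["Very Satisfied", "Somewhat Satisfied"])
top_one = top_box_share(responses, ["Very Satisfied"])
print(f"Top two options: {top_two:.0%}, highest option only: {top_one:.0%}")
```

With nearly everyone in the top two options, the score says little about which customers were merely satisfied and which were delighted, which is the dispersal problem described above.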

A simple change would improve it. Make the highest option Extremely Satisfied and the next highest Satisfied. Or use a different scale. Have the midpoint be Expectations Just Met, which is still a positive statement, and the highest point be Greatly Exceeds Expectations. I have used that scale and found a dispersion of results that lends itself to actionable interpretation.

If you’re a cynic, then you might be asking what “damn lie” is really behind this questionnaire scale design. Here’s a theory: Public Relations or Inside Relations. Perhaps the “other goal” of the survey was to develop a set of statistics to show how much customers love Ritz Carlton. Or perhaps the goal is for one level of management to get kudos from senior management.

This questionnaire scale design issue is one reason why comparative benchmarking efforts within an industry are so fundamentally flawed. You may be familiar with these benchmarking databases that collect data from companies and then share the average results with all who participate. Customer satisfaction survey scores are typically one of the data points, and they are self-reported; that is, the data submitted are not audited. Yet, if some companies use scale designs as shown here, how valid is the benchmark if you use a scale that truly differentiates? Your 50% top box score may reflect higher levels of customer satisfaction than the other company’s 90% top box score. Self-reported data for comparative benchmarking databases where there’s no standard practice for the data collected is suspect, to say the least.

The Disraeli quote at the opening is also attributed to Mark Twain. (Isn’t every good quote from Twain, Disraeli, or Churchill?) If Twain had lived in the days of customer satisfaction surveys, he would have augmented Disraeli’s quote thusly: “There are four types of lies: lies, damn lies, statistics, and survey statistics.”

Survey Question Design: Headlines or Meaningful Information?

Summary: Surveys can be designed to generate meaningful information or to manufacture a compelling “headline”. This article examines the forced-choice, binary-option survey question format and shows how an improper design can potentially misrepresent respondents’ views.

~ ~ ~

The Wall Street Journal and NBC News sponsor periodic telephone surveys conducted by a collaboration of professional survey organizations. The poll results released on August 6, 2014 included a central question on the “Views of America” regarding individual opportunity. I say “central” since only a few of the 29 survey questions were reported in the paper. Here’s what they reported regarding one question:

A majority of those polled agreed with the statement that growing income inequality between the wealthy and everyone else “is undermining the idea that every American has the opportunity to move up to a better standard of living.”

That’s a pretty startling finding, and this is a bit of a dangerous article to write since it uses a public policy survey as the example. My purpose here is not to argue a point of view, but to show the impact of good question design — and bad. I’ll show how a mistake in survey question design can distort what your respondents actually feel. Or, to approach it from the opposite angle, the article shows how a survey can be used to generate a false headline with seemingly valid data.

~ ~ ~

Let me walk you through the progression of the survey so that you can properly experience this particular survey question. Question 14 used a 5-point scale ranging from Very Satisfied to Very Dissatisfied and opened with “I’d like to get your opinion about how things are going in some areas of our society today,” specifically:

  • “State of the US economy” — 64% were either Somewhat or Very Dissatisfied
  • “America’s role in the world” — 62% were either Somewhat or Very Dissatisfied
  • “The political system” — 79% were either Somewhat or Very Dissatisfied

The next two questions were each asked of half the respondents. “And, thinking about some other issues…”

Q15 Do you think America is in a state of decline, or do you feel that this is not the case?

— 60% believe the US is in a state of decline

Q16 Do you feel confident or not confident that life for our children’s generation will be better than it has been for us?

 — Only 21% “feel confident,” significantly lower than the trend line shown in the report

The next question presents two statements about America with the statements rotated as to which one is presented first to the respondents.

Q17 Which of the following statements about America comes closer to your view? (ROTATE)

Statement A: The United States is a country where anyone, regardless of their background, can work hard, succeed and be comfortable financially.

Let me pause here and ask you to think about that statement. It’s a motherhood-and-apple-pie statement about America. If you disagree with any of that statement, you are likely predisposed to agree with the subsequent statement. And note that the previous questions have elicited quite strong negative opinions. I’ll come back to that effect.

To continue with Question 17…

…or…

Statement B: The widening gap between the incomes of the wealthy and everyone else is undermining the idea that every American has the opportunity to move up to a better standard of living.

Results from Question 17 on Views of America. 54% agree that “Widening gap undermining opportunity”.  (That’s how the pollsters described it in their PDF summary.) 44% agree that “Anyone can succeed”. 1% of respondents volunteered that both statements were true, and 1% volunteered that neither was true. Those options were not offered overtly. For those readers with good addition skills, 2% of the respondents are unaccounted for. Perhaps that’s rounding, but the pollsters don’t say.

Sequence Effects upon Question 17. Unfortunately, the pollsters do not report the splits depending upon which statement is presented first. That split could be very enlightening about the sequence effect in that question’s design. That is, are people more likely to choose Statement B if it’s asked first or second?

We also have a sequence effect in play from the previous questions. Would the results be different if Question 17 had been asked before Question 14? I suspect so since the previous three questions put respondents into a response mode to answer negatively. Rotating the order of the questions might have made sense here if the goal is to get responses that truly reflect the views of the respondent. They also don’t report the splits for Question 17 based upon whether Question 15 or 16 was asked of the respondent immediately prior.
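If the respondent-level data were available, the split by rotation order would be simple to tabulate. The sketch below is illustrative only: the function name, the record layout, and the counts are all hypothetical, since the pollsters did not publish these splits.

```python
from collections import defaultdict

def choice_share_by_group(records, choice):
    """Share of respondents in each group who picked `choice`.

    Each record is a (group, answer) pair.
    """
    totals = defaultdict(int)
    picked = defaultdict(int)
    for group, answer in records:
        totals[group] += 1
        if answer == choice:
            picked[group] += 1
    return {group: picked[group] / totals[group] for group in totals}

# Hypothetical records: (which statement was read first, statement chosen).
records = (
    [("A first", "B")] * 26 + [("A first", "A")] * 24
    + [("B first", "B")] * 29 + [("B first", "A")] * 21
)

splits = choice_share_by_group(records, "B")
for group, share in splits.items():
    print(f"{group}: {share:.0%} chose Statement B")
```

The same tabulation could be grouped by whether Question 15 or Question 16 was asked immediately before Question 17.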

[Image: wsj-nbc-news-poll]

The Design of Survey Question 17. This question is an example of a binary or dichotomous choice question.  You present contrasting statements to the respondents and ask which one best matches their views. This format is also called a forced choice question.

The power of the binary choice question lies in, well, the forcing of choice A or B. The surveyor is virtually guaranteed a majority viewpoint for one of the two choices!! Pretty slick, eh? Look again at what the Wall Street Journal reported:

A majority of those polled agreed with the statement that growing income inequality between the wealthy and everyone else “is undermining the idea that every American has the opportunity to move up to a better standard of living.”

What an impressive finding! Did they tell the reader that respondents were given only two choices? No. Not presenting the findings in the context of the question structure is at best sloppy reporting, at worst disingenuous, distortive reporting. In a moment we’ll consider other ways to measure opportunity in America in a survey, but first let’s look at the improper design of this binary question.

Proper Design of Binary Choice Questions. When using a binary choice question, the two options should be polar opposites of the phenomenon being measured. This is a critically important design consideration. In this question, the phenomenon is, supposedly, the state of opportunity in America today.

But note the difference in construction of the two statements.

  • Statement A says the American Dream is alive and vibrant.
  • Statement B says the American Dream has been “undermined” and presents a cause for the decline.

So, if you feel that opportunity has been lessened, you’re likely to choose Statement B even if you don’t feel income inequality is the cause. In the end you’re agreeing not only that opportunity has been reduced, but also to the cause of it.

The astute reader may say the question doesn’t directly assert a cause-and-effect relationship between income inequality and personal opportunity. It says “the idea” of equal opportunity has been “undermined.” This question wording is purposeful. It softens the assertion of a cause-and-effect relationship, making it easier to agree with the statement. Will those reading the findings, including the media, see that nuance? No. The fine distinction in the actual question will get lost in the headline. Just look at the shorthand description that the pollsters used: “Widening gap undermining opportunity”.

Alternative Survey Question Designs to Research Opportunity in America. The issue could have been researched differently. The pollsters could have posed each statement on a 1-to-5 or 1-to-10 Agreement scale and then compared the average scores. Those findings arguably would have been more meaningful since respondents wouldn’t be forced to choose just one. But would the findings have had the same pizzazz as saying “A majority of those polled…”?
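A minimal sketch of that agreement-scale alternative follows. The ratings are hypothetical and invented for illustration, as is the cutoff of 4 for counting a response as "agree". The point is that rating each statement separately lets us see respondents who agree with both, something a forced choice cannot reveal.

```python
from statistics import mean

# Hypothetical (Statement A rating, Statement B rating) pairs on a
# 1-to-5 Agreement scale, where 5 = Strongly Agree. Invented data.
ratings = [(4, 4), (5, 2), (2, 5), (4, 5), (3, 4), (5, 4), (2, 4), (4, 3)]

mean_a = mean([a for a, _ in ratings])
mean_b = mean([b for _, b in ratings])

# Share of respondents who agree (4 or 5) with BOTH statements.
agree_both = sum(1 for a, b in ratings if a >= 4 and b >= 4) / len(ratings)

print(f"Mean agreement, Statement A: {mean_a:.3f}")
print(f"Mean agreement, Statement B: {mean_b:.3f}")
print(f"Agree with both statements: {agree_both:.1%}")
```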

Another design choice would be to present more than two options from which to choose. What if the statement options had been:

  • Statement A: The United States is a country where anyone, regardless of their background, can work hard, succeed and be comfortable financially.
  • Statement B: The widening gap between the incomes of the wealthy and everyone else is undermining the idea that every American has the opportunity to move up to a better standard of living.
  • Statement C: The country’s continued economic weakness is denying opportunity for a better standard of living even to those who apply themselves and work hard.

Without much trouble, we could develop even more statements that present equally valid and divergent views of opportunity in America. However, presenting more choices does cause a problem in survey administration.  For telephone surveys we are asking a lot, cognitively, of our respondents. They must listen to, memorize, and select from multiple statements. Each additional option increases the respondent burden, but could the telephone respondents have handled three choices? Probably, if the choices truly represent very different opinions as just presented.

A key advantage of paper or webform surveys is that the visual presentation of the questions allows for multiple options to be presented to the respondent without undue burden and with less likelihood of a primacy or recency effect — the tendency to choose the first or last option.

Something else happens when we present more than 2 choices. We’re much less likely to get a compelling headline that “A majority of those polled agreed…”

The Importance of Clear Research Objectives. Any survey project should start with a clear understanding of its research objectives. For this particular question the research objectives could be:

  • To see if people feel opportunity in America is weakening.
  • To identify the cause of weakening opportunity.

For this survey, that question’s research objective is really both — and neither.

Income inequality may be a legitimate topic for discussion, but is it the primary cause of the loss of opportunity to the extent that no other possible cause would be offered to respondents? I think you’d have to be a dyed-in-the-wool Occupy-Wall-Streeter to view income inequality as THE cause of any loss of individual opportunity. In fact, the Wall Street Journal’s liberal columnist, William Galston, discussed “secular stagnation” in an article on August 26, 2014, never mentioning income inequality as a cause of our sideways economy.

How much different — and more useful — would the poll findings have been if the question had been presented this way?

Q17 Which of the following statements about America comes closer to your view? (ROTATE)

Statement A: The United States is a country where anyone, regardless of their background, can work hard, succeed and be comfortable financially.

…or…

Statement B: The opportunity for every American, regardless of their background, to move up to a better standard of living through their own hard work has weakened over the years.

(If respondent chooses Statement B. Check appropriate statement from list below.)
What do you see as the two primary reasons for the drop in individual opportunity?

Income inequality
Weak economy means fewer jobs available
Weak economy means fewer career advancement opportunities
Increase in part-time jobs
Poor educational system
Good jobs are outsourced overseas
Government regulations deter business growth and thus job growth
Can’t get business loans to start or expand a business
Opportunity goes to those with the right connections
Discrimination in career advancement whether racial, gender…
Etc.

(Note that in a telephone survey, we generally do not read a checklist to the respondent. Instead, we pose an open-ended question, and the interviewer then checks the voiced items. Reading a comprehensive list would be tiresome. A webform survey could present a checklist, but the list must be comprehensive, balanced, and unbiased.)
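The proposed two-part structure could be tallied along the following lines. This is a sketch under stated assumptions: the function name, the record layout, the shortened reason labels, and the responses are all hypothetical. The follow-up is counted only for respondents who chose Statement B, and at most two reasons are counted per respondent.

```python
from collections import Counter

def tally_reasons(responses, max_reasons=2):
    """Count reasons cited by respondents who chose Statement B.

    Each response is a (chosen_statement, list_of_reasons) pair.
    Returns (number_of_B_choosers, Counter of reasons).
    """
    counts = Counter()
    n_b = 0
    for statement, reasons in responses:
        if statement != "B":
            continue  # skip logic: the follow-up is asked only of B choosers
        n_b += 1
        counts.update(set(reasons[:max_reasons]))
    return n_b, counts

# Hypothetical responses, invented for illustration only.
responses = [
    ("A", []),
    ("B", ["Weak economy: fewer jobs", "Poor educational system"]),
    ("B", ["Income inequality", "Good jobs outsourced"]),
    ("B", ["Poor educational system", "Good jobs outsourced"]),
    ("A", []),
    ("B", ["Weak economy: fewer jobs", "Good jobs outsourced"]),
]

n_b, counts = tally_reasons(responses)
for reason, count in counts.most_common():
    print(f"{reason}: {count / n_b:.0%} of Statement B choosers")
```

Because each respondent may name two reasons, the percentages can sum to more than 100%.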

Do you think a majority, or even a plurality, of respondents who chose Statement B would say income inequality was a primary cause? I very much doubt it.

Wouldn’t this proposed question structure present a better picture of what Americans see as limiting opportunity, more fully answering the research objectives listed above?

But would the headline be as compelling? Pardon my cynicism, but the headline is probably the real research objective behind the question design.

~ ~ ~

Flawed Question Design in Organizational Surveys. Now you know why the forced choice, binary format is liked by those who want to generate data to argue a point of view. Is this restricted to political polls? Of course not. In an organizational survey measuring views of customers, employees, members or suppliers, we could accidentally — or purposely — word the choices to direct the respondent to one choice.

For example, imagine this binary-choice question for an internal survey about a company’s degree of customer focus.

Statement A: All our employees treat our customers in a manner to create and foster greater loyalty.

Statement B: Recent cuts in staff mean our customers are no longer treated by our employees in a way that will increase their loyalty.

Or a survey on member preferences for some group:

Statement A: The association’s programs provide real value to our organization.

Statement B: The content offered in the association’s programs doesn’t meet our requirements.

Notice how the question structure here parallels the question in the poll. A novice in survey design could stumble into the error in the above questions’ design. Or the person could want to “prove” that staff cuts are endangering customer loyalty or that content is the problem for members’ disaffection.

Any survey designer worth his salt can prove any point you want through a “professionally” designed and executed survey. And someone designing surveys for the first time needs to be aware of the importance of proper survey question design to generating meaningful, useful data.