Setting Goals from Survey Data: The Conundrum of Statistics

How to report the results of a customer survey is a vital part of any surveying program. If the results aren’t communicated in a readily interpretable way, then the impact of the survey’s findings will be compromised. Since we’re dealing with statistics, this issue can be perplexing. Most of us loathed our college statistics course and blissfully forgot everything we had (only temporarily) learned once we walked out of the final exam. That’s too bad, since statistics is nothing more than a language for describing real-world phenomena, and survey results use that language.

A second and closely related factor also comes into play: customer surveys typically serve two masters. First, they should provide feedback on strong and weak points in your company’s interactions with customers. Second, they will probably also be used to measure the performance of individuals within the company. The danger is that using the survey for performance measurement can create strange incentives for those being measured, incentives that can undermine the feedback on operational business processes.

As an example of this, consider the following question I received from a colleague, a Vice President of Customer Care for a major software firm.

I was wondering if you could shed some light on a question that has come up in terms of customer satisfaction goals. We use a short email-based survey at the close of each customer support incident. It provides each participant five choices. Each choice is anchored with a descriptor ranging from Very Unsatisfied to Very Satisfied. We then apply a 5-point scale to the results with 1 corresponding to Highly Unsatisfied, etc. Is it better to strive for a certain average score (e.g. 4.2 or higher) or some designated percentage (e.g. 90% or more with an average score of 4 or higher)?

Here’s my response:

There’s no “right” answer to your question. Both goal-setting options are viable. Typically, though, I would suggest setting the goal as a percentage responding at or above some point on the frequency distribution, e.g., 90% giving ratings of 4 or higher, so-called “top box” reporting, simply because it’s more understandable to the “statistically challenged.” A goal of 4.2 or 3.7 or 4.0 isn’t as readily interpretable and doesn’t carry the same gravitas.

However, there is a downside. With the single percentage goal, it doesn’t matter how dissatisfied a customer is, because goal attainment isn’t affected whether someone rates service as a 1, 2 or 3 — yet I would hope you would care! You certainly wouldn’t want CSRs thinking, “Oh well, the customer’s mad at me and he’s not going to rate me a 4, so who cares…” (I cleaned up the language. 🙂 ) Perhaps you should have a paired goal: 90% with 4 or higher and less than 5% with a 1 or 2, for example.
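To make the paired goal concrete, here’s a minimal sketch in Python. The ratings, the 90%/5% thresholds, and the function name are illustrative assumptions drawn from the numbers in the email, not a prescribed standard.

```python
from collections import Counter

def meets_paired_goal(ratings, top_min=0.90, bottom_max=0.05):
    """Check a paired goal on 1-5 ratings: at least `top_min` of
    responses are 4 or 5, AND no more than `bottom_max` are 1 or 2."""
    counts = Counter(ratings)
    n = len(ratings)
    top_share = (counts[4] + counts[5]) / n
    bottom_share = (counts[1] + counts[2]) / n
    return top_share >= top_min and bottom_share <= bottom_max

# Hypothetical month of closed-incident survey responses
ratings = [5] * 70 + [4] * 21 + [3] * 5 + [2] * 2 + [1] * 2
print(meets_paired_goal(ratings))  # True: 91% top box, 4% bottom box
```

The point of pairing the two thresholds is that a CSR can no longer shrug off a 1 or 2 once the 4 is out of reach.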

Notice the issues in play: how to report statistical results in a way that everyone can understand, while not creating bizarre behavioral incentives when those results are also used as performance measures.

When I report results from a survey project in a management report, I try to find the Goldilocks happy medium: not so little information that the readers ask for more statistics, and not so much information that they can’t find what they want. For every scalar survey question I provide “descriptive statistics,” that is, the mean, mode, median, standard deviation, and confidence interval, but I also provide the frequency distribution and cumulative frequency distribution, that is, the percent of responses for each point on the scale and the cumulative percent of responses up to and including each point on the scale. This way each reader can get the information in the way she finds most useful. The challenge is not in calculation, but in formatting! Lessons from Dr. Edward Tufte (www.tufte.com) on “data visualization” come into play.
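For readers who want to see what goes into that table, here is a minimal sketch using Python’s standard statistics module. The ratings are invented, and the confidence interval uses a simple normal approximation; treat it as an illustration of the calculations, not my reporting template.

```python
import statistics
from collections import Counter

ratings = [5] * 40 + [4] * 30 + [3] * 15 + [2] * 10 + [1] * 5  # hypothetical 1-5 responses
n = len(ratings)

mean = statistics.mean(ratings)
median = statistics.median(ratings)
mode = statistics.mode(ratings)
stdev = statistics.stdev(ratings)
margin = 1.96 * stdev / n ** 0.5  # 95% confidence interval for the mean (normal approximation)

print(f"mean={mean:.2f}  median={median}  mode={mode}  stdev={stdev:.2f}  "
      f"95% CI=({mean - margin:.2f}, {mean + margin:.2f})")

# Frequency and cumulative frequency distributions, reported from the top of the scale down
counts = Counter(ratings)
cumulative = 0.0
for point in range(5, 0, -1):
    pct = counts[point] / n * 100
    cumulative += pct
    print(f"{point}: {pct:4.1f}%   cumulative (this point and above): {cumulative:5.1f}%")
```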

In the executive summary of the report, I always first present the results using the top box, cumulative frequency method described in the question posed to me above, that is, the percent responding with the top 2 (or 3) response options on the scale. For example, say the data showed “85% of respondents said they were very to extremely satisfied regarding….” Why this reporting approach? This number is most understandable to the greatest number of people, and it avoids having to explain the scale used, which, after all, is arbitrary with no inherent meaning. In the body of the report, especially in graphs showing results across a series of survey questions, I typically use the mean scores. Why? I don’t like to throw out data. Let me explain.
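As a quick illustration of how that summary number falls out of the cumulative frequency distribution, here is a short sketch; the helper name, the choice of 4 and 5 as the “top box,” and the ratings themselves are all assumptions for the example.

```python
from collections import Counter

def top_box_percent(ratings, top_points=(4, 5)):
    """Percent of responses falling in the chosen top scale points."""
    counts = Counter(ratings)
    return sum(counts[p] for p in top_points) / len(ratings) * 100

ratings = [5] * 45 + [4] * 40 + [3] * 8 + [2] * 4 + [1] * 3
print(f"{top_box_percent(ratings):.0f}% rated 4 or 5")  # 85% rated 4 or 5
```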

The danger in the top box method can be seen in the above email exchange: you lose visibility into the shape of the distribution. Isn’t it useful to know not just how many respondents gave the top two scores on the scale but also the breakdown between those two? Isn’t it also very important to know how many respondents gave very negative scores versus mid-level scores? Top box reporting obscures that. Of course, you could also report “bottom box” and “middle box” percentages, but notice how messy this gets.

The issue is that when you start collapsing the scales down into “boxes,” you’re throwing away data. That is, you’re throwing away the distinction between points on a scale within a box, and you’re missing the relative impact of those scores “outside the box” (I had to use that phrase somewhere in this article!) upon the “typical” response. The mean, or arithmetic average, incorporates all of those distinctions, properly weighted, into the one score of what statisticians call “central tendency.” That’s why, for survey data, it’s the best overall statistic of the typical or average score given by respondents. The conundrum is that the mean is not the most easily interpretable statistic. That’s why I report both.
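A small, entirely made-up comparison shows what gets thrown away: two question distributions with the identical top box percentage but noticeably different means.

```python
import statistics

# Two hypothetical rating distributions (counts of 1-5 responses).
# Both have 85% of responses at 4 or 5, so their top box scores are identical...
question_a = [5] * 70 + [4] * 15 + [3] * 10 + [2] * 3 + [1] * 2
question_b = [5] * 15 + [4] * 70 + [3] * 5 + [2] * 2 + [1] * 8

for name, ratings in (("A", question_a), ("B", question_b)):
    top_box = sum(r >= 4 for r in ratings) / len(ratings) * 100
    print(f"Question {name}: top box = {top_box:.0f}%, mean = {statistics.mean(ratings):.2f}")

# ...but the means differ (about 4.48 vs. 3.82), because the mean keeps the
# distinctions within the top box and the weight of the scores below it.
```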

Is the difference dramatic between top box scoring and mean scoring? Would it lead to different business decisions? Probably not — at a high level. In a recent client project, I produced a graph with the mean scores across a series of survey questions, putting them in rank order. The client noted that the same chart using top box scoring would have had a different rank order. The differences were not major, but they were there. Where the differences do matter, as noted, is in the possible perverse incentives when a survey is used for performance measurement and in highlighting customers potentially at risk.
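To see how such a rank order shift can happen, here is a sketch with fabricated distributions for three survey questions, ranked both ways; the question names and ratings are invented for illustration only.

```python
import statistics

# Hypothetical 1-5 rating distributions for three survey questions
questions = {
    "Responsiveness": [5] * 50 + [4] * 30 + [3] * 5 + [2] * 5 + [1] * 10,
    "Knowledge":      [5] * 35 + [4] * 50 + [3] * 15,
    "Courtesy":       [5] * 60 + [4] * 18 + [3] * 2 + [2] * 10 + [1] * 10,
}

def top_box(ratings):
    return sum(r >= 4 for r in ratings) / len(ratings) * 100

by_mean = sorted(questions, key=lambda q: statistics.mean(questions[q]), reverse=True)
by_top_box = sorted(questions, key=lambda q: top_box(questions[q]), reverse=True)

print("Ranked by mean:    ", by_mean)     # Knowledge, Courtesy, Responsiveness
print("Ranked by top box: ", by_top_box)  # Knowledge, Responsiveness, Courtesy
```

In this made-up case, Courtesy ranks higher on the mean because its many 5s outweigh its handful of 1s and 2s, while top box scoring treats a 5 and a 4 identically and so ranks Responsiveness ahead of it.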

The key lesson: know what you’re getting and know the consequences.