Net Promoter Score — Summary & Controversy

Summary: The Net Promoter Score® is widely adopted and wildly controversial. What exactly is NPS and what are the various areas of controversy for using this survey question as a customer insight metric? This article provides a summary of the NPS concept and the critical concerns.

~ ~ ~

In an earlier article, I discussed the need for having confidence that the customer insight metrics we use in our operation are valid indicators of underlying customer sentiment. Here I am specifically focusing on those summary metrics we capture in our feedback surveys. This article will focus its attention on the Net Promoter Score®.

The Net Promoter Score (NPS) has achieved a near other-worldly place in the arena of customer metrics. It is certainly the most hyped measurement seen in decades, yet the fundamental measurement has been used for decades. What is new is the claim that the metric predicts customer behavior, along with the particular statistical technique applied to the data. After examining these points, we’ll turn to issues with NPS as an indicator of customer sentiment.

What is NPS?

NPS is quite simply the self-reported measurement of a respondent’s likelihood to recommend a product or service to others. That’s it.

How likely is it that you would recommend [Company X] to a friend or colleague?

This is not a new concept. The recommendation question has been asked for decades in customer satisfaction research. I can recall hearing a keynote speaker at a conference 25 years ago present the big three of summary customer measurements: customer satisfaction, likelihood of future purchase, and likelihood of recommendation. But now the claim is that the response to this question provides unique insight into a company’s future profitability.

How Do We Know It’s Hyped?

Why do I say that NPS is hyped? Just look and listen to how NPS is used in business discussions. Companies aren’t doing customer satisfaction or customer relationship surveys; they’re now doing “NPS surveys.” Some companies actually have job titles that include “NPS.” Frankly, having your trademarked phrase become part of the business lexicon is every consultant’s dream. So, call me jealous.

While many companies have never heard of NPS, you will find companies that have drunk the Kool Aid lock, stock and barrel. In these companies, NPS has achieved a mystical aura and everyone tracks the company’s NPS on a near real-time basis. NPS measurements are pushed down in the company, even to an individual level, across all departments. It’s even used to measure service provided internally within a company. This can border on obsession. One company with which I’m familiar recognized the obsession was obstructing focus on core business practices and scaled back its use of NPS.

I have seen many company surveys where questions are posed predominantly on 5-, 6-, 7-, or 10-point scales, but when it comes to the NPS question, Reichheld’s 0-to-10, 11-point scale is somehow sacrosanct. Sadly, I know that part of the reason is to allow comparisons of the company’s NPS against published NPS data. Any professional surveyor will tell you that such comparisons are highly dubious, given the differences in survey instruments and administration methods. People seem to treat NPS scores as if they’re accounting data following standardized practices. They’re not.

The Background for the NPS Claims

The Net Promoter Score first came to prominence with an article by Fred Reichheld, a Bain consultant, in the December 2003 Harvard Business Review, “The One Number You Need to Grow.” The article opens with Reichheld recounting a conference talk by the CEO of Enterprise Rent-A-Car, Andy Taylor. He talked about his company’s “way to measure and manage customer loyalty without the complexity of traditional customer surveys.” Enterprise used a two-question survey instrument; the two questions were:

  • What was the quality of their rental experience?
  • Would they rent again from Enterprise?

This approach was simple and quick, and we can infer from other comments in the article that the survey process had a high response rate, though none is stated. Enterprise also ranked its branch offices solely by the percentage of customers who gave their experience the highest possible rating. Why this approach? Encouraging branches to satisfy customers to the point where they would give top ratings was a “key driver of profitable growth” since those people had a high likelihood of repeat business and of recommendations.

Reichheld, thus intrigued, pursued a research agenda to see if this experience could be generalized across industries. The first stage of this research was to identify what survey question correlated best with a person’s future purchase behavior. In most cases — but not all — the willingness-to-recommend question was the best predictor. Reichheld conjectures that the more tangible question of making a recommendation resonated better with respondents than the more abstract questions about a company deserving a customer’s loyalty.

The second research stage was to validate the recommendation question as a predictor of company growth. Through some unspecified statistical procedure, they decided to group the responses on a 1-to-10 scale into three groups: respondents at 1 to 6 were “detractors,” those at 7 to 8 were “passively satisfied,” and those at 9 to 10 were “promoters.” (Note that the scale was later changed to a 0-to-10 scale. The addition of the zero supposedly clearly identifies which end of the scale is positive.)

Satmetrix — Reichheld sits on their Board of Directors — then administered the recommendation survey to thousands of people from public lists and compared the responses for various companies against the companies’ three-year growth rates.

The study concluded “that a single survey question can, in fact, serve as a useful predictor of growth.” The question was: “willingness to recommend a product or service to someone else.” The scores on this question “correlated directly with differences in growth rates among competitors.” (emphasis added) This “evangelic customer loyalty is clearly one of the most important drivers of growth.”

That’s the research basis for NPS as a customer insight metric. In comparison to the Customer Effort Score, this is a very robust research program. But Reichheld added another element to the conversation beyond just setting the recommendation question upon the pinnacle of what surveyors call “attitudinal outcome” survey questions. He also added a new statistic: net scoring.

The Net Scoring Statistic

Net scoring adds a certain mystical aura to NPS. We’re not just measuring a survey score; we’re taking a net score. A statistics professor of mine talked about the “Ooo Factor.” If a client would utter “Ooo” when a statistic was presented, then it had cachet. Net scoring has that Ooo Factor. It sounds so sophisticated. (In my workshops, I include the statisticians’ term for averages: “measures of central tendency.” Talk about an Ooo Factor!)

Well, a net score is just a statistic to describe a data set. The mean (or arithmetic average) is also a statistic to describe a data set. The mean and the net score of a data set are very likely to tell the same story.

Quite simply, net scoring takes the percentage of data at the top end of the distribution, in this case respondents’ survey scores, and subtracts from it the percentage of respondents who gave us scores at the low end. The top end and bottom end scores are frequently called “top box” and “bottom box” scores, respectively. The technique can be applied to any survey question that has ordinal data properties, not just the “promotion” question — but that would dilute its aura.

The earliest reference I have found for this net scoring statistic is an online article by Rajan Sambandam and George Hausser in 1998, though they recommend multiplying the bottom box score by a factor of two or three to give it more impact.

   (% 9s + 10s)
– (% 1s to 6s)
——————————————
Net Promoter Score
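
To make the arithmetic concrete, here is a minimal sketch of the net scoring calculation in Python. The function name and the example responses are mine, purely for illustration, and the grouping follows the 0-to-10 version of the scale (9 and 10 are promoters, 0 through 6 are detractors).

```python
def net_promoter_score(scores):
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# A hypothetical batch of responses on the 0-to-10 scale
responses = [10, 9, 9, 8, 7, 7, 6, 5, 3, 0]
print(net_promoter_score(responses))  # 3 promoters - 4 detractors out of 10 -> -10.0
```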

The goal of any statistic is to summarize a data set into one or two numbers that we can comprehend, digest, and apply to some decision. Reichheld advocates the net scoring approach to provide visibility to the bottom end of the distribution. Improving the views of those who are unhappy is the best way to drive overall improvement, and Reichheld wanted NPS to be used by the front line managers to improve operational performance.

However, net scoring actually throws away data distinctions, which the mean does not. A survey score of 1 has the same weight as a 6, a 9 the same as a 10. Aren’t those distinctions important? Aren’t the behavioral characteristics of a customer scoring a 1 likely to be different from a customer scoring a 6 or a 9 different from a 10? While the simplicity of the approach is a virtue, some valuable information can be lost through its simplicity.
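
To see what gets lost, consider a small, invented illustration: two sets of responses that produce the identical net score yet describe very different customers. The mean still registers the difference.

```python
def net_score(scores, top=9, bottom=6):
    """Top-box percentage minus bottom-box percentage."""
    n = len(scores)
    return 100.0 * (sum(s >= top for s in scores) - sum(s <= bottom for s in scores)) / n

group_a = [1, 1, 9, 9]    # furious detractors alongside promoters
group_b = [6, 6, 10, 10]  # mildly disappointed detractors alongside promoters

for group in (group_a, group_b):
    print(net_score(group), sum(group) / len(group))
# 0.0 5.0  <- group_a
# 0.0 8.0  <- group_b
```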

Regardless, the idea that we’re calculating a net score whose tracking will lead to promotion of our business certainly has grabbed the attention of senior managers. And managers love a “number” to run their businesses.

The Issues With NPS

But maybe the hype about NPS is justified. And maybe not. Here are some of the concerns that have been raised with NPS.

Cannot reproduce the results. The most telling issue with NPS is that researchers have tried unsuccessfully to replicate the findings that Reichheld and Satmetrix developed as the core argument for the role of NPS in companies. This plays to the point of the earlier article about the need for reliable research.

Keiningham et al., writing in the Journal of Marketing (July 2007), present research using available data sources, including the American Customer Satisfaction Index (ACSI), since the data Reichheld used was not available. As this was published in an academic journal, it went through a peer-review process, which Reichheld’s research did not. The authors did not find that the recommendation question was superior in predicting a company’s future profitability. In fact, the satisfaction question was superior. The authors conclude:

The clear implication is that managers have adopted the Net Promoter metric for tracking growth on the basis of the belief that solid science underpins the findings and that it is superior to other metrics. However, our research suggests that such presumptions are erroneous. The consequences are the potential misallocation of resources as a function of erroneous strategies guided by Net Promoter on firm performance, company value, and shareholder wealth.

Additionally, Morgan and Rego published research in Marketing Science (2006), examining “The Value of Different Customer Satisfaction and Loyalty Metrics in Predicting Business Performance.” They found:

Our results indicate that average satisfaction scores have the greatest value in predicting future business performance and that Top 2 Box satisfaction scores also have good predictive value. We also find that while repurchase likelihood and proportion of customers complaining have some predictive value depending on the specific dimension of business performance, metrics based on recommendation intentions (net promoters) and behavior (average number of recommendations) have little or no predictive value. Our results clearly indicate that recent prescriptions to focus customer feedback systems and metrics solely on customers’ recommendation intentions and behaviors are misguided.

If the research by Reichheld et al. cannot be replicated by others, how sound is the blind devotion to this measure?

Is NPS for relationship or transactional surveys — or both? Reichheld says that the metric is best for annual, high-level relationship surveys. Yet, he promotes using NPS to identify at-risk customers and driving the results to the front line to fix the customer relationships. This sounds more like something that should be done in transactional surveys. This seeming contradiction leads to the next issue.

The question is potentially ambiguous. Many companies pose the NPS question in relationship surveys with no wording adjustments. But consider how differently you might react to these two questions on a transactional survey after the closure of an interaction with a company:

  • How likely is it that you would recommend [Company X] to a friend or colleague?
  • Based on your most recent experience, how likely is it that you would recommend [Company X] to a friend or colleague?

In the first version, respondents may use different benchmarks for their answer. Some would base their response on their most recent experience, while others would use their overall experience over some undefined time period. How would you interpret the results if different benchmarks are used? Quite simply, you cannot. A question is invalid when respondents can interpret it in drastically different ways. If you use the net promoter question on a relationship survey, the ambiguity must be removed.

Where’s the promotion? While called Net Promoter Score, the question asks the likelihood of making a recommendation. The behavioral difference between recommending and promoting is cavernous. Promotion is active.  Recommendation is in response to a request for information. Shouldn’t it be called Net Recommend Score™ (NRS)? But that doesn’t have the same Ooo Factor, does it?

What about so-called Detractors? The bottom end of the scale is typically anchored as “highly unlikely to recommend.” If you’re highly unlikely to recommend, does that make you a detractor, someone who is actively going to give bad word of mouth?

Not applicable to all situations. In many situations people cannot make recommendations. Ask any government employee if they can make product recommendations. They cannot; they could be fired, given the fear of kickback schemes. Reichheld recognized this shortcoming in his 2003 article, noting that NPS works better in the consumer product world than in business-to-business situations. Yet that critical distinction has been missed.

The solution is to phrase the question as a hypothetical.

If you were able to make recommendations, how likely is it that you would recommend [Company X] to a friend or colleague?

Some companies have made this adjustment, but now our question’s syntax is getting complicated.

Creates a focus on getting word-of-mouth recommendations. Did the research show that getting recommendations led to higher company profitability? No. It showed a correlation between the likelihood to recommend a company and its future profitability. While we certainly want good word of mouth, it should be the byproduct of improved operations and product value. NPS has shifted the focus to getting recommendations.

It’s become a measurement tool, not an operational tool. As mentioned, Reichheld viewed NPS as a means to drive changes in the front line, especially in addressing customer concerns. But NPS has become a performance measurement tool. This can lead to perverse behavior that can actually hide problems. Organization Behavior 101 teaches us that people will improve a performance appraisal measure — whatever it takes.

We increasingly see a business transaction conclude with the agent telling us that we’ll be getting a survey and that, if we can’t give top scores, we should call someone before filling out the survey. I have personally experienced some truly over-the-top examples of this. On one hand, you might say, “Great, this is getting front line attention.” But on the other hand, this may also mean that a symptom is being treated while the existence of the core problem is hidden from senior management.

Net scoring threshold effects. Net scoring is susceptible to big threshold effects. While a change from 10 to 9 has no effect on the net score, a change from 9 to 8 does. Small changes in a survey instrument design or administration that have nothing to do with customers’ underlying feelings could lead to these shifts, amplified by the net score. Research that I’ll be publishing shortly shows how survey mode — phone vs. web-form — impacts survey scores, and the net scoring approach amplifies the differences.
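
A quick, invented illustration of the threshold effect: move one respondent a single point across the 8/9 boundary and the net score swings by 20 points, while the mean barely moves.

```python
def nps(scores):
    """Net score: % of 9s and 10s minus % of 0s through 6s."""
    return 100.0 * (sum(s >= 9 for s in scores) - sum(s <= 6 for s in scores)) / len(scores)

before = [9, 9, 8, 7, 2]  # hypothetical scores from one survey wave
after  = [9, 8, 8, 7, 2]  # one respondent slips from 9 to 8

print(nps(before), sum(before) / len(before))  # 20.0  7.0
print(nps(after),  sum(after)  / len(after))   #  0.0  6.8
```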

Cross-company comparisons. As mentioned earlier, many people think that scores on specific survey questions can be compared across companies without consideration for differences in the wording of the survey question, scale lengths, or the placement of the survey question within the broader survey. Such comparisons are highly dubious at best, but treating NPS as an industry best-practice metric means the comparisons get made anyway, blind to the issues. The best, most legitimate benchmark is your own company’s previous survey scores.

More than one number you need to know. The myopic fascination with NPS, along with the title of the seminal article, “The One Number You Need to Grow,” has led people to think you only need a one-question survey, devoid of any complementary diagnostic questions or research. Reichheld himself doesn’t recommend this; however, he buried this information in a sidebar in his article, in parentheses. What people have heard is “The One Number You Need to Know.” Not true.

~ ~ ~

Measuring whether customers would recommend your company and its products and services is a valuable indicator of customer feelings. But does this one score deserve the exalted place it has achieved in the business world? How likely would I be to recommend NPS as the only customer insight metric? I’d rate it as a 2 – not very likely.

Note: NPS and Net Promoter Score are trademarks of Satmetrix Systems, Inc., Bain & Company, and Fred Reichheld.

Customer Insight Metrics: An Issue of Validity

Summary: Metrics that provide insight into customer loyalty are the holy grail of customer measurements. Several have been proposed in recent years, but whether they should be used as the basis for business decisions depends upon the validity of those metrics as true indicators of customer loyalty. This article discusses validity and reproducibility as the basis for evaluating customer insight metrics.

~ ~ ~

Unless we’re in a churn-and-burn business, customer retention is critical to achieve long term profitability. Loyal customers tend to buy more each year and the cost to maintain the customer account drops as the relationship grows. Acquiring new customers is far more expensive. But how do we know what drives customer loyalty? How do we know who is a loyal customer? We may have lots of anecdotal data, but can we be scientific in identifying those customers who are very likely to be repeat purchasers and to give good word-of-mouth to prospective customers?

This summarizes the quest for the Holy Grail of customer measurements. Can we accurately and consistently identify truly loyal customers — so that we can then identify what makes them loyal? Can we identify those previously loyal customers who are now at risk of defecting?

Surveying customer satisfaction is, at most, decades old, and the overall customer satisfaction question, known as CSAT in the parlance, has typically been asked as a summary question. It was assumed, and some research showed, that more satisfied customers were more likely to be repeat purchasers. Recently, other key metrics have arisen to reflect the customer loyalty sentiment. Their claim is that these questions better identify the truly loyal and the at-risk customers.

The Net Promoter Score® (NPS®) is perhaps the best known, but the Customer Effort Score (CES) is a new entrant in the field. Both are controversial. Yet, many organizations are adopting these metrics without truly assessing whether they are valid measurements of customer loyalty for any business, let alone for their business.

This article is the first in a series that will review these customer insight metrics. Future articles will each address one of these metrics: NPS, CES, and the Secure Customer Index (SCI). Most importantly, we will provide the background behind the creation of each of these measurements and then provide an assessment of the strengths and weaknesses of each metric. The goal is to help make you an intelligent consumer of these metrics — and not just fall for the hype. We will arm you for an intelligent discussion with your vice president who heard about NPS or CES on the 18th hole last weekend.

A crucial factor in assessing the various customer metrics rests on the concepts of validity and reproducibility. We’ll turn to those now to set up the later discussion. A quick summary of validity and reproducibility is:

“Survey says…” doesn’t mean “Survey right…”

~ ~ ~

Validity is a key requirement for sound research. Simply put, validity means, “Are you measuring what you’re intending to measure with the instrument you are using?” That may seem downright dumb. How can I not measure what I’m intending to measure?

But imagine you have a glass bulb thermometer where the glass tube filled with alcohol has slipped from its original glued position against the degree markers with no way of knowing where the original position was. Will it provide valid measurements of temperature? Or imagine a household thermostat that is mounted next to a drafty window. Is the temperature reading by the window an accurate, valid measurement of the temperature in the household overall?

Try this simple experiment. Gather up all the thermometers you can and put them in the same place in your house. Wait 15 minutes for them to acclimate. Do they all read exactly the same? Probably not. Different technologies, different manufacturers, different ages, different levels of abuse.

Think of any instrument you use to measure something, be it a ruler to measure length, a speedometer to measure your car’s speed, or your bathroom scale to measure weight. You probably know someone — not yourself, of course — who skews the bathroom scale to feel lighter. To take action on the measurements we need to be confident that the readings are valid.

The same goes for surveys. We should be confident that what we’re measuring with our survey instrument truly reflects the views of those surveyed. The wording of our questions, the sequencing of those questions, the scales we choose, and even the statistics we use can misrepresent or distort what our respondents feel.

The differences you will find exemplify why it is dangerous to compare survey findings across companies where the instruments differ. The comparisons are not valid if the instrument (and the administration practices) are not identical. While true of surveys as a whole, this is equally true of summary customer insight metrics.

When we look at the summary customer insight metrics, we must ask if they are truly valid measures of customer loyalty.

~ ~ ~

A second key requirement of good research is that others can replicate a study and get the same results. This is known as reliability or reproducibility. Just because someone says, “We did a study that proves…” does not mean it’s true. For us to believe the findings and perhaps literally make million-dollar decisions based on the findings, we should want to know that others replicated the original study and reached the same conclusions.

The reliability requirement does more than simply catch the nefarious researchers who falsify data to support a conclusion. Yes, people falsify data — or exclude “wrong” data — more frequently than we’d like to believe. More importantly, the process of reproducing a study may surface factors affecting the outcome that the original researcher didn’t recognize as important.

In December of 2011, the Wall Street Journal published a front-page article, “Scientists’ Elusive Goal: Reproducing Study Results.” (If you search on the title, you will find many online discussions about the article.) The article focuses on the reproducibility of medical research studies and quotes Glenn Begley, vice president of research at Amgen, “More often than not, we are unable to reproduce findings.” Bruce Alberts, editor-in-chief of Science, added, “It’s a very serious and disturbing issue because it obviously misleads people.”

The article includes a chart showing the results of 67 studies that Bayer tried to replicate. None of the studies were ones where data was fraudulent or findings had been retracted. 64% of the studies could not be replicated. Search on the phrase, “medical study retracted” and you’ll find how common it is for the findings from accepted studies to be found wanting upon further review.

You may be thinking, “but that’s medical research where lives are at stake. I’m just doing a survey.” True, but if you’re running a survey program, you are, in fact, a researcher. (At your next performance review, you may want to make that point!) In service organizations we make service delivery design decisions and personnel decisions based in part on data from surveys. Wouldn’t you want to be sure that the survey data are legitimate? Before applying the Net Promoter Score or the Customer Effort Score shouldn’t we know that the research that led to the advancement of those customer insight metrics as indicators of customer loyalty is valid and reproducible?

It’s easy to make claims. It’s harder to prove.

What Pollsters Can Teach Us — About Survey Practices

Summary: Political pollsters have a lot at stake in their polling predictions — their reputations. Many of their challenges are the same ones we confront in surveying our customers or employees or members. Recent newspaper articles discuss some of the challenges pollsters are confronting, and we confront similar challenges. This article presents some of these challenges to survey design and administration.

~ ~ ~

Deep into silly season with the US Fall 2010 elections right around the corner, we’re all sick of the ads and the polls. At least we’ll get a respite on November 2 from all this, but you can be sure the 2012 election starts in earnest on January 2.

That said, what can we learn professionally from all the political polls? We certainly see differences across polls. The differences can result from many factors, factors that are also in play when we design surveys to measure the feelings of our customers, employees, members, etc. The Wall Street Journal (WSJ) has had a couple of research articles examining the differences, which are enlightening – and you don’t see this kind of public discussion about organizational surveying practices.

WSJ’s “The Numbers Guy,” Carl Bialik, had a recent article on the question used to rate presidential approval. Gallup presents the respondent with binary options: approve or disapprove. In contrast, Rasmussen presents a four-point scale, adding “somewhat approve” and “somewhat disapprove” options. The Rasmussen poll finds much lower net approval ratings – those indicating some level of approval minus those indicating some level of disapproval. The difference is significant, about 5 to 15 percentage points. Scott Rasmussen posits that those with mild disapproval are hesitant to say “disapprove” when presented with only the hard, binary options.

I’ve seen binary options in many surveys. For example, hotel surveys may ask, “Will you stay at our hotel the next time you travel to our city? Yes or no?” Sorry, the world is not that clear cut. Many factors will be in play. If I see this on a paper survey, I add in a “maybe” option. If it’s a web-form survey, I usually will skip the question or answer “no” and explain why in a comment box, hoping someone will read it.

Bialik’s column also points out that very complex questions will lead to more people choosing “Don’t know.” Political polls are typically done by telephone, which makes paramount the need for simple phrasing.

Of course, the rubber meets the road on election day. Which polling method better predicts the election? Rasmussen in a recent radio interview stated that after the election, his staff does a full debrief to see what they got right and what they got wrong. This will guide them in refining their polling procedures and in their statistical adjustment models. The key challenge is identifying those likely to vote.

In our organizational surveys we too should debrief after every survey to see what we have learned about the surveying process to refine future procedures. Even the pollster’s challenge of identifying those likely to vote has parallels for us. We have ongoing debates about what question(s) best measure the attitude customers feel toward companies, e.g., the Net Promoter Score®. Ours is a different domain but the same challenge.

In an earlier interview in the WSJ, Scott Rasmussen suggests a real flaw in many pollsters’ approach. Most pollsters fall into what Rasmussen would call the “Political Class” whom he feels view the world differently than the “Mainstream Public.” He differentiates the two groups through a series of questions. Pollsters tend to live in the power centers of the country and are disconnected from the Mainstream Public.

When pollsters write questions, they apply their world view, which may not be shared by most respondents. The response options offered in the survey will reflect the pollsters’ biases, and many respondents won’t know how to respond since none of the options reflect how they feel, even if they do understand the question. Thus, those who do provide a response are more likely to be ones who share the pollsters’ world view. We see this vibrantly in the non-scientific polls on advocacy media, but it is also present in the “scientific polls.”

When your organization designs its surveys, do you leave your safe confines and actually talk to those in your group of interest to find out what’s of concern to them, or are you certain you know how they view their relationship with your organization? Is that confidence really justified? I frequently see surveys where the mental frame of the survey designer doesn’t align with the mental frame of the respondent group.

A Bloomberg National Poll, released on October 12, 2010, but no longer available online, displays some other common survey mistakes, as well as some good practices. First, they open with the summary attitudinal question of right direction vs. wrong track. As it is the first question, the response is unbiased except by whatever opening the interviewer used. But I’ve always been intrigued by the wording. “In general, do you think things in the nation are headed in the right direction, or have they gotten off on the wrong track?” (Emphasis in original.)

Why isn’t the choice “right direction” versus “wrong direction” or “right track” versus “wrong track”? The wording makes it sound like an unintentional accident that we “got off on the wrong track” rather than the leaders’ choices having taken us in the wrong direction. I can’t quantify the impact of the uneven wording, but I am sure it has one. Results reported on October 12: Right Direction = 31%, Wrong Track = 64%.

Also in the Bloomberg poll, for questions that present a list of options, such as asking which issues are most critical, the order in which the options are presented to respondents is rotated to eliminate any bias toward choosing the first option. This is a practice that makes sense in all surveys, but it requires a fairly sophisticated survey tool.
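
For those curious what that tooling amounts to, here is a minimal sketch, assuming nothing about any particular survey platform: each respondent simply sees an independently shuffled copy of the option list, so no single option benefits from always appearing first. The issue list is made up for illustration.

```python
import random

ISSUES = ["The economy", "Health care", "The deficit", "Immigration", "Education"]

def options_for_respondent(options):
    """Return an independently shuffled copy of the option list for one respondent."""
    shuffled = list(options)   # copy, so the canonical order is preserved for analysis
    random.shuffle(shuffled)
    return shuffled

print(options_for_respondent(ISSUES))  # a different ordering on each call
```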

While the poll had many interesting practices to analyze, probably most telling was the question about what changes to taxation and government programs should be adopted to reduce the deficit. When you look at the answers, you find an “I want my cake and eat it too” message. We want low taxes, except for the rich who should be taxed to pay for all the entitlements and programs we like. What a surprise!

If only it were that simple, but the fact is that all public policy decisions involve trade-offs. Binary-choice and interval-rating questions for each item don’t force respondents to consider the range of trade-offs. Fixed-sum questions force consideration of trade-offs, and conjoint surveys also test trade-offs better. However, those approaches are very difficult to deliver through telephone surveys. So, we see the impact of the administration technique upon the validity and value of the information garnered.

In our organizational surveys, we are also confronted with this challenge. The administration method constrains what we can do in a survey. But if your research objective is to measure the trade-offs your “constituents” consider, then you have to use a method that provides valid results. Unless, of course, the results that don’t force consideration of the trade-offs are the results you really want in order to argue some position.

Finally, we see another shortcoming in play in either the administration or analysis. According to the poll, Democrats will retain their majority in Congress, 42% to 40%. Mind you, the question was presented immediately after a question on the positive elements of the health care law, so a sequencing effect could well have been in play. I am writing this on October 15. Their opening “right direction” question makes one question the results of this generic ballot question, and every other poll shows Republicans leading the generic ballot by high single digits or more. Why does this one poll show the opposite?

Maybe they’re right, but it could also be due to some basic elements of sampling bias. Bloomberg polled 721 “likely voters,” giving it a margin of error of 3.7%, according to the authors. They contacted randomly selected landline and cell phone numbers. The poll says they weighted the scores by gender and race to reflect recent Census data, which means we can assume they did not weight the scores to reflect political background, which most polls do. They also do not indicate that the randomly selected calls were made to ensure a geographic spread, nor that any weighting was done to reflect geographic distinctions. We also do not know how they assessed that those 721 people were likely to vote.
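
For what it is worth, the reported 3.7% figure is consistent with the standard worst-case margin-of-error formula at roughly a 95% confidence level, sketched below. Of course, that formula speaks only to sampling error, not to the weighting and likely-voter questions raised above.

```python
import math

n = 721   # reported number of "likely voters"
p = 0.5   # worst case; maximizes the standard error
z = 1.96  # approximately a 95% confidence level

moe = z * math.sqrt(p * (1 - p) / n)
print(f"{moe:.2%}")  # 3.65%, in line with the 3.7% the poll reports
```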

Perhaps these administrative and analytical shortcomings explain the difference that this poll shows in that summary question versus other contemporaneous polls. We, too, in our surveys have to be aware of biases we introduce into our data set from question sequencing, word choices, and administrative methods. We may not be predicting elections, but we may be basing significant organizational decisions upon the results.

Satisfy, Don’t Delight

Summary: “Delight, don’t just satisfy” has been the mantra in customer service circles for many years. The underlying assumption was that satisfied customers are not necessarily loyal. Now a research project by the Customer Contact Council of the Corporate Executive Board argues that exceeding expectations has minimal marginal benefit over just meeting expectations. In essence, the authors argue that satisfaction drives loyalty more than the mysterious delight factors do. This article examines the argument and specifically looks at the shortcomings in how the authors establish the loyalty link.

The holy grail of long term company profitability has been knowing what drives loyal behavior on the part of our customers. What gets them coming back again and again? What drives them away? How do we identify the disloyal ones to win them back? Various researchers from Reichheld’s Net Promoter research to Keiningham & Vavra, Improving Your Measurement of Customer Satisfaction, have argued we have to distinguish the attributes that satisfy from those that delight. A satisfied customer may buy again, but a delighted customer is far more likely to be loyal. That’s been the argument.

“Stop Trying to Delight Your Customers” in the July-August 2010 Harvard Business Review argues that the past research is flawed and leads to wasted effort. Matthew Dixon, Karen Freeman, and Nicholas Toman of the Customer Contact Council (CCC) addressed three questions in their research:

  • How important is customer service to loyalty?
  • Which customer service activities increase loyalty, and which don’t?
  • Can companies increase loyalty without raising their customer service operating costs?

Here I’ll summarize their research as reported and then discuss some shortcomings.

The research project surveyed 75,000 B2B and B2C customers across the globe about their contact center interactions and included extensive interviews with customer service managers. The published article doesn’t include the actual survey instrument or the details of the administration process, but we can infer that many of the questions measured attributes of the service experience and the attitudes created on the part of the respondents, along with a slew of demographic data that were used as control variables in the analysis.

The authors argue “that what customers really want (but rarely get) is just a satisfactory solution to their service issue,” and they have a new measure for loyalty. To paraphrase Reichheld, “forget everything you’ve ever known about loyalty research” – or should you? The authors list two critical findings for customer service strategies:

First, delighting customers doesn’t build loyalty; reducing [the customer’s] effort – the work they must do to get their problem solved – does. Second, acting deliberately on this insight can help improve customer service, reduce customer service costs, and decrease customer churn.

Indeed, 89 of the 100 customer service heads we surveyed said that their main strategy is to exceed expectations. But despite these Herculean – and costly – efforts, 84% of customers told us that their expectations had not been exceeded during their most recent interaction.

To summarize their argument in different words, companies should focus on reducing dissatisfaction, not maximizing satisfaction. I cringe at that statement since it’s what US-based airlines practice.

Although customer service can do little to increase loyalty, it can (and typically does) do a great deal to undermine it. Customers are four times more likely to leave a service interaction disloyal than loyal.

The loyalty pie consists largely of slices such as product quality and brand; the slice for service is quite small. But service accounts for most of the disloyalty pie. We buy from a company because it delivers quality products, great value, or a compelling brand. We leave one, more often than not, because it fails to deliver on customer service.

Reps should focus on reducing the effort customers must make. Doing so increases the likelihood that they will return to the company, increase the amount they spend there, and speak positively (and not negatively) about it – in other words, that they’ll become more loyal…

The immediate mission is clear: Corporate leaders must focus their service organizations on mitigating disloyalty by reducing customer effort.

The authors’ new contribution to customer metrics is their Customer Effort Score (CES), which is based on a new survey question: “How much effort did you personally have to put forth to handle your request?” The question is rated on a scale where 1 means “very low effort” and 5 means “very high effort.” Frankly, the wording confuses who is “handling the request.” I would have written: “How much effort did you personally have to put forth to get your request addressed?”

[Editor’s Note: Some people who have read this article think I am being overly kind in my assessment of that question’s wording. “What does that even mean?” was one comment. I agree. The question displays tortured syntax. Since this question forms the entire basis of their proposal for a new attitudinal measure, its wording is critical. If it’s ambiguous, that makes their whole argument dubious.]

Their research found that CES had strong “predictive” power for both repurchasing likelihood and future amount of purchases, which were their measures of loyalty — more on that later — and that it was a better predictor of loyalty than either the overall satisfaction question (CSAT) or the Net Promoter question (NPS). They claim it’s better than NPS since NPS captures a customer’s view of the company as a whole, which is one of my main problems with NPS, while CES is more transactionally oriented.

Beyond using the CES question, the authors discuss five key recommendations:

  1. Don’t just resolve the current issue; head off the next one.
  2. Arm reps to address the emotional side of customer interactions.
  3. Minimize channel switching by increasing self-service channel “stickiness.”
  4. Use feedback from disgruntled or struggling customers to reduce customer effort.
  5. Empower the frontline to deliver a low-effort experience. Incentive systems that value speed over quality may pose the single greatest barrier to reducing customer effort.

All of this sounds enticing when supported by sound research, but where are the weak spots?

First and foremost, I am always skeptical about findings when I don’t get a clear picture of the research methodology. This article was not the practitioner version of an academic research paper that had been submitted to an academic journal’s peer review process, which would require clean methodology. The reader is left to draw many inferences about the methodology.

No link to the survey instrument is provided. We know the CES question is posed on a 1-to-5 scale, but it appears the “loyalty” questions are posed on a 1-to-7 scale, based on a chart provided. We don’t know how the respondents were identified and solicited, or when they got the survey in relation to the completion of the transaction. While we learn that 84% of the respondents said their expectations were not exceeded, we don’t know how that 84% breaks down between expectations met and expectations not met. A chart shown with no hard data implies a weak correlation between CES and CSAT, which seems hard to believe. It appears that regression analysis was performed, but we don’t know the full model with all the variables included, and we are given no statistics by which to judge the validity of the model.

The research methodology and execution could be exemplary, but the article does not provide enough background to remove my skepticism. If I don’t feel comfortable with my understanding of the methodology, I take any findings and conclusions with a giant grain of salt — in this case, a whole salt mine.

Second, the researchers appear to have defined implicitly — not explicitly — delivering “delight” as “exceeding expectations,” but they didn’t measure what customer expectations were. Previous researchers posit that some attributes are satisfiers while other attributes are delighters, and that exceeding expectations on satisfiers buys little, which jibes with the findings here. But the CCC authors do not appear to have attempted to identify which attributes are delighters versus satisfiers, a lesson from Kano analysis.

Further, the Kano model presents an important distinction between delighters and satisfiers that the authors don’t address. In order for the delight attributes to have an effect, the satisfier attributes have to be delivered. Consider a hotel stay. If the room is not clean — a satisfier — exemplary performance on delight attributes buys little to nothing. Rather than test this hypothesis, the researchers dismiss delivering delight attributes as wasteful.

Third, following on the above point, we usually talk about companies raising the bar — what was unexpected now becomes expected — but contact centers in general have lowered the bar — what was once expected now becomes the unexpected — through the drive to offload work onto the customers. The authors state that you can create loyalty by satisfying — not delighting — the customer through the delivery of good, basic, reliable service. Personally, if I can talk with a live person quickly without having to navigate some annoying phone menu, and have a courteous interaction with an intelligent, knowledgeable person who resolves my issues quickly while instilling confidence, I wouldn’t be satisfied. I’d be delighted, which is a sad statement on the state of service. Perhaps, CES is actually a delight attribute? Again, the authors don’t discuss what is a satisfier versus a delighter.

Fourth, I would like to know what research led them to the hypothesis that the CES attribute of service delivery would correlate highly with measures of “loyalty.” I’m just curious. Was this a fishing expedition that just happened to turn up a good fishing hole?

Fifth, the authors claim CES has strong “predictive” power for loyalty, performing better than the NPS or CSAT questions that apparently were in their questionnaire. Repurchasing and increased purchasing, along with word-of-mouth comments, were their measures of “loyalty.” In their study, CES showed a correlation to intended future behavior on those loyalty measures, not to actual future behavior. Could people’s stated intentions immediately after a poor service experience, as apparently captured in their surveys, differ from how they actually behave later?

One of the strengths of the NPS research is that those researchers performed longitudinal research. They compared the NPS scores for specific companies to future company profitability. (Note: other researchers have not been able to duplicate those NPS findings. No researcher could ever duplicate the findings of this study due to the lack of information about the study.) Here the authors did not perform such research. To claim predictive powers for CES is a semantic stretch not justified through this research.

Lastly, it is important to note that the authors investigated contact center customer service for both product-based and service-based companies. In many, if not most, of these situations, the contact center is providing remedial service. The very definition of remedial services means that the service is likely to only be a satisfier. No one wants to call for remedial service; it’s compensating for a failure in the core product or service.

We are not told the difference in CES predictive power for remedial versus non-remedial contact center experiences. This distinction is important to the application of their findings. The authors imply their findings provide lessons beyond contact center services through their use of hotel and airline service examples when building their argument. Generalizing these findings to the point of saying that service organizations in general should ignore delight attributes is dubious at best and absolutely unwarranted by the research. Those are claims a public relations person would make, not a trained researcher.

The authors’ five recommendations make sense, and CES may have value as a new customer metric for remedial customer service, but as with NPS, I’m not sold.

Addendum, Monday April 4, 2010… However, the Wall Street Journal newspaper subscription service has been sold on this metric. I have had truly horrific delivery experiences with my Journal subscription for almost a year. Today, I called to report the third day in a row with no delivery (sic). I agreed to take their IVR post-call survey. This survey exhibited many of the problems with IVR surveys. It asked questions only about my call center interaction, but I was calling about my delivery service. The agent was fine — as have the agents been every other time I have called to complain. But my complaints appear to do no good. I have the same lousy delivery service from the same driver. If you look at the scores I gave, it would indicate that everything was okay. I was asked the NPS question to which I gave a mediocre score. (They didn’t ask me to qualify my response as “based upon your experience with the agent today…”)

But then they asked me how much effort I had to use today to get my issue addressed. I gave the lowest score on the scale. Will the analysts be able to tell that I am basing my score on the extended relationship where I have had horrific service? I doubt it. My effort today was minimal. I made a phone call. But I have continually complained to no avail. Thus my low score. You might say that this shows the importance of CES, and you’d be right — if the question had been phrased correctly and if the survey was properly positioned to look at the extended service interaction.

Earlier in this article I made the point that you must first deliver basic satisfiers before attempting to delight. Fixing this incompetent delivery service would be delivering a basic satisfier, which is delivering the service which I have contracted with them to do. Trying to “delight” in this circumstance would be wrong, but it’s not the amount of work that I have had to do which has me ready to cancel my subscription. Rather, it is the fact that the core issue has never been addressed despite repeated complaints. Plus, the survey has no open-ended question where I could have explained my scoring in the hopes that some responsible manager would actually see my complaint.

This is not the only time I have seen the Customer Effort question put in a survey without deep thought to how it is positioned in an extended customer-company interaction.

It’s a good thing I love the darn paper… And the iPad app is a fantastic option.

Surveys’ Negative Impact on Customer Satisfaction

Summary: A long understood, but seldom followed, truism of organization design is that reporting for operational control and management control should not be mixed. Tools designed to provide front-line managers information about operational performance will become compromised when used for performance measurement. This is true for customer satisfaction surveys used for operational control, including Reichheld’s Net Promoter Score®. It was intended to be an operational control tool, but when used for performance measurement, we can see the deleterious effects. Customer feedback surveys are one element in an organization’s measurement systems, and the survey program needs to be considered in full context to be sure the measurements are not corrupted, and — more importantly — that the results of the survey program don’t create customer dissatisfaction in the process of attempting to measure customer satisfaction.

Nearby you’ll see an “Important Customer Satisfaction Notice!!!” that was attached to my repair bill from a local Subaru dealer. Take a read of it. Even though I do surveys professionally, I try to listen to my gut when I take a survey or see something like the flyer in the nearby image. How am I reacting to it? How would a “normal” person — that is, someone for whom surveying is not the center of his work life — react to this? What’s your reaction to it at both an intellectual level and a gut level?

[Let me note that I am a member of the Subaru cult. It’s the only car brand I’ve owned since 1977, and I even got my wife to buy one for her latest car. I am not bashing Subaru’s products here, but perhaps I am bashing their use of a survey mechanism. Besides, I suspect that this same approach is used by many, many other car dealership brands, but I only see Subaru’s business processes.]

[Image: customer-satisfaction-notice-misuse]

Your first reaction probably is that the writer’s English skills are woeful — and that’s being kind. With my professorial red pen, I count 16 editing corrections in one page. Three people had their names at the bottom of the page. (I blanked them out for privacy reasons.) Didn’t they proofread this? Maybe they did, which would be really sad. If I were the owner of this dealership, I would be embarrassed. The sloppiness sends a message about the dealer as a whole. It’s a window into the concern for quality at the dealership. One can only hope they are better, more careful mechanics than writers!

Your second reaction is probably something like, “Why are they telling me this now? Why didn’t they just do a better job in the first place? Do they expect me to now lie on the survey?” I’ve shown this sheet in my customer feedback workshop, and that is the type of reaction I’ve heard.

Another reaction I’ve heard is that this “Notice!!!” isn’t about customer satisfaction. It’s about the dealership’s scoring from those in headquarters. That led the readers to really question the sincerity of the motives.

Plus, they put the onus on me, the customer, to reach out to them to fix their problems so that I will then give them high scores.

Lastly, they sure are using a lot of loaded language to impress you after the fact!

As a teacher of over 23 years, I can relate to the feeling. I frequently have students approach me to say, “I’m concerned about my grade.” Many times it’s a part-time MBA student who adds, “I don’t get reimbursed in full by my employer unless I get at least a B+ in the course.” Want to guess when they approach me? Yes, in the last week of the semester, just before the final exam. Basically, they’re asking me to not apply the same grading standards to them as to the rest of the class. My response is usually something akin to, “If this is so important, why didn’t you work harder during the semester? And why are you coming to me now when the semester is over?”

During the ’80s I developed management and operational reporting systems for Digital Equipment Corporation’s field service division in the US. One lesson that I learned over and over again is that a reporting tool can’t serve two masters. If it’s to be used for operational improvement, then it should be used solely for that purpose. Once those operational control reports get used for performance measurement purposes, the validity of the data will deteriorate. Those being measured could improve their performance to affect the reported numbers or they could manipulate the data collection and reporting systems to affect the reported numbers – or both. Trust me, the data collection systems will be manipulated. At DEC we called it “pencil whipping” the forms that the techs filled out.

Fred Reichheld, the father of Net Promoter Scoring®, has lamented how his child is being misused. An attendee at the NPS® conference in early 2009 posted a blog article about Reichheld’s comments. Essentially, Reichheld had envisioned NPS® as a tool to drive change in customer-facing operations. But it is increasingly used as a performance-metric club to beat people up. Worse yet, many companies think their feedback surveys only need to contain the “one number they need to grow.” A one-question survey cannot serve as an operational control tool.

Colleagues of mine, Sam Klaidman and Dennis Gershowitz, point out in an article of theirs that this focus on getting good NPS® scores creates the impulse for front line managers to “fix” the survey score before the score is submitted, as shown above.  The unintended consequence is that it creates a black hole in knowledge of what’s really happening at the front line. They refer to these as “gratification surveys.”

A survey program needs to be one element in a broader customer feedback strategic program. If you plug in just a survey without thinking more broadly, then you will get reactions such as the one seen in the example above. I am glad that the dealership is taking customer satisfaction seriously, but the approach they’re using just feels wrong. Why didn’t the service advisor ask me questions when he handed me my bill? Or someone could have called me the next day. However, make sure the person placing the call is a professional and has been trained. I have received those kinds of calls from car repair shops, and it was clear that the person placing the calls had not been trained. It was an awkward call that left me feeling more skeptical about the dealership. (It may have been this same dealership!) And don’t mention the survey. Once you mention the survey, you’ve crossed the line into manipulation. Subaru’s failure at the headquarters level was to implement the survey program without training those being measured in what to do with the results, while still using this customer feedback system as a key element in its dealership reward system.

How would you react if you were getting beat up on survey scores? Perhaps the way this Subaru dealership has reacted: work the system. The Law of Unintended Consequences rears its ugly head. Ugh.

Have you found your survey program getting corrupted when it started to be used for performance appraisal purposes? Let us know.

~ ~ ~

I received more response on this article than perhaps on any other one I’ve posted. Here’s one response.

This was my story one year ago. I took my car in for service for first time at this dealer. Price was outrageous and they did stuff not approved.

There was a copy of the customer survey form at the checkup desk, with a note to it attached saying they would get bad ratings if any area was not a 9 or above, so please fill it out that way basically. (can’t recall the exact wording, but that was the gist of it.) Then the gal checking me out told me the exact same thing — that I would be getting a survey from JD Power and they really wanted us to rate them 9 or above for them to keep their level rating.

Then, on my receipt, there was a note that if I would think of rating them anything below a 9 on any category to please call the Customer Service Manager. Well, I promptly gave him a call and told him that if they needed a 9+ so badly, perhaps they should try focusing more on service than just instructions about how to fill out the survey with the desired numbers. He apologized and offered to fix any of my service problems from the day before.

After those issues had supposedly been addressed, he called me to follow up. They were not resolved to my satisfaction, and I told him so; I also told him I would not be giving the 9 scores he was looking for.

I never received a survey.

Penny Reynolds
Senior Partner, The Call Center School

Delusions of Knowledge — The Dangers of Poorly Done Research

Imagine you’re planning to conduct a survey to support some key business decision-making process. But you have to get results fast. So, you throw together a quick survey since something is better than nothing. Right? Wrong.

Why is this wrong? Because this knee-jerk survey process may contain biases and errors that lead to incorrect and misleading data. Thus, the decisions made from this data will be based on delusions of knowledge. The purpose of this article is to outline the types of biases and errors common to survey projects. Knowledge of these will help you create a survey that provides meaningful results.

First, though, let’s drive home this issue of true knowledge versus perception. Consider the following matrix. (As a quasi-academic, I’m obligated to present ideas in 2×2 matrices. <grin>)

[Figure: the objective reality matrix, contrasting whether we actually know the truth with whether we think we know it.]

In this matrix, our true understanding of objective reality – we know the truth or in fact we don’t – is contrasted with our perception of that understanding – we think we know the truth or we think we don’t. Let’s look at each quadrant.

Northwest Quadrant: We think we know reality and in fact we do. This is our goal. We want to live in this cell. Nothing but blue skies do we see…

Southwest Quadrant: We think we don’t know reality, but in fact we do. This is a good zone because of the opportunity it presents. We’re not going to run off half-cocked and do something stupid. We’re cautious in this position, waiting for someone – a manager, a mentor, our mate – to tell us that we know more than we thought and should give ourselves more credit. In the Knowledge Management field this is known as Tacit Knowledge, in contrast to the Explicit Knowledge found in the NW quadrant.

Southeast Quadrant: We think we don’t know reality, and in fact we don’t. This is another good zone since we’re unlikely to make wrong decisions. However, the opportunity of the previous quadrant is missing because we lack real knowledge. A mentor first has to help us learn, and then help us realize what we have learned. We want to move to the Southwest quadrant, not to the Northeast quadrant.

Northeast Quadrant: We think we know reality but in fact we don’t. This is our black hole of decision making where we’re operating under delusions of knowledge. Many versions of this exist. Maybe one customer or employee complains and the assumption is made that this is a real problem that everyone confronts. Maybe we’ve done some “research” but the research plan or execution is flawed. A whole host of reasons can create delusions of knowledge.  You probably know many people who fit this quadrant: “He doesn’t know what he’s talking about.”

But how could a survey research project create such delusions?  “A survey is so simple,” you say.  “I get surveys in the mail all the time, and I’ve thrown together my own surveys. All research is beneficial. How could any survey effort not help? This isn’t brain surgery, after all.”  But it is (sort of). We are trying to get into the brain of some group of interest — our customers, employees, stockholders, suppliers, etc. – and measure their perception of us and perhaps collect some other hard data.

A survey project presents many, many opportunities to screw up. The following table lists the areas where biases or errors can affect our survey results – and thus our decisions.

[Table: the stages of a survey project where biases and errors can be introduced.]

In any research program, including surveying, the researcher faces a constant struggle to avoid introducing biases into the data and errors into the findings and conclusions. For a novice surveyor, real danger exists. Why? Because the novice isn’t even aware that these biases could exist, doesn’t know enough to avoid them, won’t know when they’ve been created, and won’t know to exercise caution when interpreting the collected data. The novice will execute the flawed project, analyze the data, and draw conclusions from data that doesn’t represent the true feelings of the survey population, our group of interest.

Future articles will address each of those three major areas in depth. I’ll add links when they’re written. The bottom line message is that rigor is needed in any research project to be sure that you’re truly learning something about the area of interest.

In summary, would you rather make decisions based on:

  1. Sound research data, or
  2. Data you thought was right but was wrong, or
  3. Gut feel and business intuition?

While I would always prefer the first choice, I’d rather operate from intuition than delusions of knowledge. As some wise folks over the millennia have stated in various forms:

  • “The trouble with most folks isn’t so much their ignorance. It’s know’n so many things that ain’t so.” 19th-century humorist Josh Billings
  • “To know that you do not know is the best. To pretend to know when you do not know is a disease.” Lao-Tzu
  • “It ain’t what you don’t know that hurts you. It’s what you do know that ain’t so.” Will Rogers

I’m sure Mark Twain, Winston Churchill, and Disraeli had a version of the same quote.

Customer Surveying for Small Business: Why Bother?

Common wisdom is that only large companies need customer-research programs. After all, small companies have their feet on the ground and know what customers are thinking, right? On the surface, that seems a reasonable attitude for small-business managers to take. But ask yourself if there is some percentage of customers defecting to competitors each year for part or all of their purchases.

If so, then some formal customer-research program, especially surveying, can yield a quick payback by shedding light on the reasons for the defections.

Let’s expose some other myths of customer surveying.

Surveys don’t provide data that can be used. There can be some truth here. Survey results are primarily numbers. The challenge — assuming the numbers accurately reflect respondents’ views — is to give voice to the data. This voice comes from designing a questionnaire with an understanding of how the data will be used, and then performing proper data analysis.

Sometimes the voices are a scream for help. In fact, identifying those customers needing “service recovery” should be a key goal of a survey program. Such actions may stop a defection before it occurs. Most often, the voices are pointedly identifying shortcomings in a business process. An ongoing survey program can play this quality control function. Sometimes the voices from the data are more subtle, requiring correlation to demographic data to uncover the phenomena that are in play.
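For instance, here is a minimal sketch of that kind of demographic cut, assuming the survey responses have been exported to a table; the column names, segments, and scores are purely illustrative:

```python
import pandas as pd

# Hypothetical survey export; column names and values are illustrative only.
responses = pd.DataFrame({
    "overall_sat": [9, 4, 8, 3, 10, 7, 5, 9],  # rating on an assumed 0-10 scale
    "segment": ["SMB", "Enterprise", "SMB", "Enterprise",
                "SMB", "SMB", "Enterprise", "SMB"],
})

# Cut the scores by a demographic variable to surface the "subtle voices":
# patterns that are invisible in the overall average.
by_segment = responses.groupby("segment")["overall_sat"].agg(["mean", "count"])
print(by_segment)
```

Even this simple cut can reveal, say, that one customer segment is far less satisfied than the overall average suggests.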

Surveys should be viewed as part of a broader program of understanding customer needs and concerns.  In fact, a survey may pose more questions than it answers, but these are questions that likely need to be raised — and you can’t find answers until you know the right questions to ask.  Follow-up and in-depth research should yield the specific improvements that are needed.

Customer survey programs are too expensive for small businesses. While it’s true that large companies can spend eight figures on research programs, a small business can do a very adequate job on a reasonable budget. Of course, this requires that much of the work be done in-house, but the advent of electronic-survey tools makes surveying far more accessible. Good desktop tools are available for less than $1,000, and there are even free, hosted survey services.

You might still consider outsourcing in areas of the program that require specific expertise or operational capacity that simply is not available. Outsourcers will also help drive a project schedule. A pure in-house survey project may fall prey to higher-priority concerns and languish undone.

No one on staff has the skills to do a survey. Much of sound research practice is applied common sense, and someone with a market-research background should have exposure to the survey research discipline. In fact, it’s possible that your staff has skills it doesn’t know it has. If you’re skilled at interviewing, then you have experience in eliciting information in a structured way. Don’t sell your skills short, but do know your shortcomings.

Surveys are just a bunch of questions strung together. A quote from Mark Twain has merit here: “It ain’t what you don’t know that gets you, it’s the things you know that ain’t so.” Survey instrument design is a craft and arguably the most challenging part of a survey program. If you think you have that craft and you don’t, then you may create an instrument that provides misleading information.

Look at the following survey question: “How satisfied were you with how quickly your call for information was answered?”  Show this question to 10 colleagues and ask them what they think is being asked.  You may get 12 different answers.

This question, apparently benign, is loaded with ambiguity.  What’s being “answered,” the phone or the question?  Does “quickly” refer to the number of phone rings before someone or something answers, how soon you were in contact with a person who could answer the question, how quickly you got the answer? Fleshing out ambiguous terminology is a real challenge.  By the way, if you do present the question to your colleagues for review, you’ve developed the skills for pilot testing your survey instrument.

As mentioned, don’t sell your skills short — but know where they are short — and a survey program can yield a very fast return on investment: listen to your customers, apply the findings to improvement programs, and a more loyal customer base will follow.

From the print edition of Boston Business Journal

Customer Satisfaction Surveys: The Heart of a Great Loyalty Program

Murphy’s idea of satisfaction is an open can of tuna fish, loyalty would be ensured by a second can of tuna fish, and his idea of surveying is watching chickadees, titmice and sparrows at the feeder. Yes, Murphy is a cat, a cat with elevated calcium levels in his blood, which may indicate a tumor. Since his diet could be the cause and Deli-Cat® is his primary diet, Murphy’s owner, who happens to be the author of this article, called the Ralston-Purina customer help line, whose number was on the Deli-Cat® packaging, to learn the precise calcium level in the food.

The help desk was true to its name, but Ralston-Purina’s interaction with the customer is not purely passive. A few days later, I received a phone call whose purpose was to conduct a survey on behalf of Ralston-Purina. The interviewer asked not only whether the information request was fulfilled and the agent was courteous, but also whether my experiences with the customer service desk would make me more or less likely to buy Ralston-Purina products in the future. In other words, this telephone survey was trying to assess the effect of the help desk upon my loyalty to the company and its products.

The consumer goods industry has always been in the vanguard of customer research, so Ralston-Purina’s efforts regarding its help desk shouldn’t surprise us. However, customer support organizations in companies across various industries are also engaged in similar customer research efforts.

Loyalty is a Behavior

Before talking about customer loyalty programs and “customer sat” surveys, we need to answer two questions. First, what is loyalty? Second, why should we, as support service managers, care about loyalty?

To answer these questions, let’s look at the support business process. First, the customer contacts our organization to learn how to get the intended value out of some product. Then, through the service encounter, the question or issue is addressed, leading to two outcomes, a technical outcome and a behavioral outcome. The technical outcome relates to the extent to which the problem was resolved. The behavioral outcome will be driven by how the customer feels towards the company as a result of the interaction.

Customers enter the service encounter with certain expectations, and the comparison of those expectations to their perception of our performance equates to some level of satisfaction, which in turn leads to some set of behaviors. Hopefully, loyalty towards our company, rather than mere satisfaction or – even worse — dissatisfaction, is the behavioral result. As service managers, we all wear two hats, operations and marketing, making both the technical and behavioral outcomes key to understanding the effectiveness of our support organization.

You might ask why the distinction between loyalty and satisfaction is so important. Various research studies have shown that customer retention and loyalty drive profitability – from both the revenue and cost sides of the equation. These studies have shown how much more it costs to attract new customers than to keep current ones – and how much more loyal customers are likely to buy from us.

There’s also a less obvious impact upon revenue. When customers more highly value a company’s products and services, they see a cost to switching vendors. Not only does this retain the customer, but this enhanced loyalty allows us to practice value-based pricing, which is a technique to maximize profitability by setting prices to “harvest” the value the customer perceives. Customers who are merely satisfied don’t feel the value and see little cost to switching vendors.

How, then, do we know if our customers are loyal and how can we increase their loyalty? That’s the aim of a customer loyalty program. A customer loyalty program is a portfolio of research techniques and action programs designed to assess customers’ attitudes towards our organization or company and to take action to improve their opinion. Both quantitative and qualitative research efforts are needed to capture the full breadth of information needed to assess and improve.

A well-balanced research portfolio allows an organization to

  • Listen Actively. Customer research should be performed continuously to ensure a consistent quality of service delivery.
  • Listen Broadly to the entire customer base. That’s the role of surveys. Surveys are also very good at identifying specific problem areas for in-depth review and at providing an overview of our relationships with customers. We’ll focus on surveys in this article.
  • Listen Deeply. Focus groups and personal interviews, perhaps as a follow-up to a survey, allow us to understand the “why” behind the attitudes that surface through broad-based surveys. These research techniques generate the granular data that provides solid footing for improvement efforts.
  • Listen to Extremes. Comments from those who are highly pleased or displeased may identify strengths to be duplicated or fail points to be corrected. We should actively, not just passively, encourage people to complain – and to compliment.

Mass-administered surveys, which can gauge average satisfaction levels along multiple service dimensions, are the heart of a system to listen broadly to customer feedback. These surveys give us benchmarks of performance, and they may direct us towards weak points in the process.

Companies use various tools to administer these surveys. The Customer Services group in Aetna Insurance’s Retirement Services division conducts surveys through a VRU-based (voice response unit) product immediately after a service transaction is completed. This AutoSurvey system from Teknekron can direct an email or fax to the appropriate person in the company whenever a survey response falls below some threshold, allowing for an immediate callback to the customer. “That instantaneous ability to capture a mistake and fix it is worth its weight in gold,” according to Dick Boyle, Vice President of Customer Services.
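As a rough illustration of that threshold-alert idea (this is a sketch only; the field names, threshold, and notify routine are hypothetical, not the actual AutoSurvey interface):

```python
# Sketch of a threshold-triggered service-recovery alert.
ALERT_THRESHOLD = 6  # assumed cutoff on a 0-10 response scale

def notify(owner: str, response: dict) -> None:
    # Stand-in for whatever channel the organization uses (email, fax, ticket).
    print(f"Alert {owner}: score {response['score']} from {response['customer']}")

def route_low_scores(survey_responses: list, owner_by_region: dict) -> None:
    """Flag any response below the threshold for an immediate callback."""
    for r in survey_responses:
        if r["score"] < ALERT_THRESHOLD:
            owner = owner_by_region.get(r["region"], "service-recovery-team")
            notify(owner, r)

route_low_scores(
    [{"customer": "Acme Corp", "score": 3, "region": "Northeast"}],
    {"Northeast": "ne.service.manager@example.com"},
)
```

The design point is the routing: a low score goes straight to the person who can call the customer back, not into a monthly report.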

America Online uses Decisive Technology’s internet-based survey services to take a constant pulse of customers’ reactions to support quality after support incidents are closed. While the scalability and affordability of the system were important, the ability to provide rapid feedback on key performance measurements has allowed AOL to improve its business performance.

Speedware, the Canadian OLAP software publisher, conducts telephone surveys for key performance measurements shortly after the service transaction is closed, asking how the customer’s perceptions match their expectations on a 1-to-10 scale. Maria Anzini, Worldwide Director of Support Services for Speedware, analyzes her survey database with Visionize’s Customer Service Analyzer. Last year, the data showed a drop in first-call resolution and customer satisfaction in a very technical skill area for European customers. This hard evidence demonstrated the need for additional staff in that targeted area.

“It’s our thermometer,” says Anzini, who also uses the survey data to manage her service agents. The results are part of a 360-degree performance review process. Each agent sees a chart showing his performance against the average agent performance. The surveys have also provided valuable data for recruiting and training practices, which is very important given the cultural differences across her international user base all serviced out of Montreal headquarters.  Anzini has identified a profile of the best agent for each product and for each geography. “I have to pay attention to how they [the international customers] want to be serviced.” For example, since British customers are more demanding, she hires agents who “maintain the charm but get to the point.”

But the personal interaction afforded by the telephone survey method allows Speedware to capture richer data. They encourage respondents to explain the basis for their answers, and these quotes are vital pieces of information. “That’s where the juice is,” stated Anzini. The ability to probe deeply to get to the “why” behind customer feelings yields the kind of actionable data that is essential for a re-engineering effort properly directed at root causes.

Dick Boyle of Aetna also now supplements his 5-question surveys with comments at the end. He found that many respondents were giving high marks on the separate elements of support quality (courtesy, knowledge, effectiveness, for example), only to give a mediocre overall rating. “That led me to believe that something about the experience was not being captured in the questions, and we were missing the boat.”

This example also shows the importance of properly designed research instruments. Only well-designed instruments will provide valid information for business decisions, and there are many easy mistakes that can be made when designing survey questions.

Survey Design & Rich Data Analysis

Perhaps surprisingly, one key criterion of a good instrument is that it highlights distinctions among respondents.  Ralston-Purina’s call center performed a mail survey a few years back, and the results indicated that all aspects of service were uniformly outstanding. While this may seem good, the results – data with no variance – didn’t allow them to identify what areas needed improving. The cause was the questionnaire design, which masked true underlying differences in the perception of performance. That instrument posed questions to respondents asking for answers on a 5-point scale with agree-disagree anchors. This design frequently led customers to say early in the survey, “It’s a 5 for all of the questions.”
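One way to catch that “data with no variance” problem early is to look at the spread of each item’s responses. A minimal sketch, assuming the raw item ratings are available as a table (the item names and values here are hypothetical):

```python
import pandas as pd

# Hypothetical item-level ratings on a 5-point agree-disagree scale.
ratings = pd.DataFrame({
    "courtesy":   [5, 5, 5, 5, 5, 5],
    "knowledge":  [5, 5, 4, 5, 5, 5],
    "resolution": [5, 3, 4, 2, 5, 4],
})

# Items with near-zero standard deviation can't distinguish respondents,
# which is the "uniformly outstanding" result that hides real differences.
spread = ratings.std().sort_values()
print(spread[spread < 0.5])  # candidates for rewording or a different scale
```

Items that everyone answers the same way are candidates for rewording or for a different response scale, which is exactly the path the researchers took here.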

Tom Krammer and Rik Nemanick, professors at St. Louis University who are conducting the latest research program, decided to move away from this scale. They also stopped asking questions about specific call center agents, since they found that in Ralston-Purina’s market context customers were concerned about getting agents in trouble. They now ask for responses on a scale where the anchors are performance relative to expectations and the midpoint is where expectations are just met. That change revealed significant distinctions, since responses were now spread across the full range of the scale rather than clumped at one end.

Professors Krammer and Nemanick also thought about the analysis they wanted to perform before designing the survey instrument. They wanted to do more than just benchmark performance and point to areas of concern. Their comprehensive survey research design allowed them to perform statistical tests to demonstrate cause-and-effect relationships. This richer analysis let them target improvement efforts precisely. Ralston-Purina is using the statistical analysis to understand the specific impact of different problem-resolution tactics on customer behavior, such as future purchase intentions.

“All complaints are not created equal,” according to Ken Dean, Director of Quality Systems & Resources. A cat that can’t hold down its new food requires a different response than a customer who gets a bag of broken dog biscuits. The impact of different tactics, e.g., a free bag of food versus discounts, is measured in the survey. They even use “non-product-oriented resolution tactics” (for example, sending flowers or a coupon for a restaurant dinner) and they are developing a table to optimize the service recovery, comparing the tactic to the resulting satisfaction.
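A rough sketch of how such a tactic-versus-outcome table might be assembled from survey records; the tactic names and outcome columns are illustrative, not Ralston-Purina’s actual data:

```python
import pandas as pd

# Hypothetical post-recovery survey records.
recoveries = pd.DataFrame({
    "tactic": ["free_bag", "discount", "flowers",
               "free_bag", "discount", "flowers"],
    "satisfaction": [8, 6, 9, 7, 5, 9],       # post-recovery rating
    "will_repurchase": [1, 0, 1, 1, 0, 1],    # stated intent, 1 = yes
})

# First cut at the tactic-vs-outcome table: average result for each tactic.
summary = recoveries.groupby("tactic")[["satisfaction", "will_repurchase"]].mean()
print(summary.sort_values("satisfaction", ascending=False))
```

With enough responses per tactic, the same table can be extended with significance tests to show which differences are real rather than noise.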

Service Plus [formerly ServiceWare], the Canadian service management system vendor, is also using surveys to bring hard evidence to business decisions. Gary Schultz, Vice President of International Technical Support, recently started using SurveyTracker to conduct surveys by email. “Up until we used the survey tools, our data was soft – opinion based.” For example, sales and marketing were pushing Schultz to make his support organization a 24 x 7 operation since all the benchmark companies, e.g., Corel and Microsoft, were doing it. A quick survey found that 98% of the customer base would not find it beneficial since they did not operate around the clock. “It would have been a value-added service with no added value… The survey data allowed us to make the right decision about that… and we did it [the survey] in 4 to 5 days.”

More Benefits from Active Listening

While attitude assessment and targeting improvement activities are the primary objectives of a customer loyalty program for a support organization, other objectives, which affect customer loyalty, can be addressed.

First, relationship marketing objectives can be met in part by a customer loyalty program. The very process of soliciting customers’ perceptions about the quality of the service encounter alters those perceptions. Empathy is important to a customer’s overall impression of service quality, and a well designed and executed program shows the customer you care about his opinion.

Second, customer education can be advanced through the customer interaction. Nancy McNeal of New England Business Systems (NEBS) conducted a user survey for her help desk. Aside from the “fun [of] interacting with people” from all over the company, “we became acknowledged as a corporate help desk rather than just an IT function.” NEBS employees previously had not realized the extent of the services the help desk performed in supporting their activities.

In a similar vein, Aetna Retirement Services educates some key customers by bringing them into the call center and letting them listen to calls. This “gives them an appreciation for what we do,” according to Dick Boyle. “They actually do sit down and listen to calls. We then debrief with them and ask them what we could do better.”

When Speedware conducts its telephone surveys, they sometimes find that customer expectations are unrealistic since they don’t reflect the terms of the contract. “We reset their expectations… We take advantage of the survey [interaction] to educate the customer,” according to Maria Anzini.

Third, the objectives of your loyalty program don’t have to be confined to perceptions about service just because you’re in a support service organization. Loyalty is determined by the weakest link in the value-added chain. As the last link, customer support is in the ideal position to assess the loyalty quotient for the company’s entire value-added chain.

At ServiceWare, this role is explicitly recognized. “The customer relationship belongs to me,” according to Gary Schultz. “I’m responsible for the customer just after the sales cycle, that is, the long term life cycle of the customer… The issue of referenceability is my ultimate responsibility and mandate.” Gary’s group uses his email surveys to “qualify and quantify exposure areas” in both the service and software products.

Speedware’s Anzini also uses the survey data to help those outside her organization. If a sales representative is going to a customer site, she prints out the results of the surveys along with other operational data. This properly prepares the sales representative and is “part of the customer-centric loop” practiced at Speedware.

Aetna’s Customer Services group is using its survey tool to “provide high quality competitive market intelligence on our products, our customer, our trends, problems or strengths to the rest of the company.” Through this, Dick Boyle intends to turn his “customer service function into one of the most important pieces of strategic thinking in the firm.”

The Keys to Good Listening

Notice the elements found across these programs.

First, these companies listen broadly to their base of customers, compiling scientifically valid statistics of support organization performance.

Second, they actively listen by touching customers soon after a transaction.

Third, they listen deeply, trying to get to the granular data that supports improvement activities.

While all of this makes eminent sense, there is an underlying fear. As Aetna’s Dick Boyle says, “How much of an appetite does the customer have to want to do this?” Knowledge workers are increasingly overwhelmed with information demands. If everyone tries to get their customers to talk, will we kill the golden goose?

~ ~ ~

For you animal lovers, I am happy to report that Murphy is fine. Despite exhaustive tests at Tufts Veterinary Hospital, which did not please him in the least, the cause of Murphy’s elevated calcium level has not been isolated, but he shows no adverse effects from the calcium. And, in fact, the calcium levels have returned to normal.

He is still an active lord of his manor.