This article presents findings from a research study that examined the impact of mixed-mode survey administration on the scores customers provide. The data come from a large, well-known business-to-business technology company that runs an ongoing survey program asking its US customers about its service delivery. The company delivers the identical survey instrument both by telephone interview and by email invitation to a web-form survey.
While there’s much anecdotal evidence and “common knowledge” that survey mode matters, we had not seen research using data from an actual business environment. This is not a perfect, controlled experiment, but the statistical findings are so strong that we’re confident the research findings matter on two fronts for companies that conduct customer surveys, especially so-called “NPS surveys.”
- The administration mode can dramatically change the scoring respondents provide.
- The “net scoring” statistic used to calculate the Net Promoter Score can cause dramatic swings in the NPS across survey administrations, swings that likely overstate the actual changes in respondents’ views.
Why Mixed-Mode Surveys Matter
While surveys are a very useful indicator of customer views, survey data have achieved a level of unquestioned acceptance that is, frankly, scary. Far too many people believe that survey results are the “truth” about how customers — or any other group being surveyed — view their company. Yet, survey research results, like all research, contain many sources of error. These errors will be exacerbated if the survey program is done by someone who is unaware of the challenges to performing good, valid, reliable research.
Survey error can result from a host of factors. In this article we’ll examine error from the survey administration mode chosen. We can conduct surveys many different ways, and this research examined two very common modes:
- By telephone,
- By web-form, linked from an email invitation.
The administration mode can affect the scores provided by respondents in two ways:
- Whether someone responds
- How they respond.
Impact of the Net Scoring Statistic
If you’re reading this article, it’s probably because of the “hook” in the title — Net Promoter Score. Elsewhere, we’ve critiqued the Net Promoter logic, but here we’ll address the statistical process behind the net scoring statistic that Fred Reichheld created.
Net scoring, like the mean, is a single statistic to summarize a data set. The logic of net scoring is to take the percentage of respondents at the top end of the response scale, so-called “top box,” and subtract from it the percentage of respondents at the lower end of the response scale, so-called “bottom box.” Net scoring thus arrives at a single number, which is expressed as a percentage that can range from +100% to -100%. While this statistic can be calculated for any ordinal or interval-rating survey question, Reichheld applied it to the recommendation question because of the predictive power he found for that question.
Reichheld defined the “top box” as those providing scores of 9 or 10 on his 0-to-10 scale, which he labeled as “Promoters,” and the “bottom box” as those providing scores of 0 to 6, which he labeled as “Detractors.” Those providing 7s and 8s were labeled as “Neutral” or “Passives.” Thus the net score is %(9s+10s) – %(0s to 6s). While the NPS is typically expressed as a percentage, it truly is not a percentage.
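The net scoring arithmetic above is simple enough to sketch in a few lines of Python. This is an illustrative sketch, not the company’s code; the function name and sample scores are invented:

```python
def net_promoter_score(scores):
    """Net score on a 0-to-10 scale: % Promoters (9-10) minus % Detractors (0-6)."""
    n = len(scores)
    promoters = sum(1 for s in scores if s >= 9)    # top box
    detractors = sum(1 for s in scores if s <= 6)   # bottom box
    return 100.0 * (promoters - detractors) / n

# Ten hypothetical respondents: six Promoters, two Passives, two Detractors
sample = [10, 10, 9, 9, 9, 9, 8, 7, 6, 5]
print(net_promoter_score(sample))  # 40.0
```

Note that the 7s and 8s simply vanish from the numerator, which is why the statistic ranges from +100 to -100 and, despite the percent sign usually attached to it, is not truly a percentage.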
The net scoring logic means we have two threshold points in the scale that change the classification of the respondent.
- The change from 6 to 7 (and vice versa) – Detractor to Neutral
- The change from 8 to 9 (and vice versa) – Neutral to Promoter
These threshold points can cause dramatic swings in the resulting scores as we are about to see.
The Research Site
The company that provided us access to their data is a well-known B2B company that conducts transactional surveys covering its 750,000 service transactions each year. We were given access to one month of data.
The company uses a simple three-question survey regardless of how the survey is conducted.
- Assuming you were allowed to do so, how likely would you be to recommend XXX to colleagues within your organization or to other organizations?
- Please rate your overall satisfaction with XXX as a [product type] service provider.
- Overall, how satisfied were you with this most recent service visit?
Those customers for whom this company has an email address are sent email invitations with a link to the web-form survey. If they do not have an email address on file, then an independent, third-party professional surveying organization calls them to take the survey.
This study is not a perfect controlled experiment. To be such, customers would have been chosen at random to be surveyed either by telephone or by email invitation. That was not the case here. So, there is the possibility that the customers from whom the company has collected email addresses are structurally different from customers for whom it does not have an email address.
As will be shown, the differences in the scores between survey modes are so distinct that it is unlikely this factor alone could explain them. The reality is that few companies are willing to conduct true experiments, so even this compromised experiment is valuable.
Data Analysis and Findings
We analyzed the data for all three questions and got similar results. So, here we will focus on the so-called Net Promoter question, posed on a 0-to-10 scale.
The nearby chart shows the results of our analysis. It displays the frequency distribution of survey responses for the Recommendation Question, broken out by survey mode. That is, it shows the percent of respondents who gave the score of 10, 9, 8 and so on. The phone survey data are in blue while the email web-form survey is in purple.
The differences in the distributions are dramatic, particularly at the top end of the distribution. While 54% of phone respondents gave scores of 10, “only” 27% of email/web-form respondents gave 10s — a 2:1 ratio! Yes, that’s right.
For every other point on the scale, web-form respondents had higher frequencies. In the bottom half of the distribution, almost no phone respondents gave scores, while some email respondents did. Simply put, the phone survey mode garnered far more scores in the top response option.
The difference between survey modes is also highlighted in the mean score for each, calculated assuming interval properties for the data, as shown in the figure and listed here:
- Telephone Mean Score: 8.79
- Web-form Mean Score: 7.44
We won’t go into the test statistics in depth here. If you are statistically inclined and wish to see the Chi-Square analysis we performed on the data, please read the full report, written for an academic journal.
We’ll summarize with this. The probability is minuscule that the differences seen between telephone and web-form responses are just due to some fluke in who took each version of the survey — what’s known as sampling error. (For those who remember some statistics from college, we got a p value of 5.09E-16. That’s 15 zeros to the right of the decimal point before the first significant digit, a highly significant difference.) The differences seen are most certainly due to the impact of survey administration mode.
Survey mode matters! It can dramatically affect the scores we get from respondents!
So What’s This Got To Do With Net Promoter Scores?
Reichheld has pushed the idea of using the net scoring statistic rather than the mean to describe what the survey data are telling us. He believes this statistic helps drive operational implementation of the survey findings in part because it provides focus on respondents at the low end of the scale.
Here, we see even more dramatic differences between the survey modes. Look at the nearby table and the previous chart. The difference in the mean score from phone (8.79) to web-form (7.44) is certainly enough to pique most any manager’s interest, but a difference in the Net Scoring for the Recommendation question from 58.1 to 18.0 is of cardiac-arrest proportions.
Earlier we mentioned the “threshold effects” caused by the net scoring statistic. This is what we mean. Relatively small shifts in response patterns get magnified by the net scoring statistic.
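A small hypothetical sketch in Python makes the magnification concrete: shifting just ten of one hundred respondents from a 9 to an 8 barely moves the mean, yet knocks ten full points off the net score. The distributions below are invented for illustration, not drawn from the study data:

```python
def net_score(scores):
    """Top-box (9-10) percentage minus bottom-box (0-6) percentage."""
    n = len(scores)
    top = sum(1 for s in scores if s >= 9)
    bottom = sum(1 for s in scores if s <= 6)
    return 100.0 * (top - bottom) / n

def mean(scores):
    return sum(scores) / len(scores)

before = [9] * 50 + [8] * 30 + [6] * 20   # 100 hypothetical respondents
after  = [9] * 40 + [8] * 40 + [6] * 20   # ten respondents slip from 9 to 8

print(mean(before), net_score(before))  # 8.1 30.0
print(mean(after),  net_score(after))   # 8.0 20.0
```

A 0.1-point drop in the mean becomes a 10-point drop in the net score, because every respondent who crosses the 8/9 threshold is reclassified from Promoter to Passive.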
The reason the company let us analyze the data is that they have been moving slowly to more web-form surveys to reduce cost, but they found their NPS falling. Note the third row in the nearby table. The overall combined NPS was 54.7% because far more phone surveys than web-form surveys were conducted. If 10% of the surveys were to shift from phone to web-form, all else held equal, the NPS would drop from 54.7% to 50.7%, purely due to the composition shift of the mixed-mode administration!
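The arithmetic behind that composition shift can be reproduced from the figures above. This Python sketch back-solves the implied phone share from the blended NPS and then applies the hypothetical 10-point volume shift; only the 58.1, 18.0, and 54.7 figures come from the study:

```python
phone_nps, web_nps = 58.1, 18.0   # per-mode net scores from the study
combined = 54.7                   # observed blended NPS

# Implied share of phone surveys in the blend, from the weighted average:
# combined = share * phone_nps + (1 - share) * web_nps
phone_share = (combined - web_nps) / (phone_nps - web_nps)
print(round(phone_share, 3))      # 0.915, i.e. roughly 9 in 10 surveys by phone

# Shift 10 percentage points of volume from phone to web-form
new_share = phone_share - 0.10
new_nps = new_share * phone_nps + (1 - new_share) * web_nps
print(round(new_nps, 1))          # 50.7
```

The four-point drop happens with no respondent changing their opinion at all; only the channel mix moved.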
The results provide strong evidence that survey mode can dramatically change survey scores. Our analysis shows that telephone surveys provide much higher scores. This finding has several practical implications.
- A mixed-mode survey program can create huge swings in trend lines if the mix of data from each mode changes. It is best to use one mode only, keep the mix across modes consistent, or analyze the data by mode without merging it. Developing adjustment factors would be a huge undertaking.
- While survey professionals know that comparing data across companies is highly dubious when the surveying practices are not identical, it is commonly done by those who do not understand the danger. This research highlights why. When comparing one’s own Net Promoter Score, or any other survey question, to the scores from other companies, many confounding factors are in play, and drawing conclusions from such comparisons should not be done.
- The net scoring logic could create panic or joy that does not reflect true changes in how customers feel towards a company. Net scoring has significant threshold effects, and a small shift in the distribution of scores can have a dramatic impact on the resulting “net score.” Those who advocate use of this statistic say those shifts can create a call to action, but the NPS threshold effects may create a lot of false positive calls to action.
Want To Test Some Aspect of a Survey Program?
In addition to mode variation, numerous other factors can affect survey results even when customer views themselves have not changed, for example:
- Question wording
- Scale design
- Positioning of questions (start of survey or end)
- Geography (impact of different cultures)
- Survey length
- Use of incentives and reminders
Understanding the impact of surveying practices with real-life data is important, so we have an offer for everyone. If you have data from a controlled “experiment” such as the one discussed here, we are ready to evaluate your data and determine the impact. Or if you’re willing to run an experiment with your survey program, let us know. All we ask is permission to write about the results.
Note: NPS and Net Promoter Score are trademarks of Satmetrix Systems, Inc., Bain & Company, and Fred Reichheld.