Mixed-Mode Surveys: Impact on NPS and Survey Results

Summary: Survey data are affected by all aspects of the survey process. This article examines the impact of mixed-mode surveys (telephone and web-form) using actual data from a B2B company’s transactional survey.

Telephone surveys garner higher scores than the identical web-form survey, a difference we attribute to a scale-truncation effect. The differences between survey administration modes are amplified by the threshold effects in the “net scoring” statistic.

Consumers of survey data, especially when doing cross-company comparisons, should understand the impact upon survey data resulting from differences in questionnaire design and survey administration practices before making decisions based on the survey data.

~ ~ ~

Dick Novitsky, Regional Service Operations Manager for a large technology firm, got a call from his VP. “Congratulations on your Net Promoter Score for this year. 59% shows good solid performance. I’ll be supporting your promotion.” Another regional manager, Nick Dovitsky, got a call from his VP. “What the heck is going on here??!! Your Net Promoter Score is 18%. We can’t tolerate such performance. You’d better get crackin’ — or update your resume.”

In actuality, Dick and Nick are the same person. The data reported above were from the same year’s survey of the same customer base. Those two Net Promoter Scores (NPS) are true and accurate. The same survey questionnaire was used in both cases. The difference:

The 59% NPS was from telephone administration of the survey.
The 18% NPS was from web-form administration of the survey.

This article presents findings from a research study that examined the impact of mixed-mode survey administration upon the scores provided by customers. The data come from a large, well-known business-to-business technology company that conducts ongoing survey programs of its US customers regarding its service delivery. The company delivers the identical survey instrument both by telephone interview and by email invitation to a web-form survey.

While there’s much anecdotal evidence and “common knowledge” that survey mode matters, we had not seen research using data from an actual business environment. This is not a perfect, controlled experiment, but the statistical findings are so strong that we’re confident they matter to companies that conduct customer surveys, especially so-called “NPS surveys,” on two issues:

The administration mode can dramatically change the scoring respondents provide.
The “net scoring” statistic used to calculate the Net Promoter Score can cause dramatic swings in the NPS across survey administrations, swings that likely overstate the actual changes in respondents’ views.

Why Mixed-Mode Surveys Matter

While surveys are a very useful indicator of customer views, survey data have achieved a level of unquestioned acceptance that is, frankly, scary. Far too many people believe that survey results are the “truth” about how customers — or any other group being surveyed — view their company. Yet, survey research results, like all research, contain many sources of error. These errors will be exacerbated if the survey program is done by someone who is unaware of the challenges to performing good, valid, reliable research.

Survey error can result from a host of factors. In this article we’ll examine error from the survey administration mode chosen. We can conduct surveys many different ways, and this research examined two very common modes:

By telephone,
By web-form, reached through a link in an email invitation.

The administration mode can affect the scores provided by respondents in two ways:

Whether someone responds
How they respond.

Telephone surveys are favored by many since they tend to get a higher response rate due to the more active solicitation of respondents. The higher response rate likely reduces non-response bias, which is the distortion in the sample statistics that arises when invited respondents who choose not to respond hold perceptions that differ structurally from those who do respond. The lower the response rate, the greater the likelihood that some non-response bias is in the data.

However, survey mode also produces what researchers call a composition effect, that is, a change in who composes the respondent pool. People with certain demographic profiles, such as age, are more or less likely to take a survey in a given administration mode, thus creating a bias in the survey results. For example, telephone surveys are more likely to be taken by an older demographic group.

But that’s not all. Survey administration mode also affects how people respond, which researchers call measurement error. How?

Perhaps the interviewers don’t all present the questions with the same intonation and prompts. That creates an interviewer bias in the data set.
Even the presence of an interviewer is known to garner more positive responses since respondents shy away from saying negative things to live people, especially when using strength-of-agreement scales. This is known as a social desirability effect.
The communication exchange differs between telephone and web-form surveys, which affects responses. Web-form surveys visually present the entire scale to the respondent with verbal and numerical descriptors. In contrast, in phone surveys the respondent is typically read only the endpoints of the scale: “On a scale from 1 to 10 where 1 represents… and 10 represents…, how would you rate… ?” This presentation is more likely to lead the respondent to choose the endpoints of the scale that were just orally presented. While researchers typically call this a primacy and recency effect (the tendency to pick the first or last options), we feel the more accurate description here is a scale-truncation effect. Since the respondent hears only two points of the scale, they are more likely to choose those responses. In essence, the 10-point scale becomes a binary 2-point scale.

Impact of the Net Scoring Statistic

If you’re reading this article, it’s probably because of the “hook” in the title — Net Promoter Score. Elsewhere, we’ve critiqued the Net Promoter logic, but here we’ll address the statistical process behind the net scoring statistic that Fred Reichheld created.

Net scoring, like the mean, is a single statistic to summarize a data set. The logic of net scoring is to take the percentage of respondents at the top end of the response scale, so-called “top box,” and subtract from it the percentage of respondents at the lower end of the response scale, so-called “bottom box.” Net scoring thus arrives at a single number, which is expressed as a percentage that can range from +100% to -100%. While this statistic can be calculated for any ordinal or interval-rating survey question, Reichheld applied it to the recommendation question because of the predictive power he found for that question.

Reichheld defined the “top box” as those providing scores of 9 or 10 on his 0-to-10 scale, which he labeled as “Promoters,” and the “bottom box” as those providing scores of 0 to 6, which he labeled as “Detractors.” Those providing 7s and 8s were labeled as “Neutral” or “Passives.” Thus the net score is %(9s+10s) – %(0s to 6s). While the NPS is typically expressed as a percentage, it is not truly a percentage of anything; it is the difference between two percentages.
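For readers who want to see the arithmetic, here is a minimal sketch of the net scoring calculation described above, written in Python. The function name and the sample responses are our own illustration, not data from the study.

```python
def net_promoter_score(scores):
    """Return the net score, in percentage points, for 0-to-10 ratings."""
    if not scores:
        raise ValueError("scores must not be empty")
    promoters = sum(1 for s in scores if s >= 9)   # top box: 9s and 10s
    detractors = sum(1 for s in scores if s <= 6)  # bottom box: 0s through 6s
    # 7s and 8s (Passives) count only in the denominator
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical responses for illustration only, not data from the study
sample = [10, 10, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(sample))  # 5 promoters, 3 detractors, 10 responses
```

Note that the result can range from -100 (all Detractors) to +100 (all Promoters), and that the Passives influence the score only by diluting it.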

The net scoring logic means we have two threshold points in the scale that change the classification of the respondent.

The change from 6 to 7 (and vice versa) – Detractor to Neutral
The change from 8 to 9 (and vice versa) – Neutral to Promoter

These threshold points can cause dramatic swings in the resulting scores as we are about to see.
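To make the threshold effect concrete, the sketch below compares the mean and the net score for two response sets. The 100 responses are hypothetical and chosen by us purely for illustration: ten respondents slip from 9 to 8 and ten slip from 7 to 6, each crossing one of the two thresholds.

```python
def net_score(scores):
    """Percent scoring 9-10 minus percent scoring 0-6."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


def mean(scores):
    return sum(scores) / len(scores)


# Hypothetical data: 100 respondents before and after a small shift
before = [9] * 40 + [8] * 30 + [7] * 20 + [6] * 10
after = [9] * 30 + [8] * 40 + [7] * 10 + [6] * 20  # 20 people moved down one point

mean_shift = mean(after) - mean(before)
net_shift = net_score(after) - net_score(before)

print(f"mean: {mean(before):.2f} -> {mean(after):.2f} (shift {mean_shift:+.2f})")
print(f"net score: {net_score(before):.1f} -> {net_score(after):.1f} (shift {net_shift:+.1f})")
```

In this made-up case one fifth of respondents move by a single scale point, so the mean falls by only 0.2, yet the net score falls by 20 points because every one of those moves crosses a threshold.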

The Research Site

The company that provided us access to their data is a well-known B2B company that conducts transactional surveys for its 750,000 service transactions each year. We got access to one month of data.

The company uses a simple three-question survey regardless of how the survey is conducted.

Assuming you were allowed to do so, how likely would you be to recommend XXX to colleagues within your organization or to other organizations?
Please rate your overall satisfaction with XXX as a [product type] service provider.
Overall, how satisfied were you with this most recent service visit?

Those customers for whom this company has an email address are sent email invitations with a link to the web-form survey. If the company does not have an email address on file for a customer, then an independent, third-party professional surveying organization calls that customer to take the survey.

This study is not a perfect controlled experiment. To be such, customers would have been chosen at random to be surveyed either by telephone or by email invitation. That was not the case here. So, there is the possibility that the customers from whom the company has collected email addresses are structurally different from customers for whom it does not have an email address.

As will be shown, the differences in the scores between survey modes are so distinct that it is unlikely this factor alone could explain them. The reality is that few companies are willing to conduct true experiments, so even this compromised experiment is valuable.

Data Analysis and Findings

We analyzed the data for all three questions and got similar results. So, here we will focus on the so-called Net Promoter question, posed on a 0-to-10 scale.

The nearby chart shows the results of our analysis. It displays the frequency distribution of survey responses for the Recommendation Question, broken out by survey mode. That is, it shows the percent of respondents who gave the score of 10, 9, 8 and so on. The phone survey data are in blue while the email web-form survey is in purple.

The differences in the distributions are dramatic, particularly at the top end of the distribution. While 54% of phone respondents gave scores of 10, “only” 27% of email/web-form respondents gave 10s — a 2:1 ratio! Yes, that’s right.

Frequency Distribution for Mixed-Mode Surveys: Phone vs. Web-form, Recommendation Question

For every other point on the scale, those who responded via the web-form had higher frequencies. In the bottom half of the distribution, almost no phone respondents gave scores while there were some scores given here by email respondents. Simply put, the phone survey method garnered far more scores in the top response option.

The differences between survey modes were also highlighted in the mean score for each, assuming interval properties for the data, as shown in the figure and listed here:

Telephone Mean Score: 8.79
Web-form Mean Score: 7.44

We won’t go into the test statistics in depth here. If you are statistically inclined and wish to see the Chi-Square analysis we performed on the data, please read the full report, written for an academic journal.

We’ll summarize with this. The probability is minuscule that the differences seen between telephone and web-form responses are just due to some fluke in who took each version of the survey, which is known as sampling error. (For those who remember some statistics from college, we got a p value of 5.09 E-16. That’s 15 zeros to the right of the decimal point before the first nonzero digit, a highly significant difference.) The differences seen are most certainly due to the impact of survey administration mode.
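For readers curious about what a Chi-Square test of this kind involves, here is a simplified sketch. It is not the analysis from the full report: we collapse the scale into three groups (Detractors, Passives, Promoters) so that the test has 2 degrees of freedom, where the p value has a simple closed form, and the counts are hypothetical numbers we made up for illustration.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if mode and score group were unrelated
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_dof2(stat):
    """Upper-tail probability for a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2)


# HYPOTHETICAL counts of (Detractors, Passives, Promoters), not the study's data
phone = [50, 200, 750]
web = [60, 90, 150]

stat, dof = chi_square_independence([phone, web])
print(f"chi-square = {stat:.1f}, dof = {dof}, p = {p_value_dof2(stat):.3g}")
```

The closed form used in `p_value_dof2` holds only for 2 degrees of freedom; a test on the full 0-to-10 distribution would need a statistics library or tables.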

Survey mode matters! It can dramatically affect the scores we get from respondents!

So What’s This Got To Do With Net Promoter Scores?

Reichheld has pushed the idea of using the net scoring statistic rather than the mean to describe what the survey data are telling us. He believes this statistic helps drive operational implementation of the survey findings in part because it provides focus on respondents at the low end of the scale.

Here, we see even more dramatic differences between the survey modes. Look at the nearby table and the previous chart. The difference in the mean score from phone (8.79) to web-form (7.44) is certainly enough to pique most any manager’s interest, but a difference in the Net Scoring for the Recommendation question from 58.1 to 18.0 is of cardiac-arrest proportions.

Earlier we mentioned the “threshold effects” caused by the net scoring statistic. This is what we mean. Relatively small shifts in response patterns get magnified by the net scoring statistic.

The company let us analyze the data because they are moving slowly to more web-form surveys to reduce cost, but they found their NPSs falling. Note the third row in the nearby table. The overall combined NPS was 54.7% because there were far more phone surveys than web-form surveys conducted. If 10% of the surveys were to shift from phone to web-form, all else held equal, the NPS would drop from 54.7% to 50.7%, purely due to the composition shift of the mixed-mode administration!
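The mix-shift arithmetic can be checked with a short sketch. It relies on the fact that a combined net score is the response-weighted average of the per-mode net scores. We assume here that "10% of the surveys" means 10 percentage points of all completed surveys; the phone share is not stated in the article and is backed out from the three reported scores.

```python
PHONE_NPS = 58.1     # reported in the article
WEB_NPS = 18.0       # reported in the article
COMBINED_NPS = 54.7  # reported in the article


def blended_nps(phone_share):
    """Combined net score when phone_share of responses come by phone."""
    return phone_share * PHONE_NPS + (1 - phone_share) * WEB_NPS


# Phone share implied by the reported combined score (an inference, not a reported figure)
phone_share = (COMBINED_NPS - WEB_NPS) / (PHONE_NPS - WEB_NPS)

# Assumption: shift 10 percentage points of all surveys from phone to web-form
shifted = blended_nps(phone_share - 0.10)

print(f"implied phone share: {phone_share:.1%}")
print(f"combined NPS after the shift: {shifted:.1f}")
```

Each 10-point shift in the mix costs one tenth of the gap between the two modes, about 4 points, without any customer changing their opinion.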

Top Box, Bottom Box and Net Promoter Scores for Recommendation Question By Survey Mode

Conclusions

The results provide strong evidence that survey mode can dramatically change survey scores. Our analysis shows that telephone surveys provide much higher scores. This finding has several practical implications.

A mixed-mode survey program can create huge swings in trend lines if the mix of data from each mode changes. It is best to use one mode only, keep the mix across modes consistent, or analyze the data by mode without merging it. Developing adjustment factors would be a huge undertaking.
While it is known among survey professionals that comparing data across companies is highly dubious where the surveying practices are not identical, it is commonly done by those who do not understand the danger. This research highlights why it is dangerous. When comparing one’s own Net Promoter Score — or any other survey question — to the scores from other companies, many, many confounding factors are in play. Drawing conclusions from these comparisons is dangerous and should not be done.
The net scoring logic could create panic or joy that does not reflect true changes in how customers feel towards a company. Net scoring has significant threshold effects, and a small shift in the distribution of scores can have a dramatic impact on the resulting “net score.” Those who advocate use of this statistic say those shifts can create a call to action, but the NPS threshold effects may create a lot of false positive calls to action.

Want To Test Some Aspect of a Survey Program?

In addition to mode variation, numerous other factors can affect survey results even when nothing else has changed, for example:

Question wording
Scale design
Positioning of questions (start of survey or end)
Geography (impact of different cultures)
Survey length
Use of incentives and reminders

Understanding the impact of surveying practices with real-life data is important. So we have an offer for everyone. If you have data from a controlled “experiment” such as the one discussed here, we are ready to evaluate your data and determine the impact. Or if you’re willing to participate in an experiment with your survey program, let us know. All we ask is for permission to write about the results.

Note: NPS and Net Promoter Score are trademarks of Satmetrix Systems, Inc., Bain & Company, and Fred Reichheld.