Over the 10 years we’ve been running Quantic Foundry and engaging with clients ranging from indie all the way to AAA on a wide variety of gamer research/marketing projects, one pattern I’ve noticed is that clients often perceive survey panel samples, especially general population (gen pop) survey samples, as a sort of “cure-all”. They tend to become much less inquisitive and critical when this part of the methodology is brought up, even though they were very engaged in other aspects of the project (e.g., survey question phrasing/ordering).

The survey panel industry is fundamentally about paying people to take surveys, which introduces thorny issues in terms of data quality. And the tactics to sanitize the data/respondents escalate in parallel with bad actors trying to cheat the system. Bad actors are increasingly operating at scale with human-bot hybrid farms, and now AI is part of the equation because it is very good at generating non-repeating, plausible-looking data for both multiple-choice and open-ended responses. They also use VPNs, device emulators, and translation tools to spoof being every target demographic in the world.

AI is very good at generating non-repeating, plausible-looking data.

We Showed a Client That 80% Of Their Panel-Fielded Survey Data Was Bogus

We worked with a client with a well-known brand, for whom we created a custom segment model and typing tool using a large, direct player-base sample. They then contracted another vendor to conduct broader gen pop panel-fielded survey studies using this segment framework, but they encountered anomalous segmentation results and reached out to us again to troubleshoot the typing tool.

Since we had extensive domain knowledge of this game/audience and a large data set from actual gamers from the initial study, we were able to unequivocally show them that 80% of their new panel-fielded survey data was completely bogus—not just mild outliers, but completely implausible combinations of responses. This client had spent months and tens of thousands of dollars on that bogus data.

There’s an ingrained perception that gen pop panel samples are an accepted gold standard, whether in market research or electoral polling, and this reduces scrutiny and due diligence. Additionally, the survey panel industry has an odd property: it’s deceptively easy to explain to someone at a high level, but at the same time its internal machinery is highly opaque because of how many intermediate levels you have to go through to get to the ground floor.

The end client (say, a gaming company) often contracts with a research agency to conduct the project. The agency, which may not manage a panel of its own, then subcontracts panel recruitment to a survey panel vendor. This vendor may have a small-to-large panel of their own, but often needs to supplement it for better coverage (e.g., geographic regions, hard-to-reach audiences), so they source additional sample from large panel exchanges (which you can think of as an eBay where panel providers sell their respondents to mid-level panel vendors based on a bid system).

Because of mergers and acquisitions in the panel industry, because of the underlying panel exchanges, and because avid survey takers sign up for multiple panels, we can largely conceptualize this underlying pool of gen pop respondents as being highly homogeneous no matter where you’re getting them from (with the exception of proprietary/specialty panels). In any case, the end client is often 3-4 levels removed from the actual respondents. To say that the end client often has limited visibility into the panel recruitment process would be an understatement.

Panels & Panel Exchanges Have An Average Fraud Rate of 29%

Rep Data recently conducted a study across 4 large, well-known panels and 2 well-known panel exchanges. They found an average fraud rate of 29%, i.e., respondents who showed excessive/suspicious activity patterns or were using suspicious VPNs, emulators, etc. The fraud rate ranged from 21% to 38%, with panel exchanges in the middle, which makes sense since panel exchanges are a pooled-together average of many individual panels.

Note that this fraud rate doesn’t include inattentive/unmotivated respondents who may straight-line or enter gibberish into surveys just to quickly get the financial incentive. The fraud rate is just one slice of the pie (the newer, growing problem) that Rep Data focused on in their study. If we assume a very conservative 10% inattention/unmotivated rate, then on average a total of roughly 39% of panel-recruited respondents are problematic.

This means that when we’re fielding from panels or panel exchanges, the optimal disqualification/termination rate is around 40%. Anything below this means the difference is made up of undetected bad respondents slipping into your collected data. Of course, when the rate of problematic responses approaches or exceeds 50% in some panels, or when the incidence rate of the target audience is low (which is typical when we’re surveying specific genre fans on specific platforms), how confidently can we trust the remaining responses, or trust that we correctly weeded out the bad respondents to begin with?

Your collected survey responses consist of 2 parts: authentic respondents + bad respondents. The key thing to understand is that bad respondents are really good at bypassing your screening criteria (using VPNs, platform emulators, or figuring out your screening questions), especially if they’re part of a coordinated team/farm. Thus, the percentage of bad responses as a function of total respondents screened remains roughly the same regardless of how prevalent or rare your target audience is.

Let’s assume the bad response rate is 30% of all respondents screened. In a very general survey that has no specific screening criteria, if we screened 1,000 respondents, we would expect the final data set to consist of 300 bad respondents + 700 authentic respondents. In the resulting data set, you would have an observed bad response rate of 30% (same as the overall rate).

OK, now imagine we’re running a survey with more stringent screening criteria and qualifications. From previous research, let’s say we know that only 10% of the general population fits the screening criteria (i.e., the incidence rate). If we screened 1,000 respondents, we would expect the final data set to consist of 300 bad respondents (same as above, because they’re specifically adept at bypassing screening criteria and posing as the target audience) + 70 authentic respondents (i.e., 10% of the 700 authentic respondents). In the resulting data set, you now have an observed bad response rate of 81% (= 300/370). Of course, in practice, stringent screening criteria would also lower the bad response rate somewhat, though not to the degree of the expected incidence rate.

Thus, when the incidence rate (the prevalence of the target audience) decreases, the collected data consists of a higher proportion of bad responses. This is why the estimated fraud/inattention baseline of 39% isn’t the full picture. No one is running surveys without screening criteria. The actual, observed rate of bad responses can often be higher than 39% because it interacts with the incidence rate.
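
To make the arithmetic above concrete, here’s a minimal sketch of that calculation. It assumes, as in the example, that bad respondents always pass screening while authentic respondents qualify at the incidence rate; the function and numbers are illustrative only.

```python
def observed_bad_rate(bad_rate: float, incidence_rate: float) -> float:
    """Share of bad responses in the final data set, assuming (pessimistically)
    that bad respondents always pass screening while authentic respondents
    qualify at the incidence rate."""
    bad = bad_rate                                 # e.g., 30% of everyone screened
    authentic = (1.0 - bad_rate) * incidence_rate  # authentic respondents who qualify
    return bad / (bad + authentic)

# The two scenarios from the text (30% bad-respondent rate):
print(observed_bad_rate(0.30, 1.00))  # 0.30 -> 30% bad with no real screening criteria
print(observed_bad_rate(0.30, 0.10))  # ~0.81 -> 81% bad at a 10% incidence rate
```

Even if you relax that pessimistic assumption and let screening catch some fraction of the bad respondents, the observed bad-response rate still climbs steeply as the incidence rate drops.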

The Race to the Bottom

What is driving these systemic issues in the survey panel industry? I could talk about what we’ve learned about this industry ad nauseam, but I’ll focus on some of the most striking structural issues we’ve uncovered as we peeled back the onion over the years.

Each layer of the stream from end client to survey respondent has different incentives, but these are mostly aligned in one specific way.

  • The end client wants to minimize fielding time and project cost.
  • The research agency is competing against other agencies when putting in proposal bids; in survey projects, the easiest way to reduce the proposal cost is to lower the respondent payout.
  • Panel vendors are also bidding competitively and want to maximize their margin by retaining as much of the payment as possible, typically by minimizing payouts to respondents.

This puts downward pressure on respondent payout throughout the stream, specifically in forcing most agencies/vendors to compete on cost rather than data quality.

This downward pricing pressure has resulted in typical payouts of USD 0.25 – 2.00 for a 10-minute survey, which are often paid out via point systems or reward wallets rather than direct cash payouts. This raises the question of what data quality we should expect from respondents who are being paid USD 0.25 in reward point equivalents for a 10-minute survey.

This puts downward pressure on respondent payout throughout the stream, specifically in forcing most agencies/vendors to compete on cost.

It’s been a race to the bottom where the end client ostensibly wants high-quality data, but because the project cost is far more tangible than data quality at the proposal stage, they often end up incentivizing the entire stream to pay respondents as little as possible, to the detriment of data quality. And now that this structural transformation has permeated the industry, it has become much harder to find high-quality respondents.

Old Sanitization Strategies Are Failing

Two of the most robust survey design strategies to exclude problematic respondents used to be domain-knowledge checks (e.g., “Which of the following is not a feature in Game XYZ?”) and open-ended response checks (e.g., “What game feature do you enjoy the most in Game XYZ?”). Unfortunately, these two strategies have become brittle in the face of LLMs, which can handle domain-specific questions and generate non-repeating strings of plausible open-ended responses.

It used to be easy to weed out gibberish in open-ended responses because people would type in low-effort autocompletes. But producing a seemingly high-effort answer is now just a copy/paste away with an LLM. These old strategies are also failing in a completely silent manner: they create the illusion that survey response quality is improving when it is in fact degrading dramatically.
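
To make this concrete, here’s a hypothetical sketch of the kind of naive open-ended quality check that used to work: flag responses that are too short or contain too few recognizable words. The thresholds and word list are illustrative assumptions, not a recommended implementation; the point is that a copy/pasted LLM answer sails through checks like these.

```python
# A hypothetical, old-style gibberish filter for open-ended responses.
# Illustrative only: LLM-generated answers pass checks like these easily.

COMMON_WORDS = {"the", "and", "game", "i", "like", "play", "because", "fun",
                "really", "enjoy", "with", "my", "to", "it", "of", "a"}

def looks_like_gibberish(response: str, min_words: int = 5,
                         min_real_word_ratio: float = 0.3) -> bool:
    words = response.lower().split()
    if len(words) < min_words:
        return True  # too short / low-effort
    real = sum(1 for w in words if w.strip(".,!?") in COMMON_WORDS)
    return real / len(words) < min_real_word_ratio  # mostly keyboard mashing

print(looks_like_gibberish("asdf asdf good"))  # True: caught
print(looks_like_gibberish("I really enjoy the crafting system "
                           "because it rewards long-term planning."))  # False: passes
# The problem: a pasted LLM answer also returns False, so the filter reports
# "clean" data while bogus responses flow straight through.
```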

We Often Conceptualize the True Adversary Incorrectly

When designing surveys and implementing sanitization strategies, practitioners often conceptualize the risk as being a “diffuse” problem of many low-effort individuals spread across the system. Thus, safeguards such as knowledge questions, consistency checks, speeding checks, etc. are included to weed out these individuals.

But the far more dangerous adversary has always been coordinated teams (or “farms”) because with automated tools, they can push through much larger volumes of bogus survey completes. The way they work is by first deploying a “vanguard” that figures out all the screening criteria and trap questions in your survey. Once this is done, they hand over the key to the automation team, which then uses a variety of human/bot/AI hybrid systems to retake the survey as many times as possible. And as recent events have shown, sometimes the call is even coming from inside the house.

The far more dangerous adversary has always been coordinated teams (or “farms”) because with automated tools, they can push through much larger volumes of bogus survey completes.

A sneaking suspicion I have is that the heavily-weaponized surveys now commonly fielded in this arms race have created a truly toxic survey-taking environment: they have minimal impact on coordinated teams, but they make the survey-taking experience absolute hell for actual individual respondents. What normal person would spend 10 minutes answering a battery of schizophrenic trick questions for 50 cents in a platform-specific virtual currency? Over time, this tragedy of the commons is increasing the proportion of panel respondents who are purely financially motivated and constantly trying to game the system.

AI-Generated User Data Is Everywhere

The generally poor health and quality of panel recruitment is opening the door to even more concerning trends in data-driven user/market research. The threat of AI-generated data doesn’t only exist in panel-fielded survey data. Some AI companies are championing the idea that we should simply get rid of human respondents altogether and rely on AI-based personas that can stand in for target audience cohorts.

Other companies use AI to inflate a small human respondent data sample into a much larger data set. For example, they might collect survey data on a small handful of questions and then have AI confabulate responses to hundreds of other traits/questions that were not directly assessed. We’ve seen companies frame this data hyperinflation as an “innovation” and use slippery marketing buzzwords such as “AI-assisted assessments” to obfuscate what they’re doing. But these aren’t innovations at all; they’re admissions of not having the actual data in the first place. It’s the same marketing trick as a drink labeled “made with real orange juice,” which is technically correct even if it contains less than 1% juice.

These aren’t innovations at all; they’re admissions of not having the actual data in the first place.

Making Chicken Nuggets

There’s this old episode of Jamie Oliver, the UK chef, where he’s showing kids how chicken nuggets are made. And he’s taking leftover chicken bones and chicken skin, processing them into a pink goo in a blender, mixing in handfuls of preservative powders and artificial flavors—so now there’s more non-chicken than chicken—and then deep-frying it into nuggets. And the kids, the kids go crazy for it because they’re the most nuggety-looking “contains real chicken” nuggets they have ever seen.

Like these chicken nuggets, much of the market/user data we are consuming these days may only have the appearance of data: under the hood, it is being infiltrated with fraudulent survey responses and reprocessed AI slop. We’ve somehow arrived at an existential crossroads in user research where many of our data sources are being severely degraded and where new AI companies are selling the idea that user research can be done without actual users or by minimizing input from users.

People are complicated things. And understanding what makes people engaged is a complicated problem. But it is a problem that established, rigorous methodologies can help solve. Removing users from user research is like looking under a streetlight rather than where you dropped your key. Knowing where your data comes from and how it’s being processed will be critically important in the coming months and years as AI permeates the market/user research world. As we are bombarded with buzzwords and fads, all touting innovation, now more than ever is a good time to check that your data sources are organic.

Now, more than ever, is a good time to check that your data sources are organic.

I want to avoid stating highly specific survey design tactics in a public blog post because they would only get AI-scraped and escalate the arms race, but here are high-level strategies you might consider adopting:

  • Start moving upstream in terms of panel respondent acquisition and get as far upstream as you possibly can. There are important aspects and parameters of panel respondents that only become accessible/apparent when you get close to the source.
  • Seek out curated, proprietary/specialty panels that target your specific industry/domain. While they might be smaller and have weaker geographic coverage, they tend to have far more robust data quality—e.g., due to higher accuracy and lower measurement error, you can get tighter confidence intervals even with a lower sample size (see the sketch after this list).
  • Don’t skimp on respondent payout. This also means understanding exactly how much your respondents are getting paid and how much is instead paid out to the multiple layers of middlemen. What’s the exact dollar payout to your respondents, and how exactly are they being paid (direct cash vs. point systems vs. reward wallets)?
  • Establish domain-specific baseline questions that you can use to verify data quality. Do not share these with your vendors.
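
On the confidence interval point in the specialty-panel bullet above: the width of a confidence interval around a mean scales roughly with the response noise divided by the square root of the sample size, so a smaller but cleaner sample can beat a larger, noisier one. A minimal sketch, with purely hypothetical noise and sample-size numbers:

```python
import math

def ci_half_width(sigma: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for a sample mean: z * sigma / sqrt(n)."""
    return z * sigma / math.sqrt(n)

# Hypothetical scenario: a large gen pop sample whose effective noise is inflated
# by fraudulent/inattentive responses vs. a smaller, cleaner specialty panel sample.
print(ci_half_width(sigma=1.4, n=2000))  # ~0.061 (large but noisy)
print(ci_half_width(sigma=0.8, n=800))   # ~0.055 (smaller but cleaner -> tighter CI)
```

And this only captures random noise; the systematic bias introduced by fraudulent respondents doesn’t shrink with sample size at all, which is the more damaging part.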

At Quantic Foundry, we completely avoid the systemic issues plaguing survey panels because we’re not paying people to take our Gamer Motivation Profile; no financial incentive is provided to our respondents. Instead, respondents primarily take our survey to get an accurate, in-depth assessment of their gaming motivations and tailored game recommendations. Our unique approach yields a high-quality data set from highly-engaged, intrinsically-motivated respondents.

Your Experiences with Panel Data / AI-Generated Data? Anything We Missed?

  • If you are also a practitioner working with panel survey samples, are there other recent trends you’ve noticed in panel-fielded data?
  • Are there other systemic/structural issues in survey panels that we missed?
  • Has AI impacted how you or your company thinks about and acquires user/market data?