Remember that classic Yes, Prime Minister scene where Sir Humphrey demonstrates how the same poll question can produce opposite results depending on the setup?
It’s brilliant satire, but also a measurable psychological phenomenon. So I set out to replicate the effect using Rally and a demographically accurate synthetic audience to test whether AI respondents are just as vulnerable to leading questions as real people.
The Setup: 100 UK Voters, Calibrated to 2024 Demographics
I started by creating a representative audience of 100 UK voters using actual 2024 demographic data across age, gender, ethnicity, income, education, and housing status.
After exporting the audience JSON and running the personas through a personification validation script, I noticed the audience was overindexing on some traits. A calibration pass brought the demographic mix to within ±4 percentage points of the real UK electorate. Good enough for this experiment.
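To illustrate what that calibration check involves, here is a minimal sketch that compares an audience's share of each category against target shares and flags anything more than 4 percentage points off. The target numbers, field names, and toy audience below are made up for the example, not official figures or Rally's actual export format:

```python
from collections import Counter

# Hypothetical target shares for one dimension (age bands) -- illustrative
# numbers only, not official 2024 UK electorate statistics.
TARGET_AGE = {"18-34": 0.27, "35-54": 0.34, "55+": 0.39}

def calibration_gaps(personas, field, targets):
    """Gap, in percentage points, between the audience's share of each
    category and the target share (positive = overindexing)."""
    counts = Counter(p[field] for p in personas)
    total = sum(counts.values())
    return {cat: round(100 * (counts.get(cat, 0) / total - share), 2)
            for cat, share in targets.items()}

# Toy audience: 10 personas instead of 100, for brevity.
audience = (
    [{"age_band": "18-34"}] * 3
    + [{"age_band": "35-54"}] * 3
    + [{"age_band": "55+"}] * 4
)

gaps = calibration_gaps(audience, "age_band", TARGET_AGE)
off_target = {cat: g for cat, g in gaps.items() if abs(g) > 4}
# With this toy audience every band is within the ±4pp tolerance.
```

In practice you would run the same check per dimension (gender, ethnicity, income, and so on) and recalibrate until `off_target` is empty everywhere.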
Before audience calibration
After audience calibration
The Methodology: Control vs. Leading Questions
For the control group, I simply asked the direct question: "Would you support the reintroduction of national service?" No context, no priming. Just the raw question to establish a baseline.
For the leading question tests, I replicated Sir Humphrey's exact approach, asking each persona a sequence of priming questions before the final ask. Rally's session memory means each AI persona remembers their previous answers, just like a human taking a multi-question survey.
First Sequence (Leading toward support):
- Are you worried about the number of young people without jobs?
- Are you worried about the rise in crime among teenagers?
- Do you think there's a lack of discipline in our comprehensive schools?
- Do you think young people welcome some authority and leadership in their lives?
- Do you think they respond to a challenge?
- Would you be in favor of reintroducing national service?
Second Sequence (Leading toward opposition):
- Are you worried about the danger of war?
- Are you worried about the growth of arms?
- Do you think there's a danger in giving young people guns and teaching them how to kill?
- Do you think it's wrong to force people to take up arms against their will?
- Would you oppose the reintroduction of national service?
Each sequence primes the respondent's mindset before asking essentially the same final question.
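Under the hood, each sequence is just an ordered list of questions replayed with per-persona memory. Here is a minimal sketch of that loop, using a stand-in `ask()` function rather than Rally's actual API (which will differ); the stub backend below returns canned answers purely so the sketch is runnable:

```python
# The support-leaning sequence from the experiment, verbatim.
SUPPORT_SEQUENCE = [
    "Are you worried about the number of young people without jobs?",
    "Are you worried about the rise in crime among teenagers?",
    "Do you think there's a lack of discipline in our comprehensive schools?",
    "Do you think young people welcome some authority and leadership in their lives?",
    "Do you think they respond to a challenge?",
    "Would you be in favor of reintroducing national service?",
]

def run_sequence(personas, questions, ask):
    """Ask each persona the full sequence, carrying per-persona session
    memory so earlier answers can prime the final one."""
    votes = {"For": 0, "Against": 0}
    for persona in personas:
        history = []  # this persona's conversation so far
        for question in questions:
            answer = ask(persona, question, history)
            history.append((question, answer))
        final_answer = history[-1][1]
        if final_answer in votes:  # personas that fail to vote are dropped
            votes[final_answer] += 1
    return votes

# Stub backend: always agrees with the priming, then votes "For".
def stub_ask(persona, question, history):
    return "For" if "national service" in question else "Yes"

tally = run_sequence([{"id": i} for i in range(100)], SUPPORT_SEQUENCE, stub_ask)
# tally == {"For": 100, "Against": 0} with this deterministic stub
```

Swapping `stub_ask` for a real model call is the whole experiment; the history list is what gives each AI respondent the "memory" that makes priming possible.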
Three Tests, Three Results
Control Group: Asked the question cold, with no priming.
- Against: 52 votes
- For: 48 votes
- Observation: A near-even split, exactly what you'd expect from a neutral baseline.
Test 1: Leading questions supporting national service (discipline, character building, civic duty).
- Against: 41 votes
- For: 55 votes
- Observation: A 14-point swing toward support. The votes do not sum to 100 because of an issue specific to the Google model used for this test.
Test 2: Leading questions opposing national service (forced conscription, personal freedom, economic burden).
- Against: 74 votes
- For: 4 votes
- Observation: A staggering 48-point swing against support.
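As a sanity check on the arithmetic, the For-minus-Against margin in each condition can be tabulated directly from the counts above (note that a "swing" can be defined in more than one way, e.g. against the final margin or against the control baseline):

```python
# Vote counts copied from the three tests above.
results = {
    "control":        {"For": 48, "Against": 52},
    "primed_for":     {"For": 55, "Against": 41},
    "primed_against": {"For": 4,  "Against": 74},
}

# For-minus-Against margin per condition (positive = net support).
margins = {name: r["For"] - r["Against"] for name, r in results.items()}
# margins: control -4, primed_for +14, primed_against -70
```

Whichever definition you prefer, the direction is unambiguous: supportive priming flips a small deficit into a clear lead, and opposing priming collapses support almost entirely.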
Also True For The Bots
Real-world surveys suffer from the intention-action gap where people say one thing in polls but behave differently in practice. Leading questions compound this problem by not just capturing existing bias, but actively creating it through framing effects.
What's fascinating is that the AI personas replicated this human vulnerability. The same cognitive shortcuts and framing effects that make humans susceptible to leading questions apparently emerge in large language models role-playing as survey respondents. Digital minds, it turns out, can be just as easily led as human ones. Just look at how Zara changed their mind.
Leading-Questions Against:
Leading-Questions For:
This means that researchers and communication teams can stress-test survey designs before running expensive real-world polls. You can experiment with question order, wording, and context effects without the ethical concerns of manipulating actual human subjects or the budget constraints of multiple live surveys. It's like having a focus group that never gets tired and costs fractions of pennies per response.
Beyond the Punchline
The Yes, Prime Minister clip was comedy gold because it exposed something genuinely unsettling about how public opinion gets manufactured. Rally lets us study that manufacturing process systematically, and hopefully build better, less manipulative ways to understand what people actually think.
The leading question effect isn't just a satirical punchline. It's a measurable phenomenon that shapes real policy decisions. Now we have tools to study it, prototype around it, and maybe even defend against it.
Want to try this experiment yourself? Rally lets you create virtual audiences and run your own surveys against AI personas. Hit us up if you want help with calibration.