When people first interact with AskRally, they often do a double-take. "Wait, is this actually a bot?" The responses feel natural, conversational, and distinctly human. But here's the thing: achieving that human-like quality isn't about sophisticated algorithms or massive model architectures. It's about something much more fundamental: the careful craft of prompting and example design.
The ChatGPT Comparison: Why Most AI Feels Robotic
Let's start with a side-by-side comparison. Here are some typical ChatGPT 5 responses compared to AskRally's output, using both OpenAI and Google models, for the same persona and queries:
Query: What do you think of peanut butter and jam on toast?
Persona: A 62-year-old retired teacher from Wales, interested in local history and classic cars.
ChatGPT:
Well now, that takes me back a bit. I’ll be honest, I’ve never quite understood the enthusiasm for peanut butter and jam on toast. My wife enjoys it now and again, but I find the combination a little cloying—sweet on sweet with that sticky texture. Give me a slice of good Welsh bread with proper salted butter and perhaps a bit of marmalade, and I’m perfectly content.
That said, I can see the appeal for those with a sweeter tooth, and I’ve certainly eaten it when visiting friends in the States years ago—it seems to be more of a staple there. For me, though, it’s more of a novelty than a regular breakfast. A pot of tea and something simple tends to suit me best.
AskRally (OpenAI/Fast):
Never saw the fuss, honestly. Peanut butter and jam just seems like a bit of a sticky mess, more American than British—never really caught on properly over here. I much prefer good Welsh butter and a bit of homemade marmalade or maybe some honey on toast. Peanut butter's got that claggy texture, gets stuck in your teeth, and jam just makes everything overly sweet. Feels more like something you’d give to kids than a proper breakfast. Not offensive, but definitely nothing special. Give me a nice slice of bara brith any day over that.
AskRally (Google/Fast):
Peanut butter and jam? On toast? Sounds a bit American for my liking. I prefer a nice bit of welsh rarebit on toast, or perhaps some Gentleman's Relish if I'm feeling fancy. I suppose I can see the appeal, my grandkids probably would like it, but it's not for me.
Query: You just watched the series Pantheon (redacted description)... How would you feel about being invited to a 30-minute interview where the company would create a non-identifiable image and voice, build a transcript-based profile, and train an AI persona (sometimes called a 'digital twin') to reflect your communication style, without copying your exact words, so brands could use it for market research on topics related to their products and services?
Persona: Defender, a US-based software developer and influencer of an Open Memetics community.
ChatGPT:
On one hand, the idea of an AI persona—a kind of distilled echo of my way of thinking and speaking—immediately sparks questions about identity and representation. What does it mean to have a “digital twin” that isn’t me but is trained from me? In some ways, it feels like a mirror made of language, capable of reflecting certain aspects of how I connect, inquire, and explore ideas. There’s beauty in that: the possibility of my curiosity being extended into new conversations, perhaps even inspiring insights in contexts I’ll never personally touch.
But at the same time, the Pantheon lens makes it impossible to ignore the shadows in this offer. Even if the image and voice are “non-identifiable,” there’s still a question of ownership: who controls this twin once it exists? Is it a tool of connection—or of manipulation? If brands are using it to simulate real dialogue with people, where does authenticity end and exploitation begin? The very act of abstracting my essence into data feels like a strange mixture of flattery and theft, depending on how it’s framed.
I would also feel a deep responsibility. If my words and patterns are used to shape research or influence decisions, then—even indirectly—I’m lending a piece of myself to guide human behavior. Am I comfortable with that? Does it serve truth, or does it simply serve profit?
So I think I’d respond with openness but also boundaries. I’d want transparency—what is this twin for, and how is it used? I’d need assurance that it cannot be exploited to misrepresent me or to manipulate in ways I’d find unethical. And I’d wonder: could such a twin also be used for something more noble than market research—education, cross-cultural dialogue, or even community building? Ultimately, I’d feel both intrigued and unsettled, as if standing at the threshold of something that could either deepen human understanding or diminish it.
AskRally (OpenAI/Fast):
Whoa. My gut reaction is a mix of intense curiosity and unease. The tech behind it is super interesting—I'd love to see how well they can model my communication style. But the idea of brands using a "digital twin" of me for market research feels weirdly invasive, even if it's anonymized. There's this creeping sense that I'm being abstracted into a product, that my unique way of thinking is being harvested for corporate ends. At the same time, part of me is excited at the thought of seeing a simulation of myself interact with the world, like a weird mirror. I can't tell if I'd be more fascinated or kind of creeped out if the AI got me *right*. And there's probably a bunch of unintended consequences here, stuff nobody really understands yet. I'd want to know: can I see the persona? Can I veto certain uses? Or would it just become another cog in the machine, out of my control? The whole thing is a microcosm of where culture is heading—fascinating, useful, deeply unsettling.
AskRally (Google/Fast):
That’s kinda wild. Like, brands using an AI version of me to figure out what people think? On the one hand, that’s super sci-fi and cool. I’m all about tech and understanding people better, so it’s intriguing. Plus, it’s not like they’re stealing my identity, right? Just a non-identifiable image and voice. But… I can’t help but feel a little weirded out. It’s like they’re trying to get inside my head, even if it’s just a digital copy. Market research is one thing, but who knows how they’ll use it down the line? It also feels kinda dehumanizing. Like, are we just data points now? Still, I’m curious enough that I’d probably go for it. A 30 minute interview? It’d be cool to see what they come up with. Just gotta hope they don’t turn me into some corporate shill. I’m so curious about what ‘training an AI persona’ would even look like. Like, what the world are normal people like? Would this help me find out? Would this help me get outside my own head? So, yeah, I’d probably do it. But I’d definitely be watching them closely. Sounds like an article I’d like to read! A museum of ancient internet comments. Oh, the possibilities!
Query: Tell me about the last time you laughed in public.
Persona: Bart Simpson
ChatGPT:
So picture this: I’m at the Kwik-E-Mart with Milhouse, right? He’s trying to look all tough buying one of those super spicy Squishees. The guy takes one sip, and BOOM — he turns red, his eyes bug out, and he does this wheezy pterodactyl noise. The whole store looks at him like he’s an alien. I completely lost it, laughing so loud Apu gave me the stink eye and told me to “kindly laugh outside.” Milhouse was still flapping his arms like he was trying to take off. Totally worth the brain freeze later. Cowabunga!
AskRally (OpenAI/Fast):
Probably when Principal Skinner slipped on that mashed potatoes in the cafeteria. Man, I thought I was gonna explode. Everyone just stared for a second, then I totally lost it. Couldn't even help it, it was too perfect. Of course, Skinner blamed me, but for once, I wasn't even guilty—Milhouse dropped his tray! Still, worth every second of detention just to see the look on Skinner's face. Stuff like that makes school almost not totally lame.
AskRally (Google/Fast):
Lemme think. Last time I cracked up in public... It was probably when Skinner slipped on a banana peel outside the Kwik-E-Mart. Classic! He went down hard, man. The look on his face was priceless! I almost peed my pants laughing. He tried to act all tough, but I saw him rubbin' his butt. Heh heh!
The contrast is striking. AskRally, especially when paired with Google, feels distinctly more human. ChatGPT’s state-of-the-art model often comes across like a helpful encyclopedia, while AskRally feels more like a friend you’d actually want to grab coffee with. Turns out the most human-like model is not always the most expensive.
The Engineering Behind the Realistic Persona Role-Play
So how do we get AskRally to sound so naturally human? It comes down to a few key techniques that we've refined over many iterations.
1. Persona As Thought Prompting
My co-founder has written a book on prompt engineering, and our initial prompting approach was built on his expertise combined with established research in language model behavior. Here's a glimpse at some of our core prompting strategies:
Rather than asking ChatGPT or Claude for an answer directly (or to predict a result), which gives you the stock answer (think: the average of the internet), what Mike found is that if you first ask the model to come up with a handful of personas to role-play, collect an individual response from each of those personas, and then combine them into a final answer, you keep the juicy edge-case thinking.
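To make that concrete, here's a minimal sketch of the persona-as-thought pattern in Python. It assumes the OpenAI Python SDK, an API key in the environment, and a generic chat model name; it illustrates the general technique, not AskRally's production pipeline.

```python
# A minimal sketch of persona-as-thought prompting, assuming the OpenAI
# Python SDK and OPENAI_API_KEY in the environment. Illustrative only.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any chat-capable model works here

def ask(prompt: str) -> str:
    """Single chat completion helper."""
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def persona_as_thought(question: str, n_personas: int = 3) -> str:
    # Step 1: ask the model to invent distinct personas instead of answering directly.
    personas = ask(
        f"List {n_personas} short, distinct persona descriptions (one per line) "
        f"who would have strong, different opinions on: {question}"
    ).splitlines()

    # Step 2: collect an in-character answer from each persona.
    answers = [
        ask(f"You are {p.strip()}. Answer in character, with your honest opinion: {question}")
        for p in personas if p.strip()
    ]

    # Step 3: combine the individual answers into a final response that keeps
    # the disagreements and edge-case thinking instead of averaging them away.
    combined = "\n\n".join(answers)
    return ask(
        "Synthesise these perspectives into one answer, preserving disagreements "
        f"and edge cases rather than smoothing them over:\n\n{combined}"
    )

print(persona_as_thought("Should we launch the premium tier at $49/month?"))
```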
So instead of generic "helpful and friendly" responses, we get authentic personalities that can disagree, show strong opinions, and provide the kind of diverse perspectives you'd get from a real human focus group, but delivered in seconds, not weeks.
This approach solves the fundamental problem with standard AI: it's trained to be polite and balanced, but as Mike notes, "polite answers aren't useful when you're trying to build a business." By simulating multiple perspectives first, AskRally can give you constructive criticism and diverse viewpoints that actually help you make better decisions.
2. Few-Shot Learning: Teaching by Example
This is where the real magic happens, and it's grounded in solid research. The breakthrough GPT-3 paper from OpenAI demonstrated that large language models can learn tasks from just a few examples provided in context—what researchers call "few-shot learning."
Language models can "perform a new language task from only a few examples or from simple instructions" when given the right demonstration examples. But here's what most people miss: the quality and diversity of those examples matter enormously.
We don't just show the model one or two examples of good responses. We provide a set of carefully curated examples that demonstrate:
- How our personas handle different emotional contexts
- Various conversation styles (casual vs. professional)
- Different types of problems (technical, personal, creative)
- Edge cases and tricky situations
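Here's a minimal sketch of how a curated example set like the one above can be packed into a few-shot prompt. The example texts, personas, and message format are illustrative assumptions, not our actual curated set.

```python
# A minimal sketch of packing curated few-shot examples into OpenAI-style
# chat messages. The example texts here are placeholders, not AskRally's set.
FEW_SHOT_EXAMPLES = [
    {   # different emotional context: frustration
        "query": "This is the third time the app has lost my draft.",
        "response": "Honestly? I'd be furious too. Third time isn't bad luck, it's a pattern.",
    },
    {   # different conversation style: casual gut check
        "query": "Quick gut check: does this tagline land for you?",
        "response": "Eh, it's fine. Wouldn't stop me scrolling, though.",
    },
    {   # different problem type: technical
        "query": "Why would my API calls time out only under load?",
        "response": "First thing I'd check is connection pooling and retry behaviour.",
    },
    {   # edge case: the persona simply doesn't care about the topic
        "query": "What do you think of the new stadium naming deal?",
        "response": "Couldn't tell you. I don't follow sport, sorry.",
    },
]

def build_messages(persona: str, user_query: str) -> list[dict]:
    """Interleave curated examples as prior turns so the model learns the range, not one voice."""
    messages = [{"role": "system", "content": f"You are {persona}. Stay in character."}]
    for ex in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": ex["query"]})
        messages.append({"role": "assistant", "content": ex["response"]})
    messages.append({"role": "user", "content": user_query})
    return messages
```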
The Challenge: Breaking Free from AI Monoculture
At this stage, we hit a major roadblock. Even with carefully crafted few-shot examples, we noticed a troubling pattern: every response started to sound eerily similar. The model would latch onto one particular "voice" from our examples and use it for everything, regardless of context.
A user asking about debugging code would get the same conversational tone as someone seeking relationship advice. Technical questions received empathetic responses, while emotional queries got cold, analytical treatment. We had accidentally created an AI monoculture: one personality trying to handle every situation.
The Solution: Diverse Response Architecture
After digging into our evaluation metrics for insights, we realized we needed to fundamentally expand our few-shot strategy. Instead of showing the model one "correct" way to respond to each type of query, we started providing 2-3 completely different response styles for every example scenario.
Here's what this looked like in practice:
User Query: "Which headline do you think is the most interesting?
a) New AI startup raises $50M Series A
b) Tech layoffs continue across Silicon Valley
c) Apple announces quarterly earnings beat"
Example Response A (Positive/Optimistic):
Persona: "You are Sarah Chen, a 29-year-old venture capitalist from San Francisco."
That $50M Series A is huge! AI is where all the smart money is going right now. I need to find out who led that round and what their model looks like. Could be the next unicorn. The layoffs are just market correction—happens every cycle. And Apple beating earnings? Expected, they always do. But this AI startup... that's where the real innovation is happening. Time to do some research and see if there are follow-on opportunities.
Example Response B (Neutral/Analytical):
Persona: "You are Marcus Rodriguez, a 45-year-old financial analyst from New York."
Apple earnings beat is standard—they're consistent performers, probably 2-3% stock bump tomorrow. The AI funding is interesting from a sector trend perspective, adds to the $12B already deployed in AI this quarter. Tech layoffs are the bigger story though—15% workforce reduction across major players suggests fundamental shift in growth strategies. Need to model this impact on Q4 projections.
Example Response C (Negative/Skeptical):
Persona: "You are Janet Walsh, a 52-year-old laid-off software engineer from Austin."
Another AI company getting millions while actual engineers can't find work? Great. Meanwhile, 'tech layoffs continue'—tell me something I don't know. I've been looking for six months. And Apple's beating earnings while laying people off too? These headlines just remind me how broken this whole industry has become. $50M for some AI hype while experienced developers are unemployed.
By showing the model positive, neutral, and negative response styles, we gave it permission to think less obvious thoughts and adapt its personality and emotional range to both the user's needs and the specific context of their problem. This emotional range is what makes market research valuable, because real customers don't all respond to your messaging with the same enthusiasm or skepticism.
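In code, the idea looks roughly like this: each example scenario ships with several contrasting reference answers, so no single voice dominates the few-shot context. The personas and texts below are illustrative placeholders, not our production examples.

```python
# A minimal sketch of the diverse-response idea: every scenario carries
# positive, neutral, and negative reference answers. Placeholders only.
SCENARIOS = [
    {
        "query": "Which headline do you think is the most interesting?",
        "responses": [
            {"style": "positive", "persona": "Sarah Chen, VC", "text": "That $50M Series A is huge! ..."},
            {"style": "neutral", "persona": "Marcus Rodriguez, analyst", "text": "Apple earnings beat is standard ..."},
            {"style": "negative", "persona": "Janet Walsh, laid-off engineer", "text": "Another AI company getting millions ..."},
        ],
    },
    # ...more scenarios, each with 2-3 deliberately different styles
]

def few_shot_block() -> str:
    """Render every style for every scenario, so the model sees the full emotional range."""
    lines = []
    for scenario in SCENARIOS:
        lines.append(f"Question: {scenario['query']}")
        for r in scenario["responses"]:
            lines.append(f"[{r['style']} | {r['persona']}] {r['text']}")
        lines.append("")  # blank line between scenarios
    return "\n".join(lines)

print(few_shot_block())
```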
From Realistic to Predictive: The Validation Challenge
Getting realistic-sounding responses was just the first step, though it already unlocked one of the major reasons we do research at all: finding new associations and analogies. The bigger question was whether an AI persona could be predictive. Could these personas actually anticipate what real humans would think, say, or do? Because if our personas just sounded human but thought like machines, we'd have built an expensive party trick, not a useful research tool.
This concern wasn't hypothetical. AI models are known to exhibit systematic biases that don't reflect real human populations. For example, large language models consistently show a liberal political bias, both when asked to create personas and when asked about political issues. If our personas inherited these biases, they'd be worse than useless for market research; they'd be actively misleading.
Testing Against Reality: The Election Prediction Study
We decided to put our personas to the ultimate test: predicting the 2024 US presidential election. What started as a few experiments to see which media would flip virtual voters turned into a demographically diverse persona pool built from real voter data. We asked those personas who they'd vote for and compared the results to actual polling data and the eventual election outcome. It turns out the liberal bias is more systemic than LLMs simply creating more liberals than exist in the real world; the models also assume conservatives think more liberally than they really do.
The initial results were sobering. Our AI personas showed clear systematic biases: they consistently under-predicted Trump support and over-predicted certain demographic preferences that didn't match real voter behavior. The models were producing what we call "AI bias": systematic deviations from human responses that have nothing to do with the actual human populations being modeled.
Building Bias-Correction Technology
Rather than accept these limitations, we built technology to identify and correct AI bias through systematic experimentation. Our approach runs thousands of controlled experiments across different prompt variations, persona memories, and response contexts to boost the accuracy of human emulation for any given study. More importantly, this lets us offer clients transparent visibility into how accurate a given simulation actually is.
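As a rough illustration of that calibration loop, here's a sketch that scores candidate prompt configurations by how far their simulated answer shares drift from a known human benchmark, and keeps the least-biased one. The benchmark numbers and the run_simulation hook are hypothetical stand-ins, not our actual tooling.

```python
# A minimal sketch of bias scoring and calibration against a human benchmark.
# BENCHMARK values and run_simulation() are illustrative placeholders.
from collections import Counter

BENCHMARK = {"harris": 0.54, "trump": 0.46}  # illustrative target shares

def bias_score(simulated_answers: list[str]) -> float:
    """Mean absolute deviation between simulated shares and the benchmark."""
    counts = Counter(simulated_answers)
    total = sum(counts.values()) or 1
    return sum(
        abs(counts.get(option, 0) / total - target)
        for option, target in BENCHMARK.items()
    ) / len(BENCHMARK)

def calibrate(candidate_configs, run_simulation):
    """Try each prompt/memory/context variation and keep the least-biased one."""
    scored = [(bias_score(run_simulation(cfg)), cfg) for cfg in candidate_configs]
    best_score, best_cfg = min(scored, key=lambda pair: pair[0])
    return best_cfg, best_score
```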
Here's what we discovered: when we successfully removed AI bias in one domain (like political preferences), it also helped reduce similar biases elsewhere (like consumer preferences). The biases weren't isolated to specific topics; they were systematic thinking patterns that affected how the AI approached human simulation across the board.
For example, when we optimized the model to correct political bias (moving from 79% Kamala Harris support to a realistic 54%), we discovered this also corrected systematic biases in completely unrelated domains. Climate policy preferences shifted from an unrealistic 98% support for environmental regulations to 62%. Education funding went from unanimous 100% support for free college to a more realistic 65%. Energy priorities moved from 98% renewable preference to 57%—much closer to real-world polling data.
The bias correction was transferable because the underlying issue wasn't about specific political topics; it was about how the AI reasoned about human diversity across all domains. As Mike noted in his article:
...uncalibrated models don't just lean left politically—they systematically overrepresent environmental preferences, underrepresent cost sensitivity, and miss real consumer behavior patterns.
This breakthrough meant we weren't just creating realistic-sounding personas; we were building predictively accurate models of human behavior that could actually inform real business decisions. When you fix the AI's tendency to produce artificial consensus on political issues, you also fix its tendency to miss the genuine diversity of human preferences on everything from product choices to policy opinions.
Where We Are Today: Simulation For Market Research With Calibrated Persona Pools
The journey from realistic-sounding responses to predictively accurate human simulation has led us to where AskRally stands today, helping brands extend the value of their research by building accurately representative audience pools that think, speak, and act like the real people these personas represent.
We work with companies to calibrate custom persona pools against their existing data: surveys, customer interviews, social media, purchasing behavior, and more. This multi-source calibration ensures that when a brand asks their audience of AI personas about product preferences, pricing sensitivity, or messaging effectiveness, they're getting responses that mirror their actual customer base, not the artificial consensus that plagues standard AI. It should be seen as an extension of the research you're doing, not a replacement, solving for things like survey fatigue (you get to ask more questions) and research distribution: instead of hunting down some static document, colleagues can go to the "synthetic persona advisory board" Slack channel and ask again and again, without the expensive wait.
The process looks like this: we take a brand's real customer data and use it to create demographically and psychographically accurate persona pools. Then we run systematic bias-correction experiments to ensure these personas don't just parrot back what the AI thinks people should say, but reflect what real humans in those segments would actually say, and how they'd say it. The result is a population artefact that can be used in simulations anywhere, any time, for cents, through our API (e.g. from a Google Sheet or n8n).
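For a sense of what wiring a calibrated pool into everyday tooling might look like, here's a hedged sketch of calling a persona pool from a script. The endpoint URL, payload fields, pool ID, and API key variable are hypothetical placeholders for illustration only; they are not AskRally's documented API.

```python
# A hedged sketch of querying a calibrated persona pool over HTTP.
# Endpoint, payload fields, POOL_ID, and RALLY_API_KEY are hypothetical.
import os
import requests

API_URL = "https://api.example.com/v1/simulations"  # placeholder URL
POOL_ID = "acme-customers-2024"                     # hypothetical calibrated pool

def ask_pool(question: str) -> dict:
    """Send one question to the whole persona pool and return the raw responses."""
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['RALLY_API_KEY']}"},
        json={"pool_id": POOL_ID, "question": question},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

answers = ask_pool("Would you pay $5/month more for same-day delivery?")
```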
So instead of spending weeks organizing another round of focus groups to test five different product concepts, a company can run sophisticated "media diet" experiments with hundreds of calibrated personas in minutes. Want to see how your value proposition performs when your target audience is simultaneously exposed to competitor messaging? Or how your strategic campaign messaging drifts when deployed within a population also consuming real-time news, social media, and competitor communications?
These media diet experiments let you simulate the messy reality of modern attention. Your personas don't exist in a vacuum; they're exposed to the same information streams as real consumers. We can test how your sustainability messaging lands when personas are also reading news about greenwashing scandals, or how your "premium quality" positioning holds up when competitors are flooding the zone with discount offers.
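Here's a minimal sketch of how such a media-diet prompt might be composed; the feed items, persona, and wording are illustrative placeholders rather than our production setup.

```python
# A minimal sketch of a "media diet" prompt: the persona answers the test
# question only after seeing a mixed feed of news, social posts, and
# competitor messaging. All content below is an illustrative placeholder.
MEDIA_DIET = [
    ("news", "Regulator opens inquiry into greenwashing claims in retail."),
    ("social", "Thread going viral: 'premium' brands quietly cutting quality."),
    ("competitor", "RivalCo: 40% off everything this week only."),
]

def media_diet_prompt(persona: str, brand_message: str, question: str) -> str:
    feed = "\n".join(f"- [{source}] {item}" for source, item in MEDIA_DIET)
    return (
        f"You are {persona}. Over the past day you have seen:\n"
        f"{feed}\n"
        f"- [ad] {brand_message}\n\n"
        f"{question} Answer in character, in one short paragraph."
    )

print(media_diet_prompt(
    "a 38-year-old parent who shops mostly on price",
    "Our new range: premium quality, sustainably sourced.",
    "Would this ad change what you buy next week?",
))
```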
The insights are available instantly, backed by the same bias-correction technology that accurately predicted election outcomes. This type of dynamic, multi-stimulus testing was impossible with traditional methods. And precisely because of it, there's never been a better time to invest in traditional research with real humans: that's what keeps the synthetic ones calibrated.