I remember the first time I went paragliding and got sucked into a cloud, unable to orient myself for 20 long minutes. It was a lesson in raw reality. Up there, soaring through unpredictable skies, I quickly learned that no amount of wishful thinking could substitute for the brutal truth about my own limits and the weather conditions. That experience taught me to balance risk and reward with unflinching honesty, a stark contrast to many work environments where consensus is overvalued and creativity gets stifled by endless praise.
In the boardrooms I’ve sat in, I’ve seen how an overemphasis on positive feedback can create echo chambers, where dissenting voices are smothered and ideas become bland echoes of the status quo. I vividly recall a past role where a new customer success lead was so eager to share “good news” that he overlooked warning signs in a critical client call. The meeting revealed that our champion was about to exit, and a key decision-maker had quietly admitted that a pilot project hadn’t even worked properly. What my CS lead saw as a minor setback, I recognized as a necessary conflict: a divergence of opinions that could ignite the right action. I stepped in to challenge the overly optimistic narrative, sparking an internal debate that eventually led to launching a new pilot with the prospect. That pilot rescued one of our largest deals, a testament to the value of embracing diverse perspectives and questioning consensus.
Just like paragliding forces you to confront reality head-on, our experiment with synthetic research reveals how well advanced reasoning models capture the diversity of human thought. In this essay, I’ll help you choose between Fast and Smart mode by sharing learnings from my ‘impossible’ buying committee simulation experiment.
Unpacking the Experiment: Fast vs. Smart Comparison
In our simulated buying committee experiment, we challenged AI-powered role-playing agents to make a high-stakes vendor purchase decision. We designed the virtual audience with a web of complex buying behaviour, biases, and conflicting agendas, then tasked the agents with evaluating an internal memo and voting to buy or not. We ran the experiment across all four of our supported model providers (OpenAI, Anthropic, Google, Groq) in two distinct modes: Fast and Smart.
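To make the setup concrete, here is a minimal sketch of how the experiment matrix could be laid out in plain Python. The model names, persona fields, and overall structure are illustrative assumptions, not Rally's internal configuration.

```python
# A minimal sketch of the experiment matrix: providers x modes x personas.
# Model names and persona fields are illustrative, not Rally's internals.
from itertools import product

PROVIDERS = ["openai", "anthropic", "google", "groq"]
MODES = ["fast", "smart"]

# Example tier mapping (hypothetical; swap in whichever models you have access to)
MODEL_TIERS = {
    ("openai", "fast"): "gpt-4o-mini",
    ("openai", "smart"): "gpt-4o",
    # ... the remaining provider/tier pairs would be filled in the same way
}

# Each persona carries the background context, bias, and agenda it votes with
personas = [
    {
        "name": "Skeptical CFO",
        "bias": "anchors on hard ROI numbers",
        "agenda": "protect this quarter's budget",
    },
    {
        "name": "Eager Champion",
        "bias": "overweights vendor relationships",
        "agenda": "ship a win before the next board meeting",
    },
]

# The full run is simply every provider/mode combination against every persona
runs = [
    {"provider": p, "mode": m, "persona": persona}
    for (p, m), persona in product(product(PROVIDERS, MODES), personas)
]
print(f"{len(runs)} simulated votes to collect")
```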
Fast Mode (e.g. GPT-4o mini)
Personas responding in Fast mode relied on basic, cheaper models.
Smart Mode (e.g. GPT-4o)
In contrast, agents in Smart mode used the pricier models and could engage in advanced, multilayered reasoning. For reference, GPT-4o costs roughly 16x more than GPT-4o mini, and GPT-4.5 (the most human-like one, reported to pass the Turing test) costs another 30x more than GPT-4o. (Reach out to us for a custom project request.)
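Taking those rough multipliers at face value, a quick back-of-the-envelope calculation shows how quickly the cost gap compounds. The figures below are the approximate ratios quoted above, not exact pricing.

```python
# Back-of-the-envelope cost multipliers, using the rough ratios quoted above
fast_unit_cost = 1.0               # e.g. GPT-4o mini as the baseline
smart_multiplier = 16              # GPT-4o vs. GPT-4o mini (approximate)
frontier_multiplier = 30           # GPT-4.5 vs. GPT-4o (approximate)

smart_cost = fast_unit_cost * smart_multiplier       # ~16x the baseline
frontier_cost = smart_cost * frontier_multiplier     # ~480x the baseline

print(f"Smart mode: ~{smart_cost:.0f}x, frontier tier: ~{frontier_cost:.0f}x the Fast baseline")
```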
After carefully preparing our audience with diverse personas, we prompted a simulated scenario: an internal memo put before the "impossible" buying committee. When you prompt in Rally, you ask all personas in the audience at once. Each role-playing agent took its background context into consideration, reviewed the memo, and cast a vote on whether it would support a buy or not.
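Conceptually, one persona's vote boils down to a single LLM call: its background as the system prompt, the memo as the task, and a structured vote back. The sketch below uses the OpenAI SDK directly to illustrate the idea; Rally handles this fan-out for you, and the prompt wording, the `cast_vote` helper, and the JSON shape are my own assumptions.

```python
# Rough sketch of what one persona "vote" looks like conceptually.
# Not Rally's implementation: prompt wording and JSON shape are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def cast_vote(persona: dict, memo: str, model: str = "gpt-4o-mini") -> dict:
    """Ask one role-playing persona to read the memo and vote buy / no-buy."""
    system = (
        f"You are {persona['name']}. Bias: {persona['bias']}. "
        f"Agenda: {persona['agenda']}. Stay in character."
    )
    user = (
        "Read the internal memo below and decide whether you would support the purchase.\n"
        f"---\n{memo}\n---\n"
        'Reply as JSON: {"vote": "buy" or "no_buy", "reasoning": "one paragraph in character"}'
    )
    response = client.chat.completions.create(
        model=model,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return json.loads(response.choices[0].message.content)
```

In the real run this happens for every persona in the audience at once; the point of the sketch is just to show what each agent is conditioning on before it votes.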
Rally then lets us drill in and review the results of our synthetic A/B test, with the added capability of reviewing each persona's inner thoughts and reasoning, illuminating that elusive “why” behind the behaviour.
[Example persona responses: OpenAI Fast and Anthropic Fast]
By running this across all our combinations of models and modes, asking every persona in the audience each time, we could see how the responses differed and learn how these agents navigated their own set of pressures and expectations.
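Once the votes are collected, comparing modes starts with a simple tally. A minimal sketch, assuming a flat list of result records (the record shape shown here is hypothetical):

```python
# Sketch of tallying collected votes by provider/mode to compare buy rates.
# `results` stands in for whatever you export from the run; the shape is assumed.
from collections import defaultdict

results = [
    {"provider": "openai", "mode": "fast", "persona": "Skeptical CFO", "vote": "no_buy"},
    {"provider": "openai", "mode": "smart", "persona": "Skeptical CFO", "vote": "no_buy"},
    {"provider": "anthropic", "mode": "fast", "persona": "Eager Champion", "vote": "buy"},
    # ... one record per persona x provider x mode
]

tally = defaultdict(lambda: {"buy": 0, "no_buy": 0})
for r in results:
    tally[(r["provider"], r["mode"])][r["vote"]] += 1

for (provider, mode), counts in sorted(tally.items()):
    total = counts["buy"] + counts["no_buy"]
    print(f"{provider:>10} / {mode:<5}  buy rate: {counts['buy'] / total:.0%}  (n={total})")
```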
Decoding The Responses
To assess the subtle nuances that distinguish Fast mode from Smart mode, I prompted o3-mini to generate classifications capturing the key elements of each response, and then re-reviewed each response through that lens.
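As a rough illustration of that decoding pass, here is what the classification step could look like as a single call. The dimensions and JSON shape below are an illustrative framing of the analysis, not the exact prompt used in the experiment.

```python
# Sketch of the "decoding" pass: tag each raw response along a few dimensions.
# Dimensions and prompt wording are illustrative, not the exact prompt used.
import json
from openai import OpenAI

client = OpenAI()

DIMENSIONS = [
    "demand_for_specificity",   # does it ask for numbers, KPIs, timelines?
    "treatment_of_ambiguity",   # denounced outright vs. conditionally tolerated
    "risk_focus",               # financial only vs. operational/integration
    "stakeholder_awareness",    # board credibility, reputation, internal politics
    "tone",                     # measured vs. emotionally charged
    "decision",                 # buy / no_buy / conditional
]

def classify_response(raw_response: str, model: str = "o3-mini") -> dict:
    prompt = (
        "Classify the buying-committee response below along these dimensions: "
        f"{', '.join(DIMENSIONS)}. "
        "Return JSON with one short string value per dimension.\n\n"
        f"RESPONSE:\n{raw_response}"
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Sketch assumes clean JSON back; in practice add structured outputs or retries.
    return json.loads(completion.choices[0].message.content)
```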
With two of these decoded response maps side by side, I asked o3-mini to compare them.
Then I ran a validation prompt on the comparison insight to check that it indeed reflected the actual responses.
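The validation pass can be sketched the same way: hand the comparison insight and the raw responses back to the model and ask whether the claim actually holds. The `validate_insight` helper and prompt wording here are illustrative, not the exact validation prompt.

```python
# Sketch of the validation pass: check a comparison insight against the raw
# responses it claims to summarize. Prompt wording is an assumption.
from openai import OpenAI

client = OpenAI()

def validate_insight(insight: str, raw_responses: list[str], model: str = "o3-mini") -> str:
    prompt = (
        "Here is a claimed insight comparing two sets of responses:\n"
        f"{insight}\n\n"
        "Here are the raw responses it is based on:\n"
        + "\n---\n".join(raw_responses)
        + "\n\nDoes the insight accurately reflect these responses? "
        "Answer 'supported' or 'unsupported', then give one sentence of evidence."
    )
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return completion.choices[0].message.content
```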
After completing these steps to compare Fast and Smart modes, we observed consistent patterns: Smart mode's advanced reasoning yielded richer, more diverse, and more context-aware outputs, mirroring the complex nature of human decision-making. This understanding can be critical when designing synthetic research that accurately captures the subtleties of real user behavior.
From this, I noted the following observations on how Fast and Smart modes differ in the context of this specific experiment:
Demand for Specificity and Quantifiable Data
Smart Mode: Agents using Smart mode consistently demanded concrete numbers, clear ROI figures, defined cost-savings, specific KPIs, and detailed implementation timelines. Their reasoning was replete with references to precise benchmarks and a checklist of requirements.
Fast Mode: In contrast, responses from Fast mode acknowledged the vagueness of proposals but were more likely to gloss over missing details by relying on high-level heuristics. When key details were absent, Fast agents uniformly rejected proposals—often marking them as a “hard pass.”
Enhanced Clarity and Rejection of Vague Promises
Smart Mode: The advanced reasoning model not only noted ambiguity (e.g., phrases like “figure out stuff”), but it also explicitly denounced it. Smart responses set non-negotiable conditions for reconsideration, listing out exactly what was needed for a proposal to be acceptable.
Fast Mode: While Fast mode recognized that a lack of detail was problematic, its responses tended to remain conditionally open. They were less demanding in terms of specificity, often taking a more binary stance that rejected ambiguity without an in-depth call to improve it.
Emphasis on Operational and Integration Risks
Smart Mode: Smart mode responses went well beyond financial risks to drill down into operational impacts—such as integration failure, scope creep, or the added workload that could disrupt existing processes. This detailed evaluation reflects a more holistic understanding of potential pitfalls.
Fast Mode: Fast mode agents did acknowledge risks, but they typically provided only a surface-level treatment of potential operational issues without the layered nuance of detailed integration concerns.
Heightened Stakeholder and Reputational Considerations
Smart Mode: The Smart model integrated context by embedding internal stakeholder pressures, reputational risk, and concerns over board credibility directly into the decision-making process. This resulted in responses that linked the proposal's vagueness to potential damage in trust and perception.
Fast Mode: In comparison, Fast mode’s discussion of stakeholder impacts was more cursory, often noting budget constraints or general skepticism without delving into the broader implications for organizational credibility.
Stronger and More Forceful Tone
Smart Mode: Smart responses were marked by heightened emotional intensity and assertiveness. Phrases like “reading with growing frustration” and vivid expressions underscored their urgent, no-nonsense approach to evaluating proposals.
Fast Mode: The tone in Fast mode, though direct, was measured and less emotionally charged. Responses were uniform and tended to lack the depth of personal or contextual emotion that characterized Smart responses.
More Stringent Overall Decision Orientation
Smart Mode: Overall, Smart mode set a much higher threshold for approval. Its responses were nearly categorical, demanding a complete, reworked proposal with comprehensive details before any positive consideration could be given.
Fast Mode: Conversely, Fast mode responses often left room for conditional acceptance, albeit with a uniform tendency to reject proposals that did not present clear, immediate value.
Wrapping up
With these insights, you now have an early framework for choosing the appropriate mode based on your research needs. By aligning the decision model with the complexity of the task, you can enhance the fidelity of your synthetic research and capture the true spectrum of human thought.
Fast mode appears to be best for simulating quick, heuristic-driven decisions (e.g. "What's the first brand that comes to mind when you read/see [x]?").
Smart mode is ideal when depth, nuance, and comprehensive risk evaluation are paramount, such as simulating a buying committee for your objection-handling motions.
What to test next?
If you’re hungry for more, come join our community and try running experiments of your own. One idea is to generate your own audience using personification and test whether the audience mirrors the distribution of known biases; i.e., if you know Gen Z tends to overindex on [x], do you see that in your simulated scenarios? One way to run that check is sketched below. If you want custom project support, reach out to us. And if you publish your own experiment results, ping me on X / LinkedIn so I can engage with them. Happy prompting.
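If you want to quantify that comparison rather than eyeball it, a goodness-of-fit test is one option. A minimal sketch using SciPy, with placeholder numbers standing in for a known benchmark and your simulated counts (they are not real survey or simulation results):

```python
# Sketch: does a simulated audience reproduce a known real-world distribution?
# The benchmark shares and observed counts below are placeholders, not real data.
from scipy.stats import chisquare

expected_share = {"brand_a": 0.55, "brand_b": 0.30, "brand_c": 0.15}  # known benchmark
observed_counts = {"brand_a": 48, "brand_b": 37, "brand_c": 15}       # from your simulation

total = sum(observed_counts.values())
f_obs = [observed_counts[k] for k in expected_share]
f_exp = [expected_share[k] * total for k in expected_share]

stat, p_value = chisquare(f_obs=f_obs, f_exp=f_exp)
print(f"chi-square p-value: {p_value:.3f}  (a low p suggests the simulated audience diverges from the benchmark)")
```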
-----
Author's Parting Note: At Rally, we strive to create a dojo of intellectual honesty, a place where we can all escape the echo chambers of groupthink and rigorously test our ideas. This experiment is a v1 of many more to come, but it has already generated ideas for what to test next and helped ensure that our synthetic research reflects the rich spectrum of human cognition. Based on this comparison, I developed a few key hypotheses:
Enhanced Contextual Awareness: Advanced models embed broader context and stakeholder dynamics into their evaluations.
Emotional Intensity Calibration: Smart mode exhibits a higher level of emotional expressiveness, likely to simulate higher stakes.
Consistency and Depth: Smart mode combines multiple evaluative criteria into a cohesive, rigorous decision framework, leading to more variable and nuanced outcomes.