Using Large Language Models to Create AI Personas for Replication, Generalization and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings
Leo Yeykelis, Kaavya Pichai, James J. Cummings, Byron Reeves
Published: 2024-08-28

🔥 Key Takeaway:
AI personas aren’t just a cheap substitute for humans—they’re actually more reliable for confirming strong, obvious effects than for detecting subtle or marginal ones, flipping the intuition that “AI is only good for fuzzy, directional answers” on its head.
🔮 TLDR
This study tested whether large language models (LLMs) can accurately replicate published marketing experiments by creating 19,447 AI personas to mirror human participants across 133 findings from 45 recent studies. Using exact stimuli, measures, and sample characteristics, the AI replications matched the direction and significance of human findings in 76% of main effects and 68% overall (including interactions), with highest success for strong original effects (83% for p < 0.001) and large effect sizes. However, replication dropped for marginal or small effects, and only 27% of interaction effects were reproduced—similar to known rates in human replications. Generalizability tests showed that even small changes in sample, stimuli, or context can meaningfully change outcomes, highlighting the risks of over-generalizing findings. The study found that LLMs offer substantial speed and cost savings (nearly 20,000 simulated participants in hours for tens of dollars), can match or exceed the accuracy of typical human replications, and are especially effective for robust, well-supported effects. Actionable insights: Use AI personas for rapid, large-scale replication and message testing, but be cautious with subtle or interaction effects and always test generalizability by varying samples and stimuli. For best accuracy, match persona characteristics and study parameters closely to the original research, and use LLMs to prioritize which findings need human follow-up, especially when AI and human results diverge.
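The matching rule described above (same direction and same significance as the human finding) can be sketched as a small check. This is only an illustration: the 0.05 threshold, the `Finding` class, the choice to count two null results as agreement, and all numbers below are assumptions, since this summary does not spell out the paper's exact scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One effect estimate: a signed effect size plus its p-value."""
    effect: float
    p_value: float


def replicates(human: Finding, ai: Finding, alpha: float = 0.05) -> bool:
    """True when the AI result agrees with the human result in significance
    (at the assumed alpha) and, if both are significant, in direction."""
    human_sig = human.p_value < alpha
    ai_sig = ai.p_value < alpha
    if human_sig != ai_sig:
        return False
    if not human_sig:
        # Both results are null: treated as agreement under this assumed rule.
        return True
    # Both significant: the effects must point the same way.
    return (human.effect > 0) == (ai.effect > 0)


def replication_rate(pairs: list[tuple[Finding, Finding]]) -> float:
    """Share of (human, AI) pairs that replicate."""
    if not pairs:
        raise ValueError("need at least one pair")
    return sum(replicates(h, a) for h, a in pairs) / len(pairs)


# Made-up numbers for illustration only.
pairs = [
    (Finding(0.42, 0.001), Finding(0.51, 0.0004)),  # same sign, both significant
    (Finding(0.15, 0.03), Finding(0.02, 0.40)),     # AI result not significant
    (Finding(-0.30, 0.01), Finding(0.25, 0.02)),    # opposite sign
    (Finding(0.20, 0.004), Finding(0.18, 0.01)),    # same sign, both significant
]
rate = replication_rate(pairs)
print(f"replication rate: {rate:.0%}")
```

Grouping the pairs by the strength of the original effect before calling `replication_rate` would give the kind of breakdown reported above for strong versus marginal findings.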
📊 Cool Story, Needs a Graph
Figure 1: Example Human Study and AI Replication: Packaging Design Effects

Direct comparison of human and AI persona responses across four key outcome measures in a packaging design experiment.
Figure 1 presents four side-by-side bar charts that directly compare the performance of human participants (blue bars) and AI personas (green bars) on willingness to pay, few-ingredients inferences, perceived product purity, and design attractiveness in a packaging design experiment. Each chart shows the mean response and statistical significance for both groups under identical experimental conditions, with error bars (±1 SE) and embedded images of the actual stimuli used. This visualization enables an at-a-glance assessment of the alignment (or divergence) between the proposed AI persona simulation method and traditional human-panel results within the same experimental framework.
⚔️ The Operators Edge
A detail many experts might overlook is how rigorously the study matched the sample characteristics of AI personas to the exact distributions reported in the original human studies: not just rough demographic quotas, but nuanced combinations of age, experience, and context for each experiment. This level of sample mirroring was enforced for every replication, with prompts specifying not just, say, "a 45-year-old woman," but "a 45-year-old woman with 20 years of managerial experience in the manufacturing industry," as described in their methods section.
Why it matters: This precision in persona construction is critical because it lets the AI simulation capture the same heterogeneity, subgroup effects, and real-world context that drive outcomes in human research. It avoids the generic, "average respondent" trap that makes many synthetic studies bland and uninformative. The closer your AI personas mirror the true sample structure (including minor segments and quirky outliers), the more likely your results will hold up under scrutiny or in real deployment.
Example of use: Imagine a CPG brand wants to simulate reactions to a new product line among U.S. parents. Instead of just asking the AI to "role-play a parent," they build prompts that reflect the actual mix in their customer data: single dads in urban areas, bilingual moms with three kids under 10, and older parents who buy for teenagers. This lets them spot which subgroups react positively or negatively, and design targeted messaging for each.
Example of misapplication: A team rushes to simulate a new service launch by telling the AI to "act like a typical millennial," ignoring key nuances like region, occupation, or household size. The model generates plausible-sounding but ultimately generic feedback. Later, a real-world pilot fails because city-dwelling single millennials responded totally differently from suburban married ones, an error that could have been avoided by mirroring the actual sample structure in the prompts.
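One way to mirror a sample structure is to turn the reported segment mix into exact persona quotas before writing any prompts. The sketch below is a minimal illustration, not the authors' procedure: the segment weights are invented, and `allocate_quotas` and `build_panel` are hypothetical helper names. It uses largest-remainder rounding so that the quotas always add up to the requested panel size.

```python
import random

# Hypothetical segment weights (e.g. counts out of 100 customers); real values
# would come from the original study's sample description or customer data.
SEGMENT_WEIGHTS = {
    "single dad in an urban area": 20,
    "bilingual mom with three kids under 10": 30,
    "older parent who buys for teenagers": 50,
}


def allocate_quotas(weights: dict[str, int], n: int) -> dict[str, int]:
    """Split n personas across segments in proportion to their weights,
    using largest-remainder rounding so the quotas sum to exactly n."""
    total = sum(weights.values())
    parts = {seg: divmod(n * w, total) for seg, w in weights.items()}
    quotas = {seg: int(whole) for seg, (whole, _) in parts.items()}
    leftover = n - sum(quotas.values())
    # Give the remaining slots to the segments with the largest remainders.
    ranked = sorted(parts, key=lambda seg: parts[seg][1], reverse=True)
    for seg in ranked[:leftover]:
        quotas[seg] += 1
    return quotas


def build_panel(weights: dict[str, int], n: int, seed: int = 0) -> list[str]:
    """Expand the quotas into a shuffled list of persona descriptions."""
    panel = [seg for seg, k in allocate_quotas(weights, n).items() for _ in range(k)]
    random.Random(seed).shuffle(panel)  # seeded so runs are reproducible
    return panel


quotas = allocate_quotas(SEGMENT_WEIGHTS, 10)
panel = build_panel(SEGMENT_WEIGHTS, 10)
print(quotas)
```

Fixed quotas keep small segments from vanishing by chance in a small panel. Random sampling with `random.choices` would be the alternative when natural sampling variation is wanted.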
🗺️ What are the Implications?
• AI personas can rapidly replicate most human study findings, saving time and cost: The study found that AI-based simulated participants replicated 76% of main effects from human experiments in hours and at a fraction of the traditional cost, making it feasible to pre-test campaigns or concepts before investing in large-scale human research.
• Use human interview data or detailed sample specs—few-shot learning improves realism: Prior research and this study suggest that feeding AI personas even short summaries or “few-shot” examples from real interviews or detailed sample characteristics makes simulated responses much closer to actual human outcomes, especially for nuanced or segmented markets.
• Replicate robust, high-confidence effects with AI; validate subtle or marginal ones with humans: AI personas are very accurate for strong, statistically significant effects, but less reliable for weak or borderline findings—so use them as a “first-pass” for broad message testing, but plan targeted human validation for subtle, high-risk, or high-stakes decisions.
• Test generalizability by running AI simulations across multiple audiences, contexts, and products: The study showed that changing sample demographics or product type can shift results, so run several simulations with varied persona backgrounds, regions, or scenarios to spot where effects may not generalize to your true market.
• Be cautious with complex interactions or niche groups: AI replications struggled to match findings involving multi-factor interactions or very narrow segments; supplement with human input or oversample these groups in simulations for better coverage.
• Use AI for early-stage idea filtering and iteration: Because synthetic panels are so quick and cheap, you can test many variations of messaging, packaging, or concepts in parallel—then invest in human studies only for the most promising options.
• Document and benchmark your AI simulation approach: Clearly record how you construct personas, what real-world data you include, and how closely your AI results match available human benchmarks—this builds credibility and helps you continually improve your synthetic audience design.
📄 Prompts
Prompt Explanation: The AI was instructed to generate a unique persona based on sampled demographic characteristics, present them with study stimuli, and have them answer research questions in the exact manner as human participants in the original experiment.
Imagine you are considering purchasing the product shown.
You are a 20 year old Female. [Other attributes extracted from the original publication, such as demographics, psychographics, and preferences would be written as sentences and inserted here.]
[Stimulus Information (Claude 3.5 Sonnet can process images directly)]
You are viewing the packaging design for a product. [Any other context would be inserted here; for example, “you are viewing this package design in an online questionnaire.”]
Please answer the following questions based on the packaging design.
Please provide your responses to the following:
1. Q: What is the highest price you would be willing to pay for this product? Scale: Dollar Amount
2. Q: I think this product contains few ingredients. Scale: 1=strongly disagree, 9=strongly agree
3. Q: I think this product is not mixed with many ingredients. Scale: 1=strongly disagree, 9=strongly agree
… [other measures / questions were inserted here]
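A prompt with this layout can be assembled programmatically from sampled persona attributes and a list of measures. The sketch below is an assumption about how that might be done, not the authors' code: `Persona`, `Question`, and `build_prompt` are hypothetical names, and the image stimulus is left out because attaching it depends on the model API in use.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    age: int
    gender: str
    # Further attributes from the original publication, already written as sentences.
    extra_sentences: list[str] = field(default_factory=list)


@dataclass
class Question:
    text: str
    scale: str


def build_prompt(persona: Persona, context: str, questions: list[Question]) -> str:
    """Assemble the persona line, the viewing context, and numbered measures."""
    identity = f"You are a {persona.age} year old {persona.gender}."
    lines = [
        "Imagine you are considering purchasing the product shown.",
        " ".join([identity, *persona.extra_sentences]),
        context,
        "Please answer the following questions based on the packaging design.",
        "Please provide your responses to the following:",
    ]
    for number, question in enumerate(questions, start=1):
        lines.append(f"{number}. Q: {question.text} Scale: {question.scale}")
    return "\n".join(lines)


prompt = build_prompt(
    Persona(age=20, gender="Female"),
    "You are viewing the packaging design for a product.",
    [
        Question(
            "What is the highest price you would be willing to pay for this product?",
            "Dollar Amount",
        ),
        Question(
            "I think this product contains few ingredients.",
            "1=strongly disagree, 9=strongly agree",
        ),
    ],
)
print(prompt)
```

Keeping the measures as data rather than hard-coded text makes it easier to reuse the original study's exact wording and scales across every simulated participant.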
Prompt Explanation: The AI was prompted to role-play as a participant recruited through specific real-world contexts, with situational details embedded to simulate alternate research environments for generalization testing.
While walking through a shopping mall, you were recruited by a research service vendor representative. You were then escorted to a quiet space where you completed this study via a tablet device.
Prompt Explanation: The AI was asked to adopt a specific recruitment scenario, modifying its persona’s context to simulate being recruited in a university setting.
While walking through a university quad, you were recruited by a research service vendor representative. You were then escorted to a quiet space where you completed this study via a tablet device.
Prompt Explanation: The AI was instructed to simulate a persona recruited online via Reddit, embedding the social media context into the role-play.
You have been recruited through a study advertisement found on Reddit, in a subreddit you frequent.
Prompt Explanation: The AI was asked to role-play a participant recruited to a university laboratory setting, with environmental details provided to simulate the lab context.
You have been recruited to come to a university laboratory to look at pictures of advertisements on a screen. There is a one-way mirror on the wall. You are in the room alone, by yourself. An experimenter comes in to give you instructions and then leaves before the viewing begins.
Prompt Explanation: The AI was prompted to simulate a participant with age-restricted demographics to model responses from Gen Z.
[Persona metadata: Sample age restricted to 18-22 years]
Prompt Explanation: The AI was prompted to simulate a participant with age-restricted demographics to model responses from seniors.
[Persona metadata: Sample age restricted to 55-80 years]
Prompt Explanation: The AI was given an “expert” persona description, embedding domain-specific experience to simulate specialized consumer responses.
You are an experienced digital photographer and have owned several different types of digital cameras. Some of your pictures have won awards. You like to stay abreast of all the new developments in digital photography.
Prompt Explanation: The AI was asked to simulate a persona with a specific geographic and cultural background for regional generalizability testing.
You reside in the rural Deep South of the United States.
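For generalization testing, the context snippets above can be kept in a lookup table and combined with one base prompt to produce a variant per setting. This is a rough sketch under assumptions: the snippets are copied from the prompts above, but `make_variants` is a hypothetical helper, and placing the context before the base prompt is a guess, since the summary does not say where the authors inserted it.

```python
# Context snippets copied from the prompts above; the dictionary keys are
# made-up labels used only to name each variant.
RECRUITMENT_CONTEXTS = {
    "mall": (
        "While walking through a shopping mall, you were recruited by a research "
        "service vendor representative. You were then escorted to a quiet space "
        "where you completed this study via a tablet device."
    ),
    "reddit": (
        "You have been recruited through a study advertisement found on Reddit, "
        "in a subreddit you frequent."
    ),
    "deep_south": "You reside in the rural Deep South of the United States.",
}


def make_variants(base_prompt: str, contexts: dict[str, str]) -> dict[str, str]:
    """Return the unmodified prompt plus one copy per context snippet.

    Prepending the context is an assumption made for this sketch.
    """
    variants = {"original": base_prompt}
    for name, context in contexts.items():
        variants[name] = f"{context}\n{base_prompt}"
    return variants


base = "You are a 20 year old Female.\nPlease answer the following questions."
variants = make_variants(base, RECRUITMENT_CONTEXTS)
for name, text in variants.items():
    print(f"--- {name} ---\n{text}\n")
```

Running the same measures over every variant and comparing the resulting effects is what reveals whether a finding depends on the recruitment setting or sample.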
⏰ When is this relevant?
A beverage company wants to test how three different packaging designs for a new sparkling water are perceived by health-conscious young adults, busy working professionals, and parents buying for families. The goal is to simulate customer survey responses to identify which packaging drives the strongest purchase intent and positive perceptions.
🔢 Follow the Instructions:
1. Define your audience segments: Write short, clear descriptions for each segment. For example:
• Health-conscious young adult: 26, lives in a city, exercises regularly, reads ingredient labels, prefers low-sugar drinks.
• Busy working professional: 37, commutes, often buys on-the-go, values convenience, open to new products if easy to try.
• Parent buying for family: 41, two children, shops for household, prioritizes value and child-friendly options, checks for health claims.
2. Prepare the product and test materials: Gather images or descriptions of the three packaging designs. Write a short, neutral summary for each (e.g., “Design A is a bright, minimalist can with ‘zero sugar’ label; Design B features bold fruit imagery; Design C uses eco-friendly, recycled materials and messaging.”)
3. Create a prompt template for AI personas: Use this structure:
You are simulating a [persona description].
Here is the product being tested: [insert brief description of packaging design].
Imagine you are shopping for a drink and see this on the shelf.
Answer the following survey questions honestly, as this persona:
1. What is your immediate reaction to the packaging?
2. Does this design make you more or less likely to buy the drink? Why?
3. What feature or message on the package stands out to you most?
4. Is there anything about the packaging that would discourage you from trying it?
(Respond with 2–4 sentences for each question.)
4. Run prompts for each persona and packaging combination: For each audience segment and each packaging design, send the tailored prompt to your AI model (like GPT-4 or Claude). Example:
- Health-conscious young adult + Design A
- Busy professional + Design B
- Parent + Design C
Repeat to get 3–5 varied responses per combination (adjust temperature or rephrase slightly for diversity).
5. Organize and compare responses: Collect all the answers in a spreadsheet. Tag each response for sentiment (positive/neutral/negative), purchase intent, and any repeated themes (e.g., “noted eco-friendly,” “likes clear labeling,” “concerned about kids’ appeal”).
6. Summarize findings for each segment: For each audience type, note which packaging design gets the most positive reactions and why. Identify what messaging or features drive these responses and note any consistent dislikes.
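Steps 3 to 6 can be sketched as a small script: fill the template for every segment and design, send each prompt several times, tag the replies, and tally by segment. Everything here is illustrative. `ask_model` stands for whatever client function calls your model, `fake_model` is a stub so the example runs offline, and the keyword-based `tag_intent` is a crude stand-in for careful manual or model-assisted tagging.

```python
from collections import Counter
from itertools import product
from typing import Callable

SEGMENTS = {
    "young_adult": "health-conscious young adult: 26, lives in a city, "
                   "exercises regularly, reads ingredient labels, prefers low-sugar drinks",
    "professional": "busy working professional: 37, commutes, often buys on-the-go, "
                    "values convenience, open to new products if easy to try",
    "parent": "parent buying for family: 41, two children, shops for household, "
              "prioritizes value and child-friendly options, checks for health claims",
}
DESIGNS = {
    "A": "a bright, minimalist can with a 'zero sugar' label",
    "B": "a can with bold fruit imagery",
    "C": "a can using eco-friendly, recycled materials and messaging",
}
TEMPLATE = (
    "You are simulating a {persona}.\n"
    "Here is the product being tested: {design}.\n"
    "Imagine you are shopping for a drink and see this on the shelf.\n"
    "Answer the following survey questions honestly, as this persona:\n"
    "1. What is your immediate reaction to the packaging?\n"
    "2. Does this design make you more or less likely to buy the drink? Why?\n"
    "3. What feature or message on the package stands out to you most?\n"
    "4. Is there anything about the packaging that would discourage you from trying it?\n"
    "(Respond with 2-4 sentences for each question.)"
)


def build_jobs(repeats: int = 3) -> list[tuple[str, str, str]]:
    """One (segment, design, prompt) job per combination, repeated for variety."""
    jobs = []
    for (seg, persona), (name, design) in product(SEGMENTS.items(), DESIGNS.items()):
        prompt = TEMPLATE.format(persona=persona, design=design)
        jobs.extend((seg, name, prompt) for _ in range(repeats))
    return jobs


def tag_intent(response: str) -> str:
    """Very rough keyword tagger; a real analysis would need something better."""
    text = response.lower()
    if "less likely" in text:
        return "negative"
    if "more likely" in text:
        return "positive"
    return "neutral"


def run(jobs: list[tuple[str, str, str]], ask_model: Callable[[str], str]) -> Counter:
    """Send every prompt and count replies by (segment, design, intent tag)."""
    tally: Counter = Counter()
    for seg, design, prompt in jobs:
        tally[(seg, design, tag_intent(ask_model(prompt)))] += 1
    return tally


def best_design(tally: Counter, segment: str) -> str:
    """Design with the most positive-intent replies for one segment."""
    return max(DESIGNS, key=lambda d: tally[(segment, d, "positive")])


def fake_model(prompt: str) -> str:
    """Offline stub standing in for a real model call."""
    if "zero sugar" in prompt:
        return "More likely to buy; the label is clear."
    return "No strong feeling either way."


jobs = build_jobs(repeats=3)
tally = run(jobs, fake_model)
print(best_design(tally, "parent"))
```

Swapping `fake_model` for a function that calls your model keeps the rest of the pipeline unchanged, and the tally can be exported to a spreadsheet for the tagging and comparison described in steps 5 and 6.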
🤔 What should I expect?
You’ll get a clear, customer-segmented view of which packaging design is most appealing, what specific elements influence purchase intent, and where any objections or confusion might occur—giving you actionable direction on which design to move forward with, how to position it, and what to improve before launch or further testing with real shoppers.