ChatGPT & Co. in Practical Testing

Generative AI in the development and validation of advertising materials

Study From Here, owned by Deep Mistry in partnership with Mayur Bhatt, tested how AI language models can support advertising research, with the goal of making ad campaigns faster and better. Mayur Bhatt, Head of Research & Insights, led the tests and worked with several AI tools to find out what works best.

He learned that AI can save time and generate smart ideas for ads, but he also found problems: AI tools don't always understand people well, and they sometimes give wrong answers. Mayur says researchers must check AI results carefully and know how to use the tools properly.

Mayur expects AI to become a big part of advertising work: it helps teams work faster and opens up new ways to plan and test ads. But people must still apply their own judgment, because AI is a helper, not a replacement. At Study From Here, the team will keep learning and testing with AI.

We've been using ChatGPT for some time: for desk research, to collect sources and existing studies on a relevant topic, to develop study designs and questionnaire structures, and also to generate and correct syntax commands for descriptive data analysis. In addition to ChatGPT, we also use Copilot and Claude from Anthropic.

Now we wanted to know whether generative AI tools could also support us in developing and validating advertising materials. Our goal is to create advertising materials focused on the target group and to identify optimization opportunities as early as possible, so that we can discuss them with the agency and campaign management. Over the past few months, we have therefore been testing various possible uses of large language models in advertising effectiveness research.

Use of language models in advertising effectiveness research

ChatGPT in particular has proven to be reliable support. However, the human in the loop is crucial: first, ChatGPT must be guided carefully by market researchers, and second, its results must then be validated through traditional market research. Our tests demonstrate both the potential and the limitations of AI: results can vary from day to day, and not all applications function smoothly. AI results should therefore always be viewed from a critical market research perspective. Furthermore, AI models are trained on data that contains various distortions (biases), so they generally reflect mainstream opinion. Statements about innovations for specific target groups should therefore be treated with caution. An AI can reveal aspects we may not have considered ourselves, or reinforce our own assessments, but the final evaluation and interpretation of the results must be made by experts with market research expertise. Below, we present some applications we tried out in the Research & Insights team where large language models can be helpful.

1. Ad pretesting – fast feedback with optimization potential

With ChatGPT, as with other large language models (we also tested Claude, Copilot, and Gemini), ads can be uploaded and analyzed against various criteria. The rule of thumb is: the more precise the prompt, the better the results. For example, a precise description of the campaign topic and target audience matters, so that the AI can assess how well the ad works with the intended audience; Gen Z is addressed differently and has different needs than older adults.
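A precise prompt can be assembled from a small, structured brief. The sketch below is illustrative only: the `PretestBrief` class, the `build_pretest_prompt` function, and the prompt wording are hypothetical and were not part of our tests. It simply shows one way to make sure the campaign topic and target audience are always stated explicitly.

```python
from dataclasses import dataclass


@dataclass
class PretestBrief:
    """Hypothetical container for the campaign context a pretest prompt should carry."""
    campaign_topic: str
    target_audience: str
    criteria: tuple[str, ...] = (
        "core message",
        "strengths",
        "weaknesses",
        "target audience appeal",
        "optimization suggestions",
    )


def build_pretest_prompt(brief: PretestBrief) -> str:
    """Assemble a pretest prompt that always names topic, audience, and criteria.

    The wording is an illustrative assumption, not a validated template.
    """
    criteria_lines = "\n".join(f"- {c}" for c in brief.criteria)
    return (
        f"You are reviewing an advertisement for the campaign topic: {brief.campaign_topic}.\n"
        f"The intended target audience is: {brief.target_audience}.\n"
        "Evaluate the attached ad for this audience under each of the following criteria:\n"
        f"{criteria_lines}\n"
        "Justify every point with reference to the audience described above."
    )


print(build_pretest_prompt(PretestBrief("summer sale", "Gen Z, aged 18 to 25")))
```

Keeping the brief as a separate object makes it easy to run the same ad against several audience descriptions and compare the answers.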

ChatGPT with GPT-4o performed particularly well in our tests of various large language models. A particular advantage is that it can process several uploaded images at once, whereas all the other models we tested could only process one image at a time. ChatGPT also excels at interpretation and analysis, particularly of humor, implicit messages, and irony. It analyzes the ad's strengths, weaknesses, target audience appeal, and core messages, and suggests optimizations.

AI does not replace traditional market research, but rather complements it meaningfully 

We used ChatGPT to analyze ads that had previously been tested in traditional pretests. ChatGPT is fast and accurate at summarizing content and at identifying potential improvements in the message, design, and call to action. It also identified aspects of diversity and made suggestions for improvement. This is very helpful as a first indicator for optimizing advertising materials. A special feature of ChatGPT is its integration of the AI image generator DALL-E, which allows alternative suggestions to be generated quickly.

To verify the performance and reliability of ChatGPT as a rapid pretesting indicator, we also tested publicly available third-party ads with implicit messages, irony, and humor, in addition to our own ads. One example was an ad from a rental car company that depicted the head of the train drivers' union, alluding to the rail strike. ChatGPT recognized the person and correctly inferred the connection between the rail strike and the need to rent a car. Claude, on the other hand, did not make this connection and missed the ad's irony.

ChatGPT also analyzed a second test motif with an implicit message correctly. In this ad, a toy manufacturer recreates the classic railing scene from the film "Titanic" with stuffed animals. ChatGPT: "This is emphasized by the depiction of two stuffed animals in a scene reminiscent of the famous film 'Titanic.'" Claude, on the other hand, made no association with the film: "The core message seems to be that, in addition to toys, you can also buy classic, popular children's films on DVD."

These tests demonstrate the great potential of AI systems in advertising effectiveness research. AI-supported analyses enable the rapid identification of optimization potential, which can and should then be validated through traditional pretests. The two methods complement each other: while AI provides an analytical perspective, surveys with test subjects reveal the emotions an ad arouses in the target group. Please note: ChatGPT's knowledge comes from its training data, which is broad in content but ends at a cutoff date. The latest innovations and trends must therefore be supplied explicitly in the prompt. Even so, the analysis can identify potential strengths and weaknesses of an advertising medium, which can serve as a basis for further discussion and optimization.

However, the varying performance of the large language models also highlights the need to examine AI output critically at all times, and to interpret and adapt it using one's own market research expertise. Only the combination of human expertise and AI support provides a sound foundation for advertising effectiveness research.

2. A/B Testing with Language Models – Potential with Limitations

We then explored whether AI systems like ChatGPT can help assess the success potential of different ads. Often there are several good advertising options to choose from, and data-based validation can reveal which ad is likely to have the greatest potential with the target audience. Traditionally, this is done through an A/B test with an online panel, and we used such a test as the benchmark for evaluating ChatGPT's performance in A/B testing.

First, we conducted a traditional A/B test, which identified a clear winner. We then had ChatGPT perform the same test. At first, the tool had difficulty committing to one creative, so we used a prompt that forced ChatGPT to name a clear winner. Both tests, the traditional and the AI-assisted, produced the same result. However, ChatGPT recommended an additional traditional A/B test to verify the result and to measure KPIs such as click-through rate, recall, and likeability.
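The article does not say which statistic the panel test used to declare a winner. One common option, shown here as an assumption, is a two-proportion z-test on a metric such as click-through rate measured in two independent panel cells. The function name and the example counts below are hypothetical.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for H0: rate_a == rate_b, using the pooled standard error.

    Returns (z, p_value). Assumes independent samples large enough for the
    normal approximation to hold.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("both variants have a rate of exactly 0 or 1; the test is undefined")
    z = (rate_a - rate_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With hypothetical counts of 60 clicks out of 100 views for ad A and 40 out of 100 for ad B, `two_proportion_z_test(60, 100, 40, 100)` gives z ≈ 2.83 and p ≈ 0.005, which would usually count as a clear winner at the 5% level.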

A key factor for the success of the AI-powered A/B test is the precise definition of the target audience. Without this information, ChatGPT found it difficult to make an informed decision. ChatGPT: "Ad A appears to more effectively appeal to the younger target audience with its modern and energetic design" versus "Ad B effectively appeals to the older target audience who value safety and comfort."
    
To check the consistency of the results, we repeated the test 25 times under identical conditions (same ads, arrangement, and prompt). In each run, ChatGPT chose the same winning variant with a similar justification, indicating high reliability. To assess the variance of the responses, we had large language models, including Claude, Copilot, and ChatGPT itself, evaluate the results of all 25 A/B tests and the similarity of their justifications, rating similarity on a 10-point scale (1 = identical, 10 = no similarity at all). The models confirmed the consistency of the analyses. Copilot awarded a 2: "Based on the summary of the test results, it appears that the analysis and conclusion were fairly consistent across the different tests." ChatGPT gave itself a 3: "The analysis and results in the 25 chat histories are very similarly structured and show only little variance in wording and details. Overall, they are consistent in their conclusion in favor of Ad B." Claude also rated the similarity a 3: "Overall, the core arguments and test results are very consistent, even if the wording varies slightly. Despite individual differences in the analysis, almost all analyses arrive at a very similar result and a clear recommendation for Ad 2 or B. The consistency of the core arguments is high."
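We rated the similarity of the 25 justifications with other language models. A simpler, deterministic cross-check, swapped in here as an assumption rather than the method we used, is to count how often the same winner was chosen and to compare the justification texts with Python's `difflib.SequenceMatcher`. The function names are hypothetical, and `SequenceMatcher` only measures overlap in wording, not in meaning. The last helper maps a 0 to 1 similarity onto the 1 (identical) to 10 (no similarity) scale used above.

```python
from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations
from statistics import mean


def winner_agreement(winners: list[str]) -> tuple[str, float]:
    """Return the most frequently chosen variant and the share of runs that chose it."""
    if not winners:
        raise ValueError("no runs given")
    variant, count = Counter(winners).most_common(1)[0]
    return variant, count / len(winners)


def mean_pairwise_similarity(texts: list[str]) -> float:
    """Average character-level SequenceMatcher ratio over all pairs (1.0 = identical wording)."""
    pairs = list(combinations(texts, 2))
    if not pairs:
        return 1.0
    return mean(SequenceMatcher(None, a, b).ratio() for a, b in pairs)


def to_ten_point_scale(similarity: float) -> float:
    """Map a 0..1 similarity onto the 1 (identical) to 10 (no similarity) scale."""
    return 1 + 9 * (1 - similarity)
```

A deterministic measure like this does not replace expert review, but unlike an LLM judge it returns the same number every time it is run on the same texts.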

The consistency of these repeated runs seemed promising to us. However, further tests showed that the results can vary depending on the creative. For example, we conducted an A/B test with creatives in which only the person, a man or a woman, was swapped. Here the conclusions were inconsistent, and they very often cited gender roles, the form of address, or a male or female target group as the reason for choosing Creative A or Creative B, which indicates a possible gender bias in the AI. This illustrates that A/B tests with large language models must be critically examined and validated. Nevertheless, AI-supported A/B tests provide a useful basis of information for assessing and optimizing the effectiveness of advertising materials.
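To make the gender-swap observation measurable, one could count how many justifications refer to gender at all. The sketch below is a hypothetical keyword check: the `GENDER_TERMS` list is an assumption, English-only, and far cruder than a proper content analysis, but it turns "very often cited gender" into a rate that can be compared across runs and creatives.

```python
import re

# Hypothetical, English-only term list; a real analysis would need a reviewed lexicon.
GENDER_TERMS = ("man", "men", "woman", "women", "male", "female", "gender")


def gender_mention_rate(justifications: list[str], terms: tuple[str, ...] = GENDER_TERMS) -> float:
    """Share of justifications that mention at least one term as a whole word (case-insensitive)."""
    if not justifications:
        return 0.0
    # \b word boundaries keep "man" from matching inside words like "management".
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE)
    hits = sum(1 for text in justifications if pattern.search(text))
    return hits / len(justifications)
```

Comparing this rate between the gender-swapped test and a test where only neutral elements change would show whether gender is doing unusual work in the model's reasoning.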
 

3. Expanded possibilities with multi-format support

Pretesting and A/B testing can also be applied to other media formats. However, among the large language models we tested, only ChatGPT could analyze video via YouTube links. It provides a concise list of core messages, target audiences, strengths, and weaknesses, and offers optimization suggestions. This enables the analysis of TV spots, both your own and competitors', to optimize a campaign early on. Video generation is expected to be integrated into future versions of ChatGPT. ChatGPT does not, however, analyze audio files, so radio spots cannot currently be tested this way.

4. Reliably find areas of interest in an ad

An ad's structure is also crucial for its success and impact: certain areas immediately catch the viewer's eye, while others receive less attention. We tested this aspect of advertising effectiveness with ChatGPT as well. We marked various "areas of interest" (AOIs) in ads, such as the logo, subtext, headline, or face, and numbered them. ChatGPT's task was to predict viewers' gaze paths and fixation points. ChatGPT cannot observe how real viewers look at an ad, but it can make statements and predictions about how people are likely to react to certain design elements, based on general knowledge and principles of visual perception and advertising psychology.

The result is satisfactory and offers potential. ChatGPT was able to identify various AOIs and provide plausible arguments as to why viewers would, for example, look first at the headline, then at the face, and finally at the subtext. These theoretical predictions can be used to rethink and optimize the ad structure. ChatGPT also offers concrete suggestions for improvement, such as how a subline can be made more noticeable through adjustments to font, color, and contrast.
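If a reference gaze order from a conventional study, such as an eye-tracking test, were available for the same numbered AOIs, ChatGPT's predicted order could be compared with it using Kendall's rank correlation. This is a sketch under that assumption; we did not run such a comparison, and the AOI names in the example are placeholders.

```python
from itertools import combinations


def kendall_tau(predicted: list[str], observed: list[str]) -> float:
    """Kendall's tau between two orderings of the same AOIs (no ties allowed).

    Returns +1.0 for identical order and -1.0 for a fully reversed order.
    """
    if (len(set(predicted)) != len(predicted) or len(predicted) != len(observed)
            or set(predicted) != set(observed)):
        raise ValueError("both orderings must list the same distinct AOIs exactly once")
    n_pairs = len(predicted) * (len(predicted) - 1) // 2
    if n_pairs == 0:
        raise ValueError("at least two AOIs are needed")
    position = {aoi: i for i, aoi in enumerate(observed)}
    concordant = discordant = 0
    for a, b in combinations(predicted, 2):
        # combinations() preserves input order, so a precedes b in the predicted path.
        if position[a] < position[b]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / n_pairs


print(kendall_tau(["headline", "face", "subtext"], ["face", "headline", "subtext"]))
```

A value close to +1 would support the model's predicted gaze path; a value near 0 would suggest its reasoning, however plausible it sounds, does not match how viewers actually look at the ad.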

5. Synthetic heatmap for visualizing the AOIs

ChatGPT can create synthetic heatmaps that highlight certain areas of an ad as more attention-grabbing based on psychological criteria. These heatmaps draw on the model's world knowledge to predict which parts of an ad are likely to receive the most visual attention. Warm colors in the heatmap indicate attention-grabbing areas, while cool colors represent areas that receive less attention. It is important to emphasize that these synthetic heatmaps do not capture emotional responses and can only serve as rough indications. They are based on general principles of visual perception and do not account for individual differences or viewers' emotional reactions. The synthetic heatmap should therefore only be viewed as a supplementary tool that can provide clues to potential strengths and weaknesses of the ad structure.
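The principle behind a synthetic heatmap can be sketched in a few lines. Attention weights assigned to AOI centers (for example, derived from a model's ranking; the weights, grid size, and spread `sigma` are all assumptions) are spread out as Gaussian blobs, normalized, and split into warm and cool areas. The function names are hypothetical. Note that the output merely re-expresses the assumed weights; like the synthetic heatmaps described above, it is a rough indication, not a measurement of real viewers.

```python
import math


def synthetic_heatmap(width: int, height: int,
                      aois: list[tuple[float, float, float]],
                      sigma: float = 2.0) -> list[list[float]]:
    """Build an attention grid from (x, y, weight) AOI centers using Gaussian blobs.

    Values are scaled to 0..1 so that the most attention-grabbing cell is 1.0.
    """
    grid = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            for x, y, weight in aois:
                dist_sq = (col - x) ** 2 + (row - y) ** 2
                grid[row][col] += weight * math.exp(-dist_sq / (2 * sigma ** 2))
    peak = max(max(r) for r in grid)
    if peak > 0:
        grid = [[v / peak for v in r] for r in grid]
    return grid


def warm_cool(grid: list[list[float]], threshold: float = 0.5) -> list[list[str]]:
    """Label each cell 'warm' (attention-grabbing) or 'cool' using a fixed threshold."""
    return [["warm" if v >= threshold else "cool" for v in r] for r in grid]


# Hypothetical example: a strong headline AOI at (2, 2) and a weaker subtext AOI at (7, 7).
labels = warm_cool(synthetic_heatmap(10, 10, [(2, 2, 1.0), (7, 7, 0.3)]))
```

In practice the grid would be drawn over the ad with a color map; the warm/cool threshold stands in for that color scale.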

Please note: bias, data timeliness and data protection

AI systems like ChatGPT offer valuable analyses and insights for advertising research. However, several important factors must be considered.

  • AI bias: ChatGPT generates answers that are often based on "averages." Opinions and perspectives outside the norm can usually only be elicited through explicit prompting. This bias can lead to analyses that do not consider all target groups equally. It is therefore important to always examine the results critically and, if necessary, use targeted prompts to gain a broader perspective.
  • Data freshness: GPT-4o, the ChatGPT model used in our tests, was trained on data up to October 2023. This means that current developments, trends, and innovations may not be taken into account. To obtain relevant, up-to-date insights, these trends must be explicitly included in the prompts.
  • Data protection: Data protection is of utmost importance when using AI systems. Sensitive internal data should only be processed in secure infrastructures (within the EU).

Conclusion: Potential of AI systems in advertising research

AI systems like ChatGPT offer real potential in advertising research. They provide solid support and a foundation for subsequent traditional market research tests, and their initial indications quickly deliver valuable insights into potential improvements in advertising targeting and design.

However, it remains essential to review and critically evaluate the results of AI systems. AI does not replace traditional market research, but rather complements it meaningfully. When used correctly, AI systems can provide valuable support by quickly delivering initial insights and optimization suggestions. The combination of human expertise and AI-supported analysis creates a solid foundation for effective, targeted advertising campaigns.
