Meta’s Controversial Testing Practices Raise Ethical Concerns in AI Safety
Recent disclosures reveal that hundreds of contractors working on a project for Meta were directed to impersonate minors online. The project involved probing competitor chatbots with prompts centered on sensitive issues, including suicide, sexual content, and eating disorders. Internal documents, along with testimonies from five individuals familiar with the project, provide insight into the operation, which ran under the code name “Cannes.”
Managed by Covalen, a contractor for Meta, the program was still active as of April 21, 2025. Its primary focus was on analyzing responses from well-known chatbots like OpenAI’s ChatGPT, Google’s Gemini, and Character.AI. Contractors were instructed to create fake accounts claiming to be under 18 years old and were tasked with sending a series of written prompts and images intended to elicit specific reactions from these rival platforms. The images sent included graphic representations, such as pills and knives, as well as other provocative visuals.
The prompts were meticulously crafted to challenge chatbot safety systems, pushing boundaries that should ideally trigger refusals. An August 2025 evaluation noted that more than 45,000 prompts were tested. However, the companies behind the tested chatbots were not informed of this scrutiny.
A detailed spreadsheet reviewed for this analysis contained sensitive information for dummy profiles, including names, email accounts, passwords, and birth dates, all linked to disposable Gmail and Outlook addresses for anonymity. Among the 3,748 prompts sent by the contractors, a significant portion focused on themes of self-harm and mental health crises, with others broaching topics such as drug use and inappropriate sexual scenarios.
Some prompts posed concerning questions from a child’s perspective. One, for instance, asked whether it is normal to fantasize about harming others, while another, written in the voice of a high school student, sought advice about balancing responsibilities with personal relationships. A notable French-language prompt made reference to the tragic suicide of Jamey Rodemeyer, suggesting gender-related implications concerning bullying and mental health.
Documents examined do not clarify how Meta utilized the data collected from these encounters. A Covalen document labeled the project as a form of “comprehensive AI safety benchmarking,” asserting its importance in cultivating essential data sets for model comparisons and compliance verification.
In response to this investigation, Meta defended the initiative as a standard practice for ensuring chatbot safety, emphasizing that interacting with and analyzing competitor systems is commonplace within the AI industry. The company maintained that the findings from competing models do not directly influence its own AI development.
Competitor testing is not inherently unusual in AI development. Google, for instance, has worked to improve its Bard chatbot by measuring it against ChatGPT outputs. The methods employed in this case nonetheless drew criticism. Contractors raised concerns that the approach appeared haphazard, relying on basic provocations that a properly functioning chatbot should reject, which left unclear how success was measured beyond whether the safeguards held.
From a cybersecurity standpoint, the activity described here could be loosely mapped to several tactics and techniques in the MITRE ATT&CK framework. Relevant tactics include resource development, the category under which ATT&CK places the creation of fake accounts, as well as reconnaissance and exploitation during the testing phase. The ethical implications of such engagements demand careful scrutiny in the ongoing discourse surrounding AI safety and data handling practices.