Artificial Intelligence & Machine Learning,
Next-Generation Technologies & Secure Development
Splx Reports Enhanced Prompts Reduce Hallucinations, Yet Security Flaws Remain

DeepSeek has unveiled its latest model, claiming significant advancements as it enters what it terms the “agent era.” While performance metrics suggest considerable improvements, security testing highlights ongoing vulnerabilities in the Chinese company’s updated V3.1 model.
The results from a series of performance benchmarks demonstrate a notable upgrade over earlier versions, including DeepSeek-V3-0324 and DeepSeek-R1-0528. In the SWE-bench Verified assessment, focusing on software bug resolutions, DeepSeek-V3.1 achieved a score of 66, while prior models hovered in the mid-40s. Furthermore, the SWE-bench Multilingual evaluation, which assesses bug resolution across diverse languages, revealed a score of 54.5, nearly doubling previous results. The model performed similarly well under the Terminal-Bench, securing a score of 31.3, a significant jump from the lower double-digit ratings of its predecessors.
To evaluate security and reliability, Splx conducted tests using its AI red-teaming framework, examining how these advancements translate into real-world application. The evaluation encompassed three tiered system prompts: a baseline without prompts, a typical enterprise guardrail prompt, and Splx’s hardened prompt, which incorporates iterative improvements based on historical adversarial data. This rigorous assessment involved over 3,000 attack simulations across critical categories, including security, safety, trustworthiness, and business alignment.
The initial evaluation without any prompt yielded inadequate results, with a security score of approximately 50 and a safety score of 12. Implementing a baseline prompt reflected more realistic enterprise guardrails, elevating safety above 90 and improving business alignment near 58. However, security metrics saw a decline to around 41. The use of Splx’s hardened prompt dramatically improved security to over 72, increased safety close to 99, eliminated hallucinations, and enhanced business alignment to roughly 85. Despite these advancements, there remain noteworthy vulnerabilities that could attract potential adversarial threats, particularly in sectors with stringent risk tolerances.
Splx’s evaluations also scrutinized the model’s resilience to manipulation via jailbreaks or unauthorized access, with safety assessments revealing its capacity to avoid generating harmful or illegal content. Notably, when examining the unrefined model, Splx discovered that V3.1 had produced a phishing-like message masquerading as a legitimate IT request, urging users to disclose personal emails—indicative of how such outputs could be exploited to compromise sensitive data.
Additionally, the model exhibited troubling behaviors, including generating profanity in response to prompts, which poses potential reputational risks for businesses employing AI in customer service roles. Red-teamers were capable of manipulating the model through jailbreak tactics, prompting it to relay harmful instructions. Such vulnerabilities render enterprise AI systems liabilities if manipulated, risking data leaks, regulatory breaches, or hazardous content generation.