AI Continues to Produce Vulnerable Code


Veracode Study Finds Nearly Half of AI-Generated Code is Insecure


Recent findings from Veracode raise serious concerns about artificial intelligence’s role in software development. Researchers found that large language models (LLMs) introduce vulnerabilities in nearly half of security-relevant code completion tasks, pointing to a significant gap in how the technology is currently applied.

The report points to a stagnation in AI’s ability to make sound security decisions, despite advancements in generating syntactically correct code. According to Veracode’s CTO, Jens Wessling, “LLMs are powerful tools for software development, but they should not be used indiscriminately.”

The analysis examined 80 curated coding tasks built around recognized software weaknesses, including SQL injection, cross-site scripting, and cryptographic flaws, each mapped to risks on the OWASP Top 10. Testing more than 100 LLMs confirmed that while AI accelerates development, it cannot yet be relied on to produce secure code.
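To make the task categories concrete, the sketch below, which is illustrative rather than taken from the report, shows the kind of SQL injection (CWE-89) decision point such a benchmark presents: the same Java lookup written with string concatenation and with a parameterized query. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserLookup {

    // Vulnerable pattern (CWE-89): user input is concatenated directly into the
    // SQL string, so input such as "' OR '1'='1" changes the meaning of the query.
    static ResultSet findUserInsecure(Connection conn, String username) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery("SELECT * FROM users WHERE name = '" + username + "'");
    }

    // Safer pattern: a parameterized query keeps data separate from the SQL text,
    // which is the sort of secure completion the tasks check for.
    static ResultSet findUserSecure(Connection conn, String username) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        stmt.setString(1, username);
        return stmt.executeQuery();
    }
}
```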

Wessling noted that model size made little difference to security performance; the variation among small, medium, and large LLMs was less than 2%. This suggests the issue is systemic rather than merely an artifact of model complexity.

Java performed the worst at generating secure code, producing insecure output in more than 70% of cases. The large body of older Java training data, much of it written before many common weaknesses were widely understood, is suspected to contribute to these results. Other languages such as Python, C#, and JavaScript showed security failure rates between 38% and 45%.

Notably, LLMs did well at avoiding well-known cryptographic issues and SQL injection, succeeding about 80% of the time, yet struggled significantly with log injection and cross-site scripting, where success rates fell to around 10%. According to Wessling, understanding the context in which log messages are written is crucial for avoiding these vulnerabilities, a task that remains challenging for LLMs.
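As an illustration of why log injection is hard for code generators, the hedged Java sketch below (not from the report; class and method names are hypothetical) contrasts logging raw user input, which lets an attacker forge extra log lines with embedded newlines, with a version that strips control characters first.

```java
import java.util.logging.Logger;

public class LoginAudit {

    private static final Logger LOG = Logger.getLogger(LoginAudit.class.getName());

    // Vulnerable pattern (CWE-117): the raw username goes straight into the log,
    // so a value containing "\n" can forge what look like separate, legitimate entries.
    static void recordFailureInsecure(String username) {
        LOG.warning("Login failed for user: " + username);
    }

    // Safer pattern: replace control characters so the entry stays on one line
    // and cannot impersonate other log records.
    static void recordFailureSecure(String username) {
        String sanitized = username.replaceAll("[\\r\\n\\t]", "_");
        LOG.warning("Login failed for user: " + sanitized);
    }
}
```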

The study also evaluated “vibe coding,” in which developers rely on AI-generated code without explicitly specifying requirements such as security constraints. In that setting, LLMs chose an insecure implementation 45% of the time.

Although some industry vendors propose that more precise prompts could yield better security outcomes, Wessling remains skeptical. The findings suggest that simply instructing LLMs to prioritize security may not sufficiently mitigate risks.

As adversaries increasingly leverage AI to exploit vulnerabilities, the report indicates a dangerous trend where low-skilled attackers gain access to sophisticated tools. Nevertheless, Wessling views LLMs as integral to the future of secure development, provided they are used responsibly. Recommended strategies for mitigating risks include incorporating static analysis, software composition analysis, and policy automation to safeguard against common vulnerabilities.

Ultimately, effective security in software development requires robust policies around scanning and remediation, especially for AI-generated code. Without that oversight, Wessling notes, unaugmented LLMs are unlikely to produce trustworthy, secure software.
