AI assistants may produce more insecure code, study finds

A recent study by computer scientists at Stanford University has found that programmers who use AI tools such as GitHub Copilot to generate code may end up with more security vulnerabilities in their code than those who write it without such assistance. The study, titled “Do Users Write More Insecure Code with AI Assistants?”, was published as a preprint paper on arXiv on August 28, 2023.

GitHub Copilot is a code suggestion tool built on OpenAI’s Codex model, a large language model trained on billions of lines of code from public repositories. It can generate code snippets in various programming languages from natural language prompts or existing code. However, the tool has been criticized over potential legal and ethical issues, such as reproducing code without attribution, violating licenses, and producing insecure or buggy code.
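
To give a sense of the workflow, such assistants are typically driven by a comment or a partially written function. The snippet below is a hypothetical illustration of that interaction, not actual Copilot output, and the is_strong_password helper is made up for the example:

```python
# Hypothetical illustration of the prompt-to-suggestion workflow; not actual
# Copilot output. The developer writes a natural-language comment, and the
# assistant proposes a completion in this style.

# Prompt: "return True if the password is at least 12 characters long and
# mixes lower case, upper case, and digits"
def is_strong_password(password: str) -> bool:
    return (
        len(password) >= 12
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
    )
```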


The Stanford researchers conducted an online experiment with 61 participants, who were asked to complete six programming tasks in Python, JavaScript, or C using an online editor. Participants were randomly assigned either to a group with access to GitHub Copilot or to a control group without it. The tasks touched on common security concerns such as encryption, hashing, SQL injection, cross-site scripting, and password validation.
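
To illustrate the class of issue these tasks probe (this is a generic sketch, not code taken from the study), a SQL injection vulnerability usually comes down to interpolating user input into a query string instead of using a parameterized query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

def find_user_unsafe(name: str):
    # Vulnerable: the input is pasted into the SQL string, so a value such as
    # "' OR '1'='1" rewrites the query and returns every row.
    return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the input strictly as data.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(find_user_unsafe("' OR '1'='1"))  # [('alice',)] -- injection succeeds
print(find_user_safe("' OR '1'='1"))    # [] -- input treated as a literal name
```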

AI assistants may delude developers about the quality of their output

The researchers used a tool called RobustAPI, which they developed to check the code for API misuse and security vulnerabilities. They found that participants with access to GitHub Copilot often produced more security vulnerabilities than those without access, with particularly pronounced differences for string encryption and SQL injection. For example, GitHub Copilot suggested using the insecure ECB mode for AES encryption, which encrypts identical plaintext blocks to identical ciphertext blocks and can therefore leak information about the plaintext.
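
The sketch below makes that concrete; it is not code from the study, but a minimal contrast between ECB and authenticated AES-GCM using the third-party cryptography package:

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = os.urandom(32)
plaintext = b"attack at dawn!!" * 4  # four identical 16-byte blocks

# Insecure: ECB encrypts each block independently, so repeated plaintext
# blocks are plainly visible in the ciphertext.
encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
ecb_ct = encryptor.update(plaintext) + encryptor.finalize()
print(ecb_ct[:16] == ecb_ct[16:32])  # True: the pattern leaks

# Safer: AES-GCM with a fresh nonce hides patterns and authenticates the data.
nonce = os.urandom(12)
gcm_ct = AESGCM(key).encrypt(nonce, plaintext, None)
print(gcm_ct[:16] == gcm_ct[16:32])  # False
```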

Moreover, the researchers found that participants who used GitHub Copilot were more likely to believe that they wrote secure code than those who did not. This suggests that AI assistants may create a false sense of confidence among developers, who may not bother to check the code for errors or test it thoroughly.

The researchers also compared the performance of GitHub Copilot with two open-source large language models: Meta’s Llama 2 and Vicuna-1.5, which is fine-tuned from Llama 2. They found that Llama 2 produced fewer vulnerabilities than GitHub Copilot and Vicuna-1.5, but also generated less code overall, while Vicuna-1.5 produced more vulnerabilities than GitHub Copilot but generated more code.

AI assistants are not a substitute for human expertise and testing

The researchers concluded that AI assistants still do not reliably generate clean, secure code, and that developers should not rely on them blindly. They suggested that AI assistants should give users more feedback and explanations, and that users should verify generated code with static analysis tools and testing frameworks.
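
In that spirit, even a small check goes further than accepting a suggestion at face value. The sketch below shows pytest-style tests around the hypothetical is_strong_password helper from the earlier example; a Python static analyzer such as Bandit could be run over generated code as a complementary step.

```python
# Minimal verification sketch: unit tests (pytest style) for the hypothetical
# is_strong_password helper shown earlier, redefined here to stay self-contained.

def is_strong_password(password: str) -> bool:
    return (
        len(password) >= 12
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
    )

def test_rejects_weak_passwords():
    assert not is_strong_password("short1A")            # too short
    assert not is_strong_password("nouppercase1234")    # missing upper case
    assert not is_strong_password("NODIGITSATALLhere")  # missing digits

def test_accepts_strong_password():
    assert is_strong_password("CorrectHorse42battery")
```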

They also pointed out some limitations of their study, such as the small sample size, the limited set of tasks, and the possible bias of the participants. They called for more research on the security implications of AI assistants, and the development of more robust and reliable tools for code generation.
