Vigil: A new open-source tool to secure LLMs from prompt injections

Large Language Models (LLMs) are powerful tools that can generate natural language texts for various applications, such as chatbots, summarizers, translators, and more. However, LLMs are also vulnerable to prompt injection attacks, where malicious users can manipulate the LLMs to execute their intentions, such as data exfiltration, social engineering, or jailbreaking. To prevent these attacks, a new open-source tool called Vigil has been developed by a security researcher named Adam Pridgen.

Vigil is a Python library and REST API that can assess LLM prompts and responses against a set of scanners to detect prompt injections, jailbreaks, and other potential threats. Vigil also provides the detection signatures and datasets needed to get started with self-hosting. Vigil is currently in an alpha state and is considered experimental, but it aims to expand its detection mechanisms and features in the future.

Vigil: A new open-source tool to secure LLMs from prompt injections
Vigil: A new open-source tool to secure LLMs from prompt injections

Vigil uses various scan modules to analyze the LLM inputs and outputs, such as:

  • Vector database / text similarity: This module compares the LLM input or output with a database of known malicious or suspicious texts, using cosine similarity or other metrics.
  • Heuristics via YARA: This module uses YARA rules to match the LLM input or output with predefined patterns or keywords that indicate malicious or risky behavior.
  • Transformer model: This module uses a transformer model to classify the LLM input or output as benign or malicious, based on a trained dataset.
  • Prompt-response similarity: This module measures the similarity between the LLM prompt and response, to detect if the LLM has been hijacked or deviated from the original intent.
  • Canary tokens: This module embeds canary tokens in the LLM prompt or response, to detect if the LLM has been used to access external resources or leak data.
  • Sentiment analysis: This module evaluates the sentiment of the LLM input or output, to detect if the LLM has been used to generate harmful or offensive language.
  • Relevance (via LiteLLM): This module uses a lightweight LLM to measure the relevance of the LLM output to the prompt, to detect if the LLM has been used to generate irrelevant or misleading information.
  • Paraphrasing: This module uses a paraphrasing model to rewrite the LLM input or output, to detect if the LLM has been used to generate plagiarized or duplicated content.

Why is Vigil important and who can use it?

Vigil is important because it can help protect LLMs from being exploited by malicious users, who can use prompt injection attacks to compromise the LLMs or their applications. Prompt injection attacks are currently unsolvable and there is no defense that will work 100% of the time, but by using a layered approach of detecting known techniques, Vigil can at least defend against the more common or documented attacks.

Vigil can be used by anyone who wants to secure their LLMs or their applications that use LLMs, such as developers, researchers, or users. Vigil can be used as a Python library or a REST API, and it supports local embeddings or OpenAI. Vigil can also be integrated with other security tools or frameworks, such as LLM Guard, which is another open-source toolkit for securing LLMs.

How can I get started with Vigil?

To get started with Vigil, you can download the package from GitHub, where you can also find the full documentation, release blog, and examples. You can also try Vigil on a Streamlit web UI playground, where you can test Vigil with different LLMs and scan modules. You can also join the Vigil community on Slack, where you can ask questions, share feedback, or contribute to the project.

Vigil is an open-source tool that aims to improve the security of LLMs and their applications, by detecting prompt injection attacks and other potential threats. Vigil is still in development and welcomes any contributions or suggestions from the LLM community.

Leave a Reply

Your email address will not be published. Required fields are marked *