LLM Agents can Autonomously Hack Websites
This is my personal note on the paper: https://arxiv.org/abs/2402.06664
Abstract
The study investigates the capability of LLM agents to autonomously hack websites. A frontier model (GPT-4) successfully exploited 73.3% of the vulnerabilities in the benchmark tasks. These findings suggest potential risks associated with deploying LLMs. In addition, the study shows that GPT-4 is capable of autonomously finding vulnerabilities in real-world websites.
Objective
The capabilities of LLMs are advancing rapidly, and they have been applied to a wide range of tasks. However, the use of autonomous agents for offensive security tasks remains largely unexplored. This study examines the hacking performance of LLM agents.
Methods
- Models used
    - GPT-4 assistant
    - GPT-3.5 assistant
    - Open-source LLMs
        - LLaMA, Mixtral, etc.
- Frameworks
    - LangChain (a minimal sketch of the agent loop follows this list)
- Security tasks
    - XSS
    - SQL injection
    - SQL union
    - CSRF
    - etc.
- Environments
    - Vulnerable test websites (the LLM agent is not told the vulnerability)
    - Real-world websites in a sandbox
- Ablation studies
    - Without document reading
    - Without the detailed system prompt
    - Without both
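The paper does not release the agent code, so the sketch below is my rough reconstruction of the described setup: an assistant-style LLM given generic hacking documents and a loop in which it proposes web requests against a sandboxed target and reads back the responses. The `call_llm` stub, the request format, and the success check are hypothetical placeholders, not the authors' implementation.

```python
import requests

def call_llm(system_prompt: str, history: list[str]) -> str:
    """Hypothetical stub for a frontier-model call (e.g., GPT-4).
    A real agent would return its next action as text."""
    return "/login?username=admin'--&password=x"  # canned SQLi probe

def run_agent(base_url: str, documents: list[str], max_steps: int = 10) -> list[str]:
    # Detailed system prompt: one of the two components the paper ablates.
    system_prompt = (
        "You are testing a sandboxed website for vulnerabilities. "
        "Plan your attack, issue requests, and adapt to the responses."
    )
    # Document reading: the agent gets generic web-hacking documents,
    # never the specific vulnerability (the paper's other ablation).
    history = [f"[doc] {d}" for d in documents]
    findings: list[str] = []
    for _ in range(max_steps):
        path = call_llm(system_prompt, history)
        resp = requests.get(base_url + path, timeout=10)
        history.append(f"[tried] {path} -> {resp.status_code}: {resp.text[:300]}")
        if "Welcome, admin" in resp.text:  # toy success check, hypothetical
            findings.append(path)
            break
    return findings

# Usage, against a local intentionally-vulnerable site only:
# print(run_agent("http://localhost:8080", ["SQL injection basics ..."]))
```

The actual agents also relied on tool/function calling and planning; this sketch only shows the core observe-act loop.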
Results
- GPT-4 succeeded on 73.3% of the tasks.
- GPT-3.5 succeeded on 6.7% of the tasks.
- Open-source LLMs succeeded on 0%.
- GPT-4 found an XSS vulnerability on one of the approximately 50 real-world websites tested.
- Vulnerabilities with the highest hack success rates (see the SQL injection sketch after this list)
    - SQL injection, CSRF: 100%
    - XSS, brute force, SQL union: 80%
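To make the top result concrete, here is a minimal, hypothetical example of the kind of SQL injection flaw such an agent exploits, together with the standard fix; it is not taken from the paper's test sites.

```python
import sqlite3

def login_vulnerable(conn: sqlite3.Connection, username: str, password: str):
    # Vulnerable: user input is concatenated directly into the SQL string.
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return conn.execute(query).fetchone()

# Supplying  username = "admin'--"  comments out the password check:
#   SELECT * FROM users WHERE username = 'admin'--' AND password = '...'

def login_safe(conn: sqlite3.Connection, username: str, password: str):
    # Fix: parameterized queries keep user input out of the SQL syntax.
    return conn.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchone()
```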
Interesting points
- LLM agents can hack websites without prior knowledge of the specific vulnerability.
- The API cost is lower than the cost of hiring human experts.
Notable phrase
> Our findings raise questions about the widespread deployment of LLMs.