eyeballvul: a future-proof benchmark for vulnerability detection in the wild

This is my personal note about the paper: https://arxiv.org/abs/2407.08708

Abstract

This paper introduces "eyeballvul": a benchmark designed to test the vulnerability detection capabilities of LLMs. Eyeballvul is available on GitHub at https://github.com/timothee-chauvin/eyeballvul. As of July 2024, it contains 24,000+ vulnerabilities and is approximately 55GB in size.

Objective

LLMs have large context windows, making them promising candidates for use as SAST tools. However, no benchmark or dataset currently exists to evaluate their performance in this area. This paper addresses that gap.

eyeballvul details

Quoted from the introduction to the paper

Benchmark creation process

  1. Download CVE data related to open-source repositories from the OSV dataset.
  2. Group CVEs by repository and read the affected version list.
  3. Select the smallest hitting set of versions, i.e. the fewest revisions that together cover every CVE (see the sketch after this list):
    1. Use Google's CP-SAT solver.
  4. Check out each selected revision using Git.
  5. Compute each repository's size and language breakdown using linguist.
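
A minimal sketch of step 3's hitting set selection with Google's CP-SAT solver (OR-Tools), under my own assumptions about the input format (a mapping from each CVE to the set of versions it affects); the function name and toy data are hypothetical and not taken from the paper's code:

  from ortools.sat.python import cp_model

  def smallest_hitting_set(cve_to_versions: dict[str, set[str]]) -> set[str]:
      """Return the fewest versions such that every CVE has at least one
      of its affected versions included (a minimum hitting set)."""
      all_versions = {v for versions in cve_to_versions.values() for v in versions}
      model = cp_model.CpModel()
      # One Boolean per candidate version: 1 if it is kept in the benchmark.
      keep = {v: model.NewBoolVar(v) for v in sorted(all_versions)}
      # Every CVE must be covered by at least one of its affected versions.
      for versions in cve_to_versions.values():
          model.Add(sum(keep[v] for v in versions) >= 1)
      # Objective: keep as few versions as possible.
      model.Minimize(sum(keep.values()))
      solver = cp_model.CpSolver()
      status = solver.Solve(model)
      assert status in (cp_model.OPTIMAL, cp_model.FEASIBLE)
      return {v for v, var in keep.items() if solver.Value(var) == 1}

  # Toy usage: v1.1 covers CVE-A and CVE-B, v2.0 covers CVE-C.
  print(smallest_hitting_set({
      "CVE-A": {"v1.0", "v1.1"},
      "CVE-B": {"v1.1"},
      "CVE-C": {"v2.0"},
  }))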

Interesting points

Phrase

Vulnerability detection is a dual-use capability that is sought after by both defenders and attackers.