Taggart :donor: on Nostr
So, about [this claim](https://www.darkreading.com/threat-intelligence/gpt-4-can-exploit-most-vulns-just-by-reading-threat-advisories) that GPT-4 can exploit 1-day vulnerabilities.
I smell BS.
As always, I [read the source paper](https://arxiv.org/pdf/2404.08144.pdf).
Firstly, almost every vulnerability that was tested was on extremely well-discussed open source software, and each vuln was of a class with extensive prior work. I would be shocked if a modern LLM _couldn't_ produce an XSS proof-of-concept in this way.
But what's worse: they don't actually show the resulting exploit. The authors cite some kind of responsible disclosure standard for not releasing the prompts to GPT-4, which, fine. But these are all known vulns, so let's see what the model came up with.
Without seeing the exploit itself, I am dubious.
Especially because so much is keyed off of the CVE description:
> We then modified our agent to not include the CVE description. This task is now substantially more difficult, requiring both finding the vulnerability and then actually exploiting it. Because every other method (GPT-3.5 and all other open-source models we tested) achieved a 0% success rate even with the vulnerability description, the subsequent experiments are conducted on GPT-4 only. After removing the CVE description, the success rate falls from 87% to 7%.
>
> This suggests that determining the vulnerability is extremely challenging.
Even the identification of the vuln—which GPT-4 did 33% of the time—is a ludicrous metric. The options from the set are:
1. RCE
2. XSS
3. SQLI
4. CSRF
5. SSTI
With the first three over-represented, it would be surprising if the model did worse than 33% even by guessing at random.
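To make that baseline concrete, here is a minimal sketch. The per-class counts below are made up for illustration and are not taken from the paper; the point is only that with a skewed distribution, trivial guessing strategies already land near 33% without understanding anything about the code.

```python
# Hypothetical class counts for illustration only -- the paper's actual
# per-class distribution is not reproduced in this post.
from collections import Counter

# Assumed skew: RCE, XSS, and SQLI over-represented, CSRF and SSTI rare.
counts = Counter({"RCE": 5, "XSS": 4, "SQLI": 3, "CSRF": 2, "SSTI": 1})
total = sum(counts.values())
prevalence = {cls: n / total for cls, n in counts.items()}

# Expected accuracy when guessing each class with probability equal to
# its prevalence: sum of p_i^2 over all classes.
proportional_guess = sum(p * p for p in prevalence.values())

# Expected accuracy when always guessing the single most common class.
majority_guess = max(prevalence.values())

print(f"uniform random guess:      {1 / len(counts):.0%}")   # 20%
print(f"prevalence-weighted guess: {proportional_guess:.0%}")  # ~24%
print(f"always guess top class:    {majority_guess:.0%}")      # ~33%
```

Under this assumed skew, a guesser that only ever says "RCE" already matches a 33% identification rate, which is why that number tells us very little on its own.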
In their conclusion, the authors call their findings an "emergent capability" of GPT-4, given that every other model they tested had a 0% success rate.
At no point do the authors blink at this finding and interrogate their priors to look for potential error sources. But they really should.
So no, I do not believe we are in any danger of GPT-4 becoming an exploit dev.