https://www.blackhat.com/us-25/briefings/schedule/#ai-agents...
https://hackerone.com/xbow?type=user
That page shows a different picture. It may not invalidate their claim (best in the US), but a screenshot can be a bit cherry-picked.
Some of my favorites from what we've released so far:
- Exploitation of an n-day RCE in Jenkins, where the agent managed to figure out the challenge environment was broken and used the RCE exploit to debug the server environment and work around the problem to solve the challenge: https://xbow.com/#debugging--testing--and-refining-a-jenkins...
- Authentication bypass in Scoold that allowed reading the server config (including API keys) and arbitrary file read: https://xbow.com/blog/xbow-scoold-vuln/
- The first post about our HackerOne findings, an XSS in Palo Alto Networks GlobalProtect VPN portal used by a bunch of companies: https://xbow.com/blog/xbow-globalprotect-xss/
> White Paper/Slide Deck/Supporting Materials (optional)
> • If you have a completed white paper or draft, slide deck, or other supporting materials, you can optionally provide a link for review by the board.
> • Please note: Submission must be self-contained for evaluation, supporting materials are optional.
> • PDF or online viewable links are preferred, where no authentication/log-in is required.
(From the link on the BHUSA CFP page, which confusingly goes to the BH Asia doc: https://i.blackhat.com/Asia-25/BlackHat-Asia-2025-CFP-Prepar... )
Another great read is [1] (2024).
[1] "LLM and Bug Finding: Insights from a $2M Winning Team in the White House's AIxCC": >>41269791
- Design the system and prompts
- Build and integrate the attack tools
- Guide the decision logic and analysis
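The three human contributions above can be made concrete with a minimal sketch of an agent loop (hypothetical code, not CAI's or XBOW's actual architecture): the system prompt, the tool integrations, and the decision policy are all human-designed, and a real system would call an LLM where `decide()` is stubbed here. All names and outputs below are illustrative.

```python
from typing import Callable, Optional

# (1) Human-designed prompt
SYSTEM_PROMPT = "You are a security agent. Enumerate, then probe, then report."

# (2) Human-built tool integrations (stubbed; a real agent would shell
# out to scanners, fuzzers, or custom exploits)
def port_scan(target: str) -> str:
    return f"open ports on {target}: 80, 443"

def probe_http(target: str) -> str:
    return f"{target}:80 serves an outdated Jenkins instance"

TOOLS: dict[str, Callable[[str], str]] = {
    "port_scan": port_scan,
    "probe_http": probe_http,
}

# (3) Human-guided decision logic: in practice, this is where an LLM is
# prompted with SYSTEM_PROMPT plus the history and asked to pick an action
def decide(history: list[str]) -> Optional[str]:
    if not history:
        return "port_scan"
    if "80" in history[-1] and "Jenkins" not in history[-1]:
        return "probe_http"
    return None  # nothing left to do: stop and report

def run_agent(target: str) -> list[str]:
    """Observe-decide-act loop; returns the observation history."""
    history: list[str] = []
    while (action := decide(history)) is not None:
        history.append(TOOLS[action](target))
    return history
```

The point of the sketch is that the "autonomy" lives almost entirely in the parts humans wrote: swap out `decide()` or `TOOLS` and you get a different agent.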
This isn’t just semantics — overstating AI capabilities can confuse the public and mislead buyers, especially in high-stakes security contexts.
I say this as someone actively working in this space. I participated in the development of PentestGPT, which helped kickstart this wave of research and investment, and more recently, I’ve been working on Cybersecurity AI (CAI) — the leading open-source project for building autonomous agents for security:
- CAI GitHub: https://github.com/aliasrobotics/cai
- Tech report: https://arxiv.org/pdf/2504.06017
I’m all for pushing boundaries, but let’s keep the messaging grounded in reality. The future of AI in security is exciting — and we’re just getting started.
The tooling and models are maturing quickly, and there is definitely some value in autonomous security agents, both offensive and defensive. But it still requires a lot of work, knowledge (my group is all ML people), skill, and planning if you want to approach anything more than bug bashing.
This recent paper from Dreadnode discusses a benchmark for this sort of challenge: https://arxiv.org/abs/2506.14682
What's being criticized here is the hype, which can be misleading and confusing. On this topic, I wrote a small essay, “Cybersecurity AI: The Dangerous Gap Between Automation and Autonomy,” to sort fact from fiction -> https://shorturl.at/1ytz7