https://www.blackhat.com/us-25/briefings/schedule/#ai-agents...
https://hackerone.com/xbow?type=user
That page shows a different picture. It may not invalidate their claim (best in the US), but a screenshot can be a bit cherry-picked.
Some of my favorites from what we've released so far:
- Exploitation of an n-day RCE in Jenkins, where the agent managed to figure out the challenge environment was broken and used the RCE exploit to debug the server environment and work around the problem to solve the challenge: https://xbow.com/#debugging--testing--and-refining-a-jenkins...
- Authentication bypass in Scoold that allowed reading the server config (including API keys) and arbitrary file read: https://xbow.com/blog/xbow-scoold-vuln/
- The first post about our HackerOne findings, an XSS in Palo Alto Networks GlobalProtect VPN portal used by a bunch of companies: https://xbow.com/blog/xbow-globalprotect-xss/
> White Paper/Slide Deck/Supporting Materials (optional)
> • If you have a completed white paper or draft, slide deck, or other supporting materials, you can optionally provide a link for review by the board.
> • Please note: Submission must be self-contained for evaluation, supporting materials are optional.
> • PDF or online viewable links are preferred, where no authentication/log-in is required.
(From the link on the BHUSA CFP page, which confusingly goes to the BH Asia doc: https://i.blackhat.com/Asia-25/BlackHat-Asia-2025-CFP-Prepar... )
Another great read is [1] (2024).
[1] "LLM and Bug Finding: Insights from a $2M Winning Team in the White House's AIxCC": >>41269791
- Design the system and prompts
- Build and integrate the attack tools
- Guide the decision logic and analysis
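The three human contributions above can be made concrete with a minimal sketch of an agent loop (hypothetical code, not CAI's or XBOW's actual architecture): the system prompt, the tool integrations, and the decision policy are all human-designed, and a real system would call an LLM where `decide()` is stubbed here. All names and outputs below are illustrative.

```python
from typing import Callable, Optional

# (1) Human-designed prompt
SYSTEM_PROMPT = "You are a security agent. Enumerate, then probe, then report."

# (2) Human-built tool integrations (stubbed; a real agent would shell
# out to scanners, fuzzers, or custom exploits)
def port_scan(target: str) -> str:
    return f"open ports on {target}: 80, 443"

def probe_http(target: str) -> str:
    return f"{target}:80 serves an outdated Jenkins instance"

TOOLS: dict[str, Callable[[str], str]] = {
    "port_scan": port_scan,
    "probe_http": probe_http,
}

# (3) Human-guided decision logic: in practice, this is where an LLM is
# prompted with SYSTEM_PROMPT plus the history and asked to pick an action
def decide(history: list[str]) -> Optional[str]:
    if not history:
        return "port_scan"
    if "80" in history[-1] and "Jenkins" not in history[-1]:
        return "probe_http"
    return None  # nothing left to do: stop and report

def run_agent(target: str) -> list[str]:
    """Observe-decide-act loop; returns the observation history."""
    history: list[str] = []
    while (action := decide(history)) is not None:
        history.append(TOOLS[action](target))
    return history
```

The point of the sketch is that the "autonomy" lives almost entirely in the parts humans wrote: swap out `decide()` or `TOOLS` and you get a different agent.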
This isn’t just semantics — overstating AI capabilities can confuse the public and mislead buyers, especially in high-stakes security contexts.
I say this as someone actively working in this space. I participated in the development of PentestGPT, which helped kickstart this wave of research and investment, and more recently, I’ve been working on Cybersecurity AI (CAI) — the leading open-source project for building autonomous agents for security:
- CAI GitHub: https://github.com/aliasrobotics/cai
- Tech report: https://arxiv.org/pdf/2504.06017
I’m all for pushing boundaries, but let’s keep the messaging grounded in reality. The future of AI in security is exciting — and we’re just getting started.
The tooling and models are maturing quickly, and there is definitely some value in autonomous security agents, both offensive and defensive. But it still requires a lot of work, knowledge (my group is all ML people), skill, and planning if you want to approach anything more than bug bashing.
This recent paper from Dreadnode discusses a benchmark for this sort of challenge: https://arxiv.org/abs/2506.14682
What's being criticized here is the hype, which can be misleading and confusing. On this topic, I wrote a small essay, “Cybersecurity AI: The Dangerous Gap Between Automation and Autonomy,” to sort fact from fiction -> https://shorturl.at/1ytz7