AI Agents Have The Ability To Compete Head-on With Human Hackers

Jun 06, 2025

June 2: According to a report published by The Decoder on June 1, a series of cybersecurity competitions recently held by Palisade Research showed that AI agents can compete head-on with human hackers, and even beat them on some occasions.

The research team put AI systems through live tests in two large-scale "capture the flag" (CTF) competitions that drew thousands of participants. In such competitions, teams solve security challenges by breaking encryption, identifying vulnerabilities, and finding hidden "flags".

The tests were designed to determine whether AI agents can compete with human teams. The results show that the AI agents performed far better than expected, and most of them outperformed the average human player.

The participating AI systems varied in complexity. Some teams, such as CAI, spent about 500 hours building their own systems, while others, such as Imperturbable, spent only 17 hours and entered by optimizing prompts for the existing systems EnIGMA and Claude Code.

In the first competition, called "AI vs. Humans", six AI teams competed against about 150 human teams. All participants had 48 hours to complete 20 cryptography and reverse engineering challenges.

Four of the seven participating AI agents solved 19 of the 20 challenges. The top-ranked AI team placed in the top 5% overall, outperforming most human players. All of the challenges could be run locally, which lowered the technical barrier for the AI systems.


Even so, some experienced human players kept pace. Some noted that they had played on many international teams, and that their extensive hands-on CTF experience and familiarity with common problem-solving strategies were key to staying competitive.

The second competition, "Cyber Apocalypse", was much harder. The AI agents faced new types of challenges and competed against nearly 18,000 human players. Many of the 62 tasks required interacting with external servers, which posed a challenge for AI systems that rely mainly on local computation.

According to the report, four AI agents took part. CAI performed best, completing 20 tasks and placing 859th, which put it in the top 10% of all participating teams and the top 21% of active teams. Palisade Research said the AI system outperformed about 90% of human teams.
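The same rank of 859 yields two different percentages because it is divided by two different team counts. The article does not give those counts, so the short Python sketch below only back-calculates rough figures from the rounded percentages. The function names are illustrative, and the implied totals are approximations, not reported numbers.

```python
def top_percent(rank, total_teams):
    """Share of the field at or above this rank, as a percentage."""
    return 100.0 * rank / total_teams


def implied_total(rank, top_fraction):
    """Rough field size implied by a rank and a (rounded) top-X fraction."""
    return round(rank / top_fraction)


rank = 859  # CAI's reported rank

# Assumption: the 10% and 21% figures are rounded, so these totals are only estimates.
all_teams_estimate = implied_total(rank, 0.10)
active_teams_estimate = implied_total(rank, 0.21)

print("Implied number of teams overall:", all_teams_estimate)
print("Implied number of active teams:", active_teams_estimate)
```

On this rough reading, fewer than half of the registered teams counted as active, which is why the ranking looks less favorable when measured against active teams only.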

The researchers also analyzed the difficulty of the challenges the AI agents solved, using the time the top human teams needed to solve each challenge as the measure. They found that the AI agents had a 50% success rate on challenges that took top human players about 78 minutes to solve. In other words, the AI agents can handle genuinely difficult problems.
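One common way to arrive at a figure like "50% success at about 78 minutes" is to fit a logistic curve of AI success against the logarithm of human solve time and read off where the curve crosses 50%. The article does not say which method the researchers used, so the Python sketch below is only an illustration of that general idea. The task data is made up, and all names are hypothetical.

```python
import math

# Hypothetical (made-up) data: minutes a top human team needed per task,
# and whether the AI agent solved that task (1) or not (0).
human_minutes = [5, 10, 20, 40, 60, 80, 120, 180, 240, 400]
ai_solved = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p = sigmoid(a + b * x) by plain gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_a = sum(y - sigmoid(a + b * x) for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(a + b * x)) * x for x, y in zip(xs, ys)) / n
        a += lr * grad_a
        b += lr * grad_b
    return a, b


# Work on centered log2(minutes) so the simple optimizer behaves well.
log_times = [math.log2(t) for t in human_minutes]
mean_log = sum(log_times) / len(log_times)
centered = [x - mean_log for x in log_times]

intercept, slope = fit_logistic(centered, ai_solved)


def predicted_success(minutes):
    """Modelled probability that the AI solves a task of this human solve time."""
    return sigmoid(intercept + slope * (math.log2(minutes) - mean_log))


# The 50% point is where intercept + slope * x equals zero.
horizon_minutes = 2 ** (mean_log - intercept / slope)
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```

A negative slope means the modelled success rate falls as tasks take humans longer. The horizon is the human solve time at which the model predicts the AI succeeds half the time.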