Security researcher spends $1,500 testing LLM exploit capabilities
A security researcher has published the results of an experiment, costing $1,500 in API fees, that tested whether large language models can reproduce common software exploits.
Security researcher Kasra has released the results of a controlled experiment evaluating how well large language models can reproduce software exploits in practice. The study centred on a deliberately vulnerable book review application, built with a React Native frontend (via Expo) and a Python backend. The models' objective was to retrieve a hidden "flag" from a user's private reviews, a task modelled on a class of security vulnerabilities identified in previous research.
The experiment set out to assess whether current generative AI tools can automate the reproduction of exploit classes that Kasra had previously encountered while conducting security research on various applications and websites. To that end, the researcher built a custom test environment and had multiple models attempt the task. The work sits at the intersection of artificial intelligence and cybersecurity, and focuses on whether AI can help or hinder vulnerability discovery.
Kasra spent $1,500 in API costs on the trial, which was planned as ten full runs for each of several models. Because processing costs escalated, the researcher could not complete all ten runs for every model tested; the partial runs were still included for completeness. The published findings report the performance and success rate of each model, giving a transparent view of how the different models fared against the challenge.
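For readers who want to tabulate their own attempts in the same way, the following is a minimal sketch of how per-model success rates might be tallied when some models have fewer than the planned ten runs. It is not the researcher's tooling: the class name, the function name, and the model labels are hypothetical, and the outcomes in the usage note are placeholders rather than results from the study.

```python
from dataclasses import dataclass
from typing import List, Optional

PLANNED_RUNS = 10  # the ten-run protocol described in the article


@dataclass
class ModelTally:
    """Outcomes for one model; True means the flag was retrieved (hypothetical structure)."""
    name: str
    outcomes: List[bool]

    @property
    def completed(self) -> int:
        return len(self.outcomes)

    @property
    def successes(self) -> int:
        return sum(self.outcomes)

    @property
    def success_rate(self) -> Optional[float]:
        # Rate over completed runs only; None when no run was completed.
        return self.successes / self.completed if self.completed else None

    @property
    def is_partial(self) -> bool:
        return self.completed < PLANNED_RUNS


def summarize(tallies: List[ModelTally]) -> List[str]:
    """Return one summary line per model, marking incomplete protocols."""
    lines = []
    for t in tallies:
        rate = t.success_rate
        rate_text = "n/a" if rate is None else f"{rate:.0%}"
        note = " (partial)" if t.is_partial else ""
        lines.append(f"{t.name}: {t.successes}/{t.completed} runs, {rate_text}{note}")
    return lines
```

Calling `summarize([ModelTally("model-a", [True] * 7 + [False] * 3), ModelTally("model-b", [True, False, False, False])])` with these placeholder outcomes yields one line per model, with the second marked as partial. Reporting the completed-run count alongside the rate keeps a four-run result from being read as equivalent to a ten-run one.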
After completing the runs, the researcher made the test application's Android package (APK) and the challenge description publicly available. By distributing a ZIP file containing these materials, Kasra invited the broader security community and developers to test their own models against the same vulnerable application. This open approach aims to foster independent verification and further exploration of how AI agents interact with security-critical code.
The experiment underscores the growing role of AI in both offensive and defensive security practices. As large language models become more integrated into development and testing workflows, understanding their ability to identify and exploit vulnerabilities matters more to institutions managing digital risk. The researcher has also indicated availability for custom model building and for extracting business insights from unstructured data, reflecting the broader commercial interest in these technologies.


