AI & LLM Penetration Testing

AI pentesting:
Security testing for LLMs, agents & AI systems

AI systems are not a black box for attackers – but they often are for the organisation's own security team. We test LLMs, agentic AI systems, and AI-integrated applications based on the OWASP LLM Top 10 and beyond: from prompt injection to tool abuse, from RAG poisoning to classic web vulnerabilities in AI backends.

👤 USER 🛡️ FILTER 🧠 LLM CORE 🤖 AGENT 🗄️ DATA
Why AI security is different

New attack surfaces through AI

AI systems introduce attack vectors that classic penetration tests do not cover. At the same time, all known web and infrastructure vulnerabilities remain relevant.

Natural language as attack vector

Prompt injection is the SQL injection of the AI era. Attackers manipulate model behaviour through carefully crafted inputs – directly or via poisoned external data that the model processes.

Agents with real capabilities

An AI agent that can read emails, execute code, and query databases is a highly privileged system. Whoever controls the agent controls all its tools – without direct authentication.

Invisible decision logic

Trained models can be driven to incorrect classifications through adversarial inputs. Fraud detection, content moderation, and medical AI systems are particularly exposed.

Live Demonstration

Prompt Injection in Action

See for yourself why AI systems without hardening are vulnerable. Click the buttons below to interact with a typical unprotected internal HR assistant.

Insight: Without strict guardrails and backend authorization, a simple jailbreak immediately compromises the entire agent context.
AcmeCorp_Internal_HR_Bot_v2
Hello! I am the internal HR assistant. How can I help you today?
Structured Security

AI pentest in 5 phases

Structured, reproducible, and tailored to the specifics of AI systems – with clear documentation for technical teams and management.

01. Scoping & system understanding

Which AI systems, models, and data sources are in scope? What tools and permissions does the agent have? We analyse the architecture before testing.

02. Threat modelling

Based on the architecture, we identify the most realistic attack paths – following OWASP LLM Top 10, MITRE ATLAS, and proprietary methods for agentic AI.

03. Active testing

Manual testing of all identified attack vectors: prompt injections, tool abuse, RAG attacks, API tests. Every attack attempt is fully logged.

04. Exploit validation

Findings are verified for actual exploitability and business impact. No false-positive noise – only what is genuinely exploitable makes it into the report.

05. Report & remediation

Complete report with OWASP LLM mapping, reproduction steps, management summary, and prioritised recommendations. Re-test after remediation available on request.

Test scope

What we specifically test

//LLM application layer

  • Prompt injection (direct & indirect)
  • Jailbreaking & guardrail bypass
  • System prompt extraction
  • Output handling & injection into downstream systems
  • Role manipulation & context confusion

//Agentic AI & tool use

  • Tool call authorisation
  • Privilege escalation via agent actions
  • SSRF & path traversal via tool calls
  • Indirect prompt injection via external data sources
  • Multi-agent trust boundary testing

//RAG & data pipeline

  • Unauthorised data retrieval
  • Embedding poisoning & RAG manipulation
  • Membership inference attacks
  • Data isolation between users
  • Chunking & retrieval logic abuse

//ML model & infrastructure

  • Adversarial examples & evasion attacks
  • Model extraction & inversion
  • API security & rate limiting
  • Authentication & authorisation
  • Classic web vulnerabilities (OWASP Top 10)
OWASP LLM Top 10

The ten most critical vulnerabilities in LLM applications

The OWASP LLM Top 10 is the recognised standard for LLM security assessments. We test all ten categories systematically – supplemented by proprietary test cases for agentic AI and RAG systems.

LLM01

Prompt Injection

CRITICAL

Manipulation of LLM output through crafted inputs. Direct (user) or indirect (poisoned external data). Leads to guardrail bypass, data leakage, unauthorised actions.

LLM02

Sensitive Information Disclosure

CRITICAL

The model discloses training data, system prompts, internal configuration, or other users' data – through memorisation or insufficient output filtering.

LLM03

Supply Chain Vulnerabilities

HIGH

Compromised base models, plugins, or data pipelines. Attackers controlling the supply chain influence model behaviour without direct access.

LLM04

Data & Model Poisoning

HIGH

Manipulated training data or fine-tuning datasets create backdoors or systematically alter model behaviour for specific inputs.

LLM05

Insecure Output Handling

CRITICAL

LLM outputs are passed unfiltered to downstream systems. Leads to XSS, SQL injection, code execution – when outputs are interpreted as code or queries.

LLM06

Excessive Agency

HIGH

The AI agent has more permissions than needed for its task. Via prompt injection, an attacker can abuse the agent to perform privileged actions.

LLM07

System Prompt Leakage

MEDIUM

The system prompt – often containing sensitive instructions, internal processes, or tool descriptions – is extracted from the model through clever prompting.

LLM08

Vector & Embedding Weaknesses

HIGH

Attacks on the vector database of a RAG system: embedding poisoning, unauthorised retrieval, membership inference – whoever controls the vector base controls the context.

LLM09

Misinformation

HIGH

The model generates plausible-sounding but incorrect information – with or without adversarial manipulation. Particularly critical in medical, legal, and financial contexts.

LLM10

Unbounded Consumption

MEDIUM

Attackers provoke excessive resource consumption through expensive requests – leading to DoS, increased API costs, or exhaustion of rate limits.

Frequently asked questions

AI pentesting – your questions

A classic pentest tests infrastructure, networks, and applications for known technical vulnerabilities. An AI pentest additionally covers model-specific attack vectors: prompt injection, jailbreaking, tool abuse, RAG poisoning, and adversarial attacks on model behaviour. Since AI systems typically run on classic infrastructure, we combine both approaches – testing API security, authentication, and access control alongside model behaviour.
It depends on the scope. For a black-box assessment, we only need the same access a normal user would have – we interact with the system as an attacker without insider knowledge. For grey-box or white-box assessments, access to system prompts, tool configurations, and architecture documentation enables deeper findings. We discuss the optimal approach in the scoping call.
Yes – because the risk lies not only in the model but in the configuration and deployment. How is the system prompt structured? What data can Copilot access? Are user permissions correctly enforced? Can a user access other users' data via prompt injection? These questions relate to your implementation, not Microsoft's – and that is exactly what we test.
Indirect prompt injection means the attack does not come directly from the user but through external data the model processes – an email, a document, a webpage. If an AI agent reads a poisoned email and subsequently triggers a bank transfer or exfiltrates credentials, the attacker has compromised the system without ever interacting directly with the agent. This is one of the most dangerous vulnerabilities in agentic AI systems.
The EU AI Act distinguishes by risk category. High-risk AI systems – including AI in healthcare, recruiting, credit scoring, law enforcement, and critical infrastructure – face strict requirements for robustness, transparency, and security testing. Companies deploying high-risk AI without developing it themselves also bear compliance responsibility. For general purpose AI models with systemic risk, additional red-teaming obligations apply from 2025. We help you assess which category applies to your system in the initial consultation.
It depends on the scope – which systems, how many agents and tools, which test model (black/grey/white box). A focused LLM application test is typically more compact than a full agentic AI assessment with RAG analysis and infrastructure pentest. After a free initial consultation we will understand your stack and you will receive a transparent fixed-price offer.

Ready for your AI pentest?

In 30 minutes we discuss your AI stack, clarify the scope, and you receive a no-obligation quote – at no cost.

Book your free initial consultation

© AccessGranted X GmbH