Technique 2: Subjective Evaluation of Opaque AI Risks

While cloud architectures present highly complex but ultimately deterministic security challenges, the integration of generative Artificial Intelligence, specifically Large Language Models (LLMs), introduces a shift into the risk management discipline. AI models operate fundamentally as "black boxes." Their outputs are non-deterministic and highly probabilistic, and vulnerabilities often arise not from explicit, poorly written code flaws but from the opaque, end-to-end optimisation procedures during the neural network's training phase. Managing these risks requires assessing phenomena that entirely lack historical actuarial data, rendering traditional frequentist probability models and standard penetration testing methodologies useless.

Navigating the "Black Box": Model Drift, Bias, and Hallucinations

AI Hallucinations occur when an AI system generates information that is factually false, misleading, or entirely fabricated, yet presents it to the user with high fluency, structural coherence, and apparent confidence. The root causes of hallucinations are inherent limitations of the probabilistic generative process and a fundamental lack of grounding in factual, verifiable reality.

The business impacts of hallucinations range from severe reputational damage to critical, life-altering operational failures, particularly when these models are integrated into high-stakes legal, healthcare diagnostics, or financial decision-making workflows. Assessing the likelihood of a hallucination is inherently subjective, as it depends heavily on the specific context of the user's prompt, the model's settings, and the operational environment in which it is deployed.

Model Bias and Drift represent equally insidious risks. Bias occurs when AI systems inadvertently perpetuate or actively amplify societal prejudices present in their massive training datasets, leading to discriminatory outcomes in areas such as automated hiring or lending. Model drift occurs when a model's predictive performance subtly degrades over time as the real-world data it processes diverges from the baseline data it was originally trained upon.

Furthermore, because actors continually adapt their adversarial methodologies, naive risk models that treat attack attempts as independent and identically distributed events will severely underestimate the risk of an eventual, successful breach, as the threat landscape is actively shifting beneath the model.

The Threat of Indirect Prompt Injections

Perhaps the most critical security vulnerability discussed in this module is Indirect Prompt Injection. Unlike direct prompt injections (where a user actively attempts to "jailbreak" a model via the provided chat interface using commands like "ignore previous instructions"), indirect prompt injections target the external data sources that the model autonomously consumes.

In an indirect attack, a malicious actor might embed hidden text instructions within a public webpage, encode a poisoned payload into the metadata of a seemingly benign PDF, or manipulate records within a database. When an agentic AI system or a Retrieval-Augmented Generation (RAG) pipeline ingests this compromised data to answer a user query, the hidden adversarial instructions seamlessly hijack the model's behaviour.

Since the AI treats malicious instructions as trusted context, it can be tricked into performing unauthorized system actions. This may result in covert data theft, where the AI unknowingly summarizes sensitive user data and sends it to an attacker-controlled server through manipulated tool calls, or it might execute unauthorised remote commands on the underlying infrastructure. The danger of this attack is that it happens completely unnoticed by the user, who only asked the AI to summarise a document.

Structured Expert Elicitation and the Delphi Method

Because empirical historical data is entirely absent for these emergent AI capabilities, quantitative risk assessment cannot rely on traditional formulas.

The Delphi method is a highly systematic, iterative process explicitly designed to reduce individual cognitive biases, mitigate the influence of dominant personalities, and achieve an informed, mathematically grounded consensus among a panel of domain experts.

Rather than relying on a single practitioner's gut intuition, the protocol requires a carefully selected panel of experts to provide independent, anonymous quantitative estimates regarding the likelihood and impact of a specific AI scenario (for example, estimating the probability that a zero-day indirect prompt injection will successfully bypass an LLM firewall and exfiltrate data from an enterprise RAG system).

Crucially, the experts must also provide written rationales for their estimates. The panel then reviews the anonymised reasoning and estimates of their peers. Based on this shared intelligence, they refine their own judgments over multiple iterative rounds. This process continues until the variance in the estimates converges into a stable, highly defensible probability distribution. This robust methodology makes inherent uncertainty visible and provides a rigorous mathematical foundation for risk estimation in environments completely devoid of hard historical data.

The Emergence of Scalable Delphi

While undeniably effective, the traditional human Delphi method is notoriously resource-intensive. It often requires months of logistical coordination and significant financial investment to secure the dedicated time of leading cybersecurity and AI researchers. This protracted timeline places rigorous risk assessment entirely out of reach for dynamic AI deployments that undergo weekly model updates.

"Scalable Delphi" is an innovative, automated method that employs Large Language Models as efficient stand-ins for human experts. In this approach, various LLM personas, such as an adversarial red-teamer, a strict compliance officer, or a data scientist, are created programmatically to serve both as the expert panel and the session mediator. These models participate in the same iterative process of refinement, sharing rationale, and voting that is typical in human elicitation.

Extensive empirical studies in the domain of AI-augmented cybersecurity risk demonstrate that LLM panels achieve remarkably strong correlations with benchmark ground truth and align incredibly closely with the distributions produced by independent human expert panels. Scalable Delphi is not intended to replace human judgment for catastrophic, existential risks; rather, it democratises the expert elicitation process, reducing the time required from months to mere minutes. This allows organisations to continuously, automatically update their risk models as underlying AI capabilities evolve.

Challenges, Advantages, and Disadvantages of Subjective AI Risk Evaluation

The primary advantage of structured expert elicitation is that it provides the only scientifically defensible mechanism for quantifying risk in novel, unmapped technological domains. It forces organisations to acknowledge uncertainty and build mathematical ranges into their risk profiles, rather than relying on false deterministic confidence.

The disadvantages stem from the inherent subjectivity of the inputs. Even with a structured Delphi process, if the expert panel lacks genuine domain expertise or suffers from collective groupthink, the resulting probability distribution will be fundamentally flawed, leading to severe underestimations of cyber risk. Furthermore, securing the necessary human talent to conduct these panels regularly is a massive operational challenge for most enterprises.

Technique 2: Subjective Evaluation of Opaque AI Risks

Advanced Risk Management

0.0 Shifting from technical execution to strategic risk management.

1. Introduction to ISO/IEC 27005 and information security risk management

2. Information Security Risk Identification, Assessment, and Treatment (ISO/IEC 27005)

3 - Risk Acceptance, Communication, Monitoring and Review

4 - Risk Assessment Methodologies