Testing GAI

Challenge: gAI has enormous potential, but carries significant risks

Generative AI (gAI) holds immense promise for the defense sector, enabling rapid data aggregation, synthesis, and analysis at scale, as well as enhanced decision support, predictive insights, and operational efficiency. Yet, we understand far too little about how these new technologies actually work. All too often, large language models (LLMs) are treated like magic boxes – prompts go in, results come out – with little comprehension of what is happening inside the model itself.

Despite its potential benefits, gAI exhibits inherent unpredictability and carries unknown risks, including potential data exposure, reverse engineering of inputs, reconstruction of sensitive information, inherent biases, and untrustworthy or incorrect outputs. There is an urgent need to demystify gAI, developing a deeper understanding of its inner workings to leverage its full potential and mitigate risk.

Solution: A Framework for Evaluation

Galois is developing a mathematical evaluation framework for analyzing and characterizing exactly what is happening inside gAI models. By quantifying and measuring the latent geometry inside these technologies, we can understand what a model has learned and set guardrails to achieve predictable, consistent results. In addition, we are building a future-forward toolkit that can be sustainably used to evaluate and certify the safety of gAI tools – ensuring that tomorrow’s technologies are designed to resist information leakage, manipulation, and jailbreak.

Our approach treats generative technologies as tools, not magic boxes. By embracing a rigorous analytic approach, Galois is not only safeguarding advancements but setting the stage for efficient, secure, and reliable AI deployment and advancement, steering technological progress in a trajectory that’s beneficial, well-understood, and above all, safe for high-stakes environments.

Key steps include:

Objective Specification: Establish clear definitions of AI capabilities, behavior, and mission objectives.
Behavior Optimization: Establish criteria for desired behaviors and pathological tendencies to guide training.
Latent Geometry Analysis: Develop tools to reveal and quantify the learned features in the latent space that influence AI behavior.‍
Safety-Security-Utility Balance: Create protocols to prevent classified spillage and increase resilience against adversarial manipulation, without hampering AI efficacy.

GO DEEPER:

Artificial Intelligence & Machine Learning