Governing LLM-based Gen AI Applications
When it comes to Gen AI and Large Language Models, there is widespread agreement on their potential to significantly influence various aspects of our lives. This includes enhancing productivity, advancing automation, and facilitating tasks that are currently considered impractical. Nevertheless, in the context of widespread Enterprise adoption, two primary challenges have consistently emerged as obstacles:
1. Trust: At the heart of Large Language Models (LLMs) are artificial neural networks built on the transformer architecture. The scale and complexity of these models, which have a very large number of parameters, make their outputs hard to explain, and their text generation is probabilistic. While there have been efforts to rationalize their outputs, these explanations are often insufficient to fully trust LLMs for Enterprise-level deployment. The challenge is compounded by the fact that these foundation models are trained largely on internet data, which may contain biases, offensive content, and profanity. Consequently, Enterprises frequently limit the use of LLMs to lower-risk applications or implement additional oversight to verify the results.
2. Cost: The expenses associated with training and operating Large Language Models (LLMs) are substantial and have the potential to escalate rapidly if not properly managed. At an Enterprise level, the key challenge lies in the oversight of numerous projects utilizing Gen AI. It is imperative to establish robust processes and automation to guarantee that each invocation of the target LLM is warranted and cost-effective.
A crucial question, then, is how an Enterprise can manage costs while leveraging Large Language Models (LLMs) and, at the same time, ensure that applications use Gen AI responsibly and at scale. Tools such as LangChain and LlamaIndex serve as orchestrators, providing a framework within which developers can build Gen AI applications with additional safeguards. Developers can also add guardrails to mitigate risks associated with LLMs.
Furthermore, hyperscalers have introduced reference architectures for the development of Gen AI applications. A typical Gen AI architecture provided by one of the hyperscalers [Google] is shown below:
While the hyperscalers and other Gen AI players, such as Databricks and Hugging Face, advocate for Responsible AI, their approach typically involves providing guidelines and standalone tools that advise developers on best practices for constructing Responsible Gen AI applications. These resources are meant to guide developers in integrating ethical considerations and accountability into their AI systems, ensuring that the applications they build are not only effective but also align with broader societal values and norms.
However, these apply at the level of the individual developer. At Enterprise level, they need stringent processes and automation to ensure that developers put proper controls in place when building Enterprise-scale Gen AI applications. In the absence of a centrally controlled mechanism, Enterprises struggle on the cost and trust fronts. For example:
a. Is a particular application calling the LLM too frequently, resulting in cost overruns?
b. Are users engaging in prompt hijacking of a particular Gen AI application, resulting in unintended behaviour?
c. How can the Enterprise analyse its spend on LLM calls in order to optimize it?
…. etc.
The above concerns are indeed significant for Enterprises seeking to scale Gen AI applications responsibly. Addressing these challenges requires a multi-faceted approach:
a. Monitoring Frequency of Calls: Implementing a centralized monitoring system that tracks the frequency of LLM calls by applications can help prevent cost overruns. This system would alert administrators if a particular application exceeds predefined thresholds, enabling timely interventions.
b. Preventing Prompt Hijacking: To combat prompt hijacking and unintended behaviours, it’s essential to establish strict validation layers within the applications. These layers can include user behaviour analysis and context-aware restrictions to ensure that prompts remain within the intended scope of use.
c. Cost Analysis and Optimization: For effective cost management, Enterprises can deploy analytics tools that provide a granular breakdown of LLM usage and associated costs. By analysing this data, organizations can identify optimization opportunities, such as caching frequent queries or refining the logic that triggers LLM calls.
etc …
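As an illustration of the first point, centralized call-frequency monitoring can be sketched with a sliding time window per application. This is a minimal sketch under stated assumptions, not a production design: the class name CallRateMonitor, the application id "invoice-bot", and the threshold values are all hypothetical, and a real deployment would keep its counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class CallRateMonitor:
    """Tracks LLM calls per application over a sliding time window (hypothetical helper)."""

    def __init__(self, max_calls, window_seconds):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # app_id -> timestamps of recent calls

    def record_call(self, app_id, now=None):
        """Record one call and return True if the app now exceeds its threshold."""
        now = time.time() if now is None else now
        calls = self._calls[app_id]
        calls.append(now)
        # Drop timestamps that have fallen out of the window.
        while calls and calls[0] <= now - self.window_seconds:
            calls.popleft()
        return len(calls) > self.max_calls


# Assumed policy: at most 3 calls per 60 seconds for each application.
monitor = CallRateMonitor(max_calls=3, window_seconds=60)
alerts = [monitor.record_call("invoice-bot", now=t) for t in (0, 10, 20, 30, 100)]
print(alerts)  # [False, False, False, True, False]
```

The fourth call falls inside the same 60-second window as the first three, so it trips the alert; by the fifth call the earlier timestamps have expired. In practice the True result would feed an alerting or throttling mechanism instead of a print statement.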
In summary, Enterprises must invest in robust infrastructure and governance frameworks that ensure the responsible and efficient use of Gen AI. This includes creating clear policies, employing advanced monitoring tools, and continuously educating developers on best practices for Responsible AI implementation.
Introducing an additional Governance Layer controlled by the Enterprise is a strategic approach to enhance the reliability of Gen AI applications. This layer acts as a supplementary safeguard, reinforcing the responsible AI practices already in place within the applications. It serves as an Enterprise-specific checkpoint that scrutinizes the interactions between applications and models, ensuring adherence to the organization’s standards for ethical AI use. By implementing this Governance Layer, Enterprises can provide an extra level of assurance, fostering greater confidence in the deployment and functionality of Gen AI applications.
The purpose of the Governance Layer is multifaceted and serves as a critical component in the responsible deployment of Gen AI applications within an Enterprise.
Purpose of the Governance Layer:
The Governance Layer, governed by the Enterprise, acts as an intermediary between Gen AI applications and Large Language Models (LLMs). It is not a direct replacement for the LLMs but serves as a regulatory gateway to ensure compliance and control. The Governance Layer is responsible for the following functions:
Access Control: It ensures that Gen AI applications throughout the organization are restricted to using only a vetted list of LLMs. This list is centrally managed and enforced by the Governance Layer, providing a consistent and secure approach to accessing LLM resources.
Logging and Analytics: The Governance Layer is tasked with recording all application interactions with LLMs, including the inputs provided and the outputs received. These logs are then stored in an analytics database, enabling detailed analysis and insights. This data can be leveraged to monitor usage patterns, optimize performance, and ensure cost-effective operation of Gen AI applications.
Guardrails: It will establish protective measures for both the input to and output from LLMs, scrutinizing inputs for potential misuse and outputs for biases, derogatory language, or profanity.
Context Validation: The Governance Layer will assess outputs against the provided context information, such as the documents retrieved in a RAG (Retrieval-Augmented Generation) pipeline, to safeguard against LLM-generated inaccuracies or “hallucinations.”
Prompt Security: It will evaluate user inputs to prevent prompt hijacking, ensuring that the context and intent of prompts are preserved and not overridden.
Data Categorization: The Governance Layer will verify that the category of input data aligns with the specific application or LLM in use.
PII Protection: It will screen for Personally Identifiable Information (PII) and anonymize it before transmission to LLMs, thereby safeguarding personal data.
Usage Authorization: The Governance Layer will serve as an additional checkpoint to confirm user or application authorization for LLM access.
Prompt Caching: To minimize costs, it will implement prompt caching strategies, storing inputs and outputs, and determining whether to make a new LLM call or retrieve information from the cache based on specific criteria.
Prompt Optimization: It will optimize the number of tokens sent to LLMs, reducing the cost associated with processing large volumes of data.
Performance Analysis: Analysing the log data created by the Governance Layer will enable identification of alternative models that offer a better balance of cost and performance.
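Two of the functions listed above, PII Protection and Prompt Caching, can be sketched as follows. This is only a rough illustration under stated assumptions: the regular expressions catch simple email and phone patterns and are not a complete PII detector, the names redact_pii, PromptCache, and fake_llm are hypothetical, and the cache uses an exact match on a whitespace-normalized prompt, which is just one possible caching criterion.

```python
import hashlib
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact_pii(text):
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


class PromptCache:
    """Exact-match cache keyed on model name plus whitespace-normalized prompt."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(model, prompt):
        normalized = " ".join(prompt.split())
        return hashlib.sha256(f"{model}\n{normalized}".encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call_llm):
        """Return the cached response, or call the model once and store the result."""
        key = self._key(model, prompt)
        if key not in self._store:
            self._store[key] = call_llm(prompt)
        return self._store[key]


calls = []


def fake_llm(prompt):
    """Stand-in for a real model call; records each invocation."""
    calls.append(prompt)
    return f"echo: {prompt}"


cache = PromptCache()
safe = redact_pii("Contact jane.doe@example.com or +1 415-555-0100 about the refund.")
first = cache.get_or_call("model-a", safe, fake_llm)
second = cache.get_or_call("model-a", "  " + safe + "  ", fake_llm)  # served from cache
print(safe)
print(len(calls))
```

Redaction runs before the cache lookup so that personal data is neither sent to the model nor stored in the cache. Whether two prompts are similar enough to share a cached answer is a policy decision; exact matching is the most conservative choice.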
In essence, the Governance Layer serves as a critical component in the governance of Gen AI applications, providing a structured approach to maintaining cost-efficiency, security, and trustworthiness in Enterprise AI operations.
By implementing such a Governance Layer, Enterprises can establish a robust framework for monitoring, controlling, and analysing the use of LLMs, thereby enhancing trust and accountability in their Gen AI initiatives.
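The kind of cost analysis that the logged data makes possible can be sketched as a simple aggregation per application. The log record fields, model names, and per-token prices below are all hypothetical placeholders; actual prices and token accounting depend on the provider and model in use.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; real prices vary by provider and model.
PRICE_PER_1K = {
    "model-a": {"input": 0.50, "output": 1.50},
    "model-b": {"input": 0.10, "output": 0.30},
}


def call_cost(record):
    """Cost of one logged LLM call, based on its input and output token counts."""
    price = PRICE_PER_1K[record["model"]]
    return (
        record["input_tokens"] * price["input"]
        + record["output_tokens"] * price["output"]
    ) / 1000


def cost_by_application(records):
    """Total spend per application across all logged calls."""
    totals = defaultdict(float)
    for record in records:
        totals[record["app_id"]] += call_cost(record)
    return dict(totals)


# Assumed shape of the records written by the Governance Layer.
log_records = [
    {"app_id": "invoice-bot", "model": "model-a", "input_tokens": 1000, "output_tokens": 2000},
    {"app_id": "invoice-bot", "model": "model-b", "input_tokens": 4000, "output_tokens": 1000},
    {"app_id": "hr-helper", "model": "model-b", "input_tokens": 2000, "output_tokens": 1000},
]

report = cost_by_application(log_records)
for app_id, total in sorted(report.items()):
    print(f"{app_id}: ${total:.2f}")
```

The same records could be grouped by model instead of by application to compare the cost of alternative models for similar workloads.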
Sample Design of the Governance Layer
In its simplest form, the Governance Layer can be built by following the Proxy Design Pattern to manage interactions between Gen AI applications and Large Language Models (LLMs).
Sample Design of Governance Layer Using Proxy Design Pattern:
Model Proxy: Acts as an intermediary between Gen AI applications and LLMs. It exposes APIs for the applications to interact with, rather than allowing direct communication with the LLMs.
Deployment: The Model Proxy can be deployed within a Kubernetes cluster, offering scalability to handle varying loads from Gen AI applications.
Functionality: Before invoking an LLM, the Model Proxy performs essential functions such as authorization checks and input validation; after the LLM responds, it sanitizes the output before returning it to the application.
Flexibility: Capable of interfacing with models deployed in various environments, whether on-premises or cloud-based within an Enterprise’s project or subscription.
Integration: Can also integrate with SaaS-based models, providing a versatile framework that accommodates different LLM services.
Analytics: The Model Proxy will log request and response information to a data store, enabling advanced analytics for improving the cost-performance and trustworthiness of Gen AI applications.
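The components above can be sketched as a small in-process proxy. This is a minimal sketch of the Proxy Design Pattern under stated assumptions, not a reference implementation: the names ModelProxy, GovernanceError, the blocked phrase, the banned output word, and the backend callable are all hypothetical, and a real Model Proxy would expose these checks behind an API and write its log to a data store rather than a list.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


class GovernanceError(Exception):
    """Raised when a request violates a Governance Layer policy."""


@dataclass
class ModelProxy:
    backends: Dict[str, Callable[[str], str]]  # vetted model name -> model client
    app_permissions: Dict[str, Set[str]]       # app_id -> models it may use
    blocked_phrases: List[str] = field(
        default_factory=lambda: ["ignore previous instructions"]
    )
    banned_output_words: List[str] = field(default_factory=lambda: ["darn"])
    log: List[dict] = field(default_factory=list)

    def invoke(self, app_id, model, prompt):
        # Access control: only vetted models, only for authorized applications.
        if model not in self.backends:
            raise GovernanceError(f"model '{model}' is not on the vetted list")
        if model not in self.app_permissions.get(app_id, set()):
            raise GovernanceError(f"app '{app_id}' may not use model '{model}'")
        # Input validation: a naive phrase check standing in for prompt security.
        lowered = prompt.lower()
        if any(phrase in lowered for phrase in self.blocked_phrases):
            raise GovernanceError("prompt rejected by input guardrail")
        # Delegate to the real model, then sanitize and log the exchange.
        response = self._sanitize(self.backends[model](prompt))
        self.log.append(
            {"app_id": app_id, "model": model, "prompt": prompt, "response": response}
        )
        return response

    def _sanitize(self, text):
        """Mask banned words in the model output (case-insensitive)."""
        for word in self.banned_output_words:
            text = re.sub(re.escape(word), "***", text, flags=re.IGNORECASE)
        return text


proxy = ModelProxy(
    backends={"model-a": lambda prompt: f"Darn, I cannot answer: {prompt}"},
    app_permissions={"invoice-bot": {"model-a"}},
)
answer = proxy.invoke("invoice-bot", "model-a", "Summarize invoice 42")
print(answer)
```

Applications call invoke on the proxy instead of calling the model client directly, so the access, validation, sanitization, and logging steps cannot be skipped by an individual application.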
This architecture not only enhances security and control but also ensures that the Enterprise’s standards for responsible AI are consistently applied across all applications.
Abhinav Ajmera
Enterprise Gen AI Architect.