How To Secure GenAI by Implementing RBAC In The Enterprise

Where GenAI tools fall short, RBAC (role-based access control) best practices offer much-needed data protection.

May 28, 2024


GenAI tools are revolutionizing many business processes, but current iterations do not offer appropriate data protection for enterprises. However, companies that implement role-based access control (RBAC) best practices can drastically reduce the risk created by this gap in protection.

What company hasn’t deployed generative AI (GenAI) at this point? Organizations like Microsoft, Expedia, and Instacart have leveraged it to create a new generation of chatbots and customer search tools. Employees from a wide variety of companies, from McKinsey to Walmart, regularly use GenAI tools to do their jobs. 

Unfortunately, company GenAI policies have lagged behind adoption, and one consequence has been data leak incidents caused by employees inputting sensitive data into public platforms like ChatGPT. So, it’s no surprise that ISC2 recently found that 54% of respondents reported increased AI-related security threats over the previous six months. 

Enterprise GenAI data security is a complex issue that requires a multifaceted approach; there’s no single, all-encompassing solution. Role-based access control (RBAC) is a crucial security feature that can significantly reduce unauthorized access to sensitive data when building AI applications, ensuring that users only have access to information on a need-to-know basis. However, it’s important to note that RBAC is not a feature that GenAI models inherently support. Instead, it needs to be implemented in the context of the AI application as it interacts with and retrieves data from various sources.

GenAI platforms must sharpen their RBAC offerings, allowing for granular control over data access based on user roles and permissions. In the meantime, companies can take immediate action by designing their AI applications to incorporate RBAC principles, ensuring that the application layer controls access to sensitive data based on predefined roles and permissions.

Let’s delve deeper into the challenge of data access in GenAI environments, explore areas where GenAI tools can improve their RBAC offerings, and discuss practical steps companies can take to implement RBAC in the near term, all while mitigating the risks associated with unauthorized access to sensitive data.

Access: An AI Data Security Conundrum

Enterprise GenAI poses a significant security risk due to its reliance on vast training data. To achieve optimal performance, enterprise AI is often trained on potentially sensitive business data, such as confidential meeting notes, budgets, and other proprietary information. The AI converts this data into vector embeddings, which are later used to retrieve or summarize relevant information in response to employee prompts.

However, this process raises a critical issue: unauthorized access to sensitive data. If an employee without the appropriate clearance asks an enterprise AI tool to disclose intellectual property, financial projections, or other classified intelligence, it could lead to severe consequences. Unrestricted access to sensitive data leaves businesses vulnerable to data leaks, which can erode customer trust and put the company out of compliance with standards and regulations like HIPAA, PCI DSS, or ISO 27001.

To mitigate these risks, companies may consider restricting access to AI tools to a select few employees. However, this approach defeats the purpose of providing company-wide access to powerful and innovative tools that can enhance productivity and decision-making. RBAC offers a potential solution to this dilemma, as it can be implemented at two distinct layers to ensure comprehensive data security when using enterprise GenAI. 

Firstly, RBAC can be applied at the end-user layer to control who can access the AI tool. Companies can define user roles and permissions to determine which employees can interact with the application. This granular control ensures that the AI only retrieves and presents information based on the user’s predefined role and permissions.

Secondly, RBAC can be implemented at the AI layer to control what data and functionality the AI model can access based on the permissions of the user making the request. In this case, the AI model is designed to understand and respect the user’s role and permissions when processing prompts and retrieving information. 
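To make the two layers concrete, here is a minimal Python sketch, assuming a vector store that supports filtering by classification labels; all names (ROLE_PERMISSIONS, vector_search, generate_answer) are hypothetical placeholders rather than any specific product’s API.

```python
# A minimal two-layer RBAC sketch. All names are hypothetical placeholders.

ROLE_PERMISSIONS = {
    "analyst": {"public", "internal"},
    "manager": {"public", "internal", "confidential"},
}

def vector_search(query: str, labels: set[str]) -> list[str]:
    """Placeholder for a vector-store query restricted by classification labels."""
    return [f"(chunks matching {query!r} labeled {sorted(labels)})"]

def generate_answer(prompt: str, context: list[str]) -> str:
    """Placeholder for the LLM call; real code would pass context to the model."""
    return f"Answer grounded in {len(context)} permitted document(s)"

def handle_prompt(user_role: str, prompt: str) -> str:
    # End-user layer: reject roles with no access to the application at all.
    allowed_labels = ROLE_PERMISSIONS.get(user_role)
    if allowed_labels is None:
        raise PermissionError(f"Role {user_role!r} may not use this application")
    # AI layer: retrieval is filtered to documents the user is cleared to see.
    docs = vector_search(prompt, allowed_labels)
    return generate_answer(prompt, context=docs)

print(handle_prompt("analyst", "Summarize Q3 revenue drivers"))
```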

The answer seems simple: RBAC. However, current GenAI tools don’t inherently offer RBAC capabilities, which poses a challenge for enterprises. As we’ll discuss, GenAI vendors face limitations in incorporating the necessary elements of RBAC, so it’s up to companies to implement alternative strategies themselves.

See More: Don’t Go It Alone: Effective GenAI Implementation Requires Collaboration

Current AI Apps Lack RBAC Primitives

The rapid adoption of GenAI as a ubiquitous component of modern enterprise applications has outpaced the development of robust security measures in this new environment. At a glance, some of the legwork involved in implementing RBAC at scale for AI apps includes creating data filters and implementing identity access management capabilities.

1. Data Filters

AI models often cannot apply data filters during the training process or when they retrieve information. This limitation can increase an organization’s risk if not addressed. The specific data that must be protected in a GenAI environment varies depending on factors such as:

  • The company’s industry (e.g., healthcare, finance, or technology)
  • Unique company policies (e.g., internal data classification and handling guidelines)
  • Compliance frameworks to which the company must adhere (e.g., HIPAA, GDPR, or PCI DSS)

Without the ability to apply filters based on these factors, models may inadvertently expose sensitive data during training or when responding to user prompts. Customizable data filters would allow organizations to define specific criteria controlling which data the model can access and use, both during the training phase and when providing information to users.

For example, a filter could be set up to exclude any data classified as “confidential” or “restricted” from being used to train the LLM or from being included in the AI’s responses. This would help ensure that sensitive information is not inadvertently exposed, even if a user without the appropriate permissions attempts to access it. Several methods can help filter sensitive data out of training datasets (the first is sketched after this list), including:

  • Data identification, redaction, and scrubbing
  • Synthetic data generation
  • Privacy-preserving federated learning
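As an illustration of the first method, here is a toy redaction pass in Python; real pipelines would rely on trained detectors or a DLP service rather than a handful of regexes, and the patterns below are simplified assumptions.

```python
import re

# Toy redaction pass: replace matched sensitive values with typed placeholders.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(record: str) -> str:
    """Redact sensitive values so they never reach the training corpus."""
    for label, pattern in PATTERNS.items():
        record = pattern.sub(f"[{label}]", record)
    return record

print(scrub("Contact jane.doe@example.com, SSN 123-45-6789"))
# -> Contact [EMAIL], SSN [SSN]
```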

In sum, implementing data filters is a crucial aspect of applying RBAC principles to GenAI tools: they allow organizations to control the data the AI can access and use based on the organization’s unique security and compliance requirements.

2. Identity and Access Management

Many GenAI applications fail to incorporate RBAC or least-privilege principles when training LLMs, which can leave data accessible where it shouldn’t be. Ensuring that LLMs are trained with RBAC-based access controls is crucial to maintaining data security. Accordingly, granular permission settings are essential to ensure that models produce outputs based only on data the user is authorized to see.

Similarly, AI apps deployed via APIs often lack appropriate measures to authenticate and authorize users as they send prompts, which can lead to unauthorized access to sensitive information encoded in the trained vectors. Integrating identity management and authentication mechanisms is crucial to preventing this kind of unauthorized access.
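As one way to close that gap, here is a minimal sketch using FastAPI, assuming the AI app is served as a web API; the token store and endpoint are hypothetical and stand in for a real identity provider integration (e.g., OIDC).

```python
from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

# Hypothetical token store; a real deployment would validate tokens against
# an identity provider rather than a hard-coded dict.
TOKENS = {"token-abc": {"user": "alice", "role": "analyst"}}

def get_current_user(authorization: str = Header(...)) -> dict:
    """Resolve the bearer token to a verified identity, or reject the request."""
    token = authorization.removeprefix("Bearer ").strip()
    user = TOKENS.get(token)
    if user is None:
        raise HTTPException(status_code=401, detail="Invalid or missing token")
    return user

@app.post("/prompt")
def handle_prompt(prompt: str, user: dict = Depends(get_current_user)) -> dict:
    # The verified role travels with the request so downstream retrieval
    # can filter data accordingly (see the earlier two-layer sketch).
    return {"role": user["role"], "answer": f"(model output for: {prompt})"}
```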

Robust access management and least-privilege access are essential not just for training data but also for the downstream actions the LLM may take. For example, if an LLM is integrated with a downstream database, the LLM should not hold any database permissions greater than those of the end user prompting it. Otherwise, the LLM can become a vector for privilege escalation.
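A hypothetical guard illustrating this point: the sketch below refuses to execute LLM-generated SQL against tables the end user’s role cannot see, so queries effectively run with the user’s grants rather than the service account’s. Table and role names are illustrative.

```python
# Guard around LLM-generated SQL: enforce the end user's grants, not the
# service account's, so the LLM cannot be used to escalate privileges.
ALLOWED_TABLES = {
    "analyst": {"sales_summary"},
    "admin": {"sales_summary", "payroll"},
}

def run_llm_sql(sql: str, tables_referenced: set[str], user_role: str) -> None:
    permitted = ALLOWED_TABLES.get(user_role, set())
    if not tables_referenced <= permitted:
        denied = sorted(tables_referenced - permitted)
        raise PermissionError(f"Role {user_role!r} may not query {denied}")
    # Execute with a connection scoped to the *user's* database credentials.
    print(f"OK to run for {user_role!r}: {sql}")

run_llm_sql("SELECT region, total FROM sales_summary", {"sales_summary"}, "analyst")
# run_llm_sql("SELECT * FROM payroll", {"payroll"}, "analyst")  # -> PermissionError
```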

See More: API Security in the AI Era: Challenges and Innovations

Implementing RBAC Best Practices in GenAI Environments

In due time, security standards for AI tools may be a matter of regulation. For example, the EU AI Act, which the European Parliament adopted on March 13, 2024, classifies AI systems into risk tiers and requires high-risk systems, such as those used in critical infrastructure, to adhere to security standards. However, until AI data security standards are more ubiquitous, companies must create, implement, and adhere to their own GenAI security policies.

Organizations should follow a structured approach that ensures granular control over data access and user permissions. Here are the critical steps to implementing RBAC:

1. Define user roles and permissions

Begin by identifying the various user roles within your organization and defining the specific permissions associated with each role. For example, you may have roles such as “admin,” “manager,” “analyst,” and “user,” each with different levels of access to data and functionality within the AI application.
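One simple way to encode this step, sketched in Python with illustrative role and permission names:

```python
# Role-to-permission definitions for step 1; names are illustrative only.
PERMISSIONS = {
    "admin":   {"read", "write", "manage_users", "configure_model"},
    "manager": {"read", "write", "export_reports"},
    "analyst": {"read", "export_reports"},
    "user":    {"read"},
}

def has_permission(role: str, permission: str) -> bool:
    """True if the given role carries the named permission."""
    return permission in PERMISSIONS.get(role, set())

assert has_permission("manager", "write")
assert not has_permission("user", "export_reports")
```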

2. Categorize and classify data

Classify your organization’s data based on sensitivity and criticality. This may involve labeling data as “public,” “internal,” “confidential,” or “restricted.” Clearly define the criteria for each classification level and ensure that all data is appropriately categorized.

3. Map roles to data access

Create a matrix that maps user roles to the appropriate data classifications. This matrix should define which roles have access to each level of classified data. For example, “admin” and “manager” roles may have access to “confidential” data, while “analyst” and “user” roles may only have access to “internal” and “public” data.
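That matrix can be expressed as a small lookup table. The sketch below follows the example above, with the added assumption that only admins may see “restricted” data.

```python
# The role-to-classification matrix from step 3 as a simple lookup table;
# assigning "restricted" to admins only is an assumption, not a fixed rule.
ACCESS_MATRIX = {
    "admin":   {"public", "internal", "confidential", "restricted"},
    "manager": {"public", "internal", "confidential"},
    "analyst": {"public", "internal"},
    "user":    {"public", "internal"},
}

def can_access(role: str, classification: str) -> bool:
    """True if the role may see data carrying the given classification label."""
    return classification in ACCESS_MATRIX.get(role, set())

assert can_access("manager", "confidential")
assert not can_access("analyst", "confidential")
```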

4. Implement access control mechanisms

Integrate access control mechanisms into the AI application layer to enforce the defined RBAC policies. This may involve techniques such as the following (an endpoint-level sketch appears after the list):

  • Authentication: Ensure that users are appropriately authenticated before granting access to the AI application.
  • Authorization: Implement authorization checks to verify that users have the appropriate permissions to access specific data and functionality within the AI application.
  • Data filtering: Apply custom data filters to restrict the data that the AI model can access and use based on the user’s role and permissions.
  • API-level controls: Implement RBAC at the API level to control which endpoints and functionalities each user role can access.
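As promised above, API-level RBAC can be as simple as a map from endpoints to permitted roles; the paths and roles below are illustrative, not any specific vendor’s API surface.

```python
# Hypothetical endpoint-to-role map for API-level controls.
ENDPOINT_ROLES = {
    "/v1/chat":      {"admin", "manager", "analyst", "user"},
    "/v1/export":    {"admin", "manager"},
    "/v1/fine-tune": {"admin"},
}

def authorize_endpoint(path: str, role: str) -> None:
    """Raise unless the caller's role may invoke the given endpoint."""
    if role not in ENDPOINT_ROLES.get(path, set()):
        raise PermissionError(f"Role {role!r} is not authorized for {path}")

authorize_endpoint("/v1/chat", "user")          # passes silently
# authorize_endpoint("/v1/fine-tune", "user")   # -> PermissionError
```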

5. Train and fine-tune the LLM with RBAC

When training and fine-tuning the LLM, ensure that RBAC principles are applied during the data preprocessing and tokenization stages. This involves filtering the training data based on the defined access controls and ensuring that the LLM only learns from data appropriate for each user role.
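A sketch of what that preprocessing filter might look like; the corpus records and training policy below are illustrative assumptions.

```python
# RBAC-aware preprocessing: records above the training clearance threshold
# are dropped before tokenization or fine-tuning ever sees them.
TRAINING_CORPUS = [
    {"text": "Q3 all-hands recap", "classification": "internal"},
    {"text": "M&A negotiation notes", "classification": "restricted"},
]

ALLOWED_FOR_TRAINING = {"public", "internal"}  # policy from the earlier steps

train_set = [
    record for record in TRAINING_CORPUS
    if record["classification"] in ALLOWED_FOR_TRAINING
]
print(len(train_set))  # 1 -- the restricted record never reaches the model
```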

6. Test and validate RBAC implementation

Thoroughly test the RBAC implementation to ensure that access controls function as intended. Conduct rigorous testing scenarios to verify that users can only access the data and functionality they are authorized to access based on their role and permissions.
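A few pytest-style checks against the hypothetical helpers sketched in the earlier steps give a flavor of such tests; a real suite would also exercise the live application with per-role test accounts.

```python
import pytest

# Assumes can_access() and authorize_endpoint() from the earlier sketches.

def test_manager_sees_confidential():
    assert can_access("manager", "confidential")

def test_analyst_blocked_from_confidential():
    assert not can_access("analyst", "confidential")

def test_unknown_role_denied_everywhere():
    assert not can_access("intern", "public")

def test_endpoint_guard_raises():
    with pytest.raises(PermissionError):
        authorize_endpoint("/v1/fine-tune", "analyst")
```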

7. Monitor and audit access

Regularly monitor and audit user access to the AI application to detect suspicious activities or potential breaches. Implement logging mechanisms to track user actions and maintain an audit trail for compliance and security purposes.
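A minimal audit-trail sketch: each access decision is emitted as a structured log line that SIEM tooling can ingest and alert on. The field names are assumptions.

```python
import json
import logging

# One structured log line per access decision, for audit and compliance.
logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("rbac.audit")

def record_access(user: str, role: str, resource: str, allowed: bool) -> None:
    """Emit a structured audit event for every access decision."""
    audit_log.info(json.dumps({
        "user": user,
        "role": role,
        "resource": resource,
        "allowed": allowed,
    }))

record_access("alice", "analyst", "confidential:budget_2024.xlsx", allowed=False)
```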

8. Regularly review and update RBAC policies

As your organization’s needs and data landscape evolve, regularly review and update the RBAC policies to ensure they remain relevant and effective. This may involve adding new user roles, modifying permissions, or adjusting data classifications as required.

By implementing these RBAC strategies, organizations can ensure that the AI model only accesses and uses data permitted by users’ roles and permissions. This granular control helps protect sensitive information, maintain compliance with industry regulations, and reduce the risk of unauthorized data access.

When it comes to rolling out GenAI in the enterprise, RBAC is emerging as a promising security building block. It can help companies balance embracing new AI technologies with maintaining control over access to data and resources. Over time, we expect GenAI tools to address access issues in future iterations. Until then, it’s up to companies to use the resources at their disposal to tap into GenAI’s tremendous potential while ensuring the data they need to harness its ROI isn’t compromised.



Isaac Madan

Co-Founder and CEO, Nightfall AI

Isaac Madan is the Co-Founder and CEO of Nightfall AI, the first data leak prevention (DLP) platform built with generative AI at the core. Before building Nightfall, Madan and his co-founder saw firsthand how legacy DLP solutions could overload security teams with false positive alerts. With this in mind, they combined their knowledge of machine learning, engineering, and enterprise security to create the first DLP platform that leverages AI to detect PII, PCI, PHI, secrets, and credentials with industry-leading accuracy.