LLM Prompt Injection Worm

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/03/llm-prompt-injection-worm.html

Researchers have demonstrated a worm that spreads through prompt injection. Details:

In one instance, the researchers, acting as attackers, wrote an email including the adversarial text prompt, which “poisons” the database of an email assistant using retrieval-augmented generation (RAG), a way for LLMs to pull in extra data from outside its system. When the email is retrieved by the RAG, in response to a user query, and is sent to GPT-4 or Gemini Pro to create an answer, it “jailbreaks the GenAI service” and ultimately steals data from the emails, Nassi says. “The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client,” Nassi says.

In the second method, the researchers say, an image with a malicious prompt embedded makes the email assistant forward the message on to others. “By encoding the self-replicating prompt into the image, any kind of image containing spam, abuse material, or even propaganda can be forwarded further to new clients after the initial email has been sent,” Nassi says.

It’s a natural extension of prompt injection. But it’s still neat to see it actually working.

Research paper: “ComPromptMized: Unleashing Zero-click Worms that Target GenAI-Powered Applications.

Abstract: In the past year, numerous companies have incorporated Generative AI (GenAI) capabilities into new and existing applications, forming interconnected Generative AI (GenAI) ecosystems consisting of semi/fully autonomous agents powered by GenAI services. While ongoing research highlighted risks associated with the GenAI layer of agents (e.g., dialog poisoning, membership inference, prompt leaking, jailbreaking), a critical question emerges: Can attackers develop malware to exploit the GenAI component of an agent and launch cyber-attacks on the entire GenAI ecosystem?

This paper introduces Morris II, the first worm designed to target GenAI ecosystems through the use of adversarial self-replicating prompts. The study demonstrates that attackers can insert such prompts into inputs that, when processed by GenAI models, prompt the model to replicate the input as output (replication), engaging in malicious activities (payload). Additionally, these inputs compel the agent to deliver them (propagate) to new agents by exploiting the connectivity within the GenAI ecosystem. We demonstrate the application of Morris II against GenAI-powered email assistants in two use cases (spamming and exfiltrating personal data), under two settings (black-box and white-box accesses), using two types of input data (text and images). The worm is tested against three different GenAI models (Gemini Pro, ChatGPT 4.0, and LLaVA), and various factors (e.g., propagation rate, replication, malicious activity) influencing the performance of the worm are evaluated.

Kernel prepatch 6.8-rc7

Post Syndicated from corbet original https://lwn.net/Articles/964337/

The 6.8-rc7 kernel prepatch is out for
testing.

So we finally have a week where things have calmed down, and in
fact 6.8-rc7 is smaller than usual at this point in time. So if
that keeps up (but that’s a fairly notable “if”) I won’t feel like
I need to do an rc8 this release after all.

So no guarantees, but assuming no bad surprises, we’ll have the
final 6.8 next weekend.

Welcome to Security Week 2024

Post Syndicated from Grant Bourzikas original https://blog.cloudflare.com/welcome-to-security-week-2024


April 2024 will mark my one-year anniversary as the Chief Security Officer at Cloudflare. In the past year, we’ve seen a rapid increase in sophisticated threats and incidents globally. Boards and executives are applying significant pressure to security organizations to prevent security breaches while maintaining only slight increases to budgets. Adding regulatory scrutiny, global security leaders are under pressure to deliver on the expectations from executives to protect their company. While this has been the expectation for over 20 years, we have recently seen a significant rise in attacks, including the largest and most sophisticated DDoS attacks, and the continued supply chain incidents from Solarwinds to Okta. Along with more nation state sponsored attackers, it is clear security professionals – including Cloudflare – can’t let their guards down and become complacent when it comes to security.

This past year, I met with over a hundred customers at events like our Cloudflare Connect conference in London, Chicago, Sydney, and NYC. I spoke with executives, policy experts, and world leaders at Davos. And I’ve been in constant dialogue with security peers across tech and beyond. There is much consistency amongst all security leaders on the pain points and concerns of Chief Information Security Officers (CISOs), spanning every geography and industry, from startups to large Fortune 500s.

Over the course of this week we will announce new products inspired by these conversations that respond to common challenges faced by CISOs around the world. We will cover many aspects of these security concerns, ranging from application security to securing employees and cloud infrastructure. We will also be sharing stories of how we do things at Cloudflare, and some thought leadership blog posts.

My Cloudflare Journey

As a CSO for more than 20 years for some of the world’s largest and most complex companies, I was drawn to the rapid innovation, unique market position and the global network that Cloudflare offers. Looking back on my first year at Cloudflare, the discussions I have had with customers has shaped me into a better CSO. Sharing my own challenges and listening to others has expanded my own understanding of the complex issues that we, Cloudflare, can learn from and adopt.

The core pillars of my organization are to Protect Cloudflare, Foster Innovation, and share “How Cloudflare does it.”  My team is customer zero: first to use Cloudflare products and collaborate on needs of security organizations. Innovation weeks are certainly a key feature of the Cloudflare way, and I’m extremely proud to be able to open Security Week 2024 by announcing a series of exciting new products and features.

Security Priorities in 2024

There are three key challenges that have emerged in my discussions with CISOs and security practitioners: responding to risks and opportunities from AI, maintaining visibility and control as cloud technology changes so quickly, and how to consolidate technologies to effectively manage the security and IT budget.

One of the key topics I heard at Davos is how global leaders can address urgent global issues. As a society, we are facing a number of challenges, ranging from the environment to the ongoing effort to keep democracies functioning. The role of the Internet has never been more crucial, and I believe it’s a shared responsibility to keep it functioning and improve its security.

Our product and engineering teams have been working to deliver an array of solutions aligned to these challenges, and ultimately helping build a better Internet.

Responding to opportunity and risk from AI

No surprise, AI is the number one topic of discussion. At Davos, AI was the common theme across all industries, with a core concern of how to secure and protect our investments. As a leader in AI inference, our engineering and product teams have been working hard on building a way to protect our own, and our customers’, AI models and applications.

This week our product teams are announcing tools to safeguard applications in the era of AI as well as AI-powered features helping our customers simplify how they interact with our analytics.

As a CSO, securing data is a core capability that is only made more challenging as the workforce may choose to use open AI services without understanding the risks. We have some announcements this week aligned with preventing data leakage from AI, as well as how you can use AI to secure against AI-enhanced phishing.

Finally, we will also share our philosophy of how AI can be used to increase the level of defense and security against increasingly sophisticated attacks.

Maintaining visibility and control as applications and clouds change

Effective security programs keenly focus on reducing complexity, increasing visibility, and robust alerting capabilities. A resounding message of 2024 is security by design, rather than bolted-on security. Security by design sounds easy but is more challenging for those of us without a greenfield.

While most do not have the luxury of starting over, many are succeeding by eliminating legacy tooling, such as third party storage tools, and at the same time gaining visibility and control.

There are new ways to secure and connect multi-cloud environments with consistent policy management. Our team will be sharing many new releases they are working on, and a recent acquisition, all aligned to this challenge we all face.

Consolidating to drive down costs

Every year security leaders are asked to do more with less.  With economic uncertainty persisting into 2024, budget constraints have each of us critically analyzing our security stack for value and simplicity. Everyone is looking for strategies that not only reduce costs, but reduce complexity and increase your posture by removing room for human error. The CISOs who I see succeed in this environment have built programs based on simplification. Cloud migrations and zero trust architecture implementations have many asking if those transformations delivered on the promise of simplification and scale. My own zero trust journeys have given me a deep appreciation for the Cloudflare approach in moving away from expensive and complex security architectures.

How can we help make the Internet better?

2024 will be a pivotal year for the Internet. Geopolitical conflict and the elections around the world are being heavily analyzed for impact across every industry.  This week we will share how we can leverage our robust platforms to stand by our mission to help build a better Internet and protect global democracy and large scale international events.

Welcome to Security Week

Innovation weeks are a great tradition at Cloudflare. This is where we launch new capabilities and share new ways to solve the challenges we have heard from our customers. No surprise, Security Week will be my personal favorite. I hope you each walk away with something that makes your job just a little easier.

Седмицата (26 февруари – 2 март)

Post Syndicated from Боряна Телбис original https://www.toest.bg/sedmitsata-26-fevruari-2-mart/

Седмицата (26 февруари – 2 март)

Преди 47 години човечеството е успяло да създаде машина, която толкова да го обича, че да се обърне и да му направи една последна снимка от Космоса, преди да изключи камерата за всичко останало завинаги. Става въпрос за космическата сонда „Вояджър 1“, която днес се намира на ръба на Слънчевата система, в безкрайното нищо, на милиарди километри от нас. Освен това леко се е побъркала и данните, които изпраща, са абсолютно неразчетими, но никой не я хули за това. Защото този необичаен пътешественик свърши много повече работа и стигна много по-далече, отколкото изобщо някой някога си е представял. През 1990 г. именно „Вояджър 1“ ни оставя онази прословута снимка на малката синя точка, която изглежда като изгорял пиксел на черен екран. Днес космическите телескопи ни пращат къде-къде по-детайлни изображения, но нищо не може да се сравни с бледосинята пясъчна люспа, която „Вояджър 1“ ни показа за последно. На нея е всичко. 

Вече втора седмица поред в „Тоест“ се вълнуваме какво става и какво ще стане с „Вояджър 1“, защото краят на сондата ще бъде и краят на цяла епоха. Миналата седмица за това ни разказа Михаил Ангелов в своите научни новини, а сега публикуваме една изключително емоционална и също толкова информативна статия за „Вояджър 1“, която вкарва по симпатичен начин в исторически контекст целия разказ около изстрелването на сондата и ни подръчква да си позволим да сме малко по-чувствителни и сантиментални, което ние, разбира се, избягваме да правим просто защото няма време за всичко. Статията е на Дъг Мюър, публикуваме я с неговото специално разрешение, а преводът е на Анелия Костова.

Обираме леко патоса и преместваме погледа си от далечния Космос към твърде близката и натопорчена политическа реалност, която на фона на вечността изглежда крайно несъществено. Но понеже предстои така напоително обговаряната ротация, се налага да кажем няколко думи и за нея. Прави го Емилия Милчева в текста си „Осми март, ден първи“, като задава правилните въпроси:

кога и как ще бъде скрепен документално съюзът – ще бъдат ли заложени реформи на основни системи, или ще се ограничат до механизма за назначения, изобщо ще се стигне ли до кооперативно равновесие, от което (и) обществото да има известни ползи?

Дали сглобката ще премине в по-стабилен строеж, предстои да разберем. Междувременно тунел „Железница“ на автомагистрала „Струма“ премина от строителен обект в годна за експлоатация инфраструктурна единица. Оказа се обаче, че самите ние като нация може би не сме съвсем преминали в настоящия век, защото продължаваме да живеем като едни предмодерни българи, които вграждат икони в тунели и невести в мостове, за да сме по-сигурни в стабилността на самото съоръжение. По този въпрос разсъждава в текста си „Икони вместо работещи институции“ Светла Енчева. 

От потенциалните външни разпади на мостове, тунели и сглобки минаваме към едни много по-важни и много по-крехки – вътрешните разпади на душата на малки уязвими в самотата си съставни части. На тях е посветена тазседмичната рецензия на Антония Апостолова. Книгата на фокус в рубриката „На второ четене“ е сборникът с разкази „Академия за китове“ от Виолета Златарева, в която се разказва за хората с нестандартна психика и за миговете, в които не са съвсем себе си. 

В една Малайзия с лют сос и съкровища ви предлагаме да се оттеглим като за финал на тазседмичния „Тоест“. Това е първата част от пътеписа на Петя Кокудева за живописната азиатска страна, където покрай пътищата растат помело, манго, драконов плод и ананас, макаци могат да ти откраднат телефона само защото го мислят за храна, а ако ти самият искаш нещо да закусиш, съвсем спокойно може да го направиш с един лют ориз.

Приятно четене на новия „Тоест“.

Friday Squid Blogging: New Extinct Species of Vampire Squid Discovered

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/03/friday-squid-blogging-new-extinct-species-of-vampire-squid-discovered.html

Paleontologists have discovered a 183-million-year-old species of vampire squid.

Prior research suggests that the vampyromorph lived in the shallows off an island that once existed in what is now the heart of the European mainland. The research team believes that the remarkable degree of preservation of this squid is due to unique conditions at the moment of the creature’s death. Water at the bottom of the sea where it ventured would have been poorly oxygenated, causing the creature to suffocate. In addition to killing the squid, it would have prevented other creatures from feeding on its remains, allowing it to become buried in the seafloor, wholly intact.

Research paper.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Read my blog posting guidelines here.

Mistral AI models now available on Amazon Bedrock

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/mistral-ai-models-now-available-on-amazon-bedrock/

Last week, we announced that Mistral AI models are coming to Amazon Bedrock. In that post, we elaborated on a few reasons why Mistral AI models may be a good fit for you. Mistral AI offers a balance of cost and performance, fast inference speed, transparency and trust, and is accessible to a wide range of users.

Today, we’re excited to announce the availability of two high-performing Mistral AI models, Mistral 7B and Mixtral 8x7B, on Amazon Bedrock. Mistral AI is the 7th foundation model provider offering cutting-edge models in Amazon Bedrock, joining other leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. This integration provides you the flexibility to choose optimal high-performing foundation models in Amazon Bedrock.

Mistral 7B is the first foundation model from Mistral AI, supporting English text generation tasks with natural coding capabilities. It is optimized for low latency with a low memory requirement and high throughput for its size. Mixtral 8x7B is a popular, high-quality, sparse Mixture-of-Experts (MoE) model, that is ideal for text summarization, question and answering, text classification, text completion, and code generation.

Here’s a quick look at Mistral AI models on Amazon Bedrock:

Getting Started with Mistral AI Models
To get started with Mistral AI models in Amazon Bedrock, first you need to get access to the models. On the Amazon Bedrock console, select Model access, and then select Manage model access. Next, select Mistral AI models, and then select Request model access.

Once you have the access to selected Mistral AI models, you can test the models with your prompts using Chat or Text in the Playgrounds section.

Programmatically Interact with Mistral AI Models
You can also use AWS Command Line Interface (CLI) and AWS Software Development Kit (SDK) to make various calls using Amazon Bedrock APIs. Following, is a sample code in Python that interacts with Amazon Bedrock Runtime APIs with AWS SDK:

import boto3
import json

bedrock = boto3.client(service_name="bedrock-runtime")

prompt = "<s>[INST] INSERT YOUR PROMPT HERE [/INST]"

body = json.dumps({
    "prompt": prompt,
    "max_tokens": 512,
    "top_p": 0.8,
    "temperature": 0.5,
})

modelId = "mistral.mistral-7b-instruct-v0:2"

accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

print(json.loads(response.get('body').read()))

Mistral AI models in action
By integrating your application with AWS SDK to invoke Mistral AI models using Amazon Bedrock, you can unlock possibilities to implement various use cases. Here are a few of my personal favorite use cases using Mistral AI models with sample prompts. You can see more examples on Prompting Capabilities from the Mistral AI documentation page.

Text summarization — Mistral AI models extract the essence from lengthy articles so you quickly grasp key ideas and core messaging.

You are a summarization system. In clear and concise language, provide three short summaries in bullet points of the following essay.

# Essay:
{insert essay text here}

Personalization — The core AI capabilities of understanding language, reasoning, and learning, allow Mistral AI models to personalize answers with more human-quality text. The accuracy, explanation capabilities, and versatility of Mistral AI models make them useful at personalization tasks, because they can deliver content that aligns closely with individual users.

You are a mortgage lender customer service bot, and your task is to create personalized email responses to address customer questions. Answer the customer's inquiry using the provided facts below. Ensure that your response is clear, concise, and directly addresses the customer's question. Address the customer in a friendly and professional manner. Sign the email with "Lender Customer Support."

# Facts
<INSERT FACTS AND INFORMATION HERE>

# Email
{insert customer email here}

Code completion — Mistral AI models have an exceptional understanding of natural language and code-related tasks, which is essential for projects that need to juggle computer code and regular language. Mistral AI models can help generate code snippets, suggest bug fixes, and optimize existing code, accelerating your development process.

[INST] You are a code assistant. Your task is to generate a 5 valid JSON object based on the following properties:
name: 
lastname: 
address: 
Just generate the JSON object without explanations:
[/INST]

Things You Have to Know
Here are few additional information for you:

  • Availability — Mistral AI’s Mixtral 8x7B and Mistral 7B models in Amazon Bedrock are available in the US West (Oregon) Region.
  • Deep dive into Mistral 7B and Mixtral 8x7B — If you want to learn more about Mistral AI models on Amazon Bedrock, you might also enjoy this article titled “Mistral AI – Winds of Change” prepared by my colleague, Mike.

Now Available
Mistral AI models are available today in Amazon Bedrock, and we can’t wait to see what you’re going to build. Get yourself started by visiting Mistral AI on Amazon Bedrock.

Happy building,
Donnie

Metasploit Weekly Wrap-Up 03/01/2024

Post Syndicated from Christophe De La Fuente original https://blog.rapid7.com/2024/03/01/metasploit-weekly-wrap-up-03-01-2024/

Connect the dots from authentication bypass to remote code execution

Metasploit Weekly Wrap-Up 03/01/2024

This week, our very own sfewer-r7 added a new exploit module that leverages an authentication bypass vulnerability in ConnectWise ScreenConnect to achieve remote code execution. This vulnerability, CVE-2024-1709, affects all versions of ConnectWise ScreenConnect up to and including 23.9.7.The module creates a new administrator user account on the server, which is used it to upload a malicious extension (.ashx file) and get code execution as the NT AUTHORITY\SYSTEM user on Windows or root user on Linux, depending on the target platform.

New module content (1)

ConnectWise ScreenConnect Unauthenticated Remote Code Execution

Authors: WatchTowr and sfewer-r7
Type: Exploit
Pull request: #18870 contributed by sfewer-r7
Path: multi/http/connectwise_screenconnect_rce_cve_2024_1709

Description: This PR adds an unauthenticated RCE exploit for ConnectWise ScreenConnect (CVE-2024-1709).

Enhancements and features (8)

  • #18830 from sjanusz-r7 – Aligns the behavior of the MSSQL, PostgreSQL, and MySQL sessions. This functionality is currently behind a feature flag enabled with the features command.
  • #18833 from zeroSteiner – This catches an exception when updating a non-existing session. Prior to this PR, trying to run ‘sessions -k’ after running ‘workspace -D’ would result in a stack trace being printed to the console. This resolves issue #18561.
  • #18849 from adfoster-r7 – Adjusts the logic used for the visual indentation of tables.
  • #18872 from zgoldman-r7 – Updates the MSSQL modules to support querying database rows that contain boolean bit values.
  • #18878 from adfoster-r7 – This updates a number of rspec gems which help improve test suite error messages when string encodings are different.
  • #18879 from zeroSteiner – Updates the auxiliary/admin/kerberos/inspect_ticket module with improved error messages and support for printing Kerberos PAC credential information.
  • #18892 from zeroSteiner – Allows users to leverage the latest ADCS ESC13 technique. These changes are related to the identification of misconfigured certificate templates and workflow documentation. ldap_esc_vulnerable_cert_finder and ldap_query were also updated to improve usability.
  • #18893 from sjanusz-r7 – Updates the help command to visually align command names to the same width to improve readability.

Bugs fixed (2)

  • #18873 from cgranleese-r7 – Fixes a regression that caused a CreateSession option to be available for payloads that did not make sense.
  • #18880 from jmartin-tech – Fixes a bug with the auxiliary/capture/ldap module’s handling of NTLM hashes.

Documentation

You can find the latest Metasploit documentation on our docsite at docs.metasploit.com.

Get it

As always, you can update to the latest Metasploit Framework with msfupdate
and you can get more details on the changes since the last blog post from
GitHub:

If you are a git user, you can clone the Metasploit Framework repo (master branch) for the latest.
To install fresh without using git, you can use the open-source-only Nightly Installers or the
commercial edition Metasploit Pro

How BMO improved data security with Amazon Redshift and AWS Lake Formation

Post Syndicated from Amy Tseng original https://aws.amazon.com/blogs/big-data/how-bmo-improved-data-security-with-amazon-redshift-and-aws-lake-formation/

This post is cowritten with Amy Tseng, Jack Lin and Regis Chow from BMO.

BMO is the 8th largest bank in North America by assets. It provides personal and commercial banking, global markets, and investment banking services to 13 million customers. As they continue to implement their Digital First strategy for speed, scale and the elimination of complexity, they are always seeking ways to innovate, modernize and also streamline data access control in the Cloud. BMO has accumulated sensitive financial data and needed to build an analytic environment that was secure and performant. One of the bank’s key challenges related to strict cybersecurity requirements is to implement field level encryption for personally identifiable information (PII), Payment Card Industry (PCI), and data that is classified as high privacy risk (HPR). Data with this secured data classification is stored in encrypted form both in the data warehouse and in their data lake. Only users with required permissions are allowed to access data in clear text.

Amazon Redshift is a fully managed data warehouse service that tens of thousands of customers use to manage analytics at scale. Amazon Redshift supports industry-leading security with built-in identity management and federation for single sign-on (SSO) along with multi-factor authentication. The Amazon Redshift Spectrum feature enables direct query of your Amazon Simple Storage Service (Amazon S3) data lake, and many customers are using this to modernize their data platform.

AWS Lake Formation is a fully managed service that simplifies building, securing, and managing data lakes. It provides fine-grained access control, tagging (tag-based access control (TBAC)), and integration across analytical services. It enables simplifying the governance of data catalog objects and accessing secured data from services like Amazon Redshift Spectrum.

In this post, we share the solution using Amazon Redshift role based access control (RBAC) and AWS Lake Formation tag-based access control for federated users to query your data lake using Amazon Redshift Spectrum.

Use-case

BMO had more than Petabyte(PB) of financial sensitive data classified as follows:

  1. Personally Identifiable Information (PII)
  2. Payment Card Industry (PCI)
  3. High Privacy Risk (HPR)

The bank aims to store data in their Amazon Redshift data warehouse and Amazon S3 data lake. They have a large, diverse end user base across sales, marketing, credit risk, and other business lines and personas:

  1. Business analysts
  2. Data engineers
  3. Data scientists

Fine-grained access control needs to be applied to the data on both Amazon Redshift and data lake data accessed using Amazon Redshift Spectrum. The bank leverages AWS services like AWS Glue and Amazon SageMaker on this analytics platform. They also use an external identity provider (IdP) to manage their preferred user base and integrate it with these analytics tools. End users access this data using third-party SQL clients and business intelligence tools.

Solution overview

In this post, we’ll use synthetic data very similar to BMO data with data classified as PII, PCI, or HPR. Users and groups exists in External IdP. These users federate for single sign on to Amazon Redshift using native IdP federation. We’ll define the permissions using Redshift role based access control (RBAC) for the user roles. For users accessing the data in data lake using Amazon Redshift Spectrum, we’ll use Lake Formation policies for access control.

Technical Solution

To implement customer needs for securing different categories of data, it requires the definition of multiple AWS IAM roles, which requires knowledge in IAM policies and maintaining those when permission boundary changes.

In this post, we show how we simplified managing the data classification policies with minimum number of Amazon Redshift AWS IAM roles aligned by data classification, instead of permutations and combinations of roles by lines of business and data classifications. Other organizations (e.g., Financial Service Institute [FSI]) can benefit from the BMO’s implementation of data security and compliance.

As a part of this blog, the data will be uploaded into Amazon S3. Access to the data is controlled using policies defined using Redshift RBAC for corresponding Identity provider user groups and TAG Based access control will be implemented using AWS Lake Formation for data on S3.

Solution architecture

The following diagram illustrates the solution architecture along with the detailed steps.

  1. IdP users with groups like lob_risk_public, Lob_risk_pci, hr_public, and hr_hpr are assigned in External IdP (Identity Provider).
  2. Each users is mapped to the Amazon Redshift local roles that are sent from IdP, and including aad:lob_risk_pci, aad:lob_risk_public, aad:hr_public, and aad:hr_hpr in Amazon Redshift. For example, User1 who is part of Lob_risk_public and hr_hpr will grant role usage accordingly.
  3. Attach iam_redshift_hpr, iam_redshift_pcipii, and iam_redshift_public AWS IAM roles to Amazon Redshift cluster.
  4. AWS Glue databases which are backed on s3 (e.g., lobrisk,lobmarket,hr and their respective tables) are referenced in Amazon Redshift. Using Amazon Redshift Spectrum, you can query these external tables and databases (e.g., external_lobrisk_pci, external_lobrisk_public, external_hr_public, and external_hr_hpr), which are created using AWS IAM roles iam_redshift_pcipii, iam_redshift_hpr, iam_redshift_public as shown in the solutions steps.
  5. AWS Lake Formation is used to control access to the external schemas and tables.
  6. Using AWS Lake Formation tags, we apply the fine-grained access control to these external tables for AWS IAM roles (e.g., iam_redshift_hpr, iam_redshift_pcipii, and iam_redshift_public).
  7. Finally, grant usage for these external schemas to their Amazon Redshift roles.

Walkthrough

The following sections walk you through implementing the solution using synthetic data.

Download the data files and place your files into buckets

Amazon S3 serves as a scalable and durable data lake on AWS. Using Data Lake you can bring any open format data like CSV, JSON, PARQUET, or ORC into Amazon S3 and perform analytics on your data.

The solutions utilize CSV data files containing information classified as PCI, PII, HPR, or Public. You can download input files using the provided links below. Using the downloaded files upload into Amazon S3 by creating folder and files as shown in below screenshot by following the instruction here. The detail of each file is provided in the following list:

Register the files into AWS Glue Data Catalog using crawlers

The following instructions demonstrate how to register files downloaded into the AWS Glue Data Catalog using crawlers. We organize files into databases and tables using AWS Glue Data Catalog, as per the following steps. It is recommended to review the documentation to learn how to properly set up an AWS Glue Database. Crawlers can automate the process of registering our downloaded files into the catalog rather than doing it manually. You’ll create the following databases in the AWS Glue Data Catalog:

  • lobrisk
  • lobmarket
  • hr

Example steps to create an AWS Glue database for lobrisk data are as follows:

  • Go to the AWS Glue Console.
  • Next, select Databases under Data Catalog.
  • Choose Add database and enter the name of databases as lobrisk.
  • Select Create database, as shown in the following screenshot.

Repeat the steps for creating other database like lobmarket and hr.

An AWS Glue Crawler scans the above files and catalogs metadata about them into the AWS Glue Data Catalog. The Glue Data Catalog organizes this Amazon S3 data into tables and databases, assigning columns and data types so the data can be queried using SQL that Amazon Redshift Spectrum can understand. Please review the AWS Glue documentation about creating the Glue Crawler. Once AWS Glue crawler finished executing, you’ll see the following respective database and tables:

  • lobrisk
    • lob_risk_high_confidential_public
    • lob_risk_high_confidential
  • lobmarket
    • credit_card_transaction_pci
    • credit_card_transaction_pci_public
  • hr
    • customers_pii_hpr_public
    • customers_pii_hpr

Example steps to create an AWS Glue Crawler for lobrisk data are as follows:

  • Select Crawlers under Data Catalog in AWS Glue Console.
  • Next, choose Create crawler. Provide the crawler name as lobrisk_crawler and choose Next.

Make sure to select the data source as Amazon S3 and browse the Amazon S3 path to the lob_risk_high_confidential_public folder and choose an Amazon S3 data source.

  • Crawlers can crawl multiple folders in Amazon S3. Choose Add a data source and include path S3://<<Your Bucket >>/ lob_risk_high_confidential.

  • After adding another Amazon S3 folder, then choose Next.

  • Next, create a new IAM role in the Configuration security settings.
  • Choose Next.

  • Select the Target database as lobrisk. Choose Next.

  • Next, under Review, choose Create crawler.
  • Select Run Crawler. This creates two tables : lob_risk_high_confidential_public and lob_risk_high_confidential under database lobrisk.

Similarly, create an AWS Glue crawler for lobmarket and hr data using the above steps.

Create AWS IAM roles

Using AWS IAM, create the following IAM roles with Amazon Redshift, Amazon S3, AWS Glue, and AWS Lake Formation permissions.

You can create AWS IAM roles in this service using this link. Later, you can attach a managed policy to these IAM roles:

  • iam_redshift_pcipii (AWS IAM role attached to Amazon Redshift cluster)
    • AmazonRedshiftFullAccess
    • AmazonS3FullAccess
    • Add inline policy (Lakeformation-inline) for Lake Formation permission as follows:
      {
         "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "RedshiftPolicyForLF",
                  "Effect": "Allow",
                  "Action": [
                      "lakeformation:GetDataAccess"
                  ],
                  "Resource": "*"
              }
          ]

    • iam_redshift_hpr (AWS IAM role attached to Amazon Redshift cluster): Add the following managed:
      • AmazonRedshiftFullAccess
      • AmazonS3FullAccess
      • Add inline policy (Lakeformation-inline), which was created previously.
    • iam_redshift_public (AWS IAM role attached to Amazon Redshift cluster): Add the following managed policy:
      • AmazonRedshiftFullAccess
      • AmazonS3FullAccess
      • Add inline policy (Lakeformation-inline), which was created previously.
    • LF_admin (Lake Formation Administrator): Add the following managed policy:
      • AWSLakeFormationDataAdmin
      • AWSLakeFormationCrossAccountManager
      • AWSGlueConsoleFullAccess

Use Lake Formation tag-based access control (LF-TBAC) to access control the AWS Glue data catalog tables.

LF-TBAC is an authorization strategy that defines permissions based on attributes. Using LF_admin Lake Formation administrator, you can create LF-tags, as mentioned in the following details:

Key Value
Classification:HPR no, yes
Classification:PCI no, yes
Classification:PII no, yes
Classifications non-sensitive, sensitive

Follow the below instructions to create Lake Formation tags:

  • Log into Lake Formation Console (https://console.aws.amazon.com/lakeformation/) using LF-Admin AWS IAM role.
  • Go to LF-Tags and permissions in Permissions sections.
  • Select Add LF-Tag.

  • Create the remaining LF-Tags as directed in table earlier. Once created you find the LF-Tags as show below.

Assign LF-TAG to the AWS Glue catalog tables

Assigning Lake Formation tags to tables typically involves a structured approach. The Lake Formation Administrator can assign tags based on various criteria, such as data source, data type, business domain, data owner, or data quality. You have the ability to allocate LF-Tags to Data Catalog assets, including databases, tables, and columns, which enables you to manage resource access effectively. Access to these resources is restricted to principals who have been given corresponding LF-Tags (or those who have been granted access through the named resource approach).

Follow the instruction in the give link to assign  LF-TAGS to Glue Data Catalog Tables:

Glue Catalog Tables Key Value
customers_pii_hpr_public Classification non-sensitive
customers_pii_hpr Classification:HPR yes
credit_card_transaction_pci Classification:PCI yes
credit_card_transaction_pci_public Classifications non-sensitive
lob_risk_high_confidential_public Classifications non-sensitive
lob_risk_high_confidential Classification:PII yes

Follow the below instructions to assign a LF-Tag to Glue Tables from AWS Console as follows:

  • To access the databases in Lake Formation Console, go to the Data catalog section and choose Databases.
  • Select the lobrisk database and choose View Tables.
  • Select lob_risk_high_confidential table and edit the LF-Tags.
  • Assign the Classification:HPR as Assigned Keys and Values as Yes. Select Save.

  • Similarly, assign the Classification Key and Value as non-sensitive for the lob_risk_high_confidential_public table.

Follow the above instructions to assign tables to remaining tables for lobmarket and hr databases.

Grant permissions to resources using a LF-Tag expression grant to Redshift IAM Roles

Grant select, describe Lake Formation permission to LF-Tags and Redshift IAM role using Lake Formation Administrator in Lake formation console. To grant, please follow the documentation.

Use the following table to grant the corresponding IAM role to LF-tags:

IAM role LF-Tags Key LF-Tags Value Permission
iam_redshift_pcipii Classification:PII yes Describe, Select
. Classification:PCI yes .
iam_redshift_hpr Classification:HPR yes Describe, Select
iam_redshift_public Classifications non-sensitive Describe, Select

Follow the below instructions to grant permissions to LF-tags and IAM roles:

  • Choose Data lake permissions in Permissions section in the AWS Lake Formation Console.
  • Choose Grants. Select IAM users and roles in Principals.
  • In LF-tags or catalog resources select Key as Classifications and values as non-sensitive.

  • Next, select Table permissions as Select & Describe. Choose grants.

Follow the above instructions for remaining LF-Tags and their IAM roles, as shown in the previous table.

Map the IdP user groups to the Redshift roles

In Redshift, use Native IdP federation to map the IdP user groups to the Redshift roles. Use Query Editor V2.

create role aad:rs_lobrisk_pci_role;
create role aad:rs_lobrisk_public_role;
create role aad:rs_hr_hpr_role;
create role aad:rs_hr_public_role;
create role aad:rs_lobmarket_pci_role;
create role aad:rs_lobmarket_public_role;

Create External schemas

In Redshift, create External schemas using AWS IAM roles and using AWS Glue Catalog databases. External schema’s are created as per data classification using iam_role.

create external schema external_lobrisk_pci
from data catalog
database 'lobrisk'
iam_role 'arn:aws:iam::571750435036:role/iam_redshift_pcipii';

create external schema external_hr_hpr
from data catalog
database 'hr'
iam_role 'arn:aws:iam::571750435036:role/iam_redshift_hpr';

create external schema external_lobmarket_pci
from data catalog
database 'lobmarket'
iam_role 'arn:aws:iam::571750435036:role/iam_redshift_pcipii';

create external schema external_lobrisk_public
from data catalog
database 'lobrisk'
iam_role 'arn:aws:iam::571750435036:role/iam_redshift_public';

create external schema external_hr_public
from data catalog
database 'hr'
iam_role 'arn:aws:iam::571750435036:role/iam_redshift_public';

create external schema external_lobmarket_public
from data catalog
database 'lobmarket'
iam_role 'arn:aws:iam::571750435036:role/iam_redshift_public';

Verify list of tables

Verify list of tables in each external schema. Each schema lists only the tables Lake Formation has granted to IAM_ROLES used to create external schema. Below is the list of tables in Redshift query edit v2 output on top left hand side.

Grant usage on external schemas to different Redshift local Roles

In Redshift, grant usage on external schemas to different Redshift local Roles as follows:

grant usage on schema external_lobrisk_pci to role aad:rs_lobrisk_pci_role;
grant usage on schema external_lobrisk_public to role aad:rs_lobrisk_public_role;

grant usage on schema external_lobmarket_pci to role aad:rs_lobmarket_pci_role;
grant usage on schema external_lobmarket_public to role aad:rs_lobmarket_public_role;

grant usage on schema external_hr_hpr_pci to role aad:rs_hr_hpr_role;
grant usage on schema external_hr_public to role aad:rs_hr_public_role;

Verify access to external schema

Verify access to external schema using user from Lob Risk team. User lobrisk_pci_user federated into Amazon Redshift local role rs_lobrisk_pci_role. Role rs_lobrisk_pci_role only has access to external schema external_lobrisk_pci.

set session_authorization to creditrisk_pci_user;
select * from external_lobrisk_pci.lob_risk_high_confidential limit 10;

On querying table from external_lobmarket_pci schema, you’ll see that your permission is denied.

set session_authorization to lobrisk_pci_user;
select * from external_lobmarket_hpr.lob_card_transaction_pci;

BMO’s automated access provisioning

Working with the bank, we developed an access provisioning framework that allows the bank to create a central repository of users and what data they have access to. The policy file is stored in Amazon S3. When the file is updated, it is processed, messages are placed in Amazon SQS. AWS Lambda using Data API is used to apply access control to Amazon Redshift roles. Simultaneously, AWS Lambda is used to automate tag-based access control in AWS Lake Formation.

Benefits of adopting this model were:

  1. Created a scalable automation process to allow dynamically applying changing policies.
  2. Streamlined the user accesses on-boarding and processing with existing enterprise access management.
  3. Empowered each line of business to restrict access to sensitive data they own and protect customers data and privacy at enterprise level.
  4. Simplified the AWS IAM role management and maintenance by greatly reduced number of roles required.

With the recent release of Amazon Redshift integration with AWS Identity center which allows identity propagation across AWS service can be leveraged to simplify and scale this implementation.

Conclusion

In this post, we showed you how to implement robust access controls for sensitive customer data in Amazon Redshift, which were challenging when trying to define many distinct AWS IAM roles. The solution presented in this post demonstrates how organizations can meet data security and compliance needs with a consolidated approach—using a minimal set of AWS IAM roles organized by data classification rather than business lines.

By using Amazon Redshift’s native integration with External IdP and defining RBAC policies in both Redshift and AWS Lake Formation, granular access controls can be applied without creating an excessive number of distinct roles. This allows the benefits of role-based access while minimizing administrative overhead.

Other financial services institutions looking to secure customer data and meet compliance regulations can follow a similar consolidated RBAC approach. Careful policy definition, aligned to data sensitivity rather than business functions, can help reduce the proliferation of AWS IAM roles. This model balances security, compliance, and manageability for governance of sensitive data in Amazon Redshift and broader cloud data platforms.

In short, a centralized RBAC model based on data classification streamlines access management while still providing robust data security and compliance. This approach can benefit any organization managing sensitive customer information in the cloud.


About the Authors

Amy Tseng is a Managing Director of Data and Analytics(DnA) Integration at BMO. She is one of the AWS Data Hero. She has over 7 years of experiences in Data and Analytics Cloud migrations in AWS. Outside of work, Amy loves traveling and hiking.

Jack Lin is a Director of Engineering on the Data Platform at BMO. He has over 20 years of experience working in platform engineering and software engineering. Outside of work, Jack loves playing soccer, watching football games and traveling.

Regis Chow is a Director of DnA Integration at BMO. He has over 5 years of experience working in the cloud and enjoys solving problems through innovation in AWS. Outside of work, Regis loves all things outdoors, he is especially passionate about golf and lawn care.

Nishchai JM is an Analytics Specialist Solutions Architect at Amazon Web services. He specializes in building Big-data applications and help customer to modernize their applications on Cloud. He thinks Data is new oil and spends most of his time in deriving insights out of the Data.

Harshida Patel is a Principal Solutions Architect, Analytics with AWS.

Raghu Kuppala is an Analytics Specialist Solutions Architect experienced working in the databases, data warehousing, and analytics space. Outside of work, he enjoys trying different cuisines and spending time with his family and friends.

The collective thoughts of the interwebz

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close