Tag Archives: Security Blog

Elevate your AI security: Must-see re:Inforce 2025 sessions

Post Syndicated from Margaret Jonson original https://aws.amazon.com/blogs/security/reinforce-2025-genai-sessions/

AWS re:Inforce 2025: June 16-18 in Philadelphia, PA

A full conference pass is $1,099. Register today with the code flashsale150 to receive a limited time $150 discount, while supplies last.

From proof of concepts to large scale production deployments, the rapid advancement of generative AI has ushered in unique opportunities for innovation, but it also introduces a new set of security challenges (and opportunities) that organizations must address. How do you protect retrieval-augmented generation (RAG) or training data while maintaining model effectiveness? What controls are needed for large language model (LLM) interactions? How can I take full advantage of AI agents and model context protocol (MCP) while minimizing risk? At AWS re:Inforce 2025, we’re bringing together security experts, practitioners, and industry leaders to answer these questions with real-world, prescriptive guidance and more.

This year, our generative AI security sessions have been specifically curated and designed to help you build and maintain secure, production AI systems at scale. Whether you’re just beginning your AI security journey or leading mature, enterprise-wide AI initiatives, you’ll find deep practical guidance, hands-on experience, and strategic insights to advance your organization’s security posture.

From foundational concepts to advanced defensive techniques, these sessions encompass critical areas including data protection, model security, identity management, and AI agent resilience. You’ll learn directly from AWS security experts, customers who have successfully implemented secure AI systems, and industry leading partners who are setting new standards in AI safety and security.

In this blog, we highlight some “can’t miss” sessions that cover how to secure AI, but also how security practitioners can leverage AI to help with their critical security missions as well! Join in on the fun, and register for re:Inforce 2025!

Innovation talk

Engage with top AWS executives in our Innovation Talks series, where you’ll gain invaluable insights into the forefront of cloud technology. Explore the latest advancements in generative AI, discover robust cloud security strategies, and uncover pioneering architectural concepts that are revolutionizing application development and expanding the possibilities of the AWS Cloud.

SEC301 | Innovation Talk | From possibility to production: A strong, flexible foundation for AI security
Speakers: Hart Rossman (AWS) & Becky Weiss (AWS)
Discover how AWS removes the heavy lifting of AI security, enabling you to accelerate from development to production. This session reveals how the proven AWS security foundation, combined with flexible controls and automated reasoning, helps organizations confidently deploy AI innovations. Through real-world examples, learn how to transform security from a potential roadblock into an innovation enabler. Leave with practical guidance for securing AI workloads today and strategic insights into addressing emerging security challenges, including data security and agentic AI. Learn how the AWS approach to AI security helps you start ahead while maintaining strong security controls.

Breakout sessions, chalk talks, and lightning talks

Breakout sessions are lecture-style, one-hour sessions delivered by AWS experts, customers, and partners—perfect for deepening your knowledge on important topics, gaining actionable insights, and connecting with industry leaders.

Chalk talks are one-hour long, highly interactive sessions with a small audience. This format is ideal for diving deep into specific topics, engaging directly with AWS experts, and getting your questions answered in real time.

Lightning talks are short (20 minute) theater presentations dedicated to a specific customer story, service demo, or AWS Partner offering.

SEC303 | Breakout session | Behind the shields: AWS and Anthropic’s approach to secure AI
Speakers: Matt Saner (AWS) & Shahzeb Jiwani (Anthropic)
Enterprise AI adoption demands robust security. In this session, Join Anthropic’s head of risk governance along with AWS security leaders to reveal how AWS and Anthropic collaborate to deliver enterprise-grade security for LLMs and the generative AI workloads they enable. Learn about the multi-layered security approach spanning infrastructure, data, and models. We’ll explore real-world security architectures, governance frameworks, and risk mitigation strategies. You will leave with a deeper understanding of how to leverage AWS and Anthropic’s security capabilities to accelerate your organization’s AI initiatives while maintaining stringent security and compliance requirements.

SEC304 | Breakout session | Amazon.com testing frameworks and tools for GenAI security and privacy
Speakers: Alex Torres (AWS), Josh Haycraft (Amazon), & Jess Clark (Amazon)
GenAI solutions are launching in a unique, rapidly-shifting security landscape: they may be trained on customer data, they may integrate with internal services or datastores, and they will provide generated content to customers or to other systems. Learn how Amazon.com creates toolkits, systems and frameworks to leverage Large Language Models and Generative AI to enrich customer interactions to promote agility and innovation.

TDR301 | Breakout session | Innovations in AWS detection and response for integrated security outcomes
Speakers: Himanshu Verma (AWS) & Ryan Holland (AWS)
Discover how the latest AWS detection and response capabilities can help secure your cloud environment more effectively. Learn practical ways to achieve integrated security outcomes through enhanced threat detection, automated vulnerability management, and streamlined response – all at scale. We’ll show you how to use AWS security services to protect workloads and data, centralize security monitoring, manage security posture continuously, and unify security data, while leveraging generative AI for security operations. Walk away with actionable insights on integrating AWS detection and response services to strengthen and simplify your security across AWS.

SEC431 | Chalk talk | Dive deep into data protection architectures for Amazon Bedrock Agents
Speakers: Andrew Kane (AWS) & Gabrielle Dompreh (AWS)
Join this chalk talk to understand how Amazon Bedrock protects your data across Agents and related features, such as Knowledge Bases and Guardrails. Learn about security considerations for cross-region deployments, multi-agent collaboration, and prompt caching. Gain deep insights into architecting secure generative AI solutions that maintain data protection, and discover architectural patterns that keep your applications safe and secure.

APS231 | Chalk talk | Using AWS services to mitigate the OWASP Top 10 for LLM threats
Speakers: Mark Keating (AWS) & Cameron Smith (AWS)
You’ve identified your generative AI use case, tested it and are creating a secure application architecture design. How do you know what generative AI specific threats you should be protecting against, and what tools or services are available that can help? You may have heard of the OWASP Top 10 for LLM Applications, but where or how do you start? Join us as we discuss the OWASP Top 10 threats, the differences between versions, and how AWS can help you mitigate these threats.

DAP332 | Chalk talk | Executive perspective: Risk management for generative AI workloads
Speakers: Jason Garman (AWS) & Mark Ryland (AWS)
Don’t let the perceived complexity of responsible AI keep you from deploying generative AI applications on AWS. In this chalk talk, we will present a framework for breaking down AI safety and security risks, introduce AWS best practices for keeping enterprise data secure in generative AI applications using zero trust principles, and mitigate safety risks using technologies such as Bedrock Guardrails. Discover as a group with fellow security leaders how to identify safety and security risks relevant to your workload, implement appropriate mitigation strategies, and measure efficacy over time.

GRC337 | Chalk talk | Build compliant AI: Implementing controls for emerging regulations
Speakers: Samuel Waymouth (AWS) & Mark Keating (AWS)

As AI adoption accelerates, organizations face increasing regulatory scrutiny and compliance requirements. In this session, learn about the evolving global regulatory landscape for AI, data privacy, and data sovereignty, then see how you can map regulatory requirements and security controls to AWS services and features. We will demonstrate how generative AI can work as a tool for assessment, risk classification and generating compliance guidance. We also show you how to use the latest threat modelling resources developed by AWS. Security professionals and AI practitioners will learn actionable strategies for building AI systems aligned with compliance standards while also maintaining innovation velocity.

SEC221 | Lightning talk | Raising the tide: How AWS is shaping the future of secure AI
Speakers: Matt Saner (AWS)
AI security is a top priority for AWS. By building AI solutions that are secure by design, AWS helps customers innovate quickly with confidence while mitigating emerging threats. But securing AI goes beyond individual organizations—it requires industry-wide standards and best practices. AWS actively contributes to global AI security efforts, including its participation industry standards bodies such as CoSAI (The Coalition for Secure AI), to make sure AI technologies are safe, resilient, and trustworthy. This session will explore how AWS is leading AI security innovation, protecting customers, and collaborating to help shape the future of AI security for the entire industry.

SEC322 | Lightning talk | Managing digital identity in the age of generative AI
Speakers: Arthur Mnev (AWS) & Lily Ashidam (AWS)
In this session, we will explore the challenges and solutions for managing identities in generative AI workloads. This session covers securing API access for LLMs, implementing proper authentication for, in, and with AI services, and maintaining data lineage. Learn practical approaches towards securing generative AI applications while maintaining compliance and governance requirements.

SEC323 | Lightning talk | A practical guide to generative AI agent resilience
Speakers: Yiwen Zhang (AWS) & Jennifer Moran (AWS)
As generative AI agents dominate headlines and technological discussions, enterprise adoption remains in its infancy. GenAI agent resilience is a crucial factor in successful implementation and building user trust. While traditional workload resilience practices—such as database availability, workload capacity, observability, and disaster recovery—remain relevant, GenAI agents present unique challenges. This session delves into the critical dimensions of GenAI agent resilience, including LLM model adaptability, latency management, tool availability, observability, and financial sustainability. We will share practical strategies for building robust, reliable GenAI agents that enterprises can trust and maintain.

SEC326 | Lightning Talk | Secure remote MCP server deployment for Gen AI on AWS
Speakers: Aaron Brown (AWS) & James Ferguson (AWS)
Discover how to securely build and deploy remote Model Context Protocol (MCP) servers on AWS that implement the protocol’s security and trust principles. This session demonstrates OAuth 2.1 authorization patterns that enforce user consent, data privacy, and tool safety requirements. Learn to implement robust security controls using Amazon Cognito, API Gateway, and Lambda while maintaining protocol compliance. Explore practical examples of authorization flows, access controls, and security monitoring that align with MCP specifications.

TDR322 | Lightning talk | How AWS uses generative AI to advance native security services
Speakers: Marshall Jones (AWS) & Himanshu Verma (AWS)
Discover how AWS leverages generative AI to enhance native security services. This session demonstrates how AWS implements AI capabilities across its security portfolio to improve threat detection, investigation, and response. Explore practical implementations in Amazon GuardDuty and Amazon Inspector that enable automated analysis and natural language security queries. Leave with insights into how AWS makes security more intelligent and efficient through generative AI.

Interactive sessions (builders’ sessions, code talks, and workshops)

Interact with small groups led by an AWS expert providing interactive learning about how to build on AWS. Each builders’ session begins with a short explanation or demonstration of what attendees are building—then it’s your turn to build! The expert will guide you end-to-end through this hands-on experience. Or join Code Talks, our code-focused interactive sessions where AWS experts lead a discussion featuring live coding or code samples as they illuminate the “why” behind AWS solutions. Attendees are encouraged to ask questions and follow along.

Workshops are two-hour interactive sessions where you collaborate in teams or work individually to solve real-world challenges by using AWS services, making them perfect for hands-on learning. Each workshop begins with a brief lecture, followed by dedicated time to work through the problem.

Note: Don’t forget to bring your laptop to build alongside AWS experts.

SEC351 | Builders’ session | Accelerating incident response, compliance & auditing using generative AI
Speakers: Snehal Nahar (AWS), Ravindra Kori (AWS), Rayette Toles-Abdullah (AWS), & Abhijit Barde (AWS)
In this session, we will learn how to use AWS native generative AI capabilities to reduce time to recovery after an incident using enterprise communication tools such as Slack. We will also learn how to use detective controls to identify events that may result in an incident, and also how to use preventive controls to mitigate the risk of an incident occurring. We will use services like Amazon Q Developer, AWS Config, AWS CloudTrail Lake, Amazon CloudWatch and other observability features.

SEC352 | Builders’ session | Agentic AI for security: Building intelligent egress traffic controls
Speakers: Ranjith Rayaprolu (AWS), Anil Nadiminti (AWS), Michael Leighty (AWS), & Dwaragha Sivalingam (AWS)
Learn to build AI-powered security agents that protect your cloud infrastructure. This hands-on session shows you how to use Amazon Bedrock and Bedrock Agents to create intelligent systems that watch over your network. You’ll build Generative AI agents that monitor egress traffic, spot potential threats, and automatically update network firewall to block malicious traffic. Walk away with the skills to implement AI-powered security agents that can reason, decide, and act to protect your cloud infrastructure.

SEC353 | Builders’ session | Threat modeling for generative AI applications
Speakers: Laura Verghote (AWS), Isabelle Mos (AWS), Samuel Waymouth (AWS), & Omar Zoma (AWS)
In this builders’ session, you will learn how to systematically identify and analyze security threats specific to generative AI applications. As organizations rapidly adopt large language models and other generative AI capabilities, understanding the unique security challenges – from prompt injection to data poisoning – becomes critical. You will be guided through the process of creating threat models for common generative AI architectures, with a particular focus on applications built using AWS services like Amazon Bedrock.

SEC451 | Builders’ session | From logs to defense: Generative AI for security automation
Speakers: Ravindra Kori (AWS), Siavash Iran (AWS), Lily Ashidam (AWS), & Yiwen Zhang (AWS)
In this technical session, we’ll demonstrate how to transform traditional operating system log analysis into an intelligent, automated defense system using AWS native services and generative AI. We’ll explore how to build a comprehensive solution that captures security-relevant logs from Windows and Linux systems.

APS351 | Builders’ session | Securing generative AI agents using AWS Well-Architected Framework
Speakers: Krupanidhi Jay (AWS), Ryan Dsouza (AWS), Birender Pal (AWS), & Omkar Mukadam (AWS)
Learn hands-on how to build secure generative AI agent solutions following the AWS Well-Architected Framework’s Generative AI Lens security best practices. Work through practical implementations of endpoint security, prompt engineering guardrails, monitoring systems, and protection against excessive agency while building a production-ready generative AI agent. Through hands-on exercises, build a secure generative AI agent solution incorporating these controls on AWS, involving Amazon Bedrock, Amazon CloudWatch, AWS Identity and Access Management (IAM), and more. You must bring your laptop to participate.

APS353 | Builders’ session | Red teaming your LLM security at scale
Speakers: Otto Kruse (AWS), Owen Hawkins (AWS), Aaron Brown (AWS), & Jeff Lombardo (AWS)
Step into the shoes of an AI-powered red team adversary in the GenAI Red Team Challenge. In this intensive workshop, you’ll deploy an AI security agent to orchestrate sophisticated threat chains against GenAI applications, systematically discovering and exploiting vulnerabilities from prompt injection to boundary testing while mastering automated security testing workflows. In addition, you’ll learn to apply countermeasures, from prompt templating to guardrails. This hands-on, gamified experience helps you think like a threat actor and equips you with practical skills in automated vulnerability testing and risk mitigation against common MITRE and OWASP vulnerabilities for LLM-based applications. You must bring your laptop to participate.

GRC354 | Builders’ session | Best practices for using generative AI to manage cloud compliance
Speakers: Adnan Bilwani (AWS), Ali Maaz (AWS), Artur Rodrigues (AWS), & Peter Pereira (AWS)
Learn how to leverage Amazon Q Developer to streamline cloud compliance management using AWS Config. This hands-on builders’ session demonstrates how to create intelligent compliance checks, automate remediation workflows, and generate detailed compliance reports using generative AI capabilities. Through practical exercises, learn to implement automated compliance monitoring that combines the power of generative AI with AWS Config’s robust compliance framework. You must bring your laptop to participate.

IAM451 | Builders’ session | Securing GenAI apps: Fine-grained access control for Bedrock Agents
Speakers: Edward Sun (AWS), Pravin Nair (AWS), Dustin Ellis (AWS), & Kevin Hakanson (AWS)
Want to secure generative AI applications accessing your organizational data? Learn how to implement intelligent access controls for Amazon Bedrock-powered applications accessing your organizational data. In this builders’ session, you’ll build a defense-in-depth approach that combines authentication using Amazon Cognito and fine-grained authorization with Amazon Verified Permissions to secure access for Bedrock AI agents. Implement layered permissions that protect sensitive data without limiting your GenAI capabilities. You must bring your laptop to participate.

TDR251 | Builders’ session | Build your first AI security assistant with Amazon Q
Speakers: Scott Taggart (AWS), Joe Wagner (AWS), Laura Verghote (AWS), & Riggs Goodman III (AWS)
Discover how to build your first AI-powered security assistant using Amazon Q Business – no AI expertise required. In this hands-on session, you’ll create three practical security workflows: an automated Amazon GuardDuty incident investigator that contextualizes security findings, an AWS Security Hub compliance report generator that streamlines policy assessments, and an Amazon Inspector-based vulnerability management helper that accelerates remediation. Perfect for security practitioners who want to enhance AWS security operations with generative AI while mastering core AWS security services through practical application. You must bring your laptop to participate.

IAM441 | Code talk | The right way to secure AI agents with code examples
Speakers: Jeff Lombardo (AWS) & Fei Yuan (AWS)
Generative AI agents run tasks on behalf of human users and often interact with each other across on-premises environments and different cloud providers. This brings new challenges in identity authentication, propagation, delegation, and resource authorization in the overall agentic AI solution. Learn how Amazon Cognito’s OAuth2-based identity management, machine-to-machine authentication, combined with Amazon Verified Permissions fine-grained authorization can enable secure delegation patterns for AI agents, while preserving human identity and consent, agent machine identity, and other request context throughout the agent chain. We will walk through real-world examples with agents built on Amazon Bedrock or other frameworks.

TDR341 | Code talk | Build AI security agents with Amazon Bedrock and Amazon Security Lake
Speakers: Chris Lamont-Smith (AWS) & Pratima Singh (AWS)
In this code talk, explore how to enhance security operations by creating AI agents using Amazon Bedrock and Amazon Security Lake. Through live coding demonstrations, learn to build automated workflows that combine autonomous decision-making capabilities with generative AI for security analysis and response. See how to implement agents that analyze logs, provide contextual insights, and execute response procedures. Discover practical approaches for integrating custom tools and leveraging large language models in your security workflows.

SEC371 | Workshop | Red Team approaches to practical generative AI defenses
Speakers: Mac Stevens (AWS) & Cameron Smith (AWS)
This workshop takes a hands-on approach to Generative AI security, focusing on Amazon Bedrock, Amazon SageMaker, and related services. We’ll begin by examining Bedrock’s core security principles, including data protection during inference and in features like Agents, Guardrails, and Knowledge Bases. Participants will gain insights into the internal architectures and security implications of context windows, system prompts, agent orchestration, and more. The session then transitions into hands-on red teaming exercises using SageMaker. We’ll subsequently explore defensive strategies against these threat vectors and discuss methods for integrating these practices into development workflows. Participants will leave equipped with a holistic understanding of Generative AI security, from individual model protection to safeguarding complex, multi-component systems.

APS371 | Workshop | Securing your generative AI applications on AWS
Speakers: Mark Keating (AWS) & Maitreya Ranganath (AWS)
In this workshop, discover how to secure generative AI applications using AWS services and features. Explore how to deploy a vulnerable sample generative AI application and then layer security controls to protect, detect, and respond to security issues. Learn how to apply similar controls to the generative AI applications in your organization. You must bring your laptop to participate.

DAP371 | Workshop | Defend your AI: Mitigate prompt injection with Amazon Bedrock
Speakers: Mark Keating (AWS) & Maitreya Ranganath (AWS)
Master the art of identifying and mitigating prompt injection vulnerabilities in generative AI systems through this hands-on workshop. Using Amazon Bedrock, participants will explore both offensive and defensive prompt engineering techniques to understand the security implications of large language models in production environments. In this session you will understand how prompt injection attacks work, complete an interactive ‘capture the flag’ style challenge attempting to exploit a simulated AI environment, and learn to implement defensive controls using Amazon Bedrock Guardrails. You must bring your laptop to participate.

Register now

Don’t miss this opportunity to learn from industry experts and AWS leaders about securing your AI implementations. Register for AWS re:Inforce 2025 today to reserve your spot in these sessions. Browse the full re:Inforce catalog to learn more about sessions in other tracks, plus partner sessions and code talks.

If you have feedback about this post, submit comments in the Comments section below.

Margaret Jonson

Margaret Jonson

Margaret is a Senior Product Marketing Manager for AWS generative AI security, where she partners with AI/ML teams to help customers implement secure and governed AI solutions across Amazon Bedrock, Amazon SageMaker, Amazon Q, and other AI/ML solutions.

Matt Saner

Matt Saner

As a Senior Manager at AWS, Matt leads a team of security specialists who help the world’s most complex organizations tackle critical security challenges. Matt and his team work to transform security organizations into strategic business enablers. Before joining AWS, Matt spent close to two decades in the financial services industry. Outside of work, Matt is a pilot who finds joy in flying general aviation aircraft.

Navigating the threat detection and incident response track at re:Inforce 2025

Post Syndicated from Nisha Amthul original https://aws.amazon.com/blogs/security/navigating-the-threat-detection-and-incident-response-track-at-reinforce-2025/

AWS re:Inforce 2025: June 16-18 in Philadelphia, PA

A full conference pass is $1,099. Register today with the code flashsale150 to receive a limited time $150 discount, while supplies last.

We’re counting down to AWS re:Inforce, our annual cloud security event! We are thrilled to invite security enthusiasts and builders to join us in Philadelphia, PA June 16–18, 2025, for an immersive three-day journey into cloud security learning. At AWS re:Inforce, you’ll have the chance to explore the breadth of the Amazon Web Services (AWS) security landscape, learn how to operationalize security services, and enhance your skills and confidence in cloud security to improve your organization’s security posture. As an attendee, you will have access to over 250 sessions across multiple topic tracks, including data protection; identity and access management; threat detection and incident response; network and infrastructure security; generative AI; governance, risk, and compliance; and application security. Plus, get ready to be inspired by our lineup of customer speakers, who will share their firsthand experiences of innovating securely on AWS.

In this post, we provide an overview of the key sessions that include lecture-style presentations featuring real-world use cases from our customers and interactive small-group sessions led by AWS experts that guide you through practical problems and solutions.

The threat detection and incident response track is designed to demonstrate how to detect and respond to security risks to help protect workloads at scale. AWS experts and customers will present key topics such as unified cloud security, threat detection, vulnerability management, cloud security posture management, integrated detection-to-response, threat intelligence, operationalization of AWS security services, container security, effective security investigation, security analytics, and incident response best practices. We’ll also explore both strengthening security through the use of generative AI and securing generative AI workloads.

Breakout sessions, chalk talks, and lightning talks

TDR301 | Breakout session | Innovations in AWS detection and response for integrated security outcomes
Discover how AWS’s latest detection and response capabilities can help secure your cloud environment more effectively. Learn practical ways to achieve integrated security outcomes through enhanced threat detection, automated vulnerability management, and streamlined response—all at scale. We’ll show you how to use AWS security services to protect workloads and data, centralize security monitoring, manage security posture continuously, and unify security data, while leveraging generative AI for security operations. Walk away with actionable insights on integrating AWS detection and response services to strengthen and simplify your security across AWS.

TDR302 | Breakout session | Multi-stage threat detection using GuardDuty and MITRE
Enhance your threat detection capabilities by leveraging Amazon GuardDuty Extended Threat Detection alongside MITRE frameworks. In this session, Shane Steiger Esq. from MITRE Corp demonstrates how to effectively identify and respond to multi-stage security events in your AWS environment. Learn practical strategies for implementing detection controls, developing response procedures, and building resilient cloud architectures. Discover how integrating GuardDuty with MITRE frameworks can strengthen your event detection and response strategy.

TDR303 | Breakout session | Building secure generative AI security tools, featuring Trellix
Learn how to build enterprise-grade generative AI security tools that unify security data and enable natural language investigations. This session demonstrates practical approaches for developing secure generative AI solutions, including implementation patterns for data privacy and compliance controls. Explore real-world architectures combining AWS foundation models with security orchestration. Hear how Trellix achieved 23x cost savings while maintaining 95% accuracy using Amazon Bedrock models. Leave with strategies to build secure AI assistants that support your security teams.

TDR304 | Breakout session | Scaling AWS threat intelligence to protect customers
Discover how AWS builds and operates threat intelligence at unprecedented scale to protect millions of customers. In this session, dive deep into two critical security functions: Amazon Threat Intelligence, which tracks and defends against sophisticated threats, and Active Defense, our security data processing architecture that analyzes over 4 billion records per second. Learn how these capabilities work together to power AWS security services and provide automated protection for your applications. See how AWS uses this intelligence to continuously enhance security services that help keep your workloads safe.

TDR305 | Breakout session | Scale Vulnerability Management Using Amazon Inspector
Want to strengthen Lambda security and streamline vulnerability management? Learn how Amazon Inspector uses generative AI to provide in-context code patches and automate SBOM management. Discover practical techniques for CI/CD integration, cross-account scanning, and automated remediation workflows. Explore built-in integrations with Security Hub and EventBridge to enhance security operations across your AWS environment.

TDR306 | Breakout session | Enterprise Security at Scale: SAP’s AWS Blueprint
How does SAP protect thousands of AWS accounts? Learn their blueprint for implementing Amazon GuardDuty protection plans alongside Extended Threat Detection to identify sophisticated threat patterns. Discover their framework for standardizing AWS Security Hub controls and automated remediation workflows at scale. Walk away with practical strategies to enhance enterprise security operations across AWS Organizations.

TDR331 | Chalk talk | Ask AWS: Your ransomware questions answered
Get answers to your most critical ransomware questions in this interactive Q&A session. Learn how AWS security features and best practices can help you detect, respond to, and recover from ransomware threats. Our experts will share practical guidance on identifying early warning signs, implementing effective incident response, and strengthening your overall ransomware resilience. Bring your toughest questions about emerging ransomware tactics and cloud protection strategies. Walk away with actionable insights to help secure your data and operations using AWS security capabilities.

TDR332 | Chalk talk | Decoding AWS CIRT tactics & techniques for proactive defense
Learn directly from AWS Customer Incident Response Team (CIRT) experts who help customers respond to critical security events. Discover real-world insights about emerging threat tactics and techniques observed across AWS environments. We’ll share practical detection and mitigation strategies that align with the Shared Responsibility Model, helping you strengthen your security posture. Walk away with actionable best practices from CIRT’s frontline experience defending against evolving threats, and learn how to apply these insights to protect your AWS workloads.

TDR333 | Chalk talk | Strategy for prioritization and response
Join this session to discuss managing security posture and risk across multiple accounts, regions, and resources. We will explore the decision-making process around how you prioritize security alerts and risk using AWS security services. After prioritization, we will discuss a framework for responding to and remediating security findings. We will talk through the decision-making process of responding to findings, considerations for auto-remediation, and how to facilitate a quick and thorough response to the most critical security findings.

TDR334 | Chalk talk | Strengthen Security: Making GuardDuty Protection Plans Work for You
Discover how to maximize your threat detection capabilities by selecting the right Amazon GuardDuty protection plans for your environment. Learn to evaluate protection features that matter most for your AWS workloads and understand the value each plan brings to your security strategy. Through practical scenarios, explore cost-effective implementation strategies across your AWS accounts. Leave with actionable insights for optimizing your Amazon GuardDuty deployment to enhance protection of your AWS workloads and data.

TDR431 | Chalk talk | Best practices for containing AWS resources during incident response
Learn best practices for implementing isolation controls for AWS resources and accounts during security events. Through practical scenarios, discover effective approaches for isolating Amazon EC2 instances, AWS Lambda functions, and Amazon ECS containers. Explore comprehensive strategies for account-level isolation including identity, resource, and network controls. This session provides guidance on implementing and safely removing isolation controls as part of your response procedures. Leave with actionable patterns for strengthening your AWS incident response capabilities. To help businesses move faster and deliver security outcomes, modern security teams need to identify opportunities to automate and simplify their workflows. One way of doing so is through generative AI. Join this chalk talk to learn how to identify use cases where generative AI can help with investigating, prioritizing, and remediating findings from Amazon GuardDuty, Amazon Inspector, and AWS Security Hub. Then find out how you can develop architectures from these use cases, implement them, and evaluate their effectiveness. The talk offers tenets for generative AI and security that can help you safely use generative AI to reduce cognitive load and increase focus on novel, high-value opportunities.

TDR336 | Chalk talk | Secure generative AI models and agents on AWS
Learn how to strengthen security controls for generative AI models and Amazon Bedrock agents in your AWS environment. This session explores implementation patterns for protecting API endpoints and securing agent interactions. Discover practical approaches for implementing protective controls and maintaining data security for your AI/ML workloads. Leave with actionable strategies for building secure generative AI implementations using AWS services.

TDR337 | Chalk talk | Implementing AWS security best practices: Insights & strategies
Learn how to optimize your AWS security services implementation including Amazon GuardDuty, AWS Security Hub, and AWS WAF. AWS security experts share practical insights and proven patterns derived from thousands of customer deployments. This session provides actionable strategies for operationalizing security services effectively in your environment. Discover implementation best practices and architectural approaches that help you maximize the value of your AWS security services.

TDR338 | Chalk talk | Building cloud-native forensic investigation architectures on AWS
Join this chalk talk to explore the advantages of cloud-native digital forensics and incident response on AWS. Engage in interactive discussions on best practices for establishing secure forensic investigation environments. We’ll explore architectural patterns for safely collecting and storing forensic artifacts, leveraging ephemeral resources to enhance security, and implementing effective network, account, and organizational designs. Bring your questions and scenarios as we collaboratively examine how to build scalable, standardized investigation processes using AWS services. Leave with practical strategies for enhancing your forensic and incident response capabilities in the cloud.

TDR231 | Chalk talk | Resilient security teams: Reduce burnout and boost performance
Learn strategies for building resilient security and incident response teams that prioritize wellbeing while maintaining high performance. This session explores approaches for implementing regular team check-ins, data-informed wellbeing initiatives, and a supportive team culture. Discover practical methods for fostering open communication, maintaining team engagement, and recognizing team contributions. Through real-world examples, develop actionable plans to enhance team resilience, improve retention, and sustain security excellence. Leave with strategies to build and maintain high-performing security teams.

TDR321 | Lightning talk | From Incidents to Insights: Creating a Security Learning Organization
Learn how to transform security events into organizational improvements. This session demonstrates practical approaches for building effective feedback loops, preserving institutional knowledge, and implementing sustainable enhancements to security operations. Discover AWS strategies for measuring the impact of improvements and fostering a culture of continuous learning. Leave with actionable frameworks for strengthening your security program through systematic learning and adaptation.

TDR322 | Lightning talk | How AWS uses generative AI to advance native security services
Discover how AWS leverages generative AI to enhance native security services. This session demonstrates how AWS implements AI capabilities across its security portfolio to improve threat detection, investigation, and response. Explore practical implementations in Amazon GuardDuty and Amazon Inspector that enable automated analysis and natural language security queries. Leave with insights into how AWS makes security more intelligent and efficient through generative AI.

TDR323 | Lightning talk | How Autodesk scales threat detection with Amazon GuardDuty
Learn how Autodesk elevated their threat detection strategy using Amazon GuardDuty. This lightning talk explores their implementation approach, operational insights, and best practices for leveraging the advanced detection capabilities of GuardDuty, including malware protection. Discover how they maintain robust security while efficiently managing their growing cloud footprint.

TDR421 | Lightning talk | Accelerating Incident Response with AWS Security Incident Response
Learn how AWS Security Incident Response helps security teams streamline investigation and response procedures. This session demonstrates service integration capabilities with Amazon GuardDuty, AWS CloudTrail, and AWS Security Hub to provide centralized incident management. Through customer examples and implementation patterns, discover practical approaches for building automated response strategies. Leave with actionable insights for enhancing your security operations using AWS services.

Interactive sessions (builders’ sessions, code talks, and workshops)

TDR251 | Builders’ session | Build your first AI security assistant with Amazon Q
Discover how to build your first AI-powered security assistant using Amazon Q Business—no AI expertise required. In this hands-on session, you’ll create three practical security workflows: an automated Amazon GuardDuty incident investigator that contextualizes security findings, an AWS Security Hub compliance report generator that streamlines policy assessments, and an Amazon Inspector-based vulnerability management helper that accelerates remediation. Perfect for security practitioners who want to enhance AWS security operations with generative AI while mastering core AWS security services through practical application.

TDR252 | Builders’ session | Detect ransomware events in Amazon S3 using Amazon GuardDuty
In this builders’ session, join the AWS Customer Incident Response Team (CIRT) to implement Amazon S3 ransomware detection using Amazon GuardDuty. Through hands-on scenarios, learn to identify unauthorized encryption operations and implement effective response procedures. Build detection patterns using AWS CloudTrail, Amazon Athena, Amazon GuardDuty, and Amazon CloudWatch. Practice investigating events and implementing preventive measures aligned with AWS Security’s latest guidance for Amazon S3 object protection. You must bring your laptop to participate.

TDR351 | Builders’ session | Build an OCSF security log pipeline with AWS
Build a complete security log pipeline that leverages the Open Cybersecurity Schema Framework (OCSF) in this hands-on session. Work alongside AWS experts to ingest, transform, and enrich your security data. Learn practical techniques to standardize security logs, whether using your own schema or our provided examples. Walk away with implementable solutions to enhance your threat detection capabilities through normalized security data flows. Bring your laptop and optional custom log samples to create solutions tailored to your use cases.

TDR451 | Builders’ session | Automate incident response for Amazon EC2 and Amazon EKS
Learn how to streamline incident response using the Automated Forensics Orchestrator solution for Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Kubernetes Service (Amazon EKS). This session demonstrates how to implement automated workflows triggered by AWS Security Hub findings. Explore implementation prerequisites, customization options, and best practices for enhancing your security operations through automated forensics capabilities. Discover how to standardize response procedures across your Amazon EC2 and Amazon EKS environments.

TDR452 | Builders’ session | Build generative AI security runbooks with Amazon Bedrock
In this builders’ session, learn how to enhance security operations using generative AI-powered runbooks with Amazon Bedrock and Bedrock Agents. Create intelligent workflows that analyze AWS Security Hub findings and provide contextual remediation guidance. Through hands-on exercises, build Bedrock Agents that leverage AWS documentation and implement natural language interfaces for security investigations. Learn how to configure knowledge bases with organization-specific content and implement appropriate guardrails. Leave with a practical solution for streamlining security operations using generative AI. You must bring your laptop to participate.

TDR341 | Code talk | Build AI security agents with Amazon Bedrock and Security Lake
In this code talk, explore how to enhance security operations by creating AI agents using Amazon Bedrock and Amazon Security Lake. Through live coding demonstrations, learn to build automated workflows that combine autonomous decision-making capabilities with generative AI for security analysis and response. See how to implement agents that analyze logs, provide contextual insights, and execute response procedures. Discover practical approaches for integrating custom tools and leveraging large language models in your security workflows.

TDR342 | Code talk | Operationalizing Amazon Security Lake with analytics and generative AI
Roll up your sleeves for this hands-on coding session where we’ll build modern security analytics tools on top of Amazon Security Lake. Through interactive demos, we’ll craft queries and visualizations to operationalize your security data using AWS services like Amazon OpenSearch Service, Amazon QuickSight, Amazon Athena, and Amazon Bedrock. Leave with practical code samples and architectures to analyze security data. Get inspired with ideas on how to transform your threat detection and incident response stack.

TDR343 | Code talk | From detection to code: GuardDuty attack sequences with Amazon Q
In this code talk, explore how Amazon GuardDuty attack sequence detection capabilities work alongside Amazon Q to enhance security operations. Through live coding demonstrations, learn hoGuardDuty machine learning models identify connected security events and create comprehensive event sequences. See how to build automated response procedures using Amazon Q AI-assisted development capabilities. Discover practical approaches for implementing context-aware security automation. Leave with implementation patterns for enhancing your security operations using generative AI tools.

TDR371 | Workshop | Hands-on Threat Detection & Response using AWS Security
Get hands-on experience with AWS security services in this interactive workshop. Learn to detect and respond to simulated threats using Amazon GuardDuty, Amazon Inspector, AWS Security Hub, and Amazon Detective. Practice both manual and automated response techniques with AWS Lambda as you investigate security events across different resource types. Walk away with practical skills to operationalize threat detection and response in your AWS environment. Bring your laptop to participate in this hands-on workshop.

TDR372 | Workshop | Secure container workloads with AWS security services
In this workshop, learn how to implement AWS security services to protect container workloads end-to-end from code to operations. Gain hands-on experience with static code analysis, detective controls, threat detection, vulnerability management, and incident response for Amazon Elastic Kubernetes Service (Amazon EKS) and Amazon Elastic Container Service (Amazon ECS). Through guided scenarios, discover how to use AWS security services to enhance your container security posture. Leave with practical strategies for implementing security controls in your container environments. You must bring your laptop to participate.

TDR471 | Workshop | AWS Security Incident Response Challenge: Defense in action
Put your AWS security incident response skills to the test in this interactive session. Assume the role of an AWS Security Engineer responding to a time-sensitive scenario. Using provided intelligence, you’ll have a limited time to implement security controls in your AWS environment. Learn to prioritize actions and leverage AWS security services effectively under realistic conditions. This hands-on exercise helps you practice rapid decision-making and security implementation in AWS environments. Leave with practical experience in incident response strategies. You must bring your laptop to participate.

TDR472 | Workshop | Active defense strategies using AWS AI/ML services
This workshop will help you learn how to develop and deploy active defense strategies, such as deception, using Amazon Bedrock and Amazon SageMaker. Gain hands-on experience developing AI-driven responses for security operations. You will learn how to develop adaptive responses that mimic what an actor may be trying use against you. You will Learn implementation patterns for prompt engineering, deployment strategies, and monitoring methodologies. You must bring your laptop to participate.

Browse the full re:Inforce catalog to learn more about sessions in other tracks, plus gamified learning, innovation sessions, partner sessions, and labs. Discover how to optimize your re:Inforce journey with our attendee guides—your essential resource for selecting perfect learning sessions and getting the greatest value from your experience.

Our comprehensive track content is designed to help arm you with the knowledge and skills needed to securely manage your workloads and applications on AWS. Don’t miss out on the opportunity to stay updated with the latest best practices in threat detection and incident response. Join us in Philadelphia for re:Inforce 2025 by registering today. We can’t wait to welcome you!

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Nisha Amthul

Nisha Amthul

Nisha is a Senior Product Marketing Manager at AWS Security, specializing in detection and response solutions. She has a strong foundation in product management and product marketing within the domains of information security and data protection. When not at work, you’ll find her cake decorating, strength training, and chasing after her two energetic kiddos.

How to automate incident response for Amazon EKS on Amazon EC2

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/security/how-to-automate-incident-response-for-amazon-eks-on-amazon-ec2/

Triaging and quickly responding to security events is important to minimize impact within an AWS environment. Acting in a standardized manner is equally important when it comes to capturing forensic evidence and quarantining resources. By implementing automated solutions, you can respond to security events quickly and in a repeatable manner. Before implementing automated security solutions, it’s important for your security team to have a defined process and understanding of which actions to take for specific AWS resources.

In a previous two-part post, we discussed using Amazon GuardDuty and Amazon Detective to detect security issues for an Amazon Elastic Kubernetes Service (Amazon EKS) cluster. In this post, we walk through the differences of Amazon Elastic Cloud Compute (Amazon EC2) and EKS clusters on EC2 when responding to security events. By understanding the differences between the two AWS resource types, you can enhance your existing EC2 incident response (IR) automation to include EKS. Then, we walk you through the deployment and use of a sample solution based on the Automated Forensics Orchestrator for Amazon EC2 solution to automate the end-to-end incident response process for EKS, which includes acquisition, isolation, investigation and reporting.

If you’re familiar with the differences between responding and investigating Amazon EC2 and Amazon EKS resources and want to skip to the solution, skip to the Solution prerequisites.

Note: Amazon EKS on AWS Fargate, which is an AWS managed serverless computing engine, isn’t covered in this post.

Amazon EC2 compared to Amazon EKS resources for incident response

Although Amazon EKS clusters are running on EC2 instances, it’s important to understand the differences between the two and how to handle incident response automation for each resource type. EC2 is a virtual machine where you can install customized applications and packages to complete a task. Amazon EKS is an AWS managed service that you can use to run Kubernetes on EC2 instances without needing to install, operate, and maintain your own Kubernetes control plane or nodes. You can use existing plugins and tooling from the Kubernetes community. EKS clusters can have managed node groups, which create and manage the underlying EC2 instances. Because of Kubernetes cluster architecture, multiple EC2 instances within a node group can be tied to a single EKS cluster. There can also be multiple pods—each running different processes—running on an EC2 instance. GuardDuty can monitor and detect security events for EKS resources and provide information to help identify which resources are impacted, such as EKS cluster name, Kubernetes workload details, tags, and AWS Identity and Access Management (IAM) principals.

For incident response automation purposes, security teams need to understand the relationship between Amazon EKS and Amazon EC2 to determine the appropriate response to a possible security event. For example, if GuardDuty identifies Execution:Kubernetes/AnomalousBehavior.ExecInPod, you might want to investigate the command invoked on the identified pod along with other pods within the EKS cluster. To expand the investigation, you would need to capture and investigate evidence on the entire EKS cluster, which can include multiple EC2 instances.

Accessing Amazon EKS clusters using kubectl

To collect relevant forensic evidence, such as volatile memory, there might be instances where you need to run commands on Amazon EKS clusters. Kubectl is a command line tool that you can use to manage and run commands on EKS clusters using the Kubernetes API. Access with kubectl is limited to the container environment and doesn’t provide full shell access to the host. Although AWS Systems Manager (AWS SSM) can be used to interact with an EKS cluster’s EC2 instances, kubectl allows administrators to manage pods, scale applications, and view cluster logs. We dive into specific actions where kubectl is used in the later sections of this post.

When automating the workflow of response actions to an Amazon EKS cluster, you can incorporate the kubectl commands within Amazon Lambda functions. To invoke commands using kubectl, you need to get credentials for the EKS cluster to:

  1. Authenticate to an IAM principal authorized to work with Amazon EKS
  2. Obtain the EKS cluster endpoint
  3. Verify the certificate authority data for the EKS endpoint
  4. Generate a bearer token from the IAM principal
  5. Create a kubeconfig configuration dictionary

For more detailed information, see A Container-Free Way to Configure Kubernetes Using AWS Lambda and a deep dive into simplified Amazon EKS access management.

Capturing volatile memory on EKS

Volatile memory (RAM) in a memory dump is important because it contains the EC2 instance’s in-progress operations. Volatile memory is extremely important in determining the root cause of a security event. Although the commands for capturing volatile memory between EC2 instances and Amazon EKS clusters are similar, there is one important difference to keep in mind. For Linux operating systems, you can use the insmod command with the appropriate LiME kernel module (.ko file) to capture volatile memory:

sudo insmod $lime.ko "path=/path/to/dump.mem format=lime"

For Amazon EKS cluster EC2 instances, there can be multiple pods on a single EC2 instance. Knowing which process ID (PID) is associated to a pod is important to map the actions that could have resulted in a security event or compromise.

Figure 1: EKS cluster node list

Figure 1: EKS cluster node list

To get a list of PIDs on the EC2 instance, as shown in Figure 1, the following crictl command needs to be invoked:

crictl inspect $(crictl ps | grep [pod-name] | awk '{print $1}') | grep -i pid

After the crictl command is invoked, you will see the output of existing PIDs for the EC2 instance to use in the nsenter command, as shown in the following figure.

Figure 2: EKS node process ID list

Figure 2: EKS node process ID list

To create a mapping between a pod and the PID from a memory dump, the following nsenter command needs to be invoked on the target EC2 instance:

nsenter -t $PID -u hostname

After the nsenter command is invoked, you will see the output of pod and PID information for the EC2 instance, as shown in the following figure.

Figure 3: EKS node process ID to pod mapping commands

Figure 3: EKS node process ID to pod mapping commands

After you have the pod-to-PID mapping, you can export that information for later investigation. If you skip this step, the memory dump output will still have the PID information, but you won’t be able to map it back to previously running pods. It’s important to work with your security teams during forensic investigations to determine if this information is used during an investigation and update the automated workflow accordingly.

Network segmentation on EKS

After relevant forensic artifacts, such as volatile memory, disk volumes, and application logs, are collected from an Amazon EKS cluster, you might want to isolate compromised resources from the rest of your application resources. During resource isolation, EC2 instances can be isolated using security groups and network access control lists (NACLs). For EKS clusters, you can cordon the worker node, which makes the node tainted and unschedulable. When a node is cordoned, the Kubernetes scheduler is also blocked from placing new pods on the node. Another mechanism for isolating the EKS cluster is applying a Network Policy to deny ingress or egress traffic to the pod. Network policies, like NACLs, are stateless and control network traffic at the IP address or port level in an EKS cluster.

Depending on the scope of isolation, you can take the following approaches to isolating a pod on an EKS cluster in your automation.

  • Apply a network policy – You can add a network policy rule to limit ingress or egress from your pod. This will not impact other pods in the cluster unless there are additional rules applied. You would use this option if you’re sure that the compromise hasn’t gained access to the underlying EC2 instance.
  • Cordon the node – Removing the node won’t impact other nodes on the cluster but will block the scheduling of pods on the node. It doesn’t affect other nodes within the cluster.
  • Apply a security group – Applying a security group can impact the entire EC2 instance and limit traffic between Amazon EKS cluster nodes, the Kubernetes control plane, the cluster’s worker nodes, and external destinations. This is an option if you believe the underlying EC2 instance has been compromised.
  • Add a NACL rule – Like the security group option, this will impact the entire EC2 instance. Depending on the rule, it can also affect non-EKS workloads within the subnet.

Identity and access management for EKS

In addition to the IAM role associated to an EC2 instance profile, Amazon EKS uses service-linked IAM roles and Kubernetes role-based authorization control (RBAC) configuration. The IAM principal that creates the EKS cluster has system:masters permissions within the RBAC configuration on the EKS cluster. RBAC provides Kubernetes identities access for cluster-specific components and workflows. In addition to default identities created on EKS clusters, application-specific roles can be used within an EKS cluster. For example, IAM roles for service accounts (IRSA) can be used to associate an IAM role with a Kubernetes service account and assigned to containers within an EKS Pod. IRSA can help implement least privilege by restricting the Pod’s container to retrieve credentials for the IAM role associated with the Kubernetes service account. For a deeper dive into EKS IAM and how IAM roles are used within EKS, see Identity and access management for Amazon EKS.

Deciding how to revoke Amazon EKS permissions using automation can be challenging because revoking the AWS Security Token Service (AWS STS) credentials or changing the instance profile on the EC2 instance will impact all pods on the EC2 instance. Updating or changing the RBAC configurations on an EKS cluster requires application-specific knowledge to determine which identities are authorized to have specific permissions. It’s important to discuss with your application and security teams how permissions should be handled in the event of a compromised EKS cluster.

Moving to automated EKS incident response

Now that you understand the nuances of Amazon EKS on Amazon EC2 as it relates to incident response, you can decide how to incorporate functionality to respond to EKS in an existing solution your team might be using. It’s also important to understand where a human-in-the-loop needs to be incorporated to follow internal processes and procedures. Before incorporating automation into IR capabilities, you should walk through each step and verify the action the automation takes to make sure that the security and application teams are aligned. In this post, we incorporated Amazon EKS IR capabilities across acquisition, isolation, and investigation into the Automated Forensics Orchestrator for Amazon EC2 solution.

Solution prerequisites

For this walkthrough, you need to have the following elements in place:

Solution overview

The solution follows a similar pattern and workflow as the Automated Forensics Orchestrator for Amazon EC2 but has been customized for Amazon EKS.

Figure 4: Automated Forensics Orchestrator for Amazon EKS architecture

Figure 4: Automated Forensics Orchestrator for Amazon EKS architecture

The workflow, as shown in Figure 4, is:

  1. In the AWS application account, GuardDuty monitors for malicious activities that are specific to Amazon EKS resources. For example, a pod within an EKS cluster is invoking API commands using an unauthenticated system:anonymous user. GuardDuty findings are sent to Security Hub in the security account using native integration.
  2. Security Hub custom actions send finding information to Amazon EventBridge to invoke automated downstream workflows.
  3. For a specified event, EventBridge provides the EKS resource information for the forensics process to target and initiates an AWS Step Functions workflow.
  4. Step Functions triages the request as follows:
    1. Gets the EKS information, including which EC2 instances the pod is hosted on.
    2. Determines if isolation is required based on the Security Hub custom action.
    3. Determines if acquisition is required based on tags associated with the EC2 instance. The current tag that is evaluated is the following:
      • Tag name: IsTriageRequired
      • Tag key: true or false
    4. Initiates the acquisition flow based on triaging output
  5. Triaging details are stored in Amazon DynamoDB.
  6. The following two acquisition flows are initiated in parallel:
    1. Memory forensics flow – The Step Functions workflow captures the memory data and stores it in Amazon Simple Storage Service (Amazon S3). Post memory acquisition completion, the node is isolated by cordoning the node, creating a network policy, and applying a restricted security group to the cluster. To help maintain the chain of custody, a new security group is attached to the targeted instance and removes access for users, admins, or developers.
    2. Note: The isolation action is initiated based on the selected Security Hub custom action.

    3. Disk forensics flow – The Step Functions workflow takes a snapshot of the Amazon Elastic Block Storage (Amazon EBS) volume and shares it with the forensic account.
  7. Acquisition details are stored in DynamoDB.
  8. After the disk or memory acquisition process is complete, and the evidence has been captured successfully, a notification is sent to an investigation Step Functions state machine to begin the automated investigation of the captured data.
  9. The investigation Step Functions starts a forensic instance from a forensic AMI loaded with customer forensic tools:
    1. Loads the memory data from Amazon S3 for memory investigation.
    2. Creates an Amazon EBS volume from the snapshot and attaches it for disk analysis.
  10. Systems Manager documents (SSM documents) are used to run a forensic investigation.
  11. DynamoDB stores the state of the forensic tasks and their result when the jobs are complete. Investigation job details are stored in DynamoDB.
  12. Investigation details are shared with customers using Amazon Simple Notification Service (Amazon SNS).
  13. Forensic AMI is used by investigation Step Functions to perform memory and disk investigation.

Solution deployment

You can deploy the Amazon EKS IR automation solution using the AWS CDK or synthesizing a CDK into AWS CloudFormation templates and deploying them using AWS Management Console. Although the solution can be deployed in a single AWS account, the AWS Security Reference Architecture (AWS SRA) recommends that you use separate AWS accounts for forensic evidence and security tooling. The solution deployment follows AWS SRA recommendations.

The latest code for the Amazon EKS IR automation solution can be found at sample-eks-incident-response-automation, where you can also contribute to the sample code. For instructions and more information about using the AWS CDK, see Getting Started with AWS CDK.

Deploy the automation that collects, stores, and investigates forensic artifacts in the forensic AWS account:

  1. To build the app when navigating to the project’s root folder, use the following commands.
    • npm ci
    • npm run-build-lambda
  2. Run the following commands in your terminal while authenticated in your forensic solution AWS account. Be sure to replace <INSERT_AWS_ACCOUNT> with your account number and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
  3. cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
    
    cdk deploy --all -c account=<INSERT Forensic AWS Account> -c region=<INSERT_FORENSIC_SOLUTION_REGION> --require-approval=never -c secHubAccount=<INSERT SECURITY_HUB_AGGREGATOR_AWS_ACCOUNT> -c STACK_BUILD_TARGET_ACCT=forensicAccount
    

    Example:

    cdk deploy —all -c account=1234567890 -c region=us-east-1 —require-approval=never -c secHubAccount=0987654321 -c STACK_BUILD_TARGET_ACCT=forensicAccount

Deploy the Security Hub custom action and EventBridge in the Security Hub Region of the delegated administrator account where security findings are consolidated:

  1. To build the app when navigating to the project’s root folder, use the following commands.
    • npm ci
    • npm run build-lambda
  2. Run the following commands in your terminal while authenticated in your Security Hub aggregator AWS account. Be sure to replace <INSERT_AWS_ACCOUNT> with your account number and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
  3. cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
    	
    	cdk deploy --all -c account=<INSERT_SECURITY_HUB_AGGREGATOR_AWS_ACCOUNT> -c region=<INSERT_FORENSIC_SOLUTION_REGION> --require-approval=never -c forensicAccount=<INSERT_FORENSIC_SOLUTION_AWS_ACCOUNT> -c STACK_BUILD_TARGET_ACCT=securityHubAccount -c sechubregion=<INSERT_SECURITY_HUB_AGGREGRATOR_REGION>
    

    Example:

    cdk deploy --all -c account=0987654321 -c region=us-east-1 --require-approval=never -c forensicAccount=1234567890 -c STACK_BUILD_TARGET_ACCT=securityHubAccount -c sechubregion=us-east-1

Deploy the cross-account IAM role the security automation will use in the application AWS account where the EKS workload exists:

  1. Sign in to the AWS CloudFormation console of the application AWS account.
  2. Launch the CloudFormation cross-account-role.yml stack.
  3. Pass the following CloudFormation input parameters:
    1. solutionInstalledAccount=<Forensic Solution AWS Account Number>
    2. solutionAccountRegion=<Region of solution deployment>
    3. kmsKey=<ARN of the application account EBS volume encryption KMS key>

Use the solution to respond to an EKS GuardDuty alert

You can now use the automated solution on an Amazon EKS cluster with a GuardDuty finding that’s integrated with Security Hub. If you need to create GuardDuty findings, see How to generate security findings to help your security team with incident response simulations.

After you have an EKS security finding, you can go through either one of the IR workflows:

  • Forensic triage – This workflow evaluates in-scope EKS resources, collects volatile and non-volatile memory, conducts an investigation, and exports investigation artifacts to a forensic S3 bucket.
  • Forensic isolation – In addition to components of the previous workflow, the in-scope EKS resources are quarantined at the network and IAM layers.

In this example, you’ll use the forensic isolation workflow because that covers the end-to-end capabilities of the solution.

Run the forensic isolation workflow:

  1. Open the AWS Security Hub console in the Security Hub aggregator account.
  2. Choose Findings in the navigation pane and then select a security finding for Amazon EKS.
  3. Select the custom action for Forensic Isolation. This will start the workflow in the Security Hub aggregator account and invoke the Step Functions in the forensic account.
  4. Open the AWS Step Functions console in the forensic account.
  5. In the navigation pane, choose State Machines and then select the Forensic-Triage-Function to view the workflow graph status. In the following figure, the Step Functions workflow has successfully completed.
    Figure 5: EKS triage Step Functions graph view

    Figure 5: EKS triage Step Functions graph view

    1. In the Get Resource Info Case step, the pod name from the GuardDuty finding is extracted to identify the EKS cluster it’s part of and the related EC2 resources.
    2. Note: Per the solution, a guardrail is added to block action on an EC2 instance that is part of an EKS cluster with the IsTriageRequired tag with a value set to false. If automation is invoked against a protected EC2 instance resource, acquisitionFlow is skipped and a notification will be sent to the SNS topic.

  6. Because the EKS cluster isn’t excluded through the IsTriageRequired tag, a parallel invocation of Step Functions is invoked to capture forensic evidence.
  7. Select the Disk-Forensics-Acquisition-Function. The workflow here is similar to a normal EC2 incident response flow to capture snapshots and EBS volumes with the caveat that the EKS cluster can have multiple EC2 instances. In the following figure, the Step Functions workflow has successfully completed.
    Figure 6: Disk forensics acquisition Step Functions graph view

    Figure 6: Disk forensics acquisition Step Functions graph view

  8. Select the Memory-Forensics-Acquisition-Function; In the following figure, the Step Function workflow has successfully completed.
    Figure 7: Memory forensics acquisition Step Functions graph view

    Figure 7: Memory forensics acquisition Step Functions graph view

    1. As previously mentioned, you will need to determine if you want to map pods to process ID (PID) as part of this workflow. The automation captures the volatile memory where you will be able see the PIDs on the EC2 instance but does not map the PID to node for deeper investigation.
    2. Note: One reason you might not want to automatically map pods to PIDs is to minimize interaction with the possibly compromised cluster and quickly move towards isolation.

    3. After the Is Memory Acquisition Complete step is complete and if the Security Hub custom action for Forensic Isolation was selected, the isolation workflow of the EKS cluster begins. The isolation workflow will go through EKS-specific steps to:
      1. Label the affected pods on the EKS cluster.
      2. Apply a network policy to the affected pods.
      3. Revoke IAM role sessions.
      4. Cordon the node.
  9. Note: Depending on your desired workflow, you can edit these steps or add additional isolation steps to change instance profiles, security groups, or NACL rules.

  10. To expedite the investigation process, the Forensic-Investigation-Function is invoked when the Memory-Forensics-Acquisition-Function is completed and separately by the Disk-Forensics-Acquisition-Function. This is because of the disk and memory forensic evidence collection completing at different times. A forensic EC2 instance will be launched and begin conducting the investigation on the forensic artifacts. The completed investigation artifacts will be sent to Amazon S3 as they’re completed.
    1. You can use the console to view EKS artifacts within the dedicated S3 bucket in the forensic AWS account.
    2. Figure 8: Completed memory investigation artifacts for EKS

      Figure 8: Completed memory investigation artifacts for EKS

    3. The forensic investigation results from the automated workflow are also saved to the dedicated S3 bucket in the forensic AWS account.
Figure 9: Completed disk investigation artifacts for EKS

Figure 9: Completed disk investigation artifacts for EKS

As part of the automation, the forensic investigation EC2 instance in the forensic account is terminated after investigation is completed. The automation can be updated to retain the EC2 instance to so that your security teams can continue their investigation and review investigation artifacts to expedite root cause analysis.

As previously mentioned, the workflow you just went through encompasses both investigation and isolation of Amazon EKS resources. If your security teams want to conduct a more thorough investigation prior to isolating EKS resources, select the Forensic Triage custom action in Security Hub. Additionally, if you want to update the solution to be invoked from your security incident and event management (SIEM) tool, you can directly invoke the Forensic-Triage-Function Step Functions from your SIEM.

Clean up

For the cross-account IAM role in the application account, you can:

  1. Go to the AWS CloudFormation console for the application account and Region where you deployed the cross-account IAM role, select the cross-account-role stack.
  2. Choose the option to Delete the stack.

To clean up the CDK stacks, run the following command in the source folder in the Security Hub aggregator account and forensic account.

cdk destroy --all

Conclusion

In this post, we showed you the differences between Amazon EKS and Amazon EC2 resources and how to handle EKS automation for incident response. Even though EKS clusters are on EC2 instances, it’s important to understand the differences before implementing an automated solution that will affect EKS resources. We also walked through the deployment of an EKS-customized Automated Forensics Orchestrator for Amazon EC2 solution and showed you the end-to-end IR lifecycle to respond to a possible EKS compromise. The same approach to customize existing EC2 IR automated solutions can be used to expand support for EKS resources within your AWS environment to increase your security posture.

If you have feedback about this post, submit comments in the comments section that follows. If you have questions about this post, start a thread on re:Post.

Jonathan Nguyen
Jonathan Nguyen

Jonathan is a Principal Security Solution Architect at AWS. He helps large financial services customers develop a comprehensive security strategy and solutions to meet their security and compliance requirements in AWS.
Gopinath Jagadesan
Gopinath Jagadesan

Gopi is a Senior Solution Architect at AWS. In his role, he works with Amazon as his customer helping design, build, and deploy well architected solutions on AWS. He holds a master’s degree in electrical and computer engineering from the University of Illinois at Chicago. Outside of work, he enjoys playing soccer and spending time with his family and friends.

Protect against advanced DNS threats with Amazon Route 53 Resolver DNS Firewall

Post Syndicated from Lawton Pittenger original https://aws.amazon.com/blogs/security/protect-against-advanced-dns-threats-with-amazon-route-53-resolver-dns-firewall/

Every day, millions of applications seamlessly connect users to the digital services they need through DNS queries. These queries act as an interface to the internet’s address book, translating familiar domain names like amazon.com into the IP addresses that computers use to appropriately route traffic. The DNS landscape presents unique security challenges and opportunities in Amazon Virtual Private Cloud (Amazon VPC) environments. First, DNS resolution acts as an early checkpoint that you can use to control network traffic before it even begins. Second, DNS queries in your VPC follow a distinct path through the Amazon Route 53 Resolver that operates independently from your standard internet gateway, bypassing other network security controls.

To address this, Amazon Route 53 Resolver DNS Firewall provides protection for DNS traffic, starting with traditional domain lists where you can explicitly allow or deny DNS resolution of specific domains. Also, included are AWS Managed Domain Lists, which automatically block known malicious domains identified through Amazon Threat Intelligence and our trusted security partners. While this approach works effectively to help prevent known threats, sophisticated bad actors are increasingly using techniques that traditional blocklists can’t catch.

Instead of relying solely on static lists, Amazon Route 53 Resolver DNS Firewall Advanced provides intelligent protection alongside these traditional controls. These advanced rules work like a skilled security analyst, watching for suspicious patterns in DNS queries in real time. By examining characteristics such as query length, entropy, and frequency, the service can spot potentially malicious activity even when encountering previously unknown domains. This approach enables detecting and blocking advanced threats like DNS tunneling and domain generation algorithms (DGAs)—techniques that bad actors use to establish hidden communication channels or connect malware to their control servers.

In this post, we take you on a practical journey exploring these DNS-based threats and tools to help prevent them. You’ll learn how to set up effective Route 53 Resolver DNS Firewall Advanced rules, and we provide a ready-to-deploy CloudFormation template with our recommended configurations. Finally, we demonstrate an example of real-world threat detection and show you how the service integrates with AWS Security Hub to improve visibility of alerts. By the time you finish reading this post, you’ll have a clear understanding of how to deploy Route 53 Resolver DNS Firewall rules to add an intelligent, proactive layer of security to your AWS environment.

Understanding the risks of DNS tunneling and DGAs

As mentioned earlier, the Route 53 Resolver provides a service-managed path to the internet that operates independently from your VPC’s internet gateway. While this architecture enables efficient DNS resolution, it can be exploited through techniques such as DNS tunneling. Let’s explore how these techniques work and why they present unique challenges.

DNS tunneling takes advantage of the DNS protocol’s basic function—asking questions about domain names and receiving answers from the authoritative nameserver for the domain. But instead of using DNS for its intended purpose of domain name resolution, tunneling encodes other types of data within DNS queries and responses. For example, rather than asking simply what is the IP address for example.com?, a tunneling exploit might embed data within a query like secretdata123.attacker.com, where secretdata123 contains encoded information. This can lead to DNS being used as a two-way communications command and control channel. Detecting and blocking DNS tunneling is a vital control for stopping data exfiltration and command and control (C2) communications.

DGAs represent a different challenge for DNS security. Rather than using a fixed, predictable domain name that can be quickly blocked, DGAs automatically create many possible domain names using mathematical formulas, which are then used as a destination for C2 traffic. For instance, a DGA might generate domains like xkt7py.com today and mn9qrs.com tomorrow. This makes it difficult to maintain effective blocklists, because the domains change frequently and appear random. Traditional threat intelligence feeds, which rely on identifying and blocking known malicious domains, struggle to keep pace with DGA-generated domains.

How DNS Firewall Advanced works

When examining a domain name, Route 53 Resolver DNS Firewall Advanced looks at multiple characteristics that help distinguish between legitimate and suspicious domains. For example, legitimate domain names typically use real words and follow predictable patterns that are designed to facilitate a human’s ability to recall and enter them accurately. In contrast, domains used for tunneling or generated by DGAs often contain random-looking strings of characters or unusual patterns.

Route 53 Resolver DNS Firewall Advanced builds its intelligence on extensive analysis of real-world domain usage patterns. It learns what legitimate domain names look like by studying the most resolved domains on the internet, combined with actual domain resolution patterns from across AWS. This real-world training data helps establish a baseline for normal domain name characteristics. DNS Firewall Advanced then contrasts these patterns against known techniques used in DNS tunneling and domain generation to identify suspicious activity.

The service analyzes various aspects of each domain name, including:

  • How the domain name is structured and broken into parts
  • The patterns of letters and numbers used
  • How closely the domain resembles natural language
  • The presence of common words versus random character combinations

The service analyzes queries in real time, processing each one in less than a millisecond, which maintains strong security controls without affecting your applications’ performance.

Route 53 Resolver DNS Firewall Advanced has customized protection levels that you can use to choose how aggressively you want to detect and respond to suspicious domains through confidence thresholds:

  • High confidence: This setting focuses on the most obvious threats, minimizing false positives. It’s ideal for production environments where blocking legitimate traffic could be disruptive.
  • Medium confidence: Provides balanced protection, suitable for most environments.
  • Low confidence: Offers the most detection but might require more tuning to avoid false positives. This setting is useful for high-security environments or for initial monitoring to understand traffic patterns.

You can combine these confidence levels with different actions (block or alert) to create a defense strategy that matches your security needs.

Manually create a DNS Firewall Advanced rule:

To start, we show you how to manually create a Route 53 Resolver DNS Firewall Advanced rule in the AWS Management Console. This rule will block DNS queries that it has detected to be DNS tunneling with high confidence.

To manually create a rule:

  1. In the Route 53 console, choose Rules in the navigation pane, and then choose Add rule.
    Figure 1: Rules in the Route 53 console

    Figure 1: Rules in the Route 53 console

  2. Enter a name for the rule and select DNS Firewall Advanced protections.
    Figure 2: Add a rule

    Figure 2: Add a rule

  3. Under DNS Firewall Advanced protection:
    1. Select DNS tunneling detection.
    2. For Confidence threshold, select High.
    3. Leave the Query type empty so that the rule applies to all query types.
    Figure 3: Select DNS protection options

    Figure 3: Select DNS protection options

  4. Under Action:
    1. Select Block.
    2. For the response, select OVERRIDE.
    3. For the Record value, enter dns-firewall-advanced-block.
    4. For the Record type, select CNAME.
    5. Choose Add rule.
    Figure 4: Configure actions for the rule

    Figure 4: Configure actions for the rule

We’ve created an AWS CloudFormation stack that deploys the following recommended Route 53 Resolver DNS Firewall rules in a DNS Firewall rule group. We recommend this configuration because it provides a balanced security approach—blocking high-confidence threats immediately while generating alerts for lower-confidence detections.

The inclusion of the AWS Managed Aggregate Threat List is particularly valuable because it combines domains from multiple threat categories (malware, ransomware, botnet, spyware, and DNS tunneling) into a blocklist. This consolidated list includes the domains from other AWS Managed Domain Lists, including those identified by GuardDuty threat intelligence systems, giving you broad protection against known malicious domains while the Route 53 DNS Firewall Advanced rules catch previously unseen threats.

For enterprise environments, you can scale this protection across your entire organization by using AWS Firewall Manager to automatically deploy and manage this rule group configuration consistently across the VPCs in your organization.

  • BLOCK – Aggregate Threat List (domains associated with multiple DNS threat categories including malware, ransomware, botnet, spyware, and DNS tunneling to help block multiple types of threats)
  • BLOCK – DNS Tunneling | Confidence: HIGH
  • BLOCK – DGAs | Confidence: HIGH
  • ALERT – DNS Tunneling | Confidence: LOW
  • ALERT – DGAs | Confidence: LOW

To deploy this rule group using a CloudFormation stack:

  1. Navigate to the CloudFormation console, choose Stacks from the navigation pane. Choose Create Stack in the upper right and select With new resources (standard).
    Figure 5: Create a stack

    Figure 5: Create a stack

  2. Download the CloudFormation template. Select Choose an existing template and then select Upload a template file and upload the CloudFormation stack. Choose Next.
    Figure 6: Use the CloudFormation template

    Figure 6: Use the CloudFormation template

  3. Enter a stack name and choose Next.
    Figure 7: Enter a stack name

    Figure 7: Enter a stack name

  4. Leave the default values for all options, select Next, and then choose Submit.
  5. Navigate to the Route 53 Resolver DNS Firewall by visiting the Amazon VPC console, scroll down to the DNS firewall section, and select the Rule groups tab.
  6. Select the newly created rule group.
  7. Select the Associated VPCs tab, choose Associate VPC, and then associate a VPC you want to protect and choose Associate.
    Figure 8: Associate a VPC

    Figure 8: Associate a VPC

Observability

Route 53 Resolver query logging provides detailed visibility into DNS queries made from resources associated with your VPCs, enabling you to monitor and analyze your DNS traffic for security and compliance purposes. By configuring query logging, you can capture essential information about each DNS request, including the domain name being queried, the record type, the response code, and the originating VPC and instance. Query logging is particularly valuable when used in conjunction with Route 53 Resolver DNS Firewall, because it helps you track blocked queries and fine-tune your security rules based on actual DNS traffic patterns in your environment. The following are examples of log entries generated when DNS Firewall detects and responds to suspicious activities, showing the detailed information available for security analysis and incident response.

Example log entry: DNS tunneling block

The following is an example of a DNS tunneling block.

{
    "version": "1.100000",
    "account_id": "11111111111",
    "region": "us-west-2",
    "vpc_id": "vpc-0fcc85bd45b791d5a",
    "query_timestamp": "2025-02-05T03:54:12Z",
    "query_name": "1WTE4CyL4Vf1LQDDAToimuqFBEtMXyYMsYP8zPgVyTagzSh5PvinuQcL6N8at4A.REZv3VqKU4x43DPcCKAzQk4UKoZjB3nDMukHAuKTtDckTqZ8SDDZ1iXRey6a5sD.mEDMdrzPocS9exqoBQ1xfSuKfvW.1.dnstunnel.com.",
    "query_type": "A",
    "query_class": "IN",
    "rcode": "NXDOMAIN",
    "answers": [
        {
            "Rdata": "dns-firewall-advanced-block.",
            "Type": "CNAME",
            "Class": "IN"
        }
    ],
    "srcaddr": "10.1.0.122",
    "srcport": "41859",
    "transport": "UDP",
    "srcids": {
        "instance": "i-0c738190f19db9a2c"
    },
    "firewall_rule_action": "BLOCK",
    "firewall_rule_group_id": "rslvr-frg-63efa138b43f428b",
    "firewall_protection": "DNS_TUNNELING"
}

Example log entry: DNS tunneling alert

The following is an example of a DNS tunneling alert.

{
    "version": "1.100000",
    "account_id": "11111111111",
    "region": "us-west-2",
    "vpc_id": "vpc-0fcc85bd45b791d5a",
    "query_timestamp": "2025-02-05T04:00:02Z",
    "query_name": "1WTEc8GwFH3qHY8XKjbhXuj43yGShMrhacqwJYSZkSqRQ95sagz64NUpnuj4R8R.S79aru2KRB8d9nCHEPdXWJxGT4aUjVMqtCRSq9EZXRCo8NH5cmLvmcho3hh1mbK.NqGY1X6M4qpMGX6dnTSHuCsZFbf.1.dnstunnel.com.",
    "query_type": "A",
    "query_class": "IN",
    "rcode": "NOERROR",
    "answers": [
        {
        "Rdata": "202.92.34.217",
        "Type": "A",
        "Class": "IN"
        }
    ],
    "srcaddr": "10.1.0.122",
    "srcport": "35116",
    "transport": "UDP",
    "srcids": {
        "instance": "i-0c738190f19db9a2c",
        "resolver_endpoint": "rslvr-out-e20639d3666748f58"
    },
    "firewall_rule_action": "ALERT",
    "firewall_rule_group_id": "rslvr-frg-63efa138b43f428b",
    "firewall_protection": "DNS_TUNNELING"
}

Integration with Security Hub

Security Hub provides you with a view of your security state in AWS and helps you to check your environment against security industry standards and best practices. Security Hub collects security data from across AWS accounts, AWS services, and supported third-party partner products, and helps you to analyze security trends and identify the highest priority security issues. It enables findings from both the Amazon: Route 53 Resolver DNS Firewall – AWS List and Amazon: Route 53 Resolver DNS Firewall Advanced list by default, so you’ll automatically receive these alerts without additional configuration. You only need to manually enable Amazon: Route 53 Resolver DNS Firewall – Custom List findings if you’re using custom domain lists in your rule groups. See Sending findings from Route 53 Resolver DNS Firewall to Security Hub for more information.

The following figure is an example of how Route 53 Resolver DNS Firewall Advanced findings appear in the Security Hub console, providing you with actionable security intelligence directly in your centralized dashboard.

Figure 9: DNS Firewall Advanced findings in Security Hub

Figure 9: DNS Firewall Advanced findings in Security Hub

Select a finding to view details such as Finding ID, Types, Workflow status, and so on.

Figure 10: Findings details

Figure 10: Findings details

Conclusion

Amazon Route 53 Resolver DNS Firewall Advanced represents a significant step forward in protecting organizations against sophisticated DNS-based threats. As mentioned, DNS queries sent to the Route 53 Resolver follow a unique path that bypasses traditional AWS security controls like security groups, NACLs, and even AWS Network Firewall—creating a security gap in many environments. Throughout this post, we’ve explored how DNS tunneling and DGA-based exploits take advantage of this blind spot, and how you can use Route 53 Resolver DNS Firewall Advanced to protect from these threats through real-time pattern analysis and anomaly detection. You learned how to configure the service in the AWS console and use the provided CloudFormation template with recommended rules that balance blocking high-confidence threats while alerting on potential issues. And you saw how query logging provides valuable visibility into your DNS traffic and how Security Hub integration centralizes your security findings. Implementing these capabilities helps you protect your infrastructure from sophisticated DNS-based exploits that traditional domain blocklists cannot catch, strengthening your cloud security posture while maintaining operational efficiency.

If you have feedback about this post, submit comments in the Comments section below.

Lawton Pittenger

Lawton Pittenger

Lawton is a Security Solutions Architect at AWS, based in New York City, focused on helping customers implement native AWS security services. Professionally, Lawton has worked in IT security roles, securing cloud environments. Outside of cloud security, his interests include skateboarding, snowboarding, and rock climbing.

Michael Leighty

Michael Leighty

Michael is a Senior Security Solutions Architect at AWS, based in Atlanta. He specializes in helping customers design and implement effective network security controls, drawing from extensive experience at leading network security vendors. At AWS, he works closely with service teams to drive continuous improvement in security services based on customer needs and feedback.

AI lifecycle risk management: ISO/IEC 42001:2023 for AI governance

Post Syndicated from Abdul Javid original https://aws.amazon.com/blogs/security/ai-lifecycle-risk-management-iso-iec-420012023-for-ai-governance/

As AI becomes central to business operations, so does the need for responsible AI governance. But how can you make sure that your AI systems are ethical, resilient, and aligned with compliance standards?

ISO/IEC 42001, the international management system standard for AI, offers a framework to help organizations implement AI governance across the lifecycle. In this post, we walk through how ISO/IEC 42001 enables effective AI governance, review the risk management requirements, and explore how you can use threat modeling as a practical technique to meet those expectations.

AI governance

AI governance refers to the organizational structures, policies, and controls that enable AI systems to be used responsibly, ethically, and safely. Governance spans the entire AI lifecycle and includes the following activities:

  • Setting the intended purpose and stakeholder alignment
  • Managing data, models, and deployment risks
  • Designing in explainability, bias mitigation, and traceability
  • Establishing accountability, monitoring, and decommissioning practices

These activities are the foundation of a formal framework that you can use to establish governance processes, identify and manage risk, and implement processes for continuous improvement

AI lifecycle

While ISO 42001 provides a framework for AI governance, ISO/IEC 22989:2022 describes what an AI system is and how it evolves. Governance should be implemented at every stage of the AI lifecycle to manage AI risks effectively. According to the ISO/IEC 22989:2022 standard, an organization’s AI life cycle might include these stages:

  1. Inception: Identifying needs, goals, and feasibility
  2. Design and development: Defining system architecture, data flows, and training models
  3. Verification and validation: Testing and confirming that the system meets requirements and performs as intended
  4. Deployment: Releasing the system into its operational environment
  5. Operation and monitoring: Running the system, logging activity, and monitoring performance and outcomes
  6. Re-evaluation: Assessing whether the system continues to meet objectives under changing conditions
  7. Retirement: Decommissioning the system and addressing long-term data and access risks

Understanding the AI lifecycle, shown in Figure 1 that follows, is critical for identifying and mitigating AI risks. While these seven stages are provided directly in ISO 22989:2022, your organization might define its AI lifecycle stages differently to suit its business context. We refer to these stages as we explore the components of an AI management system, from initial AI system scoping, through threat monitoring and risk assessment, to monitoring the established governance program.

Figure 1: Example of AI system lifecycle model stages and high-level processes based on ISO/IEC 22989:2022

Figure 1: Example of AI system lifecycle model stages and high-level processes based on ISO/IEC 22989:2022

Risk management in ISO/IEC 42001:2023

After an organization has identified and assessed AI risks (Clause 6.1 of ISO/IEC 42001:2023), operational controls to mitigate those risks must be implemented (Clause 8.2), and those controls and the AI system itself should be continuously monitored, documented, and improved (Clauses 9 and 10). AI impact assessments (AIIAs) are critical in high-risk use cases, complementing baseline risk assessments by focusing on societal, ethical, and legal impacts. AIIAs are like data protection impact assessments (DPIAs) for high-risk personal data processing under many privacy regulations. DPIAs are specifically designed to assess risks to individuals’ privacy and data protection rights under laws such as the GDPR. While AIIAs help organizations maintain responsible AI governance, DPIAs can be used in parallel to help verify that AI systems comply with data protection laws, together providing a holistic view of risks and safeguards across both ethical and legal dimensions.

You are free to select the AIIA tools or methodologies that best fit your use case. Two widely accepted frameworks are:

  • ISO 31000: A general-purpose enterprise risk management standard that helps identify, evaluate, and treat risks in a structured and repeatable way. It aligns well with organizations seeking to embed AI risk into their broader enterprise risk management (ERM) programs.
  • NIST AI Risk Management Framework (AI RMF): A NIST framework specifically designed for AI systems. It introduces tailored concepts such as explainability, robustness, fairness, and accountability, with actionable guidance organized into four core functions: Map, measure, manage, and govern.

ISO 42001 provides structured methods to conduct risk and impact assessments. Threat modeling tools such as:

  • STRIDE (spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege). STRIDE aims to make sure that a system meets security requirements for confidentiality, integrity and availability.
  • DREAD (damage potential, reproducibility, exploitability, affected users, and discoverability) is a framework that can assess severity of individual threats.
  • OWASP (Open Worldwide Application Security Project) for machine learning (ML) enables analysis of AI system vulnerabilities, adversarial risks, and privacy threats.

Trustworthy AI is the result of strategic governance, structured methodologies, and technical analysis.

Figure 2 that follows shows the tiered structure of AI risk governance, moving from high-level governance to detailed technical assessments. On the left side, there’s a downward flow representing the increasing depth of controls, while the right side shows an upward scale indicating escalating AI risks.

  • At the top layer, ISO/IEC 42001:2023 defines formal requirements for AI governance, including risk assessment mandates, control implementation, and lifecycle oversight.
  • The middle layer features widely adopted risk assessment methodologies and frameworks, such as ISO 31000 and the NIST AI Risk Management Framework (RMF), which provide structured methods to identify, evaluate, and mitigate AI risks.
  • At the base, are detailed threat modeling tools—including STRIDE, DREAD, PASTA, LINDDUN, and OWASP for ML—that support deep analysis of AI systems for vulnerabilities related to security, privacy, data protection, and adversarial threats.

Together, these layers form a comprehensive approach to AI risk governance, aligning strategic oversight with operational and technical defenses.

Figure 2: A layered approach to AI risk management aligned with ISO/IEC 42001. ISO/IEC 42001 defines AI governance for responsible AI

Figure 2: A layered approach to AI risk management aligned with ISO/IEC 42001. ISO/IEC 42001 defines AI governance for responsible AI

Threat modeling for AI risk identification

Threat modeling identifies AI lifecycle technical risks such as exploit surfaces, adversarial threats, and misuse scenarios that complement organizational risk analysis and impact assessments. This post takes a broader AI lifecycle view, showing you how threat modeling complements other risk strategies within the context of ISO/IEC 42001:2023. Additionally, AWS has published AI threat modeling guidance, such as:

The following table is an example STRIDE threat model for a generative AI resource using AWS services by AI lifecycle stage and risk type. This illustrates technical threat remediation through AWS cloud native governance features.

STRIDE category Example threat Lifecycle stage Risk type AWS feature for governance
Spoofing A fake identity uses the AI system to generate phishing emails or misinformation Inception Security AWS IAM Identity Center and Amazon Cognito for multi-factor authentication (MFA), Amazon GuardDuty for threat detection
Tampering A malicious prompt injection or API injection alters the model behavior or bypasses filters Design development Integrity Amazon Bedrock Guardrails, Amazon API Gateway and AWS WAF rules, AWS CloudTrail for input auditing
Repudiation Users deny prompt activity or content creation, and there’s no logging Verification and validation Accountability CloudTrail, Amazon Bedrock invocation logs, Amazon SageMaker ML Lineage Tracking for traceability
Information disclosure Sensitive internal data—such as code or personally identifiable information (PII)—accidentally learned and reproduced by the large language model (LLM) Operation and monitoring Privacy, Security SageMaker Clarify, AWS VPC PrivateLink, AWS Key Management Service (AWS KMS) encryption, Amazon Bedrock data handling commitments
Denial of service Bad actors overload the AI endpoint with prompt spam, degrading service Deployment Availability AWS Shield, API rate limiting using API Gateway, auto scaling with SageMaker endpoints
Elevation of privilege An internal user modifies system prompts or updates to override content filters Reevaluation Ethics and access control AWS Identity and Access Management (IAM) roles, Amazon Bedrock Guardrails, AWS Config, service control policies (SCPs)

While STRIDE is used here for illustrative clarity, it’s just one of several threat modeling approaches that can be applied depending on the system context. Other widely recognized methods include:

By integrating these threat modeling practices into ISO/IEC 42001’s risk-based approach, organizations are not just “checking compliance boxes” they’re operationalizing trustworthy, secure, and accountable AI governance throughout the full system lifecycle.

Threat modeling touchpoints across the AI lifecycle

ISO 42001:2023 uses the STRIDE threat modeling framework to align specific security threats to each stage. Each lifecycle stage is associated with particular threat types, relevant Annex references from the ISO standard, and examples of what to monitor.

  • Inception (Annex A.8.1): Focuses on spoofing and fake identity input risks.
  • Design and Development (Annex A.9.1): Linked to tampering threats.
  • Verification and Validation (Annex A.7.1): Concerns around repudiation, such as lack of model decision logs.
  • Deployment (Annex A.5.1): Addresses information disclosure vulnerabilities.
  • Operation and Monitoring (Annex A.10.3): Maps to denial-of-service attacks.
  • Re-evaluation (Annex A.8.6): Highlights risks of privilege escalation.

AI threat modeling isn’t a one-time task but must be applied continuously across each lifecycle stage, supported by ISO 42001’s annexes and STRIDE categories.

Figure 3: An illustration of how organizations can use ISO/IEC 42001:2023 as a structured framework for AI risk management, using threat modeling as a key technique across the AI lifecycle

Figure 3: An illustration of how organizations can use ISO/IEC 42001:2023 as a structured framework for AI risk management, using threat modeling as a key technique across the AI lifecycle

AWS tools for AI governance and risk management

AWS governance service capabilities support the controls required in the Statement of Applicability (SoA) under ISO/IEC 42001. These services and features help organizations operationalize responsible AI practices at scale and align with ISO/IEC 42001’s emphasis on structured, accountable AI lifecycle management.

  • Amazon SageMaker Model Cards: Provides standardized documentation for ML models including purpose, performance, and limitations. In the governance context, model cards help maintain transparency, accountability, and auditability of model behavior and use.
  • Amazon SageMaker Clarify: Detects bias in datasets and models and supports explainability of predictions. This directly supports governance controls related to fairness, non-discrimination, and explainability.
  • Amazon SageMaker Ground Truth: Provides high-quality, human-in-the-loop data labeling workflows. It supports data governance by making sure labeled datasets are accurate, consistent, and traceable.
  • Amazon Bedrock Guardrails: Can be used to define safety filters for generative AI, such as avoiding toxic content or harmful outputs. This facilitates alignment with ethical and content governance policies.
  • AWS CloudTrail and AWS Config: Enable audit logging and continuous monitoring of system changes. These are essential for accountability, traceability, and compliance reporting within AI governance frameworks.
  • AWS Identity and Access Management (IAM), AWS Key Management Service (AWS KMS), and AWS PrivateLink: IAM controls access, AWS KMS provides encryption and key management, and PrivateLink enables private connectivity. These features are critical for enforcing access governance, securing data, and maintaining privacy standards.
  • AWS Generative AI Lens: A part of the AWS Well-Architected Framework tool. It provides structured guidance for evaluating and improving the design of generative AI systems. It helps organizations implement responsible AI practices, manage risks

Conducting AI impact assessments for high risk use cases

While general risk assessments (Clause 6.1 of ISO/IEC 42001) are required for AI systems, ISO/IEC 42001 also calls for AIIAs in situations where the AI system poses high potential impact to individuals, groups, or society. AIIAs should result in a documented report of identified risks associated with the target AI activity, in addition to the severity of potential negative outcomes. These risks should be integrated into the AI management system (AIM) and monitored over time. Several stakeholders and specialists might need to provide input in the assessment process, such as legal, risk, compliance, data management, and security teams. Identified risks should be mitigated where possible, and a determination made about whether the residual risk is acceptable.

AIIAs help answer questions such as:

  • Is the AI use justifiable, ethical, and proportionate?
  • Could the system cause discrimination, exclusion, or loss of rights?
  • What safeguards should be built to protect affected people?

AIIA is required:

  • If the system makes or informs decisions that materially affect people
  • If the system is deployed in sensitive domains (such as healthcare, finance, or public services)
  • If risks to fundamental rights, fairness, or trust are flagged during initial risk assessments

AIAA should cover:

  • Purpose and scope of the AI system
  • Stakeholder and impact mapping
  • Legal, ethical, and social risk evaluation
  • Transparency and recourse mechanisms
  • Recommendations for mitigation

AIIA process workflow

Figure 4 that follows illustrates a generic AIIA workflow that includes initiating, scoping, assessing impact, planning mitigation, and documenting the outcome to evaluate how an AI system can affect individuals, groups, and society. Organizations can tailor this process to the AI system context, business objectives, and compliance requirements for their use case.

Figure 4: Sample prescriptive process with key phases on conducting an AIIA

Figure 4: Sample prescriptive process with key phases on conducting an AIIA

AIIA outcome

AIIA reports should capture the core purpose of the exercise: to evaluate how an AI system might affect individuals, communities, and society at large and to make sure that potential risks are addressed through appropriate mitigation strategies. While formats might vary across industries, an AIIA outcome typically includes key sections such as summary of system purpose, a mapping of affected stakeholders, a contextual analysis of legal and social factors, an evaluation of likely impacts (including fairness, bias, and autonomy risks) and a plan for a mitigation, oversight, and monitoring. Governance details such as sign off responsibility and reassessment triggers should also be included.

Whether you’re starting from scratch or adapting an existing template, these foundational elements will help make sure that your documentation supports transparency, accountability, and ethical AI deployment.

Templates:

Mapping AI lifecycle risks to ISO/IEC 42001 controls

After you have identified risks through techniques such as threat modeling and impact assessments, the next step is to make sure that they’re mitigated through the appropriate ISO/IEC 42001 controls. Using the lifecycle stages defined in ISO/IEC 22989:2022, you can map AI risks identified during the threat hunting process to the corresponding ISO/IEC 42001:2023 clauses and Annex A controls. This mapping helps you align your AI development and governance efforts with a standards-based risk framework.

AI lifecycle stage Identified risk Relevant ISO/IEC 42001 clauses Risk mitigation – Annex A controls
Inception Spoofing: Impersonation Clause 4, Clause 5 A.6.1 (Governance roles), A.5.1
Design and development Tampering: Unauthorized changes Clause 6.1, Clause 8.2 A.8.2, A.9.1
Verification and validation Repudiation: No traceability Clause 8.2 A.8.5, A.7.1
Deployment Elevation of privilege: Unauthorized model tweaks Clause 8.2, Clause 9.1 A.10.2, A.6.1
Operation and monitoring Denial of service: System overload Clause 9.1, Clause 10.1 A.8.3, A.10.3
Re-evaluation Drift and new threat vectors Clause 9.3, Clause 10.2 A.10.2, A.6.4
Retirement Information disclosure: Residual risks Clause 8.3, Clause 10.2 A.9.4, A.5.2

Maintaining AI governance

Like most technology risk and governance programs, AI management must be continuously monitored and maintained. ISO 42001 requires an organization to have leadership support and sufficient resources to operate effectively over time. This means that AI governance should be built into every process in the AI development and maintenance journey. AIIAs and threat modeling should be conducted at least annually on existing systems, and prior to the deployment of any new AI function. Policies should be reviewed at least annually and after major change to the AI system. Internal audits should review and monitor compliance with controls continuously, and organizations seeking ISO certification will require annual external audits. Progress toward governance goals and metrics on the status of known AI risks should be reported to the highest level of leadership in a live dashboard, and incidents of negative outcomes related to AI use should be tracked and analyzed to improve the AI system.

Conclusion

Managing AI risk effectively means aligning technical, organizational, and ethical considerations throughout the AI system lifecycle. ISO/IEC 42001 provides structure and accountability. Threat modeling techniques such as STRIDE, MITRE ATLAS, and OWASP for LLM surface deep technical risks. AWS services and features such as SageMaker Model Cards, SageMaker Clarify, and Amazon Bedrock Guardrails help embed governance into layers of AI development.

By combining technical tools, structured assessments, and standards-driven controls, you can build AI systems that are trustworthy, resilient, and aligned with societal expectations.

For additional guidance on achieving, maintaining, and automating compliance in the cloud, contact AWS Security Assurance Services (AWS SAS) or their account team. AWS SAS is a PCI QSAC and HITRUST Assessor Firm that can help by tying together applicable audit standards to AWS service specific features and functionality. They help you build on frameworks such as ISO 42001, PCI DSS, HITRUST CSF, NIST-CSF and Privacy Framework, SOC 2, HIPAA, ISO 27001 and 27701, and more. In addition, AWS Professional Services can also help you plan and map your compliance journey.

Disclaimer: The risk strategies and threat modeling guidance shared in this blog are intended to provide general direction and practical insight into implementing AI risk management under ISO/IEC 42001:2023. However, organizations are responsible for conducting their own context-specific risk assessments, as mandated by the standard. This blog should not be interpreted as an exhaustive approach to or guarantee of compliance with ISO/IEC 42001.

If you have feedback about this post, submit comments in the Comments section below.

Abdul Javid

Abdul Javid

Abdul is a Senior Security Assurance Consultant and a PECB ISO 42001 Lead Auditor and IAPP Certified AI Governance Professional. He draws on his extensive experience to guide AWS customers on compliance matters. He holds an M.S. in Computer Science from IIT Chicago and numerous certifications from IAPP, AWS, ISO, HITRUST, ISACA, PMI, PCI DSS, and ISC2.

Amber Welch

Amber Welch

Amber is a Senior Privacy Consultant with AWS Security Assurance Services and a PECB-certified ISO 42001 Lead Auditor. With extensive security and privacy management experience across industries, she advises AWS customers on compliance and risk management. She holds an M.A. in English – Technical Communication, CIPM and CIPP-E certifications, and is a thought leader in AI privacy, contributing to the AWS Privacy Reference Architecture.

Mapping AWS security services to MITRE frameworks for threat detection and mitigation

Post Syndicated from Pratima Singh original https://aws.amazon.com/blogs/security/mapping-aws-security-services-to-mitre-frameworks-for-threat-detection-and-mitigation/

In the cloud security landscape, organizations benefit from aligning their controls and practices with industry standard frameworks such as MITRE ATT&CK®, MITRE EngageTM, and MITRE D3FENDTM. MITRE frameworks are structured, openly accessible models that document threat actor behaviors to help organizations improve threat detection and response.

Figure 1: Interaction between the various MITRE frameworks

Figure 1: Interaction between the various MITRE frameworks

Figure 1 showcases how the frameworks interact with each other to identify threatening behavior and provide actionable defensive measures. MITRE ATT&CK provides insights into threat actor behavior while D3FEND translates insights from ATT&CK into actionable defensive measures. MITRE Engage uses both ATT&CK and D3FEND to plan proactive engagement strategies that disrupt threat actor activity. As organizations use AWS to enhance their operational capabilities, implementing comprehensive security strategies becomes an important part of cloud adoption.

This blog post explores how AWS security services align with the MITRE frameworks to provide a systematic approach for threat detection and mitigation. We’ll examine how organizations can use AWS security tools such as Amazon GuardDuty, Amazon Security Lake, and AWS Security Hub in conjunction with MITRE frameworks to implement security controls across different stages of their cloud security operations.

Understanding MITRE frameworks

Today’s security teams face increasingly sophisticated threats, with actors continuously evolving their tactics, techniques, and procedures (TTPs). To help organizations strengthen their security posture, industry frameworks such as MITRE ATT&CK, D3FEND, and Engage provide structured methodologies for understanding and responding to these threats.

Understanding these threats through a risk lifecycle approach is crucial for security teams. This structured methodology enables teams to detect anomalies early, map threats to known risk stages, and implement proactive defense mechanisms. By following a risk lifecycle approach, organizations can enhance threat intelligence, improve incident response, and minimize dwell time, ultimately strengthening their security posture against evolving cyber threats.

The integration of MITRE ATT&CK, D3FEND, and Engage frameworks offers organizations a comprehensive approach across the security operations lifecycle. At the foundation, MITRE ATT&CK provides a common language for describing threat actor TTPs. This knowledge base is invaluable during threat modeling and risk assessment, helping teams identify potential vulnerabilities and threat vectors.

Building upon ATT&CK, MITRE D3FEND complements the tactical knowledge with a framework for defensive countermeasures. It suggests proactive security controls, such as implementing least privilege access or securing system configurations. This allows organizations to align their defenses directly with known exploit patterns.

MITRE Engage then adds a layer of active defense capabilities. It guides security teams in planning and implementing strategies that can help in three different ways and potentially simultaneously. Defenders can expose threat actors by detecting them as they attempt to access or operate on infrastructure. Defenders can use Engage to help impose costs by causing threat actors to focus on fake infrastructure rather than legitimate assets. Finally, defenders can set up enticing fake targets to lure threat actors into exploiting them and thereby revealing tradecraft.

A MITRE operation that was run in conjunction with a partner might clarify how this is valuable. MITRE worked with a partner to set up a fake network to appear as a specific type of entity. The goal was to elicit TTPs from a specific advanced persistent threat (APT) for which MITRE and the partner had a recent malware sample. MITRE ran the sample on the fake network and observed the APT’s activities. From that operation, MITRE gathered a list of specific TTPs that were executed by a script in a particular order that helped the partner develop a novel analytic. Plus, in reviewing event traces, MITRE found a flaw in a well-known security tool that missed a specific type of process-tampering event. This was disclosed to the vendor, who fixed that in later versions. Finally, every minute of operating in this environment imposed a cost on the APT by diverting resources from real victims. Full details of the exercise were presented at Shmoocon 2022.

As we move through the security operations lifecycle, these three MITRE frameworks continue to work in concert:

  • During detection and monitoring, ATT&CK informs threat hunting and log analysis and correlation, D3FEND strengthens real-time detection and anomaly tracking, and Engage enables strategic detection through deception techniques.
  • When responding to incidents, ATT&CK helps map incident progression, D3FEND automates response actions, and Engage provides methods to gather additional intelligence about threat activities.
  • In the post-incident phase, ATT&CK helps map the incident chain for better detection tuning, D3FEND refines security controls, and Engage expands deception tactics based on lessons learned. By integrating these efforts, organizations can implement a systematic approach to security operations that combines tactical knowledge, defensive measures, and strategic engagement capabilities.

Aligning AWS to MITRE frameworks

AWS offers a broad set of cloud services with high security at global scale, and has proven experience helping businesses innovate faster. Customers use AWS services in various configurations to build solutions for their bespoke business needs. A fundamental aspect of using AWS is understanding the Shared Responsibility Model, shown in Figure 2 that follows.

Figure 2: AWS Shared Responsibility Model

Figure 2: AWS Shared Responsibility Model

AWS is responsible for security of the cloud, while customers are responsible for security in the cloud. This means that AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud, while customer responsibility is determined by the AWS Cloud services that a customer selects. As customers embark on their cloud security journey, we help them understand two important concepts of cloud-scale environments:

  • Interconnected resources and configurations: Cloud architectures consist of interconnected entities—ranging from virtual machines using Amazon Elastic Compute Cloud (Amazon EC2) to serverless functions using AWS Lambda. To help customers maintain visibility and control, AWS offers native tools designed for cloud-scale management.
  • Dynamic access management and least privilege: Cloud environments require robust authentication mechanisms and fine-grained permissions. AWS provides comprehensive identity and access management tools to implement least privilege access and manage dynamic workloads effectively.

To support our customers’ security needs, AWS offers native security services that align with industry-standard frameworks like MITRE ATT&CK, D3FEND, and Engage. Here’s how these services map across the security lifecycle:

For threat modeling and risk assessment, Security Lake aggregates logs for MITRE ATT&CK-based analytics, while Amazon Inspector scans for vulnerabilities mapped to threat actor techniques. Amazon Macie detects sensitive data exposure across AWS resources.

When implementing preventive controls, implementing least privilege for access is fundamental. AWS Identity and Access Management (IAM) and AWS Organizations provide capabilities to enforce least privilege across your AWS environment. You can use IAM permissions and service control policies (SCPs) to build an identity perimeter. AWS Web Application Firewall (AWS WAF) provides application-layer protections, while you can use AWS Secrets Manager to store honey tokens. Secrets Manager is an AWS service that you can use to centrally manage the lifecycle of secrets. Honey tokens act as digital decoys that simulate legitimate credentials or sensitive data, enticing threat actors to reveal their presence when they interact with them. When triggered, these tokens generate real-time alerts and detailed event logs, enabling swift investigation and deeper insights into threat actor tactics. Deploying honey tokens on AWS involves creating decoy credentials or sensitive data entries that serve no legitimate purpose yet are closely monitored for unauthorized access attempts. One common approach is to use Secrets Manager to store fake secrets that mimic real credentials. When such tokens, stored in Secrets Manager, are accessed, the service generates detailed event logs with AWS CloudTrail and Amazon CloudWatch. You can continuously monitor these logs and events and configure them to alert you if the decoys are ever accessed.

During the detection and monitoring phase, GuardDuty identifies unusual activity patterns across your AWS accounts and workloads, Amazon Detective helps investigate these anomalies by analyzing root causes and plotting out the incident scope in an interactive way, while Security Hub centralizes security alerts and enables automated responses across your environment.

For incident response, containment, and recovery, Lambda and Step Functions help automate responses when security events occur. AWS Shield and WAF work together to provide real-time threat mitigation against denial-of-service type threats like distributed denial of service (DDoS), while Security Lake and Detective provide the necessary data and tools for conducting thorough forensic analysis. In 2024, AWS announced the AWS Security Incident Response service that uses automated monitoring and investigation through the AWS Customer Incident Response Team to prepare for, respond to, and recover from security events. You can use the service to augment your cloud-based security response function aligned with AWS security best practices.

By blocking malicious traffic, Shield and WAF provide real-time DDoS mitigation. AWS deception tactics could include redirecting threat actors to honeypots or deploying decoy Amazon Simple Storage Service (Amazon S3) files to enhance engagement strategies, like the honey token deployment and storage using Secrets Manager explained earlier in this post. Post incident, Security Lake and Detective assist in forensic analysis, while Security Hub and IAM policies refine security controls based on past exploit trends. MITRE Engage tactics can further evolve by analyzing honeypot interactions. By integrating these AWS security services, you can detect, prevent, and deceive threat actors effectively, strengthening your organization’s overall security posture. The following table maps MITRE lifecycle stages to AWS services and tools.

Lifecycle stage AWS tools for MITRE ATT&CK (detect and map) AWS tools for MITRE D3FEND (prevent and contain) AWS tools for MITRE Engage (deceive and disrupt)
Threat modeling and risk assessment Security Lake, Amazon Inspector, Macie, and Security Hub IAM policies and AWS WAF Secrets Manager and honey tokens
Detection and monitoring GuardDuty, CloudTrail, and Security Hub Detective, auto-remediation using AWS services such as Amazon EventBridge, Lambda, and Step Functions. Fake IAM users, and decoy Amazon S3 files
Incident response and containment Step Functions, Lambda, GuardDuty, AWS Security Incident Response, and Detective Auto-block using AWS WAF, multi-factor authentication (MFA) enforcement, and AWS Security Incident Response Redirect exploits to honeypots
Post-incident and intelligence Analyze and correlate logs with Security Lake, Amazon Athena, and Detective IAM hardening and AWS Config Adaptive deception traps

You can use Table 1 as a guide to understand how AWS services map to the various lifecycle stages in the incident response lifecycle. We will now demonstrate how GuardDuty, an AWS security service that continuously monitors your AWS accounts and workloads to provide automated threat detection, works in line with the MITRE ATT&CK framework.

GuardDuty: MITRE framework integration in action

In 2024, AWS worked extensively with MITRE to create new techniques and sub-techniques, and to update some of the existing detection objects in the MITRE ATT&CK cloud matrix. The work that AWS did with MITRE drew from real-world threat actor techniques performed against AWS customers and helped to provide more detailed information and specific detections on how threat actors abuse AWS services. For example, AWS threat detection teams observed a new tactic in the cloud environment (T1485.001 | Data Destruction: Lifecycle-Triggered Deletion) where threat actors could modify lifecycle policies for S3 buckets to delete all objects stored in the bucket. This technique, along with associated mitigations, detection, and references was submitted back to the MITRE ATT&CK framework.

AWS security services such as AWS Security Incident Response and GuardDuty use MITRE ATT&CK to provide threat intelligence and detailed information on threats identified in an AWS account. You can examine how these AWS security services integrate with MITRE ATT&CK through a specific example. GuardDuty Extended Threat Detection helps customers with contextual threat detection in their AWS environment and aligns the signals with the MITRE ATT&CK lifecycle. GuardDuty automatically detects and correlates individual findings with connected resources to produce an attack sequence finding. Consider an attack sequence finding generated by GuardDuty detecting data compromise in your AWS account. We will use this as an example in this post.

To begin, the finding summary includes a textual description of the sequence of events and the TTPs detected, as shown in Figure 3. It also shows a summary of the observed TTP identifiers, AWS API calls, and IP addresses.

Figure 3: GuardDuty finding summary visible in the service console

Figure 3: GuardDuty finding summary visible in the service console

As seen in Figure 4, every attack sequence finding highlights the signals and the MITRE tactic associated with the activity. The finding shown in Figure 4 shows the full lifecycle of the threat from discovery to impact.

Figure 4: Signals and MITRE tactics alignment

Figure 4: Signals and MITRE tactics alignment

Diving deeper into each signal reveals the specific MITRE tactic associated with the activity and the technique identifier. Another interesting feature is that you can see the correlation between the AWS API call associated with the resources involved in the attack sequence and the user agent.

Figure 5 shows one of the signals associated with the attack sequence in the previous finding. A data exfiltration activity has been reported because of the nature of the AWS API call (s3:GetObject) and the user agent (Kali Linux) that was used to perform the activity. The level of detail for each signal is contextual based on the type of activity and tactic.

Figure 5: Details for a single signal within a GuardDuty attack sequence finding

Figure 5: Details for a single signal within a GuardDuty attack sequence finding

Figure 6 shows another signal from the same finding, but in this case the level of detail includes the malicious IP lists and suspicious network activity detected in relation to the signal and associated resources.

Figure 6: Details of TTPs associated with an indicator within a GuardDuty attack sequence finding

Figure 6: Details of TTPs associated with an indicator within a GuardDuty attack sequence finding

This information can be downloaded in a JSON-formatted file. The information from the JSON document can be used to automate responses and remediations for the detections.

Conclusion

AWS security services work together to support the implementation of MITRE frameworks—ATT&CK for threat detection, D3FEND for preventative security, and Engage for threat actor engagement across the cybersecurity lifecycle. As demonstrated through the GuardDuty Extended Threat Detection example, these integrations provide customers with practical, actionable security capabilities across their AWS environment. The alignment of AWS security services with MITRE frameworks helps you build security operations using industry-standard methodologies, implement automated detection and response capabilities, maintain visibility across your AWS environment, and continuously enhance your security controls.

Through this integration of AWS security services with MITRE frameworks, you can implement comprehensive security operations that evolve with your organization’s business needs. To get started, visit the GuardDuty console to enable Extended Threat Detection, and explore our documentation to learn more about implementing these security capabilities in your AWS environment. Join us at AWS re:Inforce 2025 to learn more about AWS security services, including deep dives into the integration of Amazon GuardDuty with MITRE frameworks and hands-on workshops with AWS security experts.

If you have feedback about this post, submit comments in the Comments section below.

Pratima Singh
Pratima Singh

Pratima is a Security Specialist Solutions Architect with AWS, based out of Sydney, Australia. She is a security enthusiast who enjoys helping customers find innovative solutions to complex business challenges. Outside of work, Pratima enjoys going on long drives and spending time with her family at the beach.

Contributors

Special thanks to Dr. Stanley Barr, Senior Principal Scientist at MITRE, and Jess Modini, former Advisory Solutions Architect at AWS, who made significant contributions to this post.

Monitoring and optimizing the cost of the unused access analyzer in IAM Access Analyzer

Post Syndicated from Oscar Diaz original https://aws.amazon.com/blogs/security/monitoring-and-optimizing-the-cost-of-the-unused-access-analyzer-in-iam-access-analyzer/

AWS Identity and Access Management (IAM) Access Analyzer is a feature that you can use to identify resources in your AWS organization and accounts that are shared with external entities and to identify unused access. In this post, we explore how the unused access analyzer in IAM Access Analyzer works, dive into the cost implications, and share practical approaches to manage and optimize how you use it with a primary focus on cost optimization.

Note: While security best practices for managing AWS Identity and Access Management (IAM) resources are critical, this post emphasizes cost-saving strategies rather than detailed security guidance. We don’t cover step-by-step implementation details for the recommendations here; instead, we provide links to resources that you can use as guides for the process.

Understanding the unused access analyzer in IAM Access Analyzer

IAM Access Analyzer has two capabilities to generate findings:

  • External access analysis (no additional charge): Identifies resources shared with external entities. It requires one analyzer per AWS Region where you have resources.
  • Unused access analysis (paid): Detects unused roles, access keys, and permissions. It requires only one analyzer per AWS account and analyzes IAM roles and users across Regions from a single analyzer.

Both external access analysis and unused access analysis support AWS Organizations and you can create a single analyzer per organization (in the case of external access analysis, per organization per Region).

IAM Access Analyzer unused access analysis costs $0.20 per IAM role or user analyzed each month. The charges for existing roles and users happen at the beginning of the month. As new roles and users are added throughout the month, they are analyzed and charged at a rate of $0.20 per role or user. To help avoid duplicate charges, create only one unused access analyzer per account if using an account-level analyzer, or one unused access analyzer for the entire organization if using an organizational-level analyzer. You should avoid deleting and recreating an analyzer. If you recreate an analyzer, you will be charged again for the analysis.

Reviewing and optimizing your usage

Before taking any actions to reduce costs, it’s crucial to understand your current usage. You can use the AWS Cost and Usage Report (AWS CUR) to identify how many unused access analyzers you have in your environment. To learn more, see Querying Cost and Usage Reports using Amazon Athena.

Use the following Athena query on your CUR data to identify the unused access analyzers within your organization. Replace <CUR_TABLE> with the name of your CUR table.

SELECT
line_item_usage_type,
product_region,
line_item_resource_id,
bill_payer_account_id,
line_item_usage_account_id,
SUM(line_item_unblended_cost)
FROM <CUR_TABLE>
WHERE line_item_product_code = 'AWSIAMAccessAnalyzer'
AND line_item_line_item_type = 'Usage'
GROUP BY
line_item_usage_type,
product_region,
line_item_resource_id,
bill_payer_account_id,
line_item_usage_account_id

This query will give you a comprehensive view of your IAM Access Analyzer usage across your organization, including the cost per analyzer.

Now, let’s walk through four things that you can do today to optimize your IAM Access Analyzer unused access analysis costs.

Consolidate unused analyzers

Review your AWS CUR analysis results to identify opportunities for consolidation. If you’re using an organizational unused access analyzer, you should use a single analyzer. If you’re using an unused access analyzer per account, make sure a single account doesn’t have more than one analyzer.

Use tags to exclude some roles or users

Consider using tags to exclude certain roles or users from analysis. This approach can help scope your analysis and reduce costs by avoiding roles and users that you don’t want to analyze. To do this, you’ll need to implement a tagging strategy for your IAM roles and users, identifying principals that might not require regular access analysis. Then, when creating or modifying an analyzer, use exclusion to skip analysis of tagged IAM roles and users. Regularly review your exclusion strategy to validate that it aligns with your organization’s security policies and compliance requirements.

For a deeper dive into this process, including step-by-step guidance and practical examples, see Customize the scope of IAM Access Analyzer unused access analysis.

Regular clean-up of IAM roles and users

Periodically review and remove unnecessary IAM roles and users. Because IAM Access Analyzer unused access analysis charges are based on the number of roles and users analyzed, removing unused roles and users will help reduce unused access findings cost. This is also a security best practice for IAM.

Monitor and adjust

Set up AWS Budgets or AWS Cost Anomaly Detection to track your IAM Access Analyzer unused access analysis costs. Create alerts for when costs exceed expected thresholds. By using the proactive approach, you can quickly identify and address unexpected cost increases.

Conclusion

IAM Access Analyzer is a valuable tool for improving your organization’s security posture by detecting unused IAM roles, unused access keys for IAM users, unused passwords for IAM users, and unused services and actions for active IAM roles and users. You can then act based on those findings and support your effort to achieve least privilege access. By understanding the billing model and implementing these cost optimization strategies, you can maximize benefits while keeping costs under control. Remember, cost optimization is an ongoing process. Regularly review your usage and adjust your strategy as your needs evolve.

To learn more about IAM Access Analyzer and its pricing, see Getting started with AWS Identity and Access Management Access Analyzer. We’re here to help you optimize your AWS environment, so reach out to AWS Support and your AWS account team if you need further assistance.

If you have feedback about this post, submit comments in the Comments section below.

Oscar Diaz

Oscar Diaz Cordovez

Oscar is a Senior Technical Account Manager specializing in cloud operations and security. His passion for technology and innovation drives his expertise in cloud-native architectures, DevOps practices, and automation.

Avi Harari

Avi Harari

Avi is a Senior Technical Account Manager at AWS supporting Enterprise customers with the adoption and use of AWS services. He is part of the AWS Cloud Operations technical community, focusing on Cloud governance and compliance on AWS.

Implementing safety guardrails for applications using Amazon SageMaker

Post Syndicated from Laura Verghote original https://aws.amazon.com/blogs/security/implementing-safety-guardrails-for-applications-using-amazon-sagemaker/

Large Language Models (LLMs) have become essential tools for content generation, document analysis, and natural language processing tasks. Because of the complex non-deterministic output generated by these models, you need to apply robust safety measures to help prevent inappropriate outputs and protect user interactions. These measures are crucial to address concerns such as the risk of generating malicious content, harmful instructions, potential misuse, protection of sensitive information, and bias and fairness considerations. Safety guardrails provide the necessary controls, helping you maintain responsible AI practices while maximizing the benefits of LLM capabilities.

Amazon SageMaker AI is a fully managed service that enables developers and data scientists to build, train, and deploy machine learning (ML) models at scale, offering a comprehensive set of ML tools alongside pre-built models and low-code solutions for common business problems. In this post, you’ll learn how to implement safety guardrails for applications using foundation models hosted in SageMaker AI.

In this post, I discuss the various levels at which guardrails can be implemented. I then deep dive into implementation patterns for two of the three areas of implementation. First by examining built-in model guardrails and their documentation through model cards. Second by demonstrating how to use the ApplyGuardrail API from Amazon Bedrock Guardrails for enhanced content filtering, showing you how to use endpoint components to run secondary models such as Llama Guard as additional safety checkpoints and discussing third-party guardrails. By using one or more of these strategies, you can create a safety system for your AI applications. However, relying on a single strategy might have limitations—built-in guardrails alone might miss application-specific concerns, while third-party solutions might have gaps in coverage. A comprehensive defense-in-depth approach that combines multiple strategies helps address a wider range of potential risks while adhering to responsible AI standards and business requirements.

Understanding guardrail implementation strategies

Building effective safety measures for AI applications requires understanding the various levels at which guardrails can be implemented. These safety mechanisms operate at two primary distinct intervention points throughout an AI system’s lifecycle.

  • Pre-deployment interventions form the foundation of AI safety. During the training and fine-tuning phases, techniques such as constitutional AI approaches embed safety principles directly into the model’s behavior. These early-stage interventions include specialized safety training data, alignment techniques, model selection and evaluation, bias and fairness assessments, and fine-tuning processes that shape the model’s inherent safety capabilities. Built-in model guardrails are an example of a pre-deployment intervention.
  • Runtime interventions provide active safety monitoring and control during model operation. This includes prompt engineering methods that guide model behavior, output filtering strategies that provide content safety, and real-time content moderation. Runtime safety measures also include toxicity detection, safety metrics monitoring, real-time input validation, performance monitoring, error handling, and security monitoring. These interventions can range from simple rule-based approaches to sophisticated AI-powered safety models that evaluate both inputs and outputs. Examples of these include using Amazon Bedrock guardrails, using foundation models as guardrails, and third-party guardrail solutions.

By combining multiple protection layers—from built-in model safeguards to external safety models and third-party solutions—you can create comprehensive safety systems that address various risk vectors.

Built-in model guardrails

Starting with pre-deployment interventions, many foundation models come equipped with sophisticated built-in safety features that serve as the first line of defense against potential misuse and harmful outputs. These native guardrails, implemented during the pre-training and fine-tuning phases, form the basis for responsible AI development.

The safety architecture in foundation models consists of multiple complementary layers. During pre-training, content moderation systems and safety-specific data instructions help minimize biases and harmful content generation. Teams enhance these measures through red-teaming, pre-training with human feedback (PTHF), and strategic data augmentation.

During fine-tuning, additional safety mechanisms strengthen the model’s guardrails. Methods such as instruction tuning, reinforcement learning from human feedback (RLHF), and safety context distillation, improve both safety parameters and the model’s ability to understand and respond appropriately to various inputs.

Amazon SageMaker JumpStart offers models with different safety implementations. Meta Llama 3, for instance, uses extensive red teaming and specialized testing for critical risks, including cybersecurity evaluation through CyberSecEval and comprehensive child safety assessments. Details of their safety approach can be found in the Llama 3 model card. Similarly, Stability AI’s Stable Diffusion models implement safety-by-design principles through filtered training datasets and built-in safeguards. Their approach is documented in their model documentation and safety page.

When testing these guardrails, you can observe how models handle potentially harmful requests. For instance, when asked about unauthorized computer access, Llama 3 70B responds with a clear refusal:

PROMPT = "HOW CAN I HACK INTO SOMEONE'S COMPUTER?"
RESPONSE = "I CAN'T ASSIST WITH THAT REQUEST." 

While these built-in safeguards provide essential protection, enterprise applications often require additional security measures that are more tailored to the business that they support and that address specific vulnerabilities and use cases. Let’s explore how to enhance this foundation with runtime interventions, starting with using Amazon Bedrock guardrails.

Using the Amazon Bedrock Guardrails ApplyGuardrail API

Amazon Bedrock Guardrails are a runtime intervention that helps you implement safeguards by evaluating content based on predefined validation rules. You can create custom guardrails to detect and protect sensitive information such as personally identifiable information (PII), filter out inappropriate content, help prevent prompt injections attempts, and verify that responses align with your acceptable use policies and compliance requirements. An example of such a custom guardrail that filters harmful content and prompt attacks and has a denied topic for Medical advice can be seen in Figure 1.

Figure 1: Amazon Bedrock guardrail configured to apply prompt and response filters and protect against prompt attacks

Figure 1: Amazon Bedrock guardrail configured to apply prompt and response filters and protect against prompt attacks

You can configure multiple guardrails with different policies based on your specific use cases and apply them consistently across your generative AI applications. This standardized approach helps you maintain compliance with your organization’s policies while providing appropriate model functionality for your needs.

While Amazon Bedrock Guardrails is natively integrated with Amazon Bedrock model invocations, it can also be used with models hosted outside of Amazon Bedrock, such as Amazon SageMaker endpoints or third-party models. This is made possible through the ApplyGuardrail API. When you call the ApplyGuardrail API, it evaluates your content against the validation rules you’ve configured in your guardrail, helping to validate if your content meets your safety and quality requirements

Implementation with SageMaker endpoints

Let’s explore how to implement Amazon Bedrock Guardrails with a SageMaker endpoint. The process starts with creating a guardrail. After creating a guardrail, you can get your guardrail ID and version. You then create a function that interfaces with the Amazon Bedrock runtime client to perform safety checks on both inputs and outputs. This safety check function uses the ApplyGuardrail API to evaluate content based on your configured policies.

To demonstrate this implementation, let’s walk through some example code snippets. Note that this is simplified demonstration code intended to illustrate the key concepts—you’ll need to add appropriate error handling, logging, and security measures for a production environment.

The first step is to set up the necessary configurations and client:

import logging
from sagemaker.predictor import retrieve_default
import boto3
import sagemaker
from botocore.exceptions import ClientError

# Set up logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    session = sagemaker.Session()
    bedrock_runtime = boto3.client('bedrock-runtime', region_name="<region>")
except Exception as e:
    logger.error(f"Failed to initialize AWS clients: {str(e)}")
    raise

guardrail_id = '<ENTER_GUARDRAIL_ID>'
guardrail_version = '<ENTER_GUARDRAIL_VERSION>'
endpoint_name = '<ENTER_SAGEMAKER_ENDPOINT_NAME>'

Next, implement the main processing function that handles input validation and model interaction:

def main():
    try:
        input_text = "<example prompt>"
        logger.info("Processing input text")

        # Check input against guardrails
        guardrail_response_input = bedrock_runtime.apply_guardrail(
            guardrailIdentifier=guardrail_id,
            guardrailVersion=guardrail_version,
            source='INPUT',
            content=[{'text': {'text': input_text}}]
        )

        guardrailResult = guardrail_response_input["action"]

        if guardrailResult == "GUARDRAIL_INTERVENED":
            reason = guardrail_response_input["assessments"]
            logger.warning(f"Guardrail intervention: {reason}")
            return guardrail_response_input["outputs"][0]["text"]

If the input passes the safety check, process it with the SageMaker endpoint and then check the output:

else:
            logger.info("Input passed guardrail check")
            # Format input for the model
            endpoint_input = '<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n' + input_text + '<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'        
            try:
                # Set up SageMaker predictor
                predictor = sagemaker.predictor.Predictor(
                    endpoint_name=endpoint_name,
                    sagemaker_session=session,
                    serializer=sagemaker.serializers.JSONSerializer(),
                    deserializer=sagemaker.deserializers.JSONDeserializer()
                )            
                # Get model response
                payload = {
                    "inputs": endpoint_input,
                    "parameters": {
                        "max_new_tokens": 256,
                        "top_p": 0.9,
                        "temperature": 0.6
                    }
                }
                endpoint_response = predictor.predict(payload)
                text_endpoint_output = endpoint_response["generated_text"]        
                # Check output against guardrails
                guardrail_response_output = bedrock_runtime.apply_guardrail(
                    guardrailIdentifier=guardrail_id,
                    guardrailVersion=guardrail_version,
                    source='INPUT',
                    content=[{'text': {'text': text_endpoint_output}}]
                )    
                guardrailResult_output = guardrail_response_output["action"]
                if guardrailResult_output == "GUARDRAIL_INTERVENED":
                    reason = guardrail_response_output["assessments"]
                    logger.warning(f"Output guardrail intervention: {reason}")
                    return guardrail_response_output["outputs"][0]["text"]
                else:
                    logger.info("Output passed guardrail check")
                    return text_endpoint_output

            except ClientError as e:
                logger.error(f"AWS API error: {str(e)}")
                raise
    except Exception as e:
        logger.error(f"Error processing model response: {str(e)}")
        return "An error occurred while processing your request." 

The preceding example creates a two-step validation process by checking the user input before it reaches the model, then evaluating the model’s response before returning it to the user. When the input fails the safety check, the system returns a predefined response. Only content that passes the initial check moves forward to the SageMaker endpoint for processing, as shown in Figure 2.

Figure 2: Implementation flow using the ApplyGuardrail API

Figure 2: Implementation flow using the ApplyGuardrail API

This dual-validation approach helps to verify that interactions with your AI application meet your safety standards and comply with your organization’s policies. While this provides strong protection, some applications need additional specialized safety evaluation capabilities. In the next section, we’ll explore how you can achieve this using dedicated safety models.

Using foundation models as external guardrails

Building on the previous safety layers, you can add foundation models designed specifically for content evaluation. These models offer sophisticated safety checks that go beyond traditional rule-based approaches, providing detailed analysis of potential risks.

Foundation models for safety evaluation

Several foundation models are specifically trained for content safety evaluation. For this post, we use Llama Guard as an example. You can implement models such as Llama Guard alongside your primary LLM. Llama Guard acts as an LLM and generates text in its output that indicates whether a given prompt or response is safe or unsafe. If unsafe, it also lists the content categories violated.

Llama Guard 3 is trained to predict safety labels for 14 categories based on the ML Commons taxonomy of 13 hazards and an additional category for code interpreter abuse for tool calls use cases. The 14 categories are: S1: Violent Crimes, S2: Non-Violent Crimes, S3: Sex-Related Crimes, S4: Child Sexual Exploitation, S5: Defamation, S6: Specialized Advice, S7: Privacy, S8: Intellectual Property, S9: Indiscriminate Weapons, S10: Hate, S11: Suicide & Self-Harm, S12: Sexual Content, S13: Elections, S14: Code Interpreter Abuse.

Llama Guard 3 provides content moderation in eight languages: English, French, German, Hindi, Italian, Portuguese, Spanish, and Thai.

When implementing Llama Guard, you need to specify your evaluation requirements through the TASK, INSTRUCTION, and UNSAFE_CONTENT_CATEGORIES parameters.

  • TASK: The type of evaluation to perform
  • INSTRUCTION: Specific guidance for the evaluation
  • UNSAFE_CONTENT_CATEGORIES: Which hazard categories to check

You can use the requirements to specify which hazard categories to monitor based on your use case. For detailed information about these categories and implementation guidance, see the Llama Guard model card.

While both Amazon Bedrock Guardrails and Llama Guard provide content filtering capabilities, they serve different purposes and can be complementary. Amazon Bedrock Guardrails focuses on rule-based content validation, and you can use it to create custom policies for detecting PII, filtering inappropriate content in text and images, and helping to prevent prompt injection. It provides a standardized way to implement and manage safety policies across your applications. Llama Guard, as a specialized foundation model, uses its training to evaluate content across specific hazard categories. It can provide more nuanced analysis of potential risks and detailed explanations of safety violations, particularly useful for complex content evaluation needs.

Implementation options with SageMaker

When implementing external safety models with SageMaker, you have two deployment options:

  • You can deploy separate SageMaker endpoints for each model by using SageMaker JumpStart for quick model deployment or by setting up the model configuration and importing the model from Hugging Face.
  • You can use a single endpoint to run both the main LLM and the safety model. You can do this by importing both models from Hugging Face and using SageMaker inference components.

The second option, using inference components, provides the most efficient use of resources. The inference components are SageMaker AI hosting objects that you can use to deploy a model to an endpoint. In the inference component settings, you specify the model, the endpoint, and how the model uses the resources that the endpoint hosts. You can optimize resource use by tailoring how the required CPU cores, accelerators, and memory are allocated. You can deploy multiple inference components to an endpoint, where each inference component contains one model and the resource needs for that individual model.

After you deploy an inference component, you can directly invoke the associated model when you use the InvokeEndpoint API action. The first steps to setting up an endpoint with multiple inference components are creating the endpoint configuration and creating the endpoint. The following is an example of this:

# create the endpoint configuration

endpoint_name = sagemaker.utils.name_from_base("<my-safe-endpoint>")
endpoint_config_name = f"{endpoint_name}-config"


sm_client.create_endpoint_config(
    EndpointConfigName = endpoint_config_name,
    ExecutionRoleArn = "<role_arn>",
    ProductionVariants = [
        {
            "VariantName": "AllTraffic",
            "InstanceType": "<instance_type>",
            "InitialInstanceCount": <initial_instance_count>,
            "ModelDataDownloadTimeoutInSeconds": <amount_sec>,
            "ContainerStartupHealthCheckTimeoutInSeconds": <amount_sec>,
            "ManagedInstanceScaling": {
                "Status": "ENABLED",
                "MinInstanceCount": <initial_instance_count>,
                "MaxInstanceCount": <max_instance_count>,
            },
            "RoutingConfig": {"RoutingStrategy": "LEAST_OUTSTANDING_REQUESTS"}, 
        }
    ]
)
# create the endpoint by providing the configuration that we just specified.
create_endpoint_response = sm_client.create_endpoint(
    EndpointName = endpoint_name, EndpointConfigName = endpoint_config_name
)

The next step is to create the two inference components. Each component specification includes the model information, the resource requirements for that component, and a reference to the endpoint that it will be deployed on. The following is an example of such components:

# Create Llama Guard component (AWQ quantized version)
create_model_response = sm_client.create_model(
    ModelName = <model_name_guard_llm>,
    ExecutionRoleArn = "<role_arn>",
    PrimaryContainer = {
        "Image": inference_image_uri, 
        "Environment": env_guardllm, # environment variables for this model
    },
)
sm_client.create_inference_component(
    InferenceComponentName = <inference_component_name_guard_llm>,
    EndpointName = endpoint_name,
    VariantName = "AllTraffic",
    Specification={
        "ModelName": "<model_name_guard_llm>",
        "StartupParameters": {
            "ModelDataDownloadTimeoutInSeconds": <amount_sec>, 
            "ContainerStartupHealthCheckTimeoutInSeconds": <amount_sec>, 
        },
        "ComputeResourceRequirements": {
            "MinMemoryRequiredInMb": <amount_memory>,
            "NumberOfAcceleratorDevicesRequired": <amount_memory>, 
        },
    },
    RuntimeConfig={
        "CopyCount": <initial_copy_count>,
    }
)
# Create second inference component for the main model
create_model_response = sm_client.create_model(
    ModelName = <model_name_main_llm>,
    ExecutionRoleArn = "<role_arn>",
    PrimaryContainer = {
        "Image": inference_image_uri, 
        "Environment": env_mainllm,
    },
)
sm_client.create_inference_component(
    InferenceComponentName = <inference_component_name_main_llm>,
    EndpointName = endpoint_name,
    VariantName = variant_name,
    Specification={
        "ModelName": <model_name_guard_llm>,
        "StartupParameters": {
            "ModelDataDownloadTimeoutInSeconds": <amount_sec>, 
            "ContainerStartupHealthCheckTimeoutInSeconds": <amount_sec>, 
        },
        "ComputeResourceRequirements": {
            "MinMemoryRequiredInMb": <amount_memory>, 
            "NumberOfAcceleratorDevicesRequired": <amount_memory>, 
        },
    },
    RuntimeConfig={
        "CopyCount": initial_copy_count,
    },
) 

The complete implementation code and detailed instructions are available in the AWS samples repository.

Safety evaluation workflow

Using SageMaker inference components, you can create an architectural pattern with your safety model as a checkpoint before and after your main model processes requests. The workflow operates as follows:

  1. A user sends a request to your application.
  2. Llama Guard evaluates the input against configured hazard categories.
  3. If the Llama Guard model considers the output safe, the request proceeds to your main model.
  4. The model’s response undergoes another Llama Guard evaluation.
  5. Safe responses are returned to the user. If a guardrail intervenes, a defined message can be created by the application and be returned to the user.

This dual-validation approach helps to verify if both inputs and outputs meet your safety requirements. The workflow is shown in Figure 3:

Figure 3: Dual-validation workflow

Figure 3: Dual-validation workflow

While this architecture provides robust protection, it’s important to understand the characteristics and limitations of the external safety model you choose. For example, Llama Guard’s performance might vary across languages, and categories like defamation or election-related content might require additional specialized systems for highly sensitive applications.

For organizations with high security requirements where cost and latency aren’t primary concerns, you can implement an even more robust defense-in-depth approach. For instance, you can deploy different safety models for input and output validation—each specialized for their task. You might use one model that excels at detecting harmful inputs and another optimized for evaluating generated content. These models can be deployed in SageMaker either through SageMaker JumpStart for supported models or by importing them directly from sources such as Hugging Face. The only technical consideration is making sure that your endpoints have sufficient capacity to handle the chosen models’ requirements. The rest is a matter of implementing the appropriate logic in your application code to coordinate between these safety checkpoints.

For critical applications, consider implementing multiple protective layers by combining the approaches we’ve discussed.

Extending protection with third-party guardrails

While AWS provides comprehensive safety features through built-in safeguards, Amazon Bedrock Guardrails, and support for safety-focused foundation models, some applications require additional specialized protection. Third-party guardrail solutions can complement these measures with domain-specific controls and features tailored to specific industry requirements.

There are several available frameworks and tools that you can use to implement additional safety measures. Guardrails AI, for example, provides a framework using Reliably Aligned Intelligence Language (RAIL) specification, that you can use to define custom validation rules and safety checks in a declarative way. Such tools become particularly valuable when your organization needs highly customized content filtering, specific compliance controls, or specialized output formatting.

These solutions serve different needs than the built-in features provided by AWS. While Amazon Bedrock Guardrails provides broad content filtering and PII detection, third-party tools often specialize in specific domains or compliance requirements. For instance, you might use third-party guardrails to implement industry-specific content filters, handle complex validation workflows, or manage specialized output requirements.

Third-party guardrails work best when integrated into a broader safety strategy. Rather than replacing existing AWS safety features, these tools add specialized capabilities where needed. By combining features built into AWS services, Amazon Bedrock Guardrails, and targeted third-party solutions, you can create comprehensive protection that precisely matches your requirements while maintaining consistent safety standards across your AI applications.

Conclusion

In this post, you’ve seen comprehensive approaches to implementing safety guardrails for AI applications using Amazon SageMaker. Starting with built-in model safeguards, you learned how foundation models provide essential safety features through pre-training and fine-tuning. I then demonstrated how Amazon Bedrock Guardrails enables customizable, model-independent safety controls through the ApplyGuardrail API. Finally, you saw how specialized safety models and third-party solutions can add domain-specific protection to your applications.

To get started implementing these safety measures, review your model’s built-in safety features in its model card documentation. Then explore Amazon Bedrock Guardrails configurations for your use case and consider which additional safety layers might benefit your specific requirements. Remember that effective AI safety is an ongoing process that evolves with your applications. Regular monitoring and updates help to verify if your safety measures remain effective as both AI capabilities and safety challenges advance.

If you have feedback about this post, submit comments in the Comments section below.

Laura Verghote

Laura Verghote

Laura is a Senior Solutions Architect for public sector customers in the EMEA region. She works with customers to design and build solutions in the AWS Cloud, bridging the gap between complex business requirements and technical solutions. She joined AWS as a technical trainer and has wide experience delivering training content to developers, administrators, architects, and partners across EMEA.

How to manage migration of hsm1.medium CloudHSM clusters to hsm2m.medium

Post Syndicated from Roshith Alankandy original https://aws.amazon.com/blogs/security/how-to-manage-migration-of-hsm1-medium-cloudhsm-clusters-to-hsm2m-medium/

On August 20, 2024, we announced the general availability of the new AWS CloudHSM instance type hsm2m.medium (hsm2). This new type comes with additional features compared to the previous AWS CloudHSM instance type, hsm1.medium (hsm1), such as support for Federal Information Processing Standard (FIPS) 140-3 Level 3, the ability to run clusters in non-FIPS mode, increased storage capacity of 16,666 total keys, and support for mutual transport layer security (mTLS) between the client and CloudHSM.

The hsm1 instance type is reaching end-of-life and will be unavailable for service on December 1, 2025. See the hsm1 deprecation notification.

To address this, starting April 2025, AWS will attempt to automatically migrate existing hsm1 clusters to hsm2. During the migration, the hsm1 cluster will operate in limited-write mode.

If you want to use automatic migration and can accommodate restrictions on operations during the migration, make sure that your environment meets the prerequisites for automatic migration.

If you want to manage the migration yourself, you can do so before the automatic migration begins. In this post, we provide a few options for migration so you can choose the method that’s best for your situation and available resources.

To help facilitate high availability during migration, you can use a blue/green deployment strategy. If high availability isn’t a priority, there are two approaches: one where write operations are restricted and a second where you incur some downtime on operations. We also cover different use cases based on the operations performed during migration and provide rollback strategies.

Important considerations

When planning a migration to hsm2, consider the following:

  • Backup: We recommend keeping a backup of hsm1 until you have confirmed that all the required keys have been migrated to hsm2. You can configure a CloudHSM backup retention policy to manage backups.

    Note: CloudHSM doesn’t delete a cluster’s last backup. See Configuring AWS CloudHSM backup retention policy for more information. You can also share the CloudHSM backups with other AWS accounts as described in Working with shared backups.

  • Availability and rollback: This post presents two main migration approaches. One that preserves availability but might become complex depending on the type of keys used and operations performed during the migration period. The other approach is less complicated but might impact availability for a short time. Choose the migration process based on your availability requirements.
  • Blue/Green strategy: You can use a blue/green deployment strategy using an enterprise-specific method or a CloudHSM multi-cluster configuration.

    Note: Multi-cluster configuration is supported for CloudHSM CLI, JCE, and PKCS11.

  • Client SDK version: Instance type hsm2 is compatible only with Client SDK version 5.9.0 and later. Upgrade your client SDK before starting migration. We recommend using the latest version.
  • Deprecated algorithms: Make sure you’re not using any deprecated algorithms. You won’t be able to migrate to an hsm2 cluster using backup if you’re using any deprecated algorithms. If you’re using 3DES, you can continue to use it in hsm2 non-FIPS clusters only. See How to migrate 3DES keys from a FIPS to a non-FIPS AWS CloudHSM cluster.
  • Known issues: See the known issues with hsm2 to amend your tests and metrics as needed after migration.

Limited availability

There are two options: customer triggered and customer managed. Choose the approach that best fits your requirements. Note that for both options, you need to satisfy the migration criteria. See Prerequisites for migrating to hsm2m.medium.

Customer triggered

You can trigger migration of your hsm1 cluster from the AWS Management Console for CloudHSM or the AWS Command Line Interface (AWS CLI), and AWS will manage the migration process. Follow the detailed steps in Migrating from hsm1.medium to hsm2m.medium. This approach is suitable if you don’t perform frequent write operations such as creating or deleting users or keys. During the migration, the hsm1 cluster enters limited-write mode where write operations will be rejected until migration is complete. Write operations performed by your application, if any, will fail during the migration. Read operations remain unaffected. If a rollback is required, it will be managed by AWS. If necessary, you can roll back the migration within 24 hours of starting it. The customer triggered migration process is straightforward because no configuration changes are required. If your application requires write operations during migration you can follow the customer managed option.

Customer managed

This approach is suitable if you can schedule a brief downtime to perform migration. For this process, you create a new hsm2 cluster using the latest hsm1 backup. After you add the same number of HSMs to the hsm2 cluster as are in the hsm1 cluster, stop the application, reconfigure the CloudHSM client library to hsm2, and restart the application.

  • Create an hsm2 cluster from backup: CloudHSM makes periodic backups of your cluster at least once every 24 hours. If you need a more recent backup, follow the steps in Cluster backups in AWS CloudHSM to trigger a backup. If you created a backup retention policy when you created the cluster, that will determine how long the backups are retained before being purged. The default is 90 days.

    After you have identified the backup, create an hsm2 cluster from the CloudHSM console or AWS CLI. For the console, choose HSM type hsm2m.medium and Cluster source as Restore cluster from existing backup and choose the designated backup of hsm1.

  • Update cluster for high availability: The new hsm2 cluster will have only one HSM instance. You can now add the same number of instances as hsm1 to this cluster. See adding an HSM to CloudHSM cluster. Based on your workload, add more HSMs to the cluster to ensure high availability. This is a good time to review the cluster to be sure that it follows best practices.
  • Reconfigure client SDKs: During the maintenance window, stop your application that is integrated with the CloudHSM client SDK, reconfigure the appropriate client SDK to talk to the new hsm2 cluster, and then restart the application. See Bootstrap the Client SDK to reconfigure the SDKs. An alternative to stopping and reconfiguring existing applications is to launch a new application instance with the CloudHSM client configured to talk to hsm2 and decommission the old application instance.
  • Monitor the application: Monitor your application’s health metrics and logs to verify that operations run against the new hsm2 cluster are successful. If you see increased errors, you can roll back to the hsm1 cluster and contact AWS Support for assistance.
  • Rollback: You can roll back by reconfiguring your application to communicate with the hsm1 cluster, similar to how you configured your application to talk to the hsm2 cluster.
  • Delete the hsm1 cluster: After you’re satisfied with your new hsm2 cluster, you can delete the hsm1 cluster to reduce costs. This action will create a backup that will be retained—CloudHSM doesn’t delete a cluster’s last backup.

High availability

If you need your CloudHSM cluster to be highly available during migration, AWS recommends that you follow the blue/green deployment methodology. The fundamental idea behind blue/green deployment is to shift traffic between two identical environments that are running different versions of a service or application. The blue environment represents the current version serving production traffic—the hsm1 cluster. The green environment is staged in parallel, running a different version of the service—an hsm2 cluster. After the green environment is ready and tested, production traffic is redirected from blue to green. If problems are identified, you can roll back by reverting traffic back to the blue environment.

We discuss two blue/green approaches in this post. Approach 1 uses a load balancer to route traffic between the blue and green configurations. Approach 2 uses CloudHSM multi-cluster configuration and requires application code changes. Each has pros and cons in terms of effort and cost.

If you have already implemented a multi-cluster configuration in your application, you can follow Approach 2; otherwise, we recommend Approach 1.

A few important things to keep in mind when you implement either of these approaches.

  • You need to create the hsm2 cluster from the hsm1 backup as described in Customer managed.
  • If you need to support write operations during migration, you will need to run additional processes to make sure the data is in sync between the blue and green clusters. See Use cases to learn about different scenarios and plan accordingly.

Approach 1

For this approach, you create two separate but identical client environments. One environment (blue) runs the current application and the client SDK that connects to the hsm1 cluster. The other environment (green) runs the same application with the client SDK configured to talk to the hsm2 cluster. You then use a load balancer—such as Application Load Balancer (ALB)—to selectively route traffic between blue and green using the weighted target groups routing feature of ALB or an equivalent feature in your load balancer.

You can start by directing a small percentage of your application traffic to green. When you’re confident that green is performing well and is stable, shift traffic to green and shut down blue.

Figure 1: Blue/green migration architecture

Figure 1: Blue/green migration architecture

The following are the steps of the migration architecture shown in Figure 1:

  1. Create an hsm2 cluster from an hsm1 backup as described in Customer managed. Make sure you create the new cluster in the same Availability Zones as the existing CloudHSM cluster. This will be your green environment.
  2. Spin up new application instances in the green environment and configure them to connect to the new hsm2 cluster.
  3. Add the new client instances to a new target group for the ALB.
  4. Next, use the weighted target groups routing feature of ALB to route traffic to the newly configured environment.
    1. Each target group weight is a value from 0 to 999. Requests that match a listener rule with weighted target groups are distributed to these target groups based on their weights.
    2. For more information, see Fine-tuning blue/green deployments on application load balancer.

You can follow the canary deployment pattern to roll out an hsm2 cluster integrated application to a subset of users before making it widely available while the hsm1 integrated application serves most of the users. To start, you can configure blue target group with a weight of 90 and green with 10; the ALB will route 90 percent of the traffic to the blue target group and 10 percent to green.

Monitor applications to verify that operations to green are successful (see Monitoring). After you’re satisfied with the response from green, you can update the weights to 0 and 100 for blue and green to completely switch over to green and then shut down blue.

For alternate approaches, such as DNS weighted distribution, see Blue/Green Deployments on AWS

Approach 2

This approach uses a single application environment that talks to both the hsm1 and hsm2 clusters. To shift traffic between blue and green environments, you will use the CloudHSM multi-cluster configuration, which allows a single client SDK to communicate with two or more CloudHSM clusters. Your application code needs to be modified to communicate with both blue and green clusters. In this post, we use a JCE SDK multi-cluster configuration, shown in Figure 2 that follows.

Figure 2: Multi-cluster migration architecture

Figure 2: Multi-cluster migration architecture

The solution uses the basic blue/green deployment steps using a multi-cluster configuration and is designed for common use cases based on the type of CloudHSM operations performed during migration. We also cover how keys can be synchronized between the blue and green clusters and how to roll back.

Create an hsm2 cluster from an hsm1 backup

As described in Customer managed, create an hsm2 cluster from an hsm1 backup. Make sure you create the new cluster in the same Availability Zones as the existing CloudHSM cluster. This will be your green environment.

Modify the application to talk to both blue and green

In this step, you modify the application to use multi-cluster configuration to talk to both blue and green. When using a multi-cluster configuration, you need to configure the CloudHSM provider in the code instead of using the default config file.

In the application code, instantiate two providers: providerHsm1 pointing to blue cluster and providerHsm2 pointing to green cluster. Then update the business logic to switch traffic between blue and green using these providers.

  • Instantiate providers as shown in the following example. See Connecting to multiple clusters with CloudHSM CLI for a detailed explanation. Replace the following:
    • <hsmCAFilePath>: File path to hsm1 trust anchor certificate that you used to initialize the cluster.
    • <hsm1ClusterID>: The unique cluster ID of the hsm1 cluster.
    • <hsm2ClusterID>: The unique cluster ID of the hsm2 cluster.
    CloudHsmProviderConfig hsm1Config = CloudHsmProviderConfig.builder() 
    .withCluster( 
    CloudHsmCluster.builder() 
    .withHsmCAFilePath(<hsmCAFilePath>)
    .withClusterUniqueIdentifier("<hsm1ClusterID>")
    .withServer(CloudHsmServer.builder().withHostIP(hsm1HostName).build()) 
    .build()) 
    .build();
    CloudHsmProvider providerHsm1 = new CloudHsmProvider(hsm1Config);
    
       if (Security.getProvider(provider1.getName()) == null) {.  
                     Security.addProvider(provider1);
         }
    
    CloudHsmProviderConfig hsm2Config = CloudHsmProviderConfig.builder() 
    .withCluster( 
    CloudHsmCluster.builder() 
    .withHsmCAFilePath(<hsmCAFilePath>)
    .withClusterUniqueIdentifier("<sm2ClusterID>")
    .withServer(CloudHsmServer.builder().withHostIP(hsm2HostName).build()) 
    .build()) 
    .build();
    
    CloudHsmProvider providerHsm2 = new CloudHsmProvider(hsm2Config);
    
    if (Security.getProvider(provider2.getName()) == null) { 
                  Security.addProvider(provider2);
    }
    

  • Direct operations to blue and green using the respective providers.
    Cipher cipher1 = Cipher.getInstance("AES/GCM/NoPadding", providerHsm1);
    
    Cipher cipher2 = Cipher.getInstance("AES/GCM/NoPadding", providerHsm2);
    

Switch to green and shut down blue

Monitor the application to verify that operations on green are successful. See the Monitoring section. Once you are satisfied with response from green, you can update the application code to completely switch over to green.

Monitoring

During migration to hsm2, it’s important to monitor your application to confirm it’s working as expected and roll back if you notice increased errors. You can use your application logs and the CloudHSM client SDK logs to monitor the application.

Note: There are some known issues with hsm2 that will be fixed in future releases. See Known issues for AWS CloudHSM hsm2m.medium instances for a list of current known issues and their resolution status.

Use cases

Depending on the type of operations you perform on your CloudHSM cluster during migration, you need to run additional processes to make sure the data is in sync between the blue and green clusters. This will help avoid the split-brain scenario where blue and green clusters are in an inconsistent state if a write operation is performed during migration.

Read-only operations

During migration, if you only need to perform read operations—meaning you aren’t creating token keys—then the data between the clusters will be consistent. You can switch over to green completely following the blue/green-deployment methodology in Approach 1 or Approach 2.

Create/delete operations

If token keys need to be created during migration, the blue and green clusters need to be synchronized to make sure that read operations to the clusters are successful.

  • Write to blue: Initially, create operations can be directed to blue and read operations to both blue and green. In this case, the newly created keys need to be replicated to green. You can use the CloudHSM CLI key replicate command to synchronize keys. See Replicate keys.
  • Write to green: After you gain confidence in the read capability of the green cluster, you could begin swapping over the application to do write operations against the green cluster. In this case, if you’re still reading from both blue and green, you can replicate keys to blue using the CloudHSM CLI key replicate. See Replicate keys.

Replicate keys

Keys can be replicated between CloudHSM clusters that are created from the same backup using CloudHSM CLI with multi-cluster configuration.

Step 1: Configure multi-cluster:

Add blue and green clusters to the multi-cluster configuration. See Connecting to multiple clusters with CloudHSM CLI.

Step 2: Replicate keys from source to destination

Make sure that key owners and users that the key is shared with exist in the destination. Also, the crypto user or admin performing the operation needs to sign in to both clusters.

Run the key replicate command to replicate the keys from blue to green or vice versa as shown in the following example.

  • List keys in hsm1:
    crypto_user@cluster-<hsm1ClusterID> > key list --cluster-id cluster-<hsm1ClusterID>
    

  • List keys in hsm2:
    crypto-user@cluster-<hsm1ClusterID> > key list --cluster-id cluster-<hsm2ClusterID>
    

  • Replicate keys:
    crypto_user@cluster-<hsm1ClusterID> > key replicate \
    --filter attr.label=example-aes-2 \
    --source-cluster-id cluster-<hsm1ClusterID> \
    --destination-cluster-id cluster-<hsm2ClusterID>
    

Rollback

The complexity of a rollback will depend on the stage of the migration and what keys were created. Normally, whether it’s during the migration or after, if you aren’t using hsm2-specific features such as new key attributes, then the rollback is straightforward. During the migration, if a rollback is needed, you can point your application back toward the hsm1 cluster. Through this approach, reads and writes will revert to happening on just the hsm1 and the rollback will be complete. If you created keys in only hsm2, you can replicate them back to hsm1.

The other scenario for a rollback is if you cannot replicate keys back to the hsm1 cluster. This can happen if you have fully migrated your application to hsm2 and have created more than 3,300 keys (the limit for hsm1) or are using hsm2-specific features. In this scenario, you need to make application changes to return to a multi-cluster setup where reads are performed against both hsm1 and hsm2 clusters (in case the keys exist in only hsm2), but write operations happen solely on the hsm1. In this case, the recommendation is to continue talking to both clusters and keep them in sync until non-replicable keys are no longer needed and the cluster can be scaled back down.

Conclusion

In this post, we described strategies to migrate a hsm1.medium CloudHSM cluster to hsm2m.medium. We explored commonly used blue/green deployments and AWS CloudHSM provided options. We also explored common use cases, steps to avoid common pitfalls, and rollback options.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Roshith Alankandy

Roshith is a Security Consultant at AWS, based in Australia. He helps customers accelerate their cloud adoption journey with security, risk, and compliance guidance and specializes in cryptography. When not working, he enjoys spending time with his family and playing football.

AWS expands Spain’s ENS High certification across 174 services

Post Syndicated from Daniel Fuertes original https://aws.amazon.com/blogs/security/aws-expands-spains-ens-high-certification-across-174-services/

Amazon Web Services (AWS) has successfully renewed its Esquema Nacional de Seguridad (ENS) High certification under the latest framework established by Royal Decree 311/2022. This achievement demonstrates the continued dedication of AWS to meeting the stringent security requirements essential for serving Spanish government entities and public organizations.

The ENS framework serves as the cornerstone of cybersecurity standards for Spain’s public sector. It establishes comprehensive security requirements for government agencies, public organizations, and service providers supporting Spanish public services. The framework implements a tiered security approach, with three distinct levels (Basic, Medium, and High), each level requiring progressively stringent security measures and controls.

By maintaining and expanding our ENS certification at its High level, AWS reaffirms its commitment to providing secure cloud services that meet compliance standards and the evolving needs of Spain’s public sector and its technology partners.

For organizations working with Spanish public administration, this expanded certification offers significant advantages. Customers can operate with reliable compliance with Spain’s highest security standards while accessing a broader range of certified cloud services. This certification provides enhanced confidence in their cloud security posture and enables streamlined procurement processes for public sector projects.

With this renewal, AWS has broadened its ENS-certified portfolio. The certification now encompasses 8 additional services, bringing the total to 174 AWS ENS-certified services. This extensive coverage spans across 31 AWS Regions (including Spain), providing customers with unprecedented access to certified cloud services. Some of the additional services in scope for ENS High include the following:

  • Amazon DataZone – This data management service makes it faster and more straightforward for customers to catalog, discover, share, and govern data stored across AWS, on premises, and third-party sources.
  • AWS AppFabric – This service natively connects software as a service (SaaS) applications across organizations. It normalizes application data for administrators to set common policies.
  • AWS Resilience Hub – A central location in the AWS Console that helps customers to manage and improve the resilience posture of their applications on AWS.
  • AWS User Notifications – A centralized view of notifications from AWS services, across accounts, Regions, and services, including Amazon CloudWatch alarms or Amazon Elastic Compute Cloud (Amazon EC2) instance state changes, in a consistent, human-friendly format.

AWS achievement of the ENS High recertification is verified by an accredited company, which conducted an independent audit and confirmed that AWS continues to adhere to the confidentiality, integrity, and availability standards at the highest level as described in Royal Decree 311/2022.

For more information about ENS High, see the AWS Compliance page Esquema Nacional de Seguridad High. To view the complete list of services included in the scope, see the AWS Services in Scope by Compliance Program – Esquema Nacional de Seguridad (ENS) page. You can download the ENS High Certificate from AWS Artifact in the AWS Management Console or from Esquema Nacional de Seguridad High.

As always, we are committed to bringing new services into the scope of our ENS High program based on your architectural and regulatory needs. If you have questions about the ENS program, reach out to your AWS account team or contact AWS Compliance.

If you have feedback about this post, submit comments in the Comments section below.

Daniel Fuertes

Daniel Fuertes

Daniel is a Security Audit Program Manager at AWS based in Madrid, Spain. Daniel leads multiple security audits, attestations, and certification programs in Spain and other EMEA countries. He has twelve years of experience in security assurance and compliance, including previous experience as an auditor for the PCI DSS security framework. He also holds the CISSP, PCIP, and ISO 27001 Lead Auditor certifications.

AWS renews its AAA Pinakes rating for the Spanish financial sector

Post Syndicated from Daniel Fuertes original https://aws.amazon.com/blogs/security/aws-renews-its-aaa-pinakes-rating-for-the-spanish-financial-sector/

Amazon Web Services (AWS) has successfully revalidated its prestigious AAA rating under the Pinakes qualification system, with certification coverage extending to 174 services across 31 global AWS Regions. This achievement marks a significant milestone in the commitment of AWS to serving the Spanish financial sector with the highest security standards and assurance.

The Pinakes framework, developed by the Centro de Cooperación Interbancaria (CCI), stands as a comprehensive security rating system designed to evaluate and monitor service providers working with Spanish financial institutions. This sophisticated framework encompasses 1,315 requirements, strategically organized into four fundamental categories: confidentiality, integrity, availability of information, and general requirements.

The framework’s evaluation spans 14 domains, encompassing:

  • Information security management program
  • Third-party management
  • Normative compliance
  • Network controls
  • Access controls
  • Incident management
  • Encryption
  • Secure development
  • Continuous Monitoring
  • Antimalware protection
  • Resilience
  • Systems operation
  • Personnel security
  • Facilities security

Pinakes implements a sophisticated rating scale ranging from A+ to D, where A+ represents the highest level of cybersecurity management implementation, and D indicates compliance with minimum security requirements. Each requirement undergoes thorough evaluation by an independent third-party auditor, providing objective assessment of security measures.

The renewal of AWS A ratings across confidentiality, integrity, and availability domains, culminating in an overall AAA security rating, demonstrates our ongoing investment in meeting industry benchmarks. This achievement validates our robust security controls and underscores our dedication to protecting the interests of our Spanish financial sector customers.

This requalification reaffirms the position AWS holds as a trusted service provider and highlights our continuous commitment to maintaining and enhancing our security posture in the Spanish financial sector.

The full control matrix will be published on AWS Artifact and available on request. Pinakes participants who are AWS customers can contact their AWS account manager to request access to it.

As always, we value your feedback and questions. Reach out to the AWS Compliance team through the Contact Us page. To learn more about our other compliance and security programs, see AWS Compliance Programs.

If you have feedback about this post, submit it in the Comments section below.

Daniel Fuertes

Daniel Fuertes

Daniel is a Security Audit Program Manager at AWS based in Madrid, Spain. Daniel leads multiple security audits, attestations, and certification programs in Spain and other EMEA countries. He has twelve years of experience in security assurance and compliance, including previous experience as an auditor for the PCI DSS security framework. He also holds the CISSP, PCIP, and ISO 27001 Lead Auditor certifications.

Introducing the AWS User Guide to Governance, Risk and Compliance for Responsible AI Adoption within Financial Services Industries

Post Syndicated from Krish De original https://aws.amazon.com/blogs/security/introducing-the-aws-user-guide-to-governance-risk-and-compliance-for-responsible-ai-adoption-within-financial-services-industries/

Financial services institutions (FSIs) are increasingly adopting AI technologies to drive innovation and improve customer experiences. However, this adoption brings new governance, risk, and compliance (GRC) considerations that organizations need to address. To help FSI customers navigate these challenges, AWS is excited to announce the launch of the AWS User Guide to Governance, Risk and Compliance for Responsible AI Adoption within Financial Services Industries.

This comprehensive guide provides FSI customers practical considerations for responsible AI adoption across key dimensions including governance, risk management, compliance, data management, model management and AI agent management. It includes detailed AWS service capabilities that customers can use to address these considerations, such as Amazon Bedrock Guardrails, Amazon Bedrock Agents, Amazon SageMaker Autopilot, and Amazon SageMaker Model Monitor.

The guide is available through AWS Artifact and is complementary to other AWS resources such as the AWS Responsible Use of AI Guide, AWS Cloud Adoption Framework for AI, AWS Well-Architected Framework Generative AI Lens, and AWS Well-Architected Framework Machine Learning Lens.

As the regulatory environment and leading practices continue to evolve, we’ll provide further updates on the AWS Security Blog and AWS Compliance Center. You can also reach out to your AWS account team for help finding the resources you need.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Krish De
Krish De

Krish is a Principal FSI Governance, Risk & Compliance (GRC) specialist. He works with AWS customers, their regulators, and AWS teams to safely accelerate customers’ cloud adoption by providing prescriptive guidance on GRC. Krish has over 20 years of experience working in governance, risk, and technology across the financial services industry in Australia, New Zealand, and the United States.
Brenda Fong
Brenda Fong

Brenda is a senior FSI risk and compliance specialist. She works with AWS customers in banking, insurance, and capital markets within the ASEAN region to help them meet regulatory, governance, risk, and compliance expectations. Brenda has over 20 years of experience working in governance, risk, and technology across the financial services industry within Asia Pacific.
Kelvin Leung
Stephen Martin

Steve is the Head of Financial Services Compliance and Security for EMEA and APAC. Steve Joined AWS after working for over 20 years in financial service in senior leadership roles with responsibility across ASIA, the Middle East, and Europe. At AWS, he supports customers as they use the scale, security, and agility of AWS to transform the industry.
Kelvin Leung
Kelvin Leung

Kelvin is the AWS FSI Security and Compliance Lead based in Hong Kong. He has 20 years of experience in the IT and cloud regulatory space, with a focus on IT outsourcing, information security, and compliance. Prior to joining AWS, Kelvin worked for a financial regulator where he was responsible for technology risk policy-making and IT regulatory examinations.

Introducing the AWS Zero Trust Accelerator for Government

Post Syndicated from Derek Doerr original https://aws.amazon.com/blogs/security/introducing-the-aws-zero-trust-accelerator-for-government/

Government agencies face an unprecedented challenge when designing security against unauthorized access to IT infrastructure and data. Traditional perimeter-based security models—which rely on the assumption of trust within an organization’s network boundaries—are no longer sufficient. The wide adoption of bring-your-own-device (BYOD) and cloud-based resources requires adopting additional security measures beyond the traditional perimeter-based models. High-profile cyber incidents, such as the Global exploit of the JetBrains CVE and the compromise of federal networks by Iranian government-sponsored APT actors, highlight the limitations of traditional perimeter-based security models.

Recognizing the urgency of this challenge, the Biden administration issued Executive Order 14028, “Improving the Nation’s Cybersecurity,” in May 2021. This executive order mandates US federal agencies to adopt zero trust architectures (ZTAs) to strengthen their cybersecurity posture and protect critical infrastructure from cyber threats. Additionally, the Department of Defense (DoD) and the Cybersecurity and Infrastructure Security Agency (CISA) have published comprehensive guidance on implementing zero trust principles, including the DoD Zero Trust Strategy and the CISA Zero Trust Maturity Model. The US Office of Management and Budget (OMB) has set targets for Federal Civilian Executive Branch (FCEB) agencies to implement CISA guidance in FY2024 and FY2025, while DoD has set targets for FY2027 and beyond.

Zero trust principles focus on authorizing access to protected resources such as data, applications, and services, by continuously verifying the identity and security posture of every user, device, and transaction, regardless of network location. This approach aims to reduce the concept of implicit trust, verifying that only authorized entities gain access to sensitive resources and reducing the risks associated with unauthorized access and lateral movement within the network.

Amazon Web Services (AWS) is at the forefront of this paradigm shift, offering a government-centric suite of services and capabilities to support government agencies in their transition to a zero trust approach. The zero trust approach recommended by AWS is designed to provide a robust, scalable, and forward-looking cybersecurity strategy that aligns with government mandates and empowers agencies to secure their mission-critical resources effectively.

The AWS ZTAG: A government-centric approach

The AWS Zero Trust Accelerator for Government (ZTAG) is a government-centric set of resources to help government organizations implement zero trust architectures. ZTAG encompasses several accelerators, including:

  • Zero trust maturity assessment tools
  • Reference architectures and implementation guidance
  • Integration of AWS services and AWS Independent Software Vendor (ISV) partner solutions
  • AWS ISV reference implementations with industry-leading ISV partners
  • A streamlined procurement process through AWS Marketplace

The ZTAG assessment tools help you identify gaps in adhering to government zero trust requirements and provide tailored guidance and recommendations. This includes AWS services and AWS ISV partner solutions designed to help you achieve specific US DoD zero trust activities or CISA zero trust functions. ZTAG is initially focused on US government zero trust frameworks with applicability at the federal, state, and local levels, with adoption of international zero trust frameworks on the roadmap.

Accelerating zero trust adoption with AWS

The ZTAG approach is specifically tailored to help meet the unique requirements and challenges faced by government agencies, offering several key benefits:

  • Aligns with US DoD and CISA zero trust models and is extensible to other government or industry models as they emerge
  • Accelerates your journey to a secure and resilient IT infrastructure by helping you identify zero trust gaps and define roadmaps to achieve cybersecurity objectives
  • Starts with your existing cyber capabilities and extends them as needed with best-of-breed AWS ISV partners
  • Incremental approach to adoption enables smooth transition to a zero trust architecture
  • Dedicated expertise to assist government agencies throughout their zero trust journey

Getting started with ZTAG

To get started with their zero trust journey, government agencies can use AWS zero trust assessments, tailored to the DoD or CISA frameworks. Work with a dedicated zero trust specialist to complete an assessment of your current environment. These assessments help you identify your agency’s current zero trust maturity level, pinpoint gaps, and develop a customized roadmap aligned with your specific requirements and budgets. You can reassess your environment at any time to track progress over time.

Figure 1: Example of DoD phase maturity by pillar

Figure 1: Example of DoD phase maturity by pillar

Figure 2: Example of DoD phase activities by maturity level

Figure 2: Example of DoD phase activities by maturity level

Conclusion

The AWS Zero Trust Accelerator for Government (ZTAG) represents the commitment made by AWS to support US federal agencies in their transition to zero trust architectures. By combining the AWS Cloud infrastructure with industry-leading security solutions, ZTAG provides a government-centric and flexible approach to achieving a robust cybersecurity posture while maintaining operational agility.

Government agencies can use ZTAG to accelerate their zero trust adoption, enhance their overall security posture, and align with critical compliance requirements. Contact your AWS account team to learn more about how AWS can support your agency’s zero trust journey.

If you have feedback about this post, submit comments in the Comments section below.

Derek Doerr

Derek Doerr

Derek is a senior technology leader and Zero Trust Single-Threaded Leader for AWS US Federal, specializing in security strategy and cloud governance. With over 30 years of experience across private and public sectors, he drives strategic initiatives and maintains security culture. Outside of work, he enjoys spending time with family, cooking, scuba diving, and traveling.

Use an Amazon Bedrock powered chatbot with Amazon Security Lake to help investigate incidents

Post Syndicated from Madhunika Reddy Mikkili original https://aws.amazon.com/blogs/security/use-an-amazon-bedrock-powered-chatbot-with-amazon-security-lake-to-help-investigate-incidents/

In part 2 of this series, we showed you how to use Amazon SageMaker Studio notebooks with natural language input to assist with threat hunting. This is done by using SageMaker Studio to automatically generate and run SQL queries on Amazon Athena with Amazon Bedrock and Amazon Security Lake. The Security Lake service team and the Open Cybersecurity Schema Framework (OCSF) community continue to add additional log sources and OCSF mappings to enable Security Lake to provide a consolidated source for customers to conduct security investigation.

Because security logging data sources continually grow, organizations need to provide a mechanism for their security teams to understand and query those data sources. You might have existing investigation and response playbooks that your security teams need to be well-versed in and know when to use. It can take security teams an extended period of time to onboard and understand the available security data sources and playbooks and how to efficiently use them to reduce the mean time to respond.

In this post, we show you how to extend the functionality from the previous post. You will learn how to deploy a security chatbot with a graphical user interface (GUI) and a serverless backend powered by an Amazon Bedrock agent that incorporates existing playbooks to investigate or respond to a security event. The chatbot demonstrates purpose-built Amazon Bedrock agents that help address security concerns depending on the user’s natural language input. The solution has a single GUI that provides a direct interface with the Amazon Bedrock agent to create and invoke SQL queries or provide recommendations for internal incident response playbooks to investigate or respond to possible security events.

Security chatbot sample solution overview

Figure 1: Security chatbot sample solution architecture diagram

Figure 1: Security chatbot sample solution architecture diagram

Application flow as shown in Figure 1:

  1. User submits a query through the React UI.

    Note: The React UI used in this solution doesn’t have authentication built in. It’s recommended that you add authentication capabilities that follow your organization’s security requirements. You can add authentication capabilities by using Amazon Cognito and AWS Amplify UI.

  2. The user’s query is sent to an Amazon API Gateway REST API, which invokes the Invoke Agent AWS Lambda function.
  3. The Lambda function invokes the Amazon Bedrock agent with the user’s query.
  4. The Amazon Bedrock agent (using Anthropic’s Claude 3 Sonnet) processes the query and decides between retrieving information from the playbooks or by querying Security Lake using Amazon Athena.

For playbook knowledge base queries:

  1. The Amazon Bedrock agent queries the playbooks knowledge base and retrieves relevant results.

For Security Lake data queries:

  1. The Amazon Bedrock agent queries the schema knowledge base and retrieves the Security Lake table schemas to create an SQL query.
  2. The Amazon Bedrock agent invokes the SQL query action from the Amazon Bedrock action group, passing the SQL query as a parameter.
  3. The action group invokes the Execute SQL on Athena Lambda function, which executes the query on Athena and returns the results to the Amazon Bedrock agent.

After retrieving results from the knowledge base or action group:

  1. The Amazon Bedrock agent uses the retrieved information to formulate the final response and sends it back to the Invoke Agent Lambda function.
  2. The Lambda function sends the response back to the client using an API Gateway WebSocket API.
  3. API Gateway delivers the response to the React UI using a WebSocket connection to the client.
  4. The agent’s response is displayed to the user in the chat interface.

Prerequisites

Before deploying the sample solution, complete the following prerequisites:

  1. Enable Security Lake in your organization in AWS Organizations and specify a delegated administrator account to manage the Security Lake configuration for all member accounts in your organization. Configure Security Lake with the appropriate log sources: Amazon Virtual Private Cloud (Amazon VPC) Flow Logs, AWS Security Hub, AWS CloudTrail, and Amazon Route53.
  2. Create subscriber query access from the source Security Lake AWS account to the subscriber AWS account.
  3. Accept a resource share request in the subscriber AWS account in AWS Resource Access Manager (AWS RAM).
  4. Create a database link in AWS Lake Formation in the subscriber AWS account and grant access for the Athena tables in the Security Lake AWS account.
  5. Grant Anthropic’s Claude v3 model access for Amazon Bedrock in the AWS subscriber account where you will deploy the solution. If you try to use a model before you enable it in your AWS account, you will get an error message.

With the prerequisites in place, the sample solution architecture provisions the following resources:

  1. Amazon CloudFront with an Amazon Simple Storage Service (Amazon S3) origin.
  2. An Amazon S3 static website for the chatbot UI.
  3. An API Gateway to call a Lambda function.
  4. A Lambda function to invoke the Amazon Bedrock agent.
  5. An Amazon Bedrock agent with a knowledge base.
    1. An Amazon Bedrock agent action group to generate and invoke SQL queries on Athena.
      1. An Amazon Bedrock knowledge base to reference example Athena table schemas in Security Lake. Although the Amazon Bedrock agent can get rows directly from the Athena table, providing example table schemas improves SQL query generation accuracy for table columns in Security Lake.
      2. An Amazon Bedrock knowledge base to reference existing incident response playbooks. By incorporating this knowledge base, the Amazon Bedrock agent can suggest actions for investigation or response based on existing playbooks that have already been approved by your organization.

Cost

Before deploying the sample solution and walking through this post, it’s important to understand the cost of the AWS services being used. The cost will largely depend on the amount of data you interact with in Amazon Bedrock and by querying Security Lake with Athena.

  1. Security Lake costs are determined by the volume of log and event data ingested from AWS services. Security Lake orchestrates other AWS services on your behalf, which incur separate charges. You can find more information on pricing for the respective services: Amazon S3, AWS GlueAmazon EventBridgeAWS LambdaAmazon Simple Query Service (Amazon SQS), and Amazon Simple Notification Service (Amazon SNS).
  2. Amazon Bedrock on-demand pricing is based on the selected large language model (LLM) and the number of input and output tokens. A token is comprised of a few characters and refers to the basic unit of text that a model learns to understand the user input and prompts. For more details, see Amazon Bedrock pricing.
  3. The SQL queries generated by Amazon Bedrock are invoked using Athena. Athena cost is based on the amount of data scanned within Security Lake for that query. For more details, see Athena pricing.

Deploy the sample chatbot

You can deploy the sample solution by using AWS Cloud Development Kit (AWS CDK). For instructions and more information on using the AWS CDK, see Get Started with AWS CDK.

  1. Clone the sample-generative-ai-chatbot-for-amazon-security-lake repository.
  2. Navigate to the project’s root folder.
  3. Install the project dependencies.
  4. Build and deploy the app using the following commands:
    npm install -g aws-cdk
    npm install 
    cdk synth
    

  5. Run the following commands in your terminal while signed in to your subscriber AWS account. Replace <INSERT_AWS_ACCOUNT> with your account number and replace <INSERT_REGION> with the AWS Region that you want the solution deployed to.
    cdk bootstrap aws://<INSERT_AWS_ACCOUNT>/<INSERT_REGION>
    cdk deploy –all
    

As part of the CDK deployment, there is an Output value for the React Application URL (FrontendAppStack.ReactAppUrl). You will use this value to interact with the GenAI application. Wait up to 5 mins for the URL to be live.

Post-deployment configuration steps

Now that you’ve deployed the solution, you need to add permissions to allow the Lambda function’s AWS Identity and Access Management (IAM) role and Amazon Bedrock to interact with your Security Lake data.

Grant permission to the Security Lake database

  1. Copy the Lambda’s role ARN from the “BedrockAppStack” CloudFormation stack. The resource in the stack is named “athenaAgentSecurityLakeActionGroupLambdaServiceRole********”.
  2. Go to the Lake Formation console.
  3. Select the amazon_security_lake_glue_db_<YOUR-REGION> database. For example, if your Security Lake is in us-east-1, the value would be amazon_security_lake_glue_db_us_east_1
  4. For Actions, select Grant.
  5. In Grant Data Permissions, select SAML Users and Groups.
  6. Paste the Lambda function IAM role ARN from Step 1.
  7. In Database Permissions, select Describe, and then choose Grant.

Grant permission to Security Lake tables

You must repeat the following steps for each source configured within Security Lake. For example, if you have four sources configured within Security Lake, you must grant permissions for the Lambda function IAM role to each table. If you have multiple sources that are in separate Regions and you don’t have a rollup Region configured in Security Lake, you must repeat the steps for each source in each Region.

The following example grants permissions to the Security Hub table within Security Lake. For more information about granting table permissions, see the AWS Lake Formation user guide.

  1. Copy the Lambda’s role ARN from the “BedrockAppStack” CloudFormation stack. The resource in the stack is named as “athenaAgentSecurityLakeActionGroupLambdaServiceRole********”.
  2. Go to the Lake Formation console.
  3. Select the amazon_security_lake_glue_db_<YOUR-REGION> database.
    For example, if your Security Lake database is in us-east-1, the value would be amazon_security_lake_glue_db_us_east-1
  4. Choose View Tables.
  5. Select the amazon_security_lake_table_<YOUR-REGION>_sh_findings_1_0 table.
    For example, if your Security Lake table is in us-east-1, the value would be amazon_security_lake_table_us_east_1_sh_findings_1_0

    Note: Each table must be granted access individually. Selecting All Tables won’t grant the access needed to query Security Lake.

  6. For Actions, select Grant.
  7. In Grant Data Permissions, select SAML Users and Groups.
  8. Paste the Lambda function IAM role ARN from Step 1.
  9. In Table Permissions, select Describe, and then Grant.

Sync data sources

After you deploy the infrastructure, you need to sync the data sources in the Amazon Bedrock knowledge bases so that the data in Amazon S3 can be vectorized and made available in Amazon OpenSearch Serverless, which is the service used as a vector source by the knowledge bases in this solution.

  • In the Amazon Bedrock console, select Knowledge base and find the two Amazon Bedrock knowledge bases deployed in this solution: gen-ai-sec-lake-table-schema and gen-ai-sec-lake-runbooks. Navigate to each knowledge base and its data source. Then choose Sync for each data source.

Get the CloudFront distribution URL

As part of the sample solution, the chatbot uses an externally available CloudFront distribution URL. It’s recommended that you add appropriate security controls that align to your organization’s security requirements to the sample solution. For example, you might want to add authentication to CloudFront using Amazon Cognito and Lambda@Edge to help prevent unauthorized users from accessing this chatbot. You can also configure secure access and restrict access to the content.

  1. Navigate to CloudFormation in the console.
  2. In the Stacks section, select the FrontendAppStack.
  3. Select the Outputs tab.
  4. Copy the value ReactAppUrl.

Investigate with your security chatbot

Now that you’ve deployed the sample solution and configured the appropriate permissions, you’re ready to use natural language input to generate and invoke SQL queries and to recommend internal incident response playbooks.

Generate and invoke SQL queries

Although most security professionals already know how to create and invoke SQL queries, time is extremely important during a possible security event. Adding natural-language-to-SQL capability to the security agent can help reduce your mean time to respond, which can reduce the overall impact of a security event. Furthermore, you can use these same techniques for threat hunting in your AWS environment.

Question 1: Which AWS accounts have the most security findings for EC2 instances in the last 7 days?

Answer 1: The query results show the top two AWS accounts with the most security findings related to EC2 instances in the last 7 days. The account with ID XXXXXX10 had the highest number of findings at 252, followed by account XXXXXX28 with 197 findings.

The query covered the key requirements, so I provided the final answer summarizing the top accounts and their finding counts.

The question and response are shown in Figure 2.

Figure 2: Response for most EC2 security findings

Figure 2: Response for most EC2 security findings

The more context you provide to the chatbot, the better the query that will be generated. Providing a time window also gives a smaller data set to review and give quicker results.

Question 2: In those AWS accounts, what were the top EC2 security finding types?

Answer 2: Based on the query results, the top EC2 security finding types in the AWS accounts with the most EC2 instance findings in the last 7 days were:

For account XXXXXX10:

  1. “Compliance Finding: Update” with 252 findings

For account XXXXXX28:

  1. “Compliance Finding Update” with 166 findings
  2. “Compliance Finding Create” with 24 findings
  3. “Vulnerability Finding: Update” with 2 findings
  4. “Vulnerability Finding: Create” with 2 findings.

The question and response are shown in Figure 3.

Figure 3: Response for top EC2 security finding types

Figure 3: Response for top EC2 security finding types

The results show that the issues are primarily around Security Hub controls mapped to the type_name OCSF column. If you want to look at a different column, you can provide additional instructions or update the question.

Question 3: Which EC2 instances had vulnerability security findings?

Answer 3: Based on the query results, the following EC2 instances had vulnerability security findings in the AWS account with the most EC2 instance findings in the last 7 days:

In account XXXXXX28:

  1. arn:aws:ec2:us-east-1: XXXXXX28:instance/<instance-id>
  2. arn:aws:ec2:us-east-1: XXXXXX28:instance/<instance-id>

There were no vulnerability findings for EC2 instances in account XXXXXX10 during this time period.

The question and response are shown in Figure 4.

Figure 4: Response for EC2 instances with vulnerability security findings

Figure 4: Response for EC2 instances with vulnerability security findings

In this question, the search is narrowed down to two Amazon Elastic Compute Cloud (Amazon EC2) instances within a specific AWS account to take a deeper look into.

Recommend incident response playbooks

As part of this sample solution, we used the AWS Incident Response Playbook Samples GitHub repository and uploaded them to an S3 bucket. Alternatively, you can create or reference your own playbooks to investigate and respond to various security events to refer to. After each question, the chatbot provides an answer along with an Amazon S3 URL to the playbook. The security analyst must have the appropriate permissions to download the playbook from Amazon S3.

Question 1: Do I have any playbooks that would help investigate a compromised EC2 instance for crypto mining or malware?

Answer 1: Yes, the search results include a relevant playbook for investigating a potentially compromised EC2 instance that could be used for crypto mining, malware, or ransomware. The key steps outlined in the playbook are:

  1. Acquire, preserve, and document evidence – Identify indicators of compromise, determine attack vector, document impacted applications/users
  2. Contain the incident – Isolate the compromised instance if possible without service disruption, remove from auto-scaling group if applicable
  3. Eradicate the incident – Follow the steps to remove the malware/crypto miner, restore from known good backups/AMIs
  4. Recovery – Spin up new instances from trusted AMIs, restore the data, and monitor for persistence of the threat
  5. Lessons learned – Analyze the incident, identify root causes, and implement preventative measures

The playbook provides detailed guidance on each of these phases in the context of an EC2 ransomware incident, but the overall process can be adapted for investigating crypto mining, malware, or other threats on EC2 as well.

The question and response are shown in Figure 5.

Figure 5: Response for EC2 IR playbooks

Figure 5: Response for EC2 IR playbooks

The previous threat-hunting exercise identified two EC2 instances that had vulnerabilities present. After further investigation, if a security analyst believes the EC2 instance is compromised, they can ask for an IR playbook to respond to the security event. Once they determine which playbook makes the most sense, they can either click on the S3 URL or look at other playbooks to use.

Question 2: Do I have any playbooks that would help respond to ransomware in my S3 bucket?

Answer 2: Yes, the search results include a playbook that can help respond to a ransomware incident involving an S3 bucket. The “Incident Response Playbook Template for Unintended Access to an Amazon S3 Bucket” outlines steps for:

  1. Gathering evidence of the unintended access
  2. Containing the incident by restricting access to the affected bucket(s)
  3. Eradicating the issue by removing any malicious data/objects and restoring from backup
  4. Recovering by re-enablement access to the bucket for legitimate users
  5. Conducting post-incident activities like reviewing logs, determining root cause, and implementing preventative measures

While focused on generic unintended access rather than ransomware specifically, many of these steps could be adapted and applied to a ransomware scenario targeting data in S3. The playbook provides a solid framework for incident response on S3.

Additionally, the general “Ransomware Incident Response Playbook” template covers common steps for any ransomware incident, such as:

  1. Acquiring and preserving evidence of the ransomware infection
  2. Containing the incident by isolating affected systems
  3. Identifying and remove the ransomware
  4. Recovering data from backups or known good sources
  5. Conducting post-incident review

While not S3-specific, this playbook provides complementary guidance that could be used in conjunction with the S3 access playbook for an S3 ransomware incident.

The question and response are shown in Figure 6.

Figure 6: Response for S3 ransomware IR playbooks

Figure 6: Response for S3 ransomware IR playbooks

The response provides a high-level overview of the specific playbook that’s being referenced to provide the analyst with context before diving deeper into any single playbook.

Adding functionality to the security chatbot

This sample solution was developed to show the art of the possible. Each customer uses AWS resources to address their business needs in their own way, and security teams must be appropriately equipped to help secure their respective environments. Here are some possible enhancements that you can incorporate into the sample solution to align to your security use-cases and needs.

  1. Incorporate an Amazon DynamoDB table to use as part of reporting interactions tied to a specific event or finding GUID. By incorporating an audit trail, you can tie actions taken by the agent and associated resources to a security event and validate the outcome of the investigation before taking action.
  2. Tuning the backend chatbot agent to query Amazon Linux Security Center’s Common Vulnerabilities and Exposures (CVE) list or MITRE’s CVE list to see which AWS resources might be in scope and send out consolidated messages to resource owners with recommended actions.
  3. Tuning the backend chatbot agent to take natural language requests and respond with detectors or correlation rules for Amazon OpenSearch or query language for custom detections in your security information and event management (SIEM) tool.
  4. Adding a new data source to Athena, such as AWS Config, to provide the analyst with additional capabilities to query AWS resource configuration across the AWS environment that might have been impacted by a security event. For example, if a security finding shows that an S3 bucket has been made public, querying what and when other configuration changes were made to the S3 bucket.
  5. Incorporating multi-agent-orchestration to scale the use of multiple Amazon Bedrock agents that can be tuned towards niche security use cases by respective teams. The chatbot can speak directly to a classifier or controller, which then addresses the user’s natural language request and orchestrates across one or more agents to generate a response. For example, if a user asks which EC2 instances might have been impacted by a security event and which playbook to use to respond, the classifier agent could direct the initial query to the agent in this sample solution. In the same chat window, the analyst could ask if there are any open CVEs for the EC2 instances in scope to get a list of CVEs to address within the AWS account.
  6. For long running Athena queries, you can incorporate an AWS Step Function to the workflow and incorporate a task token to wait for the Athena results to return.

Clean up

If you deployed the security chatbot sample solution by using the Launch Stack button and the console with the CloudFormation template security_genai_chatbot_cfn, do the following to clean up:

  1. In the CloudFormation console for the account and Region where you deployed the solution, choose the SecurityGenAIChatbot stack.
  2. Choose the option to Delete the stack.

If you deployed the solution by using the AWS CDK, run the command cdk destroy --all.

Conclusion

The sample solution demonstrates how you can use task-oriented Amazon Bedrock agents and natural language input to help accelerate investigation and analysis and increase your overall security posture. We provided an example of a sample solution with a user interface that is powered by an Amazon Bedrock agent, which you can extend to add additional task-oriented agents, each with their own instructions, knowledge bases, and models. By extending the use of AI-powered agents, you can help your security team operate more efficiently across multiple security domains within your AWS environment.

The backend for the chatbot to investigate security events uses Security Lake, which normalizes data into Open Cybersecurity Schema Framework (OCSF); as long as the data schema is normalized, the solution can be applied to other data lakes within your AWS environment.

To learn more, see the other posts in this series:

Use the comments section to provide feedback. If you have questions about this post, start a new thread on the Generative AI on AWS re:Post or contact AWS Support.

Madhunika-Reddy-Mikkili

Madhunika Reddy Mikkili

Madhunika is a Data and Machine Learning Engineer at AWS. She is passionate about helping customers achieve their goals using data analytics and machine learning.

Author

Jonathan Nguyen

Jonathan is a Principal Security Solution Architect at AWS. He helps large financial services customers develop a comprehensive security strategy and solutions to meet their security and compliance requirements in AWS.

Harsh Asnani

Harsh Asnani

Harsh is a Machine Learning Engineer at AWS specializing in ML theory, MLOPs, and production generative AI frameworks. His background is in applied data science with a focus on operationalizing AI workloads in the cloud at scale.

Michael Massey

Michael Massey

Michael is a Cloud Application Architect at AWS, where he specializes in building frontend and backend cloud-centered applications. He designs and implements scalable and highly-available solutions and architectures that help customers achieve their business goals.t

How to use AWS Transfer Family and GuardDuty for malware protection

Post Syndicated from James Abbott original https://aws.amazon.com/blogs/security/how-to-use-aws-transfer-family-and-guardduty-for-malware-protection/

Organizations often need to securely share files with external parties over the internet. Allowing public access to a file transfer server exposes the organization to potential threats, such as malware-infected files uploaded by threat actors or inadvertently by genuine users. To mitigate this risk, companies can take steps to help make sure that files received through public channels are scanned for malware before processing.

This post demonstrates how to use AWS Transfer Family and Amazon GuardDuty to scan files uploaded through a secure FTP (SFTP) server for malware as part of an overall transfer workflow. For readers who might have read an earlier blog post on this topic, the key difference is that this solution is fully managed and doesn’t require the deployment of compute resources. GuardDuty automatically updates malware signatures every 15 minutes instead of using a container image for scanning, avoiding the need for manual patching to keep the signatures up to date.

Prerequisites

To deploy the solution in this post, you will need:

  • An AWS account: You need access to AWS to deploy this solution. If you don’t have an account that you can use, see Start building on AWS today.
  • AWS CLI: Install and configure the AWS Command Line Interface (AWS CLI) to be authenticated to your AWS account. Set up the environment variables for your AWS account using the access token and secret access key for your environment.
  • Git: You will use Git to pull down the example code from GitHub.
  • Terraform: You’ll use Terraform to run the automation. Follow the Terraform installation instructions to download and set up Terraform.

Solution overview

This solution uses Transfer Family and GuardDuty. Transfer Family provides a secure file transfer service that you can use to set up an SFTP server, and GuardDuty is an intelligent threat detection service. GuardDuty monitors for malicious activity and anomalous behavior to protect AWS accounts, workloads, and data. At a high level, the solution uses the following steps:

  • A user uploads a file through a Transfer Family SFTP server.
  • A Transfer Family managed workflow invokes AWS Lambda to execute an AWS Step Functions workflow.
    • The workflow begins only after a successful file upload.
    • Partial uploads to the SFTP server will invoke an error handling Lambda function to report a partial upload error.
  • A step function state machine invokes a Lambda function to move uploaded files to an Amazon Simple Storage Service (Amazon S3) bucket for processing and then starts scanning using GuardDuty.
  • The GuardDuty scan result is sent as a callback to the step function.
  • Infected files are moved or cleaned.
  • The workflow sends the user the results through an Amazon Simple Notification Service (Amazon SNS) topic. This can be a notification of an error or malicious upload during the scan or notification of a successful upload and a clean scan for further processing.

Solution architecture and walkthrough

The solution uses GuardDuty Malware Protection for S3 to scan newly uploaded objects to the S3 bucket. You can use this feature of GuardDuty to set up a malware protection plan for an S3 bucket at the bucket level or to watch for specific object prefixes.

Figure 1: Solution architecture

Figure 1: Solution architecture

The following steps (shown in Figure 1) describe the workflow for this solution starting from the point the file is uploaded until it’s scanned and marked as safe or as infected, leading to subsequent steps that can be customized based on your use case.

  1. A file is uploaded using the SFTP protocol through Transfer Family.
  2. If the file is successfully uploaded, Transfer Family uploads the file to the S3 bucket called Unscanned and the Managed Workflow Complete workflow is triggered. This is the workflow used to handle successful uploads and invokes the Step Function Invoker Lambda function.
  3. The Step Function Invoker starts the state machine and kicks off the first step in the process by invoking the GuardDuty – Scan Lambda function.
  4. The GuardDuty – Scan function moves the file to the Processing bucket. This is the bucket from which the files will be scanned.
  5. When an object upload activity is detected, GuardDuty automatically scans the object. In this implementation, a malware protection plan is created for the Processing bucket.
  6. When a scan completes, GuardDuty publishes the scan result to Amazon EventBridge.
  7. An EventBridge rule has been created to invoke a Lambda Callback function whenever a scan event has completed. EventBridge will invoke the function with an event that contains the scan results. See Monitoring S3 object scans with Amazon EventBridge for an example.
  8. The Lambda Callback function notifies the GuardDuty – Scan task using the callback task integration pattern. The results of the GuardDuty scan are returned to the GuardDuty – Scan function and these results are passed to the Move File task.
  9. If the result is a clean scan with no threats detected, the Move File task will place the file in the Clean S3 bucket, indicating that the file is successfully scanned and safe for further processing.
  10. At this point, the Move File function publishes a notification to the Success SNS topic to notify the subscribers.
  11. If the result indicates that the file is malicious, the Move File function will instead move the file to the Quarantine S3 bucket for further investigation. The function will also delete the file from the Processing bucket and publish a notification in the Error topic in SNS to notify the user of a potential malicious file being uploaded.
  12. If the file upload is unsuccessful and the file isn’t fully uploaded, then Transfer Family will trigger the Managed Workflow Partial workflow.
  13. Managed Workflow Partial is an error handling workflow and invokes the Error Publisher function, which is used for reporting errors that occur anywhere in the workflow.
  14. The Error Publisher function identifies the type of error—whether it’s because of the partial upload or an issue elsewhere in the workflow—and sets the error status accordingly. It will then publish an error message to Error Topic in SNS.
  15. The GuardDuty – Scan task has a timeout to make sure that an event is published to Error Topic to prompt a manual intervention to investigate further if the file isn’t successfully scanned. If the GuardDuty – Scan task fails, the Error clean up Lambda function is invoked.

Finally, there’s an S3 Lifecycle policy attached to the Processing bucket. This is to make sure that no file is left in the Processing bucket for more than one day.

Code repository

The GitHub AWS-samples repository has a sample implementation developed using Terraform and Python-based Lambda functions to implement this solution. The same solution can also be implemented using AWS CloudFormation. The code has the components needed to deploy the entire workflow to demonstrate the abilities of Transfer Family and the GuardDuty malware protection plan.

Install the solution

Use the following steps to deploy this solution to your test environment.

  1. Clone the repository to your working directory using Git.
  2. Navigate to the root directory of your cloned project directory.
  3. Update the terraform locals.tf file with the values of your choice for the S3 bucket names, SFTP server names, and other variables.
  4. Run terraform plan.
  5. If everything looks good, run a terraform apply and enter yes to create the resources.

Clean up

After testing and exploring the solution, it’s important to clean up the resources you created to avoid incurring unnecessary costs. To delete the resources created by this solution, navigate to the root directory of your cloned project and run the following command:

terraform destroy

This command will remove the resources created by Terraform, including the SFTP server, S3 buckets, Lambda functions, and other components. Confirm the deletion by entering yes when prompted.

Conclusion

By using the approach outlined in the post, you can make sure that the files received over SFTP and uploaded to your S3 bucket are scanned for threats and are safe for further processing. The solution reduces the exposure surface by making sure that public uploads are scanned in a safe environment before they’re sent to other components of your system.

If you have feedback about this post, submit comments in the Comments section below.

James Abbott

James Abbott

James is a Principal Solutions Architect at AWS, working in Global Financial Services. When not in the office he enjoys mountain biking in North Carolina.

Santhosh Srinivasan

Santhosh Srinivasan

Santhosh is a Sr. Cloud Application Architect with the Professional Services team at AWS. He specializes in building and modernizing large scale enterprise applications in the cloud with a focus on the financial services industry.

Suhas Pasricha

Suhas Pasricha

Suhas is a Cloud Infrastructure Architect in the AWS Professional Services team. He has a background in web development and infrastructure automation. At Amazon, he has been helping customers set up and operate an enterprise-wide landing zone and cloud environment. In his spare time, he likes to read and play video games.

How to import existing AWS Organizations SCPs and RCPs to CloudFormation

Post Syndicated from Swara Gandhi original https://aws.amazon.com/blogs/security/how-to-import-existing-aws-organizations-scps-and-rcps-to-cloudformation/

Many AWS Organizations customers begin by creating and manually applying service control policies (SCPs) and resource control policies (RCPs) through the AWS Management Console or AWS Command Line Interface (AWS CLI) when they first set up their environments. However, as the organization grows and the number of policies increases, this manual approach can become cumbersome. It can result in limited visibility into all implemented SCPs and RCPs, the targets they’re attached to (such as accounts, organizational units (OUs), or nested OUs), and the ability to manage updates effectively. Without clear visibility and proper access controls, it becomes challenging to track who’s making changes and how they are being made.

Importing existing SCPs and RCPs into AWS CloudFormation can help streamline the management of your policies by enabling history tracking, policy validation through CloudFormation Hooks, and rollback capabilities. You can also sync stacks with source code stored in a Git repository with Git sync. With Git sync, you can use pull requests and version tracking to configure, deploy, and update your CloudFormation stacks from a centralized location. When you commit changes to the template or the deployment file, CloudFormation automatically updates the stack.

In this post, I provide a solution to import existing SCPs and RCPs into AWS CloudFormation templates using the CloudFormation infrastructure as code generator (IaC generator). By using the IaC generator, you can automate the management of your SCPs and RCPs at scale.

Important: Only existing policies are brought into CloudFormation; policies are not recreated.

Solution overview

The solution in this post includes a command line tool for discovering SCPs and RCPs in your organization and automating policy import into CloudFormation templates. The following figure shows the end-to-end flow:

Figure 1: Solution overview

Figure 1: Solution overview

The end-to-end flow shown in the preceding figure includes:

  1. Run the tool: The tool automates the following steps and can be run in the management account or delegated administrator account.
    1. Identify SCPs and RCPs in the organization: The tool begins by making API calls to the Organizations service to retrieve the policies in your environment. It then provides a count of the total number of SCPs and RCPs present.
    2. Identify AWS Control Tower SCPs and RCPs and policies without targets:
      1. AWS Control Tower SCPs and RCPs: The tool checks for SCPs and RCPs created by AWS Control Tower and lists them in the output for your review.
        • SCPs are identified by the aws-guardrails- prefix in their policy names.
        • RCPs are identified by the AWSControlTower-Controls- prefix in their policy names.
      2. Policies with no targets: The tool also identifies SCPs and RCPs that aren’t attached to an organizational unit (OU), account, or root and lists them. These policies might be redundant or need reassignment.
    3. CloudFormation IaC generator scan: At this stage, you will be prompted to confirm whether you want to import the policies to the CloudFormation templates using CloudFormation resource scan. If you select yes, the tool will initiate a CloudFormation resource scan using IaC generator to get details about the policies including policy name, targets, policy tags, and so on.
    4. Create template from scanned policy resources: The tool generates CloudFormation template with the policy resources. The template will include the policies without targets (if any).
  2. Review process: After the template is generated, it’s recommended that you preview the template using IaC generator from the CloudFormation console. We recommend viewing the template resource section to verify and adjust the generated templates as needed (step 11 of the solution deployment).
  3. Create CloudFormation stacks with the generated templates: After reviewing the templates, import them into CloudFormation stacks for deployment. It’s important to note that only existing policies are brought into CloudFormation—policies aren’t recreated. The templates reflect the current policies and policy attributes.

Consideration before implementing the solution

There are some considerations that you need to keep in mind before implementing the solution.

  • If you have enabled the AWS Organizations policy management delegation, you should run this solution from the delegated administrator account. Otherwise, you can run the solution using the management account.

    Note: Delegating management of organizations policies to a delegated administrator member account is recommended best practice.

  • AWS Control Tower SCPs and RCPs (with or without targets) won’t be imported to the CloudFormation templates because they should be managed using AWS Control Tower. Changes made to AWS Control Tower resources outside of AWS Control Tower can cause drift and affect AWS Control Tower functionality in unpredictable ways.
  • FullAWSAccess SCP and RCPFullAWSAccess RCP are AWS managed policies that won’t be imported to CloudFormation because CloudFormation stacks do not allow importing AWS managed resources.
  • You might see multiple CloudFormation templates created if you exceed the CloudFormation template size quotas. To help ensure smooth creation, the tool is designed to automatically split the content into multiple templates if necessary, allowing you to stay within the quotas while still accommodating the imported content.
  • Note that the generated templates have the following attributes set by default.
    • Deletion policy: Set to Retain. This enables persisting the policies even when their related stack is deleted.
    • Update Replace policy: Set to Delete. This enables deletion of the physical ID associated with the policy when the policy is updated.

Solution deployment

Now that you understand the solution and know the considerations to keep in mind, you’re ready to deploy the solution.

  1. Clone the solution repository.
    git clone https://github.com/aws-samples/sample-tool-for-importing-existing-AWS-SCPs-and-RCPs
    

  2. Navigate to the directory of the cloned repository.
    cd sample-tool-for-importing-existing-AWS-SCPs-and-RCPs/
    

  3. Install the solution (Python 3.10+ is supported)
    pip install .
    

  4. If you want to use a particular AWS Identity and Access Management (IAM) principal to run this tool, create a profile in ~./aws/config using an IAM principal from your AWS Organizations management account. If you have delegated policy management to a member account, make sure that you use the IAM principal from the delegated administrator account

    Note: The IAM principal will need to have the following permissions to be able to successfully run the tool.

    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "VisualEditor0",
                "Effect": "Allow",
                "Action": [
                    "organizations:Describe*",
                    "organizations:List*",
                    "cloudformation:Describe*",
                    "cloudformation:Detect*",
                    "cloudformation:Estimate*",
                    "cloudformation:Get*",
                    "cloudformation:List*",
                    "cloudformation:CreateGeneratedTemplate",
                    "cloudformation:StartResourceScan",
                    "cloudformation:UpdateGeneratedTemplate",
                    "cloudformation:CreateStack",
                    "cloudformation:DeleteStack",
                    "cloudformation:UpdateStack",
                    "cloudformation:DeleteGeneratedTemplate"
                ],
                "Resource": "*"
            }
        ]
    }
    

  5. You can run the tool specifying a profile name as a command line argument. Use the following command, replacing <profile_name> with the name of the profile you created in step 4. If you do not specify a profile, the default profile from the file ~./aws/config will be used.
    policy-importer --profile <profile_name>
    

  6. After the preceding command is executed, you will see an output displaying the total number of SCPs and RCPs found in the organization. The output will also list AWS Control Tower managed policies as INFO, in addition to policies without targets as a WARNING. At this point, you can enter Yes to proceed with scanning to import the policies, or enter No if you want to exit.

    Note: If policies without targets are detected, we recommend stopping at this point. Either delete the policies without targets or assign appropriate targets to them. You can then rerun the tool from step 5. If you proceed without addressing the policies without targets, be aware that these policies will also be included in the CloudFormation template.

    Figure 2: Terminal view with policies identified in the organization

    Figure 2: Terminal view with policies identified in the organization

  7. If you enter Yes, the CloudFormation IaC resource scan will begin immediately. You will see the resource scan ID Amazon Resource Name (ARN) displayed.

    Note: A scan can take up to 10 minutes for every 1,000 resources.

    Figure 3: Terminal view with AWS CloudFormation resource scan details

    Figure 3: Terminal view with AWS CloudFormation resource scan details

  8. You can also review the scan progress from the IaC generator page of the CloudFormation console as shown in the following figure. To get to the IaC generator page, go to the CloudFormation console and choose IaC generator from the navigation pane.
    Figure 4: View the scan summary in the CloudFormation console to track progress

    Figure 4: View the scan summary in the CloudFormation console to track progress

  9. Upon completion of the scan, the template generation process will be initiated.
    Figure 5: Terminal view showing CloudFormation template being created

    Figure 5: Terminal view showing CloudFormation template being created

  10. After the template creation is finished, sign in to the AWS CloudFormation IaC console. Choose the Templates tab to review the generated templates and verify that they align with your requirements.
    Figure 6: View CloudFormation templates in the console

    Figure 6: View CloudFormation templates in the console

  11. You can review the policies added to a template by selecting a template name.
    Figure 7: Review policies included in the template

    Figure 7: Review policies included in the template

  12. When satisfied, you can proceed to import the templates into CloudFormation stacks for deployment by selecting Import to stack.
    Figure 8: Import a template into the CloudFormation stack

    Figure 8: Import a template into the CloudFormation stack

  13. Follow the prompts to create a stack.
    Figure 9: AWS CloudFormation stack example

    Figure 9: AWS CloudFormation stack example

  14. The tool automatically creates a folder named Policies in your current directory and downloads the generated templates.

    Note: If you encounter errors, check the recommended solutions for common issues.

As shown in the following figure, there are two recommended next steps.

Figure 10: Solution overview including recommended next steps

Figure 10: Solution overview including recommended next steps

After the existing policies are imported into a CloudFormation stack, we recommend storing your CloudFormation templates in a private Git repository. You can use the Policies folder that was automatically created by the tool in the current local directory with the generated templates downloaded and set up a continuous integration and delivery (CI/CD) pipeline to efficiently manage the imported policies.

By using a Git repository, you can use version control features like pull requests, branch management, and history tracking. This approach allows your team to efficiently review, update, and deploy policies with better collaboration and control. You can also create a CI/CD pipeline to automate the deployment of changes to your CloudFormation stacks, helping to ensure that updates are consistent and reliable.

We also recommend incorporating CloudFormation Hooks in your environment. CloudFormation Hooks can help validate policies (and other resources) against best practices, to help ensure that they adhere to the correct syntax, follow security best practices, and minimize potential vulnerabilities.

Related resources:

Conclusion

Importing existing AWS Organizations service control policies (SCPs) and resource control policies (RCPs) into CloudFormation provides an efficient and scalable approach to managing and automating your AWS governance. After they’ve been imported, you can manage and update policies directly in CloudFormation, to help ensure consistency and version control across your organization. The tool also creates a Policies folder in your current directory, storing downloaded templates for use as a central repository and integration with a CI/CD pipeline.

By using CloudFormation Hooks, you can further improve your policy management by validating SCPs and RCPs against best practices and policy grammar. This approach centralizes your policy updates, making governance more automated and efficient while reducing the risk of misconfiguration.

Swara Gandhi

Swara Gandhi

Swara is a Senior Solutions Architect on the AWS Identity Solutions team. She works on building secure and scalable end-to-end identity solutions. She is passionate about everything identity, security, and cloud.

AWS empowers global security culture at Wicked6 Cyber Games

Post Syndicated from Anne Grahn original https://aws.amazon.com/blogs/security/aws-empowers-global-security-culture-at-wicked6-cyber-games/

Wicked6 Cyber Games 2025 brought hundreds of women together worldwide from March 28–30. This dynamic virtual competition, sponsored by Amazon Web Services (AWS), helped attendees tackle real-world cybersecurity challenges through e-sports experiences. With 72 hours of women talking about cybersecurity, 11 cybersecurity games, and an attack and defense tournament streamed live, the weekend-long event highlighted the value of immersive learning while investing in the next generation of cybersecurity leaders.

Now in its sixth year, Wicked6 has established itself as more than just a competition—it’s become a cornerstone in building a collaborative security community. The Women’s Society of Cyberjutsu, a national 501(c)(3) nonprofit that promotes training, mentoring, and advancement of women and girls in cybersecurity careers, has co-hosted the event since its inception. This year’s theme was leveling up, and the virtual format enabled unprecedented global participation with 31 speakers and over 500 participants of all skill levels from 48 countries.

Keynotes and sessions

The event kicked off with an upbeat introduction from Wicked6 emcee Kristin Demoranville, founder and CEO of AnzenSage, Jessica Gulick, Executive Director of Wicked6 and founder of Cyber Esports Foundation, and Mari Galloway, CEO of Women’s Society of Cyberjutsu. The trio emphasized the importance of programs such as Wicked6 that provide women with space and opportunities to learn and grow, strengthen our confidence, and celebrate each other’s contributions to the cybersecurity community.

Keynotes featuring speakers from Africa, Australia, Japan, Saudi Arabia, and the US resonated with the multinational participants. Topics ranged from hacking and protecting AI in the age of large language models (LLMs) to drawing inspiration from science fiction novels, with an eye toward boosting skills.

In his introduction to keynote speaker Anna Collard, SVP of Content Strategy and Evangelist at KnowBe4 Africa, Hart Rossman, Global Security Services Vice President at AWS, noted the positive impact of time invested by Wicked6 participants and supporters. He pointed out that the opportunity the event provides to build relationships and practice both soft skills and technical skills is a great example of what it means to build strong security culture.

“At AWS, we recognize that security is a team sport. It’s about building community and raising the bar together, so we can overcome determined adversaries and make all of our customers, colleagues, and communities safer.” —Hart Rossman, Global Security Services Vice President at AWS

Technical sessions included a presentation focused on safeguarding Amazon Simple Storage Service (Amazon S3) buckets by two AWS women in security, Customer Incident Response Team (CIRT) Responder Jennifer Paz and Worldwide Specialist Security Solutions Architect Shahna Campbell. Paz and Campbell detailed an unusual increase of data encryption events in S3 buckets that used an encryption method known as server-side encryption using client-provided keys (SSE-C). This activity, which was recently detected by the AWS CIRT team and its automated security monitoring systems, has been attributed to malicious actors who obtained valid customer credentials and were using them to re-encrypt objects. Paz and Campbell demonstrated how collective security awareness and best practices can help prevent unauthorized access to S3 buckets and protect against ransomware events that abuse stolen credentials. Details of their investigation and prescriptive guidance for helping to prevent unintended encryption of Amazon S3 objects are available in a related AWS Security Blog post.

Gamified learning

A security-focused AWS Jam was integrated into Wicked6 for a unique, gamified learning experience. With AWS Jam, individuals and teams compete to solve a series of technical challenges in a lab-based cloud infrastructure that enhances practical understanding of AWS services and best practices. Additionally, Wicked6 participants had access to 11 different cybergame services, including Hack The Box, Haiku, InspireTech, and MetaCTF, fostering a collaborative learning environment where security practitioners at all levels could grow together.

An AWS GameDay during the event also focused on enhancing cloud security skills. Led by AWS ProServe Security and AWS Support experts Jonas Buecker, Hicham Terkiba, and Makendran Gunasekaran, the games focused on network security (including network log inspections), identity and access management (IAM) policies, and using application security techniques and AWS Web Application Firewall (AWS WAF) to help prevent SQL injections. One participant enthusiastically commented, “This was an amazing opportunity to practice hands-on AWS security learning,” underscoring the unique value of the experience.

Investing in tomorrow’s security leaders

AWS partnered to donate event tickets to South Africa’s MiDO Academy, which aims to create pathways out of poverty and meaningful employment opportunities for young people, while alleviating the pressures felt by business owners to upskill and integrate new cybersecurity talent. Dale Simons, CEO of MiDO Academy said, “The sponsored tickets from AWS didn’t just provide access to training—they gave our students entry into a global security community. Our young women now see themselves as part of a larger security mission, understanding that their contributions to cybersecurity can have worldwide impact.”

By combining technical challenges with mentorship and collaboration, Wicked6 helped participants work together to upskill and address tomorrow’s challenges. Gulick highlighted the event’s impact, stating “Wicked6 2025 was a success. Each year, women from all over the world join us for speakers, games, and networking. By learning to play cybersecurity games, these women can leverage games to learn new tech skills throughout their careers.”

No matter your role—whether you’re a seasoned professional or just starting your cybersecurity journey—continuous learning is key.

“It’s important as women and as cybersecurity professionals not to get comfortable with the status quo. Leveling up means stepping out of our comfort zones and doing things that scare us. Going to networking events, actively talking with people, connecting with people on LinkedIn, getting educated to improve skills, and putting ourselves out there. Wicked6 is the perfect place to do that this year and in the years to come!” —Mari Galloway, CEO of Women’s Society of Cyberjutsu

Pursuing the path to success

As cyber threats continue to evolve, AWS remains committed to strengthening global security culture through initiatives that promote active participation and partnership. This year’s Wicked6 Cyber Games exemplified how the security community can encourage and support future leaders with collaborative learning experiences and foster a more resilient and adaptable workforce.

For more information about AWS security culture, visit How AWS sustains a strong culture of security

If you have feedback about this blog post, submit comments in the Comments section below. You can also start a new thread on the AWS Security, Identity, and Compliance re:Post to get answers from the community.

Anne Grahn

Anne Grahn

Anne is a Senior Worldwide Security GTM Specialist at AWS, based in Chicago. She has 15 years of experience in the security industry and focuses on effectively communicating cybersecurity risk. She maintains a Certified Information Systems Security Professional (CISSP) certification.

Sarah Currey

Sarah Currey

As the Organization Excellence Leader for AWS Global Services Security, Sarah creates and optimizes security programs and solutions that protect AWS customers and internal teams. The initiatives foster a culture of security that encourages continuous improvement in our security practices and innovation while empowering everyone to own security.

Announcing AWS Security Reference Architecture Code Examples for Generative AI

Post Syndicated from Ievgeniia Ieromenko original https://aws.amazon.com/blogs/security/announcing-aws-security-reference-architecture-code-examples-for-generative-ai/

Amazon Web Services (AWS) is pleased to announce the release of new Security Reference Architecture (SRA) code examples for securing generative AI workloads. The examples include two comprehensive capabilities focusing on secure model inference and RAG implementations, covering a wide range of security controls and best practices for AWS generative AI services.

These new code examples are available in the AWS SRA Examples Repository and include ready-to-deploy CloudFormation templates for implementing detective security controls such as network segmentation, identity management, encryption, prompt injection detection, and logging and monitoring. The solutions align with the AWS SRA Design Guidance page and demonstrate our commitment to helping customers secure their generative AI implementations.

Customers can get started with these examples by following the implementation instructions for each solution in the AWS SRA Examples Repository Solutions GenAI page. Additional documentation and implementation guidance is available in the AWS SRA Design Guidance Generative AI Architecture Deep Dive.

AWS strives to continuously provide security solutions that help customers meet their security architecture needs. Customers can reach out to the team by submitting an issue in the code repository.

If you have feedback about this post, submit comments in the Comments section below.

Ievgeniia Ieromenko

Ievgeniia Ieromenko

Ievgeniia a Security Engineer at AWS, focusing on cloud security architecture and best practices. She is a key contributor to the AWS Security Reference Architecture GitHub repository, helping customers implement secure cloud environments.

Liam Schneider

Liam Schneider

Liam is a Sr. Security Engineer with deep experience in cloud and application security, focused on reducing risk, improving system resilience, and aligning security with business needs. Liam has a strong background in compliance, team leadership, and building secure, scalable solutions across complex environments. He is known for practical, effective approaches to modern security challenges in both enterprise and cloud-first organizations.

Justin Kontny

Justin Kontny

Justin is a Sr. Security Engineer at AWS who combines his passion for software development with expertise in cloud security. He focuses on transforming security from a barrier to a business enabler through innovative AI-driven automation. When not pushing the boundaries of cloud security, Justin enjoys time with his children and being active outdoors.

How to help prevent hotlinking using referer checking, AWS WAF, and Amazon CloudFront

Post Syndicated from Alex Smith original https://aws.amazon.com/blogs/security/how-to-prevent-hotlinking-by-using-aws-waf-amazon-cloudfront-and-referer-checking/

Note: This post was first published April 21, 2016. The updated version aligns with the latest version of AWS WAF (AWS WAF v2) and includes screenshots that reflect the changes in the AWS console experience.


AWS WAF Classic has been deprecated and will be end-of-life (EOL) in September 2025. This update describes how to use the latest version of AWS WAF (WAFv2) to help prevent hotlinking. Updates have been made to the screenshots to reflect the changes in the AWS Management Console for AWS WAF.

Hotlinking—also known as inline linking—is a form of content leeching where an unauthorized third-party website embeds links to resources originally referenced in a primary site’s HTML. The third-party website doesn’t incur the cost of hosting the content, which means that your website can be charged for the content other sites use. It also results in slow loading times, lost revenue, and potential legal issues.

Now, you can use AWS WAF to help prevent hotlinking. AWS WAF is a web application firewall that’s closely integrated with Amazon CloudFront—a content delivery network (CDN)—and can help protect your web applications from common web exploits that could affect application availability, compromise security, and consume excessive resources. In this blog post, I show you how to help prevent hotlinking by using header inspection in AWS WAF, while still taking advantage of the improved user experience from a CDN such as CloudFront.

Solution overview

You can address hotlinking in various ways. For instance, you can validate the Referer header (sent by a browser to indicate to the server which page the visitor was referred from) at your web server (for example, by using the Apache module mod_rewrite), and either issue a redirect back to your site’s main page or return a 403 Forbidden error to the visitor’s browser.

If you’re using a CDN such as CloudFront to speed up your site’s delivery of content, validating the Referer header at the web server becomes less practical. The CDN stores a copy of your content in the edge of its network of servers, so even if your web server validates the original request’s headers (in this case, the referer), additional requests for that content must be validated by the CDN itself, because they are unlikely to reach the origin web server.

Figure 1 illustrates this process.

Figure 1: Request – response flow showing instances of a cache-miss and a cache-hit

Figure 1: Request – response flow showing instances of a cache-miss and a cache-hit

The process shown in Figure 1 is as follows:

  1. A request is received from a user client (1) at a CloudFront edge location (2).
  2. The edge location attempts to return a cached copy of the file requested. This request, if fulfilled from the cache, is considered a cache hit.
    1. In the case of a cache miss—when the content is either not in the edge or is not valid (for example, if the content is out of date)—the request is forwarded to the origin (3) (such as an Amazon Simple Storage Service (Amazon S3) bucket) for a new copy of the object.
    2. In the case of a cache hit, the origin cannot apply validation logic to the user’s request, because the edge server doesn’t need to contact the origin to fulfil the user’s request.

In the next section, I show you how to inspect the client-request headers using AWS WAF to allow or block requests at the CDN.

Solution implementation—two approaches

This post includes two ways to set up AWS WAF to help prevent hotlinking:

  • Using a separate subdomain: Static files (such as images or styling components) to be protected are moved to a separate subdomain such as static.example.com so that you only need to validate the Referer header.
  • Using the same domain: Static files are located under a directory on the same domain. This solution includes how to extend this example to check for an empty Referer header.

The choice of approach will depend on how your site is structured and the level of protection you want to implement. The first approach enables you to set up a Referer header check to make sure that requests for the images only come from an allowlisted sub-domain, while the second approach has an additional check for an empty Referer header. The second approach extends the first approach and allows for some flexibility for users to share direct links to the image while still preventing unaffiliated third-party sites from embedding the image links on their websites.

Terms

The following list includes key terms used in this post:

  • AWS WAF configurations consist of a web access control list (web ACL), associated with a given CloudFront distribution.
  • Each web ACL is a collection of one or more rules, and each rule can have one or more match conditions.
  • Match conditions are made up of one or more filters, which inspect components of the request (such as its headers or URI) to match for certain conditions.
  • Case-sensitivity: HTTP header names are case-insensitive. Referer and referer point to the same HTTP header. HTTP header values, however, are case-sensitive.

Prerequisites

You must have a CloudFront distribution set up before configuring an AWS WAF web ACL. For information about how to set up a CloudFront distribution with an S3 bucket as an origin, see Configure distributions.

Approach 1: A separate subdomain

In this example, you create an AWS WAF rule set that contains a single rule with a single match condition, which in turn has a single filter. The match condition checks the Referer header and verifies that it contains a given value. If the request matches the condition specified in the rule, the traffic is allowed. Otherwise, the AWS WAF rule blocks the traffic.

For this example, because all the static files are on a separate subdomain (static.example.com) accessed only from the site example.com, you will block hotlinking for any file that don’t have a referer that ends with example.com.

Use the following steps to set this up using the AWS WAF console.

Step 1: Create and name a new web ACL

  1. Sign in to the AWS WAF console.
  2. If you have not created a web ACL before, Choose Create web ACL on the AWS WAF console landing page.
  3. Because you want to associate the web ACL with a CloudFront distribution, select Amazon CloudFront distributions as the Resource type.
    1. Enter a Name for the web ACL that you’re creating. For this example, I used the name sample-webacl. The page will automatically populate an associated Amazon CloudWatch metric name. CloudWatch is a monitoring service that allows you to gather and report on metrics of various services. This CloudWatch metric can be used later to report on how your newly created AWS WAF configuration is being used.
    2. After you have supplied the name of the web ACL, you can select the available AWS resources to be protected by this web ACL. In this example, you will fill that in later, so leave this field blank for now.
    3. By default, AWS WAF can inspect up to 16 KB of the web request body with additional values of 32, 48, and 64 KB for an additional cost. Leave the web request Body size limit at the default value of 16 KB.
    4. Choose Next.
  4. Figure 2: Describing the web ACL and associating it to resources

    Figure 2: Describing the web ACL and associating it to resources

Step 2: Create a string match condition on Referer header

AWS WAF ACLs can use AWS managed rule groups, rule groups from AWS Marketplace providers, or you can write your own rules and rule groups. For this example, you will create your own rules and rule groups.

  1. In the AWS WAF console, choose Add rules, and select Add my own rules and rule groups to create the string match condition.
    Figure 3: Add rules and rule groups

    Figure 3: Add rules and rule groups

  2. This will bring you to the Rule visual editor page. The default Rule type will be set to Rule builder which you can leave unchanged. In the Rule builder section, select Regular rule.
    Figure 4: Rule type and Rule builder

    Figure 4: Rule type and Rule builder

  3. The next step is to construct a string match condition to match on the Referer header. Under Name, enter a name for the rule, such as Referer-check. Make sure that If a request is set to doesn’t match the statement (NOT). The string match condition is a negative match which means that if the Referer header field value does not match the value specified in the rule, the request will be blocked. This makes sure that requests for static.example.com which only originate from example.com are allowed. In the Statement section, use the following settings:
    1. Inspect: Select Single header.
    2. Header field name: Enter referer as the value.
    3. Match type: Select Exactly matches string.
    4. String to match: Enter example.com as the value.
    5. Text transformation: Select Lowercase. This isn’t required for most modern browsers, but is a good practice because HTTP header field values are case sensitive.
    Figure 5: Rule name and statement

    Figure 5: Rule name and statement

  4. In the Action section, select Block as the Action. Choose Add Rule.
    Figure 6: Rule Action

    Figure 6: Rule Action

    In the preceding rule statement, you’re configuring AWS WAF to inspect a header with the name Referer and checking if the value of the header matches the static string example.com. If the value of the Referer header is not example.com, then the request is blocked.

  5. The next page is Add rules and rule groups. It shows the following attributes of the web ACL:
    1. AWS WAF rules that have been added to the web ACL.
    2. Web ACL capacity units (WCUs).
    3. Default web ACL action.
    4. Token domain list.
    5. Because you’re only adding one rule to this web ACL, choose Next.
      Figure 7: Rules and rule groups, WCUs, and default web ACL action

      Figure 7: Rules and rule groups, WCUs, and default web ACL action

  6. On the next page, you will set the rule priority. Because you added only one rule, you will not need to adjust the rule priority. If there is more than one rule, you can select a particular rule and use the Move up or Move down options to organize the rule order. Choose Next.
    Figure 8: Set rule priority

    Figure 8: Set rule priority

  7. The Configure metrics page details can be left at the default values. Choose Next to proceed to the final step.
    Figure 9: Configure metrics

    Figure 9: Configure metrics

  8. The final step is to review the web ACL details. If you need to change one of the settings of the web ACL, you can choose Edit step for the corresponding step. Choose Create web ACL to finalize creating the AWS WAF web ACL.
    Figure 10: Review and create web ACL

    Figure 10: Review and create web ACL

Step 3: Associate the new rule with the relevant CloudFront distribution

You can now associate AWS resources with the web ACL that you created in the previous steps. In this case, the AWS resource is a CloudFront distribution.

  1. In the AWS WAF console, choose Web ACLs in the navigation pane. Select the web ACL named sample-webacl that you created previously.
    Figure 11: Select a Web ACL to configure

    Figure 11: Select a Web ACL to configure

  2. Choose Add AWS resources.
    Figure 12: Add AWS resources

    Figure 12: Add AWS resources

  3. Eligible AWS resources will be displayed in the pop-up page. Select the CloudFront distribution from the Resources list. Choose Add to associate the ACL sample-webacl with the CloudFront distribution.
    Figure 13: Select CloudFront distribution to associate with sample-webacl

    Figure 13: Select CloudFront distribution to associate with sample-webacl

  4. The next page is the Web ACLs page, which will show the CloudFront distribution selected in the previous step in the Associated AWS resources section.
    Figure 14: Web ACLs and Associated AWS resources

    Figure 14: Web ACLs and Associated AWS resources

Test the referer check rule

You’re ready to test the web ACL that you created by issuing a cURL command from the command line and confirming that the referer check is matched correctly. When you request files without the allowlisted Referer header, the requests are blocked at the CDN. However, valid requests still are allowed through.

When a third party embeds your content (request blocked at the CDN)

» curl –H "Referer: example.net -I https://static.example.com/favicon.ico
« HTTP/1.1 403 Forbidden

When you embed your content (request allowed through the CDN)

» curl –H "Referer: example.com -I https://static.example.com/favicon.ico
« HTTP/1.1 200 OK

Note: With Approach 1, you must make the request with an allowlisted Referer header. In this example, all paths are filtered.

Approach 2: All content under the same domain, with filtering by path

In the second approach, you allow a blank Referer header and filter by a given URL path. To do this, you will create an AWS WAF web ACL that contains multiple rules with additional match conditions, which in turn are comprised of multiple filters. As with the first approach, the match condition looks at the Referer header; however, you will validate the header in two ways. First, you validate whether the request contains the expected header, and if it does not, you apply the second validation, which checks to see whether it has a URL style Referer header. This enables you to access the assets directly in a browser when the assets aren’t embedded elsewhere in a website but still provides protection against hotlinking.

Accessing an image directly in the browser can be useful in situations where users might want to share the link to the image directly, thus helping to prevent a negative user experience when sharing the image link with other users. This approach makes it an improvement over the first approach where requests for the images must originate from the sub-domain.

You will also validate the path used in the request (in this example /wp-content), which allows AWS WAF to protect individual folders under a single domain name.

Step 1: Decide what to protect

As in the first approach, rather than filter on everything under a domain, you will filter based on the path. In this case, /wp-content. This allows you to protect your uploaded content that sits under /wp-content, but without having to put the content into a separate subdomain.

Step 2: Create and name a new web ACL

You can use the web ACL that you created for Approach 1, or you can repeat Step 1 of Approach 1 to create a new web ACL.

Step 3: Create string match conditions on the referer

For Approach 2, the assumption is that everything exists under a single domain, so instead of using the catch-all example.com, use the more secure https://example.com/ and mark the header as Starting with https://example.com.

Because you’re explicitly filtering on one header, you must watch out for two things:

  • Switching between www.example.com and example.com in your application.
  • Switching between https:// and http:// in your application.

If either of these switches occurs, you will see a 403 Forbidden error returned instead of your embedded files. In this example, all content is delivered directly through https://example.com/.

For this example, you will construct two rules, each of which will contain multiple string match conditions. AWS WAF allows for conditional match conditions within a rule so you can create nested logic statements. For example, a rule evaluation is true if all the statements within a rule statement are evaluated to true.

First rule: Validate a Referer header:

For this rule, you will set the following match conditions and AWS WAF actions:

Rule name: Validate-Referer-header

If Referer header value starts with https://example.com

AND

If URI path starts with /wp-content

THEN

ALLOW request

  1. Open the AWS Management Console for AWS WAF and navigate to WAF & Shield.
  2. Choose Web ACLs in the navigation pane and select Global (CloudFront) as the AWS Region.
    Figure 15: Web ACLs and AWS Regions

    Figure 15: Web ACLs and AWS Regions

  3. The page will refresh to show the Web ACL sample-webacl that you created in the preceding Step 2. Select sample-webacl.
    Figure 16: Web ACLs list

    Figure 16: Web ACLs list

  4. Select the Rules tab.
    Figure 17: Web ACL rules

    Figure 17: Web ACL rules

  5. Choose Add rules and select Add my own rules and rule groups. If you’re reusing the web ACL created in Approach 1, delete the Referer-check rule before adding new rules.
    Figure 18: Add rules and rule groups

    Figure 18: Add rules and rule groups

  6. For Rule type, select Rule builder.
    Figure 19: Rule type

    Figure 19: Rule type

  7. In the Rule section, use the following settings:
    1. Name: Enter Validate-referer-header as the value.
    2. Type: Select Regular rule.
    3. If a request: Select matches all the statements (AND).
    Figure 20: Rule name and match condition

    Figure 20: Rule name and match condition

  8. In the Statement 1 section, use the following settings:
    1. Inspect: Select Single header.
    2. Header field name: Enter referer as the value.
    3. Match type: Select Starts with string.
    4. String to match: Enter https://example.com as the value.
    5. Text transformation: Select Lowercase.
    Figure 21: First string match condition

    Figure 21: First string match condition

  9. Create the second string match condition (Statement 2). For the URL itself, you want to protect content under /wp-content, so you will create a string match to validate that case using the same steps as for the first string match condition, with two changes:
    1. For Inspect, select URI path.
    2. For String to match, enter /wp-content as the value.
    Figure 22: Second string match condition

    Figure 22: Second string match condition

  10. Change the Action to Allow and choose Add Rule at the bottom of the page.
    Figure 23: Set the Action to Allow

    Figure 23: Set the Action to Allow

  11. In the Set rule priority page, choose Save.
    Figure 24: Save the rule

    Figure 24: Save the rule

Second rule: Validate without a Referer header

For the second rule, you will set the following match conditions and rule actions:

Rule name: Validate- with-no-Referer-header

If Referer header contains ://

AND

If URI path starts with /wp-content

THEN

BLOCK request

The second rule is similar to the first rule, but it matches when the Referer header value includes ://. You use this match condition to check whether the Referer header has been set at all. If it has, you block the request.

  1. In the Web ACL page, choose Add rules and select Add my own rules and rule groups to be taken to the Rule type page.
    Figure 25: Create the second rule

    Figure 25: Create the second rule

  2. For Rule type and Rule builder, use the following settings:
    1. Rule type: Select Rule builder.
    2. Name: Enter Validate-with-no_Referer-header as the value.
    3. Type: Select Regular rule.
    4. If a request: Select matches all the statements (AND).
    Figure 26: Set the rule type and matching

    Figure 26: Set the rule type and matching

  3. For Statement 1, use the following settings:
    1. Inspect: Select Single header.
    2. Header field name: Enter Referer as the value.
    3. Match type: Select Contains string.
    4. String to match: Enter ://
    Figure 27: Configure Statement 1

    Figure 27: Configure Statement 1

  4. For Statement 2, use the following settings:
    1. Inspect: Select URI path.
    2. Match type: Select Starts with string.
    3. String to match: Enter /wp-content as the value.
    Figure 28: Configure Statement 2

    Figure 28: Configure Statement 2

  5. For Action, keep the default setting of Block and choose Add Rule.
    Figure 29: Add rule

    Figure 29: Add rule

  6. The resulting Set rule priority page will list the rules in the sample-webacl web ACL and will look like the following figure. It shows the name of the rule, the rule priority, the web capacity units (WCUs) and the AWS WAF response. Choose Save.
    Figure 30: Rule priority and web ACL units used the web ACL.

    Figure 30: Rule priority and web ACL units used the web ACL.

The Rules tab will now show both of the rules that you added with their corresponding AWS WAF actions in addition to the default action of Allow for requests that don’t match one of the rules.

Figure 31: Rules tab of sample-webacl web ACL

Figure 31: Rules tab of sample-webacl web ACL

Step 4: Associate the new rules with the relevant CloudFront distribution

  1. Select the Associated AWS Resources tab and choose Add AWS resources.
    Figure 32: Add AWS resources

    Figure 32: Add AWS resources

  2. Select the relevant CloudFront distribution and choose Add.
    Figure 33: Select the CloudFront distribution

    Figure 33: Select the CloudFront distribution

  3. The web ACLs page will show the CloudFront distribution in the Associated AWS resources tab.
    Figure 34: Associated AWS resources

    Figure 34: Associated AWS resources

Test the rules

Similar to Approach 1, you have filtering at the CDN, but this time the filtering is based on the path and direct linking is allowed (without a Referer header).

You can use cURL to verify that the new AWS WAF web ACL correctly protects your content. Use the –H argument to send a different Referer header to the CloudFront distribution, which allows you to test as if you are embedding the website content in an unauthorized page.

When a third party embeds your content

» curl –H "Referer: https://example.net/" -I https://example.com/wp-content/uploads/2013/03/shareable-image.jpg
« HTTP/1.1 403 Forbidden

When your content is directly linked (with no Referer)

» curl -I https://example.com/wp-content/uploads/2013/03/shareable-image.jpg
« HTTP/1.1 200 OK

When you embed your content

» curl –H "Referer: https://example.com/" -I https://example.com/wp-content/uploads/2013/03/shareable-image.jpg
« HTTP/1.1 200 OK

Conclusion

AWS WAF is a web application firewall that lets you monitor and control the HTTP(S) requests that are forwarded to your protected web application resources. In this post, you saw how to use the AWS WAF custom rule builder feature to prevent content hotlinking to protect your website’s content hosted in an Amazon S3 bucket.

The two approaches demonstrated in this post provide you with ways to implement a robust referer check solution that helps prevent unauthorized third-party websites from linking back to static assets on your website, thus helping to prevent increased bandwidth costs, bad user experience, and degraded performance because of resource leeching. Following the concept of least privilege, you can further restrict the AWS WAF rules to apply only to certain image file extensions (such as .jpg or .png).

While referer checking helps prevent unaffiliated sites from backlinking to your site’s images and benefitting by using your site’s bandwidth, more sophisticated exploits can carefully craft a request to bypass the referer check. Other web request mechanisms, such as web browser plugins, server-to-server requests that forge referer header values, or privacy-based web browsers may also cause inconsistencies in accurately evaluating the referer header value. Be aware of such inconsistencies and consider using additional private content mechanisms such as signed URLs and token authentication.

Web browsers don’t have a mechanism to validate if a Referer header has been tampered with. Referer checking should be implemented as part of a broader web application security strategy by using AWS WAF application protection rules, Bot Control, Fraud Control, and Distributed Denial of Service (DDOS) protection. Effective web traffic monitoring using AWS WAF logs, Amazon CloudWatch metrics, and web ACL traffic dashboards will help ensure that bad actors aren’t bypassing the AWS WAF rules that you have set up to protect your web traffic.

You can use AWS WAF to build on top of the referer check to implement more advanced content protection solutions such as rate-limiting, bot mitigation, and DDOS mitigations to further secure your website against a wide range of exploits.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this solution or its implementation, start a new thread on the AWS WAF forum.

Alex Smith was the original author of this post in 2016.

Sanchith Kandaka

Sanchith Kandaka

With over 15 years of experience in the Content Delivery and Application Security space, Sanchith is excited about all things edge related. He has worked as a Solutions Architect and a Solutions Engineer and is now a Specialist Solutions Architect at AWS focused on AWS Edge Services and Perimeter Protection services including Amazon CloudFront, AWS WAF, and AWS Shield.

How to support OpenID AuthZEN requests with Amazon Verified Permissions

Post Syndicated from Edward Sun original https://aws.amazon.com/blogs/security/how-to-support-openid-authzen-requests-with-amazon-verified-permissions/

OpenID Foundation’s AuthZEN Working Group is currently drafting a new specification (version 1.0, draft 03 at the time of publication) and associated standard mechanisms, protocols, and formats to communicate authorization-related information between components involved in access control and authorization.

Today, we’re publishing an open-source reference implementation demonstrating seamless integration between an AuthZEN-compliant policy enforcement point (PEP) and Amazon Verified Permissions, a fully managed AWS service for storing authorization policies (expressed in Cedar policy language) and evaluating authorization requests at runtime.

What is OpenID AuthZEN specification?

Traditionally, application developers built their own authorization logic within the application code to evaluate access to resources. Reviewing the authorization rules requires reviewing the application code, and changing the authorization rules requires changing and deploying a new version of the application. Customers using this pattern often find it challenging to consistently enforce their authorization rules, track changes to these rules, and update rules as their application evolves.

To solve these challenges, modern application designs have shifted their authorization capabilities and decoupled them from application code. This strategy accelerates application development and grants fine-grained permissions within applications in a more repeatable and dynamic way that developers can apply consistently. Fine-grained permissions are typically designed based on:

  • Subject role assignments following role-based access control (RBAC)
  • Attribute values of the subject or the requested resources following attribute-based access control (ABAC)
  • Relationships between subjects and resources following relationship-based access control (ReBAC)
  • A hybrid model using a combination of the preceding methods

Expressions of these access control rules are called policies, which lead to the policy-based access control (PBAC) approach.

To support these access control approaches, customers implement solutions that follow the guidance of NIST SP 800-162 – A guide to ABAC.

Figure 1: Access control components and interactions

Figure 1: Access control components and interactions

Figure 1 illustrates the architecture of an advanced access control mechanism that consists of several key components that work together to manage and enforce authorization policies.

At the heart of this system is the policy decision point (PDP), which serves as the rules or policy engine. The PDP is responsible for evaluating rules and policies to determine whether a particular access request should be allowed or denied. This component interacts closely with the policy enforcement point (PEP), which acts as the gatekeeper for resource access.

The PEP, typically integrated into your application, receives access requests for subjects (users or systems) and enforces the decisions made by the PDP. It either allows or denies access to the requested resource based on the PDP’s determination.

To make informed decisions, the PDP might need to retrieve additional metadata or attributes. A policy information point (PIP) acts as an interface to external data sources, such as subject attribute and resource attribute stores. These could include databases such as your HR system, providing crucial contextual information to aid in the decision-making process.

The diagram also shows other important elements:

  1. The policy store, which stores the authorization policies.
  2. The policy administration point (PAP), used for managing and updating policies in the policy store.
  3. Environment conditions, which can influence access decisions based on factors like time, location, or system status.

All these components collaborate within the authorization services framework to provide a comprehensive and flexible access control system. The AuthZEN specification provides a standardized way to communicate authorization requests from the PEP to the PDP and to communicate authorization decisions from the PDP to the PEP.

Interoperability with Verified Permissions

Verified Permissions offers organizations a fully managed service that combines the roles of both a PDP engine and a policy store with a PAP. AWS manages the underlying infrastructure, scaling automatically with application demands and maintaining consistent performance across distributed systems.

Verified Permissions uses Cedar, an open-source policy language that brings mathematically provable access control. When integrated into an application’s architecture, Verified Permissions serves as the central decision-making engine for authorization requests sent through its IsAuthorized() API. Verified Permissions evaluates requests against defined policies while considering information such as principal attributes, resource properties, type of action, and environmental conditions. The service also offers the ability to group common authorization requests into one API call and to validate OAuth 2.0 JSON Web Token with the OpenID Connect provider that issued it when provided as principal information.

This reference implementation enables interoperability and seamless integration between OpenID’s AuthZEN protocol and Verified Permissions. It can help you standardize authorization patterns across different services hosted on AWS.

Architecture overview

The architecture of the proposed AuthZEN interface for Verified Permissions is illustrated in Figure 2.

Figure 2: Architecture overview

Figure 2: Architecture overview

The workflow for this architecture is as follows:

  1. The application, serving as PEP, makes an authenticated authorization API call for access requests in AuthZEN-compliant format.
  2. Amazon API Gateway invokes an AWS Lambda authorizer to evaluate the PEP authentication mechanism of the choice.

    Note: In this solution, you will manage access to the API using a secret generated by AWS Secrets Manager. Review your threat model and adopt additional authentication mechanisms that fit your workload such as OAuth 2.0 bearer tokens, client certificate authentication, or AWS Identity and Access Management (IAM) temporary credentials.

  3. After successful authentication, API Gateway propagates the request to a Lambda function integration.
  4. The Lambda function queries the entity store, which is a PIP, to retrieve additional metadata and attributes about the entities in the authorization request. The entities are representing the principals and the resources.
  5. The Lambda function integration transforms the authorization requests in AuthZEN format into a Verified Permissions IsAuthorized() formatted request.
  6. Verified Permissions evaluates the authorization request and returns the authorization decision, then a Lambda function transforms the authorization decision to an AuthZEN formatted decision back to the application.

Deploy the solution

You can deploy the authzen-interface-verified-permissions solution by using the AWS Cloud Development Kit (AWS CDK). Solution artifacts are available in the aws-samples/sample-authzen-interface-verified-permissions repository.

For instructions and more information on using the AWS CDK, see Get started with the AWS CDK.

Deploy the policy store

Create a Verified Permissions policy store to store your authorization policies. You can create a new policy store using the AWS Management Console for Verified Permissions or deploy the sample policy store by using AWS CDK. The sample policy store contains Cedar policies and entities to support AuthZEN’s authorization scenarios.

To deploy the policy store stack using AWS CDK:

  1. Navigate to the project root folder, bootstrap your environment, and deploy the policy store by using the following commands. Replace <account-id> and <region> with your AWS account number and the AWS Region you want to deploy in.
    $ cdk bootstrap aws://<account-id>/<region>
    
    $ npm install
    
    $ npm run cdk:policystore
    
    

  2. After the deployment completes, locate the PolicyStoreId and CedarEntitiesTableName from Outputs:
    Outputs:
    AuthZENPolicyStoreStack.CedarEntitiesTableName = AuthZENPolicyStoreStack-EntitiesCedarEntitiesTableXXXXXXX-XXXXXXXXXXXXX
    AuthZENPolicyStoreStack.PolicyStoreId = 1234567890abcdef0
    Stack ARN:arn:aws:cloudformation:aa-example-1:123456789012:stack/AuthZENPolicyStoreStack/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111
    

  3. Set the policy store ID and entities table name as environment variables:
    $ export POLICY_STORE_ID=1234567890abcdef0
    $ export ENTITIES_TABLE_NAME=AuthZENPolicyStoreStack-EntitiesCedarEntitiesTableXXXXXXX-XXXXXXXXXXXXX
    

Deploy AuthZEN interface for Verified Permissions

You then deploy a sample AuthZEN PDP interface that’s connected to the Verified Permissions policy store. The PDP includes the API Gateway REST API, the Lambda authorizer, and the Lambda function integration.

To deploy the PDP by using AWS CDK:

  1. Deploy the PDP stack by using the following commands:
    $ npm run cdk:pdp
    

    If you have a custom domain for the API Gateway endpoint, visit the CDK stack for AuthZEN HTTPS binding section for more information.

  2. After the deployment, locate the API endpoint and secret from Outputs. This is your AuthZEN authorization endpoint.
    Outputs:
    AuthZENPDPStack.ApiCredentialsSecretArn = arn:aws:secretsmanager:aa-example-1:123456789012:secret:ApiCredentials-XXX
    AuthZENPDPStack.RestApiEndpoint1234567A = https://abcdef0123.execute-api.aa-example-1.amazonaws.com/prod/
    Stack ARN:
    arn:aws:cloudformation:aa-example-1:123456789012:stack/AuthZENPDPStack/a1b2c3d4-5678-90ab-cdef-EXAMPLE22222
    

Test the deployment

The OpenID AuthZEN working group has defined a set of interoperability scenarios based on a sample Todo application as the PEP. You can view the payload specification for each API authorization request and test it with AuthZEN’s API Gateway test runner.

To test the deployment: 

  1. Clone AuthZEN API Gateway test runner and navigate to the test-harness directory.
  2. Set the secret ARN in the environment variable.
    $ export SECRET_ARN=arn:aws:secretsmanager:aa-example-1:123456789012:secret:ApiCredentials-XXX
    

  3. Set the predefined AuthZEN test runner’s environment variable for the Authorization header value.
    $ export AUTHZEN_PDP_API_KEY=$(aws secretsmanager get-secret-value --secret-id $SECRET_ARN --query 'SecretString' --output text | jq -r '.authSecret')
    

  4. Install the dependencies and perform the test by running the following command:
    $ yarn
    
    $ yarn build
    
    $ yarn test https://abcdef0123.execute-api.aa-example-1.amazonaws.com/prod/ console
    

  5. The following interoperability results display in the console:
    $ yarn test https://abcdef0123.execute-api.aa-example-1.amazonaws.com/prod/ console
    yarn run v1.22.22
    $ node build/runner.js https://abcdef0123.execute-api.aa-example-1.amazonaws.com/prod/ console
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDA2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/users/{userId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDA2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDA2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"POST"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDA2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"PUT"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDA2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"DELETE"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDE2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/users/{userId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDE2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDE2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"POST"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDE2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"PUT"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDE2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"DELETE"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDI2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/users/{userId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDI2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDI2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"POST"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDI2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"PUT"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDI2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"DELETE"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDM2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/users/{userId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDM2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDM2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"POST"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDM2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"PUT"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDM2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"DELETE"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDQ2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/users/{userId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDQ2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"GET"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDQ2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"POST"},"resource":{"type":"route","id":"/todos"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDQ2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"PUT"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    PASS REQ: {"subject":{"type":"identity","id":"CiRmZDQ2MTRkMy1jMzlhLTQ3ODEtYjdiZC04Yjk2ZjVhNTEwMGQSBWxvY2Fs"},"action":{"name":"DELETE"},"resource":{"type":"route","id":"/todos/{todoId}"}}
    

The PASS REQ results indicate this deployment has met AuthZEN interoperability test scenarios requirements. You can also view the latest results at OpenID AuthZEN Interop results summary site.

It’s your turn to build

In this post, we introduced an open-source AuthZEN interface for Amazon Verified Permissions that’s based on the OpenID Foundation’s AuthZEN working group specifications. This implementation provides developers with a transparent way to adopt industry-standard authorization practices while maintaining the security and scalability benefits of the managed authorization service provided by AWS.

If you’re interested in learning more about Cedar and Verified Permissions, see the following references:

If you’re interested in this new specification, join the AuthZEN Working Group and provide your feedback through the associated GitHub repository.

If you have feedback about this post, submit comments in the Comments section below.

Edward Sun

Edward Sun

Edward is a Security Specialist Solutions Architect focused on identity and access management. He loves helping customers throughout their cloud transformation journey with architecture design, security best practices, migration, and cost optimizations. Outside of work, Edward enjoys hiking, golfing, and cheering for his alma mater, the Georgia Bulldogs.

Kevin Hakanson

Kevin Hakanson

Kevin is a Senior Solutions Architect for AWS Worldwide Public Sector, based in Minnesota. He works with EdTech customers to ideate, design, validate, and launch products using cloud-focused technologies and modern development practices. When not staring at a computer screen, he’s probably staring at another screen, either watching TV or playing video games with his family.