Tag Archives: Governance

Enabling AI adoption at scale through enterprise risk management framework – Part 2

Post Syndicated from Milind Dabhole original https://aws.amazon.com/blogs/security/enabling-ai-adoption-at-scale-through-enterprise-risk-management-framework-part-2/

In Part 1 of this series, we explored the fundamental risks and governance considerations. In this part, we examine practical strategies for adapting your enterprise risk management framework (ERMF) to harness generative AI’s power while maintaining robust controls.

This part covers:

  • Adapting your ERMF for the cloud
  • Adapting your ERMF for generative AI
  • Sustainable Risk Management

By the end of this post, you’ll have a roadmap for scaling generative AI adoption securely and responsibly.

Adapting your ERMF for the cloud

Before diving into generative AI-specific controls, it’s crucial to understand the fundamental infrastructure that enables these technologies. Cloud computing is the foundational infrastructure that has made generative AI possible and accessible at scale. The development and deployment of large language models and other generative AI systems require massive computational resources, vast amounts of data storage, and sophisticated distributed processing capabilities that cloud systems can efficiently provide.

Cloud technology differs from on-premises IT solutions, and the relationship between financial institutions and cloud service providers is also different from the relationship with a traditional outsourcing provider.

These differences change the nature of many risks that financial institutions face and how they manage them. However, if cloud technology is implemented in the right way, it can reduce risk and provide tools to help Chief Risk Officers (CROs) to manage risk too.

You can read more about how your ERMF needs to change for large scale cloud adoption in Is your Enterprise Risk Management Framework ready for the Cloud?

Adapting your ERMF for generative AI

Organizations adopting generative AI can use their enterprise risk management framework to realize business value while maintaining appropriate controls. This approach allows you to build on existing risk management practices while addressing generative AI’s unique characteristics.

For a structured approach to cloud-enabled AI transformation, the AWS Cloud Adoption Framework for AI, ML, and generative AI (AWS CAF for AI) provides detailed implementation guidance aligned with enterprise risk management principles. For a detailed user guide, see AWS User Guide to Governance, Risk and Compliance for Responsible AI Adoption within Financial Services Industries, available in AWS Artifact using your AWS sign in. AWS Artifact provides AWS security and compliance reports, helping organizations maintain compliance through best practices.

When it comes to model management and the AI system lifecycle, customers can consult ISO42001 AI Management, Section A6. This section encompasses capturing the objective and processes for the responsible design and development of AI systems, including criteria and requirements for each stage of the AI system life cycle. This guidance can help organizations verify that their model management practices align with industry standards for responsible AI development.

From a business leader’s perspective, incorporating generative AI considerations into your ERMF helps establish documented good practices, implement effective controls, and maintain transparency about usage across the enterprise. This enables both responsible innovation and prudent risk management. Here’s how organizations are approaching this:

Generative AI policy and governance foundations in ERMF

In the field of generative AI, organizations establish both guardrails for innovation and clear accountability for risk management. The three lines of defense model provides the structure for implementing these foundational elements:

  • Acceptable use framework for your organization: Clear direction on appropriate generative AI use helps organizations manage risks while enabling innovation. The range of use cases for generative AI is large and likely to expand over the years, making it essential to have clear guidance on what applications are permitted and under what conditions. As organizations explore these opportunities, their framework can evolve with their experience and maturity.
  • Risk accountability: The generative AI lifecycle—from use case selection through implementation and ongoing monitoring—requires clear ownership across business and control functions. While organizations can establish specific generative AI oversight mechanisms, these should integrate with existing governance structures. Risk reporting and accountability for generative AI initiatives should flow through established enterprise risk committees and governance boards, helping to facilitate consistent risk management across the organization rather than creating isolated pockets of oversight.

Implementation approach for generative AI: Putting principles into practice

Building on the three lines of defense model discussed earlier, organizations can adapt their risk management practices to address the unique characteristics of generative AI while using industry best practices and frameworks. This often involves evolving existing controls and introducing new ones specific to generative AI. AWS services have built-in capabilities that support these enhanced governance, risk management, and compliance requirements, helping organizations to implement controlled and responsible generative AI solutions. This includes, for example, Amazon Bedrock Guardrails, among many others.

Building on the risk areas we outlined earlier, we now explore how organizations can implement controls for each of these areas. For each, we describe the principle and the practical implementation considerations. While organizations might prioritize these areas differently based on their use cases and risk appetite, together they provide a framework for responsible generative AI adoption through ERMF.

While we explore high-level control principles that follow, technical teams can review the AWS Well-Architected Framework – Generative AI Lens for detailed architectural guidance that supports these governance objectives.

Fairness

Generative AI systems can deliver equitable outcomes across different stakeholder groups, helping organizations build trust and meet expectations. Organizations can support this by setting up clear fairness metrics for specific use cases, regularly assessing training data for bias, and closely monitoring performance across different groups. For high-stakes applications, additional checks can help facilitate fair treatment across diverse populations.

Amazon Bedrock Guardrails provides configurable safeguards to help maintain fair and unbiased outputs, with customizable thresholds to match different use case requirements. Amazon Bedrock provides comprehensive model evaluation tools including model cards with detailed bias metrics, to assess bias across demographic groups. Amazon Bedrock includes built-in prompt datasets like the Bias in Open-ended Language Generation Dataset (BOLD), which automatically evaluates fairness across key areas such as profession, gender, race, and various ideologies. These capabilities integrate with Amazon SageMaker Clarify for comprehensive bias detection and mitigation, supported by built-in bias metrics and reporting.

Explainability

Generative AI systems can provide understanding of their decision-making processes, supporting accountability and effective oversight. Explainability is essential for all generative AI systems—whether using custom-built or pre-built models, particularly for complex models like transformer networks.

Organizations can implement practical controls by establishing clear explainability thresholds based on use case risk levels. This remains an active industry challenge, with ongoing research and evolving approaches. For critical business applications, tailoring explanations to different stakeholders while maintaining accuracy can improve understanding and trust.

Amazon Bedrock provides tools that help identify which factors influenced the generative AI’s decisions, while maintaining detailed records of system inputs and outputs. For complex workflows, Chain-of-Thought (CoT) reasoning traces are available through Amazon Bedrock Agents, showing the step-by-step logic behind each decision. Organizations can monitor how responses are generated in real time. For Retrieval-Augmented Generation (RAG) applications, which optimize AI outputs by referencing specific knowledge bases, Amazon Bedrock Knowledge Bases automatically includes references and links to source materials used in generating responses.

Privacy and security

Generative AI systems benefit from strong privacy and security measures to protect sensitive information and help prevent unauthorized access or data exposure. These systems can potentially generate content or unintentionally reveal confidential data, which organizations can proactively manage.

Organizations can set up multi-layered protection strategies, including access controls, content filtering, and data privacy safeguards. This can involve creating company-wide standards for prompt engineering to help prevent harmful outputs, using techniques like RAG to control information sources, and using automated systems to detect and protect personal information. Regular testing and validation, especially to comply with regulations like GDPR, can be part of the development and deployment process.

Amazon Bedrock implements multiple security layers including private endpoints with Amazon Virtual Private Cloud (Amazon VPC) support, fine-grained AWS Identity and Access Management (IAM) access control, and end-to-end encryption. Importantly, it maintains no persistent storage of prompt or completion data and helps preserve model provider isolation.

Amazon Bedrock Guardrails provides sensitive information filters that can detect and protect personally identifiable information (PII) through automated input rejection, response redaction, and configurable regex patterns, supporting various use cases while maintaining data privacy. Organizations like Genesys demonstrate these capabilities at scale, maintaining GDPR compliance while processing 1.5 billion monthly customer interactions through Amazon Bedrock.

For detailed security considerations, see Generative AI Security Scoping Matrix, which provides a comprehensive framework for assessing and addressing generative AI security risks.

Safety

Generative AI systems can be designed and operated with safeguards to avoid harm to individuals, and communities. This includes addressing risks of generating dangerous, illegal, or abusive content, and helping to prevent system misuse.

Organizations can implement specific safety measures through predeployment content filtering, real-time safety boundaries with prompt constraints, and output classification systems to detect and block dangerous content. Context-aware content moderation considers the specific application domain, while automated detection can identify potential safety violations before content generation. Ongoing monitoring and updating of these controls help address evolving capabilities and potential risks of generative AI systems.

Amazon Bedrock Guardrails delivers industry-leading safety protections across text and images, blocking up to 85 percent more harmful content on top of native protections provided by foundation models (FMs). Additional safety controls include token limits to avoid excessive responses, rate limiting against misuse, and moderation endpoints for content screening.

For full practical implementation guidance on building safety controls, see Build safe and responsible generative AI applications with guardrails.

Controllability

Organizations can maintain appropriate control over generative AI systems to make sure that they work as intended and can be adjusted or stopped if issues arise. This helps manage risks and maintain system reliability.

A multi-layered approach to control includes implementing technical safeguards and operational processes. Organizations can control model behaviour by adjusting parameters such as temperature (controlling output randomness), and sampling methods like top-k or top-p (managing output diversity). Clear operational boundaries define the system’s scope of action, while human-in-the-loop validation provides oversight for critical applications.

For effective control, organizations can establish parameter thresholds tailored to different use cases, implement rapid adjustment mechanisms, and create clear escalation procedures. Amazon Bedrock enhances control through customizable agent prompts and reasoning techniques, and the ability to break complex tasks into smaller, manageable components. Organizations can choose between structured workflows or flexible agent-based approaches. Regular comparison of outputs against established benchmarks helps maintain system reliability.

This balanced approach supports creative AI outputs while helping to facilitate consistent performance within defined quality limits. This helps prevent service degradation and business disruption while minimizing inefficiencies.

Control capabilities are further enhanced through Amazon CloudWatch monitoring integration and robust knowledge base version control. The capabilities of Amazon Bedrock, including LLM-as-a-judge features, help organizations assess and optimize their generative AI applications efficiently.

Veracity and robustness

Generative AI systems can produce reliable and accurate outputs, even when faced with unexpected or challenging inputs. This helps maintain trust and helps maintain the system’s usefulness across various applications.

Organizations can implement a combination of technical and procedural controls to enhance both system robustness and output reliability. This includes establishing clear parameter thresholds for different use cases, implementing human-in-the-loop validation for critical applications, and regularly comparing outputs against established ground truths. The framework specifies when and how these controls are applied based on the use case criticality and required level of accuracy.

Amazon Bedrock Guardrails improves veracity by helping to prevent factual errors through automated reasoning checks that deliver up to 99 percent accuracy in detecting correct responses from models, using mathematical logic and formal verification techniques. This capability supports processing of large documents up to 80,000 tokens and includes automated scenario generation for comprehensive testing.

Amazon Bedrock also includes sophisticated input sanitization features and supports adversarial testing through AWS testing tools integration.

Governance

Effective governance of generative AI systems helps manage risks, maintain accountability, and align AI use with organizational values and regulations. This covers the entire AI lifecycle, from development to deployment and ongoing operation.

Organizations can create clear governance structures, including defined roles for AI oversight, regular risk assessments, and ways to engage with stakeholders. This involves integrating AI governance into existing risk management practices and making sure of compliance with relevant laws and standards. Because AI technology is evolving rapidly, regular reviews and updates to governance practices are essential to address new capabilities, emerging risks, and changing regulatory requirements. This includes providing appropriate training and skill development for system users.

AWS has achieved of ISO/IEC 42001 certification, demonstrating our commitment to systematic governance approaches in AI implementation. Governance features in Amazon Bedrock include comprehensive model provenance tracking, detailed AWS CloudTrail audit logging, and streamlined model deployment approval workflows integrated with AWS Organizations. AWS Audit Manager provides pre-built frameworks to assess generative AI implementation against best practices.

Transparency

Generative AI systems can operate transparently, helping stakeholders understand system capabilities, limitations, and the context of AI-generated outputs. This builds trust and enables informed decision-making by users and affected parties.

Organizations can implement specific transparency measures including comprehensive model documentation detailing intended use cases, known limitations, and performance boundaries. Clear AI disclosure practices should describe when and how AI is being used and what data is being processed. Regular performance reporting can include accuracy rates, error patterns, and bias assessments.

For customer-facing applications, transparency includes providing clear indicators of AI-generated content, documenting how decisions are made, and establishing processes for users to question or challenge outputs. Maintaining detailed version histories of model updates and changes in system behavior helps track the evolution of AI capabilities and their impacts over time.

From the AWS side of the Shared Responsibility Model, transparency is supported through AWS AI Service Cards and detailed documentation of model characteristics. Amazon Bedrock enhances this with comprehensive logging and monitoring capabilities to track model behavior and performance metrics.

Unified risk management

These eight areas are interconnected and mutually reinforcing within the enterprise risk management framework. While organizations might prioritize them differently based on their use cases and risk appetite, together they provide a comprehensive approach to responsible generative AI adoption. For detailed technical guidance, standards, and compliance requirements, see the AWS guidance documents in Resources for technical implementation, at the end of this blog post, that support implementation across these areas.

AI risk management in practice: Building organizational capability

Successful implementation of generative AI systems involves integrating risk management practices across the organization. This includes establishing processes for measuring outcomes and risks and preparing the organization to adapt as technology evolves. Effective risk management depends on building appropriate knowledge and skills at all levels of the organization.

Organizations can create clear pathways from proof of concept to production by aligning with the three lines of defense model. The ERMF provides broad parameters for reliability, safety, and privacy, which business units can adapt for their specific use cases.

To build and maintain lasting capability for both current and future generative AI adoption, organizations can focus on:

  • Developing incident response plans for AI-specific scenarios
  • Building expertise through training and certification programs
  • Regular review and updates of risk management practices

These elements, when woven into the organization’s operating fabric, create sustainable practices that evolve with advancing technology and emerging risks.

Sustainable risk management: Making your ERMF generative AI-ready

Governance, risk, and compliance (GRC) leaders, Chief Risk Officers (CROs), and Chief Internal Auditors (CIAs) can provide sustained executive sponsorship for generative AI adoption. Long-term capability building extends beyond technology and innovation hubs to encompass business and control functions. Clear direction from leadership helps organizations balance generative AI opportunities with appropriate risk management.

Organizations benefit from viewing generative AI as a transformative capability that touches many functions rather than as isolated initiatives. This approach supports sustainable integration of enterprise-wide governance approaches for generative AI, avoiding the limitations of short-term projects with restricted scope and impact.

Organizations can successfully implement generative AI while maintaining their risk management obligations through controlled, well-defined use cases. TP ICAP’s Parameta division demonstrates this approach in their regulatory compliance implementation. By focusing initially on a highly regulated area, maintaining clear governance controls, and making sure there was human oversight in the compliance review process, they established a framework for responsible AI adoption. This led to creating dedicated oversight roles for AI initiatives, strengthening their governance structure for future AI implementations.

Similarly, Rocket Mortgage’s implementation of AWS services for their AI tool Rocket Logic – Synopsis demonstrates how organizations can use Amazon Bedrock for responsible AI integration at scale. This approach enabled them to maintain stringent data security and compliance measures while saving 40,000 team hours annually through automated processes.

Action checklist for sustainable generative AI implementation:

  • ERMF foundations: Assess and enhance your risk framework’s readiness for generative AI, including acceptable use guidelines and clear accountabilities
  • Technical controls: Begin with core controls such as Amazon Bedrock Guardrails and expand based on specific use cases and risk profiles
  • Organizational capability: Develop broad expertise through training and oversight mechanisms across business and control functions
  • Monitoring and measurement: Create dashboards for key risk indicators and maintain regular reviews
  • Integration strategy: Align generative AI controls with existing processes and organizational strategy

Conclusion

This two-part series has explored the critical importance of integrating generative AI governance into enterprise risk management frameworks. In Part 1, we introduced the unique risks and governance considerations associated with generative AI adoption. Part 2 has provided a comprehensive guide for adapting your ERMF to address these challenges effectively.

We’ve outlined practical strategies for scaling generative AI adoption securely and responsibly, covering key areas such as fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency. By implementing these strategies and following the action checklist provided, organizations can build sustainable practices that evolve with advancing technology and emerging risks.

Organizations that integrate generative AI governance into their ERMF as described in this post are better positioned to accelerate innovation and operational efficiency while protecting against key risks such as data exposure, model hallucinations, and regulatory non-compliance. This balanced approach enables organizations to capture the transformative potential of generative AI while maintaining the robust controls essential for financial services institutions.

For foundational concepts and risk considerations, see Part 1.

Customer success stories

Resources for technical implementation

 


If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Milind Dabhole

Milind Dabhole

Milind is a Principal Customer Solutions Manager focusing on enterprise innovation and risk governance. Before joining AWS, he spent over two decades in financial services, holding senior roles across first, second, and third lines of defense at global financial institutions. At AWS, he advises C-suite executives on cloud and AI transformation strategies that balance innovation with robust controls.

Stephen James Martin

Stephen James Martin

Steve is the Head of Financial Services Compliance and Security for EMEA and APAC. Steve Joined AWS after working for over 20 years in financial service in senior leadership roles with responsibility across Asia, the Middle East, and Europe. At AWS, he supports customers as they use the scale, security, and agility of AWS to transform the industry.

Enabling AI adoption at scale through enterprise risk management framework – Part 1

Post Syndicated from Milind Dabhole original https://aws.amazon.com/blogs/security/enabling-ai-adoption-at-scale-through-enterprise-risk-management-framework-part-1/

According to BCG research, 84% of executives view responsible AI as a top management responsibility, yet only 25% of them have programs that fully address it. Responsible AI can be achieved through effective governance, and with the rapid adoption of generative AI, this governance has become a business imperative, not just an IT concern. By implementing systematic governance approaches at the enterprise level, organizations can balance innovation with control, effectively managing the risks while harnessing the transformative potential of generative AI.

While generative AI technologies offer compelling capabilities, they also introduce new types of risks that need business oversight and management. Financial institutions face real challenges—AI-driven financial analysis tools could make investment recommendations based on biased data, leading to significant losses, while generative AI-powered customer service systems might inadvertently expose confidential customer information. The unprecedented scale and speed at which generative AI operates makes robust business controls essential. However, with the right governance approach and strategic oversight, these risks are manageable.

Part 1 of this two-part blog post guides business leaders, Chief Risk Officers (CROs), and Chief Internal Auditors (CIAs) through three critical questions:

  • What specific or unique risks does generative AI introduce and how can they be managed?
  • How should your enterprise risk management framework (ERMF) evolve to support generative AI adoption?
  • How can you build sustainable generative AI governance in an ever-changing world—what should be on your checklist?

To address these questions, organizations can use established frameworks and standards including:

These frameworks provide valuable guidance for organizations looking to implement responsible and governed AI practices.

Role of GRC leaders, CROs, and CIAs

Governance, risk and control (GRC) functions led by business leaders, CROs and CIAs are well-positioned to advance generative AI innovation in financial services institutions. These functions have successfully managed complex risks in banks for years, and their existing expertise, proven approaches, and established risk frameworks provide a strong foundation for guiding generative AI adoption. They collaborate across the three lines of defense: business leaders making implementation decisions and managing associated risks (first line), risk and compliance functions providing frameworks and oversight (second line), and internal audit providing independent assurance (third line).

If generative AI risks, both perceived and real, are managed through enterprise-wide governance practices rather than isolated project-by-project approaches, organizations can use the advantages offered by generative AI over the long term. This requires integration with the ERMF, with some practices fitting into existing structures while others need deliberate adjustments to ERMF itself to address generative AI’s unique characteristics.

New frontiers in generative AI risk management

The traditional risk landscape at the enterprise level was based on a paradigm in which risks are predicted from past exposures. Preventive controls help stop unwanted things from happening, detective controls discover when bad things slip through the preventive controls, and corrective controls take remediation actions.

Much of this paradigm is still valid in the world of generative AI. For example, access to generative AI applications needs to be managed carefully to avoid unauthorized use. All three types of the preceding controls should help prevent unauthorized use, identify potential breaches, and remedy unauthorized access when detected.

However, additional focus and attention are required in the following areas when implementing generative AI solutions:

  • Non-deterministic outputs – The non-deterministic nature of generative AI outputs poses a specific challenge. While the probabilistic nature of these systems is often useful, the risk of inaccurate output from the black box can have serious business implications, and organizations need to take conscious actions to address these risks. Organizations can address this through Amazon Bedrock Guardrails Automated Reasoning checks, which use mathematically sound verification to help prevent factual errors and hallucinations.
  • Deepfake threat – Generative AI’s ability to create authentic-looking images and documents extends beyond traditional fraudulent activities. It elevates the threat to an entirely new level, creating eerily realistic content with unprecedented ease—hence the term deepfake. This poses significant challenges for organizations in verifying document authenticity, particularly in processes like Know Your Customer (KYC).
  • Layered opacity – While enterprises are learning about generative AI, they must address risks from multi-layered AI systems where each layer generates content and makes decisions based on potentially unexplainable models, hampering traceability. For example, consider generative AI outputs from a third-party system serving as inputs to internal AI systems, creating a chain of interdependent decisions. This lack of transparency in critical decisions affecting organizational performance and customer treatment could have profound implications for enterprise trustworthiness, brand reputation, and regulatory compliance.

The following table outlines key generative AI risk areas and their potential business impacts. In Part 2, we explain how organizations can address these risks through their ERMF. Effectively managing these risks through enterprise-wide governance not only protects the organization but also forms the foundation for responsible AI adoption. Robust risk management and governance are essential prerequisites for achieving responsible AI outcomes.

For a comprehensive foundation in responsible AI implementation, see the AWS Responsible Use of AI Guide, which aligns with the governance principles that we discuss throughout this article.

Risk area Description Potential risk impact
Fairness Are the underlying data and algorithms fair and unbiased? Are the outputs leading to fair outcomes for different groups of stakeholders?
  • Discrimination lawsuits
  • Loss of trust
  • Business loss because of exclusion of segments
Explainability Can stakeholders understand the black box behavior and evaluate system outputs?
  • Legal liabilities and regulatory sanctions due to inability to explain decisions
  • Incorrect business decisions
Privacy and security Are the systems aligned with privacy regulations and security requirements?
  • Fines arising from data breaches
  • Loss of trust
  • Damage because of security incidents
Safety Are there controls to help prevent harmful system output and misuse?
  • Harmful content generation
  • Customer harm
  • Reputational damage
Controllability Are there mechanisms to monitor and steer AI system behaviour, including detection of model and data drifts?
  • Undetected degradation of service
  • Business disruption because of unreliable decisions
  • Customer harm
  • Inefficiencies arising from remediation
Veracity and robustness Can the system maintain correct outputs even with unexpected or adversarial inputs?
  • Incorrect business decisions
  • System failures under stress
  • Loss of operational reliability
Governance Are there documented accountabilities across the AI supply chain including model providers and deployers? Are users adequately trained to use systems?
  • Confusion in crisis management
  • Personal liability for executives
  • Regulatory censure for governance failures
  • System misuse by untrained staff
Transparency Can stakeholders make informed choices about their engagement with the AI system?
  • Loss of customer trust
  • Regulatory non-compliance
  • Stakeholder dissatisfaction

Remitly’s implementation of Amazon Bedrock Guardrails to protect customer personally identifiable information (PII) data and reduce hallucinations demonstrates how financial institutions can effectively manage privacy and veracity risks in generative AI applications, addressing several of the risk areas outlined above.

Conclusion

In this post, we introduced the critical importance of responsible AI governance for enterprises adopting generative AI at scale. We explored the unique risks that generative AI presents, including non-deterministic outputs, deepfake threats, and layered opacity. We outlined key risk areas such as fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency. These risks underscore the need for a robust enterprise risk management framework tailored to the challenges of generative AI.

We emphasized the crucial role of GRC leaders, CROs, and CIAs in advancing generative AI innovation while managing associated risks. By using established frameworks like the AWS Cloud Adoption Framework for AI, ISO/IEC 42001, and the NIST AI Risk Management Framework, organizations can implement responsible and governed AI practices.

In Part 2 of this series, we explore how organizations can adapt their enterprise risk management framework to address these risks effectively, including specific considerations for cloud and generative AI implementation. We’ll provide detailed guidance on making your ERMF generative AI-ready and outline practical steps for sustainable risk management.


Additional reading

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Milind Dabhole

Milind Dabhole

Milind is a Principal Customer Solutions Manager focusing on enterprise innovation and risk governance. Before joining AWS, he spent over two decades in financial services, holding senior roles across first, second, and third lines of defense at global financial institutions. At AWS, he advises C-suite executives on cloud and AI transformation strategies that balance innovation with robust controls.

Stephen James Martin

Stephen James Martin

Steve is the Head of Financial Services Compliance and Security for EMEA and APAC. Steve Joined AWS after working for over 20 years in financial service in senior leadership roles with responsibility across Asia, the Middle East, and Europe. At AWS, he supports customers as they use the scale, security, and agility of AWS to transform the industry.

Minimize risk through defense in depth: Building a comprehensive AWS control framework

Post Syndicated from Luis Pastor original https://aws.amazon.com/blogs/security/minimize-risk-through-defense-in-depth-building-a-comprehensive-aws-control-framework/

Security and governance teams across all environments face a common challenge: translating abstract security and governance requirements into a concrete, integrated control framework. AWS services provide capabilities that organizations can use to implement controls across multiple layers of their architecture—from infrastructure provisioning to runtime monitoring. Many organizations deploy multi-account environments with AWS Control Tower, or Landing Zone Accelerator to implement a foundational baseline of controls and security architecture. Once their environment is provisioned, organizations typically look to add additional detective controls from services such as AWS Security Hub and AWS Config based on security, compliance, and operational requirements. While this sequence is a great start, there are more opportunities during this time to implement layered defense-in-depth coverage to enhance your security posture.

Highly regulated industries such as fintech and financial services are often viewed as the gold standard for governance and security controls. While these sectors have established robust frameworks, there’s consistently room for improvement and valuable lessons for other industries looking to enhance their control environments. However, many organizations struggle to move beyond a basic compliance-focused approach. In our experience working with customers across various sectors, this limited perspective often stems from multiple factors, including:

  • Immediate compliance pressures
  • Resource constraints
  • Limited understanding of control maturity pathways
  • Focus on detection rather than prevention
  • A tendency to prioritize technology-agnostic controls over bult-in AWS capabilities, leading to unnecessarily complex implementations

The good news? A more comprehensive approach that uses AWS preventative, proactive, detective, and responsive controls can significantly reduce risk while decreasing operational overhead through automation.

In this post, we outline a practical framework that you can adopt to evolve your security and governance controls strategy. We explore how your organization can mature from a detection-focused security posture to a multi-layered control framework, using real-world examples across the resource lifecycle, including infrastructure-as-code testing and preventative controls such as service control policies (SCPs), resource control policies (RCPs), and declarative policies (DPs).

Drawing from best practices in highly regulated industries while incorporating modern cloud capabilities through services such as AWS Organizations and AWS Control Tower, we provide a structured framework that you can use to elevate your organization’s control environment beyond basic compliance requirements.

Customer challenges in implementing controls

Organizations face several significant challenges when attempting to implement a comprehensive control framework in AWS. Let’s explore the main obstacles:

Resource constraints and expertise gaps

Security teams often find themselves caught between limited resources and expanding responsibilities in the cloud. With constrained budgets and personnel, teams typically gravitate toward quick wins through detective controls, which appear straightforward to implement initially. While this provides immediate visibility, it can leave critical gaps in security posture. Many teams lack comprehensive expertise across all control types, particularly in implementing preventative, proactive, and responsive controls effectively. The pressure to demonstrate immediate security improvements, combined with day-to-day operational demands, frequently results in tactical solutions rather than strategic, layered security approaches.

Analysis paralysis

Deciding which tools to prioritize can be a challenge; the breadth of options and extensive capabilities available across AWS security services and third-party tools can feel overwhelming at times. Security teams struggle to determine the optimal mix of controls for their environment and where to begin implementation. This challenge is compounded by the complexity of mapping technical compliance requirements to cloud-focused capabilities and maintaining visibility into emerging threats as the security landscape evolves. The layers of abstraction created by proliferating security controls can further obscure clear decision-making, leading teams to delay critical security improvements while seeking perfect solutions.

Misunderstanding of defense in depth

Defense in depth as a concept is good, but it can be misunderstood and difficult to achieve, leading to vulnerabilities in the security architecture. A common misconception is that a single strong control, separation of duties in AWS Identity and Access Management (IAM) roles, least permission in IAM policies, and so on, provide sufficient protection. This overlooks the crucial value of implementing controls at multiple points and how different control types can be combined to create a robust security posture. Teams often miss how organizational controls like SCPs can work in harmony with workload-specific controls to achieve greater protection. The role of preventative controls in guiding technical implementations is frequently under appreciated.

Maturity journey challenges

The path to security maturity presents numerous obstacles. Many organizations remain stuck in the early stages, implementing detective controls but never progressing to preventative measures. Security controls are often implemented in isolation, without consideration for the broader security landscape. Organizations struggle to create and follow a clear roadmap for evolving their security posture, and measuring improvement over time proves challenging.

Scale and consistency issues

As AWS environments grow, maintaining consistent governance and security becomes increasingly complex. Organizations face mounting challenges in managing exceptions and special cases across their expanding infrastructure. These interrelated challenges often result in controls implementations that fail to achieve their intended risk reduction goals. You need a structured approach to overcome these obstacles and implement comprehensive security controls, which we explore in the following sections.

Strategic investment in security

While implementing comprehensive controls requires an initial investment in time and resources, the long-term benefits fundamentally transform how organizations operate.

The foundation for this transformation begins with establishing baseline controls through proven starting points such as AWS Control Tower and its customization options. AWS Control Tower provides building blocks for secure multi-account architectures with hundreds of security capabilities and proactive controls already built in. Rather than trying to create baselines from scratch by wrangling vast amounts of account-level or resource-specific controls, you can use these accelerators to rapidly establish a strong security foundation. With these baseline controls in place, this transformation extends beyond security teams to enable the entire organization to operate more efficiently. Development and operations teams can deploy faster with confidence when security guardrails are in place. Security becomes an enabler rather than a bottleneck, so that teams across the organization can innovate while maintaining a strong security posture.

As you mature your organization’s control framework through automation and layered defenses, a security transformation occurs. Security teams shift from constant firefighting to proactive risk management. Automated policy enforcement replaces manual reviews, and the time previously spent on routine tasks can be redirected to strategic initiatives.

Understanding control types and their interplay

AWS defines distinct types of controls to build a comprehensive security framework. Let’s examine each type and how they work together, using a common scenario: preventing public Amazon Simple Storage Service (Amazon S3) bucket exposure.

Preventative controls

Preventative controls establish the foundation of a secure environment by defining the policies, standards, and requirements that guide security implementations. At their core, these controls encompass corporate security policies that outline acceptable resource configurations across the organization. They work in conjunction with compliance requirements and frameworks to help maintain regulatory alignment, while architectural standards and guidelines provide technical direction for implementations. Data classification policies play a crucial role by determining specific security requirements based on data sensitivity.

To illustrate how preventative controls work in practice, consider a common S3 bucket security requirement. A typical preventative control might establish a corporate policy stating All S3 buckets must be private by default, with public access granted only through an approved exception process. This simple but effective policy sets clear expectations and requirements before a technical implementation begins.

  • Organization level: SCPs blocking public S3 bucket creation.
  • Resource level:
    • RCPs enforcing network access controls, such as requiring authenticated access or limiting requests to your organization’s network range.
    • SCPs to stop malicious overwrites of S3 objects using SSE-C encryption by blocking s3:PutObject requests with customer-provided keys unless explicitly allowed, paired with AWS IAM Roles Anywhere for short-term credential enforcement.
```json
// SCP to block SSE-C unless explicitly allowed 
{
  "Version": "2012-10-17",
  "Statement": [
 {
     "Sid": "DenySSECEncryption",
     "Effect": "Deny",
     "Action": "s3:PutObject",
     "Resource": "*",
     "Condition": {
         "Null": {
               "s3:x-amz-server-side-encryption-customer-algorithm": "false"}
            } }
  ]}

Proactive controls

Proactive controls act as an early warning system, identifying and addressing potential security issues before they manifest in your environment. These controls work by validating configurations and changes against established security requirements during the development and deployment phases. Through automated validation and policy enforcement at build and deploy time, proactive controls help prevent misconfigurations from reaching production environments, reducing the operational overhead of fixing security issues after the fact. Think of proactive controls as your first line of defense in maintaining a secure cloud environment. In AWS, these can be implemented at multiple levels:

  • Amazon S3 Block Public Access settings at the account level.
  • Policy-as-code checks in continuous integration and delivery (CI/CD) pipelines (such as CFN-Nag, or AWS Config proactive rules).
  • AWS CloudFormation hooks for pre-deployment validation and policy enforcement.
  • AWS Config rules in proactive mode to evaluate resources before creation.
  • At the resource level, you can use:
    • IAM policies restricting bucket policy modifications
    • CloudFormation Guard rules

#####################################
##           Gherkin               ##
#####################################
# Rule Identifier:
#    S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED
# Description:
#   Checks if your Amazon S3 bucket either has the Amazon S3 default encryption enabled or that the Amazon S3 bucket policy
#   explicitly denies put-object requests without server side encryption that uses AES-256 or AWS Key Management Service.
# Reports on:
#    AWS::S3::Bucket
# Evaluates:
#    AWS CloudFormation
# Rule Parameters:
#    NA
# Scenarios:
# a) SKIP: when there are no S3 resource present
# b) PASS: when all S3 resources Bucket Encryption ServerSideEncryptionByDefault is set to either "aws:kms" or "AES256"
# c) FAIL: when all S3 resources have Bucket Encryption ServerSideEncryptionByDefault is not set or does not have "aws:kms" or "AES256" configurations
# d) SKIP: when metadata includes the suppression for rule S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED
#
# Select all S3 resources from incoming template (payload)
#
let s3_buckets_server_side_encryption = Resources.*[ Type == 'AWS::S3::Bucket'
  Metadata.cfn_nag.rules_to_suppress not exists or 
  Metadata.cfn_nag.rules_to_suppress.*.id != "W41"
  Metadata.guard.SuppressedRules not exists or
  Metadata.guard.SuppressedRules.* != "S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED"
]
rule S3_BUCKET_SERVER_SIDE_ENCRYPTION_ENABLED when %s3_buckets_server_side_encryption !empty {
  %s3_buckets_server_side_encryption.Properties.BucketEncryption exists
  %s3_buckets_server_side_encryption.Properties.BucketEncryption.ServerSideEncryptionConfiguration[*].ServerSideEncryptionByDefault.SSEAlgorithm in ["aws:kms","AES256"]
  <<
    Violation: S3 Bucket must enable server-side encryption.
    Fix: Set the S3 Bucket property BucketEncryption.ServerSideEncryptionConfiguration.ServerSideEncryptionByDefault.SSEAlgorithm to either "aws:kms" or "AES256"
  >>
}

Source: aws-guard-rules-registry

Detective controls

Detective controls provide continuous visibility into your security posture by monitoring for and identifying potential security violations or unauthorized changes within your environment. While preventative controls aim to stop issues before they occur, detective controls help you maintain awareness of your security state and can identify when preventative controls have been bypassed or failed. These controls form a critical layer of defense by enabling rapid identification of security issues and providing the visibility needed for effective incident response and compliance reporting. While many organizations start and stop here, detective controls are only part of the solution:

  • AWS Config rules monitoring for public buckets
  • Security Hub findings to flag non-compliant resources
  • AWS IAM Access Analyzer evaluations

Responsive controls

Responsive controls complete the security lifecycle by providing automated and manual mechanisms to address security issues after they’re detected. These controls define and implement the actions taken when security violations are identified, ranging from automated remediation of common misconfigurations to coordinated incident response procedures for complex security events. By establishing clear response patterns and using automation where appropriate, responsive controls help facilitate consistent and timely handling of security issues while reducing the mean time to remediation. Responsive controls address violations when they occur:

The power comes not from implementing these controls in isolation, but from using them together in a coordinated way. This layered approach begins with preventative controls to establish the requirements, followed by proactive controls to block most potential violations at the source. Issues that manage to slip through are caught by detective controls, while responsive controls automatically remediate identified problems. Throughout this process, comprehensive documentation tracks issues, remediation plans, and progress, such as through a plan of action and milestones (POAM), helping to make sure that compliance requirements are met and improvements can be measured over time.

Implementation lifecycles: Ideal compared to reality

You can follow one of two paths when implementing security controls: starting fresh with a comprehensive approach or evolving from an existing detective-focused implementation. Let’s examine both scenarios.

Starting fresh: The ideal approach

When starting from scratch, you have a unique opportunity to build your security and governance following an ideal approach. Your team can take advantage of this clean slate to architect controls and processes methodically, free from legacy constraints. The following steps offer guidance though establishing a strong foundation while maintaining the flexibility you need as your business grows.

  • Rationalize controls against requirements and risk profile:
    • Choose appropriate security frameworks (for example, CIS and NIST).
    • Map compliance, regulatory, legal, and contractual requirements to your base framework.
    • Define clear security objectives and success criteria for your security and compliance program.
  • Design a comprehensive control strategy:
    • Document control requirements across all four types (preventive, proactive, detective, and responsive controls). You can use the framework to decide which controls are best for each type of requirement.
    • Plan implementation phases and priorities.
    • Define metrics for measuring effectiveness.
  • Implement controls in layers:
    • Start with AWS Control Tower, which gives you foundational controls to mature from. You can add customizations if required.
    • Think about additional preventative controls that can help establish a stronger security and compliance posture.
    • Deploy proactive controls to stop violations.
    • Add detective controls as safeguards.
    • Implement responsive controls for automated or manual remediation.
  • Monitor and assess effectiveness
    • Evaluate control performance against defined metrics.
    • Identify gaps and areas for improvement.
    • Adjust controls based on emerging threats and changing requirements.
    • Implement continuous improvement feedback loop.

Evolution from detective controls: The common path

Most organizations find themselves starting with detective controls and face challenges in maturing from there:

  • Initial state:
    • Baseline detective controls through Security Hub and AWS Config
    • Manual remediation processes
    • Limited visibility into security posture
  • Maturation steps:
    • Analyze findings to identify patterns
    • Implement automated remediation for common issues
    • Add preventative and proactive controls based on recurring events
    • Periodically refine and update policies
  • Optimization:
    • Review control effectiveness
    • Identify gaps in coverage
    • Implement additional preventative, proactive, detective, and responsive measures
    • Automate processes where possible

The goal: Comprehensive and layered security controls

The goal of implementing security controls across multiple layers isn’t just about compliance or following best practices—it’s about creating a robust, resilient security posture that can effectively help prevent, detect, and respond to security issues. Let’s explore why this approach is crucial:

Why multiple control layers matter

Security controls shouldn’t exist in isolation. When implementing a security requirement, you should consider:

  • How can we prevent this issue from occurring?
  • How will we detect if our preventative controls fail?
  • What should happen when we detect a violation?
  • What policies and standards guide these decisions?

Moving beyond detection

While detective controls are important, they signal that a security violation has already occurred. A mature security posture requires:

  • Strong preventative controls to stop violations before they happen
  • Detective controls as a safety net if there is drift or a violation
  • Automated remediation where possible, to reduce exposure time
  • Clear policies to guide implementation and decisions
  • Measuring success

You should measure the effectiveness of your control framework through several key performance indicators. Success can be seen in the steady reduction of security findings over time, coupled with decreasing time-to-remediation metrics. The maturity of the framework becomes evident through an increasing percentage of automated remediation activities and a declining number of recurring issues. These improvements manifest in better audit outcomes, providing tangible evidence that the control framework is delivering its intended results.

Practical implementation: From theory to practice

Let’s examine how to implement a comprehensive control framework using a common security requirement: preventing exposure of sensitive data through public S3 buckets. This example demonstrates how different control types work together to create defense in depth. While not every control might be necessary for every situation, each should be carefully considered and evaluated based on various factors including system criticality, data sensitivity, operational overhead, and organizational risk tolerance. The decision to implement or omit specific controls should be deliberate and documented, rather than occurring by default.

The architecture will have layers and components like the following.

  • Preventative layer:
    • Service control policies (SCPs) or resource control policies (RCPs)
    • S3 Block Public Access
    • IAM policies
  • Detective layer:
  • Responsive layer:
    • AWS Config auto-remediation
    • Lambda functions
    • Systems Manager automation

Building controls at each layer

An effective security and compliance strategy includes all four types of security controls. While preventative controls are a first line of defense to help prevent unauthorized access or unwanted changes to your network, it’s important to make sure that you establish detective and responsive controls so that you know when an event occurs and can take immediate and appropriate action to remediate it. Using proactive controls adds another layer of security because it complements preventative controls, which are generally stricter in nature.

Begin by defining your security objectives, then establish clear policies to meet those objectives:

  • Define organizational and business objectives:
    • Identify data protection goals
    • Determine acceptable risk levels
    • Align with compliance requirements
  • Establish clear policies:
    • For example, document business requirements for external data sharing and access controls in security policies. These requirements will drive technical decisions around AWS storage configurations such as S3 bucket policies and public access settings.
    • Define permitted use cases for public access.
    • Establish exception processes.
    • Set clear ownership and responsibilities.

Deploy preventative guardrails:

  • Organization level:
    • SCPs to block public bucket creation at the organization level
    • Account-level S3 Block Public Access settings to enforce account-level restrictions
  • Resource level:
    • IAM policies restricting bucket policy modifications
    • S3 bucket policy templates with controlled deployment
    • RCPs to enforce rules on specific resource types across your organization

Deploy proactive guardrails:

  • Infrastructure as code:
    • Implement policy-as-code checks in CI/CD pipelines using:
      • CloudFormation Guard
      • cfn-nag
      • AWS Config proactive rules
    • Integrate with pull request workflows
  • AWS Control Tower proactive controls:
    • Enable relevant optional AWS Control Tower guardrails

Add detective controls by creating a monitoring framework:

  • AWS CloudTrail for comprehensive API activity logging and auditing to enable investigation of unauthorized access attempts and configuration changes.
  • AWS Config rules for bucket configuration. AWS Config rules or AWS Config conformance packs deployed for the entire organization can monitor S3 bucket configurations for compliance.
  • Security Hub findings for continuous assessment by aggregating findings and flagging non-compliant resources.
  • Amazon EventBridge rules for policy changes to detect and route S3 bucket policy modifications.
  • IAM Access Analyzer for external access review.
  • Regular compliance reporting, which can be automated through AWS Audit Manager.

Implement responsive controls by automating remediation where possible:

  • Security Hub and Systems Manager integration to automate incident response workflows.
  • Custom Lambda functions for specific use cases.
  • Integration with ITSM for human review when needed.
  • AWS Config remediation rules For example, the AWSConfigRemediation-ConfigureS3PublicAccessBlock runbook configures an AWS account’s S3 Block Public Access settings based on the values you specify in the runbook parameters.

The following table describes control types, what a basic implementation includes, and the services and methods used for advanced implementation.

Control type Basic implementation Advanced implementation
Preventative Documentation, peer reviews SCPs, RCPs, DPs, IAM policies, and S3 Block Public Access
Detective Security Hub, AWS Config rules Security Hub, AWS Config, and CloudWatch alerts
Responsive Manual remediation Auto-remediation through AWS Config, Systems Manager, EventBridge, and Lambda
Compliance One-time checks CIS/NIST mapping with Security Hub and automation of evidence collection and reporting using AWS Audit Manager
Automation Limited Full CI/CD Integration (for example, using CloudFormation or Terraform)
Cost optimization effort High (manual effort) Low (automation reduces overhead)

Scaling and management considerations

As your security and governance program matures, scaling these controls across a growing organization requires thoughtful management and automation. This section explores key considerations for effectively managing your security posture at scale, optimizing costs, and maintaining consistency across your AWS environment. Whether you’re expanding across multiple accounts, business units, or AWS Regions, these practices help you balance security requirements with operational efficiency and cost management.

Use AWS services effectively:

  • Consider deploying AWS Control Tower for consistent account setup and centrally deploying and managing controls at scale across multiple use cases and organizational units.
  • AWS Organizations can aid hierarchical policy management and the implementation of:
    • IAM policies for identity-based guardrails and permissions
    • SCPs for access guardrails
    • RCPs define permissions based on resource attributes
    • DPs to help facilitate consistent resource configurations across your organization
    • Tag policies for consistent resource categorization
    • Backup policies for data protection standards
    • AI service opt-out policies for data privacy requirements
    • Cost allocation tag policies to standardize cost attribution
    • Data residency policies to enforce regional restrictions
  • Implement resource governance through policy integration
    • For example, use Organizations tag policies to enforce a Confidential tag on S3 buckets storing personally identifiable information (PII). Combine this with SCPs that mandate AES-256 encryption for tagged buckets, overriding developer attempts to disable it.
    • Using backup policies to enforce retention rules (for example, Retention=7 years).
    • Use DPs to help maintain consistent security configurations across resources, such as enforcing encryption settings on Amazon Elastic Block Store (Amazon EBS) volumes or requiring specific security group rules.
  • Centralize logging and monitoring

Manage compliance exceptions:

  • Implement clear exception processes
  • Document and track approved exceptions
  • Establish regular periodic reviews of exceptions
  • Use time-bound approvals with automated expiration

Optimize costs:

  • Use periodic instead of continuous checking where appropriate
  • Implement targeted monitoring based on resource criticality
  • Use AWS Config recordings effectively
  • Balance automation costs against manual effort

Conclusion: Moving forward with control maturity

Implementing a comprehensive control framework is a journey, not a destination. Start from your organization’s current position, whether that’s with basic detective controls or a fresh implementation, and focus on progressive improvement rather than attempting to implement everything at once. Success comes from carefully documenting decisions about control implementation, regularly reviewing them, and using automation to reduce operational overhead while improving consistency. Progress can be measured through concrete metrics: reduced findings, faster remediation times, and increased automation.

Remember that the goal extends beyond better security—it’s about transforming security and governance from a reactive operation to a strategic enabler that provides real business value. This transformation manifests through reduced risk from systematic controls, improved operational efficiency through automation, and enhanced visibility and governance. Perhaps most importantly, it frees security teams to focus on strategic initiatives rather than routine operational tasks.

By following this approach, you can build a robust security and governance posture that not only protects your organization’s AWS environment but also supports business innovation and growth. The result is a security program that evolves alongside the business, enabling rather than hindering progress, while maintaining a strong approach that can scale with your organization’s needs.


If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Luis Pastor

Luis Pastor

Luis is a Senior Security Solutions Architect at AWS specializing in infrastructure security and compliance. He leads Technical Field Communities focused on security and compliance while contributing to AWS Well-Architected guidance. Before AWS, he helped clients across financial services, healthcare, and retail industries improve their security posture in hybrid environments. Outside of work, Luis enjoys staying active and culinary adventures.

Rodolfo Brenes

Rodolfo Brenes

Rodolfo is a Principal Solutions Architect focused on Cloud Governance and Compliance. With over 18 years of experience, he currently leads a technical field community in AWS helping customers scale and improve their security and governance frameworks. Besides work, Rodolfo enjoys video games, playing with his four cats, and won’t say no to a good outdoor adventure.

George'son Tib.

George’son Tib.

George’son is a Solutions Architect focused on Infrastructure Security at AWS, working with Enterprise customers in the Auto and Manufacturing Industry. He specializes in helping organizations build robust, automated control frameworks that enhance their security posture and drive operational efficiency.

Adopting Amazon Q Developer in Enterprise Environments

Post Syndicated from Rene-Martin Tudyka original https://aws.amazon.com/blogs/devops/adopting-amazon-q-developer-in-enterprise-environments/

Increasing developer productivity has been a persistent challenge for senior leaders over the past decades. With the rise of generative artificial intelligence (AI), a new wave of innovation is transforming how software teams work. Generative AI tools like Amazon Q Developer are emerging as game-changers, supporting developers across the entire software development lifecycle. But how are large-scale organizations successfully adopting this AI-assisted approach? This document shares good practices I have discovered through working with enterprise customers navigating this technological transformation.

A common misperception still is that a developer is more productive when they write code faster. But a median developer spends less than one hour per day writing code, as a study conducted by software.com in 2022 shows. This makes clear that there are other aspects to consider when it comes to building an application and running it in production. Generative AI tools for developers, such as Amazon Q Developer, started as coding companions that provided inline completions. As the technology evolved, Amazon Q Developer is now able to support developers across the entire software development life cycle.

The Change Challenge

Generative AI offers new and interesting ways for developers to solve challenges and to support them in their daily work. But taking advantage of those opportunities takes some time. It introduces a noticeable change to their familiar ways of working and therefore it is much about forming a new habit. This, according to science, usually takes a minimum of two months. Teams need the space and permission to play with this new approach and to find out what works best for them. Expecting them to adopt a new way of working while expecting the same (or higher) level of output at the same time will only lead to teams falling back to what they know works. For customers that successfully have adopted Amazon Q Developer I have seen them reducing the delivery expectations to give space to learn, and having required teams to share learnings in return.

Additionally, as in any other large change project there is a significant cultural aspect to consider. If people feel intrinsically convinced by the value and benefits of an AI-supported approach to software engineering they will use it. “Simply ordering from above” will not help making an adoption successful. But building community, experimenting, learning, and sharing successes will. “Culture eats strategy for breakfast”, as the management visionary Peter Drucker said.

Keeping that in mind it is clear that there is no one-size-fits-all prescriptive way of successfully introducing generative AI tools for developers in your organization. It very much depends on your individual culture, goals, challenges, people, skills, and technology stack for example. However, there are a number of principles and good practices that work well with several of our large-scale customers who have successfully adopted Amazon Q Developer.

Best Practices for Successful Adoption

The following illustration describes the change management cycle and recommended activities for adopting AI-assisted software development.

The change management cycle – align, plan, roll out, reflect and repeat

Figure 1. The change management cycle

1. Secure Top-Down Commitment

Secure executive buy-in for adopting AI-assisted software development because this is a powerful sign for the organization. It helps to remove roadblocks and to promote successes across the organization. An executive sponsor links the adoption of AI-assisted software development to the strategic goals of the organization, will help resolving prioritization and capacity conflicts. Invite the sponsor to a kick-off meeting. Include them for the discussion of goals and success criteria, and keep them updated on progress, success stories, and challenges.

2. Become Clear on Goals and Success Criteria

Simply rolling out Generative AI tools to a large developer base won’t be enough. You need to be clear what you expect from such an adoption for your organization, for your developers, and for your business. It is important for you to understand your organization’s pain points from a developer experience perspective and how they affect your development productivity. These likely differ between different personas, like Software Developers and DevOps Engineers. For example, are your applications lacking test coverage or is your code lacking documentation, which creates a maintenance burden, and makes it difficult to onboard new developers? Is handling legacy code risky and time-consuming so that you avoid necessary upgrades because of the complexity and effort? Is troubleshooting applications locally or in production eating up your developers’ time? Or are you facing challenges caused by hard-to-find security issues in the code? What is it that you want to achieve by introducing an AI-assisted development approach, and why?

3. Establish Ownership

Adopting a new approach like the use of generative AI in software development at scale leads to many change management and coordination tasks. Those are related to getting access to the tool, enablement for new users, budget planning, measuring success and creating transparency on problems, creating momentum and making sure the adoption sustains, amongst others. Therefore, introducing a Customer Champion as a leader who coordinates the business and technology related aspects of the adoption is a common approach. They will connect the strategic goals of the organization with the tactical activities necessary for the implementation. If your adoption is spanning multiple different business units with larger developer bases consider establishing this role for each of them, forming a team that collaborates across your whole organization. Key responsibilities of the Customer Champion are to bring the right people together to successfully work on the adoption, to create transparency on status, and to address impediments early on.

4. Introduce Metrics

Once you have gained clarity on the specific pain points and goals for your organization, the next step is to determine the appropriate metrics to measure the success of your efforts. For example, you can measure the impact of using Amazon Q Developer on onboarding new developers by tracking the time to first commit against a previously established baseline average. Comparing the time it takes to complete certain tasks – like writing and integrating code, fixing bugs, setting up or upgrading environments, the time it takes to build a new feature, or by comparing sprint velocity before and after the introduction of Amazon Q Developer will give you further indications. But keep in mind that there is ambiguity in these metrics because they are impacted by different factors.

Focusing on developer experience and productivity, monitor the development of your established measurement framework, like DORA, SPACE, or GSM. SPACE in particular, with its “Satisfaction & Well-Being” dimension, pays close attention to how software developers perceive their work and how satisfied they are with their own productivity. Tools like Amazon Q Developer have shown a positive effect here, as for example a McKinsey study shows. They help to free developers from tasks that are perceived as toil, like repetitive and boring grunt work that doesn’t necessarily bring business value. To measure this impact on perceived productivity, customers sometimes design simple surveys, asking developers questions like “in percentage, how would you estimate the impact of using Amazon Q Developer on your productivity” or “on a scale from 1-5, how is Amazon Q Developer impacting the satisfaction with your day-to-day work?”

As a last dimension, understanding the general usage of Amazon Q Developer across your organization is important. Monitor the number active subscriptions, accepted code recommendations, or the number of executed security scans from the Amazon Q Developer dashboard. That way you will get an understanding for the acceptance of the tool in your developer base. Correlate this information with the success metrics you defined.

5. Start Small

When “rolling out” an AI-assisted software development approach, keep the technology adoption lifecycle and Everett Roger’s bell curve in mind. It describes that adoption of a new technology usually starts with a few “innovators”, followed by a small fraction of “early adopters”. Only if those early groups demonstrate convincing success, the majority of users will follow the adoption. To settle on this model will help you making the adoption of AI-assisted development successful in your organization.

Start small. Identify your tech innovators, or champions, who are enthusiastic to support the introduction of Amazon Q Developer and to advocate the approach in your organization. With them, build a team or a Center of Excellence (COE) that will help with identifying early adopters across your development teams, building technical and enablement foundations for onboarding users, creating roll-out plans, and evangelizing and sustaining the adoption. The champions will act as a bridge between your adoption program team and your users. They can provide insights and feedback on the adoption and come up with recommendations.

6. Create Momentum

Now that you have your foundations in place it is time to create momentum and onboard your early adopters to Amazon Q Developer. Start with a communication plan to raise awareness, to share updates and success stories, and to collect feedback from the users. You will need to create training material covering organization-specific resources (how to get access, for example, or which communication channels and contact points exist), user guides, tutorials, and pointers to relevant documentation and online resources. Consider creating your own prompt library, documenting prompts for Amazon Q Developer that your users find particularly helpful. There are community projects like promptz.dev that can deliver inspiration. Especially for new users this will have a lot of value for getting up to speed.

Consider how to integrate learning pathways with your organization’s learning platform moving forward. These should include the company-specific experience and knowledge you collect. In addition, it might be helpful to integrate external offers, like classroom trainings or workshop formats offered by AWS.

Hands-on technical enablement workshops will help your early adopters jumpstarting their experience. AWS offers multiple formats that can be tailored to your individual context. These include service deep-dives with Amazon Q Developer specialists, Immersion Days and hackathon support for teams to hands-on experience the tool in a guided, interactive format, or joined proof-of-value engagements. Your AWS account team will provide you more information.

7. Sustain and Scale Adoption

At that point you will have established Amazon Q Developer among an initial segment of your developer base. However, keeping Roger’s adoption curve in mind the largest part of the adoption still lies ahead of you. To facilitate the it across your whole organization, focus on delivering enablement workshops for the teams to be onboarded. These will be led by your champions and incorporate your organization-specific practices and knowledge.

To sustain the adoption, make sure the existing user base stays active. Continue offering exchange and support, collecting feedback, updating your enablement material, sharing updates on the latest developments for Amazon Q Developer, and reviewing your metrics. Create visibility across your organization for the accomplishments and positive experiences. Let user talk about the impact Amazon Q Developer has on their work. This will help building momentum. Nurture community building around your users of AI-assisted development.

Establish regular office hours with your champions for providing support and enablement to your users of Amazon Q Developer. This will facilitate the continuous gathering of feedback to improve documentation and enablement materials, collect and promote success stories, and validate the adoption approach. Additionally, establish a consistent communications and reporting cadence to keep all relevant stakeholders informed of key metrics, updates, feedback, and success stories. This ensures the alignment of the adoption of AI-assisted development with your strategic goals and expectations.

8. Inspect and Adapt

In addition, keep reflecting on the goals and metrics you came up with from a strategic perspective. Are they developing in the right direction? Are your pain points still the same? Should you move your focus to different aspects of the software development life cycle? And how might new capabilities of Amazon Q Developer support your goals?

Conclusion

By following the outlined approach, large organizations can successfully navigate the challenges of adopting Amazon Q Developer. The approach addresses technical, cultural, and organizational aspects of the adoption, increasing the likelihood of realizing the full potential of AI-assisted development across the enterprise.

If you are looking for advice or support during your adoption journey, your AWS Account Team will connect you with experts on Amazon Q Developer, provide you information on training and enablement, and support you with setting up a successful rollout program for your organization. To learn more about how Amazon Q Developer’s capabilities support the whole software development lifecycle, visit the product page or have a look at the documentation.

About the Author

Rene-Martin Tudyka is a Senior Customer Solutions Manager at AWS and provides guidance and support for enterprise customers on their cloud maturity journey. He has a long background in developing highly performant IT organizations and in successful large-scale cloud adoption.

Making sense of secrets management on Amazon EKS for regulated institutions

Post Syndicated from Piyush Mattoo original https://aws.amazon.com/blogs/security/making-sense-of-secrets-management-on-amazon-eks-for-regulated-institutions/

Amazon Web Services (AWS) customers operating in a regulated industry, such as the financial services industry (FSI) or healthcare, are required to meet their regulatory and compliance obligations, such as the Payment Card Industry Data Security Standard (PCI DSS) or Health Insurance Portability and Accountability Act (HIPPA).

AWS offers regulated customers tools, guidance and third-party audit reports to help meet compliance requirements. Regulated industry customers often require a service-by-service approval process when adopting cloud services to make sure that each adopted service aligns with their regulatory obligations and risk tolerance. How financial institutions can approve AWS services for highly confidential data walks through the key considerations that customers should focus on to help streamline the approval of cloud services. In this post we cover how regulated customers, especially FSI customers, can approach secrets management on Amazon Elastic Kubernetes Service (Amazon EKS) to help meet data protection and operational security requirements. Amazon EKS gives you the flexibility to start, run, and scale Kubernetes applications in the AWS Cloud or on-premises.

Applications often require sensitive information such as passwords, API keys, and tokens to connect to external services or systems. Kubernetes has secrets objects for managing these types of sensitive information. Additional tools and approaches have evolved to supplement the Kubernetes Secrets to help meet the compliance requirements of regulated organizations. One of the driving forces behind the evolution of these tools for regulated customers is that the native Kubernetes Secrets values aren’t encrypted but encoded as base64 strings; meaning that their values can be decoded by a threat actor with either API access or authorization to create a pod in a namespace containing the secret. There are options such as GoDaddy Kubernetes External Secrets, AWS Secrets and Configuration Provider (ASCP) for the Kubernetes Secrets Store CSI Driver, Hashicorp Vault, and Bitnami Sealed secrets that you can use to can help to improve the security, management, and audibility of your secrets usage.

In this post, we cover some of the key decisions involved in choosing between External Secrets Operator (ESO), Sealed Secrets, and ASCP for the Kubernetes Secrets Store Container Storage Interface (CSI) Driver, specifically for FSI customers with regulatory demands. These decision points are also broadly applicable to customers operating in other regulated industries.

AWS Shared Responsibility Model

Security and compliance is a shared responsibility between AWS and the customer. The AWS Shared Responsibility Model describes this as security of the cloud and security in the cloud:

  • AWS responsibility – Security of the cloud: AWS is responsible for protecting the infrastructure that runs the services offered in the AWS Cloud. For Amazon EKS, AWS is responsible for the Kubernetes control plane, which includes the control plane nodes and etcd database. Amazon EKS is certified by multiple compliance programs for regulated and sensitive applications. The effectiveness of the security controls are regularly tested and verified by third-party auditors as part of the AWS compliance programs.
  • Customer responsibility – Security in the cloud: Customers are responsible for the security and compliance of customer configured systems and services deployed on AWS. This includes responsibility for securely deploying, configuring and managing ESO within their Amazon EKS cluster. For Amazon EKS, the customer responsibility depends upon the worker nodes you pick to run your workloads and cluster configuration as shown in Figure 1. In the case of Amazon EKS deployment using Amazon Elastic Compute Cloud (Amazon EC2) hosts, the customer responsibility includes the following areas:
    • The security configuration of the data plane, including the configuration of the security groups that allow traffic to pass from the Amazon EKS control plane into the customer virtual private cloud (VPC).
    • The configuration of the nodes and the containers themselves.
    • The nodes’ operating system, including updates and security patches.
    • Other associated application software:
    • The sensitivity of your data, such as personally identifiable information (PII), keys, passwords, and tokens
      • Customers are responsible for enforcing access controls to protect their data and secrets.
      • Customers are responsible for monitoring and logging activities related to secrets management including auditing access, detecting anomalies and responding to security incidents.
    • Your company’s requirements, applicable laws and regulations
    • When using AWS Fargate, the operational overhead for customers is reduced in the following areas:
      • The customer is not responsible for updating or patching the host system.
      • Fargate manages the placement and scaling of containers.
Figure 1: AWS Shared Responsibility Model with Fargate and Amazon EC2 based workflows

Figure 1: AWS Shared Responsibility Model with Fargate and Amazon EC2 based workflows

As an example of the Shared Responsibility Model in action, consider a typical FSI workload accepting or processing payments cards and subject to PCI DSS requirements. PCI DSS v4.0 requirement 3 focuses on guidelines to secure cardholder data while at rest and in transit:

Control ID Control description
3.6 Cryptographic keys used to protect stored account data are secured.
3.6.1.2 Store secret and private keys used to encrypt and decrypt cardholder data in one (or more) of the following forms:

  • Encrypted with a key-encrypting key that is at least as strong as the data-encrypting key, and that is stored separately from the data-encrypting key.
  • Stored within a secure cryptographic device (SCD), such as a hardware security module (HSM) or PTS-approved point-of-interaction device.
  • Has at least two full-length key components or key shares, in accordance with an industry-accepted method. Note: It is not required that public keys be stored in one of these forms.
3.6.1.3 Access to cleartext cryptographic key components is restricted to the fewest number of custodians necessary.

NIST frameworks and controls are also broadly adopted by FSI customers. NIST Cyber Security Framework (NIST CSF) and NIST SP 800-53 (Security and Privacy Controls for Information Systems and Organizations) include the following controls that apply to secrets:

Regulation or framework Control ID Control description
NIST CSF PR.AC-1 Identities and credentials are issued, managed, verified, revoked, and audited for authorized devices, users and processes.
NIST CSF PR.DS-1 Data-at-rest is protected.
NIST 800-53.r5 AC-2(1)
AC-3(15)
Secrets should have automatic rotation enabled.
Delete unused secrets.

Based on the preceding objectives, the management of secrets can be categorized into two broad areas:

  • Identity and access management ensures separation of duties and least privileged access.
  • Strong encryption, using a dedicated cryptographic device, introduces a secure boundary between the secrets data and keys, while maintaining appropriate management over the cryptographic keys.

Choosing your secrets management provider

To help choose a secrets management provider and apply compensating controls effectively, in this section we evaluate three different options based on the key objectives derived from the PCI DSS and NIST controls described above and other considerations such as operational overhead, high availability, resiliency, and developer or operator experience.

Architecture and workflow

The following architecture and component descriptions highlight the different architectural approaches and responsibilities of each solution’s components, ranging from controllers and operators, command-line interface (CLI) tools, custom resources, and CSI drivers working together to facilitate secure secrets management within Kubernetes environments.

External Secrets Operator (ESO) extends the Kubernetes API using a custom resource definition (CRD) for secret retrieval. ESO enables integration with external secrets management systems such as AWS Secrets Manager, HashiCorp Vault, Google Secrets Manager, Azure Key Vault, IBM Cloud Secrets Manager, and various other systems. ESO watches for changes to an external secret store and keeps Kubernetes secrets in sync. These services offer features that aren’t available with native Kubernetes Secrets, such as fine-grained access controls, strong encryption, and automatic rotation of secrets. By using these purpose-built tools outside of a Kubernetes cluster, you can better manage risk and benefit from central management of secrets across multiple Amazon EKS clusters. For more information, see the detailed walkthrough of using ESO to synchronize secrets from Secrets Manager to your Amazon EKS Fargate cluster.

ESO is comprised of a cluster-side controller that automatically reconciles the state within the Kubernetes cluster and updates the related secrets anytime the external API’s secret undergoes a change.

Figure 2: ESO workflow

Figure 2: ESO workflow

Sealed Secrets is an open source project by Bitnami comprised of a Kubernetes controller coupled with a client-side CLI tool with the objective to store secrets in Git in a secure fashion. Sealed Secrets encrypts your Kubernetes secret into a SealedSecret, which can also be deployed to a Kubernetes cluster using kubectl. For more information, see the detailed walkthough of using tools from the Sealed Secrets open source project to manage secrets in your Amazon EKS clusters.

Sealed Secrets comprises of three main components: First, there is an operator or a controller which is deployed onto a Kubernetes cluster. The controller is responsible for decrypting your secrets. Second, you have a CLI tool called Kubeseal that takes your secret and encrypts it. Third, you have a CRD. Instead of creating regular secrets, you create SealedSecrets, which is a CRD defined within Kubernetes. That is how the operator knows when to perform the decryption process within your Kubernetes cluster.

Upon startup, the controller looks for a cluster-wide private-public key pair and generates a new 4096-bit RSA public-private key pair if one doesn’t exist. The private key is persisted in a secret object in the same namespace as the controller. The public key portion of this is made publicly available to anyone wanting to use Sealed Secrets with this cluster.

Figure 3: Sealed Secrets workflow

Figure 3: Sealed Secrets workflow

The AWS Secrets Manager and Config Provider (ASCP) for Secret Store CSI driver is an open source tool from AWS that allows secrets from Secrets Manager and Parameter Store, a capability of AWS Systems Manager, to be mounted as files inside Amazon EKS pods. It uses a CRD called SecretProviderClass to specify which secrets or parameters to mount. Upon a pod start or restart, the CSI driver retrieves the secrets or parameters from AWS and writes them to a tmpfs volume mounted in the pod. The volume is automatically cleaned up when the pod is deleted, making sure that secrets aren’t persisted. For more information, see the detailed walkthrough on how to set up and configure the ASCP to work with Amazon EKS.

ASCP comprises of a cluster-side controller acting as the provider, allowing secrets from Secrets Manager, and parameters from Parameter Store to appear as files mounted in Kubernetes pods. Secrets Store CSI Driver is a DaemonSet with three containers: node-driver-registrar, which registers the CSI driver with Kubelet; secrets-store, which implements the CSI Node service gRPC services for mounting and unmounting volumes during pod creation and deletion; and  liveness-probe, which monitors the health of the CSI driver and reports to Kubernetes for automatic issue detection and pod restart.

Figure 4: AWS Secrets Manager and configuration provider

Figure 4: AWS Secrets Manager and configuration provider

In the next section, we cover some of the key decisions involved in choosing whether to use ESO, Sealed Secrets, or ASCP for regulated customers to help meet their regulatory and compliance needs.

Comparing ESO, Sealed Secrets, and ASCP objectives

All three solutions address different aspects of secure secrets management and aim to help FSI customers meet their regulatory compliance requirements while upholding the protection of sensitive data in Kubernetes environments.

ESO synchronizes secrets from external APIs into Kubernetes, targeting the cluster operator and application developer personas. The cluster operator is responsible for setting up ESO and managing access policies. The application developer is responsible for defining external secrets and the application configuration.

Sealed Secrets encrypts your Kubernetes secrets before storing them in version control systems such as public Git repositories. This is the case if you decide to check in your Kubernetes manifest to a Git repository granting access to your sensitive secrets to anyone who has access to the Git repository. This is ultimately the reason why Sealed Secrets was created and the sealed secret can be decrypted only by the controller running in the target cluster.

Using ASCP, you can securely store and manage your secrets in Secrets Manager and retrieve them through your applications running on Kubernetes without having to write custom code. Secrets Manager provides features such as rotation, auditing, and access control that can help FSI customers meet regulatory compliance requirements and maintain a robust security posture.

Installation

The deployment and configuration details that follow highlight the different approaches and resources used by each solution to integrate with Kubernetes and external secret stores, catering to the specific requirements of secure secrets management in containerized environments.

ESO provides Helm charts for ease of operator deployment. External Secrets provides custom resources like SecretStore and ExternalSecret for configuring the required operator functionality to synchronize external secrets to your cluster. For instance, SecretStore can be used by the cluster operator to be able to connect to AWS Secrets Manager using appropriate credentials to pull in the secrets.

To install Sealed Secrets, you can deploy the Sealed Secrets Controller onto the Kubernetes cluster. You can deploy the manifest by itself or you can use a Helm chart to deploy the Sealed Secrets Controller for you. After the controller is installed, you use the Kubeseal client-side utility to encrypt secrets using asymmetric cryptography. If you don’t already have the Kubeseal CLI installed, see the installation instructions.

ASCP provides Helm charts to assist in operator deployment. The ASCP operator provides custom resources such as SecretProviderClass to provide provider-specific parameters to the CSI driver. During pod start and restart, the CSI driver will communicate with the provider using gRPC to retrieve the secret content from the external secret store you specified in the SecretProviderClass custom resource. Then the volume is mounted in the pod as tmpfs and the secret contents are written to the volume.

Encryption and key management

These solutions use robust encryption mechanisms and key management practices provided by external secret stores and AWS services such as AWS Key Management Service (AWS KMS) and Secrets Manager. However, additional considerations and configurations might be required to meet specific regulatory requirements, such as PCI DSS compliance for handling sensitive data.

ESO relies on encryption features within the external secrets management system. For instance, Secrets Manager supports envelope encryption with AWS KMS which is FIPS 140-2 Level 3 certified. Secrets Manager has several compliance certifications making it a great fit for regulated workloads. FIPS 140-2 Level 3 ensures only strong encryption algorithms approved by NIST can be used to protect data. It also defines security requirements for the cryptographic module, creating logical and physical boundaries.

Both AWS KMS and Secrets Manager help you to manage key lifecycle and to integrate with other AWS Services. In terms of key rotation, both provide automatic rotation of secrets that runs on a schedule (which you define), and abstract the complexity of managing different versions of keys. For AWS managed keys, the key rotation happens automatically once every year by default. With customer managed keys (CMKs), automatic key rotation is available but not enabled by default.

When using SealedSecrets, you use the Kubeseal tool to convert a standard Kubernetes Secret into a Sealed Secrets resource. The contents of the Sealed Secrets are encrypted with the public key served by the Sealed Secrets Controller as described in the Sealed Secrets project homepage.

In the absence of cloud native secrets management integration, you might have to add compensating controls to achieve the regulatory standards required by your organization. In cases where the underlying SealedSecrets data is sensitive in nature, such as cardholder PII, PCI requires that you store sensitive secrets in a cryptographic device such as a hardware security module (HSM). You can use Secrets Manager to store the master key generated to seal the secrets. However, this you will have to enable additional integration with Amazon EKS APIs to fetch the master key securely from the EKS cluster. You will also have to modify your deployment process to use a master key from Secrets Manager. The applications running in the EKS cluster must have permissions to fetch the SealedSecret and master key from Secrets Manager. This might involve configuring the application to interact with Amazon EKS APIs and Secrets Manager. For non-sensitive data, Kubeseal can be used directly within the EKS cluster to manage secrets and sealing keys.

For key rotation, you can store the controller generated private key in Parameter Store as a SecureString. You can use the advanced tier in Parameter Store if the file containing the private keys exceeds the Standard tier limit of up to 4,096 characters. In addition, if you want to add key rotation, you can use AWS KMS.

The ASCP relies on encryption features within the chosen secret store, such as Secrets Manager. Secrets Manager supports integration with AWS KMS for an additional layer of security by storing encryption keys separately. The Secrets Store CSI Driver facilitates secure interaction with the secret store, but doesn’t directly encrypt secrets. Encrypting mounted content can provide further protection, but introduces operational overhead related to key management.

ASCP relies on Secrets Manager and AWS KMS for encryption and decryption capabilities. As a recommendation, you can encrypt mounted content to further protect the secrets. However, this introduces the additional operational overhead of managing encryption keys and addressing key rotation.

Additional considerations

These solutions address various aspects of secure secrets management, ranging from centralized management, compliance, high availability, performance, developer experience, and integration with existing investments, catering to the specific needs of FSI customers in their Kubernetes environments.

ESO can be particularly useful when you need to manage an identical set of secrets across multiple Kubernetes clusters. Instead of configuring, managing, and rotating secrets at each cluster level individually, you can synchronize your secrets across your clusters. This simplifies secrets management by providing a single interface to manage secrets across multiple clusters and environments.

External secrets management systems typically offer advanced security features such as encryption at rest, access controls, audit logs, and integration with identity providers. This helps FSI customers ensure that sensitive information is stored and managed securely in accordance with regulatory requirements.

FSI customers usually have existing investments in their on-premises or cloud infrastructure, including secrets management solutions. ESO integrates seamlessly with existing secrets management systems and infrastructure, allowing FSI customers to use their investment in these systems without requiring significant changes to their workflow or tooling. This makes it easier for FSI customers to adopt and integrate ESO into their existing Kubernetes environments.

ESO provides capabilities for enforcing policies and governance controls around secrets management such as access control, rotation policies, and audit logging when using services like Secrets Manager. For FSI customers, audits and compliance are critical and ESO verifies that access to secrets is tracked and audit trails are maintained, thereby simplifying the process of demonstrating adherence to regulatory standards. For instance, secrets stored inside Secrets Manager can be audited for compliance with AWS Config and AWS Audit Manager. Additionally, ESO uses role-based access control (RBAC) to help prevent unauthorized access to Kubernetes secrets as documented in the ESO security best practices guide.

High availability and resilience are critical considerations for mission critical FSI applications such as online banking, payment processing, and trading services. By using external secrets management systems designed for high availability and disaster recovery, ESO helps FSI customers ensure secrets are available and accessible in the event of infrastructure failure or outages, thereby minimizing service disruption and downtime.

FSI workloads often experience spikes in transaction volumes, especially during peak days or hours. ESO is designed to efficiently managed a large volume of secrets by using external secrets management that’s optimized for performance and scalability.

In terms of monitoring, ESO provides Prometheus metrics to enable fine-grained monitoring of access to secrets. Amazon EKS pods offer diverse methods to grant access to secrets present on external secrets management solutions. For example, in non-production environments, access can be granted through IAM instance profiles assigned to the Amazon EKS worker nodes. For production, using IAM roles for service accounts (IRSA) is recommended. Furthermore, you can achieve namespace level fine-grained access control by using annotations.

ESO also provides options to configure operators to use a VPC endpoint to comply with FIPS requirements.

Additional developer productivity benefits provided by ESO include support for JSON objects (Secret key/value in the AWS Management console) or strings (Plaintext in the console). With JSON objects, developers can programmatically update multiple values atomically when rotating a client certificate and private key.

The benefit of Sealed Secrets, as discussed previously, is when you upload your manifest to a Git repository. The manifest will contain the encrypted SealedSecrets and not the regular secrets. This assures that no one has access to your sensitive secrets even when they have access to your Git repository. Sealed Secrets offer a few benefits to developers in terms of developer experience. Sealed Secrets gives you access to manage your secrets, making them more readily available to developers. Sealed Secrets offers VSCode extension to assist in integrating it into the software development lifecycle (SDLC). Using Sealed Secrets, you can store the encrypted secrets in the version control systems such as Gitlab and GitHub. Sealed Secrets can reduce operational overhead related to updating dependent objects because whenever a secret resource is updated, the same update is applied to the dependent objects.

ASCP integration with the Kubernetes Secrets Store CSI Driver on Amazon EKS offers enhanced security through seamless integration with Secrets Manager and Parameter Store, ensuring encryption, access control, and auditing. It centralizes management of sensitive data, simplifying operations and reducing the risk of exposure. The dynamic secrets injection capability facilitates secure retrieval and injection of secrets into Kubernetes pods, while automatic rotation provides up-to-date credentials without manual intervention. This combined solution streamlines deployment and management, providing a secure, scalable, and efficient approach to handling secrets and configuration settings in Kubernetes applications.

Consolidated threat model

We created a threat model based on the architecture of the three solution offerings. The threat model provides a comprehensive view of the potential threats and corresponding mitigations for each solution, allowing organizations to proactively address security risks and ensure the secure management of secrets in their Kubernetes environments.

X = Mitigations applicable to the solution

Threat Mitigations ESO Sealed Secrets ASCP
Unauthorized access or modification of secrets
  • Implement least privilege access principles
  • Rotate and manage credentials securely
  • Enable RBAC and auditing in Kubernetes
X X X
Insider threat (for example, a rogue administrator who has legitimate access)
  • Implement least privilege access principles
  • Enable auditing and monitoring
  • Enforce separation of duties and job rotation
X X
Compromise of the deployment process
  • Secure and harden the deployment pipeline
  • Implement secure coding practices
  • Enable auditing and monitoring
X
Unauthorized access or tampering of secrets during transit
  • Enable encryption in transit using TLS
  • Implement mutual TLS authentication between components
  • Use private networking or VPN for secure communication
X X X
Compromise of the Kubernetes API server because of vulnerabilities or misconfiguration
  • Secure and harden the Kubernetes API server
  • Enable authentication and authorization mechanisms (for example, mutual TLS and RBAC)
  • Keep Kubernetes components up-to-date and patched
  • Enable Kubernetes audit logging and monitoring
X
Vulnerability in the external secrets controller leading to privilege escalation or data exposure
  • Keep the external secrets controller up-to-date and patched
  • Regularly monitor for and apply security updates
  • Implement least privilege access principles
  • Enable auditing and monitoring
X
Compromise of the Secrets Store CSI Driver, node-driver-registrar, Secrets Store CSI Provider, kubelet, or Pod could lead to unauthorized access or exposure of secrets
  • Implement least privilege principles and role-based access controls
  • Regularly patch and update the components
  • Monitor and audit the component activities
X
Unauthorized access or data breach in Secrets Manager could expose sensitive secrets
  • Implement strong access controls and access logging for Secrets Manager
  • Encrypt secrets at rest and in transit
  • Regularly rotate and update secrets
X X

Shortcomings and limitations

The following limitations and drawbacks highlight the importance of carefully evaluating the specific requirements and constraints of your organization before adopting any of these solutions. You should consider factors such as team expertise, deployment environments, integration needs, and compliance requirements to promote a secure and efficient secrets management solution that aligns with your organization’s needs.

ESO doesn’t include a default way to restrict network traffic to and from ESO using network policies or similar network or firewall mechanisms. The application team is responsible for properly configuring network policies to improve the overall security posture of ESO within your Kubernetes cluster.

Any time an external secret associated with ESO is rotated, you must restart the deployment that uses that particular external secret. Given the inherent risks associated with integrating an external entity or third-party solution into your system, including ESO, it’s crucial to implement a comprehensive threat model similar to the Kubernetes Admission Control Threat Model.

Also, ESO set up is complicated and the controller must be installed on the Kubernetes cluster.

SealedSecrets cannot be reused across namespaces unless they’re re-encrypted or made cluster-wide, which makes it challenging to manage secrets across multiple namespaces consistently. The need to manually rotate and re-encrypt SealedSecrets with new keys can introduce operational overhead, especially in large-scale environments with numerous secrets. The old sealing keys pose a potential risk of misuse by unauthorized users, which increases the risk. To mitigate both risks (high overhead and old secrets), you should implement additional controls such as deleting older keys as part of the key rotation process or periodically rotate sealing keys and make sure that old sealed secret resources are re-encrypted with the new keys. Sealed Secrets doesn’t support external secret stores such as HashiCorp Vault, or cloud provider services such as Secrets Manager, Parameter Store, or Azure Key Vault. Sealed Secrets requires a Kubeseal client-side binary to encrypt secrets. This can be a concern in FSI environments where client-side tools are restricted by security policies.

While ASCP provides seamless integration with Secrets Manager and Parameter Store, teams unfamiliar with these AWS services might need to invest some additional effort to fully realize the benefits. This additional effort is justified by the long-term benefits of centralized secrets management and access control provided by these services. Additionally, relying primarily on AWS services for secrets management can potentially limit flexibility in deploying to alternative cloud providers or on-premises environments in the future. These factors should be carefully evaluated based on the specific needs and constraints of the application and deployment environment.

Conclusion

We have provided a summary of three options for managing secrets in Amazon EKS, ESO, Sealed Secrets, and AWS Secrets and Configuration Provider (ASCP), and the key considerations for FSI customers when choosing between them. The choice depends on several factors including existing investments in secrets management systems, specific security needs and compliance requirements, preference for a Kubernetes native solution or willingness to accept vendor lock-in.

The guidance provided here covers the strengths, limitations, and trade-offs of each option, allowing regulated institutions to make an informed decision based on their unique requirements and constraints. This guidance can be adapted and tailored to fit the specific needs of an organization, providing a secure and efficient secrets management solution for their Amazon EKS workloads, while aligning with the stringent security and compliance standards of the regulated institutions.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Piyush Mattoo

Piyush Mattoo
Piyush is a Senior Solution Architect for Financial Services Data Provider segment at Amazon Web Services. He is a software technology leader with over a decade long experience building scalable and distributed software systems to enable business value through the use of technology. He is based out of Southern California and current interests include outdoor camping and nature walks.

Ruy Cavalcanti

Ruy Cavalcanti
Ruy is a Senior Security Architect for the Latin American Financial market at AWS. He has been working in IT and Security for over 19 years, helping customers create secure architectures in the AWS Cloud. Ruy’s interests include jamming on his guitar, firing up the grill for some Brazilian-style barbecue, and enjoying quality time with his family and friends.

Chetan Pawar

Chetan Pawar
Chetan is a Cloud Architect specializing in infrastructure within AWS Professional Services. As a member of the Containers Technical Field Community, he provides strategic guidance on enterprise Infrastructure and DevOps for clients across multiple industries. He has an 18-year track record building large-scale Infrastructure and containerized platforms. Outside of work, he is an avid traveler and motorsport enthusiast.

Governance at scale: Enforce permissions and compliance by using policy as code

Post Syndicated from Roland Odorfer original https://aws.amazon.com/blogs/security/governance-at-scale-enforce-permissions-and-compliance-by-using-policy-as-code/

AWS Identity and Access Management (IAM) policies are at the core of access control on AWS. They enable the bundling of permissions, helping to provide effective and modular access control for AWS services. Service control policies (SCPs) complement IAM policies by helping organizations enforce permission guardrails at scale across their AWS accounts.

The use of access control policies isn’t limited to AWS resources. Customer applications running on AWS infrastructure can also use policies to help control user access. This often involves implementing custom authorization logic in the program code itself, which can complicate audits and policy changes.

To address this, AWS developed Amazon Verified Permissions, which helps implement fine-grained authorizations and permissions management for customer applications. This service uses Cedar, an open-source policy language, to define permissions separately from application code.

In addition to access control, you can also use policies to help monitor your organization’s individual governance rules for security, operations and compliance. One example of such a rule is the regular rotation of cryptographic keys to help reduce the impact in the event of a key leak.

However, manually checking and enforcing such rules is complex and doesn’t scale, particularly in fast-growing IT organizations. Therefore, organizations should aim for an automated implementation of such rules. In this blog post, I will show you how to use policy as code to help you govern your AWS landscape.

Policy as code

Similar to infrastructure as code (IaC), policy as code is an approach in which you treat policies like regular program code. You define policies in the form of structured text files (policy documents), which policy engines can automatically evaluate.

The main advantage of this approach is the ability to automate key governance tasks, such as policy deployment, enforcement, and auditing. By storing policy documents in a central repository, you can use versioning, simplify audits, and track policy changes. Furthermore, you can subject new policies to automated testing through integration into a continuous integration and continuous delivery (CI/CD) pipeline. Policy as code thus forms one of the key pillars of a modern automated IT governance strategy.

The following sections describe how you can combine different AWS services and functions to integrate policy as code into existing IT governance processes.

Access control – AWS resources

Every request to AWS control plane resources (specifically, AWS APIs)—whether through the AWS Management Console, AWS Command Line Interface (AWS CLI), or SDK — is authenticated and authorized by IAM. To determine whether to approve or deny a specific request, IAM evaluates both the applicable policies associated with the requesting principal (human user or workload) and the respective request context. These policies come in the form of JSON documents and follow a specific schema that allows for automated evaluation.

IAM supports a range of different policy types that you can use to help protect your AWS resources and implement a least privilege approach. For an overview of the individual policy types and their purpose, see Policies and permissions in IAM. For some practical guidance on how and when to use them, see IAM policy types: How and when to use them. To learn more about the IAM policy evaluation process and the order in which IAM reviews individual policy types, see Policy evaluation logic.

Traditionally, IAM relied on role-based access control (RBAC) for authorization. With RBAC, principals are assigned predefined roles that grant only the minimum permissions needed to perform their duties (also known as a least privilege approach). RBAC can seem intuitive initially, but it can become cumbersome at scale. Every new resource that you add to AWS requires the IAM administrator to manually update each role’s permissions – a tedious process that can hamper agility in dynamic environments.

In contrast, attribute-based access control (ABAC) bases permissions on the attributes assigned to users and resources. IAM administrators define a policy that allows access when certain tags match. ABAC is especially advantageous for dynamic, fast-growing organizations that have outgrown the RBAC model. To learn more about how to implement ABAC in an AWS environment, see Define permissions to access AWS resources based on tags.

For a list of AWS services that IAM supports and whether each service supports ABAC, see AWS services that work with IAM.

Access control – Customer applications

Customer applications that run on AWS resources often require an authorization mechanism that can control access to the application itself and its individual functions in a fine-grained manner.

Many customer applications come with custom authorization mechanisms in the application code itself, making it challenging to implement policy changes. This approach can also hinder monitoring and auditing because the implementation of authorization logic often differs between applications, and there is no uniform standard.

To address this challenge, AWS developed Amazon Verified Permissions and the associated open-source policy language Cedar. Amazon Verified Permissions replaces the custom authorization logic in the application code with a simple IsAuthorized API call, so that you can control and monitor authorization logic centrally by using Cedar-based policies. To learn how to integrate Amazon Verified Permissions into your applications and define custom access control policies with Cedar, see How to use Amazon Verified Permissions for authorization.

Compliance

In addition to access control, you can also use policies to help monitor and enforce your organization’s individual governance rules for security, operations and compliance. AWS Config and AWS Security Hub play a central role in compliance because they enable the setup of multi-account environments that follow best practices (known as landing zones). AWS Config continuously tracks resource configurations and changes, while Security Hub aggregates and prioritizes security findings. With these services, you can create controls that enable automated audits and conformity checks. Alternatively, you can also choose from ready-to-use controls that cover individual compliance objectives such as encryption at rest, or entire frameworks, such as PCI-DSS and NIST 800-53.

AWS Control Tower builds on top of AWS Config and Security Hub to help simplify governance and compliance for multi-account environments. AWS Control Tower incorporates additional controls with the existing ones from AWS Config and Security Hub, presenting them together through a unified interface. These controls apply at different resource life cycle stages, as shown in Figure 1, and you define them through policies.

Figure 1: Resource life cycle

Figure 1: Resource life cycle

The controls can be categorized according to their behavior:

  • Proactive controls scan IaC templates before deployment to help identify noncompliance issues early.
  • Preventative controls restrict actions within an AWS environment to help prevent noncompliant actions. For example, these controls can help prevent deployment of large Amazon Elastic Compute Cloud (Amazon EC2) instances or restrict the available AWS Regions for some users.
  • Detective controls monitor deployed resources to help identify noncompliant resources that proactive and preventative controls might have missed. They also detect when deployed resources are changed or drift out of compliance over time.

Categorizing controls this way allows for a more comprehensive compliance framework that encompasses the entire resource life cycle. The stage at which each control applies determines how it may help enforce policies and governance rules.

With AWS Control Tower, you can enable hundreds of preconfigured security, compliance, and operational controls through the console with a single click, without needing to write code. You can also implement your own custom controls beyond what AWS Control Tower provides out of the box. The process for implementing custom controls varies depending on the type of control. In the following sections, I will explain how to set up custom controls for each type.

Proactive controls

Proactive controls are mechanisms that scan resources and their configuration to confirm that they adhere to compliance requirements before they are deployed. AWS provides a range of tools and services that you can use, both in isolation and in combination with each other, to implement proactive controls. The following diagram provides an overview of the available mechanisms and an example of their integration into a CI/CD pipeline for AWS Cloud Development Kit (CDK) projects.

Figure 2: CI/CD pipeline in AWS CDK projects

Figure 2: CI/CD pipeline in AWS CDK projects

As shown in Figure 2, you can use the following mechanisms as proactive controls:

  1. You can validate artifacts such as IaC templates locally on your machine by using the AWS CloudFormation Guard CLI, which facilitates a shift-left testing strategy. The advantage of this approach is the relatively early testing in the deployment cycle. This supports rapid iterative development and thus reduces waiting times.

    Alternatively, you can use the CfnGuardValidator plugin for AWS CDK, which integrates CloudFormation Guard rules into the AWS CDK CLI. This streamlines local development by applying policies and best practices directly within the CDK project.

  2. To centrally enforce validation checks, integrate the CfnGuardValidator plugin into a CDK CI/CD pipeline.
  3. You can also invoke the CloudFormation Guard CLI from within AWS CodeBuild buildspecs to embed CloudFormation Guard scans in a CI/CD pipeline.
  4. With CloudFormation hooks, you can impose policies on resources before CloudFormation deploys them.

AWS CloudFormation Guard uses a policy-as-code approach to evaluate IaC documents such as AWS CloudFormation templates and Terraform configuration files. The tool defines validation rules in the Guard language to check that these JSON or YAML documents align with best practices and organizational policies around provisioning cloud resources. By codifying rules and scanning infrastructure definitions programmatically, CloudFormation Guard automates policy enforcement and helps promote consistency and security across infrastructure deployments.

In the following example, you will use CloudFormation Guard to validate the name of an Amazon Simple Storage Service (Amazon S3) bucket in a CloudFormation template through a simple Guard rule:

To validate the S3 bucket

  1. Install CloudFormation Guard locally. For instructions, see Setting up AWS CloudFormation Guard.
  2. Create a YAML file named template.yaml with the following content and replace <DOC-EXAMPLE-BUCKET> with a bucket name of your choice (this file is a CloudFormation template, which creates an S3 bucket):
    Resources:
      S3Bucket:
        Type: 'AWS::S3::Bucket'
        Properties:
          BucketName: '<DOC-EXAMPLE-BUCKET>'

  3. Create a text file named rules.guard with the following content:
    rule checkBucketName {
        Resources.S3Bucket.Properties.BucketName == '<DOC-EXAMPLE-BUCKET>'
    }

  4. To validate your CloudFormation template against your Guard rules, run the following command in your local terminal:
    cfn-guard validate --rules rules.guard --data template.yaml

  5. If CloudFormation Guard successfully validates the template, the validate command produces an exit status of 0 ($? in bash). Otherwise, it returns a status report listing the rules that failed. You can test this yourself by changing the bucket name.

To accelerate the writing of Guard rules, use the CloudFormation Guard rulegen command, which takes a CloudFormation template file as an input and autogenerates Guard rules that match the properties of the template resources. To learn more about the structure of CloudFormation Guard rules and how to write them, see Writing AWS CloudFormation Guard rules.

The AWS Guard Rules Registry provides ready-to-use CloudFormation Guard rule files to accelerate your compliance journey, so that you don’t have to write them yourself.

Through the CDK plugin interface for policy validation, the CfnGuardValidator plugin integrates CloudFormation Guard rules into the AWS CDK and validates generated CloudFormation templates automatically during its synthesis step. For more details, see the plugin documentation and Accelerating development with AWS CDK plugin – CfnGuardValidator.

CloudFormation Guard alone can’t necessarily prevent the provisioning of noncompliant resources. This is because CloudFormation Guard can’t detect when templates or other documents change after validation. Therefore, I recommend that you combine CloudFormation Guard with a more authoritative mechanism.

One such mechanism is CloudFormation hooks, which you can use to validate AWS resources before you deploy them. You can configure hooks to cancel the deployment process with an alert if CloudFormation templates aren’t compliant, or just initiate an alert but complete the process. To learn more about CloudFormation hooks, see the following blog posts:

CloudFormation hooks provide a way to authoritatively enforce rules for resources deployed through CloudFormation. However, they don’t control resource creation that occurs outside of CloudFormation, such as through the console, CLI, SDK, or API. Terraform is one example that provisions resources directly through the AWS API rather than through CloudFormation templates. Because of this, I recommend that you implement additional detective controls by using AWS Config. AWS Config can continuously check resource configurations after deployment, regardless of the provisioning method. Using AWS Config rules complements the preventative capabilities of CloudFormation hooks.

Preventative controls

Preventative controls can help maintain compliance by applying guardrails that disallow policy-violating actions. AWS Control Tower integrates with AWS Organizations to implement preventative controls with SCPs. By using SCPs, you can restrict IAM permissions granted in a given organization or organizational unit (OU). One example of this is the selective activation of certain AWS Regions to meet data residency requirements.

SCPs are particularly valuable for managing IAM permissions across large environments with multiple AWS accounts. Organizations with many accounts might find it challenging to monitor and control IAM permissions. SCPs help address this challenge by applying centralized permission guardrails automatically to the accounts of an organization or organizational unit (OU). As new accounts are added, the SCPs are enforced without the need for extra configuration.

You can define SCPs through CloudFormation or CDK templates and deploy them through a CI/CD pipeline, similar to other AWS resources. Because misconfigured SCPs can negatively affect an organization’s operations, it’s vital that you test and simulate the effects of new policies in a sandbox environment before broader deployment. For an example of how to implement a pipeline for SCP testing, see the aws-service-control-policies-deployment GitHub repository.

To learn more about SCPs and how to implement them, see Service control policies (SCPs) and Best Practices for AWS Organizations Service Control Policies in a Multi-Account Environment.

Detective controls

Detective controls help detect noncompliance with existing resources. You can implement detective controls by using AWS Config rules, with both managed rules (provided by AWS) and custom rules available. You can implement custom rules either by using the domain-specific language Guard or Lambda functions. To learn more about the Guard option, see Evaluate custom configurations using AWS Config Custom Policy rules and the open source sample repository. For guidance on creating custom rules using Lambda functions, see AWS Config Rule Development Kit library: Build and operate rules at scale and Deploying Custom AWS Config Rules in an AWS Organization Environment.

To simplify audits for compliance frameworks such as PCI-DSS, HIPAA, and SOC2, AWS Config also offers conformance packs that bundle rules and remediation actions. To learn more about conformance packs, see Conformance Packs and Introducing AWS Config Conformance Packs.

When a resource’s configuration shifts to a noncompliant state that preventive controls didn’t avert, detective controls can help remedy the noncompliant state by implementing predefined actions, such as alerting an operator or reconfiguring the resource. You can implement these controls with AWS Config, which integrates with AWS Systems Manager Automation to help enable the remediation of noncompliant resources.

Security Hub can help centralize the detection of noncompliant resources across multiple AWS accounts. Using AWS Config and third-party tools for detection, Security Hub sends findings of noncompliance to Amazon EventBridge, which can then send notifications or launch automated remediations. You can also use the security controls and standards in Security Hub to monitor the configuration of your AWS infrastructure. This complements the conformance packs in AWS Config.

Conclusion

Many large and fast-growing organizations are faced with the challenge that manual IT governance processes are difficult to scale and can hinder growth. Policy-as-code services help to manage permissions and resource configurations at scale by automating key IT governance processes and, at the same time, increasing the quality and transparency of those processes. This helps to reconcile large environments with key governance objectives such as compliance.

In this post, you learned how to use policy as code to enhance IT governance. A first step is to activate AWS Control Tower, which provides preconfigured guardrails (SCPs) for each AWS account within an organization. These guardrails help enforce baseline compliance across infrastructure. You can then layer on additional controls to further strengthen governance in line with your needs. As a second step, you can select AWS Config conformance packs and Security Hub standards to complement the controls that AWS Control Tower offers. Finally, you can secure applications built on AWS by using Amazon Verified Permissions and Cedar for fine-grained authorization.

Resources

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Roland Odorfer

Roland Odorfer

Roland is a Solutions Architect at AWS, based in Berlin, Germany. He works with German industry and manufacturing customers, helping them architect secure and scalable solutions. Roland is interested in distributed systems and security. He enjoys helping customers use the cloud to solve complex challenges.

How FactSet automates thousands of AWS accounts at scale

Post Syndicated from Amit Borulkar original https://aws.amazon.com/blogs/devops/factset-automation-at-scale/

This post is by FactSet’s Cloud Infrastructure team, Gaurav Jain, Nathan Goodman, Geoff Wang, Daniel Cordes, Sunu Joseph, and AWS Solution Architects Amit Borulkar and Tarik Makota. In their own words, “FactSet creates flexible, open data and software solutions for tens of thousands of investment professionals around the world, which provides instant access to financial data and analytics that investors use to make crucial decisions. At FactSet, we are always working to improve the value that our products provide.”

At FactSet, our operational goal to use the AWS Cloud is to have high developer velocity alongside enterprise governance. Assigning AWS accounts per project enables the agility and isolation boundary needed by each of the project teams to innovate faster. As existing workloads are migrated and new workloads are developed in the cloud, we realized that we were operating close to thousands of AWS accounts. To have a consistent and repeatable experience for diverse project teams, we automated the AWS account creation process, various service control policies (SCP) and AWS Identity and Access Management (IAM) policies and roles associated with the accounts, and enforced policies for ongoing configuration across the accounts. This post covers our automation workflows to enable governance for thousands of AWS accounts.

AWS account creation workflow

To empower our project teams to operate in the AWS Cloud in an agile manner, we developed a platform that enables AWS account creation with the default configuration customized to meet FactSet’s governance policies. These AWS accounts are provisioned with defaults such as a virtual private cloud (VPC), subnets, routing tables, IAM roles, SCP policies, add-ons for monitoring and load-balancing, and FactSet-specific governance. Developers and project team members can request a micro account for their product via this platform’s website, or do so programmatically using an API or wrap-around custom Terraform modules. The following screenshot shows a portion of the web interface that allows developers to request an AWS account.

FactSet service catalog
Continue reading How FactSet automates thousands of AWS accounts at scale

Set up centralized monitoring for DDoS events and auto-remediate noncompliant resources

Post Syndicated from Fola Bolodeoku original https://aws.amazon.com/blogs/security/set-up-centralized-monitoring-for-ddos-events-and-auto-remediate-noncompliant-resources/

When you build applications on Amazon Web Services (AWS), it’s a common security practice to isolate production resources from non-production resources by logically grouping them into functional units or organizational units. There are many benefits to this approach, such as making it easier to implement the principal of least privilege, or reducing the scope of adversely impactful activities that may occur in non-production environments. After building these applications, setting up monitoring for resource compliance and security risks, such as distributed denial of service (DDoS) attacks across your AWS accounts, is just as important. The recommended best practice to perform this type of monitoring involves using AWS Shield Advanced with AWS Firewall Manager, and integrating these with AWS Security Hub.

In this blog post, I show you how to set up centralized monitoring for Shield Advanced–protected resources across multiple AWS accounts by using Firewall Manager and Security Hub. This enables you to easily manage resources that are out of compliance from your security policy and to view DDoS events that are detected across multiple accounts in a single view.

Shield Advanced is a managed application security service that provides DDoS protection for your workloads against infrastructure layer (Layer 3–4) attacks, as well as application layer (Layer 7) attacks, by using AWS WAF. Firewall Manager is a security management service that enables you to centrally configure and manage firewall rules across your accounts and applications in an organization in AWS. Security Hub consumes, analyzes, and aggregates security events produced by your application running on AWS by consuming security findings. Security Hub integrates with Firewall Manager without the need for any action to be taken by you.

I’m going to cover two different scenarios that show you how to use Firewall Manager for:

  1. Centralized visibility into Shield Advanced DDoS events
  2. Automatic remediation of noncompliant resources

Scenario 1: Centralized visibility of DDoS detected events

This scenario represents a fully native and automated integration, where Shield Advanced DDoSDetected events (indicates whether a DDoS event is underway for a particular Amazon Resource Name (ARN)) are made visible as a security finding in Security Hub, through Firewall Manager.

Solution overview

Figure 1 shows the solution architecture for scenario 1.
 

Figure 1: Scenario 1 – Shield Advanced DDoS detected events visible in Security Hub

Figure 1: Scenario 1 – Shield Advanced DDoS detected events visible in Security Hub

The diagram illustrates a customer using AWS Organizations to isolate their production resources into the Production Organizational Unit (OU), with further separation into multiple accounts for each of the mission-critical applications. The resources in Account 1 are protected by Shield Advanced. The Security OU was created to centralize security functions across all AWS accounts and OUs, obscuring the visibility of the production environment resources from the Security Operations Center (SOC) engineers and other security staff. The Security OU is home to the designated administrator account for Firewall Manager and the Security Hub dashboard.

Scenario 1 implementation

You will be setting up Security Hub in an account that has the prerequisite services configured in it as explained below. Before you proceed, see the architecture requirements in the next section. Once Security Hub is enabled for your organization, you can simulate a DDoS event in strict accordance with the AWS DDoS Simulation Testing Policy or use one of AWS DDoS Test Partners.

Architecture requirements

In order to implement these steps, you must have the following:

Once you have all these requirements completed, you can move on to enable Security Hub.

Enable Security Hub

Note: If you plan to protect resources with Shield Advanced across multiple accounts and in multiple Regions, we recommend that you use the AWS Security Hub Multiaccount Scripts from AWS Labs. Security Hub needs to be enabled in all the Regions and all the accounts where you have Shield protected resources. For global resources, like Amazon CloudFront, you should enable Security Hub in the us-east-1 Region.

To enable Security Hub

  1. In the AWS Security Hub console, switch to the account you want to use as the designated Security Hub administrator account.
  2. Select the security standard or standards that are applicable to your application’s use-case, and choose Enable Security Hub.
     
    Figure 2: Enabling Security Hub

    Figure 2: Enabling Security Hub

  3. From the designated Security Hub administrator account, go to the Settings – Account tab, and add accounts by sending invites to all the accounts you want added as member accounts. The invited accounts become associated as member accounts once the owner of the invited account has accepted the invite and Security Hub has been enabled. It’s possible to upload a comma-separated list of accounts you want to send to invites to.
     
    Figure 3: Designating a Security Hub administrator account by adding member accounts

    Figure 3: Designating a Security Hub administrator account by adding member accounts

View detected events in Shield and Security Hub

When Shield Advanced detects signs of DDoS traffic that is destined for a protected resource, the Events tab in the Shield console displays information about the event detected and provides a status on the mitigation that has been performed. Following is an example of how this looks in the Shield console.
 

Figure 4: Scenario 1 - The Events tab on the Shield console showing a Shield event in progress

Figure 4: Scenario 1 – The Events tab on the Shield console showing a Shield event in progress

If you’re managing multiple accounts, switching between these accounts to view the Shield console to keep track of DDoS incidents can be cumbersome. Using the Amazon CloudWatch metrics that Shield Advanced reports for Shield events, visibility across multiple accounts and Regions is easier through a custom CloudWatch dashboard or by consuming these metrics in a third-party tool. For example, the DDoSDetected CloudWatch metric has a binary value, where a value of 1 indicates that an event that might be a DDoS has been detected. This metric is automatically updated by Shield when the DDoS event starts and ends. You only need permissions to access the Security Hub dashboard in order to monitor all events on production resources. Following is an example of what you see in the Security Hub console.
 

Figure 5: Scenario 1 - Shield Advanced DDoS alarm showing in Security Hub

Figure 5: Scenario 1 – Shield Advanced DDoS alarm showing in Security Hub

Configure Shield event notification in Firewall Manager

In order to increase your visibility into possible Shield events across your accounts, you must configure Firewall Manager to monitor your protected resources by using Amazon Simple Notification Service (Amazon SNS). With this configuration, Firewall Manager sends you notifications of possible attacks by creating an Amazon SNS topic in Regions where you might have protected resources.

To configure SNS topics in Firewall Manager

  1. In the Firewall Manager console, go to the Settings page.
  2. Under Amazon SNS Topic Configuration, select a Region.
  3. Choose Configure SNS Topic.
     
    Figure 6: The Firewall Manager Settings page for configuring SNS topics

    Figure 6: The Firewall Manager Settings page for configuring SNS topics

  4. Select an existing topic or create a new topic, and then choose Configure SNS Topic.
     
    Figure 7: Configure an SNS topic in a Region

    Figure 7: Configure an SNS topic in a Region

Scenario 2: Automatic remediation of noncompliant resources

The second scenario is an example in which a new production resource is created, and Security Hub has full visibility of the compliance state of the resource.

Solution overview

Figure 8 shows the solution architecture for scenario 2.
 

Figure 8: Scenario 2 – Visibility of Shield Advanced noncompliant resources in Security Hub

Figure 8: Scenario 2 – Visibility of Shield Advanced noncompliant resources in Security Hub

Firewall Manager identifies that the resource is out of compliance with the defined policy for Shield Advanced and posts a finding to Security Hub, notifying your operations team that a manual action is required to bring the resource into compliance. If configured, Firewall Manager can automatically bring the resource into compliance by creating it as a Shield Advanced–protected resource, and then update Security Hub when the resource is in a compliant state.

Scenario 2 implementation

The following steps describe how to use Firewall Manager to enforce Shield Advanced protection compliance of an application that is deployed to a member account within AWS Organizations. This implementation assumes that you set up Security Hub as described for scenario 1.

Create a Firewall Manager security policy for Shield Advanced protected resources

In this step, you create a Shield Advanced security policy that will be enforced by Firewall Manager. For the purposes of this walkthrough, you’ll choose to automatically remediate noncompliant resources and apply the policy to Application Load Balancer (ALB) resources.

To create the Shield Advanced policy

  1. Open the Firewall Manager console in the designated Firewall Manager administrator account.
  2. In the left navigation pane, choose Security policies, and then choose Create a security policy.
  3. Select AWS Shield Advanced as the policy type, and select the Region where your protected resources are. Choose Next.

    Note: You will need to create a security policy for each Region where you have regional resources, such as Elastic Load Balancers and Elastic IP addresses, and a security policy for global resources such as CloudFront distributions.

    Figure 9: Select the policy type and Region

    Figure 9: Select the policy type and Region

  4. On the Describe policy page, for Policy name, enter a name for your policy.
  5. For Policy action, you have the option to configure automatic remediation of noncompliant resources or to only send alerts when resources are noncompliant. You can change this setting after the policy has been created. For the purposes of this blog post, I’m selecting Auto remediate any noncompliant resources. Select your option, and then choose Next.

    Important: It’s a best practice to first identify and review noncompliant resources before you enable automatic remediation.

  6. On the Define policy scope page, define the scope of the policy by choosing which AWS accounts, resource type, or resource tags the policy should be applied to. For the purposes of this blog post, I’m selecting to manage Application Load Balancer (ALB) resources across all accounts in my organization, with no preference for resource tags. When you’re finished defining the policy scope, choose Next.
     
    Figure 10: Define the policy scope

    Figure 10: Define the policy scope

  7. Review and create the policy. Once you’ve reviewed and created the policy in the Firewall Manager designated administrator account, the policy will be pushed to all the Firewall Manager member accounts for enforcement. The new policy could take up to 5 minutes to appear in the console. Figure 11 shows a successful security policy propagation across accounts.
     
    Figure 11: View security policies in an account

    Figure 11: View security policies in an account

Test the Firewall Manager and Security Hub integration

You’ve now defined a policy to cover only ALB resources, so the best way to test this configuration is to create an ALB in one of the Firewall Manager member accounts. This policy causes resources within the policy scope to be added as protected resources.

To test the policy

  1. Switch to the Security Hub administrator account and open the Security Hub console in the same Region where you created the ALB. On the Findings page, set the Title filter to Resource lacks Shield Advanced protection and set the Product name filter to Firewall Manager.
     
    Figure 12: Security Hub findings filter

    Figure 12: Security Hub findings filter

    You should see a new security finding flagging the ALB as a noncompliant resource, according to the Shield Advanced policy defined in Firewall Manager. This confirms that Security Hub and Firewall Manager have been enabled correctly.
     

    Figure 13: Security Hub with a noncompliant resource finding

    Figure 13: Security Hub with a noncompliant resource finding

  2. With the automatic remediation feature enabled, you should see the “Updated at” time reflect exactly when the automatic remediation actions were completed. The completion of the automatic remediation actions can take up to 5 minutes to be reflected in Security Hub.
     
    Figure 14: Security Hub with an auto-remediated compliance finding

    Figure 14: Security Hub with an auto-remediated compliance finding

  3. Go back to the account where you created the ALB, and in the Shield Protected Resources console, navigate to the Protected Resources page, where you should see the ALB listed as a protected resource.
     
    Figure 15: Shield console in the member account shows that the new ALB is a protected resource

    Figure 15: Shield console in the member account shows that the new ALB is a protected resource

    Confirming that the ALB has been added automatically as a Shield Advanced–protected resource means that you have successfully configured the Firewall Manager and Security Hub integration.

(Optional): Send a custom action to a third-party provider

You can send all regional Security Hub findings to a ticketing system, Slack, AWS Chatbot, a Security Information and Event Management (SIEM) tool, a Security Orchestration Automation and Response (SOAR), incident management tools, or to custom remediation playbooks by using Security Hub Custom Actions.

Conclusion

In this blog post I showed you how to set up a Firewall Manager security policy for Shield Advanced so that you can monitor your applications for DDoS events, and their compliance to DDoS protection policies in your multi-account environment from the Security Hub findings console. In line with best practices for account governance, organizations should have a centralized security account that performs monitoring for multiple accounts. Security Hub and Firewall Manager provide a centralized solution to help you achieve your compliance and monitoring goals for DDoS protection.

If you’re interested in exploring how Shield Advanced and AWS WAF help to improve the security posture of your application, have a look at the following resources:

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Security Hub forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Fola Bolodeoku

Fola is a Security Engineer on the AWS Threat Research Team, where he focuses on helping customers improve their application security posture against DDoS and other application threats. When he is not working, he enjoys spending time exploring the natural beauty of the Western Cape.

New IAMCTL tool compares multiple IAM roles and policies

Post Syndicated from Sudhir Reddy Maddulapally original https://aws.amazon.com/blogs/security/new-iamctl-tool-compares-multiple-iam-roles-and-policies/

If you have multiple Amazon Web Services (AWS) accounts, and you have AWS Identity and Access Management (IAM) roles among those multiple accounts that are supposed to be similar, those roles can deviate over time from your intended baseline due to manual actions performed directly out-of-band called drift. As part of regular compliance checks, you should confirm that these roles have no deviations. In this post, we present a tool called IAMCTL that you can use to extract the IAM roles and policies from two accounts, compare them, and report out the differences and statistics. We will explain how to use the tool, and will describe the key concepts.

Prerequisites

Before you install IAMCTL and start using it, here are a few prerequisites that need to be in place on the computer where you will run it:

To follow along in your environment, clone the files from the GitHub repository, and run the steps in order. You won’t incur any charges to run this tool.

Install IAMCTL

This section describes how to install and run the IAMCTL tool.

To install and run IAMCTL

  1. At the command line, enter the following command:
    pip3 install git+ssh://[email protected]/aws-samples/[email protected]
    

    You will see output similar to the following.

    Figure 1: IAMCTL tool installation output

    Figure 1: IAMCTL tool installation output

  2. To confirm that your installation was successful, enter the following command.
    iamctl –h
    

    You will see results similar to those in figure 2.

    Figure 2: IAMCTL help message

    Figure 2: IAMCTL help message

Now that you’ve successfully installed the IAMCTL tool, the next section will show you how to use the IAMCTL commands.

Example use scenario

Here is an example of how IAMCTL can be used to find differences in IAM roles between two AWS accounts.

A system administrator for a product team is trying to accelerate a product launch in the middle of testing cycles. Developers have found that the same version of their application behaves differently in the development environment as compared to the QA environment, and they suspect this behavior is due to differences in IAM roles and policies.

The application called “app1” primarily reads from an Amazon Simple Storage Service (Amazon S3) bucket, and runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance. In the development (DEV) account, the application uses an IAM role called “app1_dev” to access the S3 bucket “app1-dev”. In the QA account, the application uses an IAM role called “app1_qa” to access the S3 bucket “app1-qa”. This is depicted in figure 3.

Figure 3: Showing the “app1” application in the development and QA accounts

Figure 3: Showing the “app1” application in the development and QA accounts

Setting up the scenario

To simulate this setup for the purpose of this walkthrough, you don’t have to create the EC2 instance or the S3 bucket, but just focus on the IAM role, inline policy, and trust policy.

As noted in the prerequisites, you will switch between the two AWS accounts by using the AWS CLI named profiles “dev-profile” and “qa-profile”, which are configured to point to the DEV and QA accounts respectively.

Start by using this command:

mkdir -p iamctl_test iamctl_test/dev iamctl_test/qa

The command creates a directory structure that looks like this:
Iamctl_test
|– qa
|– dev

Now, switch to the dev folder to run all the following example commands against the DEV account, by using this command:

cd iamctl_test/dev

To create the required policies, first create a file named “app1_s3_access_policy.json” and add the following policy to it. You will use this file’s content as your role’s inline policy.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-dev/shared/*"
            ]
        }
    ]
}

Second, create a file called “app1_trust_policy.json” and add the following policy to it. You will use this file’s content as your role’s trust policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now use the two files to create an IAM role with the name “app1_dev” in the account by using these command(s), run in the same order as listed here:

#create role with trust policy

aws --profile dev-profile iam create-role --role-name app1_dev --assume-role-policy-document file://app1_trust_policy.json

#put inline policy to the role created above
 
aws --profile dev-profile iam put-role-policy --role-name app1_dev --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

In the QA account, the IAM role is named “app1_qa” and the S3 bucket is named “app1-qa”.

Repeat the steps from the prior example against the QA account by changing dev to qa where shown in bold in the following code samples. Change the directory to qa by using this command:

cd ../qa

To create the required policies, first create a file called “app1_s3_access_policy.json” and add the following policy to it.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Next, create a file, called “app1_trust_policy.json” and add the following policy.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

Now, use the two files created so far to create an IAM role with the name “app1_qa” in your QA account by using these command(s), run in the same order as listed here:

#create role with trust policy
aws --profile qa-profile iam create-role --role-name app1_qa --assume-role-policy-document file://app1_trust_policy.json

#put inline policy to the role create above
aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

So far, you have two accounts with an IAM role created in each of them for your application. In terms of permissions, there are no differences other than the name of the S3 bucket resource the permission is granted against.

You can expect IAMCTL to generate a minimal set of differences between the DEV and QA accounts, assuming all other IAM roles and policies are the same, but to be sure about the current state of both accounts, in IAMCTL you can run a process called baselining.

Through the process of baselining, you will generate an equivalency dictionary that represents all the known string patterns that reduce the noise in the generated deviations list, and then you will introduce a change into one of the IAM roles in your QA account, followed by a final IAMCTL diff to show the deviations.

Baselining

Baselining is the process of bringing two accounts to an “equivalence” state for IAM roles and policies by establishing a baseline, which future diff operations can leverage. The process is as simple as:

  1. Run the iamctl diff command.
  2. Capture all string substitutions into an equivalence dictionary to remove or reduce noise.
  3. Save the generated detailed files as a snapshot.

Now you can go through these steps for your baseline.

Go ahead and run the iamctl diff command against these two accounts by using the following commands.

#change directory from qa to iamctl-test
cd ..

#run iamctl init
iamctl init

The results of running the init command are shown in figure 4.

Figure 4: Output of the iamctl init command

Figure 4: Output of the iamctl init command

If you look at the iamctl_test directory now, shown in figure 5, you can see that the init command created two files in the iamctl_test directory.

Figure 5: The directory structure after running the init command

Figure 5: The directory structure after running the init command

These two files are as follows:

  1. iam.jsonA reference file that has all AWS services and actions listed, among other things. IAMCTL uses this to map the resource listed in an IAM policy to its corresponding AWS resource, based on Amazon Resource Name (ARN) regular expression.
  2. equivalency_list.jsonThe default sample dictionary that IAMCTL uses to suppress false alarms when it compares two accounts. This is where the known string patterns that need to be substituted are added.

Note: A best practice is to make the directory where you store the equivalency dictionary and from which you run IAMCTL to be a Git repository. Doing this will let you capture any additions or modifications for the equivalency dictionary by using Git commits. This will not only give you an audit trail of your historical baselines but also gives context to any additions or modifications to the equivalency dictionary. However, doing this is not necessary for the regular functioning of IAMCTL.

Next, run the iamctl diff command:

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa
Figure 6: Result of diff command

Figure 6: Result of diff command

Figure 6 shows the results of running the diff command. You can see that IAMCTL considers the app1_qa and app1_dev roles as unique to the DEV and QA accounts, respectively. This is because IAMCTL uses role names to decide whether to compare the role or tag the role as unique.

You will add the strings “dev” and “qa” to the equivalency dictionary to instruct IAMCTL to substitute occurrences of these two strings with “accountname” by adding the follow JSON to the equivalency_list.json file. You will also clean up some defaults already present in there.

echo “{“accountname”:[“dev”,”qa”]}” > equivalency_list.json

Figure 7 shows the equivalency dictionary before you take these actions, and figure 8 shows the dictionary after these actions.

Figure 7: Equivalency dictionary before

Figure 7: Equivalency dictionary before

Figure 8: Equivalency dictionary after

Figure 8: Equivalency dictionary after

There’s another thing to notice here. In this example, one common role was flagged as having a difference. To know which role this is and what the difference is, go to the detail reports folder listed at the bottom of the summary report. The directory structure of this folder is shown in figure 9.

Notice that the reports are created under your home directory with a folder structure that mimics the time stamp down to the second. IAMCTL does this to maintain uniqueness for each run.

tree /Users/<username>/aws-idt/output/2020/08/24/08/38/49/
Figure 9: Files written to the output reports directory

Figure 9: Files written to the output reports directory

You can see there is a file called common_roles_in_dev_with_differences.csv, and it lists a role called “AwsSecurity***Audit”.

You can see there is another file called dev_to_qa_common_role_difference_items.csv, and it lists the granular IAM items from the DEV account that belong to the “AwsSecurity***Audit” role as compared to QA, but which have differences. You can see that all entries in the file have the DEV account number in the resource ARN, whereas in the qa_to_dev_common_role_difference_items.csv file, all entries have the QA account number for the same role “AwsSecurity***Audit”.

Add both of the account numbers to the equivalency dictionary to substitute them with a placeholder number, because you don’t want this role to get flagged as having differences.

echo “{“accountname”:[“dev”,”qa”],”000000000000”:[“123456789012”,”987654321098”]}” > equivalency_list.json

Now, re-run the diff command.

#run iamctl diff
iamctl diff dev-profile dev qa-profile qa

As you can see in figure 10, you get back the result of the diff command that shows that the DEV account doesn’t have any differences in IAM roles as compared to the QA account.

Figure 10: Output showing no differences after completion of baselining

Figure 10: Output showing no differences after completion of baselining

This concludes the baselining for your DEV and QA accounts. Now you will introduce a change.

Introducing drift

Drift occurs when there is a difference in actual vs expected values in the definition or configuration of a resource. There are several reasons why drift occurs, but for this scenario you will use “intentional need to respond to a time-sensitive operational event” as a reason to mimic and introduce drift into what you have built so far.

To simulate this change, add “s3:PutObject” to the qa app1_s3_access_policy.json file as shown in the following example.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:PutObject"
            ],
            "Resource": [
         "arn:aws:s3:::app1-qa/shared/*"
            ]
        }
    ]
}

Put this new inline policy on your existing role “app1_qa” by using this command:

aws --profile qa-profile iam put-role-policy --role-name app1_qa --policy-name s3_inline_policy --policy-document file://app1_s3_access_policy.json

The following table represents the new drift in the accounts.

Action Account-DEV
Role name: app1_dev
Account-QA
Role name: app1_qa
s3:Get* Yes Yes
s3:List* Yes Yes
s3: PutObject No Yes

Next, run the iamctl diff command to see how it picks up the drift from your previously baselined accounts.

#change directory from qa to iamctL-test
cd ..
iamctl diff dev-profile dev qa-profile qa
Figure 11: Output showing the one deviation that was introduced

Figure 11: Output showing the one deviation that was introduced

You can see that IAMCTL now shows that the QA account has one difference as compared to DEV, which is what we expect based on the deviation you’ve introduced.

Open up the file qa_to_dev_common_role_difference_items.csv to look at the one difference. Again, adjust the following path example with the output from the iamctl diff command at the bottom of the summary report in Figure 11.

cat /Users/<username>/aws-idt/output/2020/09/18/07/38/15/qa_to_dev_common_role_difference_items.csv

As shown in figure 12, you can see that the file lists the specific S3 action “PutObject” with the role name and other relevant details.

Figure 12: Content of file qa_to_dev_common_role_difference_items.csv showing the one deviation that was introduced

Figure 12: Content of file qa_to_dev_common_role_difference_items.csv showing the one deviation that was introduced

You can use this information to remediate the deviation by performing corrective actions in either your DEV account or QA account. You can confirm the effectiveness of the corrective action by re-baselining to make sure that zero deviations appear.

Conclusion

In this post, you learned how to use the IAMCTL tool to compare IAM roles between two accounts, to arrive at a granular list of meaningful differences that can be used for compliance audits or for further remediation actions. If you’ve created your IAM roles by using an AWS CloudFormation stack, you can turn on drift detection and easily capture the drift because of changes done outside of AWS CloudFormation to those IAM resources. For more information about drift detection, see Detecting unmanaged configuration changes to stacks and resources. Lastly, see the GitHub repository where the tool is maintained with documentation describing each of the subcommand concepts. We welcome any pull requests for issues and enhancements.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS IAM forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Sudhir Reddy Maddulapally

Sudhir is a Senior Partner Solution Architect who builds SaaS solutions for partners by day and is a tech tinkerer by night. He enjoys trips to state and national parks, and Yosemite is his favorite thus far!

Author

Soumya Vanga

Soumya is a Cloud Application Architect with AWS Professional Services in New York, NY, helping customers design solutions and workloads and to adopt Cloud Native services.