Tag Archives: Thought Leadership

How to implement multi-tenancy with Amazon SES

Post Syndicated from satyaso original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-manage-email-sending-for-multiple-end-customers-using-amazon-ses/

In this blog post, you will learn how to design multi-tenancy with Amazon SES, as well as the fundamental best practices for implementing a multi-tenant architecture that can effectively handle the bulk email sending needs of your downstream customers.

Amazon Simple Email Service (SES) is used by customers across various industries to send emails to their recipients. Often, they need to send emails on behalf of their downstream customers or for other business divisions. Organizations commonly refer to these use cases as “multi-tenant email sending practices.” To implement multi-tenant email sending (that is, to send bulk emails on behalf of end customers), Amazon SES customers need to adopt an architecture that enables them to meet the email sending needs of thousands of downstream customers while also ensuring that the email sending reputation of each customer, or tenant, is isolated.

Use cases

  1. Onboard multiple brands from different business units (BUs) with different domains.
  2. Separate marketing and transactional tenants.
  3. Meet ISV customers’ requirement to segregate the email sending reputation of their end customers.
  4. Manage domains via configuration sets.
  5. Track each customer’s email sending reputation and control their email sending process.

Prerequisites

For this post, you should be familiar with the following:

Solution Overview

In the email ecosystem, domain and IP reputation are critical to getting emails delivered to the inbox. Tenants in a multi-tenant scenario might be distinct businesses or internal teams (for example, a marketing team or a customer service team). Because the maturity of each tenant varies greatly, implementing a multi-tenant environment can be complicated and difficult. While one tenant may have a well-validated and highly engaged recipient list, another tenant may have an untrusted recipient list, and sending emails to such addresses may result in bounces or spam complaints, lowering the IP and domain reputation. Organizations therefore have to build safeguards to prevent an unsophisticated sender or a bad actor from impacting the other tenants.

To better understand multi-tenancy, let us first look at how Amazon SES sends emails. Any emails sent via Amazon SES to end users are sent using IP addresses that have been mapped within Amazon SES. Amazon SES offers two types of IP addresses: shared IP addresses and dedicated IP addresses. (Amazon SES currently offers two kinds of dedicated IPs: standard dedicated IPs and managed dedicated IPs.) Shared IPs are shared across many SES customers, and all your emails are sent using shared IP addresses by default unless you have requested dedicated IPs. Dedicated IP addresses are designated for a single customer or tenant, where the tenant might be a business unit within the customer’s own ecosystem or a downstream customer of an ISV.

If a customer is using shared IPs to send email via SES and wants to achieve multi-tenancy, they can segregate the business functions of multiple tenants, such as tenant tagging, SES event destination routing, and cost allocation for each tenant, but shared IPs won’t help manage or isolate email sending reputation from one tenant to another. This is because shared IPs are mapped to an AWS Region, and if one rogue tenant sends spam, it impacts other customers in the same Region who use the same set of shared IPs.
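
Even on shared IPs, you can still attribute sending to a tenant by attaching a per-tenant configuration set and message tags at send time. The following AWS CLI sketch assumes a hypothetical tenant1-config-set already exists and that the sender identity is verified; adjust the names to your environment.

  # Send a message for tenant 1, tagging it so that event destinations and
  # cost reports can attribute deliveries, bounces, and complaints to this tenant
  aws sesv2 send-email \
    --from-email-address sender@tenant1.example.com \
    --destination "ToAddresses=recipient@example.com" \
    --content "Simple={Subject={Data=Hello,Charset=UTF-8},Body={Text={Data=Hello,Charset=UTF-8}}}" \
    --configuration-set-name tenant1-config-set \
    --email-tags Name=tenant,Value=tenant1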

If you are an Amazon SES user and wish to separate the reputation of one end customer from another, dedicated IPs are the ideal solution. A dedicated IP, or a group of dedicated IPs (also known as a dedicated IP pool), can be assigned to a tenant, and the email sending reputation for that tenant can be readily isolated from that of another tenant. If tenant one is a problematic sender and internet service providers (ISPs) such as Gmail, Hotmail, or Yahoo flag the respective domain or IPs, the reputation of the other tenants’ domains and IPs is unaffected, since they are mutually exclusive.

Amazon SES supports multi-tenancy primarily through two constructs: 1/ configuration sets and 2/ dedicated IP pools. Configuration sets are rules that are applied to your verified identities, whereas a dedicated IP pool groups dedicated IPs into a pool that can then be mapped to a configuration set, so that the respective identity or identities use only that IP pool without affecting other tenants. Let’s now look at a simplified architecture view.

Amazon SES multi-tenancy using a single AWS account

Multi-tenancy using a single AWS account

In this architecture, notice that tenant 1, tenant 2, and tenant 3 use distinct configuration sets with their respective dedicated IPs, while tenant 4 uses shared IPs. That is, tenants can choose which configuration sets to use for their domains. This gives customers the capability to achieve multi-tenancy.
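
A minimal AWS CLI sketch of the per-tenant constructs shown above might look like the following; the pool and configuration set names are hypothetical, and standard dedicated IPs must already be leased to your account before they can be added to a pool.

  # Create a dedicated IP pool for tenant 1
  aws sesv2 create-dedicated-ip-pool --pool-name tenant1-pool

  # Create a configuration set that sends through that pool
  aws sesv2 create-configuration-set \
    --configuration-set-name tenant1-config-set \
    --delivery-options SendingPoolName=tenant1-pool

  # Move a leased dedicated IP into the pool
  aws sesv2 put-dedicated-ip-in-pool --ip 192.0.2.10 --destination-pool-name tenant1-pool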

Amazon SES multi-tenancy – best practices

Always proactively reach out to your account team, or raise a support case under the “service limit increase” category, to inform AWS that you will be sending on behalf of tens of thousands of customers. This helps AWS set up the right limits in your account and be cognizant of your sending patterns.
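
Before you reach out, it also helps to know your current account-level sending quotas. The following read-only CLI call is a quick way to check them and is safe to run in any account.

  # Shows Max24HourSend, MaxSendRate, and SentLast24Hours for the account
  aws sesv2 get-account --query 'SendQuota'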

While the architecture described above will in most cases help Amazon SES users manage multiple end customers effectively, in rare cases Amazon SES users may receive a notification from AWS support stating that their Amazon SES account is under review. This indicates that the Amazon SES account is being used to send problematic email to end recipients, or that the account has been paused (if you haven’t acted to control the faulty senders within the review timeframe), which means you can’t send email from your SES account because your bounce or complaint rate has exceeded a certain threshold. These situations occur because the Amazon SES sanitization process is implemented at the AWS account level by default. So even if a tenant is using a dedicated IP or a dedicated IP pool, when its bounce or complaint rate exceeds the approved SES limit, Amazon SES sends a notification to the account admin, flagging the concern on the account. In such cases, it is recommended to implement a process known as “automatically pausing email sending for a configuration set.” You can configure Amazon SES to export reputation metrics that are specific to emails sent using a specific configuration set to Amazon CloudWatch. You can then use these metrics to create CloudWatch alarms that are specific to those configuration sets. When these alarms exceed certain thresholds, you can automatically pause the sending of emails that use the specified configuration sets, without impacting the overall email sending capabilities of your Amazon SES account.
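
As a rough sketch of that pattern, the following AWS CLI commands enable configuration-set-level reputation metrics, alarm on the bounce rate, and show how sending for just that configuration set can be paused. The names, threshold, and SNS topic are hypothetical, and the exact CloudWatch dimension should be verified against the current SES documentation.

  # Export reputation metrics for this configuration set to CloudWatch
  aws sesv2 put-configuration-set-reputation-options \
    --configuration-set-name tenant1-config-set \
    --reputation-metrics-enabled

  # Alarm when the configuration set's bounce rate goes above 5%
  aws cloudwatch put-metric-alarm \
    --alarm-name tenant1-bounce-rate \
    --namespace AWS/SES \
    --metric-name Reputation.BounceRate \
    --dimensions Name=ses:configuration-set,Value=tenant1-config-set \
    --statistic Average --period 3600 --evaluation-periods 1 \
    --threshold 0.05 --comparison-operator GreaterThanThreshold \
    --alarm-actions arn:aws:sns:us-east-1:111111111111:ses-reputation-alerts

  # Pause sending for only this configuration set (for example, from a Lambda
  # function triggered by the alarm), leaving other tenants unaffected
  aws sesv2 put-configuration-set-sending-options \
    --configuration-set-name tenant1-config-set \
    --no-sending-enabled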

If you are an enterprise ISV customer with tens of thousands of downstream customers, there is a possibility that you will hit the maximum quotas that Amazon SES provides. In those scenarios you have two options. 1/ Ask for an exception for your Amazon SES account: in this approach, you request that AWS raise the quotas applicable to your existing account to a higher threshold, and depending upon your previous usage and reputation, AWS may increase your account limits to accommodate more customers or tenants. To do this, raise an AWS support case under “service limit increase” and explain why you want to increase your Amazon SES account quota to a higher limit. There is no guarantee that the exception will be granted. If your exception request is denied, you must proceed to the second option, which is to 2/ segment your customers across multiple AWS accounts. In this approach, you must estimate your customer base ahead of time and distribute your downstream customers across multiple accounts within the same AWS Region in order to set up their email sending mechanism using SES. To better understand option 2, refer to the architecture diagram below.

Amazon SES multi-tenancy using multiple AWS accounts

Multi-tenancy using multiple AWS accounts

In the above architecture, various tenants connect to Amazon SES in different AWS accounts to implement multi-tenancy. Email event responses can be sent back to a central data lake located in the same AWS Region or in a different Region. Furthermore, as shown in the diagram above, all AWS accounts mapped to different tenants sit under a parent AWS account; this hierarchical structure is provided by AWS Organizations. It is recommended to use AWS Organizations, which enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage. It helps with security and compliance guidelines and with managing consolidated billing for all the child accounts.

Conclusion

Appropriate multi-tenancy implementation within Amazon SES not only helps you manage end-customer reputation but also aids in tracking the usage of each tenant independently. In this post, we showcased how Amazon SES users can utilize Amazon SES to manage a large number of end customers, and the design best practices for implementing a multi-tenant architecture with Amazon SES.


Satyasovan Tripathy works at Amazon Web Services as a Senior Specialist Solution Architect. He is based in Bengaluru, India, and specialises in the AWS customer developer service product portfolio. He likes reading and travelling outside of work.

 

Let’s Architect! Designing microservices architectures

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-designing-microservices-architectures/

In 2022, we published Let’s Architect! Architecting microservices with containers. We covered integration patterns and some approaches for implementing microservices using containers. In this Let’s Architect! post, we want to drill down into microservices only, by focusing on the main challenges that software architects and engineers face while working on large distributed systems structured as a set of independent services.

There are many considerations to cover in detail within a broad topic like microservices. We should reflect on the organizational structure, automation pipelines, multi-account strategy, testing, communication, and many other areas. With this post we dive deep into the topic by analyzing the options for discoverability and connectivity available through Amazon VPC Lattice; then, we focus on architectural patterns for communication, mainly on asynchronous communication, as it fits very well into the microservices paradigm. Finally, we explore how to work with serverless microservices and analyze a case study from Amazon, coming directly from the Amazon Builders’ Library.

AWS Container Day featuring Kubernetes

Modern applications are often built using a distributed microservices approach, which involves dividing the application into smaller, specialized services. Each of these services implements its own subset of functionality or business logic. To facilitate communication between these services, it is essential to have a method to authorize, route, and monitor network traffic. It is also important, in case of issues, to be able to identify the root cause, whether it originates at the application, service, or network level.

Amazon VPC Lattice offers a consistent way to connect, secure, and monitor communication between instances, containers, and serverless functions. With Amazon VPC Lattice, you can define policies for traffic management, network access, and advanced routing; implement discoverability; and, at the same time, monitor how traffic flows inside complex applications in near real time.
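
As an illustration of the building blocks involved, here is a minimal AWS CLI sketch that creates a service network, registers a service, and associates a VPC with the network. All names and identifiers are placeholders, and the commands are a sketch based on the VPC Lattice API; check the CLI reference for your version.

  # Create a service network that microservices can attach to
  aws vpc-lattice create-service-network --name orders-network

  # Register a microservice as a Lattice service
  aws vpc-lattice create-service --name payment-service

  # Make the service reachable through the network
  aws vpc-lattice create-service-network-service-association \
    --service-network-identifier sn-0123456789abcdef0 \
    --service-identifier svc-0123456789abcdef0

  # Allow workloads in a VPC to discover and call services in the network
  aws vpc-lattice create-service-network-vpc-association \
    --service-network-identifier sn-0123456789abcdef0 \
    --vpc-identifier vpc-0123456789abcdef0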

Take me to this video!

Amazon VPC Lattice service gives you a consistent way to connect, secure, and monitor communication between your services

Application integration patterns for microservices

Loosely coupled integration can help you design independent systems that can be developed and operated individually, plus increase the availability and reliability of the overall system landscape—particularly by using asynchronous communication. While there are many approaches for integration and conversation scenarios, it’s not always clear which approach is best for a given situation.

Join this re:Invent 2022 session to learn about foundational patterns for integration and conversation scenarios with an emphasis on loose coupling and asynchronous communication. Explore real-world use cases architected with cloud-native and serverless services, and receive guidance on choosing integration technology.
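
As a small, concrete illustration of that asynchronous style, the sketch below fans events out from an SNS topic to an SQS queue owned by a separate consumer service. The topic and queue names are hypothetical, and the SQS queue policy that authorizes SNS to deliver messages is omitted for brevity.

  # Producer side: publish domain events to a topic
  aws sns create-topic --name order-events

  # Consumer side: each service owns its queue and processes events at its own pace
  aws sqs create-queue --queue-name shipping-service-queue

  # Wire them together (the queue also needs a policy that allows the topic to send to it)
  aws sns subscribe \
    --topic-arn arn:aws:sns:us-east-1:111111111111:order-events \
    --protocol sqs \
    --notification-endpoint arn:aws:sqs:us-east-1:111111111111:shipping-service-queue

  # Publish an event without knowing which services consume it
  aws sns publish \
    --topic-arn arn:aws:sns:us-east-1:111111111111:order-events \
    --message '{"orderId":"1234","status":"CREATED"}'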

Take me to this re:Invent 2022 video!

Loosely coupled integration can help you design independent systems that can be developed and operated individually and can also increase the availability and reliability of the overall system

Design patterns for success in serverless microservices

Software engineers love patterns—proven approaches to well-known problems that make software development easier and set our projects up for success. In complex, distributed systems, such as microservices, patterns like CQRS and Event Sourcing help decouple and scale systems.

The first part of the video is all about introducing architectural patterns and their applications, while the second part contains a set of demos and examples from the AWS console.
In this session, we examine some typical patterns for building robust and performant serverless microservices, and how data access patterns can drive polyglot persistence.

Take me to this AWS Summit video!

With event sourcing, data is stored as a series of events instead of direct updates to data stores; microservices replay events from an event store to compute the appropriate state of their own data stores

Avoiding overload in distributed systems by putting the smaller service in control

If we don’t pay attention to the relative scale of a service and its clients, distributed systems with microservices can be at risk of overload. A common architecture pattern adopted by many AWS services consists of splitting the system in a control plane and a data plane.

This article drills down into this scenario to understand what could happen if the data plane fleet exceeds the scale of the control plane fleet by a factor of 100 or more. This can happen in a microservices-based architecture when service X recovers from an outage and starts sending a large number of requests to service Y. Without careful fine-tuning, this shift in behavior can overwhelm the smaller callee. With this resource, we want to share some mental models and design strategies that are beneficial for distributed systems and teams working on microservices architectures.

Take me to the Amazon Builders’ Library!

To stay up to date on the data plane’s operational state, the control plane can poll an Amazon S3 bucket into which data plane servers periodically write that information

See you next time!

Thanks for stopping by! Join us in two weeks when we’ll discuss multi-tenancy and patterns for SaaS on AWS.

To find all the blogs from this series, you can check out the Let’s Architect! list of content on the AWS Architecture Blog.

Stronger together: Highlights from RSA Conference 2023

Post Syndicated from Anne Grahn original https://aws.amazon.com/blogs/security/stronger-together-highlights-from-rsa-conference-2023/

Golden Gate bridge

RSA Conference 2023 brought thousands of cybersecurity professionals to the Moscone Center in San Francisco, California from April 24 through 27.

The keynote lineup was eclectic, with more than 30 presentations across two stages featuring speakers ranging from renowned theoretical physicist and futurist Dr. Michio Kaku to Grammy-winning musician Chris Stapleton. Topics aligned with this year’s conference theme, “Stronger Together,” and focused on actions that can be taken by everyone, from the C-suite to those of us on the front lines of security, to strengthen collaboration, establish new best practices, and make our defenses more diverse and effective.

With over 400 sessions and 500 exhibitors discussing the latest trends and technologies, it’s impossible to recap every highlight. Now that the dust has settled and we’ve had time to reflect, here’s a glimpse of what caught our attention.

Noteworthy announcements

Hundreds of companies — including Amazon Web Services (AWS) — made new product and service announcements during the conference.

We announced three new capabilities for our Amazon GuardDuty threat detection service to help customers secure container, database, and serverless workloads. These include GuardDuty Elastic Kubernetes Service (EKS) Runtime Monitoring, GuardDuty RDS Protection for data stored in Amazon Aurora, and GuardDuty Lambda Protection for serverless applications. The new capabilities are designed to provide actionable, contextual, and timely security findings with resource-specific details.

Artificial intelligence

It was hard to find a single keynote, session, or conversation that didn’t touch on the impact of artificial intelligence (AI).

In “AI: Law, Policy and Common Sense Suggestions on How to Stay Out of Trouble,” privacy and gaming attorney Behnam Dayanim highlighted ambiguity around the definition of AI. Referencing a quote from University of Washington School of Law’s Ryan Calo, Dayanim pointed out that AI may be best described as “…a set of techniques aimed at approximating some aspect of cognition,” and should therefore be thought of differently than a discrete “thing” or industry sector.

Dayanim noted examples of skepticism around the benefits of AI. A recent Monmouth University poll, for example, found that 73% of Americans believe AI will make jobs less available and harm the economy, and a surprising 55% believe AI may one day threaten humanity’s existence.

Equally skeptical, he noted, is a joint statement made by the Federal Trade Commission (FTC) and three other federal agencies during the conference reminding the public that enforcement authority applies to AI. The statement takes a pessimistic view, saying that AI is “…often advertised as providing insights and breakthroughs, increasing efficiencies and cost-savings, and modernizing existing practices,” but has the potential to produce negative outcomes.

Dayanim covered existing and upcoming legal frameworks around the world that are aimed at addressing AI-related risks related to intellectual property (IP), misinformation, and bias, and how organizations can design AI governance mechanisms to promote fairness, competence, transparency, and accountability.

Many other discussions focused on the immense potential of AI to automate and improve security practices. RSA Security CEO Rohit Ghai explored the intersection of progress in AI with human identity in his keynote. “Access management and identity management are now table stakes features”, he said. In the AI era, we need an identity security solution that will secure the entire identity lifecycle—not just access. To be successful, he believes, the next generation of identity technology needs to be powered by AI, open and integrated at the data layer, and pursue a security-first approach. “Without good AI,” he said, “zero trust has zero chance.”

Mark Ryland, director at the Office of the CISO at AWS, spoke with Infosecurity about improving threat detection with generative AI.

“We’re very focused on meaningful data and minimizing false positives. And the only way to do that effectively is with machine learning (ML), so that’s been a core part of our security services,” he noted.

We recently announced several new innovations—including Amazon Bedrock, the Amazon Titan foundation model, the general availability of Amazon Elastic Compute Cloud (Amazon EC2) Trn1n instances powered by AWS Trainium, Amazon EC2 Inf2 instances powered by AWS Inferentia2, and the general availability of Amazon CodeWhisperer—that will make it practical for customers to use generative AI in their businesses.

“Machine learning and artificial intelligence will add a critical layer of automation to cloud security. AI/ML will help augment developers’ workstreams, helping them create more reliable code and drive continuous security improvement.” — CJ Moses, CISO and VP of security engineering at AWS

The human element

Dozens of sessions focused on the human element of security, with topics ranging from the psychology of DevSecOps to the NIST Phish Scale. In “How to Create a Breach-Deterrent Culture of Cybersecurity, from Board Down,” Andrzej Cetnarski, founder, chairman, and CEO of Cyber Nation Central and Marcus Sachs, deputy director for research at Auburn University, made a data-driven case for CEOs, boards, and business leaders to set a tone of security in their organizations, so they can address “cyber insecure behaviors that lead to social engineering” and keep up with the pace of cybercrime.

Lisa Plaggemier, executive director of the National Cybersecurity Alliance, and Jenny Brinkley, director of Amazon Security, stressed the importance of compelling security awareness training in “Engagement Through Entertainment: How To Make Security Behaviors Stick.” Education is critical to building a strong security posture, but as Plaggemier and Brinkley pointed out, we’re “living through an epidemic of boringness” in cybersecurity training.

According to a recent report, just 28% of employees say security awareness training is engaging, and only 36% say they pay full attention during such training.

Citing a United Airlines preflight safety video and Amazon’s Protect and Connect public service announcement (PSA) as examples, they emphasized the need to make emotional connections with users through humor and unexpected elements in order to create memorable training that drives behavioral change.

Plaggemier and Brinkley detailed five actionable steps for security teams to improve their awareness training:

  • Brainstorm with staff throughout the company (not just the security people)
  • Find ideas and inspiration from everywhere else (TV episodes, movies… anywhere but existing security training)
  • Be relatable, and include insights that are relevant to your company and teams
  • Start small; you don’t need a large budget to add interest to your training
  • Don’t let naysayers deter you — change often prompts resistance
“You’ve got to make people care. And so you’ve got to find out what their personal motivators are, and how to develop the type of content that can make them care to click through the training and…remember things as they’re walking through an office.” — Jenny Brinkley, director of Amazon Security

Cloud security

Cloud security was another popular topic. In “Architecting Security for Regulated Workloads in Hybrid Cloud,” Mark Buckwell, cloud security architect at IBM, discussed the architectural thinking practices—including zero trust—required to integrate security and compliance into regulated workloads in a hybrid cloud environment.

Mitiga co-founder and CTO Ofer Maor told real-world stories of SaaS attacks and incident response in “It’s Getting Real & Hitting the Fan 2023 Edition.”

Maor highlighted common tactics focused on identity theft, including MFA push fatigue, phishing, business email compromise, and adversary-in-the-middle attacks. After detailing techniques that are used to establish persistence in SaaS environments and deliver ransomware, Maor emphasized the importance of forensic investigation and threat hunting to gaining the knowledge needed to reduce the impact of SaaS security incidents.

Sarah Currey, security practice manager, and Anna McAbee, senior solutions architect at AWS, provided complementary guidance in “Top 10 Ways to Evolve Cloud Native Incident Response Maturity.” Currey and McAbee highlighted best practices for addressing incident response (IR) challenges in the cloud — no matter who your provider is:

  1. Define roles and responsibilities in your IR plan
  2. Train staff on AWS (or your provider)
  3. Develop cloud incident response playbooks
  4. Develop account structure and tagging strategy
  5. Run simulations (red team, purple team, tabletop)
  6. Prepare access
  7. Select and set up logs
  8. Enable managed detection services in all available AWS Regions
  9. Determine containment strategy for resource types
  10. Develop cloud forensics capabilities

Speaking to BizTech, Clarke Rodgers, director of enterprise strategy at AWS, noted that tools and services such as Amazon GuardDuty and AWS Key Management Service (AWS KMS) are available to help advance security in the cloud. When organizations take advantage of these services and use partners to augment security programs, they can gain the confidence they need to take more risks, and accelerate digital transformation and product development.

Security takes a village

There are more highlights than we can mention on a variety of other topics, including post-quantum cryptography, data privacy, and diversity, equity, and inclusion. We’ve barely scratched the surface of RSA Conference 2023. If there is one key takeaway, it is that no single organization or individual can address cybersecurity challenges alone. By working together and sharing best practices as an industry, we can develop more effective security solutions and stay ahead of emerging threats.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Anne Grahn

Anne is a Senior Worldwide Security GTM Specialist at AWS based in Chicago. She has more than a decade of experience in the security industry, and focuses on effectively communicating cybersecurity risk. She maintains a Certified Information Systems Security Professional (CISSP) certification.

Danielle Ruderman

Danielle is a Senior Manager for the AWS Worldwide Security Specialist Organization, where she leads a team that enables global CISOs and security leaders to better secure their cloud environments. Danielle is passionate about improving security by building company security culture that starts with employee engagement.

AWS Nitro System gets independent affirmation of its confidential compute capabilities

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/aws-nitro-system-gets-independent-affirmation-of-its-confidential-compute-capabilities/

This blog post was written by Anthony Liguori, VP/Distinguished Engineer, EC2, AWS.

Customers around the world trust AWS to keep their data safe, and keeping their workloads secure and confidential is foundational to how we operate. Since the inception of AWS, we have relentlessly innovated on security, privacy tools, and practices to meet, and even exceed, our customers’ expectations.

The AWS Nitro System is the underlying platform for all modern AWS compute instances, and it has allowed us to deliver the data isolation, performance, cost, and pace of innovation that our customers require. It’s a pioneering design of specialized hardware and software that protects customer code and data from unauthorized access during processing.

When we launched the Nitro System in 2017, we delivered a unique architecture that restricts any operator access to customer data. This means that no person, or even service from AWS, can access data when it is being used in an Amazon EC2 instance. We knew that designing the system this way would present several architectural and operational challenges for us. However, we also knew that protecting customers’ data in this way was the best way to support our customers’ needs.

When AWS made its Digital Sovereignty Pledge last year, we committed to providing greater transparency and assurances to customers about how AWS services are designed and operated, especially when it comes to handling customer data. As part of that increased transparency, we engaged NCC Group, a leading cybersecurity consulting firm based in the United Kingdom, to conduct an independent architecture review of the Nitro System and the security assurances we make to our customers. NCC Group has now issued its report and affirmed our claims.

The report states, “As a matter of design, NCC Group found no gaps in the Nitro System that would compromise [AWS] security claims.” Specifically, the report validates the following statements about our Nitro System production hosts:

  1. There is no mechanism for a cloud service provider employee to log in to the underlying host.
  2. No administrative API can access customer content on the underlying host.
  3. There is no mechanism for a cloud service provider employee to access customer content stored on instance storage and encrypted EBS volumes.
  4. There is no mechanism for a cloud service provider employee to access encrypted data transmitted over the network.
  5. Access to administrative APIs always requires authentication and authorization.
  6. Access to administrative APIs is always logged.
  7. Hosts can only run tested and signed software that is deployed by an authenticated and authorized deployment service. No cloud service provider employee can deploy code directly onto hosts.

The report details NCC’s analysis for each of these claims. You can also find additional details about the scope, methodology, and steps that NCC used to evaluate the claims.

How Nitro System protects customer data

At AWS, we know that our customers, especially those who have sensitive or confidential data, may have worries about putting that data in the cloud. That’s why we’ve architected the Nitro System to ensure that your confidential information is as secure as possible. We do this in several ways:

There is no mechanism for any system or person to log in to Amazon EC2 servers, read the memory of EC2 instances, or access any data on encrypted Amazon Elastic Block Store (EBS) volumes.

If any AWS operator, including those with the highest privileges, needs to perform maintenance work on the EC2 server, they can do so only by using a strictly limited set of authenticated, authorized, and audited administrative APIs. Critically, none of these APIs have the ability to access customer data on the EC2 server. These restrictions are built into the Nitro System itself, and no AWS operator can circumvent these controls and protections.

The Nitro System also protects customers from AWS system software through the innovative design of our lightweight Nitro Hypervisor, which manages memory and CPU allocation. Typical commercial hypervisors provide administrators with full access to the system, but with the Nitro System, the only interface operators can use is a restricted API. This means that customers and operators cannot interact with the system in unapproved ways and there is no equivalent of a “root” user. This approach enhances security and allows AWS to update systems in the background, fix system bugs, monitor performance, and even perform upgrades without impacting customer operations or customer data. Customers are unaffected during system upgrades, and their data remains protected.

Finally, the Nitro System can also provide customers an extra layer of data isolation from their own operators and software. AWS created AWS Nitro Enclaves, which provide isolated compute environments that are ideal for organizations that need to process personally identifiable information, as well as healthcare, financial, and intellectual property data, within their compute instances. These enclaves do not share memory or CPU cores with the customer instance. Further, Nitro Enclaves have cryptographic attestation capabilities that let customers verify that all of the software deployed has been validated and not compromised.
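
For a sense of what working with an enclave looks like in practice, here is a brief sketch using the nitro-cli tooling on an enclave-enabled EC2 instance. The image name and resource sizes are placeholders, and the attestation document itself is consumed by the application running inside the enclave.

  # Package an application container into an enclave image file (EIF);
  # this step prints the enclave's PCR measurements used for attestation
  nitro-cli build-enclave --docker-uri my-enclave-app:latest --output-file my-enclave-app.eif

  # Launch the enclave with vCPUs and memory carved out of the parent instance
  nitro-cli run-enclave --eif-path my-enclave-app.eif --cpu-count 2 --memory 1024

  # Confirm the enclave is running
  nitro-cli describe-enclaves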

All of these aspects of the Nitro System’s security and confidential compute capabilities required AWS to invest time and resources into building the system’s architecture. We did so because we wanted to ensure that our customers felt confident entrusting us with their most sensitive and confidential data, and we have worked to continue earning that trust. We are not done; this is just one step AWS is taking to increase transparency about how our services are designed and operated. We will continue to innovate on and deliver unique features that further enhance our customers’ security without compromising on performance.

Learn more:

Watch Anthony speak about AWS Nitro System Security here.

Delivering on the AWS Digital Sovereignty Pledge: Control without compromise

Post Syndicated from Matt Garman original https://aws.amazon.com/blogs/security/delivering-on-the-aws-digital-sovereignty-pledge-control-without-compromise/

At AWS, earning and maintaining customer trust is the foundation of our business. We understand that protecting customer data is key to achieving this. We also know that trust must continue to be earned through transparency and assurances.

In November 2022, we announced the new AWS Digital Sovereignty Pledge, our commitment to offering all AWS customers the most advanced set of sovereignty controls and features available in the cloud. Two pillars of this are verifiable control over data access, and the ability to encrypt everything everywhere. We already offer a range of data protection features, accreditations, and contractual commitments that give customers control over where they locate their data, who can access it, and how it is used. Today, I’d like to update you on how we are continuing to earn your trust with verifiable control over customer data access and external control of your encryption keys.

AWS Nitro System achieves independent third-party validation

We are committed to helping our customers meet evolving sovereignty requirements and to providing greater transparency and assurances about how AWS services are designed and operated. With the AWS Nitro System, which is the foundation of the AWS computing service Amazon EC2, we designed and delivered first-of-a-kind innovation by eliminating any mechanism AWS personnel have to access customer data on Nitro. Our removal of an operator access mechanism was unique in 2017 when we first launched the Nitro System.

As we continue to deliver on our digital sovereignty pledge of customer control over data access, I’m excited to share with you an independent report on the security design of the AWS Nitro System. We engaged NCC Group, a global cybersecurity consulting firm, to conduct an architecture review of our security claims of the Nitro System and produce a public report. This report confirms that the AWS Nitro System, by design, has no mechanism for anyone at AWS to access your data on Nitro hosts. The report evaluates the architecture of the Nitro System and our claims about operator access. It concludes that “As a matter of design, NCC Group found no gaps in the Nitro System that would compromise these security claims.” It also goes on to state, “NCC Group finds…there is no indication that a cloud service provider employee can obtain such access…to any host.” Our computing infrastructure, the Nitro System, has no operator access mechanism, and now is supported by a third-party analysis of those data controls. Read more in the NCC Group report.

New AWS Service Term

At AWS, security is our top priority. The NCC report shows the Nitro System is an exceptional computing backbone for AWS, with security at its core. The Nitro controls that prevent operator access are so fundamental to the Nitro System that we’ve added them in our AWS Service Terms, which are applicable to anyone who uses AWS.

Our AWS Service Terms now include the following on the Nitro System:

AWS personnel do not have access to Your Content on AWS Nitro System EC2 instances. There are no technical means or APIs available to AWS personnel to read, copy, extract, modify, or otherwise access Your Content on an AWS Nitro System EC2 instance or encrypted-EBS volume attached to an AWS Nitro System EC2 instance. Access to AWS Nitro System EC2 instance APIs – which enable AWS personnel to operate the system without access to Your Content – is always logged, and always requires authentication and authorization.

External control of your encryption keys with AWS KMS External Key Store

As part of our promise to continue to make the AWS Cloud sovereign-by-design, we pledged to continue to invest in an ambitious roadmap of capabilities, which includes our encryption capabilities. At re:Invent 2022, we took further steps to deliver on this roadmap of encrypt everything everywhere with encryption keys managed inside or outside the AWS Cloud by announcing the availability of AWS Key Management Service (AWS KMS) External Key Store (XKS). This innovation supports our customers who have a regulatory need to store and use their encryption keys outside the AWS Cloud. The open source XKS specification offers customers the flexibility to adapt to different HSM deployment use cases. While AWS KMS also prevents AWS personnel from accessing customer keys, this new capability may help some customers demonstrate compliance with specific regulations or industry expectations requiring storage and use of encryption keys outside of an AWS data center for certain workloads.
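
At the API level, a rough sketch of registering an external key store looks like the following; every name, endpoint, identifier, and credential shown is a placeholder, and the XKS proxy must already be running and reachable before the connect call succeeds.

  # Register the external key store with AWS KMS via your XKS proxy
  aws kms create-custom-key-store \
    --custom-key-store-name my-external-key-store \
    --custom-key-store-type EXTERNAL_KEY_STORE \
    --xks-proxy-uri-endpoint https://xks.example.com \
    --xks-proxy-uri-path /example-prefix/kms/xks/v1 \
    --xks-proxy-connectivity PUBLIC_ENDPOINT \
    --xks-proxy-authentication-credential AccessKeyId=EXAMPLEACCESSKEY,RawSecretAccessKey=EXAMPLESECRETEXAMPLESECRETEXAMPLESECRETEXAM

  # Connect the key store, then create a KMS key backed by a key held outside AWS
  aws kms connect-custom-key-store --custom-key-store-id cks-1234567890abcdef0
  aws kms create-key \
    --custom-key-store-id cks-1234567890abcdef0 \
    --xks-key-id my-external-key-id \
    --origin EXTERNAL_KEY_STORE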

In order to accelerate our customers’ ability to adopt XKS for regulatory purposes, we collaborated with external HSM, key management, and integration service providers that our customers trust. To date, Thales, Entrust, Fortanix, DuoKey, and HashiCorp have launched XKS implementations, and Salesforce, Atos, and T-Systems have announced that they are building integrated service offerings around XKS. In addition, many SaaS solutions offer integration with AWS KMS for key management of their encryption offerings. Customers using these solutions, such as the offerings from Databricks, MongoDB, Reltio, Slack, Snowflake, and Zoom, can now utilize keys in external key managers via XKS to secure data. This allows customers to simplify their key management strategies across AWS as well as certain SaaS solutions by providing a centralized place to manage access policies and audit key usage.

We remain committed to helping our customers meet security, privacy, and digital sovereignty requirements. We will continue to innovate sovereignty features, controls, and assurances within the global AWS Cloud and deliver them without compromise to the full power of AWS.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Matt Garman

Matt is currently the Senior Vice President of AWS Sales, Marketing and Global Services at AWS, and also sits on Amazon’s executive leadership S-Team. Matt joined Amazon in 2006, and has held several leadership positions in AWS over that time. Matt previously served as Vice President of the Amazon EC2 and Compute Services businesses for AWS for over 10 years. Matt was responsible for P&L, product management, and engineering and operations for all compute and storage services in AWS. He started at Amazon when AWS first launched in 2006 and served as one of the first product managers, helping to launch the initial set of AWS services. Prior to Amazon, he spent time in product management roles at early stage Internet startups. Matt earned a BS and MS in Industrial Engineering from Stanford University, and an MBA from the Kellogg School of Management at Northwestern University.

Build an analytics pipeline for a multi-account support case dashboard

Post Syndicated from Sindhura Palakodety original https://aws.amazon.com/blogs/big-data/build-an-analytics-pipeline-for-a-multi-account-support-case-dashboard/

As organizations mature in their cloud journey, they have many accounts (even hundreds) that they need to manage. Imagine having to manage support cases for these accounts without a unified dashboard. Administrators have to access each account either by switching roles or with single sign-on (SSO) in order to view and manage support cases.

This post demonstrates how you can build an analytics pipeline to push support cases created in individual member AWS accounts into a central account. We also show you how to build an analytics dashboard to gain visibility and insights on all support cases created in various accounts within your organization.

Overview of solution

In this post, we go through the process to create a pipeline to ingest, store, process, analyze, and visualize AWS support cases. We use the following AWS services as key components:

The following diagram illustrates the architecture.

The central account is the AWS account that you use to centrally manage the support case data.

Member accounts are the AWS accounts where support cases are created. Whenever a case is created, its data flows into an S3 bucket in the central account and can be visualized using the QuickSight dashboard in the central account.

To implement this solution, you complete the following high-level steps:

  1. Determine the AWS accounts to use for the central account and member accounts.
  2. Set up permissions for AWS CloudFormation StackSets on the central account and member accounts.
  3. Create resources on the central account using AWS CloudFormation.
  4. Create resources on the member accounts using CloudFormation StackSets.
  5. Open up support cases on the member accounts.
  6. Visualize the data in a QuickSight dashboard in the central account.

Prerequisites

Complete the following prerequisite steps:

  1. Create AWS accounts if you haven’t done so already.
  2. Before you get started, make sure that you have a Business or Enterprise support plan for your member accounts.
  3. Sign up for QuickSight if you have never used QuickSight in this account before. To use the forecast capability in QuickSight, sign up for the Enterprise Edition.

Preparation for CloudFormation StackSets

In this section, we go through the steps to set up permissions for StackSets in both the central and member accounts.

Set up permissions for StackSets on the central account

To set up permissions on the central account, complete the following steps:

  1. Sign in to the AWS Management Console of the central account.
  2. Download the administrator role CloudFormation template.
  3. On the AWS CloudFormation console, choose Create stack and With new resources.
  4. Leave the Prepare template setting as default.
  5. For Template source, select Upload a template file.
  6. Choose Choose file and supply the CloudFormation template you downloaded: AWSCloudFormationStackSetAdministrationRole.yml.
  7. Choose Next.
  8. For Stack name, enter StackSetAdministratorRole.
  9. Choose Next.
  10. For Configure stack options, we recommend configuring tags, which are key-value pairs that can help you identify your stacks and the resources they create. For example, enter Owner as the key, and your email address as the value.
  11. We don’t use additional permissions or advanced options, so accept the default values and choose Next.
  12. Review your configuration and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  13. Choose Create stack.

The stack takes about 30 seconds to complete.
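
If you prefer the AWS CLI over the console for this step, a roughly equivalent command is sketched below; the stack and template names match the ones above, and the --capabilities flag is required because the template creates a named IAM role.

  aws cloudformation create-stack \
    --stack-name StackSetAdministratorRole \
    --template-body file://AWSCloudFormationStackSetAdministrationRole.yml \
    --capabilities CAPABILITY_NAMED_IAM \
    --tags Key=Owner,Value=your-email@example.com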

Set up permissions for StackSets on member accounts

Now that we’ve created a StackSet administrator role on the central account, we need to create the StackSet execution role on the member accounts. Perform the following steps on all member accounts:

  1. Sign in to the console on the member account.
  2. Download the execution role CloudFormation template.
  3. On the AWS CloudFormation console, choose Create stack and With new resources.
  4. Leave the Prepare template setting as default.
  5. For Template source, select Upload a template file.
  6. Choose Choose file and supply the CloudFormation template you downloaded: AWSCloudFormationStackSetExecutionRole.yml.
  7. Choose Next.
  8. For Stack name, use StackSetExecutionRole.
  9. For Parameters, enter the 12-digit account ID for the central account.
  10. Choose Next.
  11. For Configure stack options, we recommend configuring tags. For example, enter Owner as the key and your email address as the value.
  12. We don’t use additional permissions or advanced options, so choose Next.

For more information, see Setting AWS CloudFormation stack options.

  13. Review your configuration and select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  14. Choose Create stack.

The stack takes about 30 seconds to complete.

Set up the infrastructure for the central account and member accounts

In this section, we go through the steps to create your resources for both accounts and launch the StackSets.

Create resources on the central account with AWS CloudFormation

To launch the provided CloudFormation template, complete the following steps:

  1. Sign in to the console on the central account.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For Stack name, enter a name. For example, support-case-central-account.
  5. For AWSMemberAccountIDs, enter the member account IDs separated by commas from where support case data is gathered.
  6. For Support Case Raw Data Bucket, enter the S3 bucket in the central account that holds the support case raw data from all member accounts. Note the name of this bucket to use in future steps.
  7. For Support Case Transformed Data Bucket, enter the S3 bucket in the central account that holds the transformed support case data. Note the name of this bucket to use in future steps.
  8. Choose Next.
  9. Enter any tags you want to assign to the stack and choose Next.
  10. Select the acknowledgement check boxes and choose Create stack.

The stack takes approximately 5 minutes to complete. Wait until the stack is complete before proceeding to the next steps.
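
One way to confirm that the stack has finished before moving on is to poll its status with the AWS CLI; the stack name below assumes you used the example name from the steps above.

  # Blocks until the stack reaches CREATE_COMPLETE (or fails)
  aws cloudformation wait stack-create-complete --stack-name support-case-central-account

  # Or check the status directly
  aws cloudformation describe-stacks \
    --stack-name support-case-central-account \
    --query 'Stacks[0].StackStatus'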

Launch CloudFormation StackSets from the central account

To launch StackSets, complete the following steps:

  1. Sign in to the console on the central account.
  2. On the AWS CloudFormation console, choose StackSets in the navigation pane.
  3. Choose Create StackSet.
  4. Leave the IAM execution role name as AWSCloudFormationStackSetExecutionRole.
  5. If AWS Organizations is enabled, under permissions, select Service-managed permissions.
  6. Leave the Prepare template setting as default.
  7. For Template source, select Amazon S3 URL.
  8. Enter the following Amazon S3 URL under Specify Template: https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/BDB-2583/AWS_MemberAccount_SupportCaseDashboard_CF.yaml
  9. Choose Next.
  10. For StackSet name, enter a name. For example, support-case-member-account.
  11. For CentralSupportCaseRawBucket, enter the name of the Support Case Raw Data Bucket created in the central account, which you noted previously.
  12. For CentralAccountID, enter the account ID of the central account.
  13. For Configure StackSet options, we recommend configuring tags.
  14. Leave the rest as default and choose Next.
  15. If AWS Organizations is enabled, in the Set deployment options step, for Deployment targets, you can either choose Deploy to organization or Deploy to organizational units (OU).
    • If you deploy to OUs, you will need to specify the AWS OU ID.
  16. If AWS Organizations is not enabled, on the Set Deployment Options page, under Accounts, select Deploy stacks in accounts.
    • Under Account numbers, enter the 12-digit account IDs for the member accounts as a comma-separated list. For example: 111111111111,222222222222.
  17. Under Specify regions, choose US East (N. Virginia).

Due to the limitation of EventBridge with the AWS Support API, this StackSet has to be deployed only in the US East (N. Virginia) Region.

  18. Optionally, you can change the maximum concurrent accounts to match the number of member accounts, adjust the failure tolerance to at least 1, and choose Region Concurrency to be Parallel to set up resources in parallel on the member accounts.
  19. Review your selections, select the acknowledgement check boxes, and choose Submit.

The operation takes about 2–3 minutes to complete.
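
If you script deployments rather than using the console, a rough CLI equivalent of the self-managed flow is to add member-account stack instances to the StackSet created above; the StackSet name, account IDs, and operation preferences below are placeholders, and the Region is pinned to us-east-1 for the reason noted earlier.

  aws cloudformation create-stack-instances \
    --stack-set-name support-case-member-account \
    --accounts 111111111111 222222222222 \
    --regions us-east-1 \
    --operation-preferences MaxConcurrentCount=2,FailureToleranceCount=1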

Visualize your support cases in QuickSight in the central account

In this section, we go through the steps to visualize your support cases in QuickSight.

Grant QuickSight permissions

To grant QuickSight permissions, complete the following steps:

  1. Sign in to the console on the central account.
  2. On the QuickSight console, on the Admin drop-down menu in the top right-hand corner, choose Manage QuickSight.
  3. In the navigation pane, choose Security & permissions.
  4. Under QuickSight access to AWS services, choose Manage.
  5. Select Amazon Athena.
  6. Select Amazon S3 to edit QuickSight access to your S3 buckets.
  7. Select the bucket you specified during stack creation.
  8. Choose Finish.
  9. Choose Save.

Prepare the datasets

To prepare your datasets, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. Choose Athena.
  4. For Data source name, enter support-case-data-source.
  5. Choose Validate connection.
  6. After your connection is validated, choose Create data source.
  7. For Database, choose support-case-transformed-data.
  8. For Tables, select the table under the database (there should only be one table that matches the name of the S3 bucket you set as the destination for the transformed data).
  9. Choose Edit/Preview data.
  10. Leave Query mode set as Direct Query.
  11. Choose the options menu (three dots) next to the field case_creation_year and set Change data type to Date.
  12. Enter the date format as yyyy, then choose Validate and Update.
  13. Similarly, right-click on the field case_creation_month and set Change data type to Date.
  14. Enter the date format as MM, then choose Validate and Update.
  15. Right-click on the field case_creation_day and set Change data type to Date.
  16. Enter the date format as dd, then choose Validate and Update.
  17. Right-click on the field case_creation_time and set Change data type to Date.
  18. Enter the date format as yyyy-MM-dd’T’HH:mm:ss.SSSZ, then choose Validate and Update.
  19. Change the name of the QuickSight dataset to support-cases-dataset.
  20. Choose Save & publish.
  21. Note the dataset ID from the URL (alpha-numeric string between datasets and view, excluding slashes) to use later for QuickSight dashboard creation.

  22. Choose Cancel to exit this page.
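
If you prefer not to copy the dataset ID out of the browser URL, you can also look it up with the AWS CLI; the account ID and Region placeholders follow the same convention as the commands later in this post.

  # Lists dataset IDs and names; pick the entry named support-cases-dataset
  aws quicksight list-data-sets \
    --aws-account-id <central-account-id> \
    --region <region> \
    --query 'DataSetSummaries[].{Id:DataSetId,Name:Name}'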

Set up the QuickSight dashboard from a template

To set up your QuickSight dashboard, complete the following steps:

  1. Navigate to the following link, then right-click and choose Save As to download the QuickSight dashboard JSON template from the browser.
  2. On the console, choose the user profile drop-down menu.
  3. Choose the copy icon next to the Account ID: field (of the central account).

  4. Open the JSON file with a text editor and replace xxxxx with the account ID. This will be replaced in two places.
  5. Replace yyyyy with the dataset ID that you previously noted.
  6. Replace rrrrr with the Region where you deployed resources in the central account.

To determine the principal (user) to be used for the dashboard creation, you can use AWS CloudShell.

  1. Navigate to CloudShell on the console. Ensure it’s the same Region where your resources are deployed.

  2. Wait until the environment gets created and you see the CloudShell prompt.

  3. Run the following command, providing your account ID (central account) and Region:
    aws quicksight list-users --region <region> --aws-account-id <account-id> --namespace default

  4. From the output, select the value of the ARN field. Replace the value of zzzzz with the ARN.
  5. Optionally, you can change the name of the dashboard by changing the value of the fields in the JSON file:
    • For DashboardId, enter SupportCaseCentralDashboard.
    • For Name, enter SupportCaseCentralDashboard.
  6. Save the changes to the JSON file.

Now we use CloudShell to upload the JSON file provided in the previous step.

  1. On the Actions menu, choose Upload file.

  1. To create the QuickSight dashboard from the JSON template, use the following AWS Command Line Interface (AWS CLI) command and pass the updated JSON file as an argument, providing your Region:
    aws quicksight create-dashboard --region <region> --cli-input-json file://support-case-dashboard-template.json

The output of the command looks similar to the following screenshot.

  1. In case of any issues or if you want to see more details about the dashboard, you can use the following command:
    aws quicksight describe-dashboard --region <region> --aws-account-id <central-account-id> --dashboard-id <DashboardId in screenshot above>

  2. On the QuickSight console, choose Dashboards in the navigation pane.
  3. Choose Support Cases Dashboard.

You should see a dashboard similar to the screenshot shown at the beginning of this post, but there should only be one case.

Add additional member accounts

If you want to add additional member accounts, you need to update the CloudFormation stack that you created earlier on the central account. If you followed our name recommendation, the stack is called support-case-central-account-stack. Add the additional account number in the Member Account IDs parameter.

Next, go to the StackSet in the central account. If you followed our naming recommendation, the StackSet is called support-case-member-account. Select the StackSet and on the Actions menu, choose Add stacks to StackSet. Then follow the same instructions that you followed previously when you created the StackSet.

Monitor support cases created in the central account

So far, our setup will monitor all support cases created in the member accounts that you specified. However, it doesn’t include support cases that you create in the central account. To set up monitoring for the central account, complete the following steps:

  1. Update the CloudFormation stack that you created earlier on the central account. If you followed our name recommendation, the stack is called support-case-central-account-stack. Add the central account ID in the Member Account IDs parameter.
  2. Sign in to the CloudFormation console in the central account.
  3. Choose Launch Stack:
  4. Choose Next.
  5. For Stack name, enter a name. For example, support-case-central-as-member-account.
  6. For CentralAccountIDs, enter the central account ID.
  7. For CentralSupportCaseRawBucket, enter the S3 bucket in the central account that holds the support case raw data from all member accounts.
  8. Choose Next.
  9. Enter any tags you want to assign to the stack and choose Next.
  10. Select the acknowledgement check boxes and choose Create stack.

Clean up

To avoid incurring future charges, delete the resources you created as part of this solution.

Troubleshooting

Note the following troubleshooting tips:

  • Make sure that you create the CloudFormation stacks and StackSet in the correct accounts: central and member.
  • If you get a permission denied error from Athena on the S3 path (see the following screenshot), review the steps to grant QuickSight permissions.

  • When creating the QuickSight dashboard using the template, if you get an error similar to the following, make sure that you use the ARN value from the output generated by the aws quicksight list-users --region <region> --aws-account-id <account-id> --namespace default command.

An error occurred (InvalidParameterValueException) when calling the CreateDashboard operation: Principal ARN xxxx is not part of the same account yyyy

  • When deleting the stack, if you encounter the DELETE_FAILED error, it means that your S3 bucket is not empty. To fix this, empty the bucket and try to delete the stack again; one way to do this from the AWS CLI is shown after this list.
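
For example, the following AWS CLI command removes all objects from the bucket before you retry the stack deletion. The bucket name is a placeholder; if versioning is enabled on the bucket, you also need to delete the object versions separately.

    aws s3 rm s3://<central-support-case-raw-bucket> --recursive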

Conclusion

Congratulations! You have successfully built an analytics pipeline to push support cases created in individual member accounts into a central account. You have also built an analytics dashboard to gain visibility and insights on all support cases created in various accounts. As you start creating support cases in your member accounts, you will be able to view them in a single pane of glass.

With the steps and resources described in this post, you can build your own analytics dashboard to gain visibility and insights on all support cases created in various accounts within your organization.


About the authors

Sindhura Palakodety is a Solutions Architect at AWS. She is passionate about helping customers build enterprise-scale Well-Architected solutions on the AWS platform and specializes in the data analytics domain.

Shu Sia Lukito is a Partner Solutions Architect at AWS. She is on a mission to help AWS partners build successful AWS practices and help their customers accelerate their journey to the cloud. In her spare time, she enjoys spending time with her family and making spicy food.

AWS Security Profile: Tatyana Yatskevich, Principal Solutions Architect for AWS Identity

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profile-tatyana-yatskevich-principal-solutions-architect-for-aws-identity/

In the AWS Security Profile series, I interview some of the humans who work in AWS Security and help keep our customers safe and secure. In this profile, I interviewed Tatyana Yatskevich, Principal Solutions Architect for AWS Identity.


How long have you been at AWS and what do you do in your current role?

I’ve been at AWS for about five and a half years now. I’ve had several different roles, but I’m currently part of the Identity Solutions team, which is a team of solutions architects who are embedded into the Identity and Control Service. Our team focuses on staying current with customer use cases and emerging problems in the identity space so that we can facilitate the development of new capabilities and prescriptive guidance from AWS.

To keep up with the demand in certain industries, we work with some enterprise customers that operate large cloud environments on AWS. Knowing what these customers need to do to achieve their business outcomes while operating under stringent regulatory compliance requirements helps us provide valuable input into our service and feature development process and support customers in their cloud journey in the most efficient manner.

How did you get started in security?

At the beginning of my career, I mostly just happened to work on security-related projects. I performed security and vulnerability assessments, facilitated remediation work, and managed traditional on-premises security solutions such as web proxies, firewalls, and VPNs. Through these projects, I developed an interest in the security field because of its wide reach and impact, and because it presents a lot of opportunities for growth and problem solving as new challenges arise almost daily. My roles at AWS have been a logical continuation of my security-focused career. Here, I’ve mostly been motivated by empowering security teams to become business enablers, rather than being perceived as blockers to innovation and agility.

How do you explain your job to non-technical friends and family?

I usually give an example of a service or feature that most of us interact with on a regular basis, such as a banking application. I explain that it takes a lot of engineering work to build that application from the ground up and deliver on the user experience and security. That engineering work involves the use of many different technologies that support the user sign-in process, or storage of your personal information like your social security or credit card numbers. My job is to help companies that provide these services implement the proper security controls so that your personal information is used in accordance with local laws and isn’t disclosed for unauthorized use.

In your opinion, what’s one of the coolest things happening in identity right now?

I think it’s the increased role of identity, authentication, and authorization controls in the overall security model of newly built applications. It spans from helping to ensure secure workforce mobility now that providing access to business applications from anywhere is critical to business competitiveness, to keeping Internet of Things (IoT) infrastructure protected and operated in accordance with zero trust. The realization of the power and the increasing usage of identity-specific controls to manage access to digital assets is the coolest trend in identity right now.

What are you currently working on that you’re excited about?

One of the areas that I’m highly invested in is data perimeters. A data perimeter is a set of capabilities that help customers keep their data within their organizational boundary and mitigate the risks of data exfiltration or unintended access to data. We have customers in a wide variety of industries, such as the financial sector, telecom, media and entertainment, and public sector. There are compliance and regulatory requirements that they operate under. A lot of those requirements emphasize controls that guard sensitive data from unauthorized access and prevent movement of that data to places outside of company’s control.

To help customers meet these requirements in a scalable way, we continuously invest in the development of new capabilities. I talk to some of our largest enterprise customers on a regular basis to understand their challenges in this area, and I work with service teams to introduce new capabilities to meet new requirements. I also lead efforts to extend customer-facing guidance and solutions so that customers can design and implement data perimeters on their own. And I present at AWS events to reach more customers, with the most recent being our presentation with Goldman Sachs at re:Invent 2022.

Tell me about that presentation.

I co-presented a chalk talk with Shubham Shukla, Vice President of Cloud Enablement at Goldman Sachs, called Establishing a Data Perimeter on AWS. The session gave an overview of data perimeter capabilities and showcased Goldman Sachs’ experience implementing data perimeter controls at scale in their multi-account AWS environment. What’s cool about that session, I think, is that it’s always good to present about AWS best practices and our view of how certain things should be done, but it’s extra powerful when we include a customer. This is especially true when a large enterprise customer such as Goldman Sachs shares their experience and talks about how they do certain things in practice, like mapping specific requirements to the actual implementation and talking through lessons learned and their perspective on the problem and solution. A lot of our customers are interested in learning from other customers how to build and operate enterprise security controls at scale. We did a similar presentation with Vanguard at re:Inforce 2022, and I look forward to future opportunities to showcase the awesome work being done by our customers.

What is your favorite Amazon Leadership Principle and why?

Customer Obsession. For me, the core of it is building deeper, longer lasting relationships with our customers and taking their learnings back to our business to work backwards from the actual customer needs. Building better products, helping customers meet their business goals, and having wide-reaching impact is what makes me so excited to come to work every day.

What’s the thing you’re most proud of in your career?

As part of my former role as a security consultant in the AWS Professional Services organization, I led security-related projects to either help customers migrate their workloads to AWS or perform security assessments of their existing AWS environment. Part of that role involved developing mechanisms to better engage with customers on security-related topics and help them develop their own security strategy for running workloads on AWS. That work sometimes involved challenging conversations with customers. I would explain the value of the technology that AWS provides and help customers figure out how to implement AWS services to meet both their business and security needs. I took learnings from these conversations and developed some internal assets that helped newer AWS security consultants conduct those conversations more effectively, and I mentored them throughout the process.

If you had to pick an industry outside of security, what would you want to do?

I would be in the travel industry. I absolutely love visiting new places and exploring nature. I love learning the history and culture of different regions, and trying out different cuisines. It’s something that helps me learn more about myself through new experiences and ultimately be a happier person.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for Amazon Security with a passion for creating meaningful content that focuses on the human side of security and encourages a security-first mindset. She previously worked as a reporter and editor, and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and staunchly defending the Oxford comma.

Tatyana Yatskevich

Tatyana is a Principal Solutions Architect in AWS Identity. She works with customers to help them build and operate in AWS in the most secure and efficient manner.

Working with percolators in Amazon OpenSearch Service

Post Syndicated from Arun Lakshmanan original https://aws.amazon.com/blogs/big-data/working-with-percolators-in-amazon-opensearch-service/

Amazon OpenSearch Service is a managed service that makes it easy to secure, deploy, and operate OpenSearch and legacy Elasticsearch clusters at scale in the AWS Cloud. Amazon OpenSearch Service provisions all the resources for your cluster, launches it, and automatically detects and replaces failed nodes, reducing the overhead of self-managed infrastructure. The service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more by offering the latest versions of OpenSearch, support for 19 versions of Elasticsearch (versions 1.5 to 7.10), and visualization capabilities powered by OpenSearch Dashboards and Kibana (versions 1.5 to 7.10). Amazon OpenSearch Service now offers a serverless deployment option (public preview) that makes it even easier to use OpenSearch in the AWS Cloud.

A typical workflow for OpenSearch is to store documents (as JSON data) in an index, and execute searches (also JSON) to find those documents. Percolation reverses that. You store searches and query with documents. Let’s say I’m searching for a house in Chicago that costs < 500K. I could go to the website every day and run my query. A clever website would be able to store my requirements (a query) and notify me when something new (a document) comes up that matches my requirements. Percolation is an OpenSearch feature that enables the website to store these queries and run documents against them to find new matches.

In this post, we will explore how to use percolators to find matching homes from new listings.

Before getting into the details of percolators, let's explore how search works. When you insert a document, OpenSearch maintains an internal data structure called an inverted index, which speeds up searches.
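
To make the idea concrete, here is a minimal, illustrative Python sketch of an inverted index for the "city" field. This is only an analogy; OpenSearch's actual on-disk structures are more sophisticated, and the document IDs refer to the three sample documents inserted later in this post.

# Illustrative sketch only: a simplified inverted index for the "city" field,
# built from the three sample real estate documents used in this post.
inverted_index_city = {
    "Chicago": [1, 3],      # documents 1 and 3 have city == "Chicago"
    "Washington": [2],      # document 2 has city == "Washington"
}

# A term query for city == "Chicago" becomes a direct lookup instead of a scan
# over every document in the index.
matching_doc_ids = inverted_index_city.get("Chicago", [])
print(matching_doc_ids)  # [1, 3]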

Indexing and Searching:

Let’s take the above example of a real estate application having the simple schema of type of the house, city, and the price.

  1. First, let’s create an index with mappings as below
PUT realestate
{
     "mappings": {
        "properties": {
           "house_type": { "type": "keyword"},
           "city": { "type": "keyword" },
           "price": { "type": "long" }
         }
    }
}
  1. Let’s insert some documents into the index.
ID  House_type  City        Price
1   townhouse   Chicago     650000
2   house       Washington  420000
3   condo       Chicago     580000
POST realestate/_bulk 
{ "index" : { "_id": "1" } } 
{ "house_type" : "townhouse", "city" : "Chicago", "price": 650000 }
{ "index" : { "_id": "2" } }
{ "house_type" : "house", "city" : "Washington", "price": 420000 }
{ "index" : { "_id": "3"} }
{ "house_type" : "condo", "city" : "Chicago", "price": 580000 }
  1. As we don’t have any townhouses listed in Chicago for less than 500K, the below query returns no results.
GET realestate/_search
{
  "query": {
    "bool": {
      "filter": [ 
        { "term": { "city": "Chicago" } },
        { "term": { "house_type": "townhouse" } },
        { "range": { "price": { "lte": 500000 } } }
      ]
    }
  }
}

If you’re curious to know how search works under the hood at high level, you can refer to this article.

Percolation:

If one of your customers wants to get notified when a townhouse in Chicago is available, and listed at less than $500,000, you can store their requirements as a query in the percolator index. When a new listing becomes available, you can run that listing against the percolator index with a _percolate query. The query will return all matches (each match is a single set of requirements from one user) for that new listing. You can then notify each user that a new listing is available that fits their requirements. This process is called percolation in OpenSearch.

OpenSearch has a dedicated data type called “percolator” that allows you to store queries.

Let’s create a percolator index with the same mapping, with additional fields for query and optional metadata. Make sure you include all the necessary fields that are part of a stored query. In our case, along with the actual fields and query, we capture the customer_id and priority to send notifications.

PUT realestate-percolator-queries
{
  "mappings": {
    "properties": {
      "user": {
         "properties": {
            "query": { "type": "percolator" },
            "id": { "type": "keyword" },
            "priority":{ "type": "keyword" }
         }
      },
      "house_type": {"type": "keyword"},
      "city": {"type": "keyword"},
      "price": {"type": "long"}
    }
  }
}

After creating the index, insert a query as shown below:

POST realestate-percolator-queries/_doc/chicago-house-alert-500k
{
  "user" : {
     "id": "CUST101",
     "priority": "high",
     "query": {
        "bool": {
           "filter": [ 
                { "term": { "city": "Chicago" } },
                { "term": { "house_type": "townhouse" } },
                { "range": { "price": { "lte": 500000 } } }
            ]
        }
      }
   }
}

Percolation begins when a new document is run against the stored queries. Consider the following two new listings:

{"city": "Chicago", "house_type": "townhouse", "price": 350000}
{"city": "Dallas", "house_type": "house", "price": 500000}

Run the percolate query with these documents, and it matches the stored query:

GET realestate-percolator-queries/_search
{
  "query": {
     "percolate": {
        "field": "user.query",
        "documents": [ 
           {"city": "Chicago", "house_type": "townhouse", "price": 350000 },
           {"city": "Dallas", "house_type": "house", "price": 500000}
        ]
      }
   }
}

The above query returns the stored queries, along with the metadata we captured (the customer ID and priority, in our case), that match the documents:

{
    "took" : 11,
    "timed_out" : false,
    "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
     },
     "hits" : {
        "total" : {
           "value" : 1,
           "relation" : "eq"
         },
         "max_score" : 0.0,
         "hits" : [ 
         {
              "_index" : "realestate-percolator-queries",
              "_id" : "chicago-house-alert-500k",
              "_score" : 0.0,
              "_source" : {
                   "user" : {
                       "id" : "CUST101",
                       "priority" : "high",
                       "query" : {
                            "bool" : {
                                 "filter" : [ 
                                      { "term" : { "city" : "Chicago" } },
                                      { "term" : { "house_type" : "townhouse" } },
                                      { "range" : { "price" : { "lte" : 500000 } } }
                                 ]
                              }
                        }
                  }
            },
            "fields" : {
                "_percolator_document_slot" : [0]
            }
        }
     ]
   }
}
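
As a rough sketch of the notification step described earlier, the following Python snippet runs the percolate query for a single new listing and hands each matching stored query to a hypothetical notify_user helper. It assumes the opensearch-py client; the endpoint, credentials, and notification mechanism are placeholders that you would replace with your own.

from opensearchpy import OpenSearch

# Placeholder endpoint and credentials; replace with your own domain and auth.
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

def notify_user(user_id, priority, listing):
    # Hypothetical helper: plug in Amazon SES, Amazon SNS, or any other channel.
    print(f"Notify {user_id} (priority={priority}) about listing {listing}")

new_listing = {"city": "Chicago", "house_type": "townhouse", "price": 350000}

# The single-document form of the percolate query uses the "document" parameter.
response = client.search(
    index="realestate-percolator-queries",
    body={"query": {"percolate": {"field": "user.query", "document": new_listing}}},
)

for hit in response["hits"]["hits"]:
    user = hit["_source"]["user"]
    notify_user(user["id"], user["priority"], new_listing)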

Percolation at scale

When you have a high volume of queries stored in the percolator index, matching a document against every stored query might be inefficient. Consider segmenting your queries and using those segments as filters to handle high query volumes effectively. Because we already capture priority, you can run the percolation with a filter on priority, which reduces the number of stored queries to evaluate.

GET realestate-percolator-queries/_search
{
    "query": {
        "bool": {
            "must": [ 
             {
                  "percolate": {
                      "field": "user.query",
                      "documents": [ 
                          { "city": "Chicago", "house_type": "townhouse", "price": 35000 },
                          { "city": "Dallas", "house_type": "house", "price": 500000 }
                       ]
                  }
              }
          ],
          "filter": [ 
                  { "term": { "user.priority": "high" } }
            ]
       }
    }
}

Best practices

  1. Keep the percolator index separate from the document index. Index configurations, such as the number of shards on the percolator index, can then be tuned independently for performance.
  2. Use query filters to reduce the number of stored queries that are matched during percolation.
  3. Consider using a buffer in your ingestion pipeline (see the sketch after this list) for the following reasons:
    1. You can batch the ingestion and the percolation independently to suit your workload and SLA.
    2. You can prioritize ingest and search traffic by running the percolation during off hours. Make sure that you have enough storage in the buffering layer.
      Diagram: Percolation in an independent cluster
  4. Consider using an independent cluster for percolation for the following reasons:
    1. Because the percolation process relies on memory and compute, your primary search traffic is not impacted.
    2. You have the flexibility to scale the clusters independently.
      Diagram: Percolation in a single cluster
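
As a minimal sketch of the buffering idea in item 3 above, the following Python snippet drains a batch of new listings from an Amazon SQS queue and percolates them in a single request using the multi-document form of the percolate query. The queue URL and OpenSearch endpoint are placeholders, it assumes boto3 and the opensearch-py client, and error handling is omitted for brevity.

import json

import boto3
from opensearchpy import OpenSearch

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/new-listings"  # placeholder

client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],  # placeholder
    use_ssl=True,
)

# Pull up to 10 buffered listings from the queue.
messages = sqs.receive_message(
    QueueUrl=QUEUE_URL, MaxNumberOfMessages=10, WaitTimeSeconds=5
).get("Messages", [])
listings = [json.loads(m["Body"]) for m in messages]

if listings:
    # One percolate request for the whole batch; _percolator_document_slot in each
    # hit tells you which listing(s) in the batch matched that stored query.
    response = client.search(
        index="realestate-percolator-queries",
        body={"query": {"percolate": {"field": "user.query", "documents": listings}}},
    )
    for hit in response["hits"]["hits"]:
        slots = hit["fields"]["_percolator_document_slot"]
        matched = [listings[slot] for slot in slots]
        print(hit["_source"]["user"]["id"], "matches", matched)

    # Remove the processed messages from the queue.
    for m in messages:
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=m["ReceiptHandle"])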

Conclusion

In this post, we walked through how percolation in OpenSearch works and how to use it effectively at scale. Percolation works in both the managed and serverless versions of OpenSearch. You can follow the best practices to analyze and arrange data in an index, because this is important for snappy search performance.

If you have feedback about this post, submit your comments in the comments section.


About the author

Arun Lakshmanan is a Search Specialist with Amazon OpenSearch Service based out of Chicago, IL. He has over 20 years of experience working with enterprise customers and startups. He loves to travel and spend quality time with his family.

Optimizing Amazon EC2 Spot Instances with Spot Placement Scores

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/optimizing-amazon-ec2-spot-instances-with-spot-placement-scores/

This blog post is written by Steve Cole, Principal Specialist SA, and Robert McCone, Sr. Specialist SA.

Getting the compute resources you need, even vCPUs numbering in the millions, and completing a workload using Amazon EC2 Spot Instances is just a configuration away. In this post you will learn how to use Spot placement scores to reduce interruptions, acquire greater capacity, and identify optimal configurations, times, and locations to run workloads on Spot Instances. Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud and are available at up to a 90% discount compared to On-Demand prices. Spot placement scores are a feature that many customers use to identify optimal instance types or to choose the best Availability Zone (AZ) for ephemeral work like data analytics or high-performance computing. As a real-time tool, Spot placement scores are often integrated into deployment automation. However, because of its logging and graphing capabilities, the Spot placement score tracker, now available through AWS Labs (a GitHub repository hosting tools for customers), can be a valuable resource even before you launch a workload into the cloud. The tracker tackles this undifferentiated heavy lifting for any customer.

About Spot placement score

Spot placement scores are a feature available through AWS APIs, and also in the Amazon EC2 Spot requests console, that uses internal capacity and interruption data to scrutinize the size and shape of a Spot Instance request and respond with a likelihood-of-success rating from 1 (lower likelihood of success) to 10 (higher likelihood of success). The score represents confidence in being able to acquire the desired capacity (size) using the instance configuration (shape) for the next few hours. The shape of the request can be a list of specific instance types or can be requirements-based with attribute-based instance type selection. The size of the request can be an instance count, a number of vCPUs, or GB of RAM. The score is based on known capacity, allocation strategies, and the trending of capacities over time.
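
As a rough illustration of the size and shape described above, the following Python sketch calls the GetSpotPlacementScores API through boto3. The instance types, capacity, and Regions are placeholders; adjust them to match your own workload.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Shape: a diversified list of instance types. Size: the target capacity.
response = ec2.get_spot_placement_scores(
    InstanceTypes=["m5.large", "m5a.large", "m5d.large", "m5n.large"],
    TargetCapacity=100,
    TargetCapacityUnitType="units",   # can also be "vcpu" or "memory-mib"
    SingleAvailabilityZone=True,      # score individual AZs instead of whole Regions
    RegionNames=["us-east-1", "us-west-2"],
)

for score in response["SpotPlacementScores"]:
    # Each entry includes the Region (and the AZ ID when SingleAvailabilityZone=True)
    # plus a score from 1 (lower likelihood) to 10 (higher likelihood).
    print(score.get("Region"), score.get("AvailabilityZoneId"), score["Score"])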

Before the release of Spot placement score, customers could track the trends of their existing workloads and configurations. This might have helped them to anticipate capacity constraints over time, but the ability to do something more meaningful when assessing configurations was something customers requested often. With the launch of Spot placement score, that capability was delivered and enabled customers to receive guidance on how a configuration change might affect the effectiveness of Spot Instances in a workload.

Customers immediately recognized the power of this new feature and started writing tooling around their workloads to incorporate the new functionality provided by Spot placement scores. For example, customers leveraged Spot placement scores to find the highest scoring AZ in a Region for work that requires low latency within a cluster. Customers running data analytics with services like Amazon EMR could more confidently launch clusters on Spot Instances. This reduces costs and the time necessary to process data because of fewer interruptions. Financial services, health care and life sciences, and high tech customers were some of the early adopters of this strategy.

Benefits of Spot placement scores

One specific customer used tools like the Spot Instance advisor and the Spot pricing history to make decisions about which instances to run every night. If the customer's analytics workload received too many interruptions, then it would inevitably be relaunched using On-Demand Instances, increasing costs and time-to-complete. The addition of Spot placement scores to the customer's tooling allowed for more informed decisions about which configurations worked best and, more specifically, which AZ(s) to use. Ultimately, this led not only to higher confidence in using Spot Instances, but also to significant cost savings over time.

Other customers tracked Spot placement scores over time, storing the results of regular queries in time series databases to identify not only the best configuration or location, but also the best time of day or day of week to run their workloads. Different configurations of instance types were queried through automation, and the results were logged into a time series database that could then be presented as graphs. These graphs were scrutinized, configurations were tuned, and ultimately these customers could take greater advantage of the cost optimization that Spot Instances offer, with fewer interruptions, by running their workloads where and when scores were higher.
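
As a small sketch of that tracking pattern (the Spot placement score tracker described below implements a more complete version with Lambda and CloudWatch), you could periodically publish each score as a CloudWatch custom metric and graph it on a dashboard. The namespace and dimension names here are illustrative, not part of any AWS-defined schema.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

scores = ec2.get_spot_placement_scores(
    InstanceTypes=["c5.xlarge", "c5a.xlarge", "c6i.xlarge"],
    TargetCapacity=500,
    TargetCapacityUnitType="vcpu",
    SingleAvailabilityZone=True,
    RegionNames=["us-east-1"],
)["SpotPlacementScores"]

# Publish one data point per scored AZ; run this on a schedule (for example,
# every five minutes) to build the time series behind your graphs.
cloudwatch.put_metric_data(
    Namespace="Custom/SpotPlacementScores",  # illustrative namespace
    MetricData=[
        {
            "MetricName": "PlacementScore",
            "Dimensions": [
                {"Name": "Region", "Value": s.get("Region", "unknown")},
                {"Name": "AvailabilityZoneId", "Value": s.get("AvailabilityZoneId", "n/a")},
            ],
            "Value": s["Score"],
        }
        for s in scores
    ],
)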

AWS was interested in how this solved problems for customers, and after further research with customers and design ideation, this led to the creation of an open source tool that AWS has recently released: the Spot placement score tracker. The Spot placement score tracker helps customers evaluate different configurations against multiple times and locations. It's an AWS-native solution that leverages the Spot placement score API along with AWS Lambda and Amazon CloudWatch to create a dashboard that enables any AWS customer to benefit from this model without having to write it themselves.

How to use the Spot placement score tracker

The project provides Infrastructure as Code (IaC) automation using the AWS Cloud Development Kit (AWS CDK) to deploy the infrastructure and permissions required to run the Lambda function, which runs every five minutes to collect the placement scores for as many diversified configurations as you define.

Architectural diagram: CDK building connections between EventBridge, Lambda, S3, and CloudWatch to generate dashboards

After installing the CloudWatch dashboard, and given some time to collect and record data, you will be provided valuable insights in an intuitive graph such as those in the following example.

Sample CloudWatch dashboard with four graphs showing Spot placement score results over time for different configurations

Insights available through the Spot placement score tracker

The first thing you may notice by observing data over time is that instance diversification is the primary driver of high placement scores. This has always been a best practice for the use of Spot Instances, and it extends to On-Demand Instances as well. In short, if you can only run on one instance type, then the likelihood of experiencing interruptions is far greater than if you can run on six or twelve. Sometimes the simple inclusion of -a, -d, and -n instance types (e.g., m5.large, m5a.large, m5d.large, m5n.large), previous generations (e.g., m5.large, m4.large), different sizes in a container environment (e.g., m5.large, m5.xlarge, m5.2xlarge), and even the inclusion of AWS Graviton will have a material impact on placement scores, which equates to fewer interruptions. This ultimately leads to fewer restarted processes and more efficient use of resources, resulting in reduced costs.

The second insight that you can realize through the use of placement scores over time is identifying the optimal AZ in which an ephemeral process can be placed. Perhaps the best use case for this type of insight is data analytics clusters that are launched to complete many calculations overnight. This is common in financial institutions for various reasons including risk analysis and compliance, but could apply to medical research examining results of experiments during the day as well as other situations where a 24/7 presence isn’t required by the workload. These customers are typically using a single AZ to allow for faster communication between nodes and to reduce data transfer costs. Therefore, the ability for Spot placement scores to provide different scores for different AZs is highly advantageous.

Third, with access to placement scores over time, it becomes possible to identify exactly how large a workload's footprint can be. By submitting identical configurations to Spot placement scores but with different sizes, you can surface the ideal workload size. Not too small, where perhaps the job takes too long to complete, but also not so large that the interruptions are too frequent and cause restarts too often. This can benefit not only ephemeral workloads, but also persistent clusters or fleets, by helping you understand what the lowest score would be over time and giving you solid information regarding what you can expect from Spot Instances and where. This might inform you to be ready to launch On-Demand Instances to compensate when Spot Instance availability is lower. This can also help to forecast pricing and inform decisions about the consideration of AWS Savings Plans or On-Demand Capacity Reservations.

Finally, analyzing Spot placement scores over time can provide regional scoring. Through this lens, it's possible for you to identify entire Regions that you may have overlooked, without the knowledge that Spot Instances outside your primary Region(s) might offer lower interruptions during daylight hours because they are off-peak. When it's possible to place a workload in another Region, unconstrained by local data access requirements, it's quite possible to harness the compute of a significant footprint in locations that are otherwise un(der)-utilized. Workloads that require less data transfer and more compute can benefit tremendously from access to Spot Instances in other Regions. For example, things like build servers might run extraordinarily well in Europe during North American business hours, and the reduction in compute cost might offset the data transfer to complete the job.

Conclusion

Spot placement scores can be used to make decisions about how, when, and where Spot Instances can be most efficiently utilized to deliver business needs, and at greatly reduced prices. We're very excited to release this tool to enable you to tap into information that was previously unavailable and make data-driven decisions for your business. The information in this post, combined with the output of placement scores over time, gives you a significant new capability for planning how and where to run your Spot workloads.

Install the Spot placement score tracker today, configure it to match an existing Spot workload, and see how your workload might perform at different times or in different locations. Explore more robust options, discover greater capacity and lower interruptions, or investigate how On-Demand workloads could migrate to Spot Instances.

Let’s Architect! Getting started with containers

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-getting-started-with-containers/

Most AWS customers building cloud-native applications or modernizing existing applications choose containers to run their microservices applications to accelerate innovation and time to market while lowering their total cost of ownership (TCO). Using containers in AWS comes with other benefits, such as increased portability, scalability, and flexibility.

The combination of container technologies and AWS services also provides features such as load balancing, auto scaling, and service discovery, making it easier to deploy and manage applications at scale.

In this edition of Let’s Architect!, we share useful resources to help you get started with containers on AWS.

Container Build Lens

This whitepaper describes the Container Build Lens for the AWS Well-Architected Framework. It helps customers review and improve their cloud-based architectures and better understand the business impact of their design decisions. The document describes general design principles for containers, as well as specific best practices and implementation guidance using the Six Pillars of the Well-Architected Framework.

Take me to explore the Container Build Lens!

Follow Container Build Lens best practices to architect your container-based workloads.

EKS Workshop

The EKS Workshop is a useful resource to familiarize yourself with Amazon Elastic Kubernetes Service (Amazon EKS) by practicing on real use-cases. It is built to help users learn about Amazon EKS features and integrations with popular open-source projects. The workshop is abstracted into high-level learning modules, including Networking, Security, DevOps Automation, and more. These are further broken down into standalone labs focusing on a particular feature, tool, or use case.

Once you’re done experimenting with EKS Workshop, start building your environments with Amazon EKS Blueprints, a collection of Infrastructure as Code (IaC) modules that helps you configure and deploy consistent, batteries-included Amazon EKS clusters across accounts and regions following AWS best practices. Amazon EKS Blueprints are available in both Terraform and CDK.

Take me to this workshop!

The workshop is abstracted into high-level learning modules, including Networking, Security, DevOps Automation, and more.

Architecting for resiliency on AWS App Runner

Learn how to architect a highly available and resilient application using AWS App Runner. With App Runner, you can start with just the source code of your application or a container image. The complexity of running containerized applications is abstracted away, including the cloud resources needed for running your web application or API. App Runner manages load balancers, TLS certificates, auto scaling, logs, metrics, traceability, and more, so you can focus on implementing your business logic in a highly scalable and elastic environment.

Take me to this blog post!

A high-level architecture for an available and resilient application with AWS App Runner.

Securing Kubernetes: How to address Kubernetes attack vectors

As part of designing any modern system on AWS, it is necessary to think about the security implications and what can affect your security posture. This session introduces the fundamentals of the Kubernetes architecture and common attack vectors. It also includes security controls provided by Amazon EKS and suggestions on how to address them. With these strategies, you can learn how to reduce risk for your Kubernetes-based workloads.

Take me to this video!

Some common attack vectors that need addressing with Kubernetes

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about serverless.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Scaling security and compliance

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/scaling-security-and-compliance/

At Amazon Web Services (AWS), we move fast and continually iterate to meet the evolving needs of our customers. We design services that can help our customers meet even the most stringent security and compliance requirements. Additionally, our service teams work closely with our AWS Security Guardians program to coordinate security efforts and to maintain a high quality bar. We also have internal compliance teams that continually monitor security control requirements from all over the world and engage with external auditors to achieve third-party validation of our services against these requirements.

In this post, I’ll cover some key strategies and best practices that we use to scale security and compliance while maintaining a culture of innovation.

Security as the foundation

At AWS, security is our top priority. Although compliance might be challenging, treating security as an integral part of everything we do at AWS makes it possible for us to adhere to a broad range of compliance programs, to document our compliance, and to successfully demonstrate our compliance status to our auditors and customers.

Over time, as the auditors get deeper into what we’re doing, we can help improve and refine their approach as well. This increases the depth and quality of the reports that we provide directly to our customers.

The challenge of scaling securely

Many customers struggle with balancing security, compliance, and production. These customers have applications that they want to quickly make available to their own customer base. They might need to audit these applications. The traditional process can include writing the application, putting it into production, and then having the audit team take a look to make sure it meets compliance standards. This approach can cause issues, because retroactively adding compliance requirements can result in rework and churn for the development team.

Enforcing compliance requirements in this way doesn’t scale and eventually causes more complexity and friction between teams. So how do you scale quickly and securely?

Speak their language

The first way to earn trust with development teams is to speak their language. It’s critical to use terms and references that developers use, and to know what tools they are using to develop, deploy, and secure code. It’s not efficient or realistic to ask the engineering teams to do the translation of diverse (and often vague) compliance requirements into engineering specs. The compliance teams must do the hard work of translating what is required into what specifically must be done, using language that engineers are familiar with.

Another strategy to scale is to embed compliance requirements into the way developers do their daily work. It’s important that compliance teams enable developers to do their work just as they normally do, without compliance needing to intervene. If you’re successful at that strategy—and the compliant path becomes the simplest and most natural path—then that approach can lead to a very scalable compliance program that fosters understanding between teams and increased collaboration. This approach has helped break down the barriers between the developer and audit/compliance organizations.

Treat auditors and regulators as partners

I believe that you should treat auditors and regulators as true business partners. An independent auditor or regulator understands how a wide range of customers will use the security assurance artifacts that you are producing, and therefore will have valuable insights into how your reports can best be used. I think people can fall into the trap of treating regulators as adversaries. The best approach is to communicate openly with regulators, helping them understand your business and the value you bring to your customers, and getting them ramped up on your technology and processes.

At AWS, we help auditors and regulators get ramped up in various ways. For example, we have the Digital Audit Symposium, which contains presentations on how we control and operate particular services in terms of security and compliance. We also offer the Cloud Audit Academy, a learning path that provides both cloud-agnostic and AWS-specific training to help existing and prospective auditing, risk, and compliance professionals understand how to audit regulated cloud workloads. We’ve learned that being a partner with auditors and regulators is key in scaling compliance.

Conclusion

Having security as a foundation is essential to driving and scaling compliance efforts. Speaking the language of developers helps them continue to work without disruption, and makes the simple path the compliant path. Although some barriers still exist, especially for organizations in highly regulated industries such as financial services and healthcare, treating auditors like partners is a positive strategic shift in perspective. The more proactive you are in helping them accomplish what they need, the faster you will realize the value they bring to your business.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Chad Woolf

Chad joined Amazon in 2010 and built the AWS compliance functions from the ground up, including audit and certifications, privacy, contract compliance, control automation engineering and security process monitoring. Chad’s work also includes enabling public sector and regulated industry adoption of the AWS cloud and leads the AWS trade and product compliance team.

AWS Security Profile – Cryptography Edition: Panos Kampanakis, Principal Security Engineer

Post Syndicated from Roger Park original https://aws.amazon.com/blogs/security/aws-security-profile-panos-kampanakis/

In the AWS Security Profile — Cryptography Edition series, we interview Amazon Web Services (AWS) thought leaders who help keep our customers safe and secure. This interview features Panos Kampanakis, Principal Security Engineer, AWS Cryptography. Panos shares thoughts on data protection, cloud security, post-quantum cryptography, and more.


What do you do in your current role and how long have you been at AWS?

I have been with AWS for two years. I started as a Technical Program Manager in AWS Cryptography, where I led some AWS Cryptography projects related to cryptographic libraries and FIPS, but I’m currently working as a Principal Security Engineer on a team that focuses on applied cryptography, research, and cryptographic software. I also participate in standardization efforts in the security space, especially in cryptographic applications. It’s a very active space that can consume as much time as you have to offer.

How did you get started in the data protection/ cryptography space? What about it piqued your interest?

I always found cybersecurity fascinating. The idea of proactively focusing on security and enabling engineers to protect their assets against malicious activity was exciting. After working in organizations that deal with network security, application security, vulnerability management, and security information sharing, I found myself going back to what I did in graduate school: applied cryptography. 

Cryptography is a constantly evolving, fundamental area of security that requires breadth of technical knowledge and understanding of mathematics. It provides a challenging environment for those that like to constantly learn. Cryptography is so critical to the security and privacy of data and assets that it is top of mind for the private and public sector worldwide.

How do you explain your job to your non-tech friends?

I usually tell them that my work focuses on protecting digital assets, information, and the internet from malicious actors. With cybersecurity incidents constantly in the news, it’s an easy picture to paint. Some of my non-technical friends still joke that I work as a security guard!

What makes cryptography exciting to you?

Cryptography is fundamental to security. It’s critical for the protection of data and many other secure information use cases. It combines deep mathematical topics, data information, practical performance challenges that threaten deployments at scale, compliance with various requirements, and subtle potential security issues. It’s certainly a challenging space that keeps evolving. Post-quantum or privacy preserving cryptography are examples of areas that have gained a lot of attention recently and have been consistently growing.

Given the consistent evolution of security in general, this is an important and impactful space where you can work on challenging topics. Additionally, working in cryptography, you are surrounded by intelligent people who you can learn from.

AWS has invested in the migration to post-quantum cryptography by contributing to post-quantum key agreement and post-quantum signature schemes to protect the confidentiality, integrity, and authenticity of customer data. What should customers do to prepare for post-quantum cryptography?

There are a few things that customers can do while waiting for the ratification of the new quantum-safe algorithms and their deployment. For example, you can inventory the use of asymmetric cryptography in your applications and software. Admittedly, this is not a simple task, but with proper subject matter expertise and instrumentation where necessary, you can identify where you’re using quantum-vulnerable algorithms in order to prioritize the uses. AWS is doing this exercise to have a prioritized plan for the upcoming migration.

You can also study and experiment with the potential impact of these new algorithms in critical use cases. There have been many studies on transport protocols like TLS, virtual private networks (VPNs), Secure Shell (SSH), and QUIC, but organizations might have unique uses that haven’t been accounted for yet. For example, a firm that specializes in document signing might require efficient signature methods with small size constraints, so deploying Dilithium, NIST’s preferred quantum-safe signature, could come at a cost. Evaluating its impact and performance implications would be important. If you write your own crypto software, you can also strive for algorithm agility, which would allow you to swap in new algorithms when they become available. 

More importantly, you should push your vendors, your hardware suppliers, the software and open-source community, and cloud providers to adjust and enable their solutions to become quantum-safe in the near future.

What’s been the most dramatic change you’ve seen in the data protection and post-quantum cryptography landscape?

The transition from typical cryptographic algorithms to ones that can operate on encrypted data is an important shift in the last decade. This is a field that’s still seeing great development. It’s interesting how the power of data has brought forward a whole new area of being able to operate on encrypted information so that we can benefit from the analytics. For more information on the work that AWS is doing in this space, see Cryptographic Computing.

In terms of post-quantum cryptography, it’s exciting to see how an important potential risk brought a community from academia, industry, and research together to collaborate and bring new schemes to life. It’s also interesting how existing cryptography has reached optimal efficiency levels that the new cryptographic primitives sometimes cannot meet, which pushes the industry to reconsider some of our uses. Sometimes the industry might overestimate the potential impact of quantum computing to technology, but I don’t believe we should disregard the effect of heavier algorithms on performance, our carbon footprint, energy consumption, and cost. We ought to aim for efficient solutions that don’t undermine security.

Where do you see post-quantum cryptography heading in the future?

Post-quantum cryptography has received a lot of attention, and a transition is about to start ramping up after we have ratified algorithms. Although it’s sometimes considered a Herculean effort, some use cases can transition smoothly.

AWS and other industry peers and researchers have already evaluated some post-quantum migration strategies. With proper prioritization and focus, we can address some of the most important applications and gradually transition the rest. There might be some applications that will have no clear path to a post-quantum future, but most will. At AWS, we are committed to making the transitions necessary to protect our customer data against future threats.

What are you currently working on that you look forward to sharing with customers?

I’m currently focused on bringing post-quantum algorithms to our customers’ cryptographic use cases. I’m looking into the challenges that this upcoming migration will bring and participating in standards and industry collaborations that will hopefully enable a simpler transition for everyone. 

I also engage on various topics with our cryptographic libraries teams (for example, AWS-LC and s2n-tls). We build these libraries with security and performance in mind, and they are used in software across AWS.

Additionally, I work with some AWS service teams to help enable compliance with various cryptographic requirements and regulations.

Is there something you wish customers would ask you about more often?

I wish customers asked more often about provable security and how to integrate such solutions in their software. This is a fascinating field that can prevent serious issues where cryptography can go wrong. It’s a complicated topic. I would like for customers to become more aware of the importance of provable security especially in open-source software before adopting it in their solutions. Using provably secure software that is designed for performance and compliance with crypto requirements is beneficial to everyone.

I also wish customers asked more about why AWS made certain choices when deploying new mechanisms. In areas of active research, it’s often simpler to experimentally build a proof-of-concept of a new mechanism and test and prove its performance in a controlled benchmark scenario. On the other hand, it’s usually not trivial to deploy new solutions at scale (especially given the size and technological breadth of AWS), to help ensure backwards compatibility, commit to supporting these solutions in the long run, and make sure they’re suitable for various uses. I wish I had more opportunities to go over with customers the effort that goes into vetting and deploying new mechanisms at scale.

You have frequently contributed to cybersecurity publications, what is your favorite recent article and why?

I’m excited about a vision paper that I co-authored with Tancrède Lepoint called Do we need to change some things? Open questions posed by the upcoming post-quantum migration to existing standards and deployments. We are presenting this paper at the Security Standardisation Research Conference 2023. The paper discussed some open questions posed by the upcoming post-quantum transition. It also proposed some standards updates and research topics on cryptographic issues that we haven’t addressed yet.

How about outside of work—any hobbies?

I used to play basketball when I was younger, but I no longer have time. I spend most of my time with my family and little toddlers who have infinite amounts of energy. When I find an opportunity, I like reading books and short stories or watching quality films.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Roger Park

Roger is a Senior Security Content Specialist at AWS Security focusing on data protection. He has worked in cybersecurity for almost ten years as a writer and content producer. In his spare time, he enjoys trying new cuisines, gardening, and collecting records.

Panos Kampanakis

Panos has extensive experience with cyber security, applied cryptography, security automation, and vulnerability management. In his professional career, he has trained and presented on various security topics at technical events for numerous years. He has co-authored cybersecurity publications and participated in various security standards bodies to provide common interoperable protocols and languages for security information sharing, cryptography, and PKI. Currently, he works with engineers and industry standards partners to provide cryptographically secure tools, protocols, and standards.

AWS Security Profile: Ryan Dsouza, Principal Solutions Architect

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profile-ryan-dsouza-principal-solutions-architect/

In the AWS Security Profile series, I interview some of the humans who work in Amazon Web Services Security and help keep our customers safe and secure. This interview is with Ryan Dsouza, Principal Solutions Architect for industrial internet of things (IIoT) security. 


How long have you been at AWS and what do you do in your current role?

I’ve been with AWS for over five years and have held several positions working with customers, AWS Partner Network partners, and standards organizations on IoT and IIoT solutions. Currently, I’m a Principal Solutions Architect for IIoT security. In this role, I’m the global technical leader and subject matter expert for operational technology (OT) and IIoT security, which means that I lead our OT/IIoT strategy and roadmap, translate customer requirements into technical solutions, and work with industry standards such as ISA/IEC 62443 to support IIoT and cloud technologies. I also work with our strategic OT/IIoT security partners to design and build integrations and solutions on AWS. And I work with some of our strategic customers to help them plan, assess, and manage the risk that comes from OT/IT convergence and to design, build, and operate more secure, scalable, and innovative IIoT solutions by using AWS capabilities to deliver measurable business outcomes.

How did you get started in the world of OT and IIoT security?

I’ve been working with OT for more than 25 years and with IIoT, for the last 10 years. I’ve led digital transformation initiatives for numerous world-class organizations including Accenture, Siemens, General Electric, IBM, and AECOM, serving customers on their digital transformation initiatives across a wide range of industry verticals such as manufacturing, buildings, utilities, smart cities, and more.

Throughout my career, I witnessed devices across critical infrastructure sectors, such as water, manufacturing, electricity, transportation, oil and gas, and buildings, getting digitized and connected to the internet. I quickly realized that this trend of connected assets and digitization will continue to grow and could outstrip the supply of cybersecurity professionals. Each customer that embraces the digital world faces cybersecurity challenges. At AWS, I work with customers to understand these challenges and provide prescriptive and practical guidance on how to secure their OT environments and IIoT solutions to help ensure safe and secure digital transformation.

What makes OT security different from information technology (IT) security?

OT and IT security are two distinct areas of security that are designed to protect different types of systems and assets. OT security is concerned with the protection of industrial control systems and other related operational technology, such as supervisory control and data acquisition (SCADA) systems, which are used to control and monitor physical processes in critical infrastructure industries such as manufacturing, energy, transportation, buildings, and utilities. The main focus of OT security is on the availability, integrity, safety, and reliability of these systems, as well as protection of the physical equipment that is being controlled. OT cybersecurity supports the safe operation of critical infrastructure. IT security, on the other hand, is concerned with the protection of computer systems, networks, and data from cyberthreats such as hacking, malware, and phishing attempts. The main focus of IT security is on the confidentiality, integrity, and availability of information and systems.

As a result of OT/IT convergence, IIoT, and the industrial digital transformation, our customers now must secure an increasing attack surface and overlapping IT and OT environments. They realize that it is business critical to secure OT/IIoT systems to avoid security events that could cause unplanned downtime and pose a safety risk. I refer to this as “securing cyber-physical systems and enabling safe and secure industrial digital transformation.”

How do you explain your job to your non-tech friends?

I explain that OT is used in buildings, manufacturing, utilities, transportation, and more, and when these systems connect to the internet, they’re exposed to risks. The risks are the same as those faced by IoT devices in our own homes and workplaces—but with greater consequences if compromised because these systems deal with critical infrastructure that our society relies on. I often share the Colonial Pipeline example and explain that I help AWS customers understand the risks and the consequences from a compromise, and design cybersecurity solutions to protect these critical infrastructure assets.

What are you currently working on that you’re excited about?

Our customers use lots of security tools from lots of different vendors. Security is a team sport, and I’m really excited to be working with customers, APN partners, and AWS service teams to build security features and product integrations that make it simpler for customers to monitor and secure OT, IIoT, and the cloud. For example, I’m working with our APN security partners to build integrations with AWS Security Hub and Amazon Security Lake, bring zero trust security solutions to OT environments, and improve security at the industrial edge.

Another project that I’m super excited about is bringing OT/IIoT security solutions to our critical infrastructure customers, including small and mid-sized organizations, by simplifying the deployment, management, procurement, and payment process so that customers can get more value from these AWS security solutions faster.

Another area of focus for me is tracking the fast-evolving critical infrastructure cybersecurity regulations, how they impact our customers, and the role that AWS can play to make it simpler for customers to align with these new security and compliance requirements.

Just like how the cloud transformed IT, I think the cloud will continue to revolutionize OT, and I’m super excited and energized to work with customers and APN partners to move OT and IIoT applications to the cloud and build nearly anything they can imagine faster and more cost-effectively on AWS.

What are the biggest challenges in securing critical infrastructure systems?

With critical infrastructure, the biggest challenge is legacy OT systems that may not have been designed with cybersecurity in mind and that run older operating systems and software, which can be difficult to upgrade and patch. These systems were designed to operate in an air-gapped environment, but there is a growing trend to connect them in new ways to IT systems. As IT and OT converge to support expanding business needs and OT becomes more cloud connected, air-gapped devices and perimeter security are no longer sufficient to defend against modern threats such as ransomware, data exfiltration, denial of service, and cryptocurrency mining, which makes securing critical infrastructure that uses aging industrial control systems (ICS) and OT technology even harder. We are seeing a trend to keep ICS/OT systems connected, but in smarter and more secure ways by using network segmentation, edge gateways, and the hybrid cloud, so that if a problem occurs, you can still run the most important systems in an isolated and disconnected mode. For example, if your corporate systems are compromised with ransomware, you can disconnect your critical infrastructure systems from the external world and continue the most critical operations. There is a growing need to design innovative and highly distributed solution patterns to keep critical information and hybrid systems safe and secure. This is an area of focus for me at AWS.

What else can enterprises do to manage OT/IT convergence and protect themselves from these security risks?

I’ve given multiple presentations and written blog posts and whitepapers on this topic, and even if the solutions sound simple, they can be challenging to implement in industrial environments. I recommend reading the blog posts Managing Organization Transformation for Successful OT/IT Convergence and Assessing OT and IIoT cybersecurity risk, and implementing the Ten security golden rules for IIoT solutions. AWS offers lots of prescriptive guidance and solutions to help enterprises more safely and securely manage OT/IT convergence and mitigate risk with proper planning and implementation across the various aspects of business—people, processes, and technology. I encourage customers to start by focusing on the security fundamentals of securing identities, assessing their risk from OT/IT convergence, and improving their visibility into devices on the network and across the converged OT and IT environment. I also recommend using standards such as ISA/IEC 62443, which are comprehensive, consensus-based, and form a strong basis for securing critical infrastructure systems.

What skills do professionals need to be successful in critical infrastructure security?

Critical infrastructure security sounds harder than it really is. When I train people, I break it down into bite-sized pieces that are simple to understand and implement. There is some mystery around cybersecurity, but it’s just a lot of small parts. You must learn what all the parts are, what the acronyms are, and how they fit together to form cyber-physical systems. When I describe it in a real-world application, most people pick it up quickly.

Curiosity and a desire to continue learning are important characteristics to have, because cybersecurity is a fast-evolving technology field. Empathy is also important because to secure a system, you must have empathy for the people behind the work and why their goals and needs are important. For example, in the OT world, you have operations folks who just want the thing to work. If an alarm is going off on their computer screen and they must react by clicking a button, they don’t want their screen to lock them out so they can’t click that button, because this could cause the plant to have big problems. So, you need to design a solution that matches user access controls with roles and responsibilities so that a plant operator can take corrective actions in an emergency situation.

Another example is patching critical OT systems that have vulnerabilities. This may not be possible due to the risk of causing unplanned downtime, and it could pose a safety risk or result in additional time and cost for recertification due to compliance requirements. You must have empathy for the people in this situation and their needs, and then, as a security professional, design around that so they can still have those things but in a more secure way. For example, you might need to create mechanisms to identify, network isolate, or replace legacy devices that aren’t capable of receiving updates. If you are detail-oriented and have strong curiosity and empathy, you can succeed in the field of critical infrastructure cybersecurity.

What’s your favorite Amazon Leadership Principle, and why?

I have two favorite leadership principles: Learn and be Curious; and one that I initially discounted, Frugality. I believe that the best way to predict the future is to invent it, which is why I’m never done learning and seeking new ways to solve problems.

My view on the Frugality leadership principle is that we need to be frugal with each other’s time. There are so many competing demands on everyone’s time, and it’s important in a place like AWS to be mindful of that. Make sure you’ve done your due diligence on something before you broadly ask the question or escalate. Being frugal in my view is about being self-sufficient, learning to use self-service tools, and working with limited time or resources to deliver results.

I wake up every morning with the conviction that the world is always changing, and that, to succeed, I have to change faster by learning new skills and being frugal with time and resources.

What’s the thing you’re most proud of in your career?

I’m really proud of working with critical infrastructure customers across a diverse range of industries over the last 25 years and supporting their digital transformation initiatives. In the early part of my career, I was a design and commissioning engineer of industrial automation systems. In this role, I had the opportunity to design and commission new industrial plants and get them into operation, which was extremely fulfilling. I feel fortunate to have joined a company like AWS that takes cybersecurity seriously in developing its products and cloud services, and I’m proud to bring real-world experience in the design and security of cyber-physical systems to our critical infrastructure customers.

If you had to pick an industry outside of engineering, what would you want to do?

Growing up in India in a family of engineers and doctors, there were only two options: engineer or doctor. Both professions have the ability to change the world. Because my mother and brother worked at Siemens, I pursued a career in engineering. If I had to pick an industry outside of engineering, it would have been in the medical field.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for Amazon Security with a passion for creating meaningful content that focuses on the human side of security and encourages a security-first mindset. She previously worked as a reporter and editor, and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and staunchly defending the Oxford comma.

Ryan Dsouza

Ryan is a Principal Industrial IoT (IIoT) Security Solutions Architect at AWS. Based in New York City, Ryan helps customers design, develop, and operate more secure, scalable, and innovative IIoT solutions using the breadth and depth of AWS capabilities to deliver measurable business outcomes. Ryan has over 25 years of experience in digital platforms, smart manufacturing, energy management, building and industrial automation, and OT/IIoT security across a diverse range of industries. Before AWS, Ryan worked for Accenture, Siemens, General Electric, IBM, and AECOM, serving customers on their digital transformation initiatives. Ryan is passionate about bringing security to all connected devices and being a champion of building a better, safer, and more resilient world for everyone.

AWS Security Profile: Matt Luttrell, Principal Solutions Architect for AWS Identity

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profile-matt-luttrell-principal-solutions-architect-for-aws-identity/

In the AWS Security Profile series, I interview some of the humans who work in Amazon Web Services Security and help keep our customers safe and secure. In this profile, I interviewed Matt Luttrell, Principal Solutions Architect for AWS Identity.


How long have you been at AWS and what do you do in your current role?

I’ve been at AWS around five years and have worked in a variety of roles from Professional Services consulting as an application architect to a solutions architect. In my current role, I work on the Identity Solutions team, which is a group of solutions architects who are embedded directly in the Identity and Control Services team. We have both internal-facing and external-facing functions. Internally, we work with product managers, drive concepts like data perimeters, and generally act as the voice of the customer to our product teams. Externally, we have conversations with customers, present at events, and so on.

How did you get started in security?

My background is in software development. I’ve always had a side interest in security and have always worked for very security-conscious companies. Early in my career, I became CISSP certified and that’s what got me kickstarted in security-specific domains and conversations. At AWS, being involved in security isn’t an optional thing. So, even before I joined the Identity Solutions team, I spent a lot of time working on identity and AWS Identity and Access Management (IAM) in particular, as well as AWS IAM Access Analyzer, while working with security-conscious customers in the financial services industry. As I got involved in that, I was able to dive deep in the security elements of AWS, but I’ve always had a background in security.

How do you explain your job to non-technical friends and family?

I typically tell them that I work in the cloud computing division at Amazon and that my job title is Solutions Architect. Naturally, the next question is, “what does a solutions architect do? I’ve never heard of that.” I explain that I work with customers to figure out how to put the building blocks together that we offer them. We offer a bunch of different services and features, and my job is to teach customers how they all work and interact with each other.

What are you currently working on that you’re excited about?

One of the things our team is working on is data perimeters. Our customers will see continued guidance on data perimeters. We’ve done a lot of work in this space—workshops and presentations at some of our big conferences, as well as blog posts and example repositories.

I’m also putting together some videos that go in depth on IAM policy evaluation and offer prescriptive guidance on writing IAM policies.

In your opinion, what’s one of the coolest things happening in identity right now?

I might be biased here, but I think there’s been a shift in the security industry at large from network-based perimeters in the traditional on-premises world to identity-based perimeters in the cloud. This is where the concept of data perimeters comes into play. Because your resources and identities are distributed, you can no longer look at your server and touch your server that’s sitting right next to you. This really puts an extra emphasis on your authentication and authorization controls, as well as the need for visibility into those controls. I think there’s a lot of innovation happening in the identity world because of this increased focus on identity perimeters. You’re hearing about concepts in this area like zero trust, data perimeters, and general identity awareness in all levels of the application and infrastructure stacks. You have services like IAM Access Analyzer to help give you that visibility into your AWS environment and what your identities are doing in terms of who can access what. I think we’ll continue to see growth in these areas because workloads are not becoming less distributed over time.

Tell me about something fun that you’ve done recently at AWS.

Roberto Migli and I presented a 400-level workshop at re:Invent 2022 on IAM policy evaluation, AWS Identity and Access Management (IAM) policy evaluation in action. This workshop introduced a new mental model for thinking about policy evaluation and walked attendees through a number of different policy evaluation scenarios. The idea behind the workshop is that we introduce a scenario and have the attendee try to figure out what the result of the evaluation would be. It spends some extra time comparing how the evaluation of resource-based policies differs from that of identity-based policies. I hope attendees walked away with a better understanding of how policy evaluations work at a deeper level and how they can write better, more secure IAM policies. We presented practical advice on how to structure different types of IAM policies and the different tradeoffs when writing a policy one way compared to another. I hope the mental model we introduced helps customers better reason about how policies will evaluate when they write them in their environment.

What is your favorite Amazon Leadership Principle and why?

This is an easy one. For me, it’s definitely Learn and Be Curious. Something I try to do is put myself in uncomfortable situations because I feel that when I’m uncomfortable, I’m learning and growing because it means I don’t know something. I find comfortable situations boring at times, so I’m always trying to dig in and learn how things work. This can sometimes be distracting, too, because there’s so much to learn and understand in the identity world.

What’s the thing you’re most proud of in your career?

There’s no particular project that I can point to and say, “this is what I’m most proud of.” I’m proud to be a part of the team I’m on now. For my team, Customer Obsession is more than just a slogan. We really advocate on behalf of the customer, listen to the voice of the customer, and push back on features that might not be the best thing for the customer. I think it’s awesome that I get to work for a company that really does advocate on behalf of the customer, and that my voice is heard when I’m trying to be that advocate. That aspect of working at AWS and with my team is what I’m most proud of.

I’m also proud of the mentoring and teaching that I get to do within AWS and within my role specifically. It’s really fulfilling to watch somebody grow and realize that career growth is not a zero-sum game—just because someone else succeeds does not mean that I have to fail.

If you had to pick an industry outside of security, what would you want to do?

I’d probably choose to be a ski instructor. I’m a big fan of skiing, but I don’t get to ski very often because of where I live. I love being out on the mountains, skiing, and teaching. I’m looking for any excuse to spend my days in the mountains.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for Amazon Security with a passion for creating meaningful content that focuses on the human side of security and encourages a security-first mindset. She previously worked as a reporter and editor, and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and staunchly defending the Oxford comma.

Author

Matt Luttrell

Matt is a Principal Solutions Architect on the AWS Identity Solutions team. When he’s not spending time chasing his kids around, he enjoys skiing, cycling, and the occasional video game.

Let’s Architect! Monitoring production systems at scale

Post Syndicated from Vittorio Denti original https://aws.amazon.com/blogs/architecture/lets-architect-monitoring-production-systems-at-scale/

“Everything fails, all the time” is a famous quote from Amazon’s Chief Technology Officer Werner Vogels. This means that software and distributed systems may eventually fail because something can always go wrong. We have to accept this and design our systems accordingly, test our software and services, and think about all the possible edge cases.

With this in mind, we should also set our teams up for success by providing visibility in every environment for a quick turnaround when incidents happen. When a system serves traffic in production, we need to monitor it to make sure it behaves as expected and that all components are healthy. But questions arise such as:

  • How do we monitor a system?
  • What is monitoring?
  • What are some architectural and engineering approaches to implement in order to design a successful monitoring strategy?

All of these questions require complex answers. It’s not possible to cover everything in a blog post, but let’s start exploring the topic and sharing resources to guide you through this domain.

In this edition of Let’s Architect! we share some practices for monitoring used at Amazon and AWS, as well as more resources to discover how to build monitoring solutions for the workloads running on AWS.

Observability best practices at Amazon

Observability and monitoring are engineering tasks that also require putting a suitable cultural mindset in place. At Amazon, if a service doesn’t run as expected, the team writes a CoE (Correction of Errors) document to analyze the issue and answer critical questions to learn from it. There are also weekly operations meetings to analyze operational and performance dashboards for each service.

The session introduced here covers the full range of monitoring at Amazon, from how teams assess system health at a high level to how they understand the details of a single request. Use this resource to learn some best practices for metrics, logs, and tracing, and how to use these signals to achieve operational excellence.

Take me to this re:Invent video!

Observability is an iterative process which requires us to establish a feedback loop and improve based on the signals coming from the system.

Build an observability solution using managed AWS services and the OpenTelemetry standard

Visibility of what’s happening in a distributed system is key to operationalizing workloads at scale. OpenTelemetry is an open standard for observability, and AWS services are fully integrated with it. The blog post introduced in this section shows you how AWS Distro for OpenTelemetry (ADOT) works under the hood and how to use it with a Kubernetes cluster. But keep in mind, this is just one of the many implementations available for AWS compute services and OpenTelemetry—so even if you’re not using Kubernetes right now, we’ve still got you covered!
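
To get a feel for what this looks like in code, here is a minimal, illustrative Python sketch that instruments an application with the OpenTelemetry SDK and exports traces to an OTLP endpoint such as a locally running ADOT Collector; the endpoint, service name, and span attributes are assumptions, not part of the referenced blog post.

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Point the exporter at an OTLP endpoint, for example an ADOT Collector
# sidecar or agent (the endpoint below is illustrative).
provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)

# Each unit of work becomes a span; the collector forwards it to your tracing backend.
with tracer.start_as_current_span("process-order") as span:
    span.set_attribute("order.id", "12345")
```

The same pattern applies whether the collector runs as a Kubernetes sidecar, a DaemonSet, or an agent on a virtual machine; only the exporter endpoint changes.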

Want more? Watch this re:Invent video for an understanding of how to think about logging, tracing, metrics, and monitoring with AWS services, and the options for providing the observability your distributed systems need. This is a great learning resource with many demos and examples.

Take me to this blog post!

Flow of metrics and traces from Application services to the Observability Platform.

Optimizing your AWS Batch architecture for scale with observability dashboards

We’ve explored the mental models and strategies for monitoring in previous resources. Now let’s see how these principles can be applied in a scenario where we run batch and ML computing jobs at scale. In the blog post introduced in this section, you can learn how to use runtime metrics to understand an architecture designed on AWS Batch for running batch computing jobs. AWS Batch is a fully managed service enabling you to run jobs at any scale without needing to manage underlying compute resources. This blog explains how AWS Batch works and guides you through the process used to design a monitoring framework.

Since the solution is open-source, you are free to add other custom metrics you find useful. To get started with the AWS Batch open-source observability solution, visit the project page on GitHub. Several customers have used this monitoring tool to optimize their workload for scale by reshaping their jobs, refining their instance selection, and tuning their AWS Batch architecture.

Take me to this blog!

High-level structure of AWS Batch resources and interactions. This diagram depicts a user submitting jobs based on a job definition template to a job queue, which then communicates to a compute environment that resources are needed.

Observability workshop

This resource gives you hands-on experience with the variety of tools AWS offers to set up monitoring and observability for your applications. Whether your workload is on-premises or on AWS—and whether your application is a giant monolith or built on a modern microservices architecture—the observability tools can provide deeper insights into application performance and health.

The monitoring tools covered in this workshop provide powerful capabilities that enable you to identify bottlenecks, issues, and defects without having to manually sift through various logs, metrics, and trace data.

Take me to this workshop!

The diagram illustrates the various components of the PetAdoptions architecture. In the workshop you will learn how to monitor this application.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about containers on AWS.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Let’s Architect! Streamlining business with migration and modernization

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-streamlining-business-with-migration-and-modernization/

Many customers migrate their systems to Amazon Web Services (AWS) to increase their competitive edge and drive business value. To maximize the benefits of a cloud migration, companies tend to move their applications in conjunction with modernization initiatives. These joint efforts help your applications gain more agility, scalability, and resilience. Modernizing the portfolio of workloads with AWS means that you can re-platform, refactor, or replace these workloads by using containers, serverless technologies, purpose-built data stores, and software automation. These capabilities allow you to benefit from AWS agility and optimize your total cost of ownership (TCO).

In this edition of Let’s Architect! we share hands-on activities, customer stories, and tips and tricks to migrate and modernize your applications with AWS.

Migrating to the cloud: What is the cost of doing nothing?

Would you think that small companies always migrate faster than large enterprises? Actually, cloud migration speed doesn’t necessarily depend on the size of the business! Company size is not a clear indicator of migration and modernization success, but a shift of culture and mindset is essential for successful company evolution.

When it comes to migration, the cost of doing nothing is not just financial: Businesses can also expect a slower pace of innovation and a higher security burden. This video analyzes the financial benefits of migration and shares mental models for approaching an AWS cloud migration, and Marriott team members explain how they planned their migration and the lessons learned along the way.

Take me to this re:Invent 2022 video!

Benefits of an early migration start

Modernization pathways for a legacy .NET Framework monolithic application on AWS

Organizations aim to deliver the best technological solutions based on customer needs. At any stage in their cloud adoption journey, businesses often end up managing and building monolithic applications. Let’s explore a migration path for a monolithic .NET Framework application to a modern microservices-based stack on AWS, and discuss AWS tools to break the monolith into microservices and containerize applications.

Cost optimization is another key factor for modernizing your workloads; solutions include moving to Linux-based systems or using open-source database engines. This Migrate and Modernize enterprise workloads with AWS video walks you through the process of migrating and modernizing enterprise workloads with AWS.

Take me to this blog post with more detail!

A modernized microservices-based rearchitecture

Implementing a serverless-first strategy in an enterprise

Organizations of all sizes want to benefit from the agility, cost savings, and developer experience that serverless architectures can provide on AWS. For large enterprises, the return on investment (ROI) can be massive, but overcoming architecture inertia while ensuring security best practices and governance stay in place is a hurdle that many struggle with. In this lightning talk, learn how your organization can implement a serverless-first strategy to overcome these obstacles. Delta Air Lines shares the story of making serverless-first a reality as part of their AWS journey.

Take me to this video

Benefits of serverless

Application Migration with AWS

This workshop shows you how to migrate and modernize a fictional application to the AWS Cloud by:

  1. Performing a database migration
  2. Migrating and modernizing your web server using different migration strategies (for example, breaking down the monolith into containers)
  3. Improving the Operational Excellence, Security, Performance Efficiency, and Cost Optimization of the deployed architecture by following these pillars of the AWS Well-Architected Framework.

Take me to this workshop!

Different migration strategies for web servers

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about distributed systems with containers.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

AWS Week in Review – March 20, 2023

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-20-2023/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

A new week starts, and Spring is almost here! If you’re curious about AWS news from the previous seven days, I got you covered.

Last Week’s Launches
Here are the launches that got my attention last week:

Amazon S3 – Last week there was AWS Pi Day 2023, celebrating 17 years of innovation since Amazon S3 was introduced on March 14, 2006. For the occasion, the team released many new capabilities.

Amazon Linux 2023 – Our new Linux-based operating system is now generally available. Sébastien’s post is full of tips and info.

Application Auto Scaling – Now supports arithmetic operations and mathematical functions to customize the metrics used with Target Tracking policies. You can use it to scale based on your own application-specific metrics. Read how it works with Amazon ECS services.

AWS Data Exchange for Amazon S3 is now generally available – You can now share and find data files directly from S3 buckets, without the need to create or manage copies of the data.

Amazon Neptune – Now offers a graph summary API to help understand important metadata about property graphs (PG) and resource description framework (RDF) graphs. Neptune added support for Slow Query Logs to help identify queries that need performance tuning.

Amazon OpenSearch Service – The team introduced security analytics that provides new threat monitoring, detection, and alerting features. The service now supports OpenSearch version 2.5 that adds several new features such as support for Point in Time Search and improvements to observability and geospatial functionality.

AWS Lake Formation and Apache Hive on Amazon EMR – Introduced fine-grained access controls that allow data administrators to define and enforce table- and column-level security for customers accessing data via Apache Hive running on Amazon EMR.

Amazon EC2 M1 Mac Instances – You can now update guest environments to a specific or the latest macOS version without having to tear down and recreate the existing macOS environments.

AWS Chatbot – Now integrates with Microsoft Teams to simplify the way you troubleshoot and operate your AWS resources.

Amazon GuardDuty RDS Protection for Amazon Aurora – Now generally available to help profile and monitor access activity to Aurora databases in your AWS account without impacting database performance.

AWS Database Migration Service – Now supports validation to ensure that data is migrated accurately to S3 and can now generate an AWS Glue Data Catalog when migrating to S3.

AWS Backup – You can now back up and restore virtual machines running on VMware vSphere 8 and with multiple vNICs.

Amazon Kendra – There are new connectors to index documents and search for information across these new content sources: Confluence Server, Confluence Cloud, Microsoft SharePoint OnPrem, Microsoft SharePoint Cloud. This post shows how to use the Amazon Kendra connector for Microsoft Teams.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A few more blog posts you might have missed:

Women founders Q&A – We’re talking to six women founders and leaders about how they’re making impacts in their communities, industries, and beyond.

What you missed at the 2023 IMAGINE: Nonprofit conference – Where hundreds of nonprofit leaders, technologists, and innovators gathered to learn and share how AWS can drive a positive impact for people and the planet.

Monitoring load balancers using Amazon CloudWatch anomaly detection alarms – The metrics emitted by load balancers provide crucial and unique insight into service health, service performance, and end-to-end network performance.

Extend geospatial queries in Amazon Athena with user-defined functions (UDFs) and AWS Lambda – Using a solution based on Uber’s Hexagonal Hierarchical Spatial Index (H3) to divide the globe into equally-sized hexagons.

How cities can use transport data to reduce pollution and increase safety – A guest post by Rikesh Shah, outgoing head of open innovation at Transport for London.

For AWS open-source news and updates, here’s the latest newsletter curated by Ricardo to bring you the most recent updates on open-source projects, posts, events, and more.

Upcoming AWS Events
Here are some opportunities to meet:

AWS Public Sector Day 2023 (March 21, London, UK) – An event dedicated to helping public sector organizations use technology to achieve more with less through the current challenging conditions.

Women in Tech at Skills Center Arlington (March 23, VA, USA) – Let’s celebrate the history and legacy of women in tech.

The AWS Summits season is warming up! You can sign up here to know when registration opens in your area.

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

How to choose the right Amazon MSK cluster type for you

Post Syndicated from Ali Alemi original https://aws.amazon.com/blogs/big-data/how-to-choose-the-right-amazon-msk-cluster-type-for-you/

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is an AWS streaming data service that manages Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers to run Apache Kafka applications and Kafka Connect connectors on AWS, without the need to become experts in operating Apache Kafka. Amazon MSK operates, maintains, and scales Apache Kafka clusters, provides enterprise-grade security features out of the box, and has built-in AWS integrations that accelerate development of streaming data applications. You can easily get started by creating an MSK cluster using the AWS Management Console with a few clicks.

When creating a cluster, you must choose a cluster type from two options: provisioned or serverless. Choosing the best cluster type for each workload depends on the type of workload and your DevOps preferences. Amazon MSK provisioned clusters offer more flexibility in how you scale, configure, and optimize your cluster. Amazon MSK Serverless, on the other hand, makes scaling, load management, and operation of the cluster easier for you. With MSK Serverless, you can run your applications without having to configure, manage the infrastructure, or optimize clusters, and you pay for the data volume you stream and retain. MSK Serverless fully manages partitions, including monitoring as well as ensuring an even balance of partition distribution across brokers in the cluster (auto-balancing).

In this post, I examine a use case with the fictitious company AnyCompany, who plans to use Amazon MSK for two applications. They must decide between provisioned or serverless cluster types. I describe a process by which they work backward from the applications’ requirements to find the best MSK cluster type for their workloads, including how the organizational structure and application requirements are relevant in finding the best offering. Lastly, I examine the requirements and their relationship to Amazon MSK features.

Use case

AnyCompany is an enterprise organization that is ready to move two of their Kafka applications to Amazon MSK.

The first is a large ecommerce platform, which is a legacy application that currently uses a self-managed Apache Kafka cluster run in their data centers. AnyCompany wants to migrate this application to the AWS Cloud and use Amazon MSK to reduce maintenance and operations overhead. AnyCompany has a DataOps team that has been operating self-managed Kafka clusters in their data centers for years. AnyCompany wants to continue using the DataOps team to manage the MSK cluster on behalf of the development team. There is very little flexibility for code changes. For example, a few modules of the application require plaintext communication and access to the Apache ZooKeeper cluster that comes with an MSK cluster. The ingress throughput for this application doesn’t fluctuate often. The ecommerce platform only experiences a surge in user activity during special sales events. The DataOps team has a good understanding of this application’s traffic pattern, and are confident that they can optimize an MSK cluster by setting some custom broker-level configurations.

The second application is a new cloud-native gaming application currently in development. AnyCompany hopes to launch this gaming application soon followed by a marketing campaign. Throughput needs for this application are unknown. The application is expected to receive high traffic initially, then user activity should decline gradually. Because the application is going to launch first in the US, traffic during the day is expected to be higher than at night. This application offers a lot of flexibility in terms of Kafka client version, encryption in transit, and authentication. Because this is a cloud-native application, AnyCompany hopes they can delegate full ownership of its infrastructure to the development team.

Solution overview

Let’s examine a process that helps AnyCompany decide between the two Amazon MSK offerings. The following diagram shows this process at a high level.

In the following sections, I explain each step in detail and the relevant information that AnyCompany needs to collect before they make a decision.

Competency in Apache Kafka

AWS recommends a list of best practices to follow when using the Amazon MSK provisioned offering. Amazon MSK provisioned offers more flexibility, so you can make scaling decisions based on what’s best for your workloads. For example, you can save on cost by consolidating a group of workloads into a single cluster. You can decide which metrics are important to monitor and optimize your cluster through applying custom configurations to your brokers. You can choose your Apache Kafka version, among different supported versions, and decide when to upgrade to a new version. Amazon MSK takes care of applying your configuration and upgrading each broker in a rolling fashion.

With more flexibility, you have more responsibilities. You need to make sure your cluster is right-sized at all times. You can achieve this by monitoring a set of cluster-level, broker-level, and topic-level metrics to ensure you have enough resources for your throughput. You also need to make sure the number of partitions assigned to each broker doesn’t exceed the numbers suggested by Amazon MSK. If partitions are unbalanced, you need to redistribute them evenly across all brokers. If you have more partitions than recommended, you need to either upgrade brokers to a larger size or increase the number of brokers in your cluster. There are also best practices for the number of TCP connections when using AWS Identity and Access Management (IAM) authentication.

An MSK Serverless cluster takes away the complexity of right-sizing clusters and balancing partitions across brokers. This makes it easy for developers to focus on writing application code.

AnyCompany has an experienced DataOps team who are familiar with scaling operations and best practices for the MSK provisioned cluster type. AnyCompany can use their DataOps team’s Kafka expertise for building automations and easy-to-follow standard procedures on behalf of the ecommerce application team. The gaming development team is an exception, because they are expected to take the full ownership of the infrastructure.

In the following sections, I discuss other steps in the process before deciding which cluster type is right for each application.

Custom configuration

In certain use cases, you need to configure your MSK cluster differently from its default settings. This could be due to your application requirements. For example, AnyCompany’s ecommerce platform requires setting up brokers such that the default retention period for all topics is 72 hours. Also, topics should be auto-created when they are requested and don’t yet exist.

The Amazon MSK provisioned offering provides a default configuration for brokers, topics, and Apache ZooKeeper nodes. It also allows you to create custom configurations and use them to create new MSK clusters or update existing clusters. An MSK cluster configuration consists of a set of properties and their corresponding values.
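
As a rough illustration, here is a minimal sketch of creating such a custom configuration with the AWS SDK for Python (Boto3). The configuration name and Kafka version are illustrative; the two properties mirror the 72-hour retention and topic auto-creation requirements described above.

```python
import boto3

kafka = boto3.client("kafka")

# Broker-level properties: 72-hour default retention and topic auto-creation.
server_properties = b"""
auto.create.topics.enable=true
log.retention.hours=72
"""

response = kafka.create_configuration(
    Name="ecommerce-platform-config",            # hypothetical name
    Description="Custom broker configuration for the ecommerce platform",
    KafkaVersions=["3.4.0"],                     # should match the cluster's Kafka version
    ServerProperties=server_properties,
)
print(response["Arn"])
```

The returned configuration ARN can then be referenced when creating or updating a provisioned cluster.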

MSK Serverless doesn’t allow applying broker-level configuration. This is because AWS takes care of configuring and managing the backend nodes, which takes away the heavy lifting of broker configuration. You only need to manage your applications’ topics. To learn more, refer to the list of topic-level configurations that MSK Serverless allows you to change.

Unlike the ecommerce platform, AnyCompany’s gaming application doesn’t need broker-level custom configuration. The developers only want to set retention.ms and max.message.bytes for each topic.
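
As a sketch of what that could look like with the open-source kafka-python admin client, the snippet below creates a topic with only topic-level overrides. The bootstrap endpoint, topic name, and values are illustrative, and the authentication setup (for example, SASL/IAM for MSK Serverless) is omitted for brevity.

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Illustrative bootstrap endpoint; configure authentication as required by your cluster.
admin = KafkaAdminClient(bootstrap_servers="b-1.example.kafka.us-east-1.amazonaws.com:9092")

topic = NewTopic(
    name="player-events",                          # hypothetical topic name
    num_partitions=5,
    replication_factor=3,
    topic_configs={
        "retention.ms": str(168 * 60 * 60 * 1000),   # 168 hours, per the gaming requirements
        "max.message.bytes": str(1024 * 1024),       # 1 MB
    },
)
admin.create_topics([topic])
```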

Application requirements

Apache Kafka applications differ in terms of their security; the way they connect, write, or read data; data retention period; and scaling patterns. For example, some applications can only scale vertically, whereas other applications can scale only horizontally. Although a flexible application can work with encryption in transit, a legacy application may only be able to communicate in plaintext format.

Cluster-level quotas

Amazon MSK enforces some quotas to ensure the performance, reliability, and availability of the service for all customers. These quotas are subject to change at any time. To access the latest values for each dimension, refer to Amazon MSK quota. Note that some of the quotas are soft limits and can be increased using a support ticket.

When choosing a cluster type in Amazon MSK, it’s important to understand your application requirements and compare them against the quotas for each offering. This helps you choose the cluster type that best meets your goals and your application’s needs. Let’s examine how you can calculate the throughput you need and the other important dimensions you need to compare with Amazon MSK quotas (a short calculation sketch follows the list):

  • Number of clusters per account – Amazon MSK may have quotas for how many clusters you can create in a single AWS account. If this is limiting your ability to create more clusters, you can consider creating those in multiple AWS accounts and using secure connectivity patterns to provide access to your applications.
  • Message size – You need to make sure the maximum message size that your producer writes for a single message is lower than the configured size in the MSK cluster. MSK provisioned clusters allow you to change the default value in a custom configuration. If you choose MSK Serverless, check this value in Amazon MSK quota. The average message size is helpful when calculating the total ingress or egress throughput of the cluster, which I demonstrate later in this post.
  • Message rate per second – This directly influences the total ingress and egress throughput of the cluster. Total ingress throughput equals the message rate per second multiplied by the average message size. You need to make sure your producer is configured for optimal throughput by adjusting the batch.size and linger.ms properties. If you’re choosing MSK Serverless, you need to make sure you configure your producer to send optimal batches at a rate that is lower than its request rate quota.
  • Number of consumer groups – This directly influences the total egress throughput of the cluster. Total egress throughput equals the ingress throughput multiplied by the number of consumer groups. If you’re choosing MSK Serverless, you need to make sure your application can work with these quotas.
  • Maximum number of partitions – Amazon MSK provisioned recommends not exceeding certain limits per broker (depending on the broker size). If the number of partitions per broker exceeds the recommended maximum for its size, you can’t perform certain upgrade or update operations. MSK Serverless also has a quota for the maximum number of partitions per cluster. You can request to increase the quota by creating a support case.
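
Here is a minimal back-of-the-envelope sketch of these calculations in Python. The inputs are illustrative assumptions (they echo the worked example later in this post), not service quotas.

```python
# Sizing estimates using the formulas above; decimal units (1 KB = 1,000 bytes).
message_rate_per_second = 150_000
average_message_size_kb = 30
number_of_consumer_groups = 2
retention_hours = 72

ingress_gbps = message_rate_per_second * average_message_size_kb / 1_000_000   # ~4.5 GBps
egress_gbps = ingress_gbps * number_of_consumer_groups                          # ~9 GBps
# GBps * seconds = GB; divide by 1,000 for TB. A binary (1024-based) conversion
# lands closer to the 1,139 TB figure used later in this post.
storage_tb = ingress_gbps * retention_hours * 3600 / 1_000

print(f"Ingress: {ingress_gbps:.1f} GBps, Egress: {egress_gbps:.1f} GBps, Storage: {storage_tb:,.0f} TB")
```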

Partition-level quotas

Apache Kafka organizes data in structures called topics. Each topic consists of a single partition or many partitions. Partitions are the degree of parallelism in Apache Kafka. The data is distributed across brokers using data partitioning. Let’s examine a few important Amazon MSK requirements and how to determine which cluster type works better for your application (a short partition-sizing sketch follows the list):

  • Maximum throughput per partition – MSK Serverless automatically balances the partitions of your topic between the backend nodes. It instantly scales when your ingress throughput increases. However, each partition has a quota of how much data it accepts. This is to ensure the data is distributed evenly across all partitions and backend nodes. In an MSK Serverless cluster, you need to create your topic with enough partitions so that the aggregated per-partition throughput is at least the maximum throughput your application requires. You also need to make sure your consumers read data at a rate that is below the maximum egress throughput per partition quota. If you’re using Amazon MSK provisioned, there is no partition-level quota for write and read operations. However, AWS recommends that you monitor and detect hot partitions and control how partitions are balanced among the broker nodes.
  • Data storage – The amount of time each message is kept in a particular topic directly influences the total amount of storage needed for your cluster. Amazon MSK allows you to manage the retention period at the topic level. MSK provisioned clusters allow broker-level configuration to set the default data retention period. MSK Serverless clusters allow unlimited data retention, but there is a separate quota for the maximum data that can be stored in each partition.
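
As a sketch of partition sizing for an MSK Serverless topic, the snippet below computes the minimum partition count needed to cover a target ingress rate. The per-partition quota used here is an assumed, illustrative value; always check the current Amazon MSK quotas page.

```python
import math

required_ingress_mbps = 15          # the application's peak ingress (assumption)
per_partition_ingress_mbps = 5      # assumed per-partition ingress quota (illustrative)

min_partitions = math.ceil(required_ingress_mbps / per_partition_ingress_mbps)
print(f"Create the topic with at least {min_partitions} partitions")   # -> 3
```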

Security

Amazon MSK recommends that you secure your data in the following ways. Availability of the security features varies depending on the cluster type, so before deciding on a cluster type, check that your preferred security options are supported by that cluster type. A configuration sketch for a provisioned cluster follows the list.

  • Encryption at rest – Amazon MSK integrates with AWS Key Management Service (AWS KMS) to offer transparent server-side encryption. Amazon MSK always encrypts your data at rest. When you create an MSK cluster, you can specify the KMS key that you want Amazon MSK to use to encrypt your data at rest.
  • Encryption in transit – Amazon MSK uses TLS 1.2. By default, it encrypts data in transit between the brokers of your MSK cluster. You can override this default when you create the cluster. For communication between clients and brokers, you must specify one of the following settings:
    • Only allow TLS encrypted data. This is the default setting.
    • Allow both plaintext and TLS encrypted data.
    • Only allow plaintext data.
  • Authentication and authorization – Use IAM to authenticate clients and allow or deny Apache Kafka actions. Alternatively, you can use TLS or SASL/SCRAM to authenticate clients, and Apache Kafka ACLs to allow or deny actions.
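
For illustration only, here is a hedged sketch of how these encryption and authentication options might be expressed when creating a provisioned cluster with Boto3. The cluster name, subnets, security group, KMS key ARN, Kafka version, and broker sizing are placeholders, and a production cluster would also need storage, monitoring, and logging settings.

```python
import boto3

kafka = boto3.client("kafka")

kafka.create_cluster(
    ClusterName="ecommerce-platform",                     # hypothetical
    KafkaVersion="3.4.0",
    NumberOfBrokerNodes=3,
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m5.large",
        "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],   # placeholders
        "SecurityGroups": ["sg-0123456789abcdef0"],                    # placeholder
    },
    EncryptionInfo={
        "EncryptionAtRest": {
            "DataVolumeKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/example"  # placeholder
        },
        # The ecommerce platform's legacy modules require plaintext alongside TLS.
        "EncryptionInTransit": {"ClientBroker": "TLS_PLAINTEXT", "InCluster": True},
    },
    ClientAuthentication={
        "Sasl": {"Scram": {"Enabled": True}},
        "Unauthenticated": {"Enabled": True},             # permits the plaintext legacy clients
    },
)
```

An MSK Serverless cluster would instead be created with the CreateClusterV2 API and a Serverless configuration, where IAM is the supported client authentication option.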

Cost of ownership

Amazon MSK helps you avoid spending countless hours and significant resources just managing your Apache Kafka cluster, adding little or no value to your business. With a few clicks on the Amazon MSK console, you can create highly available Apache Kafka clusters with settings and configuration based on Apache Kafka’s deployment best practices. Amazon MSK automatically provisions and runs Apache Kafka clusters. Amazon MSK continuously monitors cluster health and automatically replaces unhealthy nodes with no application downtime. In addition, Amazon MSK secures Apache Kafka clusters by encrypting data at rest and in transit. These capabilities can significantly reduce your Total Cost of Ownership (TCO).

With MSK provisioned clusters, you can specify and then scale cluster capacity to meet your needs. With MSK Serverless clusters, you don’t need to specify or scale cluster capacity. MSK Serverless automatically scales the cluster capacity based on the throughput, and you only pay per GB of data that your producers write to and your consumers read from the topics in your cluster. Additionally, you pay an hourly rate for your serverless clusters and an hourly rate for each partition that you create. The MSK Serverless cluster type generally offers a lower cost of ownership by taking away the cost of engineering resources needed for monitoring, capacity planning, and scaling MSK clusters. However, if your organization has a DataOps team with Kafka competency, you can use this competency to operate optimized MSK provisioned clusters. This allows you to save on Amazon MSK costs by consolidating several Kafka applications into a single cluster. There are a few critical considerations to decide when and how to split your workloads between multiple MSK clusters.

Apache ZooKeeper

Apache ZooKeeper is a service included in Amazon MSK when you create a cluster. It manages the Apache Kafka metadata and acts as a quorum controller for leader elections. Although interacting with ZooKeeper is not a recommended pattern, some Kafka applications depend on connecting directly to ZooKeeper. During the migration to Amazon MSK, you may find a few of these applications in your organization. This could be because they use an older version of the Kafka client library or for other reasons. For example, applications that help with Apache Kafka admin operations or visibility, such as Cruise Control, usually need this kind of access.

Before you choose your cluster type, you first need to check which offering provides direct access to the ZooKeeper cluster. As of writing this post, only Amazon MSK provisioned provides direct access to ZooKeeper.

How AnyCompany chooses their cluster types

AnyCompany first needs to collect some important requirements about each of their applications. The following table shows these requirements. The rows marked with an asterisk (*) are calculated based on the values in previous rows.

Dimension | Ecommerce Platform | Gaming Application
Message rate per second | 150,000 | 1,000
Maximum message size | 15 MB | 1 MB
Average message size | 30 KB | 15 KB
* Ingress throughput (average message size * message rate per second) | 4.5 GBps | 15 MBps
Number of consumer groups | 2 | 1
* Egress throughput (ingress throughput * number of consumer groups) | 9 GBps | 15 MBps
Number of topics | 100 | 10
Average partitions per topic | 100 | 5
* Total number of partitions (number of topics * average partitions per topic) | 10,000 | 50
* Ingress per partition (ingress throughput / total number of partitions) | 450 KBps | 300 KBps
* Egress per partition (egress throughput / total number of partitions) | 900 KBps | 300 KBps
Data retention | 72 hours | 168 hours
* Total storage needed (ingress throughput * retention period in seconds) | 1,139.06 TB | 1.3 TB
Authentication | Plaintext and SASL/SCRAM | IAM
Need ZooKeeper access | Yes | No

For the gaming application, AnyCompany doesn’t want to use their in-house Kafka competency to support an MSK provisioned cluster. Also, the gaming application doesn’t need custom configuration, and its throughput needs are below the quotas set by the MSK Serverless cluster type. In this scenario, an MSK Serverless cluster makes more sense.

For the ecommerce platform, AnyCompany wants to use their Kafka competency. Moreover, their throughput needs exceed the MSK Serverless quotas, and the application requires some broker-level custom configuration. The ecommerce platform also can’t be split between multiple clusters. Because of these reasons, AnyCompany chooses the MSK provisioned cluster type in this scenario. Additionally, AnyCompany can save more on cost with the Amazon MSK provisioned pricing model. Their throughput is consistent most of the time, and AnyCompany wants to use their DataOps team to optimize a provisioned MSK cluster and make scaling decisions based on their own expertise.

Conclusion

Choosing the best cluster type for your applications may seem complicated at first. In this post, I showed a process that helps you work backward from your application’s requirement and the resources available to you. MSK provisioned clusters offer more flexibility in how you scale, configure, and optimize your cluster. MSK Serverless, on the other hand, is a cluster type that makes it easier for you to run Apache Kafka clusters without having to manage compute and storage capacity. I generally recommend you begin with MSK Serverless if your application doesn’t require broker-level custom configurations, and your application throughput needs don’t exceed the quotas for the MSK Serverless cluster type. Sometimes it’s best to split your workloads between multiple MSK Serverless clusters, but if that isn’t possible, you may need to consider an MSK provisioned cluster. To operate an optimized MSK provisioned cluster, you need to have Kafka competency within your organization.

For further reading on Amazon MSK, visit the official product page.


About the author

Ali Alemi is a Streaming Specialist Solutions Architect at AWS. Ali advises AWS customers with architectural best practices and helps them design real-time analytics data systems that are reliable, secure, efficient, and cost-effective. He works backward from customers’ use cases and designs data solutions to solve their business problems. Prior to joining AWS, Ali supported several public sector customers and AWS consulting partners in their application modernization journey and migration to the cloud.

Let’s Architect! Architecting a data mesh

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-a-data-mesh/

In the past, data architectures were mainly designed around technologies rather than business domains. This changed in 2019, when Zhamak Dehghani introduced the data mesh. Data mesh is an application of Domain-Driven Design (DDD) principles to data architectures: data is organized into data domains, and the data is the product that the team owns and offers for consumption.

A data mesh architecture unites the disparate data sources within an organization through centrally managed data-sharing and governance guidelines. Business functions can maintain control over how shared data is accessed because data mesh also solves advanced data security challenges through distributed, decentralized ownership.

This edition of Let’s Architect! introduces data mesh, highlights the foundational concepts of data architectures, and covers the patterns for designing a data mesh in the AWS cloud with supporting resources.

Data lakes, lake houses and data mesh: what, why, and how?

Let’s explore a video introduction to data lakes, lake houses, and data mesh. This resource explains how to leverage those concepts to gain greater data insights across different business segments, with a special focus on best practices to build a well-architected, modern data architecture on AWS. It also gives an overview of the AWS cloud services that can be used to create such architectures and describes the fundamental pillars of designing them.

Take me to this intro to data lakes, lake houses, and data mesh video!

Data mesh is an architecture pattern where data are organized into domains and seen as products to expose for consumption

Building data mesh architectures on AWS

Now that you know what a data mesh architecture is, here is a step-by-step video from re:Invent 2022 on designing one. It covers a use case on how GoDaddy considered and implemented data mesh, in addition to:

  • The fundamental pillars behind a well-architected data mesh in the cloud
  • Finding an approach to build a data mesh architecture using native AWS services
  • Reasons for considering a data mesh architecture in scenarios where data lakes present limitations
  • How data mesh can be applied in practice to overcome those limitations
  • The mental models to apply during the data mesh design process

Take me to this re:Invent 2022 video!

In the data mesh architecture the producers expose their data for consumption to the consumers. Access is regulated through a centralized governance layer.

Amazon DataZone: Democratize data with governance

Now let’s explore data accessibility as it relates to data mesh architectures.

Amazon DataZone is a new AWS business data catalog allowing you to unlock data across organizational boundaries with built-in governance. This service provides a unified environment where everyone in an organization—from data producers to data consumers—can access, share, and consume data in a governed manner.

Here is a video to learn how to apply AWS analytics services to discover, access, and share data across organizational boundaries within the context of a data mesh architecture.

Take me to this re:Invent 2022 video!

Amazon DataZone accelerates the adoption of the data mesh pattern by making it scalable to a high number of producers and consumers.

Build a data mesh on AWS

Feeling inspired to build? Hands-on experience is a great way to learn and see how the theoretical concepts apply in practice.

This workshop teaches you how to build a data mesh architecture on AWS. Many organizations are interested in implementing this architecture to:

  1. Move away from centralized data lakes to decentralized ownership
  2. Deliver analytics solutions across business units

Learn how a data mesh architecture can be implemented with AWS native services.

Take me to this workshop!

The diagram shows how to separate the producer, consumer, and governance components through a multi-account strategy.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about monitoring and observability.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Patterns for enterprise data sharing at scale

Post Syndicated from Venkata Sistla original https://aws.amazon.com/blogs/big-data/patterns-for-enterprise-data-sharing-at-scale/

Data sharing is becoming an important element of an enterprise data strategy. AWS services like AWS Data Exchange provide an avenue for companies to share or monetize their value-added data with other companies. Some organizations would like to have a data sharing platform where they can establish a collaborative and strategic approach to exchange data with a restricted group of companies in a closed, secure, and exclusive environment. For example, financial services companies and their auditors, or manufacturing companies and their supply chain partners. This fosters development of new products and services and helps improve their operational efficiency.

Data sharing is a team effort. In addition to establishing the right infrastructure, successful data sharing requires organizations to ensure that business owners sponsor data sharing initiatives and that the shared data is of high quality. Data platform owners and security teams should encourage proper data use and fix any privacy and confidentiality issues.

This blog discusses various data sharing options and common architecture patterns that organizations can adopt to set up their data sharing infrastructure based on AWS service availability and data compliance.

Data sharing options and data classification types

Organizations operate across a spectrum of security compliance constraints. For some organizations, it's possible to use AWS services like AWS Data Exchange. However, organizations working in heavily regulated industries, such as federal agencies or financial services, might be limited to the allow-listed AWS service options. For example, if an organization is required to operate in a FedRAMP Moderate or FedRAMP High environment, its options to share data may be limited by the AWS services that are available and have been allow listed. Service availability is based on platform certification by AWS, and allow listing is based on the organization's own security compliance architecture and guidelines.

The kind of data that the organization wants to share with its partners may also affect the method used for data sharing. Complying with data classification rules may further limit the data sharing options they can choose.

The following are some general data classification types:

  • Public data – Information that is freely available for anyone to read, research, review, and store. It typically has the lowest level of data classification and security.
  • Private data – Information you might want to keep private, such as email inboxes, cell phone content, employee identification numbers, or employee addresses. If private data were shared, destroyed, or altered, it might pose a slight risk to an individual or the organization.
  • Confidential or restricted data – Sensitive information that only a limited group of individuals or parties can access, often requiring special clearance or authorization. Confidential or restricted data access might involve aspects of identity and authorization management. Examples of confidential data include Social Security numbers and vehicle identification numbers.

The following is a sample decision tree that you can refer to when choosing your data sharing option based on service availability, classification type, and data format (structured or unstructured). Other factors like usability, multi-partner accessibility, data size, consumption patterns like bulk load/API access, and more may also affect the choice of data sharing pattern.

(Decision tree diagram)

In the following sections, we discuss each pattern in more detail.

Pattern 1: Using AWS Data Exchange

AWS Data Exchange makes exchanging data easier, helping organizations lower costs, become more agile, and innovate faster. Organizations can choose to share data privately with their external partners using AWS Data Exchange. AWS Data Exchange offers perimeter controls that are applied at the identity and resource levels. These controls decide which external identities have access to specific data resources. AWS Data Exchange provides multiple patterns for external parties to access data.

The following diagram illustrates an example architecture.

(Pattern 1 architecture diagram)

With AWS Data Exchange, once the dataset to share (or sell) is configured, AWS Data Exchange automatically manages entitlements (and billing) between the producer and the consumer. The producer doesn’t have to manage policies, set up new access points, or create new Amazon Redshift data shares for each consumer, and access is automatically revoked if the subscription ends. This can significantly reduce the operational overhead in sharing data.
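
To make this concrete, the following is a minimal boto3 sketch (not part of the original pattern) of how a producer might publish an S3-backed data set through AWS Data Exchange; the data set name, bucket, and object key are hypothetical placeholders.

import boto3

# Sketch: publish an S3-backed data set with AWS Data Exchange.
dx = boto3.client("dataexchange", region_name="us-east-1")

# 1. Create the data set that consumers will subscribe to.
data_set = dx.create_data_set(
    AssetType="S3_SNAPSHOT",
    Name="daily-transactions",                      # placeholder name
    Description="Curated transaction extracts shared with partners",
)

# 2. Create a revision and import the published objects from S3 into it.
revision = dx.create_revision(DataSetId=data_set["Id"])
job = dx.create_job(
    Type="IMPORT_ASSETS_FROM_S3",
    Details={
        "ImportAssetsFromS3": {
            "DataSetId": data_set["Id"],
            "RevisionId": revision["Id"],
            "AssetSources": [
                {"Bucket": "published-datasets-bucket", "Key": "exports/2023-01-01.csv"}
            ],
        }
    },
)
dx.start_job(JobId=job["Id"])

# 3. Once the import job completes, finalize the revision so it can be published.
dx.update_revision(DataSetId=data_set["Id"], RevisionId=revision["Id"], Finalized=True)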

Pattern 2: Using AWS Lake Formation for centralized access management

You can use this pattern in cases where both the producer and consumer are on the AWS platform with an AWS account that is enabled to use AWS Lake Formation. This pattern provides a no-code approach to data sharing. The following diagram illustrates an example architecture.

(Pattern 2 architecture diagram)

In this pattern, the central governance account has Lake Formation configured for managing access across the producer’s org accounts. Resource links from the production account Amazon Simple Storage Service (Amazon S3) bucket are created in Lake Formation. The producer grants Lake Formation permissions on an AWS Glue Data Catalog resource to an external account, or directly to an AWS Identity and Access Management (IAM) principal in another account. Lake Formation uses AWS Resource Access Manager (AWS RAM) to share the resource. If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, AWS RAM sends an invitation to the grantee account to accept or reject the resource grant. To make the shared resource available, the consumer administrator in the grantee account must use the AWS RAM console or AWS Command Line Interface (AWS CLI) to accept the invitation.
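
As an illustration of the grant step described above, here is a minimal boto3 sketch issued from the central governance account; the account IDs, database, and table names are hypothetical.

import boto3

# Sketch: grant a Data Catalog table to an external consumer account.
lf = boto3.client("lakeformation")

lf.grant_permissions(
    # The external consumer account (or an IAM principal in it).
    Principal={"DataLakePrincipalIdentifier": "111122223333"},
    Resource={
        "Table": {
            "CatalogId": "999988887777",       # producer/governance catalog
            "DatabaseName": "sales_domain",
            "Name": "orders_curated",
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
    PermissionsWithGrantOption=[],
)

# Lake Formation uses AWS RAM to share the resource. If the grantee account is
# outside the organization, its administrator must accept the AWS RAM
# invitation, for example:
# boto3.client("ram").accept_resource_share_invitation(
#     ResourceShareInvitationArn="arn:aws:ram:...:resource-share-invitation/...")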

Authorized principals can share resources explicitly with an IAM principal in an external account. This feature is useful when the producer wants to have control over who in the external account can access the resources. The permissions the IAM principal receives are a union of direct grants and the account-level grants that are cascaded down to the principals. The data lake administrator of the recipient account can view the direct cross-account grants, but can’t revoke permissions.

Pattern 3: Using AWS Lake Formation from the producer external sharing account

The producer may have stringent security requirements where no external consumer should access their production account or their centralized governance account. They may also not have Lake Formation enabled on their production platform. In such cases, as shown in the following diagram, the producer production account (Account A) is dedicated to its internal organization users. The producer creates another account, the producer external sharing account (Account B), which is dedicated for external sharing. This gives the producer more latitude to create specific policies for specific organizations.

The following architecture diagram shows an overview of the pattern.

(Pattern 3 architecture diagram)

The producer implements a process to create an asynchronous copy of data in Account B. The bucket can be configured for Same Region Replication (SRR) or Cross Region Replication (CRR) for objects that need to be shared. This facilitates an automated refresh of data into the "External Published Datasets" S3 bucket in the external account without having to write any code.
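
The following is a minimal boto3 sketch of such a replication rule; the bucket names, account IDs, and IAM role ARN are placeholders, and versioning is assumed to be enabled on both buckets.

import boto3

# Sketch: replicate a "published/" prefix from Account A to Account B.
s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="producer-published-datasets",           # source bucket in Account A
    ReplicationConfiguration={
        "Role": "arn:aws:iam::999988887777:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-published-prefix",
                "Status": "Enabled",
                "Priority": 1,
                "Filter": {"Prefix": "published/"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": "arn:aws:s3:::external-published-datasets",  # Account B
                    "Account": "111122223333",
                    # Hand object ownership to the external sharing account.
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    },
)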

Creating a copy of the data allows the producer to add another degree of separation between the external consumer and its production data. It also helps meet any compliance or data sovereignty requirements.

Lake Formation is set up on Account B, and the administrator creates resource links for the "External Published Datasets" S3 bucket in its account to grant access. The administrator follows the same process to grant access as described earlier.

Pattern 4: Using Amazon Redshift data sharing

This pattern is ideally suited for a producer who has most of their published data products on Amazon Redshift. This pattern also requires the producer’s external sharing account (Account B) and the consumer account (Account C) to have an encrypted Amazon Redshift cluster or Amazon Redshift Serverless endpoint that meets the prerequisites for Amazon Redshift data sharing.

The following architecture diagram shows an overview of the pattern.

(Pattern 4 architecture diagram)

Two options are possible depending on the producer’s compliance constraints:

  • Option A – The producer enables data sharing directly on the production Amazon Redshift cluster.
  • Option B – The producer may have constraints with respect to sharing the production cluster. The producer creates a simple AWS Glue job that copies data from the Amazon Redshift cluster in the production Account A to the Amazon Redshift cluster in the external Account B. This AWS Glue job can be scheduled to refresh data as needed by the consumer. When the data is available in Account B, the producer can create multiple views and multiple data shares as needed.

In both options, the producer maintains complete control over what data is being shared, and the consumer admin maintains full control over who can access the data within their organization.

After both the producer and consumer admins approve the data sharing request, the consumer user can access this data as if it were part of their own account without having to write any additional code.
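
As an illustration, the following sketch uses the Amazon Redshift Data API (via boto3) to run the data share statements on the sharing cluster; the cluster identifier, schema, and account IDs are hypothetical, and the cross-account share still has to be authorized by the producer and associated by the consumer administrator.

import boto3

# Sketch: create a data share on the sharing cluster and grant it to Account C.
rsd = boto3.client("redshift-data")

statements = [
    "CREATE DATASHARE partner_share",
    "ALTER DATASHARE partner_share ADD SCHEMA curated",
    "ALTER DATASHARE partner_share ADD ALL TABLES IN SCHEMA curated",
    # Grant the share to the consumer AWS account (Account C).
    "GRANT USAGE ON DATASHARE partner_share TO ACCOUNT '111122223333'",
]

for sql in statements:
    rsd.execute_statement(
        ClusterIdentifier="external-sharing-cluster",   # cluster in Account B
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )

# On the consumer side (Account C), after the share is authorized and
# associated, the admin would run something like:
# CREATE DATABASE shared_from_producer FROM DATASHARE partner_share
#     OF ACCOUNT '999988887777' NAMESPACE '<producer namespace GUID>';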

Pattern 5: Sharing data securely and privately using APIs

You can adopt this pattern when the external partner doesn’t have a presence on AWS. You can also use this pattern when published data products are spread across various services like Amazon S3, Amazon Redshift, Amazon DynamoDB, and Amazon OpenSearch Service but the producer would like to maintain a single data sharing interface.

Here’s an example use case: Company A would like to share some of its log data in near-real time with its partner Company B, who uses this data to generate predictive insights for Company A. Company A stores this data in Amazon Redshift. The company wants to share this transactional information with its partner in a cost-effective and secure way, after masking the personally identifiable information (PII), so that Company B can generate insights. Company B doesn’t use the AWS platform.

Company A establishes a microbatch process using an AWS Lambda function or AWS Glue that queries Amazon Redshift to get incremental log data, applies rules to redact the PII, and loads this data into the “Published Datasets” S3 bucket. This triggers the SRR/CRR process that refreshes the data in the “External Sharing” S3 bucket.
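
The following is a minimal sketch of such a microbatch job (the table, column, and bucket names are hypothetical); it pulls recent rows through the Amazon Redshift Data API, hashes the PII column, and writes the result to the "Published Datasets" bucket.

import boto3, csv, hashlib, io, time

rsd = boto3.client("redshift-data")
s3 = boto3.client("s3")

def mask(value: str) -> str:
    """Replace a PII value with a stable, non-reversible token."""
    return hashlib.sha256(value.encode()).hexdigest()[:12]

# Query the last 15 minutes of log data from the production cluster.
resp = rsd.execute_statement(
    ClusterIdentifier="production-cluster",
    Database="logs",
    DbUser="awsuser",
    Sql="SELECT event_time, customer_email, event_type FROM app_logs "
        "WHERE event_time > DATEADD(minute, -15, GETDATE())",
)

# Wait for the statement to finish before fetching results (no error handling
# shown here for brevity).
while rsd.describe_statement(Id=resp["Id"])["Status"] not in ("FINISHED", "FAILED", "ABORTED"):
    time.sleep(2)

rows = rsd.get_statement_result(Id=resp["Id"])["Records"]

# Redact the PII column and build a CSV in memory.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["event_time", "customer_email", "event_type"])
for rec in rows:
    event_time, email, event_type = (col.get("stringValue", "") for col in rec)
    writer.writerow([event_time, mask(email), event_type])

# Publish to the bucket that is replicated to the external sharing account.
s3.put_object(
    Bucket="published-datasets",
    Key="logs/incremental.csv",
    Body=buf.getvalue().encode("utf-8"),
)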

The following diagram shows how the consumer can then use an API-based approach to access this data.

(Pattern 5 architecture diagram)

The workflow contains the following steps:

  1. An HTTPS API request is sent from the API consumer to the API proxy layer.
  2. The HTTPS API request is forwarded from the API proxy to Amazon API Gateway in the external sharing AWS account.
  3. Amazon API Gateway calls the request receiver Lambda function.
  4. The request receiver function writes the status to a DynamoDB control table.
  5. A second Lambda function, the poller, checks the status of the results in the DynamoDB table.
  6. The poller function fetches results from Amazon S3.
  7. The poller function sends a presigned URL to download the file from the S3 bucket to the requestor via Amazon Simple Email Service (Amazon SES), as shown in the sketch after this list.
  8. The requestor downloads the file using the URL.
  9. The network perimeter AWS account only allows egress internet connection.
  10. The API proxy layer enforces both the egress security controls and perimeter firewall before the traffic leaves the producer’s network perimeter.
  11. The AWS Transit Gateway security egress VPC routing table only allows connectivity from the required producer’s subnet, while preventing internet access.
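
The sketch below illustrates steps 6 and 7 with boto3; the bucket, object key, and email addresses are placeholders.

import boto3

s3 = boto3.client("s3")
ses = boto3.client("ses")

# Generate a time-limited presigned URL for the result object.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "external-sharing-bucket", "Key": "results/request-1234.csv"},
    ExpiresIn=3600,  # link expires after one hour
)

# Email the download link to the requestor with Amazon SES.
ses.send_email(
    Source="data-sharing@example.com",
    Destination={"ToAddresses": ["consumer@example.org"]},
    Message={
        "Subject": {"Data": "Your requested dataset is ready"},
        "Body": {"Text": {"Data": f"Download your file (valid for 1 hour): {url}"}},
    },
)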

Pattern 6: Using Amazon S3 access points

Data scientists may need to work collaboratively on images, videos, and text documents. Legal and audit groups may want to share reports and statements with the auditing agencies. This pattern discusses an approach to sharing such documents. The pattern assumes that the external partners are also on AWS. Amazon S3 access points allow the producer to share access with their consumer by setting up cross-account access without having to edit bucket policies.

Access points are named network endpoints attached to buckets that you can use to perform S3 object operations, such as GetObject and PutObject. Each access point has distinct permissions and network controls that Amazon S3 applies for any request made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy attached to the underlying bucket.

The following architecture diagram shows an overview of the pattern.

(Pattern 6 architecture diagram)

The producer creates an S3 bucket and enables the use of access points. As part of the configuration, the producer specifies the consumer account, IAM role, and privileges for the consumer IAM role.

Consumer users with the IAM role in the consumer account can access the S3 bucket over the internet, or access can be restricted to an Amazon VPC via VPC endpoints and AWS PrivateLink.
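
The following boto3 sketch shows one way the producer might set this up; the account IDs, bucket, role, and Region are hypothetical, and depending on the existing bucket policy you may also need to delegate access control to access points.

import boto3, json

s3control = boto3.client("s3control")

producer_account = "999988887777"
access_point_name = "partner-share-ap"

# Create the access point on the shared bucket in the producer account.
s3control.create_access_point(
    AccountId=producer_account,
    Name=access_point_name,
    Bucket="producer-shared-documents",
    # Optionally restrict the access point to a VPC instead of the internet:
    # VpcConfiguration={"VpcId": "vpc-0abc123def456"},
)

# Attach an access point policy that allows the consumer account's IAM role.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:role/PartnerAnalystRole"},
        "Action": ["s3:GetObject", "s3:PutObject"],
        "Resource": f"arn:aws:s3:us-east-1:{producer_account}:accesspoint/"
                    f"{access_point_name}/object/shared/*",
    }],
}

s3control.put_access_point_policy(
    AccountId=producer_account,
    Name=access_point_name,
    Policy=json.dumps(policy),
)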

Conclusion

Each organization has its unique set of constraints and requirements that it needs to fulfill to set up an efficient data sharing solution. In this post, we demonstrated various options and best practices available to organizations. The data platform owner and security team should work together to assess what works best for your specific situation. Your AWS account team is also available to help.


About the Authors


Venkata Sistla is a Cloud Architect – Data & Analytics at AWS. He specializes in building data processing capabilities and helping customers remove constraints that prevent them from leveraging their data to develop business insights.

Santosh Chiplunkar is a Principal Resident Architect at AWS. He has over 20 years of experience helping customers solve their data challenges. He helps customers develop their data and analytics strategy and provides them with guidance on how to make it a reality.