Tag Archives: Thought Leadership

Optimizing Amazon EC2 Spot Instances with Spot Placement Scores

2023-04-27 Sheila Busser

Post Syndicated from Sheila Busser original https://aws.amazon.com/blogs/compute/optimizing-amazon-ec2-spot-instances-with-spot-placement-scores/

This blog post is written by Steve Cole, Principal Specialist SA, and Robert McCone, Sr. Specialist SA.

Getting the compute resources you need, even vCPUs numbering in the millions, and completing a workload using Amazon EC2 Spot Instances is just a configuration away. In this post you will learn how to use Spot placement scores to reduce interruptions, acquire greater capacity, and identify optimal configurations, times, and locations to run workloads on Spot Instances. Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity in the AWS cloud and are available at up to a 90% discount compared to On-Demand prices. Spot placement scores is a feature that many customers use to identify optimal instance types or to choose the best Availability Zone (AZ) for ephemeral work like data analytics or high-performance computing. As a real-time tool, Spot placement scores are often integrated into deployment automation. However, because the scores can be logged and graphed over time, you may find them to be a valuable resource even before you launch a workload into the cloud. Now available through AWS Labs, a GitHub repository hosting tools for customers, the Spot placement score tracker handles the undifferentiated heavy lifting of collecting and graphing those scores for any customer.

About Spot placement score

Spot placement scores are a feature available through AWS APIs (and also implemented in the Amazon EC2 Spot requests console) that uses internal capacity and interruption data to scrutinize the size and shape of a Spot Instance request and responds with a “likelihood of success” rating from 1 (lower likelihood of success) to 10 (higher likelihood of success). The score represents confidence in being able to acquire the desired capacity (size) using the instance configuration (shape) for the next few hours. The shape of the request can be a list of specific instance types or can be requirements-based with attribute-based instance type selection. The size of the request can be instance count, number of vCPUs, or GB of RAM. The score is based on known capacity, allocation strategies, and the trending of capacities over time.
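To make the "size" and "shape" of a request concrete, here is a minimal sketch of how the parameters for the EC2 `GetSpotPlacementScores` API could be assembled with boto3. The helper names (`build_score_request`, `fetch_scores`), the default Region, and the friendly unit names are illustrative assumptions rather than part of any AWS tool; the API itself expresses memory in MiB, so the sketch converts from GiB. Pagination of the response is ignored for brevity.

```python
from __future__ import annotations

from typing import Any

# Maps friendly unit names (an assumption of this sketch) to the values
# the EC2 API accepts for TargetCapacityUnitType.
_UNIT_TYPES = {"instances": "units", "vcpus": "vcpu", "memory_gib": "memory-mib"}


def build_score_request(
    instance_types: list[str],
    size: int,
    size_unit: str = "vcpus",
    regions: list[str] | None = None,
    per_az: bool = False,
) -> dict[str, Any]:
    """Build keyword arguments for EC2 GetSpotPlacementScores."""
    if size_unit not in _UNIT_TYPES:
        raise ValueError(f"size_unit must be one of {sorted(_UNIT_TYPES)}")
    # The API measures memory in MiB, so convert GiB before sending.
    capacity = size * 1024 if size_unit == "memory_gib" else size
    request: dict[str, Any] = {
        "InstanceTypes": instance_types,          # the "shape"
        "TargetCapacity": capacity,               # the "size"
        "TargetCapacityUnitType": _UNIT_TYPES[size_unit],
        "SingleAvailabilityZone": per_az,         # True scores individual AZs
    }
    if regions:
        request["RegionNames"] = regions
    return request


def fetch_scores(request: dict[str, Any], api_region: str = "us-east-1") -> list[dict]:
    """Call the API (needs boto3 and AWS credentials); returns score items."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2", region_name=api_region)
    return ec2.get_spot_placement_scores(**request)["SpotPlacementScores"]
```

For example, `build_score_request(["m5.large", "m5a.large", "m5d.large"], 500, "vcpus", ["us-east-1"], per_az=True)` describes a 500-vCPU request scored per Availability Zone in one Region.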

Before the release of Spot placement score, customers could track the trends of their existing workloads and configurations. This might have helped them to anticipate capacity constraints over time, but the ability to do something more meaningful when assessing configurations was something customers requested often. With the launch of Spot placement score, that capability was delivered and enabled customers to receive guidance on how a configuration change might affect the effectiveness of Spot Instances in a workload.

Customers immediately recognized the power of this new feature and started writing tooling around their workloads to incorporate the new functionality provided by Spot placement scores. For example, customers leveraged Spot placement scores to find the highest scoring AZ in a region for work that requires low latency within a cluster. Customers running data analytics with services like Amazon EMR could more confidently launch clusters on Spot Instances. This reduces costs and the time necessary to process data because of fewer interruptions. Customers in financial services, health care and life sciences, and high tech were some of the early adopters of this strategy.

Benefits of Spot placement scores

One specific customer used tools like the Spot Instance advisor and Spot pricing history to make decisions about what instances to run every night. If the customer’s analytics workload received too many interruptions, then it would inevitably be relaunched using On-Demand Instances, increasing costs and time-to-complete. The addition of Spot placement scores to the customer’s tooling allowed for more informed decisions about which configurations worked best and, more specifically, which AZ(s) to use. Ultimately, this led not only to higher confidence in using Spot Instances, but also to significant cost savings over time.

Other customers tracked Spot placement scores over time with regular queries stored in time series databases to identify not only the best configuration or location, but also the best time-of-day or day-of-week to run their workloads. Different configurations of instance types were queried through automation and the results were logged into a time series database that could then be presented as graphs. These graphs were scrutinized, configurations were tuned, and ultimately these customers could take greater advantage of the cost optimization that Spot Instances offer through fewer interruptions by running their workloads where and when scores were higher.
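The time-of-day analysis described above can be sketched in a few lines. This is only an illustration under stated assumptions: the samples are taken to be `(timestamp, score)` pairs with timezone-aware timestamps, already pulled out of whatever time series store is in use, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def average_score_by_hour(samples):
    """Average placement score per UTC hour of day.

    samples: iterable of (timezone-aware datetime, score) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, score in samples:
        buckets[timestamp.astimezone(timezone.utc).hour].append(score)
    return {hour: mean(scores) for hour, scores in sorted(buckets.items())}


def best_hour(samples):
    """Return the UTC hour with the highest average score."""
    averages = average_score_by_hour(samples)
    return max(averages, key=averages.get)
```

The same bucketing works for day-of-week by replacing `.hour` with `.weekday()`.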

AWS was interested in how this approach solved problems for customers. Further research with customers and design ideation led to the creation of an open source tool that AWS has recently released: Spot placement score tracker. Spot placement score tracker helps customers evaluate different configurations against multiple times and locations. It’s an AWS-native solution that leverages the Spot placement score API along with AWS Lambda and Amazon CloudWatch to create a dashboard that enables any AWS customer to benefit from this model without having to write it themselves.

How to use the Spot placement score tracker

The project provides Infrastructure as Code (IaC) automation using the AWS Cloud Development Kit (AWS CDK) to deploy the infrastructure and permissions required to run a Lambda function. The function runs every five minutes and collects the placement scores of as many diversified configurations as you define.
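As a rough illustration of what such a collection step involves (this is not the tracker's actual code), the sketch below converts placement score results into the `MetricData` structure that CloudWatch `PutMetricData` accepts. The metric name, dimension names, and namespace are assumptions made up for this example.

```python
from datetime import datetime, timezone


def to_metric_data(config_name, scores, timestamp=None):
    """Convert score items (Region, optional AvailabilityZoneId, Score)
    into CloudWatch MetricData entries for one named configuration."""
    timestamp = timestamp or datetime.now(timezone.utc)
    metric_data = []
    for item in scores:
        dimensions = [
            {"Name": "Configuration", "Value": config_name},
            {"Name": "Region", "Value": item["Region"]},
        ]
        # AZ-level results carry an AZ ID; Region-level results do not.
        if "AvailabilityZoneId" in item:
            dimensions.append(
                {"Name": "AvailabilityZoneId", "Value": item["AvailabilityZoneId"]}
            )
        metric_data.append(
            {
                "MetricName": "SpotPlacementScore",  # hypothetical metric name
                "Dimensions": dimensions,
                "Timestamp": timestamp,
                "Value": float(item["Score"]),
                "Unit": "None",
            }
        )
    return metric_data


def publish(metric_data, namespace="Example/SpotPlacementScores"):
    """Send the entries to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so to_metric_data works without it

    cloudwatch = boto3.client("cloudwatch")
    # Send in small batches to stay well under the per-request limit.
    for start in range(0, len(metric_data), 20):
        cloudwatch.put_metric_data(
            Namespace=namespace, MetricData=metric_data[start : start + 20]
        )
```

A scheduled function would call the placement score API for each configured shape and size, pass the results through `to_metric_data`, and publish them, which gives the dashboard one data point per configuration and location every five minutes.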

After you install the CloudWatch dashboard and give it some time to collect and record data, it presents valuable insights in intuitive graphs.

Insights available through the Spot placement score tracker

The first thing you may notice by observing data over time is that instance diversification is the primary driver of high placement scores. This has always been a best practice for the use of Spot Instances, and it extends to On-Demand Instances as well. In short, if you can only run on one instance type, then the likelihood of experiencing interruptions is far greater than if you can run on six or twelve. Sometimes the simple inclusion of -a, -d, and -n instance types (e.g., m5.large, m5a.large, m5d.large, m5n.large), previous generations (e.g., m5.large, m4.large), different sizes in a container environment (e.g., m5.large, m5.xlarge, m5.2xlarge), and even the inclusion of AWS Graviton will have a material impact on placement scores, which equates to fewer interruptions. This ultimately leads to more efficient use of resources through fewer restarted processes, resulting in increased efficiency and reduced costs.
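A diversified candidate list like the ones in the examples above can be generated mechanically. The helper below is a hypothetical sketch: it only builds names from a family, variant suffixes, sizes, and previous-generation families, and it does not check whether each resulting instance type is actually offered in a given Region or AZ, so the output should be validated before use.

```python
def diversify(
    family="m5",
    sizes=("large", "xlarge", "2xlarge"),
    variants=("", "a", "d", "n"),
    previous_generations=("m4",),
):
    """Build a diversified list of candidate instance type names."""
    candidates = []
    for size in sizes:
        # Current generation plus its -a, -d, and -n style variants.
        for suffix in variants:
            candidates.append(f"{family}{suffix}.{size}")
        # Same size from earlier generations.
        for old_family in previous_generations:
            candidates.append(f"{old_family}.{size}")
    # Drop duplicates while preserving order.
    return list(dict.fromkeys(candidates))
```

Scoring the single-type list against the diversified one with the same target capacity is a simple way to see the effect of diversification on the score.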

The second insight that you can realize through the use of placement scores over time is identifying the optimal AZ in which an ephemeral process can be placed. Perhaps the best use case for this type of insight is data analytics clusters that are launched to complete many calculations overnight. This is common in financial institutions for various reasons including risk analysis and compliance, but could apply to medical research examining results of experiments during the day as well as other situations where a 24/7 presence isn’t required by the workload. These customers are typically using a single AZ to allow for faster communication between nodes and to reduce data transfer costs. Therefore, the ability for Spot placement scores to provide different scores for different AZs is highly advantageous.
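Choosing the AZ for a single-AZ cluster then comes down to picking the highest score from an AZ-level query. The sketch below assumes score items shaped like the `GetSpotPlacementScores` response (`Region`, `AvailabilityZoneId`, `Score`); the function name is illustrative. Note that the API reports AZ IDs (such as `use1-az1`), which map to different AZ names in different accounts.

```python
def best_availability_zone(scores, region=None):
    """Return the highest-scoring AZ-level item, or None if there is none.

    scores: list of dicts with "Region", "AvailabilityZoneId", and "Score".
    region: optionally restrict the choice to one Region.
    """
    candidates = [
        item
        for item in scores
        if "AvailabilityZoneId" in item
        and (region is None or item["Region"] == region)
    ]
    if not candidates:
        return None
    # On ties, max() keeps the first item in the original order.
    return max(candidates, key=lambda item: item["Score"])
```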

Third, with access to placement scores over time, it becomes possible to identify exactly how large a workload’s footprint can be. By submitting identical configurations to Spot placement scores but with different sizes, you can surface the ideal workload size. The ideal size is not too small, where perhaps the job takes too long to complete, but also not so large that interruptions are too frequent and cause restarts too often. This can benefit not only ephemeral workloads but also persistent clusters or fleets: understanding what the lowest score would be over time gives you solid information about what you can expect from Spot Instances and where. This might inform you to be ready to launch On-Demand Instances to compensate when Spot Instance availability is lower. This can also help to forecast pricing and inform decisions about the consideration of AWS Savings Plans or On-Demand Capacity Reservations.
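One way to read a size sweep is to keep the largest target capacity whose lowest observed score never dropped below a chosen floor. The sketch below assumes the history has already been collected as a mapping from target capacity to the scores seen over time; the function name and the default floor of 7 are arbitrary assumptions for illustration, not AWS guidance.

```python
def largest_sustainable_size(history, min_score=7):
    """Return the largest size whose minimum score stayed >= min_score.

    history: dict mapping target capacity -> list of scores over time.
    Returns None when no size meets the floor.
    """
    sustainable = [
        size
        for size, scores in history.items()
        if scores and min(scores) >= min_score
    ]
    return max(sustainable) if sustainable else None
```

For example, with `{500: [9, 9, 8], 2000: [8, 7, 7], 8000: [6, 4, 5]}` and a floor of 7, the result is 2000.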

Finally, analyzing Spot placement scores over time can provide regional scoring. Through this lens it’s possible for you to identify entire regions that you may have overlooked, without the knowledge that Spot Instances outside your primary region(s) might offer lower interruptions during daylight hours because those regions are off-peak. When it’s possible to place a workload in another region, unconstrained by local data access requirements, it’s quite possible to harness the compute of a significant footprint in locations that are otherwise un(der)-utilized. Workloads that require less data transfer and more compute can benefit tremendously from access to Spot Instances in other regions. For example, build servers might run extraordinarily well in Europe during North American business hours, and the reduction in compute cost might offset the data transfer needed to complete the job.

Conclusion

Spot placement scores can be used to make decisions about how, when, and where Spot Instances can be most efficiently utilized to deliver business needs, and at greatly reduced prices. We’re very excited to release this tool to enable you to tap into information which was previously unavailable and make data-driven decisions for your business. The information in this post, combined with the output of placement scores over time, is a significant evolution.

Install the Spot placement score tracker today, configure it to match an existing Spot workload, and see how you might perform at different times or different locations. Explore more robust options and discover greater capacity and lower interruptions. Or investigate how On-Demand workloads could migrate to Spot Instances.

Let’s Architect! Getting started with containers

2023-04-26 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-getting-started-with-containers/

Most AWS customers building cloud-native applications or modernizing applications choose containers to run their microservices applications to accelerate innovation and time to market while lowering their total cost of ownership (TCO). Using containers in AWS comes with other benefits, such as increased portability, scalability, and flexibility.

The combination of container technologies and AWS services also provides features such as load balancing, auto scaling, and service discovery, making it easier to deploy and manage applications at scale.

In this edition of Let’s Architect! we share useful resources to help you to get started with containers on AWS.

Container Build Lens

This whitepaper describes the Container Build Lens for the AWS Well-Architected Framework. It helps customers review and improve their cloud-based architectures and better understand the business impact of their design decisions. The document describes general design principles for containers, as well as specific best practices and implementation guidance using the Six Pillars of the Well-Architected Framework.

Take me to explore the Container Build Lens!

Follow Container Build Lens best practices to architect your container-based workloads.

EKS Workshop

The EKS Workshop is a useful resource to familiarize yourself with Amazon Elastic Kubernetes Service (Amazon EKS) by practicing on real use-cases. It is built to help users learn about Amazon EKS features and integrations with popular open-source projects. The workshop is abstracted into high-level learning modules, including Networking, Security, DevOps Automation, and more. These are further broken down into standalone labs focusing on a particular feature, tool, or use case.

Once you’re done experimenting with EKS Workshop, start building your environments with Amazon EKS Blueprints, a collection of Infrastructure as Code (IaC) modules that helps you configure and deploy consistent, batteries-included Amazon EKS clusters across accounts and regions following AWS best practices. Amazon EKS Blueprints are available in both Terraform and CDK.

Take me to this workshop!

Architecting for resiliency on AWS App Runner

Learn how to architect a highly available and resilient application using AWS App Runner. With App Runner, you can start with just the source code of your application or a container image. The complexity of running containerized applications is abstracted away, including the cloud resources needed for running your web application or API. App Runner manages load balancers, TLS certificates, auto scaling, logs, metrics, traceability, and more, so you can focus on implementing your business logic in a highly scalable and elastic environment.

Take me to this blog post!

A high-level architecture for an available and resilient application with AWS App Runner

Securing Kubernetes: How to address Kubernetes attack vectors

As part of designing any modern system on AWS, it is necessary to think about the security implications and what can affect your security posture. This session introduces the fundamentals of the Kubernetes architecture and common attack vectors. It also covers the security controls provided by Amazon EKS and suggestions on how to address these attack vectors. With these strategies, you can learn how to reduce risk for your Kubernetes-based workloads.

Take me to this video!

Some common attack vectors that need addressing with Kubernetes

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about serverless.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Scaling security and compliance

2023-04-18 Chad Woolf

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/scaling-security-and-compliance/

At Amazon Web Services (AWS), we move fast and continually iterate to meet the evolving needs of our customers. We design services that can help our customers meet even the most stringent security and compliance requirements. Additionally, our service teams work closely with our AWS Security Guardians program to coordinate security efforts and to maintain a high quality bar. We also have internal compliance teams that continually monitor security control requirements from all over the world and engage with external auditors to achieve third-party validation of our services against these requirements.

In this post, I’ll cover some key strategies and best practices that we use to scale security and compliance while maintaining a culture of innovation.

Security as the foundation

At AWS, security is our top priority. Although compliance might be challenging, treating security as an integral part of everything we do at AWS makes it possible for us to adhere to a broad range of compliance programs, to document our compliance, and to successfully demonstrate our compliance status to our auditors and customers.

Over time, as the auditors get deeper into what we’re doing, we can also help improve and refine their approach. This increases the depth and quality of the reports that we provide directly to our customers.

The challenge of scaling securely

Many customers struggle with balancing security, compliance, and production. These customers have applications that they want to quickly make available to their own customer base. They might need to audit these applications. The traditional process can include writing the application, putting it into production, and then having the audit team take a look to make sure it meets compliance standards. This approach can cause issues, because retroactively adding compliance requirements can result in rework and churn for the development team.

Enforcing compliance requirements in this way doesn’t scale and eventually causes more complexity and friction between teams. So how do you scale quickly and securely?

Speak their language

The first way to earn trust with development teams is to speak their language. It’s critical to use terms and references that developers use, and to know what tools they are using to develop, deploy, and secure code. It’s not efficient or realistic to ask the engineering teams to do the translation of diverse (and often vague) compliance requirements into engineering specs. The compliance teams must do the hard work of translating what is required into what specifically must be done, using language that engineers are familiar with.

Another strategy to scale is to embed compliance requirements into the way developers do their daily work. It’s important that compliance teams enable developers to do their work just as they normally do, without compliance needing to intervene. If you’re successful at that strategy—and the compliant path becomes the simplest and most natural path—then that approach can lead to a very scalable compliance program that fosters understanding between teams and increased collaboration. This approach has helped break down the barriers between the developer and audit/compliance organizations.

Treat auditors and regulators as partners

I believe that you should treat auditors and regulators as true business partners. An independent auditor or regulator understands how a wide range of customers will use the security assurance artifacts that you are producing, and therefore will have valuable insights into how your reports can best be used. I think people can fall into the trap of treating regulators as adversaries. The best approach is to communicate openly with regulators, helping them understand your business and the value you bring to your customers, and getting them ramped up on your technology and processes.

At AWS, we help auditors and regulators get ramped up in various ways. For example, we have the Digital Audit Symposium, which contains presentations on how we control and operate particular services in terms of security and compliance. We also offer the Cloud Audit Academy, a learning path that provides both cloud-agnostic and AWS-specific training to help existing and prospective auditing, risk, and compliance professionals understand how to audit regulated cloud workloads. We’ve learned that being a partner with auditors and regulators is key in scaling compliance.

Conclusion

Having security as a foundation is essential to driving and scaling compliance efforts. Speaking the language of developers helps them continue to work without disruption, and makes the simple path the compliant path. Although some barriers still exist, especially for organizations in highly regulated industries such as financial services and healthcare, treating auditors like partners is a positive strategic shift in perspective. The more proactive you are in helping them accomplish what they need, the faster you will realize the value they bring to your business.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

AWS Security Profile – Cryptography Edition: Panos Kampanakis, Principal Security Engineer

2023-04-18 Roger Park

Post Syndicated from Roger Park original https://aws.amazon.com/blogs/security/aws-security-profile-panos-kampanakis/

In the AWS Security Profile — Cryptography Edition series, we interview Amazon Web Services (AWS) thought leaders who help keep our customers safe and secure. This interview features Panos Kampanakis, Principal Security Engineer, AWS Cryptography. Panos shares thoughts on data protection, cloud security, post-quantum cryptography, and more.

What do you do in your current role and how long have you been at AWS?

I have been with AWS for two years. I started as a Technical Program Manager in AWS Cryptography, where I led some AWS Cryptography projects related to cryptographic libraries and FIPS, but I’m currently working as a Principal Security Engineer on a team that focuses on applied cryptography, research, and cryptographic software. I also participate in standardization efforts in the security space, especially in cryptographic applications. It’s a very active space that can consume as much time as you have to offer.

How did you get started in the data protection/cryptography space? What about it piqued your interest?

I always found cybersecurity fascinating. The idea of proactively focusing on security and enabling engineers to protect their assets against malicious activity was exciting. After working in organizations that deal with network security, application security, vulnerability management, and security information sharing, I found myself going back to what I did in graduate school: applied cryptography.

Cryptography is a constantly evolving, fundamental area of security that requires breadth of technical knowledge and understanding of mathematics. It provides a challenging environment for those that like to constantly learn. Cryptography is so critical to the security and privacy of data and assets that it is top of mind for the private and public sector worldwide.

How do you explain your job to your non-tech friends?

I usually tell them that my work focuses on protecting digital assets, information, and the internet from malicious actors. With cybersecurity incidents constantly in the news, it’s an easy picture to paint. Some of my non-technical friends still joke that I work as a security guard!

What makes cryptography exciting to you?

Cryptography is fundamental to security. It’s critical for the protection of data and many other secure information use cases. It combines deep mathematical topics, data information, practical performance challenges that threaten deployments at scale, compliance with various requirements, and subtle potential security issues. It’s certainly a challenging space that keeps evolving. Post-quantum and privacy-preserving cryptography are examples of areas that have gained a lot of attention recently and have been consistently growing.

Given the consistent evolution of security in general, this is an important and impactful space where you can work on challenging topics. Additionally, working in cryptography, you are surrounded by intelligent people who you can learn from.

AWS has invested in the migration to post-quantum cryptography by contributing to post-quantum key agreement and post-quantum signature schemes to protect the confidentiality, integrity, and authenticity of customer data. What should customers do to prepare for post-quantum cryptography?

There are a few things that customers can do while waiting for the ratification of the new quantum-safe algorithms and their deployment. For example, you can inventory the use of asymmetric cryptography in your applications and software. Admittedly, this is not a simple task, but with proper subject matter expertise and instrumentation where necessary, you can identify where you’re using quantum-vulnerable algorithms in order to prioritize the uses. AWS is doing this exercise to have a prioritized plan for the upcoming migration.

You can also study and experiment with the potential impact of these new algorithms in critical use cases. There have been many studies on transport protocols like TLS, virtual private networks (VPNs), Secure Shell (SSH), and QUIC, but organizations might have unique uses that haven’t been accounted for yet. For example, a firm that specializes in document signing might require efficient signature methods with small size constraints, so deploying Dilithium, NIST’s preferred quantum-safe signature, could come at a cost. Evaluating its impact and performance implications would be important. If you write your own crypto software, you can also strive for algorithm agility, which would allow you to swap in new algorithms when they become available.
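The idea of algorithm agility can be illustrated with a small sketch: every protected value carries an identifier for the algorithm that produced it, and the code looks algorithms up in a registry instead of hard-coding one. The example below is an assumption-laden illustration, not a post-quantum implementation: it uses HMAC variants from the Python standard library as stand-ins, because the standard library has no quantum-safe signature schemes, and the registry and function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac

# Registry of supported algorithms. Adding a new algorithm later means
# adding one entry here; callers and stored data do not change shape.
_ALGORITHMS = {
    "hmac-sha256": hashlib.sha256,
    "hmac-sha512": hashlib.sha512,
}
DEFAULT_ALGORITHM = "hmac-sha256"


def protect(key: bytes, message: bytes, alg: str = DEFAULT_ALGORITHM) -> tuple[str, bytes]:
    """Return (algorithm id, tag) so the tag is always stored with its id."""
    digest = _ALGORITHMS[alg]
    return alg, hmac.new(key, message, digest).digest()


def verify(key: bytes, message: bytes, alg: str, tag: bytes) -> bool:
    """Verify a tag using whichever algorithm id was stored alongside it."""
    digest = _ALGORITHMS.get(alg)
    if digest is None:
        return False  # unknown or retired algorithm
    expected = hmac.new(key, message, digest).digest()
    return hmac.compare_digest(expected, tag)
```

With this shape, changing `DEFAULT_ALGORITHM` switches new data to a different algorithm while values tagged with the old identifier remain verifiable until they are migrated.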

More importantly, you should push your vendors, your hardware suppliers, the software and open-source community, and cloud providers to adjust and enable their solutions to become quantum-safe in the near future.

What’s been the most dramatic change you’ve seen in the data protection and post-quantum cryptography landscape?

The transition from typical cryptographic algorithms to ones that can operate on encrypted data is an important shift in the last decade. This is a field that’s still seeing great development. It’s interesting how the power of data has brought forward a whole new area of being able to operate on encrypted information so that we can benefit from the analytics. For more information on the work that AWS is doing in this space, see Cryptographic Computing.

In terms of post-quantum cryptography, it’s exciting to see how an important potential risk brought a community from academia, industry, and research together to collaborate and bring new schemes to life. It’s also interesting how existing cryptography has reached optimal efficiency levels that the new cryptographic primitives sometimes cannot meet, which pushes the industry to reconsider some of our uses. Sometimes the industry might overestimate the potential impact of quantum computing to technology, but I don’t believe we should disregard the effect of heavier algorithms on performance, our carbon footprint, energy consumption, and cost. We ought to aim for efficient solutions that don’t undermine security.

Where do you see post-quantum cryptography heading in the future?

Post-quantum cryptography has received a lot of attention, and a transition is about to start ramping up after we have ratified algorithms. Although it’s sometimes considered a Herculean effort, some use cases can transition smoothly.

AWS and other industry peers and researchers have already evaluated some post-quantum migration strategies. With proper prioritization and focus, we can address some of the most important applications and gradually transition the rest. There might be some applications that will have no clear path to a post-quantum future, but most will. At AWS, we are committed to making the transitions necessary to protect our customer data against future threats.

What are you currently working on that you look forward to sharing with customers?

I’m currently focused on bringing post-quantum algorithms to our customers’ cryptographic use cases. I’m looking into the challenges that this upcoming migration will bring and participating in standards and industry collaborations that will hopefully enable a simpler transition for everyone.

I also engage on various topics with our cryptographic libraries teams (for example, AWS-LC and s2n-tls). We build these libraries with security and performance in mind, and they are used in software across AWS.

Additionally, I work with some AWS service teams to help enable compliance with various cryptographic requirements and regulations.

Is there something you wish customers would ask you about more often?

I wish customers asked more often about provable security and how to integrate such solutions in their software. This is a fascinating field that can prevent serious issues where cryptography can go wrong. It’s a complicated topic. I would like for customers to become more aware of the importance of provable security especially in open-source software before adopting it in their solutions. Using provably secure software that is designed for performance and compliance with crypto requirements is beneficial to everyone.

I also wish customers asked more about why AWS made certain choices when deploying new mechanisms. In areas of active research, it’s often simpler to experimentally build a proof-of-concept of a new mechanism and test and prove its performance in a controlled benchmark scenario. On the other hand, it’s usually not trivial to deploy new solutions at scale (especially given the size and technological breadth of AWS), to help ensure backwards compatibility, commit to supporting these solutions in the long run, and make sure they’re suitable for various uses. I wish I had more opportunities to go over with customers the effort that goes into vetting and deploying new mechanisms at scale.

You have frequently contributed to cybersecurity publications. What is your favorite recent article and why?

I’m excited about a vision paper that I co-authored with Tancrède Lepoint called Do we need to change some things? Open questions posed by the upcoming post-quantum migration to existing standards and deployments. We are presenting this paper at the Security Standardisation Research Conference 2023. The paper discussed some open questions posed by the upcoming post-quantum transition. It also proposed some standards updates and research topics on cryptographic issues that we haven’t addressed yet.

How about outside of work—any hobbies?

I used to play basketball when I was younger, but I no longer have time. I spend most of my time with my family and little toddlers who have infinite amounts of energy. When I find an opportunity, I like reading books and short stories or watching quality films.

AWS Security Profile: Ryan Dsouza, Principal Solutions Architect

2023-04-14 Maddie Bacon

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profile-ryan-dsouza-principal-solutions-architect/

In the AWS Security Profile series, I interview some of the humans who work in Amazon Web Services Security and help keep our customers safe and secure. This interview is with Ryan Dsouza, Principal Solutions Architect for industrial internet of things (IIoT) security.

How long have you been at AWS and what do you do in your current role?

I’ve been with AWS for over five years and have held several positions working with customers, AWS Partner Network partners, and standards organizations on IoT and IIoT solutions. Currently, I’m a Principal Solutions Architect for IIoT security. In this role, I’m the global technical leader and subject matter expert for operational technology (OT) and IIoT security, which means that I lead our OT/IIoT strategy and roadmap, translate customer requirements into technical solutions, and work with industry standards such as ISA/IEC 62443 to support IIoT and cloud technologies. I also work with our strategic OT/IIoT security partners to design and build integrations and solutions on AWS. And I work with some of our strategic customers to help them plan, assess, and manage the risk that comes from OT/IT convergence and to design, build, and operate more secure, scalable, and innovative IIoT solutions by using AWS capabilities to deliver measurable business outcomes.

How did you get started in the world of OT and IIoT security?

I’ve been working with OT for more than 25 years and with IIoT for the last 10 years. I’ve led digital transformation initiatives for numerous world-class organizations including Accenture, Siemens, General Electric, IBM, and AECOM, serving customers on their digital transformation initiatives across a wide range of industry verticals such as manufacturing, buildings, utilities, smart cities, and more.

Throughout my career, I witnessed devices across critical infrastructure sectors, such as water, manufacturing, electricity, transportation, oil and gas, and buildings, getting digitized and connected to the internet. I quickly realized that this trend of connected assets and digitization would continue to grow and could outstrip the supply of cybersecurity professionals. Each customer that embraces the digital world faces cybersecurity challenges. At AWS, I work with customers to understand these challenges and provide prescriptive and practical guidance on how to secure their OT environments and IIoT solutions to help ensure safe and secure digital transformation.

What makes OT security different from information technology (IT) security?

OT and IT security are two distinct areas of security that are designed to protect different types of systems and assets. OT security is concerned with the protection of industrial control systems and other related operational technology, such as supervisory control and data acquisition (SCADA) systems, which are used to control and monitor physical processes in critical infrastructure industries such as manufacturing, energy, transportation, buildings, and utilities. The main focus of OT security is on the availability, integrity, safety, and reliability of these systems, as well as protection of the physical equipment that is being controlled. OT cybersecurity supports the safe operation of critical infrastructure. IT security, on the other hand, is concerned with the protection of computer systems, networks, and data from cyberthreats such as hacking, malware, and phishing attempts. The main focus of IT security is on the confidentiality, integrity, and availability of information and systems.

As a result of OT/IT convergence, IIoT, and the industrial digital transformation, our customers now must secure an increasing attack surface and overlapping IT and OT environments. They realize that it is business critical to secure OT/IIoT systems to avoid security events that could cause unplanned downtime and pose a safety risk. I refer to this as “securing cyber-physical systems and enabling safe and secure industrial digital transformation.”

How do you explain your job to your non-tech friends?

I explain that OT is used in buildings, manufacturing, utilities, transportation, and more, and when these systems connect to the internet, they’re exposed to risks. The risks are the same as those faced by IoT devices in our own homes and workplaces—but with greater consequences if compromised because these systems deal with critical infrastructure that our society relies on. I often share the Colonial Pipeline example and explain that I help AWS customers understand the risks and the consequences from a compromise, and design cybersecurity solutions to protect these critical infrastructure assets.

What are you currently working on that you’re excited about?

Our customers use lots of security tools from lots of different vendors. Security is a team sport, and I’m really excited to be working with customers, APN partners, and AWS service teams to build security features and product integrations that make it simpler for customers to monitor and secure OT, IIoT, and the cloud. For example, I’m working with our APN security partners to build integrations with AWS Security Hub and Amazon Security Lake, bring zero trust security solutions to OT environments, and improve security at the industrial edge.

Another project that I’m super excited about is bringing OT/IIoT security solutions to our critical infrastructure customers, including small and mid-sized organizations, by simplifying the deployment, management, procurement, and payment process so that customers can get more value from these AWS security solutions faster.

Another area of focus for me is tracking the fast-evolving critical infrastructure cybersecurity regulations, how they impact our customers, and the role that AWS can play to make it simpler for customers to align with these new security and compliance requirements.

Just like how the cloud transformed IT, I think the cloud will continue to revolutionize OT, and I’m super excited and energized to work with customers and APN partners to move OT and IIoT applications to the cloud and build nearly anything they can imagine faster and more cost-effectively on AWS.

What are the biggest challenges in securing critical infrastructure systems?

With critical infrastructure, the biggest challenge is legacy OT systems that may not have been designed with cybersecurity in mind and that use older operating systems and software, which can be difficult to upgrade and patch. These systems were designed to operate in an air-gapped environment, but there is a growing trend to connect them in new ways to IT systems. As IT and OT converge to support expanding business needs, air-gapped devices and perimeter security are no longer sufficient to address and defend against modern threats such as ransomware, data exfiltration, denial of service, and cryptocurrency mining. As OT and IT converge and OT becomes more cloud connected, the biggest challenge is to secure critical infrastructure that uses legacy and aging industrial control systems (ICS) and OT technology. We are seeing a trend to keep ICS/OT systems connected, but in smarter and more secure ways by using network segmentation, edge gateways, and the hybrid cloud so that if a problem occurs, you can still run the most important systems in an isolated and disconnected mode. For example, if your corporate systems are compromised with ransomware, you can disconnect your critical infrastructure systems from the external world and continue the most critical operations. There is a growing need to design innovative and highly distributed solution patterns to keep critical information and hybrid systems safe and secure. This is an area of focus for me at AWS.

What else can enterprises do to manage OT/IT convergence and protect themselves from these security risks?

I’ve done multiple presentations, blog posts, and whitepapers on this topic, and even if the solutions sound simple, they can be challenging to implement in industrial environments. I recommend reading the blog posts Managing Organization Transformation for Successful OT/IT Convergence and Assessing OT and IIoT cybersecurity risk, and implementing the Ten security golden rules for IIoT solutions. AWS offers lots of prescriptive guidance and solutions to help enterprises more safely and securely manage OT/IT convergence and mitigate risk with proper planning and implementation across the various aspects of business—people, processes, and technology. I encourage customers to start by focusing on the security fundamentals of securing identities, assessing their risk from OT/IT convergence, and improving their visibility into devices on the network and across the converged OT and IT environment. I also recommend using standards such as ISA/IEC 62443, which are comprehensive, consensus-based, and form a strong basis for securing critical infrastructure systems.

What skills do professionals need to be successful in critical infrastructure security?

Critical infrastructure security sounds harder than it really is. When I train people, I break it down into bite-sized pieces that are simple to understand and implement. There is some mystery around cybersecurity, but it’s just a lot of small parts. You must learn what all the parts are, what the acronyms are, and how they fit together to form cyber-physical systems. When I describe it in a real-world application, most people pick it up quickly.

Curiosity and a desire to continue learning are important characteristics to have, because cybersecurity is a fast-evolving technology field. Empathy is also important because to secure a system, you must have empathy for the people behind the work and why their goals and needs are important. For example, in the OT world, you have operations folks who just want the thing to work. If an alarm is going off on their computer screen and they must react by clicking a button, they don’t want their screen to lock them out so they can’t click that button, because this could cause the plant to have big problems. So, you need to design a solution that matches user access controls with roles and responsibilities so that a plant operator can take corrective actions in an emergency situation.

Another example is patching critical OT systems that have vulnerabilities. This may not be possible due to the risk of causing unplanned downtime, and it could pose a safety risk or result in additional time and cost for recertification due to compliance requirements. You must have empathy for the people in this situation and their needs, and then, as a security professional, design around that so they can still have those things but in a more secure way. For example, you might need to create mechanisms to identify, network isolate, or replace legacy devices that aren’t capable of receiving updates. If you are detail-oriented and have strong curiosity and empathy, you can succeed in the field of critical infrastructure cybersecurity.

What’s your favorite Amazon Leadership Principle, and why?

I have two favorite leadership principles: Learn and Be Curious, and one that I initially discounted, Frugality. I believe that the best way to predict the future is to invent it, which is why I’m never done learning and seeking new ways to solve problems.

My view on the Frugality leadership principle is that we need to be frugal with each other’s time. There are so many competing demands on everyone’s time, and it’s important in a place like AWS to be mindful of that. Make sure you’ve done your due diligence on something before you broadly ask the question or escalate. Being frugal in my view is about being self-sufficient, learning to use self-service tools, and working with limited time or resources to deliver results.

I wake up every morning with the conviction that the world is always changing, and that, to succeed, I have to change faster by learning new skills and being frugal with time and resources.

What’s the thing you’re most proud of in your career?

I’m really proud of working with critical infrastructure customers across a diverse range of industries over the last 25 years and supporting their digital transformation initiatives. In the early part of my career, I was a design and commissioning engineer of industrial automation systems. In this role, I had the opportunity to design and commission new industrial plants and get them into operation, which was extremely fulfilling. I feel fortunate to have joined a company like AWS that takes cybersecurity seriously in developing its products and cloud services, and I’m proud to bring real-world experience in the design and security of cyber-physical systems to our critical infrastructure customers.

If you had to pick an industry outside of engineering, what would you want to do?

Growing up in India in a family of engineers and doctors, I had only two options: engineer or doctor. Both professions have the ability to change the world. Because my mother and brother worked at Siemens, I pursued a career in engineering. If I had to pick an industry outside of engineering, it would be the medical field.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

AWS Security Profile: Matt Luttrell, Principal Solutions Architect for AWS Identity

2023-04-12 Maddie Bacon

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profile-matt-luttrell-principal-solutions-architect-for-aws-identity/

In the AWS Security Profile series, I interview some of the humans who work in Amazon Web Services Security and help keep our customers safe and secure. In this profile, I interviewed Matt Luttrell, Principal Solutions Architect for AWS Identity.

How long have you been at AWS and what do you do in your current role?

I’ve been at AWS around five years and have worked in a variety of roles from Professional Services consulting as an application architect to a solutions architect. In my current role, I work on the Identity Solutions team, which is a group of solutions architects who are embedded directly in the Identity and Control Services team. We have both internal-facing and external-facing functions. Internally, we work with product managers, drive concepts like data perimeters, and generally act as the voice of the customer to our product teams. Externally, we have conversations with customers, present at events, and so on.

How did you get started in security?

My background is in software development. I’ve always had a side interest in security and have always worked for very security-conscious companies. Early in my career, I became CISSP certified and that’s what got me kickstarted in security-specific domains and conversations. At AWS, being involved in security isn’t an optional thing. So, even before I joined the Identity Solutions team, I spent a lot of time working on identity and AWS Identity and Access Management (IAM) in particular, as well as AWS IAM Access Analyzer, while working with security-conscious customers in the financial services industry. As I got involved in that, I was able to dive deep in the security elements of AWS, but I’ve always had a background in security.

How do you explain your job to non-technical friends and family?

I typically tell them that I work in the cloud computing division at Amazon and that my job title is Solutions Architect. Naturally, the next question is, “What does a solutions architect do? I’ve never heard of that.” I explain that I work with customers to figure out how to put together the building blocks that we offer them. We offer a bunch of different services and features, and my job is to teach customers how they all work and interact with each other.

What are you currently working on that you’re excited about?

One of the things our team is working on is data perimeters. Our customers will see continued guidance on data perimeters. We’ve done a lot of work in this space—workshops and presentations at some of our big conferences, as well as blog posts and example repositories.

I’m also putting together some videos that go in depth on IAM policy evaluation and offer prescriptive guidance on writing IAM policies.

In your opinion, what’s one of the coolest things happening in identity right now?

I might be biased here, but I think there’s been a shift in the security industry at large from network-based perimeters in the traditional on-premises world to identity-based perimeters in the cloud. This is where the concept of data perimeters comes into play. Because your resources and identities are distributed, you can no longer look at your server and touch your server that’s sitting right next to you. This really puts an extra emphasis on your authentication and authorization controls, as well as the need for visibility into those controls. I think there’s a lot of innovation happening in the identity world because of this increased focus on identity perimeters. You’re hearing about concepts in this area like zero trust, data perimeters, and general identity awareness in all levels of the application and infrastructure stacks. You have services like IAM Access Analyzer to help give you that visibility into your AWS environment and what your identities are doing in terms of who can access what. I think we’ll continue to see growth in these areas because workloads are not becoming less distributed over time.

Tell me about something fun that you’ve done recently at AWS.

Roberto Migli and I presented a 400-level workshop at re:Invent 2022, AWS Identity and Access Management (IAM) policy evaluation in action. This workshop introduced a new mental model for thinking about policy evaluation and walked attendees through a number of different policy evaluation scenarios. The idea behind the workshop is that we introduce a scenario and have the attendee try to figure out what the result of the evaluation would be. It spends some extra time comparing how the evaluation of resource-based policies differs from that of identity-based policies. I hope attendees walked away with a better understanding of how policy evaluations work at a deeper level and how they can write better, more secure IAM policies. We presented practical advice on how to structure different types of IAM policies and the different tradeoffs when writing a policy one way compared to another. I hope the mental model we introduced helps customers better reason about how policies will evaluate when they write them in their environment.
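To make the comparison between identity-based and resource-based policies concrete, here is a minimal Python sketch of the decision order. It is not the workshop's material or mental model, and it is heavily simplified: the `Statement` type and `evaluate` function are hypothetical names, each resource-based statement is assumed to already name the calling principal, and service control policies, permissions boundaries, session policies, and condition keys are all left out.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List


@dataclass(frozen=True)
class Statement:
    """One simplified policy statement (hypothetical type, not an AWS SDK class)."""
    effect: str    # "Allow" or "Deny"
    action: str    # e.g. "s3:GetObject"; "*" wildcards are allowed
    resource: str  # e.g. "arn:aws:s3:::example-bucket/*"


def _matches(stmt: Statement, action: str, resource: str) -> bool:
    # Simplification: shell-style, case-sensitive wildcard matching.
    return fnmatchcase(action, stmt.action) and fnmatchcase(resource, stmt.resource)


def evaluate(action: str, resource: str,
             identity_policy: List[Statement],
             resource_policy: List[Statement],
             same_account: bool = True) -> str:
    """Return the decision for one request under the simplified model."""
    identity_hits = [s for s in identity_policy if _matches(s, action, resource)]
    resource_hits = [s for s in resource_policy if _matches(s, action, resource)]

    # 1. An explicit Deny in any applicable policy always wins.
    if any(s.effect == "Deny" for s in identity_hits + resource_hits):
        return "Deny (explicit)"

    identity_allows = any(s.effect == "Allow" for s in identity_hits)
    resource_allows = any(s.effect == "Allow" for s in resource_hits)

    # 2. Same account: an Allow in either policy type is enough.
    #    Cross account: both sides must allow the request.
    if same_account:
        allowed = identity_allows or resource_allows
    else:
        allowed = identity_allows and resource_allows

    # 3. Anything not explicitly allowed is implicitly denied.
    return "Allow" if allowed else "Deny (implicit)"
```

For example, a bucket policy that allows `s3:GetObject` is enough on its own for a same-account caller in this model, but the same request made with `same_account=False` is implicitly denied unless the caller's identity-based policy also allows it.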

What is your favorite Amazon Leadership Principle and why?

This is an easy one. For me, it’s definitely Learn and Be Curious. Something I try to do is put myself in uncomfortable situations because I feel that when I’m uncomfortable, I’m learning and growing because it means I don’t know something. I find comfortable situations boring at times, so I’m always trying to dig in and learn how things work. This can sometimes be distracting, too, because there’s so much to learn and understand in the identity world.

What’s the thing you’re most proud of in your career?

There’s no particular project that I can point to and say, “this is what I’m most proud of.” I’m proud to be a part of the team I’m on now. For my team, Customer Obsession is more than just a slogan. We really advocate on behalf of the customer, listen to the voice of the customer, and push back on features that might not be the best thing for the customer. I think it’s awesome that I get to work for a company that really does advocate on behalf of the customer, and that my voice is heard when I’m trying to be that advocate. That aspect of working at AWS and with my team is what I’m most proud of.

I’m also proud of the mentoring and teaching that I get to do within AWS and within my role specifically. It’s really fulfilling to watch somebody grow and realize that career growth is not a zero-sum game—just because someone else succeeds does not mean that I have to fail.

If you had to pick an industry outside of security, what would you want to do?

I’d probably choose to be a ski instructor. I’m a big fan of skiing, but I don’t get to ski very often because of where I live. I love being out on the mountains, skiing, and teaching. I’m looking for any excuse to spend my days in the mountains.

Let’s Architect! Monitoring production systems at scale

2023-04-12 Vittorio Denti

Post Syndicated from Vittorio Denti original https://aws.amazon.com/blogs/architecture/lets-architect-monitoring-production-systems-at-scale/

“Everything fails, all the time” is a famous quote from Amazon’s Chief Technology Officer Werner Vogels. This means that software and distributed systems may eventually fail because something can always go wrong. We have to accept this and design our systems accordingly, test our software and services, and think about all the possible edge cases.

With this in mind, we should also set our teams up for success by providing visibility in every environment for a quick turnaround when incidents happen. When a system serves traffic in production, we need to monitor it to make sure it behaves as expected and that all components are healthy. But questions arise such as:

How do we monitor a system?
What is monitoring?
What are some architectural and engineering approaches to implement in order to design a successful monitoring strategy?

All of these questions require complex answers. It’s not possible to cover everything in a blog post, but let’s start exploring the topic and sharing resources to guide you through this domain.

In this edition of Let’s Architect! we share some practices for monitoring used at Amazon and AWS, as well as more resources to discover how to build monitoring solutions for the workloads running on AWS.

Observability best practices at Amazon

Observability and monitoring are engineering tasks that also require putting a suitable cultural mindset in place. At Amazon, if a service doesn’t run as expected, the team writes a CoE (Correction of Errors) document to analyze the issue and answer critical questions to learn from it. There are also weekly operations meetings to analyze operational and performance dashboards for each service.

The session introduced here covers the full range of monitoring at Amazon, from how teams assess system health at a high level to how they understand the details of a single request. Use this resource to learn some best practices for metrics, logs, and tracing, and using these signals to achieve operational excellence.

Take me to this re:Invent video!

Observability is an iterative process which requires us to establish a feedback loop and improve based on the signals coming from the system.
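As a small illustration of how logs, metrics, and request-level detail can come from the same signal, here is a minimal Python sketch that emits one structured log line per unit of work. It is an assumption-laden example, not the approach described in the session: `instrumented_request`, the field names, and the `GetPet` operation are all hypothetical.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

logger = logging.getLogger("request_log")


@contextmanager
def instrumented_request(operation, log=logger):
    """Time one unit of work and emit a single structured log line for it."""
    record = {
        "request_id": str(uuid.uuid4()),  # lets you find one request later
        "operation": operation,
        "status": "ok",
    }
    start = time.perf_counter()
    try:
        yield record  # callers may attach extra fields to the record
    except Exception as exc:
        record["status"] = "error"
        record["error_type"] = type(exc).__name__
        raise  # never swallow the original failure
    finally:
        # Latency and status can later be aggregated into metrics.
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
        log.info(json.dumps(record))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    with instrumented_request("GetPet") as rec:
        rec["pet_id"] = "p-123"  # hypothetical application field
```

Because every line carries the operation, status, and latency, a log pipeline could derive error rates and latency percentiles from the same records that you search when investigating a single request.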

Build an observability solution using managed AWS services and the OpenTelemetry standard

Visibility into what’s happening in a distributed system is key to operationalizing workloads at scale. OpenTelemetry is an open standard for observability, and AWS services integrate with it. The blog post introduced in this section shows you how AWS Distro for OpenTelemetry (ADOT) works under the hood and how to use it with a Kubernetes cluster. Keep in mind that this is just one of the many implementations available for AWS compute services and OpenTelemetry, so even if you’re not using Kubernetes right now, we’ve still got you covered!

Want more? Watch this re:Invent video for an understanding of how to think about logging, tracing, metrics, and monitoring with AWS services, and the possibilities to provide the observability your distributed systems need. This is a great learning resource with many demos and examples.

Take me to this blog post!

Flow of metrics and traces from Application services to the Observability Platform.

Optimizing your AWS Batch architecture for scale with observability dashboards

We’ve explored the mental models and strategies for monitoring in previous resources. Now let’s see how these principles can be applied in a scenario where we run batch and ML computing jobs at scale. In the blog post introduced in this section, you can learn how to use runtime metrics to understand an architecture designed on AWS Batch for running batch computing jobs. AWS Batch is a fully managed service enabling you to run jobs at any scale without needing to manage underlying compute resources. This blog explains how AWS Batch works and guides you through the process used to design a monitoring framework.

Since the solution is open-source, you are free to add other custom metrics you find useful. To get started with the AWS Batch open-source observability solution, visit the project page on GitHub. Several customers have used this monitoring tool to optimize their workload for scale by reshaping their jobs, refining their instance selection, and tuning their AWS Batch architecture.

Take me to this blog!

High-level structure of AWS Batch resources and interactions. This diagram depicts a user submitting jobs based on a job definition template to a job queue, which then communicates to a compute environment that resources are needed.

Observability workshop

This resource gives you hands-on experience with the variety of toolsets AWS offers to set up monitoring and observability for your applications. Whether your workload runs on premises or on AWS, and whether your application is a giant monolith or built on a modern microservices-based architecture, the observability tools can provide deeper insights into application performance and health.

The monitoring tools covered in this workshop provide powerful capabilities that enable you to identify bottlenecks, issues, and defects without having to manually sift through various logs, metrics, and trace data.

Take me to this workshop!

The diagram illustrates the various components of the PetAdoptions architecture. In the workshop you will learn how to monitor this application.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about containers on AWS.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Let’s Architect! Streamlining business with migration and modernization

2023-03-29 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-streamlining-business-with-migration-and-modernization/

Many customers migrate their systems to Amazon Web Services (AWS) to increase their competitive edge and drive business value. To maximize the benefits of a cloud migration, companies tend to move their applications in conjunction with modernization initiatives. These joint efforts help your applications gain more agility, scalability, and resilience. Modernizing the portfolio of workloads with AWS means that you can re-platform, refactor, or replace these workloads by using containers, serverless technologies, purpose-built data stores, and software automation. These capabilities let you get the most from AWS agility and total cost of ownership (TCO) benefits.

In this edition of Let’s Architect! we share hands-on activities, customer stories, and tips and tricks to migrate and modernize your applications with AWS.

Migrating to the cloud: What is the cost of doing nothing?

Would you think that small companies always migrate faster than large enterprises? Actually, cloud migration speed doesn’t necessarily depend on the size of the business! Company size is not a clear indicator of migration and modernization success, but a shift of culture and mindset is essential for successful company evolution.

When it comes to migration, the cost of doing nothing is not just financial: Businesses can also expect a slower pace of innovation and a higher security burden. This video analyzes the financial benefits of migration and shares mental models for approaching an AWS cloud migration, and Marriott team members explain how they planned their migration and the lessons learned along the way.

Take me to this re:Invent 2022 video!

Benefits of an early migration start

Modernization pathways for a legacy .NET Framework monolithic application on AWS

Organizations aim to deliver the best technological solutions based on customer needs. At any stage in their cloud adoption journey, businesses often end up managing and building monolithic applications. Let’s explore a migration path for a monolithic .NET Framework application to a modern microservices-based stack on AWS, and discuss AWS tools to break the monolith into microservices and containerize applications.

Cost optimization is another key factor in modernizing your workloads. Options include moving to Linux-based systems or using open-source database engines. The video Migrate and Modernize enterprise workloads with AWS walks you through the process of migrating and modernizing enterprise workloads with AWS.

Take me to this blog post with more detail!

A modernized microservices-based rearchitecture

Implementing a serverless-first strategy in an enterprise

Organizations of all sizes want to benefit from the agility, cost savings, and developer experience that serverless architectures can provide on AWS. For large enterprises, the return on investment (ROI) can be massive, but overcoming architecture inertia while ensuring security best practices and governance stay in place is a hurdle that many struggle with. In this lightning talk, learn how your organization can implement a serverless-first strategy to overcome these obstacles. Delta Air Lines shares the story of making serverless-first a reality as part of their AWS journey.

Take me to this video!

Benefits of serverless

Application Migration with AWS

This workshop shows you how to migrate and modernize a fictional application to the AWS Cloud by:

Performing a database migration
Migrating and modernizing your web server using different migration strategies (for example, breaking down the monolith into containers)
Improving the Operational excellence, Security, Performance efficiency, and Cost optimization of the deployed architecture by following these pillars of the AWS Well-Architected Framework

Take me to this workshop!

Different migration strategies for web servers

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about distributed systems with containers.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

AWS Week in Review – March 20, 2023

2023-03-21 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-march-20-2023/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

A new week starts, and Spring is almost here! If you’re curious about AWS news from the previous seven days, I’ve got you covered.

Last Week’s Launches
Here are the launches that got my attention last week:

Amazon S3 – Last week there was AWS Pi Day 2023 celebrating 17 years of innovation since Amazon S3 was introduced on March 14, 2006. For the occasion, the team released many new capabilities:

S3 Object Lambda now provides aliases that are interchangeable with bucket names and can be used with Amazon CloudFront to tailor content for end users.
S3 now supports datasets that are replicated across multiple AWS accounts with cross-account support for S3 Multi-Region Access Points.
You can now create and configure replication rules to automatically replicate S3 objects from one AWS Outpost to another.
Amazon S3 has also simplified private connectivity from on-premises networks: with private DNS for S3, on-premises applications can use AWS PrivateLink to access S3 over an interface endpoint, while requests from your in-VPC applications access S3 using gateway endpoints.
We released Mountpoint for Amazon S3, a high performance open source file client. Read more in the blog. Note that Mountpoint isn’t a general-purpose networked file system, and comes with some restrictions on file operations.

Amazon Linux 2023 – Our new Linux-based operating system is now generally available. Sébastien’s post is full of tips and info.

Application Auto Scaling – You can now use arithmetic operations and mathematical functions to customize the metrics used with Target Tracking policies. You can use this to scale based on your own application-specific metrics. Read how it works with Amazon ECS services.

AWS Data Exchange for Amazon S3 is now generally available – You can now share and find data files directly from S3 buckets, without the need to create or manage copies of the data.

Amazon Neptune – Now offers a graph summary API to help understand important metadata about property graphs (PG) and resource description framework (RDF) graphs. Neptune added support for Slow Query Logs to help identify queries that need performance tuning.

Amazon OpenSearch Service – The team introduced security analytics that provides new threat monitoring, detection, and alerting features. The service now supports OpenSearch version 2.5 that adds several new features such as support for Point in Time Search and improvements to observability and geospatial functionality.

AWS Lake Formation and Apache Hive on Amazon EMR – Introduced fine-grained access controls that allow data administrators to define and enforce fine-grained table and column level security for customers accessing data via Apache Hive running on Amazon EMR.

Amazon EC2 M1 Mac Instances – You can now update guest environments to a specific or the latest macOS version without having to tear down and recreate the existing macOS environments.

AWS Chatbot – Now integrates with Microsoft Teams to simplify the way you troubleshoot and operate your AWS resources.

Amazon GuardDuty RDS Protection for Amazon Aurora – Now generally available to help profile and monitor access activity to Aurora databases in your AWS account without impacting database performance.

AWS Database Migration Service – Now supports validation to ensure that data is migrated accurately to S3 and can now generate an AWS Glue Data Catalog when migrating to S3.

AWS Backup – You can now back up and restore virtual machines running on VMware vSphere 8 and with multiple vNICs.

Amazon Kendra – There are new connectors to index documents and search for information across these content sources: Confluence Server, Confluence Cloud, Microsoft SharePoint OnPrem, and Microsoft SharePoint Cloud. This post shows how to use the Amazon Kendra connector for Microsoft Teams.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A few more blog posts you might have missed:

Women founders Q&A – We’re talking to six women founders and leaders about how they’re making impacts in their communities, industries, and beyond.

What you missed at the 2023 IMAGINE: Nonprofit conference – Where hundreds of nonprofit leaders, technologists, and innovators gathered to learn and share how AWS can drive a positive impact for people and the planet.

Monitoring load balancers using Amazon CloudWatch anomaly detection alarms – The metrics emitted by load balancers provide crucial and unique insight into service health, service performance, and end-to-end network performance.

Extend geospatial queries in Amazon Athena with user-defined functions (UDFs) and AWS Lambda – Using a solution based on Uber’s Hexagonal Hierarchical Spatial Index (H3) to divide the globe into equally-sized hexagons.

How cities can use transport data to reduce pollution and increase safety – A guest post by Rikesh Shah, outgoing head of open innovation at Transport for London.

For AWS open-source news and updates, here’s the latest newsletter curated by Ricardo to bring you the most recent updates on open-source projects, posts, events, and more.

Upcoming AWS Events
Here are some opportunities to meet:

AWS Public Sector Day 2023 (March 21, London, UK) – An event dedicated to helping public sector organizations use technology to achieve more with less through the current challenging conditions.

Women in Tech at Skills Center Arlington (March 23, VA, USA) – Let’s celebrate the history and legacy of women in tech.

The AWS Summits season is warming up! You can sign up here to know when registration opens in your area.

That’s all from me for this week. Come back next Monday for another Week in Review!

— Danilo

How to choose the right Amazon MSK cluster type for you

2023-03-13 Ali Alemi

Post Syndicated from Ali Alemi original https://aws.amazon.com/blogs/big-data/how-to-choose-the-right-amazon-msk-cluster-type-for-you/

Amazon Managed Streaming for Apache Kafka (Amazon MSK) is an AWS streaming data service that manages Apache Kafka infrastructure and operations, making it easy for developers and DevOps managers to run Apache Kafka applications and Kafka Connect connectors on AWS, without the need to become experts in operating Apache Kafka. Amazon MSK operates, maintains, and scales Apache Kafka clusters, provides enterprise-grade security features out of the box, and has built-in AWS integrations that accelerate development of streaming data applications. You can easily get started by creating an MSK cluster using the AWS Management Console with a few clicks.

When creating a cluster, you must choose a cluster type from two options: provisioned or serverless. Choosing the best cluster type for each workload depends on the type of workload and your DevOps preferences. Amazon MSK provisioned clusters offer more flexibility in how you scale, configure, and optimize your cluster. Amazon MSK Serverless, on the other hand, makes scaling, load management, and operation of the cluster easier for you. With MSK Serverless, you can run your applications without having to configure, manage the infrastructure, or optimize clusters, and you pay for the data volume you stream and retain. MSK Serverless fully manages partitions, including monitoring as well as ensuring an even balance of partition distribution across brokers in the cluster (auto-balancing).

In this post, I examine a use case with the fictitious company AnyCompany, which plans to use Amazon MSK for two applications. They must decide between the provisioned and serverless cluster types. I describe a process by which they work backward from the applications’ requirements to find the best MSK cluster type for their workloads, including how the organizational structure and application requirements are relevant in finding the best offering. Lastly, I examine the requirements and their relationship to Amazon MSK features.

Use case

AnyCompany is an enterprise organization that is ready to move two of their Kafka applications to Amazon MSK.

The first is a large ecommerce platform, which is a legacy application that currently uses a self-managed Apache Kafka cluster run in their data centers. AnyCompany wants to migrate this application to the AWS Cloud and use Amazon MSK to reduce maintenance and operations overhead. AnyCompany has a DataOps team that has been operating self-managed Kafka clusters in their data centers for years. AnyCompany wants to continue using the DataOps team to manage the MSK cluster on behalf of the development team. There is very little flexibility for code changes. For example, a few modules of the application require plaintext communication and access to the Apache ZooKeeper cluster that comes with an MSK cluster. The ingress throughput for this application doesn’t fluctuate often. The ecommerce platform only experiences a surge in user activity during special sales events. The DataOps team has a good understanding of this application’s traffic pattern and is confident that it can optimize an MSK cluster by setting some custom broker-level configurations.

The second application is a new cloud-native gaming application currently in development. AnyCompany hopes to launch this gaming application soon followed by a marketing campaign. Throughput needs for this application are unknown. The application is expected to receive high traffic initially, then user activity should decline gradually. Because the application is going to launch first in the US, traffic during the day is expected to be higher than at night. This application offers a lot of flexibility in terms of Kafka client version, encryption in transit, and authentication. Because this is a cloud-native application, AnyCompany hopes they can delegate full ownership of its infrastructure to the development team.

Solution overview

Let’s examine a process that helps AnyCompany decide between the two Amazon MSK offerings. The following diagram shows this process at a high level.

In the following sections, I explain each step in detail and the relevant information that AnyCompany needs to collect before they make a decision.

Competency in Apache Kafka

AWS recommends a list of best practices to follow when using the Amazon MSK provisioned offering. Amazon MSK provisioned offers more flexibility, so you can make scaling decisions based on what’s best for your workloads. For example, you can save on cost by consolidating a group of workloads into a single cluster. You can decide which metrics are important to monitor and optimize your cluster by applying custom configurations to your brokers. You can choose your Apache Kafka version from the supported versions and decide when to upgrade to a new version. Amazon MSK takes care of applying your configuration and upgrading each broker in a rolling fashion.

With more flexibility, you have more responsibilities. You need to make sure your cluster is right-sized at any time. You can achieve this by monitoring a set of cluster-level, broker-level, and topic-level metrics to ensure you have enough resources for your throughput. You also need to make sure the number of partitions assigned to each broker doesn’t exceed the numbers suggested by Amazon MSK. If partitions are unbalanced, you need to redistribute them evenly across all brokers. If you have more partitions than recommended, you need to either upgrade brokers to a larger size or increase the number of brokers in your cluster. There are also best practices for the number of TCP connections when using AWS Identity and Access Management (IAM) authentication.

An MSK Serverless cluster takes away the complexity of right-sizing clusters and balancing partitions across brokers. This makes it easy for developers to focus on writing application code.

AnyCompany has an experienced DataOps team who are familiar with scaling operations and best practices for the MSK provisioned cluster type. AnyCompany can use their DataOps team’s Kafka expertise for building automations and easy-to-follow standard procedures on behalf of the ecommerce application team. The gaming development team is an exception, because they are expected to take the full ownership of the infrastructure.

In the following sections, I discuss other steps in the process before deciding which cluster type is right for each application.

Custom configuration

In certain use cases, you need to configure your MSK cluster differently from its default settings. This could be due to your application requirements. For example, AnyCompany’s ecommerce platform requires setting up brokers such that the default retention period for all topics is set to 72 hours. Also, topics should be auto-created when they are requested and don’t exist.

The Amazon MSK provisioned offering provides a default configuration for brokers, topics, and Apache ZooKeeper nodes. It also allows you to create custom configurations and use them to create new MSK clusters or update existing clusters. An MSK cluster configuration consists of a set of properties and their corresponding values.
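As an illustration, the two ecommerce requirements above could be expressed as a set of broker properties. This is a minimal sketch, not the author's configuration: it assumes the standard Apache Kafka broker properties log.retention.hours and auto.create.topics.enable are the ones used to meet those requirements, and render_server_properties is a hypothetical helper that only formats the key=value text.

```python
# Hypothetical helper: render broker properties as key=value text,
# the general shape of a Kafka server.properties-style configuration.
def render_server_properties(props):
    lines = []
    for key in sorted(props):
        value = props[key]
        # Kafka properties use lowercase true/false for booleans.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Assumed mapping of the ecommerce requirements to broker properties.
ecommerce_broker_config = {
    "log.retention.hours": 72,          # default retention for all topics
    "auto.create.topics.enable": True,  # create topics on first request
}

server_properties = render_server_properties(ecommerce_broker_config)
print(server_properties)
```

The rendered text is what you would supply as the body of a custom configuration; check the list of properties that Amazon MSK allows before relying on any particular key.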

MSK Serverless doesn’t allow applying broker-level configuration. This is because AWS takes care of configuring and managing the backend nodes. It takes away the heavy lifting of configuring the broker nodes. You only need to manage your applications’ topics. To learn more, refer to the list of topic-level configurations that MSK Serverless allows you to change.

Unlike the ecommerce platform, AnyCompany’s gaming application doesn’t need broker-level custom configuration. The developers only want to set retention.ms and max.message.bytes for each topic.
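A short sketch of what those two topic-level settings might look like for the gaming application. The values are assumptions taken from the requirements table later in this post (168 hours of retention and a 1 MB maximum message size, interpreted here as 1,048,576 bytes); hours_to_ms is a hypothetical helper.

```python
# Hypothetical helper: Kafka's retention.ms is expressed in milliseconds.
def hours_to_ms(hours):
    return hours * 60 * 60 * 1000


# Assumed per-topic settings for the gaming application.
# Kafka topic configs are passed as strings.
gaming_topic_config = {
    "retention.ms": str(hours_to_ms(168)),      # 168 hours
    "max.message.bytes": str(1 * 1024 * 1024),  # 1 MB, binary interpretation
}

print(gaming_topic_config)
```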

Application requirements

Apache Kafka applications differ in terms of their security; the way they connect, write, or read data; data retention period; and scaling patterns. For example, some applications can only scale vertically, whereas other applications can scale only horizontally. Although a flexible application can work with encryption in transit, a legacy application may only be able to communicate in plaintext format.

Cluster-level quotas

Amazon MSK enforces some quotas to ensure the performance, reliability, and availability of the service for all customers. These quotas are subject to change at any time. To access the latest values for each dimension, refer to Amazon MSK quota. Note that some of the quotas are soft limits and can be increased using a support ticket.

When choosing a cluster type in Amazon MSK, it’s important to understand your application requirements and compare them against the quotas for each offering. This makes sure you choose the cluster type that best meets your goals and your application’s needs. Let’s examine how you can calculate the throughput you need and other important dimensions you need to compare with Amazon MSK quotas:

Number of clusters per account – Amazon MSK may have quotas for how many clusters you can create in a single AWS account. If this is limiting your ability to create more clusters, you can consider creating those in multiple AWS accounts and using secure connectivity patterns to provide access to your applications.
Message size – You need to make sure the maximum message size that your producer writes for a single message is lower than the configured size in the MSK cluster. MSK provisioned clusters allow you to change the default value in a custom configuration. If you choose MSK Serverless, check this value in Amazon MSK quota. The average message size is helpful when calculating the total ingress or egress throughput of the cluster, which I demonstrate later in this post.
Message rate per second – This directly influences total ingress and egress throughput of the cluster. Total ingress throughput equals the message rate per second multiplied by message size. You need to make sure your producer is configured for optimal throughput by adjusting batch.size and linger.ms properties. If you’re choosing MSK Serverless, you need to make sure your producer sends optimally sized batches at a rate that is lower than the request rate quota.
Number of consumer groups – This directly influences the total egress throughput of the cluster. Total egress throughput equals the ingress throughput multiplied by the number of consumer groups. If you’re choosing MSK Serverless, you need to make sure your application can work with these quotas.
Maximum number of partitions – Amazon MSK provisioned recommends not exceeding certain limits per broker (depending on the broker size). If the number of partitions per broker exceeds the recommended maximum, you can’t perform certain upgrade or update operations. MSK Serverless also has a quota for the maximum number of partitions per cluster. You can request a quota increase by creating a support case.
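The comparison described in the list above can be sketched as a simple check of each requirement against its quota. This is only an illustration: find_quota_violations is a hypothetical helper, and every number below is a placeholder rather than a real Amazon MSK quota. Look up the current values in the Amazon MSK quota documentation before using anything like this.

```python
# Hypothetical helper: report every dimension whose requirement
# exceeds the corresponding quota.
def find_quota_violations(requirements, quotas):
    violations = {}
    for dimension, required in requirements.items():
        limit = quotas.get(dimension)
        # Dimensions without a known quota are skipped, not treated as OK.
        if limit is not None and required > limit:
            violations[dimension] = (required, limit)
    return violations


# Placeholder quotas, NOT real Amazon MSK values.
placeholder_quotas = {
    "ingress_mbps": 100,
    "egress_mbps": 200,
    "partitions": 1000,
    "max_message_mb": 8,
}

# Example requirements for a small workload (illustrative only).
small_workload = {
    "ingress_mbps": 15,
    "egress_mbps": 15,
    "partitions": 50,
    "max_message_mb": 1,
}

print(find_quota_violations(small_workload, placeholder_quotas))
```

An empty result means no listed dimension exceeds its quota. A non-empty result names each dimension with its required value and limit, which tells you whether to request a quota increase, split the workload, or pick a different cluster type.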

Partition-level quotas

Apache Kafka organizes data in structures called topics. Each topic consists of a single or many partitions. Partitions are the degree of parallelism in Apache Kafka. The data is distributed across brokers using data partitioning. Let’s examine a few important Amazon MSK requirements, and how you can make sure which cluster type works better for your application:

Maximum throughput per partition – MSK Serverless automatically balances the partitions of your topic between the backend nodes. It instantly scales when your ingress throughput increases. However, each partition has a quota of how much data it accepts. This is to ensure the data is distributed evenly across all partitions and backend nodes. In an MSK Serverless cluster, you need to create your topic with enough partitions such that the aggregated throughput is equal to the maximum throughput your application requires. You also need to make sure your consumers read data with a rate that is below the maximum egress throughput per partition quota. If you’re using Amazon MSK provisioned, there is no partition-level quota for write and read operations. However, AWS recommends you monitor and detect hot partitions and control how partitions should balance among the broker nodes.
Data storage – The amount of time each message is kept in a particular topic directly influences the total amount of storage needed for your cluster. Amazon MSK allows you to manage the retention period at the topic level. MSK provisioned clusters allow broker-level configuration to set the default data retention period. MSK Serverless clusters allow unlimited data retention, but there is a separate quota for the maximum data that can be stored in each partition.
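To make the per-partition throughput point concrete, here is a minimal sketch of how you might estimate the smallest partition count that keeps each partition under its write and read quotas. The function name and the per-partition limits in the example are assumptions for illustration, so substitute the current MSK Serverless values.

```python
import math


# Hypothetical helper: smallest partition count such that neither the
# per-partition ingress quota nor the per-partition egress quota is exceeded.
def min_partitions(required_ingress, required_egress,
                   ingress_quota_per_partition, egress_quota_per_partition):
    by_ingress = math.ceil(required_ingress / ingress_quota_per_partition)
    by_egress = math.ceil(required_egress / egress_quota_per_partition)
    # A topic always has at least one partition.
    return max(by_ingress, by_egress, 1)


# Example: 15 MBps in and out, with placeholder quotas of 5 MBps (write)
# and 10 MBps (read) per partition.
print(min_partitions(15, 15, 5, 10))
```

This assumes traffic is spread evenly across partitions. A skewed partition key can still push one partition over its quota even when the total count looks sufficient.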

Security

Amazon MSK recommends that you secure your data in the following ways. Availability of the security features varies depending on the cluster type. Before making a decision about your cluster type, check if your preferred security options are supported by your choice of cluster type.

Encryption at rest – Amazon MSK integrates with AWS Key Management Service (AWS KMS) to offer transparent server-side encryption. Amazon MSK always encrypts your data at rest. When you create an MSK cluster, you can specify the KMS key that you want Amazon MSK to use to encrypt your data at rest.
Encryption in transit – Amazon MSK uses TLS 1.2. By default, it encrypts data in transit between the brokers of your MSK cluster. You can override this default when you create the cluster. For communication between clients and brokers, you must specify one of the following settings:
- Only allow TLS encrypted data. This is the default setting.
- Allow both plaintext and TLS encrypted data.
- Only allow plaintext data.
Authentication and authorization – Use IAM to authenticate clients and allow or deny Apache Kafka actions. Alternatively, you can use TLS or SASL/SCRAM to authenticate clients, and Apache Kafka ACLs to allow or deny actions.
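The three client-to-broker options above correspond, as far as I understand the Amazon MSK API, to the ClientBroker values TLS, TLS_PLAINTEXT, and PLAINTEXT in the cluster's encryption settings. The sketch below only builds that settings dictionary and makes no AWS calls. build_encryption_info and the mode labels are hypothetical names, and you should confirm the exact request shape in the API reference.

```python
# Assumed mapping from the options in this post to MSK API enum values.
CLIENT_BROKER_MODES = {
    "tls_only": "TLS",                     # default
    "tls_and_plaintext": "TLS_PLAINTEXT",
    "plaintext_only": "PLAINTEXT",
}


# Hypothetical helper: build the encryption settings for a provisioned cluster.
def build_encryption_info(client_broker="tls_only", in_cluster=True,
                          kms_key_arn=None):
    info = {
        "EncryptionInTransit": {
            "ClientBroker": CLIENT_BROKER_MODES[client_broker],
            "InCluster": in_cluster,  # broker-to-broker encryption
        }
    }
    # If no key is given, the section is omitted and the service default applies.
    if kms_key_arn is not None:
        info["EncryptionAtRest"] = {"DataVolumeKMSKeyId": kms_key_arn}
    return info


# A legacy application that still needs plaintext alongside TLS.
print(build_encryption_info("tls_and_plaintext"))
```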

Cost of ownership

Amazon MSK helps you avoid spending countless hours and significant resources just managing your Apache Kafka cluster, adding little or no value to your business. With a few clicks on the Amazon MSK console, you can create highly available Apache Kafka clusters with settings and configuration based on Apache Kafka’s deployment best practices. Amazon MSK automatically provisions and runs Apache Kafka clusters. Amazon MSK continuously monitors cluster health and automatically replaces unhealthy nodes with no application downtime. In addition, Amazon MSK secures Apache Kafka clusters by encrypting data at rest and in transit. These capabilities can significantly reduce your Total Cost of Ownership (TCO).

With MSK provisioned clusters, you can specify and then scale cluster capacity to meet your needs. With MSK Serverless clusters, you don’t need to specify or scale cluster capacity. MSK Serverless automatically scales the cluster capacity based on the throughput, and you only pay per GB of data that your producers write to and your consumers read from the topics in your cluster. Additionally, you pay an hourly rate for your serverless clusters and an hourly rate for each partition that you create. The MSK Serverless cluster type generally offers a lower cost of ownership by taking away the cost of engineering resources needed for monitoring, capacity planning, and scaling MSK clusters. However, if your organization has a DataOps team with Kafka competency, you can use this competency to operate optimized MSK provisioned clusters. This allows you to save on Amazon MSK costs by consolidating several Kafka applications into a single cluster. There are a few critical considerations to decide when and how to split your workloads between multiple MSK clusters.
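The MSK Serverless pricing dimensions named above (cluster hours, partition hours, and GB written and read) can be combined into a rough estimate. This is a sketch under stated assumptions: the function is hypothetical, the rates are made-up placeholders rather than AWS prices, and any charges not mentioned in the paragraph above are left out.

```python
# Hypothetical estimator covering only the dimensions described in this post.
def serverless_cost(hours, partitions, gb_written, gb_read, rates):
    return (
        hours * rates["cluster_hour"]
        + hours * partitions * rates["partition_hour"]
        + gb_written * rates["gb_written"]
        + gb_read * rates["gb_read"]
    )


# Placeholder rates, NOT real prices. Use the Amazon MSK pricing page.
placeholder_rates = {
    "cluster_hour": 1.0,
    "partition_hour": 0.01,
    "gb_written": 0.1,
    "gb_read": 0.05,
}

# Example: one 720-hour month, 50 partitions, 1,000 GB written and read.
estimate = serverless_cost(720, 50, 1000, 1000, placeholder_rates)
print(round(estimate, 2))
```

Running the same inputs through a provisioned estimate (broker hours plus storage) gives the comparison point for the consolidation argument above.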

Apache ZooKeeper

Apache ZooKeeper is a service included in Amazon MSK when you create a cluster. It manages the Apache Kafka metadata and acts as a quorum controller for leader elections. Although interacting with ZooKeeper is not a recommended pattern, some Kafka applications depend on connecting directly to ZooKeeper. During the migration to Amazon MSK, you may find a few of these applications in your organization. This could be because they use an older version of the Kafka client library, or for other reasons. For example, applications that help with Apache Kafka admin operations or visibility, such as Cruise Control, usually need this kind of access.

Before you choose your cluster type, you first need to check which offering provides direct access to the ZooKeeper cluster. As of writing this post, only Amazon MSK provisioned provides direct access to ZooKeeper.

How AnyCompany chooses their cluster types

AnyCompany first needs to collect some important requirements about each of their applications. The following table shows these requirements. The rows marked with an asterisk (*) are calculated based on the values in previous rows.

Dimension	Ecommerce Platform	Gaming Application
Message rate per second	150,000	1,000
Maximum message size	15 MB	1 MB
Average message size	30 KB	15 KB
* Ingress throughput (average message size * message rate per second)	4.5 GBps	15 MBps
Number of consumer groups	2	1
* Egress throughput (ingress throughput * number of consumer groups)	9 GBps	15 MBps
Number of topics	100	10
Average partitions per topic	100	5
* Total number of partitions (number of topics * average partitions per topic)	10,000	50
* Ingress per partition (ingress throughput / total number of partitions)	450 KBps	300 KBps
* Egress per partition (egress throughput / total number of partitions)	900 KBps	300 KBps
Data retention	72 hours	168 hours
* Total storage needed (ingress throughput * retention period in seconds)	1,139.06 TB	9.07 TB
Authentication	Plaintext and SASL/SCRAM	IAM
Need ZooKeeper access	Yes	No
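The calculated rows follow directly from the formulas in the row labels, so they are easy to recompute when an input changes. The sketch below does that for both applications. The class and field names are my own, throughput is tracked in KB per second, and storage is reported in decimal terabytes (1 TB = 1,000,000,000 KB), so storage figures can differ slightly from values derived with 1,024-based conversions. As in the table formula, replication is not accounted for.

```python
from dataclasses import dataclass


@dataclass
class AppRequirements:
    """Input rows from the table; derived rows are exposed as properties."""
    messages_per_second: int
    avg_message_kb: int
    consumer_groups: int
    topics: int
    avg_partitions_per_topic: int
    retention_hours: int

    @property
    def ingress_kbps(self):
        return self.messages_per_second * self.avg_message_kb

    @property
    def egress_kbps(self):
        return self.ingress_kbps * self.consumer_groups

    @property
    def total_partitions(self):
        return self.topics * self.avg_partitions_per_topic

    @property
    def ingress_per_partition_kbps(self):
        return self.ingress_kbps / self.total_partitions

    @property
    def egress_per_partition_kbps(self):
        return self.egress_kbps / self.total_partitions

    @property
    def storage_tb(self):
        # ingress throughput * retention period in seconds, in decimal TB
        retention_seconds = self.retention_hours * 3600
        return self.ingress_kbps * retention_seconds / 1_000_000_000


ecommerce = AppRequirements(150_000, 30, 2, 100, 100, 72)
gaming = AppRequirements(1_000, 15, 1, 10, 5, 168)

for name, app in (("ecommerce", ecommerce), ("gaming", gaming)):
    print(name, app.ingress_kbps, app.egress_kbps, app.total_partitions,
          app.ingress_per_partition_kbps, app.egress_per_partition_kbps,
          round(app.storage_tb, 2))
```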

For the gaming application, AnyCompany doesn’t want to use their in-house Kafka competency to support an MSK provisioned cluster. Also, the gaming application doesn’t need custom configuration, and its throughput needs are below the quotas set by the MSK Serverless cluster type. In this scenario, an MSK Serverless cluster makes more sense.

For the ecommerce platform, AnyCompany wants to use their Kafka competency. Moreover, their throughput needs exceed the MSK Serverless quotas, and the application requires some broker-level custom configuration. The ecommerce platform also can’t be split between multiple clusters. For these reasons, AnyCompany chooses the MSK provisioned cluster type in this scenario. Additionally, AnyCompany can save more on cost with the Amazon MSK provisioned pricing model. Their throughput is consistent most of the time, and AnyCompany wants to use their DataOps team to optimize a provisioned MSK cluster and make scaling decisions based on their own expertise.
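The reasoning in the two paragraphs above can be condensed into a small decision helper. This is a simplified reading of the post, not a reproduction of the process diagram: the function name, parameters, and wording of the reasons are my own, and real decisions will weigh cost and team preferences that a few booleans cannot capture.

```python
# Hypothetical decision helper based on the criteria discussed in this post.
def choose_cluster_type(needs_broker_config, exceeds_serverless_quotas,
                        needs_zookeeper_access, has_kafka_competency):
    reasons = []
    if needs_broker_config:
        reasons.append("requires broker-level custom configuration")
    if exceeds_serverless_quotas:
        reasons.append("throughput exceeds MSK Serverless quotas")
    if needs_zookeeper_access:
        reasons.append("requires direct ZooKeeper access")

    if reasons:
        if not has_kafka_competency:
            # Provisioned is still required, but the team needs to build skills.
            reasons.append("warning: provisioned clusters need Kafka competency")
        return "provisioned", reasons
    return "serverless", ["no requirement forces a provisioned cluster"]


print(choose_cluster_type(True, True, True, True))      # ecommerce platform
print(choose_cluster_type(False, False, False, False))  # gaming application
```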

Conclusion

Choosing the best cluster type for your applications may seem complicated at first. In this post, I showed a process that helps you work backward from your application’s requirements and the resources available to you. MSK provisioned clusters offer more flexibility in how you scale, configure, and optimize your cluster. MSK Serverless, on the other hand, is a cluster type that makes it easier for you to run Apache Kafka clusters without having to manage compute and storage capacity. I generally recommend you begin with MSK Serverless if your application doesn’t require broker-level custom configurations, and your application throughput needs don’t exceed the quotas for the MSK Serverless cluster type. Sometimes it’s best to split your workloads between multiple MSK Serverless clusters, but if that isn’t possible, you may need to consider an MSK provisioned cluster. To operate an optimized MSK provisioned cluster, you need to have Kafka competency within your organization.

For further reading on Amazon MSK, visit the official product page.

About the author

Ali Alemi is a Streaming Specialist Solutions Architect at AWS. Ali advises AWS customers with architectural best practices and helps them design real-time analytics data systems that are reliable, secure, efficient, and cost-effective. He works backward from customers’ use cases and designs data solutions to solve their business problems. Prior to joining AWS, Ali supported several public sector customers and AWS consulting partners in their application modernization journey and migration to the cloud.

Let’s Architect! Architecting a data mesh

2023-03-08 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-a-data-mesh/

Data architectures were mainly designed around technologies rather than business domains in the past. This changed in 2019, when Zhamak Dehghani introduced the data mesh. Data mesh is an application of the Domain-Driven-Design (DDD) principles to data architectures: Data is organized into data domains and the data is the product that the team owns and offers for consumption.

A data mesh architecture unites the disparate data sources within an organization through centrally managed data-sharing and governance guidelines. Business functions can maintain control over how shared data is accessed because data mesh also solves advanced data security challenges through distributed, decentralized ownership.

This edition of Let’s Architect! introduces data mesh, highlights the foundational concepts of data architectures, and covers the patterns for designing a data mesh in the AWS cloud with supporting resources.

Data lakes, lake houses and data mesh: what, why, and how?

Let’s explore a video introduction to data lakes, lake houses, and data mesh. This resource explains how to leverage those concepts to gain greater data insights across different business segments, with a special focus on best practices to build a well-architected, modern data architecture on AWS. It also gives an overview of the AWS cloud services that can be used to create such architectures and describes the fundamental pillars of designing them.

Take me to this intro to data lakes, lake houses, and data mesh video!

Data mesh is an architecture pattern where data are organized into domains and seen as products to expose for consumption

Building data mesh architectures on AWS

Knowing what a data mesh architecture is, here is a step-by-step video from re:Invent 2022 on designing one. It covers a use case on how GoDaddy considered and implemented data mesh, in addition to:

The fundamental pillars behind a well-architected data mesh in the cloud
Finding an approach to build a data mesh architecture using native AWS services
Reasons for considering a data mesh architecture in scenarios where data lakes have limitations
How data mesh can be applied in practice to overcome them
The mental models to apply during the data mesh design process

Take me to this re:Invent 2022 video!

In the data mesh architecture the producers expose their data for consumption to the consumers. Access is regulated through a centralized governance layer.

Amazon DataZone: Democratize data with governance

Now let’s explore data accessibility as it relates to data mesh architectures.

Amazon DataZone is a new AWS business data catalog allowing you to unlock data across organizational boundaries with built-in governance. This service provides a unified environment where everyone in an organization—from data producers to data consumers—can access, share, and consume data in a governed manner.

Here is a video to learn how to apply AWS analytics services to discover, access, and share data across organizational boundaries within the context of a data mesh architecture.

Take me to this re:Invent 2022 video!

Amazon DataZone accelerates the adoption of the data mesh pattern by making it scalable to a high number of producers and consumers.

Build a data mesh on AWS

Feeling inspired to build? Hands-on experience is a great way to learn and see how the theoretical concepts apply in practice.

This workshop teaches you a data mesh architecture building approach on AWS. Many organizations are interested in implementing this architecture to:

Move away from centralized data lakes to decentralized ownership
Deliver analytics solutions across business units

Learn how a data mesh architecture can be implemented with AWS native services.

Take me to this workshop!

The diagram shows how to separate the producers, consumers, and governance components through a multi-account strategy.

See you next time!

Thanks for exploring architecture tools and resources with us!

Next time we’ll talk about monitoring and observability.

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Patterns for enterprise data sharing at scale

2023-02-27 Venkata Sistla

Post Syndicated from Venkata Sistla original https://aws.amazon.com/blogs/big-data/patterns-for-enterprise-data-sharing-at-scale/

Data sharing is becoming an important element of an enterprise data strategy. AWS services like AWS Data Exchange provide an avenue for companies to share or monetize their value-added data with other companies. Some organizations would like to have a data sharing platform where they can establish a collaborative and strategic approach to exchange data with a restricted group of companies in a closed, secure, and exclusive environment. For example, financial services companies and their auditors, or manufacturing companies and their supply chain partners. This fosters development of new products and services and helps improve their operational efficiency.

Data sharing is a team effort. In addition to establishing the right infrastructure, successful data sharing requires organizations to ensure that business owners sponsor data sharing initiatives and that data is of high quality. Data platform owners and security teams should encourage proper data use and fix any privacy and confidentiality issues.

This blog discusses various data sharing options and common architecture patterns that organizations can adopt to set up their data sharing infrastructure based on AWS service availability and data compliance.

Data sharing options and data classification types

Organizations operate across a spectrum of security compliance constraints. For some organizations, it’s possible to use AWS services like AWS Data Exchange. However, organizations working in heavily regulated industries like federal agencies or financial services might be limited by the allow-listed AWS service options. For example, if an organization is required to operate in a FedRAMP Moderate or FedRAMP High environment, their options to share data may be limited by the AWS services that are available and have been allow-listed. Service availability is based on platform certification by AWS, and allow-listing is based on how organizations define their security compliance architecture and guidelines.

The kind of data that the organization wants to share with its partners may also have an impact on the method used for data sharing. Complying with data classification rules may further limit the available data sharing options.

The following are some general data classification types:

Public data – Important information, though often freely available for people to read, research, review and store. It typically has the lowest level of data classification and security.
Private data – Information you might want to keep private like email inboxes, cell phone content, employee identification numbers, or employee addresses. If private data were shared, destroyed, or altered, it might pose a slight risk to an individual or the organization.
Confidential or restricted data – Sensitive information that only a limited group of individuals or parties can access, often requiring special clearance or special authorization. Confidential or restricted data access might involve aspects of identity and authorization management. Examples of confidential data include Social Security numbers and vehicle identification numbers.

The following is a sample decision tree that you can refer to when choosing your data sharing option based on service availability, classification type, and data format (structured or unstructured). Other factors like usability, multi-partner accessibility, data size, consumption patterns like bulk load/API access, and more may also affect the choice of data sharing pattern.

In the following sections, we discuss each pattern in more detail.

Pattern 1: Using AWS Data Exchange

AWS Data Exchange makes exchanging data easier, helping organizations lower costs, become more agile, and innovate faster. Organizations can choose to share data privately with their external partners using AWS Data Exchange. AWS Data Exchange offers perimeter controls that are applied at identity and resource levels. These controls decide which external identities have access to specific data resources. AWS Data Exchange provides multiple patterns for external parties to access data, such as the following:

AWS Data Exchange for Amazon Redshift
AWS Data Exchange for AWS Lake Formation (currently in preview)
AWS Data Exchange for Data APIs
AWS Data Exchange for data files
AWS Data Exchange for Amazon S3 (currently in preview)

The following diagram illustrates an example architecture.

With AWS Data Exchange, once the dataset to share (or sell) is configured, AWS Data Exchange automatically manages entitlements (and billing) between the producer and the consumer. The producer doesn’t have to manage policies, set up new access points, or create new Amazon Redshift data shares for each consumer, and access is automatically revoked if the subscription ends. This can significantly reduce the operational overhead in sharing data.

Pattern 2: Using AWS Lake Formation for centralized access management

You can use this pattern in cases where both the producer and consumer are on the AWS platform with an AWS account that is enabled to use AWS Lake Formation. This pattern provides a no-code approach to data sharing. The following diagram illustrates an example architecture.

In this pattern, the central governance account has Lake Formation configured for managing access across the producer’s org accounts. Resource links from the production account Amazon Simple Storage Service (Amazon S3) bucket are created in Lake Formation. The producer grants Lake Formation permissions on an AWS Glue Data Catalog resource to an external account, or directly to an AWS Identity and Access Management (IAM) principal in another account. Lake Formation uses AWS Resource Access Manager (AWS RAM) to share the resource. If the grantee account is in the same organization as the grantor account, the shared resource is available immediately to the grantee. If the grantee account is not in the same organization, AWS RAM sends an invitation to the grantee account to accept or reject the resource grant. To make the shared resource available, the consumer administrator in the grantee account must use the AWS RAM console or AWS Command Line Interface (AWS CLI) to accept the invitation.

Authorized principals can share resources explicitly with an IAM principal in an external account. This feature is useful when the producer wants to have control over who in the external account can access the resources. The permissions the IAM principal receives are a union of direct grants and the account-level grants that are cascaded down to the principals. The data lake administrator of the recipient account can view the direct cross-account grants, but can’t revoke permissions.
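As a rough illustration of the grant described above, the following Python sketch builds the parameters for a Lake Formation GrantPermissions request on a Data Catalog table. The account IDs, database, table, and role names are hypothetical placeholders, and the choice to leave the grantable permissions empty is an assumption made for this example, not a recommendation from the original post.

```python
import json


def build_grant_request(catalog_id, database, table, grantee,
                        permissions=("SELECT",), grantable=()):
    """Build keyword arguments for a Lake Formation GrantPermissions call.

    `grantee` is either a 12-digit AWS account ID (account-level grant)
    or the ARN of an IAM principal in the external account (direct grant).
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": grantee},
        "Resource": {
            "Table": {
                "CatalogId": catalog_id,  # account that owns the Data Catalog
                "DatabaseName": database,
                "Name": table,
            }
        },
        "Permissions": list(permissions),
        # Permissions the grantee may pass on to others (empty in this sketch)
        "PermissionsWithGrantOption": list(grantable),
    }


# Hypothetical identifiers, for illustration only
request = build_grant_request(
    catalog_id="111111111111",
    database="published_db",
    table="orders",
    grantee="arn:aws:iam::222222222222:role/PartnerAnalyst",
)
print(json.dumps(request, indent=2))

# With AWS credentials configured, the request could be sent with boto3:
#   import boto3
#   boto3.client("lakeformation").grant_permissions(**request)
```

Passing an account ID instead of a role ARN as `grantee` would produce the account-level variant of the same request.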

Pattern 3: Using AWS Lake Formation from the producer external sharing account

The producer may have stringent security requirements where no external consumer should access their production account or their centralized governance account. They may also not have Lake Formation enabled on their production platform. In such cases, as shown in the following diagram, the producer production account (Account A) is dedicated to its internal organization users. The producer creates another account, the producer external sharing account (Account B), which is dedicated for external sharing. This gives the producer more latitude to create specific policies for specific organizations.

The following architecture diagram shows an overview of the pattern.

The producer implements a process to create an asynchronous copy of data in Account B. The bucket can be configured for Same-Region Replication (SRR) or Cross-Region Replication (CRR) for objects that need to be shared. This automatically refreshes the data in the “External Published Datasets” S3 bucket in the external account without having to write any code.
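Although replication itself needs no application code, the rule still has to be configured once. The following Python sketch shows what such a replication configuration might look like. The bucket names, role ARN, account ID, and prefix are hypothetical, and the sketch assumes that versioning is already enabled on both buckets, which Amazon S3 replication requires.

```python
def build_replication_config(role_arn, dest_bucket, dest_account, prefix):
    """Return an S3 replication configuration that copies objects under
    `prefix` to a bucket owned by another account."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "publish-external-datasets",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # only replicate shareable objects
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{dest_bucket}",
                    "Account": dest_account,
                    # Make the destination account the owner of the replicas
                    "AccessControlTranslation": {"Owner": "Destination"},
                },
            }
        ],
    }


# Hypothetical names, for illustration only
config = build_replication_config(
    role_arn="arn:aws:iam::111111111111:role/replication-role",
    dest_bucket="external-published-datasets",
    dest_account="222222222222",
    prefix="shareable/",
)
print(config["Rules"][0]["Destination"])

# With credentials configured, the rule could be applied with boto3:
#   import boto3
#   boto3.client("s3").put_bucket_replication(
#       Bucket="published-datasets", ReplicationConfiguration=config)
```

The prefix filter is one way to limit replication to objects that are meant to be shared; tag-based filters are another option.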

Creating a copy of the data allows the producer to add another degree of separation between the external consumer and its production data. It also helps meet any compliance or data sovereignty requirements.

Lake Formation is set up on Account B, and the administrator creates resource links for the “External Published Datasets” S3 bucket in its account to grant access. The administrator follows the same process to grant access as described earlier.

Pattern 4: Using Amazon Redshift data sharing

This pattern is ideally suited for a producer who has most of their published data products on Amazon Redshift. This pattern also requires the producer’s external sharing account (Account B) and the consumer account (Account C) to have an encrypted Amazon Redshift cluster or Amazon Redshift Serverless endpoint that meets the prerequisites for Amazon Redshift data sharing.

The following architecture diagram shows an overview of the pattern.

Two options are possible depending on the producer’s compliance constraints:

Option A – The producer enables data sharing directly on the production Amazon Redshift cluster.
Option B – The producer may have constraints with respect to sharing the production cluster. The producer creates a simple AWS Glue job that copies data from the Amazon Redshift cluster in the production Account A to the Amazon Redshift cluster in the external Account B. This AWS Glue job can be scheduled to refresh data as needed by the consumer. When the data is available in Account B, the producer can create multiple views and multiple data shares as needed.

In both options, the producer maintains complete control over what data is being shared, and the consumer admin maintains full control over who can access the data within their organization.

After both the producer and consumer admins approve the data sharing request, the consumer user can access this data as if it were part of their own account without having to write any additional code.
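To make the flow more concrete, the following Python sketch generates the kind of Amazon Redshift SQL statements that a producer and a consumer might run for a cross-account data share. The share, schema, table, and database names, the account IDs, and the namespace value are hypothetical placeholders. The admin approval steps mentioned above happen outside these statements and are not shown.

```python
def producer_statements(share, schema, tables, consumer_account):
    """SQL the producer might run to create a data share and expose tables."""
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
    ]
    stmts += [f"ALTER DATASHARE {share} ADD TABLE {schema}.{t};" for t in tables]
    stmts.append(
        f"GRANT USAGE ON DATASHARE {share} TO ACCOUNT '{consumer_account}';"
    )
    return stmts


def consumer_statement(local_db, share, producer_account, producer_namespace):
    """SQL the consumer might run to mount the share as a local database."""
    return (
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF ACCOUNT '{producer_account}' NAMESPACE '{producer_namespace}';"
    )


# Hypothetical names and IDs, for illustration only
for stmt in producer_statements(
    "partner_share", "published", ["orders", "shipments"], "333333333333"
):
    print(stmt)
print(consumer_statement(
    "partner_db", "partner_share", "222222222222", "producer-namespace-id"
))
```

Because the consumer queries the shared objects through a local database name, no data is copied into the consumer's cluster.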

Pattern 5: Sharing data securely and privately using APIs

You can adopt this pattern when the external partner doesn’t have a presence on AWS. You can also use this pattern when published data products are spread across various services like Amazon S3, Amazon Redshift, Amazon DynamoDB, and Amazon OpenSearch Service but the producer would like to maintain a single data sharing interface.

Here’s an example use case: Company A would like to share some of its log data in near-real time with its partner Company B, who uses this data to generate predictive insights for Company A. Company A stores this data in Amazon Redshift. The company wants to share this transactional information with its partner after masking the personally identifiable information (PII) in a cost-effective and secure way to generate insights. Company B doesn’t use the AWS platform.

Company A establishes a microbatch process using an AWS Lambda function or AWS Glue that queries Amazon Redshift to get incremental log data, applies the rules to redact the PII, and loads this data to the “Published Datasets” S3 bucket. This instantiates an SRR/CRR process that refreshes this data in the “External Sharing” S3 bucket.
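The redaction step in this microbatch process can be sketched in a few lines of Python. This is a minimal illustration, not Company A's actual rules: the field names, the `ts` timestamp key, and the email pattern are assumptions made for the example, and the query against Amazon Redshift and the write to Amazon S3 are left out.

```python
import re

# Hypothetical rule set: fields that always contain PII
PII_FIELDS = {"email", "customer_name"}
# Simple pattern for email addresses embedded in free text
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact(record):
    """Return a copy of a log record with PII masked."""
    clean = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            clean[key] = "[REDACTED]"
        elif isinstance(value, str):
            clean[key] = EMAIL_RE.sub("[email]", value)
        else:
            clean[key] = value
    return clean


def incremental_batch(records, last_seen_ts):
    """Keep only records newer than the previous run, then redact them."""
    return [redact(r) for r in records if r["ts"] > last_seen_ts]


# Hypothetical log records, for illustration only
logs = [
    {"ts": 1, "email": "ann@example.com", "message": "login ok"},
    {"ts": 2, "email": "bob@example.com", "message": "contact bob@example.com now"},
]
batch = incremental_batch(logs, last_seen_ts=1)
print(batch)
```

Tracking the last processed timestamp is one simple way to make each run incremental; the function could run on a schedule in AWS Lambda or AWS Glue.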

The following diagram shows how the consumer can then use an API-based approach to access this data.

The workflow contains the following steps:

An HTTPS API request is sent from the API consumer to the API proxy layer.
The HTTPS API request is forwarded from the API proxy to Amazon API Gateway in the external sharing AWS account.
Amazon API Gateway calls the request receiver Lambda function.
The request receiver function writes the status to a DynamoDB control table.
A second Lambda function, the poller, checks the status of the results in the DynamoDB table.
The poller function fetches results from Amazon S3.
The poller function sends a presigned URL to download the file from the S3 bucket to the requestor via Amazon Simple Email Service (Amazon SES).
The requestor downloads the file using the URL.
The network perimeter AWS account only allows egress internet connection.
The API proxy layer enforces both the egress security controls and perimeter firewall before the traffic leaves the producer’s network perimeter.
The AWS Transit Gateway security egress VPC routing table only allows connectivity from the required producer’s subnet, while preventing internet access.
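The asynchronous part of this workflow (the request receiver, the control table, and the poller) can be sketched locally in Python. In this sketch, plain dictionaries stand in for the DynamoDB control table and the S3 bucket, and the URL and email helpers are passed in as functions. All names are hypothetical, and the sketch assumes that some other process writes the result object once a request has been fulfilled.

```python
import uuid

control_table = {}  # stand-in for the DynamoDB control table
result_store = {}   # stand-in for S3: request ID -> key of the finished result


def request_receiver(requestor_email, query):
    """Record an incoming request with PENDING status and return its ID."""
    request_id = str(uuid.uuid4())
    control_table[request_id] = {
        "requestor": requestor_email,
        "query": query,
        "status": "PENDING",
    }
    return request_id


def poller(make_presigned_url, send_email):
    """For each pending request whose result exists, email a download link."""
    delivered = []
    for request_id, item in control_table.items():
        if item["status"] != "PENDING" or request_id not in result_store:
            continue
        url = make_presigned_url(result_store[request_id])
        send_email(item["requestor"], url)
        item["status"] = "DELIVERED"
        delivered.append(request_id)
    return delivered


# Local stand-ins for the presigned URL and Amazon SES calls
outbox = []


def fake_url(key):
    return f"https://example.invalid/{key}"


def fake_send(to, url):
    outbox.append((to, url))


rid = request_receiver("analyst@example.com", "logs for 2023-02-01")
poller(fake_url, fake_send)                # result not ready, nothing is sent
result_store[rid] = f"results/{rid}.csv"   # another process writes the result
poller(fake_url, fake_send)                # link is emailed to the requestor
print(outbox)
```

In a deployed version, `make_presigned_url` could wrap the boto3 S3 client's `generate_presigned_url("get_object", ...)` call, and `send_email` could wrap Amazon SES.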

Pattern 6: Using Amazon S3 access points

Data scientists may need to work collaboratively on images, videos, and text documents. Legal and audit groups may want to share reports and statements with the auditing agencies. This pattern discusses an approach to sharing such documents. The pattern assumes that the external partners are also on AWS. Amazon S3 access points allow the producer to share access with their consumer by setting up cross-account access without having to edit bucket policies.

Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations, such as GetObject and PutObject. Each access point has distinct permissions and network controls that Amazon S3 applies for any request that is made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy attached to the underlying bucket.

The following architecture diagram shows an overview of the pattern.

The producer creates an S3 bucket and enables the use of access points. As part of the configuration, the producer specifies the consumer account, IAM role, and privileges for the consumer IAM role.

The consumer users with the IAM role in the consumer account can access the S3 bucket via the internet, or access can be restricted to an Amazon VPC via VPC endpoints and AWS PrivateLink.
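As an illustration of the configuration described above, the following Python sketch builds an access point policy that allows a consumer IAM role to read objects through the access point. The Region, account IDs, access point name, and role name are hypothetical. The sketch covers only the access point policy; any other permissions that cross-account access may require are out of scope.

```python
import json


def access_point_policy(region, producer_account, access_point,
                        consumer_role_arn, actions=("s3:GetObject",)):
    """Build an access point policy granting a consumer role object access."""
    ap_arn = f"arn:aws:s3:{region}:{producer_account}:accesspoint/{access_point}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": consumer_role_arn},
                "Action": list(actions),
                # Objects are addressed through the access point ARN
                "Resource": f"{ap_arn}/object/*",
            }
        ],
    }


# Hypothetical identifiers, for illustration only
policy = access_point_policy(
    region="us-east-1",
    producer_account="111111111111",
    access_point="partner-reports",
    consumer_role_arn="arn:aws:iam::333333333333:role/AuditReader",
)
print(json.dumps(policy, indent=2))

# With credentials configured, the policy could be attached with boto3:
#   import boto3
#   boto3.client("s3control").put_access_point_policy(
#       AccountId="111111111111", Name="partner-reports",
#       Policy=json.dumps(policy))
```

Creating one access point per consumer keeps each partner's privileges in a separate, small policy.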

Conclusion

Each organization has its unique set of constraints and requirements that it needs to fulfill to set up an efficient data sharing solution. In this post, we demonstrated various options and best practices available to organizations. The data platform owner and security team should work together to assess what works best for your specific situation. Your AWS account team is also available to help.

About the Authors

Venkata Sistla is a Cloud Architect – Data & Analytics at AWS. He specializes in building data processing capabilities and helping customers remove constraints that prevent them from leveraging their data to develop business insights.

Santosh Chiplunkar is a Principal Resident Architect at AWS. He has over 20 years of experience helping customers solve their data challenges. He helps customers develop their data and analytics strategy and provides them with guidance on how to make it a reality.

Architecting for Sustainability at AWS re:Invent 2022

2023-02-24 Thomas Burns

Post Syndicated from Thomas Burns original https://aws.amazon.com/blogs/architecture/architecting-for-sustainability-at-aws-reinvent-2022/

AWS re:Invent 2022 featured 24 breakout sessions, chalk talks, and workshops on sustainability. In this blog post, we’ll highlight the sessions and announcements and discuss their relevance to the sustainability of, in, and through the cloud.

First, we’ll look at AWS’ initiatives and progress toward delivering efficient, shared infrastructure, water stewardship, and sourcing renewable power.

We’ll then summarize breakout sessions featuring AWS customers who are demonstrating the best practices from the AWS Well-Architected Framework Sustainability Pillar.

Lastly, we’ll highlight use cases presented by customers who are solving sustainability challenges through the cloud.

Sustainability of the cloud

The re:Invent 2022 Sustainability in AWS global infrastructure (SUS204) session is a deep dive on AWS’ initiatives to optimize data centers to minimize their environmental impact. These increases in efficiency provide carbon reduction opportunities to customers who migrate workloads to the cloud. Amazon’s progress includes:

Amazon is on a path to power its operations with 100% renewable energy by 2025, five years ahead of the original target of 2030.
Amazon is the largest corporate purchaser of renewable energy with more than 400 projects globally, including recently announced projects in India, Canada, and Singapore. Once operational, the global renewable energy projects are expected to generate 56,881 gigawatt-hours (GWh) of clean energy each year.

At re:Invent, AWS announced that it will become water positive (Water+) by 2030. This means that AWS will return more water to communities than it uses in direct operations. The Water stewardship and renewable energy at scale (SUS211) session provides an excellent overview of our commitment. For more details, explore the Water Positive Methodology that governs implementation of AWS’ water positive goal, including the approach and measuring of progress.

Sustainability in the cloud

Independent of AWS efforts to make the cloud more sustainable, customers continue to influence the environmental impact of their workloads through the architectural choices they make. This is what we call sustainability in the cloud.

At re:Invent 2021, AWS launched the sixth pillar of the AWS Well-Architected Framework to explain the concepts, architectural patterns, and best practices to architect sustainably. In 2022, we extended the Sustainability Pillar best practices with a more comprehensive structure of anti-patterns to avoid, expected benefits, and implementation guidance.

Let’s explore sessions that show the Sustainability Pillar in practice. In the session Architecting sustainably and reducing your AWS carbon footprint (SUS205), Elliot Nash, Senior Manager of Software Development at Amazon Prime Video, dives deep on the exclusive streaming of Thursday Night Football on Prime Video. The teams followed the Sustainability Pillar’s improvement process from setting goals to replicating the successes to other teams. Implemented improvements include:

Automation of contingency switches that turn off non-critical customer features under stress to flatten demand peaks
Pre-caching content shown to the whole audience at the end of the game

Amazon Prime Video uses the AWS Customer Carbon Footprint Tool along with sustainability proxy metrics and key performance indicators (KPIs) to quantify and track the effectiveness of optimizations. Example KPIs are normalized Amazon Elastic Compute Cloud (Amazon EC2) instance hours per page impression or infrastructure cost per concurrent stream.
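A proxy metric of this kind is a simple ratio of provisioned resources to a unit of business value. The following Python sketch shows the calculation with made-up numbers. The weekly figures are hypothetical and are not data from Prime Video.

```python
def per_million(resource_amount, business_units):
    """Resource consumed per one million units of work
    (for example, EC2 instance hours per million page impressions)."""
    if business_units <= 0:
        raise ValueError("business_units must be positive")
    return resource_amount * 1_000_000 / business_units


# Hypothetical weekly figures: (label, instance hours, page impressions)
weeks = [
    ("week 1", 12_000, 4_000_000),
    ("week 2", 11_000, 4_400_000),
]
for label, instance_hours, impressions in weeks:
    kpi = per_million(instance_hours, impressions)
    print(f"{label}: {kpi:.0f} instance hours per million impressions")
```

Normalizing by the business metric lets you see efficiency gains even when absolute usage grows with traffic.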

Another example of sustainability KPIs was presented in the Build a cost-, energy-, and resource-efficient compute environment (CMP204) session by Troy Gasaway, Vice President of Infrastructure and Engineering at Arm—a global semiconductor industry leader. Troy’s team wanted to measure, track, and reduce the impact of Electronic Design Automation (EDA) jobs. They used Amazon EC2 instances’ vCPU hours to calculate KPIs for Amazon EC2 Spot adoption, AWS Graviton adoption, and the resources needed per job.
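KPIs like these can be derived from per-job vCPU hour records. The following Python sketch computes adoption shares and the average vCPU hours per job. The record layout and the numbers are assumptions made for illustration and do not reflect Arm's data or tooling.

```python
from collections import defaultdict


def share_by(records, key):
    """Fraction of total vCPU hours for each value of `key`."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["vcpu_hours"]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {value: hours / grand_total for value, hours in totals.items()}


def vcpu_hours_per_job(records):
    """Average vCPU hours consumed per distinct job."""
    jobs = {record["job"] for record in records}
    total = sum(record["vcpu_hours"] for record in records)
    return total / len(jobs) if jobs else 0.0


# Hypothetical usage records, for illustration only
usage = [
    {"job": "a", "purchase": "spot", "arch": "graviton", "vcpu_hours": 60},
    {"job": "b", "purchase": "on-demand", "arch": "x86", "vcpu_hours": 30},
    {"job": "b", "purchase": "spot", "arch": "x86", "vcpu_hours": 10},
]
print(share_by(usage, "purchase"))
print(share_by(usage, "arch"))
print(vcpu_hours_per_job(usage))
```

Tracking the shares over time shows whether Spot and AWS Graviton adoption is moving in the intended direction.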

The Sustainability Pillar recommends selecting Amazon EC2 instance types with the least impact and taking advantage of those designed to support specific workloads. The Sustainability and AWS silicon (SUS206) session gives an overview of the embodied carbon and energy consumption of silicon devices. The session highlights examples in which AWS silicon reduced the power consumption for machine learning (ML) inference with AWS Inferentia by 92 percent, and model training with AWS Trainium by 52 percent. Two effects contributed to the reduction in power consumption:

Purpose-built processors use less energy for the job
Due to better performance, fewer instances are needed

David Chaiken, Chief Architect at Pinterest, shared Pinterest’s sustainability journey and how they complemented rigid cost and usage management for ML workloads with data from the AWS Customer Carbon Footprint Tool, as in the figure below.

Figure 1. David Chaiken, Chief Architect at Pinterest, describes Pinterest’s sustainability journey with AWS

AWS announced the preview of a new generation of AWS Inferentia with the Inf2 instances, and C7gn instances. C7gn instances utilize the fifth generation of AWS Nitro cards. AWS Nitro offloads the host CPU to specialized hardware for a more consistent performance with lower CPU utilization. The new Nitro cards offer 40 percent better performance per watt than the previous generation.

Another best practice from the Sustainability Pillar is to use managed services. AWS is responsible for a large share of the optimization for resource efficiency for AWS managed services. We want to highlight the launch of AWS Verified Access. Traditionally, customers protect internal services from unauthorized access by placing resources into private subnets accessible through a Virtual Private Network (VPN). This often involves dedicated on-premises infrastructure that is provisioned to handle peak network usage of the staff. AWS Verified Access removes the need for a VPN. It shifts the responsibility for managing the hardware to securely access corporate applications to AWS and even improves your security posture. The service is built on AWS Zero Trust guiding principles and validates each application request before granting access. Explore the Introducing AWS Verified Access: Secure connections to your apps (NET214) session for demos and more.

In the session Provision and scale OpenSearch resources with serverless (ANT221) we announced the availability of Amazon OpenSearch Serverless. By decoupling compute and storage, OpenSearch Serverless scales resources in and out for both indexing and searching independently. This feature supports two key sustainability in the cloud design principles from the Sustainability Pillar out of the box:

Maximizing utilization
Scaling the infrastructure with user load

Sustainability through the cloud

Sustainability challenges are data problems that can be solved through the cloud with big data, analytics, and ML.

According to one study by PEDCA research, data centers in the EU consume approximately 3 percent of the EU’s energy generated. While it’s important to optimize IT for sustainability, we must also pay attention to reducing the other 97 percent of energy usage.

The session Serve your customers better with AWS Supply Chain (BIZ213) introduces AWS Supply Chain, which generates insights from your supplier and network data to forecast and mitigate inventory risks. This service provides recommendations for stock rebalancing, scored by the distance to move inventory and the associated risks, along with an estimate of the carbon emission impact.

The Easily build, train, and deploy ML models using geospatial data (AIM218) session introduces new Amazon SageMaker geospatial capabilities to analyze satellite images for forest density and land use changes and observe supply chain impacts. The AWS Solutions Library contains dedicated Guidance for Geospatial Insights for Sustainability on AWS with example code.

Some other examples for driving sustainability through the cloud as covered at re:Invent 2022 include these sessions:

SUS208: Utilizing sustainability data at scale
SUS210: Modeling climate change impacts and risks at scale
SUS212: Accelerating decarbonization and sustainability transformation
SUS301: Sustainable machine learning for protecting natural resources
SUS312: How innovators are driving more sustainable manufacturing
STP213: Scaling global carbon footprint management

Conclusion

We recommend revisiting the talks highlighted in this post to learn how you can utilize AWS to enhance your sustainability strategy. You can find all videos from the AWS re:Invent 2022 sustainability track in the Customer Enablement playlist. If you’d like to optimize your workloads on AWS for sustainability, visit the AWS Well-Architected Sustainability Pillar.

Let’s Architect! Architecture tools

2023-02-22 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecture-tools/

Tools, such as diagramming software, low-code applications, and frameworks, make it possible to experiment quickly. They are essential in today’s fast-paced and technology-driven world. From improving efficiency and accuracy, to enhancing collaboration and creativity, a well-defined set of tools can make a significant impact on the quality and success of a project in the area of software architecture.

As an architect, you can take advantage of a wide range of resources to help you build solutions that meet the needs of your organization. For example, with tools like the Amazon Web Services (AWS) Solutions Library and Serverless Land, you can boost your knowledge and productivity while working on event-driven architectures, microservices, and stateless computing.

In this Let’s Architect! edition, we explore how to incorporate these patterns into your architecture, and which tools to leverage to build solutions that are scalable, secure, and cost-effective.

How AWS Application Composer helps your team build great apps

In this re:Invent 2022 session, Chase Douglas, Principal Engineer at AWS, speaks about AWS Application Composer, a newly launched service.

This service has the potential to change the way architects design solutions—without writing a single line of code! The service is user-friendly, intuitive, and requires no prior coding experience. It allows users to scaffold a serverless architecture, defining a CloudFormation template visually with drag-and-drop. A detailed AWS Compute Blog post takes readers through the process of using AWS Application Composer.

Take me to this re:Invent 2022 video!

How an architecture can be designed with AWS Application Composer

AWS design + build tools

When migrating to the cloud, we suggest referencing these four tried-and-true AWS resources that can be used to design and build projects.

AWS Workshops are created by AWS teams to provide opportunities for hands-on learning to develop practical skills. Workshops are available in multiple categories and for skill levels 100-400.
AWS Architecture Center contains a collection of best practices and architectural patterns for designing and deploying cloud-based solutions using AWS services. Furthermore, it includes detailed architecture diagrams, whitepapers, case studies, and other resources that provide a wealth of information on how to design and implement cloud solutions.
Serverless Land (an Amazon property) brings together various patterns, workflows, code snippets, and blog posts pertaining to AWS serverless architectures.
AWS Solutions Library provides customers with templates, tools, and automated workflows to easily deploy, operate, and manage common use cases on the AWS Cloud.

Inside event-driven architectures designed by David Boyne on Serverless Land

The Well-Architected way

In this session, the AWS Well-Architected team provides guidance on how to implement the architectural models reported in the AWS Well-Architected Framework within your organization at scale.

Discover a customer story and understand how to use the features of the AWS Well-Architected Tool and APIs to receive recommendations based on your workload and measure your architectural metrics. In the Framework whitepaper, you can explore the six pillars of Well-Architected (operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability) and best practices to achieve them.

Understanding the key design pillars can help architects make informed design decisions, leading to more robust and efficient solutions. This knowledge also enables architects to identify potential problems early on in the design process and find appropriate patterns to address those issues.

Take me to the Well-Architected video!

Discover how the AWS Well-Architected Framework can help you design scalable, maintainable, and reusable solutions

See you next time!

Thanks for exploring architecture tools and resources with us!

Join us next time when we’ll talk about data mesh architecture!

To find all the posts from this series, check out the Let’s Architect! page of the AWS Architecture Blog.

Author Spotlight: Eduardo Monich Fronza, Senior Partner Solutions Architect, Linux and IBM

2023-02-17 Elise Chahine

Post Syndicated from Elise Chahine original https://aws.amazon.com/blogs/architecture/author-spotlight-eduardo-monich-fronza-senior-partner-sa-linux-and-ibm/

The Author Spotlight series pulls back the curtain on some of AWS’s most prolific authors. Read on to find out more about our very own Eduardo Monich Fronza’s journey, in his own words!

I have been a Partner Solutions Architect at Amazon Web Services (AWS) for just over two years. In this period, I have had the opportunity to work in projects from different partners and customers across the globe, in multiple industry segments, using a wide variety of technologies.

At AWS, we are obsessed with our customers, and this influences all of our activities. I enjoy diving deep to understand our partners’ motivations, as well as their technical and business challenges. Plus, I work backwards from their goals, helping them build innovative solutions using AWS services—solutions that they can successfully offer to their customers and achieve their targeted business results.

Before joining AWS, I worked mainly in Brazil for many years as a middleware engineer and, later, a cloud migration architect. During this period, I travelled to my customers in North America and Europe. These experiences taught me a lot about customer-facing engagements, how to focus on customers’ problems, and how to work backwards from those.

When I joined AWS, I was exposed to so many new technologies and projects that I had never had any previous experience with! This was very exciting, as it provided me with many opportunities to dive deep and learn. A couple of the places I love to go to learn new content are our AWS Architecture Blog and AWS Reference Architecture Diagrams library.

The other thing I’ve realized during my tenure is how amazing it is to work with other people at AWS. I can say that I feel very fortunate to work with a wide range of intelligent and passionate problem-solvers. My peers are always willing to help and work together to provide the best possible solutions for our partners. I believe this collaboration is one of the reasons why AWS has been able to help partners and customers be so successful in their journeys to the cloud.

AWS encourages us to dive deep and specialize in technology domains. My background as a middleware engineer has influenced my decisions, and I am passionate about application modernization and containers areas in particular. A couple of topics that I am particularly interested in are Red Hat OpenShift Service on AWS (ROSA) and IBM software on AWS.

Eduardo presenting on the strategic partnership between AWS and IBM at IBM Think London 2022

This also shows how interesting it is to work with ISVs like Red Hat and IBM. It demonstrates, yet again, how AWS is customer-obsessed and works backwards from what customers need to be successful in their own right. Regardless of whether they are using AWS native services or an ISV solution on AWS, we at AWS always focus on what is right for our customers.

I am also very fond of running workshops, called Immersion Days, for our customers. And, I have recently co-authored an AWS modernization workshop with IBM, which shows how customers can use IBM Cloud Pak for Data on AWS along with AWS services to create exciting Analytics and AI/ML workloads!

In conclusion, working as a Partner Solutions Architect at AWS has been an incredibly rewarding experience for me. I work with great people, a wide range of industries and technologies, and, most importantly, help our customers and partners innovate and find success on AWS. If you are considering a career at AWS, I would highly recommend it: it’s an unparalleled working experience, and there is no shortage of opportunities to take part in exciting projects!

Eduardo’s favorite blog posts!

Deploying IBM Cloud Pak for Data on Red Hat OpenShift Service on AWS

Alright, I will admit: I am being a bit biased. But, hey, this was my first blog at AWS! Many customers are looking to adopt IBM Data and AI solutions on AWS, and in particular want to know how to use ROSA to deploy IBM Cloud Pak for Data.

So, I created a how-to deployment guide, demonstrating how a customer can take advantage of ROSA, without having to manage the lifecycle of Red Hat OpenShift Container Platform clusters. Instead, they can focus on developing new solutions and innovating faster, using IBM’s integrated data and artificial intelligence platform on AWS.

IBM Cloud Pak for Integration on ROSA architecture

Unleash Mainframe Applications by Augmenting New Channels on AWS with IBM Z and Cloud Modernization Stack

Many AWS customers use the IBM mainframe for their core business-critical applications. These customers are looking for ways to build modern cloud-native applications on AWS that often require access to business-critical data on their IBM mainframe.

This AWS Partner Network (APN) Blog post shows how these customers can integrate cloud-native applications on AWS, with workloads running on mainframes, by exposing them as industry standard RESTful APIs with a no-code approach.

Mainframe-to-AWS integration reference architecture.

Migrate and Modernize Db2 Databases to Amazon EKS Using IBM’s Click to Containerize Tool

This blog post shows how customers who are exploring ways to modernize their IBM Db2 databases can move them quickly and easily to Amazon Elastic Kubernetes Service (Amazon EKS), ROSA, and IBM’s Cloud Pak for Data products on AWS.

Scenario showing move from instance to container

Self-service AWS native service adoption in OpenShift using ACK

This Containers Blog post demonstrates how customers can use AWS Controllers for Kubernetes (ACK) to define and create AWS resources directly from within OpenShift. It allows customers to take advantage of AWS-managed services to complement the application workloads running in OpenShift, without needing to define resources outside of the cluster or run services that provide supporting capabilities like databases or message queues.

ACK is now integrated into OpenShift and being used to provide a broad collection of AWS native services presently available on the OpenShift OperatorHub.

AWS Controllers for Kubernetes workflow

AWS Security Profile: Jana Kay, Cloud Security Strategist

2023-02-14 Roger Park

Post Syndicated from Roger Park original https://aws.amazon.com/blogs/security/aws-security-profile-jana-kay-cloud-security-strategist/

In the AWS Security Profile series, we interview Amazon Web Services (AWS) thought leaders who help keep our customers safe and secure. This interview features Jana Kay, Cloud Security Strategist. Jana shares her unique career journey, insights on the Security and Resiliency of the Cloud Tabletop Exercise (TTX) program, thoughts on the data protection and cloud security landscape, and more.

How long have you been at AWS and what do you do in your current role?
I’ve been at AWS a little over four years. I started in 2018 as a Cloud Security Strategist, and in my opinion, I have one of the coolest jobs at AWS. I get to help customers think through how to use the cloud to address some of their most difficult security challenges, by looking at trends and emerging and evolving issues, and anticipating those that might still be on the horizon. I do this through various means, such as whitepapers, short videos, and tabletop exercises. I love working on a lot of different projects, which all have an impact on customers and give me the opportunity to learn new things myself all the time!

How did you get started in the security space? What about it piqued your interest?
After college, I worked in the office of a United States senator, which led me to apply to the Harvard Kennedy School for a graduate degree in public policy. When I started graduate school, I wasn’t sure what my focus would be, but my first day of class was September 11, 2001, which obviously had a tremendous impact on me and my classmates. I first heard about the events of September 11 while I was in an international security policy class, taught by the late Dr. Ash Carter. My classmates and I came from a range of backgrounds, cultures, and professions, and Dr. Carter challenged us to think strategically and objectively—but compassionately—about what was unfolding in the world and our responsibility to effect change. That experience led me to pursue a career in security. I concentrated in international security and political economy, and after graduation, accepted a Presidential Management Fellowship in the Office of the Secretary of Defense at the Pentagon, where I worked for 16 years before coming to AWS.

What’s been the most dramatic change you’ve seen in the security industry?
From the boardroom to builder teams, the understanding that security has to be integrated into all aspects of an organization’s ecosystem has been an important shift. Acceptance of security as foundational to the health of an organization has been evolving for a while, and a lot of organizations have more work to do, but overall there is prioritization of security within organizations.

I understand you’ve helped publish a number of papers at AWS. What are they and how can customers find them?
Good question! AWS publishes a lot of great whitepapers for customers. A few that I’ve worked on are Accreditation Models for Secure Cloud Adoption, Security at the Edge: Core Principles, and Does data localization cause more problems than it solves? To stay updated on the latest whitepapers, see AWS Whitepapers & Guides.

What are your thoughts on the security of the cloud today?
There are a lot of great technologies—such as AWS Data Protection services—that can help you with data protection, but it’s equally important to have the right processes in place and to create a strong culture of data protection. Although one of the biggest shifts I’ve seen in the industry is recognition of the importance of security, we still have a ways to go for people to understand that security and data protection is everyone’s job, not just the job of security experts. So when we talk about data protection and privacy issues, a lot of the conversation focuses on things like encryption, but the conversation shouldn’t end there because ultimately, security is only as good as the processes and people who implement it.

Do you have a favorite AWS Security service and why?
I like anything that helps simplify my life, so AWS Control Tower is one of my favorites. It has so much functionality. Not only does AWS Control Tower help you set up multi-account AWS environments, you can also use it to help identify which of your resources are compliant. The dashboard provides visibility of provisioned accounts and the controls enabled for policy enforcement, and it can help you detect noncompliant resources.

What are you currently working on that you’re excited about?
Currently, my focus is the Security and Resiliency of the Cloud Tabletop Exercise (TTX). It’s a 3-hour interactive event about incident response in which participants discuss how to prevent, detect, contain, and eradicate a simulated cyber incident. I’ve had the opportunity to conduct the TTX in South America, the Middle East, Europe, and the US, and it’s been so much fun meeting customers and hearing the discussions during the TTX and how much participants enjoy the experience. It scales well for groups of different sizes—and for a single customer or industry or for multiple customers or industries—and it’s been really interesting to see how the conversations change depending on the participants.

How does the Security and Resiliency of the Cloud Tabletop Exercise help security professionals hone their skills?
One of the great things about the tabletop is that it involves interacting with other participants. So it’s an opportunity for security professionals and business leaders to learn from their peers, hear different perspectives, and understand all facets of the problem and potential solutions. Often our participants range from CISOs to policymakers to technical experts, who come to the exercise with different priorities for data protection and different ideas on how to solve the scenarios that we present. The TTX isn’t a technical exercise, but participants link their collective understanding of what capabilities are needed in a given scenario to what services are available to them and then finally how to implement those services. One of the things that I hope participants leave with is a better understanding of the AWS tools and services that are available to them.

How can customers learn more about the Security and Resiliency of the Cloud Tabletop Exercise?
To learn more about the TTX, reach out to your account manager.

Is there something you wish customers would ask you about more often?
I wish they’d ask more about what they should be doing to prepare for a cyber incident. It’s one thing to have an incident response plan; it’s another thing to be confident that it’s going to work if you ever need it. If you don’t practice the plan, how do you know that it’s effective, if it has gaps, or if everyone knows their role in an incident?

How about outside of work—any hobbies?
I’m the mother of a teenager and tween, so between keeping up with their activities, I wish I had more time for hobbies! But someday soon, I’d like to get back to traveling more for leisure, reading for fun, and playing tennis.


How to Connect Business and Technology to Embrace Strategic Thinking (Book Review)

2023-02-14 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/how-to-connect-business-and-technology-to-embrace-strategic-thinking-book-review/

The Value Flywheel Effect: Power the Future and Accelerate Your Organization to the Modern Cloud
by David Anderson with Mark McCann and Michael O’Reilly

With this post, I’d like to share a new book that got my attention. It’s a book at the intersection of business, technology, and people. This is a great read for anyone who wants to understand how organizations can evolve to maximize the business impact of new technologies and speed up their internal processes.

The Value FlyWheel Effect book with David Anderson and Danilo Poccia

Last year at re:Invent, I had the opportunity to meet David Anderson. As Director of Technology at Liberty Mutual, he drove the technology change when the global insurance company, founded in 1912, moved its services to the cloud and adopted a serverless-first strategy. He created an environment where experimentation was normal, and software engineers had time and space to learn. This worked so well that, at some point, he had four AWS Heroes in his extended team.

A few months before, I heard that David was writing a book with Mark McCann and Michael O’Reilly. They all worked together at Liberty Mutual, and they were distilling their learnings to help other organizations implement a similar approach. The book was just out when we met, and I was curious to learn more, starting from the title. We met in the expo area, and David was kind enough to give me a signed copy of the book.

The book is published by IT Revolution, the same publisher behind some of my favorite books such as The Phoenix Project, Team Topologies, and Accelerate. The book is titled The Value Flywheel Effect because when you connect business and technology in an organization, you start to turn a flywheel that builds momentum with each small win.

The Value Flywheel
The four phases of the Value Flywheel are:

Clarity of Purpose – This is the part where you look at what is really important for your organization, what makes your company different, and define your North Star and how to measure your distance from it. In this phase, you look at the company through the eyes of the CEO.
Challenge & Landscape – Here you prepare the organization and set up the environment for the teams. We often forget the social aspect of technical teams and great focus is given here on how to set up the right level of psychological safety for teams to operate. This phase is for engineers.
Next Best Action – In this phase, you think like a product leader and plan the next steps with a focus on how to improve the developer experience. One of the key aspects is that “code is a liability” and the less code you write to solve a business problem, the better it is for speed and maintenance. For example, you can avoid some custom implementations and offload their requirements to capabilities offered by cloud providers.
Long-Term Value – This is the CTO perspective, looking at how to set up a problem-preventing culture with well-architected systems and a focus on observability and sustainability. Sustainability here is not just considering the global environment but also the teams and the people working for the organization.

As you would expect from a flywheel, you should iterate on these four phases so that every new spin gets easier and faster.

Wardley Mapping
One thing that I really appreciate from the book is how it made it easy for me to use Wardley mapping (usually applied to a business context) in a technical scenario. Wardley maps, invented by Simon Wardley, provide a visual representation of the landscape in which a business operates.

Each map consists of a value chain, where you draw the components that your customers need. The components are connected to show how they depend on each other. The position of the components is based on how visible they are to customers (vertical) and their evolution status from genesis to being a product or a commodity (horizontal). Over time, some components evolve from being custom-built to becoming a product or being commoditized. This displays on the map with a natural movement to the right as things evolve. For example, data centers were custom-built in the past, but then they became a standard product, and cloud computing made them available as a commodity.

Basic elements of a map – Provided courtesy of Simon Wardley, CC BY-SA 4.0.

With mapping, you can more easily understand what improvements you need and what gaps you have in your technical solution. In this way, engineers can identify which components they should focus on to maximize their impact and what parts are not strategic and can be offloaded to a SaaS solution. It’s a sort of evolutionary architecture where mapping gives a way to look ahead at how the system should evolve over time and where inertia can slow down the evolution of part of the system.

Sometimes it seems the same best practices apply everywhere, but this is not true. An advantage of mapping is that it helps identify the best team and methodology to use based on a component’s evolution status, as described by its horizontal position on a map. For example, an “explorer” attitude is best suited for components in their genesis or being custom built, a “villager” works best on products, and when something becomes a commodity you need a “town planner.”

More Tools and Less Code
The authors look at many available tools and frameworks. For example, the book introduces the North Star Framework, a way to manage products by first identifying their most important metric (the North Star), and Gojko Adzic‘s Impact Mapping, a collaborative planning technique that focuses on leading indicators to help teams make a big impact with their software products. By the way, Gojko is also an AWS Serverless Hero.

Another interesting point is how to provide engineers with the necessary time and space to learn. I specifically like how internal events are called out and compared to public conferences. In internal events, engineers have a chance to use a new technology within their company environment, making it easier to demonstrate what can be done with all the limits of an actual scenario.

Finally, I’d like to highlight this part, which clearly defines what the book means by the statement “code is a liability”:

“When you ask a software team to build something, they deliver a system, not lines of code. The asset is not the code; the asset is the system. The less code in the system, the less overhead you have bought. Some developers may brag about how much code they’ve written, but this isn’t something to brag about.”

This is not a programming book, and serverless technologies are used as examples of how you can speed up the flywheel. If you are looking for a technical deep dive on serverless technologies, you can find more on Serverless Land, a site that brings together the latest information and learning resources for serverless computing, or have a look at the Serverless Architectures on AWS book.

Now that every business is a technology business, The Value Flywheel Effect is about how to accelerate and transform an organization. It helps set the right environment, purpose, and stage to modernize your applications as you adopt cloud computing and get the benefit of it.

You can meet David, Mark, and Michael at the Serverless Edge, where a team of engineers, tech enthusiasts, marketers, and thought leaders obsessed with technology help learn and communicate how serverless can transform a business model.

— Danilo

Let’s Architect! Architecting for sustainability

2023-02-08 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-for-sustainability/

Sustainability is an important topic in the tech industry, as well as in society as a whole. It is defined as the ability to continue to perform a process or function over an extended period of time without depleting natural resources or harming the environment.

One of the key elements of designing a sustainable workload is software architecture. Think about how event-driven architecture can help reduce the load across multiple microservices, leveraging solutions like batching and queues. In these cases, the main traffic is absorbed at the entry point of a cloud workload and eased inside your system. On top of architecture, think about data patterns, hardware optimizations, multi-environment strategies, and many more aspects of a software development lifecycle that can contribute to your sustainable posture in the Cloud.

The key takeaway: designing with sustainability in mind can help you build an application that is not only durable but also flexible enough to maintain the agility your business requires.

In this edition of Let’s Architect!, we share hands-on activities, case studies, and tips and tricks for making your Cloud applications more sustainable.

Architecting sustainably and reducing your AWS carbon footprint

Amazon Web Services (AWS) launched the Sustainability Pillar of the AWS Well-Architected Framework to help organizations evaluate and optimize their use of AWS services, and built the customer carbon footprint tool so organizations can monitor, analyze, and reduce their AWS footprint.

This session provides updates on these programs and highlights the most effective techniques for optimizing your AWS architectures. Find out how Amazon Prime Video used these tools to establish baselines and drive significant efficiencies across their AWS usage.

Take me to this re:Invent 2022 video!

Prime Video case study for understanding how the architecture can be designed for sustainability

Optimize your modern data architecture for sustainability

The modern data architecture is the foundation for a sustainable and scalable platform that enables business intelligence. This AWS Architecture Blog series provides tips on how to develop a modern data architecture with sustainability in mind.

Comprised of two posts, it helps you revisit and enhance your current data architecture without compromising sustainability.

Take me to Part 1! | Take me to Part 2!

An AWS data architecture; it’s now time to account for sustainability

AWS Well-Architected Labs: Sustainability

This workshop introduces participants to the AWS Well-Architected Framework, a set of best practices for designing and operating high-performing, highly scalable, and cost-efficient applications on AWS. The workshop also discusses how sustainability is critical to software architecture and how to use the AWS Well-Architected Framework to improve your application’s sustainability performance.

Take me to this workshop!

Sustainability implementation best practices and monitoring

Sustainability in the cloud with Rust and AWS Graviton

In this video, you can learn how Rust and AWS Graviton can reduce energy consumption and increase performance. Rust combines the resource efficiency of languages like C with the memory safety of languages like Java. The video also explains the benefits of AWS Graviton processors, which are designed to deliver performance- and cost-optimized cloud workloads. This resource is very helpful for understanding how sustainability can become a driver for cost optimization.

Take me to this re:Invent 2022 video!

Discover how Rust and AWS Graviton can help you make your workload more sustainable and performant

See you next time!

Thanks for joining us to discuss sustainability in the cloud! See you in two weeks when we’ll talk about tools for architects.

To find all the blogs from this series, you can check the Let’s Architect! list of content on the AWS Architecture Blog.

Let’s Architect! Designing event-driven architectures

2023-01-25 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-designing-event-driven-architectures/

During the design of distributed systems, we have to identify a communication strategy to exchange information between different services while keeping the evolutionary nature of the architecture in mind. Event-driven architectures are based on events (facts that happened in a system), which are asynchronously exchanged to implement communication across different services while having a high degree of decoupling. This paradigm also allows us to run code in response to events, with benefits like cost optimization and sustainability for the entire infrastructure.

In this edition of Let’s Architect!, we share architectural resources to introduce event-driven architectures, how to build them on AWS, and how to approach the design phase.

AWS re:Invent 2022 – Keynote with Dr. Werner Vogels

re:Invent 2022 may be finished, but the keynote given by Amazon’s Chief Technology Officer, Dr. Werner Vogels, will not be forgotten. Vogels not only covered the announcements of new services but also event-driven architecture foundations in conjunction with customers’ stories on how this architecture helped to improve their systems.

Take me to this re:Invent 2022 video!

Dr. Werner Vogels presenting an example of architecture where Amazon EventBridge is used as event bus

Benefits of migrating to event-driven architecture

In this blog post, we enumerate clearly and concisely the benefits of event-driven architectures, such as scalability, fault tolerance, and developer velocity. This is a great post to start your journey into the event-driven architecture style, as it explains the difference from request-response architecture.

Take me to this Compute Blog post!

Two common options when building applications are request-response and event-driven architectures

Building next-gen applications with event-driven architectures

When we build distributed systems or migrate from a monolithic to a microservices architecture, we need to identify a communication strategy to integrate the different services. Teams who are building microservices often find that integration with other applications and external services can make their workloads tightly coupled.

In this re:Invent 2022 video, you learn how to use event-driven architectures to decouple and decentralize application components through asynchronous communication. The video introduces the differences between synchronous and asynchronous communications before drilling down into some key concepts for designing and building event-driven architectures on AWS.

Take me to this re:Invent 2022 video!

How to use choreography to exchange information across services plus implement orchestration for managing operations within the service boundaries

Designing events

When starting on the journey to event-driven architectures, a common challenge is how to design events: “how much data should an event contain?” is a typical first question we encounter.

In this pragmatic post, you can explore the different types of events, watch a video that explains even further how to use event-driven architectures, and also go through the new event-driven architecture section of serverlessland.com.

Take me to Serverless Land!

An example of events with sparse and full state description

See you next time!

Thanks for reading our first blog of 2023! Join us next time, when we’ll talk about architecture and sustainability.

To find all the blogs from this series, visit the Let’s Architect! section of the AWS Architecture Blog.

Journey to adopt Cloud-Native DevOps platform Series #2: Progressive delivery on Amazon EKS with Flagger and Gloo Edge Ingress Controller

2023-01-18 Purna Sanyal

Post Syndicated from Purna Sanyal original https://aws.amazon.com/blogs/devops/journey-to-adopt-cloud-native-devops-platform-series-2-progressive-delivery-on-amazon-eks-with-flagger-and-gloo-edge-ingress-controller/

In the last post, “OfferUp modernized its DevOps platform with Amazon EKS and Flagger to accelerate time to market,” we talked about the hypergrowth of OfferUp and the technical challenges it encountered in its existing DevOps platform. As a reminder, we presented how OfferUp modernized its DevOps platform with Amazon Elastic Kubernetes Service (Amazon EKS) and Flagger to gain developer velocity, automate faster deployment, and achieve lower cost of ownership.

In this post, we discuss the technical steps to build a DevOps platform that enables the progressive deployment of microservices on Amazon EKS. Progressive delivery exposes a new version of the software incrementally to ingress traffic and continuously measures the success rate of the metrics before allowing all of the traffic to reach the newer version of the software. Flagger is a graduated project of the Cloud Native Computing Foundation (CNCF) that enables progressive canary delivery, along with blue/green and A/B testing, while measuring metrics like HTTP/gRPC request success rate and latency. Flagger shifts and routes traffic between app versions using a service mesh or an Ingress controller.

We leverage the Gloo Edge Ingress Controller for traffic routing; Prometheus, Datadog, and Amazon CloudWatch for application metrics analysis; and Slack to send notifications. Flagger posts messages to Slack when a deployment has been initialized, when a new revision has been detected, and when the canary analysis has failed or succeeded.

Prerequisite steps to build the modern DevOps platform

You need an AWS account and an AWS Identity and Access Management (IAM) user to build the DevOps platform. If you don’t have an AWS account with Administrator access, then create one now. Create an IAM user and assign the admin role. You can build this platform in any AWS Region; however, I use the us-west-1 Region throughout this post. You can use a laptop (Mac or Windows) or an Amazon Elastic Compute Cloud (Amazon EC2) instance as a client machine to install all of the necessary software to build the GitOps platform. For this post, I launched an Amazon EC2 instance (with the Amazon Linux 2 AMI) as the client and installed all of the prerequisite software on it. You need the awscli, git, eksctl, kubectl, and helm applications to build the GitOps platform. Here are the prerequisite steps:

Create a named profile (eks-devops) with the config and credentials file:

aws configure --profile eks-devops

AWS Access Key ID [None]: xxxxxxxxxxxxxxxxxxxxxx

AWS Secret Access Key [None]: xxxxxxxxxxxxxxxxxx

Default region name [None]: us-west-1

Default output format [None]:

View and verify your current IAM profile:

export AWS_PROFILE=eks-devops

aws sts get-caller-identity

If the Amazon EC2 instance doesn’t have git preinstalled, then install git in your Amazon EC2 instance:

sudo yum update -y

sudo yum install git -y

Check git version

git version

Git clone the repo and download all of the prerequisite software in the home directory.

git clone https://github.com/aws-samples/aws-gloo-flux.git

Download all of the prerequisite software from install.sh which includes awscli, eksctl, kubectl, helm, and docker:

cd aws-gloo-flux/eks-flagger/

ls -lt

chmod 700 install.sh ecr-setup.sh

. install.sh

Check the version of the software installed:

aws --version

eksctl version

kubectl version -o json

helm version

docker --version

docker info

If docker info shows an error like “permission denied”, then reboot the Amazon EC2 instance or log in to the instance again.
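Before moving on, it can help to confirm in one pass that every prerequisite tool is on the PATH. The following is a minimal sketch; the function name `missing_tools` is hypothetical and is not part of install.sh or the aws-gloo-flux repository.

```shell
# Print the name of each listed tool that cannot be found on the PATH.
# missing_tools is a hypothetical helper, not part of install.sh.
missing_tools() {
  for tool in "$@"; do
    # command -v exits non-zero when the tool cannot be resolved
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Example: list any prerequisite from this post that is still missing.
# missing_tools aws git eksctl kubectl helm docker
```

An empty result means all of the listed tools were found.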

Create an Amazon Elastic Container Registry (Amazon ECR) repository and push application images.

Amazon ECR is a fully managed container registry that makes it easy for developers to share and deploy container images and artifacts. The ecr-setup.sh script creates a new Amazon ECR repository and pushes the podinfo images (6.0.0, 6.0.1, 6.0.2, 6.1.0, 6.1.5, and 6.1.6) to it. Run the ecr-setup.sh script with two parameters: the ECR repository name (e.g. ps-flagger-repository) and the Region (e.g. us-west-1):

./ecr-setup.sh ps-flagger-repository us-west-1

You’ll see output like the following (truncated).

###########################################################

Successfully created ECR repository and pushed podinfo images to ECR #

Please note down the ECR repository URI

xxxxxx.dkr.ecr.us-west-1.amazonaws.com/ps-flagger-repository
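The repository URI is reused later when you set the container image, so it can be convenient to rebuild it from its parts instead of copying it by hand. This is a small sketch; `ecr_uri` is a hypothetical helper name, and the account ID in the example is a placeholder.

```shell
# Build an ECR repository URI of the form
# <account>.dkr.ecr.<region>.amazonaws.com/<repository>.
# ecr_uri is a hypothetical helper, not part of ecr-setup.sh.
ecr_uri() {
  account_id=$1
  region=$2
  repo=$3
  printf '%s.dkr.ecr.%s.amazonaws.com/%s\n' "$account_id" "$region" "$repo"
}

# Example (requires configured AWS credentials):
# ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
# ECR_URI=$(ecr_uri "$ACCOUNT_ID" us-west-1 ps-flagger-repository)
```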

Technical steps to build the modern DevOps platform

This post shows you how to use the Gloo Edge ingress controller and Flagger to automate canary releases for progressive deployment on the Amazon EKS cluster. Flagger requires a Kubernetes cluster v1.16 or newer and Gloo Edge ingress 1.6.0 or newer. This post provides a step-by-step approach to install the Amazon EKS cluster with a managed node group, the Gloo Edge ingress controller, and Flagger for Gloo in the Amazon EKS cluster. Once the cluster, metrics infrastructure, and Flagger are installed, we install the sample application itself. We’ll use the standard Podinfo application used in the Flagger project and the accompanying loadtester tool. The Flagger “podinfo” backend service will be called by Gloo’s “VirtualService”, which is the root routing object for the Gloo Gateway. A virtual service describes the set of routes to match for a set of domains. We’ll automate the canary promotion, with the new image of the “podinfo” service, from version 6.0.0 to version 6.0.1. We’ll also create a scenario by injecting an error for automated canary rollback while deploying version 6.0.2.

Use myeks-cluster.yaml to create your Amazon EKS cluster with a managed node group. The myeks-cluster.yaml deployment file sets the cluster name to ps-eks-66, the region to us-west-1, availabilityZones to [us-west-1a, us-west-1b], the Kubernetes version to 1.24, and the node group Amazon EC2 instance type to m5.2xlarge. You can change these values if you want to build the cluster in a different Region or Availability Zone.

eksctl create cluster -f myeks-cluster.yaml
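For readers who want to see what such a file might look like, here is a rough sketch of an eksctl ClusterConfig that uses the values described above. It is not the actual myeks-cluster.yaml from the repository: the function name `write_cluster_config`, the node group name, and the desired capacity are assumptions made for illustration.

```shell
# Write an eksctl ClusterConfig that mirrors the values described in this post.
# write_cluster_config is a hypothetical helper; the node group name and
# desiredCapacity below are assumptions, not values taken from the repository.
write_cluster_config() {
  out=$1
  cat > "$out" <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ps-eks-66
  region: us-west-1
  version: "1.24"
availabilityZones: ["us-west-1a", "us-west-1b"]
managedNodeGroups:
  - name: managed-ng-1        # assumed name
    instanceType: m5.2xlarge
    desiredCapacity: 2        # assumed size
EOF
}

# Example:
# write_cluster_config my-cluster-sketch.yaml
# eksctl create cluster -f my-cluster-sketch.yaml
```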

Check the Amazon EKS Cluster details:

kubectl cluster-info

kubectl version -o json

kubectl get nodes -o wide

kubectl get pods -A -o wide

Deploy the Metrics Server:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

kubectl get deployment metrics-server -n kube-system

Update the kubeconfig file to interact with your cluster:

aws eks update-kubeconfig --name <ekscluster-name> --region <AWS_REGION>

kubectl config view

cat $HOME/.kube/config

Create a namespace “gloo-system” and install Gloo with the Helm chart. Gloo Edge is an Envoy-based Kubernetes-native ingress controller to facilitate and secure application traffic.

helm repo add gloo https://storage.googleapis.com/solo-public-helm

kubectl create ns gloo-system

helm upgrade -i gloo gloo/gloo --namespace gloo-system

Install Flagger and the Prometheus add-on in the same gloo-system namespace. Flagger is a Cloud Native Computing Foundation project and part of the Flux family of GitOps tools.

helm repo add flagger https://flagger.app

helm upgrade -i flagger flagger/flagger \
  --namespace gloo-system \
  --set prometheus.install=true \
  --set meshProvider=gloo

[Optional] If you’re using Datadog as a monitoring tool, then deploy Datadog agents as a DaemonSet using the Datadog Helm chart. Replace RELEASE_NAME and DATADOG_API_KEY accordingly. If you aren’t using Datadog, then skip this step. For this post, we leverage the Prometheus open-source monitoring tool.

helm repo add datadog https://helm.datadoghq.com

helm repo update

helm install <RELEASE_NAME> \
  --set datadog.apiKey=<DATADOG_API_KEY> datadog/datadog

Integrate Amazon EKS/ K8s Cluster with the Datadog Dashboard – go to the Datadog Console and add the Kubernetes integration.

[Optional] If you’re using the Slack communication tool and have admin access, then Flagger can be configured to send alerts to Slack by integrating the Slack alerting system with Flagger. If you don’t have admin access in Slack, then skip this step.

helm upgrade -i flagger flagger/flagger \
  --namespace gloo-system \
  --reuse-values \
  --set slack.url=https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK \
  --set slack.channel=general \
  --set slack.user=flagger \
  --set clusterName=<my-cluster>

Create a namespace “apps”. The applications and the load testing service will be deployed into this namespace.

kubectl create ns apps

Create a deployment and a horizontal pod autoscaler for your custom application or service for which canary deployment will be done.

kubectl -n apps apply -k app

kubectl get deployment -A

kubectl get hpa -n apps

Deploy the load testing service to generate traffic during the canary analysis.

kubectl -n apps apply -k tester

kubectl get deployment -A

kubectl get svc -n apps

Use apps-vs.yaml to create a Gloo virtual service definition that references a route table that will be generated by Flagger.

kubectl apply -f ./apps-vs.yaml

kubectl get vs -n apps

[Optional] If you have your own domain name, then open apps-vs.yaml in vi editor and replace podinfo.example.com with your own domain name to run the app in that domain.

Use canary.yaml to create a canary custom resource. Review the service, analysis, and metrics sections of the canary.yaml file.

kubectl apply -f ./canary.yaml

After a couple of seconds, Flagger will create the canary objects. When the bootstrap finishes, Flagger will set the canary status to “Initialized”.

kubectl -n apps get canary podinfo

NAME STATUS WEIGHT LASTTRANSITIONTIME

podinfo Initialized 0 2023-xx-xxTxx:xx:xxZ
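The WEIGHT column shows the percentage of traffic currently routed to the canary. In a Flagger canary analysis, the `stepWeight` and `maxWeight` fields control how that percentage grows during a rollout. The sketch below prints the resulting sequence for a linear rollout. The helper name `canary_weights` is hypothetical, and the values 5 and 50 are assumed for illustration; they are not necessarily the ones in this repository’s canary.yaml.

```shell
# Print the sequence of canary traffic weights (percent) for a linear rollout.
# canary_weights is a hypothetical helper; step and max correspond to the
# stepWeight and maxWeight fields of a Flagger canary analysis.
canary_weights() {
  step=$1
  max=$2
  # Guard against a non-positive step, which would never reach max
  if [ "$step" -le 0 ]; then
    return 1
  fi
  weight=$step
  out=""
  while [ "$weight" -le "$max" ]; do
    out="$out $weight"
    weight=$((weight + step))
  done
  # Trim the leading space before printing
  printf '%s\n' "${out# }"
}

# canary_weights 5 50 prints: 5 10 15 20 25 30 35 40 45 50
```

With these assumed values, a healthy rollout passes through ten analysis steps before promotion, so a larger `stepWeight` shortens the rollout at the cost of exposing more traffic per step.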

Gloo automatically creates an ELB. Once the load balancer is provisioned and health checks pass, we can find the sample application at the load balancer’s public address. Note down the ELB’s Public address:

kubectl get svc -n gloo-system --field-selector 'metadata.name==gateway-proxy' -o=jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}{"\n"}'

Validate if your application is running, and you’ll see an output with version 6.0.0.

curl <load balancer’s public address> -H "Host:podinfo.example.com"

Trigger progressive deployments and monitor the status

You can trigger a canary deployment by updating the application container image from 6.0.0 to 6.0.1.

kubectl -n apps set image deployment/podinfo podinfod=<ECR URI>:6.0.1

Flagger detects that the deployment revision changed and starts a new rollout.

kubectl -n apps describe canary/podinfo

Monitor all canaries. The canary status can be one of the following: Initialized, Waiting, Progressing, Promoting, Finalizing, Succeeded, or Failed.

watch kubectl get canaries --all-namespaces

curl < load balancer’s public address> -H "Host:podinfo.example.com"
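In automation, where watching the terminal isn’t practical, you could poll the canary until it reaches a terminal status. The following is a minimal sketch under that assumption. The function names `canary_phase` and `wait_for_canary` are hypothetical, and the sketch assumes the status is exposed in the canary’s `.status.phase` field.

```shell
# Read the current phase of a canary (assumes the .status.phase field).
canary_phase() {
  kubectl -n "$1" get canary "$2" -o jsonpath='{.status.phase}'
}

# Poll until the canary succeeds (exit 0), fails (exit 1), or the
# attempts run out (exit 2). Arguments: namespace, name, tries, delay seconds.
wait_for_canary() {
  ns=$1
  name=$2
  tries=${3:-60}
  delay=${4:-10}
  attempt=0
  while [ "$attempt" -lt "$tries" ]; do
    phase=$(canary_phase "$ns" "$name")
    case "$phase" in
      Succeeded) return 0 ;;
      Failed) return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 2
}

# Example:
# wait_for_canary apps podinfo 60 10 && echo "canary promoted"
```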

Once the canary analysis is completed, validate your application. You can see that the version of the application has changed from 6.0.0 to 6.0.1.

{
  "hostname": "podinfo-primary-658c9f9695-4pqbl",
  "version": "6.0.1",
  "revision": "",
  "color": "#34577c",
  "logo": "https://raw.githubusercontent.com/stefanprodan/podinfo/gh-pages/cuddle_clap.gif",
  "message": "greetings from podinfo v6.0.1"
}

[Optional] Open podinfo application from the laptop browser

Find both IP addresses associated with the load balancer.

dig <load balancer’s public address>

Open the /etc/hosts file on your laptop and add an entry for each of the load balancer’s IP addresses.

sudo vi /etc/hosts

<Public IP address of LB Target node> podinfo.example.com

e.g.

xx.xx.xxx.xxx podinfo.example.com
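If you prefer not to edit the file by hand, the entries can be generated from the DNS answer. This is a minimal sketch, assuming the load balancer resolves to IPv4 addresses; `hosts_entries` is a hypothetical helper name, and any non-IPv4 lines in the input (such as CNAME answers) are skipped.

```shell
# hosts_entries HOSTNAME: reads one address per line on stdin and prints an
# /etc/hosts line ("IP HOSTNAME") for every line that looks like IPv4.
hosts_entries() {
  host="$1"
  grep -E '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' | while read -r ip; do
    printf '%s %s\n' "$ip" "$host"
  done
}
```

For example, dig +short <load balancer’s public address> | hosts_entries podinfo.example.com prints the lines to review, and appending | sudo tee -a /etc/hosts writes them to the file. Remember to remove these entries when you clean up, because the load balancer’s IP addresses can change.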

Type “podinfo.example.com” in your browser, and the application appears in a form similar to this:

Figure 1: Greetings from podinfo v6.0.1

Automated rollback

During the canary analysis, you can generate HTTP 500 errors and high latency to check whether Flagger pauses the rollout and rolls back the faulty version. Flagger performs an automatic rollback when the analysis fails.

Introduce another canary deployment with podinfo image version 6.0.2 and monitor the status of the canary.

kubectl -n apps set image deployment/podinfo podinfod=<ECR URI>:6.0.2

Generate HTTP 500 errors or high latency from a separate terminal window.

Generate HTTP 500 errors:

watch curl -H 'Host:podinfo.example.com' <load balancer’s public address>/status/500

Generate high latency:

watch curl -H 'Host:podinfo.example.com' <load balancer’s public address>/delay/2

When the number of failed checks reaches the canary analysis threshold, the traffic is routed back to the primary, the canary is scaled to zero, and the rollout is marked as failed.

kubectl get canaries --all-namespaces

kubectl -n apps describe canary/podinfo
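How long the rollback takes to trigger depends on the analysis settings in canary.yaml. As a rough sketch, and assuming the timing rules given in the Flagger documentation (rollback after interval × threshold, promotion after at least interval × (maxWeight / stepWeight)), the waits can be estimated with shell arithmetic. The function names are hypothetical, and the numbers in the usage note are illustrative values, not the ones from this post's canary.yaml.

```shell
# rollback_seconds INTERVAL_SECONDS THRESHOLD
# Approximate time until rollback while checks keep failing.
rollback_seconds() {
  echo $(( $1 * $2 ))
}

# promotion_seconds INTERVAL_SECONDS MAX_WEIGHT STEP_WEIGHT
# Approximate minimum time for a healthy canary to be promoted.
promotion_seconds() {
  echo $(( $1 * ($2 / $3) ))
}
```

With an assumed 30-second interval and a threshold of 5, rollback_seconds 30 5 prints 150, so the failing traffic would need to run for roughly two and a half minutes. With an assumed maxWeight of 50 and stepWeight of 5, promotion_seconds 30 50 5 prints 300.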

Cleanup

When you’re done experimenting, delete all of the resources created during this series to avoid additional charges. The following steps walk through the cleanup.

Delete Flagger resources and apps namespace

kubectl delete canary podinfo -n apps

kubectl delete HorizontalPodAutoscaler podinfo -n apps

kubectl delete deployment podinfo -n apps

helm -n gloo-system delete flagger

helm -n gloo-system delete gloo

kubectl delete namespace apps

Delete Amazon EKS Cluster

After you’ve finished with the cluster and nodes that you created for this tutorial, clean up by deleting them with the following command:

eksctl delete cluster --name <cluster name> --region <region code>

Delete Amazon ECR

aws ecr delete-repository --repository-name ps-flagger-repository --force

Conclusion

This post explained how to set up an Amazon EKS cluster and how to use Flagger for progressive deployments along with Prometheus and the Gloo Ingress Controller. You can enhance the deployments by integrating Flagger with Slack, Datadog, and webhook notifications. Amazon EKS removes the undifferentiated heavy lifting of managing and updating the Kubernetes cluster. Managed node groups automate the provisioning and lifecycle management of worker nodes in an Amazon EKS cluster, which greatly simplifies operational activities such as new Kubernetes version deployments.

We encourage you to look into modernizing your DevOps platform from a monolithic architecture to a microservice-based architecture with Amazon EKS, and to use Flagger with the right Ingress controller for secure, automated service releases.

Further Reading

Journey to adopt Cloud-Native DevOps platform Series #1: OfferUp modernized DevOps platform with Amazon EKS and Flagger to accelerate time to market

About the authors: