Tag Archives: Architecture

Building SAML federation for Amazon OpenSearch Dashboards with Auth0

2022-04-11 Raghavarao Sodabathina

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/architecture/building-saml-federation-for-amazon-opensearch-dashboards-with-auth0/

Amazon OpenSearch is a fully managed, distributed, open-source search and analytics service that is powered by the Apache Lucene search library. OpenSearch is derived from Elasticsearch 7.10.2 and is used for real-time application monitoring, log analytics, and website search. It's ideal for use cases that require fast access and response for large volumes of data. OpenSearch Dashboards is derived from Kibana 7.10.2 and is used for visual data exploration. With Security Assertion Markup Language (SAML)-based federation, OpenSearch Dashboards lets you use your existing identity provider (IdP), such as Auth0. You can use Auth0 to provide single sign-on (SSO) for OpenSearch Dashboards on Amazon OpenSearch domains. SAML federation also works with fine-grained access control, and your users keep the ability to search data and build visualizations. Amazon OpenSearch supports providers that use the SAML 2.0 standard, such as Auth0, Okta, Keycloak, Active Directory Federation Services (AD FS), and Ping Identity (PingID).

In this post, we provide step-by-step guidance to show you how to set up a trial Auth0 account. We'll demonstrate how to create users and groups in your organization's directory and enable SP-initiated SSO into OpenSearch Dashboards.

To use this feature, you must enable fine-grained access control. Rather than authenticating through Amazon Cognito or the internal user database, SAML authentication for OpenSearch Dashboards lets you use third-party identity providers to log in to OpenSearch Dashboards. SAML authentication applies only to accessing OpenSearch Dashboards through a web browser. Your SAML credentials do not let you make direct HTTP requests to the OpenSearch or OpenSearch Dashboards APIs.

Auth0 is an AWS Competency Partner and a popular Identity-as-a-Service (IDaaS) solution. It supports both service provider (SP)-initiated and identity provider (IdP)-initiated SSO. In SP-initiated SSO, when you open the OpenSearch Dashboards login page, it sends an authentication request to Auth0. Once Auth0 authenticates your identity, you are redirected back to OpenSearch Dashboards. In IdP-initiated SSO, you log in to the Auth0 SSO page and choose OpenSearch Dashboards to open the application.

Overview of the Auth0 SAML authentication solution

Figure 1 depicts a sample architecture of a generic, integrated solution between Auth0 and OpenSearch Dashboards over SAML authentication.


Figure 1. A high-level view of a SAML transaction between Amazon OpenSearch and Auth0

The sign-in flow is as follows:

User opens browser window and navigates to Amazon OpenSearch Dashboards
Amazon OpenSearch generates SAML authentication request
Amazon OpenSearch redirects request back to browser
Browser redirects to Auth0 URL
Auth0 parses SAML request, authenticates user, and generates SAML response
Auth0 returns encoded SAML response to browser
Browser sends SAML response back to Amazon OpenSearch Assertion Consumer Service (ACS) URL
ACS verifies SAML response
User logs into Amazon OpenSearch domain
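To make the request, redirect, and decode steps of this flow concrete, here is a minimal Python sketch of how a service provider can encode a SAML 2.0 AuthnRequest for the HTTP-Redirect binding (raw DEFLATE, then base64, then URL-encoding into a `SAMLRequest` query parameter), and how an IdP reverses that encoding. It is a generic illustration of the protocol under stated assumptions, not Amazon OpenSearch's implementation; the function names `build_redirect_url` and `decode_saml_request` and all URLs in the usage block are hypothetical placeholders.

```python
import base64
import uuid
import zlib
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlparse


def build_redirect_url(sp_entity_id: str, acs_url: str, idp_sso_url: str) -> str:
    """SP side: generate an AuthnRequest and wrap it in a redirect URL to the IdP."""
    issue_instant = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    request_id = "_" + uuid.uuid4().hex  # SAML IDs must not start with a digit
    xml = (
        '<samlp:AuthnRequest xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol" '
        'xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion" '
        f'ID="{request_id}" Version="2.0" IssueInstant="{issue_instant}" '
        f'Destination="{idp_sso_url}" AssertionConsumerServiceURL="{acs_url}" '
        'ProtocolBinding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST">'
        f"<saml:Issuer>{sp_entity_id}</saml:Issuer>"
        "</samlp:AuthnRequest>"
    )
    # HTTP-Redirect binding: raw DEFLATE (negative wbits means no zlib header) ...
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    deflated = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    # ... then base64, then URL-encode it as the SAMLRequest query parameter.
    saml_request = base64.b64encode(deflated).decode("ascii")
    return f"{idp_sso_url}?{urlencode({'SAMLRequest': saml_request})}"


def decode_saml_request(redirect_url: str) -> str:
    """IdP side: recover the AuthnRequest XML from the redirect URL."""
    encoded = parse_qs(urlparse(redirect_url).query)["SAMLRequest"][0]
    return zlib.decompress(base64.b64decode(encoded), -15).decode("utf-8")


if __name__ == "__main__":
    url = build_redirect_url(
        sp_entity_id="https://vpc-example.us-east-1.es.amazonaws.com",
        acs_url="https://vpc-example.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs",
        idp_sso_url="https://example.us.auth0.com/samlp/EXAMPLE_CLIENT_ID",
    )
    print(decode_saml_request(url))
```

The IdP then authenticates the user and answers with a signed SAML response, which the browser posts to the ACS URL named in the request.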

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
A virtual private cloud (VPC) based Amazon OpenSearch domain with fine-grained access control enabled
An Auth0 account with a user and a group
A browser with network connectivity to Auth0, Amazon OpenSearch domain, and Amazon OpenSearch Dashboards.

The steps in this post are structured into the following sections:

Identity provider (Auth0) setup
Prepare Amazon OpenSearch for SAML configuration
Identity provider (Auth0) SAML configuration
Finish Amazon OpenSearch for SAML configuration
Validation
Cleanup

Identity provider (Auth0) setup

Step 1: Sign up for an Auth0 account

Sign up for an Auth0 account, then click on the Sign up button to complete your account setup.
If you already have an account with Auth0, log in to your Auth0 account.

Step 2: Create users in Auth0

Choose User Management in the left menu, click Users, and then click the +Create User button.
Provide an email address, password, and connection for the user, and then click the Create button to create the user.
Repeat these steps to add more users to your Auth0 account.
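If you have many users to add, you could script this step instead of using the dashboard. The following Python sketch only builds (and does not send) a request to the Auth0 Management API v2 `POST /api/v2/users` endpoint, using the same email, password, and connection fields the dashboard asks for. The tenant domain, the Management API token, and the default `Username-Password-Authentication` connection name are assumptions; the helper name `build_create_user_request` is hypothetical, so check the Auth0 Management API reference for the fields your tenant requires.

```python
import json
import urllib.request


def build_create_user_request(
    tenant_domain: str,
    mgmt_token: str,
    email: str,
    password: str,
    connection: str = "Username-Password-Authentication",
) -> urllib.request.Request:
    """Build (but do not send) a Management API request that creates one user."""
    body = {"email": email, "password": password, "connection": connection}
    return urllib.request.Request(
        url=f"https://{tenant_domain}/api/v2/users",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {mgmt_token}",  # token obtained separately
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_create_user_request(
        "example.us.auth0.com", "MGMT_API_TOKEN", "alice@example.com", "ChangeMe!123"
    )
    print(req.get_method(), req.full_url)
    # To actually create the user: urllib.request.urlopen(req)
```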

Step 3: Install Auth0 Extension to create a group and assign users to the group

Click on Extensions in the left menu and search for “Auth0 Authorization”. Click on Auth0 Authorization to install the extension, shown in Figure 2.


Figure 2. Installing Auth0 Authorization extension

Use all default options and click on the Install button to install the extension.
Click on the Auth0 Authorization extension and choose the Accept button to provide access to your Auth0 account.
The Auth0 Authorization extension must be configured. Click on Go to Configuration (Figure 3).


Figure 3. Configuring the Auth0 Authorization extension

Rotate your API keys, select Groups, Roles, and Permissions to authorize the extension, and then click PUBLISH RULE to complete the configuration (Figure 4).


Figure 4. Providing the permissions to Auth0 Authorization extension

Step 4: Create a group in Auth0

Choose Groups from the left menu and click on the Create your first Group button. For this example, we will create a group called opensearch for OpenSearch Dashboards access.
Add your users to the opensearch group by clicking the ADD MEMBERS button, and then click the CONFIRM button to complete the group assignment (Figure 5).


Figure 5. Adding users to Auth0 Group

Step 5: Create an Auth0 Application

Choose Applications from the left menu. Click on the +Create Application button.
For this example, we are creating an application called “opensearch”.
Select Single Page Web Applications, then click on the CREATE button to proceed.
Click on the Addons tab of the opensearch application (Figure 6).


Figure 6. Creating an Auth0 SAML application

Click on SAML2 WEB APP, and then select Settings, where you will provide the SAML URLs from Amazon OpenSearch. We will configure these details after preparing the Amazon OpenSearch domain for SAML.

Prepare Amazon OpenSearch for SAML configuration

Once the Amazon OpenSearch domain is up and running, we can proceed with configuration.

Under Actions, choose Edit security configuration (Figure 7).


Figure 7. Enabling Amazon OpenSearch security configuration for SAML

Under SAML authentication for OpenSearch Dashboards/Kibana, select the Enable SAML authentication check box (Figure 8). Enabling SAML generates the URLs that you need to configure SAML with your identity provider.


Figure 8. Amazon OpenSearch URLs for SAML configuration

We will be using the Service Provider entity ID and SP-initiated SSO URL (highlighted in Figure 8) for Auth0 SAML configuration. We will complete the rest of the Amazon OpenSearch SAML configuration after the Auth0 SAML configuration.

Auth0 SAML configuration

Go back to Auth0.com, and navigate to Applications from the left menu. Then select the opensearch application that you created as a part of the Auth0 setup.

Click on the Addons tab on the application opensearch.
Click on the SAML2 WEB APP, then select Settings to provide SAML URLs from Amazon OpenSearch, as shown in Figure 9:
- Application Callback URL: https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs (SP-initiated SSO URL)
- "audience": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com" (Service provider entity ID)
- "destination": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs" (SP-initiated SSO URL; use the same value as the Application Callback URL)
- Mappings and other configurations shown in Figure 9

{
  "audience": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com",
  "destination": "https://vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com/_dashboards/_opendistro/_security/saml/acs",
  "mappings": {
    "email": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress",
    "name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",
    "groups": "http://schemas.xmlsoap.org/claims/Group"
  },
  "createUpnClaim": false,
  "passthroughClaimsWithNoMapping": false,
  "mapUnknownClaimsAsIs": false,
  "mapIdentities": false,
  "nameIdentifierFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
  "nameIdentifierProbes": [
    "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"
  ]
}
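Because the audience and the destination are both derived from the domain endpoint, generating these settings can help keep them from drifting apart. The Python sketch below uses a hypothetical helper, `auth0_saml_settings`, that takes your domain endpoint and returns settings in the same shape as the JSON above. It assumes the `/_dashboards/_opendistro/_security/saml/acs` path shown for the Application Callback URL in this walkthrough; confirm it against the SP-initiated SSO URL in your own console.

```python
import json

EMAIL_CLAIM = "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress"


def auth0_saml_settings(domain_endpoint: str) -> dict:
    """Build Auth0 SAML2 Web App add-on settings for an OpenSearch domain.

    domain_endpoint is the bare host, e.g. vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com.
    """
    base = f"https://{domain_endpoint.strip().rstrip('/')}"
    acs_url = f"{base}/_dashboards/_opendistro/_security/saml/acs"  # assumed path
    return {
        "audience": base,        # Service provider entity ID
        "destination": acs_url,  # SP-initiated SSO (ACS) URL
        "mappings": {
            "email": EMAIL_CLAIM,
            "name": "http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name",
            "groups": "http://schemas.xmlsoap.org/claims/Group",
        },
        "createUpnClaim": False,
        "passthroughClaimsWithNoMapping": False,
        "mapUnknownClaimsAsIs": False,
        "mapIdentities": False,
        "nameIdentifierFormat": "urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress",
        "nameIdentifierProbes": [EMAIL_CLAIM],
    }


if __name__ == "__main__":
    settings = auth0_saml_settings("vpc-XXXXX-XXXXX.us-east-1.es.amazonaws.com")
    print(json.dumps(settings, indent=2))  # paste into the add-on Settings tab
```

The same `destination` value is what you would enter as the Application Callback URL.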


Figure 9. Configuring Auth0 SAML parameters

Click on Enable to save the SAML configurations.
Go to the Usage tab, and click on the Download button to download Identity Provider Metadata, see Figure 10.


Figure 10. Downloading Auth0 identity provider metadata for SAML configuration
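Before importing the downloaded file, you may want to sanity-check it. This Python sketch uses the standard library's `xml.etree.ElementTree` to pull the IdP entity ID, the HTTP-Redirect single sign-on URL, and the signing certificate out of SAML 2.0 metadata. The helper name `summarize_idp_metadata` is hypothetical, and the sample metadata string and its values are made up for illustration; real Auth0 metadata contains more elements.

```python
import xml.etree.ElementTree as ET

NS = {
    "md": "urn:oasis:names:tc:SAML:2.0:metadata",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}
REDIRECT = "urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"


def summarize_idp_metadata(xml_text: str) -> dict:
    """Return the entity ID, redirect SSO URL, and signing certificate."""
    root = ET.fromstring(xml_text)  # root element is md:EntityDescriptor
    idp = root.find("md:IDPSSODescriptor", NS)
    if idp is None:
        raise ValueError("not IdP metadata: no IDPSSODescriptor element")
    sso_url = None
    for sso in idp.findall("md:SingleSignOnService", NS):
        if sso.get("Binding") == REDIRECT:
            sso_url = sso.get("Location")
    cert = idp.find(".//ds:X509Certificate", NS)
    return {
        "entity_id": root.get("entityID"),
        "sso_redirect_url": sso_url,
        # Strip the whitespace and line breaks that metadata files often contain.
        "certificate": "".join(cert.text.split()) if cert is not None else None,
    }


SAMPLE = """<md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata"
    entityID="urn:example.us.auth0.com">
  <md:IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
    <md:KeyDescriptor use="signing">
      <KeyInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
        <X509Data><X509Certificate>MIIC FAKE
          CERT</X509Certificate></X509Data>
      </KeyInfo>
    </md:KeyDescriptor>
    <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect"
        Location="https://example.us.auth0.com/samlp/EXAMPLE_CLIENT_ID"/>
  </md:IDPSSODescriptor>
</md:EntityDescriptor>"""

if __name__ == "__main__":
    print(summarize_idp_metadata(SAMPLE))
```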

Amazon OpenSearch SAML configuration

Switch back to your Amazon OpenSearch domain:
- Navigate to the Amazon OpenSearch console
- Click on Actions, then click on Edit security configuration
- Select the Enable SAML authentication check box
Under the Import IdP metadata section (Figure 11):
- Metadata from IdP: Import the Auth0 identity provider metadata from the downloaded XML file
- SAML master backend role: opensearch (the Auth0 group). Users whose SAML assertion carries this group as a backend role receive full permissions when they sign in to OpenSearch Dashboards


Figure 11. Configuring Amazon OpenSearch SAML parameters

Under Optional SAML settings (Figure 12):
- Leave Subject key blank, because Auth0 provides the user name in the NameID element
- Role key should be http://schemas.xmlsoap.org/claims/Group. Auth0 lets you view a sample assertion during the configuration process by clicking on the DEBUG button on SAML2 WebApp. Tools like SAML-tracer can help you examine and troubleshoot the contents of real assertions.
- Session time to live (mins): 60
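To see how these settings fit together, here is a simplified Python sketch of what a service provider does with them: with the subject key left blank, the user name comes from the assertion's `NameID`, and the role key names the attribute whose values become backend roles (here, the `opensearch` group). This is an illustration of the mapping only, under the assumption of an already-trusted assertion; it does not validate signatures, audience, or time conditions, which a real service provider must do. The helper `user_and_roles` and the sample assertion are made up.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SAML = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}
ROLE_KEY = "http://schemas.xmlsoap.org/claims/Group"


def user_and_roles(assertion_xml: str, role_key: str = ROLE_KEY,
                   subject_key: Optional[str] = None) -> Tuple[str, List[str]]:
    """Return (user name, backend roles) from an assertion. No signature checks."""
    root = ET.fromstring(assertion_xml)
    attributes = {}
    for attr in root.iterfind(".//saml:Attribute", SAML):
        values = [v.text for v in attr.findall("saml:AttributeValue", SAML)]
        attributes[attr.get("Name")] = values
    if subject_key:  # an explicit subject key selects an attribute ...
        user = attributes[subject_key][0]
    else:            # ... otherwise fall back to the NameID
        user = root.find(".//saml:Subject/saml:NameID", SAML).text
    return user, attributes.get(role_key, [])


SAMPLE_ASSERTION = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:Subject>
    <saml:NameID Format="urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress">alice@example.com</saml:NameID>
  </saml:Subject>
  <saml:AttributeStatement>
    <saml:Attribute Name="http://schemas.xmlsoap.org/claims/Group">
      <saml:AttributeValue>opensearch</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>"""

if __name__ == "__main__":
    print(user_and_roles(SAMPLE_ASSERTION))
```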


Figure 12. Configuring Amazon OpenSearch optional SAML parameters

Click on the Save changes button to complete the Amazon OpenSearch SAML configuration for OpenSearch Dashboards. The SAML configuration is now complete and ready for testing.

Validating access with Auth0 users

Access OpenSearch Dashboards from the previously created OpenSearch cluster. The OpenSearch Dashboards URL can be found as shown in Figure 13. The first access to the OpenSearch Dashboards URL redirects you to the Auth0 login screen.


Figure 13. Validating Auth0 users access with Amazon OpenSearch

Now copy and paste the OpenSearch Dashboards URL in your browser, and enter the user credentials.
If your OpenSearch domain is hosted within a private VPC, you will not be able to access OpenSearch Dashboards over the public internet. You can still use SAML, as long as your browser can communicate with both your OpenSearch domain and your identity provider.
You can create a Mac or Windows EC2 instance within the same VPC and use its web browser to access Amazon OpenSearch Dashboards and validate your SAML configuration. You can also access Amazon OpenSearch Dashboards through Site-to-Site VPN from an on-premises environment.
After a successful login, you will be redirected to the OpenSearch Dashboards home page. Explore sample data and visualizations in OpenSearch Dashboards, as shown in Figure 14.

Figure 14. SAML authenticated Amazon OpenSearch Dashboards

You have now successfully federated Amazon OpenSearch Dashboards with Auth0 as an identity provider, and you can sign in to OpenSearch Dashboards with your Auth0 credentials.

Cleaning up

After you test out this solution, remember to delete all the resources you created to avoid incurring future charges. Refer to these links:

Deleting your Amazon OpenSearch domain
Deleting your Auth0 account (if needed)

Conclusion

In this blog post, we demonstrated how to set up Auth0 as a SAML identity provider for Amazon OpenSearch Dashboards. With this solution, your OpenSearch Dashboards uses Auth0 as the custom identity provider for your users. This reduces the login process to a single set of credentials and improves employee productivity.

Get started by checking the Amazon OpenSearch Developer Guide, which provides guidance on how to build applications using Amazon OpenSearch for your operational analytics.

Journey to Adopt Cloud-Native Architecture Series #5 – Enhancing Threat Detection, Data Protection, and Incident Response

2022-04-07 Anuj Gupta

Post Syndicated from Anuj Gupta original https://aws.amazon.com/blogs/architecture/journey-to-adopt-cloud-native-architecture-series-5-enhancing-threat-detection-data-protection-and-incident-response/

In Part 4 of this series, Governing Security at Scale and IAM Baselining, we discussed building a multi-account strategy and improving access management and least privilege to prevent unwanted access and to enforce security controls.

As a refresher from previous posts in this series, our example e-commerce company’s “Shoppers” application runs in the cloud. The company experienced hypergrowth, which posed a number of platform and technology challenges, including enforcing security and governance controls to mitigate security risks.

With the pace of new infrastructure and software deployments, we had to ensure that we maintained strong security. This post, Part 5, shows how we detect security misconfigurations, indicators of compromise, and other anomalous activity. It also shows how we developed and iterated on our incident response processes.

Threat detection and data protection

With our newly acquired customer base from hypergrowth, we had to make sure we maintained customer trust. We also needed to detect and respond to security events quickly to reduce the scope and impact of any unauthorized activity. We were concerned about vulnerabilities on our public-facing web servers, accidental sensitive data exposure, and other security misconfigurations.

Prior to hypergrowth, application teams scanned for vulnerabilities and maintained the security of their own applications. After hypergrowth, we established a dedicated security team and identified tools to simplify the management of our cloud security posture. This allowed us to easily identify and prioritize security risks.

Use AWS security services to detect threats and misconfigurations

We use the following AWS security services to simplify the management of cloud security risks and reduce the burden of third-party integrations. This also minimizes the amount of engineering work required by our security team.

Detect threats with Amazon GuardDuty

We use Amazon GuardDuty to keep up with the newest threat actor tactics, techniques, and procedures (TTPs) and indicators of compromise (IOCs).

GuardDuty saves us time and reduces complexity, because we don't have to continuously engineer new static and machine-learning-based detections as new TTPs and IOCs emerge. This allows our security analysts to focus on building runbooks and quickly responding to security findings.

Discover sensitive data with Amazon Macie for Amazon S3

To host our external website, we use a few public Amazon Simple Storage Service (Amazon S3) buckets with static assets. We don't want developers to accidentally put sensitive data in these buckets, and we want to understand which S3 buckets contain sensitive information, such as financial data or personally identifiable information (PII).

We explored building a custom scanner to search for sensitive data, but maintaining the search patterns was too complex, and continuously re-scanning files each month was costly. Instead, we use Amazon Macie to continuously scan our S3 buckets for sensitive data. After Macie completes its initial scan, it scans only new or updated objects in those S3 buckets, which reduces our costs significantly. We also added filter rules that exclude larger files and limit scanning to the S3 prefixes that hold the objects we need to inspect, and we set a sampling rate to further optimize the cost of scanning large S3 buckets (in our case, S3 buckets greater than 1 TB).
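The scoping idea can be illustrated in plain Python. This sketch is not the Amazon Macie API; it is a hypothetical selector, `select_for_scan`, showing how prefix filters, a maximum object size, and a sampling percentage narrow the set of objects a scan would inspect. The prefixes, size limit, and sampling rate in the usage block are made-up values, and the hash-based sampling is one assumed way to keep the sample stable across runs.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class S3Object:
    key: str
    size_bytes: int


def in_sample(key: str, sampling_percent: int) -> bool:
    """Deterministically keep roughly sampling_percent% of keys (stable across runs)."""
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
    return bucket < sampling_percent


def select_for_scan(objects: Iterable[S3Object], include_prefixes: List[str],
                    max_size_bytes: int, sampling_percent: int) -> List[S3Object]:
    """Apply prefix, size, and sampling filters, in that order."""
    selected = []
    for obj in objects:
        if not any(obj.key.startswith(p) for p in include_prefixes):
            continue  # outside the prefixes we care about
        if obj.size_bytes > max_size_bytes:
            continue  # exclude large files to control cost
        if in_sample(obj.key, sampling_percent):
            selected.append(obj)
    return selected


if __name__ == "__main__":
    objs = [S3Object(f"uploads/file-{i}.csv", 1_000) for i in range(1000)]
    objs.append(S3Object("uploads/huge.bin", 5 * 1024**3))
    objs.append(S3Object("assets/logo.png", 2_000))
    picked = select_for_scan(objs, ["uploads/"], 100 * 1024**2, 20)
    print(f"{len(picked)} of {len(objs)} objects selected")
```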

Scan for vulnerabilities with Amazon Inspector

Because we use a wide variety of operating systems and software, we must scan our Amazon Elastic Compute Cloud (Amazon EC2) instances for known software vulnerabilities, such as those in Log4j.

We use Amazon Inspector to run continuous vulnerability scans on our EC2 instances and Amazon Elastic Container Registry (Amazon ECR) container images. With Amazon Inspector, we can continuously detect if our developers are deploying and releasing vulnerable software on our EC2 instances and ECR images without setting up a third-party vulnerability scanner and installing additional endpoint agents.

Aggregate security findings with AWS Security Hub

We don’t want our security analysts to arbitrarily act on one security finding over another. This is time-consuming and does not properly prioritize the highest risks to address. We also need to track ownership, view progress of various findings, and build consistent responses for common security findings.

With AWS Security Hub, our analysts can seamlessly prioritize findings from GuardDuty, Macie, Amazon Inspector, and many other AWS services. Our analysts also use Security Hub’s built-in security checks and insights to identify AWS resources and accounts that have a high number of findings and act on them.
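As a rough illustration of that prioritization, the Python sketch below sorts findings by `Severity.Label` and counts high-severity findings per account, similar in spirit to a Security Hub insight. It assumes findings in the AWS Security Finding Format (ASFF) have already been exported, for example as JSON; the helper names are hypothetical and the sample findings are invented and trimmed to the few fields used here.

```python
from collections import Counter
from typing import Dict, List, Tuple

SEVERITY_RANK = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3, "INFORMATIONAL": 4}


def rank(finding: Dict) -> int:
    """Lower is more severe; unknown labels sort last."""
    return SEVERITY_RANK.get(finding["Severity"]["Label"], len(SEVERITY_RANK))


def prioritize(findings: List[Dict]) -> List[Dict]:
    """Order ASFF findings from most to least severe."""
    return sorted(findings, key=rank)


def high_risk_accounts(findings: List[Dict], min_label: str = "HIGH") -> List[Tuple[str, int]]:
    """Count findings at or above min_label per AWS account, busiest account first."""
    threshold = SEVERITY_RANK[min_label]
    counts = Counter(f["AwsAccountId"] for f in findings if rank(f) <= threshold)
    return counts.most_common()


sample = [
    {"Id": "f1", "AwsAccountId": "111111111111", "ProductName": "Inspector",
     "Severity": {"Label": "MEDIUM"}},
    {"Id": "f2", "AwsAccountId": "222222222222", "ProductName": "GuardDuty",
     "Severity": {"Label": "CRITICAL"}},
    {"Id": "f3", "AwsAccountId": "222222222222", "ProductName": "Macie",
     "Severity": {"Label": "HIGH"}},
]

if __name__ == "__main__":
    print([f["Id"] for f in prioritize(sample)])
    print(high_risk_accounts(sample))
```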

Setting up the threat detection services

This is how we set up these services:

Assigned a security tooling account (covered in a previous blog post in this series) as the delegated administrator for these services. The delegated administrator configures the services and aggregates findings from other member accounts.
Used the AWS Security Reference Architecture and its associated scripts to assist with setup, which helped ensure that we configured the security services according to best practices.
Used Security Hub's new multi-Region aggregation to aggregate all findings into our primary Region.
Integrated Jira with the security tooling account, following the steps in How to set up a two-way integration between AWS Security Hub and Jira Service Management, to track ownership and remediation status.

Our security analysts use Security Hub-generated Jira tickets to view, prioritize, and respond to all security findings and misconfigurations across our AWS environment.

Through this configuration, our analysts no longer need to pivot between various AWS accounts, security tool consoles, and Regions, which makes the day-to-day management and operations much easier. Figure 1 depicts the data flow to Security Hub.

Figure 1. Aggregation of security services in security tooling account

Figure 2. Delegated administrator setup

Incident response

Before hypergrowth, there was no formal way to respond to security incidents. To prevent future security issues, we built incident response plans and processes to quickly address potential security incidents and minimize their impact and exposure. Following the AWS Security Incident Response Guide and the NIST framework, we adopted the following best practices.

Playbooks and runbooks for repeatability

We developed incident response playbooks and runbooks for repeatable responses for security events that include:

Playbooks for more strategic scenarios and responses, based on some publicly available sample playbooks.
Runbooks that provide step-by-step guidance for our security analysts to follow in case an event occurs. We used Amazon SageMaker notebooks and AWS Systems Manager Incident Manager runbooks to develop repeatable responses for pre-identified incidents, such as suspected command and control activity on an EC2 instance.

Automation for quicker response time

After developing our repeatable processes, we identified areas where we could accelerate responses to security threats by automating the response. We used the AWS Security Hub Automated Response and Remediation solution as a starting point.

By using this solution, we didn’t need to build our own automated response and remediation workflow. The code is also easy to read, repeat, and centrally deploy through AWS CloudFormation StackSets. We used some of the built-in remediations like disabling active keys that have not been rotated for more than 90 days, making all Amazon Elastic Block Store (Amazon EBS) snapshots private, and many more. With automatic remediation, our analysts can respond quicker and in a more holistic and repeatable way.
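The logic behind the unrotated-key remediation is simple to sketch. This Python example is not the solution's code; it assumes a list of access key records with hypothetical field names (`user`, `key_id`, `status`, `created`) and returns the active keys older than the 90-day threshold mentioned above. An automated remediation would then deactivate those keys in IAM, for example by setting their status to Inactive.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

MAX_KEY_AGE = timedelta(days=90)


def keys_to_disable(key_records: List[Dict], now: Optional[datetime] = None) -> List[str]:
    """Return IDs of active keys created more than MAX_KEY_AGE ago."""
    now = now or datetime.now(timezone.utc)
    return [
        rec["key_id"]
        for rec in key_records
        if rec["status"] == "Active" and now - rec["created"] > MAX_KEY_AGE
    ]


if __name__ == "__main__":
    now = datetime(2022, 4, 7, tzinfo=timezone.utc)
    records = [
        {"user": "ci-bot", "key_id": "AKIAEXAMPLEOLD", "status": "Active",
         "created": now - timedelta(days=120)},
        {"user": "alice", "key_id": "AKIAEXAMPLENEW", "status": "Active",
         "created": now - timedelta(days=10)},
        {"user": "bob", "key_id": "AKIAEXAMPLEOFF", "status": "Inactive",
         "created": now - timedelta(days=400)},
    ]
    print(keys_to_disable(records, now))  # ['AKIAEXAMPLEOLD']
```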

Simulations to improve incident response capabilities

We implemented quarterly incident response simulations. These simulations test how well prepared our people, processes, and technologies are for an incident. We included some cloud-specific simulations like an S3 bucket exposure and an externally shared Amazon Relational Database Service (Amazon RDS) snapshot to ensure our security staff are prepared for an incident in the cloud. We use the results of the simulations to iterate on our incident response processes.

Conclusion

In this blog post, we discussed how to prepare for, detect, and respond to security events in an AWS environment. We identified security services to detect security events, vulnerabilities, and misconfigurations. We then discussed how to develop incident response processes through building playbooks and runbooks, performing simulations, and automation. With these new capabilities, we can detect and respond to a security incident throughout hypergrowth.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Let’s Architect! Architecting microservices with containers

2022-04-06 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-microservices-with-containers/

Microservices structure an application as a set of independently deployable services. They speed up software development and allow architects to quickly update systems to adhere to changing business requirements.

According to best practices, the different services should be loosely coupled, organized around business capabilities, independently deployable, and owned by a single team. If applied correctly, there are multiple advantages to using microservices. However, working with microservices can also bring challenges. In this edition of Let’s Architect!, we explore the advantages, mental models, and challenges deriving from microservices with containers.

Application integration patterns for microservices

As Tim Bray said in his time with AWS, “If your application is cloud native, large scale, or distributed, and doesn’t include a messaging component, that’s probably a bug.”

This video evaluates several design patterns based on messaging and shows you how to implement them in your workloads to achieve the full capabilities of microservices. You’ll learn some fundamental application integration patterns and some of the benefits that asynchronous messaging can have over REST APIs for communication between microservices.

The scatter-gather pattern scales parallel processing across nodes and aggregates the results in a queue
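To make the pattern concrete, here is a minimal in-process Python sketch: a request is scattered to several workers running in parallel, each worker puts its reply on a shared queue, and an aggregator gathers the replies and combines them. In a real microservices system the workers would be separate services reached through a message broker; the `quote` worker, the vendor names, and the "cheapest quote wins" aggregation are hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from typing import List


def quote(vendor: str, item: str) -> dict:
    """Hypothetical worker: a vendor service that prices an item."""
    prices = {"vendor-a": 12.0, "vendor-b": 9.5, "vendor-c": 11.0}
    return {"vendor": vendor, "item": item, "price": prices[vendor]}


def scatter_gather(item: str, vendors: List[str]) -> dict:
    """Scatter one request to all vendors, gather replies from a queue, aggregate."""
    replies: "queue.Queue[dict]" = queue.Queue()
    with ThreadPoolExecutor(max_workers=len(vendors)) as pool:
        # Scatter: send the same request to every worker in parallel.
        futures = [pool.submit(lambda v=vendor: replies.put(quote(v, item)))
                   for vendor in vendors]
    for f in futures:
        f.result()  # re-raise any worker error instead of silently losing a reply
    # Gather: all workers have finished, so the queue holds one reply per vendor.
    results = [replies.get_nowait() for _ in vendors]
    return min(results, key=lambda r: r["price"])  # aggregate: cheapest quote wins


if __name__ == "__main__":
    print(scatter_gather("widget", ["vendor-a", "vendor-b", "vendor-c"]))
```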

Distributed monitoring

Customers often cite monitoring as one of the main challenges while working with containers. Monitoring collects operational data as logs, metrics, events, and traces to identify and respond to issues quickly and minimize disruptions.

This whitepaper covers cross-service challenges in microservices, including service discovery, distributed monitoring, and auditing. You’ll learn about the role of DNS and service meshes in interservice communication and discovery and the tools available for monitoring your clusters that run containers and for logging.

This view from AWS X-Ray shows how a request can be tracked across different services. This is implemented by taking advantage of correlation IDs
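Correlation IDs work by generating an ID once at the edge and passing it along on every downstream call, so that logs from different services can be joined for a single request. The Python sketch below shows the idea with plain functions and a `contextvars` variable; the `X-Correlation-ID` header name, the in-memory `LOG`, and the two service functions are hypothetical (AWS X-Ray itself propagates its own `X-Amzn-Trace-Id` header).

```python
import contextvars
import uuid

HEADER = "X-Correlation-ID"  # hypothetical header name
correlation_id = contextvars.ContextVar("correlation_id", default=None)
LOG = []  # stand-in for a centralized log store


def log(service: str, message: str) -> None:
    """Every log line carries the ID of the request being handled."""
    LOG.append(f"[{correlation_id.get()}] {service}: {message}")


def handle(headers: dict, service: str, downstream=None) -> None:
    """Reuse the caller's ID, or mint one at the edge, then pass it downstream."""
    cid = headers.get(HEADER) or str(uuid.uuid4())
    token = correlation_id.set(cid)
    try:
        log(service, "request received")
        if downstream is not None:
            downstream({HEADER: cid})  # propagate the ID on the outgoing call
    finally:
        correlation_id.reset(token)


def payment_service(headers: dict) -> None:
    handle(headers, "payments")


def order_service(headers: dict) -> None:
    handle(headers, "orders", downstream=payment_service)


if __name__ == "__main__":
    order_service({})  # edge request without an ID
    print("\n".join(LOG))
```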

Create a pipeline with canary deployments for Amazon ECS using AWS App Mesh

When architects deploy a new version of an application, they want to test it on a set of users before routing all the traffic to the new version. This is known as a “canary deployment.” A canary deployment can automatically switch traffic back to the old version if some inconsistencies are detected. This decreases the impact of the bug(s) introduced in the new release. For microservices, this is helpful when testing a complex distributed system because you can send a percentage of traffic to newer versions in a controlled manner.
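The control loop behind a canary deployment can be sketched in a few lines of Python. This is a hypothetical, tool-agnostic illustration rather than how App Mesh or Step Functions implement it: requests are split by weight, the new version's weight is raised step by step, and if the observed error rate crosses a threshold all traffic returns to the old version. The step sizes, the 5 percent threshold, and the `error_rate_for` callback are assumptions.

```python
import random
from typing import Callable, Sequence, Tuple


def route(new_weight: int, rng: random.Random) -> str:
    """Send roughly new_weight percent of requests to the new version (v2)."""
    return "v2" if rng.randrange(100) < new_weight else "v1"


def run_canary(
    error_rate_for: Callable[[int], float],
    steps: Sequence[int] = (10, 25, 50, 100),
    max_error_rate: float = 0.05,
) -> Tuple[str, int]:
    """Shift traffic to v2 step by step; roll back if its error rate is too high.

    error_rate_for(weight) is a hypothetical callback returning v2's observed
    error rate while v2 receives `weight` percent of traffic.
    Returns (outcome, final percentage of traffic on v2).
    """
    for weight in steps:
        # With App Mesh, a step would correspond to updating a route's weighted targets.
        if error_rate_for(weight) > max_error_rate:
            return "rolled back", 0  # all traffic returns to v1
    return "promoted", steps[-1]


if __name__ == "__main__":
    rng = random.Random(42)
    share = sum(route(25, rng) == "v2" for _ in range(10_000)) / 10_000
    print(f"v2 share at weight 25: {share:.2%}")
    print(run_canary(lambda w: 0.01))
    print(run_canary(lambda w: 0.20 if w >= 50 else 0.01))
```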

A service mesh provides application-level networking so your services can communicate with each other across multiple types of compute infrastructure. This blog post shows how to use AWS App Mesh to implement a canary deployment strategy, using AWS Step Functions to orchestrate the different steps during testing and AWS CodePipeline for continuous delivery of each microservice.

An overview of the architecture used to create the pipeline and perform the canary deployments

Running microservices in Amazon EKS with AWS App Mesh and Kong

Distributed architectures bring up several questions. How do we expose our APIs towards client-side applications? How do our microservices communicate?

This blog post answers these questions with a solution that uses Amazon Elastic Kubernetes Service (Amazon EKS) in conjunction with AWS App Mesh and Kong. App Mesh helps you manage the security and discoverability of your microservices, while Kong protects your service mesh and runs side by side with your application services.

The Kong for Kubernetes architecture can be implemented using Amazon EKS and AWS App Mesh

See you next time!

See you in a couple of weeks when we discuss open source technologies on AWS!

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Selecting the appropriate discovery tool for your cloud migration

2022-04-04 David Ninnis

Post Syndicated from David Ninnis original https://aws.amazon.com/blogs/architecture/selecting-the-appropriate-discovery-tool-for-your-cloud-migration/

Cloud migrations invariably require the coordination of multiple stakeholders, such as business and technical teams, partners, and third-party providers. As a stakeholder, understanding your portfolio is crucial to determine which workloads to migrate, and their requirements and interdependencies. But manually gathering these insights can be a daunting task. You can inform your decision by provisioning a discovery tool.

Given the variety of choices available, choosing the appropriate discovery tool for your use case can also be challenging. In this blog post, we explain a proven three-step technique to successfully filter and prioritize a list of discovery tools based on the essential features needed for your business.


Figure 1. Steps to determine your migration discovery tool

Step 1: Review

Review the outcomes that your migration journey should deliver. These will drive your discovery requirements and baseline the features needed. Compare the baseline features with existing tools within your organization such as Configuration Management Databases (CMDBs) or Application Performance Management (APM) tooling. By the end of this Review step, you should establish if your in-house tools are sufficient for your objectives. Here is a list of questions to support your migration analysis.

You may need to collect high-level data or complete datasets depending on your stage in the migration process. You may be exploring migration costs to assess the lift-and-shift migration threshold of a benefit program such as AWS Migration Acceleration Program (MAP). In this case, you probably only need a snapshot of your on-premises environment with a list of servers, their configurations, and attached licenses. You may be evaluating the Total Cost of Ownership (TCO) between your on-premises infrastructure and an elastic deployment model on the cloud. In this case, you’ll want to be able to estimate the cost of a right-sized cloud infrastructure. For that, you will want to run a discovery tool for the duration of a business cycle.
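Running discovery for a full business cycle matters because right-sizing is usually driven by peak rather than average utilization. The Python sketch below shows one simple approach under stated assumptions: take the peak CPU and memory actually used, add headroom, and pick the cheapest option from a small instance catalog. The catalog names and prices, the 20 percent headroom, and the helper `right_size` are hypothetical, not AWS pricing or sizing guidance.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


# Hypothetical catalog; use real instance types and prices in practice.
CATALOG = [
    InstanceType("small", 2, 4, 0.05),
    InstanceType("medium", 4, 8, 0.10),
    InstanceType("large", 8, 16, 0.20),
    InstanceType("xlarge", 16, 32, 0.40),
]


def right_size(vcpus: int, memory_gib: float, cpu_samples: List[float],
               mem_samples: List[float], headroom: float = 0.2) -> Optional[InstanceType]:
    """Pick the cheapest type that covers peak usage plus headroom.

    cpu_samples and mem_samples are utilization fractions (0 to 1) collected
    over a business cycle; returns None if nothing in the catalog fits.
    """
    needed_vcpus = vcpus * max(cpu_samples) * (1 + headroom)
    needed_mem = memory_gib * max(mem_samples) * (1 + headroom)
    fits = [t for t in CATALOG if t.vcpus >= needed_vcpus and t.memory_gib >= needed_mem]
    return min(fits, key=lambda t: t.hourly_usd, default=None)


if __name__ == "__main__":
    # A 16 vCPU / 64 GiB on-premises server that peaks at 30% CPU and 20% memory.
    print(right_size(16, 64, [0.10, 0.30, 0.15], [0.15, 0.20]))
```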

Finally, to decide how and when applications are going to migrate, you will need a complete and accurate dataset with the application and database features to be migrated. This must include network dependencies, non-functional requirements (NFRs), disaster recovery (DR) plans, and third-party licensing terms and conditions (T&C).

Conclude this step by baselining the features that you require in the discovery tool.

Step 2: Refine

This step is an elimination process. One of the resources you can use to compile a list of candidate discovery tools is the Discovery migration tool comparison page. Filter and sort the list using the following criteria categories:

Core features
Common features
Special features
Tool provisioning
Operation

By the end of this step, you should have a prioritized list of the discovery tools that optimally match your requirements.

Core features

The following are the basic features that you should expect from any discovery tool. Analyzing the data captured by these features will support your high-level business case.

Automatic inventory collection. Reports on the infrastructure profile, such as CPU family, CPU cores, memory size, disk size and speed, and operating system.
Utilization. Shows peak and average utilization of CPU, memory, and disk.
Network storage discovery. Detects and profiles network shares from network-attached storage (NAS).
Software. Identifies running processes and installed software, pinpointing database engines and their versions.
Network scanning. Scans network subnets to discover unknown infrastructure assets.

Common features

Here are some common features of discovery tools. With this data, you will be able to create a more detailed TCO analysis and migration plan.

Lift-and-shift cost estimation. Maps a recommended target AWS infrastructure for the rehost of the source infrastructure, and calculates the AWS cost.
Target sizing recommendation. Maps and calculates the cost for alternative target AWS infrastructures based on the peak and average utilization.
TCO analysis. Provides a cost comparison between current on-premises cost and projected AWS cost.
Dependency mapping. Collects network connection information and builds inbound and outbound dependency maps of the servers and running applications. Infers applications from groups of infrastructure resources based on communication patterns.
Application prioritization. Assigns weight or relevance to application and infrastructure attributes to create prioritization criteria for migration.
Wave planning. Recommends groups of applications and the ability to create migration wave plans.
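Dependency mapping and wave planning fit together: servers that talk to each other usually need to move together. The Python sketch below groups servers into candidate move groups by finding connected components in observed network connections, then packs those groups into waves of bounded size without splitting a group. The connection data, server names, wave-size limit, and helper names are hypothetical; discovery tools apply far richer logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def move_groups(connections: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group servers that are linked, directly or indirectly, by any connection."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for src, dst in connections:
        graph[src].add(dst)
        graph[dst].add(src)  # treat inbound and outbound dependencies alike
    seen: Set[str] = set()
    groups = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups


def plan_waves(groups: List[Set[str]], max_servers_per_wave: int) -> List[List[str]]:
    """Pack whole groups into waves, largest first (first-fit).

    A group larger than the limit gets a wave of its own.
    """
    waves: List[List[str]] = []
    for group in sorted(groups, key=len, reverse=True):
        for wave in waves:
            if len(wave) + len(group) <= max_servers_per_wave:
                wave.extend(sorted(group))
                break
        else:
            waves.append(sorted(group))
    return waves


if __name__ == "__main__":
    conns = [("web1", "app1"), ("app1", "db1"), ("web2", "app2"), ("batch1", "db2")]
    print(plan_waves(move_groups(conns), max_servers_per_wave=4))
```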

Special features

Special features map to less common requirements, or to a specific set of the workloads you want to migrate. For example, you may require your tools to collect database dependency information if databases form a significant part of your workloads. If you must follow strict regulatory requirements (such as HIPAA or GDPR), you'll need tools that comply with those regulations. Other examples are:

Licensing analysis. Provides optimization recommendations for Microsoft SQL Server and Oracle systems in rehosting and replatforming scenarios.
Enterprise platforms. Includes the ability to collect details from proprietary operating systems like AIX and Solaris, or infrastructure such as AS/400 and mainframe.

When evaluating each feature, consider how much of your environment it applies to, and how important it is for the overall objective. It’s a good practice to deprioritize, rather than completely eliminate, tools that don’t provide a specific feature.
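One way to apply this advice is a weighted score. The sketch below is a hypothetical scoring scheme, not a method prescribed by this post. Each requirement gets an importance weight and a coverage fraction (how much of your environment it applies to). A tool that lacks a feature simply earns fewer points instead of being removed from the list.

```python
from typing import Dict, List, Set, Tuple


def score_tools(
    requirements: Dict[str, Tuple[float, float]],
    tools: Dict[str, Set[str]],
) -> List[Tuple[str, float]]:
    """Rank tools by weighted feature fit.

    requirements maps a feature name to (importance, coverage), where importance is a
    weight you assign and coverage is the fraction (0.0-1.0) of your environment the
    feature applies to. tools maps a tool name to the set of features it supports.
    """
    scored = []
    for tool, features in tools.items():
        # A missing feature lowers the score instead of eliminating the tool.
        score = sum(
            importance * coverage
            for feature, (importance, coverage) in requirements.items()
            if feature in features
        )
        scored.append((tool, round(score, 2)))
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

For example, with dependency mapping weighted 5 at full coverage, licensing analysis weighted 3 at 40% coverage, and mainframe support weighted 4 at 10% coverage, a tool offering only the first two scores 6.2 and still outranks a tool offering dependency mapping plus mainframe support (5.4).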

Provisioning

Once you run through the minimum, common, and special feature requirements, further refine the tool list by evaluating the challenges associated with the provisioning process of each tool. For example, consider aspects like data residency and the cost model. Read Evaluating the need for discovery tooling for a detailed list of provisioning criteria.

Operations

Finally, refine your tool list further by evaluating the requirements to operate the tool. This includes considerations like the running cost and the support model.

Step 3: Select

At this stage, you should have a shortlist of preferred discovery tools, with only one or two tools remaining for final evaluation.

All of these shortlisted tools should fulfill your requirements. You can further refine your selection by choosing the tool that best fits your priorities. For example, if only two tools remain in your shortlist and ease of installation and operation are paramount, then select the tool with the highest levels of deployment automation. If cost is your main constraint, then select the least expensive tool to acquire and operate.

Conclusion

Tools from AWS and AWS Partners can help accelerate your migration to the AWS Cloud. To select the relevant discovery tool for your specific use case, we recommend the following proven three-step approach:

Review – Start by reviewing existing in-house capabilities, tools, and data sources. By the end of this step, you should establish whether your in-house tools are sufficient for your cloud migration objectives.
Refine – Narrow down a list of candidate discovery tools by filtering and prioritizing them based on the requirements from the previous step.
Select – From the final list of suitable discovery tools, select the tool that best addresses your priorities.

How Net at Work built an email threat report system on AWS

2022-03-31 Florian Mair

Post Syndicated from Florian Mair original https://aws.amazon.com/blogs/architecture/how-net-at-work-built-an-email-threat-report-system-on-aws/

Emails are often used as an entry point for malicious software like trojan horses, rootkits, or encryption-based ransomware. The NoSpamProxy offering developed by Net at Work tackles this threat, providing secure and confidential email communication.

A subservice of NoSpamProxy called 32guards is responsible for threat reports of inbound and outbound emails. With the increasing number of NoSpamProxy customers, 32guards was found to have several limitations. 32guards was previously built on a relational database. But with the growth in traffic, this database was not able to keep up with storage demands and expected query performance. Further, performance constraints of the relational database schema limited the ability to detect complex patterns. The NoSpamProxy team decided to rearchitect the service based on the Lake House approach.

The goal was to move away from a one-size-fits-all approach for data analytics and integrate a data lake with purpose-built data stores, unified governance, and smooth data movement.

This post shows how Net at Work modernized their 32guards service, from a relational database to a fully serverless analytics solution. With adoption of the Well-Architected Analytics Lens best practices and the use of fully managed services, the 32guards team was able to build a production-ready application within six weeks.

Architecture for email threat reports and analytics

This section gives a walkthrough of the solution’s architecture, as illustrated in Figure 1.

Figure 1. 32guards threat reports architecture

1. The entry point is an Amazon API Gateway, which receives email metadata in JSON format from the NoSpamProxy fleet. The message contains information about the email in general, email attachments, and URLs in the email. As an example, a subset of the data is presented in JSON as follows:

{
...
"Attachments": [
    {
      "Sha256Hash": "69FB43BD7CCFD79E162B638596402AD1144DD5D762DEC7433111FC88EDD483FE",
      "Classification": 0,
      "Filename": "test.ods.tar.gz",
      "DetectedMimeType": "application/tar+gzip",
      "Size": 5895
    }
],
"Urls": [
    {
      "Url": "http://www.aarhhie.work/",
      "Classification": 0
    },
    {
      "Url": "http://www.netatwork.de/",
      "Classification": 0
    },
    {
      "Url": "http://aws.amazon.com/",
      "Classification": 0
    }
]
}

2. This JSON message is forwarded to an AWS Lambda function (called “frontend”), which takes care of the further downstream processing. There are two activities the Lambda function initiates:

Forwarding the record for real-time analysis/storage
Generating a threat report based on the information derived from the data stored in the indicators of compromises (IOCs) Amazon DynamoDB table

IOCs are patterns within the email metadata that are used to determine whether emails are safe. For example, an IOC could be a suspicious file attachment or domain.

Threat report for suspicious emails

In the preceding JSON message, the attachments and URLs have been classified with “0” by the email service itself, which indicates that none of them look suspicious. The frontend Lambda function uses the large number of IOCs stored in the DynamoDB table, along with heuristics, to determine any potential threats within the email. DynamoDB provides fast lookup times for generating the threat report. For this example, the response returned to API Gateway in step 2 looks like this:

{
"ReportedOnUtc": "2021-10-14T14:33:34.5070945Z",
"Reason": "realtimeSuspiciousOrganisationalDomain",
"Identifier": "aarhhie.work",
...
}

This threat report shows that the domain “aarhhie.work” has been detected as suspicious. The report is used to determine further actions for the email, such as blocking.
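The IOC lookup step can be sketched in Python as follows. This is a minimal illustration, assuming a plain `dict` in place of the DynamoDB IOC table and a naive two-label domain reduction. It covers only exact IOC matches, not the heuristics the frontend Lambda function also applies. The helper names, and any reason string other than `realtimeSuspiciousOrganisationalDomain`, are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import urlsplit


def organizational_domain(url: str) -> Optional[str]:
    """Naively reduce a URL to its last two host labels (www.aarhhie.work -> aarhhie.work).

    A production system would consult the Public Suffix List instead.
    """
    host = urlsplit(url).hostname
    if not host:
        return None
    return ".".join(host.lower().split(".")[-2:])


def build_threat_report(metadata: dict, iocs: Dict[str, str]) -> Optional[dict]:
    """Return a report for the first IOC hit, or None if nothing matches.

    `iocs` maps an identifier (file hash or domain) to a reason string; it stands in
    for the DynamoDB IOC table used by the real service.
    """
    # Attachment hashes are compared in upper case, as they appear in the metadata.
    candidates = [a["Sha256Hash"].upper() for a in metadata.get("Attachments", [])]
    domains = (organizational_domain(u["Url"]) for u in metadata.get("Urls", []))
    candidates += [d for d in domains if d]
    for identifier in candidates:
        reason = iocs.get(identifier)
        if reason:
            return {
                "ReportedOnUtc": datetime.now(timezone.utc).isoformat(),
                "Reason": reason,
                "Identifier": identifier,
            }
    return None
```

Note that `isoformat()` emits a `+00:00` offset rather than the `Z` suffix shown in the sample report; either form denotes UTC.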

Real-time data processing

3. In the real-time analytics flow, the frontend Lambda function ingests email metadata into a data stream using Amazon Kinesis Data Streams. This is a massively scalable, serverless, and durable real-time data streaming service. Compared to a queue, streaming storage permits more than one consumer of the same data.

4. The first consumer is an Apache Flink application running in Amazon Kinesis Data Analytics. This application generates statistical metrics (for example, occurrences of the top-level domain “.work”). The output is stored in Apache Parquet format on Amazon S3. Parquet is a columnar storage format, in contrast to row-based formats such as CSV.
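The kind of statistic the Flink application produces can be illustrated without Flink. The following plain-Python sketch is not Flink API code. It counts top-level domains per tumbling window; the 60-second window and the `(epoch_seconds, url)` event shape are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple
from urllib.parse import urlsplit


def tld_counts_per_window(
    events: Iterable[Tuple[int, str]], window_seconds: int = 60
) -> Dict[int, Counter]:
    """Count top-level domains per tumbling window.

    events yields (epoch_seconds, url) pairs; the result maps each window's start
    time to a Counter of TLDs such as "work" or "de".
    """
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, url in events:
        host = urlsplit(url).hostname
        if not host or "." not in host:
            continue  # skip malformed URLs and bare hostnames
        window_start = ts - (ts % window_seconds)
        windows[window_start][host.rsplit(".", 1)[1].lower()] += 1
    return dict(windows)
```

In the real pipeline, Flink handles the windowing, state, and late events; this sketch only shows the aggregation being computed.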

The second consumer of the streaming data is Amazon Kinesis Data Firehose. Kinesis Data Firehose is a fully managed solution to reliably load streaming data into data lakes, data stores, and analytics services. Within the 32guards service, Kinesis Data Firehose is used to store all email metadata into Amazon S3. The data is stored in Apache Parquet format, which makes queries more time and cost efficient.

IOC detection

Now that we have shown how data is ingested and how threat reports are generated to respond quickly to requests, let’s look at how the IOCs are updated. These IOCs are used for generating the threat report within the “frontend” Lambda function. As attack vectors change over time, quickly analyzing the data for new threats is crucial to provide high-quality reports to the NoSpamProxy service.

The incoming email metadata is stored every few minutes in Amazon S3 by Kinesis Data Firehose. To query data directly in Amazon S3, Amazon Athena is used. Athena is a serverless query service that analyzes data stored in Amazon S3, by using standard SQL syntax.

5. To query data in S3, Amazon Athena uses the AWS Glue Data Catalog, which contains the structure of the email metadata stored in the data lake. AWS Glue crawlers derive this structure from the data itself. Other downstream consumers, such as business intelligence applications, also use Amazon Athena to consume the data.

6. Athena queries are initiated on a predefined schedule to update or generate new IOCs. The results of these queries are stored in the DynamoDB table to enable fast lookup times for the “frontend” Lambda.
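Storing Athena query results in DynamoDB needs a small translation step. The sketch below assumes the response shape of the Athena `GetQueryResults` API: rows of `Data` cells holding `VarCharValue`, with the header row first. The `Identifier` and `Reason` columns are hypothetical. The sketch handles only a single result page.

```python
from typing import Dict, List


def athena_rows_to_items(result: dict) -> List[Dict[str, str]]:
    """Convert an Athena get_query_results response into DynamoDB-ready items.

    For SELECT queries, the first row of the first page holds the column names.
    NULL cells come back without a "VarCharValue" key, so they are left out.
    """
    rows = result["ResultSet"]["Rows"]
    if not rows:
        return []
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    items = []
    for row in rows[1:]:
        item = {
            name: cell["VarCharValue"]
            for name, cell in zip(header, row["Data"])
            if "VarCharValue" in cell
        }
        if item:
            items.append(item)
    return items
```

In a scheduled job, the result would typically come from `boto3.client("athena").get_query_results(QueryExecutionId=...)`. Each item could then be written with `put_item(Item=item)` inside a DynamoDB `Table.batch_writer()` context. Pagination through `NextToken` is omitted here.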

Conclusion

In this blog post, we showed how Net at Work modernized their 32guards service within their NoSpamProxy product. The previous architecture used a relational database to ingest and store email metadata. This database was running into performance and storage issues, so the service had to be redesigned around a more performant and scalable architecture.

Amazon S3 is used as the storage layer, which can scale up to exabytes of data. With Amazon Athena as the query engine, there is no need to operate a high-performance database cluster, because compute and storage are separated. By using Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics, valuable insight can be generated in real time, and acted upon more quickly.

As a serverless, fully managed solution, the 32guards service has a cost footprint up to 50% lower than before and requires less maintenance. By moving away from a relational database model, query runtimes decreased significantly, and the team can now conduct analyses that were not feasible before.

Interested in the NoSpamProxy? Read more about NoSpamProxy or sign up for a free trial.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Build a multi-language notification system with Amazon Translate and Amazon Pinpoint

2022-03-29 Praveen Allam

Post Syndicated from Praveen Allam original https://aws.amazon.com/blogs/architecture/build-a-multi-language-notification-system-with-amazon-translate-and-amazon-pinpoint/

Organizations with global operations can struggle to notify their customers of any business-related announcements or notifications in different languages. Their customers want to receive notifications in their local language and communication preference. Organizations often rely on complicated third-party services or individuals to manually translate the notifications. This can lead to a loss of revenue due to delayed communication and additional operational expenses.

This blog post demonstrates how to build a straightforward, cost-effective, and scalable multi-language notification system using AWS Serverless technologies. You can post a business-related announcement or notification in English, and based on the customer profile data, the system translates it into different languages. The system then delivers these translated announcements or notifications by email, voice, or SMS.

Example of a multi-language notification use case

A restaurant franchise company is adding a new item to their menu and plans to release it in North America, Germany, and France. The corporate office has decided to send the following notification.

The company is adding a new item to the menu, and this will go live by May 10. Please ensure you are prepared for this change and plan accordingly.

The franchise owners in Germany want to receive the notifications in the German language, whereas the franchise owners in France want to receive it in French. North American franchises want to receive it in English.

Solution design for multi-language notification system

The solution in Figure 1 demonstrates how to build a multi-language notification system using Amazon Translate and Amazon Pinpoint.

AWS Serverless technologies provide automatic scaling, built-in high availability, and a pay-for-use billing model, which increases agility and optimizes costs. The system built with this solution is invoked using REST API endpoints. Once this solution is deployed, it can be integrated with any frontend application where users can log in and send out notification events.

Figure 1 illustrates the architecture of this solution.

The architecture includes all of the AWS services required by this solution. The flow is described as follows.

Figure 1. Solution architecture for multi-language notification system

1. A restaurant franchise user logs in to the UI and types the notification message in English. Upon submission, the notification message is sent to the Amazon API Gateway REST endpoint.
Note: In this solution, there is no UI available. You will use a terminal to submit the message.

2. Amazon API Gateway will send this message to Amazon Simple Queue Service (SQS), which will keep the HTTP requests asynchronous.

3. The SQS queue will invoke the SQS AWS Lambda function.

4. The SQS Lambda function invokes the AWS Step Functions state machine. This SQS Lambda function acts as a proxy that starts the state machine workflow. AWS Step Functions orchestrates the notification workflow process. The workflow validates the message, translates it into different languages, and notifies the customers by their preferred communication method (email, voice, or SMS). It also handles errors if any of the steps fail by using an SQS dead-letter queue.
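The proxy role of the SQS Lambda function can be sketched as follows. This sketch assumes each SQS message body is a JSON document and that the caller supplies the state machine ARN. `make_handler` is a hypothetical helper. `start_execution(stateMachineArn=..., input=...)` is the boto3 Step Functions call.

```python
import json


def make_handler(sfn_client, state_machine_arn: str):
    """Build a Lambda handler that starts one Step Functions execution per SQS message."""

    def handler(event, context):
        execution_arns = []
        for record in event.get("Records", []):
            # Each SQS record body carries the notification posted to API Gateway;
            # parsing it first rejects malformed JSON before starting a workflow.
            payload = json.loads(record["body"])
            response = sfn_client.start_execution(
                stateMachineArn=state_machine_arn,
                input=json.dumps(payload),
            )
            execution_arns.append(response["executionArn"])
        return {"executions": execution_arns}

    return handler
```

In a deployed function, you might create the handler once at module load, for example `handler = make_handler(boto3.client("stepfunctions"), os.environ["STATE_MACHINE_ARN"])`, where the environment variable name is hypothetical. Injecting the client keeps the function testable without AWS access.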

5. The message must be validated to ensure that it follows organizational standards. To perform the message validation, we use Amazon Comprehend. Comprehend sentiment analysis determines whether to send or flag the message. All flagged messages are sent for review.

For the preceding example message, the neutral sentiment score is 0.85. If you set the acceptable score to anything greater than 0.5, then it is a valid message. Once it passes the validation step, the workflow proceeds to the next step.
If the message is vague or unclear, the sentiment score might be less than 0.5. For example, for the message “We are adding a dish; be ready for it”, the sentiment score might be only 0.45. This is under the acceptable score, and the message will not be processed further.
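The validation rule can be expressed as a small function. This is a sketch under the assumption that the decision uses the `Neutral` score from the Comprehend `DetectSentiment` response, as in the example. The `SEND`/`FLAG` labels and the function name are hypothetical.

```python
def validate_message(sentiment_response: dict, threshold: float = 0.5) -> str:
    """Return "SEND" if the neutral score exceeds the threshold, otherwise "FLAG".

    sentiment_response has the shape returned by DetectSentiment, for example
    {"Sentiment": "NEUTRAL", "SentimentScore": {"Positive": ..., "Negative": ...,
    "Neutral": ..., "Mixed": ...}}.
    """
    neutral = sentiment_response["SentimentScore"]["Neutral"]
    # Flagged messages would be routed to human review instead of translation.
    return "SEND" if neutral > threshold else "FLAG"
```

The response would come from `boto3.client("comprehend").detect_sentiment(Text=message, LanguageCode="en")`. A score of 0.85 passes and 0.45 is flagged, matching the two examples.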

6. After the message is successfully validated, the message is translated into various languages depending on the customers’ profiles. The Translate Lambda function determines the set of unique languages required by referring to the customer profile data in the Amazon DynamoDB table. The function then uses Amazon Translate to translate the message into each language required for that notification event. In our example use case, the translated messages look as follows:

German (de):

Das Unternehmen fügt dem Menü einen neuen Punkt hinzu, der bis zum 10. Mai live geschaltet wird. Bitte stellen Sie sicher, dass Sie auf diese Änderung vorbereitet sind und planen Sie entsprechend.

French (fr):

La société ajoute un nouvel article au menu, qui sera mis en ligne d’ici le 10 mai. Assurez-vous d’être prêt pour ce changement et de planifier en conséquence.
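Translating once per unique language, rather than once per customer, keeps Amazon Translate calls to a minimum. The sketch below assumes each customer profile carries a hypothetical `Language` field with a language code such as `de` or `fr`. `translate_text` with `Text`, `SourceLanguageCode`, and `TargetLanguageCode` is the boto3 Amazon Translate call. The client is passed in so the function can be tested offline.

```python
from typing import Dict, Iterable


def translate_for_profiles(
    translate_client, message: str, profiles: Iterable[dict], source_lang: str = "en"
) -> Dict[str, str]:
    """Translate `message` once per unique language found in the customer profiles."""
    languages = sorted({p["Language"] for p in profiles})
    translations = {}
    for lang in languages:
        if lang == source_lang:
            translations[lang] = message  # no API call needed for the source language
            continue
        response = translate_client.translate_text(
            Text=message,
            SourceLanguageCode=source_lang,
            TargetLanguageCode=lang,
        )
        translations[lang] = response["TranslatedText"]
    return translations
```

In production, `translate_client` would be `boto3.client("translate")`.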

7. The last step in the workflow is to build the notification logic and deliver the notifications. The Amazon Pinpoint Lambda function retrieves the customer’s profile from the Amazon DynamoDB table. It then parses each record for a given notification event to find out the delivery mode (email, voice, or SMS message). The function then builds the notification logic using Amazon Pinpoint. Amazon Pinpoint notifies each customer either by email, voice, or SMS.
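Before calling Amazon Pinpoint, the function has to pair each customer with the right channel and localized text. The following sketch shows only that grouping step. The `Channel`, `Language`, and `Address` profile fields are hypothetical, and the actual send calls to Amazon Pinpoint are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

SUPPORTED_CHANNELS = {"EMAIL", "VOICE", "SMS"}


def plan_deliveries(
    profiles: Iterable[dict], translations: Dict[str, str]
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (address, localized message) pairs by delivery channel."""
    plan: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for profile in profiles:
        channel = profile["Channel"].upper()
        if channel not in SUPPORTED_CHANNELS:
            raise ValueError(f"unsupported channel: {profile['Channel']}")
        # Pick the translation that matches the customer's language.
        message = translations[profile["Language"]]
        plan[channel].append((profile["Address"], message))
    return dict(plan)
```

Each channel's list would then be handed to the corresponding Amazon Pinpoint channel. Failing fast on an unknown channel surfaces bad profile data instead of silently skipping a customer.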

Code repository

The code for this solution is available on GitHub. Review the README file for detailed instructions on how to download and run the solution in your AWS account.

Conclusion

Organizations that operate on an international basis often struggle to build a multi-language notification system to communicate business-related announcements or notifications to their customers in different languages. Communicating these announcements or notifications in a variety of formats such as email, voice, and SMS can be time-consuming. Our solution addresses these challenges using AWS services with fewer steps than traditional third-party options. This solution also features automatic scaling, built-in high availability, and a pay-for-use billing model to increase agility and optimize costs. These technologies not only reduce infrastructure management tasks like capacity provisioning and patching, but also provide a better customer experience.


Selecting the right database and database migration plan for your workloads

2022-03-28 Nikhil Anand

Post Syndicated from Nikhil Anand original https://aws.amazon.com/blogs/architecture/selecting-the-right-database-and-database-migration-plan-for-your-workloads/

There has been a tectonic shift in the approach to hosting enterprise workloads. Companies are rapidly moving from on-premises data centers to cloud-based services. The driving factor has been the ability to innovate faster on the cloud.

Your transition to cloud can be straightforward, but it does go beyond the usual ‘lift-and-shift’ approach. To start with, you’ll need a cloud provider that provides a data store and computing facility. But you’ll be able to grow your business with a provider that has purpose-built resources and a platform that supports innovation. Companies are using higher-level services such as fully managed databases and serverless compute offered by Amazon Web Services (AWS) to get the most out of their cloud adoption journey.

In this blog, we will focus on the database tier of enterprise applications. We will evaluate the reasons for moving to managed or purpose-built databases. Then we’ll discuss in more detail the options you have and how to implement the right database migration plan for your workloads.

Why move to a managed database?

Managed databases, including purpose-built databases, are services managed by AWS that free you to focus on other business and technical priorities. Different types of databases fill different needs, so the database tier in an application should never be a one-size-fits-all approach. Based on the kind of application you are developing, you should research managed options that suit your business and enterprise use cases.

Database Type	Use Cases	AWS Service
Relational	Traditional applications, enterprise resource planning (ERP), customer relationship management (CRM), ecommerce	Amazon Aurora, Amazon RDS, Amazon Redshift
Key-Value	High-traffic web applications, ecommerce systems, gaming applications	Amazon DynamoDB
In-Memory	Caching, session management, gaming leaderboards, geospatial applications	Amazon ElastiCache, Amazon MemoryDB for Redis
Document	Content management, catalogs, user profiles	Amazon DocumentDB (with MongoDB compatibility)
Wide Column	High-scale industrial apps for equipment maintenance, fleet management, and route optimization	Amazon Keyspaces
Graph	Fraud detection, social networking, recommendation engines	Amazon Neptune
Time Series	Internet of Things (IoT) applications, DevOps, industrial telemetry	Amazon Timestream
Ledger	Systems of record, supply chain, registrations, banking transactions	Amazon QLDB

Table 1. Managed databases by AWS

Managed database features

Manageability. The top priority and most valuable asset that you own as a company is your data. While data remains your key asset, spending time on database management is not the optimum use of your time. Managed services have built-in reliable tooling for multiple aspects of database management, which can help your database operate at the highest level.

Availability and disaster recovery. Most managed databases at AWS are highly available by default. For example, Amazon Aurora is designed to offer 99.99% availability, replicating six copies of your data across three Availability Zones (Figure 1). It backs up your data continuously to Amazon S3. It recovers transparently from physical storage failures; instance failover typically takes less than 30 seconds.

Figure 1. Replication across three Availability Zones with Amazon Aurora DB cluster

With managed databases, you get multiple options to create a highly available and fault-tolerant database. Alternatively, if you choose to self-host a database elsewhere, you will have to formulate your own disaster recovery (DR) strategy. This takes meticulous DR planning and relies heavily on continuous monitoring.

Elasticity and agility. Cloud offers elasticity for scaling your database. You can scale in minutes, adding or removing servers and storage as needed. This gives you flexibility with capacity planning, and you can always reassess your database tier to see whether it is over- or under-provisioned.

Self-managed databases on AWS

When should I host my database on Amazon EC2 instead of using a managed database?

Here are some cases when you may find it beneficial to host your database on Amazon EC2 instances:

You need full control over the database and access to its underlying operating system, database installation, and configuration.
You are ready to plan for the high availability of your database, and prepare your disaster recovery plan.
You want to administer your database, including backups and recovery. You must perform tasks such as patching the operating system and the database, tuning the operating system and database parameters, managing security, and configuring high availability or replication.
You want to use features that aren’t currently supported by managed services.
You need a specific database version that isn’t supported by AWS managed database service.
Your database size and performance needs exceed the limits of the managed service.
You want to avoid automatic software patches that might not be compatible with your applications.
You want to achieve higher IOPS and storage capacity than the current limits of the managed services.

Can I customize my underlying OS and database environment?

Amazon RDS Custom is a managed database service for applications that require customization of the underlying operating system and database environment. With Amazon RDS Custom, you have access to the underlying EC2 instance that hosts the DB engine. You can access the EC2 instance using AWS Systems Manager or SSH and perform the customizations your application needs.

Choosing which migration plan to implement

The first step in your cloud database migration journey is identifying the database you want to migrate to. For relational databases, you also need to determine the migration strategy. For most database migrations, you can choose to rehost, replatform, or refactor.

Refer to the AWS Prescriptive Guidance on choosing a migration strategy for relational databases.

The next step is determining which database migration plan best serves your needs. AWS provides a number of options to help correctly handle your data migration with minimal downtime. Here are the database migration plans that you can consider using for your cloud database adoption journey:

1. AWS Database Migration Service (AWS DMS): AWS DMS is a self-service option for migrating databases. You can use AWS DMS to migrate between homogeneous database types, such as going from one MySQL instance to a new one. You can also use AWS DMS between heterogeneous database types, such as moving from a commercial database like Oracle to a cloud-native relational database like Aurora. Read tutorials about migrating sample data: AWS Database Migration Service Step-by-Step Walkthroughs.

AWS DMS offers minimal downtime, supports widely used databases, supports ongoing replication, is offered as a low-cost service, and is highly reliable. If you are looking for an end-to-end service for database migration, consider AWS DMS.
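For the minimal-downtime path, a DMS task typically performs a full load followed by ongoing replication (change data capture). The sketch below builds the keyword arguments for the boto3 `create_replication_task` call. The schema name and ARNs are placeholders you would supply. Selecting every table in one schema is just an example table mapping.

```python
import json


def replication_task_params(
    task_id: str, source_arn: str, target_arn: str, instance_arn: str, schema: str
) -> dict:
    """Build kwargs for a DMS task that does a full load, then ongoing replication."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                # "%" is the DMS wildcard: include every table in the schema.
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # Full load plus change data capture keeps the target in sync until cutover.
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),  # DMS expects a JSON string
    }
```

You would pass the result to `boto3.client("dms").create_replication_task(**params)`. Switching `MigrationType` to `full-load` gives a one-time copy with no ongoing replication, which is simpler but typically means stopping writes to the source during the load.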

2. AWS Database Migration Service (AWS DMS) with AWS Schema Conversion Tool (AWS SCT): If you are migrating between heterogeneous databases, use the AWS Schema Conversion Tool (AWS SCT). It converts the source database schema and the database code objects (like views, stored procedures, and functions) to a format compatible with the target database (Figure 2).

Figure 2. Supported conversions with AWS Schema Conversion Tool

For heterogeneous migrations, consider using AWS DMS with AWS SCT.

3. Database Freedom Program: If you are new to cloud computing, or if your database migration plan must be evaluated and validated by industry experts, then try the Database Freedom Program. Database Freedom is a unique program designed to assist customers in migrating to AWS databases and analytics services. The program provides technical advice, migration support, and financial assistance.

You can use the Contact Us page for the Database Freedom program to get in touch with experts that can assist you in your cloud adoption journey.

4. AWS Professional Services and Partners: You may have in-house database expertise, but need end-to-end implementation assistance across different application tiers. Get help from the Professional Services of AWS or connect with the certified network of AWS Partners. AWS Database Migration Service Delivery Partners help customers use AWS DMS to migrate databases to AWS securely, while minimizing application downtime and following best practices.

Conclusion

Migrating to the cloud is a journey that is ever-evolving. To remain focused on your innovations, leverage the managed services of AWS for your migration journey.

I hope this post helps you consider using a managed database service when available, and effectively evaluate and choose the right database migration plan for your enterprise. For any queries, feel free to get in touch with us. We will be happy to help you in your cloud journey.

Happy migrating!


Dream11: Blocking application attacks using AWS WAF at scale

2022-03-25 Vatsal Shah

Post Syndicated from Vatsal Shah original https://aws.amazon.com/blogs/architecture/dream11-blocking-application-attacks-using-aws-waf-at-scale/

As the world’s largest fantasy sports platform, with more than 120 million registered users, Dream11 runs multiple contests simultaneously while processing millions of user requests per minute. Their user-centric and data-driven teams make it a priority to ensure that the Dream11 application (app) remains protected against all kinds of threats and vulnerabilities.

Introduction to AWS WAF Security Automations

AWS WAF is a web application firewall that helps protect apps and APIs against common web exploits and bots. These attacks may affect availability, compromise security, or consume excessive resources. AWS WAF gives you control over how traffic reaches your applications. You can create security rules that control bot traffic and block common attack patterns, such as SQL injection or cross-site scripting (XSS).

AWS WAF Security Automations use AWS CloudFormation to quickly configure AWS WAF rules that help block the following common types of attacks:

SQL injection
Cross-site scripting
HTTP floods
Scanners and probes
Known attacker origins (IP reputation lists)
Bots and scrapers

In this blog post, we will explain how Dream11 uses AWS WAF Security Automations to protect its application from scanner and probe attacks.

Scanner and probe automation

To understand the scanner and probe automation, let’s look at a realistic attack scenario for a standard app that is protected by AWS WAF. Let’s assume that a malicious user is trying to scan the app and identify loopholes using their custom tool. They plan to conduct injection attacks (such as SQLi, XSS) or directory brute force attacks.

The app, secured by AWS WAF, has rules in place to block requests if certain signatures and patterns are matched. However, AWS WAF cannot have every possible payload for each attack vector. This means that after some trial and error, an attacker may find a payload that doesn’t get blocked by AWS WAF and use it to exploit the vulnerability.

In this case, what if AWS WAF could detect the behavior of malicious user IPs and block them for a certain time period? Wouldn’t it be great if AWS WAF blocked the IP of a malicious user after receiving a couple of malicious requests? That way, new requests coming from that IP will be blocked without AWS WAF having to check all the rules in the web ACL. Any successful bypass attempts from that IP will also get blocked. Rather than permanently blocking the IP, this feature blocks the offending IP for a certain time period, discouraging the attacker from any further attempts. It acts as a first step of incident response. Here’s where automation can help.

Scanner and probe automation monitors Amazon CloudFront logs and analyzes HTTP status codes for requests coming from different IPs. When an IP crosses the configured threshold of HTTP error status codes, scanner and probe automation adds that IP directly to the IPSet used by the AWS WAF rule. It then blocks subsequent requests from that IP for a configured period of time.

The AWS WAF Security Automations solution creates an AWS WAF rule, an AWS Lambda function, and a Scanner and Probes Amazon Athena query. The Athena query parses Amazon CloudFront or Application Load Balancer access logs at regular intervals. It counts the number of bad requests per minute from unique source IP addresses. The Lambda function updates the AWS WAF IPSet rule to block further scans from IP addresses with a high error rate.
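The counting logic behind the Athena query can be sketched in Python. This illustrates the idea and is not the solution's actual query. It assumes log entries have already been parsed into `(client_ip, epoch_seconds, http_status)` tuples, treats every status of 400 or above as an error, and uses a hypothetical default threshold of 50.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def ips_over_error_threshold(
    log_entries: Iterable[Tuple[str, int, int]], threshold: int = 50
) -> Set[str]:
    """Return IPs whose error count in any single minute reaches the threshold.

    Each entry is (client_ip, epoch_seconds, http_status).
    """
    per_minute: Counter = Counter()
    for ip, ts, status in log_entries:
        if status >= 400:
            # Bucket errors by (IP, minute) so bursts are measured per minute.
            per_minute[(ip, ts // 60)] += 1
    return {ip for (ip, _minute), count in per_minute.items() if count >= threshold}
```

Measuring errors per minute, rather than in total, separates a fast automated scan from a legitimate user who occasionally hits a missing page.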

Scanner and probe solution

Figure 1. Solution architecture for scanner and probe automation (xxx represents the numbers as defined by the use case)

The workflow of the solution is as follows, shown in Figure 1:

CloudFront logs are pushed to the Amazon S3 bucket
Log Parser Lambda runs the Athena query to count HTTP errors for each unique IP
If the HTTP error threshold is crossed for any IP, the Lambda function adds the IP to an AWS WAF IPSet for a certain time
The IP is removed from the IPSet automatically after the time period is over

Customizing the AWS WAF Security Automation solution

Scanner and probe automation rules block traffic if the error rate for a particular IP crosses the threshold, and then add the IP to the blocked IPSet. This IP is blocked for a configurable amount of time (for example, 12 hours, 2 days, or 1 week).
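The time-bound block can be sketched with an expiry timestamp per IP. This in-memory class is a hypothetical illustration, not part of the AWS WAF Security Automations code. The set returned by `active` is what would be written back as the full address list, because the WAFV2 `UpdateIPSet` API replaces an IPSet's contents rather than appending to them.

```python
from datetime import datetime, timedelta
from typing import Dict, Set


class TimedBlocklist:
    """Track blocked IPs with an expiry, mimicking a time-bound IPSet entry."""

    def __init__(self, block_duration: timedelta):
        self.block_duration = block_duration
        self._blocked_until: Dict[str, datetime] = {}

    def block(self, ip: str, now: datetime) -> None:
        # Re-blocking an IP restarts its block period from the current time.
        self._blocked_until[ip] = now + self.block_duration

    def active(self, now: datetime) -> Set[str]:
        """Drop expired entries and return the IPs that should stay in the IPSet."""
        self._blocked_until = {
            ip: until for ip, until in self._blocked_until.items() if until > now
        }
        return set(self._blocked_until)
```

For example, with a 12-hour duration, an IP blocked at midnight is still active at 11:00 and drops out at 12:00.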

During the customization of AWS WAF for Dream11, there were instances which required exceptions to the preceding rule. One was to prevent internal services/gateway IPs from getting blocked by the security automation. We needed to customize the rules for these predefined thresholds. For example: the solution should block the external traffic, but exclude any internal IP addresses.

The Dream11 Security team customized the Lambda logic to approve all internal NAT gateway IPs. Scanner and probe automation ignores these IPs even if there is a high number of errors from the approved IPs. Sample code is as follows:

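log.info("[update_ip_set] \tIgnore the approved IP ")

# Only IPs that are NOT on the approved list (for example, internal NAT gateway IPs)
# are added to the IPv4 or IPv6 address list for the blocked IPSet.
approved_ips = outstanding_requesters['ApprovedIPs']
if ip_type == "IPV4" and source_ip not in approved_ips:
    addresses_v4.append(source_ip)
elif ip_type == "IPV6" and source_ip not in approved_ips:
    addresses_v6.append(source_ip)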

Note: Create a JSON file with a list of approved IPs and store it in APP_ACCESS_LOG_BUCKET.
We store our office-approved IPs as an xyz.json file in the same S3 bucket that holds our CloudFront access logs. This bucket is configurable when you deploy the CloudFormation template for Security Automations.

Code explanation:

The custom code first checks whether the IP that crossed the error threshold is one of the approved IPs.
If the IP is in IPv4 or IPv6 format and isn't an approved IP, it is added to the blocked IPSet for the configured period of time.
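The two steps above can be shown as a self-contained function. This is a minimal sketch that assumes a hypothetical JSON layout for the approved-IP file (`{"ApprovedIPs": [...]}`, mirroring the `outstanding_requesters['ApprovedIPs']` key in the sample code). It uses Python's `ipaddress` module instead of a separate `ip_type` field. Because AWS WAF IP sets store addresses in CIDR notation, the sketch appends `/32` or `/128` to each address.

```python
import ipaddress
import json
from typing import Iterable, List, Set, Tuple


def load_approved_ips(json_text: str) -> Set[str]:
    """Parse the approved-IP file (assumed format: {"ApprovedIPs": [...]})."""
    data = json.loads(json_text)
    # Normalize so that equivalent IPv6 spellings compare equal
    return {str(ipaddress.ip_address(ip)) for ip in data.get("ApprovedIPs", [])}


def split_ips_to_block(candidates: Iterable[str], approved: Set[str]) -> Tuple[List[str], List[str]]:
    """Split over-threshold IPs into IPv4 and IPv6 CIDR lists, skipping approved IPs."""
    addresses_v4: List[str] = []
    addresses_v6: List[str] = []
    for raw in candidates:
        ip = ipaddress.ip_address(raw)
        if str(ip) in approved:
            continue  # approved internal or NAT gateway IP: never block it
        if ip.version == 4:
            addresses_v4.append(f"{ip}/32")
        else:
            addresses_v6.append(f"{ip}/128")
    return addresses_v4, addresses_v6
```

For example, if the file approves `10.0.0.5`, then an over-threshold list of `["10.0.0.5", "203.0.113.7"]` produces only `203.0.113.7/32` for the IPv4 IPSet.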

With this customization of the Lambda function, the security automation solution no longer blocks legitimate requests from our approved IPs, while still protecting against scanner and probe attacks. AWS WAF Security Automations is an open-source solution hosted on GitHub.

Conclusion

In this blog post, we’ve given a brief overview of how you can reduce attacks by using AWS WAF Security Automations against scanners and probes. We’ve also illustrated the customization implemented by the Dream11 security team.

Automating your security operations makes incident response more effective. With automated courses of action, you can prioritize threats and handle cyber attacks automatically. This reduces the need for human intervention, shortens response time, and addresses security issues without manual effort.

After implementing this at Dream11, we were able to create custom, application-specific rules that block attack patterns. This has improved application availability, secured our resources, and prevented excessive resource consumption. With this solution, we are able to provide the best fantasy sports experience for over 120 million users.

Read more about Security Automations in AWS WAF.

Migration updates announced at re:Invent 2021

2022-03-24 Angélica Ortega

Post Syndicated from Angélica Ortega original https://aws.amazon.com/blogs/architecture/migration-updates-announced-at-reinvent-2021/

re:Invent is a yearly event that offers learning and networking opportunities for the global cloud computing community. re:Invent 2021 marked the launch of several new features across cloud services and migration.

In this blog, we’ll cover some of the most important recent announcements.

AWS Mainframe Modernization (Preview)

Mainframe modernization has become a necessity for many companies. One of the main drivers is the need for agility, as the market constantly demands new functionality. Because of its complex dependencies, long procurement cycles, and escalating costs, the mainframe platform makes it very difficult for companies to innovate at the required pace.

Mainframe modernization can be a complex undertaking. To assist you, we have launched a comprehensive platform, called AWS Mainframe Modernization, that enables two popular migration patterns: replatforming, and automated refactoring.

Figure 1. AWS Mainframe Modernization flow

AWS Migration and Modernization Competency

Application modernization is becoming an important migration strategy, especially for strategic business applications. It brings many benefits, including optimized software licensing and operating costs, better performance, agility, and resilience. Selecting a partner with the required expertise can help reduce the time and risk of these projects. The AWS Migration and Modernization Competency recognizes partners with this expertise. More information can be found at AWS Migration Competency Partners.

AWS Application Migration Service (AWS MGN)

AWS MGN is recommended as the primary migration service for lift and shift migrations. Customers currently using AWS Server Migration Service are encouraged to switch to it for future migrations.

Starting in November 2021, AWS MGN supports agentless replication from VMware vCenter versions 6.7 and 7.0 to the AWS Cloud. This new feature is intended for users who want to rehost their applications on AWS but cannot install the AWS Replication Agent on individual servers because of company policies or technical restrictions.

AWS Elastic Disaster Recovery

Two of the pillars of the AWS Well-Architected Framework are Operational Excellence and Reliability. Both are directly concerned with a service's ability to recover and operate efficiently. AWS Elastic Disaster Recovery is a new service that helps you minimize downtime and data loss with fast, reliable recovery of on-premises and cloud-based applications. It combines low-cost storage, minimal compute, and point-in-time recovery to keep costs down.

AWS Resilience Hub

AWS Resilience Hub is a service designed to help customers define, measure, and manage the resilience of their applications in the cloud. It helps you define your RTO (Recovery Time Objective) and RPO (Recovery Point Objective), and it evaluates whether your configuration meets those requirements. Aligned with the AWS Well-Architected Framework, this service can assess applications deployed with AWS CloudFormation, and it integrates with AWS Fault Injection Simulator, AWS Systems Manager, and Amazon CloudWatch.

AWS Migration Hub Strategy Recommendations

One of the critical tasks in a migration is determining the right strategy. AWS Migration Hub can help you build a migration and modernization strategy for applications running on premises or in AWS. AWS Migration Hub Strategy Recommendations was announced in October 2021 and is designed to be the starting point for your cloud journey. It helps you identify the appropriate strategy for transforming your portfolio to take full advantage of cloud services.

AWS Migration Hub Refactor Spaces (Preview)

Refactoring is the migration strategy that requires the biggest effort, but it permits you to take full advantage of cloud-native features to improve agility, performance, and scalability. AWS Migration Hub Refactor Spaces is the starting point for incremental application refactoring to microservices in AWS. It will help you reduce the undifferentiated heavy lifting of building and operating your AWS infrastructure for incremental refactoring.

AWS Database Migration Service

AWS Database Migration Service (AWS DMS) is a service that helps you migrate databases to AWS quickly and securely.

AWS DMS Fleet Advisor is a new free feature of AWS DMS that enables you to quickly build a database and analytics migration plan, by automating the discovery and analysis of your fleet. AWS DMS Fleet Advisor is intended for users looking to migrate a large number of database and analytic servers to AWS.

AWS Microservice Extractor for .NET

AWS Microservice Extractor for .NET is a new free tool that simplifies the process of re-architecting applications into smaller code projects. This assistive tool analyzes source code and runtime metrics to help you modernize and transform your .NET applications. It also creates a visual representation of your application and its dependencies.

This tool visualizes your application's source code, helps with code refactoring, and assists in extracting the code base into separate code projects. Teams can then develop, build, and operate those projects independently to improve agility, uptime, and scalability.

AWS Migration Evaluator

AWS Migration Evaluator (ME) is a migration assessment service that helps you create a directional business case for AWS Cloud planning and migration. Building a business case for the cloud on your own can be time-consuming. With Migration Evaluator, organizations can accelerate their evaluation and decision-making for migration to AWS. Notable improvements made during 2021 include:

Quick Insights. This new capability of Migration Evaluator provides customers with a one-page summary of their projected AWS costs, based on measured on-premises provisioning and utilization.
Enhanced Microsoft SQL Discovery. This new feature of the Migration Evaluator Collector helps you include your SQL Server environment in your migration assessment.
Agentless Collection for Dependency Mapping. The ME Collector now enables agentless network traffic collection to be sent to the customer’s AWS Migration Hub account.

AWS Amplify Studio

This is a visual development environment that offers frontend developers new features to accelerate UI development with minimal coding, while integrating with Amplify. Read Introducing AWS Amplify Studio.

Conclusion

Migration is a crucial process for many enterprises as they move from on-premises systems to the cloud. AWS continues to create and improve services, features, tools, and methodologies that optimize the migration process, accelerate your cloud journey, and help you reach your business goals faster.


Let’s Architect! Architecting for Blockchain

2022-03-23 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-architecting-for-blockchain/

You’ve likely read about or heard someone talk about blockchain. This distributed, decentralized ledger collects immutable blocks of information and helps secure your data without going through a third party. It is commonly used to maintain secure, decentralized records for registries, consensus, cryptocurrencies, and the latest trend: non-fungible tokens (NFTs).
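The "immutable blocks" idea relies on each block storing the hash of the block before it, so changing any earlier block breaks every link after it. The following toy Python sketch shows only that hash-linking property. It deliberately leaves out consensus, digital signatures, and distribution across nodes, which real blockchains such as those run by Amazon Managed Blockchain depend on. The `Block` class and helper names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    index: int
    data: str
    prev_hash: str  # digest of the previous block (all zeros for the first block)

    def digest(self) -> str:
        payload = json.dumps([self.index, self.data, self.prev_hash])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], data: str) -> None:
    """Link a new block to the digest of the current last block."""
    prev = chain[-1].digest() if chain else "0" * 64
    chain.append(Block(len(chain), data, prev))


def is_valid(chain: List[Block]) -> bool:
    """Every block must reference the current digest of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))


chain: List[Block] = []
for record in ["alice->bob:5", "bob->carol:2", "carol->dave:1"]:
    append_block(chain, record)
print(is_valid(chain))          # True
chain[1].data = "bob->carol:200"
print(is_valid(chain))          # False: block 2 no longer matches block 1
```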

This collection of content will help you learn the basics of blockchain and drill down into the mindset to apply while architecting for blockchain. We focus on the architectural aspects: what blockchain is from a technological perspective, how it works, when we need it, and how its characteristics apply to different scenarios.

Amazon Managed Blockchain: When to use blockchain

There is a lot of buzz about blockchain, but when should you use it? What are its benefits and limitations? This video introduces you to Amazon Managed Blockchain and will help you identify if blockchain is a good solution for you and what type of blockchain is best suited for your use case.

John Liu covers the characteristics and benefits of private and public blockchain

Deep Dive on Amazon Managed Blockchain

In this video, Johnathan Fritz, a Principal Product Manager for Managed Blockchain, shares some challenges his team faced while building a distributed and immutable network, and how they overcame them. The talk provides a good example of mental models you can use to understand and solve challenges while architecting.

Blockchain is based on a consensus mechanism in a distributed system

Mint and deploy NFTs to the Ethereum blockchain using Amazon Managed Blockchain

Buying NFTs is a hot topic right now. But how do you create your own? This blog post provides a step-by-step guide that shows you how to create an NFT and establish a workflow to deploy ERC-721 contracts to the public Ethereum Rinkeby testnet.

The architecture uses Managed Blockchain to take advantage of maintained Ethereum nodes and allow developers to focus on smart contracts

How Specright uses Amazon QLDB to create a traceable supply chain network

Blockchain and distributed ledger technologies focus on decentralizing applications involving multiple parties where no single entity owns the application. When your application is decentralized and involves multiple, unknown parties, blockchains can be appropriate. On the other hand, if your application only requires a complete and verifiable history of data changes, you can consider a ledger database.

This post shows how Specright uses Amazon Quantum Ledger Database (Amazon QLDB) to maintain an append-only, immutable journal of events that provides a complete, verifiable history of data changes. Their architecture makes sure that all members of the network have access to the same, latest version of each specification and can instantly track change history to investigate quality issues.

This architecture allows all members of the supply chain network to access the same and latest versions of specifications

See you next time!

Thanks for reading! If you’re looking for more tools to architect your workload, check out the AWS Architecture Center.

See you in a couple of weeks when we discuss strategies for running microservices with containers!

Deploy Quarkus-based applications using AWS Lambda with AWS SAM

2022-03-23 Joan Bonilla

Post Syndicated from Joan Bonilla original https://aws.amazon.com/blogs/architecture/deploy-quarkus-based-applications-using-aws-lambda-with-aws-sam/

Quarkus offers Java developers the capability of building native images based on GraalVM. A native image is a binary that includes everything: your code, libraries, and a smaller virtual machine (VM). This approach improves the startup time of your AWS Lambda functions. Quarkus itself is optimized for container-based environments, which suit cloud-native and serverless architectures that follow a container-first philosophy.

In this blog post, you learn how to integrate the Quarkus framework with AWS Lambda functions, using the AWS Serverless Application Model (AWS SAM).

Reduce infrastructure costs and improve latency

When you develop applications with Quarkus and GraalVM native images, the generated bootstrap file takes longer to compile, but it starts faster at runtime. GraalVM Native Image compiles your application ahead of time (AOT) into optimized native machine code, provides different garbage collector implementations, and uses less memory and CPU. This is achieved through a battery of advanced compiler optimizations and aggressive, sophisticated inlining techniques. By using Quarkus, you can also reduce your infrastructure costs, because you need fewer resources.

With Quarkus and AWS SAM features, you can improve the latency performance of your Java-based AWS Lambda functions by reducing the cold-start time. A cold-start is the initialization time that a Lambda function takes before running the actual code. After the function is initialized for the first time, future requests will reuse the same execution environment without incurring the cold-start time, leading to improved performance.

Overview of solution

Figure 1 shows the AWS components and workflow of our solution.

Architecture diagram deploying an AWS SAM template using the Amazon API Gateway and AWS Lambda services with Amazon CloudWatch metrics

Figure 1. Architecture diagram for Quarkus (AWS Lambda) application

With AWS SAM, you can easily integrate external frameworks by using custom runtimes and configuring properties in the template file and the Makefile.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Software components: Java 11 JDK (like Amazon Corretto), Maven, and the latest AWS SAM CLI.
Admin permissions in the following services: AWS Lambda, Amazon CloudWatch, Amazon API Gateway, and AWS Identity and Access Management (IAM). As a best practice, apply the principle of least privilege in your own use case.

Creating a Java-based AWS Lambda function

AWS SAM provides default templates to accelerate the development of new functions. Create a Java-based function by following these steps:

Run the following command in your terminal:

sam init -a x86_64 -r java11 -p Zip -d maven -n java11-mvn-default

These parameters select an x86_64 architecture, java11 (a Java LTS version) as the runtime, Zip as the build artifact, and Maven as the package and dependency tool. They also define the project name.

Choose the first option to use a template for your base code:

1 – AWS Quick Start Templates

Next, choose one of the available templates to create the base structure of your function. In our case, select the first one. It creates an AWS Lambda function that calls an external HTTPS endpoint to get an IP address, then returns that address to the user with a “Hello World” message in JSON:

1 – Hello World Example

The output is shown in Figure 2:

AWS SAM input fields to select the programming language, the build artifact, the project name and the dependency tool for our sample.

Figure 2. AWS SAM configuration input data

Integrating Quarkus framework

Using AWS SAM, you can easily integrate non-AWS custom runtimes in your AWS Lambda functions. You can use this feature to integrate the Quarkus framework by following these four steps:

1. Create a Makefile file

Create a “Makefile” file in the “HelloWorldFunction” directory with this code:

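# Build the native image in a container, then unzip the bootstrap into the AWS SAM artifacts directory.
# Recipe lines must be indented with a tab character.
build-HelloWorldFunction:
	mvn clean package -Pnative -Dquarkus.native.container-build=true -Dquarkus.native.builder-image=quay.io/quarkus/ubi-quarkus-mandrel:21.3-java11
	@ unzip ./target/function.zip -d $(ARTIFACTS_DIR)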

With this snippet, you configure AWS SAM to build the bootstrap runtime with Maven and then unzip the generated function.zip into the AWS SAM artifacts directory.

Using Quarkus, you can build a Linux executable without having to install GraalVM by using the following option:

-Dquarkus.native.container-build=true

For more information, you can visit the official site and learn more about building a native image.

2. Configure Maven dependencies

Because this is a Maven project, you must declare the necessary dependencies. In the pom.xml file in the “HelloWorldFunction” directory, remove the following default libraries:

<dependencies>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-lambda-java-core</artifactId>
    <version>1.2.1</version>
  </dependency>
  <dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-lambda-java-events</artifactId>
    <version>3.6.0</version>
  </dependency>
</dependencies>

Add the Quarkus libraries, profile, and plugins to the appropriate sections of pom.xml, as shown in the following XML configuration. At the time of writing, the latest version of Quarkus is 2.7.1.Final. We highly recommend using the latest versions of the libraries and plugins:

<dependencies>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-amazon-lambda</artifactId>
    <version>2.7.1.Final</version>
  </dependency>
  <dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-arc</artifactId>
    <version>2.7.1.Final</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.13.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

<build>
  <finalName>function</finalName>
  <plugins>
    <plugin>
      <groupId>io.quarkus</groupId>
      <artifactId>quarkus-maven-plugin</artifactId>
      <version>2.7.1.Final</version>
      <extensions>true</extensions>
      <executions>
        <execution>
          <goals>
            <goal>build</goal>
            <goal>generate-code</goal>
            <goal>generate-code-tests</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

<profiles>
  <profile>
    <id>native</id>
    <activation>
      <property>
        <name>native</name>
      </property>
    </activation>
    <properties>
      <quarkus.package.type>native</quarkus.package.type>
    </properties>
  </profile>
</profiles>

3. Configure the template.yaml to use the previous Makefile

To make the AWS SAM template use your Makefile, with its Quarkus and Maven instructions, add the following properties to the template.yaml file:

Resources:
  HelloWorldFunction:
    Metadata:
      BuildMethod: makefile
    Properties:
      Runtime: provided

4. Add a new properties file to enable SSL configuration

Finally, create an application.properties file in the directory: ../HelloWorldFunction/src/main/resources/ with the following property:

quarkus.ssl.native=true

This property is needed because the function from the sample you selected earlier makes a secure connection to https://checkip.amazonaws.com to get its response body.

Now you can build and deploy your first Quarkus function with the following AWS SAM commands:

sam build

This will create the Zip artifact using the Maven tool and will build the native image to deploy on AWS Lambda based on your previous Makefile configuration. Finally, run the following AWS SAM command to deploy your function:

sam deploy --guided

The first time you deploy an AWS SAM application, you can customize some configurations or parameters, such as the stack name and the AWS Region (see Figure 3). You can also accept the defaults. For more information about AWS SAM deploy options, read the AWS SAM documentation.

AWS SAM input fields to configure the deployment options in our sample.

Figure 3. Lambda deployment configuration input data

This sample configuration enables you to configure the necessary IAM permissions to deploy the AWS SAM resources for this sample. After completing the task, you can see the AWS CloudFormation Stack and resources created by AWS SAM.

You have now created and deployed an HTTPS API Gateway endpoint with a Quarkus application on AWS Lambda that you can test.

Testing your Quarkus function

Finally, test your Quarkus function in the AWS Management Console by selecting the new function in the AWS Lambda functions list. Use the test feature included in the console, as shown in Figure 4:

Test Quarkus execution result succeeded showing the response body returning the IP address.

Figure 4. Lambda execution test example

You will get a response to your Lambda request and a summary that includes information such as the duration and the resources used by your new Quarkus function. For more information about testing applications with AWS SAM, read Testing and debugging serverless applications. You can also visit the official site to read more about using AWS SAM with Quarkus.
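If you prefer to test from a script instead of the console, the same synchronous invocation can be done with the AWS SDK. This is a minimal sketch using boto3's Lambda `invoke` call. The function name and the empty test event are placeholders that you would replace with your own deployed function name and event. The client is passed in as a parameter so the helper can be exercised with a stub.

```python
import json
from typing import Any, Dict


def invoke_function(lambda_client: Any, function_name: str, event: Dict[str, Any]) -> Dict[str, Any]:
    """Synchronously invoke a Lambda function and decode its JSON response."""
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the result
        Payload=json.dumps(event).encode("utf-8"),
    )
    if "FunctionError" in response:
        # The payload holds the error details when the function fails
        raise RuntimeError(f"Function error: {response['Payload'].read().decode('utf-8')}")
    return json.loads(response["Payload"].read())
```

With credentials configured, you might call `invoke_function(boto3.client("lambda"), "<your-function-name>", {})` and inspect the returned dictionary.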

Cleaning up

To avoid incurring future charges, delete the resources created in your AWS SAM stack with the following command:

sam delete

Conclusion

In this post, we demonstrated how to integrate Java frameworks like Quarkus on AWS Lambda using custom runtimes with AWS SAM. This lets you define custom build configurations or use your preferred frameworks. These tools improve the developer experience by standardizing the tooling used to develop serverless applications, while giving developers the flexibility to meet future requirements.

The Quarkus native image generated and applied in the AWS Lambda function reduces the heavy Java footprint. You can use your Java skills to develop serverless applications without having to change the programming language. This is a great advantage when cold-starts or compute resources are important for business or technical requirements.

Migrating a self-managed message broker to Amazon SQS

2022-03-22 Vikas Panghal

Post Syndicated from Vikas Panghal original https://aws.amazon.com/blogs/architecture/migrating-a-self-managed-message-broker-to-amazon-sqs/

Amazon Payment Services is a payment service provider that operates across the Middle East and North Africa (MENA) geographic regions. Our mission is to provide online businesses with an affordable and trusted payment experience. We provide a secure online payment gateway that is straightforward and safe to use.

Amazon Payment Services has regional experts in payment processing technology in eight countries throughout the Gulf Cooperation Council (GCC) and Levant regional areas. We offer solutions tailored to businesses in their international and local currency. We are continuously improving and scaling our systems to deliver with near-real-time processing capabilities. Everything we do is aimed at creating safe, reliable, and rewarding payment networks that connect the Middle East to the rest of the world.

Our use case of message queues

Our business built a high-throughput, resilient queueing system to send messages to our customers. Our implementation relied on a self-managed RabbitMQ cluster and consumers. A consumer is software that subscribes to a topic name in the queue. Once subscribed, the consumer receives and processes any message published to the queue with the same topic name. The cluster and consumers were both deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances. As our business scaled, we faced challenges with our existing architecture.

Challenges with our message queues architecture

Managing a RabbitMQ cluster with its nodes deployed inside Amazon EC2 instances came with some operational burdens. Dealing with payments at scale, managing queues, performance, and availability of our RabbitMQ cluster introduced significant challenges:

Managing durability with RabbitMQ queues. When messages are placed in the queue, they persist and survive server restarts. But during a maintenance window they can be lost because we were using a self-managed setup.
Back-pressure mechanism. Our setup lacked a back-pressure mechanism, which resulted in flooding our customers with a huge number of messages at peak times. All messages published into the queue were sent at the same time.
Customer business requirements. Many customers have business requirements to delay message delivery for a defined time to serve their flow. Our architecture did not support this delay.
Retries. We needed to implement a back-off strategy to space out multiple retries for temporarily failed messages.

Figure 1. Amazon Payment Services’ previous messaging architecture

The previous architecture shown in Figure 1 was able to process a large load of messages within a reasonable delivery time. However, when the message queue built up due to network failures on the customer side, the latency of the overall flow was affected. This required manually scaling the queues, which added significant human effort, time, and overhead. As our business continued to grow, we needed to maintain a strict delivery time service level agreement (SLA).

Using Amazon SQS as the messaging backbone

The Amazon Payment Services core team designed a solution that uses Amazon Simple Queue Service (SQS) with AWS Fargate (see Figure 2). Amazon SQS is a fully managed message queuing service that enables customers to decouple and scale microservices, distributed systems, and serverless applications. It is a highly scalable, reliable, and durable message queueing service that decreases the complexity and overhead associated with managing and operating message-oriented middleware.

Amazon SQS offers two types of message queues. SQS standard queues offer maximum throughput, best-effort ordering, and at-least-once delivery. SQS FIFO queues guarantee that messages are processed exactly once, in the exact order they are sent. For our application, we used SQS FIFO queues.

In SQS FIFO queues, messages are stored in partitions (a partition is an allocation of storage replicated across multiple Availability Zones within an AWS Region). With message distribution through message group IDs, we were able to achieve better optimization and partition utilization for the Amazon SQS queues. We could offer higher availability, scalability, and throughput to process messages through consumers.

Figure 2. Amazon Payment Services’ new architecture using Amazon SQS, Amazon ECS, and Amazon SNS

This serverless architecture provided better scaling options for our payment processing services. It helps us handle our customers' peak events in the MENA region without the need for capacity provisioning. Serverless architecture also helps us reduce our operational costs, because we only pay for the services when we use them. Our goals in developing this initial architecture were to achieve consistency, scalability, affordability, security, and high performance.

How Amazon SQS addressed our needs

Migrating to Amazon SQS helped us address many of our requirements and led to a more robust service. Some of our main issues included:

Losing messages during maintenance windows

While doing manual upgrades on RabbitMQ and the hosting operating system, we sometimes faced downtimes. By using Amazon SQS, messaging infrastructure became automated, reducing the need for maintenance operations.

Handling concurrency

Different customers handle messages differently. We needed a way to customize the concurrency by customer. With SQS message group ID in FIFO queues, we were able to use a tag that groups messages together. Messages that belong to the same message group are always processed one by one, in a strict order relative to the message group. Using this feature and a consistent hashing algorithm, we were able to limit the number of simultaneous messages being sent to the customer.
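One way to picture this approach is to map each message key onto a fixed number of message group IDs per customer using a consistent hash ring. Messages in one group are processed one by one, as described above, so giving a customer N group IDs allows roughly N of that customer's messages to be processed in parallel. This is a minimal sketch and not Amazon Payment Services' implementation. The class name, the `customer-slot` group ID format, and the number of virtual nodes are all hypothetical.

```python
import bisect
import hashlib
from typing import List


def _hash(value: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process)
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class MessageGroupRing:
    """Hypothetical consistent hash ring mapping message keys to one of
    `slots` message group IDs for a single customer."""

    def __init__(self, customer_id: str, slots: int, vnodes: int = 50) -> None:
        if slots < 1:
            raise ValueError("slots must be >= 1")
        self.customer_id = customer_id
        # Each slot owns several points (virtual nodes) on the ring for better balance
        points = sorted(
            (_hash(f"{customer_id}:{slot}:{v}"), slot)
            for slot in range(slots)
            for v in range(vnodes)
        )
        self._hashes: List[int] = [h for h, _ in points]
        self._slots: List[int] = [s for _, s in points]

    def group_id(self, message_key: str) -> str:
        # Walk clockwise to the first ring point at or after the key's hash
        idx = bisect.bisect_left(self._hashes, _hash(message_key)) % len(self._hashes)
        return f"{self.customer_id}-{self._slots[idx]}"
```

The resulting ID would be passed as `MessageGroupId` to the SQS `SendMessage` API. Compared with a plain `hash % N`, a consistent hash ring remaps only a fraction of keys when you change a customer's concurrency limit.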

Message delay and handling retries

When messages are sent to the queue, customers pull and receive them immediately. However, many customers ask us to delay their messages so they can do preprocessing work, so we introduced a message delay timer. Some messages fail with errors that allow them to be resubmitted. These retries must be spaced out, and they continue until we receive delivery confirmation from our customer or until the retry limit is exceeded. With SQS, we were able to use the ChangeMessageVisibility operation to adjust these delay times.
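A back-off between retries can be built on ChangeMessageVisibility. After a failed delivery, the consumer hides the message for an exponentially growing interval instead of deleting it. The following is a minimal sketch under stated assumptions. The 30-second base, the 1-hour cap, and the limit of 5 retries are hypothetical values. The retry count is taken from the `ApproximateReceiveCount` attribute, which must be requested when receiving the message. The sketch also assumes that a redrive policy (dead-letter queue) handles messages that exhaust their retries.

```python
from typing import Any, Dict, Optional

MAX_VISIBILITY_TIMEOUT = 43_200  # SQS upper limit for visibility timeout: 12 hours


def backoff_seconds(receive_count: int, base: int = 30, cap: int = 3_600) -> int:
    """Exponential backoff: base, 2*base, 4*base, ... capped at `cap` seconds."""
    exponent = max(receive_count - 1, 0)  # first receive -> base delay
    return min(cap, MAX_VISIBILITY_TIMEOUT, base * 2 ** exponent)


def schedule_retry(sqs_client: Any, queue_url: str, message: Dict[str, Any],
                   max_retries: int = 5) -> Optional[int]:
    """Hide a failed message for a backoff interval so that it is retried later.

    Returns the delay used, or None when max_retries is exceeded.
    """
    receive_count = int(message["Attributes"]["ApproximateReceiveCount"])
    if receive_count > max_retries:
        return None  # stop retrying; the assumed DLQ redrive policy takes over
    delay = backoff_seconds(receive_count)
    sqs_client.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=message["ReceiptHandle"],
        VisibilityTimeout=delay,
    )
    return delay
```

In a FIFO queue, keeping a message invisible also holds back the later messages in the same message group, which preserves per-group ordering while a retry is pending.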

Scalability and affordability

To save costs, Amazon SQS FIFO queues and Amazon ECS Fargate tasks run only when needed. These services process data in smaller units and run them in parallel. They can scale up efficiently to handle peak traffic loads. This will satisfy most architectures that handle non-uniform traffic without needing additional application logic.

Secure delivery

Our service delivers messages to the customers via host-to-host secure channels. To secure this data outside our private network, we use Amazon Simple Notification Service (SNS) as our delivery mechanism. Amazon SNS provides HTTPS endpoint delivery of messages coming to topics and subscriptions. AWS enables at-rest and/or in-transit encryption for all architectural components. Amazon SQS also provides AWS Key Management Service (KMS) based encryption or service-managed encryption to encrypt the data at rest.

Performance

To quantify our product’s performance, we monitor the message delivery delay. This metric measures the time between sending a message and the customer receiving it from Amazon Payment Services. Our goal is to deliver the message to the customer in near-real time once the transaction is processed. The new Amazon SQS/ECS architecture enabled us to achieve a p99 latency of 200 ms.
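For reference, this is how a p99 figure for the delivery delay metric above can be computed. It is a minimal sketch that assumes you have pairs of send and receive timestamps in milliseconds, and it uses the nearest-rank percentile definition. Monitoring tools may interpolate differently.

```python
from typing import Iterable, Tuple


def p99_delivery_delay_ms(events: Iterable[Tuple[int, int]]) -> int:
    """Nearest-rank p99 of delivery delays from (sent_ms, received_ms) pairs."""
    delays = sorted(received - sent for sent, received in events)
    if not delays:
        raise ValueError("no events")
    rank = -(-99 * len(delays) // 100)  # ceil(0.99 * n) using integer math
    return delays[rank - 1]


# 100 messages with delays of 1..100 ms: 99% of them arrive within 99 ms
print(p99_delivery_delay_ms((0, d) for d in range(1, 101)))  # 99
```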

Summary

In this blog post, we have shown how using Amazon SQS helped transform and scale our service. We were able to offer a secure, reliable, and highly available solution for businesses. We use AWS services and technologies to run the Amazon Payment Services payment gateway, and we use infrastructure automation to deliver excellent customer service. By using Amazon SQS and Amazon ECS Fargate, Amazon Payment Services can offer secure message delivery at scale to our customers.

Optimize AI/ML workloads for sustainability: Part 2, model development

2022-03-21 Benoit de Chateauvieux

Post Syndicated from Benoit de Chateauvieux original https://aws.amazon.com/blogs/architecture/optimize-ai-ml-workloads-for-sustainability-part-2-model-development/

More complexity often means using more energy, and machine learning (ML) models are becoming bigger and more complex. And though ML hardware is getting more efficient, the energy required to train these ML models is increasing sharply.

In this series, we’re following the phases of the Well-Architected machine learning lifecycle (Figure 1) to optimize your artificial intelligence (AI)/ML workloads. In Part 2, we examine the model development phase and show you how to train, tune, and evaluate your ML model to help you reduce your carbon footprint.

If you missed the first part of this series, we showed you how to examine your workload to help you 1) evaluate the impact of your workload, 2) identify alternatives to training your own model, and 3) optimize data processing.

Figure 1. ML lifecycle

Model building

Define acceptable performance criteria

When you build an ML model, you’ll likely need to make trade-offs between your model’s accuracy and its carbon footprint. When we focus only on the model’s accuracy, we “ignore the economic, environmental, or social cost of reaching the reported accuracy.” Because the relationship between model accuracy and complexity is at best logarithmic, training a model longer or looking for better hyperparameters only leads to a small increase in performance.
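To make the diminishing returns concrete, assume, as a simplified illustrative model rather than a measured law, that accuracy $A$ grows logarithmically with training compute $C$, with constants $a$ and $b > 0$:

```latex
A(C) = a + b \ln C
\qquad\Longrightarrow\qquad
A(kC) - A(C) = b \ln k,
\qquad
A(8C) - A(C) = 3\,b \ln 2 \approx 2.08\,b
```

Under this model, each doubling of compute buys the same fixed gain $b \ln 2 \approx 0.69\,b$. An 8x increase in compute, and roughly in energy, therefore yields only three such increments.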

Establish performance criteria that support your sustainability goals while meeting your business requirements, not exceeding them.

Select energy-efficient algorithms

Begin with a simple algorithm to establish a baseline. Then, test different algorithms with increasing complexity to observe whether performance has improved. If so, compare the performance gain against the difference in resources required.
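To make the "compare the performance gain against the difference in resources" step concrete, here is a minimal sketch. It assumes you have already measured each candidate's validation score and a resource cost (for example, GPU-hours) and listed the candidates from simplest to most complex. The `Candidate` type, the `min_gain_per_unit` threshold, and the numbers in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One trained candidate; list them from simplest to most complex."""
    name: str
    score: float  # validation metric, higher is better (e.g., accuracy)
    cost: float   # resources consumed, e.g., GPU-hours (hypothetical unit)


def select_model(candidates, target_score, min_gain_per_unit):
    """Start from the simple baseline and move to a more complex candidate
    only while the business target is unmet and the score gain per extra
    unit of cost is at least `min_gain_per_unit`."""
    if not candidates:
        raise ValueError("need at least one candidate")
    chosen = candidates[0]  # the simple baseline
    for nxt in candidates[1:]:
        if chosen.score >= target_score:
            break  # criteria met: do not exceed them
        extra_cost = nxt.cost - chosen.cost
        gain = nxt.score - chosen.score
        if extra_cost <= 0:
            # Same or lower cost: accept if it is at least as good.
            if gain >= 0:
                chosen = nxt
            continue
        if gain / extra_cost >= min_gain_per_unit:
            chosen = nxt
    return chosen


if __name__ == "__main__":
    models = [
        Candidate("logistic_regression", score=0.81, cost=0.1),
        Candidate("gradient_boosting", score=0.86, cost=0.5),
        Candidate("large_transformer", score=0.875, cost=40.0),
    ]
    print(select_model(models, target_score=0.85, min_gain_per_unit=0.01).name)
    # prints "gradient_boosting"
```

Even if the target were raised above what any candidate reaches, the large model would still be rejected here because its extra 0.015 in score costs 39.5 additional units.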

Try to find simplified versions of algorithms. This helps you use fewer resources to achieve a similar outcome. For example, DistilBERT, a distilled version of BERT, has 40% fewer parameters, runs 60% faster, and preserves 97% of BERT’s performance.

Use pre-trained or partially pre-trained models

Consider techniques to avoid training a model from scratch:

Transfer Learning: Use a pre-trained source model and reuse it as the starting point for a second task. For example, a model trained on ImageNet (14 million images) can be fine-tuned on other datasets.
Incremental Training: Use artifacts from an existing model on an expanded dataset to train a new model.

Optimize your deep learning models to accelerate training

Compile your DL models from their high-level language representation to hardware-optimized instructions to reduce training time. You can achieve this with open-source compilers or Amazon SageMaker Training Compiler, which can speed up training of DL models by up to 50% by more efficiently using SageMaker GPU instances.

Start with small experiments, datasets, and compute resources

Experiment with smaller datasets in your development notebook. This allows you to iterate quickly with limited carbon emission.
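A minimal sketch of iterating on a small, reproducible sample before scaling up, using only the Python standard library. The 1% default fraction and the seed are arbitrary illustrative choices, not recommendations.

```python
import random


def dev_sample(rows, fraction=0.01, seed=42, min_rows=1):
    """Return a reproducible random subset of `rows` for quick experiments.

    A fixed seed means every notebook run sees the same subset, so
    differences between runs come from code changes, not from the data.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rows = list(rows)
    k = min(max(min_rows, int(len(rows) * fraction)), len(rows))
    rng = random.Random(seed)  # local RNG; does not disturb global state
    return rng.sample(rows, k)


if __name__ == "__main__":
    small = dev_sample(range(100_000))
    print(len(small))  # 1000
```

For imbalanced classification data, a stratified sample (sampling each class separately) is usually a better starting point than a uniform one.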

Automate the ML environment

When building your model, use Lifecycle Configuration Scripts to automatically stop idle SageMaker Notebook instances. If you are using SageMaker Studio, install the auto-shutdown Jupyter extension to detect and stop idle resources.

Use the fully managed training process provided by SageMaker to automatically launch training instances and shut them down as soon as the training job is complete. This minimizes idle compute resources and thus limits the environmental impact of your training job.

Adopt a serverless architecture for your MLOps pipelines. For example, orchestration tools like AWS Step Functions or SageMaker Pipelines only provision resources when work needs to be done. This way, you’re not maintaining compute infrastructure 24/7.

Model training

Select sustainable AWS Regions

As mentioned in Part 1, select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to train your model.

Use a debugger

A debugger like SageMaker Debugger can identify training problems like system bottlenecks, overfitting, saturated activation functions, and under-utilization of system resources. It also provides built-in rules like LowGPUUtilization or Overfit. You can configure these rules to monitor your workload and automatically stop a training job as soon as a rule detects an issue (Figure 2), which helps you avoid unnecessary carbon emissions.
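SageMaker Debugger's built-in rules run as managed rule evaluations, and the sketch below is not their implementation. It is a simplified, framework-agnostic analogue of what an overfit check does. It assumes you record training and validation loss at each evaluation step, and the `patience` value is a hypothetical choice.

```python
def should_stop_for_overfit(train_losses, val_losses, patience=3):
    """Return True when validation loss rose for `patience` consecutive
    evaluations while training loss kept falling (a classic overfitting sign)."""
    if len(train_losses) != len(val_losses):
        raise ValueError("loss histories must be the same length")
    if len(val_losses) < patience + 1:
        return False  # not enough history yet
    recent_val = val_losses[-(patience + 1):]
    recent_train = train_losses[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent_val, recent_val[1:]))
    train_falling = all(b < a for a, b in zip(recent_train, recent_train[1:]))
    return val_rising and train_falling


if __name__ == "__main__":
    history = [(1.0, 1.0), (0.8, 0.9), (0.6, 0.95), (0.5, 1.0), (0.4, 1.1)]
    train, val = [], []
    for step, (t, v) in enumerate(history):
        train.append(t)
        val.append(v)
        if should_stop_for_overfit(train, val):
            print(f"stopping at evaluation {step}")  # stopping at evaluation 4
            break
```

In SageMaker itself, you would attach the built-in rule and configure its stop-training action rather than writing this check yourself.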

Figure 2. Automatically stop buggy training jobs with SageMaker Debugger

Optimize the resources of your training environment

Reference the recommended instance types for the algorithm you’ve selected in the SageMaker documentation. For example, for DeepAR, you should start with a single CPU instance and only switch to GPU and multiple instances when necessary.

Right-size your training jobs with Amazon CloudWatch metrics that monitor the utilization of resources like CPU, GPU, memory, and disk.
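A hedged sketch of turning exported utilization numbers into a right-sizing hint. It assumes you have already pulled per-minute values (for example, GPUUtilization from the `/aws/sagemaker/TrainingJobs` CloudWatch namespace) into a Python list. As we understand the SageMaker monitoring documentation, CPU and GPU utilization are reported summed across devices (0-100% per device), so the sketch divides by the device count first. The 30% and 85% thresholds are hypothetical rules of thumb, not AWS guidance.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0..100)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def sizing_hint(samples, units=1, low=30.0, high=85.0):
    """Suggest whether an instance looks over- or under-provisioned.

    samples: utilization percentages summed across `units` devices.
    Returns (hint, average %, p95 %) per device.
    """
    normalized = [s / units for s in samples]
    p95 = percentile(normalized, 95)
    avg = mean(normalized)
    if p95 < low:
        return "downsize", avg, p95  # even peaks leave most capacity idle
    if avg > high:
        return "consider larger or more instances", avg, p95
    return "keep", avg, p95
```

Using the 95th percentile for the downsizing test avoids shrinking an instance whose utilization is low on average but spikes regularly.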

Consider Managed Spot Training, which takes advantage of unused Amazon Elastic Compute Cloud (Amazon EC2) capacity and can save you up to 90% in cost compared to On-Demand instances. By shaping your demand for the existing supply of EC2 instance capacity, you will improve your overall resource efficiency and reduce idle capacity of the overall AWS Cloud.
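A minimal sketch of the Managed Spot Training fields in a SageMaker `CreateTrainingJob` request, built as a plain dictionary. Other required arguments (job name, algorithm, role, data channels, ResourceConfig) are omitted, and any S3 URI you pass is your own. The API requires `MaxWaitTimeInSeconds` to be at least `MaxRuntimeInSeconds`, and checkpointing lets an interrupted Spot job resume rather than restart.

```python
def spot_training_settings(max_runtime_s, max_wait_s, checkpoint_s3_uri):
    """Return the CreateTrainingJob fields that enable Managed Spot Training.

    Merge the result into the rest of your create_training_job arguments.
    """
    if max_wait_s < max_runtime_s:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError("checkpoint location must be an s3:// URI")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,  # cap on actual training time
            "MaxWaitTimeInSeconds": max_wait_s,    # training time plus time waiting for Spot capacity
        },
        # Checkpoints written to LocalPath are synced to S3, so an
        # interrupted job can resume from the last checkpoint.
        "CheckpointConfig": {
            "S3Uri": checkpoint_s3_uri,
            "LocalPath": "/opt/ml/checkpoints",
        },
    }
```

With boto3, this could be passed as `sagemaker_client.create_training_job(**base_args, **spot_training_settings(3600, 7200, "s3://your-bucket/checkpoints/"))`, where `base_args` and the bucket are your own.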

Use efficient silicon

Use AWS Trainium, which is optimized for DL training workloads. It is expected to be our most energy-efficient processor for this purpose.

Archive or delete unnecessary training artifacts

Organize your ML experiments with SageMaker Experiments to clean up training resources you no longer need.

Reduce the volume of logs you keep. By default, CloudWatch retains logs indefinitely. By setting a limited retention period for your notebook and training logs, you’ll avoid the carbon footprint of unnecessary log storage.
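A minimal sketch of setting retention on the SageMaker log groups with the CloudWatch Logs `PutRetentionPolicy` API. It assumes boto3 and credentials are available when no client is passed in. The API only accepts specific day values, so the sketch rounds a requested period up to the nearest allowed one; the list below reflects the API reference as we understand it and should be checked against current documentation. A log group exists only after a job or notebook has written logs, so the call fails for groups that were never created.

```python
# Retention values accepted by PutRetentionPolicy (verify against current docs).
ALLOWED_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
)

SAGEMAKER_LOG_GROUPS = (
    "/aws/sagemaker/TrainingJobs",
    "/aws/sagemaker/NotebookInstances",
)


def pick_retention(days):
    """Smallest allowed retention period that is >= the requested days."""
    for allowed in ALLOWED_RETENTION_DAYS:
        if allowed >= days:
            return allowed
    raise ValueError(f"{days} days exceeds the largest allowed value")


def apply_retention(days, log_groups=SAGEMAKER_LOG_GROUPS, client=None):
    """Set a finite retention period on each log group; return what was set."""
    if client is None:
        import boto3  # imported lazily so the helpers run without boto3
        client = boto3.client("logs")
    retention = pick_retention(days)
    for name in log_groups:
        client.put_retention_policy(logGroupName=name, retentionInDays=retention)
    return {name: retention for name in log_groups}
```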

Model tuning and evaluation

Use efficient cross-validation techniques for hyperparameter optimization

Prefer Bayesian search over random search (and avoid grid search). Bayesian search makes intelligent guesses about the next set of parameters to pick based on the prior set of trials. It typically requires 10 times fewer jobs than random search, and thus 10 times fewer compute resources, to find the best hyperparameters.

Limit the maximum number of concurrent training jobs. Running hyperparameter tuning jobs concurrently gets more work done quickly. However, a tuning job improves only through successive rounds of experiments. Typically, running one training job at a time achieves the best results with the least amount of compute resources.

Carefully choose the number of hyperparameters and their ranges. You get better results and use fewer compute resources by limiting your search to a few parameters and small ranges of values. If you know that a hyperparameter is log-scaled, search its range on a logarithmic scale to further improve the optimization.
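The three recommendations above (Bayesian strategy, one job at a time, few parameters with log scaling where appropriate) can be sketched as the `HyperParameterTuningJobConfig` dictionary passed to SageMaker `CreateHyperParameterTuningJob`. This is a minimal illustration. The hyperparameter names in the usage example (`eta`, `max_depth`) and the metric `validation:auc` assume the SageMaker built-in XGBoost algorithm, and the job budget is hypothetical.

```python
def bayesian_tuning_config(metric_name, max_jobs, log_ranges, int_ranges=None,
                           objective_type="Maximize"):
    """Build a HyperParameterTuningJobConfig for CreateHyperParameterTuningJob.

    log_ranges: {name: (min, max)} continuous ranges searched on a log scale.
    int_ranges: {name: (min, max)} integer ranges searched linearly.
    """
    continuous = []
    for name, (lo, hi) in log_ranges.items():
        if not 0 < lo < hi:
            raise ValueError(f"{name}: logarithmic scaling needs 0 < min < max")
        # The API takes range bounds as strings.
        continuous.append({"Name": name, "MinValue": str(lo), "MaxValue": str(hi),
                           "ScalingType": "Logarithmic"})
    integer = [
        {"Name": name, "MinValue": str(lo), "MaxValue": str(hi), "ScalingType": "Linear"}
        for name, (lo, hi) in (int_ranges or {}).items()
    ]
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {"Type": objective_type, "MetricName": metric_name},
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            # One job at a time lets each Bayesian step learn from all prior results.
            "MaxParallelTrainingJobs": 1,
        },
        "ParameterRanges": {
            "ContinuousParameterRanges": continuous,
            "IntegerParameterRanges": integer,
        },
    }


if __name__ == "__main__":
    cfg = bayesian_tuning_config("validation:auc", 20,
                                 {"eta": (0.01, 0.3)}, {"max_depth": (3, 8)})
    print(cfg["ParameterRanges"]["ContinuousParameterRanges"][0]["ScalingType"])
    # prints "Logarithmic"
```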

Use warm-start hyperparameter tuning

Use warm-start to leverage the learning gathered in previous tuning jobs to inform which combinations of hyperparameters to search over in the new tuning job. This technique avoids restarting hyperparameter optimization jobs from scratch and thus reduces the compute resources needed.
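A minimal sketch of the `WarmStartConfig` field for `CreateHyperParameterTuningJob`. `IdenticalDataAndAlgorithm` fits a rerun on the same data and algorithm; `TransferLearning` fits cases where the data, ranges, or algorithm changed. The parent job names are hypothetical, and the limit of five parent jobs reflects the API reference at the time of writing.

```python
def warm_start_config(parent_job_names, same_data_and_algorithm=True):
    """Build the WarmStartConfig field for CreateHyperParameterTuningJob."""
    names = list(dict.fromkeys(parent_job_names))  # de-duplicate, keep order
    if not 1 <= len(names) <= 5:
        # Limit of 5 parent jobs per the API reference at time of writing.
        raise ValueError("provide between 1 and 5 parent tuning jobs")
    return {
        "ParentHyperParameterTuningJobs": [
            {"HyperParameterTuningJobName": n} for n in names
        ],
        "WarmStartType": (
            "IdenticalDataAndAlgorithm" if same_data_and_algorithm else "TransferLearning"
        ),
    }
```

Combined with the Bayesian configuration, the new tuning job starts from what the parent jobs already learned instead of exploring the space from scratch.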

Measure results and improve

To monitor and quantify improvements of your training jobs, track the following metrics:

Resources provisioned for your training jobs (InstanceCount, InstanceType, and VolumeSizeInGB)
Efficient use of these resources (CPUUtilization, GPUUtilization, GPUMemoryUtilization, MemoryUtilization, and DiskUtilization), visible in the SageMaker console, the CloudWatch console, or your SageMaker Debugger profiling report

For storage:

The total size of your Amazon Simple Storage Service (Amazon S3) buckets and storage class distribution, using Amazon S3 Storage Lens
The size of your CloudWatch log groups
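A minimal sketch for the last metric: listing CloudWatch log group sizes through the paginated `DescribeLogGroups` API, which reports a `storedBytes` value per group. It assumes boto3 when no client is injected, and the `/aws/sagemaker/` prefix is just an example. S3 Storage Lens is configured separately and is not covered here.

```python
def log_group_sizes(prefix="/aws/sagemaker/", client=None):
    """Return {log group name: stored bytes} for groups matching a prefix,
    largest first."""
    if client is None:
        import boto3  # lazy import so the function can be tested with a fake client
        client = boto3.client("logs")
    sizes = {}
    paginator = client.get_paginator("describe_log_groups")
    for page in paginator.paginate(logGroupNamePrefix=prefix):
        for group in page.get("logGroups", []):
            sizes[group["logGroupName"]] = group.get("storedBytes", 0)
    return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))
```

Tracking this dictionary over time shows whether the retention settings you applied are actually shrinking log storage.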

Conclusion

In this blog post, we discussed techniques and best practices to reduce the energy required to build, train, and evaluate your ML models.

We also provided recommendations for the tuning process, because it makes up a large part of the carbon impact of building an ML model. During hyperparameter and neural design search, hundreds of versions of a given model are created, trained, and evaluated before identifying an optimal design.

In the next post, we’ll continue our sustainability journey through the ML lifecycle and discuss the best practices you can follow when deploying and monitoring your model in production.

Want to learn more? Check out the Sustainability Pillar of the AWS Well-Architected Framework, the Architecting for sustainability session at re:Invent 2021, and other blog posts on architecting for sustainability.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Migrating petabytes of data from on-premises file systems to Amazon FSx for Lustre

2022-03-18 Vimala Pydi

Post Syndicated from Vimala Pydi original https://aws.amazon.com/blogs/architecture/migrating-petabytes-of-data-from-on-premises-file-systems-to-amazon-fsx-for-lustre/

Many organizations use the Lustre file system for Linux-based applications that require petabytes of data and high-performance storage. Lustre file systems are used in machine learning (ML), high performance computing (HPC), big data, and financial analytics. Many such high-performance workloads are being migrated to Amazon Web Services (AWS) to take advantage of the scalability, elasticity, and agility that AWS offers. Amazon FSx for Lustre is a fully managed service that provides cost-effective, high-performance, and scalable storage for Lustre file systems on AWS.

AWS DataSync is an AWS managed service for copying data to and from Amazon FSx for Lustre. It provides high-speed transfer through compression and parallel transfer mechanisms, and it integrates with Amazon CloudWatch for observability.

This blog will show you how to migrate petabytes of data files from on-premises to Amazon FSx for Lustre using AWS DataSync. It will provide an overview of Amazon CloudWatch metrics and logs to help you monitor your data transfer using AWS DataSync and metrics from Amazon FSx for Lustre.

Solution overview for file storage data migration

The high-level architecture diagram in Figure 1 depicts file storage data migration from an on-premises data center to Amazon FSx for Lustre using AWS DataSync.

Following are the steps for the migration:

Create an Amazon FSx file system.
Install the AWS DataSync agent on premises to connect to the AWS DataSync service over a TLS-secured connection.
Configure source and target locations to create an AWS DataSync task.
Configure and start the AWS DataSync task to migrate the data from on-premises to Amazon FSx for Lustre.

Figure 1. Architecture diagram for transferring files on-premises to Amazon FSx for Lustre using AWS DataSync

Prerequisites

On-premises hypervisor or virtual machine
The necessary network communications between the AWS DataSync agent and AWS as detailed in AWS DataSync network requirements
AWS Management Console access to AWS DataSync, Amazon FSx for Lustre, and Amazon CloudWatch

Steps for migration

1. Create an Amazon FSx file system

To start the migration, create a Lustre file system in Amazon FSx by following the step-by-step guidance in Getting started with Amazon FSx for Lustre.

For this post, the target is an FSx for Lustre file system with the Persistent 2 deployment type and a storage capacity of 1.2 TB (Figure 2).
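If you prefer to script this step, the console choices map to arguments of the Amazon FSx `CreateFileSystem` API. The sketch below only builds and validates that request. `StorageCapacity` is expressed in GiB, so the file system in this post corresponds to a value of 1200. The subnet and security group IDs you pass are your own, and options such as `FileSystemTypeVersion` are left at their defaults; check the CreateFileSystem reference for your case.

```python
# Per-unit throughput options (MB/s per TiB) for PERSISTENT_2.
VALID_THROUGHPUT_PERSISTENT_2 = (125, 250, 500, 1000)


def lustre_persistent2_request(storage_gib, subnet_id, throughput_per_tib=125,
                               security_group_ids=()):
    """Build CreateFileSystem arguments for an SSD PERSISTENT_2 Lustre file system."""
    # SSD Lustre capacity: 1200 GiB, 2400 GiB, or increments of 2400 GiB.
    if storage_gib != 1200 and (storage_gib < 2400 or storage_gib % 2400):
        raise ValueError("capacity must be 1200 GiB or a multiple of 2400 GiB")
    if throughput_per_tib not in VALID_THROUGHPUT_PERSISTENT_2:
        raise ValueError(f"throughput must be one of {VALID_THROUGHPUT_PERSISTENT_2}")
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": storage_gib,
        "StorageType": "SSD",
        "SubnetIds": [subnet_id],  # FSx for Lustre takes exactly one subnet
        "SecurityGroupIds": list(security_group_ids),
        "LustreConfiguration": {
            "DeploymentType": "PERSISTENT_2",
            "PerUnitStorageThroughput": throughput_per_tib,
        },
    }
```

With boto3, the request would be sent as `boto3.client("fsx").create_file_system(**lustre_persistent2_request(1200, "subnet-..."))`.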

Figure 2. FSx for Lustre target file system

2. Install AWS DataSync agent on-premises

Follow the steps in Getting started with AWS DataSync to set up the AWS DataSync service. Then configure the source system for the file system data migration using the following steps:

Deploy an AWS DataSync agent on premises on a supported virtual machine or hypervisor (Figure 3).
Configure the AWS DataSync agent from the AWS Management Console.
Activate the AWS DataSync agent you configured in the preceding step.

Figure 3. Create AWS DataSync agent

3. Configure source and destination locations

A DataSync task consists of a pair of locations between which data is transferred. The source location defines the storage system that you want to read from. The destination location defines the storage service that you want to write data to. Here the source location is an on-premises Lustre system and the destination location is the Amazon FSx for Lustre service (Figure 4.)

Figure 4. Configure source and destination location for AWS DataSync task

4. Configure and start task

A task is a set of two locations (source and destination) and a set of options that you use to control the behavior of the task. Create a task with the source and destination locations and choose Start from the Actions menu (Figure 5.)

Figure 5. Start task

Wait for the task status to change to Running (Figure 6.)

Figure 6. Task status

To check the details of the task completion, select the task and choose the History tab (Figure 7). The status should show Success once the task completes the migration.
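The console's Start action and History tab correspond to the DataSync `StartTaskExecution` and `DescribeTaskExecution` APIs. The following sketch, assuming boto3 and an existing task ARN of your own, starts an execution and polls it until it reaches `SUCCESS` or `ERROR`. The client and sleep function are injectable so the loop can be exercised without AWS access, and the 30-second poll interval is arbitrary.

```python
import time

# Terminal execution states; intermediate ones include QUEUED, LAUNCHING,
# PREPARING, TRANSFERRING, and VERIFYING.
TERMINAL_STATES = {"SUCCESS", "ERROR"}


def run_datasync_task(task_arn, client=None, poll_seconds=30, sleep=time.sleep):
    """Start a DataSync task execution and poll until it finishes.

    Returns the final DescribeTaskExecution response.
    """
    if client is None:
        import boto3  # lazy import so the loop can be tested with a fake client
        client = boto3.client("datasync")
    execution_arn = client.start_task_execution(TaskArn=task_arn)["TaskExecutionArn"]
    while True:
        desc = client.describe_task_execution(TaskExecutionArn=execution_arn)
        status = desc["Status"]
        print(f"{status}: {desc.get('FilesTransferred', 0)} files, "
              f"{desc.get('BytesTransferred', 0)} bytes")
        if status in TERMINAL_STATES:
            return desc
        sleep(poll_seconds)
```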

Figure 7. Task history

Monitoring the file transfer

Amazon CloudWatch is the AWS native observability service. It collects and processes raw data from AWS services such as Amazon FSx for Lustre and AWS DataSync into readable, near real-time metrics. It provides metrics that you can use to get more visibility into the data transfer. For a full list of CloudWatch metrics for AWS DataSync and Amazon FSx for Lustre, read Monitoring AWS DataSync and Monitoring Amazon FSx for Lustre.

Amazon FSx for Lustre also provides various metrics, for example, the number of read or write operations (DataReadOperations and DataWriteOperations). To find the available free storage capacity, check the FreeDataStorageCapacity metric (Figure 8).

Figure 8. CloudWatch metrics for Amazon FSx for Lustre

AWS DataSync metrics include FilesTransferred, which gives the actual number of files or metadata that were transferred over the network, and BytesTransferred, which gives the total number of bytes transferred over the network when the agent reads from the source location to the destination location.

You can build a robust monitoring system by setting up automated notifications for any errors or issues in the data transfer task, integrating Amazon CloudWatch with Amazon Simple Notification Service (Amazon SNS). Figure 9 depicts the AWS DataSync logs in Amazon CloudWatch.
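As one example of wiring CloudWatch to SNS, the sketch below builds `PutMetricAlarm` arguments that notify an SNS topic when the file system's FreeDataStorageCapacity (reported in bytes in the `AWS/FSx` namespace) falls below a fraction of its capacity, which helps catch a target that is filling up mid-migration. The file system ID, topic ARN, 10% threshold, and the `Minimum` statistic with a 5-minute period are assumptions; check the FSx for Lustre metrics reference before relying on them.

```python
def free_capacity_alarm(file_system_id, capacity_gib, topic_arn, min_free_fraction=0.10):
    """Build PutMetricAlarm arguments that notify an SNS topic when an
    FSx for Lustre file system's free capacity drops below a fraction."""
    threshold_bytes = round(capacity_gib * 1024**3 * min_free_fraction)
    return {
        "AlarmName": f"{file_system_id}-low-free-capacity",
        "Namespace": "AWS/FSx",
        "MetricName": "FreeDataStorageCapacity",
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "Statistic": "Minimum",  # assumption: alarm on the lowest value in the period
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_bytes),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that fans out to email, chat, etc.
    }
```

With boto3, you would pass the result to `boto3.client("cloudwatch").put_metric_alarm(**...)`.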

Figure 9. AWS DataSync logs in Amazon CloudWatch

You can also gather insights from the data transfer logs using CloudWatch Logs Insights, which enables you to quickly search and query your log data (Figure 10). You can also create a metric filter for error codes and alert the appropriate team.
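A minimal sketch of running a Logs Insights query programmatically with the CloudWatch Logs `StartQuery` and `GetQueryResults` APIs. It assumes boto3 when no client is injected, and the log group name you pass is your own. The `/ERROR/` filter is an assumption about the log format, so adjust it to the messages your DataSync task actually writes.

```python
import time

ERROR_QUERY = (
    "fields @timestamp, @message "
    "| filter @message like /ERROR/ "
    "| sort @timestamp desc "
    "| limit 50"
)


def recent_errors(log_group, hours=24, client=None, sleep=time.sleep, now=time.time):
    """Run a Logs Insights query for ERROR lines; return rows as {field: value} dicts."""
    if client is None:
        import boto3  # lazy import so the function can be tested with a fake client
        client = boto3.client("logs")
    end = int(now())
    start = end - hours * 3600  # StartQuery takes epoch seconds
    query_id = client.start_query(
        logGroupName=log_group, startTime=start, endTime=end, queryString=ERROR_QUERY
    )["queryId"]
    while True:
        resp = client.get_query_results(queryId=query_id)
        if resp["status"] in ("Complete", "Failed", "Cancelled", "Timeout"):
            break
        sleep(1)  # query is still Scheduled or Running
    if resp["status"] != "Complete":
        raise RuntimeError(f"query ended with status {resp['status']}")
    return [{f["field"]: f["value"] for f in row} for row in resp["results"]]
```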

Figure 10. Amazon CloudWatch Logs Insights for querying logs

Cleanup

If you are no longer using the resources discussed in this blog, remove the unneeded AWS resources to avoid incurring charges. After finishing the file transfer, clean up resources by deleting the Amazon FSx file system and AWS DataSync objects (DataSync agent, task, source location, and destination location.)

Conclusion

In this post, we demonstrated how we can accelerate migration of Lustre files from on-premises into Amazon FSx for Lustre using AWS DataSync. As a fully managed service, AWS DataSync securely and seamlessly connects to your Amazon FSx for Lustre file system. This makes it possible for you to move millions of files and petabytes of data without the need for deploying or managing infrastructure in the cloud. We walked through different observability metrics with Amazon CloudWatch integration to provide performance metrics, logging, and events. This can further help to speed up critical hybrid cloud storage workflows in industries that must move active files into AWS quickly. This capability is available in Regions where AWS DataSync and Amazon FSx for Lustre are available. For further details on using this cost-effective service, see Amazon FSx for Lustre pricing and AWS DataSync pricing.

Mainframe offloading and modernization: Using mainframe data to build cloud native services with AWS

2022-03-16 Malathi Pinnamaneni

Post Syndicated from Malathi Pinnamaneni original https://aws.amazon.com/blogs/architecture/mainframe-offloading-and-modernization-using-mainframe-data-to-build-cloud-native-services-with-aws/

Many companies in the financial services and insurance industries rely on mainframes for their most business-critical applications and data. But mainframe workloads typically lack agility. This is one reason that organizations struggle to innovate, iterate, and pivot quickly to develop new applications or release new capabilities. Unlocking this mainframe data can be the first step in your modernization journey.

In this blog post, we will discuss some typical offloading patterns. Whether your goal is developing new applications using mainframe data or modernizing with the Strangler Fig Application pattern, you might want some guidance on how to begin.

Refactoring mainframe applications to the cloud

Refactoring mainframe applications to cloud-native services on AWS is a common industry pattern and a long-term goal for many companies to remain competitive. But this takes an investment of time, money, and organizational change management to realize the full benefits. We see customers start their modernization journey by offloading data from the mainframe to AWS to reduce risks and create new capabilities.

The mainframe data offloading patterns that we will discuss in this post use software services that facilitate data replication to Amazon Web Services (AWS):

File-based data synchronization
Change data capture
Event-sourced replication

Once data is liberated from the mainframe, you can develop new agile applications for deeper insights using analytics and machine learning (ML). You could create a microservices-based application or a voice-based mobile application. For example, if a bank could access its historical mainframe data to analyze customer behavior, it could develop a new profile-based solution for loan recommendations.

The patterns we illustrate can be used as a reference to begin your modernization efforts with reduced risk. The long-term goal is to rewrite the mainframe applications and modernize them workload by workload.

Solution overview: Mainframe offloading and modernization

This figure shows the flow of data being replicated from mainframe using integration services and consumed in AWS

Figure 1. Mainframe offloading and modernization conceptual flow

Mainframe modernization: Architecture reference patterns

File-based batch integration

Modernization scenarios often require replicating files to AWS, or synchronizing between on-premises and AWS. Use cases include:

Analyzing current and historical data to enhance business analytics
Providing data for further processing on downstream or upstream dependent systems. This is necessary for exchanging data between applications running on the mainframe and applications running on AWS

This diagram shows a file-based integration pattern on how data can be replicated to AWS for interactive data analytics

Figure 2. File-based batch ingestion pattern for interactive data analytics

File-based batch integration – Batch ingestion for interactive data analytics (Figure 2)

Data ingestion. In this example, we show how data can be ingested to Amazon S3 using AWS Transfer Family or AWS DataSync. Mainframe data is typically encoded in extended binary-coded decimal interchange code (EBCDIC) format. Prescriptive guidance exists to convert EBCDIC to ASCII format.
Data transformation. Before moving data to AWS data stores, you may need to transform it so it can be used for analytics. Services like AWS Glue and AWS Lambda can be used to transform the data. For large-volume processing, use Apache Spark on Amazon EMR, or a custom Spring Boot application running on Amazon EC2, to perform these transformations. This process can be orchestrated using AWS Step Functions or AWS Data Pipeline.
Data store. Data is transformed into a consumable format that can be stored in Amazon S3.
Data consumption. You can use AWS analytics services like Amazon Athena for interactive ad-hoc query access, Amazon QuickSight for analytics, and Amazon Redshift for complex reporting and aggregations.
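One reason the EBCDIC conversion step needs care: character fields can be translated with a codepage, but packed-decimal numeric fields (COBOL `COMP-3`) are binary and are corrupted by a blanket EBCDIC-to-ASCII translation. The record layout (copybook) tells you which is which. A minimal standard-library sketch of both decoders follows; the `cp037` (US/Canada) codepage and the scale values are assumptions that depend on your mainframe's configuration.

```python
from decimal import Decimal


def ebcdic_to_text(data, codepage="cp037"):
    """Decode an EBCDIC character field (cp037 = US/Canada, cp500 = International)."""
    return data.decode(codepage)


def unpack_comp3(data, scale=0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits; the final low nibble is the sign
    (0xD = negative, 0xC or 0xF = positive/unsigned). `scale` is the number
    of implied decimal places from the copybook picture clause.
    """
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign_nibble = nibbles.pop()
    if any(n > 9 for n in nibbles):
        raise ValueError("invalid BCD digit in packed field")
    if sign_nibble not in (0x0C, 0x0D, 0x0F):
        raise ValueError("invalid sign nibble")
    digits = "".join(str(n) for n in nibbles) or "0"
    value = Decimal(digits).scaleb(-scale)  # apply implied decimal point
    return -value if sign_nibble == 0x0D else value


if __name__ == "__main__":
    print(ebcdic_to_text(b"\xc8\x85\x93\x93\x96"))  # Hello
    print(unpack_comp3(b"\x12\x34\x5C", scale=2))   # 123.45
```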

This diagram shows a file-based integration pattern on how data can be replicated to AWS for further processing by downstream systems

Figure 3. File upload to operational data stores for further processing

File-based batch integration – File upload to operational data stores for further processing (Figure 3)

Using AWS Transfer Family, upload CSV files to Amazon S3.
Once the files are uploaded, an S3 event notification can invoke an AWS Lambda function that loads the data into Amazon Aurora. For low-latency data access requirements, you can use a scalable serverless import pattern with AWS Lambda and Amazon SQS to load the data into Amazon DynamoDB.
Once the data is in data stores, it can be consumed for further processing.
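The S3-to-Lambda step of this pattern can be sketched as follows. The S3 event notification structure (`Records[].s3.bucket.name` and a URL-encoded `object.key`) is standard. The `read_object` and `write_rows` callables are hypothetical placeholders for an S3 GetObject call and a batched INSERT into Aurora, injected so the parsing logic stays testable.

```python
import csv
import io
from urllib.parse import unquote_plus


def object_refs(event):
    """Yield (bucket, key) pairs from an S3 event notification."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded (for example, spaces become '+').
        yield s3["bucket"]["name"], unquote_plus(s3["object"]["key"])


def rows_from_csv(body_text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(body_text)))


def make_handler(read_object, write_rows):
    """Build a Lambda handler with injected I/O.

    read_object(bucket, key) -> str and write_rows(rows) -> int are
    hypothetical hooks you supply (e.g., S3 GetObject and an Aurora INSERT).
    """
    def handler(event, context=None):
        loaded = 0
        for bucket, key in object_refs(event):
            loaded += write_rows(rows_from_csv(read_object(bucket, key)))
        return {"rowsLoaded": loaded}
    return handler
```

Keeping I/O behind two small functions makes it straightforward to swap the Aurora writer for a DynamoDB or SQS-based one without changing the event parsing.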

Transactional replication-based integration (Figure 4)

Several modernization scenarios require continuous near-real-time replication of relational data to keep a copy of the data in the cloud. Change Data Capture (CDC) for near-real-time transactional replication works by capturing change log activity to drive changes in the target dataset. Use cases include:

Command Query Responsibility Segregation (CQRS) architectures that use AWS to service all read-only and retrieve functions
On-premises systems with tightly coupled applications that require a phased modernization
Real-time operational analytics

This diagram shows a transaction-based replication (CDC) integration pattern on how data can be replicated to AWS for building reporting and read-only functions

Figure 4. Transactional replication (CDC) pattern

Partner CDC tools in the AWS Marketplace can be used to manage real-time data movement between the mainframe and AWS.
You can use a fan-out pattern to read once from the mainframe to reduce processing requirements and replicate data to multiple data stores based on your requirements:
- For low latency requirements, replicate to Amazon Kinesis Data Streams and use AWS Lambda to store in Amazon DynamoDB.
- For critical business functionality with complex logic, use Amazon Aurora or Amazon Relational Database Service (RDS) as targets.
- To build a data lake, or to use Amazon S3 as an intermediary for ETL processing, replicate to S3 as a target.
Once the data is in AWS, customers can build agile microservices for read-only functions.
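The low-latency branch of the fan-out (Kinesis Data Streams to Lambda to DynamoDB) can be sketched as a Lambda handler that decodes each base64-encoded Kinesis record and applies it to a table. The change-record shape `{"op", "key", "after"}` is hypothetical, because each CDC tool defines its own format. The `table` argument only needs `put_item` and `delete_item`, which a boto3 DynamoDB `Table` resource provides; that API requires `Decimal` rather than `float`, hence `parse_float=Decimal`.

```python
import base64
import json
from decimal import Decimal


def decode_kinesis_records(event):
    """Yield the JSON payload of each record in a Kinesis-triggered Lambda event."""
    for record in event.get("Records", []):
        raw = base64.b64decode(record["kinesis"]["data"])
        yield json.loads(raw, parse_float=Decimal)  # DynamoDB rejects floats


def apply_changes(event, table):
    """Apply hypothetical CDC change records to a DynamoDB-like table."""
    counts = {"upsert": 0, "delete": 0}
    for change in decode_kinesis_records(event):
        op = change["op"]
        if op == "upsert":
            table.put_item(Item={**change["key"], **change["after"]})
        elif op == "delete":
            table.delete_item(Key=change["key"])
        else:
            raise ValueError(f"unknown operation {op!r}")
        counts[op] += 1
    return counts
```

Using the source table's primary key as the Kinesis partition key keeps all changes for one row on the same shard, so they arrive in order.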

Message-oriented middleware (event sourcing) integration (Figure 5)

With message-oriented middleware (MOM) systems like IBM MQ on mainframe, several modernization scenarios require integrating with cloud-based streaming and messaging services. These act as a buffer to keep your data in sync. Use cases include:

Consume data from AWS data stores to enable new communication channels. Examples of new channels include mobile or voice-based applications and ML-based innovations
Migrate the producer (sender) and consumer (receiver) applications communicating with on-premises MOM platforms to AWS, with the end goal of retiring the on-premises MOM platform

This diagram shows an event-sourcing integration reference pattern for customers using middleware systems like IBM MQ on-premises with AWS services

Figure 5. Event-sourcing integration pattern

Mainframe transactions from IBM MQ can be read using a connector or a bridge solution. They can then be published to Amazon MQ queues or Amazon Managed Streaming for Apache Kafka (Amazon MSK) topics.
Once the data is published to the queue or topic, consumers implemented as AWS Lambda functions or running on other AWS compute services can process, map, transform, or filter the messages. They can store the data in Amazon RDS, Amazon ElastiCache, S3, or DynamoDB.
Now that the data resides in AWS, you can build new cloud-native applications and do the following:

- Push notifications. Use Amazon RDS or S3 event triggers to call Amazon Simple Notification Service (SNS) to push notifications to mobile devices.
- Build innovative services. Invoke ML services such as Amazon SageMaker directly from these data stores. Build voice interfaces using Amazon API Gateway, Amazon Lex, or Amazon Alexa skills.
- Enable new functions. Build new applications with business logic residing in microservices hosted by AWS Lambda or in containers within Amazon Elastic Container Service (ECS).
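The consumer step for the Amazon MSK option can be sketched as a Lambda handler. In an MSK-triggered event, `records` maps each `topic-partition` key to a list of records whose `value` is base64-encoded. The message fields (`txn_type`, `account`, `amount`) and the `PAYMENT` filter are hypothetical, so map them from whatever your MQ connector or bridge publishes. The `sink` callable stands in for a write to RDS, ElastiCache, S3, or DynamoDB.

```python
import base64
import json
from decimal import Decimal


def kafka_messages(event):
    """Yield (topic, partition, offset, payload) from an Amazon MSK Lambda event."""
    for batch in event.get("records", {}).values():
        for rec in batch:
            payload = json.loads(base64.b64decode(rec["value"]))
            yield rec["topic"], rec["partition"], rec["offset"], payload


def handle_msk_event(event, context=None, sink=print):
    """Filter and map hypothetical mainframe transaction messages, then
    hand each result to `sink` (a placeholder for your data store write)."""
    forwarded = 0
    for topic, partition, offset, msg in kafka_messages(event):
        if msg.get("txn_type") != "PAYMENT":
            continue  # filter: only forward payment transactions
        sink({
            "account_id": msg["account"],
            # Decimal avoids binary floating-point rounding on money values.
            "amount_cents": int(Decimal(str(msg["amount"])) * 100),
            "source": f"{topic}/{partition}/{offset}",
        })
        forwarded += 1
    return {"forwarded": forwarded}
```

Recording the topic, partition, and offset with each output row makes it possible to trace any stored record back to the exact message it came from.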

Conclusion

Mainframe offloading and modernization using AWS services enables you to reduce cost, modernize your architectures, and integrate your mainframe and cloud-native technologies. You’ll be able to inform your business decisions with improved analytics, and create new opportunities for innovation and the development of modern applications.

Building your brand as a Solutions Architect

2022-03-15 Clare Holley

Post Syndicated from Clare Holley original https://aws.amazon.com/blogs/architecture/building-your-brand-as-a-solutions-architect/

As AWS Solutions Architects, we use our business, technical, and people skills to help our customers understand, implement, and refine cloud-based solutions. We keep up-to-date with always-evolving technology trends and use our technical training to provide scalable, flexible, secure, and resilient cloud architectures that benefit our customers.

Today, each of us will examine how we’ve established our “brand” as Solutions Architects.

“Each of us has a brand, but we have to continuously cultivate and refine it to highlight what we’re passionate about to give that brand authenticity and make it work for us.” – Bhavisha Dawada, Enterprise Solutions Architect at AWS

We talk about our journeys as Solutions Architects and show you the specific skills and techniques we’ve sought out, learned, and refined to help set you on the path to success in your career as a Solutions Architect. We’ll share tips on how to establish yourself when you’re just starting out and how to move forward if you’re already a few years in.

Establishing your brand

As a Solutions Architect, there are many resources available to help you establish your brand. You can pursue specific training or attend workshops to develop your business and technical acumen. You’ll also have opportunities to attend or even speak at industry conferences about trends and innovation in the tech industry.

Learning, adapting, and constantly growing is how you’ll find your voice and establish your brand.

Bhavisha Dawada, Enterprise Solutions Architect

Bhavisha helps customers solve business problems using AWS technology. She is passionate about working in the field of technology and helping customers build and scale enterprise applications.

What helped her move forward in her career? “When I joined AWS, I had limited cloud experience and was overwhelmed by the quantity of services I had to learn in order to advise my customers on scalable, flexible, secured, and resilient cloud architectures.

To tackle this challenge, I built a learning plan for areas I was interested in and I wanted to go deeper. AWS offers several resources to help you cultivate your skills, such as AWS blogs, AWS Online Tech Talks, and certifications to keep your skills updated and relevant.”

Jyoti Tyagi, Enterprise Solutions Architect

Jyoti is passionate about artificial intelligence and machine learning. She helps customers architect securely using cloud best practices.

What helped her move forward in her career? “Working closely with a mentor and a specialist helped me quickly ramp up with the necessary business and technical skills.

I also took part in shadow programs and speaking opportunities to further enhance my skills. My advice? Focus on your strengths, and then build a plan to work on where you’re not as strong.”

Clare Holley, Senior Solutions Architect

Clare helps customers achieve their business outcomes on their cloud journey. With more than 20 years of experience in the IT discipline, she helps customers build highly resilient and scalable architectures.

What helped her move forward in her career? “Coming from the database discipline, I wanted to learn more about helping customers with migrations. I took online courses and attended workshops on the various services to help our customers.

I went through a shadowing process to dive deeper into the services. Eventually, I was comfortable with the topics and even began conducting these workshops myself and providing support.

This can be a repeatable pattern. We are encouraged to experiment, learn, and iterate on new ways to meet and stay ahead of our customers’ needs. Identify realistic quarterly goals. Set aside time on your calendar to learn and be curious, since self-learning is essential for success.”

Maintaining work-life balance

For women in tech, work-life balance is key. Despite being career driven, you also have a life beyond your job. It’s not always easy, but by maintaining a healthy boundary between work and life, you’ll likely be more productive.

Sujatha Kuppuraju, Senior Solutions Architect

Sujatha engages with customers to create innovative and secure solutions that address customer business problems and accelerate the adoption of AWS services.

What helps her maintain a healthy work-life balance? “As a mother, a spouse, and a working woman, finding the right balance between family and work is important to me.

Expectations at work and with my family fluctuate, so I have to stay flexible. To re-prioritize my schedule, I analyze a task’s impact and reversibility, and raise critical feedback, before jumping into it. This allows me to control where I spend my time.

In this growing technology industry, we have innovative and improved ways to solve problems. I invest time to empower customers and team members to execute changes by themselves. This not only reduces the dependencies on me, it also helps to scale changes and get quick returns on benefits.”

Reach out and connect to others

To continue to succeed as a Solutions Architect, in addition to technical and business knowledge, it is important to discuss ideas and learn from like-minded peers.

Clare highlights the importance of growing your network. “When I joined AWS, I looked into joining several affinity groups. These groups allow people who identify with a cause the opportunity to collaborate. They provide an opportunity for networking, speaking engagements, leadership and career growth opportunities, as well as mentorship. In these affinity groups, I met so many awesome women who inspired me to be authentic, honest, and push beyond my comfort zone.”

Women in tech have unfortunately experienced a lack of representation. Bhavisha says having connections in the field helped her feel less alone. “When I was interviewing for AWS, I networked with other women to know more about the company’s culture and learn about their journey at AWS. Learning about this experience helped me feel confident in my career choice.”

Continue your journey

Take the AWS Certified Solutions Architect – Associate certification to build your technical skills. This certification validates your ability to design and deploy well-architected solutions on AWS that meet customer requirements.

We also encourage you to attend our virtual coffee events, which offer a unique opportunity to engage with AWS leaders, learn about career opportunities, and gain insight into our approach to serving customers.

Ready to get started?

Interested in applying for a Solutions Architecture role?

We’ve got more content for International Women’s Day!

For more than a week we’re sharing content created by women. Check it out!

Deploying service-mesh-based architectures using AWS App Mesh and Amazon ECS from Kesha Williams, an AWS Hero and award-winning software engineer.
A collection of several blog posts written and co-authored by women
Curated content from the Let’s Architect! team and a live Twitter chat
Women at AWS – Diverse backgrounds make great solutions architects

Extend SQL Server DR using log shipping for SQL Server FCI with Amazon FSx for Windows configuration

2022-03-14 Yogi Barot

Post Syndicated from Yogi Barot original https://aws.amazon.com/blogs/architecture/extend-sql-server-dr-using-log-shipping-for-sql-server-fci-with-amazon-fsx-for-windows-configuration/

This week for Women’s History Month, we’re continuing to feature female authors. We’re showcasing women in the tech industry who are building, creating, and, above all, inspiring, empowering, and encouraging everyone—especially women and girls—in tech.

Companies choosing to rehost their on-premises SQL Server workloads to AWS can face challenges with setting up their disaster recovery (DR) strategy. Solutions such as Always On availability groups can be more expensive and complex to configure across Regions, and synchronously replicating data cross-Region can cause latency issues. Snapshots have additional overhead and may not meet stringent recovery point objective/recovery time objective (RPO/RTO) requirements.

A log shipping solution can take advantage of cross-Region replication of data using Amazon FSx for Windows File Server. It has less maintenance overhead, doesn't add write latency on the primary because replication is asynchronous, and can meet RPO/RTO requirements. A multi-Region architecture for Microsoft SQL Server is often adopted for business continuity (disaster recovery) and improved latency for a geographically distributed customer base.

This blog post explores a SQL Server DR architecture that uses a SQL Server failover cluster with Amazon FSx for Windows File Server for the primary site and the secondary DR site. We describe how to set up multi-Region DR using log shipping. We'll explain the architecture patterns so you can follow along and effectively design a highly available SQL Server deployment that spans two or more AWS Regions.

Here are some advantages of using log shipping over an Always On distributed availability group DR setup:

Log shipping works with SQL Server Standard edition
It lowers total cost of ownership (TCO) as you only need one SQL Server Standard edition license at the primary/DR site
It’s straightforward to configure
There’s no need for clustering setup at the OS level
It supports all SQL Server versions. The source and destination instances don't need to run the same SQL Server version, although the secondary can't run an earlier version than the primary.

Log shipping DR solution for SQL Server FCI with Amazon FSx

The architecture diagram in Figure 1 depicts a SQL Server failover cluster instance (FCI) using Amazon FSx as storage (multiple Availability Zones) in Region 1. Region 2 uses either a standalone instance or a similar setup. Log shipping provides replication and DR. This also serves as the reference architecture for our solution.

Figure 1. Log shipping DR solution for SQL Server FCI with Amazon FSx

Figure 1 shows a SQL Server cluster in Region 1 and a standalone SQL Server instance in Region 2. The primary cluster in Region 1 is initially configured with a SQL Server failover cluster instance (FCI) using Amazon FSx for its shared storage. Region 2 can have a standalone Amazon EC2 server with SQL Server and Amazon Elastic Block Store (EBS) as storage. Or it can have an identical configuration to Region 1, but with different hostnames and a different SQL network name (SQLFCI02) to avoid possible collisions.

You can use VPC peering or AWS Transit Gateway to provide seamless connectivity between the two Regions on the required ports (SQL Server, SMB for the file share, and others).

With Amazon FSx, you get a fully managed, shared file storage solution that automatically replicates the underlying storage synchronously across multiple Availability Zones. Amazon FSx provides high availability with automatic failure detection and automatic failover if there are any hardware or storage issues. The service fully supports continuously available shares, a feature that gives SQL Server uninterrupted access to shared file data.

Replication from Region 1 to Region 2 is asynchronous and uses the log shipping feature. In this configuration, Microsoft SQL Server log shipping replicates databases using transaction log backups. As a result, the warm standby database is a physically replicated, exact binary replica of the primary database. This is referred to as physical replication.

Log shipping can be configured in one of two modes, which determine the state of the secondary log-shipped SQL Server database:

Standby mode. The database is available for read-only queries between restores. Users can't access the database while a restore is in progress, but once the restore completes, they can access it in read-only mode.
Restore mode (NORECOVERY). The database is not accessible to users.
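As a sketch of the difference between the two modes, the statements below apply two consecutive log backups, the first in standby mode and the second in restore (NORECOVERY) mode, and then check the database state. [DatabaseName] and the D:\LogShipping\ paths are hypothetical placeholders, not values from this solution; in practice the log shipping restore job performs these restores for you, so running them by hand is only useful for testing.

```sql
-- Placeholders: [DatabaseName] and the D:\LogShipping\ paths are hypothetical.

-- Standby mode: apply a log backup and leave the database readable between
-- restores. SQL Server saves undo information for uncommitted transactions
-- in the standby (.tuf) file so the next log backup can still be applied.
RESTORE LOG [DatabaseName]
FROM DISK = N'D:\LogShipping\DatabaseName_1.trn'
WITH STANDBY = N'D:\LogShipping\DatabaseName_undo.tuf';
GO

-- Restore (NORECOVERY) mode: apply the next log backup and leave the
-- database in the RESTORING state, where users can't connect to it.
RESTORE LOG [DatabaseName]
FROM DISK = N'D:\LogShipping\DatabaseName_2.trn'
WITH NORECOVERY;
GO

-- Check which state the secondary database is in.
SELECT name, state_desc, is_in_standby
FROM sys.databases
WHERE name = N'DatabaseName';
GO
```

Standby mode suits a secondary that also serves reporting queries; NORECOVERY avoids the undo work on each restore when nobody needs to read the secondary.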

In this solution, you configure a warm standby SQL Server database on an EC2 instance, with the primary running as a SQL Server FCI that uses Amazon FSx as shared storage. Transaction log backups are sent asynchronously from the primary Region database to the warm standby server in the other Region, and then applied to the warm standby database sequentially. When all the logs have been applied, you can perform a manual failover and point the application to the secondary Region. We recommend running the primary and secondary database instances in separate Availability Zones, and configuring a monitor instance to track the status of log shipping.
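To see the kind of information a monitor instance tracks, you could query log shipping status with SQL Server's built-in monitoring procedure and tables, as in this minimal sketch. Which server holds which records depends on how log shipping and the monitor were configured, so treat the server placement in the comments as an assumption to verify.

```sql
-- sp_help_log_shipping_monitor returns status for the primary and secondary
-- databases registered on this server. Run it from the master database.
USE master;
GO
EXEC sp_help_log_shipping_monitor;
GO

-- On the secondary or monitor server: the last copied and restored log
-- backups, and the restore latency in minutes.
SELECT secondary_server,
       secondary_database,
       primary_server,
       primary_database,
       last_copied_file,
       last_copied_date,
       last_restored_file,
       last_restored_date,
       last_restored_latency
FROM msdb.dbo.log_shipping_monitor_secondary;
GO
```

Comparing last_restored_date against your RPO target is a simple way to spot a secondary that is falling behind.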

Prerequisites

Configure two SQL Server clusters with FCI in Region 1 and Region 2, or configure a SQL Server cluster in Region 1 and an Amazon EC2 instance running SQL Server with EBS storage in Region 2. Learn more about how to set up SQL Server FCI with Amazon FSx.
Configure VPC peering or Transit Gateway for the VPCs (where the SQL Server clusters reside).
- VPC peering – Learn more about setting up Inter-Region VPC Peering.
- Transit gateway – Similar to VPC peering. Learn how to Use an AWS Transit Gateway to Simplify Your Network Architecture and Scaling VPN throughput using AWS Transit Gateway.
Configure networking and security to work across the peering connection or Transit Gateway.
Verify that Amazon FSx and SQL Server connectivity is seamless across both Regions. For example, you should be able to connect to Amazon FSx and SQL Server remotely from one Region to the other. Confirm that security group rules are in place. Learn more about FSx for Windows File Server.
SQL Server must not be running Express edition, because log shipping supports all editions except Express.
Give shared folders on primary and secondary Regions appropriate permissions so the network path is accessible across Regions.
The databases for log shipping must use the FULL recovery model. Learn more about log shipping.
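Before configuring log shipping, you might check the edition, version, and recovery model prerequisites with queries like the following. [DatabaseName] is a placeholder. The version check is useful because the secondary can't run an earlier SQL Server version than the primary.

```sql
-- Edition and version: log shipping isn't supported on Express edition.
SELECT SERVERPROPERTY('Edition')        AS edition,
       SERVERPROPERTY('ProductVersion') AS product_version;
GO

-- Recovery model of the database to be log shipped.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'DatabaseName';
GO

-- Switch to FULL if needed. The log backup chain starts only after the next
-- full backup, such as the one used to initialize the secondary.
ALTER DATABASE [DatabaseName] SET RECOVERY FULL;
GO
```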

Walkthrough steps to set up DR for SQL Server FCI

The following steps configure SQL Server DR using a SQL Server failover cluster, with Amazon FSx for Windows File Server used for the primary site and the secondary DR site. We also demonstrate how to set up multi-Region log shipping.

Assumed variables

Region_01:
- WSFC Cluster Name: SQLCluster1
- FCI Virtual Network Name: SQLFCI01
Region_02:
- Amazon EC2 Name: EC2SQL02

Make sure to configure network connectivity between your clusters. In this solution, we are using two VPCs in two separate Regions.

- VPC peering is configured to enable network traffic on both VPCs.
- The domain controllers (AWS Managed Microsoft AD) in both VPCs are configured with conditional forwarding. This enables DNS resolution between the two VPCs.

Configure SQL FCI setup using Amazon FSx as shared storage on Region_01.
Configure SQL standalone instance on Region_02 with EBS volume as storage.
Create an Amazon FSx file system in the primary Region with AWS Managed Microsoft AD, or with an on-premises Active Directory connected through a trust relationship or AD Connector.
Create a SQL Server service account with proper permissions to be able to set up transaction log settings.
Configure VPC peering between the primary and DR/secondary Region.
Join both the primary and secondary servers in the primary Region to the Active Directory domain.
Mount Amazon FSx on the primary and secondary servers and grant share permissions so that SQL Server can access the folder. Use Amazon FSx to store transaction log backups, and EBS to store transaction logs in the secondary Region.
Set up log shipping from the primary server SQLFCI01 to the secondary server EC2SQL02 with the standby option enabled, so that the databases are available in read-only mode on the secondary.
In case of disaster, follow the FAILOVER and FAILBACK steps in the next sections. Learn more by reading Change Roles Between Primary and Secondary Log Shipping Servers.

Failover steps

If a disaster occurs at the primary Region node SQLFCI01, log shipping acts as the DR solution. The following steps bring the databases online on EC2SQL02. For a DR drill, while SQLFCI01 is still available, follow all the steps. In a real disaster, follow the process from Step 3 onward.

1. Stop all activity on the SQLFCI01 databases involved in log shipping, and stop the log shipping jobs on SQLFCI01 and EC2SQL02. Confirm that no processes are still running against the database by using the following query:

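USE master;
GO
-- Lists sessions still connected to the database; expect no rows before failover
SELECT spid, loginame, hostname, program_name, status
FROM sys.sysprocesses
WHERE dbid = DB_ID(N'DatabaseName');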
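Stopping the log shipping jobs can be scripted with the SQL Server Agent procedures in msdb. The job names below assume the default naming used by the log shipping setup in SQL Server Management Studio (LSBackup_, LSCopy_, and LSRestore_ prefixes); they are hypothetical here, so list the real names first and substitute them.

```sql
USE msdb;
GO
-- List the log shipping jobs and whether they're enabled (run on each server).
SELECT name, enabled
FROM dbo.sysjobs
WHERE name LIKE N'LS%'
ORDER BY name;
GO

-- On SQLFCI01: disable the backup job (hypothetical job name).
EXEC dbo.sp_update_job @job_name = N'LSBackup_DatabaseName', @enabled = 0;
GO

-- On EC2SQL02: disable the copy and restore jobs (hypothetical job names).
EXEC dbo.sp_update_job @job_name = N'LSCopy_SQLFCI01_DatabaseName', @enabled = 0;
EXEC dbo.sp_update_job @job_name = N'LSRestore_SQLFCI01_DatabaseName', @enabled = 0;
GO
```

Disabling a job doesn't interrupt a run that is already in progress; msdb.dbo.sp_stop_job can stop one if needed.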

2. Take a full backup on SQLFCI01 as a rollback option.

BACKUP DATABASE [DatabaseName]
TO DISK = N'Provide Drive details'
WITH COMPRESSION
GO

3. If SQLFCI01 is still accessible, take a final tail-log backup. Otherwise, check for the last transaction log backup available on EC2SQL02. Restore the last log backup WITH RECOVERY to bring the databases online on EC2SQL02.

RESTORE LOG [DatabaseName] FROM DISK = N'Provide path of last tlog'
WITH FILE = 1, RECOVERY, NOUNLOAD, STATS = 10
GO
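When the primary is still reachable, or when several log backups remain to be applied, the end of the failover might look like the sketch below. The paths and [DatabaseName] are placeholders. Restoring each remaining log WITH NORECOVERY and then running RESTORE DATABASE ... WITH RECOVERY is equivalent to restoring the last log WITH RECOVERY as in step 3.

```sql
-- On SQLFCI01 (only if reachable): back up the tail of the log. NORECOVERY
-- leaves the old primary in the RESTORING state, which preserves the LSN
-- chain for a later failback. For a damaged database, see the NO_TRUNCATE
-- and CONTINUE_AFTER_ERROR options of BACKUP LOG instead.
BACKUP LOG [DatabaseName]
TO DISK = N'Provide path for tail-log backup'
WITH NORECOVERY;
GO

-- On EC2SQL02: apply each remaining log backup in order, including the
-- tail-log backup, without recovering the database yet.
RESTORE LOG [DatabaseName]
FROM DISK = N'Provide path of next tlog'
WITH NORECOVERY, STATS = 10;
GO

-- Recover the database once the last log has been applied.
RESTORE DATABASE [DatabaseName] WITH RECOVERY;
GO

-- Confirm the database is ONLINE before redirecting application connections.
SELECT name, state_desc
FROM sys.databases
WHERE name = N'DatabaseName';
GO
```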

4. Redirect the application connections to EC2SQL02.

Failback methods

1. Native backup/restore or rollback strategy

Take a full backup on EC2SQL02 and copy it to SQLFCI01.
RESTORE the full backup on SQLFCI01.
Reconfigure log shipping between SQLFCI01 and EC2SQL02.
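The native backup/restore failback could be scripted roughly as follows. The paths, [DatabaseName], the \\fsx-share UNC paths, and the logical file names in the MOVE clauses are all hypothetical; MOVE is only needed if the data and log file locations differ between EC2SQL02 (EBS) and SQLFCI01 (Amazon FSx), and RESTORE FILELISTONLY shows the real logical names.

```sql
-- On EC2SQL02: full backup of the database that is currently serving traffic.
BACKUP DATABASE [DatabaseName]
TO DISK = N'Provide path of full backup'
WITH COMPRESSION, CHECKSUM, STATS = 10;
GO

-- On SQLFCI01: look up the logical file names stored in the backup.
RESTORE FILELISTONLY FROM DISK = N'Provide path of full backup';
GO

-- On SQLFCI01: overwrite the stale copy. NORECOVERY keeps the database in the
-- RESTORING state so that later log backups can still be applied.
RESTORE DATABASE [DatabaseName]
FROM DISK = N'Provide path of full backup'
WITH REPLACE, NORECOVERY, STATS = 10,
     MOVE N'DatabaseName_Data' TO N'\\fsx-share\SQLData\DatabaseName.mdf',
     MOVE N'DatabaseName_Log'  TO N'\\fsx-share\SQLLog\DatabaseName_log.ldf';
GO

-- When no more log backups need to be applied, bring the database online.
RESTORE DATABASE [DatabaseName] WITH RECOVERY;
GO
```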

2. Reverse log shipping

For DR drills or business continuity and disaster recovery (BCDR) activities, you can set up reverse log shipping to reduce the time it takes to fail back. If performed carefully, it doesn't require reinitializing the database with a full backup. It is crucial to preserve the log sequence number (LSN) chain, so perform the final log backup using the NORECOVERY option. Backing up the log with this option leaves the database in a restoring state, where subsequent log backups can be restored to it, so the database's LSN chain doesn't diverge. This procedure helps reduce the downtime needed to bring back SQLFCI01.
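One way to confirm that the LSN chain is intact before relying on reverse log shipping is to compare consecutive log backups in the msdb backup history: each log backup's first_lsn should equal the previous log backup's last_lsn. This sketch assumes SQL Server 2012 or later (for LAG) and that backup history hasn't been purged; [DatabaseName] is a placeholder. Backup history is recorded per instance, so run it on the server that took the log backups.

```sql
USE msdb;
GO
-- Log backups for one database, each compared with the previous one.
WITH log_backups AS (
    SELECT database_name,
           backup_finish_date,
           first_lsn,
           last_lsn,
           LAG(last_lsn) OVER (ORDER BY first_lsn) AS prev_last_lsn
    FROM dbo.backupset
    WHERE database_name = N'DatabaseName'
      AND type = 'L'          -- transaction log backups only
      AND is_copy_only = 0    -- copy-only backups don't advance the chain
)
SELECT database_name,
       backup_finish_date,
       first_lsn,
       last_lsn,
       CASE WHEN prev_last_lsn IS NULL OR first_lsn = prev_last_lsn
            THEN 'OK'
            ELSE 'GAP'        -- chain broken here, e.g. a missing log backup
       END AS chain_status
FROM log_backups
ORDER BY first_lsn;
GO
```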

STOP all activities on SQLFCI01 databases involved in log shipping jobs on SQLFCI01 and EC2SQL02.
TAKE Tlog backup of SQLFCI01 with NORECOVERY option.

BACKUP LOG [DatabaseName]
TO DISK = 'BackupFilePathname'
WITH NORECOVERY;

RESTORE transaction log backup on EC2SQL02 with NORECOVERY.
Reconfigure log shipping and re-enable the jobs.
Reconfigure the application connections to SQLFCI01.
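The restore and job steps in this list could look like the following on EC2SQL02, reusing the placeholder names from the BACKUP LOG statement above. The job name is hypothetical and depends on how log shipping was reconfigured, so list the actual jobs from msdb.dbo.sysjobs first.

```sql
-- On EC2SQL02: apply the final log backup that was taken WITH NORECOVERY,
-- again WITH NORECOVERY so that further log backups can follow.
RESTORE LOG [DatabaseName]
FROM DISK = 'BackupFilePathname'
WITH NORECOVERY;
GO

-- After log shipping has been reconfigured, re-enable each of its jobs on
-- the server that hosts it.
USE msdb;
GO
SELECT name, enabled
FROM dbo.sysjobs
WHERE name LIKE N'LS%'
ORDER BY name;
GO
-- Hypothetical job name; repeat for each log shipping job.
EXEC dbo.sp_update_job @job_name = N'LSRestore_PrimaryServer_DatabaseName', @enabled = 1;
GO
```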

Conclusion

A multi-Region strategy for your mission-critical SQL Server deployments is key for business continuity and disaster recovery. This blog post shows how to achieve that using log shipping for SQL Server FCI deployment. Setting up DR using log shipping can help you save costs and meet your business requirements.

To learn more, check out Simplify your Microsoft SQL Server high availability deployments using Amazon FSx for Windows File Server.

Women at AWS – Diverse backgrounds make great solutions architects

2022-03-11 Jigna Gandhi

Post Syndicated from Jigna Gandhi original https://aws.amazon.com/blogs/architecture/women-at-aws-diverse-backgrounds-make-great-solutions-architects/

This International Women’s Day, we’re featuring more than a week’s worth of posts that highlight female builders and leaders. We’re showcasing women in the industry who are building, creating, and, above all, inspiring, empowering, and encouraging everyone—especially women and girls—in tech.

Thinking about becoming a Solutions Architect, but not sure where to start? Wondering if your work experience and skills qualify you for the role? Let us help!

We’re Solutions Architects at Amazon Web Services (AWS). In this post, we’ll cover what Solutions Architects do, what got us interested in being Solutions Architects, and what skills and resources you might need to be successful in the role.

We also share our different career backgrounds and how we ended up pursuing careers as Solutions Architects. Let our experiences be your guide.

What do Solutions Architects do?

We work with enterprise customers from various industries and bring our unique technical and business knowledge to provide technical solutions that use AWS services.

Being a Solutions Architect is a combination of technical and sales roles; the technical aspect is 60-70% and the sales aspect is 30-40%.

As Solutions Architects, we provide technical guidance to customers on how they can achieve business outcomes by using cloud technology. The role requires strong business acumen to understand each stakeholder’s motivation, as well as technical skills to provide guidance.

What we’re working on right now

Despite having the same job title, we all work with different technologies across different industries.

Jigna works with Digital Native Business customers (cloud native/customers who started in the cloud). She helps them apply best practices for AWS services and guides them in implementing complex workloads on AWS. You can see some of her recent work in action here:

Jennifer works with enterprise customers to understand their business requirements and provide technical solutions that align with their objectives. See what she’s co-written recently:

Cheryl works with AWS enterprise customers. Her core area of focus is serverless technologies. Lately, she has presented at AWS She Builds Tech Skills and has co-authored multiple blogs:

Sanjukta works with Greenfield customers (enterprises in early stages of AWS adoption), helping them accelerate initial workloads and lay the foundations to scale their AWS usage so they can innovate and modernize their applications. She collaborates with AWS internal teams for:

Adoption of AWS Solutions for US Northeast customers
Contributing towards mainframe-focused opportunities for Greenfield customers

What got you interested in this role?

We all started in different roles and had limited exposure to cloud technologies, but we all had one thing in common. We were curious and wanted to learn. Being in this career means that you're continuously learning about and researching current and emerging technologies.

As we progressed in our careers, we expanded our technical knowledge and skills. Most of us had technical depth in a few areas, such as development or architecture.

We all continued on various career paths and earned in-job training or acquired external certifications that led us to explore Solutions Architecture.

As enterprises adopted cloud technologies, we knew that our ability to adapt to the changing technical landscape along with our industry experience would enable us to better assist customers with their needs to provide the best technical solutions.

How did we get here?

We have prior experience working on analytics, application development, infrastructure, and legacy technologies across financial services, healthcare, retail, and gaming industries.

We now use this expertise to help AWS customers with similar needs. Strengthening your individual experience will help you become a Solutions Architect.

Jigna has a Bachelor of Engineering in Information Technology. She has held multiple roles, ranging from software engineer and cloud engineer to technical team lead. She has worked with several enterprise customers on their requirements, technical designs, and implementation.

Driven by her passion for helping clients in their business and technology endeavors, she decided to become a Solutions Architect.

Jennifer has a BS in Information Systems. She worked in the financial services industry, where she held various architecture and implementation roles.

She became a Solutions Architect because she was interested in working with a wide range of technical services to provide complete solutions for business applications.

Cheryl has a BS in Physics, Chemistry, and Mathematics and a Post BS Diploma in Computer Science. She has led several complex, highly robust and massively scalable software solutions for large-scale enterprise applications. She has worked as a software engineer and in several other roles in IT.

She became a Solutions Architect because she wanted to use her technical and communication skills to partner successfully with her business counterparts to meet their objectives.

Sanjukta has a BS in Computer Application and an MS in Software Engineering. She has worked on and led several mainframe projects for financial and healthcare enterprises.

As companies retired their legacy applications, she pursued external training and certifications to learn Solutions Architecture so she could help customers in their migration journeys.

What skills do I need?

There is no one set of skills that fits all when it comes to being a Solutions Architect.

You do not need to meet all these requirements right now, but they are good skills to develop over time:

Technical Knowledge: Good knowledge of how different technical components work together is beneficial. This includes networking, database, storage, analytics, etc.
Communication: Even though this is a soft skill, learning and practicing how to communicate clearly and confidently will enable you to be successful in customer engagements.
Domain Knowledge: Developing a command over a few domains like retail, financial services, healthcare, etc., is useful, but you don’t need an in-depth knowledge about all domains, industries, or technologies.
Architecture Design: A system's architecture defines its major components, their relationships, and how they interact with each other.
Resourcefulness: Be curious. Wanting to learn new things is an absolute necessity when you want to be an SA. You may not know all the answers, but being willing and able to find the answers and solve problems is what sets you apart and helps you excel in this field.

Can you give me some resources to get started?

We used various resources to develop our skills at AWS:

AWS Cloud Practitioner Essentials is a good starting point.
AWS Skill Builder offers online courses and classroom training to learn new topics and sharpen our skills.
AWS Certifications will help you reinforce knowledge.

Conclusion

Role models and mentors helped us gravitate toward this role. Our colleagues and managers inspired us to broaden our horizons and look beyond our current roles. We encourage you to do the same.

No matter where you’ve started in your career, your talent and experience can be an asset to customers.

Ready to get started?

Interested in applying for a Solutions Architecture role?

We’ve got more content for International Women’s Day!

For more than a week we’re sharing content created by women. Check it out!

Deploying service-mesh-based architectures using AWS App Mesh and Amazon ECS from Kesha Williams, an AWS Hero and award-winning software engineer.
A collection of several blog posts written and co-authored by women
Curated content from the Let’s Architect! team and a live Twitter chat
Another post on Building your brand as a SA

Other ways to participate

Women write blogs: a selection of posts from AWS Solutions Architects

2022-03-10 Bonnie McClure

Post Syndicated from Bonnie McClure original https://aws.amazon.com/blogs/architecture/women-write-blogs/

A blog can be a great starting point for you in finding and implementing a particular solution; learning about new features, services, and products; keeping up with the latest trends and ideas; or even understanding and resolving a tricky problem. Today, as part of our International Women’s Day celebration, we’re showcasing blogs written by women that do just that and more.

We’ve included all kinds of posts for you to peruse:

Architecture overview posts
Best practices posts
Customer/partner (co-written/sponsored/partnered) posts that highlight architectural solutions built with AWS services
How-to tutorials that explain the steps the reader needs to take to complete a task

Architecture overviews

How a Grocer Can Deliver Personalized Experiences with Recipes

by Chara Gravani and Stefano Vozza

Chara and Stefano bring us a way to differentiate and reinvent the customer journey for a grocery retailer. Their solution uses Amazon Personalize to deliver personalized recipe recommendations to increase customer satisfaction and loyalty, and in turn, increase revenue. They consider a customer who is shopping for groceries online. As they place products in their basket, they are presented with a list of recipes that contain the same ingredients as those products added to the basket. The suggested recipes are then personalized based on the customer’s profile and historical product preferences.

Best practices posts

Best practices for migrating self-hosted Prometheus on Amazon EKS to Amazon Managed Service for Prometheus

by Elamaran Shanmugam, Deval Parikh, and Ramesh Kumar Venkatraman

With a focus on the five pillars of the AWS Well-Architected Framework, Elamaran, Deval, and Ramesh examine some of the best practices to follow if you’re moving a self-managed Prometheus workload on Amazon Elastic Kubernetes Service (Amazon EKS) to Amazon Managed Service for Prometheus.

Optimizing your AWS Infrastructure for Sustainability Series

by Katja Philipp, Aleena Yunus, Otis Antoniou, and Ceren Tahtasiz

As organizations align their business with sustainable practices, it is important to review every functional area. If you're building, deploying, and maintaining an IT stack, improving its environmental impact requires informed decision making. In this three-part blog series, Katja, Aleena, Otis, and Ceren provide strategies to optimize your AWS architecture within compute, storage, and networking.

Customer/partner posts

Scaling DLT to 1M TPS on AWS: Optimizing a Regulated Liabilities Network

by Erica Salinas and Jack Iu

Erica and Jack discuss how they partnered with SETL to jointly stand up a basic Regulated Liabilities Network (RLN) and refine the scalability of the environment to at least 1 million transactions per second. They show you how scaling characteristics were achieved while maintaining the business requirements of atomicity and finality and discuss how each RLN component was optimized for high performance.

How-to tutorials

Monitor and visualise building occupancy with AWS IoT Core, Amazon QuickSight and Raspberry Pi

by Jamila Jamilova

Occupancy monitoring in buildings is a valuable tool across different industries. For example, museums can analyze occupancy data in near real-time to understand the popularity and number of visitors to decide where a particular gallery should be located. To help with cases like this, Jamila brings you a solution that monitors how building space is being utilized. It shows how busy each area of a building gets during different times of the day based on a motion sensor’s location. This device, a Raspberry Pi with a passive infrared (PIR) sensor, senses motion in direct proximity (in other words, if a human has moved in or out of the sensor’s range) and will generate data that is stored, analyzed, and visualized to help you understand how best to use your space.

Create an iOS tracker application with Amazon Location Service and AWS Amplify

by Panna Shetty and Fernando Rocha

Emergency management teams venture into dangerous situations to rescue those in need, potentially risking their own lives. To keep themselves safe during an event where they cannot easily track each other by line of sight, a muster point is established as a designated safety zone, or a geofence. This geofence may change in response to evolving conditions. One way to improve this process is automating member tracking and response activity, so that emergency managers can quickly account for all members and ensure they are safe. Panna and Fernando bring you a solution to apply to this situation and others like it. It uses Amazon Location Service to create a serverless architecture that is capable of tracking the user's current location and identifying whether they are in a safe area.

Optimize workforce in your store using Amazon Rekognition

by Laura Reith and Kayla Jing

Retailers often need to make decisions to improve the in-store customer experience through personnel management. Having too few or too many employees working can be detrimental to the business. When store traffic outpaces staffing, it can result in long checkout lines and limited customer interaction, creating a poor customer experience. The opposite is also a problem: having too many employees during periods of low traffic wastes operating costs. In this post, Laura and Kayla show you how to use Amazon Rekognition and AWS DeepLens to detect and analyze occupancy in a retail business to optimize workforce utilization.

Build MLOps workflows with Amazon SageMaker projects, GitLab, and GitLab pipelines

by Lauren Mullennex, Indrajit Ghosalkar, and Kirit Thadaka

In this post, Lauren, Indrajit, and Kirit walk you through using a custom Amazon SageMaker machine learning operations project template to automatically build and configure a continuous integration/continuous delivery (CI/CD) pipeline. This pipeline incorporates your existing CI/CD tooling with SageMaker features for data preparation, model training, model evaluation, and model deployment. In their use case, they focus on using GitLab and GitLab pipelines with SageMaker projects and pipelines.

Deploying Sample UI Forms using React, Formik, and AWS CDK

by Kevin Rivera, Mark Carlson, Shruti Arora, and Britney Tong

Many companies use UI forms to collect customer data for account registrations, online shopping, and surveys. These forms can be difficult to write, maintain, and test. To help with this, Kevin, Mark, Shruti, and Britney show you how to use the JavaScript libraries React and Formik. These third-party libraries provide front-end developers with tools to implement simple forms for a user interface.

Multi-Region Migration using AWS Application Migration Service

by Shreya Pathak and Medha Shree

Shreya and Medha demonstrate how AWS Application Migration Service simplifies, expedites, and reduces the cost of migrating Amazon Elastic Compute Cloud (Amazon EC2)-hosted workloads from one AWS Region to another. It integrates with AWS Migration Hub, which allows you to organize your servers into applications. With the migration services they discuss, you can track the progress of your migration at the server and application level, even as you move servers into multiple Regions.

Tracking Overall Equipment Effectiveness with AWS IoT Analytics and Amazon QuickSight

by Shailaja Suresh and Michael Brown

To drive process efficiencies and optimize costs, manufacturing organizations need a scalable approach to access data in disparate silos across their organization. In this post, Shailaja and Michael demonstrate how overall equipment effectiveness can be calculated, monitored, and scaled out using two key services: AWS IoT Analytics and Amazon QuickSight.

Use AnalyticsIQ with Amazon QuickSight to gain insights for your business

by Sumitha AP

Sumitha shows you how to use the AnalyticsIQ Social Determinants of Health Sample Data dataset to gain insights into society’s health and wellness and how to generate easy-to-understand visualizations using QuickSight that could improve healthcare professionals’ decision making.

We’ve got more content for International Women’s Day!

For more than a week we’re sharing content created by women. Check it out!

Deploying service-mesh-based architectures using AWS App Mesh and Amazon ECS from Kesha Williams, an AWS Hero and award-winning software engineer.
A collection of several blog posts written and co-authored by women
Curated content from the Let’s Architect! team and a live Twitter chat
A post on Women at AWS – Diverse Backgrounds, Common Goal of Becoming Solutions Architects
Another post on Building your brand as a SA

Other ways to participate

Let’s Architect! Tools for Cloud Architects

2022-03-09 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-tools-for-cloud-architects/

A great way for cloud architects to learn is to experiment with the tools that our teams are using or could consider for the future. This allows us to learn new technologies, become familiar with the latest trends, and understand the entire cycle of our solutions.

Amazon Web Services (AWS) provides several tools for architects, including resources that can analyze your environment for creating a visual diagram and a community of builders who can answer your technical questions.

Today we’re excited to share tools and methodologies that you should be aware of. In honor of the Architecture Blog’s International Women’s Day, half of these tools have been developed with and by women.

AWS Perspective

One of the main challenges for every architect is making sure their documentation is up to date. Recently, we’ve seen the rise of “architecture as code” tools for deriving architecture diagrams directly from the code in production.

In that vein, AWS developed AWS Perspective, a diagramming solution that helps you represent your live workload.

AWS Perspective analyzes your environment and creates a diagram with all your cloud components

Chaos Testing with AWS Fault Injection Simulator and AWS CodePipeline

Chaos engineering is the process of testing a distributed computing system to ensure that it can withstand unexpected disruptions.

This blog post shows an architecture pattern for automating chaos testing as part of your continuous integration/continuous delivery (CI/CD) process. By automating the implementation of chaos experiments inside CI/CD pipelines, complex risks and modeled failure scenarios can be tested against application environments with every deployment.

This high-level architecture shows how to automate chaos engineering in your environment

AWS re:Post – A Reimagined Q&A Experience for the AWS Community

Often when architecting we run into different design choices, issues, and roadblocks. What service should you use? What is the best way to implement this? Who do you ask?

AWS re:Post is a new question-and-answer service (think Stack Overflow specifically for AWS). Community members answer your questions, and AWS employees and official partners then review these answers to ensure accuracy.

AWS re:Post is public. There is a wide community of AWS experts ready to answer your questions

Establishing Feedback Loops Based on the AWS Well-Architected Framework Review

In 2018, AWS released the Well-Architected Framework, a mechanism for reviewing and/or improving your workloads that provides recommendations based on best practices in different areas such as security, cost optimization, or reliability. This article shows you how to iteratively improve your systems in the cloud using the Well-Architected Framework.

Creating a healthy feedback loop will enhance your architecture over time

See you next time!

Thanks for reading! If you're looking for more tools to architect your workload, check out the AWS Architecture Center.

See you in a couple of weeks when we discuss blockchain!

Overview of Auth0 SAML authenticated solution

Prerequisites

Identity provider (Auth0) setup

Step 1: Sign up for an Auth0 account

Step 2: Create Groups in Auth0

Step 3: Install Auth0 Extension to create a group and assign users to the group

Step 4: Create a group in Auth0

Step 5: Create an Auth0 Application

Prepare Amazon OpenSearch for SAML configuration

Auth0 SAML configuration

Amazon OpenSearch SAML configuration

Validating access with Auth0 users

Cleaning up

Conclusion

Threat detection and data protection

Use AWS security services to detect threats and misconfigurations

Detect threats with Amazon GuardDuty

Discover sensitive data with Amazon Macie for Amazon S3

Scan for vulnerabilities with Amazon Inspector

Aggregate security findings with AWS Security Hub

Setting up the threat detection services

Incident response

Playbooks and runbooks for repeatability

Automation for quicker response time

Simulations to improve incident response capabilities

Conclusion

Other blog posts in this series

See you next time!

Other posts in this series

Step 1: Review

Step 2: Refine

Core features

Common features

Special features

Provisioning

Operations

Step 3: Select

Conclusion

Architecture for email threat reports and analytics

Threat report for suspicious emails

Real-time data processing

IOC detection

Conclusion

Example of a multi-language notification use case

Solution design for multi-language notification system

Code repository

Conclusion

Why move to a managed database?

Managed database features

Self-managed databases on AWS

If I do not need a managed database, should I still use Amazon EC2 to host my database?

Can I customize my underlying OS and database environment?

Choosing which migration plan to implement

Conclusion

Introduction to AWS WAF Security Automations

Scanner and probe automation

Scanner and probe solution

Customizing the AWS WAF Security Automation solution

Code explanation:

Conclusion

AWS Mainframe Modernization (Preview)

AWS Migration and Modernization Competency

AWS Application Migration Service (AWS MGN)

AWS Elastic Disaster Recovery

AWS Resilience Hub

AWS Migration Hub Strategy Recommendations

AWS Migration Hub Refactor Spaces (Preview)

AWS Database Migration Service

AWS Migration Evaluator

AWS Amplify Studio

Conclusion

See you next time!

Other posts in this series

Reduce infrastructure costs and improve latency

Overview of solution

Prerequisites

Creating a Java-based AWS Lambda function

Integrating Quarkus framework

Testing your Quarkus function

Cleaning up