Tag Archives: announcements

Mistral AI models now available on Amazon Bedrock

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/mistral-ai-models-now-available-on-amazon-bedrock/

Last week, we announced that Mistral AI models are coming to Amazon Bedrock. In that post, we elaborated on a few reasons why Mistral AI models may be a good fit for you. Mistral AI offers a balance of cost and performance, fast inference speed, transparency and trust, and is accessible to a wide range of users.

Today, we’re excited to announce the availability of two high-performing Mistral AI models, Mistral 7B and Mixtral 8x7B, on Amazon Bedrock. Mistral AI is the 7th foundation model provider offering cutting-edge models in Amazon Bedrock, joining other leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. This integration provides you the flexibility to choose optimal high-performing foundation models in Amazon Bedrock.

Mistral 7B is the first foundation model from Mistral AI, supporting English text generation tasks with natural coding capabilities. It is optimized for low latency with a low memory requirement and high throughput for its size. Mixtral 8x7B is a popular, high-quality, sparse Mixture-of-Experts (MoE) model, that is ideal for text summarization, question and answering, text classification, text completion, and code generation.

Here’s a quick look at Mistral AI models on Amazon Bedrock:

Getting Started with Mistral AI Models
To get started with Mistral AI models in Amazon Bedrock, first you need to get access to the models. On the Amazon Bedrock console, select Model access, and then select Manage model access. Next, select Mistral AI models, and then select Request model access.

Once you have the access to selected Mistral AI models, you can test the models with your prompts using Chat or Text in the Playgrounds section.

Programmatically Interact with Mistral AI Models
You can also use AWS Command Line Interface (CLI) and AWS Software Development Kit (SDK) to make various calls using Amazon Bedrock APIs. Following, is a sample code in Python that interacts with Amazon Bedrock Runtime APIs with AWS SDK:

import boto3
import json

bedrock = boto3.client(service_name="bedrock-runtime")

prompt = "<s>[INST] INSERT YOUR PROMPT HERE [/INST]"

body = json.dumps({
    "prompt": prompt,
    "max_tokens": 512,
    "top_p": 0.8,
    "temperature": 0.5,
})

modelId = "mistral.mistral-7b-instruct-v0:2"

accept = "application/json"
contentType = "application/json"

response = bedrock.invoke_model(
    body=body,
    modelId=modelId,
    accept=accept,
    contentType=contentType
)

print(json.loads(response.get('body').read()))

Mistral AI models in action
By integrating your application with AWS SDK to invoke Mistral AI models using Amazon Bedrock, you can unlock possibilities to implement various use cases. Here are a few of my personal favorite use cases using Mistral AI models with sample prompts. You can see more examples on Prompting Capabilities from the Mistral AI documentation page.

Text summarization — Mistral AI models extract the essence from lengthy articles so you quickly grasp key ideas and core messaging.

You are a summarization system. In clear and concise language, provide three short summaries in bullet points of the following essay.

# Essay:
{insert essay text here}

Personalization — The core AI capabilities of understanding language, reasoning, and learning, allow Mistral AI models to personalize answers with more human-quality text. The accuracy, explanation capabilities, and versatility of Mistral AI models make them useful at personalization tasks, because they can deliver content that aligns closely with individual users.

You are a mortgage lender customer service bot, and your task is to create personalized email responses to address customer questions. Answer the customer's inquiry using the provided facts below. Ensure that your response is clear, concise, and directly addresses the customer's question. Address the customer in a friendly and professional manner. Sign the email with "Lender Customer Support."

# Facts
<INSERT FACTS AND INFORMATION HERE>

# Email
{insert customer email here}

Code completion — Mistral AI models have an exceptional understanding of natural language and code-related tasks, which is essential for projects that need to juggle computer code and regular language. Mistral AI models can help generate code snippets, suggest bug fixes, and optimize existing code, accelerating your development process.

[INST] You are a code assistant. Your task is to generate a 5 valid JSON object based on the following properties:
name: 
lastname: 
address: 
Just generate the JSON object without explanations:
[/INST]

Things You Have to Know
Here are few additional information for you:

  • Availability — Mistral AI’s Mixtral 8x7B and Mistral 7B models in Amazon Bedrock are available in the US West (Oregon) Region.
  • Deep dive into Mistral 7B and Mixtral 8x7B — If you want to learn more about Mistral AI models on Amazon Bedrock, you might also enjoy this article titled “Mistral AI – Winds of Change” prepared by my colleague, Mike.

Now Available
Mistral AI models are available today in Amazon Bedrock, and we can’t wait to see what you’re going to build. Get yourself started by visiting Mistral AI on Amazon Bedrock.

Happy building,
Donnie

Introducing the AWS WAF traffic overview dashboard

Post Syndicated from Dmitriy Novikov original https://aws.amazon.com/blogs/security/introducing-the-aws-waf-traffic-overview-dashboard/

For many network security operators, protecting application uptime can be a time-consuming challenge of baselining network traffic, investigating suspicious senders, and determining how best to mitigate risks. Simplifying this process and understanding network security posture at all times is the goal of most IT organizations that are trying to scale their applications without also needing to scale their security operations center staff. To help you with this challenge, AWS WAF introduced traffic overview dashboards so that you can make informed decisions about your security posture when your application is protected by AWS WAF.

In this post, we introduce the new dashboards and delve into a few use cases to help you gain better visibility into the overall security of your applications using AWS WAF and make informed decisions based on insights from the dashboards.

Introduction to traffic overview dashboards

The traffic overview dashboard in AWS WAF displays an overview of security-focused metrics so that you can identify and take action on security risks in a few clicks, such as adding rate-based rules during distributed denial of service (DDoS) events. The dashboards include near real-time summaries of the Amazon CloudWatch metrics that AWS WAF collects when it evaluates your application web traffic.

These dashboards are available by default and require no additional setup. They show metrics—total requests, blocked requests, allowed requests, bot compared to non-bot requests, bot categories, CAPTCHA solve rate, top 10 matched rules, and more—for each web access control list (web ACL) that you monitor with AWS WAF.

You can access default metrics such as the total number of requests, blocked requests, and common attacks blocked, or you can customize your dashboard with the metrics and visualizations that are most important to you.

These dashboards provide enhanced visibility and help you answer questions such as these:

  • What percent of the traffic that AWS WAF inspected is getting blocked?
  • What are the top originating countries for the traffic that’s getting blocked?
  • What are common attacks that AWS WAF detects and protects me from?
  • How do my traffic patterns from this week compare with last week?

The dashboard has native and out-of-the-box integration with CloudWatch. Using this integration, you can navigate back and forth between the dashboard and CloudWatch; for example, you can get a more granular metric overview by viewing the dashboard in CloudWatch. You can also add existing CloudWatch widgets and metrics to the traffic overview dashboard, bringing your tried-and-tested visibility structure into the dashboard.

With the introduction of the traffic overview dashboard, one AWS WAF tool—Sampled requests—is now a standalone tab inside a web ACL. In this tab, you can view a graph of the rule matches for web requests that AWS WAF has inspected. Additionally, if you have enabled request sampling, you can see a table view of a sample of the web requests that AWS WAF has inspected.

The sample of requests contains up to 100 requests that matched the criteria for a rule in the web ACL and another 100 requests for requests that didn’t match rules and thus had the default action for the web ACL applied. The requests in the sample come from the protected resources that have received requests for your content in the previous three hours.

The following figure shows a typical layout for the traffic overview dashboard. It categorizes inspected requests with a breakdown of each of the categories that display actionable insights, such as attack types, client device types, and countries. Using this information and comparing it with your expected traffic profile, you can decide whether to investigate further or block the traffic right away. For the example in Figure 1, you might want to block France-originating requests from mobile devices if your web application isn’t supposed to receive traffic from France and is a desktop-only application.

Figure 1: Dashboard with sections showing multiple categories serves as a single pane of glass

Figure 1: Dashboard with sections showing multiple categories serves as a single pane of glass

Use case 1: ­Analyze traffic patterns with the dashboard

In addition to visibility into your web traffic, you can use the new dashboard to analyze patterns that could indicate potential threats or issues. By reviewing the dashboard’s graphs and metrics, you can spot unusual spikes or drops in traffic that deserve further investigation.

The top-level overview shows the high-level traffic volume and patterns. From there, you can drill down into the web ACL metrics to see traffic trends and metrics for specific rules and rule groups. The dashboard displays metrics such as allowed requests, blocked requests, and more.

Notifications or alerts about a deviation from expected traffic patterns provide you a signal to explore the event. During your exploration, you can use the dashboard to understand the broader context and not just the event in isolation. This makes it simpler to detect a trend in anomalies that could signify a security event or misconfigured rules. For example, if you normally get 2,000 requests per minute from a particular country, but suddenly see 10,000 requests per minute from it, you should investigate. Using the dashboard, you can look at the traffic across various dimensions. The spike in requests alone might not be a clear indication of a threat, but if you see an additional indicator, such as an unexpected device type, this could be a strong reason for you to take follow-up action.

The following figure shows the actions taken by rules in a web ACL and which rule matched the most.

Figure 2: Multidimensional overview of the web requests

Figure 2: Multidimensional overview of the web requests

The dashboard also shows the top blocked and allowed requests over time. Check whether unusual spikes in blocked requests correspond to spikes in traffic from a particular IP address, country, or user agent. That could indicate attempted malicious activity or bot traffic.

The following figure shows a disproportionately larger number of matches to a rule indicating that a particular vector is used against a protected web application.

Figure 3: The top terminating rule could indicate a particular vector of an attack

Figure 3: The top terminating rule could indicate a particular vector of an attack

Likewise, review the top allowed requests. If you see a spike in traffic to a specific URL, you should investigate whether your application is working properly.

Next steps after you analyze traffic

After you’ve analyzed the traffic patterns, here are some next steps to consider:

  • Tune your AWS WAF rules to better match legitimate or malicious traffic based on your findings. You might be able to fine-tune rules to reduce false positives or false negatives. Tune rules that are blocking legitimate traffic by adjusting regular expressions or conditions.
  • Configure AWS WAF logging, and if you have a dedicated security information and event management (SIEM) solution, integrate the logging to enable automated alerting for anomalies.
  • Set up AWS WAF to automatically block known malicious IPs. You can maintain an IP block list based on identified threat actors. Additionally, you can use the Amazon IP reputation list managed rule group, which the Amazon Threat Research Team regularly updates.
  • If you see spikes in traffic to specific pages, check that your web applications are functioning properly to rule out application issues driving unusual patterns.
  • Add new rules to block new attack patterns that you spot in the traffic flows. Then review the metrics to help confirm the impact of the new rules.
  • Monitor source IPs for DDoS events and other malicious spikes. Use AWS WAF rate-based rules to help mitigate these spikes.
  • If you experience traffic floods, implement additional layers of protection by using CloudFront with DDoS protection.

The new dashboard gives you valuable insight into the traffic that reaches your applications and takes the guesswork out of traffic analysis. Using the insights that it provides, you can fine-tune your AWS WAF protections and block threats before they affect availability or data. Analyze the data regularly to help detect potential threats and make informed decisions about optimizing.

As an example, if you see an unexpected spike of traffic, which looks conspicuous in the dashboard compared to historical traffic patterns, from a country where you don’t anticipate traffic originating from, you can create a geographic match rule statement in your web ACL to block this traffic and prevent it from reaching your web application.

The dashboard is a great tool to gain insights and to understand how AWS WAF managed rules help protect your traffic.

Use case 2: Understand bot traffic during onboarding and fine-tune your bot control rule group

With AWS WAF Bot Control, you can monitor, block, or rate limit bots such as scrapers, scanners, crawlers, status monitors, and search engines. If you use the targeted inspection level of the rule group, you can also challenge bots that don’t self-identify, making it harder and more expensive for malicious bots to operate against your website.

On the traffic overview dashboard, under the Bot Control overview tab, you can see how much of your current traffic is coming from bots, based on request sampling (if you don’t have Bot Control enabled) and real-time CloudWatch metrics (if you do have Bot Control enabled).

During your onboarding phase, use this dashboard to monitor your traffic and understand how much of it comes from various types of bots. You can use this as a starting point to customize your bot management. For example, you can enable common bot control rule groups in count mode and see if desired traffic is being mislabeled. Then you can add rule exceptions, as described in AWS WAF Bot Control example: Allow a specific blocked bot.

The following figure shows a collection of widgets that visualize various dimensions of requests detected as generated by bots. By understanding categories and volumes, you can make an informed decision to either investigate by further delving into logs or block a specific category if it’s clear that it’s unwanted traffic.

Figure 4: Collection of bot-related metrics on the dashboard

Figure 4: Collection of bot-related metrics on the dashboard

After you get started, you can use the same dashboard to monitor your bot traffic and evaluate adding targeted detection for sophisticated bots that don’t self-identify. Targeted protections use detection techniques such as browser interrogation, fingerprinting, and behavior heuristics to identify bad bot traffic. AWS WAF tokens are an integral part of these enhanced protections.

AWS WAF creates, updates, and encrypts tokens for clients that successfully respond to silent challenges and CAPTCHA puzzles. When a client with a token sends a web request, it includes the encrypted token, and AWS WAF decrypts the token and verifies its contents.

In the Bot Control dashboard, the token status pane shows counts for the various token status labels, paired with the rule action that was applied to the request. The IP token absent thresholds pane shows data for requests from IPs that sent too many requests without a token. You can use this information to fine-tune your AWS WAF configuration.

For example, within a Bot Control rule group, it’s possible for a request without a valid token to exit the rule group evaluation and continue to be evaluated by the web ACL. To block requests that are missing their token or for which the token is rejected, you can add a rule to run immediately after the managed rule group to capture and block requests that the rule group doesn’t handle for you. Using the Token status pane, illustrated in Figure 5, you can also monitor the volume of requests that acquire tokens and decide if you want to rate limit or block such requests.

Figure 5: Token status enables monitoring of the volume of requests that acquire tokens

Figure 5: Token status enables monitoring of the volume of requests that acquire tokens

Comparison with CloudFront security dashboard

The AWS WAF traffic overview dashboard provides enhanced overall visibility into web traffic reaching resources that are protected with AWS WAF. In contrast, the CloudFront security dashboard brings AWS WAF visibility and controls directly to your CloudFront distribution. If you want the detailed visibility and analysis of patterns that could indicate potential threats or issues, then the AWS WAF traffic overview dashboard is the best fit. However, if your goal is to manage application delivery and security in one place without navigating between service consoles and to gain visibility into your application’s top security trends, allowed and blocked traffic, and bot activity, then the CloudFront security dashboard could be a better option.

Availability and pricing

The new dashboards are available in the AWS WAF console, and you can use them to better monitor your traffic. These dashboards are available by default, at no cost, and require no additional setup. CloudWatch logging has a separate pricing model and if you have full logging enabled you will incur CloudWatch charges. See here for more information about CloudWatch charges. You can customize the dashboards if you want to tailor the displayed data to the needs of your environment.

Conclusion

With the AWS WAF traffic overview dashboard, you can get actionable insights on your web security posture and traffic patterns that might need your attention to improve your perimeter protection.

In this post, you learned how to use the dashboard to help secure your web application. You walked through traffic patterns analysis and possible next steps. Additionally, you learned how to observe traffic from bots and follow up with actions related to them according to the needs of your application.

The AWS WAF traffic overview dashboard is designed to meet most use cases and be a go-to default option for security visibility over web traffic. However, if you’d prefer to create a custom solution, see the guidance in the blog post Deploy a dashboard for AWS WAF with minimal effort.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Dmitriy Novikov

Dmitriy Novikov

As a Senior Solutions Architect at AWS, Dmitriy supports AWS customers to use emerging technologies to generate business value. He’s a technology enthusiast who loves finding innovative solutions to complex challenges. He enjoys sharing his learnings on architecture and best practices in blog posts and whitepapers and at events. Outside of work, Dmitriy has a passion for reading and triathlons.

Harith Gaddamanugu

Harith Gaddamanugu

Harith works at AWS as a Senior Edge Specialist Solutions Architect. He stays motivated by solving problems for customers across AWS Perimeter Protection and Edge services. When he’s not working, he enjoys spending time outdoors with friends and family.

New AWS whitepaper: AWS User Guide for Federally Regulated Financial Institutions in Canada

Post Syndicated from Dan MacKay original https://aws.amazon.com/blogs/security/new-aws-whitepaper-aws-user-guide-for-federally-regulated-financial-institutions-in-canada/

Amazon Web Services (AWS) has released a new whitepaper to help financial services customers in Canada accelerate their use of the AWS Cloud.

The new AWS User Guide for Federally Regulated Financial Institutions in Canada helps AWS customers navigate the regulatory expectations of the Office of the Superintendent of Financial Institutions (OSFI) in a shared responsibility environment. It is intended for OSFI-regulated institutions that are looking to run material workloads in the AWS Cloud, and is particularly useful for leadership, security, risk, and compliance teams that need to understand OSFI requirements and guidance applicable to the use of AWS services.

This whitepaper summarizes OSFI’s expectations with respect to Technology and Cyber Risk Management (OSFI Guideline B-13). It also gives OSFI-regulated institutions information that they can use to commence their due diligence and assess how to implement the appropriate programs for their use of AWS Cloud services. In subsequent versions of the whitepaper, we will provide considerations for other OSFI guidelines as applicable.

In addition to this whitepaper, AWS provides updates on the evolving Canadian regulatory landscape on the AWS Security Blog and the AWS Compliance page. Customers looking for more information on cloud-related regulatory compliance in different countries around the world can refer to the AWS Compliance Center. For additional resources or support, reach out to your AWS account manager or contact us here.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Dan MacKay

Dan MacKay

Dan is the Financial Services Compliance Specialist for AWS Canada. He advises financial services customers on best practices and practical solutions for cloud-related governance, risk, and compliance. Dan specializes in helping AWS customers navigate financial services and privacy regulations applicable to the use of cloud technology in Canada with a focus on third-party risk management and operational resilience.

Dave Trieu

Dave Trieu

Dave is an AWS Solutions Architect Manager with over two decades in the tech industry. He excels in guiding organizations through modernization and using cloud technologies for transformation. Dave helps businesses navigate the digital landscape and maintain a competitive edge by crafting and implementing cutting-edge solutions that address immediate business needs while anticipating future trends.

Top Architecture Blog Posts of 2023

Post Syndicated from Andrea Courtright original https://aws.amazon.com/blogs/architecture/top-architecture-blog-posts-of-2023/

2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.

As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.

I give you our 2023 cream of the crop!

#10: Build a serverless retail solution for endless aisle on AWS

In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.

Building endless aisle architecture for order processing

Figure 1. Building endless aisle architecture for order processing

Check it out!

#9: Optimizing data with automated intelligent document processing solutions

Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.

AI-based intelligent document processing engine

Figure 2: AI-based intelligent document processing engine

Check it out!

#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive

Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.

Warm standby with managed services

Figure 3. Warm standby with managed services

Check it out!

#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator

Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.

Architecture flow for Microservices to simulate a realistic failure scenario

Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario

Check it out!

#6: Let’s Architect! Designing event-driven architectures

Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.

Let's Architect

Figure 5. Let’s Architect artwork

Check it out!

#5: Use a reusable ETL framework in your AWS lake house architecture

As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.

Reusable ETL framework architecture

Figure 6. Reusable ETL framework architecture

Check it out!

#4: Invoking asynchronous external APIs with AWS Step Functions

It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.

Invoking Asynchronous External APIs architecture

Figure 7. Invoking Asynchronous External APIs architecture

Check it out!

#3: Announcing updates to the AWS Well-Architected Framework

The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!

Well-Architected logo

Figure 8. Well-Architected logo

Check it out!

#2: Let’s Architect! Designing architectures for multi-tenancy

The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.

Let's Architect

Figure 9. Let’s Architect

Check it out!

And finally…

#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud

Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!

Resilience patterns and trade-offs

Figure 10. Resilience patterns and trade-offs

Check it out!

Bonus! Three older special mentions

These three posts were published before 2023, but we think they deserve another round of applause because you, our readers, keep coming back to them.

Thanks again to everyone for their contributions during a wild year. We hope you’re looking forward to the rest of 2024 as much as we are!

Introducing Terraform support for Amazon OpenSearch Ingestion

Post Syndicated from Rahul Sharma original https://aws.amazon.com/blogs/big-data/introducing-terraform-support-for-amazon-opensearch-ingestion/

Today, we are launching Terraform support for Amazon OpenSearch Ingestion. Terraform is an infrastructure as code (IaC) tool that helps you build, deploy, and manage cloud resources efficiently. OpenSearch Ingestion is a fully managed, serverless data collector that delivers real-time log, metric, and trace data to Amazon OpenSearch Service domains and Amazon OpenSearch Serverless collections. In this post, we explain how you can use Terraform to deploy OpenSearch Ingestion pipelines. As an example, we use an HTTP source as input and an Amazon OpenSearch Service domain (Index) as output.

Solution overview

The steps in this post deploy a publicly accessible OpenSearch Ingestion pipeline with Terraform, along with other supporting resources that are needed for the pipeline to ingest data into Amazon OpenSearch. We have implemented the Tutorial: Ingesting data into a domain using Amazon OpenSearch Ingestion, using Terraform.

We create the following resources with Terraform:

The pipeline that you create exposes an HTTP source as input and an Amazon OpenSearch sink to save batches of events.

Prerequisites

To follow the steps in this post, you need the following:

  • An active AWS account.
  • Terraform installed on your local machine. For more information, see Install Terraform.
  • The necessary IAM permissions required to create the AWS resources using Terraform.
  • awscurl for sending HTTPS requests through the command line with AWS Sigv4 authentication. For instructions on installing this tool, see the GitHub repo.

Create a directory

In Terraform, infrastructure is managed as code, called a project. A Terraform project contains various Terraform configuration files, such as main.tf, provider.tf, variables.tf, and output.df . Let’s create a directory on the server or machine that we can use to connect to AWS services using the AWS Command Line Interface (AWS CLI):

mkdir osis-pipeline-terraform-example

Change to the directory.

cd osis-pipeline-terraform-example

Create the Terraform configuration

Create a file to define the AWS resources.

touch main.tf

Enter the following configuration in main.tf and save your file:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.36"
    }
  }

  required_version = ">= 1.2.0"
}

provider "aws" {
  region = "eu-central-1"
}

data "aws_region" "current" {}
data "aws_caller_identity" "current" {}
locals {
    account_id = data.aws_caller_identity.current.account_id
}

output "ingest_endpoint_url" {
  value = tolist(aws_osis_pipeline.example.ingest_endpoint_urls)[0]
}

resource "aws_iam_role" "example" {
  name = "exampleosisrole"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Sid    = ""
        Principal = {
          Service = "osis-pipelines.amazonaws.com"
        }
      },
    ]
  })
}

resource "aws_opensearch_domain" "test" {
  domain_name           = "osi-example-domain"
  engine_version = "OpenSearch_2.7"
  cluster_config {
    instance_type = "r5.large.search"
  }
  encrypt_at_rest {
    enabled = true
  }
  domain_endpoint_options {
    enforce_https       = true
    tls_security_policy = "Policy-Min-TLS-1-2-2019-07"
  }
  node_to_node_encryption {
    enabled = true
  }
  ebs_options {
    ebs_enabled = true
    volume_size = 10
  }
 access_policies = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "${aws_iam_role.example.arn}"
      },
      "Action": "es:*"
    }
  ]
}

EOF

}

resource "aws_iam_policy" "example" {
  name = "osis_role_policy"
  description = "Policy for OSIS pipeline role"
  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
        {
          Action = ["es:DescribeDomain"]
          Effect = "Allow"
          Resource = "arn:aws:es:${data.aws_region.current.name}:${local.account_id}:domain/*"
        },
        {
          Action = ["es:ESHttp*"]
          Effect = "Allow"
          Resource = "arn:aws:es:${data.aws_region.current.name}:${local.account_id}:domain/osi-test-domain/*"
        }
    ]
})
}

resource "aws_iam_role_policy_attachment" "example" {
  role       = aws_iam_role.example.name
  policy_arn = aws_iam_policy.example.arn
}

resource "aws_cloudwatch_log_group" "example" {
  name = "/aws/vendedlogs/OpenSearchIngestion/example-pipeline"
  retention_in_days = 365
  tags = {
    Name = "AWS Blog OSIS Pipeline Example"
  }
}

resource "aws_osis_pipeline" "example" {
  pipeline_name               = "example-pipeline"
  pipeline_configuration_body = <<-EOT
            version: "2"
            example-pipeline:
              source:
                http:
                  path: "/test_ingestion_path"
              processor:
                - date:
                    from_time_received: true
                    destination: "@timestamp"
              sink:
                - opensearch:
                    hosts: ["https://${aws_opensearch_domain.test.endpoint}"]
                    index: "application_logs"
                    aws:
                      sts_role_arn: "${aws_iam_role.example.arn}"   
                      region: "${data.aws_region.current.name}"
        EOT
  max_units                   = 1
  min_units                   = 1
  log_publishing_options {
    is_logging_enabled = true
    cloudwatch_log_destination {
      log_group = aws_cloudwatch_log_group.example.name
    }
  }
  tags = {
    Name = "AWS Blog OSIS Pipeline Example"
  }
  }

Create the resources

Initialize the directory:

terraform init

Review the plan to see what resources will be created:

terraform plan

Apply the configuration and answer yes to run the plan:

terraform apply

The process might take around 7–10 minutes to complete.

Test the pipeline

After you create the resources, you should see the ingest_endpoint_url output displayed. Copy this value and export it in your environment variable:

export OSIS_PIPELINE_ENDPOINT_URL=<Replace with value copied>

Send a sample log with awscurl. Replace the profile with your appropriate AWS profile for credentials:

awscurl --service osis --region eu-central-1 -X POST -H "Content-Type: application/json" -d '[{"time":"2014-08-11T11:40:13+00:00","remote_addr":"122.226.223.69","status":"404","request":"GET http://www.k2proxy.com//hello.html HTTP/1.1","http_user_agent":"Mozilla/4.0 (compatible; WOW64; SLCC2;)"}]' https://$OSIS_PIPELINE_ENDPOINT_URL/test_ingestion_path

You should receive a 200 OK as a response.

To verify that the data was ingested in the OpenSearch Ingestion pipeline and saved in the OpenSearch, navigate to the OpenSearch and get its domain endpoint. Replace the <OPENSEARCH ENDPOINT URL> in the snippet given below and run it.

awscurl --service es --region eu-central-1 -X GET https://<OPENSEARCH ENDPOINT URL>/application_logs/_search | json_pp 

You should see the output as below:

Clean up

To destroy the resources you created, run the following command and answer yes when prompted:

terraform destroy

The process might take around 30–35 minutes to complete.

Conclusion

In this post, we showed how you can use Terraform to deploy OpenSearch Ingestion pipelines. AWS offers various resources for you to quickly start building pipelines using OpenSearch Ingestion and use Terraform to deploy them. You can use various built-in pipeline integrations to quickly ingest data from Amazon DynamoDB, Amazon Managed Streaming for Apache Kafka (Amazon MSK), Amazon Security Lake, Fluent Bit, and many more. The following OpenSearch Ingestion blueprints allow you to build data pipelines with minimal configuration changes and manage them with ease using Terraform. To learn more, check out the Terraform documentation for Amazon OpenSearch Ingestion.


About the Authors

Rahul Sharma is a Technical Account Manager at Amazon Web Services. He is passionate about the data technologies that help leverage data as a strategic asset and is based out of New York city, New York.

Farhan Angullia is a Cloud Application Architect at AWS Professional Services, based in Singapore. He primarily focuses on modern applications with microservice software patterns, and advocates for implementing robust CI/CD practices to optimize the software delivery lifecycle for customers. He enjoys contributing to the open source Terraform ecosystem in his spare time.

Arjun Nambiar is a Product Manager with Amazon OpenSearch Service. He focusses on ingestion technologies that enable ingesting data from a wide variety of sources into Amazon OpenSearch Service at scale. Arjun is interested in large scale distributed systems and cloud-native technologies and is based out of Seattle, Washington.

Muthu Pitchaimani is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.

Introducing Amazon MWAA support for Apache Airflow version 2.8.1

Post Syndicated from Mansi Bhutada original https://aws.amazon.com/blogs/big-data/introducing-amazon-mwaa-support-for-apache-airflow-version-2-8-1/

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed orchestration service for Apache Airflow that makes it straightforward to set up and operate end-to-end data pipelines in the cloud.

Organizations use Amazon MWAA to enhance their business workflows. For example, C2i Genomics uses Amazon MWAA in their data platform to orchestrate the validation of algorithms processing cancer genomics data in billions of records. Twitch, a live streaming platform, manages and orchestrates the training and deployment of its recommendation models for over 140 million active users. They use Amazon MWAA to scale, while significantly improving security and reducing infrastructure management overhead.

Today, we are announcing the availability of Apache Airflow version 2.8.1 environments on Amazon MWAA. In this post, we walk you through some of the new features and capabilities of Airflow now available in Amazon MWAA, and how you can set up or upgrade your Amazon MWAA environment to version 2.8.1.

Object storage

As data pipelines scale, engineers struggle to manage storage across multiple systems with unique APIs, authentication methods, and conventions for accessing data, requiring custom logic and storage-specific operators. Airflow now offers a unified object storage abstraction layer that handles these details, letting engineers focus on their data pipelines. Airflow object storage uses fsspec to enable consistent data access code across different object storage systems, thereby streamlining infrastructure complexity.

The following are some of the feature’s key benefits:

  • Portable workflows – You can switch storage services with minimal changes in your Directed Acyclic Graphs (DAGs)
  • Efficient data transfers – You can stream data instead of loading into memory
  • Reduced maintenance – You don’t need separate operators, making your pipelines straightforward to maintain
  • Familiar programming experience – You can use Python modules, like shutil, for file operations

To use object storage with Amazon Simple Storage Service (Amazon S3), you need to install the package extra s3fs with the Amazon provider (apache-airflow-providers-amazon[s3fs]==x.x.x).

In the sample code below, you can see how to move data directly from Google Cloud Storage to Amazon S3. Because Airflow’s object storage uses shutil.copyfileobj, the objects’ data is read in chunks from gcs_data_source and streamed to amazon_s3_data_target.

gcs_data_source = ObjectStoragePath("gcs://source-bucket/prefix/", conn_id="google_cloud_default")

amazon_s3_data_target = ObjectStoragePath("s3://target-bucket/prefix/", conn_id="aws_default ")

with DAG(
    dag_id="copy_from_gcs_to_amazon_s3",
    start_date=datetime(2024, 2, 26),
    schedule="0 0 * * *",
    catchup=False,    
    tags=["2.8", "ObjectStorage"],
) as dag:

    def list_objects(path: ObjectStoragePath) -> list[ObjectStoragePath]:
        objects = [f for f in path.iterdir() if f.is_file()]
        return objects

    def copy_object(path: ObjectStoragePath, object: ObjectStoragePath):    
        object.copy(dst=path)

    objects_list = list_objects(path=gcs_data_source)
    copy_object.partial(path=amazon_s3_data_target).expand(object=objects_list)

For more information on Airflow object storage, refer to Object Storage.

XCom UI

XCom (cross-communications) allows for the passing of data between tasks, facilitating communication and coordination between them. Previously, developers had to switch to a diffferent view to see XComs related to a task. With Airflow 2.8, XCom key-values are rendered directly on a tab within the Airflow Grid view, as shown in the following screenshot.

The new XCom tab provides the following benefits:

  • Improved XCom visibility – A dedicated tab in the UI provides a convenient and user-friendly way to see all XComs associated with a DAG or task.
  • Improved debugging – Being able to see XCom values directly in the UI is helpful for debugging DAGs. You can quickly see the output of upstream tasks without needing to manually pull and inspect them using Python code.

Task context logger

Managing task lifecycles is crucial for the smooth operation of data pipelines in Airflow. However, certain challenges have persisted, particularly in scenarios where tasks are unexpectedly stopped. This can occur due to various reasons, including scheduler timeouts, zombie tasks (tasks that remain in a running state without sending heartbeats), or instances where the worker runs out of memory.

Traditionally, such failures, particularly those triggered by core Airflow components like the scheduler or executor, weren’t recorded within the task logs. This limitation required users to troubleshoot outside the Airflow UI, complicating the process of pinpointing and resolving issues.

Airflow 2.8 introduced a significant improvement that addresses this problem. Airflow components, including the scheduler and executor, can now use the new TaskContextLogger to forward error messages directly to the task logs. This feature allows you to see all the relevant error messages related to a task’s run in one place. This simplifies the process of figuring out why a task failed, offering a complete perspective of what went wrong within a single log view.

The following screenshot shows how the task is detected as zombie, and the scheduler log is being included as part of the task log.

You need to set the environment configuration parameter enable_task_context_logger to True, to enable the feature. Once it’s enabled, Airflow can ship logs from the scheduler, the executor, or callback run context to the task logs, and make them available in the Airflow UI.

Listener hooks for datasets

Datasets were introduced in Airflow 2.4 as a logical grouping of data sources to create data-aware scheduling and dependencies between DAGs. For example, you can schedule a consumer DAG to run when a producer DAG updates a dataset. Listeners enable Airflow users to create subscriptions to certain events happening in the environment. In Airflow 2.8, listeners are added for two datasets events: on_dataset_created and on_dataset_changed, effectively allowing Airflow users to write custom code to react to dataset management operations. For example, you can trigger an external system, or send a notification.

Using listener hooks for datasets is straightforward. Complete the following steps to create a listener for on_dataset_changed:

  1. Create the listener (dataset_listener.py):
    from airflow import Dataset
    from airflow.listeners import hookimpl
    
    @hookimpl
    def on_dataset_changed(dataset: Dataset):
        """Following custom code is executed when a dataset is changed."""
        print("Invoking external endpoint")
    
        """Validating a specific dataset"""
        if dataset.uri == "s3://bucket-prefix/object-key.ext":
            print ("Execute specific/different action for this dataset")

  2. Create a plugin to register the listener in your Airflow environment (dataset_listener_plugin.py):
    from airflow.plugins_manager import AirflowPlugin
    from plugins import listener_code
    
    class DatasetListenerPlugin(AirflowPlugin):
        name = "dataset_listener_plugin"
        listeners = [dataset_listener]

For more information on how to install plugins in Amazon MWAA, refer to Installing custom plugins.

Set up a new Airflow 2.8.1 environment in Amazon MWAA

You can initiate the setup in your account and preferred Region using the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you’re adopting infrastructure as code (IaC), you can automate the setup using AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform scripts.

Upon successful creation of an Airflow version 2.8.1 environment in Amazon MWAA, certain packages are automatically installed on the scheduler and worker nodes. For a complete list of installed packages and their versions, refer to Apache Airflow provider packages installed on Amazon MWAA environments. You can install additional packages using a requirements file.

Upgrade from older versions of Airflow to version 2.8.1

You can take advantage of these latest capabilities by upgrading your older Airflow version 2.x-based environments to version 2.8.1 using in-place version upgrades. To learn more about in-place version upgrades, refer to Upgrading the Apache Airflow version or Introducing in-place version upgrades with Amazon MWAA.

Conclusion

In this post, we discussed some important features introduced in Airflow version 2.8, such as object storage, the new XCom tab added to the grid view, task context logging, listener hooks for datasets, and how you can start using them. We also provided some sample code to show implementations in Amazon MWAA. For the complete list of changes, refer to Airflow’s release notes.

For additional details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


About the Authors

Mansi Bhutada is an ISV Solutions Architect based in the Netherlands. She helps customers design and implement well-architected solutions in AWS that address their business problems. She is passionate about data analytics and networking. Beyond work, she enjoys experimenting with food, playing pickleball, and diving into fun board games.

Hernan Garcia is a Senior Solutions Architect at AWS based in the Netherlands. He works in the financial services industry, supporting enterprises in their cloud adoption. He is passionate about serverless technologies, security, and compliance. He enjoys spending time with family and friends, and trying out new dishes from different cuisines.

AWS Payment Cryptography is PCI PIN and P2PE certified

Post Syndicated from Tim Winston original https://aws.amazon.com/blogs/security/aws-payment-cryptography-is-pci-pin-and-p2pe-certified/

Amazon Web Services (AWS) is pleased to announce that AWS Payment Cryptography is certified for Payment Card Industry Personal Identification Number (PCI PIN) version 3.1 and as a PCI Point-to-Point Encryption (P2PE) version 3.1 Decryption Component.

With Payment Cryptography, your payment processing applications can use payment hardware security modules (HSMs) that are PCI PIN Transaction Security (PTS) HSM certified and fully managed by AWS, with PCI PIN and P2PE-compliant key management. These attestations give you the flexibility to deploy your regulated workloads with reduced compliance overhead.

The PCI P2PE Decryption Component enables PCI P2PE Solutions to use AWS to decrypt credit card transactions from payment terminals, and PCI PIN attestation is required for applications that process PIN-based debit transactions. According to PCI, “Use of a PCI P2PE Solution can also allow merchants to reduce where and how the PCI DSS applies within their retail environment, increasing security of customer data while simplifying compliance with the PCI DSS”.

Coalfire, a third-party Qualified PIN Assessor (QPA) and Qualified Security Assessor (P2PE), evaluated Payment Cryptography. Customers can access the PCI PIN Attestation of Compliance (AOC) report, the PCI PIN Shared Responsibility Summary, and the PCI P2PE Attestation of Validation through AWS Artifact.

To learn more about our PCI program and other compliance and security programs, see the AWS Compliance Programs page. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Author

Tim Winston

Tim is a Principal Payments Industry Specialist for AWS Payment Cryptography. He focuses on compliance for the service and its customers.

Author

Nivetha Chandran

Nivetha is a Security Assurance Manager at AWS. She leads multiple security and compliance initiatives within AWS. Nivetha has over 10 years of experience in security assurance and holds a master’s degree in information management from University of Washington.

2023 H2 IRAP report is now available on AWS Artifact for Australian customers

Post Syndicated from Patrick Chang original https://aws.amazon.com/blogs/security/2023-h2-irap-report-is-now-available-on-aws-artifact-for-australian-customers/

Amazon Web Services (AWS) is excited to announce that a new Information Security Registered Assessors Program (IRAP) report (2023 H2) is now available through AWS Artifact. An independent Australian Signals Directorate (ASD) certified IRAP assessor completed the IRAP assessment of AWS in December 2023.

The new IRAP report includes an additional seven AWS services that are now assessed at the PROTECTED level under IRAP. This brings the total number of services assessed at the PROTECTED level to 151.

The following are the seven newly assessed services:

For the full list of services, see the IRAP tab on the AWS Services in Scope by Compliance Program page.

AWS has developed an IRAP documentation pack to assist Australian government agencies and their partners to plan, architect, and assess risk for their workloads when they use AWS Cloud services.

We developed this pack in accordance with the Australian Cyber Security Centre (ACSC) Cloud Security Guidance and Cloud Assessment and Authorisation framework, which addresses guidance within the Australian Government’s Information Security Manual (ISM, September 2023 version), the Department of Home Affairs’ Protective Security Policy Framework (PSPF), and the Digital Transformation Agency’s Secure Cloud Strategy.

The IRAP pack on AWS Artifact also includes newly updated versions of the AWS Consumer Guide and the whitepaper Reference Architectures for ISM PROTECTED Workloads in the AWS Cloud.

Reach out to your AWS representatives to let us know which additional services you would like to see in scope for upcoming IRAP assessments. We strive to bring more services into scope at the PROTECTED level under IRAP to support your requirements.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Patrick Chang

Patrick Chang

Patrick is the Asia Pacific and Japan (APJ) Audit Lead at AWS. He leads security audits, certifications, and compliance programs across the APJ region. Patrick is a technology risk and audit professional with over a decade of experience. He is passionate about delivering assurance programs that build trust with customers and provide them assurance on cloud security.

AWS recognized as an Overall Leader in 2024 KuppingerCole Leadership Compass for Policy Based Access Management

Post Syndicated from Julian Lovelock original https://aws.amazon.com/blogs/security/aws-recognized-as-overall-leader-in-2023kuppingercole-leadership-compass/

Amazon Web Services (AWS) was recognized by KuppingerCole Analysts AG as an Overall Leader in the firm’s Leadership Compass report for Policy Based Access Management. The Leadership Compass report reveals Amazon Verified Permissions as an Overall Leader (as shown in Figure 1), a Product Leader for functional strength, and an Innovation Leader for open source security. The recognition is based on a comparison with 14 other vendors, using standardized evaluation criteria set by KuppingerCole.

Figure 1: KuppingerCole Leadership Compass for Policy Based Access Management

Figure 1: KuppingerCole Leadership Compass for Policy Based Access Management

The report helps organizations learn about policy-based access management solutions for common use cases and requirements. KuppingerCole defines policy-based access management as an approach that helps to centralize policy management, run authorization decisions across a variety of applications and resource types, continually evaluate authorization decisions, and support corporate governance.

Policy-based access management has three major benefits: consistency, security, and agility. Many organizations grapple with a patchwork of access control mechanisms, which can hinder their ability to implement a consistent approach across the organization, increase their security risk exposure, and reduce the agility of their development teams. A policy-based access control architecture helps organizations centralize their policies in a policy store outside the application codebase, where the policies can be audited and consistently evaluated. This enables teams to build, refactor, and expand applications faster, because policy guardrails are in place and access management is externalized.

Amazon Verified Permissions is a scalable, fine-grained permissions management and authorization service for the applications that you build. This service helps your developers to build more secure applications faster, by externalizing authorization and centralizing policy management and administration. Developers can align their application access with Zero Trust principles by implementing least privilege and continual authorization within applications. Security and audit teams can better analyze and audit who has access to what within applications.

Verified Permissions uses Cedar, a purpose-built and security-first open source policy language, to define policy-based access controls by using roles and attributes for more granular, context-aware access control. Cedar demonstrates the AWS commitment to raising the bar for open source security by developing key security-related technologies in collaboration with the community, with a goal of improving security postures across the industry.

The KuppingerCole Leadership Compass report offers insightful guidance as you evaluate policy-based access management solutions for your organization. Access a complimentary copy of the 2024 KuppingerCole Leadership Compass for Policy-Based Access Management.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Julian Lovelock

Julian Lovelock

Julian is a Principal Product Manager at AWS, with over 20 years’ experience in the field of Identity and Access Management. He leads the product team for Amazon Verified Permissions, and works closely with customers, partners, and the internal teams building out the service and the underlying Cedar language. He’s based in Northern California, where he enjoys mountain biking and the idea of camping.

Winter 2023 SOC 1 report now available for the first time

Post Syndicated from Brownell Combs original https://aws.amazon.com/blogs/security/winter-2023-soc-1-report-now-available-for-the-first-time/

We continue to expand the scope of our assurance programs at Amazon Web Services (AWS) and are pleased to announce the first ever Winter 2023 AWS System and Organization Controls (SOC) 1 report. The new Winter SOC report demonstrates our continuous commitment to adhere to the heightened expectations for cloud service providers.

The report covers the period January 1–December 31, 2023, and specifically addresses requests from customers that require coverage over the fourth quarter, but have a fiscal year-end that necessitates obtaining a SOC 1 report before the issuance of the Spring AWS SOC 1 report, which is typically published in mid-May.

Customers can download the Winter 2023 SOC 1 report through AWS Artifact, a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

The Winter 2023 SOC 1 report includes a total of 171 services in scope. For up-to-date information, including when additional services are added, see the AWS Services in Scope by Compliance Program and choose SOC.

AWS strives to continuously bring services into the scope of its compliance programs to help you meet your architectural and regulatory needs. If you have questions or feedback about SOC compliance, reach out to your AWS account team.

To learn more about our compliance and security programs, see AWS Compliance Programs. As always, we value your feedback and questions; reach out to the AWS Compliance team through the Contact Us page.

If you have feedback about this post, submit comments in the Comments section below.

Brownell Combs

Brownell Combs

Brownell is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Brownell holds a master of science degree in computer science from the University of Virginia and a bachelor of science degree in computer science from Centre College. He has over 20 years of experience in IT risk management and CISSP, CISA, CRISC, and GIAC GCLD certifications.

Paul Hong

Paul Hong

Paul is a Compliance Program Manager at AWS. He leads multiple security, compliance, and training initiatives within AWS, and has 10 years of experience in security assurance. Paul holds CISSP, CEH, and CPA certifications, and a master’s degree in accounting information systems and a bachelor’s degree in business administration from James Madison University, Virginia.

Tushar Jain

Tushar Jain

Tushar is a Compliance Program Manager at AWS. He leads multiple security, compliance, and training initiatives within AWS. Tushar holds a master of business administration from Indian Institute of Management, Shillong, and a bachelor of technology in electronics and telecommunication engineering from Marathwada University. He has over 12 years of experience in information security and holds CCSK and CSXF certifications.

Michael Murphy

Michael Murphy

Michael is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Michael has 12 years of experience in information security. He holds a master’s degree in information and data engineering and a bachelor’s degree in computer engineering from Stevens Institute of Technology. He also holds CISSP, CRISC, CISA and CISM certifications.

Nathan Samuel

Nathan Samuel

Nathan is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Nathan has a bachelor of commerce degree from the University of the Witwatersrand, South Africa, and has over 21 years of experience in security assurance. He holds the CISA, CRISC, CGEIT, CISM, CDPSE, and Certified Internal Auditor certifications.

ryan wilks

Ryan Wilks

Ryan is a Compliance Program Manager at AWS. He leads multiple security and privacy initiatives within AWS. Ryan has 13 years of experience in information security. He has a bachelor of arts degree from Rutgers University and holds ITIL, CISM, and CISA certifications.

Mistral AI models coming soon to Amazon Bedrock

Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/mistral-ai-models-coming-soon-to-amazon-bedrock/

Mistral AI, an AI company based in France, is on a mission to elevate publicly available models to state-of-the-art performance. They specialize in creating fast and secure large language models (LLMs) that can be used for various tasks, from chatbots to code generation.

We’re pleased to announce that two high-performing Mistral AI models, Mistral 7B and Mixtral 8x7B, will be available soon on Amazon Bedrock. AWS is bringing Mistral AI to Amazon Bedrock as our 7th foundation model provider, joining other leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. With these two Mistral AI models, you will have the flexibility to choose the optimal, high-performing LLM for your use case to build and scale generative AI applications using Amazon Bedrock.

Overview of Mistral AI Models
Here’s a quick overview of these two highly anticipated Mistral AI models:

  • Mistral 7B is the first foundation model from Mistral AI, supporting English text generation tasks with natural coding capabilities. It is optimized for low latency with a low memory requirement and high throughput for its size. This model is powerful and supports various use cases from text summarization and classification, to text completion and code completion.
  • Mixtral 8x7B is a popular, high-quality sparse Mixture-of-Experts (MoE) model that is ideal for text summarization, question and answering, text classification, text completion, and code generation.

Choosing the right foundation model is key to building successful applications. Let’s have a look at a few highlights that demonstrate why Mistral AI models could be a good fit for your use case:

  • Balance of cost and performance — One prominent highlight of Mistral AI’s models strikes a remarkable balance between cost and performance. The use of sparse MoE makes these models efficient, affordable, and scalable, while controlling costs.
  • Fast inference speed — Mistral AI models have an impressive inference speed and are optimized for low latency. The models also have a low memory requirement and high throughput for their size. This feature matters most when you want to scale your production use cases.
  • Transparency and trust — Mistral AI models are transparent and customizable. This enables organizations to meet stringent regulatory requirements.
  • Accessible to a wide range of users — Mistral AI models are accessible to everyone. This helps organizations of any size integrate generative AI features into their applications.

Available Soon
Mistral AI publicly available models are coming soon to Amazon Bedrock. As usual, subscribe to this blog so that you will be among the first to know when these models will be available on Amazon Bedrock.

Learn more

Stay tuned,
Donnie

AWS HITRUST Shared Responsibility Matrix for HITRUST CSF v11.2 now available

Post Syndicated from Mark Weech original https://aws.amazon.com/blogs/security/aws-hitrust-shared-responsibility-matrix-for-hitrust-csf-v11-2-now-available/

The latest version of the AWS HITRUST Shared Responsibility Matrix (SRM)—SRM version 1.4.2—is now available. To request a copy, choose SRM version 1.4.2 from the HITRUST website.

SRM version 1.4.2 adds support for the HITRUST Common Security Framework (CSF) v11.2 assessments in addition to continued support for previous versions of HITRUST CSF assessments v9.1–v11.2. As with the previous SRM versions v1.4 and v1.4.1, SRM v1.4.2 enables users to trace the HITRUST CSF cross-version lineage and inheritability of requirement statements, especially when inheriting from or to v9.x and 11.x assessments.

The SRM is intended to serve as a resource to help customers use the AWS Shared Responsibility Model to navigate their security compliance needs. The SRM provides an overview of control inheritance, and customers also use it to perform the control scoring inheritance functions for organizations that use AWS services.

Using the HITRUST certification, you can tailor your security control baselines to a variety of factors—including, but not limited to, regulatory requirements and organization type. As part of their approach to security and privacy, leading organizations in a variety of industries have adopted the HITRUST CSF.

AWS doesn’t provide compliance advice, and customers are responsible for determining compliance requirements and validating control implementation in accordance with their organization’s policies, requirements, and objectives. You can deploy your environments on AWS and inherit our HITRUST CSF certification, provided that you use only in-scope services and apply the controls detailed on the HITRUST website.

What this means for our customers

The new AWS HITRUST SRM version 1.4.2 has been tailored to reflect both the Cross Version ID (CVID) and Baseline Unique ID (BUID) in the CSF object so that you can select the correct control for inheritance even if you’re still using an older version of the HITRUST CSF for your own assessment. As an additional benefit, the AWS HITRUST Inheritance Program also supports the control inheritance of AWS cloud-based workloads for new HITRUST e1 and i1 assessment types, in addition to the validated r2-type assessments offered through HITRUST.

For additional details on the AWS HITRUST program, see our HITRUST CSF compliance page.

At AWS, we’re committed to helping you achieve and maintain the highest standards of security and compliance. We value your feedback and questions. Contact the AWS HITRUST team at AWS Compliance Contact Us. If you have feedback about this post, submit comments in the Comments section below.

Mark Weech

Mark Weech

Mark is the Program Manager for the AWS HITRUST Security Assurance Program. He has over 10 years of experience in the healthcare industry holding director-level IT and security positions both within hospital facilities and enterprise-level positions supporting greater than 30,000 user healthcare environments. Mark has been involved with HITRUST as both an assessor and validated entity for over 9 years.

Introducing the .NET 8 runtime for AWS Lambda

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-the-net-8-runtime-for-aws-lambda/

This post is written by Beau Gosse, Senior Software Engineer and Paras Jain, Senior Technical Account Manager.

AWS Lambda now supports .NET 8 as both a managed runtime and container base image. With this release, Lambda developers can benefit from .NET 8 features including API enhancements, improved Native Ahead of Time (Native AOT) support, and improved performance. .NET 8 supports C# 12, F# 8, and PowerShell 7.4. You can develop Lambda functions in .NET 8 using the AWS Toolkit for Visual Studio, the AWS Extensions for .NET CLI, AWS Serverless Application Model (AWS SAM), AWS CDK, and other infrastructure as code tools.

Creating .NET 8 function in the console

Creating .NET 8 function in the console

What’s new

Upgraded operating system

The .NET 8 runtime is built on the Amazon Linux 2023 (AL2023) minimal container image. This provides a smaller deployment footprint than earlier Amazon Linux 2 (AL2) based runtimes and updated versions of common libraries such as glibc 2.34 and OpenSSL 3.

The new image also uses microdnf as a package manager, symlinked as dnf. This replaces the yum package manager used in earlier AL2-based images. If you deploy your Lambda functions as container images, you must update your Dockerfiles to use dnf instead of yum when upgrading to the .NET 8 base image. For more information, see Introducing the Amazon Linux 2023 runtime for AWS Lambda.

Performance

There are a number of language performance improvements available as part of .NET 8. Initialization time can impact performance, as Lambda creates new execution environments to scale your function automatically. There are a number of ways to optimize performance for Lambda-based .NET workloads, including using source generators in System.Text.Json or using Native AOT.

Lambda has increased the default memory size from 256 MB to 512 MB in the blueprints and templates for improved performance with .NET 8. Perform your own functional and performance tests on your .NET 8 applications. You can use AWS Compute Optimizer or AWS Lambda Power Tuning for performance profiling.

At launch, new Lambda runtimes receive less usage than existing established runtimes. This can result in longer cold start times due to reduced cache residency within internal Lambda subsystems. Cold start times typically improve in the weeks following launch as usage increases. As a result, AWS recommends not drawing performance comparison conclusions with other Lambda runtimes until the performance has stabilized.

Native AOT

Lambda introduced .NET Native AOT support in November 2022. Benchmarks show up to 86% improvement in cold start times by eliminating the JIT compilation. Deploying .NET 8 Native AOT functions using the managed dotnet8 runtime rather than the OS-only provided.al2023 runtime gives your function access to .NET system libraries. For example, libicu, which is used for globalization, is not included by default in the provided.al2023 runtime but is in the dotnet8 runtime.

While Native AOT is not suitable for all .NET functions, .NET 8 has improved trimming support. This allows you to more easily run ASP.NET APIs. Improved trimming support helps eliminate build time trimming warnings, which highlight possible runtime errors. This can give you confidence that your Native AOT function behaves like a JIT-compiled function. Trimming support has been added to the Lambda runtime libraries, AWS .NET SDK, .NET Lambda Annotations, and .NET 8 itself.

Using.NET 8 with Lambda

To use .NET 8 with Lambda, you must update your tools.

  1. Install or update the .NET 8 SDK.
  2. If you are using AWS SAM, install or update to the latest version.
  3. If you are using Visual Studio, install or update the AWS Toolkit for Visual Studio.
  4. If you use the .NET Lambda Global Tools extension (Amazon.Lambda.Tools), install the CLI extension and templates. You can upgrade existing tools with dotnet tool update -g Amazon.Lambda.Tools and existing templates with dotnet new install Amazon.Lambda.Templates.

You can also use .NET 8 with Powertools for AWS Lambda (.NET), a developer toolkit to implement serverless best practices such as observability, batch processing, retrieving parameters, idempotency, and feature flags.

Building new .NET 8 functions

Using AWS SAM

  1. Run sam init.
  2. Choose 1- AWS Quick Start Templates.
  3. Choose one of the available templates such as Hello World Example.
  4. Select N for Use the most popular runtime and package type?
  5. Select dotnet8 as the runtime. The dotnet8 Hello World Example also includes a Native AOT template option.
  6. Follow the rest of the prompts to create the .NET 8 function.
AWS SAM .NET 8 init options

AWS SAM .NET 8 init options

You can amend the generated function code and use sam deploy --guided to deploy the function.

Using AWS Toolkit for Visual Studio

  1. From the Create a new project wizard, filter the templates to either the Lambda or Serverless project type and select a template. Use Lambda for deploying a single function. Use Serverless for deploying a collection of functions using AWS CloudFormation.
  2. Continue with the steps to finish creating your project.
  3. You can amend the generated function code.
  4. To deploy, right click on the project in the Solution Explorer and select Publish to AWS Lambda.

Using AWS extensions for the .NET CLI

  1. Run dotnet new list --tag Lambda to get a list of available Lambda templates.
  2. Choose a template and run dotnet new <template name>. To build a function using Native AOT, use dotnet new lambda.NativeAOT or dotnet new serverless.NativeAOT when using the .NET Lambda Annotations Framework.
  3. Locate the generated Lambda function in the directory under src which contains the .csproj file. You can amend the generated function code.
  4. To deploy, run dotnet lambda deploy-function and follow the prompts.
  5. You can test the function in the cloud using dotnet lambda invoke-function or by using the test functionality in the Lambda console.

You can build and deploy .NET Lambda functions using container images. Follow the instructions in the documentation.

Migrating from .NET 6 to .NET 8 without Native AOT

Using AWS SAM

  1. Open the template.yaml file.
  2. Update Runtime to dotnet8.
  3. Open a terminal window and rebuild the code using sam build.
  4. Run sam deploy to deploy the changes.

Using AWS Toolkit for Visual Studio

  1. Open the .csproj project file and update the TargetFramework to net8.0. Update NuGet packages for your Lambda functions to the latest version to pull in .NET 8 updates.
  2. Verify that the build command you are using is targeting the .NET 8 runtime.
  3. There may be additional steps depending on what build/deploy tool you’re using. Updating the function runtime may be sufficient.

.NET function in AWS Toolkit for Visual Studio

Using AWS extensions for the .NET CLI or AWS Toolkit for Visual Studio

  1. Open the aws-lambda-tools-defaults.json file if it exists.
    1. Set the framework field to net8.0. If unspecified, the value is inferred from the project file.
    2. Set the function-runtime field to dotnet8.
  2. Open the serverless.template file if it exists. For any AWS::Lambda::Function or AWS::Servereless::Function resources, set the Runtime property to dotnet8.
  3. Open the .csproj project file if it exists and update the TargetFramework to net8.0. Update NuGet packages for your Lambda functions to the latest version to pull in .NET 8 updates.

Migrating from .NET 6 to .NET 8 Native AOT

The following example migrates a .NET 6 class library function to a .NET 8 Native AOT executable function. This uses the optional Lambda Annotations framework which provides idiomatic .NET coding patterns.

Update your project file

  1. Open the project file.
  2. Set TargetFramework to net8.0.
  3. Set OutputType to exe.
  4. Remove PublishReadyToRun if it exists.
  5. Add PublishAot and set to true.
  6. Add or update NuGet package references to Amazon.Lambda.Annotations and Amazon.Lambda.RuntimeSupport. You can update using the NuGet UI in your IDE, manually, or by running dotnet add package Amazon.Lambda.RuntimeSupport and dotnet add package Amazon.Lambda.Annotations from your project directory.

Your project file should look similar to the following:

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
    <AWSProjectType>Lambda</AWSProjectType>
    <CopyLocalLockFileAssemblies>true</CopyLocalLockFileAssemblies>
    <!-- Generate native aot images during publishing to improve cold start time. -->
    <PublishAot>true</PublishAot>
	  <!-- StripSymbols tells the compiler to strip debugging symbols from the final executable if we're on Linux and put them into their own file. 
		This will greatly reduce the final executable's size.-->
	  <StripSymbols>true</StripSymbols>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Amazon.Lambda.Core" Version="2.2.0" />
    <PackageReference Include="Amazon.Lambda.RuntimeSupport" Version="1.10.0" />
    <PackageReference Include="Amazon.Lambda.Serialization.SystemTextJson" Version="2.4.0" />
  </ItemGroup>
</Project>

Updating your function code

    1. Reference the annotations library with using Amazon.Lambda.Annotations;
    2. Add [assembly:LambdaGlobalProperties(GenerateMain = true)] to allow the annotations framework to create the main method. This is required as the project is now an executable instead of a library.
    3. Add the below partial class and include a JsonSerializable attribute for any types that you need to serialize, including for your function input and output This partial class is used at build time to generate reflection free code dedicated to serializing the listed types. The following is an example:
    4. /// <summary>
      /// This class is used to register the input event and return type for the FunctionHandler method with the System.Text.Json source generator.
      /// There must be a JsonSerializable attribute for each type used as the input and return type or a runtime error will occur 
      /// from the JSON serializer unable to find the serialization information for unknown types.
      /// </summary>
      [JsonSerializable(typeof(APIGatewayHttpApiV2ProxyRequest))]
      [JsonSerializable(typeof(APIGatewayHttpApiV2ProxyResponse))]
      public partial class MyCustomJsonSerializerContext : JsonSerializerContext
      {
          // By using this partial class derived from JsonSerializerContext, we can generate reflection free JSON Serializer code at compile time
          // which can deserialize our class and properties. However, we must attribute this class to tell it what types to generate serialization code for
          // See https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-source-generation
      }

    5. After the using statement, add the following to specify the serializer to use. [assembly: LambdaSerializer(typeof(SourceGeneratorLambdaJsonSerializer<LambdaFunctionJsonSerializerContext>))]

    Swap LambdaFunctionJsonSerializerContext for your context if you are not using the partial class from the previous step.

    Updating your function configuration

    If you are using aws-lambda-tools-defaults.json.

    1. Set function-runtime to dotnet8.
    2. Set function-architecture to match your build machine – either x86_64 or arm64.
    3. Set (or update) environment-variables to include ANNOTATIONS_HANDLER=<YourFunctionHandler>. Replace <YourFunctionHandler> with the method name of your function handler, so the annotations framework knows which method to call from the generated main method.
    4. Set function-handler to the name of the executable assembly in your bin directory. By default, this is your project name, which tells the .NET Lambda bootstrap script to run your native binary instead of starting the .NET runtime. If your project file has AssemblyName then use that value for the function handler.
    {
      "function-architecture": "x86_64",
      "function-runtime": "dotnet8",
      "function-handler": "<your-assembly-name>",
      "environment-variables",
      "ANNOTATIONS_HANDLER=<your-function-handler>",
    }

    Deploy and test

    1. Deploy your function. If you are using Amazon.Lambda.Tools, run dotnet lambda deploy-function. Check for trim warnings during build and refactor to eliminate them.
    2. Test your function to ensure that the native calls into AL2023 are working correctly. By default, running local unit tests on your development machine won’t run natively and will still use the JIT compiler. Running with the JIT compiler does not allow you to catch native AOT specific runtime errors.

    Conclusion

    Lambda is introducing the new .NET 8 managed runtime. This post highlights new features in .NET 8. You can create new Lambda functions or migrate existing functions to .NET 8 or .NET 8 Native AOT.

    For more information, see the AWS Lambda for .NET repository, documentation, and .NET on Serverless Land.

    For more serverless learning resources, visit Serverless Land.

AWS Customer Compliance Guides now publicly available

Post Syndicated from Kevin Donohue original https://aws.amazon.com/blogs/security/aws-customer-compliance-guides-now-publicly-available/

The AWS Global Security & Compliance Acceleration (GSCA) Program has released AWS Customer Compliance Guides (CCGs) on the AWS Compliance Resources page to help customers, AWS Partners, and assessors quickly understand how industry-leading compliance frameworks map to AWS service documentation and security best practices.

CCGs offer security guidance mapped to 16 different compliance frameworks for more than 130 AWS services and integrations. Customers can select from the frameworks and services available to see how security “in the cloud” applies to AWS services through the lens of compliance.

CCGs focus on security topics and technical controls that relate to AWS service configuration options. The guides don’t cover security topics or controls that are consistent across AWS services or those specific to customer organizations, such as policies or governance. As a result, the guides are shorter and are focused on the unique security and compliance considerations for each AWS service.

We value your feedback on the guides. Take our CCG survey to tell us about your experience, request new services or frameworks, or suggest improvements.

CCGs provide summaries of the user guides for AWS services and map configuration guidance to security control requirements from the following frameworks:

  • National Institute of Standards and Technology (NIST) 800-53
  • NIST Cybersecurity Framework (CSF)
  • NIST 800-171
  • System and Organization Controls (SOC) II
  • Center for Internet Security (CIS) Critical Controls v8.0
  • ISO 27001
  • NERC Critical Infrastructure Protection (CIP)
  • Payment Card Industry Data Security Standard (PCI-DSS) v4.0
  • Department of Defense Cybersecurity Maturity Model Certification (CMMC)
  • HIPAA
  • Canadian Centre for Cyber Security (CCCS)
  • New York’s Department of Financial Services (NYDFS)
  • Federal Financial Institutions Examination Council (FFIEC)
  • Cloud Controls Matrix (CCM) v4
  • Information Security Manual (ISM-IRAP) (Australia)
  • Information System Security Management and Assessment Program (ISMAP) (Japan)

CCGs can help customers in the following ways:

  • Shorten the process of manually searching the AWS user guides to understand security “in the cloud” details and align configuration guidance to compliance requirements
  • Determine the scope of controls applicable in risk assessments or audits based on which AWS services are running in customer workloads
  • Assist customers who perform due diligence assessments on new AWS services under consideration for use in their organization
  • Provide assessors or risk teams with resources to identify which security areas are handled by AWS services and which are the customer’s responsibility to implement, which might influence the scope of evidence required for assessments or internal security checks
  • Provide a basis for developing security documentation such as control responses or procedures that might be required to meet various compliance documentation requirements or fulfill assessment evidence requests

The AWS Global Security & Compliance Acceleration (GSCA) Program connects customers with AWS partners that can help them navigate, automate, and accelerate building compliant workloads on AWS by helping to reduce time and cost. GSCA supports businesses globally that need to meet security, privacy, and compliance requirements for healthcare, privacy, national security, and financial sectors. To connect with a GSCA compliance specialist, complete the GSCA Program questionnaire.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Kevin Donohue

Kevin Donohue

Kevin is a Senior Security Partner Strategist in AWS World Wide Public Sector, specializing in helping customers meet their compliance goals. Kevin began his tenure with AWS in 2019 supporting U.S. government customers in AWS Security Assurance. He is based in Northern Virginia and enjoys spending time outdoors with his wife and daughter outside of work.

Improve your ETL performance using multiple Redshift warehouses for writes

Post Syndicated from Ryan Waldorf original https://aws.amazon.com/blogs/big-data/improve-your-etl-performance-using-multiple-redshift-warehouses-for-writes/

Amazon Redshift is a fast, petabyte-scale, cloud data warehouse that tens of thousands of customers rely on to power their analytics workloads. Thousands of customers use Amazon Redshift read data sharing to enable instant, granular, and fast data access across Redshift provisioned clusters and serverless workgroups. This allows you to scale your read workloads to thousands of concurrent users without having to move or copy the data.

Now, at Amazon Redshift we are announcing multi-data warehouse writes through data sharing in public preview. This allows you to achieve better performance for extract, transform, and load (ETL) workloads by using different warehouses of different types and sizes based on your workload needs. Additionally, this allows you to easily keep your ETL jobs running more predictably as you can split them between warehouses in a few clicks, monitor and control costs as each warehouse has its own monitoring and cost controls, and foster collaboration as you can enable different teams to write to another team’s databases in just a few clicks.

The data is live and available across all warehouses as soon as it is committed, even when it’s written to cross-account or cross-region. For preview you can use a combination of ra3.4xl clusters, ra3.16xl clusters, or serverless workgroups.

In this post, we discuss when you should consider using multiple warehouses to write to the same databases, explain how multi-warehouse writes through data sharing works, and walk you through an example on how to use multiple warehouses to write to the same database.

Reasons for using multiple warehouses to write to the same databases

In this section, we discuss some of the reasons why you should consider using multiple warehouses to write to the same database.

Better performance and predictability for mixed workloads

Customers often start with a warehouse sized to fit their initial workload needs. For example, if you need to support occasional user queries and nightly ingestion of 10 million rows of purchase data, a 32 RPU workgroup may be perfectly suited for your needs. However, adding a new hourly ingestion of 400 million rows of user website and app interactions could slow existing users’ response times as the new workload consumes significant resources. You could resize to a larger workgroup so read and write workloads complete quickly without fighting over resources. However, this may provide unneeded power and cost for existing workloads. Also, because workloads share compute, a spike in one workload can affect the ability of other workloads to meet their SLAs.

The following diagram illustrates a single-warehouse architecture.

Single-Warehouse ETL Architecture. Three separate workloads--a Purchase History ETL job ingesting 10M rows nightly, Users running 25 read queries per hour, and a Web Interactions ETL job ingesting 400M rows/hour--all using the same 256 RPU Amazon Redshift serverless workgroup to read and write from the database called Customer DB.

With the ability to write through datashares, you can now separate the new user website and app interactions ETL into a separate, larger workgroup so that it completes quickly with the performance you need without impacting the cost or completion time of your existing workloads. The following diagram illustrates this multi-warehouse architecture.

Multi-Warehouse ETL Architecture. Two workloads--a Purchase History ETL job ingesting 10M rows nightly and users running 25 read queries per hour--using a 32 RPU serverless workgroup to read from and write to the database Customer DB. It shows a separate workload--a Web Interactions ETL job ingesting 400M rows/hour--using a separate 128 RPU serverless workgroup to write to the database Customer DB.

The multi-warehouse architecture enables you to have all write workloads complete on time with less combined compute, and subsequently lower cost, than a single warehouse supporting all workloads.

Control and monitor costs

When you use a single warehouse for all your ETL jobs, it can be difficult to understand which workloads are contributing to your costs. For instance, you may have one team running an ETL workload ingesting data from a CRM system while another team is ingesting data from internal operational systems. It’s hard for you to monitor and control the costs for the workloads because queries are running together using the same compute in the warehouse. By splitting the write workloads into separate warehouses, you can separately monitor and control costs while ensuring the workloads can progress independently without resource conflict.

Collaborate on live data with ease

The are times when two teams use different warehouses for data governance, compute performance, or cost reasons, but also at times need to write to the same shared data. For instance, you may have a set of customer 360 tables that need to be updated live as customers interact with your marketing, sales, and customer service teams. When these teams use different warehouses, keeping this data live can be difficult because you may have to build a multi-service ETL pipeline using tools like Amazon Simple Storage Service (Amazon S3), Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), and AWS Lambda to track live changes in each team’s data and ingest it into a single source.

With the ability to write through datashares, you can grant granular permissions on your database objects (for example, SELECT on one table, and SELECT, INSERT, and TRUNCATE on another) to different teams using different warehouses in a few clicks. This enables teams to start writing to the shared objects using their own warehouses. The data is live and available to all warehouses as soon as it is committed, and this even works if the warehouses are using different accounts and regions.

In the following sections, we walk you through how to use multiple warehouses to write to the same databases via data sharing.

Solution overview

We use the following terminology in this solution:

  • Namespace – A logical container for database objects, users and roles, their permissions on database objects, and compute (serverless workgroups and provisioned clusters).
  • Datashare – The unit of sharing for data sharing. You grant permissions on objects to datashares.
  • Producer – The warehouse that creates the datashare, grants permissions on objects to datashares, and grants other warehouses and accounts access to the datashare.
  • Consumer – The warehouse that is granted access to the datashare. You can think of consumers as datashare tenants.

This use case involves a customer with two warehouses: a primary warehouse used for attached to the primary namespace for most read and write queries, and a secondary warehouse attached to a secondary namespace that is primarily used to write to the primary namespace. We use the publicly available 10 GB TPCH dataset from AWS Labs, hosted in an S3 bucket. You can copy and paste many of the commands to follow along. Although it’s small for a data warehouse, this dataset allows easy functional testing of this feature.

The following diagram illustrates our solution architecture.

Architecture Diagram showing Two Warehouses for ETL

We set up the primary namespace by connecting to it via its warehouse, creating a marketing database in it with a prod and staging schema, and creating three tables in the prod schema called region, nation, and af_customer. We then load data into the region and nation tables using the warehouse. We do not ingest data into the af_customer table.

We then create a datashare in the primary namespace. We grant the datashare the ability to create objects in the staging schema and the ability to select, insert, update, and delete from objects in the prod schema. We then grant usage on the schema to another namespace in the account.

At that point, we connect to the secondary warehouse. We create a database from a datashare in that warehouse as well as a new user. We then grant permissions on the datashare object to the new user. Then we reconnect to the secondary warehouse as the new user.

We then create a customer table in the datashare’s staging schema and copy data from the TPCH 10 customer dataset into the staging table. We insert staging customer table data into the shared af_customer production table, and then truncate the table.

At this point, the ETL is complete and you are able to read the data in the primary namespace, inserted by the secondary ETL warehouse, from both the primary warehouse and the secondary ETL warehouse.

Prerequisites

To follow along with this post, you should have the following prerequisites:

  • Two warehouses created with the PREVIEW_2023 track. The warehouses can be a mix of serverless workgroups, ra3.4xl clusters, and ra3.16xl clusters.
  • Access to a superuser in both warehouses.
  • An AWS Identity and Access Management (IAM) role that is able to ingest data from Amazon Redshift to Amazon S3 (Amazon Redshift creates one by default when you create a cluster or serverless workgroup).
  • For cross-account only, you need access to an IAM user or role that is allowed to authorize datashares. For the IAM policy, refer to Sharing datashares.

Refer to Sharing both read and write data within an AWS account or across accounts (preview) for the most up-to-date information.

Set up the primary namespace (producer)

In this section, we show how to set up the primary (producer) namespace we will use to store our data.

Connect to producer

Complete the following steps to connect to the producer:

  1. On the Amazon Redshift console, choose Query editor v2 in the navigation pane.

In the query editor v2, you can see all the warehouses you have access to in the left pane. You can expand them to see their databases.

  1. Connect to your primary warehouse using a superuser.
  2. Run the following command to create the marketing database:
CREATE DATABASE marketing;

Create the database objects to share

Complete the following steps to create your database objects to share:

  1. After you create the marketing database, switch your database connection to the marketing database.

You may need to refresh your page to be able to see it.

  1. Run the following commands to create the two schemas you intend to share:
CREATE SCHEMA staging;
CREATE SCHEMA prod;
  1. Create the tables to share with the following code. These are standard DDL statements coming from the AWS Labs DDL file with modified table names.
create table prod.region (
  r_regionkey int4 not null,
  r_name char(25) not null ,
  r_comment varchar(152) not null,
  Primary Key(R_REGIONKEY)
);

create table prod.nation (
  n_nationkey int4 not null,
  n_name char(25) not null ,
  n_regionkey int4 not null,
  n_comment varchar(152) not null,
  Primary Key(N_NATIONKEY)
);

create table prod.af_customer (
  c_custkey int8 not null ,
  c_name varchar(25) not null,
  c_address varchar(40) not null,
  c_nationkey int4 not null,
  c_phone char(15) not null,
  c_acctbal numeric(12,2) not null,
  c_mktsegment char(10) not null,
  c_comment varchar(117) not null,
  Primary Key(C_CUSTKEY)
) distkey(c_custkey) sortkey(c_custkey);

Copy data into the region and nation tables

Run the following commands to copy data from the AWS Labs S3 bucket into the region and nation tables. If you created a cluster while keeping the default created IAM role, you can copy and paste the following commands to load data into your tables:

copy prod.nation from 's3://redshift-downloads/TPC-H/2.18/10GB/nation.tbl' iam_role default delimiter '|' region 'us-east-1';
copy prod.region from 's3://redshift-downloads/TPC-H/2.18/10GB/region.tbl' iam_role default delimiter '|' region 'us-east-1';

Create the datashare

Create the datashare using the following command:

create datashare marketing publicaccessible true;

The publicaccessible setting specifies whether or not a datashare can be used by consumers with publicly accessible provisioned clusters and serverless workgroups. If your warehouses are not publicly accessible, you can ignore that field.

Grant permissions on schemas to the datashare

To add objects with permissions to the datashare, use the grant syntax, specifying the datashare you’d like to grant the permissions to:

grant usage on schema prod to datashare marketing;
grant usage, create on schema staging to datashare marketing;

This allows the datashare consumers to use objects added to the prod schema and use and create objects added to the staging schema. To maintain backward compatibility, if you use the alter datashare command to add a schema, it will be the equivalent of granting usage on the schema.

Grant permissions on tables to the datashare

Now you can grant access to tables to the datashare using the grant syntax, specifying the permissions and the datashare. The following code grants all privileges on the af_customer table to the datashare:

grant all on table prod.af_customer to datashare marketing;

To maintain backward compatibility, if you use the alter datashare command to add a table, it will be the equivalent of granting select on the table.

Additionally, we’ve added scoped permissions that allow you to grant the same permission to all current and future objects within the datashare. We add the scoped select permission on the prod schema tables to the datashare:

grant select for tables in schema prod to datashare marketing;

After this grant, the customer will have select permissions on all current and future tables in the prod schema. This gives them select access on the region and nation tables.

View permissions granted to the datashare

You can view permissions granted to the datashare by running the following command:

show access for datashare marketing;

Grant permissions to the secondary ETL namespace

You can grant permissions to the secondary ETL namespace using the existing syntax. You do this by specifying the namespace ID. You can find the namespace on the namespace details page if your secondary ETL namespace is serverless, as part of the namespace ID in the cluster details page if your secondary ETL namespace is provisioned, or by connecting to the secondary ETL warehouse in the query editor v2 and running select current_namespace. You can then grant access to the other namespace with the following command (change the consumer namespace to the namespace UID of your own secondary ETL warehouse):

grant usage on datashare marketing to namespace '<consumer_namespace>';

Set up the secondary ETL namespace (consumer)

At this point, you’re ready to set up your secondary (consumer) ETL warehouse to start writing to the shared data.

Create a database from the datashare

Complete the following steps to create your database:

  1. In the query editor v2, switch to the secondary ETL warehouse.
  2. Run the command show datashares to see the marketing datashare as well as the datashare producer’s namespace.
  3. Use that namespace to create a database from the datashare, as shown in the following code:
create database marketing_ds_db with permissions from datashare marketing of namespace '&lt;producer_namespace&gt;';

Specifying with permissions allows you to grant granular permissions to individual database users and roles. Without this, if you grant usage permissions on the datashare database, users and roles get all permissions on all objects within the datashare database.

Create a user and grant permissions to that user

Create a user using the CREATE USER command:

create user data_engineer password '[choose a secure password]';
grant usage on database marketing_ds_db to data_engineer;
grant all on schema marketing_ds_db.prod to data_engineer;
grant all on schema marketing_ds_db.staging to data_engineer;
grant all on all tables in schema marketing_ds_db.staging to data_engineer;
grant all on all tables in schema marketing_ds_db.prod to data_engineer;

With these grants, you’ve given the user data_engineer all permissions on all objects in the datashare. Additionally, you’ve granted all permissions available in the schemas as scoped permissions for data_engineer. Any permissions on any objects added to those schemas will be automatically granted to data_engineer.

At this point, you can continue the steps using either the admin user you’re currently signed in as or the data_engineer.

Options for writing to the datashare database

You can write data to the datashare database three ways.

Use three-part notation while connected to a local database

Like with read data sharing, you can use three-part notation to reference the datashare database objects. For instance, insert into marketing_ds_db.prod.customer. Note that you can’t use multi-statement transactions to write to objects in the datashare database like this.

Connect directly to the datashare database

You can connect directly to the datashare database via the Redshift JDBC, ODBC, or Python driver, in addition to the Amazon Redshift Data API (new). To connect like this, specify the datashare database name in the connection string. This allows you to write to the datashare database using two-part notation and use multi-statement transactions to write to the datashare database. Note that some system and catalog tables are not available this way.

Run the use command

You can now specify that you want to use another database with the command use <database_name>. This allows you to write to the datashare database using two-part notation and use multi-statement transactions to write to the datashare database. Note that some system and catalog tables are not available this way. Also, when querying system and catalog tables, you will be querying the system and catalog tables of the database you are connected to, not the database you are using.

To try this method, run the following command:

use marketing_ds_db;

Start writing to the datashare database

In this section, we show how to write to the datashare database using the second and third options we discussed (direct connection or use command). We use the AWS Labs provided SQL to write to the datashare database.

Create a staging table

Create a table within the staging schema, because you’ve been granted create privileges. We create a table within the datashare’s staging schema with the following DDL statement:

create table staging.customer (
  c_custkey int8 not null ,
  c_name varchar(25) not null,
  c_address varchar(40) not null,
  c_nationkey int4 not null,
  c_phone char(15) not null,
  c_acctbal numeric(12,2) not null,
  c_mktsegment char(10) not null,
  c_comment varchar(117) not null,
  Primary Key(C_CUSTKEY)
) distkey(c_nationkey) sortkey(c_nationkey);

You can use two-part notation because you used the USE command or directly connected to the datashare database. If not, you need to specify the datashare database names as well.

Copy data into the staging table

Copy the customer TPCH 10 data from the AWS Labs public S3 bucket into the table using the following command:

copy staging.customer from 's3://redshift-downloads/TPC-H/2.18/10GB/customer.tbl' iam_role default delimiter '|' region 'us-east-1';

As before, this requires you to have set up the default IAM role when creating this warehouse.

Ingest African customer data to the table prod.af_customer

Run the following command to ingest only the African customer data to the table prod.af_customer:

insert into prod.af_customer
select c.* from staging.customer c
  join prod.nation n on c.c_nationkey = n.n_nationkey
  join prod.region r on n.n_regionkey = r.r_regionkey
  where r.r_regionkey = 0; --0 is the region key for Africa

This requires you to join on the nation and region tables you have select permission for.

Truncate the staging table

You can truncate the staging table so that you can write to it without recreating it in a future job. The truncate action will run transactionally and can be rolled back if you are connected directly to the datashare database or you are using the use command (even if you’re not using a datashare database). Use the following code:

truncate staging.customer;

At this point, you’ve completed ingesting the data to the primary namespace. You can query the af_customer table from both the primary warehouse and secondary ETL warehouse and see the same data.

Conclusion

In this post, we showed how to use multiple warehouses to write to the same database. This solution has the following benefits:

  • You can use provisioned clusters and serverless workgroups of different sizes to write to the same databases
  • You can write across accounts and regions
  • Data is live and available to all warehouses as soon as it is committed
  • Writes work even if the producer warehouse (the warehouse that owns the database) is paused

To learn more about this feature, see Sharing both read and write data within an AWS account or across accounts (preview). Additionally, if you have any feedback, please email us at [email protected].


About the authors

Ryan Waldorf is a Senior Product Manager at Amazon Redshift. Ryan focuses on features that enable customers to define and scale compute including data sharing and concurrency scaling.

Harshida Patel is a Analytics Specialist Principal Solutions Architect, with Amazon Web Services (AWS).

Sudipto Das is a Senior Principal Engineer at Amazon Web Services (AWS). He leads the technical architecture and strategy of multiple database and analytics services in AWS with special focus on Amazon Redshift and Amazon Aurora.

Knowledge Bases for Amazon Bedrock now supports Amazon Aurora PostgreSQL and Cohere embedding models

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/knowledge-bases-for-amazon-bedrock-now-supports-amazon-aurora-postgresql-and-cohere-embedding-models/

During AWS re:Invent 2023, we announced the general availability of Knowledge Bases for Amazon Bedrock. With a knowledge base, you can securely connect foundation models (FMs) in Amazon Bedrock to your company data for Retrieval Augmented Generation (RAG).

In my previous post, I described how Knowledge Bases for Amazon Bedrock manages the end-to-end RAG workflow for you. You specify the location of your data, select an embedding model to convert the data into vector embeddings, and have Amazon Bedrock create a vector store in your AWS account to store the vector data, as shown in the following figure. You can also customize the RAG workflow, for example, by specifying your own custom vector store.

Knowledge Bases for Amazon Bedrock

Since my previous post in November, there have been a number of updates to Knowledge Bases, including the availability of Amazon Aurora PostgreSQL-Compatible Edition as an additional custom vector store option next to vector engine for Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud. But that’s not all. Let me give you a quick tour of what’s new.

Additional choice for embedding model
The embedding model converts your data, such as documents, into vector embeddings. Vector embeddings are numeric representations of text data within your documents. Each embedding aims to capture the semantic or contextual meaning of the data.

Cohere Embed v3 – In addition to Amazon Titan Text Embeddings, you can now also choose from two additional embedding models, Cohere Embed English and Cohere Embed Multilingual, each supporting 1,024 dimensions.

Knowledge Bases for Amazon Bedrock

Check out the Cohere Blog to learn more about Cohere Embed v3 models.

Additional choice for vector stores
Each vector embedding is put into a vector store, often with additional metadata such as a reference to the original content the embedding was created from. The vector store indexes the stored vector embeddings, which enables quick retrieval of relevant data.

Knowledge Bases gives you a fully managed RAG experience that includes creating a vector store in your account to store the vector data. You can also select a custom vector store from the list of supported options and provide the vector database index name as well as index field and metadata field mappings.

We have made three recent updates to vector stores that I want to highlight: The addition of Amazon Aurora PostgreSQL-Compatible and Pinecone serverless to the list of supported custom vector stores, as well as an update to the existing Amazon OpenSearch Serverless integration that helps to reduce cost for development and testing workloads.

Amazon Aurora PostgreSQL – In addition to vector engine for Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud, you can now also choose Amazon Aurora PostgreSQL as your vector database for Knowledge Bases.

Knowledge Bases for Amazon Bedrock

Aurora is a relational database service that is fully compatible with MySQL and PostgreSQL. This allows existing applications and tools to run without the need for modification. Aurora PostgreSQL supports the open source pgvector extension, which allows it to store, index, and query vector embeddings.

Many of Aurora’s features for general database workloads also apply to vector embedding workloads:

  • Aurora offers up to 3x the database throughput when compared to open source PostgreSQL, extending to vector operations in Amazon Bedrock.
  • Aurora Serverless v2 provides elastic scaling of storage and compute capacity based on real-time query load from Amazon Bedrock, ensuring optimal provisioning.
  • Aurora global database provides low-latency global reads and disaster recovery across multiple AWS Regions.
  • Blue/green deployments replicate the production database in a synchronized staging environment, allowing modifications without affecting the production environment.
  • Aurora Optimized Reads on Amazon EC2 R6gd and R6id instances use local storage to enhance read performance and throughput for complex queries and index rebuild operations. With vector workloads that don’t fit into memory, Aurora Optimized Reads can offer up to 9x better query performance over Aurora instances of the same size.
  • Aurora seamlessly integrates with AWS services such as Secrets Manager, IAM, and RDS Data API, enabling secure connections from Amazon Bedrock to the database and supporting vector operations using SQL.

For a detailed walkthrough of how to configure Aurora for Knowledge Bases, check out this post on the AWS Database Blog and the User Guide for Aurora.

Pinecone serverless – Pinecone recently introduced Pinecone serverless. If you choose Pinecone as a custom vector store in Knowledge Bases, you can provide either Pinecone or Pinecone serverless configuration details. Both options are supported.

Reduce cost for development and testing workloads in Amazon OpenSearch Serverless
When you choose the option to quickly create a new vector store, Amazon Bedrock creates a vector index in Amazon OpenSearch Serverless in your account, removing the need to manage anything yourself.

Since becoming generally available in November, vector engine for Amazon OpenSearch Serverless gives you the choice to disable redundant replicas for development and testing workloads, reducing cost. You can start with just two OpenSearch Compute Units (OCUs), one for indexing and one for search, cutting the costs in half compared to using redundant replicas. Additionally, fractional OCU billing further lowers costs, starting with 0.5 OCUs and scaling up as needed. For development and testing workloads, a minimum of 1 OCU (split between indexing and search) is now sufficient, reducing cost by up to 75 percent compared to the 4 OCUs required for production workloads.

Usability improvement – Redundant replicas disabled is now the default selection when you choose the quick-create workflow in Knowledge Bases for Amazon Bedrock. Optionally, you can create a collection with redundant replicas by selecting Update to production workload.

Knowledge Bases for Amazon Bedrock

For more details on vector engine for Amazon OpenSearch Serverless, check out Channy’s post.

Additional choice for FM
At runtime, the RAG workflow starts with a user query. Using the embedding model, you create a vector embedding representation of the user’s input prompt. This embedding is then used to query the database for similar vector embeddings to retrieve the most relevant text as the query result. The query result is then added to the original prompt, and the augmented prompt is passed to the FM. The model uses the additional context in the prompt to generate the completion, as shown in the following figure.

Knowledge Bases for Amazon Bedrock

Anthropic Claude 2.1 – In addition to Anthropic Claude Instant 1.2 and Claude 2, you can now choose Claude 2.1 for Knowledge Bases. Compared to previous Claude models, Claude 2.1 doubles the supported context window size to 200 K tokens.

Knowledge Bases for Amazon Bedrock

Check out the Anthropic Blog to learn more about Claude 2.1.

Now available
Knowledge Bases for Amazon Bedrock, including the additional choice in embedding models, vector stores, and FMs, is available in the AWS Regions US East (N. Virginia) and US West (Oregon).

Learn more

Read more about Knowledge Bases for Amazon Bedrock

— Antje

AWS CodePipeline adds support for Branch-based development and Monorepos

Post Syndicated from Michael Ohde original https://aws.amazon.com/blogs/devops/aws-codepipeline-adds-support-for-branch-based-development-and-monorepos/

AWS CodePipeline is a managed continuous delivery service that automates your release pipelines for application and infrastructure updates. Today, CodePipeline adds triggers and new execution modes to support teams with various delivery strategies. These features give customers more choice in the pipelines they build.

In this post, I am going to show you how to use triggers and pipeline execution modes together to create three pipeline designs. These examples are requested by customers that practice branch-based development or manage multiple projects within a monorepo.

  • Pipeline #1: Create a GitFlow (multi-branch) release pipeline.
  • Pipeline #2: Run a pipeline on all pull requests (PRs).
  • Pipeline #3: Run a pipeline on a single folder within a monorepo.

As I walkthrough each of the pipelines you will learn more about these features and how to use them. After completing the blog, you can use triggers and execution modes to adapt these examples to your pipeline needs.

Pipeline #1 – Create a GitFlow (multi-branch) release pipeline

GitFlow is a development model that manages large projects with parallel development and releases using long-running branches. GitFlow uses two permanent branches, main and develop, along with supporting feature, release, and hotfix branches. Since I will cover triggering a pipeline from multiple branches, these concepts can be applied to simplify other multi-branch pipeline strategies such as GitHub flow.

I can create pipelines using the AWS Management Console, AWS CLI, AWS CloudFormation, or by writing code that calls the CodePipeline CreatePipeline API. In this blog, I will keep things simple by creating two pipelines, a release pipeline and a feature development pipeline. I start by navigating to the CodePipeline console and choosing Create pipeline.

In the first step, Pipeline settings, as per Figure 1 below, you will now see options for the newly added Execution modesQueued and Parallel.

Example GitFlow release pipeline settings for Queued execution mode.
Figure 1. Example GitFlow release pipeline settings for Queued execution mode.

The execution mode of the pipeline determines the handling of multiple executions:

  • Superseded – an execution that started more recently can overtake one that began earlier. Before today, CodePipeline only supported Superseded execution mode.
  • Queued – the executions wait and do not overtake executions that have already started. Executions are processed one by one in the order that they are queued.
  • Parallel – the executions are independent of one another and do not wait for other executions to complete before starting.

The first pipeline, release pipeline, will trigger for main, develop, hotfix, and release branches. I select Queued, since I want to run every push to these branches in the order triggered by the pipeline. I make sure the Pipeline type chosen is V2 and I click Next.

Example source connection and repository.
Figure 2. Example source connection and repository.

In step two, Add source stage, I select my Source provider, Connection, Repository name, and Default branch. I need to use a source provider that uses a connection to my external code repository, in this example I’m using GitHub so I select Connect to GitHub. Connections authorize and establish configurations that associate a third-party provider such as GitHub with CodePipeline.

Now that I have my Source setup, I’m going to configure a Trigger. Triggers define the event type that starts the pipeline, such as a code push or pull request. I select the Specify filter from the Trigger types, since I want to add a filtered trigger.

For this pipeline, I select Push for the Event type. A push trigger starts a pipeline when a change is pushed to the source repository. The execution uses the files in the branch that is pushed to, the destination branch. Next, I select the Filter type of Branch. The branch filter type specifies the branches in GitHub connected repository that the trigger monitors in order to know when to start an execution.

There are two types of branch filters:

  • Include – the trigger will start a pipeline if the branch name matches the pattern.
  • Exclude – the trigger will NOT start a pipeline if the branch name matches the pattern.

Note: If Include and Exclude both have the same pattern, then the default is to exclude the pattern.

Branching patterns are entered in the glob format, detailed in Working with glob patterns in syntax, to specify the branch I want to trigger, I enter main,develop,hotfix/**,release/** in Include and I leave Exclude empty.

Example GitFlow release pipeline for push event type and branch filters.
Figure 3. Example GitFlow release pipeline for push event type and branch filters.

I am done configuring the filters and I click Next.

To keep the focus of the blog on the pipeline and not the application, I will skip ahead to Create pipeline. If you are curious about by my application and build step, I followed the example in AWS CodeBuild adds support for AWS Lambda compute mode.

Next, I create the feature development pipeline. The feature development pipeline will trigger for feature branches. This time, I select the Parallel Execution mode, as developers should not be blocked by their peers working in other feature branches. I make sure the Pipeline type chosen is V2 and I click Next.

Example GitFlow feature pipeline settings for Queued execution mode.
Figure 4. Example GitFlow feature pipeline settings for Queued execution mode.

In Step 2, the source provider and connection is setup the same as per the previous release pipeline, see Figure 2 above. Once the Source step is complete, I configure my Trigger with an Event type of Push, but this time I only enter feature/** for Include.

Example GitFlow feature pipeline for push event type and branch filters.
Figure 5. Example GitFlow feature pipeline for push event type and branch filters.

I am done configuring the filters and I skip forward to Create pipeline.

After the pipeline is finished creating, I can now see both of the pipelines I created – the release pipeline and the feature development pipeline.

Example GitFlow pipelines.
Figure 6. Example GitFlow pipelines.

To verify my pipeline setup, I create and merge multiple code changes to feature branches and to the release branches – develop, release, and main. The Pipeline view now displays the executions that have been triggered by the matching branches. Note how these executions have been successfully added to the queue by the pipeline.

Example GitFlow release executions queued.
Figure 7. Example GitFlow release executions queued.

I have now implemented a GitFlow release pipeline. By using Branch filter types and Push event triggers, you can now extend this example to meet your branch-based development needs.

Pipeline #2 – Run a pipeline on all pull requests (PRs)

Before proceeding, I recommend you review the concepts covered in Pipeline #1, as you will build on that knowledge.

Triggering a pipeline on a pull request (PR) is a common continuous integration pattern to catch build and test failures before the PR is merged into the branch. A PR pipeline is often faster and lighter than the full release by limiting tests like security scans, validation tests, or performance tests to the changes in the PR rather than running them on every commit. Having a single pipeline triggered for all PRs allows reviewing and validating any proposed changes to the repository before merging.

To start I create a new pipeline, by clicking Create Pipeline. I change the Execution mode to Parallel. I choose Parallel because the development team will be working on multiple features at the same time and it is wasteful to wait for other executions to finish. I make sure the Pipeline type chosen is V2 and I click Next.

Example PR pipeline settings for Parallel execution mode.
Figure 8. Example PR pipeline settings for Parallel execution mode.

As per the previous pipeline, the Source provider and connection is setup as show in Figure 2 above. Once the Source step is setup, I configure my Pull Request Trigger.

For this pipeline, I select Pull Request for the Event type. A pull request trigger starts a pipeline when a pull request is opened, updated, or closed in the source repository. The execution will use the files in the branch that the change is being pulled from, the source branch. Next, I select Pull request is created and New revision is made to pull request for Events for pull request. To match pull requests for all branches, I enter ** under Include for Branches and leave Exclude empty.

Example PR pipeline for pull request event type.
Figure 9. Example PR pipeline for pull request event type.

I will fast-forward to the Create pipeline, skipping the details of the build and deploy steps, similar to what I did in Pipeline #1.

Once the pipeline has finished creating, I open a few PRs in my GitHub repository as a test. Back in CodePipeline when I click on my pipeline, I notice the pipeline takes me straight to the Execution history view. The reason I’m redirected to the execution history is the pipeline execution mode is Parallel and all executions are independent. From this view, I see the Trigger column displaying details about each pull request that has triggered the pipeline.

Example PR pipeline with executions in parallel.
Figure 10. Example PR pipeline with executions in parallel.

Note: To view an individual execution Pipeline, click the Execution ID.

I have now implemented a PR validation pipeline for all PRs across branches. By using Pull request event triggers and Branch filter types, you can now extend this example to meet your PR pipeline needs.

Pipeline #3 – Run a pipeline on a single folder within a monorepo

Before proceeding, I recommend you review the concepts covered in Pipeline #1, as you will build on that knowledge.

A monorepo is a software-development strategy where a single repository is used to contain the code for multiple projects. Running pipelines for each project contained in the monorepo on every commit can be inefficient, especially when each project requires different pipelines. For this pipeline example, I want to limit pipeline executions to only changes inside the infrastructure folder in the main branch. This can reduce cost, speed up deployments, and optimize resource usage.

To start, I create a new pipeline by clicking Create Pipeline. For this example, I keep the default Execution mode as Suspended, since I do not have any specific execution mode requirements. I make sure the Pipeline type chosen is V2 and I click Next.

As per the previous pipeline, the Source provider and connection is setup as per Figure 2 above. Once the Source step is complete, I configure my Trigger to focus on the infrastructure folder in the main branch.

For this pipeline, I select Push for the Event type. Next, I select the Filter type of Branch.  To match pushes to only main, I enter main under Include for Branches and leave Exclude empty. Under File paths, for Include, I enter infrastructure/** and I leave Exclude empty. The file paths filter type specifies file path names in the source repository that the trigger monitors in order to know when to start an execution. Similar to branch filters, I can specify file path name patterns in glob format under Include and Exclude.

Example monorepo pipeline for push event type and file path filters.
Figure 11. Example monorepo pipeline for push event type and file path filters.

I click Next, since I am done configuring the filters.

I will jump ahead to the Create pipeline, omitting the details of the build and deploy steps, like I did in Pipeline #1.

Once the pipeline has been created, I can test the pipeline Trigger in GitHub by making changes on the main branch inside and outside the infrastructure folder. To verify it is only invoking the pipelines inside the infrastructure folder, I open the History for the pipeline in CodePipeline. I confirm that only the changes I’m expecting are running.

Example monorepo pipeline with only infrastructure executions.
Figure 12. Example monorepo pipeline with only infrastructure executions.

I have now selectively invoked a pipeline based on repository changes in a monorepo. By using File paths filter types, you can now extend this example to meet your monorepo release pipelines.

Conclusion

AWS CodePipeline’s new triggers and execution modes unlock new patterns for building pipelines on AWS. In this post, I discussed the new features and three popular pipeline patterns you can build. If you are creating GitFlow or your own multi-branch strategy, CodePipeline simplifies managing release pipelines for multi-branch models. Whether you are using File path filter types for monorepos or leveraging Parallel execution mode to unblock developers, CodePipeline accelerates the delivery of your software. Check out the AWS CodePipeline User Guide and hands-on tutorials to automate your delivery workflows today.

Michael Ohde

Michael Ohde is a Senior Solutions Architect from Long Beach, CA. As a Product Acceleration Solution Architect at AWS, he currently assists Independent Software Vendor (ISVs) in the GovTech and EdTech sectors, by building modern applications using practices like serverless, DevOps, and AI/ML.

AWS completes the 2023 South Korea CSP Safety Assessment Program

Post Syndicated from Andy Hsia original https://aws.amazon.com/blogs/security/aws-completes-the-2023-south-korea-csp-safety-assessment-program/

We’re excited to announce that Amazon Web Services (AWS) has completed the 2023 South Korea Cloud Service Providers (CSP) Safety Assessment Program, also known as the Regulation on Supervision on Electronic Financial Transactions (RSEFT) Audit Program. The financial sector in South Korea is required to abide by a variety of cybersecurity standards and regulations. Key regulatory requirements include RSEFT and the Guidelines on the Use of Cloud Computing Services in the Financial Industry (FSIGUC). Prior to 2019, the RSEFT guidance didn’t permit the use of cloud computing. The guidance was amended on January 1, 2019, to allow financial institutions to use the public cloud to store and process data, subject to compliance with security measures applicable to financial companies.

AWS is committed to helping our customers adhere to applicable regulations and guidelines, and we help ensure that our financial customers have a hassle-free experience using the cloud. Since 2019, our RSEFT compliance program has aimed to provide a scalable approach to support South Korean financial services customers’ adherence to RSEFT and FSIGUC. Financial services customers can annually either perform an individual audit by using publicly available AWS resources and visiting on-site, or request the South Korea Financial Security Institute (FSI) to conduct the primary audit on their behalf and use the FSI-produced audit reports. In 2023, we worked again with FSI and completed the annual RSEFT primary audit with the participation of 59 customers.

The audit scope of the 2023 assessment covered data center facilities in four Availability Zones (AZ) of the AWS Asia Pacific (Seoul) Region and the services that are available in that Region. The audit program assessed different security domains including security policies, personnel security, risk management, business continuity, incident management, access control, encryption, and physical security.

Completion of this audit program helps our customers use the results and audit report for their annual submission to the South Korea Financial Supervisory Service (FSS) for their adoption and continued use of our cloud services and infrastructure. To learn more about the RSEFT program, see the AWS South Korea Compliance Page. If you have questions, contact your AWS account manager.

If you have feedback about this post, submit comments in th Comments section below.

Andy Hsia

Andy Hsia

Andy is the Customer Audit Lead for APJ, based in Singapore. He is responsible for all customer audits in the Asia Pacific region. Andy has been with Security Assurance since 2020 and has delivered key audit programs in Hong Kong, India, Indonesia, South Korea, and Taiwan.

AWS renews K-ISMS certificate for the AWS Asia Pacific (Seoul) Region

Post Syndicated from Joseph Goh original https://aws.amazon.com/blogs/security/aws-renews-k-isms-certificate-for-the-asia-pacific/

We’re excited to announce that Amazon Web Services (AWS) has successfully renewed certification under the Korea Information Security Management System (K-ISMS) standard (effective from December 16, 2023, to December 15, 2026).

The certification assessment covered the operation of infrastructure (including compute, storage, networking, databases, and security) in the AWS Asia Pacific (Seoul) Region. AWS was the first global cloud service provider (CSP) to obtain the K-ISMS certification back in 2017 and has held that certification longer than any other global CSP. In this year’s audit, 144 services running in the Asia Pacific (Seoul) Region were included.

Sponsored by the Korea Internet & Security Agency (KISA) and affiliated with the Korean Ministry of Science and ICT (MSIT), K-ISMS serves as a standard for evaluating whether enterprises and organizations operate and manage their information security management systems consistently and securely, such that they thoroughly protect their information assets.

This certification helps enterprises and organizations across South Korea, regardless of industry, meet KISA compliance requirements more efficiently. Achieving this certification demonstrates the AWS commitment on cloud security adoption, adhering to compliance requirements set by the South Korean government and delivering secure AWS services to customers.

The Operational Best Practices (conformance pack) page provides customers with a compliance framework that they can use for their K-ISMS compliance needs. Enterprises and organizations can use the toolkit and AWS certification to reduce the effort and cost of getting their own K-ISMS certification.

Customers can download the AWS K-ISMS certification from AWS Artifact. To learn more about the AWS K-ISMS certification, see the AWS K-ISMS page. If you have questions, contact your AWS account manager.

If you have feedback about this post, submit comments in the Comments section below.

Joseph Goh

Joseph Goh

Joseph is the APJ ASEAN Lead at AWS based in Singapore. He leads security audits, certifications, and compliance programs across the Asia Pacific region. Joseph is passionate about delivering programs that build trust with customers and provide them assurance on cloud security.

Hwee Hwang

Hwee Hwang

Hwee is an Audit Specialist at AWS based in Seoul, South Korea. Hwee is responsible for third-party and customer audits, certifications, and assessments in Korea. Hwee previously worked in security governance, risk, and compliance consulting in the Big Four. Hwee is laser focused on building customers’ trust and providing them assurance in the cloud.

Track Amazon OpenSearch Service configuration changes more easily with new visibility improvements

Post Syndicated from Siddhant Gupta original https://aws.amazon.com/blogs/big-data/track-amazon-opensearch-service-configuration-with-improved-visibility/

Amazon OpenSearch Service offers multiple domain configuration settings to meet your workload-specific requirements. As part of standard service operations, you may be required to update these configuration settings on a regular basis. Recently, Amazon OpenSearch Service launched visibility improvements that allow you to track configuration changes more effectively. We’ve introduced granular and more descriptive configuration statuses that enable you to set up alarms and use them in automation to minimize manual monitoring.

We recommend that you take advantage of these visibility improvements in your applications. These changes are backward compatible, and if your automations rely on the legacy processing parameter to determine configuration change status, then they should still continue to work without any disruption. To simplify tracking of multiple in-flight configuration change requests, Amazon OpenSearch Service allows configuration request only when Domain Processing Status is Active. Additional details are in section ‘Single configuration change at a time’.

Solution overview

Earlier, configuration change status visibility was available through processing parameters in the OpenSearch Service APIs (Application Programming Interface), and as a Domain Status field in the OpenSearch Service console. We have now introduced the following changes to improve the configuration update experience:

  • Introduced two new parameters, DomainProcessingStatus and ConfigChangeStatus, in the API responses. Similarly, added Domain Processing Status and Configuration Change Status fields in the console. These changes provide better visibility through multiple, intuitive statuses. Earlier statuses were limited to only two values: Active and Processing.
  • Ability to easily compare active and in-flight configurations for clarity. Earlier, it required multiple steps.
  • Amazon OpenSearch Service has now adopted the approach of allowing a single configuration change request at a time. There is no limit on the number of domain configuration changes you can bundle in a single request. However, you can submit the next configuration request when the previous request is complete and the domain processing status becomes Active. This improvement streamlines configuration updates and addresses previous challenges of tracking multiple, in-flight configuration change requests.
  • Ability to cancel a change request in case of a validation failure. Previously, when instances were unavailable, domains remained in processing state. Now, upon encountering any validation failure, you can cancel the change request and retry after some time.
  • Domain processing status turns to Active only after all the background activities, including shard movement is complete. This means that you can confidently use newly introduced statuses in your automation scripts without needing to infer if all the internal processes, such as data movement, are complete.

How do you get granular details to track the configuration update status?

As part of recent improvements, Amazon OpenSearch Service introduced DomainProcessingStatus and ConfigChangeStatus parameters in the APIs along with the respective Domain Processing Status and Configuration Change Status fields in the console. You can rely on these statuses to get accurate and consistent information during different configuration change scenarios, like when configuration changes involve blue/green operations or without blue/green operations, and when configuration changes are triggered by the operator or by the OpenSearch Service. Let us explore these enhanced visibility experiences.

  1. Domain processing status visibility: You can track the staus of domain-level configuration changes through the Domain Processing Status field in the console. Similarly, API responses include the DomainProcessingStatus parameter. The values and a brief description are provided in the following details:
    1. Active: No configuration change is in progress. You can submit a new configuration change request.
    2. Creating: New domain creation is in progress.
    3. Modifying: This status indicates that one or more configuration changes, such as new data node addition, Amazon Elastic Block Store (Amazon EBS) GP3 storage provisioning, or setting up KMS keys, are in progress. In other words, changes made through the UpdateDomainConfig API, set the status to modifying. The ‘Modifying’ status also covers situations where domains require shard movement to complete configuration changes. Note: For backward compatibility, we have kept the behavior of the processing parameter unchanged in the API responses, and it is set to false as soon as the core configuration changes are complete, without waiting for shard movement completion.
    4. Upgrading Engine Version: Engine version upgrades are in progress, such as from Elasticsearch version 7.9 to OpenSearch version 1.0.
    5. Updating Service Software: This status refers to configuration changes related to service software updates.
    6. Deleting: Domain deletion is progressing.
    7. Isolated: This represents domains that are suspended due to a variety of reasons, such as account-related billing issues or domains that are not compliant with critical security patch updates.
  2. Configuration change status visibility: Configuration changes can be initiated by the user (e.g., new data node addition, instance type change) or by the Service (e.g., AutoTune and mandatory service software updates). You can find the latest status details through Configuration Change Status field in the console, and through the ConfigChangeStatus parameter in API responses. Below are the values and a brief description:
    1. Pending: Indicates that a configuration change request has been submitted.
    2. Initializing: Service is initializing a configuration change request.
    3. Validating: Service is validating the requested changes and resources required.
    4. Validation Failed: Requested changes failed validation. At this point, no configuration changes are applied. Some possible validation failures could be the presence of red indices in the domain, unavailability of a chosen instance type, and low disk space. Here is a list of potential validation failures. During a validation failure event, you can cancel, retry, or edit configuration changes.
    5. Awaiting user inputs: Scenarios where user may be able to fix validation errors such as invalid KMS key. At this status, user can edit the configuration changes.
    6. Applying changes: Service is applying requested configuration changes.
    7. Cancelled: During validation failed status, you can either click on the Cancel button in the console or call the CancelDomainConfigChange API. All the applied changes that were part of the change request will be rolled back.
    8. Completed: Requested configuration changes have been successfully completed.

Console enhancements

The Amazon OpenSearch Service console offers enhanced visibility to track configuration change progress. Below are a few screenshots to give you an idea of these improvements.

  • Amazon OpenSearch Service console provides Domain Processing Status, Configuration Change Status, and Change ID fields. Note: To know the change details associated with the Change ID, you can use the DescribeDomainChangeProgress API.

  • Configuration change summary. To see a side by side comparison of your active configurations and requested changes, on the domain detail page, navigate to the cluster configuration tab, scroll down to the configuration change summary section. Pending Changes field shows the status of the pending properties at that time and does not include changes that have been applied. You can also get similar details from the DescribeDomain and DescribeDomainConfig APIs through theModifyingProperties parameter.

Cancelling during validation failure. In the below screenshots, you can see a new option to cancel a change request when a configuration change request fails validations. For example, when you encounter SubnetNotFound error, you can use the Cancel request button to roll back to the previous active configuration, fix the issue and then retry the configuration update.

Single configuration change at a time

Previously, it was not straightforward to track the success and failure of individual change requests, when several requests were made. To provide a simplified experience, OpenSearch Service now limits you to only a single change request at a time. In a single configuration change request, you can bundle multiple changes at once. Once a configuration change request is submitted, it must be completed before you can request the next configuration change through the console, or through the UpdateDomainConfig API. This simplified experience makes it easier to keep track of changes that have been requested and their most recent status. If your automation is written to call configuration change update APIs multiple times, then it should be updated to group multiple configuration changes in a single update call, or wait for individual updates to complete before you submit the next configuration change. You can update domain configuration when domain processing status becomes active. For a list of changes that might need a blue/green deployment, please see here.

The below screenshot shows an example alert on the ‘Edit domain’ page informing the user that another change or update is in progress. OpenSearch Service no longer allows you to submit new configuration update requests, and the ‘Apply change’ button is disabled until the change in progress is completed.

API changes

You can use the DescribeDomain, DescribeDomainChangeProgress, and DescribeDomainConfig APIs to get detailed configuration update statuses. In addition, you can use CancelDomainConfigChange to cancel the change request in the event of a validation failures. You can refer Amazon OpenSearch Service API documentation here.

Conclusion

In this post, we showed you how to get granular information about a configuration update request. These newly introduced changes will help you gain better visibility into the progress of configuration change requests, and easily distinguish between applied changes and pending ones. You need to ensure that the DomainProcessingStatus processing status value is Active before submitting configuration change requests. The ability to cancel changes in the event of validation failures gives you better control in getting your domain out of processing state in a self-service manner. Visit product documentation to learn more.


About the Authors

Siddhant Gupta is a Sr. Technical Product Manager at Amazon Web Services based in Hyderabad, India. Siddhant has been with Amazon for over six years and is currently working with the OpenSearch Service team, helping with new region launches, pricing strategy, and bringing EC2 and EBS innovations to OpenSearch Service customers. He is passionate about analytics and machine learning. In his free time, he loves traveling, fitness activities, spending time with his family and reading non-fiction books.

Deniz Ercelebi is a Sr. UX Designer at Amazon OpenSearch Service. In her role she contributes to the creation, implementation, and successful delivery of design solutions for complex problems. Her personal drive is fueled by a passion for user experience, a dedication to customer-centric solutions, and a firm belief in collaborative innovation.

Shashank Gupta is a Sr. Software Developer at Amazon OpenSearch Service, specializing in the enhancement of the Managed service aspect of the platform. His primary focus is on optimizing the managed experience, spanning from the console to APIs and resource provisioning in an efficient manner. With a dedicated commitment to innovation, Shashank aims to elevate the overall customer experience by introducing inventive solutions within the service.