Tag Archives: serverless

Discovering sensitive data in AWS CodeCommit with AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/discovering-sensitive-data-in-aws-codecommit-with-aws-lambda-2/

This post is courtesy of Markus Ziller, Solutions Architect.

Today, git is the de facto standard for version control in modern software engineering. The workflows enabled by git’s branching capabilities are a major reason for this. However, because git is distributed, it can be difficult to reliably remove committed changes from all copies of a repository. This is problematic when secrets such as API keys are accidentally committed into version control. The longer it takes to identify and remove a secret from git, the more likely it is that the secret has been checked out by another user.

This post shows a solution that automatically identifies credentials pushed to AWS CodeCommit in near-real-time. I also show three remediation measures that you can use to reduce the impact of secrets pushed into CodeCommit:

  • Notify users about the leaked credentials.
  • Lock the repository for non-admins.
  • Hard reset the CodeCommit repository to a healthy state.

I use the AWS Cloud Development Kit (CDK). This is an open source software development framework to model and provision cloud application resources. Using the CDK can reduce the complexity and amount of code needed to automate the deployment of resources.

Overview of solution

The services in this solution are AWS Lambda, AWS CodeCommit, Amazon EventBridge, and Amazon SNS. These services are part of the AWS serverless platform. They help reduce undifferentiated work around managing servers, infrastructure, and the parts of the application that add less value to your customers. With serverless, the solution scales automatically, has built-in high availability, and you only pay for the resources you use.

Solution architecture

This diagram outlines the workflow implemented in this blog:

  1. After a developer pushes changes to CodeCommit, the repository emits an event to an event bus.
  2. A rule defined on the event bus routes this event to a Lambda function.
  3. The Lambda function uses the AWS SDK for JavaScript to get the changes introduced by commits pushed to the repository.
  4. It analyzes the changes for secrets. If secrets are found, it publishes another event to the event bus.
  5. Rules associated with this event type then trigger invocations of three Lambda functions A, B, and C with information about the problematic changes.
  6. Each of the Lambda functions runs a remediation measure:
    • Function A sends out a notification to an SNS topic that informs users about the situation (A1).
    • Function B locks the repository by setting a tag with the AWS SDK (B2). It sends out a notification about this action (B1).
    • Function C runs git commands that remove the problematic commit from the CodeCommit repository (C2). It also sends out a notification (C1).

Walkthrough

The following walkthrough explains the required components, their interactions, and how to automate the provisioning with the CDK.

For this walkthrough, you need:

Check out and deploy the sample stack:

  1. After completing the prerequisites, clone the associated GitHub repository by running the following command in a local directory:
    git clone git@github.com:aws-samples/discover-sensitive-data-in-aws-codecommit-with-aws-lambda.git
  2. Open the repository in a local editor and review the contents of cdk/lib/resources.ts, src/handlers/commits.ts, and src/handlers/remediations.ts.
  3. Follow the instructions in the README.md to deploy the stack.

The CDK will deploy resources for the following services in your account.

Using CodeCommit to manage your git repositories

The CDK creates a new empty repository called TestRepository and adds a tag RepoState with an initial value of ok. You later use this tag in the LockRepo remediation strategy to restrict access.

It also creates two IAM groups with one user in each. Members of the CodeCommitSuperUsers group are always able to access the repository, while members of the CodeCommitUsers group can only access the repository when the value of the tag RepoState is not locked.

I also import the CodeCommitSystemUser into the CDK. Since the user requires git credentials in a downloaded CSV file, it cannot be created by the CDK. Instead, it must be created as described in the README file.

The following CDK code sets up all the described resources:

// Imports shown for completeness; these assume the CDK v1 module layout used at the time of writing.
import { Tags } from "@aws-cdk/core";
import { Effect, Group, PolicyStatement, User } from "@aws-cdk/aws-iam";
import { Secret } from "@aws-cdk/aws-secretsmanager";
import { Repository } from "@aws-cdk/aws-codecommit";

const TAG_NAME = "RepoState";

const superUsers = new Group(this, "CodeCommitSuperUsers", { groupName: "CodeCommitSuperUsers" });
superUsers.addUser(new User(this, "CodeCommitSuperUserA", {
    password: new Secret(this, "CodeCommitSuperUserPassword").secretValue,
    userName: "CodeCommitSuperUserA"
}));

const users = new Group(this, "CodeCommitUsers", { groupName: "CodeCommitUsers" });
users.addUser(new User(this, "User", {
    password: new Secret(this, "CodeCommitUserPassword").secretValue,
    userName: "CodeCommitUserA"
}));

const systemUser = User.fromUserName(this, "CodeCommitSystemUser", props.codeCommitSystemUserName);

const repo = new Repository(this, "Repository", {
    repositoryName: "TestRepository",
    description: "The repository to test this project out",
});
Tags.of(repo).add(TAG_NAME, "ok");

users.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn],
    conditions: {
        StringNotEquals: {
            [`aws:ResourceTag/${TAG_NAME}`]: "locked"
        }
    }
}));

superUsers.addToPolicy(new PolicyStatement({
    effect: Effect.ALLOW,
    actions: ["*"],
    resources: [repo.repositoryArn]
}));

Using EventBridge to pass events between components

I use EventBridge, a serverless event bus, to connect the Lambda functions together. Many AWS services like CodeCommit are natively integrated into EventBridge and publish events about changes in their environment.

repo.onCommit is a higher-level CDK construct. It creates the required resources to invoke a Lambda function for every commit to a given repository. The created event rule looks like this:

EventBridge rule definition
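
The generated rule’s event pattern is roughly equivalent to the following sketch (the repository ARN is a placeholder, and the exact pattern onCommit produces may differ slightly):

{
    "source": ["aws.codecommit"],
    "resources": ["arn:aws:codecommit:<region>:<account-id>:TestRepository"],
    "detail-type": ["CodeCommit Repository State Change"],
    "detail": {
        "event": ["referenceCreated", "referenceUpdated"]
    }
}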

Note that this event rule only matches commit events in TestRepository. To send commits of all repositories in that account to the inspecting Lambda function, remove the resources filter in the event pattern.

CodeCommit Repository State Change is a default event that is published by CodeCommit if changes are made to a repository. In addition, I define CodeCommit Security Event, a custom event, which Lambda publishes to the same event bus if secrets are discovered in the inspected code.

The sample below shows how you can set up Lambda functions as targets for both types of events.

const DETAIL_TYPE = "CodeCommit Security Event";
const eventBus = new EventBus(this, "CodeCommitEventBus", {
    eventBusName: "CodeCommitSecurityEvents"
});

repo.onCommit("AnyCommitEvent", {
    ruleName: "CallLambdaOnAnyCodeCommitEvent",
    target: new targets.LambdaFunction(commitInspectLambda)
});


new Rule(this, "CodeCommitSecurityEvent", {
    eventBus,
    enabled: true,
    ruleName: "CodeCommitSecurityEventRule",
    eventPattern: {
        detailType: [DETAIL_TYPE]
    },
    targets: [
        new targets.LambdaFunction(lockRepositoryLambda),
        new targets.LambdaFunction(raiseAlertLambda),
        new targets.LambdaFunction(forcefulRevertLambda)
    ]
});

Using Lambda functions to run remediation measures

AWS Lambda functions allow you to run code in response to events. The example defines four Lambda functions.

The commitInspectLambda function analyzes whether a commit introduces secrets by comparing it with its predecessor. With the CDK, you can create a Lambda function with:

const myLambdaInCDK = new Function(this, "UniqueIdentifierRequiredByCDK", {
    runtime: Runtime.NODEJS_12_X,
    handler: "<handlerfile>.<function name>",
    code: Code.fromAsset(path.join(__dirname, "..", "..", "src", "handlers")),
    // See git repository for complete code
});

The code for this Lambda function uses the AWS SDK for JavaScript to fetch the details of the commit, the differences introduced, and the new content.

The code checks each modified file line by line with a regular expression that matches typical secret formats. In src/handlers/regex.json, I provide a few regular expressions that match common secrets. You can extend this with your own patterns.
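
As a rough sketch of that matching logic (illustrative only, not the repository’s actual implementation; it assumes regex.json maps a pattern name to a pattern string):

import patterns from "./regex.json"; // requires "resolveJsonModule" in tsconfig.json

// Returns the names of all patterns that match any line of a file.
function findSecrets(fileContent: string): string[] {
    const findings: string[] = [];
    for (const [name, pattern] of Object.entries<string>(patterns)) {
        const regex = new RegExp(pattern);
        if (fileContent.split("\n").some((line) => regex.test(line))) {
            findings.push(name);
        }
    }
    return findings;
}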

If a secret is discovered, a CodeCommit Security Event is published to the event bus. EventBridge then invokes all Lambda functions that are registered as targets with this event. This demo triggers three remediation measures.

The raiseAlertLambda function uses the AWS SDK for JavaScript to send out a notification to all subscribers (that is, CodeCommit administrators) on an SNS topic. It takes no further action.

SNS.publish({
    TopicArn: <TOPIC_ARN>,
    Subject: `[ACTION REQUIRED] Secrets discovered in <repo>`,
    Message: `<Your message>`
})

Notification about secrets discovered in a commit in TestRepository

The lockRepositoryLambda function uses the AWS SDK for JavaScript to change the RepoState tag from ok to locked. This restricts access to members of the CodeCommitSuperUsers IAM group.

CodeCommit.tagResource({
    resourceArn: event.detail.repositoryArn,
    tags: {
        RepoState: "locked"
    }
})

In addition, the Lambda function uses SNS to send out a notification. The forcefulRevertLambda function runs the following git commands:

git clone <repository>
git checkout <branch>
git reset --hard <previousCommitId>
git push origin <branch> --force

These commands reset the repository to the last accepted commit by forcefully removing the respective commit from the git history of your CodeCommit repo. I advise you to handle this with care and only activate it on a real project if you fully understand the consequences of rewriting git history.

The Node.js 12 runtime for Lambda does not include a git binary by default. You can add one by using the git-lambda2 Lambda layer, which allows you to run git commands from within the Lambda function.
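
As a minimal sketch of how the handler might run those commands (assuming the layer has placed a git binary on the PATH, the placeholders are filled in, and credentials for the CodeCommit remote are configured):

import { execSync } from "child_process";

// /tmp is the only writable path in the Lambda execution environment.
execSync("git clone <repository> /tmp/repo", { stdio: "inherit" });

const run = (cmd: string) => execSync(cmd, { cwd: "/tmp/repo", stdio: "inherit" });
run("git checkout <branch>");
run("git reset --hard <previousCommitId>");
run("git push origin <branch> --force");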

Logs for the remediation measure Hard Reset

Finally, this Lambda function also sends out a notification. The complete code is available in the GitHub repo.

Using SNS to notify users

To notify users about secrets discovered and actions taken, you create an SNS topic and subscribe to it via email.

const topic = new Topic(this, "CodeCommitSecurityEventNotification", {
    displayName: "CodeCommitSecurityEventNotification",
});

topic.addSubscription(new subs.EmailSubscription(/* your email address */));

Testing the solution

You can test the deployed solution by running these two sets of commands. First, add a file with no credentials:

echo "Clean file - no credentials here" > clean_file.txt
git add clean_file.txt
git commit clean_file.txt -m "Adds clean_file.txt"
git push

Then add a file containing credentials:

SECRET_LIKE_STRING=$(cat /dev/urandom | env LC_CTYPE=C tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1)
echo "secret=$SECRET_LIKE_STRING" > problematic_file.txt
git add problematic_file.txt
git commit problematic_file.txt -m "Adds secret-like string to problematic_file.txt"
git push

The first set of commands creates, commits, and pushes an unproblematic file, clean_file.txt, which passes the checks of commitInspectLambda. The second set creates, commits, and pushes problematic_file.txt, which matches the regular expressions and triggers the remediation measures.

If you check your email, you soon receive notifications about actions taken by the Lambda functions.

Cleaning up

To avoid incurring charges, delete the resources by running cdk destroy and confirming the deletion.

Conclusion

This post demonstrates how you can implement a solution to discover secrets in commits to AWS CodeCommit repositories. It also defines different strategies to remediate this.

The CDK code to set up all components is minimal and can be extended for remediation measures. The template is portable between Regions and uses serverless technologies to minimize cost and complexity.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless Q4 2020

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/icymi-serverless-q4-2020/

Welcome to the 12th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all of the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

ICYMI Q4 calendar

In case you missed our last ICYMI, check out what happened last quarter here.

AWS re:Invent

re:Invent 2020 banner

re:Invent was entirely virtual in 2020 and free to all attendees. The conference had a record number of registrants and featured over 700 sessions. The serverless developer advocacy team presented a number of talks to help developers build their skills. These are now available on-demand:

AWS Lambda

There were three major Lambda announcements at re:Invent. Lambda duration billing changed granularity from 100 ms to 1 ms, which is shown in the December billing statement. All functions benefit from this change automatically, and it’s especially beneficial for sub-100ms Lambda functions.

Lambda has also increased the maximum memory available to 10 GB. Since memory also controls CPU allocation in Lambda, this means that functions now have up to 6 vCPU cores available for processing. Finally, Lambda now supports container images as a packaging format, enabling teams to use familiar container tooling, such as Docker CLI. Container images are stored in Amazon ECR.

There were three feature releases that make it easier for developers working on data processing workloads. Lambda now supports self-hosted Kafka as an event source, allowing you to source events from on-premises or instance-based Kafka clusters. You can also process streaming analytics with tumbling windows and use custom checkpoints for processing batches with failed messages.

We launched Lambda Extensions in preview, enabling you to more easily integrate monitoring, security, and governance tools into Lambda functions. You can also build your own extensions that run code during Lambda lifecycle events. See this example extensions repo for starting development.

You can now send logs from Lambda functions to custom destinations by using Lambda Extensions and the new Lambda Logs API. Previously, you could only forward logs after they were written to Amazon CloudWatch Logs. Now, logging tools can receive log streams directly from the Lambda execution environment. This makes it easier to use your preferred tools for log management and analysis, including Datadog, Lumigo, New Relic, Coralogix, Honeycomb, or Sumo Logic.

Lambda Logs API architecture

Lambda launched support for Amazon MQ as an event source. Amazon MQ is a managed broker service for Apache ActiveMQ that simplifies deploying and scaling queues. The event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service manages an internal poller to invoke the target Lambda function.

Lambda announced support for AWS PrivateLink. This allows you to invoke Lambda functions from a VPC without traversing the public internet, providing private connectivity between your VPCs and AWS services. Using VPC endpoints to access the Lambda API from your VPC can remove the need for an Internet Gateway or NAT Gateway.

For developers building machine learning inferencing, media processing, high performance computing (HPC), scientific simulations, and financial modeling in Lambda, you can now use AVX2 support to help reduce duration and lower cost. In this blog post’s example, enabling AVX2 for an image-processing function increased performance by 32-43%.

Lambda now supports batch windows of up to 5 minutes when using SQS as an event source. This is useful for workloads that are not time-sensitive, allowing developers to reduce the number of Lambda invocations from queues. Additionally, the batch size has been increased from 10 to 10,000. This is now the same batch size as Kinesis as an event source, helping Lambda-based applications process more data per invocation.
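
As a sketch of how you might configure this with the AWS SDK for JavaScript v3 (the queue ARN and function name below are placeholders):

import { LambdaClient, CreateEventSourceMappingCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

// Batch for up to 5 minutes or 10,000 messages, whichever comes first.
await lambda.send(new CreateEventSourceMappingCommand({
    EventSourceArn: "arn:aws:sqs:us-east-1:123456789012:my-queue",
    FunctionName: "my-function",
    BatchSize: 10000,
    MaximumBatchingWindowInSeconds: 300,
}));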

Code signing is now available for Lambda, using AWS Signer. This allows account administrators to ensure that Lambda functions only accept signed code for deployment. You can learn more about using this new feature in the developer documentation.

AWS Step Functions

Synchronous Express Workflows have been launched for AWS Step Functions, providing a new way to run high-throughput Express Workflows. This feature allows developers to receive workflow responses without needing to poll services or build custom solutions. This is useful for high-volume microservice orchestration and fast compute tasks communicating via HTTPS.

The Step Functions service recently added support for other AWS services in workflows. You can now integrate API Gateway REST and HTTP APIs. This enables you to call API Gateway directly from a state machine as an asynchronous service integration.

Step Functions now also supports Amazon EKS service integration. This allows you to build workflows with steps that synchronously launch tasks in EKS and wait for a response. The service also announced support for Amazon Athena, so workflows can now query data in your S3 data lakes.

Amazon API Gateway

API Gateway now supports mutual TLS authentication, which is commonly used for business-to-business applications and standards such as Open Banking. This is provided at no additional cost. You can now also disable the default REST API endpoint when deploying APIs using custom domain names.

HTTP APIs now support service integrations with Step Functions Synchronous Express Workflows. This is a result of the service team’s work to add the most popular features of REST APIs to HTTP APIs.

AWS X-Ray

X-Ray now integrates with Amazon S3 to trace upstream requests. If a Lambda function uses the X-Ray SDK, S3 sends tracing headers to downstream event subscribers. This allows you to use the X-Ray service map to view connections between S3 and other services used to process an application request.

X-Ray announced support for end-to-end tracing in Step Functions to make it easier to trace requests across multiple AWS services. It also launched X-Ray Insights in preview, which generates actionable insights based on anomalies detected in an application. For Java developers, the service released an auto-instrumentation agent for collecting instrumentation without modifying existing code.

Additionally, the AWS Distro for Open Telemetry is now in preview. OpenTelemetry is a collaborative effort by tracing solution providers to create common approaches to instrumentation.

Amazon EventBridge

You can now use event replay to archive and replay events with Amazon EventBridge. After configuring an archive, EventBridge automatically stores all events or filtered events, based upon event pattern matching logic. Event replay can help with testing new features or changes in your code, or hydrating development or test environments.

EventBridge archive and replay

EventBridge also launched resource policies that simplify managing access to events across multiple AWS accounts. Resource policies provide a powerful mechanism for modeling event buses across multiple accounts and providing fine-grained access control to EventBridge API actions.

EventBridge resource policies

EventBridge announced support for Server-Side Encryption (SSE). Events are encrypted using AES-256 at no additional cost for customers. EventBridge also increased PutEvents quotas to 10,000 transactions per second in US East (N. Virginia), US West (Oregon), and Europe (Ireland). This helps support workloads with high throughput.

Developer tools

The AWS SDK for JavaScript v3 was launched and includes first-class TypeScript support and a modular architecture. This makes it easier to import only the services needed to minimize deployment package sizes.

The AWS Serverless Application Model (AWS SAM) is an AWS CloudFormation extension that makes it easier to build, manage, and maintain serverless applications. The latest versions include support for cached and parallel builds, together with container image support for Lambda functions.

You can use AWS SAM in the new AWS CloudShell, which provides a browser-based shell in the AWS Management Console. This can help run a subset of AWS SAM CLI commands as an alternative to using a dedicated instance or AWS Cloud9 terminal.

AWS CloudShell

Amazon SNS

Amazon SNS announced support for First-In-First-Out (FIFO) topics. These are used with SQS FIFO queues for applications that require strict message ordering with exactly once processing and message deduplication.

Amazon DynamoDB

Developers can now use PartiQL, an SQL-compatible query language, with DynamoDB tables, bringing familiar SQL syntax to NoSQL data. You can also choose to use Kinesis Data Streams to capture changes to tables.
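
For example, a PartiQL query with the AWS SDK for JavaScript v3 might look like this sketch (the Orders table and CustomerId attribute are hypothetical):

import { DynamoDBClient, ExecuteStatementCommand } from "@aws-sdk/client-dynamodb";

const ddb = new DynamoDBClient({ region: "us-east-1" });

const { Items } = await ddb.send(new ExecuteStatementCommand({
    Statement: "SELECT * FROM Orders WHERE CustomerId = ?",
    Parameters: [{ S: "customer-123" }],
}));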

For customers using DynamoDB global tables, you can now use your own encryption keys. While all data in DynamoDB is encrypted by default, this feature enables you to use customer managed keys (CMKs). DynamoDB also announced the ability to export table data to data lakes in Amazon S3. This enables you to use services like Amazon Athena and AWS Lake Formation to analyze DynamoDB data with no custom code required.

AWS Amplify and AWS AppSync

You can now use existing Amazon Cognito user pools and identity pools for Amplify projects, making it easier to build new applications for an existing user base. With the new AWS Amplify Admin UI, you can configure application backends without using the AWS Management Console.

AWS AppSync enabled AWS WAF integration, making it easier to protect GraphQL APIs against common web exploits. You can also implement rate-based rules to help slow down brute force attacks. Using AWS Managed Rules for AWS WAF provides a faster way to configure application protection without creating the rules directly.

Serverless Posts

October

November

December

Tech Talks & Events

We hold AWS Online Tech Talks covering serverless topics throughout the year. These are listed in the Serverless section of the AWS Online Tech Talks page. We also regularly deliver talks at conferences and events around the world, speak on podcasts, and record videos that help you learn in bite-sized chunks.

Here are some from Q4:

Videos

October:

November:

December:

There are also other helpful videos covering Serverless available on the Serverless Land YouTube channel.

The Serverless Land website

Serverless Land website

To help developers find serverless learning resources, we have curated a list of serverless blogs, videos, events, and training programs at a new site, Serverless Land. This is regularly updated with new information – you can subscribe to the RSS feed for automatic updates or follow the LinkedIn page.

Still looking for more?

The Serverless landing page has lots of information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow all of us on Twitter to see the latest news, follow conversations, and interact with the team.

Using One Cron Parser Everywhere With Rust and Saffron

Post Syndicated from Aaron Loyd original https://blog.cloudflare.com/using-one-cron-parser-everywhere-with-rust-and-saffron/

As part of the development for Cron Triggers on Cloudflare Workers, we had an interesting problem to tackle relating to parsers and the cron expression format. Cron expressions are the format used to write schedules in Cron Triggers, and extensions for cron expressions are everywhere. They vary between parsers and platforms and aren’t standardized by a governing body, which means parsers support many different feature sets. That isn’t good if you’d like something off the shelf that just works.

It can be tough to find the right parser for each part of the Cron Triggers stack, when its user interface, API, and edge service are all written in different languages. On top of that, it isn’t practical to reinvent the wheel multiple times by writing the same parser in different languages and make sure they all match perfectly. So you’re likely stuck with a less-than-perfect solution.

However, in the end, because we wrote our backend service in Rust, it took much less effort to solve this problem. Rust has a great ecosystem for working across multiple languages, which allows us to write a parser once and pull it from the backend to the frontend and everywhere in between with minimal glue code.

The Trouble with Cron

Cron expressions are a set of fields that represent a set of times. They act as a pattern that matches over the minute, hour, day of the month, month, and day of the week of a given time. Since cron is a simple format, it’s easy to extend with extra fields, so some parsers and platforms allow specifying seconds and years as well. However, seconds are a bit too granular and years are a bit too long, so we opted to not support them as part of Cron Triggers.

In the original cron program, the expressions supported were simple: each field could contain:

  • A star (‘*’) representing all values,
  • A value (a number for all fields, or a 3-letter abbreviation for months or days of the week, like JUN or FRI),
  • A range of values (e.g. ‘0-30’), or
  • A set of ranges and/or values (e.g. ‘0-15,30,45-50,55’)

This is a good start for specifying most time patterns, but many extensions exist out there to fill in some gaps. For example,

  • ‘L’ can be used for the day of the month position to specify the last day of the month, or in the day of the week position with a day value to specify the last of that weekday during the month (i.e. 7L, the last Saturday of the month).
  • ‘W’ can be used for the day of the month, and lets you specify “the closest weekday (MON-FRI) to a given day”, like 15W, or the closest weekday to the 15th of the month.
  • ‘/’ can be used for step values in any field. For example, */5 in the minute field is every 5th minute in the hour. This can be combined with a range to specify things such as ‘30-59/5’, or every 5th minute from minute 30 to minute 59 in the hour.
  • ‘#’ can be used with a day of the week value to specify the “nth day of the month”, such as ‘5#3’, or the 3rd Thursday of the month.
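
To make the field syntax concrete, here is a small illustrative TypeScript sketch (not saffron’s implementation) that expands a basic field into the set of values it matches:

// Expand one cron field (values, ranges, steps) into the set of matching numbers.
// Handles only the basic syntax; extensions like 'L', 'W', and '#' need real parsing.
function expandField(field: string, min: number, max: number): Set<number> {
    const values = new Set<number>();
    for (const part of field.split(",")) {
        const [rangePart, stepPart] = part.split("/");
        const step = stepPart ? parseInt(stepPart, 10) : 1;
        let lo = min, hi = max;
        if (rangePart !== "*") {
            const [a, b] = rangePart.split("-");
            lo = parseInt(a, 10);
            hi = b !== undefined ? parseInt(b, 10) : lo;
        }
        for (let v = lo; v <= hi; v += step) values.add(v);
    }
    return values;
}

// expandField("30-59/5", 0, 59) -> {30, 35, 40, 45, 50, 55}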

So far I’ve only listed extensions we currently support on Workers, but others exist such as ‘H’ in Jenkins and ‘?’ in some cron implementations for start-up time. Most libraries don’t support said extensions, however ‘?’ is used in some implementations in certain circumstances, but not as start-up time. With all these extensions and a lack of standardization, some libraries aren’t guaranteed to support them all.

The Multitude of Libraries

During the development of Cron Triggers, we needed some things to just work, and to do that, we opted to pull some libraries off the shelf from package repositories for different parts of the stack.

In the Rust backend, we needed a cron library that supported all the extensions we wanted, while also leaving off other field extensions like seconds and years, and had an API that let us simply check if a given time matched the expression pattern. None of the crates on crates.io offered these, so we had to write it ourselves. Using the nom crate, it was easy to draft a simple, fast, safe parser, named ‘saffron’. As time went on and we got closer to release, it became clearer which extensions we really wanted to support. It was incredibly easy to add support for the new features without worrying about safety since the compiler checked it for us, so all we had to do was extensive logic testing. Last offset weekdays (“L-XW”) and leap years were difficult to get right the first time, but testing them was easy with Rust.

#[test]
fn parse_check_offset_weekend_start_months() {
    let cron = "0 0 L-30W * *";

    check_does_contain(
        cron,
        &["2021-05-3T00:00:00+00:00", "2022-01-3T00:00:00+00:00"],
    );
}

#[test]
fn parse_check_offset_leap_days() {
    let cron = "0 0 L-1 FEB *";

    check_does_contain(
        cron,
        &[
            "2400-02-28T00:00:00+00:00",
            "2300-02-27T00:00:00+00:00",
            "2200-02-27T00:00:00+00:00",
            "2100-02-27T00:00:00+00:00",
            "2024-02-28T00:00:00+00:00",
            "2020-02-28T00:00:00+00:00",
            "2004-02-28T00:00:00+00:00",
            "2000-02-28T00:00:00+00:00",
        ],
    );

    check_does_not_contain(
        cron,
        &[
            "2400-02-29T00:00:00+00:00",
            "2300-02-28T00:00:00+00:00",
            "2200-02-28T00:00:00+00:00",
            "2100-02-28T00:00:00+00:00",
            "2024-02-29T00:00:00+00:00",
            "2020-02-29T00:00:00+00:00",
            "2004-02-29T00:00:00+00:00",
            "2000-02-29T00:00:00+00:00",
        ],
    );
}

However, the UI had a different set of requirements. It didn’t need to know whether a given time matched a cron pattern; we wanted to provide information to the user about the cron expression they’d written. So it needed to provide a more human-readable translation (description) of their cron expression and show them their next five executions (future times). But we were on a limited time budget: we needed something off the shelf.

We used two different JavaScript libraries for displaying info about given cron expressions: one gave us descriptions, the other gave us future times. Since these two libraries were tasked with parsing cron expressions, they also acted as validation; however, just using these two libraries for validation proved to be less than optimal. Both of the libraries supported extensions that were different both from each other and from the backend. Because of that they’d sometimes allow users to add schedules that would be rejected by the API on submit, which doesn’t translate into a good user experience. This validation should happen while the user writes their cron expression, not after they already hit submit! Because of this fracture in extension support, the UI parsers also sometimes didn’t parse expressions that should be supported and were accepted by the API!

Before release on the API side, we simply used a Go library for validation. This proved to be an easy solution, but we quickly noticed that the API accepted more than the schedule runner supported. This caused some triggers to be added to the schedule successfully but ignored by the runner because they failed to parse.

So before launch, we were using four completely different parsers! This probably wouldn’t be much of an issue if cron expressions were standardized. But because they aren’t, inconsistencies could exist at every step in the trigger creation process: between the two libraries we used on the frontend, between the frontend and API, and between the API and the backend.

To solve these issues in the UI and API before release, we synced the API and backend with another schedule runner entrypoint that simply read a cron expression from stdin, parsed it, and returned whether it was valid, to make sure they perfectly matched. We also added a validation endpoint to the API that the UI could use to check a cron expression, to make sure the backend actually accepted it. This fixed all cases of the API and UI being too accepting of unsupported expressions, but neither of these solutions was optimal.

For one, they weren’t performant. Each time we wanted to validate a cron expression in the UI, we’d have to parse the expression twice in JavaScript (once for a description, and again for future times) and make an request to the API, which would start an instance of the schedule runner, parse the expression, and return whether it properly parsed.

Another reason this was suboptimal is that one library still limited the features we could support. One of our UI libraries didn’t support the ‘L’ and ‘W’ extensions, and since we programmed the UI to accept an expression only if all parsers accepted it, expressions that used those extensions couldn’t be added.

So even though we dropped it to three parsers before release, it still didn’t seem good enough. Soon after release, I made plans to remedy it and started working on saffron (originally this project was called cfron but Cloudflare’s CTO couldn’t resist suggesting renaming it to saffron because he loves puns) to fill in for the one library holding us back in the UI. It would’ve been OK if missing extension support was the only thing wrong after release, but soon some other issues came up.

Off By One

Saffron is based on the Quartz open source scheduler’s cron parser, which makes days of the week, when specified as integers, start from 1 (Sunday) and go to 7 (Saturday). Both parsers on the frontend follow the original values for cron, where days start from 0 and go to 6, and 7 can be used for Sunday as well. So when users entered 1-5, the UI told them they were entering a schedule from Monday to Friday, and the backend ended up executing Sunday to Thursday! This was missed when testing Cron Triggers initially and was caught by observant community members on the forum.

Fixing the issue turned out to be a bit difficult. While the library we were using for descriptions had the option to simply switch days of the week from 0-6 to 1-7, our future times library did not. Luckily, development on its replacement in saffron was already halfway done. However, we couldn’t place saffron directly on the frontend yet, since web bindings didn’t exist and I didn’t have time to write them. We needed something easier to develop quickly.

Reintroducing: Cloudflare Workers!

Workers made it incredibly easy to take the existing code, add some wasm entry points for a makeshift API, and call with JavaScript. No need to build a whole separate API in Go! Just take your existing code and put it directly within 100ms of nearly everyone on the Internet. Why call all the way back home when the nearest PoP works just as well?

Plus, we don’t have to worry about building and publishing, wrangler does it for us! For example, our validation code is all written in Rust:

#[wasm_bindgen]
#[derive(Clone, Debug)]
pub struct ValidationResult {
    errors: Option<Vec<String>>,
}

#[wasm_bindgen]
pub fn validate(crons: JsArray) -> ValidationResult {
    set_panic_hook();

    let len = crons.length();
    let mut map = HashMap::with_capacity(len as usize);
    for i in 0..len {
        let string = match crons.get(i).as_string() {
            Some(string) => string,
            None => {
                return ValidationResult {
                    errors: Some(vec![format!("Element '{}' is not a string", i)]),
                }
            }
        };

        let cron: Cron = match string.parse() {
            Ok(cron) => cron,
            Err(err) => {
                return ValidationResult {
                    errors: Some(vec![format!(
                        "Failed to parse expression at index '{}': {}",
                        i, err
                    )]),
                }
            }
        };

        if let Some(old_str) = map.insert(cron, string.clone()) {
            return ValidationResult {
                errors: Some(vec![format!(
                    "Expression '{}' already exists in the form of '{}'",
                    string, old_str
                )]),
            };
        }
    }

    ValidationResult { errors: None }
}

and our code to handle processing the request and response is written in JavaScript:

const path = new URL(request.url).pathname;
switch (path) {
  case "/validate": {
    let body;
    try {
      body = await request.json();
    } catch (e) {
      return status(400, "Bad Request");
    }
    let crons = body.crons;
    if (!Array.isArray(crons)) {
      return status(400, "Bad Request");
    }

    let result = validate(crons).errors();
    let success = result == null;
    return apiResponse({}, success, result);
  }

After a week of dedicated development, a Worker was written, the future times were calculated, and the UI was fixed! On top of that, we also implicitly introduced support for more extensions by removing the old parser and replacing it with the same one used on the backend as part of the fix itself. But we’re still using two parsers, so inconsistencies we haven’t seen yet may still exist out there.

For example, this expression “0 0 L-1W 2 *”, or “12:00 AM on the closest weekday to the 2nd to last day of the month in February” cannot be parsed by the parser we use for descriptions, but it’s accepted by the API, backend, and Worker, so you can use it in your cron triggers, but the UI won’t give you a description for it.

The Quest for the One True Parser

This brings us to today. In search of better and faster, we want to bring the number of parsers down from two to one: one source of truth for the entire stack. To make it all faster, we should parse on the frontend locally instead of calling a remote Worker (if possible). In the API, the separate entry point was a nice, easy solution, but starting the schedule runner just to check whether a cron string is valid every time a user adds one doesn’t seem like the best it could be.

Luckily Rust has a vibrant ecosystem that can meet all these needs! To bring the parser to the UI, we can compile saffron to wasm and use generated bindings created with wasm-pack. This can be easily integrated with our existing webpack setup, making it simple to get future times and create descriptions of cron strings on the frontend. Then, to bring the parser closer to the API, we can use Rust’s ability to create C APIs that we can then integrate with Go using cgo.
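
On the frontend, using the generated bindings could then look something like this sketch (the import path is hypothetical, and the API shape is assumed from the Worker code above):

// wasm-pack emits a JS/TS wrapper package for the crate.
import { validate } from "../pkg/saffron";

const errors = validate(["*/5 * * * *", "0 0 L-1 FEB *"]).errors();
if (errors != null) {
    console.error("Invalid cron expressions:", errors);
}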

With our parser everywhere, we can then focus exclusively on cron descriptions to replace the one other parser we’re using in the UI. At that point we will have one parser for the whole stack, a single source of truth that anyone can reference to understand how the frontend, API, and backend all work together. It also simplifies our graph. Now instead of multiple libraries written in different languages, we have one library with multiple language wrappers, each serving a different part of the stack. No inconsistencies will exist since they’re all using the same parser!

However, we wanted to do something before that…

We made it open source!

I think this project serves as a great example of Rust’s type system, its safety, and its extensibility across the entire stack. The project itself is simple, easy to understand, and easy to port and provide bindings for. By open sourcing it, we can publish packages for these bindings on npm and crates.io, allowing anyone to use them for whatever they want. It also means you can follow along with development to see the finishing touches get added and maybe make some suggestions for future improvements in the UI and the parser itself.

You can view the project on GitHub at https://github.com/cloudflare/saffron.

Optimizing AWS Lambda cost and performance using AWS Compute Optimizer

Post Syndicated from Chad Schmutzer original https://aws.amazon.com/blogs/compute/optimizing-aws-lambda-cost-and-performance-using-aws-compute-optimizer/

This post is authored by Brooke Chen, Senior Product Manager for AWS Compute Optimizer, Letian Feng, Principal Product Manager for AWS Compute Optimizer, and Chad Schmutzer, Principal Developer Advocate for Amazon EC2

Optimizing compute resources is a critical component of any application architecture. Over-provisioning compute can lead to unnecessary infrastructure costs, while under-provisioning compute can lead to poor application performance.

Launched in December 2019, AWS Compute Optimizer is a recommendation service for optimizing the cost and performance of AWS compute resources. It generates actionable optimization recommendations tailored to your specific workloads. Over the last year, thousands of AWS customers reduced compute costs by up to 25% by using Compute Optimizer to help choose the optimal Amazon EC2 instance types for their workloads.

One of the most frequent requests from customers is for AWS Lambda recommendations in Compute Optimizer. Today, we announce that Compute Optimizer now supports memory size recommendations for Lambda functions. This allows you to reduce costs and increase performance for your Lambda-based serverless workloads. To get started, opt in to Compute Optimizer to start finding recommendations.

Overview

With Lambda, there are no servers to manage, it scales automatically, and you only pay for what you use. However, choosing the right memory size setting for a Lambda function is still an important task. Compute Optimizer uses machine learning-based memory recommendations to help with this task.

These recommendations are available through the Compute Optimizer console, AWS CLI, AWS SDK, and the Lambda console. Compute Optimizer continuously monitors Lambda functions, using historical performance metrics to improve recommendations over time. In this blog post, we walk through an example to show how to use this feature.

Using Compute Optimizer for Lambda

This tutorial uses the AWS CLI v2 and the AWS Management Console.

In this tutorial, we set up two compute jobs that run every minute in the US East (N. Virginia) Region. One job is more CPU-intensive than the other. Initial tests show that invocations of both jobs typically last less than 60 seconds. The goal is to either reduce cost without much increase in duration, or reduce duration in a cost-efficient manner.

Based on these requirements, a serverless solution can help with this task. Amazon EventBridge can schedule the Lambda functions using rules. To ensure that the functions are optimized for cost and performance, you can use the memory recommendation support in Compute Optimizer.

In your AWS account, opt in to Compute Optimizer to start analyzing AWS resources. Ensure you have the appropriate IAM permissions configured – follow these steps for guidance. If you prefer to use the console to opt in, follow these steps. To opt in, enter the following command in a terminal window:

$ aws compute-optimizer update-enrollment-status --status Active

Once you enable Compute Optimizer, it starts to scan for functions that have been invoked at least 50 times over the trailing 14 days. The next section shows two example scheduled Lambda functions for analysis.

Example Lambda functions

The code for the non-CPU intensive job is below. A Lambda function named lambda-recommendation-test-sleep is created with memory size configured as 1024 MB. An EventBridge rule is created to trigger the function on a recurring 1-minute schedule:

import json
import time

def lambda_handler(event, context):
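  # Simulate a non-CPU-intensive job: sleep, then allocate a large list to use memory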
  time.sleep(30)
  x=[0]*100000000
  return {
    'statusCode': 200,
    'body': json.dumps('Hello World!')
  }

The code for the CPU intensive job is below. A Lambda function named lambda-recommendation-test-busy is created with memory size configured as 128 MB. An EventBridge rule is created to trigger the function on a recurring 1-minute schedule:

import json
import random

def lambda_handler(event, context):
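  # Simulate a CPU-intensive job: sum 20 million pseudo-random numbers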
  random.seed(1)
  x=0
  for i in range(0, 20000000):
    x+=random.random()

  return {
    'statusCode': 200,
    'body': json.dumps('Sum:' + str(x))
  }

Understanding the Compute Optimizer recommendations

Compute Optimizer needs a history of at least 50 invocations of a Lambda function over the trailing 14 days to deliver recommendations. Recommendations are created by analyzing function metadata such as memory size, timeout, and runtime, in addition to CloudWatch metrics such as number of invocations, duration, error count, and success rate.

Compute Optimizer will gather the necessary information to provide memory recommendations for Lambda functions, and make them available within 48 hours. Afterwards, these recommendations will be refreshed daily.

These are recent invocations for the non-CPU intensive function:

Recent invocations for the non-CPU intensive function

Function duration is approximately 31.3 seconds with a memory setting of 1024 MB, resulting in a duration cost of about $0.00052 per invocation. Here are the recommendations for this function in the Compute Optimizer console:

Recommendations for this function in the Compute Optimizer console

The function is Not optimized with a reason of Memory over-provisioned. You can also fetch the same recommendation information via the CLI:

$ aws compute-optimizer \
  get-lambda-function-recommendations \
  --function-arns arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-sleep
{
    "lambdaFunctionRecommendations": [
        {
            "utilizationMetrics": [
                {
                    "name": "Duration",
                    "value": 31333.63587049883,
                    "statistic": "Average"
                },
                {
                    "name": "Duration",
                    "value": 32522.04,
                    "statistic": "Maximum"
                },
                {
                    "name": "Memory",
                    "value": 817.67049838188,
                    "statistic": "Average"
                },
                {
                    "name": "Memory",
                    "value": 819.0,
                    "statistic": "Maximum"
                }
            ],
            "currentMemorySize": 1024,
            "lastRefreshTimestamp": 1608735952.385,
            "numberOfInvocations": 3090,
            "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-sleep:$LATEST",
            "memorySizeRecommendationOptions": [
                {
                    "projectedUtilizationMetrics": [
                        {
                            "name": "Duration",
                            "value": 30015.113193697029,
                            "statistic": "LowerBound"
                        },
                        {
                            "name": "Duration",
                            "value": 31515.86878891883,
                            "statistic": "Expected"
                        },
                        {
                            "name": "Duration",
                            "value": 33091.662123300975,
                            "statistic": "UpperBound"
                        }
                    ],
                    "memorySize": 900,
                    "rank": 1
                }
            ],
            "functionVersion": "$LATEST",
            "finding": "NotOptimized",
            "findingReasonCodes": [
                "MemoryOverprovisioned"
            ],
            "lookbackPeriodInDays": 14.0,
            "accountId": "123456789012"
        }
    ]
}

The Compute Optimizer recommendation contains useful information about the function. Most importantly, it has determined that the function is over-provisioned for memory. The attribute findingReasonCodes shows the value MemoryOverprovisioned. In memorySizeRecommendationOptions, Compute Optimizer has found that using a memory size of 900 MB results in an expected invocation duration of approximately 31.5 seconds.

For non-CPU intensive jobs, reducing the memory setting of the function often doesn’t have a negative impact on function duration. The recommendation confirms that you can reduce the memory size from 1024 MB to 900 MB, saving cost without significantly impacting duration. The new duration cost per invocation is approximately 12% lower.

The Compute Optimizer console validates these calculations:

Compute Optimizer console validates these calculations

These are recent invocations for the second function which is CPU-intensive:

Recent invocations for the second function which is CPU-intensive

The function duration is about 37.5 seconds with a memory setting of 128 MB, resulting in a duration cost of about $0.000078 per invocation. The recommendations for this function appear in the Compute Optimizer console:

recommendations for this function appear in the Compute Optimizer console

The function is also Not optimized with a reason of Memory under-provisioned. The same recommendation information is available via the CLI:

$ aws compute-optimizer \
  get-lambda-function-recommendations \
  --function-arns arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-busy
{
    "lambdaFunctionRecommendations": [
        {
            "utilizationMetrics": [
                {
                    "name": "Duration",
                    "value": 36006.85851551957,
                    "statistic": "Average"
                },
                {
                    "name": "Duration",
                    "value": 38540.43,
                    "statistic": "Maximum"
                },
                {
                    "name": "Memory",
                    "value": 53.75978407557355,
                    "statistic": "Average"
                },
                {
                    "name": "Memory",
                    "value": 55.0,
                    "statistic": "Maximum"
                }
            ],
            "currentMemorySize": 128,
            "lastRefreshTimestamp": 1608725151.752,
            "numberOfInvocations": 741,
            "functionArn": "arn:aws:lambda:us-east-1:123456789012:function:lambda-recommendation-test-busy:$LATEST",
            "memorySizeRecommendationOptions": [
                {
                    "projectedUtilizationMetrics": [
                        {
                            "name": "Duration",
                            "value": 27340.37604781184,
                            "statistic": "LowerBound"
                        },
                        {
                            "name": "Duration",
                            "value": 28707.394850202432,
                            "statistic": "Expected"
                        },
                        {
                            "name": "Duration",
                            "value": 30142.764592712556,
                            "statistic": "UpperBound"
                        }
                    ],
                    "memorySize": 160,
                    "rank": 1
                }
            ],
            "functionVersion": "$LATEST",
            "finding": "NotOptimized",
            "findingReasonCodes": [
                "MemoryUnderprovisioned"
            ],
            "lookbackPeriodInDays": 14.0,
            "accountId": "123456789012"
        }
    ]
}

For this function, Compute Optimizer has determined that the function’s memory is under-provisioned. The value of findingReasonCodes is MemoryUnderprovisioned. The recommendation is to increase the memory from 128 MB to 160 MB.

This recommendation may seem counter-intuitive, since the function only uses 55 MB of memory per invocation. However, Lambda allocates CPU and other resources linearly in proportion to the amount of memory configured. This means that increasing the memory allocation to 160 MB also reduces the expected duration to around 28.7 seconds. This is because a CPU-intensive task also benefits from the increased CPU performance that comes with the additional memory.

After applying this recommendation, the new expected duration cost per invocation is approximately $0.000075. This means that for almost no change in duration cost, the job latency is reduced from 37.5 seconds to 28.7 seconds.
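
The arithmetic behind these numbers is a simple GB-seconds calculation. A sketch, assuming the us-east-1 duration price of $0.0000166667 per GB-second at the time of writing:

const PRICE_PER_GB_SECOND = 0.0000166667; // us-east-1, at the time of writing

function durationCost(memoryMb: number, durationSeconds: number): number {
    return (memoryMb / 1024) * durationSeconds * PRICE_PER_GB_SECOND;
}

console.log(durationCost(128, 37.5).toFixed(6)); // ~0.000078 (current configuration)
console.log(durationCost(160, 28.7).toFixed(6)); // ~0.000075 (recommended configuration)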

The Compute Optimizer console validates these calculations:

Compute Optimizer console validates these calculations

Applying the Compute Optimizer recommendations

To optimize the Lambda functions using Compute Optimizer recommendations, use the following CLI command:

$ aws lambda update-function-configuration \
  --function-name lambda-recommendation-test-sleep \
  --memory-size 900

After invoking the function multiple times, we can see metrics of these invocations in the console. This shows that the function duration has not changed significantly after reducing the memory size from 1024 MB to 900 MB. The Lambda function has been successfully cost-optimized without increasing job duration:

Console shows the metrics from recent invocations

To apply the recommendation to the CPU-intensive function, use the following CLI command:

$ aws lambda update-function-configuration \
  --function-name lambda-recommendation-test-busy \
  --memory-size 160

After invoking the function multiple times, the console shows that the invocation duration is reduced to about 28 seconds. This matches the recommendation’s expected duration. This shows that the function is now performance-optimized without a significant cost increase:

Console shows that the invocation duration is reduced to about 28 seconds

Final notes

A couple of final notes:

  • Not every function will receive a recommendation. Compute Optimizer only delivers recommendations when it has high confidence that they may help reduce cost or execution duration.
  • As with any changes you make to an environment, we strongly advise that you test recommended memory size configurations before applying them into production.

Conclusion

You can now use Compute Optimizer for serverless workloads using Lambda functions. This can help identify the optimal Lambda function configuration options for your workloads. Compute Optimizer supports memory size recommendations for Lambda functions in all AWS Regions where Compute Optimizer is available. These recommendations are available to you at no additional cost. You can get started with Compute Optimizer from the console.

To learn more, visit Getting started with AWS Compute Optimizer.

 

Ingesting MongoDB Atlas data using Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/ingesting-mongodb-atlas-data-using-amazon-eventbridge/

This post is courtesy of Gopalakrishnan Ramaswamy, Solutions Architect

Amazon EventBridge is a serverless event bus that makes it easier to connect applications together using data from your own applications, integrated software as a service (SaaS) applications, and AWS services. It does so by delivering a stream of real-time data from various event sources. You can set up routing rules to send data to targets like AWS Lambda and build loosely coupled application architectures that react in near-real time to data sources.

MongoDB is a document database, which means it stores data in JSON-like documents. It provides a query language and has support for multi-document ACID transactions. MongoDB Atlas is a fully managed MongoDB database service hosted on the cloud. It can be used as a globally distributed database that automates administrative tasks such as database configuration, infrastructure provisioning, patching, scaling, and backups.

With EventBridge, you can use data from MongoDB to trigger workflows for customer support, business operations, and more. In this post, I walk through the process of connecting MongoDB Atlas with the AWS Cloud and triggering events from changes in the MongoDB collections data.

Overview

The following diagram shows the high-level architecture of an example scenario to ingest MongoDB data into the AWS Cloud using Amazon EventBridge.

Solution architecture

MongoDB stores data records as BSON documents, which are gathered together in collections. A database stores one or more collections of documents.

This walkthrough shows you how to:

  1. Create a MongoDB cluster and load sample data.
  2. Create a database trigger associated to a collection.
  3. Create an event bus in AWS, linked to the partner event source.
  4. Create a Lambda function and the associated role with permissions.
  5. Create an EventBridge rule and associate it to the Lambda function.
  6. Verify the process.

Steps 3–5 create and configure the AWS resources using the AWS Serverless Application Model (AWS SAM). To set up the sample application, visit the GitHub repo and follow the instructions in the README.md file.

Prerequisites

This walkthrough requires:

  • An AWS account.
  • A MongoDB account.
  • The AWS SAM CLI installed and configured on your machine.

Creating a MongoDB Atlas cluster and loading sample data

For detailed steps to create a cluster and load data, see MongoDB Atlas documentation. To create the test cluster:

  1. Create a MongoDB Atlas account.
  2. Deploy a free tier cluster using these instructions, selecting your preferred cloud provider and Region.
  3. Add your trusted connection IP address to the IP access list. This allows you to connect to the cluster and access the data.
  4. After connecting to your cluster, load sample data into your cluster:
    • Navigate to the clusters view by choosing Clusters in the left navigation pane.
    • Select the cluster, choose the ellipses (…) button, and Load Sample Dataset.

MongoDB clusters UI

Create MongoDB database trigger

MongoDB database triggers allow you to run server-side logic when a document is added, updated, or removed in a linked cluster. Use database triggers to implement complex data interactions, including updating information in one document when a related document changes or interacting with an external service when a new document is inserted.

  1. Sign in to your account and choose Triggers in the left-hand panel.
  2. Choose Add Trigger to open the trigger configuration page.
  3. Select Database for Trigger Type.
Add trigger
  4. Enter a name for the trigger.
  5. In the Trigger Source Details section:
    • Select the cluster with sample data loaded (for example, Cluster0) for Cluster Name.
    • For Database Name select sample_analytics.
    • Select customers for Collection Name.
    • Check Insert, Update, Delete, and Replace for Operation Type.
Trigger source details
  6. In the Function section:
    • For Select An Event Type, select EventBridge.
    • Enter your AWS Account ID. Learn how to find your account ID in this documentation.
    • Select an AWS Region where the event bus will be created.
EventBridge configuration
  7. Choose Save.

Once a MongoDB Atlas trigger is created, it creates a corresponding partner event source in the Amazon EventBridge console. Initially, these event sources show as Pending with no event bus associated with them.

Partner event source

Next, use the AWS SAM template in the GitHub repo to create the event bus, Lambda function, and event rule.

  1. Clone the GitHub repo and deploy the AWS SAM template:
    git clone https://github.com/aws-samples/amazon-eventbridge-partnerevent-example
    cd ./amazon-eventbridge-partnerevent-example
    sam deploy --guided
  2. Choose a stack name and enter the partner event source name.

The next section explains the steps that are performed by the AWS SAM template.

Creating the event bus

To receive events from SaaS partners, an event bus must be created that is associated with the partner event source:

  PartnerEventBus: 
    Type: AWS::Events::EventBus
    Properties: 
      EventSourceName: !Ref PartnerEventSource
      Name: !Ref PartnerEventSource

The partner event source name and the name of the event bus are derived from the parameter entered when running the template.

Once you create an event bus associated with a partner event source, the status of the partner event source changes to Active. A new event bus with the same name as the partner event source is created. You can see this in the EventBridge console, in Event buses in the left-hand panel.

Partner event sources

Creating the Lambda function

The following section of the template creates a Lambda function that is invoked by an event rule:

  myeventfunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: eventLambda/
      Handler: index.handler
      Runtime: nodejs12.x
      FunctionName: myeventfunction
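The function code itself lives in the eventLambda/ directory of the repo. As a minimal sketch of what such a handler can look like (the sample’s actual code may differ), it only needs to log the incoming event so the payload is visible in CloudWatch Logs:

// Hypothetical eventLambda/index.js: log the incoming EventBridge event
exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2))
  return event
}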

Creating an event bus rule

The following section in the template creates an event rule that triggers the preceding Lambda function. The event pattern used by the rule selects and routes events to targets.

  myeventrule:
    Type: 'AWS::Events::Rule'
    Properties:
      Description: Test Events Rule
      EventBusName: !Ref PartnerEventSource
      EventPattern: 
        account: [!Ref AWS::AccountId]
      Name: myeventrule
      State: ENABLED
      Targets:
       - 
         Arn: 
           Fn::GetAtt:
             - "myeventfunction"
             - "Arn"
         Id: "idmyeventrule"
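This pattern matches every event in the account on that bus. If you want the rule to match only events from the MongoDB partner event source, you can narrow the pattern with a prefix filter. The following is a sketch that assumes the partner source name begins with aws.partner/mongodb.com:

  EventPattern:
    account:
      - !Ref 'AWS::AccountId'
    source:
      - prefix: 'aws.partner/mongodb.com'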

Permission must also be granted so that EventBridge can invoke the Lambda function. This allows the rule to trigger the associated Lambda function:

  PermissionForEventsToInvokeLambda: 
    Type: AWS::Lambda::Permission
    Properties: 
      FunctionName: 
        Ref: "myeventfunction"
      Action: "lambda:InvokeFunction"
      Principal: "events.amazonaws.com"
      SourceArn: 
        Fn::GetAtt: 
          - "myeventrule"
          - "Arn"         

Verifying the integration

After deploying the AWS SAM template, verify that the EventBridge integration works by inserting test data into the source MongoDB collection. After adding this data, the event is sent to the event bus, which invokes the Lambda function. The event payload appears in the function’s CloudWatch logs.

To verify the deployment:

  1. Download and install the MongoDB shell.
  2. Connect to the cluster using the MongoDB shell:
    mongo "mongodb+srv://cluster0.xvo4o.mongodb.net/sample_analytics" --username yourusername

    Replace the cluster name with the cluster you created. Connect to the sample_analytics database, which has the sample data and collections.

  3. Next, insert a record into the customers collection associated with the database trigger. In the MongoDB shell, run the following command:
    db.customers.insertOne(
      {
        username: "myuser99",
        name: "Eventbridge Mongo",
        address: "My Address XYZ",
        birthdate: { "$date": "1975-03-02T02:20:31.000Z" },
        email: "[email protected]",
        active: true,
        accounts: [371138, 324287, 276528, 332179, 422649, 387979],
        tier_and_details: {
          "0df078f33aa74a2e9696e0520c1a828a": {
            tier: "Bronze",
            id: "0df078f33aa74a2e9696e0520c1a828a",
            active: true,
            benefits: ["sports tickets"]
          },
          "699456451cc24f028d2aa99d7534c219": {
            tier: "Bronze",
            benefits: ["24 hour dedicated line", "concierge services"],
            active: true,
            id: "699456451cc24f028d2aa99d7534c219"
          }
        }
      }
    )
    
  4. Once the record is successfully inserted:
    • Navigate to CloudWatch in the AWS console and choose Log groups in the left-hand panel.
    • Search for the log group /aws/lambda/myeventfunction and choose the latest log stream.
    • Expand the log items to reveal the event. This contains the payload that was sent from MongoDB Atlas to EventBridge.

Conclusion

This post demonstrates how to connect MongoDB Atlas data with the AWS Cloud using Amazon EventBridge. EventBridge helps you connect data from a range of SaaS applications using minimal code. It can help reduce operational overhead and build powerful event-driven architectures more easily. For more information about integrating data between SaaS applications, see Amazon EventBridge.

For more serverless learning resources, visit Serverless Land.

Automating mutual TLS setup for Amazon API Gateway

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-mutual-tls-setup-for-amazon-api-gateway/

This post is courtesy of Pankaj Agrawal, Solutions Architect.

In September 2020, Amazon API Gateway announced support for mutual Transport Layer Security (TLS) authentication. This is a new method for client-to-server authentication that can be used with API Gateway’s existing authorization options. Mutual TLS (mTLS) is an extension of Transport Layer Security (TLS), requiring both the server and client to verify each other.

Mutual TLS is commonly used for business-to-business (B2B) applications. It’s used in standards such as Open Banking, which enables secure open API integrations for financial institutions. It’s also common for Internet of Things (IoT) applications to authenticate devices using digital certificates.

This post covers automating the mTLS setup for API Gateway HTTP APIs, but the same steps can also be used for REST APIs. Download the code used in this walkthrough from the project’s GitHub repo.

Overview

To enable mutual TLS, you must create an API with a valid custom domain name. Mutual TLS is available for both regional REST APIs and the newer HTTP APIs. To set up mutual TLS with API Gateway, you must upload a certificate authority (CA) public key certificate to Amazon S3. This is called a truststore and is used for validating client certificates.

Reference architecture

The AWS Certificate Manager Private Certificate Authority (ACM Private CA) is a highly available private CA service. I am using the ACM Private CA as a certificate authority to configure HTTP APIs and to distribute certificates to clients.

Deploying the solution

To deploy the application, the solution uses the AWS Serverless Application Model (AWS SAM). AWS SAM provides shorthand syntax to define functions, APIs, databases, and event source mappings. As a prerequisite, you must have AWS SAM CLI and Java 8 installed. You must also have the AWS CLI configured.

To deploy the solution:

  1. Clone the GitHub repository and build the application with the AWS SAM CLI. Run the following commands in a terminal:
    git clone https://github.com/aws-samples/api-gateway-auth.git
    cd api-gateway-auth
    sam build

    Console output

  2. Deploy the application:
    sam deploy --guided

Provide a stack name and preferred AWS Region for the deployment process. The template requires three parameters:

  1. HostedZoneId: The template uses an Amazon Route 53 public hosted zone to configure the custom domain. Provide the hosted zone ID where the record set must be created.
  2. DomainName: The custom domain name for the API Gateway HTTP API.
  3. TruststoreKey: The name of the truststore file in the S3 bucket, which API Gateway uses for mTLS. By default, it’s truststore.pem.

SAM deployment configuration

After deployment, the stack outputs the ARN of a test client certificate (ClientOneCertArn). This is used to validate the setup later. The API Gateway HTTP API endpoint is also provided as output.

SAM deployment output

You have now created an API Gateway HTTP API endpoint that uses mTLS.

Setting up the ACM Private CA

The AWS SAM template starts with setting up the ACM Private CA. This enables you to create a hierarchy of certificate authorities with up to five levels. A well-designed CA hierarchy offers benefits such as granular security controls and division of administrative tasks. To learn more about the CA hierarchy, visit designing a CA hierarchy. The ACM Private CA is used to configure HTTP APIs and to distribute certificates to clients.

First, a root CA is created and activated. Following best practices, a subordinate CA is then created and used to configure mTLS for the API and distribute the client certificates.

  PrivateCA:
    Type: AWS::ACMPCA::CertificateAuthority
    Properties:
      KeyAlgorithm: RSA_2048
      SigningAlgorithm: SHA256WITHRSA
      Subject:
        CommonName: !Sub "${AWS::StackName}-rootca"
      Type: ROOT

  PrivateCACertificate:
    Type: AWS::ACMPCA::Certificate
    Properties:
      CertificateAuthorityArn: !Ref PrivateCA
      CertificateSigningRequest: !GetAtt PrivateCA.CertificateSigningRequest
      SigningAlgorithm: SHA256WITHRSA
      TemplateArn: 'arn:aws:acm-pca:::template/RootCACertificate/V1'
      Validity:
        Type: YEARS
        Value: 10

  PrivateCAActivation:
    Type: AWS::ACMPCA::CertificateAuthorityActivation
    Properties:
      Certificate: !GetAtt
        - PrivateCACertificate
        - Certificate
      CertificateAuthorityArn: !Ref PrivateCA
      Status: ACTIVE

  MtlsCA:
    Type: AWS::ACMPCA::CertificateAuthority
    Properties:
      Type: SUBORDINATE
      KeyAlgorithm: RSA_2048
      SigningAlgorithm: SHA256WITHRSA
      Subject:
        CommonName: !Sub "${AWS::StackName}-mtlsca"

  MtlsCertificate:
    DependsOn: PrivateCAActivation
    Type: AWS::ACMPCA::Certificate
    Properties:
      CertificateAuthorityArn: !Ref PrivateCA
      CertificateSigningRequest: !GetAtt
        - MtlsCA
        - CertificateSigningRequest
      SigningAlgorithm: SHA256WITHRSA
      TemplateArn: 'arn:aws:acm-pca:::template/SubordinateCACertificate_PathLen3/V1'
      Validity:
        Type: YEARS
        Value: 3

  MtlsActivation:
    Type: AWS::ACMPCA::CertificateAuthorityActivation
    Properties:
      CertificateAuthorityArn: !Ref MtlsCA
      Certificate: !GetAtt
        - MtlsCertificate
        - Certificate
      CertificateChain: !GetAtt
        - PrivateCAActivation
        - CompleteCertificateChain
      Status: ACTIVE

Issuing client certificate from ACM Private CA

Create a client certificate, which is used as a test certificate to validate the mTLS setup:

ClientOneCert:
    DependsOn: MtlsActivation
    Type: AWS::CertificateManager::Certificate
    Properties:
      CertificateAuthorityArn: !Ref MtlsCA
      CertificateTransparencyLoggingPreference: ENABLED
      DomainName: !Ref DomainName
      Tags:
        - Key: Name
          Value: ClientOneCert

Setting up a truststore in Amazon S3

The ACM Private CA is ready for configuring mTLS on the API. The configuration uses an S3 object as its truststore to validate client certificates. To automate this, an AWS Lambda backed custom resource copies the public certificate chain of the ACM Private CA to the S3 bucket:

  TrustStoreBucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled

  TrustedStoreCustomResourceFunction:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: TrustedStoreCustomResourceFunction
      Handler: com.auth.TrustedStoreCustomResourceHandler::handleRequest
      Timeout: 120
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref TrustStoreBucket

The example custom resource is written in Java, but it could also be written in another language runtime. The custom resource is invoked with the public certificate details of the private root CA, subordinate CAs, and the target S3 bucket. The Lambda function then concatenates the certificate chain and stores the object in the S3 bucket.

TrustedStoreCustomResource:
    Type: Custom::TrustedStore
    Properties:
      ServiceToken: !GetAtt TrustedStoreCustomResourceFunction.Arn
      TrustStoreBucket: !Ref TrustStoreBucket
      TrustStoreKey: !Ref TruststoreKey
      Certs:
        - !GetAtt MtlsCertificate.Certificate
        - !GetAtt PrivateCACertificate.Certificate

You can view and download the handler code for the Lambda-backed custom resource from the repo.
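As an illustration only, here is a hedged Node.js sketch of the same logic. It assumes the resource properties shown in the template, and it omits the response signaling to the CloudFormation ResponseURL that a real custom resource handler must implement:

// Hypothetical Node.js version of the truststore custom resource logic.
const AWS = require('aws-sdk')
const s3 = new AWS.S3()

exports.handler = async (event) => {
  const { TrustStoreBucket, TrustStoreKey, Certs } = event.ResourceProperties

  if (event.RequestType !== 'Delete') {
    // Concatenate the PEM-encoded certificates into a single chain
    const result = await s3.putObject({
      Bucket: TrustStoreBucket,
      Key: TrustStoreKey,
      Body: Certs.join('\n')
    }).promise()

    // In a real handler, these values are returned in the Data field of the
    // CloudFormation response, so the template can read them with !GetAtt
    return {
      TrustStoreUri: `s3://${TrustStoreBucket}/${TrustStoreKey}`,
      ObjectVersion: result.VersionId
    }
  }
}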

Configuring Amazon API Gateway HTTP APIs with mTLS

With a valid truststore object in the S3 bucket, you can set up the API. A valid custom domain must be configured for API Gateway to enable mTLS. The following code creates and sets up a custom domain for HTTP APIs. See template.yaml for a complete example.

CustomDomainCert:
    Type: AWS::CertificateManager::Certificate
    Properties:
      CertificateTransparencyLoggingPreference: ENABLED
      DomainName: !Ref DomainName
      DomainValidationOptions:
        - DomainName: !Ref DomainName
          HostedZoneId: !Ref HostedZoneId
      ValidationMethod: DNS

  SampleHttpApi:
    Type: AWS::Serverless::HttpApi
    DependsOn: TrustedStoreCustomResource
    Properties:
      CorsConfiguration:
        AllowMethods:
          - GET
        AllowOrigins:
          - http://localhost:8080
      Domain:
        CertificateArn: !Ref CustomDomainCert
        DomainName: !Ref DomainName
        EndpointConfiguration: REGIONAL
        SecurityPolicy: TLS_1_2
        MutualTlsAuthentication:
          TruststoreUri: !GetAtt TrustedStoreCustomResource.TrustStoreUri
          TruststoreVersion: !GetAtt TrustedStoreCustomResource.ObjectVersion
        Route53:
          EvaluateTargetHealth: False
          HostedZoneId: !Ref HostedZoneId
        DisableExecuteApiEndpoint: true

An Amazon Route 53 public hosted zone is used to configure the custom domain. This must be set up in your AWS account separately and you must provide the hosted zone ID as a parameter to the template.

Since the HTTP APIs default endpoint does not require mutual TLS, it is disabled via DisableExecuteApiEndpoint. This helps to ensure that mTLS authentication is enforced for all traffic to the API.

The sample API invokes a Lambda function and returns the request payload as the response.
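A minimal sketch of such a handler, assuming it simply echoes the incoming request payload back to the caller:

// Hypothetical handler: return the request payload as the response
exports.handler = async (event) => ({
  statusCode: 200,
  body: JSON.stringify(event)
})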

Testing and validating the setup

To validate the setup, first export the client certificate created earlier. You can export the certificate by using the AWS Management Console or AWS CLI. This example uses the AWS CLI to export the certificate. To learn how to do this via the console, see exporting a private certificate using the console.

  1. Export the base64 PEM-encoded certificate to a local file, client.pem:
    aws acm export-certificate --certificate-arn <<Certificate ARN from stack output>> \
    --passphrase $(echo -n 'your passphrase' | base64) --region us-east-2 | jq -r '"\(.Certificate)"' > client.pem
  2. Export the encrypted private key associated with the public key in the certificate and save it to a local file, client.encrypted.key. You must provide a passphrase to associate with the encrypted private key. This is used to decrypt the exported private key:
    aws acm export-certificate --certificate-arn <<Certificate ARN from stack output>> \
    --passphrase $(echo -n 'your passphrase' | base64) --region us-east-2 | jq -r '"\(.PrivateKey)"' > client.encrypted.key
  3. Decrypt the exported private key using the passphrase and OpenSSL:
    openssl rsa -in client.encrypted.key -out client.decrypted.key
  4. Access the API using mutual TLS:
    curl -v --cert client.pem --key client.decrypted.key https://demo-api.example.com

Adding a certificate revocation list

AWS Certificate Manager Private Certificate Authority (ACM Private CA) can be natively configured with an optional certificate revocation list (CRL).

A CRL is a way for a certificate authority (CA) to make it known that one or more of its digital certificates is no longer trustworthy. When it revokes a certificate, it invalidates the certificate ahead of its expiration date. A certificate authority can revoke an issued certificate for several reasons, the most common being that the certificate’s private key is compromised.

The API Gateway HTTP APIs mTLS setup can be used with all existing API Gateway authorizer options. You can further extend validation with AWS Lambda authorizers, which can be configured to validate the client certificates against this certificate revocation list (CRL). For example:

Certificate revocation architecture

For Lambda authorizer blueprint examples, refer to aws-apigateway-lambda-authorizer-blueprints.

Conclusion

Mutual TLS (mTLS) for API Gateway is now generally available at no additional cost. This post shows how to automate mutual TLS for Amazon API Gateway HTTP APIs using the AWS Certificate Manager Private Certificate Authority as a private CA. Using infrastructure as code (IaC) enables you to develop, deploy, and scale cloud applications, often with greater speed, less risk, and reduced cost.

Download the complete working example for deploying mTLS with API Gateway at this GitHub repo. To learn more about Amazon API Gateway, visit the API Gateway developer guide documentation.

For more serverless learning resources, visit Serverless Land.

Cloudflare Acquires Linc

Post Syndicated from Aly Cabral original https://blog.cloudflare.com/cloudflare-acquires-linc/

Cloudflare has always been about democratizing the Internet. For us, that means bringing the most powerful tools used by the largest of enterprises to the smallest development shops. Sometimes that looks like putting our global network to work defending against large-scale attacks. Other times it looks like giving Internet users simple and reliable privacy services like 1.1.1.1.  Last week, it looked like Cloudflare Pages — a fast, secure and free way to build and host your JAMstack sites.

We see a huge opportunity with Cloudflare Pages: it goes beyond making it as easy as possible to deploy static sites, extending that same ease of use to building full dynamic applications. By creating a seamless integration between Pages and Cloudflare Workers, we will be able to host the frontend and backend together, at the edge of the Internet and close to your users. The Linc team is joining Cloudflare to help us do just that.

Today, we’re excited to announce the acquisition of Linc, an automation platform to help front-end developers collaborate and build powerful applications. Linc has done amazing work with Frontend Application Bundles (FABs), making dynamic backends more accessible to frontend developers. Their approach offers a straightforward path to building end-to-end applications on Pages, with both frontend logic and powerful backend logic in one bundle. With the addition of Linc, we will accelerate Pages to enable richer and more powerful full-stack applications.

Combining Cloudflare’s edge network with Linc’s approach to server-side rendering, we’re able to set a new standard for performance on the web by delivering the speed of powerful servers close to users. Now, I’ll hand it over to Glen Maddern, who was the CTO of Linc, to share why they joined Cloudflare.


Linc and the Frontend Application Bundle (FAB) specification were designed with a single goal in mind: to give frontend developers the best possible tools to build, review, refine, and deploy their applications. An important piece of that is making server-side logic and rendering much more accessible, regardless of what type of app you’re building.

Static vs Dynamic frontends

One of the biggest problems in frontend web development today is the dramatic difference in complexity when moving from generating static sites (e.g. building a directory full of HTML, JS, and CSS files) to hosting a full application (traditionally using NodeJS and a web server like Express). While you gain the flexibility of being able to render everything on-demand and customised for the current user, you increase your maintenance cost — you now have servers that you need to keep running. And unless you’re operating at a global scale already, you’ll often see worse end-user performance as your requests are only being served from one or maybe a couple of locations worldwide.

While serverless platforms have arisen to solve these problems for backend services and can be brought to bear on frontend apps, they’re much less cost-effective than using static hosting, especially if the bulk of your frontend assets are static. As such, we’ve seen a rise of technologies under the umbrella term of “JAMstack”; they aim at making static sites more powerful (like rebuilding based off CMS updates), or at making it possible to deploy small pieces of server-side APIs as “cloud functions”, along with each update of your app. But it’s still fundamentally a limited architecture — you always have a static layer between you and your users, so the more dynamic your needs, the more complex your build pipeline becomes, or the more you’re forced to rely on client-side logic.

FABs took a different approach: a deployment artefact that could support the full range of server-side needs, from entirely static sites, apps with some API routes or cloud functions, all the way to full server-side streaming rendering. We also made it compatible with all the cloud hosting providers you might want, so that deploying becomes as easy as uploading a ZIP file. Then, as your needs change, as dynamic content becomes more important, as new frameworks arise that offer increasing performance or you look at moving which provider you’re hosting with, you never need to change your tooling and deployment processes.

The FAB approach

Regardless of what framework you’re working with, the FAB compiler generates a fab.zip file that has two components: a server.js file that acts as a server-side entry point, and an _assets directory that stores the HTML, CSS, JS, images, and fonts that are sent to the client.

This simple structure gives us enough flexibility to handle all kinds of apps. For example, a static site will have a server.js of only a few auto-generated lines of server-side code, just enough to add redirects for any files outside the _assets directory. On the other end of the spectrum, an app with full server rendering looks and works exactly the same. It just has a lot more code inside its server.js file.

On a server running NodeJS, serving a compiled FAB is as easy as fab serve fab.zip, but FABs are really designed with production class hosting in mind. They make use of world-class CDNs and the best serverless hosting platforms around.

When a FAB is deployed, it’s often split into these component parts and deployed separately. Assets are sent to a low-cost object storage platform with a CDN in front of it, and the server component is sent to dedicated serverless hosting. It’s all deployed in an atomic, idempotent manner that feels as simple as uploading static files, but completely unlocks dynamic server-side code as part of your architecture.

That generic architecture works great and is compatible with virtually every hosting platform around, but it works slightly differently on Cloudflare Workers.

Workers, unlike other serverless platforms, truly runs at the edge: there is no CDN or load balancer in front of it to split off /_assets routes and send them directly to the Assets storage. This means that every request hits the worker, whether it’s triggering a full page render or piping through the bytes for an image file. It might feel like a downside, but with Workers’ performance and cost profile, it’s quite the opposite — it actually gives us much more flexibility in what we end up building, and gets us closer to the goal of fully unlocking server-side code.

To give just one example, we no longer need to store our asset files on a dedicated static file host — instead, we can use Cloudflare’s global key-value storage: Workers KV. Our server.js running inside a Worker can then map /_assets requests directly into the KV store and stream the result to the user. This results in significantly better performance than proxying to a third-party asset host.
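To make this concrete, here is a hedged sketch of the idea (not Linc’s actual code): a Worker that serves /_assets requests from a KV namespace and hands everything else to the app’s server-side entry point. ASSETS and renderApp are hypothetical names:

// Hypothetical Worker: serve static assets from Workers KV,
// fall through to server-side rendering for everything else.
addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request))
})

async function handleRequest(request) {
  const url = new URL(request.url)

  if (url.pathname.startsWith('/_assets/')) {
    // ASSETS is a hypothetical KV namespace binding holding the compiled files
    const body = await ASSETS.get(url.pathname, 'stream')
    return body
      ? new Response(body)
      : new Response('Not found', { status: 404 })
  }

  // renderApp is a hypothetical server-side entry point from server.js
  return renderApp(request)
}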

What we’ve found is that Cloudflare offered the most “FAB-native” hosting option, and so it’s very exciting to have the opportunity to further develop what they can do.

Linc + Cloudflare

As we stated above, Linc’s goal was to give frontend developers the best tooling to build and refine their apps, regardless of which hosting they were using. But we started to notice an important trend —  if a team had a free choice for where to host their frontend, they inevitably chose Cloudflare Workers. In some cases, for a period, teams even used Linc to deploy a FAB to Workers alongside their existing hosting to demonstrate the performance improvement before migrating permanently.

At the same time, we started to see more and more opportunities to fully embrace edge-rendering and make global serverless hosting more powerful and accessible. But the most exciting ideas required deep integration with the hosting providers themselves. Which is why, when we started talking to Cloudflare, everything fell into place.

We’re so excited to join the Cloudflare effort and work on expanding Cloudflare Pages to cover the full spectrum of applications. Not only do they share our goal of bringing sophisticated technology to every development team, but with innovations like Durable Objects starting to offer new storage paradigms, the potential for a truly next-generation deployment, review & hosting platform is tantalisingly close.

Introducing Cloudflare Pages: the best way to build JAMstack websites

Post Syndicated from Rita Kozlov original https://blog.cloudflare.com/cloudflare-pages/

Across multiple cultures around the world, this time of year is a time of celebration and sharing of gifts with the people we care the most about. In that spirit, we thought we’d take this time to give back to the developer community that has been so supportive of Cloudflare for the last 10 years.

Today, we’re excited to announce Cloudflare Pages: a fast, secure and free way to build and host your JAMstack sites.

Today, the path from an idea to a website is paved with good intentions

Websites are the way we express ourselves on the web. It doesn’t matter if you’re a hobbyist with a blog, or the largest of corporations with millions of customers — if you want to reach people outside the confines of 280 characters, the web is the place to be.

As a frontend developer, it’s your responsibility to bring this expression to life. And make no mistake — with so many frontend frameworks, tooling, and static site generators at your disposal — it’s a great time to be in your line of work.

That is, of course, right up until the point when you’re ready to show your work off to the world. That’s when things can start to get a little hairy.

At this point, continuing to keep things local rather than committing to source starts to become… irresponsible. But then: how do you quickly iterate and maintain momentum? As you change things, you need to make sure those changes don’t get lost — saving them to source control — while keeping in sync with what’s currently deployed to production.

There are no great solutions.

If you’re in a larger organization, you might have a DevOps organization devoted to exactly that: automating deployments using Continuous Integration (CI) tooling.

Most CI tooling, however, is quite cumbersome, and for good reason — to allow organizations to customize their automation, regardless of their stack and setup. But for the purpose of developing a website, it can still feel like an unnecessary and frustrating diversion on the road to delivering your web project. Configuring a .yaml file, adding and removing commands, waiting minutes for each build to run, and praying to the CI gods at each one that these are the right commands. Hopelessly rerunning the same build over and over, and expecting a different result.  

Often, hours are lost. The process stands in the way of you and doing your best work.

Cloudflare Pages: letting frontend devs do what they do best

We think there’s a better way.

With Cloudflare Pages, we set out to simplify every step along the journey by tying deployment to your existing development workflow.

Seamless Git integration, with builds built-in

With Cloudflare Pages, all you have to do is select your repo, and tell us which framework you’re using. We’ll take care of chanting CI incantations on your behalf, while you keep doing what you were already doing: git commit and git push your changes — we’ll build and deploy them for you.

As the project grows, so do the stakes, and the number of collaborators.

For a site in production, changes need to be reviewed thoroughly. As the reviewer, looking at the code, and skimming for red flags only gets you so far. To thoroughly review, you have to commit or git stash your changes, pull down locally, get it running to make sure it actually works — looking at code alone won’t catch everything!

The other developers on the team are not the only stakeholders. There are designers, marketers, PMs who want to provide feedback before the changes go out.

Unique preview URLs

With Cloudflare Pages, each commit gets its own unique URL. Preview URLs make it easier to get meaningful code reviews without the overhead of pulling down the branch. They also make it easier to get feedback from PMs, designers and marketers on the latest iteration, bridging the gap between mocks and code.

Infinite staging

“Does anyone mind if I take over staging?” might also sound like a familiar question. With Cloudflare Pages, each feature branch will have its own dedicated consistent alias, allowing you to have a consistent URL for the latest changes.

With Preview and Production environments, all feature branches and preview links will be built with preview variables, so you can experiment without impacting production data.

When you’re ready to deploy to production, we’ll redeploy to production for you with the updated production environment variables.

Collaboration for all

Collaboration is the key to building amazing websites and products — the more the merrier! As a security company, we definitely don’t want you sharing passwords and credentials. That’s why we provide multi-user access for free for unlimited users — invite all your friends, on us!

Modern sites with modern standards

We all know premature optimization is a cardinal sin, but once your project is in front of customers you want to have the best performance possible. If it’s successful, you also want it to be available!

Today, this is time you have to spend optimizing performance (chasing those 100 lighthouse scores), and scaling, from a few to millions of users.

Luckily, we happen to know a thing or two about running a global network of 200 data centers, so we’ve got you covered.

With Pages, your site is deployed directly to our edge, milliseconds away from customers, and at global scale.

The latest web standards are fun to read about on Hacker News but not fun to implement yourself. With Cloudflare Pages, we’ll do the heavy lifting to keep you ahead of the curve: IPv6, HTTP/3, TLS 1.3, all the latest image formats.

Oh, and one more thing

We’re really excited for developers and their teams to use Cloudflare Pages to collaborate on the best static sites together. There’s just one thing that didn’t sit quite right with us: why stop at static sites?

What if we could make building full-blown, dynamic applications just as easy?

Although APIs are a core part of the JAMstack, today that refers primarily to the robust API economy developers have access to. And while that’s great, it’s not always enough. If you want to build your own APIs, and store user or application data, you need more than third party APIs. What to do, though?

Well, this is the point at which it’s mighty helpful we’ve already built a global serverless platform: Cloudflare Workers. Workers allows frontend developers to easily write scalable backends to their applications in the same language as the frontend, JavaScript.

Over the coming months, we’ll be working on integrating Workers and Pages into a seamless experience. It’ll work the exact same way Pages does: just write your code, git push, and we’ll deploy it for you. The only difference is, it won’t just be your frontend, it’ll be your backend, too. And just to be clear: this is not just for stateless functions. With Workers KV and Durable Objects, we see a huge opportunity to really enable any web application to be built on this platform.

We’re super excited about the future of Pages, and how with the power of Cloudflare Workers behind it, it represents a bold vision for how new applications are going to be built on the web.

But you know the thing about gifts? They’re no good without someone to receive them. We’d love for you to sign up for our beta and try out Cloudflare Pages!

PS: we’re hiring!

Want to help us shape the future of development on the web? Join our team.

Using container image support for AWS Lambda with AWS SAM

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/using-container-image-support-for-aws-lambda-with-aws-sam/

At AWS re:Invent 2020, AWS Lambda released Container Image Support for Lambda functions. This new feature allows developers to package and deploy Lambda functions as container images of up to 10 GB in size. With this release, AWS SAM also added support to manage, build, and deploy Lambda functions using container images.

In this blog post, I walk through building a simple serverless application that uses Lambda functions packaged as container images with AWS SAM. I demonstrate creating a new application and highlight changes to the AWS SAM template specific to container image support. I then cover building the image locally for debugging in addition to eventual deployment. Finally, I show using AWS SAM to handle packaging and deploying Lambda functions from a developer’s machine or a CI/CD pipeline.

Push to invoke lifecycle

The process for creating a Lambda function packaged as a container requires only a few steps. A developer first creates the container image and tags that image with the appropriate label. The image is then uploaded to an Amazon Elastic Container Registry (ECR) repository using docker push.

During the Lambda create or update process, the Lambda service pulls the image from ECR, optimizes the image for use, and deploys the image to the Lambda service. Once this and any other configuration processes are complete, the Lambda function is in Active status and ready to be invoked. The AWS SAM CLI manages most of these steps for you.

Prerequisites

The following tools are required in this walkthrough:

  • The AWS SAM CLI (a version that supports container image packaging)
  • Docker
  • The AWS CLI

Create the application

Use the terminal and follow these steps to create a serverless application:

  1. Enter sam init.
  2. For Template source, select option one for AWS Quick Start Templates.
  3. For Package type, choose option two for Image.
  4. For Base image, select option one for amazon/nodejs12.x-base.
  5. Name the application demo-app.
Demonstration of sam init

Exploring the application

Open the template.yaml file in the root of the project to see the new options available for container image support. The AWS SAM template has two new values that are required when working with container images. PackageType: Image tells AWS SAM that this function is using container images for packaging.

AWS SAM template

The second set of required data is in the Metadata section that helps AWS SAM manage the container images. When a container is created, a new tag is added to help identify that image. By default, Docker uses the tag latest. However, AWS SAM passes an explicit tag name to help differentiate between functions. That tag name is a combination of the Lambda function resource name and the DockerTag value found in the Metadata. Additionally, DockerContext points to the folder containing the function code and Dockerfile identifies the name of the Dockerfile used in building the container image.
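For reference, the relevant portion of the template looks like the following. This matches the multi-function template shown later in this post:

  HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: get
    Metadata:
      DockerTag: nodejs12.x-v1
      DockerContext: ./hello-world
      Dockerfile: Dockerfile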

In addition to changes in the template.yaml file, AWS SAM also uses the Docker CLI to build container images. Each Lambda function has a Dockerfile that instructs Docker how to construct the container image for that function. The Dockerfile for the HelloWorldFunction is at hello-world/Dockerfile.

Local development of the application

AWS SAM provides local development support for zip-based and container-based Lambda functions. When using container-based images, as you modify your code, update the local container image using sam build. AWS SAM then calls docker build using the Dockerfile for instructions.

Dockerfile for Lambda function

In the case of the HelloWorldFunction that uses Node.js, the Docker command:

  1. Pulls the latest container base image for nodejs12.x from the Amazon Elastic Container Registry Public.
  2. Copies the app.js code and package.json files to the container image.
  3. Installs the dependencies inside the container image.
  4. Sets the invocation handler.
  5. Creates and tags new version of the local container image.
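A representative Dockerfile for this function looks like the following. This is a sketch; the exact file generated by sam init may differ:

FROM public.ecr.aws/lambda/nodejs:12
COPY app.js package.json ./
RUN npm install
CMD ["app.lambdaHandler"]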

To build your application locally on your machine, enter:

sam build

The results are:

Results for sam build

Now test the code by locally invoking the HelloWorldFunction using the following command:

sam local invoke HelloWorldFunction

The results are:

Results for sam local invoke

You can also combine these commands and add flags for cached and parallel builds:

sam build --cached --parallel && sam local invoke HelloWorldFunction

Deploying the application

There are two ways to deploy container-based Lambda functions with AWS SAM. The first option is to deploy from AWS SAM using the sam deploy command. The deploy command tags the local container image, uploads it to ECR, and then creates or updates your Lambda function. The second method is the sam package command used in continuous integration and continuous delivery or deployment (CI/CD) pipelines, where the deployment process is separate from the artifact creation process.

AWS SAM package tags and uploads the container image to ECR but does not deploy the application. Instead, it creates a modified version of the template.yaml file with the newly created container image location. This modified template is later used to deploy the serverless application using AWS CloudFormation.

Deploying from AWS SAM with the guided flag

Before you can deploy the application, use the AWS CLI to create a new ECR repository to store the container image for the HelloWorldFunction.

Run the following command from a terminal:

aws ecr create-repository --repository-name demo-app-hello-world \
--image-tag-mutability IMMUTABLE --image-scanning-configuration scanOnPush=true

This command creates a new ECR repository called demo-app-hello-world. The --image-tag-mutability IMMUTABLE option prevents overwriting tags. The --image-scanning-configuration scanOnPush=true option enables automated vulnerability scanning whenever a new image is pushed to the repository. The output is:

Amazon ECR creation output

Make a note of the repositoryUri as you need it in the next step.

Before you can push your images to this new repository, ensure that you have logged in to the managed Docker service that ECR provides. Update the bracketed tokens with your information and run the following command in the terminal:

aws ecr get-login-password --region <region> | docker login --username AWS \
--password-stdin <account id>.dkr.ecr.<region>.amazonaws.com

You can also install the Amazon ECR credentials helper to help facilitate Docker authentication with Amazon ECR.

After building the application locally and creating a repository for the container image, you can deploy the application. The first time you deploy an application, use the guided version of the sam deploy command and follow these steps:

  1. Type sam deploy --guided, or sam deploy -g.
  2. For Stack Name, enter demo-app.
  3. Choose the same Region that you created the ECR repository in.
  4. Enter the Image Repository for the HelloWorldFunction (this is the repositoryUri of the ECR repository).
  5. For Confirm changes before deploy and Allow SAM CLI IAM role creation, keep the defaults.
  6. For HelloWorldFunction may not have authorization defined, Is this okay? Select Y.
  7. Keep the defaults for the remaining prompts.
Results of sam deploy --guided

AWS SAM uploads the container images to the ECR repo and deploys the application. During this process, you see a changeset along with the status of the deployment. When the deployment is complete, the stack outputs are then displayed. Use the HelloWorldApi endpoint to test your application in production.

Deploy outputs

When you use the guided version, AWS SAM saves the entered data to the samconfig.toml file. For subsequent deployments with the same parameters, use sam deploy. If you want to make a change, use the guided deployment again.

This example demonstrates deploying a serverless application with a single, container-based Lambda function in it. However, most serverless applications contain more than one Lambda function. To work with an application that has more than one Lambda function, follow these steps to add a second Lambda function to your application:

  1. Copy the hello-world directory using the terminal command cp -R hello-world hola-world
  2. Replace the contents of the template.yaml file with the following
    AWSTemplateFormatVersion: '2010-09-09'
    Transform: AWS::Serverless-2016-10-31
    Description: demo app
      
    Globals:
      Function:
        Timeout: 3
    
    Resources:
      HelloWorldFunction:
        Type: AWS::Serverless::Function
        Properties:
          PackageType: Image
          Events:
            HelloWorld:
              Type: Api
              Properties:
                Path: /hello
                Method: get
        Metadata:
          DockerTag: nodejs12.x-v1
          DockerContext: ./hello-world
          Dockerfile: Dockerfile
          
      HolaWorldFunction:
        Type: AWS::Serverless::Function
        Properties:
          PackageType: Image
          Events:
            HolaWorld:
              Type: Api
              Properties:
                Path: /hola
                Method: get
        Metadata:
          DockerTag: nodejs12.x-v1
          DockerContext: ./hola-world
          Dockerfile: Dockerfile
    
    Outputs:
      HelloWorldApi:
        Description: "API Gateway endpoint URL for Prod stage for Hello World function"
        Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hello/"
      HolaWorldApi:
        Description: "API Gateway endpoint URL for Prod stage for Hola World function"
        Value: !Sub "https://${ServerlessRestApi}.execute-api.${AWS::Region}.amazonaws.com/Prod/hola/"
  3. Replace the contents of hola-world/app.js with the following
    let response;
    exports.lambdaHandler = async(event, context) => {
        try {
            response = {
                'statusCode': 200,
                'body': JSON.stringify({
                    message: 'hola world',
                })
            }
        }
        catch (err) {
            console.log(err);
            return err;
        }
        return response
    };
  4. Create an ECR repository for the HolaWorldFunction
    aws ecr create-repository --repository-name demo-app-hola-world \
    --image-tag-mutability IMMUTABLE --image-scanning-configuration scanOnPush=true
  5. Run the guided deploy to add the second repository:
    sam deploy -g

The AWS SAM guided deploy process allows you to provide the information again but prepopulates the defaults with previous values. Update the following:

  1. Keep the same stack name, Region, and Image Repository for HelloWorldFunction.
  2. Use the new repository for HolaWorldFunction.
  3. For the remaining steps, use the same values from before. For Lambda functions not to have authorization defined, enter Y.
Results of sam deploy --guided

Deploying in a CI/CD pipeline

Companies use continuous integration and continuous delivery (CI/CD) pipelines to automate application deployment. Because the process is automated, using an interactive process like a guided AWS SAM deployment is not possible.

Developers can use the packaging process in AWS SAM to prepare the artifacts for deployment and produce a separate template usable by AWS CloudFormation. The package command is:

sam package --output-template-file packaged-template.yaml \
--image-repository 5555555555.dkr.ecr.us-west-2.amazonaws.com/demo-app

For multiple repositories:

sam package --output-template-file packaged-template.yaml \ 
--image-repositories HelloWorldFunction=5555555555.dkr.ecr.us-west-2.amazonaws.com/demo-app-hello-world \
--image-repositories HolaWorldFunction=5555555555.dkr.ecr.us-west-2.amazonaws.com/demo-app-hola-world

Both cases create a file called packaged-template.yaml. The Lambda functions in this template have an added ImageUri property that points to the ECR repository and tag for each Lambda function’s container image.

Packaged template

Using sam package to generate a separate CloudFormation template enables developers to separate artifact creation from application deployment. The deployment process can then be placed in an isolated stage allowing for greater customization and observability of the pipeline.

Conclusion

Container image support for Lambda enables larger application artifacts and the ability to use container tooling to manage Lambda images. AWS SAM simplifies application management by bringing these tools into the serverless development workflow.

In this post, you create a container-based serverless application using command-line tools in the terminal. You create ECR repositories and associate them with functions in the application. You deploy the application from your local machine and package the artifacts for separate deployment in a CI/CD pipeline.

To learn more about serverless and AWS SAM, visit the Sessions with SAM series at s12d.com/sws and find more resources at serverlessland.com.

#ServerlessForEveryone

Optimizing batch processing with custom checkpoints in AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/optimizing-batch-processing-with-custom-checkpoints-in-aws-lambda/

AWS Lambda can process batches of messages from sources like Amazon Kinesis Data Streams or Amazon DynamoDB Streams. In normal operation, the processing function moves from one batch to the next to consume messages from the stream.

However, when an error occurs in one of the items in the batch, this can result in reprocessing some of the same messages in that batch. With the new custom checkpoint feature, there is now much greater control over how you choose to process batches containing failed messages.

This blog post explains the default behavior of batch failures and options available to developers to handle this error state. I also cover how to use this new checkpoint capability and show the benefits of using this feature in your stream processing functions.

Overview

When using a Lambda function to consume messages from a stream, the batch size property controls the maximum number of messages passed in each event.

The stream manages two internal pointers: a checkpoint and a current iterator. The checkpoint is the last known item position that was successfully processed. The current iterator is the position in the stream for the next read operation. In a successful operation, here are two batches processed from a stream with a batch size of 10:

Checkpoints and current iterators

  1. The first batch delivered to the Lambda function contains items 1–10. The function processes these items without error.
  2. The checkpoint moves to item 11. The next batch delivered to the Lambda function contains items 11–20.

In default operation, the processing of the entire batch must succeed or fail. If a single item fails processing and the function returns an error, the batch fails. The entire batch is then retried until the maximum retries is reached. This can result in the same failure occurring multiple times and unnecessary processing of individual messages.

You can also enable the BisectBatchOnFunctionError property in the event source mapping. If there is a batch failure, the calling service splits the failed batch into two and retries the half-batches separately. The process continues recursively until there is a single item in a batch or messages are processed successfully. For example, in a batch of 10 messages, where item number 5 is failing, the processing occurs as follows:

Bisect batch on error processing

  1. Batch 1 fails. It’s split into batches 2 and 3.
  2. Batch 2 fails, and batch 3 succeeds. Batch 2 is split into batches 4 and 5.
  3. Batch 4 fails and batch 5 succeeds. Batch 4 is split into batches 6 and 7.
  4. Batch 6 fails and batch 7 succeeds.

While this provides a way to process messages in a batch with one failing message, it results in multiple invocations of the function. In this example, message number 4 is processed four times before succeeding.
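Bisecting is a property of the event source mapping. For example, you can enable it on an existing mapping with the AWS CLI; the UUID placeholder refers to your event source mapping:

$ aws lambda update-event-source-mapping \
  --uuid <event source mapping UUID> \
  --bisect-batch-on-function-error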

With the new custom checkpoint feature, you can return the sequence identifier for the failed messages. This provides more precise control over how to choose to continue processing the stream. For example, in a batch of 10 messages where the sixth message fails:

Custom checkpoint behavior

  1. Lambda processes the batch of messages, items 1–10. The sixth message fails and the function returns the failed sequence identifier.
  2. The checkpoint in the stream is moved to the position of the failed message. The batch is retried for only messages 6–10.

Existing stream processing behaviors

In the following examples, I use a DynamoDB table with a Lambda function that is invoked by the stream for the table. You can also use a Kinesis data stream if preferred, as the behavior is the same. The event source mapping is set to a batch size of 10 items so all the stream messages are passed in the event to a single Lambda invocation.

Architecture diagram

I use the following Node.js script to generate batches of 10 items in the table.

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const docClient = new AWS.DynamoDB.DocumentClient()

const ddbTable = 'ddbTableName'
const BATCH_SIZE = 10

const createRecords = async () => {
  // Create envelope
  const params = {
    RequestItems: {}
  }
  params.RequestItems[ddbTable] = []

  // Add items to batch and write to DDB
  for (let i = 0; i < BATCH_SIZE; i++) {
    params.RequestItems[ddbTable].push({
      PutRequest: {
        Item: {
          ID: Date.now() + i
        }
      }
    })
  }
  await docClient.batchWrite(params).promise()
}

const main = async() => await createRecords()
main()

After running this script, there are 10 items in the DynamoDB table, which are then put into the DynamoDB stream for processing.

10 items in DynamoDB table

The processing Lambda function uses the following code. This contains a constant called FAILED_MESSAGE_NUM to force an error on the message with the corresponding index in the event batch:

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2))
  console.log('Records: ', event.Records.length)
  const FAILED_MESSAGE_NUM = 6
  
  let recordNum = 1

  event.Records.map((record) => {
    const sequenceNumber = record.dynamodb.SequenceNumber
    
    if ( recordNum === FAILED_MESSAGE_NUM ) {
      console.log('Error! ', sequenceNumber)
      throw new Error('kaboom')
    }
    console.log('Success: ', sequenceNumber)
    recordNum++
  })
}

The code uses the DynamoDB item’s sequence number, which is provided in each record of the stream event:

Item sequence number in event

In the default configuration of the event source mapping, the failure of message 6 causes the whole batch to fail. The entire batch is then retried multiple times. This appears in the CloudWatch Logs for the function:

Logs with retried batches

Next, I enable the bisect-on-error feature in the function’s event trigger. The first invocation fails as before but this causes two subsequent invocations with batches of five messages. The original batch is bisected. These batches complete processing successfully.

Logs with bisected batches

Configuring a custom checkpoint

Finally, I enable the custom checkpoint feature. This is configured in the Lambda function console by selecting the “Report batch item failures” check box in the DynamoDB trigger:

Add trigger settings
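If you define the event source in an AWS SAM template instead of the console, the equivalent setting is the FunctionResponseTypes property. The following is a sketch that assumes a DynamoDB stream event source; resource names are hypothetical:

  ProcessingFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: index.handler
      Runtime: nodejs12.x
      Events:
        Stream:
          Type: DynamoDB
          Properties:
            Stream: !GetAtt MyTable.StreamArn
            StartingPosition: TRIM_HORIZON
            BatchSize: 10
            FunctionResponseTypes:
              - ReportBatchItemFailures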

I update the processing Lambda function with the following code:

exports.handler = async (event) => {
  console.log(JSON.stringify(event, null, 2))
  console.log('Records: ', event.Records.length)
  const FAILED_MESSAGE_NUM = 4
  
  let recordNum = 1
  let sequenceNumber = 0
    
  try {
    event.Records.map((record) => {
      sequenceNumber = record.dynamodb.SequenceNumber
  
      if ( recordNum === FAILED_MESSAGE_NUM ) {
        throw new Error('kaboom')
      }
      console.log('Success: ', sequenceNumber)
      recordNum++
    })
  } catch (err) {
    // Return failed sequence number to the caller
    console.log('Failure: ', sequenceNumber)
    return { "batchItemFailures": [ {"itemIdentifier": sequenceNumber} ]  }
  }
}

In this version of the code, the processing of each message is wrapped in a try…catch block. When processing fails, the function stops processing any remaining messages. It returns the sequence number of the failed message in a JSON object:

{ 
  "batchItemFailures": [ 
    {
      "itemIdentifier": sequenceNumber
    }
  ]
}

The calling service then updates the checkpoint value with the sequence number provided. If the batchItemFailures array is empty, the caller assumes all messages have been processed correctly. If the batchItemFailures array contains multiple items, the lowest sequence number is used as the checkpoint.
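Because the caller checkpoints on the lowest sequence number returned, you can also continue processing the remaining messages in a batch and report every failure, instead of stopping at the first error. The following is a minimal sketch of that variant; processRecord stands in for your own hypothetical per-record logic:

exports.handler = async (event) => {
  let batchItemFailures = []

  event.Records.map((record) => {
    try {
      processRecord(record)
    } catch (err) {
      // Record the failure and continue with the rest of the batch
      batchItemFailures.push({ itemIdentifier: record.dynamodb.SequenceNumber })
    }
  })

  // An empty array tells the caller that the whole batch succeeded
  return { batchItemFailures }
}

Note that because the checkpoint moves to the lowest failed sequence number, messages processed successfully after a failure are delivered again in the retried batch, so this pattern requires idempotent processing.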

In the updated code, the FAILED_MESSAGE_NUM constant is set to 4, which causes the fourth message in every batch to throw an error. After adding 10 items to the DynamoDB table, the CloudWatch log for the processing function shows:

Lambda function logs

This is how the stream of 10 messages has been processed using the custom checkpoint:

Custom checkpointing walkthrough

  1. In the first invocation, all 10 messages are in the batch. The fourth message throws an error. The function returns this position as the checkpoint.
  2. In the second invocation, messages 4–10 are in the batch. Message 7 throws an error and its sequence number is returned as the checkpoint.
  3. In the third invocation, the batch contains messages 7–10. Message 10 throws an error and its sequence number is now the returned checkpoint.
  4. The final invocation contains only message 10, which is successfully processed.

Using this approach, subsequent invocations do not receive messages that have been successfully processed previously.

Conclusion

The default behavior for stream processing in Lambda functions enables entire batches of messages to succeed or fail. You can also use batch bisecting functionality to retry batches iteratively if a single message fails. Now with custom checkpoints, you have more control over handling failed messages.

This post explains the three different processing modes and shows example code for handling failed messages. Depending upon your use case, you can choose the appropriate mode for your workload. This can help reduce unnecessary Lambda invocations and prevent reprocessing of the same messages in batches containing failures.

To learn more about how to use this feature, read the developer documentation. To learn more about building with serverless technology, visit Serverless Land.

Using AWS Lambda for streaming analytics

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-aws-lambda-for-streaming-analytics/

AWS Lambda now supports streaming analytics calculations for Amazon Kinesis and Amazon DynamoDB. This allows developers to calculate aggregates in near-real time and pass state across multiple Lambda invocations. This feature provides an alternative way to build analytics in addition to services like Amazon Kinesis Data Analytics.

In this blog post, I explain how this feature works with Kinesis Data Streams and DynamoDB Streams, together with example use cases.

Overview

For workloads using streaming data, data arrives continuously, often from different sources, and is processed incrementally. Discrete data processing tasks, such as operating on files, have a known beginning and end boundary for the data. For applications with streaming data, the processing function does not know when the data stream starts or ends. Consequently, this type of data is commonly processed in batches or windows.

Before this feature, Lambda-based stream processing was limited to working on the incoming batch of data. For example, in Amazon Kinesis Data Firehose, a Lambda function transforms the current batch of records with no information or state from previous batches. The same applies to processing DynamoDB streams with Lambda functions. This existing approach works well for MapReduce or tasks focused exclusively on the data in the current batch.

Comparing DynamoDB and Kinesis streams

  1. DynamoDB streams invoke a processing Lambda function asynchronously. After processing, the function may then store the results in a downstream service, such as Amazon S3.
  2. Kinesis Data Firehose invokes a transformation Lambda function synchronously, which returns the transformed data back to the service.

This new feature introduces the concept of a tumbling window, which is a fixed-size, non-overlapping time interval of up to 15 minutes. To use this, you specify a tumbling window duration in the event-source mapping between the stream and the Lambda function. When you apply a tumbling window to a stream, items in the stream are grouped by window and sent to the processing Lambda function. The function returns a state value that is passed to the next tumbling window.

You can use this to calculate aggregates over multiple windows. For example, you can calculate the total value of a data item in a stream using 30-second tumbling windows:

Tumbling windows

  1. Integer data arrives in the stream at irregular time intervals.
  2. The first tumbling window consists of data in the 0–30 second range, passed to the Lambda function. It adds the items and returns the total of 6 as a state value.
  3. The second tumbling window invokes the Lambda function with the state value of 6 and the 30–60 second batch of stream data. This adds the items to the existing total, returning 18.
  4. The third tumbling window invokes the Lambda function with a state value of 18 and the next window of values. The running total is now 28 and returned as the state value.
  5. The fourth tumbling window invokes the Lambda function with a state value of 28 and the 90–120 second batch of data. The final total is 32.

This feature is useful in workloads where you need to calculate aggregates continuously. For example, for a retailer streaming order information from point-of-sale systems, it can generate near-live sales data for downstream reporting. Using Lambda to generate aggregates only requires minimal code, and the function can access other AWS services as needed.

Using tumbling windows with Lambda functions

When you configure an event source mapping between Kinesis or DynamoDB and a Lambda function, use the new setting, Tumbling window duration. This appears in the trigger configuration in the Lambda console:

Trigger configuration
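You can set the same duration programmatically when you create the event source mapping. This is a minimal sketch using the AWS SDK for JavaScript; the stream ARN and function name are hypothetical placeholders:

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const lambda = new AWS.Lambda()

const createMapping = async () => {
  await lambda.createEventSourceMapping({
    EventSourceArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/myStream',
    FunctionName: 'myAggregationFunction',
    StartingPosition: 'TRIM_HORIZON',
    // Tumbling window duration in seconds, up to 15 minutes
    TumblingWindowInSeconds: 30
  }).promise()
}

createMapping()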

You can also set this value in AWS CloudFormation and AWS SAM templates. After the event source mapping is created, events delivered to the Lambda function have several new attributes:

New attributes in events

These include (an illustrative event shape follows this list):

  • Window start and end: the beginning and ending timestamps for the current tumbling window.
  • State: an object containing the state returned from the previous window, which is initially empty. The state object can contain up to 1 MB of data.
  • isFinalInvokeForWindow: indicates if this is the last invocation for the tumbling window. This only occurs once per window period.
  • isWindowTerminatedEarly: a window ends early only if the state exceeds the maximum allowed size of 1 MB.
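Putting these attributes together, an event for a 30-second window looks roughly like the following trimmed illustration. The timestamps, ARN, and shard ID are representative values, and the Records array is elided:

{
  "Records": [ ... ],
  "window": {
    "start": "2020-12-01T00:00:00Z",
    "end": "2020-12-01T00:00:30Z"
  },
  "state": {},
  "shardId": "shardId-000000000000",
  "eventSourceARN": "arn:aws:dynamodb:us-east-1:123456789012:table/myTable/stream/2020-12-01T00:00:00.000",
  "isFinalInvokeForWindow": false,
  "isWindowTerminatedEarly": false
}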

In any tumbling window, there is a series of Lambda invocations following this pattern (a minimal code skeleton follows the list):

Tumbling window process in Lambda

  1. The first invocation contains an empty state object in the event. The function returns a state object containing attributes specific to your aggregation logic.
  2. The second invocation contains the state object provided by the first Lambda invocation. This function returns an updated state object with new aggregated values. Subsequent invocations follow this same sequence.
  3. The final invocation in the tumbling window has the isFinalInvokeForWindow flag set to true. This contains the state returned by the most recent Lambda invocation. This invocation is responsible for storing the result in S3 or in another data store, such as a DynamoDB table. There is no state returned in this final invocation.
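The following skeleton shows this pattern in code. It is a minimal sketch with the aggregation logic left as a comment; the full sales example appears in the next section:

exports.handler = async (event) => {
  if (event.isFinalInvokeForWindow) {
    // Persist event.state to a durable store here; no state is returned
    return
  }

  // Initialize the state object on the first invocation, otherwise reuse it
  let state = Object.keys(event.state).length === 0 ? { total: 0 } : event.state

  // Apply your custom aggregation logic to event.Records here

  // Return the state for the next invocation in the window
  return { state }
}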

Using tumbling windows with DynamoDB

DynamoDB streams can invoke a Lambda function using tumbling windows, enabling you to generate aggregates per shard. In this example, an ecommerce workload saves orders in a DynamoDB table and uses a tumbling window to calculate the near-real-time sales total.

First, I create a DynamoDB table to capture the order data and a second DynamoDB table to store the aggregate calculation. I create a Lambda function with a trigger from the first orders table. The event source mapping is created with a Tumbling window duration of 30 seconds:

DynamoDB trigger configuration

I use the following code in the Lambda function:

const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsAggregation'

function isEmpty(obj) { return Object.keys(obj).length === 0 }

exports.handler = async (event) => {
    // Save aggregation result in the final invocation
    if (event.isFinalInvokeForWindow) {
        console.log('Final: ', event)
        
        const params = {
          TableName,
          Item: {
            windowEnd: event.window.end,
            windowStart: event.window.start,
            sales: event.state.sales,
            shardId: event.shardId
          }
        }
        return await docClient.put(params).promise()
    }
    console.log(event)
    
    // Create the state object on first invocation or use state passed in
    let state = event.state

    if (isEmpty (state)) {
        state = {
            sales: 0
        }
    }
    console.log('Existing: ', state)

    // Process records with custom aggregation logic

    event.Records.map((item) => {
        // Only processing INSERTs
        if (item.eventName != "INSERT") return
        
        // Add sales to total
        let value = parseFloat(item.dynamodb.NewImage.sales.N)
        console.log('Adding: ', value)
        state.sales += value
    })

    // Return the state for the next invocation
    console.log('Returning state: ', state)
    return { state: state }
}

This function code processes the incoming event to aggregate a sales attribute, and returns this aggregated result in a state object. In the final invocation, it stores the aggregated value in another DynamoDB table.

I then use this Node.js script to generate random sample order data:

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const docClient = new AWS.DynamoDB.DocumentClient()

const TableName = 'tumblingWindows'
const ITERATIONS = 100
const SLEEP_MS = 100

let totalSales = 0

function sleep(ms) { 
  return new Promise(resolve => setTimeout(resolve, ms));
}

const createSales = async () => {
  for (let i = 0; i < ITERATIONS; i++) {

    let sales = Math.round (parseFloat(100 * Math.random()))
    totalSales += sales
    console.log ({i, sales, totalSales})

    await docClient.put ({
      TableName,
      Item: {
        ID: Date.now().toString(),
        sales,
        timeStamp: new Date().toString()
      }
    }).promise()
    await sleep(SLEEP_MS)
  }
}

const main = async() => {
  await createSales()
  console.log('Total Sales: ', totalSales)
}

main()

Once the script is complete, the console shows the individual order transactions and the total sales:

Script output

After the tumbling window duration is finished, the second DynamoDB table shows the aggregate values calculated and stored by the Lambda function:

Aggregate values in second DynamoDB table

Since aggregation for each shard is independent, the totals are stored by shardId. If I continue to run the test data script, the aggregation function continues to calculate and store more totals per tumbling window period.

Using tumbling windows with Kinesis

Kinesis data streams can also invoke a Lambda function using a tumbling window in a similar way. The biggest difference is that you control how many shards are used in the data stream. Since aggregation occurs per shard, this controls the total number of aggregate results per tumbling window.

Using the same sales example, first I create a Kinesis data stream with one shard. I use the same DynamoDB tables from the previous example, then create a Lambda function with a trigger from the Kinesis data stream. The event source mapping is created with a Tumbling window duration of 30 seconds:

Kinesis trigger configuration

I use the following code in the Lambda function, modified to process the incoming Kinesis data event:

const AWS = require('aws-sdk')
AWS.config.update({ region: process.env.AWS_REGION })
const docClient = new AWS.DynamoDB.DocumentClient()
const TableName = 'tumblingWindowsAggregation'

function isEmpty(obj) {
    return Object.keys(obj).length === 0
}

exports.handler = async (event) => {

    // Save aggregation result in the final invocation
    if (event.isFinalInvokeForWindow) {
        console.log('Final: ', event)
        
        const params = {
          TableName,
          Item: {
            windowEnd: event.window.end,
            windowStart: event.window.start,
            sales: event.state.sales,
            shardId: event.shardId
          }
        }
        console.log({ params })
        await docClient.put(params).promise()

    }
    console.log(JSON.stringify(event, null, 2))
    
    // Create the state object on first invocation or use state passed in
    let state = event.state

    if (isEmpty (state)) {
        state = {
            sales: 0
        }
    }
    console.log('Existing: ', state)

    // Process records with custom aggregation logic

    event.Records.map((record) => {
        const payload = Buffer.from(record.kinesis.data, 'base64').toString('ascii')
        const item = JSON.parse(payload).Item

        // Add sales to total
        let value = parseFloat(item.sales)
        console.log('Adding: ', value)
        state.sales += value
    })

    // Return the state for the next invocation
    console.log('Returning state: ', state)
    return { state: state }
}

This function code processes the incoming event in the same way as the previous example. I then use this Node.js script to generate random sample order data, modified to put the data on the Kinesis stream:

const AWS = require('aws-sdk')
AWS.config.update({ region: 'us-east-1' })
const kinesis = new AWS.Kinesis()

const StreamName = 'testStream'
const ITERATIONS = 100
const SLEEP_MS = 10

let totalSales = 0

function sleep(ms) { 
  return new Promise(resolve => setTimeout(resolve, ms));
}

const createSales = async() => {

  for (let i = 0; i < ITERATIONS; i++) {

    let sales = Math.round (parseFloat(100 * Math.random()))
    totalSales += sales
    console.log ({i, sales, totalSales})

    const data = {
      Item: {
        ID: Date.now().toString(),
        sales,
        timeStamp: new Date().toString()
      }
    }

    await kinesis.putRecord({
      Data: Buffer.from(JSON.stringify(data)),
      PartitionKey: 'PK1',
      StreamName
    }).promise()
    await sleep(SLEEP_MS)
  }
}

const main = async() => {
  await createSales()
}

main()

Once the script is complete, the console shows the individual order transactions and the total sales:

Console output

After the tumbling window duration is finished, the second DynamoDB table shows the aggregate values calculated and stored by the Lambda function:

Aggregate values in second DynamoDB table

As there is only one shard in this Kinesis stream, there is only one aggregation value for all the data items in the test.

Conclusion

With tumbling windows, you can calculate aggregate values in near-real time for Kinesis data streams and DynamoDB streams. Unlike existing stream-based invocations, state can now be passed forward between Lambda invocations. This makes it easier to calculate sums, averages, and counts on values across multiple batches of data.

In this post, I walk through an example that aggregates sales data stored in Kinesis and DynamoDB. In each case, I create an aggregation function with an event source mapping that uses the new tumbling window duration attribute. I show how state is passed between invocations and how to persist the aggregated value at the end of the tumbling window.

To learn more about how to use this feature, read the developer documentation. To learn more about building with serverless technology, visit Serverless Land.

Using self-hosted Apache Kafka as an event source for AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-self-hosted-apache-kafka-as-an-event-source-for-aws-lambda/

Apache Kafka is an open source event streaming platform used to support workloads such as data pipelines and streaming analytics. It is a distributed platform that is conceptually similar to Amazon Kinesis.

With the launch of Kafka as an event source for Lambda, you can now consume messages from a topic in a Lambda function. This makes it easier to integrate your self-hosted Kafka clusters with downstream serverless workflows.

In this blog post, I explain how to set up an Apache Kafka cluster on Amazon EC2 and configure key elements in the networking configuration. I also show how to create a Lambda function to consume messages from a Kafka topic. Although the process is similar to using Amazon Managed Streaming for Apache Kafka (Amazon MSK) as an event source, there are also some important differences.

Overview

Using Kafka as an event source operates in a similar way to using Amazon SQS or Amazon Kinesis. In all cases, the Lambda service internally polls for new records or messages from the event source, and then synchronously invokes the target Lambda function. Lambda reads the messages in batches and provides the message batches to your function in the event payload.

Lambda is a consumer application for your Kafka topic. It processes records from one or more partitions and sends the payload to the target function. Lambda continues to process batches until there are no more messages in the topic.

Configuring networking for self-hosted Kafka

It’s best practice to deploy the Amazon EC2 instances running Kafka in private subnets. For the Lambda function to poll the Kafka instances, you must ensure that there is a NAT Gateway running in a public subnet of the VPC.

It’s possible to route the traffic to a single NAT Gateway in one AZ for test and development workloads. For redundancy in production workloads, it’s recommended that there is one NAT Gateway available in each Availability Zone. This walkthrough creates the following architecture:

Self-hosted Kafka architecture

  1. Deploy a VPC with public and private subnets and a NAT Gateway that enables internet access. To configure this infrastructure with AWS CloudFormation, deploy this template.
  2. From the VPC console, edit the default security group created by this template to provide inbound access to the following ports:
    • Custom TCP: ports 2888–3888 from all sources.
    • SSH (port 22), restricted to your own IP address.
    • Custom TCP: port 2181 from all sources.
    • Custom TCP: port 9092 from all sources.
    • All traffic from the same security group identifier.

Security Group configuration

Deploying the EC2 instances and installing Kafka

Next, you deploy the EC2 instances using this network configuration and install the Kafka application:

  1. From the EC2 console, deploy an instance running Ubuntu Server 18.04 LTS. Ensure that there is one instance in each private subnet, in different Availability Zones. Assign the default security group configured by the template.
  2. Next, deploy another EC2 instance in either of the public subnets. This is a bastion host used to access the private instances. Assign the default security group configured by the template.
    EC2 instances
  3. Connect to the bastion host, then SSH to the first private EC2 instance using the method for your preferred operating system. This post explains different methods. Repeat the process in another terminal for the second private instance.
    EC2 terminals
  4. On each instance, install Java:
    sudo add-apt-repository ppa:webupd8team/java
    sudo apt update
    sudo apt install openjdk-8-jdk
    java -version
  5. On each instance, install Kafka:
    wget http://www-us.apache.org/dist/kafka/2.3.1/kafka_2.12-2.3.1.tgz
    tar xzf kafka_2.12-2.3.1.tgz
    ln -s kafka_2.12-2.3.1 kafka

Configure and start Zookeeper

Configure and start the Zookeeper service that manages the Kafka brokers:

  1. On the first instance, configure the Zookeeper ID:
    cd kafka
    mkdir /tmp/zookeeper
    touch /tmp/zookeeper/myid
    echo "1" >> /tmp/zookeeper/myid
  2. Repeat the process on the second instance, using a different ID value:
    cd kafka
    mkdir /tmp/zookeeper
    touch /tmp/zookeeper/myid
    echo "2" >> /tmp/zookeeper/myid
  3. On the first instance, edit the config/zookeeper.properties file, adding the private IP address of the second instance:
    initLimit=5
    syncLimit=2
    tickTime=2000
    # list of servers: <ip>:2888:3888
    server.1=0.0.0.0:2888:3888 
    server.2=<<IP address of second instance>>:2888:3888
    
  4. On the second instance, edit the config/zookeeper.properties file, adding the private IP address of the first instance:
    initLimit=5
    syncLimit=2
    tickTime=2000
    # list of servers: <ip>:2888:3888
    server.1=<<IP address of first instance>>:2888:3888 
    server.2=0.0.0.0:2888:3888
  5. On each instance, start Zookeeper:
    bin/zookeeper-server-start.sh config/zookeeper.properties

Configure and start Kafka

Configure and start the Kafka broker:

  1. On the first instance, edit the config/server.properties file:
    broker.id=1
    zookeeper.connect=0.0.0.0:2181,<<IP address of second instance>>:2181
  2. On the second instance, edit the config/server.properties file:
    broker.id=2
    zookeeper.connect=0.0.0.0:2181,<<IP address of first instance>>:2181
  3. Start Kafka on each instance:
    bin/kafka-server-start.sh config/server.properties

At the end of this process, Zookeeper and Kafka are running on both instances. If you use separate terminals, it looks like this:

Zookeeper and Kafka terminals

Configuring and publishing to a topic

Kafka organizes channels of messages around topics, which are virtual groups of one or many partitions across Kafka brokers in a cluster. Multiple producers can send messages to Kafka topics, which can then be routed to and processed by multiple consumers. Producers publish to the tail of a topic and consumers read the topic at their own pace.

From either of the two instances:

  1. Create a new topic called test:
    bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 2 --partitions 2 --topic test
  2. Start a producer:
    bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
  3. Enter test messages to check for successful publication:
    Sending messages to the Kafka topic

At this point, you can successfully publish messages to your self-hosted Kafka cluster. Next, you configure a Lambda function as a consumer for the test topic on this cluster.

Configuring the Lambda function and event source mapping

You can create the Lambda event source mapping using the AWS CLI or AWS SDK, which provide the CreateEventSourceMapping API. In this walkthrough, you use the AWS Management Console to create the event source mapping.

Create a Lambda function that uses the self-hosted cluster and topic as an event source:

  1. From the Lambda console, select Create function.
  2. Enter a function name, and select Node.js 12.x as the runtime.
  3. Select the Permissions tab, and select the role name in the Execution role panel to open the IAM console.
  4. Choose Add inline policy and create a new policy called SelfHostedKafkaPolicy with the following permissions. Replace the resource example with the ARNs of your instances:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:CreateNetworkInterface",
                    "ec2:DescribeNetworkInterfaces",
                    "ec2:DescribeVpcs",
                    "ec2:DeleteNetworkInterface",
                    "ec2:DescribeSubnets",
                    "ec2:DescribeSecurityGroups",
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents"
                ],
                "Resource": " arn:aws:ec2:<REGION>:<ACCOUNT_ID>:instance/<instance-id>"
            }
        ]
    }
    

    Create policy

  5. Choose Create policy and ensure that the policy appears in Permissions policies.
    IAM role page
  6. Back in the Lambda function, select the Configuration tab. In the Designer panel, choose Add trigger.
  7. In the dropdown, select Apache Kafka:
    • For Bootstrap servers, add each of the two instances’ private IPv4 DNS addresses with port 9092 appended.
    • For Topic name, enter ‘test’.
    • Enter your preferred batch size and starting position values (see this documentation for more information).
    • For VPC, select the VPC created by the template.
    • For VPC subnets, select the two private subnets.
    • For VPC security groups, select the default security group.
    • Choose Add.

Add trigger configuration

The trigger’s status changes to Enabled in the Lambda console after a few seconds. It then takes several minutes for the trigger to receive messages from the Kafka cluster.

Testing the Lambda function

At this point, you have created a VPC with public and private subnets and a NAT Gateway. You have created a Kafka cluster on two EC2 instances in private subnets. You set up a target Lambda function with the necessary IAM permissions. Next, you publish messages to the test topic in Kafka and see the resulting invocation in the logs for the Lambda function.

  1. In the Function code panel, replace the contents of index.js with the following code and choose Deploy:
    exports.handler = async (event) => {
        // Iterate through keys
        for (let key in event.records) {
          console.log('Key: ', key)
          // Iterate through records
          event.records[key].map((record) => {
            console.log('Record: ', record)
            // Decode base64
            const msg = Buffer.from(record.value, 'base64').toString()
            console.log('Message:', msg)
          }) 
        }
    }
  2. Back in the terminal with the producer script running, enter a test message:
    Send test message in Kafka
  3. In the Lambda function console, select the Monitoring tab then choose View logs in CloudWatch. In the latest log stream, you see the original event and the decoded message:
    Log events output

Using Lambda as event source

The Lambda function target in the event source mapping does not need to be connected to a VPC to receive messages from the private instances hosting Kafka. However, you must provide details of the VPC, subnets, and security groups in the event source mapping for the Kafka cluster.

The Lambda function must have permission to describe VPCs and security groups, and to manage elastic network interfaces. The required execution role permissions are:

  • ec2:CreateNetworkInterface
  • ec2:DescribeNetworkInterfaces
  • ec2:DescribeVpcs
  • ec2:DeleteNetworkInterface
  • ec2:DescribeSubnets
  • ec2:DescribeSecurityGroups

The event payload for the Lambda function contains an array of records. Each array item contains details of the topic and Kafka partition identifier, together with a timestamp and base64 encoded message:

Event payload example
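The payload has approximately the following shape. This is an illustrative sketch with representative values; note that the records object is keyed by topic and partition, which matches the keys the test function above iterates over, and each value is base64 encoded:

{
  "eventSource": "SelfManagedKafka",
  "bootstrapServers": "ip-10-0-1-23.ec2.internal:9092,ip-10-0-2-45.ec2.internal:9092",
  "records": {
    "test-0": [
      {
        "topic": "test",
        "partition": 0,
        "offset": 15,
        "timestamp": 1606000000000,
        "timestampType": "CREATE_TIME",
        "value": "SGVsbG8gZnJvbSBLYWZrYQ=="
      }
    ]
  }
}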

There is an important difference in the way the Lambda service connects to the self-hosted Kafka cluster compared with Amazon MSK. MSK encrypts data in transit by default so the broker connection defaults to using TLS. With a self-hosted cluster, TLS authentication is not supported when using the Apache Kafka event source. Instead, if accessing brokers over the internet, the event source uses SASL/SCRAM authentication, which can be configured in the event source mapping:

SASL/SCRAM configuration

To learn how to configure SASL/SCRAM authentication for your self-hosted Kafka cluster, see this documentation.

Conclusion

Lambda now supports self-hosted Kafka as an event source, so you can invoke Lambda functions from messages in Kafka topics and integrate them into downstream serverless workflows.

This post shows how to configure a self-hosted Kafka cluster on EC2 and set up the network configuration. I also cover how to set up the event source mapping in Lambda and test a function to decode the messages sent from Kafka.

To learn more about how to use this feature, read the documentation. For more serverless learning resources, visit Serverless Land.

Supporting Jurisdictional Restrictions for Durable Objects

Post Syndicated from Greg McKeon original https://blog.cloudflare.com/supporting-jurisdictional-restrictions-for-durable-objects/


Over the past week, you’ve heard how Cloudflare is making it easy for our customers to control where their data is stored and protected.

We’re not the only ones building these data controls. Around the world, companies are working to figure out where and how to store customer data in a way that is compliant with data localization obligations. For developers, this means new deployment models and new headaches — wrangling infrastructure in multiple regions, partitioning user data based on location, and staying on top of the latest rules from regulators.

Durable Objects, currently in limited beta, already make it easy for customers to manage state on Cloudflare Workers without worrying about provisioning infrastructure. Today, we’re announcing Jurisdictional Restrictions for Durable Objects, which ensure that a Durable Object only stores and processes data in a given geographical region. Jurisdictional Restrictions make it easy for developers to build serverless, stateful applications that not only comply with today’s regulations, but can handle new and updated policies as new regulations are added.

How Jurisdictional Restrictions Work

When creating a Durable Object, developers generate a unique ID that lets a Cloudflare Worker communicate with the Object.

Let’s say I want to create a Durable Object that represents a specific user of my application:

async function handle(request) {
    let objectId = USERS.newUniqueId();
    let user = await USERS.get(objectId);
}

The unique ID encodes metadata for the Workers runtime, including a mapping to a specific Cloudflare data center. That data center is responsible for handling the creation of the Object and maintaining a routing table entry, so that a Worker can communicate with the Object if the Object migrates to another Cloudflare data center.

If the user is an EU data subject, I may want to ensure that the Durable Object that handles their data only stores and processes data inside of the EU. I can do that when I generate their Object ID, which encodes a restriction that this Durable Object can only be handled by a data center in the EU.

async function handle(request) {
    let objectId = USERS.newUniqueId({jurisdiction: "eu"});
    let user = await USERS.get(objectId);
}

There are no servers to spin up and no databases to maintain. Handling a new set of regional restrictions will be as easy as passing a different string at ID generation.

Today, we only support the EU jurisdiction, but we’ll be adding more based on developer demand.

By setting restrictions at a per-object level, it becomes easy to ensure compliance without sacrificing developer productivity. Applications running on Durable Objects just need to identify the jurisdictional rules a given Object should follow and set the corresponding rule at creation time. Gone is the need to run multiple clusters of infrastructure across cloud provider regions to stay compliant — Durable Objects are both globally accessible and capable of partitioning state with no infrastructure overhead.

In the future, we’ll add additional features to Jurisdictional Restrictions — including the ability to migrate your Objects between Jurisdictions to handle changes in regulations.

Under the hood with Durable Object ID generation

Durable Objects support two types of IDs: system-generated, where the system creates a unique ID for you, and user-generated, where a user passes in an identifier to access the Durable Object. You can think of the user-provided identifier as a seed to a hash function that determines the data center the object starts in.
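In a Worker, the two ID types look like the following minimal sketch; the name passed to idFromName is an illustrative value:

async function handle(request) {
    // System-generated: the runtime creates a new, globally unique ID
    let systemId = USERS.newUniqueId();

    // User-generated: derived deterministically from a caller-supplied name
    let namedId = USERS.idFromName("user-1234");

    let user = await USERS.get(namedId);
}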

By default with system-generated IDs, we construct the ID so that it maps to a data center near the Worker that generated the ID. This data center is responsible for creating the Object and storing a routing record if that Object migrates.

If the user passes in a Jurisdictional Restriction, we instead encode in the ID a mapping to a jurisdiction, which encodes a list of data centers that adhere to the rules of the Jurisdictional Restriction. We guarantee that the data center we select for creating the Object is in this list and that we will not migrate the Object to a data center that isn’t in this list. In the case of the ‘eu’ jurisdiction, that maps to one of Cloudflare’s data centers in the EU.

For user-generated IDs, though, we cannot encode this data in the ID, since we must use the string the user passed us to generate the ID! This is because requests may originate anywhere in the world, and they need to know where to find an Object without depending on coordination. For now, this means we do not support Jurisdictional Restrictions in combination with user-generated IDs.

Join the Durable Objects limited beta

Durable Objects are currently in an invite-only beta, while we scale up our systems and build out additional features. If you’re interested in using Durable Objects to meet your compliance requirements, reach out to us with your use case!

Request a beta invite

Use Macie to discover sensitive data as part of automated data pipelines

Post Syndicated from Brandon Wu original https://aws.amazon.com/blogs/security/use-macie-to-discover-sensitive-data-as-part-of-automated-data-pipelines/

Data is a crucial part of every business and is used for strategic decision making at all levels of an organization. To extract value from their data more quickly, Amazon Web Services (AWS) customers are building automated data pipelines—from data ingestion to transformation and analytics. As part of this process, my customers often ask how to prevent sensitive data, such as personally identifiable information, from being ingested into data lakes when it’s not needed. They highlight that this challenge is compounded when ingesting unstructured data—such as files from process reporting, text files from chat transcripts, and emails. They also mention that identifying sensitive data inadvertently stored in structured data fields—such as in a comment field stored in a database—is also a challenge.

In this post, I show you how to integrate Amazon Macie as part of the data ingestion step in your data pipeline. This solution provides an additional checkpoint that sensitive data has been appropriately redacted or tokenized prior to ingestion. Macie is a fully managed data security and privacy service that uses machine learning and pattern matching to discover sensitive data in AWS.

When Macie discovers sensitive data, the solution notifies an administrator to review the data and decide whether to allow the data pipeline to continue ingesting the objects. If allowed, the objects will be tagged with an Amazon Simple Storage Service (Amazon S3) object tag to identify that sensitive data was found in the object before progressing to the next stage of the pipeline.

This combination of automation and manual review helps reduce the risk that sensitive data—such as personally identifiable information—will be ingested into a data lake. This solution can be extended to fit your use case and workflows. For example, you can define custom data identifiers as part of your scans, add additional validation steps, create Macie suppression rules to archive findings automatically, or only request manual approvals for findings that meet certain criteria (such as high severity findings).

Solution overview

Many of my customers are building serverless data lakes with Amazon S3 as the primary data store. Their data pipelines commonly use different S3 buckets at each stage of the pipeline. I refer to the S3 bucket for the first stage of ingestion as the raw data bucket. A typical pipeline might have separate buckets for raw, curated, and processed data representing different stages as part of their data analytics pipeline.

Typically, customers will perform validation and clean their data before moving it to a raw data zone. This solution adds validation steps to that pipeline after preliminary quality checks and data cleaning is performed, noted in blue (in layer 3) of Figure 1. The layers outlined in the pipeline are:

  1. Ingestion – Brings data into the data lake.
  2. Storage – Provides durable, scalable, and secure components to store the data—typically using S3 buckets.
  3. Processing – Transforms data into a consumable state through data validation, cleanup, normalization, transformation, and enrichment. This processing layer is where the additional validation steps are added to identify instances of sensitive data that haven’t been appropriately redacted or tokenized prior to consumption.
  4. Consumption – Provides tools to gain insights from the data in the data lake.


Figure 1: Data pipeline with sensitive data scan

The application runs on a scheduled basis (four times a day, every 6 hours by default) to process data that is added to the raw data S3 bucket. You can customize the application to perform a sensitive data discovery scan during any stage of the pipeline. Because most customers do their extract, transform, and load (ETL) daily, the application scans for sensitive data on a scheduled basis before any crawler jobs run to catalog the data and after typical validation and data redaction or tokenization processes complete.

You can expect that this additional validation will add 5–10 minutes to your pipeline execution at a minimum. The validation processing time will scale linearly based on object size, but there is a start-up time per job that is constant.

If sensitive data is found in the objects, an email is sent to the designated administrator requesting an approval decision, which they indicate by selecting the link corresponding to their decision to approve or deny the next step. In most cases, the reviewer will choose to adjust the sensitive data cleanup processes to remove the sensitive data, deny the progression of the files, and re-ingest the files in the pipeline.

Additional considerations for deploying this application for regular use are discussed at the end of the blog post.

Application components

The following resources are created as part of the application:

Note: the application uses various AWS services, and there are costs associated with these resources after the Free Tier usage. See AWS Pricing for details. The primary drivers of the solution cost will be the amount of data ingested through the pipeline, both for Amazon S3 storage and data processed for sensitive data discovery with Macie.

The architecture of the application is shown in Figure 2 and described in the text that follows.

Figure 2: Application architecture and logic

Application logic

  1. Objects are uploaded to the raw data S3 bucket as part of the data ingestion process.
  2. A scheduled EventBridge rule runs the sensitive data scan Step Functions workflow.
  3. triggerMacieScan Lambda function moves objects from the raw data S3 bucket to the scan stage S3 bucket.
  4. triggerMacieScan Lambda function creates a Macie sensitive data discovery job on the scan stage S3 bucket.
  5. checkMacieStatus Lambda function checks the status of the Macie sensitive data discovery job.
  6. isMacieStatusCompleteChoice Step Functions Choice state checks whether the Macie sensitive data discovery job is complete.
    1. If yes, the getMacieFindingsCount Lambda function runs.
    2. If no, the Step Functions Wait state waits 60 seconds and then restarts Step 5.
  7. getMacieFindingsCount Lambda function counts all of the findings from the Macie sensitive data discovery job.
  8. isSensitiveDataFound Step Functions Choice state checks whether sensitive data was found in the Macie sensitive data discovery job.
    1. If there was sensitive data discovered, run the triggerManualApproval Lambda function.
    2. If there was no sensitive data discovered, run the moveAllScanStageS3Files Lambda function.
  9. moveAllScanStageS3Files Lambda function moves all of the objects from the scan stage S3 bucket to the scanned data S3 bucket.
  10. triggerManualApproval Lambda function tags and moves objects with sensitive data discovered to the manual review S3 bucket, and moves objects with no sensitive data discovered to the scanned data S3 bucket. The function then sends a notification to the ApprovalRequestNotification Amazon SNS topic as a notification that manual review is required.
  11. Email is sent to the email address that’s subscribed to the ApprovalRequestNotification Amazon SNS topic (from the application deployment template) for the manual review user with the option to Approve or Deny pipeline ingestion for these objects.
  12. Manual review user assesses the objects with sensitive data in the manual review S3 bucket and selects the Approve or Deny links in the email.
  13. The decision request is sent from the Amazon API Gateway to the receiveApprovalDecision Lambda function.
  14. manualApprovalChoice Step Functions Choice state checks the decision from the manual review user.
    1. If denied, run the deleteManualReviewS3Files Lambda function.
    2. If approved, run the moveToScannedDataS3Files Lambda function.
  15. deleteManualReviewS3Files Lambda function deletes the objects from the manual review S3 bucket.
  16. moveToScannedDataS3Files Lambda function moves the objects from the manual review S3 bucket to the scanned data S3 bucket.
  17. The next step of the automated data pipeline will begin with the objects in the scanned data S3 bucket.

Prerequisites

For this application, you need the following prerequisites:

You can use AWS Cloud9 to deploy the application. AWS Cloud9 includes the AWS CLI and AWS SAM CLI to simplify setting up your development environment.

Deploy the application with AWS SAM CLI

You can deploy this application using the AWS SAM CLI. AWS SAM uses AWS CloudFormation as the underlying deployment mechanism. AWS SAM is an open-source framework that you can use to build serverless applications on AWS.

To deploy the application

  1. Initialize the serverless application using the AWS SAM CLI from the GitHub project in the aws-samples repository. This clones the project locally, which includes the source code for the Lambda functions, the Step Functions state machine definition file, and the AWS SAM template. On the command line, run the following:
    sam init --location gh:aws-samples/amazonmacie-datapipeline-scan
    

    Alternatively, you can clone the GitHub project directly.

  2. Deploy your application to your AWS account. On the command line, run the following:
    sam deploy --guided
    

    Complete the prompts during the guided interactive deployment. The first deployment prompt is shown in the following example.

    Configuring SAM deploy
    ======================
    
            Looking for config file [samconfig.toml] :  Found
            Reading default arguments  :  Success
    
            Setting default arguments for 'sam deploy'
            =========================================
            Stack Name [maciepipelinescan]:
    

  3. Settings:
    • Stack Name – Name of the CloudFormation stack to be created.
    • AWS Region – Region—for example, us-west-2, eu-west-1, ap-southeast-1—to deploy the application to. This application was tested in the us-west-2 and ap-southeast-1 Regions. Before selecting a Region, verify that the services you need are available in those Regions (for example, Macie and Step Functions).
    • Parameter StepFunctionName – Name of the Step Functions state machine to be created—for example, maciepipelinescanstatemachine.
    • Parameter BucketNamePrefix – Prefix to apply to the S3 buckets to be created (S3 bucket names are globally unique, so choosing a random prefix helps ensure uniqueness).
    • Parameter ApprovalEmailDestination – Email address to receive the manual review notification.
    • Parameter EnableMacie – Whether you need Macie enabled in your account or Region. Select yes if you need Macie to be enabled for you as part of this template, or no if you already have Macie enabled.
  4. Confirm changes and provide approval for AWS SAM CLI to deploy the resources to your AWS account by responding y to prompts, as shown in the following example. You can accept the defaults for the SAM configuration file and SAM configuration environment prompts.
    #Shows you resources changes to be deployed and require a 'Y' to initiate deploy
    Confirm changes before deploy [y/N]: y
    #SAM needs permission to be able to create roles to connect to the resources in your template
    Allow SAM CLI IAM role creation [Y/n]: y
    ReceiveApprovalDecisionAPI may not have authorization defined, Is this okay? [y/N]: y
    ReceiveApprovalDecisionAPI may not have authorization defined, Is this okay? [y/N]: y
    Save arguments to configuration file [Y/n]: y
    SAM configuration file [samconfig.toml]: 
    SAM configuration environment [default]:
    

    Note: This application deploys an Amazon API Gateway with two REST API resources without authorization defined to receive the decision from the manual review step. You will be prompted to accept each resource without authorization. A token (Step Functions taskToken) is used to authenticate the requests.

  5. This creates an AWS CloudFormation changeset. Once the changeset creation is complete, you must provide a final confirmation of y to Deploy the changeset? [y/N] when prompted as shown in the following example.
    Changeset created successfully. arn:aws:cloudformation:ap-southeast-1:XXXXXXXXXXXX:changeSet/samcli-deploy1605213119/db681961-3635-4305-b1c7-dcc754c7XXXX
    
    
    Previewing CloudFormation changeset before deployment
    ======================================================
    Deploy this changeset? [y/N]:
    

Your application is deployed to your account using AWS CloudFormation. You can track the deployment events in the command prompt or via the AWS CloudFormation console.

After the application deployment is complete, you must confirm the subscription to the Amazon SNS topic. An email will be sent to the email address entered in Step 3 with a link that you need to select to confirm the subscription. This confirmation provides opt-in consent for AWS to send emails to you via the specified Amazon SNS topic. The emails will be notifications of potentially sensitive data that need to be approved. If you don’t see the verification email, be sure to check your spam folder.

Test the application

The application uses an EventBridge scheduled rule to start the sensitive data scan workflow, which runs every 6 hours. You can manually start an execution of the workflow to verify that it’s working. To test the function, you will need a file that contains data that matches your rules for sensitive data. For example, it is easy to create a spreadsheet, document, or text file that contains names, addresses, and numbers formatted like credit card numbers. You can also use this generated sample data to test Macie.

We will test by uploading a file to our S3 bucket via the AWS web console. If you know how to copy objects from the command line, that also works.

Upload test objects to the S3 bucket

  1. Navigate to the Amazon S3 console and upload one or more test objects to the <BucketNamePrefix>-data-pipeline-raw bucket. <BucketNamePrefix> is the prefix you entered when deploying the application in the AWS SAM CLI prompts. You can use any objects as long as they’re a supported file type for Amazon Macie. I suggest uploading multiple objects, some with and some without sensitive data, in order to see how the workflow processes each.
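For example, you can copy a local test file from the command line with the AWS CLI; the file name here is a hypothetical placeholder:

aws s3 cp ./sample-data.txt s3://<BucketNamePrefix>-data-pipeline-raw/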

Start the Scan State Machine

  1. Navigate to the Step Functions state machines console. If you don’t see your state machine, make sure you’re connected to the same region that you deployed your application to.
  2. Choose the state machine you created using the AWS SAM CLI as seen in Figure 3. The example state machine is maciepipelinescanstatemachine, but you might have used a different name in your deployment.

    Figure 3: AWS Step Functions state machines console

  3. Select the Start execution button and copy the value from the Enter an execution name – optional box. Change the Input – optional value replacing <execution id> with the value just copied as follows:
    {
        "id": "<execution id>"
    }
    

    In my example, the <execution id> is fa985a4f-866b-b58b-d91b-8a47d068aa0c from the Enter an execution name – optional box as shown in Figure 4. You can choose a different ID value if you prefer. This ID is used by the workflow to tag the objects being processed to ensure that only objects that are scanned continue through the pipeline. When the EventBridge scheduled rule starts the workflow, an ID is included in the input to the Step Functions workflow. Then select Start execution again.

    Figure 4: New execution dialog box

  4. You can see the status of your workflow execution in the Graph inspector as shown in Figure 5. In the figure, the workflow is at the pollForCompletionWait step.

    Figure 5: AWS Step Functions graph inspector

The sensitive data discovery job should run for about five to ten minutes. The jobs scale linearly based on object size, but there is a start-up time per job that is constant. If sensitive data is found in the objects uploaded to the <BucketNamePrefix>-data-pipeline-raw S3 bucket, an email is sent to the address provided during the AWS SAM deployment step, notifying the recipient of the need for an approval decision, which they indicate by selecting the link corresponding to their decision to approve or deny the next step as shown in Figure 6.

Figure 6: Sensitive data identified email

When you receive this notification, you can investigate the findings by reviewing the objects in the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket. Based on your review, you can either apply remediation steps to remove any sensitive data or allow the data to proceed to the next step of the data ingestion pipeline. You should define a standard response process to address discovery of sensitive data in the data pipeline. Common remediation steps include review of the files for sensitive data, deleting the files that you do not want to progress, and updating the ETL process to redact or tokenize sensitive data when re-ingesting into the pipeline. When you re-ingest the files into the pipeline without sensitive data, the files will not be flagged by Macie.

The workflow performs the following:

  • If you select Approve, the files are moved to the <BucketNamePrefix>-data-pipeline-scanned-data S3 bucket with an Amazon S3 SensitiveDataFound object tag with a value of true.
  • If you select Deny, the files are deleted from the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket.
  • If no action is taken, the Step Functions workflow execution times out after five days and the file will automatically be deleted from the <BucketNamePrefix>-data-pipeline-manual-review S3 bucket after 10 days.

Clean up the application

You’ve successfully deployed and tested the sensitive data pipeline scan workflow. To avoid ongoing charges for resources you created, you should delete all associated resources by deleting the CloudFormation stack. In order to delete the CloudFormation stack, you must first delete all objects that are stored in the S3 buckets that you created for the application.

To delete the application

  1. Empty the S3 buckets created in this application (<BucketNamePrefix>-data-pipeline-raw S3 bucket, <BucketNamePrefix>-data-pipeline-scan-stage, <BucketNamePrefix>-data-pipeline-manual-review, and <BucketNamePrefix>-data-pipeline-scanned-data).
  2. Delete the CloudFormation stack used to deploy the application.

Considerations for regular use

Before using this application in a production data pipeline, you will need to stop and consider some practical matters. First, the notification mechanism used when sensitive data is identified in the objects is email. Email doesn’t scale: you should expand this solution to integrate with your ticketing or workflow management system. If you choose to use email, subscribe a mailing list so that the work of reviewing and responding to alerts is shared across a team.

Second, the application runs on a scheduled basis (every 6 hours by default). You should consider starting the workflow when your preliminary validations have completed and the data is ready for a sensitive data scan as part of your pipeline. You can modify the EventBridge rule to run in response to an Amazon EventBridge event instead of on a schedule.

Third, the application currently uses a 60 second Step Functions Wait state when polling for the Macie discovery job completion. In real world scenarios, the discovery scan will take 10 minutes at a minimum, likely several orders of magnitude longer. You should evaluate the typical execution times for your discovery jobs and tune the polling period accordingly. This will help reduce costs related to running Lambda functions and log storage within CloudWatch Logs. The polling period is defined in the Step Functions state machine definition file (macie_pipeline_scan.asl.json) under the pollForCompletionWait state.
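As a sketch, the relevant state in that definition file looks similar to the following, where Seconds is the value to tune. The field values shown here are illustrative rather than copied from the project:

"pollForCompletionWait": {
  "Type": "Wait",
  "Seconds": 60,
  "Next": "checkMacieStatus"
}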

Fourth, the application currently doesn’t account for false positives in the sensitive data discovery job results. Also, the application will progress or delete all objects identified based on the decision by the reviewer. You should consider expanding the application to handle false positives through automation rather than manual review or intervention (such as deleting the files from the manual review bucket or removing the sensitive data tags applied).

Last, the solution will stop the ingestion of a subset of objects into your pipeline. This behavior is similar to other validation and data quality checks that most customers perform as part of the data pipeline. However, you should test to ensure that this will not cause unexpected outcomes and address them in your downstream application logic accordingly.

Conclusion

In this post, I showed you how to integrate sensitive data discovery using Macie as an additional validation step in an automated data pipeline. You’ve reviewed the components of the application, deployed it using the AWS SAM CLI, tested to validate that the application functions as expected, and cleaned up by removing deployed resources.

You now know how to integrate sensitive data scanning into your ETL pipeline. You can use automation and—where required—manual review to help reduce the risk of sensitive data, such as personally identifiable information, being inadvertently ingested into a data lake. You can take this application and customize it to fit your use case and workflows, such as using custom data identifiers as part of your scans, adding additional validation steps, creating Macie suppression rules to archive findings automatically, or requesting manual approval only for findings that meet certain criteria (such as high severity findings).

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Macie forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Brandon Wu

Brandon is a security solutions architect helping financial services organizations secure their critical workloads on AWS. In his spare time, he enjoys exploring outdoors and experimenting in the kitchen.

Working with Lambda layers and extensions in container images

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/working-with-lambda-layers-and-extensions-in-container-images/

In this post, I explain how to use AWS Lambda layers and extensions with Lambda functions packaged and deployed as container images.

Previously, Lambda functions were packaged only as .zip archives. This includes functions created in the AWS Management Console. You can now also package and deploy Lambda functions as container images.

You can use familiar container tooling such as the Docker CLI with a Dockerfile to build, test, and tag images locally. Lambda functions built using container images can be up to 10 GB in size. You push images to an Amazon Elastic Container Registry (ECR) repository, a managed AWS container image registry service. You create your Lambda function, specifying the source code as the ECR image URL from the registry.

Lambda container image support

Lambda functions packaged as container images do not support adding Lambda layers to the function configuration. However, there are a number of solutions to use the functionality of Lambda layers with container images. You take on the responsibility for packaging your preferred runtimes and dependencies as part of the container image during the build process.

Understanding how Lambda layers and extensions work as .zip archives

If you deploy function code using a .zip archive, you can use Lambda layers as a distribution mechanism for libraries, custom runtimes, and other function dependencies.

When you include one or more layers in a function, during initialization, the contents of each layer are extracted in order to the /opt directory in the function execution environment. Each runtime then looks for libraries in a different location under /opt, depending on the language. You can include up to five layers per function, which count towards the unzipped deployment package size limit of 250 MB. Layers are automatically set as private, but they can be shared with other AWS accounts, or shared publicly.
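For example, you can publish a layer version from a local .zip archive with the AWS CLI. The layer name matches the example ARNs used later in this post; the file name is a placeholder:

aws lambda publish-layer-version --layer-name shared-lib-layer \
    --zip-file fileb://layer.zip \
    --compatible-runtimes nodejs12.x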

Lambda Extensions are a way to augment your Lambda functions and are deployed as Lambda layers. You can use Lambda Extensions to integrate functions with your preferred monitoring, observability, security, and governance tools. You can choose from a broad set of tools provided by AWS, AWS Lambda Ready Partners, and AWS Partners, or create your own Lambda Extensions. For more information, see “Introducing AWS Lambda Extensions – In preview.”

Extensions can run in one of two modes, internal or external. An external extension runs as an independent process in the execution environment. It can start before the runtime process and can continue running after the function invocation is fully processed. Internal extensions run as part of the runtime process, in-process with your code.

Lambda searches the /opt/extensions directory and starts initializing any extensions found. Extensions must be executable as binaries or scripts. As the function code directory is read-only, extensions cannot modify function code.
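To make this concrete, here is a minimal sketch of what an external extension script could look like. It registers with the Extensions API and then blocks waiting for events. The extension name is hypothetical, a real extension would do useful work for each event, and it should exit after a SHUTDOWN event:

#!/bin/sh
# Hypothetical extension saved as /opt/extensions/my-extension and marked executable.
# The Lambda-Extension-Name header must match the extension's file name.
curl -s -D /tmp/headers.txt -X POST \
  "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/register" \
  -H "Lambda-Extension-Name: my-extension" \
  -d '{"events": ["INVOKE", "SHUTDOWN"]}' > /dev/null
EXTENSION_ID=$(grep -i 'Lambda-Extension-Identifier' /tmp/headers.txt | tr -d '\r' | cut -d' ' -f2)

while true; do
  # Block until the next INVOKE or SHUTDOWN event is available.
  curl -s "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/event/next" \
    -H "Lambda-Extension-Identifier: ${EXTENSION_ID}" > /dev/null
done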

It helps to understand that Lambda layers and extensions are just files copied into specific file paths in the execution environment during the function initialization. The files are read-only in the execution environment.

Understanding container images with Lambda

A container image is a packaged template built from a Dockerfile. The image is assembled or built from commands in the Dockerfile, starting from a parent or base image, or from scratch. Each command then creates a new layer in the image, which is stacked in order on top of the previous layer. Once built from the packaged template, a container image is immutable and read-only.

For Lambda, a container image includes the base operating system, the runtime, any Lambda extensions, your application code, and its dependencies. Lambda provides a set of open-source base images that you can use to build your container image. Lambda uses the image to construct the execution environment during function initialization. You can use the AWS Serverless Application Model (AWS SAM) CLI or native container tools such as the Docker CLI to build and test container images locally.

Using Lambda layers in container images

Container layers are added to a container image, similar to how Lambda layers are added to a .zip archive function.

There are a number of ways to use container image layering to add the functionality of Lambda layers to your Lambda function container images.

Use a container image version of a Lambda layer

A Lambda layer publisher may have a container image format equivalent of a Lambda layer. To maintain the same file path as Lambda layers, the published container images must have the equivalent files located in the /opt directory. An image containing an extension must include the files in the /opt/extensions directory.

An example Lambda function, packaged as a .zip archive, is created with two layers. One layer contains shared libraries, and the other layer is a Lambda extension from an AWS Partner.

aws lambda create-function --region us-east-1 --function-name my-function \  
    --role arn:aws:iam::123456789012:role/lambda-role \
    --layers \
        "arn:aws:lambda:us-east-1:123456789012:layer:shared-lib-layer:1" \
        "arn:aws:lambda:us-east-1:987654321987:extensions-layer:1" \
    …

The corresponding Dockerfile syntax for a function packaged as a container image includes the following lines. These pull the container image versions of the Lambda layers and copy them into the function image. The shared library image is pulled from ECR and the extension image is pulled from Docker Hub.

FROM public.ecr.aws/myrepo/shared-lib-layer:1 AS shared-lib-layer
# Layer code
WORKDIR /opt
COPY --from=shared-lib-layer /opt/ .

FROM aws-partner/extensions-layer:1 AS extensions-layer
# Extension code
WORKDIR /opt/extensions
COPY --from=extensions-layer /opt/extensions/ .

Copy the contents of a Lambda layer into a container image

You can use existing Lambda layers, and copy the contents of the layers into the function container image /opt directory during docker build.

You need a Dockerfile that installs the AWS Command Line Interface, which is then used to copy the layer files from Amazon S3 during the build.

The Dockerfile to add two layers into a single image includes the following lines to copy the Lambda layer contents.

FROM alpine:latest

ARG AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION:-"us-east-1"}
ARG AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID:-""}
ARG AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY:-""}
ENV AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}
ENV AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
ENV AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

RUN apk add aws-cli curl unzip

RUN mkdir -p /opt

RUN curl $(aws lambda get-layer-version-by-arn --arn arn:aws:lambda:us-east-1:123456789012:layer:shared-lib-layer:1 --query 'Content.Location' --output text) --output layer.zip
RUN unzip layer.zip -d /opt
RUN rm layer.zip

RUN curl $(aws lambda get-layer-version-by-arn --arn arn:aws:lambda:us-east-1:987654321987:layer:extensions-layer:1 --query 'Content.Location' --output text) --output layer.zip
RUN unzip layer.zip -d /opt
RUN rm layer.zip

To run the AWS CLI, specify your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, and include the required AWS_DEFAULT_REGION as command-line arguments.

docker build . -t layer-image1:latest \
--build-arg AWS_DEFAULT_REGION=us-east-1 \
--build-arg AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE \
--build-arg AWS_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

This creates a container image containing the existing Lambda layer and extension files. This can be pushed to ECR and used in a function.
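For example, assuming an ECR repository named layer-image1 already exists in your account:

docker tag layer-image1:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/layer-image1:latest
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/layer-image1:latest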

Build a container image from a Lambda layer

You can repackage and publish Lambda layer file content as container images. Creating separate container images for different layers allows you to add them to multiple functions, and share them in a similar way as Lambda layers.

You can create a separate container image containing the files from a single layer, or combine the files from multiple layers into a single image. If you create separate container images for layer files, you then add these images into your function image.

There are two ways to manage language code dependencies. You can pre-build the dependencies and copy the files into the container image, or build the dependencies during docker build.

In this example, I migrate an existing Python application, comprising a Lambda function and an extension, from a .zip archive to separate function and extension container images. The extension writes logs to S3.

You can choose how to store images in repositories. You can either push both images to the same ECR repository with different image tags, or push to different repositories. In this example, I use separate ECR repositories.

To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

The existing example extension uses a makefile to install boto3 using pip install with a requirements.txt file. This is migrated to the docker build process. I must add a Python runtime to be able to run pip install as part of the build process. I use python:3.8-alpine as a minimal base image.

I create separate Dockerfiles for the function and extension. The extension Dockerfile contains the following lines.

FROM python:3.8-alpine AS installer
# Layer code
COPY extensionssrc /opt/
COPY extensionssrc/requirements.txt /opt/
RUN pip install -r /opt/requirements.txt -t /opt/extensions/lib

FROM scratch AS base
WORKDIR /opt/extensions
COPY --from=installer /opt/extensions .

I build, tag, login, and push the extension container image to an existing ECR repository.

docker build -t log-extension-image:latest  .
docker tag log-extension-image:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-image:latest
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-image:latest

The function Dockerfile contains the following lines, which add the files from the previously created extension image to the function image. There is no need to run pip install for the function as it does not require any additional dependencies.

FROM 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-image:latest AS layer
FROM public.ecr.aws/lambda/python:3.8
# Layer code
WORKDIR /opt
COPY --from=layer /opt/ .
# Function code
WORKDIR /var/task
COPY app.py .
CMD ["app.lambda_handler"]

I build, tag, and push the function container image to a separate existing ECR repository. This creates an immutable image of the Lambda function.

docker build -t log-extension-function:latest  .
docker tag log-extension-function:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest

The function requires a unique S3 bucket to store the logs files, which I create in the S3 console. I create a Lambda function from the ECR repository image, and specify the bucket name as a Lambda environment variable.

aws lambda create-function --region us-east-1  --function-name log-extension-function \
--package-type Image --code ImageUri=123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest \
--role "arn:aws:iam:: 123456789012:role/lambda-role" \
--environment  "Variables": {"S3_BUCKET_NAME": "s3-logs-extension-demo-logextensionsbucket-us-east-1"}

For subsequent extension code changes, I need to update both the extension and function images. If only the function code changes, I need to update the function image. I push the function image as the :latest image to ECR. I then update the function code deployment to use the updated :latest ECR image.

aws lambda update-function-code --function-name log-extension-function --image-uri 123456789012.dkr.ecr.us-east-1.amazonaws.com/log-extension-function:latest

Using custom runtimes with container images

With .zip archive functions, custom runtimes are added using Lambda layers. With container images, you no longer need to copy in Lambda layer code for custom runtimes.

You can build your own custom runtime images starting with AWS provided base images for custom runtimes. You can add your preferred runtime, dependencies, and code to these images. To communicate with Lambda, the image must implement the Lambda Runtime API. We provide Lambda runtime interface clients for all supported runtimes, or you can implement your own for additional runtimes.

Running extensions in container images

A Lambda extension running in a function packaged as a container image works in the same way as in a .zip archive function. You build a function container image that includes the extension files, or add an extension image layer. Lambda looks for any external extensions in the /opt/extensions directory and starts initializing them. Extensions must be executable as binaries or scripts.

Internal extensions modify the Lambda runtime startup behavior using language-specific environment variables, or wrapper scripts. For language-specific environment variables, you can set the following environment variables in your function configuration to augment the runtime command line.

  • JAVA_TOOL_OPTIONS (Java Corretto 8 and 11)
  • NODE_OPTIONS (Node.js 10 and 12)
  • DOTNET_STARTUP_HOOKS (.NET Core 3.1)

An example Lambda environment variable for JAVA_TOOL_OPTIONS:

-javaagent:"/opt/ExampleAgent-0.0.jar"

Wrapper scripts delegate the runtime start-up to a script. The script can inject and alter arguments, set environment variables, or capture metrics, errors, and other diagnostic information. The following runtimes support wrapper scripts: Node.js 10 and 12, Python 3.8, Ruby 2.7, Java 8 and 11, and .NET Core 3.1.

You specify the script by setting the value of the AWS_LAMBDA_EXEC_WRAPPER environment variable as the file system path of an executable binary or script, for example:

/opt/wrapper_script
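As an illustration, here is a minimal sketch of a wrapper script. Lambda invokes the script with the full runtime command line as its arguments; this version only logs the command before starting the runtime unchanged, while a real script would inject options or set environment variables:

#!/bin/sh
# Minimal wrapper script sketch: observe the runtime command, then run it as-is.
echo "Runtime command: $@" >&2
exec "$@"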

Conclusion

You can now package and deploy Lambda functions as container images in addition to .zip archives. Lambda functions packaged as container images do not directly support adding Lambda layers to the function configuration as .zip archives do.

In this post, I show a number of solutions to use the functionality of Lambda layers and extensions with container images, including example Dockerfiles.

I show how to migrate an existing Lambda function and extension from a .zip archive to separate function and extension container images. Follow the instructions in the README.md file in the GitHub repository.

For more serverless learning resources, visit https://serverlessland.com.

New for AWS Lambda – Container Image Support

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-lambda-container-image-support/

With AWS Lambda, you upload your code and run it without thinking about servers. Many customers enjoy the way this works, but if you’ve invested in container tooling for your development workflows, it’s not easy to use the same approach to build applications using Lambda.

To help you with that, you can now package and deploy Lambda functions as container images of up to 10 GB in size. In this way, you can also easily build and deploy larger workloads that rely on sizable dependencies, such as machine learning or data intensive workloads. Just like functions packaged as ZIP archives, functions deployed as container images benefit from the same operational simplicity, automatic scaling, high availability, and native integrations with many services.

We are providing base images for all the supported Lambda runtimes (Python, Node.js, Java, .NET, Go, Ruby) so that you can easily add your code and dependencies. We also have base images for custom runtimes based on Amazon Linux that you can extend to include your own runtime implementing the Lambda Runtime API.

You can deploy your own arbitrary base images to Lambda, for example images based on Alpine or Debian Linux. To work with Lambda, these images must implement the Lambda Runtime API. To make it easier to build your own base images, we are releasing Lambda Runtime Interface Clients implementing the Runtime API for all supported runtimes. These implementations are available via native package managers, so that you can easily pick them up in your images, and are being shared with the community using an open source license.

We are also releasing as open source a Lambda Runtime Interface Emulator that enables you to perform local testing of the container image and check that it will run when deployed to Lambda. The Lambda Runtime Interface Emulator is included in all AWS-provided base images and can be used with arbitrary images as well.

Your container images can also use the Lambda Extensions API to integrate monitoring, security and other tools with the Lambda execution environment.

To deploy a container image, you select one from an Amazon Elastic Container Registry repository. Let’s see how this works in practice with a couple of examples, first using an AWS-provided image for Node.js, and then building a custom image for Python.

Using the AWS-Provided Base Image for Node.js
Here’s the code (app.js) for a simple Node.js Lambda function that generates a PDF file using the PDFKit module. Each time it is invoked, it creates a new letter containing random data generated by the faker.js module. The function’s output uses the Amazon API Gateway response syntax to return the PDF file.

const PDFDocument = require('pdfkit');
const faker = require('faker');
const getStream = require('get-stream');

exports.lambdaHandler = async (event) => {

    const doc = new PDFDocument();

    const randomName = faker.name.findName();

    doc.text(randomName, { align: 'right' });
    doc.text(faker.address.streetAddress(), { align: 'right' });
    doc.text(faker.address.secondaryAddress(), { align: 'right' });
    doc.text(faker.address.zipCode() + ' ' + faker.address.city(), { align: 'right' });
    doc.moveDown();
    doc.text('Dear ' + randomName + ',');
    doc.moveDown();
    for(let i = 0; i < 3; i++) {
        doc.text(faker.lorem.paragraph());
        doc.moveDown();
    }
    doc.text(faker.name.findName(), { align: 'right' });
    doc.end();

    const pdfBuffer = await getStream.buffer(doc);
    const pdfBase64 = pdfBuffer.toString('base64');

    const response = {
        statusCode: 200,
        headers: {
            'Content-Length': Buffer.byteLength(pdfBase64),
            'Content-Type': 'application/pdf',
            'Content-disposition': 'attachment;filename=test.pdf'
        },
        isBase64Encoded: true,
        body: pdfBase64
    };
    return response;
};

I use npm to initialize the package and add the three dependencies I need in the package.json file. In this way, I also create the package-lock.json file. I am going to add it to the container image to have a more predictable result.

$ npm init
$ npm install pdfkit
$ npm install faker
$ npm install get-stream

Now, I create a Dockerfile to create the container image for my Lambda function, starting from the AWS provided base image for the nodejs12.x runtime:

FROM amazon/aws-lambda-nodejs:12
COPY app.js package*.json ./
RUN npm install
CMD [ "app.lambdaHandler" ]

The Dockerfile is adding the source code (app.js) and the files describing the package and the dependencies (package.json and package-lock.json) to the base image. Then, I run npm to install the dependencies. I set the CMD to the function handler, but this could also be done later as a parameter override when configuring the Lambda function.

I use the Docker CLI to build the random-letter container image locally:

$ docker build -t random-letter .

To check if this is working, I start the container image locally using the Lambda Runtime Interface Emulator:

$ docker run -p 9000:8080 random-letter:latest

Now, I test a function invocation with cURL. Here, I am passing an empty JSON payload.

$ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

If there are errors, I can fix them locally. When it works, I move to the next step.

To upload the container image, I create a new ECR repository in my account and tag the local image to push it to ECR. To help me identify software vulnerabilities in my container images, I enable ECR image scanning.

$ aws ecr create-repository --repository-name random-letter --image-scanning-configuration scanOnPush=true
$ docker tag random-letter:latest 123412341234.dkr.ecr.sa-east-1.amazonaws.com/random-letter:latest
$ aws ecr get-login-password | docker login --username AWS --password-stdin 123412341234.dkr.ecr.sa-east-1.amazonaws.com
$ docker push 123412341234.dkr.ecr.sa-east-1.amazonaws.com/random-letter:latest

Here I am using the AWS Management Console to complete the creation of the function. You can also use the AWS Serverless Application Model, which has been updated to add support for container images.

In the Lambda console, I click on Create function. I select Container image, give the function a name, and then Browse images to look for the right image in my ECR repositories.

Screenshot of the console.

After I select the repository, I use the latest image I uploaded. When I select the image, Lambda translates that to the underlying image digest (on the right of the tag in the image below). You can see the digests of your images locally with the docker images --digests command. In this way, the function keeps using the same image even if the latest tag is later moved to a newer one, and you are protected from unintentional deployments. You can update the image the function uses from the function code settings; updating the function configuration has no impact on the image used, even if the tag was reassigned to another image in the meantime.
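For example, to list the local digest of the image pushed earlier:

$ docker images --digests 123412341234.dkr.ecr.sa-east-1.amazonaws.com/random-letter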

Screenshot of the console.

Optionally, I can override some of the container image values. I am not doing this now, but in this way I can create images that can be used for different functions, for example by overriding the function handler in the CMD value.

Screenshot of the console.

I leave all other options to their default and select Create function.

When creating or updating the code of a function, the Lambda platform optimizes new and updated container images to prepare them to receive invocations. This optimization takes a few seconds or minutes, depending on the size of the image. After that, the function is ready to be invoked. I test the function in the console.

Screenshot of the console.

It’s working! Now let’s add API Gateway as a trigger. I select Add Trigger and add the API Gateway using an HTTP API. For simplicity, I leave the authentication of the API open.

Screenshot of the console.

Now, I click on the API endpoint a few times and download a few random letters.

Screenshot of the console.

It works as expected! Here are a few of the PDF files that are generated with random data from the faker.js module.

Output of the sample application.

Building a Custom Image for Python
Sometimes you need to use your custom container images, for example to follow your company guidelines or to use a runtime version that we don’t support.

In this case, I want to build an image to use Python 3.9. The code (app.py) of my function is very simple: it just says hello and reports the version of Python being used.

import sys
def handler(event, context): 
    return 'Hello from AWS Lambda using Python' + sys.version + '!'

As I mentioned before, we are sharing with you open source implementations of the Lambda Runtime Interface Clients (which implement the Runtime API) for all the supported runtimes. In this case, I start with a Python image based on Alpine Linux. Then, I add the Lambda Runtime Interface Client for Python to the image. Here’s the Dockerfile:

# Define global args
ARG FUNCTION_DIR="/home/app/"
ARG RUNTIME_VERSION="3.9"
ARG DISTRO_VERSION="3.12"

# Stage 1 - bundle base image + runtime
# Grab a fresh copy of the image and install GCC
FROM python:${RUNTIME_VERSION}-alpine${DISTRO_VERSION} AS python-alpine
# Install GCC (Alpine uses musl but we compile and link dependencies with GCC)
RUN apk add --no-cache \
    libstdc++

# Stage 2 - build function and dependencies
FROM python-alpine AS build-image
# Install aws-lambda-cpp build dependencies
RUN apk add --no-cache \
    build-base \
    libtool \
    autoconf \
    automake \
    libexecinfo-dev \
    make \
    cmake \
    libcurl
# Include global args in this stage of the build
ARG FUNCTION_DIR
ARG RUNTIME_VERSION
# Create function directory
RUN mkdir -p ${FUNCTION_DIR}
# Copy handler function
COPY app/* ${FUNCTION_DIR}
# Optional - Install the function's dependencies
# RUN python${RUNTIME_VERSION} -m pip install -r requirements.txt --target ${FUNCTION_DIR}
# Install Lambda Runtime Interface Client for Python
RUN python${RUNTIME_VERSION} -m pip install awslambdaric --target ${FUNCTION_DIR}

# Stage 3 - final runtime image
# Grab a fresh copy of the Python image
FROM python-alpine
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
# (Optional) Add Lambda Runtime Interface Emulator and use a script in the ENTRYPOINT for simpler local runs
ADD https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie /usr/bin/aws-lambda-rie
RUN chmod 755 /usr/bin/aws-lambda-rie
COPY entry.sh /
ENTRYPOINT [ "/entry.sh" ]
CMD [ "app.handler" ]

This time the Dockerfile is more elaborate: it builds the final image in three stages, following the Docker best practice of multi-stage builds. You can use this three-stage approach to build your own custom images:

  • Stage 1 is building the base image with the runtime, Python 3.9 in this case, plus GCC that we use to compile and link dependencies in stage 2.
  • Stage 2 is installing the Lambda Runtime Interface Client and building function and dependencies.
  • Stage 3 is creating the final image adding the output from stage 2 to the base image built in stage 1. Here I am also adding the Lambda Runtime Interface Emulator, but this is optional, see below.

I create the entry.sh script below to use it as ENTRYPOINT. It executes the Lambda Runtime Interface Client for Python. If the execution is local, the Runtime Interface Client is wrapped by the Lambda Runtime Interface Emulator.

#!/bin/sh
if [ -z "${AWS_LAMBDA_RUNTIME_API}" ]; then
    exec /usr/bin/aws-lambda-rie /usr/local/bin/python -m awslambdaric
else
    exec /usr/local/bin/python -m awslambdaric
fi

Now, I can use the Lambda Runtime Interface Emulator to check locally if the function and the container image are working correctly. The image tag lambda/python:3.9-alpine3.12 used below is my choice for this example.
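First, I build and tag the image:

$ docker build -t lambda/python:3.9-alpine3.12 .

Then, I start the container: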

$ docker run -p 9000:8080 lambda/python:3.9-alpine3.12

Not Including the Lambda Runtime Interface Emulator in the Container Image
It’s optional to add the Lambda Runtime Interface Emulator to a custom container image. If I don’t include it, I can test locally by installing the Lambda Runtime Interface Emulator on my local machine following these steps:

  • In Stage 3 of the Dockerfile, I remove the commands copying the Lambda Runtime Interface Emulator (aws-lambda-rie) and the entry.sh script. I don’t need the entry.sh script in this case.
  • I use this ENTRYPOINT to start by default the Lambda Runtime Interface Client:
    ENTRYPOINT [ "/usr/local/bin/python", “-m”, “awslambdaric” ]
  • I run these commands to install the Lambda Runtime Interface Emulator on my local machine, for example under ~/.aws-lambda-rie:
mkdir -p ~/.aws-lambda-rie
curl -Lo ~/.aws-lambda-rie/aws-lambda-rie https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie
chmod +x ~/.aws-lambda-rie/aws-lambda-rie

When the Lambda Runtime Interface Emulator is installed on my local machine, I can mount it when starting the container. The command to start the container locally now is (assuming the Lambda Runtime Interface Emulator is at ~/.aws-lambda-rie):

docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
       --entrypoint /aws-lambda/aws-lambda-rie \
       lambda/python:3.9-alpine3.12 /usr/local/bin/python -m awslambdaric app.handler

Testing the Custom Image for Python
Either way, when the container is running locally, I can test a function invocation with cURL:

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

The output is what I am expecting!

"Hello from AWS Lambda using Python3.9.0 (default, Oct 22 2020, 05:03:39) \n[GCC 9.3.0]!"

I push the image to ECR and create the function as before. Here’s my test in the console:

Screenshot of the console.

My custom container image based on Alpine is running Python 3.9 on Lambda!

Available Now
You can use container images to deploy your Lambda functions today in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Singapore), Europe (Ireland), Europe (Frankfurt), South America (São Paulo). We are working to add support in more Regions soon. The container image support is offered in addition to ZIP archives and we will continue to support the ZIP packaging format.

There are no additional costs to use this feature. You pay for the ECR repository and the usual Lambda pricing.

You can use container image support in AWS Lambda with the console, AWS Command Line Interface (CLI), AWS SDKs, AWS Serverless Application Model, and solutions from AWS Partners, including Aqua Security, Datadog, Epsagon, HashiCorp Terraform, Honeycomb, Lumigo, Pulumi, Stackery, Sumo Logic, and Thundra.

This new capability opens up new scenarios, simplifies the integration with your development pipeline, and makes it easier to use custom images and your favorite programming platforms to build serverless applications.

Learn more and start using container images with AWS Lambda.

Danilo

Preview: AWS Proton – Automated Management for Container and Serverless Deployments

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/preview-aws-proton-automated-management-for-container-and-serverless-deployments/

Today, we are excited to announce the public preview of AWS Proton, a new service that helps you automate and manage infrastructure provisioning and code deployments for serverless and container-based applications.

Maintaining hundreds – or sometimes thousands – of microservices with constantly changing infrastructure resources and configurations is a challenging task for even the most capable teams.

AWS Proton enables infrastructure teams to define standard templates centrally and make them available for developers in their organization. This allows infrastructure teams to manage and update infrastructure without impacting developer productivity.

How AWS Proton Works
The process of defining a service template involves the definition of cloud resources, continuous integration and continuous delivery (CI/CD) pipelines, and observability tools. AWS Proton will integrate with commonly used CI/CD pipelines and observability tools such as CodePipeline and CloudWatch. It also provides curated templates that follow AWS best practices for common use cases such as web services running on AWS Fargate or stream processing apps built on AWS Lambda.

Infrastructure teams can visualize and manage the list of service templates in the AWS Management Console.

This is what the list of templates looks like.

AWS Proton also collects information about the deployment status of the application, such as the last date it was successfully deployed. When a template changes, AWS Proton identifies all the existing applications using the old version and allows infrastructure teams to upgrade them to the most recent definition, while monitoring application health during the upgrade so it can be rolled back in case of issues.

This is what a service template looks like, with its versions and running instances.

Once service templates have been defined, developers can select and deploy services in a self-service fashion. AWS Proton will take care of provisioning cloud resources, deploying the code, and health monitoring, while providing visibility into the status of all the deployed applications and their pipelines.

This way, developers can focus on building and shipping application code for serverless and container-based applications without having to learn, configure, and maintain the underlying resources.

This is what the list of deployed services looks like.

Available in Preview
AWS Proton is now available in preview in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), and Europe (Ireland); it’s free of charge, as you only pay for the underlying services and resources. Check out the technical documentation.

You can get started using the AWS Management Console here.

Alex

New for AWS Lambda – Functions with Up to 10 GB of Memory and 6 vCPUs

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-lambda-functions-with-up-to-10-gb-of-memory-and-6-vcpus/

AWS Lambda runs your code on a highly available and scalable compute infrastructure so that you can focus on what you want to build. Do you want to get the advantages of Lambda for workloads that are memory or computationally intensive? Wait no more!

Starting today, you can allocate up to 10 GB of memory to a Lambda function. This is more than a 3x increase compared to previous limits. Lambda allocates CPU and other resources linearly in proportion to the amount of memory configured. That means you can now have access to up to 6 vCPUs in each execution environment. In this way, your multithreaded and multiprocess applications run faster. Since Lambda charges are proportional to memory configured and function duration (GB-seconds), the additional costs for using more memory may be offset by lower duration. I have more on this in the example below.

With more memory and CPU power, and support for the AVX2 instruction set, new use cases — such as machine learning applications; batch and extract, transform, load (ETL) jobs; modelling; genomics; gaming; high-performance computing (HPC); and media processing — become easier to implement and scale with Lambda functions.

Let’s see how this works in practice!

Lambda Function Performance as Memory Increases
When I first wrote about the capability of mounting a shared Amazon Elastic File System (EFS) for Lambda functions, one of the examples I used was a function doing machine learning inference to classify images of birds. The function is using PyTorch to run the inference, applying a pre-trained machine learning model.

Now, I can execute the same function in the updated Lambda execution environment. Let’s see how increasing memory affects the duration of the function. Here are the results of using memory configurations between 1 and 10 GB. To get these numbers, I ran 20 invocations for each memory configuration. Then, I computed the average duration, discarding function initializations. To avoid possible outliers, I also excluded from the average the top and bottom 10% of reported durations. Based on the results, I estimated the charges I would have for 1 million invocations with each configuration.

Graph showing Function Duration and Charges for 1M Invocations as Memory Increases

As you can see, the function is able to use the additional CPU power that comes with more memory, decreasing the duration of the invocations. What is interesting is the impact of increasing memory on my costs.

Lambda charges are related to memory and duration, so if I increase memory and this is reducing duration by the same proportion, the overall charges are about the same. For example, looking at the graph above, when I configure 5 GB of memory, I have the same costs as when I have 1 GB of memory (about $61 for one million invocations), but the function is 5x faster. If I need lower latency, I can increase memory up to 10 GB, where the function is 7.6x faster and I pay a little more ($80 for one million invocations).
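To see why, consider the GB-second math with illustrative durations, assuming the duration shrinks in proportion to the memory increase:

  • 1 GB * 1,000 ms = 1.0 GB-second per invocation
  • 5 GB * 200 ms = 1.0 GB-second per invocation

The compute charge per invocation is the same, but the second configuration returns 5x faster.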

Depending on your code and business case, you can find out which memory configuration gives the optimal trade-off between cost and performance. To help you with that, my colleague and friend Alex Casalboni started the AWS Lambda Power Tuning project to help you optimize your Lambda functions in a data-driven way. This open source tool is really useful and has been improved by the support of many contributors. Give it a try!

In my tests, PyTorch is also using the optimizations of the Advanced Vector Extensions 2 (AVX2) instruction set, now available in the Lambda execution environment. With the AVX2 instruction set, the processor allows running a certain set of operations simultaneously. This is extremely beneficial for applications with operations that can run in parallel such as matrix multiplication. As a result, using AVX2 can improve performance by increasing CPU throughput per cycle. This typically helps compute intensive workloads such as machine learning inference, multimedia processing, scientific simulations, and financial modeling applications.

Available Now
AWS Lambda support for larger functions is available in Africa (Cape Town), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), EU (Frankfurt), EU (Ireland), EU (London), EU (Milan), EU (Paris), EU (Stockholm), South America (São Paulo), US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon).

You can configure up to 10 GB of memory for new or existing Lambda functions using the AWS Management Console, AWS Command Line Interface (CLI), AWS SDKs, and Serverless Application Model.
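For example, with the AWS CLI (the function name is a placeholder):

$ aws lambda update-function-configuration --function-name my-function --memory-size 10240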

Here’s a snapshot of the new console experience. We replaced the slider with a field, and you can now configure memory in 1 MB increments (it was 64 MB increments before). In this way, the console works similarly to the Lambda API, which has always accepted memory configurations with 1 MB granularity.

There is no change in Lambda pricing: you pay for requests and usage, with duration and Provisioned Concurrency charged at a rate proportional to the amount of memory configured.

Start using Lambda functions with up to 10 GB of memory and 6 vCPUs today.

Danilo

New for AWS Lambda – 1ms Billing Granularity Adds Cost Savings

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-lambda-1ms-billing-granularity-adds-cost-savings/

What I like about AWS Lambda is that it lets you run code without provisioning or managing servers, and you pay only for what you use. Since we launched Lambda in 2014, you have been charged for the number of times your code is triggered (requests) and for the time your code executes, rounded up to the nearest 100ms (duration).

Starting today, we are rounding up duration to the nearest millisecond with no minimum execution time.

With this new pricing, you are going to pay less most of the time, but it’s going to be more noticeable when you have functions whose execution time is much lower than 100ms, such as low latency APIs.

For example, let’s look at a simple web app that I have running. In the Amazon CloudWatch Logs, for each invocation there is a REPORT line. To improve readability, I am breaking each REPORT line across two lines here:

REPORT RequestId: 35a7e0cb-4902-490d-b8d3-eb315dded660
Duration: 27.40 ms  Billed Duration: 100 ms Memory Size: 1024 MB  Max Memory Used: 472 MB

With 1ms billing granularity that becomes:

REPORT RequestId: a24d03b5-429d-4ca3-a490-878a52a0182f
Duration: 27.55 ms  Billed Duration: 28 ms Memory Size: 1024 MB  Max Memory Used: 472 MB

My application doesn’t have a lot of traffic, so let’s do a simple production scenario. Let’s say I have 100,000 users for a web/mobile app. I expect each user to call this function via the web/mobile app about 20 times per day. The duration of those invocations is on average 28ms. Each month, I should expect:

  • 100,000 users * 20 invocations * 30 days = 60 million invocations.

Let’s estimate the costs in US East (N. Virginia). For simplicity, I am not considering the Lambda free tier.

The Lambda monthly request charges are unchanged:

  • 60 million invocations * $0.20 per 1M requests = $12

To that, I have to add compute charges based on duration.

The Lambda monthly compute charges with the old 100ms rounded up pricing would have been:

  • 60 million invocations * 100 ms * 1 GB memory * $0.0000166667 for every GB-second = $100

With the new 1ms billing granularity, the duration costs are:

  • 60 million invocations * 28 ms * 1 GB memory * $0.0000166667 for every GB-second = $28

For this scenario, overall costs including request and compute charges are much cheaper ($40) than before ($112).

With this pricing, there is now more of an incentive to optimize the duration of functions even if it is already well below 100ms. Your engineering efforts can reduce costs even more.

If you increase memory to get more CPU power and speed up your functions, you now get the benefit of a lower billed duration below 100ms as well. That means that increasing performance and reducing latency is going to be cheaper than before.

We are applying 1ms billing granularity for duration, including when you have Provisioned Concurrency enabled, in all AWS Regions with the exception of those based in China starting with the December 2020 billing period. Regions in China will get the change from January.

Enjoy the new pricing!

Danilo

A Thanksgiving 2020 Reading List

Post Syndicated from Val Vesa original https://blog.cloudflare.com/a-thanksgiving-2020-reading-list/

A Thanksgiving 2020 Reading List

While our colleagues in the US are celebrating Thanksgiving this week and taking a long weekend off, there is a lot going on at Cloudflare. The EMEA team is having a full day on CloudflareTV with a series of live shows celebrating #CloudflareCareersDay.

So if you want to relax in an active and learning way this weekend, here are some of the topics we’ve covered on the Cloudflare blog this past week that you may find interesting.

Improving Performance and Search Rankings with Cloudflare for Fun and Profit

Making things fast is one of the things we do at Cloudflare. More responsive websites, apps, APIs, and networks directly translate into improved conversion and user experience. On November 10, Google announced that Google Search will directly take web performance and page experience data into account when ranking results on their search engine results pages (SERPs), beginning in May 2021.

Rustam Lalkaka and Rita Kozlov explain in this blog post how Google Search will prioritize results based on how pages score on Core Web Vitals, a measurement methodology Cloudflare has worked closely with Google to establish, and we have implemented support for in our analytics tools. Read the full blog post.

Getting to the Core: Benchmarking Cloudflare’s Latest Server Hardware

At the Cloudflare Core, we process logs to analyze attacks and compute analytics. In 2020, our Core servers were in need of a refresh, so we decided to redesign the hardware to be more in line with our Gen X edge servers. We designed two major server variants for the core. The first is Core Compute 2020, an AMD-based server for analytics and general-purpose compute paired with solid-state storage drives. The second is Core Storage 2020, an Intel-based server with twelve spinning disks to run database workloads. This refresh of the hardware that Cloudflare uses to run analytics provided big efficiency improvements.

Read the full blog post by Brian Bassett

Moving Quicksilver into production

We previously explained how and why we built Quicksilver. Quicksilver is the data store responsible for storing and distributing the billions of KV pairs used to configure the millions of sites and Internet services which use Cloudflare. This second blog post is about the long journey to production, which culminated in the removal of Kyoto Tycoon from Cloudflare’s infrastructure after the first signs of its obsolescence.

Geoffrey Plouviez takes you through the entire story of real-world engineering challenges and what it’s like to replace one of Cloudflare’s oldest critical components: read the full blog post here.

Building Black Friday e-commerce experiences with JAMstack and Cloudflare Workers

In this blog post, we explore how Cloudflare Workers continues to excel as a JAMstack deployment platform, and how it can be used to power e-commerce experiences, integrating with familiar tools like Stripe, as well as new technologies like Nuxt.js, and Sanity.io.

Read the full blog post and get all the details and open-source code from Kristian Freeman.

A Byzantine failure in the real world

When we review design documents at Cloudflare, we are always on the lookout for Single Points of Failure (SPOFs). In this post, we present a timeline of a real-world incident, and how an interesting failure mode known as a Byzantine fault played a role in a cascading series of events.

Tom Lianza and Chris Snook’s full blog post describes the consequences of a malfunctioning switch on a system built for reliability.

ASICs at the Edge

At Cloudflare, we pride ourselves in our global network that spans more than 200 cities in over 100 countries. To accelerate all that traffic through our network, there are multiple technologies at play. So let’s have a look at one of the cornerstones that makes all of this work.

Tom Strickx’s epic deep dive into ASICs is here.

Let us know your thoughts and comments below or feel free to also reach out to us via our social media channels. And because we talked about careers in the beginning of this blog post, check out our available jobs if you are interested to join Cloudflare.