Tag Archives: nodejs

Building a location-based, scalable, serverless web app – part 3

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-location-based-scalable-serverless-web-app-part-3/

In part 2, I cover the API configuration, geohashing algorithm, and real-time messaging architecture used in the Ask Around Me web application. These are needed for receiving and processing questions and answers, and sending results back to users in real time.

In this post, I explain the backend processing architecture, how data is aggregated, and how to deploy the final application to production. The code and instructions for this application are available in the GitHub repo.

Processing questions

The frontend sends new user questions to the backend via the POST questions API. While the predicted volume of questions is only 1,000 per hour, it’s possible for usage to spike unexpectedly. To help handle this load, the PostQuestions Lambda function puts incoming questions onto an Amazon SQS queue. The ProcessQuestions function takes messages from the Questions queue in batches of 10, and loads these into the Questions table in Amazon DynamoDB.

Questions processing architecture

This asynchronous process smooths out traffic spikes, ensuring that the application is not throttled by DynamoDB. It also provides consistent response times to the front-end POST request, since the API call returns as soon as the message is durably persisted to the queue.

Currently, the ProcessQuestions function does not parse or validate user questions. It would be easy to add message filtering at this stage, using Amazon Comprehend to detect sentiment or inappropriate language. These changes would increase the processing time per question, but by handling this asynchronously, the initial POST API latency is not adversely affected.
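
As a rough sketch of what such a filter could look like – this is not part of the current codebase, and the helper name and threshold are illustrative – a sentiment check with the AWS SDK might be:

const AWS = require('aws-sdk')
const comprehend = new AWS.Comprehend()

// Illustrative helper: returns true if Amazon Comprehend classifies the
// question text as negative. Assumes English-language questions.
const isNegative = async (text) => {
  const { Sentiment } = await comprehend.detectSentiment({
    Text: text,
    LanguageCode: 'en'
  }).promise()
  return Sentiment === 'NEGATIVE'
}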

The ProcessQuestions function uses the Geo Library for Amazon DynamoDB to convert the question’s latitude and longitude into a geohash. This geohash attribute is one of the indexes in the underlying DynamoDB table. The GetQuestions function uses the same library to efficiently query questions based on proximity to the user.
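
As an illustration of how a question might be written with this library – the attribute names here are assumptions, and the table manager is configured as in the GetQuestions code shown in part 2 – the put operation looks something like this:

// Sketch only: myGeoTableManager is a GeoDataManager configured as in the
// GetQuestions example (hashKeyLength = 5). Attribute names are illustrative.
const storeQuestion = async (question) => {
  await myGeoTableManager.putPoint({
    RangeKeyValue: { S: question.ID },   // the question ID is the range key
    GeoPoint: {                          // converted to a geohash by the library
      latitude: question.lat,
      longitude: question.lng
    },
    PutItemInput: {
      Item: {
        question: { S: question.text },
        answers: { N: '0' },
        totalScore: { N: '0' }
      }
    }
  }).promise()
}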

There are a couple of different mechanisms used to pass information between the frontend and backend applications. When the frontend first initializes, it retrieves the current location of the user from the browser. It then calls the questions API to get a list of active questions within 5 miles of the current location. This retrieves the state up to this point in time. To receive notifications of new messages posted in the user’s area, the frontend also subscribes to the geohash topic in AWS IoT Core.

Processing answers

Answers processing architecture

The application allows two types of question that have different answer types. First, the rating questions accept an answer with a 0–5 score range. Second, the geography questions accept a geo-point, which is a latitude and longitude representing a location.

Similar to the way questions are handled, answers are also queued before processing. However, the PostAnswers Lambda function sends answers to different queues, depending on question type. Ratings messages are sent to the StarAnswers queue, while geography messages are routed to the GeoAnswers queue. Star ratings are saved as raw data in the Answers table by the ProcessAnswerStar function. Geography answers are first converted to a geohash before they are stored.
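
A minimal sketch of that routing logic is shown below; the queue URL environment variable names and the answer type values are assumptions for illustration:

const AWS = require('aws-sdk')
const sqs = new AWS.SQS()

// Sketch only: route an answer to the queue matching its question type.
// Assumes StarAnswersQueueUrl and GeoAnswersQueueUrl environment variables.
const queueAnswer = async (answer) => {
  const QueueUrl = (answer.type === 'Star')
    ? process.env.StarAnswersQueueUrl
    : process.env.GeoAnswersQueueUrl

  await sqs.sendMessage({
    QueueUrl,
    MessageBody: JSON.stringify(answer)
  }).promise()
}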

It’s possible for users to submit updates to their answers. For a star rating, the processing function simply saves the new score. For geography answers, an updated answer with a latitude and longitude close enough to the original results in the same geohash, so the stored aggregation does not change. The two answer types are handled differently because of the different aggregation processes used for each.

Aggregating data

In this application, the users asking questions are seeking aggregated answers instead of raw data. For example, “How do you rate the park?” shows an average score from users instead of thousands of individual ratings. To maintain performance, this aggregation occurs when new answers are saved to the database, not when the application fetches the question list.

The Answers table emits updates to a DynamoDB stream whenever new items are inserted or updated. The StreamSpecification parameter in the table definition is set to NEW_AND_OLD_IMAGES, meaning the stream record contains both the new and old item record.

New answers to questions are new items in the table, so the stream record only contains the new image. If users update their answers, this creates an updated item in the table, and the stream record contains both the new and old images of the item.

For star ratings, when receiving an updated rating, the Aggregation function uses both images to calculate the delta in the score. For example, if the old rating was 2 and the user changes this to 5, then the delta is 3. The summary score related to the answer is updated in the Questions table, using a DynamoDB update expression:

    // Update the aggregated score on the question item. The range key is the
    // question ID, and 'item' comes from the DynamoDB stream record.
    const result = await myGeoTableManager.updatePoint({
      RangeKeyValue: { S: item.ID }, 
      GeoPoint: {
        latitude: item.lat,
        longitude: item.lng
      },
      UpdateItemInput: {
        UpdateExpression: 'ADD answers :deltaAnswers, totalScore :deltaTotalScore',
        ExpressionAttributeValues: {
          ':deltaAnswers': { N: item.deltaAnswers.toString()},
          ':deltaTotalScore': { N: item.deltaValue.toString()}
        }
      }
    }).promise()

For geo-point answers, the same approach is used, but if the geohash changes, the delta is -1 for the geohash in the old image and +1 for the geohash in the new image. The update expression automatically creates a new geohash attribute on the DynamoDB item if it is not already present:

    // Update the per-geohash counter and the total answer count on the question item
    const result = await myGeoTableManager.updatePoint({
      RangeKeyValue: { S: item.ID }, 
      GeoPoint: {
        latitude: item.lat,
        longitude: item.lng
      },
      UpdateItemInput: {
        UpdateExpression: `ADD ${item.geohash} :deltaAnswers, answers :deltaAnswers`,
        ExpressionAttributeValues: {
          ':deltaAnswers': { N: item.deltaAnswers.toString() }          
        }
      }
    }).promise()

By using a Lambda function as a DynamoDB stream processor, you can aggregate large amounts of data in near real time. The Questions and Answers tables have a one-to-many relationship – many answers belong to one question. As answers are saved, the aggregation process updates the summaries in the Questions table.

The Questions table also publishes updates to another DynamoDB stream. These are consumed by a Lambda function that sends the aggregated update to topics in AWS IoT Core. This is how updated scores are sent back to the frontend client application.
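
A simplified sketch of such a publisher is below; the endpoint environment variable and topic naming are assumptions, since the repo defines the actual topic scheme:

const AWS = require('aws-sdk')

// Sketch only: publish an aggregated question update to AWS IoT Core.
// Assumes IOT_DATA_ENDPOINT holds the account's IoT data endpoint.
const iotData = new AWS.IotData({ endpoint: process.env.IOT_DATA_ENDPOINT })

const publishUpdate = async (question) => {
  await iotData.publish({
    topic: `question/${question.ID}`,   // illustrative topic name
    qos: 0,
    payload: JSON.stringify(question)
  }).promise()
}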

Publishing to production with Amplify Console

At this point, you can run the application on your local development machine and view the application via the localhost Vue.js server. Once you are ready to launch the application to users, you must deploy to production.

Single-page applications are easy to deploy publicly. The build process creates static HTML, JS, and CSS files. These can be served via Amazon S3 and Amazon CloudFront, together with any image and media assets used. Running the build and managing the deployment can be automated using AWS Amplify Console.

In this walkthrough, I use GitHub as the repo provider. You can also use AWS CodeCommit, Bitbucket, GitLab, or upload the build directory from your machine.

To deploy the front end via Amplify Console:

  1. From the AWS Management Console, select the Services dropdown and choose AWS Amplify. From the initial splash screen, choose Get Started under Deploy.
  2. Select GitHub as the repository provider, then choose Continue.
  3. Follow the prompts to enable GitHub access, then select the repository dropdown and choose the repo. In the Branch dropdown, choose master. Choose Next.
  4. In the App build and test settings page, choose Next.
  5. In the Review page, choose Save and deploy.
  6. The final screen shows the deployment pipeline for the connected repo, starting at the Provision phase.

After a few minutes, the Build, Deploy, and Verify steps show green checkmarks. Open the URL in a browser, and you see that the application is now served by the public URL:

Ask Around Me - Deployed application

Finally, before logging in, you must add the URL to the list of allowed URLs in the Auth0 settings:

  1. Log into Auth0 and navigate to the dashboard.
  2. Choose Applications in the menu, then select Ask Around Me from the list of applications.
  3. On the Settings tab, add the application’s URL to Allowed Callback URLs, Allowed Logout URLs, and Allowed Web Origins. Separate it from the existing values with a comma.
  4. Choose Save changes. This allows the newly published domain name to interact with Auth0 to authenticate your application’s users.

Anytime you push changes to the code repository, Amplify Console detects the commit and redeploys the application. If errors are detected, the existing version is presented to users. If there are no errors, the new version is served to visitors.

Conclusion

In the last part of this series, I show how the application queues posted questions and answers. I explain how this asynchronous approach smooths traffic spikes and helps maintain responsive APIs.

I cover how answers are collected from thousands of users and are aggregated using DynamoDB streams. These totals are saved as summaries in the Questions table, and live updates are pushed via AWS IoT Core back to the frontend.

Finally, I show how you can automate deployment using Amplify Console. By connecting the service directly with your code repository, it publishes and serves your application with no need to manually copy files.

To learn more about this application, see the accompanying GitHub repo.

Building a location-based, scalable, serverless web app – part 2

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-a-location-based-scalable-serverless-web-app-part-2/

Part 1 introduces the Ask Around Me web application that allows users to send questions to other local users in real time. I explain the app’s functionality and how using a single-page application (SPA) framework complements a serverless backend. I configure Auth0 for authentication and show how to deploy the frontend and backend. I also introduce how SPA frontends can send and receive data using both a traditional API and real-time messaging via a WebSocket.

In this post, I review the backend architecture, Amazon API Gateway’s HTTP APIs, and the geohashing implementation. The code and instructions for this application are available in the GitHub repo.

Architecture overview

After deploying the application using the repo’s README.md instructions, the backend architecture looks like this:

Ask Around Me backend architecture

The Vue.js frontend primarily interacts with the backend via HTTP APIs using Amazon API Gateway. When users submit questions or answers, the data is sent via the POST API endpoints. When the frontend requests lists of questions or answers, this occurs via the GET API endpoints.

Incoming questions and answers are posted to separate Amazon SQS queues. These queues invoke AWS Lambda functions that process and store the data in the application’s Amazon DynamoDB tables. In the Questions table, the application saves geo-location data and aggregated statistics for each question. The Answers table maintains a record of user IDs and answers to ensure that each user can only post one answer per question.

When new answers are stored in the Answers table, a DynamoDB stream triggers a Lambda aggregation function with the update. This calculates average scores for questions and aggregates data for the heat map, then stores the result in the main Questions table. When the Questions table is updated, its DynamoDB stream invokes the Publish Lambda function. This publishes updates to the relevant topic in AWS IoT Core, which the front-end application subscribes to.

Using HTTP APIs

API Gateway is a common integration service used between the frontend and backend of serverless web applications. You can choose between the standard REST APIs, and the newer HTTP APIs. The choice depends upon which features you need, and cost considerations for your workload.

This application uses JWT authentication via Auth0 and Lambda proxy integration, and both are supported by HTTP APIs. Many advanced features like API key management, Amazon Cognito integration, and usage plans are not required in this application. It’s also important to compare the cost of each service:

API type              Hourly    Daily        Annually
PUT questions         1,000     24,000       8,760,000
GET questions         50,000    1,200,000    438,000,000
PUT answers           10,000    240,000      87,600,000
Total API requests                           534,360,000
REST APIs cost                               $1,870.26
HTTP APIs cost                               $534.36

Using the predicted API usage covered in part 1, you can compare the REST APIs and HTTP APIs overall cost. At an estimated $534 annually, the HTTP APIs option is approximately 30% of the cost of REST APIs.
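
The arithmetic behind these totals is straightforward: assuming roughly $3.50 per million requests for REST APIs and $1.00 per million requests for HTTP APIs (the published pricing at the time of writing, before any tiered discounts), 534.36 million requests works out to about 534.36 × $3.50 ≈ $1,870 with REST APIs versus 534.36 × $1.00 ≈ $534 with HTTP APIs.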

The AWS Serverless Application Model (SAM) template in the repo defines the HTTP API resource and CORS configuration. It also includes the Auth0 authorizer used to validate each API request:

  MyApi:
    Type: AWS::Serverless::HttpApi
    Properties:
      Auth:
        Authorizers:
          MyAuthorizer:
            JwtConfiguration:
              issuer: !Ref Auth0issuer
              audience:
                - https://auth0-jwt-authorizer
            IdentitySource: "$request.header.Authorization"
        DefaultAuthorizer: MyAuthorizer

      CorsConfiguration:
        AllowMethods:
          - GET
          - POST
          - DELETE
          - OPTIONS
        AllowHeaders:
          - "*"   
        AllowOrigins: 
          - "*"   

With the HTTP API resource defined, each Lambda function has an event configuration referencing this resource. All the functions referencing the HTTP API resource automatically use the Auth0 authorizer.

  GetAnswersFunction: 
    Type: AWS::Serverless::Function
    Properties:
      Description: Get all answers for a question
      ... 
      Events:
        Get:
          Type: HttpApi
          Properties:
            Path: /answers/{Key}
            Method: get
            ApiId: !Ref MyApi    

Using geohashing in web applications

A key part of the functionality in Ask Around Me is the ability to find and answer questions near the user. Given the expected volume of questions in this system, this requires an efficient way to query based upon location that maintains performance as traffic grows.

In a naïve implementation, you might compare the current geographical position of the user with the geo-location of each question and answer in the database. But with an expected 1,000 questions per hour, this would soon become a slow operation with O(n) performance.

A more efficient solution is geohashing. This divides the geographical area of the planet into a series of grid cells that are identified by an alphanumeric hash. The first character of the hash identifies one of 32 cells in the grid, roughly 5000 km x 5000 km on the planet. The second character identifies one of 32 squares in that first cell, so combining the first two characters provides a resolution of approximately 1250 km x 1250 km. By the 12th character in the hash, you can identify an area as small as a couple of square inches on Earth. For a more detailed explanation, see this geohashing site.

When using this algorithm, it’s important to choose the correct level of resolution. For Ask Around Me, the frontend searches for questions within 5 miles of the user. You can identify these areas with a 5-character hash. This means you can compare the user’s current location using their geohash, to the geohash stored in the Questions table. This comparison allows you to immediately discard most questions from the search and quickly find the relevant items.
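
To make this concrete, here is a small illustration using the ngeohash npm package – the application itself uses the Geo Library for Amazon DynamoDB, so this is only to show how nearby points share a hash prefix:

const geohash = require('ngeohash')

// Two points roughly a mile apart in midtown Manhattan encode to the same
// 5-character hash, so they fall into the same search cell.
console.log(geohash.encode(40.7589, -73.9851, 5))   // e.g. 'dr5ru'
console.log(geohash.encode(40.7527, -73.9772, 5))   // e.g. 'dr5ru'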

This solution uses the Geo Library for Amazon DynamoDB npm package. Both the GET and POST questions APIs use this library to calculate the geohash when storing and fetching questions. The library requires a dedicated DynamoDB table, which is why user answers are stored in a separate table.

The GET questions API uses the latitude and longitude from the query parameters to query the underlying DynamoDB using this library:

const AWS = require('aws-sdk')
AWS.config.update({region: process.env.AWS_REGION})

const ddb = new AWS.DynamoDB() 
const ddbGeo = require('dynamodb-geo')
const config = new ddbGeo.GeoDataManagerConfiguration(ddb, process.env.TableName)
config.hashKeyLength = 5

const myGeoTableManager = new ddbGeo.GeoDataManager(config)
const SEARCH_RADIUS_METERS = 4000

exports.handler = async (event) => {

  const latitude = parseFloat(event.queryStringParameters.lat)
  const longitude = parseFloat(event.queryStringParameters.lng)

  // Get questions within geo range
  const result = await myGeoTableManager.queryRadius({
    RadiusInMeter: SEARCH_RADIUS_METERS,
    CenterPoint: {
      latitude,
      longitude
    }
  })

  return {
    statusCode: 200,
    body: JSON.stringify(result)
  }
}

The publish/subscribe pattern for real time in web apps

Modern web applications frequently use real-time notifications to keep users informed of state changes. You could achieve this by frequently polling the APIs to fetch new information. However, this approach is usually wasteful, both in cost and compute terms, because most API calls do not return new information. Additionally, if updates are evenly distributed and you poll every n seconds, there is an average delay of n/2 seconds between data becoming available and your application receiving it.

Instead of polling, a better option for many web applications is a WebSocket. Data availability is closer to real time, and the messaging is less frequent. This can be important for web applications used on mobile devices where unnecessary messaging can impact battery life.

This approach uses the publish-subscribe pattern. The frontend makes subscriptions to a backend service, indicating topics of interest. The backend service receives messages from publishers, which are upstream processes in the application. It filters the messages and routes to the appropriate subscribers.

Although powerful, this can be complex to implement due to connectivity issues over networks. For a web application, users may turn off their devices, disconnect Wi-Fi, or become unreachable due to limited coverage. This pattern is generally forward-only, meaning you only receive messages after the point of subscription.

AWS IoT Core simplifies this process, and the JavaScript SDK handles the common reconnection issues. The backend application sends messages to topics in AWS IoT Core, and the frontend application subscribes to topics of interest. The service maintains the list of active publishers and subscribers, and routes messages between the two. It also automatically manages fan-out, which occurs when there are many subscribers to a single topic.

From a pricing perspective, this is also a cost-efficient approach. At the time of writing, AWS IoT Core costs $0.08 per million minutes of connection, and $1.00 per million messages. There are also no servers to manage, and the service scales automatically to handle your application’s load.

In the example application, the real-time connection is configured and managed in a single component, IoT.vue. This initiates a connection to an IoT endpoint when the application first starts, and listens for messages on subscribed topics. It passes data back to the global Vuex store so other components automatically receive updates with no dependency on the IoT component.
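
The component’s wiring is in the repo, but a simplified sketch of the subscription pattern with the aws-iot-device-sdk package looks like the following; the endpoint, credentials, and topic are placeholders:

const awsIot = require('aws-iot-device-sdk')

// Sketch only: connect over WebSockets with temporary AWS credentials
// (in the real application these come from the authenticated session).
const client = awsIot.device({
  protocol: 'wss',
  host: 'YOUR_IOT_ENDPOINT',        // placeholder
  accessKeyId: 'TEMP_ACCESS_KEY',   // placeholder
  secretKey: 'TEMP_SECRET_KEY',     // placeholder
  sessionToken: 'TEMP_SESSION'      // placeholder
})

client.on('connect', () => {
  client.subscribe('dr5ru')         // e.g. the user's current geohash topic
})

client.on('message', (topic, payload) => {
  // Hand the update to the rest of the app (the example app commits to Vuex)
  console.log(topic, JSON.parse(payload.toString()))
})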

Choosing publish-subscribe topics for web apps

In a typical synchronous API call, the client application makes a specific request and receives a response from a backend service. With a topic-based subscription, the topic itself is the equivalent of the request, but you usually don’t receive immediate information.

In this web application, there are a number of topics that are potentially important to users. Some topics are shared across multiple users, while others are private to a single user:

  • Account-level topic: messages relating only to a single user ID, such as billing and notifications. These are intended for any devices where that user is logged in.
  • Per-question topic: when a user asks a question, they need alerts when new answers arrive. Each question ID maps to an individual topic. Anyone who asks or watches a question subscribes to this topic.
  • Geo-fenced alert topic: a user receives alerts when new questions are asked in their local area. In this case, the geohash of their location is the topic identifier. New questions are published to their geohash topics, and users within the same geohash area receive those messages.
  • A system-wide topic: this is a single topic that all users subscribe to. This is reserved for important messages for all application users.

In web applications, you subscribe to some topics when the application initializes, such as account-level or system-wide topics. Other subscriptions are dynamic. For example, you subscribe to a question ID topic only after posting a question, or subscribe to different geo-fence hashes when the user’s location changes.

Conclusion

This post explores the backend architecture of the Ask Around Me application. I compare the cost and features in deciding between REST APIs and HTTP APIs in API Gateway. I introduce geohashing and the npm library used to handle geo-location queries in DynamoDB. And I show how you can build real-time messaging into your web applications using the publish-subscribe pattern with AWS IoT Core.

To learn more, visit the application’s code repo on GitHub.

Creating a scalable serverless import process for Amazon DynamoDB

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/creating-a-scalable-serverless-import-process-for-amazon-dynamodb/

Amazon DynamoDB is a web-scale NoSQL database designed to provide low latency access to data. It’s well suited to many serverless applications as a primary data store, and fits into many common enterprise architectures. In this post, I show how you can import large amounts of data to DynamoDB using a serverless approach. This uses Amazon S3 as a staging area and AWS Lambda for the custom business logic.

This pattern is useful as a general import mechanism into DynamoDB because it separates the challenge of scaling from the data transformation logic. The incoming data is stored in S3 objects, formatted as JSON, CSV, or any custom format your applications produce. The process works whether you import only a few large files or many small files. It takes advantage of parallelization to import data quickly into a DynamoDB table.

Using S3-to-Lambda to import at scale to DynamoDB.

This is useful for applications where upstream services produce transaction information, and can be effective for handling data generated by spiky workloads. Alternatively, it’s also a simple way to migrate from another data source to DynamoDB, especially for large datasets.

In this blog post, I show two different import applications. The first is a direct import into the DynamoDB table. The second explores a more advanced method for smoothing out volume in the import process. The code uses the AWS Serverless Application Model (SAM), enabling you to deploy the application in your own AWS Account. This walkthrough creates resources covered in the AWS Free Tier but you may incur cost for large data imports.

To set up both example applications, visit the GitHub repo and follow the instructions in the README.md file.

Directly importing data from S3 to DynamoDB

The first example application loads data directly from S3 to DynamoDB via a Lambda function. This uses the following architecture:

Architecture for the first example application.

  1. A downstream process creates source import data in JSON format and writes to an S3 bucket.
  2. When the objects are saved, S3 invokes the main Lambda function.
  3. The function reads the S3 object and converts the JSON into the correct format for the DynamoDB table. It uploads this data in batches to the table.
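
A hedged sketch of that import function is shown below. It assumes each S3 object contains a JSON array of items, and it omits retry handling for unprocessed batch items; the repo’s implementation may differ in detail:

const AWS = require('aws-sdk')
const s3 = new AWS.S3()
const docClient = new AWS.DynamoDB.DocumentClient()

exports.handler = async (event) => {
  // An S3 event can contain multiple records
  for (const record of event.Records) {
    const object = await s3.getObject({
      Bucket: record.s3.bucket.name,
      Key: decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '))
    }).promise()

    const items = JSON.parse(object.Body.toString())

    // DynamoDB batchWrite accepts up to 25 items per request
    for (let i = 0; i < items.length; i += 25) {
      const batch = items.slice(i, i + 25).map((Item) => ({ PutRequest: { Item } }))
      await docClient.batchWrite({
        RequestItems: { [process.env.DDBtable]: batch }
      }).promise()
    }
  }
}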

The repo’s SAM template creates a DynamoDB table with a partition key, configured to use on-demand capacity. This mode enables the DynamoDB service to scale appropriately to match the number of writes required by the import process. This means you do not need to manage DynamoDB table capacity, as you would in the standard provisioned mode.

  DDBtable:
    Type: AWS::DynamoDB::Table
    Properties:
      AttributeDefinitions:
      - AttributeName: ID
        AttributeType: S
      KeySchema:
      - AttributeName: ID
        KeyType: HASH
      BillingMode: PAY_PER_REQUEST

The template defines the Lambda function to import the data:

  ImportFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: importFunction/
      Handler: app.handler
      Runtime: nodejs12.x
      MemorySize: 512
      Environment:
        Variables:
          DDBtable: !Ref DDBtable      
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref DDBtable        
        - S3ReadPolicy:
            BucketName: !Ref InputBucketName
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref InputS3Bucket
            Events: s3:ObjectCreated:*
            Filter: 
              S3Key:
                Rules:
                  - Name: suffix
                    Value: '.json'            

This uses SAM policy templates to provide write access to the DynamoDB table and read access to the S3 bucket. It also defines the event that causes the function invocation from S3, filtering only for new objects with a .json suffix.

Testing the application

  1. Deploy the first application by following the README.md in the GitHub repo, and note the application’s S3 bucket name and DynamoDB table name.
  2. Change into the dataGenerator directory:
    cd ./dataGenerator
  3. Create sample data for testing. The following command creates 10 files of 100 records each:
    node ./app.js 100 10
  4. Upload the sample data into your application’s S3 bucket, replacing your-bucket below with your bucket name:
    aws s3 cp ./data/ s3://your-bucket --recursiveYour console output shows the following, confirming that the sample data is uploaded to S3.Generating and uploading sample data for testing.
  5. After a few seconds, enter this command to show the number of items in the application’s DynamoDB table, replacing your-table with your deployed table name:
    aws dynamodb scan --table-name your-table --select "COUNT"
    The console output shows that 1,000 items are now stored in the DynamoDB table, confirming that the files are successfully imported.

With on-demand provisioning, the per-table limit of 40,000 write request units still applies. For high volumes or sudden, spiky workloads, DynamoDB may throttle the import when using this approach. Any throttling events appear in the Metrics tab of the table in the DynamoDB service console. Throttling is intended to protect your infrastructure, but there are times when you want to process these high volumes. The second application in the repo shows how to address this.

Handling extreme loads and variability in the import process

In this next example, the goal is to smooth out the traffic, so that the load process into DynamoDB is much more consistent. The key service used to achieve this is Amazon SQS, which holds all the items until a loader process stores the data in DynamoDB. The architecture looks like this:

Architecture for the second example application.

  1. A downstream process creates source import data in JSON format and writes to an S3 bucket.
  2. When the objects are saved, S3 invokes a Lambda function that transforms the input and adds these as messages in an Amazon SQS queue.
  3. The Lambda service polls the SQS queue and invokes a function to process the message batches.
  4. The function converts the JSON messages into the correct format for the DynamoDB table. It uploads this data in batches to the table.

Testing the application

In this test, you generate a much larger amount of data using a greater number of S3 objects. The instructions below create 100,000 sample records, so running this code may incur cost on your AWS bill.

  1. Deploy the second application by following the README.md in the GitHub repo, and note the application’s S3 bucket name and DynamoDB table name.
  2. Change into the dataGenerator directory:
    cd ./dataGenerator
  3. Create sample data for testing. The following command creates 100 files of 1,000 records each:
    node ./app.js 1000 100
  4. Upload the sample data into your application’s S3 bucket, replacing your-bucket below with your deployed bucket name:
    aws s3 cp ./data/ s3://your-bucket --recursive
    This process takes around 10 minutes to complete with the default configuration in the repo.
  5. From the DynamoDB console, select the application’s table and then choose the Metrics tab. Select the Write capacity graph to zoom into the chart.

The default configuration deliberately slows down the load process to illustrate how it works. Using this approach, the load into the database is much more consistent, consuming between 125-150 write capacity units (WCUs) per minute. This design makes it possible to vary how quickly you load data into the DynamoDB table, depending upon the needs of your use-case.

How this works

In this second application, there are multiple points where the application uses a configuration setting to throttle the flow of data to the next step.

Throttling points in the architecture.

  1. AddToQueue function: this loads data from the source S3 object into SQS in messages containing batches of 25 records. Depending on the size of your source records, you may add more records into a single SQS message, which has a size limit of 256 KB. You can also compress the message with gzip to fit more records (see the sketch after this list).
  2. Function concurrency: the SAM template sets the Loader function’s concurrency to 1, using the ReservedConcurrentExecutions attribute. In effect, this stops Lambda from scaling this function, which means it keeps fetching the next batch from SQS as soon as processing finishes. The concurrency is a multiplier – as this value is increased, the loading into the DynamoDB table increases proportionately, if there are messages available in SQS. Select a value greater than 1 to use parallelization in the load process.
  3. Loader function: this consumes messages from the SQS queue. The BatchSize configured in the SAM template is set to four messages per invocation. Since each message contains 25 records, this represents 100 records per invocation when the queue has enough messages. You can set a BatchSize value from 1 to 10, so you could increase this from the application’s default.
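
As a sketch of the batching in point 1 – the queue URL environment variable name is an assumption, and gzip compression is omitted – the AddToQueue logic could look like this:

const AWS = require('aws-sdk')
const sqs = new AWS.SQS()

// Sketch only: group source records into SQS messages of 25 records each.
const enqueueRecords = async (records) => {
  for (let i = 0; i < records.length; i += 25) {
    await sqs.sendMessage({
      QueueUrl: process.env.SQSqueueURL,          // assumed environment variable
      MessageBody: JSON.stringify(records.slice(i, i + 25))
    }).promise()
  }
}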

When you combine these settings, it’s possible to dramatically increase the throughput for loading data into DynamoDB. Increasing the load also increases WCUs consumed, which increases cost. Your use case can inform you about the optimal balance between speed and cost. You can make changes, too – it’s simple to make adjustments to meet your requirements.

Additionally, each of the services used has its own service limits. For high production loads, it’s important to understand the quotas set, and whether these are soft or hard limits. If your application requires higher throughput, you can request raising soft limits via an AWS Support Center ticket.

Conclusion

DynamoDB does not offer a native import process and existing solutions may not meet your needs for unplanned, large-scale imports. The AWS Database Migration Service is not serverless, and the AWS Data Pipeline is schedule-based rather than event-based. This solution is designed to provide a fully serverless alternative that responds to incoming data to S3 on-demand.

In this post, I show how you can create a simple import process directly to the DynamoDB table, triggered by objects put into an S3 bucket. This provides a near-real time import process. I also show a more advanced approach to smooth out traffic for high-volume or spiky workloads. This helps create a resilient and consistent data import for DynamoDB.

To learn more, watch this video to see how to deploy and test the DynamoDB importer application.

Automating scalable business workflows using minimal code

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/automating-scalable-business-workflows-using-minimal-code/

Organizations frequently have complex workflows embedded in their processes. When a customer places an order, it triggers a workflow. Or when an employee requests vacation time, this starts another set of processes. Managing these at scale can be challenging in traditional applications, which must often manage thousands of separate tasks.

In this blog post, I show how to use a serverless application to build and manage enterprise workflows at scale. This minimal-code solution is highly scalable and flexible, and can be modified easily to meet your needs. This application uses Amazon S3, AWS Lambda, and AWS Step Functions:

Using S3-to-Lambda to trigger Step Functions workflows

AWS Step Functions allows you to represent workflows as a JSON state machine. This service can help remove custom code and convoluted logic from distributed systems, making them easier to maintain and modify. S3 is a highly scalable service that stores trillions of objects, and Lambda runs custom code in response to events. By combining these services, it’s simple to build resilient workflows with high throughput, triggered by putting objects in S3 buckets.

There are many business use-cases for this approach. For example, you could automatically pay invoices from approved vendors under a threshold amount by reading the invoices stored in S3 using Amazon Textract. Or your application could automatically book consultations for patients emailing their completed authorization forms. Almost any action that is triggered by a document or form is a potential candidate for an automated workflow solution.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file. The code uses the AWS Serverless Application Model (SAM), enabling you to deploy the application in your own AWS account. This walkthrough creates resources covered in the AWS Free Tier but you may incur cost if you test with large amounts of data.

How the application works

The starting point for this serverless solution is S3. When new objects are stored, this triggers a Lambda function that starts an execution in the Step Functions workflow. Lambda scales to keep pace as more objects are written to the S3 bucket, and Step Functions creates a separate execution for each S3 object. It also manages the state of all the distinct workflows.

Simple Step Functions workflow.

  1. A downstream process stores data in the S3 bucket.
  2. This invokes the Start Execution Lambda function. The function creates a new execution in Step Functions using the S3 object as event data, as sketched after this list.
  3. The workflow invokes the Decider function. This uses Amazon Rekognition to detect the contents of objects stored in S3.
  4. This function uses environment variables to determine the matching attributes. If the S3 object matches the criteria, it triggers the Match function. Otherwise, the No Match function is invoked.
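
A minimal sketch of the Start Execution function in step 2 is below; the input shape passed to the state machine is an assumption for illustration:

const AWS = require('aws-sdk')
const stepFunctions = new AWS.StepFunctions()

// Sketch only: start one state machine execution per S3 object.
// Assumes stateMachineArn is provided as an environment variable,
// as in the SAM template shown below.
exports.handler = async (event) => {
  for (const record of event.Records) {
    await stepFunctions.startExecution({
      stateMachineArn: process.env.stateMachineArn,
      input: JSON.stringify({
        Bucket: record.s3.bucket.name,
        Key: record.s3.object.key
      })
    }).promise()
  }
}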

The application’s SAM template configures the Step Functions state machine as JSON. It also defines an IAM role allowing Step Functions to invoke the Lambda functions. The initial function invoked by S3 is defined to accept the state machine ARN as an environment variable. The template also defines the permissions needed and the S3 trigger:

  StartExecutionFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: StartExecutionFunction/
      Handler: app.handler
      Runtime: nodejs12.x
      MemorySize: 128
      Environment:
        Variables:
          stateMachineArn: !Ref 'MatcherStateMachine'
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref InputBucketName
        - Statement:
          - Effect: Allow
            Resource: !Ref 'MatcherStateMachine'
            Action:
              - states:*
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref InputBucket
            Events: s3:ObjectCreated:*

This uses SAM policy templates to provide read access to the S3 bucket. It also defines the event that causes the function invocation from S3, filtering only for new objects with a .json suffix.

The Decider function is the first step of the Step Functions workflow. It uses Amazon Rekognition to detect labels and words from the images provided. The SAM template passes the required labels and words to the function, together with an optional confidence score:

  DeciderFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: deciderFunction/
      Handler: app.handler
      Environment:
        Variables:
          requiredWords: "NEW YORK"
          requiredLabels: "Driving License,Person"
          minConfidence: 70

If the requiredLabels environment variable is present, the function’s code calls Amazon Rekognition’s detectLabels method. It then calls the detectText method if the requiredWords environment variable is used:

// The standard Lambda handler
exports.handler = async (event) => {
  return await processDocument(event)
}

// Detect words/labels on document or image
const processDocument = async (event) => {

  // If using a required labels
  if (process.env.requiredLabels) {
    // If no match, return immediately
    if (!await checkRequiredLabels(event)) return 'NoMatch'
  }  

  // If using a required words test
  if (process.env.requiredWords) {
    // If no match, return immediately
    if (!await checkRequiredWords(event)) return 'NoMatch'
  }

  return 'Match'
}
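
The helper functions live in the repo. As a rough sketch, checkRequiredLabels could call Rekognition as follows; the input shape (Bucket and Key attributes) matches the Start Execution sketch above and is an assumption:

const AWS = require('aws-sdk')
const rekognition = new AWS.Rekognition()

// Sketch only: true if every label in process.env.requiredLabels is detected
// in the image with at least process.env.minConfidence confidence.
const checkRequiredLabels = async (event) => {
  const result = await rekognition.detectLabels({
    Image: {
      S3Object: {
        Bucket: event.Bucket,     // assumed input shape
        Name: event.Key
      }
    },
    MinConfidence: parseInt(process.env.minConfidence, 10)
  }).promise()

  const found = result.Labels.map((label) => label.Name)
  return process.env.requiredLabels.split(',').every((required) => found.includes(required))
}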

The Decider function returns “Match” or “NoMatch” to the Step Functions workflow. This invokes downstream functions depending on the result. The Match and No Match functions are stubs where you can build the intended functionality in the workflow. This Step Functions workflow is designed generically so you can extend the functionality easily.

Testing the application

Deploy the first application by following the README.md in the GitHub repo, and note the application’s S3 bucket name. There are three test cases:

  • Create a workflow for a matched subject in an image. From photos uploaded to S3, identify which images contain one or more subjects, and invoke the Match path of the workflow.
  • Create a workflow for invoices from a specific vendor. From multiple uploaded invoices, match those from a specific vendor and trigger the Match path of the workflow.
  • Create a workflow for driver’s licenses issued by a single state. From a collection of driver’s licenses, trigger the Match workflow for only a single state.

1. Create a workflow for matched subject in an image

In this example, the application identifies cats in images uploaded to the S3 bucket. The default configuration in the SAM template in the GitHub repo contains the environment variables set for this example:

Environment variables in SAM template.

First, I upload over 20 images of various animals to the S3 bucket:

Uploading files to the S3 bucket.

After navigating to the Step Functions console and selecting the application’s state machine, I see 24 separate executions, one per image:

Step Functions execution detail.

I select one of these executions, for cat3.jpg. This has followed the MatchFound execution path of the workflow:

MatchFound execution path.

2. Create a workflow for invoices for a specified vendor.

For this example, the application looks for a customer account number and vendor name in invoices uploaded to the S3 bucket. The Decider function uses environment variables to determine the matching keywords. These can be updated by either deploying the SAM template or editing the Lambda function directly.

I modify the SAM template to match the vendor name and account number as follows:

SAM template with vendor information.

Next I upload several different invoices from the local machine to the S3 bucket:

Uploading different files to the S3 bucket.

In the Step Functions console, I select the execution for utility-bill.png. This execution matches the criteria and follows the MatchFound path in the workflow.

MatchFound path in visual workflow.

3. Matching a driver’s license by state

In this example, the application routes based upon the state where a driver’s license is issued. For this test, I use a range of sample images of licenses from DMVs in multiple states.

I modify the SAM template so the Decider function uses both label and word detection. I set “Driving License” and “Person” as required labels. This ensures that Amazon Rekognition identifies that a person is in the photo, in addition to the document type.

Environment variables in the SAM template.

Next, I upload the driver’s license images to the S3 bucket:

Uploading files to the S3 bucket.

In the Step Functions console, I open the execution for the driver-license-ny.png file, and it has followed the MatchFound path in the workflow:

Execution path for driver's license test.

When I select the execution for the Texas driver’s license, it did not match and followed the NoMatchFound execution path:

Execution path for NoMatchFound.

Extending the functionality

By triggering Step Functions workflows from S3 PutObject events, this application is highly scalable. As more objects are stored in the S3 bucket, it creates as many executions as needed in the state machine. The custom code only handles the specific logic requirements for a single object and the Lambda service scales up to meet demand.

In these examples, the application uses Amazon Rekognition to analyze specific document types or image contents. You could extend this logic to include value ranges, multiple alternative workflow paths, or include steps to enable human intervention.

Using Step Functions also makes it easy to modify workflows as requirements change. Any incomplete workflows continue on the existing version of the state machine used when they started. As a result, you can add steps without impacting existing code, making it faster to adapt applications to users’ needs.

Conclusion

You can use Step Functions to model many common business workflows with JSON. Combining this powerful workflow management service with the scalability of S3 and Lambda, you can quickly build nuanced solutions that operate at scale.

In this post, I show how you can deploy a simple Step Functions workflow where executions are created by objects stored in an S3 bucket. Using minimal code, it can perform complex workflow routing tasks based on document types and contents. This provides a highly flexible and scalable way to manage common organizational workflow needs.

Translating documents at enterprise scale with serverless

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/translating-documents-at-enterprise-scale-with-serverless/

For organizations operating in multiple countries, helping customers in different languages is an everyday reality. But in many IT systems, data remains static in a single language, making it difficult or impossible for international customers to use. In this blog post, I show how you can automate language translation at scale to solve a number of common enterprise problems.

Many types of data are good targets for translation. For example, product catalog information, for sharing with a geographically broad customer base. Or customer emails and interactions in multiple languages, translating back to a single language for analytics. Or even resource files for mobile applications, to automate string processing into different languages during the build process.

Building the machine learning models for language translation is extraordinarily complex, so fortunately you have a pay-as-you-go service available in Amazon Translate. This can accurately translate text between 54 languages, and automatically detects the source language.

Developing a scalable translation solution for thousands of documents can be challenging using traditional, server-based architecture. Using a serverless approach, this becomes much easier since you can use storage and compute services that scale for you – Amazon S3 and AWS Lambda:

Integrating S3 with Translate via Lambda.

In this post, I show two solutions that provide an event-based architecture for automated translation. In the first, S3 invokes a Lambda function when objects are stored. It immediately reads the file and requests a translation from Amazon Translate. The second solution explores a more advanced method for scaling to large numbers of documents, queuing the requests and tracking their state. The walkthrough creates resources covered in the AWS Free Tier but you may incur costs for usage.

To set up both example applications, visit the GitHub repo and follow the instructions in the README.md file. Both applications use the AWS Serverless Application Model (SAM) to make it easy to deploy in your AWS account.

Translating in near real time

In the first application, the workflow is straightforward. The source text is sent immediately to Translate for processing, and the result is saved back into the S3 bucket. It provides near real-time translation whenever an object is saved. This uses the following architecture:

Architecture for the first example application.

  1. The source text is saved in the Batching S3 bucket.
  2. The S3 put event invokes the Batching Lambda function. Since Translate has a limit of 5,000 characters per request, it slices the contents of the input into parts small enough for processing.
  3. The resulting parts are saved in the Translation S3 bucket.
  4. The S3 put events invoke the Translation function, which scales up concurrently depending on the number of parts.
  5. Amazon Translate returns the translations back to the Lambda function, which saves the results in the Translation bucket.
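
A simplified sketch of steps 4 and 5 is shown below; the helper name and output key format are assumptions, and error handling is omitted:

const AWS = require('aws-sdk')
const translate = new AWS.Translate()
const s3 = new AWS.S3()

// Sketch only: translate one text part into one target language and save
// the result to the translations/ prefix.
const translatePart = async (Bucket, Key, targetLanguage) => {
  const source = await s3.getObject({ Bucket, Key }).promise()

  const result = await translate.translateText({
    Text: source.Body.toString(),
    SourceLanguageCode: 'auto',           // let the service detect the source language
    TargetLanguageCode: targetLanguage
  }).promise()

  await s3.putObject({
    Bucket,
    Key: `translations/${Key.replace('.txt', '')}-${targetLanguage}.txt`,
    Body: result.TranslatedText
  }).promise()
}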

The repo’s SAM template allows you to specify a list of target languages, as a space-delimited list of supported language codes. In this case, any text uploaded to S3 is translated into French, Spanish, and Italian:

Parameters:
  TargetLanguage:
    Type: String
    Default: 'fr es it'

Testing the application

  1. Deploy the first application by following the README.md in the GitHub repo. Note the application’s S3 Translation and Batching bucket names shown in the output.
  2. The testdata directory contains several sample text files. Change into this directory, then upload coffee.txt to the S3 bucket, replacing your-bucket below with your Translation bucket name:
    cd ./testdata/
    aws s3 cp ./coffee.txt s3://your-bucket
  3. The application invokes the translation workflow, and within a couple of seconds you can list the output files in the translations folder:
    aws s3 ls s3://your-bucket/translations/

    Translations output.

  4. Create an output directory, then download the translations to your local machine to view the contents:
    mkdir output
    aws s3 cp s3://your-bucket/translations/ ./output/ --recursive
    more ./output/coffee-fr.txt
    more ./output/coffee-es.txt
    more ./output/coffee-it.txt
  5. For the next step, translate several text files containing test data. Copy these to the Translation bucket, replacing your-bucket below with your bucket name:
    aws s3 cp ./ s3://your-bucket --include "*.txt" --exclude "*/*" --recursive
  6. After a few seconds, list the files in the translations folder to see your translated files:
    aws s3 ls s3://your-bucket/translations/

    Listing the translated files.

  7. Finally, translate a larger file using the batching process. Copy this file to the Batching S3 bucket (replacing your-bucket with this bucket name):
    cd ../testdata-batching/
    aws s3 cp ./your-filename.txt s3://your-bucket
  8. Since this is a larger file, the batching Lambda function breaks it apart into smaller text files in the Translation bucket. List these files in the terminal, together with their translations:
    aws s3 ls s3://your-bucket
    aws s3 ls s3://your-bucket/translations/

    Listing the output files.

In this example, you can translate a reasonable number of text files for a trivial use-case. However, in an enterprise environment where there could be thousands of files in a single bucket, you need a more robust architecture. The second application introduces a more resilient approach.

Scaling up the translation solution

In an enterprise environment, the application must handle long documents and large quantities of documents. Amazon Translate has service limits in place per account – you can request an increase via an AWS Support Center ticket if needed. However, S3 can ingest a large number of objects quickly, so the application should decouple these two services.

The next example uses Amazon SQS for decoupling, and also introduces Amazon DynamoDB for tracking the status of each translation. The new architecture looks like this:

Decoupled translation architecture.

  1. A downstream process saves text objects in the Batching S3 bucket.
  2. The Batching function breaks these files into smaller parts, saving these in the Translation S3 bucket.
  3. When an object is saved in this bucket, this invokes the Add to Queue function. This writes a message to an SQS queue, and logs the item in a DynamoDB table.
  4. The Translation function receives messages from the SQS queue, and requests translations from the Amazon Translate service.
  5. The function updates the item as completed in the DynamoDB table, and stores the output translation in the Results S3 bucket.
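
For illustration, marking a part as translated in step 5 could be as simple as the following sketch; the table name variable and attribute names are assumptions:

const AWS = require('aws-sdk')
const docClient = new AWS.DynamoDB.DocumentClient()

// Sketch only: mark a translation part as completed in the tracking table.
const markTranslated = async (objectKey) => {
  await docClient.update({
    TableName: process.env.DDBtable,                      // assumed variable name
    Key: { ID: objectKey },
    UpdateExpression: 'SET #s = :translated',
    ExpressionAttributeNames: { '#s': 'status' },         // 'status' is a reserved word
    ExpressionAttributeValues: { ':translated': 'Translated' }
  }).promise()
}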

Testing the application

This test uses a much larger text document – the text version of the novel War and Peace, which is over 3 million characters long. It’s recommended that you use a shorter piece of text for the walkthrough, between 20-50 kilobytes, to minimize cost on your AWS bill.

  1. Deploy the second application by following the README.md in the GitHub repo, and note the application’s S3 bucket name and DynamoDB table name.
    The output values from the SAM deployment.
  2. Download your text sample and then upload it to the Batching bucket. Replace your-bucket with your bucket name and your-text.txt with your text file name:
    aws s3 cp ./your-text.txt s3://your-bucket/
  3. The batching process creates smaller files in the Translation bucket. After a few seconds, list the files in the Translation bucket (replacing your-bucket with your bucket name):
    aws s3 ls s3://patterns-translations-v2/ --recursive --summarize

    Listing the translated files.

  4. To see the status of the translations, navigate to the DynamoDB console. Select Tables in the left-side menu and then choose the application’s DynamoDB table. Select the Items tab.

    This shows each translation file and a status of Queue or Translated.

  5. As translations complete, these appear in the Results bucket:
    aws s3 ls s3://patterns-results-v2/ --summarize

    Listing the output translations.

How this works

In the second application, the SQS queue acts as a buffer between the Batching process and the Translation process. The Translation Lambda function fetches messages from the SQS queue when they are available, and submits the source file to Amazon Translate. This throttles the overall speed of processing.

There are configuration settings you can change in the SAM template to vary the speed of throughput:

  • Translator function: this consumes messages from the SQS queue. The BatchSize configured in the SAM template is set to one message per invocation. This is equivalent to processing one source file at a time. You can set a BatchSize value from 1 to 10, so you could increase this from the application’s default.
  • Function concurrency: the SAM template sets the Loader function’s concurrency to 1, using the ReservedConcurrentExecutions attribute. In effect, this means Lambda runs only one concurrent instance of this function. As a result, it keeps fetching the next batch from SQS as soon as processing finishes. The concurrency is a multiplier – as this value is increased, the translation throughput increases proportionately, if there are messages available in SQS.
  • Amazon Translate limits: the service limits in place are designed to protect you from higher-than-intended usage. If you need higher soft limits, open an AWS Support Center ticket.

Combining these settings, you have considerable control over the speed of processing. The defaults in the sample application are set at the lowest values possible so you can observe the queueing mechanism.

Conclusion

Automated translation using deep learning enables you to make documents available at scale to an international audience. For organizations operating globally, this can improve your user experience and increase customer access to your company’s products and services.

In this post, I show how you can create a serverless application to process large numbers of files stored in S3. The write operation in the S3 bucket triggers the process, and you use SQS to buffer the workload between S3 and the Amazon Translate service. This solution also uses DynamoDB to help track the state of the translated files.

Node.js 12.x runtime now available in AWS Lambda

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/node-js-12-x-runtime-now-available-in-aws-lambda/

We are excited to announce that you can now develop AWS Lambda functions using the Node.js 12.x runtime, which is the current Long Term Support (LTS) version of Node.js. Start using this new version today by specifying a runtime parameter value of nodejs12.x when creating or updating functions.

Language Updates

Here is a quick primer that highlights just some of the new or improved features that come with Node.js 12:

  • Updated V8 engine
  • Public class fields
  • Private class fields
  • TLS improvements

Updated V8 engine

Node.js 12.x is powered by V8 7.4, which is a significant upgrade from V8 6.8 powering the previous Node.js 10.x. This upgrade brings with it performance improvements for faster JavaScript execution, better memory management, and broader support for ECMAScript.

Public class fields

With the upgraded V8 version comes support for public class fields. This enhancement allows for public fields to be defined at the class level, providing cleaner code.

Before:

class User {
	constructor(user){
		this.firstName = user.firstName
		this.lastName = user.lastName
		this.id = idGenerator()
	}
}

After:

class User {
	id = idGenerator()
	
	constructor(user){
		this.firstName = user.firstName
		this.lastName = user.lastName
	}
}

Private class fields

In addition to public class fields, Node.js also supports the use of private fields in a class. These fields are not accessible outside of the class and will throw a SyntaxError if an attempt is made. To mark a field as private in a class simply start the name of the field with a ‘#’.

class User {
	#loginAttempt = 0;
	
	increment() {
		this.#loginAttempt++;
	}
	
	get loginAttemptCount() {
		return this.#loginAttempt;
	}
}

const user = new User()
console.log(user.loginAttemptCount) // returns 0
user.increment()				
console.log(user.loginAttemptCount)	// returns 1
console.log(user.#loginAttempt)		// returns SyntaxError: Private field '#loginAttempt'
									// must be declared in an enclosing class

TLS improvements

As a security improvement, Node.js 12 has also added support for TLS 1.3. This increases the security of TLS connections by removing hard to configure and often vulnerable features like SHA-1, RC4, DES, and AES-CBC. Performance is also increased because TLS 1.3 only requires a single round trip for a TLS handshake compared to earlier versions requiring at least two.

For more information, see the AWS Lambda Developer Guide.

Runtime Updates

Multi-line log events in Node.js 12 will work the same way they did in Node.js 8.10 and before. Node.js 12 will also support exception stack traces in AWS X-Ray helping you to debug and optimize your application. Additionally, to help keep Lambda functions secure, AWS will update Node.js 12 with all minor updates released by the Node.js community.

Deprecation schedule

AWS will be deprecating Node.js 8.10 according to the end-of-life schedule provided by the community. Node.js 8.10 reaches end of life on December 31, 2019. After January 6, 2020, you can no longer create a Node.js 8.10 Lambda function, and the ability to update existing functions will be disabled after February 3, 2020. More information can be found in the AWS Lambda Developer Guide.

Existing Node.js 8.10 functions can be migrated to the new runtime by making any necessary changes to code for compatibility with Node.js 12, and changing the function’s runtime configuration to “nodejs12.x”. Lambda functions running on Node.js 12 will have 2 full years of support.
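
For example, a minimal sketch of this runtime switch using the AWS CLI, assuming a hypothetical function named my-function:

aws lambda update-function-configuration --function-name my-function --runtime nodejs12.x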

Amazon Linux 2

Node.js 12, like Node.js 10, Java 11, and Python 3.8, is based on an Amazon Linux 2 execution environment. Amazon Linux 2 provides a secure, stable, and high-performance execution environment to develop and run cloud and enterprise applications.

Next steps

Get started building with Node.js 12 today by specifying a runtime parameter value of nodejs12.x when creating or updating your Lambda functions.

Happy coding with Node.js 12!

Security updates for Thursday

Post Syndicated from ris original https://lwn.net/Articles/756164/rss

Security updates have been issued by CentOS (389-ds-base, corosync, firefox, java-1.7.0-openjdk, java-1.8.0-openjdk, kernel, librelp, libvirt, libvncserver, libvorbis, PackageKit, patch, pcs, and qemu-kvm), Fedora (asterisk, ca-certificates, gifsicle, ncurses, nodejs-base64-url, nodejs-mixin-deep, and wireshark), Mageia (thunderbird), Red Hat (procps), SUSE (curl, kvm, and libvirt), and Ubuntu (apport, haproxy, and tomcat7, tomcat8).

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/755796/rss

Security updates have been issued by Debian (batik, cups, gitlab, ming, and xdg-utils), Fedora (dpdk, firefox, glibc, nodejs-deep-extend, strongswan, thunderbird, thunderbird-enigmail, wavpack, xdg-utils, and xen), Gentoo (ntp, rkhunter, and zsh), openSUSE (Chromium, GraphicsMagick, jasper, opencv, pdns, and wireshark), SUSE (jasper, java-1_7_1-ibm, krb5, libmodplug, and openstack-nova), and Ubuntu (thunderbird).

Security updates for Friday

Post Syndicated from ris original https://lwn.net/Articles/754257/rss

Security updates have been issued by Arch Linux (libmupdf, mupdf, mupdf-gl, and mupdf-tools), Debian (firebird2.5, firefox-esr, and wget), Fedora (ckeditor, drupal7, firefox, kubernetes, papi, perl-Dancer2, and quassel), openSUSE (cairo, firefox, ImageMagick, libapr1, nodejs6, php7, and tiff), Red Hat (qemu-kvm-rhev), Slackware (mariadb), SUSE (xen), and Ubuntu (openjdk-8).

Fedora 28 released

Post Syndicated from corbet original https://lwn.net/Articles/753208/rss

The Fedora 28 release has been announced. “The headline feature for Fedora 28 Server is the inclusion of the new Modular repository. This lets you select between different versions of software like NodeJS or Django, so you can choose the stack you need for your software.” Some users will also appreciate that proprietary blobs (such as the NVIDIA drivers) are now easier to obtain and install.

Implementing safe AWS Lambda deployments with AWS CodeDeploy

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/implementing-safe-aws-lambda-deployments-with-aws-codedeploy/

This post courtesy of George Mao, AWS Senior Serverless Specialist – Solutions Architect

AWS Lambda and AWS CodeDeploy recently made it possible to automatically shift incoming traffic between two function versions based on a preconfigured rollout strategy. This new feature allows you to gradually shift traffic to the new function. If there are any issues with the new code, you can quickly roll back and control the impact to your application.

Previously, you had to manually move 100% of traffic from the old version to the new version. Now, you can have CodeDeploy automatically execute pre- or post-deployment tests and automate a gradual rollout strategy. Traffic shifting is built right into the AWS Serverless Application Model (SAM), making it easy to define and deploy your traffic shifting capabilities. SAM is an extension of AWS CloudFormation that provides a simplified way of defining serverless applications.

In this post, I show you how to use SAM, CloudFormation, and CodeDeploy to accomplish an automated rollout strategy for safe Lambda deployments.

Scenario

For this walkthrough, you write a Lambda application that returns a count of the S3 buckets that you own. You deploy it and use it in production. Later, the requirements change: your Lambda application must count only the buckets that begin with the letter “a”.

Before you make the change, you need to be sure that your new Lambda application works as expected. If it does have issues, you want to minimize the number of impacted users and roll back easily. To accomplish this, you create a deployment process that publishes the new Lambda function, but does not send any traffic to it. You use CodeDeploy to execute a PreTraffic test to ensure that your new function works as expected. After the test succeeds, CodeDeploy automatically shifts traffic gradually to the new version of the Lambda function.

Your Lambda function is exposed as a REST service via an Amazon API Gateway deployment. This makes it easy to test and integrate.

Prerequisites

To execute the SAM and CloudFormation deployment, you must have the following IAM permissions:

  • cloudformation:*
  • lambda:*
  • codedeploy:*
  • iam:create*

You may use the AWS SAM Local CLI or the AWS CLI to package and deploy your Lambda application. If you choose to use SAM Local, be sure to install it onto your system. For more information, see AWS SAM Local Installation.

All of the code used in this post can be found in this GitHub repository: https://github.com/aws-samples/aws-safe-lambda-deployments.

Walkthrough

For this post, use SAM to define your resources because it comes with built-in CodeDeploy support for safe Lambda deployments. The deployment is handled and automated by CloudFormation.

SAM allows you to define your serverless applications in a simple and concise fashion because it automatically creates all necessary resources behind the scenes. For example, if you do not define an execution role for a Lambda function, SAM automatically creates one. SAM also creates the CodeDeploy application necessary to drive the traffic shifting, as well as the IAM service role that CodeDeploy uses to execute all actions.

Create a SAM template

To get started, write your SAM template and call it template.yaml.

AWSTemplateFormatVersion : '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: An example SAM template for Lambda Safe Deployments.

Resources:

  returnS3Buckets:
    Type: AWS::Serverless::Function
    Properties:
      Handler: returnS3Buckets.handler
      Runtime: nodejs6.10
      AutoPublishAlias: live
      Policies:
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "s3:ListAllMyBuckets"
            Resource: '*'
      DeploymentPreference:
          Type: Linear10PercentEvery1Minute
          Hooks:
            PreTraffic: !Ref preTrafficHook
      Events:
        Api:
          Type: Api
          Properties:
            Path: /test
            Method: get

  preTrafficHook:
    Type: AWS::Serverless::Function
    Properties:
      Handler: preTrafficHook.handler
      Policies:
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "codedeploy:PutLifecycleEventHookExecutionStatus"
            Resource:
              !Sub 'arn:aws:codedeploy:${AWS::Region}:${AWS::AccountId}:deploymentgroup:${ServerlessDeploymentApplication}/*'
        - Version: "2012-10-17"
          Statement: 
          - Effect: "Allow"
            Action: 
              - "lambda:InvokeFunction"
            Resource: !Ref returnS3Buckets.Version
      Runtime: nodejs6.10
      FunctionName: 'CodeDeployHook_preTrafficHook'
      DeploymentPreference:
        Enabled: false
      Timeout: 5
      Environment:
        Variables:
          NewVersion: !Ref returnS3Buckets.Version

This template creates two functions:

  • returnS3Buckets
  • preTrafficHook

The returnS3Buckets function is where your application logic lives. It’s a simple piece of code that uses the AWS SDK for JavaScript in Node.js to call the Amazon S3 listBuckets API action and return the number of buckets.

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
	console.log("I am here! " + context.functionName  +  ":"  +  context.functionVersion);

	s3.listBuckets(function (err, data){
		if(err){
			console.log(err, err.stack);
			callback(null, {
				statusCode: 500,
				body: "Failed!"
			});
		}
		else{
			var allBuckets = data.Buckets;

			console.log("Total buckets: " + allBuckets.length);
			callback(null, {
				statusCode: 200,
				body: allBuckets.length
			});
		}
	});	
}

Review the key parts of the SAM template that defines returnS3Buckets:

  • The AutoPublishAlias attribute instructs SAM to automatically publish a new version of the Lambda function for each new deployment and link it to the live alias.
  • The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides the function with permission to call listBuckets.
  • The DeploymentPreference attribute configures the type of rollout pattern to use. In this case, you are shifting traffic in a linear fashion, moving 10% of traffic every minute to the new version. For more information about supported patterns, see Serverless Application Model: Traffic Shifting Configurations.
  • The Hooks attribute specifies that you want to execute the preTrafficHook Lambda function before CodeDeploy automatically begins shifting traffic. This function should perform validation testing on the newly deployed Lambda version: it invokes the new Lambda function and checks the results. If the tests pass, the hook instructs CodeDeploy to proceed with the rollout via a call to codedeploy.putLifecycleEventHookExecutionStatus.
  • The Events attribute defines an API-based event source that can trigger this function. It accepts requests on the /test path using an HTTP GET method.

The preTrafficHook function implements this validation logic:

'use strict';

const AWS = require('aws-sdk');
const codedeploy = new AWS.CodeDeploy({apiVersion: '2014-10-06'});
var lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {

	console.log("Entering PreTraffic Hook!");
	
	// Read the DeploymentId & LifecycleEventHookExecutionId from the event payload
	var deploymentId = event.DeploymentId;
	var lifecycleEventHookExecutionId = event.LifecycleEventHookExecutionId;

	var functionToTest = process.env.NewVersion;
	console.log("Testing new function version: " + functionToTest);

	// Perform validation of the newly deployed Lambda version
	var lambdaParams = {
		FunctionName: functionToTest,
		InvocationType: "RequestResponse"
	};

	var lambdaResult = "Failed";
	lambda.invoke(lambdaParams, function(err, data) {
		if (err){	// an error occurred
			console.log(err, err.stack);
			lambdaResult = "Failed";
		}
		else{	// successful response
			var result = JSON.parse(data.Payload);
			console.log("Result: " +  JSON.stringify(result));

			// Check the response for valid results
			// The response will be a JSON payload with statusCode and body properties. ie:
			// {
			//		"statusCode": 200,
			//		"body": 51
			// }
			if(result.body == 9){	
				lambdaResult = "Succeeded";
				console.log ("Validation testing succeeded!");
			}
			else{
				lambdaResult = "Failed";
				console.log ("Validation testing failed!");
			}

			// Complete the PreTraffic Hook by sending CodeDeploy the validation status
			var params = {
				deploymentId: deploymentId,
				lifecycleEventHookExecutionId: lifecycleEventHookExecutionId,
				status: lambdaResult // status can be 'Succeeded' or 'Failed'
			};
			
			// Pass AWS CodeDeploy the prepared validation test results.
			codedeploy.putLifecycleEventHookExecutionStatus(params, function(err, data) {
				if (err) {
					// Validation failed.
					console.log('CodeDeploy Status update failed');
					console.log(err, err.stack);
					callback("CodeDeploy Status update failed");
				} else {
					// Validation succeeded.
					console.log('Codedeploy status updated successfully');
					callback(null, 'Codedeploy status updated successfully');
				}
			});
		}  
	});
}

The hook is hardcoded to check that the number of S3 buckets returned is 9.
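
In a real project, you would likely avoid the hardcoded value. A minimal sketch of one alternative, assuming a hypothetical EXPECTED_BUCKET_COUNT environment variable added to the hook’s configuration:

	// Read the expected result from the environment instead of hardcoding it
	var expectedCount = parseInt(process.env.EXPECTED_BUCKET_COUNT || "9", 10);
	if(result.body == expectedCount){
		lambdaResult = "Succeeded";
	}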

Review the key parts of the SAM template that defines preTrafficHook:

  • The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides permissions to call the CodeDeploy PutLifecycleEventHookExecutionStatus API action. The second statement provides permissions to invoke the specific version of the returnS3Buckets function to test.
  • This function has traffic shifting features disabled by setting Enabled: false in its DeploymentPreference section.
  • The FunctionName attribute explicitly tells CloudFormation what to name the function. Otherwise, CloudFormation creates the function with the default naming convention: [stackName]-[FunctionName]-[uniqueID].  Name the function with the “CodeDeployHook_” prefix because the CodeDeployServiceRole role only allows InvokeFunction on functions named with that prefix.
  • Set the Timeout attribute to allow enough time to complete your validation tests.
  • Use an environment variable to inject the ARN of the newest deployed version of the returnS3Buckets function. The ARN allows the function to know the specific version to invoke and perform validation testing on.

Deploy the function

Your SAM template is all set and the code is written, so you’re ready to deploy the function for the first time. Here’s how to do it via the SAM CLI; to use the AWS CLI instead, replace sam with aws cloudformation in the commands that follow.

First, package the function. This command returns a CloudFormation importable file, packaged.yaml.

sam package --template-file template.yaml --s3-bucket mybucket --output-template-file packaged.yaml

Now deploy everything:

sam deploy --template-file packaged.yaml --stack-name mySafeDeployStack --capabilities CAPABILITY_IAM

At this point, both Lambda functions have been deployed within the CloudFormation stack mySafeDeployStack. The returnS3Buckets function has been deployed as version 1.

SAM automatically created a few things, including the CodeDeploy application, with the deployment pattern that you specified (Linear10PercentEvery1Minute). There is currently one deployment group, with no action, because no deployments have occurred. SAM also created the IAM service role that this CodeDeploy application uses:

There is a single managed policy attached to this role, which allows CodeDeploy to invoke any Lambda function that begins with “CodeDeployHook_”.

An API called safeDeployStack has been set up. It targets your Lambda function with the /test resource using the GET method. When you test the endpoint, API Gateway executes the returnS3Buckets function, which returns the number of S3 buckets that you own. In this case, it’s 51.

Publish a new Lambda function version

Now implement the requirements change, which is to make returnS3Buckets count only buckets that begin with the letter “a”. The code now looks like the following (see returnS3BucketsNew.js in GitHub):

'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
	console.log("I am here! " + context.functionName  +  ":"  +  context.functionVersion);

	s3.listBuckets(function (err, data){
		if(err){
			console.log(err, err.stack);
			callback(null, {
				statusCode: 500,
				body: "Failed!"
			});
		}
		else{
			var allBuckets = data.Buckets;

			console.log("Total buckets: " + allBuckets.length);
			//callback(null, allBuckets.length);

			//  New Code begins here
			var counter=0;
			for(var i  in allBuckets){
				if(allBuckets[i].Name[0] === "a")
					counter++;
			}
			console.log("Total buckets starting with a: " + counter);

			callback(null, {
				statusCode: 200,
				body: counter
			});
			
		}
	});	
}

Repackage and redeploy with the same two commands as earlier:

sam package --template-file template.yaml --s3-bucket mybucket --output-template-file packaged.yaml
	
sam deploy --template-file packaged.yaml --stack-name mySafeDeployStack --capabilities CAPABILITY_IAM

CloudFormation understands that this is a stack update instead of an entirely new stack. You can see that reflected in the CloudFormation console.

During the update, CloudFormation deploys the new Lambda function as version 2 and adds it to the “live” alias. There is no traffic routing there yet. CodeDeploy now takes over to begin the safe deployment process.

The first thing CodeDeploy does is invoke the preTrafficHook function. Verify that this happened by reviewing the Lambda logs and metrics.

The function should progress successfully, invoke version 2 of returnS3Buckets, and finally invoke the CodeDeploy API with a success code. After this occurs, CodeDeploy begins the predefined rollout strategy (Linear10PercentEvery1Minute). Open the CodeDeploy console to review the deployment progress.

Verify the traffic shift

During the deployment, verify that the traffic shift has started by running the test periodically. As the deployment shifts toward the new version, a larger percentage of the responses return 9 instead of 51, matching the bucket counts from the new and old implementations, respectively.
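
One simple way to run the test periodically is a short shell loop against the API endpoint. A sketch, assuming the hypothetical invoke URL below (Prod is the default SAM stage name):

while true; do
	curl -s https://<API_ID>.execute-api.us-east-1.amazonaws.com/Prod/test
	echo
	sleep 10
done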

A minute later, you see 10% more traffic shifting to the new version. The whole process takes 10 minutes to complete. After completion, open the Lambda console and verify that the “live” alias now points to version 2.

After 10 minutes, the deployment is complete and CodeDeploy signals success to CloudFormation and completes the stack update.

Check the results

If you invoke the function alias manually, you see the results of the new implementation.

aws lambda invoke --function-name [lambda arn to live alias] out.txt

You can also execute the prod stage of your API and verify the results by issuing an HTTP GET to the invoke URL.

Summary

This post has shown you how you can safely automate your Lambda deployments using the Lambda traffic shifting feature. You used the Serverless Application Model (SAM) to define your Lambda functions and configured CodeDeploy to manage your deployment patterns. Finally, you used CloudFormation to automate the deployment and updates to your function and PreTraffic hook.

Now that you know all about this new feature, you’re ready to begin automating Lambda deployments with confidence that things will work as designed. I look forward to hearing about what you’ve built with the AWS Serverless Platform.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/751346/rss

Security updates have been issued by Arch Linux (openssl and zziplib), Debian (ldap-account-manager, ming, python-crypto, sam2p, sdl-image1.2, and squirrelmail), Fedora (bchunk, koji, libidn, librelp, nodejs, and php), Gentoo (curl, dhcp, libvirt, mailx, poppler, qemu, and spice-vdagent), Mageia (389-ds-base, aubio, cfitsio, libvncserver, nmap, and ntp), openSUSE (GraphicsMagick, ImageMagick, spice-gtk, and wireshark), Oracle (kubernetes), Slackware (patch), and SUSE (apache2 and openssl).

Security updates for Wednesday

Post Syndicated from ris original https://lwn.net/Articles/750902/rss

Security updates have been issued by Debian (apache2, ldap-account-manager, and openjdk-7), Fedora (libuv and nodejs), Gentoo (glibc and libxslt), Mageia (acpica-tools, openssl, and php), SUSE (clamav, coreutils, and libvirt), and Ubuntu (kernel, libraw, linux-hwe, linux-gcp, linux-oem, and python-crypto).

Node.js 8.10 runtime now available in AWS Lambda

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/node-js-8-10-runtime-now-available-in-aws-lambda/

This post courtesy of Ed Lima, AWS Solutions Architect

We are excited to announce that you can now develop your AWS Lambda functions using the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Start using this new version today by specifying a runtime parameter value of nodejs8.10 when creating or updating functions.

Supporting async/await

The Lambda programming model for Node.js 8.10 now supports defining a function handler using the async/await pattern.

Asynchronous or non-blocking calls are an inherent and important part of applications, as user and human interfaces are asynchronous by nature. If you decide to have a coffee with a friend, you usually order the coffee then start or continue a conversation with your friend while the coffee is getting ready. You don’t wait for the coffee to be ready before you start talking. These activities are asynchronous, because you can start one and then move to the next without waiting for completion. Otherwise, you’d delay (or block) the start of the next activity.

Asynchronous calls used to be handled in Node.js using callbacks. That presented problems when they were nested within other callbacks in multiple levels, making the code difficult to maintain and understand.

Promises were implemented to try to solve issues caused by “callback hell.” They allow asynchronous operations to call their own methods and handle what happens when a call is successful or when it fails. As your requirements become more complicated, even promises become harder to work with and may still end up complicating your code.

Async/await is the new way of handling asynchronous operations in Node.js, and makes for simpler, easier, and cleaner code for non-blocking calls. It still uses promises, but a value is returned directly from the asynchronous function, just as if it were a synchronous blocking function.

Take for instance the following Lambda function to get the current account settings, using the Node.js 6.10 runtime:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {
    let getAccountSettingsPromise = lambda.getAccountSettings().promise();
    getAccountSettingsPromise.then(
        (data) => {
            callback(null, data);
        },
        (err) => {
            console.log(err);
            callback(err);
        }
    );
};

With the new Node.js 8.10 runtime, there are new handler types that can be declared with the “async” keyword or can return a promise directly.

This is how the same function looks using async/await with Node.js 8.10:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = async (event) => {
    return await lambda.getAccountSettings().promise();
};

Alternatively, you could have the handler return a promise directly:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = (event) => {
    return new Promise((resolve, reject) => {
        lambda.getAccountSettings().promise()
        .then((data) => {
            resolve(data);
        })
        .catch(reject);
    });
};

The new handler types are alternatives to the callback pattern, which is still fully supported.

All three functions return the same results. However, in the new runtime with async/await, all callbacks in the code are gone, which makes it easier to read. This is especially true for those less familiar with promises. In each case, the account settings returned look like the following:

{
    "AccountLimit":{
        "TotalCodeSize":80530636800,
        "CodeSizeUnzipped":262144000,
        "CodeSizeZipped":52428800, 
        "ConcurrentExecutions":1000,
        "UnreservedConcurrentExecutions":1000
    },
    "AccountUsage":{
        "TotalCodeSize":52234461,
        "FunctionCount":53
    }
}

Another great advantage of async/await is better error handling. You can use a try/catch block inside the scope of an async function. Even though the function awaits an asynchronous operation, any errors end up in the catch block.

You can improve your previous Node.js 8.10 function with this trusted try/catch error handling pattern:

let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();

exports.handler = async (event) => {
    try {
        // Declare data inside the handler so no state leaks between invocations
        const data = await lambda.getAccountSettings().promise();
        return data;
    }
    catch (err) {
        console.log(err);
        return err;
    }
};

While you now have a similar number of lines in both runtimes, the code is cleaner and more readable with async/await. It makes the asynchronous calls look more synchronous. However, it is important to notice that the code is still executed the same way as if it were using a callback or promise-based API.

Backward compatibility

You may port your existing Node.js 4.3 and 6.10 functions over to Node.js 8.10 by updating the runtime. However, Node.js 8.10 includes numerous breaking changes from previous Node.js versions.

Make sure to review the API changes between Node.js 4.3, 6.10, and Node.js 8.10 to see if there are other changes that might affect your code. We recommend testing that your Lambda function passes internal validation for its behavior when upgrading to the new runtime version.

You can use Lambda versions and aliases to safely test that your function runs as expected on Node.js 8.10 before routing production traffic to it.
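
A minimal sketch of that flow with the AWS CLI, assuming a hypothetical function named my-function and that publishing produced version 2:

# Switch the runtime and publish the result as an immutable version
aws lambda update-function-configuration --function-name my-function --runtime nodejs8.10
aws lambda publish-version --function-name my-function

# Point a test alias at the new version
aws lambda create-alias --function-name my-function --name node8-test --function-version 2

# Invoke through the alias before routing production traffic to it
aws lambda invoke --function-name my-function:node8-test out.json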

New Node.js features

You can now get up to 20% better performance compared to the previous LTS version 6.x. The new V8 6.0 engine comes with TurboFan and the Ignition pipeline, which leads to lower memory consumption and faster startup time across Node.js applications.

HTTP/2 support, which is subject to future changes, allows developers to use the new protocol to speed application development and undo many HTTP/1.1 workarounds, making applications faster, simpler, and more powerful.

For more information, see the AWS Lambda Developer Guide.

Hope you enjoy and… go build with Node.js 8.10!

Central Logging in Multi-Account Environments

Post Syndicated from matouk original https://aws.amazon.com/blogs/architecture/central-logging-in-multi-account-environments/

Centralized logging is often required in large enterprise environments for a number of reasons, ranging from compliance and security to analytics and application-specific needs.

I’ve seen that in a multi-account environment, whether the accounts belong to the same line of business or multiple business units, collecting logs in a central, dedicated logging account is an established best practice. It helps security teams detect malicious activities both in real-time and during incident response. It provides protection to log data in case it is accidentally or intentionally deleted. It also helps application teams correlate and analyze log data across multiple application tiers.

This blog post provides a solution and building blocks to stream Amazon CloudWatch log data across accounts. In a multi-account environment this repeatable solution could be deployed multiple times to stream all relevant Amazon CloudWatch log data from all accounts to a centralized logging account.

Solution Summary 

The solution uses Amazon Kinesis Data Streams and a log destination to set up an endpoint in the logging account to receive streamed logs, and uses Amazon Kinesis Data Firehose to deliver log data to an Amazon Simple Storage Service (S3) bucket. Application accounts subscribe to stream all (or part) of their Amazon CloudWatch logs to a defined destination in the logging account via subscription filters.

Below is a diagram illustrating how the various services work together.


In the logging account, a Kinesis data stream is created to receive streamed log data, and a log destination is created to facilitate remote streaming and configured to use the Kinesis data stream as its target.

The Amazon Kinesis Data Firehose stream is created to deliver log data from the data stream to S3. The delivery stream uses a generic AWS Lambda function for data validation and transformation.

In each application account, a subscription filter is created between each Amazon CloudWatch log group and the destination created for this log group in the logging account.

The following steps are involved in setting up the central-logging solution:

  1. Create an Amazon S3 bucket for your central logging in the logging account
  2. Create an AWS Lambda function for log data transformation and decoding in logging account
  3. Create a central logging stack as a logging-account destination ready to receive streamed logs and deliver them to S3
  4. Create a subscription in application accounts to deliver logs from a specific CloudWatch log group to the logging account destination
  5. Create Amazon Athena tables to query and analyze log data in your logging account

Creating a log destination in your logging account

In this section, we set up the logging-account side of the solution, providing detail on the list above. The example I use is for the us-east-1 region; however, any region where the required services are available could be used.

It’s important to note that your logging-account destination and application-account subscription must be in the same region. You can deploy the solution multiple times to create destinations in all required regions if application accounts use multiple regions.

Step 1: Create an S3 bucket

Use the CloudFormation template below to create an S3 bucket in the logging account. This template also configures the bucket to archive log data to Amazon Glacier after 60 days.


{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "CF Template to create S3 bucket for central logging",
  "Parameters":{

    "BucketName":{
      "Type":"String",
      "Default":"",
      "Description":"Central logging bucket name"
    }
  },
  "Resources":{
                        
   "CentralLoggingBucket" : {
      "Type" : "AWS::S3::Bucket",
      "Properties" : {
        "BucketName" : {"Ref": "BucketName"},
        "LifecycleConfiguration": {
            "Rules": [
                {
                  "Id": "ArchiveToGlacier",
                  "Prefix": "",
                  "Status": "Enabled",
                  "Transitions":[{
                      "TransitionInDays": "60",
                      "StorageClass": "GLACIER"
                  }]
                }
            ]
        }
      }
    }

  },
  "Outputs":{
    "CentralLogBucket":{
    	"Description" : "Central log bucket",
    	"Value" : {"Ref": "BucketName"} ,
    	"Export" : { "Name" : "CentralLogBucketName"}
    }
  }
} 

To create your central-logging bucket do the following:

  1. Save the template file to your local developer machine as “central-log-bucket.json”
  2. From the CloudFormation console, select “create new stack” and import the file “central-log-bucket.json”
  3. Fill in the parameters and complete the stack creation steps
  4. Verify the bucket has been created successfully and take a note of the bucket name

Step 2: Create data processing Lambda function

Use the template below to create a Lambda function in your logging account that will be used by Amazon Firehose for data transformation during the delivery process to S3. This function is based on the AWS Lambda kinesis-firehose-cloudwatch-logs-processor blueprint.

The function can be created manually from the blueprint or by using the CloudFormation template below. To find the blueprint, navigate to Lambda -> Create -> Function -> Blueprints.

This function unzips the event message, parses it, and verifies that it is a valid CloudWatch Logs event. Additional processing can be added if needed. Because this function is generic, it can be reused by all log-delivery streams.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create cloudwatch data processing lambda function",
  "Resources":{
      
    "LambdaRole": {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": "lambda.amazonaws.com"
                        },
                        "Action": "sts:AssumeRole"
                    }
                ]
            },
            "Path": "/",
            "Policies": [
                {
                    "PolicyName": "firehoseCloudWatchDataProcessing",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "logs:CreateLogGroup",
                                    "logs:CreateLogStream",
                                    "logs:PutLogEvents"
                                ],
                                "Resource": "arn:aws:logs:*:*:*"
                            }
                        ]
                    }
                }
            ]
        }
    },
      
    "FirehoseDataProcessingFunction": {
        "Type": "AWS::Lambda::Function",
        "Properties": {
            "Handler": "index.handler",
            "Role": {"Fn::GetAtt": ["LambdaRole","Arn"]},
            "Description": "Firehose cloudwatch data processing",
            "Code": {
                "ZipFile" : { "Fn::Join" : ["\n", [
                  "'use strict';",
                  "const zlib = require('zlib');",
                  "function transformLogEvent(logEvent) {",
                  "       return Promise.resolve(`${logEvent.message}\n`);",
                  "}",
                  "exports.handler = (event, context, callback) => {",
                  "    Promise.all(event.records.map(r => {",
                  "        const buffer = new Buffer(r.data, 'base64');",
                  "        const decompressed = zlib.gunzipSync(buffer);",
                  "        const data = JSON.parse(decompressed);",
                  "        if (data.messageType !== 'DATA_MESSAGE') {",
                  "            return Promise.resolve({",
                  "                recordId: r.recordId,",
                  "                result: 'ProcessingFailed',",
                  "            });",
                  "         } else {",
                  "            const promises = data.logEvents.map(transformLogEvent);",
                  "            return Promise.all(promises).then(transformed => {",
                  "                const payload = transformed.reduce((a, v) => a + v, '');",
                  "                const encoded = new Buffer(payload).toString('base64');",
                  "                console.log('---------------payloadv2:'+JSON.stringify(payload, null, 2));",
                  "                return {",
                  "                    recordId: r.recordId,",
                  "                    result: 'Ok',",
                  "                    data: encoded,",
                  "                };",
                  "           });",
                  "        }",
                  "    })).then(recs => callback(null, { records: recs }));",
                    "};"

                ]]}
            },
            "Runtime": "nodejs6.10",
            "Timeout": "60"
        }
    }

  },
  "Outputs":{
   "Function" : {
      "Description": "Function ARN",
      "Value": {"Fn::GetAtt": ["FirehoseDataProcessingFunction","Arn"]},
      "Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-Function" }}
    }
  }
}

To create the function follow the steps below:

  1. Save the template file as “central-logging-lambda.json”
  2. Log in to the logging account and, from the CloudFormation console, select “create new stack”
  3. Import the file “central-logging-lambda.json” and click next
  4. Follow the steps to create the stack and verify successful creation
  5. Take note of the Lambda function ARN from the output section

Step 3: Create log destination in logging account

A log destination is used as the target of a subscription from application accounts, and a destination can be shared between multiple subscriptions. However, in the architecture suggested in this solution, all logs streamed to the same destination are stored in the same S3 location. If you would like to store log data in a different hierarchy, or in a completely different bucket, you need to create separate destinations.

As noted previously, your destination and subscription have to be in the same region.

Use the template below to create the destination stack in the logging account.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create log destination and required resources",
  "Parameters":{

    "LogBucketName":{
      "Type":"String",
      "Default":"central-log-do-not-delete",
      "Description":"Destination logging bucket"
    },
    "LogS3Location":{
      "Type":"String",
      "Default":"<BU>/<ENV>/<SOURCE_ACCOUNT>/<LOG_TYPE>/",
      "Description":"S3 location for the logs streamed to this destination; example marketing/prod/999999999999/flow-logs/"
    },
    "ProcessingLambdaARN":{
      "Type":"String",
      "Default":"",
      "Description":"CloudWatch logs data processing function"
    },
    "SourceAccount":{
      "Type":"String",
      "Default":"",
      "Description":"Source application account number"
    }
  },
    
  "Resources":{
    "MyStream": {
      "Type": "AWS::Kinesis::Stream",
      "Properties": {
        "Name": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-Stream"] ]},
        "RetentionPeriodHours" : 48,
        "ShardCount": 1,
        "Tags": [
          {
            "Key": "Solution",
            "Value": "CentralLogging"
          }
       ]
      }
    },
    "LogRole" : {
      "Type"  : "AWS::IAM::Role",
      "Properties" : {
          "AssumeRolePolicyDocument" : {
              "Statement" : [ {
                  "Effect" : "Allow",
                  "Principal" : {
                      "Service" : [ {"Fn::Join": [ "", [ "logs.", { "Ref": "AWS::Region" }, ".amazonaws.com" ] ]} ]
                  },
                  "Action" : [ "sts:AssumeRole" ]
              } ]
          },         
          "Path" : "/service-role/"
      }
    },
      
    "LogRolePolicy" : {
        "Type" : "AWS::IAM::Policy",
        "Properties" : {
            "PolicyName" : {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-LogPolicy"] ]},
            "PolicyDocument" : {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Effect": "Allow",
                  "Action": ["kinesis:PutRecord"],
                  "Resource": [{ "Fn::GetAtt" : ["MyStream", "Arn"] }]
                },
                {
                  "Effect": "Allow",
                  "Action": ["iam:PassRole"],
                  "Resource": [{ "Fn::GetAtt" : ["LogRole", "Arn"] }]
                }
              ]
            },
            "Roles" : [ { "Ref" : "LogRole" } ]
        }
    },
      
    "LogDestination" : {
      "Type" : "AWS::Logs::Destination",
      "DependsOn" : ["MyStream","LogRole","LogRolePolicy"],
      "Properties" : {
        "DestinationName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-Destination"] ]},
        "RoleArn": { "Fn::GetAtt" : ["LogRole", "Arn"] },
        "TargetArn": { "Fn::GetAtt" : ["MyStream", "Arn"] },
        "DestinationPolicy": { "Fn::Join" : ["",[
		
				"{\"Version\" : \"2012-10-17\",\"Statement\" : [{\"Effect\" : \"Allow\",",
                " \"Principal\" : {\"AWS\" : \"", {"Ref":"SourceAccount"} ,"\"},",
                "\"Action\" : \"logs:PutSubscriptionFilter\",",
                " \"Resource\" : \"", 
                {"Fn::Join": [ "", [ "arn:aws:logs:", { "Ref": "AWS::Region" }, ":" ,{ "Ref": "AWS::AccountId" }, ":destination:",{ "Ref" : "AWS::StackName" },"-Destination" ] ]}  ,"\"}]}"

			]]}
          
          
      }
    },
      
    "S3deliveryStream": {
      "DependsOn": ["S3deliveryRole", "S3deliveryPolicy"],
      "Type": "AWS::KinesisFirehose::DeliveryStream",
      "Properties": {
        "DeliveryStreamName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-DeliveryStream"] ]},
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": { "Fn::GetAtt" : ["MyStream", "Arn"] },
            "RoleARN": {"Fn::GetAtt" : ["S3deliveryRole", "Arn"] }
        },
        "ExtendedS3DestinationConfiguration": {
          "BucketARN": {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]},
          "BufferingHints": {
            "IntervalInSeconds": "60",
            "SizeInMBs": "50"
          },
          "CompressionFormat": "UNCOMPRESSED",
          "Prefix": {"Ref": "LogS3Location"},
          "RoleARN": {"Fn::GetAtt" : ["S3deliveryRole", "Arn"] },
          "ProcessingConfiguration" : {
              "Enabled": "true",
              "Processors": [
              {
                "Parameters": [ 
                { 
                    "ParameterName": "LambdaArn",
                    "ParameterValue": {"Ref":"ProcessingLambdaARN"}
                }],
                "Type": "Lambda"
              }]
          }
        }

      }
    },
      
    "S3deliveryRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                "Service": "firehose.amazonaws.com"
              },
              "Action": "sts:AssumeRole",
              "Condition": {
                "StringEquals": {
                  "sts:ExternalId": {"Ref":"AWS::AccountId"}
                }
              }
            }
          ]
        }
      }
    },
      
    "S3deliveryPolicy": {
      "Type": "AWS::IAM::Policy",
      "Properties": {
        "PolicyName": {"Fn::Join" : [ "", [{ "Ref" : "AWS::StackName" },"-FirehosePolicy"] ]},
        "PolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": [
                "s3:AbortMultipartUpload",
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:PutObject"
              ],
              "Resource": [
                {"Fn::Join": ["", [ {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]}]]},
                {"Fn::Join": ["", [ {"Fn::Join" : [ "", ["arn:aws:s3:::",{"Ref":"LogBucketName"}] ]}, "*"]]}
              ]
            },
            {
              "Effect": "Allow",
              "Action": [
                "lambda:InvokeFunction",
                "lambda:GetFunctionConfiguration",
                "logs:PutLogEvents",
                "kinesis:DescribeStream",
                "kinesis:GetShardIterator",
                "kinesis:GetRecords",
                "kms:Decrypt"
              ],
              "Resource": "*"
            }
          ]
        },
        "Roles": [{"Ref": "S3deliveryRole"}]
      }
    }

  },
  "Outputs":{
      
   "Destination" : {
      "Description": "Destination",
      "Value": {"Fn::Join": [ "", [ "arn:aws:logs:", { "Ref": "AWS::Region" }, ":" ,{ "Ref": "AWS::AccountId" }, ":destination:",{ "Ref" : "AWS::StackName" },"-Destination" ] ]},
      "Export" : { "Name" : {"Fn::Sub": "${AWS::StackName}-Destination" }}
    }

  }
} 

To create your log destination and all required resources, follow these steps:

  1. Save your template as “central-logging-destination.json”
  2. Log in to your logging account and, from the CloudFormation console, select “create new stack”
  3. Import the file “central-logging-destination.json” and click next
  4. Fill in the parameters to configure the log destination and click Next
  5. Follow the default steps to create the stack and verify successful creation
    1. Bucket name is the same as in the “create central logging bucket” step
    2. LogS3Location is the directory hierarchy for saving log data that will be delivered to this destination
    3. ProcessingLambdaARN is as created in “create data processing Lambda function” step
    4. SourceAccount is the application account number where the subscription will be created
  6. Take note of the destination ARN as it appears in the outputs section; you will need it when creating the subscription.
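
If you prefer the AWS CLI over the console, the destination stack can be created with a command along these lines; the parameter values shown are illustrative:

aws cloudformation create-stack \
	--stack-name CentralLogging \
	--template-body file://central-logging-destination.json \
	--capabilities CAPABILITY_IAM \
	--parameters ParameterKey=LogBucketName,ParameterValue=central-log-do-not-delete \
		ParameterKey=LogS3Location,ParameterValue=marketing/prod/999999999999/flow-logs/ \
		ParameterKey=ProcessingLambdaARN,ParameterValue=<LAMBDA_ARN> \
		ParameterKey=SourceAccount,ParameterValue=999999999999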

Step 4: Create the log subscription in your application account

In this section, we will create the subscription filter in one of the application accounts to stream logs from the CloudWatch log group to the log destination that was created in your logging account.

Create log subscription filter

The subscription filter is created between the CloudWatch log group and a destination endpoint. A subscription can be filtered to send part (or all) of the logs in the log group. For example, you can create a subscription filter to stream only flow logs with status REJECT.
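
For instance, a plausible REJECT-only pattern for the space-delimited VPC flow log format (treat the exact field list as an assumption to verify against your own log format):

[version, account, eni, source, destination, srcport, destport, protocol, packets, bytes, windowstart, windowend, action="REJECT", flowlogstatus]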

Use the CloudFormation template below to create the subscription filter. The subscription filter and log destination must be in the same region.

{
  "AWSTemplateFormatVersion":"2010-09-09",
  "Description": "Create log subscription filter for a specific Log Group",
  "Parameters":{

    "DestinationARN":{
      "Type":"String",
      "Default":"",
      "Description":"ARN of logs destination"
    },
    "LogGroupName":{
      "Type":"String",
      "Default":"",
      "Description":"Name of LogGroup to forward logs from"
    },
    "FilterPattern":{
      "Type":"String",
      "Default":"",
      "Description":"Filter pattern to filter events to be sent to log destination; Leave empty to send all logs"
    }
  },
    
  "Resources":{
    "SubscriptionFilter" : {
      "Type" : "AWS::Logs::SubscriptionFilter",
      "Properties" : {
        "LogGroupName" : { "Ref" : "LogGroupName" },
        "FilterPattern" : { "Ref" : "FilterPattern" },
        "DestinationArn" : { "Ref" : "DestinationARN" }
      }
    }
  }
}

To create a subscription filter for one of the CloudWatch log groups in your application account, follow the steps below:

  1. Save the template as “central-logging-subscription.json”
  2. Log in to your application account and, from the CloudFormation console, select “create new stack”
  3. Select the file “central-logging-subscription.json” and click next
  4. Fill in the parameters as appropriate to your environment as you did above
    a. DestinationARN is the value obtained in the “create log destination in logging account” step
    b. FilterPattern is the filter value for log data to be streamed to your logging account (leave empty to stream all logs in the selected log group)
    c. LogGroupName is the log group name as it appears under CloudWatch Logs
  5. Verify successful creation of the subscription

This completes the deployment process on both the logging-account and application-account sides. After a few minutes, log data is streamed to the central-logging destination defined in your logging account.

Step 5: Analyzing log data

Once log data is centralized, it opens the door to run analytics on the consolidated data for business or security reasons. One of the powerful services that AWS offers is Amazon Athena.

Amazon Athena allows you to query data in S3 using standard SQL.

Follow the steps below to create a simple table and run queries on the flow log data that has been collected from your application accounts.

  1. Log in to your logging account and, from the Amazon Athena console, use the DDL below in your query editor to create a new table

CREATE EXTERNAL TABLE IF NOT EXISTS prod_vpc_flow_logs (
  Version INT,
  Account STRING,
  InterfaceId STRING,
  SourceAddress STRING,
  DestinationAddress STRING,
  SourcePort INT,
  DestinationPort INT,
  Protocol INT,
  Packets INT,
  Bytes INT,
  StartTime INT,
  EndTime INT,
  Action STRING,
  LogStatus STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "^([^ ]+)\\s+([0-9]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([0-9]+)\\s+([0-9]+)\\s+([^ ]+)\\s+([^ ]+)$"
)
LOCATION 's3://central-logging-company-do-not-delete/';

  2. Click “Run query” and verify a successful run. This creates the table “prod_vpc_flow_logs”

3. You can then run queries against the table data as below:
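
For example, a simple illustrative query that surfaces the most frequently rejected source addresses:

SELECT SourceAddress, COUNT(*) AS rejections
FROM prod_vpc_flow_logs
WHERE Action = 'REJECT'
GROUP BY SourceAddress
ORDER BY rejections DESC
LIMIT 10;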

Conclusion

By following the steps I’ve outlined, you will build a central logging solution to stream CloudWatch logs from one application account to a central logging account. This solution is repeatable and could be deployed multiple times for multiple accounts and logging requirements.

 

About the Author

Mahmoud Matouk is a Senior Cloud Infrastructure Architect. He works with our customers to help accelerate migration and cloud adoption at the enterprise level.

 

Security updates for Thursday

Post Syndicated from jake original https://lwn.net/Articles/746078/rss

Security updates have been issued by Debian (chromium-browser, krb5, and smarty3), Fedora (firefox, GraphicsMagick, and moodle), Mageia (rsync), openSUSE (bind, chromium, freeimage, gd, GraphicsMagick, libtasn1, libvirt, nodejs6, php7, systemd, and webkit2gtk3), Red Hat (chromium-browser, systemd, and thunderbird), Scientific Linux (systemd), and Ubuntu (curl, firefox, and ruby2.3).

Invoking AWS Lambda from Amazon MQ

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/invoking-aws-lambda-from-amazon-mq/

Contributed by Josh Kahn, AWS Solutions Architect

Message brokers can be used to solve a number of needs in enterprise architectures, including managing workload queues and broadcasting messages to a number of subscribers. Amazon MQ is a managed message broker service for Apache ActiveMQ that makes it easy to set up and operate message brokers in the cloud.

In this post, I discuss one approach to invoking AWS Lambda from queues and topics managed by Amazon MQ brokers. This and other similar patterns can be useful in integrating legacy systems with serverless architectures. You could also integrate systems already migrated to the cloud that use common APIs such as JMS.

For example, imagine that you work for a company that produces training videos and which recently migrated its video management system to AWS. The on-premises system used to publish a message to an ActiveMQ broker when a video was ready for processing by an on-premises transcoder. However, on AWS, your company uses Amazon Elastic Transcoder. Instead of modifying the management system, Lambda polls the broker for new messages and starts a new Elastic Transcoder job. This approach avoids changes to the existing application while refactoring the workload to leverage cloud-native components.

This solution uses Amazon CloudWatch Events to trigger a Lambda function that polls the Amazon MQ broker for messages. Instead of starting an Elastic Transcoder job, the sample writes the received message to an Amazon DynamoDB table with a time stamp indicating the time received.

Getting started

To start, navigate to the Amazon MQ console. Next, launch a new Amazon MQ instance, selecting Single-instance Broker and supplying a broker name, user name, and password. Be sure to document the user name and password for later.

For the purposes of this sample, choose the default options in the Advanced settings section. Your new broker is deployed to the default VPC in the selected AWS Region with the default security group. For this post, you update the security group to allow access for your sample Lambda function. In a production scenario, I recommend deploying both the Lambda function and your Amazon MQ broker in your own VPC.

After several minutes, your instance changes status from “Creation Pending” to “Available.” You can then visit the Details page of your broker to retrieve connection information, including a link to the ActiveMQ web console where you can monitor the status of your broker, publish test messages, and so on. In this example, use the Stomp protocol to connect to your broker. Be sure to capture the broker host name, for example:

<BROKER_ID>.mq.us-east-1.amazonaws.com

You should also modify the Security Group for the broker by clicking on its Security Group ID. Click the Edit button and then click Add Rule to allow inbound traffic on port 8162 for your IP address.

Deploying and scheduling the Lambda function

To simplify the deployment of this example, I’ve provided an AWS Serverless Application Model (SAM) template that deploys the sample function and DynamoDB table, and schedules the function to be invoked every five minutes. Detailed instructions can be found with the sample code on GitHub in the amazonmq-invoke-aws-lambda repository. I discuss a few key aspects in this post.

First, SAM makes it easy to deploy and schedule invocation of our function:

SubscriberFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: subscriber/
    Handler: index.handler
    Runtime: nodejs6.10
    Role: !GetAtt SubscriberFunctionRole.Arn
    Timeout: 15
    Environment:
      Variables:
        HOST: !Ref AmazonMQHost
        LOGIN: !Ref AmazonMQLogin
        PASSWORD: !Ref AmazonMQPassword
        QUEUE_NAME: !Ref AmazonMQQueueName
        WORKER_FUNCTION: !Ref WorkerFunction
    Events:
      Timer:
        Type: Schedule
        Properties:
          Schedule: rate(5 minutes)

WorkerFunction:
  Type: AWS::Serverless::Function
  Properties:
    CodeUri: worker/
    Handler: index.handler
    Runtime: nodejs6.10
    Role: !GetAtt WorkerFunctionRole.Arn
    Environment:
      Variables:
        TABLE_NAME: !Ref MessagesTable

In the code, you include the URI, user name, and password for your newly created Amazon MQ broker. These allow the function to poll the broker for new messages on the sample queue.

The sample Lambda function is written in Node.js, but clients exist for a number of programming languages.

// Assumed dependencies and connection settings (not shown in the original
// excerpt): the stompit client and AWS SDK, configured from the environment
// variables defined in the SAM template above. The stomp+ssl port 61614 is
// an assumption to verify against your broker's connection details.
const stomp = require('stompit');
const AWS = require('aws-sdk');

const options = {
	host: process.env.HOST,
	port: 61614,
	ssl: true,
	connectHeaders: {
		login: process.env.LOGIN,
		passcode: process.env.PASSWORD
	}
}

stomp.connect(options, (error, client) => {
	if (error) { /* do something */ }

	let headers = {
		destination: '/queue/SAMPLE_QUEUE',
		ack: 'auto'
	}

	client.subscribe(headers, (error, message) => {
		if (error) { /* do something */ }

		message.readString('utf-8', (error, body) => {
			if (error) { /* do something */ }

			let params = {
				FunctionName: process.env.WORKER_FUNCTION,
				Payload: JSON.stringify({
					message: body,
					timestamp: Date.now()
				})
			}

			let lambda = new AWS.Lambda()
			lambda.invoke(params, (error, data) => {
				if (error) { /* do something */ }
			})
		})
	})
})

Sending a sample message

For the purpose of this example, use the Amazon MQ console to send a test message. Navigate to the details page for your broker.

About midway down the page, choose ActiveMQ Web Console. Next, choose Manage ActiveMQ Broker to launch the admin console. When you are prompted for a user name and password, use the credentials created earlier.

At the top of the page, choose Send. From here, you can send a sample message from the broker to subscribers. For this example, this is how you generate traffic to test the end-to-end system. Be sure to set the Destination value to “SAMPLE_QUEUE.” The message body can contain any text. Choose Send.

You now have a Lambda function polling for messages on the broker. To verify that your function is working, you can confirm in the DynamoDB console that the message was successfully received and processed by the sample Lambda function.

First, choose Tables on the left and select the table name “amazonmq-messages” in the middle section. With the table detail in view, choose Items. If the function was successful, you’ll find a new entry containing your test message and its received time stamp.

If there is no message in DynamoDB, check again in a few minutes, or review the CloudWatch Logs groups for the Lambda functions, which contain debug messages.

Alternative approaches

Beyond the approach described here, you may consider other approaches as well. For example, you could use an intermediary system such as Apache Flume to pass messages from the broker to Lambda or deploy Apache Camel to trigger Lambda via a POST to API Gateway. There are trade-offs to each of these approaches. My goal in using CloudWatch Events was to introduce an easily repeatable pattern familiar to many Lambda developers.

Summary

I hope that you have found this example of how to integrate AWS Lambda with Amazon MQ useful. If you have expertise or legacy systems that leverage APIs such as JMS, you may find this useful as you incorporate serverless concepts in your enterprise architectures.

To learn more, see the Amazon MQ website and Developer Guide. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.

Security updates for Monday

Post Syndicated from ris original https://lwn.net/Articles/743575/rss

Security updates have been issued by Arch Linux (linux-hardened, linux-lts, linux-zen, and mongodb), Debian (gdk-pixbuf, gifsicle, graphicsmagick, kernel, and poppler), Fedora (dracut, electron-cash, and firefox), Gentoo (backintime, binutils, chromium, emacs, libXcursor, miniupnpc, openssh, optipng, and webkit-gtk), Mageia (kernel, kernel-linus, kernel-tmb, openafs, and python-mistune), openSUSE (clamav-database, ImageMagick, kernel-firmware, nodejs4, and qemu), Red Hat (linux-firmware, ovirt-guest-agent-docker, qemu-kvm-rhev, redhat-virtualization-host, rhev-hypervisor7, rhvm-appliance, thunderbird, and vdsm), Scientific Linux (thunderbird), SUSE (kernel and qemu), and Ubuntu (firefox and poppler).