Robust Serverless Application Design with AWS Lambda Dead Letter Queues

Post Syndicated from Bryan Liston original https://aws.amazon.com/blogs/compute/robust-serverless-application-design-with-aws-lambda-dlq/

Gene Ting, Solutions Architect

AWS Lambda is a serverless, event-driven compute service that allows developers to bring their functions to the cloud easily. A key challenge that Lambda developers often face is to create solutions that handle exceptions and failures gracefully. Some examples include:

  • Notifying operations support, with context, when a function fails
  • Sending jobs that have timed out to a handler that can either notify operations of a critical failure or rebalance jobs

Now, with the release of Lambda Dead Letter Queues, Lambda functions can be configured to send a notification when the function fails, with context on what the failure was.

In this post, we show how you can configure your function to deliver notifications to an Amazon SQS queue or Amazon SNS topic, and how you can create a process to automatically generate more meaningful notifications when your Lambda functions fail.

Introducing Lambda Dead Letter Queues

Dead-letter queues are a powerful concept that helps software developers find patterns of software issues in their asynchronous processing components. The way it works is simple: when your messaging component receives a message and detects a fatal or unhandled error while processing it, it sends information about the failed message to another location, such as another queue or another notification system. SQS provides dead letter queues today, sending messages that couldn’t be handled to a different queue for further investigation.

AWS Lambda Dead Letter Queues builds upon the concept by enabling Lambda functions to be configured with an SQS queue or SNS topic as a destination to which the Lambda service can send information about an asynchronous request when processing fails. The Lambda service sends information about the failed request when the request will no longer be retried. Supported invocations include:

  • An event type invocation from a custom application
  • Any AWS event source that’s not a DynamoDB table, Amazon Kinesis stream, or API Gateway resource request integration
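As a rough illustration of the first case, an event type invocation from a custom application could look like the sketch below. It assumes the boto3 SDK on the caller's side; the helper name invoke_async is hypothetical, and the client is passed in as a parameter so that the snippet stays self-contained.

```python
import json


def invoke_async(lambda_client, function_name, payload):
    """Invoke a Lambda function asynchronously and return the HTTP status code.

    lambda_client is assumed to be a boto3 Lambda client,
    for example boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        # "Event" requests an asynchronous invocation: Lambda queues the
        # event and returns without waiting for the function to finish.
        InvocationType="Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # An accepted asynchronous request is acknowledged with HTTP 202.
    return response["StatusCode"]
```

Because the caller only receives an acknowledgement, a failure inside the function is invisible to it, which is the gap that a dead letter queue is meant to fill.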

Take the typical beginner use case for learning about serverless applications on AWS: creating thumbnails from images dropped onto an S3 bucket. The thumbnail Lambda function can be configured to send any processing failures to an SNS topic, which triggers another Lambda function for further investigation.

Now, you can set up a dead letter queue for an existing Lambda function and test out the feature.

Configuring a DLQ target for a Lambda function

First, make sure that the execution role for the Lambda function is allowed to publish to the SNS topic. For this demo, use the sns-lambda-test topic. An example is provided below:

{
   "Version": "2012-10-17",
   "Statement": [
      {
         "Effect": "Allow",
         "Action": "sns:Publish",
         "Resource": "arn:aws:sns:us-west-2:123456789012:sns-lambda-test"
      }
   ]
}

If an SQS queue is the intended target, you need a comparable policy that allows the sqs:SendMessage action on the queue.

Next, choose an existing Lambda function against which to configure a dead-letter queue. For this example, choose a predeployed function, such as CreateThumbnail.

Select the function, choose Configuration, expand the Advanced settings section in the middle of the page, and scroll to the DLQ Resource form. Choose SNS and, for SNS Topic name, enter sns-lambda-test.

That’s it. The function is now configured and ready for testing.
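The same setting can also be applied programmatically. The following is a minimal sketch, assuming the boto3 SDK and its update_function_configuration call with the DeadLetterConfig parameter; the helper name set_dead_letter_target is hypothetical, and the function name and topic ARN in the usage note are the example values from this post.

```python
def set_dead_letter_target(lambda_client, function_name, target_arn):
    """Point a function's dead letter queue at an SNS topic or SQS queue.

    lambda_client is assumed to be a boto3 Lambda client,
    for example boto3.client("lambda").
    """
    response = lambda_client.update_function_configuration(
        FunctionName=function_name,
        # TargetArn is the ARN of the SNS topic or SQS queue that
        # receives information about failed asynchronous requests.
        DeadLetterConfig={"TargetArn": target_arn},
    )
    # Return the target that the service reports back, if any.
    return response.get("DeadLetterConfig", {}).get("TargetArn")
```

For the walkthrough above, the call would be set_dead_letter_target(boto3.client("lambda"), "CreateThumbnail", "arn:aws:sns:us-west-2:123456789012:sns-lambda-test"). The execution role still needs the publish permission shown earlier.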

Processing failure notifications

One easy way to test the handler for your dead letter queue is to submit an event that is known to fail for the Lambda function. In this example, you can simply drop a text file disguised as an image into the S3 bucket. The image thumbnail creator recognizes it as a non-image file, and the handler exits with an error message.

When Lambda sends an error notification to an SNS topic, three additional message attributes are attached to the notification in the MessageAttributes object:

  • RequestID – The request ID.
  • ErrorCode – The HTTP response code that would have been returned if the handler had been called synchronously.
  • ErrorMessage – The error message given back by the Lambda runtime. In the example above, it is the error message from the handler.

In addition to these attributes, the body of the event is held in the Message attribute of the Sns object. If you use an SQS queue instead, the additional attributes are in the MessageAttributes object and the event body is held in the Body attribute of the message.
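To make the SNS layout concrete, here is a minimal sketch of a Lambda function subscribed to the dead letter topic. It assumes the standard SNS event shape that Lambda passes to a handler (a Records list whose entries carry an Sns object); the helper name parse_dlq_notification and the keys of the returned dictionary are hypothetical.

```python
def parse_dlq_notification(record):
    """Extract the failure context from one SNS record."""
    sns = record["Sns"]
    attributes = sns.get("MessageAttributes", {})

    def value(name):
        # Each attribute has the form {"Type": "String", "Value": "..."}.
        return attributes.get(name, {}).get("Value")

    return {
        "request_id": value("RequestID"),
        "error_code": value("ErrorCode"),
        "error_message": value("ErrorMessage"),
        # The body of the event that failed, as a string.
        "original_event": sns.get("Message"),
    }


def handler(event, context):
    """Entry point: log and return every failure in the notification."""
    failures = [parse_dlq_notification(r) for r in event["Records"]]
    for failure in failures:
        print("Request", failure["request_id"], "failed:", failure["error_message"])
    return failures
```

Keeping the parsing in its own function makes it easy to exercise with a hand-built record before wiring the handler to the topic.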

Handling timeouts

One of the most common failures in Lambda functions is a timeout. In this scenario, the Lambda function executes until it is forcibly terminated by the Lambda runtime, which sends an error message indicating that the function has timed out, as in the following example:

"ErrorMessage": {
  "Type": "String",
  "Value": "2016-11-29T04:27:36.789Z b4797725-b5eb-11e6-acb2-17876a085622 Task timed out after 300.00 seconds"
}

An error handler can simply search for the string "Task timed out after" in the Value attribute and act accordingly, for example by breaking the request into multiple Lambda invocations or by sending it to a different queue that spins up EC2 instances in an Auto Scaling group to handle larger jobs.
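One way to detect a timeout is sketched below. It assumes that the message keeps the format of the example above, with the duration given in seconds; the helper name timeout_seconds is hypothetical.

```python
import re

# Matches the marker text and captures the duration, e.g. "300.00".
TIMEOUT_PATTERN = re.compile(r"Task timed out after (\d+(?:\.\d+)?) seconds")


def timeout_seconds(error_message):
    """Return the timeout duration in seconds, or None if this is not a timeout."""
    match = TIMEOUT_PATTERN.search(error_message or "")
    return float(match.group(1)) if match else None
```

A handler could call timeout_seconds on the ErrorMessage value and, when the result is not None, reroute the original event to whatever process handles larger jobs.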

Handling critical failures

Another scenario that you may need to handle is when critical failures occur. Some examples of a critical failure are:

  • A misconfiguration of the Lambda handler
  • A system crash, such as an out-of-memory error

In either case, there’s very little that can be handled gracefully in application logic. These kinds of errors can be forwarded to operations support for root cause analysis or break-glass fixes.

In the case of a system crash, your dead letter queue receives an error message similar to the following:

"RequestID": { "Type": "String", "Value": "6502cad0-b641-11e6-bd4e-279609143c53" },
"ErrorCode": { "Type": "String", "Value": "200" },
"ErrorMessage": { "Type": "String", "Value": "Process exited before completing request" }

For this example, the Lambda handler was forced to crash with an out-of-memory error, which can be confirmed by searching the Lambda handler’s log stream for the given RequestID.

In the case of a misconfiguration, your dead letter queue receives an error message along the following lines:

"ErrorMessage": { "Type": "String", "Value": "Cannot find module '/var/task/index'" }

In this example, the Lambda handler was misconfigured to load a non-existent index.js module.
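The error messages shown in this post can be used to route failures to the right place. The sketch below relies only on the three message strings quoted here; the category labels and the helper name classify_failure are hypothetical, and the "Cannot find module" text is specific to the Node.js example, so other runtimes would need their own markers.

```python
def classify_failure(error_message):
    """Map a dead letter ErrorMessage value to a coarse failure category."""
    message = error_message or ""
    if "Task timed out after" in message:
        return "timeout"
    if "Process exited before completing request" in message:
        return "crash"
    if "Cannot find module" in message:
        return "misconfiguration"
    # Anything else is most likely an error raised by the handler itself.
    return "other"
```

A handler might retry or split work for the timeout category and page operations support for the crash and misconfiguration categories.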

Monitoring Lambda functions configured with dead letter queues

Lambda functions with a configured dead letter queue also emit their own CloudWatch metric, DeadLetterErrors. The metric is incremented whenever the dead letter message payload can’t be delivered to the dead letter queue.
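Because a failed delivery means that the failure context is lost, it can be worth alarming on this metric. The following is a minimal sketch, assuming the boto3 SDK and its put_metric_alarm call; the helper name, the alarm name, the five-minute period, and the threshold are all illustrative choices rather than recommendations.

```python
def create_dlq_delivery_alarm(cloudwatch_client, function_name, alarm_topic_arn):
    """Create an alarm that fires when any dead letter delivery fails.

    cloudwatch_client is assumed to be a boto3 CloudWatch client,
    for example boto3.client("cloudwatch").
    """
    alarm_name = f"{function_name}-DeadLetterErrors"
    cloudwatch_client.put_metric_alarm(
        AlarmName=alarm_name,
        Namespace="AWS/Lambda",
        MetricName="DeadLetterErrors",
        Dimensions=[{"Name": "FunctionName", "Value": function_name}],
        Statistic="Sum",
        Period=300,               # evaluate over five-minute windows
        EvaluationPeriods=1,
        Threshold=0,
        ComparisonOperator="GreaterThanThreshold",
        # No data points means no failed deliveries, so stay in the OK state.
        TreatMissingData="notBreaching",
        AlarmActions=[alarm_topic_arn],
    )
    return alarm_name
```

The alarm action should point at a topic other than the dead letter target itself, since the alarm exists to report that deliveries to that target are failing.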

Conclusion

With the launch of Dead Letter Queues, Lambda function developers can now create much simpler functions by focusing only on the business logic, and leverage the AWS Lambda infrastructure to delegate error handling elsewhere in a more graceful manner.

For more information, read about Dead Letter Queues in the AWS Lambda Developer Guide. Happy coding everyone, and have fun creating awesome serverless applications!