This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.
Amazon SNS message filtering provides a set of string and numeric matching operators that allow each subscription to receive only the messages of interest. Hence, SNS message filtering can simplify your pub/sub messaging architecture by offloading the message filtering logic from your subscriber systems, as well as the message routing logic from your publisher systems.
After you set the subscription attribute that defines a filter policy, the subscribing endpoint receives only the messages that carry attributes matching this filter policy. Other messages published to the topic are filtered out for this subscription. The native integration between SNS and Amazon CloudWatch gives you visibility into the number of messages delivered, as well as the number of messages filtered out.
CloudWatch metrics are captured automatically for you. To get started with SNS message filtering, see Filtering Messages with Amazon SNS.
Message Filtering Metrics
The following six CloudWatch metrics are relevant to understanding your SNS message filtering activity:
NumberOfMessagesPublished – Inbound traffic to SNS. This metric tracks all the messages that have been published to the topic.
NumberOfNotificationsDelivered – Outbound traffic from SNS. This metric tracks all the messages that have been successfully delivered to endpoints subscribed to the topic. A delivery takes place either when the incoming message attributes match a subscription filter policy, or when the subscription has no filter policy at all, which results in a catch-all behavior.
NumberOfNotificationsFilteredOut – This metric tracks all the messages that were filtered out because they carried attributes that didn’t match the subscription filter policy.
NumberOfNotificationsFilteredOut-NoMessageAttributes – This metric tracks all the messages that were filtered out because they didn’t carry any attributes at all and, consequently, didn’t match the subscription filter policy.
NumberOfNotificationsFilteredOut-InvalidAttributes – This metric keeps track of messages that were filtered out because they carried invalid or malformed attributes and, thus, didn’t match the subscription filter policy.
NumberOfNotificationsFailed – This last metric tracks all the messages that failed to be delivered to subscribing endpoints, regardless of whether a filter policy had been set for the endpoint. This metric is emitted after the message delivery retry policy is exhausted, and SNS stops attempting to deliver the message. At that moment, the subscribing endpoint is likely no longer reachable. For example, the subscribing SQS queue or Lambda function has been deleted by its owner. You may want to closely monitor this metric to address message delivery issues quickly.
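To make the relationship between these metrics concrete, here is a minimal Python sketch of which metric a single publish would count toward for one subscription. It is a simplified model rather than SNS's actual matching logic: it assumes exact string matching only, it does not model invalid attributes or delivery failures, and the function name classify_delivery is hypothetical.

```python
def classify_delivery(message_attributes, filter_policy):
    """Return the metric name one publish would count toward for a single
    subscription, under a simplified exact-match model (an assumption)."""
    # No filter policy: the subscription behaves as a catch-all.
    if not filter_policy:
        return "NumberOfNotificationsDelivered"
    # A policy is set, but the message carries no attributes at all.
    if not message_attributes:
        return "NumberOfNotificationsFilteredOut-NoMessageAttributes"
    # Every policy key must be present with one of the allowed values.
    for key, allowed_values in filter_policy.items():
        if message_attributes.get(key) not in allowed_values:
            return "NumberOfNotificationsFilteredOut"
    return "NumberOfNotificationsDelivered"
```

For example, with a hypothetical policy of {"event_type": ["order_placed"]}, a message carrying event_type=order_cancelled would count toward NumberOfNotificationsFilteredOut for that subscription.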
Message filtering graphs
Through the AWS Management Console, you can compose graphs to display your SNS message filtering activity. The graph shows the number of messages published, delivered, and filtered out within the timeframe you specify (1h, 3h, 12h, 1d, 3d, 1w, or custom).
To compose an SNS message filtering graph with CloudWatch:
Open the CloudWatch console.
Choose Metrics, SNS, All Metrics, and Topic Metrics.
Select all metrics to add to the graph, such as:
NumberOfMessagesPublished
NumberOfNotificationsDelivered
NumberOfNotificationsFilteredOut
Choose Graphed metrics.
In the Statistic column, switch from Average to Sum.
Title your graph with a descriptive name, such as “SNS Message Filtering”
After you have your graph set up, you may want to copy the graph link for bookmarking, emailing, or sharing with co-workers. You may also want to add your graph to a CloudWatch dashboard for easy access in the future. Both actions are available to you on the Actions menu, which is found above the graph.
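The same numbers can also be retrieved programmatically. The following is a minimal Python sketch, not part of the original walkthrough, that builds the request parameters for the CloudWatch GetMetricStatistics API using the Sum statistic chosen in the console steps. The helper name build_metric_request and the topic name used in the comment are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The three metrics graphed in the console steps.
FILTERING_METRICS = [
    "NumberOfMessagesPublished",
    "NumberOfNotificationsDelivered",
    "NumberOfNotificationsFilteredOut",
]


def build_metric_request(topic_name, metric_name, hours=1, period_seconds=300):
    """Build keyword arguments for cloudwatch.get_metric_statistics."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "Namespace": "AWS/SNS",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "TopicName", "Value": topic_name}],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
        "Statistics": ["Sum"],  # matches the switch from Average to Sum
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   cloudwatch = boto3.client("cloudwatch")
#   for name in FILTERING_METRICS:
#       response = cloudwatch.get_metric_statistics(
#           **build_metric_request("my-example-topic", name))
```

Keeping the request builder separate from the API call makes it possible to check the parameters without AWS credentials.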
Summary
SNS message filtering defines how SNS topics behave in terms of message delivery. By using CloudWatch metrics, you gain visibility into the number of messages published, delivered, and filtered out. This enables you to validate the operation of filter policies and more easily troubleshoot during development phases.
SNS message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). CloudWatch metrics for SNS message filtering are available now, in all AWS Regions.
It’s a public holiday here today (yes, again). So, while we indulge in the traditional pastime of barbecuing stuff (ourselves, mainly), here’s a little trove of Pi projects that cater for our various furry friends.
Project Floofball
Nicole Horward created Project Floofball for her hamster, Harold. It’s an IoT hamster wheel that uses a Raspberry Pi and a magnetic door sensor to log how far Harold runs.
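The distance calculation behind a wheel logger like this can be sketched in a few lines of Python. This is not Nicole's code: it assumes the door sensor triggers once per full revolution, and the wheel diameter is a made-up value for illustration.

```python
import math


def distance_run_m(revolutions, wheel_diameter_m):
    """Distance covered in metres, assuming one sensor pulse per revolution."""
    # Each revolution covers one wheel circumference (pi * diameter).
    return revolutions * math.pi * wheel_diameter_m


# Hypothetical example: 100 sensor pulses on a 0.2 m wheel.
print(round(distance_run_m(100, 0.2), 1))
```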
JaganK3 used to work long hours that meant he couldn’t be there to feed his dog on time. He found that he couldn’t buy an automated feeder in India without paying a lot to import one, so he made one himself. It uses a Raspberry Pi to control a motor that turns a dispensing valve in a hopper full of dry food, giving his dog a portion of food at set times.
He also added a web cam for live video streaming, because he could. Find out more in JaganK3’s Instructable for his pet feeder.
Shark laser cat toy
Sam Storino, meanwhile, is using a Raspberry Pi to control a laser-pointer cat toy with a goshdarned SHARK (which is kind of what I’d expect from the guy who made the steampunk-looking cat feeder a few weeks ago). The idea is to keep his cats interested and active within the confines of a compact city apartment.
If I were a cat, I would definitely be entirely happy with this. Find out more on Sam’s website.
All of these makers are generous in acknowledging the tutorials and build logs that helped them with their projects. It’s lovely to see the Raspberry Pi and maker community working like this, and I bet their projects will inspire others too.
Now, if you’ll excuse me. I’m late for a barbecue.
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrog Artifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository and a JFrog Artifactory host, and would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormation template to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the Node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
A custom action has two parts. The custom action type definition describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, of version ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
The custom job worker polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
{
    "category": "Deploy",
    "configurationProperties": [
        {
            "name": "TypeOfArtifact",
            "required": true,
            "key": true,
            "secret": false,
            "description": "Package type, ex. npm for node packages",
            "type": "String"
        },
        {
            "name": "RepoKey",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Name of the repository in which this artifact should be stored"
        },
        {
            "name": "UserName",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Username for authenticating with the repository"
        },
        {
            "name": "Password",
            "required": true,
            "key": true,
            "secret": true,
            "type": "String",
            "description": "Password for authenticating with the repository"
        },
        {
            "name": "EmailAddress",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Email address used to authenticate with the repository"
        },
        {
            "name": "ArtifactoryHost",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
        }
    ],
    "provider": "Artifactory",
    "version": "1",
    "settings": {
        "entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
    },
    "inputArtifactDetails": {
        "maximumCount": 5,
        "minimumCount": 1
    },
    "outputArtifactDetails": {
        "maximumCount": 5,
        "minimumCount": 0
    }
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, or Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
ArtifactoryHost: The public address (host name or IP address) of the Artifactory host.
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
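For readers who script this step in Python instead of the AWS CLI, here is a minimal sketch using the boto3 create_custom_action_type call. It assumes the JSON definition shown earlier is saved to a local file whose top-level keys match the call's parameter names. The function name and the file path in the comment are hypothetical, and the client is passed in so the helper can be exercised without AWS credentials.

```python
import json


def create_artifactory_action_type(codepipeline_client, definition_path):
    """Register the custom action type described in a JSON definition file."""
    with open(definition_path) as definition_file:
        definition = json.load(definition_file)
    # The definition's top-level keys (category, provider, version, settings,
    # configurationProperties, inputArtifactDetails, outputArtifactDetails)
    # are passed straight through as keyword arguments.
    return codepipeline_client.create_custom_action_type(**definition)


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("codepipeline")
#   create_artifactory_action_type(client, "artifactory_custom_action.json")
```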
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
Use the user-defined Artifactory user name and password to request a temporary API key from Artifactory.
Authenticate with the NPM repository using that temporary API key and user name, which avoids writing the password to a file.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm --enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
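import time

import boto3
from botocore.exceptions import ClientError

# CodePipeline client shared by the worker functions
codepipeline = boto3.client('codepipeline')

def action_type():
    # Identifies the custom action type that this worker serves
    ActionType = {
        'category': 'Deploy',
        'owner': 'Custom',
        'provider': 'Artifactory',
        'version': '1' }
    return(ActionType)

def poll_for_jobs():
    try:
        artifactory_action_type = action_type()
        print(artifactory_action_type)
        jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        # Poll every 10 seconds until a job appears in the queue
        while not jobs['jobs']:
            time.sleep(10)
            jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
            if jobs['jobs']:
                print('Job found')
        return jobs['jobs'][0]
    except ClientError as e:
        print("Received an error: %s" % str(e))
        raise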
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
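As an illustration of where those values sit in the response, here is a minimal Python sketch that pulls them out of one entry of the jobs list returned by poll_for_jobs. The helper name extract_job_details is hypothetical and is not taken from the worker. The sketch assumes the first input artifact is the one to publish.

```python
def extract_job_details(job):
    """Collect the fields the worker needs from one PollForJobs job entry."""
    data = job['data']
    # Assumption: the build artifact to publish is the first input artifact.
    s3_location = data['inputArtifacts'][0]['location']['s3Location']
    credentials = data['artifactCredentials']
    return {
        'jobId': job['id'],
        'nonce': job['nonce'],
        'bucketName': s3_location['bucketName'],
        'objectKey': s3_location['objectKey'],
        # Temporary credentials scoped to the pipeline's artifact bucket
        'accessKeyId': credentials['accessKeyId'],
        'secretAccessKey': credentials['secretAccessKey'],
        'sessionToken': credentials['sessionToken'],
        # Values entered for the configurationProperties of the action
        'configuration': data['actionConfiguration']['configuration'],
    }
```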
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
    try:
        print('Acknowledging job')
        result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
        return result
    except Exception as e:
        print("Received an error when trying to acknowledge the job: %s" % str(e))
        raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
import os
import tempfile

import boto3
import botocore.client
from boto3.session import Session
from botocore.exceptions import ClientError

def get_bucket_location(bucketName, init_client):
    region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
    # An empty LocationConstraint means the bucket is in us-east-1
    if not region:
        region = 'us-east-1'
    return region

def get_s3_artifact(bucketName, objectKey, ak, sk, st):
    init_s3 = boto3.client('s3')
    region = get_bucket_location(bucketName, init_s3)
    # Use the temporary credentials that CodePipeline supplied with the job
    session = Session(aws_access_key_id=ak,
                      aws_secret_access_key=sk,
                      aws_session_token=st)
    s3 = session.resource('s3',
                          region_name=region,
                          config=botocore.client.Config(signature_version='s3v4'))
    try:
        tempdirname = tempfile.mkdtemp()
    except OSError as e:
        # tempdirname is not set if mkdtemp fails, so report the error itself
        print('Could not create temp directory: %s' % str(e))
        raise
    bucket = s3.Bucket(bucketName)
    obj = bucket.Object(objectKey)
    filename = tempdirname + '/' + objectKey
    try:
        # Recreate the object key's directory structure under the temp directory
        if os.path.dirname(objectKey):
            directory = os.path.dirname(filename)
            os.makedirs(directory)
        print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
        with open(filename, 'wb') as data:
            obj.download_fileobj(data)
    except ClientError as e:
        print('Downloading the object and writing the file to disk raised this error: ' + str(e))
        raise
    return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
import errno
import os
import shutil
import tempfile
import zipfile

def unzip_codepipeline_artifact(artifact, origtmpdir):
    # create a new temp directory
    # Unzip artifact into new directory
    try:
        newtempdir = tempfile.mkdtemp()
        print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
        zip_ref = zipfile.ZipFile(artifact, 'r')
        zip_ref.extractall(newtempdir)
        zip_ref.close()
        # Remove the directory that held the downloaded zip file
        shutil.rmtree(origtmpdir)
        return(os.listdir(newtempdir), newtempdir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            shutil.rmtree(newtempdir)
            raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user’s home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish --registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
import os
import subprocess

import requests

def push_to_npm(configuration, artifact_list, temp_dir, jobId):
    reponame = configuration['RepoKey']
    art_type = configuration['TypeOfArtifact']
    print("Putting artifact into NPM repository " + reponame)
    token, hostname, username = gen_artifactory_auth_token(configuration)
    npmconfigfile = create_npmconfig_file(configuration, username, token)
    url = hostname + '/artifactory/api/' + art_type + '/' + reponame
    print("Changing directory to " + str(temp_dir))
    os.chdir(temp_dir)
    try:
        print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
        print("Sending artifact to Artifactory NPM registry URL: " + url)
        subprocess.call(["npm", "config", "set", "registry", url])
        req = subprocess.call(["npm", "publish", "--registry", url])
        print("Return code from npm publish: " + str(req))
        # A non-zero return code from npm publish means the publish failed
        if req != 0:
            err_msg = "npm ERR! Received non OK response while sending response to Artifactory. Return code from npm publish: " + str(req)
            signal_failure(jobId, err_msg)
        else:
            signal_success(jobId)
    except requests.exceptions.RequestException as e:
        print("Received an error when trying to commit artifact %s to repository %s: %s" % (str(art_type), str(configuration['RepoKey']), str(e)))
        raise
    return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
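The signal_success and signal_failure helpers are not shown in this post. A minimal sketch of what such helpers could look like follows, using the boto3 put_job_success_result and put_job_failure_result calls. As an assumption for illustration, these versions take the CodePipeline client as an explicit argument, so their signatures differ from the calls in the snippet above.

```python
def signal_success(codepipeline_client, job_id):
    """Tell CodePipeline that the custom action job finished successfully."""
    print('Signaling success to CodePipeline')
    return codepipeline_client.put_job_success_result(jobId=job_id)


def signal_failure(codepipeline_client, job_id, message):
    """Tell CodePipeline that the custom action job failed, with a reason."""
    print('Signaling failure to CodePipeline')
    return codepipeline_client.put_job_failure_result(
        jobId=job_id,
        # 'JobFailed' is one of the failure types the API accepts
        failureDetails={'type': 'JobFailed', 'message': message},
    )
```

Passing the client in, rather than relying on a module-level one, keeps the helpers easy to test with a stub.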
For the custom worker code, see npm_job_worker.py in the GitHub repository.
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
My custom worker output logs show that I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https://artifactoryhost.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ node_example@1.0.3
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I’ve shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow. I’ve shown you how to create a custom action in CodePipeline and how to create a custom worker that works in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.
Someone changed the address of UPS corporate headquarters to his own apartment in Chicago. The company discovered it three months later.
The problem, of course, is that there isn’t any authentication of change-of-address submissions:
According to the Postal Service, nearly 37 million change-of-address requests known as PS Form 3575 were submitted in 2017. The form, which can be filled out in person or online, includes a warning below the signature line that “anyone submitting false or inaccurate information” could be subject to fines and imprisonment.
To cut down on possible fraud, post offices send a validation letter to both an old and new address when a change is filed. The letter includes a toll-free number to call to report anything suspicious.
Each year, only a tiny fraction of the requests are ever referred to postal inspectors for investigation. A spokeswoman for the U.S. Postal Inspection Service could not provide a specific number to the Tribune, but officials have previously said that the number of change-of-address investigations in a given year totals 1,000 or fewer typically.
While fraud involving change-of-address forms has long been linked to identity thieves, the targets are usually unsuspecting individuals, not massive corporations.
Abstract: ElsieFour (LC4) is a low-tech cipher that can be computed by hand; but unlike many historical ciphers, LC4 is designed to be hard to break. LC4 is intended for encrypted communication between humans only, and therefore it encrypts and decrypts plaintexts and ciphertexts consisting only of the English letters A through Z plus a few other characters. LC4 uses a nonce in addition to the secret key, and requires that different messages use unique nonces. LC4 performs authenticated encryption, and optional header data can be included in the authentication. This paper defines the LC4 encryption and decryption algorithms, analyzes LC4’s security, and describes a simple appliance for computing LC4 by hand.
Almost two decades ago I designed Solitaire, a pen-and-paper cipher that uses a deck of playing cards to store the cipher’s state. This algorithm uses specialized tiles. This gives the cipher designer more options, but it can be incriminating in a way that regular playing cards are not.
Creating these defenses is the goal of NIST’s lightweight cryptography initiative, which aims to develop cryptographic algorithm standards that can work within the confines of a simple electronic device. Many of the sensors, actuators and other micromachines that will function as eyes, ears and hands in IoT networks will work on scant electrical power and use circuitry far more limited than the chips found in even the simplest cell phone. Similar small electronics exist in the keyless entry fobs to newer-model cars and the Radio Frequency Identification (RFID) tags used to locate boxes in vast warehouses.
All of these gadgets are inexpensive to make and will fit nearly anywhere, but common encryption methods may demand more electronic resources than they possess.
This post courtesy of George Mao, AWS Senior Serverless Specialist – Solutions Architect
AWS Lambda and AWS CodeDeploy recently made it possible to automatically shift incoming traffic between two function versions based on a preconfigured rollout strategy. This new feature allows you to gradually shift traffic to the new function. If there are any issues with the new code, you can quickly roll back and control the impact to your application.
Previously, you had to manually move 100% of traffic from the old version to the new version. Now, you can have CodeDeploy automatically execute pre- or post-deployment tests and automate a gradual rollout strategy. Traffic shifting is built right into the AWS Serverless Application Model (SAM), making it easy to define and deploy your traffic shifting capabilities. SAM is an extension of AWS CloudFormation that provides a simplified way of defining serverless applications.
In this post, I show you how to use SAM, CloudFormation, and CodeDeploy to accomplish an automated rollout strategy for safe Lambda deployments.
Scenario
For this walkthrough, you write a Lambda application that returns a count of the S3 buckets that you own. You deploy it and use it in production. Later on, you receive requirements that tell you that you need to change your Lambda application to count only buckets that begin with the letter “a”.
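The counting logic in this scenario can be sketched in a few lines. The following is a Python illustration of the logic only, not the function deployed in this walkthrough. The function name count_buckets is hypothetical, and the response shape (statusCode and body) is assumed from the sample payload described in the pre-traffic hook later in this post.

```python
def count_buckets(bucket_names, prefix=""):
    """Return an API-style response with the number of matching buckets."""
    # An empty prefix matches every bucket (the original behavior);
    # prefix="a" gives the changed behavior described in the scenario.
    matching = [name for name in bucket_names if name.startswith(prefix)]
    return {"statusCode": 200, "body": len(matching)}
```

In a real function, bucket_names would come from the S3 ListBuckets API. Here it is passed in so the logic can be checked on its own.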
Before you make the change, you need to be sure that your new Lambda application works as expected. If it does have issues, you want to minimize the number of impacted users and roll back easily. To accomplish this, you create a deployment process that publishes the new Lambda function, but does not send any traffic to it. You use CodeDeploy to execute a PreTraffic test to ensure that your new function works as expected. After the test succeeds, CodeDeploy automatically shifts traffic gradually to the new version of the Lambda function.
Your Lambda function is exposed as a REST service via an Amazon API Gateway deployment. This makes it easy to test and integrate.
Prerequisites
To execute the SAM and CloudFormation deployment, you must have the following IAM permissions:
cloudformation:*
lambda:*
codedeploy:*
iam:create*
You may use the AWS SAM Local CLI or the AWS CLI to package and deploy your Lambda application. If you choose to use SAM Local, be sure to install it onto your system. For more information, see AWS SAM Local Installation.
For this post, use SAM to define your resources because it comes with built-in CodeDeploy support for safe Lambda deployments. The deployment is handled and automated by CloudFormation.
SAM allows you to define your Serverless applications in a simple and concise fashion, because it automatically creates all necessary resources behind the scenes. For example, if you do not define an execution role for a Lambda function, SAM automatically creates one. SAM also creates the CodeDeploy application necessary to drive the traffic shifting, as well as the IAM service role that CodeDeploy uses to execute all actions.
Create a SAM template
To get started, write your SAM template and call it template.yaml.
Review the key parts of the SAM template that defines returnS3Buckets:
The AutoPublishAlias attribute instructs SAM to automatically publish a new version of the Lambda function for each new deployment and link it to the live alias.
The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides the function with permission to call listBuckets.
The DeploymentPreference attribute configures the type of rollout pattern to use. In this case, you are shifting traffic in a linear fashion, moving 10% of traffic every minute to the new version. For more information about supported patterns, see Serverless Application Model: Traffic Shifting Configurations.
The Hooks attribute specifies that you want to execute the preTrafficHook Lambda function before CodeDeploy automatically begins shifting traffic. This function should perform validation testing on the newly deployed Lambda version. This function invokes the new Lambda function and checks the results. If you’re satisfied with the tests, instruct CodeDeploy to proceed with the rollout via an API call to: codedeploy.putLifecycleEventHookExecutionStatus.
The Events attribute defines an API-based event source that can trigger this function. It accepts requests on the /test path using an HTTP GET method.
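To visualize what a linear rollout such as 10% every minute means for traffic, here is a small Python sketch that computes the cumulative share of traffic on the new version over time. It is an illustration only and not how CodeDeploy is implemented. It assumes the first increment is applied as soon as shifting begins, and the function name is hypothetical.

```python
def linear_shift_schedule(step_percent, interval_minutes):
    """Return (minute, percent_on_new_version) pairs for a linear rollout."""
    schedule = []
    shifted = 0
    minute = 0
    # Assumption: the first step happens at minute 0, then one step per interval.
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


for minute, percent in linear_shift_schedule(10, 1):
    print("minute %d: %d%% of traffic on the new version" % (minute, percent))
```

Under these assumptions, a 10% step with a one-minute interval takes ten steps to move all traffic to the new version.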
'use strict';

const AWS = require('aws-sdk');
const codedeploy = new AWS.CodeDeploy({apiVersion: '2014-10-06'});
var lambda = new AWS.Lambda();

exports.handler = (event, context, callback) => {

    console.log("Entering PreTraffic Hook!");

    // Read the DeploymentId & LifecycleEventHookExecutionId from the event payload
    var deploymentId = event.DeploymentId;
    var lifecycleEventHookExecutionId = event.LifecycleEventHookExecutionId;

    var functionToTest = process.env.NewVersion;
    console.log("Testing new function version: " + functionToTest);

    // Perform validation of the newly deployed Lambda version
    var lambdaParams = {
        FunctionName: functionToTest,
        InvocationType: "RequestResponse"
    };

    var lambdaResult = "Failed";
    lambda.invoke(lambdaParams, function(err, data) {
        if (err){ // an error occurred
            console.log(err, err.stack);
            lambdaResult = "Failed";
        }
        else{ // successful response
            var result = JSON.parse(data.Payload);
            console.log("Result: " + JSON.stringify(result));

            // Check the response for valid results
            // The response will be a JSON payload with statusCode and body properties. ie:
            // {
            //     "statusCode": 200,
            //     "body": 51
            // }
            if(result.body == 9){
                lambdaResult = "Succeeded";
                console.log ("Validation testing succeeded!");
            }
            else{
                lambdaResult = "Failed";
                console.log ("Validation testing failed!");
            }

            // Complete the PreTraffic Hook by sending CodeDeploy the validation status
            var params = {
                deploymentId: deploymentId,
                lifecycleEventHookExecutionId: lifecycleEventHookExecutionId,
                status: lambdaResult // status can be 'Succeeded' or 'Failed'
            };

            // Pass AWS CodeDeploy the prepared validation test results.
            codedeploy.putLifecycleEventHookExecutionStatus(params, function(err, data) {
                if (err) {
                    // Validation failed.
                    console.log('CodeDeploy Status update failed');
                    console.log(err, err.stack);
                    callback("CodeDeploy Status update failed");
                } else {
                    // Validation succeeded.
                    console.log('Codedeploy status updated successfully');
                    callback(null, 'Codedeploy status updated successfully');
                }
            });
        }
    });
}
The hook is hardcoded to check that the number of S3 buckets returned is 9.
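The hardcoded check could be pulled out into a small pure function so that the expected value is easy to change and to test in isolation. The following is only a sketch: validateResponse and its expectedCount argument are hypothetical names that do not appear in the original hook.

```javascript
// Hypothetical helper: derives the hook status from the invoked function's payload.
// payload is the JSON string that lambda.invoke returns in data.Payload.
function validateResponse(payload, expectedCount) {
    let result;
    try {
        result = JSON.parse(payload);
    } catch (e) {
        return 'Failed'; // the payload was not valid JSON
    }
    if (result && result.statusCode === 200 && result.body === expectedCount) {
        return 'Succeeded';
    }
    return 'Failed';
}

console.log(validateResponse('{"statusCode":200,"body":9}', 9));  // Succeeded
console.log(validateResponse('{"statusCode":200,"body":51}', 9)); // Failed
```

The returned string uses the same two status values that the hook passes to CodeDeploy.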
Review the key parts of the SAM template that defines preTrafficHook:
The Policies attribute specifies additional policy statements that SAM adds onto the automatically generated IAM role for this function. The first statement provides permissions to call the CodeDeploy PutLifecycleEventHookExecutionStatus API action. The second statement provides permissions to invoke the specific version of the returnS3Buckets function to test.
This function has traffic shifting features disabled by setting the DeploymentPreference option to false.
The FunctionName attribute explicitly tells CloudFormation what to name the function. Otherwise, CloudFormation creates the function with the default naming convention: [stackName]-[FunctionName]-[uniqueID]. Name the function with the “CodeDeployHook_” prefix because the CodeDeployServiceRole role only allows InvokeFunction on functions named with that prefix.
Set the Timeout attribute to allow enough time to complete your validation tests.
Use an environment variable to inject the ARN of the newest deployed version of the returnS3Buckets function. The ARN allows the function to know the specific version to invoke and perform validation testing on.
Deploy the function
Your SAM template is all set and the code is written, so you’re ready to deploy the function for the first time. Here’s how to do it via the SAM CLI. Replace “sam” with “aws cloudformation” to use the CloudFormation CLI commands instead.
First, package the function. This command returns a CloudFormation importable file, packaged.yaml.
sam package --template-file template.yaml --s3-bucket mybucket --output-template-file packaged.yaml
Now deploy everything:
sam deploy --template-file packaged.yaml --stack-name mySafeDeployStack --capabilities CAPABILITY_IAM
At this point, both Lambda functions have been deployed within the CloudFormation stack mySafeDeployStack. The returnS3Buckets has been deployed as Version 1:
SAM automatically created a few things, including the CodeDeploy application, with the deployment pattern that you specified (Linear10PercentEvery1Minute). There is currently one deployment group, with no action, because no deployments have occurred. SAM also created the IAM service role that this CodeDeploy application uses:
There is a single managed policy attached to this role, which allows CodeDeploy to invoke any Lambda function that begins with “CodeDeployHook_”.
An API has been set up called safeDeployStack. It targets your Lambda function with the /test resource using the GET method. When you test the endpoint, API Gateway executes the returnS3Buckets function and it returns the number of S3 buckets that you own. In this case, it’s 51.
Publish a new Lambda function version
Now implement the requirements change, which is to make returnS3Buckets count only buckets that begin with the letter “a”. The code now looks like the following (see returnS3BucketsNew.js in GitHub):
'use strict';

var AWS = require('aws-sdk');
var s3 = new AWS.S3();

exports.handler = (event, context, callback) => {
    console.log("I am here! " + context.functionName + ":" + context.functionVersion);

    s3.listBuckets(function (err, data) {
        if (err) {
            console.log(err, err.stack);
            callback(null, {
                statusCode: 500,
                body: "Failed!"
            });
        }
        else {
            var allBuckets = data.Buckets;
            console.log("Total buckets: " + allBuckets.length);
            //callback(null, allBuckets.length);

            // New Code begins here
            var counter = 0;
            for (var i in allBuckets) {
                if (allBuckets[i].Name[0] === "a")
                    counter++;
            }
            console.log("Total buckets starting with a: " + counter);

            callback(null, {
                statusCode: 200,
                body: counter
            });
        }
    });
};
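The counting logic can also be exercised without calling S3. The sketch below assumes bucket objects shaped like the entries in data.Buckets (each with a Name property). The countBucketsWithPrefix helper and the sample bucket names are hypothetical.

```javascript
// Hypothetical helper: counts buckets whose name starts with the given prefix.
function countBucketsWithPrefix(buckets, prefix) {
    return buckets.filter((bucket) => bucket.Name.startsWith(prefix)).length;
}

// Made-up sample data in the same shape as data.Buckets.
const sampleBuckets = [{ Name: 'archive' }, { Name: 'backups' }, { Name: 'assets' }];
console.log(countBucketsWithPrefix(sampleBuckets, 'a')); // 2
```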
Repackage and redeploy with the same two commands as earlier:
sam package --template-file template.yaml --s3-bucket mybucket --output-template-file packaged.yaml
sam deploy --template-file packaged.yaml --stack-name mySafeDeployStack --capabilities CAPABILITY_IAM
CloudFormation understands that this is a stack update instead of an entirely new stack. You can see that reflected in the CloudFormation console:
During the update, CloudFormation deploys the new Lambda function as version 2 and adds it to the “live” alias. There is no traffic routing there yet. CodeDeploy now takes over to begin the safe deployment process.
The first thing CodeDeploy does is invoke the preTrafficHook function. Verify that this happened by reviewing the Lambda logs and metrics:
The function should progress successfully, invoke Version 2 of returnS3Buckets, and finally invoke the CodeDeploy API with a success code. After this occurs, CodeDeploy begins the predefined rollout strategy. Open the CodeDeploy console to review the deployment progress (Linear10PercentEvery1Minute):
Verify the traffic shift
During the deployment, verify that the traffic shift has started to occur by running the test periodically. As the deployment shifts towards the new version, a larger percentage of the responses return 9 instead of 51. These numbers are the bucket counts returned by the new and old versions, respectively.
A minute later, you see 10% more traffic shifting to the new version. The whole process takes 10 minutes to complete. After completion, open the Lambda console and verify that the “live” alias now points to version 2:
After 10 minutes, the deployment is complete and CodeDeploy signals success to CloudFormation and completes the stack update.
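To make the linear pattern concrete, the following sketch lists the traffic split after each increment. It models only the percentages, not the timing or the CodeDeploy API, and linearWeights is a hypothetical helper name.

```javascript
// Hypothetical helper: traffic split between versions after each linear increment.
function linearWeights(stepPercent) {
    const weights = [];
    for (let shifted = stepPercent; shifted < 100; shifted += stepPercent) {
        weights.push({ newVersion: shifted, oldVersion: 100 - shifted });
    }
    weights.push({ newVersion: 100, oldVersion: 0 }); // final state: all traffic on the new version
    return weights;
}

const plan = linearWeights(10);
console.log(plan.length); // 10
console.log(plan[2]);     // { newVersion: 30, oldVersion: 70 }
```

With a 10% step, the third increment sends roughly 30% of requests to the new version, so about three in ten test calls would be expected to return 9 at that point.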
Check the results
If you invoke the function alias manually, you see the results of the new implementation.
aws lambda invoke --function-name [lambda arn to live alias] out.txt
You can also execute the prod stage of your API and verify the results by issuing an HTTP GET to the invoke URL:
Summary
This post has shown you how you can safely automate your Lambda deployments using the Lambda traffic shifting feature. You used the Serverless Application Model (SAM) to define your Lambda functions and configured CodeDeploy to manage your deployment patterns. Finally, you used CloudFormation to automate the deployment and updates to your function and PreTraffic hook.
Now that you know all about this new feature, you’re ready to begin automating Lambda deployments with confidence that things will work as designed. I look forward to hearing about what you’ve built with the AWS Serverless Platform.
AWS Glue provides enhanced support for working with datasets that are organized into Hive-style partitions. AWS Glue crawlers automatically identify partitions in your Amazon S3 data. The AWS Glue ETL (extract, transform, and load) library natively supports partitions when you work with DynamicFrames. DynamicFrames represent a distributed collection of data without requiring you to specify a schema. You can now push down predicates when creating DynamicFrames to filter out partitions and avoid costly calls to S3. We have also added support for writing DynamicFrames directly into partitioned directories without converting them to Apache Spark DataFrames.
Partitioning has emerged as an important technique for organizing datasets so that they can be queried efficiently by a variety of big data systems. Data is organized in a hierarchical directory structure based on the distinct values of one or more columns. For example, you might decide to partition your application logs in Amazon S3 by date—broken down by year, month, and day. Files corresponding to a single day’s worth of data would then be placed under a prefix such as s3://my_bucket/logs/year=2018/month=01/day=23/.
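As a small illustration of the Hive-style convention, the sketch below extracts the key=value segments from a path like the one above. The parsePartitions helper is hypothetical and is not part of any AWS library.

```javascript
// Hypothetical helper: collects key=value partition segments from a Hive-style path.
function parsePartitions(path) {
    const partitions = {};
    for (const segment of path.split('/')) {
        const idx = segment.indexOf('=');
        if (idx > 0) {
            partitions[segment.slice(0, idx)] = segment.slice(idx + 1);
        }
    }
    return partitions;
}

console.log(parsePartitions('s3://my_bucket/logs/year=2018/month=01/day=23/'));
// { year: '2018', month: '01', day: '23' }
```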
Systems like Amazon Athena, Amazon Redshift Spectrum, and now AWS Glue can use these partitions to filter data by value without making unnecessary calls to Amazon S3. This can significantly improve the performance of applications that need to read only a few partitions.
In this post, we show you how to efficiently process partitioned datasets using AWS Glue. First, we cover how to set up a crawler to automatically scan your partitioned dataset and create a table and partitions in the AWS Glue Data Catalog. Then, we introduce some features of the AWS Glue ETL library for working with partitioned data. You can now filter partitions using SQL expressions or user-defined functions to avoid listing and reading unnecessary data from Amazon S3. We’ve also added support in the ETL library for writing AWS Glue DynamicFrames directly into partitions without relying on Spark SQL DataFrames.
Let’s get started!
Crawling partitioned data
In this example, we use the same GitHub archive dataset that we introduced in a previous post about Scala support in AWS Glue. This data, which is publicly available from the GitHub archive, contains a JSON record for every API request made to the GitHub service. A sample dataset containing one month of activity from January 2017 is available at the following location:
Here you can replace <region> with the AWS Region in which you are working, for example, us-east-1. This dataset is partitioned by year, month, and day, so an actual file will be at a path like the following:
To crawl this data, you can either follow the instructions in the AWS Glue Developer Guide or use the provided AWS CloudFormation template. This template creates a stack that contains the following:
An IAM role with permissions to access AWS Glue resources
A database in the AWS Glue Data Catalog named githubarchive_month
A crawler set up to crawl the GitHub dataset
An AWS Glue development endpoint (which is used in the next section to transform the data)
To run this template, you must provide an S3 bucket and prefix where you can write output data in the next section. The role that this template creates will have permission to write to this bucket only. You also need to provide a public SSH key for connecting to the development endpoint. For more information about creating an SSH key, see our Development Endpoint tutorial. After you create the AWS CloudFormation stack, you can run the crawler from the AWS Glue console.
In addition to inferring file types and schemas, crawlers automatically identify the partition structure of your dataset and populate the AWS Glue Data Catalog. This ensures that your data is correctly grouped into logical tables and makes the partition columns available for querying in AWS Glue ETL jobs or query engines like Amazon Athena.
After you crawl the table, you can view the partitions by navigating to the table in the AWS Glue console and choosing View partitions. The partitions should look like the following:
For Hive-style partitioned paths of the form key=val, crawlers automatically populate the column name. In this case, because the GitHub data is stored in directories of the form 2017/01/01, the crawlers use default names like partition_0, partition_1, and so on. You can easily change these names on the AWS Glue console: Navigate to the table, choose Edit schema, and rename partition_0 to year, partition_1 to month, and partition_2 to day:
Now that you’ve crawled the dataset and named your partitions appropriately, let’s see how to work with partitioned data in an AWS Glue ETL job.
Transforming and filtering the data
To get started with the AWS Glue ETL libraries, you can use an AWS Glue development endpoint and an Apache Zeppelin notebook. AWS Glue development endpoints provide an interactive environment to build and run scripts using Apache Spark and the AWS Glue ETL library. They are great for debugging and exploratory analysis, and can be used to develop and test scripts before migrating them to a recurring job.
If you ran the AWS CloudFormation template in the previous section, then you already have a development endpoint named partition-endpoint in your account. Otherwise, you can follow the instructions in this development endpoint tutorial. In either case, you need to set up an Apache Zeppelin notebook, either locally, or on an EC2 instance. You can find more information about development endpoints and notebooks in the AWS Glue Developer Guide.
The following examples are all written in the Scala programming language, but they can all be implemented in Python with minimal changes.
Reading a partitioned dataset
To get started, let’s read the dataset and see how the partitions are reflected in the schema. First, you import some classes that you will need for this example and set up a GlueContext, which is the main class that you will use to read and write data.
Execute the following in a Zeppelin paragraph, which is a unit of executable code:
%spark
import com.amazonaws.services.glue.DynamicFrame
import com.amazonaws.services.glue.DynamicRecord
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.SparkContext
import java.util.Calendar
import java.util.GregorianCalendar
import scala.collection.JavaConversions._
@transient val spark: SparkContext = SparkContext.getOrCreate()
val glueContext: GlueContext = new GlueContext(spark)
This is straightforward with two caveats: First, each paragraph must start with the line %spark to indicate that the paragraph is Scala. Second, the spark variable must be marked @transient to avoid serialization issues. This is only necessary when running in a Zeppelin notebook.
Next, read the GitHub data into a DynamicFrame, which is the primary data structure that is used in AWS Glue scripts to represent a distributed collection of data. A DynamicFrame is similar to a Spark DataFrame, except that it has additional enhancements for ETL transformations. DynamicFrames are discussed further in the post AWS Glue Now Supports Scala Scripts, and in the AWS Glue API documentation.
The following snippet creates a DynamicFrame by referencing the Data Catalog table that you just crawled and then prints the schema:
%spark
val githubEvents: DynamicFrame = glueContext.getCatalogSource(
  database = "githubarchive_month",
  tableName = "data"
).getDynamicFrame()

githubEvents.schema.asFieldList.foreach { field =>
  println(s"${field.getName}: ${field.getType.getType.getName}")
}
You could also print the full schema using githubEvents.printSchema(). But in this case, the full schema is quite large, so I’ve printed only the top-level columns. This paragraph takes about 5 minutes to run on a standard size AWS Glue development endpoint. After it runs, you should see the following output:
Note that the partition columns year, month, and day were automatically added to each record.
Filtering by partition columns
One of the primary reasons for partitioning data is to make it easier to operate on a subset of the partitions, so now let’s see how to filter data by the partition columns. In particular, let’s find out what people are building in their free time by looking at GitHub activity on the weekends. One way to accomplish this is to use the filter transformation on the githubEvents DynamicFrame that you created earlier to select the appropriate events:
%spark
def filterWeekend(rec: DynamicRecord): Boolean = {
  def getAsInt(field: String): Int = {
    rec.getField(field) match {
      case Some(strVal: String) => strVal.toInt
      // The filter transformation will catch exceptions and mark the record as an error.
      case _ => throw new IllegalArgumentException(s"Unable to extract field $field")
    }
  }

  val (year, month, day) = (getAsInt("year"), getAsInt("month"), getAsInt("day"))
  val cal = new GregorianCalendar(year, month - 1, day) // Calendar months start at 0.
  val dayOfWeek = cal.get(Calendar.DAY_OF_WEEK)
  dayOfWeek == Calendar.SATURDAY || dayOfWeek == Calendar.SUNDAY
}

val filteredEvents = githubEvents.filter(filterWeekend)
filteredEvents.count
This snippet defines the filterWeekend function that uses the Java Calendar class to identify those records where the partition columns (year, month, and day) fall on a weekend. If you run this code, you see that there were 6,303,480 GitHub events falling on the weekend in January 2017, out of a total of 29,160,561 events. This seems reasonable—about 22 percent of the events fell on the weekend, and about 29 percent of the days that month fell on the weekend (9 out of 31). So people are using GitHub slightly less on the weekends, but there is still a lot of activity!
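The percentages quoted above can be checked with a few lines of arithmetic. This sketch uses JavaScript's built-in Date in UTC rather than the Java Calendar class, and the isWeekend helper is a hypothetical name.

```javascript
// Returns true when the given calendar date falls on a Saturday or Sunday.
function isWeekend(year, month, day) {
    const dayOfWeek = new Date(Date.UTC(year, month - 1, day)).getUTCDay(); // 0 = Sunday, 6 = Saturday
    return dayOfWeek === 0 || dayOfWeek === 6;
}

// Count the weekend days in January 2017.
let weekendDays = 0;
for (let day = 1; day <= 31; day++) {
    if (isWeekend(2017, 1, day)) weekendDays++;
}

console.log(weekendDays);                               // 9
console.log((100 * 6303480 / 29160561).toFixed(1));     // 21.6 (percent of events on weekends)
console.log((100 * weekendDays / 31).toFixed(1));       // 29.0 (percent of days that are weekends)
```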
Predicate pushdowns for partition columns
The main downside to using the filter transformation in this way is that you have to list and read all files in the entire dataset from Amazon S3 even though you need only a small fraction of them. This is manageable when dealing with a single month’s worth of data. But as you try to process more data, you will spend an increasing amount of time reading records only to immediately discard them.
To address this issue, we recently released support for pushing down predicates on partition columns that are specified in the AWS Glue Data Catalog. Instead of reading the data and filtering the DynamicFrame at executors in the cluster, you apply the filter directly on the partition metadata available from the catalog. Then you list and read only the partitions from S3 that you need to process.
To accomplish this, you can specify a Spark SQL predicate as an additional parameter to the getCatalogSource method. This predicate can be any SQL expression or user-defined function as long as it uses only the partition columns for filtering. Remember that you are applying this to the metadata stored in the catalog, so you don’t have access to other fields in the schema.
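Conceptually, pushing the predicate down means filtering the list of partitions before any file is listed or read. The sketch below models that idea with plain objects. The partition metadata, the location strings, and the prunePartitions helper are all hypothetical and do not reflect how AWS Glue implements the feature internally.

```javascript
// Hypothetical catalog metadata: one entry per partition, with no file contents.
const catalogPartitions = [];
for (let day = 1; day <= 31; day++) {
    catalogPartitions.push({
        year: 2017,
        month: 1,
        day: day,
        location: `2017/01/${String(day).padStart(2, '0')}/`
    });
}

// Applies the predicate to partition metadata only; data files are never touched here.
function prunePartitions(partitions, predicate) {
    return partitions.filter(predicate);
}

const weekendPartitions = prunePartitions(catalogPartitions, (p) => {
    const dayOfWeek = new Date(Date.UTC(p.year, p.month - 1, p.day)).getUTCDay();
    return dayOfWeek === 0 || dayOfWeek === 6; // Sunday or Saturday
});

console.log(weekendPartitions.length);      // 9
console.log(weekendPartitions[0].location); // 2017/01/01/
```

Only the nine surviving locations would then need to be listed and read, which is where the savings come from.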
The following snippet shows how to use this functionality to read only those partitions occurring on a weekend:
%spark
val partitionPredicate =
  "date_format(to_date(concat(year, '-', month, '-', day)), 'E') in ('Sat', 'Sun')"

val pushdownEvents = glueContext.getCatalogSource(
  database = "githubarchive_month",
  tableName = "data",
  pushDownPredicate = partitionPredicate).getDynamicFrame()
Here you use the SparkSQL string concat function to construct a date string. You use the to_date function to convert it to a date object, and the date_format function with the ‘E’ pattern to convert the date to a three-character day of the week (for example, Mon, Tue, and so on). For more information about these functions, Spark SQL expressions, and user-defined functions in general, see the Spark SQL documentation and list of functions.
Note that the pushDownPredicate parameter is also available in Python, where it is named push_down_predicate and is passed to the create_dynamic_frame.from_catalog method.
You can observe the performance impact of pushing down predicates by looking at the execution time reported for each Zeppelin paragraph. The initial approach using a Scala filter function took 2.5 minutes:
Because the version using a pushdown lists and reads much less data, it takes only 24 seconds to complete, roughly a 6X improvement (150 seconds versus 24)!
Of course, the exact benefit that you see depends on the selectivity of your filter. The more partitions that you exclude, the more improvement you will see.
In addition to Hive-style partitioning for Amazon S3 paths, Parquet and ORC file formats further partition each file into blocks of data that represent column values. Each block also stores statistics for the records that it contains, such as min/max for column values. AWS Glue supports pushdown predicates for both Hive-style partitions and block partitions in these formats. While reading data, it prunes unnecessary S3 partitions and, for Parquet and ORC, also uses the column statistics to skip blocks that cannot contain matching records.
Additional transformations
Now that you’ve read and filtered your dataset, you can apply any additional transformations to clean or modify the data. For example, you could augment it with sentiment analysis as described in the previous AWS Glue post.
To keep things simple, you can just pick out some columns from the dataset using the ApplyMapping transformation:
ApplyMapping is a flexible transformation for performing projection and type-casting. In this example, we use it to unnest several fields, such as actor.login, which we map to the top-level actor field. We also cast the id column to a long and the partition columns to integers.
Writing out partitioned data
The final step is to write out your transformed dataset to Amazon S3 so that you can process it with other systems like Amazon Athena. By default, when you write out a DynamicFrame, it is not partitioned—all the output files are written at the top level under the specified output path. Until recently, the only way to write a DynamicFrame into partitions was to convert it into a Spark SQL DataFrame before writing. We are excited to share that DynamicFrames now support native partitioning by a sequence of keys.
You can accomplish this by passing the additional partitionKeys option when creating a sink. For example, the following code writes out the dataset that you created earlier in Parquet format to S3 in directories partitioned by the type field.
Here, $outpath is a placeholder for the base output path in S3. The partitionKeys parameter can also be specified in Python in the connection_options dict:
When you execute this write, the type field is removed from the individual records and is encoded in the directory structure. To demonstrate this, you can list the output path using the aws s3 ls command from the AWS CLI:
PRE type=CommitCommentEvent/
PRE type=CreateEvent/
PRE type=DeleteEvent/
PRE type=ForkEvent/
PRE type=GollumEvent/
PRE type=IssueCommentEvent/
PRE type=IssuesEvent/
PRE type=MemberEvent/
PRE type=PublicEvent/
PRE type=PullRequestEvent/
PRE type=PullRequestReviewCommentEvent/
PRE type=PushEvent/
PRE type=ReleaseEvent/
PRE type=WatchEvent/
As expected, there is a partition for each distinct event type. In this example, we partitioned by a single value, but this is by no means required. For example, if you want to preserve the original partitioning by year, month, and day, you could simply set the partitionKeys option to be Seq("year", "month", "day").
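The effect of partition keys on the output layout can be sketched with a plain object: each key's value moves out of the record and into a key=value directory segment. This only illustrates the resulting layout, not the AWS Glue writer itself. The partitionRecord helper and the sample record are hypothetical.

```javascript
// Hypothetical helper: splits a record into a partition prefix and the remaining fields.
function partitionRecord(record, partitionKeys) {
    const remaining = Object.assign({}, record); // copy so the input is not modified
    const segments = partitionKeys.map((key) => {
        const value = remaining[key];
        delete remaining[key];
        return `${key}=${value}`;
    });
    return { prefix: segments.join('/') + '/', record: remaining };
}

const single = partitionRecord({ id: 1, type: 'PushEvent', actor: 'octocat' }, ['type']);
console.log(single.prefix); // type=PushEvent/
console.log(single.record); // { id: 1, actor: 'octocat' }

const nested = partitionRecord({ id: 2, year: 2017, month: 1, day: 7 }, ['year', 'month', 'day']);
console.log(nested.prefix); // year=2017/month=1/day=7/
```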
Conclusion
In this post, we showed you how to work with partitioned data in AWS Glue. Partitioning is a crucial technique for getting the most out of your large datasets. Many tools in the AWS big data ecosystem, including Amazon Athena and Amazon Redshift Spectrum, take advantage of partitions to accelerate query processing. AWS Glue provides mechanisms to crawl, filter, and write partitioned data so that you can structure your data in Amazon S3 however you want, to get the best performance out of your big data applications.
Ben Sowell is a senior software development engineer at AWS Glue. He has worked for more than 5 years on ETL systems to help users unlock the potential of their data. In his free time, he enjoys reading and exploring the Bay Area.
Mohit Saxena is a senior software development engineer at AWS Glue. His passion is building scalable distributed systems for efficiently managing data on cloud. He also enjoys watching movies and reading about the latest technology.
If you build (or want to build) data-driven web and mobile apps and need real-time updates and the ability to work offline, you should take a look at AWS AppSync. Announced in preview form at AWS re:Invent 2017 and described in depth here, AWS AppSync is designed for use in iOS, Android, JavaScript, and React Native apps. AWS AppSync is built around GraphQL, an open, standardized query language that makes it easy for your applications to request the precise data that they need from the cloud.
I’m happy to announce that the preview period is over and that AWS AppSync is now generally available and production-ready, with six new features that will simplify and streamline your application development process:
Console Log Access – You can now see the CloudWatch Logs entries that are created when you test your GraphQL queries, mutations, and subscriptions from within the AWS AppSync Console.
Console Testing with Mock Data – You can now create and use mock context objects in the console for testing purposes.
Subscription Resolvers – You can now create resolvers for AWS AppSync subscription requests, just as you can already do for query and mutate requests.
Batch GraphQL Operations for DynamoDB – You can now make use of DynamoDB’s batch operations (BatchGetItem and BatchWriteItem) across one or more tables in your resolver functions.
CloudWatch Support – You can now use Amazon CloudWatch Metrics and CloudWatch Logs to monitor calls to the AWS AppSync APIs.
CloudFormation Support – You can now define your schemas, data sources, and resolvers using AWS CloudFormation templates.
A Brief AppSync Review
Before diving in to the new features, let’s review the process of creating an AWS AppSync API, starting from the console. I click Create API to begin:
I enter a name for my API and (for demo purposes) choose to use the Sample schema:
The schema defines a collection of GraphQL object types. Each object type has a set of fields, with optional arguments:
If I were creating an API of my own, I would enter my schema at this point. Since I am using the sample, I don’t need to do this. Either way, I click Create to proceed:
The GraphQL schema type defines the entry points for the operations on the data. All of the data stored on behalf of a particular schema must be accessible using a path that begins at one of these entry points. The console provides me with an endpoint and key for my API:
It also provides me with guidance and a set of fully functional sample apps that I can clone:
When I clicked Create, AWS AppSync created a pair of Amazon DynamoDB tables for me. I can click Data Sources to see them:
I can also see and modify my schema, issue queries, and modify an assortment of settings for my API.
Let’s take a quick look at each new feature…
Console Log Access
The AWS AppSync Console already allows me to issue queries and to see the results, and now provides access to relevant log entries. In order to see the entries, I must enable logs (as detailed below), open up the LOGS, and check the checkbox. Here’s a simple mutation that adds a new event. I enter the query and click the arrow to test it:
I can click VIEW IN CLOUDWATCH for a more detailed view:
Console Testing with Mock Data
You can now create a context object in the console where it will be passed to one of your resolvers for testing purposes. I’ll add a testResolver item to my schema:
Then I locate it on the right-hand side of the Schema page and click Attach:
I choose a data source (this is for testing and the actual source will not be accessed), and use the Put item mapping template:
Then I click Select test context, choose Create New Context, assign a name to my test context, and click Save (as you can see, the test context contains the arguments from the query along with values to be returned for each field of the result):
After I save the new Resolver, I click Test to see the request and the response:
Subscription Resolvers
Your AWS AppSync application can monitor changes to any data source using the @aws_subscribe GraphQL schema directive and defining a Subscription type. The AWS AppSync client SDK connects to AWS AppSync using MQTT over WebSockets and the application is notified after each mutation. You can now attach resolvers (which convert GraphQL payloads into the protocol needed by the underlying storage system) to your subscription fields and perform authorization checks when clients attempt to connect. This allows you to perform the same fine-grained authorization routines across queries, mutations, and subscriptions.
Batch GraphQL Operations
Your resolvers can now make use of DynamoDB batch operations that span one or more tables in a region. This allows you to use a list of keys in a single query, read records from multiple tables, write records in bulk to multiple tables, and conditionally write or delete related records across multiple tables.
In order to use this feature, the IAM role that you use to access your tables must grant access to DynamoDB’s BatchGetItem and BatchWriteItem functions.
CloudWatch Logs Support
You can now tell AWS AppSync to log API requests to CloudWatch Logs. Click on Settings and Enable logs, then choose the IAM role and the log level:
CloudFormation Support
You can use the following CloudFormation resource types in your templates to define AWS AppSync resources:
AWS::AppSync::GraphQLApi – Defines an AppSync API in terms of a data source (an Amazon Elasticsearch Service domain or a DynamoDB table).
AWS::AppSync::ApiKey – Defines the access key needed to access the data source.
AWS::AppSync::GraphQLSchema – Defines a GraphQL schema.
AWS::AppSync::DataSource – Defines a data source.
AWS::AppSync::Resolver – Defines a resolver by referencing a schema and a data source, and includes a mapping template for requests.
Here’s a simple schema definition in YAML form:
AppSyncSchema:
  Type: "AWS::AppSync::GraphQLSchema"
  DependsOn:
    - AppSyncGraphQLApi
  Properties:
    ApiId: !GetAtt AppSyncGraphQLApi.ApiId
    Definition: |
      schema {
        query: Query
        mutation: Mutation
      }
      type Query {
        singlePost(id: ID!): Post
        allPosts: [Post]
      }
      type Mutation {
        putPost(id: ID!, title: String!): Post
      }
      type Post {
        id: ID!
        title: String!
      }
Available Now
These new features are available now and you can start using them today! Here are a couple of blog posts and other resources that you might find to be of interest:
As the 4.17 merge window opened, it seemed possible that the kernel lockdown patch set could be merged at last. That was before the linux-kernel mailing list got its hands on the issue. What resulted was not one of the kernel community’s finest moments. But it did result in a couple of evident conclusions: kernel lockdown will almost certainly not be merged for 4.17, but something that looks very much like it is highly likely to be accepted in a subsequent merge window.
You can always view and manage your Amazon GuardDuty findings on the Findings page in the GuardDuty console or by using GuardDuty APIs with the AWS CLI or SDK. But there’s a quicker and easier way: you can use Amazon Alexa as a conversational interface to review your GuardDuty findings. With Alexa, you can build natural voice experiences and create a more intuitive way of interacting with GuardDuty.
In this post, I show you how to deploy a sample custom Alexa skill and use an Alexa-enabled device, such as Amazon Echo, to get information about GuardDuty findings across your AWS accounts and regions. The information provided by this sample skill gives you a broad overview of GuardDuty finding statistics, severities, and descriptions. When you hear something interesting, you can log in to the GuardDuty console or another analysis tool to investigate the findings data.
Note: Although not covered here, you can also deploy this sample skill using Alexa for Business, which you can use to make skills available to your shared devices and enrolled users without having to publish them to the Alexa skills store.
Prerequisites
To complete the steps in this post, make sure you have:
A basic understanding of Alexa Custom Skills, which is helpful for deploying the sample skill described here. If you’re not already familiar with Alexa custom skill concepts and terminology, you might want to review the following documentation resources.
An AWS account with GuardDuty enabled in one or more AWS regions.
Deploy the Lambda function by using the CloudFormation Template.
Create the custom skill in the Alexa developer console.
Test the skill using an Alexa-enabled device.
Deploy the Lambda function with the CloudFormation Template
For this next step, make sure you deploy the template within the AWS account you want to monitor.
To deploy the Lambda function in the N. Virginia region (see the note below), you can use the CloudFormation template provided by clicking the following link: load the supplied template. In the CloudFormation console, on the Select Template page, select Next.
Note: The following AWS regions support hosting custom Alexa skills: US East (N. Virginia), Asia Pacific (Tokyo), EU (Ireland), and US West (Oregon). If you want to deploy in a region other than N. Virginia, you will first need to upload the custom skill’s Lambda deployment package (zip file with code) to an S3 bucket in the selected region.
After you load the template, provide the following input parameters:
Input parameter – Input parameter description
FLASHREGIONS – Comma separated list of region Ids with NO spaces to include in flash briefing stats. At least one region is required. Make sure GuardDuty is enabled in regions declared.
MAXRESP – Max number of findings to return in a response.
ArtifactsBucket – S3 Bucket where Lambda deployment package resides. Leave the default for N. Virginia.
ArtifactsPrefix – Path in S3 bucket where Lambda deployment package resides. Leave the default for N. Virginia.
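The FLASHREGIONS format (comma separated, no spaces, at least one region) can be checked before the stack is created. The following is a minimal sketch of such a check. The helper name parse_flash_regions and the region ID pattern are assumptions made for illustration and are not part of the sample skill's code.

```python
import re

# Assumed shape of a region ID such as "us-east-1" or "ap-northeast-1".
REGION_ID = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def parse_flash_regions(value):
    """Split a FLASHREGIONS-style string into a list of region IDs.

    Hypothetical helper: rejects empty input, whitespace, and entries
    that do not look like region IDs.
    """
    if not value:
        raise ValueError("at least one region is required")
    if any(ch.isspace() for ch in value):
        raise ValueError("the region list must not contain spaces")
    regions = value.split(",")
    for region in regions:
        if not REGION_ID.match(region):
            raise ValueError(f"not a region ID: {region!r}")
    return regions


print(parse_flash_regions("us-east-1,us-west-2,eu-west-1"))
```

A value such as "us-east-1, us-west-2" (with a space after the comma) raises ValueError in this sketch, which mirrors the "NO spaces" requirement above.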
On the Specify Details page, enter the input parameters (see above), and then select Next.
On the Options page, accept the default values, and then select Next.
On the Review page, confirm the details, and then select Create. The stack will be created in approximately 2 minutes.
Create the custom skill in the Alexa developer console
In the second part of this solution implementation, you will create the skill in the Amazon Developer Console.
Sign in to the Alexa area of the Amazon Developer Console, select Your Alexa Consoles in the top right, and then select Skills.
Select Create Skill.
For the name, enter Ask Amazon GuardDuty, and then select Next.
In the Choose a model to add to your skill page, select Custom, and then select Create skill.
Select the JSON Editor and paste the contents of the alexa_ask_guardduty_skill.json file into the code editor, and overwrite the existing content. This file contains the intent schema which defines the set of intents the service can accept and process.
Select Save Model, select Build Model, and then wait for the build to complete.
When the model build is complete, on the left side, select Endpoint.
In the Endpoint page, in the Service Endpoint Type section, select AWS Lambda ARN (Amazon Resource Name).
In the Default Region field, copy and paste the value from the CloudFormation Stack Outputs key named AlexaAskGDSkillArn. Leave the default values for other options, and then select Save Endpoints.
Because you’re not publishing this skill, you don’t need to complete the Launch section of the configuration. The skill will remain in the “Development” status and will only be available for Alexa devices linked to the Amazon developer account used to create the skill. Anyone with physical access to the linked Alexa-enabled device can use the custom skill. As a best practice, I recommend that you delete the Lambda trigger created by the CloudFormation template and add a new one with Skill ID verification enabled.
Test the skill using an Alexa-enabled device
Now that you’ve deployed the sample solution, the next step is to test the skill. Make sure you’re using an Alexa-enabled device linked to the Amazon developer account used to create the skill. Before testing, if there are no current GuardDuty findings available, you can generate sample findings in the console. When you generate sample findings, GuardDuty populates your current findings list with one sample finding for each supported finding type.
You can test using the following voice commands:
“Alexa, Open GuardDuty” — Opens the skill and provides a welcome response. You can also use “Alexa, Ask GuardDuty”.
“Get flash briefing” — Provides global and regional counts for low, medium, and high severity findings. The regions declared in the FLASHREGIONS parameter are included. You can also use “Ask GuardDuty to get flash briefing” to bypass the welcome message. You can learn more about GuardDuty severity levels in the documentation.
For the next set of commands, you can specify the region by using region names such as <Virginia>, <Oregon>, <Ireland>, and so on:
“Get statistics for region” — Provides regional counts for low, medium, and high severity findings.
“Get findings for region” — Returns finding information for the requested region. The number of findings returned is configured in the MAXRESP parameter.
“Get <high/medium/low> severity findings for region” – Returns finding information with the minimum severity requested as high, medium, or low. The number of findings returned is configured in the MAXRESP parameter.
“Help” — Provides information about the skill and supported utterances. Also provides current configuration for FLASHREGIONS and MAXRESP.
You can use this sample solution to get GuardDuty statistics and findings through the Alexa conversational interface. You’ll be able to identify findings that require further investigation quickly. This solution’s code is available on GitHub.
Now, your applications and federated users can complete longer running workloads in a single session by increasing the maximum session duration up to 12 hours for an IAM role. Users and applications still retrieve temporary credentials by assuming roles using AWS Security Token Service (AWS STS), but these credentials can now be valid for up to 12 hours when using the AWS SDK or CLI. This change allows your users and applications to perform longer running workloads, such as a batch upload to S3 or a CloudFormation template, using a single session. You can extend the maximum session duration using the IAM console or CLI. Once you increase the maximum session duration, users and applications assuming the IAM role can request temporary credentials that expire when the IAM role session expires.
In this post, I show you how to configure the maximum session duration for an existing IAM role to 4 hours (maximum allowed duration is 12 hours) using the IAM console. I’ll use 4 hours because AWS recommends configuring the session duration for a role to the shortest duration that your federated users would require to access your AWS resources. I’ll then show how existing federated users can use the AWS SDK or CLI to request temporary security credentials that are valid until the role session expires.
Configure the maximum session duration for an existing IAM role to 4 hours
Let’s assume you have an existing IAM role called ADFS-Production that allows your federated users to upload objects to an S3 bucket in your AWS account. You want to extend the maximum session duration for this role to 4 hours. By default, IAM roles in your AWS accounts have a maximum session duration of one hour. To extend a role’s maximum session duration to 4 hours, follow the steps below:
In the left navigation pane, select Roles and then select the role for which you want to increase the maximum session duration. For this example, I select ADFS-Production and verify the maximum session duration for this role. This value is set to 1 hour (3,600 seconds) by default.
Select Edit, and then define the maximum session duration.
Select one of the predefined durations or provide a custom duration. For this example, I set the maximum session duration to be 4 hours.
Select Save changes.
Alternatively, you can use the latest AWS CLI and call update-role to set the maximum session duration for the role ADFS-Production. Here’s an example that sets the maximum session duration to 14,400 seconds (4 hours).
$ aws iam update-role --role-name ADFS-Production --max-session-duration 14400
Now that you’ve successfully extended the maximum session for your IAM role, ADFS-Production, your federated users can use AWS STS to retrieve temporary credentials that are valid for 4 hours to access your S3 buckets.
Access AWS resources with temporary security credentials using AWS CLI/SDK
To enable federated SDK and CLI access for your users who use temporary security credentials, you might have implemented the solution described in the blog post on How to Implement Federated API and CLI Access Using SAML 2.0 and AD FS. That blog post demonstrates how to use the AWS Python SDK and some additional client-side integration code provided in the post to implement federated SDK and CLI access for your users. To enable your users to request longer temporary security credentials, you can make the following changes suggested in this blog to the solution provided in that post.
When calling AssumeRoleWithSAML API to request AWS temporary security credentials, you need to include the DurationSeconds parameter. The value of this parameter is the duration the user requests and, therefore, the duration their temporary security credentials are valid. In this example, I am using boto to request the maximum length of 14,400 seconds (4 hours) using code from the How to Implement Federated API and CLI Access Using SAML 2.0 and AD FS post that I have updated:
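# Use the assertion to get an AWS STS token using Assume Role with SAML
import boto.sts
conn = boto.sts.connect_to_region(region)
# Pass the duration by keyword so that it is not mistaken for the optional policy argument
token = conn.assume_role_with_saml(role_arn, principal_arn, assertion, duration_seconds=14400)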
By adding a value for the DurationSeconds parameter in the AssumeRoleWithSAML call, your federated user can retrieve temporary security credentials that are valid for up to 14,400 seconds (4 hours). If you don’t provide this value, the default session duration is 1 hour. If you provide a value of 5 hours for your temporary security credentials, AWS STS will throw an error since this is longer than the role session duration of 4 hours.
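The rules in the previous paragraph can be expressed as a small client-side check that runs before the AssumeRoleWithSAML call. This is only a sketch: the function name resolve_duration is hypothetical, and the check reproduces the behavior described above (a 1-hour default, and an error when the request exceeds the role's maximum) rather than calling AWS STS. The 900-second lower bound is the minimum that AWS STS accepts for DurationSeconds.

```python
MIN_SESSION_SECONDS = 900       # shortest DurationSeconds that AWS STS accepts
DEFAULT_SESSION_SECONDS = 3600  # used when no DurationSeconds is provided


def resolve_duration(requested_seconds, role_max_seconds):
    """Return the session duration a request would get, or raise ValueError.

    Hypothetical helper that mirrors the rules described in the text.
    """
    if requested_seconds is None:
        return DEFAULT_SESSION_SECONDS
    if requested_seconds < MIN_SESSION_SECONDS:
        raise ValueError("requested duration is shorter than 900 seconds")
    if requested_seconds > role_max_seconds:
        raise ValueError("requested duration exceeds the role's maximum session duration")
    return requested_seconds


role_max = 4 * 3600  # the 4-hour maximum configured for ADFS-Production
print(resolve_duration(14400, role_max))
print(resolve_duration(None, role_max))
```

Requesting 5 hours (18,000 seconds) against this 4-hour role raises ValueError in the sketch, corresponding to the error that AWS STS returns.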
Conclusion
I demonstrated how you can configure the maximum session duration for a role from 1 hour (default) up to 12 hours. Then, I showed you how your federated users can retrieve temporary security credentials that are valid for longer durations to access AWS resources using AWS CLI/SDK for up to 12 hours.
Similarly, you can also increase the maximum role session duration for your applications and users who use Web Identity or OpenID Connect Federation or Cross-Account Access with Assume Role. If you have comments about this blog, submit them in the Comments section below. If you have questions or suggestions, please start a new thread on the IAM forum.
Product security is an interesting animal: it is a uniquely cross-disciplinary endeavor that spans policy, consulting, process automation, in-depth software engineering, and cutting-edge vulnerability research. And in contrast to many other specializations in our field of expertise – say, incident response or network security – we have virtually no time-tested and coherent frameworks for setting it up within a company of any size.
In my previous post, I shared some thoughts on nurturing technical organizations and cultivating the right kind of leadership within. Today, I figured it would be fitting to follow up with several notes on what I learned about structuring product security work – and about actually making the effort count.
The “comfort zone” trap
For security engineers, knowing your limits is a sought-after quality: there is nothing more dangerous than a security expert who goes off script and starts dispensing authoritative-sounding but bogus advice on a topic they know very little about. But that same quality can be destructive when it prevents us from growing beyond our most familiar role: that of a critic who pokes holes in other people’s designs.
The role of a resident security critic lends itself all too easily to a sense of supremacy: the mistaken belief that our cognitive skills exceed the capabilities of the engineers and product managers who come to us for help – and that the cool bugs we file are the ultimate proof of our special gift. We start taking pride in the mere act of breaking somebody else’s software – and then write scathing but ineffectual critiques addressed to executives, demanding that they either put a stop to a project or sign off on a risk. And hey, in the latter case, they better brace for our triumphant “I told you so” at some later date.
Of course, escalations of this type have their place, but they need to be a very rare sight; when practiced routinely, they are a telltale sign of a dysfunctional team. We might be failing to think up viable alternatives that are in tune with business or engineering needs; we might be very unpersuasive, failing to communicate with other rational people in a language they understand; or it might be that our tolerance for risk is badly out of whack with the rest of the company. Whatever the cause, I’ve seen high-level escalations where the security team spoke of valiant efforts to resist inexplicably awful design decisions or data sharing setups; and where product leads in turn talked about pressing business needs randomly blocked by obstinate security folks. Sometimes, simply having them compare their notes would be enough to arrive at a technical solution – such as sharing a less sensitive subset of the data at hand.
To be effective, any product security program must be rooted in a partnership with the rest of the company, focused on helping them get stuff done while eliminating or reducing security risks. To combat the toxic us-versus-them mentality, I found it helpful to have some team members with software engineering backgrounds, even if it’s the ownership of a small open-source project or so. This can broaden our horizons, helping us see that we all make the same mistakes – and that not every solution that sounds good on paper is usable once we code it up.
Getting off the treadmill
All security programs involve a good chunk of operational work. For product security, this can be a combination of product launch reviews, design consulting requests, incoming bug reports, or compliance-driven assessments of some sort. And curiously, such reactive work also has the property of gradually expanding to consume all the available resources on a team: next year is bound to bring even more review requests, even more regulatory hurdles, and even more incoming bugs to triage and fix.
Being more tractable, such routine tasks are also more readily enshrined in SDLs, SLAs, and all kinds of other official documents that are often mistaken for a mission statement that justifies the existence of our teams. Soon, instead of explaining to a developer why they should fix a particular problem right away, we end up pointing them to page 17 in our severity classification guideline, which defines that “severity 2” vulnerabilities need to be resolved within a month. Meanwhile, another policy may be telling them that they need to run a fuzzer or a web application scanner for a particular number of CPU-hours – no matter whether it makes sense or whether the job is set up right.
To run a product security program that scales sublinearly, stays abreast of future threats, and doesn’t erect bureaucratic speed bumps just for the sake of it, we need to recognize this inherent tendency for operational work to take over – and we need to rein it in. No matter what last year’s policy says, we usually don’t need to be doing security reviews with a particular cadence or to a particular depth; if we need to scale them back 10% to staff a two-quarter project that fixes an important API and squashes an entire class of bugs, it’s a short-term risk we should feel empowered to take.
As noted in my earlier post, I find contingency planning to be a valuable tool in this regard: why not ask ourselves how the team would cope if the workload went up another 30%, but bad financial results precluded any team growth? It’s actually fun to think about such hypotheticals ahead of the time – and hey, if the ideas sound good, why not try them out today?
Living for a cause
It can be difficult to understand if our security efforts are structured and prioritized right; when faced with such uncertainty, it is natural to stick to the safe fundamentals – investing most of our resources into the very same things that everybody else in our industry appears to be focusing on today.
I think it’s important to combat this mindset – and if so, we might as well tackle it head on. Rather than focusing on tactical objectives and policy documents, try to write down a concise mission statement explaining why you are a team in the first place, what specific business outcomes you are aiming for, how you prioritize them, and how you want it all to change in a year or two. It should be a fluid narrative that reads right and that everybody on your team can take pride in; my favorite way of starting the conversation is telling folks that we could always have a new VP tomorrow – and that the VP’s first order of business could be asking, “why do you have so many people here and how do I know they are doing the right thing?”. It’s a playful but realistic framing device that motivates people to get it done.
In general, a comprehensive product security program should probably start with the assumption that no matter how many resources we have at our disposal, we will never be able to stay in the loop on everything that’s happening across the company – and even if we did, we’re not going to be able to catch every single bug. It follows that one of our top priorities for the team should be making sure that bugs don’t happen very often; a scalable way of getting there is equipping engineers with intuitive and usable tools that make it easy to perform common tasks without having to worry about security at all. Examples include standardized, managed containers for production jobs; safe-by-default APIs, such as strict contextual autoescaping for XSS or type safety for SQL; security-conscious style guidelines; or plug-and-play libraries that take care of common crypto or ACL enforcement tasks.
Of course, not all problems can be addressed on framework level, and not every engineer will always reach for the right tools. Because of this, the next principle that I found to be worth focusing on is containment and mitigation: making sure that bugs are difficult to exploit when they happen, or that the damage is kept in check. The solutions in this space can range from low-level enhancements (say, hardened allocators or seccomp-bpf sandboxes) to client-facing features such as browser origin isolation or Content Security Policy.
The usual consulting, review, and outreach tasks are an important facet of a product security program, but probably shouldn’t be the sole focus of your team. It’s also best to avoid undue emphasis on vulnerability showmanship: while valuable in some contexts, it creates a hypercompetitive environment that may be hostile to less experienced team members – not to mention, squashing individual bugs offers very limited value if the same issue is likely to be reintroduced into the codebase the next day. I like to think of security reviews as a teaching opportunity instead: it’s a way to raise awareness, form partnerships with engineers, and help them develop lasting habits that reduce the incidence of bugs. Metrics to understand the impact of your work are important, too; if your engagements are seen mostly as yet another layer of red tape, product teams will stop reaching out to you for advice.
The other tenet of a healthy product security effort requires us to recognize that at scale and given enough time, every defense mechanism is bound to fail – and so, we need ways to prevent bugs from turning into incidents. The efforts in this space may range from developing product-specific signals for the incident response and monitoring teams; to offering meaningful vulnerability reward programs and nourishing a healthy and respectful relationship with the research community; to organizing regular offensive exercises in hopes of spotting bugs before anybody else does.
Oh, one final note: an important feature of a healthy security program is the existence of multiple feedback loops that help you spot problems without the need to micromanage the organization and without being deathly afraid of taking chances. For example, the data coming from bug bounty programs, if analyzed correctly, offers a wonderful way to alert you to systemic problems in your codebase – and later on, to measure the impact of any remediation and hardening work.
Most malware tries to compromise your systems by using a known vulnerability that the operating system maker has already patched. As a best practice to help prevent malware from affecting your systems, you should apply all operating system patches and actively monitor your systems for missing patches.
Launch an Amazon EC2 instance for use with Systems Manager.
Configure Systems Manager to patch your Amazon EC2 Linux instances.
In two previous blog posts (Part 1 and Part 2), I showed how to use the AWS Management Console to perform the necessary steps to patch, inspect, and protect Microsoft Windows workloads. You can implement those same processes for your Linux instances running in AWS by changing the instance tags and types shown in the previous blog posts.
Because most Linux system administrators are more familiar with using a command line, I show how to patch Linux workloads by using the AWS CLI in this blog post. The steps to use the Amazon EBS Snapshot Scheduler and Amazon Inspector are identical for both Microsoft Windows and Linux.
What you should know first
To follow along with the solution in this post, you need one or more Amazon EC2 instances. You may use existing instances or create new instances. For this post, I assume this is an Amazon EC2 instance running Amazon Linux, launched from an Amazon Machine Image (AMI).
Systems Manager is a collection of capabilities that helps you automate management tasks for AWS-hosted instances on Amazon EC2 and your on-premises servers. In this post, I use Systems Manager for two purposes: to run remote commands and apply operating system patches. To learn about the full capabilities of Systems Manager, see What Is AWS Systems Manager?
If you are not familiar with how to launch an Amazon EC2 instance, see Launching an Instance. I also assume you launched or will launch your instance in a private subnet. You must make sure that the Amazon EC2 instance can connect to the internet using a network address translation (NAT) instance or NAT gateway to communicate with Systems Manager. The following diagram shows how you should structure your VPC.
Later in this post, you will assign tasks to a maintenance window to patch your instances with Systems Manager. To do this, the IAM user you are using for this post must have the iam:PassRole permission. This permission allows the IAM user assigning tasks to pass their own IAM permissions to the AWS service. In this example, when you assign a task to a maintenance window, IAM passes your credentials to Systems Manager. You also should authorize your IAM user to use Amazon EC2 and Systems Manager. As mentioned before, you will be using the AWS CLI for most of the steps in this blog post. Our documentation shows you how to get started with the AWS CLI. Make sure you have the AWS CLI installed and configured with an AWS access key and secret access key that belong to an IAM user with the following AWS managed policies attached: AmazonEC2FullAccess and AmazonSSMFullAccess.
Step 1: Launch an Amazon EC2 Linux instance
In this section, I show you how to launch an Amazon EC2 instance so that you can use Systems Manager with the instance. This step requires you to do three things:
Create an IAM role for Systems Manager before launching your Amazon EC2 instance.
Launch your Amazon EC2 instance with Amazon EBS and the IAM role for Systems Manager.
Add tags to the instances so that you can add your instances to a Systems Manager maintenance window based on tags.
A. Create an IAM role for Systems Manager
Before launching an Amazon EC2 instance, I recommend that you first create an IAM role for Systems Manager, which you will use to update the Amazon EC2 instance. AWS already provides a preconfigured policy that you can use for the new role and it is called AmazonEC2RoleforSSM.
Create a JSON file named trustpolicy-ec2ssm.json that contains the following trust policy. This policy describes which principal (an entity that can take action on an AWS resource) is allowed to assume the role we are going to create. In this example, the principal is the Amazon EC2 service.
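A trust policy of this kind names the Amazon EC2 service principal and allows it to call sts:AssumeRole. As a minimal sketch, assuming that this standard form is what the role needs, the following Python snippet builds the document and prints it so that the output can be saved as trustpolicy-ec2ssm.json. The variable name trust_policy is an illustration only.

```python
import json

# Trust policy that lets the Amazon EC2 service assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": {
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    },
}

# Print the document; save the output as trustpolicy-ec2ssm.json.
print(json.dumps(trust_policy, indent=2))
```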
Use the following command to create a role named EC2SSM that uses this trust policy. You attach the AWS managed policy AmazonEC2RoleforSSM to the role in the next step. If the command is successful, it generates JSON-based output that describes the role and its parameters.
$ aws iam create-role --role-name EC2SSM --assume-role-policy-document file://trustpolicy-ec2ssm.json
Use the following command to attach the AWS managed IAM policy (AmazonEC2RoleforSSM) to your newly created role.
$ aws iam attach-role-policy --role-name EC2SSM --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM
Use the following commands to create the IAM instance profile and add the role to the instance profile. The instance profile is needed to attach the role we created earlier to your Amazon EC2 instance.
$ aws iam create-instance-profile --instance-profile-name EC2SSM-IP
$ aws iam add-role-to-instance-profile --instance-profile-name EC2SSM-IP --role-name EC2SSM
B. Launch your Amazon EC2 instance
To follow along, you need an Amazon EC2 instance that is running Amazon Linux. You can use any existing instance you may have or create a new instance.
When launching a new Amazon EC2 instance, be sure that:
Use the following command to launch a new Amazon EC2 instance using an Amazon Linux AMI available in the US East (N. Virginia) Region (also known as us-east-1). Replace YourKeyPair and YourSubnetId with your information. For more information about creating a key pair, see the create-key-pair documentation. Write down the InstanceId that is in the output because you will need it later in this post.
If you are using an existing Amazon EC2 instance, you can use the following command to attach the instance profile you created earlier to your instance.
The final step of configuring your Amazon EC2 instances is to add tags. You will use these tags to configure Systems Manager in Step 2 of this post. For this example, I add a tag named Patch Group and set the value to Linux Servers. I could have other groups of Amazon EC2 instances that I treat differently by having the same tag name but a different tag value. For example, I might have a collection of other servers with the tag name Patch Group with a value of Web Servers.
Use the following command to add the Patch Group tag to your Amazon EC2 instance.
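As a minimal sketch of such a tagging call, the following Python snippet assembles an aws ec2 create-tags command line for the Patch Group tag and prints it in a shell-safe form. The instance ID is a placeholder, and the helper name tag_command is hypothetical; replace the ID with the InstanceId you wrote down earlier.

```python
import shlex


def tag_command(instance_id, key, value):
    """Build an `aws ec2 create-tags` argument list for one tag (hypothetical helper)."""
    return [
        "aws", "ec2", "create-tags",
        "--resources", instance_id,
        "--tags", f"Key={key},Value={value}",
    ]


# Placeholder instance ID; use your own InstanceId here.
command = tag_command("i-0123456789abcdef0", "Patch Group", "Linux Servers")

# shlex.join quotes the tag argument because it contains spaces.
print(shlex.join(command))
```

Building the argument list first keeps the tag name and value, both of which contain spaces, as a single argument when the command is run.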
Note: You must wait a few minutes until the Amazon EC2 instance is available before you can proceed to the next section. To make sure your Amazon EC2 instance is online and ready, you can use the following AWS CLI command:
At this point, you now have at least one Amazon EC2 instance you can use to configure Systems Manager.
Step 2: Configure Systems Manager
In this section, I show you how to configure and use Systems Manager to apply operating system patches to your Amazon EC2 instances, and how to manage patch compliance.
To start, I provide some background information about Systems Manager. Then, I cover how to:
Create the Systems Manager IAM role so that Systems Manager is able to perform patch operations.
Create a Systems Manager patch baseline and associate it with your instance to define which patches Systems Manager should apply.
Define a maintenance window to make sure Systems Manager patches your instance when you tell it to.
Monitor patch compliance to verify the patch state of your instances.
You must meet two prerequisites to use Systems Manager to apply operating system patches. First, you must attach the IAM role you created in the previous section, EC2SSM, to your Amazon EC2 instance. Second, you must install the Systems Manager agent on your Amazon EC2 instance. If you have used a recent Amazon Linux AMI, Amazon has already installed the Systems Manager agent on your Amazon EC2 instance. You can confirm this by logging in to an Amazon EC2 instance and checking the Systems Manager agent log files that are located at /var/log/amazon/ssm/.
For a maintenance window to be able to run any tasks, you must create a new role for Systems Manager. This role is a different kind of role than the one you created earlier: this role will be used by Systems Manager instead of Amazon EC2. Earlier, you created the role, EC2SSM, with the policy, AmazonEC2RoleforSSM, which allowed the Systems Manager agent on your instance to communicate with Systems Manager. In this section, you need a new role with the policy, AmazonSSMMaintenanceWindowRole, so that the Systems Manager service can execute commands on your instance.
To create the new IAM role for Systems Manager:
Create a JSON file named trustpolicy-maintenancewindowrole.json that contains the following trust policy. This policy describes which principal is allowed to assume the role you are going to create. This trust policy allows not only Amazon EC2 to assume this role, but also Systems Manager.
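A trust policy that admits both services lists two service principals instead of one. The following is a minimal sketch, assuming that the two principals are ec2.amazonaws.com and ssm.amazonaws.com as the description suggests. The helper name build_trust_policy is hypothetical, and the printed output can be saved as trustpolicy-maintenancewindowrole.json.

```python
import json


def build_trust_policy(services):
    """Return a trust policy that lets the given service principals assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Principal": {"Service": list(services)},
            "Action": "sts:AssumeRole",
        },
    }


# Assumed principals: Amazon EC2 and Systems Manager.
policy = build_trust_policy(["ec2.amazonaws.com", "ssm.amazonaws.com"])
print(json.dumps(policy, indent=2))
```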
Use the following command to create a role named MaintenanceWindowRole that uses this trust policy. You attach the AWS managed policy, AmazonSSMMaintenanceWindowRole, to the role in the next step. If the command is successful, it generates JSON-based output that describes the role and its parameters.
$ aws iam create-role --role-name MaintenanceWindowRole --assume-role-policy-document file://trustpolicy-maintenancewindowrole.json
Use the following command to attach the AWS managed IAM policy (AmazonSSMMaintenanceWindowRole) to your newly created role.
$ aws iam attach-role-policy --role-name MaintenanceWindowRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonSSMMaintenanceWindowRole
B. Create a Systems Manager patch baseline and associate it with your instance
Next, you will create a Systems Manager patch baseline and associate it with your Amazon EC2 instance. A patch baseline defines which patches Systems Manager should apply to your instance. Before you can associate the patch baseline with your instance, though, you must determine if Systems Manager recognizes your Amazon EC2 instance. Use the following command to list all instances managed by Systems Manager. The --filters option ensures you look only for your newly created Amazon EC2 instance.
If your instance is missing from the list, verify that:
Your instance is running.
You attached the Systems Manager IAM role, EC2SSM.
You deployed a NAT gateway in your public subnet to ensure your VPC reflects the diagram shown earlier in this post so that the Systems Manager agent can connect to the Systems Manager internet endpoint.
Now that you have checked that Systems Manager can manage your Amazon EC2 instance, it is time to create a patch baseline. With a patch baseline, you define which patches are approved to be installed on all Amazon EC2 instances associated with the patch baseline. The Patch Group resource tag you defined earlier will determine to which patch group an instance belongs. If you do not specifically define a patch baseline, the default AWS-managed patch baseline is used.
To create a patch baseline:
Use the following command to create a patch baseline named AmazonLinuxServers. With approval rules, you can determine the approved patches that will be included in your patch baseline. In this example, you add all Critical severity patches to the patch baseline as soon as they are released, by setting the Auto approval delay to 0 days. By setting the Auto approval delay to 2 days, you add to this patch baseline the Important, Medium, and Low severity patches two days after they are released.
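As a rough Python counterpart to the approval rules described above, the following sketch builds the rule structure that the boto3 Systems Manager client accepts in create_patch_baseline. It assumes the rules filter only on the SEVERITY key; the helper names build_approval_rules and rule are hypothetical, and the boto3 call itself is left in a comment because it needs AWS credentials.

```python
def build_approval_rules():
    """Return approval rules mirroring the two rules described in the text."""

    def rule(severities, approve_after_days):
        # One patch rule: match patches by severity, approve after N days.
        return {
            "PatchFilterGroup": {
                "PatchFilters": [
                    {"Key": "SEVERITY", "Values": severities},
                ]
            },
            "ApproveAfterDays": approve_after_days,
        }

    return {
        "PatchRules": [
            rule(["Critical"], 0),                    # approved on release
            rule(["Important", "Medium", "Low"], 2),  # approved after 2 days
        ]
    }


approval_rules = build_approval_rules()

# With credentials configured, the rules could be passed to boto3 like this:
# import boto3
# boto3.client("ssm").create_patch_baseline(
#     Name="AmazonLinuxServers",
#     OperatingSystem="AMAZON_LINUX",
#     ApprovalRules=approval_rules,
# )
```

Keeping the rules in a plain dictionary makes it easy to review the auto-approval delays before anything is sent to Systems Manager.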
Use the following command to register the patch baseline you created with your instance. To do so, you use the Patch Group tag that you added to your Amazon EC2 instance.
Now that you have successfully set up a role, created a patch baseline, and registered your Amazon EC2 instance with your patch baseline, you will define a maintenance window so that you can control when your Amazon EC2 instances will receive patches. By creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.
To define a maintenance window:
Use the following command to define a maintenance window. In this example command, the maintenance window will start every Saturday at 10:00 P.M. UTC. It will have a duration of 4 hours and will not start any new tasks 1 hour before the end of the maintenance window.
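If you prefer Python, the schedule described above could be expressed as the keyword arguments for the boto3 create_maintenance_window call, roughly as follows. This is a sketch under assumptions: the window name SaturdayNight and the helper name are hypothetical, and the cron expression assumes the Systems Manager format with a leading seconds field.

```python
def maintenance_window_request(name):
    """Build keyword arguments for ssm.create_maintenance_window."""
    return {
        "Name": name,
        # Assumed cron layout: seconds minutes hours day-of-month month day-of-week year
        "Schedule": "cron(0 0 22 ? * SAT *)",  # every Saturday at 22:00 UTC
        "Duration": 4,  # window length in hours
        "Cutoff": 1,    # stop starting new tasks this many hours before the end
        "AllowUnassociatedTargets": False,
    }


request = maintenance_window_request("SaturdayNight")  # hypothetical name

# With credentials configured:
# import boto3
# window_id = boto3.client("ssm").create_maintenance_window(**request)["WindowId"]
```

With a 4-hour duration and a 1-hour cutoff, new tasks can start only during the first 3 hours of the window.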
After defining the maintenance window, you must register the Amazon EC2 instance with the maintenance window so that Systems Manager knows which Amazon EC2 instance it should patch in this maintenance window. You can register the instance by using the same Patch Group tag you used to associate the Amazon EC2 instance with the AWS-provided patch baseline, as shown in the following command.
Assign a task to the maintenance window that will install the operating system patches on your Amazon EC2 instance. The following command includes the following options.
name is the name of your task and is optional. I named mine Patching.
task-arn is the name of the task document you want to run.
max-concurrency allows you to specify how many of your Amazon EC2 instances Systems Manager should patch at the same time.
max-errors determines when Systems Manager should abort the task. For patching, this number should not be too low, because you do not want your entire patch task to stop on all instances if one instance fails. You can set this, for example, to 20%.
service-role-arn is the Amazon Resource Name (ARN) of the MaintenanceWindowRole role you created earlier in this blog post.
task-invocation-parameters defines the parameters that are specific to the AWS-RunPatchBaseline task document and tells Systems Manager that you want to install patches with a timeout of 600 seconds (10 minutes).
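The options listed above map onto the boto3 register_task_with_maintenance_window call roughly as shown in this sketch. The window ID, window target ID, account number, and the MaxConcurrency value are placeholders I made up for illustration; only the task name, the 20% error threshold, the Install operation, and the 600-second timeout come from the text.

```python
def patch_task_request(window_id, window_target_id, service_role_arn):
    """Build keyword arguments for ssm.register_task_with_maintenance_window."""
    return {
        "WindowId": window_id,
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskArn": "AWS-RunPatchBaseline",  # the task document to run
        "TaskType": "RUN_COMMAND",
        "ServiceRoleArn": service_role_arn,
        "Name": "Patching",
        "MaxConcurrency": "50%",  # placeholder value, not from the post
        "MaxErrors": "20%",       # abort only if more than 20% of instances fail
        "TaskInvocationParameters": {
            "RunCommand": {
                "Parameters": {"Operation": ["Install"]},
                "TimeoutSeconds": 600,  # 10 minutes
            }
        },
    }


task_request = patch_task_request(
    "mw-EXAMPLE",      # placeholder window ID
    "target-EXAMPLE",  # placeholder window target ID
    "arn:aws:iam::123456789012:role/MaintenanceWindowRole",  # placeholder account
)

# With credentials configured:
# import boto3
# boto3.client("ssm").register_task_with_maintenance_window(**task_request)
```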
Now, you must wait for the maintenance window to run at least once according to the schedule you defined earlier. If your maintenance window has expired, you can check the status of any maintenance tasks Systems Manager has performed by using the following command.
You also can see the overall patch compliance of all Amazon EC2 instances using the following command in the AWS CLI.
$ aws ssm list-compliance-summaries
This command returns JSON output that shows, for each compliance category, the number of instances that are compliant and the number that are not.
You also can see overall patch compliance by choosing Compliance under Insights in the navigation pane of the Systems Manager console. You will see a visual representation of how many Amazon EC2 instances are up to date, how many Amazon EC2 instances are noncompliant, and how many Amazon EC2 instances are compliant in relation to the earlier defined patch baseline.
In this section, you have set everything up for patch management on your instance. Now you know how to patch your Amazon EC2 instance in a controlled manner and how to check if your Amazon EC2 instance is compliant with the patch baseline you have defined. Of course, I recommend that you apply these steps to all Amazon EC2 instances you manage.
Summary
In this blog post, I showed how to use Systems Manager to create a patch baseline and maintenance window to keep your Amazon EC2 Linux instances up to date with the latest security patches. Remember that by creating multiple maintenance windows and assigning them to different patch groups, you can make sure your Amazon EC2 instances do not all reboot at the same time.
If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing any part of this solution, start a new thread on the Amazon EC2 forum or contact AWS Support.
Over the next few months the noise over GDPR will finally reach a crescendo. For the uninitiated, “GDPR” stands for “General Data Protection Regulation” and it goes into effect on May 25th of this year. GDPR is designed to protect how personal information of EU (European Union) citizens is collected, stored, and shared. The regulation should also improve transparency as to how personal information is managed by a business or organization.
Backblaze fully expects to be GDPR compliant when May 25th rolls around and we thought we’d share our experience along the way. We’ll start with this post as an introduction to GDPR. In future posts, we’ll dive into some of the details of the process we went through in meeting the GDPR objectives.
GDPR: A Two Way Street
To ensure we are GDPR compliant, Backblaze has assembled a dedicated internal team, engaged outside counsel in the United Kingdom, and consulted with other tech companies on best practices. While it is a sizable effort on our part, we view this as a waypoint in our ongoing effort to secure and protect our customers’ data and to be transparent in how we work as a company.
In addition to the effort we are putting into complying with the regulation, we think it is important to underscore and promote the idea that data privacy and security is a two-way street. We can spend millions of dollars on protecting the security of our systems, but we can’t stop a bad actor from finding and using your account credentials left on a note stuck to your monitor. We can give our customers tools like two factor authentication and private encryption keys, but it is the partnership with our customers that is the most powerful protection. The same thing goes for your digital privacy — we’ll do our best to protect your information, but we will need your help to do so.
Why GDPR is Important
At the center of GDPR is the protection of Personally Identifiable Information or “PII.” PII is information that can be used on its own or in concert with other information to identify a specific person. This includes obvious data such as name, address, and phone number; less obvious data such as email address and IP address; and other data such as credit card numbers and unique identifiers that can be decoded back to the person.
How Will GDPR Affect You as an Individual
If you are a citizen in the EU, GDPR is designed to protect your private information from being used or shared without your permission. Technically, this only applies when your data is collected, processed, stored or shared outside of the EU, but it’s a good practice to hold all of your service providers to the same standard. For example, when you are deciding to sign up with a service, you should be able to quickly access and understand what personal information is being collected, why it is being collected, and what the business can do with that information. These terms are typically found in “Terms and Conditions” and “Privacy Policy” documents, or perhaps in a written contract you signed before starting to use a given service or product.
Even if you are not a citizen of the EU, GDPR will still affect you. Why? Because nearly every company you deal with, especially online, will have customers that live in the EU. It makes little sense for Backblaze, or any other service provider or vendor, to create a separate set of rules for just EU citizens. In practice, protection of private information should be more accountable and transparent with GDPR.
How Will GDPR Affect You as a Backblaze Customer
Over the coming months Backblaze customers will see changes to our current “Terms and Conditions,” “Privacy Policy,” and to our Backblaze services. While the changes to the Backblaze services are expected to be minimal, the “terms and privacy” documents will change significantly. The changes will include among other things the addition of a group of model clauses and related materials. These clauses will be generally consistent across all GDPR compliant vendors and are meant to be easily understood so that a customer can easily determine how their PII is being collected and used.
Common GDPR Questions:
Here are a few of the more common questions we have heard regarding GDPR.
GDPR will only affect citizens in the EU. Answer: The changes that are being made by companies such as Backblaze to comply with GDPR will almost certainly apply to customers from all countries. And that’s a good thing. The protections afforded to EU citizens by GDPR are something all users of our service should benefit from.
After May 25, 2018, a citizen of the EU will not be allowed to use any applications or services that store data outside of the EU. Answer: False, no one will stop you as an EU citizen from using the internet-based service you choose. But, you should make sure you know where your data is being collected, processed, and stored. If any of those activities occur outside the EU, make sure the company is following the GDPR guidelines.
My business only has a few EU citizens as customers, so I don’t need to care about GDPR? Answer: False, even if you have just one EU citizen as a customer, and you capture, process, or store their PII outside of the EU, you need to comply with GDPR.
Companies can be fined millions of dollars for not complying with GDPR. Answer: True, but: the regulation allows for companies to be fined up to €20 million or 4% of global annual revenue (whichever is greater) if they don’t comply with GDPR. In practice, the feeling is that such fines will be reserved (at least initially) for egregious violators that ignore or merely give “lip-service” to GDPR.
You’ll be able to tell a company is GDPR compliant because they have a “GDPR Certified” badge on their website. Answer: There is no official GDPR certification or an official GDPR certification program. Companies that comply with GDPR are expected to follow the articles in the regulation and it should be clear from the outside looking in that they have followed the regulations. For example, their “Terms and Conditions,” and “Privacy Policy” should clearly spell out how and why they collect, use, and share your information. At some point a real GDPR certification program may be adopted, but not yet.
For all the hoopla about GDPR, the regulation is reasonably well thought out and addresses a very important issue — people’s privacy online. Creating a best practices document, or in this case a regulation, that companies such as Backblaze can follow is a good idea. The document isn’t perfect, and over the coming years we expect there to be changes. One thing we hope for is that the countries within the EU continue to stand behind one regulation and not fragment the document into multiple versions, each applying to themselves. We believe that having multiple different GDPR versions for different EU countries would lead to less protection overall of EU citizens.
In summary, GDPR changes are coming over the next few months. Backblaze has our internal staff and our EU-based legal counsel working diligently to ensure that we will be GDPR compliant by May 25th. We believe that GDPR will have a positive effect in enhancing the protection of personally identifiable information for not only EU citizens, but all of our Backblaze customers.
This post courtesy of Roberto Iturralde, Sr. Application Developer, AWS Professional Services
Application architects are faced with key decisions throughout the process of designing and implementing their systems. One decision common to nearly all solutions is how to manage the storage and access rights of application configuration. Shared configuration should be stored centrally and securely with each system component having access only to the properties that it needs for functioning.
With AWS Systems Manager Parameter Store, developers have access to central, secure, durable, and highly available storage for application configuration and secrets. Parameter Store also integrates with AWS Identity and Access Management (IAM), allowing fine-grained access control to individual parameters or branches of a hierarchical tree.
This post demonstrates how to create and access shared configurations in Parameter Store from AWS Lambda. Both encrypted and plaintext parameter values are stored with only the Lambda function having permissions to decrypt the secrets. You also use AWS X-Ray to profile the function.
Solution overview
This example is made up of the following components:
An unencrypted Parameter Store parameter that the Lambda function loads
A KMS key that only the Lambda function can access. You use this key to create an encrypted parameter later.
Lambda function code in Python 3.6 that demonstrates how to load values from Parameter Store at function initialization for reuse across invocations.
Launch the AWS SAM template
To create the resources shown in this post, you can download the SAM template or choose the button to launch the stack. The template requires one parameter, an IAM user name, which is the name of the IAM user to be the admin of the KMS key that you create. To perform the steps listed in this post, this IAM user needs permissions to execute Lambda functions, create Parameter Store parameters, administer keys in KMS, and view the X-Ray console. If your own IAM user has these privileges, you can use it to complete the walkthrough. You cannot use the root user to administer the KMS keys.
SAM template resources
The following sections show the code for the resources defined in the template.
Lambda function
In this YAML code, you define a Lambda function named ParameterStoreBlogFunctionDev using the SAM AWS::Serverless::Function type. The environment variables for this function include the ENV (dev) and the APP_CONFIG_PATH where you find the configuration for this app in Parameter Store. X-Ray tracing is also enabled for profiling later.
The IAM role for this function extends the AWSLambdaBasicExecutionRole by adding IAM policies that grant the function permissions to write to X-Ray and get parameters from Parameter Store, limited to paths under /dev/parameterStoreBlog*.
Parameter Store parameter
SimpleParameter:
  Type: AWS::SSM::Parameter
  Properties:
    Name: '/dev/parameterStoreBlog/appConfig'
    Description: 'Sample dev config values for my app'
    Type: String
    Value: '{"key1": "value1","key2": "value2","key3": "value3"}'
This YAML code creates a plaintext string parameter in Parameter Store in a path that your Lambda function can access.
KMS encryption key
ParameterStoreBlogDevEncryptionKeyAlias:
  Type: AWS::KMS::Alias
  Properties:
    AliasName: 'alias/ParameterStoreBlogKeyDev'
    TargetKeyId: !Ref ParameterStoreBlogDevEncryptionKey
ParameterStoreBlogDevEncryptionKey:
  Type: AWS::KMS::Key
  Properties:
    Description: 'Encryption key for secret config values for the Parameter Store blog post'
    Enabled: True
    EnableKeyRotation: False
    KeyPolicy:
      Version: '2012-10-17'
      Id: 'key-default-1'
      Statement:
        -
          Sid: 'Allow administration of the key & encryption of new values'
          Effect: Allow
          Principal:
            AWS:
              - !Sub 'arn:aws:iam::${AWS::AccountId}:user/${IAMUsername}'
          Action:
            - 'kms:Create*'
            - 'kms:Encrypt'
            - 'kms:Describe*'
            - 'kms:Enable*'
            - 'kms:List*'
            - 'kms:Put*'
            - 'kms:Update*'
            - 'kms:Revoke*'
            - 'kms:Disable*'
            - 'kms:Get*'
            - 'kms:Delete*'
            - 'kms:ScheduleKeyDeletion'
            - 'kms:CancelKeyDeletion'
          Resource: '*'
        -
          Sid: 'Allow use of the key'
          Effect: Allow
          Principal:
            AWS: !GetAtt ParameterStoreBlogFunctionRoleDev.Arn
          Action:
            - 'kms:Encrypt'
            - 'kms:Decrypt'
            - 'kms:ReEncrypt*'
            - 'kms:GenerateDataKey*'
            - 'kms:DescribeKey'
          Resource: '*'
This YAML code creates an encryption key with a key policy with two statements.
The first statement allows a given user (${IAMUsername}) to administer the key. Importantly, this includes the ability to encrypt values using this key and disable or delete this key, but does not allow the administrator to decrypt values that were encrypted with this key.
The second statement grants your Lambda function permission to encrypt and decrypt values using this key. The alias for this key in KMS is ParameterStoreBlogKeyDev, which is how you reference it later.
Lambda function
Here I walk you through the Lambda function code.
import os, traceback, json, configparser, boto3
from aws_xray_sdk.core import patch_all
patch_all()

# Initialize boto3 client at global scope for connection reuse
client = boto3.client('ssm')
env = os.environ['ENV']
app_config_path = os.environ['APP_CONFIG_PATH']
full_config_path = '/' + env + '/' + app_config_path
# Initialize app at global scope for reuse across invocations
app = None

class MyApp:
    def __init__(self, config):
        """
        Construct new MyApp with configuration
        :param config: application configuration
        """
        self.config = config

    def get_config(self):
        return self.config

def load_config(ssm_parameter_path):
    """
    Load configparser from config stored in SSM Parameter Store
    :param ssm_parameter_path: Path to app config in SSM Parameter Store
    :return: ConfigParser holding loaded config
    """
    configuration = configparser.ConfigParser()
    try:
        # Get all parameters for this app
        param_details = client.get_parameters_by_path(
            Path=ssm_parameter_path,
            Recursive=False,
            WithDecryption=True
        )
        # Loop through the returned parameters and populate the ConfigParser
        if 'Parameters' in param_details and len(param_details.get('Parameters')) > 0:
            for param in param_details.get('Parameters'):
                param_path_array = param.get('Name').split("/")
                section_position = len(param_path_array) - 1
                section_name = param_path_array[section_position]
                config_values = json.loads(param.get('Value'))
                config_dict = {section_name: config_values}
                print("Found configuration: " + str(config_dict))
                configuration.read_dict(config_dict)
    except:
        print("Encountered an error loading config from SSM.")
        traceback.print_exc()
    finally:
        return configuration

def lambda_handler(event, context):
    global app
    # Initialize app if it doesn't yet exist
    if app is None:
        print("Loading config and creating new MyApp...")
        config = load_config(full_config_path)
        app = MyApp(config)
    return "MyApp config is " + str(app.get_config()._sections)
Beneath the import statements, you import the patch_all function from the AWS X-Ray library, which you use to patch boto3 to create X-Ray segments for all your boto3 operations.
Next, you create a boto3 SSM client at the global scope for reuse across function invocations, following Lambda best practices. Using the function environment variables, you assemble the path where you expect to find your configuration in Parameter Store. The class MyApp is meant to serve as an example of an application that would need its configuration injected at construction. In this example, you create an instance of ConfigParser, a class in Python’s standard library for handling basic configurations, to give to MyApp.
The load_config function loads all the parameters from Parameter Store at the level immediately beneath the path provided in the Lambda function environment variables. Each parameter found is put into a new section in ConfigParser. The name of the section is the name of the parameter, less the base path. In this example, the full parameter name is /dev/parameterStoreBlog/appConfig, which is put in a section named appConfig.
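One thing to keep in mind is that GetParametersByPath returns results in pages, so an app with many parameters may need to follow the NextToken value in each response. The following sketch shows one way that could look, along with the section naming described above. It is not part of the function in this post: load_all_parameters, to_config, and FakeSsmClient are hypothetical names, and the fake client stands in for boto3.client('ssm') so the example runs without AWS access.

```python
import configparser
import json


def load_all_parameters(client, path):
    """Collect every parameter directly under `path`, following NextToken."""
    parameters = []
    kwargs = {"Path": path, "Recursive": False, "WithDecryption": True}
    while True:
        page = client.get_parameters_by_path(**kwargs)
        parameters.extend(page.get("Parameters", []))
        token = page.get("NextToken")
        if not token:
            return parameters
        kwargs["NextToken"] = token


def to_config(parameters):
    """Put each parameter into a section named after the last path element."""
    config = configparser.ConfigParser()
    for param in parameters:
        section_name = param["Name"].split("/")[-1]
        config.read_dict({section_name: json.loads(param["Value"])})
    return config


class FakeSsmClient:
    """Stand-in for boto3.client('ssm') that serves two canned pages."""

    def __init__(self):
        self.pages = {
            None: {
                "Parameters": [{"Name": "/dev/parameterStoreBlog/appConfig",
                                "Value": '{"key1": "value1"}'}],
                "NextToken": "page-2",
            },
            "page-2": {
                "Parameters": [{"Name": "/dev/parameterStoreBlog/appSecrets",
                                "Value": '{"secretKey": "secretValue"}'}],
            },
        }

    def get_parameters_by_path(self, **kwargs):
        return self.pages[kwargs.get("NextToken")]


config = to_config(load_all_parameters(FakeSsmClient(), "/dev/parameterStoreBlog"))
print(config.sections())
```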
Finally, the lambda_handler function initializes an instance of MyApp if it doesn’t already exist, constructing it with the loaded configuration from Parameter Store. Then it simply returns the currently loaded configuration in MyApp. The impact of this design is that the configuration is only loaded from Parameter Store the first time that the Lambda function execution environment is initialized. Subsequent invocations reuse the existing instance of MyApp, resulting in improved performance. You see this in the X-Ray traces later in this post. For more advanced use cases where configuration changes need to be received immediately, you could implement an expiry policy for your configuration entries or push notifications to your function.
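As an illustration of the expiry idea mentioned above, here is a minimal sketch of a time-based cache that reloads the configuration once it is older than a chosen time to live. The class name ExpiringConfig, the 300-second value, and the injected loader and clock are all assumptions made for this example; in the Lambda function, the loader would be a call such as load_config(full_config_path).

```python
import time


class ExpiringConfig:
    """Cache a loaded configuration and reload it after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # callable that fetches fresh configuration
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the example can fake time
        self._value = None
        self._loaded_at = None

    def get(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value


# Demonstration with a fake clock and a loader that counts its calls.
calls = []
fake_now = [0.0]


def fake_loader():
    calls.append(1)
    return {"version": len(calls)}


cache = ExpiringConfig(fake_loader, ttl_seconds=300, clock=lambda: fake_now[0])
first = cache.get()    # first call loads the configuration
fake_now[0] = 100.0
second = cache.get()   # still within the TTL, served from the cache
fake_now[0] = 301.0
third = cache.get()    # TTL elapsed, configuration is reloaded
```

The trade-off is between freshness and cost: a shorter time to live picks up changes sooner but calls Parameter Store more often.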
To confirm that everything was created successfully, test the function in the Lambda console.
In the Functions pane, filter to ParameterStoreBlogFunctionDev to find the function created by the SAM template earlier. Open the function name to view its details.
On the top right of the function detail page, choose Test. You may need to create a new test event. The input JSON doesn’t matter as this function ignores the input.
After running the test, you should see output similar to the following. This demonstrates that the function successfully fetched the unencrypted configuration from Parameter Store.
Create an encrypted parameter
You currently have a simple, unencrypted parameter and a Lambda function that can access it.
Next, you create an encrypted parameter that only your Lambda function has permission to use for decryption. This limits read access for this parameter to only this Lambda function.
To follow along with this section, deploy the SAM template for this post in your account and make your IAM user name the KMS key admin mentioned earlier.
For Name, enter /dev/parameterStoreBlog/appSecrets.
For Type, select Secure String.
For KMS Key ID, choose alias/ParameterStoreBlogKeyDev, which is the key that your SAM template created.
For Value, enter {"secretKey": "secretValue"}.
Choose Create Parameter.
If you now try to view the value of this parameter by choosing the name of the parameter in the parameters list and then choosing Show next to the Value field, you won’t see the value appear. This is because, even though you have permission to encrypt values using this KMS key, you do not have permissions to decrypt values.
In the Lambda console, run another test of your function. You now also see the secret parameter that you created and its decrypted value.
If you do not see the new parameter in the Lambda output, this may be because the Lambda execution environment is still warm from the previous test. Because the parameters are loaded at Lambda startup, you need a fresh execution environment to refresh the values.
Adjust the function timeout to a different value in the Advanced Settings at the bottom of the Lambda Configuration tab. Choose Save and test to trigger the creation of a new Lambda execution environment.
Profiling the impact of querying Parameter Store using AWS X-Ray
By using the AWS X-Ray SDK to patch boto3 in your Lambda function code, each invocation of the function creates traces in X-Ray. In this example, you can use these traces to validate the performance impact of your design decision to only load configuration from Parameter Store on the first invocation of the function in a new execution environment.
From the Lambda function details page where you tested the function earlier, under the function name, choose Monitoring. Choose View traces in X-Ray.
This opens the X-Ray console in a new window filtered to your function. Be aware of the time range field next to the search bar if you don’t see any search results. In this screenshot, I’ve invoked the Lambda function twice, one time 10.3 minutes ago with a response time of 1.1 seconds and again 9.8 minutes ago with a response time of 8 milliseconds.
Looking at the details of the longer running trace by clicking the trace ID, you can see that the Lambda function spent the first ~350 ms of the full 1.1 sec routing the request through Lambda and creating a new execution environment for this function, as this was the first invocation with this code. This is the portion of time before the initialization subsegment.
Next, it took 725 ms to initialize the function, which includes executing the code at the global scope (including creating the boto3 client). This is also a one-time cost for a fresh execution environment.
Finally, the function executed for 65 ms, of which 63.5 ms was the GetParametersByPath call to Parameter Store.
Looking at the trace for the second, much faster function invocation, you see that the majority of the 8 ms execution time was Lambda routing the request to the function and returning the response. Only 1 ms of the overall execution time was attributed to the execution of the function, which makes sense given that after the first invocation you’re simply returning the config stored in MyApp.
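As a quick arithmetic check of the two traces above, the following sketch adds up the cold-start breakdown and estimates the average latency over a series of invocations. The figures are the approximate values quoted in this post, and the average assumes a single cold start followed by warm invocations at 8 ms each, which is a simplification.

```python
# Approximate cold-start breakdown from the first trace, in milliseconds.
cold_ms = {
    "routing_and_environment": 350,
    "initialization": 725,
    "handler": 65,  # includes about 63.5 ms for GetParametersByPath
}
cold_total_ms = sum(cold_ms.values())
warm_total_ms = 8

# Portion of the cold invocation that is a one-time cost per environment.
one_time_ms = cold_ms["routing_and_environment"] + cold_ms["initialization"]
one_time_share = one_time_ms / cold_total_ms


def average_ms(invocations):
    """Average latency for one cold start followed by warm invocations."""
    return (cold_total_ms + warm_total_ms * (invocations - 1)) / invocations


print(cold_total_ms, round(one_time_share, 2), average_ms(100))
```

The breakdown sums to 1140 ms, which is consistent with the 1.1 second response time shown for the first trace.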
While the Traces screen allows you to view the details of individual traces, the X-Ray Service Map screen allows you to view aggregate performance data for all traced services over a period of time.
In the X-Ray console navigation pane, choose Service map. Selecting a service node shows the metrics for node-specific requests. Selecting an edge between two nodes shows the metrics for requests that traveled that connection. Again, be aware of the time range field next to the search bar if you don’t see any search results.
After invoking your Lambda function several more times by testing it from the Lambda console, you can view some aggregate performance metrics. Look at the following:
From the client perspective, requests to the Lambda service for the function are taking an average of 50 ms to respond. The function is generating ~1 trace per minute.
The function itself is responding in an average of 3 ms. In the following screenshot, I’ve clicked on this node, which reveals a latency histogram of the traced requests showing that over 95% of requests return in under 5 ms.
Parameter Store is responding to requests in an average of 64 ms, but note the much lower trace rate in the node. This is because you only fetch data from Parameter Store on the initialization of the Lambda execution environment.
Conclusion
Deduplication, encryption, and restricted access to shared configuration and secrets are key components of any mature architecture. Serverless architectures designed using event-driven, on-demand compute services like Lambda are no different.
In this post, I walked you through a sample application accessing unencrypted and encrypted values in Parameter Store. These values were created in a hierarchy by application environment and component name, with the permissions to decrypt secret values restricted to only the function needing access. The techniques used here can become the foundation of secure, robust configuration management in your enterprise serverless applications.
Amazon EMR enables data analysts and scientists to deploy a cluster of any size running popular frameworks such as Spark, HBase, Presto, and Flink in minutes. When you launch a cluster, Amazon EMR automatically configures the underlying Amazon EC2 instances with the frameworks and applications that you choose for your cluster. This can include popular web interfaces such as Hue workbench, Zeppelin notebook, and Ganglia monitoring dashboards and tools.
These web interfaces are hosted on the EMR master node and must be accessed using the public DNS name of the master node (master public DNS value). The master public DNS value is dynamically created, not very user friendly, and hard to remember; it looks something like ip-###-###-###-###.us-west-2.compute.internal. Not having a friendly URL for the popular workbench or notebook interfaces can disrupt your workflow and erode the agility you gained.
Some customers have addressed this challenge through custom bootstrap actions, steps, or external scripts that periodically check for new clusters and register a friendlier name in DNS. These approaches either put additional burden on the data practitioners or require additional resources to execute the scripts. In addition, there is typically some lag time associated with such scripts. They often don’t do a great job cleaning up the DNS records after the cluster has terminated, potentially resulting in a security risk.
The solution in this post provides an automated, serverless approach to registering a friendly master node name for easy access to the web interfaces.
Before I dive deeper, I review these key services and how they are part of this solution.
CloudWatch Events
CloudWatch Events delivers a near real-time stream of system events that describe changes in AWS resources. Using simple rules, you can match events and route them to one or more target functions or streams. An event can be generated in one of four ways:
From an AWS service when resources change state
From API calls that are delivered via AWS CloudTrail
From your own code that can generate application-level events
On a cron-style schedule that you define
In this solution, I cover the first type of event, which EMR emits automatically when the cluster state changes. Based on the state in this event, the solution either creates or updates the DNS record in Route 53 when the cluster state changes to STARTING, or deletes the DNS record when the cluster is no longer needed and the state changes to TERMINATED. For more information about all EMR event details, see Monitor CloudWatch Events.
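To make the matching concrete, here is a sketch of what an event pattern for these state changes might look like, together with a deliberately simplified matcher. Restricting the pattern to the STARTING and TERMINATED states is my assumption, the cluster ID is a placeholder, and the matches function only approximates how CloudWatch Events evaluates patterns.

```python
import json

# Assumed pattern: EMR cluster state changes, limited to the two states of interest.
EVENT_PATTERN = {
    "source": ["aws.emr"],
    "detail-type": ["EMR Cluster State Change"],
    "detail": {"state": ["STARTING", "TERMINATED"]},
}


def matches(pattern, event):
    """Simplified matcher: each pattern key must exist in the event, and
    each leaf list enumerates the values that are accepted."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def sample_event(state):
    return {
        "source": "aws.emr",
        "detail-type": "EMR Cluster State Change",
        "detail": {"clusterId": "j-EXAMPLE", "state": state},  # placeholder ID
    }


starting_matches = matches(EVENT_PATTERN, sample_event("STARTING"))
running_matches = matches(EVENT_PATTERN, sample_event("RUNNING"))
print(json.dumps(EVENT_PATTERN, indent=2))
```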
Route 53 private hosted zones
A private hosted zone is a container that holds information about how to route traffic for a domain and its subdomains within one or more VPCs. Private hosted zones enable you to use custom DNS names for your internal resources without exposing the names or IP addresses to the internet.
Route 53 supports resource record sets with a wide range of record types. In this solution, you use a CNAME record that is used to specify a domain name as an alias for another domain (the ‘canonical’ domain). You use a friendly name of the cluster as the CNAME for the EMR master public DNS value.
You are using private hosted zones because an EMR cluster is typically deployed within a private subnet and is accessed either from within the VPC or from on-premises resources over VPN or AWS Direct Connect. To resolve domain names in private hosted zones from your on-premises network, configure a DNS forwarder, as described in How can I resolve Route 53 private hosted zones from an on-premises network via an Ubuntu instance?.
Lambda
Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda executes your code only when needed and scales automatically to thousands of requests per second. Lambda takes care of high availability, and server and OS maintenance and patching. You pay only for the consumed compute time. There is no charge when your code is not running.
Lambda provides the ability to invoke your code in response to events, such as when an object is put to an Amazon S3 bucket or as in this case, when a CloudWatch event is emitted. As part of this solution, you deploy a Lambda function as a target that is invoked by CloudWatch Events when the event matches your rule. You also configure the necessary permissions based on the Lambda permissions model, including a Lambda function policy and Lambda execution role.
Putting it all together
Now that you have all of the pieces, you can put together a complete solution. The following diagram illustrates how the solution works:
Start with a user activity such as launching or terminating an EMR cluster.
EMR automatically sends events to the CloudWatch Events stream.
A CloudWatch Events rule matches the specified event and routes it to a target, which in this case is a Lambda function. Here, you are using the EMR Cluster State Change event.
The Lambda function performs the following key steps:
Get the clusterId value from the event detail and use it to call the EMR DescribeCluster API to retrieve the following data points:
MasterPublicDnsName – public DNS name of the master node
Locate the tag containing the friendly name to use as the CNAME for the cluster. The key name containing the friendly name should be cluster_name. The value should be specified as host.domain.com, where domain is the private hosted zone in which to update the DNS record.
Update DNS based on the state in the event detail.
If the state is STARTING, the function calls the Route 53 API to create or update a resource record set in the private hosted zone specified by the domain tag. This is a CNAME record mapped to MasterPublicDnsName.
Conversely, if the state is TERMINATED, the function calls the Route 53 API to delete the associated resource record set from the private hosted zone.
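The steps above can be sketched in Python roughly as follows. This is a minimal illustration rather than the function deployed by the template: the function names, the TTL value, and the sample event and cluster data are assumptions, and no AWS calls are made. The change batch it builds follows the shape that the Route 53 ChangeResourceRecordSets API accepts.

```python
import json

# Sketch only: helper names, TTL, and sample values are hypothetical.


def zone_from_friendly_name(friendly_name):
    """Return the hosted zone part of host.domain.com, that is, domain.com."""
    host, _, domain = friendly_name.partition(".")
    if not host or not domain:
        raise ValueError("expected host.domain.com, got %r" % friendly_name)
    return domain


def build_change_batch(state, friendly_name, master_public_dns, ttl=300):
    """Map a cluster state to a Route 53 change batch, or None for other states."""
    actions = {"STARTING": "UPSERT", "TERMINATED": "DELETE"}
    action = actions.get(state)
    if action is None:
        return None
    return {
        "Changes": [{
            "Action": action,
            "ResourceRecordSet": {
                "Name": friendly_name,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": master_public_dns}],
            },
        }]
    }


def plan_dns_update(event, cluster):
    """Combine the event detail with the 'Cluster' part of a DescribeCluster response."""
    tags = {tag["Key"]: tag["Value"] for tag in cluster.get("Tags", [])}
    friendly_name = tags["cluster_name"]
    return {
        "zone": zone_from_friendly_name(friendly_name),
        "change_batch": build_change_batch(
            event["detail"]["state"], friendly_name, cluster["MasterPublicDnsName"]
        ),
    }


# Hypothetical sample inputs for illustration.
sample_event = {
    "source": "aws.emr",
    "detail-type": "EMR Cluster State Change",
    "detail": {"clusterId": "j-EXAMPLE", "state": "STARTING"},
}
sample_cluster = {
    "MasterPublicDnsName": "ec2-203-0-113-10.compute-1.amazonaws.com",
    "Tags": [{"Key": "cluster_name", "Value": "emr-dev.example.internal"}],
}

print(json.dumps(plan_dns_update(sample_event, sample_cluster), indent=2))
```

In a real function, the returned change batch would be passed to Route 53 together with the ID of the private hosted zone that matches the zone name.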
Deploying the solution
Because all of the components of this solution are serverless, use the AWS Serverless Application Model (AWS SAM) template to deploy the solution. AWS SAM is natively supported by AWS CloudFormation and provides a simplified syntax for expressing serverless resources, resulting in fewer lines of code.
Overview of the SAM template
For this solution, the SAM template has 76 lines of text, compared to 142 lines without SAM resources (a template written in YAML would be slightly shorter still). The solution can be deployed using the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS SAM Local.
CloudFormation transforms help simplify template authoring by condensing a multiple-line resource declaration into a single line in your template. To inform CloudFormation that your template defines a serverless application, add a line under the template format version as follows:
Before SAM, you would use the AWS::Lambda::Function resource type to define your Lambda function. You would then need a resource to define the permissions for the function (AWS::Lambda::Permission), another resource to define a Lambda execution role (AWS::IAM::Role), and finally a CloudWatch Events resource (AWS::Events::Rule) that triggers this function.
With SAM, you need to define just a single resource for your function, AWS::Serverless::Function. Using this single resource type, you can define everything that you need, including function properties such as function handler, runtime, and code URI, as well as the required IAM policies and the CloudWatch event.
A few additional things to note in the code example:
CodeUri – Before you can deploy a SAM template, first upload your Lambda function code zip to S3. You can do this manually or use the aws cloudformation package CLI command to automate the task of uploading local artifacts to an S3 bucket, as shown later.
Lambda execution role and permissions – You are not specifying a Lambda execution role in the template. Rather, you are providing the required permissions as IAM policy documents. When the template is submitted, CloudFormation expands the AWS::Serverless::Function resource, declaring a Lambda function and an execution role. The created role has two attached policies: a default AWSLambdaBasicExecutionRole and the inline policy specified in the template.
CloudWatch Events rule – Instead of specifying a CloudWatch Events resource type, you are defining an event source object as a property of the function itself. When the template is submitted, CloudFormation expands this into a CloudWatch Events rule resource and automatically creates the Lambda resource-based permissions to allow the CloudWatch Events rule to trigger the function.
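To make the single-resource form concrete, here is a minimal sketch of what such a SAM template might contain, built as a Python dictionary and printed as JSON. It is not the template from this solution: the logical IDs, handler name, runtime, code location, and the list of IAM actions are assumptions chosen for illustration. The Transform value and the AWS::Serverless::Function property names are the ones SAM defines.

```python
import json

# Sketch only: logical IDs, handler, runtime, CodeUri, and actions are hypothetical.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    # This line tells CloudFormation that the template is a serverless application.
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        "EmrDnsSetterFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "lambda_function.lambda_handler",
                "Runtime": "python3.12",
                "CodeUri": "./src",
                # Inline policy document; SAM creates the execution role around it.
                "Policies": [{
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Action": [
                            "elasticmapreduce:DescribeCluster",
                            "route53:ListHostedZonesByName",
                            "route53:ChangeResourceRecordSets",
                        ],
                        "Resource": "*",
                    }],
                }],
                # Event source; SAM expands this into a rule plus invoke permission.
                "Events": {
                    "EmrStateChange": {
                        "Type": "CloudWatchEvent",
                        "Properties": {
                            "Pattern": {
                                "source": ["aws.emr"],
                                "detail-type": ["EMR Cluster State Change"],
                                "detail": {"state": ["STARTING", "TERMINATED"]},
                            }
                        },
                    }
                },
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

The whole function, its permissions, and its trigger live under one resource, which is where the reduction in template length comes from.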
NOTE: If you are trying this solution outside of us-east-1, then you should download the necessary files, upload them to the buckets in your region, edit the script as appropriate and then run it or use the CLI deployment method below.
3.) Choose Next.
4.) On the Specify Details page, keep or modify the stack name and choose Next.
5.) On the Options page, choose Next.
6.) On the Review page, take the following steps:
Acknowledge the two Transform access capabilities. This allows the CloudFormation transform to create the required IAM resources with custom names.
Under Transforms, choose Create Change Set.
Wait a few seconds for the change set to be created before proceeding. The change set should look as follows:
7.) Choose Execute to deploy the template.
After the template is deployed, you should see four resources created:
After the package is successfully uploaded, the output should look as follows:
Uploading to 0f6d12c7872b50b37dbfd5a60385b854 1872 / 1872.0 (100.00%)
Successfully packaged artifacts and wrote output template to file serverless-output.template.
The CodeUri property in serverless-output.template is now referencing the packaged artifacts in the S3 bucket that you specified:
s3://<bucket>/0f6d12c7872b50b37dbfd5a60385b854
Use the aws cloudformation deploy CLI command to deploy the stack:
You should see the following output after the stack has been successfully created:
Waiting for changeset to be created...
Waiting for stack create/update to complete
Successfully created/updated stack – EmrDnsSetterCli
Validating results
To test the solution, launch an EMR cluster. The Lambda function looks for the cluster_name tag associated with the EMR cluster. Make sure to specify the friendly name of your cluster as host.domain.com where the domain is the private hosted zone in which to create the CNAME record.
Here is a sample CLI command to launch a cluster within a specific subnet in a VPC with the required tag cluster_name.
After the cluster is launched, log in to the Route 53 console. In the left navigation pane, choose Hosted Zones to view the list of private and public zones currently configured in Route 53. Select the hosted zone that you specified in the ZONE tag when you launched the cluster. Verify that the resource records were created.
You can also monitor the CloudWatch Events metrics that are published to CloudWatch every minute, such as the number of TriggeredRules and Invocations.
Now that you’ve verified that the Lambda function successfully updated the Route 53 resource records in the zone file, terminate the EMR cluster and verify that the records are removed by the same function.
Conclusion
This solution provides a serverless approach to automatically assigning a friendly name for your EMR cluster for easy access to popular notebooks and other web interfaces. CloudWatch Events also supports cross-account event delivery, so if you are running EMR clusters in multiple AWS accounts, all cluster state events across accounts can be consolidated into a single account.
I hope that this solution provides a small glimpse into the power of CloudWatch Events and Lambda and how they can be leveraged with EMR and other AWS big data services. For example, by using the EMR step state change event, you can chain various pieces of your analytics pipeline. You may have a transient cluster perform data ingest and, when the task successfully completes, spin up an ETL cluster for transformation and upload to Amazon Redshift. The possibilities are truly endless.
AWS Fargate is a technology that allows you to focus on running your application without needing to provision, monitor, or manage the underlying compute infrastructure. You package your application into a Docker container that you can then launch using your container orchestration tool of choice.
Fargate allows you to use containers without being responsible for Amazon EC2 instances, similar to how EC2 allows you to run VMs without managing physical infrastructure. Currently, Fargate provides support for Amazon Elastic Container Service (Amazon ECS). Support for Amazon Elastic Container Service for Kubernetes (Amazon EKS) will be made available in the near future.
Despite offloading the responsibility for the underlying instances, Fargate still gives you deep control over configuration of network placement and policies. This includes the ability to use many networking fundamentals such as Amazon VPC and security groups.
This post covers how to take advantage of the different ways of networking your containers in Fargate when using ECS as your orchestration platform, with a focus on how to do networking securely.
The first step to running any application in Fargate is defining an ECS task for Fargate to launch. A task is a logical group of one or more Docker containers that are deployed with specified settings. When running a task in Fargate, there are two different forms of networking to consider:
Container (local) networking
External networking
Container Networking
Container networking is often used for tightly coupled application components. Perhaps your application has a web tier that is responsible for serving static content as well as generating some dynamic HTML pages. To generate these dynamic pages, it has to fetch information from another application component that has an HTTP API.
One potential architecture for such an application is to deploy the web tier and the API tier together as a pair and use local networking so the web tier can fetch information from the API tier.
If you are running these two components as two processes on a single EC2 instance, the web tier application process could communicate with the API process on the same machine by using the local loopback interface. The local loopback interface has a special IP address of 127.0.0.1 and hostname of localhost.
By making a networking request to this local interface, it bypasses the network interface hardware and instead the operating system just routes network calls from one process to the other directly. This gives the web tier a fast and efficient way to fetch information from the API tier with almost no networking latency.
In Fargate, when you launch multiple containers as part of a single task, they can also communicate with each other over the local loopback interface. Fargate uses a special container networking mode called awsvpc, which gives all the containers in a task a shared elastic network interface to use for communication.
If you specify a port mapping for each container in the task, then the containers can communicate with each other on that port. For example the following task definition could be used to deploy the web tier and the API tier:
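As a minimal sketch, a task definition of this kind could look like the following, written here as a Python dictionary that serializes to the JSON that ECS expects. The family name, image names, CPU and memory sizes, and the web tier port are assumptions for illustration; port 8080 for the API tier matches the request shown later in this section.

```python
import json

# Sketch only: family, images, sizes, and the web port are hypothetical.
task_definition = {
    "family": "web-and-api",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "256",
    "memory": "512",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "example/web:latest",
            "essential": True,
            "portMappings": [{"containerPort": 80}],
        },
        {
            "name": "api",
            "image": "example/api:latest",
            "essential": True,
            "portMappings": [{"containerPort": 8080}],
        },
    ],
}

# Containers in one task share a network interface, so their ports must differ.
ports = [
    mapping["containerPort"]
    for container in task_definition["containerDefinitions"]
    for mapping in container["portMappings"]
]
assert len(ports) == len(set(ports)), "containers in a task cannot share a port"

print(json.dumps(task_definition, indent=2))
```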
ECS, with Fargate, is able to take this definition and launch two containers, each of which is bound to a specific static port on the elastic network interface for the task.
Because each Fargate task has its own isolated networking stack, there is no need for dynamic ports to avoid port conflicts between different tasks as in other networking modes. The static ports make it easy for containers to communicate with each other. For example, the web container makes a request to the API container using its well-known static port:
curl 127.0.0.1:8080/my-endpoint
This sends a local network request, which goes directly from one container to the other over the local loopback interface without traversing the network. This deployment strategy allows for fast and efficient communication between two tightly coupled containers. But most application architectures require more than just internal local networking.
External Networking
External networking is used for network communications that go outside the task to other servers that are not part of the task, or network communications that originate from other hosts on the internet and are directed to the task.
Configuring external networking for a task is done by modifying the settings of the VPC in which you launch your tasks. A VPC is a fundamental tool in AWS for controlling the networking capabilities of resources that you launch on your account.
When setting up a VPC, you create one or more subnets, which are logical groups that your resources can be placed into. Each subnet has an Availability Zone and its own route table, which defines rules about how network traffic operates for that subnet. There are two main types of subnets: public and private.
Public subnets
A public subnet is a subnet that has an associated internet gateway. Fargate tasks in that subnet are assigned both private and public IP addresses:
A browser or other client on the internet can send network traffic to the task via the internet gateway using its public IP address. The tasks can also send network traffic to other servers on the internet because the route table can route traffic out via the internet gateway.
If tasks want to communicate directly with each other, they can use each other’s private IP address to send traffic directly from one to the other so that it stays inside the subnet without going out to the internet gateway and back in.
Private subnets
A private subnet does not have direct internet access. The Fargate tasks inside the subnet don’t have public IP addresses, only private IP addresses. Instead of an internet gateway, a network address translation (NAT) gateway is attached to the subnet:
There is no way for another server or client on the internet to reach your tasks directly, because they don’t even have an address or a direct route to reach them. This is a great way to add another layer of protection for internal tasks that handle sensitive data. Those tasks are protected and can’t receive any inbound traffic at all.
In this configuration, the tasks can still communicate to other servers on the internet via the NAT gateway. They would appear to have the IP address of the NAT gateway to the recipient of the communication. If you run a Fargate task in a private subnet, you must add this NAT gateway. Otherwise, Fargate can’t make a network request to Amazon ECR to download the container image, or communicate with Amazon CloudWatch to store container metrics.
Load balancers
If you are running a container that is hosting internet content in a private subnet, you need a way for traffic from the public to reach the container. This is generally accomplished by using a load balancer such as an Application Load Balancer or a Network Load Balancer.
ECS integrates tightly with AWS load balancers by automatically configuring a service-linked load balancer to send network traffic to containers that are part of the service. When each task starts, the IP address of its elastic network interface is added to the load balancer’s configuration. When the task is being shut down, network traffic is safely drained from the task before removal from the load balancer.
To get internet traffic to containers using a load balancer, the load balancer is placed into a public subnet. ECS configures the load balancer to forward traffic to the container tasks in the private subnet:
This configuration allows your tasks in Fargate to be safely isolated from the rest of the internet. They can still initiate network communication with external resources via the NAT gateway, and still receive traffic from the public via the Application Load Balancer that is in the public subnet.
Another potential use case for a load balancer is for internal communication from one service to another service within the private subnet. This is typically used for a microservice deployment, in which one service such as an internet user account service needs to communicate with an internal service such as a password service. Obviously, it is undesirable for the password service to be directly accessible on the internet, so using an internet-facing load balancer would be a major security vulnerability. Instead, this can be accomplished by hosting an internal load balancer within the private subnet:
With this approach, one container can distribute requests across an Auto Scaling group of other private containers via the internal load balancer, ensuring that the network traffic stays safely protected within the private subnet.
Best Practices for Fargate Networking
Determine whether you should use local task networking
Local task networking is ideal for communicating between containers that are tightly coupled and require maximum networking performance between them. However, when you deploy one or more containers as part of the same task, they are always deployed together, which removes the ability to independently scale different types of workload up and down.
In the example of the application with a web tier and an API tier, it may be the case that powering the application requires only two web tier containers but 10 API tier containers. If local container networking is used between these two container types, then an extra eight unnecessary web tier containers would end up being run instead of allowing the two different services to scale independently.
A better approach would be to deploy the two containers as two different services, each with its own load balancer. This allows clients to communicate with the two web containers via the web service’s load balancer. The web service could distribute requests across the 10 backend API containers via the API service’s load balancer.
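The arithmetic behind this example can be sketched in a few lines of Python. The function names are hypothetical; the numbers are the ones used in the example above.

```python
def containers_when_coupled(web_needed, api_needed):
    """Both containers live in one task, so each task runs one of each."""
    tasks = max(web_needed, api_needed)
    return {"tasks": tasks, "web": tasks, "api": tasks}


def containers_when_independent(web_needed, api_needed):
    """Each tier is its own service and scales to exactly what it needs."""
    return {"web": web_needed, "api": api_needed}


coupled = containers_when_coupled(web_needed=2, api_needed=10)
independent = containers_when_independent(web_needed=2, api_needed=10)

# Web tier containers that run only because they are tied to an API container.
extra_web = coupled["web"] - independent["web"]
print("Extra web tier containers when coupled:", extra_web)
```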
Run tasks that require internet access in a public subnet
If you have tasks that require internet access and a lot of bandwidth for communication with other services, it is best to run them in a public subnet. Give them public IP addresses so that each task can communicate with other services directly.
If you run these tasks in a private subnet, then all their outbound traffic has to go through a NAT gateway. AWS NAT gateways support up to 10 Gbps of burst bandwidth. If your bandwidth requirements go over this, then all task networking starts to get throttled. To avoid this, you could distribute the tasks across multiple private subnets, each with its own NAT gateway. It can be easier to just place the tasks into a public subnet, if possible.
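As a rough sizing aid, the following sketch estimates how many NAT gateways (and therefore private subnets) a set of tasks would need. It assumes the 10 Gbps figure quoted above and an evenly spread load; the function name and the per-task bandwidth are hypothetical inputs, not measured values.

```python
import math

NAT_GATEWAY_GBPS = 10  # burst bandwidth figure quoted in the text


def nat_gateways_needed(task_count, gbps_per_task, limit=NAT_GATEWAY_GBPS):
    """Estimate NAT gateways needed if traffic is spread evenly across them."""
    total_gbps = task_count * gbps_per_task
    return max(1, math.ceil(total_gbps / limit))


# Hypothetical workload: 50 tasks, each sending 0.5 Gbps outbound.
print(nat_gateways_needed(task_count=50, gbps_per_task=0.5))
```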
Avoid using a public subnet or public IP addresses for private, internal tasks
If you are running a service that handles private, internal information, you should not put it into a public subnet or use a public IP address. For example, imagine that you have one task, which is an API gateway for authentication and access control. You have another background worker task that handles sensitive information.
The intended access pattern is that requests from the public go to the API gateway, which then proxies requests to the background task only if the request is from an authenticated user. If the background task is in a public subnet and has a public IP address, then it could be possible for an attacker to bypass the API gateway entirely. They could communicate directly to the background task using its public IP address, without being authenticated.
Conclusion
Fargate gives you a way to run containerized tasks directly without managing any EC2 instances, but you still have full control over how you want networking to work. You can set up containers to talk to each other over the local network interface for maximum speed and efficiency. For running workloads that require privacy and security, use a private subnet with public internet access locked down. Or, for simplicity with an internet workload, you can just use a public subnet and give your containers a public IP address.
To deploy one of these Fargate task networking approaches, check out some sample CloudFormation templates showing how to configure the VPC, subnets, and load balancers.
If you have questions or suggestions, please comment below.
When managing your AWS resources, you often need to grant one AWS service access to another to accomplish tasks. For example, you could use an AWS Lambda function to resize, watermark, and postprocess images, for which you would need to store the associated metadata in Amazon DynamoDB. You also could use Lambda, Amazon S3, and Amazon CloudFront to build a serverless website that uses a DynamoDB table as a session store, with Lambda updating the information in the table. In both these examples, you need to grant Lambda functions permissions to write to DynamoDB.
In this post, I demonstrate how to create an AWS Identity and Access Management (IAM) policy that will be attached to an IAM role. The role is then used to grant a Lambda function access to a DynamoDB table. By using an IAM policy and role to control access, I don’t need to embed credentials in code and can tightly control which services the Lambda function can access. The policy also includes permissions to allow the Lambda function to write log files to Amazon CloudWatch Logs. This allows me to view utilization statistics for my Lambda functions and to have access to additional logging for troubleshooting issues.
Solution overview
The following architecture diagram presents an overview of the solution in this post.
The architecture of this post’s solution uses a Lambda function (1 in the preceding diagram) to make read API calls such as GET or SCAN and write API calls such as PUT or UPDATE to a DynamoDB table (2). The Lambda function also writes log files to CloudWatch Logs (3). The Lambda function uses an IAM role (4) that has an IAM policy attached (5) that grants access to DynamoDB and CloudWatch.
Overview of the AWS services used in this post
I use the following AWS services in this post’s solution:
IAM – For securely controlling access to AWS services. With IAM, you can centrally manage users, security credentials such as access keys, and permissions that control which AWS resources users and applications can access.
DynamoDB – A fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale.
Lambda – Run code without provisioning or managing servers. You pay only for the compute time you consume—there is no charge when your code is not running.
CloudWatch Logs – For monitoring, storing, and accessing log files generated by AWS resources, including Lambda.
IAM access policies
I have authored an IAM access policy with JSON to grant the required permissions to the DynamoDB table and CloudWatch Logs. I will attach this policy to a role, and this role will then be attached to a Lambda function, which will assume the required access to DynamoDB and CloudWatch Logs.
I will walk through this policy, and explain its elements and how to create the policy in the IAM console.
The following policy grants a Lambda function read and write access to a DynamoDB table and writes log files to CloudWatch Logs. This policy is called MyLambdaPolicy. The following is the full JSON document of this policy (the AWS account ID is a placeholder value that you would replace with your own account ID).
The first element in this policy is the Version, which defines the version of the IAM policy language. At the time of this post’s publication, the most recent version of the policy language is 2012-10-17.
The next element in this first policy is a Statement. This is the main section of the policy and includes multiple elements. This first statement is to Allow access to DynamoDB, and in this example, the elements I use are:
An Effect element – Specifies whether the statement results in an Allow or an explicit Deny. By default, access to resources is implicitly denied. In this example, I have used Allow because I want to allow the actions.
An Action element – Describes the specific actions for this statement. Each AWS service has its own set of actions that describe tasks that you can perform with that service. I have used the DynamoDB actions that I want to allow. For the definitions of all available actions for DynamoDB, see the DynamoDB API Reference.
A Resource element – Specifies the object or objects for this statement using Amazon Resource Names (ARNs). You use an ARN to uniquely identify an AWS resource. All Resource elements start with arn:aws and then define the object or objects for the statement. I use this to specify the DynamoDB table to which I want to allow access. To build the Resource element for DynamoDB, I have to specify:
The AWS service (dynamodb)
The AWS Region (eu-west-1)
The AWS account ID (123456789012)
The table (table/SampleTable)
The complete Resource element of the first statement is: arn:aws:dynamodb:eu-west-1:123456789012:table/SampleTable
In this policy, I created a second statement to allow access to CloudWatch Logs so that the Lambda function can write log files for troubleshooting and analysis. I have used the same elements as for the DynamoDB statement, but have changed the following values:
For the Action element, I used the CloudWatch Logs actions that I want to allow. Definitions of all the available actions for CloudWatch Logs are provided in the CloudWatch Logs API Reference.
For the Resource element, I specified the AWS account to which I want to allow my Lambda function to write its log files. As in the preceding example for DynamoDB, you have to use the ARN for CloudWatch Logs to specify where access should be granted. To build the Resource element for CloudWatch Logs, I have to specify:
The AWS service (logs)
The AWS region (eu-west-1)
The AWS account ID (123456789012)
All log groups in this account (*)
The complete Resource element of the second statement is: arn:aws:logs:eu-west-1:123456789012:*
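Putting the two statements together, a policy of this shape can be sketched in Python as follows. The two Resource values are the ARNs built above; the specific action lists are assumptions that correspond to the read and write calls and log writing described in this post, not necessarily the exact actions in MyLambdaPolicy. The function name is hypothetical.

```python
import json


def build_policy(account_id, region, table_name):
    """Build a two-statement policy: DynamoDB table access and CloudWatch Logs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed actions covering reads and writes on a single table.
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:Scan",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                ],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{table_name}",
            },
            {
                "Effect": "Allow",
                # Assumed actions that let a function create and write its logs.
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": f"arn:aws:logs:{region}:{account_id}:*",
            },
        ],
    }


# 123456789012 is the placeholder account ID used throughout this post.
policy = build_policy("123456789012", "eu-west-1", "SampleTable")
print(json.dumps(policy, indent=2))
```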
Create the IAM policy in your account
Before you can apply MyLambdaPolicy to a Lambda function, you have to create the policy in your own account and then apply it to an IAM role.
To create an IAM policy:
Navigate to the IAM console and choose Policies in the navigation pane. Choose Create policy.
Because I have already written the policy in JSON, you don’t need to use the Visual Editor, so you can choose the JSON tab and paste the content of the JSON policy document shown earlier in this post (remember to replace the placeholder account IDs with your own account ID). Choose Review policy.
Name the policy MyLambdaPolicy and give it a description that will help you remember the policy’s purpose. You also can view a summary of the policy’s permissions. Choose Create policy.
You have created the IAM policy that you will apply to the Lambda function.
Attach the IAM policy to an IAM role
To apply MyLambdaPolicy to a Lambda function, you first have to attach the policy to an IAM role.
To create a new role and attach MyLambdaPolicy to the role:
Navigate to the IAM console and choose Roles in the navigation pane. Choose Create role.
Choose AWS service and then choose Lambda. Choose Next: Permissions.
On the Attach permissions policies page, type MyLambdaPolicy in the Search box. Choose MyLambdaPolicy from the list of returned search results, and then choose Next: Review.
On the Review page, type MyLambdaRole in the Role name box and an appropriate description, and then choose Create role.
You have attached the policy you created earlier to a new IAM role, which in turn can be used by a Lambda function.
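Choosing Lambda as the trusted service in these steps gives the role a trust policy that lets the Lambda service assume it. As a sketch of what that trust relationship looks like, here is the document as a Python dictionary; the variable name is hypothetical, and you can compare it with the Trust relationships tab of the role you created.

```python
import json

# Trust policy: who may assume the role (separate from what the role may do).
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The permissions policy (MyLambdaPolicy) controls what the function can do; the trust policy controls which service can use the role at all.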
Apply the IAM role to a Lambda function
You have created an IAM role that has an attached IAM policy that grants both read and write access to DynamoDB and write access to CloudWatch Logs. The next step is to apply the IAM role to a Lambda function.
To apply the IAM role to a Lambda function:
Navigate to the Lambda console and choose Create function.
On the Create function page under Author from scratch, name the function MyLambdaFunction, and choose the runtime you want to use based on your application requirements. Lambda currently supports Node.js, Java, Python, Go, and .NET Core. From the Role dropdown, choose Choose an existing role, and from the Existing role dropdown, choose MyLambdaRole. Then choose Create function.
MyLambdaFunction now has access to CloudWatch Logs and DynamoDB. You can choose either of these services to see the details of which permissions the function has.
If you have any comments about this blog post, submit them in the “Comments” section below. If you have any questions about the services used, start a new thread in the applicable AWS forum: IAM, Lambda, DynamoDB, or CloudWatch.
With Raspberry Pi projects using home assistant services such as Amazon Alexa and Google Home becoming more and more popular, we invited Raspberry Pi maker Matt ‘Raspberry Pi Spy‘ Hawkins to write a guest post about his latest project, the Pi Spy Alexa Skill.
Pi Spy Skill
The Alexa system uses Skills to provide voice-activated functionality, and it allows you to create new Skills to add extra features. With the Pi Spy Skill, you can ask Alexa what function each pin on the Raspberry Pi’s GPIO header provides, for example by using the phrase “Alexa, ask Pi Spy what is Pin 2.” In response to a phrase such as “Alexa, ask Pi Spy where is GPIO 8”, Alexa can now also tell you on which pin you can find a specific GPIO reference number.
This information is already available in various forms, but I thought it would be useful to retrieve it when I was busy soldering or building circuits and had no hands free.
Creating an Alexa Skill
There is a learning curve to creating a new Skill, and in some regards it was similar to mobile app development.
A Skill consists of two parts: the first is created within the Amazon Developer Console and defines the structure of the voice commands Alexa should recognise. The second part is a webservice that can receive data extracted from the voice commands and provide a response back to the device. You can create the webservice on a webserver, internet-connected device, or cloud service.
I decided to use Amazon’s AWS Lambda service. Once set up, this allows you to write code without having to worry about the server it is running on. It also supports Python, so it fit in nicely with most of my other projects.
To get started, I logged into the Amazon Developer Console with my personal Amazon account and navigated to the Alexa section. I created a new Skill named Pi Spy. Within a Skill, you define an Intent Schema and some Sample Utterances. The schema defines individual intents, and the utterances define how these are invoked by the user.
Here is how my ExaminePin intent is defined in the schema:
Example utterances then attempt to capture the different phrases the user might speak to their device.
Whenever Alexa matches a spoken phrase to an utterance, it passes the name of the intent and the variable PinID to the webservice.
In the test section, you can check what JSON data will be generated and passed to your webservice in response to specific phrases. This allows you to verify that the webservice’s responses are correct.
Over on the AWS Services site, I created a Lambda function based on one of the provided examples to receive the incoming requests. Here is the section of that code which deals with the ExaminePin intent:
For this intent, I used a Python dictionary to match the incoming pin number to its description. Another Python function deals with the GPIO queries. A URL to this Lambda function was added to the Skill as its ‘endpoint’.
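As a rough sketch of this approach (not the code from the Skill itself), a handler for the ExaminePin intent could look like the following. The function names, the response wording, and the small subset of pins are assumptions for illustration; the request and response shapes follow the Alexa Skills JSON format, and PinID is the slot name mentioned above.

```python
# Sketch only: a handful of pins from the 40-pin header, with hypothetical wording.
PIN_DESCRIPTIONS = {
    "1": "3.3 volt power",
    "2": "5 volt power",
    "3": "GPIO 2, also I2C data",
    "6": "ground",
    "24": "GPIO 8, also SPI chip enable 0",
}


def build_response(text):
    """Wrap plain text in the response structure Alexa expects."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def handle_examine_pin(intent):
    """Look up the PinID slot value in the dictionary and describe the pin."""
    pin_id = intent.get("slots", {}).get("PinID", {}).get("value")
    description = PIN_DESCRIPTIONS.get(str(pin_id))
    if description is None:
        return build_response("Sorry, I don't know that pin.")
    return build_response("Pin %s is %s." % (pin_id, description))


def lambda_handler(event, context):
    """Entry point: route ExaminePin requests, give a hint for anything else."""
    request = event["request"]
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "ExaminePin":
        return handle_examine_pin(request["intent"])
    return build_response("Ask me about a pin number.")


# Hypothetical request, similar to what the test section generates.
sample_event = {
    "request": {
        "type": "IntentRequest",
        "intent": {"name": "ExaminePin", "slots": {"PinID": {"name": "PinID", "value": "2"}}},
    }
}
print(lambda_handler(sample_event, None)["response"]["outputSpeech"]["text"])
```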
As with the Skill, the Python code can be tested to iron out any syntax errors or logic problems.
With suitable configuration, it would be possible to create the webservice on a Pi, and that is something I’m currently working on. This approach is particularly interesting, as the Pi can then be used to control local hardware devices such as cameras, lights, or pet feeders.
Note
My Alexa Skill is currently only available to UK users. I’m hoping Amazon will choose to copy it to the US service, but I think that is down to its perceived popularity, or it may be done in bulk based on release date. In the next update, I’ll be adding an American English version to help speed up this process.