This post is courtesy of Alan Protasio, Software Development Engineer, Amazon Web Services
Just like compute and storage, messaging is a fundamental building block of enterprise applications. Message brokers (aka “message-oriented middleware”) enable different software systems, often written in different languages, on different platforms, running in different locations, to communicate and exchange information. Mission-critical applications, such as CRM and ERP, rely on message brokers to work.
A common performance consideration for customers deploying a message broker in a production environment is the throughput of the system, measured as messages per second. This is important to know so that application environments (hosts, threads, memory, etc.) can be configured correctly.
In this post, we demonstrate how to measure the throughput for Amazon MQ, a new managed message broker service for ActiveMQ, using JMS Benchmark. It should take between 15–20 minutes to set up the environment and an hour to run the benchmark. We also provide some tips on how to configure Amazon MQ for optimal throughput.
Benchmarking throughput for Amazon MQ
ActiveMQ can be used for a number of use cases, ranging from simple fire-and-forget tasks (that is, asynchronous processing) and low-latency request-reply patterns to buffering requests before they are persisted to a database.
The throughput of Amazon MQ is largely dependent on the use case. For example, if you have non-critical workloads such as gathering click events for a non-business-critical portal, you can use ActiveMQ in a non-persistent mode and get extremely high throughput with Amazon MQ.
On the flip side, if you have a critical workload where durability is extremely important (meaning that you can’t lose a message), then you are bound by the I/O capacity of your underlying persistence store. We recommend using mq.m4.large for the best results. The mq.t2.micro instance type is intended for product evaluation. Performance is limited, due to the lower memory and burstable CPU performance.
Tip: To improve your throughput with Amazon MQ, make sure that you have consumers processing messages as fast as (or faster than) your producers are pushing messages.
Because it’s impossible to talk about how the broker (ActiveMQ) behaves for each and every use case, we walk through how to set up your own benchmark for Amazon MQ using our favorite open-source benchmarking tool: JMS Benchmark. We are fans of the JMS Benchmark suite because it’s easy to set up and deploy, and comes with a built-in visualizer of the results.
[Chart: Non-Persistent Scenarios – Queue latency as you scale producer throughput]
Getting started
Step 1 – Create an Amazon MQ broker
At the time of publication, you can create an mq.m4.large single-instance broker for testing for $0.30 per hour (US pricing).
Step 2 – Create an EC2 instance to run your benchmark
Launch the EC2 instance using Step 1: Launch an Instance. We recommend choosing the m5.large instance type.
Step 3 – Configure the security groups
Make sure that all the security groups are correctly configured to let the traffic flow between the EC2 instance and your broker.
From the broker list, choose the name of your broker (for example, MyBroker).
In the Details section, under Security and network, choose the name of your security group or choose the expand icon.
From the security group list, choose your security group.
At the bottom of the page, choose Inbound, Edit.
In the Edit inbound rules dialog box, add a rule to allow traffic between your instance and the broker:
• Choose Add Rule.
• For Type, choose Custom TCP.
• For Port Range, type the ActiveMQ SSL port (61617).
• For Source, leave Custom selected and then type the security group of your EC2 instance.
• Choose Save.
Your broker can now accept the connection from your EC2 instance.
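If you prefer to script this step rather than use the console, you can add the same inbound rule with the AWS CLI. This is a minimal sketch; both security group IDs are placeholders for the broker's security group and the EC2 instance's security group.

# Allow the benchmark instance's security group to reach the broker on the ActiveMQ SSL port (61617).
# Replace both group IDs with your own values.
aws ec2 authorize-security-group-ingress --group-id <broker-security-group-id> --protocol tcp --port 61617 --source-group <ec2-instance-security-group-id>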
Step 4 – Run the benchmark
Connect to your EC2 instance using SSH and run the following commands:
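The original commands aren't reproduced here, so the outline below is a hypothetical sketch of the typical flow: install Git and a JDK, clone the JMS Benchmark suite, and point its ActiveMQ scenario at your broker's SSL endpoint. The script name and options are assumptions, so check the project's README for the exact invocation.

# Hypothetical outline only -- the script name and arguments below are placeholders, not the original commands.
sudo yum install -y git java-1.8.0-openjdk-devel
git clone https://github.com/chirino/jms-benchmark.git
cd jms-benchmark
# Run the ActiveMQ scenario against the broker's SSL endpoint with your broker URL and credentials.
./bin/<activemq-benchmark-script> "ssl://<broker-endpoint>:61617" <broker-user> <broker-password>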
After the benchmark finishes, you can find the results in the ~/reports directory. As you may notice, the performance of ActiveMQ varies based on the number of consumers, producers, destinations, and message size.
Amazon MQ architecture
To better understand the benchmark results, it also helps to know how Amazon MQ is architected.
Amazon MQ is architected to be highly available (HA) and durable. For HA, we recommend using the multi-AZ option. After a message is sent to Amazon MQ in persistent mode, the message is written to the highly durable message store that replicates the data across multiple nodes in multiple Availability Zones. Because of this replication, for some use cases you may see a reduction in throughput as you migrate to Amazon MQ. Customers have told us they appreciate the benefits of message replication as it helps protect durability even in the face of the loss of an Availability Zone.
Conclusion
We hope this gives you an idea of how Amazon MQ performs. We encourage you to run tests to simulate your own use cases.
To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.
Thanks to Greg Eppel, Sr. Solutions Architect, Microsoft Platform, for this great blog post that describes how to create a custom CodeBuild build environment for the .NET Framework.

AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. CodeBuild provides curated build environments for programming languages and runtimes such as Android, Go, Java, Node.js, PHP, Python, Ruby, and Docker. CodeBuild now supports builds for the Microsoft Windows Server platform, including a prepackaged build environment for .NET Core on Windows. If your application uses the .NET Framework, you will need to use a custom Docker image to create a custom build environment that includes the Microsoft proprietary Framework Class Libraries. For information about why this step is required, see our FAQs. In this post, I'll show you how to create a custom build environment for .NET Framework applications and walk you through the steps to configure CodeBuild to use this environment.
Build environments are Docker images that include a complete file system with everything required to build and test your project. To use a custom build environment in a CodeBuild project, you build a container image for your platform that contains your build tools, push it to a Docker container registry such as Amazon Elastic Container Registry (Amazon ECR), and reference it in the project configuration. When it builds your application, CodeBuild retrieves the Docker image from the container registry specified in the project configuration and uses the environment to compile your source code, run your tests, and package your application.
Step 1: Launch EC2 Windows Server 2016 with Containers
In the Amazon EC2 console, in your region, launch an Amazon EC2 instance from a Microsoft Windows Server 2016 Base with Containers AMI.
Increase disk space on the boot volume to at least 50 GB to account for the larger size of containers required to install and run Visual Studio Build Tools.
Run the following command in that directory. This process can take a while, depending on the size of the EC2 instance you launched. In my tests, a t2.2xlarge takes less than 30 minutes to build the image and produces an image of approximately 15 GB.
docker build -t buildtools2017:latest -m 2GB .
Run the following command to test the container and start a command shell with all the developer environment variables:
docker run -it buildtools2017
Create a repository in the Amazon ECS console. For the repository name, type buildtools2017. Choose Next step and then complete the remaining steps.
Execute the following command to authenticate your local Docker engine with your Amazon ECR registry. Make sure that you have permissions to the Amazon ECR registry before you execute the command.
aws ecr get-login
In the same command prompt window, copy and paste the following commands:
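The exact commands from the original post aren't shown here; a representative sequence is to run the docker login command that aws ecr get-login printed, and then tag and push the image. The account ID and region below are placeholders.

REM First run the docker login command that `aws ecr get-login` returned, then:
docker tag buildtools2017:latest <aws_account_id>.dkr.ecr.<region>.amazonaws.com/buildtools2017:latest
docker push <aws_account_id>.dkr.ecr.<region>.amazonaws.com/buildtools2017:latest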
In the CodeCommit console, create a repository named DotNetFrameworkSampleApp. On the Configure email notifications page, choose Skip.
Clone a .NET Framework Docker sample application from GitHub. The repository includes a sample ASP.NET Framework application that we'll use to demonstrate our custom build environment. On the EC2 instance, open a command prompt and execute the following commands:
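The commands themselves aren't reproduced here; the sequence below is a representative sketch for a Windows command prompt. The CodeCommit clone URL depends on your region, and the step that copies the sample application files is described in a comment rather than spelled out, because the sample repository URL is not specified here.

REM Clone the empty CodeCommit repository (the region in the URL is a placeholder).
git clone https://git-codecommit.<region>.amazonaws.com/v1/repos/DotNetFrameworkSampleApp
cd DotNetFrameworkSampleApp
REM Copy the sample ASP.NET Framework application files from the cloned GitHub sample into this directory, then:
git add -A
git commit -m "Add ASP.NET Framework sample application"
git push origin master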
Navigate to the CodeCommit repository and confirm that the files you just pushed are there.
Step 4: Configure build spec
To build your .NET Framework application with CodeBuild, you use a build spec, which is a collection of build commands and related settings, in YAML format, that AWS CodeBuild uses to run a build. You can include a build spec as part of the source code, or you can define a build spec when you create a build project. In this example, I include a build spec as part of the source code.
In the root directory of your source directory, create a YAML file named buildspec.yml.
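The buildspec contents depend on your solution layout; the following is a minimal hypothetical sketch that assumes the sample solution is restored with NuGet and built with MSBuild from the Visual Studio Build Tools image. The solution name and artifact paths are placeholders to adjust for your project.

version: 0.2
phases:
  build:
    commands:
      - nuget restore
      - msbuild <YourSolution>.sln /p:Configuration=Release
artifacts:
  files:
    - '**/bin/Release/**/*'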
At this point, we have a Docker image with Visual Studio Build Tools installed and stored in the Amazon ECR registry. We also have a sample ASP.NET Framework application in a CodeCommit repository. Now we are going to set up CodeBuild to build the ASP.NET Framework application.
In the Amazon ECR console, choose the repository to which you pushed the image earlier with the docker push command. On the Permissions tab, choose Add.
For Source Provider, choose AWS CodeCommit, and then choose the DotNetFrameworkSampleApp repository that you created earlier.
For Environment Image, choose Specify a Docker image.
For Environment type, choose Windows.
For Custom image type, choose Amazon ECR.
For Amazon ECR repository, choose the Docker image with the Visual Studio Build Tools installed, buildtools2017. Your configuration should look like the image below:
Choose Continue and then Save and Build to create your CodeBuild project and start your first build. You can monitor the status of the build in the console. You can also configure notifications that will notify subscribers whenever builds succeed, fail, go from one phase to another, or any combination of these events.
Summary
CodeBuild supports a number of platforms and languages out of the box. By using custom build environments, it can be extended to other runtimes. In this post, I showed you how to build a .NET Framework environment on a Windows container and demonstrated how to use it to build .NET Framework applications in CodeBuild.
We’re excited to see how customers extend and use CodeBuild to enable continuous integration and continuous delivery for their Windows applications. Feel free to share what you’ve learned extending CodeBuild for your own projects. Just leave questions or suggestions in the comments.
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrog Artifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository and a JFrog Artifactory host, and would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormation template to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
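Assuming the filled-out template is a pipeline-structure JSON document (the file name here is a placeholder), the call looks like this:

aws codepipeline create-pipeline --cli-input-json file://pipeline.json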
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project, but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
The integration has two parts:
· A custom action definition, which describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, of version ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
· A job worker, which polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
{
"category": "Deploy",
"configurationProperties": [{
"name": "TypeOfArtifact",
"required": true,
"key": true,
"secret": false,
"description": "Package type, ex. npm for node packages",
"type": "String"
},
{ "name": "RepoKey",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Name of the repository in which this artifact should be stored"
},
{ "name": "UserName",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Username for authenticating with the repository"
},
{ "name": "Password",
"required": true,
"key": true,
"secret": true,
"type": "String",
"description": "Password for authenticating with the repository"
},
{ "name": "EmailAddress",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Email address used to authenticate with the repository"
},
{ "name": "ArtifactoryHost",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
}],
"provider": "Artifactory",
"version": "1",
"settings": {
"entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
},
"inputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 1
},
"outputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 0
}
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
Artifactory host name or IP address.
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
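For example, if the JSON definition above is saved as ArtifactoryCustomAction.json (the file name is a placeholder), the command is:

aws codepipeline create-custom-action-type --cli-input-json file://ArtifactoryCustomAction.json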
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
Use the user-defined Artifactory user name and password to obtain a temporary API key from Artifactory.
To avoid having to write the password to a file, use that temporary API key and user name to authenticate with the NPM repository.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm --enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
def action_type():
    ActionType = {
        'category': 'Deploy',
        'owner': 'Custom',
        'provider': 'Artifactory',
        'version': '1' }
    return(ActionType)

def poll_for_jobs():
    try:
        artifactory_action_type = action_type()
        print(artifactory_action_type)
        jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        while not jobs['jobs']:
            time.sleep(10)
            jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        if jobs['jobs']:
            print('Job found')
        return jobs['jobs'][0]
    except ClientError as e:
        print("Received an error: %s" % str(e))
        raise
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
    try:
        print('Acknowledging job')
        result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
        return result
    except Exception as e:
        print("Received an error when trying to acknowledge the job: %s" % str(e))
        raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
def get_bucket_location(bucketName, init_client):
    region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
    if not region:
        region = 'us-east-1'
    return region

def get_s3_artifact(bucketName, objectKey, ak, sk, st):
    init_s3 = boto3.client('s3')
    region = get_bucket_location(bucketName, init_s3)
    session = Session(aws_access_key_id=ak,
                      aws_secret_access_key=sk,
                      aws_session_token=st)
    s3 = session.resource('s3',
                          region_name=region,
                          config=botocore.client.Config(signature_version='s3v4'))
    try:
        tempdirname = tempfile.mkdtemp()
    except OSError as e:
        print('Could not write temp directory %s' % tempdirname)
        raise
    bucket = s3.Bucket(bucketName)
    obj = bucket.Object(objectKey)
    filename = tempdirname + '/' + objectKey
    try:
        if os.path.dirname(objectKey):
            directory = os.path.dirname(filename)
            os.makedirs(directory)
        print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
        with open(filename, 'wb') as data:
            obj.download_fileobj(data)
    except ClientError as e:
        print('Downloading the object and writing the file to disk raised this error: ' + str(e))
        raise
    return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
def unzip_codepipeline_artifact(artifact, origtmpdir):
    # create a new temp directory
    # Unzip artifact into new directory
    try:
        newtempdir = tempfile.mkdtemp()
        print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
        zip_ref = zipfile.ZipFile(artifact, 'r')
        zip_ref.extractall(newtempdir)
        zip_ref.close()
        shutil.rmtree(origtmpdir)
        return(os.listdir(newtempdir), newtempdir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            shutil.rmtree(newtempdir)
            raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user's home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish --registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
def push_to_npm(configuration, artifact_list, temp_dir, jobId):
    reponame = configuration['RepoKey']
    art_type = configuration['TypeOfArtifact']
    print("Putting artifact into NPM repository " + reponame)
    token, hostname, username = gen_artifactory_auth_token(configuration)
    npmconfigfile = create_npmconfig_file(configuration, username, token)
    url = hostname + '/artifactory/api/' + art_type + '/' + reponame
    print("Changing directory to " + str(temp_dir))
    os.chdir(temp_dir)
    try:
        print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
        print("Sending artifact to Artifactory NPM registry URL: " + url)
        subprocess.call(["npm", "config", "set", "registry", url])
        req = subprocess.call(["npm", "publish", "--registry", url])
        print("Return code from npm publish: " + str(req))
        if req != 0:
            err_msg = "npm ERR! Received non-OK response while sending artifact to Artifactory. Return code from npm publish: " + str(req)
            signal_failure(jobId, err_msg)
        else:
            signal_success(jobId)
    except requests.exceptions.RequestException as e:
        print("Received an error when trying to commit artifact %s to repository %s: %s" % (str(art_type), str(configuration['RepoKey']), str(e)))
        raise
    return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
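The signal_success and signal_failure helpers referenced above are thin wrappers around the CodePipeline job-result APIs. A minimal sketch of what they might look like follows; see npm_job_worker.py in the GitHub repository for the actual implementation.

def signal_success(jobId):
    # Tell CodePipeline that the custom action completed successfully.
    codepipeline.put_job_success_result(jobId=jobId)

def signal_failure(jobId, message):
    # Tell CodePipeline that the custom action failed, with a short reason shown in the console.
    codepipeline.put_job_failure_result(
        jobId=jobId,
        failureDetails={'type': 'JobFailed', 'message': message})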
For the custom worker code, see npm_job_worker.py in the GitHub repository.
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
From my custom worker output logs, I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https://artifactoryhost.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ node_example@1.0.3
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I've shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow, how to create a custom action in CodePipeline, and how to create a custom worker that works in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.
Today, we’re excited to announce local build support in AWS CodeBuild.
AWS CodeBuild is a fully managed build service. There are no servers to provision and scale, or software to install, configure, and operate. You just specify the location of your source code, choose your build settings, and CodeBuild runs build scripts for compiling, testing, and packaging your code.
In this blog post, I’ll show you how to set up CodeBuild locally to build and test a sample Java application.
By building an application on a local machine you can:
Test the integrity and contents of a buildspec file locally.
Test and build an application locally before committing.
Identify and fix errors quickly from your local development environment.
Prerequisites
In this post, I am using AWS Cloud9 IDE as my development environment.
If you would like to use AWS Cloud9 as your IDE, follow the express setup steps in the AWS Cloud9 User Guide.
The AWS Cloud9 IDE comes with Docker and Git already installed. If you are going to use your laptop or desktop machine as your development environment, install Docker and Git before you start.
Note: We need to provide three environment variables, namely IMAGE_NAME, SOURCE, and ARTIFACTS, as shown in the example after this list.
IMAGE_NAME: The name of your build environment image.
SOURCE: The absolute path to your source code directory.
ARTIFACTS: The absolute path to your artifact output folder.
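For example, assuming you use the local build agent image published by AWS (amazon/aws-codebuild-local) together with a Java build image for the sample project, the invocation looks roughly like this. The image names and host paths are placeholders to adjust for your environment.

docker run -it -v /var/run/docker.sock:/var/run/docker.sock \
    -e "IMAGE_NAME=aws/codebuild/java:openjdk-8" \
    -e "SOURCE=/home/ec2-user/environment/sample-web-app" \
    -e "ARTIFACTS=/home/ec2-user/environment/artifacts" \
    amazon/aws-codebuild-local:latest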
When you run the sample project, you get a runtime error that says the YAML file does not exist. This is because a buildspec.yml file is not included in the sample web project. AWS CodeBuild requires a buildspec.yml to run a build. For more information about buildspec.yml, see Build Spec Example in the AWS CodeBuild User Guide.
Let’s add a buildspec.yml file with the following content to the sample-web-app folder and then rebuild the project.
version: 0.2
phases:
  build:
    commands:
      - echo Build started on `date`
      - mvn install
artifacts:
  files:
    - target/javawebdemo.war
This time your build should be successful. Upon successful execution, look in the /artifacts folder for the final built artifacts.zip file to validate.
Conclusion:
In this blog post, I showed you how to quickly set up the CodeBuild local agent to build projects right from your local desktop machine or laptop. As you see, local builds can improve developer productivity by helping you identify and fix errors quickly.
I hope you found this post useful. Feel free to leave your feedback or suggestions in the comments.
Many companies across the globe use Amazon DynamoDB to store and query historical user-interaction data. DynamoDB is a fast NoSQL database used by applications that need consistent, single-digit millisecond latency.
Often, customers want to turn their valuable data in DynamoDB into insights by analyzing a copy of their table stored in Amazon S3. Doing this separates their analytical queries from their low-latency critical paths. This data can be the primary source for understanding customers’ past behavior, predicting future behavior, and generating downstream business value. Customers often turn to DynamoDB because of its great scalability and high availability. After a successful launch, many customers want to use the data in DynamoDB to predict future behaviors or provide personalized recommendations.
DynamoDB is a good fit for low-latency reads and writes, but it's not practical to scan all data in a DynamoDB database to train a model. In this post, I demonstrate how you can use DynamoDB table data copied to Amazon S3 by AWS Data Pipeline to predict customer behavior. I also demonstrate how you can use this data to provide personalized recommendations for customers using Amazon SageMaker. You can also run ad hoc queries using Amazon Athena against the data. DynamoDB recently released on-demand backups to create full table backups with no performance impact. However, it's not suitable for our purposes in this post, so I chose AWS Data Pipeline instead to create managed backups that are accessible from other services.
To do this, I describe how to read the DynamoDB backup file format in Data Pipeline. I also describe how to convert the objects in S3 to a CSV format that Amazon SageMaker can read. In addition, I show how to schedule regular exports and transformations using Data Pipeline. The sample data used in this post is from the Bank Marketing Data Set of the UCI Machine Learning Repository.
The solution that I describe provides the following benefits:
Separates analytical queries from production traffic on your DynamoDB table, preserving your DynamoDB read capacity units (RCUs) for important production requests
Automatically updates your model to get real-time predictions
Optimizes for performance (so it doesn’t compete with DynamoDB RCUs after the export) and for cost (using data you already have)
Makes it easier for developers of all skill levels to use Amazon SageMaker
All code and data set in this post are available in this .zip file.
Solution architecture
The following diagram shows the overall architecture of the solution.
The steps that data follows through the architecture are as follows:
Data Pipeline regularly copies the full contents of a DynamoDB table as JSON into an S3 bucket.
Exported JSON files are converted to comma-separated value (CSV) format to use as a data source for Amazon SageMaker.
Amazon SageMaker renews the model artifact and updates the endpoint.
The converted CSV is available for ad hoc queries with Amazon Athena.
Data Pipeline controls this flow and repeats the cycle based on the schedule defined by customer requirements.
Building the auto-updating model
This section discusses details about how to read the DynamoDB exported data in Data Pipeline and build automated workflows for real-time prediction with a regularly updated model.
Find the automation_script.sh file and edit it for your environment. For example, you need to replace 's3://<your bucket>/<datasource path>/' with your own S3 path to the data source for Amazon SageMaker. In the script, the text enclosed by angle brackets—< and >—should be replaced with your own path.
Upload the json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar file to your S3 path so that the ADD jar command in Apache Hive can refer to it.
For this solution, the banking.csv should be imported into a DynamoDB table.
Export a DynamoDB table
To export the DynamoDB table to S3, open the Data Pipeline console and choose the Export DynamoDB table to S3 template. In this template, Data Pipeline creates an Amazon EMR cluster and performs an export in the EMRActivity activity. Set proper intervals for backups according to your business requirements.
One core node (m3.xlarge) provides the default capacity for the EMR cluster and should be suitable for the solution in this post. Leave the option to resize the cluster before running enabled in the TableBackupActivity activity so that Data Pipeline can scale the cluster to match the table size. The process of converting to CSV format and renewing models happens in this EMR cluster.
For a more in-depth look at how to export data from DynamoDB, see Export Data from DynamoDB in the Data Pipeline documentation.
Add the script to an existing pipeline
After you export your DynamoDB table, you add an additional EMR step to EMRActivity by following these steps:
Open the Data Pipeline console and choose the ID for the pipeline that you want to add the script to.
For Actions, choose Edit.
In the editing console, choose the Activities category and add an EMR step using the custom script downloaded in the previous section, as shown below.
Paste the following command into the new step after the data upload step:
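The original step string isn't reproduced here. A representative form, assuming the EMR script-runner JAR is used to invoke the automation script with the export path as its only argument, is the following single comma-separated line (the bucket and the script-runner path are placeholders):

s3://<your region>.elasticmapreduce/libs/script-runner/script-runner.jar,s3://<your bucket>/automation_script.sh,#{output.directoryPath}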
The element #{output.directoryPath} references the S3 path where the data pipeline exports DynamoDB data as JSON. The path should be passed to the script as an argument.
The bash script has two goals: converting data formats and renewing the Amazon SageMaker model. Subsequent sections discuss the contents of the automation script.
Automation script: Convert JSON data to CSV with Hive
We use Apache Hive to transform the data into a new format. The Hive QL script to create an external table and transform the data is included in the custom script that you added to the Data Pipeline definition.
When you run the Hive scripts, do so with the -e option. Also, define the Hive table with the 'org.openx.data.jsonserde.JsonSerDe' row format to parse and read JSON format. The SQL creates a Hive EXTERNAL table, and it reads the DynamoDB backup data on the S3 path passed to it by Data Pipeline.
Note: You should create the table with the “EXTERNAL” keyword to avoid the backup data being accidentally deleted from S3 if you drop the table.
The Hive portion of the automation script handles this conversion; in the full script, add your own bucket name and data source path in the highlighted areas.
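The full script isn't reproduced here; the following is a condensed, hypothetical sketch of its Hive portion. It assumes the Data Pipeline export wraps each attribute in a DynamoDB type field (for example, "s" for strings and "n" for numbers), and it shows only a few representative columns; the real script lists every attribute in the table.

ADD JAR s3://<your bucket>/json-serde-1.3.6-SNAPSHOT-jar-with-dependencies.jar;

-- EXTERNAL keeps the backup data in S3 even if the table is dropped.
-- Only a few representative columns are shown here.
CREATE EXTERNAL TABLE IF NOT EXISTS ddb_backup (
  age struct<n:string>,
  job struct<s:string>,
  y struct<n:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '<S3 path passed in by Data Pipeline as the script argument>';

-- Write the selected columns as CSV to the Amazon SageMaker data source path.
INSERT OVERWRITE DIRECTORY 's3://<your bucket>/<datasource path>/'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT age.n, job.s, y.n FROM ddb_backup;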
After creating the external table, you read its data and then use the INSERT OVERWRITE DIRECTORY ... SELECT command to write CSV data to the S3 path that you designated as the data source for Amazon SageMaker.
Depending on your requirements, you can eliminate or process columns in the SELECT clause in this step to optimize data analysis. For example, you might remove some columns that have unpredictable correlations with the target value, because keeping the wrong columns might expose your model to "overfitting" during training. In this post, the customer_id column is removed. Overfitting can weaken your predictions. More information about overfitting can be found in the topic Model Fit: Underfitting vs. Overfitting in the Amazon ML documentation.
Automation script: Renew the Amazon SageMaker model
After the CSV data is replaced and ready to use, create a new model artifact for Amazon SageMaker with the updated dataset on S3. To renew the model artifact, you must create a new training job. Training jobs can be run using the AWS SDK (for example, boto3 for Amazon SageMaker), the Amazon SageMaker Python SDK (which can be installed with the "pip install sagemaker" command), or the AWS CLI for Amazon SageMaker, which is what this post uses.
In addition, consider how to smoothly renew your existing model without service impact, because your model is called by applications in real time. To do this, you create a new endpoint configuration first and then update the current endpoint with the endpoint configuration that you just created.
#!/bin/bash
## Define variable
REGION=$2
DTTIME=`date +%Y-%m-%d-%H-%M-%S`
ROLE="<your AmazonSageMaker-ExecutionRole>"
# Select containers image based on region.
case "$REGION" in
"us-west-2" )
IMAGE="174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:latest"
;;
"us-east-1" )
IMAGE="382416733822.dkr.ecr.us-east-1.amazonaws.com/linear-learner:latest"
;;
"us-east-2" )
IMAGE="404615174143.dkr.ecr.us-east-2.amazonaws.com/linear-learner:latest"
;;
"eu-west-1" )
IMAGE="438346466558.dkr.ecr.eu-west-1.amazonaws.com/linear-learner:latest"
;;
*)
echo "Invalid Region Name"
exit 1 ;
esac
# Start training job and creating model artifact
TRAINING_JOB_NAME=TRAIN-${DTTIME}
S3OUTPUT="s3://<your bucket name>/model/"
INSTANCETYPE="ml.m4.xlarge"
INSTANCECOUNT=1
VOLUMESIZE=5
aws sagemaker create-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION} --algorithm-specification TrainingImage=${IMAGE},TrainingInputMode=File --role-arn ${ROLE} --input-data-config '[{ "ChannelName": "train", "DataSource": { "S3DataSource": { "S3DataType": "S3Prefix", "S3Uri": "s3://<your bucket name>/<datasource path>/", "S3DataDistributionType": "FullyReplicated" } }, "ContentType": "text/csv", "CompressionType": "None" , "RecordWrapperType": "None" }]' --output-data-config S3OutputPath=${S3OUTPUT} --resource-config InstanceType=${INSTANCETYPE},InstanceCount=${INSTANCECOUNT},VolumeSizeInGB=${VOLUMESIZE} --stopping-condition MaxRuntimeInSeconds=120 --hyper-parameters feature_dim=20,predictor_type=binary_classifier
# Wait until job completed
aws sagemaker wait training-job-completed-or-stopped --training-job-name ${TRAINING_JOB_NAME} --region ${REGION}
# Get newly created model artifact and create model
MODELARTIFACT=`aws sagemaker describe-training-job --training-job-name ${TRAINING_JOB_NAME} --region ${REGION} --query 'ModelArtifacts.S3ModelArtifacts' --output text `
MODELNAME=MODEL-${DTTIME}
aws sagemaker create-model --region ${REGION} --model-name ${MODELNAME} --primary-container Image=${IMAGE},ModelDataUrl=${MODELARTIFACT} --execution-role-arn ${ROLE}
# create a new endpoint configuration
CONFIGNAME=CONFIG-${DTTIME}
aws sagemaker create-endpoint-config --region ${REGION} --endpoint-config-name ${CONFIGNAME} --production-variants VariantName=Users,ModelName=${MODELNAME},InitialInstanceCount=1,InstanceType=ml.m4.xlarge
# create or update the endpoint
STATUS=`aws sagemaker describe-endpoint --endpoint-name ServiceEndpoint --query 'EndpointStatus' --output text --region ${REGION} `
if [[ "$STATUS" != "InService" ]] ;
then
aws sagemaker create-endpoint --endpoint-name ServiceEndpoint --endpoint-config-name ${CONFIGNAME} --region ${REGION}
else
aws sagemaker update-endpoint --endpoint-name ServiceEndpoint --endpoint-config-name ${CONFIGNAME} --region ${REGION}
fi
Grant permission
Before you execute the script, you must grant proper permission to Data Pipeline. Data Pipeline uses the DataPipelineDefaultResourceRole role by default. I added the following policy to DataPipelineDefaultResourceRole to allow Data Pipeline to create, delete, and update the Amazon SageMaker model and data source in the script.
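The policy document itself isn't shown above; a minimal sketch of the kind of statement involved follows. The action list is an assumption based on the API calls the script makes, so trim or extend it to match your script, and scope the Resource element more tightly for production use.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateTrainingJob",
        "sagemaker:DescribeTrainingJob",
        "sagemaker:CreateModel",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:DescribeEndpoint",
        "sagemaker:CreateEndpoint",
        "sagemaker:UpdateEndpoint",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}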
After you deploy a model into production using Amazon SageMaker hosting services, your client applications use this API to get inferences from the model hosted at the specified endpoint. This approach is useful for interactive web, mobile, or desktop applications.
Following, I provide a simple Python code example that queries against Amazon SageMaker endpoint URL with its name (“ServiceEndpoint”) and then uses them for real-time prediction.
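The code itself isn't included here, so the following is a sketch of such a call using boto3. The feature vector is a placeholder, and the exact shape of the response body depends on the algorithm; for the linear learner it is a JSON document containing prediction scores.

import json
import boto3

# Uses the default region and credentials from your AWS configuration.
runtime = boto3.client('sagemaker-runtime')

# One unlabeled record in the same CSV column order used for training (values are placeholders).
payload = '56,1,0,1,0,0,1,0,5,4,187,2,999,0,1,1.1,93.994,-36.4,4.857,5191.0'

response = runtime.invoke_endpoint(
    EndpointName='ServiceEndpoint',
    ContentType='text/csv',
    Body=payload)

result = json.loads(response['Body'].read().decode('utf-8'))
print(result)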
Data Pipeline exports DynamoDB table data into S3. The original JSON data should be kept to recover the table in the rare event that this is needed. Data Pipeline then converts JSON to CSV so that Amazon SageMaker can read the data. Note: You should select only meaningful attributes when you convert to CSV. For example, if you judge that the "campaign" attribute is not correlated, you can eliminate this attribute from the CSV.
Train the Amazon SageMaker model with the new data source.
When a new customer comes to your site, you can judge how likely it is for this customer to subscribe to your new product based on “predictedScores” provided by Amazon SageMaker.
If the new user subscribes to your new product, your application must update the attribute "y" to the value 1 (for yes). This updated data is provided for the next model renewal as a new data source, and it serves to improve the accuracy of your prediction. With each new entry, your application can become smarter and deliver better predictions.
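As a hedged sketch of that update with boto3 (the table name and key attribute here are assumptions about your schema, not values from this post):

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('<your DynamoDB table name>')

# Mark the customer as subscribed so the next export and training cycle learns from it.
table.update_item(
    Key={'<your partition key>': '<customer id>'},
    UpdateExpression='SET #y = :subscribed',
    ExpressionAttributeNames={'#y': 'y'},
    ExpressionAttributeValues={':subscribed': 1})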
Running ad hoc queries using Amazon Athena
Amazon Athena is a serverless query service that makes it easy to analyze large amounts of data stored in Amazon S3 using standard SQL. Athena is useful for examining data and collecting statistics or informative summaries about data. You can also use the powerful analytic functions of Presto, as described in the topic Aggregate Functions of Presto in the Presto documentation.
With the Data Pipeline scheduled activity, recent CSV data is always located in S3 so that you can run ad hoc queries against the data using Amazon Athena. I show this with example SQL statements following. For an in-depth description of this process, see the post Interactive SQL Queries for Data in Amazon S3 on the AWS News Blog.
Creating an Amazon Athena table and running it
Simply create an EXTERNAL table for the CSV data on S3 in the Amazon Athena console.
=== Table Creation ===
CREATE EXTERNAL TABLE datasource (
age int,
job string,
marital string ,
education string,
default string,
housing string,
loan string,
contact string,
month string,
day_of_week string,
duration int,
campaign int,
pdays int ,
previous int ,
poutcome string,
emp_var_rate double,
cons_price_idx double,
cons_conf_idx double,
euribor3m double,
nr_employed double,
y int
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n'
LOCATION 's3://<your bucket name>/<datasource path>/';
The following query calculates the correlation coefficient between the target attribute and other attributes using Amazon Athena.
=== Sample Query ===
SELECT corr(age,y) AS correlation_age_and_target,
corr(duration,y) AS correlation_duration_and_target,
corr(campaign,y) AS correlation_campaign_and_target,
corr(contact,y) AS correlation_contact_and_target
FROM ( SELECT age , duration , campaign , y ,
CASE WHEN contact = 'telephone' THEN 1 ELSE 0 END AS contact
FROM datasource
) datasource ;
Conclusion
In this post, I introduce an example of how to analyze data in DynamoDB by using table data in Amazon S3 to optimize DynamoDB table read capacity. You can then use the analyzed data as a new data source to train an Amazon SageMaker model for accurate real-time prediction. In addition, you can run ad hoc queries against the data on S3 using Amazon Athena. I also present how to automate these procedures by using Data Pipeline.
You can adapt this example to your specific use case at hand, and hopefully this post helps you accelerate your development. You can find more examples and use cases for Amazon SageMaker in the video AWS 2017: Introducing Amazon SageMaker on the AWS website.
Yong Seong Lee is a Cloud Support Engineer for AWS Big Data Services. He is interested in every technology related to data/databases and helping customers who have difficulties in using AWS services. His motto is “Enjoy life, be curious and have maximum experience.”
As of March 31, 2018 we had 100,110 spinning hard drives. Of that number, there were 1,922 boot drives and 98,188 data drives. This review looks at the quarterly and lifetime statistics for the data drive models in operation in our data centers. We'll also take a look at why we are collecting and reporting 10 new SMART attributes and take a sneak peek at some 8 TB Toshiba drives. Along the way, we'll share observations and insights on the data presented and we look forward to you doing the same in the comments.
Background
Since April 2013, Backblaze has recorded and saved daily hard drive statistics from the drives in our data centers. Each entry consists of the date, manufacturer, model, serial number, status (operational or failed), and all of the SMART attributes reported by that drive. Currently there are about 97 million entries totaling 26 GB of data. You can download this data from our website if you want to do your own research, but for starters here’s what we found.
Hard Drive Reliability Statistics for Q1 2018
At the end of Q1 2018 Backblaze was monitoring 98,188 hard drives used to store data. For our evaluation below we remove from consideration those drives which were used for testing purposes and those drive models for which we did not have at least 45 drives. This leaves us with 98,046 hard drives. The table below covers just Q1 2018.
Notes and Observations
If a drive model has a failure rate of 0%, it only means there were no drive failures of that model during Q1 2018.
The overall Annualized Failure Rate (AFR) for Q1 is just 1.2%, well below the Q4 2017 AFR of 1.65%. Remember that quarterly failure rates can be volatile, especially for models that have a small number of drives and/or a small number of Drive Days.
There were 142 drives (98,188 minus 98,046) that were not included in the list above because we did not have at least 45 of a given drive model. We use 45 drives of the same model as the minimum number when we report quarterly, yearly, and lifetime drive statistics.
Welcome Toshiba 8TB drives, almost…
We mentioned Toshiba 8 TB drives in the first paragraph, but they don’t show up in the Q1 Stats chart. What gives? We only had 20 of the Toshiba 8 TB drives in operation in Q1, so they were excluded from the chart. Why do we have only 20 drives? When we test out a new drive model we start with the “tome test” and it takes 20 drives to fill one tome. A tome is the same drive model in the same logical position in each of the 20 Storage Pods that make up a Backblaze Vault. There are 60 tomes in each vault.
In this test, we created a Backblaze Vault of 8 TB drives, with 59 of the tomes being Seagate 8 TB drives and 1 tome being the Toshiba drives. Then we monitored the performance of the vault and its member tomes to see if, in this case, the Toshiba drives performed as expected.
So far the Toshiba drives are performing fine, but they have been in place for only 20 days. Next up is the "pod test" where we fill a Storage Pod with Toshiba drives and integrate it into a Backblaze Vault comprised of like-sized drives. We hope to have a better look at the Toshiba 8 TB drives in our Q2 report — stay tuned.
Lifetime Hard Drive Reliability Statistics
While the quarterly chart presented earlier gets a lot of interest, the real test of any drive model is over time. Below is the lifetime failure rate chart for all the hard drive models which have 45 or more drives in operation as of March 31st, 2018. For each model, we compute their reliability starting from when they were first installed.
Notes and Observations
The failure rates of all of the larger drives (8-, 10- and 12 TB) are very good, 1.2% AFR (Annualized Failure Rate) or less. Many of these drives were deployed in the last year, so there is some volatility in the data, but you can use the Confidence Interval to get a sense of the failure percentage range.
The overall failure rate of 1.84% is the lowest we have ever achieved, besting the previous low of 2.00% from the end of 2017.
Our regular readers and drive stats wonks may have noticed a sizable jump in the number of HGST 8 TB drives (model: HUH728080ALE600), from 45 last quarter to 1,045 this quarter. As the 10 TB and 12 TB drives become more available, the price per terabyte of the 8 TB drives has gone down. This presented an opportunity to purchase the HGST drives at a price in line with our budget.
We purchased and placed into service the 45 original HGST 8 TB drives in Q2 of 2015. They were our first Helium-filled drives and our only ones until the 10 TB and 12 TB Seagate drives arrived in Q3 2017. We’ll take a first look into whether or not Helium makes a difference in drive failure rates in an upcoming blog post.
New SMART Attributes
If you have previously worked with the hard drive stats data or plan to, you’ll notice that we added 10 more columns of data starting in 2018. There are 5 new SMART attributes we are tracking each with a raw and normalized value:
177 – Wear Range Delta
179 – Used Reserved Block Count Total
181 – Program Fail Count Total or Non-4K Aligned Access Count
182 – Erase Fail Count
235 – Good Block Count AND System(Free) Block Count
The 5 values are all related to SSD drives.
Yes, SSD drives, but before you jump to any conclusions, we used 10 Samsung 850 EVO SSDs as boot drives for a period of time in Q1. This was an experiment to see if we could reduce boot up time for the Storage Pods. In our case, the improved boot up speed wasn’t worth the SSD cost, but it did add 10 new columns to the hard drive stats data.
Speaking of hard drive stats data, the complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose, all we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone. It is free.
If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the MS Excel spreadsheet.
Good luck and let us know if you find anything interesting.
[Ed: 5/1/2018 – Updated Lifetime chart to fix error in confidence interval for HGST 4TB drive, model: HDS5C4040ALE630]
AWS Glue is an increasingly popular way to develop serverless ETL (extract, transform, and load) applications for big data and data lake workloads. Organizations that transform their ETL applications to cloud-based, serverless ETL architectures need a seamless, end-to-end continuous integration and continuous delivery (CI/CD) pipeline: from source code, to build, to deployment, to product delivery. Having a good CI/CD pipeline can help your organization discover bugs before they reach production and deliver updates more frequently. It can also help developers write quality code and automate the ETL job release management process, mitigate risk, and more.
AWS Glue is a fully managed data catalog and ETL service. It simplifies and automates the difficult and time-consuming tasks of data discovery, conversion, and job scheduling. AWS Glue crawls your data sources and constructs a data catalog using pre-built classifiers for popular data formats and data types, including CSV, Apache Parquet, JSON, and more.
When you are developing ETL applications using AWS Glue, you might come across some of the following CI/CD challenges:
Iterative development with unit tests
Continuous integration and build
Pushing the ETL pipeline to a test environment
Pushing the ETL pipeline to a production environment
Testing ETL applications using real data (live test)
The following diagram shows the pipeline workflow:
This solution uses AWS CodePipeline, which lets you orchestrate and automate the test and deploy stages for ETL application source code. The solution consists of a pipeline that contains the following stages:
1.) Source Control: In this stage, the AWS Glue ETL job source code and the AWS CloudFormation template file for deploying the ETL jobs are both committed to version control. I chose to use AWS CodeCommit for version control.
2.) LiveTest: In this stage, all resources—including AWS Glue crawlers, jobs, S3 buckets, roles, and other resources that are required for the solution—are provisioned, deployed, live tested, and cleaned up.
The LiveTest stage includes the following actions:
Deploy: In this action, all the resources that are required for this solution (crawlers, jobs, buckets, roles, and so on) are provisioned and deployed using an AWS CloudFormation template.
AutomatedLiveTest: In this action, all the AWS Glue crawlers and jobs are executed and data exploration and validation tests are performed. These validation tests include, but are not limited to, record counts in both raw tables and transformed tables in the data lake and any other business validations. I used AWS CodeBuild for this action.
LiveTestApproval: This action is included for the cases in which a pipeline administrator approval is required to deploy/promote the ETL applications to the next stage. The pipeline pauses in this action until an administrator manually approves the release.
LiveTestCleanup: In this action, all the LiveTest stage resources, including test crawlers, jobs, roles, and so on, are deleted using the AWS CloudFormation template. This action helps minimize cost by ensuring that the test resources exist only for the duration of the AutomatedLiveTest and LiveTestApproval actions.
3.) DeployToProduction: In this stage, all the resources are deployed using the AWS CloudFormation template to the production environment.
Try it out
This code pipeline takes approximately 20 minutes to complete the LiveTest test stage (up to the LiveTest approval stage, in which manual approval is required).
To get started with this solution, choose Launch Stack:
This creates the CI/CD pipeline with all of its stages, as described earlier. It performs an initial commit of the sample AWS Glue ETL job source code to trigger the first release change.
In the AWS CloudFormation console, choose Create. After the template finishes creating resources, you see the pipeline name on the stack Outputs tab.
After that, open the CodePipeline console and select the newly created pipeline. Initially, your pipeline’s CodeCommit stage shows that the source action failed.
Allow a few minutes for your new pipeline to detect the initial commit applied by the CloudFormation stack creation. As soon as the commit is detected, your pipeline starts. You will see the successful stage completion status as soon as the CodeCommit source stage runs.
In the CodeCommit console, choose Code in the navigation pane to view the solution files.
Next, you can watch how the pipeline goes through the LiveTest stage of the deploy and AutomatedLiveTest actions, until it finally reaches the LiveTestApproval action.
At this point, if you check the AWS CloudFormation console, you can see that a new template has been deployed as part of the LiveTest deploy action.
At this point, make sure that the AWS Glue crawlers and the AWS Glue job ran successfully. Also check whether the corresponding databases and external tables have been created in the AWS Glue Data Catalog. Then verify that the data is validated using Amazon Athena, as shown following.
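If you prefer to script these checks rather than click through the console, here is a minimal boto3 sketch. The crawler and job names below are placeholders for illustration only; substitute the names that the CloudFormation template created in your account.

import boto3

glue = boto3.client('glue')

# Placeholder names -- replace with the crawler/job names from your stack.
CRAWLER_NAME = 'nytaxi-raw-crawler'
JOB_NAME = 'nytaxi-etl-job'

# Check the last crawl status of the crawler.
crawler = glue.get_crawler(Name=CRAWLER_NAME)['Crawler']
print('Last crawl status:', crawler.get('LastCrawl', {}).get('Status'))

# Check the most recent run of the ETL job.
runs = glue.get_job_runs(JobName=JOB_NAME, MaxResults=1)['JobRuns']
if runs:
    print('Last job run state:', runs[0]['JobRunState'])

# List the tables registered in the Data Catalog databases created by the pipeline.
for database in ['nycitytaxi_gluedemocicdtest', 'nytaxiparquet_gluedemocicdtest']:
    tables = glue.get_tables(DatabaseName=database)['TableList']
    print(database, '->', [t['Name'] for t in tables])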
Open the AWS Glue console, and choose Databases in the navigation pane. You will see the following databases in the Data Catalog:
Open the Amazon Athena console, and run the following queries. Verify that the record counts are matching.
SELECT count(*) FROM "nycitytaxi_gluedemocicdtest"."data";
SELECT count(*) FROM "nytaxiparquet_gluedemocicdtest"."datalake";
The following shows the raw data:
The following shows the transformed data:
The pipeline pauses the action until the release is approved. After validating the data, manually approve the revision on the LiveTestApproval action on the CodePipeline console.
Add comments as needed, and choose Approve.
The LiveTestApproval stage now appears as Approved on the console.
After the revision is approved, the pipeline proceeds to use the AWS CloudFormation template to destroy the resources that were deployed in the LiveTest deploy action. This helps reduce cost and ensures a clean test environment on every deployment.
Production deployment is the final stage. In this stage, all the resources—AWS Glue crawlers, AWS Glue jobs, Amazon S3 buckets, roles, and so on—are provisioned and deployed to the production environment using the AWS CloudFormation template.
After successfully running the whole pipeline, feel free to experiment with it by changing the source code stored on AWS CodeCommit. For example, if you modify the AWS Glue ETL job to generate an error, it should make the AutomatedLiveTest action fail. Or if you change the AWS CloudFormation template so that its creation fails, it should affect the LiveTest deploy action. The objective of the pipeline is to ensure that all changes deployed to production work as expected.
Conclusion
In this post, you learned how easy it is to implement CI/CD at scale for serverless AWS Glue ETL solutions with AWS developer tools like AWS CodePipeline and AWS CodeBuild. Implementing such solutions can help you accelerate ETL development and testing at your organization.
If you have questions or suggestions, please comment below.
Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable big data, machine learning, artificial intelligence, and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to technologies such as advanced edge computing and machine learning at the edge. In his spare time, he enjoys spending time with his family.
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.
Security updates have been issued by Arch Linux (apache), openSUSE (libvirt, openssl, policycoreutils, and zziplib), Oracle (firefox and python-paramiko), and Red Hat (python-paramiko).
Thanks to Raja Mani, AWS Solutions Architect, for this great blog.
—
In this blog post, I’ll walk you through the steps for setting up continuous replication of an AWS CodeCommit repository from one AWS region to another AWS region using a serverless architecture. CodeCommit is a fully-managed, highly scalable source control service that stores anything from source code to binaries. It works seamlessly with your existing Git tools and eliminates the need to operate your own source control system. Replicating an AWS CodeCommit repository from one AWS region to another AWS region enables you to achieve lower latency pulls for global developers. This same approach can also be used to automatically back up repositories currently hosted on other services (for example, GitHub or BitBucket) to AWS CodeCommit.
This solution uses AWS Lambda and AWS Fargate for continuous replication. Benefits of this approach include:
The replication process can be easily set up to trigger based on events, such as commits made to the repository.
Setting up a serverless architecture means you don’t need to provision, maintain, or administer servers.
Note: AWS Fargate has a limitation of 10 GB for storage and is available in the US East (N. Virginia) Region. A similar solution that uses Amazon EC2 instances to replicate the repositories on a schedule was published in a previous blog post and can be used if your repository does not meet these conditions.
Replication using Fargate
As you follow this blog post, you’ll set up an architecture that looks like this:
Any change in the AWS CodeCommit repository will trigger a Lambda function. The Lambda function will call the Fargate task that replicates the repository using a Git command line tool.
Let us assume a user wants to replicate a repository (Source) from the US East (N. Virginia/us-east-1) Region to a repository (Destination) in the US West (Oregon/us-west-2) Region. I'll walk you through the steps to set this up:
Prerequisites
Create an AWS service IAM role for Amazon EC2 that has permissions for both the source and destination repositories, the IAM CreateRole and AttachRolePolicy actions, and Amazon ECR. Here is the EC2 role policy I used:
You need a Docker environment to build this solution. You can launch an EC2 instance and install Docker, or you can use AWS Cloud9, which comes with Docker and Git preinstalled. I used an EC2 instance and installed Docker on it. Use the IAM role created in the previous step when creating the EC2 instance. I refer to this environment as the “Docker environment” in the following steps.
You need to install the AWS CLI on the Docker environment. For AWS CLI installation, refer to this page.
You need to install Git, including a Git command line, on the Docker environment.
Step 1: Create the Docker image
To create the Docker image, you first need a Dockerfile. A Dockerfile is a manifest that describes the base image to use for your Docker image and what you want installed and running on it. For more information about Dockerfiles, go to the Dockerfile Reference.
1. Choose a directory in the Docker environment and perform the following steps in that directory. I used the /home/ec2-user directory.
2. Clone the AWS CodeCommit repository in the Docker environment. Open a terminal in the Docker environment and run the following commands to clone your source AWS CodeCommit repository (I ran the commands from the /home/ec2-user directory):
Note: Change the URL marked in red to your source and destination repository URL.
3. Create a file called Dockerfile (case sensitive) with the following content (I created it in /home/ec2-user directory):
# Pull the Amazon Linux latest base image
FROM amazonlinux:latest
#Install aws-cli and git command line tools
RUN yum -y install unzip aws-cli
RUN yum -y install git
WORKDIR /home/ec2-user
RUN mkdir LocalRepository
WORKDIR /home/ec2-user/LocalRepository
#Copy Cloned CodeCommit repository to Docker container
COPY ./LocalRepository /home/ec2-user/LocalRepository
#Copy shell script that does the replication
COPY ./repl_repository.bash /home/ec2-user/LocalRepository
RUN chmod ugo+rwx /home/ec2-user/LocalRepository/repl_repository.bash
WORKDIR /home/ec2-user/LocalRepository
#Call this script when Docker starts the container
ENTRYPOINT ["/home/ec2-user/LocalRepository/repl_repository.bash"]
4. Copy the following shell script into a file called repl_repository.bash in the same directory as the Dockerfile in the Docker environment (I created it in the /home/ec2-user directory).
6. Verify that the replication is working by running the repl_repository.bash script from the LocalRepository directory. Go to the LocalRepository directory and run this command: . ../repl_repository.bash If it is successful, you will see “Everything up-to-date” on the last line of the output, like this:
$ . ../repl_repository.bash
Everything up-to-date
Step 2: Build the Docker Image
1. Build the Docker image by running this command from the directory where you created the DockerFile in the Docker environment in the previous step (I ran it from /home/ec2-user directory):
$ docker build . -t ccrepl
Output: It installs various packages and sets environment variables as part of steps 1 to 3 of the Dockerfile. Steps 4 to 11 of the Dockerfile should produce an output similar to the following:
2. Run the following command to verify that the image was created successfully. It will display “Everything up-to-date” at the end if it is successful.
$ docker run ccrepl
Everything up-to-date
Step 3: Push the Docker Image to Amazon Elastic Container Registry (ECR)
Perform the following steps in the Docker Environment.
1. Run the AWS CLI configure command and set default region as your source repository region (I used us-east-1).
$ aws configure set default.region <Source Repository Region>
2. Create an Amazon ECR repository using this command to store your ccrepl image (Note the repositoryUri in the output):
Step 4: Create the Amazon ECS Task Definition
2. Create a role called AccessRoleForCCfromFG using the following command in the Docker environment:
$ aws iam create-role --role-name AccessRoleForCCfromFG --assume-role-policy-document file://trustpolicyforecs.json
3. Assign CodeCommit service full access to the above role using the following command in the Docker environment:
$ aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AWSCodeCommitFullAccess --role-name AccessRoleForCCfromFG
4. In the Amazon ECS Console, choose Repositories and select the ccrepl repository that was created in the previous step. Copy the Repository URI.
5. In the Amazon ECS Console, choose Task Definitions and click Create New Task Definition.
6. Select launch type compatibility as FARGATE and click Next Step.
7. In the create task definition screen, do the following:
In Task Definition Name, type ccrepl
In Task Role, choose AccessRoleForCCfromFG
In Task Memory, choose 2GB
In Task CPU, choose 1 vCPU
Click Add Container under Container Definitions in the same screen. In the Add Container screen, do the following:
Enter Container name as ccreplcont
Enter Image URL copied from step 4
Enter Memory Limits as 128 and click Add.
Note: Select TaskExecutionRole as “ecsTaskExecutionRole” if it already exists. If not, select create new role and it will create “ecsTaskExecutionRole” for you.
8. Click the Create button in the task definition screen to create the task. It will successfully create the task, execution role and AWS CloudWatch Log groups.
9. In the Amazon ECS Console, click Clusters and create cluster. Select template as “Networking only, Powered by AWS Fargate” and click next step.
10. Enter cluster name as ccreplcluster and click create.
Step 5: Create the Lambda Function
In this section, I use the Amazon Elastic Container Service (ECS) RunTask API from Lambda to invoke the Fargate task.
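The function itself is written in Node.js, but the shape of the RunTask call is the same in any SDK. As an illustration only, here is a minimal Python (boto3) sketch of the idea; the cluster name, task definition, and subnet ID are placeholders from this walkthrough and must match your own resources.

import boto3

ecs = boto3.client('ecs')

def handler(event, context):
    # Launch the replication task on Fargate whenever CodeCommit triggers this function.
    response = ecs.run_task(
        cluster='ccreplcluster',                 # cluster created in the earlier step
        taskDefinition='ccrepl',                 # task definition created in the earlier step
        launchType='FARGATE',
        count=1,
        platformVersion='LATEST',
        networkConfiguration={
            'awsvpcConfiguration': {
                'subnets': ['subnet-xxxxxxxx'],  # placeholder: your subnet IDs
                'assignPublicIp': 'ENABLED'
            }
        }
    )
    return str(response['tasks'][0]['taskArn'])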
1. In the IAM Console, create a new role called ECSLambdaRole with the permissions to AWS CodeCommit, Amazon ECS as well as pass roles privileges needed to run the ECS task. Your statement should look similar to the following (replace <your account id>):
2. In the AWS Management Console, open the Amazon VPC console and choose Subnets in the left navigation pane. Note down the subnet IDs that you want to run the Fargate task in.
3. Create a new Lambda Node.js function called FargateTaskExecutionFunc and assign the role ECSLambdaRole with the following content:
Note: Replace subnets values (marked in red color) with the subnet IDs you identified as the subnets you wanted to run the Fargate task on in Step 2 of this section.
1. In the Lambda Console, click FargateTaskExecutionFunc under functions.
2. Under Add triggers in the Designer, select CodeCommit
3. In the Configure triggers screen, do the following:
Enter Repository name as Source (your source repository name)
Enter trigger name as LambdaTrigger
Leave the Events as “All repository events”
Leave the Branch names as “All branches”
Click Add button
Click Save button to save the changes
Step 6: Verification
To test the application, make a commit and push the changes to the source repository in AWS CodeCommit. That should automatically trigger the Lambda function and replicate the changes in the destination repository. You can verify this by checking CloudWatch Logs for Lambda and ECS, or simply going to the destination repository and verifying the change appears.
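If you'd rather verify this programmatically, here is a minimal boto3 sketch that compares the tip commit of both repositories. The repository names, Regions, and branch name are the ones assumed in this walkthrough; adjust them to match your setup.

import boto3

# Compare the latest commit on the default branch in both Regions.
source = boto3.client('codecommit', region_name='us-east-1')
destination = boto3.client('codecommit', region_name='us-west-2')

src_commit = source.get_branch(repositoryName='Source', branchName='master')['branch']['commitId']
dst_commit = destination.get_branch(repositoryName='Destination', branchName='master')['branch']['commitId']

print('Source commit:     ', src_commit)
print('Destination commit:', dst_commit)
print('In sync!' if src_commit == dst_commit else 'Replication still pending...')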
Conclusion
Congratulations! You have successfully configured repository replication of an AWS CodeCommit repository using AWS Lambda and AWS Fargate. You can use this technique in a deployment pipeline. You can also tweak the trigger configuration in AWS CodeCommit to call the Lambda function in response to any supported trigger event in AWS CodeCommit.
Security updates have been issued by Arch Linux (openssl and zziplib), Debian (ldap-account-manager, ming, python-crypto, sam2p, sdl-image1.2, and squirrelmail), Fedora (bchunk, koji, libidn, librelp, nodejs, and php), Gentoo (curl, dhcp, libvirt, mailx, poppler, qemu, and spice-vdagent), Mageia (389-ds-base, aubio, cfitsio, libvncserver, nmap, and ntp), openSUSE (GraphicsMagick, ImageMagick, spice-gtk, and wireshark), Oracle (kubernetes), Slackware (patch), and SUSE (apache2 and openssl).
You can always view and manage your Amazon GuardDuty findings on the Findings page in the GuardDuty console or by using the GuardDuty APIs with the AWS CLI or SDK. But there's a quicker and easier way: you can use Amazon Alexa as a conversational interface to review your GuardDuty findings. With Alexa, you can build natural voice experiences and create a more intuitive way of interacting with GuardDuty.
In this post, I show you how to deploy a sample custom Alexa skill and use an Alexa-enabled device, such as Amazon Echo, to get information about GuardDuty findings across your AWS accounts and regions. The information provided by this sample skill gives you a broad overview of GuardDuty finding statistics, severities, and descriptions. When you hear something interesting, you can log in to the GuardDuty console or another analysis tool to investigate the findings data.
Note: Although not covered here, you can also deploy this sample skill using Alexa for Business, which you can use to make skills available to your shared devices and enrolled users without having to publish them to the Alexa skills store.
Prerequisites
To complete the steps in this post, make sure you have:
A basic understanding of Alexa Custom Skills, which is helpful for deploying the sample skill described here. If you’re not already familiar with Alexa custom skill concepts and terminology, you might want to review the following documentation resources.
An AWS account with GuardDuty enabled in one or more AWS regions.
To deploy this solution, you will:
Deploy the Lambda function by using the CloudFormation Template.
Create the custom skill in the Alexa developer console.
Test the skill using an Alexa-enabled device.
Deploy the Lambda function with the CloudFormation Template
For this next step, make sure you deploy the template within the AWS account you want to monitor.
To deploy the Lambda function in the N. Virginia Region (see the note below), you can use the provided CloudFormation template by clicking the following link: load the supplied template. In the CloudFormation console, on the Select Template page, select Next.
Note: The following AWS Regions support hosting custom Alexa skills: US East (N. Virginia), Asia Pacific (Tokyo), EU (Ireland), and US West (Oregon). If you want to deploy in a Region other than N. Virginia, you will first need to upload the custom skill's Lambda deployment package (zip file with code) to an S3 bucket in the selected Region.
After you load the template, provide the following input parameters:
FLASHREGIONS: Comma-separated list of Region IDs (with no spaces) to include in the flash briefing stats. At least one Region is required. Make sure GuardDuty is enabled in the Regions you declare.
MAXRESP: Maximum number of findings to return in a response.
ArtifactsBucket: S3 bucket where the Lambda deployment package resides. Leave the default for N. Virginia.
ArtifactsPrefix: Path in the S3 bucket where the Lambda deployment package resides. Leave the default for N. Virginia.
On the Specify Details page, enter the input parameters (see above), and then select Next.
On the Options page, accept the default values, and then select Next.
On the Review page, confirm the details, and then select Create. The stack will be created in approximately 2 minutes.
Create the custom skill in the Alexa developer console
In the second part of this solution implementation, you will create the skill in the Amazon Developer Console.
Sign in to the Alexa area of the Amazon Developer Console, select Your Alexa Consoles in the top right, and then select Skills.
Select Create Skill.
For the name, enter Ask Amazon GuardDuty, and then select Next.
In the Choose a model to add to your skill page, select Custom, and then select Create skill.
Select the JSON Editor and paste the contents of the alexa_ask_guardduty_skill.json file into the code editor, and overwrite the existing content. This file contains the intent schema which defines the set of intents the service can accept and process.
Select Save Model, select Build Model, and then wait for the build to complete.
When the model build is complete, on the left side, select Endpoint.
In the Endpoint page, in the Service Endpoint Type section, select AWS Lambda ARN (Amazon Resource Name).
In the Default Region field, copy and paste the value from the CloudFormation Stack Outputs key named AlexaAskGDSkillArn. Leave the default values for other options, and then select Save Endpoints.
Because you’re not publishing this skill, you don’t need to complete the Launch section of the configuration. The skill will remain in the “Development” status and will only be available for Alexa devices linked to the Amazon developer account used to create the skill. Anyone with physical access to the linked Alexa-enabled device can use the custom skill. As a best practice, I recommend that you delete the Lambda trigger created by the CloudFormation template and add a new one with Skill ID verification enabled.
Test the skill using an Alexa-enabled device
Now that you’ve deployed the sample solution, the next step is to test the skill. Make sure you’re using an Alexa-enabled device linked to the Amazon developer account used to create the skill. Before testing, if there are no current GuardDuty findings available, you can generate sample findings in the console. When you generate sample findings, GuardDuty populates your current findings list with one sample finding for each supported finding type.
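You can also generate the sample findings and pull the same severity counts that the skill reports by calling the GuardDuty API directly. Here is a minimal boto3 sketch; it assumes GuardDuty is already enabled (so a detector exists) in the Region you query.

import boto3

guardduty = boto3.client('guardduty', region_name='us-east-1')

# A detector exists in every Region where GuardDuty is enabled.
detector_id = guardduty.list_detectors()['DetectorIds'][0]

# Populate the current findings list with one sample finding per supported type.
guardduty.create_sample_findings(DetectorId=detector_id)

# Severity counts -- the same numbers the flash briefing reads back to you.
stats = guardduty.get_findings_statistics(
    DetectorId=detector_id,
    FindingStatisticTypes=['COUNT_BY_SEVERITY']
)
print(stats['FindingStatistics']['CountBySeverity'])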
You can test using the following voice commands:
“Alexa, Open GuardDuty” — Opens the skill and provides a welcome response. You can also use “Alexa, Ask GuardDuty”.
“Get flash briefing” — Provides global and regional counts for low, medium, and high severity findings. The regions declared in the FLASHREGIONS parameter are included. You can also use “Ask GuardDuty to get flash briefing” to bypass the welcome message. You can learn more about GuardDuty severity levels in the documentation.
For the next set of commands, you can specify the Region by using Region names such as <Virginia>, <Oregon>, <Ireland>, and so on:
“Get statistics for region” — Provides regional counts for low, medium, and high severity findings.
“Get findings for region” — Returns finding information for the requested region. The number of findings returned is configured in the MAXRESP parameter.
“Get <high/medium/low> severity findings for region” – Returns finding information with the minimum severity requested as high, medium, or low. The number of findings returned is configured in the MAXRESP parameter.
“Help” — Provides information about the skill and supported utterances. Also provides current configuration for FLASHREGIONS and MAXRESP.
You can use this sample solution to get GuardDuty statistics and findings through the Alexa conversational interface. You’ll be able to identify findings that require further investigation quickly. This solution’s code is available on GitHub.
This post courtesy of Ed Lima, AWS Solutions Architect
We are excited to announce that you can now develop your AWS Lambda functions using the Node.js 8.10 runtime, which is the current Long Term Support (LTS) version of Node.js. Start using this new version today by specifying a runtime parameter value of nodejs8.10 when creating or updating functions.
Supporting async/await
The Lambda programming model for Node.js 8.10 now supports defining a function handler using the async/await pattern.
Asynchronous or non-blocking calls are an inherent and important part of applications, as user and human interfaces are asynchronous by nature. If you decide to have a coffee with a friend, you usually order the coffee then start or continue a conversation with your friend while the coffee is getting ready. You don’t wait for the coffee to be ready before you start talking. These activities are asynchronous, because you can start one and then move to the next without waiting for completion. Otherwise, you’d delay (or block) the start of the next activity.
Asynchronous calls used to be handled in Node.js using callbacks. That presented problems when they were nested within other callbacks in multiple levels, making the code difficult to maintain and understand.
Promises were implemented to try to solve issues caused by “callback hell.” They allow asynchronous operations to call their own methods and handle what happens when a call is successful or when it fails. As your requirements become more complicated, even promises become harder to work with and may still end up complicating your code.
Async/await is the new way of handling asynchronous operations in Node.js, and it makes for simpler, easier, and cleaner code for non-blocking calls. It still uses promises under the hood, but the value is returned directly from the asynchronous function, just as if it were a synchronous blocking function.
Take for instance the following Lambda function to get the current account settings, using the Node.js 6.10 runtime:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();
exports.handler = (event, context, callback) => {
    let getAccountSettingsPromise = lambda.getAccountSettings().promise();
    getAccountSettingsPromise.then(
        (data) => {
            callback(null, data);
        },
        (err) => {
            console.log(err);
            callback(err);
        }
    );
};
With the new Node.js 8.10 runtime, there are new handler types that can be declared with the “async” keyword or can return a promise directly.
This is what the same function looks like using async/await with Node.js 8.10:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();
exports.handler = async (event) => {
    return await lambda.getAccountSettings().promise();
};
Alternatively, you could have the handler return a promise directly:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();
exports.handler = (event) => {
    return new Promise((resolve, reject) => {
        lambda.getAccountSettings(event)
            .then((data) => {
                resolve(data);
            })
            .catch(reject);
    });
};
The new handler types are alternatives to the callback pattern, which is still fully supported.
All three functions return the same results. However, in the new runtime with async/await, all callbacks in the code are gone, which makes it easier to read. This is especially true for those less familiar with promises.
Another great advantage of async/await is better error handling. You can use a try/catch block inside the scope of an async function. Even though the function awaits an asynchronous operation, any errors end up in the catch block.
You can improve your previous Node.js 8.10 function with this trusted try/catch error handling pattern:
let AWS = require('aws-sdk');
let lambda = new AWS.Lambda();
let data;
exports.handler = async (event) => {
    try {
        data = await lambda.getAccountSettings().promise();
    }
    catch (err) {
        console.log(err);
        return err;
    }
    return data;
};
While you now have a similar number of lines in both runtimes, the code is cleaner and more readable with async/await. It makes the asynchronous calls look more synchronous. However, it is important to notice that the code is still executed the same way as if it were using a callback or promise-based API.
Backward compatibility
You may port your existing Node.js 4.3 and 6.10 functions over to Node.js 8.10 by updating the runtime. Node.js 8.10 does include numerous breaking changes from previous Node versions.
Make sure to review the API changes between Node.js 4.3, 6.10, and Node.js 8.10 to see if there are other changes that might affect your code. We recommend testing that your Lambda function passes internal validation for its behavior when upgrading to the new runtime version.
You can use Lambda versions/aliases to safely test that your function runs as expected on Node 8.10, before routing production traffic to it.
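For example, you could publish the function on the new runtime as a new version and shift a small slice of alias traffic to it before promoting it fully. The following is a minimal boto3 sketch of that idea; the function name is a placeholder, and it assumes an alias named live already exists.

import boto3

lambda_client = boto3.client('lambda')

FUNCTION_NAME = 'my-function'  # placeholder

# Switch the runtime and publish the result as a new, immutable version.
lambda_client.update_function_configuration(FunctionName=FUNCTION_NAME, Runtime='nodejs8.10')
lambda_client.get_waiter('function_updated').wait(FunctionName=FUNCTION_NAME)
new_version = lambda_client.publish_version(FunctionName=FUNCTION_NAME)['Version']

# Send 5% of the 'live' alias traffic to the new version; the rest stays on the old one.
lambda_client.update_alias(
    FunctionName=FUNCTION_NAME,
    Name='live',
    RoutingConfig={'AdditionalVersionWeights': {new_version: 0.05}}
)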
New node features
You can now get up to 20% better performance compared to the previous LTS version (6.x). The new V8 6.0 engine comes with Turbofan and the Ignition pipeline, which leads to lower memory consumption and faster startup time across Node.js applications.
HTTP/2 support, which is still subject to future changes, allows developers to use the new protocol to speed application development, undo many of the HTTP/1.1 workarounds, and make applications faster, simpler, and more powerful.
In this blog post, I will show how you can perform unit testing as a part of your AWS CodeStar project. AWS CodeStar helps you quickly develop, build, and deploy applications on AWS. With AWS CodeStar, you can set up your continuous delivery (CD) toolchain and manage your software development from one place.
Because unit testing tests individual units of application code, it is helpful for quickly identifying and isolating issues. As a part of an automated CI/CD process, it can also be used to prevent bad code from being deployed into production.
Many of the AWS CodeStar project templates come preconfigured with a unit testing framework so that you can start deploying your code with more confidence. The unit testing is configured to run in the provided build stage so that, if the unit tests do not pass, the code is not deployed. For a list of AWS CodeStar project templates that include unit testing, see AWS CodeStar Project Templates in the AWS CodeStar User Guide.
The scenario
As a big fan of superhero movies, I decided to list my favorites and ask my friends to vote on theirs by using a WebService endpoint I created. The example I use is a Python web service running on AWS Lambda with AWS CodeCommit as the code repository. CodeCommit is a fully managed source control system that hosts Git repositories and works with all Git-based tools.
Here’s how you can create the WebService endpoint:
Sign in to the AWS CodeStar console. Choose Start a project, which will take you to the list of project templates.
For code edits I will choose AWS Cloud9, which is a cloud-based integrated development environment (IDE) that you use to write, run, and debug code.
Here are the other tasks required by my scenario:
Create a database table where the votes can be stored and retrieved as needed.
Update the logic in the Lambda function that was created for posting and getting the votes.
Update the unit tests (of course!) to verify that the logic works as expected.
For a database table, I’ve chosen Amazon DynamoDB, which offers a fast and flexible NoSQL database.
Getting set up on AWS Cloud9
From the AWS CodeStar console, go to the AWS Cloud9 console, which should take you to your project code. I will open up a terminal at the top-level folder under which I will set up my environment and required libraries.
Use the following command to set the PYTHONPATH environment variable on the terminal.
You should now be able to use the following command to execute the unit tests in your project.
python -m unittest discover vote-your-movie/tests
Start coding
Now that you have set up your local environment and have a copy of your code, add a DynamoDB table to the project by defining it through a template file. Open template.yml, which is the Serverless Application Model (SAM) template file. This template extends AWS CloudFormation to provide a simplified way of defining the Amazon API Gateway APIs, AWS Lambda functions, and Amazon DynamoDB tables required by your serverless application.
AWSTemplateFormatVersion: 2010-09-09
Transform:
- AWS::Serverless-2016-10-31
- AWS::CodeStar
Parameters:
  ProjectId:
    Type: String
    Description: CodeStar projectId used to associate new resources to team members
Resources:
  # The DB table to store the votes.
  MovieVoteTable:
    Type: AWS::Serverless::SimpleTable
    Properties:
      PrimaryKey:
        # Name of the "Candidate" is the partition key of the table.
        Name: Candidate
        Type: String
  # Creating a new lambda function for retrieving and storing votes.
  MovieVoteLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: python3.6
      Environment:
        # Setting environment variables for your lambda function.
        Variables:
          TABLE_NAME: !Ref "MovieVoteTable"
          TABLE_REGION: !Ref "AWS::Region"
      Role:
        Fn::ImportValue:
          !Join ['-', [!Ref 'ProjectId', !Ref 'AWS::Region', 'LambdaTrustRole']]
      Events:
        GetEvent:
          Type: Api
          Properties:
            Path: /
            Method: get
        PostEvent:
          Type: Api
          Properties:
            Path: /
            Method: post
We’ll use Python’s boto3 library to connect to AWS services. And we’ll use Python’s mock library to mock AWS service calls for our unit tests. Use the following command to install these libraries:
pip install --upgrade boto3 mock -t .
Add these libraries to the buildspec.yml, which is the YAML file that is required for CodeBuild to execute.
version: 0.2
phases:
  install:
    commands:
      # Upgrade AWS CLI to the latest version
      - pip install --upgrade awscli boto3 mock
  pre_build:
    commands:
      # Discover and run unit tests in the 'tests' directory. For more information, see <https://docs.python.org/3/library/unittest.html#test-discovery>
      - python -m unittest discover tests
  build:
    commands:
      # Use AWS SAM to package the application by using AWS CloudFormation
      - aws cloudformation package --template template.yml --s3-bucket $S3_BUCKET --output-template template-export.yml
artifacts:
  type: zip
  files:
    - template-export.yml
Open index.py, where we can write the simple voting logic for our Lambda function.
import json
import datetime
import boto3
import os

table_name = os.environ['TABLE_NAME']
table_region = os.environ['TABLE_REGION']
VOTES_TABLE = boto3.resource('dynamodb', region_name=table_region).Table(table_name)
CANDIDATES = {"A": "Black Panther", "B": "Captain America: Civil War", "C": "Guardians of the Galaxy", "D": "Thor: Ragnarok"}

def handler(event, context):
    if event['httpMethod'] == 'GET':
        resp = VOTES_TABLE.scan()
        return {'statusCode': 200,
                'body': json.dumps({item['Candidate']: int(item['Votes']) for item in resp['Items']}),
                'headers': {'Content-Type': 'application/json'}}
    elif event['httpMethod'] == 'POST':
        try:
            body = json.loads(event['body'])
        except:
            return {'statusCode': 400,
                    'body': 'Invalid input! Expecting a JSON.',
                    'headers': {'Content-Type': 'application/json'}}
        if 'candidate' not in body:
            return {'statusCode': 400,
                    'body': 'Missing "candidate" in request.',
                    'headers': {'Content-Type': 'application/json'}}
        if body['candidate'] not in CANDIDATES.keys():
            return {'statusCode': 400,
                    'body': 'You must vote for one of the following candidates - {}.'.format(get_allowed_candidates()),
                    'headers': {'Content-Type': 'application/json'}}
        resp = VOTES_TABLE.update_item(
            Key={'Candidate': CANDIDATES.get(body['candidate'])},
            UpdateExpression='ADD Votes :incr',
            ExpressionAttributeValues={':incr': 1},
            ReturnValues='ALL_NEW'
        )
        return {'statusCode': 200,
                'body': "{} now has {} votes".format(CANDIDATES.get(body['candidate']), resp['Attributes']['Votes']),
                'headers': {'Content-Type': 'application/json'}}

def get_allowed_candidates():
    l = []
    for key in CANDIDATES:
        l.append("'{}' for '{}'".format(key, CANDIDATES.get(key)))
    return ", ".join(l)
What our code basically does is take in the HTTPS request call as an event. If it is an HTTP GET request, it gets the votes result from the table. If it is an HTTP POST request, it sets a vote for the candidate of choice. We also validate the inputs in the POST request to filter out requests that seem malicious. That way, only valid calls are stored in the table.
In the example code provided, we use a CANDIDATES variable to store our candidates, but you can store the candidates in a JSON file and use Python’s json library instead.
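As a quick illustration of that variation, the following sketch loads the candidates from a hypothetical candidates.json file packaged alongside index.py:

import json
import os

# Load the candidate list from a JSON file shipped with the function package,
# e.g. {"A": "Black Panther", "B": "Captain America: Civil War", ...}
CANDIDATES_FILE = os.path.join(os.path.dirname(__file__), 'candidates.json')
with open(CANDIDATES_FILE) as f:
    CANDIDATES = json.load(f)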
Let’s update the tests now. Under the tests folder, open the test_handler.py and modify it to verify the logic.
import os
# Some mock environment variables that would be used by the mock for DynamoDB
os.environ['TABLE_NAME'] = "MockHelloWorldTable"
os.environ['TABLE_REGION'] = "us-east-1"
# The library containing our logic.
import index
# Boto3's core library
import botocore
# For handling JSON.
import json
# Unit test library
import unittest
## Getting StringIO based on your setup.
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
## Python mock library
from mock import patch, call
from decimal import Decimal

@patch('botocore.client.BaseClient._make_api_call')
class TestCandidateVotes(unittest.TestCase):

    ## Test the HTTP GET request flow.
    ## We expect to get back a successful response with results of votes from the table (mocked).
    def test_get_votes(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'GET'}
        # The mocked values in our DynamoDB table.
        items_in_db = [{'Candidate': 'Black Panther', 'Votes': Decimal('3')},
                       {'Candidate': 'Captain America: Civil War', 'Votes': Decimal('8')},
                       {'Candidate': 'Guardians of the Galaxy', 'Votes': Decimal('8')},
                       {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('1')}
                       ]
        # The mocked DynamoDB response.
        expected_ddb_response = {'Items': items_in_db}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB Scan function during execution with these parameters.
        expected_calls = [call('Scan', {'TableName': os.environ['TABLE_NAME']})]

        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200

        result_body = json.loads(result.get('body'))
        # Verifying that the results match to that from the table.
        assert len(result_body) == len(items_in_db)
        for i in range(len(result_body)):
            assert result_body.get(items_in_db[i].get("Candidate")) == int(items_in_db[i].get("Votes"))

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for a selected candidate.
    ## We expect to get back a successful response with a confirmation message.
    def test_place_valid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"D\"}"}
        # The mocked response in our DynamoDB table.
        expected_ddb_response = {'Attributes': {'Candidate': "Thor: Ragnarok", 'Votes': Decimal('2')}}
        # The mocked response we expect back by calling DynamoDB through boto.
        response_body = botocore.response.StreamingBody(StringIO(str(expected_ddb_response)),
                                                        len(str(expected_ddb_response)))
        # Setting the expected value in the mock.
        boto_mock.side_effect = [expected_ddb_response]
        # Expecting that there would be a call to DynamoDB UpdateItem function during execution with these parameters.
        expected_calls = [call('UpdateItem', {
            'TableName': os.environ['TABLE_NAME'],
            'Key': {'Candidate': 'Thor: Ragnarok'},
            'UpdateExpression': 'ADD Votes :incr',
            'ExpressionAttributeValues': {':incr': 1},
            'ReturnValues': 'ALL_NEW'
        })]
        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 200
        assert result.get('body') == "{} now has {} votes".format(
            expected_ddb_response['Attributes']['Candidate'],
            expected_ddb_response['Attributes']['Votes'])

        assert boto_mock.call_count == 1
        boto_mock.assert_has_calls(expected_calls)

    ## Test the HTTP POST request flow that places a vote for a non-existent candidate.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_invalid_candidate_vote(self, boto_mock):
        # Input event to our method to test.
        # The valid IDs for the candidates are A, B, C, and D
        expected_event = {'httpMethod': 'POST', 'body': "{\"candidate\": \"E\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'You must vote for one of the following candidates - {}.'.format(index.get_allowed_candidates())

    ## Test the HTTP POST request flow that places a vote for a selected candidate but associated with an invalid key in the POST body.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_invalid_data_vote(self, boto_mock):
        # Input event to our method to test.
        # "name" is not the expected input key.
        expected_event = {'httpMethod': 'POST', 'body': "{\"name\": \"D\"}"}
        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Missing "candidate" in request.'

    ## Test the HTTP POST request flow that places a vote for a selected candidate but not as a JSON string which the body of the request expects.
    ## We expect to get back a failed (400) response with an appropriate error message.
    def test_place_malformed_json_vote(self, boto_mock):
        # Input event to our method to test.
        # "body" receives a string rather than a JSON string.
        expected_event = {'httpMethod': 'POST', 'body': "Thor: Ragnarok"}
        # Call the function to test.
        result = index.handler(expected_event, {})

        # Run unit test assertions to verify the expected calls to mock have occurred and verify the response.
        assert result.get('headers').get('Content-Type') == 'application/json'
        assert result.get('statusCode') == 400
        assert result.get('body') == 'Invalid input! Expecting a JSON.'

if __name__ == '__main__':
    unittest.main()
I am keeping the code samples well commented so that it’s clear what each unit test accomplishes. It tests the success conditions and the failure paths that are handled in the logic.
In my unit tests I use the patch decorator (@patch) in the mock library. @patch helps mock the function you want to call (in this case, the botocore library’s _make_api_call function in the BaseClient class). Before we commit our changes, let’s run the tests locally. On the terminal, run the tests again. If all the unit tests pass, you should expect to see a result like this:
You:~/environment $ python -m unittest discover vote-your-movie/tests
.....
----------------------------------------------------------------------
Ran 5 tests in 0.003s
OK
You:~/environment $
Upload to AWS
Now that the tests have passed, it’s time to commit and push the code to source repository!
Add your changes
From the terminal, go to the project’s folder and use the following command to verify the changes you are about to push.
git status
To add the modified files only, use the following command:
git add -u
Commit your changes
To commit the changes (with a message), use the following command:
git commit -m "Logic and tests for the voting webservice."
Push your changes to AWS CodeCommit
To push your committed changes to CodeCommit, use the following command:
git push
In the AWS CodeStar console, you can see your changes flowing through the pipeline and being deployed. There are also links in the AWS CodeStar console that take you to this project’s build runs so you can see your tests running on AWS CodeBuild. The latest link under the Build Runs table takes you to the logs.
After the deployment is complete, AWS CodeStar should now display the AWS Lambda function and DynamoDB table created and synced with this project. The Project link in the AWS CodeStar project’s navigation bar displays the AWS resources linked to this project.
Because this is a new database table, there should be no data in it. So, let’s put in some votes. You can download Postman to test your application endpoint for POST and GET calls. The endpoint you want to test is the URL displayed under Application endpoints in the AWS CodeStar console.
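If you prefer scripting the check instead of (or in addition to) Postman, here is a minimal sketch using Python's requests library (pip install requests); the endpoint URL below is a placeholder for the one shown in your AWS CodeStar console.

import requests

# Placeholder: the URL listed under "Application endpoints" in the AWS CodeStar console.
ENDPOINT = 'https://xxxxxxxxxx.execute-api.us-east-1.amazonaws.com/Prod'

# Cast a vote for candidate "D" (Thor: Ragnarok).
post_resp = requests.post(ENDPOINT, json={'candidate': 'D'})
print(post_resp.status_code, post_resp.text)

# Fetch the current vote counts.
get_resp = requests.get(ENDPOINT)
print(get_resp.status_code, get_resp.json())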
Now let’s open Postman and look at the results. Let’s create some votes through POST requests. Based on this example, a valid vote has a value of A, B, C, or D. Here’s what a successful POST request looks like:
Here’s what it looks like if I use some value other than A, B, C, or D:
Now I am going to use a GET request to fetch the results of the votes from the database.
And that’s it! You have now created a simple voting web service using AWS Lambda, Amazon API Gateway, and DynamoDB and used unit tests to verify your logic so that you ship good code. Happy coding!
I’m Charles Fort, a developer at Woot who specializes in deployments and developer experience. Woot is the original daily deals website. It was founded in 2004 and acquired by Amazon in 2010 – https://www.woot.com
We just moved our web front-end deployments from Troop, a deployment agent we developed ourselves, to AWS CodeDeploy and AWS CodePipeline. This migration involved launching a new customer-facing EC2 web server fleet with the CodeDeploy agent installed and creating and managing our CodeDeploy and CodePipeline resources with AWS CloudFormation. Immediately after we completed the migration, we observed a ~50% reduction in HTTP 500 errors during deployment.
In this blog post, I’m going to show you:
Why we chose AWS deployment tools.
An architectural diagram of Woot’s systems.
An overview of the migration project.
Our migration results.
The old and busted
We wanted to replace our in-house deployment system with something we could build automation on top of, and something that we didn’t have to own or maintain. We already own and maintain a build system, which is bad enough. We didn’t want to run additional infrastructure for our deployment pipeline.
In short, we wanted a cloud service. Because all of our infrastructure is in AWS, CodeDeploy was a natural fit to replace our deployment agent. CodePipeline acts as the automation orchestrator, our wizard in the cloud telling CodeDeploy what to deploy and when.
Architecture overview
Here’s a look at the architecture of a Woot web front end:
Woot architecture overview
Project overview
Our project involved migrating five web front ends, which together handle an average of 12 million requests per day, to CodeDeploy and CodePipeline while keeping the site live for our customers.
Here are the steps we took:
Wrote some new deployment scripts.
Launched a new fleet of EC2 web servers with CodeDeploy support.
Created a deployment pipeline for our CloudFormation-defined CodeDeploy and CodePipeline configuration.
Introduced our new fleet to live traffic. Hello, customers!
Deployment scripts
Our old deployment system didn’t stop or start our web server. Instead, it tried to swap out the build artifacts from under the server while the server was running. That’s definitely a bad idea.
We wrote some deployment scripts in Powershell that are run by CodeDeploy to stop and start our IIS web servers. These scripts work in conjunction with the Elastic Load Balancing (ELB) support in CodeDeploy because we certainly didn’t want to stop the web server while it’s serving customer traffic.
New fleet
Because our fleet is running on Amazon EC2, we built an Amazon Machine Image (AMI) for our web fleet with the CodeDeploy agent already installed. From a fleet perspective, this was most of what we had to do. With the agent installed, CodeDeploy can use our deployment scripts to deploy our web projects.
AWS CloudFormation deployment pipeline
Because we needed a deployment pipeline and several pieces of CodeDeploy configuration (a CodeDeploy application and at least one CodeDeploy deployment group) for each web project we want to deploy, we decided to use AWS CloudFormation to version this configuration. Our build system (TeamCity) can read from our version control system and write to Amazon S3. We made a simple build in TeamCity to push an AWS CloudFormation template to S3, which triggers a pipeline that deploys to AWS CloudFormation. This creates the CodePipeline and CodeDeploy resources. Now we can do code reviews on our infrastructure changes. More eyes is more safety! We can also trace infrastructure changes over time, just like we can with code changes.
Live traffic time!
Our web fleets run behind Classic Load Balancers. By choosing CodeDeploy, we can use its sweet new ELB features. For example, through an elastic load balancer, CodeDeploy can prevent internet traffic from being routed to an instance while it is being deployed to. After the deployment to that instance is complete, it then makes the instance available for traffic.
We launched new hosts with the CodeDeploy agent and deployed to them without ELB support turned on. Then we slowly, manually introduced them into our fleet while monitoring stats. After we had the new machines in, we slowly removed the old ones from the load balancer until our fleet was fully supported by CodeDeploy. Doing the migration this way resulted in 0 downtime for our sites.
One fun detail: When we had 2/3 of the new fleet in our load balancer, we triggered a CodeDeploy deployment to the fleet, but this time with ELB support turned on. This caused CodeDeploy to place the rest of the machines into the load balancer (coexisting with the old fleet), and there were slightly fewer buttons to press.
AWS CloudFormation example
This is a simplified example of the AWS CloudFormation template we use to manage the AWS configuration for one of our web projects. It is deployed in a deployment pipeline, much like the web projects themselves.
Parameters:
  CodePipelineBucket:
    Type: String
  CodePipelineRole:
    Type: String
  CodeDeployRole:
    Type: String
  CodeDeployBucket:
    Type: String

Resources:
  ### Woot.Example deployment configuration ###
  ExampleDeploymentConfig:
    Type: 'AWS::CodeDeploy::DeploymentConfig'
    Properties:
      MinimumHealthyHosts:
        Type: FLEET_PERCENT
        Value: '66' # Let's keep 2/3 of the fleet healthy at any point

  #Woot.Example CodeDeploy application
  WootExampleApplication:
    Type: "AWS::CodeDeploy::Application"
    Properties:
      ApplicationName: "Woot.Example"

  #Woot.Example CodeDeploy deployment groups
  WootExampleDeploymentGroup:
    DependsOn: "WootExampleApplication"
    Type: "AWS::CodeDeploy::DeploymentGroup"
    Properties:
      ApplicationName: "Woot.Example"
      DeploymentConfigName: !Ref "ExampleDeploymentConfig" # use the deployment configuration we created
      DeploymentGroupName: "Woot.Example.Main"
      AutoRollbackConfiguration:
        Enabled: true
        Events:
          - DEPLOYMENT_FAILURE # this makes the deployment rollback when the deployment hits the failure threshold
          - DEPLOYMENT_STOP_ON_REQUEST # this makes the deployment rollback if you hit the stop button
      LoadBalancerInfo:
        ElbInfoList:
          - Name: "WootExampleInternal" # this is the ELB the hosts live in, they will be added and removed from here
      DeploymentStyle:
        DeploymentOption: "WITH_TRAFFIC_CONTROL" # this tells CodeDeploy to actually add/remove the hosts from the ELB
      Ec2TagFilters:
        -
          Key: "Name"
          Value: "exampleweb*" # deploy to all machines named like exampleweb001.yourdomain.com, etc
          Type: "KEY_AND_VALUE"
      ServiceRoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/${CodeDeployRole}" # this is the IAM role CodeDeploy in your account should use

  #Woot.Example CodePipeline
  WootExampleDeploymentPipeline:
    DependsOn: "WootExampleDeploymentGroup"
    Type: "AWS::CodePipeline::Pipeline"
    Properties:
      RoleArn: !Sub "arn:aws:iam::${AWS::AccountId}:role/${CodePipelineRole}" # this is the IAM role CodePipeline in your account should use
      Name: "Woot.Example" # name of the pipeline
      ArtifactStore:
        Type: S3
        Location: !Ref "CodePipelineBucket" # the bucket CodePipeline uses in your account to shuffle artifacts around
      Stages:
        -
          Name: Source # one S3 source stage
          Actions:
            -
              Name: SourceAction
              ActionTypeId:
                Category: Source
                Owner: AWS
                Version: 1
                Provider: S3
              OutputArtifacts:
                -
                  Name: SourceOutput
              Configuration:
                S3Bucket: !Ref "CodeDeployBucket" # the S3 bucket your builds go into (needs to be versioned)
                S3ObjectKey: "Woot.Example/Woot.Example-Release.zip" # build artifact path for this application
              RunOrder: 1
        -
          Name: Deploy # one deploy stage that triggers the CodeDeploy deployment
          Actions:
            -
              Name: DeployAction
              ActionTypeId:
                Category: Deploy
                Owner: AWS
                Version: 1
                Provider: CodeDeploy
              InputArtifacts:
                -
                  Name: SourceOutput
              Configuration:
                ApplicationName: "Woot.Example"
                DeploymentGroupName: "Woot.Example.Main"
              RunOrder: 2
Appspec.yml example
This is the appspec.yml file we use for our main web front end (Woot.Web.Retail). The appspec.yml file tells CodeDeploy where to put files and when to run our deployment scripts.
Because our server-launching infrastructure doesn’t use CodeDeploy for initial placement of the build artifacts, CodeDeploy won’t overwrite the files. (The service has no knowledge of them.) This is both good and bad: good because CodeDeploy won’t overwrite files it didn’t write, and bad because it means we have to have a deployment script like clearWebsiteDeployment.ps1:
# restart script as 64bit powershell if it's 32 bit
if ($PSHOME -like "*SysWOW64*") {
& (Join-Path ($PSHOME -replace "SysWOW64", "SysNative") powershell.exe) -File `
(Join-Path $PSScriptRoot $MyInvocation.MyCommand) @args
Exit $LastExitCode
}
Import-Module WebAdministration
Stop-Website -name $env:APPLICATION_NAME
#sleep for 10 seconds to give IIS a chance to stop
Start-Sleep -s 10
$website = Get-Website -name "*$env:APPLICATION_NAME*"
if ($website.state -ne 'stopped') {
throw "The website cannot be stopped"
}
startWebsite.ps1:
# restart script as 64bit powershell if it's 32 bit
if ($PSHOME -like "*SysWOW64*") {
& (Join-Path ($PSHOME -replace "SysWOW64", "SysNative") powershell.exe) -File `
(Join-Path $PSScriptRoot $MyInvocation.MyCommand) @args
Exit $LastExitCode
}
Import-Module WebAdministration
Start-Website -name $env:APPLICATION_NAME
#sleep for 10 seconds to give IIS a chance to start
Start-Sleep -s 10
$website = Get-Website -name "*$env:APPLICATION_NAME*"
if ($website.state -ne 'started') {
throw "The website cannot be started"
}
You can see we used a CodeDeploy environment variable ($env:APPLICATION_NAME) in our scripts. The name of the CodeDeploy application is also the name of the IIS website. This way, we can use the same deployment scripts for multiple websites.
The new hotness
Now that we’re running CodeDeploy in production we are extremely pleased with the results. Our old deployment agent, Troop, did not give us much control over the way releases went out. Now we can check on a deployment at a per-instance level, and the opportunities for automation are impressive.
After our migration, we saw a 50% reduction in HTTP 500 errors served to customers during deployments. We looked at the one-hour time slices during a deployment and compared the average count before and after our migration to CodeDeploy. These numbers show our old deployment system was hella busted (really broken).
This graph shows summed deployment errors over time and compares CodeDeploy to our legacy in-house deployment agent (Troop).
CodeDeploy vs Troop
We plan to implement a full cross-account release process on AWS deployment tools. We will have a single AWS account with a pipeline that controls CodeDeploy in our various environments, triggering tests and promoting to the next environment as they pass. Building something like that with our own tooling would take a lot of work. Thanks to CodeDeploy, CodePipeline, and AWS CloudFormation for making our lives easier.
Before our beloved SpaceDave left the Raspberry Pi Foundation to join the ranks of the European Space Agency (ESA) — and no, we’re still not jealous *ahem* — he kindly drafted us one final blog post about the Astro Pi upgrades heading to the International Space Station today! So here it is. Enjoy!
We are very excited to announce that Astro Pi upgrades are on their way to the International Space Station! Back in September, we blogged about a small payload being launched to the International Space Station to upgrade the capabilities of our Astro Pi units.
Sneak peek
For the longest time, the payload was scheduled to launch on SpaceX CRS-14 in February. However, the launch was delayed to April, which impacted the flight operations we had planned for running Mission Space Lab student experiments.
To avoid this, ESA had the payload transferred to Russian Soyuz MS-08 (54S), which is launching today to carry crew members Oleg Artemyev, Andrew Feustel, and Ricky Arnold to the ISS.
You can watch coverage of the launch on NASA TV from 4.30pm GMT this afternoon, with the launch scheduled for 5.44pm GMT. Check the NASA TV schedule for updates.
The upgrades
The pictures below show the flight hardware in its final configuration before loading onto the launch vehicle.
All access
With the wireless dongle, the Astro Pi units can be deployed in ISS locations other than the Columbus module, where they don’t have access to an Ethernet switch.
We are also sending some flexible optical filters. These are made from the same material as the blue square which is shipped with the Raspberry Pi NoIR Camera Module.
#bluefilter
We're also including some 32GB microSD cards to replace the current 8GB cards, so that future Astro Pi code will need fewer downlink windows to get Earth observation imagery to the ground.
More space in space
The items above are enclosed in a large 8″ ziplock bag that has been designated the “AstroPi Kit”.
It’s ziplock bags all the way up
Once the Soyuz docks with the ISS, this payload is one of the first which will be unpacked, so that the Astro Pi units can be upgraded and deployed ready to run your experiments!
More Astro Pi
Stay tuned for our next update in April, when student code is set to be run on the Astro Pi units as part of our Mission Space Lab programme. And to find out more about Astro Pi, head to the programme website.
The worst thing for a computer user has happened. The hard drive on your computer crashed, or your computer is lost or completely unusable.
Fortunately, you’re a Backblaze customer with a current backup in the cloud. That’s great. The challenge is that you’ve got a presentation to make in just 48 hours and the document and materials you need for the presentation were on the hard drive that crashed.
Relax. Backblaze has your data (and your back). The question is, how do you get what you need to make that presentation deadline?
Here are some strategies you could use.
One — The first approach is to get back the presentation file and materials you need to meet your presentation deadline as quickly as possible. You can use another computer (maybe even your smartphone) to make that presentation.
Two — The second approach is to get your computer (or a new computer, if necessary) working again and restore all the files from your Backblaze backup.
Let’s start with Option One, which gets you back to work with just the files you need now as quickly as possible.
Option One — You’ve Got a Deadline and Just Need Your Files
Getting Back to Work Immediately
You want to get your computer working again as soon as possible, but perhaps your top priority is getting access to the files you need for your presentation. The computer can wait.
Find a Computer to Use
First of all, you're going to need a computer to use. If you have another computer handy, you're all set. If you don't, here are some ideas on where to find one:
Family and Friends
Work
Neighbors
Local library
Local school
Community or religious organization
Local computer shop
Online store
If you have a smartphone that you can use to give your presentation or to print materials, that’s great. With the Backblaze app for iOS and Android, you can download files directly from your Backblaze account to your smartphone. You also have the option with your smartphone to email or share files from your Backblaze backup so you can use them elsewhere.
Download The File(s) You Need
Once you have the computer, you need to connect to your Backblaze backup through a web browser or the Backblaze smartphone app.
Backblaze Web Admin
Sign into your Backblaze account. You can download the files directly or use the share link to share files with yourself or someone else.
If you have an iOS or Android smartphone, you can use the Backblaze app and retrieve the files you need. You then could view the file on your phone, use a smartphone app with the file, or email it to yourself or someone else.
Backblaze Smartphone app (iOS)
Using one of the approaches above, you got your files back in time for your presentation. Way to go!
Now, the next step is to get the computer with the bad drive running again and restore all your files, or, if that computer is no longer usable, restore your Backblaze backup to a new computer.
Option Two — You Need a Working Computer Again
Getting the Computer with the Failed Drive Running Again (or a New Computer)
If the computer with the failed drive can’t be saved, then you’re going to need a new computer. A new computer likely will come with the operating system installed and ready to boot. If you’ve got a running computer and are ready to restore your files from Backblaze, you can skip forward to Restore the Files to the Drive.
If you need to replace the hard drive in your computer before you restore your files, you can continue reading.
Buy a New Hard Drive to Replace the Failed Drive
The hard drive is gone, so you’re going to need a new drive. If you have a computer or electronics store nearby, you could get one there. Another choice is to order a drive online and pay for one or two-day delivery. You have a few choices:
Buy a hard drive of the same type and size you had
Upgrade to a drive with more capacity
Upgrade to an SSD. SSDs cost more but they are faster, more reliable, and less susceptible to jolts, magnetic fields, and other hazards that can affect a drive. Otherwise, they work the same as a hard disk drive (HDD) and most likely will work with the same connector.
Hard Disk Drive (HDD)
Solid State Drive (SSD)
Be sure that the drive dimensions are compatible with where you're going to install the drive in your computer, and that the drive connector is compatible with your computer system (SATA, PCIe, etc.). Here's some help.
Install the Drive
If you’re handy with computers, you can install the drive yourself. It’s not hard, and there are numerous videos on YouTube and elsewhere on how to do this. Just be sure to note how everything was connected so you can get everything connected and put back together correctly. Also, be sure that you discharge any static electricity from your body by touching something metallic before you handle anything inside the computer. If all this sounds like too much to handle, find a friend or a local computer store to help you.
Note: If the drive that failed is a boot drive for your operating system (either Macintosh or Windows), you need to make sure that the drive is bootable and has the operating system files on it. You may need to reinstall from an operating system source disk or install files.
Restore the Files to the Drive
Sign in to your Backblaze account from a web browser. Once logged in, you will be brought to the account Overview page. On this page, all of the computers registered for backup under your account are shown with some basic information about each. Select the backup from which you wish to restore data by using the appropriate “Restore” button.
Selecting the Type of Restore
Backblaze offers three different ways in which you can receive your restore data: downloadable ZIP file, USB flash drive, or USB hard drive. The downloadable ZIP restore option will create a ZIP file of the files you request that is made available for download for 7 days. ZIP restores do not have any additional cost and are a great option for individual files or small sets of data.
Depending on the speed of your internet connection to the Backblaze data center, downloadable restores may not always be the best option for restoring very large amounts of data. ZIP restores are limited to 500 GB per request and a maximum of 5 active requests can be submitted under a single account at any given time.
USB flash and hard drive restores are built with the data you request and then shipped to an address of your choosing via FedEx Overnight or FedEx Priority International. USB flash restores cost $99 and can contain up to 128 GB (110,000 MB of data) and USB hard drive restores cost $189 and can contain up to 4TB max (3,500,000 MB of data). Both include the cost of shipping.
You can return the USB drive within 30 days for a full refund with our Restore Return Refund Program, effectively making the process of restoring free, even with a shipped USB drive.
Selecting Files for Restore
Using the left-hand file viewer, navigate to the location of the files you wish to restore. You can use the disclosure triangles to see subfolders. Clicking on a folder name will display that folder's files in the right-hand file viewer. If you are attempting to restore files that have been deleted or are otherwise missing, or files from a failed or disconnected secondary or external hard drive, you may need to change the time frame parameters.
Put checkmarks next to disks, files or folders you’d like to recover. Once you have selected the files and folders you wish to restore, select the “Continue with Restore” button above or below the file viewer. Backblaze will then build the restore via the option you select (ZIP or USB drive). You’ll receive an automated email notifying you when the ZIP restore has been built and is ready for download or when the USB restore drive ships.
If you are using the downloadable ZIP option, and the restore is over 2 GB, we highly recommend using the Backblaze Downloader for better speed and reliability. We have a guide on using the Backblaze Downloader for Mac OS X or for Windows.
Recent versions of both macOS and Windows have built-in capability to extract files from a ZIP archive. If the built-in capabilities aren’t working for you, you can find additional utilities for Macintosh and Windows.
Reactivating your Backblaze Account
Now that you’ve got a working computer again, you’re going to need to reinstall Backblaze Backup (if it’s not on the system already) and connect with your existing account. Start by downloading and reinstalling Backblaze.
If you’ve restored the files from your Backblaze Backup to your new computer or drive, you don’t want to have to reupload the same files again to your Backblaze backup. To let Backblaze know that this computer is on the same account and has the same files, you need to use “Inherit Backup State.” See https://help.backblaze.com/hc/en-us/articles/217666358-Inherit-Backup-State
That’s It
You should be all set, either with the files you needed for your presentation, or with a restored computer that is again ready to do productive work.
We hope your presentation wowed ’em.
If you have any additional questions on restoring from a Backblaze backup, please ask away in the comments. Also, be sure to check out our help resources at https://www.backblaze.com/help.html.
***This blog is authored by Mike Okner of Monsanto, an AWS customer. It originally appeared on the Monsanto company blog. Minor edits were made to the original post.***
Recently, I was looking to create a status page app to monitor a few important internal services. I wanted this app to be as lightweight, reliable, and hassle-free as possible, so using a “serverless” architecture that doesn’t require any patching or other maintenance was quite appealing.
I also don’t deploy anything in a production AWS environment outside of some sort of template (usually CloudFormation) as a rule. I don’t want to have to come back to something I created ad hoc in the console after 6 months and try to recall exactly how I architected all of the resources. I’ll inevitably forget something and create more problems before solving the original one. So building the status page in a template was a requirement.
The Design
I settled on a design using two Lambda functions, both written in Python 3.6.
The first Lambda function makes requests out to a list of important services and writes their current status to a DynamoDB table. This function is executed once per minute via CloudWatch Event Rule.
The second Lambda function reads each service’s status & uptime information from DynamoDB and renders a Jinja template. This function is behind an API Gateway that has been configured to return text/html instead of its default application/json Content-Type.
The CloudFormation Template
AWS provides a Serverless Application Model template transformer to streamline the templating of Lambda + API Gateway designs, but it assumes (like everything else about the API Gateway) that you’re actually serving an API that returns JSON content. So, unfortunately, it won’t work for this use-case because we want to return HTML content. Instead, we’ll have to enumerate every resource like usual.
The Skeleton
We'll be using YAML for the template in this example. I find it easier to read than JSON, but you can easily convert between the two with a converter if you disagree.
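Since the full template is long, here is a hedged sketch of the skeleton plus the status-checker resources that the next few paragraphs describe. The handler names, ZIP paths, timeouts, and IAM policy scope are illustrative placeholders, not the exact values from the original template:

AWSTemplateFormatVersion: '2010-09-09'
Description: Serverless status page (checker Lambda, render Lambda, API Gateway, DynamoDB)

Resources:
  CheckerLambda:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.6
      Handler: checker.handler              # illustrative module.function name
      Code: ./build/checker.zip             # local ZIP; `aws cloudformation package` uploads it to S3
      Role: !GetAtt CheckerLambdaRole.Arn
      Timeout: 45

  CheckerLambdaRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: { Service: lambda.amazonaws.com }
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole   # the usual Lambda logging permissions
      Policies:
        - PolicyName: write-status-results
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - dynamodb:PutItem
                  - dynamodb:UpdateItem
                Resource: !GetAtt DynamoTable.Arn

  CheckerLambdaTimer:
    Type: AWS::Events::Rule
    Properties:
      ScheduleExpression: rate(1 minute)    # run the checker once per minute
      Targets:
        - Arn: !GetAtt CheckerLambda.Arn
          Id: CheckerLambdaTimerTarget

  CheckerLambdaTimerPermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref CheckerLambda
      Principal: events.amazonaws.com
      SourceArn: !GetAtt CheckerLambdaTimer.Arn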
The CheckerLambda is the actual Lambda function. The Code section is a local path to a ZIP file containing the code and its dependencies. I’m using CloudFormation’s packaging feature to automatically push the deployable to S3.
The CheckerLambdaRole is the IAM role the Lambda will assume which grants it access to DynamoDB in addition to the usual Lambda logging permissions.
The CheckerLambdaTimer is the CloudWatch Events Rule that triggers the checker to run once per minute.
The CheckerLambdaTimerPermission grants CloudWatch the ability to invoke the checker Lambda function on its interval.
The Web Page Gateway
The API Gateway handles incoming requests for the web page, invokes the Lambda, and then returns the Lambda’s results as HTML content. Its template looks like:
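I can't reproduce the original YAML exactly, so the fragment below is a rough reconstruction based on the description that follows; the PageResource and PageGatewayDeployment names and the header-mapping template are assumptions on my part. These resources sit under the same Resources section as before:

# Web page gateway (goes under the template's Resources section)
PageGateway:
  Type: AWS::ApiGateway::RestApi
  Properties:
    Name: status-page-gateway

PageResource:
  Type: AWS::ApiGateway::Resource
  Properties:
    RestApiId: !Ref PageGateway
    ParentId: !GetAtt PageGateway.RootResourceId
    PathPart: page

PageGatewayMethod:
  Type: AWS::ApiGateway::Method
  Properties:
    RestApiId: !Ref PageGateway
    ResourceId: !Ref PageResource
    HttpMethod: GET
    AuthorizationType: NONE
    Integration:
      Type: AWS                             # plain Lambda integration, not AWS_PROXY
      IntegrationHttpMethod: POST
      Uri: !Sub arn:aws:apigateway:${AWS::Region}:lambda:path/2015-03-31/functions/${WebRenderLambda.Arn}/invocations
      RequestTemplates:
        # pass the incoming request headers through to the Lambda
        application/json: |
          {
            "headers": {
              #foreach($name in $input.params().header.keySet())
              "$name": "$util.escapeJavaScript($input.params().header.get($name))"#if($foreach.hasNext),#end
              #end
            }
          }
      IntegrationResponses:
        - StatusCode: 200
          ResponseParameters:
            method.response.header.Content-Type: "'text/html'"
          ResponseTemplates:
            text/html: "$input.path('$')"
    MethodResponses:
      - StatusCode: 200
        ResponseParameters:
          method.response.header.Content-Type: true

PageGatewayDeployment:
  Type: AWS::ApiGateway::Deployment
  DependsOn: PageGatewayMethod
  Properties:
    RestApiId: !Ref PageGateway
    StageName: Prod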
There’s a lot going on here, but the real meat is in the PageGatewayMethod section. There are a couple properties that deviate from the default which is why we couldn’t use the SAM transformer.
First, we’re passing request headers through to the Lambda in the RequestTemplates section. I’m doing this so I can validate incoming auth headers. The API Gateway can do some types of auth, but I found it easier to check auth myself in the Lambda function, since the Gateway is designed to handle API calls rather than browser requests.
Next, note that in the IntegrationResponses section we’re defining the Content-Type header to be 'text/html' (with single quotes) and defining the ResponseTemplate to be $input.path('$'). This is what makes the request render as an HTML page in your browser instead of just raw text.
Due to the StageName and PathPart values in the other sections, your actual page will be accessible at https://someId.execute-api.region.amazonaws.com/Prod/page. I have the page behind an existing reverse-proxy and give it a saner URL for end-users. The reverse proxy also attaches the auth header I mentioned above. If that header isn’t present, the Lambda will render an error page instead so the proxy can’t be bypassed.
The Web Page Rendering Lambda
This Lambda is invoked by calls to the API Gateway and looks like:
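As with the gateway, this is a hedged sketch rather than the original YAML; the handler name, ZIP path, and policy scope are placeholders:

# Web page rendering Lambda (goes under the template's Resources section)
WebRenderLambda:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.6
    Handler: render.handler                 # illustrative module.function name
    Code: ./build/render.zip                # local ZIP; packaged to S3 like the checker
    Role: !GetAtt WebRenderLambdaRole.Arn
    Timeout: 30

WebRenderLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal: { Service: lambda.amazonaws.com }
          Action: sts:AssumeRole
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole     # Lambda logging
    Policies:
      - PolicyName: read-status-results
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - dynamodb:Scan
                - dynamodb:GetItem
              Resource: !GetAtt DynamoTable.Arn

WebRenderLambdaGatewayPermission:
  Type: AWS::Lambda::Permission
  Properties:
    Action: lambda:InvokeFunction
    FunctionName: !Ref WebRenderLambda
    Principal: apigateway.amazonaws.com
    SourceArn: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${PageGateway}/*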
The WebRenderLambda and WebRenderLambdaRole should look familiar.
The WebRenderLambdaGatewayPermission is similar to the Status Checker’s CloudWatch permission, only this time it allows the API Gateway to invoke this Lambda.
The DynamoDB Table
This one is straightforward.
# DynamoDB table
DynamoTable:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
      - AttributeName: name
        AttributeType: S
    ProvisionedThroughput:
      WriteCapacityUnits: 1
      ReadCapacityUnits: 1
    TableName: status-page-checker-results
    KeySchema:
      - KeyType: HASH
        AttributeName: name
The Deployment
We've made it this far defining every resource in a template that we can check in to version control, so we might as well script the deployment as well rather than manually manage the CloudFormation Stack via the AWS web console.
Since I’m using the packaging feature, I first run:
$ aws cloudformation package \
--template-file template.yaml \
--s3-bucket <some-bucket-name> \
--output-template-file template-packaged.yaml
Uploading to 34cd6e82c5e8205f9b35e71afd9e1548  1922559 / 1922559.0  (100.00%)
Successfully packaged artifacts and wrote output template to file template-packaged.yaml.
Then to deploy the template (whether new or modified), I run:
$ aws cloudformation deploy \
--region '<aws-region>' \
--template-file template-packaged.yaml \
--stack-name '<some-name>' \
--capabilities CAPABILITY_IAM
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - <some-name>
And that’s it! You’ve just created a dynamic web page that will never require you to SSH anywhere, patch a server, recover from a disaster after Amazon terminates your unhealthy EC2, or any other number of pitfalls that are now the problem of some ops person at AWS. And you can reproduce deployments and make changes with confidence because everything is defined in the template and can be tracked in version control.
As polyfloyd’s blog post for the project notes, Sebastius led the way with the hardware portion of the build. The cube is made from six LED panels driven by a Raspberry Pi, and uses a breakout board to support the panels, which are connected in pairs:
The displays are connected in 3 chains, the maximum number of parallel chains the board supports, of 2 panels each. Having a higher degree of parallelization increases the refresh rate which in turn improves the overall image quality.
The first two chains make up the 4 sides. The remaining chain makes up the top and bottom of the cube.
Sebastius removed the plastic frames that come as standard on the panels, in order to allow them to fit together snugly as a cube. He designed and laser-cut a custom frame from plywood to support the panels instead.
Software
The team used hzeller’s software to drive the panels, and polyfloyd wrote their own program to “shove the pixels around”. polyfloyd used Ledcat, software they had made to drive previous LED projects, and adapted this interface so programs written for Ledcat would also work with hzeller’s library.
The full code for the project can be found on polyfloyd’s GitHub profile. It includes the ability to render animations to gzipped files, and to stream animations in real time via SSH.
Mapping 2D and spherical images with shaders
“One of the programs that could work with my LED-panels through [Unix] pipes was Shady,” observes polyfloyd, explaining the use of shaders with the cube. “The program works by rendering OpenGL fragment shaders to an RGB24 format which could then be piped to wherever needed. These shaders are small programs that can render an image by calculating the color for each pixel on the screen individually.”
The team programmed a shader to map the two-dimensional position of pixels in an image to the three-dimensional space of the cube. This then allowed the team to apply the mapping to spherical images, such as the globe in the video below:
The team has interesting plans for the cube moving forward, including the addition of an accelerometer and batteries. Follow their progress on the polyfloyd blog.
Nathan Willis looks beyond open web fonts on opensource.com. “For starters, it’s critical to understand that Google Fonts and Open Font Library offer a specialized service—delivering fonts in web pages—and they don’t implement solutions for other use cases. That is not a shortcoming on the services’ side; it simply means that we have to develop other solutions.
There are a number of problems to solve. Probably the most obvious example is the awkwardness of installing fonts on a desktop Linux machine for use in other applications. You can download any of the web fonts offered by either service, but all you will get is a generic ZIP file with some TTF or OTF binaries inside and a plaintext license file. What happens next is up to you to guess.”