Earlier this month we launched the C5 Instances with Local NVMe Storage and I told you that we would be doing the same for additional instance types in the near future!
Today we are introducing M5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for workloads that require a balance of compute and memory resources. Here are the specs:
Instance Name
vCPUs
RAM
Local Storage
EBS-Optimized Bandwidth
Network Bandwidth
m5d.large
2
8 GiB
1 x 75 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
m5d.xlarge
4
16 GiB
1 x 150 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
m5d.2xlarge
8
32 GiB
1 x 300 GB NVMe SSD
Up to 2.120 Gbps
Up to 10 Gbps
m5d.4xlarge
16
64 GiB
1 x 600 GB NVMe SSD
2.210 Gbps
Up to 10 Gbps
m5d.12xlarge
48
192 GiB
2 x 900 GB NVMe SSD
5.0 Gbps
10 Gbps
m5d.24xlarge
96
384 GiB
4 x 900 GB NVMe SSD
10.0 Gbps
25 Gbps
The M5d instances are powered by Custom Intel® Xeon® Platinum 8175M series processors running at 2.5 GHz, including support for AVX-512.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
Here are a couple of things to keep in mind about the local NVMe storage on the M5d instances:
Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
Available Now M5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent M5 instances.
Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!
AWS re:Invent June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar. Compute
Containers June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.
June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new
June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.
June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device. IoT
June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.
Mobile June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.
June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services. June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances. June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.
Today, at the AWS Summit in Tokyo we announced a number of updates and new features for Amazon SageMaker. Starting today, SageMaker is available in Asia Pacific (Tokyo)! SageMaker also now supports CloudFormation. A new machine learning framework, Chainer, is now available in the SageMaker Python SDK, in addition to MXNet and Tensorflow. Finally, support for running Chainer models on several devices was added to AWS Greengrass Machine Learning.
Amazon SageMaker Chainer Estimator
Chainer is a popular, flexible, and intuitive deep learning framework. Chainer networks work on a “Define-by-Run” scheme, where the network topology is defined dynamically via forward computation. This is in contrast to many other frameworks which work on a “Define-and-Run” scheme where the topology of the network is defined separately from the data. A lot of developers enjoy the Chainer scheme since it allows them to write their networks with native python constructs and tools.
Luckily, using Chainer with SageMaker is just as easy as using a TensorFlow or MXNet estimator. In fact, it might even be a bit easier since it’s likely you can take your existing scripts and use them to train on SageMaker with very few modifications. With TensorFlow or MXNet users have to implement a train function with a particular signature. With Chainer your scripts can be a little bit more portable as you can simply read from a few environment variables like SM_MODEL_DIR, SM_NUM_GPUS, and others. We can wrap our existing script in a if __name__ == '__main__': guard and invoke it locally or on sagemaker.
import argparse
import os
if __name__ =='__main__':
parser = argparse.ArgumentParser()
# hyperparameters sent by the client are passed as command-line arguments to the script.
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--batch-size', type=int, default=64)
parser.add_argument('--learning-rate', type=float, default=0.05)
# Data, model, and output directories
parser.add_argument('--output-data-dir', type=str, default=os.environ['SM_OUTPUT_DATA_DIR'])
parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])
args, _ = parser.parse_known_args()
# ... load from args.train and args.test, train a model, write model to args.model_dir.
Then, we can run that script locally or use the SageMaker Python SDK to launch it on some GPU instances in SageMaker. The hyperparameters will get passed in to the script as CLI commands and the environment variables above will be autopopulated. When we call fit the input channels we pass will be populated in the SM_CHANNEL_* environment variables.
from sagemaker.chainer.estimator import Chainer
# Create my estimator
chainer_estimator = Chainer(
entry_point='example.py',
train_instance_count=1,
train_instance_type='ml.p3.2xlarge',
hyperparameters={'epochs': 10, 'batch-size': 64}
)
# Train my estimator
chainer_estimator.fit({'train': train_input, 'test': test_input})
# Deploy my estimator to a SageMaker Endpoint and get a Predictor
predictor = chainer_estimator.deploy(
instance_type="ml.m4.xlarge",
initial_instance_count=1
)
Now, instead of bringing your own docker container for training and hosting with Chainer, you can just maintain your script. You can see the full sagemaker-chainer-containers on github. One of my favorite features of the new container is built-in chainermn for easy multi-node distribution of your chainer training jobs.
There’s a lot more documentation and information available in both the README and the example notebooks.
AWS GreenGrass ML with Chainer
AWS GreenGrass ML now includes a pre-built Chainer package for all devices powered by Intel Atom, NVIDIA Jetson, TX2, and Raspberry Pi. So, now GreenGrass ML provides pre-built packages for TensorFlow, Apache MXNet, and Chainer! You can train your models on SageMaker then easily deploy it to any GreenGrass-enabled device using GreenGrass ML.
JAWS UG
I want to give a quick shout out to all of our wonderful and inspirational friends in the JAWS UG who attended the AWS Summit in Tokyo today. I’ve very much enjoyed seeing your pictures of the summit. Thanks for making Japan an amazing place for AWS developers! I can’t wait to visit again and meet with all of you.
Amazon Neptune is now Generally Available in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland). Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. At the core of Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latencies. Neptune supports two popular graph models, Property Graph and RDF, through Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune can be used to power everything from recommendation engines and knowledge graphs to drug discovery and network security. Neptune is fully-managed with automatic minor version upgrades, backups, encryption, and fail-over. I wrote about Neptune in detail for AWS re:Invent last year and customers have been using the preview and providing great feedback that the team has used to prepare the service for GA.
Now that Amazon Neptune is generally available there are a few changes from the preview:
A large number of performance enhancements and updates
Launching a Neptune cluster is as easy as navigating to the AWS Management Console and clicking create cluster. Of course you can also launch with CloudFormation, the CLI, or the SDKs.
You can monitor your cluster health and the health of individual instances through Amazon CloudWatch and the console.
Additional Resources
We’ve created two repos with some additional tools and examples here. You can expect continuous development on these repos as we add additional tools and examples.
Amazon Neptune Tools Repo This repo has a useful tool for converting GraphML files into Neptune compatible CSVs for bulk loading from S3.
Amazon Neptune Samples Repo This repo has a really cool example of building a collaborative filtering recommendation engine for video game preferences.
Purpose Built Databases
There’s an industry trend where we’re moving more and more onto purpose-built databases. Developers and businesses want to access their data in the format that makes the most sense for their applications. As cloud resources make transforming large datasets easier with tools like AWS Glue, we have a lot more options than we used to for accessing our data. With tools like Amazon Redshift, Amazon Athena, Amazon Aurora, Amazon DynamoDB, and more we get to choose the best database for the job or even enable entirely new use-cases. Amazon Neptune is perfect for workloads where the data is highly connected across data rich edges.
I’m really excited about graph databases and I see a huge number of applications. Looking for ideas of cool things to build? I’d love to build a web crawler in AWS Lambda that uses Neptune as the backing store. You could further enrich it by running Amazon Comprehend or Amazon Rekognition on the text and images found and creating a search engine on top of Neptune.
As always, feel free to reach out in the comments or on twitter to provide any feedback!
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrogArtifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository, a JFrog Artifactory host, and would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormationtemplate to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
aws codepipeline create-pipeline – cli-input-json file://source-build-actions-codepipeline.json – region 'us-west-2'
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
Describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, of version ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
Polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
{
"category": "Deploy",
"configurationProperties": [{
"name": "TypeOfArtifact",
"required": true,
"key": true,
"secret": false,
"description": "Package type, ex. npm for node packages",
"type": "String"
},
{ "name": "RepoKey",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Name of the repository in which this artifact should be stored"
},
{ "name": "UserName",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Username for authenticating with the repository"
},
{ "name": "Password",
"required": true,
"key": true,
"secret": true,
"type": "String",
"description": "Password for authenticating with the repository"
},
{ "name": "EmailAddress",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Email address used to authenticate with the repository"
},
{ "name": "ArtifactoryHost",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
}],
"provider": "Artifactory",
"version": "1",
"settings": {
"entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
},
"inputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 1
},
"outputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 0
}
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
Artifactory host name or IP address.
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
A user-defined Artifactory user name and password is used to receive a temporary API key from Artifactory.
To avoid having to write the password to a file, use that temporary API key and user name to authenticate with the NPM repository.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm – enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
def action_type():
ActionType = {
'category': 'Deploy',
'owner': 'Custom',
'provider': 'Artifactory',
'version': '1' }
return(ActionType)
def poll_for_jobs():
try:
artifactory_action_type = action_type()
print(artifactory_action_type)
jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
while not jobs['jobs']:
time.sleep(10)
jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
if jobs['jobs']:
print('Job found')
return jobs['jobs'][0]
except ClientError as e:
print("Received an error: %s" % str(e))
raise
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
try:
print('Acknowledging job')
result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
return result
except Exception as e:
print("Received an error when trying to acknowledge the job: %s" % str(e))
raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
def get_bucket_location(bucketName, init_client):
region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
if not region:
region = 'us-east-1'
return region
def get_s3_artifact(bucketName, objectKey, ak, sk, st):
init_s3 = boto3.client('s3')
region = get_bucket_location(bucketName, init_s3)
session = Session(aws_access_key_id=ak,
aws_secret_access_key=sk,
aws_session_token=st)
s3 = session.resource('s3',
region_name=region,
config=botocore.client.Config(signature_version='s3v4'))
try:
tempdirname = tempfile.mkdtemp()
except OSError as e:
print('Could not write temp directory %s' % tempdirname)
raise
bucket = s3.Bucket(bucketName)
obj = bucket.Object(objectKey)
filename = tempdirname + '/' + objectKey
try:
if os.path.dirname(objectKey):
directory = os.path.dirname(filename)
os.makedirs(directory)
print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
with open(filename, 'wb') as data:
obj.download_fileobj(data)
except ClientError as e:
print('Downloading the object and writing the file to disk raised this error: ' + str(e))
raise
return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
def unzip_codepipeline_artifact(artifact, origtmpdir):
# create a new temp directory
# Unzip artifact into new directory
try:
newtempdir = tempfile.mkdtemp()
print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
zip_ref = zipfile.ZipFile(artifact, 'r')
zip_ref.extractall(newtempdir)
zip_ref.close()
shutil.rmtree(origtmpdir)
return(os.listdir(newtempdir), newtempdir)
except OSError as e:
if e.errno != errno.EEXIST:
shutil.rmtree(newtempdir)
raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user’s home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish –registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
def push_to_npm(configuration, artifact_list, temp_dir, jobId):
reponame = configuration['RepoKey']
art_type = configuration['TypeOfArtifact']
print("Putting artifact into NPM repository " + reponame)
token, hostname, username = gen_artifactory_auth_token(configuration)
npmconfigfile = create_npmconfig_file(configuration, username, token)
url = hostname + '/artifactory/api/' + art_type + '/' + reponame
print("Changing directory to " + str(temp_dir))
os.chdir(temp_dir)
try:
print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
print("Sending artifact to Artifactory NPM registry URL: " + url)
subprocess.call(["npm", "config", "set", "registry", url])
req = subprocess.call(["npm", "publish", "--registry", url])
print("Return code from npm publish: " + str(req))
if req != 0:
err_msg = "npm ERR! Recieved non OK response while sending response to Artifactory. Return code from npm publish: " + str(req)
signal_failure(jobId, err_msg)
else:
signal_success(jobId)
except requests.exceptions.RequestException as e:
print("Received an error when trying to commit artifact %s to repository %s: " % (str(art_type), str(configuration['RepoKey']), str(e)))
raise
return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
For the custom worker code, see npm_job_worker.py in the GitHub repository.
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
From my custom worker output logs, I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https:// artifactoryhost.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ [email protected]
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I‘ve shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow. I’ve shown you how to create a custom action in CodePipeline and how to create a custom worker that works in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.
As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!
Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitates large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.
New C5d Instances with Local Storage In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:
Instance Name
vCPUs
RAM
Local Storage
EBS Bandwidth
Network Bandwidth
c5d.large
2
4 GiB
1 x 50 GB NVMe SSD
Up to 2.25 Gbps
Up to 10 Gbps
c5d.xlarge
4
8 GiB
1 x 100 GB NVMe SSD
Up to 2.25 Gbps
Up to 10 Gbps
c5d.2xlarge
8
16 GiB
1 x 225 GB NVMe SSD
Up to 2.25 Gbps
Up to 10 Gbps
c5d.4xlarge
16
32 GiB
1 x 450 GB NVMe SSD
2.25 Gbps
Up to 10 Gbps
c5d.9xlarge
36
72 GiB
1 x 900 GB NVMe SSD
4.5 Gbps
10 Gbps
c5d.18xlarge
72
144 GiB
2 x 900 GB NVMe SSD
9 Gbps
25 Gbps
Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
Here are a couple of things to keep in mind about the local NVMe storage:
Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
Available Now C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.
The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in context of the GDPR.
How does the AWS Shared Responsibility Model change under GDPR? The short answer – it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided; while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure, is our number one priority, and we invest heavily in third-party auditors to test our security controls and make any issues they find available to our customer base through AWS Artifact. Our ISO 27018 report is a good example, as it tests security controls that focus on protection of personal data in particular.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend you protect AWS account credentials and set up individual user accounts with Amazon Identity and Access Management (IAM) so that each user is only given the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API/user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS Services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more, we look forward to seeing you.
Note – All sessions are free and in Pacific Time.
Tech talks featured this month:
Analytics & Big Data
May 21, 2018 | 11:00 AM – 11:45 AM PT – Integrating Amazon Elasticsearch with your DevOps Tooling – Learn how you can easily integrate Amazon Elasticsearch Service into your DevOps tooling and gain valuable insight from your log data.
May 24, 2018 | 11:00 AM – 11:45 AM PT – Data Transformation Patterns in AWS – Discover how to perform common data transformations on the AWS Data Lake.
May 30, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.
Containers
May 24, 2018 | 01:00 PM – 01:45 PM PT –Building Microservices with the 12 Factor App Pattern on AWS – Learn best practices for building containerized microservices on AWS, and how traditional software design patterns evolve in the context of containers.
Databases
May 21, 2018 | 01:00 PM – 01:45 PM PT – How to Migrate from Cassandra to Amazon DynamoDB – Get the benefits, best practices and guides on how to migrate your Cassandra databases to Amazon DynamoDB.
May 23, 2018 | 01:00 PM – 01:45 PM PT – 5 Hacks for Optimizing MySQL in the Cloud – Learn how to optimize your MySQL databases for high availability, performance, and disaster resilience using RDS.
DevOps
May 23, 2018 | 09:00 AM – 09:45 AM PT – .NET Serverless Development on AWS – Learn how to build a modern serverless application in .NET Core 2.0.
Enterprise & Hybrid
May 22, 2018 | 11:00 AM – 11:45 AM PT – Hybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.
IoT
May 31, 2018 | 11:00 AM – 11:45 AM PT – Using AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.
Machine Learning
May 22, 2018 | 09:00 AM – 09:45 AM PT – Using Apache Spark with Amazon SageMaker – Discover how to use Apache Spark with Amazon SageMaker for training jobs and application integration.
May 24, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.
May 30, 2018 | 09:00 AM – 09:45 AM PT– Introducing AWS Certificate Manager Private Certificate Authority (CA) – Learn how AWS Certificate Manager (ACM) Private Certificate Authority (CA), a managed private CA service, helps you easily and securely manage the lifecycle of your private certificates.
June 1, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS Firewall Manager – Centrally configure and manage AWS WAF rules across your accounts and applications.
May 30, 2018 | 11:00 AM – 11:45 AM PT – Accelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.
Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.
AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.
Building a Big Grid In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.
Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.
Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:
Spot Instance Limit – 120,000
EBS Volume Limit – 120,000
EBS Capacity Limit – 2 PB
If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.
Running the Grid We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.
The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.
Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.
To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.
I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!
EC2’s H1 instances offer 2 to 16 terabytes of fast, dense storage for big data applications, optimized to deliver high throughput for sequential I/O. Enhanced Networking, 32 to 256 gigabytes of RAM, and Intel Xeon E5-2686 v4 processors running at a base frequency of 2.3 GHz round out the feature set.
I am happy to announce that we are reducing the On-Demand and Reserved Instance prices for H1 instances in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) Regions by 15%, effective immediately.
EC2 Spot Fleets are really cool. You can launch a fleet of Spot Instances that spans EC2 instance types and Availability Zones without having to write custom code to discover capacity or monitor prices. You can set the target capacity (the size of the fleet) in units that are meaningful to your application and have Spot Fleet create and then maintain the fleet on your behalf. Our customers are creating Spot Fleets of all sizes. For example, one financial service customer runs Monte Carlo simulations across 10 different EC2 instance types. They routinely make requests for hundreds of thousands of vCPUs and count on Spot Fleet to give them access to massive amounts of capacity at the best possible price.
EC2 Fleet Today we are extending and generalizing the set-it-and-forget-it model that we pioneered in Spot Fleet with EC2 Fleet, a new building block that gives you the ability to create fleets that are composed of a combination of EC2 On-Demand, Reserved, and Spot Instances with a single API call. You tell us what you need, capacity and instance-wise, and we’ll handle all the heavy lifting. We will launch, manage, monitor and scale instances as needed, without the need for scaffolding code.
You can specify the capacity of your fleet in terms of instances, vCPUs, or application-oriented units, and also indicate how much of the capacity should be fulfilled by Spot Instances. The application-oriented units allow you to specify the relative power of each EC2 instance type in a way that directly maps to the needs of your application. All three capacity specification options (instances, vCPUs, and application-oriented units) are known as weights.
I think you’ll find a number ways this feature makes managing a fleet of instances easier, and believe that you will also appreciate the team’s near-term feature roadmap of interest (more on that in a bit).
Using EC2 Fleet There are a number of ways that you can use this feature, whether you’re running a stateless web service, a big data cluster or a continuous integration pipeline. Today I’m going to describe how you can use EC2 Fleet for genomic processing, but this is similar to workloads like risk analysis, log processing or image rendering. Modern DNA sequencers can produce multiple terabytes of raw data each day, to process that data into meaningful information in a timely fashion you need lots of processing power. I’ll be showing you how to deploy a “grid” of worker nodes that can quickly crunch through secondary analysis tasks in parallel.
Projects in genomics can use the elasticity EC2 provides to experiment and try out new pipelines on hundreds or even thousands of servers. With EC2 you can access as many cores as you need and only pay for what you use. Prior to today, you would need to use the RunInstances API or an Auto Scaling group for the On-Demand & Reserved Instance portion of your grid. To get the best price performance you’d also create and manage a Spot Fleet or multiple Spot Auto Scaling groups with different instance types if you wanted to add Spot Instances to turbo-boost your secondary analysis. Finally, to automate scaling decisions across multiple APIs and Auto Scaling groups you would need to write Lambda functions that periodically assess your grid’s progress & backlog, as well as current Spot prices – modifying your Auto Scaling Groups and Spot Fleets accordingly.
You can now replace all of this with a single EC2 Fleet, analyzing genomes at scale for as little as $1 per analysis. In my grid, each step in in the pipeline requires 1 vCPU and 4 GiB of memory, a perfect match for M4 and M5 instances with 4 GiB of memory per vCPU. I will create a fleet using M4 and M5 instances with weights that correspond to the number of vCPUs on each instance:
m4.16xlarge – 64 vCPUs, weight = 64
m5.24xlarge – 96 vCPUs, weight = 96
This is expressed in a template that looks like this:
By default, EC2 Fleet will select the most cost effective combination of instance types and Availability Zones (both specified in the template) using the current prices for the Spot Instances and public prices for the On-Demand Instances (if you specify instances for which you have matching RIs, your discounts will apply). The default mode takes weights into account to get the instances that have the lowest price per unit. So for my grid, fleet will find the instance that offers the lowest price per vCPU.
Now I can request capacity in terms of vCPUs, knowing EC2 Fleet will select the lowest cost option using only the instance types I’ve defined as acceptable. Also, I can specify how many vCPUs I want to launch using On-Demand or Reserved Instance capacity and how many vCPUs should be launched using Spot Instance capacity:
The above means that I want a total of 2880 vCPUs, with 960 vCPUs fulfilled using On-Demand and 1920 using Spot. The On-Demand price per vCPU is lower for m5.24xlarge than the On-Demand price per vCPU for m4.16xlarge, so EC2 Fleet will launch 10 m5.24xlarge instances to fulfill 960 vCPUs. Based on current Spot pricing (again, on a per-vCPU basis), EC2 Fleet will choose to launch 30 m4.16xlarge instances or 20 m5.24xlarges, delivering 1920 vCPUs either way.
Putting it all together, I have a single file (fl1.json) that describes my fleet:
My entire fleet is created within seconds and was built using 10 m5.24xlarge On-Demand Instances and 30 m4.16xlarge Spot Instances, since the current Spot price was 1.5¢ per vCPU for m4.16xlarge and 1.6¢ per vCPU for m5.24xlarge.
Now lets imagine my grid has crunched through its backlog and no longer needs the additional Spot Instances. I can then modify the size of my fleet by changing the target capacity in my fleet specification, like this:
{
"TotalTargetCapacity": 960,
}
Since 960 was equal to the amount of On-Demand vCPUs I had requested, when I describe my fleet I will see all of my capacity being delivered using On-Demand capacity:
Earlier I described how RI discounts apply when EC2 Fleet launches instances for which you have matching RIs, so you might be wondering how else RI customers benefit from EC2 Fleet. Let’s say that I own regional RIs for M4 instances. In my EC2 Fleet I would remove m5.24xlarge and specify m4.10xlarge and m4.16xlarge. Then when EC2 Fleet creates the grid, it will quickly find M4 capacity across the sizes and AZs I’ve specified, and my RI discounts apply automatically to this usage.
In the Works We plan to connect EC2 Fleet and EC2 Auto Scaling groups. This will let you create a single fleet that mixed instance types and Spot, Reserved and On-Demand, while also taking advantage of EC2 Auto Scaling features such as health checks and lifecycle hooks. This integration will also bring EC2 Fleet functionality to services such as Amazon ECS, Amazon EKS, and AWS Batch that build on and make use of EC2 Auto Scaling for fleet management.
Available Now You can create and make use of EC2 Fleets today in all public AWS Regions!
Amazon Aurora MySQL (Aurora MySQL) is a managed relational database engine, wire-compatible with MySQL 5.6. Most of the drivers, connectors, and tools that you currently use with MySQL can be used with Aurora MySQL with little or no change.
Aurora MySQL database (DB) clusters provide advanced features such as:
One primary instance that supports read and write operations and up to 15 Aurora Replicas that support read-only operations, and can be automatically promoted to the primary role if the current primary instance fails
A cluster endpoint that automatically follows the primary instance in case of failover
A reader endpoint that includes all Aurora Replicas and is automatically updated when Aurora Replicas are added or removed
Internal server connection pooling and thread multiplexing for improved scalability
Near-instantaneous database restarts and crash recovery
Access to near-real-time cluster metadata that allows application developers to build “smart drivers,” connecting directly to individual instances based on their read-write or read-only role
Client-side components that are not well configured (applications, drivers, connectors, proxies) may be unable to react to recovery actions and DB cluster topology changes, or the reaction may be delayed, contributing to unexpected downtime and performance issues. In order to prevent that and make the most of Aurora MySQL features, database administrators (DBAs) and application developers are encouraged to implement best practices.
You can read more about these best practices in the recently-published Amazon Aurora MySQL DBA Handbook – Connection Management whitepaper.
About the Author
Szymon Komendera is a database engineer on the Amazon Aurora team.
Researchers at Princeton University have released IoT Inspector, a tool that analyzes the security and privacy of IoT devices by examining the data they send across the Internet. They’ve already used the tool to study a bunch of different IoT devices. From their blog post:
Finding #3: Many IoT Devices Contact a Large and Diverse Set of Third Parties
In many cases, consumers expect that their devices contact manufacturers’ servers, but communication with other third-party destinations may not be a behavior that consumers expect.
We have found that many IoT devices communicate with third-party services, of which consumers are typically unaware. We have found many instances of third-party communications in our analyses of IoT device network traffic. Some examples include:
Samsung Smart TV. During the first minute after power-on, the TV talks to Google Play, Double Click, Netflix, FandangoNOW, Spotify, CBS, MSNBC, NFL, Deezer, and Facebookeven though we did not sign in or create accounts with any of them.
Amcrest WiFi Security Camera. The camera actively communicates with cellphonepush.quickddns.com using HTTPS. QuickDDNS is a Dynamic DNS service provider operated by Dahua. Dahua is also a security camera manufacturer, although Amcrest’s website makes no references to Dahua. Amcrest customer service informed us that Dahua was the original equipment manufacturer.
Halo Smoke Detector. The smart smoke detector communicates with broker.xively.com. Xively offers an MQTT service, which allows manufacturers to communicate with their devices.
Geeni Light Bulb. The Geeni smart bulb communicates with gw.tuyaus.com, which is operated by TuYa, a China-based company that also offers an MQTT service.
We also looked at a number of other devices, such as Samsung Smart Camera and TP-Link Smart Plug, and found communications with third parties ranging from NTP pools (time servers) to video storage services.
Their first two findings are that “Many IoT devices lack basic encryption and authentication” and that “User behavior can be inferred from encrypted IoT device traffic.” No surprises there.
Enterprises adopt containers because they recognize the benefits: speed, agility, portability, and high compute density. They understand how accelerating application delivery and deployment pipelines makes it possible to rapidly slipstream new features to customers. Although the benefits are indisputable, this acceleration raises concerns about security and corporate compliance with software governance. In this blog post, I provide a solution that shows how Layered Insight, the pioneer and global leader in container-native application protection, can be used with seamless application build and delivery pipelines like those available in AWS CodeBuild to address these concerns.
Layered Insight solutions
Layered Insight enables organizations to unify DevOps and SecOps by providing complete visibility and control of containerized applications. Using the industry’s first embedded security approach, Layered Insight solves the challenges of container performance and protection by providing accurate insight into container images, adaptive analysis of running containers, and automated enforcement of container behavior.
AWS CodeBuild
AWS CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. CodeBuild scales continuously and processes multiple builds concurrently, so your builds are not left waiting in a queue. You can get started quickly by using prepackaged build environments, or you can create custom build environments that use your own build tools.
Problem Definition
Security and compliance concerns span the lifecycle of application containers. Common concerns include:
Visibility into the container images. You need to verify the software composition information of the container image to determine whether known vulnerabilities associated with any of the software packages and libraries are included in the container image.
Governance of container images is critical because only certain open source packages/libraries, of specific versions, should be included in the container images. You need support for mechanisms for blacklisting all container images that include a certain version of a software package/library, or only allowing open source software that come with a specific type of license (such as Apache, MIT, GPL, and so on). You need to be able to address challenges such as:
· Defining the process for image compliance policies at the enterprise, department, and group levels.
· Preventing the images that fail the compliance checks from being deployed in critical environments, such as staging, pre-prod, and production.
Visibility into running container instances is critical, including:
· CPU and memory utilization.
· Security of the build environment.
· All activities (system, network, storage, and application layer) of the application code running in each container instance.
Protection of running container instances that is:
· Zero-touch to the developers (not an SDK-based approach).
· Zero touch to the DevOps team and doesn’t limit the portability of the containerized application.
· This protection must retain the option to switch to a different container stack or orchestration layer, or even to a different Container as a Service (CaaS ).
· And it must be a fully automated solution to SecOps, so that the SecOps team doesn’t have to manually analyze and define detailed blacklist and whitelist policies.
Solution Details
In AWS CodeCommit, we have three projects: ● “Democode” is a simple Java application, with one buildspec to build the app into a Docker container (run by build-demo-image CodeBuild project), and another to instrument said container (instrument-image CodeBuild project). The resulting container is stored in ECR repo javatestasjavatest:20180415-layered. This instrumented container is running in AWS Fargate cluster demo-java-appand can be seen in the Layered Insight runtime console as the javatestapplication in us-east-1. ● aws-codebuild-docker-imagesis a clone of the official aws-codebuild-docker-images repo on GitHub . This CodeCommit project is used by the build-python-builder CodeBuild project to build the python 3.3.6 codebuild image and is stored at the codebuild-python ECR repo. We then manually instructed the Layered Insight console to instrument the image. ● scan-java-imagecontains just a buildspec.yml file. This file is used by the scan-java-image CodeBuild project to instruct Layered Assessment to perform a vulnerability scan of the javatest container image built previously, and then run the scan results through a compliance policy that states there should be no medium vulnerabilities. This build fails — but in this case that is a success: the scan completes successfully, but compliance fails as there are medium-level issues found in the scan.
This build is performed using the instrumented version of the Python 3.3.6 CodeBuild image, so the activity of the processes running within the build are recorded each time within the LI console.
Build container image
Create or use a CodeCommit project with your application. To build this image and store it in Amazon Elastic Container Registry (Amazon ECR), add a buildspec file to the project and build a container image and create a CodeBuild project.
Scan container image
Once the image is built, create a new buildspec in the same project or a new one that looks similar to below (update ECR URL as necessary):
version: 0.2
phases:
pre_build:
commands:
- echo Pulling down LI Scan API client scripts
- git clone https://github.com/LayeredInsight/scan-api-example-python.git
- echo Setting up LI Scan API client
- cd scan-api-example-python
- pip install layint_scan_api
- pip install -r requirements.txt
build:
commands:
- echo Scanning container started on `date`
- IMAGEID=$(./li_add_image – name <aws-region>.amazonaws.com/javatest:20180415)
- ./li_wait_for_scan -v – imageid $IMAGEID
- ./li_run_image_compliance -v – imageid $IMAGEID – policyid PB15260f1acb6b2aa5b597e9d22feffb538256a01fbb4e5a95
Add the buildspec file to the git repo, push it, and then build a CodeBuild project using with the instrumented Python 3.3.6 CodeBuild image at <aws-region>.amazonaws.com/codebuild-python:3.3.6-layered. Set the following environment variables in the CodeBuild project: ● LI_APPLICATIONNAME – name of the build to display ● LI_LOCATION – location of the build project to display ● LI_API_KEY – ApiKey:<key-name>:<api-key> ● LI_API_HOST – location of the Layered Insight API service
Instrument container image
Next, to instrument the new container image:
In the Layered Insight runtime console, ensure that the ECR registry and credentials are defined (click the Setup icon and the ‘+’ sign on the top right of the screen to add a new container registry). Note the name given to the registry in the console, as this needs to be referenced in the li_add_imagecommand in the script, below.
Next, add a new buildspec (with a new name) to the CodeCommit project, such as the one shown below. This code will download the Layered Insight runtime client, and use it to instruct the Layered Insight service to instrument the image that was just built:
version: 0.2
phases:
pre_build:
commands:
echo Pulling down LI API Runtime client scripts
git clone https://github.com/LayeredInsight/runtime-api-example-python
echo Setting up LI API client
cd runtime-api-example-python
pip install layint-runtime-api
pip install -r requirements.txt
build:
commands:
echo Instrumentation started on `date`
./li_add_image – registry "Javatest ECR" – name IMAGE_NAME:TAG – description "IMAGE DESCRIPTION" – policy "Default Policy" – instrument – wait – verbose
Commit and push the new buildspec file.
Going back to CodeBuild, create a new project, with the same CodeCommit repo, but this time select the new buildspec file. Use a Python 3.3.6 builder – either the AWS or LI Instrumented version.
Click Continue
Click Save
Run the build, again on the master branch.
If everything runs successfully, a new image should appear in the ECR registry with a -layered suffix. This is the instrumented image.
Run instrumented container image
When the instrumented container is now run — in ECS, Fargate, or elsewhere — it will log data back to the Layered Insight runtime console. It’s appearance in the console can be modified by setting the LI_APPLICATIONNAME and LI_LOCATION environment variables when running the container.
Conclusion
In the above blog we have provided you steps needed to embed governance and runtime security in your build pipelines running on AWS CodeBuild using Layered Insight.
In a multi-account environment where you require connectivity between accounts, and perhaps connectivity between cloud and on-premises workloads, the demand for a robust Domain Name Service (DNS) that’s capable of name resolution across all connected environments will be high.
The most common solution is to implement local DNS in each account and use conditional forwarders for DNS resolutions outside of this account. While this solution might be efficient for a single-account environment, it becomes complex in a multi-account environment.
In this post, I will provide a solution to implement central DNS for multiple accounts. This solution reduces the number of DNS servers and forwarders needed to implement cross-account domain resolution. I will show you how to configure this solution in four steps:
Set up your Central DNS account.
Set up each participating account.
Create Route53 associations.
Configure on-premises DNS (if applicable).
Solution overview
In this solution, you use AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) as a DNS service in a dedicated account in a Virtual Private Cloud (DNS-VPC).
The DNS service included in AWS Managed Microsoft AD uses conditional forwarders to forward domain resolution to either Amazon Route 53 (for domains in the awscloud.com zone) or to on-premises DNS servers (for domains in the example.com zone). You’ll use AWS Managed Microsoft AD as the primary DNS server for other application accounts in the multi-account environment (participating accounts).
A participating account is any application account that hosts a VPC and uses the centralized AWS Managed Microsoft AD as the primary DNS server for that VPC. Each participating account has a private, hosted zone with a unique zone name to represent this account (for example, business_unit.awscloud.com).
You associate the DNS-VPC with the unique hosted zone in each of the participating accounts, this allows AWS Managed Microsoft AD to use Route 53 to resolve all registered domains in private, hosted zones in participating accounts.
The following diagram shows how the various services work together:
Figure 1: Diagram showing the relationship between all the various services
In this diagram, all VPCs in participating accounts use Dynamic Host Configuration Protocol (DHCP) option sets. The option sets configure EC2 instances to use the centralized AWS Managed Microsoft AD in DNS-VPC as their default DNS Server. You also configure AWS Managed Microsoft AD to use conditional forwarders to send domain queries to Route53 or on-premises DNS servers based on query zone. For domain resolution across accounts to work, we associate DNS-VPC with each hosted zone in participating accounts.
If, for example, server.pa1.awscloud.com needs to resolve addresses in the pa3.awscloud.com domain, the sequence shown in the following diagram happens:
Figure 2: How domain resolution across accounts works
1.1: server.pa1.awscloud.com sends domain name lookup to default DNS server for the name server.pa3.awscloud.com. The request is forwarded to the DNS server defined in the DHCP option set (AWS Managed Microsoft AD in DNS-VPC).
1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.
Similarly, if server.example.com needs to resolve server.pa3.awscloud.com, the following happens:
2.1: server.example.com sends domain name lookup to on-premise DNS server for the name server.pa3.awscloud.com.
2.2: on-premise DNS server using conditional forwarder forwards domain lookup to AWS Managed Microsoft AD in DNS-VPC.
1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.
Step 1: Set up a centralized DNS account
In previous AWS Security Blog posts, Drew Dennis covered a couple of options for establishing DNS resolution between on-premises networks and Amazon VPC. In this post, he showed how you can use AWS Managed Microsoft AD (provisioned with AWS Directory Service) to provide DNS resolution with forwarding capabilities.
To set up a centralized DNS account, you can follow the same steps in Drew’s post to create AWS Managed Microsoft AD and configure the forwarders to send DNS queries for awscloud.com to default, VPC-provided DNS and to forward example.com queries to the on-premise DNS server.
Here are a few considerations while setting up central DNS:
The VPC that hosts AWS Managed Microsoft AD (DNS-VPC) will be associated with all private hosted zones in participating accounts.
To be able to resolve domain names across AWS and on-premises, connectivity through Direct Connect or VPN must be in place.
Step 2: Set up participating accounts
The steps I suggest in this section should be applied individually in each application account that’s participating in central DNS resolution.
Create the VPC(s) that will host your resources in participating account.
Create VPC Peering between local VPC(s) in each participating account and DNS-VPC.
Create a private hosted zone in Route 53. Hosted zone domain names must be unique across all accounts. In the diagram above, we used pa1.awscloud.com / pa2.awscloud.com / pa3.awscloud.com. You could also use a combination of environment and business unit: for example, you could use pa1.dev.awscloud.com to achieve uniqueness.
Associate VPC(s) in each participating account with the local private hosted zone.
The next step is to change the default DNS servers on each VPC using DHCP option set:
Follow these steps to create a new DHCP option set. Make sure in the DNS Servers to put the private IP addresses of the two AWS Managed Microsoft AD servers that were created in DNS-VPC:
Figure 3: The “Create DHCP options set” dialog box
Follow these steps to assign the DHCP option set to your VPC(s) in participating account.
Step 3: Associate DNS-VPC with private hosted zones in each participating account
The next steps will associate DNS-VPC with the private, hosted zone in each participating account. This allows instances in DNS-VPC to resolve domain records created in these hosted zones. If you need them, here are more details on associating a private, hosted zone with VPC on a different account.
In each participating account, create the authorization using the private hosted zone ID from the previous step, the region, and the VPC ID that you want to associate (DNS-VPC).
After completing these steps, AWS Managed Microsoft AD in the centralized DNS account should be able to resolve domain records in the private, hosted zone in each participating account.
Step 4: Setting up on-premises DNS servers
This step is necessary if you would like to resolve AWS private domains from on-premises servers and this task comes down to configuring forwarders on-premise to forward DNS queries to AWS Managed Microsoft AD in DNS-VPC for all domains in the awscloud.com zone.
The steps to implement conditional forwarders vary by DNS product. Follow your product’s documentation to complete this configuration.
Summary
I introduced a simplified solution to implement central DNS resolution in a multi-account environment that could be also extended to support DNS resolution between on-premise resources and AWS. This can help reduce operations effort and the number of resources needed to implement cross-account domain resolution.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Directory Service forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
We made it easier for you to comply with regulatory standards by controlling access to AWS Regions using IAM policies. For example, if your company requires users to create resources in a specific AWS region, you can now add a new condition to the IAM policies you attach to your IAM principal (user or role) to enforce this for all AWS services. In this post, I review conditions in policies, introduce the new condition, and review a policy example to demonstrate how you can control access across multiple AWS services to a specific region.
Condition concepts
Before I introduce the new condition, let’s review the condition element of an IAM policy. A condition is an optional IAM policy element that lets you specify special circumstances under which the policy grants or denies permission. A condition includes a condition key, operator, and value for the condition. There are two types of conditions: service-specific conditions and global conditions. Service-specific conditions are specific to certain actions in an AWS service. For example, the condition key ec2:InstanceType supports specific EC2 actions. Global conditions support all actions across all AWS services.
Now that I’ve reviewed the condition element in an IAM policy, let me introduce the new condition.
AWS:RequestedRegion condition key
The new global condition key, , supports all actions across all AWS services. You can use any string operator and specify any AWS region for its value.
Condition key
Description
Operator(s)
Value
aws:RequestedRegion
Allows you to specify the region to which the IAM principal (user or role) can make API calls
I’ll now demonstrate the use of the new global condition key.
Example: Policy with region-level control
Let’s say a group of software developers in my organization is working on a project using Amazon EC2 and Amazon RDS. The project requires a web server running on an EC2 instance using Amazon Linux and a MySQL database instance in RDS. The developers also want to test Amazon Lambda, an event-driven platform, to retrieve data from the MySQL DB instance in RDS for future use.
My organization requires all the AWS resources to remain in the Frankfurt, eu-central-1, region. To make sure this project follows these guidelines, I create a single IAM policy for all the AWS services that this group is going to use and apply the new global condition key aws:RequestedRegion for all the services. This way I can ensure that any new EC2 instances launched or any database instances created using RDS are in Frankfurt. This policy also ensures that any Lambda functions this group creates for testing are also in the Frankfurt region.
The first statement in the above example contains all the read-only actions that let my developers use the console for EC2, RDS, and Lambda. The permissions for IAM-related actions are required to launch EC2 instances with a role, enable enhanced monitoring in RDS, and for AWS Lambda to assume the IAM execution role to execute the Lambda function. I’ve combined all the read-only actions into a single statement for simplicity. The second statement is where I give write access to my developers for the three services and restrict the write access to the Frankfurt region using the aws:RequestedRegion condition key. You can also list multiple AWS regions with the new condition key if your developers are allowed to create resources in multiple regions. The third statement grants permissions for the IAM action iam:PassRole required by AWS Lambda. For more information on allowing users to create a Lambda function, see Using Identity-Based Policies for AWS Lambda.
Summary
You can now use the aws:RequestedRegion global condition key in your IAM policies to specify the region to which the IAM principal (user or role) can invoke an API call. This capability makes it easier for you to restrict the AWS regions your IAM principals can use to comply with regulatory standards and improve account security. For more information about this global condition key and policy examples using aws:RequestedRegion, see the IAM documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about or suggestions for this solution, start a new thread on the IAM forum.
Want more AWS Security news? Follow us on Twitter.
AWS CloudFormation allows developers and systems administrators to easily create and manage a collection of related AWS resources (called a CloudFormation stack) by provisioning and updating them in an orderly and predictable way. CloudFormation users can now deploy and manage AWS Batch resources in exactly the same way that they are managing the rest of their AWS infrastructure.
This post highlights the native resources supported in CloudFormation and demonstrates how to create AWS Batch compute environments using CloudFormation. All sample CloudFormation, per-region templates related to this post can be found on the CloudFormation sample template site. The Ohio (us-east-2) Region is used as the example region for the remainder of this post.
AWS Batch Resources
AWS Batch is a managed service that helps you efficiently run batch computing workloads on the AWS Cloud. Users submit jobs to job queues, specifying the application to be run and their jobs’ CPU and memory requirements. AWS Batch is responsible for launching the appropriate quantity and types of instances needed to run your jobs.
AWS Batch removes the undifferentiated heavy lifting of configuring and managing compute infrastructure, allowing you to instead focus on your applications and users. This is demonstrated in the How AWS Batch Works video.
AWS Batch manages the following resources:
Job definitions
Job queues
Compute environments
A job definition specifies how jobs are to be run—for example, which Docker image to use for your job, how many vCPUs and how much memory is required, the IAM role to be used, and more.
Jobs are submitted to job queues where they reside until they can be scheduled to run on Amazon EC2 instances within a compute environment. An AWS account can have multiple job queues, each with varying priority. This gives you the ability to closely align the consumption of compute resources with your organizational requirements.
Compute environments provision and manage your EC2 instances and other compute resources that are used to run your AWS Batch jobs. Job queues are mapped to one more compute environments and a given environment can also be mapped to one or more job queues. This many-to-many relationship is defined by the compute environment order and job queue priority properties.
The following diagram shows a general overview of how the AWS Batch resources interact.
CloudFormation stack creation and updates
Upon the creation of your stack, an AWS Batch job definition is registered using your CloudFormation template. If a job definition with the same name has already been registered, a new revision is created. On stack updates, any changes to your job definition specifications in the CloudFormation template result in a new revision of that job definition and a deregistration of the previous job definition revision. Stack deletions only result in the deregistration of your job definition, as AWS Batch does not delete job definitions.
At the stack creation, a job queue is created using the template. Any changes to your job queue properties within the stack result in a call to the UpdateJobQueue API action. Similarly, stack deletions result in the deletion of job queues from your AWS Batch compute environment.
CloudFormation creates an AWS Batch compute environment using the properties specified in your template. Stack updates result in updates to your compute environment where possible. If you need to change a parameter that is not supported by the UpdateComputeEnvironment API action, stack updates result in the deletion and re-creation of your compute environment. Upon stack deletion, your compute environment is disabled, and then deleted.
All naming conventions specified by CloudFormation should be followed—especially in the case of resource replacement—or you run the risk of a failed stack changes. For example, all AWS Batch resource property names must be capitalized, and resource names must be changed in the case of resource replacement, as is the case in any CloudFormation stack.
If you do not provide values for ComputeEnvironmentName, JobQueueName, or JobDefinitionName in your template, a pseudo-random name is generated for you using the logical ID that you gave the resource in CloudFormation.
Launching a “Hello World” example stack
Here’s a familiar “Hello World” example of a CloudFormation stack with AWS Batch resources.
This example registers a simple job definition, a job queue that can accept job submissions, and a compute environment that contains the compute resources used to execute your job. The stack template also creates additional AWS resources that are required by AWS Batch:
An IAM service role that gives AWS Batch permissions to take the required actions on your behalf
An IAM ECS instance role
A VPC
A VPC subnet (though I’ve provided a general template, I suggest that this be a private subnet)
A security group
This stack can easily be deployed in the CloudFormation console, but I provide CLI commands that complete the stack creation for you. Use the Launch stack button or run the following command:
You can monitor the creation of the resources in your CloudFormation stack in the CloudFormation console, on the Events tab:
Confirm the successful creation of your stack by observing a CREATE_COMPLETE status. At this point, you should also be able to view the new resource ARNs on the Outputs tab:
After your stack is successfully created, everything that you need to submit a “hello-world” job is complete.
Make sure to use the accurate job definition name and revision number. You can find the accurate Amazon Resource Name (ARN) on the CloudFormation stack Outputs tab. A pseudo-random resource name is generated for your AWS Batch resources. If you do have an existing hello-world job definition, make sure that you run the command with the job definition revision created by your new CloudFormation stack from the stack outputs.
You can monitor the successful execution of the job in the AWS Batch console under Jobs:
When you are done using this stack and want to delete the resources, run the following command. CloudFormation deregisters the job definition, and deletes the job queue, compute environment, and the rest of the resources in the stack template.
aws – region us-east-2 cloudformation delete-stack – stack-name hello-world-batch-stack
Now that you know the basics of AWS Batch resources, here’s a more complex example.
High– and low-priority job queues with On-Demand and Spot compute environments
This CloudFormation stack creates two job queues with varying priority and two compute environments. You have one On-Demand compute environment and one Spot compute environment with a Spot price at 40% of On-Demand.
The first job queue is higher priority and feeds jobs to both compute environments, while the lower priority job queue only submits jobs for execution to the Spot compute environment.
There are two job definitions, one high-priority job queue and one low-priority job queue. Each job submitted using a given job definition is submitted to a job queue. For example, jobs submitted with an important-production-application job definition are submitted to the high priority job queue, while jobs submitted with a test-application job definition are submitted to the low priority job queue.
This example registers both job definitions and creates your compute environments and job queues. It also creates the VPC, subnet, security group, IAM service role for AWS Batch, ECS instance role, and an IAM Spot Fleet role. Use the Launch stack button or run the following command:
As with any CloudFormation stack, you can update resources for your application’s specific needs. AWS CloudFormation Designer is a graphic tool for creating, viewing, and modifying CloudFormation templates.
Any changes to resource properties that require replacement results in the creation of a new resource to reflect this change, and the deletion of the obsolete resource. Changes to an immutable compute environment or job queue properties results in replacement. Changes to updateable properties update the existing resource. Any changes to job definitions (beyond the name) result in the registration of a new revision of the existing job definition, followed by the deregistration of the previous revision.
Finally, run the following command to delete the CloudFormation stack containing your AWS Batch resources:
aws – region us-east-2 cloudformation delete-stack – stack-name high-low-priority-batch-stack
Conclusion
In this post, I detailed the steps to create, update with and without replacement, and delete your AWS Batch resources using CloudFormation templates as part of CloudFormation stacks with other AWS service resources. For more information, see the following topics:
In December 2016, we launched the Outbound Network Access functionality for Amazon RDS for Oracle, enabling customers to use their RDS for Oracle database instances to communicate with external web endpoints using the utl_http and utl tcp packages, and sending emails through utl_smtp. We extended the functionality by adding the option of using custom DNS servers, allowing such outbound network accesses to make use of any DNS server a customer chooses to use. These releases enabled HTTP, TCP and SMTP communication originating out of RDS for Oracle instances – limited to non-secure (non-SSL) mediums.
To overcome the limitation over SSL connections, we recently published a whitepaper, that guides through the process of creating customized Oracle wallet bundles on your RDS for Oracle instances. By making use of such wallets, you can now extend the Outbound Network Access capability to have external communications happen over secure (SSL/TLS) connections. This opens up new use cases for your RDS for Oracle instances.
With the right set of certificates imported into your RDS for Oracle instances (through Oracle wallets), your database instances can now:
Communicate with a HTTPS endpoint: Using utl_http, access a resource such as https://status.aws.amazon.com/robots.txt
Download files from Amazon S3 securely: Using a presigned URL from Amazon S3, you can now download any file over SSL
Extending Oracle Database links to use SSL: Database links between RDS for Oracle instances can now use SSL as long as the instances have the SSL option installed
Sending email over SMTPS:
You can now integrate with Amazon SES to send emails from your database instances and any other generic SMTPS with which the provider can be integrated
These are just a few high-level examples of new use cases that have opened up with the whitepaper. As a reminder, always ensure to have best security practices in place when making use of Outbound Network Access (detailed in the whitepaper).
About the Author
Surya Nallu is a Software Development Engineer on the Amazon RDS for Oracle team.
Many of today’s discussions around blockchain technology remind me of the classic Shimmer Floor Wax skit. According to Dan Aykroyd, Shimmer is a dessert topping. Gilda Radner claims that it is a floor wax, and Chevy Chase settles the debate and reveals that it actually is both! Some of the people that I talk to see blockchains as the foundation of a new monetary system and a way to facilitate international payments. Others see blockchains as a distributed ledger and immutable data source that can be applied to logistics, supply chain, land registration, crowdfunding, and other use cases. Either way, it is clear that there are a lot of intriguing possibilities and we are working to help our customers use this technology more effectively.
We are launching AWS Blockchain Templates today. These templates will let you launch an Ethereum (either public or private) or Hyperledger Fabric (private) network in a matter of minutes and with just a few clicks. The templates create and configure all of the AWS resources needed to get you going in a robust and scalable fashion.
Launching a Private Ethereum Network The Ethereum template offers two launch options. The ecs option creates an Amazon ECS cluster within a Virtual Private Cloud (VPC) and launches a set of Docker images in the cluster. The docker-local option also runs within a VPC, and launches the Docker images on EC2 instances. The template supports Ethereum mining, the EthStats and EthExplorer status pages, and a set of nodes that implement and respond to the Ethereum RPC protocol. Both options create and make use of a DynamoDB table for service discovery, along with Application Load Balancers for the status pages.
Here are the AWS Blockchain Templates for Ethereum:
I start by opening the CloudFormation Console in the desired region and clicking Create Stack:
I select Specify an Amazon S3 template URL, enter the URL of the template for the region, and click Next:
I give my stack a name:
Next, I enter the first set of parameters, including the network ID for the genesis block. I’ll stick with the default values for now:
I will also use the default values for the remaining network parameters:
Moving right along, I choose the container orchestration platform (ecs or docker-local, as I explained earlier) and the EC2 instance type for the container nodes:
Next, I choose my VPC and the subnets for the Ethereum network and the Application Load Balancer:
I configure my keypair, EC2 security group, IAM role, and instance profile ARN (full information on the required permissions can be found in the documentation):
The Instance Profile ARN can be found on the summary page for the role:
I confirm that I want to deploy EthStats and EthExplorer, choose the tag and version for the nested CloudFormation templates that are used by this one, and click Next to proceed:
On the next page I specify a tag for the resources that the stack will create, leave the other options as-is, and click Next:
I review all of the parameters and options, acknowledge that the stack might create IAM resources, and click Create to build my network:
The template makes use of three nested templates:
After all of the stacks have been created (mine took about 5 minutes), I can select JeffNet and click the Outputs tab to discover the links to EthStats and EthExplorer:
Here’s my EthStats:
And my EthExplorer:
If I am writing apps that make use of my private network to store and process smart contracts, I would use the EthJsonRpcUrl.
Stay Tuned My colleagues are eager to get your feedback on these new templates and plan to add new versions of the frameworks as they become available.
Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.
Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.
Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper, multi, ultra, and super and other prefixes onto the beginnings of words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.
Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.
So, What is the Hybrid Cloud?
The hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them.
To get beyond the hype, let’s start with Forrester Research‘s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”
To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.
To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud[1] resources combined with third-party public cloud resources that use some kind of orchestration[2] between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.
In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!
Private Cloud vs. Public Cloud
The cloud is really just a collection of purpose built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site, while a private cloud can be on-site or off-site — or on-prem or off-prem.
As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.
The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.
Hybrid Cloud Benefits
If we accept that the hybrid cloud combines the best elements of private and public clouds, then the benefits of hybrid cloud solutions are clear, and we can identify the primary two benefits that result from the blending of private and public clouds.
Benefit 1: Flexibility and Scalability
Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure and adding capacity requires advance planning.
The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.
For a data storage user, the on-premises private cloud storage provides, among other benefits, the highest speed access. For data that is not frequently accessed, or needed with the absolute lowest levels of latency, it makes sense for the organization to move it to a location that is secure, but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or with the general public.
Benefit 2: Cost Savings
The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.
Comparing Private vs Hybrid Cloud Storage Costs
To get an idea of the difference in storage costs between a purely on-premises solutions and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table is the same format, all we’ve done is change how the data is distributed: private (on-premises) cloud or public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.
Scenario 1100% of data on-premises storage
Data Stored
Data stored On-Premises: 100%
100 TB
1,000 TB
2,000 TB
On-premises cost range
Monthly Cost
Low — $12/TB/Month
$1,200
$12,000
$24,000
High — $20/TB/Month
$2,000
$20,000
$40,000
Scenario 220% of data on-premises with 80% public cloud storage (B2)
Data Stored
Data stored On-Premises: 20%
20 TB
200 TB
400 TB
Data stored in Cloud: 80%
80 TB
800 TB
1,600 TB
On-premises cost range
Monthly Cost
Low — $12/TB/Month
$240
$2,400
$4,800
High — $20/TB/Month
$400
$4,000
$8,000
Public cloud cost range
Monthly Cost
Low — $5/TB/Month (B2)
$400
$4,000
$8,000
High — $20/TB/Month
$1,600
$16,000
$32,000
On-premises + public cloud cost range
Monthly Cost
Low
$640
$6,400
$12,800
High
$2,000
$20,000
$40,000
As can be seen in the numbers above, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises. For other cost scenarios, see the B2 Cost Calculator.
When Hybrid Might Not Always Be the Right Fit
There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.
An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency does play a factor in data storage for some users, it is less of a factor for uploading and downloading data than it is for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low-latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.
It is essential to have a good understanding of workloads and their essential characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.
The Hybrid Cloud Can Be a Win-Win Solution
From the high altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.
Should You Go Hybrid?
We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?
According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.
Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.
If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.
As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.
Tell Us What You’re Doing with the Hybrid Cloud
Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.
[1] Private cloud can be on-premises or a dedicated off-premises facility.
[2] Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.