Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!
Containers June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.
June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device. IoT
Since our last System and Organization Control (SOC) audit, our service and compliance teams have been working to increase the number of AWS Services in scope prioritized based on customer requests. Today, we’re happy to report 11 services are newly SOC compliant, which is a 21 percent increase in the last six months.
Our latest SOC 1, 2, and 3 reports covering the period from October 1, 2017 to March 31, 2018 are now available. The SOC 1 and 2 reports are available on-demand through AWS Artifact by logging into the AWS Management Console. The SOC 3 report can be downloaded here.
Finally, prospective customers can read our SOC 1 and 2 reports by reaching out to AWS Compliance.
Want more AWS Security news? Follow us on Twitter.
Join us this month to learn about some of the exciting new services and solution best practices at AWS. We also have our first re:Invent 2018 webinar series, “How to re:Invent”. Sign up now to learn more, we look forward to seeing you.
May 30, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Life Sciences with HPC on AWS – Learn how you can accelerate your Life Sciences research workloads by harnessing the power of high performance computing on AWS.
May 22, 2018 | 11:00 AM – 11:45 AM PT – Hybrid Cloud Customer Use Cases on AWS – Learn how customers are leveraging AWS hybrid cloud capabilities to easily extend their datacenter capacity, deliver new services and applications, and ensure business continuity and disaster recovery.
May 31, 2018 | 11:00 AM – 11:45 AM PT – Using AWS IoT for Industrial Applications – Discover how you can quickly onboard your fleet of connected devices, keep them secure, and build predictive analytics with AWS IoT.
May 24, 2018 | 09:00 AM – 09:45 AM PT – Introducing AWS DeepLens – Learn how AWS DeepLens provides a new way for developers to learn machine learning by pairing the physical device with a broad set of tutorials, examples, source code, and integration with familiar AWS services.
May 30, 2018 | 11:00 AM – 11:45 AM PT – Accelerate Productivity by Computing at the Edge – Learn how AWS Snowball Edge support for compute instances helps accelerate data transfers, execute custom applications, and reduce overall storage costs.
This blog post was co-authored by Ujjwal Ratan, a senior AI/ML solutions architect on the global life sciences team.
Healthcare data is generated at an ever-increasing rate and is predicted to reach 35 zettabytes by 2020. Being able to cost-effectively and securely manage this data whether for patient care, research or legal reasons is increasingly important for healthcare providers.
Healthcare providers must have the ability to ingest, store and protect large volumes of data including clinical, genomic, device, financial, supply chain, and claims. AWS is well-suited to this data deluge with a wide variety of ingestion, storage and security services (e.g. AWS Direct Connect, Amazon Kinesis Streams, Amazon S3, Amazon Macie) for customers to handle their healthcare data. In a recent Healthcare IT News article, healthcare thought-leader, John Halamka, noted, “I predict that five years from now none of us will have datacenters. We’re going to go out to the cloud to find EHRs, clinical decision support, analytics.”
I realize simply storing this data is challenging enough. Magnifying the problem is the fact that healthcare data is increasingly attractive to cyber attackers, making security a top priority. According to Mariya Yao in her Forbes column, it is estimated that individual medical records can be worth hundreds or even thousands of dollars on the black market.
In this first of a 2-part post, I will address the value that AWS can bring to customers for ingesting, storing and protecting provider’s healthcare data. I will describe key components of any cloud-based healthcare workload and the services AWS provides to meet these requirements. In part 2 of this post we will dive deep into the AWS services used for advanced analytics, artificial intelligence and machine learning.
The data tsunami is upon us
So where is this data coming from? In addition to the ubiquitous electronic health record (EHR), the sources of this data include:
devices such as MRIs, x-rays and ultrasounds
sensors and wearables for patients
medical equipment telemetry
Additional sources of data come from non-clinical, operational systems such as:
claims and billing
Data from these sources can be structured (e.g., claims data) as well as unstructured (e.g., clinician notes). Some data comes across in streams such as that taken from patient monitors, while some comes in batch form. Still other data comes in near-real time such as HL7 messages. All of this data has retention policies dictating how long it must be stored. Much of this data is stored in perpetuity as many systems in use today have no purge mechanism. AWS has services to manage all these data types as well as their retention, security and access policies.
Imaging is a significant contributor to this data tsunami. Increasing demand for early-stage diagnoses along with aging populations drive increasing demand for images from CT, PET, MRI, ultrasound, digital pathology, X-ray and fluoroscopy. For example, a thin-slice CT image can be hundreds of megabytes. Increasing demand and strict retention policies make storage costly.
Due to the plummeting cost of gene sequencing, molecular diagnostics (including liquid biopsy) is a large contributor to this data deluge. Many predict that as the value of molecular testing becomes more identifiable, the reimbursement models will change and it will increasingly become the standard of care. According to the Washington Post article “Sequencing the Genome Creates so Much Data We Don’t Know What to do with It,”
“Some researchers predict that up to one billion people will have their genome sequenced by 2025 generating up to 40 exabytes of data per year.”
Although genomics is primarily used for oncology diagnostics today, it’s also used for other purposes, pharmacogenomics — used to understand how an individual will metabolize a medication.
It is increasingly challenging for the typical hospital, clinic or physician practice to securely store, process and manage this data without cloud adoption.
Amazon has a variety of ingestion techniques depending on the nature of the data including size, frequency and structure. AWS Snowball and AWS Snowmachine are appropriate for extremely-large, secure data transfers whether one time or episodic. AWS Glue is a fully-managed ETL service for securely moving data from on-premise to AWS and Amazon Kinesis can be used for ingesting streaming data.
Amazon S3, Amazon S3 IA, and Amazon Glacier are economical, data-storage services with a pay-as-you-go pricing model that expand (or shrink) with the customer’s requirements.
The above architecture has four distinct components – ingestion, storage, security, and analytics. In this post I will dive deeper into the first three components, namely ingestion, storage and security. In part 2, I will look at how to use AWS’ analytics services to draw value on, and optimize, your healthcare data.
A typical provider data center will consist of many systems with varied datasets. AWS provides multiple tools and services to effectively and securely connect to these data sources and ingest data in various formats. The customers can choose from a range of services and use them in accordance with the use case.
For use cases involving one-time (or periodic), very large data migrations into AWS, customers can take advantage of AWS Snowball devices. These devices come in two sizes, 50 TB and 80 TB and can be combined together to create a petabyte scale data transfer solution.
The devices are easy to connect and load and they are shipped to AWS avoiding the network bottlenecks associated with such large-scale data migrations. The devices are extremely secure supporting 256-bit encryption and come in a tamper-resistant enclosure. AWS Snowball imports data in Amazon S3 which can then interface with other AWS compute services to process that data in a scalable manner.
For use cases involving a need to store a portion of datasets on premises for active use and offload the rest on AWS, the Amazon storage gateway service can be used. The service allows you to seamlessly integrate on premises applications via standard storage protocols like iSCSI or NFS mounted on a gateway appliance. It supports a file interface, a volume interface and a tape interface which can be utilized for a range of use cases like disaster recovery, backup and archiving, cloud bursting, storage tiering and migration.
By using the AWS proposed reference architecture for disaster recovery, healthcare providers can ensure their data assets are securely stored on the cloud and are easily accessible in the event of a disaster. The “AWS Disaster Recovery” whitepaper includes details on options available to customers based on their desired recovery time objective (RTO) and recovery point objective (RPO).
AWS is an ideal destination for offloading large volumes of less-frequently-accessed data. These datasets are rarely used in active compute operations but are exceedingly important to retain for reasons like compliance. By storing these datasets on AWS, customers can take advantage of the highly-durable platform to securely store their data and also retrieve them easily when they need to. For more details on how AWS enables customers to run back and archival use cases on AWS, please refer to the following set of whitepapers.
A healthcare provider may have a variety of databases spread throughout the hospital system supporting critical applications such as EHR, PACS, finance and many more. These datasets often need to be aggregated to derive information and calculate metrics to optimize business processes. AWS Glue is a fully-managed Extract, Transform and Load (ETL) service that can read data from a JDBC-enabled, on-premise database and transfer the datasets into AWS services like Amazon S3, Amazon Redshift and Amazon RDS. This allows customers to create transformation workflows that integrate smaller datasets from multiple sources and aggregates them on AWS.
Healthcare providers deal with a variety of streaming datasets which often have to be analyzed in near real time. These datasets come from a variety of sources such as sensors, messaging buses and social media, and often do not adhere to an industry standard. The Amazon Kinesis suite of services, that includes Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics, are the ideal set of services to accomplish the task of deriving value from streaming data.
Example: Using AWS Glue to de-identify and ingest healthcare data into S3 Let’s consider a scenario in which a provider maintains patient records in a database they want to ingest into S3. The provider also wants to de-identify the data by stripping personally- identifiable attributes and store the non-identifiable information in an S3 bucket. This bucket is different from the one that contains identifiable information. Doing this allows the healthcare provider to separate sensitive information with more restrictions set up via S3 bucket policies.
To ingest records into S3, we create a Glue job that reads from the source database using a Glue connection. The connection is also used by a Glue crawler to populate the Glue data catalog with the schema of the source database. We will use the Glue development endpoint and a zeppelin notebook server on EC2 to develop and execute the job.
Step 1: Import the necessary libraries and also set a glue context which is a wrapper on the spark context:
Step 2: Create a dataframe from the source data. I call the dataframe “readmissionsdata”. Here is what the schema would look like:
Step 3: Now select the columns that contains indentifiable information and store it in a new dataframe. Call the new dataframe “phi”.
Step 4: Non-PHI columns are stored in a separate dataframe. Call this dataframe “nonphi”.
Step 5: Write the two dataframes into two separate S3 buckets
Once successfully executed, the PHI and non-PHI attributes are stored in two separate files in two separate buckets that can be individually maintained.
In 2016, 327 healthcare providers reported a protected health information (PHI) breach, affecting 16.4m patient records. There have been 342 data breaches reported in 2017 — involving 3.2 million patient records.
To date, AWS has released 51 HIPAA-eligible services to help customers address security challenges and is in the process of making many more services HIPAA-eligible. These HIPAA-eligible services (along with all other AWS services) help customers build solutions that comply with HIPAA security and auditing requirements. A catalogue of HIPAA-enabled services can be found at AWS HIPAA-eligible services. It is important to note that AWS manages physical and logical access controls for the AWS boundary. However, the overall security of your workloads is a shared responsibility, where you are responsible for controlling user access to content on your AWS accounts.
AWS storage services allow you to store data efficiently while maintaining high durability and scalability. By using Amazon S3 as the central storage layer, you can take advantage of the Amazon S3 storage management features to get operational metrics on your data sets and transition them between various storage classes to save costs. By tagging objects on Amazon S3, you can build a governance layer on Amazon S3 to grant role based access to objects using Amazon IAM and Amazon S3 bucket policies.
To learn more about the Amazon S3 storage management features, see the following link.
In the example above, we are storing the PHI information in a bucket named “phi.” Now, we want to protect this information to make sure its encrypted, does not have unauthorized access, and all access requests to the data are logged.
Encryption: S3 provides settings to enable default encryption on a bucket. This ensures any object in the bucket is encrypted by default.
Logging: S3 provides object level logging that can be used to capture all API calls to the object. The API calls are logged in cloudtrail for easy access and consolidation. Moreover, it also supports events to proactively alert customers of read and write operations.
Access control: Customers can use S3 bucket policies and IAM policies to restrict access to the phi bucket. It can also put a restriction to enforce multi-factor authentication on the bucket. For example, the following policy enforces multi-factor authentication on the phi bucket:
In Part 1 of this blog, we detailed the ingestion, storage, security and management of healthcare data on AWS. Stay tuned for part two where we are going to dive deep into optimizing the data for analytics and machine learning.
Backblaze’s rapid ingest service, Fireball, graduates out of public beta. Our device holds 70 terabytes of customer data and is perfect for migrating large data sets to B2 Cloud Storage.
At Backblaze, we like to put ourselves in the customer’s shoes. Specifically, we ask questions like “how can we make cloud storage more useful?” There is a long list of things we can do to help — over the last few weeks, we’ve addressed some of them when we lowered the cost of downloading data to $0.01 / GB. Today, we are pleased to publicly release our rapid ingest service, Fireball.
What is the Backblaze B2 Fireball?
The Fireball is a hardware device, specifically a NAS device. Any Backblaze B2 customer can order it from inside their account. The Fireball device can hold up to 70 terabytes of data. Upon ordering, it ships from a Backblaze data center to you. When you receive it, you can transfer your data onto the Fireball using your internal network. Once your data transfer is complete, you send it back to a Backblaze data center. Finally, inside our secure data center, your data is uploaded from the Fireball to your account. Your data remains encrypted throughout the process. Step by step instructions can be found here.
Why Use the Fireball?
“We would not have been able to get this project off the ground without the B2 Fireball.” — James Cole, KLRU (Austin City Limits)
For most customers, transferring large quantities of data isn’t always simple. The need can arise as you migrate off of legacy systems (e.g. replacing LTO) or simply on a project basis (e.g. transferring video shot in the field to the cloud). An common approach is to upload your data via the internet to the cloud storage vendor of your choosing. While cloud storage vendors don’t charge for uploads, you have to pay your network provider for bandwidth. That’s assuming you are in a place where the bandwidth can be secured.
Your data is stored in megabytes (“MB”) but your bandwidth is measured in megabits per second (“Mbps”). The difference? An 80 Mbps upload connection will transfer no more than 10 MB per second. That means, in your best case scenario, you might be able to upload 50 terabytes in 50 days, assuming you use nearly all of your upload bandwidth for the upload.
If you’re looking to migrate old backups from LTO or even a large project, a 3 month lag time is not operationally viable. That’s why multiple cloud storage providers have introduced rapid ingest devices.
How It Compares: Backblaze B2 Fireball vs AWS Snowball vs GCS Transfer Appliance
“We found the B2 Fireball much simpler and easier to use than Amazon’s Snowball. WunderVu had been looking for a cloud solution for security and simplicity, and B2 hit every check box.” — Aaron Rhodes, Executive Producer, WunderVu
Every vendor that offers a rapid ingest service only lets you upload to that vendor’s cloud. For example, you can’t use an Amazon Snowball to upload to Google Cloud Storage. This means that when considering a rapid ingest service, you are also making a decision on what cloud storage vendor to use. As such, one should consider not only the cost of the rapid ingest service, but also how much that vendor is going to charge you to store and download your data.
Cloud Storage $/GB/Month
$550 (30 day rental)
$200 (10 day rental)
$300 (10 day rental)
*AWS does not estimate shipping fees at the time of the Snowball order.
To make the comparison easier, let’s create a hypothetical case and compare the costs incurred in the first year. Assume you have 100 TB as an initial upload. But that’s just the initial upload. Over the course of the year, let’s consider a usage pattern where every month you add 5 TB, delete 2 TBs, and download 10 TBs.
Cloud Storage Fees
Total Transfer + Cloud Storage Fees
$1,250 (2 Fireballs)
$400 (2 Snowballs)
$800 (1 transit)
Just looking at the first year, Amazon is 337% more expensive than Backblaze and Google is 374% more expensive than Backblaze.
Put simply, Backblaze offers the lowest cost, high performance cloud storage on the planet. During our public beta of the Fireball program we’ve had extremely positive feedback around how the Fireball enables customers to get their projects started in a time efficient and cost effective way. We hope you’ll give it a try!
AWS has achieved Spain’s Esquema Nacional de Seguridad (ENS) High certification across 29 services. To successfully achieve the ENS High Standard, BDO España conducted an independent audit and attested that AWS meets confidentiality, integrity, and availability standards. This provides the assurance needed by Spanish Public Sector organizations wanting to build secure applications and services on AWS.
The National Security Framework, regulated under Royal Decree 3/2010, was developed through close collaboration between ENAC (Entidad Nacional de Acreditación), the Ministry of Finance and Public Administration and the CCN (National Cryptologic Centre), and other administrative bodies.
The following AWS Services are ENS High accredited across our Dublin and Frankfurt Regions:
AWS has updated its certifications against ISO 9001, ISO 27001, ISO 27017, and ISO 27018 standards, bringing the total to 67 services now under ISO compliance. We added the following 29 services this cycle:
AWS maintains certifications through extensive audits of its controls to ensure that information security risks that affect the confidentiality, integrity, and availability of company and customer information are appropriately managed.
You can download copies of the AWS ISO certificates that contain AWS’s in-scope services and Regions, and use these certificates to jump-start your own certification efforts:
The Paris Region will benefit from three AWS Direct Connect locations. Telehouse Voltaire is available today. AWS Direct Connect will also become available at Equinix Paris in early 2018, followed by Interxion Paris.
All AWS infrastructure regions around the world are designed, built, and regularly audited to meet the most rigorous compliance standards and to provide high levels of security for all AWS customers. These include ISO 27001, ISO 27017, ISO 27018, SOC 1 (Formerly SAS 70), SOC 2 and SOC 3 Security & Availability, PCI DSS Level 1, and many more. This means customers benefit from all the best practices of AWS policies, architecture, and operational processes built to satisfy the needs of even the most security sensitive customers.
AWS is certified under the EU-US Privacy Shield, and the AWS Data Processing Addendum (DPA) is GDPR-ready and available now to all AWS customers to help them prepare for May 25, 2018 when the GDPR becomes enforceable. The current AWS DPA, as well as the AWS GDPR DPA, allows customers to transfer personal data to countries outside the European Economic Area (EEA) in compliance with European Union (EU) data protection laws. AWS also adheres to the Cloud Infrastructure Service Providers in Europe (CISPE) Code of Conduct. The CISPE Code of Conduct helps customers ensure that AWS is using appropriate data protection standards to protect their data, consistent with the GDPR. In addition, AWS offers a wide range of services and features to help customers meet the requirements of the GDPR, including services for access controls, monitoring, logging, and encryption.
From Our Customers Many AWS customers are preparing to use this new Region. Here’s a small sample:
Societe Generale, one of the largest banks in France and the world, has accelerated their digital transformation while working with AWS. They developed SG Research, an application that makes reports from Societe Generale’s analysts available to corporate customers in order to improve the decision-making process for investments. The new AWS Region will reduce latency between applications running in the cloud and in their French data centers.
SNCF is the national railway company of France. Their mobile app, powered by AWS, delivers real-time traffic information to 14 million riders. Extreme weather, traffic events, holidays, and engineering works can cause usage to peak at hundreds of thousands of users per second. They are planning to use machine learning and big data to add predictive features to the app.
Radio France, the French public radio broadcaster, offers seven national networks, and uses AWS to accelerate its innovation and stay competitive.
Les Restos du Coeur, a French charity that provides assistance to the needy, delivering food packages and participating in their social and economic integration back into French society. Les Restos du Coeur is using AWS for its CRM system to track the assistance given to each of their beneficiaries and the impact this is having on their lives.
AlloResto by JustEat (a leader in the French FoodTech industry), is using AWS to to scale during traffic peaks and to accelerate their innovation process.
AWS Consulting and Technology Partners We are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in France. Here’s a partial list:
AWS in France We have been investing in Europe, with a focus on France, for the last 11 years. We have also been developing documentation and training programs to help our customers to improve their skills and to accelerate their journey to the AWS Cloud.
As part of our commitment to AWS customers in France, we plan to train more than 25,000 people in the coming years, helping them develop highly sought after cloud skills. They will have access to AWS training resources in France via AWS Academy, AWSome days, AWS Educate, and webinars, all delivered in French by AWS Technical Trainers and AWS Certified Trainers.
Use it Today The EU (Paris) Region is open for business now and you can start using it today!
Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging
Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.
However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.
If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.
What is event-driven computing?
Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.
Which AWS services publish events to SNS natively?
Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.
Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster.As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).
AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers.As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.
Elastic Load Balancing:Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses.You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.
Amazon S3:Object storage built to store and retrieve any amount of data.You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF).In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.
Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud.You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically.Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.
Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup.You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS helps you eliminate polling your Amazon Glacier vault to check whether your job has been completed, or not. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.
AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data.You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers.As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.
Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud.RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints.As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.
Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud.ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect.To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.
Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools.Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a ready-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold.At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.
AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database.DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.
Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources.You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.
AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections.You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.
In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, including:
Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.
Amazon SNS is a pub/sub messaging service that can be used as an event-driven computing hub to AWS customers worldwide. By capturing events natively triggered by AWS services, such as EC2, S3 and RDS, you can automate and optimize all kinds of workflows, namely scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.
Our Health Customer Stories page lists just a few of the many customers that are building and running healthcare and life sciences applications that run on AWS. Customers like Verge Health, Care Cloud, and Orion Health trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.
Sixteen More Services In my last HIPAA Eligibility Update I shared the news that we added eight additional services to our list of HIPAA eligible services. Today I am happy to let you know that we have added another sixteen services to the list, bringing the total up to 46. Here are the newest additions, along with some short descriptions and links to some of my blog posts to jog your memory:
Deriving insights from large datasets is central to nearly every industry, and life sciences is no exception. To combat the rising cost of bringing drugs to market, pharmaceutical companies are looking for ways to optimize their drug development processes. They are turning to big data analytics to better quantify the effect that their drug compounds have on different populations and to look for new clinical indications for existing drugs.
A real world evidence (RWE) platform is loosely defined as an integrated set of services and products that life sciences companies use to securely acquire, store, and analyze large, often disparate datasets to gain insight into the functions of a specific drug or intervention. The petabyte-scale data generated from wearables, medical devices, genomics, clinical imaging, and claims (to name a few) allows pharmaceutical and other life sciences companies to build big data platforms to analyze these datasets. And many of them are doing it on AWS.
At the center of nearly every RWE platform is a data lake that houses different data types. It also stores the related metadata to identify where each piece of data came from, who owned it, etc. Analytics engines integrate the relevant streaming (e.g., wearables), structured (e.g., claims data), and unstructured (e.g., notes in electronic health records) data.
In this post, I highlight common architectural patterns that customers are using to maximize the value of real world evidence on AWS. The architecture presented here can be reproduced in multiple regions, so you can respect local data sovereignty requirements, when applicable, while conducting global studies. This post doesn’t cover all considerations for real world evidence (such as security and authentication), but instead focuses on the areas that are related to your data flow.
A data lake is your source of truth
The ability to integrate disparate data types is critical to maximizing the utility of RWE. Life sciences companies have to be able to store, search, and retrieve data of different types and sizes, including (but not limited to) the following:
Streaming data from wearables and medical devices
Structured data from genomics and claims data
Unstructured data from notes in electronic health records
A common solution to integrate these data types is a data lake. Data lakes allow organizations to store all their data, regardless of data type, in a centralized repository. Because data can be stored as-is, there’s no need to convert it to a predefined schema. And you no longer need to know what questions you want to ask of your data beforehand. You can use data lakes for ad hoc analyses, so you can quickly explore and discover new insights without needing to structure the data first, as you would with a traditional data warehouse.
Although there are many uses for storing real world evidence data in a data lake, here are a few examples of the processes it can facilitate for pharma and biotech companies:
Quickly access all data for a given subject, from clinical images to genomics to claims data.
Associate incoming RWE data with existing data in your RWE data lake, such as by subject or study.
Select specific windows in longitudinal studies (microbiome, metabolomics, etc.).
The following diagram shows an architecture of the features within a data lake and several examples of how data enters a data lake:
This architecture, which is entirely serverless and backed by Amazon S3, lets you scale your data lake to easily accommodate any data size for your RWE platform. Additionally, the components presented in this diagram can be secured via IAM controls and service policies. This enables you to secure and protect the sensitive information that often resides within an RWE platform. Amazon S3 is the canonical source for objects in your RWE data lake. You track and search metadata that is associated with these objects in a data catalog built on Amazon Elasticsearch Service and Amazon DynamoDB.
Let’s dive deeper into what this architecture does and how data makes it into the data lake:
Streaming (e.g., wearables), structured, and unstructured data is acquired from myriad devices and sources. Depending on the data size, you might use AWS IoT (streaming), AWS Storage Gateway (mid-size/continual batch), and AWS Snowball (large legacy datasets, such as imaging). AWS IoT writes to Amazon Kinesis Firehose, which transforms the telemetry data in-flight to land both transformed and raw data in Amazon S3.
When data lands in Amazon S3 buckets, an AWS Lambda function is invoked (either by trigger or manually).
This Lambda function writes to a data catalog that is fronted by Amazon API Gateway. The data catalog contains metadata about all the object data in Amazon S3, as well as data that resides in databases, such as Amazon Redshift.
An AWS Lambda function on the other side of API Gateway writes the appropriate metadata about the objects, such as the study that the data was generated from, into Amazon Elasticsearch Service and/or Amazon DynamoDB, which I refer to as the data catalog.
The data catalog mentioned in steps 3 and 4 is central to your data management. In most analyses, the first step is to build a data manifest of where the data-of-interest lies, which is discussed in later sections.
Normalizing data for real world evidence
As I previously mentioned, data can enter your real world evidence data lake in many different formats. In many cases, you might need to normalize this data into a specific format to ease downstream analysis. For example, you might have to integrate many different electronic health records that are stored in different formats. You also might have to transform genomics data, often represented as a variant call format (VCF) file, into a format that’s easier for big data technologies like Apache Parquet to query. While you can certainly analyze this data in the format in which it entered, it might be better for querying after it’s transformed into a different format, or into a different data store.
In the architecture shown in the following diagram, AWS Step Functions orchestrates the data normalization (extract, transform, load—or ETL) process. Step Functions is a serverless workflow service that ensures that your long-running ETL jobs execute in order and complete successfully. At a high level, you first query the data catalog to get a manifest of the data to be normalized. Normalization occurs on Amazon EC2 instances, and the results are then stored back in Amazon S3 or in databases such as Amazon Redshift for future analysis. Locations of these results data are also logged in the data catalog for future querying.
Here is an example Step Functions state machine for this process:
The following is the accompanying JSON that produces it:
Either a user or an application submits a POST request through Amazon API Gateway to process or transform a set of data. This request contains the query parameters that correspond to generating the data manifest. It is passed into an AWS Step Functions state machine, which orchestrates the ETL process.
A Lambda function queries the data catalog and builds a manifest that contains the location of the data of interest.
The manifest is passed to downstream Lambda functions in the state machine that orchestrate the batch workflow to process the data that’s specified in the manifest.
These Lambda functions submit jobs to AWS Batch to execute batch jobs on Amazon EC2.
Amazon EC2 processes the data (e.g., data normalization) and stores results back in Amazon S3 and/or Amazon Redshift.
Results data is then logged in the data catalog, using the process shown in the data acquisition diagram.
Analyzing and visualizing data in real world evidence
Ultimately, the value in real world evidence platforms is the value you derive from the wide variety of data that feeds into it. Big data analytics, such as Spark on EMR, machine learning with the Deep Learning AMIs, and healthcare data warehouses, are all fundamental to maximizing the value of RWE.
Given the wide array of questions you can answer, you want your architecture to be as flexible as possible. Your organization also might use business intelligence (BI) tools. Again, as with the data normalization tier, you can use Step Functions to orchestrate your workflow. You first build the data manifest and then submit each portion of your desired analysis to the relevant data location and compute options, such as Amazon EMR or Amazon Redshift. Your organization’s BI tool of choice, such as Amazon QuickSight or one offered by our AWS Big Data Competency partners, can extract and visualize the results.
Here is the workflow in more detail:
A user initiates a query on a dataset. In the context of RWE, this is largely analyzing a cohort in the context of a specific indication (drug response, etc.).
AWS Step Functions invokes a Lambda function to query the data catalog and build a manifest of data.
The manifest is passed to subsequent Lambda functions, which orchestrate the data analysis through different AWS services.
These analyses can include Amazon EMR for population-scale genomics, Amazon EC2 for HPC and machine learning, and Amazon Redshift for your healthcare data warehouse.
Results from each analysis are staged back in Amazon S3 and logged in the data catalog.
BI tools like Amazon QuickSight or Tableau, or data analysis workbooks like Jupyter can query Amazon S3 to visualize results, such as through Amazon Athena.
A practical example
Imagine that you are at a pharmaceutical company that is developing a drug to address a neurological condition. You have longitudinal data in the form of brain scans and have noticed that the drug compound under development seems to benefit a specific subpopulation of individuals. As a result, you determine to sequence the genomes of all the responders and non-responders to look for specific biomarkers that could indicate response levels. Success would result in a companion diagnostic that you could use alongside your drug to increase its overall efficacy by delivering it only to the responding population.
Let’s briefly look at how this study might play out in the steps of data acquisition, normalization, and analysis described earlier.
Data from longitudinal brain imaging studies can be moved to AWS using AWS Snowball. Genomics data coming off your genome sequencers would land in Amazon S3 via AWS Storage Gateway and/or AWS Direct Connect. These datasets are stored in your data catalog where you track items such as date generated, anonymized subject ID, etc.
Genomes are transformed from raw reads (e.g., FASTQ) to a human readable format (VCF) that identifies variations in a genome. These VCF files are subsequently extracted, transformed, and loaded into a format that is amenable to big data analytics, such as Parquet.
You build a data manifest of your brain scans in Amazon S3. You use the deep learning AMI and P2 instance family to build machine learning models to identify images that represent different stages in your disorder-of-interest. You then run association analyses that combine your genomics data with your brain imaging models and drug response data to identify what genomic variants associate with improved treatment outcomes. You can manage these analyses via Jupyter notebooks, or you can connect with your BI tool of choice to visualize your results.
It will always be day one for RWE
By implementing real world evidence platforms on AWS, you can quickly integrate and interrogate disparate healthcare data to advance human health. By designing your RWE platform to be flexible, you can readily incorporate new big data technologies as they become relevant to your needs. This enables you to innovate and discover at a quicker rate.
The healthcare ecosystem has chosen a variety of tools and techniques for working with big data, but one tool that comes up again and again in many of the architectures we design and review is Spark on Amazon EMR. Will spark power the data behind precision medicine?
About the Authors
Dr. Aaron Friedman is a Healthcare and Life Sciences Partner Solutions Architect at Amazon Web Services. He works with ISVs and SIs to architect healthcare solutions on AWS, and bring the best possible experience to their customers. His passion is working at the intersection of science, big data, and software. In his spare time, he’s exploring the outdoors, learning a new thing to cook, or spending time with his wife and his dog, Macaroon.
Do you know about the AWS Support Knowledge Center? It contains answers to some of the most frequently asked questions and other requests asked of our support team. Many of the answers even include a short video that serves to illustrate the process or to provide additional info on the topic.
For example, I recently stepped in to our studio and created a new video called Preparing to Send a Snowball Back to AWS. In 90 action-packed seconds, this video shows you how to power down the Snowball, stow the cables, lock the back panel, and verify that the proper return address is on the built-in display:
Visit the Knowledge Center to see other videos and to find answers to other questions that you might have about AWS.
Last week we launched our 15th AWS Region and today we are launching our 16th. We have expanded the AWS footprint into the United Kingdom with a new Region in London, our third in Europe. AWS customers can use the new London Region to better serve end-users in the United Kingdom and can also use it to store data in the UK.
From Our Customers Many AWS customers are getting ready to use this new Region. Here’s a very small sample:
Trainline is Europe’s number one independent rail ticket retailer. Every day more than 100,000 people travel using tickets bought from Trainline. Here’s what Mark Holt (CTO of Trainline) shared with us:
We recently completed the migration of 100 percent of our eCommerce infrastructure to AWS and have seen awesome results: improved security, 60 percent less downtime, significant cost savings and incredible improvements in agility. From extensive testing, we know that 0.3s of latency is worth more than 8 million pounds and so, while AWS connectivity is already blazingly fast, we expect that serving our UK customers from UK datacenters should lead to significant top-line benefits.
Kainos Evolve Electronic Medical Records (EMR) automates the creation, capture and handling of medical case notes and operational documents and records, allowing healthcare providers to deliver better patient safety and quality of care for several leading NHS Foundation Trusts and market leading healthcare technology companies.
Travis Perkins, the largest supplier of building materials in the UK, is implementing the biggest systems and business change in its history including the migration of its datacenters to AWS.
Just Eat is the world’s leading marketplace for online food delivery. Using AWS, JustEat has been able to experiment faster and reduce the time to roll out new feature updates.
OakNorth, a new bank focused on lending between £1m-£20m to entrepreneurs and growth businesses, became the UK’s first cloud-based bank in May after several months of working with AWS to drive the development forward with the regulator.
Partners I’m happy to report that we are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in the United Kingdom. Here’s a partial list:
AWS Premier Consulting Partners – Accenture, Claranet, Cloudreach, CSC, Datapipe, KCOM, Rackspace, and Slalom.
AWS Consulting Partners – Attenda, Contino, Deloitte, KPMG, LayerV, Lemongrass, Perfect Image, and Version 1.
AWS Managed Service Partners – Claranet, Cloudreach, KCOM, and Rackspace.
AWS Direct Connect Partners – AT&T, BT, Hutchison Global Communications, Level 3, Redcentric, and Vodafone.
Here are a few examples of what our partners are working on:
KCOM is a professional services provider offering consultancy, architecture, project delivery and managed service capabilities to large UK-based enterprise businesses. The scalability and flexibility of AWS gives them a significant competitive advantage with their enterprise and public sector customers. The new Region will allow KCOM to build innovative solutions for their public sector clients while meeting local regulatory requirements.
Splunk is a member of the AWS Partner Network and a market leader in analyzing machine data to deliver operational intelligence for security, IT, and the business. They use cloud computing and big data analytics to help their customers to embrace digital transformation and continuous innovation. The new Region will provide even more companies with real-time visibility into the operation of their systems and infrastructure.
Redcentric is a NHS Digital-approved N3 Commercial Aggregator. Their work allows health and care providers such as NHS acute, emergency and mental trusts, clinical commissioning groups (CCGs), and the ISV community to connect securely to AWS. The London Region will allow health and care providers to deliver new digital services and to improve outcomes for citizens and patients.
Compliance & Connectivity Every AWS Region is designed and built to meet rigorous compliance standards including ISO 27001, ISO 9001, ISO 27017, ISO 27018, SOC 1, SOC 2, SOC3, PCI DSS Level 1, and many more. Our Cloud Compliance page includes information about these standards, along with those that are specific to the UK, including Cyber Essentials Plus.
The UK Government recognizes that local datacenters from hyper scale public cloud providers can deliver secure solutions for OFFICIAL workloads. In order to meet the special security needs of public sector organizations in the UK with respect to OFFICIAL workloads, we have worked with our Direct Connect Partners to make sure that obligations for connectivity to the Public Services Network (PSN) and N3 can be met.
Use it Today The London Region is open for business now and you can start using it today! If you need additional information about this Region, please feel free to contact our UK team at [email protected].
We are growing the AWS footprint once again. Our new Canada (Central) Region is now available and you can start using it today. AWS customers in Canada and the northern parts of the United States have fast, low-latency access to the suite of AWS infrastructure services.
The Region supports all sizes of C4, D2, M4, T2, and X1 instances.
As part of our on-going focus on making cloud computing available to you in an environmentally friendly fashion, AWS data centers in Canada draw power from a grid that generates 99% of its electricity using hydropower (read about AWS Sustainability to learn more).
Well Connected After receiving a lot of positive feedback on the network latency metrics that I shared when we launched the AWS Region in Ohio, I am happy to have a new set to share as part of today’s launch (these times represent a lower bound on latency and may change over time).
The first set of metrics are to other Canadian cities:
9 ms to Toronto.
14 ms to Ottawa.
47 ms to Calgary.
49 ms to Edmonton.
60 ms to Vancouver.
The second set are to locations in the US:
9 ms to New York.
19 ms to Chicago.
16 ms to US East (Northern Virginia).
27 ms to US East (Ohio).
75 ms to US West (Oregon).
Canada is also home to CloudFront edge locations in Toronto, Ontario, and Montreal, Quebec.
And Canada Makes 15 Today’s launch brings our global footprint to 15 Regions and 40 Availability Zones, with seven more Availability Zones and three more Regions coming online through the next year. As a reminder, each Region is a physical location where we have two or more Availability Zones or AZs. Each Availability Zone, in turn, consists of one or more data centers, each with redundant power, networking, and connectivity, all housed in separate facilities. Having two or more AZ’s in each Region gives you the ability to run applications that are more highly available, fault tolerant, and durable than would be the case if you were limited to a single AZ.
Nous étendons la portée d’AWS une fois de plus. Notre nouvelle Région du Canada (Centre) est maintenant disponible et vous pouvez commencer à l’utiliser dès aujourd’hui. Les clients d’AWS au Canada et dans les régions du nord des États-Unis ont un accès rapide et à latence réduite à l’ensemble des services d’infrastructure AWS.
La région supporte toutes les tailles des instances C4, D2, M4, T2 et X1.
Dans le cadre de notre mission continue de vous offrir des services infonuagiques de manière écologique, les centres de données d’AWS au Canada sont alimentés par un réseau électrique dont 99 pour cent de l’énergie fournie est de nature hydroélectrique (consultez AWS Sustainability pour en savoir plus).
Bien connecté Après avoir reçu beaucoup de commentaires positifs sur les mesures de latence du réseau dont je vous ai fait part lorsque nous avons lancé la région AWS en Ohio, je suis heureux de vous faire part d’un nouvel ensemble de mesures dans le cadre du lancement d’aujourd’hui (ces mesures représentent une limite inférieure à la latence et pourraient changer au fil du temps).
Le premier ensemble de mesures concerne d’autres villes canadiennes:
9 ms à Toronto.
14 ms à Ottawa.
47 ms à Calgary.
49 ms à Edmonton.
60 ms à Vancouver.
Le deuxième ensemble concerne des emplacements aux États-Unis :
9 ms à New York.
19 ms à Chicago.
16 ms à USA Est (Virginie du Nord).
27 ms à USA Est (Ohio).
75 ms à USA Ouest (Oregon).
Le Canada compte également des emplacements périphériques CloudFront à Toronto, en Ontario, et à Montréal, au Québec.
Et le Canada fait 15 Le lancement d’aujourd’hui porte notre présence mondiale à 15 régions et 40 zones de disponibilité avec sept autres zones de disponibilité et trois autres régions qui seront mises en opération au cours de la prochaine année. Pour vous rafraîchir la mémoire, chaque région est un emplacement physique où nous avons deux ou plusieurs zones de disponibilité. Chaque zone de disponibilité, à son tour, comprend un ou plusieurs centres de données, chacun doté d’une alimentation, d’une mise en réseau et d’une connectivité redondantes dans des installations distinctes. Avoir deux zones de disponibilité ou plus dans chaque région vous donne la possibilité d’opérer des applications qui sont plus disponibles, plus tolérantes aux pannes et plus durables qu’elles ne le seraient si vous étiez limité à une seule zone de disponibilité.
Moving large amounts of on-premises data to the cloud as part of a migration effort is still more challenging than it should be! Even with high-end connections, moving petabytes or exabytes of film vaults, financial records, satellite imagery, or scientific data across the Internet can take years or decades. On the business side, adding new networking or better connectivity to data centers that are scheduled to be decommissioned after a migration is expensive and hard to justify.
However, customers with exabyte-scale on-premises storage look at the 80 TB, do the math, and realize that an all-out data migration would still require lots of devices and some headache-inducing logistics.
Introducing AWS Snowmobile In order to meet the needs of these customers, we are launching Snowmobile today. This secure data truck stores up to 100 PB of data and can help you to move exabytes to AWS in a matter of weeks (you can get more than one if necessary). Designed to meet the needs of our customers in the financial services, media & entertainment, scientific, and other industries, Snowmobile attaches to your network and appears as a local, NFS-mounted volume. You can use your existing backup and archiving tools to fill it up with data destined for Amazon Simple Storage Service (S3) or Amazon Glacier.
Physically, Snowmobile is a ruggedized, tamper-resistant shipping container 45 feet long, 9.6 feet high, and 8 feet wide. It is water-proof, climate-controlled, and can be parked in a covered or uncovered area adjacent to your existing data center. Each Snowmobile consumes about 350 KW of AC power; if you don’t have sufficient capacity on site we can arrange for a generator.
On the security side, Snowmobile incorporates multiple layers of logical and physical protection including chain-of-custody tracking and video surveillance. Your data is encrypted with your AWS Key Management Service (KMS) keys before it is written. Each container includes GPS tracking, with cellular or satellite connectivity back to AWS. We will arrange for a security vehicle escort when the Snowmobile is in transit; we can also arrange for dedicated security guards while your Snowmobile is on-premises.
Each Snowmobile includes a network cable connected to a high-speed switch capable of supporting 1 Tb/second of data transfer spread across multiple 40 Gb/second connections. Assuming that your existing network can transfer data at that rate, you can fill a Snowmobile in about 10 days.
Snowmobile in Action I don’t happen to have an exabyte-scale data center and I certainly don’t have room next to my house for a 45 foot long container. In order to illustrate the process of arranging for and using a Snowmobile, I sat down at my LEGO table and (in the finest Doc Brown tradition) built a scale model. I hope that you enjoy this brick-based story telling!
Let’s start in your data center. It was built a while ago and is definitely showing its age. The racks are full of disk and tape drives of multiple vintages, each storing precious, mission-critical data. You and your colleagues spend too much time inside of the raised floor, tracking cables and trying to squeeze out just a bit more performance:
Your manager is getting frustrated and does not know what to do next:
Fortunately, one of your colleagues reads this blog every day and she knows just what to do:
A quick phone call to AWS and a meeting is set up:
Everyone gets together at a convenient AWS office to learn more about Snowmobile and to plan the migration:
Everyone gathers around to look at the scale model of the Snowmobile. Even the dog is intrigued, and your manager takes a picture:
A Snowmobile shows up at your data center:
AWS Professional Services helps you to get it connected and you initiate the data transfer:
The Snowmobile heads back to AWS and your data is imported as you specified!
Snowmobile at DigitalGlobe Our friends at DigitalGlobe are using a Snowmobile to move 100 PB of satellite imagery to AWS. Here’s what Jay Littlepage (former Amazonian and now VP of Infrastructure & Operations at DigitalGlobe) has to say about this effort:
Like many large enterprises, we are in the process of migrating IT operations from our data centers to AWS. Our geospatial big data platform, GBDX, has been based in AWS since inception. But our unmatchable 16-year archive of high-resolution satellite imagery, visualizing 6 billion square kilometers of the Earth’s surface, has been stored within our facilities. We have slowly been migrating our archive to AWS but that process has been slow and inefficient. Our constellation of satellites generate more earth imagery each year (10 PB) than we have been able to migrate by these methods.
We needed a solution that could move our 100 PB archive but could not find one until now with AWS Snowmobile. DigitalGlobe is currently migrating our entire raw imagery archive with one Snowmobile transfer directly into an Amazon Glacier Vault. AWS Snowmobile operators are providing an amazing customized service where they manage the configuration, monitoring, and logistics. Using Snowmobile’s data transfer abilities will get our time-lapse imagery archive to the cloud more quickly, allowing our customers and partners to have access to uniquely massive data sets. By using AWS’ elastic computing platform within GBDX, we will run distributed image analysis, revealing the pace and pattern of world-wide change on an extraordinary scale, with unprecedented speed, in a more cost-effective manner – prioritizing insights over infrastructure. Without Snowmobile, we would not have been able to transfer our extremely large volume of data in such a short time or create new business opportunities for our customers. Snowmobile is truly a game changer!
Things to Know Here are a couple of final things you should know about Snowmobile:
Data Export – The initial launch is aimed at data import (on-premises to AWS). We do know that some of our customers are interested in data export, with a particular focus on disaster recovery (DR) use cases.
Availability – Snowmobile is available in all AWS Regions. As you can see from reading the previous section, this is not a self-serve product. My AWS Sales colleagues are ready to discuss your data import needs with you.
Pricing – I don’t have pricing info to share. However, we intend to make sure that Snowmobile is both faster and less expensive than using a network-based data transfer model.
While all of these improvements were important they did not change the basic character of the appliance. Over the past year or so, with many AWS customers putting the original Snowball to work in many different types of physical environments and on a very wide variety of migration, big data, genomics, and data collection workloads, we have seen that there’s room to make this appliance even more functional.
Many customers are generating large amounts of data (often hundreds of terabytes) in situations where network connectivity is limited or non-existent and the physical environment is extreme. Customers want to collect data that they generate in farms, factories, hospitals, aircraft, and oil wells. From shop floor metrics to video surveillance, to information collected by IoT devices, customers are interested in a model that goes beyond straightforward store-and-forward data to collection, and would like to be able to do some local processing as the data arrives. They want to filter, clean, analyze, organize, track, summarize, and monitor the data as it arrives. They want to scan incoming data for patterns or problems, and raise alerts quickly if something interesting is detected.
New Snowball Edge Today we are adding Snowball Edge to the lineup. This appliance expands the scope of the Snowball, adding more connectivity, more storage, horizontal scalability via clustering, new storage endpoints that can be accessed from existing S3 and NFS clients, and Lambda-powered local processing.
Physically, Snowball Edge is designed to be at home in rough-and-tumble industrial, aerospace, agricultural, and military environments. The new form factor is also suitable for rack mounting in situations where you are taking advantage of the new clustering feature.
Let’s take a quick look at all of the new features!
More Connectivity This appliance is well-connected, and gives you plenty of high-speed options. On the network side you can use 10GBase-T, 10 or 25 Gb SFP28, or 40 Gb QSFP+. Your IoT devices can upload data using 3G cellular or Wi-Fi. If that’s not enough, you can connect Zigbee, other IoT receivers, and external storage devices to the USB 3.0 port or the PCIe expansion port.
You have access to enough connectivity to copy data to Snowball Edge at up to 14 Gb per second; you can copy 100 TB in 19 hours or so. Beginning-to-end (initiate data transfer to data available in S3) the entire process takes a week, including shipping and handling along the way.
More Storage Snowball Edge includes 100 TB of storage.
Horizontal Scaling via Clustering You can easily configure 2 or more Snowball Edge appliances into a cluster to add capacity and to increase durability, while keeping all of the storage accessible through a single endpoint. For example, clustering 6 appliances will create a highly available cluster with 400 TB of storage and 99.999% durability. This allows you to remove 2 of the appliances and still keep your data protected.
You can grow clusters up to petabyte scale, and you can shrink them by simply removing and returning appliances. The clusters are self-managing and you don’t need to worry about software updates or other tedium.
Order a cluster by simply checking Local compute and storage only and Make this a cluster when you set up your job:
New Storage Endpoints (S3 and NFS) If you have existing backup, archiving, or data transfer tools that “speak” S3 or NFS, you can now use them to store and access data stored on Snowball Edge. If you create a cluster of 2 or more appliances, the same endpoint applies to all of them; this allows you to think of your cluster as local, network-attached storage.
Snowball Edge supports a powerful subset of the S3 API, including LIST, GET, PUT, DELETE, HEAD, and Multipart Upload. It also supports NFS v3 and NFS 4.1.
When you use Snowball Edge as a file storage gateway and access it via NFS, file and directory metadata (permissions, ownership, and timestamps) is mapped to S3 metadata, and preserved when the data is ingested to S3. You can use this feature to migrate data, bootstrap your usage of AWS Storage Gateway, or to store on-premises files for sharing between on-premises apps.
Lambda-Powered Local Processing You can now write AWS Lambda functions in Python and use them to process data as it is uploaded to an S3 bucket associated with a Snowball Edge.
The functions can (as I hinted at earlier) filter, clean, analyze, organize, track, summarize the data as it arrives. Snowball Edge gives you the ability to add intelligence and sophistication to your data collection and data processing systems.
We are starting out with support for the S3 PUT operation, and you can use one function per bucket. The functions must be written in Python, and are run in a Lambda environment that is configured for 128 MB of memory.
You configure your functions when you order your Snowball Edge:
We do recommend that you test your functions in the cloud before placing your order.
Pricing and Availability Snowball Edge is designed to be deployed in plug-and-play fashion. Your colleagues in the field don’t have to configure or administer it. The on-board LCD panel displays status information and plays setup videos. The on-board code is self-updating; there’s no routine software maintenance. You can check the status and make late-breaking configuration changes to deployed appliances through the AWS Management Console (API and CLI access is also available).
Each Snowball Edge job costs $300 plus shipping. You can keep each appliance for up to 10 days; after that you’ll be charged $30 per appliance per day. You can run Lambda functions locally at no charge.
Many of the tools and technologies now in use at your local doctor, dentist, hospital, or other healthcare provider generate massive amounts of sensitive digital data. Other prolific data generators include genomic sequencers and any number of activity and fitness trackers. We all want to benefit from the insights that can be produced by this “data tsunami,” but we also want to be confident that it will be stored in a protected fashion and processed in a responsible manner.
In the United States, protection of healthcare data is governed by HIPAA (the Health Insurance Portability and Accountability Act). Because many AWS customers would like to store and process sensitive health care data on the cloud, we have worked to make multiple AWS services HIPAA-eligible; this means that the services can be used to process Protected Health Information (PHI) and to build applications that are HIPAA-compliant (read HIPAA in the Cloud to learn more about what Cleveland Clinic, Orion Health, Eliza, Philips, and other AWS customers are doing).
Last year I introduced you to AWS Import/Export Snowball. This is an AWS-owned storage appliance that you can use to move large amounts of data (generally 10 terabytes or more) to AWS on a one-time or recurring basis. You simply request a Snowball from the AWS Management Console, connect it to your network when it arrives, copy your data to it, and then send it back to us so that we can copy the data to the AWS storage service of your choice. Snowball encrypts your data using keys that you specify and control.
With Snowball now on the list of HIPAA-eligible services, AWS customers in the Healthcare and Life Sciences space can quickly move on-premises data to Snowball and then process it using any of the services that I just mentioned. For example, they can use the new HDFS Import feature to migrate an existing on-premises Hadoop cluster to the cloud and analyze it using a scalable EMR cluster. They can also move existing petabyte-scale data (medical images, patient records, and the like) to AWS and store it in S3 or Glacier, both already HIPAA-eligible. These services are proven, easy to use, and offer high data durability at low cost.
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.