Owen Zacharias, Vice President of Application Delivery at NextGen Healthcare, explains to AWS Solutions Architect Andrea Sabet how his company developed a series of build and deployment pipelines using native AWS services in the highly regulated healthcare sector.
Learn how the following services can be used to build and deploy infrastructure and application code:
AWS CloudFormation, which provides a common language for you to model and provision AWS and third party application resources in your cloud environment
Discover how AWS resources can be rapidly created and updated as part of a CI/CD pipeline while ensuring HIPAA compliance through approved/vetted AWS Identity and Access Management (IAM) roles that AWS CloudFormation is permitted to assume.
We are living in a golden age of innovation, where personalized medicine is making it possible to cure diseases that we never thought curable. Digital medicine is helping people with diseases get healthier, and we are constantly discovering how to use the body’s immune system to target and eradicate cancer cells. According to a report published by ClinicalTrials.gov, the number of registered studies hit 293,000 in 2018, representing a 250x growth since 2000.
However, an internal rate of return (IRR) analysis by Endpoints News, using data from EvaluatePharma, highlights some interesting trends. A flourishing trend in pharma innovation is supported by strong growth in registered studies. However, the IRR shows a rapidly declining trend, from around 17 percent in 2000 to below the cost of capital in 2017 and projected to go to 0 percent by 2020.
This blog post is the first installment in a series that focuses on the end-to-end workflow of collecting, storing, processing, visualizing, and acting on clinical trials data in a compliant and secure manner. The series also discusses the application of artificial intelligence and machine learning technologies to the world of clinical trials. In this post, we highlight common architectural patterns that AWS customers use to modernize their clinical trials. These incorporate mobile technologies for better evidence generation, cost reduction, increasing quality, improving access, and making medicine more personalized for patients.
Improving the outcomes of clinical trials and reducing costs
Biotech and pharma organizations are feeling the pressure to use resources as efficiently as possible. This pressure forces them to search for any opportunity to streamline processes, get faster, and stay more secure, all while decreasing costs. More and more life sciences companies are targeting biologics, CAR-T, and precision medicine therapeutics, with focus shifting towards smaller, geographically distributed patient segments. This shift has resulted in an increasing mandate to capture data from previously unavailable, nontraditional sources. These sources include mobile devices, IoT devices, and in-home and clinical devices. Life sciences companies merge data from these sources with data from traditional clinical trials to build robust evidence around the safety and efficacy of a drug.
Improvised data ingestion using mobile technologies can speed up outcomes, reduce costs, and improve the accuracy of clinical trials. This is especially true when mobile data ingestion is supplemented with artificial intelligence and machine learning (AI/ML) technologies.
Together, they can usher in a new age of smart clinical trials.
At the same time, traditional clinical trial processes and technology designed for mass-marketed blockbuster drugs can’t effectively meet emerging industry needs. This leaves life sciences and pharmaceutical companies in need of assistance for evolving their clinical trial operations. These circumstances result in making clinical trials one of the largest areas of investment for bringing a new drug to market.
Using mobile technologies with traditional technologies in clinical trials can improve the outcomes of the trials and simultaneously reduce costs. Some of the use cases that the integration of various technologies enables include these:
Identifying and tracking participants in clinical trials
Identifying participants for clinical trials recruitment
Educating and informing patients participating in clinical trials
Implementing standardized protocols and sharing associated information to trial participants
Tracking adverse events and safety profiles
Integrating genomic and phenotypic data for identifying novel biomarkers
Integrating mobile data into clinical trials for better clinical trial management
Creation of a patient-control arm based on historical data
Stratifying cohorts based on treatment, claims, and registry datasets
Building a collaborative, interoperable network for data sharing and knowledge creation
Building compliance-ready infrastructure for clinical trial management
The AWS Cloud provides HIPAA eligible services and solutions. As an AWS customer, you can use these to build solutions for global implementation of mobile devices and sensors in trials, secure capture of streaming Internet of Things (IoT) data, and advanced analytics through visualization tools or AI/ML capabilities. Some of the use cases these services and solutions enable are finding and recruiting patients using smart analytics, facilitating global data management, and remote or in-patient site monitoring. Others include predicting lack of adherence, detecting adverse events, and accelerating trial outcomes along with optimizing trial costs.
Clinical Trials 2.0 (CT2.0) at AWS is geared toward facilitating wider adoption of cloud-native services to enable data ingestion from disparate sources, cost-optimized and reliable storage, and holistic analytics. At the same time, CT2.0 provides the granular access control, end-to-end security, and global scalability needed to conduct clinical trials more efficiently.
Reference architecture
One of the typical architectures for managing a clinical trial using mobile technologies is shown following. This architecture focuses on capturing real-time data from mobile sources and providing a way to process it.
* – Additional considerations such as data security, access control, and compliance need to be incorporated into the architecture and are discussed in the remainder of this post.
Managing a trial by using this architecture consists of the following five major steps.
Step 1: Collect data
Mobile devices, personal wearables, instruments, and smart-devices are extensively being used (or being considered) by global pharmaceutical companies in patient care and clinical trials to provide data for activity tracking, vital signs monitoring, and so on, in real-time. Devices like infusion pumps, personal use dialysis machines, and so on require tracking and alerting of device consumables and calibration status. Remote settings management is also a major use case for these kinds of devices. The end-user mobile devices used in the clinical trial emit a lot of telemetry data that requires real-time data capture, data cleansing, transformation, and analysis.
Typically, these devices are connected to an edge node or a smart phone. Such a connection provides sufficient computing resources to stream data to AWS IoT Core. AWS IoT Core can then be configured to write data to Amazon Kinesis Data Firehose in near real time. Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon S3. S3 provides online, flexible, cost efficient, pay-as-you-go storage that can replicate your data on three Availability Zones within an AWS Region. The edge node or smart phone can use the AWS IoT SDKs to publish and subscribe device data to AWS IoT Core with MQTT. This process uses the AWS method of authentication (called ‘SigV4’), X.509 certificate–based authentication, and customer-created token-based authentication (through custom authorizers). This authenticated approach enables you to map your choice of policies to each certificate and remotely control device or application access. You can also use the Kinesis Data Firehose encryption feature to enable server-side data encryption.
You can also capture additional data such as Case Report Forms (CRF), Electronic Medical Records (EMR), and medical images using Picture Archiving and Communication Systems (PACS). In addition, you can capture laboratory data (Labs) and other Patient Reported Outcomes data (ePRO). AWS provides multiple tools and services to effectively and securely connect to these data sources, enabling you to ingest data in various volumes, variety, and velocities. For more information about creating a HealthCare Data Hub and ingesting Digital Imaging and Communications in Medicine (DICOM) data, see the AWS Big Data Blog post Create a Healthcare Data Hub with AWS and Mirth Connect.
Step 2: Store data
After data is ingested from the devices and wearables used in the clinical trial, Kinesis Data Firehose is used to store the data on Amazon S3. This stored data serves as a raw copy and can later be used for historical analysis and pattern prediction. Using Amazon S3’s lifecycle policies, you can periodically move your data to reduced cost storage such as Amazon S3 Glacier for further optimizing their storage costs. Using Amazon S3 Intelligent Tiering can automatically optimize costs when data access patterns change, without performance impact or operational overhead by moving data between two access tiers—frequent access and infrequent access. You can also choose to encrypt data at rest and in motion using various encryption options available on S3.
Amazon S3 offers an extremely durable, highly available, and infinitely scalable data storage infrastructure, simplifying most data processing, backup, and replication tasks.
Step 3: Data processing—fast lane
After collecting and storing a raw copy of the data, Amazon S3 is configured to publish events to AWS Lambda and invoke a Lambda function by passing the event data as a parameter. The Lambda function is used to extract the key performance indicators (KPIs) such as adverse event notifications, medication adherence, and treatment schedule management from the incoming data. You can use Lambda to process these KPIs and store them in Amazon DynamoDB, along with encryption at rest, which powers a near-real-time clinical trial status dashboard. This alerts clinical trial coordinators in real time so that appropriate interventions can take place.
In addition to this, using a data warehouse full of medical records, you can train and implement a machine learning model. This model can predict which patients are about to switch medications or might exhibit adherence challenges in the future. Such prediction can enable clinical trial coordinators to narrow in on those patients with mitigation strategies.
Step 4: Data processing—batch
For historical analysis and pattern prediction, the staged data (stored in S3) is processed in batches. A Lambda function is used to trigger the extract, transform, and load (ETL) process every time new data is added to the raw data S3 bucket. This Lambda function triggers an ETL process using AWS Glue, a fully managed ETL service that makes it easy for you to prepare and load your data for analytics. This approach helps in mining current and historical data to derive actionable insights, which is stored on Amazon S3.
From there, data is loaded on to Amazon Redshift, a cost-effective, petabyte-scale data warehouse offering from AWS. You can also use Amazon Redshift Spectrum to extend data warehousing out to exabytes without loading any data to Amazon Redshift, as detailed in the Big Data blog post Amazon Redshift Spectrum Extends Data Warehousing Out to Exabytes—No Loading Required. This enables you to provide an all-encompassing picture of the entire clinical trial to your clinical trial coordinators, enabling you to react and respond faster.
In addition to this, you can train and implement a machine learning model to identify patients who might be at risk for adherence challenges. This enables clinical trial coordinators to reinforce patient education and support.
Step 5: Visualize and act on data
After the data is processed and ready to be consumed, you can use Amazon QuickSight, a cloud-native business intelligence service from AWS that offers native Amazon Redshift connectivity. Amazon QuickSight is serverless and so can be rolled out to your audiences in hours. You can also use a host of third-party reporting tools, which can use AWS-supplied JDBC or ODBC drivers or open-source PostgreSQL drivers to connect with Amazon Redshift. These tools include TIBCO Spotfire Analytics, Tableau Server, Qlik Sense Enterprise, Looker, and others. Real-time data processing (step 3 preceding) combines with historical-view batch processing (step 4). Together, they empower contract research organizations (CROs), study managers, trial coordinators, and other entities involved in the clinical trial journey to make effective and informed decisions at a speed and frequency that was previously unavailable. Using Amazon QuickSight’s unique Pay-per-Session pricing model, you can optimize costs for your bursty usage models by paying only when users access the dashboards.
Using Amazon Simple Notification Service (Amazon SNS), real-time feedback based on incoming data and telemetry is sent to patients by using text messages, mobile push, and emails. In addition, study managers and coordinators can send Amazon SNS notifications to patients. Amazon SNS provides a fully managed pub/sub messaging for micro services, distributed systems, and serverless applications. It’s designed for high-throughput, push-based, many-to-many messaging. Alerts and notifications can be based on current data or a combination of current and historical data.
Data security, data privacy, data integrity, and compliance considerations
At AWS, customer trust is our top priority. We deliver services to millions of active customers, including enterprises, educational institutions, and government agencies in over 190 countries. Our customers include financial services providers, healthcare providers, and governmental agencies, who trust us with some of their most sensitive information.
To facilitate this, along with the services mentioned earlier, you should also use AWS Identity and Access Management (IAM) service. IAM enables you to maintain segregation of access, fine-grained access control, and securing end user mobile and web applications. You can also use AWS Security Token Service (AWS STS) to provide secure, self-expiring, time-boxed, temporary security credentials to third-party administrators and service providers, greatly strengthening your security posture. You can use AWS CloudTrail to log IAM and STS API calls. Additionally, AWS IoT Device Management makes it easy to securely onboard, organize, monitor, and remotely manage IoT devices at scale.
With AWS, you can add an additional layer of security to your data at rest in the cloud. AWS provides scalable and efficient encryption features for services like Amazon EBS, Amazon S3, Amazon Redshift, Amazon SNS, AWS Glue, and many more. Flexible key management options, including AWS Key Management Service, enable you to choose whether to have AWS manage the encryption keys or to keep complete control over their keys. In addition, AWS provides APIs for you to integrate encryption and data protection with any of the services that you develop or deploy in an AWS environment.
As a customer, you maintain ownership of your data, and select which AWS services can process, store, and host the content. Generally speaking, AWS doesn’t access or use customers’ content for any purpose without their consent. AWS never uses customer data to derive information for marketing or advertising.
When evaluating the security of a cloud solution, it’s important that you understand and distinguish between the security of the cloud and security in the cloud. The AWS Shared Responsibility Model details this relationship.
To assist you with your compliance efforts, AWS continues to add more services to the various compliance regulations, attestations, certifications, and programs across the world. To decide which services are suitable for you, see the services in scope page.
With the ever-growing interconnectivity and technological advances in the field of medical devices, mobile devices and sensors can improve numerous aspects of clinical trials. They can help in recruitment, informed consent, patient counseling, and patient communication management. They can also improve protocol and medication adherence, clinical endpoints measurement, and the process of alerting participants on adverse events. Smart sensors, smart mobile devices, and robust interconnecting systems can be central in conducting clinical trials.
Every biopharma organization conducting or sponsoring a clinical trial activity faces the conundrum of advancing their approach to trials while maintaining overall trial performance and data consistency. The AWS Cloud enables a new dimension for how data is collected, stored, and used for clinical trials. It thus addresses that conundrum as we march towards a new reality of how drugs are brought to market. The AWS Cloud abstracts away technical challenges such as scaling, security, and establishing a cost-efficient IT infrastructure. In doing so, it allows biopharma organizations to focus on their core mission of improving patent lives through the development of effective, groundbreaking treatments.
About the Author
Mayank Thakkar – Global Solutions Architect, AWS HealthCare and Life Sciences
Deven Atnoor, Ph.D. – Industry Specialist, AWS HealthCare and Life Sciences
Our security culture is one of the things that sets AWS apart. Security is job zero — it is the foundation for all AWS employees and impacts the work we do every day, across the company. And that’s reflected in our services, which undergo exacting internal and external security reviews before being released. From there, we have historically waited for customer demand to begin the complex process of third-party assessment and validating services under specific compliance programs. However, we’ve heard you tell us you want every generally available (GA) service in scope to keep up with the pace of your innovation and at the same time, meet rigorous compliance and regulatory requirements.
I wanted to share how we’re meeting this challenge with a more proactive approach to service certification by certifying services at launch. For the first time, we’ve launched new GA services with PCI DSS, ISO 9001/27001/27017/27018, SOC 2, and HIPAA eligibility. That means customers who rely on or require these compliance programs can select from 10 brand new services right away, without having to wait for one or more trailing audit cycles.
This proactive compliance approach means we move upstream in the product development process. Over the last several months, we’ve made significant process improvements to deliver additional services with compliance certifications and HIPAA eligibility. Our security, compliance, and service teams have partnered in new ways to implement controls and audit earlier in a service’s development phase to demonstrate operating effectiveness. We also integrated auditing mechanisms into multiple stages of the launch process, enabling our security and compliance teams, as well as auditors, to assess controls throughout a service’s preview period. Additionally, we increased our audit frequency to meet services’ GA deadlines.
The work reflects a meaningful shift in our business. We’re excited to get these services into your hands sooner and wanted to report our overall progress. We also ask for your continued feedback since it drives our decisions and prioritization. Because going forward, we’ll continue to iterate and innovate until all of our services are certified at launch.
This post courtesy of Mayank Thakkar, AWS Senior Solutions Architect
Serverless computing refers to an architecture discipline that allows you to build and run applications or services without thinking about servers. You can focus on your applications, without worrying about provisioning, scaling, or managing any servers. You can use serverless architectures for nearly any type of application or backend service. AWS handles the heavy lifting around scaling, high availability, and running those workloads.
The AWS HIPAA program enables covered entities—and those business associates subject to the U.S. Health Insurance Portability and Accountability Act of 1996 (HIPAA)—to use the secure AWS environment to process, maintain, and store protected health information (PHI). Based on customer feedback, AWS is trying to add more services to the HIPAA program, including serverless technologies.
AWS recently announced that AWS Step Functions has achieved HIPAA-eligibility status and has been added to the AWS Business Associate Addendum (BAA), adding to a growing list of HIPAA-eligible services. The BAA is an AWS contract that is required under HIPAA rules to ensure that AWS appropriately safeguards PHI. The BAA also serves to clarify and limit, as appropriate, the permissible uses and disclosures of PHI by AWS, based on the relationship between AWS and customers and the activities or services being performed by AWS.
Along with HIPAA eligibility for most of the rest of the serverless platform at AWS, Step Functions inclusion is a major win for organizations looking to process PHI using serverless technologies, opening up numerous new use cases and patterns. You can still use non-eligible services to orchestrate the storage, transmission, and processing of the metadata around PHI, but not the PHI itself.
In this post, I examine some common serverless use cases that I see in the healthcare and life sciences industry and show how AWS Serverless can be used to build powerful, cost-efficient, HIPAA-eligible architectures.
Provider directory web application
Running HIPAA-compliant web applications (like provider directories) on AWS is a common use case in the healthcare industry. Healthcare providers are often looking for ways to build and run applications and services without thinking about servers. They are also looking for ways to provide the most cost-effective and scalable delivery of secure health-related information to members, providers, and partners worldwide.
Unpredictable access patterns and spiky workloads often force organizations to provision for peak in these cases, and they end up paying for idle capacity. AWS Auto Scaling solves this challenge to a great extent but you still have to manage and maintain the underlying servers from a patching, high availability, and scaling perspective. AWS Lambda (along with other serverless technologies from AWS) removes this constraint.
The above architecture shows a serverless way to host a customer-facing website, with Amazon S3 being used for hosting static files (.js, .css, images, and so on). If your website is based on client-side technologies, you can eliminate the need to run a web server farm. In addition, you can use S3 features like server-side encryption and bucket access policies to lock down access to the content.
Using Amazon CloudFront, a global content delivery network, with S3 origins can bring your content closer to the end user and cut down S3 access costs, by caching the content at the edge. In addition, using AWS [email protected] gives you an ability to bring and execute your own code to customize the content that CloudFront delivers. That significantly reduces latency and improves the end user experience while maintaining the same Lambda development model. Some common examples include checking cookies, inspecting headers or authorization tokens, rewriting URLs, and making calls to external resources to confirm user credentials and generate HTTP responses.
You can power the APIs needed for your client application by using Amazon API Gateway, which takes care of creating, publishing, maintaining, monitoring, and securing APIs at any scale. API Gateway also provides robust ways to provide traffic management, authorization and access control, monitoring, API version management, and the other tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls. This allows you to focus on your business logic. Direct, secure, and authenticated integration with Lambda functions allows this serverless architecture to scale up and down seamlessly with incoming traffic.
The CloudFront integration with AWS WAF provides a reliable way to protect your application against common web exploits that could affect application availability, compromise security, or consume excessive resources.
API Gateway can integrate directly with Lambda, which by default can access the public resources. Lambda functions can be configured to access your Amazon VPC resources as well. If you have extended your data center to AWS using AWS Direct Connect or a VPN connection, Lambda can access your on-premises resources, with the traffic flowing over your VPN connection (or Direct Connect) instead of the public internet.
All the services mentioned above (except Amazon EC2) are fully managed by AWS in terms of high availability, scaling, provisioning, and maintenance, giving you a cost-effective way to host your web applications. It’s pay-as-you-go vs. pay-as-you-provision. Spikes in demand, typically encountered during the enrollment season, are handled gracefully, with these services scaling automatically to meet demand and then scale down. You get to keep your costs in control.
All AWS services referenced in the above architecture are HIPAA-eligible, thus enabling you to store, process, and transmit PHI, as long as it complies with the BAA.
Medical device telemetry (ingesting data @ scale)
The ever-increasing presence of IoT devices in the healthcare industry has created the challenges of ingesting this data at scale and making it available for processing as soon as it is produced. Processing this data in real time (or near-real time) is key to delivering urgent care to patients.
The infinite scalability (theoretical) along with low startup times offered by Lambda makes it a great candidate for these kinds of use cases. Balancing ballooning healthcare costs and timely delivery of care is a never-ending challenge. With subsecond billing and no charge for non-execution, Lambda becomes the best choice for AWS customers.
These end-user medical devices emit a lot of telemetry data, which requires constant analysis and real-time tracking and updating. For example, devices like infusion pumps, personal use dialysis machines, and so on require tracking and alerting of device consumables and calibration status. They also require updates for these settings. Consider the following architecture:
Typically, these devices are connected to an edge node or collector, which provides sufficient computing resources to authenticate itself to AWS and start streaming data to Amazon Kinesis Streams. The collector uses the Kinesis Producer Library to simplify high throughput to a Kinesis data stream. You can also use the server-side encryption feature, supported by Kinesis Streams, to achieve encryption-at-rest. Kinesis provides a scalable, highly available way to achieve loose coupling between data-producing (medical devices) and data-consuming (Lambda) layers.
After the data is transported via Kinesis, Lambda can then be used to process this data in real time, storing derived insights in Amazon DynamoDB, which can then power a near-real time health dashboard. Caregivers can access this real-time data to provide timely care and manage device settings.
End-user medical devices, via the edge node, can also connect to and poll an API hosted on API Gateway to check for calibration settings, firmware updates, and so on. The modifications can be easily updated by admins, providing a scalable way to manage these devices.
For historical analysis and pattern prediction, the staged data (stored in S3), can be processed in batches. Use AWS Batch, Amazon EMR, or any custom logic running on a fleet of Amazon EC2 instances to gain actionable insights. Lambda can also be used to process data in a MapReduce fashion, as detailed in the Ad Hoc Big Data Processing Made Simple with Serverless MapReduce post.
You can also build high-throughput batch workflows or orchestrate Apache Spark applications using Step Functions, as detailed in the Orchestrate Apache Spark applications using AWS Step Functions and Apache Livy post. These insights can then be used to calibrate the medical devices to achieve effective outcomes.
Another use case that I see is using mobile devices to provide diagnostic care in out-patient settings. These environments typically lack the robust IT infrastructure that clinics and hospitals can provide, and often are subjected to intermittent internet connectivity as well. Various biosensors (otoscopes, thermometers, heart rate monitors, and so on) can easily talk to smartphones, which can then act as aggregators and analyzers before forwarding the data to a central processing system. After the data is in the system, caregivers and practitioners can then view and act on the data.
In the above diagram, an application running on a mobile device (iOS or Android) talks to various biosensors and collects diagnostic data. Using AWS mobile SDKs along with Amazon Cognito, these smart devices can authenticate themselves to AWS and access the APIs hosted on API Gateway. Amazon Cognito also offers data synchronization across various mobile devices, which helps you to build “offline” features in your mobile application. Amazon Cognito Sync resolves conflicts and intermittent network connectivity, enabling you to focus on delivering great app experiences instead of creating and managing a user data sync solution.
You can also use CloudFront and [email protected], as detailed in the first use case of this post, to cache content at edge locations and provide some light processing closer to your end users.
Lambda acts as a middle tier, processing the CRUD operations on the incoming data and storing it in DynamoDB, which is again exposed to caregivers through another set of Lambda functions and API Gateway. Caregivers can access the information through a browser-based interface, with Lambda processing the middle-tier application logic. They can view the historical data, compare it with fresh data coming in, and make corrections. Caregivers can also react to incoming data and issue alerts, which are delivered securely to the smart device through Amazon SNS.
Also, by using DynamoDB Streams and its integration with Lambda, you can implement Lambda functions that react to data modifications in DynamoDB tables (and hence, incoming device data). This gives you a way to codify common reactions to incoming data, in near-real time.
Lambda ecosystem
As I discussed in the above use cases, Lambda is a powerful, event-driven, stateless, on-demand compute platform offering scalability, agility, security, and reliability, along with a fine-grained cost structure.
For some organizations, migrating from a traditional programing model to a microservices-driven model can be a steep curve. Also, to build and maintain complex applications using Lambda, you need a vast array of tools, all the way from local debugging support to complex application performance monitoring tools. The following list of tools and services can assist you in building world-class applications with minimal effort:
AWS X-Ray is a distributed tracing system that allows developers to analyze and debug production for distributed applications, such as those built using a microservices (Lambda) architecture. AWS X-Ray was recently added to the AWS BAA, opening the doors for processing PHI workloads.
AWS Step Functions helps build HIPAA-compliant complex workflows using Lambda. It provides a way to coordinate the components of distributed applications and Lambda functions using visual workflows.
AWS SAM provides a fast and easy way of deploying serverless applications. You can write simple templates to describe your functions and their event sources (API Gateway, S3, Kinesis, and so on). AWS recently relaunched the AWS SAM CLI, which allows you to create a local testing environment that simulates the AWS runtime environment for Lambda. It allows faster, iterative development of your Lambda functions by eliminating the need to redeploy your application package to the Lambda runtime.
There are numerous other health care and life science use cases that customers are implementing, using Lambda with other AWS services. AWS is committed to easing the effort of implementing health care solutions in the cloud. Making Lambda HIPAA-eligible is just another milestone in the journey. For more examples of use cases, see Serverless. For the latest list of HIPAA-eligible services, see HIPAA Eligible Services Reference.
I’m excited to announce to our healthcare customers and partners that you can now accept a single AWS Business Associate Addendum (BAA) for all accounts within your organization. Once accepted, all current and future accounts created or added to your organization will immediately be covered by the BAA.
Our team is always thinking about how we can reduce manual processes related to your compliance tasks. That’s why I’ve been looking forward to the release of AWS Artifact Organization Agreements, which was designed to simplify the BAA process and improve your experience when designating AWS accounts as HIPAA accounts. Previously, if you wanted to designate several AWS accounts, you had to sign-in to each account individually to accept the BAA or email us. Now, an authorized master account user can accept the BAA once to automatically designate all existing and future member accounts in the organization as HIPAA accounts for use with protected health information (PHI). This release addresses a frequent customer request to be able to quickly designate multiple HIPAA accounts and confirm those accounts are covered under the terms of the BAA.
If you have a BAA in place already and want to leverage this new capability a master account user can accept the new AWS Organizations BAA in AWS Artifact today. To get started, your organization must use AWS Organizations to manage your accounts, and “all features” needs to be enabled. Learn more about creating an organization here.
Once you are using AWS Organizations with all features enabled, and you have the necessary user permissions, then accepting the AWS Organizations BAA takes about two minutes. We’ve created a video that shows you the process, step-by-step.
If your organization prefers to continue managing HIPAA accounts individually, you can still do that. We have streamlined the process for accepting an individual account BAA as well. It takes less than two minutes to designate a single account as a HIPAA account in AWS Artifact. You can watch the new video here to learn how.
As with all AWS Artifact features, there is no additional cost to use AWS Artifact to review, accept, and manage individual account BAAs or the new organization BAA. To learn more, go to the FAQ page.
This blog post was co-authored by Ujjwal Ratan, a senior AI/ML solutions architect on the global life sciences team.
Healthcare data is generated at an ever-increasing rate and is predicted to reach 35 zettabytes by 2020. Being able to cost-effectively and securely manage this data whether for patient care, research or legal reasons is increasingly important for healthcare providers.
Healthcare providers must have the ability to ingest, store and protect large volumes of data including clinical, genomic, device, financial, supply chain, and claims. AWS is well-suited to this data deluge with a wide variety of ingestion, storage and security services (e.g. AWS Direct Connect, Amazon Kinesis Streams, Amazon S3, Amazon Macie) for customers to handle their healthcare data. In a recent Healthcare IT News article, healthcare thought-leader, John Halamka, noted, “I predict that five years from now none of us will have datacenters. We’re going to go out to the cloud to find EHRs, clinical decision support, analytics.”
I realize simply storing this data is challenging enough. Magnifying the problem is the fact that healthcare data is increasingly attractive to cyber attackers, making security a top priority. According to Mariya Yao in her Forbes column, it is estimated that individual medical records can be worth hundreds or even thousands of dollars on the black market.
In this first of a 2-part post, I will address the value that AWS can bring to customers for ingesting, storing and protecting provider’s healthcare data. I will describe key components of any cloud-based healthcare workload and the services AWS provides to meet these requirements. In part 2 of this post we will dive deep into the AWS services used for advanced analytics, artificial intelligence and machine learning.
The data tsunami is upon us
So where is this data coming from? In addition to the ubiquitous electronic health record (EHR), the sources of this data include:
genomic sequencers
devices such as MRIs, x-rays and ultrasounds
sensors and wearables for patients
medical equipment telemetry
mobile applications
Additional sources of data come from non-clinical, operational systems such as:
human resources
finance
supply chain
claims and billing
Data from these sources can be structured (e.g., claims data) as well as unstructured (e.g., clinician notes). Some data comes across in streams such as that taken from patient monitors, while some comes in batch form. Still other data comes in near-real time such as HL7 messages. All of this data has retention policies dictating how long it must be stored. Much of this data is stored in perpetuity as many systems in use today have no purge mechanism. AWS has services to manage all these data types as well as their retention, security and access policies.
Imaging is a significant contributor to this data tsunami. Increasing demand for early-stage diagnoses along with aging populations drive increasing demand for images from CT, PET, MRI, ultrasound, digital pathology, X-ray and fluoroscopy. For example, a thin-slice CT image can be hundreds of megabytes. Increasing demand and strict retention policies make storage costly.
Due to the plummeting cost of gene sequencing, molecular diagnostics (including liquid biopsy) is a large contributor to this data deluge. Many predict that as the value of molecular testing becomes more identifiable, the reimbursement models will change and it will increasingly become the standard of care. According to the Washington Post article “Sequencing the Genome Creates so Much Data We Don’t Know What to do with It,”
“Some researchers predict that up to one billion people will have their genome sequenced by 2025 generating up to 40 exabytes of data per year.”
Although genomics is primarily used for oncology diagnostics today, it’s also used for other purposes, pharmacogenomics — used to understand how an individual will metabolize a medication.
Reference Architecture
It is increasingly challenging for the typical hospital, clinic or physician practice to securely store, process and manage this data without cloud adoption.
Amazon has a variety of ingestion techniques depending on the nature of the data including size, frequency and structure. AWS Snowball and AWS Snowmachine are appropriate for extremely-large, secure data transfers whether one time or episodic. AWS Glue is a fully-managed ETL service for securely moving data from on-premise to AWS and Amazon Kinesis can be used for ingesting streaming data.
Amazon S3, Amazon S3 IA, and Amazon Glacier are economical, data-storage services with a pay-as-you-go pricing model that expand (or shrink) with the customer’s requirements.
The above architecture has four distinct components – ingestion, storage, security, and analytics. In this post I will dive deeper into the first three components, namely ingestion, storage and security. In part 2, I will look at how to use AWS’ analytics services to draw value on, and optimize, your healthcare data.
Ingestion
A typical provider data center will consist of many systems with varied datasets. AWS provides multiple tools and services to effectively and securely connect to these data sources and ingest data in various formats. The customers can choose from a range of services and use them in accordance with the use case.
For use cases involving one-time (or periodic), very large data migrations into AWS, customers can take advantage of AWS Snowball devices. These devices come in two sizes, 50 TB and 80 TB and can be combined together to create a petabyte scale data transfer solution.
The devices are easy to connect and load and they are shipped to AWS avoiding the network bottlenecks associated with such large-scale data migrations. The devices are extremely secure supporting 256-bit encryption and come in a tamper-resistant enclosure. AWS Snowball imports data in Amazon S3 which can then interface with other AWS compute services to process that data in a scalable manner.
For use cases involving a need to store a portion of datasets on premises for active use and offload the rest on AWS, the Amazon storage gateway service can be used. The service allows you to seamlessly integrate on premises applications via standard storage protocols like iSCSI or NFS mounted on a gateway appliance. It supports a file interface, a volume interface and a tape interface which can be utilized for a range of use cases like disaster recovery, backup and archiving, cloud bursting, storage tiering and migration.
The AWS Storage Gateway appliance can use the AWS Direct Connect service to establish a dedicated network connection from the on premises data center to AWS.
Specific Industry Use Cases
By using the AWS proposed reference architecture for disaster recovery, healthcare providers can ensure their data assets are securely stored on the cloud and are easily accessible in the event of a disaster. The “AWS Disaster Recovery” whitepaper includes details on options available to customers based on their desired recovery time objective (RTO) and recovery point objective (RPO).
AWS is an ideal destination for offloading large volumes of less-frequently-accessed data. These datasets are rarely used in active compute operations but are exceedingly important to retain for reasons like compliance. By storing these datasets on AWS, customers can take advantage of the highly-durable platform to securely store their data and also retrieve them easily when they need to. For more details on how AWS enables customers to run back and archival use cases on AWS, please refer to the following set of whitepapers.
A healthcare provider may have a variety of databases spread throughout the hospital system supporting critical applications such as EHR, PACS, finance and many more. These datasets often need to be aggregated to derive information and calculate metrics to optimize business processes. AWS Glue is a fully-managed Extract, Transform and Load (ETL) service that can read data from a JDBC-enabled, on-premise database and transfer the datasets into AWS services like Amazon S3, Amazon Redshift and Amazon RDS. This allows customers to create transformation workflows that integrate smaller datasets from multiple sources and aggregates them on AWS.
Healthcare providers deal with a variety of streaming datasets which often have to be analyzed in near real time. These datasets come from a variety of sources such as sensors, messaging buses and social media, and often do not adhere to an industry standard. The Amazon Kinesis suite of services, that includes Amazon Kinesis Streams, Amazon Kinesis Firehose, and Amazon Kinesis Analytics, are the ideal set of services to accomplish the task of deriving value from streaming data.
Example: Using AWS Glue to de-identify and ingest healthcare data into S3 Let’s consider a scenario in which a provider maintains patient records in a database they want to ingest into S3. The provider also wants to de-identify the data by stripping personally- identifiable attributes and store the non-identifiable information in an S3 bucket. This bucket is different from the one that contains identifiable information. Doing this allows the healthcare provider to separate sensitive information with more restrictions set up via S3 bucket policies.
To ingest records into S3, we create a Glue job that reads from the source database using a Glue connection. The connection is also used by a Glue crawler to populate the Glue data catalog with the schema of the source database. We will use the Glue development endpoint and a zeppelin notebook server on EC2 to develop and execute the job.
Step 1: Import the necessary libraries and also set a glue context which is a wrapper on the spark context:
Step 2: Create a dataframe from the source data. I call the dataframe “readmissionsdata”. Here is what the schema would look like:
Step 3: Now select the columns that contains indentifiable information and store it in a new dataframe. Call the new dataframe “phi”.
Step 4: Non-PHI columns are stored in a separate dataframe. Call this dataframe “nonphi”.
Step 5: Write the two dataframes into two separate S3 buckets
Once successfully executed, the PHI and non-PHI attributes are stored in two separate files in two separate buckets that can be individually maintained.
Storage
In 2016, 327 healthcare providers reported a protected health information (PHI) breach, affecting 16.4m patient records[1]. There have been 342 data breaches reported in 2017 — involving 3.2 million patient records.[2]
To date, AWS has released 51 HIPAA-eligible services to help customers address security challenges and is in the process of making many more services HIPAA-eligible. These HIPAA-eligible services (along with all other AWS services) help customers build solutions that comply with HIPAA security and auditing requirements. A catalogue of HIPAA-enabled services can be found at AWS HIPAA-eligible services. It is important to note that AWS manages physical and logical access controls for the AWS boundary. However, the overall security of your workloads is a shared responsibility, where you are responsible for controlling user access to content on your AWS accounts.
AWS storage services allow you to store data efficiently while maintaining high durability and scalability. By using Amazon S3 as the central storage layer, you can take advantage of the Amazon S3 storage management features to get operational metrics on your data sets and transition them between various storage classes to save costs. By tagging objects on Amazon S3, you can build a governance layer on Amazon S3 to grant role based access to objects using Amazon IAM and Amazon S3 bucket policies.
To learn more about the Amazon S3 storage management features, see the following link.
Security
In the example above, we are storing the PHI information in a bucket named “phi.” Now, we want to protect this information to make sure its encrypted, does not have unauthorized access, and all access requests to the data are logged.
Encryption: S3 provides settings to enable default encryption on a bucket. This ensures any object in the bucket is encrypted by default.
Logging: S3 provides object level logging that can be used to capture all API calls to the object. The API calls are logged in cloudtrail for easy access and consolidation. Moreover, it also supports events to proactively alert customers of read and write operations.
Access control: Customers can use S3 bucket policies and IAM policies to restrict access to the phi bucket. It can also put a restriction to enforce multi-factor authentication on the bucket. For example, the following policy enforces multi-factor authentication on the phi bucket:
In Part 1 of this blog, we detailed the ingestion, storage, security and management of healthcare data on AWS. Stay tuned for part two where we are going to dive deep into optimizing the data for analytics and machine learning.
Amazon Simple Notification Service (SNS) now supports VPC Endpoints (VPCE) via AWS PrivateLink. You can use VPC Endpoints to privately publish messages to SNS topics, from an Amazon Virtual Private Cloud (VPC), without traversing the public internet. When you use AWS PrivateLink, you don’t need to set up an Internet Gateway (IGW), Network Address Translation (NAT) device, or Virtual Private Network (VPN) connection. You don’t need to use public IP addresses, either.
VPC Endpoints doesn’t require code changes and can bring additional security to Pub/Sub Messaging use cases that rely on SNS. VPC Endpoints helps promote data privacy and is aligned with assurance programs, including the Health Insurance Portability and Accountability Act (HIPAA), FedRAMP, and others discussed below.
VPC Endpoints for SNS in action
Here’s how VPC Endpoints for SNS works. The following example is based on a banking system that processes mortgage applications. This banking system, which has been deployed to a VPC, publishes each mortgage application to an SNS topic. The SNS topic then fans out the mortgage application message to two subscribing AWS Lambda functions:
Save-Mortgage-Application stores the application in an Amazon DynamoDB table. As the mortgage application contains personally identifiable information (PII), the message must not traverse the public internet.
Save-Credit-Report checks the applicant’s credit history against an external Credit Reporting Agency (CRA), then stores the final credit report in an Amazon S3 bucket.
The following diagram depicts the underlying architecture for this banking system:
To protect applicants’ data, the financial institution responsible for developing this banking system needed a mechanism to prevent PII data from traversing the internet when publishing mortgage applications from their VPC to the SNS topic. Therefore, they created a VPC endpoint to enable their publisher Amazon EC2 instance to privately connect to the SNS API. As shown in the diagram, when the VPC endpoint is created, an Elastic Network Interface (ENI) is automatically placed in the same VPC subnet as the publisher EC2 instance. This ENI exposes a private IP address that is used as the entry point for traffic destined to SNS. This ensures that traffic between the VPC and SNS doesn’t leave the Amazon network.
Set up VPC Endpoints for SNS
The process for creating a VPC endpoint to privately connect to SNS doesn’t require code changes: access the VPC Management Console, navigate to the Endpoints section, and create a new Endpoint. Three attributes are required:
The SNS service name.
The VPC and Availability Zones (AZs) from which you’ll publish your messages.
The Security Group (SG) to be associated with the endpoint network interface. The Security Group controls the traffic to the endpoint network interface from resources in your VPC. If you don’t specify a Security Group, the default Security Group for your VPC will be associated.
The SNS API is served through HTTP Secure (HTTPS), and encrypts all messages in transit with Transport Layer Security (TLS) certificates issued by Amazon Trust Services (ATS). The certificates verify the identity of the SNS API server when encrypted connections are established. The certificates help establish proof that your SNS API client (SDK, CLI) is communicating securely with the SNS API server. A Certificate Authority (CA) issues the certificate to a specific domain. Hence, when a domain presents a certificate that’s issued by a trusted CA, the SNS API client knows it’s safe to make the connection.
Summary
VPC Endpoints can increase the security of your pub/sub messaging use cases by allowing you to publish messages to SNS topics, from instances in your VPC, without traversing the internet. Setting up VPC Endpoints for SNS doesn’t require any code changes because the SNS API address remains the same.
VPC Endpoints for SNS is now available in all AWS Regions where AWS PrivateLink is available. For information on pricing and regional availability, visit the VPC pricing page. For more information and on-boarding, see Publishing to Amazon SNS Topics from Amazon Virtual Private Cloud in the SNS documentation.
If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Amazon SNS forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
Today, our customers use AWS CloudHSM to meet corporate, contractual and regulatory compliance requirements for data security by using dedicated Hardware Security Module (HSM) instances within the AWS cloud. CloudHSM delivers all the benefits of traditional HSMs including secure generation, storage, and management of cryptographic keys used for data encryption that are controlled and accessible only by you.
As a managed service, it automates time-consuming administrative tasks such as hardware provisioning, software patching, high availability, backups and scaling for your sensitive and regulated workloads in a cost-effective manner. Backup and restore functionality is the core building block enabling scalability, reliability and high availability in CloudHSM.
You should consider using AWS CloudHSM if you require:
Keys stored in dedicated, third-party validated hardware security modules under your exclusive control
FIPS 140-2 compliance
Integration with applications using PKCS#11, Java JCE, or Microsoft CNG interfaces
Healthcare applications subject to HIPAA regulations
Streaming video solutions subject to contractual DRM requirements
We recently released a whitepaper, “Security of CloudHSM Backups” that provides in-depth information on how backups are protected in all three phases of the CloudHSM backup lifecycle process: Creation, Archive, and Restore.
About the Author
Balaji Iyer is a senior consultant in the Professional Services team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly-scalable distributed systems, operational security, large scale migrations, and leading strategic AWS initiatives.
Applying technology to healthcare data has the potential to produce many exciting and important outcomes. The analysis produced from healthcare data can empower clinicians to improve the health of individuals and populations by enabling them to make better decisions that enhance the care they provide.
The Observational Health Data Sciences and Informatics (OHDSI, pronounced “Odyssey”) program and community is working toward this goal by producing data standards and open-source solutions to store and analyze observational health data. Using the OHDSI tools, you can visualize the health of your entire population. You can build cohorts of patients, analyze incidence rates for various conditions, and estimate the effect of treatments on patients with certain conditions. You can also model health outcome predictions using machine learning algorithms.
One of the challenges often faced when working with big data tools is the expense of the infrastructure required to run them. Another challenge is the learning curve to implement and begin using these tools. Amazon Web Services has enabled us to address many of the classic IT challenges by making enterprise class infrastructure and technology available in an affordable, elastic, and automated way. This blog post demonstrates how to combine some of the OHDSI projects (Atlas, Achilles, WebAPI, and the OMOP Common Data Model) with AWS technologies. By doing so, you can quickly and inexpensively implement a health data science and informatics environment.
Shown following is just one example of the population health analysis that is possible with the OHDSI tools. This visualization shows the prevalence of various drugs within the given population of people. This information helps researchers and clinicians discover trends and make better informed decisions about patient health.
OHDSI application architecture on AWS
Before deploying an application on AWS that transmits, processes, or stores protected health information (PHI) or personally identifiable information (PII), address your organization’s compliance concerns. Make sure that you have worked with your internal compliance and legal team to ensure compliance with the laws and regulations that govern your organization. To understand how you can use AWS services as a part of your overall compliance program, see the AWS HIPAA Compliance whitepaper. With that said, we paid careful attention to the HIPAA control set during the design of this solution.
This blog post presents a complete OHDSI application environment, including a data warehouse with sample data. It has the following features:
Following, you can see a block diagram of how the OHDSI tools map to the services provided by AWS.
Atlas is the web application that researchers interact with to perform analysis. Atlas interacts with the underlying databases through a web services application named WebAPI. In this example, both Atlas and WebAPI are deployed and managed by AWS Elastic Beanstalk. Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications. Simply upload the Atlas and WebAPI code and Elastic Beanstalk automatically handles the deployment. It covers everything from capacity provisioning, load balancing, autoscaling, and high availability, to application health monitoring. Using a feature of Elastic Beanstalk called ebextensions, the Atlas and WebAPI servers are customized to use an encrypted storage volume for the middleware application logs.
Atlas stores the state of the various patient cohorts that are analyzed in a dedicated database separate from your observational health data. This database is provided by Amazon Aurora with PostgreSQL compatibility.
Amazon Aurora is a relational database built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open-source databases. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching, and backups. It is configured for high availability and uses encryption at rest for the database and backups, and encryption in flight for the JDBC connections.
All of your observational health data is stored inside the OHDSI Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). This model also stores useful vocabulary tables that help to translate values from various data sources (like EHR systems and claims data).
The OMOP CDM schema is deployed onto Amazon Redshift. Amazon Redshift is a fast, fully managed data warehouse that allows you to run complex analytic queries against petabytes of structured data. It uses using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. You can also resize an Amazon Redshift cluster as your requirements for it change.
The solution in this blog post automatically loads de-identified sample data of 1,000 people from the CMS 2008–2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF). The data has helpful formatting from LTS Computing LLC. Vocabulary data from the OHDSI Athena project is also loaded into the OMOP CDM, and a results set is computed by OHDSI Achilles.
Following is a detailed technical diagram showing the configuration of the architecture to be deployed.
Deploying OHDSI on AWS
Everything just described is automatically deployed by using an AWS CloudFormation template. Using this template, you can quickly get started with the OHDSI project. The CloudFormation templates for this deployment as well as all of the supporting scripts and source code can be found in the AWS Labs GitHub repo.
From your AWS account, open the CloudFormation Management Console and choose Create Stack. From there, copy and paste the following URL in the Specify an Amazon S3 template URL box, and choose Next.
On the next screen, you provide a Stack Name (this can be anything you like) and a few other parameters for your OHDSI environment.
You use the DatabasePassword parameter to set the password for the master user account of the Amazon Redshift and Aurora databases.
You use the EBEndpoint name to generate a unique URL for Atlas to access the OHDSI environment. It is http://EBEndpoint.AWS-Region.elasticbeanstalk.com, where EBEndpoint.AWS-Region indicates the Elastic Beanstalk endpoint and AWS Region. You can configure this URL through Elastic Beanstalk if you want to change it in the future.
You use the KPair option to choose one of your existing Amazon EC2 key pairs to use with the instances that Elastic Beanstalk deploys. By doing this, you can gain administrative access to these instances in the future if you need to. If you don’t already have an Amazon EC2 key pair, you can generate one for free. You do this by going to the Key Pairs section of the EC2 console and choosing Generate Key Pair.
Finally, you use the UserIPRange parameter to specify a CIDR IP address range from which to access your OHDSI environment. By default, your OHDSI environment is accessible over the public internet. Use UserIPRange to limit access over the Internet to a single IP address or a range of IP addresses that represent users you want to have access. Through additional configuration, you can also make your OHDSI environment completely private and accessible only through a VPN or AWS Direct Connect private circuit.
When you’ve provided all Parameters, choose Next.
On the next screen, you can provide some other optional information like tags at your discretion, or just choose Next.
On the next screen, you can review what will be deployed. At the bottom of the screen, there is a check box for you to acknowledge that AWS CloudFormation might create IAM resources with custom names. This is correct; the template being deployed creates four custom roles that give permission for the AWS services involved to communicate with each other. Details of these permissions are inside the CloudFormation template referenced in the URL given in the first step. Check the box acknowledging this and choose Next.
You can watch as CloudFormation builds out your OHDSI architecture. A CloudFormation deployment is called a stack. The parent stack creates two child stacks, one containing the VPC and IAM roles and another created by Elastic Beanstalk with the Atlas and WebAPI servers. When all three stacks have reached the green CREATE_COMPLETE status, as shown in the screenshot following, then the OHDSI architecture has been deployed.
There is still some work going on behind the scenes, though. To watch the progress, browse to the Amazon Redshift section of your AWS Management Console and choose the Amazon Redshift cluster that was created for your OHDSI architecture. After you do so, you can observe the Loads and Queries tabs.
First, on the Loads tab, you can see the CMS De-SynPUF sample data and Athena vocabulary data being loaded into the OMOP Common Data Model. After you see the VOCABULARY table reach the COMPLETED status (as shown following), all of the sample and vocabulary data has been loaded.
After the data loads, the Achilles computation starts. On the Queries tab, you can watch Achilles running queries against your database to build out the Results schema. Achilles runs a large number of queries, and the entire process can take quite some time (about 20 minutes for the sample data we’ve loaded). Eventually, no new queries show up in the Queries tab, which shows that the Achilles computation is completed. The entire process from the time you executed the CloudFormation template until the Achilles computation is completed usually takes about an hour and 15 minutes.
At this point, you can browse to the Elastic Beanstalk section of the AWS Management Console. There, you can choose the OHDSI Application and Environment (green box) that was deployed by the CloudFormation template. At the top of the dashboard, as shown following, you see a link to a URL. This URL matches the name you provided in the EBEndpoint parameter of the CloudFormation template. Choose this URL, and you can start using Atlas to explore the CMS DE-SynPUF sample data!
Cost of deploying this environment
It used to be common to see healthcare data analytics environments deployed in an on-premises data center with expensive data warehouse appliances and virtualized environments. The cloud era has democratized the availability of the infrastructure required to do this type of data analysis, so that now it is within reach of even small organizations. This environment can expand to analyze petabyte-scale health data, and you only pay for what you need. See an estimated breakdown of the monthly cost components for this environment as deployed on the AWS Solution Calculator.
It’s also worth noting that this environment does not have to be run all of the time. If you are only performing analyses periodically, you can terminate the environment when you are finished and restore it from the database backups when you want to continue working. This would reduce the cost of operation even further.
Summary
Now that you have a fully functional OHDSI environment with sample data, you can use this to explore and learn the toolset and its capabilities. After learning with the sample data, you can begin gaining insights by analyzing your own organization’s health data. You can do this using an extract, transform, load (ETL) process from one or more of your health data sources.
James Wiggins is a senior healthcare solutions architect at AWS. He is passionate about using technology to help organizations positively impact world health. He also loves spending time with his wife and three children.
The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!
The following list includes the ten most downloaded AWS security and compliance documents in 2017. Using this list, you can learn about what other AWS customers found most interesting about security and compliance last year.
AWS Security Best Practices – This guide is intended for customers who are designing the security infrastructure and configuration for applications running on AWS. The guide provides security best practices that will help you define your Information Security Management System (ISMS) and build a set of security policies and processes for your organization so that you can protect your data and assets in the AWS Cloud.
AWS: Overview of Security Processes – This whitepaper describes the physical and operational security processes for the AWS managed network and infrastructure, and helps answer questions such as, “How does AWS help me protect my data?”
Service Organization Controls (SOC) 3 Report – This publicly available report describes internal AWS security controls, availability, processing integrity, confidentiality, and privacy.
Introduction to AWS Security –This document provides an introduction to AWS’s approach to security, including the controls in the AWS environment, and some of the products and features that AWS makes available to customers to meet your security objectives.
AWS: Risk and Compliance – This whitepaper provides information to help customers integrate AWS into their existing control framework, including a basic approach for evaluating AWS controls and a description of AWS certifications, programs, reports, and third-party attestations.
Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities – AWS WAF is a web application firewall that helps you protect your websites and web applications against various attack vectors at the HTTP protocol level. This whitepaper outlines how you can use AWS WAF to mitigate the application vulnerabilities that are defined in the Open Web Application Security Project (OWASP) Top 10 list of most common categories of application security flaws.
Introduction to Auditing the Use of AWS – This whitepaper provides information, tools, and approaches for auditors to use when auditing the security of the AWS managed network and infrastructure.
AWS Security and Compliance: Quick Reference Guide – By using AWS, you inherit the many security controls that we operate, thus reducing the number of security controls that you need to maintain. Your own compliance and certification programs are strengthened while at the same time lowering your cost to maintain and run your specific security assurance requirements. Learn more in this quick reference guide.
My colleague Sandy Carter delivered the Enterprise Innovation State of the Union last week at AWS re:Invent. She wrote the guest post below to recap the announcements that she made from the stage.
“I want my company to innovate, but I am not convinced we can execute successfully.” Far too many times I have heard this fear expressed by senior executives that I have met at different points in my career. In fact, a recent study published by Price Waterhouse Coopers found that while 93% of executives depend on innovation to drive growth, more than half are challenged to take innovative ideas to market quickly in a scalable way.
Many customers are struggling with how to drive enterprise innovation, so I was thrilled to share the stage at AWS re:Invent this past week with several senior executives who have successfully broken this mold to drive amazing enterprise innovation. In particular, I want to thank Parag Karnik from Johnson & Johnson, Bill Rothe from Hess Corporation, Dave Williams from Just Eat, and Olga Lagunova from Pitney Bowes for sharing their stories of innovation, creativity, and solid execution.
Among the many new announcements from AWS this past week, I am particularly excited about the following newly-launched AWS products and programs that I announced at re:Invent to drive new innovations by our enterprise customers:
AI: New Deep Learning Amazon Machine Image (AMI) on EC2 Windows As I shared at re:Invent, customers such as Infor are already successfully leveraging artificial intelligence tools on AWS to deliver tailored, industry-specific applications to their customers. We want to facilitate more of our Windows developers to get started quickly and easily with AI, leveraging machine learning based tools with popular deep learning frameworks, such as Apache MXNet, TensorFlow, and Caffe2. In order to enable this, I announced at re:Invent that AWS now offers a new Deep Learning AMI for Microsoft Windows. The AMI is tailored to facilitate large scale training of deep-learning models, and enables quick and easy setup of Windows Server-based compute resources for machine learning applications.
IoT: Visualize and Analyze SQL and IoT Data Forecasts show as many as 31 billion IoT devices by 2020. AWS wants every Windows customer to take advantage of the data available from their devices. Pitney Bowes, for example, now has more than 130,000 IoT devices streaming data to AWS. Using machine learning, Pitney Bowes enriches and analyzes data to enhance their customer experience, improve efficiencies, and create new data products. AWS IoT Analytics can now be leveraged to run analytics on IoT data and get insights that help you make better and more accurate decisions for IoT applications and machine learning use cases. AWS IoT Analytics can automatically enrich IoT device data with contextual metadata such as your SQL Server transactional data.
New Capabilities for .NET Developers on AWS In addition to all of the enhancements we’ve introduced to deliver a first class experience to Windows developers on AWS, we announced that we are including .NET Core 2.0 support in AWS Lambda and AWS CodeBuild, which will be available for broader use early next year. .NET Core 2.0 packs a number of new features such as Razor pages, better compatibility with .NET framework, more than double the number of APIs compared to the previous versions, and much more. With this announcement, you will be able to take advantage of all latest .NET Core features on Lambda and CodeBuild for building modern serverless and DevOps centric solutions.
License optimization for BYOL AWS provides you a wide variety of instance types and families that best meet your workload needs. If you are using software licensed by the number of vCPUs, you want the ability to further tweak vCPU count to optimize license spend. I announced the upcoming ability to optimize CPUs for EC2, giving you greater control over your EC2 instances on two fronts:
You can specify a custom number of vCPUs when launching new instances to save on vCPU based licensing costs. For example, SQL Server licensing spend.
You can disable Hyper-Threading Technology for workloads that perform well with single-threaded CPUs, like some high-performance computing (HPC) applications.
Using these capabilities, customers who bring their own license (BYOL) will be able to optimize their license usage and save on the license costs.
Server Migration Service for Hyper-V Virtual Machines As Bill Rothe from Hess Corporation shared at re:Invent, Hess has successfully migrated a wide range of workloads to the cloud, including SQL Server, SharePoint, SAP HANA, and many others. AWS Server Migration Service (SMS) now supports Hyper-V virtual machine (VM) migration, in order to further support enterprise migrations like these. AWS Server Migration Service will enable you to more easily co-ordinate large-scale server migrations from on-premise Hyper-V environments to AWS. AWS Server Migration Service allows you to automate, schedule, and track incremental replications of live server volumes. The replicated volumes are encrypted in transit and saved as a new Amazon Machine Image (AMI), which can be launched as an EC2 instance on AWS.
Microsoft Premier Support for AWS End-Customers I was pleased to announce that Microsoft and AWS have developed new areas of support integration to help ensure a great customer experience. Microsoft Premier Support is on board to help AWS assist end customers. AWS Support engineers can escalate directly to Microsoft Support on behalf of AWS customers running Microsoft workloads.
Best Practice Tools: HIPAA Compliance and Digital Innovation Workshop In November, we updated our HIPAA-focused white paper, outlining how you can use AWS to create HIPAA-compliant applications. In the first quarter of next year, we will publish a HIPAA Implementation Guide that expands on our HIPAA Quick Start to enable you to follow strict security, compliance, and risk management controls for common healthcare use cases. I was also pleased to award a Digital Innovation Workshop to one of our customers in my re:Invent session, and look forward to seeing more customers take advantage of this workshop.
AWS: The Continuous Innovation Cloud A common thread we see across customers is that continuous innovation from AWS enables their ongoing reinvention. Continuous innovation means that you are always getting a newer, better offering every single day. Sometimes it is in the form of brand new services and capabilities, and sometimes it is happening invisibly, under the covers where your environment just keeps getting better. I invite you to learn more about how you can accelerate your innovation journey with recently launched AWS services and AWS best practices. If you are migrating Windows workloads, speak with your AWS sales representative or an AWS Microsoft Workloads Competency Partner to learn how you can leverage our re:Think for Windows program for credits to start your migration.
We don’t often recognize or celebrate anniversaries at AWS. With nearly 100 services on our list, we’d be eating cake and drinking champagne several times a week. While that might sound like fun, we’d rather spend our working hours listening to customers and innovating. With that said, Amazon QuickSight has now been generally available for a little over a year and I would like to give you a quick update!
QuickSight in Action Today, tens of thousands of customers (from startups to enterprises, in industries as varied as transportation, legal, mining, and healthcare) are using QuickSight to analyze and report on their business data.
Here are a couple of examples:
Gemini provides legal evidence procurement for California attorneys who represent injured workers. They have gone from creating custom reports and running one-off queries to creating and sharing dynamic QuickSight dashboards with drill-downs and filtering. QuickSight is used to track sales pipeline, measure order throughput, and to locate bottlenecks in the order processing pipeline.
Jivochat provides a real-time messaging platform to connect visitors to website owners. QuickSight lets them create and share interactive dashboards while also providing access to the underlying datasets. This has allowed them to move beyond the sharing of static spreadsheets, ensuring that everyone is looking at the same and is empowered to make timely decisions based on current data.
Transfix is a tech-powered freight marketplace that matches loads and increases visibility into logistics for Fortune 500 shippers in retail, food and beverage, manufacturing, and other industries. QuickSight has made analytics accessible to both BI engineers and non-technical business users. They scrutinize key business and operational metrics including shipping routes, carrier efficient, and process automation.
Looking Back / Looking Ahead The feedback on QuickSight has been incredibly helpful. Customers tell us that their employees are using QuickSight to connect to their data, perform analytics, and make high-velocity, data-driven decisions, all without setting up or running their own BI infrastructure. We love all of the feedback that we get, and use it to drive our roadmap, leading to the introduction of over 40 new features in just a year. Here’s a summary:
Looking forward, we are watching an interesting trend develop within our customer base. As these customers take a close look at how they analyze and report on data, they are realizing that a serverless approach offers some tangible benefits. They use Amazon Simple Storage Service (S3) as a data lake and query it using a combination of QuickSight and Amazon Athena, giving them agility and flexibility without static infrastructure. They also make great use of QuickSight’s dashboards feature, monitoring business results and operational metrics, then sharing their insights with hundreds of users. You can read Building a Serverless Analytics Solution for Cleaner Cities and review Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight if you are interested in this approach.
New Features and Enhancements We’re still doing our best to listen and to learn, and to make sure that QuickSight continues to meet your needs. I’m happy to announce that we are making seven big additions today:
Geospatial Visualization – You can now create geospatial visuals on geographical data sets.
Private VPC Access – You can now sign up to access a preview of a new feature that allows you to securely connect to data within VPCs or on-premises, without the need for public endpoints.
Flat Table Support – In addition to pivot tables, you can now use flat tables for tabular reporting. To learn more, read about Using Tabular Reports.
Calculated SPICE Fields – You can now perform run-time calculations on SPICE data as part of your analysis. Read Adding a Calculated Field to an Analysis for more information.
Wide Table Support – You can now use tables with up to 1000 columns.
HIPAA Compliance – You can now run HIPAA-compliant workloads on QuickSight.
Geospatial Visualization Everyone seems to want this feature! You can now take data that contains a geographic identifier (country, city, state, or zip code) and create beautiful visualizations with just a few clicks. QuickSight will geocode the identifier that you supply, and can also accept lat/long map coordinates. You can use this feature to visualize sales by state, map stores to shipping destinations, and so forth. Here’s a sample visualization:
Private VPC Access Preview If you have data in AWS (perhaps in Amazon Redshift, Amazon Relational Database Service (RDS), or on EC2) or on-premises in Teradata or SQL Server on servers without public connectivity, this feature is for you. Private VPC Access for QuickSight uses an Elastic Network Interface (ENI) for secure, private communication with data sources in a VPC. It also allows you to use AWS Direct Connect to create a secure, private link with your on-premises resources. Here’s what it looks like:
If you are ready to join the preview, you can sign up today.
Contributed by Tiffany Jernigan, Developer Advocate for Amazon ECS
Get ready for takeoff!
We made sure that this year’s re:Invent is chock-full of containers: there are over 40 sessions! New to containers? No problem, we have several introductory sessions for you to dip your toes. Been using containers for years and know the ins and outs? Don’t miss our technical deep-dives and interactive chalk talks led by container experts.
If you can’t make it to Las Vegas, you can catch the keynotes and session recaps from our livestream and on Twitch.
Session types
Not everyone learns the same way, so we have multiple types of breakout content:
Birds of a Feather An interactive discussion with industry leaders about containers on AWS.
Breakout sessions 60-minute presentations about building on AWS. Sessions are delivered by both AWS experts and customers and span all content levels.
Workshops 2.5-hour, hands-on sessions that teach how to build on AWS. AWS credits are provided. Bring a laptop, and have an active AWS account.
Chalk Talks 1-hour, highly interactive sessions with a smaller audience. They begin with a short lecture delivered by an AWS expert, followed by a discussion with the audience.
Session levels
Whether you’re new to containers or you’ve been using them for years, you’ll find useful information at every level.
Introductory Sessions are focused on providing an overview of AWS services and features, with the assumption that attendees are new to the topic.
Advanced Sessions dive deeper into the selected topic. Presenters assume that the audience has some familiarity with the topic, but may or may not have direct experience implementing a similar solution.
Expert Sessions are for attendees who are deeply familiar with the topic, have implemented a solution on their own already, and are comfortable with how the technology works across multiple services, architectures, and implementations.
Session locations
All container sessions are located in the Aria Resort.
MONDAY 11/27
Breakout sessions
Level 200 (Introductory)
CON202 – Getting Started with Docker and Amazon ECS By packaging software into standardized units, Docker gives code everything it needs to run, ensuring consistency from your laptop all the way into production. But once you have your code ready to ship, how do you run and scale it in the cloud? In this session, you become comfortable running containerized services in production using Amazon ECS. We cover container deployment, cluster management, service auto-scaling, service discovery, secrets management, logging, monitoring, security, and other core concepts. We also cover integrated AWS services and supplementary services that you can take advantage of to run and scale container-based services in the cloud.
Chalk talks
Level 200 (Introductory)
CON211 – Reducing your Compute Footprint with Containers and Amazon ECS Tomas Riha, platform architect for Volvo, shows how Volvo transitioned its WirelessCar platform from using Amazon EC2 virtual machines to containers running on Amazon ECS, significantly reducing cost. Tomas dives deep into the architecture that Volvo used to achieve the migration in under four months, including Amazon ECS, Amazon ECR, Elastic Load Balancing, and AWS CloudFormation.
CON212 – Anomaly Detection Using Amazon ECS, AWS Lambda, and Amazon EMR Learn about the architecture that Cisco CloudLock uses to enable automated security and compliance checks throughout the entire development lifecycle, from the first line of code through runtime. It includes integration with IAM roles, Amazon VPC, and AWS KMS.
Level 400 (Expert)
CON410 – Advanced CICD with Amazon ECS Control Plane Mohit Gupta, product and engineering lead for Clever, demonstrates how to extend the Amazon ECS control plane to optimize management of container deployments and how the control plane can be broadly applied to take advantage of new AWS services. This includes ark—an AWS CLI-based deployment to Amazon ECS, Dapple—a slack-based automation system for deployments and notifications, and Kayvee—log and event routing libraries based on Amazon Kinesis.
Workshops
Level 200 (Introductory)
CON209 – Interstella 8888: Learn How to Use Docker on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience with Docker as you containerize Interstella 8888’s aging monolithic application and deploy it using Amazon ECS.
CON213 – Hands-on Deployment of Kubernetes on AWS In this workshop, attendees get hands-on experience using Kubernetes and Kops (Kubernetes Operations), as described in our recent blog post. Attendees learn how to provision a cluster, assign role-based permissions and security, and launch a container. If you’re interested in learning best practices for running Kubernetes on AWS, don’t miss this workshop.
TUESDAY 11/28
Breakout Sessions
Level 200 (Introductory)
CON206 – Docker on AWS In this session, Docker Technical Staff Member Patrick Chanezon discusses how Finnish Rail, the national train system for Finland, is using Docker on Amazon Web Services to modernize their customer facing applications, from ticket sales to reservations. Patrick also shares the state of Docker development and adoption on AWS, including explaining the opportunities and implications of efforts such as Project Moby, Docker EE, and how developers can use and contribute to Docker projects.
CON208 – Building Microservices on AWS Increasingly, organizations are turning to microservices to help them empower autonomous teams, letting them innovate and ship software faster than ever before. But implementing a microservices architecture comes with a number of new challenges that need to be dealt with. Chief among these finding an appropriate platform to help manage a growing number of independently deployable services. In this session, Sam Newman, author of Building Microservices and a renowned expert in microservices strategy, discusses strategies for building scalable and robust microservices architectures. He also tells you how to choose the right platform for building microservices, and about common challenges and mistakes organizations make when they move to microservices architectures.
Level 300 (Advanced)
CON302 – Building a CICD Pipeline for Containers on AWS Containers can make it easier to scale applications in the cloud, but how do you set up your CICD workflow to automatically test and deploy code to containerized apps? In this session, we explore how developers can build effective CICD workflows to manage their containerized code deployments on AWS.
Ajit Zadgaonkar, Director of Engineering and Operations at Edmunds walks through best practices for CICD architectures used by his team to deploy containers. We also deep dive into topics such as how to create an accessible CICD platform and architect for safe blue/green deployments.
CON307 – Building Effective Container Images Sick of getting paged at 2am and wondering “where did all my disk space go?” New Docker users often start with a stock image in order to get up and running quickly, but this can cause problems as your application matures and scales. Creating efficient container images is important to maximize resources, and deliver critical security benefits.
In this session, AWS Sr. Technical Evangelist Abby Fuller covers how to create effective images to run containers in production. This includes an in-depth discussion of how Docker image layers work, things you should think about when creating your images, working with Amazon ECR, and mise-en-place for install dependencies. Prakash Janakiraman, Co-Founder and Chief Architect at Nextdoor discuss high-level and language-specific best practices for with building images and how Nextdoor uses these practices to successfully scale their containerized services with a small team.
CON309 – Containerized Machine Learning on AWS Image recognition is a field of deep learning that uses neural networks to recognize the subject and traits for a given image. In Japan, Cookpad uses Amazon ECS to run an image recognition platform on clusters of GPU-enabled EC2 instances. In this session, hear from Cookpad about the challenges they faced building and scaling this advanced, user-friendly service to ensure high-availability and low-latency for tens of millions of users.
CON320 – Monitoring, Logging, and Debugging for Containerized Services As containers become more embedded in the platform tools, debug tools, traces, and logs become increasingly important. Nare Hayrapetyan, Senior Software Engineer and Calvin French-Owen, Senior Technical Officer for Segment discuss the principals of monitoring and debugging containers and the tools Segment has implemented and built for logging, alerting, metric collection, and debugging of containerized services running on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON314 – Automating Zero-Downtime Production Cluster Upgrades for Amazon ECS Containers make it easy to deploy new code into production to update the functionality of a service, but what happens when you need to update the Amazon EC2 compute instances that your containers are running on? In this talk, we’ll deep dive into how to upgrade the Amazon EC2 infrastructure underlying a live production Amazon ECS cluster without affecting service availability. Matt Callanan, Engineering Manager at Expedia walk through Expedia’s “PRISM” project that safely relocates hundreds of tasks onto new Amazon EC2 instances with zero-downtime to applications.
CON322 – Maximizing Amazon ECS for Large-Scale Workloads Head of Mobfox DevOps, David Spitzer, shows how Mobfox used Docker and Amazon ECS to scale the Mobfox services and development teams to achieve low-latency networking and automatic scaling. This session covers Mobfox’s ecosystem architecture. It compares 2015 and today, the challenges Mobfox faced in growing their platform, and how they overcame them.
CON323 – Microservices Architectures for the Enterprise Salva Jung, Principle Engineer for Samsung Mobile shares how Samsung Connect is architected as microservices running on Amazon ECS to securely, stably, and efficiently handle requests from millions of mobile and IoT devices around the world.
CON324 – Windows Containers on Amazon ECS Docker containers are commonly regarded as powerful and portable runtime environments for Linux code, but Docker also offers API and toolchain support for running Windows Servers in containers. In this talk, we discuss the various options for running windows-based applications in containers on AWS.
CON326 – Remote Sensing and Image Processing on AWS Learn how Encirca services by DuPont Pioneer uses Amazon ECS powered by GPU-instances and Amazon EC2 Spot Instances to run proprietary image-processing algorithms against satellite imagery. Mark Lanning and Ethan Harstad, engineers at DuPont Pioneer show how this architecture has allowed them to process satellite imagery multiple times a day for each agricultural field in the United States in order to identify crop health changes.
Workshops
Level 300 (Advanced)
CON317 – Advanced Container Management at Catsndogs.lol Catsndogs.lol is a (fictional) company that needs help deploying and scaling its container-based application. During this workshop, attendees join the new DevOps team at CatsnDogs.lol, and help the company to manage their applications using Amazon ECS, and help release new features to make our customers happier than ever.Attendees get hands-on with service and container-instance auto-scaling, spot-fleet integration, container placement strategies, service discovery, secrets management with AWS Systems Manager Parameter Store, time-based and event-based scheduling, and automated deployment pipelines. If you are a developer interested in learning more about how Amazon ECS can accelerate your application development and deployment workflows, or if you are a systems administrator or DevOps person interested in understanding how Amazon ECS can simplify the operational model associated with running containers at scale, then this workshop is for you. You should have basic familiarity with Amazon ECS, Amazon EC2, and IAM.
Additional requirements:
The AWS CLI or AWS Tools for PowerShell installed
An AWS account with administrative permissions (including the ability to create IAM roles and policies) created at least 24 hours in advance.
WEDNESDAY 11/29
Birds of a Feather (BoF)
CON01 – Birds of a Feather: Containers and Open Source at AWS Cloud native architectures take advantage of on-demand delivery, global deployment, elasticity, and higher-level services to enable developer productivity and business agility. Open source is a core part of making cloud native possible for everyone. In this session, we welcome thought leaders from the CNCF, Docker, and AWS to discuss the cloud’s direction for growth and enablement of the open source community. We also discuss how AWS is integrating open source code into its container services and its contributions to open source projects.
Breakout Sessions
Level 300 (Advanced)
CON308 – Mastering Kubernetes on AWS Much progress has been made on how to bootstrap a cluster since Kubernetes’ first commit and is now only a matter of minutes to go from zero to a running cluster on Amazon Web Services. However, evolving a simple Kubernetes architecture to be ready for production in a large enterprise can quickly become overwhelming with options for configuration and customization.
In this session, Arun Gupta, Open Source Strategist for AWS and Raffaele Di Fazio, software engineer at leading European fashion platform Zalando, show the common practices for running Kubernetes on AWS and share insights from experience in operating tens of Kubernetes clusters in production on AWS. We cover options and recommendations on how to install and manage clusters, configure high availability, perform rolling upgrades and handle disaster recovery, as well as continuous integration and deployment of applications, logging, and security.
CON310 – Moving to Containers: Building with Docker and Amazon ECS If you’ve ever considered moving part of your application stack to containers, don’t miss this session. We cover best practices for containerizing your code, implementing automated service scaling and monitoring, and setting up automated CI/CD pipelines with fail-safe deployments. Manjeeva Silva and Thilina Gunasinghe show how McDonalds implemented their home delivery platform in four months using Docker containers and Amazon ECS to serve tens of thousands of customers.
Level 400 (Expert)
CON402 – Advanced Patterns in Microservices Implementation with Amazon ECS Scaling a microservice-based infrastructure can be challenging in terms of both technical implementation and developer workflow. In this talk, AWS Solutions Architect Pierre Steckmeyer is joined by Will McCutchen, Architect at BuzzFeed, to discuss Amazon ECS as a platform for building a robust infrastructure for microservices. We look at the key attributes of microservice architectures and how Amazon ECS supports these requirements in production, from configuration to sophisticated workload scheduling to networking capabilities to resource optimization. We also examine what it takes to build an end-to-end platform on top of the wider AWS ecosystem, and what it’s like to migrate a large engineering organization from a monolithic approach to microservices.
CON404 – Deep Dive into Container Scheduling with Amazon ECS As your application’s infrastructure grows and scales, well-managed container scheduling is critical to ensuring high availability and resource optimization. In this session, we deep dive into the challenges and opportunities around container scheduling, as well as the different tools available within Amazon ECS and AWS to carry out efficient container scheduling. We discuss patterns for container scheduling available with Amazon ECS, the Blox scheduling framework, and how you can customize and integrate third-party scheduler frameworks to manage container scheduling on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON312 – Building a Selenium Fleet on the Cheap with Amazon ECS with Spot Fleet Roberto Rivera and Matthew Wedgwood, engineers at RetailMeNot, give a practical overview of setting up a fleet of Selenium nodes running on Amazon ECS with Spot Fleet. Discuss the challenges of running Selenium with high availability at minimum cost using Amazon ECS container introspection to connect the Selenium Hub with its nodes.
CON315 – Virtually There: Building a Render Farm with Amazon ECS Learn how 8i Corp scales its multi-tenanted, volumetric render farm up to thousands of instances using AWS, Docker, and an API-driven infrastructure. This render farm enables them to turn the video footage from an array of synchronized cameras into a photo-realistic hologram capable of playback on a range of devices, from mobile phones to high-end head mounted displays. Join Owen Evans, VP of Engineering for 8i, as they dive deep into how 8i’s rendering infrastructure is built and maintained by just a handful of people and powered by Amazon ECS.
CON325 – Developing Microservices – from Your Laptop to the Cloud Wesley Chow, Staff Engineer at Adroll, shows how his team extends Amazon ECS by enabling local development capabilities. Hologram, Adroll’s local development program, brings the capabilities of the Amazon EC2 instance metadata service to non-EC2 hosts, so that developers can run the same software on local machines with the same credentials source as in production.
CON327 – Patterns and Considerations for Service Discovery Roven Drabo, head of cloud operations at Kaplan Test Prep, illustrates Kaplan’s complete container automation solution using Amazon ECS along with how his team uses NGINX and HashiCorp Consul to provide an automated approach to service discovery and container provisioning.
CON328 – Building a Development Platform on Amazon ECS Quinton Anderson, Head of Engineering for Commonwealth Bank of Australia, walks through how they migrated their internal development and deployment platform from Mesos/Marathon to Amazon ECS. The platform uses a custom DSL to abstract a layered application architecture, in a way that makes it easy to plug or replace new implementations into each layer in the stack.
Workshops
Level 300 (Advanced)
CON318 – Interstella 8888: Monolith to Microservices with Amazon ECS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience deploying Docker containers as you break Interstella 8888’s aging monolithic application into containerized microservices. Using Amazon ECS and an Application Load Balancer, you create API-based microservices and deploy them leveraging integrations with other AWS services.
CON332 – Build a Java Spring Application on Amazon ECS This workshop teaches you how to lift and shift existing Spring and Spring Cloud applications onto the AWS platform. Learn how to build a Spring application container, understand bootstrap secrets, push container images to Amazon ECR, and deploy the application to Amazon ECS. Then, learn how to configure the deployment for production.
THURSDAY 11/30
Breakout Sessions
Level 200 (Introductory)
CON201 – Containers on AWS – State of the Union Just over four years after the first public release of Docker, and three years to the day after the launch of Amazon ECS, the use of containers has surged to run a significant percentage of production workloads at startups and enterprise organizations. Join Deepak Singh, General Manager of Amazon Container Services, as he covers the state of containerized application development and deployment trends, new container capabilities on AWS that are available now, options for running containerized applications on AWS, and how AWS customers successfully run container workloads in production.
Level 300 (Advanced)
CON304 – Batch Processing with Containers on AWS Batch processing is useful to analyze large amounts of data. But configuring and scaling a cluster of virtual machines to process complex batch jobs can be difficult. In this talk, we show how to use containers on AWS for batch processing jobs that can scale quickly and cost-effectively. We also discuss AWS Batch, our fully managed batch-processing service. You also hear from GoPro and Here about how they use AWS to run batch processing jobs at scale including best practices for ensuring efficient scheduling, fine-grained monitoring, compute resource automatic scaling, and security for your batch jobs.
Level 400 (Expert)
CON406 – Architecting Container Infrastructure for Security and Compliance While organizations gain agility and scalability when they migrate to containers and microservices, they also benefit from compliance and security, advantages that are often overlooked. In this session, Kelvin Zhu, lead software engineer at Okta, joins Mitch Beaumont, enterprise solutions architect at AWS, to discuss security best practices for containerized infrastructure. Learn how Okta built their development workflow with an emphasis on security through testing and automation. Dive deep into how containers enable automated security and compliance checks throughout the development lifecycle. Also understand best practices for implementing AWS security and secrets management services for any containerized service architecture.
Chalk Talks
Level 300 (Advanced)
CON329 – Full Software Lifecycle Management for Containers Running on Amazon ECS Learn how The Washington Post uses Amazon ECS to run Arc Publishing, a digital journalism platform that powers The Washington Post and a growing number of major media websites. Amazon ECS enabled The Washington Post to containerize their existing microservices architecture, avoiding a complete rewrite that would have delayed the platform’s launch by several years. In this session, Jason Bartz, Technical Architect at The Washington Post, discusses the platform’s architecture. He addresses the challenges of optimizing Arc Publishing’s workload, and managing the application lifecycle to support 2,000 containers running on more than 50 Amazon ECS clusters.
CON330 – Running Containerized HIPAA Workloads on AWS Nihar Pasala, Engineer at Aetion, discusses the Aetion Evidence Platform, a system for generating the real-world evidence used by healthcare decision makers to implement value-based care. This session discusses the architecture Aetion uses to run HIPAA workloads using containers on Amazon ECS, best practices, and learnings.
Level 400 (Expert)
CON408 – Building a Machine Learning Platform Using Containers on AWS DeepLearni.ng develops and implements machine learning models for complex enterprise applications. In this session, Thomas Rogers, Engineer for DeepLearni.ng discusses how they worked with Scotiabank to leverage Amazon ECS, Amazon ECR, Docker, GPU-accelerated Amazon EC2 instances, and TensorFlow to develop a retail risk model that helps manage payment collections for millions of Canadian credit card customers.
Workshops
Level 300 (Advanced)
CON319 – Interstella 8888: CICD for Containers on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to learn how to set up a CI/CD pipeline for containerized microservices. You get hands-on experience deploying Docker container images using Amazon ECS, AWS CloudFormation, AWS CodeBuild, and AWS CodePipeline, automating everything from code check-in to production.
FRIDAY 12/1
Breakout Sessions
Level 400 (Expert)
CON405 – Moving to Amazon ECS – the Not-So-Obvious Benefits If you ask 10 teams why they migrated to containers, you will likely get answers like ‘developer productivity’, ‘cost reduction’, and ‘faster scaling’. But teams often find there are several other ‘hidden’ benefits to using containers for their services. In this talk, Franziska Schmidt, Platform Engineer at Mapbox and Yaniv Donenfeld from AWS will discuss the obvious, and not so obvious benefits of moving to containerized architecture. These include using Docker and Amazon ECS to achieve shared libraries for dev teams, separating private infrastructure from shareable code, and making it easier for non-ops engineers to run services.
Chalk Talks
Level 300 (Advanced)
CON331 – Deploying a Regulated Payments Application on Amazon ECS Travelex discusses how they built an FCA-compliant international payments service using a microservices architecture on AWS. This chalk talk covers the challenges of designing and operating an Amazon ECS-based PaaS in a regulated environment using a DevOps model.
Workshops
Level 400 (Expert)
CON407 – Interstella 8888: Advanced Microservice Operations Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. In this workshop, you help Interstella 8888 build a modern microservices-based logistics system to save the company from financial ruin. We give you the hands-on experience you need to run microservices in the real world. This includes implementing advanced container scheduling and scaling to deal with variable service requests, implementing a service mesh, issue tracing with AWS X-Ray, container and instance-level logging with Amazon CloudWatch, and load testing.
Know before you go
Want to brush up on your container knowledge before re:Invent? Here are some helpful resources to get started:
Amazon ElastiCache for Redis is now a HIPAA Eligible Service and has been added to the AWS Business Associate Addendum (BAA). This means you can use ElastiCache for Redis to help you power healthcare applications as well as process, maintain, and store protected health information (PHI). ElastiCache for Redis is a Redis-compatible, fully-managed, in-memory data store and cache in the cloud that provides sub-millisecond latency to power applications. Now you can use the speed, simplicity, and flexibility of ElastiCache for Redis to build secure, fast, and internet-scale healthcare applications.
ElastiCache for Redis with HIPAA eligibility is available for all current-generation instance node types and requires Redis engine version 3.2.6. You must ensure that nodes are configured to encrypt the data in transit and at rest, and to authenticate Redis commands before the engine executes them. See Architecting for HIPAA Security and Compliance on Amazon Web Services for information about how to configure Amazon HIPAA Eligible Services to store, process, and transmit PHI.
ElastiCache for Redis uses Advanced Encryption Standard (AES)-512 symmetric keys to encrypt data on disk. The Redis backups stored in Amazon S3 are encrypted with server-side encryption (SSE) using AES-256 symmetric keys. ElastiCache for Redis uses Transport Layer Security (TLS) to encrypt data in transit. It uses the Redis AUTH token that you provide at the time of Redis cluster creation to authenticate the Redis commands coming from clients. The AUTH token is encrypted using AWS Key Management Service.
As I was preparing to write this post, I took a nostalgic look at the blog post I wrote when we launched AWS Direct Connect back in 2012. We created Direct Connect after our enterprise customers asked us to allow them to establish dedicated connections to an AWS Region in pursuit of enhanced privacy, additional data transfer bandwidth, and more predictable data transfer performance. Starting from one AWS Region and a single colo, Direct Connect is now available in every public AWS Region and accessible from dozens of colos scattered across the world (over 60 locations at last count). Our customers have taken to Direct Connect wholeheartedly and we have added features such as Link Aggregation, Amazon EFS support, CloudWatch monitoring, and HIPAA eligibility. In the past five weeks alone we have added Direct Connect locations in Houston (Texas), Vancouver (Canada), Manchester (UK), Canberra (Australia), and Perth (Australia).
Today we are making Direct Connect simpler and more powerful with the addition of the Direct Connect Gateway. We are also giving Direct Connect customers in any Region the ability to create public virtual interfaces that receive our global IP routes and enable access to the public endpoints for our services and updating the Direct Connect pricing model.
Let’s take a look at each one!
New Direct Connect Gateway You can use the new Direct Connect Gateway to establish connectivity that spans Virtual Private Clouds (VPCs) spread across multiple AWS Regions. You no longer need to establish multiple BGP sessions for each VPC; this reduces your administrative workload as well as the load on your network devices.
This feature also allows you to connect to any of the participating VPCs from any Direct Connect location, further reducing your costs for making using AWS services on a cross-region basis.
Here is a diagram that illustrates the simplification that you can achieve with a Direct Connect Gateway (each “lock” icon represents a Virtual Private Gateway). Start with this:
And end up like this:
The VPCs that reference a particular Direct Connect Gateway must have IP address ranges that do not overlap. Today, the VPCs must all be in the same AWS account; we plan to make this more flexible in the future.
Each Gateway is a global object that exists across all of the public AWS Regions. All communication between the Regions via the Gateways takes place across the AWS network backbone.
Creating a Direct Connect Gateway You can create a Direct Connect Gateway from the Direct Connect Console or by calling the CreateDirectConnectGateway function from your code. I’ll use the Console!
I open the Direct Connect Console and click on Direct Connect Gateways to get started:
The list is empty since I don’t have any Gateways yet. Click on Create Direct Connect Gateway to change that:
I give my Gateway a name, enter a private ASN for my network, then click on Create. The ASN (Autonomous System Number) must be in one of the ranges defined as private in RFC 6996:
My new Gateway will appear in the other AWS Regions within a moment or two:
I have a Direct Connect Connection in Ohio that I will use to create my VIF:
Now I create a private VIF that references the Gateway and the Connection:
It is ready to use within seconds:
I already have a pair of VPCs with non-overlapping CIDRs, and a Virtual Private Gateway attached to each one. Here are the VPCs (since this is a demo I’ll show both in the same Region for convenience):
And the Virtual Private Gateways:
I return to the Direct Connect Console and navigate to the Direct Connect Gateways. I select my Gateway and choose Associate Virtual Private Gateway from the Actions menu:
Then I select both of my Virtual Private Gateways and click on Associate:
If, as would usually be the case, my VPCs are in distinct AWS Regions, the same procedure would apply. For this blog post it was easier to show you the operations once rather than twice.
The Virtual Gateway association is complete within a minute or so (the state starts out as associating):
When the state transitions to associated, traffic can flow between your on-premises network and your VPCs, over your AWS Direct Connect connection, regardless of the AWS Regions where your VPCs reside.
Public Virtual Interfaces for Service Endpoints You can now create Public Virtual Interfaces that will allow you to access AWS public service endpoints for AWS services running in any AWS Region (except AWS China Region) over Direct Connect. These interfaces receive (via BGP) Amazon’s global IP routes. You can create these interfaces in the Direct Connect Console; start by selecting the Public option:
After you create it you will need to associate it with a VPC.
Updated Pricing Model In light of the ever-expanding number of AWS Regions and AWS Direct Connect locations, data transfer pricing is now based on the location of the Direct Connect and the source AWS Region. The new pricing is simpler that the older model which was based on AWS Direct Connect locations.
Now Available This new feature is available today and you can start to use it right now. You can create and use Direct Connect Gateways at no charge; you pay the usual Direct Connect prices for port hours and data transfer.
Nine years ago I showed you how you could Distribute Your Content with Amazon CloudFront. We launched CloudFront in 2008 with 14 Points of Presence and have been expanding rapidly ever since. Today I am pleased to announce the opening of our 100th Point of Presence, the fifth one in Tokyo and the sixth in Japan. With 89 Edge Locations and 11 Regional Edge Caches, CloudFront now supports traffic generated by millions of viewers around the world.
23 Countries, 50 Cities, and Growing Those 100 Points of Presence span the globe, with sites in 50 cities and 23 countries. In the past 12 months we have expanded the size of our network by about 58%, adding 37 Points of Presence, including nine in the following new cities:
Berlin, Germany
Minneapolis, Minnesota, USA
Prague, Czech Republic
Boston, Massachusetts, USA
Munich, Germany
Vienna, Austria
Kuala Lumpur, Malaysia
Philadelphia, Pennsylvania, USA
Zurich, Switzerland
We have even more in the works, including an Edge Location in the United Arab Emirates, currently planned for the first quarter of 2018.
Innovating for Our Customers As I mentioned earlier, our network consists of a mix of Edge Locations and Regional Edge Caches. First announced at re:Invent 2016, the Regional Edge Caches sit between our Edge Locations and your origin servers, have even more memory than the Edge Locations, and allow us to store content close to the viewers for rapid delivery, all while reducing the load on the origin servers.
While locations are important, they are just a starting point. We continue to focus on security with the recent launch of our Security Policies feature and our announcement that CloudFront is a HIPAA-eligible service. We gave you more content-serving and content-generation options with the launch of[email protected], letting you run AWS Lambda functions close to your users.
We have also been working to accelerate the processing of cache invalidations and configuration changes. We now accept invalidations within milliseconds of the request and confirm that the request has been processed world-wide, typically within 60 seconds. This helps to ensure that your customers have access to fresh, timely content!
Our Health Customer Stories page lists just a few of the many customers that are building and running healthcare and life sciences applications that run on AWS. Customers like Verge Health, Care Cloud, and Orion Health trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.
Sixteen More Services In my last HIPAA Eligibility Update I shared the news that we added eight additional services to our list of HIPAA eligible services. Today I am happy to let you know that we have added another sixteen services to the list, bringing the total up to 46. Here are the newest additions, along with some short descriptions and links to some of my blog posts to jog your memory:
Amazon RDS for MariaDB – This service lets you set up scalable, managed MariaDB instances in minutes, and offers high performance, high availability, and a simplified security model that makes it easy for you to encrypt data at rest and in transit. Read Amazon RDS Update – MariaDB is Now Available to learn more.
AWS Batch – This service lets you run large-scale batch computing jobs on AWS. You don’t need to install or maintain specialized batch software or build your own server clusters. Read AWS Batch – Run Batch Computing Jobs on AWS to learn more.
AWS Key Management Service – This service makes it easy for you to create and control the encryption keys used to encrypt your data. It uses HSMs to protect your keys, and is integrated with AWS CloudTrail in order to provide you with a log of all key usage. Read New AWS Key Management Service (KMS) to learn more.
AWS Snowball Edge – This is a data transfer device with 100 terabytes of on-board storage as well as compute capabilities. You can use it to move large amounts of data into or out of AWS, as a temporary storage tier, or to support workloads in remote or offline locations. To learn more, read AWS Snowball Edge – More Storage, Local Endpoints, Lambda Functions.
AWS Snowmobile – This is an exabyte-scale data transfer service. Pulled by a semi-trailer truck, each Snowmobile packs 100 petabytes of storage into a ruggedized 45-foot long shipping container. Read AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks to learn more (and to see some of my finest LEGO work).
As I have noted in the past, the AWS Blog Team is working hard to make sure that you know about as many AWS launches and publications as possible, without totally burying you in content! As part of our balancing act, we will occasionally publish catch-up posts to clear our queues and to bring more information to your attention. Here’s what I have in store for you today:
Monitoring for Cross-Region Replication of S3 Objects
Tags for Spot Fleet Instances
PCI DSS Compliance for 12 More Services
HIPAA Eligibility for WorkDocs
VPC Resizing
AppStream 2.0 Graphics Design Instances
AMS Connector App for ServiceNow
Regtech in the Cloud
New & Revised Quick Starts
Let’s jump right in!
Monitoring for Cross-Region Replication of S3 Objects I told you about cross-region replication for S3 a couple of years ago. As I showed you at the time, you simply enable versioning for the source bucket and then choose a destination region and bucket. You can check the replication status manually, or you can create an inventory (daily or weekly) of the source and destination buckets.
The Cross-Region Replication Monitor (CRR Monitor for short) solution checks the replication status of objects across regions and gives you metrics and failure notifications in near real-time.
Tags for Spot Instances Spot Instances and Spot Fleets (collections of Spot Instances) give you access to spare compute capacity. We recently gave you the ability to enter tags (key/value pairs) as part of your spot requests and to have those tags applied to the EC2 instances launched to fulfill the request:
PCI DSS Compliance for 12 More Services As first announced on the AWS Security Blog, we recently added 12 more services to our PCI DSS compliance program, raising the total number of in-scope services to 42. To learn more, check out our Compliance Resources.
VPC Resizing This feature allows you to extend an existing Virtual Private Cloud (VPC) by adding additional blocks of addresses. This gives you more flexibility and should help you to deal with growth. You can add up to four secondary /16 CIDRs per VPC. You can also edit the secondary CIDRs by deleting them and adding new ones. Simply select the VPC and choose Edit CIDRs from the menu:
AppStream 2.0 Graphics Design Instances Powered by AMD FirePro S7150x2 Server GPUs and equipped with AMD Multiuser GPU technology, the new Graphics Design instances for Amazon AppStream 2.0 will let you run and stream graphics applications more cost-effectively than ever. The instances are available in four sizes, with 2-16 vCPUs and 7.5 GB to 61 GB of memory.
AMS Connector App for ServiceNow AWS Managed Services (AMS) provides Infrastructure Operations Management for the Enterprise. Designed to accelerate cloud adoption, it automates common operations such as change requests, patch management, security and backup.
The new AMS integration App for ServiceNow lets you interact with AMS from within ServiceNow, with no need for any custom development or API integration.
Regtech in the Cloud Regtech (as I learned while writing this), is short for regulatory technology, and is all about using innovative technology such as cloud computing, analytics, and machine learning to address regulatory challenges.
Working together with APN Consulting Partner Cognizant, TABB Group recently published a thought leadership paper that explains why regulations and compliance pose huge challenges for our customers in the financial services, and shows how AWS can help!
New & Revised Quick Starts Our Quick Starts team has been cranking out new solutions and making significant updates to the existing ones. Here’s a roster:
It is time for an update on our on-going effort to make AWS a great host for healthcare and life sciences applications. As you can see from our Health Customer Stories page, Philips, VergeHealth, and Cambia (to choose a few) trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.
Eight More Eligible Services Today I am happy to share the news that we are adding another eight services to the list:
Amazon CloudFront can now be utilized to enhance the delivery and transfer of Protected Health Information data to applications on the Internet. By providing a completely secure and encryptable pathway, CloudFront can now be used as a part of applications that need to cache PHI. This includes applications for viewing lab results or imaging data, and those that transfer PHI from Healthcare Information Exchanges (HIEs).
AWS WAF can now be used to protect applications running on AWS which operate on PHI such as patient care portals, patient scheduling systems, and HIEs. Requests and responses containing encrypted PHI and PII can now pass through AWS WAF.
AWS Shield can now be used to protect web applications such as patient care portals and scheduling systems that operate on encrypted PHI from DDoS attacks.
Amazon S3 Transfer Acceleration can now be used to accelerate the bulk transfer of large amounts of research, genetics, informatics, insurance, or payer/payment data containing PHI/PII information. Transfers can take place between a pair of AWS Regions or from an on-premises system and an AWS Region.
Amazon WorkSpaces can now be used by researchers, informaticists, hospital administrators and other users to analyze, visualize or process PHI/PII data using on-demand Windows virtual desktops.
AWS Directory Service can now be used to connect the authentication and authorization systems of organizations that use or process PHI/PII to their resources in the AWS Cloud. For example, healthcare providers operating hybrid cloud environments can now use AWS Directory Services to allow their users to easily transition between cloud and on-premises resources.
Amazon Simple Notification Service (SNS) can now be used to send notifications containing encrypted PHI/PII as part of patient care, payment processing, and mobile applications.
Amazon Cognito can now be used to authenticate users into mobile patient portal and payment processing applications that use PHI/PII identifiers for accounts.
Additional HIPAA Resources Here are some additional resources that will help you to build applications that comply with HIPAA and HITECH:
Keep in Touch In order to make use of any AWS service in any manner that involves PHI, you must first enter into an AWS Business Associate Addendum (BAA). You can contact us to start the process.
By continuing to use the site, you agree to the use of cookies. more information
The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.