Tag Archives: hipaa

Powering HIPAA-compliant workloads using AWS Serverless technologies

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/powering-hipaa-compliant-workloads-using-aws-serverless-technologies/

This post courtesy of Mayank Thakkar, AWS Senior Solutions Architect

Serverless computing refers to an architecture discipline that allows you to build and run applications or services without thinking about servers. You can focus on your applications, without worrying about provisioning, scaling, or managing any servers. You can use serverless architectures for nearly any type of application or backend service. AWS handles the heavy lifting around scaling, high availability, and running those workloads.

The AWS HIPAA program enables covered entities—and those business associates subject to the U.S. Health Insurance Portability and Accountability Act of 1996 (HIPAA)—to use the secure AWS environment to process, maintain, and store protected health information (PHI). Based on customer feedback, AWS is trying to add more services to the HIPAA program, including serverless technologies.

AWS recently announced that AWS Step Functions has achieved HIPAA-eligibility status and has been added to the AWS Business Associate Addendum (BAA), adding to a growing list of HIPAA-eligible services. The BAA is an AWS contract that is required under HIPAA rules to ensure that AWS appropriately safeguards PHI. The BAA also serves to clarify and limit, as appropriate, the permissible uses and disclosures of PHI by AWS, based on the relationship between AWS and customers and the activities or services being performed by AWS.

Along with HIPAA eligibility for most of the rest of the serverless platform at AWS, Step Functions inclusion is a major win for organizations looking to process PHI using serverless technologies, opening up numerous new use cases and patterns. You can still use non-eligible services to orchestrate the storage, transmission, and processing of the metadata around PHI, but not the PHI itself.

In this post, I examine some common serverless use cases that I see in the healthcare and life sciences industry and show how AWS Serverless can be used to build powerful, cost-efficient, HIPAA-eligible architectures.

Provider directory web application

Running HIPAA-compliant web applications (like provider directories) on AWS is a common use case in the healthcare industry. Healthcare providers are often looking for ways to build and run applications and services without thinking about servers. They are also looking for ways to provide the most cost-effective and scalable delivery of secure health-related information to members, providers, and partners worldwide.

Unpredictable access patterns and spiky workloads often force organizations to provision for peak in these cases, and they end up paying for idle capacity. AWS Auto Scaling solves this challenge to a great extent but you still have to manage and maintain the underlying servers from a patching, high availability, and scaling perspective. AWS Lambda (along with other serverless technologies from AWS) removes this constraint.

The above architecture shows a serverless way to host a customer-facing website, with Amazon S3 being used for hosting static files (.js, .css, images, and so on). If your website is based on client-side technologies, you can eliminate the need to run a web server farm. In addition, you can use S3 features like server-side encryption and bucket access policies to lock down access to the content.

Using Amazon CloudFront, a global content delivery network, with S3 origins can bring your content closer to the end user and cut down S3 access costs, by caching the content at the edge. In addition, using AWS [email protected] gives you an ability to bring and execute your own code to customize the content that CloudFront delivers. That significantly reduces latency and improves the end user experience while maintaining the same Lambda development model. Some common examples include checking cookies, inspecting headers or authorization tokens, rewriting URLs, and making calls to external resources to confirm user credentials and generate HTTP responses.

You can power the APIs needed for your client application by using Amazon API Gateway, which takes care of creating, publishing, maintaining, monitoring, and securing APIs at any scale. API Gateway also provides robust ways to provide traffic management, authorization and access control, monitoring, API version management, and the other tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls. This allows you to focus on your business logic. Direct, secure, and authenticated integration with Lambda functions allows this serverless architecture to scale up and down seamlessly with incoming traffic.

The CloudFront integration with AWS WAF provides a reliable way to protect your application against common web exploits that could affect application availability, compromise security, or consume excessive resources.

API Gateway can integrate directly with Lambda, which by default can access the public resources. Lambda functions can be configured to access your Amazon VPC resources as well. If you have extended your data center to AWS using AWS Direct Connect or a VPN connection, Lambda can access your on-premises resources, with the traffic flowing over your VPN connection (or Direct Connect) instead of the public internet.

All the services mentioned above (except Amazon EC2) are fully managed by AWS in terms of high availability, scaling, provisioning, and maintenance, giving you a cost-effective way to host your web applications. It’s pay-as-you-go vs. pay-as-you-provision. Spikes in demand, typically encountered during the enrollment season, are handled gracefully, with these services scaling automatically to meet demand and then scale down. You get to keep your costs in control.

All AWS services referenced in the above architecture are HIPAA-eligible, thus enabling you to store, process, and transmit PHI, as long as it complies with the BAA.

Medical device telemetry (ingesting data @ scale)

The ever-increasing presence of IoT devices in the healthcare industry has created the challenges of ingesting this data at scale and making it available for processing as soon as it is produced. Processing this data in real time (or near-real time) is key to delivering urgent care to patients.

The infinite scalability (theoretical) along with low startup times offered by Lambda makes it a great candidate for these kinds of use cases. Balancing ballooning healthcare costs and timely delivery of care is a never-ending challenge. With subsecond billing and no charge for non-execution, Lambda becomes the best choice for AWS customers.

These end-user medical devices emit a lot of telemetry data, which requires constant analysis and real-time tracking and updating. For example, devices like infusion pumps, personal use dialysis machines, and so on require tracking and alerting of device consumables and calibration status. They also require updates for these settings. Consider the following architecture:

Typically, these devices are connected to an edge node or collector, which provides sufficient computing resources to authenticate itself to AWS and start streaming data to Amazon Kinesis Streams. The collector uses the Kinesis Producer Library to simplify high throughput to a Kinesis data stream. You can also use the server-side encryption feature, supported by Kinesis Streams, to achieve encryption-at-rest. Kinesis provides a scalable, highly available way to achieve loose coupling between data-producing (medical devices) and data-consuming (Lambda) layers.

After the data is transported via Kinesis, Lambda can then be used to process this data in real time, storing derived insights in Amazon DynamoDB, which can then power a near-real time health dashboard. Caregivers can access this real-time data to provide timely care and manage device settings.

End-user medical devices, via the edge node, can also connect to and poll an API hosted on API Gateway to check for calibration settings, firmware updates, and so on. The modifications can be easily updated by admins, providing a scalable way to manage these devices.

For historical analysis and pattern prediction, the staged data (stored in S3), can be processed in batches. Use AWS Batch, Amazon EMR, or any custom logic running on a fleet of Amazon EC2 instances to gain actionable insights. Lambda can also be used to process data in a MapReduce fashion, as detailed in the Ad Hoc Big Data Processing Made Simple with Serverless MapReduce post.

You can also build high-throughput batch workflows or orchestrate Apache Spark applications using Step Functions, as detailed in the Orchestrate Apache Spark applications using AWS Step Functions and Apache Livy post. These insights can then be used to calibrate the medical devices to achieve effective outcomes.

Use Lambda to load data into Amazon Redshift, a cost-effective, petabyte-scale data warehouse offering. One of my colleagues, Ian Meyers, pointed this out in his Zero-Administration Amazon Redshift Database Loader post.

Mobile diagnostics

Another use case that I see is using mobile devices to provide diagnostic care in out-patient settings. These environments typically lack the robust IT infrastructure that clinics and hospitals can provide, and often are subjected to intermittent internet connectivity as well. Various biosensors (otoscopes, thermometers, heart rate monitors, and so on) can easily talk to smartphones, which can then act as aggregators and analyzers before forwarding the data to a central processing system. After the data is in the system, caregivers and practitioners can then view and act on the data.

In the above diagram, an application running on a mobile device (iOS or Android) talks to various biosensors and collects diagnostic data. Using AWS mobile SDKs along with Amazon Cognito, these smart devices can authenticate themselves to AWS and access the APIs hosted on API Gateway. Amazon Cognito also offers data synchronization across various mobile devices, which helps you to build “offline” features in your mobile application. Amazon Cognito Sync resolves conflicts and intermittent network connectivity, enabling you to focus on delivering great app experiences instead of creating and managing a user data sync solution.

You can also use CloudFront and [email protected], as detailed in the first use case of this post, to cache content at edge locations and provide some light processing closer to your end users.

Lambda acts as a middle tier, processing the CRUD operations on the incoming data and storing it in DynamoDB, which is again exposed to caregivers through another set of Lambda functions and API Gateway. Caregivers can access the information through a browser-based interface, with Lambda processing the middle-tier application logic. They can view the historical data, compare it with fresh data coming in, and make corrections. Caregivers can also react to incoming data and issue alerts, which are delivered securely to the smart device through Amazon SNS.

Also, by using DynamoDB Streams and its integration with Lambda, you can implement Lambda functions that react to data modifications in DynamoDB tables (and hence, incoming device data). This gives you a way to codify common reactions to incoming data, in near-real time.

Lambda ecosystem

As I discussed in the above use cases, Lambda is a powerful, event-driven, stateless, on-demand compute platform offering scalability, agility, security, and reliability, along with a fine-grained cost structure.

For some organizations, migrating from a traditional programing model to a microservices-driven model can be a steep curve. Also, to build and maintain complex applications using Lambda, you need a vast array of tools, all the way from local debugging support to complex application performance monitoring tools. The following list of tools and services can assist you in building world-class applications with minimal effort:

  • AWS X-Ray is a distributed tracing system that allows developers to analyze and debug production for distributed applications, such as those built using a microservices (Lambda) architecture. AWS X-Ray was recently added to the AWS BAA, opening the doors for processing PHI workloads.
  • AWS Step Functions helps build HIPAA-compliant complex workflows using Lambda. It provides a way to coordinate the components of distributed applications and Lambda functions using visual workflows.
  • AWS SAM provides a fast and easy way of deploying serverless applications. You can write simple templates to describe your functions and their event sources (API Gateway, S3, Kinesis, and so on). AWS recently relaunched the AWS SAM CLI, which allows you to create a local testing environment that simulates the AWS runtime environment for Lambda. It allows faster, iterative development of your Lambda functions by eliminating the need to redeploy your application package to the Lambda runtime.

For more details, see the Serverless Application Developer Tooling webpage.

Conclusion

There are numerous other health care and life science use cases that customers are implementing, using Lambda with other AWS services. AWS is committed to easing the effort of implementing health care solutions in the cloud. Making Lambda HIPAA-eligible is just another milestone in the journey. For more examples of use cases, see Serverless. For the latest list of HIPAA-eligible services, see HIPAA Eligible Services Reference.

Accept a BAA with AWS for all accounts in your organization

Post Syndicated from Kristen Haught original https://aws.amazon.com/blogs/security/accept-a-baa-with-aws-for-all-accounts-in-your-organization/

I’m excited to announce to our healthcare customers and partners that you can now accept a single AWS Business Associate Addendum (BAA) for all accounts within your organization. Once accepted, all current and future accounts created or added to your organization will immediately be covered by the BAA.

Our team is always thinking about how we can reduce manual processes related to your compliance tasks. That’s why I’ve been looking forward to the release of AWS Artifact Organization Agreements, which was designed to simplify the BAA process and improve your experience when designating AWS accounts as HIPAA accounts. Previously, if you wanted to designate several AWS accounts, you had to sign-in to each account individually to accept the BAA or email us. Now, an authorized master account user can accept the BAA once to automatically designate all existing and future member accounts in the organization as HIPAA accounts for use with protected health information (PHI). This release addresses a frequent customer request to be able to quickly designate multiple HIPAA accounts and confirm those accounts are covered under the terms of the BAA.

If you have a BAA in place already and want to leverage this new capability a master account user can accept the new AWS Organizations BAA in AWS Artifact today. To get started, your organization must use AWS Organizations to manage your accounts, and “all features” needs to be enabled. Learn more about creating an organization here.

Once you are using AWS Organizations with all features enabled, and you have the necessary user permissions, then accepting the AWS Organizations BAA takes about two minutes. We’ve created a video that shows you the process, step-by-step.

If your organization prefers to continue managing HIPAA accounts individually, you can still do that.  We have streamlined the process for accepting an individual account BAA as well. It takes less than two minutes to designate a single account as a HIPAA account in AWS Artifact. You can watch the new video here to learn how.

As with all AWS Artifact features, there is no additional cost to use AWS Artifact to review, accept, and manage individual account BAAs or the new organization BAA. To learn more, go to the FAQ page.

 

Securing messages published to Amazon SNS with AWS PrivateLink

Post Syndicated from Otavio Ferreira original https://aws.amazon.com/blogs/security/securing-messages-published-to-amazon-sns-with-aws-privatelink/

Amazon Simple Notification Service (SNS) now supports VPC Endpoints (VPCE) via AWS PrivateLink. You can use VPC Endpoints to privately publish messages to SNS topics, from an Amazon Virtual Private Cloud (VPC), without traversing the public internet. When you use AWS PrivateLink, you don’t need to set up an Internet Gateway (IGW), Network Address Translation (NAT) device, or Virtual Private Network (VPN) connection. You don’t need to use public IP addresses, either.

VPC Endpoints doesn’t require code changes and can bring additional security to Pub/Sub Messaging use cases that rely on SNS. VPC Endpoints helps promote data privacy and is aligned with assurance programs, including the Health Insurance Portability and Accountability Act (HIPAA), FedRAMP, and others discussed below.

VPC Endpoints for SNS in action

Here’s how VPC Endpoints for SNS works. The following example is based on a banking system that processes mortgage applications. This banking system, which has been deployed to a VPC, publishes each mortgage application to an SNS topic. The SNS topic then fans out the mortgage application message to two subscribing AWS Lambda functions:

  • Save-Mortgage-Application stores the application in an Amazon DynamoDB table. As the mortgage application contains personally identifiable information (PII), the message must not traverse the public internet.
  • Save-Credit-Report checks the applicant’s credit history against an external Credit Reporting Agency (CRA), then stores the final credit report in an Amazon S3 bucket.

The following diagram depicts the underlying architecture for this banking system:
 
Diagram depicting the architecture for the example banking system
 
To protect applicants’ data, the financial institution responsible for developing this banking system needed a mechanism to prevent PII data from traversing the internet when publishing mortgage applications from their VPC to the SNS topic. Therefore, they created a VPC endpoint to enable their publisher Amazon EC2 instance to privately connect to the SNS API. As shown in the diagram, when the VPC endpoint is created, an Elastic Network Interface (ENI) is automatically placed in the same VPC subnet as the publisher EC2 instance. This ENI exposes a private IP address that is used as the entry point for traffic destined to SNS. This ensures that traffic between the VPC and SNS doesn’t leave the Amazon network.

Set up VPC Endpoints for SNS

The process for creating a VPC endpoint to privately connect to SNS doesn’t require code changes: access the VPC Management Console, navigate to the Endpoints section, and create a new Endpoint. Three attributes are required:

  • The SNS service name.
  • The VPC and Availability Zones (AZs) from which you’ll publish your messages.
  • The Security Group (SG) to be associated with the endpoint network interface. The Security Group controls the traffic to the endpoint network interface from resources in your VPC. If you don’t specify a Security Group, the default Security Group for your VPC will be associated.

Help ensure your security and compliance

SNS can support messaging use cases in regulated market segments, such as healthcare provider systems subject to the Health Insurance Portability and Accountability Act (HIPAA) and financial systems subject to the Payment Card Industry Data Security Standard (PCI DSS), and is also in-scope with the following Assurance Programs:

The SNS API is served through HTTP Secure (HTTPS), and encrypts all messages in transit with Transport Layer Security (TLS) certificates issued by Amazon Trust Services (ATS). The certificates verify the identity of the SNS API server when encrypted connections are established. The certificates help establish proof that your SNS API client (SDK, CLI) is communicating securely with the SNS API server. A Certificate Authority (CA) issues the certificate to a specific domain. Hence, when a domain presents a certificate that’s issued by a trusted CA, the SNS API client knows it’s safe to make the connection.

Summary

VPC Endpoints can increase the security of your pub/sub messaging use cases by allowing you to publish messages to SNS topics, from instances in your VPC, without traversing the internet. Setting up VPC Endpoints for SNS doesn’t require any code changes because the SNS API address remains the same.

VPC Endpoints for SNS is now available in all AWS Regions where AWS PrivateLink is available. For information on pricing and regional availability, visit the VPC pricing page.
For more information and on-boarding, see Publishing to Amazon SNS Topics from Amazon Virtual Private Cloud in the SNS documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Amazon SNS forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Security of Cloud HSMBackups

Post Syndicated from Balaji Iyer original https://aws.amazon.com/blogs/architecture/security-of-cloud-hsmbackups/

Today, our customers use AWS CloudHSM to meet corporate, contractual and regulatory compliance requirements for data security by using dedicated Hardware Security Module (HSM) instances within the AWS cloud. CloudHSM delivers all the benefits of traditional HSMs including secure generation, storage, and management of cryptographic keys used for data encryption that are controlled and accessible only by you.

As a managed service, it automates time-consuming administrative tasks such as hardware provisioning, software patching, high availability, backups and scaling for your sensitive and regulated workloads in a cost-effective manner. Backup and restore functionality is the core building block enabling scalability, reliability and high availability in CloudHSM.

You should consider using AWS CloudHSM if you require:

  • Keys stored in dedicated, third-party validated hardware security modules under your exclusive control
  • FIPS 140-2 compliance
  • Integration with applications using PKCS#11, Java JCE, or Microsoft CNG interfaces
  • High-performance in-VPC cryptographic acceleration (bulk crypto)
  • Financial applications subject to PCI regulations
  • Healthcare applications subject to HIPAA regulations
  • Streaming video solutions subject to contractual DRM requirements

We recently released a whitepaper, “Security of CloudHSM Backups” that provides in-depth information on how backups are protected in all three phases of the CloudHSM backup lifecycle process: Creation, Archive, and Restore.

About the Author

Balaji Iyer is a senior consultant in the Professional Services team at Amazon Web Services. In this role, he has helped several customers successfully navigate their journey to AWS. His specialties include architecting and implementing highly-scalable distributed systems, operational security, large scale migrations, and leading strategic AWS initiatives.

Amazon Relational Database Service – Looking Back at 2017

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-relational-database-service-looking-back-at-2017/

The Amazon RDS team launched nearly 80 features in 2017. Some of them were covered in this blog, others on the AWS Database Blog, and the rest in What’s New or Forum posts. To wrap up my week, I thought it would be worthwhile to give you an organized recap. So here we go!

Certification & Security

Features

Engine Versions & Features

Regional Support

Instance Support

Price Reductions

And That’s a Wrap
I’m pretty sure that’s everything. As you can see, 2017 was quite the year! I can’t wait to see what the team delivers in 2018.

Jeff;

 

The Top 10 Most Downloaded AWS Security and Compliance Documents in 2017

Post Syndicated from Sara Duffer original https://aws.amazon.com/blogs/security/the-top-10-most-downloaded-aws-security-and-compliance-documents-in-2017/

AWS download logo

The following list includes the ten most downloaded AWS security and compliance documents in 2017. Using this list, you can learn about what other AWS customers found most interesting about security and compliance last year.

  1. AWS Security Best Practices – This guide is intended for customers who are designing the security infrastructure and configuration for applications running on AWS. The guide provides security best practices that will help you define your Information Security Management System (ISMS) and build a set of security policies and processes for your organization so that you can protect your data and assets in the AWS Cloud.
  2. AWS: Overview of Security Processes – This whitepaper describes the physical and operational security processes for the AWS managed network and infrastructure, and helps answer questions such as, “How does AWS help me protect my data?”
  3. Architecting for HIPAA Security and Compliance on AWS – This whitepaper describes how to leverage AWS to develop applications that meet HIPAA and HITECH compliance requirements.
  4. Service Organization Controls (SOC) 3 Report – This publicly available report describes internal AWS security controls, availability, processing integrity, confidentiality, and privacy.
  5. Introduction to AWS Security –This document provides an introduction to AWS’s approach to security, including the controls in the AWS environment, and some of the products and features that AWS makes available to customers to meet your security objectives.
  6. AWS Best Practices for DDoS Resiliency – This whitepaper covers techniques to mitigate distributed denial of service (DDoS) attacks.
  7. AWS: Risk and Compliance – This whitepaper provides information to help customers integrate AWS into their existing control framework, including a basic approach for evaluating AWS controls and a description of AWS certifications, programs, reports, and third-party attestations.
  8. Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities – AWS WAF is a web application firewall that helps you protect your websites and web applications against various attack vectors at the HTTP protocol level. This whitepaper outlines how you can use AWS WAF to mitigate the application vulnerabilities that are defined in the Open Web Application Security Project (OWASP) Top 10 list of most common categories of application security flaws.
  9. Introduction to Auditing the Use of AWS – This whitepaper provides information, tools, and approaches for auditors to use when auditing the security of the AWS managed network and infrastructure.
  10. AWS Security and Compliance: Quick Reference Guide – By using AWS, you inherit the many security controls that we operate, thus reducing the number of security controls that you need to maintain. Your own compliance and certification programs are strengthened while at the same time lowering your cost to maintain and run your specific security assurance requirements. Learn more in this quick reference guide.

– Sara

Amazon QuickSight Update – Geospatial Visualization, Private VPC Access, and More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-quicksight-update-geospatial-visualization-private-vpc-access-and-more/

We don’t often recognize or celebrate anniversaries at AWS. With nearly 100 services on our list, we’d be eating cake and drinking champagne several times a week. While that might sound like fun, we’d rather spend our working hours listening to customers and innovating. With that said, Amazon QuickSight has now been generally available for a little over a year and I would like to give you a quick update!

QuickSight in Action
Today, tens of thousands of customers (from startups to enterprises, in industries as varied as transportation, legal, mining, and healthcare) are using QuickSight to analyze and report on their business data.

Here are a couple of examples:

Gemini provides legal evidence procurement for California attorneys who represent injured workers. They have gone from creating custom reports and running one-off queries to creating and sharing dynamic QuickSight dashboards with drill-downs and filtering. QuickSight is used to track sales pipeline, measure order throughput, and to locate bottlenecks in the order processing pipeline.

Jivochat provides a real-time messaging platform to connect visitors to website owners. QuickSight lets them create and share interactive dashboards while also providing access to the underlying datasets. This has allowed them to move beyond the sharing of static spreadsheets, ensuring that everyone is looking at the same and is empowered to make timely decisions based on current data.

Transfix is a tech-powered freight marketplace that matches loads and increases visibility into logistics for Fortune 500 shippers in retail, food and beverage, manufacturing, and other industries. QuickSight has made analytics accessible to both BI engineers and non-technical business users. They scrutinize key business and operational metrics including shipping routes, carrier efficient, and process automation.

Looking Back / Looking Ahead
The feedback on QuickSight has been incredibly helpful. Customers tell us that their employees are using QuickSight to connect to their data, perform analytics, and make high-velocity, data-driven decisions, all without setting up or running their own BI infrastructure. We love all of the feedback that we get, and use it to drive our roadmap, leading to the introduction of over 40 new features in just a year. Here’s a summary:

Looking forward, we are watching an interesting trend develop within our customer base. As these customers take a close look at how they analyze and report on data, they are realizing that a serverless approach offers some tangible benefits. They use Amazon Simple Storage Service (S3) as a data lake and query it using a combination of QuickSight and Amazon Athena, giving them agility and flexibility without static infrastructure. They also make great use of QuickSight’s dashboards feature, monitoring business results and operational metrics, then sharing their insights with hundreds of users. You can read Building a Serverless Analytics Solution for Cleaner Cities and review Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight if you are interested in this approach.

New Features and Enhancements
We’re still doing our best to listen and to learn, and to make sure that QuickSight continues to meet your needs. I’m happy to announce that we are making seven big additions today:

Geospatial Visualization – You can now create geospatial visuals on geographical data sets.

Private VPC Access – You can now sign up to access a preview of a new feature that allows you to securely connect to data within VPCs or on-premises, without the need for public endpoints.

Flat Table Support – In addition to pivot tables, you can now use flat tables for tabular reporting. To learn more, read about Using Tabular Reports.

Calculated SPICE Fields – You can now perform run-time calculations on SPICE data as part of your analysis. Read Adding a Calculated Field to an Analysis for more information.

Wide Table Support – You can now use tables with up to 1000 columns.

Other Buckets – You can summarize the long tail of high-cardinality data into buckets, as described in Working with Visual Types in Amazon QuickSight.

HIPAA Compliance – You can now run HIPAA-compliant workloads on QuickSight.

Geospatial Visualization
Everyone seems to want this feature! You can now take data that contains a geographic identifier (country, city, state, or zip code) and create beautiful visualizations with just a few clicks. QuickSight will geocode the identifier that you supply, and can also accept lat/long map coordinates. You can use this feature to visualize sales by state, map stores to shipping destinations, and so forth. Here’s a sample visualization:

To learn more about this feature, read Using Geospatial Charts (Maps), and Adding Geospatial Data.

Private VPC Access Preview
If you have data in AWS (perhaps in Amazon Redshift, Amazon Relational Database Service (RDS), or on EC2) or on-premises in Teradata or SQL Server on servers without public connectivity, this feature is for you. Private VPC Access for QuickSight uses an Elastic Network Interface (ENI) for secure, private communication with data sources in a VPC. It also allows you to use AWS Direct Connect to create a secure, private link with your on-premises resources. Here’s what it looks like:

If you are ready to join the preview, you can sign up today.

Jeff;

 

Amazon ElastiCache for Redis Is Now a HIPAA Eligible Service and You Can Use It to Power Real-Time Healthcare Applications

Post Syndicated from Manan Goel original https://aws.amazon.com/blogs/security/now-you-can-use-amazon-elasticache-for-redis-a-hipaa-eligible-service-to-power-real-time-healthcare-applications/

HIPAA image

Amazon ElastiCache for Redis is now a HIPAA Eligible Service and has been added to the AWS Business Associate Addendum (BAA). This means you can use ElastiCache for Redis to help you power healthcare applications as well as process, maintain, and store protected health information (PHI). ElastiCache for Redis is a Redis-compatible, fully-managed, in-memory data store and cache in the cloud that provides sub-millisecond latency to power applications. Now you can use the speed, simplicity, and flexibility of ElastiCache for Redis to build secure, fast, and internet-scale healthcare applications.

ElastiCache for Redis with HIPAA eligibility is available for all current-generation instance node types and requires Redis engine version 3.2.6. You must ensure that nodes are configured to encrypt the data in transit and at rest, and to authenticate Redis commands before the engine executes them. See Architecting for HIPAA Security and Compliance on Amazon Web Services for information about how to configure Amazon HIPAA Eligible Services to store, process, and transmit PHI.

ElastiCache for Redis uses Advanced Encryption Standard (AES)-512 symmetric keys to encrypt data on disk. The Redis backups stored in Amazon S3 are encrypted with server-side encryption (SSE) using AES-256 symmetric keys. ElastiCache for Redis uses Transport Layer Security (TLS) to encrypt data in transit. It uses the Redis AUTH token that you provide at the time of Redis cluster creation to authenticate the Redis commands coming from clients. The AUTH token is encrypted using AWS Key Management Service.

There is no additional charge for using ElastiCache for Redis clusters with HIPAA eligibility. To get started, see HIPAA Compliance for Amazon ElastiCache for Redis.

– Manan

New – AWS Direct Connect Gateway – Inter-Region VPC Access

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-direct-connect-gateway-inter-region-vpc-access/

As I was preparing to write this post, I took a nostalgic look at the blog post I wrote when we launched AWS Direct Connect back in 2012. We created Direct Connect after our enterprise customers asked us to allow them to establish dedicated connections to an AWS Region in pursuit of enhanced privacy, additional data transfer bandwidth, and more predictable data transfer performance. Starting from one AWS Region and a single colo, Direct Connect is now available in every public AWS Region and accessible from dozens of colos scattered across the world (over 60 locations at last count). Our customers have taken to Direct Connect wholeheartedly and we have added features such as Link Aggregation, Amazon EFS support, CloudWatch monitoring, and HIPAA eligibility. In the past five weeks alone we have added Direct Connect locations in Houston (Texas), Vancouver (Canada), Manchester (UK), Canberra (Australia), and Perth (Australia).

Today we are making Direct Connect simpler and more powerful with the addition of the Direct Connect Gateway. We are also giving Direct Connect customers in any Region the ability to create public virtual interfaces that receive our global IP routes and enable access to the public endpoints for our services and updating the Direct Connect pricing model.

Let’s take a look at each one!

New Direct Connect Gateway
You can use the new Direct Connect Gateway to establish connectivity that spans Virtual Private Clouds (VPCs) spread across multiple AWS Regions. You no longer need to establish multiple BGP sessions for each VPC; this reduces your administrative workload as well as the load on your network devices.

This feature also allows you to connect to any of the participating VPCs from any Direct Connect location, further reducing your costs for making using AWS services on a cross-region basis.

Here is a diagram that illustrates the simplification that you can achieve with a Direct Connect Gateway (each “lock” icon represents a Virtual Private Gateway). Start with this:

And end up like this:

The VPCs that reference a particular Direct Connect Gateway must have IP address ranges that do not overlap. Today, the VPCs must all be in the same AWS account; we plan to make this more flexible in the future.

Each Gateway is a global object that exists across all of the public AWS Regions. All communication between the Regions via the Gateways takes place across the AWS network backbone.

Creating a Direct Connect Gateway
You can create a Direct Connect Gateway from the Direct Connect Console or by calling the CreateDirectConnectGateway function from your code. I’ll use the Console!

I open the Direct Connect Console and click on Direct Connect Gateways to get started:

The list is empty since I don’t have any Gateways yet. Click on Create Direct Connect Gateway to change that:

I give my Gateway a name, enter a private ASN for my network, then click on Create. The ASN (Autonomous System Number) must be in one of the ranges defined as private in RFC 6996:

My new Gateway will appear in the other AWS Regions within a moment or two:

I have a Direct Connect Connection in Ohio that I will use to create my VIF:

Now I create a private VIF that references the Gateway and the Connection:

It is ready to use within seconds:

I already have a pair of VPCs with non-overlapping CIDRs, and a Virtual Private Gateway attached to each one. Here are the VPCs (since this is a demo I’ll show both in the same Region for convenience):

And the Virtual Private Gateways:

I return to the Direct Connect Console and navigate to the Direct Connect Gateways. I select my Gateway and choose Associate Virtual Private Gateway from the Actions menu:

Then I select both of my Virtual Private Gateways and click on Associate:

If, as would usually be the case, my VPCs are in distinct AWS Regions, the same procedure would apply. For this blog post it was easier to show you the operations once rather than twice.

The Virtual Gateway association is complete within a minute or so (the state starts out as associating):

When the state transitions to associated, traffic can flow between your on-premises network and your VPCs, over your AWS Direct Connect connection, regardless of the AWS Regions where your VPCs reside.

Public Virtual Interfaces for Service Endpoints
You can now create Public Virtual Interfaces that will allow you to access AWS public service endpoints for AWS services running in any AWS Region (except AWS China Region) over Direct Connect. These interfaces receive (via BGP) Amazon’s global IP routes. You can create these interfaces in the Direct Connect Console; start by selecting the Public option:

After you create it you will need to associate it with a VPC.

Updated Pricing Model
In light of the ever-expanding number of AWS Regions and AWS Direct Connect locations, data transfer pricing is now based on the location of the Direct Connect and the source AWS Region. The new pricing is simpler that the older model which was based on AWS Direct Connect locations.

Now Available
This new feature is available today and you can start to use it right now. You can create and use Direct Connect Gateways at no charge; you pay the usual Direct Connect prices for port hours and data transfer.

Jeff;

 

98, 99, 100 CloudFront Points of Presence!

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/98-99-100-cloudfront-points-of-presence/

Nine years ago I showed you how you could Distribute Your Content with Amazon CloudFront. We launched CloudFront in 2008 with 14 Points of Presence and have been expanding rapidly ever since. Today I am pleased to announce the opening of our 100th Point of Presence, the fifth one in Tokyo and the sixth in Japan. With 89 Edge Locations and 11 Regional Edge Caches, CloudFront now supports traffic generated by millions of viewers around the world.

23 Countries, 50 Cities, and Growing
Those 100 Points of Presence span the globe, with sites in 50 cities and 23 countries. In the past 12 months we have expanded the size of our network by about 58%, adding 37 Points of Presence, including nine in the following new cities:

  • Berlin, Germany
  • Minneapolis, Minnesota, USA
  • Prague, Czech Republic
  • Boston, Massachusetts, USA
  • Munich, Germany
  • Vienna, Austria
  • Kuala Lumpur, Malaysia
  • Philadelphia, Pennsylvania, USA
  • Zurich, Switzerland

We have even more in the works, including an Edge Location in the United Arab Emirates, currently planned for the first quarter of 2018.

Innovating for Our Customers
As I mentioned earlier, our network consists of a mix of Edge Locations and Regional Edge Caches. First announced at re:Invent 2016, the Regional Edge Caches sit between our Edge Locations and your origin servers, have even more memory than the Edge Locations, and allow us to store content close to the viewers for rapid delivery, all while reducing the load on the origin servers.

While locations are important, they are just a starting point. We continue to focus on security with the recent launch of our Security Policies feature and our announcement that CloudFront is a HIPAA-eligible service. We gave you more content-serving and content-generation options with the launch of [email protected], letting you run AWS Lambda functions close to your users.

We have also been working to accelerate the processing of cache invalidations and configuration changes. We now accept invalidations within milliseconds of the request and confirm that the request has been processed world-wide, typically within 60 seconds. This helps to ensure that your customers have access to fresh, timely content!

Visit our Getting Started with Amazon CloudFront page for sign-up information, tutorials, webinars, on-demand videos, office hours, and more.

Jeff;

 

AWS HIPAA Eligibility Update (October 2017) – Sixteen Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-post-update-october-2017-sixteen-additional-services/

Our Health Customer Stories page lists just a few of the many customers that are building and running healthcare and life sciences applications that run on AWS. Customers like Verge Health, Care Cloud, and Orion Health trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.

Sixteen More Services
In my last HIPAA Eligibility Update I shared the news that we added eight additional services to our list of HIPAA eligible services. Today I am happy to let you know that we have added another sixteen services to the list, bringing the total up to 46. Here are the newest additions, along with some short descriptions and links to some of my blog posts to jog your memory:

Amazon Aurora with PostgreSQL Compatibility – This brand-new addition to Amazon Aurora allows you to encrypt your relational databases using keys that you create and manage through AWS Key Management Service (KMS). When you enable encryption for an Amazon Aurora database, the underlying storage is encrypted, as are automated backups, read replicas, and snapshots. Read New – Encryption at Rest for Amazon Aurora to learn more.

Amazon CloudWatch Logs – You can use the logs to monitor and troubleshoot your systems and applications. You can monitor your existing system, application, and custom log files in near real-time, watching for specific phrases, values, or patterns. Log data can be stored durably and at low cost, for as long as needed. To learn more, read Store and Monitor OS & Application Log Files with Amazon CloudWatch and Improvements to CloudWatch Logs and Dashboards.

Amazon Connect – This self-service, cloud-based contact center makes it easy for you to deliver better customer service at a lower cost. You can use the visual designer to set up your contact flows, manage agents, and track performance, all without specialized skills. Read Amazon Connect – Customer Contact Center in the Cloud and New – Amazon Connect and Amazon Lex Integration to learn more.

Amazon ElastiCache for Redis – This service lets you deploy, operate, and scale an in-memory data store or cache that you can use to improve the performance of your applications. Each ElastiCache for Redis cluster publishes key performance metrics to Amazon CloudWatch. To learn more, read Caching in the Cloud with Amazon ElastiCache and Amazon ElastiCache – Now With a Dash of Redis.

Amazon Kinesis Streams – This service allows you to build applications that process or analyze streaming data such as website clickstreams, financial transactions, social media feeds, and location-tracking events. To learn more, read Amazon Kinesis – Real-Time Processing of Streaming Big Data and New: Server-Side Encryption for Amazon Kinesis Streams.

Amazon RDS for MariaDB – This service lets you set up scalable, managed MariaDB instances in minutes, and offers high performance, high availability, and a simplified security model that makes it easy for you to encrypt data at rest and in transit. Read Amazon RDS Update – MariaDB is Now Available to learn more.

Amazon RDS SQL Server – This service lets you set up scalable, managed Microsoft SQL Server instances in minutes, and also offers high performance, high availability, and a simplified security model. To learn more, read Amazon RDS for SQL Server and .NET support for AWS Elastic Beanstalk and Amazon RDS for Microsoft SQL Server – Transparent Data Encryption (TDE) to learn more.

Amazon Route 53 – This is a highly available Domain Name Server. It translates names like www.example.com into IP addresses. To learn more, read Moving Ahead with Amazon Route 53.

AWS Batch – This service lets you run large-scale batch computing jobs on AWS. You don’t need to install or maintain specialized batch software or build your own server clusters. Read AWS Batch – Run Batch Computing Jobs on AWS to learn more.

AWS CloudHSM – A cloud-based Hardware Security Module (HSM) for key storage and management at cloud scale. Designed for sensitive workloads, CloudHSM lets you manage your own keys using FIPS 140-2 Level 3 validated HSMs. To learn more, read AWS CloudHSM – Secure Key Storage and Cryptographic Operations and AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads.

AWS Key Management Service – This service makes it easy for you to create and control the encryption keys used to encrypt your data. It uses HSMs to protect your keys, and is integrated with AWS CloudTrail in order to provide you with a log of all key usage. Read New AWS Key Management Service (KMS) to learn more.

AWS Lambda – This service lets you run event-driven application or backend code without thinking about or managing servers. To learn more, read AWS Lambda – Run Code in the Cloud, AWS Lambda – A Look Back at 2016, and AWS Lambda – In Full Production with New Features for Mobile Devs.

[email protected] – You can use this new feature of AWS Lambda to run Node.js functions across the global network of AWS locations without having to provision or manager servers, in order to deliver rich, personalized content to your users with low latency. Read [email protected] – Intelligent Processing of HTTP Requests at the Edge to learn more.

AWS Snowball Edge – This is a data transfer device with 100 terabytes of on-board storage as well as compute capabilities. You can use it to move large amounts of data into or out of AWS, as a temporary storage tier, or to support workloads in remote or offline locations. To learn more, read AWS Snowball Edge – More Storage, Local Endpoints, Lambda Functions.

AWS Snowmobile – This is an exabyte-scale data transfer service. Pulled by a semi-trailer truck, each Snowmobile packs 100 petabytes of storage into a ruggedized 45-foot long shipping container. Read AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks to learn more (and to see some of my finest LEGO work).

AWS Storage Gateway – This hybrid storage service lets your on-premises applications use AWS cloud storage (Amazon Simple Storage Service (S3), Amazon Glacier, and Amazon Elastic File System) in a simple and seamless way, with storage for volumes, files, and virtual tapes. To learn more, read The AWS Storage Gateway – Integrate Your Existing On-Premises Applications with AWS Cloud Storage and File Interface to AWS Storage Gateway.

And there you go! Check out my earlier post for a list of resources that will help you to build applications that comply with HIPAA and HITECH.

Jeff;

 

Catching Up on Some Recent AWS Launches and Publications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/catching-up-on-some-recent-aws-launches-and-publications/

As I have noted in the past, the AWS Blog Team is working hard to make sure that you know about as many AWS launches and publications as possible, without totally burying you in content! As part of our balancing act, we will occasionally publish catch-up posts to clear our queues and to bring more information to your attention. Here’s what I have in store for you today:

  • Monitoring for Cross-Region Replication of S3 Objects
  • Tags for Spot Fleet Instances
  • PCI DSS Compliance for 12 More Services
  • HIPAA Eligibility for WorkDocs
  • VPC Resizing
  • AppStream 2.0 Graphics Design Instances
  • AMS Connector App for ServiceNow
  • Regtech in the Cloud
  • New & Revised Quick Starts

Let’s jump right in!

Monitoring for Cross-Region Replication of S3 Objects
I told you about cross-region replication for S3 a couple of years ago. As I showed you at the time, you simply enable versioning for the source bucket and then choose a destination region and bucket. You can check the replication status manually, or you can create an inventory (daily or weekly) of the source and destination buckets.

The Cross-Region Replication Monitor (CRR Monitor for short) solution checks the replication status of objects across regions and gives you metrics and failure notifications in near real-time.

To learn more, read the CRR Monitor Implementation Guide and then use the AWS CloudFormation template to Deploy the CRR Monitor.

Tags for Spot Instances
Spot Instances and Spot Fleets (collections of Spot Instances) give you access to spare compute capacity. We recently gave you the ability to enter tags (key/value pairs) as part of your spot requests and to have those tags applied to the EC2 instances launched to fulfill the request:

To learn more, read Tag Your Spot Fleet EC2 Instances.

PCI DSS Compliance for 12 More Services
As first announced on the AWS Security Blog, we recently added 12 more services to our PCI DSS compliance program, raising the total number of in-scope services to 42. To learn more, check out our Compliance Resources.

HIPAA Eligibility for WorkDocs
In other compliance news, we announced that Amazon WorkDocs has achieved HIPAA eligibility and PCI DSS compliance in all AWS Regions where WorkDocs is available.

VPC Resizing
This feature allows you to extend an existing Virtual Private Cloud (VPC) by adding additional blocks of addresses. This gives you more flexibility and should help you to deal with growth. You can add up to four secondary /16 CIDRs per VPC. You can also edit the secondary CIDRs by deleting them and adding new ones. Simply select the VPC and choose Edit CIDRs from the menu:

Then add or remove CIDR blocks as desired:

To learn more, read about VPCs and Subnets.

AppStream 2.0 Graphics Design Instances
Powered by AMD FirePro S7150x2 Server GPUs and equipped with AMD Multiuser GPU technology, the new Graphics Design instances for Amazon AppStream 2.0 will let you run and stream graphics applications more cost-effectively than ever. The instances are available in four sizes, with 2-16 vCPUs and 7.5 GB to 61 GB of memory.

To learn more, read Introducing Amazon AppStream 2.0 Graphics Design, a New Lower Costs Instance Type for Streaming Graphics Applications.

AMS Connector App for ServiceNow
AWS Managed Services (AMS) provides Infrastructure Operations Management for the Enterprise. Designed to accelerate cloud adoption, it automates common operations such as change requests, patch management, security and backup.

The new AMS integration App for ServiceNow lets you interact with AMS from within ServiceNow, with no need for any custom development or API integration.

To learn more, read Cloud Management Made Easier: AWS Managed Services Now Integrates with ServiceNow.

Regtech in the Cloud
Regtech (as I learned while writing this), is short for regulatory technology, and is all about using innovative technology such as cloud computing, analytics, and machine learning to address regulatory challenges.

Working together with APN Consulting Partner Cognizant, TABB Group recently published a thought leadership paper that explains why regulations and compliance pose huge challenges for our customers in the financial services, and shows how AWS can help!

New & Revised Quick Starts
Our Quick Starts team has been cranking out new solutions and making significant updates to the existing ones. Here’s a roster:

Alfresco Content Services (v2) Atlassian Confluence Confluent Platform Data Lake
Datastax Enterprise GitHub Enterprise Hashicorp Nomad HIPAA
Hybrid Data Lake with Wandisco Fusion IBM MQ IBM Spectrum Scale Informatica EIC
Magento (v2) Linux Bastion (v2) Modern Data Warehouse with Tableau MongoDB (v2)
NetApp ONTAP NGINX (v2) RD Gateway Red Hat Openshift
SAS Grid SIOS Datakeeper StorReduce SQL Server (v2)

And that’s all I have for today!

Jeff;

AWS HIPAA Eligibility Update (July 2017) – Eight Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-update-july-2017-eight-additional-services/

It is time for an update on our on-going effort to make AWS a great host for healthcare and life sciences applications. As you can see from our Health Customer Stories page, Philips, VergeHealth, and Cambia (to choose a few) trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.

In May we announced that we added Amazon API Gateway, AWS Direct Connect, AWS Database Migration Service, and Amazon Simple Queue Service (SQS) to our list of HIPAA eligible services and discussed our how customers and partners are putting them to use.

Eight More Eligible Services
Today I am happy to share the news that we are adding another eight services to the list:

Amazon CloudFront can now be utilized to enhance the delivery and transfer of Protected Health Information data to applications on the Internet. By providing a completely secure and encryptable pathway, CloudFront can now be used as a part of applications that need to cache PHI. This includes applications for viewing lab results or imaging data, and those that transfer PHI from Healthcare Information Exchanges (HIEs).

AWS WAF can now be used to protect applications running on AWS which operate on PHI such as patient care portals, patient scheduling systems, and HIEs. Requests and responses containing encrypted PHI and PII can now pass through AWS WAF.

AWS Shield can now be used to protect web applications such as patient care portals and scheduling systems that operate on encrypted PHI from DDoS attacks.

Amazon S3 Transfer Acceleration can now be used to accelerate the bulk transfer of large amounts of research, genetics, informatics, insurance, or payer/payment data containing PHI/PII information. Transfers can take place between a pair of AWS Regions or from an on-premises system and an AWS Region.

Amazon WorkSpaces can now be used by researchers, informaticists, hospital administrators and other users to analyze, visualize or process PHI/PII data using on-demand Windows virtual desktops.

AWS Directory Service can now be used to connect the authentication and authorization systems of organizations that use or process PHI/PII to their resources in the AWS Cloud. For example, healthcare providers operating hybrid cloud environments can now use AWS Directory Services to allow their users to easily transition between cloud and on-premises resources.

Amazon Simple Notification Service (SNS) can now be used to send notifications containing encrypted PHI/PII as part of patient care, payment processing, and mobile applications.

Amazon Cognito can now be used to authenticate users into mobile patient portal and payment processing applications that use PHI/PII identifiers for accounts.

Additional HIPAA Resources
Here are some additional resources that will help you to build applications that comply with HIPAA and HITECH:

Keep in Touch
In order to make use of any AWS service in any manner that involves PHI, you must first enter into an AWS Business Associate Addendum (BAA). You can contact us to start the process.

Jeff;

Analyze OpenFDA Data in R with Amazon S3 and Amazon Athena

Post Syndicated from Ryan Hood original https://aws.amazon.com/blogs/big-data/analyze-openfda-data-in-r-with-amazon-s3-and-amazon-athena/

One of the great benefits of Amazon S3 is the ability to host, share, or consume public data sets. This provides transparency into data to which an external data scientist or developer might not normally have access. By exposing the data to the public, you can glean many insights that would have been difficult with a data silo.

The openFDA project creates easy access to the high value, high priority, and public access data of the Food and Drug Administration (FDA). The data has been formatted and documented in consumer-friendly standards. Critical data related to drugs, devices, and food has been harmonized and can easily be called by application developers and researchers via API calls. OpenFDA has published two whitepapers that drill into the technical underpinnings of the API infrastructure as well as how to properly analyze the data in R. In addition, FDA makes openFDA data available on S3 in raw format.

In this post, I show how to use S3, Amazon EMR, and Amazon Athena to analyze the drug adverse events dataset. A drug adverse event is an undesirable experience associated with the use of a drug, including serious drug side effects, product use errors, product quality programs, and therapeutic failures.

Data considerations

Keep in mind that this data does have limitations. In addition, in the United States, these adverse events are submitted to the FDA voluntarily from consumers so there may not be reports for all events that occurred. There is no certainty that the reported event was actually due to the product. The FDA does not require that a causal relationship between a product and event be proven, and reports do not always contain the detail necessary to evaluate an event. Because of this, there is no way to identify the true number of events. The important takeaway to all this is that the information contained in this data has not been verified to produce cause and effect relationships. Despite this disclaimer, many interesting insights and value can be derived from the data to accelerate drug safety research.

Data analysis using SQL

For application developers who want to perform targeted searching and lookups, the API endpoints provided by the openFDA project are “ready to go” for software integration using a standard API powered by Elasticsearch, NodeJS, and Docker. However, for data analysis purposes, it is often easier to work with the data using SQL and statistical packages that expect a SQL table structure. For large-scale analysis, APIs often have query limits, such as 5000 records per query. This can cause extra work for data scientists who want to analyze the full dataset instead of small subsets of data.

To address the concern of requiring all the data in a single dataset, the openFDA project released the full 100 GB of harmonized data files that back the openFDA project onto S3. Athena is an interactive query service that makes it easy to analyze data in S3 using standard SQL. It’s a quick and easy way to answer your questions about adverse events and aspirin that does not require you to spin up databases or servers.

While you could point tools directly at the openFDA S3 files, you can find greatly improved performance and use of the data by following some of the preparation steps later in this post.

Architecture

This post explains how to use the following architecture to take the raw data provided by openFDA, leverage several AWS services, and derive meaning from the underlying data.

Steps:

  1. Load the openFDA /drug/event dataset into Spark and convert it to gzip to allow for streaming.
  2. Transform the data in Spark and save the results as a Parquet file in S3.
  3. Query the S3 Parquet file with Athena.
  4. Perform visualization and analysis of the data in R and Python on Amazon EC2.

Optimizing public data sets: A primer on data preparation

Those who want to jump right into preparing the files for Athena may want to skip ahead to the next section.

Transforming, or pre-processing, files is a common task for using many public data sets. Before you jump into the specific steps for transforming the openFDA data files into a format optimized for Athena, I thought it would be worthwhile to provide a quick exploration on the problem.

Making a dataset in S3 efficiently accessible with minimal transformation for the end user has two key elements:

  1. Partitioning the data into objects that contain a complete part of the data (such as data created within a specific month).
  2. Using file formats that make it easy for applications to locate subsets of data (for example, gzip, Parquet, ORC, etc.).

With these two key elements in mind, you can now apply transformations to the openFDA adverse event data to prepare it for Athena. You might find the data techniques employed in this post to be applicable to many of the questions you might want to ask of the public data sets stored in Amazon S3.

Before you get started, I encourage those who are interested in doing deeper healthcare analysis on AWS to make sure that you first read the AWS HIPAA Compliance whitepaper. This covers the information necessary for processing and storing patient health information (PHI).

Also, the adverse event analysis shown for aspirin is strictly for demonstration purposes and should not be used for any real decision or taken as anything other than a demonstration of AWS capabilities. However, there have been robust case studies published that have explored a causal relationship between aspirin and adverse reactions using OpenFDA data. If you are seeking research on aspirin or its risks, visit organizations such as the Centers for Disease Control and Prevention (CDC) or the Institute of Medicine (IOM).

Preparing data for Athena

For this walkthrough, you will start with the FDA adverse events dataset, which is stored as JSON files within zip archives on S3. You then convert it to Parquet for analysis. Why do you need to convert it? The original data download is stored in objects that are partitioned by quarter.

Here is a small sample of what you find in the adverse events (/drugs/event) section of the openFDA website.

If you were looking for events that happened in a specific quarter, this is not a bad solution. For most other scenarios, such as looking across the full history of aspirin events, it requires you to access a lot of data that you won’t need. The zip file format is not ideal for using data in place because zip readers must have random access to the file, which means the data can’t be streamed. Additionally, the zip files contain large JSON objects.

To read the data in these JSON files, a streaming JSON decoder must be used or a computer with a significant amount of RAM must decode the JSON. Opening up these files for public consumption is a great start. However, you still prepare the data with a few lines of Spark code so that the JSON can be streamed.

Step 1:  Convert the file types

Using Apache Spark on EMR, you can extract all of the zip files and pull out the events from the JSON files. To do this, use the Scala code below to deflate the zip file and create a text file. In addition, compress the JSON files with gzip to improve Spark’s performance and reduce your overall storage footprint. The Scala code can be run in either the Spark Shell or in an Apache Zeppelin notebook on your EMR cluster.

If you are unfamiliar with either Apache Zeppelin or the Spark Shell, the following posts serve as great references:

 

import scala.io.Source
import java.util.zip.ZipInputStream
import org.apache.spark.input.PortableDataStream
import org.apache.hadoop.io.compress.GzipCodec

// Input Directory
val inputFile = "s3://download.open.fda.gov/drug/event/2015q4/*.json.zip";

// Output Directory
val outputDir = "s3://{YOUR OUTPUT BUCKET HERE}/output/2015q4/";

// Extract zip files from 
val zipFiles = sc.binaryFiles(inputFile);

// Process zip file to extract the json as text file and save it
// in the output directory 
val rdd = zipFiles.flatMap((file: (String, PortableDataStream)) => {
    val zipStream = new ZipInputStream(file.2.open)
    val entry = zipStream.getNextEntry
    val iter = Source.fromInputStream(zipStream).getLines
    iter
}).map(.replaceAll("\s+","")).saveAsTextFile(outputDir, classOf[GzipCodec])

Step 2:  Transform JSON into Parquet

With just a few more lines of Scala code, you can use Spark’s abstractions to convert the JSON into a Spark DataFrame and then export the data back to S3 in Parquet format.

Spark requires the JSON to be in JSON Lines format to be parsed correctly into a DataFrame.

// Output Parquet directory
val outputDir = "s3://{YOUR OUTPUT BUCKET NAME}/output/drugevents"
// Input json file
val inputJson = "s3://{YOUR OUTPUT BUCKET NAME}/output/2015q4/*”
// Load dataframe from json file multiline 
val df = spark.read.json(sc.wholeTextFiles(inputJson).values)
// Extract results from dataframe
val results = df.select("results")
// Save it to Parquet
results.write.parquet(outputDir)

Step 3:  Create an Athena table

With the data cleanly prepared and stored in S3 using the Parquet format, you can now place an Athena table on top of it to get a better understanding of the underlying data.

Because the openFDA data structure incorporates several layers of nesting, it can be a complex process to try to manually derive the underlying schema in a Hive-compatible format. To shorten this process, you can load the top row of the DataFrame from the previous step into a Hive table within Zeppelin and then extract the “create  table” statement from SparkSQL.

results.createOrReplaceTempView("data")

val top1 = spark.sql("select * from data tablesample(1 rows)")

top1.write.format("parquet").mode("overwrite").saveAsTable("drugevents")

val show_cmd = spark.sql("show create table drugevents”).show(1, false)

This returns a “create table” statement that you can almost paste directly into the Athena console. Make some small modifications (adding the word “external” and replacing “using with “stored as”), and then execute the code in the Athena query editor. The table is created.

For the openFDA data, the DDL returns all string fields, as the date format used in your dataset does not conform to the yyy-mm-dd hh:mm:ss[.f…] format required by Hive. For your analysis, the string format works appropriately but it would be possible to extend this code to use a Presto function to convert the strings into time stamps.

CREATE EXTERNAL TABLE  drugevents (
   companynumb  string, 
   safetyreportid  string, 
   safetyreportversion  string, 
   receiptdate  string, 
   patientagegroup  string, 
   patientdeathdate  string, 
   patientsex  string, 
   patientweight  string, 
   serious  string, 
   seriousnesscongenitalanomali  string, 
   seriousnessdeath  string, 
   seriousnessdisabling  string, 
   seriousnesshospitalization  string, 
   seriousnesslifethreatening  string, 
   seriousnessother  string, 
   actiondrug  string, 
   activesubstancename  string, 
   drugadditional  string, 
   drugadministrationroute  string, 
   drugcharacterization  string, 
   drugindication  string, 
   drugauthorizationnumb  string, 
   medicinalproduct  string, 
   drugdosageform  string, 
   drugdosagetext  string, 
   reactionoutcome  string, 
   reactionmeddrapt  string, 
   reactionmeddraversionpt  string)
STORED AS parquet
LOCATION
  's3://{YOUR TARGET BUCKET}/output/drugevents'

With the Athena table in place, you can start to explore the data by running ad hoc queries within Athena or doing more advanced statistical analysis in R.

Using SQL and R to analyze adverse events

Using the openFDA data with Athena makes it very easy to translate your questions into SQL code and perform quick analysis on the data. After you have prepared the data for Athena, you can begin to explore the relationship between aspirin and adverse drug events, as an example. One of the most common metrics to measure adverse drug events is the Proportional Reporting Ratio (PRR). It is defined as:

PRR = (m/n)/( (M-m)/(N-n) )
Where
m = #reports with drug and event
n = #reports with drug
M = #reports with event in database
N = #reports in database

Gastrointestinal haemorrhage has the highest PRR of any reaction to aspirin when viewed in aggregate. One question you may want to ask is how the PRR has trended on a yearly basis for gastrointestinal haemorrhage since 2005.

Using the following query in Athena, you can see the PRR trend of “GASTROINTESTINAL HAEMORRHAGE” reactions with “ASPIRIN” since 2005:

with drug_and_event as 
(select rpad(receiptdate, 4, 'NA') as receipt_year
    , reactionmeddrapt
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug_and_event 
from fda.drugevents
where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
group by reactionmeddrapt, rpad(receiptdate, 4, 'NA') 
), reports_with_drug as 
(
select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_drug 
 from fda.drugevents 
 where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and medicinalproduct = 'ASPIRIN'
group by rpad(receiptdate, 4, 'NA') 
), reports_with_event as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as reports_with_event 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
     and reactionmeddrapt= 'GASTROINTESTINAL HAEMORRHAGE'
   group by rpad(receiptdate, 4, 'NA')
), total_reports as 
(
   select rpad(receiptdate, 4, 'NA') as receipt_year
    , count(distinct (concat(safetyreportid,receiptdate,reactionmeddrapt))) as total_reports 
   from fda.drugevents
   where rpad(receiptdate,4,'NA') 
     between '2005' and '2015' 
   group by rpad(receiptdate, 4, 'NA')
)
select  drug_and_event.receipt_year, 
(1.0 * drug_and_event.reports_with_drug_and_event/reports_with_drug.reports_with_drug)/ (1.0 * (reports_with_event.reports_with_event- drug_and_event.reports_with_drug_and_event)/(total_reports.total_reports-reports_with_drug.reports_with_drug)) as prr
, drug_and_event.reports_with_drug_and_event
, reports_with_drug.reports_with_drug
, reports_with_event.reports_with_event
, total_reports.total_reports
from drug_and_event
    inner join reports_with_drug on  drug_and_event.receipt_year = reports_with_drug.receipt_year   
    inner join reports_with_event on  drug_and_event.receipt_year = reports_with_event.receipt_year
    inner join total_reports on  drug_and_event.receipt_year = total_reports.receipt_year
order by  drug_and_event.receipt_year


One nice feature of Athena is that you can quickly connect to it via R or any other tool that can use a JDBC driver to visualize the data and understand it more clearly.

With this quick R script that can be run in R Studio either locally or on an EC2 instance, you can create a visualization of the PRR and Reporting Odds Ratio (RoR) for “GASTROINTESTINAL HAEMORRHAGE” reactions from “ASPIRIN” since 2005 to better understand these trends.

# connect to ATHENA
conn <- dbConnect(drv, '<Your JDBC URL>',s3_staging_dir="<Your S3 Location>",user=Sys.getenv(c("USER_NAME"),password=Sys.getenv(c("USER_PASSWORD"))

# Declare Adverse Event
adverseEvent <- "'GASTROINTESTINAL HAEMORRHAGE'"

# Build SQL Blocks
sqlFirst <- "SELECT rpad(receiptdate, 4, 'NA') as receipt_year, count(DISTINCT safetyreportid) as event_count FROM fda.drugsflat WHERE rpad(receiptdate,4,'NA') between '2005' and '2015'"
sqlEnd <- "GROUP BY rpad(receiptdate, 4, 'NA') ORDER BY receipt_year"

# Extract Aspirin with adverse event counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN' AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
aspirinAdverseCount = dbGetQuery(conn,sql)

# Extract Aspirin counts
sql <- paste(sqlFirst,"AND medicinalproduct ='ASPIRIN'", sqlEnd,sep=" ")
aspirinCount = dbGetQuery(conn,sql)

# Extract adverse event counts
sql <- paste(sqlFirst,"AND reactionmeddrapt=",adverseEvent, sqlEnd,sep=" ")
adverseCount = dbGetQuery(conn,sql)

# All Drug Adverse event Counts
sql <- paste(sqlFirst, sqlEnd,sep=" ")
allDrugCount = dbGetQuery(conn,sql)

# Select correct rows
selAll =  allDrugCount$receipt_year == aspirinAdverseCount$receipt_year
selAspirin = aspirinCount$receipt_year == aspirinAdverseCount$receipt_year
selAdverse = adverseCount$receipt_year == aspirinAdverseCount$receipt_year

# Calculate Numbers
m <- c(aspirinAdverseCount$event_count)
n <- c(aspirinCount[selAspirin,2])
M <- c(adverseCount[selAdverse,2])
N <- c(allDrugCount[selAll,2])

# Calculate proptional reporting ratio
PRR = (m/n)/((M-m)/(N-n))

# Calculate reporting Odds Ratio
d = n-m
D = N-M
ROR = (m/d)/(M/D)

# Plot the PRR and ROR
g_range <- range(0, PRR,ROR)
g_range[2] <- g_range[2] + 3
yearLen = length(aspirinAdverseCount$receipt_year)
axis(1,1:yearLen,lab=ax)
plot(PRR, type="o", col="blue", ylim=g_range,axes=FALSE, ann=FALSE)
axis(1,1:yearLen,lab=ax)
axis(2, las=1, at=1*0:g_range[2])
box()
lines(ROR, type="o", pch=22, lty=2, col="red")

As you can see, the PRR and RoR have both remained fairly steady over this time range. With the R Script above, all you need to do is change the adverseEvent variable from GASTROINTESTINAL HAEMORRHAGE to another type of reaction to analyze and compare those trends.

Summary

In this walkthrough:

  • You used a Scala script on EMR to convert the openFDA zip files to gzip.
  • You then transformed the JSON blobs into flattened Parquet files using Spark on EMR.
  • You created an Athena DDL so that you could query these Parquet files residing in S3.
  • Finally, you pointed the R package at the Athena table to analyze the data without pulling it into a database or creating your own servers.

If you have questions or suggestions, please comment below.


Next Steps

Take your skills to the next level. Learn how to optimize Amazon S3 for an architecture commonly used to enable genomic data analysis. Also, be sure to read more about running R on Amazon Athena.

 

 

 

 

 


About the Authors

Ryan Hood is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys watching the Cubs win the World Series and attempting to Sous-vide anything he can find in his refrigerator.

 

 

Vikram Anand is a Data Engineer for AWS. He works on big data projects leveraging the newest AWS offerings. In his spare time, he enjoys playing soccer and watching the NFL & European Soccer leagues.

 

 

Dave Rocamora is a Solutions Architect at Amazon Web Services on the Open Data team. Dave is based in Seattle and when he is not opening data, he enjoys biking and drinking coffee outside.

 

 

 

 

Introducing the Self-Service Business Associate Addendum

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/introducing-the-self-service-business-associate-addendum/

HIPAA logo

Today, we made available a new feature in AWS Artifact (our auditing and compliance portal) that enables you to review, accept, and track the status of your Business Associate Addendum (BAA). With this new feature, you can accept the terms of a BAA online, and instantly designate an AWS account as a “HIPAA Account” for use with protected health information (PHI) under the U.S. Health Insurance Portability and Accountability Act (HIPAA). In addition, you can sign in to AWS Artifact to confirm that your account is designated as a HIPAA Account, and review the terms of the BAA for that account. If you are no longer using a designated HIPAA Account in connection with PHI, you can remove that designation using the AWS Artifact interface.

Today’s release addresses two key customer needs in particular: (1) the need to enter into a BAA quickly, and (2) the need to easily track and control whether an AWS account is designated as a HIPAA Account under a BAA.

The BAA is the first specialized industry agreement that AWS is making available online. We chose to launch with the BAA as a commitment to AWS customer organizations who are reinventing the way healthcare is researched and delivered with the cloud. Many AWS customers have great stories to tell as we work together to use technology to advance the healthcare industry.

If you already have a BAA with AWS, or if you are considering designing or migrating a new solution that will create, receive, maintain, or transmit PHI on AWS, you can use AWS Artifact to manage your HIPAA Accounts today. As with all AWS Artifact features, there are no additional fees for using AWS Artifact to review, accept, and manage BAAs online.

– Chad

AWS HIPAA Program Update – Dedicated Instances and Hosts Are No Longer Required

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/aws-hipaa-program-update-dedicated-instances-and-hosts-are-no-longer-required/

Over the years, we have seen tremendous growth in the use of the AWS Cloud for healthcare applications. Our customers and AWS Partner Network (APN) Partners who offer solutions that store, process, and transmit Protected Health Information (PHI) sign a Business Associate Addendum (BAA) with AWS. As part of the AWS HIPAA compliance program, customers and APN Partners must use a set of HIPAA Eligible Services for portions of their applications that store, process, and transmit PHI.

Recently, our HIPAA compliance program announced that those AWS customers and APN Partners who have signed a BAA with AWS are no longer required to use Amazon EC2 Dedicated Instances and Dedicated Hosts to store, process, or transmit PHI. To learn more about the announcement and some architectural optimizations you should consider making, see the full APN Blog post.

–  Craig