Cesar Wedemann (QEDU) talks to Simon about how QEDU gathers education data and provides it to teachers and public schools to improve education in Brazil. They developed a free-access portal that offers easy visualization of Brazilian open education data.
The AWS Podcast is a cloud platform podcast for developers, dev ops, and cloud professionals seeking the latest news and trends in storage, security, infrastructure, serverless, and more. Join Simon Elisha and Jeff Barr for regular updates, deep dives, and interviews. Whether you’re building machine learning and AI models, open source projects, or hybrid cloud solutions, the AWS Podcast has something for you.
Like the Podcast?
Rate us on iTunes and send your suggestions, show ideas, and comments to [email protected]. We want to hear from you!
How do you train the next generation of digital leaders? How do you provide them with a modern educational experience? Can you do it without technical expertise? Hear how Ruth Black (Teaching Fellow at the Digital Academy) applied Amazon Transcribe to make this real.
Jeff Olson (VP & Chief Data Officer at College Board) talks about his experiences in fostering change from an organisational standpoint whilst moving to a microservices architecture.
Cliff Addison (University of Liverpool) joins with Will Mayers and Cristin Merritt (Alces Flight) to talk about High Performance Computing in the cloud and meeting the needs of researchers.
Continuous Diagnostics and Mitigation (CDM), a U.S. Department of Homeland Security cybersecurity program, is gaining new visibility as part of the federal government’s overall focus on securing its information and networks. How an agency performs against CDM will soon be regularly reported in the updated Federal Information Technology Acquisition Reform Act (FITARA) scorecard. That’s in addition to updates in the President’s Management Agenda. Because of this additional scrutiny, there are many questions about how cloud service providers can enable the CDM journey.
This post explains how you can implement a CDM program—or extend an existing one—within your AWS environment, and how you can use AWS capabilities to provide real-time CDM compliance and FISMA reporting.
When it comes to compliance for departments and agencies, the AWS Shared Responsibility Model describes how AWS is responsible for security of the cloud and the customer is responsible for security in the cloud. The Shared Responsibility Model is segmented into three categories: (1) infrastructure services (see Figure 1 below), (2) container services, and (3) abstracted services, each with a different level of controls that customers can inherit to minimize their effort in managing and maintaining compliance and audit requirements. For example, when designing workloads in AWS that fall under infrastructure services in the Shared Responsibility Model, AWS helps relieve your operational burden, as AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security of the facilities in which the services operate. This also relates to IT controls you can inherit through AWS compliance programs. Before their journey to the AWS Cloud, a customer may have been responsible for the entire control set in their compliance and auditing program. With AWS, you can inherit controls from AWS compliance programs, allowing you to focus on workloads and data you put in the cloud.
Figure 1: AWS Infrastructure Services
For example, if you deploy infrastructure services such as Amazon Virtual Private Cloud (Amazon VPC) networking and Amazon Elastic Compute Cloud (Amazon EC2) instances, you can base your CDM compliance controls on the AWS controls you inherit for network infrastructure, virtualization, and physical security. You would be responsible for managing things like change management for Amazon EC2 AMIs, operating system patching, AWS Config management, AWS Identity and Access Management (IAM), and encryption of services at rest and in transit.
If you deploy container services such as Amazon Relational Database Service (Amazon RDS) or Amazon EMR that build on top of infrastructure services, AWS manages the underlying infrastructure virtualization, physical controls, and the OS and associated IT controls, like patching and change management. You inherit the IT security controls from AWS compliance programs and can use them as artifacts in your compliance program. For example, you can request a SOC report for one of our 62 SOC-compliant services available from AWS Artifact. You are still responsible for the controls of what you put in the cloud.
Another example is if you deploy abstracted services such as Amazon Simple Storage Service (S3) or Amazon DynamoDB. AWS is responsible for the IT controls applicable to the services we provide. You are responsible for managing your data, classifying your assets, using IAM to apply permissions to individual resources at the platform level, or managing permissions based on user identity or user responsibility at the IAM user/group level.
For agencies struggling to comply with FISMA requirements and thus CDM reporting, leveraging abstracted and container services means you now have the ability to inherit more controls from AWS, reducing the resource drain and cost of FISMA compliance.
So, what does this all mean for your CDM program on AWS? It means that AWS can help agencies realize CDM’s vision of near-real-time FISMA reporting for infrastructure, container, and abstracted services. The following paragraphs explain how you can leverage existing AWS services and solutions that support CDM.
For AM 1.4, AWS Config identifies all cloud assets in your AWS account, which can be stored in a DynamoDB table in the reporting format required. AWS Config rules can enforce and report on compliance, ensuring all Amazon Elastic Block Store (EBS) volumes (block level storage) and Amazon S3 buckets (object storage) are encrypted.
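As an illustration, here is a minimal sketch of that pattern using the AWS SDK for Python (boto3); the table name and item schema are hypothetical and would need to match your required reporting format:

import boto3

config = boto3.client('config')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('cdm-asset-inventory')  # hypothetical reporting table

# Page through the EC2 instances that AWS Config has discovered and
# record each one in the reporting table
token = ''
while True:
    kwargs = {'resourceType': 'AWS::EC2::Instance'}
    if token:
        kwargs['nextToken'] = token
    page = config.list_discovered_resources(**kwargs)
    for resource in page['resourceIdentifiers']:
        table.put_item(Item={
            'resourceId': resource['resourceId'],
            'resourceType': resource['resourceType'],
        })
    token = page.get('nextToken', '')
    if not token:
        break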
For metric 2.9, AWS Config provides relationship information for all resources, which means you can use AWS Config relationship data to report on CIO metrics focused on the segmentation of resources. AWS Config can identify things like the network interfaces, security groups (firewalls), subnets, and VPCs related to an instance.
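For instance, here is a sketch of pulling an instance’s relationships from its latest configuration item (the instance ID is a placeholder):

import boto3

config = boto3.client('config')

# Fetch the most recent configuration item for a (placeholder) instance
history = config.get_resource_config_history(
    resourceType='AWS::EC2::Instance',
    resourceId='i-0123456789abcdef0',  # placeholder instance ID
    limit=1,
)

# Each configuration item carries a 'relationships' list covering related
# network interfaces, security groups, subnets, and VPCs
for rel in history['configurationItems'][0]['relationships']:
    print(rel['relationshipName'], rel['resourceType'], rel.get('resourceId'))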
The focus is on you to develop policies and procedures to respond to cyber “incidents.” AWS provides capabilities to automate those policies and procedures in a policy-driven manner to improve detection times, shorten response times, and reduce the attack surface. FISMA defines “incident” as “an occurrence that (A) actually or imminently jeopardizes, without lawful authority, the integrity, confidentiality, or availability of information or an information system; or (B) constitutes a violation or imminent threat of violation of law, security policies, security procedures, or acceptable use policies.”
Enabling GuardDuty in your accounts and writing finding results to a reporting database complies with CIO “Respond” metrics 4.1 and 4.2. Requirement 4.3 mandates that you “automatically disable the system or relevant asset upon the detection of a given security violation or vulnerability,” per NIST SP 800-53r4 IR-4(2). You can comply by enabling automation to remediate instances.
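One way to wire up that automation, sketched here on the assumption that GuardDuty findings are routed to a Lambda function through a CloudWatch Events rule, is to quarantine the affected instance by swapping its security groups (the quarantine group ID is a placeholder):

import boto3

ec2 = boto3.client('ec2')

QUARANTINE_SG = 'sg-0123456789abcdef0'  # placeholder: deny-all security group

def handler(event, context):
    # CloudWatch Events delivers the GuardDuty finding in event['detail']
    instance_id = event['detail']['resource']['instanceDetails']['instanceId']
    # Replacing the instance's security groups cuts it off from the network
    ec2.modify_instance_attribute(InstanceId=instance_id, Groups=[QUARANTINE_SG])
    return {'quarantined': instance_id}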
5.0 Recover
Responding and recovering are closely related processes. Enabling automation to remediate instances also complies with 5.1 and 5.2, which focus on recovering from incidents. While it is your responsibility to develop an Information System Contingency Plan (ISCP), AWS can enable you to translate that plan into automated, machine-readable policy through AWS CloudFormation, and AWS Config can report on the number of HVA systems for which an ISCP has been developed (5.1). For both 5.1 and 5.2, which measure the “mean time for the organization to restore operations following the containment of a system intrusion or compromise,” you can use AWS multi-region architectures, inter-region VPC peering, S3 cross-region replication, AWS Landing Zones, and the global AWS network (with services like AWS Direct Connect gateway) to build an architecture (5.1.1) in which HVA systems have an alternate processing site identified and provisioned, enabling global recovery in minutes.
The end result is that you can build a new CDM program or extend an existing one using AWS. We are relentlessly focused on innovating for you and providing a comprehensive platform of secure IT services to meet your unique needs. Federal customers required to comply with CDM can incorporate services like AWS Config and AWS Systems Manager to provide asset and software management for AWS and on-premises systems, answering the question, “What is on the network?”
For real time compliance and reporting, you can leverage services like GuardDuty to analyze AWS CloudTrail, DNS, and VPC flow logs in your account so you really know “who is on your network,” and “what is happening on the network.” VPC, security groups, Amazon CloudFront, and AWS WAF allow you to protect the boundary of your environment, while services like AWS Key Management Service (KMS), AWS CloudHSM, AWS Certificate Manager (ACM) and HTTPS endpoints enable you to “protect the data on the network.”
The services discussed in this post give you the ability to design a cloud architecture that can help you move from ad hoc to optimized in your CDM reporting. If you’d like more information, send us feedback, and please let us know how we can help with your CDM journey! If you have feedback about this post, submit comments in the Comments section below, or contact AWS Support.
Our teams are continuing to focus on compliance enablement around the world and now that includes a new guide for public sector customers in India. The User Guide for Government Departments and Agencies in India provides information that helps government users at various central, state, district, and municipal agencies understand security and controls available with AWS. It also explains how to implement appropriate information security, risk management, and governance programs using AWS Services, which are offered in India by Amazon Internet Services Private Limited (AISPL).
The guide focuses on the Ministry of Electronics and Information Technology (MeitY) requirements detailed in Guidelines for Government Departments for Adoption/Procurement of Cloud Services, addressing common issues that public sector customers encounter.
Our newest guide is part of a series diving into customer compliance issues across industries and jurisdictions, such as financial services guides for Singapore, Australia, and Hong Kong. We’ll be publishing additional guides this year to help you understand other regulatory requirements around the world.
We have seen a lot of discussion this past week about the role of Amazon Rekognition in facial recognition, surveillance, and civil liberties, and we wanted to share some thoughts.
Amazon Rekognition is a service we announced in 2016. It makes use of new technologies – such as deep learning – and puts them in the hands of developers in an easy-to-use, low-cost way. Since then, we have seen customers use the image and video analysis capabilities of Amazon Rekognition in ways that materially benefit both society (e.g. preventing human trafficking, inhibiting child exploitation, reuniting missing children with their families, and building educational apps for children), and organizations (enhancing security through multi-factor authentication, finding images more easily, or preventing package theft). Amazon Web Services (AWS) is not the only provider of services like these, and we remain excited about how image and video analysis can be a driver for good in the world, including in the public sector and law enforcement.
There have always been and will always be risks with new technology capabilities. Each organization choosing to employ technology must act responsibly or risk legal penalties and public condemnation. AWS takes its responsibilities seriously. But we believe it is the wrong approach to impose a ban on promising new technologies because they might be used by bad actors for nefarious purposes in the future. The world would be a very different place if we had restricted people from buying computers because it was possible to use that computer to do harm. The same can be said of thousands of technologies upon which we all rely each day. Through responsible use, the benefits have far outweighed the risks.
Customers are off to a great start with Amazon Rekognition; the evidence of the positive impact this new technology can provide is strong (and growing by the week), and we’re excited to continue to support our customers in its responsible use.
-Dr. Matt Wood, general manager of artificial intelligence at AWS
Regardless of your career path, there’s no denying that attending industry events can provide helpful career development opportunities — not only for improving and expanding your skill sets, but for networking as well. According to this article from PayScale.com, experts estimate that somewhere between 70% and 85% of new positions are landed through networking.
If you want to network with cloud computing professionals who are tackling some of today’s most innovative and exciting big data solutions, attending big data-focused sessions at an AWS Global Summit is a great place to start.
AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. As the name suggests, these summits are held in major cities around the world, and they attract technologists from all industries and skill levels who are interested in hearing from AWS leaders, experts, partners, and customers.
In addition to networking with top cloud technology providers, consultants, and your peers in our Partner and Solutions Expo, you can hone your AWS skills by attending and participating in a multitude of education and training opportunities.
Here’s a brief sampling of some of the upcoming sessions relevant to big data professionals:
Be sure to check out the main page for AWS Global Summits, where you can see which cities have AWS Summits planned for 2018, register to attend an upcoming event, or provide your information to be notified when registration opens for a future event.
Almost a decade ago, my colleague Deepak Singh introduced the AWS Public Datasets in his post Paging Researchers, Analysts, and Developers. I’m happy to report that Deepak is still an important part of the AWS team and that the Public Datasets program is still going strong!
Today we are announcing a new take on open and public data, the Registry of Open Data on AWS, or RODA. This registry includes existing Public Datasets and allows anyone to add their own datasets so that they can be accessed and analyzed on AWS.
Inside the Registry
The home page lists all of the datasets in the registry:
Entering a search term shrinks the list so that only the matching datasets are displayed:
Each dataset has an associated detail page, including usage examples, license info, and the information needed to locate and access the dataset on AWS:
In this case, I can access the data with a simple CLI command:
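The command itself isn’t shown here; for a dataset stored in a public S3 bucket, it might be as simple as listing the bucket (the bucket name below is a placeholder):

aws s3 ls s3://example-open-dataset/ --no-sign-request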
I could also access it programmatically, or download data to my EC2 instance.
Adding to the Repository
If you have a dataset that is publicly available and would like to add it to RODA, you can simply send us a pull request. Head over to the open-data-registry repo, read the CONTRIBUTING document, and create a YAML file that describes your dataset, using one of the existing files in the datasets directory as a model:
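A dataset file might look roughly like the following; the field names reflect the repo’s format as of this writing, and all values are placeholders:

Name: Example Open Dataset
Description: A short description of the dataset
Documentation: https://example.org/dataset-docs
Contact: opendata@example.org
UpdateFrequency: Quarterly
Tags:
  - earth observation
License: CC0 1.0
Resources:
  - Description: Raw data files
    ARN: arn:aws:s3:::example-open-dataset
    Region: us-east-1
    Type: S3 Bucket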
We’ll review pull requests regularly; you can “star” or watch the repo in order to track additions and changes.
Impress Me
I am looking forward to an inrush of new datasets, along with some blog posts and apps that show how to use the data in powerful and interesting ways. Let me know what you come up with.
Australian public sector customers now have a clear roadmap to use our secure services for sensitive workloads at the PROTECTED level. For the first time, we’ve released our Information Security Registered Assessors Program (IRAP) PROTECTED documentation via AWS Artifact. This information provides the ability to plan, architect, and self-assess systems built in AWS under the Digital Transformation Agency’s Secure Cloud Guidelines.
In short, this documentation gives public sector customers everything needed to evaluate AWS at the PROTECTED level. And we’re making this resource available to download on-demand through AWS Artifact. When you download the guide, you’ll find a mapping of how AWS meets each requirement to securely and compliantly process PROTECTED data.
With the AWS IRAP PROTECTED documentation, the process of adopting our secure services has never been easier. The information enables individual agencies to complete their own assessments and adopt AWS, but we also continue to work with the Australian Signals Directorate to include our services at the PROTECTED level on the Certified Cloud Services List.
Meanwhile, we’re also excited to announce that there are now 46 services in scope, which means more options to build secure and innovative solutions, while also saving money and gaining the productivity of the cloud.
If you have questions about this announcement or would like to inquire about how to use AWS for your regulated workloads, contact your account team.
AWS has achieved Spain’s Esquema Nacional de Seguridad (ENS) High certification across 29 services. To successfully achieve the ENS High Standard, BDO España conducted an independent audit and attested that AWS meets confidentiality, integrity, and availability standards. This provides the assurance needed by Spanish Public Sector organizations wanting to build secure applications and services on AWS.
The National Security Framework, regulated under Royal Decree 3/2010, was developed through close collaboration between ENAC (Entidad Nacional de Acreditación), the Ministry of Finance and Public Administration, the CCN (National Cryptologic Centre), and other administrative bodies.
The following AWS Services are ENS High accredited across our Dublin and Frankfurt Regions:
As a follow up to our initial region availability on November 20, 2017, I’m happy to announce that we have expanded the number of accredited services available in the AWS Secret Region by an additional 11 services. We continue to be the only cloud service provider with accredited regions to address the full range of U.S. Department of Defense (DoD) data classifications, including Unclassified, Sensitive (CUI), Secret, and Top Secret.
When the region launched last November, we achieved a Provisional Authorization (PA) for Impact Level 6 (IL6) workloads from the U.S. Defense Information Systems Agency (DISA), the IT combat support organization of the DoD. The PA was recently extended, allowing for continued access to the region for IL6 workloads.
The AWS Secret Region was designed and built to meet the specific security requirements of secret classified workloads for the DoD and the intelligence community. A service catalog for the region is available through your AWS Account Executive.
We expand AWS by picking a geographic area (which we call a Region) and then building multiple, isolated Availability Zones in that area. Each Availability Zone (AZ) has multiple Internet connections and power connections to multiple grids.
Today I am happy to announce that we are opening our 50th AWS Availability Zone, with the addition of a third AZ to the EU (London) Region. This will give you additional flexibility to architect highly scalable, fault-tolerant applications that run across multiple AZs in the UK.
Since launching the EU (London) Region, we have seen an ever-growing set of customers, particularly in the public sector and in regulated industries, use AWS for new and innovative applications. Here are a couple of examples, courtesy of my AWS colleagues in the UK:
Enterprise – Some of the UK’s most respected enterprises are using AWS to transform their businesses, including BBC, BT, Deloitte, and Travis Perkins. Travis Perkins is one of the largest suppliers of building materials in the UK and is implementing the biggest systems and business change in its history, including an all-in migration of its data centers to AWS.
Startups – Cross-border payments company Currencycloud has migrated its entire payments production and demo platform to AWS, resulting in a 30% saving on infrastructure costs. Clearscore, which plans to disrupt the credit score industry, has also chosen to host its entire platform on AWS. UnderwriteMe is using the EU (London) Region to offer an underwriting platform to its customers as a managed service.
Public Sector – The Met Office chose AWS to support the Met Office Weather App, available for iPhone and Android phones. Since the Met Office Weather App went live in January 2016, it has attracted more than half a million users. Using AWS, the Met Office has been able to increase agility, speed, and scalability while reducing costs. The Driver and Vehicle Licensing Agency (DVLA) is using the EU (London) Region for services such as the Strategic Card Payments platform, which helps the agency achieve PCI DSS compliance.
The AWS EU (London) Region has achieved Public Services Network (PSN) assurance, which provides UK Public Sector customers with an assured infrastructure on which to build UK Public Sector services. In conjunction with AWS’s Standardized Architecture for UK-OFFICIAL, PSN assurance enables UK Public Sector organizations to move their UK-OFFICIAL classified data to the EU (London) Region in a controlled and risk-managed manner.
For a complete list of AWS Regions and Services, visit the AWS Global Infrastructure page. As always, pricing for services in the Region can be found on the detail pages; visit our Cloud Products page to get started.
Contributed by: Stephen Liedig, Senior Solutions Architect, ANZ Public Sector, and Otavio Ferreira, Manager, Amazon Simple Notification Service
Want to make your cloud-native applications scalable, fault-tolerant, and highly available? Recently, we wrote a couple of posts about using AWS messaging services Amazon SQS and Amazon SNS to address messaging patterns for loosely coupled communication between highly cohesive components. For more information, see:
Today, AWS is releasing a new message filtering functionality for SNS. This new feature simplifies the pub/sub messaging architecture by offloading the filtering logic from subscribers, as well as the routing logic from publishers, to SNS.
In this post, we walk you through the new message filtering feature, and how to use it to clean up unnecessary logic in your components, and reduce the number of topics in your architecture.
Topic-based filtering
SNS is a fully managed pub/sub messaging service that lets you fan out messages to large numbers of recipients at one time, using topics. SNS topics support a variety of subscription types, allowing you to push messages to SQS queues, AWS Lambda functions, HTTP endpoints, email addresses, and mobile devices (SMS, push).
In the above scenario, every subscriber receives the same message published to the topic, allowing them to process the message independently. For many use cases, this is sufficient.
However, in more complex scenarios, the subscriber may only be interested in a subset of the messages being published. The onus, in that case, is on each subscriber to ensure that they are filtering and only processing those messages in which they are actually interested.
To avoid this additional filtering logic on each subscriber, many organizations have adopted a practice in which the publisher is now responsible for routing different types of messages to different topics. However, as depicted in the following diagram, this topic-based filtering practice can lead to overly complicated publishers, topic proliferation, and additional overhead in provisioning and managing your SNS topics.
Attribute-based filtering
To leverage the new message filtering capability, SNS requires the publisher to set message attributes and each subscriber to set a subscription attribute (a subscription filter policy). When the publisher posts a new message to the topic, SNS attempts to match the incoming message attributes to the filter policy set on each subscription, to determine whether a particular subscriber is interested in that incoming event. If there is a match, SNS then pushes the message to the subscriber in question. The new attribute-based message filtering approach is depicted in the following diagram.
Message filtering in action
Let’s look at how message filtering works. The following example is based on a sports merchandise ecommerce website, which publishes a variety of events to an SNS topic. The events range from checkout events (triggered when orders are placed or canceled) to buyers’ navigation events (triggered when product pages are visited). The code below is based on the existing AWS SDK for Python.
First, create the single SNS topic to which all shopping events are published.
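A sketch using boto3 (the topic name is illustrative):

import boto3

sns = boto3.client('sns')

# One topic receives every shopping event
topic_arn = sns.create_topic(Name='shopping-events')['TopicArn']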
Next, subscribe the endpoints that will be listening to those shopping events. The first subscriber is an SQS queue that is processed by a payment gateway, while the second subscriber is a Lambda function that indexes the buyer’s shopping interests against a search engine.
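Continuing the sketch, both endpoints subscribe to the same topic (the queue and function ARNs are placeholders):

# Payment gateway's SQS queue (placeholder ARN)
payment_sub = sns.subscribe(
    TopicArn=topic_arn,
    Protocol='sqs',
    Endpoint='arn:aws:sqs:us-east-1:123456789012:payment-queue',
)['SubscriptionArn']

# Search indexer Lambda function (placeholder ARN)
search_sub = sns.subscribe(
    TopicArn=topic_arn,
    Protocol='lambda',
    Endpoint='arn:aws:lambda:us-east-1:123456789012:function:search-indexer',
)['SubscriptionArn']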
A subscription filter policy is set as a subscription attribute, by the subscription owner, as a simple JSON object, containing a set of key-value pairs. This object defines the kind of event in which the subscriber is interested.
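In this example, the payment gateway only cares about checkout events, and the search indexer only about navigation events; continuing the sketch:

import json

# Payment gateway: checkout events only
sns.set_subscription_attributes(
    SubscriptionArn=payment_sub,
    AttributeName='FilterPolicy',
    AttributeValue=json.dumps({'event_type': ['order_placed', 'order_cancelled']}),
)

# Search indexer: navigation events only
sns.set_subscription_attributes(
    SubscriptionArn=search_sub,
    AttributeName='FilterPolicy',
    AttributeValue=json.dumps({'event_type': ['product_page_visited']}),
)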
You’re now ready to start publishing events with attributes!
Message attributes allow you to provide structured metadata items (such as time stamps, geospatial data, event type, signatures, and identifiers) about the message. Message attributes are optional and separate from, but sent along with, the message body. You can include up to 10 message attributes with your message.
The first message published in this example is related to an order that has been placed on the ecommerce website. The message attribute “event_type” with the value “order_placed” matches only the filter policy associated with the payment gateway subscription. Therefore, only the SQS queue subscribed to the SNS topic is notified about this checkout event.
The second message published is related to a buyer’s navigation activity on the ecommerce website. The message attribute “event_type” with the value “product_page_visited” matches only the filter policy associated with the search engine subscription. Therefore, only the Lambda function subscribed to the SNS topic is notified about this navigation event.
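Both publishes, sketched with placeholder message bodies:

# Checkout event: matches only the payment gateway's filter policy
sns.publish(
    TopicArn=topic_arn,
    Message='{"order_id": "1234"}',
    MessageAttributes={
        'event_type': {'DataType': 'String', 'StringValue': 'order_placed'},
    },
)

# Navigation event: matches only the search indexer's filter policy
sns.publish(
    TopicArn=topic_arn,
    Message='{"product_id": "5678"}',
    MessageAttributes={
        'event_type': {'DataType': 'String', 'StringValue': 'product_page_visited'},
    },
)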
The following diagram represents the architecture for this ecommerce website, with the message filtering mechanism in action. As described earlier, checkout events are pushed only to the SQS queue, whereas navigation events are pushed to the Lambda function only.
Message filtering criteria
It is important to remember the following things about subscription filter policy matching (a short worked example follows the list):
A subscription filter policy either matches an incoming message, or it doesn’t. It’s Boolean logic.
For a filter policy to match a message, the message must contain all the attribute keys listed in the policy.
Attributes of the message not mentioned in the filtering policy are ignored.
The value of each key in the filter policy is an array containing one or more values. The policy matches if any of the values in the array match the value in the corresponding message attribute.
If the value in the message attribute is an array, then the filter policy matches if the intersection of the policy array and the message array is non-empty.
The matching is exact (character-by-character), without case-folding or any other string normalization.
The values being matched follow JSON rules: Strings enclosed in quotes, numbers, and the unquoted keywords true, false, and null.
Number matching is at the string representation level. Example: 300, 300.0, and 3.0e2 aren’t considered equal.
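A worked example of these rules, using the filter policy from earlier (attribute values invented):

Filter policy:      {"event_type": ["order_placed", "order_cancelled"]}
event_type = "order_placed"          -> match (value is in the policy array)
event_type = "product_page_visited"  -> no match (value not in the array)
customer_id = "42" only              -> no match (policy key absent from message)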
When should I use message filtering?
We recommend using message filtering and grouping subscribers into a single topic only when all of the following are true:
Subscribers are semantically related to each other
Subscribers consume similar types of events
Subscribers are supposed to share the same access permissions on the topic
Technically, you could get away with creating a single topic for your entire domain to handle all event processing, even unrelated use cases, but this wouldn’t be recommended. This option could result in an unnecessarily large topic, which could potentially impact your message delivery latency. Also, you would lose the ability to implement fine-grained access control on your topics.
Finally, if you already use SNS, but had to add filtering logic in your subscribers or routing logic in your publishers (topic-based filtering), you can now immediately benefit from message filtering. This new approach lets you clean up any unnecessary logic in your components, and reduce the number of topics in your architecture.
Summary
As we’ve shown in this post, the new message filtering capability in Amazon SNS gives you a great amount of flexibility in your messaging pattern. It allows you to really simplify your pub/sub infrastructure requirements.
Message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). It’s now available in all AWS commercial regions, at no extra charge.
Here are a few ideas for next steps to get you started:
Add filter policies to your subscriptions on the SNS console.
The AWS US East/West Region has received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) at the Federal Risk and Authorization Management Program (FedRAMP) Moderate baseline.
Though AWS has maintained an AWS US East/West Region Agency-ATO since early 2013, this announcement represents AWS’s carefully deliberated move to the JAB for the centralized maintenance of our P-ATO for 10 services already authorized. This also includes the addition of 10 new services to our FedRAMP program (see the complete list of services below). This doubles the number of FedRAMP Moderate services available to our customers to enable increased use of the cloud and support modernized IT missions. Our public sector customers now can leverage this FedRAMP P-ATO as a baseline for their own authorizations and look to the JAB for centralized Continuous Monitoring reporting and updates. In a significant enhancement for our partners that build their solutions on the AWS US East/West Region, they can now achieve FedRAMP JAB P-ATOs of their own for their Platform as a Service (PaaS) and Software as a Service (SaaS) offerings.
In line with FedRAMP security requirements, our independent FedRAMP assessment was completed in partnership with a FedRAMP accredited Third Party Assessment Organization (3PAO) on our technical, management, and operational security controls to validate that they meet or exceed FedRAMP’s Moderate baseline requirements. Effective immediately, you can begin leveraging this P-ATO for the following 20 services in the AWS US East/West Region:
Amazon Aurora (MySQL)*
Amazon CloudWatch Logs*
Amazon DynamoDB
Amazon Elastic Block Store
Amazon Elastic Compute Cloud
Amazon EMR*
Amazon Glacier*
Amazon Kinesis Streams*
Amazon RDS (MySQL, Oracle, Postgres*)
Amazon Redshift
Amazon Simple Notification Service*
Amazon Simple Queue Service*
Amazon Simple Storage Service
Amazon Simple Workflow Service*
Amazon Virtual Private Cloud
AWS CloudFormation*
AWS CloudTrail*
AWS Identity and Access Management
AWS Key Management Service
Elastic Load Balancing
* Services with first-time FedRAMP Moderate authorizations
We continue to work with the FedRAMP Project Management Office (PMO), other regulatory and compliance bodies, and our customers and partners to ensure that we are raising the bar on our customers’ security and compliance needs.
This post is courtesy of Aaron Friedman, Healthcare and Life Sciences Partner Solutions Architect, AWS, and Angel Pizarro, Genomics and Life Sciences Senior Solutions Architect, AWS
Precision medicine is tailored to individuals based on quantitative signatures, including genomics, lifestyle, and environment. It is often considered to be the driving force behind the next wave of human health. Through new initiatives and technologies such as population-scale genomics sequencing and IoT-backed wearables, researchers and clinicians in both commercial and public sectors are gaining new, previously inaccessible insights.
Many of these precision medicine initiatives are already happening on AWS. A few of these include:
PrecisionFDA – This initiative is led by the US Food and Drug Administration. The goal is to define the next-generation standard of care for genomics in precision medicine.
Deloitte ConvergeHEALTH – Gives healthcare and life sciences organizations the ability to analyze their disparate datasets on a singular real world evidence platform.
Central to many of these initiatives is genomics, which gives healthcare organizations the ability to establish a baseline for longitudinal studies. Due to its wide applicability in precision medicine initiatives—from rare disease diagnosis to improving outcomes of clinical trials—genomics data is growing globally at a rate that outpaces Moore’s law. Many expect these datasets to grow to the range of tens of exabytes by 2025.
Genomics data is also regularly re-analyzed by the community as researchers develop new computational methods or compare older data with newer genome references. These trends are driving innovations in data analysis methods and algorithms to address the massive increase of computational requirements.
Edico Genome, an AWS Partner Network (APN) Partner, has developed a novel solution that accelerates genomics analysis using field-programmable gate arrays, or FPGAs. Historically, Edico Genome deployed their FPGA appliances on-premises. When AWS announced the Amazon EC2 F1 FPGA-based instance family in December 2016, Edico Genome adopted a cloud-first strategy, became an F1 launch partner, and was one of the first partners to deploy FPGA-enabled applications on AWS.
On October 19, 2017, Edico Genome partnered with the Children’s Hospital of Philadelphia (CHOP) to demonstrate DRAGEN, their FPGA-accelerated genomic pipeline software that can significantly reduce time-to-insight for patient genomes. Together they analyzed 1,000 genomes from the Center for Applied Genomics Biobank in the shortest time possible, setting a Guinness World Record for the fastest analysis of 1,000 whole human genomes, using 1,000 EC2 f1.2xlarge instances in a single AWS Region. Not only were they able to analyze genomes at high throughput, they did so averaging approximately $3 of AWS compute per whole human genome analyzed.
The version of DRAGEN that Edico Genome used for this analysis was also the same one used in the precisionFDA Hidden Treasures – Warm Up challenge, where they were one of the top performers in every assessment.
In the remainder of this post, we walk through the architecture used by Edico Genome, combining EC2 F1 instances and AWS Batch to achieve this milestone.
EC2 F1 instances and Edico’s DRAGEN
EC2 F1 instances provide access to programmable hardware-acceleration using FPGAs at a cloud scale. AWS customers use F1 instances for a wide variety of applications, including big data, financial analytics and risk analysis, image and video processing, engineering simulations, AR/VR, and accelerated genomics. Edico Genome’s FPGA-backed DRAGEN Bio-IT Platform is now integrated with EC2 F1 instances. You can access the accuracy, speed, flexibility, and low compute cost of DRAGEN through a number of third-party platforms, AWS Marketplace, and Edico Genome’s own platform. The DRAGEN platform offers a scalable, accelerated, and cost-efficient secondary analysis solution for a wide variety of genomics applications. Edico Genome also provides a highly optimized mechanism for the efficient storage of genomic data.
Scaling DRAGEN on AWS
Edico Genome used 1,000 EC2 F1 instances to help their customer, the Children’s Hospital of Philadelphia (CHOP), process and analyze all 1,000 whole human genomes in parallel. They used AWS Batch to provision compute resources and orchestrate DRAGEN compute jobs across the 1,000 EC2 F1 instances. This solution successfully addressed the challenge of creating a genomic processing pipeline that can easily scale to thousands of engines running in parallel.
Architecture
A simplified view of the architecture used for the analysis is shown in the following diagram:
DRAGEN’s portal uses Elastic Load Balancing and Auto Scaling groups to scale out EC2 instances that submit jobs to AWS Batch.
Job metadata is stored in their Workflow Management (WFM) database, built on top of Amazon Aurora.
The DRAGEN Workflow Manager API submits jobs to AWS Batch.
These jobs are executed in the AWS Batch managed compute environment that is responsible for launching the EC2 F1 instances.
These jobs run as Docker containers that have the requisite DRAGEN binaries for whole genome analysis.
As each job runs, it retrieves and stores genomics data that is staged in Amazon S3.
The steps listed previously can also be bucketed into the following higher-level layers:
Workflow: Edico Genome used their Workflow Management API to orchestrate the submission of AWS Batch jobs. Metadata for the jobs (such as the S3 locations of the genomes, etc.) resides in the Workflow Management Database backed by Amazon Aurora.
Batch execution: AWS Batch launches EC2 F1 instances and coordinates the execution of DRAGEN jobs on these compute resources. AWS Batch enabled Edico to quickly and easily scale up to the full number of instances they needed as jobs were submitted. They also scaled back down as each job was completed, to optimize for both cost and performance.
Compute/job: Edico Genome stored their binaries in a Docker container that AWS Batch deployed onto each of the F1 instances, giving each instance the ability to run DRAGEN without the need to pre-install the core executables. The AWS based DRAGEN solution streams all genomics data from S3 for local computation and then writes the results to a destination bucket. They used an AWS Batch job role that specified the IAM permissions. The role ensured that DRAGEN only had access to the buckets or S3 key space it needed for the analysis. Jobs didn’t need to embed AWS credentials.
In the following sections, we dive deeper into several tasks that enabled Edico Genome’s scalable FPGA genome analysis on AWS:
Prepare your Amazon FPGA Image for AWS Batch
Create a Dockerfile and build your Docker image
Set up your AWS Batch FPGA compute environment
Prerequisites
In brief, you need a modern Linux distribution (kernel 3.10+), the Amazon ECS container agent, the awslogs driver, and Docker configured on your image. There are additional recommendations in the Compute Resource AMI specification.
Preparing your Amazon FPGA Image for AWS Batch
You can use any Amazon Machine Image (AMI) or Amazon FPGA Image (AFI) with AWS Batch, provided that it meets the Compute Resource AMI specification. This gives you the ability to customize any workload by increasing the size of root or data volumes, adding instance stores, and connecting with the FPGA (F) and GPU (G and P) instance families.
Next, install the AWS CLI:
pip install awscli
Add any additional software required to interact with the FPGAs on the F1 instances.
As a starting point, AWS publishes an FPGA Developer AMI in the AWS Marketplace. It is based on a CentOS Linux image and includes pre-integrated FPGA development tools. It also includes the runtime tools required to develop and use custom FPGAs for hardware acceleration applications.
For more information about how to set up custom AMIs for your AWS Batch managed compute environments, see Creating a Compute Resource AMI.
Building your Dockerfile
There are two common methods for connecting to AWS Batch to run FPGA-enabled algorithms. The first method, which is the route Edico Genome took, involves storing your binaries in the Docker container itself and running that on top of an F1 instance with Docker installed. The following code example is what a Dockerfile to build your container might look like for this scenario.
# DRAGEN_EXEC Docker image generator --
# Run this Dockerfile from a local directory that contains the latest release of
# - Dragen RPM and Linux DMA Driver available from Edico
# - Edico's Dragen WFMS Wrapper files
FROM centos:centos7
RUN rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# Install Basic packages needed for Dragen
RUN yum -y install \
perl \
sos \
coreutils \
gdb \
time \
systemd-libs \
bzip2-libs \
R \
ca-certificates \
ipmitool \
smartmontools \
rsync
# Install the Dragen RPM
RUN mkdir -m777 -p /var/log/dragen /var/run/dragen
ADD . /root
RUN rpm -Uvh /root/edico_driver*.rpm || true
RUN rpm -Uvh /root/dragen-aws*.rpm || true
# Auto generate the Dragen license
RUN /opt/edico/bin/dragen_lic -i auto
#########################################################
# Now install the Edico WFMS "Wrapper" functions
# Add development tools needed for some util
RUN yum groupinstall -y "Development Tools"
# Install necessary standard packages
RUN yum -y install \
dstat \
git \
python-devel \
python-pip \
time \
tree && \
pip install --upgrade pip && \
easy_install requests && \
pip install psutil && \
pip install python-dateutil && \
pip install constants && \
easy_install boto3
# Setup Python path used by the wrapper
RUN mkdir -p /opt/workflow/python/bin
RUN ln -s /usr/bin/python /opt/workflow/python/bin/python2.7
RUN ln -s /usr/bin/python /opt/workflow/python/bin/python
# Install d_haul and dragen_job_execute wrapper functions and associated packages
RUN mkdir -p /root/wfms/trunk/scheduler/scheduler
COPY scheduler/d_haul /root/wfms/trunk/scheduler/
COPY scheduler/dragen_job_execute /root/wfms/trunk/scheduler/
COPY scheduler/scheduler/aws_utils.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/constants.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/job_utils.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/logger.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/scheduler_utils.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/webapi.py /root/wfms/trunk/scheduler/scheduler/
COPY scheduler/scheduler/wfms_exception.py /root/wfms/trunk/scheduler/scheduler/
RUN touch /root/wfms/trunk/scheduler/scheduler/__init__.py
# Landing directory should be where DJX is located
WORKDIR "/root/wfms/trunk/scheduler/"
# Debug print of container's directories
RUN tree /root/wfms/trunk/scheduler
# Default behaviour. Over-ride with --entrypoint on docker run cmd line
ENTRYPOINT ["/root/wfms/trunk/scheduler/dragen_job_execute"]
CMD []
Note: Edico Genome’s custom Python wrapper functions for its Workflow Management System (WFMS) in the latter part of this Dockerfile should be replaced with functions that are specific to your workflow.
The second method is to install binaries and then use Docker as a lightweight connector between AWS Batch and the AFI. For example, this might be a route you would choose to use if you were provisioning DRAGEN from the AWS Marketplace.
In this case, the Dockerfile would not contain the installation of the binaries to run DRAGEN, but would contain any other packages necessary for job completion. When you run your Docker container, you enable Docker to access the underlying file system.
Connecting to AWS Batch
AWS Batch provisions compute resources and runs your jobs, choosing the right instance types based on your job requirements and scaling down resources as work is completed. AWS Batch users submit a job, based on a template or “job definition”, to an AWS Batch job queue.
Job queues are mapped to one or more compute environments that describe the quantity and types of resources that AWS Batch can provision. In this case, Edico created a managed compute environment that was able to launch 1,000 EC2 F1 instances across multiple Availability Zones in us-east-1. As jobs are submitted to a job queue, the service launches the required quantity and types of instances that are needed. As instances become available, AWS Batch then runs each job within appropriately sized Docker containers.
The Edico Genome workflow manager API submits jobs to an AWS Batch job queue. This job queue maps to an AWS Batch managed compute environment containing On-Demand F1 instances. In this section, you can set this up yourself.
To create the compute environment that DRAGEN can use:
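The exact commands Edico Genome used aren’t reproduced here, but a managed compute environment pinned to F1 instances can be sketched with the AWS CLI as follows (the environment name, subnets, security group, and roles are placeholders):

aws batch create-compute-environment \
    --compute-environment-name dragen-f1 \
    --type MANAGED \
    --state ENABLED \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "desiredvCpus": 0,
        "maxvCpus": 8000,
        "instanceTypes": ["f1.2xlarge"],
        "subnets": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "securityGroupIds": ["sg-0123456789abcdef0"],
        "instanceRole": "ecsInstanceRole"
    }' \
    --service-role arn:aws:iam::123456789012:role/AWSBatchServiceRole

You would then create a job queue (aws batch create-job-queue) that maps to this compute environment.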
An f1.2xlarge EC2 instance contains one FPGA, eight vCPUs, and 122 GiB of RAM. As DRAGEN requires an entire FPGA to run, Edico Genome needed to ensure that only one analysis at a time executed on an instance. By using the f1.2xlarge vCPUs and memory as a proxy in their AWS Batch job definition, Edico Genome could ensure that only one job runs on an instance at a time. Here’s what that looks like in the AWS CLI:
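Here is a sketch of such a job definition and a job submission; the image URI and memory value are placeholders, and privileged mode is shown on the assumption that the container needs direct access to the FPGA device:

aws batch register-job-definition \
    --job-definition-name dragen-wgs \
    --type container \
    --container-properties '{
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/dragen:latest",
        "vcpus": 8,
        "memory": 120000,
        "privileged": true
    }'

aws batch submit-job \
    --job-name dragen-sample-0001 \
    --job-queue dragen-queue \
    --job-definition dragen-wgs

Requesting all eight vCPUs ensures that AWS Batch places exactly one DRAGEN job per f1.2xlarge instance.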
You can query the status of your DRAGEN job with the following command:
aws batch describe-jobs --jobs <the job ID from the above command>
The logs for your job are written to the /aws/batch/job CloudWatch log group.
Conclusion
In this post, we demonstrated how to set up an environment with AWS Batch that can run DRAGEN on EC2 F1 instances at scale. If you followed the walkthrough, you’ve replicated much of the architecture Edico Genome used to set the Guinness World Record.
There are several ways in which you can harness the computational power of DRAGEN to analyze genomes at scale. First, DRAGEN is available through several different genomics platforms, such as the DNAnexus Platform. DRAGEN is also available on the AWS Marketplace. You can apply the architecture presented in this post to build a scalable solution that is both performant and cost-optimized.
For more information about how AWS Batch can facilitate genomics processing at scale, be sure to check out our aws-batch-genomics GitHub repo on high-throughput genomics on AWS.
The AWS EU (London) Region has been selected to provide services to support UK law enforcement customers. This decision followed an assessment by Home Office Digital, Data and Technology supported by their colleagues in the National Policing Information Risk Management Team (NPIRMT) to determine the region’s suitability for addressing their specific needs.
The security, privacy, and protection of AWS customers are AWS’s first priority. We are committed to supporting Public Sector, Blue Light, Justice, and Public Safety organizations. We hope that other organizations in these sectors will now be encouraged to consider AWS services when addressing their own requirements, including the challenge of providing modern, scalable technologies that can meet their ever-evolving business demands.
– Oliver