Tag Archives: Data at rest

New – Encryption of Data in Transit for Amazon EFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-encryption-of-data-in-transit-for-amazon-efs/

Amazon Elastic File System was designed to be the file system of choice for cloud-native applications that require shared access to file-based storage. We launched EFS in mid-2016 and have added several important features since then including on-premises access via Direct Connect and encryption of data at rest. We have also made EFS available in additional AWS Regions, most recently US West (Northern California). As was the case with EFS itself, these enhancements were made in response to customer feedback, and reflect our desire to serve an ever-widening customer base.

Encryption in Transit
Today we are making EFS even more useful with the addition of support for encryption of data in transit. When used in conjunction with the existing support for encryption of data at rest, you now have the ability to protect your stored files using a defense-in-depth security strategy.

In order to make it easy for you to implement encryption in transit, we are also releasing an EFS mount helper. The helper (available in source code and RPM form) takes care of setting up a TLS tunnel to EFS, and also allows you to mount file systems by ID. The two features are independent; you can use the helper to mount file systems by ID even if you don’t make use of encryption in transit. The helper also supplies a recommended set of default options to the actual mount command.

Setting up Encryption
I start by installing the EFS mount helper on my Amazon Linux instance:

$ sudo yum install -y amazon-efs-utils

Next, I visit the EFS Console and capture the file system ID:

Then I specify the ID (and the TLS option) to mount the file system:

$ sudo mount -t efs fs-92758f7b -o tls /mnt/efs

And that’s it! The encryption is transparent and has an almost negligible impact on data transfer speed.

Available Now
You can start using encryption in transit today in all AWS Regions where EFS is available.

The mount helper is available for Amazon Linux. If you are running another distribution of Linux you will need to clone the GitHub repo and build your own RPM, as described in the README.

Jeff;

Now Available: Encryption at Rest for Amazon DynamoDB

Post Syndicated from Nitin Sagar original https://aws.amazon.com/blogs/security/now-available-encryption-at-rest-for-amazon-dynamodb/

Today, AWS announced Amazon DynamoDB encryption at rest, a new DynamoDB feature that gives you enhanced security of your data at rest by encrypting it using your associated AWS Key Management Service encryption keys. Encryption at rest can help you meet your security requirements for regulatory compliance.

You now can create an encrypted DynamoDB table anytime with a single click in the AWS Management Console or a single API call. Encrypting DynamoDB data has no impact on table performance. DynamoDB encryption at rest is available starting today in the US East (N. Virginia), US East (Ohio), US West (Oregon), and Europe (Ireland) Regions for no additional fees.

For more information, see the full AWS Blog post.

– Nitin

GDPR for Developers [presentation]

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-developers-presentation/

On a recent meetup in Amsterdam I talked about GDPR from a technical point of view, effectively turning my “GDPR – a practical guide for developers” article into a talk.

You can see the slides here:

If you’re interested, you can also join a webinar on the same topic, organized by AxonIQ, where I will join Frans Vanbuul. You can find more information about the webinar here.

The interesting thing that I can share after the meetup and after meeting with potential clients is that everyone (maybe unsurprisingly) has a very specific question that doesn’t get an immediate answer even after you follow the general guidelines. That is maybe a problem on the Regulation’s side, as it has not brought sufficient clarity to businesses.

As I said during the presentation – in technology we’re used with binary questions. In law and legal compliance an answer is somewhere on a scale between 1 and 10. “Do I have to encrypt my data at rest”? Well, I guess yes, but in terms of compliance I’d say “6 out of 10”, as it is not strict, depends on the multiple people’s interpretation of the sensitivity of the data and on other factors like access control.

So the communication between legal and technical people is key to understand what exactly implementation changes are needed.

The post GDPR for Developers [presentation] appeared first on Bozho's tech blog.

GDPR – A Practical Guide For Developers

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-practical-guide-developers/

You’ve probably heard about GDPR. The new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for gettign your systems in compliance with the regulation.

The regulation is basically a law that must be followed in all European countries (but also applies to non-EU companies that have users in the EU). In this particular case, it applies to companies that are not registered in Europe, but are having European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” posts/whitepapers, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of a EU country, and because of that I’ve been both exposed and myself wrote some legislation. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them), the right to data portability (the user should be able to get a machine-readable dump of their data).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s.

  • “Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case they have been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to cleanup after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests his data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blcokchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
  • Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
  • Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clasues here and there.
  • Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly is that data – depends on the particular usecase. Usually it’s at least the data that you delete with the “forget me” functionality, but may include additional data (e.g. the orders the user has made may not be delete, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when his data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
  • Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.
  • Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
  • Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
  • “See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
  • Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow, where the child should specify the email of a parent, who can then confirm. Obviosuly, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).

Now some “do’s”, which are mostly about the technical measures needed to protect personal data. They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post.

  • Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections. Some databases need gossiping among the nodes – that should also be configured to use encryption
  • Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
  • Encrypt your backups – kind of obvious
  • Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person
  • Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
  • Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
  • Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose
  • Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)

Finally, some “don’t’s”.

  • Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
  • Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old logs files are cleaned up, just in case
  • Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
  • Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
  • Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow , API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

The post GDPR – A Practical Guide For Developers appeared first on Bozho's tech blog.

AWS HIPAA Eligibility Update (October 2017) – Sixteen Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-post-update-october-2017-sixteen-additional-services/

Our Health Customer Stories page lists just a few of the many customers that are building and running healthcare and life sciences applications that run on AWS. Customers like Verge Health, Care Cloud, and Orion Health trust AWS with Protected Health Information (PHI) and Personally Identifying Information (PII) as part of their efforts to comply with HIPAA and HITECH.

Sixteen More Services
In my last HIPAA Eligibility Update I shared the news that we added eight additional services to our list of HIPAA eligible services. Today I am happy to let you know that we have added another sixteen services to the list, bringing the total up to 46. Here are the newest additions, along with some short descriptions and links to some of my blog posts to jog your memory:

Amazon Aurora with PostgreSQL Compatibility – This brand-new addition to Amazon Aurora allows you to encrypt your relational databases using keys that you create and manage through AWS Key Management Service (KMS). When you enable encryption for an Amazon Aurora database, the underlying storage is encrypted, as are automated backups, read replicas, and snapshots. Read New – Encryption at Rest for Amazon Aurora to learn more.

Amazon CloudWatch Logs – You can use the logs to monitor and troubleshoot your systems and applications. You can monitor your existing system, application, and custom log files in near real-time, watching for specific phrases, values, or patterns. Log data can be stored durably and at low cost, for as long as needed. To learn more, read Store and Monitor OS & Application Log Files with Amazon CloudWatch and Improvements to CloudWatch Logs and Dashboards.

Amazon Connect – This self-service, cloud-based contact center makes it easy for you to deliver better customer service at a lower cost. You can use the visual designer to set up your contact flows, manage agents, and track performance, all without specialized skills. Read Amazon Connect – Customer Contact Center in the Cloud and New – Amazon Connect and Amazon Lex Integration to learn more.

Amazon ElastiCache for Redis – This service lets you deploy, operate, and scale an in-memory data store or cache that you can use to improve the performance of your applications. Each ElastiCache for Redis cluster publishes key performance metrics to Amazon CloudWatch. To learn more, read Caching in the Cloud with Amazon ElastiCache and Amazon ElastiCache – Now With a Dash of Redis.

Amazon Kinesis Streams – This service allows you to build applications that process or analyze streaming data such as website clickstreams, financial transactions, social media feeds, and location-tracking events. To learn more, read Amazon Kinesis – Real-Time Processing of Streaming Big Data and New: Server-Side Encryption for Amazon Kinesis Streams.

Amazon RDS for MariaDB – This service lets you set up scalable, managed MariaDB instances in minutes, and offers high performance, high availability, and a simplified security model that makes it easy for you to encrypt data at rest and in transit. Read Amazon RDS Update – MariaDB is Now Available to learn more.

Amazon RDS SQL Server – This service lets you set up scalable, managed Microsoft SQL Server instances in minutes, and also offers high performance, high availability, and a simplified security model. To learn more, read Amazon RDS for SQL Server and .NET support for AWS Elastic Beanstalk and Amazon RDS for Microsoft SQL Server – Transparent Data Encryption (TDE) to learn more.

Amazon Route 53 – This is a highly available Domain Name Server. It translates names like www.example.com into IP addresses. To learn more, read Moving Ahead with Amazon Route 53.

AWS Batch – This service lets you run large-scale batch computing jobs on AWS. You don’t need to install or maintain specialized batch software or build your own server clusters. Read AWS Batch – Run Batch Computing Jobs on AWS to learn more.

AWS CloudHSM – A cloud-based Hardware Security Module (HSM) for key storage and management at cloud scale. Designed for sensitive workloads, CloudHSM lets you manage your own keys using FIPS 140-2 Level 3 validated HSMs. To learn more, read AWS CloudHSM – Secure Key Storage and Cryptographic Operations and AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads.

AWS Key Management Service – This service makes it easy for you to create and control the encryption keys used to encrypt your data. It uses HSMs to protect your keys, and is integrated with AWS CloudTrail in order to provide you with a log of all key usage. Read New AWS Key Management Service (KMS) to learn more.

AWS Lambda – This service lets you run event-driven application or backend code without thinking about or managing servers. To learn more, read AWS Lambda – Run Code in the Cloud, AWS Lambda – A Look Back at 2016, and AWS Lambda – In Full Production with New Features for Mobile Devs.

[email protected] – You can use this new feature of AWS Lambda to run Node.js functions across the global network of AWS locations without having to provision or manager servers, in order to deliver rich, personalized content to your users with low latency. Read [email protected] – Intelligent Processing of HTTP Requests at the Edge to learn more.

AWS Snowball Edge – This is a data transfer device with 100 terabytes of on-board storage as well as compute capabilities. You can use it to move large amounts of data into or out of AWS, as a temporary storage tier, or to support workloads in remote or offline locations. To learn more, read AWS Snowball Edge – More Storage, Local Endpoints, Lambda Functions.

AWS Snowmobile – This is an exabyte-scale data transfer service. Pulled by a semi-trailer truck, each Snowmobile packs 100 petabytes of storage into a ruggedized 45-foot long shipping container. Read AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks to learn more (and to see some of my finest LEGO work).

AWS Storage Gateway – This hybrid storage service lets your on-premises applications use AWS cloud storage (Amazon Simple Storage Service (S3), Amazon Glacier, and Amazon Elastic File System) in a simple and seamless way, with storage for volumes, files, and virtual tapes. To learn more, read The AWS Storage Gateway – Integrate Your Existing On-Premises Applications with AWS Cloud Storage and File Interface to AWS Storage Gateway.

And there you go! Check out my earlier post for a list of resources that will help you to build applications that comply with HIPAA and HITECH.

Jeff;

 

AWS Summit New York – Summary of Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS GlueRandall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed, As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with Amazon Glue, as does Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks,and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with CloudWatch CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Jeff;

 

AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-cloudhsm-update-cost-effective-hardware-key-management/

Our customers run an incredible variety of mission-critical workloads on AWS, many of which process and store sensitive data. As detailed in our Overview of Security Processes document, AWS customers have access to an ever-growing set of options for encrypting and protecting this data. For example, Amazon Relational Database Service (RDS) supports encryption of data at rest and in transit, with options tailored for each supported database engine (MySQL, SQL Server, Oracle, MariaDB, PostgreSQL, and Aurora).

Many customers use AWS Key Management Service (KMS) to centralize their key management, with others taking advantage of the hardware-based key management, encryption, and decryption provided by AWS CloudHSM to meet stringent security and compliance requirements for their most sensitive data and regulated workloads (you can read my post, AWS CloudHSM – Secure Key Storage and Cryptographic Operations, to learn more about Hardware Security Modules, also known as HSMs).

Major CloudHSM Update
Today, building on what we have learned from our first-generation product, we are making a major update to CloudHSM, with a set of improvements designed to make the benefits of hardware-based key management available to a much wider audience while reducing the need for specialized operating expertise. Here’s a summary of the improvements:

Pay As You Go – CloudHSM is now offered under a pay-as-you-go model that is simpler and more cost-effective, with no up-front fees.

Fully Managed – CloudHSM is now a scalable managed service; provisioning, patching, high availability, and backups are all built-in and taken care of for you. Scheduled backups extract an encrypted image of your HSM from the hardware (using keys that only the HSM hardware itself knows) that can be restored only to identical HSM hardware owned by AWS. For durability, those backups are stored in Amazon Simple Storage Service (S3), and for an additional layer of security, encrypted again with server-side S3 encryption using an AWS KMS master key.

Open & Compatible  – CloudHSM is open and standards-compliant, with support for multiple APIs, programming languages, and cryptography extensions such as PKCS #11, Java Cryptography Extension (JCE), and Microsoft CryptoNG (CNG). The open nature of CloudHSM gives you more control and simplifies the process of moving keys (in encrypted form) from one CloudHSM to another, and also allows migration to and from other commercially available HSMs.

More Secure – CloudHSM Classic (the original model) supports the generation and use of keys that comply with FIPS 140-2 Level 2. We’re stepping that up a notch today with support for FIPS 140-2 Level 3, with security mechanisms that are designed to detect and respond to physical attempts to access or modify the HSM. Your keys are protected with exclusive, single-tenant access to tamper-resistant HSMs that appear within your Virtual Private Clouds (VPCs). CloudHSM supports quorum authentication for critical administrative and key management functions. This feature allows you to define a list of N possible identities that can access the functions, and then require at least M of them to authorize the action. It also supports multi-factor authentication using tokens that you provide.

AWS-Native – The updated CloudHSM is an integral part of AWS and plays well with other tools and services. You can create and manage a cluster of HSMs using the AWS Management Console, AWS Command Line Interface (CLI), or API calls.

Diving In
You can create CloudHSM clusters that contain 1 to 32 HSMs, each in a separate Availability Zone in a particular AWS Region. Spreading HSMs across AZs gives you high availability (including built-in load balancing); adding more HSMs gives you additional throughput. The HSMs within a cluster are kept in sync: performing a task or operation on one HSM in a cluster automatically updates the others. Each HSM in a cluster has its own Elastic Network Interface (ENI).

All interaction with an HSM takes place via the AWS CloudHSM client. It runs on an EC2 instance and uses certificate-based mutual authentication to create secure (TLS) connections to the HSMs.

At the hardware level, each HSM includes hardware-enforced isolation of crypto operations and key storage. Each customer HSM runs on dedicated processor cores.

Setting Up a Cluster
Let’s set up a cluster using the CloudHSM Console:

I click on Create cluster to get started, select my desired VPC and the subnets within it (I can also create a new VPC and/or subnets if needed):

Then I review my settings and click on Create:

After a few minutes, my cluster exists, but is uninitialized:

Initialization simply means retrieving a certificate signing request (the Cluster CSR):

And then creating a private key and using it to sign the request (these commands were copied from the Initialize Cluster docs and I have omitted the output. Note that ID identifies the cluster):

$ openssl genrsa -out CustomerRoot.key 2048
$ openssl req -new -x509 -days 365 -key CustomerRoot.key -out CustomerRoot.crt
$ openssl x509 -req -days 365 -in ID_ClusterCsr.csr   \
                              -CA CustomerRoot.crt    \
                              -CAkey CustomerRoot.key \
                              -CAcreateserial         \
                              -out ID_CustomerHsmCertificate.crt

The next step is to apply the signed certificate to the cluster using the console or the CLI. After this has been done, the cluster can be activated by changing the password for the HSM’s administrative user, otherwise known as the Crypto Officer (CO).

Once the cluster has been created, initialized and activated, it can be used to protect data. Applications can use the APIs in AWS CloudHSM SDKs to manage keys, encrypt & decrypt objects, and more. The SDKs provide access to the CloudHSM client (running on the same instance as the application). The client, in turn, connects to the cluster across an encrypted connection.

Available Today
The new HSM is available today in the US East (Northern Virginia), US West (Oregon), US East (Ohio), and EU (Ireland) Regions, with more in the works. Pricing starts at $1.45 per HSM per hour.

Jeff;

In Case You Missed These: AWS Security Blog Posts from January, February, and March

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/in-case-you-missed-these-aws-security-blog-posts-from-january-february-and-march/

Image of lock and key

In case you missed any AWS Security Blog posts published so far in 2017, they are summarized and linked to below. The posts are shown in reverse chronological order (most recent first), and the subject matter ranges from protecting dynamic web applications against DDoS attacks to monitoring AWS account configuration changes and API calls to Amazon EC2 security groups.

March

March 22: How to Help Protect Dynamic Web Applications Against DDoS Attacks by Using Amazon CloudFront and Amazon Route 53
Using a content delivery network (CDN) such as Amazon CloudFront to cache and serve static text and images or downloadable objects such as media files and documents is a common strategy to improve webpage load times, reduce network bandwidth costs, lessen the load on web servers, and mitigate distributed denial of service (DDoS) attacks. AWS WAF is a web application firewall that can be deployed on CloudFront to help protect your application against DDoS attacks by giving you control over which traffic to allow or block by defining security rules. When users access your application, the Domain Name System (DNS) translates human-readable domain names (for example, www.example.com) to machine-readable IP addresses (for example, 192.0.2.44). A DNS service, such as Amazon Route 53, can effectively connect users’ requests to a CloudFront distribution that proxies requests for dynamic content to the infrastructure hosting your application’s endpoints. In this blog post, I show you how to deploy CloudFront with AWS WAF and Route 53 to help protect dynamic web applications (with dynamic content such as a response to user input) against DDoS attacks. The steps shown in this post are key to implementing the overall approach described in AWS Best Practices for DDoS Resiliency and enable the built-in, managed DDoS protection service, AWS Shield.

March 21: New AWS Encryption SDK for Python Simplifies Multiple Master Key Encryption
The AWS Cryptography team is happy to announce a Python implementation of the AWS Encryption SDK. This new SDK helps manage data keys for you, and it simplifies the process of encrypting data under multiple master keys. As a result, this new SDK allows you to focus on the code that drives your business forward. It also provides a framework you can easily extend to ensure that you have a cryptographic library that is configured to match and enforce your standards. The SDK also includes ready-to-use examples. If you are a Java developer, you can refer to this blog post to see specific Java examples for the SDK. In this blog post, I show you how you can use the AWS Encryption SDK to simplify the process of encrypting data and how to protect your encryption keys in ways that help improve application availability by not tying you to a single region or key management solution.

March 21: Updated CJIS Workbook Now Available by Request
The need for guidance when implementing Criminal Justice Information Services (CJIS)–compliant solutions has become of paramount importance as more law enforcement customers and technology partners move to store and process criminal justice data in the cloud. AWS services allow these customers to easily and securely architect a CJIS-compliant solution when handling criminal justice data, creating a durable, cost-effective, and secure IT infrastructure that better supports local, state, and federal law enforcement in carrying out their public safety missions. AWS has created several documents (collectively referred to as the CJIS Workbook) to assist you in aligning with the FBI’s CJIS Security Policy. You can use the workbook as a framework for developing CJIS-compliant architecture in the AWS Cloud. The workbook helps you define and test the controls you operate, and document the dependence on the controls that AWS operates (compute, storage, database, networking, regions, Availability Zones, and edge locations).

March 9: New Cloud Directory API Makes It Easier to Query Data Along Multiple Dimensions
Today, we made available a new Cloud Directory API, ListObjectParentPaths, that enables you to retrieve all available parent paths for any directory object across multiple hierarchies. Use this API when you want to fetch all parent objects for a specific child object. The order of the paths and objects returned is consistent across iterative calls to the API, unless objects are moved or deleted. In case an object has multiple parents, the API allows you to control the number of paths returned by using a paginated call pattern. In this blog post, I use an example directory to demonstrate how this new API enables you to retrieve data across multiple dimensions to implement powerful applications quickly.

March 8: How to Access the AWS Management Console Using AWS Microsoft AD and Your On-Premises Credentials
AWS Directory Service for Microsoft Active Directory, also known as AWS Microsoft AD, is a managed Microsoft Active Directory (AD) hosted in the AWS Cloud. Now, AWS Microsoft AD makes it easy for you to give your users permission to manage AWS resources by using on-premises AD administrative tools. With AWS Microsoft AD, you can grant your on-premises users permissions to resources such as the AWS Management Console instead of adding AWS Identity and Access Management (IAM) user accounts or configuring AD Federation Services (AD FS) with Security Assertion Markup Language (SAML). In this blog post, I show how to use AWS Microsoft AD to enable your on-premises AD users to sign in to the AWS Management Console with their on-premises AD user credentials to access and manage AWS resources through IAM roles.

March 7: How to Protect Your Web Application Against DDoS Attacks by Using Amazon Route 53 and an External Content Delivery Network
Distributed Denial of Service (DDoS) attacks are attempts by a malicious actor to flood a network, system, or application with more traffic, connections, or requests than it is able to handle. To protect your web application against DDoS attacks, you can use AWS Shield, a DDoS protection service that AWS provides automatically to all AWS customers at no additional charge. You can use AWS Shield in conjunction with DDoS-resilient web services such as Amazon CloudFront and Amazon Route 53 to improve your ability to defend against DDoS attacks. Learn more about architecting for DDoS resiliency by reading the AWS Best Practices for DDoS Resiliency whitepaper. You also have the option of using Route 53 with an externally hosted content delivery network (CDN). In this blog post, I show how you can help protect the zone apex (also known as the root domain) of your web application by using Route 53 to perform a secure redirect to prevent discovery of your application origin.

Image of lock and key

February

February 27: Now Generally Available – AWS Organizations: Policy-Based Management for Multiple AWS Accounts
Today, AWS Organizations moves from Preview to General Availability. You can use Organizations to centrally manage multiple AWS accounts, with the ability to create a hierarchy of organizational units (OUs). You can assign each account to an OU, define policies, and then apply those policies to an entire hierarchy, specific OUs, or specific accounts. You can invite existing AWS accounts to join your organization, and you can also create new accounts. All of these functions are available from the AWS Management Console, the AWS Command Line Interface (CLI), and through the AWS Organizations API.To read the full AWS Blog post about today’s launch, see AWS Organizations – Policy-Based Management for Multiple AWS Accounts.

February 23: s2n Is Now Handling 100 Percent of SSL Traffic for Amazon S3
Today, we’ve achieved another important milestone for securing customer data: we have replaced OpenSSL with s2n for all internal and external SSL traffic in Amazon Simple Storage Service (Amazon S3) commercial regions. This was implemented with minimal impact to customers, and multiple means of error checking were used to ensure a smooth transition, including client integration tests, catching potential interoperability conflicts, and identifying memory leaks through fuzz testing.

February 22: Easily Replace or Attach an IAM Role to an Existing EC2 Instance by Using the EC2 Console
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials. IAM roles for EC2 make it easier for your applications to make API requests securely from an instance because they do not require you to manage AWS security credentials that the applications use. Recently, we enabled you to use temporary security credentials for your applications by attaching an IAM role to an existing EC2 instance by using the AWS CLI and SDK. To learn more, see New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI. Starting today, you can attach an IAM role to an existing EC2 instance from the EC2 console. You can also use the EC2 console to replace an IAM role attached to an existing instance. In this blog post, I will show how to attach an IAM role to an existing EC2 instance from the EC2 console.

February 22: How to Audit Your AWS Resources for Security Compliance by Using Custom AWS Config Rules
AWS Config Rules enables you to implement security policies as code for your organization and evaluate configuration changes to AWS resources against these policies. You can use Config rules to audit your use of AWS resources for compliance with external compliance frameworks such as CIS AWS Foundations Benchmark and with your internal security policies related to the US Health Insurance Portability and Accountability Act (HIPAA), the Federal Risk and Authorization Management Program (FedRAMP), and other regimes. AWS provides some predefined, managed Config rules. You also can create custom Config rules based on criteria you define within an AWS Lambda function. In this post, I show how to create a custom rule that audits AWS resources for security compliance by enabling VPC Flow Logs for an Amazon Virtual Private Cloud (VPC). The custom rule meets requirement 4.3 of the CIS AWS Foundations Benchmark: “Ensure VPC flow logging is enabled in all VPCs.”

February 13: AWS Announces CISPE Membership and Compliance with First-Ever Code of Conduct for Data Protection in the Cloud
I have two exciting announcements today, both showing AWS’s continued commitment to ensuring that customers can comply with EU Data Protection requirements when using our services.

February 13: How to Enable Multi-Factor Authentication for AWS Services by Using AWS Microsoft AD and On-Premises Credentials
You can now enable multi-factor authentication (MFA) for users of AWS services such as Amazon WorkSpaces and Amazon QuickSight and their on-premises credentials by using your AWS Directory Service for Microsoft Active Directory (Enterprise Edition) directory, also known as AWS Microsoft AD. MFA adds an extra layer of protection to a user name and password (the first “factor”) by requiring users to enter an authentication code (the second factor), which has been provided by your virtual or hardware MFA solution. These factors together provide additional security by preventing access to AWS services, unless users supply a valid MFA code.

February 13: How to Create an Organizational Chart with Separate Hierarchies by Using Amazon Cloud Directory
Amazon Cloud Directory enables you to create directories for a variety of use cases, such as organizational charts, course catalogs, and device registries. Cloud Directory offers you the flexibility to create directories with hierarchies that span multiple dimensions. For example, you can create an organizational chart that you can navigate through separate hierarchies for reporting structure, location, and cost center. In this blog post, I show how to use Cloud Directory APIs to create an organizational chart with two separate hierarchies in a single directory. I also show how to navigate the hierarchies and retrieve data. I use the Java SDK for all the sample code in this post, but you can use other language SDKs or the AWS CLI.

February 10: How to Easily Log On to AWS Services by Using Your On-Premises Active Directory
AWS Directory Service for Microsoft Active Directory (Enterprise Edition), also known as Microsoft AD, now enables your users to log on with just their on-premises Active Directory (AD) user name—no domain name is required. This new domainless logon feature makes it easier to set up connections to your on-premises AD for use with applications such as Amazon WorkSpaces and Amazon QuickSight, and it keeps the user logon experience free from network naming. This new interforest trusts capability is now available when using Microsoft AD with Amazon WorkSpaces and Amazon QuickSight Enterprise Edition. In this blog post, I explain how Microsoft AD domainless logon works with AD interforest trusts, and I show an example of setting up Amazon WorkSpaces to use this capability.

February 9: New! Attach an AWS IAM Role to an Existing Amazon EC2 Instance by Using the AWS CLI
AWS Identity and Access Management (IAM) roles enable your applications running on Amazon EC2 to use temporary security credentials that AWS creates, distributes, and rotates automatically. Using temporary credentials is an IAM best practice because you do not need to maintain long-term keys on your instance. Using IAM roles for EC2 also eliminates the need to use long-term AWS access keys that you have to manage manually or programmatically. Starting today, you can enable your applications to use temporary security credentials provided by AWS by attaching an IAM role to an existing EC2 instance. You can also replace the IAM role attached to an existing EC2 instance. In this blog post, I show how you can attach an IAM role to an existing EC2 instance by using the AWS CLI.

February 8: How to Remediate Amazon Inspector Security Findings Automatically
The Amazon Inspector security assessment service can evaluate the operating environments and applications you have deployed on AWS for common and emerging security vulnerabilities automatically. As an AWS-built service, Amazon Inspector is designed to exchange data and interact with other core AWS services not only to identify potential security findings but also to automate addressing those findings. Previous related blog posts showed how you can deliver Amazon Inspector security findings automatically to third-party ticketing systems and automate the installation of the Amazon Inspector agent on new Amazon EC2 instances. In this post, I show how you can automatically remediate findings generated by Amazon Inspector. To get started, you must first run an assessment and publish any security findings to an Amazon Simple Notification Service (SNS) topic. Then, you create an AWS Lambda function that is triggered by those notifications. Finally, the Lambda function examines the findings and then implements the appropriate remediation based on the type of issue.

February 6: How to Simplify Security Assessment Setup Using Amazon EC2 Systems Manager and Amazon Inspector
In a July 2016 AWS Blog post, I discussed how to integrate Amazon Inspector with third-party ticketing systems by using Amazon Simple Notification Service (SNS) and AWS Lambda. This AWS Security Blog post continues in the same vein, describing how to use Amazon Inspector to automate various aspects of security management. In this post, I show you how to install the Amazon Inspector agent automatically through the Amazon EC2 Systems Manager when a new Amazon EC2 instance is launched. In a subsequent post, I will show you how to update EC2 instances automatically that run Linux when Amazon Inspector discovers a missing security patch.

Image of lock and key

January

January 30: How to Protect Data at Rest with Amazon EC2 Instance Store Encryption
Encrypting data at rest is vital for regulatory compliance to ensure that sensitive data saved on disks is not readable by any user or application without a valid key. Some compliance regulations such as PCI DSS and HIPAA require that data at rest be encrypted throughout the data lifecycle. To this end, AWS provides data-at-rest options and key management to support the encryption process. For example, you can encrypt Amazon EBS volumes and configure Amazon S3 buckets for server-side encryption (SSE) using AES-256 encryption. Additionally, Amazon RDS supports Transparent Data Encryption (TDE). Instance storage provides temporary block-level storage for Amazon EC2 instances. This storage is located on disks attached physically to a host computer. Instance storage is ideal for temporary storage of information that frequently changes, such as buffers, caches, and scratch data. By default, files stored on these disks are not encrypted. In this blog post, I show a method for encrypting data on Linux EC2 instance stores by using Linux built-in libraries. This method encrypts files transparently, which protects confidential data. As a result, applications that process the data are unaware of the disk-level encryption.

January 27: How to Detect and Automatically Remediate Unintended Permissions in Amazon S3 Object ACLs with CloudWatch Events
Amazon S3 Access Control Lists (ACLs) enable you to specify permissions that grant access to S3 buckets and objects. When S3 receives a request for an object, it verifies whether the requester has the necessary access permissions in the associated ACL. For example, you could set up an ACL for an object so that only the users in your account can access it, or you could make an object public so that it can be accessed by anyone. If the number of objects and users in your AWS account is large, ensuring that you have attached correctly configured ACLs to your objects can be a challenge. For example, what if a user were to call the PutObjectAcl API call on an object that is supposed to be private and make it public? Or, what if a user were to call the PutObject with the optional Acl parameter set to public-read, therefore uploading a confidential file as publicly readable? In this blog post, I show a solution that uses Amazon CloudWatch Events to detect PutObject and PutObjectAcl API calls in near-real time and helps ensure that the objects remain private by making automatic PutObjectAcl calls, when necessary.

January 26: Now Available: Amazon Cloud Directory—A Cloud-Native Directory for Hierarchical Data
Today we are launching Amazon Cloud Directory. This service is purpose-built for storing large amounts of strongly typed hierarchical data. With the ability to scale to hundreds of millions of objects while remaining cost-effective, Cloud Directory is a great fit for all sorts of cloud and mobile applications.

January 24: New SOC 2 Report Available: Confidentiality
As with everything at Amazon, the success of our security and compliance program is primarily measured by one thing: our customers’ success. Our customers drive our portfolio of compliance reports, attestations, and certifications that support their efforts in running a secure and compliant cloud environment. As a result of our engagement with key customers across the globe, we are happy to announce the publication of our new SOC 2 Confidentiality report. This report is available now through AWS Artifact in the AWS Management Console.

January 18: Compliance in the Cloud for New Financial Services Cybersecurity Regulations
Financial regulatory agencies are focused more than ever on ensuring responsible innovation. Consequently, if you want to achieve compliance with financial services regulations, you must be increasingly agile and employ dynamic security capabilities. AWS enables you to achieve this by providing you with the tools you need to scale your security and compliance capabilities on AWS. The following breakdown of the most recent cybersecurity regulations, NY DFS Rule 23 NYCRR 500, demonstrates how AWS continues to focus on your regulatory needs in the financial services sector.

January 9: New Amazon GameDev Blog Post: Protect Multiplayer Game Servers from DDoS Attacks by Using Amazon GameLift
In online gaming, distributed denial of service (DDoS) attacks target a game’s network layer, flooding servers with requests until performance degrades considerably. These attacks can limit a game’s availability to players and limit the player experience for those who can connect. Today’s new Amazon GameDev Blog post uses a typical game server architecture to highlight DDoS attack vulnerabilities and discusses how to stay protected by using built-in AWS Cloud security, AWS security best practices, and the security features of Amazon GameLift. Read the post to learn more.

January 6: The Top 10 Most Downloaded AWS Security and Compliance Documents in 2016
The following list includes the 10 most downloaded AWS security and compliance documents in 2016. Using this list, you can learn about what other people found most interesting about security and compliance last year.

January 6: FedRAMP Compliance Update: AWS GovCloud (US) Region Receives a JAB-Issued FedRAMP High Baseline P-ATO for Three New Services
Three new services in the AWS GovCloud (US) region have received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) under the Federal Risk and Authorization Management Program (FedRAMP). JAB issued the authorization at the High baseline, which enables US government agencies and their service providers the capability to use these services to process the government’s most sensitive unclassified data, including Personal Identifiable Information (PII), Protected Health Information (PHI), Controlled Unclassified Information (CUI), criminal justice information (CJI), and financial data.

January 4: The Top 20 Most Viewed AWS IAM Documentation Pages in 2016
The following 20 pages were the most viewed AWS Identity and Access Management (IAM) documentation pages in 2016. I have included a brief description with each link to give you a clearer idea of what each page covers. Use this list to see what other people have been viewing and perhaps to pique your own interest about a topic you’ve been meaning to research.

January 3: The Most Viewed AWS Security Blog Posts in 2016
The following 10 posts were the most viewed AWS Security Blog posts that we published during 2016. You can use this list as a guide to catch up on your blog reading or even read a post again that you found particularly useful.

January 3: How to Monitor AWS Account Configuration Changes and API Calls to Amazon EC2 Security Groups
You can use AWS security controls to detect and mitigate risks to your AWS resources. The purpose of each security control is defined by its control objective. For example, the control objective of an Amazon VPC security group is to permit only designated traffic to enter or leave a network interface. Let’s say you have an Internet-facing e-commerce website, and your security administrator has determined that only HTTP (TCP port 80) and HTTPS (TCP 443) traffic should be allowed access to the public subnet. As a result, your administrator configures a security group to meet this control objective. What if, though, someone were to inadvertently change this security group’s rules and enable FTP or other protocols to access the public subnet from any location on the Internet? That expanded access could weaken the security posture of your assets. Consequently, your administrator might need to monitor the integrity of your company’s security controls so that the controls maintain their desired effectiveness. In this blog post, I explore two methods for detecting unintended changes to VPC security groups. The two methods address not only control objectives but also control failures.

If you have questions about or issues with implementing the solutions in any of these posts, please start a new thread on the forum identified near the end of each post.

– Craig

Secure Amazon EMR with Encryption

Post Syndicated from Sai Sriparasa original https://aws.amazon.com/blogs/big-data/secure-amazon-emr-with-encryption/

In the last few years, there has been a rapid rise in enterprises adopting the Apache Hadoop ecosystem for critical workloads that process sensitive or highly confidential data. Due to the highly critical nature of the workloads, the enterprises implement certain organization/industry wide policies and certain regulatory or compliance policies. Such policy requirements are designed to protect sensitive data from unauthorized access.

A common requirement within such policies is about encrypting data at-rest and in-flight. Amazon EMR uses “security configurations” to make it easy to specify the encryption keys and certificates, ranging from AWS Key Management Service to supplying your own custom encryption materials provider.

You create a security configuration that specifies encryption settings and then use the configuration when you create a cluster. This makes it easy to build the security configuration one time and use it for any number of clusters.

o_Amazon_EMR_Encryption_1

In this post, I go through the process of setting up the encryption of data at multiple levels using security configurations with EMR. Before I dive deep into encryption, here are the different phases where data needs to be encrypted.

Data at rest

  • Data residing on Amazon S3—S3 client-side encryption with EMR
  • Data residing on disk—the Amazon EC2 instance store volumes (except boot volumes) and the attached Amazon EBS volumes of cluster instances are encrypted using Linux Unified Key System (LUKS)

Data in transit

  • Data in transit from EMR to S3, or vice versa—S3 client side encryption with EMR
  • Data in transit between nodes in a cluster—in-transit encryption via Secure Sockets Layer (SSL) for MapReduce and Simple Authentication and Security Layer (SASL) for Spark shuffle encryption
  • Data being spilled to disk or cached during a shuffle phase—Spark shuffle encryption or LUKS encryption

Encryption walkthrough

For this post, you create a security configuration that implements encryption in transit and at rest. To achieve this, you create the following resources:

  • KMS keys for LUKS encryption and S3 client-side encryption for data exiting EMR to S3
  • SSL certificates to be used for MapReduce shuffle encryption
  • The environment into which the EMR cluster is launched. For this post, you launch EMR in private subnets and set up an S3 VPC endpoint to get the data from S3.
  • An EMR security configuration

All of the scripts and code snippets used for this walkthrough are available on the aws-blog-emrencryption GitHub repo.

Generate KMS keys

For this walkthrough, you use AWS KMS, a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data and disks.

You generate two KMS master keys, one for S3 client-side encryption to encrypt data going out of EMR and the other for LUKS encryption to encrypt the local disks. The Hadoop MapReduce framework uses HDFS. Spark uses the local file system on each slave instance for intermediate data throughout a workload, where data could be spilled to disk when it overflows memory.

To generate the keys, use the kms.json AWS CloudFormation script.  As part of this script, provide an alias name, or display name, for the keys. An alias must be in the “alias/aliasname” format, and can only contain alphanumeric characters, an underscore, or a dash.

o_Amazon_EMR_Encryption_2

After you finish generating the keys, the ARNs are available as part of the outputs.

o_Amazon_EMR_Encryption_3

Generate SSL certificates

The SSL certificates allow the encryption of the MapReduce shuffle using HTTPS while the data is in transit between nodes.

o_Amazon_EMR_Encryption_4

For this walkthrough, use OpenSSL to generate a self-signed X.509 certificate with a 2048-bit RSA private key that allows access to the issuer’s EMR cluster instances. This prompts you to provide subject information to generate the certificates.

Use the cert-create.sh script to generate SSL certificates that are compressed into a zip file. Upload the zipped certificates to S3 and keep a note of the S3 prefix. You use this S3 prefix when you build your security configuration.

Important

This example is a proof-of-concept demonstration only. Using self-signed certificates is not recommended and presents a potential security risk. For production systems, use a trusted certification authority (CA) to issue certificates.

To implement certificates from custom providers, use the TLSArtifacts provider interface.

Build the environment

For this walkthrough, launch an EMR cluster into a private subnet. If you already have a VPC and would like to launch this cluster into a public subnet, skip this section and jump to the Create a Security Configuration section.

To launch the cluster into a private subnet, the environment must include the following resources:

  • VPC
  • Private subnet
  • Public subnet
  • Bastion
  • Managed NAT gateway
  • S3 VPC endpoint

As the EMR cluster is launched into a private subnet, you need a bastion or a jump server to SSH onto the cluster. After the cluster is running, you need access to the Internet to request the data keys from KMS. Private subnets do not have access to the Internet directly, so route this traffic via the managed NAT gateway. Use an S3 VPC endpoint to provide a highly reliable and a secure connection to S3.

o_Amazon_EMR_Encryption_5

In the CloudFormation console, create a new stack for this environment and use the environment.json CloudFormation template to deploy it.

As part of the parameters, pick an instance family for the bastion and an EC2 key pair to be used to SSH onto the bastion. Provide an appropriate stack name and add the appropriate tags. For example, the following screenshot is the review step for a stack that I created.

o_Amazon_EMR_Encryption_6

After creating the environment stack, look at the Output tab and make a note of the VPC ID, bastion, and private subnet IDs, as you will use them when you launch the EMR cluster resources.

o_Amazon_EMR_Encryption_7

Create a security configuration

The final step before launching the secure EMR cluster is to create a security configuration. For this walkthrough, create a security configuration with S3 client-side encryption using EMR, and LUKS encryption for local volumes using the KMS keys created earlier. You also use the SSL certificates generated and uploaded to S3 earlier for encrypting the MapReduce shuffle.

o_Amazon_EMR_Encryption_8

Launch an EMR cluster

Now, you can launch an EMR cluster in the private subnet. First, verify that the service role being used for EMR has access to the AmazonElasticMapReduceRole managed service policy. The default service role is EMR_DefaultRole. For more information, see Configuring User Permissions Using IAM Roles.

From the Build an environment section, you have the VPC ID and the subnet ID for the private subnet into which the EMR cluster should be launched. Select those values for the Network and EC2 Subnet fields. In the next step, provide a name and tags for the cluster.

o_Amazon_EMR_Encryption_9

The last step is to select the private key, assign the security configuration that was created in the Create a security configuration section, and choose Create Cluster.

o_Amazon_EMR_Encryption_10

Now that you have the environment and the cluster up and running, you can get onto the master node to run scripts. You need the IP address, which you can retrieve from the EMR console page. Choose Hardware, Master Instance group and note the private IP address of the master node.

o_Amazon_EMR_Encryption_11

As the master node is in a private subnet, SSH onto the bastion instance first and then jump from the bastion instance to the master node. For information about how to SSH onto the bastion and then to the Hadoop master, open the ssh-commands.txt file. For more information about how to get onto the bastion, see the Securely Connect to Linux Instances Running in a Private Amazon VPC post.

After you are on the master node, bring your own Hive or Spark scripts. For testing purposes, the GitHub /code directory includes the test.py PySpark and test.q Hive scripts.

Summary

As part of this post, I’ve identified the different phases where data needs to be encrypted and walked through how data in each phase can be encrypted. Then, I described a step-by-step process to achieve all the encryption prerequisites, such as building the KMS keys, building SSL certificates, and launching the EMR cluster with a strong security configuration. As part of this walkthrough, you also secured the data by launching your cluster in a private subnet within a VPC, and used a bastion instance for access to the EMR cluster.

If you have questions or suggestions, please comment below.


About the Author

sai_90Sai Sriparasa is a Big Data Consultant for AWS Professional Services. He works with our customers to provide strategic & tactical big data solutions with an emphasis on automation, operations & security on AWS. In his spare time, he follows sports and current affairs.

 

 

 


Related

Implementing Authorization and Auditing using Apache Ranger on Amazon EMR

EMRRanger_1

 

 

 

 

 

 

 

How to Protect Data at Rest with Amazon EC2 Instance Store Encryption

Post Syndicated from Assaf Namer original https://aws.amazon.com/blogs/security/how-to-protect-data-at-rest-with-amazon-ec2-instance-store-encryption/

Encrypting data at rest is vital for regulatory compliance to ensure that sensitive data saved on disks is not readable by any user or application without a valid key. Some compliance regulations such as PCI DSS and HIPAA require that data at rest be encrypted throughout the data lifecycle. To this end, AWS provides data-at-rest options and key management to support the encryption process. For example, you can encrypt Amazon EBS volumes and configure Amazon S3 buckets for server-side encryption (SSE) using AES-256 encryption. Additionally, Amazon RDS supports Transparent Data Encryption (TDE).

Instance storage provides temporary block-level storage for Amazon EC2 instances. This storage is located on disks attached physically to a host computer. Instance storage is ideal for temporary storage of information that frequently changes, such as buffers, caches, and scratch data. By default, files stored on these disks are not encrypted.

In this blog post, I show a method for encrypting data on Linux EC2 instance stores by using Linux built-in libraries. This method encrypts files transparently, which protects confidential data. As a result, applications that process the data are unaware of the disk-level encryption.

First, though, I will provide some background information required for this solution.

Disk and file system encryption

You can use two methods to encrypt files on instance stores. The first method is disk encryption, in which the entire disk or block within the disk is encrypted by using one or more encryption keys. Disk encryption operates below the file-system level, is operating-system agnostic, and hides directory and file information such as name and size. Encrypting File System, for example, is a Microsoft extension to the Windows NT operating system’s New Technology File System (NTFS) that provides disk encryption.

The second method is file-system-level encryption. Files and directories are encrypted, but not the entire disk or partition. File-system-level encryption operates on top of the file system and is portable across operating systems.

The Linux dm-crypt Infrastructure

Dm-crypt is a Linux kernel-level encryption mechanism that allows users to mount an encrypted file system. Mounting a file system is the process in which a file system is attached to a directory (mount point), making it available to the operating system. After mounting, all files in the file system are available to applications without any additional interaction; however, these files are encrypted when stored on disk.

Device mapper is an infrastructure in the Linux 2.6 and 3.x kernel that provides a generic way to create virtual layers of block devices. The device mapper crypt target provides transparent encryption of block devices using the kernel crypto API. The solution in this post uses dm-crypt in conjunction with a disk-backed file system mapped to a logical volume by the Logical Volume Manager (LVM). LVM provides logical volume management for the Linux kernel.

The following diagram depicts the relationship between an application, file system, and dm-crypt. Dm-crypt sits between the physical disk and the file system, and data written from the operating system to the disk is encrypted. The application is unaware of such disk-level encryption. Applications use a specific mount point in order to store and retrieve files, and these files are encrypted when stored to disk. If the disk is lost or stolen, the data on the disk is useless.

Overview of the solution

In this post, I create a new file system called secretfs. This file system is encrypted using dm-crypt. This example uses LVM and Linux Unified Key Setup (LUKS) to encrypt a file system. The encrypted file system sits on the EC2 instance store disk. Note that the internal store file system is not encrypted but rather a newly created file system.

The following diagram shows how the newly encrypted file system resides in the EC2 internal store disk. Applications that need to save sensitive data temporarily will use the secretfs mount point (‘/mnt/secretfs’) directory to store temporary or scratch files.

Requirements

This solution has three requirements for the solution to work. First, you need to configure the related items on boot using EC2 launch configuration because the encrypted file system is created at boot time. An administrator should have full control over every step and should be able to grant and revoke the encrypted file system creation or access to keys. Second, you must enable logging for every encryption or decryption request by using AWS CloudTrail. In particular, logging is critical when the keys are created and when an EC2 instance requests password decryption to unlock an encrypted file system. Lastly, you should integrate the solution with other AWS services, as described in the next section.

AWS services used in this solution

I use the following AWS services in this solution:

  • AWS Key Management Service (KMS) – AWS KMS is a managed service that enables easy creation and control of encryption keys used to encrypt data. KMS uses envelope encryption in which data is encrypted using a data key that is then encrypted using a master key. Master keys can also be used to encrypt and decrypt up to 4 kilobytes of data. In our solution, I use KMS encrypt/decrypt APIs to encrypt the encrypted file system’s password. See more information about envelope encryption.
  • AWS CloudTrail – CloudTrail records AWS API calls for your account. KMS and CloudTrail are fully integrated, which means CloudTrail logs each request to and from KMS for future auditing. This post’s solution enables CloudTrail for monitoring and audit.
  • Amazon S3 – S3 is an AWS storage I use S3 in this post to save the encrypted file system password.
  • AWS Identity and Access Management (IAM) – AWS IAM enables you to control access securely to AWS services. In this post, I configure and attach a policy to EC2 instances that allows access to the S3 bucket to read the encrypted password file and to KMS to decrypt the file system password.

Architectural overview

The following diagram illustrates the steps in the process of encrypting the EC2 instance store.

Diagram illustrating the steps in the process of encrypting the EC2 instance store

In this architectural diagram:

  1. The administrator encrypts a secret password by using KMS. The encrypted password is stored in a file.
  2. The administrator puts the file containing the encrypted password in an S3 bucket.
  3. At instance boot time, the instance copies the encrypted file to an internal disk.
  4. The EC2 instance then decrypts the file using KMS and retrieves the plaintext password. The password is used to configure the Linux encrypted file system with LUKS. All data written to the encrypted file system is encrypted by using an AES-128 encryption algorithm when stored on disk.

Implementing the solution

Create an S3 bucket

First, you create a bucket to store the encrypted password file. This file contains the password (key) used to encrypt the file system. Each EC2 instance upon boot copies the encrypted password file, decrypts the file, and retrieves the plaintext password, which is used to encrypt the file system on the instance store disk.

In this step, you create the S3 bucket that stores the encrypted password file, and apply the necessary permissions. If you are using an Amazon VPC endpoint for Amazon S3, you also need to add permissions to the bucket to allow access from the endpoint. (For a detailed example, see Example Bucket Policies for VPC Endpoints for Amazon S3.)

To create a new bucket:

  1. Sign in to the S3 console and choose Create Bucket.
  2. In the Bucket Name box, type your bucket name and then choose Create.
  3. You should see the details about your new bucket in the right pane.

Configure IAM roles and permission for the S3 bucket

When an EC2 instance boots, it must read the encrypted password file from S3 and then decrypt the password using KMS. In this section, I configure an IAM policy that allows the EC2 instance to assume a role with the right access permissions to the S3 bucket. The following policy grants the correct access permissions, in which your-bucket-name is the S3 bucket that stores the encrypted password file.

To create and configure the IAM policy:

  1. Sign in to the AWS Management Console and navigate to the IAM console. In the navigation pane, choose Policies, choose Create Policy, select Create Your Own Policy, name and describe the policy, and paste the following policy. Choose Create Policy. For more details, see Creating Customer Managed Policies.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Stmt1478729875000",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject"
                ],
                "Resource": [
                    "arn:aws:s3:::<your-bucket-name>/LuksInternalStorageKey"
                ]
            }
        ]
    }

    The preceding policy grants read access to the bucket where the encrypted password is stored. This policy is used by the EC2 instance, which requires you to configure an IAM role. You will configure KMS permissions later in this post.

  1. In the IAM console, choose Roles, and then choose Create New Role.
  2. In Step 1: Role Name, type your role name, and choose Next Step.
  3. In Step 2: Select Role Type, choose Amazon EC2 and choose Next Step.
  4. In Step 3: Established Trust, choose Next Step.
  5. In Step 4: Attach Policy, choose the policy you created in Step 1, as shown in the following screenshot.Screenshot of choosing the policy
  1. In Step 5: Review, review the configuration and complete the steps. The newly created IAM role is now ready. You will use it when launching new EC2 instances, which will have the permission to access the encrypted password file in the S3 bucket.

You now should have a new IAM role listed on the Roles page. Choose Roles to list all roles in your account and then select the role you just created as shown in the following screenshot.

Screenshot of selecting the role you just created

Encrypt a secret password with KMS and store it in the S3 bucket

Next, you use KMS to encrypt a secret password. To encrypt text by using KMS, you must use AWS CLI. AWS CLI is installed by default on EC2 Amazon Linux instances and you can install it on Linux, Windows, or Mac computers.

To encrypt a secret password with KMS and store it in the S3 bucket:

  • From the AWS CLI, type the following command to encrypt a secret password by using KMS (replace the region name with your region). You must have the right permissions in order to create keys and put objects in S3 (for more details, see Using IAM Policies with AWS KMS). In this example, I have used AWS CLI on the Linux OS to encrypt and generate the encrypted password file.
aws --region us-east-1 kms encrypt --key-id 'alias/EncFSForEC2InternalStorageKey' --plaintext "ThisIs-a-SecretPassword" --query CiphertextBlob --output text | base64 --decode > LuksInternalStorageKey

aws s3 cp LuksInternalStorageKey s3://<bucket-name>/LuksInternalStorageKey

The preceding commands encrypt the password (Base64 is used to decode the cipher text). The command outputs the results to a file called LuksInternalStorageKey. It also creates a key alias (key name) that makes it easy to identify different keys; the alias is called EncFSForEC2InternalStorageKey. The file is then copied to the S3 bucket I created earlier in this post.

Configure permissions to allow the role to access the KMS key

Next, you grant the role access to the key you just created with KMS:

  1. From the IAM console, choose Encryption keys from the navigation pane.
  1. Select EncFSForEC2InternalStorageKey (this is the key alias you configured in the previous section). To add a new role that can use the key, scroll down to the Key Policy and then choose Add under Key Users.Screenshot of adding a new role that can use the KMS key
  1. Choose the new role you created earlier in this post and then choose Attach.
  1. The role now has permission to use the key.

Configure EC2 with role and launch configurations

In this section, you launch a new EC2 instance with the new IAM role and a bootstrap script that executes the steps to encrypt the file system, as described earlier in the “Architectural overview” section:

  1. In the EC2 console, launch a new instance (see this tutorial for more details). In Step 3: Configure Instance Details, choose the IAM role you configured earlier, as shown in the following screenshot.Screenshot of configuring EC2 instance details
  1. Expand the Advanced Details section (see previous screenshot) and paste the following script in the EC2 instance’s User data Keep the As text check box selected. The script will be executed at EC2 boot time.
    #!/bin/bash
    
    ## Initial setup to be executed on boot
    ##====================================
    
    # Create an empty file. This file will be used to host the file system.
    # In this example we create a 2 GB file called secretfs (Secret File System).
    dd of=secretfs bs=1G count=0 seek=2
    # Lock down normal access to the file.
    chmod 600 secretfs
    # Associate a loopback device with the file.
    losetup /dev/loop0 secretfs
    #Copy encrypted password file from S3. The password is used to configure LUKE later on.
    aws s3 cp s3://an-internalstoragekeybucket/LuksInternalStorageKey .
    # Decrypt the password from the file with KMS, save the secret password in LuksClearTextKey
    LuksClearTextKey=$(aws --region us-east-1 kms decrypt --ciphertext-blob fileb://LuksInternalStorageKey --output text --query Plaintext | base64 --decode)
    # Encrypt storage in the device. cryptsetup will use the Linux
    # device mapper to create, in this case, /dev/mapper/secretfs.
    # Initialize the volume and set an initial key.
    echo "$LuksClearTextKey" | cryptsetup -y luksFormat /dev/loop0
    # Open the partition, and create a mapping to /dev/mapper/secretfs.
    echo "$LuksClearTextKey" | cryptsetup luksOpen /dev/loop0 secretfs
    # Clear the LuksClearTextKey variable because we don't need it anymore.
    unset LuksClearTextKey
    # Check its status (optional).
    cryptsetup status secretfs
    # Zero out the new encrypted device.
    dd if=/dev/zero of=/dev/mapper/secretfs
    # Create a file system and verify its status.
    mke2fs -j -O dir_index /dev/mapper/secretfs
    # List file system configuration (optional).
    tune2fs -l /dev/mapper/secretfs
    # Mount the new file system to /mnt/secretfs.
    mkdir /mnt/secretfs
    mount /dev/mapper/secretfs /mnt/secretfs

  2. If you have not enabled it already, be sure to enable CloudTrail on your account. Using CloudTrail, you will be able to monitor and audit access to the KMS key.
  3. Launch the EC2 instance, which copies the password file from S3, decrypts the file using KMS, and configures an encrypted file system. The file system is mounted on /mnt/secretfs. Therefore, every file written to this mount point is encrypted when stored to disk. Applications that process sensitive data and need temporary storage should use the encrypted file system by writing and reading files from the mount point, ‘/mnt/secretfs’. The rest of the file system (for example, /home/ec2-user) is not encrypted.

You can list the encrypted file system’s status. First, SSH to the EC2 instance using the key pair you used to launch the EC2 instance. (For more information about logging in to an EC2 instance using a key pair, see Getting Started with Amazon EC2 Linux Instances.) Then, run the following command as root.

[[email protected] ec2-user]# cryptsetup status secretfs
/dev/mapper/secretfs is active and is in use.
    type:    LUKS1
    cipher:  aes-xts-plain64
    keysize: 256 bits
    device:  /dev/loop0
    loop:    /secretfs
    offset:  4096 sectors
    size:    4190208 sectors
    mode:    read/write

As the command’s results should show, the file system is encrypted with AES-256 using XTS mode. XTS is a configuration method that allows ciphers to work with large data streams, without the risk of compromising the provided security.

Conclusion

This blog post shows you how to encrypt a file system on EC2 instance storage by using built-in Linux libraries and drivers with LVM and LUKS, in conjunction with AWS services such as S3 and KMS. If your applications need temporary storage, you can use an EC2 internal disk that is physically attached to the host computer. The data on instance stores persists only during the lifetime of its associated instance. However, instance store volumes are not encrypted. This post provides a simple solution that balances between the speed and availability of instance stores and the need for encryption at rest when dealing with sensitive data.

If you have comments about this blog post, submit them in the “Comments” section below. If you have implementation questions about the solution in this post, please start a new thread on the EC2 forum.

– Assaf