Tag Archives: AWS KMS

Podcast: How AWS KMS could help customers meet encryption and deletion requirements, including GDPR

Post Syndicated from Katie Doptis original https://aws.amazon.com/blogs/security/podcast-how-aws-kms-could-help-customers-meet-encryption-and-deletion-requirements-including-gdpr/

Encryption is a powerful tool to protect your data but it can be difficult to get right because it demands understanding how encryption keys are created, distributed, used, and managed. To make encryption easier to use, we created AWS Key Management Service (KMS) to let you scale your use of the cloud without struggling to ensure encryption is used consistently across workloads.

Because AWS KMS makes it easy for you to create and control the encryption keys used to encrypt your data, the service can be used to meet both encryption and deletion requirements in a data lifecycle management policy. Cryptographic deletion is the idea is that you can delete a relatively small number of keys to make a large amount of encrypted data irretrievable. This concept is being widely discussed as an option for organizations facing data deletion requirements, such as those in the EU’s General Data Protection Regulation (GDPR).

Listen to the podcast and hear from Ken Beer, general manager of AWS KMS, about best practices related to encryption, key management, and cryptographic deletion. He also covers the advantages of KMS over on-premises systems and how the service has been designed so that even AWS operators can’t access customer keys.

Now You Can Create Encrypted Amazon EBS Volumes by Using Your Custom Encryption Keys When You Launch an Amazon EC2 Instance

Post Syndicated from Nishit Nagar original https://aws.amazon.com/blogs/security/create-encrypted-amazon-ebs-volumes-custom-encryption-keys-launch-amazon-ec2-instance-2/

Amazon Elastic Block Store (EBS) offers an encryption solution for your Amazon EBS volumes so you don’t have to build, maintain, and secure your own infrastructure for managing encryption keys for block storage. Amazon EBS encryption uses AWS Key Management Service (AWS KMS) customer master keys (CMKs) when creating encrypted Amazon EBS volumes, providing you all the benefits associated with using AWS KMS. You can specify either an AWS managed CMK or a customer-managed CMK to encrypt your Amazon EBS volume. If you use a customer-managed CMK, you retain granular control over your encryption keys, such as having AWS KMS rotate your CMK every year. To learn more about creating CMKs, see Creating Keys.

In this post, we demonstrate how to create an encrypted Amazon EBS volume using a customer-managed CMK when you launch an EC2 instance from the EC2 console, AWS CLI, and AWS SDK.

Creating an encrypted Amazon EBS volume from the EC2 console

Follow these steps to launch an EC2 instance from the EC2 console with Amazon EBS volumes that are encrypted by customer-managed CMKs:

  1. Sign in to the AWS Management Console and open the EC2 console.
  2. Select Launch instance, and then, in Step 1 of the wizard, select an Amazon Machine Image (AMI).
  3. In Step 2 of the wizard, select an instance type, and then provide additional configuration details in Step 3. For details about configuring your instances, see Launching an Instance.
  4. In Step 4 of the wizard, specify additional EBS volumes that you want to attach to your instances.
  5. To create an encrypted Amazon EBS volume, first add a new volume by selecting Add new volume. Leave the Snapshot column blank.
  6. In the Encrypted column, select your CMK from the drop-down menu. You can also paste the full Amazon Resource Name (ARN) of your custom CMK key ID in this box. To learn more about finding the ARN of a CMK, see Working with Keys.
  7. Select Review and Launch. Your instance will launch with an additional Amazon EBS volume with the key that you selected. To learn more about the launch wizard, see Launching an Instance with Launch Wizard.

Creating Amazon EBS encrypted volumes from the AWS CLI or SDK

You also can use RunInstances to launch an instance with additional encrypted Amazon EBS volumes by setting Encrypted to true and adding kmsKeyID along with the actual key ID in the BlockDeviceMapping object, as shown in the following command:

$> aws ec2 run-instances –image-id ami-b42209de –count 1 –instance-type m4.large –region us-east-1 –block-device-mappings file://mapping.json

In this example, mapping.json describes the properties of the EBS volume that you want to create:


{
"DeviceName": "/dev/sda1",
"Ebs": {
"DeleteOnTermination": true,
"VolumeSize": 100,
"VolumeType": "gp2",
"Encrypted": true,
"kmsKeyID": "arn:aws:kms:us-east-1:012345678910:key/abcd1234-a123-456a-a12b-a123b4cd56ef"
}
}

You can also launch instances with additional encrypted EBS data volumes via an Auto Scaling or Spot Fleet by creating a launch template with the above BlockDeviceMapping. For example:

$> aws ec2 create-launch-template –MyLTName –image-id ami-b42209de –count 1 –instance-type m4.large –region us-east-1 –block-device-mappings file://mapping.json

To learn more about launching an instance with the AWS CLI or SDK, see the AWS CLI Command Reference.

In this blog post, we’ve demonstrated a single-step, streamlined process for creating Amazon EBS volumes that are encrypted under your CMK when you launch your EC2 instance, thereby streamlining your instance launch workflow. To start using this functionality, navigate to the EC2 console.

If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the Amazon EC2 forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It’s triggered to look up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one.
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Parameter name Type Description
Global parametersdefined once and applied to all jobs
redshift_query_logs.global.s3_prefix String The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir String The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role> String The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient. The complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list StringList A comma-separated list of cluster names for which system tables’ data export is enabled. This gives flexibility for a user to exclude certain clusters.
Cluster-specific parametersfor each cluster specified in the enabled_cluster_list parameter
redshift_query_logs.<<cluster_name>>.connection String The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
redshift_query_logs.<<cluster_name>>.user String The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password Secure String The password that AWS Glue uses to connect the Amazon Redshift cluster’s encrypted-by key that is managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

Property Default Description
S3Bucket mybucket The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data bucket is used for storing all the exported query logs for each system table partitioned by the cluster. The mybucket/extract_rs_logs/temp/ bucket is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code bucket is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters Requires Input A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups Requires Input A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter, ExportEnabledClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained on Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters <<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that bucket, you should see the eight system tables partitioned by customer name and date: stl_alert_event_log, stl_dlltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_dlltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry=”mydb.sales_2000",job_configs=job_configs)

  1. Run the custom query with the time stamp.

returnDF=functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format (salesLastProcessTSValue) ,tableName="mydb.sales_2000",job_configs=job_configs)

  1. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF,s3Prefix=s3Prefix,tableName="mydb.sales_2000",partitionColumns=["sale_date"],job_configs=job_configs)

  1. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal=functions.getMaxValue(returnDF,"sale_timestamp",job_configs)

  1. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue(“mydb.sales_2000",latestTimestampVal[0],job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.

 

 

 

 

Rotate Amazon RDS database credentials automatically with AWS Secrets Manager

Post Syndicated from Apurv Awasthi original https://aws.amazon.com/blogs/security/rotate-amazon-rds-database-credentials-automatically-with-aws-secrets-manager/

Recently, we launched AWS Secrets Manager, a service that makes it easier to rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. You can configure Secrets Manager to rotate secrets automatically, which can help you meet your security and compliance needs. Secrets Manager offers built-in integrations for MySQL, PostgreSQL, and Amazon Aurora on Amazon RDS, and can rotate credentials for these databases natively. You can control access to your secrets by using fine-grained AWS Identity and Access Management (IAM) policies. To retrieve secrets, employees replace plaintext secrets with a call to Secrets Manager APIs, eliminating the need to hard-code secrets in source code or update configuration files and redeploy code when secrets are rotated.

In this post, I introduce the key features of Secrets Manager. I then show you how to store a database credential for a MySQL database hosted on Amazon RDS and how your applications can access this secret. Finally, I show you how to configure Secrets Manager to rotate this secret automatically.

Key features of Secrets Manager

These features include the ability to:

  • Rotate secrets safely. You can configure Secrets Manager to rotate secrets automatically without disrupting your applications. Secrets Manager offers built-in integrations for rotating credentials for Amazon RDS databases for MySQL, PostgreSQL, and Amazon Aurora. You can extend Secrets Manager to meet your custom rotation requirements by creating an AWS Lambda function to rotate other types of secrets. For example, you can create an AWS Lambda function to rotate OAuth tokens used in a mobile application. Users and applications retrieve the secret from Secrets Manager, eliminating the need to email secrets to developers or update and redeploy applications after AWS Secrets Manager rotates a secret.
  • Secure and manage secrets centrally. You can store, view, and manage all your secrets. By default, Secrets Manager encrypts these secrets with encryption keys that you own and control. Using fine-grained IAM policies, you can control access to secrets. For example, you can require developers to provide a second factor of authentication when they attempt to retrieve a production database credential. You can also tag secrets to help you discover, organize, and control access to secrets used throughout your organization.
  • Monitor and audit easily. Secrets Manager integrates with AWS logging and monitoring services to enable you to meet your security and compliance requirements. For example, you can audit AWS CloudTrail logs to see when Secrets Manager rotated a secret or configure AWS CloudWatch Events to alert you when an administrator deletes a secret.
  • Pay as you go. Pay for the secrets you store in Secrets Manager and for the use of these secrets; there are no long-term contracts or licensing fees.

Get started with Secrets Manager

Now that you’re familiar with the key features, I’ll show you how to store the credential for a MySQL database hosted on Amazon RDS. To demonstrate how to retrieve and use the secret, I use a python application running on Amazon EC2 that requires this database credential to access the MySQL instance. Finally, I show how to configure Secrets Manager to rotate this database credential automatically. Let’s get started.

Phase 1: Store a secret in Secrets Manager

  1. Open the Secrets Manager console and select Store a new secret.
     
    Secrets Manager console interface
     
  2. I select Credentials for RDS database because I’m storing credentials for a MySQL database hosted on Amazon RDS. For this example, I store the credentials for the database superuser. I start by securing the superuser because it’s the most powerful database credential and has full access over the database.
     
    Store a new secret interface with Credentials for RDS database selected
     

    Note: For this example, you need permissions to store secrets in Secrets Manager. To grant these permissions, you can use the AWSSecretsManagerReadWriteAccess managed policy. Read the AWS Secrets Manager Documentation for more information about the minimum IAM permissions required to store a secret.

  3. Next, I review the encryption setting and choose to use the default encryption settings. Secrets Manager will encrypt this secret using the Secrets Manager DefaultEncryptionKeyDefaultEncryptionKey in this account. Alternatively, I can choose to encrypt using a customer master key (CMK) that I have stored in AWS KMS.
     
    Select the encryption key interface
     
  4. Next, I view the list of Amazon RDS instances in my account and select the database this credential accesses. For this example, I select the DB instance mysql-rds-database, and then I select Next.
     
    Select the RDS database interface
     
  5. In this step, I specify values for Secret Name and Description. For this example, I use Applications/MyApp/MySQL-RDS-Database as the name and enter a description of this secret, and then select Next.
     
    Secret Name and description interface
     
  6. For the next step, I keep the default setting Disable automatic rotation because my secret is used by my application running on Amazon EC2. I’ll enable rotation after I’ve updated my application (see Phase 2 below) to use Secrets Manager APIs to retrieve secrets. I then select Next.

    Note: If you’re storing a secret that you’re not using in your application, select Enable automatic rotation. See our AWS Secrets Manager getting started guide on rotation for details.

     
    Configure automatic rotation interface
     

  7. Review the information on the next screen and, if everything looks correct, select Store. We’ve now successfully stored a secret in Secrets Manager.
  8. Next, I select See sample code.
     
    The See sample code button
     
  9. Take note of the code samples provided. I will use this code to update my application to retrieve the secret using Secrets Manager APIs.
     
    Python sample code
     

Phase 2: Update an application to retrieve secret from Secrets Manager

Now that I have stored the secret in Secrets Manager, I update my application to retrieve the database credential from Secrets Manager instead of hard coding this information in a configuration file or source code. For this example, I show how to configure a python application to retrieve this secret from Secrets Manager.

  1. I connect to my Amazon EC2 instance via Secure Shell (SSH).
  2. Previously, I configured my application to retrieve the database user name and password from the configuration file. Below is the source code for my application.
    import MySQLdb
    import config

    def no_secrets_manager_sample()

    # Get the user name, password, and database connection information from a config file.
    database = config.database
    user_name = config.user_name
    password = config.password

    # Use the user name, password, and database connection information to connect to the database
    db = MySQLdb.connect(database.endpoint, user_name, password, database.db_name, database.port)

  3. I use the sample code from Phase 1 above and update my application to retrieve the user name and password from Secrets Manager. This code sets up the client and retrieves and decrypts the secret Applications/MyApp/MySQL-RDS-Database. I’ve added comments to the code to make the code easier to understand.
    # Use the code snippet provided by Secrets Manager.
    import boto3
    from botocore.exceptions import ClientError

    def get_secret():
    #Define the secret you want to retrieve
    secret_name = "Applications/MyApp/MySQL-RDS-Database"
    #Define the Secrets mManager end-point your code should use.
    endpoint_url = "https://secretsmanager.us-east-1.amazonaws.com"
    region_name = "us-east-1"

    #Setup the client
    session = boto3.session.Session()
    client = session.client(
    service_name='secretsmanager',
    region_name=region_name,
    endpoint_url=endpoint_url
    )

    #Use the client to retrieve the secret
    try:
    get_secret_value_response = client.get_secret_value(
    SecretId=secret_name
    )
    #Error handling to make it easier for your code to tolerate faults
    except ClientError as e:
    if e.response['Error']['Code'] == 'ResourceNotFoundException':
    print("The requested secret " + secret_name + " was not found")
    elif e.response['Error']['Code'] == 'InvalidRequestException':
    print("The request was invalid due to:", e)
    elif e.response['Error']['Code'] == 'InvalidParameterException':
    print("The request had invalid params:", e)
    else:
    # Decrypted secret using the associated KMS CMK
    # Depending on whether the secret was a string or binary, one of these fields will be populated
    if 'SecretString' in get_secret_value_response:
    secret = get_secret_value_response['SecretString']
    else:
    binary_secret_data = get_secret_value_response['SecretBinary']

    # Your code goes here.

  4. Applications require permissions to access Secrets Manager. My application runs on Amazon EC2 and uses an IAM role to obtain access to AWS services. I will attach the following policy to my IAM role. This policy uses the GetSecretValue action to grant my application permissions to read secret from Secrets Manager. This policy also uses the resource element to limit my application to read only the Applications/MyApp/MySQL-RDS-Database secret from Secrets Manager. You can visit the AWS Secrets Manager Documentation to understand the minimum IAM permissions required to retrieve a secret.
    {
    "Version": "2012-10-17",
    "Statement": {
    "Sid": "RetrieveDbCredentialFromSecretsManager",
    "Effect": "Allow",
    "Action": "secretsmanager:GetSecretValue",
    "Resource": "arn:aws:secretsmanager:::secret:Applications/MyApp/MySQL-RDS-Database"
    }
    }

Phase 3: Enable Rotation for Your Secret

Rotating secrets periodically is a security best practice because it reduces the risk of misuse of secrets. Secrets Manager makes it easy to follow this security best practice and offers built-in integrations for rotating credentials for MySQL, PostgreSQL, and Amazon Aurora databases hosted on Amazon RDS. When you enable rotation, Secrets Manager creates a Lambda function and attaches an IAM role to this function to execute rotations on a schedule you define.

Note: Configuring rotation is a privileged action that requires several IAM permissions and you should only grant this access to trusted individuals. To grant these permissions, you can use the AWS IAMFullAccess managed policy.

Next, I show you how to configure Secrets Manager to rotate the secret Applications/MyApp/MySQL-RDS-Database automatically.

  1. From the Secrets Manager console, I go to the list of secrets and choose the secret I created in the first step Applications/MyApp/MySQL-RDS-Database.
     
    List of secrets in the Secrets Manager console
     
  2. I scroll to Rotation configuration, and then select Edit rotation.
     
    Rotation configuration interface
     
  3. To enable rotation, I select Enable automatic rotation. I then choose how frequently I want Secrets Manager to rotate this secret. For this example, I set the rotation interval to 60 days.
     
    Edit rotation configuration interface
     
  4. Next, Secrets Manager requires permissions to rotate this secret on your behalf. Because I’m storing the superuser database credential, Secrets Manager can use this credential to perform rotations. Therefore, I select Use the secret that I provided in step 1, and then select Next.
     
    Select which secret to use in the Edit rotation configuration interface
     
  5. The banner on the next screen confirms that I have successfully configured rotation and the first rotation is in progress, which enables you to verify that rotation is functioning as expected. Secrets Manager will rotate this credential automatically every 60 days.
     
    Confirmation banner message
     

Summary

I introduced AWS Secrets Manager, explained the key benefits, and showed you how to help meet your compliance requirements by configuring AWS Secrets Manager to rotate database credentials automatically on your behalf. Secrets Manager helps you protect access to your applications, services, and IT resources without the upfront investment and on-going maintenance costs of operating your own secrets management infrastructure. To get started, visit the Secrets Manager console. To learn more, visit Secrets Manager documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the Secrets Manager forum.

Want more AWS Security news? Follow us on Twitter.

AWS Key Management Service now offers FIPS 140-2 validated cryptographic modules enabling easier adoption of the service for regulated workloads

Post Syndicated from Sreekumar Pisharody original https://aws.amazon.com/blogs/security/aws-key-management-service-now-offers-fips-140-2-validated-cryptographic-modules-enabling-easier-adoption-of-the-service-for-regulated-workloads/

AWS Key Management Service (KMS) now uses FIPS 140-2 validated hardware security modules (HSM) and supports FIPS 140-2 validated endpoints, which provide independent assurances about the confidentiality and integrity of your keys. Having additional third-party assurances about the keys you manage in AWS KMS can make it easier to use the service for regulated workloads.

The process of gaining FIPS 140-2 validation is rigorous. First, AWS KMS HSMs were tested by an independent lab; those results were further reviewed by the Cryptographic Module Validation Program run by NIST. You can view the FIPS 140-2 certificate of the AWS Key Management Service HSM to get more details.

AWS KMS HSMs are designed so that no one, not even AWS employees, can retrieve your plaintext keys. The service uses the FIPS 140-2 validated HSMs to protect your keys when you request the service to create keys on your behalf or when you import them. Your plaintext keys are never written to disk and are only used in volatile memory of the HSMs while performing your requested cryptographic operation. Furthermore, AWS KMS keys are never transmitted outside the AWS Regions they were created. And HSM firmware updates are controlled by multi-party access that is audited and reviewed by an independent group within AWS.

AWS KMS HSMs are validated at level 2 overall and at level 3 in the following areas:

  • Cryptographic Module Specification
  • Roles, Services, and Authentication
  • Physical Security
  • Design Assurance

You can also make AWS KMS requests to API endpoints that terminate TLS sessions using a FIPS 140-2 validated cryptographic software module. To do so, connect to the unique FIPS 140-2 validated HTTPS endpoints in the AWS KMS requests made from your applications. AWS KMS FIPS 140-2 validated HTTPS endpoints are powered by the OpenSSL FIPS Object Module. FIPS 140-2 validated API endpoints are available in all commercial regions where AWS KMS is available.

Best Practices for Running Apache Kafka on AWS

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/

This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.

Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS. Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.

The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.

AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed.

Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this blog post, we cover the following aspects of running Kafka clusters on AWS:

  • Deployment considerations and patterns
  • Storage options
  • Instance types
  • Networking
  • Upgrades
  • Performance tuning
  • Monitoring
  • Security
  • Backup and restore

Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.

Deployment considerations and patterns

In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.

Single AWS Region, Three Availability Zones, All Active

One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers and Kafka cluster are deployed on each AZ.
  • Data is distributed evenly across three Kafka clusters by using Elastic Load Balancer.
  • Kafka consumers aggregate data from all three Kafka clusters.

Kafka cluster failover occurs this way:

  • Mark down all Kafka producers
  • Stop consumers
  • Debug and restack Kafka
  • Restart consumers
  • Restart Kafka producers

Following are the pros and cons of this pattern.

Pros Cons
  • Highly available
  • Can sustain the failure of two AZs
  • No message loss during failover
  • Simple deployment

 

  • Very high operational overhead:
    • All changes need to be deployed three times, one for each Kafka cluster
    • Maintaining and monitoring three Kafka clusters
    • Maintaining and monitoring three consumer clusters

A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.

Single Region, Three Availability Zones, Active-Standby

Another typical deployment pattern (active-standby) is in a single AWS Region with a single Kafka cluster and Kafka brokers and Zookeepers distributed across three AZs. Another similar Kafka cluster acts as a standby as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between any two clusters.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers are deployed on all three AZs.
  • Only one Kafka cluster is deployed across three AZs (active).
  • ZooKeeper instances are deployed on each AZ.
  • Brokers are spread evenly across all three AZs.
  • Kafka consumers can be deployed across all three AZs.
  • Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.

Kafka cluster failover occurs this way:

  • Switch traffic to standby Kafka producers cluster and Kafka cluster.
  • Restart consumers to consume from standby Kafka cluster.

Following are the pros and cons of this pattern.

Pros Cons
  • Less operational overhead when compared to the first option
  • Only one Kafka cluster to manage and consume data from
  • Can handle single AZ failures without activating a standby Kafka cluster
  • Added latency due to cross-AZ data transfer among Kafka brokers
  • For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack-awareness)
  • The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
  • Possibility of in-transit message loss during failover

Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributing across three AZs (single region, three AZs). This approach offers stronger fault tolerance than otherwise, because a failed AZ won’t cause Kafka downtime.

Storage options

There are two storage options for file storage in Amazon EC2:

Ephemeral storage is local to the Amazon EC2 instance. It can provide high IOPS based on the instance type. On the other hand, Amazon EBS volumes offer higher resiliency and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.

Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, it can affect recovery process and network traffic. These in turn eventually affect the cluster’s performance.

The following table contrasts the benefits of using an instance store versus using EBS for storage.

Instance store EBS
  • Instance storage is recommended for large- and medium-sized Kafka clusters. For a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of a broker has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important, but a failed broker takes longer and requires more network traffic for a smaller Kafka cluster.
  • Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.

 

  • The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
  • Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.

 

 

Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.

Generally, Kafka deployments use a replication factor of three. EBS offers replication within their service, so Intuit chose a replication factor of two instead of three.

Instance types

The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.

Intuit used r3.xlarge instances for their brokers and r3.large for ZooKeeper, with ST1 (throughput optimized HDD) EBS for their Kafka cluster.

Here are sample benchmark numbers from Intuit tests.

Configuration Broker bytes (MB/s)
  • r3.xlarge
  • ST1 EBS
  • 12 brokers
  • 12 partitions

 

Aggregate 346.9

If you need EBS storage, then AWS has a newer-generation r4 instance. The r4 instance is superior to R3 in many ways:

  • It has a faster processor (Broadwell).
  • EBS is optimized by default.
  • It features networking based on Elastic Network Adapter (ENA), with up to 10 Gbps on smaller sizes.
  • It costs 20 percent less than R3.

Note: It’s always best practice to check for the latest changes in instance types.

Networking

The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.

If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.

In addition, choose an option that keeps interbroker network traffic on the private subnet, because this approach allows clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.

If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.

Upgrades

Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.

Rolling or in-place upgrade

In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.

Downtime upgrade

If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.

Blue/green upgrade

Intuit followed the blue/green deployment model for their workloads, as described following.

If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades or more details, see the Kafka upgrade documentation.

The following illustration shows a blue/green upgrade.

In this scenario, the upgrade plan works like this:

  • Create a new Kafka cluster on AWS.
  • Create a new Kafka producers stack to point to the new Kafka cluster.
  • Create topics on the new Kafka cluster.
  • Test the green deployment end to end (sanity check).
  • Using Amazon Route 53, change the new Kafka producers stack on AWS to point to the new green Kafka environment that you have created.

The roll-back plan works like this:

  • Switch Amazon Route 53 to the old Kafka producers stack on AWS to point to the old Kafka environment.

For additional details on blue/green deployment architecture using Kafka, see the re:Invent presentation Leveraging the Cloud with a Blue-Green Deployment Architecture.

Performance tuning

You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.

 These are some general performance tuning techniques:

  • If throughput is less than network capacity, try the following:
    • Add more threads
    • Increase batch size
    • Add more producer instances
    • Add more partitions
  • To improve latency when acks =-1, increase your num.replica.fetches value.
  • For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
  • Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
  • Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
  • Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.

For Java and JVM tuning, try the following:

  • Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
  • Try to keep the Kafka heap size below 4 GB.

Monitoring

Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.

For monitoring, Intuit used several tools, including Newrelec, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.

For system metrics, we recommend that you monitor:

  • CPU load
  • Network metrics
  • File handle usage
  • Disk space
  • Disk I/O performance
  • Garbage collection
  • ZooKeeper

For producers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

For consumers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

Security

Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.

Encryption at rest

For EBS-backed EC2 instances, you can enable encryption at rest by using Amazon EBS volumes with encryption enabled. Amazon EBS uses AWS Key Management Service (AWS KMS) for encryption. For more details, see Amazon EBS Encryption in the EBS documentation. For instance store–backed EC2 instances, you can enable encryption at rest by using Amazon EC2 instance store encryption.

Encryption in transit

Kafka uses TLS for client and internode communications.

Authentication

Authentication of connections to brokers from clients (producers and consumers) to other brokers and tools uses either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).

Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.

Authorization

In Kafka, authorization is pluggable and integration with external authorization services is supported.

Backup and restore

The type of storage used in your deployment dictates your backup and restore strategy.

The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.

For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.

For more information on how to back up in Kafka, see the Kafka documentation.

Conclusion

In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution with Amazon Kinesis Data Streams, there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across availability zones is automatic, you benefit from security out of the box, Kinesis Data Streams is tightly integrated with a wide variety of AWS services like Lambda, Redshift, Elasticsearch and it supports open source frameworks like Storm, Spark, Flink, and more. You may refer to kafka-kinesis connector.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics.


About the Author

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

Best Practices for Running Apache Cassandra on Amazon EC2

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-cassandra-on-amazon-ec2/

Apache Cassandra is a commonly used, high performance NoSQL database. AWS customers that currently maintain Cassandra on-premises may want to take advantage of the scalability, reliability, security, and economic benefits of running Cassandra on Amazon EC2.

Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud. When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this post, we outline three Cassandra deployment options, as well as provide guidance about determining the best practices for your use case in the following areas:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Before we jump into best practices for running Cassandra on AWS, we should mention that we have many customers who decided to use DynamoDB instead of managing their own Cassandra cluster. DynamoDB is fully managed, serverless, and provides multi-master cross-region replication, encryption at rest, and managed backup and restore. Integration with AWS Identity and Access Management (IAM) enables DynamoDB customers to implement fine-grained access control for their data security needs.

Several customers who have been using large Cassandra clusters for many years have moved to DynamoDB to eliminate the complications of administering Cassandra clusters and maintaining high availability and durability themselves. Gumgum.com is one customer who migrated to DynamoDB and observed significant savings. For more information, see Moving to Amazon DynamoDB from Hosted Cassandra: A Leap Towards 60% Cost Saving per Year.

AWS provides options, so you’re covered whether you want to run your own NoSQL Cassandra database, or move to a fully managed, serverless DynamoDB database.

Cassandra resource overview

Here’s a short introduction to standard Cassandra resources and how they are implemented with AWS infrastructure. If you’re already familiar with Cassandra or AWS deployments, this can serve as a refresher.

Resource Cassandra AWS
Cluster

A single Cassandra deployment.

 

This typically consists of multiple physical locations, keyspaces, and physical servers.

A logical deployment construct in AWS that maps to an AWS CloudFormation StackSet, which consists of one or many CloudFormation stacks to deploy Cassandra.
Datacenter A group of nodes configured as a single replication group.

A logical deployment construct in AWS.

 

A datacenter is deployed with a single CloudFormation stack consisting of Amazon EC2 instances, networking, storage, and security resources.

Rack

A collection of servers.

 

A datacenter consists of at least one rack. Cassandra tries to place the replicas on different racks.

A single Availability Zone.
Server/node A physical virtual machine running Cassandra software. An EC2 instance.
Token Conceptually, the data managed by a cluster is represented as a ring. The ring is then divided into ranges equal to the number of nodes. Each node being responsible for one or more ranges of the data. Each node gets assigned with a token, which is essentially a random number from the range. The token value determines the node’s position in the ring and its range of data. Managed within Cassandra.
Virtual node (vnode) Responsible for storing a range of data. Each vnode receives one token in the ring. A cluster (by default) consists of 256 tokens, which are uniformly distributed across all servers in the Cassandra datacenter. Managed within Cassandra.
Replication factor The total number of replicas across the cluster. Managed within Cassandra.

Deployment considerations

One of the many benefits of deploying Cassandra on Amazon EC2 is that you can automate many deployment tasks. In addition, AWS includes services, such as CloudFormation, that allow you to describe and provision all your infrastructure resources in your cloud environment.

We recommend orchestrating each Cassandra ring with one CloudFormation template. If you are deploying in multiple AWS Regions, you can use a CloudFormation StackSet to manage those stacks. All the maintenance actions (scaling, upgrading, and backing up) should be scripted with an AWS SDK. These may live as standalone AWS Lambda functions that can be invoked on demand during maintenance.

You can get started by following the Cassandra Quick Start deployment guide. Keep in mind that this guide does not address the requirements to operate a production deployment and should be used only for learning more about Cassandra.

Deployment patterns

In this section, we discuss various deployment options available for Cassandra in Amazon EC2. A successful deployment starts with thoughtful consideration of these options. Consider the amount of data, network environment, throughput, and availability.

  • Single AWS Region, 3 Availability Zones
  • Active-active, multi-Region
  • Active-standby, multi-Region

Single region, 3 Availability Zones

In this pattern, you deploy the Cassandra cluster in one AWS Region and three Availability Zones. There is only one ring in the cluster. By using EC2 instances in three zones, you ensure that the replicas are distributed uniformly in all zones.

To ensure the even distribution of data across all Availability Zones, we recommend that you distribute the EC2 instances evenly in all three Availability Zones. The number of EC2 instances in the cluster is a multiple of three (the replication factor).

This pattern is suitable in situations where the application is deployed in one Region or where deployments in different Regions should be constrained to the same Region because of data privacy or other legal requirements.

Pros Cons

●     Highly available, can sustain failure of one Availability Zone.

●     Simple deployment

●     Does not protect in a situation when many of the resources in a Region are experiencing intermittent failure.

 

Active-active, multi-Region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster are deployed in more than one Region.

Pros Cons

●     No data loss during failover.

●     Highly available, can sustain when many of the resources in a Region are experiencing intermittent failures.

●     Read/write traffic can be localized to the closest Region for the user for lower latency and higher performance.

●     High operational overhead

●     The second Region effectively doubles the cost

 

Active-standby, multi-region

In this pattern, you deploy two rings in two different Regions and link them. The VPCs in the two Regions are peered so that data can be replicated between two rings.

However, the second Region does not receive traffic from the applications. It only functions as a secondary location for disaster recovery reasons. If the primary Region is not available, the second Region receives traffic.

We recommend that the two rings in the two Regions be identical in nature, having the same number of nodes, instance types, and storage configuration.

This pattern is most suitable when the applications using the Cassandra cluster require low recovery point objective (RPO) and recovery time objective (RTO).

Pros Cons

●     No data loss during failover.

●     Highly available, can sustain failure or partitioning of one whole Region.

●     High operational overhead.

●     High latency for writes for eventual consistency.

●     The second Region effectively doubles the cost.

Storage options

In on-premises deployments, Cassandra deployments use local disks to store data. There are two storage options for EC2 instances:

Your choice of storage is closely related to the type of workload supported by the Cassandra cluster. Instance store works best for most general purpose Cassandra deployments. However, in certain read-heavy clusters, Amazon EBS is a better choice.

The choice of instance type is generally driven by the type of storage:

  • If ephemeral storage is required for your application, a storage-optimized (I3) instance is the best option.
  • If your workload requires Amazon EBS, it is best to go with compute-optimized (C5) instances.
  • Burstable instance types (T2) don’t offer good performance for Cassandra deployments.

Instance store

Ephemeral storage is local to the EC2 instance. It may provide high input/output operations per second (IOPs) based on the instance type. An SSD-based instance store can support up to 3.3M IOPS in I3 instances. This high performance makes it an ideal choice for transactional or write-intensive applications such as Cassandra.

In general, instance storage is recommended for transactional, large, and medium-size Cassandra clusters. For a large cluster, read/write traffic is distributed across a higher number of nodes, so the loss of one node has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important.

As an example, for a cluster with 100 nodes, the loss of 1 node is 3.33% loss (with a replication factor of 3). Similarly, for a cluster with 10 nodes, the loss of 1 node is 33% less capacity (with a replication factor of 3).

  Ephemeral storage Amazon EBS Comments

IOPS

(translates to higher query performance)

Up to 3.3M on I3

80K/instance

10K/gp2/volume

32K/io1/volume

This results in a higher query performance on each host. However, Cassandra implicitly scales well in terms of horizontal scale. In general, we recommend scaling horizontally first. Then, scale vertically to mitigate specific issues.

 

Note: 3.3M IOPS is observed with 100% random read with a 4-KB block size on Amazon Linux.

AWS instance types I3 Compute optimized, C5 Being able to choose between different instance types is an advantage in terms of CPU, memory, etc., for horizontal and vertical scaling.
Backup/ recovery Custom Basic building blocks are available from AWS.

Amazon EBS offers distinct advantage here. It is small engineering effort to establish a backup/restore strategy.

a) In case of an instance failure, the EBS volumes from the failing instance are attached to a new instance.

b) In case of an EBS volume failure, the data is restored by creating a new EBS volume from last snapshot.

Amazon EBS

EBS volumes offer higher resiliency, and IOPs can be configured based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. EBS volumes can support up to 32K IOPS per volume and up to 80K IOPS per instance in RAID configuration. They have an annualized failure rate (AFR) of 0.1–0.2%, which makes EBS volumes 20 times more reliable than typical commodity disk drives.

The primary advantage of using Amazon EBS in a Cassandra deployment is that it reduces data-transfer traffic significantly when a node fails or must be replaced. The replacement node joins the cluster much faster. However, Amazon EBS could be more expensive, depending on your data storage needs.

Cassandra has built-in fault tolerance by replicating data to partitions across a configurable number of nodes. It can not only withstand node failures but if a node fails, it can also recover by copying data from other replicas into a new node. Depending on your application, this could mean copying tens of gigabytes of data. This adds additional delay to the recovery process, increases network traffic, and could possibly impact the performance of the Cassandra cluster during recovery.

Data stored on Amazon EBS is persisted in case of an instance failure or termination. The node’s data stored on an EBS volume remains intact and the EBS volume can be mounted to a new EC2 instance. Most of the replicated data for the replacement node is already available in the EBS volume and won’t need to be copied over the network from another node. Only the changes made after the original node failed need to be transferred across the network. That makes this process much faster.

EBS volumes are snapshotted periodically. So, if a volume fails, a new volume can be created from the last known good snapshot and be attached to a new instance. This is faster than creating a new volume and coping all the data to it.

Most Cassandra deployments use a replication factor of three. However, Amazon EBS does its own replication under the covers for fault tolerance. In practice, EBS volumes are about 20 times more reliable than typical disk drives. So, it is possible to go with a replication factor of two. This not only saves cost, but also enables deployments in a region that has two Availability Zones.

EBS volumes are recommended in case of read-heavy, small clusters (fewer nodes) that require storage of a large amount of data. Keep in mind that the Amazon EBS provisioned IOPS could get expensive. General purpose EBS volumes work best when sized for required performance.

Networking

If your cluster is expected to receive high read/write traffic, select an instance type that offers 10–Gb/s performance. As an example, i3.8xlarge and c5.9xlarge both offer 10–Gb/s networking performance. A smaller instance type in the same family leads to a relatively lower networking throughput.

Cassandra generates a universal unique identifier (UUID) for each node based on IP address for the instance. This UUID is used for distributing vnodes on the ring.

In the case of an AWS deployment, IP addresses are assigned automatically to the instance when an EC2 instance is created. With the new IP address, the data distribution changes and the whole ring has to be rebalanced. This is not desirable.

To preserve the assigned IP address, use a secondary elastic network interface with a fixed IP address. Before swapping an EC2 instance with a new one, detach the secondary network interface from the old instance and attach it to the new one. This way, the UUID remains same and there is no change in the way that data is distributed in the cluster.

If you are deploying in more than one region, you can connect the two VPCs in two regions using cross-region VPC peering.

High availability and resiliency

Cassandra is designed to be fault-tolerant and highly available during multiple node failures. In the patterns described earlier in this post, you deploy Cassandra to three Availability Zones with a replication factor of three. Even though it limits the AWS Region choices to the Regions with three or more Availability Zones, it offers protection for the cases of one-zone failure and network partitioning within a single Region. The multi-Region deployments described earlier in this post protect when many of the resources in a Region are experiencing intermittent failure.

Resiliency is ensured through infrastructure automation. The deployment patterns all require a quick replacement of the failing nodes. In the case of a regionwide failure, when you deploy with the multi-Region option, traffic can be directed to the other active Region while the infrastructure is recovering in the failing Region. In the case of unforeseen data corruption, the standby cluster can be restored with point-in-time backups stored in Amazon S3.

Maintenance

In this section, we look at ways to ensure that your Cassandra cluster is healthy:

  • Scaling
  • Upgrades
  • Backup and restore

Scaling

Cassandra is horizontally scaled by adding more instances to the ring. We recommend doubling the number of nodes in a cluster to scale up in one scale operation. This leaves the data homogeneously distributed across Availability Zones. Similarly, when scaling down, it’s best to halve the number of instances to keep the data homogeneously distributed.

Cassandra is vertically scaled by increasing the compute power of each node. Larger instance types have proportionally bigger memory. Use deployment automation to swap instances for bigger instances without downtime or data loss.

Upgrades

All three types of upgrades (Cassandra, operating system patching, and instance type changes) follow the same rolling upgrade pattern.

In this process, you start with a new EC2 instance and install software and patches on it. Thereafter, remove one node from the ring. For more information, see Cassandra cluster Rolling upgrade. Then, you detach the secondary network interface from one of the EC2 instances in the ring and attach it to the new EC2 instance. Restart the Cassandra service and wait for it to sync. Repeat this process for all nodes in the cluster.

Backup and restore

Your backup and restore strategy is dependent on the type of storage used in the deployment. Cassandra supports snapshots and incremental backups. When using instance store, a file-based backup tool works best. Customers use rsync or other third-party products to copy data backups from the instance to long-term storage. For more information, see Backing up and restoring data in the DataStax documentation. This process has to be repeated for all instances in the cluster for a complete backup. These backup files are copied back to new instances to restore. We recommend using S3 to durably store backup files for long-term storage.

For Amazon EBS based deployments, you can enable automated snapshots of EBS volumes to back up volumes. New EBS volumes can be easily created from these snapshots for restoration.

Security

We recommend that you think about security in all aspects of deployment. The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation.

Encryption at rest

Encryption at rest can be achieved by using EBS volumes with encryption enabled. Amazon EBS uses AWS KMS for encryption. For more information, see Amazon EBS Encryption.

Instance store–based deployments require using an encrypted file system or an AWS partner solution. If you are using DataStax Enterprise, it supports transparent data encryption.

Encryption in transit

Cassandra uses Transport Layer Security (TLS) for client and internode communications.

Authentication

The security mechanism is pluggable, which means that you can easily swap out one authentication method for another. You can also provide your own method of authenticating to Cassandra, such as a Kerberos ticket, or if you want to store passwords in a different location, such as an LDAP directory.

Authorization

The authorizer that’s plugged in by default is org.apache.cassandra.auth.Allow AllAuthorizer. Cassandra also provides a role-based access control (RBAC) capability, which allows you to create roles and assign permissions to these roles.

Conclusion

In this post, we discussed several patterns for running Cassandra in the AWS Cloud. This post describes how you can manage Cassandra databases running on Amazon EC2. AWS also provides managed offerings for a number of databases. To learn more, see Purpose-built databases for all your application needs.

If you have questions or suggestions, please comment below.


Additional Reading

If you found this post useful, be sure to check out Analyze Your Data on Amazon DynamoDB with Apache Spark and Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight.


About the Authors

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.

 

 

 

Provanshu Dey is a Senior IoT Consultant with AWS Professional Services. He works on highly scalable and reliable IoT, data and machine learning solutions with our customers. In his spare time, he enjoys spending time with his family and tinkering with electronics & gadgets.

 

 

 

New – Encryption at Rest for DynamoDB

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-encryption-at-rest-for-dynamodb/

At AWS re:Invent 2017, Werner encouraged his audience to “Dance like nobody is watching, and to encrypt like everyone is:

The AWS team is always eager to add features that make it easier for you to protect your sensitive data and to help you to achieve your compliance objectives. For example, in 2017 we launched encryption at rest for SQS and EFS, additional encryption options for S3, and server-side encryption of Kinesis Data Streams.

Today we are giving you another data protection option with the introduction of encryption at rest for Amazon DynamoDB. You simply enable encryption when you create a new table and DynamoDB takes care of the rest. Your data (tables, local secondary indexes, and global secondary indexes) will be encrypted using AES-256 and a service-default AWS Key Management Service (KMS) key. The encryption adds no storage overhead and is completely transparent; you can insert, query, scan, and delete items as before. The team did not observe any changes in latency after enabling encryption and running several different workloads on an encrypted DynamoDB table.

Creating an Encrypted Table
You can create an encrypted table from the AWS Management Console, API (CreateTable), or CLI (create-table). I’ll use the console! I enter the name and set up the primary key as usual:

Before proceeding, I uncheck Use default settings, scroll down to the Encrypytion section, and check Enable encryption. Then I click Create and my table is created in encrypted form:

I can see the encryption setting for the table at a glance:

When my compliance team asks me to show them how DynamoDB uses the key to encrypt the data, I can create a AWS CloudTrail trail, insert an item, and then scan the table to see the calls to the AWS KMS API. Here’s an extract from the trail:

{
  "eventTime": "2018-01-24T00:06:34Z",
  "eventSource": "kms.amazonaws.com",
  "eventName": "Decrypt",
  "awsRegion": "us-west-2",
  "sourceIPAddress": "dynamodb.amazonaws.com",
  "userAgent": "dynamodb.amazonaws.com",
  "requestParameters": {
    "encryptionContext": {
      "aws:dynamodb:tableName": "reg-users",
      "aws:dynamodb:subscriberId": "1234567890"
    }
  },
  "responseElements": null,
  "requestID": "7072def1-009a-11e8-9ab9-4504c26bd391",
  "eventID": "3698678a-d04e-48c7-96f2-3d734c5c7903",
  "readOnly": true,
  "resources": [
    {
      "ARN": "arn:aws:kms:us-west-2:1234567890:key/e7bd721d-37f3-4acd-bec5-4d08c765f9f5",
      "accountId": "1234567890",
      "type": "AWS::KMS::Key"
    }
  ]
}

Available Now
This feature is available now in the US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) Regions and you can start using it today.

There’s no charge for the encryption; you will be charged for the calls that DynamoDB makes to AWS KMS on your behalf.

Jeff;

 

How to Encrypt Amazon S3 Objects with the AWS SDK for Ruby

Post Syndicated from Doug Schwartz original https://aws.amazon.com/blogs/security/how-to-encrypt-amazon-s3-objects-with-the-aws-sdk-for-ruby/

AWS KMS image

Recently, Amazon announced some new Amazon S3 encryption and security features. The AWS Blog post showed how to use the Amazon S3 console to take advantage of these new features. However, if you have a large number of Amazon S3 buckets, using the console to implement these features could take hours, if not days. As an alternative, I created documentation topics in the AWS SDK for Ruby Developer Guide that include code examples showing you how to use the new Amazon S3 encryption features using the AWS SDK for Ruby.

What are my encryption options?

You can encrypt Amazon S3 bucket objects on a server or on a client:

  • When you encrypt objects on a server, you request that Amazon S3 encrypt the objects before saving them to disk in data centers and decrypt the objects when you download them. The main advantage of this approach is that Amazon S3 manages the entire encryption process.
  • When you encrypt objects on a client, you encrypt the objects before you upload them to Amazon S3. In this case, you manage the encryption process, the encryption keys, and related tools. Use this option when:
    • Company policy and standards require it.
    • You already have a development process in place that meets your needs.

    Encrypting on the client has always been available, but you should know the following points:

    • You must be diligent about protecting your encryption keys, which is analogous to having a burglar-proof lock on your front door. If you leave a key under the mat, your security is compromised.
    • If you lose your encryption keys, you won’t be able to decrypt your data.

    If you encrypt objects on the client, we strongly recommend that you use an AWS Key Management Service (AWS KMS) managed customer master key (CMK)

How to use encryption on a server

You can specify that Amazon S3 automatically encrypts objects as you upload them to a bucket or require that objects uploaded to an Amazon S3 bucket include encryption on a server before they are uploaded to an Amazon S3 bucket.

The advantage of these settings is that when you specify them, you ensure that objects uploaded to Amazon S3 are encrypted. Alternatively, you can have Amazon S3 encrypt individual objects on the server as you upload them to a bucket or encrypt them on the server with your own key as you upload them to a bucket.

The AWS SDK for Ruby Developer Guide now contains the following topics that explain your encryption options on a server:

How to use encryption on a client

You can encrypt objects on a client before you upload them to a bucket and decrypt them after you download them from a bucket by using the Amazon S3 encryption client.

The AWS SDK for Ruby Developer Guide now contains the following topics that explain your encryption options on the client:

Note: The Amazon S3 encryption client in the AWS SDK for Ruby is compatible with other Amazon S3 encryption clients, but it is not compatible with other AWS client-side encryption libraries, including the AWS Encryption SDK and the Amazon DynamoDB encryption client for Java. Each library returns a different ciphertext (“encrypted message”) format, so you can’t use one library to encrypt objects and a different library to decrypt them. For more information, see Protecting Data Using Client-Side Encryption.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about encrypting objects on servers and clients, start a new thread on the Amazon S3 forum or contact AWS Support.

– Doug

How to Enhance the Security of Sensitive Customer Data by Using Amazon CloudFront Field-Level Encryption

Post Syndicated from Alex Tomic original https://aws.amazon.com/blogs/security/how-to-enhance-the-security-of-sensitive-customer-data-by-using-amazon-cloudfront-field-level-encryption/

Amazon CloudFront is a web service that speeds up distribution of your static and dynamic web content to end users through a worldwide network of edge locations. CloudFront provides a number of benefits and capabilities that can help you secure your applications and content while meeting compliance requirements. For example, you can configure CloudFront to help enforce secure, end-to-end connections using HTTPS SSL/TLS encryption. You also can take advantage of CloudFront integration with AWS Shield for DDoS protection and with AWS WAF (a web application firewall) for protection against application-layer attacks, such as SQL injection and cross-site scripting.

Now, CloudFront field-level encryption helps secure sensitive data such as a customer phone numbers by adding another security layer to CloudFront HTTPS. Using this functionality, you can help ensure that sensitive information in a POST request is encrypted at CloudFront edge locations. This information remains encrypted as it flows to and beyond your origin servers that terminate HTTPS connections with CloudFront and throughout the application environment. In this blog post, we demonstrate how you can enhance the security of sensitive data by using CloudFront field-level encryption.

Note: This post assumes that you understand concepts and services such as content delivery networks, HTTP forms, public-key cryptography, CloudFrontAWS Lambda, and the AWS CLI. If necessary, you should familiarize yourself with these concepts and review the solution overview in the next section before proceeding with the deployment of this post’s solution.

How field-level encryption works

Many web applications collect and store data from users as those users interact with the applications. For example, a travel-booking website may ask for your passport number and less sensitive data such as your food preferences. This data is transmitted to web servers and also might travel among a number of services to perform tasks. However, this also means that your sensitive information may need to be accessed by only a small subset of these services (most other services do not need to access your data).

User data is often stored in a database for retrieval at a later time. One approach to protecting stored sensitive data is to configure and code each service to protect that sensitive data. For example, you can develop safeguards in logging functionality to ensure sensitive data is masked or removed. However, this can add complexity to your code base and limit performance.

Field-level encryption addresses this problem by ensuring sensitive data is encrypted at CloudFront edge locations. Sensitive data fields in HTTPS form POSTs are automatically encrypted with a user-provided public RSA key. After the data is encrypted, other systems in your architecture see only ciphertext. If this ciphertext unintentionally becomes externally available, the data is cryptographically protected and only designated systems with access to the private RSA key can decrypt the sensitive data.

It is critical to secure private RSA key material to prevent unauthorized access to the protected data. Management of cryptographic key material is a larger topic that is out of scope for this blog post, but should be carefully considered when implementing encryption in your applications. For example, in this blog post we store private key material as a secure string in the Amazon EC2 Systems Manager Parameter Store. The Parameter Store provides a centralized location for managing your configuration data such as plaintext data (such as database strings) or secrets (such as passwords) that are encrypted using AWS Key Management Service (AWS KMS). You may have an existing key management system in place that you can use, or you can use AWS CloudHSM. CloudHSM is a cloud-based hardware security module (HSM) that enables you to easily generate and use your own encryption keys in the AWS Cloud.

To illustrate field-level encryption, let’s look at a simple form submission where Name and Phone values are sent to a web server using an HTTP POST. A typical form POST would contain data such as the following.

POST / HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length:60

Name=Jane+Doe&Phone=404-555-0150

Instead of taking this typical approach, field-level encryption converts this data similar to the following.

POST / HTTP/1.1
Host: example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 1713

Name=Jane+Doe&Phone=AYABeHxZ0ZqWyysqxrB5pEBSYw4AAA...

To further demonstrate field-level encryption in action, this blog post includes a sample serverless application that you can deploy by using a CloudFormation template, which creates an application environment using CloudFront, Amazon API Gateway, and Lambda. The sample application is only intended to demonstrate field-level encryption functionality and is not intended for production use. The following diagram depicts the architecture and data flow of this sample application.

Sample application architecture and data flow

Diagram of the solution's architecture and data flow

Here is how the sample solution works:

  1. An application user submits an HTML form page with sensitive data, generating an HTTPS POST to CloudFront.
  2. Field-level encryption intercepts the form POST and encrypts sensitive data with the public RSA key and replaces fields in the form post with encrypted ciphertext. The form POST ciphertext is then sent to origin servers.
  3. The serverless application accepts the form post data containing ciphertext where sensitive data would normally be. If a malicious user were able to compromise your application and gain access to your data, such as the contents of a form, that user would see encrypted data.
  4. Lambda stores data in a DynamoDB table, leaving sensitive data to remain safely encrypted at rest.
  5. An administrator uses the AWS Management Console and a Lambda function to view the sensitive data.
  6. During the session, the administrator retrieves ciphertext from the DynamoDB table.
  7. The administrator decrypts sensitive data by using private key material stored in the EC2 Systems Manager Parameter Store.
  8. Decrypted sensitive data is transmitted over SSL/TLS via the AWS Management Console to the administrator for review.

Deployment walkthrough

The high-level steps to deploy this solution are as follows:

  1. Stage the required artifacts
    When deployment packages are used with Lambda, the zipped artifacts have to be placed in an S3 bucket in the target AWS Region for deployment. This step is not required if you are deploying in the US East (N. Virginia) Region because the package has already been staged there.
  2. Generate an RSA key pair
    Create a public/private key pair that will be used to perform the encrypt/decrypt functionality.
  3. Upload the public key to CloudFront and associate it with the field-level encryption configuration
    After you create the key pair, the public key is uploaded to CloudFront so that it can be used by field-level encryption.
  4. Launch the CloudFormation stack
    Deploy the sample application for demonstrating field-level encryption by using AWS CloudFormation.
  5. Add the field-level encryption configuration to the CloudFront distribution
    After you have provisioned the application, this step associates the field-level encryption configuration with the CloudFront distribution.
  6. Store the RSA private key in the Parameter Store
    Store the private key in the Parameter Store as a SecureString data type, which uses AWS KMS to encrypt the parameter value.

Deploy the solution

1. Stage the required artifacts

(If you are deploying in the US East [N. Virginia] Region, skip to Step 2, “Generate an RSA key pair.”)

Stage the Lambda function deployment package in an Amazon S3 bucket located in the AWS Region you are using for this solution. To do this, download the zipped deployment package and upload it to your in-region bucket. For additional information about uploading objects to S3, see Uploading Object into Amazon S3.

2. Generate an RSA key pair

In this section, you will generate an RSA key pair by using OpenSSL:

  1. Confirm access to OpenSSL.
    $ openssl version

    You should see version information similar to the following.

    OpenSSL <version> <date>

  1. Create a private key using the following command.
    $ openssl genrsa -out private_key.pem 2048

    The command results should look similar to the following.

    Generating RSA private key, 2048 bit long modulus
    ................................................................................+++
    ..........................+++
    e is 65537 (0x10001)
  1. Extract the public key from the private key by running the following command.
    $ openssl rsa -pubout -in private_key.pem -out public_key.pem

    You should see output similar to the following.

    writing RSA key
  1. Restrict access to the private key.$ chmod 600 private_key.pem Note: You will use the public and private key material in Steps 3 and 6 to configure the sample application.

3. Upload the public key to CloudFront and associate it with the field-level encryption configuration

Now that you have created the RSA key pair, you will use the AWS Management Console to upload the public key to CloudFront for use by field-level encryption. Complete the following steps to upload and configure the public key.

Note: Do not include spaces or special characters when providing the configuration values in this section.

  1. From the AWS Management Console, choose Services > CloudFront.
  2. In the navigation pane, choose Public Key and choose Add Public Key.
    Screenshot of adding a public key

Complete the Add Public Key configuration boxes:

  • Key Name: Type a name such as DemoPublicKey.
  • Encoded Key: Paste the contents of the public_key.pem file you created in Step 2c. Copy and paste the encoded key value for your public key, including the -----BEGIN PUBLIC KEY----- and -----END PUBLIC KEY----- lines.
  • Comment: Optionally add a comment.
  1. Choose Create.
  2. After adding at least one public key to CloudFront, the next step is to create a profile to tell CloudFront which fields of input you want to be encrypted. While still on the CloudFront console, choose Field-level encryption in the navigation pane.
  3. Under Profiles, choose Create profile.
    Screenshot of creating a profile

Complete the Create profile configuration boxes:

  • Name: Type a name such as FLEDemo.
  • Comment: Optionally add a comment.
  • Public key: Select the public key you configured in Step 4.b.
  • Provider name: Type a provider name such as FLEDemo.
    This information will be used when the form data is encrypted, and must be provided to applications that need to decrypt the data, along with the appropriate private key.
  • Pattern to match: Type phone. This configures field-level encryption to match based on the phone.
  1. Choose Save profile.
  2. Configurations include options for whether to block or forward a query to your origin in scenarios where CloudFront can’t encrypt the data. Under Encryption Configurations, choose Create configuration.
    Screenshot of creating a configuration

Complete the Create configuration boxes:

  • Comment: Optionally add a comment.
  • Content type: Enter application/x-www-form-urlencoded. This is a common media type for encoding form data.
  • Default profile ID: Select the profile you added in Step 3e.
  1. Choose Save configuration

4. Launch the CloudFormation stack

Launch the sample application by using a CloudFormation template that automates the provisioning process.

Input parameter Input parameter description
ProviderID Enter the Provider name you assigned in Step 3e. The ProviderID is used in field-level encryption configuration in CloudFront (letters and numbers only, no special characters)
PublicKeyName Enter the Key Name you assigned in Step 3b. This name is assigned to the public key in field-level encryption configuration in CloudFront (letters and numbers only, no special characters).
PrivateKeySSMPath Leave as the default: /cloudfront/field-encryption-sample/private-key
ArtifactsBucket The S3 bucket with artifact files (staged zip file with app code). Leave as default if deploying in us-east-1.
ArtifactsPrefix The path in the S3 bucket containing artifact files. Leave as default if deploying in us-east-1.

To finish creating the CloudFormation stack:

  1. Choose Next on the Select Template page, enter the input parameters and choose Next.
    Note: The Artifacts configuration needs to be updated only if you are deploying outside of us-east-1 (US East [N. Virginia]). See Step 1 for artifact staging instructions.
  2. On the Options page, accept the defaults and choose Next.
  3. On the Review page, confirm the details, choose the I acknowledge that AWS CloudFormation might create IAM resources check box, and then choose Create. (The stack will be created in approximately 15 minutes.)

5. Add the field-level encryption configuration to the CloudFront distribution

While still on the CloudFront console, choose Distributions in the navigation pane, and then:

    1. In the Outputs section of the FLE-Sample-App stack, look for CloudFrontDistribution and click the URL to open the CloudFront console.
    2. Choose Behaviors, choose the Default (*) behavior, and then choose Edit.
    3. For Field-level Encryption Config, choose the configuration you created in Step 3g.
      Screenshot of editing the default cache behavior
    4. Choose Yes, Edit.
    5. While still in the CloudFront distribution configuration, choose the General Choose Edit, scroll down to Distribution State, and change it to Enabled.
    6. Choose Yes, Edit.

6. Store the RSA private key in the Parameter Store

In this step, you store the private key in the EC2 Systems Manager Parameter Store as a SecureString data type, which uses AWS KMS to encrypt the parameter value. For more information about AWS KMS, see the AWS Key Management Service Developer Guide. You will need a working installation of the AWS CLI to complete this step.

  1. Store the private key in the Parameter Store with the AWS CLI by running the following command. You will find the <KMSKeyID> in the KMSKeyID in the CloudFormation stack Outputs. Substitute it for the placeholder in the following command.
    $ aws ssm put-parameter --type "SecureString" --name /cloudfront/field-encryption-sample/private-key --value file://private_key.pem --key-id "<KMSKeyID>"
    
    ------------------
    |  PutParameter  |
    +----------+-----+
    |  Version |  1  |
    +----------+-----+

  1. Verify the parameter. Your private key material should be accessible through the ssm get-parameter in the following command in the Value The key material has been truncated in the following output.
    $ aws ssm get-parameter --name /cloudfront/field-encryption-sample/private-key --with-decryption
    
    -----…
    
    ||  Value  |  -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEAwGRBGuhacmw+C73kM6Z…….

    Notice we use the —with decryption argument in this command. This returns the private key as cleartext.

    This completes the sample application deployment. Next, we show you how to see field-level encryption in action.

  1. Delete the private key from local storage. On Linux for example, using the shred command, securely delete the private key material from your workstation as shown below. You may also wish to store the private key material within an AWS CloudHSM or other protected location suitable for your security requirements. For production implementations, you also should implement key rotation policies.
    $ shred -zvu -n  100 private*.pem
    
    shred: private_encrypted_key.pem: pass 1/101 (random)...
    shred: private_encrypted_key.pem: pass 2/101 (dddddd)...
    shred: private_encrypted_key.pem: pass 3/101 (555555)...
    ….

Test the sample application

Use the following steps to test the sample application with field-level encryption:

  1. Open sample application in your web browser by clicking the ApplicationURL link in the CloudFormation stack Outputs. (for example, https:d199xe5izz82ea.cloudfront.net/prod/). Note that it may take several minutes for the CloudFront distribution to reach the Deployed Status from the previous step, during which time you may not be able to access the sample application.
  2. Fill out and submit the HTML form on the page:
    1. Complete the three form fields: Full Name, Email Address, and Phone Number.
    2. Choose Submit.
      Screenshot of completing the sample application form
      Notice that the application response includes the form values. The phone number returns the following ciphertext encryption using your public key. This ciphertext has been stored in DynamoDB.
      Screenshot of the phone number as ciphertext
  3. Execute the Lambda decryption function to download ciphertext from DynamoDB and decrypt the phone number using the private key:
    1. In the CloudFormation stack Outputs, locate DecryptFunction and click the URL to open the Lambda console.
    2. Configure a test event using the “Hello World” template.
    3. Choose the Test button.
  4. View the encrypted and decrypted phone number data.
    Screenshot of the encrypted and decrypted phone number data

Summary

In this blog post, we showed you how to use CloudFront field-level encryption to encrypt sensitive data at edge locations and help prevent access from unauthorized systems. The source code for this solution is available on GitHub. For additional information about field-level encryption, see the documentation.

If you have comments about this post, submit them in the “Comments” section below. If you have questions about or issues implementing this solution, please start a new thread on the CloudFront forum.

– Alex and Cameron

Introducing the New GDPR Center and “Navigating GDPR Compliance on AWS” Whitepaper

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/introducing-the-new-gdpr-center-and-navigating-gdpr-compliance-on-aws-whitepaper/

European Union flag

At AWS re:Invent 2017, the AWS Compliance team participated in excellent engagements with AWS customers about the General Data Protection Regulation (GDPR), including discussions that generated helpful input. Today, I am announcing resulting enhancements to our recently launched GDPR Center and the release of a new whitepaper, Navigating GDPR Compliance on AWS. The resources available on the GDPR Center are designed to give you GDPR basics, and provide some ideas as you work out the details of the regulation and find a path to compliance.

In this post, I focus on two of these new GDPR requirements in terms of articles in the GDPR, and explain some of the AWS services and other resources that can help you meet these requirements.

Background about the GDPR

The GDPR is a European privacy law that will become enforceable on May 25, 2018, and is intended to harmonize data protection laws throughout the European Union (EU) by applying a single data protection law that is binding throughout each EU member state. The GDPR not only applies to organizations located within the EU, but also to organizations located outside the EU if they offer goods or services to, or monitor the behavior of, EU data subjects. All AWS services will comply with the GDPR in advance of the May 25, 2018, enforcement date.

We are already seeing customers move personal data to AWS to help solve challenges in complying with the EU’s GDPR because of AWS’s advanced toolset for identifying, securing, and managing all types of data, including personal data. Steve Schmidt, the AWS CISO, has already written about the internal and external work we have been undertaking to help you use AWS services to meet your own GDPR compliance goals.

Article 25 – Data Protection by Design and by Default (Privacy by Design)

Privacy by Design is the integration of data privacy and compliance into the systems development process, enabling applications, systems, and accounts, among other things, to be secure by default. To secure your AWS account, we offer a script to evaluate your AWS account against the full Center for Internet Security (CIS) Amazon Web Services Foundations Benchmark 1.1. You can access this public benchmark on GitHub. Additionally, AWS Trusted Advisor is an online resource to help you improve security by optimizing your AWS environment. Among other things, Trusted Advisor lists a number of security-related controls you should be monitoring. AWS also offers AWS CloudTrail, a logging tool to track usage and API activity. Another example of tooling that enables data protection is Amazon Inspector, which includes a knowledge base of hundreds of rules (regularly updated by AWS security researchers) mapped to common security best practices and vulnerability definitions. Examples of built-in rules include checking for remote root login being enabled or vulnerable software versions installed. These and other tools enable you to design an environment that protects customer data by design.

An accurate inventory of all the GDPR-impacting data is important but sometimes difficult to assess. AWS has some advanced tooling, such as Amazon Macie, to help you determine where customer data is present in your AWS resources. Macie uses advanced machine learning to automatically discover and classify data so that you can protect data, per Article 25.

Article 32 – Security of Processing

You can use many AWS services and features to secure the processing of data regulated by the GDPR. Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch resources in a virtual network that you define. You have complete control over your virtual networking environment, including the selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. With Amazon VPC, you can make the Amazon Cloud a seamless extension of your existing on-premises resources.

AWS Key Management Service (AWS KMS) is a managed service that makes it easy for you to create and control the encryption keys used to encrypt your data, and uses hardware security modules (HSMs) to help protect your keys. Managing keys with AWS KMS allows you to choose to encrypt data either on the server side or the client side. AWS KMS is integrated with several other AWS services to help you protect the data you store with these services. AWS KMS is also integrated with CloudTrail to provide you with logs of all key usage to help meet your regulatory and compliance needs. You can also use the AWS Encryption SDK to correctly generate and use encryption keys, as well as protect keys after they have been used.

We also recently announced new encryption and security features for Amazon S3, including default encryption and a detailed inventory report. Services of this type as well as additional GDPR enablers will be published regularly on our GDPR Center.

Other resources

As you prepare for GDPR, you may want to visit our AWS Customer Compliance Center or Tools for Amazon Web Services to learn about options for building anything from small scripts that delete data to a full orchestration framework that uses AWS Code services.

-Chad

Glenn’s Take on re:Invent Part 2

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-part-2/

Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We’ve got a lot of exciting announcements this week. I’m going to check in to the Architecture blog with my take on what’s interesting about some of the announcements from an cloud architectural perspective. My first post can be found here.

The Media and Entertainment industry has been a rapid adopter of AWS due to the scale, reliability, and low costs of our services. This has enabled customers to create new, online, digital experiences for their viewers ranging from broadcast to streaming to Over-the-Top (OTT) services that can be a combination of live, scheduled, or ad-hoc viewing, while supporting devices ranging from high-def TVs to mobile devices. Creating an end-to-end video service requires many different components often sourced from different vendors with different licensing models, which creates a complex architecture and a complex environment to support operationally.

AWS Media Services
Based on customer feedback, we have developed AWS Media Services to help simplify distribution of video content. AWS Media Services is comprised of five individual services that can either be used together to provide an end-to-end service or individually to work within existing deployments: AWS Elemental MediaConvert, AWS Elemental MediaLive, AWS Elemental MediaPackage, AWS Elemental MediaStore and AWS Elemental MediaTailor. These services can help you with everything from storing content safely and durably to setting up a live-streaming event in minutes without having to be concerned about the underlying infrastructure and scalability of the stream itself.

In my role, I participate in many AWS and industry events and often work with the production and event teams that put these shows together. With all the logistical tasks they have to deal with, the biggest question is often: “Will the live stream work?” Compounding this fear is the reality that, as users, we are also quick to jump on social media and make noise when a live stream drops while we are following along remotely. Worse is when I see event organizers actively selecting not to live stream content because of the risk of failure and and exposure — leading them to decide to take the safe option and not stream at all.

With AWS Media Services addressing many of the issues around putting together a high-quality media service, live streaming, and providing access to a library of content through a variety of mechanisms, I can’t wait to see more event teams use live streaming without the concern and worry I’ve seen in the past. I am excited for what this also means for non-media companies, as video becomes an increasingly common way of sharing information and adding a more personalized touch to internally- and externally-facing content.

AWS Media Services will allow you to focus more on the content and not worry about the platform. Awesome!

Amazon Neptune
As a civilization, we have been developing new ways to record and store information and model the relationships between sets of information for more than a thousand years. Government census data, tax records, births, deaths, and marriages were all recorded on medium ranging from knotted cords in the Inca civilization, clay tablets in ancient Babylon, to written texts in Western Europe during the late Middle Ages.

One of the first challenges of computing was figuring out how to store and work with vast amounts of information in a programmatic way, especially as the volume of information was increasing at a faster rate than ever before. We have seen different generations of how to organize this information in some form of database, ranging from flat files to the Information Management System (IMS) used in the 1960s for the Apollo space program, to the rise of the relational database management system (RDBMS) in the 1970s. These innovations drove a lot of subsequent innovations in information management and application development as we were able to move from thousands of records to millions and billions.

Today, as architects and developers, we have a vast variety of database technologies to select from, which have different characteristics that are optimized for different use cases:

  • Relational databases are well understood after decades of use in the majority of companies who required a database to store information. Amazon Relational Database (Amazon RDS) supports many popular relational database engines such as MySQL, Microsoft SQL Server, PostgreSQL, MariaDB, and Oracle. We have even brought the traditional RDBMS into the cloud world through Amazon Aurora, which provides MySQL and PostgreSQL support with the performance and reliability of commercial-grade databases at 1/10th the cost.
  • Non-relational databases (NoSQL) provided a simpler method of storing and retrieving information that was often faster and more scalable than traditional RDBMS technology. The concept of non-relational databases has existed since the 1960s but really took off in the early 2000s with the rise of web-based applications that required performance and scalability that relational databases struggled with at the time. AWS published this Dynamo whitepaper in 2007, with DynamoDB launching as a service in 2012. DynamoDB has quickly become one of the critical design elements for many of our customers who are building highly-scalable applications on AWS. We continue to innovate with DynamoDB, and this week launched global tables and on-demand backup at re:Invent 2017. DynamoDB excels in a variety of use cases, such as tracking of session information for popular websites, shopping cart information on e-commerce sites, and keeping track of gamers’ high scores in mobile gaming applications, for example.
  • Graph databases focus on the relationship between data items in the store. With a graph database, we work with nodes, edges, and properties to represent data, relationships, and information. Graph databases are designed to make it easy and fast to traverse and retrieve complex hierarchical data models. Graph databases share some concepts from the NoSQL family of databases such as key-value pairs (properties) and the use of a non-SQL query language such as Gremlin. Graph databases are commonly used for social networking, recommendation engines, fraud detection, and knowledge graphs. We released Amazon Neptune to help simplify the provisioning and management of graph databases as we believe that graph databases are going to enable the next generation of smart applications.

A common use case I am hearing every week as I talk to customers is how to incorporate chatbots within their organizations. Amazon Lex and Amazon Polly have made it easy for customers to experiment and build chatbots for a wide range of scenarios, but one of the missing pieces of the puzzle was how to model decision trees and and knowledge graphs so the chatbot could guide the conversation in an intelligent manner.

Graph databases are ideal for this particular use case, and having Amazon Neptune simplifies the deployment of a graph database while providing high performance, scalability, availability, and durability as a managed service. Security of your graph database is critical. To help ensure this, you can store your encrypted data by running AWS in Amazon Neptune within your Amazon Virtual Private Cloud (Amazon VPC) and using encryption at rest integrated with AWS Key Management Service (AWS KMS). Neptune also supports Amazon VPC and AWS Identity and Access Management (AWS IAM) to help further protect and restrict access.

Our customers now have the choice of many different database technologies to ensure that they can optimize each application and service for their specific needs. Just as DynamoDB has unlocked and enabled many new workloads that weren’t possible in relational databases, I can’t wait to see what new innovations and capabilities are enabled from graph databases as they become easier to use through Amazon Neptune.

Look for more on DynamoDB and Amazon S3 from me on Monday.

 

Glenn at Tour de Mont Blanc

 

 

GDPR – A Practical Guide For Developers

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-practical-guide-developers/

You’ve probably heard about GDPR. The new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for gettign your systems in compliance with the regulation.

The regulation is basically a law that must be followed in all European countries (but also applies to non-EU companies that have users in the EU). In this particular case, it applies to companies that are not registered in Europe, but are having European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” posts/whitepapers, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of a EU country, and because of that I’ve been both exposed and myself wrote some legislation. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them), the right to data portability (the user should be able to get a machine-readable dump of their data).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s.

  • “Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case they have been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to cleanup after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests his data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blcokchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
  • Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
  • Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clasues here and there.
  • Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly is that data – depends on the particular usecase. Usually it’s at least the data that you delete with the “forget me” functionality, but may include additional data (e.g. the orders the user has made may not be delete, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when his data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
  • Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.
  • Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
  • Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
  • “See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
  • Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow, where the child should specify the email of a parent, who can then confirm. Obviosuly, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).

Now some “do’s”, which are mostly about the technical measures needed to protect personal data. They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post.

  • Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections. Some databases need gossiping among the nodes – that should also be configured to use encryption
  • Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
  • Encrypt your backups – kind of obvious
  • Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person
  • Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
  • Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
  • Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose
  • Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)

Finally, some “don’t’s”.

  • Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
  • Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old logs files are cleaned up, just in case
  • Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
  • Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
  • Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow , API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

The post GDPR – A Practical Guide For Developers appeared first on Bozho's tech blog.

The 10 Most Viewed Security-Related AWS Knowledge Center Articles and Videos for November 2017

Post Syndicated from Maggie Burke original https://aws.amazon.com/blogs/security/the-10-most-viewed-security-related-aws-knowledge-center-articles-and-videos-for-november-2017/

AWS Knowledge Center image

The AWS Knowledge Center helps answer the questions most frequently asked by AWS Support customers. The following 10 Knowledge Center security articles and videos have been the most viewed this month. It’s likely you’ve wondered about a few of these topics yourself, so here’s a chance to learn the answers!

  1. How do I create an AWS Identity and Access Management (IAM) policy to restrict access for an IAM user, group, or role to a particular Amazon Virtual Private Cloud (VPC)?
    Learn how to apply a custom IAM policy to restrict IAM user, group, or role permissions for creating and managing Amazon EC2 instances in a specified VPC.
  2. How do I use an MFA token to authenticate access to my AWS resources through the AWS CLI?
    One IAM best practice is to protect your account and its resources by using a multi-factor authentication (MFA) device. If you plan use the AWS Command Line Interface (CLI) while using an MFA device, you must create a temporary session token.
  3. Can I restrict an IAM user’s EC2 access to specific resources?
    This article demonstrates how to link multiple AWS accounts through AWS Organizations and isolate IAM user groups in their own accounts.
  4. I didn’t receive a validation email for the SSL certificate I requested through AWS Certificate Manager (ACM)—where is it?
    Can’t find your ACM validation emails? Be sure to check the email address to which you requested that ACM send validation emails.
  5. How do I create an IAM policy that has a source IP restriction but still allows users to switch roles in the AWS Management Console?
    Learn how to write an IAM policy that not only includes a source IP restriction but also lets your users switch roles in the console.
  6. How do I allow users from another account to access resources in my account through IAM?
    If you have the 12-digit account number and permissions to create and edit IAM roles and users for both accounts, you can permit specific IAM users to access resources in your account.
  7. What are the differences between a service control policy (SCP) and an IAM policy?
    Learn how to distinguish an SCP from an IAM policy.
  8. How do I share my customer master keys (CMKs) across multiple AWS accounts?
    To grant another account access to your CMKs, create an IAM policy on the secondary account that grants access to use your CMKs.
  9. How do I set up AWS Trusted Advisor notifications?
    Learn how to receive free weekly email notifications from Trusted Advisor.
  10. How do I use AWS Key Management Service (AWS KMS) encryption context to protect the integrity of encrypted data?
    Encryption context name-value pairs used with AWS KMS encryption and decryption operations provide a method for checking ciphertext authenticity. Learn how to use encryption context to help protect your encrypted data.

The AWS Security Blog will publish an updated version of this list regularly going forward. You also can subscribe to the AWS Knowledge Center Videos playlist on YouTube.

– Maggie