Store Amazon EMR in-transit data encryption certificates using AWS Secrets Manager

Post Syndicated from Hao Wang original https://aws.amazon.com/blogs/big-data/store-amazon-emr-in-transit-data-encryption-certificates-using-aws-secrets-manager/

With Amazon EMR, you can use a security configuration to specify settings for encrypting data in transit. When in-transit encryption is configured, you can enable application-specific encryption features, for example:

  • Hadoop HDFS NameNode or DataNode user interfaces use HTTPS
  • Hadoop MapReduce encrypted shuffle uses Transport Layer Security (TLS)
  • Presto nodes internal communication uses SSL/TLS (Amazon EMR version 5.6.0 and later only)
  • Spark component internal RPC communication, such as the block transfer service and the external shuffle service, is encrypted using the AES-256 cipher in Amazon EMR versions 5.9.0 and later
  • HTTP protocol communication with user interfaces such as Spark History Server and HTTPS-enabled file servers is encrypted using Spark’s SSL configuration

The security configuration of Amazon EMR allows you to set up TLS certificates to encrypt data in transit. A security configuration provides the following options to specify TLS certificates:

  • As a path to a .zip file in an Amazon Simple Storage Service (Amazon S3) bucket that contains all certificates
  • Through a custom certificate provider as a Java class

In many cases, company security policies prohibit storing any type of sensitive information in an S3 bucket, including certificate private keys. For that reason, the only remaining option to secure data in transit on Amazon EMR is to configure the custom certificate provider.

In this post, I guide you through the configuration process and provide Java code samples to secure data in transit on Amazon EMR by storing TLS custom certificates using AWS Secrets Manager.

Secrets Manager helps you protect secrets needed to access your applications, services, and IT resources. The service enables you to easily rotate, manage, and retrieve database credentials, API keys, and other secrets throughout their lifecycle. Users and applications retrieve secrets with a call to Secrets Manager APIs, eliminating the need to hardcode sensitive information in plain text.

Solution overview

The following diagram illustrates the solution architecture.

When an EMR cluster starts with a custom certificate provider configured for in-transit encryption, Amazon EMR calls the provider to get the certificates. A custom certificate provider is a Java class that implements the TLSArtifactsProvider interface.

To make this solution work, you need a secure place to store certificates that can also be accessed by Java code. This post uses Secrets Manager, which provides a mechanism for managing certificates, and encrypts them using AWS Key Management Service (AWS KMS) keys.

To implement this solution, you complete the following high-level steps:

  1. Create a certificate.
  2. Store your certificate to Secrets Manager.
    1. Create a secret for a private key.
    2. Create a secret for a public key.
  3. Implement TLSArtifactsProvider.
  4. Create the Amazon EMR security configuration.
  5. Modify the Amazon Elastic Compute Cloud (Amazon EC2) instance profile role to get the certificate from Secrets Manager.
  6. Start the Amazon EMR cluster.

Create a certificate

For demonstration purposes, this post uses OpenSSL to create a self-signed certificate. See the following code:

openssl req -x509 -newkey rsa:4096 -keyout privateKey.pem -out certificateChain.pem -days 365 -subj "/C=US/ST=MA/L=Boston/O=EMR/OU=EMR/CN=*.ec2.internal" -nodes

This command creates a self-signed certificate with a 4096-bit RSA key. For production systems, we recommend using a trusted certificate authority (CA) to issue certificates.

The command above has the following parameters:

  • keyout – The output file in which to store the private key.
  • out – The output file in which to store the certificate.
  • days – The number of days for which the certificate is valid.
  • subj – The subject name for the new request. The common name (CN) must match the domain name specified in the DHCP options set that is assigned to the virtual private cloud (VPC). The default is ec2.internal in us-east-1 and region.compute.internal in other Regions. The * prefix makes this a wildcard certificate.
  • nodes – Creates the private key without a passphrase, that is, unencrypted.

The command produces two files, a private key and a certificate that contains the matching public key:

  • privateKey.pem – The private key
  • certificateChain.pem – The certificate (public key)
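Because the CN must match the domain name of the cluster nodes, it can help to check a candidate subject against the private DNS names of your instances before you store the certificate. The following Java sketch is not part of the sample project. The class WildcardCnCheck and its matches method are hypothetical names, and the logic assumes the common convention that * stands for exactly one leftmost DNS label.

```java
import java.util.Locale;

// Hypothetical helper, not part of the sample project.
class WildcardCnCheck {

    // Returns true when hostName is covered by the certificate common name.
    // Assumption: "*" may only replace the single leftmost DNS label.
    static boolean matches(String commonName, String hostName) {
        String cn = commonName.toLowerCase(Locale.ROOT);
        String host = hostName.toLowerCase(Locale.ROOT);
        if (!cn.startsWith("*.")) {
            // No wildcard: the names must be identical.
            return cn.equals(host);
        }
        String suffix = cn.substring(1); // for example ".ec2.internal"
        if (!host.endsWith(suffix)) {
            return false;
        }
        String firstLabel = host.substring(0, host.length() - suffix.length());
        // The wildcard must cover one non-empty label and no extra dots.
        return !firstLabel.isEmpty() && !firstLabel.contains(".");
    }

    public static void main(String[] args) {
        // A node in a VPC that uses the ec2.internal domain name.
        System.out.println(matches("*.ec2.internal", "ip-10-0-0-12.ec2.internal")); // true
        // A node in a VPC that uses a Region-specific domain name.
        System.out.println(matches("*.ec2.internal", "ip-10-0-0-12.us-west-2.compute.internal")); // false
    }
}
```

If the second case applies to your VPC, change the CN in the -subj argument to match the domain name in your DHCP options set.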

Store your certificate to Secrets Manager

In this section, we walk through the steps to create secrets for a private key and a public key.

Create a secret for a private key

To create a secret for a private key, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. For the secret type, select Other type of secrets.
  3. On the Plaintext tab in the Key/value pairs section, paste the content of privateKey.pem.
  4. For Encryption key, choose DefaultEncryptionKey.
  5. Choose Next.
  6. For Secret name, enter emrprivate.
  7. For Resource permissions, optionally add or edit a resource policy to access secrets across AWS accounts. For more information, refer to Permissions policy examples.
  8. Choose Next.
  9. Choose Store.

Create a secret for a public key

To create a secret for a public key, complete the following steps:

  1. On the Secrets Manager console, choose Store a new secret.
  2. For the secret type, select Other type of secrets.
  3. On the Plaintext tab in the Key/value pairs section, paste the content of certificateChain.pem.
  4. For Encryption key, choose DefaultEncryptionKey.
  5. Choose Next.
  6. For Secret name, enter emrcert.
  7. For Resource permissions, optionally add or edit a resource policy to access secrets across AWS accounts.
  8. Choose Next.
  9. Choose Store.

Implement TLSArtifactsProvider

This section describes the flow in the Java code only. You can download the full code from GitHub.

The interface declares a single method, getTlsArtifacts, which must return the certificates. The Java class EmrTlsFromSecretsManager implements the following TLSArtifactsProvider interface, which is declared as an abstract class:

public abstract class TLSArtifactsProvider {

  public abstract TLSArtifacts getTlsArtifacts();
}

In the provided code example, we implement the following logic:

@Override
public TLSArtifacts getTlsArtifacts() {

   init();

   // Parse the PEM-encoded private key retrieved from Secrets Manager
   PrivateKey privateKey = getPrivateKey(this.tlsPrivateKey);

   // Parse the PEM-encoded certificate chain and certificate
   List<Certificate> certChain = getX509FromString(this.tlsCertificateChain);
   List<Certificate> certs = getX509FromString(this.tlsCertificate);

   return new TLSArtifacts(privateKey, certChain, certs);
}

The methods called in this code are as follows:

  • init() – Includes the following:
    • readTags() – Reads the secret ARNs from the Amazon EMR tags
    • getCertificates() – Gets the certificates from Secrets Manager
  • getX509FromString() – Converts certificates to an X509 format
  • getPrivateKey() – Converts the private key to the correct format
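The two conversion helpers can be written with the JDK alone. The following is a minimal sketch rather than the code from the GitHub project. It assumes that the private key is an unencrypted RSA key in PKCS#8 PEM format (the -----BEGIN PRIVATE KEY----- header), and the class name PemParsingSketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.PrivateKey;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.security.spec.PKCS8EncodedKeySpec;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical class name; a sketch of the two conversion helpers.
class PemParsingSketch {

    // Parses every certificate found in a PEM string into X.509 objects.
    static List<Certificate> getX509FromString(String pem) throws CertificateException {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        ByteArrayInputStream in =
                new ByteArrayInputStream(pem.getBytes(StandardCharsets.US_ASCII));
        return new ArrayList<>(factory.generateCertificates(in));
    }

    // Converts an unencrypted PKCS#8 PEM string into a PrivateKey.
    // Assumption: the key is RSA, as in the OpenSSL command used in this post.
    static PrivateKey getPrivateKey(String pem) throws GeneralSecurityException {
        String base64 = pem
                .replace("-----BEGIN PRIVATE KEY-----", "")
                .replace("-----END PRIVATE KEY-----", "")
                .replaceAll("\\s", ""); // drop line breaks and padding whitespace
        byte[] der = Base64.getDecoder().decode(base64);
        return KeyFactory.getInstance("RSA").generatePrivate(new PKCS8EncodedKeySpec(der));
    }
}
```

A key file that starts with a different header, such as -----BEGIN RSA PRIVATE KEY-----, is not in PKCS#8 format and would need to be converted before this sketch can parse it.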

Compile the Java project to produce the file emr-tls-provider-samples-0.1-jar-with-dependencies.jar. Alternatively, you can download the JAR file from GitHub.

Create the Amazon EMR security configuration

To create the Amazon EMR security configuration, complete the following steps:

  1. Upload the emr-tls-provider-samples-0.1-jar-with-dependencies.jar file to an S3 bucket.
  2. On the Amazon EMR console, choose Security configurations, then choose Create.
  3. Enter a name for your new security configuration; for example, emr-tls-ssm.
  4. Select Enable in-transit encryption.
  5. For Certificate provider type, choose Custom.
  6. For Custom key provider location, enter the Amazon S3 path to the Java JAR file.
  7. For Certificate provider class, enter the name of the Java class. In the example code, the name is com.amazonaws.awssamples.EmrTlsFromSecretsManager.
  8. Configure the at-rest encryption as required.
  9. Choose Create.

Modify the EC2 instance profile role

Applications running on Amazon EMR assume and use the Amazon EMR role for Amazon EC2 to interact with other AWS services. To grant permissions to get certificates from Secrets Manager, add the following policy to your EC2 instance profile role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:<region>:<account-id>:secret:emrprivate-*",
                "arn:aws:secretsmanager:<region>:<account-id>:secret:emrcert-*"
            ]
        }
    ]
}

Make sure you limit the scope of the Secrets Manager policy to only the certificates that are required for provisioning.

Start the cluster

To reuse the same Java JAR file with different certificates and configurations, you can provide secret ARNs to EmrTlsFromSecretsManager through Amazon EMR tags, rather than embedding them in Java code.

In this example, we use the following tags:

  • sm:ssl:emrcert – The ARN of the Secrets Manager secret that stores the CA-signed certificate
  • sm:ssl:emrprivate – The ARN of the Secrets Manager secret that stores the private key of the CA-signed certificate
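Once the provider has read the cluster tags, it needs to pick out these two values and fail early if either is missing or malformed. The following Java sketch illustrates that step only. It assumes the tags have already been fetched into a Map (how the sample project reads them is not shown here), and the class SecretArnTags and its requireSecretArn method are hypothetical names.

```java
import java.util.Map;

// Hypothetical helper; assumes the cluster tags are already available as a Map.
class SecretArnTags {

    // Tag keys used in this post.
    static final String CERT_TAG = "sm:ssl:emrcert";
    static final String PRIVATE_KEY_TAG = "sm:ssl:emrprivate";

    // Returns the secret ARN stored under tagKey, or throws if the tag is
    // missing or does not look like a Secrets Manager secret ARN.
    static String requireSecretArn(Map<String, String> tags, String tagKey) {
        String value = tags.get(tagKey);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing cluster tag: " + tagKey);
        }
        String arn = value.trim();
        // Expected shape: arn:<partition>:secretsmanager:<region>:<account-id>:secret:<name>
        String[] parts = arn.split(":", 7);
        boolean looksLikeSecretArn = parts.length == 7
                && parts[0].equals("arn")
                && parts[2].equals("secretsmanager")
                && parts[5].equals("secret")
                && !parts[6].isEmpty();
        if (!looksLikeSecretArn) {
            throw new IllegalStateException("Tag " + tagKey + " is not a secret ARN: " + arn);
        }
        return arn;
    }
}
```

Failing with a clear message at this point makes a mistyped tag easier to diagnose than a later error from the Secrets Manager call.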

Validation

After the cluster starts successfully, you can access the HDFS NameNode and DataNode UIs over HTTPS. For more information, see View web interfaces hosted on Amazon EMR clusters.

Clean up

If you don’t need the resources you created in the earlier steps, you can delete the Secrets Manager secrets and EMR cluster in order to avoid additional charges.

  1. On the Secrets Manager console, select the secrets you created.
  2. On the Actions menu, choose Delete secret. This doesn’t delete the secrets immediately, because you need to set a waiting period during which the secrets can be restored, if needed. The minimum waiting period is 7 days.
  3. On the Amazon EMR console, select the cluster you created.
  4. Choose Terminate.

The process of deleting the EMR cluster takes a few minutes to complete.

Conclusion

In this post, we demonstrated how to implement a custom Amazon EMR TLSArtifactsProvider and use Secrets Manager to store certificates. This gives you a more secure way to store and use certificates for Amazon EMR in-transit data encryption.


About the author

Hao Wang is a Senior Big Data Architect at AWS. Hao actively works with customers building large scale data platforms on AWS. He has a background as a software architect on implementing distributed software systems. In his spare time, he enjoys reading and outdoor activities with his family.