Tag Archives: Disaster Recovery

Automated Disaster Recovery using CloudEndure

Post Syndicated from Ryan Jaeger original https://aws.amazon.com/blogs/architecture/automated-disaster-recovery-using-cloudendure/

Any number of events can cause IT outages and impact business continuity. These include unexpected infrastructure or application outages caused by flooding, earthquakes, fires, hardware failures, or even malicious attacks. Cloud computing opens a new door to support disaster recovery strategies, with benefits such as elasticity, agility, speed to innovate, and cost savings, all of which aid new disaster recovery solutions.

With AWS, organizations can acquire IT resources on demand and pay only for the resources they use. Automating disaster recovery (DR) has always been challenging. This blog post shows how you can use automation to orchestrate recovery and eliminate manual processes. CloudEndure Disaster Recovery (CloudEndure is an AWS company), Amazon Route 53, and AWS Lambda are the building blocks for a cost-effective automated DR solution. The example in this post demonstrates how you can recover a production web application with sub-second Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs) of minutes.

As part of a DR strategy, knowing RPOs and RTOs will determine what kind of solution architecture you need. The RPO represents the point in time of the last recoverable data point (for example, the “last backup”). Any disaster after that point would result in data loss.

The time from the outage to restoration is the RTO. Minimizing RTO and RPO is a cost tradeoff. Restoring from backups and recreating infrastructure after the event is the lowest cost but highest RTO. Conversely, the highest cost and lowest RTO is a solution running a duplicate auto-failover environment.
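To make the two measures concrete, here is a small Python sketch that computes the data-loss window and the downtime for a single incident, which can then be compared against the RPO and RTO targets. The timestamps and function names are illustrative assumptions, not values from this post.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recovery_point: datetime, outage_start: datetime) -> timedelta:
    """Data written after the last recovery point is lost; compare this to the RPO."""
    return outage_start - last_recovery_point


def downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time from outage to restoration; compare this to the RTO."""
    return service_restored - outage_start


# Hypothetical incident: nightly backup at 02:00, outage at 14:30, restored at 18:30.
last_backup = datetime(2020, 1, 15, 2, 0)
outage = datetime(2020, 1, 15, 14, 30)
restored = datetime(2020, 1, 15, 18, 30)

loss = data_loss_window(last_backup, outage)   # 12 hours 30 minutes of data lost
outage_length = downtime(outage, restored)     # 4 hours of downtime
```

A backup-and-restore approach like this hypothetical one yields large values for both numbers; continuous replication and automated failover are what shrink them.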

Solution Overview

CloudEndure is an automated IT resilience solution that lets you recover your environment from unexpected infrastructure or application outages, data corruption, ransomware, or other malicious attacks. It uses block-level continuous data replication, which ensures that target machines are spun up in their most current state during a disaster or drill, so that you can achieve sub-second RPOs. In the event of a disaster, CloudEndure triggers a highly automated machine conversion process and a scalable orchestration engine that can spin up machines in the target AWS Region within minutes. This process enables you to achieve RTOs of minutes. The CloudEndure solution uses a software agent that installs on physical or virtual servers. The agent connects to a self-service, web-based user console, which then issues an API call to the selected AWS target Region to create a Staging Area in the customer’s AWS account designated to receive the source machine’s replicated data.

Architecture

In the above example, a webserver and database server have the CloudEndure Agent installed, and the disk volumes on each server are replicated to a staging environment in the customer’s AWS account. The CloudEndure Replication Server receives the encrypted data replication traffic and writes it to the corresponding EBS volumes. It’s also possible to configure data replication traffic to use VPN or AWS Direct Connect.

With this current setup, if an infrastructure or application outage occurs, a failover to AWS is executed by manually starting the process from the CloudEndure Console. When this happens, CloudEndure creates EC2 instances from the synchronized target EBS volumes. After the failover completes, additional manual steps are needed to change the website’s DNS entry to point to the IP address of the failed over webserver.

Could the CloudEndure failover and DNS update be automated? Yes.

Amazon Route 53 is a highly available and scalable Domain Name System (DNS) web service with three main functions: domain registration, DNS routing, and health checking. A configured Route 53 health check monitors the endpoint of a webserver. If the health check fails over a specified period, an alarm is raised to execute an AWS Lambda function that starts the CloudEndure failover process. In addition to health checks, Route 53 DNS Failover allows the DNS record for the webserver to be automatically updated based on a healthy endpoint. The previously manual process of updating the DNS record to point to the restored web server is now automated. You can also build Route 53 DNS Failover configurations that use decision trees to handle complex configurations.

To illustrate this, the following builds on the example by having a primary, secondary, and tertiary DNS Failover choice for the web application:

How Health Checks Work in Complex Amazon Route 53 Configurations

When the CloudEndure failover action executes, it takes several minutes until the target EC2 instance is launched and configured by CloudEndure. An S3 static web page can be returned to the end user to improve communication while the failover is happening.

To support this example, an Amazon Route 53 DNS failover decision tree can be configured to have a primary, secondary, and tertiary failover. The decision tree logic to support the scenario is the following:

  1. If the primary health check passes, return the primary webserver.
  2. Else, if the secondary health check passes, return the failover webserver.
  3. Else, return the S3 static site.
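The three rules above can be sketched as a small Python function. This only models the decision logic for illustration; Route 53 evaluates the real tree from its failover record sets and health checks, and the function name and return labels here are assumptions.

```python
def resolve_endpoint(primary_healthy: bool, secondary_healthy: bool) -> str:
    """Return which endpoint the failover decision tree would answer with."""
    if primary_healthy:
        return "primary-webserver"
    if secondary_healthy:
        return "failover-webserver"
    # Neither webserver is healthy: fall through to the static page.
    return "s3-static-site"
```

While the failover is still in progress, both health checks fail, so `resolve_endpoint(False, False)` yields the static site; once the recovered webserver passes its health check, the second branch takes over.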

When the Route 53 health check fails when monitoring the primary endpoint for the webserver, a CloudWatch alarm is configured to ALARM after a set time. This CloudWatch alarm then executes a Lambda function that calls the CloudEndure API to begin the failover.
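A minimal sketch of the Lambda side might look like the following. It assumes the CloudWatch alarm notifies an SNS topic that the function subscribes to, which is one common wiring and not something this post specifies. The `start_failover` callable is a hypothetical placeholder for the code that calls the CloudEndure API; that call is deliberately not shown here.

```python
import json


def should_start_failover(event: dict) -> bool:
    """True when any SNS record carries a CloudWatch alarm in the ALARM state."""
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            return True
    return False


def make_handler(start_failover):
    """Build a Lambda handler around a caller-supplied failover function.

    start_failover is a hypothetical callable that would wrap the
    CloudEndure API request; it is an assumption of this sketch.
    """
    def handler(event, context):
        if should_start_failover(event):
            start_failover()
            return {"failover_started": True}
        return {"failover_started": False}
    return handler
```

Injecting the failover function keeps the alarm-parsing logic testable without network access, and makes it easy to guard against repeated launches if the alarm fires more than once.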

In the screenshot below, both health checks are reporting “Unhealthy” while the primary health check is in a state of ALARM. At this point, the DNS failover logic should be returning the path to the static S3 site, and the Lambda function should have executed to start the CloudEndure failover.

The following architecture illustrates the completed scenario:

Conclusion

Having a disaster recovery strategy is critical for business continuity. The benefits of AWS combined with CloudEndure Disaster Recovery create a non-disruptive DR solution that provides minimal RTO and RPO while reducing total cost of ownership for customers. CloudWatch alarms combined with AWS Lambda for serverless computing are building blocks for a variety of automation scenarios.

How to migrate your EC2 Oracle Transparent Data Encryption (TDE) database encryption wallet to CloudHSM

Post Syndicated from Tracy Pierce original https://aws.amazon.com/blogs/security/how-to-migrate-your-ec2-oracle-transparent-data-encryption-tde-database-encryption-wallet-to-cloudhsm/

In this post, I’ll show you how to migrate an encryption wallet for an Oracle database installed on Amazon EC2 from using an outside HSM to using AWS CloudHSM. Transparent Data Encryption (TDE) for Oracle is a common use case for Hardware Security Module (HSM) devices like AWS CloudHSM. Oracle TDE uses what is called “envelope encryption.” Envelope encryption is when the encryption key used to encrypt the tables of your database is in turn encrypted by a master key that resides either in a software keystore or on a hardware keystore, like an HSM. This master key is non-exportable by design to protect the confidentiality and integrity of your database encryption. This gives you a more granular encryption scheme on your data.

An encryption wallet is an encrypted container used to store the TDE master key for your database. The encryption wallet needs to be opened manually after a database startup and prior to the TDE encrypted data being accessed, so the master key is available for data decryption. The process I talk about in this post can be used with any non-AWS hardware or software encryption wallet, or a hardware encryption wallet that utilizes AWS CloudHSM Classic. For my examples in this tutorial, I will be migrating from a CloudHSM Classic to a CloudHSM cluster. It is worth noting that Gemalto has announced the end-of-life for Luna 5 HSMs, which our CloudHSM Classic fleet uses.

Note: You cannot migrate from an Oracle instance in Amazon Relational Database Service (Amazon RDS) to AWS CloudHSM. You must install the Oracle database on an Amazon EC2 instance. Amazon RDS is not currently integrated with AWS CloudHSM.

When you move from one type of encryption wallet to another, new TDE master keys are created inside the new wallet. To ensure that you have access to backups that rely on your old HSM, consider leaving the old HSM running for your normal recovery window period. The steps I discuss will perform the decryption of your TDE keys and then re-encrypt them with the new TDE master key for you.

Once you’ve migrated your Oracle databases to use AWS CloudHSM as your encryption wallet, it’s also a good idea to set up cross-region replication for disaster recovery efforts. With copies of your database and encryption wallet in another region, you can be back in production quickly should a disaster occur. I’ll show you how to take advantage of this by setting up cross-region snapshots of your Oracle database Amazon Elastic Block Store (EBS) volumes and copying backups of your CloudHSM cluster between regions.

Solution overview

For this solution, you will modify the Oracle database’s encryption wallet to use AWS CloudHSM. This is completed in three steps, which will be detailed below. First, you will switch from the current encryption wallet, which is your original HSM device, to a software wallet. This is done by reverse migrating to a local wallet. Second, you’ll replace the PKCS#11 provider of your original HSM with the CloudHSM PKCS#11 software library. Third, you’ll switch the encryption wallet for your database to your CloudHSM cluster. Once this process is complete, your database will automatically re-encrypt all data keys using the new master key.

To complete the disaster recovery (DR) preparation portion of this post, you will perform two more steps. These consist of copying over snapshots of your EC2 volumes and your CloudHSM cluster backups to your DR region. The following diagram illustrates the steps covered in this post.
 

Figure 1: Steps to migrate your EC2 Oracle TDE database encryption wallet to CloudHSM


  1. Switch the current encryption wallet for the Oracle database TDE from your original HSM to a software wallet via a reverse migration process.
  2. Replace the PKCS#11 provider of your original HSM with the AWS CloudHSM PKCS#11 software library.
  3. Switch your encryption wallet to point to your AWS CloudHSM cluster.
  4. (OPTIONAL) Set up cross-region copies of the EC2 instance housing your Oracle database.
  5. (OPTIONAL) Set up a cross-region copy of your recent CloudHSM cluster backup.

Prerequisites

This process assumes you have the below items already set up or configured:

Deploy the solution

Now that you have the basic steps, I’ll go into more detail on each of them. I’ll show you the steps to migrate your encryption wallet to a software wallet using a reverse migration command.

Step 1: Switching the current encryption wallet for the Oracle database TDE from your original HSM to a software wallet via a reverse migration process.

To begin, you must configure the sqlnet.ora file for the reverse migration. In Oracle databases, the sqlnet.ora file is a plain-text configuration file that contains information like encryption, route of connections, and naming parameters that determine how the Oracle server and client must use the capabilities for network database access. You will want to create a backup so you can roll back in the event of any errors. You can make a copy with the below command. Make sure to replace </path/to/> with the actual path to your sqlnet.ora file location. The standard location for this file is “$ORACLE_HOME/network/admin”, but check your setup to ensure this is correct.

cp </path/to/>sqlnet.ora </path/to/>sqlnet.ora.backup

The software wallet must be created before you edit this file, and it should preferably be empty. Then, using your favorite text editor, open the sqlnet.ora file and set the below configuration. If an entry already exists, replace it with the below text.


ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=FILE)(METHOD_DATA=
    (DIRECTORY=<path_to_keystore>)))

Make sure to replace <path_to_keystore> with the directory location of your destination wallet. The destination wallet is the path you choose for the local software wallet. Oracle uses the words “keystore” and “wallet” interchangeably, and so does this post. Next, you’ll configure the wallet for the reverse migration. For this, you will use the ADMINISTER KEY MANAGEMENT statement with the SET ENCRYPTION KEY and REVERSE MIGRATE clauses as shown in the example below.

By using the REVERSE MIGRATE USING clause in your statement, you ensure the existing TDE table keys and tablespace encryption keys are decrypted by the hardware wallet TDE master key and then re-encrypted with the software wallet TDE master key. You will need to log into the database instance as a user that has been granted the ADMINISTER KEY MANAGEMENT or SYSKM privileges to run this statement. An example of the login is below. Make sure to replace the <sec_admin> and <password> with your administrator user name and password for the database.


sqlplus c##<sec_admin> as syskm
Enter password: <password> 
Connected.

Once you’re connected, you’ll run the SQL statement below. Make sure to replace <password> with your own existing wallet password and <username:password> with your own existing wallet user ID and password. We are going to run this statement with the WITH BACKUP parameter, as it’s always ideal to take a backup in case something goes incorrectly.

ADMINISTER KEY MANAGEMENT SET ENCRYPTION KEY IDENTIFIED BY <password> REVERSE MIGRATE USING "<username:password>" WITH BACKUP;

If successful, you will see the text keystore altered. When complete, you do not need to restart your database or manually re-open the local wallet as the migration process loads this into memory for you.

With the migration complete, you’ll now move onto the next step of replacing the PKCS#11 provider of your original HSM with the CloudHSM PKCS#11 software library. This library is a PKCS#11 standard implementation that communicates with the HSMs in your cluster and is compliant with PKCS#11 version 2.40.

Step 2: Replacing the PKCS#11 provider of your original HSM with the AWS CloudHSM PKCS#11 software library.

You’ll begin by installing the software library with the below two commands.

wget https://s3.amazonaws.com/cloudhsmv2-software/CloudHsmClient/EL6/cloudhsm-client-pkcs11-latest.el6.x86_64.rpm

sudo yum install -y ./cloudhsm-client-pkcs11-latest.el6.x86_64.rpm

When installation completes, you will be able to find the CloudHSM PKCS#11 software library files in the directory, the default directory for AWS CloudHSM’s software library installs. To ensure processing speed and throughput capabilities of the HSMs, I suggest installing a Redis cache as well. This cache stores key handles and attributes locally, so you may access them without making a call to the HSMs. As this step is optional and not required for this post, I will leave the link for installation instructions here. With the software library installed, you want to ensure the CloudHSM client is running. The command below starts the client if it is not already running.

sudo start cloudhsm-client

Step 3: Switching your encryption wallet to point to your AWS CloudHSM cluster.

Once you’ve verified the client is running, you’re going to create another backup of the sqlnet.ora file. It’s always a good idea to take backups before making any changes. The command would be similar to below, replacing </path/to/> with the actual path to your sqlnet.ora file.

cp </path/to/>sqlnet.ora </path/to/>sqlnet.ora.backup2

With this done, again open the sqlnet.ora file with your favorite text editor. You are going to edit the line encryption_wallet_location to resemble the below text.


ENCRYPTION_WALLET_LOCATION=
  (SOURCE=(METHOD=HSM))

Save the file and exit. You will need to create the directory where your Oracle database will expect to find the library file for the AWS CloudHSM PKCS#11 software library. You do this with the command below.

sudo mkdir -p /opt/oracle/extapi/64/hsm

With the directory created, you next copy over the CloudHSM PKCS#11 software library from the original installation directory to this one. It is important this new directory only contain the one library file. Should any files exist in the directory that are not directly related to the way you installed the CloudHSM PKCS#11 software library, remove them. The command to copy is below.

sudo cp /opt/cloudhsm/lib/libcloudhsm_pkcs11_standard.so /opt/oracle/extapi/64/hsm
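Because the directory should hold only the one library file, a quick check after the copy can catch stray files early. The following Python sketch is a hypothetical helper, not part of the AWS or Oracle tooling; the function names are assumptions, and the default file name is taken from the copy command above.

```python
from pathlib import Path
from typing import List

LIBRARY_NAME = "libcloudhsm_pkcs11_standard.so"


def library_present(hsm_dir: str, expected: str = LIBRARY_NAME) -> bool:
    """True when the expected library file exists in hsm_dir."""
    return (Path(hsm_dir) / expected).is_file()


def unexpected_files(hsm_dir: str, expected: str = LIBRARY_NAME) -> List[str]:
    """Return the names of all entries in hsm_dir other than the expected library."""
    return sorted(p.name for p in Path(hsm_dir).iterdir() if p.name != expected)
```

For example, `unexpected_files("/opt/oracle/extapi/64/hsm")` should return an empty list and `library_present("/opt/oracle/extapi/64/hsm")` should return `True` before you continue.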

Now, modify the ownership of the directory and everything inside. The Oracle user must have access to these library files to run correctly. The command to do this is below.

sudo chown -R oracle:dba /opt/oracle
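Before starting the database, it can be worth confirming that sqlnet.ora really points at the HSM. The Python sketch below is a hypothetical helper that extracts the METHOD value from the ENCRYPTION_WALLET_LOCATION entry. It handles only the simple entry layouts shown in this post and makes no claim to parse every valid sqlnet.ora file.

```python
import re
from typing import Optional


def wallet_method(sqlnet_text: str) -> Optional[str]:
    """Return the METHOD of ENCRYPTION_WALLET_LOCATION (e.g. 'FILE' or 'HSM'), or None."""
    # Drop comment lines, then strip whitespace so multi-line entries match.
    lines = [ln for ln in sqlnet_text.splitlines() if not ln.lstrip().startswith("#")]
    compact = re.sub(r"\s+", "", "".join(lines))
    match = re.search(
        r"ENCRYPTION_WALLET_LOCATION=\(SOURCE=\(METHOD=(\w+)\)",
        compact,
        re.IGNORECASE,
    )
    return match.group(1).upper() if match else None
```

Reading your sqlnet.ora file and passing its contents to `wallet_method` should give `"HSM"` at this point in the process, and `"FILE"` after Step 1.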

With that done, you can start your Oracle database. This completes the migration of your encryption wallet and TDE keys from your original encryption wallet to a local wallet, and finally to CloudHSM as the new encryption wallet. Should you decide you wish to create new TDE master encryption keys on CloudHSM, you can follow the steps here to do so.

Steps 4 and 5, the cross-region copies, are optional, but helpful in the event you must restore your database to production quickly. For customers that leverage DR environments, we have two great blog posts here and here to walk you through each step of the cross-region replication process. The first uses a combination of AWS Step Functions and Amazon CloudWatch Events to copy your EBS snapshots to your DR region, and the second showcases how to copy your CloudHSM cluster backups to your DR region.

Summary

In this post, I walked you through how to migrate your Oracle TDE database encryption wallet to point it to CloudHSM for secure storage of your TDE keys. I showed you how to properly install the CloudHSM PKCS#11 software library and place it in the directory for Oracle to find and use. This process can be used to migrate most on-premises encryption wallets to AWS CloudHSM to ensure security of your TDE keys and meet compliance requirements.


Author

Tracy Pierce

Tracy Pierce is a Senior Cloud Support Engineer at AWS. She enjoys the peculiar culture of Amazon and uses that to ensure every day is exciting for her fellow engineers and customers alike. Customer Obsession is her highest priority and she shows this by improving processes, documentation, and building tutorials. She has her AS in Computer Security & Forensics from SCTD, SSCP certification, AWS Developer Associate certification, and AWS Security Specialist certification. Outside of work, she enjoys time with friends, her Great Dane, and three cats. She keeps work interesting by drawing cartoon characters on the walls at request.