All posts by Praveen Kumar Jeyarajan

How to unit test and deploy AWS Glue jobs using AWS CodePipeline

Post Syndicated from Praveen Kumar Jeyarajan original https://aws.amazon.com/blogs/devops/how-to-unit-test-and-deploy-aws-glue-jobs-using-aws-codepipeline/

This post shows you how to unit test Python-based AWS Glue ETL jobs using the PyTest framework in AWS CodePipeline. Several options exist today for unit testing Python scripts for Glue jobs in a local environment. Although you can set up a local development environment to build and unit test Python-based Glue jobs by following the documentation, replicating the same procedure in a DevOps pipeline is difficult and time consuming.

Unit test scripts are one of the initial quality gates that developers use to deliver a high-quality build. These scripts are reused during regression testing to verify that existing functionality remains intact and that new releases don't disrupt key application features. Most regression test suites are expected to run as part of the DevOps pipeline. Unit testing is a fundamental practice that evaluates whether each unit of code functions as expected, and it provides a mechanism to confirm that software quality hasn't been compromised. One of the difficulties in building Python-based Glue ETL jobs is incorporating their unit tests into a DevOps pipeline, especially when modernizing mainframe ETL processes to modern tech stacks on AWS.

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. AWS Glue provides all of the capabilities needed for data integration. This means that you can start analyzing your data and putting it to use in minutes rather than months. AWS Glue provides both visual and code-based interfaces to make data integration easier.

Prerequisites

GitHub Repository

Amazon ECR Image URI for Glue Library

Solution overview

A typical enterprise-scale DevOps pipeline is illustrated in the following diagram. This solution describes how to incorporate unit testing of Python-based AWS Glue ETL processes into such a pipeline on AWS.

Figure 1 Solution Overview

The GitHub repository aws-glue-jobs-unit-testing contains a sample Python-based Glue job in the src folder, its associated unit test cases built with the PyTest framework in the tests folder, and an AWS CloudFormation template written in YAML in the deploy folder. AWS CodeBuild supports custom container images as runtime environments. This feature is used to build the project with the Glue libraries image from the Amazon ECR Public repository, which can run the code package and demonstrate the unit testing integration.
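
To give a sense of what these tests look like, the following is a minimal PyTest sketch in the same spirit. The transformation function and column names here are hypothetical stand-ins rather than the exact code in the repository; the pattern of building a small Spark DataFrame and asserting on the transformed output is what the tests folder demonstrates.

    import pytest
    from pyspark.sql import SparkSession

    # Hypothetical transformation under test; the repository's src folder
    # holds the actual Glue job logic.
    def filter_active_records(df):
        return df.filter(df.status == "active")

    @pytest.fixture(scope="module")
    def spark():
        # The Glue library container image ships with Spark, so a plain
        # local session is enough for unit tests in CodeBuild.
        return SparkSession.builder.master("local[1]").appName("glue-unit-test").getOrCreate()

    def test_filter_active_records(spark):
        df = spark.createDataFrame(
            [("1", "active"), ("2", "inactive")], ["id", "status"]
        )
        result = filter_active_records(df)
        assert result.count() == 1
        assert result.first()["id"] == "1"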

Solution walkthrough

Time to read: 7 min
Time to complete: 15-20 min
Learning level: 300
Services used: AWS CodePipeline, AWS CodeCommit, AWS CodeBuild, Amazon Elastic Container Registry (Amazon ECR) Public repositories, AWS CloudFormation

The container image in the Amazon ECR Public repository for AWS Glue libraries includes all of the binaries required to run and unit test PySpark-based AWS Glue ETL jobs locally. The public repository has three image tags, one for each supported AWS Glue version. To demonstrate the solution, we use the image tag glue_libs_3.0.0_image_01 in this post. To use this container image as the runtime image in CodeBuild, copy the image URI corresponding to the image tag that you intend to use, as shown in the following image.

Figure 2 Select Glue Library from Public ECR

The aws-glue-jobs-unit-testing GitHub repository contains a CloudFormation template, pipeline.yml, which deploys a CodePipeline with CodeBuild projects to build, test, and publish the AWS Glue job. As shown in the following snippet, the image URI copied from the Amazon ECR Public gallery is used to create the CodeBuild project that tests the code.

  TestBuild:
    Type: AWS::CodeBuild::Project
    Properties:
      Artifacts:
        Type: CODEPIPELINE
      BadgeEnabled: false
      Environment:
        ComputeType: BUILD_GENERAL1_LARGE
        Image: "public.ecr.aws/glue/aws-glue-libs:glue_libs_3.0.0_image_01"
        ImagePullCredentialsType: CODEBUILD
        PrivilegedMode: false
        Type: LINUX_CONTAINER
      Name: !Sub "${RepositoryName}-${BranchName}-build"
      ServiceRole: !GetAtt CodeBuildRole.Arn  

The pipeline performs the following operations:

  1. It uses the CodeCommit repository as the source and delivers the most recent code from the master branch to the CodeBuild project for further processing.
  2. The next stage is build and test, in which the latest code from the previous stage is unit tested and the test report is published to CodeBuild report groups.
  3. If all of the test results pass, then the next CodeBuild project is launched to publish the code to an Amazon Simple Storage Service (Amazon S3) bucket.
  4. After the publish stage completes successfully, the final stage deploys the AWS Glue job using the CloudFormation template in the deploy folder.

Deploying the solution

Set up

Now we’ll deploy the solution using a CloudFormation template.

  • Using the GitHub web interface, download the code.zip file from the aws-glue-jobs-unit-testing repository. This zip file contains the repository's src, tests, and deploy folders. You can also create the zip file yourself using command-line tools such as git and zip. To create the zip file on Linux or macOS, open a terminal and enter the following commands.
git clone https://github.com/aws-samples/aws-glue-jobs-unit-testing.git
cd aws-glue-jobs-unit-testing
git checkout master
zip -r code.zip src/ tests/ deploy/
  • Sign in to the AWS Management Console and choose the AWS Region of your choice.
  • Create an Amazon S3 bucket. For more information, see How Do I Create an S3 Bucket? in the AWS documentation.
  • Upload the downloaded zip package, code.zip, to the Amazon S3 bucket that you created.

In this example, I created an Amazon S3 bucket named aws-glue-artifacts-us-east-1 in the N. Virginia (us-east-1) Region, and used the console to upload the zip package from the GitHub repository to the Amazon S3 bucket.

Figure 3 Upload code.zip file to S3 bucket

Creating the stack

  1.  In the CloudFormation console, choose Create stack.
  2. On the Specify template page, choose Upload a template file, and then choose the pipeline.yml template downloaded from the GitHub repository.

Figure 4 Upload pipeline.yml template to create a new CloudFormation stack

  3. Specify the following parameters:
  • Stack name: glue-unit-testing-pipeline (Choose a stack name of your choice)
  • ApplicationStackName: glue-codepipeline-app (This is the name of the CloudFormation stack that will be created by the pipeline)
  • BranchName: master (This is the name of the branch to be created in the CodeCommit repository to check-in the code from the Amazon S3 bucket zip file)
  • BucketName: aws-glue-artifacts-us-east-1 (This is the name of the Amazon S3 bucket that contains the zip file. This bucket will also be used by the pipeline for storing code artifacts)
  • CodeZipFile: code.zip (This is the key name of the sample code Amazon S3 object that you uploaded. The object must be a zip file)
  • RepositoryName: aws-glue-unit-testing (This is the name of the CodeCommit repository that will be created by the stack)
  • TestReportGroupName: glue-unittest-report (This is the name of the CodeBuild test report group that will be created to store the unit test reports)

Figure 5 Fill parameters for stack creation

  4. Choose Next, and then choose Next again.
  5. On the Review page, under Capabilities, select the following option:
  • I acknowledge that CloudFormation might create IAM resources with custom names.

Figure 6 Acknowledge IAM roles creation

  6. Choose Create stack to begin the stack creation process. Stack creation takes approximately 5-7 minutes. Once it's complete, the resources that were created are displayed on the Resources tab.

Figure 7 Successful completion of stack creation
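
If you prefer to create the stack programmatically rather than through the console, a boto3 sketch along the following lines should work. This is a sketch, not part of the original walkthrough; the parameter values mirror the ones above, so adjust them to your environment.

    import boto3

    cloudformation = boto3.client("cloudformation", region_name="us-east-1")

    # pipeline.yml is the template downloaded from the GitHub repository.
    with open("pipeline.yml") as template:
        cloudformation.create_stack(
            StackName="glue-unit-testing-pipeline",
            TemplateBody=template.read(),
            Parameters=[
                {"ParameterKey": "ApplicationStackName", "ParameterValue": "glue-codepipeline-app"},
                {"ParameterKey": "BranchName", "ParameterValue": "master"},
                {"ParameterKey": "BucketName", "ParameterValue": "aws-glue-artifacts-us-east-1"},
                {"ParameterKey": "CodeZipFile", "ParameterValue": "code.zip"},
                {"ParameterKey": "RepositoryName", "ParameterValue": "aws-glue-unit-testing"},
                {"ParameterKey": "TestReportGroupName", "ParameterValue": "glue-unittest-report"},
            ],
            # The template creates IAM resources with custom names, so this
            # capability must be acknowledged, just as in the console flow.
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )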

The stack automatically creates a CodeCommit repository with the initial code checked in from the zip file uploaded to the Amazon S3 bucket. It also creates a CodePipeline pipeline that uses the CodeCommit repository as its source. In the above example, the CodeCommit repository is aws-glue-unit-testing, and the pipeline is aws-glue-unit-testing-pipeline.

Testing the solution

To test the deployed pipeline, open the CodePipeline console and select the pipeline created by the CloudFormation stack. Select the Release Change button on the pipeline page.

Figure 8 Choose Release Change on pipeline page

The pipeline begins its execution with the most recent code in the CodeCommit repository.

When the Test_and_Build phase is finished, select the Details link to examine the execution logs.

Figure 9 Successfully completed the Test_and_Build stage

Select the Reports tab, and choose the test report from Report history to view the unit test execution results.

Figure 10 Test report from pipeline execution

Finally, after the deployment stage is complete, you can view, run, and monitor the deployed AWS Glue job on the AWS Glue console page. For more information, refer to the Running and monitoring AWS Glue documentation.

Figure 11 Successful pipeline execution

Cleanup

To avoid additional infrastructure costs, make sure that you delete the stack after experimenting with the examples provided in the post. On the CloudFormation console, select the stack that you created, and then choose Delete. This will delete all of the resources that it created, including CodeCommit repositories, IAM roles/policies, and CodeBuild projects.
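
The cleanup can also be scripted. Here is a minimal boto3 sketch, assuming the stack names used in this walkthrough; note that the application stack created by the pipeline is separate from the pipeline stack itself.

    import boto3

    cloudformation = boto3.client("cloudformation", region_name="us-east-1")

    # Stack names assume the values used in this walkthrough; the application
    # stack created by the pipeline is deleted before the pipeline stack.
    for stack in ["glue-codepipeline-app", "glue-unit-testing-pipeline"]:
        cloudformation.delete_stack(StackName=stack)
        cloudformation.get_waiter("stack_delete_complete").wait(StackName=stack)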

Summary

In this post, we demonstrated how to unit test and deploy Python-based AWS Glue jobs in a pipeline, with unit tests written using the PyTest framework. The approach isn't limited to CodePipeline; it can also be used to set up a local development environment, as demonstrated in the Big Data blog. The aws-glue-jobs-unit-testing GitHub repository contains the example's CloudFormation template, as well as the sample AWS Glue Python code and PyTest code used in this post. If you have any questions or comments regarding this example, please open an issue or submit a pull request.

Authors:

Praveen Kumar Jeyarajan

Praveen Kumar Jeyarajan is a Senior DevOps Consultant at AWS, supporting enterprise customers on their journey to the cloud. He has 11+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a master's degree in software engineering. Outside of work, he enjoys watching movies and playing tennis.

Vaidyanathan Ganesa Sankaran

Vaidyanathan Ganesa Sankaran is a Senior Modernization Architect at AWS, supporting global enterprise customers on their journey toward modernization. He specializes in artificial intelligence, legacy modernization, and cloud computing. He holds a master's degree in software engineering and has 12+ years of modernization experience. Outside of work, he loves conducting training sessions for college graduates and early-career professionals who want to learn cloud and AI. His hobbies are playing tennis, philately, and traveling.

How to import PFX-formatted certificates into AWS Certificate Manager using OpenSSL

Post Syndicated from Praveen Kumar Jeyarajan original https://aws.amazon.com/blogs/security/how-to-import-pfx-formatted-certificates-into-aws-certificate-manager-using-openssl/

In this blog post, we show you how to import PFX-formatted certificates into AWS Certificate Manager (ACM) using OpenSSL tools.

Secure Sockets Layer and Transport Layer Security (SSL/TLS) certificates are small data files that digitally bind a cryptographic key pair to an organization’s details. The key pair is used to secure network communications and establish the identity of websites over the internet and on private networks. These certificates are usually issued by a trusted certificate authority (CA). A CA acts as a trusted third party—trusted both by the subject (owner) of the certificate and by the party relying upon the certificate. The format of these certificates is specified by the X.509 or Europay, Mastercard, and Visa (EMV) standards. SSL/TLS certificates issued by a trusted CA are usually encoded in Personal Information Exchange (PFX) or Privacy-Enhanced Mail (PEM) format.

ACM lets you easily provision, manage, and deploy public and private SSL/TLS certificates for use with Amazon Web Services (AWS) and your internal connected resources. Certificates can be imported from outside AWS or created using AWS tools. They can be used with ACM-integrated AWS services, such as Elastic Load Balancing, Amazon CloudFront distributions, and Amazon API Gateway.

To import a self-signed SSL/TLS certificate into ACM, you must provide the certificate and its private key in PEM format. To import a signed certificate, you must also include the certificate chain in PEM format. Prerequisites for Importing Certificates provides more detail.

Sometimes, the trusted CA issues the certificate, private key, and certificate chain details in PFX format. In this post, we show you how to convert a PFX-encoded certificate into PEM format and then import it into ACM.

Solution

The following solution converts a PFX-encoded certificate to PEM format using the OpenSSL command line tool. The certificate is then imported into ACM.

Figure 1: Use the OpenSSL Toolkit to convert the certificate, then import the certificate into ACM

The solution has two parts, shown in the preceding figure:

  1. Use the OpenSSL Toolkit to convert the PFX-encoded certificate into PEM format.
  2. Import the PEM certificate into ACM.

Prerequisites

We use the OpenSSL toolkit to convert a PFX-encoded certificate to PEM format. OpenSSL is an open-source toolkit for manipulating cryptographic files. It's also a general-purpose cryptography library.

For this post, we use a password-protected PFX-encoded file (website.xyz.com.pfx) containing an X.509 CA-signed certificate and 2048-bit RSA private key data.

  1. Download and install the OpenSSL toolkit.
  2. Add the OpenSSL binaries location to your system PATH variable, so that the binaries are available for command line use.

Convert the PFX-encoded certificate into PEM format

Run the following commands to convert a PFX-encoded SSL certificate into PEM format. The procedure requires the PFX-encoded certificate and the passphrase used for encrypting it.

The procedure converts the PFX-encoded signed certificate file into three files in PEM format.

  • cert-file.pem – PEM file containing the SSL/TLS certificate for the resource.
  • withoutpw-privatekey.pem – PEM file containing the private key of the certificate with no password protection.
  • ca-chain.pem – PEM file containing the root certificate of the CA.

To convert the PFX-encoded certificate

  1. Use the following command to extract the certificate private key from the PFX file. If your certificate is secured with a password, enter it when prompted. The command generates a PEM-encoded private key file named privatekey.pem. Enter a passphrase to protect the private key file when prompted to Enter a PEM pass phrase.
    
    openssl pkcs12 -in website.xyz.com.pfx -nocerts -out privatekey.pem
    

     

    Figure 2: Prompt to enter a PEM pass phrase

  2. The previous step generates a password-protected private key. To remove the password, run the following command. When prompted, provide the passphrase created in step 1. If successful, you will see writing RSA key.
    
    openssl rsa -in privatekey.pem -out withoutpw-privatekey.pem
    

     

    Figure 3: Writing RSA key

  3. Use the following command to extract the certificate from the PFX file into a PEM file. This creates the PEM-encoded certificate file named cert-file.pem. If successful, you will see MAC verified OK.
    
    openssl pkcs12 -in website.xyz.com.pfx -clcerts -nokeys -out cert-file.pem
    

     

    Figure 4: MAC verified OK

  4. Finally, use the following command to extract the CA chain from the PFX file. This creates the CA chain file named ca-chain.pem. If successful, you will see MAC verified OK.
    
    openssl pkcs12 -in website.xyz.com.pfx -cacerts -nokeys -chain -out ca-chain.pem
    

     

    Figure 5: MAC verified OK

When the preceding steps are complete, the PFX-encoded signed certificate file has been split into three files in PEM format, shown in the following figure. To view the list of files in a directory, enter dir on Windows or ls -l on Linux.

  • cert-file.pem
  • withoutpw-privatekey.pem
  • ca-chain.pem

    Figure 6: PEM-formatted files
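
As an alternative to the OpenSSL CLI, the same conversion can be scripted in Python with the pyca/cryptography library. This is a sketch under that assumption, not the procedure shown above; the file names and passphrase are placeholders matching this walkthrough.

    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.serialization import pkcs12

    # Load the password-protected PFX file; replace the passphrase with your own.
    with open("website.xyz.com.pfx", "rb") as f:
        key, cert, chain = pkcs12.load_key_and_certificates(f.read(), b"your-pfx-passphrase")

    # Private key with no password protection (withoutpw-privatekey.pem).
    with open("withoutpw-privatekey.pem", "wb") as f:
        f.write(key.private_bytes(
            serialization.Encoding.PEM,
            serialization.PrivateFormat.TraditionalOpenSSL,
            serialization.NoEncryption(),
        ))

    # Leaf certificate (cert-file.pem).
    with open("cert-file.pem", "wb") as f:
        f.write(cert.public_bytes(serialization.Encoding.PEM))

    # CA chain (ca-chain.pem).
    with open("ca-chain.pem", "wb") as f:
        for ca in chain:
            f.write(ca.public_bytes(serialization.Encoding.PEM))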

Import the PEM certificates into ACM

Use the ACM console to import the PEM-encoded SSL certificate. You need the PEM files containing the SSL certificate (cert-file.pem), the private key (withoutpw-privatekey.pem), and the root certificate of the CA (ca-chain.pem) that you created in the previous procedure.

To import the certificates

  1. Open the ACM console. If this is your first time using ACM, look for the AWS Certificate Manager heading and select the Get started button.
  2. Select Import a certificate.
  3. Add the files you created in the previous procedure:
    1. Use a text-editing tool such as Notepad to open cert-file.pem. Copy the lines beginning with -----BEGIN CERTIFICATE----- and ending with -----END CERTIFICATE-----. Paste them into the Certificate body text box.
    2. Open withoutpw-privatekey.pem. Copy the lines beginning with -----BEGIN RSA PRIVATE KEY----- and ending with -----END RSA PRIVATE KEY-----. Paste them into the Certificate private key text box.
    3. For Certificate chain, copy and paste the lines beginning with -----BEGIN CERTIFICATE----- and ending with -----END CERTIFICATE----- from the ca-chain.pem file.

      Figure 7: Add the files to import the certificate

  4. Select Next and add tags for the certificate. Each tag is a label consisting of a key and value that you define. Tags help you manage, identify, organize, search for, and filter resources.
  5. Select Review and import.
  6. Review the information about your certificate, then select Import.
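
The import can also be performed programmatically instead of through the console. Here is a minimal boto3 sketch, assuming the three PEM files produced in the previous procedure and that the client's Region is where you want the certificate.

    import boto3

    acm = boto3.client("acm")  # the certificate is imported into this client's Region

    # Read the three PEM files created by the conversion procedure.
    with open("cert-file.pem", "rb") as cert, \
         open("withoutpw-privatekey.pem", "rb") as key, \
         open("ca-chain.pem", "rb") as chain:
        response = acm.import_certificate(
            Certificate=cert.read(),
            PrivateKey=key.read(),
            CertificateChain=chain.read(),
        )

    print(response["CertificateArn"])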

Conclusion

In this post, we discussed how you can use OpenSSL tools to import a PFX-encoded SSL/TLS certificate into ACM. You can use the imported certificate with any ACM-integrated AWS service. ACM makes it easier to set up SSL/TLS for a website or application on AWS. ACM can replace many of the manual processes usually associated with using and managing SSL/TLS certificates. ACM can also manage renewals, which can help you avoid downtime due to misconfigured, revoked, or expired certificates. You can renew an imported certificate by obtaining and importing a new certificate from your certificate issuer, or you can request a new certificate from ACM.


Author

Praveen Kumar Jeyarajan

Praveen Kumar is a DevOps Consultant at AWS, supporting enterprise customers on their journey to the cloud. Before his work on AWS and cloud technologies, he focused on solving myriad technical challenges using the latest technologies. Outside of work, he enjoys watching movies and playing tennis.

Author

Viyoma Sachdeva

Viyoma is a DevOps Consultant in AWS supporting global customers and their journey to the cloud. Outside of work, she enjoys watching series and spending time with her family.