Augmentation patterns to modernize a mainframe on AWS

Post Syndicated from Lewis Tang original https://aws.amazon.com/blogs/architecture/augmentation-patterns-to-modernize-a-mainframe-on-aws/

Customers with mainframes want to use Amazon Web Services (AWS) to increase agility, maximize the value of their investments, and innovate faster. On June 8, 2022, AWS announced the general availability of AWS Mainframe Modernization, a new service that makes it faster and simpler for customers to modernize mainframe-based workloads.

In this post, we discuss common use cases and augmentation architecture patterns that help liberate data from the mainframe for modern data analytics, retire expensive and unsupported tape storage solutions, build new capabilities that integrate with core mainframe workloads, and enable agile development and testing by adopting CI/CD for the mainframe.

Pattern 1: Augment mainframe data retention with backup and archival on AWS

Mainframes process and generate an organization's most business-critical data. It's imperative to protect this data with solutions such as backup, archiving, and disaster recovery. Mainframes usually use automated tape libraries or virtual tape libraries for backup and archive. These tapes need to be stored, organized, and transported to vaults and disaster recovery sites. All of this can be very expensive and rigid.

There is a more cost-effective approach that simplifies the operations of tape libraries: leverage AWS Partner tools, such as Model9, to transparently migrate the data on tape storage to AWS.

As depicted in Figure 1, mainframe data can be transferred over a secured network connection using AWS Transfer Family or AWS DataSync to AWS cloud storage services, such as Amazon Elastic File System, Amazon Elastic Block Store, and Amazon Simple Storage Service (Amazon S3). After data is stored in the AWS Cloud, you can configure and move data among these services to meet your business data processing needs. Depending on data storage requirements, storage costs can be further optimized by configuring S3 Lifecycle policies to move data among Amazon S3 storage classes. For long-term data archiving, you can choose the S3 Glacier storage classes to achieve durability, resilience, and optimal cost effectiveness.

Figure 1. Mainframe data backup and archival augmentation
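
As an example, here is a minimal boto3 sketch of such a lifecycle configuration. The bucket name mainframe-backup-archive and the backups/ prefix are hypothetical placeholders; adjust the transition days and storage classes to your own retention policy.

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket holding mainframe backup and archive data
s3.put_bucket_lifecycle_configuration(
    Bucket="mainframe-backup-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-mainframe-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [
                    # Move to S3 Glacier Flexible Retrieval after 90 days
                    {"Days": 90, "StorageClass": "GLACIER"},
                    # Move to S3 Glacier Deep Archive after one year
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)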

Pattern 2: Augment mainframe with agile development and test environments including CI/CD pipeline on AWS

A typical mainframe workload for any business-critical application requires development and test environments to support production. With most mainframes, it's common to see a lengthy application development lifecycle, a lack of automated testing, and no CI/CD pipeline. Furthermore, existing mainframe development processes and tools are outdated and unable to keep up with the pace of the business, resulting in a growing backlog. Organizations with mainframes look for application development solutions to solve these challenges.

As demonstrated in Figure 2, AWS developer tools orchestrate code compilation, testing, and deployment among mainframe test environments. Mainframe test environments are either provided by the mainframe vendors as emulators or by AWS partners, such as Micro Focus. You can load the preferred developer tools and run an integrated development environment (IDE) from Amazon WorkSpaces or Amazon AppStream 2.0. Developers create or modify code in the IDE, and then commit and push their code to AWS CodeCommit. As soon as the code is pushed, an event is generated and triggers the pipeline in AWS CodePipeline to build the new code in a compilation environment via AWS CodeBuild. The pipeline pushes the new code to the test environment.

To optimize cost, you can scale the test environment capacity to meet needs. The tests are executed, and the test environment can be shut down when not in use. When the tests are successful, the pipeline pushes the code back to the mainframe via AWS CodeDeploy and an intermediary server. On the mainframe side, the code can go through a recompilation and final testing before being pushed to production.
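
A push to CodeCommit starts the pipeline automatically, but you can also trigger or monitor it from a script. The following boto3 sketch starts an execution and polls until it finishes; the pipeline name mainframe-cicd-pipeline is a hypothetical placeholder.

import time
import boto3

codepipeline = boto3.client("codepipeline")
pipeline_name = "mainframe-cicd-pipeline"  # hypothetical pipeline name

# Start a new execution of the mainframe build/test/deploy pipeline
execution_id = codepipeline.start_pipeline_execution(name=pipeline_name)["pipelineExecutionId"]

# Poll until the execution leaves the InProgress state
while True:
    execution = codepipeline.get_pipeline_execution(
        pipelineName=pipeline_name, pipelineExecutionId=execution_id
    )["pipelineExecution"]
    if execution["status"] != "InProgress":
        print(f"Pipeline execution finished with status: {execution['status']}")
        break
    time.sleep(30)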

You can further optimize operations and the licensing cost of the mainframe emulator by leveraging the managed integrated development and test environment provided by the AWS Mainframe Modernization service.

Figure 2. Mainframe CI/CD augmentation

Pattern 3: Augment mainframe with agile data analytics on AWS

Core business applications running on mainframes have generated a lot of data over the years. Decades of historical business transactions and massive amounts of user data present an opportunity to develop deep business insight. By creating a data lake with AWS big data services, you can gain faster analytics capabilities and better insight into core business data originating from mainframe applications.

Figure 3 depicts data being pulled from relational, hierarchical, and file-based data stores on mainframes. These data stores come in various formats, such as DB2 for z/OS, VSAM, IMS DB, IDMS, and DMS. To move data from mainframes to AWS, you can use AWS Partner data replication and change data capture tools from AWS Marketplace, or AWS services such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) for near real-time data streaming, and Transfer Family or DataSync for moving data in batches.

Once data is replicated to AWS, you can further process it using services like AWS Lambda or Amazon Elastic Container Service, and store the processed data in AWS storage services, such as Amazon DynamoDB, Amazon Relational Database Service, and Amazon S3.
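
As an illustration, a minimal AWS Lambda sketch could consume change records streamed through Amazon MSK and land them in DynamoDB. The table name mainframe_customer and the assumption that each record value is a JSON document containing the table's key are hypothetical, not a prescribed format.

import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("mainframe_customer")  # hypothetical target table

def handler(event, context):
    # An MSK event source delivers records grouped by topic-partition
    for records in event["records"].values():
        for record in records:
            # Record values arrive base64-encoded; assume JSON payloads here
            payload = json.loads(base64.b64decode(record["value"]))
            table.put_item(Item=payload)
    return {"processed": sum(len(r) for r in event["records"].values())}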

By using AWS big data and analytics services, such as Amazon EMR, Amazon Redshift, Amazon Athena, AWS Glue, and Amazon QuickSight, you can develop deep business insight and present flexible visuals to your customers. Read more about mainframe data integration.
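
For example, once the curated data is cataloged in AWS Glue, you could run an ad hoc Amazon Athena query from a script. In this hedged boto3 sketch, the mainframe_analytics database, mainframe_transactions table, and results bucket are hypothetical names.

import time
import boto3

athena = boto3.client("athena")

# Hypothetical Glue database/table holding curated mainframe data
query = "SELECT account_type, COUNT(*) AS txn_count FROM mainframe_transactions GROUP BY account_type"

execution_id = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "mainframe_analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Wait for the query to finish, then print the result rows
while True:
    state = athena.get_query_execution(QueryExecutionId=execution_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])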

Figure 3. Mainframe data analytics augmentation

Pattern 4: Augment mainframe with new functions and channels on AWS

Organizations with a mainframe use AWS to innovate and iterate quickly, because mainframe environments often lack that agility. For example, a common scenario for a bank could be providing a mobile application for customer engagement, such as supporting a marketing campaign for a new credit card.

As depicted in Figure 4, with data replicated from mainframes to the AWS Cloud and analyzed by AWS big data and analytics services, new business functions can be developed as cloud-native applications by using Amazon API Gateway, AWS Lambda, and AWS Fargate. These new business applications can interact with mainframe data, and the combination can provide deep business insight.
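
As a simple illustration, a Lambda function behind Amazon API Gateway could serve a new customer-facing function from data replicated off the mainframe. In this hedged sketch, the DynamoDB table customer_profile and its key customer_id are hypothetical.

import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("customer_profile")  # hypothetical table fed by mainframe replication

def handler(event, context):
    # API Gateway proxy integration passes path parameters in the event
    customer_id = event["pathParameters"]["customer_id"]
    response = table.get_item(Key={"customer_id": customer_id})
    item = response.get("Item")
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"message": "Customer not found"})}
    return {"statusCode": 200, "body": json.dumps(item, default=str)}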

To add new innovation capabilities, you can use Amazon Forecast with the time-series data generated by the new business applications to predict domain-specific metrics, such as inventory, workforce, web traffic, and finances. Amazon Lex can build virtual agents, automate informational responses to customer enquiries, and improve business productivity. With Amazon SageMaker, you can prepare data gathered from the mainframe and new business applications at scale to build, train, and deploy machine learning models for any business case.

You can further improve customer engagement by incorporating Amazon Connect and Amazon Pinpoint to build multichannel communications.

Figure 4. Mainframe new functions and channels augmentation

Conclusion

To increase agility, maximize the value of investments, and innovate faster, organizations can adopt the patterns discussed in this post to augment mainframes with AWS services: build a resilient data protection solution, provision agile development and test environments with a CI/CD pipeline, liberate mainframe data for analytics, and develop innovative solutions for new digital customer experiences. With the AWS Mainframe Modernization service, you can accelerate this journey and innovate faster.

Ingest data from Snowflake to Amazon S3 using AWS Glue Marketplace Connectors

Post Syndicated from Sindhu Achuthan original https://aws.amazon.com/blogs/big-data/ingest-data-from-snowflake-to-amazon-s3-using-aws-glue-marketplace-connectors/

In today’s complex business landscape, organizations are challenged to consume data from a variety of sources and keep up with data that pours in throughout the day. There is a demand for applications that make data portable across cloud platforms and provide the ability to derive insights from one or more data sources to remain competitive. In this post, we demonstrate how the AWS Glue integration with Snowflake simplifies the process of connecting to Snowflake and applying data transformations without writing a single line of code. With AWS Glue Studio, you can use a simple visual interface to compose jobs that move and integrate data. It enables you to subscribe to a Snowflake connector in AWS Marketplace, query Snowflake tables, and save the data to Amazon Simple Storage Service (Amazon S3) in Parquet format.

If you choose to bring your own custom connector or prefer a different connector from AWS Marketplace, follow the steps in the blog post Performing data transformations using Snowflake and AWS Glue. In this post, we use the new AWS Glue Connector for Snowflake to seamlessly connect with Snowflake without the need to install JDBC drivers. To validate the ingested data, we use Amazon Redshift Spectrum to create an external table and query the data in Amazon S3. With Amazon Redshift Spectrum, you can efficiently query and retrieve data from files in Amazon S3 without having to load the data into Amazon Redshift tables.

Solution Overview

Let’s take a look at how AWS Glue connects to Snowflake for data ingestion: an AWS Glue Studio job uses the Snowflake connector from AWS Marketplace to query Snowflake and writes the results to Amazon S3 in Parquet format, where they can be queried with Amazon Redshift Spectrum.

Prerequisites

Before you start, make sure you have the following:

  1. An account in Snowflake, specifically a service account that has permissions on the tables to be queried.
  2. AWS Identity and Access Management (IAM) permissions in place to create AWS Glue and Amazon Redshift service roles and policies. To configure, follow the instructions in Setting up IAM Permissions for AWS Glue and Create an IAM role for Amazon Redshift.
  3. An Amazon Redshift Serverless endpoint. If you don't have one configured, follow the instructions in Amazon Redshift Serverless Analytics.

Configure the Amazon S3 VPC Endpoint

As a first step, we configure an Amazon S3 VPC endpoint to enable AWS Glue to use a private IP address to access Amazon S3 with no exposure to the public internet. Complete the following steps; a boto3 alternative is sketched after the list.

  1. Open the Amazon VPC console.
  2. In the left navigation pane, choose Endpoints.
  3. Choose Create Endpoint, and follow the steps to create an Amazon S3 VPC endpoint of type Gateway.
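
If you prefer to script this step, a hedged boto3 sketch is shown below; the VPC ID, route table ID, and Region are placeholders for your own values.

import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create a Gateway-type VPC endpoint for Amazon S3 (placeholder VPC and route table IDs)
response = ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
)
print(response["VpcEndpoint"]["VpcEndpointId"])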

Next, we create a secret using AWS Secrets Manager (a boto3 equivalent is sketched after these steps):

  1. On AWS Secrets Manager console, choose Store a new secret.
  2. For Secret type, select Other type of secret.
  3. Enter a key as sfUser and the value as your Snowflake user name.
  4. Enter a key as sfPassword and the value as your Snowflake user password.
  5. Choose Next.
  6. Name the secret snowflake_credentials and follow through the rest of the steps to store the secret.
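
The same secret can also be created programmatically. This minimal boto3 sketch uses placeholder Snowflake credentials; substitute your own values.

import json
import boto3

secretsmanager = boto3.client("secretsmanager")

# Store the Snowflake user name and password as a single JSON secret (placeholder values)
secretsmanager.create_secret(
    Name="snowflake_credentials",
    SecretString=json.dumps({"sfUser": "example_user", "sfPassword": "example_password"}),
)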

Subscribe to AWS Marketplace Snowflake Connector

To subscribe to the connector, follow these steps to activate the Snowflake Connector for AWS Glue. This native connector simplifies the process of connecting AWS Glue jobs to extract data from Snowflake.

  1. Navigate to the Snowflake Connector for AWS Glue in AWS Marketplace.
  2. Choose Continue to Subscribe.
  3. Review the terms and conditions, pricing, and other details.
  4. Choose Continue to Configuration.
  5. For Delivery Method, choose your delivery method.
  6. For Software version, choose your software version.
  7. Choose Continue to Launch.
  8. Under Usage instructions, choose Activate the Glue connector in AWS Glue Studio. You’re redirected to AWS Glue Studio.
  9. For Name, enter a name for your connection (for example, snowflake_s3_glue_connection).
  10. Optionally, choose a VPC, subnet, and security group.
  11. For AWS Secret, choose snowflake_credentials.
  12. Choose Create connection.

A message appears that the connection was successfully created, and the connection is now visible on the AWS Glue Studio console.

Configure AWS Glue for Snowflake JDBC connectivity

Next, we configure an AWS Glue job by following the steps below to extract data.

  1. On the AWS Glue console, choose AWS Glue Studio on the left navigation pane.
  2. On the AWS Glue Studio console, choose Jobs on the left navigation pane.
  3. Create a job with “Visual with source and target” and choose the Snowflake connector for AWS Glue 3.0 as the source and Amazon S3 as the target.
  4. Enter a name for the job.
  5. Under job details, select an IAM role.
  6. Create a new IAM role with the required AWS Glue and AWS Secrets Manager policies if you don't already have one.
  7. Under Visual, choose the Data source – Connection node and choose the connection you created.
  8. In Connection options, create key-value pairs as shown below. Note that the CUSTOMER table in the SNOWFLAKE_SAMPLE_DATA database is used for this migration. This table comes preloaded (1.5M rows) with the Snowflake sample schema.

    Key          Value
    query        SELECT C_CUSTKEY, C_NAME, C_ADDRESS, C_NATIONKEY, C_PHONE, C_ACCTBAL, C_MKTSEGMENT, C_COMMENT FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
    sfUrl        MXA94638.us-east-1.snowflakecomputing.com
    sfDatabase   SNOWFLAKE_SAMPLE_DATA
    sfWarehouse  COMPUTE_WH

  9. In the Output schema section, specify the source schema as key-value pairs.
  10. Choose the Transform-ApplyMapping node to review how the source columns map to the target.
  11. Choose the Data target properties – S3 node and enter the S3 bucket details.
  12. Choose Save.

After you save the job, the following script is generated. It assumes the account information and credentials are stored in AWS Secrets Manager as described earlier.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node Snowflake Connector for AWS Glue 3.0
SnowflakeConnectorforAWSGlue30_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="marketplace.spark",
    connection_options={
        "query": "SELECT C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER",
        "sfUrl": "MXA94638.us-east-1.snowflakecomputing.com",
        "sfDatabase": "SNOWFLAKE_SAMPLE_DATA",
        "sfWarehouse": "COMPUTE_WH",
        "connectionName": "snowflake_s3_glue_connection",
    },
    transformation_ctx="SnowflakeConnectorforAWSGlue30_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=SnowflakeConnectorforAWSGlue30_node1,
    mappings=[
        ("C_CUSTKEY", "decimal", "C_CUSTKEY", "decimal"),
        ("C_NAME", "string", "C_NAME", "string"),
        ("C_ADDRESS", "string", "C_ADDRESS", "string"),
        ("C_NATIONKEY", "decimal", "C_NATIONKEY", "decimal"),
        ("C_PHONE", "string", "C_PHONE", "string"),
        ("C_ACCTBAL", "decimal", "C_ACCTBAL", "decimal"),
        ("C_MKTSEGMENT", "string", "C_MKTSEGMENT", "string"),
        ("C_COMMENT", "string", "C_COMMENT", "string"),
    ],
    transformation_ctx="ApplyMapping_node2",
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.write_dynamic_frame.from_options(
    frame=ApplyMapping_node2,
    connection_type="s3",
    format="glueparquet",
    connection_options={"path": "s3://sf-redshift-po/test/", "partitionKeys": []},
    format_options={"compression": "snappy"},
    transformation_ctx="S3bucket_node3",
)

job.commit()
  1. Choose Run to run the job.

After the job completes successfully, the run status should change to Succeeded.

You can then verify on the Amazon S3 console that the data was written to the target path.
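
You can also confirm the output programmatically. This boto3 sketch lists the Parquet objects written under the target path used in the job script.

import boto3

s3 = boto3.client("s3")

# List the Parquet files the AWS Glue job wrote to the target path
response = s3.list_objects_v2(Bucket="sf-redshift-po", Prefix="test/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])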

Query Amazon S3 data using Amazon Redshift Spectrum

Let’s query the data using Amazon Redshift Spectrum:

  1. On the Amazon Redshift console, choose the AWS Region.
  2. In the left navigation pane, choose Query Editor.
  3. Run the CREATE EXTERNAL TABLE DDL given below (this assumes an external schema named spectrum_schema has already been created in Amazon Redshift).
    CREATE EXTERNAL TABLE "spectrum_schema"."customer"
    (c_custkey   decimal(10,2),
    c_name       varchar(256),
    c_address    varchar(256),
    c_nationkey  decimal(10,2),
    c_phone      varchar(256),
    c_acctbal    decimal(10,2),
    c_mktsegment varchar(256),
    c_comment    varchar(256))
    stored as parquet
    location 's3://sf-redshift-po/test'; 

  1. Run the select query on the CUSTOMER table (a programmatic alternative using the Amazon Redshift Data API is sketched after the query).
SELECT count(c_custkey), c_nationkey FROM "dev"."spectrum_schema"."customer" group by c_nationkey
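
If you prefer to run the query from a script, here is a hedged sketch using the Amazon Redshift Data API; the workgroup name default is a placeholder for your Redshift Serverless workgroup.

import time
import boto3

redshift_data = boto3.client("redshift-data")

sql = (
    'SELECT count(c_custkey), c_nationkey '
    'FROM "dev"."spectrum_schema"."customer" GROUP BY c_nationkey'
)

# Submit the query to the Redshift Serverless workgroup (placeholder name)
statement_id = redshift_data.execute_statement(
    WorkgroupName="default", Database="dev", Sql=sql
)["Id"]

# Poll until the statement finishes, then print the result rows
while True:
    status = redshift_data.describe_statement(Id=statement_id)["Status"]
    if status in ("FINISHED", "FAILED", "ABORTED"):
        break
    time.sleep(2)

if status == "FINISHED":
    for row in redshift_data.get_statement_result(Id=statement_id)["Records"]:
        print([list(col.values())[0] for col in row])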

Considerations

The AWS Glue crawler doesn’t work directly with Snowflake. Crawling is a native capability that you can use for other AWS data sources that are joined or connected in the AWS Glue ETL job. For Snowflake, you instead define the connection options in the script, as shown earlier in this post.

The Snowflake source tables covered in this post focus only on structured data types; therefore, semi-structured and unstructured data types in Snowflake (binary, varbinary, and variant) are out of scope. However, you could use AWS Glue functions such as Relationalize to flatten nested schema data into semi-normalized structures (see the sketch below), or you could use Amazon Redshift Spectrum to support these data types.
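
For example, within an AWS Glue job you could flatten nested records with the Relationalize transform. This sketch assumes a DynamicFrame named source_dyf that has already been read from a semi-structured source, reuses the glueContext from the generated script above, and uses placeholder S3 paths.

from awsglue.transforms import Relationalize

# Flatten nested fields into a collection of relational DynamicFrames
# (source_dyf and the staging path are assumed to exist already)
flattened = Relationalize.apply(
    frame=source_dyf,
    staging_path="s3://example-staging-bucket/tmp/",
    name="root",
    transformation_ctx="relationalize_node",
)

# Pick the root frame from the resulting collection and write it out as Parquet
root_frame = flattened.select("root")
glueContext.write_dynamic_frame.from_options(
    frame=root_frame,
    connection_type="s3",
    format="glueparquet",
    connection_options={"path": "s3://example-output-bucket/flattened/"},
)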

Conclusion

In this post, we learned how to define Snowflake connection parameters in AWS Glue, connect to Snowflake from AWS Glue using the native connector for Snowflake, migrate data to Amazon S3, and use Amazon Redshift Spectrum to query the data in Amazon S3 to meet your business needs.

We welcome any thoughts or questions in the comments section below.


About the Authors

Sindhu Achuthan is a Data Architect with Global Financial Services at Amazon Web Services. She works with customers to provide architectural guidance for analytics solutions using AWS Glue, Amazon Redshift, AWS Lambda, and other services. Outside of work, she is a DIYer who likes to go on long trails and do yoga.

Shayon Sanyal is a Sr. Solutions Architect specializing in databases at AWS. His day job allows him to help AWS customers design scalable, secure, performant and robust database architectures on the cloud. Outside work, you can find him hiking, traveling or training for the next half-marathon.