Tag Archives: Migration Solutions

Accelerate large-scale modernization of .NET, mainframe, and VMware workloads using Amazon Q Developer

Post Syndicated from Krishna Parab original https://aws.amazon.com/blogs/devops/accelerate-large-scale-modernization-of-net-mainframe-and-vmware-workloads-using-amazon-q-developer/

Software runs the world – not just the new software applications built in modern languages and deployed on the most optimized cloud infrastructure, but also legacy software built over years and barely understood by the teams that inherit them. These legacy applications may have snowballed into monolithic blocks or may be fragmented across siloed on-premises infrastructure. The significant maintenance, security, and compliance challenges caused can create lasting implications for business performance and competitiveness. Therefore, transformation of legacy applications using modern languages, new frameworks, and cloud services has become an organizational imperative.

Application modernization challenges

Modernization of software applications is a long and painful journey – requiring large teams of developers, domain experts, and consultants who first need to understand the application landscape, devise strategic modernization plans, and then tactically implement the plans in phases, typically over a span of many years. This process is linear, slow, and complex. Traditional labor-intensive modernization approaches incur significant costs and take years to leverage new cloud technologies and innovations for business-critical applications.

Generative AI can help with intelligent automation, domain expertise, and scalability to transform modernization journeys.

Introducing Amazon Q Developer transformation capabilities

Q Developer transformation capabilities powered by LLMs and domain-expert agents support human-agent interaction via an IDE experience for individual developers and a web experience for multifunctional teams.

Amazon Q Developer transformation capabilities

Amazon Q Developer, the most capable generative AI–powered assistant for software development, is now the first generative AI-powered assistant for large-scale modernization and migration of .NET, mainframe, and VMware workloads. This extends Q Developer’s transformation capabilities for Java upgrades launched in April 2024 to new types of workloads. Q Developer combines both foundational models and specialized tools based on AI and automated reasoning via autonomous agents that tackle workload-specific modernization steps spanning analysis, planning, and implementation.

Multifunctional teams, including consultants, IT experts, workload domain experts, and developers, can use a unified web experience to offload transformation tasks to Amazon Q Developer agents and transform hundreds of workloads at a time. The agents can port .NET Framework to cross-platform Linux-ready .NET, modernize COBOL applications on mainframes to Java applications on AWS, or virtualized workloads on VMware to scalable workloads on EC2. The modernization teams engage with Q Developer using natural language and share transformation objectives, code repositories, and context. Q Developer agents analyze artifacts like code segments, dependencies, and integrations, applying expertise from prior modernizations. They propose customized plans tailored to codebases, resource utilization, and objectives. The teams can then review, adjust, and approve the plans with iterative engagement with the agents. After the plans are approved, the agents implement the transformation keeping the modernization teams updated on milestones completed and blockers needing human guidance. The transformation journey is an interactive process between the modernization team and Q Developer, with modernization team maintaining control and visibility over the transformation.

Human team members interact with Q Developer generative AI agents using natural language chat.

Natural language chat with Q Developer AI agents

Faster, scalable, and better modernization

Amazon Q Developer enhances transformation in three primary ways – acceleration, scalability, and quality.

Amazon Q Developer automates and accelerates complex, multi-step processes. Agents conduct assessment and discovery of legacy artifacts to build documentation and dependency maps that improve the understanding of source assets. Most large-scale modernization projects are done in waves that need to be carefully planned. The agents develop modernization wave plans based on source dependencies, stated project goals, and teams can review and approve the plans. Thereafter, the goal-seeking autonomous agents handle implementation complexities to execute the plans. Customers using Amazon Q Developer can modernize Windows .NET applications to Linux up to four times faster than traditional methods and help customers realize up to 40% savings in licensing costs. Migration Planning for the sequence to transform monolith z/OS COBOL application code that takes months to accomplish with human subject matter experts, Amazon Q Developer generates in minutes. Q Developer agents convert on-premises VMware network configurations into modern AWS equivalents in hours vs. the weeks required with traditional manual approaches. The shorter time spent on manual modernization means more freedom for your team to focus on innovation.

Modernization has traditionally been a linear journey with multiple steps and dependencies on cross-functional teams with limited mechanisms for collaboration. This limits teams’ ability to tackle large-scale projects. Amazon Q Developer addresses the challenges by task parallelization and web-based collaboration. Multiple generative AI agents work simultaneously on tasks. Large monolithic applications can be decomposed along business functions like engineering, marketing, sales applications, and transformed in parallel. A unified web-based experience for large-scale transformation means multi-functional team members can collaborate with the autonomous agents, and review and approve key decisions in one place, enabling teams to execute larger and more complex projects in a given time.

Finally, the quality of transformation manifested in functional equivalence, security, and resilience of modernized applications determines the business outcomes like project ROI and operational performance. To ensure transformation quality, you need expertise in languages and frameworks like COBOL, Java, .NET; specialized steps like code base analysis, monolith decomposition, code refactoring, network translation; and domains like mainframe, virtualization, and cloud. You may not have the requisite expertise in your team. That is where Amazon Q Developer can help. Q Developer agents are trained with specific domain expertise to identify code dependencies and frameworks, replace deprecated code, upgrade to new language frameworks, incorporate security best practices, and validate upgraded workloads using workload-tailored plans. Your team can examine the agents’ recommendations, make informed decisions, and guide the modernization journey towards better outcomes like enhanced security, compliance, and performance.

Q Developer supports modernization of .NET Framework applications to cross-platform .NET applications, mainframe-based COBOL applications to Java applications on AWS, on-premises VMware workloads to workloads on EC2, and Java v8/11/17 to Java17/21.

Workloads supported by Amazon Q Developer transformation capabilities

Next steps

Amazon Q Developer transformation capabilities are now available in preview. To learn more, please visit Q Developer web page featuring short demo videos and documentation that can get you started. Read the AWS News blogs that walk you through the unified web experience and IDE experience. Dive deeper into the transformation of specific workloads by reading the workload-specific blogs related to transformation of .NET, mainframe, and VMware workloads.

About the author:

Elio Damaggio

Krishna Parab

Krishna B. Parab leads product marketing for Amazon Q Developer transformation capabilities. He has over 13 years of experience in product marketing and prior experience in engineering and product management. He has led marketing for Cisco Cloud, ServiceNow service management SaaS, Arm Pelion IoT platform, Automation Anywhere RPA platform, and AWS Mainframe Modernization service. Krishna’s educational background includes BTech, MS, and MBA degrees from IIT Bombay, UT Austin, and University of Michigan, respectively.

Elio Damaggio

Elio Damaggio

Elio Damaggio is the product lead for the transformation capabilities of Amazon Q Developer. With more than 15 years in tech, 11 patents, and a PhD in Computer Science, he is now looking for exciting ways to empower developers through AI.

Email Journaling with SES Mail Manager

Post Syndicated from Zip Zieper original https://aws.amazon.com/blogs/messaging-and-targeting/email-journaling-with-ses-mail-manager/

Introduction to Journaling

Email journaling is the practice of preserving comprehensive records of all email communications within an organization. This approach stems from the need to maintain rigid, compliance-driven retention policies focused on auditing an entire organization’s email activities. Because journaled email messages are often required to satisfy on-demand audit and investigation requests, they must be readily searchable, making accessibility a key requirement. Reflecting legal and regulatory requirements, email journaling has historically required expensive, dedicated off-site storage and complex retrieval systems.

Amazon WorkMail is a managed business email service with flexible journaling capabilities that are configurable at both the individual mailbox and organization-wide level. With WorkMail, you can use custom rules to selectively preserve or redirect certain messages using granular journaling controls. This flexibility allows administrators to implement both traditional email journaling and configurations that you can customize to meet specific use cases.

Email journaling is used to capture and retain every email sent to and from an organization, primarily for compliance purposes. In contrast, email archiving is typically used to offload and store emails from an organization’s primary email system, often driven by inbox size limits and data backup or eDiscovery needs. While journaling focuses on preserving a consolidated record of communications separate from live mailboxes, archiving is a more selective process. Journaling is usually driven by regulatory, audit, and compliance requirements. As discussed in this blog post, you can use the Mail Manager archiving feature not only for selective email backup and optimization, but also to fulfill your email journaling requirements. You can learn more about email archiving with Mail Manager in this blog post.

Amazon Simple Email Service (SES) Mail Manager provides comprehensive tools that simplify managing large volumes of email communications within an organization. Mail Manager has a built-in archiving function which can be used as an inexpensive journaling solution for email systems like Amazon WorkMail. Mail Manager’s rules engine allows for the creation of rules that readily satisfy a wide range of email journaling requirements. Additionally, Mail Manager’s archiving capability supports multiple, concurrent archiving destinations that can be independently searched and exported on demand.

In this blog post, we discuss how Amazon WorkMail and Amazon Simple Email Service (SES) Mail Manager make email journaling easier to set up and use, more cost-effective and versatile. We’ll walk the reader through setting up email journaling for an Amazon WorkMail organization that uses SES Mail Manager’s routing, processing, and archiving features.

SES Mail Manager as Journaling Destination for WorkMail

For our purposes, we’ll assume you’ve already set up WorkMail as your mailbox provider, but the process described below will work with the journaling features of most 3rd party email solutions. If you want to explore Amazon WorkMail, visit the getting started documentation here.

In the following sections, we’ll describe how to configure WorkMail journaling to send full email journals to SES Mail Manager’s archives. We’ll define different retention periods for each archive to demonstrate how this solution can be used to meet both short and long-term retention requirements. Finally, we’ll use the AWS SES Mail Manager console to search, export, and manage the email journals and archives.

In our examples, we’ll use Amazon Route 53 to create a new domain called ‘journaling.solutions’ which we’ll configure to send all ‘@journaling.solutions’ emails to an SES Mail Manager Ingest endpoint. To begin, open the AWS Console, navigate to your WorkMail Organization’s settings, and click on the Journaling tab:

Organization settings Journaling tab

Organization settings Journaling tab

Click Edit, enable journaling, and provide a journaling email address (we’re using ‘[email protected]’) to receive journaled content. Provide a report email address, such as the admin email list, to receive journaling reports:

Provide a Journaling email address

Provide a Journaling email address

Open the AWS SES console in a new browser window, and navigate to Mail Manager’s Rule sets. Create a new rule set called ‘journaling-rule-demo’. Click Edit and create a new rule called “journal-all”, with an Archive action. Click the create an archive button and create an archive called ‘journaling-archive-demo’:

Create a new Rule Set called ‘Journaling-rule-demo’

Create a new Rule Set called ‘Journaling-rule-demo’

When creating Mail Manager archives, you have options to set the retention period from 3 months to permanent storage. You can also choose to encrypt your archived messages with your own KMS key. The configuration in our example is for permanent storage and shows the optional text field for using your own KMS key:

you can encrypt the archived messages with your own KMS key

you can encrypt the archived messages with your own KMS key

Traditional journaling calls for recording every email message to the journal, so for our ‘journal-all’ rule, we will not define filtering behaviors in the rule set. This will instruct Mail manager to send all emails for [email protected] to the journaling-archive-demo archive. It is worth noting that Mail Manager’s rule set can be configured to filter and independently process multiple recipient addresses. Consult the documentation to learn about other ways to customize Mail Manager for your use cases.

Next, create a new traffic policy, called journaling-traffic-demo, and configure it to reject any message not explicitly sent to the journaling destination address ([email protected]):

Create a new Traffic policy, called ‘Journaling-traffic-demo’

Create a new Traffic policy, called ‘Journaling-traffic-demo’

Create an open ingress endpoint called ‘journaling-demo-IG’, and select the ‘journaling-traffic-demo’ traffic policy and ‘journaling-rule-demo’ rule set:

Create an Open Ingress endpoint called ‘Journaling-demo-IG’,

Create an Open Ingress endpoint called ‘Journaling-demo-IG’,

After you press the create Ingest endpoint button, Mail Manager will create an Ingress endpoint and assign it a DNS A Record to be used in your DNS configurations to route email to Mail Manager:

Mail Manager Ingress endpoint DNS A Record to be used in your DNS configurations

Mail Manager Ingress endpoint DNS A Record to be used in your DNS configurations

From the General details page of the Ingress endpoint, copy the Ingress endpoint’s DNS A Record to your clipboard. Open a new browser window to your DNS provider’s MX configuration page (in our example below, we’re using AWS Route53). Edit the MX record for ‘journaling.solutions’ by pasting the Ingress endpoint A record. This configuration will route email sent to any address ‘@journaling.solutions’ to the Mail Manager’s Ingress endpoint for processing by the Traffic policy and Rule set:

Using AWS Route53 to edit MX record for ‘journaling.solutions’ to Ingress endpoint A record

Using AWS Route53 to edit MX record for ‘journaling.solutions’ to Ingress endpoint A record

To test your new journaling configuration, send several emails to several email addresses in your WorkMail organization (or the alternative inbox provider you configured in the first step). WorkMail (or your alternative inbox provider) will send a full record of all emails to the journaling destination address ([email protected]).

Wait a few minutes after sending the emails above, then open the AWS Mail Manager console’s archiving controls and search for messages sent in the last 12 hours:

AWS Mail Manager console’s archiving controls

AWS Mail Manager console’s archiving controls

The example above shows a search for all messages received in the “last 12 hours”, with no other filters specified. The results show every message inserted into the archive in this timeframe. You’ll see one entry where the from address is different (from toby@tegwj@…). This is an example of mail that was sent directly to the journaling destination address ([email protected]). This works because our traffic policy and rule set configurations don’t include any filters.

A cost effective solution at scale

Using Mail Manager as a journaling solution gives you more direct control over your costs than typical journaling services. While most journaling services in the market today charge a fixed rate per journaled mailbox, Mail Manager pricing is comprised of a monthly fixed fee per ingestion endpoint and consumption pricing for basic message handling, and the amount of data archived.

For example, imagine your organization has 250 mailboxes, each handling 50 messages per day. On a monthly basis this amounts to 375,000 messages. If we assume each message is 40 kilobytes in size, your organization is generating roughly 15 gigabytes of email per month. As you can see from the table below, the total cost in month 1 is about $140, or $0.56/mailbox.

|Item |Unit Price |Volume |Subtotal/Mo |
|— |— |— |— |
|Ingress Endpoint |$50/mo |1 |$50 |
|Core message processing |$0.15/1000 msgs |375 |$56.25 |
|Archive insertion/indexing |$2/GB (one-time) |15 |$30 |
|Archive storage |$0.19/GB/mo |15 |$2.85 |
|Subtotal: | | |$139.10 |
| |Monthly price per mailbox |$0.56 |

If the proposed email rate in our assumptions stays constant, the Mail Manager archive will grow by 15 gigabytes each month. After 36 months, the total monthly storage cost increases to $102.60. This results in a total monthly spend in month 36 of $238.85, or $0.96/mailbox/month.

Conclusion

In this blog post, we’ve explored how Amazon WorkMail and Amazon SES Mail Manager can provide a cost-effective and accessible solution for email journaling. By leveraging the flexible journaling capabilities of WorkMail and the archiving features of SES Mail Manager, organizations can easily satisfy rigorous compliance requirements around email retention and accessibility.

The combination of WorkMail’s journaling controls and SES Mail Manager’s rule-based archiving allows you to tailor your journaling solution to your specific needs. Whether you require short-term retention for audits or long-term preservation for legal and regulatory purposes, SES Mail Manager’s flexible archiving options have you covered with predictable and transparent costs that scale with your organization’s email volume.

If you’re looking for a modern, scalable, and cost-effective solution for your email journaling needs, we encourage you to explore the capabilities of Amazon SES Mail Manager. Get started today by visiting the AWS documentation and begin streamlining your email compliance and retention processes.

About the Authors

Toby Weir-Jones

Toby Weir-Jones

Toby is a Principal Product Manager for Amazon SES and WorkMail. He joined AWS in January 2021 and has significant experience in both business and consumer information security products and services. His focus on email solutions at SES is all about tackling a product that everyone uses and finding ways to bring innovation and improved performance to one of the most ubiquitous IT tools.

Zip

Zip

Zip is a Sr. Specialist Solutions Architect at AWS, working with Amazon Pinpoint and Simple Email Service and WorkMail. Outside of work he enjoys time with his family, cooking, mountain biking, boating, learning and beach plogging.

Andy Wong

Andy Wong

Andy Wong is a Sr. Product Manager with the Amazon WorkMail team. He has 10 years of diverse experience in supporting enterprise customers and scaling start-up companies across different industries. Andy’s favorite activities outside of technology are soccer, tennis and free-diving.

Bruno Giorgini

Bruno Giorgini

Bruno Giorgini is a Senior Solutions Architect specializing in Pinpoint and SES. With over two decades of experience in the IT industry, Bruno has been dedicated to assisting customers of all sizes in achieving their objectives. When he is not crafting innovative solutions for clients, Bruno enjoys spending quality time with his wife and son, exploring the scenic hiking trails around the SF Bay Area.

Email Archiving with Mail Manager: Why To Archive In Transit vs At The Mailbox

Post Syndicated from Zip Zieper original https://aws.amazon.com/blogs/messaging-and-targeting/email-archiving-with-mail-manager-why-to-archive-in-transit-vs-at-the-mailbox/

When designing Amazon Simple Email Service’s (SES) Mail Manager, we often heard from customers about the “PST-file problem” inherent with user-side mailbox-based archiving. This occurs when, for a variety of reasons, end users decide to archive their emails to local PST files or other local storage. These PST files are fragile and easily corrupted. Furthermore, they are subject to the backup practices of individual workstations. Lastly, PST files are readily are portable and can be easily copied and moved outside the visibility of the email system and your IT and IP controls.

We developed Amazon Simple Email Service (SES) Mail Manager archiving features in response to this problem, and based on additional customer feedback: the need for consistent email retention behaviors, for all email. Customers also wanted the flexibility to determine which messages to archive, where to put them, and how long to retain those messages.

To make the feature applicable to the widest set of use cases, we designed Mail Manager to be able to archive any email traversing the SES service, not just those that have already been delivered to a user’s mailbox. This added flexibility ensures organizations can maintain a complete record of exactly those email communications they wish to preserve. Rather than require external tools to search and export Mail Manager’s archives, we built these functions directly into the SES console.

In fact, the entire Media Manager archiving solution is fully managed by SES within the customer’s Mail Manager account, reducing the operational overhead traditionally associated with email archiving and compliance.

Figure 1 - Mail Manager Archiving

Figure 1 – Mail Manager Archiving

At the core of the SES Mail Manager archiving solution is the ability to capture and retain any message, regardless of its source or destination, as it flows through the service’s rules engine. This design approach ensures that every email message traversing Mail Manager can be subject to archiving and retention policies, rather than requiring organizations to manage different systems and tools for mail flowing through mail servers, internal relays and other email infrastructure. The result is a unified, comprehensive compliance solution that provides visibility and control over an organization’s email archiving.

SES also published a detailed overview of the Archiving feature, which is available here: Archiving and sending to final SMTP server.

Archiving on its own isn’t an innovation; it’s an email primitive – an essential capability that can be used to enable other, more complex solutions. Historically, retention of email was configured as a function of your on-premises mail server, where your mailboxes themselves were resident. Personally-authored emails were considered the high-value material to retain, and adding archiving as a function of mailbox configurations was the simplest approach.

In practice, we find that the mail captured at the mailbox server, or end user’s inbox, represents only a fraction of of the mail a typical enterprise generates. As organizations grow, the number of applications generating Application To Person (A2P) messages tends to increase dramatically. Similarly, as corporate environments become more complex, SaaS-based solutions that are external to the primary email infrastructure often use email to update employees along with workflow-management systems. Much of that mail eludes archiving as it bypasses individual user mailboxes.

The SES strategy for archiving is to capture mail from anywhere, to anywhere, as long as it transits an ingress endpoint as part of your Mail Manager configuration. You have two choices: you can write those messages directly to an S3 bucket you control, and then ingest it into any other tool you like. Alternately, you can send messages into a managed archive within Mail Manager, and gain access to search, export, and configurable retention features. By default, SES configures retention for 6 months, but it’s adjustable up to permanent retention for customers who require it.

Mail Manager’s archiving feature captures any message which matches your rule, or all messages traversing any ingress endpoint. You can choose to write all messages to or from your senior leadership team into one archive, or you can organize by other envelope metadata. The rules operate the same way whether the message is A2P or Person to Person (P2P), ensuring uniform policies and retention options.

With Mail Manager’s managed archives, you pay for each gigabyte ingested, indexed, and available for search, and a separate storage fee for each gigabyte retained every month. Note that the storage fee includes both the raw content of the messages, and the size of the computed index required for search and export functions.

For messages you write to your S3 buckets, you also have the option to invoke an S3 trigger action that calls an Amazon Lambda to drive various automatation workflows. Regulated industries might want to write all messages to S3 to leverage S3’s glacier storage option for very long-term storage.

You can even split your workload between Mail Manager’s managed archive, for emails you are likely to need readily discoverable, and the Write to S3 option, for content which you don’t expect to ever need to search with granularity, but still needs to be archived to “check the box” for your retention policy. In fact, AWS encourages such a builder-oriented approach, because it rewards thoughtful decisions and resource utilization, and conforms to the broad goal of consumption-based pricing, which Mail Manager embraces fully at every step.

Figure 2 - Rule Set with conditions for archiving

Figure 2 – Rule Set with conditions for archivingMail Manager provides a more comprehensive, resilient archiving approach that increases both the overall scope of mail that can be captured, and the fidelity of the archived data. You don’t need any special adapters or plugins to capture mail from any source. All email that comes through your Mail Manager Ingress Endpoint can be archived.

Figure 3 - Create archive

Figure 3 – Create archive

Why not try Mail Manager today and experience the benefits of a centralized, scalable email archiving solution? You’ll pay only for the data you ingest and retain each month, without the fragility and visibility issues of user-managed archives. Visit the SES website to start your free trial of Mail Manager and take control of your organization’s critical email records. To start with Mail Manager, visit https://aws.amazon.com/ses/, click on Mail Manager, and set up your first workload today.

If you have any questions or need further guidance, feel free to reach out to us via the SES Forums or in the comments section of this blog post. We’re here to help you navigate the evolving email landscape and unlock the full potential of your Amazon SES investment.

About the Authors

Toby Weir-Jones

Toby Weir-Jones

Toby is a Principal Product Manager for Amazon SES and WorkMail. He joined AWS in January 2021 and has significant experience in both business and consumer information security products and services. His focus on email solutions at SES is all about tackling a product that everyone uses and finding ways to bring innovation and improved performance to one of the most ubiquitous IT tools.

Brett Ezell

Brett Ezell

Brett is an Amazon Pinpoint and Amazon Simple Email Service Specialist Solutions Architect at AWS. As a Navy veteran, he joined AWS in 2020 through an AWS technical military apprenticeship program. When he isn’t deep diving into solutions for customer challenges, Brett spends his time collecting vinyl, attending live music, and training at the gym. An admitted comic book nerd, he feeds his addiction every Wednesday by combing through his local shop for new books.

Zip

Zip

Zip is a Sr. Specialist Solutions Architect at AWS, working with Amazon Pinpoint and Simple Email Service and WorkMail. Outside of work he enjoys time with his family, cooking, mountain biking, boating, learning and beach plogging.

Augmentation patterns to modernize a mainframe on AWS

Post Syndicated from Lewis Tang original https://aws.amazon.com/blogs/architecture/augmentation-patterns-to-modernize-a-mainframe-on-aws/

Customers with mainframes want to use Amazon Web Services (AWS) to increase agility, maximize the value of their investments, and innovate faster. On June 8, 2022, AWS announced the general availability of AWS Mainframe Modernization, a new service that makes it faster and simpler for customers to modernize mainframe-based workloads.

In this post, we discuss the common use cases and the augmentation architecture patterns that help liberate data from mainframe for modern data analytics, get rid of expensive and unsupported tape storage solutions for mainframe, build new capabilities that integrate with core mainframe workloads, and enable agile development and testing by adopting CI/CD for mainframe.

Pattern 1: Augment mainframe data retention with backup and archival on AWS

Mainframes process and generate the most business-critical data. It’s imperative to provide data protection via solutions, such as data backup, archiving, and disaster recovery. Mainframes usually use automated tape libraries—virtual tape libraries for backup and archive. These tapes need to be stored, organized, and transported to vaults and disaster recovery sites. All this can be very expensive and rigid.

There is a more cost-effective approach that helps simplify the operations of tape libraries:  leverage AWS partner tools, such as Model9, to transparently migrate the data on tape storage to AWS.

As depicted in Figure 1, mainframe data can be transferred via the secured network connection using AWS Transfer Family services or AWS DataSync to AWS cloud storage services, such as Amazon Elastic File System, Amazon Elastic Block Store, and Amazon Simple Storage Service (S3). After data is stored in AWS cloud, you can configure and move data among these services to meet with the business data processing need. Depending on data storage requirements, data storage costs can be further optimized by configuring S3 Lifecyle policies to move data among Amazon S3 storage classes. For long-term data archiving purpose, you can choose S3 Glacier storage class to achieve durability, resilience, and the optimal cost effectiveness.

Mainframe data backup and archival augmentation

Figure 1. Mainframe data backup and archival augmentation

Pattern 2: Augment mainframe with agile development and test environments including CI/CD pipeline on AWS

For any business-critical business application, a typical mainframe workload requires development and test environments to support production workloads. It’s common to see the lengthy application development lifecycle, a lack of automated testing, and an absent CI/CD pipeline with most of mainframes. Furthermore, the existing mainframe development processes and tools are outdated, as they are unable to keep up with the business pace, resulting in a growing backlog. Organizations with mainframes look for application development solutions to solve these challenges.

As demonstrated in Figure 2, AWS developer tools orchestrate code compilation, testing, and deployment among mainframe test environments. Mainframe test environments are either provided by the mainframe vendors as emulators or by AWS partners, such as Micro Focus. You can load the preferred developer tools and run an integrated development environment (IDE) from Amazon WorkSpaces or Amazon AppStream 2.0. Developers create or modify code in the IDE, and then commit and push their code to AWS CodeCommit. As soon as the code is pushed, an event is generated and triggers the pipeline in AWS CodePipeline to build the new code in a compilation environment via AWS CodeBuild. The pipeline pushes the new code to the test environment.

To optimize cost, you can scale the test environment capacity to meet needs. The tests are executed, and the test environment can be shut down when not in use. When the tests are successful, the pipeline pushes the code back to the mainframe via AWS CodeDeploy and an intermediary server. On the mainframe side, the code can go through a recompilation and final testing before being pushed to production.

You can further optimize operations and licensing cost of mainframe emulator by leveraging the managed integrated development and test environment provided by AWS Mainframe Modernization service.

Mainframe CI/CD augmentation

Figure 2. Mainframe CI/CD augmentation

Pattern 3: Augment mainframe with agile data analytics on AWS

Core business applications running on mainframes generate a lot of data throughout the years. Decades of historical business transactions and massive amounts of user data present an opportunity to develop deep business insight. By creating a data lake using the AWS big data services, you can gain faster analytics capabilities and better insight into core business data originated from mainframe applications.

Figure 3 depicts data being pulled from relational, hierarchical, or mainframe file-based data stores on mainframes. These data are presented in various formats and stored as DB2 for z/OS, VSAM, IMS DB, IDMS, DMS, or other formats. You can use AWS partners data replication and change data capture tools from AWS Marketplace or AWS cloud services, such as Amazon Managed Streaming for Apache Kafka for near real-time data streaming, Transfer Family services, and DataSync for moving data in batch from mainframes to AWS.

Once data are replicated to AWS, you can further process data using the services like AWS Lambda, or Amazon Elastic Container Service and store the processed data on various AWS storage services, such as Amazon DynamoDB, Amazon Relational Database Service, and Amazon S3.

By using AWS big data and data analytics services, such as Amazon EMR, Amazon Redshift, Amazon Athena, AWS Glue, and Amazon QuickSight, you can develop deep business insight and present flexible visuals to your customers. Read more about mainframe data integration.

Mainframe data analytics augmentation

Figure 3. Mainframe data analytics augmentation

Pattern 4: Augment mainframe with new functions and channels on AWS

Organizations with a mainframe use AWS to innovate and iterate quickly, as they often lack agility. For example, a common scenario for a bank could be providing a mobile application for customer engagements, such as supporting a marketing campaign for a new credit card.

As depicted in Figure 4, with the data replicated from mainframes to AWS cloud and analyzed by AWS big data and analytics services, new business functions can be developed on cloud-native applications by using Amazon API Gateway, AWS Lambda, and AWS Fargate. These new business applications can interact with mainframe data, and the combination can give deep business insight.

To add new innovation capabilities, with time-series data generated by the new business function applications, using Amazon Forecast can predict domain-specific metrics, such as inventory, workforce, web traffic, and finances. Amazon Lex can build virtual agents, automate informational response to customer enquiries, and improve business productivity. Adding Amazon SageMaker, you can prepare data gathered from mainframe and new business applications at scale to build, train, and deploy machine learning models for any business cases.

You can further improve customer engagement by incorporating Amazon Connect and Amazon Pinpoint to build multichannel communications.

Mainframe new functions and channels augmentation

Figure 4. Mainframe new functions and channels augmentation

Conclusion

To increase agility, maximize the value of investments, and innovate faster, organizations can adopt the patterns discussed in this post to augment mainframes by using AWS services to build resilient data protection solution, provision agile CI/CD integrated development and test environment, liberate mainframe data and developing innovation solutions for new digital customer experience. With AWS Mainframe Modernization service, you can accelerate this journey and innovate faster.

Ingest data from Snowflake to Amazon S3 using AWS Glue Marketplace Connectors

Post Syndicated from Sindhu Achuthan original https://aws.amazon.com/blogs/big-data/ingest-data-from-snowflake-to-amazon-s3-using-aws-glue-marketplace-connectors/

In today’s complex business landscape, organizations are challenged to consume from variety of sources and keep up with data that pours in all through the day. There is a demand to design applications that enables data to be portable across cloud platforms and give them the ability to derive insights from one or more data sources to remain competitive. In this post, we demonstrate how AWS Glue integration with Snowflake has simplified the process of connecting to Snowflake and applying data transformations without writing a single line of code. With AWS Glue Studio, you can now use a simple visual interface to compose jobs for migrations that move and integrate data. It enables you to subscribe to a Snowflake connector in AWS Marketplace, query Snowflake tables and save the data in Amazon Simple Storage Service (Amazon S3) as Parquet format.

If you choose to bring your own custom connector or prefer a different connector from AWS Marketplace, follow the steps in this blog Performing data transformations using Snowflake and AWS Glue. In this post, we use the new AWS Glue Connector for Snowflake to seamlessly connect with Snowflake without the need to install JDBC drivers. To validate the data ingested, we use Amazon Redshift Spectrum to create an external table and query the data in Amazon S3. With Amazon Redshift Spectrum, you can efficiently query and retrieve data from files in Amazon S3 without having to load the data into Amazon Redshift tables.

Solution Overview

Let’s take a look at the architecture diagram on how AWS Glue connects to Snowflake for data ingestion.

Prerequisites

Before you start, make sure you have the following:

  1. An account in Snowflake, specifically a service account that has permissions to tables to be queried.
  2. AWS Identity and Access Management (IAM) permissions in place to create AWS Glue and Amazon Redshift service roles and policies. To configure, follow the instructions in Setting up IAM Permissions for AWS Glue and Create an IAM role for Amazon Redshift.
  3. Amazon Redshift Serverless endpoint. If you do not have it configured, follow the instructions in Amazon Redshift Serverless Analytics.

Configure the Amazon S3 VPC Endpoint

As a first step, we configure an Amazon S3 VPC Endpoint to enable AWS Glue to use a private IP address to access Amazon S3 with no exposure to the public internet. Complete the following steps.

  1. Open the Amazon VPC console.
  2. In the left navigation pane, choose Endpoints.
  3. Choose Create Endpoint, and follow the steps to create an Amazon S3 VPC endpoint of type Gateway.

Next, we create a secret using AWS Secrets Manager

  1. On AWS Secrets Manager console, choose Store a new secret.
  2. For Secret type, select Other type of secret.
  3. Enter a key as sfUser and the value as your Snowflake user name.
  4. Enter a key as sfPassword and the value as your Snowflake user password.
  5. Choose Next.
  6. Name the secret snowflake_credentials and follow through the rest of the steps to store the secret.

Subscribe to AWS Marketplace Snowflake Connector

To subscribe to the connector, follow the steps and activate Snowflake Connector for AWS Glue. This native connector simplifies the process of connecting AWS Glue jobs to extract data from Snowflake

  1. Navigate to the Snowflake Connector for AWS Glue in AWS Marketplace.
  2. Choose Continue to Subscribe.
  3. Review the terms and conditions, pricing, and other details.
  4. Choose Continue to Configuration.
  5. For Delivery Method, choose your delivery method.
  6. For Software version, choose your software version
  7. Choose Continue to Launch.
  8. Under Usage instructions, choose Activate the Glue connector in AWS Glue Studio. You’re redirected to AWS Glue Studio.
  9. For Name, enter a name for your connection (for example, snowflake_s3_glue_connection).
  10. Optionally, choose a VPC, subnet, and security group.
  11. For AWS Secret, choose snowflake_credentials.
  12. Choose Create connection.

A message appears that the connection was successfully created, and the connection is now visible on the AWS Glue Studio console.

Configure AWS Glue for Snowflake JDBC connectivity

Next, we configure a AWS Glue job by following the steps below to extract data.

  1. On the AWS Glue console, choose AWS Glue Studio on the left navigation pane.
  2. On the AWS Glue Studio console, choose Jobs on the left navigation pane.
  3. Create a job with “Visual with source and target” and choose the Snowflake connector for AWS Glue 3.0 as the source and Amazon S3 as the target.
  4. Enter a name for the job.
  5. Under job details, select an IAM role.
  6. Create a new IAM role if you don’t have already with required AWS Glue and AWS Secrets Manager policies.
  7. Under Visual, Choose the Data source – Connection node and choose the connection you created.
  8. In connection options, create a key value pair with query as shown below. Note that CUSTOMER table in SNOWFLAKE_SAMPLE_DATA database is considered for this migration. This table gets preloaded (1.5M rows) when you install Snowflake Schema.

    key value
    query SELECT
    C_CUSTKEY,
    C_NAME,
    C_ADDRESS,
    C_NATIONKEY,
    C_PHONE,
    C_ACCTBAL,
    C_MKTSEGMENT,
    C_COMMENT
    FROM
    SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER
    sfUrl MXA94638.us-east-1.snowflakecomputing.com
    sfDatabase SNOWFLAKE_SAMPLE_DATA
    sfWarehouse COMPUTE_WH

  9. In the Output schema section, specify the source schema as key-value pairs as shown below.
  10. Choose the Transform-ApplyMapping node to view the following transform details.

  11. Choose the Data target properties – S3 node and enter S3 bucket details as shown below.
  12. Choose Save.

After you save the job, the following script is generated. It assumes the account information and credentials are stored in AWS Secrets Manager as described earlier.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Script generated for node Snowflake Connector for AWS Glue 3.0
SnowflakeConnectorforAWSGlue30_node1 = glueContext.create_dynamic_frame.from_options(
    connection_type="marketplace.spark",
    connection_options={
        "query": "SELECT C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER",
        "sfUrl": "MXA94638.us-east-1.snowflakecomputing.com",
        "sfDatabase": "SNOWFLAKE_SAMPLE_DATA",
        "sfWarehouse": "COMPUTE_WH",
        "connectionName": "snowflake_s3_glue_connection",
    },
    transformation_ctx="SnowflakeConnectorforAWSGlue30_node1",
)

# Script generated for node ApplyMapping
ApplyMapping_node2 = ApplyMapping.apply(
    frame=SnowflakeConnectorforAWSGlue30_node1,
    mappings=[
        ("C_CUSTKEY", "decimal", "C_CUSTKEY", "decimal"),
        ("C_NAME", "string", "C_NAME", "string"),
        ("C_ADDRESS", "string", "C_ADDRESS", "string"),
        ("C_NATIONKEY", "decimal", "C_NATIONKEY", "decimal"),
        ("C_PHONE", "string", "C_PHONE", "string"),
        ("C_ACCTBAL", "decimal", "C_ACCTBAL", "decimal"),
        ("C_MKTSEGMENT", "string", "C_MKTSEGMENT", "string"),
        ("C_COMMENT", "string", "C_COMMENT", "string"),
    ],
    transformation_ctx="ApplyMapping_node2",
)

# Script generated for node S3 bucket
S3bucket_node3 = glueContext.write_dynamic_frame.from_options(
    frame=ApplyMapping_node2,
    connection_type="s3",
    format="glueparquet",
    connection_options={"path": "s3://sf-redshift-po/test/", "partitionKeys": []},
    format_options={"compression": "snappy"},
    transformation_ctx="S3bucket_node3",
)

job.commit()
  1. Choose Run to run the job.

After the job completes successfully, the run status should change to Succeeded.

The following screenshot shows that the data was written to Amazon S3.

Query Amazon S3 data using Amazon Redshift Spectrum

Let’s query the data in Amazon Redshift Spectrum

  1. On the Amazon Redshift console, choose the AWS Region.
  2. In the left navigation pane, choose Query Editor.
  3. Run the create external table DDL given below.
    CREATE EXTERNAL TABLE "spectrum_schema"."customer"
    (c_custkey   decimal(10,2),
    c_name       varchar(256),
    c_address    varchar(256),
    c_nationkey  decimal(10,2),
    c_phone      varchar(256),
    c_acctbal    decimal(10,2),
    c_mktsegment varchar(256),
    c_comment    varchar(256))
    stored as parquet
    location 's3://sf-redshift-po/test'; 

  1. Run the select query on the CUSTOMER table.
SELECT count(c_custkey), c_nationkey FROM "dev"."spectrum_schema"."customer" group by c_nationkey

Considerations

The AWS Glue crawler doesn’t work directly with Snowflake. This is a native capability that you can use for other AWS data sources that are joined or connected in the AWS Glue ETL job. Instead, you can define connections in the script as shown earlier in this post.

The Snowflake source tables covered in this post only focus on structured data types and therefore semi-structured or unstructured data types in Snowflake (binary, varbinary, and variant) are out of scope. However, you could use AWS Glue functions such as relationalize to flatten nested schema data into semi-normalized structures, or you could use Amazon Redshift Spectrum to support these data types.

Conclusion

In this post, we learnt how to define Snowflake connection parameters in AWS Glue, connect to Snowflake from AWS Glue using the AWS native connector for Snowflake, migrate to Amazon S3 and use Redshift Spectrum to query data in Amazon S3 to meet your business needs.

We welcome any thoughts or questions in the comments section below


About the Authors

Sindhu Achuthan is a Data Architect with Global Financial Services at Amazon Web Services. She works with customers to provide architectural guidance for analytics solutions on Amazon Glue, Amazon Redshift, AWS Lambda, and other services. Outside work, she is a DIYer, likes to go on long trails, and do yoga.

Shayon Sanyal is a Sr. Solutions Architect specializing in databases at AWS. His day job allows him to help AWS customers design scalable, secure, performant and robust database architectures on the cloud. Outside work, you can find him hiking, traveling or training for the next half-marathon.