Tag Archives: Amazon Elastic File System (EFS)

Replication failback and increased IOPS are new for Amazon EFS

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/replication-failback-and-increased-iops-are-new-for-amazon-efs/

Today, Amazon Elastic File System (Amazon EFS) has introduced two new capabilities:

  • Replication failback – Failback support for EFS replication makes it easier and more cost-effective to synchronize changes between EFS file systems when performing disaster recovery (DR) workflows. You can now quickly replicate incremental changes from your secondary back to your primary file system after disaster events and other DR-related activities.
  • Increased IOPS – Amazon EFS now supports up to 250,000 read IOPS and up to 50,000 write IOPS per file system, making it easier to run more IOPS-heavy workloads at any scale for virtual servers, containers, and serverless functions that require shared storage.

Let’s take a closer look at how these work in practice.

Introducing Amazon EFS replication failback
With Amazon EFS replication, you can create a replica of your file system in the same or in another AWS Region. When replication is enabled, Amazon EFS automatically keeps the primary (source) and secondary (destination) file systems synchronized. To help you meet your compliance and business continuity goals, EFS replication is designed to provide a recovery point objective (RPO) and a recovery time objective (RTO) measured in minutes.

Now, with failback support, you can respond to disaster recovery (DR) events, conduct planned business continuity tests, and manage other DR-related activities with greater speed and cost efficiency. Failback support allows you to switch the direction of replication between the primary and secondary file systems. EFS replication keeps the two file systems in sync by copying only incremental changes, eliminating the need to make full copies of your data or use a self-managed, custom solution to complete a recovery workflow.

Using Amazon EFS replication failback
I have a file system replicated to another Region. As part of a periodic DR test, I want to switch to using the secondary file system and then revert to the primary file system, preserving all the changes made on the secondary file system. To do so, I can use EFS replication failback in just a few steps.

First, I delete the replication from the primary (source) to the secondary (destination) file system. After this, the secondary file system becomes writable. To do so, in the Amazon EFS console, I check that I am in the correct Region and select the secondary file system. In the Replication tab, I choose Delete replication and confirm the deletion. I can also start from the primary file system. In that case, the Delete replication link in the Replication tab opens a new browser tab and asks me to confirm the deletion as before.

I can now use the secondary file system and change its data as needed.

To go back to using the primary file system, I create a “reverse replication” from the secondary to the primary file system. To do so, I check that I am in the correct Region and select the secondary file system. In the Replication tab, I choose Create replication and the new Replicate to existing file system option. Then, I select the Region of the primary file system, use the console to browse the EFS file systems in that Region, and choose the primary one.

Console screenshot.

The console warns me that Replication overwrite protection is enabled for the primary file system. I follow the Disable protection link to open a new browser tab and edit the primary file system to disable replication overwrite protection.

Console screenshot.

Now, I go back to the browser tab where I am creating the failback replication from the secondary to the primary file system. I refresh the protection check and choose to create the replication.

Console screenshot.

In the following dialog, I confirm that I want Amazon EFS to write to the primary file system.

Console screenshot.

To know when the primary file system is back in sync, I check the Last synced timestamp in the Replication tab, which indicates that all changes made to the source file system before that time have been replicated to the destination. Optionally, I can look at the TimeSinceLastSync metric (expressed in minutes) in Amazon CloudWatch to understand how replication is progressing.

Console screenshot.
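
I can also pull the same replication lag from the command line. The following is a minimal sketch that assumes a placeholder source file system ID and that TimeSinceLastSync is queried by the FileSystemId dimension in the AWS/EFS namespace; check the EFS monitoring documentation for the exact dimension set.

# Sketch: query the TimeSinceLastSync metric for the last hour (GNU date syntax)
aws cloudwatch get-metric-statistics \
  --namespace AWS/EFS \
  --metric-name TimeSinceLastSync \
  --dimensions Name=FileSystemId,Value=fs-0123456789abcdef0 \
  --statistics Maximum \
  --period 300 \
  --start-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)"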

When the primary file system is back in sync, I delete the replication from the secondary to the primary file system. To complete the restore of the original configuration, I again create the replication from the primary to the secondary file system.

Increased IOPS per file system
The Amazon EFS team has been able to increase IOPS again! The last time they did it was just a few months back. Starting today, an EFS file system can handle up to 50,000 write IOPS (a 2x improvement) and up to 250,000 read IOPS (a 4.5x improvement) when working with frequently-accessed data from a high-performance cache managed by Amazon EFS.

You can monitor the percentage utilization of your file system’s available IOPS using the PercentIOLimit CloudWatch metric. This metric considers the maximum IOPS for writes and uncached reads, including combinations of the two. Reads from the cache are not included in the PercentIOLimit metric.
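
For example, here’s a minimal sketch of a CloudWatch alarm that notifies you when PercentIOLimit stays above 80 percent for 15 minutes; the file system ID and SNS topic ARN are placeholders you would replace with your own.

# Sketch: alarm on sustained high IOPS utilization for a General Purpose file system
aws cloudwatch put-metric-alarm \
  --alarm-name efs-percent-io-limit-high \
  --namespace AWS/EFS \
  --metric-name PercentIOLimit \
  --dimensions Name=FileSystemId,Value=fs-0123456789abcdef0 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:111122223333:efs-alerts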

With these performance improvements, you can run even more IOPS-demanding workloads on Amazon EFS, such as machine learning (ML) training, fine-tuning, and inference. Other use cases that can benefit from the increased IOPS are data science user shares, SaaS applications, and media processing.

Things to know
EFS replication failback is available in all AWS Regions where EFS is available. There are no additional costs for using replication failback. You pay for the usual replication and file system changes as described in Amazon EFS pricing.

The increased IOPS limits are immediately available for all file systems using the Elastic Throughput mode in all Regions where EFS is available. You don’t need to do anything to benefit from these performance improvements. To achieve the maximum IOPS, your application needs sufficient parallelization, for example, by using multiple clients and distributing the load across a large number of files. For more information, see the performance tips in the user guide.

Learn more
Amazon EFS product page

Danilo

Optimize your storage costs for rarely-accessed files with Amazon EFS Archive

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/optimize-your-storage-costs-for-rarely-accessed-files-with-amazon-efs-archive/

Today, we are introducing EFS Archive, a new storage class for Amazon Elastic File System (Amazon EFS) optimized for long-lived data that is rarely accessed.

With this launch, Amazon EFS supports three Regional storage classes:

  • EFS Standard – Powered by SSD storage and designed to deliver submillisecond latency for active data.
  • EFS Infrequent Access (EFS IA) – Cost-optimized for data accessed only a few times a quarter, and that doesn’t need the submillisecond latencies of EFS Standard.
  • EFS Archive – Cost-optimized for long-lived data accessed a few times a year or less and offering similar performance to EFS IA.

All Regional storage classes deliver gigabytes-per-second throughput and hundreds of thousands of IOPS performance and are designed for eleven nines of durability.

You don’t need to manually pick and choose a storage class for your file systems because EFS lifecycle management can automatically migrate files across storage classes based on their access patterns. This allows you to have a single shared file system that contains files processed in very different ways, from active, latency-sensitive data to cold, rarely accessed data.

Many datasets have subsets of data that are valuable for generating insights but aren’t often used. With EFS Archive, you can store rarely accessed data cost-effectively while keeping it in the same shared file system as other data. This simplified storage approach allows end users and applications to collaborate on large shared datasets in one place, making it easier and quicker to set up and scale analytics workloads.

Using EFS Archive, you can optimize costs for workloads with large file-based datasets that contain a mix of active and inactive data such as user shares, machine learning (ML) training datasets, SaaS applications, and data retained for regulatory compliance like financial transactions and medical records.

Let’s see how this works in practice.

Using EFS Archive storage
To use the new EFS Archive storage class, I need to configure lifecycle management for the file system. In the Amazon EFS console, I select one of my file systems and choose Edit. To use EFS Archive storage, the file system Throughput mode must be Elastic. Elastic Throughput is the recommended choice for most workloads because it is designed to provide applications with as much throughput as they need with pay-as-you-use pricing.

Console screenshot.

Now, I configure Lifecycle management to transition files into EFS IA or EFS Archive based on my workload’s access patterns.

Console screenshot.

My workloads rarely use files older than one month. Files older than a quarter are not used by normal activities but need to be kept for a longer time. Based on these considerations, I choose to automatically transition files to EFS IA after 30 days and to EFS Archive after 90 days since the last access. These are the default settings for new file systems.

When one of my old files is accessed, it’s usually an indicator that it is being used in a new analysis, so it’ll become active again for some period. For this reason, I use the option to transition files back to Standard storage on their first access in IA or Archive storage.

I save changes, and that’s it! This file system will now automatically use different storage classes based on how files are being processed by my applications.
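
If I prefer to script this instead of using the console, the same lifecycle policy can be applied with the AWS CLI. A minimal sketch, using a placeholder file system ID:

# Sketch: transition to IA after 30 days, Archive after 90 days, back to Standard on first access
aws efs put-lifecycle-configuration \
  --file-system-id fs-0123456789abcdef0 \
  --lifecycle-policies '[
    {"TransitionToIA":"AFTER_30_DAYS"},
    {"TransitionToArchive":"AFTER_90_DAYS"},
    {"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}
  ]'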

Things to know
EFS Archive is available today in all AWS Regions where Amazon EFS is offered, excluding those based in China.

To offer a more cost-optimized experience for colder, rarely-accessed files, EFS Archive offers 50 percent lower storage cost than EFS IA with a three times higher request charge when data is accessed. For more information, see Amazon EFS pricing.

You can use EFS Archive with existing file systems by configuring the file system lifecycle policies. New file systems are created by default with a lifecycle policy that automatically transitions files to EFS IA after 30 days and to EFS Archive after 90 days since the last access.

Optimize your storage costs by configuring lifecycle management for your Amazon EFS file systems.

Danilo

Welcome to AWS Storage Day 2023

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/welcome-to-aws-storage-day-2023/

Welcome to the fifth annual AWS Storage Day! This virtual event is happening today starting at 9:00 AM Pacific Time (12:00 PM Eastern Time) and is available for you to watch on the AWS On Air Twitch channel. The first AWS Storage Day was hosted in 2019, and this event has grown into an innovation day that we look forward to delivering to you every year. In last year’s Storage Day post, I wrote about the constant innovations in AWS Storage aimed at helping you put your data to work while keeping it secure and protected. This year, Storage Day is focused on storage for AI/ML, data protection and resiliency, and the benefits of moving to the cloud.

AWS Storage Day Key Themes
When it comes to storage for AI/ML, data volumes are increasing at an unprecedented rate, exploding from terabytes to petabytes and even to exabytes. With a modern data architecture on AWS, you can rapidly build scalable data lakes, use a broad and deep collection of purpose-built data services, scale your systems at a low cost without compromising performance, share data across organizational boundaries, and manage compliance, security, and governance, allowing you to make decisions with speed and agility at scale.
To train machine learning models and build Generative AI applications, you must have the right data strategy in place. So, I’m happy to see that, among the list of sessions to look forward to at the live event, the Optimize generative AI and ML with AWS Infrastructure session will discuss how you can transform your data into meaningful insights.

Whether you’re just getting started with the cloud, planning to migrate applications to AWS, or already building applications on AWS, we have resources to help you protect your data and meet your business continuity objectives. Our data protection and resiliency features and solutions can help you meet your business continuity goals and deliver disaster recovery during data loss events, across recovery point and time objectives (RPO and RTO). With the unprecedented data growth happening in the world today, determining where your data is stored, how it’s secured, and who has access to it is a higher priority than ever. Be sure to join the Protect data in AWS amid a rapidly evolving cyber landscape session to learn more.

When moving data to the cloud, you need to understand where you’re moving it for different use cases, the types of data you’re moving, and the network resources available, among other considerations. There are many reasons to move to the cloud. Recently, Enterprise Strategy Group (ESG) validated that organizations reduced compute, networking, and storage costs by up to 66 percent by migrating on-premises workloads to AWS Cloud infrastructure. ESG confirmed that migrating on-premises workloads to AWS provides organizations with reduced costs, increased performance, improved operational efficiency, faster time to value, and improved business agility.
We have a number of sessions that discuss how to move to the cloud, based on your use case. I’m most looking forward to the Hybrid cloud storage and edge compute: AWS, where you need it session, which will discuss considerations for workloads that can’t fully move to the cloud.

Tune in to learn from experts about new announcements, leadership insights, and educational content related to the broad portfolio of AWS Storage services and features that address all these themes and more. Today, we have announcements related to Amazon Simple Storage Service (Amazon S3), Amazon FSx for Windows File Server, Amazon Elastic File System (Amazon EFS), Amazon FSx for OpenZFS, and more.

Let’s get into it.

15 Years of Amazon EBS
Not long ago, I was reading Jeff Barr’s post titled 15 Years of AWS Blogging! In this post, Jeff mentioned a few posts he wrote for the earliest AWS services and features. Amazon Elastic Block Store (Amazon EBS) is on this list as a service that simplifies the use of Amazon EC2.

Well, it’s been 15 years since the launch of Amazon EBS was announced, and today we celebrate 15 years of this service. If you were one of the original users who put Amazon EBS to good use and provided us with the very helpful feedback that helped us invent and simplify, iterate and improve, I’m sure you can’t believe how time flies. Today, Amazon EBS handles more than 100 trillion I/O operations daily, and over 390 million EBS volumes are created every day.

If you’re new to Amazon EBS, join us for a fireside chat with Matt Garman, Senior Vice President, Sales, Marketing, and Global Services at AWS, and learn the strategy and customer challenges behind the launch of the service in 2008. You’ll also hear from long-term EBS customer, Stripe, about its growth with EBS since Stripe was launched 12 years ago.

Amazon EBS has continuously improved its scalability and performance to support more customer workloads as the direct storage attachment for Amazon EC2 instances. With the launch of Amazon EC2 M7i instances, powered by custom 4th Generation Intel Xeon Scalable processors, on August 2, you can attach up to 128 Amazon EBS volumes, an increase from 28 on a previous generation M6i instance. The higher number of volume attachments means you can increase storage density per instance and improve resource utilization, reducing total compute cost.

You can host up to 127 containers per instance for larger database applications and scale them more cost effectively before needing to provision more instances and only pay for resources you need. With a higher number of volume attachments, you can fully utilize the memory and vCPU available on these powerful M7i instances as your database storage footprint grows. EBS is also increasing the number of multi-volume snapshots you can create, for up to 128 EBS volumes attached to an instance, enabling you to create crash-consistent backups of all volumes attached to an instance.

Join the 15 years of innovations with Amazon EBS session for a discussion about how the original vision for Amazon EBS has evolved to meet your growing demands for cloud infrastructure.

Mountpoint for Amazon S3
Now generally available, Mountpoint for Amazon S3 is a new open source file client that delivers high throughput access, lowering compute costs for data lakes on Amazon S3. Mountpoint for Amazon S3 is a file client that translates local file system API calls to S3 object API calls. Using Mountpoint for Amazon S3, you can mount an Amazon S3 bucket as a local file system on your compute instance, to access your objects through a file interface with the elastic storage and throughput of Amazon S3. Mountpoint for Amazon S3 supports sequential and random read operations on existing files, and sequential write operations for creating new files.
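
For example, once the client is installed, mounting a bucket and reading objects through the file interface takes just a couple of commands; the bucket name and mount path below are placeholders.

# Sketch: mount a bucket with Mountpoint for Amazon S3 and browse it as a local directory
mkdir -p ~/mnt/amzn-s3-demo-bucket
mount-s3 amzn-s3-demo-bucket ~/mnt/amzn-s3-demo-bucket
ls ~/mnt/amzn-s3-demo-bucket    # objects appear as files under the mount point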

The Deep dive and demo of Mountpoint for Amazon S3 session demonstrates how to use the file client to access objects in Amazon S3 using file APIs, making it easier to store data at scale and maximize the value of your data with analytics and machine learning workloads. Read this blog post to learn more about Mountpoint for Amazon S3 and how to get started, including a demo.

Put Cold Storage to Work Faster with Amazon S3 Glacier Flexible Retrieval
Amazon S3 Glacier Flexible Retrieval improves data restore time by up to 85 percent, at no additional cost. Faster data restores automatically apply to the Standard retrieval tier when using Amazon S3 Batch Operations. These restores begin to return objects within minutes, so you can process restored data faster. Processing restored data in parallel with ongoing restores helps you accelerate data workflows and quickly respond to business needs. Now, whether you’re transcoding media, restoring operational backups, training machine learning models, or analyzing historical data, you can speed up your data restores from archive.
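
For reference, an individual Standard tier retrieval request looks like the following sketch; S3 Batch Operations issues restores like this at scale across millions of objects. The bucket and key are placeholders.

# Sketch: request a Standard retrieval for one archived object, keeping the copy for 7 days
aws s3api restore-object \
  --bucket amzn-s3-demo-bucket \
  --key archive/2021/report.parquet \
  --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'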

Coupled with the S3 Glacier improvements to restore throughput by up to 10 times for millions of objects announced in 2022, S3 Glacier data restores of all sizes now benefit from both faster starts and shorter completion times.

Join the Maximize the value of cold data with Amazon S3 Glacier session to learn how Amazon S3 Glacier is helping organizations of all sizes and from all industries transform their data archiving to unlock business value, increase agility, and save on storage costs. Read this blog post to learn more about the Amazon S3 Glacier Flexible Retrieval performance improvements and follow step-by-step guidance on how to get started with faster standard retrievals from S3 Glacier Flexible Retrieval.

Supporting a Broad Spectrum of File Workloads
To serve a broad spectrum of use cases that rely on file systems, we offer a portfolio of file system services, each targeting a different set of needs. Amazon EFS is a serverless file system built to deliver an elastic experience for sharing data across compute resources. Amazon FSx makes it easier and more cost-effective for you to launch, run, and scale feature-rich, high-performance file systems in the cloud, enabling you to move to the cloud with no changes to your code, processes, or how you manage your data.

Power ML research and big data analytics with Amazon EFS
Amazon EFS offers serverless and fully scalable file storage, designed for high scalability in both storage capacity and throughput performance. Just last week, we announced enhanced support for faster read and write IOPS, making it easier to power more demanding workloads. We’ve improved the performance capabilities of Amazon EFS by adding support for up to 55,000 read IOPS and up to 25,000 write IOPS per file system. These performance enhancements help you to run more demanding workflows, such as machine learning (ML) research with KubeFlow, financial simulations with IBM Symphony, and big data processing with Domino Data Lab, Hadoop, and Spark.

Join the Build and run analytics and SaaS applications at scale session to hear how recent Amazon EFS performance improvements can help power more workloads.

Multi-AZ file systems on Amazon FSx for OpenZFS
You can now use a multi-AZ deployment option when creating file systems on Amazon FSx for OpenZFS, making it easier to deploy file storage that spans multiple AWS Availability Zones to provide multi-AZ resilience for business-critical workloads. With this launch, you can take advantage of the power, agility, and simplicity of Amazon FSx for OpenZFS for a broader set of workloads, including business-critical workloads like database, line-of-business, and web-serving applications that require highly available shared storage that spans multiple AZs.

The new multi-AZ file systems are designed to deliver high levels of performance for a broad variety of workloads, including performance-intensive workloads such as financial services analytics, media and entertainment workflows, semiconductor chip design, and game development and streaming. They provide up to 21 GB per second of throughput and over 1 million IOPS for frequently accessed cached data, and up to 10 GB per second and 350,000 IOPS for data accessed from persistent disk storage.

Join the Migrate NAS to AWS to reduce TCO and gain agility session to learn more about multi-AZ file systems with Amazon FSx for OpenZFS.

New, Higher Throughput Capacity Levels on Amazon FSx for Windows File Server
Performance improvements for Amazon FSx for Windows File Server help you accelerate time-to-results for performance-intensive workloads such as SQL Server databases, media processing, cloud video editing, and virtual desktop infrastructure (VDI).

We’re adding four new, higher throughput capacity levels, increasing the maximum throughput available from 2 GB per second to 12 GB per second. These throughput improvements come with correspondingly higher levels of disk IOPS, designed to deliver up to 350,000 IOPS.

In addition, by using FSx for Windows File Server, you can provision IOPS higher than the default 3 IOPS per GiB for your SSD file system. This lets you scale SSD IOPS independently from storage capacity, so you can optimize costs for performance-sensitive workloads.
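
As a rough sketch of what provisioning SSD IOPS independently of storage capacity could look like from the AWS CLI; the file system ID and IOPS value are placeholders, and you should confirm the exact parameter shape in the Amazon FSx CLI reference.

# Sketch (assumed parameters): raise provisioned SSD IOPS on an existing FSx for Windows file system
aws fsx update-file-system \
  --file-system-id fs-0123456789abcdef0 \
  --windows-configuration 'DiskIopsConfiguration={Mode=USER_PROVISIONED,Iops=40000}'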

Join the Migrate NAS to AWS to reduce TCO and gain agility session to learn more about the performance improvements for Amazon FSx for Windows File Server.

Logically Air-Gapped Vault for AWS Backup
AWS Backup is a fully managed, policy-based data protection solution that enables customers to centralize and automate backup restores across 19 AWS services (spanning compute, storage, and databases) and third-party applications such as VMware Cloud on AWS and on-premises, as well as SAP HANA on Amazon EC2.

Today, we’re announcing the preview of logically air-gapped vault as a new type of AWS Backup Vault that acts as an additional layer of protection to mitigate against malware events. With logically air-gapped vault, customers can recover their application data through a different trusted account.

Join the Deep dive on data recovery for ransomware events session to learn more about logically air-gapped vault for AWS Backup.

Copy Data to and from Other Clouds with AWS DataSync
AWS DataSync is an online data movement and discovery service that simplifies data migration and helps you quickly, easily, and securely transfer your file or object data to, from, and between AWS storage services. In addition to supporting data migration to and from AWS storage services, DataSync supports copying to and from other clouds such as Google Cloud Storage, Azure Files, and Azure Blob Storage. Using DataSync, you can move your object data at scale between Amazon S3 compatible storage on other clouds and AWS storage services such as Amazon S3. We’re now expanding the support of DataSync for copying data to and from other clouds to include DigitalOcean Spaces, Wasabi Cloud Storage, Backblaze B2 Cloud Storage, Cloudflare R2 Storage, and Oracle Cloud Storage.
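
As an illustration, creating a DataSync location for an S3 compatible object store hosted on another cloud looks roughly like the following sketch; the endpoint, bucket, credentials, and agent ARN are placeholders.

# Sketch: register an S3-compatible object store on another cloud as a DataSync location
aws datasync create-location-object-storage \
  --server-hostname storage.example-cloud.com \
  --bucket-name source-bucket \
  --access-key EXAMPLEACCESSKEY \
  --secret-key EXAMPLESECRETKEY \
  --agent-arns arn:aws:datasync:us-east-1:111122223333:agent/agent-0123456789abcdef0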

Join the Identify and accelerate data migrations at scale session to learn more about this expanded support for DataSync.

Join Us Online
Join us today for the AWS Storage Day virtual event on the AWS On Air channel on Twitch. The event will be live starting at 9:00 AM Pacific Time (12:00 PM Eastern Time) on August 9. All sessions will be available on demand approximately two days after Storage Day.

We look forward to seeing you on Twitch!

– Veliswa 

AWS Cloud service considerations for designing multi-tenant SaaS solutions

Post Syndicated from Dennis Greene original https://aws.amazon.com/blogs/architecture/aws-cloud-service-considerations-for-designing-multi-tenant-saas-solutions/

An increasing number of software as a service (SaaS) providers are considering the move from single to multi-tenant to utilize resources more efficiently and reduce operational costs. This blog aims to inform customers of considerations when evaluating a transformation to multi-tenancy in the Amazon Web Services (AWS) Cloud. You’ll find valuable information on how to optimize your cloud-based SaaS design to reduce operating expenses, increase resiliency, and offer a high-performing experience for your customers.

Single versus multi-tenancy

In a multi-tenant architecture, resources like compute, storage, and databases can be shared among independent tenants. In contrast, a single-tenant architecture allocates exclusive resources to each tenant.

Let’s consider a SaaS product that needs to support many customers, each with their own independent deployed website. Using a single-tenant model (see Figure 1), the SaaS provider may opt to utilize a dedicated AWS account to host each tenant’s workloads. To contain their respective workloads, each tenant would have their own Amazon Elastic Compute Cloud (Amazon EC2) instances organized within an Auto Scaling group. Access to the applications running in these EC2 instances would be done via an Application Load Balancer (ALB). Each tenant would be allocated their own database environment using Amazon Relational Database Service (RDS). The website’s storage (consisting of PHP, JavaScript, CSS, and HTML files) would be provided by Amazon Elastic Block Store (EBS) volumes attached to the EC2 instances. The SaaS provider would have a control plane AWS account used to create and modify these tenant-specific accounts.

Single-tenant configuration

Figure 1. Single-tenant configuration

To transition to a multi-tenant pattern, the SaaS provider can use containerization to package each website, and a container orchestrator to deploy the websites across shared compute nodes (EC2 instances). Kubernetes can be employed as a container orchestrator, and a website would then be represented by a Kubernetes deployment and its associated pods. A Kubernetes namespace would serve as the logical encapsulation of the tenant-specific resources, as each tenant would be mapped to one Kubernetes namespace. The Kubernetes HorizontalPodAutoscaler can be utilized for autoscaling purposes, dynamically adjusting the number of replicas in the deployment on a given namespace based on workload demands.
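
As an illustration of this pattern, the resources for a hypothetical tenant-a might be declared roughly as follows; the namespace, deployment, image, and autoscaling values are placeholders, not a prescribed configuration.

# Sketch: one namespace per tenant, with a deployment and an HPA scoped to it
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: tenant-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: website
  namespace: tenant-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: website
  template:
    metadata:
      labels:
        app: website
    spec:
      containers:
      - name: website
        image: public.ecr.aws/nginx/nginx:latest   # placeholder website image
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: website
  namespace: tenant-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: website
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
EOF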

When additional compute resources are required, tools such as the Cluster Autoscaler, or Karpenter, can dynamically add more EC2 instances to the shared Kubernetes Cluster. An ALB can be reused by multiple tenants to route traffic to the appropriate pods. For RDS, SaaS providers can use tenant-specific database schemas to separate tenant data. For static data, Amazon Elastic File System (EFS) and tenant-specific directories can be employed. The SaaS provider would still have a control plane AWS account that would now interact with the Kubernetes and AWS APIs to create and update tenant-specific resources.
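
One way to implement tenant-specific directories on EFS is with EFS access points, which pin each tenant to its own directory and POSIX identity. A minimal sketch with placeholder IDs and paths:

# Sketch: create a per-tenant access point rooted at a tenant-specific directory
aws efs create-access-point \
  --file-system-id fs-0123456789abcdef0 \
  --posix-user Uid=1001,Gid=1001 \
  --root-directory 'Path=/tenants/tenant-a,CreationInfo={OwnerUid=1001,OwnerGid=1001,Permissions=750}' \
  --tags Key=tenant,Value=tenant-a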

This transition to a multi-tenant design utilizing Kubernetes, Amazon Elastic Kubernetes Service (EKS), and other managed services offers numerous advantages. It enables efficient resource utilization by leveraging containerization and auto-scaling capabilities, reducing costs, and optimizing performance (see Figure 2).

Multi-tenant configuration

Figure 2. Multi-tenant configuration

EKS cluster sizing and customer segmentation considerations in multi-tenancy designs

A high concentration of SaaS tenants hosted within the same system results in a large “blast radius.” This means a failure within the system has the potential to impact all resident tenants, which can lead to downtime for multiple tenants at once. To address this problem, SaaS providers are encouraged to partition their customers amongst multiple AWS accounts, each with their own deployment of this multi-tenant architecture. The number of tenants that can be present in a single cluster is a determination that can only be made by the SaaS provider after weighing the risks: the shared fate of a subset of their customers against the possible efficiency benefits of a multi-tenant architecture.

EKS security

SaaS providers must evaluate whether it’s appropriate for them to use containers as a workload isolation boundary. This is of particular importance in multi-tenant Kubernetes architectures, given that containers running on a single Amazon EC2 instance share the underlying Linux kernel. Security vulnerabilities in that shared kernel put the EC2 instance, and every container on it, at risk. The risk is elevated when any container running in a Kubernetes pod executes untrusted code, and is heightened further if SaaS providers permit tenants to “bring their own code”. Kubernetes is a single-tenant orchestrator, but with a multi-tenant approach to SaaS architectures, a single instance of the Amazon EKS control plane is shared among all the workloads running within a cluster. Amazon EKS considers the cluster to be the hard isolation security boundary. Every Amazon EKS managed Kubernetes cluster is isolated in a dedicated single-tenant Amazon VPC. At present, hard multi-tenancy can only be implemented by provisioning a unique cluster for each tenant.

EFS considerations

A SaaS provider may consider EFS as the storage solution for the static content of the multiple tenants. This provides them with a straightforward, serverless, and elastic file system. Directories may be used to separate the content for each tenant. While this approach of creating tenant-specific directories in EFS provides many benefits, there may be challenges harvesting per-tenant utilization and performance metrics. This can result in operational challenges for providers that need to granularly meter per-tenant usage of resources. Consequently, noisy neighbors will be difficult to identify and remediate. To resolve this, SaaS providers should consider building a custom solution to monitor the individual tenants in the multi-tenant file system by leveraging storage and throughput/IOPS metrics.
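
Such a custom solution can be as simple as a scheduled job that walks the tenant directories on a mounted copy of the file system and publishes per-tenant storage metrics to CloudWatch. A rough sketch, assuming the file system is mounted at /mnt/efs with one top-level directory per tenant and using a hypothetical Custom/EFSTenants namespace:

#!/usr/bin/env bash
# Sketch: publish per-tenant directory sizes as custom CloudWatch metrics (GNU du assumed)
MOUNT_POINT=/mnt/efs
NAMESPACE="Custom/EFSTenants"   # hypothetical metric namespace
for dir in "$MOUNT_POINT"/*/; do
  tenant=$(basename "$dir")
  bytes=$(du -sb "$dir" | awk '{print $1}')
  aws cloudwatch put-metric-data \
    --namespace "$NAMESPACE" \
    --metric-name TenantStorageBytes \
    --dimensions Tenant="$tenant" \
    --unit Bytes \
    --value "$bytes"
done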

RDS considerations

Multi-tenant workloads, where data for multiple customers or end users is consolidated in the same RDS database cluster, can present operational challenges regarding per-tenant observability. Both MySQL Community Edition and open-source PostgreSQL have limited ability to provide per-tenant observability and resource governance. AWS customers operating multi-tenant workloads often use a combination of ‘database’ or ‘schema’ and ‘database user’ accounts as substitutes for tenant identifiers, and should use alternate mechanisms to establish a mapping between a tenant and these substitutes. This makes it possible to process raw observability data from the database engine externally, map these substitutes back to tenants, and distinguish tenants in the observability data.

Conclusion

In this blog, we’ve shown what to consider when moving to a multi-tenant SaaS solution in the AWS Cloud, how to optimize your cloud-based SaaS design, and some challenges and remediations. Invest effort early in your SaaS design strategy to explore your customer requirements for tenancy. Work backwards from your SaaS tenants’ end goals. What level of computing performance do they require? What are the required cyber security features? How will you, as the SaaS provider, monitor and operate your platform with the target tenancy configuration? Your respective AWS account team is highly qualified to advise on these design decisions. Take advantage of reviewing and improving your design using the AWS Well-Architected Framework. The tenancy design process should be followed by extensive prototyping to validate functionality before production rollout.

Designing a hybrid AI/ML data access strategy with Amazon SageMaker

Post Syndicated from Franklin Aguinaldo original https://aws.amazon.com/blogs/architecture/designing-a-hybrid-ai-ml-data-access-strategy-with-amazon-sagemaker/

Over time, many enterprises have built an on-premises cluster of servers, accumulated data, and then procured more servers and storage. They often begin their ML journey by experimenting locally on their laptops. Investment in artificial intelligence (AI) is at a different stage in every business organization. Some remain completely on-premises, others are hybrid (both on-premises and cloud), and the remaining have moved completely into the cloud for their AI and machine learning (ML) workloads.

These enterprises are also researching or have started using the cloud to augment their on-premises systems for several reasons. As technology improves, both the size and quantity of data increases over time. The amount of data captured and the number of datapoints continues to expand, which presents a challenge to manage on-premises. Many enterprises are distributed, with offices in different geographic regions, continents, and time zones. While it is possible to increase the on-premises footprint and network pipes, there are still hidden costs to consider for maintenance and upkeep. These organizations are looking to the cloud to shift some of that effort and enable them to burst and use the rich AI and ML features on the cloud.

Defining a hybrid data access strategy

Moving ML workloads into the cloud calls for a robust hybrid data strategy describing how and when you will connect your on-premises data stores to the cloud. For most, it makes sense to make the cloud the source of truth, while still permitting your teams to use and curate datasets on-premises. Defining the cloud as source of truth for your datasets means the primary copy will be in the cloud and any dataset generated will be stored in the same location in the cloud. This ensures that requests for data are served from the primary copy and any derived copies.

A hybrid data access strategy should address the following:

  • Understand your current and future storage footprint for ML on-premises. Create a map of your ML workloads, along with performance and access requirements for testing and training.
  • Define connectivity across on-premises locations and the cloud. This includes east-west and north-south traffic to support interconnectivity between sites, required bandwidth, and throughput for the data movement workload requirements.
  • Define your single source of truth (SSOT)[1] and where the ML datasets will primarily live. Consider how dated, new, hot, and cold data will be stored.
  • Define your storage performance requirements, mapping them to the appropriate cloud storage services. This will give you the ability to take advantage of cloud-native ML with Amazon SageMaker.

Hybrid data access strategy architecture

To help address these challenges, we outline an end-to-end system architecture in Figure 1 that defines: 1) connectivity between on-premises data centers and AWS Regions; 2) mappings for on-premises data to the cloud; and 3) alignment of Amazon SageMaker to appropriate storage, based on ML requirements.

AI/ML hybrid data access strategy reference architecture

Figure 1. AI/ML hybrid data access strategy reference architecture

Let’s explore this architecture step by step.

  1. On-premises connectivity to the AWS Cloud runs through AWS Direct Connect for high transfer speeds.
  2. AWS DataSync is used for migrating large datasets into Amazon Simple Storage Service (Amazon S3). AWS DataSync agent is installed on-premises.
  3. On-premises network file system (NFS) or server message block (SMB) data is bridged to the cloud through Amazon S3 File Gateway, using either a virtual machine (VM) or hardware appliance.
  4. AWS Storage Gateway uploads data into Amazon S3 and caches it on-premises.
  5. Amazon S3 is the source of truth for ML assets stored on the cloud.
  6. Download S3 data for experimentation to Amazon SageMaker Studio.
  7. Amazon SageMaker notebook instances can access data through S3, Amazon FSx for Lustre, and Amazon Elastic File System. Use Amazon File Cache for high-speed caching of on-premises data, and Amazon FSx for NetApp ONTAP for cloud bursting.
  8. SageMaker training jobs can use data in Amazon S3, EFS, and FSx for Lustre. S3 data is accessed via File, Fast File, or Pipe mode, and data is pre-loaded or lazy-loaded when using FSx for Lustre as training job input. Any existing data on EFS can also be made available to training jobs.
  9. Leverage Amazon S3 Glacier for archiving data and reducing storage costs.

ML workloads using Amazon SageMaker

Let’s go deeper into how SageMaker can help you with your ML workloads.

To start mapping ML workloads to the cloud, consider which AWS storage services work with Amazon SageMaker. Amazon S3 typically serves as the central storage location for both structured and unstructured data that is used for ML. This includes raw data coming from upstream applications, and also curated datasets that are organized and stored as part of a Feature Store.

In the initial phases of development, a SageMaker Studio user will leverage S3 APIs to download data from S3 to their private home directory. This home directory is backed by a SageMaker-managed EFS file system. Studio users then point their notebook code (also stored in the home directory) to the local dataset and begin their development tasks.

To scale up and automate model training, SageMaker users can launch training jobs that run outside of the SageMaker Studio notebook environment. There are several options for making data available to a SageMaker training job.

  1. Amazon S3. Users can specify the S3 location of the training dataset. When using S3 as a data source, there are three input modes to choose from:
    • File mode. This is the default input mode, where SageMaker copies the data from S3 to the training instance storage. This storage is either a SageMaker-provisioned Amazon Elastic Block Store (Amazon EBS) volume or an NVMe SSD that is included with specific instance types. Training only starts after the dataset has been downloaded to the storage, and there must be enough storage space to fit the entire dataset.
    • Fast file mode. Fast file mode exposes S3 objects as a POSIX file system on the training instance. Dataset files are streamed from S3 on demand, as the training script reads them. This means that training can start sooner and require less disk space. Fast file mode also does not require changes to the training code (see the CLI sketch after this list).
    • Pipe mode. Pipe input also streams data from S3 as the training script reads it, but requires code changes. Pipe input mode is largely replaced by the newer and easier-to-use Fast file mode.
  2. FSx for Lustre. Users can specify an FSx for Lustre file system, which SageMaker will mount to the training instance and run the training code. When the FSx for Lustre file system is linked to an S3 bucket, the data can be lazily loaded from S3 during the first training job. Subsequent training jobs on the same dataset can then access it with low latency. Users can also choose to pre-load the file system with S3 data using hsm_restore commands.
  3. Amazon EFS. Users can specify an EFS file system that already contains their training data. SageMaker will mount the file system on the training instance and run the training code.
    Find out how to Choose the best data source for your SageMaker training job.
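
For illustration, launching a training job that reads its dataset with Fast file mode looks roughly like the following AWS CLI sketch; the job name, image URI, role ARN, and S3 paths are placeholders.

# Sketch: create a training job that streams its S3 dataset using Fast file mode
aws sagemaker create-training-job \
  --training-job-name demo-fastfile-job \
  --role-arn arn:aws:iam::111122223333:role/SageMakerExecutionRole \
  --algorithm-specification TrainingImage=111122223333.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest,TrainingInputMode=FastFile \
  --input-data-config '[{"ChannelName":"train","DataSource":{"S3DataSource":{"S3DataType":"S3Prefix","S3Uri":"s3://amzn-s3-demo-bucket/train/","S3DataDistributionType":"FullyReplicated"}}}]' \
  --output-data-config S3OutputPath=s3://amzn-s3-demo-bucket/output/ \
  --resource-config InstanceType=ml.m5.xlarge,InstanceCount=1,VolumeSizeInGB=50 \
  --stopping-condition MaxRuntimeInSeconds=3600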

Conclusion

With this reference architecture, you can develop and deliver ML workloads that run either on-premises or in the cloud. Your enterprise can continue using its on-premises storage and compute for particular ML workloads, while also taking advantage of the cloud, using Amazon SageMaker. The scale available on the cloud allows your enterprise to conduct experiments without worrying about capacity. Start defining your hybrid data strategy on AWS today!

Additional resources:

[1] The practice of aggregating data from many sources to a single source or location.

New – Announcing Amazon EFS Elastic Throughput

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/new-announcing-amazon-efs-elastic-throughput/

Today, we are announcing the availability of Amazon EFS Elastic Throughput, a new throughput mode for Amazon EFS that is designed to provide your applications with as much throughput as they need with pay-as-you-use pricing. This new throughput mode enables you to further simplify running workloads and applications on AWS by providing shared file storage that doesn’t need provisioning or capacity management.

Elastic Throughput is ideal for spiky and unpredictable workloads with performance requirements that are difficult to forecast. When you enable Elastic Throughput on an Amazon EFS file system, you no longer need to think about actively managing your file system performance or over-paying for idle resources in order to ensure performance for your applications. When you enable Elastic Throughput, you don’t specify or provision throughput capacity; Amazon EFS automatically delivers the throughput performance your application needs, and you, the builder, pay only for the amount of data read or written.

Amazon EFS is built to provide serverless, fully elastic file storage that lets you share file data for your cloud-based applications without having to think about provisioning or managing storage capacity and performance. With Elastic Throughput, Amazon EFS now extends its simplicity and elasticity to performance, enabling you to run an even broader range of file workloads on Amazon EFS. Amazon EFS is well suited to support a broad spectrum of use cases that include analytics and data science, machine learning, CI/CD tools, content management and web serving, and SaaS applications.

A Quick Review
As you may already know, Amazon EFS already has the Bursting Throughput mode, which is available as a default and supports bursting to higher levels for up to 12 hours a day. If your application is throughput constrained on Bursting mode (for example, utilizes more than 80 percent of permitted throughput or exhausts burst credits), then you should consider using Provisioned (which we announced in 2018), or the new Elastic Throughput modes.

With this announcement of Elastic Throughput mode, and in addition to the already existing Provisioned Throughput mode, Amazon EFS now offers two options for workloads that require higher levels of throughput performance. You should use Provisioned Throughput if you know your workload’s performance requirements and you expect your workload to consume a higher share (more than 5 percent on average) of your application’s peak throughput capacity. You should use Elastic Throughput if you don’t know your application’s throughput or your application is very spiky.

To access Elastic Throughput mode (or any of the Throughput modes), select Customize (selecting Create instead will create your file system with the default Bursting mode).

Create File system

New – Elastic Throughput

You can also enable Elastic Throughput for new and existing General Purpose file systems using the Amazon EFS console or programmatically using the Amazon EFS CLI, Amazon EFS API, or AWS CloudFormation.
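
For example, switching an existing file system to Elastic Throughput from the AWS CLI is a single call; the file system ID below is a placeholder.

# Sketch: enable Elastic Throughput on an existing General Purpose file system
aws efs update-file-system \
  --file-system-id fs-0123456789abcdef0 \
  --throughput-mode elastic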

Elastic Throughput in Action
Once you have enabled Elastic Throughput mode, you will be able to monitor your cost and throughput usage using Amazon CloudWatch and set alerts on unplanned throughput charges using AWS Budgets.

I have a test file system elasticblog that I created previously using the Amazon EFS console, and now I cannot wait to see Elastic Throughput in action.

File system (elasticblog)

I have provisioned an Amazon Elastic Compute Cloud (Amazon EC2) instance on which I mounted my file system. This EC2 instance has data that I will add to the file system.

I have also created CloudWatch alarms to monitor throughput usage, with alarm thresholds on the ReadIOBytes, WriteIOBytes, TotalIOBytes, and MetadataIOBytes metrics.

CloudWatch for Throughput Usage

The CloudWatch dashboard for my test file system elasticblog looks like this.

CloudWatch Dashboard – TotalIOBytes for File System

Elastic Throughput allows you to drive throughput up to a limit of 3 GiB/s for read operations and 1 GiB/s for write operations per file system in all Regions.

Available Now
Amazon EFS Elastic Throughput is available in all Regions supporting EFS except for the AWS China Regions.

To learn more, see the Amazon EFS User Guide. Please send feedback to AWS re:Post for Amazon Elastic File System or through your usual AWS support contacts.

Veliswa x

Deploying IBM Cloud Pak for integration on Red Hat OpenShift Service on AWS

Post Syndicated from Eduardo Monich Fronza original https://aws.amazon.com/blogs/architecture/deploying-ibm-cloud-pak-for-integration-on-red-hat-openshift-service-on-aws/

Customers across many industries use IBM integration software, such as IBM MQ, DataPower, API Connect, and App Connect, as the backbone that integrates and orchestrates their business-critical workloads.

These customers often tell Amazon Web Services (AWS) that they want to migrate their applications to the AWS Cloud as part of their business strategy: to lower costs, gain agility, and innovate faster.

In this blog, we will explore how customers, who are looking at ways to run IBM software on AWS, can use Red Hat OpenShift Service on AWS (ROSA) to deploy IBM Cloud Pak for Integration (CP4I) with modernized versions of IBM integration products.

As ROSA is a fully managed OpenShift service that is jointly supported by AWS and Red Hat, and managed by Red Hat site reliability engineers, customers benefit from not having to manage the lifecycle of Red Hat OpenShift Container Platform (OCP) clusters.

This post explains the steps to:

  • Create a ROSA cluster
  • Configure persistent storage
  • Install CP4I and the IBM MQ 9.3 operator

Cloud Pak for Integration architecture

In this blog, we are implementing a highly available ROSA cluster with three Availability Zones (AZ), three master nodes, three infrastructure nodes, and three worker nodes.

Review the AWS documentation for Regions and AZs and the regions where ROSA is available to choose the best region for your deployment.

Figure 1 demonstrates the solution’s architecture.

IBM Cloud Pak for Integration on ROSA architecture

Figure 1. IBM Cloud Pak for Integration on ROSA architecture

In our scenario, we are building a public ROSA cluster, with an internet-facing Classic Load Balancer providing access to Ports 80 and 443. Consider using a ROSA private cluster when you are deploying CP4I in your AWS account.

We are using Amazon Elastic File System (Amazon EFS) and Amazon Elastic Block Store (Amazon EBS) for our cluster’s persistent storage. Review the IBM CP4I documentation for information about supported AWS storage options.

Review AWS prerequisites for ROSA and AWS Security best practices in IAM documentation, before deploying CP4I for production workloads, to protect your AWS account and resources.

Cost

You are responsible for the cost of the AWS services used when deploying CP4I in your AWS account. For cost estimates, see the pricing pages for each AWS service you use.

Prerequisites

Before getting started, review the following prerequisites:

Installation steps

To deploy CP4I on ROSA, complete the following steps:

  1. From the AWS ROSA console, click Enable ROSA to activate the service on your AWS account (Figure 2).

    Enable ROSA on your AWS account

    Figure 2. Enable ROSA on your AWS account

  2. Create an AWS Cloud9 environment to run your CP4I installation. We used a t3.small instance type.
  3. When it comes up, close the Welcome tab and open a new Terminal tab to install the required packages:
    curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    unzip awscliv2.zip
    sudo ./aws/install
    wget https://mirror.openshift.com/pub/openshift-v4/clients/rosa/latest/rosa-linux.tar.gz
    sudo tar -xvzf rosa-linux.tar.gz -C /usr/local/bin/
    
    rosa download oc
    sudo tar -xvzf openshift-client-linux.tar.gz -C /usr/local/bin/
    
    sudo yum -y install jq gettext
  4. Ensure the ELB service-linked role exists in your AWS account:
    aws iam get-role --role-name "AWSServiceRoleForElasticLoadBalancing" || \
      aws iam create-service-linked-role --aws-service-name "elasticloadbalancing.amazonaws.com"
  5. Create an IAM policy named cp4i-installer-permissions with the following permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "autoscaling:*",
                    "cloudformation:*",
                    "cloudwatch:*",
                    "ec2:*",
                    "elasticfilesystem:*",
                    "elasticloadbalancing:*",
                    "events:*",
                    "iam:*",
                    "kms:*",
                    "logs:*",
                    "route53:*",
                    "s3:*",
                    "servicequotas:GetRequestedServiceQuotaChange",
                    "servicequotas:GetServiceQuota",
                    "servicequotas:ListServices",
                    "servicequotas:ListServiceQuotas",
                    "servicequotas:RequestServiceQuotaIncrease",
                    "sts:*",
                    "support:*",
                    "tag:*"
                ],
                "Resource": "*"
            }
        ]
    }
  6. Create an IAM role:
    1. Select AWS service and EC2, then click Next: Permissions.
    2. Select the cp4i-installer-permissions policy, and click Next.
    3. Name it cp4i-installer, and click Create role.
  7. From your AWS Cloud9 IDE, click the grey circle button on the top right, and select Manage EC2 Instance (Figure 3).

    Manage the AWS Cloud9 EC2 instance

    Figure 3. Manage the AWS Cloud9 EC2 instance

  8. On the Amazon EC2 console, select the AWS Cloud9 instance, then choose Actions / Security / Modify IAM Role.
  9. Choose cp4i-installer from the IAM Role drop down, and click Update IAM role (Figure 4).

    Attach the IAM role to your workspace

    Figure 4. Attach the IAM role to your workspace

  10. Update the IAM settings for your AWS Cloud9 workspace:
    aws cloud9 update-environment --environment-id $C9_PID --managed-credentials-action DISABLE
    rm -vf ${HOME}/.aws/credentials
  11. Configure the following environment variables:
    export ACCOUNT_ID=$(aws sts get-caller-identity --output text --query Account)
    export AWS_REGION=$(curl -s 169.254.169.254/latest/dynamic/instance-identity/document | jq -r '.region')
    export ROSA_CLUSTER_NAME=cp4iblog01
  12. Configure the aws cli default region:
    aws configure set default.region ${AWS_REGION}
  13. Navigate to the Red Hat Hybrid Cloud Console, and copy your OpenShift Cluster Manager API Token.
  14. Use the token and log in to your Red Hat account:
    rosa login --token=<your_openshift_api_token>
  15. Verify that your AWS account satisfies the quotas to deploy your cluster:
    rosa verify quota
  16. When deploying ROSA for the first time, create the account-wide roles:
    rosa create account-roles --mode auto --yes
  17. Create your ROSA cluster:
    rosa create cluster --cluster-name $ROSA_CLUSTER_NAME --sts \
      --multi-az \
      --region $AWS_REGION \
      --version 4.10.35 \
      --compute-machine-type m5.4xlarge \
      --compute-nodes 3 \
      --operator-roles-prefix cp4irosa \
      --mode auto --yes \
      --watch
  18. Once your cluster is ready, create a cluster-admin user (it takes approximately 5 minutes):
    rosa create admin --cluster=$ROSA_CLUSTER_NAME
  19. Log in to your cluster using the cluster-admin credentials. You can copy the command from the output of the previous step. For example:
    oc login https://<your_cluster_api_address>:6443 \
      --username cluster-admin \
      --password <your_cluster-admin_password>
  20. Create an IAM policy allowing ROSA to use Amazon EFS:
    cat <<EOF > $PWD/efs-policy.json
    {
      "Version": "2012-10-17",
      "Statement": [
     {
       "Effect": "Allow",
       "Action": [
         "elasticfilesystem:DescribeAccessPoints",
         "elasticfilesystem:DescribeFileSystems"
       ],
       "Resource": "*"
     },
     {
       "Effect": "Allow",
       "Action": [
         "elasticfilesystem:CreateAccessPoint"
       ],
       "Resource": "*",
       "Condition": {
         "StringLike": {
           "aws:RequestTag/efs.csi.aws.com/cluster": "true"
         }
       }
     },
     {
       "Effect": "Allow",
       "Action": "elasticfilesystem:DeleteAccessPoint",
       "Resource": "*",
       "Condition": {
         "StringEquals": {
           "aws:ResourceTag/efs.csi.aws.com/cluster": "true"
         }
       }
     }
      ]
    }
    EOF
    POLICY=$(aws iam create-policy --policy-name "${ROSA_CLUSTER_NAME}-cp4i-efs-csi" --policy-document file://$PWD/efs-policy.json --query 'Policy.Arn' --output text) || POLICY=$(aws iam list-policies --query "Policies[?PolicyName=='${ROSA_CLUSTER_NAME}-cp4i-efs-csi'].Arn" --output text)
  21. Create an IAM trust policy:
    export OIDC_PROVIDER=$(oc get authentication.config.openshift.io cluster -o json | jq -r .spec.serviceAccountIssuer| sed -e "s/^https:\/\///")
    cat <<EOF > $PWD/TrustPolicy.json
    {
      "Version": "2012-10-17",
      "Statement": [
     {
       "Effect": "Allow",
       "Principal": {
         "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
       },
       "Action": "sts:AssumeRoleWithWebIdentity",
       "Condition": {
         "StringEquals": {
           "${OIDC_PROVIDER}:sub": [
             "system:serviceaccount:openshift-cluster-csi-drivers:aws-efs-csi-driver-operator",
             "system:serviceaccount:openshift-cluster-csi-drivers:aws-efs-csi-driver-controller-sa"
           ]
         }
       }
     }
      ]
    }
    EOF
  22. Create an IAM role with the previously created policies:
    ROLE=$(aws iam create-role \
      --role-name "${ROSA_CLUSTER_NAME}-aws-efs-csi-operator" \
      --assume-role-policy-document file://$PWD/TrustPolicy.json \
      --query "Role.Arn" --output text)
    aws iam attach-role-policy \
      --role-name "${ROSA_CLUSTER_NAME}-aws-efs-csi-operator" \
      --policy-arn $POLICY
  23. Create an OpenShift secret to store the AWS access keys:
    cat <<EOF | oc apply -f -
    apiVersion: v1
    kind: Secret
    metadata:
      name: aws-efs-cloud-credentials
      namespace: openshift-cluster-csi-drivers
    stringData:
      credentials: |-
        [default]
        role_arn = $ROLE
        web_identity_token_file = /var/run/secrets/openshift/serviceaccount/token
    EOF
  24. Install the Amazon EFS CSI driver operator:
    cat <<EOF | oc create -f -
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      generateName: openshift-cluster-csi-drivers-
      namespace: openshift-cluster-csi-drivers
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      labels:
        operators.coreos.com/aws-efs-csi-driver-operator.openshift-cluster-csi-drivers: ""
      name: aws-efs-csi-driver-operator
      namespace: openshift-cluster-csi-drivers
    spec:
      channel: stable
      installPlanApproval: Automatic
      name: aws-efs-csi-driver-operator
      source: redhat-operators
      sourceNamespace: openshift-marketplace
    EOF
  25. Track the operator installation:
    watch oc get deployment aws-efs-csi-driver-operator \
     -n openshift-cluster-csi-drivers
  26. Install the AWS EFS CSI driver:
    cat <<EOF | oc apply -f -
    apiVersion: operator.openshift.io/v1
    kind: ClusterCSIDriver
    metadata:
      name: efs.csi.aws.com
    spec:
      managementState: Managed
    EOF
  27. Wait until the CSI driver is running:
    watch oc get daemonset aws-efs-csi-driver-node \
     -n openshift-cluster-csi-drivers
  28. Create a rule allowing inbound NFS traffic from your cluster’s VPC Classless Inter-Domain Routing (CIDR):
    NODE=$(oc get nodes --selector=node-role.kubernetes.io/worker -o jsonpath='{.items[0].metadata.name}')
    VPC_ID=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=$NODE" --query 'Reservations[*].Instances[*].{VpcId:VpcId}' | jq -r '.[0][0].VpcId')
    CIDR=$(aws ec2 describe-vpcs --filters "Name=vpc-id,Values=$VPC_ID" --query 'Vpcs[*].CidrBlock' | jq -r '.[0]')
    SG=$(aws ec2 describe-instances --filters "Name=private-dns-name,Values=$NODE" --query 'Reservations[*].Instances[*].{SecurityGroups:SecurityGroups}' | jq -r '.[0][0].SecurityGroups[0].GroupId')
    aws ec2 authorize-security-group-ingress \
      --group-id $SG \
      --protocol tcp \
      --port 2049 \
      --cidr $CIDR | jq .
  29. Create an Amazon EFS file system:
    EFS_FS_ID=$(aws efs create-file-system --performance-mode generalPurpose --encrypted --region ${AWS_REGION} --tags Key=Name,Value=ibm_cp4i_fs | jq -r '.FileSystemId')
    SUBNETS=($(aws ec2 describe-subnets --filters "Name=vpc-id,Values=${VPC_ID}" "Name=tag:Name,Values=*${ROSA_CLUSTER_NAME}*private*" | jq --raw-output '.Subnets[].SubnetId'))
    for subnet in ${SUBNETS[@]}; do
      aws efs create-mount-target \
        --file-system-id $EFS_FS_ID \
        --subnet-id $subnet \
        --security-groups $SG
    done
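    Mount target creation is asynchronous. Optionally, you can wait until every mount target reports the available state before moving on; a minimal check using the same variables as above:
    # Optional: wait until every EFS mount target reports the "available" state
    until [ -z "$(aws efs describe-mount-targets \
        --file-system-id $EFS_FS_ID \
        --query "MountTargets[?LifeCycleState!='available']" \
        --output text)" ]; do
      echo "Waiting for EFS mount targets to become available..."
      sleep 10
    done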
  30. Create an Amazon EFS storage class:
    cat <<EOF | oc apply -f -
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: efs-sc
    provisioner: efs.csi.aws.com
    parameters:
      provisioningMode: efs-ap
      fileSystemId: $EFS_FS_ID
      directoryPerms: "750"
      gidRangeStart: "1000"
      gidRangeEnd: "2000"
      basePath: "/ibm_cp4i_rosa_fs"
    EOF
  31. Add the IBM catalog sources to OpenShift:
    cat <<EOF | oc apply -f -
    apiVersion: operators.coreos.com/v1alpha1
    kind: CatalogSource
    metadata:
      name: ibm-operator-catalog
      namespace: openshift-marketplace
    spec:
      displayName: IBM Operator Catalog
      image: 'icr.io/cpopen/ibm-operator-catalog:latest'
      publisher: IBM
      sourceType: grpc
      updateStrategy:
        registryPoll:
          interval: 45m
    EOF
  32. Get the console URL of your ROSA cluster:
    rosa describe cluster --cluster=$ROSA_CLUSTER_NAME | grep Console
  33. Copy your entitlement key from the IBM container software library.
  34. Log in to your ROSA web console, navigate to Workloads > Secrets.
  35. Set the project to openshift-config; locate and click pull-secret (Figure 5).

    Edit the pull-secret entry

    Figure 5. Edit the pull-secret entry

  36. Expand Actions and click Edit Secret.
  37. Scroll to the end of the page, and click Add credentials (Figure 6):
    1. Registry server address: cp.icr.io
    2. Username field: cp
    3. Password: your_ibm_entitlement_key

      Configure your IBM entitlement key secret

      Figure 6. Configure your IBM entitlement key secret

       

  38. Next, navigate to Operators > OperatorHub. On the OperatorHub page, use the search filter to locate the tile for the operators you plan to install: IBM Cloud Pak for Integration and IBM MQ. Keep all values as default for both installations (Figure 7). For example, IBM Cloud Pak for Integration:

    Figure 7. Install CP4I operators

  39. Create a namespace for each CP4I workload that will be deployed. In this blog, we create one for the Platform UI and one for IBM MQ:
    oc new-project integration
    oc new-project ibm-mq
  40. Review the IBM documentation to select the appropriate license for your deployment.
  41. Deploy the platform UI:
    cat <<EOF | oc apply -f -
    apiVersion: integration.ibm.com/v1beta1
    kind: PlatformNavigator
    metadata:
      name: integration-quickstart
      namespace: integration
    spec:
      license:
        accept: true
        license: L-RJON-CD3JKX
      mqDashboard: true
      replicas: 3  # Number of replica pods, 1 by default, 3 for HA
      storage:
        class: efs-sc
      version: 2022.2.1
    EOF
  42. Track the deployment status, which takes approximately 40 minutes:
    watch oc get platformnavigator -n integration
  43. Create an IBM MQ queue manager instance:
    cat <<EOF | oc apply -f -
    apiVersion: mq.ibm.com/v1beta1
    kind: QueueManager
    metadata:
      name: qmgr-inst01
      namespace: ibm-mq
    spec:
      license:
        accept: true
        license: L-RJON-CD3JKX
        use: NonProduction
      web:
        enabled: true
      template:
        pod:
          containers:
            - env:
                - name: MQSNOAUT
                  value: 'yes'
              name: qmgr
      queueManager:
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 500m
        availability:
          type: SingleInstance
        storage:
          queueManager:
            type: persistent-claim
            class: gp3
            deleteClaim: true
            size: 2Gi
          defaultClass: gp3
        name: CP4IQMGR
      version: 9.3.0.1-r1
    EOF
  44. Check the status of the queue manager:
    oc describe queuemanager qmgr-inst01 -n ibm-mq
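    Optionally, you can poll until the queue manager reports a Running phase. The status.phase field name below is an assumption about the QueueManager custom resource; verify it with oc explain queuemanager.status in your cluster:
    # Poll the queue manager phase (field name assumed; verify with "oc explain queuemanager.status")
    watch oc get queuemanager qmgr-inst01 -n ibm-mq \
      -o jsonpath='{.status.phase}{"\n"}'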

Validation steps

Let’s verify our installation!

  1. Run the commands to retrieve the CP4I URL and administrator password:
    oc describe platformnavigator integration-quickstart \
      -n integration | grep "^.*UI Endpoint" | xargs | cut -d ' ' -f3
    oc get secret platform-auth-idp-credentials \
      -n ibm-common-services -o jsonpath='{.data.admin_password}' \
      | base64 -d && echo
  2. Using the information from the previous step, access your CP4I web console.
  3. Select the option to authenticate with the IBM-provided credentials (admin only) and log in with your admin password.
  4. From the CP4I console, you can manage users and groups allowed to access the platform, install new operators, and view the components that are installed.
  5. Click qmgr-inst01 in the Messaging widget to bring up your IBM MQ setup (Figure 8).

    CP4I console features

    Figure 8. CP4I console features

  6. In the Welcome to IBM MQ panel, click the CP4IQMGR queue manager. This shows the state, resources, and allows you to configure your instances (Figure 9).

    Queue manager details

    Figure 9. Queue manager details

Congratulations! You have successfully deployed IBM CP4I on Red Hat OpenShift on AWS.

Post installation

Review the following topics when you are installing CP4I in production environments:

Cleanup

Connect to your Cloud9 workspace, and run the following steps to delete the CP4I installation, including ROSA. This avoids incurring future charges on your AWS account:

EFS_EF_ID=$(aws efs describe-file-systems \
  --query 'FileSystems[?Name==`ibm_cp4i_fs`].FileSystemId' \
  --output text)
MOUNT_TARGETS=$(aws efs describe-mount-targets --file-system-id $EFS_EF_ID --query 'MountTargets[*].MountTargetId' --output text)
for mt in ${MOUNT_TARGETS[@]}; do
  aws efs delete-mount-target --mount-target-id $mt
done
aws efs delete-file-system --file-system-id $EFS_EF_ID
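
Mount target deletion is asynchronous, so the delete-file-system call can fail with a dependency error if it runs immediately after the loop. If that happens, a small wait loop (reusing the variable above) lets the deletions finish before retrying:

# Wait until all mount targets are gone, then delete the file system
until [ "$(aws efs describe-mount-targets \
    --file-system-id $EFS_EF_ID \
    --query 'length(MountTargets)' \
    --output text)" = "0" ]; do
  echo "Waiting for EFS mount targets to be deleted..."
  sleep 10
done
aws efs delete-file-system --file-system-id $EFS_EF_ID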

rosa delete cluster -c $ROSA_CLUSTER_NAME --yes --region $AWS_REGION

To monitor your cluster uninstallation logs, run:

rosa logs uninstall -c $ROSA_CLUSTER_NAME --watch

Once the cluster is uninstalled, remove the operator-roles and oidc-provider, as indicated in the output of the rosa delete command. For example:

rosa delete operator-roles -c 1vepskr2ms88ki76k870uflun2tjpvfs --mode auto --yes
rosa delete oidc-provider -c 1vepskr2ms88ki76k870uflun2tjpvfs --mode auto --yes

Conclusion

This post explored how to deploy CP4I on ROSA. We also demonstrated how customers can take full advantage of a managed OpenShift service, focusing on further modernizing application stacks by using AWS managed services (like ROSA) for their application deployments.

If you are interested in learning more about ROSA, take part in the AWS ROSA Immersion Workshop.

Check out the blog on Running IBM MQ on AWS using High-performance Amazon FSx for NetApp ONTAP to learn how to use Amazon FSx for NetApp ONTAP for distributed storage and high availability with IBM MQ.

For more information and getting started with IBM Cloud Pak deployments, visit the AWS Marketplace for new offerings.

Further reading

Deploying IBM Cloud Pak for Data on Red Hat OpenShift Service on AWS

Post Syndicated from Eduardo Monich Fronza original https://aws.amazon.com/blogs/architecture/deploying-ibm-cloud-pak-for-data-on-red-hat-openshift-service-on-aws/

Amazon Web Services (AWS) customers who are looking for a more intuitive way to deploy and use IBM Cloud Pak for Data (CP4D) on the AWS Cloud, can now use the Red Hat OpenShift Service on AWS (ROSA).

ROSA is a fully managed service, jointly supported by AWS and Red Hat. It is managed by Red Hat Site Reliability Engineers and provides a pay-as-you-go pricing model, as well as a unified billing experience on AWS.

With this, customers do not manage the lifecycle of Red Hat OpenShift Container Platform clusters. Instead, they are free to focus on developing new solutions and innovating faster, using IBM’s integrated data and artificial intelligence platform on AWS, to differentiate their business and meet their ever-changing enterprise needs.

CP4D can also be deployed from the AWS Marketplace with self-managed OpenShift clusters. This is ideal for customers with requirements, like Red Hat OpenShift Data Foundation software defined storage, or who prefer to manage their OpenShift clusters.

In this post, we discuss how to deploy CP4D on ROSA using IBM-provided Terraform automation.

Cloud Pak for data architecture

Here, we install CP4D in a highly available ROSA cluster across three availability zones (AZs), with three master nodes, three infrastructure nodes, and three worker nodes.

Review the AWS Regions and Availability Zones documentation and the regions where ROSA is available to choose the best region for your deployment.

This is a public ROSA cluster, accessible from the internet via port 443. When deploying CP4D in your AWS account, consider using a private cluster (Figure 1).

IBM Cloud Pak for Data on ROSA

Figure 1. IBM Cloud Pak for Data on ROSA

We are using Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS) for the cluster’s persistent storage. Review the IBM documentation for information about supported storage options.

Review the AWS prerequisites for ROSA, and follow the Security best practices in IAM documentation to protect your AWS account before deploying CP4D.

Cost

The costs associated with using AWS services when deploying CP4D in your AWS account can be estimated on the pricing pages for the services used.

Prerequisites

This blog assumes familiarity with: CP4D, Terraform, Amazon Elastic Compute Cloud (Amazon EC2), Amazon EBS, Amazon EFS, Amazon Virtual Private Cloud, and AWS Identity and Access Management (IAM).

You will need the following before getting started:

Installation steps

Complete the following steps to deploy CP4D on ROSA:

  1. First, enable ROSA on the AWS account. From the AWS ROSA console, click on Enable ROSA, as in Figure 2.

    Enabling ROSA on your AWS account

    Figure 2. Enabling ROSA on your AWS account

  2. Click on Get started. You are redirected to the Red Hat website, where you can register and obtain a Red Hat ROSA token.
  3. Navigate to the AWS IAM console. Create an IAM policy named cp4d-installer-policy and add the following permissions:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "autoscaling:*",
                    "cloudformation:*",
                    "cloudwatch:*",
                    "ec2:*",
                    "elasticfilesystem:*",
                    "elasticloadbalancing:*",
                    "events:*",
                    "iam:*",
                    "kms:*",
                    "logs:*",
                    "route53:*",
                    "s3:*",
                    "servicequotas:GetRequestedServiceQuotaChange",
                    "servicequotas:GetServiceQuota",
                    "servicequotas:ListServices",
                    "servicequotas:ListServiceQuotas",
                    "servicequotas:RequestServiceQuotaIncrease",
                    "sts:*",
                    "support:*",
                    "tag:*"
                ],
                "Resource": "*"
            }
        ]
    }
  4. Next, let’s create an IAM user from the AWS IAM console, which will be used for the CP4D installation:
    a. Specify a name, like ibm-cp4d-bastion.
    b. Set the credential type to Access key – Programmatic access.
    c. Attach the IAM policy created in Step 3.
    d. Download the .csv credentials file.
  5. From the Amazon EC2 console, create a new EC2 key pair and download the private key.
  6. Launch an Amazon EC2 instance from which the CP4D installer is launched:
    a. Specify a name, like ibm-cp4d-bastion.
    b. Select an instance type, such as t3.medium.
    c. Select the EC2 key pair created in Step 5.
    d. Select the Red Hat Enterprise Linux 8 (HVM), SSD Volume Type for 64-bit (x86) Amazon Machine Image.
    e. Create a security group with an inbound rule that allows SSH connections (TCP port 22). Restrict access to your own IP address or an IP range from your organization.
    f. Leave all other values as default.
  7. Connect to the EC2 instance via SSH using its public IP address. The remaining installation steps will be initiated from it.
  8. Install the required packages:
    $ sudo yum update -y
    $ sudo yum install git unzip vim wget httpd-tools python38 -y
    
    $ sudo ln -s /usr/bin/python3 /usr/bin/python
    $ sudo ln -s /usr/bin/pip3 /usr/bin/pip
    $ sudo pip install pyyaml
    
    $ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
    $ unzip awscliv2.zip
    $ sudo ./aws/install
    
    $ wget "https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64"
    $ chmod +x jq-linux64
    $ sudo mv jq-linux64 /usr/local/bin/jq
    
    $ wget "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.10.15/openshift-client-linux-4.10.15.tar.gz"
    $ tar -xvf openshift-client-linux-4.10.15.tar.gz
    $ chmod u+x oc kubectl
    $ sudo mv oc /usr/local/bin
    $ sudo mv kubectl /usr/local/bin
    
    $ sudo yum install -y yum-utils
    $ sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
    $ sudo yum -y install terraform
    
    $ sudo subscription-manager repos --enable=rhel-7-server-extras-rpms
    $ sudo yum install -y podman
  9. Configure the AWS CLI with the IAM user credentials from Step 4 and the desired AWS region to install CP4D:
    $ aws configure
    
    AWS Access Key ID [None]: AK****************7Q
    AWS Secret Access Key [None]: vb************************************Fb
    Default region name [None]: eu-west-1
    Default output format [None]: json
  10. Clone the following IBM GitHub repository and change to its Terraform directory:
    $ git clone https://github.com/IBM/cp4d-deployment.git ~/cp4d-deployment

    $ cd ~/cp4d-deployment/managed-openshift/aws/terraform/
  11. For the purpose of this post, we enabled the Watson Machine Learning, Watson Studio, and Db2 OLTP services on CP4D. Use the example in this step to create a Terraform variables file (for example, cp4d-rosa-3az-new-vpc.tfvars, referenced in the next step) for the CP4D installation. Enable the CP4D services required for your use case:
    region			= "eu-west-1"
    tenancy			= "default"
    access_key_id 		= "your_AWS_Access_key_id"
    secret_access_key 	= "your_AWS_Secret_access_key"
    
    new_or_existing_vpc_subnet	= "new"
    az				= "multi_zone"
    availability_zone1		= "eu-west-1a"
    availability_zone2 		= "eu-west-1b"
    availability_zone3 		= "eu-west-1c"
    
    vpc_cidr 		= "10.0.0.0/16"
    public_subnet_cidr1 	= "10.0.0.0/20"
    public_subnet_cidr2 	= "10.0.16.0/20"
    public_subnet_cidr3 	= "10.0.32.0/20"
    private_subnet_cidr1 	= "10.0.128.0/20"
    private_subnet_cidr2 	= "10.0.144.0/20"
    private_subnet_cidr3 	= "10.0.160.0/20"
    
    openshift_version 		= "4.10.15"
    cluster_name 			= "your_ROSA_cluster_name"
    rosa_token 			= "your_ROSA_token"
    worker_machine_type 		= "m5.4xlarge"
    worker_machine_count 		= 3
    private_cluster 			= false
    cluster_network_cidr 		= "10.128.0.0/14"
    cluster_network_host_prefix 	= 23
    service_network_cidr 		= "172.30.0.0/16"
    storage_option 			= "efs-ebs" 
    ocs 				= { "enable" : "false", "ocs_instance_type" : "m5.4xlarge" } 
    efs 				= { "enable" : "true" }
    
    accept_cpd_license 		= "accept"
    cpd_external_registry 		= "cp.icr.io"
    cpd_external_username 	= "cp"
    cpd_api_key 			= "your_IBM_API_Key"
    cpd_version 			= "4.5.0"
    cpd_namespace 		= "zen"
    cpd_platform 			= "yes"
    
    watson_knowledge_catalog 	= "no"
    data_virtualization 		= "no"
    analytics_engine 		= "no"
    watson_studio 			= "yes"
    watson_machine_learning 	= "yes"
    watson_ai_openscale 		= "no"
    spss_modeler 			= "no"
    cognos_dashboard_embedded 	= "no"
    datastage 			= "no"
    db2_warehouse 		= "no"
    db2_oltp 			= "yes"
    cognos_analytics 		= "no"
    master_data_management 	= "no"
    decision_optimization 		= "no"
    bigsql 				= "no"
    planning_analytics 		= "no"
    db2_aaservice 			= "no"
    watson_assistant 		= "no"
    watson_discovery 		= "no"
    openpages 			= "no"
    data_management_console 	= "no"
  12. Save your file, and launch the commands below to install CP4D and track progress:
    $ terraform init -input=false
    $ terraform apply --var-file=cp4d-rosa-3az-new-vpc.tfvars \
       -input=false | tee terraform.log
  13. The installation runs for 4 or more hours. Once installation is complete, the output includes (as in Figure 3):
    a. Commands to get the CP4D URL and the admin user password
    b. CP4D admin user
    c. Login command for the ROSA cluster
CP4D installation output

Figure 3. CP4D installation output

Validation steps

Let’s verify the installation!

  1. Log in to your ROSA cluster using your cluster-admin credentials.
    $ oc login https://api.cp4dblog.17e7.p1.openshiftapps.com:6443 --username cluster-admin --password *****-*****-*****-*****
  2. Initiate the following command to get the cluster’s console URL (Figure 4):
    $ oc whoami --show-console

    ROSA console URL

    Figure 4. ROSA console URL

  3. Run the commands in this step to retrieve the CP4D URL and admin user password (Figure 5).
    $ oc extract secret/admin-user-details \
      --keys=initial_admin_password --to=- -n zen
    $ oc get routes -n zen

    Retrieve the CP4D admin user password and URL

    Figure 5. Retrieve the CP4D admin user password and URL

  4. Initiate the following commands to view the CP4D workloads running in your ROSA cluster (Figure 6):
    $ oc get pods -n zen
    $ oc get deployments -n zen
    $ oc get svc -n zen 
    $ oc get pods -n ibm-common-services 
    $ oc get deployments -n ibm-common-services
    $ oc get svc -n ibm-common-services
    $ oc get subs -n ibm-common-services

    Checking the CP4D pods running on ROSA

    Figure 6. Checking the CP4D pods running on ROSA

  5. Log in to your CP4D web console using its URL and your admin password.
  6. Expand the navigation menu. Navigate to Services > Services catalog for the available services (Figure 7).

    Navigating to the CP4D services catalog

    Figure 7. Navigating to the CP4D services catalog

  7. Notice that the services set as “enabled” correspond with your Terraform definitions (Figure 8).

    Services enabled in your CP4D catalog

    Figure 8. Services enabled in your CP4D catalog

Congratulations! You have successfully deployed IBM CP4D on Red Hat OpenShift on AWS.

Post installation

Refer to the IBM documentation on setting up services, if you need to enable additional services on CP4D.

When installing CP4D in production environments, review the IBM documentation on securing your environment. The Red Hat documentation on setting up identity providers for ROSA is also informative, and you can consider enabling auto scaling for your cluster.

Cleanup

Connect to your bastion host, and run the following steps to delete the CP4D installation, including ROSA. This step avoids incurring future charges on your AWS account.

$ cd ~/cp4d-deployment/managed-openshift/aws/terraform/
$ terraform destroy -var-file="cp4d-rosa-3az-new-vpc.tfvars"

If you’ve experienced any failures during the CP4D installation, run these next steps:

$ cd ~/cp4d-deployment/managed-openshift/aws/terraform
$ sudo cp installer-files/rosa /usr/local/bin
$ sudo chmod 755 /usr/local/bin/rosa
$ Cluster_Name=`rosa list clusters -o yaml | grep -w "name:" | cut -d ':' -f2 | xargs`
$ rosa remove cluster --cluster=${Cluster_Name}
$ rosa logs uninstall -c ${Cluster_Name} --watch
$ rosa init --delete-stack
$ terraform destroy -var-file="cp4d-rosa-3az-new-vpc.tfvars"

Conclusion

In summary, we explored how customers can take advantage of a fully managed OpenShift service on AWS to run IBM CP4D. With this implementation, customers can focus on what is important to them, their workloads, and their customers, and less on the day-to-day operations of managing OpenShift to run CP4D.

Check out the IBM Cloud Pak for Data Simplifies and Automates How You Turn Data into Insights blog to learn how to use CP4D on AWS to unlock the value of your data.

Additional resources

Using AWS Backup and Oracle RMAN for backup/restore of Oracle databases on Amazon EC2: Part 2

Post Syndicated from Jeevan Shetty original https://aws.amazon.com/blogs/architecture/using-aws-backup-and-oracle-rman-for-backup-restore-of-oracle-databases-on-amazon-ec2-part-2/

Customers running Oracle databases on Amazon Elastic Compute Cloud (Amazon EC2) often take database and schema backups using Oracle native tools like Data Pump and Recovery Manager (RMAN) to satisfy data protection, disaster recovery (DR), and compliance requirements. A priority is to reduce backup time as the data grows exponentially and recover sooner in case of failure/disaster.

In Part 1 of this two-part series, we explain how to use AWS Backup and an Amazon Simple Storage Service (Amazon S3) bucket to perform the backup and restore of an Oracle database on Amazon EC2.

In Part 2, we provide a mechanism to use AWS Backup to create a full backup of the EC2 instance, including the OS image, Oracle binaries, logs, and data files. The mechanism also uses Oracle RMAN to perform archived redo log backup to Amazon Elastic File System (Amazon EFS). Then, we demonstrate the steps to restore a database to a specific point-in-time using AWS Backup and Oracle RMAN.

Solution overview

Figure 1 demonstrates the workflow:

  1. Oracle database running on Amazon EC2.
  2. AWS Backup service to backup EC2 instance at regular intervals.
  3. Amazon EFS for storing Oracle RMAN archive log backups.
Oracle Database in Amazon EC2 using AWS Backup and EFS for backup and restore

Figure 1. Oracle Database in Amazon EC2 using AWS Backup and EFS for backup and restore

Prerequisites

  1. An AWS account
  2. Oracle database and AWS CLI in an EC2 instance
  3. Access to configure AWS Backup
  4. Access to configure EFS to store the Oracle RMAN archive log backups

1. Configure AWS Backup

Configure AWS Backup as detailed in Step 1 of Part 1.

Oracle RMAN archive log backup

While AWS Backup is now creating a daily backup of the EC2 instance, we also want to make sure we back up the archived log files to a protected location. This will let us do point-in-time restores and restore to more recent times than just the last daily EC2 backup. Below, we provide the steps to back up archive logs to Amazon EFS using RMAN.

Backup/restore archive logs to/from Amazon EFS

Backing up the Oracle archive logs is an important part of the process. In this section, we describe how you can back up your Oracle archive logs to Amazon EFS. One advantage of this option (compared with using Oracle Secure Backup [detailed in Part 1 of this series]) is that it does not require any additional Oracle licensing.

2. Configure Amazon EFS

a. Create an Amazon EFS file system that will be used to store the Oracle RMAN archive log backups. Figure 2 details the steps involved in creating the file system. In this example, the file system ID fs-0123abcdef012345 is created and used to store the RMAN archive log backups. If you prefer the AWS CLI, a sketch follows Figure 2.

Figure 2. Configure Amazon EFS which is used to store Oracle RMAN archive log backups
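
A minimal AWS CLI sketch of this step (the subnet and security group IDs are placeholders for values from your own VPC, and the security group must allow NFS traffic on TCP port 2049 from the database instance):

# Create the file system used for the RMAN archive log backups
EFS_ID=$(aws efs create-file-system \
  --encrypted \
  --performance-mode generalPurpose \
  --tags Key=Name,Value=rman-efs \
  --query 'FileSystemId' --output text)

# Create a mount target in the subnet of the database instance
# (subnet-xxxxxxxx and sg-xxxxxxxx are placeholders)
aws efs create-mount-target \
  --file-system-id $EFS_ID \
  --subnet-id subnet-xxxxxxxx \
  --security-groups sg-xxxxxxxx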

b. Install the Amazon EFS client by following the instructions for a RHEL EC2 instance. Note: the next steps were tested on RHEL 7.9.

sudo yum -y install git
sudo yum -y install rpm-build
git clone https://github.com/aws/efs-utils
cd efs-utils/
sudo yum -y install make
sudo make rpm
sudo yum -y install ./build/amazon-efs-utils*rpm

c. Mount the EFS file system on your EC2 instance (if the command asks you to upgrade stunnel, refer to Upgrading stunnel). Ensure that the attached EC2 instance profile has the necessary policies to access EFS. The mount point /rman and the file system ID fs-0123abcdef012345 are examples.

sudo mkdir /rman
sudo mount -t efs -o tls,iam fs-0123abcdef012345 /rman

d. To mount the EFS file system automatically when the EC2 instance reboots, add an entry in /etc/fstab. This example is for a RHEL EC2 instance:

fs-0123abcdef012345:/      /rman        efs     _netdev,tls,iam        0 0

3. Configure RMAN backup to Amazon EFS

With Amazon EFS mounted on the EC2 instance, we can configure Oracle RMAN archive log backups to EFS. In the commands below, oratst is used as an example ORACLE_SID.

a. Configure the RMAN repository to take control file backups to Amazon EFS automatically.

CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '/rman/ctrl-D_%d_%F';
CONFIGURE CONTROLFILE AUTOBACKUP ON;

b. Create a script (for example, rman_archive.sh) with the commands below and schedule it using crontab (example entry: */5 * * * * rman_archive.sh) to run every 5 minutes. This ensures that Oracle archive logs are backed up to Amazon EFS (/rman) frequently, providing a recovery point objective (RPO) of 5 minutes.

dt=`date +%Y%m%d_%H%M%S`

rman target / log=/rman/rman_arch_bkup_oratst_${dt}.log <<EOF

RUN
{
    allocate channel c1_efs device type disk format '/rman/arch-D-%d_%T_s%s_p%p' MAXPIECESIZE 10G;
    BACKUP ARCHIVELOG ALL delete all input;
    release channel c1_efs;
}

EOF

4. Perform database point-in-time recovery

In the event of a database crash/corruption, we can use the AWS Backup service and the Oracle RMAN archive log backups to recover the database to a specific point-in-time.

a. Typically, you would pick the most recent recovery point completed before the time to which you wish to recover. Using AWS Backup, identify the recovery point ID to restore by following the steps from Restoring an Amazon EC2 instance. Note: when following the steps, be sure to set the “User data” settings as described in item b below.

After the EBS volumes are created from the snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your Amazon EBS volume before your attached instance can start accessing the volume. Amazon EBS Snapshots implement lazy loading, so that you can begin using them right away.

b. Ensure that the database does not start automatically after restoring the EC2 instance by renaming /etc/oratab. Use the command below in the “User data” section while restoring the EC2 instance. After database recovery, we can rename the file back to /etc/oratab.

#!/usr/bin/sh
sudo su - 
mv /etc/oratab /etc/oratab_bk

c. Log in to the EC2 instance once it is up, and execute the RMAN recovery commands below. Identify the DBID from the RMAN logs saved in EFS. The commands below use the database oratst as an example.

rman target /

RMAN> startup nomount

RMAN> set dbid DBID


# Below command is to restore the controlfile from autobackup

RMAN> RUN
{
    set controlfile autobackup format for device type disk to '/rman/ctrl-D_%d_%F';
    RESTORE CONTROLFILE FROM AUTOBACKUP;
    alter database mount;
}


#Identify the recovery point (sequence_number) by listing the backups available in catalog.

RMAN> list backup;

In Figure 3, the most recent archive log backed up is 460, so you can use this sequence number in the next set of RMAN commands.

RMAN> RUN
{
    allocate channel c1_efs device type disk format '/rman/arch-D-%d_%T_s%s_p%p';    
    recover database until sequence sequence_number;
    ALTER DATABASE OPEN RESETLOGS;
    release channel c1_efs;
}

Sample output of Oracle RMAN “list backup” command

Figure 3. Sample output of Oracle RMAN “list backup” command

d. To avoid performance issues due to lazy loading, after the database is open you can run the command below to force a faster restoration of the blocks from the S3 bucket to the EBS volumes (this example allocates two channels and validates the entire database).

RMAN> RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  VALIDATE database section size 1200M;
}

e. This completes the recovery of the database, and we can let the database auto start by renaming the file back to /etc/oratab.

mv /etc/oratab_bk /etc/oratab

5. Backup retention

Ensure that the AWS Backup lifecycle policy matches the Oracle archive log backup retention. Also, follow the documentation to configure Oracle backup retention and delete expired backups. Below are sample commands for Oracle backup retention.

CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 31 DAYS; 

RMAN> RUN
{
    allocate channel c1_efs device type disk format '/rman/arch-D-%d_%T_s%s_p%p';

    crosscheck backup;
    delete noprompt obsolete;
    delete noprompt expired backup;
    
    release channel c1_efs;
}

Cleanup

Follow the instructions below to remove or clean up the setup:

  1. Delete the backup plan created in Step 1.
  2. Remove the cron entry from the EC2 instance configured in Step 3b.
  3. Delete the EFS that was created in Step 2 to store Oracle RMAN archive log backups.

Conclusion

In this post, we demonstrated the use of AWS Backup for EC2 snapshots and Amazon EFS as storage for Oracle RMAN archive log backups. With this backup strategy, an Oracle database running on EC2 can be restored and recovered to a point-in-time faster than with Oracle native backup and recovery strategies. Also, by using EFS for Oracle RMAN archive log backups, we can avoid the additional licensing required to use Oracle Secure Backup, explained in Part 1. You can leverage this solution to facilitate restoring copies of your production database for development or testing purposes and to recover from a user error that removes data or corrupts existing data.

To learn more about AWS Backup, refer to the AWS Backup Documentation.

Using AWS Backup and Oracle RMAN for backup/restore of Oracle databases on Amazon EC2: Part 1

Post Syndicated from Jeevan Shetty original https://aws.amazon.com/blogs/architecture/using-aws-backup-and-oracle-rman-for-backup-restore-of-oracle-databases-on-amazon-ec2-part-1/

Customers running Oracle databases on Amazon Elastic Compute Cloud (Amazon EC2) often take database and schema backups using Oracle native tools, like Data Pump and Recovery Manager (RMAN), to satisfy data protection, disaster recovery (DR), and compliance requirements. A priority is to reduce backup time as the data grows exponentially and recover sooner in case of failure/disaster.

In situations where an RMAN backup is used as a DR solution, using AWS Backup to back up the file system and RMAN to back up the archive logs is an efficient method to perform Oracle database point-in-time recovery in the event of a disaster.

Sample use cases:

  1. Quickly build a copy of production database to test bug fixes or for a tuning exercise.
  2. Recover from a user error that removes data or corrupts existing data.
  3. A complete database recovery after a media failure.

There are two options to backup the archive logs using RMAN:

  1. Using Oracle Secure Backup (OSB) and an Amazon Simple Storage Service (Amazon S3) bucket as the storage for archive logs
  2. Using Amazon Elastic File System (Amazon EFS) as the storage for archive logs

In Part 1 of this two-part series, we provide a mechanism to use AWS Backup to create a full backup of the EC2 instance, including the OS image, Oracle binaries, logs, and data files. In this post, we use Oracle RMAN to perform archived redo log backups to an Amazon S3 bucket. Then, we demonstrate the steps to restore a database to a specific point-in-time using AWS Backup and Oracle RMAN.

Solution overview

Figure 1 demonstrates the workflow:

  1. Oracle database on Amazon EC2 configured with Oracle Secure Backup (OSB)
  2. AWS Backup service to backup EC2 instance at regular intervals.
  3. AWS Identity and Access Management (IAM) role for EC2 instance that grants permission to write archive log backups to Amazon S3
  4. S3 bucket for storing Oracle RMAN archive log backups
Figure 1. Oracle Database in Amazon EC2 using AWS Backup and S3 for backup and restore

Figure 1. Oracle Database in Amazon EC2 using AWS Backup and S3 for backup and restore

Prerequisites

For this solution, the following prerequisites are required:

  1. An AWS account
  2. Oracle database and AWS CLI in an EC2 instance
  3. Access to configure AWS Backup
  4. Access to an S3 bucket to store the RMAN archive log backups

1. Configure AWS Backup

You can choose AWS Backup to schedule daily backups of the EC2 instance. AWS Backup efficiently stores your periodic backups using backup plans. Only the first EBS snapshot performs a full copy from Amazon Elastic Block Store (Amazon EBS) to Amazon S3. All subsequent snapshots are incremental snapshots, copying just the changed blocks from Amazon EBS to Amazon S3, thus reducing backup duration and storage costs. Oracle supports Storage Snapshot Optimization, which takes third-party snapshots of the database without placing the database in backup mode. By default, AWS Backup now creates crash-consistent backups of Amazon EBS volumes that are attached to an EC2 instance. Customers no longer have to stop their instance or coordinate between multiple Amazon EBS volumes attached to the same EC2 instance to ensure crash-consistency of their application state.

You can create daily scheduled backup of EC2 instances. Figures 2, 3, and 4 are sample screenshots of the backup plan, associating an EC2 instance with the backup plan.

Configure backup rule using AWS Backup

Figure 2. Configure backup rule using AWS Backup

Select EC2 instance containing Oracle Database for backup

Figure 3. Select EC2 instance containing Oracle Database for backup

Summary screen showing the backup rule and resources managed by AWS Backup

Figure 4. Summary screen showing the backup rule and resources managed by AWS Backup

Oracle RMAN archive log backup

While AWS Backup is now creating a daily backup of the EC2 instance, we also want to make sure we back up the archived log files to a protected location. This lets us do point-in-time restores and restore to more recent times than just the last daily EC2 backup. Here, we provide the steps to back up archive logs to an S3 bucket using RMAN.

Backup/restore archive logs to/from Amazon S3 using OSB

Backing up the Oracle archive logs is an important part of the process. In this section, we describe how you can back up your Oracle archive logs to Amazon S3 using OSB. Note: OSB is a separately licensed product from Oracle Corporation, so you will need to be properly licensed for OSB if you use this approach.

2. Setup S3 bucket and IAM role

Oracle archive log backups can be scheduled using a cron script to run at regular intervals (for example, every 15 minutes). These backups are stored in an S3 bucket.

a. Create an S3 bucket with a lifecycle policy to transition the objects to S3 Standard-Infrequent Access (a CLI sketch follows the policy below).
b. Attach the following policy to the IAM role of the EC2 instance containing the Oracle database, or create an IAM role (ec2access) with the following policy and attach it to the EC2 instance. Update bucket-name with the bucket created in the previous step.


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3BucketAccess",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObjectAcl",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name",
                "arn:aws:s3:::bucket-name/*"
            ]
        }
    ]
}
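
For step 2a, the bucket and lifecycle rule can also be created from the AWS CLI. A minimal sketch (bucket-name, the Region, and the 30-day transition are example values to adjust):

# Create the bucket for the RMAN archive log backups (name and Region are examples)
aws s3api create-bucket \
  --bucket bucket-name \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2

# Transition backup objects to S3 Standard-IA after 30 days (example value)
aws s3api put-bucket-lifecycle-configuration \
  --bucket bucket-name \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "rman-archive-to-standard-ia",
        "Status": "Enabled",
        "Filter": { "Prefix": "" },
        "Transitions": [ { "Days": 30, "StorageClass": "STANDARD_IA" } ]
      }
    ]
  }'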

3. Setup OSB

After we have configured the backup of the EC2 instance using AWS Backup, we set up OSB in the EC2 instance. These steps show how to configure OSB.

a. Verify the hardware and software prerequisites for the OSB Cloud Module.
b. Log in to the EC2 instance with the user ID owning the Oracle binaries.
c. Download the Amazon S3 backup installer file (osbws_install.zip).
d. Create the Oracle wallet directory.

mkdir $ORACLE_HOME/dbs/osbws_wallet

e. Create a file (osbws.sh) in the EC2 instance with the following command. Update the IAM role with the one created/updated in Step 2b.

java -jar osbws_install.jar -IAMRole ec2access -walletDir $ORACLE_HOME/dbs/osbws_wallet -libDir $ORACLE_HOME/lib/

f. Change permission and run the file.

chmod 700 osbws.sh
./osbws.sh

Sample output: AWS credentials are valid.
Oracle Secure Backup Web Service wallet created in directory /u01/app/oracle/product/19.3.0.0/db_1/dbs/osbws_wallet.
Oracle Secure Backup Web Service initialization file /u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora created.
Downloading Oracle Secure Backup Web Service Software Library from file osbws_linux64.zip.
Download complete.

g. Set ORACLE_SID by executing the command below:

. oraenv

h. Running the osbws.sh script installs the OSB libraries and creates a file called osbws<ORACLE_SID>.ora.
i. Add/modify the parameters below with the S3 bucket (bucket-name) and Region (for example, us-west-2) created in Step 2a.

OSB_WS_HOST=http://s3.us-west-2.amazonaws.com
OSB_WS_BUCKET=bucket-name
OSB_WS_LOCATION=us-west-2

4. Configure RMAN backup to S3 bucket

With OSB installed in the EC2 instance, you can back up Oracle archive logs to the S3 bucket. These backups can be used to perform database point-in-time recovery in case of a database crash/corruption. oratst is used as an example in the commands below.

a. Configure the RMAN repository. The example below uses Oracle 19c and the Oracle SID oratst.

RMAN> configure channel device type sbt parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

b. Create a script (for example, rman_archive.sh) with the commands below, and schedule it using crontab (example entry: */5 * * * * rman_archive.sh) to run every 5 minutes. This makes sure Oracle archive logs are backed up to Amazon S3 frequently, ensuring a recovery point objective (RPO) of 5 minutes.

dt=`date +%Y%m%d_%H%M%S`

rman target / log=rman_arch_bkup_oratst_${dt}.log <<EOF

RUN
{
	allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)' MAXPIECESIZE 10G;

	BACKUP ARCHIVELOG ALL delete all input;
	Backup CURRENT CONTROLFILE;

release channel c1_s3;
	
}

EOF

c. Copy the RMAN logs to the S3 bucket. These logs contain the database identifier (DBID), which is required when we have to restore the database using Oracle RMAN.

aws s3 cp rman_arch_bkup_oratst_${dt}.log s3://bucket-name

5. Perform database point-in-time recovery

In the event of a database crash/corruption, we can use the AWS Backup service and the Oracle RMAN archive log backups to recover the database to a specific point-in-time.

a. Typically, you would pick the most recent recovery point completed before the time you wish to recover. Using AWS Backup, identify the recovery point ID to restore by following the steps on restoring an Amazon EC2 instance. Note: when following the steps, be sure to set the “User data” settings as described in item b below.

After the EBS volumes are created from the snapshot, there is no need to wait for all of the data to transfer from Amazon S3 to your EBS volume before your attached instance can start accessing the volume. Amazon EBS snapshots implement lazy loading, so that you can begin using them right away.

b. Be sure the database does not start automatically after restoring the EC2 instance, by renaming /etc/oratab. Use the following command in “User data” section while restoring EC2 instance. After database recovery, we can rename it back to /etc/oratab.

#!/usr/bin/sh
sudo su - 
mv /etc/oratab /etc/oratab_bk

c. Log in to the EC2 instance once it is up, and execute the RMAN recovery commands below. Identify the DBID from the RMAN logs saved in the S3 bucket. These commands use the database oratst as an example:

rman target /

RMAN> startup nomount

RMAN> set dbid DBID

# Below command is to restore the controlfile from autobackup

RMAN> RUN
{
    allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

    RESTORE CONTROLFILE FROM AUTOBACKUP;
    alter database mount;

    release channel c1_s3;
}


#Identify the recovery point (sequence_number) by listing the backups available in catalog.

RMAN> list backup;

In Figure 5, the most recent archive log backed up is 380, so you can use this sequence number in the next set of RMAN commands.

Sample output of Oracle RMAN “list backup” command

Figure 5. Sample output of Oracle RMAN “list backup” command

RMAN> RUN
{
    allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

    recover database until sequence sequence_number;
    ALTER DATABASE OPEN RESETLOGS;
    release channel c1_s3;
}

d. To avoid performance issues due to lazy loading, after the database is open, run the following command to force a faster restoration of the blocks from S3 bucket to EBS volumes (this example allocates two channels and validates the entire database).

RMAN> RUN
{
  ALLOCATE CHANNEL c1 DEVICE TYPE DISK;
  ALLOCATE CHANNEL c2 DEVICE TYPE DISK;
  VALIDATE database section size 1200M;
}

e. This completes the recovery of the database, and we can let the database start automatically by renaming the file back to /etc/oratab.

mv /etc/oratab_bk /etc/oratab

6. Backup retention

Ensure that the AWS Backup lifecycle policy matches the Oracle archive log backup retention. Also, follow the documentation to configure Oracle backup retention and delete expired backups. These are sample commands for Oracle backup retention:

CONFIGURE BACKUP OPTIMIZATION ON;
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 31 DAYS; 

RMAN> RUN
{
    allocate channel c1_s3 device type sbt
	parms='SBT_LIBRARY=/u01/app/oracle/product/19.3.0.0/db_1/lib/libosbws.so,SBT_PARMS=(OSB_WS_PFILE=/u01/app/oracle/product/19.3.0.0/db_1/dbs/osbwsoratst.ora)';

            crosscheck backup;
            delete noprompt obsolete;
            delete noprompt expired backup;

    release channel c1_s3;
}

Cleanup

Follow the instructions below to remove or clean up the setup:

  1. Delete the backup plan created in Step 1.
  2. Uninstall Oracle Secure Backup from the EC2 instance.
  3. Delete/Update IAM role (ec2access) to remove access from the S3 bucket used to store archive logs.
  4. Remove the cron entry from the EC2 instance configured in Step 4b.
  5. Delete the S3 bucket that was created in Step 2a to store Oracle RMAN archive log backups.

Conclusion

In this post, we demonstrated how, by using AWS Backup and Oracle RMAN archive log backups, Oracle databases running on Amazon EC2 can be restored and recovered efficiently to a point-in-time, without requiring the extra step of restoring data files. Data files are restored as part of the AWS Backup EC2 instance restoration. You can leverage this solution to facilitate restoring copies of your production database for development or testing purposes, plus recover from a user error that removes data or corrupts existing data.

To learn more about AWS Backup, refer to the AWS Backup documentation.

Disaster recovery approaches for Db2 databases on AWS

Post Syndicated from Sai Parthasaradhi original https://aws.amazon.com/blogs/architecture/disaster-recovery-approaches-for-db2-databases-on-aws/

As you migrate your critical enterprise workloads from an IBM Db2 on-premises database to the AWS Cloud, it’s critical to have a reliable and effective disaster recovery (DR) strategy. This helps the database applications operate with little or no disruption from unexpected events like a natural disaster.

Recovery point objective (RPO), recovery time objective (RTO), and cost are three key metrics to consider when developing your DR strategy (see Figure 1). Based on these metrics, you can define your DR strategy for Db2 databases on AWS: either an on-demand backup and restore approach or a nearly continuous replication method.

Figure 1. Disaster recovery strategies

Figure 1. Disaster recovery strategies

In this post, we show an overview of active/passive cross-Region disaster recovery options for the Db2 database on Amazon Elastic Compute Cloud (Amazon EC2). This solution uses native Db2 features and AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), and Amazon VPC Peering connection.

Approach 1: Db2 log shipping

In this approach, the transactional log files produced by the primary database are made available to the standby database via a log archive location. The transaction logs from the archive location can be replayed on the standby database by manually applying the Rollforward command, or by setting up user exit programs.

We can use Amazon S3 or Amazon EFS as the log archive location to share the logs with the standby database hosted in a secondary AWS Region.

Using Amazon S3:

Starting with Db2 11.5.7, we can specify DB2REMOTE (Amazon S3) storage for the LOGARCHMETH1 and LOGARCHMETH2 database log archive method configuration parameters. This enables us to archive/retrieve transaction logs to/from Amazon S3.
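
As a rough illustration of that configuration (the database name MYDB, the storage access alias s3alias, the bucket, and the path are all placeholders, and the exact DB2REMOTE URI format and the cataloging of the storage access alias should be verified against the Db2 documentation for your version):

# Point the primary database's first log archive method at Amazon S3
# (MYDB, s3alias, my-db2-logs-bucket, and archlogs are placeholders)
db2 "UPDATE DB CFG FOR MYDB USING LOGARCHMETH1 'DB2REMOTE://s3alias//my-db2-logs-bucket/archlogs'"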

In Figure 2, we enable Amazon S3 Cross-Region Replication (CRR) between the S3 buckets in the primary and the DR AWS Regions. This permits the transaction logs to be replicated into the S3 bucket in the DR Region.

We set up an AWS Lambda function to tell AWS Systems Manager (SSM) to run a command document. This document runs a bash script containing the Rollforward command on the standby database instance. The Lambda function can be invoked based on the S3 bucket events in the DR Region.
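
As a sketch of the command such a Lambda function might send (the instance ID, the instance owner db2inst1, and the database name MYDB are hypothetical; AWS-RunShellScript is the standard SSM document for running shell commands):

# Ask Systems Manager to replay newly replicated logs on the standby instance
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=InstanceIds,Values=i-0123456789abcdef0" \
  --parameters '{"commands":["su - db2inst1 -c \"db2 ROLLFORWARD DB MYDB TO END OF LOGS\""]}' \
  --comment "Replay Db2 archive logs on the standby database"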

Figure 2. Db2 log shipping using S3 Cross-Region Replication

Figure 2. Db2 log shipping using S3 Cross-Region Replication

This approach works as follows:

  • The transactions are committed and the active transaction log file gets closed on the primary database, which then marks the log file as ready for archiving to the destination (the S3 bucket).
  • The database asynchronously archives the log files into the S3 bucket archive location in the primary Region. This gets replicated to the S3 bucket in the DR Region.
  • This S3 event in the DR Region will initiate an AWS Lambda function to apply the Rollforward database operation on the standby database.
  • Db2 pulls the logs from the S3 bucket in the DR Region and applies them to the standby database.
  • When the primary Region is unavailable, initiate failover manually or by using scripts on the standby database. Use the Rollforward command so that the database can replay up to the end of logs and stop and be ready to accept client connections.

Using Amazon EFS:

In this approach, we configure the database parameter LOGARCHMETH1 with Amazon EFS as an archive location for transaction logs using the DISK option. It will push the transaction logs to a directory on Amazon EFS.

As shown in Figure 3, we configure Replication for Amazon EFS to automatically replicate the database archive logs to the EFS file system in the DR Region, which can be mounted on the standby database instance.

Figure 3. Db2 log shipping using Amazon EFS replication

Figure 3. Db2 log shipping using Amazon EFS replication

This approach replicates transaction logs to EFS. We can schedule a script for every few minutes that runs the Rollforward command to replay the logs on the standby database.
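
A minimal sketch of such a script and schedule (the instance owner db2inst1, the database name MYDB, and the paths are hypothetical):

#!/bin/bash
# /home/db2inst1/replay_logs.sh -- replay replicated archive logs on the standby
. /home/db2inst1/sqllib/db2profile    # load the Db2 instance environment
db2 "ROLLFORWARD DB MYDB TO END OF LOGS"

# Example crontab entry (run as db2inst1) to replay logs every 5 minutes:
# */5 * * * * /home/db2inst1/replay_logs.sh >> /home/db2inst1/replay_logs.log 2>&1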

Alternatively, we can use the user exit programs provided along with the Db2 installation. This automatically applies the logs when the log archive method LOGARCHMETH1 is set to USEREXIT.

This approach has the following advantages:

  1. Straightforward setup, with minimal database configurations.
  2. This can be a DR option for multi-partitioned database environments or environments where federation is set up with two-phase commit for federated transactions.
  3. Bulk load operations on the primary database can be replayed on standby by sharing the load image using EFS.
  4. Rollforward operation progress can be checked on standby using monitoring commands.

Limitations of this approach are as follows:

  1. We cannot connect to the standby database to offload read-only workloads as the database will be in Rollforward recovery mode.
  2. We must write custom scripts like Lambda, user exit programs, or bash scripts to replay the logs on the standby database.
  3. Non-logged operations, such as database configuration parameters or nonrecoverable bulk data loads, are not replayed on standby database.
  4. Automated failover to standby is not possible.

Approach 2: Db2 high availability and disaster recovery (HADR) auxiliary standby

In this approach, we set up Db2 high availability and disaster recovery (HADR) to deploy an auxiliary Db2 standby database in a secondary or DR AWS Region.

The architecture for this approach is shown in Figure 4, and works as follows:

  • We establish TCP/IP connectivity between the primary and auxiliary Db2 standby database using Amazon VPC Peering connection.
  • Any transaction written on the primary Db2 database is committed without waiting for replication onto the auxiliary standby database.
  • Replicated transactions are replayed on the auxiliary standby database, which connects with the primary database in a remote catchup state.
  • When the primary AWS Region is unavailable, manually promote the standby database to primary using the takeover command.
Figure 4. Db2 HADR with auxiliary standby database

Figure 4. Db2 HADR with auxiliary standby database
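
A condensed sketch of the HADR configuration for this topology (host names, service port, instance name, and database name MYDB are placeholders; SUPERASYNC is the synchronization mode used here for a cross-Region auxiliary standby, and a multiple-standby setup also requires HADR_TARGET_LIST, so check the Db2 documentation for your exact topology):

# On the primary database (run as the Db2 instance owner); the standby needs
# the mirrored local/remote settings
db2 "UPDATE DB CFG FOR MYDB USING \
     HADR_LOCAL_HOST primary-host HADR_LOCAL_SVC 51012 \
     HADR_REMOTE_HOST standby-host HADR_REMOTE_SVC 51012 \
     HADR_REMOTE_INST db2inst1 HADR_SYNCMODE SUPERASYNC"

# Start HADR on the standby first, then on the primary
db2 "START HADR ON DB MYDB AS STANDBY"    # on the standby host
db2 "START HADR ON DB MYDB AS PRIMARY"    # on the primary host

# During a DR event, promote the standby
db2 "TAKEOVER HADR ON DB MYDB BY FORCE"   # on the standby host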

This approach has the following advantages:

  1. The replication is handled by the database automatically without the need for custom scripts.
  2. We can enable reads on standby to offload read-only workloads, such as reporting, from the primary database to the standby. This reduces the load on the primary database.
  3. Key metrics such as replication lag, connection status, and errors can be monitored from the primary database.

Limitations of this approach are as follows:

  1. Non-logged operations, such as database configuration parameters or nonrecoverable bulk data loads are not replayed on the standby database.
  2. This approach is not supported in a multi-partitioned database environment or two phase commit federated transactions.
  3. Automated failover to standby is not possible.
  4. There are various other restrictions, which must be evaluated.

Conclusion

In this post, we discussed how to set up a disaster recovery Db2 database using database native features and AWS services. We discussed the advantages and restrictions for each. You can use this post as a reference for setting up the right disaster recovery approach for your database to minimize data loss and maintain business continuity. Let us know your comments; we always love your feedback!

For further reading:

Amazon Elastic File System Update – Sub-Millisecond Read Latency

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-elastic-file-system-update-sub-millisecond-read-latency/

Amazon Elastic File System (Amazon EFS) was announced in early 2015 and became generally available in 2016. We launched EFS in order to make it easier for you to build applications that need shared access to file data. EFS is (and always has been) simple and serverless: you simply create a file system, attach it to any number of EC2 instances, Lambda functions, or containers, and go about your work. EFS is highly durable and scalable, and gives you a strong read-after-write consistency model.

Since the 2016 launch, we have added many new features and capabilities including encryption of data at rest and in transit, an Infrequent Access storage class, and several other lower cost storage classes. We have also worked to improve performance, delivering a 400% increase in read operations per second, a 100% increase in per-client throughput, and then a further tripling of read throughput.

Our customers use EFS file systems to support many different applications and use cases including home directories, build farms, content management (WordPress and Drupal), DevOps (Git, GitLab, Jenkins, and Artifactory), and machine learning inference, to name a few of each.

Sub-Millisecond Read Latency
Faster is always better, and today I am thrilled to be able to tell you that your latency-sensitive EFS workloads can now run about twice as fast as before!

Up until today, EFS latency for read operations (both data and metadata) was typically in the low single-digit milliseconds. Effective today, new and existing EFS file systems now provide average latency as low as 600 microseconds for the majority of read operations on data and metadata.

This performance boost applies to One Zone and Standard General Purpose EFS file systems. New or old, you will still get the same availability, durability, scalability, and strong read-after-write consistency that you have come to expect from EFS, at no additional cost and with no configuration changes.

We “flipped the switch” and enabled this performance boost for all existing EFS General Purpose mode file systems over the course of the last few weeks, so you may already have noticed the improvement. Of course, any new file systems that you create will also benefit.

Learn More
To learn more about the performance characteristics of EFS, read Amazon EFS Performance.

Jeff;

PS – Our multi-year roadmap contains a bunch of short-term and long-term performance enhancements, so stay tuned for more good news!

New – Replication for Amazon Elastic File System (EFS)

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-replication-for-amazon-elastic-file-system-efs/

Amazon Elastic File System (Amazon EFS) allows EC2 instances, AWS Lambda functions, and containers to share access to a fully-managed file system. First announced in 2015 and generally available in 2016, Amazon EFS delivers low-latency performance for a wide variety of workloads and can scale to thousands of concurrent clients or connections. Since the 2016 launch we have continued to listen and to innovate, and have added many new features and capabilities in response to your feedback. These include on-premises access via Direct Connect (2016), encryption of data at rest (2017), provisioned throughput and encryption of data in transit (2018), an infrequent access storage class (2019), IAM authorization & access points (2020), lower-cost one zone storage classes (2021), and more.

Introducing Replication
Today I am happy to announce that you can now use replication to automatically maintain copies of your EFS file systems for business continuity or to help you to meet compliance requirements as part of your disaster recovery strategy. You can set this up in minutes for new or existing EFS file systems, with replication either within a single AWS region or between two AWS regions in the same AWS partition.

Once configured, replication begins immediately. All replication traffic stays on the AWS global backbone, and most changes are replicated within a minute, with an overall Recovery Point Objective (RPO) of 15 minutes for most file systems. Replication does not consume any burst credits and it does not count against the provisioned throughput of the file system.

Configuring Replication
To configure replication, I open the Amazon EFS Console, view the file system that I want to replicate, and select the Replication tab:

I click Create replication, choose the desired destination region, and select the desired storage (Regional or One Zone). I can use the default KMS key for encryption or I can choose another one. I review my settings and click Create replication to proceed:

Replication begins right away and I can see the new, read-only file system immediately:

A new CloudWatch metric, TimeSinceLastSync, is published when the initial replication is complete, and periodically after that:

The replica is created in the selected region. I create any necessary mount targets and mount the replica on an EC2 instance:

EFS tracks modifications to the blocks (currently 4 MB) that are used to store files and metadata, and replicates the changes at a rate of up to 300 MB per second. Because replication is block-based, it is not crash-consistent; if you need crash-consistency you may want to take a look at AWS Backup.

After I have set up replication, I can change the lifecycle management, intelligent tiering, throughput mode, and automatic backup setting for the destination file system. The performance mode is chosen when the file system is created, and cannot be changed.

Initiating a Fail-Over
If I need to fail over to the replica, I simply delete the replication. I can do this from either side (source or destination), by clicking Delete and confirming my intent:

I enter delete, and click Delete replication to proceed:

The former read-only replica is now a writable file system that I can use as part of my recovery process. To fail back, I create a replica in the original location, wait for replication to finish, and then delete the replication.

I can also use the command line and the EFS APIs to manage replication. For example:

create-replication-configuration / CreateReplicationConfiguration – Establish replication for an existing file system.

describe-replication-configurations / DescribeReplicationConfigurations – See the replication configuration for a source or destination file system, or for all replication configurations in an AWS account. The data returned for a destination file system also includes LastReplicatedTimestamp, the time of the last successful sync.

delete-replication-configuration / DeleteReplicationConfiguration – End replication for a file system.
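
For example, a minimal end-to-end sketch with the AWS CLI might look like the following (the file system ID and Region are placeholders, and you should confirm the parameters against the current CLI reference):

# Replicate an existing file system to a new file system in us-west-2
aws efs create-replication-configuration \
    --source-file-system-id fs-0123456789abcdef0 \
    --destinations Region=us-west-2

# Check replication status, including the time of the last successful sync
aws efs describe-replication-configurations \
    --file-system-id fs-0123456789abcdef0

# Fail over by ending replication, which makes the replica writable
aws efs delete-replication-configuration \
    --source-file-system-id fs-0123456789abcdef0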

Available Now
This new feature is available now and you can start using it today in the AWS US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), Europe (Paris), Europe (Stockholm), South America (São Paulo), and GovCloud Regions.

You pay the usual storage fees for the original and replica file systems and any applicable cross-region or intra-region data transfer charges.

Jeff;

Choosing between storage mechanisms for ML inferencing with AWS Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/choosing-between-storage-mechanisms-for-ml-inferencing-with-aws-lambda/

This post is written by Veda Raman, SA Serverless, Casey Gerena, Sr Lab Engineer, Dan Fox, Principal Serverless SA.

For real-time machine learning inferencing, customers often have several machine learning models trained for specific use-cases. For each inference request, the model must be chosen dynamically based on the input parameters.

This blog post walks through the architecture of hosting multiple machine learning models using AWS Lambda as the compute platform. A CDK application lets you try these different architectures in your own account. Finally, the post discusses the different storage options for hosting the models and the benefits of each.

Overview

The serverless architecture for inferencing uses AWS Lambda and API Gateway. The machine learning models are stored either in Amazon S3 or Amazon EFS. Alternatively, they are part of the Lambda function deployed as a container image and stored in Amazon ECR.

All three approaches package and deploy the machine learning inference code as a Lambda function along with the dependencies as a container image. More information on how to deploy Lambda functions as container images can be found here.

Solution architecture

  1. A user sends a request to Amazon API Gateway requesting a machine learning inference.
  2. API Gateway receives the request and triggers the Lambda function with the necessary data.
  3. Lambda loads the container image from Amazon ECR. This container image contains the inference code and business logic to run the machine learning model. However, it does not store the machine learning model (unless using the container hosted option, see step 6).
  4. Model storage option: For S3, when the Lambda function is triggered, it downloads the model files from S3 dynamically and performs the inference.
  5. Model storage option: For EFS, when the Lambda function is triggered, it accesses the models via the local mount path set in the Lambda file system configuration and performs the inference.
  6. Model storage option: If using the container hosted option, you must package the model in Amazon ECR with the application code defined for the Lambda function in step 3. The model runs in the same container as the application code. In this case, choosing the model happens at build-time as opposed to runtime.
  7. Lambda returns the inference prediction to API Gateway and then to the user.

The storage option you choose to host the models (Amazon S3, Amazon EFS, or Amazon ECR via Lambda OCI deployment) influences inference latency, infrastructure cost, and your DevOps deployment strategy.

Comparing single and multi-model inference architectures

There are two types of ML inferencing architectures, single model and multi-model. In single model architecture, you have a single ML inference model that performs the inference for all incoming requests. The model is stored either in S3, ECR (via OCI deployment with Lambda), or EFS and is then used by a compute service such as Lambda.

The key characteristic of a single model is that each has its own compute. This means that for every Lambda function there is a single model associated with it. It is a one-to-one relationship.

Multi-model inferencing architecture is where multiple models are deployed and the model that performs the inference is selected dynamically based on the type of request. For example, you may have four different models for a single application and want a Lambda function to choose the appropriate model at invocation time. It is a many-to-one relationship.

Regardless of whether you use single or multi-model, the models must be stored in S3, EFS, or ECR via Lambda OCI deployments.

Should I load a model outside the Lambda handler or inside?

It is a general best practice in Lambda to load models and anything else that may take a longer time to process outside of the Lambda handler. For example, loading a third-party package dependency. This is due to cold start invocation times – for more information on performance, read this blog.

However, if you are running a multi-model inference, you may want to load inside the handler so you can load a model dynamically. This means you could potentially store 100 models in EFS and determine which model to load at the time of invocation of the Lambda function.

In these instances, it makes sense to load the model in the Lambda handler. This can increase the processing time of your function, since you are loading the model at the time of request.

Deploying the solution

The example application is open-sourced. It performs NLP question/answer inferencing using the HuggingFace BERT model using the PyTorch framework (expanding upon previous work found here). The inference code and the PyTorch framework are packaged as a container image and then uploaded to ECR and the Lambda service.

The solution has three stacks to deploy:

  • MlEfsStack – Stores the inference models inside of EFS and loads two models inside the Lambda handler; the model is chosen at invocation time.
  • MlS3Stack – Stores the inference model inside of S3 and loads a single model outside of the Lambda handler.
  • MlOciStack – Stores the inference models inside of the OCI container image and loads two models outside of the Lambda handler; the model is chosen at invocation time.

To deploy the solution, follow along with the README file on GitHub.

Testing the solution

To test the solution, you can either send an inference request through API Gateway or invoke the Lambda function through the CLI. To send a request to the API, run the following command in a terminal (be sure to replace with your API endpoint and Region):

curl --location --request POST 'https://asdf.execute-api.us-east-1.amazonaws.com/develop/' --header 'Content-Type: application/json' --data-raw '{"model_type": "nlp1","question": "When was the car invented?","context": "Cars came into global use during the 20th century, and developed economies depend on them. The year 1886 is regarded as the birth year of the modern car when German inventor Karl Benz patented his Benz Patent-Motorwagen. Cars became widely available in the early 20th century. One of the first cars accessible to the masses was the 1908 Model T, an American car manufactured by the Ford Motor Company. Cars were rapidly adopted in the US, where they replaced animal-drawn carriages and carts, but took much longer to be accepted in Western Europe and other parts of the world."}'
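
You can also invoke the Lambda function directly with the AWS CLI. The following is only a sketch: the function name is a placeholder, and depending on how the handler parses its input you may need to wrap the payload in an API Gateway-style event.

aws lambda invoke \
    --function-name my-ml-inference-function \
    --cli-binary-format raw-in-base64-out \
    --payload '{"model_type": "nlp1", "question": "When was the car invented?", "context": "..."}' \
    response.json && cat response.json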

General recommendations for model storage

For single model architectures, you should always load the ML model outside of the Lambda handler for increased performance on subsequent invocations after the initial cold start. This is true regardless of the model storage architecture chosen.

For multi-model architectures, if possible, load your model outside of the Lambda handler; however, if you have too many models to load in advance then load them inside of the Lambda handler. This means that a model will be loaded at every invocation of Lambda, increasing the duration of the Lambda function.

Recommendations for model hosting on S3

S3 is a good option if you need a simpler, low-cost storage option to store models. S3 is recommended when you cannot predict your application traffic volume for inference.

Additionally, if you must retrain the model, you can upload the retrained model to the S3 bucket without redeploying the Lambda function.

Recommendations for model hosting on EFS

EFS is a good option if you have a latency-sensitive workload for inference or you are already using EFS in your environment for other machine learning related activities (for example, training or data preparation).

With EFS, you must VPC-enable the Lambda function to mount the EFS file system, which requires additional configuration.

For EFS, it’s recommended that you perform throughput testing with both EFS burst mode and provisioned throughput modes. Depending on inference request traffic volume, if the burst mode is not able to provide the desired performance, you must provision throughput for EFS. See the EFS burst throughput documentation for more information.

Recommendations for container hosted models

This is the simplest approach since all the models are available in the container image uploaded to Lambda. This also has the lowest latency since you are not downloading models from external storage.

However, it requires that all the models are packaged into the container image. If you have too many models that cannot fit into the 10 GB of storage space in the container image, then this is not a viable option.

One drawback of this approach is that anytime a model changes, you must re-package the models with the inference Lambda function code.

This approach is recommended if your models can fit in the 10 GB limit for container images and you are not re-training models frequently.

Cleaning up

To clean up resources created by the CDK templates, run “cdk destroy <StackName>”

Conclusion

Using a serverless architecture for real-time inference can scale your application for any volume of traffic while removing the operational burden of managing your own infrastructure.

In this post, we looked at the serverless architecture that can be used to perform real-time machine learning inference. We then discussed single and multi-model architectures and how to load the models in the Lambda function. We then looked at the different storage mechanisms available to host the machine learning models. We compared S3, EFS, and container hosting for storing models and provided our recommendations of when to use each.

For more learning resources on serverless, visit Serverless Land.

Migrate Resources Between AWS Accounts

Post Syndicated from Ashok Srirama original https://aws.amazon.com/blogs/architecture/migrate-resources-between-aws-accounts/

Have you ever wondered how to move resources between Amazon Web Services (AWS) accounts? You can really view this as a migration of resources. Migrating resources from one AWS account to another may be desired or required due to your business needs. Following are a few scenarios where this may be of benefit:

  1. When you acquire, sell, or merge overseas operations from other businesses.
  2. When you move regional operations from one Managed Service Provider (MSP) to another.
  3. When you reorganize your AWS account and organizational structure.

This process may involve migrating the infrastructure either partially or completely.

In this blog, we will discuss various approaches to migrating resources based on type, configuration, and workload needs. Usually, the first consideration is infrastructure. What’s in your environment? What are the interdependencies? How will you migrate each resource?

Using this information, you can outline a plan on how you will approach migrating each of the resources in your portfolio, and in what order.

Here are some considerations to address for a typical migration: infrastructure, compute, storage, and databases.

Let’s look at each of these considerations in detail.

Migrating infrastructure

To migrate infrastructure that includes ephemeral resources, you can use one of the following Infrastructure as Code (IaC) approaches, shown in Figure 1. IaC templates are like programming scripts that automate the provisioning of IT resources.

Figure 1. Approaches to migrate infrastructure using IaC

1. If you are already using AWS CloudFormation templates, you can easily import the existing templates to the target AWS account.

AWS CloudFormation simplifies provisioning and management on AWS. You can create templates for quick and reliable provisioning of services or applications (called “stacks”). A minimal CLI sketch of this approach appears after this list.

2. You can use tools like Former2 to templatize your existing resources in the source AWS account and deploy them in the target account.

Former2 is an open-source project that allows you to generate IaC templates. For example, AWS CloudFormation or HashiCorp Terraform can be generated from the existing resources within your AWS account. Read Accelerate infrastructure as code development with open source Former2 for step-by-step guidance.
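
As a minimal sketch of option 1 with the AWS CLI (the stack name and profile are placeholders, and this assumes a self-contained JSON template with no references to resources that exist only in the source account):

# In the source account: export the template of an existing stack
aws cloudformation get-template \
    --stack-name my-app-stack \
    --query TemplateBody > template.json

# In the target account: recreate the stack from the exported template
aws cloudformation create-stack \
    --stack-name my-app-stack \
    --template-body file://template.json \
    --profile target-account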

Migrating compute resources

To migrate compute resources that have a persistent state, you can use one of the following approaches, shown in Figure 2. These provide a virtual computing environment, allowing you to launch instances with a variety of operating systems.

Figure 2. Approaches to migrate compute resources

1. If you are already using the AWS Backup service and AWS Organizations to centrally manage backup policies, you can enable the AWS Backup cross-account management feature. This lets you manage, monitor, restore, and copy backup jobs across AWS accounts. Ensure that both accounts are in the same AWS Organization. Once the backups are available in the target account, you can restore EC2 instances. Follow the detailed instructions at Creating backup copies across AWS accounts.

AWS Backup is a fully managed data protection service that centralizes and automates data protection across AWS services, in the cloud, and on premises. You can configure backup policies and monitor activity for your AWS resources. You can automate and consolidate backup tasks that were previously performed service-by-service, removing the need to create custom scripts and use manual processes.

2. Create an Amazon Machine Image of your EC2 instances and share it with the target account. You can launch new EC2 instances using the shared AMI. Follow step-by-step instructions: How do I transfer an Amazon EC2 instance or AMI to a different AWS account?

An Amazon Machine Image (AMI) provides the information required to launch an instance. You can specify an AMI and then launch multiple instances from it with the same configuration, or use different AMIs to launch instances with different configurations.
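
A minimal CLI sketch of the AMI-sharing approach follows (the AMI ID and account ID are placeholders; encrypted AMIs also require sharing the underlying snapshots and KMS key):

# In the source account: grant launch permission to the target account
aws ec2 modify-image-attribute \
    --image-id ami-0123456789abcdef0 \
    --launch-permission "Add=[{UserId=111122223333}]"

# In the target account: the shared AMI can now be used to launch instances
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type t3.micro \
    --profile target-account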

For migrating non-persistent compute resources, refer to the Migrating Infrastructure section.

Migrating storage resources

AWS offers various storage services including object, file, and block storage. To migrate objects from an S3 bucket, you can take the following approaches, shown in Figure 3a.

Figure 3a. Approaches to migrate S3 buckets

1. Use Amazon S3 command line interface (CLI) commands to copy the initial load of objects from the source account to the target account (see the example after this list). Read How can I copy S3 objects from another AWS account? After the initial copy, you can enable the Amazon S3 replication feature to continuously replicate object changes across accounts. Add a bucket policy that grants the source bucket permission to replicate objects into the destination bucket. Read this walkthrough on how to configure replication.

2. If the S3 bucket contains a large number of objects, use Amazon S3 Batch Operations to copy objects across AWS accounts in bulk. Read Cross-account bulk transfer of files using Amazon S3 Batch Operations.
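
The following is a sketch of the initial copy in approach 1 (bucket names are placeholders; the destination bucket policy must allow the source account to write, and --acl bucket-owner-full-control may be needed depending on the destination bucket's object ownership settings):

# Run from the source account once the destination bucket policy allows it
aws s3 sync s3://source-bucket s3://destination-bucket \
    --acl bucket-owner-full-control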

To migrate files from an Amazon EFS file system, you can take the following approach, shown in Figure 3b.

Figure 3b. Approach to migrate EFS file systems

Use AWS DataSync agent to transfer data from one EFS file system to another. AWS DataSync is an online transfer service that simplifies moving, copying, and synchronizing large amounts of data between on-premises storage systems and AWS storage services. Read Transferring file data across AWS Regions and accounts using AWS DataSync for step-by-step guidance.

Migrating database resources

AWS offers various purpose-built database engines. These include relational, key-value, document, in-memory, graph, time series, wide column, and ledger databases. To migrate relational databases, you can take one of the following approaches, shown in Figure 4.

Figure 4. Approaches to migrate relational database resources

1. If you want to continuously replicate data changes, use AWS Database Migration Service (AWS DMS) to replicate your data across AWS accounts with high availability. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. You can set up a DMS task for either one-time migration or on-going replication. An on-going replication task keeps your source and target databases in sync. Once set up, the on-going replication task will continuously apply source changes to the target with minimal latency. Learn how to Set Up AWS DMS for Cross-Account Migration.

AWS DMS is a cloud service that streamlines the migration of relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud or between combinations of cloud and on-premises setups.

2. Use RDS snapshots to create and share database backups across AWS accounts. Use the shared snapshots to launch new Amazon Relational Database Service (RDS) instances in the target account (see the sketch after this list). Read the step-by-step instructions: How can I share an encrypted Amazon RDS DB snapshot with another account?

3. Use AWS Backup to create backup policies that automatically back up your AWS resources. Use AWS Backup cross-account management feature to manage and monitor your backup, restore, and copy jobs across AWS accounts. Once the backups are available in the target account, you can restore RDS instances. Learn about Creating backup copies across AWS accounts.
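
As a sketch of approach 2 (the snapshot name, account IDs, and Region are placeholders; encrypted snapshots also require sharing the KMS key used to encrypt them):

# In the source account: share a manual snapshot with the target account
aws rds modify-db-snapshot-attribute \
    --db-snapshot-identifier my-app-db-snapshot \
    --attribute-name restore \
    --values-to-add 111122223333

# In the target account: restore a new RDS instance from the shared snapshot
aws rds restore-db-instance-from-db-snapshot \
    --db-instance-identifier my-app-db \
    --db-snapshot-identifier arn:aws:rds:us-east-1:123456789012:snapshot:my-app-db-snapshot \
    --profile target-account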

In this section, we discussed relational databases migration. You can also use AWS DMS for migrating other databases. Read supported AWS DMS source and target databases.

Conclusion

In this blog post, we discussed various approaches you can take to migrate resources from one account to another depending upon the resource type and configuration. Additionally, you can also explore CloudEndure Migration for continuous data replication. Learn more about Migrating workloads across AWS Regions with CloudEndure Migration.

Welcome to AWS Storage Day 2021

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/welcome-to-aws-storage-day-2021/

Welcome to the third annual AWS Storage Day 2021! During Storage Day 2020 and the first-ever Storage Day 2019, we made many impactful announcements for our customers, and this year will be no different. The one-day, free AWS Storage Day 2021 virtual event will be hosted on the AWS channel on Twitch. You’ll hear from experts about announcements, leadership insights, and educational content related to AWS Storage services.

The first part of the day is the leadership track. Wayne Duso, VP of Storage, Edge, and Data Governance, will be presenting a live keynote. He’ll share information about what’s new in AWS Cloud Storage and how these services can help businesses increase agility and accelerate innovation. The keynote will be followed by live interviews with the AWS Storage leadership team, including Mai-Lan Tomsen Bukovec, VP of AWS Block and Object Storage.

The second part of the day is a technical track in which you’ll learn more about Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (EBS), Amazon Elastic File System (Amazon EFS), AWS Backup, Cloud Data Migration, AWS Transfer Family and Amazon FSx.

To register for the event, visit the AWS Storage Day 2021 event page.

Now as Jeff Barr likes to say, let’s get into the announcements.

Amazon FSx for NetApp ONTAP
Today, we are pleased to announce Amazon FSx for NetApp ONTAP, a new storage service that allows you to launch and run fully managed NetApp ONTAP file systems in the cloud. Amazon FSx for NetApp ONTAP joins Amazon FSx for Lustre and Amazon FSx for Windows File Server as the newest file system offered by Amazon FSx.

Amazon FSx for NetApp ONTAP provides the full ONTAP experience with capabilities and APIs that make it easy to run applications that rely on NetApp or network-attached storage (NAS) appliances on AWS without changing your application code or how you manage your data. To learn more, read New – Amazon FSx for NetApp ONTAP.

Amazon S3
Amazon S3 Multi-Region Access Points is a new S3 feature that allows you to define global endpoints that span buckets in multiple AWS Regions. Using this feature, you can now build multi-region applications without adding complexity to your applications, with the same system architecture as if you were using a single AWS Region.

S3 Multi-Region Access Points is built on top of AWS Global Accelerator and routes S3 requests over the global AWS network. S3 Multi-Region Access Points dynamically routes your requests to the lowest latency copy of your data, so the upload and download performance can increase by 60 percent. It’s a great solution for applications that rely on reading files from S3 and also for applications like autonomous vehicles that need to write a lot of data to S3. To learn more about this new launch, read How to Accelerate Performance and Availability of Multi-Region Applications with Amazon S3 Multi-Region Access Points.

Creating a multi-region access point

There’s also great news about the Amazon S3 Intelligent-Tiering storage class! The conditions of usage have been updated. There is no longer a minimum storage duration for all objects stored in S3 Intelligent-Tiering, and monitoring and automation charges for objects smaller than 128 KB have been removed. Smaller objects (128 KB or less) are not eligible for auto-tiering when stored in S3 Intelligent-Tiering. Now that there is no monitoring and automation charge for small objects and no minimum storage duration, you can use the S3 Intelligent-Tiering storage class by default for all your workloads with unknown or changing access patterns. To learn more about this announcement, read Amazon S3 Intelligent-Tiering – Improved Cost Optimizations for Short-Lived and Small Objects.

Amazon EFS
Amazon EFS Intelligent Tiering is a new capability that makes it easier to optimize costs for shared file storage when access patterns change. When you enable Amazon EFS Intelligent-Tiering, it will store the files in the appropriate storage class at the right time. For example, if you have a file that is not used for a period of time, EFS Intelligent-Tiering will move the file to the Infrequent Access (IA) storage class. If the file is accessed again, Intelligent-Tiering will automatically move it back to the Standard storage class.

To get started with Intelligent-Tiering, enable lifecycle management in a new or existing file system and choose a lifecycle policy to automatically transition files between different storage classes. Amazon EFS Intelligent-Tiering is perfect for workloads with changing or unknown access patterns, such as machine learning inference and training, analytics, content management and media assets. To learn more about this launch, read Amazon EFS Intelligent-Tiering Optimizes Costs for Workloads with Changing Access Patterns.

AWS Backup
AWS Backup Audit Manager allows you to simplify data governance and compliance management of your backups across supported AWS services. It provides customizable controls and parameters, like backup frequency or retention period. You can also audit your backups to see if they satisfy your organizational and regulatory requirements. If one of your monitored backups drifts from your predefined parameters, AWS Backup Audit Manager will let you know so you can take corrective action. This new feature also enables you to generate reports to share with auditors and regulators. To learn more, read How to Monitor, Evaluate, and Demonstrate Backup Compliance with AWS Backup Audit Manager.

Amazon EBS
Amazon EBS direct APIs now support creating 64 TB EBS Snapshots directly from any block storage data, including on-premises. This was increased from 16 TB to 64 TB, allowing customers to create the largest snapshots and recover them to Amazon EBS io2 Block Express Volumes. To learn more, read Amazon EBS direct API documentation.

AWS Transfer Family
AWS Transfer Family Managed Workflows is a new feature that allows you to reduce the manual tasks of preprocessing your data. Managed Workflows does a lot of the heavy lifting for you, like setting up the infrastructure to run your code upon file arrival, continuously monitoring for errors, and verifying that all the changes to the data are logged. Managed Workflows helps you handle error scenarios so that failsafe modes trigger when needed.

AWS Transfer Family Managed Workflows allows you to configure all the necessary tasks at once so that tasks can automatically run in the background. Managed Workflows is available today in the AWS Transfer Family Management Console. To learn more, read Transfer Family FAQ.

Join us online for more!
Don’t forget to register and join us for the AWS Storage Day 2021 virtual event. The event will be live at 8:30 AM Pacific Time (11:30 AM Eastern Time) on September 2. The event will immediately re-stream for the Asia-Pacific audience with live Q&A moderators on Friday, September 3, at 8:30 AM Singapore Time. All sessions will be available on demand next week.

We look forward to seeing you there!

Marcia

New – Amazon EFS Intelligent-Tiering Optimizes Costs for Workloads with Changing Access Patterns

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-efs-intelligent-tiering-optimizes-costs-for-workloads-with-changing-access-patterns/

Amazon Elastic File System (Amazon EFS) offers four storage classes: two Standard storage classes, Amazon EFS Standard and Amazon EFS Standard-Infrequent Access (EFS Standard-IA), and two One Zone storage classes, Amazon EFS One Zone and Amazon EFS One Zone-Infrequent Access (EFS One Zone-IA). Standard storage classes store data within and across multiple Availability Zones (AZs). One Zone storage classes store data redundantly within a single AZ, at a 47 percent lower price compared to file systems using Standard storage classes, for workloads that don’t require multi-AZ resilience.

The EFS Standard and EFS One Zone storage classes are performance-optimized to deliver lower latency. The Infrequent Access (IA) storage classes are cost-optimized for files that are not accessed every day. With EFS lifecycle management, you can move files that have not been accessed for the duration of the lifecycle policy (7, 14, 30, 60, or 90 days) to the IA storage classes. This can reduce the cost of your storage by up to 92 percent compared to the EFS Standard and EFS One Zone storage classes.

Customers love the cost savings provided by the IA storage classes, but they also want to ensure that they won’t get unexpected data access charges if access patterns change and files that have transitioned to IA are accessed frequently. Reading from or writing data to the IA storage classes incurs a data access charge for every access.

Today, we are launching Amazon EFS Intelligent-Tiering, a new EFS lifecycle management feature that automatically optimizes costs for shared file storage when data access patterns change, without operational overhead.

With EFS Intelligent-Tiering, lifecycle management monitors the access patterns of your file system and moves files that have not been accessed for the duration of the lifecycle policy from EFS Standard or EFS One Zone to EFS Standard-IA or EFS One Zone-IA, depending on whether your file system uses EFS Standard or EFS One Zone storage classes. If the file is accessed again, it is moved back to EFS Standard or EFS One Zone storage classes.

EFS Intelligent-Tiering optimizes your costs even if your workload file data access patterns change. You’ll never have to worry about unbounded data access charges because you only pay for data access charges for transitions between storage classes.

Getting started with EFS Intelligent-Tiering
To get started with EFS Intelligent-Tiering, create a file system using the AWS Management Console, enable lifecycle management, and set two lifecycle policies.

Choose a Transition into IA option to move infrequently accessed files to the IA storage classes. From the drop-down list, you can choose lifecycle policies of 7, 14, 30, 60, or 90 days. Additionally, choose a Transition out of IA option and select On first access to move files back to EFS Standard or EFS One Zone storage classes on access.

For an existing file system, you can click the Edit button on your file system to enable or change lifecycle management and EFS Intelligent-Tiering.

Also, you can use the PutLifecycleConfiguration API action or the put-lifecycle-configuration command, specifying the file system ID of the file system for which you are enabling lifecycle management and the two policies for EFS Intelligent-Tiering.

$ aws efs put-lifecycle-configuration \
   --file-system-id File-System-ID \
   --lifecycle-policies '[{"TransitionToIA":"AFTER_30_DAYS"},{"TransitionToPrimaryStorageClass":"AFTER_1_ACCESS"}]' \
   --region us-west-2 \
   --profile adminuser

You get the following response:

{
  "LifecyclePolicies": [
    {
        "TransitionToIA": "AFTER_30_DAYS"
    },
    {
        "TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"
    }
  ]
}

To disable EFS Intelligent-Tiering, set both the Transition into IA and Transition out of IA options to None. This will disable lifecycle management, and your files will remain in their current storage class.

Any files that have already started to move between storage classes at the time that you disabled EFS Intelligent-Tiering will complete moving to their new storage class. You can disable transition policies independently of each other.
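
You can also disable lifecycle management from the command line by passing an empty policy list. This is only a sketch, using a placeholder file system ID:

$ aws efs put-lifecycle-configuration \
   --file-system-id File-System-ID \
   --lifecycle-policies '[]' \
   --region us-west-2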

For more information, see Amazon EFS lifecycle management in the Amazon EFS User Guide.

Now Available
Amazon EFS Intelligent-Tiering is available in all AWS Regions where Amazon EFS is available. To learn more, join us for the third annual and completely free-to-attend AWS Storage Day 2021 and tune in to our livestream on the AWS Twitch channel today.

You can send feedback to the AWS forum for Amazon EFS or through your usual AWS Support contacts.

Channy

Augmenting VMware Cloud on AWS Workloads with Native AWS services

Post Syndicated from Talha Kalim original https://aws.amazon.com/blogs/architecture/augmenting-vmware-cloud-on-aws-workloads-with-native-aws-services/

VMware Cloud on AWS allows you to quickly migrate VMware workloads to a VMware-managed Software-Defined Data Center (SDDC) running in the AWS Cloud and extend your on-premises data centers without replatforming or refactoring applications.

You can use native AWS services with Virtual Machines (VMs) in the SDDC, to reduce operational overhead and lower your Total Cost of Ownership (TCO) while increasing your workload’s agility and scalability.

This post covers patterns for connectivity between native AWS services and VMware workloads. We also explore common integrations, including using AWS Cloud storage from an SDDC, securing VM workloads using AWS networking services, and using AWS databases and analytics services with workloads running in the SDDC.

Networking between SDDC and native AWS services

Establishing robust network connectivity with VMware Cloud SDDC VMs is critical to successfully integrating AWS services. This section shows you different options to connect the VMware SDDC with your native AWS account.

The simplest way to get started is to use AWS services in the connected Amazon Virtual Private Cloud (VPC) that is selected during the SDDC deployment process. Figure 1 shows this connectivity, which is automatically configured and available once the SDDC is deployed.

Figure 1. SDDC to Customer Account VPC connectivity configured at deployment

The SDDC Elastic Network Interface (ENI) allows you to connect to native AWS services within the connected VPC, but it doesn’t provide transitive routing beyond the connected VPC. For example, it will not connect the SDDC to other VPCs and the internet.

If you’re looking to connect to native AWS services in multiple accounts and VPCs in the same AWS Region, you have two connectivity options. These are explained in the following sections.

Attaching VPCs to VMware Transit Connect

When you need high-throughput connectivity in a multi-VPC environment, use VMware Transit Connect (VTGW), as shown in Figure 2.

Figure 2. Multi-account VPC connectivity through VMware Transit Connect VPC attachments

VTGW uses a VMware-managed AWS Transit Gateway to interconnect SDDCs within an SDDC group. It also allows you to attach your VPCs in the same Region to the VTGW by providing connectivity to any SDDC within the SDDC group.

Connecting through an AWS Transit Gateway

To connect to your VPCs through an existing Transit Gateway in your account, use IPsec virtual private network (VPN) connections from the SDDC with Border Gateway Protocol (BGP)-based routing, as shown in Figure 3. Multiple IPsec tunnels to the Transit Gateway use equal-cost multi-path routing, which increases bandwidth by load-balancing traffic.

Figure 3. Multi-account VPC connectivity through an AWS Transit Gateway

For scalable, high-throughput connectivity to an existing Transit Gateway, connect to the SDDC via a Transit VPC that is attached to the VTGW, as shown in Figure 3. You must manually configure the routes between the VPCs and the SDDC for this architecture.

In the following sections, we’ll show you how to use some of these connectivity options for common native AWS services integrations with VMware SDDC workloads.

Reducing TCO with Amazon EFS, Amazon FSx, and Amazon S3

As you are sizing your VMware Cloud on AWS SDDC, consider using AWS Cloud storage for VMs that provide files services or require object storage. Migrating these workloads to cloud storage like Amazon Simple Storage Service (Amazon S3), Amazon Elastic File System (Amazon EFS), or Amazon FSx can reduce your overall TCO through optimized SDDC sizing.

Additionally, you can reduce the undifferentiated heavy lifting involved with deploying and managing complex architectures for file services in VM disks. Figure 4 shows how these services integrate with VMs in the SDDC.

Figure 4. Connectivity examples for AWS Cloud storage services

We recommend connecting to your S3 buckets via the VPC gateway endpoint in the connected VPC. This is a more cost-effective approach because it avoids the data processing costs associated with a VTGW and AWS PrivateLink for Amazon S3.
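
If a gateway endpoint for Amazon S3 does not already exist in the connected VPC, a minimal sketch of creating one looks like this (the VPC ID, route table ID, and Region are placeholders):

aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0123456789abcdef0 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-east-1.s3 \
    --route-table-ids rtb-0123456789abcdef0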

Similarly, the recommended approach for Amazon EFS and Amazon FSx is to deploy the services in the connected VPC for VM access through the SDDC elastic network interface. You can also connect to existing Amazon EFS and Amazon FSx file shares in other accounts and VPCs using a VTGW, but consider the data transfer costs first.

Integrating AWS networking and content delivery services

Using various AWS networking and content delivery services with VMware Cloud on AWS workloads will provide robust traffic management, security, and fast content delivery. Figure 5 shows how AWS networking and content delivery services integrate with workloads running on VMs.

Figure 5. Connectivity examples for AWS networking and content delivery services

Deploy Elastic Load Balancing (ELB) services in a VPC subnet that has network reachability to the SDDC VMs. This includes the connected VPC over the SDDC elastic network interface, a VPC attached via VTGW, and VPCs attached to a Transit Gateway connected to the SDDC.

VTGW connectivity should be used when the design requires using existing networking services in other VPCs. For example, if you have a dedicated internet ingress/egress VPC. An internal ELB can also be used for load-balancing traffic between services running in SDDC VMs and services running within AWS VPCs.

Use Amazon CloudFront, a global content delivery service, to integrate with load balancers, S3 buckets for static content, or directly with publicly accessible SDDC VMs. Additionally, use Amazon Route 53 to provide public and private DNS services for VMware Cloud on AWS. Deploy services such as AWS WAF and AWS Shield to provide comprehensive network security for VMware workloads in the SDDC.

Integrating with AWS database and analytics services

Data is one the most valuable assets in an organization, and databases are often the most demanding and critical workloads running in on-premises VMware environments.

A common customer pattern to reduce TCO for storage-heavy or memory-intensive databases is to use purpose-built Databases on AWS like Amazon Relational Database Service (RDS). Amazon RDS lets you migrate on-premises relational databases to the cloud and integrate them with SDDC VMs. Using AWS databases also reduces the operational overhead of tasks associated with managing availability, scalability, and disaster recovery (DR).

With AWS Analytics services integrations, you can take advantage of the close proximity of data within VMware Cloud on AWS data stores to gain meaningful insights from your business data. For example, you can use Amazon Redshift to create a data warehouse to run analytics at scale on relational data from transactional systems, operational databases, and line-of-business applications running within the SDDC.

Figure 6 shows integration options for AWS databases and analytics services with VMware Cloud on AWS VMs.

Figure 6. Connectivity examples for AWS Database and Analytics services

We recommend deploying and consuming database services in the connected VPC. If you have existing databases in other accounts or VPCs that require integration with VMware VMs, connect them using the VTGW.

Analytics services can involve ingesting large amounts of data from various sources, including from VMs within the SDDC, creating a significant amount of data traffic. In such scenarios, we recommend using the SDDC connected VPC to deploy any required interface endpoints for analytics services to achieve a cost-effective architecture.

Summary

VMware Cloud on AWS is one of the fastest ways to migrate on-premises VMware workloads to the cloud. In this blog post, we provided different architecture options for connecting the SDDC to native AWS services. This lets you evaluate your requirements to select the most cost-effective option for your workload.

The example integrations covered in this post are common AWS service integrations, including storage, network, and databases. They are a great starting point, but the possibilities are endless. Integrating services like Amazon Machine Learning (Amazon ML) and Serverless on AWS allows you to deliver innovative services to your users, often without having to re-factor existing application backends running on VMware Cloud on AWS.

Additional Resources

If you need to integrate VMware Cloud on AWS with an AWS service, explore the following resources and reach out to us at AWS.

Manage your Digital Microscopy Data using OMERO on AWS

Post Syndicated from Travis Berkley original https://aws.amazon.com/blogs/architecture/manage-your-digital-microscopy-data-using-omero-on-aws/

The Open Microscopy Environment (OME) consortium develops open-source software and format standards for microscopy data. OME Remote Objects (OMERO) is an open-source image data management platform designed to support digital pathology and cellular biology studies. You can access, share, and work with various types of biological data. This can include histopathology, high content screening, electron microscopy, and even non-image genotype data. Deploying this open source tool on Amazon Web Services (AWS) allows you to access your image data in a secure central repository. You can take advantage of elastic storage by growing the archive as needed without provisioning excess storage beforehand. OMERO has a web interface, which facilitates data access and visualization. It also supports connection through the OMERO client or other third-party image analysis tools, like CellProfiler™, QuPath, Fiji, ImageJ, and others.

The challenge of microscopy data

Saint Louis University (SLU) School of Medicine Research Microscopy and Histology Core required a centralized system for both distribution and hosting. The solution needed to provide research imaging distribution to both internal and external clients, and to host an educational platform for microscope images. SLU decided that the open source software OMERO was an ideal fit for them.

In order to provide speed, ease of access, and security for the University’s computer networks, SLU decided the solution must be hosted in the cloud. By partnering with AWS, SLU established a robust system for their clients. The privately hosted images on OMERO represent research material databases used by University researchers. OMERO also hosts teaching datasets for resident and fellow education. Other publicly hosted repositories provide access to source images for future publishing standards and regulations. SLU reported that the implementation was extraordinarily smooth for a non-programmer. In addition, the system design allowed for advanced data management to control costs and security.

Reviewing the OMERO architecture

OMERO is a typical three-tier web application, consisting of the following components:

  • OMERO.web provides access to OMERO’s data hierarchies and also enables annotation, organization, and visualization of data. This web browser-based client of OMERO.server exposes the annotation-based data-sharing mechanism.
  • OMERO.server is a middleware server application that provides access to image data and metadata stored in a series of databases. It contains a multi-threaded, image-rendering engine and supports a wide range (>140) of image pyramid formats through the Bio-Formats Java library. This Java application facilitates remote access and interoperability for modern scientific studies. It also exposes an API to allow any OMERO client to access the original data and any derived measurements.
  • OMERO relational database (PostgreSQL) provides the underlying storage facilities. This storage backend contains the processed metadata associated with the binary images, measurement specification, user information, structured annotations, and more.

Figure 1. Architectural diagram for a highly available (HA) deployment of OMERO on AWS including data ingestion options

To achieve the highly available (HA) deployment in the diagram, follow the guidance from this GitHub repository. Since OMERO only supports one writer per mounted network file share, there is one OMERO read+write server and one read-only server in the HA deployment. Otherwise, multiple instances will compete to get first access to Amazon Elastic File System (EFS). If HA is not a requirement, you can lower costs by deploying only the read+write OMERO.server.

OMERO is deployed on AWS using AWS CloudFormation (CFN) templates, which deploy two nested CFN stacks, one for storage and one for compute. The storage template creates an EFS volume and an Amazon Relational Database Service (RDS) instance of PostgreSQL. EFS provides the option to move files to an Infrequent Access storage class after a certain number of days to save storage cost. RDS has a Multi-AZ option to improve business continuity. The compute template creates Amazon Elastic Container Service (Amazon ECS) containers for the OMERO web and server functions. You have the option to deploy the OMERO containers using the AWS Fargate or Amazon EC2 launch type. It also creates an Application Load Balancer (ALB) with duration-based stickiness enabled and an AWS Certificate Manager (ACM) certificate for Transport Layer Security (TLS) termination at the ALB. Only the ALB is publicly accessible, as the web portal is protected behind it in private subnets.

A VPC and subnets are required, which can be obtained via this CFN template. The deployment also requires the hosted zone ID and a fully qualified domain name in Amazon Route 53, which are used to validate the TLS certificate. If higher security is not a requirement, there is an option to deploy without the registered domain and the hosted zone in Route 53. You will then be able to access the OMERO web portal through the Application Load Balancer DNS name without TLS encryption.

Additionally, the containers of OMERO.web and OMERO.server can be extended with plugins. The landing page for login can be customized with logos, brands, or disclaimers. Build a new Docker container image with specific configuration changes to enrich the functionality of this open source platform.

You can use Amazon ECS Exec to access the OMERO command line interface (CLI) to import images within the OMERO.server container, running on either the AWS Fargate or EC2 launch type. You can also run Amazon ECS Exec via AWS CloudShell. The OMERO CFN templates enable Amazon ECS Exec commands by default. You will only need to install the AWS CLI and the SSM plugin on your clients or AWS CloudShell to initiate the commands. When you import images within the OMERO.server container instances, you can use the OMERO in-place import to avoid redundant copies of the image files on Amazon EFS. Alternatively, you can access the Windows desktop OMERO client, OMERO.insight, via the application virtualization service Amazon AppStream 2.0, which connects to the OMERO.server in the same VPC. Amazon AppStream 2.0 allows Amazon S3 to be used as home folder storage, so you can import images directly from Amazon S3 to OMERO.server.
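
As a rough sketch of the Amazon ECS Exec workflow described above (the cluster, task, and container names are placeholders, and the exact OMERO CLI flags depend on your OMERO version and deployment):

# Open an interactive shell inside the running OMERO.server container
aws ecs execute-command \
    --cluster omero-cluster \
    --task 0123456789abcdef0 \
    --container omeroserver \
    --interactive \
    --command "/bin/bash"

# Inside the container: log in and run an in-place import so the original
# file on Amazon EFS is referenced rather than copied
omero login root@localhost
omero import --transfer=ln_s /mnt/efs/images/sample.ome.tiff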

AWS offers multiple options to move your microscopic image data from on premises facilities to the cloud storage, as illustrated in Figure 1:

  1. Use AWS Transfer Family to copy data directly from on premises devices to EFS
  2. Alternatively, transfer data directly from your on-premises Network File System (NFS) to EFS using AWS DataSync. AWS DataSync can also be used to transfer files from S3 to EFS.
  3. Set up AWS Storage Gateway, in particular File Gateway, to move your image files from on premises to Amazon Simple Storage Service (S3) first. A storage lifecycle policy can archive images. You can track the storage activity metrics using Amazon S3 Storage Lens and gain insights on storage cost using cost allocation tags. Once the files are in Amazon S3, you can either set up AWS DataSync to transfer files from S3 to EFS, or directly import files into OMERO.server.

To find the latest development to this solution, check out digital pathology on AWS repository on GitHub.

Conclusion

Researchers and scientists at Saint Louis University were able to grow their image repository on AWS without the concern of fixed storage limits. They can scale their compute environment up or down as their research requirements dictate. Managed services like Amazon ECS and RDS significantly reduce the operational workload on researchers. SLU reports that this platform is of great use to their researchers. Other universities, academic medical centers, and pharmaceutical and biotechnology companies can also use this cloud-based image data management platform to collect, visualize, and share access to their image data assets.

Blue/Green deployment with AWS Developer tools on Amazon EC2 using Amazon EFS to host application source code

Post Syndicated from Rakesh Singh original https://aws.amazon.com/blogs/devops/blue-green-deployment-with-aws-developer-tools-on-amazon-ec2-using-amazon-efs-to-host-application-source-code/

Many organizations building modern applications require a shared and persistent storage layer for hosting and deploying data-intensive enterprise applications, such as content management systems, media and entertainment, distributed applications like machine learning training, etc. These applications demand a centralized file share that scales to petabytes without disrupting running applications and remains concurrently accessible from potentially thousands of Amazon EC2 instances.

Simultaneously, customers want to automate the end-to-end deployment workflow and leverage continuous methodologies utilizing AWS developer tools services for performing a blue/green deployment with zero downtime. A blue/green deployment is a deployment strategy wherein you create two separate, but identical environments. One environment (blue) is running the current application version, and one environment (green) is running the new application version. The blue/green deployment strategy increases application availability by generally isolating the two application environments and ensuring that spinning up a parallel green environment won’t affect the blue environment resources. This isolation reduces deployment risk by simplifying the rollback process if a deployment fails.

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, and fully-managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It scales on demand, thereby eliminating the need to provision and manage capacity in order to accommodate growth. Utilize Amazon EFS to create a shared directory that stores and serves code and content for numerous applications. Your application can treat a mounted Amazon EFS volume like local storage. This means you don’t have to deploy your application code every time the environment scales up to multiple instances to distribute load.

In this blog post, I will guide you through an automated process to deploy a sample web application on Amazon EC2 instances utilizing Amazon EFS mount to host application source code, and utilizing a blue/green deployment with AWS code suite services in order to deploy the application source code with no downtime.

How this solution works

This blog post includes a CloudFormation template to provision all of the resources needed for this solution. The CloudFormation stack deploys a Hello World application on Amazon Linux 2 EC2 instances running behind an Application Load Balancer and utilizes an Amazon EFS mount point to store the application content. The AWS CodePipeline project utilizes AWS CodeCommit as the version control system, AWS CodeBuild for installing dependencies and creating artifacts, and AWS CodeDeploy to conduct deployment on EC2 instances running in an Amazon EC2 Auto Scaling group.

Figure 1 below illustrates our solution architecture.

Figure 1: Sample solution architecture

The event flow in Figure 1 is as follows:

  1. A developer commits code changes from their local repo to the CodeCommit repository. The commit triggers CodePipeline execution.
  2. CodeBuild execution begins to compile source code, install dependencies, run custom commands, and create deployment artifact as per the instructions in the Build specification reference file.
  3. During the build phase, CodeBuild copies the source-code artifact to Amazon EFS file system and maintains two different directories for current (green) and new (blue) deployments.
  4. After successfully completing the build step, CodeDeploy deployment kicks in to conduct a Blue/Green deployment to a new Auto Scaling Group.
  5. During the deployment phase, CodeDeploy mounts the EFS file system on new EC2 instances as per the CodeDeploy AppSpec file reference and conducts other deployment activities (see the sketch after this list).
  6. After successful deployment, a Lambda function triggers in order to store a deployment environment parameter in Systems Manager parameter store. The parameter stores the current EFS mount name that the application utilizes.
  7. The AWS Lambda function updates the parameter value during every successful deployment with the current EFS location.
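
To illustrate steps 5 and 6, the following is a rough sketch of a deployment hook script that mounts the file system and records the active deployment directory. The file system ID, mount path, and parameter name are placeholders; the actual scripts are defined in the sample repository.

#!/bin/bash
# Mount the shared EFS file system on the new instances (step 5)
yum install -y amazon-efs-utils
mkdir -p /mnt/efs/app
mount -t efs -o tls fs-0123456789abcdef0:/ /mnt/efs/app

# Record which deployment directory (blue or green) is now live (step 6)
aws ssm put-parameter \
    --name /blue-green-sample/current-deployment-dir \
    --value blue \
    --type String \
    --overwrite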

Prerequisites

For this walkthrough, the following are required:

Deploy the solution

Once you’ve assembled the prerequisites, download or clone the GitHub repo and store the files on your local machine. Utilize the commands below to clone the repo:

mkdir -p ~/blue-green-sample/
cd ~/blue-green-sample/
git clone https://github.com/aws-samples/blue-green-deployment-pipeline-for-efs

Once completed, utilize the following steps to deploy the solution in your AWS account:

  1. Create a private Amazon Simple Storage Service (Amazon S3) bucket by using this documentation

    Figure 2: AWS S3 console view when creating a bucket

     

  2. Upload the cloned or downloaded GitHub repo files to the root of the S3 bucket. The S3 bucket object structure should look similar to Figure 3:
    Figure 3: AWS S3 bucket object structure after you upload the GitHub repo content

     

  3. Go to the S3 bucket and select the template name solution-stack-template.yml, and then copy the object URL.
  4. Open the CloudFormation console. Choose the appropriate AWS Region, and then choose Create Stack. Select With new resources.
  5. Select Amazon S3 URL as the template source, paste the object URL that you copied in Step 3, and then choose Next.
  6. On the Specify stack details page, enter a name for the stack and provide the following input parameter. Modify the default values for other parameters in order to customize the solution for your environment. You can leave everything as default for this walkthrough.
  • ArtifactBucket – The name of the S3 bucket that you created in the first step of the solution deployment. This is a mandatory parameter with no default value.

Figure 4: Defining the stack name and input parameters for the CloudFormation stack

  7. Choose Next.
  8. On the Options page, keep the default values and then choose Next.
  9. On the Review page, confirm the details, acknowledge that CloudFormation might create IAM resources with custom names, and then choose Create Stack.
  10. Once the stack creation is marked as CREATE_COMPLETE, the following resources are created:
  • A virtual private cloud (VPC) configured with two public and two private subnets.
  • A NAT gateway, an Elastic IP address, and an internet gateway.
  • Route tables for the private and public subnets.
  • An Auto Scaling group with a single EC2 instance.
  • An Application Load Balancer and a target group.
  • Three security groups: one each for the ALB, the web servers, and the EFS file system.
  • An Amazon EFS file system with a mount target for each Availability Zone.
  • A CodePipeline project with CodeCommit repository, CodeBuild, and CodeDeploy resources.
  • An SSM parameter to store the environment's current deployment status.
  • A Lambda function to update the SSM parameter on every successful pipeline execution.
  • The required IAM roles and policies.

      Note: It may take anywhere from 10 to 20 minutes to complete the stack creation. A CLI sketch of the equivalent upload and stack creation steps follows this note.
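Here is a minimal sketch of the equivalent upload and stack creation steps from the AWS CLI. The bucket name, stack name, and parameter value are placeholders; adjust them to match your environment.

# Upload the cloned repo content to the artifact bucket (placeholder bucket name)
aws s3 cp ~/blue-green-sample/blue-green-deployment-pipeline-for-efs/ \
  s3://my-artifact-bucket/ --recursive

# Create the stack from the template stored in the bucket
aws cloudformation create-stack \
  --stack-name blue-green-efs-sample \
  --template-url https://my-artifact-bucket.s3.amazonaws.com/solution-stack-template.yml \
  --parameters ParameterKey=ArtifactBucket,ParameterValue=my-artifact-bucket \
  --capabilities CAPABILITY_NAMED_IAM

# Wait until the stack reaches CREATE_COMPLETE
aws cloudformation wait stack-create-complete --stack-name blue-green-efs-sample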

Test the solution

Now that the solution stack is deployed, follow the steps below to test the solution:

  1. Validate CodePipeline execution status

After the CloudFormation stack is created successfully, a CodePipeline execution is automatically triggered to deploy the default application code version from the CodeCommit repository.

  • In the AWS console, choose Services and then CloudFormation. Select your stack name. On the stack Outputs tab, look for the CodePipelineURL key and click on the URL.
  • Validate that all stages have completed successfully. For a successful CodePipeline execution, you should see something like Figure 5. Wait for the execution to complete if it is still in progress. You can also check the stage status from the AWS CLI, as sketched after Figure 5.

Figure 5: CodePipeline console showing execution status of all stages
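Here is a minimal CLI sketch for checking the same stage status; the pipeline name is a placeholder that you can read from the CloudFormation stack outputs.

# List the pipeline stages and the status of their latest execution (placeholder pipeline name)
aws codepipeline get-pipeline-state \
  --name blue-green-efs-pipeline \
  --query 'stageStates[].{stage:stageName,status:latestExecution.status}' \
  --output table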

 

  2. Validate the Website URL

After the pipeline execution completes, open the website URL in a browser to check that it is working.

  • On the stack Outputs tab, look for the WebsiteURL key and click on the URL.
  • For a successful deployment, it should open a default page similar to Figure 6.

Figure 6: Sample “Hello World” application (Green deployment)

 

  3. Validate the EFS share

After the website is deployed successfully, connect to the application server to validate the EFS mount point and the application source code directory.

  • Open the Amazon EC2 console, and then choose Instances in the left navigation pane.
  • Select the instance named bg-sample and choose Connect.
  • For Connection method, choose Session Manager, and then choose Connect.

After the connection is made, run the following bash commands to validate the EFS mount and the deployed content. Figure 7 shows a sample output from running the bash commands.

sudo df -h | grep efs
ls -la /efs/green
ls -la /var/www/

Figure 7: Sample output from the bash command (Green deployment)

 

  4. Deploy a new revision of the application code

After verifying the application status and the deployed code on the EFS share, commit some changes to the CodeCommit repository in order to trigger a new deployment.

  • On the stack Outputs tab, look for the CodeCommitURL key and click on the corresponding URL.
  • Click on the application's HTML file in the repository.
  • Click on Edit.
  • Uncomment line 9 and comment line 10, so that the new lines look like those below after the changes:
background-color: #0188cc; 
#background-color: #90ee90;
  • Add an Author name and Email address, and then choose Commit changes.

After you commit the code, CodePipeline is triggered and runs the Source, Build, Deploy, and Lambda stages. Once the execution completes, open the website URL again and you should see a new page like Figure 8.


Figure 8: New Application version (Blue deployment)

 

On the EFS side, the application directory on the new EC2 instance now points to /efs/blue as shown in Figure 9.


Figure 9: Sample output from the bash command (Blue deployment)

Solution review

Let’s review the details of the pipeline stages and what happens during the blue/green deployment:

1) Build stage

For this sample application, the CodeBuild project is configured to mount the EFS file system and use the buildspec.yml file present in the source code root directory to run the build. Following is the sample build spec used in this solution:

version: 0.2
phases:
  install:
    runtime-versions:
      php: latest   
  build:
    commands:
      - current_deployment=$(aws ssm get-parameter --name $SSM_PARAMETER --query "Parameter.Value" --region $REGION --output text)
      - echo $current_deployment
      - echo $SSM_PARAMETER
      - echo $EFS_ID $REGION
      - if [[ "$current_deployment" == "null" ]]; then echo "this is the first GREEN deployment for this project" ; dir='/efs/green' ; fi
      - if [[ "$current_deployment" == "green" ]]; then dir='/efs/blue' ; else dir='/efs/green' ; fi
      - if [ ! -d $dir ]; then  mkdir $dir >/dev/null 2>&1 ; fi
      - echo $dir
      - rsync -ar $CODEBUILD_SRC_DIR/ $dir/
artifacts:
  files:
      - '**/*'

During the build job, the following activities occur:

  • Installs the latest PHP runtime version.
  • Reads the SSM parameter value to determine the current deployment and decide which directory to use. The SSM parameter value flips between green and blue on every successful deployment.
  • Synchronizes the latest source code to the EFS mount point.
  • Creates artifacts to be used in subsequent stages.

Note: Use the default buildspec.yml as a reference and customize it further as per your requirements. See this link for more examples.

2) Deploy Stage

The solution uses the CodeDeploy blue/green deployment type for EC2/On-Premises. The deployment environment is configured to provision a new EC2 Auto Scaling group for every deployment in order to deploy the new application revision; CodeDeploy creates the new Auto Scaling group by copying the current one. See this link for more details on blue/green deployment configuration with CodeDeploy. During each deployment event, CodeDeploy uses the appspec.yml file to run the deployment steps as per the defined lifecycle hooks. Following is the sample AppSpec file used in this solution.

version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies
      timeout: 180
      runas: root
  AfterInstall:
    - location: scripts/app_deployment
      timeout: 180
      runas: root
  BeforeAllowTraffic:
    - location: scripts/check_app_status
      timeout: 180
      runas: root

Note: The scripts mentioned in the AppSpec file are available in the scripts directory of the CodeCommit repository. Use these sample scripts as a reference and modify them as per your requirements.

For this sample, the following steps are conducted during a deployment (a simplified sketch of this hook logic appears after the list):

  • BeforeInstall:
    • Installs required packages on the EC2 instance.
    • Mounts the EFS file system.
    • Creates a symbolic link to point the Apache home directory /var/www/html to the appropriate EFS mount point. This ensures that the new application version is deployed to a different EFS directory without affecting the currently running application.
  • AfterInstall:
    • Stops the Apache web server.
    • Fetches the current EFS directory name from Systems Manager.
    • Runs some cleanup commands.
    • Restarts the Apache web server.
  • BeforeAllowTraffic:
    • Checks whether the application is running fine.
    • Exits the deployment with an error if the app returns a non-200 HTTP status code.
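Here is a simplified, illustrative sketch of the kind of logic these hooks run. The parameter name, paths, and commands are assumptions for illustration; the actual scripts in the repository's scripts directory may differ.

#!/bin/bash
# Illustrative only: repoint Apache's document root at the target EFS directory
# and verify the app responds, in the spirit of the AfterInstall/BeforeAllowTraffic hooks.
set -e

# Read the currently recorded deployment from Systems Manager (placeholder parameter name)
current=$(aws ssm get-parameter --name /blue-green-sample/current-deployment \
  --query 'Parameter.Value' --output text)

# The new revision lives in the "other" directory from the one currently recorded
if [ "$current" = "green" ]; then target="/efs/blue"; else target="/efs/green"; fi

# Point the Apache document root at the target EFS directory through a symbolic link
rm -rf /var/www/html
ln -s "$target" /var/www/html
systemctl restart httpd

# Fail the deployment if the application does not return HTTP 200
status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost/)
if [ "$status" -ne 200 ]; then
  echo "Health check failed with HTTP status $status" >&2
  exit 1
fi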

3) Lambda Stage

After the deploy stage completes, CodePipeline triggers a Lambda function to update the SSM parameter value with the EFS directory name that was just deployed. The parameter value alternates between “blue” and “green” to help the pipeline identify the right EFS file system path during the next deployment.
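Conceptually, that update is a single parameter write. The following CLI sketch shows the equivalent call with a placeholder parameter name and value; the Lambda function provisioned by the stack may be implemented differently.

# Record which EFS directory now serves traffic (placeholder parameter name and value)
aws ssm put-parameter \
  --name /blue-green-sample/current-deployment \
  --value blue \
  --type String \
  --overwrite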

CodeDeploy Blue/Green deployment

Let’s review the sequence of events during the CodeDeploy deployment (a CLI sketch for monitoring a deployment follows Figure 10):

  1. CodeDeploy creates a new Auto Scaling group by copying the original one.
  2. Provisions a replacement EC2 instance in the new Auto Scaling group.
  3. Conducts the deployment on the new instance as per the instructions in the appspec.yml file.
  4. Sets up health checks and redirects traffic to the new instance.
  5. Terminates the original instance along with the original Auto Scaling group.
  6. After the deployment completes, it should appear as shown in Figure 10.
Figure 10: AWS CodeDeploy console view of a blue/green deployment on EC2
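You can also follow a deployment from the CLI. Here is a minimal sketch with placeholder application and deployment group names:

# List deployments for the deployment group and take the first ID returned (placeholder names)
deployment_id=$(aws deploy list-deployments \
  --application-name blue-green-efs-app \
  --deployment-group-name blue-green-efs-dg \
  --query 'deployments[0]' --output text)

# Check the status of that deployment
aws deploy get-deployment --deployment-id "$deployment_id" \
  --query 'deploymentInfo.status' --output text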

Troubleshooting

To troubleshoot any service-related issues, see the following links:

More information

Now that you have tested the solution, here are some additional points worth noting:

  • The sample template and code used in this blog can work in any AWS Region and are mainly intended for demonstration purposes. Use the sample as a reference and modify it further as per your requirements.
  • This solution works with a single account, Region, and VPC combination.
  • For this sample, we have used AWS CodeCommit for version control, but you can also use any other source supported by AWS CodePipeline, such as Bitbucket, GitHub, or GitHub Enterprise Server.

Clean up

Follow these steps to delete the components and avoid incurring future charges (a CLI sketch of the same cleanup follows the list):

  1. Open the AWS CloudFormation console.
  2. On the Stacks page in the CloudFormation console, select the stack that you created for this blog post. The stack must be currently running.
  3. In the stack details pane, choose Delete.
  4. Select Delete stack when prompted.
  5. Empty and delete the S3 bucket created during deployment step 1.
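Here is a minimal CLI sketch of the same cleanup, with placeholder stack and bucket names:

# Delete the CloudFormation stack and wait for deletion to finish (placeholder stack name)
aws cloudformation delete-stack --stack-name blue-green-efs-sample
aws cloudformation wait stack-delete-complete --stack-name blue-green-efs-sample

# Empty and remove the S3 bucket created in deployment step 1 (placeholder bucket name)
aws s3 rm s3://my-artifact-bucket --recursive
aws s3 rb s3://my-artifact-bucket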

Conclusion

In this blog post, you learned how to set up a complete CI/CD pipeline for conducting a blue/green deployment on EC2 instances using an Amazon EFS file share as the mount point to host the application source code. The EFS share is the central location hosting your application content, and it helps reduce your overall deployment time by eliminating the need to deploy a new revision to every EC2 instance's local storage. It also helps preserve any dynamically generated content when the life of an EC2 instance ends.

Author bio

Rakesh Singh

Rakesh is a Senior Technical Account Manager at Amazon. He loves automation and enjoys working directly with customers to solve complex technical issues and provide architectural guidance. Outside of work, he enjoys playing soccer, singing karaoke, and watching thriller movies.