Tag Archives: OpenShift

Deploying IBM Cloud Pak for Data on Red Hat OpenShift Service on AWS

2022-09-14 Eduardo Monich Fronza

Post Syndicated from Eduardo Monich Fronza original https://aws.amazon.com/blogs/architecture/deploying-ibm-cloud-pak-for-data-on-red-hat-openshift-service-on-aws/

Amazon Web Services (AWS) customers who are looking for a more intuitive way to deploy and use IBM Cloud Pak for Data (CP4D) on the AWS Cloud can now use the Red Hat OpenShift Service on AWS (ROSA).

ROSA is a fully managed service, jointly supported by AWS and Red Hat. It is managed by Red Hat Site Reliability Engineers and provides a pay-as-you-go pricing model, as well as a unified billing experience on AWS.

With this, customers do not manage the lifecycle of Red Hat OpenShift Container Platform clusters. Instead, they are free to focus on developing new solutions and innovating faster, using IBM’s integrated data and artificial intelligence platform on AWS, to differentiate their business and meet their ever-changing enterprise needs.

CP4D can also be deployed from the AWS Marketplace with self-managed OpenShift clusters. This is ideal for customers with requirements such as Red Hat OpenShift Data Foundation software-defined storage, or for those who prefer to manage their own OpenShift clusters.

In this post, we discuss how to deploy CP4D on ROSA using IBM-provided Terraform automation.

Cloud Pak for data architecture

Here, we install CP4D in a highly available ROSA cluster across three Availability Zones (AZs), with three master nodes, three infrastructure nodes, and three worker nodes.

Review the AWS Regions and Availability Zones documentation and the regions where ROSA is available to choose the best region for your deployment.

This is a public ROSA cluster, accessible from the internet via port 443. When deploying CP4D in your AWS account, consider using a private cluster (Figure 1).

Figure 1. IBM Cloud Pak for Data on ROSA

We are using Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS) for the cluster’s persistent storage. Review the IBM documentation for information about supported storage options.

Review the AWS prerequisites for ROSA, and follow the Security best practices in IAM documentation to protect your AWS account before deploying CP4D.

Cost

The costs associated with using AWS services when deploying CP4D in your AWS account can be estimated on the pricing pages for the services used.

Prerequisites

This blog assumes familiarity with: CP4D, Terraform, Amazon Elastic Compute Cloud (Amazon EC2), Amazon EBS, Amazon EFS, Amazon Virtual Private Cloud, and AWS Identity and Access Management (IAM).

You will need the following before getting started:

Access to an AWS account, with permissions to create the resources described in the installation steps section.
An AWS IAM user, with the permissions described in the AWS prerequisites for ROSA documentation.
Sufficient AWS service quotas to deploy ROSA. You can request service-quota increases from the AWS console.
An IBM entitlement API key: either a 60-day trial or an existing entitlement.
A bastion host to run the CP4D installer, with the following packages:
- AWS Command Line Interface (aws cli)
- OpenShift command-line interface (oc)
- Kubernetes command-line tool (kubectl)
- Terraform
- Git
- Podman
- Python 3.8
- httpd-tools, jq, wget, vim, unzip

Installation steps

Complete the following steps to deploy CP4D on ROSA:

First, enable ROSA on the AWS account. From the AWS ROSA console, click on Enable ROSA, as in Figure 2.

Figure 2. Enabling ROSA on your AWS account

Click on Get started. You are redirected to the Red Hat website, where you can register and obtain a Red Hat ROSA token.

Navigate to the AWS IAM console. Create an IAM policy named cp4d-installer-policy and add the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "autoscaling:*",
                "cloudformation:*",
                "cloudwatch:*",
                "ec2:*",
                "elasticfilesystem:*",
                "elasticloadbalancing:*",
                "events:*",
                "iam:*",
                "kms:*",
                "logs:*",
                "route53:*",
                "s3:*",
                "servicequotas:GetRequestedServiceQuotaChange",
                "servicequotas:GetServiceQuota",
                "servicequotas:ListServices",
                "servicequotas:ListServiceQuotas",
                "servicequotas:RequestServiceQuotaIncrease",
                "sts:*",
                "support:*",
                "tag:*"
            ],
            "Resource": "*"
        }
    ]
}
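
The policy above grants service-wide permissions for most services. When reviewing it against the Security best practices in IAM documentation, it can help to list which actions are wildcards. The following is a minimal sketch, not part of the IBM automation; the function name `wildcard_actions` and the abbreviated `sample_policy` are illustrative assumptions.

```python
import json
from typing import Dict, List


def wildcard_actions(policy: Dict) -> List[str]:
    """Return every allowed action that ends in '*', for example 'ec2:*'."""
    found = []
    statements = policy.get("Statement", [])
    # IAM accepts a single statement object as well as a list of statements.
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        found.extend(a for a in actions if a.endswith("*"))
    return sorted(found)


# Abbreviated copy of the policy above, for illustration only.
sample_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["ec2:*", "iam:*", "servicequotas:GetServiceQuota"],
    "Resource": "*"
  }]
}
""")

print(wildcard_actions(sample_policy))
```

To review the full policy, save it to a file and pass the result of `json.load` to the same function.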

Next, let’s create an IAM user from the AWS IAM console, which will be used for the CP4D installation:
a. Specify a name, like ibm-cp4d-bastion.
b. Set the credential type to Access key – Programmatic access.
c. Attach the IAM policy created in Step 3.
d. Download the .csv credentials file.
From the Amazon EC2 console, create a new EC2 key pair and download the private key.
Launch an Amazon EC2 instance from which the CP4D installer is launched:
a. Specify a name, like ibm-cp4d-bastion.
b. Select an instance type, such as t3.medium.
c. Select the EC2 key pair created in the previous step.
d. Select the Red Hat Enterprise Linux 8 (HVM), SSD Volume Type for 64-bit (x86) Amazon Machine Image.
e. Create a security group with an inbound rule that allows SSH connections. Restrict access to your own IP address or an IP range from your organization.
f. Leave all other values as default.
Connect to the EC2 instance via SSH using its public IP address. The remaining installation steps will be initiated from it.

Install the required packages:

$ sudo yum update -y
$ sudo yum install git unzip vim wget httpd-tools python38 -y

$ sudo ln -s /usr/bin/python3 /usr/bin/python
$ sudo ln -s /usr/bin/pip3 /usr/bin/pip
$ sudo pip install pyyaml

$ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
$ unzip awscliv2.zip
$ sudo ./aws/install

$ wget "https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64"
$ chmod +x jq-linux64
$ sudo mv jq-linux64 /usr/local/bin/jq

$ wget "https://mirror.openshift.com/pub/openshift-v4/clients/ocp/4.10.15/openshift-client-linux-4.10.15.tar.gz"
$ tar -xvf openshift-client-linux-4.10.15.tar.gz
$ chmod u+x oc kubectl
$ sudo mv oc /usr/local/bin
$ sudo mv kubectl /usr/local/bin

$ sudo yum install -y yum-utils
$ sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
$ sudo yum -y install terraform

$ sudo subscription-manager repos --enable=rhel-7-server-extras-rpms
$ sudo yum install -y podman
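
Before continuing, it can be useful to confirm that the command-line tools from the prerequisites are on the `PATH`. The following is a minimal sketch using only the Python standard library. The binary names in `REQUIRED_TOOLS` are assumptions based on the packages listed in this post (for example, `htpasswd` ships with httpd-tools), and `missing_tools` is a hypothetical helper name.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed binary names for the packages installed above.
REQUIRED_TOOLS = [
    "aws", "oc", "kubectl", "terraform", "git", "podman",
    "jq", "wget", "unzip", "htpasswd", "python",
]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
print("missing:", ", ".join(missing) if missing else "none")
```

The `which` parameter exists only so the lookup can be replaced in tests; on the bastion host the default `shutil.which` is sufficient.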

Configure the AWS CLI with the IAM user credentials from Step 4 and the desired AWS region to install CP4D:

$ aws configure

AWS Access Key ID [None]: AK****************7Q
AWS Secret Access Key [None]: vb************************************Fb
Default region name [None]: eu-west-1
Default output format [None]: json

Clone the following IBM GitHub repository and change to its Terraform directory for ROSA:
```
$ cd ~
$ git clone https://github.com/IBM/cp4d-deployment.git
$ cd ~/cp4d-deployment/managed-openshift/aws/terraform/
```

For the purpose of this post, we enabled Watson Machine Learning, Watson Studio, and Db2 OLTP services on CP4D. Use the example in this step to create a Terraform variables file for the CP4D installation, named cp4d-rosa-3az-new-vpc.tfvars to match the commands that follow. Enable the CP4D services required for your use case:

region			= "eu-west-1"
tenancy			= "default"
access_key_id 		= "your_AWS_Access_key_id"
secret_access_key 	= "your_AWS_Secret_access_key"

new_or_existing_vpc_subnet	= "new"
az				= "multi_zone"
availability_zone1		= "eu-west-1a"
availability_zone2 		= "eu-west-1b"
availability_zone3 		= "eu-west-1c"

vpc_cidr 		= "10.0.0.0/16"
public_subnet_cidr1 	= "10.0.0.0/20"
public_subnet_cidr2 	= "10.0.16.0/20"
public_subnet_cidr3 	= "10.0.32.0/20"
private_subnet_cidr1 	= "10.0.128.0/20"
private_subnet_cidr2 	= "10.0.144.0/20"
private_subnet_cidr3 	= "10.0.160.0/20"

openshift_version 		= "4.10.15"
cluster_name 			= "your_ROSA_cluster_name"
rosa_token 			= "your_ROSA_token"
worker_machine_type 		= "m5.4xlarge"
worker_machine_count 		= 3
private_cluster 			= false
cluster_network_cidr 		= "10.128.0.0/14"
cluster_network_host_prefix 	= 23
service_network_cidr 		= "172.30.0.0/16"
storage_option 			= "efs-ebs" 
ocs 				= { "enable" : "false", "ocs_instance_type" : "m5.4xlarge" } 
efs 				= { "enable" : "true" }

accept_cpd_license 		= "accept"
cpd_external_registry 		= "cp.icr.io"
cpd_external_username 	= "cp"
cpd_api_key 			= "your_IBM_API_Key"
cpd_version 			= "4.5.0"
cpd_namespace 		= "zen"
cpd_platform 			= "yes"

watson_knowledge_catalog 	= "no"
data_virtualization 		= "no"
analytics_engine 		= "no"
watson_studio 			= "yes"
watson_machine_learning 	= "yes"
watson_ai_openscale 		= "no"
spss_modeler 			= "no"
cognos_dashboard_embedded 	= "no"
datastage 			= "no"
db2_warehouse 		= "no"
db2_oltp 			= "yes"
cognos_analytics 		= "no"
master_data_management 	= "no"
decision_optimization 		= "no"
bigsql 				= "no"
planning_analytics 		= "no"
db2_aaservice 			= "no"
watson_assistant 		= "no"
watson_discovery 		= "no"
openpages 			= "no"
data_management_console 	= "no"
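
If you change the network values in the variables file, a quick consistency check can catch mistakes before a multi-hour installation starts. The following is a minimal sketch, assuming the only rules to verify are that every subnet lies inside the VPC CIDR and that no two subnets overlap. The helper name `check_subnets` is hypothetical, and the CIDR values are copied from the example above.

```python
import ipaddress
from itertools import combinations
from typing import Iterable, List


def check_subnets(vpc_cidr: str, subnet_cidrs: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    # Every subnet must be contained in the VPC range.
    problems = [f"{s} is not inside {vpc}" for s in subnets if not s.subnet_of(vpc)]
    # No pair of subnets may share addresses.
    problems += [
        f"{a} overlaps {b}" for a, b in combinations(subnets, 2) if a.overlaps(b)
    ]
    return problems


subnets = [
    "10.0.0.0/20", "10.0.16.0/20", "10.0.32.0/20",      # public subnets
    "10.0.128.0/20", "10.0.144.0/20", "10.0.160.0/20",  # private subnets
]
problems = check_subnets("10.0.0.0/16", subnets)
print(problems or "subnet layout looks consistent")
```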

Save your file, and launch the commands below to install CP4D and track progress:

$ terraform init -input=false
$ terraform apply --var-file=cp4d-rosa-3az-new-vpc.tfvars \
   -input=false | tee terraform.log

The installation runs for 4 or more hours. Once installation is complete, the output includes (as in Figure 3):
a. Commands to get the CP4D URL and the admin user password
b. CP4D admin user
c. Login command for the ROSA cluster

Figure 3. CP4D installation output

Validation steps

Let’s verify the installation! Log in to your ROSA cluster with the login command from the installation output:

$ oc login https://api.cp4dblog.17e7.p1.openshiftapps.com:6443 --username cluster-admin --password *****-*****-*****-*****

Initiate the following command to get the cluster’s console URL (Figure 4):
```
$ oc whoami --show-console
```
Figure 4. ROSA console URL

Run the commands in this step to retrieve the CP4D URL and admin user password (Figure 5).
```
$ oc extract secret/admin-user-details \
  --keys=initial_admin_password --to=- -n zen
$ oc get routes -n zen
```
Figure 5. Retrieve the CP4D admin user password and URL

Run the following commands to list the CP4D workloads in your ROSA cluster (Figure 6):

$ oc get pods -n zen
$ oc get deployments -n zen
$ oc get svc -n zen 
$ oc get pods -n ibm-common-services 
$ oc get deployments -n ibm-common-services
$ oc get svc -n ibm-common-services
$ oc get subs -n ibm-common-services

Figure 6. Checking the CP4D pods running on ROSA
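
With many pods in each namespace, scanning the output by eye is error-prone. The following is a minimal sketch that filters the JSON form of the pod list (`oc get pods -n zen -o json`) down to pods that are neither completed nor fully ready. The helper names `not_ready_pods` and `fetch_pods` and the sample data are illustrative assumptions; only standard Kubernetes pod status fields are used.

```python
import json
import subprocess
from typing import Dict, List


def not_ready_pods(pod_list: Dict) -> List[str]:
    """Return names of pods that are not Succeeded and not fully ready."""
    names = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        phase = status.get("phase")
        if phase == "Succeeded":
            continue  # completed job pods are expected to stop
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready") for c in containers)
        if phase != "Running" or not all_ready:
            names.append(pod["metadata"]["name"])
    return names


def fetch_pods(namespace: str) -> Dict:
    """Call the oc CLI and parse its JSON output (requires an active oc login)."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Made-up sample data, so the filter can be tried without a cluster.
sample = {"items": [
    {"metadata": {"name": "pod-a"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": True}]}},
    {"metadata": {"name": "pod-b"},
     "status": {"phase": "Running", "containerStatuses": [{"ready": False}]}},
    {"metadata": {"name": "job-c"}, "status": {"phase": "Succeeded"}},
    {"metadata": {"name": "pod-d"}, "status": {"phase": "Pending"}},
]}
print(not_ready_pods(sample))
```

On the bastion host, `not_ready_pods(fetch_pods("zen"))` would apply the same filter to the live cluster.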

Log in to your CP4D web console using its URL and your admin password.
Expand the navigation menu. Navigate to Services > Services catalog for the available services (Figure 7).

Figure 7. Navigating to the CP4D services catalog

Notice that the services set as “enabled” correspond with your Terraform definitions (Figure 8).

Figure 8. Services enabled in your CP4D catalog
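
To compare the catalog with your variables file, you can list the flags set to "yes". The following is a minimal line-based sketch, not an HCL parser: it assumes each flag sits on its own line in the `name = "yes"` or `name = "no"` form used in the example file. The helper name `enabled_flags` and the abbreviated sample are illustrative.

```python
import re
from typing import List

# Matches lines such as: watson_studio = "yes"
FLAG = re.compile(r'^\s*(\w+)\s*=\s*"(yes|no)"\s*$')


def enabled_flags(tfvars_text: str) -> List[str]:
    """Return the names of all yes/no variables that are set to "yes"."""
    enabled = []
    for line in tfvars_text.splitlines():
        match = FLAG.match(line)
        if match and match.group(2) == "yes":
            enabled.append(match.group(1))
    return enabled


# Abbreviated sample in the same format as the variables file.
sample = '''
cpd_platform = "yes"
watson_studio = "yes"
watson_machine_learning = "yes"
data_virtualization = "no"
db2_oltp = "yes"
'''
print(enabled_flags(sample))
```

Note that `cpd_platform` is also reported, because it uses the same yes/no form as the service flags.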

Congratulations! You have successfully deployed IBM CP4D on Red Hat OpenShift on AWS.

Post installation

Refer to the IBM documentation on setting up services, if you need to enable additional services on CP4D.

When installing CP4D in production environments, review the IBM documentation on securing your environment and the Red Hat documentation on setting up identity providers for ROSA. You can also consider enabling auto scaling for your cluster.

Cleanup

Connect to your bastion host, and run the following steps to delete the CP4D installation, including ROSA. This step avoids incurring future charges on your AWS account.

$ cd ~/cp4d-deployment/managed-openshift/aws/terraform/
$ terraform destroy -var-file="cp4d-rosa-3az-new-vpc.tfvars"

If you’ve experienced any failures during the CP4D installation, run these next steps:

$ cd ~/cp4d-deployment/managed-openshift/aws/terraform
$ sudo cp installer-files/rosa /usr/local/bin
$ sudo chmod 755 /usr/local/bin/rosa
$ Cluster_Name=`rosa list clusters -o yaml | grep -w "name:" | cut -d ':' -f2 | xargs`
$ rosa remove cluster --cluster=${Cluster_Name}
$ rosa logs uninstall -c ${Cluster_Name} --watch
$ rosa init --delete-stack
$ terraform destroy -var-file="cp4d-rosa-3az-new-vpc.tfvars"

Conclusion

In summary, we explored how customers can take advantage of a fully managed OpenShift service on AWS to run IBM CP4D. With this implementation, customers can focus on what is important to them, their workloads, and their customers, and less on the day-to-day operations of managing OpenShift to run CP4D.

Check out the IBM Cloud Pak for Data Simplifies and Automates How You Turn Data into Insights blog to learn how to use CP4D on AWS to unlock the value of your data.

Additional resources

Architecture Patterns for Red Hat OpenShift on AWS

2020-09-22 Ryan Niksch

Post Syndicated from Ryan Niksch original https://aws.amazon.com/blogs/architecture/architecture-patterns-for-red-hat-openshift-on-aws/

Editor’s note: Although this blog post and its accompanying code make use of the word “Master,” Red Hat is making open source code more inclusive by eradicating “problematic language.” Read more about this.

Introduction

Red Hat OpenShift provides customers with a turnkey application platform that is much more than a simple Kubernetes orchestration.

OpenShift customers choose AWS as their cloud of choice because of the efficiency, security, reliability, scalability, and elasticity it provides. Customers seeking to modernize their business, process, and application stacks are drawn to the rich AWS service and feature sets.

As such, we see some customers migrate from on-premises to AWS or exist in a hybrid context with application workloads running in various locations. For OpenShift customers, this poses a few questions and considerations:

What are the recommendations for the best way to deploy OpenShift on AWS?
How is this different from what customers were used to on-premises?
How does this ensure resilience and availability?
Do customers need a multi-region, multi-account approach?

For hybrid customers, there are assumptions and misconceptions:

Where does the control plane exist?
Is there replication, and if so, what are the considerations and ramifications?

In this post I will run through some of the more common questions and patterns for OpenShift on AWS, while looking at some of the terminology and conceptual differences of AWS. I’ll explore migration and hybrid use cases and address some misconceptions.

OpenShift building blocks

On AWS, OpenShift 4.x is the norm. To that effect, I will focus on OpenShift 4, but many of the considerations will apply to both OpenShift 3 and OpenShift 4.

Let’s unpack some of the OpenShift building blocks. An OpenShift cluster consists of Master, infrastructure, and worker nodes. The Master nodes form the control plane, and infrastructure nodes cater to a routing layer and additional functions, such as logging and monitoring. Worker nodes are the nodes that customer application container workloads will run on.

When deployed on-premises, OpenShift nodes will be placed in separate network subnets. Depending on distance, latency, etc., a single OpenShift cluster may span two data centers that have some nodes in a subnet in one data center and other subnets in a different data center. This applies to customers with data centers within a few miles of each other with high-speed connectivity. An alternative would be an OpenShift cluster in each data center.

AWS concepts and terminology

At AWS, the concept of “region” is a geolocation, such as EMEA (Europe, Middle East, and Africa) or APAC (Asia Pacific), rather than a data center or specific building. An Availability Zone (AZ) is the closest construct on AWS that maps to a physical data center. Within each region you will find multiple (typically three or more) AZs. Note that a single AZ will contain multiple physical data centers, but we treat it as a single point of failure. For example, an event that impacts an AZ would be expected to impact all the data centers within that AZ. To this effect, customers should deploy workloads spanning multiple AZs to protect against any event that would impact a single AZ.

Deploying OpenShift

When deploying an OpenShift cluster on AWS, we recommend starting with three Master nodes spread across three AWS AZs and three worker nodes spread across three AZs. This allows for the combination of resilience and availability constructs provided by AWS as well as Red Hat OpenShift. The OpenShift installer provides a means of deploying the underlying AWS infrastructure in two ways: installer-provisioned infrastructure (IPI) and user-provisioned infrastructure (UPI). Both Red Hat and AWS collect customer feedback and use this to drive recommended patterns that are then included in the OpenShift installer. As such, the OpenShift installer IPI mode becomes a living reference architecture for deploying OpenShift on AWS.

The installer will require inputs for the environment on which it’s being deployed. In this case, since I am deploying on AWS, I will need to provide the AWS region, AZs, or subnets that relate to the AZs, as well as the EC2 instance type. The installer will then generate a set of ignition files that will be used during the deployment of OpenShift:

apiVersion: v1
baseDomain: example.com 
controlPlane: 
  hyperthreading: Enabled   
  name: master
  platform:
    aws:
      zones:
      - us-west-2a
      - us-west-2b
      - us-west-2c
      rootVolume:
        iops: 4000
        size: 500
        type: io1
      type: m5.xlarge 
  replicas: 3
compute: 
- hyperthreading: Enabled 
  name: worker
  platform:
    aws:
      rootVolume:
        iops: 2000
        size: 500
        type: io1 
      type: m5.xlarge
      zones:
      - us-west-2a
      - us-west-2b
      - us-west-2c
  replicas: 3
metadata:
  name: test-cluster 
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-west-2 
    userTags:
      adminContact: jdoe
      costCenter: 7536
pullSecret: '{"auths": ...}' 
fips: false 
sshKey: ssh-ed25519 AAAA...
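
The `clusterNetwork` values in this configuration determine how the pod address space is divided among nodes. The following is a minimal worked sketch of that arithmetic, assuming each node is allocated one subnet of size `hostPrefix` out of the cluster network CIDR. The function name `cluster_network_capacity` is hypothetical.

```python
import ipaddress
from typing import Tuple


def cluster_network_capacity(cidr: str, host_prefix: int) -> Tuple[int, int]:
    """Return (number of per-node subnets, addresses in each per-node subnet)."""
    network = ipaddress.ip_network(cidr)
    # Each node takes one /host_prefix block out of the cluster network.
    node_subnets = 2 ** (host_prefix - network.prefixlen)
    # Size of one per-node block, before any reserved addresses are subtracted.
    addresses_per_node = 2 ** (network.max_prefixlen - host_prefix)
    return node_subnets, addresses_per_node


node_subnets, addresses_per_node = cluster_network_capacity("10.128.0.0/14", 23)
print(node_subnets, addresses_per_node)
```

For the values above, a /14 network split into /23 blocks yields 512 per-node subnets of 512 addresses each; the number of usable pod addresses per node is slightly lower once reserved addresses are excluded.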

What does this look like at scale?

For larger implementations, we would see additional worker nodes spread across three or more AZs. As more worker nodes are added, use of the control plane increases. Initially scaling up the Amazon Elastic Compute Cloud (EC2) instance type to a larger instance type is an effective way of addressing this. It’s possible to add more Master nodes, and we recommend that an odd number of nodes are maintained. It is more common to see scaling out of the infrastructure nodes before there is a need to scale Masters. For large-scale implementations, infrastructure functions such as the router, monitoring, and logging functions can be moved to separate EC2 instances from the Master nodes, as well as from each other. It is important to spread the routing layer across multiple AZs, which is critical to maintaining availability and resilience.
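
The recommendation to keep an odd number of Master nodes can be illustrated with majority-quorum arithmetic. The following is a minimal sketch, assuming the control plane datastore (etcd, which this post does not discuss) stays available only while a strict majority of its members is healthy. The function names are illustrative.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


for members in (3, 4, 5):
    print(members, quorum(members), failures_tolerated(members))
```

Under this assumption, going from three to four members raises the quorum from two to three without tolerating any additional failure, whereas five members tolerate two failures.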

The process of resource separation is now controlled by infrastructure machine sets within OpenShift. An infrastructure machine set would need to be defined, then the infrastructure role edited to be moved from the default to this new infrastructure machine set. Read about this in greater detail.

OpenShift in a multi-account context

Using AWS accounts as a means of separation is a common well-architected pattern. AWS Organizations and AWS Control Tower are services that are commonly adopted as part of a multi-account strategy. This is very much the case when looking to enable teams to use their own accounts and when an account vending process is needed to cater for self-service account provisioning.

OpenShift clusters are deployed into multiple accounts. An OpenShift dev cluster is deployed into an AWS Dev account. This account would typically have AWS Developer Support associated with it. A separate production OpenShift cluster would be provisioned into an AWS production account with AWS Enterprise Support. Enterprise support provides for faster support case response times, and you get the benefit of dedicated resources such as a technical account manager and solutions architect.

CI/CD pipelines and processes are then used to control the application life cycle from code to dev to production. The pipelines would push the code to different OpenShift cluster endpoints at different stages of the life cycle.
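
As a minimal sketch of this idea, a pipeline can hold an ordered list of stages, each tied to its own AWS account and OpenShift API endpoint. Everything in this example is hypothetical: the account labels, the `example.com` endpoints, and the names `Stage`, `PIPELINE`, and `next_stage` are placeholders rather than part of any AWS or Red Hat tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    aws_account: str   # placeholder label for the AWS account
    api_endpoint: str  # placeholder OpenShift API endpoint


# Ordered from first to last stage of the application life cycle.
PIPELINE: List[Stage] = [
    Stage("dev", "dev-account", "https://api.dev.example.com:6443"),
    Stage("prod", "prod-account", "https://api.prod.example.com:6443"),
]


def next_stage(current: str) -> Optional[Stage]:
    """Return the stage that follows `current`, or None at the end."""
    names = [stage.name for stage in PIPELINE]
    index = names.index(current)
    return PIPELINE[index + 1] if index + 1 < len(PIPELINE) else None


print(next_stage("dev"))
```

Keeping the endpoint and account together in one record makes it explicit that promoting a build means targeting a different cluster in a different account.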

Hybrid use case implementation

A common misconception of hybrid implementations is that there is a single cluster or control plane that has worker nodes in various locations. For example, there could be a cluster where the Master and infrastructure nodes are deployed in one location, but also worker nodes registered with this cluster that exist on-premises as well as in the cloud.

Having a single customer control plane for a hybrid implementation, even if technically possible, introduces undesired risks.

There is the potential to take multiple environments with very different resilience characteristics and make them interdependent. This can result in performance and reliability issues, which may increase not only the likelihood of the risk manifesting but also its impact, or blast radius.

Instead, hybrid implementations will see separate OpenShift clusters deployed into various locations. A customer may deploy clusters on-premises to cater for a workload that can’t be migrated to the cloud in the short term. Separate OpenShift clusters can then be deployed into accounts in AWS for workloads on the cloud. Customers can also deploy separate OpenShift clusters in different AWS regions to cater for proximity to the consuming customer.

Though adding multiple clusters doesn’t add significant administrative overhead, there is a desire to be able to gain visibility and telemetry to all the deployed clusters from a central location. This may see the OpenShift clusters registered with Red Hat Advanced Cluster Management for Kubernetes.

Summary

Take advantage of the IPI model, not only as a guide but also to save time. Make AWS Organizations, AWS Control Tower, and AWS Service Catalog part of your cloud and hybrid strategies. These will not only speed up migrations but also form building blocks for a modernized business with a focus on enabling prescriptive self-service. Consider Red Hat Advanced Cluster Management for multi-cluster management.
