Tag Archives: AWS Service Catalog

Week in Review: Terraform in Service Catalog, AWS Supply Chain, Streaming Response in Lambda, and Amplify Library for Swift – April 10, 2023

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/week-in-review-terraform-in-service-catalog-aws-supply-chain-streaming-response-in-lambda-and-amplify-library-for-swift-april-10-2023/

The AWS Summit season has started. AWS Summits are free technical and business conferences happening in large cities across the planet. This week, we were happy to welcome our customers and partners in Sydney and Paris. In France, 9,973 customers and partners joined us for the day, not only to meet and exchange ideas but also to attend one of the more than 145 technical breakout sessions and the keynote. This is the largest cloud computing event in France, and I can’t resist sharing a picture from the main room during the opening keynote.

AWS Summit Paris keynote

There are AWS Summits on all continents; you can find the list and the registration links at https://aws.amazon.com/events/summits. The next ones on my agenda are listed at the end of this post.

These two Summits did not slow down our services teams. I counted 44 new capabilities since last Monday. Here are a few that caught my attention.

Last Week on AWS

AWS Lambda response streaming – Response streaming is a new invocation pattern that lets functions progressively stream response payloads back to clients. You can use Lambda response payload streaming to send response data to callers as it becomes available. Response streaming also allows you to build functions that return larger payloads and perform long-running operations while reporting incremental progress (within the 15-minute execution limit). My colleague Julian wrote an incredibly detailed blog post to help you get started.
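If you want to see the new pattern from the caller’s side, here is a minimal sketch using boto3, assuming a boto3 version recent enough to expose the new InvokeWithResponseStream API; the function name and payload are placeholders:

import boto3

# Invoke a streaming-enabled function and print chunks as they arrive.
lambda_client = boto3.client("lambda")
response = lambda_client.invoke_with_response_stream(
    FunctionName="my-streaming-function",   # placeholder function name
    Payload=b'{"query": "example"}',
)

for event in response["EventStream"]:
    if "PayloadChunk" in event:
        # Each chunk is delivered as soon as the function writes it.
        print(event["PayloadChunk"]["Payload"].decode("utf-8"), end="")
    elif "InvokeComplete" in event:
        break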

AWS Supply Chain Now Generally Available – AWS Supply Chain is a cloud application that mitigates risk and lowers costs with unified data and built-in contextual collaboration. It connects to your existing enterprise resource planning (ERP) and supply chain management systems to bring you ML-powered actionable insights into your supply chain.

AWS Service Catalog Supports Terraform Templates – With AWS Service Catalog, you can create, govern, and manage a catalog of infrastructure as code (IaC) templates that are approved for use on AWS. You can now define AWS Service Catalog products and their resources using either AWS CloudFormation or HashiCorp Terraform and choose the tool that better aligns with your processes and expertise.

Amazon S3 enforces two security best practices and brings new visibility into object replication status – As announced on December 13, 2022, Amazon S3 is now deploying two new default bucket security settings by automatically enabling S3 Block Public Access and disabling S3 access control lists (ACLs) for all new S3 buckets. Amazon S3 also adds a new Amazon CloudWatch metric that can be used to diagnose and correct S3 Replication configuration issues more quickly. The OperationFailedReplication metric, available in both the Amazon S3 console and in Amazon CloudWatch, gives you per-minute visibility into the number of objects that did not replicate to the destination bucket for each of your replication rules.
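As a quick way to confirm the new defaults on a given bucket, here is a minimal boto3 sketch; the bucket name is a placeholder:

import boto3

s3 = boto3.client("s3")
bucket = "example-bucket"  # placeholder bucket name

# All four Block Public Access settings should be True on newly created buckets.
pab = s3.get_public_access_block(Bucket=bucket)
print(pab["PublicAccessBlockConfiguration"])

# "BucketOwnerEnforced" means ACLs are disabled for the bucket.
ownership = s3.get_bucket_ownership_controls(Bucket=bucket)
print(ownership["OwnershipControls"]["Rules"][0]["ObjectOwnership"])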

AWS Security Hub launches four security best practices – AWS Security Hub has released four new controls for its National Institute of Standards and Technology (NIST) SP 800-53 Rev. 5 standard. These controls conduct fully automated security checks against Elastic Load Balancing (ELB), Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Redshift, and Amazon Simple Storage Service (Amazon S3). To use these controls, you should first turn on the NIST standard.
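To turn the standard on programmatically, a minimal boto3 sketch could look like the following; the name matching is an assumption, so verify the standard returned by describe_standards before enabling it:

import boto3

securityhub = boto3.client("securityhub")

# Find the NIST SP 800-53 standard among the standards available in this Region.
standards = securityhub.describe_standards()["Standards"]
nist = next(s for s in standards if "NIST" in s["Name"])

# Subscribing to the standard turns on its controls for this account and Region.
securityhub.batch_enable_standards(
    StandardsSubscriptionRequests=[{"StandardsArn": nist["StandardsArn"]}]
)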

AWS Cloud Operations Competency Partners – AWS Cloud Operations covers five fundamental solution areas: Cloud Governance, Cloud Financial Management, Monitoring and Observability, Compliance and Auditing, and Operations Management. The new competency enables customers to select validated AWS Partners who offer comprehensive solutions with an integrated approach across multiple areas.

Amplify Library for Swift on macOS – Amplify is an open-source, client-side library making it easier to access a cloud backend from your front-end application code. It provides language-specific constructs to abstract low-level details of the cloud API. It helps you to integrate services such as analytics, object storage, REST or GraphQL APIs, user authentication, geolocation and mapping, and push notifications. You can now write beautiful macOS applications that connect to the same cloud backend as their iOS counterparts.

X in Y – Jeff started this section a while ago to list the expansion of new services and capabilities to additional Regions. I noticed 11 Regional expansions this week:

Upcoming AWS Events
And to finish this post, I recommend you check your calendars and sign up for these AWS-led events:

.NET Enterprise Developer Day EMEA 2023 (April 25) – A free, one-day virtual conference providing enterprise developers with the most relevant information to swiftly and efficiently migrate and modernize their .NET applications and workloads on AWS.

AWS re:Inforce 2023 – Registration is now open for AWS re:Inforce 2023, taking place in Anaheim, California, on June 13–14. AWS Chief Information Security Officer CJ Moses will share the latest innovations in cloud security and what AWS Security is focused on. The breakout sessions will provide real-world examples of how security is embedded into the way businesses operate. To learn more and get the limited discount code to register, see CJ’s blog post, Gain insights and knowledge at AWS re:Inforce 2023, in the AWS Security Blog.

AWS Global Summits – Check your calendars and sign up for the AWS Summit close to where you live or work: Seoul (May 3–4), Berlin and Singapore (May 4), Stockholm (May 11), Hong Kong (May 23), Amsterdam (June 1), London (June 7), Madrid (June 15), and Milano (June 22).

AWS Community Day – Join community-led conferences driven by AWS user group leaders close to your city: Lima (April 15), Helsinki (April 20), Chicago (June 15), Manila (June 29–30), and Munich (September 14). Recently, we have been bringing together AWS user groups from around the world into Meetup Pro accounts. Find your group and its meetups in your city!

You can browse all upcoming AWS-led in-person and virtual events, and developer-focused events such as AWS DevDay.

Stay Informed
That was my selection for this week! To better keep up with all of this news, don’t forget to check out the following resources:

That’s all for this week. Check back next Monday for another Week in Review!

— seb

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

New – Self-Service Provisioning of Terraform Open-Source Configurations with AWS Service Catalog

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-self-service-provisioning-of-terraform-open-source-configurations-with-aws-service-catalog/

With AWS Service Catalog, you can create, govern, and manage a catalog of infrastructure as code (IaC) templates that are approved for use on AWS. These IaC templates can include everything from virtual machine images, servers, software, and databases to complete multi-tier application architectures. You can control which IaC templates and versions are available, what is configured by each version, and who can access each template based on individual, group, department, or cost center. End users such as engineers, database administrators, and data scientists can then quickly discover and self-service provision approved AWS resources that they need to use to perform their daily job functions.

When using Service Catalog, the first step is to create products based on your IaC templates. You can then collect products, together with configuration information, in a portfolio.

Starting today, you can define Service Catalog products and their resources using either AWS CloudFormation or HashiCorp Terraform and choose the tool that better aligns with your processes and expertise. You can now integrate your existing Terraform configurations into Service Catalog to make them part of a centrally approved portfolio of products and share it with the AWS accounts used by your end users. In this way, you can prevent inconsistencies and mitigate the risk of noncompliance.

When resources are deployed by Service Catalog, you can maintain least privilege access during provisioning and govern tagging on the deployed resources. End users of Service Catalog pick and choose what they need from the list of products and versions they have access to. Then, they can provision products in a single action regardless of the technology (CloudFormation or Terraform) used for the deployment.

The Service Catalog hub-and-spoke model that enables organizations to govern at scale can now be extended to include Terraform configurations. With the hub-and-spoke model, you can centrally manage deployments using a management/user account relationship:

  • One management account – Used to create Service Catalog products, organize them into portfolios, and share portfolios with user accounts
  • Multiple user accounts (up to thousands) – A user account is any AWS account in which the end users of Service Catalog are provisioning resources.

Let’s see how this works in practice.

Creating an AWS Service Catalog Product Using Terraform
To get started, I install the Terraform Reference Engine (provided by AWS on GitHub) that configures the code and infrastructure required for the Terraform open-source engine to work with AWS Service Catalog. I only need to do this once, in the management account for Service Catalog, and the setup takes just minutes. I use the automated installation script:

./deploy-tre.sh -r us-east-1

To keep things simple for this post, I create a product deploying a single EC2 instance using AWS Graviton processors and the Amazon Linux 2023 operating system. Here’s the content of my main.tf file:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.16"
    }
  }

  required_version = ">= 1.2.0"
}

provider "aws" {
  region  = "us-east-1"
}

resource "aws_instance" "app_server" {
  ami           = "ami-00c39f71452c08778"
  instance_type = "t4g.large"

  tags = {
    Name = "GravitonServerWithAmazonLinux2023"
  }
}

I sign in to the AWS Management Console in the management account for Service Catalog. In the Service Catalog console, I choose Product list in the Administration section of the navigation pane. There, I choose Create product.

In Product details, I select Terraform open source as Product type. I enter a product name and description and the name of the owner.

Console screenshot.

In the Version details, I choose to Upload a template file (using a tar.gz archive). Optionally, I can specify the template using an S3 URL or an external code repository (on GitHub, GitHub Enterprise Server, or Bitbucket) using an AWS CodeStar provider.
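Before uploading, the Terraform configuration has to be packaged as a tar.gz archive. Here is a minimal way to do that from Python (a plain tar command works just as well); the file and archive names are placeholders:

import tarfile

# Bundle the configuration file(s) into the archive expected by the product version.
with tarfile.open("graviton-server-product.tar.gz", "w:gz") as archive:
    archive.add("main.tf")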

Console screenshot.

I enter support details and custom tags. Note that tags can be used to categorize your resources and also to check permissions to create a resource. Then, I complete the creation of the product.

Adding an AWS Service Catalog Product Using Terraform to a Portfolio
Now that the Terraform product is ready, I add it to my portfolio. A portfolio can include both Terraform and CloudFormation products. I choose Portfolios from the Administration section of the navigation pane. There, I search for my portfolio by name and open it. I choose Add product to portfolio. I search for the Terraform product by name and select it.

Console screenshot.

Terraform products require a launch constraint. The launch constraint specifies the name of an AWS Identity and Access Management (IAM) role that is used to deploy the product. I need to separately ensure that this role is created in every account with which the product is shared.

The launch role is assumed by the Terraform open-source engine in the management account when an end user launches, updates, or terminates a product. The launch role also contains permissions to describe, create, and update a resource group for the provisioned product and tag the product resources. In this way, Service Catalog keeps the resource group up-to-date and tags the resources associated with the product.

The launch role enables least privilege access for end users. With this feature, end users don’t need permission to directly provision the product’s underlying resources because your Terraform open-source engine assumes the launch role to provision those resources, such as an approved configuration of an Amazon Elastic Compute Cloud (Amazon EC2) instance.

In the Launch constraint section, I choose Enter role name to use a role I created before for this product:

  • The trust relationship of the role defines the entities that can assume the role. For this role, the trust relationship includes Service Catalog and the management account that contains the Terraform Reference Engine.
  • For permissions, the role allows provisioning, updating, and terminating the resources required by my product, as well as managing resource groups and tags on those resources. (A minimal sketch of creating such a role with the AWS SDK follows this list.)
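Here is a minimal sketch of creating such a launch role with boto3; the account ID and role name are placeholders, and the permissions policy (not shown) must be scoped to the resources your product provisions:

import json
import boto3

iam = boto3.client("iam")

# Trust policy: Service Catalog plus the management account that runs the
# Terraform Reference Engine can assume the launch role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "servicecatalog.amazonaws.com",
                "AWS": "arn:aws:iam::111122223333:root",  # management account (placeholder)
            },
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="SCLaunchRole-GravitonServer",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Attach or put an inline policy allowing EC2 provisioning, resource group
# management, and tagging (attach_role_policy / put_role_policy, not shown).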

Console screenshot.

I complete the addition of the product to my portfolio. Now the product is available to the end users who have access to this portfolio.

Launching an AWS Service Catalog Product Using Terraform
End users see the list of products and versions they have access to and can deploy them in a single action. If you already use Service Catalog, the experience is the same as with CloudFormation products.

I sign in to the AWS Console in the user account for Service Catalog. The portfolio I used before has been shared by the management account with this user account. In the Service Catalog console, I choose Products from the Provisioning group in the navigation pane. I search for the product by name and choose Launch product.

Console screenshot.

I let Service Catalog generate a unique name for the provisioned product and select the product version to deploy. Then, I launch the product.

Console screenshot.

After a few minutes, the product has been deployed and is available. The deployment has been managed by the Terraform Reference Engine.

Console screenshot.

In the Associated tags tab, I see that Service Catalog automatically added information on the portfolio and the product.

Console screenshot.

In the Resources tab, I see the resources created by the provisioned product. As expected, it’s an EC2 instance, and I can follow the link to open the Amazon EC2 console and get more information.

Console screenshot.

End users such as engineers, database administrators, and data scientists can continue to use Service Catalog and launch the products they need without having to consider if they are provisioned using Terraform or CloudFormation.

Availability and Pricing
AWS Service Catalog support for Terraform open-source configurations is available today in all AWS Regions where AWS Service Catalog is offered. There is no change in pricing when using Terraform. With Service Catalog, you pay for the API calls you make to the service, and you can start for free with the free tier. You also pay for the resources used and created by the Terraform Reference Engine. For more information, see Service Catalog Pricing.

Enable self-service provisioning at scale for your Terraform open-source configurations.

Danilo

Maintain visibility over the use of cloud architecture patterns

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/maintain-visibility-over-the-use-of-cloud-architecture-patterns/

Cloud platform and enterprise architecture teams use architecture patterns to provide guidance for different use cases. Cloud architecture patterns are typically aggregates of multiple Amazon Web Services (AWS) resources, such as Elastic Load Balancing with Amazon Elastic Compute Cloud, or Amazon Relational Database Service with Amazon ElastiCache. In a large organization, cloud platform teams often have limited governance over cloud deployments, and, therefore, lack control or visibility over the actual cloud pattern adoption in their organization.

While having decentralized responsibility for cloud deployments is essential to scale, a lack of visibility or controls leads to inefficiencies, such as proliferation of infrastructure templates, misconfigurations, and insufficient feedback loops to inform the cloud platform roadmap.

To address this, we present an integrated approach that allows cloud platform engineers to share and track use of cloud architecture patterns with:

  1. AWS Service Catalog to publish an IT service catalog of codified cloud architecture patterns that are pre-approved for use in the organization.
  2. Amazon QuickSight to track and visualize actual use of service catalog products across the organization.

This solution enables cloud platform teams to maintain visibility into the adoption of cloud architecture patterns in their organization and build a release management process around them.

Publish architectural patterns in your IT service catalog

We use AWS Service Catalog to create portfolios of pre-approved cloud architecture patterns and expose them as self-service to end users. This is accomplished in a shared services AWS account where cloud platform engineers manage the lifecycle of portfolios and publish new products (Figure 1). Cloud platform engineers can publish new versions of products within a portfolio and deprecate older versions, without affecting already-launched resources in end-user AWS accounts. We recommend using organizational sharing to share portfolios with multiple AWS accounts.
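For example, organizational sharing can be set up once from the shared services account with a short boto3 sketch like the following; the portfolio ID and organizational unit ID are placeholders:

import boto3

servicecatalog = boto3.client("servicecatalog")

# One-time opt-in so AWS Service Catalog can share through AWS Organizations.
servicecatalog.enable_aws_organizations_access()

# Share the pattern portfolio with an entire organizational unit.
servicecatalog.create_portfolio_share(
    PortfolioId="port-examplepatterns",   # placeholder portfolio ID
    OrganizationNode={
        "Type": "ORGANIZATIONAL_UNIT",
        "Value": "ou-exam-ple12345",      # placeholder OU ID
    },
)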

Application engineers launch products by referencing the AWS Service Catalog API. Access can be via infrastructure code, like AWS CloudFormation and Terraform, or an IT service management tool, such as ServiceNow. We recommend using a multi-account setup for application deployments, with an application deployment account hosting the deployment toolchain: in our case, using AWS developer tools.

Although not explicitly depicted, the toolchain can be launched as an AWS Service Catalog product and include pre-populated infrastructure code to bootstrap initial product deployments, as described in the blog post Accelerate deployments on AWS with effective governance.

Launching cloud architecture patterns as AWS Service Catalog products

Figure 1. Launching cloud architecture patterns as AWS Service Catalog products

Track the adoption of cloud architecture patterns

Track the usage of AWS Service Catalog products by analyzing the corresponding AWS CloudTrail logs. These logs can be forwarded to an Amazon EventBridge rule with a filter on the following events: CreateProduct, UpdateProduct, DeleteProduct, ProvisionProduct, and TerminateProvisionedProduct.

The logs are generated no matter how you interact with the AWS Service Catalog API, such as through ServiceNow or Terraform. Once in EventBridge, Amazon Kinesis Data Firehose delivers the events to Amazon Simple Storage Service (Amazon S3), from where QuickSight can access them. Figure 2 depicts the end-to-end flow.
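The rule itself only needs an event pattern matching those API calls. Here is a minimal sketch with boto3; the rule name is a placeholder, and the put_targets call pointing at the Kinesis Data Firehose delivery stream is omitted:

import json
import boto3

events = boto3.client("events")

# Match the Service Catalog CloudTrail events listed above.
event_pattern = {
    "source": ["aws.servicecatalog"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["servicecatalog.amazonaws.com"],
        "eventName": [
            "CreateProduct",
            "UpdateProduct",
            "DeleteProduct",
            "ProvisionProduct",
            "TerminateProvisionedProduct",
        ],
    },
}

events.put_rule(
    Name="service-catalog-product-usage",  # placeholder rule name
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)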

Tracking adoption of AWS Service Catalog products with Amazon QuickSight

Figure 2. Tracking adoption of AWS Service Catalog products with Amazon QuickSight

Depending on your AWS landing zone setup, CloudTrail logs from all relevant AWS accounts and Regions need to be forwarded to a central S3 bucket in your shared services account or, otherwise, a centralized logging account. Figure 3 provides an overview of this cross-account log aggregation.

Aggregating AWS Service Catalog product logs across AWS accounts

Figure 3. Aggregating AWS Service Catalog product logs across AWS accounts

If your landing zone allows, consider giving permissions to EventBridge in all accounts to write to a central event bus in your shared services AWS account. This avoids having to set up Kinesis Data Firehose delivery streams in all participating AWS accounts and further simplifies the solution (Figure 4).

Aggregating AWS Service Catalog product logs across AWS accounts to a central event bus

Figure 4. Aggregating AWS Service Catalog product logs across AWS accounts to a central event bus

If you are already using an organization trail, you can use Amazon Athena or AWS Lambda to discover the relevant logs in your QuickSight dashboard, without the need to integrate with EventBridge and Kinesis Data Firehose.
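As a rough illustration of that alternative, the following boto3 sketch runs an Athena query against an existing CloudTrail table; the database, table, and output location are placeholders and assume the standard CloudTrail table definition for Athena:

import boto3

athena = boto3.client("athena")

# Count Service Catalog provisioning and termination events in the trail.
query = """
SELECT eventname, count(*) AS occurrences
FROM cloudtrail_logs
WHERE eventsource = 'servicecatalog.amazonaws.com'
  AND eventname IN ('ProvisionProduct', 'TerminateProvisionedProduct')
GROUP BY eventname
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "default"},                            # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},   # placeholder bucket
)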

Reporting on product adoption can be customized in QuickSight. The S3 bucket storing AWS Service Catalog logs can be defined in QuickSight as a dataset, for which you can create an analysis and publish it as a dashboard.

In the past, we have reported on the top ten products used in the organization (if relevant, also filtered by product version or time period) and the top accounts in terms of product usage. The following figure offers an example dashboard visualizing product usage by product type and number of times they were provisioned. Note: the counts of provisioned and terminated products differ slightly, as logging was activated after the first products were created and provisioned for demonstration purposes.

Example Amazon QuickSight dashboard tracking AWS Service Catalog product adoption

Figure 5. Example Amazon QuickSight dashboard tracking AWS Service Catalog product adoption

Conclusion

In this blog, we described an integrated approach to track adoption of cloud architecture patterns using AWS Service Catalog and QuickSight. The solution has a number of benefits, including:

  • Building an IT service catalog based on pre-approved architectural patterns
  • Maintaining visibility into the actual use of patterns, including which patterns and versions were deployed in the organizational units’ AWS accounts
  • Compliance with organizational standards, as architectural patterns are codified in the catalog

In our experience, the model may compromise on agility if you enforce a high level of standardization and only allow the use of a few patterns. However, there is the potential for proliferation of products, with many templates differing slightly, without central governance over the catalog. Ideally, cloud platform engineers assume responsibility for the roadmap of service catalog products, with formal intake mechanisms and feedback loops to account for builders’ localization requests.

Accelerate deployments on AWS with effective governance

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/accelerate-deployments-on-aws-with-effective-governance/

Amazon Web Services (AWS) users ask how to accelerate their teams’ deployments on AWS while maintaining compliance with security controls. In this blog post, we describe common governance models introduced in mature organizations to manage their teams’ AWS deployments. These models are best used to increase the maturity of your cloud infrastructure deployments.

Governance models for AWS deployments

We distinguish three common models used by mature cloud adopters to manage their infrastructure deployments on AWS. The models differ in what they control: the infrastructure code, deployment toolchain, or provisioned AWS resources. We define the models as follows:

  1. Central pattern library, which offers a repository of curated deployment templates that application teams can re-use with their deployments.
  2. Continuous Integration/Continuous Delivery (CI/CD) as a service, which offers a toolchain standard to be re-used by application teams.
  3. Centrally managed infrastructure, which allows application teams to deploy AWS resources managed by central operations teams.

The decision of how much responsibility you shift to application teams depends on their autonomy, operating model, application type, and rate of change. The three models can be used in tandem to address different use cases and maximize impact. Typically, organizations start by gathering pre-approved deployment templates in a central pattern library.

Model 1: Central pattern library

With this model, cloud platform engineers publish a central pattern library from which teams can reference infrastructure as code templates. Application teams reuse the templates by forking the central repository or by copying the templates into their own repository. Application teams can also manage their own AWS deployment account and pipeline (with AWS CodePipeline), as well as the resource-provisioning process, while reusing templates from the central pattern library with a service like AWS CodeCommit. Figure 1 provides an overview of this governance model.

Deployment governance with central pattern library

Figure 1. Deployment governance with central pattern library

The central pattern library represents the least intrusive form of enablement via reusable assets. Application teams appreciate the central pattern library model, as it allows them to maintain autonomy over their deployment process and toolchain. Reusing existing templates speeds up the creation of your teams’ first infrastructure templates and eases policy adherence, such as tagging policies and security controls.

After the reusable templates are in the application team’s repository, incremental updates can be pulled from the central library when the template has been enhanced. This allows teams to pull when they see fit. Changes to the team’s repository will trigger the pipeline to deploy the associated infrastructure code.

With the central pattern library model, application teams need to manage resource configuration and CI/CD toolchain on their own in order to gain the benefits of automated deployments. Model 2 addresses this.

Model 2: CI/CD as a service

In Model 2, application teams launch a governed deployment pipeline from AWS Service Catalog. This includes the infrastructure code needed to run the application and “hello world” source code to show the end-to-end deployment flow.

Cloud platform engineers develop the service catalog portfolio (in this case the CI/CD toolchain). Then, application teams can launch AWS Service Catalog products, which deploy an instance of the pipeline code and populated Git repository (Figure 2).

The pipeline is initiated immediately after the repository is populated, which results in the “hello world” application being deployed to the first environment. The infrastructure code (for example, Amazon Elastic Compute Cloud [Amazon EC2] and AWS Fargate) will be located in the application team’s repository. Incremental updates can be pulled by launching a product update from AWS Service Catalog. This allows application teams to pull when they see fit.

Deployment governance with CI/CD as a service

Figure 2. Deployment governance with CI/CD as a service

This governance model is particularly suitable for mature developer organizations with full-stack responsibility or platform projects, as it provides end-to-end deployment automation to provision resources across multiple teams and AWS accounts. This model also adds security controls over the deployment process.

Since there is little room for teams to adapt the toolchain standard, the model can be perceived as very opinionated. The model expects application teams to manage their own infrastructure. Model 3 addresses this.

Model 3: Centrally managed infrastructure

This model allows application teams to provision resources managed by a central operations team as self-service. Cloud platform engineers publish infrastructure portfolios to AWS Service Catalog with pre-approved configuration by central teams (Figure 3). These portfolios can be shared with all AWS accounts used by application engineers.

Provisioning AWS resources via AWS Service Catalog products ensures resource configuration fulfills central operations requirements. Compared with Model 2, the pre-populated infrastructure templates launch AWS Service Catalog products, as opposed to directly referencing the API of the corresponding AWS service (for example Amazon EC2). This locks down how infrastructure is configured and provisioned.
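To illustrate the difference, here is a minimal CDK (Python) sketch of an application stack that launches a pre-approved product rather than creating the instance directly; the product name, version, and parameter are placeholders defined by the central operations team:

from aws_cdk import Stack, aws_servicecatalog as servicecatalog
from constructs import Construct

class ManagedInstanceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Maps to AWS::ServiceCatalog::CloudFormationProvisionedProduct instead of
        # referencing the EC2 API directly.
        servicecatalog.CfnCloudFormationProvisionedProduct(
            self,
            "ApprovedEc2Instance",
            product_name="approved-ec2-instance",   # placeholder product name
            provisioning_artifact_name="v1.0",      # placeholder version
            provisioned_product_name="app-team-ec2",
            provisioning_parameters=[
                servicecatalog.CfnCloudFormationProvisionedProduct.ProvisioningParameterProperty(
                    key="InstanceType", value="t3.micro"  # placeholder parameter
                )
            ],
        )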

Deployment governance with centrally managed infrastructure

Figure 3. Deployment governance with centrally managed infrastructure

In our experience, it is essential to manage the variety of AWS Service Catalog products. This avoids proliferation of products with many templates differing slightly. Centrally managed infrastructure propagates an “on-premises” mindset so it should be used only in cases where application teams cannot own the full stack.

Models 2 and 3 can be combined for application engineers to launch both deployment toolchain and resources as AWS Service Catalog products (Figure 4), while also maintaining the opportunity to provision from pre-populated infrastructure templates in the team repository. After the code is in their repository, incremental updates can be pulled by running an update from the provisioned AWS Service Catalog product. This allows the application team to pull an update as needed while avoiding manual deployments of service catalog products.

Using AWS Service Catalog to automate CI/CD and infrastructure resource provisioning

Figure 4. Using AWS Service Catalog to automate CI/CD and infrastructure resource provisioning

Comparing models

The three governance models differ along the following aspects (see Table 1):

  • Governance level: What component is managed centrally by cloud platform engineers?
  • Role of application engineers: What is the responsibility split and operating model?
  • Use case: When is each model applicable?

Table 1. Governance models for managing infrastructure deployments

 

Model 1: Central pattern library
  • Governance level: Centrally defined infrastructure templates
  • Role of cloud platform engineers: Manage pattern library and policy checks
  • Role of application teams: Manage deployment toolchain and resource provisioning
  • Use case: Federated governance with application teams maintaining autonomy over application and infrastructure

Model 2: CI/CD as a service
  • Governance level: Centrally defined deployment toolchain
  • Role of cloud platform engineers: Manage deployment toolchain and stage checks
  • Role of application teams: Manage resource provisioning
  • Use case: Platform projects or development organizations with a strong preference for pre-defined deployment standards, including the toolchain

Model 3: Centrally managed infrastructure
  • Governance level: Centrally defined provisioning and management of AWS resources
  • Role of cloud platform engineers: Manage resource provisioning (including CI/CD)
  • Role of application teams: Manage application integration
  • Use case: Applications without development teams (e.g., “commercial-off-the-shelf”) or with separation of duty (e.g., infrastructure operations teams)

Conclusion

In this blog post, we distinguished three common governance models to manage the deployment of AWS resources. The three models can be used in tandem to address different use cases and maximize impact in your organization. The decision of how much responsibility is shifted to application teams depends on your organizational setup and use case.

Want to learn more?

Govern CI/CD best practices via AWS Service Catalog

Post Syndicated from César Prieto Ballester original https://aws.amazon.com/blogs/devops/govern-ci-cd-best-practices-via-aws-service-catalog/

Introduction

AWS Service Catalog enables organizations to create and manage catalogs of Information Technology (IT) services that are approved for use on AWS. These IT services can include resources such as virtual machine images, servers, software, and databases to complete multi-tier application architectures. AWS Service Catalog lets you centrally manage deployed IT services and your applications, resources, and metadata, which helps you achieve consistent governance and meet your compliance requirements. In addition, this configuration enables users to quickly deploy only approved IT services.

In large organizations, as more products are created, Service Catalog management can become exponentially complicated when different teams work on various products. The following solution simplifies Service Catalog product provisioning by managing elements such as shared accounts, the roles and users that can launch portfolios, and tags, applying best practices through continuous integration and continuous deployment (CI/CD) patterns.

This post demonstrates how Service Catalog products can be delivered by taking advantage of the main benefits of CI/CD principles, while reducing the complexity of keeping services in sync. In this scenario, we have built a CI/CD pipeline exclusively using AWS services and the AWS Cloud Development Kit (AWS CDK) framework to provision the necessary infrastructure.

Customers need the capability to consume services in a self-service manner, with services built on patterns that follow best practices, including focus areas such as compliance and security. The key tenets for these customers are the use of infrastructure as code (IaC) and CI/CD. For these reasons, we built the scalable and automated deployment solution covered in this post. Furthermore, this post is also inspired by another post from the AWS community, Building a Continuous Delivery Pipeline for AWS Service Catalog.

Solution Overview

The solution is built using a unified AWS CodeCommit repository with CDK v1 code, which manages and deploys the Service Catalog Product estate. The solution supports the following scenarios: 1) making Products available to accounts and 2) provisioning these Products directly into accounts. The configuration provides flexibility regarding which components must be deployed in accounts as opposed to making a collection of these components available to account owners/users who can in turn build upon and provision them via sharing.

Figure shows the pipeline created comprised of stages

The pipeline comprises the following stages:

  1. Retrieving the code from the repository
  2. Synthesize the CDK code to transform it into a CloudFormation template
  3. Ensure the pipeline is defined correctly
  4. Deploy and/or share the defined Portfolios and Products to a hub account or multiple accounts

Deploying and using the solution

Deploy the pipeline

We have created a Python AWS Cloud Development Kit (AWS CDK) v1 application hosted in a Git Repository. Deploying this application will create the required components described in this post. For a list of the deployment prerequisites, see the project README.

Clone the repository to your local machine. Then, bootstrap and deploy the CDK stack with the following steps.

git clone https://github.com/aws-samples/aws-cdk-service-catalog-pipeline
cd aws-cdk-service-catalog
pip install -r requirements.txt
cdk bootstrap aws://account_id/eu-west-1
cdk deploy

The infrastructure creation takes around 3–5 minutes to complete, deploying the AWS CodePipeline pipelines and creating the repository. Once CDK has deployed the components, you will have a new, empty repository where we will define the target Service Catalog estate. To do so, clone the new repository and push our sample code into it:

git clone https://git-codecommit.eu-west-1.amazonaws.com/v1/repos/service-catalog-repo
cd service-catalog-repo
git checkout -b main
cp -aR ../cdk-service-catalog-pipeline/* .
git add .
git commit -am "First commit"
git push origin main

Review and update configuration

Our cdk.json file is used to manage context settings such as shared accounts, permissions, and the Region to deploy to:

  • shared_accounts_ecs: AWS account IDs with which the ECS portfolio will be shared
  • shared_accounts_storage: AWS account IDs with which the Storage portfolio will be shared
  • roles: ARNs of the roles that will have permission to access the Portfolio
  • users: ARNs of the users who will have permission to access the Portfolio
  • groups: ARNs of the groups that will have permission to access the Portfolio
  • hub_account: AWS account ID where the Portfolio will be created
  • pipeline_account: AWS account ID where the main Infrastructure Pipeline will be created
  • region: the AWS Region to be used for the deployment

"shared_accounts_ecs": ["012345678901", "012345678902"],
"shared_accounts_storage": ["012345678901", "012345678902"],
"roles": [],
"users": [],
"groups": [],
"hub_account": "012345678901",
"pipeline_account": "012345678901",
"region": "eu-west-1"

There are two mechanisms that can be used to create Service Catalog Products in this solution: 1) providing a CloudFormation template or 2) declaring a CDK stack (that will be transformed as part of the pipeline). Our sample contains two Products, each demonstrating one of these options: an Amazon Elastic Container Service (Amazon ECS) deployment and an Amazon Simple Storage Service (Amazon S3) product.

These Products are automatically shared with the accounts specified in the shared_accounts_ecs and shared_accounts_storage variables. Each product is managed by a CDK Python file in the cdk_service_catalog folder.

Figure shows Pipeline stages that AWS CodePipeline runs through

The Pipeline stages that AWS CodePipeline runs through are as follows:

  1. Download the AWS CodeCommit code
  2. Synthesize the CDK code to transform it into a CloudFormation template
  3. Auto-modify the Pipeline in case you have made manual changes to it
  4. Deploy and share the defined Portfolios and Products to a hub account in a Region, or to multiple accounts

Adding new Portfolios and Products

To add a new Portfolio to the Pipeline, we recommend creating a new class under cdk_service_catalog similar to cdk_service_catalog_ecs_stack.py from our sample. Once the new class is created with the products you wish to associate, we instantiate the new class inside cdk_pipelines.py and then add it inside the wave in the stage. There are two ways to create portfolio products. The first is by providing a CloudFormation template, as can be seen in the Amazon Elastic Container Service (Amazon ECS) example. The second is by creating a CDK stack that will be transformed into a template, as can be seen in the Storage example.

Product and Portfolio definition:

# Imports assumed from the sample repository (CDK v1):
#   from aws_cdk import core as cdk, aws_ecs as ecs, aws_servicecatalog as servicecatalog
class ECSCluster(servicecatalog.ProductStack):
    def __init__(self, scope, id):
        super().__init__(scope, id)
        # CloudFormation parameters exposed by the Product template
        cluster_name = cdk.CfnParameter(self, "clusterName", type="String", description="The name of the ECS cluster")
        container_insights_enable = cdk.CfnParameter(self, "container_insights", type="String", default="False", allowed_values=["False", "True"], description="Enable Container Insights")
        vpc = cdk.CfnParameter(self, "vpc", type="AWS::EC2::VPC::Id", description="VPC")
        # The ECS cluster provisioned by the product
        ecs.Cluster(self, "ECSCluster_template", enable_fargate_capacity_providers=True, cluster_name=cluster_name.value_as_string, container_insights=bool(container_insights_enable.value_as_string), vpc=vpc)
        # Tag every resource created by the product
        cdk.Tags.of(self).add("key", "value")

Clean up

To clean up after completing your demo, delete the deployed stacks using the CDK CLI:

cdk destroy --all

Conclusion

In this post, we demonstrated how Service Catalog deployments can be accelerated by building a CI/CD pipeline using self-managed services. The Portfolio & Product estate is defined in its entirety by using Infrastructure-as-Code and automatically deployed based on your configuration. To learn more about AWS CDK Pipelines or AWS Service Catalog, visit the appropriate product documentation.

Authors:

 

César Prieto Ballester

César Prieto Ballester is a Senior DevOps Consultant at AWS. He enjoys automating everything and building infrastructure using code. Apart from work, he plays electric guitar and loves riding his mountain bike.

Daniel Mutale

Daniel Mutale is a Cloud Infrastructure Architect at AWS Professional Services. He enjoys creating cloud based architectures and building out the underlying infrastructure to support the architectures using code. Apart from work, he is an avid animal photographer and has a passion for interior design.

Raphael Sack

Raphael is a technical business development manager for Service Catalog & Control Tower. He enjoys tinkering with automation and code, and is an active member of the management tools community.

Deploy consistent DNS with AWS Service Catalog and AWS Control Tower customizations

Post Syndicated from Shiva Vaidyanathan original https://aws.amazon.com/blogs/architecture/deploy-consistent-dns-with-aws-service-catalog-and-aws-control-tower-customizations/

Many organizations need to connect their on-premises data centers, remote sites, and cloud resources. A hybrid connectivity approach connects these different environments. Customers with a hybrid connectivity network need additional infrastructure and configuration for private DNS resolution to work consistently across the network. It is a challenge to build this type of DNS infrastructure for a multi-account environment. However, there are several options available to address this problem with AWS. Automating DNS infrastructure using Route 53 Resolver endpoints covers how to use Resolver endpoints or private hosted zones to manage your DNS infrastructure.

This blog provides another perspective on how to manage DNS infrastructure with Customizations for Control Tower and AWS Service Catalog. Service Catalog Portfolios and products use AWS CloudFormation to abstract the complexity and provide standardized deployments. The solution enables you to quickly deploy DNS infrastructure that is compliant with standard practices and baseline configuration.

Control Tower Customizations with Service Catalog solution overview

The solution uses the Customizations for Control Tower framework and AWS Service Catalog to provision the DNS resources across a multi-account setup. The Service Catalog Portfolio created by the solution consists of three Amazon Route 53 products: an Outbound DNS product, an Inbound DNS product, and a Private DNS product. Sharing this portfolio with the organization makes the products available to both existing and future accounts in your organization. Users who are given access to AWS Service Catalog can choose to provision these three Route 53 products in a self-service or a programmatic manner.

  1. Outbound DNS product. This solution creates inbound and outbound Route 53 resolver endpoints in a Networking Hub account. Deploying the solution creates a set of Route 53 resolver rules in the same account. These resolver rules are then shared with the organization via AWS Resource Access Manager (RAM). Amazon VPCs in spoke accounts are then associated with the shared resolver rules by the Service Catalog Outbound DNS product.
  2. Inbound DNS product. A private hosted zone is created in the Networking Hub account to provide on-premises resolution of Amazon VPC IP addresses. A DNS forwarder for the cloud namespace is required to be configured by the customer for the on-premises DNS servers. This must point to the IP addresses of the Route 53 Inbound Resolver endpoints. Appropriate resource records (such as a CNAME record to a spoke account resource like an Elastic Load Balancer or a private hosted zone) are added. Once this has been done, the spoke accounts can launch the Inbound DNS Service Catalog product. This activates an AWS Lambda function in the hub account to authorize the spoke VPC to be associated to the Hub account private hosted zone. This should permit a client from on-premises to resolve the IP address of resources in your VPCs in AWS.
  3. Private DNS product. For private hosted zones in the spoke accounts, the corresponding Service Catalog product enables each spoke account to deploy a private hosted zone. The DNS name is a subdomain of the parent domain for your organization. For example, if the parent domain is cloud.example.com, one of the spoke account domains could be called spoke3.cloud.example.com. The product uses the local VPC ID (spoke account) and the Network Hub VPC ID. It also uses the Region for the Network Hub VPC that is associated to this private hosted zone. You provide the ARN of the Amazon SNS topic from the Networking Hub account. This creates an association of the Hub VPC to the newly created private hosted zone, which allows the spoke account to notify the Networking Hub account.

The notification from the spoke account is performed via a custom resource that is a part of the private hosted zone product. Processing of the notification in the Networking Hub account to create the VPC association is performed by a Lambda function in the Networking Hub account. We also record each authorization-association within Amazon DynamoDB tables in the Networking Hub account. One table is mapping the account ID with private hosted zone IDs and domain name, and the second table is mapping hosted zone IDs with VPC IDs.

The following diagram (Figure 1) shows the solution architecture:

Figure 1. A Service Catalog based DNS architecture setup with Route 53 Outbound DNS product, Inbound DNS product, and Route 53 Private DNS product

Figure 1. A Service Catalog based DNS architecture setup with Route 53 Outbound DNS product, Inbound DNS product, and Route 53 Private DNS product

Prerequisites

Deployment steps

The deployment of this solution has two phases:

  1. Deploy the Route 53 package to the existing Customizations for Control Tower (CfCT) solution in the management account.
  2. Setup user access, and provision Route 53 products using AWS Service Catalog in spoke accounts.

All the code used in this solution can be found in the GitHub repository.

Phase 1: Deploy the Route 53 package to the existing Customizations for Control Tower solution in the management account

Log in to the AWS Management Console of the management account. Select the Region where you want to deploy the landing zone. Deploy the Customizations for Control Tower (CfCT) Solution.

1. Clone your CfCT AWS CodeCommit repository:

2. Create a directory in the root of your CfCT CodeCommit repo called route53. Create a subdirectory called templates and copy the Route53-DNS-Service-Catalog-Hub-Account.yml and Route53-DNS-Service-Catalog-Spoke-Account.yml templates into the templates folder.

3. Edit the parameters present in the file Route53-DNS-Service-Catalog-Hub-Account.json with values appropriate to your environment.

4. Create an S3 bucket using the s3Bucket.yml template and the customizations.

5. Upload the three product template files (OutboundDNSProduct.yml, InboundDNSProduct.yml, PrivateDNSProduct.yml) to the S3 bucket created in step 4.

6. Under the same route53 directory, create another subdirectory called parameters. Place the updated parameter JSON file from the previous step in this folder.

7. Edit the manifest.yaml file in the root of your CfCT CodeCommit repository to include the Route 53 resource; manifest.yml is provided as a reference. Update the Region values in this example to the Region of your Control Tower. Also update the deployment target account name to the equivalent Networking Hub account within your AWS Organization.

8. Create and push a commit for the changes made to the CfCT solution to your CodeCommit repository.

9. Finally, navigate to AWS CodePipeline in the AWS Management Console to monitor the progress. Validate that the deployment of resources via CloudFormation StackSets to the target Networking Hub account is complete.

Phase 2: Setup user access, and provision Route 53 products using AWS Service Catalog in spoke accounts

In this section, we walk through how users can vend products from the shared AWS Service Catalog Portfolio using a self-service model. The following steps walk you through setting up user access and provisioning products:

1. Sign in to AWS Management Console of the spoke account in which you want to deploy the Route 53 product.

2. Navigate to the AWS Service Catalog service, and choose Portfolios.

3. On the Imported tab, choose your portfolio as shown in Figure 2.

Figure 2. Imported DNS portfolio (spoke account)

Figure 2. Imported DNS portfolio (spoke account)

4. Choose the Groups, roles, and users pane and add the IAM role, user, or group that you want to use to launch the product.

5. In the left navigation pane, choose Products as shown in Figure 3.

6. On the Products page, choose either of the three products, and then choose Launch Product.

Figure 3. DNS portfolio products (Inbound DNS, Outbound DNS, and Private DNS products)

Figure 3. DNS portfolio products (Inbound DNS, Outbound DNS, and Private DNS products)

7. On the Launch Product page, enter a name for your provisioned product, and provide the product parameters:

  • Outbound DNS product:
    • ChildDomainNameResolverRuleId: Rule ID for the Shared Route 53 Resolver rule for child domains.
    • OnPremDomainResolverRuleID: Rule ID for the Shared Route 53 Resolver rule for on-premises DNS domain.
    • LocalVPCID: Enter the VPC ID, which the Route 53 Resolver rules are to be associated with (for example: vpc-12345).
  • Inbound DNS product:
    • NetworkingHubPrivateHostedZoneDomain: Domain of the private hosted zone in the hub account.
    • LocalVPCID: Enter the ID of the VPC from the account and Region where you are provisioning this product (for example: vpc-12345).
    • SNSAuthorizationTopicArn: Enter ARN of the SNS topic belonging to the Networking Hub account.
  • Private DNS product:
    • DomainName: the FQDN for the private hosted zone (for example: account1.parent.internal.com).
    • LocalVPCId: Enter the ID of the VPC from the account and Region where you are provisioning this product.
    • AdditionalVPCIds: Enter the ID of the VPC from the Network Hub account that you want to associate to your private hosted zone.
    • AdditionalAccountIds: Provide the account IDs of the VPCs mentioned in AdditionalVPCIds.
    • NetworkingHubAccountId: Account ID of the Networking Hub account.
    • SNSAssociationTopicArn: Enter ARN of the SNS topic belonging to the Networking Hub account.

8. Select Next and Launch Product.
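If you prefer the programmatic path mentioned in the solution overview, the same launch can be scripted with boto3; the product name, version, and parameter values below (shown for the Private DNS product) are placeholders for your environment:

import boto3

servicecatalog = boto3.client("servicecatalog")

servicecatalog.provision_product(
    ProductName="Private DNS product",          # placeholder product name
    ProvisioningArtifactName="v1",              # placeholder version name
    ProvisionedProductName="spoke3-private-dns",
    ProvisioningParameters=[
        {"Key": "DomainName", "Value": "spoke3.cloud.example.com"},
        {"Key": "LocalVPCId", "Value": "vpc-12345"},
        {"Key": "AdditionalVPCIds", "Value": "vpc-67890"},
        {"Key": "AdditionalAccountIds", "Value": "012345678901"},
        {"Key": "NetworkingHubAccountId", "Value": "012345678901"},
        {"Key": "SNSAssociationTopicArn", "Value": "arn:aws:sns:us-east-1:012345678901:hub-association-topic"},
    ],
)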

Validation of Control Tower Customizations with Service Catalog solution

For the Outbound DNS product:

  • Validate the successful DNS infrastructure provisioning. To do this, navigate to Route 53 service in the AWS Management Console. Under the Rules section, select the rule you provided when provisioning the product.
  • Under that Rule, confirm that spoke VPC is associated to this rule.
  • For further validation, launch an Amazon EC2 instance in one of the spoke accounts.  Resolve the DNS name of a record present in the on-premises DNS domain using the dig utility.

For the Inbound DNS product:

  • In the Networking Hub account, navigate to the Route 53 service in the AWS Management Console. Select the private hosted zone created here for inbound access from on-premises. Verify the presence of resource records and the VPCs to ensure spoke account VPCs are associated.
  • For further validation, from a client on-premises, resolve the DNS name of one of your AWS specific domains, using the dig utility, for example.

For the Route 53 private hosted zone (Private DNS) product:

  • Navigate to the hosted zone in the Route 53 AWS Management Console.
  • Expand the details of this hosted zone. You should see the VPCs (VPC IDs that were provided as inputs) associated during product provisioning.
  • For further validation, create a DNS A record in the Route 53 private hosted zone of one of the spoke accounts.
  • Spin up an EC2 instance in the VPC of another spoke account.
  • Resolve the DNS name of the record created in the previous step using the dig utility.
  • Additionally, the details of each VPC and private hosted zone association are maintained within DynamoDB tables in the Networking Hub account.

Cleanup steps

All the resources deployed through CloudFormation templates should be deleted after successful testing and validation to avoid any unwanted costs.

  • Remove the references to the Route 53 resources from the manifest.yaml file and delete the route53 folder in the CfCT repo. Then commit and push the changes to prevent future redeployment.
  • Go to the CloudFormation console, identify the stacks appropriately, and delete them.
  • In spoke accounts, you can shut down the provisioned AWS Service Catalog product(s), which would terminate the corresponding CloudFormation stacks on your behalf.

Note: In a multi-account setup, you must navigate through account boundaries and follow the previous steps where products were deployed.

Conclusion

In this post, we showed you how to create a portfolio using AWS Service Catalog. It contains a Route 53 Outbound DNS product, an Inbound DNS product, and a Private DNS product. We described how you can share this portfolio with your AWS Organization. Using this solution, you can provision Route 53 infrastructure in a programmatic, repeatable manner to standardize your DNS infrastructure.

We hope that you’ve found this post informative and we look forward to hearing how you use this feature!

Automate building data lakes using AWS Service Catalog

Post Syndicated from Mamata Vaidya original https://aws.amazon.com/blogs/big-data/automate-building-data-lakes-using-aws-service-catalog/

Today, organizations spend a considerable amount of time understanding business processes, profiling data, and analyzing data from a variety of sources. The result is highly structured and organized data used primarily for reporting purposes. These traditional systems extract data from transactional systems that consist of metrics and attributes that describe different aspects of the business. Non-traditional data sources such as web server logs, sensor data, clickstream data, social network activity, text, and images drive new and interesting use cases like intrusion detection, predictive maintenance, ad placement, and numerous optimizations across a wide range of industries. However, storing the varied datasets can become expensive and difficult as the volume of data increases.

The data lake approach embraces these non-traditional data types, wherein all the data is kept in its raw form and only transformed when needed. A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. Data lakes can collect streaming audio, video, call logs, and sentiment and social media data to provide more complete, robust insights. This has a considerable impact on the ability to perform AI, machine learning (ML), and data science.

Before building a data lake, organizations need to complete the following prerequisites:

  • Understand the foundational building blocks of data lake
  • Understand the services involved in building a data lake
  • Define the personas needed to manage the data lake
  • Create the security policies required for the different services to work in harmony when moving the data to create the data lake

To make building a data lake easier, this post presents a solution to manage and deploy your data lake as an AWS Service Catalog product. This enables you to create a data lake for your entire organization or individual lines of business, or simply to get started with analytics and ML use cases.

Solution overview

This post provides a simple way to deploy a data lake as an AWS Service Catalog product. AWS Service Catalog allows you to centrally manage and deploy IT services and applications in a self-service manner through a common, customizable product catalog. We create automated pipelines to move data from an operational database into an Amazon Simple Storage Service (Amazon S3) based data lake as well as define ways to move unstructured data from disparate data sources into the data lake. We also define fine-grained permissions in the data lake to enable query engines like Amazon Athena to securely analyze data.
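For instance, the fine-grained permissions can be granted with Lake Formation through a short boto3 sketch like the following; the role ARN, database, and table names are placeholders:

import boto3

lakeformation = boto3.client("lakeformation")

# Allow an analyst role to run SELECT queries (for example, from Athena)
# against one table in the data lake.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::012345678901:role/data-analyst"  # placeholder role
    },
    Resource={
        "Table": {"DatabaseName": "census_db", "Name": "us_manufacturers"}  # placeholder names
    },
    Permissions=["SELECT"],
)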

The following are some advantages of having your data lake as an AWS Service Catalog product:

  • Enforce compliance with corporate standards so you can control which IT services and versions are available and who gets access by individual, group, department, or cost center.
  • Enforce governance by helping employees quickly find and deploy only approved IT services without giving direct access to the underlying services.
  • End users, like developers, data scientists, or business users, have quick and easy access to a custom, curated list of products that can be deployed consistently, are always in compliance, and are always secure through self-service, which accelerates business growth.
  • Enforce constraints such as limiting the AWS Region in which the data lake can be launched.
  • Enforce tagging based on department or cost center to keep track of the data lake built for different departments.
  • Centrally manage the IT service lifecycle by centrally adding new versions to the data lake product.
  • Improve operational efficiency by integrating with third-party products and ITSM tools such as ServiceNow and Jira.
  • Build a data lake based on a reusable foundation provided by a central IT organization.

The following diagram illustrates how a data lake can be bundled as a product inside an AWS Service Catalog portfolio along with other products:

Solution architecture

The following diagram illustrates the architecture for this solution:

We use the following services in this solution:

  • Amazon S3 – Amazon S3 is an object storage service that offers industry-leading scalability, data availability, security, and performance. For this use case, you use Amazon S3 as storage for the data lake.
  • AWS Lake Formation – Lake Formation makes it simple to set up a secure data lake—a centralized, curated, and secured repository that stores all your data—both in its original form and prepared for analysis. The data lake admin can easily label the data and give users granular permissions to access authorized datasets.
  • AWS Glue – AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, ML, and application development.
  • Amazon Athena – Athena is an interactive query service that makes it simple to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries you run and the amount of data being scanned.

Datasets

To illustrate how data is managed in the data lake, we use sample datasets that are publicly available. The first dataset is United States manufacturers census data that we download in a structured format into a relational database. In addition, we can load United States school census data in its raw format into the data lake.

Walkthrough overview

AWS Service Catalog allows organizations to create and manage catalogs of IT services that are approved for use on AWS. It allows you to centrally manage deployed IT services and your applications, resources, and metadata. Following the same concept, we deploy a data lake as a collection of AWS services and resources as an AWS Service Catalog product. This helps you achieve consistent governance and meet your compliance requirements, while enabling users to quickly deploy only the approved services.

Follow the steps in the next sections to deploy a data lake as an AWS Service Catalog product. For this post, we load United States public census data into an Amazon Relational Database Service (Amazon RDS) for MySQL instance to demonstrate ingestion of data into the data lake from a relational database. We use an AWS CloudFormation template to create S3 buckets to load the script for creating the data lake as an AWS Service Catalog product as well as scripts for data transformation.

Deploy the CloudFormation template

Be sure to deploy your resources in the US East (N. Virginia) Region (us-east-1). We use the provided CloudFormation template to create all the necessary resources. Using the template reduces manual errors and provides consistent configurations over time. A scripted alternative is shown after the steps below.

  1. Choose Launch Stack:
  2. On the Create stack page, Amazon S3 URL should show as https://aws-bigdata-blog.s3.amazonaws.com/artifacts/datalake-service-catalog/datalake_portfolio.yaml.
  3. Choose Next.
  4. Enter datalake-portfolio for the stack name.
  5. For Portfolio name, enter a name for the AWS Service Catalog portfolio that holds the data lake product.
  6. Choose Next.
  7. Choose Create stack and wait for the stack to create the resources in your AWS account.
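If you prefer to script this step, the same stack can be created with boto3. This is a minimal sketch; the parameter key PortfolioName is an assumption based on the console field shown above, so verify it against the template before use.

# Hedged sketch: launch the datalake-portfolio stack with boto3 instead of the console.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

response = cfn.create_stack(
    StackName="datalake-portfolio",
    TemplateURL="https://aws-bigdata-blog.s3.amazonaws.com/artifacts/datalake-service-catalog/datalake_portfolio.yaml",
    Parameters=[
        # Assumption: the template exposes a portfolio-name parameter under this key.
        {"ParameterKey": "PortfolioName", "ParameterValue": "datalake-portfolio-demo"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # harmless if the template creates no IAM resources
)
print(response["StackId"])

# Wait for the stack to finish before moving on to the next section.
cfn.get_waiter("stack_create_complete").wait(StackName="datalake-portfolio")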

On the stack’s Resources tab, you can find the following:

  • DataLakePortfolio – The AWS Service Catalog portfolio
  • ProdAsDataLake – The data lake as a product
  • ProductCFTDataLake – The CloudFormation template as a product

If you choose the arrow next to the DataLakePortfolio resource, you’re redirected to the AWS Service Catalog portfolio, with datalake listed as a product.

Grant permissions to launch the AWS Service Catalog product

We need to provide appropriate permissions for the current user to launch the datalake product we just created.

  1. On the portfolio page on the AWS Service Catalog console, choose the Groups, roles, and users tab.
  2. Choose Add groups, roles, users.
  3. Select the group, role, or user you want to grant permissions to launch the product.
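If you are scripting this setup, the same access can be granted with boto3. A minimal sketch, assuming a portfolio ID taken from the stack output and an existing IAM role ARN (both placeholders):

# Hedged sketch: grant an IAM role permission to launch products in the portfolio.
import boto3

sc = boto3.client("servicecatalog", region_name="us-east-1")

sc.associate_principal_with_portfolio(
    PortfolioId="port-xxxxxxxxxxxx",                                  # placeholder: DataLakePortfolio ID
    PrincipalARN="arn:aws:iam::111122223333:role/DataLakeLauncher",   # placeholder role ARN
    PrincipalType="IAM",
)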

Another approach is to enhance the capability of the data lake by building a multi-tenant data lake. A multi-tenant data lake enables hosting data from multiple business units in the same data lake and maintaining data isolation through roles with different permission sets. To build a multi-tenant data lake, you can add a wide range of stakeholders (developers, analysts, data scientists) from different organizational units. By defining appropriate roles, multi-tenancy helps achieve data sharing and collaboration between different teams and integrate multiple data silos to get a unified view of the data. You can add these appropriate roles on the portfolio page.

In the following example screenshot, data analysts from HR and Marketing have access to their own datasets, the business analyst has access to both datasets to get a unified view of the data to derive meaningful insights, and the admin user manages the operations of the central data lake.

In addition, when the data lake is delivered through AWS Service Catalog (as opposed to launching the CloudFormation template independently), you can enforce constraints on it from the AWS Service Catalog console. This allows the central IT team to enable governance control when a department chooses to build a data lake for their business users.

  1. To enable constraints, choose the Constraints tab on the portfolio page.

For example, a template constraint allows you to limit the options that are available to end-users when they launch the product. The following screenshot shows an example of configuring a template constraint.

The VPC CIDR range is restricted to a certain range when launching the data lake.

We can now see the template constraint listed on the Constraints tab.

If the constraint is violated and a different CIDR range is entered while launching the product, the template throws an error, as shown in the following screenshot.
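A template constraint is expressed as CloudFormation rules. The following boto3 sketch shows how such a CIDR restriction could be created programmatically; the parameter name VPCCIDR, the allowed value, and the IDs are assumptions for illustration, not the exact rule used in the screenshots.

# Hedged sketch: create a template constraint that restricts the VPC CIDR parameter.
import boto3
import json

sc = boto3.client("servicecatalog", region_name="us-east-1")

rules = {
    "Rules": {
        "RestrictVpcCidr": {
            "Assertions": [
                {
                    # Assumption: the data lake product exposes a parameter named VPCCIDR.
                    "Assert": {"Fn::Contains": [["10.0.0.0/16"], {"Ref": "VPCCIDR"}]},
                    "AssertDescription": "VPC CIDR must be 10.0.0.0/16",
                }
            ]
        }
    }
}

sc.create_constraint(
    PortfolioId="port-xxxxxxxxxxxx",   # placeholder portfolio ID
    ProductId="prod-xxxxxxxxxxxx",     # placeholder datalake product ID
    Type="TEMPLATE",
    Parameters=json.dumps(rules),
    Description="Restrict the VPC CIDR range for the data lake",
)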

In addition, to track costs per department or team, the central IT team can define tags in the TagOptions library and require the operations team to select, from a list of values, the business unit for which the data lake is being created, so that costs can eventually be tracked per department or business unit.

  1. Choose the Tags tab to manage tags.
  2. After setting your organization’s standards for roles, constraints, and tags, the central IT team can share the AWS Service Catalog datalake portfolio with accounts or organizations via AWS Organizations.

AWS Service Catalog administrators from another AWS account can then distribute the data lake product to their end-users.

You can view the accounts with access to the portfolio on the Share tab.
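For teams that automate this administration, the TagOption and the organizational share described above can also be created with boto3. A hedged sketch; the portfolio ID, tag values, and OU ID are placeholders:

# Hedged sketch: add a cost-tracking TagOption and share the portfolio with an OU.
import boto3

sc = boto3.client("servicecatalog", region_name="us-east-1")

tag_option = sc.create_tag_option(Key="BusinessUnit", Value="Marketing")   # placeholder tag values
sc.associate_tag_option_with_resource(
    ResourceId="port-xxxxxxxxxxxx",                                        # placeholder portfolio ID
    TagOptionId=tag_option["TagOptionDetail"]["Id"],
)

# Share the portfolio with an organizational unit (AWS Organizations sharing must be enabled).
sc.create_portfolio_share(
    PortfolioId="port-xxxxxxxxxxxx",
    OrganizationNode={"Type": "ORGANIZATIONAL_UNIT", "Value": "ou-abcd-11111111"},  # placeholder OU ID
)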

Launch the data lake

To launch the data lake, complete the following steps:

  1. Sign in as the user or role that you granted permissions to launch the data lake. If you have never used AWS Lake Formation and have not defined an initial administrator, go to the Lake Formation console and add one.
  2. On the AWS Service Catalog console, select the datalake product and choose Launch product.
  3. Select Generate name to automatically enter a name for the provisioned product.
  4. Select your product version (for this post, v1.0 is selected by default).
  5. Enter DB username and password.
  6. Verify the stack name of the previously launched CloudFormation template, datalake-portfolio.
  7. Choose Launch product.

The datalake product triggers the CloudFormation template in the background, creates all the resources, and launches the data lake in your account.
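The launch can also be driven programmatically. The sketch below assumes the product name datalake and version v1.0 shown above; the provisioning parameter keys for the database username and password are assumptions and must match the product's actual parameters.

# Hedged sketch: launch the datalake product with boto3 instead of the console.
import boto3

sc = boto3.client("servicecatalog", region_name="us-east-1")

record = sc.provision_product(
    ProductName="datalake",
    ProvisioningArtifactName="v1.0",
    ProvisionedProductName="datalake-analytics-team",   # any unique provisioned product name
    ProvisioningParameters=[
        # Assumption: parameter keys exposed by the product for the RDS credentials.
        {"Key": "DBUsername", "Value": "admin"},
        {"Key": "DBPassword", "Value": "ChangeMe123!"},  # placeholder only; use a secret in practice
    ],
)
print(record["RecordDetail"]["ProvisionedProductId"])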

  1. On the AWS Service Catalog console, choose Provisioned products in the navigation pane.
  2. Choose the output value with the link to the CloudFormation stack that created the data lake for your account.
  3. On the Resources tab, review the details of the resources created.

The following resources are created in this step as part of launching the AWS Service Catalog product:

  • Data ingestion:
    • A VPC with subnets and security groups for hosting the RDS for MySQL database with sample data.
    • An RDS for MySQL database as a sample source to load data into the data lake. Verify the VPC CIDR range that hosts the data lake, as well as the subnet CIDR ranges for the database.
    • The default RDS for MySQL database. You can change the password as needed on the Amazon RDS console.
    • An AWS Glue JDBC connection to connect to the RDS for MySQL database with the sample data loaded.
    • An AWS Glue crawler for data ingestion into the data lake.
  • Data transformation:
    • AWS Glue jobs for data transformation.
    • An AWS Glue Data Catalog database to hold the metadata information.
    • AWS Identity and Access Management (IAM) roles for the AWS Glue and AWS Lambda workflows to read data from the RDS for MySQL database and load data into the data lake.
  • Data visualization:
    • IAM data lake administrator and data lake analyst roles for managing and accessing data in the data lake through Lake Formation.
    • Two Athena named queries.
    • Two users:
      • datalake_admin – Responsible for day-to-day operations, management, and governance of the data lake.
      • datalake_analyst – Has permissions to only view and analyze the data using different visualization tools.

Data ingestion, transformation, and visualization

After the CloudFormation stack is ready, we complete the following steps to ingest, transform, and visualize the data.

Ingest the data

We run an AWS Glue crawler to load data into the data lake. Optionally, you can verify that the data is available in the data source by following the steps in the appendix of this post. To run the crawler, complete the following steps:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.

The Crawlers page shows four crawlers created as part of the data lake product deployment.

  1. Select the crawler GlueRDSCrawler-xxxx.
  2. Choose Run crawler.

A table is added to the AWS Glue database gluedatabasemysql-blogdb.
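The crawler run can also be started and monitored with boto3; a small sketch, with the crawler name as a placeholder for the generated GlueRDSCrawler-xxxx name:

# Hedged sketch: run the RDS crawler and wait for it to finish.
import time
import boto3

glue = boto3.client("glue", region_name="us-east-1")
crawler_name = "GlueRDSCrawler-xxxx"   # placeholder: use the generated crawler name

glue.start_crawler(Name=crawler_name)
while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
    time.sleep(30)                     # poll until the crawler returns to READY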

The raw data is now ready to run any kind of transformations that are needed. In this example, we transform the raw data into Parquet format.

Transform the data

AWS Glue provides a console and API operations to set up and manage your extract, transform, and load (ETL) workload. A job is the business logic that performs the ETL work in AWS Glue. When you start a job, AWS Glue runs a script that extracts data from sources, transforms the data, and loads it into targets. In this case, our source is the raw S3 bucket and the target is the curated S3 bucket to store the transformed data in Parquet format after the AWS Glue job runs.
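The job created by the product already contains the transformation script. For orientation, the following is a minimal sketch of what a catalog-to-Parquet Glue job can look like; the table name and output path are placeholders, and this is not the script shipped with the product.

# Hedged sketch of a Glue ETL job: read a cataloged table and write Parquet to the curated bucket.
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the table that the RDS crawler added to the Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="gluedatabasemysql-blogdb",       # database created by the product
    table_name="census_manufacturers",         # placeholder: table created by the crawler
)

# Write the data to the curated bucket in Parquet format.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://processed-bucket-placeholder/curated/"},  # ProcessedBucketS3 value
    format="parquet",
)
job.commit()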

To transform the data, complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.

The Jobs page lists the AWS Glue job created as part of the data lake product deployment.

  1. Select the job that starts with GlueRDSJob.
  2. On the Action menu, choose Edit script.
  3. Update the name of the S3 bucket on line 33 to the ProcessedBucketS3 value on the Outputs tab of the second CloudFormation stack.
  4. Select the job again and on the Action menu, choose Run job.

You can see the status of the job as it runs.

The ETL job uses the AWS Glue IAM role created as part of the CloudFormation script. To write data into the curated bucket of the data lake, appropriate permissions need to be granted to this role. These permissions have already been granted as part of the data lake deployment. When the job is complete, its status shows as Succeeded.

The transformed data is stored in the curated bucket on the data lake.

The sample data is now transformed and is ready for data visualization.

Visualize the data

In this final step, we use Lake Formation to manage and govern the data that determines who has access to the data and what level of access they have. We do this by assigning granular permissions for the users and personas created by the data lake product. We can then query the data using Athena.

The users datalake_admin and datalake_analyst have already been created. datalake_admin is responsible for day-to-day operations, management, and governance of the data lake. datalake_analyst has permissions to view and analyze the data using different visualization tools.

As part of the data lake deployment, we defined the curated S3 bucket as the data lake location in Lake Formation. To read from and write to the data lake location, we have to make sure all the permissions are properly assigned. In the previous section, we embedded the permission for the AWS Glue ETL job to read from and write to the data lake location in the CloudFormation template. Therefore, the role SC-xxxxGlueWorkFlowRole-xxxxx has the appropriate permissions and can be assumed by the crawlers to create the required database and table schema for querying the data. Note that the first crawler analyzes data in the RDS for MySQL database and doesn't access the data lake, so we didn't need to give it permissions for the data lake.

To run the crawler, complete the following steps:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Select the crawler LakeCuratedZoneCrawler-xxxxx and choose Run crawler.

The crawler reads the data from the data lake and populates the table in the AWS Glue database created in the data ingestion stage and makes it available to query using Athena.

To query the populated data in the AWS Glue Data Catalog using Athena, we need to provide granular permissions to the role using Lake Formation governance and management.

  1. On the Lake Formation console, choose Data lake permissions in the navigation pane.
  2. Choose Grant.
  3. For IAM users and roles, choose the role you want to assign the permissions to.
  4. Select Named data catalog resources.
  5. Choose the database and table.
  6. For Table permissions, select Select.
  7. For Data permissions, select All data access.

This allows the user to see all the data in the table but not modify it.

  1. Choose Grant.
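The same grant can be made through the Lake Formation API; a hedged boto3 sketch, with the role ARN, database, and table names as placeholders:

# Hedged sketch: grant SELECT on the curated table to an analyst role via Lake Formation.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

lf.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/datalake_analyst"},  # placeholder
    Resource={
        "Table": {
            "DatabaseName": "gluedatabasemysql-blogdb",    # database from the ingestion stage
            "Name": "curated_census_manufacturers",        # placeholder table name
        }
    },
    Permissions=["SELECT"],   # view the data only; no ALTER, DROP, or INSERT
)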

Now you can query the data with Athena. If you haven’t already set up the Athena query results path, see Specifying a Query Result Location for instructions.

  1. On the Athena console, open the query editor.
  2. Choose the Saved queries tab.

You should see the two queries created as part of the data lake product deployment.

  1. Choose the query CensusManufacturersQuery.

The database, table, and query are pre-populated in the query editor.

  1. Choose Run query to access the data in the data lake.
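Queries can also be submitted through the Athena API. A minimal sketch, with the SQL, database, and results location as placeholders:

# Hedged sketch: run a query through the Athena API and poll for completion.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

execution = athena.start_query_execution(
    QueryString="SELECT * FROM curated_census_manufacturers LIMIT 10",   # placeholder SQL
    QueryExecutionContext={"Database": "gluedatabasemysql-blogdb"},      # placeholder database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-placeholder/"},
)

query_id = execution["QueryExecutionId"]
while athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"] in ("QUEUED", "RUNNING"):
    time.sleep(2)
print(athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"][:5])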

We have completed the process to load, transform, and visualize the data in the data lake through rapid deployment of a data lake as an AWS Service Catalog product. We used sample data ingested into an RDS for MySQL database as an example. You can repeat this process and implement similar steps using Amazon S3 as a data source. To do so, the sample data file schools-census-data.csv is loaded, and the corresponding AWS Glue crawler and job to ingest, transform, and visualize the data have been created for you as part of this AWS Service Catalog data lake product deployment.

Conclusion

In this post, we saw how you can minimize the time and effort required to build a data lake. Setting up a data lake helps organizations to be data-driven, identifying patterns in data and acting quickly to accelerate business growth. Additionally, to take full advantage of your data lake, you can build and offer data-driven products and applications with ease through a highly customizable product catalog. With AWS Service Catalog, you can easily and quickly deploy a data lake following common best practices. AWS Service Catalog also enforces constraints for network and account baselines to securely build a data lake in an end-user environment.

Appendix

To verify the sample data is loaded into Amazon RDS, complete the following steps:

  1. On the Amazon Elastic Compute Cloud (Amazon EC2) console, select the EC2SampleRDSdata instance.
  2. On the Actions menu, choose Monitor and troubleshoot.
  3. Choose Get system log.

The system log shows the count of records loaded into the RDS for MySQL database:

count(*)

1296

Next, we can test the connection to the database.

  1. On the AWS Glue console, choose Connections in the navigation pane.

You should see RDSConnectionMySQL-xxxx created for you.

  1. Select the connection and choose Test connection.
  2. For IAM role, choose the role SC-xxxxGlueWorkFlowRole-xxxxx.

RDSConnectionMySQL-xxxx should successfully connect to your RDS for MySQL DB instance.


About the Authors

Mamata Vaidya is a Senior Solutions Architect at Amazon Web Services (AWS), accelerating customers in their adoption of the cloud in the areas of big data analytics and foundational architecture. She has over 20 years of experience in building and architecting enterprise systems in healthcare, finance, and cybersecurity, with strong management skills. Prior to AWS, Mamata worked for Bristol-Myers Squibb and Citigroup in senior technical management positions. Outside of work, Mamata enjoys hiking with family and friends and mentoring high school students.

Shan Kandaswamy is a Solutions Architect at Amazon Web Services (AWS) who is passionate about helping customers solve complex problems. He is a technical evangelist who advocates for distributed architecture, big data analytics, and serverless technologies to help customers navigate the cloud landscape as they move to cloud computing. He's a big fan of travel, watching movies, and learning something new every day.

Field Notes: Perform Automations in Ungoverned Regions During Account Launch Using AWS Control Tower Lifecycle Events

Post Syndicated from Amit Kumar original https://aws.amazon.com/blogs/architecture/field-notes-perform-automations-in-ungoverned-regions-during-account-launch-using-aws-control-tower-lifecycle-events/

This post was co-authored by Amit Kumar; Partner Solutions Architect at AWS, Pavan Kumar Alladi; Senior Cloud Architect at Tech Mahindra, and Thooyavan Arumugam; Senior Cloud Architect at Tech Mahindra.

Organizations use AWS Control Tower to set up and govern secure, multi-account AWS environments. Frequently enterprises with a global presence want to use AWS Control Tower to perform automations during the account creation including in AWS Regions where AWS Control Tower service is not available. To review the current list of Regions where AWS Control Tower is available, visit the AWS Regional Services List.

This blog post shows you how we can use AWS Control Tower lifecycle events, AWS Service Catalog, and AWS Lambda to perform automation in a Region where the AWS Control Tower service is unavailable. This solution covers a single-Region scenario and needs to be modified to work with multiple Regions.

We use an AWS CloudFormation template that creates a virtual private cloud (VPC) with a subnet and internet gateway as an example, and use it in a shared Service Catalog product at the organization level to make it available in child accounts. Every time an AWS Control Tower lifecycle event related to account creation occurs, a Lambda function is invoked to perform automation activities in AWS Regions that are not governed by AWS Control Tower.

The solution in this blog post uses the following AWS services:

Figure 1. Solution architecture

Prerequisites

For this walkthrough, you need the following prerequisites:

  • AWS Control Tower configured, with AWS Organizations defined and organizational units registered within AWS Control Tower. For this blog post, AWS Control Tower is deployed in the AWS Mumbai Region with the AWS Organizations structure depicted in Figure 2.
  • Working knowledge of AWS Control Tower.
Figure 2. AWS Organizations structure

Create an AWS Service Catalog product and portfolio, and share at the AWS Organizations level

  1. Sign in to the AWS Control Tower management account as an administrator, and select an AWS Region that is not governed by AWS Control Tower (for this blog post, we use us-west-1 (N. California) because, at the time of writing, it is unavailable in AWS Control Tower).
  2. In the AWS Service Catalog console, in the left navigation menu, choose Products.
  3. Choose Upload new product. For Product Name, enter customvpcautomation, and for Owner, enter organizationabc. For the method, choose Use a template file.
  4. In Upload a template file, select Choose file, and then select the CloudFormation template you are going to use for automation. In this example, we are going to use a CloudFormation template which creates a VPC with CIDR 10.0.0.0/16, Public Subnet, and Internet Gateway.
Figure 3. AWS Service Catalog product

CloudFormation template: save this as a YAML file before selecting it in the console.

AWSTemplateFormatVersion: 2010-09-09
Description: Template to create a VPC with CIDR 10.0.0.0/16 with a Public Subnet and Internet Gateway. 

Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: VPC

  IGW:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: IGW

  VPCtoIGWConnection:
    Type: AWS::EC2::VPCGatewayAttachment
    DependsOn:
      - IGW
      - VPC
    Properties:
      InternetGatewayId: !Ref IGW
      VpcId: !Ref VPC

  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    DependsOn: VPC
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: Public Route Table

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn:
      - PublicRouteTable
      - VPCtoIGWConnection
    Properties:
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref IGW
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet:
    Type: AWS::EC2::Subnet
    DependsOn: VPC
    Properties:
      VpcId: !Ref VPC
      MapPublicIpOnLaunch: true
      CidrBlock: 10.0.0.0/24
      AvailabilityZone: !Select
        - 0
        - !GetAZs
          Ref: AWS::Region
      Tags:
        - Key: Name
          Value: Public Subnet

  PublicRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    DependsOn:
      - PublicRouteTable
      - PublicSubnet
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet

Outputs:

  PublicSubnet:
    Description: Public subnet ID
    Value:
      Ref: PublicSubnet
    Export:
      Name:
        'Fn::Sub': '${AWS::StackName}-SubnetID'

  VpcId:
    Description: The VPC ID
    Value:
      Ref: VPC
    Export:
      Name:
        'Fn::Sub': '${AWS::StackName}-VpcID'
  1. After the CloudFormation template is selected, choose Review, and then choose Create Product.
Figure 4. AWS Service Catalog product

  1. In the AWS Service Catalog console, in the left navigation menu, choose Portfolios, and then choose Create portfolio.
  2. For Portfolio name, enter customvpcportfolio, for Owner, enter organizationabc, and then choose Create.
Figure 5. AWS Service Catalog portfolio

  1. After the portfolio is created, select customvpcportfolio. In the actions dropdown, select Add product to portfolio. Then select customvpcautomation product, and choose Add Product to Portfolio.
  2. Navigate back to customvpcportfolio, and select the portfolio name to see all the details. On the portfolio details page, expand the Groups, roles, and users tab, and choose Add groups, roles, users. Next, select the Roles tab and search for AWSControlTowerAdmin role, and choose Add access.
Figure 6. AWS Service Catalog portfolio role selection

  1. Navigate to the Share section in portfolio details, and choose Share option. Select AWS Organization, and choose Share.

Note: If you get a warning stating “AWS Organizations sharing is not enabled”, choose Enable and select the organizational unit (OU) where you want this portfolio to be shared. In this case, we shared it with the Workload OU, where all workload accounts are created.

Figure 7. AWS Service Catalog portfolio sharing

Create an AWS Identity and Access Management (IAM) role

  1. Sign in to AWS Control Tower management account as an administrator and navigate to IAM Service.
  2. In the IAM console, choose Policies in the navigation pane, then choose Create Policy.
  3. Choose a service and select STS. For Actions, choose All STS actions; for Resources, choose All resources; and then choose Next: Tags.
  4. Skip the Tag section, go to the Review section, and for Name enter lambdacrossaccountSTS, and then choose Create policy.
  5. In the navigation pane of the IAM console, choose Roles, and then choose Create role. For the use case, select Lambda, and then choose Next: Permissions.
  6. Select AWSServiceCatalogAdminFullAccess and AmazonSNSFullAccess, then choose Next: Tags (skip tag screen if needed), then choose Next: Review.
  7. For Role name, enter Automationnongovernedregions, and then choose Create role.
Figure 8. AWS IAM role permissions

Create an Amazon Simple Notification Service (Amazon SNS) topic

  1. Sign in to the AWS Control Tower management account as an administrator and select the AWS Mumbai Region (the home Region for AWS Control Tower). Navigate to the Amazon SNS service, and in the navigation pane, choose Topics.
  2. On the Topics page, choose Create topic. On the Create topic page, in the Details section, for Type select Standard, and for Name enter ControlTowerNotifications. Keep the defaults for the other options, and then choose Create topic.
  3. In the Details section, in the left navigation pane, choose Subscriptions.
  4. On the Subscriptions page, choose Create subscription. For Protocol, choose Email, for Endpoint enter the email address that should receive notifications, and then choose Create subscription.

You will receive an email stating that the subscription is in pending status. Follow the email instructions to confirm the subscription, then check the Amazon SNS console to verify that the subscription is confirmed.
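The same topic and subscription can be created with boto3; a small sketch, with the email address as a placeholder:

# Hedged sketch: create the notification topic and an email subscription in the home Region.
import boto3

sns = boto3.client("sns", region_name="ap-south-1")   # AWS Mumbai, the Control Tower home Region

topic = sns.create_topic(Name="ControlTowerNotifications")
sns.subscribe(
    TopicArn=topic["TopicArn"],
    Protocol="email",
    Endpoint="ops-team@example.com",   # placeholder: the recipient must confirm the subscription
)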

Figure 9. Amazon SNS topic creation and subscription

Create an AWS Lambda function

  1. Sign in to AWS Control Tower management account as an administrator and select AWS Mumbai Region (Home Region for AWS Control Tower). Open the Functions page on the Lambda console, and choose Create function.
  2.  In the Create function section, choose Author from scratch.
  3. In the Basic information section:
    1. For Function name, enter NonGovernedCrossAccountAutomation.
    2. For Runtime, choose Python 3.8.
    3. For Role, select Choose an existing role.
    4. For Existing role, select the Lambda role that you created earlier.
  1. Choose Create function.
  2. Copy and paste the following code into the Lambda editor (replace the existing code).
  3. In the File menu, choose Save.

Lambda function code: The Lambda function initiates the AWS Service Catalog product, shared at the AWS Organizations level from the AWS Control Tower management account, in all member accounts in a hub-and-spoke model. Key activities performed by the Lambda function are:

    • Assume role – Provides the mechanism to assume AWSControlTowerExecution role in the child account.
    • Launch product – Launch the AWS Service Catalog product shared in the non-governed Region with the member account.
    • Email notification – Send notifications to the subscribed recipients.

When this Lambda function is invoked by the AWS Control Tower lifecycle event, it performs the activity of provisioning the AWS Service Catalog products in the Region which is not governed by AWS Control Tower.

#####################################################################################
# Description: This Lambda function executes Service Catalog products in AWS Regions
# not governed by Control Tower when new AWS accounts are created
# Environment: Control Tower Env
# Version 1.0
#####################################################################################

import boto3
import os
import time

SSM_Master = boto3.client('ssm')
STS_Master = boto3.client('sts')
SC_Master = boto3.client('servicecatalog',region_name = 'us-west-1')
SNS_Master = boto3.client('sns')

def lambda_handler(event, context):
    
    if event['detail']['serviceEventDetails']['createManagedAccountStatus']['state'] == 'SUCCEEDED':
        
        account_name = event['detail']['serviceEventDetails']['createManagedAccountStatus']['account']['accountName']
        account_id = event['detail']['serviceEventDetails']['createManagedAccountStatus']['account']['accountId']
        ##Assume role to member account
        assume_role(account_id)  
        
        try:
        
            print("-- Executing Service Catalog Procduct in the account: ", account_name)
            ##Launch Product in member account
            launch_product(os.environ['ProductName'], SC_Member)
            sendmail(f'-- Product Launched successfully ')

        except Exception as err:
        
            print(f'-- Error in executing Service Catalog Product in the account: {err}')
            sendmail(f'-- Error in executing Service Catalog Product in the account: {err}')
    
 ##Function to Assume Role and create session in the Member account.                       
def assume_role(account_id):
    
    global SC_Member, IAM_Member, role_arn
    
    ## Assume the Member account role to execute the SC product.
    role_arn = "arn:aws:iam::$ACCOUNT_NUMBER$:role/AWSControlTowerExecution".replace("$ACCOUNT_NUMBER$", account_id)
    
    ##Assuming Member account Service Catalog.
    Assume_Member_Acc = STS_Master.assume_role(RoleArn=role_arn,RoleSessionName="Member_acc_session")
    aws_access_key_id=Assume_Member_Acc['Credentials']['AccessKeyId']
    aws_secret_access_key=Assume_Member_Acc['Credentials']['SecretAccessKey']
    aws_session_token=Assume_Member_Acc['Credentials']['SessionToken']

    #Session to Connect to IAM and Service Catalog in Member Account                          
    IAM_Member = boto3.client('iam',aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key,aws_session_token=aws_session_token)
    SC_Member = boto3.client('servicecatalog', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key,aws_session_token=aws_session_token,region_name = "us-west-1")
    
    ##Accepting the portfolio share in the Member account.
    print("-- Accepting the portfolio share in the Member account.")
    
    length = 0
    
    while length == 0:
        
        try:
            
            search_product = SC_Member.search_products()
            length = len(search_product['ProductViewSummaries'])
        
        except Exception as err:
            
            print(err)
        
        if length == 0:
        
            print("The shared product is still not available. Hence waiting..")
            time.sleep(10)
            ##Accept portfolio share in member account
            Accept_portfolio = SC_Member.accept_portfolio_share(PortfolioId=os.environ['portfolioID'],PortfolioShareType='AWS_ORGANIZATIONS')
            Associate_principal = SC_Member.associate_principal_with_portfolio(PortfolioId=os.environ['portfolioID'],PrincipalARN=role_arn, PrincipalType='IAM')
        
        else:
            
            print("The products are listed in account.")
    
    print("-- The portfolio share has been accepted and has been assigned the IAM Role principal.")
    
    return SC_Member

##Function to execute product in the Member account.    
def launch_product(ProductName, session):
  
    describe_product = SC_Master.describe_product_as_admin(Name=ProductName)
    created_time = []
    version_ID = []
    
    for version in describe_product['ProvisioningArtifactSummaries']:
        
        describe_provisioning_artifacts = SC_Master.describe_provisioning_artifact(ProvisioningArtifactId=version['Id'],Verbose=True,ProductName=ProductName,)
        
        if describe_provisioning_artifacts['ProvisioningArtifactDetail']['Active'] == True:
            
            created_time.append(describe_provisioning_artifacts['ProvisioningArtifactDetail']['CreatedTime'])
            version_ID.append(describe_provisioning_artifacts['ProvisioningArtifactDetail']['Id'])
    
    latest_version = dict(zip(created_time, version_ID))
    latest_time = max(created_time)
    launch_provisioned_product = session.provision_product(ProductName=ProductName,ProvisionedProductName=ProductName,ProvisioningArtifactId=latest_version[latest_time],ProvisioningParameters=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ])
    print("-- The provisioned product ID is : ", launch_provisioned_product['RecordDetail']['ProvisionedProductId'])
    
    return(launch_provisioned_product)   
    
def sendmail(message):
    
     sendmail = SNS_Master.publish(
     TopicArn=os.environ['SNSTopicARN'],
     Message=message,
     Subject="Alert - Attention Required",
     MessageStructure='string')
  1. Choose Configuration, then choose Environment variables.
  2. Choose Edit, and then choose Add environment variable for each of the following:
    1. Variable 1: Key as ProductName, and Value as “customvpcautomation” (name of the product created in the previous step).
    2. Variable 2: Key as SNSTopicARN, and Value as “arn:aws:sns:ap-south-1:<accountid>:ControlTowerNotifications” (ARN of the Amazon SNS topic created in the previous step).
    3. Variable 3: Key as portfolioID, and Value as “port-tbmq6ia54yi6w” (ID for the portfolio which was created in the previous step).
Figure 10. AWS Lambda function environment variable

  1. Choose Save.
  2. On the function configuration page, on the General configuration pane, choose Edit.
  3. Change the Timeout value to 5 min.
  4. Go to Code Section, and choose the Deploy option to deploy all the changes.

Create an Amazon EventBridge rule and initiate with a Lambda function

  1. Sign in to AWS Control Tower management account as an administrator, and select AWS Mumbai Region (Home Region for AWS Control Tower).
  2. On the navigation bar, choose Services, select Amazon EventBridge, and in the left navigation pane, select Rules.
  3. Choose Create rule, and for Name enter NonGovernedRegionAutomation.
  4. Choose Event pattern, and then choose Pre-defined pattern by service.
  5. For Service provider, choose AWS.
  6. For Service name, choose Control Tower.
  7. For Event type, choose AWS Service Event via CloudTrail.
  8. Choose Specific event(s) option, and select CreateManagedAccount.
  9. In Select targets, for Target, choose Lambda. In the Function dropdown, select the Lambda function created earlier, NonGovernedCrossAccountAutomation.
  10. Choose Create.
Figure 11. Amazon EventBridge rule initiated with AWS Lambda
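The rule and target can also be created programmatically. The following boto3 sketch reflects the event pattern selected in the console steps above; the Lambda function ARN and account ID are placeholders.

# Hedged sketch: EventBridge rule for the CreateManagedAccount lifecycle event, targeting the Lambda function.
import json
import boto3

events = boto3.client("events", region_name="ap-south-1")
lambda_client = boto3.client("lambda", region_name="ap-south-1")

rule = events.put_rule(
    Name="NonGovernedRegionAutomation",
    EventPattern=json.dumps({
        "source": ["aws.controltower"],
        "detail-type": ["AWS Service Event via CloudTrail"],
        "detail": {"eventName": ["CreateManagedAccount"]},
    }),
)

function_arn = "arn:aws:lambda:ap-south-1:111122223333:function:NonGovernedCrossAccountAutomation"  # placeholder
events.put_targets(Rule="NonGovernedRegionAutomation", Targets=[{"Id": "1", "Arn": function_arn}])

# Allow EventBridge to invoke the function.
lambda_client.add_permission(
    FunctionName="NonGovernedCrossAccountAutomation",
    StatementId="AllowEventBridgeInvoke",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)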

Solution walkthrough

    1. Sign in to AWS Control Tower management account as an administrator, and select AWS Mumbai Region (Home Region for AWS Control Tower).
    2. Navigate to the AWS Control Tower Account Factory page, and select Enroll account.
    3. Create a new account and complete the Account Details section. Enter the Account email, Display name, AWS SSO email, and AWS SSO user name, and select the Organizational Unit dropdown. Choose Enroll account.
Figure 12. AWS Control Tower new account creation

      1. Wait for account creation and enrollment to succeed.
Figure 13. AWS Control Tower new account enrollment

      1. Sign out of the AWS Control Tower management account, and log in to the new account. Select the us-west-1 (N. California) Region. Navigate to AWS Service Catalog and then to Provisioned products. Set the Access filter to Account, and you will see that one provisioned product has been created and is available.
Figure 14. AWS Service Catalog provisioned product

      1. Go to the VPC console to verify that a new VPC was created by the AWS Service Catalog product with a CIDR of 10.0.0.0/16.
Figure 15. AWS VPC creation validation

      1. Steps 4 and 5 validate that you can perform automation during account creation through AWS Control Tower lifecycle events in non-governed Regions.

Cleaning up

To avoid incurring future charges, clean up the resources created as part of this blog post.

  • Delete the AWS Service Catalog product and portfolio you created.
  • Delete the IAM role, Amazon SNS topic, Amazon EventBridge rule, and AWS Lambda function you created.
  • Delete the AWS Control Tower setup (if created).

Conclusion

In this blog post, we demonstrated how to use AWS Control Tower lifecycle events to perform automation tasks during account creation in Regions not governed by AWS Control Tower. AWS Control Tower provides a way to set up and govern a secure, multi-account AWS environment. With this solution, customers can use AWS Control Tower to automate tasks during account creation regardless of whether AWS Control Tower is available in a given Region.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Pavan Kumar Alladi

Pavan Kumar Alladi is a Senior Cloud Architect with Tech Mahindra and is based out of Chennai, India. He has been working on AWS technologies for the past 10 years as a specialist in designing and architecting solutions on AWS Cloud. He is ardent about learning and implementing cutting-edge cloud solutions and is extremely zealous about applying cloud services to resolve complex real-world business problems. Currently, he leads customer engagements to deliver solutions for Platform Engineering, Cloud Migrations, Cloud Security, and DevOps.


Thooyavan Arumugam

Thooyavan Arumugam is a Senior Cloud Architect at Tech Mahindra’s AWS Practice team. He has over 16 years of industry experience in Cloud infrastructure, network, and security. He is passionate about learning new technologies and helping customers solve complex technical problems by providing solutions using AWS products and services. He provides advisory services to customers and solution design for Cloud Infrastructure (Security, Network), new platform design and Cloud Migrations.

Field Notes: Customizing the AWS Control Tower Account Factory with AWS Service Catalog

Post Syndicated from Remek Hetman original https://aws.amazon.com/blogs/architecture/field-notes-customizing-the-aws-control-tower-account-factory-with-aws-service-catalog/

Many AWS customers who are managing hundreds or thousands of accounts know how complex and time consuming this process can be. To reduce the burden and simplify the process of creating new accounts, last year AWS released a new service, AWS Control Tower.

AWS Control Tower helps you automate the process of setting up a multi-account AWS environment (AWS Landing Zone) that is secure, well-architected, and ready to use. This Landing Zone is created following best practices established through AWS’ experience working with thousands of enterprises as they move to the cloud. This includes the configuration of AWS Organizations, centralized logging, federated access, mandatory guardrails, and networking.

Those elements are a good starting point to cover the initial configuration of the new account. For some organizations, the next step is to baseline a newly vended account to align it with company policies and compliance requirements. This means creating or deploying the necessary roles, policies, governance controls, security groups, and so on.

In this blog post, I describe a solution that helps achieve consistent governance and compliance requirements across accounts created by AWS Control Tower. The solution uses the AWS Service Catalog as the repository of products that will be deployed into the new accounts.

Prerequisites and assumptions

Before I get into how it works, let’s first review a few key AWS Service Catalog concepts:

  • An AWS Service Catalog product is a blueprint for building your AWS resources that you want to make available for deployment on AWS along with the configuration information.
  • A portfolio is a collection of products, together with the configuration information.
  • A provisioned product is an AWS CloudFormation stack.
  • Constraints control the way users can deploy a product. With launch constraints, you can specify a role that the AWS Service Catalog can assume to launch a product.
  • Review AWS Service Catalog reference blueprints for a quick way to set up and configure AWS Service Catalog portfolios and products.

You need an AWS Service Catalog portfolio with products that you plan to deploy to the accounts. The portfolio has to be created in the AWS Control Tower primary account. If you don’t have a portfolio, the starting point would be to review these resources:

Solution Overview

This solution supports the following scenarios:

  • Deployment of products to the newly vended AWS Control Tower account (Figure 1)
  • Update and deployment of products to existing accounts (Figure 2)
Figure 1 – Deployment to new account

The architecture diagram in figure 1 shows the process of deploying products to the new account.

  1. AWS Control Tower creates a new account
  2. Once an account is created successfully, Amazon CloudWatch Events trigger an AWS Lambda function
  3. The Lambda function pulls the configuration from an Amazon S3 bucket and:
    • Validates configuration
    • Grants Lambda role access to portfolio(s)
    • Creates StackSet constraints for products
  4. Lambda calls the AWS Step Function
  5. The AWS Step Function orchestrates deployment of the AWS Service Catalog products and monitors progress
Figure 2 – Update or deployment to an existing account

The architecture diagram in figure 2 shows process of updating product or deploying product to the existing account.

  1. User uploads update configuration to an Amazon S3 Bucket
  2. Update triggers AWS Lambda Function
  3. The Lambda function reads the uploaded configuration and:
    • Validates configuration
    • Grants Lambda role access to portfolio(s)
    • Creates StackSet constraints for products
  4. Calls the AWS Step Function
  5. The AWS Step Function orchestrates the update/deployment of the AWS Service Catalog products and monitors progress

Deployment and configuration

Before proceeding with the deployment, you must install Python3.

Then, follow these steps:

  1. Download the solution from GitHub
  2. Go to the src folder and run the following command: pip3 install -r requirements.txt -t .
  3. Zip the content of the src folder into a file named "control-tower-account-factory-solution.zip"
  4. Upload the zip file to an Amazon S3 bucket in the AWS Region where you want to deploy the solution
  5. Launch the AWS CloudFormation template

AWS CloudFormation Template Parameters:

  • ConfigurationBucketName – name of the Amazon S3 bucket from which the AWS Lambda function should pull the configuration file. You can provide the name of an existing bucket.
  • CreateConfigurationBucket – set to true if you want to create the bucket specified in the previous parameter. If the bucket already exists, set the value to false
  • ConfigurationFileName – name of the configuration file for a new account. Default value: config.yml
  • UpdateFileName – name of the configuration file for updates. Default value: update.yml
  • SourceCodeBucketName – name of the bucket where you uploaded the zipped Lambda code
  • SourceCodePackageName – name of the zipped file containing the Lambda code. If the file was uploaded to a folder, include the name of the folder(s) as well. Example: my_folder/control-tower-account-factory-solution.zip
  • BaselineFunctionName – name of the AWS Lambda function. Default value: control-tower-account-factory-lambda
  • BaselinetLambdaRoleName – name of the AWS Lambda function IAM role. Default value: control-tower-account-factory-lambda-role
  • StateMachineName – name of the AWS Step Function. Default value: control-tower-account-factory-state-machine
  • StateMachineRoleName – name of the AWS Step Function IAM role. Default value: control-tower-account-factory-state-machine-role
  • TrackingTableName – name of the DynamoDB table used to track all deployments and updates. Default value: control-tower-account-factory-tracking-table
  • MaxIterations – maximum number of iterations of the AWS Step Function before it reports a timeout. Default value: 30. This can be overwritten in the configuration file. For more information, see the Configuration section.
  • TopicName – (optional) name of the SNS topic that will be used for sending notifications. Default value: control-tower-account-factory-account-notification
  • NotificationEmail – (optional) the email address to which notifications are sent. If this isn't provided, the SNS topic won't be created.

Configuration Files

The configuration files have to be in the YAML format, you can use the following schemas for new or existing accounts:

Configuration schema for new account:

The configuration can apply to all new accounts or can be divided based on the organization unit associated with the new account. Also, you can mix deployments where some products are always deployed and some only to specific organization units.

Example configuration file

Configuration schema for existing accounts

This configuration can be used to update provisioned products or deploy products into existing accounts. You can specify a target location as account(s), organizational unit(s), or both. If you specify an organizational unit as a target, the product(s) will be updated/deployed to all accounts associated with that organizational unit.

If the product doesn’t exist in the account, the Lambda function attempts to deploy it. You can manage this by setting the parameter ‘deployifnotexist’ to true. If omitted or set to false, Lambda won’t provision the product to an existing account.

Example configuration file

All products deployed by AWS Service Catalog are provisioned under their own names, which have to be unique across all provisioned products. In this example, products are provisioned under the following naming convention:

<account id where product is provisioned>-<"provision_name" from configuration file>

Example:  123456789-my-product

In the configuration file you can specify dependencies between products. Dependencies need to be provided as a list of provisioned product names, not product names. If provided, deployment of the product is suspended until all dependencies have deployed successfully.

The Step Function runs in a loop with an interval of 1 minute, checking whether the dependencies have been deployed. In the event of an error in the dependency naming or configuration, the step function iterates only until it reaches the maximum iterations defined in the configuration file. If the maximum iterations are reached, the step function reports a timeout and interrupts the products' deployment.

Important: if you are updating a product by adding a new AWS Region, you cannot specify the dependency or run updates in existing Regions at the same time. In this scenario, break the update into two steps as follows:

  • Upload configuration to create dependencies in the new region
  • Upload configuration to create products only in the new region

You can find different examples of configuration files in GitHub.

Note: The name of the configuration file and Amazon S3 location must match the values provided in the AWS CloudFormation template during solution deployment.

AWS Lambda functions deployment considerations

When the product being deployed is a Lambda function, you need to consider two requirements:

  1. The source code of the Lambda function needs to be in the Amazon S3 bucket created in the same AWS region where you are planning to deploy a Lambda function
  2. The destination account needs to have permission to the source Amazon S3 bucket

To accommodate both requirements, the approach is to create an Amazon S3 bucket under the AWS Control Tower primary account. There is one Amazon S3 bucket in each Region where the functions will be deployed.

Each deployment bucket should have the following policy attached:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<S3 bucket name>/*",
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalOrgID": "<AWS Organizations Id>"
                },
                "StringLike": {
                    "aws:PrincipalArn": [
                        "arn:aws:iam::*:role/AWSControlTowerExecution",
        "arn:aws:iam::*:role/< name of the Lambda IAM role – default: control-tower-account-factory-lambda-role>"
                    ]
                }
            }
        }
    ]
}

This allows the deployment role to access Lambda’s source code from any account in your AWS Organizations.
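Attaching the policy can be scripted as well; a minimal sketch, assuming the bucket name and Organizations ID placeholders are replaced with your values:

# Hedged sketch: attach the cross-account read policy to a regional deployment bucket.
import json
import boto3

s3 = boto3.client("s3")
bucket_name = "my-lambda-source-bucket"        # placeholder deployment bucket name

policy = {
    "Version": "2008-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "*"},
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{bucket_name}/*",
        "Condition": {
            "StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"},   # placeholder Organizations ID
            "StringLike": {"aws:PrincipalArn": [
                "arn:aws:iam::*:role/AWSControlTowerExecution",
                "arn:aws:iam::*:role/control-tower-account-factory-lambda-role",
            ]},
        },
    }],
}

s3.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))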

Conclusion

In this blog post, I’ve outlined a solution to help you drive consistent governance and compliance requirements across accounts vended through AWS Control Tower. This solution provides enterprises with mechanisms to manage all deployments from a centralized account. It also reduces the need to maintain multiple separate CI/CD pipelines, so you can simplify deployments and reduce deployment time in a multi-account environment.

This solution allows you to keep provisioned products up to date by updating existing products or bringing older accounts to the same compliance level as new accounts.

Since this solution supports deployment to existing accounts and can run without AWS Control Tower, it can be used to deploy any AWS Service Catalog product in either single- or multi-account environments. The solution then becomes an integral part of a CI/CD pipeline.

For more information, review the AWS Control Tower documentation and AWS Service Catalog documentation. Also, review the links listed in the “Prerequisites and assumptions” section of this post.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Standardizing CI/CD pipelines for .NET web applications with AWS Service Catalog

Post Syndicated from Borja Prado Miguelez original https://aws.amazon.com/blogs/devops/standardizing-cicd-pipelines-net-web-applications-aws-service-catalog/

As companies implement DevOps practices, standardizing the deployment of continuous integration and continuous deployment (CI/CD) pipelines is increasingly important. Your developer team may not have the ability or time to create its own CI/CD pipelines and processes from scratch for each new project. Additionally, creating a standardized DevOps process can help your entire company ensure that all development teams are following security and governance best practices.

Another challenge that large enterprise and small organization IT departments deal with is managing their software portfolio. This becomes even harder in agile scenarios working with mobile and web applications where you need to not only provision the cloud resources for hosting the application, but also have a proper DevOps process in place.

Having a standardized portfolio of products for your development teams enables you to provision the infrastructure resources needed to create development environments, and helps reduce the operation overhead and accelerate the overall development process.

This post shows you how to provide your end-users a catalog of resources with all the functionality a development team needs to check in code and run it in a highly scalable load balanced cloud compute environment.

We use AWS Service Catalog to provision a cloud-based AWS Cloud9 IDE, a CI/CD pipeline using AWS CodePipeline, and the AWS Elastic Beanstalk compute service to run the website. AWS Service Catalog allows organizations to keep control of the services and products that can be provisioned across the organization’s AWS account, and there’s an effective software delivery process in place by using CodePipeline to orchestrate the application deployment. The following diagram illustrates this architecture.

Architecture Diagram

You can find all the templates we use in this post on the AWS Service Catalog Elastic Beanstalk Reference architecture GitHub repo.

Provisioning the AWS Service Catalog portfolio

To get started, you must provision the AWS Service Catalog portfolio with AWS CloudFormation.

  1. Choose Launch Stack, which creates the AWS Service Catalog portfolio in your AWS account.
  2. If you’re signed into AWS as an AWS Identity and Access Management (IAM) role, add your role name in the LinkedRole1 parameter.
  3. Continue through the stack launch screens using the defaults and choosing Next.
  4. Select the acknowledgements in the Capabilities box on the third screen.

When the stack is complete, a top-level CloudFormation stack with the default name SC-RA-Beanstalk-Portfolio, which contains five nested stacks, has created the AWS Service Catalog products with the services the development team needs to implement a CI/CD pipeline and host the web application. This reference architecture provisions the products needed to set up the DevOps pipeline and the application environment.

Cloudformation Portfolio Stack

When the portfolio has been created, you have completed the administrator setup. As an end-user (any roles you added to the LinkedRole1 or LinkedRole2 parameters), you can access the portfolio section on the AWS Service Catalog console and review the product list, which now includes the AWS Cloud9 IDE, Elastic Beanstalk application, and CodePipeline project that we will use for continuous delivery.

Service Catalog Products

On the AWS Service Catalog administrator section, inside the Elastic Beanstalk reference architecture portfolio, we can add and remove groups, roles, and users by choosing Add groups, roles, users on the Groups, roles, and users tab. This lets us enable developers or other users to deploy the products from this portfolio.

Service Catalog Groups, Roles, and Users

Solution overview

The rest of this post walks you through how to provision the resources you need for CI/CD and web application deployment. You complete the following steps:

  1. Deploy the CI/CD pipeline.
  2. Provision the AWS Cloud9 IDE.
  3. Create the Elastic Beanstalk environment.

Deploying the CI/CD pipeline

The first product you need is the CI/CD pipeline, which manages the code and deployment process.

  1. Sign in to the AWS Service Catalog console in the same Region where you launched the CloudFormation stack earlier.
  2. On the Products list page, locate the CodePipeline product you created earlier.
    Service Catalog Products List
  3. Choose Launch product.

You now provision the CI/CD pipeline. For this post, we use example names for the pipeline, Elastic Beanstalk application, and code repository, which you can of course modify.

  1. Enter a name for the provisioned CodePipeline product.
  2. Select the Windows version and choose Next.
  3. For the application and repository name, enter dotnetapp.
  4. Leave all other settings at their default and choose Next.
  5. Choose Launch to start provisioning the CodePipeline product.

When you’re finished, the provisioned pipeline should appear on the Provisioned products list.

CodePipeline Product Provisioned

  1. Copy the CloneUrlHttp output to use in a later step.

You now have the CI/CD pipeline ready, with the code repository and the continuous integration service that compiles the code, runs tests, and generates the software bundle stored in Amazon Simple Storage Service (Amazon S3) ready to be deployed. The following diagram illustrates this architecture.

CodePipeline Configuration Diagram

When the Elastic Beanstalk environment is provisioned, the deploy stage takes care of deploying the bundle application stored in Amazon S3, so the DevOps pipeline takes care of the full software delivery as shown in the earlier architecture diagram.

The Region you use must support the WINDOWS_SERVER_2019_CONTAINER build image that AWS CodeBuild uses. You can modify the environment type or create a custom one by editing the CloudFormation template used for the CodePipeline for Windows.
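If you want to confirm the image is available before launching, the following is a minimal sketch using the AWS CLI; the Region and the JMESPath filter are assumptions you can adjust.

# Hedged check: list the CodeBuild curated Windows Server 2019 images available
# in the chosen Region (adjust --region as needed).
aws codebuild list-curated-environment-images --region us-east-1 \
  --query "platforms[?platform=='WINDOWS_SERVER_2019'].languages[].images[].name"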

Provisioning the AWS Cloud9 IDE

To show the full lifecycle of the deployment of a web application with Elastic Beanstalk, we use a .NET web application, but this reference architecture also supports Linux. To provision an AWS Cloud9 environment, complete the following steps:

  1. From the AWS Service Catalog product list, choose the AWS Cloud9 IDE.
  2. Choose Launch product.
  3. Enter a name for the provisioned Cloud9 product and choose Next.
  4. Enter an EnvironmentName and select the InstanceType.
  5. Set LinkedRepoPath to /dotnetapp.
  6. For LinkedRepoCloneUrl, enter the CloneUrlHttp from the previous step.
  7. Leave the default parameters for tagOptions and Notifications, and choose Launch.
    Cloud9 Environment Settings

Now we download a sample ASP.NET MVC application in the AWS Cloud9 IDE, move it under the folder we specified in the previous step, and push the code.

  1. Open the IDE with the Cloud9Url link from AWS Service Catalog output.
  2. Get the sample .NET web application and move it under the dotnetapp folder. See the following code:
cp -R aws-service-catalog-reference-architectures/labs/SampleDotNetApplication/* dotnetapp/
  3. Check in the sample application to the CodeCommit repo:
cd dotnetapp
git add --all
git commit -m "initial commit"
git push

Now that we have committed the application to the code repository, it’s time to review the DevOps pipeline.

  1. On the CodePipeline console, choose Pipelines.

You should see the pipeline ElasticBeanstalk-ProductPipeline-dotnetapp running.

CodePipeline Execution

  1. Wait until the three pipeline stages are complete; this may take several minutes.

The source commit and web application build stages succeed, but the code deployment stage fails because we haven't provisioned the Elastic Beanstalk environment yet.

If you want to deploy your own sample or custom ASP.NET web application, CodeBuild requires the build specification file buildspec-build-dotnet.yml for the .NET Framework, which is located under the elasticbeanstalk/codepipeline folder in the GitHub repo. See the following example code:

version: 0.2
env:
  variables:
    DOTNET_FRAMEWORK: 4.6.1
phases:
  build:
    commands:
      - nuget restore
      - msbuild /p:TargetFrameworkVersion=v$env:DOTNET_FRAMEWORK /p:Configuration=Release /p:DeployIisAppPath="Default Web Site" /t:Package
      - dir obj\Release\Package
artifacts:
  files:
    - 'obj/**/*'
    - 'codepipeline/*'

Creating the Elastic Beanstalk environment

Finally, it’s time to provision the hosting system, an Elastic Beanstalk Windows-based environment, where the .NET sample web application runs. For this, we follow the same approach from the previous steps and provision the Elastic Beanstalk AWS Service Catalog product.

  1. On the AWS Service Catalog console, on the Product list page, choose the Elastic Beanstalk application product.
  2. Choose Launch product.
  3. Enter an environment name and choose Next.
  4. Enter the application name.
  5. Enter the S3Bucket and S3SourceBundle that were generated (you can retrieve them from the Amazon S3 console).
  6. Set the SolutionStackName to 64bit Windows Server Core 2019 v2.5.8 running IIS 10.0. For up-to-date platform names, see the Elastic Beanstalk supported platforms documentation, or list them with the AWS CLI command shown after these steps.
    Elastic Beanstalk Environment Settings
  7. Launch the product.
  8. To verify that you followed the steps correctly, check that all the provisioned products are available (AWS Cloud9 IDE, Elastic Beanstalk CodePipeline project, and Elastic Beanstalk application) and that the recently created Elastic Beanstalk environment is healthy.
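The following is a hedged AWS CLI sketch for listing the Windows Server Core 2019 platform names currently available; the JMESPath filter is an assumption you can tweak to match other platforms.

# Hedged sketch: list current Elastic Beanstalk solution stack names that match
# Windows Server Core 2019, to pick a valid SolutionStackName value.
aws elasticbeanstalk list-available-solution-stacks \
  --query "SolutionStacks[?contains(@, 'Windows Server Core 2019')]" \
  --output text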

As in the previous step, if you’re planning to deploy your own sample or custom ASP.NET web application, AWS CodeDeploy requires the deploy specification file buildspec-deploy-dotnet.yml for the .NET Framework, which should be located under the codepipeline folder in the GitHub repo. See the following code:

version: 0.2
phases:
  pre_build:
    commands:          
      - echo application deploy started on `date`      
      - ls -l
      - ls -l obj/Release/Package
      - aws s3 cp ./obj/Release/Package/SampleWebApplication.zip s3://$ARTIFACT_BUCKET/$EB_APPLICATION_NAME-$CODEBUILD_BUILD_NUMBER.zip
  build:
    commands:
      - echo Pushing package to Elastic Beanstalk...      
      - aws elasticbeanstalk create-application-version --application-name $EB_APPLICATION_NAME --version-label v$CODEBUILD_BUILD_NUMBER --description "Auto deployed from CodeCommit build $CODEBUILD_BUILD_NUMBER" --source-bundle S3Bucket="$ARTIFACT_BUCKET",S3Key="$EB_APPLICATION_NAME-$CODEBUILD_BUILD_NUMBER.zip"
      - aws elasticbeanstalk update-environment --environment-name "EB-ENV-$EB_APPLICATION_NAME" --version-label v$CODEBUILD_BUILD_NUMBER

The same codepipeline folder also contains build and deploy specification files for languages other than .NET, which you can use if you prefer to deploy a web application with Elastic Beanstalk using a different language, such as Python.

  1. To complete the application deployment, go to the application pipeline and release the change; this reruns the pipeline now that the application environment is ready (a hedged CLI alternative is shown after this step).
    Deployment Succeeded
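If you prefer the AWS CLI over the Release change button, the following is a minimal sketch; the pipeline name assumes you kept the dotnetapp defaults used earlier in this post.

# Hedged sketch: trigger a new run of the pipeline from the CLI instead of
# choosing Release change in the console (adjust the name if you changed it).
aws codepipeline start-pipeline-execution \
  --name ElasticBeanstalk-ProductPipeline-dotnetapp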

After you create the environment through AWS Service Catalog, you can access the provisioned Elastic Beanstalk environment.

  1. In the Events section, locate the LoadBalancerURL, which is the public endpoint that we use to access the website.
    Elastic Beanstalk LoadBalancer URL
  1. In our preferred browser, we can check that the website has been successfully deployed.
    ASP.NET Sample Web Application

Cleaning up

When you’re finished, you should complete the following steps to delete the resources you provisioned to avoid incurring further charges and keep the account free of unused resources.

  1. The CodePipeline product creates an S3 bucket, which you must empty from the S3 console (or with the AWS CLI, as sketched after this list).
  2. On the AWS Service Catalog console, end the provisioned resources from the Provisioned products list.
  3. As administrator, in the CloudFormation console, delete the stack SC-RA-Beanstalk-Portfolio.
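For reference, this is a hedged sketch of emptying the pipeline's artifact bucket from the AWS CLI; the bucket name is a placeholder, so copy the actual name from the S3 console or the provisioned product outputs first.

# Hedged sketch: delete all objects from the pipeline artifact bucket before
# terminating the provisioned products. <artifact-bucket-name> is a placeholder.
aws s3 rm s3://<artifact-bucket-name> --recursive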

Conclusions

This post has shown you how to deploy a standardized DevOps pipeline, which we then used to manage and deploy a sample .NET application on Elastic Beanstalk using the AWS Service Catalog Elastic Beanstalk reference architecture. AWS Service Catalog is the ideal service for administrators who need to centrally provision and manage the required AWS services with a consistent governance model. Deploying web applications to Elastic Beanstalk is simple for developers, and the service provides built-in scalability, patch maintenance, and self-healing for your applications.

The post also includes information and references on how to extend the solution to other programming languages and operating systems supported by Elastic Beanstalk.

About the Authors

Borja Prado
Borja Prado Miguelez

Borja is a Senior Specialist Solutions Architect for Microsoft workloads at Amazon Web Services. He is passionate about web architectures and application modernization, helping customers to build scalable solutions with .NET and migrate their Windows workloads to AWS.

Chris Chapman
Chris Chapman

Chris is a Partner Solutions Architect covering AWS Marketplace, Service Catalog, and Control Tower. Chris was a software developer and data engineer for many years and now his core mission is helping customers and partners automate AWS infrastructure deployment and provisioning.

Build a self-service environment for each line of business using Amazon EMR and AWS Service Catalog

Post Syndicated from Tanzir Musabbir original https://aws.amazon.com/blogs/big-data/build-a-self-service-environment-for-each-line-of-business-using-amazon-emr-and-aws-service-catalog/

Enterprises often want to centralize governance and compliance requirements and provide a common set of policies on how Amazon EMR clusters should be set up. You can use AWS Service Catalog to centrally manage commonly deployed Amazon EMR cluster configurations, which helps you achieve consistent governance and meet your compliance requirements while enabling your end users to quickly deploy only approved EMR cluster configurations on a self-service basis.

In this post, we demonstrate how enterprise administrators can use AWS Service Catalog to create and manage catalogs that data engineers and data scientists use to quickly discover and deploy clusters in a self-service environment. With AWS Service Catalog, you can control which EMR release versions are available, which cluster configurations are used, and who has access by individual, group, department, or cost center.

The following are a few key AWS Service Catalog concepts:

  • An AWS Service Catalog product is a blueprint for building the AWS resources that you want available for deployment. You create your products by importing AWS CloudFormation templates.
  • A portfolio is a collection of products. With AWS Service Catalog, you can create a customized portfolio for each type of user in your organization and selectively grant access to the appropriate portfolio.
  • A provisioned product is a collection of resources that result from instantiating an AWS CloudFormation template.

Use cases

You can use AWS Service Catalog to provide Amazon EMR as a self-serve Extract, Transform, Load (ETL) platform at scale while hiding all the security and network configurations from end users.

As an administrator in AWS Service Catalog, you can create one or more Service Catalog products that define different configurations to use for EMR clusters. In those products, you can define the security and network configuration for the cluster, set automatic scaling rules and instance configurations, choose different purchase options, or preconfigure EMR to run different EMR step jobs. As a user in Service Catalog, you can browse the different EMR templates exposed as products and provision the one that matches your requirements. By following this approach, you can make your EMR usage self-serviceable, reduce the EMR learning curve for your users, and ensure adherence to security standards and best practices.

The following image illustrates the interactions between Amazon EMR administrators and end users when they use AWS Service Catalog to provision EMR clusters.

The use cases in this post have three AWS Identity and Access Management (IAM) users with different access permissions:

  • emr-admin: This user is the administrator and has access to all the resources. This user creates EMR clusters for their end-users based on their requirements.
  • emr-data-engineer: The data engineer uses Spark and Hive most of the time. They run different ETL scripts on Hive and Spark to process, transform, and enrich their datasets.
  • emr-data-analyst: This user is very familiar with SQL and mostly uses Hue to submit queries to Hive.

You can solve several Amazon EMR operational use cases using AWS Service Catalog. The following sections discuss three different use cases. Later in this post, you walk through each of the use cases with a solution.

Use case 1: Ensuring least privilege and appropriate access

The administrator wants to enforce a few organizational standards. The first is that no EMR cluster uses the default EMR_EC2_ROLE; instead, the administrator wants a role with limited access to Amazon Simple Storage Service (Amazon S3) that is assigned automatically every time an EMR cluster is launched. Second, end users sometimes forget to add appropriate tags to their resources, which often makes it hard for the administrator to identify resources and allocate costs appropriately. The administrator therefore wants a mechanism that assigns tags to EMR clusters automatically when they launch.

Use case 2: Providing Amazon EMR as a self-serve ETL platform with Spark and Hive

Data engineers use Spark and Hive applications, and they prefer to have a platform where they just submit their jobs without spending time creating the cluster. They also want to try out different Amazon EMR versions to see how their jobs run on different Spark or Hive versions. They don’t want to spend time learning AWS or Amazon EMR. Additionally, the administrator doesn’t want to give full Amazon EMR access to all users.

Use case 3: Automatically scaling the Hive cluster for analysts

Data analysts have strong SQL backgrounds, so they typically use Hue to submit their Hive queries. They run queries against a large dataset, so they want to have an EMR cluster that can scale when needed. They also don’t have access to the Amazon EMR console and don’t know how to configure automatic scaling for Amazon EMR.

Solution overview

This solution shows how you can use AWS Service Catalog to self-serve your Amazon EMR users, enforce best practices and compliance, and speed up the adoption process.

At a high level, the solution includes the following steps:

  1. Configuring the AWS environment to run this solution.
  2. Creating a CloudFormation template.
  3. Setting up AWS Service Catalog products and portfolios.
  4. Managing access to AWS Service Catalog and provisioning products.
  5. Demonstrating the self-service Amazon EMR platform for users.
  6. Enforcing best practices and compliance through AWS Service Catalog.
  7. Executing ETL workloads on Amazon EMR using AWS Service Catalog.
  8. Optionally, setting up AWS Service Catalog and launching Amazon EMR products through the AWS Command Line Interface (AWS CLI).

The following section looks at the CloudFormation template, which you use to set up the AWS environment to run this solution.

Setting up the AWS environment

To set up this solution, you need to create a few AWS resources. The CloudFormation template provided in this post creates all the required AWS resources. This template requires you to pass the following parameters during the launch:

  • A password for your test users.
  • An Amazon Elastic Compute Cloud (Amazon EC2) key pair.
  • The latest AMI ID for the EC2 helper instance. This instance configures the environment and sets up the required files and templates for this solution.

This template is designed only to show how you can use Amazon EMR with AWS Service Catalog. This setup isn’t intended for production use without modification.

To launch the CloudFormation stack, choose Launch Stack:

Launching this stack creates several AWS resources. The following resources shown in the AWS CloudFormation output are the ones you need in the next step:

  • ConsoleLoginURL – URL you use to switch between multiple users
  • EMRSCBlogBucket – Name of the S3 bucket to store blog-related files
  • UserPassword – Password to use for all the test users
  • DataAdminUsername – IAM user name for the administrator user
  • DataEngineerUsername – IAM user name for the data engineer user
  • DataAnalystUsername – IAM user name for the data analyst user
  • HiveScriptURL – Amazon S3 path for the Hive script
  • HiveETLInputParameter – Path for the Hive input parameter
  • HiveETLOutputParameter – Path for the Hive output parameter
  • SparkScriptURL – Amazon S3 path for the Spark script
  • SparkETLInputParameter – Path for the Spark input parameter
  • SparkETLOutputParameter – Path for the Spark output parameter

When the CloudFormation template is complete, record the outputs listed on the Outputs tab on the AWS CloudFormation console (see the following screenshot). You can also read them from the AWS CLI, as in the hedged sketch below.
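The following is a minimal sketch for reading the stack outputs from the AWS CLI; the stack name matches the one deleted in the cleanup section, but confirm it against your own deployment.

# Hedged sketch: print the stack output keys and values from the CLI.
aws cloudformation describe-stacks --stack-name Blog-EMR-Service-Catalog \
  --query "Stacks[0].Outputs[].{Key:OutputKey,Value:OutputValue}" \
  --output table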

(Optional) Configuring the AWS CLI

The AWS CLI is a unified tool to manage your AWS services. In the optional step, you use the AWS CLI to create AWS Service Catalog products and portfolios. Installation of AWS CLI isn’t required for this solution. For instructions on configuring the AWS CLI in your environment, see Configuring the AWS CLI.

Provisioning EMR clusters through AWS Service Catalog

You can create AWS Service Catalog products from the existing CloudFormation template and use those products to provision a variety of EMR clusters. You can create an EMR cluster and consume the cluster’s services without having access to the cluster, which improves the Amazon EMR adoption process.

The following CloudFormation template creates an EMR cluster. This template takes two parameters:

  • Cluster size – You select how many core nodes you want to have in the EMR cluster
  • Compute type – Based on the compute type you choose, the template selects the corresponding EC2 instance type

As an account administrator, you can define the internal configuration of the EMR cluster. End users don't need to know the security groups, subnet ID, key pair, and other details, and they don't need access to the EMR cluster or time to set it up. As an administrator, you define a template for the cluster; enforce compliance, versions, applications, and automatic scaling rules through the CloudFormation template; and expose the template as a product through AWS Service Catalog.

The following section walks you through the solution for each use case.

Use cases walkthrough

The CloudFormation template already configured AWS Service Catalog portfolios and products. You can review these on the AWS Service Catalog console.

  1. Use the ConsoleLoginURL from the AWS CloudFormation console Outputs tab and sign in as an emr-admin user.
  2. On the AWS Service Catalog console, you can see two portfolios for engineers and analysts. In each of those portfolios, you can see two products.

The Data Analysts Stack contains products for the analyst and is assigned to the user emr-data-analyst. The Data Engineering Stack contains products for engineers and is assigned to the emr-data-engineer user. Upon logging in, they can see their respective products and portfolios.

Use case 1: Ensuring least privilege and appropriate access

The cluster administrator creates a least privilege IAM role for their users and associates that role through the AWS Service Catalog product. Similarly, the administrator assigns appropriate tags to each product. When data engineers or analysts launch an EMR cluster using any of their assigned products, the cluster has least privilege access and its resources are tagged automatically. To confirm this access is in place, complete the following steps:

  1. Sign in to the AWS Management Console as either emr-data-engineer user or emr-data-analyst user.

Your console looks slightly different because end users don't manage the products; they just use the products to launch clusters or run jobs on them.

  1. Choose Default EMR and provision this product by choosing Launch Product.
  2. For the name of the provisioned product, enter SampleEMR.

The next screen shows a list of allowed parameters your administrator thinks you may need.

  1. Leave all parameters as default.
  2. For the cluster name, enter Sample EMR.
  3. Review all the information and launch the product.

It takes a few minutes to spin up the cluster. When the cluster is ready, the status changes to Succeeded. The provisioned product page also shows you a list of outputs your product owner wants you to see. For example, using output values, your product owner can share Master DNS Address, Resource Manager URL, and Hue URL, as shown in the following figure. You can also read these outputs from the AWS CLI, as sketched below.
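The following is a hedged CLI sketch for fetching those outputs; it assumes the provisioned product name SampleEMR from the step above and that the GetProvisionedProductOutputs API is available in your CLI version.

# Hedged sketch: fetch the outputs (for example, Master DNS Address or Hue URL)
# of the provisioned product by its name.
aws servicecatalog get-provisioned-product-outputs \
  --provisioned-product-name SampleEMR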

To verify that this launched EMR cluster has the expected IAM role and tags, sign in as the emr-admin user and go to the Amazon EMR console to review the service role for EC2 instances and the tags.
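A hedged CLI spot check for the same verification is shown below; the cluster ID is a placeholder you replace with the ID of the cluster that the product launched.

# Hedged sketch: confirm the instance profile (least privilege role) and tags
# on the launched EMR cluster; replace the placeholder cluster ID.
aws emr describe-cluster --cluster-id j-XXXXXXXXXXXX \
  --query "Cluster.{InstanceProfile:Ec2InstanceAttributes.IamInstanceProfile,Tags:Tags}"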

Use case 2: Providing Amazon EMR as a self-serve ETL platform with Spark and Hive

For this use case, data engineers have two different ETL scripts:

  • A Spark script that reads Amazon reviews stored in Amazon S3 and converts them into Parquet before writing back to Amazon S3
  • A Hive script that reads Amazon reviews data from Amazon S3 and finds out the top toys based on customer ratings.

The administrator creates a product to self-serve these users; the product defines the job type and the job parameters. End users select the job type and pass the script, input, and output locations.

  1. Sign in as emr-data-engineer.
  2. Select the EMR ETL Engine product.
  3. Choose Launch.

The next page shows the available versions if the product has multiple versions. Because the engineer wants to try two different Amazon EMR versions, the administrator provided both options through product versions. You can launch the EMR cluster with the required version by selecting your preferred product version.

  1. Enter the name of the product.
  2. For this post, select EMR 5.29.0.
  3. Choose Next.

  1. For JobType, choose Spark.
  2. For JobArtifacts, enter the following value (you can get these values from the AWS CloudFormation output):
s3://blog-emr-sc-<account-id>/scripts/spark_converter.py s3://amazon-reviews-pds/tsv/amazon_reviews_us_Toys_v1_00.tsv.gz s3://blog-emr-sc-<account-id>/spark/
  1. Choose Next.

Based on your configuration, an EMR cluster launches. When the cluster is ready, the Spark job runs.

  1. In a different browser, sign in as emr-admin using the ConsoleLoginURL (from the AWS CloudFormation output).

You can see the cluster status, job status, and output path from the Amazon EMR console.

Now, go to the Amazon S3 console to check the output path:

The Parquet files are written inside the Spark folder.

  1. To test the Hive job, go back to the first browser where you already signed in as emr-data-engineer.
  2. Choose Provisioned products list.
  3. Choose the product options menu (right-click) and choose Update provisioned product.

  1. On the next page, you can select a different version or the same version.
  2. In the Parameters section, choose Hive.
  3. In the JobArtifacts field, enter the following Hive parameters:
s3://blog-emr-sc-<account-id>/scripts/hive_converter.sql -d INPUT=s3://amazon-reviews-pds/tsv/ -d OUTPUT=s3://blog-emr-sc-<account-id>/hive/
  1. Choose Update.

If you select the same version, AWS Service Catalog compares the old provisioned product with the updated product and only runs the portion that you changed. For this post, I chose the same Amazon EMR version and only updated the job type and parameters. You can see that the same EMR cluster is still there, but on the Steps tab, a new step is executed for Hive.
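For completeness, the following is a hedged sketch of the same update performed from the AWS CLI; the provisioned product name, provisioning artifact ID, and the JobType parameter key are assumptions based on the console walkthrough and may differ in your template.

# Hedged sketch: update the provisioned product in place, switching the job
# type to Hive. Repeat Key=<name>,UsePreviousValue=true for every parameter
# you want to keep unchanged.
aws servicecatalog update-provisioned-product \
  --provisioned-product-name <engine-product-name> \
  --provisioning-artifact-id pa-xxxxxxxxxxxx \
  --provisioning-parameters Key=JobType,Value=Hive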

  1. On the Amazon S3 console in the second browser, verify that a new hive folder is created with data that represents the top toys based on Amazon reviews.

To recap, you saw how to use AWS Service Catalog to provide a product to run your ETL jobs. Your data engineers can focus on their ETL scripts and your platform can self-serve them to run their ETL jobs on the EMR cluster.

Use case 3: Automatically scaling the Hive cluster for data analysts

To automatically scale the Hive cluster for data analysts, complete the following steps:

  1. Using the ConsoleLoginURL from the AWS CloudFormation output, sign in as emr-data-analyst and go to the AWS Service Catalog console.

You can see a different set of products for this user.

For this use case, your data analysts want an automatically scaling EMR cluster with the Hive application. The administrator set up the Auto-scaling EMR product with preconfigured scaling rules.

  1. Choose Auto-scaling EMR.
  2. Enter a provisioned product name.
  3. Select Hive Auto-scaling.
  4. Choose Next.
  5. In the Parameters section, leave the options at their default and enter a cluster name.
  6. Launch the product.

The product owner also provided a client URL (for example, Hue URL) through the product output so business analysts can connect to it.

  1. Sign in as emr-admin and validate that this new cluster is configured with the expected automatic scaling rules.
  2. On the Amazon EMR console, choose the cluster.

You can see the configuration on the Hardware tab.
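A hedged CLI alternative for the same check is below; the cluster ID is a placeholder, and the query assumes the scaling rules were attached as an instance group automatic scaling policy.

# Hedged sketch: show the automatic scaling policy attached to the cluster's
# instance groups; replace the placeholder cluster ID.
aws emr list-instance-groups --cluster-id j-XXXXXXXXXXXX \
  --query "InstanceGroups[].{Name:Name,AutoScalingPolicy:AutoScalingPolicy}"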

In this use case, you learned how to use AWS Service Catalog to provide business analyst users a preconfigured, automatically scaled EMR cluster.

(Optional) Setting up AWS Service Catalog for Amazon EMR using AWS CLI

In the previous section, I demonstrated the solution using the AWS Service Catalog console. In this section, I show you how to use AWS Service Catalog with the AWS CLI to create products and portfolios, assign IAM principals, and launch products.

  1. Create a portfolio named CLI – Stack for the user emr-admin. See the following command:
aws --region us-east-1 servicecatalog create-portfolio --display-name "CLI - Stack" --provider-name "@emr-admin" --description "Sample stack for pre-defined EMR clusters"

You receive a JSON output.

  1. Record the portfolio ID port-xxxxxxxx from the output to use later.

The emr-admin user is the provider for this portfolio. The user was created with power user access, so they can see the full AWS Service Catalog console and can manage products and portfolios.

You can associate this portfolio with multiple users. By assigning users to a portfolio, they can use the portfolio, browse through its products, and provision new products. For this use case, you associate the portfolio with emr-admin and your AWS CLI user name (the user that you used to configure the AWS CLI). Make sure to update the portfolio ID and AWS account ID.

  1. Enter the following code:
aws --region us-east-1 servicecatalog associate-principal-with-portfolio --portfolio-id port-xxxxxxxxxx --principal-type IAM --principal-arn arn:aws:iam::xxxxx:user/emr-admin

aws --region us-east-1 servicecatalog associate-principal-with-portfolio --portfolio-id port-xxxxxxxxxx --principal-type IAM --principal-arn arn:aws:iam::xxxxx:user/<aws-cli-user-name>
  1. To verify the portfolio to the user’s association, enter the following command with the portfolio ID:
aws --region us-east-1 servicecatalog list-principals-for-portfolio --portfolio-id port-xxxxxxxxx

This command lists the principals associated with the portfolio, as shown in the following figure:

The CloudFormation template already copied the Amazon EMR template into your account at the Amazon S3 path s3://blog-emr-sc-<account-id>/products.

  1. To create the product CLI - Sample EMR using that template from Amazon S3, enter the following command:
aws --region us-east-1 servicecatalog create-product --name "CLI - Sample EMR" --owner "@emr-admin" --description "Sample EMR cluster with default" --product-type CLOUD_FORMATION_TEMPLATE --provisioning-artifact-parameters '{"Name": "Initial revision", "Description": "", "Info":{"LoadTemplateFromURL":"https://s3.amazonaws.com/blog-emr-sc-<account-id>/products/sample-cluster.template"},"Type":"CLOUD_FORMATION_TEMPLATE"}'

  1. Record the product ID and provisioning artifact ID from the JSON output.

You now have a product and a portfolio. A portfolio can have one to many products, and each product can have multiple versions.

  1. To assign the CLI - Sample EMR product to the portfolio you created in Step 1, enter the following command:
aws --region us-east-1 servicecatalog associate-product-with-portfolio --product-id prod-xxxxxx --portfolio-id port-xxxxxx

A launch constraint specifies the IAM role that AWS Service Catalog assumes when an end-user launches a product. With a launch constraint, you can control end-user access to your AWS resources and limit usage.

The CloudFormation template already created the role Blog-SCLaunchRole; create a launch constraint using that IAM role. Use the portfolio and product IDs that you collected from the previous step and your AWS account ID.

  1. To create the launch constraint, enter the following command:
aws --region us-east-1 servicecatalog create-constraint --type LAUNCH --portfolio-id port-xxxxxx --product-id prod-xxxxxx --parameters '{"RoleArn" : "arn:aws:iam::<account-id>:role/Blog-SCLaunchRole"}'

  1. Record the launch constraint ID to use later.

You now have an AWS Service Catalog product that you can use to provision an EMR cluster. The CloudFormation template that you used to create the CLI - Sample EMR product takes three parameters (ClusterName, ComputeRequirements, ClusterSize).

  1. To pass those three parameters as key-value pairs, enter the following command (use the product ID and provisioning artifact ID that you recorded earlier):
aws --region us-east-1 servicecatalog provision-product --product-id prod-xxxxxx --provisioning-artifact-id pa-xxxxx --provisioned-product-name cli-emr --provisioning-parameters Key=ClusterName,Value=cli-emr-cluster Key=ComputeRequirements,Value=CPU Key=ClusterSize,Value=2

  1. Check the provisioned product’s status by using the provisioned product ID:
aws --region us-east-1 servicecatalog describe-provisioned-product --id pp-xxxxx

To recap, in this section you learned how to use AWS Service Catalog CLI to configure AWS Service Catalog products and portfolios, and how to provision an EMR cluster through AWS Service Catalog product.

Cleaning up

To clean up the resources you created, complete the following steps:

  1. Terminate the product that you provisioned in the previous step:
aws --region us-east-1 servicecatalog terminate-provisioned-product --provisioned-product-id pp-xxxxx
  1. Disassociate the product CLI – Sample EMR from the portfolio CLI – Stack:
aws --region us-east-1 servicecatalog disassociate-product-from-portfolio --product-id prod-xxxxx --portfolio-id port-xxxxx
  1. Disassociate IAM principals from the portfolio CLI – Stack:
aws --region us-east-1 servicecatalog disassociate-principal-from-portfolio --portfolio-id port-xxxxx --principal-arn arn:aws:iam::xxxxxx:user/emr-admin

aws --region us-east-1 servicecatalog disassociate-principal-from-portfolio --portfolio-id port-xxxxx --principal-arn arn:aws:iam::xxxxxx:user/<aws-cli-user-name> 
  1. Delete the launch constraint created in the previous step:
aws --region us-east-1 servicecatalog delete-constraint --id cons-xxxxx
  1. Delete the product CLI – Sample EMR:
aws --region us-east-1 servicecatalog delete-product --id prod-xxxxx
  1. Delete the portfolio CLI – Stack:
aws --region us-east-1 servicecatalog delete-portfolio --id port-xxxxx

Cleaning up additional resources

You must also clean up the resources you created with the CloudFormation template.

  1. On the AWS Service Catalog console, choose Provisioned products list.
  2. Terminate each product that you provisioned for these use cases.
  3. Check each of the users and their provisioned products to make sure they’re terminated.
  4. On the Amazon S3 console, empty the bucket blog-emr-sc-<account-id>.
  5. If you are using the AWS CLI, delete the objects in the blog-emr-sc-<account-id> bucket with the following command (make sure you’re running this command on the correct bucket):
aws s3 rm s3://blog-emr-sc-<account-id> --recursive
  1. If you ran the optional AWS CLI section, make sure you follow the cleanup process mentioned in that section.
  2. On the AWS CloudFormation console or AWS CLI, delete the stack named Blog-EMR-Service-Catalog.

Next steps

To enhance this solution, you can explore the following options:

  • In this post, I enforced resource tagging through AWS CloudFormation. You can also use the AWS Service Catalog TagOptions library to provide a consistent taxonomy and tagging of AWS Service Catalog resources. During a product launch (provisioning), AWS Service Catalog aggregates the associated portfolio and product TagOptions and applies them to the provisioned product; a hedged CLI sketch follows this list.
  • This solution demonstrates the usage of launch constraints and how you can provide limited access to your AWS resources to your users. You can also use template constraints to manage parameters. Template constraints make sure that end-users only have options that you allow them when launching products. This can help you maintain your organization’s compliance requirements.
  • You can integrate AWS Budgets with AWS Service Catalog. By associating AWS Budgets with your products and portfolios, you can track your usage and service costs. You can set a custom budget for each of the portfolios and trigger alerts when your costs exceed your threshold.
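As a hedged sketch of the TagOptions idea mentioned in the first bullet, the following CLI calls create a TagOption and attach it to a portfolio; the CostCenter key, its value, and the IDs are illustrative placeholders.

# Hedged sketch: create a TagOption and associate it with a portfolio so that
# every product launched from it is tagged consistently. The key, value, and
# IDs are hypothetical placeholders.
aws servicecatalog create-tag-option --key CostCenter --value 1234
aws servicecatalog associate-tag-option-with-resource \
  --resource-id port-xxxxxxxxxxxx \
  --tag-option-id tag-xxxxxxxxxxxx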

Summary

In this post, I showed you how to simplify your Amazon EMR provisioning process using AWS Service Catalog, how to make Amazon EMR a self-service platform for your end users, and how to enforce best practices and compliance for your EMR clusters. You also walked through three different use cases and implemented solutions with AWS Service Catalog. Give this solution a try and share your experience with us!

 


About the Author

Tanzir Musabbir is a Data & Analytics Architect with AWS. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.