Tag Archives: Best practices

Using VPC Sharing for a Cost-Effective Multi-Account Microservice Architecture

Post Syndicated from Anandprasanna Gaitonde original https://aws.amazon.com/blogs/architecture/using-vpc-sharing-for-a-cost-effective-multi-account-microservice-architecture/

Introduction

Many cloud-native organizations building modern applications have adopted a microservice architecture because of its flexibility, performance, and scalability. Even customers with legacy and monolithic application stacks are embarking on an application modernization journey and opting for this type of architecture. A microservice architecture allows applications to be composed of several loosely coupled discrete services that are independently deployable, scalable, and maintainable. These applications can comprise a large number of microservices, which often span multiple business units within an organization. These customers typically have a multi-account AWS environment, with each AWS account belonging to an individual business unit. Their microservice implementations reside in the Virtual Private Clouds (VPCs) of their respective AWS accounts. You can set up a multi-account AWS environment incorporating best practices using AWS Landing Zone or AWS Control Tower.

This type of multi-account, multi-VPC architecture provides a good boundary and isolation for individual microservices and achieves a highly available, scalable, and secure architecture. However, for microservices that require a high degree of interconnectivity and are within the same trust boundaries, you can use other AWS capabilities to optimize cost and network management complexity.

This blog presents a cost-effective approach that requires less VPC management while still using separate accounts for billing and access control. This approach does not sacrifice scalability, high availability, fault tolerance, and security. To achieve a similar microservice architecture, you can share a VPC across AWS accounts using AWS Resource Access Manager (AWS RAM) and Network Load Balancer (NLB) support in a shared Amazon Virtual Private Cloud (VPC). This allows multiple microservices to coexist in the same VPC, even though they are developed by different business units.

Microservices architecture in a multi-VPC approach

In this architecture, microservices deployed across multiple VPCs use privately exposed endpoints for a better security posture instead of going over the internet. This requires customers to enable inter-VPC communication using the various networking capabilities of AWS, as shown below:

microservices deployed across multiple VPCs use privately exposed endpoints

In the above reference architecture, we created a VPC in Account A, which is hosting the front end of the application across a fleet of Amazon Elastic Compute Cloud (Amazon EC2) instances using an AWS Auto Scaling group. For simplicity, we’ve illustrated a single public and private subnet for the application front end. In reality, this spans multiple subnets across multiple Availability Zones (AZs) to support a highly available and fault-tolerant configuration.

To ensure security, the application must communicate privately with microservices mS1 and mS2, deployed in the VPCs of Account B and Account C respectively. For high availability, these microservices are also implemented using a fleet of Amazon EC2 instances, with the Auto Scaling group spanning multiple subnets/Availability Zones. For high-performance load balancing, they are fronted by a Network Load Balancer.

While this architecture shows an implementation using Amazon EC2, it can also use containerized services deployed using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). These microservices may have interdependencies and invoke each other’s APIs to service requests from the application layer. This application-to-mS and mS-to-mS communication can be achieved using the following connectivity options:

When only a few VPC interconnections are required, Amazon VPC peering and AWS PrivateLink may be viable options. For a higher number of VPC interconnections, we recommend AWS Transit Gateway for better manageability of connections and routing through a centralized resource. However, depending on the amount of traffic, this can introduce significant costs to your architecture.

Alternative approach to microservice architecture using Network Load Balancers in a shared VPC

The above architecture pattern allows your individual microservice teams to continue to own their AWS resources that host their microservice implementation. But they can deploy them in a shared VPC owned by the central account, eliminating the need for inter-VPC network connections. You can share Amazon VPCs to use the implicit routing within a VPC for applications that require a high degree of interconnectivity and are within the same trust boundaries.

This architecture uses AWS RAM, which allows you to share the VPC Subnets from AWS Account A to participating AWS accounts within your AWS organization. When the subnets are shared, participant AWS accounts (Account B and Account C) can see the shared subnets in their own environment. They can then deploy their Amazon EC2 instances in those subnets. This is depicted in the diagram where the visibility of the shared subnets (SS1 and SS2) is extended to the participating accounts (Account B and Account C).
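
As a rough sketch of the sharing step itself, the subnet share can be created with the AWS RAM API. The Python (boto3) example below is illustrative only; the share name, subnet ARN, and participant account IDs are placeholders rather than values from this diagram, and it assumes the accounts belong to the same AWS Organization with RAM sharing enabled.

import boto3

# Run with credentials for the VPC owner (Account A).
ram = boto3.client("ram")

share = ram.create_resource_share(
    name="shared-microservice-subnets",  # placeholder share name
    resourceArns=[
        "arn:aws:ec2:us-east-1:111111111111:subnet/subnet-0123456789abcdef0"  # placeholder ARN for shared subnet SS1
    ],
    principals=["222222222222", "333333333333"],  # placeholder IDs for participant Accounts B and C
    allowExternalPrincipals=False,  # keep sharing within the AWS Organization
)

print(share["resourceShare"]["resourceShareArn"])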

You can also deploy the NLB in these shared subnets. Then, each participant account owns all the AWS resources for their microservice stack, but it’s deployed in the VPC of Account A.

This allows your individual microservice teams to maintain control over load balancer configurations and Auto Scaling policies based on their specific microservices’ needs. At the same time, using AWS RAM, they can effectively use the existing VPC environment of Account A.

This architecture presents several benefits over the multi-VPC architecture discussed earlier:

  • You can deploy the entire application, including the individual microservices, into a single shared VPC, while still allowing individual microservice teams control over their AWS resources deployed in that VPC.
  • Since the entire architecture now resides in a single VPC, it doesn’t require other networking connectivity features. It can rely on intra-VPC traffic for communication between the application (API) layer and microservices.
  • This leads to a reduction in the cost of the architecture. The AWS RAM functionality is free of charge, and this approach also reduces the data transfer and per-connection costs incurred by other options such as VPC peering, AWS PrivateLink, and AWS Transit Gateway.
  • This maintains the isolation across the individual microservices and the application layer. Participants can’t view, modify, or delete resources that belong to others or the VPC owner.
  • This also leads to effective utilization of your VPC CIDR block resources.
  • Since multiple subnets belonging to different Availability Zones are shared, the application and individual microservices continue to take advantage of scalability, availability, and fault tolerance.

The following illustration shows how you can configure AWS RAM to set up the VPC subnet resource shares between owner Account A and participating Account B. The example below shows the sharing of private subnet SS1 using this method:


Accounts A and B Resource Share

Once this subnet is shared, the participating Account B can launch the Network Load Balancer for its microservice mS1 in the shared VPC subnet, as shown below:

Account B can launch its Network Load Balancer of its microservice ms1 in the shared VPC subnet

While this architecture has many advantages, there are important considerations:

  • This style of architecture is suitable when you are certain that the number of microservices is small enough to coexist in a single VPC without depleting the CIDR block of the shared subnets of the VPC.
  • If the traffic between these microservices is insignificant, then the cost benefit of this architecture over other options may not be substantial. This is due to the effect of traffic flow on data transfer cost.

Conclusion

AWS Cloud provides several options to build a microservices architecture. It is important to look at the characteristics of your application to determine which architectural choices to opt for. AWS RAM and the ability to deploy AWS resources (including Network Load Balancers) in a shared VPC help you eliminate inter-VPC traffic and the associated networking costs, without sacrificing high availability, scalability, fault tolerance, or security for your application.

10 things you can do today to reduce AWS costs

Post Syndicated from Roshni Pary original https://aws.amazon.com/blogs/compute/10-things-you-can-do-today-to-reduce-aws-costs/

This post is contributed by Shankar Ramachandran, SA Specialist, Cost Optimization

Introduction

AWS’s breadth of services and pricing options offers the flexibility to effectively manage your costs while still keeping the performance and capacity your business requires. The fundamental process of cost optimization on AWS remains the same: monitor your AWS costs and usage, analyze the data to find savings, and take action to realize those savings. In this blog, however, I take a more tactical approach to reducing cost in response to changes in user demand.

Before you start

Before taking any actions to reduce cost, find out the costs of AWS services you are consuming. The AWS Free Tier provides customers the ability to explore and try out AWS services free of charge up to specified limits for each service. Use the steps in this video to check if you are exceeding the free tier limit.

Next, use AWS Cost Explorer to view and analyze your AWS costs and usage. This tool provides default reports that help you visualize cost and usage at a high level (e.g., AWS accounts, AWS service) or at the resource level (e.g., EC2 instance ID). Start by identifying the top accounts incurring costs using the “Monthly costs by linked account” report. Next, identify the top services that contribute to the costs within those accounts using the “Monthly costs by service” report. Use the hourly and resource-level granularity and tags to filter and identify the top resources incurring costs.
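
If you prefer to pull the same data programmatically, the Cost Explorer API exposes a get_cost_and_usage call. The Python (boto3) sketch below groups one month of unblended cost by service; the billing period dates are placeholders, and the output is intended only as a starting point for your own analysis.

import boto3

ce = boto3.client("ce")

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2020-04-01", "End": "2020-05-01"},  # placeholder billing period
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],  # break cost down by service
)

for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"][0], group["Metrics"]["UnblendedCost"]["Amount"])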

Cost Explorer resource level granularity

Now, you should have an understanding of your AWS costs and usage. Next, let’s walk through 10 tactical things you can do today using existing AWS tools and services to reduce your AWS costs.

#1 Identify Amazon EC2 instances with low-utilization and reduce cost by stopping or rightsizing

Use AWS Cost Explorer Resource Optimization to get a report of EC2 instances that are either idle or have low utilization. You can reduce costs by either stopping or downsizing these instances. Use AWS Instance Scheduler to automatically stop instances. Use AWS Operations Conductor to automatically resize the EC2 instances (based on the recommendations report from Cost Explorer).
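
If you prefer a quick script over the managed solutions above, stopping instances flagged as idle is a single API call. The Python (boto3) sketch below uses placeholder instance IDs; substitute the IDs from your own low-utilization report.

import boto3

ec2 = boto3.client("ec2")

# Placeholder IDs: substitute the instances flagged as idle in your Cost Explorer report.
idle_instances = ["i-0123456789abcdef0", "i-0fedcba9876543210"]

# Stopping (not terminating) keeps the EBS volumes, so the instances can be restarted later if needed.
ec2.stop_instances(InstanceIds=idle_instances)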

Cost Explorer rightsizing recommendations

Use AWS Compute Optimizer to look at instance type recommendations beyond downsizing within an instance family. It gives downsizing recommendations within or across instance families, upsizing recommendations to remove performance bottlenecks, and recommendations for EC2 instances that are part of an Auto Scaling group.

#2 Identify Amazon EBS volumes with low-utilization and reduce cost by snapshotting then deleting them

EBS volumes that have very low activity (less than 1 IOPS per day) over a period of 7 days indicate that they are probably not in use. Identify these volumes using the Trusted Advisor Underutilized Amazon EBS Volumes Check. To reduce costs, first snapshot the volume (in case you need it later), then delete these volumes. You can automate the creation of snapshots using the Amazon Data Lifecycle Manager. Follow the steps here to delete EBS volumes.
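
A minimal sketch of the snapshot-then-delete sequence in Python (boto3), assuming a placeholder volume ID and that the volume is already detached:

import boto3

ec2 = boto3.client("ec2")
volume_id = "vol-0123456789abcdef0"  # placeholder: a volume flagged as underutilized

# Snapshot first so the data can be restored later if needed.
snapshot = ec2.create_snapshot(VolumeId=volume_id, Description="Backup before deleting idle volume")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot["SnapshotId"]])

# Delete the volume only after the snapshot has completed (the volume must be detached).
ec2.delete_volume(VolumeId=volume_id)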

#3 Analyze Amazon S3 usage and reduce cost by leveraging lower cost storage tiers

Use S3 Analytics to analyze storage access patterns on an object data set for 30 days or longer. It makes recommendations on where you can leverage S3 Standard-Infrequent Access (S3 Standard-IA) to reduce costs. You can automate moving these objects into a lower-cost storage tier using lifecycle policies. Alternatively, you can use S3 Intelligent-Tiering, which automatically analyzes and moves your objects to the appropriate storage tier.
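
A minimal lifecycle-policy sketch in Python (boto3) that transitions all objects to S3 Standard-IA after 30 days; the bucket name, rule ID, and prefix are placeholders, and further transitions (for example to a Glacier storage class) can be added the same way.

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "move-to-ia-after-30-days",  # placeholder rule name
                "Filter": {"Prefix": ""},          # empty prefix applies the rule to all objects
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"}
                ],
            }
        ]
    },
)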

#4 Identify Amazon RDS, Amazon Redshift instances with low utilization and reduce cost by stopping (RDS) and pausing (Redshift)

Use the Trusted Advisor Amazon RDS Idle DB instances check to identify DB instances which have not had any connection over the last 7 days. To reduce costs, stop these DB instances using the automation steps described in this blog post. For Redshift, use the Trusted Advisor Underutilized Redshift clusters check to identify clusters which have had no connections for the last 7 days and less than 5% cluster-wide average CPU utilization for 99% of the last 7 days. To reduce costs, pause these clusters using the steps in this blog.
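
For reference, the corresponding API calls can be sketched as follows in Python (boto3); the identifiers are placeholders for the instances and clusters flagged by Trusted Advisor.

import boto3

rds = boto3.client("rds")
redshift = boto3.client("redshift")

# Stop an idle RDS DB instance (it can be started again later).
rds.stop_db_instance(DBInstanceIdentifier="my-idle-db-instance")  # placeholder identifier

# Pause an underutilized Redshift cluster (compute billing stops while it is paused).
redshift.pause_cluster(ClusterIdentifier="my-underutilized-cluster")  # placeholder identifier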

Pause underutilized Redshift clusters

#5 Analyze Amazon DynamoDB usage and reduce cost by leveraging Autoscaling or On-demand

Analyze your DynamoDB usage by monitoring two metrics, ConsumedReadCapacityUnits and ConsumedWriteCapacityUnits, in CloudWatch. To automatically scale (in and out) your DynamoDB table, use the auto scaling feature. Using the steps here, you can enable auto scaling on your existing tables. Alternatively, you can use the on-demand option. This option allows you to pay per request for read and write requests so that you only pay for what you use, making it easy to balance costs and performance.
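
As a sketch, enabling auto scaling on an existing table’s read capacity looks roughly like the following in Python (boto3); the table name, capacity bounds, and target utilization are placeholders, and write capacity is configured the same way with the corresponding dimension and predefined metric.

import boto3

autoscaling = boto3.client("application-autoscaling")
table = "table/MyTable"  # placeholder table name

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId=table,
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=100,
)

# Target-tracking policy that aims for 70% consumed read capacity.
autoscaling.put_scaling_policy(
    ServiceNamespace="dynamodb",
    ResourceId=table,
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyName="read-capacity-utilization",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {"PredefinedMetricType": "DynamoDBReadCapacityUtilization"},
    },
)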

DynamoDB read/write capacity mode

#6 Review networking and reduce costs by deleting idle load balancers

Use the Trusted Advisor Idle Load Balancers check to get a report of load balancers that have a RequestCount of less than 100 over the past 7 days. Then, use the steps here to delete these load balancers and reduce costs. Additionally, use the steps provided in this blog to review your data transfer costs using Cost Explorer.
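
Once you have confirmed a load balancer is genuinely unused, deleting it is a single API call. A minimal Python (boto3) sketch follows; the ARN is a placeholder, and Classic Load Balancers use the elb API with a name instead of an ARN.

import boto3

elbv2 = boto3.client("elbv2")

# Placeholder ARN: substitute a load balancer flagged as idle by Trusted Advisor.
elbv2.delete_load_balancer(
    LoadBalancerArn="arn:aws:elasticloadbalancing:us-east-1:111111111111:loadbalancer/app/idle-alb/0123456789abcdef"
)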

Cost Explorer filters for EC2 Data Transfer

If data transfer from EC2 to the public internet shows up as a significant cost, consider using Amazon CloudFront. Any image, video, or static web content can be cached at AWS edge locations worldwide, using the Amazon CloudFront Content Delivery Network (CDN). CloudFront eliminates the need to over-provision capacity in order to serve potential spikes in traffic.

#7 Use Amazon EC2 Spot Instances to reduce EC2 costs

If your workload is fault-tolerant, use Spot Instances to reduce costs by up to 90%. Typical workload examples include big data, containerized workloads, CI/CD, web servers, high-performance computing (HPC), and other test & development workloads. Using EC2 Auto Scaling, you can launch both On-Demand and Spot Instances to meet a target capacity. Auto Scaling automatically takes care of requesting Spot Instances and attempts to maintain the target capacity even if your Spot Instances are interrupted. You can learn more about Spot by watching this 2019 re:Invent session.
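
As a rough sketch of the mixed On-Demand/Spot approach in Python (boto3), the following creates an Auto Scaling group that diversifies across several instance types and lets EC2 Auto Scaling request Spot capacity. The group name, launch template, subnets, instance types, and capacity split are all placeholders to adapt to your workload.

import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="spot-web-fleet",  # placeholder names and subnets throughout
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=4,
    VPCZoneIdentifier="subnet-0123456789abcdef0,subnet-0fedcba9876543210",
    MixedInstancesPolicy={
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {"LaunchTemplateName": "web-template", "Version": "$Latest"},
            # Diversify across several instance types to reduce the impact of Spot interruptions.
            "Overrides": [{"InstanceType": "m5.large"}, {"InstanceType": "m5a.large"}, {"InstanceType": "m4.large"}],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": 1,                  # always keep one On-Demand instance
            "OnDemandPercentageAboveBaseCapacity": 25,  # 25% On-Demand, 75% Spot above the base
            "SpotAllocationStrategy": "capacity-optimized",
        },
    },
)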

#8 Review and modify EC2 AutoScaling Groups configuration

An EC2 Auto Scaling group allows your EC2 fleet to expand or shrink based on demand. Review your scaling activity using either the describe-scaling-activities CLI command or the console, using the steps described here. Analyze the results to see if the scaling policy can be tuned to add instances less aggressively. Also review your settings to see if the minimum capacity can be reduced so that end-user requests are still served, but with a smaller fleet size.
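
A sketch of the equivalent API calls in Python (boto3); the group name and new minimum are placeholders, and you should validate any new minimum against your actual demand pattern before applying it.

import boto3

autoscaling = boto3.client("autoscaling")
group_name = "my-web-asg"  # placeholder Auto Scaling group name

# Review recent scaling activity to understand how aggressively the group scales.
activities = autoscaling.describe_scaling_activities(AutoScalingGroupName=group_name)["Activities"]
for activity in activities[:10]:
    print(activity["StartTime"], activity["Description"])

# If analysis shows the fleet is over-provisioned at its floor, lower the minimum size.
autoscaling.update_auto_scaling_group(AutoScalingGroupName=group_name, MinSize=1)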

#9 Use Reserved Instances (RI) to reduce RDS, Redshift, ElastiCache and Elasticsearch costs

Use one year, no upfront RIs to get a discount of up to 42% compared to On-Demand pricing. Use the recommendations provided in AWS Cost Explorer RI purchase recommendations, which are based on your RDS, Redshift, ElastiCache, and Elasticsearch usage. Make sure you adjust the parameters to one year, no upfront. This requires a one-year commitment, but the break-even point is typically seven to nine months. I recommend doing #4 before #9.

#10 Use Compute Savings Plans to reduce EC2, Fargate and Lambda costs

Compute Savings Plans automatically apply to EC2 instance usage regardless of instance family, size, AZ, region, OS or tenancy, and also apply to Fargate and Lambda usage. Use one year, no upfront Compute Savings Plans to get a discount of up to 54% compared to On-Demand pricing. Use the recommendations provided in AWS Cost Explorer, and ensure that you have chosen compute, one year, no upfront options. Once you sign up for Savings Plans, your compute usage is automatically charged at the discounted Savings Plans prices. Any usage beyond your commitment will be charged at regular On Demand rates. I recommend doing #1 before #10.

Savings Plans Recommendations

What’s Next

With these 10 steps, you can save costs on EC2, Fargate, Lambda, EBS, S3, ELB, RDS, Redshift, DynamoDB, ElastiCache, and Elasticsearch. I recommend setting up a budget using AWS Budgets so that you get alerted when your cost and usage change.

Set a budget to track your AWS cost and usage

 

Setup an alert on forecasted costs

Using Budgets, you can also set up an alert on forecasted costs (in addition to actual costs). This gives you the ability to get ahead of the problem and reduce costs proactively.
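
A minimal sketch of creating a monthly cost budget with an alert on forecasted spend in Python (boto3); the account ID, budget amount, threshold, and email address are placeholders.

import boto3

budgets = boto3.client("budgets")

budgets.create_budget(
    AccountId="111111111111",  # placeholder account ID
    Budget={
        "BudgetName": "monthly-cost-budget",
        "BudgetLimit": {"Amount": "1000", "Unit": "USD"},  # placeholder limit
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            # Alert when forecasted spend exceeds 100% of the budgeted amount.
            "Notification": {
                "NotificationType": "FORECASTED",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 100.0,
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [{"SubscriptionType": "EMAIL", "Address": "you@example.com"}],  # placeholder address
        }
    ],
)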

Conclusion

To learn more quick cost optimization techniques, watch the on-demand webinar – Nine Ways to Reduce Your AWS Bill. We are here to help you; please reach out to AWS Support and your AWS account team if you need further assistance in optimizing your AWS environment.

Building well-architected serverless applications: Understanding application health – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-understanding-application-health-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the nine serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the Introduction post for a table of contents and an explanation of the example application.

Question OPS1: How do you evaluate your serverless application’s health?

Evaluating your metrics, distributed tracing, and logging gives you insight into business and operational events, and helps you understand which services should be optimized to improve your customer’s experience. By understanding the health of your Serverless Application, you will know whether it is functioning as expected, and can proactively react to any signals that indicate it is becoming unhealthy.

Required practice: Understand, analyze, and alert on metrics provided out of the box

It is important to understand metrics for every AWS service used in your application so you can decide how to measure its behavior. AWS services provide a number of out-of-the-box standard metrics to help monitor the operational health of your application.

As these metrics are generated automatically, they are a simple way to start monitoring your application and can be augmented with custom metrics.

The first stage is to identify which services the application uses. The airline booking component uses AWS Step Functions, AWS Lambda, Amazon SNS, and Amazon DynamoDB.

When I make a booking, as shown in the Introduction post, AWS services emit metrics to Amazon CloudWatch. These are processed asynchronously without impacting the application’s performance.

There are two default CloudWatch dashboards to visualize key metrics quickly: per service and cross service.

Per service

To view the per service metrics dashboard, I open the CloudWatch console.

per-service-metrics-dashboard

I select a service where Overview is shown, such as Lambda. Now I can view the metrics for all Lambda functions in the account.

per-service-metrics-lambda

Cross service

To see an overview of key metrics across all AWS services, open the CloudWatch console and choose View cross service dashboard.

cross-service-metrics-dashboard

I see a list of all services with one or two key metrics displayed. This provides a good overview of all services your application uses.

Alerting

The next stage is to identify the key metrics for comparison and set up alerts for under- and over-performing services. Here are some recommended metrics to alarm on for a number of AWS services.

Alerts can be configured manually or via infrastructure as code tools such as the AWS Serverless Application Model, AWS CloudFormation, or third-party tools.

To configure a manual alert for Lambda function errors using CloudWatch Alarms:

  1. I open the CloudWatch console, select Alarms, and then choose Create Alarm.
  2. I choose Select metric and, from AWS Namespaces, select Lambda, Across All Functions, choose Errors, and then choose Select metric.
  3. I change the Statistic to Sum and the Period to 1 minute.
  4. Under Conditions, I select a Static threshold of Greater than 1 and choose Next.

Alarms can also be created using anomaly detection rather than static values if there is a discernible pattern or trend. Anomaly detection looks at past metric data and uses machine learning to create a model of expected values. Alerts can then be configured if they fall outside this band of “normal” values. I use a Static threshold for this alarm.

  5. For the notification, I set the alarm to notify an existing SNS topic with my email address, then choose Next.
  6. I enter a descriptive alarm name such as serverlessairline-lambda-prod-errors > 1, select Next, and choose Create alarm.

I have now manually set up an alarm.
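
The same alarm can also be created with a single API call instead of the console steps above. Here is a hedged Python (boto3) sketch; the SNS topic ARN and region are placeholders.

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="serverlessairline-lambda-prod-errors > 1",
    Namespace="AWS/Lambda",
    MetricName="Errors",  # no dimensions, so this tracks errors across all functions in the account
    Statistic="Sum",
    Period=60,            # 1 minute
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111111111111:my-alert-topic"],  # placeholder SNS topic ARN
)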

Use CloudWatch composite alarms to combine multiple alarms to reduce noise and focus on critical issues. For example, a single alarm could trigger if there are both Lambda function errors as well as high Lambda concurrent executions.

It is simpler and more scalable to include alerting within infrastructure as code. Here is an example of alerting programmatically using CloudFormation.

In this example, I view the out-of-the-box standard metrics and manually create an alarm for Lambda function errors.

Improvement plan summary:

  1. Understand what metrics and dimensions each managed service you use provides.
  2. Configure alerts on relevant metrics for when services are unhealthy.

Good practice: Use structured and centralized logging

Central logging provides a single place to search and analyze logs. Structured logging means selecting a consistent log format and content structure to simplify querying across multiple components.

To identify a business transaction across components, such as a particular flight booking, log operational information from upstream and downstream services. Add information such as customer_id along with business outcomes such as order=accepted or order=confirmed. Make sure you are not logging any sensitive or personal identifying data in any logs.

Use JSON as your logging output format. Log multiple fields in a single object or dictionary rather than many one line messages for simpler searching.

Here is an example of a structured logging format.
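
As an illustration (the field names below are hypothetical, not the exact sample from the original post), a structured entry produced from Python with the standard logging and json modules might look like the following:

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# One JSON object per log line keeps entries easy to filter and query centrally.
logger.info(json.dumps({
    "service": "booking",            # component name
    "operation": "confirm_booking",
    "correlation_id": "3c1a2b9f",    # placeholder correlation identifier
    "customer_id": "customer-1234",  # placeholder business identifiers
    "booking_id": "booking-5678",
    "order": "confirmed",            # business outcome
}))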

The airline booking component, which is written in Python, currently uses a shared library with a separate log processing stack.

Embedded Metrics Format is a simpler mechanism to replace the shared library and use structured logging. CloudWatch Embedded Metrics adds environmental metadata such as Lambda Function version and also automatically extracts custom metrics so you can visualize and alarm on them. There are open-source client libraries available for Node.js and Python.

I then add embedded metrics to the individual confirm booking module with the following steps:

  1. I install the aws-embedded-metrics library using the instructions.
  2. In the function init code, I import the module and create a metric_scope with the following code

from aws_embedded_metrics import metric_scope

# Decorating the handler with @metric_scope injects a "metrics" helper object as an argument.
@metric_scope

  3. In the function handler, I log the generated bookingReference with the following code.

metrics.set_property("BookingReference", ret["bookingReference"])

In this example I also log the entire incoming event details.

metrics.set_property("event", event)

It is best practice to only log what is required to avoid unnecessary costs. Ensure the event does not have any sensitive or personal identifying data which is available to anyone who has access to the logs.

To avoid duplicate logging in this example airline application, which adds cost, I remove the existing shared library logger.*() lines.

When I make a booking, the CloudWatch log message is in structured JSON format. It contains the properties I set (event and BookingReference), as well as function metadata.

I can then search for all log activity related to a specific booking across multiple functions with booking_id. I can track customer activity across multiple bookings using customer_id.

Logging is often created as a shared library resource which all functions reference. Another option is using Lambda Layers, which lets functions import additional code such as external libraries. Multiple functions can share this code.

Improvement plan summary:

  1. Log request identifiers from downstream services, component name, component runtime information, unique correlation identifiers, and information that helps identify a business transaction.
  2. Use JSON as the logging output. Prefer logging entire objects/dictionaries rather than many one line messages. Mask or remove sensitive data when logging.
  3. Keep debug logging to a minimum, as it can both incur costs and increase the noise-to-signal ratio.

Conclusion

Evaluating serverless application health helps understand which services should be optimized to improve your customer’s experience. I cover out of the box metrics and alerts, as well as structured and centralized logging.

This well-architected question will be continued in an upcoming post where I look at custom metrics and distributed tracing.

Building well-architected serverless applications: Introduction

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-introduction/

Customers building production applications in the cloud are looking to build and operate their applications following best practices. Best practices are useful throughout the software development lifecycle and help to answer the question “am I well-architected?”

Serverless technologies provide a solid foundation for building well-architected applications with a goal to reduce and minimize the impact of issues that can happen. What does it mean to be “well architected”?

In 2015, AWS released the AWS Well-Architected Framework.  It’s a structured way to compare applications against AWS architectural best practices with advice on how to improve. This formalized framework publicly documents the approach our solutions architects use when conducting architectural reviews. The Well-Architected Framework can be used by customers and partners to evaluate applications. It’s currently based on five pillars:

  • Operational Excellence
  • Security
  • Reliability
  • Performance Efficiency
  • Cost Optimization

In 2017, AWS extended the framework’s general advice to be more application domain specific with the concept of a “lens”. A lens adds additional questions for specific technology areas, which focus on what is different from the generic advice. Today, there are three technology domain lenses:

In 2018, AWS added the AWS Well-Architected Tool within the AWS Management Console. The tool allows you to define a Workload and answer a number of questions based on each of the five pillars. Each question has context to explain what the question means and why it is important, and then provides a number of best practices.

well-architected-tool-question

The tool is used to assess risks and find opportunities for improvement in workloads. It can also provide a broader view across multiple applications for a central team. Each question has a number of best practices to follow, categorized as high or medium risk. This can help you decide where to focus.

Serverless Lens

In February, we added functionality to apply lenses to the Well-Architected Tool. The first one is the Serverless Lens. This includes nine questions, each with additional best practice recommendations to follow.

serverless-lens-in-console

Applying best practices to production applications may require more time and effort. The Well-Architected Tool and Serverless Lens are available to help and are not intended to be only a one-time check.

We suggest, at a minimum, reading the whitepapers and being familiar with the questions before the design phase. It’s a good idea to check throughout the application lifecycle: halfway (or sooner), close to launch, and post-launch for each iteration. Although it is never too late to add best practices to your applications, starting early reduces the remediation effort and makes best practices part of the full application lifecycle journey.

The following posts are a multi-part series addressing each of the questions within the Serverless Lens of the Well-Architected Tool.

Series Episodes: Building well-architected serverless applications

  1. Introduction
  2. Operational Excellence: Understanding serverless application health
    1. Out of the box metrics and alerts; structured and centralized logging
    2. Custom metrics and distributed tracing (upcoming)
  3. Operational Excellence – Understanding serverless application health (upcoming)
  4. Operational Excellence – Approaching application lifecycle management (upcoming)
  5. Security – Controlling access to serverless APIs (upcoming)
  6. Security – Managing serverless security boundaries (upcoming)
  7. Security – Implementing application security (upcoming)
  8. Reliability – Regulating inbound request rates (upcoming)
  9. Reliability – Building resiliency into serverless applications (upcoming)
  10. Performance Efficiency – Optimizing your Serverless Application Performance (upcoming)
  11. Cost Optimization – Optimizing your Serverless Application Costs (upcoming)

Example application

For this series, I am looking at an example application, AWS Serverless Airline Booking. This is a complete web application for a fictional airline booking service built using serverless technologies that provide Flight Search, Payment, Booking, and Loyalty Point features.

I show how to use the Well-Architected Tool with the Serverless Lens to help apply serverless best practices to the application.

airline-mobile-example

This web application is the theme of Season 2 of Build on Serverless, which is a video series on Twitch.tv, and was presented at AWS re:Invent 2019.

You can view this application directly on GitHub or deploy into an AWS account using the Getting Started instructions. This is not required to follow this Building Well-Architected Serverless Applications series but allows you to explore the code.

The application architecture consists of a frontend web application, which interacts with a number of backend microservices to provide the airline booking functionality.

airline-architecture

There are four backend services that make up the application:

  • Catalog – Provides flight search.
  • Booking – Provides new and list bookings. Business workflow implemented in Python.
  • Payment – Provides payment authorization, collection, and refund workflows using Stripe. Business workflow implemented in Python.
  • Loyalty – Provides loyalty points for customers, including tiers. Implemented in TypeScript.

Once the application is deployed, I can create a booking by searching for flights.

Once I select an available flight, I enter payment details using a test Stripe credit card.

The application authorizes the payment, confirms, and stores the booking and then updates the loyalty points.

Next steps

Building serverless applications using best practices can give you more confidence in the architecture and operations of your workloads.

Using the Well-Architected Tool and Serverless Lens can give you more visibility into your applications. They can help pinpoint and rank areas to improve. Well-Architected works best when integrated into your application lifecycle processes.

Continue learning in the next post, which dives into the first Well-Architected Serverless Lens question: Operational Excellence – Understanding Serverless Application Health – part1.

Top 10 security items to improve in your AWS account

Post Syndicated from Nathan Case original https://aws.amazon.com/blogs/security/top-10-security-items-to-improve-in-your-aws-account/

If you’re looking to improve your cloud security, a good place to start is to follow the top 10 most important cloud security tips that Stephen Schmidt, Chief Information Security Officer for AWS, laid out at AWS re:Invent 2019. Below are the tips, expanded to help you take action.

10 most important security tips

1) Accurate account information

When AWS needs to contact you about your AWS account, we use the contact information defined in the AWS Management Console, including the email address used to create the account and those listed under Alternate Contacts. All email addresses should be set up to go to aliases that are not dependent on a single person. You should also have a process for regularly checking that these email addresses work, and that you are responding to emails—especially security notifications you might receive from AWS. Learn how to set the alternate contacts to help ensure someone is receiving important messages, even when you are unavailable.

Alternative Contacts user interface

2) Use multi-factor authentication (MFA)

MFA is the best way to protect accounts from inappropriate access. Always set up MFA on your Root user and AWS Identity and Access Management (IAM) users. If you use AWS Single Sign-On (SSO) to control access to AWS or to federate your corporate identity store, you can enforce MFA there. Implementing MFA at the federated identity provider (IdP) means that you can take advantage of existing MFA processes in your organization. To get started, see Using Multi-Factor Authentication (MFA) in AWS.

3) No hard-coding secrets

When you build applications on AWS, you can use AWS IAM roles to deliver temporary, short-lived credentials for calling AWS services. However, some applications require longer-lived credentials, such as database passwords or other API keys. If this is the case, you should never hard code these secrets in the application or store them in source code.

You can use AWS Secrets Manager to control the information in your application. Secrets Manager allows you to rotate, manage, and retrieve database credentials, API keys, and other secrets through their lifecycle. Users and applications can retrieve secrets with a call to Secrets Manager APIs, eliminating the need to hard code sensitive information in plain text.
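
For example, an application can fetch a database password at runtime rather than embedding it in code. A minimal Python (boto3) sketch follows; the secret name is a placeholder, and it assumes the secret value is stored as a JSON string with username and password keys.

import json
import boto3

secrets = boto3.client("secretsmanager")

# Placeholder secret name: the value is retrieved at runtime, never hard coded.
secret = secrets.get_secret_value(SecretId="prod/myapp/db-credentials")
credentials = json.loads(secret["SecretString"])  # assumes a JSON-formatted secret

db_user = credentials["username"]
db_password = credentials["password"]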

You should also learn how to use AWS IAM roles for applications running on Amazon EC2. Also, for best results, learn how to securely provide database credentials to AWS Lambda functions by using AWS Secrets Manager.

4) Limit security groups

Security groups are a key way that you can enable network access to resources you have provisioned on AWS. Ensuring that only the required ports are open and the connection is enabled from known network ranges is a foundational approach to security. You can use services such as AWS Config or AWS Firewall Manager to programmatically ensure that the virtual private cloud (VPC) security group configuration is what you intended. The Amazon Inspector Network Reachability rules package analyzes your Amazon Virtual Private Cloud (Amazon VPC) network configuration to determine whether your Amazon EC2 instances can be reached from external networks, such as the Internet, a virtual private gateway, or AWS Direct Connect. AWS Firewall Manager can also be used to automatically apply AWS WAF rules to internet-facing resources across your AWS accounts. Learn more about detecting and responding to changes in VPC Security Groups.
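
To spot-check for security groups open to the world, a simple script like the following Python (boto3) sketch can complement AWS Config or Firewall Manager; the filter lists any group with at least one rule allowing 0.0.0.0/0.

import boto3

ec2 = boto3.client("ec2")

# List security groups that have at least one rule open to the entire internet.
open_groups = ec2.describe_security_groups(
    Filters=[{"Name": "ip-permission.cidr", "Values": ["0.0.0.0/0"]}]
)

for group in open_groups["SecurityGroups"]:
    print(group["GroupId"], group["GroupName"])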

5) Intentional data policies

Not all data is created equal, which means classifying data properly is crucial to its security. It’s important to accommodate the complex tradeoffs between a strict security posture and a flexible agile environment. A strict security posture, which requires lengthy access-control procedures, creates stronger guarantees about data security. However, such a security posture can work counter to agile and fast-paced development environments, where developers require self-service access to data stores. Design your approach to data classification to meet a broad range of access requirements.

How you classify data doesn’t have to be as binary as public or private. Data comes in various degrees of sensitivity and you might have data that falls in all of the different levels of sensitivity and confidentiality. Design your data security controls with an appropriate mix of preventative and detective controls to match data sensitivity appropriately. In the suggestions below, we deal mostly with the difference between public and private data. If you have no classification policy currently, public versus private is a good place to start.

To protect your data once it has been classified, or while you are classifying it:

  1. If you have Amazon Simple Storage Service (Amazon S3) buckets that are for public usage, move all of that data into a separate AWS account set aside for public access. Set up policies to allow only processes — not humans — to move data into those buckets. This lets you block the ability to make a public Amazon S3 bucket in any other AWS account.
  2. Use Amazon S3 Block Public Access in any account that should not be able to share data through Amazon S3.
  3. Use two different IAM roles for encryption and decryption with KMS. This lets you separate the data entry (encryption) and data review (decryption), and it allows you to do threat detection on the failed decryption attempts by analyzing that role.

6) Centralize CloudTrail logs

Logging and monitoring are important parts of a robust security plan. Being able to investigate unexpected changes in your environment or perform analysis to iterate on your security posture relies on having access to data. AWS recommends that you write logs, especially AWS CloudTrail, to an S3 bucket in an AWS account designated for logging (Log Archive). The permissions on the bucket should prevent deletion of the logs, and they should also be encrypted at rest. Once the logs are centralized, you can integrate with SIEM solutions or use AWS services to analyze them. Learn how to use AWS services to visualize AWS CloudTrail logs. Once you have CloudTrail logs centralized, you can also use the same Log Archive account to centralize logs from other sources, such as CloudWatch Logs and AWS load balancers.
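
A minimal sketch of creating an organization-wide trail that writes to a central Log Archive bucket, in Python (boto3). The trail and bucket names are placeholders, the bucket must already have a policy that allows CloudTrail to write to it, and the organization-trail option requires AWS Organizations and management-account permissions.

import boto3

cloudtrail = boto3.client("cloudtrail")

# The bucket lives in the dedicated Log Archive account and must already grant CloudTrail write access.
cloudtrail.create_trail(
    Name="org-trail",                    # placeholder trail name
    S3BucketName="central-log-archive",  # placeholder bucket in the Log Archive account
    IsMultiRegionTrail=True,
    IsOrganizationTrail=True,
)

cloudtrail.start_logging(Name="org-trail")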

7) Validate IAM roles

As you operate your AWS accounts to iterate and build capability, you may end up creating multiple IAM roles that you discover later you don’t need. Use AWS IAM Access Analyzer to review access to your internal AWS resources and determine where you have shared access outside your AWS accounts. Routinely reevaluating AWS IAM roles and permissions with Security Hub or open source products such as Prowler will give you the visibility needed to validate compliance with your Governance, Risk, and Compliance (GRC) policies. If you’re already past this point, and have already created multiple roles, you can search for unused IAM roles and remove them.

8) Take actions on findings (This isn’t just GuardDuty anymore!)

AWS Security Hub, Amazon GuardDuty, and AWS Identity and Access Management Access Analyzer are managed AWS services that provide you with actionable findings in your AWS accounts. They are easy to turn on and can integrate across multiple accounts. Turning them on is the first step. You also need to take action when you see findings. The action(s) to take are determined by your own incident response policy. For each finding, ensure that you have determined what your required response actions should be.

Action can be notifying a human to respond, but as you get more experienced in AWS services, you will want to automate the response to the findings generated by Security Hub or GuardDuty. Learn more about how to automate your response and remediation from Security Hub findings.

9) Rotate keys

One of the things that Security Hub provides is a view of the compliance posture of your AWS accounts using the CIS Benchmarks. One of these checks is to look for IAM users with access keys more than 90 days old. If you need to use access keys rather than roles, you should rotate them regularly. Review best practices for managing AWS access keys for more guidance. If your users access AWS via federation, then you can remove the need to issue AWS access keys for your users. Users authenticate to the IdP and assume an IAM role in the target AWS account. The result is that long-term credentials are not needed, and your user will have short-term credentials associated with an IAM role.
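
A hedged sketch of one rotation step in Python (boto3): create a new key, and once applications have switched to it, deactivate and later delete the old one. The user name and key ID are placeholders.

import boto3

iam = boto3.client("iam")
user = "app-user"  # placeholder IAM user name

# Step 1: create a replacement key and distribute it to the application.
new_key = iam.create_access_key(UserName=user)["AccessKey"]
print("New key:", new_key["AccessKeyId"])

# Step 2: after confirming the application uses the new key, deactivate the old one.
iam.update_access_key(UserName=user, AccessKeyId="AKIAOLDKEYEXAMPLE", Status="Inactive")  # placeholder key ID

# Step 3 (later): delete the inactive key once you are sure nothing depends on it.
# iam.delete_access_key(UserName=user, AccessKeyId="AKIAOLDKEYEXAMPLE")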

10) Be involved in the dev cycle

All of the guidance to this point has been focused on the technology configuration that you can implement. The last piece of advice, “be involved in the dev cycle,” is about people, and can be broadly summarized as “raise the security culture of your organization.” The role of people in all parts of the organization is to help the business launch their solutions securely. As people focused on security, we can guide and educate the rest of our organization to understand what they need to do to raise the bar for security in everything they build. Security is everyone’s job — not just for those folks with it in their job title.

What the security people in every organization can do is to make security easier, by shifting the process to make the easiest and most desirable action one that is also the most secure. For example, each team should not build their own identity federation or logging solution. We are stronger when we work together, and this applies to securing the cloud as well. The goal is to make security more approachable so that co-workers want to talk to the security team because they know it is the place to get help. For more about creating this type of security team, read Cultivating Security Leadership.

Now that you’ve revisited the top 10 things to make your cloud more secure, make sure you have them set up in your AWS accounts — and go build securely!



Nathan Case

Nathan is a Security Strategist, Geek. He joined AWS in 2016. You can learn more about him here.

Event-Based Processing for Asynchronous Communication

Post Syndicated from Roberto Iturralde original https://aws.amazon.com/blogs/architecture/event-based-processing-for-asynchronous-communication/

In the first post in this series on messaging patterns, we gave an overview of messaging and the benefits and challenges of both synchronous and asynchronous service communication. In this post, we will look at common characteristics to consider when evaluating messaging channel technologies. We will also introduce Amazon EventBridge, the AWS Serverless event bus.

What is an Event?

An event is the occurrence of an immutable message describing something that happened in the past. The event data payload typically contains details of the occurrence and associated metadata. Events are routed from a producer to potential consumers via a transportation channel, as defined in the previous post in this series. This event may be delivered to zero, one, or many consumers, depending on the number of consumers subscribed to that transportation channel, and possibly further limited to the subscribed consumers who have filtered their subscription to particular events.

Below is an example of an AWS event. AWS events are emitted by AWS services, are represented as JSON objects, require all the top-level metadata fields shown below, and contain a source-specific payload in the “detail” field.

Sample AWS event

  "version": "0",
  "id": "6a7e8feb-b491-4cf7-a9f1-bf3703467718",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "111122223333",
  "time": "2017-12-22T18:43:48Z",
  "region": "us-west-1",
  "resources": [
    "arn:aws:ec2:us-west-1:123456789012:instance/ i-1234567890abcdef0"
  ],
  "detail": {
    "instance-id": " i-1234567890abcdef0",
    "state": "terminated"
  }
}

Event Schemas

Looking at the above sample event and the documentation on AWS events, you’ll notice there is a schema for the top-level fields of all events and separate schemas for the source-specific attributes in the “detail” field. This is because multiple AWS services emit events to the Amazon EventBridge default message bus. Unlike HTTP APIs, which often have a published schema and validate requests against that schema, it’s common for event transportation channels to allow heterogeneous events through the channel, like the default bus in EventBridge.

To simplify the experience of defining, discovering, and developing against event schemas, Amazon EventBridge has a Schema Registry. The registry is pre-populated with OpenAPI schemas for events from existing AWS services and allows you to define your own schemas. It also supports downloading code bindings for popular programming languages for schemas in the registry.

Event channel characteristics

When evaluating an event channel technology against your business and technical requirements, keep in mind the following common characteristics. This list is not exhaustive, but may serve as a useful starting point for orienting yourself to each technology.

  • Ordering: Are events guaranteed to be delivered in the order they arrive?
  • Duplication: Can duplicate events be introduced? Will duplicate events in the channel be de-duplicated?
  • Fanout: Can each event be delivered to or processed by multiple consumers?
  • Push versus pull: Are events sent to consumers or do consumers need to fetch events?
  • Filtering: Can event consumers choose to receive a filtered subset of the events passing through the channel?
  • Serverless: Does the underlying infrastructure need to be managed and tuned by the channel owner?

Amazon EventBridge

Amazon EventBridge custom application

Amazon EventBridge is a serverless, fully managed event bus. Producers can be AWS services, supported Software as a Service (SaaS) partners, or your own applications that write JSON messages to an EventBridge event bus. Messages are delivered to consumers configured as the targets of event rules. Event rules can filter messages based on attributes in the JSON payload, and a single rule can route events to multiple targets. Event rules retry delivery of events to configured targets with exponential backoff for up to 24 hours. EventBridge does not guarantee event order will be maintained and promises at-least-once event delivery, meaning duplicate messages can be introduced. Consider EventBridge for use cases where you need rich event filtering, fanout of events to tens of consumers, message delivery or consumption across AWS accounts, to listen to events from a wide variety of AWS services, or to receive messages from SaaS partner systems.
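
As an illustration, the Python (boto3) sketch below creates a rule on the default bus that filters for EC2 instance state-change events like the sample shown earlier and routes them to a consumer. The Lambda function ARN is a placeholder, and the resource-based policy that permits EventBridge to invoke the function is not shown.

import json
import boto3

events = boto3.client("events")

# Filter the default bus for EC2 instance state-change notifications (matches the sample event above).
events.put_rule(
    Name="ec2-state-change",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
    }),
    State="ENABLED",
)

# Route matching events to a consumer; the target ARN is a placeholder Lambda function.
events.put_targets(
    Rule="ec2-state-change",
    Targets=[{"Id": "notify-lambda", "Arn": "arn:aws:lambda:us-east-1:111111111111:function:handle-ec2-events"}],
)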

Conclusion

In this post, we defined events and outlined criteria against which to evaluate event transportation technologies. We then briefly reviewed Amazon EventBridge against these criteria. For a deeper discussion on eventing technologies, I recommend watching the recording of the session Moving to Event-Driven Architectures by AWS Distinguished Engineer, Tim Bray, from AWS re:Invent 2019. For a deep dive on Amazon EventBridge, watch this AWS Tech Talk. For an example of integrating Amazon EventBridge into a serverless architecture, review this blog post that walks through an example application.

Introduction to Messaging for Modern Cloud Architecture

Post Syndicated from Sam Dengler original https://aws.amazon.com/blogs/architecture/introduction-to-messaging-for-modern-cloud-architecture/

We hope you’ve enjoyed reading our posts on best practices for your serverless applications. The posts in the following series will focus on best practices when introducing messaging patterns into your applications. Let’s review some core messaging concepts and see how they can be used to address challenges when designing modern cloud architectures.

Introduction

Applications can communicate information with each other using messages, a mechanism for packaging a data payload and associated metadata. The application that sends a message is called the producer and the application that receives the message is called the consumer. Producers and consumers exchange messages using a variety of transportation channels, for example point-to-point requests, message queues, subscription topics, or event buses. These transportation channels have different characteristics that make them useful when implementing message communication patterns. Dependencies emerge when producers and consumers exchange messages, which is called coupling.

Synchronous Communication

synchronous communication

Message communication is called synchronous when the producer sends a message to the consumer and waits for a response before the producer continues its processing logic. An example of synchronous communication over a point-to-point channel is when an HTTP client makes a request to an HTTP service, waits for the service to process the request, and then applies logic to the HTTP response to determine how to proceed.

Synchronous communication patterns are more straightforward to implement; however, they create tight coupling between producers and consumers. Tight coupling can cause problems due to traffic spikes and failures propagating directly throughout the application. For example, in a three-tier architecture, when the application experiences a spike in client traffic, the web tier directly translates the traffic spike as pressure on downstream resources (the logic and data tiers), which may not scale to meet the sudden demand. Likewise, downstream resource failure in the logic or data tier directly impacts the web tier from responding to client requests. Applications can mimic a synchronous experience, for example a status spinner, using asynchronous communication with a polling or push notification strategy.

Asynchronous Communication

Asynchronous communication

Message communication is called asynchronous when the producer sends a message to the consumer and proceeds without waiting for the response. An example of asynchronous communication over a message queue channel is when a client publishes a message to a queue, and after the queue acknowledges receipt of the message, the publisher proceeds without waiting for the consumer to process the message.

Asynchronous communication patterns are implemented using transportation channels such as queues, topics, and event buses to create loose coupling between producers and consumers. Loose coupling increases an architecture’s resiliency to failure and ability to handle traffic spikes because it creates an indirection between producer and consumer communication, enabling them to operate independently of each other. Using the three-tier architecture example, a message queue can be introduced between the web, logic, and data tiers to enable each to scale independently of each other. When the application experiences a spike in client traffic, the web tier translates the traffic spike as more messages to the queue for processing, however the logic tier may continue to process messages off the queue without being directly impacted.

Considerations and Next Steps

Although asynchronous communication patterns can benefit modern cloud architectures, there are tradeoffs to consider. Asynchronous messaging adds latency to end-to-end processing time due to the addition of middleware. Producers and consumers take a dependency on the middleware stack, which must also scale to meet demand and be resilient to failure. Care must be taken to appropriately configure producers, consumers, and middleware to handle errors so that messages are not lost, more monitoring is required to ensure proper operations, and multiple logs must be correlated to troubleshoot and diagnose problems.

Amazon MQ, Amazon Kinesis, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), and Amazon EventBridge are highly available, large-scale, failure-resistant managed services that can be used to implement asynchronous messaging patterns. You can explore these services at the AWS Messaging page and their integration into Serverless Architectures in the free new digital course, Architecting Serverless Solutions. You can also visit the AWS Event-Driven Architecture page to learn how to apply messaging patterns to build event-driven solutions. The upcoming posts in this series will explore these AWS services to help ensure message patterns are implemented using best practices when applied to modern cloud architecture.

Identifying and resolving security code vulnerabilities using Snyk in AWS CI/CD Pipeline

Post Syndicated from Jay Yeras original https://aws.amazon.com/blogs/devops/identifying-and-resolving-vulnerabilities-in-your-code/

The majority of companies have embraced open-source software (OSS) at an accelerated rate even when building proprietary applications. Some of the obvious benefits for this shift include transparency, cost, flexibility, and a faster time to market. Snyk’s unique combination of developer-first tooling and best in class security depth enables businesses to easily build security into their continuous development process.

Even for teams building proprietary code, use of open-source packages and libraries is a necessity. In reality, a developer’s own code is often a small core within the app, and the rest is open-source software. While relying on third-party elements has obvious benefits, it also presents numerous complexities. Inadvertently introducing vulnerabilities into your codebase through repositories that are maintained in a distributed fashion and with widely varying levels of security expertise can be common, and opens up applications to effective attacks downstream.

There are three common barriers to truly effective open-source security:

  1. The security task remains in the realm of security and compliance, often perpetuating the siloed structure that DevOps strives to eliminate and slowing down release pace.
  2. Current practice may offer automated scanning of repositories, but the remediation advice it provides is manual and often un-actionable.
  3. The data generated often focuses solely on public sources, without unique and timely insights.

Developer-led application security

This blog post demonstrates techniques to improve your application security posture using Snyk tools to seamlessly integrate within the developer workflow using AWS services such as Amazon ECR, AWS Lambda, AWS CodePipeline, and AWS CodeBuild. Snyk is a SaaS offering that organizations use to find, fix, prevent, and monitor open source dependencies. Snyk is a developer-first platform that can be easily integrated into the Software Development Lifecycle (SDLC). The examples presented in this post enable you to actively scan code checked into source code management, container images, and serverless, creating a highly efficient and effective method of managing the risk inherent to open source dependencies.

Prerequisites

The examples provided in this post assume that you already have an AWS account and that your account has the ability to create new IAM roles and scope other IAM permissions. You can use your integrated development environment (IDE) of choice. The examples reference AWS Cloud9 cloud-based IDE. An AWS Quick Start for Cloud9 is available to quickly deploy to either a new or existing Amazon VPC and offers expandable Amazon EBS volume size.

Sample code and AWS CloudFormation templates are available to simplify provisioning the various services you need to configure this integration. You can fork or clone those resources. You also need a working knowledge of git and how to fork or clone within your source provider to complete these tasks.

cd ~/environment && \ 
git clone https://github.com/aws-samples/aws-modernization-with-snyk.git modernization-workshop 
cd modernization-workshop 
git submodule init 
git submodule update

Configure your CI/CD pipeline

The workflow for this example consists of a continuous integration and continuous delivery pipeline leveraging AWS CodeCommit, AWS CodePipeline, AWS CodeBuild, Amazon ECR, and AWS Fargate, as shown in the following screenshot.

CI/CD Pipeline

For simplicity, AWS CloudFormation templates are available in the sample repo for services.yaml, pipeline.yaml, and ecs-fargate.yaml, which deploy all services necessary for this example.

Launch AWS CloudFormation templates

A detailed step-by-step guide can be found in the self-paced workshop, but if you are familiar with AWS CloudFormation, you can launch the templates in three steps. From your Cloud9 IDE terminal, change directory to the location of the sample templates and complete the following three steps.

1) Launch basic services

aws cloudformation create-stack --stack-name WorkshopServices --template-body file://services.yaml \
--capabilities CAPABILITY_NAMED_IAM && \
until [[ `aws cloudformation describe-stacks \
--stack-name "WorkshopServices" --query "Stacks[0].[StackStatus]" \
--output text` == "CREATE_COMPLETE" ]]; do echo "The stack is NOT in a state of CREATE_COMPLETE at `date`"; sleep 30; done && echo "The Stack is built at `date` - Please proceed"

2) Launch Fargate:

aws cloudformation create-stack --stack-name WorkshopECS --template-body file://ecs-fargate.yaml \
--capabilities CAPABILITY_NAMED_IAM && \
until [[ `aws cloudformation describe-stacks \
--stack-name "WorkshopECS" --query "Stacks[0].[StackStatus]" \
--output text` == "CREATE_COMPLETE" ]]; do echo "The stack is NOT in a state of CREATE_COMPLETE at `date`"; sleep 30; done && echo "The Stack is built at `date` - Please proceed"

3) From your Cloud9 IDE terminal, change directory to the location of the sample templates and run the following command:

aws cloudformation create-stack --stack-name WorkshopPipeline --template-body file://pipeline.yaml \
--capabilities CAPABILITY_NAMED_IAM && \
until [[ `aws cloudformation describe-stacks \
--stack-name "WorkshopPipeline" --query "Stacks[0].[StackStatus]" \
--output text` == "CREATE_COMPLETE" ]]; do echo "The stack is NOT in a state of CREATE_COMPLETE at `date`"; sleep 30; done && echo "The Stack is built at `date` - Please proceed"
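If you prefer not to hand-roll the polling loop, the AWS CLI also ships a built-in waiter that blocks until a stack finishes creating; a minimal sketch for the first stack (the same pattern applies to the other two):

# block until the WorkshopServices stack reports CREATE_COMPLETE (the waiter exits non-zero if creation fails)
aws cloudformation wait stack-create-complete --stack-name WorkshopServices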

Improving your security posture

You need to sign up for a free account with Snyk. You can use your Google, Bitbucket, or GitHub credentials to sign up; Snyk uses these services for authentication and does not store your password. Once signed up, navigate to your name and select Account Settings. Under API Token, choose Show to reveal the token, and copy this value. It is unique for each user.

Save your token in AWS Systems Manager Parameter Store

Run the following command, replacing abc123 with your unique token. This stores the token as a SecureString in AWS Systems Manager Parameter Store.

aws ssm put-parameter --name "snykAuthToken" --value "abc123" --type SecureString
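Optionally, you can confirm the token was stored correctly before moving on; this is purely a verification step, not part of the pipeline itself:

# retrieve and decrypt the stored token to verify the value (prints the token to your terminal)
aws ssm get-parameter --name "snykAuthToken" --with-decryption --query Parameter.Value --output text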

Set up application scanning

Next, you need to insert testing with Snyk after Maven builds the application. The simplest method is to insert commands to download, authorize, and run Snyk after Maven has built the application and its dependency tree.

The sample Dockerfile sets an environment variable from a value passed to the docker build command, which contains the token for Snyk. Because the token is exposed as an environment variable, Snyk detects it automatically when it runs.

#~~~~~~~SNYK Variable~~~~~~~~~~~~
# Declare snyk_auth_token as a build-arg
ARG snyk_auth_token
# Set the SNYK_TOKEN environment variable
ENV SNYK_TOKEN=${snyk_auth_token}
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
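If you want a quick local check before wiring this into the pipeline, you could build the image yourself and pass the token as the build-arg. This is only a sketch: the image tag petstore_frontend:local and the SNYK_TOKEN shell variable holding your token are assumptions for illustration.

# hypothetical local build; SNYK_TOKEN is assumed to hold the token copied from your Snyk account settings
docker build --build-arg snyk_auth_token=$SNYK_TOKEN -t petstore_frontend:local .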

Download Snyk, and run a test, looking for medium to high severity issues. If the build succeeds, post the results to Snyk for monitoring and reporting. If a new vulnerability is found, you are notified.

# package the application
RUN mvn package -Dmaven.test.skip=true

#~~~~~~~SNYK test~~~~~~~~~~~~
# download, configure and run snyk. Break build if vulns present, post results to `https://snyk.io/`
RUN curl -Lo ./snyk "https://github.com/snyk/snyk/releases/download/v1.210.0/snyk-linux"
RUN chmod -R +x ./snyk
#Auth set through environment variable
RUN ./snyk test --severity-threshold=medium
RUN ./snyk monitor

Set up docker scanning

Later in the build process, a docker image is created. Analyze it for vulnerabilities in buildspec.yml. First, pull the Snyk token snykAuthToken from the parameter store.

env:
  parameter-store:
    SNYK_AUTH_TOKEN: "snykAuthToken"

Next, in the prebuild phase, install Snyk.

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
      - $(aws ecr get-login --region $AWS_DEFAULT_REGION --no-include-email)
      - REPOSITORY_URI=$(aws ecr describe-repositories --repository-name petstore_frontend --query=repositories[0].repositoryUri --output=text)
      - COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - IMAGE_TAG=${COMMIT_HASH:=latest}
      - PWD=$(pwd)
      - PWDUTILS=$(pwd)
      - curl -Lo ./snyk "https://github.com/snyk/snyk/releases/download/v1.210.0/snyk-linux"
      - chmod -R +x ./snyk

Next, in the build phase, pass the token to the docker build command, where it is retrieved in the Dockerfile code you set up to test the application.

build:
    commands:
      - echo Build started on `date`
      - echo Building the Docker image...
      - cd modules/containerize-application
      - docker build --build-arg snyk_auth_token=$SNYK_AUTH_TOKEN -t $REPOSITORY_URI:latest .

You can further extend the build phase to authorize Snyk and test the Docker image that’s produced. If the test passes, you can send the results to Snyk for monitoring and reporting.

build:
    commands:
      - $PWDUTILS/snyk auth $SNYK_AUTH_TOKEN
      - $PWDUTILS/snyk test --docker $REPOSITORY_URI:latest
      - $PWDUTILS/snyk monitor --docker $REPOSITORY_URI:latest
      - docker tag $REPOSITORY_URI:latest $REPOSITORY_URI:$IMAGE_TAG

For reference, a sample buildspec.yml configured with Snyk is available in the sample repo. You can either copy this file and overwrite your existing buildspec.yml or open an editor and replace the contents.

Testing the application

Now that the services have been provisioned and the Snyk tools have been integrated into your CI/CD pipeline, any new git commit triggers a fresh build, and the Snyk application scan detects vulnerabilities in your code.

In the CodeBuild console, you can look at your build history to see why your build failed, identify security vulnerabilities, and pinpoint how to fix them.

Testing /usr/src/app...
✗ Medium severity vulnerability found in org.primefaces:primefaces
Description: Cross-site Scripting (XSS)
Info: https://snyk.io/vuln/SNYK-JAVA-ORGPRIMEFACES-31642
Introduced through: org.primefaces:[email protected]
From: org.primefaces:[email protected]
Remediation:
Upgrade direct dependency org.primefaces:[email protected] to org.primefaces:[email protected] (triggers upgrades to org.primefaces:[email protected])
✗ Medium severity vulnerability found in org.primefaces:primefaces
Description: Cross-site Scripting (XSS)
Info: https://snyk.io/vuln/SNYK-JAVA-ORGPRIMEFACES-31643
Introduced through: org.primefaces:[email protected]
From: org.primefaces:[email protected]
Remediation:
Upgrade direct dependency org.primefaces:[email protected] to org.primefaces:[email protected] (triggers upgrades to org.primefaces:[email protected])
Organisation: sample-integrations
Package manager: maven
Target file: pom.xml
Open source: no
Project path: /usr/src/app
Tested 37 dependencies for known vulnerabilities, found 2 vulnerabilities, 2 vulnerable paths.
The command '/bin/sh -c ./snyk test' returned a non-zero code: 1
[Container] 2020/02/14 03:46:22 Command did not exit successfully docker build --build-arg snyk_auth_token=$SNYK_AUTH_TOKEN -t $REPOSITORY_URI:latest . exit status 1
[Container] 2020/02/14 03:46:22 Phase complete: BUILD Success: false
[Container] 2020/02/14 03:46:22 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: docker build --build-arg snyk_auth_token=$SNYK_AUTH_TOKEN -t $REPOSITORY_URI:latest .. Reason: exit status 1

Remediation

Once you remediate your vulnerabilities and check in your code, another build is triggered and an additional scan is performed by Snyk. This time, you should see the build pass with a status of Succeeded.

You can also drill down into the CodeBuild logs and see that Snyk successfully scanned the Docker Image and found no package dependency issues with your Docker container!

[Container] 2020/02/14 03:54:14 Running command $PWDUTILS/snyk test --docker $REPOSITORY_URI:latest
Testing 300326902600.dkr.ecr.us-west-2.amazonaws.com/petstore_frontend:latest...
Organisation: sample-integrations
Package manager: rpm
Docker image: 300326902600.dkr.ecr.us-west-2.amazonaws.com/petstore_frontend:latest
✓ Tested 190 dependencies for known vulnerabilities, no vulnerable paths found.

Reporting

Snyk provides detailed reports for your imported projects. You can navigate to Projects and choose View Report to set the frequency with which the project is checked for vulnerabilities. You can also choose View Report and then the Dependencies tab to see which libraries were used. Snyk offers a comprehensive database and remediation guidance for known vulnerabilities in their Vulnerability DB. Specifics on potential vulnerabilities that may exist in your code would be contingent on the particular open source dependencies used with your application.

Cleaning up

Remember to delete any resources you may have created in order to avoid additional costs. If you used the AWS CloudFormation templates provided here, you can safely remove them by deleting those stacks from the AWS CloudFormation Console.

Conclusion

In this post, you learned how to leverage various AWS services to build a fully automated CI/CD pipeline and cloud IDE development environment. You also learned how to utilize Snyk to seamlessly integrate with AWS and secure your open-source dependencies and container images. If you are interested in learning more about DevSecOps with Snyk and AWS, then I invite you to check out this workshop and watch this video.

 

About the Author


 

Jay is a Senior Partner Solutions Architect at AWS bringing over 20 years of experience in various technical roles. He holds a Master of Science degree in Computer Information Systems and is a subject matter expert and thought leader for strategic initiatives that help customers embrace a DevOps culture.

 

 

Transforming DevOps at Broadridge on AWS

Post Syndicated from Som Chatterjee original https://aws.amazon.com/blogs/devops/transforming-devops-for-a-fintech-on-aws/

by Tom Koukourdelis (Broadridge – Vice President, Head of Global Cloud Platform Development and Engineering), Sreedhar Reddy (Broadridge – Vice President, Enterprise Cloud Architecture)

We have seen large enterprises in all industry segments meaningfully utilizing AWS to build new capabilities and deliver business value. While doing so, enterprises have to balance existing systems, processes, tools, and culture while innovating at pace with industry disruptors. Broadridge Financial Solutions, Inc. (NYSE: BR) is no exception. Broadridge is a $4 billion global FinTech leader and a leading provider of investor communications and technology-driven solutions to banks, broker-dealers, asset and wealth managers, and corporate issuers.

This blog post explores how we adopted AWS at scale while being secured and compliant, as well as delivering a high degree of productivity for our builders on AWS. It also describes the steps we took to create technical (a cloud solution as a foundation based on AWS) and procedural (organizational) capabilities by leveraging AWS cloud adoption constructs. The improvement in our builder productivity and agility directly contributes to rolling out differentiated business capabilities addressing our customer needs in a timely manner. In this post, we share real-life learnings and takeaways to adopt AWS at scale, transform business and application team experiences, and deliver customer delight.

Background

At Broadridge we have a number of distributed and mainframe systems supporting multiple financial services domains and sub-domains such as post trade, proxy communications, financial and regulatory reporting, portfolio management, and financial operations. The majority of these systems were built and deployed years ago at on-premises data centers all over the US and abroad.

Builder personas at Broadridge are diverse in terms of location, culture, and the technology stack they use to build and support applications (a number of front-end JS frameworks, .NET, Java, and ColdFusion for web development; ORMs for data entity relational mapping; IBM MQ and Apache Camel for messaging; databases like SQL, Oracle, and Sybase, along with other open source stacks, for transaction management; and batch processing on virtualized and bare metal instances). With more than 200 on-premises distributed applications and mainframe systems across front-, mid-, and back-office ecosystems, we wanted to leverage AWS to improve efficiency, build agility, and reduce costs. The ability to reach customers in new geographies, reduced time to market, and opportunities to build new business competencies were key parameters as well.

Broadridge’s core tenets for cloud adoption

When AWS adoption within Broadridge attained a critical mass (known as the Foundation stage of adoption), the business and technology leadership teams defined our cloud adoption posture and shared it with teams across the organization as the following tenets. Enterprises looking to adopt AWS at scale should define similar tenets, fit for their organizations, in plain language understandable by everyone across the board.

  • Iterate: Understanding that we cannot disrupt ongoing initiatives, a small, iterative approach of moving workloads to the cloud in waves—rinse and repeat—was to be adopted. Long-drawn, capital-intensive big bangs were to be avoided.
  • Fully automate: Starting from infrastructure deployment to application build, test, and release, we decided early on that automation and no-touch deployment are the right approach both to leverage cloud capabilities and to fuel a shift toward a matured DevOps culture.
  • Trust but verify only exceptions: Security and regulatory compliance are paramount for an organization like Broadridge. Guardrails (such as service control policies, managed AWS Config rules, multi-account strategy) and controls (such as PCI, NIST control frameworks) are iteratively developed to baseline every AWS account and AWS resource deployed. Manual security verification of workloads isn’t needed unless an exception is raised. Defense in depth (distancing attack surface from sensitive data and resources using multi-layered security) strategies were to be applied.
  • Go fast; re-hosting is acceptable: Not every workload needs to go through years of rewriting and refactoring before it is deemed suitable for the cloud. Minor tweaking (light touch re-platforming) to go fast (such as on-premises Oracle to RDS for Oracle) is acceptable.
  • Timeliness and small wins are key: Organizations spend large sums of capital to completely rewrite applications, and by the time they are done, the business goals and customer expectations have changed. That leads to material customer dissatisfaction. We wanted to avoid that by setting small, measurable targets.
  • Cloud fluency: Investment in training and upskilling builders and leaders across the organization (developers, infra-ops, sec-ops, managers, sales force, HR, and executive leadership) was to be made to build fluency on the cloud.

The first milestone

The first milestone in our adoption journey was synonymous with the Project stage of adoption and had the following characteristics.

A controlled sprawl of shadow IT

We first gave small teams with little to no exposure to critical business functions (such as customer data and SLA-oriented workloads) sandboxes to test out proofs of concept (PoCs) on AWS. We created the cloud sandboxes with least privilege, and added additional privileges upon request after verification. During this time, our key AWS usage characteristics were:

  • Manual AWS account setup with least privilege
  • Manual IAM role creation with role boundaries and authentication and authorization from the existing enterprise Active Directory
  • Integration with existing Security Information and Event Management (SIEM) tools to audit role sprawl and config changes
  • Proofs of concepts only
  • Account tagging for chargeback and tracking purposes
  • No automated build, test, deploy, or integration with existing delivery pipeline
  • Small and definitive timeframes for PoCs with defined goals

A typical AWS environment at this stage will resemble that shown in the following diagram:

Representative AWS usage during first milestone

As shown above, at this time the corporate assets were connected to a highly restrictive AWS environment through VPN. Access to the AWS environment was set up based on AWS identity primitives, or IAM roles mapped to and federated with the on-premises Active Directory. There was a single VPC set up for a sandbox account with no egress to the internet. No customer data was hosted in this AWS environment, and the environment was connected to our SIEM of choice.

Early adopters became first educators and mentors

Members of the first teams to carry out proofs of concept on AWS shared learnings with each other and with the leadership team within Broadridge. This helped build communities of practices (CoPs) over time. Initial CoPs established were for networking and security, and were later extended to various practices like Terraform, Chef, and Jenkins.

Tech PMO team within Broadridge as the quasi-central cloud team

Ownership is vital no matter how small the effort and insignificant the impact of risky experimentation. The ownership of account setup, role creation, integration with on-premises AD and SIEM, and oversight to ensure that the experimentation does not pose any risk to the brand led us to build a central cloud team with experienced AWS and infrastructure practitioners. This team created a process for cloud migration with first manual guardrails of allowed and disallowed actions, manual interventions, and checkpoints built in every step.

At this stage, a representative pattern of work products across teams resembles what is shown below.

Work products across teams during initial stages of AWS usage

As the diagram suggests, individual application teams built overlapping—and, in many cases, identical—technical building blocks across the teams. This was acceptable as the teams were experimenting and running PoCs on AWS. In an actual production application delivery, the blocks marked with a * would be considered technical and functional waste—that is, undifferentiated heavy lifting that increases the cost of doing business.

The second milestone

In hindsight, this is perhaps the most important milestone in our cloud adoption journey. This step was marked with following key characteristics:

  • Every new team doing PoCs rebuilds the same building blocks: This includes networking (VPCs and security groups), identity primitives (accounts, roles, and policies), monitoring (Amazon CloudWatch setup and custom metrics), and compute (images with org-mandated security patches).
  • The teams usually ask the same fundamental questions first: These include questions such as: What is an ideal CIDR block range? How do we integrate with SIEM? How do we spin up web servers on Amazon EC2? How do we secure access to data? How do we set up workload monitoring?
  • Security reviews rarely find new security gaps but add time to the process: A central security group as part of the central cloud team reviewed every new account request and every new service usage request without finding new security gaps when the application team used the baseline guardrails.
  • Manual effort is spent on tagging, chargeback, and other approvals: A portion of the application PoC/minimum viable product (MVP) lifecycle was spent on housekeeping. While housekeeping was necessary, the effort spent was undifferentiated.

The following diagram represents the effort required of every team during the first phase.

Team wise efforts showing duplicative work

As shown above, every application team spent effort building nearly the same capabilities before it could begin developing its team-specific application functionality and assets. These common blocks of work are undifferentiated and lead to wasted effort, which also varies depending on the efficiency of the team.

During this step, learnings from the PoCs led us to establish the tenets shared earlier in this post. To address the learnings, Broadridge established a cloud platform team. The cloud platform team, also referred to as the cloud enablement engine (CEE), is a team of builders who create the foundational building blocks on AWS that address common infrastructure, security, monitoring, auditing, and break-glass controls. At the same time, we established a cloud business office (CBO) as a liaison between the application and business teams and the CEE. CBO exists to manage and prioritize foundational requirements from multiple application teams as they go online on AWS and helps create the product backlog for CEE.

Cloud Enablement Engine Responsibilities:

  • Build out foundational building blocks utilizing AWS multi-account strategy
  • Build security guardrails, compliance controls, infrastructure as code automation, auditing and monitoring controls
  • Implement the cloud platform backlog of common asks from application teams, funneled through the CBO
  • Work with our AWS team to understand service roadmap, future releases, and provide feedback

Cloud Business Office Responsibilities:

  • Identify and prioritize repeating technical building blocks that cut across multiple teams
  • Establish acceptable architecture patterns based on application use cases
  • Manage cloud programs to ensure CEE deliverables and business expectations align
  • Identify skilling needs, budget, and track spend
  • Contribute to the cloud platform backlog
  • Work with AWS team to understand service roadmap, future releases, and provide feedback

These teams were set up to scale AWS adoption, put building blocks into the hands of the applications teams, and ultimately deliver differentiated capabilities to Broadridge’s business teams and end customers. The following diagram translates the relationship and modus operandi among the teams:

CEE and CBO working model

Upon establishing the conceptual working model, the CBO and CEE teams looked at solutions from AWS to enable them to achieve the working model quickly. The starting point was AWS Landing Zone (ALZ). ALZ is an AWS solution based on the AWS multi-account strategy. It is a set of vetted constructs and best practices that we use as mechanisms to accelerate AWS adoption.

AWS multi-account strategy

The multi-account strategy employs best practices around separation of concerns, reduction of blast radius, account setup based on Software Development Life Cycle (SDLC) phases, and base operational roles for auditing, monitoring, security, and compliance, as shown in the above diagram. This strategy defines the need for centralized shared or core accounts, which work as the master accounts for monitoring, governance, security, and auditing. A number of AWS services like Amazon GuardDuty, AWS Security Hub, and AWS Config configurations are set in these centralized accounts. Spoke or child accounts are vended per a team’s requirements and are spun up with these governance, monitoring, and security defaults connected to the centralized accounts for log capture, threat detection, configuration management, and security management.

The third milestone

The third milestone is synonymous with the Foundation stage of adoption.

Using the ALZ construct, our CEE team developed a core set of principles to be used by every application team. Based on our core tenets, the CEE team built out an entry point (a web-based UI workflow application). This web UI was the entry point for any application team requesting an environment within AWS for experimentation or to begin the application development life cycle. Simplistically, the web UI sat on top of an automation engine built using APIs from AWS, ALZ components (Account Vending Machine, Shared Services Account, Logging Account, Security Account, default security groups, default IAM roles, and AD groups), and Terraform based code. The CBO team helped establish the common architecture patterns that was codified into this engine.

Team on-boarding workflow using foundational building blocks on AWS

An Angular-based web UI is the starting point for an application team to request AWS accounts. The web UI entry point asks a number of questions validating the type of account requested along with its intended purpose, ingress/egress requirements, high availability and disaster recovery requirements, and the business unit for chargeback and ownership purposes. Once all information is entered, it sends out a notification based on a preset organization dispatch matrix rule. Upon receiving the request, the approver has the option to approve it or ask further clarifying questions. Once these are satisfactorily answered, the approver approves the account vending request, and Terraform code is kicked off to create the default account.

When an account is created through this process, the following defaults are set up for a secure environment for development, testing, and staging. Similar guardrails are deployed in the production accounts as well.

  • Creates a new account under an existing AWS Organizational Unit (OU) based on the input parameters. Applies the chargeback codes and custom tags, and also integrates the resources with the existing CMDB
  • Connects the new account to the master shared services and logging account as per the AWS Landing Zone constructs
  • Integrates with the CloudWatch event bus as a sender account
  • Runs stsAssumeRole commands on the new account to create infosec cross-account roles
  • Defines actions, conditions, role limits, and account policies
  • Creates environment variables related to the account in the parameter store within AWS Systems Manager
  • Connects the new account to TrendMicro for AV purposes
  • Attaches the default VPC of the new account to an existing AWS Transit Gateway
  • Generates a Splunk key for the account to store in the Splunk KV store
  • Uses AWS APIs to attach Enterprise support to the new account
  • Creates or amends a new AD group based on the IAM role
  • Integrates as an Amazon Macie member account
  • Enables AWS Security Hub for the account by running an enable-security-hub call
  • Sets up Chef runner for the new account
  • Runs account setting lock procedures to set the Amazon S3 public access settings and the EBS default encryption setting
  • Enables the firewall by setting AWS WAF rules for the account
  • Integrates the newly created account with CloudHealth and Dome9

Deploying all these guardrails in any new accounts removes the need for manual setup and intervention. This gives application developers the needed freedom to stop worrying about infrastructure and access provisioning while giving them a higher speed to value.
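As a rough illustration, a couple of the defaults listed above map to AWS CLI calls along the following lines; the resource IDs are placeholders, and in practice these actions are driven by the Terraform-based automation engine rather than run by hand:

# enable AWS Security Hub in the newly vended account
aws securityhub enable-security-hub

# attach the account's default VPC to an existing transit gateway (all IDs below are placeholders)
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id tgw-0123456789abcdef0 \
  --vpc-id vpc-0123456789abcdef0 \
  --subnet-ids subnet-0123456789abcdef0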

Using these technical and procedural cloud adoption constructs, we have been able to reduce application onboarding time. This has led to quicker delivery of business capability, with the application teams focusing only on what differentiates their business rather than repeatedly building undifferentiated work products. This has also led to the creation of mature building blocks over time for use by the application teams. Using these building blocks, the teams are also modernizing applications by iteratively replacing old application blocks.

Conclusion

In summary, we are able to deliver better business outcomes and differentiated customer experience by:

  • Building common asks as reusable and automated enterprise assets and improving the overall enterprise-wide maturity by indexing on and growing these assets.
  • Depending on an experienced team to deliver baseline operational controls and guardrails.
  • Improving our security posture with higher-level and managed AWS security services instead of rebuilding everything from the ground up.
  • Using the Cloud Business Office to improve funneling of common asks. This helps the next team on AWS to benefit from a readily available set of approved services and application blueprints.

We will continue to build on and mature these reusable building blocks by using AWS services and new feature releases.

 

The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

 

How to get started with security response automation on AWS

Post Syndicated from Cameron Worrell original https://aws.amazon.com/blogs/security/how-get-started-security-response-automation-aws/

At AWS, we encourage you to use automation to help quickly detect and respond to security events within your AWS environments. In addition to increasing the speed of detection and response, automation also helps you scale your security operations as you expand your workloads running on AWS. For these reasons, security automation is a key principle outlined in both the Well-Architected and Cloud Adoption frameworks as well as in the AWS Security Incident Response Guide.

In this blog post, you’ll learn to implement automated security response mechanisms within your AWS environments. This post will include common patterns, implementation considerations, and an example solution. Security response automation is a broad topic that spans many areas. The goal of this blog post is to introduce you to core concepts and help you get started.

A word from our lawyers: Please note that you are responsible for making your own independent assessment of the information in this post. This post: (a) is for informational purposes only, (b) represents current AWS product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers, or licensors.

What is security response automation?

Security response automation is a planned and programmed action taken to achieve a desired state for an application or resource based on a condition or event. When you implement security response automation, you should adopt an approach that draws from existing security frameworks. Frameworks are published materials which consist of standards, guidelines, and best practices in order to help organizations manage cybersecurity-related risk. Using frameworks helps you achieve consistency and scalability and enables you to focus more on the strategic aspects of your security program. You should work with compliance professionals within your organization to understand any specific security frameworks that may also be relevant for your AWS environment.

Our example solution is based on the NIST Cybersecurity Framework (CSF), which is designed to help organizations assess and improve their ability to prevent, detect, and respond to security events. According to the CSF, “cybersecurity incident response” supports your ability to contain the impact of potential cybersecurity incidents. Although automation is not a CSF requirement, automating responses to events enables you to create repeatable, predictable approaches to monitoring and responding to threats.

The five main steps in the CSF are identify, protect, detect, respond, and recover. We’ve expanded the detect and respond steps to include automation and investigation activities.
 

Figure 1: The five steps in the CSF

The following definitions for each step in the diagram above are based on the CSF but have been adapted for our example in this blog post. Although we will focus on the detect, automate and respond steps, it’s important to understand the entire process flow.

  • Identify: Identify and understand the resources, applications, and data within your AWS environment.
  • Protect: Develop and implement appropriate controls and safeguards to ensure delivery of services.
  • Detect: Develop and implement appropriate activities to identify the occurrence of a cybersecurity event. This step includes the implementation of monitoring capabilities which will be discussed further in the next section.
  • Automate: Develop and implement planned, programmed actions that will achieve a desired state for an application or resource based on a condition or event.
  • Investigate: Perform a systematic examination of the security event to establish the root cause.
  • Respond: Develop and implement appropriate activities to take automated or manual actions regarding a detected security event.
  • Recover: Develop and implement appropriate activities to maintain plans for resilience and to restore any capabilities or services that were impaired due to a security event.

Security response automation on AWS

AWS CloudTrail, AWS Config, and Amazon EventBridge continuously record details about the resources and configuration changes in your AWS account. You can use this information to automatically detect resource changes and to react to deviations from your desired state.
 

Figure 2: Automated remediation flow

As shown in the diagram above, an automated remediation flow on AWS has three stages:

  • Monitor: Your automated monitoring tools collect information about resources and applications running in your AWS environment. For example, they might collect AWS CloudTrail information about activities performed in your AWS account, usage metrics from your Amazon EC2 instances, or flow log information about the traffic going to and from network interfaces in your Amazon Virtual Private Cloud (VPC).
  • Detect: When a monitoring tool detects a predefined condition—such as a breached threshold, anomalous activity, or configuration deviation—it raises a flag within the system. A triggering condition might be an anomalous activity detected by Amazon GuardDuty, a resource becoming out of compliance with an AWS Config Rule, or a high rate of blocked requests on an Amazon VPC security group or AWS WAF web access control list.
  • Respond: When a condition is flagged, an automated response is triggered that performs an action you’ve predefined—something intended to remediate or mitigate the flagged condition. Examples of automated response actions might include modifying a VPC security group, patching an Amazon EC2 instance, or rotating credentials.

You can use the event-driven flow described above to achieve many automated response patterns with varying degrees of complexity. Your response pattern could be as simple as invoking a single AWS Lambda function, or it could be a complex series of AWS Step Function tasks with advanced logic. In this blog post, we’ll use two simple Lambda functions in our example solution.
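To make that concrete, here is a minimal sketch of wiring a detection to a single Lambda responder with the AWS CLI; the rule name, pattern file, and function ARN are placeholders, and you would also need to grant EventBridge permission to invoke the function (for example with aws lambda add-permission):

# create a rule that matches the triggering condition described in a local pattern file
aws events put-rule --name cloudtrail-stopped --event-pattern file://pattern.json

# point the rule at the remediation Lambda function
aws events put-targets --rule cloudtrail-stopped \
  --targets Id=1,Arn=arn:aws:lambda:us-east-1:111122223333:function:remediate-cloudtrail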

How to define your response automation

Now that we’ve introduced the concept of security response automation, start thinking about security requirements within your environment that you’d like to enforce through automation. These design requirements might come from general best practices you’d like to follow, or they might be specific controls from compliance frameworks relevant for your business. Regardless, your objectives should be quantitative, not qualitative. Here are some examples of quantitative objectives:

  • Remote administrative network access to servers should be limited.
  • Server storage volumes should be encrypted.
  • AWS console logins should be protected by multi-factor authentication.

As an optional step, you can expand these objectives into user stories that define the conditions and remediation actions when there is an event. User stories are informal descriptions that briefly document a feature within a software system. User stories may be global and span across multiple applications or they may be specific to a single application. For example:

“Remote administrative network access to servers should be limited. Remote access ports include SSH TCP port 22 and RDP TCP port 3389. If open remote access ports are detected within the environment, they should be automatically closed and the owner will be notified.”

Once you’ve completed your user story, you can determine how to use automated remediation to help achieve these objectives in your AWS environment. User stories should be stored in a location that provides versioning support and can reference the associated automation code.
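For the user story above, the eventual remediation action could be as simple as the following CLI calls; the security group ID is a placeholder, and in an automated workflow these calls would typically run inside a Lambda function rather than from a terminal:

# close SSH and RDP that are open to the world on the offending security group
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 revoke-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 3389 --cidr 0.0.0.0/0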

You should carefully consider the effect of your remediation mechanisms in order to prevent unintended impact on your resources and applications. Remediation actions such as instance termination, credential revocation, and security group modification can adversely affect application availability. Depending on the level of risk that’s acceptable to your organization, your automated mechanism might only provide a notification which can then be manually investigated prior to remediation. Once you’ve identified an automated remediation mechanism, you can build out the required components and test them in a non-production environment.

Sample response automation walkthrough

In the following section, we’ll walk you through an automated remediation for a simulated event that indicates potential unauthorized activity—the unintended disabling of CloudTrail logging. Outside parties might want to disable logging to prevent detection and recording of their unauthorized activity. Our response is to re-enable the CloudTrail logging and immediately notify the security contact. Here’s the user story for this scenario:

“CloudTrail logging should be enabled for all AWS accounts and regions. If CloudTrail logging is disabled, it will automatically be enabled and the security operations team will be notified.”

Note: The sample response automation below references Amazon EventBridge which extends and builds upon CloudWatch Events. Amazon EventBridge uses the same Amazon CloudWatch Events API, so the event structure and rules configuration are the same. This blog post uses base functionality that is identical in both EventBridge and CloudWatch Events.

Prerequisites

In order to use our sample remediation, you will need to enable Amazon GuardDuty and AWS Security Hub in the AWS Region you have selected. Both of these services include a 30-day free trial. See the AWS Security Hub pricing page and the Amazon GuardDuty pricing page for additional details.

Important: You’ll use AWS CloudTrail to test the sample remediation. Running more than one CloudTrail trail in your AWS account will result in charges based on the number of events processed while the trail is running. Charges for additional copies of management events recorded in a Region are applied based on the published pricing plan. To minimize the charges, follow the clean-up steps that we provide later in this post to remove the sample automation and delete the trail.
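If you prefer to enable the two prerequisite services from the CLI rather than the console, the calls look roughly like this; run them in the Region where you plan to deploy the sample:

# enable Amazon GuardDuty by creating a detector in the current Region
aws guardduty create-detector --enable

# enable AWS Security Hub in the current Region
aws securityhub enable-security-hub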

Deploy the sample response automation

In this section, we’ll show you how to deploy and test the CloudTrail logging remediation sample. Amazon GuardDuty generates the finding Stealth:IAMUser/CloudTrailLoggingDisabled when CloudTrail logging is disabled, and AWS Security Hub collects findings from GuardDuty using the standardized finding format mentioned earlier. We recommend that you deploy this sample into a non-production AWS account.

Select the Launch Stack button below to deploy a CloudFormation template with an automation sample in the us-east-1 Region. You can also download the template and implement it in another Region. The template consists of an Amazon EventBridge rule, an AWS Lambda function and the IAM permissions necessary for both components to execute. It takes several minutes for the CloudFormation stack build to complete.

Select this image to open a link that starts building the CloudFormation stack

  1. In the CloudFormation console, choose the Select Template form, and then select Next.
  2. On the Specify Details page, provide the email address for a security contact. (For the purpose of this walkthrough, it should be an email address you have access to.) Then select Next.
  3. On the Options page, accept the defaults, then select Next.
  4. On the Review page, confirm the details, then select Create.
  5. While the stack is being created, check the inbox of the email address you provided in step 2. Look for an email message with the subject AWS Notification – Subscription Confirmation. Select the link in the body of the email to confirm your subscription to the Amazon Simple Notification Service (Amazon SNS) topic. You should see a success message similar to the screenshot below:
     
    Figure 3: SNS subscription confirmation

  6. Return to the CloudFormation console. Once the Status field for the CloudFormation stack changes to CREATE_COMPLETE (as shown in figure 4), the solution is implemented and is ready for testing.
     
    Figure 4: CREATE_COMPLETE status

Test the sample automation

You’re now ready to test the automated response by creating a test trail in CloudTrail, then trying to stop it.

  1. From the AWS Management Console, choose Services > CloudTrail.
  2. Select Trails, then select Create Trail.
  3. On the Create Trail form:
    1. Enter a value for Trail name. We use test-trail in our example below.
    2. Under Management events, select Write-only (to minimize event volume).
       
      Figure 5: Create a CloudTrail trail

    3. Under Storage location, choose an existing S3 bucket or create a new one. Note that since S3 bucket names are globally unique, you must add characters (such as a random string) to the name. For example: my-test-trail-bucket-<random-string>.
  4. On the Trails page of the CloudTrail console, verify that the new trail has started. You should see a green checkmark in the Status column, as shown in figure 6.
     
    Figure 6: Verify new trail has started

  5. You’re now ready to act like an unauthorized user trying to cover their tracks! Stop the logging for the trail you just created:
    1. Select the new trail name to display its configuration page.
    2. Toggle the Logging switch in the top-right corner to OFF.
    3. When prompted with a warning dialog box, select Continue.
    4. Verify that the Logging switch is now off, as shown below.
       
      Figure 7: Verify logging switch is off

      You have now simulated a security event by disabling logging for one of the trails in the CloudTrail service. Within the next few seconds, the near real-time automated response will detect the stopped trail, restart it, and send an email notification. You can refresh the Trails page of the CloudTrail console to verify that the trail’s status is ON again.

      Within the next several minutes, the investigatory automated response will also begin. GuardDuty will detect the action that stopped the trail and enrich the data about the source of unexpected behavior. Security Hub will then ingest that information and optionally correlate with other security events.

      Following the steps below, you can monitor findings within Security Hub for the finding type TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled to be generated:

  6. In the AWS Management Console, choose Services > Security Hub
    1. Select Findings in the left pane.
    2. Select the Add filters field, then select Type.
    3. Select EQUALS, paste TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled into the field, then select Apply.
    4. Refresh your browser periodically until the finding is generated.
      Figure 8: Monitor Security Hub for your finding
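As an alternative to the console steps above, you can drive the same simulation from the AWS CLI; the trail name test-trail matches the one created earlier:

# simulate the unauthorized activity by stopping the trail
aws cloudtrail stop-logging --name test-trail

# a short while later, confirm the automation restarted the trail (IsLogging should be true again)
aws cloudtrail get-trail-status --name test-trail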

While you wait on that detection, let’s dig into the components of automation.

How the sample automation works

This example incorporates two automated responses: a near real-time workflow and an investigatory workflow. The near real-time workflow provides a rapid response to an individual event, in this case the stopping of a trail. The goal is to restore the trail to a functioning state and alert security responders as quickly as possible. The investigatory workflow still includes a response to provide defense in depth and also uses services that support a more in-depth investigation of the incident.

Figure 9: Sample automation workflow

In the near real-time workflow, Amazon EventBridge monitors for the undesired activity. When a trail is stopped, AWS CloudTrail publishes an event on the EventBridge bus. An EventBridge rule detects the trail-stopping event and invokes a Lambda function to respond to the event by restarting the trail and notifying the security contact via an Amazon Simple Notification Service (SNS) topic.

In the investigative workflow, CloudTrail logs are monitored for undesired activities. For example, if a trail is stopped, there will be a corresponding log record. GuardDuty detects this activity and retrieves additional data points about the source IP that executed the API call. Two common examples of those additional data points in GuardDuty findings include whether the API call came from an IP address on a threat list, or whether it came from a network not commonly used in your AWS account. An AWS Lambda function responds by restarting the trail and notifying the security contact. Finally, the finding is imported into AWS Security Hub for additional investigation.

AWS Security Hub imports findings from AWS security services such as GuardDuty, Amazon Macie and Amazon Inspector, plus from any third-party product integrations you’ve enabled. All findings are provided to Security Hub in AWS Security Finding Format, which eliminates the need for data conversion. Security Hub correlates these findings to help you identify related security events and determine a root cause. Security Hub also publishes its findings to Amazon EventBridge to enable further processing by other AWS services such as AWS Lambda.

Respond step deep dive

Amazon EventBridge and AWS Lambda work together to respond to a security finding. Amazon EventBridge is a service that provides real-time access to changes in data in AWS services, your own applications, and Software-as-a-Service (SaaS) applications without writing code. In this example, EventBridge identifies a Security Hub finding that requires action and invokes a Lambda function that performs remediation. As shown in figure 10, the Lambda function both notifies the security operator via SNS and restarts the stopped CloudTrail.

Figure 10: Sample “respond” workflow

To set this response up, we looked for an event to indicate that a trail had stopped or was disabled. We knew that the GuardDuty finding Stealth:IAMUser/CloudTrailLoggingDisabled is raised when CloudTrail logging is disabled. Therefore, we configured the default event bus to look for this event. You can learn more about all of the available GuardDuty findings in the user guide.

How the code works

When Security Hub publishes a finding to EventBridge, it includes full details of the incident as discovered by GuardDuty. The finding is published in JSON format. If you review the details of the sample finding, note that it has several fields helping you identify the specific events that you’re looking for. Here are some of the relevant details:


{
   …
   "source": "aws.securityhub",
   …
   "detail": {
      "findings": [{
         …
         "Types": [
            "TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled"
         ],
         …
      }]
   }
}

You can build an event pattern using these fields, which an EventBridge filtering rule can then use to identify events and to invoke the remediation Lambda function. Below is a snippet from the CloudFormation template we provided earlier that defines that event pattern for the EventBridge filtering rule:


# pattern matches the nested JSON format of a specific Security Hub finding
      EventPattern:
        source:
        - aws.securityhub
        detail-type:
          - "Security Hub Findings - Imported"
        detail:
          findings:
            Types:
              - "TTPs/Defense Evasion/Stealth:IAMUser-CloudTrailLoggingDisabled"

Once the rule is in place, EventBridge continuously scans the event bus for this pattern. When EventBridge finds a match, it invokes the remediating Lambda function and passes the full details of the event to the function. The Lambda function then parses the JSON fields in the event so that it can act as shown in this Python code snippet:


# extract trail ARN by parsing the incoming Security Hub finding (in JSON format)
trailARN = event['detail']['findings'][0]['ProductFields']['action/awsApiCallAction/affectedResources/AWS::CloudTrail::Trail']   

# description contains useful details to be sent to security operations
description = event['detail']['findings'][0]['Description']

The code also issues a notification to security operators so they can review the findings and insights in Security Hub and other services to better understand the incident and to decide whether further manual actions are warranted. Here’s the code snippet that uses SNS to send out a note to security operators:


#Sending the notification that the AWS CloudTrail has been disabled.
snspublish = snsclient.publish(
	TargetArn = snsARN,
	Message="Automatically restarting CloudTrail logging.  Event description: \"%s\" " %description
	)

While notifications to human operators are important, the Lambda function will not wait to take action. It immediately remediates the condition by restarting the stopped trail in CloudTrail. Here’s a code snippet that restarts the trail to reenable logging:


#Enabling the AWS CloudTrail logging
try:
	client = boto3.client('cloudtrail')
	enablelogging = client.start_logging(Name=trailARN)
	logger.debug("Response on enable CloudTrail logging- %s" %enablelogging)
except ClientError as e:
	logger.error("An error occurred: %s" %e)

After the trail has been restarted, API activity is once again logged and can be audited. This can help provide relevant data for the remaining steps in the incident response process. The data is especially important for the post-incident phase, when your team analyzes lessons learned to prevent future incidents. You can also use this phase to identify additional steps to automate in your incident response.

Clean up

After you’ve completed the sample security response automation, we recommend that you remove the resources created in this walkthrough example from your account in order to minimize any charges associated with the trail in CloudTrail and data stored in S3.

Important: Deleting resources in your account can negatively impact the applications running in your AWS account. Verify that applications and AWS account security do not depend on the resources you’re about to delete.

Here are the clean-up steps:

  1. Delete the CloudFormation stack.
  2. Delete the trail you created in CloudTrail.
  3. If you created an S3 bucket for CloudTrail logs, you can also delete that S3 bucket.
  4. New accounts can try GuardDuty at no cost for 30 days. You can suspend or disable GuardDuty before the free trial period ends to avoid charges.
  5. Security Hub comes with a 30-day free trial. You can avoid charges by disabling the service before the trial period is over.

Summary

You’ve learned the basic concepts and considerations behind security response automation on AWS and how to use Amazon EventBridge, Amazon GuardDuty and AWS Security Hub to automatically re-enable AWS CloudTrail when it becomes disabled unexpectedly. As a next step, you may want to start building your own response automations and dive deeper into the AWS Security Incident Response Guide, NIST Cybersecurity Framework (CSF) or the AWS Cloud Adoption Framework (CAF) Security Perspective. You can explore additional automatic remediation solutions on the AWS Security Blog. You can find the code used in this example on GitHub.

If you have feedback about this blog post, submit it in the Comments section below. If you have questions about using this solution, start a thread in the EventBridge, GuardDuty, or Security Hub forums, or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Cameron Worrell

Cameron is a Solutions Architect with a passion for security and enterprise transformation. He joined AWS in 2015.

Alex Tomic

Alex is an AWS Enterprise Solutions Architect focused on security and compliance. He joined AWS in 2014.

Nathan Case

Nathan is a Senior Security Strategist, and joined AWS in 2016. He is always interested to see where our customers plan to go and how we can help them get there. He is also interested in intel, combined data lake sharing opportunities, and open source collaboration. In the end Nathan loves technology and that we can change the world to make it a better place.

Setting up a CI/CD pipeline by integrating Jenkins with AWS CodeBuild and AWS CodeDeploy

Post Syndicated from Noha Ghazal original https://aws.amazon.com/blogs/devops/setting-up-a-ci-cd-pipeline-by-integrating-jenkins-with-aws-codebuild-and-aws-codedeploy/

In this post, I explain how to use the Jenkins open-source automation server to deploy AWS CodeBuild artifacts with AWS CodeDeploy, creating a functioning CI/CD pipeline. When properly implemented, the CI/CD pipeline is triggered by code changes pushed to your GitHub repo, automatically fed into CodeBuild, then the output is deployed on CodeDeploy.

Solution overview

The functioning pipeline creates a fully managed build service that compiles your source code. It then produces code artifacts that can be used by CodeDeploy to deploy to your production environment automatically.

The deployment workflow starts by placing the application code on the GitHub repository. To automate this scenario, I added source code management to the Jenkins project under the Source Code section. I chose the GitHub option, which by design clones a copy of the GitHub repo content into the Jenkins local workspace directory.

In the second step of my automation procedure, I enabled a trigger for the Jenkins server using the “Poll SCM” option. This option makes Jenkins check the configured repository for any new commits/code changes with a specified frequency. In this testing scenario, I configured the trigger to poll every two minutes. The automated Jenkins deployment process works as follows:

  1. Jenkins checks for any new changes on GitHub every two minutes.
  2. Change determination:
    1. If Jenkins finds no changes, Jenkins exits the procedure.
    2. If it does find changes, Jenkins clones all the files from the GitHub repository to the Jenkins server workspace directory.
  3. The File Operation plugin deletes all the files cloned from GitHub. This keeps the Jenkins workspace directory clean.
  4. The AWS CodeBuild plugin zips the files and sends them to a predefined Amazon S3 bucket location then initiates the CodeBuild project, which obtains the code from the S3 bucket. The project then creates the output artifact zip file, and stores that file again on the S3 bucket.
  5. The HTTP Request plugin downloads the CodeBuild output artifacts from the S3 bucket.
    I edited the S3 bucket policy to allow access from the Jenkins server IP address. See the following example policy:

    {
      "Version": "2012-10-17",
      "Id": "S3PolicyId1",
      "Statement": [
        {
          "Sid": "IPAllow",
          "Effect": "Allow",
          "Principal": "*",
          "Action": "s3:*",
          "Resource": "arn:aws:s3:::examplebucket/*",
          "Condition": {
             "IpAddress": {"aws:SourceIp": "x.x.x.x/x"},  <--- IP of the Jenkins server
          } 
        } 
      ]
    }
    
    

    This policy enables the HTTP request plugin to access the S3 bucket. This plugin doesn’t use the IAM instance profile or the AWS access keys (access key ID and secret access key).

  6. The output artifact is a compressed ZIP file. The CodeDeploy plugin requires uncompressed source files in the workspace, which it then zips itself and sends to the S3 bucket for the CodeDeploy deployment. For that, I used the File Operation plugin to perform the following:
    1. Unzip the CodeBuild zipped artifact output in the Jenkins root workspace directory. At this point, the workspace directory should include the original zip file downloaded from the S3 bucket from Step 5 and the files extracted from this archive.
    2. Delete the original .zip file, and leave only the source bundle contents for the deployment.
  7. The CodeDeploy plugin selects and zips all workspace directory files. This plugin uses the CodeDeploy application name, deployment group name, and deployment configurations that you configured to initiate a new CodeDeploy deployment. The CodeDeploy plugin then uploads the newly zipped file according to the S3 bucket location provided to CodeDeploy as a source code for its new deployment operation.

Walkthrough

In this post, I walk you through the following steps:

  • Creating resources to build the infrastructure, including the Jenkins server, CodeBuild project, and CodeDeploy application.
  • Accessing and unlocking the Jenkins server.
  • Creating a project and configuring the CodeDeploy Jenkins plugin.
  • Testing the whole CI/CD pipeline.

Create the resources

In this section, I show you how to launch an AWS CloudFormation template, a tool that creates the following resources:

  • Amazon S3 bucket—Stores the GitHub repository files and the CodeBuild artifact application file that CodeDeploy uses.
  • IAM S3 bucket policy—Allows the Jenkins server access to the S3 bucket.
  • JenkinsRole—An IAM role and instance profile for the Amazon EC2 instance for use as a Jenkins server. This role allows Jenkins on the EC2 instance to access the S3 bucket to write files and access to create CodeDeploy deployments.
  • CodeDeploy application and CodeDeploy deployment group.
  • CodeDeploy service role—An IAM role to enable CodeDeploy to read the tags applied to the instances or the EC2 Auto Scaling group names associated with the instances.
  • CodeDeployRole—An IAM role and instance profile for the EC2 instances of CodeDeploy. This role has permissions to write files to the S3 bucket created by this template and to create deployments in CodeDeploy.
  • CodeBuildRole—An IAM role to be used by CodeBuild to access the S3 bucket and create the build projects.
  • Jenkins server—An EC2 instance running Jenkins.
  • CodeBuild project—This is configured with the S3 bucket and S3 artifact.
  • Auto Scaling group—Contains EC2 instances running Apache and the CodeDeploy agent fronted by an Elastic Load Balancer.
  • Auto Scaling launch configurations—For use by the Auto Scaling group.
  • Security groups—For the Jenkins server, the load balancer, and the CodeDeploy EC2 instances.

 

  1. To create the CloudFormation stack (for example, in the AWS Frankfurt Region), use the provided launch stack link.
  2. Choose Next and provide the following values on the Specify Details page:
    • For Stack name, name your stack as you prefer.
    • For CodedeployInstanceType, choose the default of t2.medium.
      To check the supported instance types by AWS Region, see Supported Regions.
    • For InstanceCount, keep the default of 3, to launch three EC2 instances for CodeDeploy.
    • For JenkinsInstanceType, keep the default of t2.medium.
    • For KeyName, choose an existing EC2 key pair in your AWS account. Use this to connect by using SSH to the Jenkins server and the CodeDeploy EC2 instances. Make sure that you have access to the private key of this key pair.
    • For PublicSubnet1, choose a public subnet from which the load balancer, Jenkins server, and CodeDeploy web servers launch.
    • For PublicSubnet2, choose a public subnet from which the load balancers and CodeDeploy web servers launch.
    • For VpcId, choose the VPC for the public subnets you used in PublicSubnet1 and PublicSubnet2.
    • For YourIPRange, enter the CIDR block of the network from which you connect to the Jenkins server using HTTP and SSH. If your local machine has a static public IP address, go to https://www.whatismyip.com/ to find your IP address, and then enter your IP address followed by /32. If you don’t have a static IP address (or aren’t sure if you have one), enter 0.0.0.0/0. Then, any address can reach your Jenkins server.
  3. Choose Next.
  4. On the Review page, select the I acknowledge that this template might cause AWS CloudFormation to create IAM resources check box.
  5. Choose Create and wait for the CloudFormation stack status to change to CREATE_COMPLETE. This takes approximately 6–10 minutes.
  6. Check the resulting values on the Outputs tab. You need them later.
  7. Browse to the ELBDNSName value from the Outputs tab, verifying that you can see the Sample page. You should see a congratulatory message.
  8. Your Jenkins server should be ready to deploy.

Access and unlock your Jenkins server

In this section, I discuss how to access, unlock, and customize your Jenkins server.

  1. Copy the JenkinsServerDNSName value from the Outputs tab of the CloudFormation stack, and paste it into your browser.
  2. To unlock the Jenkins server, SSH to the server using the IP address and key pair, following the instructions from Unlocking Jenkins.
  3. Use the root user to cat the log file (/var/log/jenkins/jenkins.log) and copy the automatically generated alphanumeric password (between the two sets of asterisks). Then, use the password to unlock your Jenkins server.
  4. On the Customize Jenkins page, choose Install suggested plugins.

  5. Wait until Jenkins installs all the suggested plugins. When the process completes, you should see the check marks alongside all of the installed plugins.
  6. On the Create First Admin User page, enter a user name, password, full name, and email address of the Jenkins user.
  7. Choose Save and continue, Save and finish, and Start using Jenkins.
    After you install all the needed Jenkins plugins along with their required dependencies, the Jenkins server restarts. This step should take about two minutes. After Jenkins restarts, refresh the page. Your Jenkins server should be ready to use.

Create a project and configure the CodeDeploy Jenkins plugin

Now, to create our project in Jenkins, we need to configure the required Jenkins plugins.

  1. Sign in to Jenkins with the user name and password that you created earlier, then choose Manage Jenkins and Manage Plugins.
  2. From the Available tab, search for and select the following plugins, then choose Install without restart:
    AWS CodeDeploy
    AWS CodeBuild
    Http Request
    File Operations
  3. Select Restart Jenkins when installation is complete and no jobs are running.
    Jenkins takes a couple of minutes to download the plugins along with their dependencies, and then restarts.
  4. Log in, then choose New Item, Freestyle project.
  5. Enter a name for the project (for example, CodeDeployApp), and choose OK.
  6. On the project configuration page, under Source Code Management, choose Git. For Repository URL, enter the URL of your GitHub repository.
  7. For Build Triggers, select the Poll SCM check box. In the Schedule, for testing enter H/2 * * * *. This entry tells Jenkins to poll GitHub every two minutes for updates.
  8. Under Build Environment, select the Delete workspace before build starts check box. Each Jenkins project has a dedicated workspace directory. This option allows you to wipe out your workspace directory with each new Jenkins build, to keep it clean.
  9. Under Build Actions, choose Add build step and select AWS CodeBuild. Under AWS Configurations, choose Manually specify access and secret keys and provide the keys.
  10. From the CloudFormation stack Outputs tab, copy the AWS CodeBuild project name (myProjectName) and paste it in the Project Name field. Also, set the Region that you are using and choose Use Jenkins source.
    It is a best practice to store AWS credentials for CodeBuild in the native Jenkins credential store. For more information, see the Jenkins AWS CodeBuild Plugin wiki.
  11. To make sure that all files cloned from the GitHub repository are deleted, choose Add build step and select the File Operation plugin, then click Add and select File Delete. Under the File Delete operation, in the Include File Pattern field, type an asterisk (*).
  12. Under Build, configure the following:
    1. Choose Add a Build step.
    2. Choose HTTP Request.
    3. Copy the S3 bucket name from the CloudFormation stack Outputs tab and paste it after (http://s3-eu-central-1.amazonaws.com/) along with the name of the zip file codebuild-artifact.zip as the value for HTTP Plugin URL.
      Example: (http://s3-eu-central-1.amazonaws.com/mybucketname/codebuild-artifact.zip)
    4. For Ignore SSL errors?, choose Yes.
  13. Under HTTP Request, choose Advanced and leave the default values for Authorization, Headers, and Body. Under Response, for Output response to file, enter the codebuild-artifact.zip file name.
  14. Add the two build steps for the File Operations plugin, in the following order:
    1. Unzip action: This build step unzips the codebuild-artifact.zip file and places the contents in the root workspace directory.
    2. File Delete action: This build step deletes the codebuild-artifact.zip file, leaving only the source bundle contents for deployment.
  15. Under Post-build Actions, choose Add post-build action and select the Deploy an application to AWS CodeDeploy check box.
  16. Enter the following values from the Outputs tab of your CloudFormation stack and leave the other settings at their default (blank):
    • For AWS CodeDeploy Application Name, enter the value of CodeDeployApplicationName.
    • For AWS CodeDeploy Deployment Group, enter the value of CodeDeployDeploymentGroup.
    • For AWS CodeDeploy Deployment Config, enter CodeDeployDefault.OneAtATime.
    • For AWS Region, choose the Region where you created the CodeDeploy environment.
    • For S3 Bucket, enter the value of S3BucketName.
      The CodeDeploy plugin uses the Include Files option to filter the files based on specific file names existing in your current Jenkins deployment workspace directory. The plugin zips specified files into one file. It then sends them to the location specified in the S3 Bucket parameter for CodeDeploy to download and use in the new deployment.
      In the optional Include Files field, I used ** so that all files in the workspace directory get zipped.
  17. Choose Deploy Revision. This option registers the newly created revision to your CodeDeploy application and gets it ready for deployment.
  18. Select the Wait for deployment to finish? check box. This option allows you to view the CodeDeploy deployments logs and events on your Jenkins server console output.
    Now that you have created a project, you are ready to test deployment.

Testing the whole CI/CD pipeline

To test the whole solution, put an application on your GitHub repository. You can download the sample from here.

The application tree contains the application source files, including text and binary files, executables, and packages:

In this example, the application files are the templates directory, test_app.py file, and web.py file.

The appspec.yml file is the main application specification file telling CodeDeploy how to deploy your application. CodeDeploy uses the AppSpec file to manage each deployment as a series of lifecycle event “hooks”, as defined in the file. For information about how to create a well-formed AppSpec file, see AWS CodeDeploy AppSpec File Reference.
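
For orientation, here is a minimal sketch of what an AppSpec file for an EC2/on-premises deployment can look like. The destination path and script names below are illustrative assumptions, not the contents of the sample application:

version: 0.0
os: linux
files:
  - source: /
    destination: /home/ec2-user/app
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_application.sh
      timeout: 300
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh
      timeout: 300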

The buildspec.yml file is a collection of build commands and related settings, in YAML format, that CodeBuild uses to run a build. You can include a build spec as part of the source code, or you can define a build spec when you create a build project. For more information, see How AWS CodeBuild Works.
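
As a sketch only (the sample application’s actual buildspec may differ), a minimal buildspec that runs a trivial build command and packages the whole source bundle as the output artifact looks like this:

version: 0.2
phases:
  build:
    commands:
      - echo "Build started on $(date)"
      - python -m py_compile web.py test_app.py
artifacts:
  files:
    - '**/*'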

The scripts folder contains the scripts that you would like to run during the CodeDeploy LifecycleHooks execution with respect to your application requirements. For more information, see Plan a Revision for AWS CodeDeploy.

To test this solution, perform the following steps:

  1. Unzip the application files, add them to your GitHub repository, and push them by running the following git commands from the path where you placed your sample application:
    $ git add -A
    
    $ git commit -m 'Your first application'
    
    $ git push
  2. On the Jenkins server dashboard, wait for two minutes until the previously set project trigger starts working. After the trigger starts working, you should see a new build taking place.
  3. In the Jenkins server Console Output page, check the build events and review the steps performed by each Jenkins plugin. You can also review the CodeDeploy deployment in detail, as shown in the following screenshot:

On completion, Jenkins should report that you have successfully deployed a web application. You can also use your ELBDNSName value to confirm that the deployed application is running successfully.

Conclusion

In this post, I outlined how you can use a Jenkins open-source automation server to deploy CodeBuild artifacts with CodeDeploy. I showed you how to construct a functioning CI/CD pipeline with these tools. I walked you through how to build the deployment infrastructure and automatically deploy application version changes from GitHub to your production environment.

Hopefully, you have found this post informative and the proposed solution useful. As always, AWS welcomes all feedback and comments.

About the Author


 

Noha Ghazal is a Cloud Support Engineer at Amazon Web Services. She is a subject matter expert for AWS CodeDeploy. In her role, she enjoys supporting customers with their CodeDeploy and other DevOps configurations. Outside of work she enjoys drawing portraits, fishing, and playing video games.

 

 

Nine AWS Security Hub best practices

Post Syndicated from Ketan Srivastava original https://aws.amazon.com/blogs/security/nine-aws-security-hub-best-practices/

AWS Security Hub is a security and compliance service that became generally available on June 25, 2019. It provides you with extensive visibility into your security and compliance status across multiple AWS accounts, in a single dashboard per region. The service helps you monitor critical settings to ensure that your AWS accounts remain secure, allowing you to notice and react quickly to any changes in your environment.

AWS Security Hub aggregates, organizes, and prioritizes security findings from supported AWS services—that’s Amazon GuardDuty, Amazon Inspector, and Amazon Macie at the time this post was published—and from various AWS partner security solutions. AWS Security Hub also generates its own findings, based on automated, resource-level and account-level configuration and compliance checks using service-linked AWS Config rules plus other analytic techniques. These checks help you keep your AWS accounts compliant with industry standards and best practices, such as the Center for Internet Security (CIS) AWS Foundations standard.

In this post, I’ll provide nine best practices to help you use AWS Security Hub as effectively as possible.

1. Use the AWS Labs script to turn on Security Hub in all your AWS accounts in all regions and to establish your existing Amazon GuardDuty master/member hierarchy

As a best practice, you should continuously monitor all regions across all of your AWS accounts for unauthorized behavior or misconfigurations, even in regions that you don’t use heavily. AWS already recommends that you do this when using monitoring services like AWS Config and AWS CloudTrail. I recommend that you enable Security Hub in every region available in your AWS accounts.

In addition, you can also invite other AWS accounts to enable Security Hub and share findings with your AWS account. If you send an invitation and it is accepted by the other account owner, your Security Hub account is designated as the master account, and any associated Security Hub accounts become your member accounts. Users from the master account will then be able to view Security Hub findings from member accounts.

To simplify these configurations, you can utilize the AWS Labs script available on GitHub, which provides a step-by-step guide to automate this process. This script allows you to enable (and disable) AWS Security Hub simultaneously across a list of associated AWS accounts and bulk-add them to become your Security Hub members; it sends invitations from the master account and automatically accepts invitations in all member accounts. To run the script, you must have the AWS account IDs and root email addresses of the AWS accounts that you want as your Security Hub members. (Note that you should only share your root email address and account ID with AWS accounts that you trust. Visit the IAM best practices page to learn more about how to keep access to your AWS accounts secured.)
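
If you would rather call the APIs directly than run the script, the master/member flow boils down to a few calls. The following boto3 sketch uses placeholder account IDs, emails, and region; run the first block in the intended master account and the commented block in each member account:

import boto3

securityhub = boto3.client("securityhub", region_name="us-east-1")

# Run in the intended master account: enable Security Hub, add members, send invitations.
securityhub.enable_security_hub()
members = [{"AccountId": "123456789012", "Email": "member-root@example.com"}]
securityhub.create_members(AccountDetails=members)
securityhub.invite_members(AccountIds=[m["AccountId"] for m in members])

# Run in each member account: enable Security Hub and accept the master's invitation.
# member_hub = boto3.client("securityhub", region_name="us-east-1")
# member_hub.enable_security_hub()
# invitation = member_hub.list_invitations()["Invitations"][0]
# member_hub.accept_invitation(MasterId=invitation["AccountId"],
#                              InvitationId=invitation["InvitationId"])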

By default, the Security Hub master/member association is independent of the relationships that you’ve established between your Amazon GuardDuty or Amazon Macie accounts and other associated accounts. If you have an existing master/member hierarchy in GuardDuty or Macie, you can export that list of accounts into a CSV file and then use it with the script. For example in GuardDuty, use the ListMembers API to export the AWS Account ID and email of all member accounts, as follows:

aws guardduty list-members --detector-id <Detector ID> --query "Members[].[AccountId, Email]" --output text | awk '{print $1 "," $2}'

The output of the above command will be your GuardDuty member account IDs and their corresponding root email addresses, one per line and separated with a comma as shown below:

12345678910,[email protected]
98765432101,[email protected]

2. Enable AWS Config in all AWS accounts and regions and leave the AWS CIS Foundations standard check enabled

When you enable Security Hub in any region, the AWS CIS standard checks are enabled by default. I recommend leaving them enabled; they are important security measures that are applicable to all AWS accounts.

To run most of these checks, Security Hub uses service-linked AWS Config rules. Because of this, you should make sure that AWS Config is turned on and recording all supported resources, including global resources, in all accounts and regions where Security Hub is deployed. You are not charged by AWS Config for these service-linked rules. You are only charged via Security Hub’s pricing model.
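
To confirm that AWS Config is recording in a given region, you can check the recorder with the AWS CLI. The first command shows whether the recorder is running; the second shows whether it records all supported resource types, including global resources:

aws configservice describe-configuration-recorder-status --region us-east-1
aws configservice describe-configuration-recorders --region us-east-1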

3. Use specific managed IAM policies for different types of Security Hub users

You can choose to allow a large group of users to access List and Read Security Hub actions, which will permit them to view your security findings. However, you should allow only a small group of users to access the Security Hub Write actions. This will permit only authorized users to archive, resolve, or remediate the findings.

You can use AWS managed policies to give your employees the permissions they need to get started. These policies are already available in your account and are maintained and updated by AWS. To grant more granular permission to your Security Hub users, I recommend that you create your own customer managed policies. A great place to start with this is to import an existing AWS managed policy. That way, you know that the policy is initially correct, and all you need to do is customize it for your environment.
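
As an illustration only (not a copy of any AWS managed policy), a customer managed policy for read-only Security Hub users could start from something like the following and then be scoped down further for your environment:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "securityhub:Get*",
                "securityhub:List*",
                "securityhub:Describe*"
            ],
            "Resource": "*"
        }
    ]
}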

AWS categorizes each service action into one of five access levels based on what each action does: List, Read, Write, Permissions management, or Tagging. To determine which access level to include in the IAM policies that you assign to your users, you can view the policy summary by navigating from the IAM Console to Policies, then selecting any AWS managed or customer managed policy. Next, on the Summary page, under the Permissions tab, select Policy summary (see Figure 1). For more details and examples of access level classification, see Understanding Access Level Summaries Within Policy Summaries.
 

Figure 1: Policy summary of AWSSecurityHubReadOnlyAccess AWS managed policy

4. Use tags for access controls and cost allocation

A SecurityHub::Hub resource represents the implementation of the AWS Security Hub service per region in your AWS account. Security Hub allows you to assign metadata to your SecurityHub::Hub resource in the form of tags. Each tag is a string consisting of a user-defined key and an optional value that makes it easier for you to identify and manage the AWS resources in your environment.

You can control access permissions by using tags on your SecurityHub::Hub resource. For example, you can allow a group of developer IAM entities to manage and update only the SecurityHub::Hub resources that have the tag key developer associated with them. This can help you restrict access to your production SecurityHub::Hub resources, while allowing your developers to continue testing in their developer environment.

For more information on the supported tag-based conditions which you can use with the Security Hub APIs, refer to Condition Keys for AWS Security Hub. Please note that when you use tag-based conditions for access control, you must define who can modify those tags.
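
As a sketch of how such a condition might look, the following policy statement allows tagging operations only on a hub resource that carries the developer tag; the tag key, value, and account ID are assumptions for illustration, so verify the exact condition keys supported for each action in the documentation linked above:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "securityhub:TagResource",
                "securityhub:UntagResource"
            ],
            "Resource": "arn:aws:securityhub:*:123456789012:hub/default",
            "Condition": {
                "StringEquals": {"aws:ResourceTag/developer": "true"}
            }
        }
    ]
}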

To make it easier to categorize and track your AWS costs, you can also activate cost allocation tags. This helps you organize your SecurityHub::Hub resource costs. AWS generates a cost allocation report as a CSV file, with your usage and costs grouped according to your active tags. You can apply tags that represent business categories (such as cost centers, application names, or project environments) to organize your costs.

For more information on commonly used tagging categories and effective tagging strategies, read about AWS Tagging Strategies.

5. Integrate and enable your existing security products (with 34 integrations today and more to come)

Numerous tools can help you understand the security and compliance posture of your AWS accounts, but these tools generate their own set of findings, often in different formats. Security Hub normalizes the findings.

With Security Hub, findings generated from integrated providers (both third-party services and AWS services) are ingested using a standard findings format, which eliminates the need for security teams to convert the data. You can currently integrate 34 findings providers to import and/or export findings with Security Hub. Some partner products, like PagerDuty, Splunk, and Slack, can receive findings from Security Hub, although they don’t generate findings.

If you want to add a third-party partner product to your AWS environment, you can choose the Purchase link from the Security Hub console’s Integrations page and navigate to AWS Marketplace. Once purchased, choose the Configure link to navigate to step-by-step instructions to install the product and configure its integration with Security Hub. Then choose Enable integration to create a product subscription in your account for that third-party provider (see Figure 2).

After you enable a subscription, a resource policy is automatically attached to it. The resource policy defines the permissions that Security Hub needs to accept and process the product’s findings. You can also enable the subscription via the API and CloudFormation.
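
For example, if you automate this with the AWS CLI, a single call creates the subscription; the product ARN below is a placeholder that you would replace with the ARN shown on the Integrations page:

aws securityhub enable-import-findings-for-product --product-arn arn:aws:securityhub:us-east-1::product/example-company/example-product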
 

Figure 2: Integrating partner findings provider with Security Hub

6. Build out customized remediation playbooks using Amazon CloudWatch Events, AWS Systems Manager Automation documents, and AWS Step Functions to automatically resolve findings that don’t require human intervention

Security Hub automatically sends all findings to Amazon CloudWatch Events. This integration helps you automate your response to threat incidents by allowing you to take specific actions using AWS Systems Manager Automation documents, OpsItems, and AWS Step Functions. Using these tools, you can create your own incident handling plan. This will allow your security team to focus on strengthening the security of your AWS environments rather than on remediating the current findings.
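
As a minimal sketch, a CloudWatch Events rule that matches every finding Security Hub imports uses an event pattern like the following; you can add filters under "detail" so that only the findings you want to auto-remediate reach your targets:

{
  "source": ["aws.securityhub"],
  "detail-type": ["Security Hub Findings - Imported"]
}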
 

Figure 3: Creating a CloudWatch Events Rule for sending matched Security Hub findings to specific Targets

7. Create custom actions to send a copy of a Security Hub finding to a resource that is internal or external to your AWS account, enabling additional visibility and remediation options for the finding

Because of its integration with CloudWatch Events, you can use Security Hub to create custom actions that will send specific findings to ticketing, chat, email, or automated remediation systems. Custom actions can also be sent to your own AWS resources, such as AWS Systems Manager OpsCenter, AWS Lambda or Amazon Kinesis, allowing you to do your own remediation or data capture related to the finding.

For an in-depth look at this architecture, plus specific examples of how to implement custom actions, see How to Integrate AWS Security Hub Custom Actions with PagerDuty and How to Enable Custom Actions in AWS Security Hub.

In addition, Security Hub gives you the option to choose a language-specific AWS SDK so that you can use custom actions to resolve findings programmatically. Below, I’ll demonstrate how you can implement this using AWS Lambda and AWS SDK for Python (Boto3). In my example, I’ll remediate the finding generated by Security Hub for CIS check 2.4, “Ensure CloudTrail trails are integrated with Amazon CloudWatch Logs.” For this walk-through, I assume that you have the necessary AWS IAM permissions to work with Security Hub, CloudWatch Events, Lambda and AWS CloudTrail.
 

Figure 4: Data flow supporting remediation of Security Hub findings using custom actions

As shown in figure 4:

  1. When findings against CIS check 2.4 are generated in Security Hub, Security Hub will send them to CloudWatch Events using custom actions that I’ll describe below.
  2. CloudWatch Events will send the findings to a Lambda function that has been configured as the target.
  3. The Lambda function will utilize a Python script to check whether the finding has been generated against CIS check 2.4. If it has, the Lambda function will identify the affected CloudTrail trail and configure it with CloudWatch Logs to monitor the trail logs.

Prerequisites

  1. You must configure an IAM Role for AWS CloudTrail to assume so that it can deliver events to your CloudWatch Logs log group. For more information about how to do this, refer to the AWS CloudTrail documentation. I’ll refer to this role as the CloudTrail role.
  2. To deploy the Lambda function, you must configure an IAM Role for the Lambda function to assume. I’ll refer to this role as the Lambda execution role. The following sample policy includes the permissions that you’ll assign to it for this example. Please replace <CloudTrail_CloudWatchLogs_Role> with the CloudTrail role that you created in the previous step. Depending on your use case, you can restrict this IAM policy further to grant least privilege, which is a recommended IAM Best Practice.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogGroups",
                "cloudtrail:UpdateTrail",
                "iam:GetRole"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::012345678910:role/<CloudTrail_CloudWatchLogs_Role>"
        }
    ]
}     

Solution deployment

  1. Create a custom action in AWS Security Hub and associate it with a CloudWatch Events rule that you configure for your Security Hub findings. Follow the instructions laid out in the Security Hub user guide for the exact steps to do this.
  2. Create a Lambda Function, which will complete the auto-remediation of the CIS 2.4 findings:
    1. Open the Lambda Console and select Create function.
    2. On the next page, choose Author from scratch.
    3. Under Basic information, enter a name for your function. For Runtime, select Python 3.7.
       
      Figure 5: Updating basic information to create the Lambda function

    4. Under Permissions, expand Choose or create an execution role.
    5. Under Execution role, select the drop down menu and change the setting to Use an existing role.
    6. Under Existing role, select the Lambda execution role that you created earlier, then select Create function.
       
      Figure 6: Updating basic information and permissions to create the Lambda function

    7. Delete the default function code and paste the code I’ve provided below:
      
              import json, boto3
              cloudtrail_client = boto3.client('cloudtrail')
              cloudwatchlogs_client = boto3.client('logs')
              iam_client = boto3.client('iam')
              
              role_details = iam_client.get_role(RoleName='<CloudTrail_CloudWatchLogs_Role>')
              
              def lambda_handler(event, context):
                  # First of all, let us see if the JSON sent by CWE has any Security Hub findings.
                  if 'detail' in event.keys() and 'findings' in event['detail'].keys() and len(event['detail']['findings']) > 0:
                      print("There are some findings. Let's check them!")
                      print("Number of findings: %i" % len(event['detail']['findings']))
              
                      # Then we need to filter out the findings. In this code snippet, we'll handle only findings related to CloudTrail trails for integration with CloudWatch Logs.
                      for finding in event['detail']['findings']:
                          if 'Title' in finding.keys():
                              if 'Ensure CloudTrail trails are integrated with CloudWatch Logs' in finding['Title']:
                                  print("There's a CloudTrail-related finding. I can handle it!")
              
                                  if 'Compliance' in finding.keys() and 'Status' in finding['Compliance'].keys():
                                      print("Compliance Status: %s" % finding['Compliance']['Status'])
              
                                      # We can skip compliant findings, and evaluate only the non-compliant ones.                        
                                      if finding['Compliance']['Status'] == 'PASSED':
                                          continue
              
                                      # For each non-compliant finding, we need to get specific pieces of information so as to create the correct log group and update the CloudTrail trail.                        
                                      for resource in finding['Resources']:
                                          resource_id = resource['Id']
                                          cloudtrail_name = resource['Details']['Other']['name']
                                          loggroup_name = 'CloudTrail/' + cloudtrail_name
                                          print("ResourceId for the finding is %s" % resource_id)
                                          print("LogGroup name: %s" % loggroup_name)
              
                                          # At this point, we can create the log group using the name extracted from the finding.
                                          try:
                                              response_logs = cloudwatchlogs_client.create_log_group(logGroupName=loggroup_name)
                                          except Exception as e:
                                              print("Exception: %s" % str(e))
              
                                          # For updating the CloudTrail trail, we need to have the ARN of the log group. Let's retrieve it now.                            
                                          response_logsARN = cloudwatchlogs_client.describe_log_groups(logGroupNamePrefix = loggroup_name)
                                          print("LogGroup ARN: %s" % response_logsARN['logGroups'][0]['arn'])
                                          print("The role used by CloudTrail is: %s" % role_details['Role']['Arn'])
              
                                          # Finally, let's update the CloudTrail trail so that it sends logs to the new log group created.
                                          try:
                                              response_cloudtrail = cloudtrail_client.update_trail(
                                                  Name=cloudtrail_name,
                                                  CloudWatchLogsLogGroupArn = response_logsARN['logGroups'][0]['arn'],
                                                  CloudWatchLogsRoleArn = role_details['Role']['Arn']
                                              )
                                          except Exception as e:
                                              print("Exception: %s" % str(e))
                              else:
                                  print("Title: %s" % finding['Title'])
                                  print("This type of finding cannot be handled by this function. Skipping it…")
                          else:
                              print("This finding doesn't have a title and so cannot be handled by this function. Skipping it…")
                  else:
                      print("There are no findings to remediate.")            
              

    8. After pasting the code, replace <CloudTrail_CloudWatchLogs_Role> with your CloudTrail role and select Save to save your Lambda function.
       
      Figure 7: Editing Lambda code to replace the correct CloudTrail role

  3. Go to your CloudWatch console and select Rules in the navigation pane on the left.
    1. From the list of CloudWatch rules that you see, select the rule which you created in Step 1 of this solution deployment.
    2. Then, select Actions on the top right of the page and choose Edit.
    3. On the Step 1: Create rule page, under Targets, choose Lambda function and select the Lambda function you created in Step 2.
    4. Select Configure details.
    5. On the Step 2: Configure rule details page, select Update rule.
       
      Figure 8: Adding your created Lambda function as target for the CloudWatch rule

  4. Configuration is now complete, and you can test your rule. Go to your AWS Security Hub console and select Compliance standards in the navigation pane.
    1. Next, select CIS AWS Foundations.
       
      Figure 9: Compliance standards page in the Security Hub console

    2. Search for the rule 2.4 Ensure CloudTrail trails are integrated with CloudWatch Logs and select it.
       
      Figure 10: Locating CIS check 2.4 in the Security Hub console

    3. If you’ve left the default AWS Security Hub CIS checks enabled (along with AWS Config service in the same region), and if you have CloudTrail trails in that region which are not yet configured to deliver events to CloudWatch Logs, you should see a low severity finding with a Failed Compliance status.
    4. Select the failed finding by selecting the checkbox and choosing the Actions button.
    5. Finally, from the dropdown menu, select the custom action that you created in Step 1 to send the finding to CloudWatch Events. CloudWatch Events will send the finding to your Lambda function, which you configured as the target for the rule in step 3. The Lambda function will automatically identify the affected CloudTrail trail and configure CloudWatch Logs log group for you. The log group will have the same name as your trail for identification purposes. You can modify the code to suit your needs further.

    Note: There may be a delay before the compliance status of the remediated resource changes. Once the CIS AWS Foundations Standard is enabled, Security Hub will run the checks within 2 hours. After that, the checks are automatically run once every 24 hours.

     

    Figure 11: Findings generated against CIS check 2.4 in the Security Hub console

8. Customize your insights using the default “managed insights” as templates and use them to prioritize resources and findings to act upon

A Security Hub “insight” is a collection of related findings to which one or more Security Hub filters have been applied. Insights can help you organize your findings and identify security risks that need immediate attention.

Security Hub offers several managed (default) insights. You can use these as templates for new insights, and modify them depending on your use case. You can save these modified queries as new custom insights to ensure an even greater visibility of your AWS accounts. Please refer to the documentation for step-by-step instructions on how to create custom insights.
 

Figure 12: Creating a Security Hub custom insight

9. Use the free trial to evaluate what your costs could be

Security Hub provides a 30-day free trial for all AWS accounts and regions. The trial is a good way to evaluate how much Security Hub will cost, on average, to monitor threats and compliance in your environments. You can view an estimate by navigating from the Security Hub console to Settings, then Usage (see Figure 13).
 

Figure 13: Estimating your Security Hub costs

Conclusion

AWS Security Hub allows you to have more visibility into the security and compliance status of your AWS environments. Using the Security Hub best practices discussed here, security teams can spend more time on incident remediation and recovery rather than incident detection and organization. Security Hub has undergone HIPAA, ISO, PCI, and SOC certification. To learn more about Security Hub, refer to the AWS Security Hub documentation.

If you have comments about this post, submit them in the Comments section below. If you have questions about anything in this post, start a new thread on the AWS Security Hub forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Ketan Srivastava

Ketan is a Cloud Support Engineer at AWS. He enjoys the fact that, at AWS, there are always so many opportunities to build things better for our customers and learn from these opportunities. Outside of work, he plays MOBAs and travels to new places with his wife. He holds a Master of Science degree from Rochester Institute of Technology.

Let’s Annotate Our Methods With The Features They Implement

Post Syndicated from Bozho original https://techblog.bozho.net/lets-annotate-our-methods-with-the-features-they-implement/

Writing software consists of very little actual “writing”, and much more thinking, designing, reading, “digging”, analyzing, debugging, refactoring, aligning and meeting others.

The reading and digging part is where you try to understand what has been implemented before, why it has been implemented, and how it works. In larger projects it becomes increasingly hard to find what is happening and why – there are so many classes that interfere, and so many methods participate in implementing a particular feature.

That’s probably because there is a mismatch between the programming units (classes, methods) and the business logic units (features). Product owners want a “password reset” feature, and they don’t care if it’s done using framework configuration, custom code split in three classes, or one monolithic controller method that does that job.

This mismatch is partially addressed by the so called BDD (behaviour driven development), as business people can define scenarios in a formalized language (although they rarely do, it’s still up to the QAs or developers to write the tests). But having your tests organized around features and behaviours doesn’t mean the code is, and BDD doesn’t help in making your way through the codebase in search of why and how something is implemented.

Another issue is linking a piece of code to the issue tracking system. Source control conventions and hooks allow for setting the issue tracker number as part of the commit, and then when browsing the code, you can annotate the file and see the issue number. However, due to the many changes, even a very strict team will end up with methods that are related to multiple issues, and you can’t easily tell which is the proper one.

Yet another issue with the lack of a “feature” unit in programming languages is that you can’t trivially reuse existing projects to start a new one. We’ve all been there – you have a similar project and you want to get a skeleton to get things running faster. And while there are many tools to help with that (Spring Boot, Spring Roo, and other scaffolding utilities), they can rarely deliver what you need – you always have to tweak something, delete something, or customize some configuration, as defaults are almost never practical.

And I have a simple proposal that will help with the issues above. As with any complex problem, simple ideas don’t solve everything, but are at least a step forward.

The proposal is in the title – let’s annotate our methods with the features they implement. Let’s have @Feature(name = "Forgotten password", issueTrackerCode="PROJ-123"). A method can implement multiple features, but that is generally discouraged by best practices (e.g. the single responsibility principle). The granularity of “feature” is something that has to be determined by each team and is the tricky part – sometimes an epic describes a feature, sometimes individual stories or even subtasks do. A definition of a feature should be agreed upon and every new team member should be told what to do and how to interpret it.

There is of course a lot of complexity, e.g. for generic methods like DAO methods, utility methods, or methods that are reused in too many places. But they also represent features, it’s just that these features are horizontal. “Data access layer” is a feature – a more technical one indeed, but it counts, and maybe deserves a story in the issue tracker.

Your features can actually be listed in one or several enums, grouped by type – business, horizontal, performance, etc. That way you can even compose features – e.g. account creation contains the logic itself, database access, a security layer.
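
To make the idea concrete, here is a rough Java sketch of such an annotation and its use; the names are made up for illustration:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE})
@interface Feature {
    String name();
    String issueTrackerCode() default "";
}

class PasswordService {

    @Feature(name = "Forgotten password", issueTrackerCode = "PROJ-123")
    void requestPasswordReset(String email) {
        // generate a reset token, store it, send the email, etc.
    }
}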

How does such a proposal help?

  • Raises consciousness about the single responsibility of methods and about code readability
  • Provides a rationale for the existence of each method. Even if a proper comment is missing, the annotation will put a method (or a class) in context
  • Helps navigating code and fixing issues (if you can see all places where a feature is implemented, you are more likely to spot an issue)
  • Allows tools to analyze your features – amount, complexity, how chaotic a feature is spread across the code base, test coverage per feature, etc.
  • Allows tools to use existing projects for scaffolding for new ones – you specify the features you want to have, and they are automatically copied

At this point I’m supposed to give a link to a GitHub project with a feature annotation library. But it doesn’t make sense to have a single-annotation project. It can easily be part of Guava or something similar, or it can be created manually in each project. The complex part – the tools that will do the scanning and analysis – deserves separate projects, but unfortunately I don’t have time to write one.

But even without the tools, the concept of annotating methods with their high-level features is, I think, a useful one. Instead of trying to deduce why a method is there and what requirements it has to implement (and whether all the necessary tests were written at the time), such an annotation can come in handy.

The post Let’s Annotate Our Methods With The Features They Implement appeared first on Bozho's tech blog.

New – How to better monitor your custom application metrics using Amazon CloudWatch Agent

Post Syndicated from Helen Lin original https://aws.amazon.com/blogs/devops/new-how-to-better-monitor-your-custom-application-metrics-using-amazon-cloudwatch-agent/

This blog was contributed by Zhou Fang, Sr. Software Development Engineer for Amazon CloudWatch and Helen Lin, Sr. Product Manager for Amazon CloudWatch

Amazon CloudWatch collects monitoring and operational data from both your AWS resources and on-premises servers, providing you with a unified view of your infrastructure and application health. By default, CloudWatch automatically collects and stores many of your AWS services’ metrics and enables you to monitor and alert on metrics such as high CPU utilization of your Amazon EC2 instances. With the CloudWatch Agent that launched last year, you can also deploy the agent to collect system metrics and application logs from both your Windows and Linux environments. Using this data collected by CloudWatch, you can build operational dashboards to monitor your service and application health, set high-resolution alarms to alert and take automated actions, and troubleshoot issues using Amazon CloudWatch Logs.

We recently introduced CloudWatch Agent support for collecting custom metrics using StatsD and collectd. It’s important to collect system metrics like available memory, and you might also want to monitor custom application metrics. You can use these custom application metrics, such as request count to understand the traffic going through your application or understand latency so you can be alerted when requests take too long to process. StatsD and collectd are popular, open-source solutions that gather system statistics for a wide variety of applications. By combining the system metrics the agent already collects, with the StatsD protocol for instrumenting your own metrics and collectd’s numerous plugins, you can better monitor, analyze, alert, and troubleshoot the performance of your systems and applications.

Let’s dive into an example that demonstrates how to monitor your applications using the CloudWatch Agent.  I am operating a RESTful service that performs simple text encoding. I want to use CloudWatch to help monitor a few key metrics:

  • How many requests are coming into my service?
  • How many of these requests are unique?
  • What is the typical size of a request?
  • How long does it take to process a job?

These metrics help me understand my application performance and throughput, in addition to setting alarms on critical metrics that could indicate service degradation, such as request latency.

Step 1. Collecting StatsD metrics

My service is running on an EC2 instance, using Amazon Linux AMI 2018.03.0. Make sure to attach the CloudWatchAgentServerPolicy AWS managed policy so that the CloudWatch agent can collect and publish metrics from this instance.

Here is the service structure:

 

The “/encode” handler simply returns the base64 encoded string of an input text.  To monitor key metrics, such as total and unique request count as well as request size and method response time, I used StatsD to define these custom metrics.

@RestController

public class EncodeController {

    @RequestMapping("/encode")
    public String encode(@RequestParam(value = "text") String text) {
        long startTime = System.currentTimeMillis();
        statsd.incrementCounter("totalRequest.count", new String[]{"path:/encode"});
        statsd.recordSetValue("uniqueRequest.count", text, new String[]{"path:/encode"});
        statsd.recordHistogramValue("request.size", text.length(), new String[]{"path:/encode"});
        String encodedString = Base64.getEncoder().encodeToString(text.getBytes());
        statsd.recordExecutionTime("latency", System.currentTimeMillis() - startTime, new String[]{"path:/encode"});
        return encodedString;
    }
}

Note that I need to first choose a StatsD client from here.
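
The statsd field used above is a DogStatsD-style client. Assuming the java-dogstatsd-client library (one of the clients that speaks the protocol the CloudWatch Agent understands), the setup is roughly the following, pointing at the agent’s default StatsD port 8125 on the same host; the metric prefix is an arbitrary choice:

import com.timgroup.statsd.NonBlockingStatsDClient;
import com.timgroup.statsd.StatsDClient;

public class Metrics {
    // The CloudWatch Agent listens for StatsD messages on localhost:8125 by default.
    public static final StatsDClient statsd =
            new NonBlockingStatsDClient("encode-service", "localhost", 8125);
}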

The “/status” handler responds with a health check ping.  Here I am monitoring my available JVM memory:

@RestController
public class StatusController {

    @RequestMapping("/status")
    public int status() {
        statsd.recordGaugeValue("memory.free", Runtime.getRuntime().freeMemory(), new String[]{"path:/status"});
        return 0;
    }
}

 

Step 2. Emit custom metrics using collectd (optional)

collectd is another popular, open-source daemon for collecting application metrics. If I want to use the hundreds of available collectd plugins to gather application metrics, I can also use the CloudWatch Agent to publish collectd metrics to CloudWatch for 15-months retention. In practice, I might choose to use either StatsD or collectd to collect custom metrics, or I have the option to use both. All of these use cases  are supported by the CloudWatch agent.

Using the same demo RESTful service, I’ll show you how to monitor my service health using the collectd cURL plugin, which passes the collectd metrics to CloudWatch Agent via the network plugin.

For my RESTful service, the “/status” handler returns HTTP code 200 to signify that it’s up and running. This is important to monitor the health of my service and trigger an alert when the application does not respond with a HTTP 200 success code. Additionally, I want to monitor the lapsed time for each health check request.

To collect these metrics using collectd, I have a collectd daemon installed on the EC2 instance, running version 5.8.0. Here is my collectd config:

LoadPlugin logfile
LoadPlugin curl
LoadPlugin network

<Plugin logfile>
  LogLevel "debug"
  File "/var/log/collectd.log"
  Timestamp true
</Plugin>

<Plugin curl>
    <Page "status">
        URL "http://localhost:8080/status"
        MeasureResponseTime true
        MeasureResponseCode true
    </Page>
</Plugin>

<Plugin network>
    <Server "127.0.0.1" "25826">
        SecurityLevel Encrypt
        Username "user"
        Password "secret"
    </Server>
</Plugin>

 

For the cURL plugin, I configured it to measure response time (latency) and response code (HTTP status code) from the RESTful service.

Note that for the network plugin, I used Encrypt mode which requires an authentication file for the CloudWatch Agent to authenticate incoming collectd requests.  Click here for full details on the collectd installation script.

 

Step 3. Configure the CloudWatch agent

So far, I have shown you how to:

A.  Use StatsD to emit custom metrics to monitor my service health
B.  Optionally use collectd to collect metrics using plugins

Next, I will install and configure the CloudWatch agent to accept metrics from both the StatsD client and collectd plugins.

I installed the CloudWatch Agent following the instructions in the user guide, but here are the detailed steps:

Install CloudWatch Agent:

wget https://s3.amazonaws.com/amazoncloudwatch-agent/linux/amd64/latest/AmazonCloudWatchAgent.zip -O AmazonCloudWatchAgent.zip && unzip -o AmazonCloudWatchAgent.zip && sudo ./install.sh

Configure CloudWatch Agent to receive metrics from StatsD and collectd:

{
  "metrics": {
    "append_dimensions": {
      "AutoScalingGroupName": "${aws:AutoScalingGroupName}",
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "collectd": {},
      "statsd": {}
    }
  }
}

Pass the above config (config.json) to the CloudWatch Agent:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:config.json -s
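
To double-check that the agent accepted the configuration and is running, you can query its status with the same control script (this assumes the default installation path used above):

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status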

In case you want to skip these steps and just execute my sample agent install script, you can find it here.

 

Step 4. Generate and monitor application traffic in CloudWatch

Now that I have the CloudWatch agent installed and configured to receive StatsD and collect metrics, I’m going to generate traffic through the service:

echo "send 100 requests"
for i in {1..100}
do
   curl "localhost:8080/encode?text=TextToEncode_${i}[email protected]#%"
   echo ""
   sleep 1
done

 

Next, I log in to the CloudWatch console and check that the service is up and running. Here’s a graph of the StatsD metrics:

 

Here is a graph of the collectd metrics:

 

Conclusion

With StatsD and collectd support, you can now use the CloudWatch Agent to collect and monitor your custom applications in addition to the system metrics and application logs it already collects. Furthermore, you can create operational dashboards with these metrics, set alarms to take automated actions when free memory is low, and troubleshoot issues by diving into the application logs.  Note that StatsD supports both Windows and Linux operating systems while collectd is Linux only.  For Windows, you can also continue to use Windows Performance Counters to collect custom metrics instead.

The CloudWatch Agent with custom metrics support (version 1.203420.0 or later) is available in all public AWS Regions, AWS GovCloud (US), with AWS China (Beijing) and AWS China (Ningxia) coming soon.

The agent is free to use; you pay the usual CloudWatch prices for logs and custom metrics.

For more details, head over to the CloudWatch user guide for StatsD and collectd.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:

 

Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PT – Get Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.

 

AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute

June 25, 2018 | 01:00 PM – 01:45 PM PT – Accelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PT – Ensuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.

 

Containers
June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.

 

Databases

June 18, 2018 | 01:00 PM – 01:45 PM PT – Oracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.
DevOps

June 20, 2018 | 09:00 AM – 09:45 AM PT – Set Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.

 

Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PT – De-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new AWS Environments.

June 21, 2018 | 11:00 AM – 11:45 AM PT – Leading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT

June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.

 

Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PT – Integrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PT – Building Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.

 

Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PT – Optimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.

 

Mobile

June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.

 

Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PT – Understanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.

June 28, 2018 | 09:00 AM – 09:45 AM PT – Using Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.

 

Serverless

June 19, 2018 | 01:00 PM – 01:45 PM PT – Productionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.

 

Storage

June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connect your applications to the scalable and reliable AWS storage services.

June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.

June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

Protecting your API using Amazon API Gateway and AWS WAF — Part I

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/protecting-your-api-using-amazon-api-gateway-and-aws-waf-part-i/

This post courtesy of Thiago Morais, AWS Solutions Architect

When you build web applications or expose any data externally, you probably look for a platform where you can build highly scalable, secure, and robust REST APIs. As APIs are publicly exposed, there are a number of best practices for providing a secure mechanism to consumers using your API.

Amazon API Gateway handles all the tasks involved in accepting and processing up to hundreds of thousands of concurrent API calls, including traffic management, authorization and access control, monitoring, and API version management.

In this post, I show you how to take advantage of the regional API endpoint feature in API Gateway, so that you can create your own Amazon CloudFront distribution and secure your API using AWS WAF.

AWS WAF is a web application firewall that helps protect your web applications from common web exploits that could affect application availability, compromise security, or consume excessive resources.

As you make your APIs publicly available, you are exposed to attackers trying to exploit your services in several ways. The AWS security team published a whitepaper, How to Mitigate OWASP’s Top 10 Web Application Vulnerabilities, which describes a solution built around AWS WAF.

Regional API endpoints

Edge-optimized APIs are endpoints that are accessed through a CloudFront distribution created and managed by API Gateway. Before the launch of regional API endpoints, this was the default option when creating APIs using API Gateway. It primarily helped to reduce latency for API consumers that were located in different geographical locations than your API.

When API requests predominantly originate from an Amazon EC2 instance or other services within the same AWS Region as the API is deployed, a regional API endpoint typically lowers the latency of connections. It is recommended for such scenarios.

For better control around caching strategies, customers can use their own CloudFront distribution for regional APIs. They also have the ability to use AWS WAF protection, as I describe in this post.

Edge-optimized API endpoint

The following diagram is an illustrated example of the edge-optimized API endpoint where your API clients access your API through a CloudFront distribution created and managed by API Gateway.

Regional API endpoint

For the regional API endpoint, your customers access your API from the same Region in which your REST API is deployed. This helps you to reduce request latency and particularly allows you to add your own content delivery network, as needed.
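If you're creating a brand-new API rather than converting an existing one, you can request a regional endpoint at creation time. Here's a minimal sketch with the AWS CLI; the API name is just an example:

# Create a new REST API with a regional endpoint (the name is an example)
aws apigateway create-rest-api \
  --name "PetStoreRegional" \
  --endpoint-configuration types=REGIONAL

The walkthrough below takes the other path and converts an existing edge-optimized API to a regional one.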

Walkthrough

In this section, you implement the following steps:

  • Create a regional API using the PetStore sample API.
  • Create a CloudFront distribution for the API.
  • Test the CloudFront distribution.
  • Set up AWS WAF and create a web ACL.
  • Attach the web ACL to the CloudFront distribution.
  • Test AWS WAF protection.

Create the regional API

For this walkthrough, use an existing PetStore API. All new APIs launch by default as the regional endpoint type. To change the endpoint type for your existing API, choose the cog icon on the top right corner:

After you have created the PetStore API on your account, deploy a stage called “prod” for the PetStore API.

On the API Gateway console, select the PetStore API and choose Actions, Deploy API.

For Stage name, type prod and add a stage description.

Choose Deploy and the new API stage is created.

Use the following AWS CLI command to update your API from edge-optimized to regional:

aws apigateway update-rest-api \
--rest-api-id {rest-api-id} \
--patch-operations op=replace,path=/endpointConfiguration/types/EDGE,value=REGIONAL

A successful response looks like the following:

{
    "description": "Your first API with Amazon API Gateway. This is a sample API that integrates via HTTP with your demo Pet Store endpoints", 
    "createdDate": 1511525626, 
    "endpointConfiguration": {
        "types": [
            "REGIONAL"
        ]
    }, 
    "id": "{api-id}", 
    "name": "PetStore"
}

After you change your API endpoint to regional, you can now assign your own CloudFront distribution to this API.
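If you want to confirm the change later, a quick query should return REGIONAL; the --query expression here is just one way to trim the output:

# Confirm the endpoint type of the API
aws apigateway get-rest-api \
  --rest-api-id {rest-api-id} \
  --query "endpointConfiguration.types"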

Create a CloudFront distribution

To make things easier, I have provided an AWS CloudFormation template to deploy a CloudFront distribution pointing to the API that you just created. Click the button to deploy the template in the us-east-1 Region.

For Stack name, enter RegionalAPI. For APIGWEndpoint, enter your API FQDN in the following format:

{api-id}.execute-api.us-east-1.amazonaws.com

After you fill out the parameters, choose Next to continue the stack deployment. It takes a couple of minutes to finish the deployment. After it finishes, the Output tab lists the following items:

  • A CloudFront domain URL
  • An S3 bucket for CloudFront access logs

Output from CloudFormation
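If you prefer the AWS CLI to the console for this step, creating the same stack looks roughly like the following sketch; the template file path is a placeholder for wherever you saved the downloaded template:

# Create the CloudFront distribution stack (template path and API ID are placeholders)
aws cloudformation create-stack \
  --stack-name RegionalAPI \
  --template-body file://regional-api-cloudfront.yaml \
  --parameters ParameterKey=APIGWEndpoint,ParameterValue={api-id}.execute-api.us-east-1.amazonaws.com \
  --region us-east-1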

Test the CloudFront distribution

To see if the CloudFront distribution was configured correctly, use a web browser and enter the URL from your distribution, with the following parameters:

https://{your-distribution-url}.cloudfront.net/{api-stage}/pets

You should get the following output:

[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Set up AWS WAF and create a web ACL

With the new CloudFront distribution in place, you can now start setting up AWS WAF to protect your API.

For this demo, you deploy the AWS WAF Security Automations solution, which provides fine-grained control over the requests attempting to access your API.

For more information about deployment, see Automated Deployment. If you prefer, you can launch the solution directly into your account using the following button.

For CloudFront Access Log Bucket Name, add the name of the bucket created during the deployment of the CloudFormation stack for your CloudFront distribution.

The solution allows you to adjust thresholds and also choose which automations to enable to protect your API. After you finish configuring these settings, choose Next.

To start the deployment process in your account, follow the creation wizard and choose Create. It takes a few minutes to finish the deployment. You can follow the creation process in the CloudFormation console.

After the deployment finishes, you can see the new web ACL deployed on the AWS WAF console, AWSWAFSecurityAutomations.
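You can also verify this from the command line. The solution uses the classic AWS WAF API with global (CloudFront) scope, so listing the web ACLs should show the new entry:

# List web ACLs in the global (CloudFront) scope of classic AWS WAF
aws waf list-web-acls --limit 10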

Attach the AWS WAF web ACL to the CloudFront distribution

With the solution deployed, you can now attach the AWS WAF web ACL to the CloudFront distribution that you created earlier.

To assign the newly created AWS WAF web ACL, go back to your CloudFront distribution. After you open your distribution for editing, choose General, Edit.

Select the new AWS WAF web ACL that you created earlier, AWSWAFSecurityAutomations.

Save the changes to your CloudFront distribution and wait for the deployment to finish.
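The console steps above are the simplest way to make the association. If you need to script it instead, the general shape is to fetch the current distribution configuration, set its WebACLId, and push it back with the returned ETag; the following is a rough sketch that assumes jq is installed and uses placeholder IDs:

# Fetch the current distribution config and its ETag (IDs are placeholders)
aws cloudfront get-distribution-config --id {distribution-id} > dist.json
ETAG=$(jq -r '.ETag' dist.json)

# Set the web ACL ID and keep only the DistributionConfig portion
jq '.DistributionConfig.WebACLId = "{web-acl-id}" | .DistributionConfig' dist.json > dist-config.json

# Push the updated configuration back to CloudFront
aws cloudfront update-distribution \
  --id {distribution-id} \
  --distribution-config file://dist-config.json \
  --if-match "$ETAG"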

Test AWS WAF protection

To validate the AWS WAF Web ACL setup, use Artillery to load test your API and see AWS WAF in action.

To install Artillery on your machine, run the following command:

$ npm install -g artillery

After the installation completes, you can check if Artillery installed successfully by running the following command:

$ artillery -V
1.6.0-12

At the time of publication, Artillery is on version 1.6.0-12.

One of the web ACL rules that you set up is a rate-based rule. By default, it blocks any requester that exceeds 2,000 requests in a 5-minute period. Try this out.
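If you'd like to confirm the configured limit before testing, you can inspect the rate-based rule with the classic AWS WAF CLI; this is just an optional check, with the rule ID coming from the list command:

# Find the rate-based rule created by the solution, then check its rate limit
aws waf list-rate-based-rules
aws waf get-rate-based-rule --rule-id {rule-id}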

First, use cURL to query your distribution and see the API output:

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets
[
  {
    "id": 1,
    "type": "dog",
    "price": 249.99
  },
  {
    "id": 2,
    "type": "cat",
    "price": 124.99
  },
  {
    "id": 3,
    "type": "fish",
    "price": 0.99
  }
]

Based on the test above, the result looks good. But what happens if you exceed 2,000 requests within 5 minutes?

Run the following Artillery command:

artillery quick -n 2000 --count 10  https://{distribution-name}.cloudfront.net/prod/pets

This command uses 10 concurrent virtual users to fire a flood of requests at your API, more than enough to exceed the 2,000-request threshold. For brevity, I am not posting the Artillery output here.

After Artillery finishes its execution, try to run the cURL request again and see what happens:

 

$ curl -s https://{distribution-name}.cloudfront.net/prod/pets

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML><HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>ERROR: The request could not be satisfied</TITLE>
</HEAD><BODY>
<H1>ERROR</H1>
<H2>The request could not be satisfied.</H2>
<HR noshade size="1px">
Request blocked.
<BR clear="all">
<HR noshade size="1px">
<PRE>
Generated by cloudfront (CloudFront)
Request ID: [removed]
</PRE>
<ADDRESS>
</ADDRESS>
</BODY></HTML>

As you can see from the output above, the request was blocked by AWS WAF. Your IP address is removed from the block list after your request rate falls back below the limit.

Conclusion

In this first part, you saw how to use the new API Gateway regional API endpoint together with Amazon CloudFront and AWS WAF to protect your API against a range of common attacks.

In the second part, I will demonstrate some other techniques to protect your API using API keys and Amazon CloudFront custom headers.

Connect, collaborate, and learn at AWS Global Summits in 2018

Post Syndicated from Tina Kelleher original https://aws.amazon.com/blogs/big-data/connect-collaborate-and-learn-at-aws-global-summits-in-2018/

Regardless of your career path, there’s no denying that attending industry events can provide helpful career development opportunities — not only for improving and expanding your skill sets, but for networking as well. According to this article from PayScale.com, experts estimate that somewhere between 70% and 85% of new positions are landed through networking.

If you want to network with cloud computing professionals who are tackling some of today’s most innovative and exciting big data solutions, the big data-focused sessions at an AWS Global Summit are a great place to start.

AWS Global Summits are free events that bring the cloud computing community together to connect, collaborate, and learn about AWS. As the name suggests, these summits are held in major cities around the world, and they attract technologists from all industries and skill levels who are interested in hearing from AWS leaders, experts, partners, and customers.

In addition to networking with top cloud technology providers, consultants, and your peers in our Partner and Solutions Expo, you’ll also hone your AWS skills through a wide range of education and training opportunities.

Here’s a brief sampling of some of the upcoming sessions relevant to big data professionals:

May 31st: Big Data Architectural Patterns and Best Practices on AWS | AWS Summit – Mexico City

June 6th-7th: Various (click on the “Big Data & Analytics” header) | AWS Summit – Berlin

June 20th-21st: [email protected] | Public Sector Summit – Washington DC

June 21st: Enabling Self Service for Data Scientists with AWS Service Catalog | AWS Summit – Sao Paulo

Be sure to check out the main page for AWS Global Summits, where you can see which cities have AWS Summits planned for 2018, register to attend an upcoming event, or provide your information to be notified when registration opens for a future event.

Practice Makes Perfect: Testing Campaigns Before You Send Them

Post Syndicated from Zach Barbitta original https://aws.amazon.com/blogs/messaging-and-targeting/practice-makes-perfect-testing-campaigns-before-you-send-them/

In an article we posted to Medium in February, we talked about how to determine the best time to engage your customers by using Amazon Pinpoint’s built-in session heat map. The session heat map allows you to find the times when your customers are most likely to use your app. This post continues the discussion of best practices, focusing on how to test a campaign before it goes live.

In this post, we’ll talk about the old adage “practice makes perfect,” and how it applies to the campaigns you send using Amazon Pinpoint. Let’s take a scenario many of our customers encounter daily: creating a campaign to engage users by sending a push notification.

As you can see from the preceding screenshot, the segment we plan to target has nearly 1.7M recipients, which is a lot! Of course, before we got to this step, we had already applied several best practices. For example, we determined the best time to engage our audience, scheduled the message based on recipients’ local time zones, performed A/B/N testing, measured lift using a hold-out group, and personalized the content for maximum effectiveness. Now that we’re ready to send the notification, we should test the message before we send it to all of the recipients in our segment. The reason for testing is straightforward: we want to make sure every detail of the message is accurate before it reaches all 1,687,575 customers.

Fortunately, Amazon Pinpoint makes it easy to test your messages—in fact, you don’t even have to leave the campaign wizard to do so. In step 3 of the campaign wizard, below the message editor, there’s a button labeled Test campaign.

When you choose the Test campaign button, you have three options: you can send the test message to a segment of 100 endpoints or fewer, to a set of up to 10 specific endpoint IDs, or to a set of up to 10 specific device tokens, as shown in the following image.

In our case, we’ve already created a segment of internal recipients who will test our message. On the Test Campaign window, under Send a test message to, we choose A segment. Then, in the drop-down menu, we select our test segment, and then choose Send test message.

Because we’re sending the test message to a segment, Amazon Pinpoint automatically creates a new campaign dedicated to this test. This process executes a test campaign, complete with message analytics, which allows you to perform end-to-end testing as if you sent the message to your production audience. To see the analytics for your test campaign, go to the Campaigns tab, and then choose the campaign (the name of the campaign contains the word “test”, followed by four random characters, followed by the name of the campaign).

After you complete a successful test, you’re ready to launch your campaign. As a final check, the Review & Launch screen includes a reminder that indicates whether or not you’ve tested the campaign, as shown in the following image.

There are several other ways you can use this feature. For example, you could use it for troubleshooting a campaign, or for iterating on existing campaigns. To learn more about testing campaigns, see the Amazon Pinpoint User Guide.
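Finally, if you ever want to script a quick spot check outside the console entirely, the Pinpoint SendMessages API can push a one-off test notification directly to specific endpoints, much like the endpoint ID option described earlier. Here's a rough CLI sketch; the application ID, endpoint ID, and message text are placeholders:

# Send a one-off test push notification to a specific endpoint (IDs and text are placeholders)
aws pinpoint send-messages \
  --application-id {application-id} \
  --message-request '{
    "Endpoints": { "{endpoint-id}": {} },
    "MessageConfiguration": {
      "DefaultPushNotificationMessage": {
        "Title": "Test: weekend sale",
        "Body": "Internal test only - please verify rendering and deep links."
      }
    }
  }'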