Tag Archives: Amazon VPC

Using Route 53 Private Hosted Zones for Cross-account Multi-region Architectures

Post Syndicated from Anandprasanna Gaitonde original https://aws.amazon.com/blogs/architecture/using-route-53-private-hosted-zones-for-cross-account-multi-region-architectures/

This post was co-written by Anandprasanna Gaitonde, AWS Solutions Architect and John Bickle, Senior Technical Account Manager, AWS Enterprise Support

Introduction

Many AWS customers have internal business applications spread over multiple AWS accounts and on-premises environments to support different business units. In such environments, a consistent view of DNS records and domain names between on-premises and the different AWS accounts is useful. Route 53 Private Hosted Zones (PHZs) and Resolver endpoints on AWS form an architectural best practice for centralized DNS in hybrid cloud environments. They give your business units the flexibility and autonomy to manage the hosted zones for their applications and to support multi-region application environments for disaster recovery (DR) purposes.

This blog presents an architecture that provides a unified view of the DNS while allowing different AWS accounts to manage subdomains. It utilizes PHZs with overlapping namespaces and cross-account multi-region VPC association for PHZs to create an efficient, scalable, and highly available architecture for DNS.

Architecture Overview

You can set up a multi-account environment using services such as AWS Control Tower to host applications and workloads from different business units in separate AWS accounts. However, these applications have to conform to a naming scheme based on organization policies that keeps the DNS hierarchy simple to manage. As a best practice, the integration with on-premises DNS is done by configuring Amazon Route 53 Resolver endpoints in a shared networking account. Following is an example of this architecture.

Route 53 PHZs and Resolver Endpoints

Figure 1 – Architecture Diagram

The customer in this example has on-premises applications under the customer.local domain. Applications hosted in AWS use subdomain delegation to aws.customer.local. The example here shows three applications that belong to three different teams, and those environments are located in their separate AWS accounts to allow for autonomy and flexibility. This architecture pattern follows the option of the “Multi-Account Decentralized” model as described in the whitepaper Hybrid Cloud DNS options for Amazon VPC.

This architecture involves three key components:

1. PHZ configuration: PHZ for the subdomain aws.customer.local is created in the shared Networking account. This is to support centralized management of PHZ for ancillary applications where teams don’t want individual control (Item 1a in Figure). However, for the key business applications, each of the teams or business units creates its own PHZ. For example, app1.aws.customer.local – Application1 in Account A, app2.aws.customer.local – Application2 in Account B, app3.aws.customer.local – Application3 in Account C (Items 1b in Figure). Application1 is a critical business application and has stringent DR requirements. A DR environment of this application is also created in us-west-2.

For a consistent view of DNS and efficient DNS query routing between the AWS accounts and on-premises, the best practice is to associate all the PHZs with the Networking account. PHZs created in Accounts A, B, and C are associated with the VPC in the Networking account by using cross-account association of Private Hosted Zones with VPCs. This creates overlapping domains from multiple PHZs for the VPCs of the Networking account. It also overlaps with the parent subdomain PHZ (aws.customer.local) in the Networking account. In cases where two or more PHZs have overlapping namespaces, the Route 53 Resolver routes traffic based on the most specific match, as described in the Developer Guide. (A scripted sketch of the cross-account association appears after this list of components.)

2. Route 53 Resolver endpoints for on-premises integration (Item 2 in Figure): The Networking account is used to set up the integration with on-premises DNS using Route 53 Resolver endpoints, as shown in Resolving DNS queries between VPC and your network. Inbound and outbound Route 53 Resolver endpoints are created in the VPC in us-east-1 to serve as the integration between on-premises DNS and AWS. DNS traffic between on-premises and AWS requires an AWS Site-to-Site VPN connection or AWS Direct Connect connection to carry DNS and application traffic. For each Resolver endpoint, two or more IP addresses can be specified to map to different Availability Zones (AZs). This helps create a highly available architecture.

3. Route 53 Resolver rules (Item 3 in Figure): Forwarding rules are created only in the Networking account to route DNS queries for on-premises domains (customer.local) to the on-premises DNS server. AWS Resource Access Manager (RAM) is used to share the rules with Accounts A, B, and C, as described in the section “Sharing forwarding rules with other AWS accounts and using shared rules” in the documentation. Account owners can then associate these shared rules with their VPCs the same way that they associate rules created in their own AWS accounts. If you share a rule with another AWS account, you also indirectly share the outbound endpoint that you specify in the rule, as described in the section “Considerations when creating inbound and outbound endpoints” in the documentation. This means you can use one outbound endpoint in a Region to forward DNS queries to your on-premises network from multiple VPCs, even if the VPCs were created in different AWS accounts. The Resolver forwards DNS queries for the domain name specified in the rule to the outbound endpoint, which in turn forwards them to the on-premises DNS servers. The rules are created in both Regions in this architecture.
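
Both the cross-account PHZ association (Item 1) and the forwarding rule sharing (Item 3) can be scripted. The following boto3 sketch outlines the API calls involved; the hosted zone ID, VPC IDs, Resolver endpoint ID, account numbers, and on-premises DNS IP addresses are all placeholders:

import uuid
import boto3

# --- Item 1: cross-account PHZ association (placeholder IDs throughout) ---

# Run with credentials for Account A, which owns the app1.aws.customer.local PHZ
route53_app = boto3.client("route53")
route53_app.create_vpc_association_authorization(
    HostedZoneId="Z0123456789EXAMPLE",
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0networkingexample"},
)

# Run with credentials for the Networking account, which owns the VPC
route53_net = boto3.client("route53")
route53_net.associate_vpc_with_hosted_zone(
    HostedZoneId="Z0123456789EXAMPLE",
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0networkingexample"},
)

# --- Item 3: forwarding rule for customer.local, shared via AWS RAM ---

resolver = boto3.client("route53resolver", region_name="us-east-1")
ram = boto3.client("ram", region_name="us-east-1")

rule = resolver.create_resolver_rule(
    CreatorRequestId=str(uuid.uuid4()),
    Name="forward-customer-local",
    RuleType="FORWARD",
    DomainName="customer.local",
    TargetIps=[{"Ip": "10.10.0.2", "Port": 53}, {"Ip": "10.10.0.3", "Port": 53}],
    ResolverEndpointId="rslvr-out-0123456789example",  # outbound endpoint in the Networking account
)["ResolverRule"]

ram.create_resource_share(
    name="shared-resolver-rules",
    resourceArns=[rule["Arn"]],
    principals=["111111111111", "222222222222", "333333333333"],  # Accounts A, B, and C
)

# Run with credentials for Account A: associate the shared rule with the app1 VPC
resolver_a = boto3.client("route53resolver", region_name="us-east-1")
resolver_a.associate_resolver_rule(
    ResolverRuleId=rule["Id"],
    VPCId="vpc-0app1example",
    Name="app1-customer-local",
)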

This architecture provides the following benefits:

  1. Resilient and scalable
  2. Uses the VPC+2 endpoint, local caching and Availability Zone (AZ) isolation
  3. Minimal forwarding hops
  4. Lower cost: optimal use of Resolver endpoints and forwarding rules

To handle DR, here are some other considerations:

  • For app1.aws.customer.local, the same PHZ is associated with the VPC in the us-west-2 Region. While VPCs are regional, the PHZ is a global construct. The same PHZ is accessible from VPCs in different regions.
  • A failover routing policy is set up in the PHZ and failover records are created. However, Route 53 health checkers (being outside of the VPC) require a public IP for your applications. As these business applications are internal to the organization, a metric-based health check with Amazon CloudWatch can be configured instead, as mentioned in Configuring failover in a private hosted zone (a scripted sketch of this follows the list).
  • Resolver endpoints are created in the VPC in another region (us-west-2) in the networking account. This allows on-premises servers to fail over to these secondary Resolver inbound endpoints in case the primary region goes down.
  • A second set of forwarding rules is created in the networking account, which uses the outbound endpoint in us-west-2. These are shared with Account A and then associated with VPC in us-west-2.
  • In addition, to have DR across multiple on-premises locations, the on-premises servers should have a secondary backup DNS on-premises as well (not shown in the diagram).

This ensures a simple DNS architecture for the DR setup and seamless failover for applications in case of a region failure.
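
As referenced in the failover consideration above, here is a hedged boto3 sketch of a CloudWatch metric-based health check and the corresponding failover record pair; the alarm name, hosted zone ID, and IP addresses are placeholders:

import uuid
import boto3

route53 = boto3.client("route53")

# Health check driven by an existing CloudWatch alarm (placeholder name/region)
hc = route53.create_health_check(
    CallerReference=str(uuid.uuid4()),
    HealthCheckConfig={
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": "us-east-1", "Name": "app1-primary-healthy"},
        "InsufficientDataHealthStatus": "LastKnownStatus",
    },
)["HealthCheck"]

# Failover record pair in the app1.aws.customer.local PHZ (placeholder zone ID and IPs)
route53.change_resource_record_sets(
    HostedZoneId="Z0123456789EXAMPLE",
    ChangeBatch={"Changes": [
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "app1.aws.customer.local", "Type": "A", "SetIdentifier": "primary",
            "Failover": "PRIMARY", "TTL": 60, "HealthCheckId": hc["Id"],
            "ResourceRecords": [{"Value": "10.1.0.10"}]}},
        {"Action": "UPSERT", "ResourceRecordSet": {
            "Name": "app1.aws.customer.local", "Type": "A", "SetIdentifier": "secondary",
            "Failover": "SECONDARY", "TTL": 60,
            "ResourceRecords": [{"Value": "10.2.0.10"}]}},
    ]},
)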

Considerations

  • If Application 1 needs to communicate to Application 2, then the PHZ from Account A must be shared with Account B. DNS queries can then be routed efficiently for those VPCs in different accounts.
  • Create additional IP addresses in a single AZ/subnet for the resolver endpoints, to handle large volumes of DNS traffic.
  • Look at Considerations while using Private Hosted Zones before implementing such architectures in your AWS environment.

Summary

Hybrid cloud environments can utilize the features of Route 53 Private Hosted Zones such as overlapping namespaces and the ability to perform cross-account and multi-region VPC association. This creates a unified DNS view for your application environments. The architecture allows for scalability and high availability for business applications.

New – VPC Reachability Analyzer

Post Syndicated from Harunobu Kameda original https://aws.amazon.com/blogs/aws/new-vpc-insights-analyzes-reachability-and-visibility-in-vpcs/

With Amazon Virtual Private Cloud (VPC), you can launch a logically isolated customer-specific virtual network on the AWS Cloud. As customers expand their footprint on the cloud and deploy increasingly complex network architectures, it can take longer to resolve network connectivity issues caused by misconfiguration. Today, we are happy to announce VPC Reachability Analyzer, a network diagnostics tool that troubleshoots reachability between two endpoints in a VPC, or within multiple VPCs.

Ensuring Your Network Configuration is as Intended
You have full control over your virtual network environment, including choosing your own IP address range, creating subnets, and configuring route tables and network gateways. You can also easily customize the network configuration of your VPC. For example, you can create a public subnet for a web server that has access to the Internet through an Internet Gateway. Security-sensitive backend systems such as databases and application servers can be placed on private subnets that do not have internet access. You can use multiple layers of security, such as security groups and network access control lists (ACLs), to control access to the entities in each subnet by protocol, IP address, and port number.

You can also combine multiple VPCs via VPC peering or AWS Transit Gateway for region-wide or global network connections that can route traffic privately. You can also use VPN Gateway to connect your site with your AWS account for secure communication. Many AWS services that reside outside the VPC, such as AWS Lambda or Amazon S3, support VPC endpoints or AWS PrivateLink, so they can be reached privately as entities inside the VPC.

With such a rich set of controls and features, it is not unusual to have unintended configurations that lead to connectivity issues. Today, you can use VPC Reachability Analyzer to analyze reachability between two endpoints without sending any packets. VPC Reachability Analyzer looks at the configuration of all the resources in your VPCs and uses automated reasoning to determine what network flows are feasible. It analyzes all possible paths through your network without having to send any traffic on the wire. To learn more about how these algorithms work, check out this re:Invent talk or read this paper.

How VPC Reachability Analyzer Works
Let’s see how it works. Using VPC Reachability Analyzer is very easy, and you can test it with your current VPC. If you need an isolated VPC for test purposes, you can run the AWS CloudFormation YAML template at the bottom of this article. The template creates a VPC with one subnet, two security groups, and three instances named A, B, and C. Instances A and B can communicate with each other, but neither can communicate with instance C because the security group attached to instance C does not allow any incoming traffic.

You see Reachability Analyzer in the left navigation of the VPC Management Console.

Click Reachability Analyzer, then click the Create and analyze path button. A new window opens where you can specify a path between a source and destination and start the analysis.

You can specify any of the following endpoint types as your source and destination of communication: VPN Gateways, Instances, Network Interfaces, Internet Gateways, VPC Endpoints, VPC Peering Connections, and Transit Gateways. For example, we set instance A as the source and instance B as the destination. You can choose to check for connectivity via either the TCP or UDP protocol. Optionally, you can also specify a destination port number, or source or destination IP addresses.

Configuring test path

Finally, click the Create and analyze path button to start the analysis. The analysis can take up to several minutes depending on the size and complexity of your VPCs, but it typically takes a few seconds.
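
The same analysis can be run programmatically. Here is a minimal boto3 sketch that creates a path from instance A to instance B, starts an analysis, and polls for the result (the instance IDs are placeholders):

import time
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Path from instance A to instance B over TCP (instance IDs are placeholders)
path = ec2.create_network_insights_path(
    Source="i-0aaaaaaaaaaaaaaaa",        # instance A
    Destination="i-0bbbbbbbbbbbbbbbb",   # instance B
    Protocol="tcp",
)["NetworkInsightsPath"]

analysis = ec2.start_network_insights_analysis(
    NetworkInsightsPathId=path["NetworkInsightsPathId"],
)["NetworkInsightsAnalysis"]

# Poll until the analysis finishes, then report the result
while True:
    result = ec2.describe_network_insights_analyses(
        NetworkInsightsAnalysisIds=[analysis["NetworkInsightsAnalysisId"]],
    )["NetworkInsightsAnalyses"][0]
    if result["Status"] != "running":
        break
    time.sleep(5)

print("Reachable" if result.get("NetworkPathFound") else "Not reachable")
print(result.get("Explanations", []))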

You can now see the analysis result as Reachable. If you click the URL link of analysis id nip-xxxxxxxxxxxxxxxxx, you can see the route hop by hop.

The communication from instance A to instance C is not reachable because the security group attached to instance C does not allow any incoming traffic.

If you click nip-xxxxxxxxxxxxxxxxx, you can check the Explanations section for details.

Result Detail

Here we see the security group that blocked communication. When you click on the security group listed in the upper right corner, you can go directly to the security group editing window to change the security group rules. In this case adding a properly scoped ingress rule will allow the instances to communicate.

Available Today
This feature is available in all AWS commercial Regions except the China (Beijing) and China (Ningxia) Regions. More information is available in our technical documentation, and remember that to use this feature your IAM permissions need to be set up as documented here.

– Kame

CloudFormation YAML template for test

---
Description: An AWS VPC configuration with 1 subnet, 2 security groups and 3 instances. When testing ReachabilityAnalyzer, this provides both a path found and path not found scenario.
AWSTemplateFormatVersion: 2010-09-09

Mappings:
  RegionMap:
    us-east-1:
      execution: ami-0915e09cc7ceee3ab
      ecs: ami-08087103f9850bddd

Resources:
  # VPC
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 172.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
      InstanceTenancy: default

  # Subnets
  Subnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 172.0.0.0/20
      MapPublicIpOnLaunch: false

  # SGs
  SecurityGroup1:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all ingress and egress traffic
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - CidrIp: 0.0.0.0/0
          IpProtocol: "-1" # -1 specifies all protocols

  SecurityGroup2:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow all egress traffic
      VpcId: !Ref VPC

  # Instances
  # Instance A and B should have a path between them since they are both in SecurityGroup 1
  InstanceA:
    Type: AWS::EC2::Instance
    Properties:
      ImageId:
        Fn::FindInMap:
          - RegionMap
          - Ref: AWS::Region
          - execution
      InstanceType: 't3.nano'
      SubnetId:
        Ref: Subnet1
      SecurityGroupIds:
        - Ref: SecurityGroup1

  # Instance A and B should have a path between them since they are both in SecurityGroup 1
  InstanceB:
    Type: AWS::EC2::Instance
    Properties:
      ImageId:
        Fn::FindInMap:
          - RegionMap
          - Ref: AWS::Region
          - execution
      InstanceType: 't3.nano'
      SubnetId:
        Ref: Subnet1
      SecurityGroupIds:
        - Ref: SecurityGroup1

  # This instance should not be reachable from Instance A or B since it is in SecurityGroup 2
  InstanceC:
    Type: AWS::EC2::Instance
    Properties:
      ImageId:
        Fn::FindInMap:
          - RegionMap
          - Ref: AWS::Region
          - execution
      InstanceType: 't3.nano'
      SubnetId:
        Ref: Subnet1
      SecurityGroupIds:
        - Ref: SecurityGroup2

 

AWS Network Firewall – New Managed Firewall Service in VPC

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-network-firewall-new-managed-firewall-service-in-vpc/

Our customers want a highly available, scalable firewall service to protect their virtual networks in the cloud. Security is the number one priority of AWS, and we provide various firewall capabilities on AWS that address specific security needs: Security Groups to protect Amazon Elastic Compute Cloud (EC2) instances, Network ACLs to protect Amazon Virtual Private Cloud (VPC) subnets, AWS Web Application Firewall (WAF) to protect web applications running on Amazon CloudFront, Application Load Balancer (ALB), or Amazon API Gateway, and AWS Shield to protect against Distributed Denial of Service (DDoS) attacks.

We heard that customers want an easier way to scale network security across all the resources in their workload, regardless of which AWS services they use. They also want customized protections to secure their unique workloads, or to comply with government mandates or commercial regulations. These customers need the ability to do things like URL filtering on outbound flows, pattern matching on packet data beyond IP/port/protocol, and alerting on specific vulnerabilities for protocols beyond HTTP/S.

Today, I am happy to announce AWS Network Firewall, a highly available, managed network firewall service for your virtual private cloud (VPC). It enables you to easily deploy and manage stateful inspection, intrusion prevention and detection, and web filtering to protect your virtual networks on AWS. Network Firewall automatically scales with your traffic, ensuring high availability with no additional customer investment in security infrastructure.

With AWS Network Firewall, you can implement customized rules to prevent your VPCs from accessing unauthorized domains, to block thousands of known-bad IP addresses, or to identify malicious activity using signature-based detection. AWS Network Firewall makes firewall activity visible in real time via CloudWatch metrics and offers increased visibility of network traffic by sending logs to S3, CloudWatch, and Kinesis Data Firehose. Network Firewall is integrated with AWS Firewall Manager, giving customers who use AWS Organizations a single place to enable and monitor firewall activity across all of their VPCs and AWS accounts. Network Firewall is interoperable with your existing security ecosystem, including AWS partners such as CrowdStrike, Palo Alto Networks, and Splunk. You can also import existing rules from community-maintained Suricata rulesets.

Concepts of Network Firewall
AWS Network Firewall runs stateless and stateful traffic inspection rules engines. The engines use rules and other settings that you configure inside a firewall policy.

You use a firewall on a per-Availability Zone basis in your VPC. For each Availability Zone, you choose a subnet to host the firewall endpoint that filters your traffic. The firewall endpoint in an Availability Zone can protect all of the subnets inside the zone except for the one where it’s located.

You can manage AWS Network Firewall with the following central components.

  • Firewall – A firewall connects the VPC that you want to protect to the protection behavior that’s defined in a firewall policy. For each Availability Zone where you want protection, you provide Network Firewall with a public subnet that’s dedicated to the firewall endpoint. To use the firewall, you update the VPC route tables to send incoming and outgoing traffic through the firewall endpoints.
  • Firewall policy – A firewall policy defines the behavior of the firewall in a collection of stateless and stateful rule groups and other settings. You can associate each firewall with only one firewall policy, but you can use a firewall policy for more than one firewall.
  • Rule group – A rule group is a collection of stateless or stateful rules that define how to inspect and handle network traffic. Rules configuration includes 5-tuple and domain name filtering. You can also provide stateful rules using Suricata open source rule specification.

AWS Network Firewall – Getting Started
You can create and manage AWS Network Firewall using the AWS Management Console, AWS Command Line Interface (CLI), or AWS SDKs. In the navigation pane of the VPC console, expand AWS Network Firewall and then choose Create firewall in the Firewalls menu.

To create a new firewall, enter the name that you want to use to identify it and select your VPC from the dropdown. For each Availability Zone (AZ) where you want to use AWS Network Firewall, create a public subnet for the firewall endpoint. This subnet must have at least one available IP address and a non-zero capacity. Keep these firewall subnets reserved for use by Network Firewall.

For Associated firewall policy, select Create and associate an empty firewall policy and choose Create firewall.

Your new firewall is listed on the Firewalls page. The firewall has an empty firewall policy. In the next step, you’ll specify the firewall behavior in the policy. Select your newly created firewall policy in the Firewall policies menu.

You can create or add new stateless or stateful rule groups – zero or more collections of firewall rules with priority settings that define their processing order within the policy. The stateless default action defines how Network Firewall handles a packet that doesn’t match any of the stateless rule groups.

For stateless default action, the firewall policy allows you to specify different default settings for full packets and for packet fragments. The action options are the same as for the stateless rules that you use in the firewall policy’s stateless rule groups.

You are required to specify one of the following options:

  • Allow – Discontinue all inspection of the packet and permit it to go to its intended destination.
  • Drop – Discontinue all inspection of the packet and block it from going to its intended destination.
  • Forward to stateful rule groups – Discontinue stateless inspection of the packet and forward it to the stateful rule engine for inspection.

Additionally, you can optionally specify a named custom action to apply. For this action, Network Firewall sends a CloudWatch metric dimension named CustomAction with a value that you specify. After you define a named custom action, you can use it by name in the same context where you defined it. You can reuse a custom action setting among the rules in a rule group, and you can reuse a custom action setting between the two default stateless custom action settings for a firewall policy.

After you’ve defined your firewall policy, you can insert the firewall into your VPC traffic flow by updating the VPC route tables to include the firewall.
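
To illustrate the end-to-end flow, here is a hedged boto3 sketch that creates a firewall policy, deploys a firewall into a dedicated firewall subnet, and then points a protected subnet’s route table at the resulting firewall endpoint. All names, IDs, and ARNs are placeholders, and the vpce- endpoint ID must be looked up (for example, from the firewall’s sync states via DescribeFirewall) once the firewall is ready:

import boto3

nfw = boto3.client("network-firewall", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Firewall policy that forwards all stateless traffic to the stateful engine
policy = nfw.create_firewall_policy(
    FirewallPolicyName="demo-policy",
    FirewallPolicy={
        "StatelessDefaultActions": ["aws:forward_to_sfe"],
        "StatelessFragmentDefaultActions": ["aws:forward_to_sfe"],
        "StatefulRuleGroupReferences": [
            {"ResourceArn": "arn:aws:network-firewall:us-east-1:111111111111:stateful-rulegroup/domain-denylist"},
        ],
    },
)["FirewallPolicyResponse"]

# Firewall with one endpoint in a dedicated firewall subnet (placeholder IDs)
nfw.create_firewall(
    FirewallName="demo-firewall",
    FirewallPolicyArn=policy["FirewallPolicyArn"],
    VpcId="vpc-0protectedexample",
    SubnetMappings=[{"SubnetId": "subnet-0firewallaz1"}],
)

# Route traffic from a protected subnet through the firewall endpoint
# (look up the vpce- endpoint ID from DescribeFirewall once the firewall is READY)
ec2.create_route(
    RouteTableId="rtb-0protectedsubnet",
    DestinationCidrBlock="0.0.0.0/0",
    VpcEndpointId="vpce-0firewallendpoint",
)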

How to set up Rule Groups
You can create new stateless or stateful rule groups in the Network Firewall rule groups menu by choosing Create rule group. If you select Stateful rule group, you can select one of three options: 1) 5-tuple format, specifying source IP, source port, destination IP, destination port, and protocol, and the action to take for matching traffic, 2) Domain list, specifying a list of domain names and the action to take for traffic that tries to access one of the domains, and 3) Suricata compatible IPS rules, providing advanced firewall rules using Suricata rule syntax.

Network Firewall supports the standard stateless “5-tuple” rule specification for network traffic inspection, with a priority number that indicates the processing order of the stateless rule within the rule group.

Similarly, a stateful 5 tuple rule has the following match settings. These specify what the Network Firewall stateful rules engine looks for in a packet. A packet must satisfy all match settings to be a match.

A rule group with domain names has the following match settings – Domain name, a list of strings specifying the domain names that you want to match, and Traffic direction, a direction of traffic flow to inspect. The following JSON shows an example rule definition for a domain name rule group.

{
  "RulesSource": {
    "RulesSourceList": {
      "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
      "Targets": [
        "test.example.com",
        "test2.example.com"
      ],
      "GeneratedRulesType": "DENYLIST"
    }
  }
}
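
If you prefer to create the same rule group programmatically, here is a boto3 sketch that wraps the RulesSourceList shown above (the rule group name and capacity are illustrative):

import boto3

nfw = boto3.client("network-firewall", region_name="us-east-1")

nfw.create_rule_group(
    RuleGroupName="domain-denylist",
    Type="STATEFUL",
    Capacity=100,   # fixed at creation; size it for future rule growth
    RuleGroup={
        "RulesSource": {
            "RulesSourceList": {
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                "Targets": ["test.example.com", "test2.example.com"],
                "GeneratedRulesType": "DENYLIST",
            }
        }
    },
)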

A stateful rule group with Suricata compatible IPS rules has all settings defined within the Suricata compatible specification. For example, the following rule detects SSH protocol anomalies. For information about Suricata, see the Suricata website.

alert tcp any any -> any 22 (msg:"ALERT TCP port 22 but not SSH"; app-layer-protocol:!ssh; sid:2271009; rev:1;)

You can monitor Network Firewall using CloudWatch, which collects raw data and processes it into readable, near real-time metrics, and AWS CloudTrail, a service that provides a record of API calls to AWS Network Firewall by a user, role, or an AWS service. CloudTrail captures all API calls for Network Firewall as events. To learn more about logging and monitoring, see the documentation.

Network Firewall Partners
At this launch, Network Firewall integrates with a collection of AWS partners. They provided us with lots of helpful feedback. Here are some of the blog posts that they wrote in order to share their experiences (I am updating this article with links as they are published).

Available Now
AWS Network Firewall is now available in the US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions. Take a look at the product page, pricing, and the documentation to learn more. Give this a try, and please send us feedback either through your usual AWS Support contacts or the AWS forum for Amazon VPC.

Learn all the details about AWS Network Firewall and get started with the new feature today.

Channy;

Snowflake: Running Millions of Simulation Tests with Amazon EKS

Post Syndicated from Keith Joelner original https://aws.amazon.com/blogs/architecture/snowflake-running-millions-of-simulation-tests-with-amazon-eks/

This post was co-written with Brian Nutt, Senior Software Engineer and Kao Makino, Principal Performance Engineer, both at Snowflake.

Transactional databases are a key component of any production system. Maintaining data integrity while rows are read and written at a massive scale is a major technical challenge for these types of databases. To ensure their stability, it’s necessary to test many different scenarios and configurations. Simulating as many of these as possible allows engineers to quickly catch defects and build resilience. But the Holy Grail is to accomplish this at scale and within a timeframe that allows your developers to iterate quickly.

Snowflake has been using and advancing FoundationDB (FDB), an open-source, ACID-compliant, distributed key-value store, since 2014. FDB, running on Amazon Elastic Compute Cloud (EC2) and Amazon Elastic Block Store (EBS), has proven to be extremely reliable and is a key part of Snowflake’s cloud services layer architecture. To support its development process of creating high-quality and stable software, Snowflake developed Project Joshua, an internal system that leverages Amazon Elastic Kubernetes Service (EKS), Amazon Elastic Container Registry (ECR), Amazon EC2 Spot Instances, and AWS PrivateLink to run over one hundred thousand validation and regression tests an hour.

About Snowflake

Snowflake is a single, integrated data platform delivered as a service. Built from the ground up for the cloud, Snowflake’s unique multi-cluster shared data architecture delivers the performance, scale, elasticity, and concurrency that today’s organizations require. It features storage, compute, and global services layers that are physically separated but logically integrated. Data workloads scale independently from one another, making it an ideal platform for data warehousing, data lakes, data engineering, data science, modern data sharing, and developing data applications.

Snowflake architecture

Developing a simulation-based testing and validation framework

Snowflake’s cloud services layer is composed of a collection of services that manage virtual warehouses, query optimization, and transactions. This layer relies on rich metadata stored in FDB.

Prior to the creation of the simulation framework, Project Joshua, FDB developers ran tests on their laptops and were limited by the number they could run. Additionally, there was a scheduled nightly job for running further tests.

Joshua at Snowflake

Amazon EKS as the foundation

Snowflake’s platform team decided to use Kubernetes to build Project Joshua. Their focus was on helping engineers run their workloads instead of spending cycles on the management of the control plane. They turned to Amazon EKS to achieve their scalability needs. This was a crucial success criterion for Project Joshua since at any point in time there could be hundreds of nodes running in the cluster. Snowflake utilizes the Kubernetes Cluster Autoscaler to dynamically scale worker nodes in minutes to keep up with Joshua’s queue of test requests.

With the integration of Amazon EKS and Amazon Virtual Private Cloud (Amazon VPC), Snowflake is able to control access to the required resources. For example: the database that serves Joshua’s test queues is external to the EKS cluster. By using the Amazon VPC CNI plugin, each pod receives an IP address in the VPC and Snowflake can control access to the test queue via security groups.

To achieve its desired performance, Snowflake created its own custom pod scaler, which responds quicker to changes than using a custom metric for pod scheduling.

  • The agent scaler is responsible for monitoring a test queue in the coordination database (which, coincidentally, is also FDB) to schedule Joshua agents. The agent scaler communicates directly with Amazon EKS using the Kubernetes API to schedule tests in parallel.
  • Joshua agents (one agent per pod) are responsible for pulling tests from the test queue, executing, and reporting results. Tests are run one at a time within the EKS Cluster until the test queue is drained.

Achieving scale and cost savings with Amazon EC2 Spot

A Spot Fleet is a collection—or fleet—of Amazon EC2 Spot Instances that Joshua uses to make the infrastructure more reliable and cost effective. Spot Fleet is used to reduce the cost of worker nodes by running a variety of instance types.

With Spot Fleet, Snowflake requests a combination of different instance types to help ensure that demand gets fulfilled. These options make Fleet more tolerant of surges in demand for instance types. If a surge occurs it will not significantly affect tasks since Joshua is agnostic to the type of instance and can fall back to a different instance type and still be available.

For its Spot requests, Snowflake uses the capacity-optimized allocation strategy to automatically launch Spot Instances into the most available pools by looking at real-time capacity data and predicting which are the most available. This helps Snowflake quickly shift to whichever instance types are most available in the Spot market, instead of spending time contending for the cheapest instances, even at the cost of a potentially higher price.
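
Snowflake’s exact fleet configuration is internal, but a capacity-optimized Spot Fleet request across several instance types looks roughly like the following boto3 sketch (the fleet role, launch template, and instance types are placeholders):

import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# Spot Fleet request mixing several instance types; Spot capacity is
# launched into the deepest pools via the capacity-optimized strategy
ec2.request_spot_fleet(
    SpotFleetRequestConfig={
        "IamFleetRole": "arn:aws:iam::111111111111:role/aws-ec2-spot-fleet-tagging-role",
        "AllocationStrategy": "capacityOptimized",
        "TargetCapacity": 100,
        "LaunchTemplateConfigs": [{
            "LaunchTemplateSpecification": {
                "LaunchTemplateId": "lt-0workernodeexample",  # placeholder launch template
                "Version": "$Latest",
            },
            "Overrides": [
                {"InstanceType": "m5.2xlarge"},
                {"InstanceType": "m5a.2xlarge"},
                {"InstanceType": "m4.2xlarge"},
            ],
        }],
    },
)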

Overcoming hurdles

Snowflake’s usage of a public container registry posed a scalability challenge. When starting hundreds of worker nodes, each node needs to pull images from the public registry. This can lead to a potential rate limiting issue when all outbound traffic goes through a NAT gateway.

For example, consider 1,000 nodes pulling a 10 GB image. Each pull request requires each node to download the image across the public internet. Issues that need to be addressed include latency, reliability, and increased costs due to the additional time to download an image for each test. Also, container registries can become unavailable or may rate-limit download requests. Lastly, because images are pulled over the public internet, other services in the cluster can experience pulling issues.

For anything more than a minimal workload, a local container registry is needed. If an image is first pulled from the public registry and then pushed to a local registry (cache), it only needs to be pulled once from the public registry; after that, all worker nodes benefit from a local pull. That’s why Snowflake decided to replicate images to ECR, a fully managed Docker container registry, providing a reliable local registry to store images. An additional benefit of the local registry is that it’s not exclusive to Joshua; all platform components required for Snowflake clusters can be cached in the local ECR registry. For additional security and performance, Snowflake uses AWS PrivateLink to keep all network traffic from ECR to the worker nodes within the AWS network. It also resolved rate-limiting issues from pulling images from a public registry with unauthenticated requests, unblocking other cluster nodes from pulling critical images for operation.

Conclusion

Project Joshua allows Snowflake to enable developers to test more scenarios without having to worry about the management of the infrastructure. Snowflake’s engineers can schedule thousands of test simulations and configurations to catch bugs faster. FDB is a key component of the Snowflake stack and Project Joshua helps make FDB more stable and resilient. Additionally, Amazon EC2 Spot has provided non-trivial cost savings to Snowflake vs. running on-demand or buying reserved instances.

If you want to know more about how Snowflake built its high performance data warehouse as a Service on AWS, watch the This is My Architecture video below.

Architecting for Reliable Scalability

Post Syndicated from Marwan Al Shawi original https://aws.amazon.com/blogs/architecture/architecting-for-reliable-scalability/

Cloud solutions architects should ideally “build today with tomorrow in mind,” meaning their solutions need to cater to current scale requirements as well as the anticipated growth of the solution. This growth can be either the organic growth of a solution or it could be related to a merger and acquisition type of scenario, where its size is increased dramatically within a short period of time.

Still, when a solution scales, many architects experience added complexity to the overall architecture in terms of its manageability, performance, security, etc. By architecting your solution or application to scale reliably, you can avoid the introduction of additional complexity, degraded performance, or reduced security as a result of scaling.

Generally, a solution or service’s reliability is influenced by its uptime, performance, security, manageability, etc. In order to achieve reliability in the context of scale, take into consideration the following primary design principles.

Modularity

Modularity aims to break a complex component or solution into smaller parts that are less complicated and easier to scale, secure, and manage.

Figure 1: Monolithic architecture vs. modular architecture

Modular design is commonly used in modern application development, where an application’s software is constructed of multiple, loosely coupled building blocks (functions). These functions collectively integrate through pre-defined common interfaces or APIs to form the desired application functionality (commonly referred to as microservices architecture).

 

Figure 2: Scalable modular applications

For more details about building highly scalable and reliable workloads using a microservices architecture, refer to Design Your Workload Service Architecture.

This design principle can also be applied to different components of the solution’s architecture. For example, when building a cloud solution on a single Amazon VPC, it may reach certain scaling limits and make it harder to introduce changes at scale due to the higher level of dependencies. This single complex VPC can be divided into multiple smaller and simpler VPCs. The architecture based on multiple VPCs can vary. For example, the VPCs can be divided based on a service or application building block, a specific function of the application, or on organizational functions like a VPC for various departments. This principle can also be leveraged at a regional level for very high scale global architectures. You can make the architecture modular at a global level by distributing the multiple VPCs across different AWS Regions to achieve global scale (facilitated by AWS Global Infrastructure).

In addition, modularity promotes separation of concerns by having well-defined boundaries among the different components of the architecture. As a result, each component can be managed, secured, and scaled independently. Also, it helps you avoid what is commonly known as “fate sharing,” where a vertically scaled server hosts a monolithic application, and any failure to this server will impact the entire application.

Horizontal scaling

Horizontal scaling, commonly referred to as scale-out, is the capability to automatically add systems/instances in a distributed manner in order to handle an increase in load. Examples of this increase in load could be the increase of number of sessions to a web application. With horizontal scaling, the load is distributed across multiple instances. By distributing these instances across Availability Zones, horizontal scaling not only increases performance, but also improves the overall reliability.

In order for the application to work seamlessly in a scale-out distributed manner, the application needs to be designed to support a stateless scaling model, where the application’s state information is stored and requested independently from the application’s instances. This makes the on-demand horizontal scaling easier to achieve and manage.

This principle can be complemented with a modularity design principle, in which the scaling model can be applied to certain component(s) or microservice(s) of the application stack. For example, scale out only the Amazon Elastic Compute Cloud (EC2) front-end web instances that reside behind an Elastic Load Balancing (ELB) layer with Auto Scaling groups. In contrast, this elastic horizontal scalability might be very difficult to achieve for a monolithic type of application.

Leverage the content delivery network

Leveraging Amazon CloudFront and its edge locations as part of the solution architecture can enable your application or service to scale rapidly and reliably at a global level, without adding any complexity to the solution. The integration of a CDN can take different forms depending on the solution use case.

For example, CloudFront played an important role to enable the scale required throughout Amazon Prime Day 2020 by serving up web and streamed content to a worldwide audience, which handled over 280 million HTTP requests per minute.

Go serverless where possible

As discussed earlier in this post, modular architectures based on microservices reduce the complexity of the individual component or microservice. At scale it may introduce a different type of complexity related to the number of these independent components (microservices). This is where serverless services can help to reduce such complexity reliably and at scale. With this design model you no longer have to provision, manually scale, maintain servers, operating systems, or runtimes to run your applications.

For example, you may consider using a microservices architecture to modernize an application at the same time to simplify the architecture at scale using Amazon Elastic Kubernetes Service (EKS) with AWS Fargate.

Figure 3: Example of a serverless microservices architecture

In addition, an event-driven serverless capability like AWS Lambda is key in today’s modern scalable cloud solutions, as it handles running and scaling your code reliably and efficiently. See How to Design Your Serverless Apps for Massive Scale and 10 Things Serverless Architects Should Know for more information.

Secure by design

To avoid major changes at a later stage to accommodate security requirements, it’s essential that security is taken into consideration as part of the initial solution design. For example, if a cloud project is new or small and security isn’t considered properly at the initial stages, then once the solution starts to scale, redesigning the entire project from scratch to accommodate security best practices is usually not a simple option. This may lead to adopting suboptimal security solutions that impact the scale you want to achieve. By leveraging a CDN as part of the solution architecture (as discussed above), using Amazon CloudFront, you can minimize the impact of distributed denial of service (DDoS) attacks as well as perform application layer filtering at the edge. Also, when considering serverless services and the Shared Responsibility Model, from a security lens you can delegate a considerable part of the application stack to AWS so that you can focus on building applications. See The Shared Responsibility Model for AWS Lambda.

Design with security in mind by incorporating the necessary security services as part of the initial cloud solution. This will allow you to add more security capabilities and features as the solution grows, without the need to make major changes to the design.

Design for failure

The reliability of a service or solution in the cloud depends on multiple factors, the primary of which is resiliency. This design principle becomes even more critical at scale because the failure impact magnitude typically will be higher. Therefore, to achieve reliable scalability, it is essential to design a resilient solution, capable of recovering from infrastructure or service disruptions. This principle involves designing the overall solution in such a way that even if one or more of its components fail, the solution is still capable of providing an acceptable level of its expected function(s). See AWS Well-Architected Framework – Reliability Pillar for more information.

Conclusion

Designing for scale alone is not enough. Reliable scalability should always be the targeted architectural attribute. The design principles discussed in this blog act as the foundational pillars to support it, and ideally should be combined with adopting a DevOps model.

Mercado Libre: How to Block Malicious Traffic in a Dynamic Environment

Post Syndicated from Gaston Ansaldo original https://aws.amazon.com/blogs/architecture/mercado-libre-how-to-block-malicious-traffic-in-a-dynamic-environment/

Blog post contributors: Pablo Garbossa and Federico Alliani of Mercado Libre

Introduction

Mercado Libre (MELI) is the leading e-commerce and FinTech company in Latin America. We have a presence in 18 countries across Latin America, and our mission is to democratize commerce and payments to impact the development of the region.

We manage an ecosystem of more than 8,000 custom-built applications that process an average of 2.2 million requests per second. To support this demand, we run between 50,000 and 80,000 Amazon Elastic Compute Cloud (EC2) instances, and our infrastructure scales in and out according to the time of the day, thanks to the elasticity of the AWS cloud and its auto scaling features.

Mercado Libre

As a company, we expect our developers to devote their time and energy to building the apps and features that our customers demand, without having to worry about the underlying infrastructure that the apps are built upon. To achieve this separation of concerns, we built Fury, our platform as a service (PaaS) that provides an abstraction layer between our developers and the infrastructure. Each time a developer deploys a brand new application or a new version of an existing one, Fury takes care of creating all the required components such as Amazon Virtual Private Cloud (VPC), Amazon Elastic Load Balancing (ELB), Amazon EC2 Auto Scaling groups (ASGs), and EC2 instances. Fury also manages a per-application Git repository, a CI/CD pipeline with different deployment strategies, such as blue-green and rolling upgrades, and transparent application log and metric collection.

Fury- MELI PaaS

For those of us on the Cloud Security team, Fury represents an opportunity to enforce critical security controls across our stack in a way that’s transparent to our developers. For instance, we can dictate what Amazon Machine Images (AMIs) are vetted for use in production (such as those that align with the Center for Internet Security benchmarks). If needed, we can apply security patches across all of our fleet from a centralized location in a very scalable fashion.

But there are also other attack vectors that every organization with a presence on the public internet is exposed to. The recent AWS Threat Landscape Report shows a 23% YoY increase in the total number of Denial of Service (DoS) events. It’s evident that organizations need to be prepared to react quickly under these circumstances.

The variety and the number of attacks are increasing, testing the resilience of all types of organizations. This is why we started working on a solution that allows us to contain application DoS attacks, and complements our perimeter security strategy, which is based on services such as AWS Shield and AWS Web Application Firewall (WAF). In this article, we will walk you through the solution we built to automatically detect and block these events.

The strategy we implemented for our solution, Network Behavior Anomaly Detection (NBAD), consists of four stages that we repeatedly execute:

  1. Analyze the execution context of our applications, like CPU and memory usage
  2. Learn their behavior
  3. Detect anomalies, gather relevant information and process it
  4. Respond automatically

Step 1: Establish a baseline for each application

End user traffic enters through different AWS CloudFront distributions that route to multiple Elastic Load Balancers (ELBs). Behind the ELBs, we operate a fleet of NGINX servers from where we connect back to the myriad of applications that our developers create via Fury.

Step 1: MELI Architecture – Anomaly detection project

We collect logs and metrics for each application and ship them to Amazon Simple Storage Service (S3) and Datadog. We then partition these logs using AWS Glue to make them available for consumption via Amazon Athena. On average, we send 3 terabytes (TB) of log files in Parquet format to S3.

Based on this information, we developed processes that we complement with commercial solutions, such as Datadog’s Anomaly Detection, which allows us to learn the normal behavior or baseline of our applications and project expected adaptive growth thresholds for each one of them.

Anomaly detection

Step 2: Anomaly detection

When any of our apps receives a number of requests that fall outside the limits set by our anomaly detection algorithms, an Amazon Simple Notification Service (SNS) event is emitted, which triggers a workflow in the Anomaly Analyzer, a custom-built component of this solution.

Upon receiving such an event, the Anomaly Analyzer starts composing the so-called event context. In parallel, the Data Extractor retrieves vital insights via Athena from the log files stored in S3.

The output of this process is used as the input for the data enrichment process. This is responsible for consulting different threat intelligence sources that are used to further augment the analysis and determine if the event is an actual incident or not.

At this point, we build the context that allows us not only to have greater certainty when calculating the score, but also to validate and act more quickly. This context includes:

  • Application’s owner
  • Affected business metrics
  • Error handling statistics of our applications
  • Reputation of IP addresses and associated users
  • Use of unexpected URL parameters
  • Distribution by origin of the traffic that generated the event (cloud providers, geolocation, etc.)
  • Known behavior patterns of vulnerability discovery or exploitation

Step 2: MELI Architecture – Anomaly detection project

Step 3: Incident response

Once we reconstruct the context of the event, we calculate a score for each “suspicious actor” involved.

Step 3: MELI Architecture – Anomaly detection project

Based on these analysis results we carry out a series of verifications in order to rule out false positives. Finally, we execute different actions based on the following criteria:

Manual review

If the outcome of the automatic analysis results in a medium risk scoring, we activate a manual review process:

  1. We send a report to the application’s owners with a summary of the context. Based on their understanding of the business, they can activate the Incident Response Team (IRT) on-call and/or provide feedback that allows us to improve our automatic rules.
  2. In parallel, our threat analysis team receives and processes the event. They are equipped with tools that allow them to add IP addresses, user-agents, referrers, or regular expressions into AWS WAF to carry out temporary blocking of “bad actors” in situations where the attack is in progress.

Automatic response

If the analysis results in a high risk score, an automatic containment process is triggered. The event is sent to our block API, which is responsible for adding a temporary rule designed to mitigate the attack in progress. Behind the scenes, our block API leverages AWS WAF to create IP sets. We reference these IP sets from our custom rule groups in our web ACLs in order to block IPs that source the malicious traffic. We found many benefits in the new release of AWS WAF, such as support for AWS Managed Rules, larger capacity units per web ACL, and an easier-to-use API.
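
MELI’s block API is internal, but the underlying AWS WAF calls it relies on look roughly like this boto3 sketch: fetch the IP set and its lock token, append the offending address, and write the set back (the IP set name, ID, and address are placeholders):

import boto3

wafv2 = boto3.client("wafv2", region_name="us-east-1")

ip_set_name = "temporary-blocks"                      # placeholder IP set name
ip_set_id = "11111111-2222-3333-4444-555555555555"    # placeholder IP set ID

# Read the current addresses and the lock token required for updates
current = wafv2.get_ip_set(Name=ip_set_name, Scope="REGIONAL", Id=ip_set_id)
addresses = set(current["IPSet"]["Addresses"])
addresses.add("203.0.113.7/32")                       # suspicious actor identified by the analyzer

# Write back the updated set; the referencing web ACL rule starts blocking it
wafv2.update_ip_set(
    Name=ip_set_name,
    Scope="REGIONAL",
    Id=ip_set_id,
    Addresses=sorted(addresses),
    LockToken=current["LockToken"],
)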

Conclusion

By leveraging the AWS platform and its powerful APIs, and together with the AWS WAF service team and solutions architects, we were able to build an automated incident response solution that identifies and blocks malicious actors with minimal operator intervention. Since launching the solution, we have reduced YoY application downtime by over 92%, even though the time under attack increased more than 10x. This has had a positive impact on our users and therefore, on our business.

Not only was our downtime drastically reduced, but we also cut the number of manual interventions during this type of incident by 65%.

We plan to iterate over this solution to further reduce false positives in our detection mechanisms as well as the time to respond to external threats.

About the authors

Pablo Garbossa is an Information Security Manager at Mercado Libre. His main duties include ensuring security in the software development life cycle and managing security in MELI’s cloud environment. Pablo is also an active member of the Open Web Application Security Project® (OWASP) Buenos Aires chapter, a nonprofit foundation that works to improve the security of software.

Federico Alliani is a Security Engineer on the Mercado Libre Monitoring team. Federico and his team are in charge of protecting the site against different types of attacks. He loves to dive deep into big architectures to drive performance, scale operational efficiency, and increase the speed of detection and response to security events.

Reduce Cost and Increase Security with Amazon VPC Endpoints

Post Syndicated from Nigel Harris original https://aws.amazon.com/blogs/architecture/reduce-cost-and-increase-security-with-amazon-vpc-endpoints/

Introduction

This blog explains the benefits of using Amazon VPC endpoints and highlights a self-paced workshop that will help you to learn more about them. Amazon Virtual Private Cloud (Amazon VPC) enables you to launch AWS resources into a virtual network that you’ve defined. This virtual network resembles a traditional network that you’d operate in your own data center, with the benefits of using the scalable infrastructure of AWS.

A VPC endpoint allows you to privately connect your VPC to supported AWS services without requiring an Internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. Endpoints are virtual devices that are horizontally scaled, redundant, and highly available VPC components. They allow communication between instances in your VPC and services without imposing availability risks or bandwidth constraints on your network traffic.

VPC endpoints enable you to reduce data transfer charges resulting from network communication between private VPC resources (such as Amazon Elastic Compute Cloud, or EC2, instances) and AWS services (such as Amazon Quantum Ledger Database, or QLDB). Without VPC endpoints configured, communications that originate from within a VPC destined for public AWS services must egress AWS to the public Internet in order to access those services. This network path incurs outbound data transfer charges. Data transfer charges for traffic egressing from Amazon EC2 to the Internet vary based on volume. However, at the time of writing, after the first 1 GB per month ($0.00 per GB), transfers are charged at a rate of $0.09/GB (for AWS US-East-1 Virginia). With VPC endpoints configured, communication between your VPC and the associated AWS service does not leave the Amazon network. If your workload requires you to transfer significant volumes of data between your VPC and AWS, you can reduce costs by leveraging VPC endpoints.

There are two types of VPC endpoints: interface endpoints and gateway endpoints. Amazon Simple Storage Service (S3) and Amazon DynamoDB are accessed using gateway endpoints. You can configure resource policies on both the gateway endpoint and the AWS resource that the endpoint provides access to. A VPC endpoint policy is an AWS Identity and Access Management (AWS IAM) resource policy that you can attach to an endpoint. It is a separate policy for controlling access from the endpoint to the specified service. This enables granular access control and private network connectivity from within a VPC. For example, you could create a policy that restricts access to a specific DynamoDB table through a VPC endpoint.
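
As an illustration of that last point, here is a hedged boto3 sketch that creates a gateway endpoint for DynamoDB with an endpoint policy allowing access to a single table only (the VPC ID, route table ID, and table ARN are placeholders):

import json
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Endpoint policy that only allows access to one DynamoDB table (placeholder ARN)
endpoint_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "dynamodb:*",
        "Resource": "arn:aws:dynamodb:us-east-1:111111111111:table/orders",
    }],
}

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc1234example",
    ServiceName="com.amazonaws.us-east-1.dynamodb",
    RouteTableIds=["rtb-0privateexample"],
    PolicyDocument=json.dumps(endpoint_policy),
)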

Figure 1: Accessing S3 via a Gateway VPC Endpoint

Interface endpoints enable you to connect to services powered by AWS PrivateLink. This includes a large number of AWS services, services hosted by other AWS customers and partners in their own VPCs, and supported AWS Marketplace partner services. Like gateway endpoints, interface endpoints can be secured using resource policies on the endpoint itself and the resource that the endpoint provides access to. Interface endpoints enable the use of security groups to restrict access to the endpoint.

Figure 2: Accessing QLDB via an Interface VPC Endpoint

In larger multi-account AWS environments, network design can vary considerably. Consider an organization that has built a hub-and-spoke network with AWS Transit Gateway. VPCs have been provisioned into multiple AWS accounts, perhaps to facilitate network isolation or to enable delegated network administration. When deploying distributed architectures such as this, a popular approach is to build a “shared services” VPC, which provides access to services required by workloads in each of the VPCs. This might include directory services or VPC endpoints. Sharing resources from a central location instead of building them in each VPC may reduce administrative overhead and cost. This approach was outlined by my colleague Bhavin Desai in his blog post Centralized DNS management of hybrid cloud with Amazon Route 53 and AWS Transit Gateway.

Figure 3: Centralized VPC Endpoints (multiple VPCs)

Alternatively, an organization may have centralized its network and chosen to leverage VPC sharing to enable multiple AWS accounts to create application resources (such as Amazon EC2 instances, Amazon Relational Database Service (RDS) databases, and AWS Lambda functions) into a shared, centrally managed network. With either pattern, establishing a granular set of controls to limit access to resources can be critical to support organizational security and compliance objectives while maintaining operational efficiency.

Figure 4: Centralized VPC Endpoints (shared VPC)

Learn how with the VPC Endpoint Workshop

Understanding how to appropriately restrict access to endpoints and the services they provide connectivity to is an often-misunderstood topic. I recently authored a hands-on workshop to help customers learn how to provision appropriate levels of access. Continue to learn about Amazon VPC Endpoints by taking the VPC Endpoint Workshop and then improve the security posture of your cloud workloads by leveraging network controls and VPC endpoint policies to manage access to your AWS resources.

Using AWS Lambda IAM condition keys for VPC settings

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/using-aws-lambda-iam-condition-keys-for-vpc-settings/

You can now control the Amazon Virtual Private Cloud (VPC) settings for your AWS Lambda functions using AWS Identity and Access Management (IAM) condition keys. IAM condition keys enable you to further refine the conditions under which an IAM policy statement applies. You can use the new condition keys in IAM policies when granting permissions to create and update functions.

The three new condition keys for VPC settings are lambda:VpcIds, lambda:SubnetIds, and lambda:SecurityGroupIds. The keys allow you to ensure that users can only deploy functions connected to one or more allowed VPCs, subnets, and security groups. If users try to create or update a function with VPC settings that are not allowed, Lambda rejects the operation.

Understanding Lambda and VPCs

All of the Lambda compute infrastructure runs inside VPCs owned by the Lambda service. Lambda functions can only be invoked by calling the Lambda API. There is no direct network access to the execution environment where your functions run.

Non-VPC connected Lambda functions

When your Lambda function is not configured to connect to your own VPCs, the function can access anything available on the public internet. This includes other AWS services, HTTPS endpoints for APIs, or services and endpoints outside AWS. The function cannot directly connect to your private resources inside of your VPC.

VPC connected Lambda functions

You can configure a Lambda function to connect to private subnets in a VPC in your account. When a Lambda function is configured to use a VPC, the Lambda function still runs inside the AWS Lambda service VPC. The function then sends all network traffic through your VPC and abides by your VPC’s network controls. You can use these controls to define where your functions can connect using security groups and network ACLs. Function egress traffic comes from your own network address space, and you have network visibility using VPC flow logs.

You can restrict access to network locations, including the public internet. A Lambda function connected to a VPC has no internet access by default. To give your function access to the internet, you can route outbound traffic to a network address translation (NAT) gateway in a public subnet.

When you configure your Lambda function to connect to your own VPC, it uses a shared elastic network interface (ENI) managed by AWS Hyperplane. The connection creates a VPC-to-VPC NAT and does a cross-account attachment, which allows network access from your Lambda functions to your private resources.

AWS Lambda service VPC with VPC-to-VPC NAT to customer VPC

The Hyperplane ENI is a managed network interface resource that the Lambda service controls and sits in your VPC inside of your account. Multiple execution environments share the ENI to securely access resources inside of a VPC in your account. You still do not have direct network access to the execution environment.

When are ENIs created?

The network interface creation happens when your Lambda function is created or its VPC settings are updated. When a function is invoked, the execution environment uses the pre-created network interface and quickly establishes a network tunnel to it. This reduces the latency that was previously associated with creating and attaching a network interface during a cold start.

How many ENIs are required?

Because the network interfaces are shared across execution environments, typically only a handful of network interfaces are required per function. Every unique security group:subnet combination across functions in your account requires a distinct network interface. If multiple functions in the same account use the same security group:subnet pairing, it reuses the same network interface. This way, a single application with multiple functions but the same network and security configuration can benefit from the existing interface configuration.

Your function scaling is no longer directly tied to the number of network interfaces. Hyperplane ENIs can scale to support large numbers of concurrent function executions.

If your functions are not active for a long period of time, Lambda reclaims their network interfaces, and the functions become idle and inactive. You must invoke an idle function to reactivate it. The first invocation fails, and the function enters a pending state until the network interface is available again.

Using the new Lambda condition keys for VPC settings

With the new VPC condition key settings, you can specify one or more required VPCs, subnets, and security groups. The lambda:VpcIds value is inferred from the subnets and security groups that the CreateFunction API caller provides.

The condition syntax is in the format "Condition":{"{condition-operator}":{"{condition-key}":"{condition-value}"}}. You can use condition operators with multiple keys and values to construct policy documents.

I have a private VPC configured with the following four subnets:

Private VPC subnets

I have a MySQL database instance running in my private VPC. The instance is running in us-east-1b in subnet subnet-046c0d0c487b0515b with a failover in us-east-1c in subnet subnet-091e180fa55fb8e83. I have an associated security group sg-0a56588b3406ee3d3 allowing access to the database. As this is a private subnet, I don’t allow internet access.

I want to ensure that any Lambda functions I create with my account must only connect to my private VPC.

  1. I create the following IAM policy document, which I attach to my account. It uses a Deny effect with a ForAllValues:StringNotEquals condition operator to specify a required VpcId.
  2. {
        "Version": "2012-10-17",
        "Statement": [
    		{
    			"Sid": "Stmt159186333251",
    			"Action": ["lambda:CreateFunction","lambda:UpdateFunctionConfiguration"],
    			"Effect": "Deny",
    			"Resource": "*",
    			"Condition": {"ForAllValues:StringNotEquals": {"lambda:VpcIds":["vpc-0eebf3d0fe63a2db1"]}}
    		}
        ]
    }
    
  3. I attempt to create a Lambda function that does not connect to my VPC by excluding --vpc-config in the API call.
  4. aws lambda create-function --function-name MyVPCLambda1 \
      --runtime python3.7 --handler helloworld.handler --zip-file fileb://vpccondition.zip \
      --region us-east-1 --role arn:aws:iam::123456789012:role/VPCConditionLambdaRole
    
  5. I receive an AccessDeniedException error with an explicit deny:
  6. Lambda function creation AccessDeniedException

  7. I attempt to create the Lambda function again and include any one of the subnets in my VPC, along with the security group. I must include both the SubnetIds and SecurityGroupIds values with the --vpc-config.
aws lambda create-function --function-name MyVPCLambda1 \
  --vpc-config "SubnetIds=['subnet-019c87c9b67742a8f'],SecurityGroupIds=['sg-0a56588b3406ee3d3']" \
  --runtime python3.7 --handler helloworld.handler --zip-file fileb://vpccondition.zip \
  --region us-east-1 --role arn:aws:iam::123456789012:role/VPCConditionLambdaRole

The function is created successfully.

Successfully created Lambda function connected to VPC

I also want to ensure that any Lambda functions created in my account must have the following in the configuration:

  • My private VPC
  • Both subnets containing my database instances
  • The security group including the MySQL database instance
  1. I amend my account IAM policy document to include restrictions for SubnetIds and SecurityGroupIds. I do not need to specify VpcIds as this is inferred.
  2. {
        "Version": "2012-10-17",
        "Statement": [
    		{
    			"Sid": "Stmt159186333252",
    			"Action": ["lambda:CreateFunction","lambda:UpdateFunctionConfiguration"],
    			"Effect": "Deny",
    			"Resource": "*",
    			"Condition": {"ForAllValues:StringNotEquals": {"lambda:SubnetIds": ["subnet-046c0d0c487b0515b","subnet-091e180fa55fb8e83"]}}
    		},
    		{
    			"Sid": "Stmt159186333253",
    			"Action": ["lambda:CreateFunction","lambda:UpdateFunctionConfiguration"],
    			"Effect": "Deny",
    			"Resource": "*",
    			"Condition": {"ForAllValues:StringNotEquals": {"lambda:SecurityGroupIds": ["sg-0a56588b3406ee3d3"]}}
    		}
        ]
    }
    
  3. I try to create another Lambda function, using --vpc-config values with a subnet in my VPC that’s not in the allowed permission list, along with the security group.
  4. aws lambda create-function --function-name MyVPCLambda2 \
      --vpc-config "SubnetIds=['subnet-019c87c9b67742a8f'],SecurityGroupIds=['sg-0a56588b3406ee3d3']" \
      --runtime python3.7 --handler helloworld.handler --zip-file fileb://vpccondition.zip \
      --region us-east-1 --role arn:aws:iam::123456789012:role/VPCConditionLambdaRole
    

    I receive an AccessDeniedException error.

  5. I retry, specifying both valid and allowed SubnetIds and SecurityGroupIds:
aws lambda create-function --function-name MyVPCLambda2 \
  --vpc-config "SubnetIds=['subnet-046c0d0c487b0515b','subnet-091e180fa55fb8e83'],SecurityGroupIds=['sg-0a56588b3406ee3d3']" \
  --runtime python3.7 --handler helloworld.handler --zip-file fileb://vpccondition.zip \
  --region us-east-1 --role arn:aws:iam::123456789012:role/VPCConditionLambdaRole

The function creation is successful.

Successfully created Lambda function connected to specific subnets and security groups

With these settings in place, I can ensure that Lambda functions in my account are only created with the allowed VPC network security settings.

Updating Lambda functions

When updating Lambda function configuration, you do not need to specify the VPC settings if they already exist. Lambda checks the existing VPC settings before making the authorization call to IAM.

The following command, which adds more memory to the Lambda function without specifying the VPC configuration, succeeds because the configuration already exists.

aws lambda update-function-configuration --function-name MyVPCLambda2 --memory-size 512

Lambda layer condition keys

Lambda also has another existing condition key – lambda:Layer.

Lambda layers allow you to share code and content between multiple Lambda functions, or even multiple applications.

The lambda:Layer condition key allows you to enforce that a function must include a particular layer or an allowed group of layers. You can also prevent the use of layers altogether, or limit layers to those published by your own accounts, blocking layers published by accounts that are not yours.
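
As an illustration, a policy along the following lines grants permission to create or update functions only when every attached layer is published by one account. This is a sketch: the account ID is a placeholder, it assumes no broader allow exists elsewhere, and the exact wildcard pattern should be checked against the Lambda documentation.

# Allow function creation/updates only when all layers come from account
# 123456789012 (placeholder). Sketch only.
cat > layer-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ConfigureFunctionsWithApprovedLayers",
      "Effect": "Allow",
      "Action": ["lambda:CreateFunction", "lambda:UpdateFunctionConfiguration"],
      "Resource": "*",
      "Condition": {
        "ForAllValues:StringLike": {
          "lambda:Layer": ["arn:aws:lambda:*:123456789012:layer:*:*"]
        }
      }
    }
  ]
}
EOF

aws iam create-policy \
  --policy-name ApprovedLayersOnly \
  --policy-document file://layer-policy.json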

Conclusion

You can now control the VPC settings for your Lambda functions using IAM condition keys.

The new VPC setting condition keys are available in all AWS Regions where Lambda is available. To learn more about the new condition keys and view policy examples, see “Using IAM condition keys for VPC settings” and “Resource and Conditions for Lambda actions” in the Lambda Developer Guide. To learn more about using IAM condition keys, see “IAM JSON Policy Elements: Condition” in the IAM User Guide.

Building well-architected serverless applications: Controlling serverless API access – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-controlling-serverless-api-access-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the nine serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the Introduction post for a table of contents and explanation of the example application.

Security question SEC1: How do you control access to your serverless API?

Use authentication and authorization mechanisms to prevent unauthorized access, and enforce quota for public resources. By controlling access to your API, you can help protect against unauthorized access and prevent unnecessary use of resources.

AWS has a number of services to provide API endpoints including Amazon API Gateway and AWS AppSync.

Use Amazon API Gateway for RESTful and WebSocket APIs. Here is an example serverless web application architecture using API Gateway.

Example serverless application architecture using API Gateway

Use AWS AppSync for managed GraphQL APIs.

AWS AppSync overview diagram

The serverless airline example in this series uses AWS AppSync to provide the frontend, user-facing public API. The application also uses API Gateway to provide backend, internal, private REST APIs for the loyalty and payment services.

Good practice: Use an authentication and an authorization mechanism

Authentication and authorization are mechanisms for controlling and managing access to a resource. In this well-architected question, that is a serverless API. Authentication is verifying who a client or user is. Authorization is deciding whether they have the permission to access a resource. By enforcing authorization, you can prevent unauthorized access to your workload from non-authenticated users.

Integrate with an identity provider that can validate your API consumer’s identity. An identity provider is a system that provides user authentication as a service. The identity provider may use the XML-based Security Assertion Markup Language (SAML) or JSON Web Tokens (JWT) for authentication. It may also federate with other identity management systems. JWT is an open standard that defines a way to securely transmit information between parties as a JSON object. JWTs are used with frameworks such as OAuth 2.0 for authorization and OpenID Connect (OIDC), which builds on OAuth 2.0 and adds authentication.

Only authorize access to consumers that have successfully authenticated. Use an identity provider rather than API keys as a primary authorization method. API keys are more suited to rate limiting and throttling.

Evaluate authorization mechanisms

Use AWS Identity and Access Management (IAM) for authorizing access to internal or private API consumers, or other AWS Managed Services like AWS Lambda.

For public, user facing web applications, API Gateway accepts JWT authorizers for authenticating consumers. You can use either Amazon Cognito or OpenID Connect (OIDC).

App client authenticates and gets tokens

For custom authorization needs, you can use Lambda authorizers.

A Lambda authorizer (previously called a custom authorizer) is an AWS Lambda function which API Gateway calls for an authorization check when a client makes a request to an API method. This means you do not have to write custom authorization logic in a function behind an API. The Lambda authorizer function can validate a bearer token such as JWT, OAuth, or SAML, or request parameters and grant access. Lambda authorizers can be used when using an identity provider other than Amazon Cognito or AWS IAM, or when you require additional authorization customization.

Lambda authorizers

For more information, see the AWS Hero blog post, “The Complete Guide to Custom Authorizers with AWS Lambda and API Gateway”.

The AWS documentation also has a useful section on “Understanding Lambda Authorizers Auth Workflow with Amazon API Gateway”.

Enforce authorization for non-public resources within your API

Within API Gateway, you can enable native authorization for users authenticated using Amazon Cognito or AWS IAM. For authorizing users authenticated by other identity providers, use Lambda authorizers.

For example, within the serverless airline, the loyalty service uses a Lambda function to fetch loyalty points and next tier progress. AWS AppSync acts as the client using an HTTP resolver, via an API Gateway REST API /loyalty/{customerId}/get resource, to invoke the function.

To ensure only AWS AppSync is authorized to invoke the API, IAM authorization is set within the API Gateway method request.

Viewing API Gateway IAM authorization

The serverless airline uses the AWS Serverless Application Model (AWS SAM) to deploy the backend infrastructure as code. This makes it easier to know which IAM role has access to the API. One of the benefits of using infrastructure as code is visibility into all deployed application resources, including IAM roles.

The loyalty service AWS SAM template contains the AppsyncLoyaltyRestApiIamRole.

  AppsyncLoyaltyRestApiIamRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: appsync.amazonaws.com
            Action: sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: LoyaltyApiInvoke
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - execute-api:Invoke
                # arn:aws:execute-api:region:account-id:api-id/stage/METHOD_HTTP_VERB/Resource-path
                Resource: !Sub arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*

The IAM role specifies that appsync.amazonaws.com can perform an execute-api:Invoke on the specific API Gateway resource arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*

Within AWS AppSync, you can enable native authorization for users authenticating using Amazon Cognito or AWS IAM. You can also use any external identity provider compliant with OpenID Connect (OIDC).

Improvement plan summary:

  1. Evaluate authorization mechanisms.
  2. Enforce authorization for non-public resources within your API.

Required practice: Use appropriate endpoint type and mechanisms to secure access to your API

APIs may have public or private endpoints. Consider public endpoints to serve consumers where they may not be part of your network perimeter. Consider private endpoints to serve consumers within your network perimeter where you may not want to expose the API publicly. Public and private endpoints may have different levels of security.

Determine your API consumer and choose an API endpoint type

For providing public content, use Amazon API Gateway or AWS AppSync public endpoints.

For providing content with restricted access, use Amazon API Gateway with authorization to specific resources, methods, and actions you want to restrict. For example, the serverless airline application uses AWS IAM to restrict access to the private loyalty API so only AWS AppSync can call it.

With AWS AppSync providing a GraphQL API, restrict access to specific data types, data fields, queries, mutations, or subscriptions.

You can create API Gateway private REST APIs that you can only access from your Amazon Virtual Private Cloud (VPC) by using an interface VPC endpoint.

API Gateway private endpoints

For more information, see “Choose an endpoint type to set up for an API Gateway API”.

Implement security mechanisms appropriate to your API endpoint

With Amazon API Gateway and AWS AppSync, for both public and private endpoints, there are a number of mechanisms for access control.

For providing content with restricted access, API Gateway REST APIs support native authorization using AWS IAM, Amazon Cognito user pools, and Lambda authorizers. Amazon Cognito user pools are a feature that provides a managed user directory for authentication. For more detailed information, see the AWS Hero blog post, “Picking the correct authorization mechanism in Amazon API Gateway”.

You can also use resource policies to restrict content to a specific VPC, VPC endpoint, a data center, or a specific AWS Account.

API Gateway resource policies are different from IAM identity policies. IAM identity policies are attached to IAM users, groups, or roles. These policies define what that identity can do on which resources. For example, in the serverless airline, the IAM role AppsyncLoyaltyRestApiIamRole specifies that appsync.amazonaws.com can perform an execute-api:Invoke on the specific API Gateway resource arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${LoyaltyApi}/*/*/*

Resource policies are attached to resources such as an Amazon S3 bucket, or an API Gateway resource or method. The policies define what identities can access the resource.

IAM access is determined by a combination of identity policies and resource policies.

For more information on the differences, see “Identity-Based Policies and Resource-Based Policies”. To see which services support resource-based policies, see “AWS Services That Work with IAM”.
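
To make this concrete, here is a sketch of a resource policy that only allows a private REST API to be invoked through one interface VPC endpoint, attached at API creation time. The endpoint ID and API name are placeholders, and the aws:SourceVpce condition only applies to traffic arriving through VPC endpoints.

# Resource policy allowing invocations only through a specific interface VPC
# endpoint (placeholder endpoint ID).
cat > resource-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "execute-api:Invoke",
      "Resource": "execute-api:/*",
      "Condition": {
        "StringNotEquals": {
          "aws:SourceVpce": "vpce-0abc1234567890def"
        }
      }
    }
  ]
}
EOF

# Create a private REST API with the policy attached (name is a placeholder).
aws apigateway create-rest-api \
  --name my-private-api \
  --endpoint-configuration types=PRIVATE \
  --policy file://resource-policy.json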

API Gateway HTTP APIs support JWT authorizers as a part of OpenID Connect (OIDC) and OAuth 2.0 frameworks.

API Gateway WebSocket APIs support AWS IAM and Lambda authorizers.

With AWS AppSync public endpoints, you can enable authorization with the following:

  • AWS IAM
  • Amazon Cognito User pools for email and password functionality
  • Social providers (Facebook, Google+, and Login with Amazon)
  • Enterprise federation with SAML

Within the serverless airline, AWS Amplify Console hosts the public user facing site. Amplify Console provides a git-based workflow for building, deploying, and hosting serverless web applications. Amplify Console manages the hosting of the frontend assets for single page app (SPA) frameworks in addition to static websites, along with an optional serverless backend. Frontend assets are stored in S3 and the Amazon CloudFront global edge network distributes the web app globally.

The AWS Amplify CLI toolchain allows you to add backend resources using AWS CloudFormation.

Using Amplify CLI to add authentication

For the serverless airline, I use the Amplify CLI to add authentication using Amazon Cognito with the following command:

amplify add auth

When prompted, I specify the authentication parameters I require.

Amplify add auth

Amplify CLI creates a local CloudFormation template. Use the following command to deploy the updated authentication configuration to the cloud:

amplify push

Once the deployment is complete, I view the deployed authentication nested stack resources from within the CloudFormation Console. I see the Amazon Cognito user pool.

View Amplify authentication CloudFormation nested stack resources

For a more detailed walkthrough using Amplify CLI to add authentication for the serverless airline, see the build video.

For more information on Amplify CLI and authentication, see “Authentication with Amplify”.

Conclusion

To help protect against unauthorized access and prevent unnecessary use of serverless API resources, control access using authentication and authorization mechanisms.

In this post, I cover the different mechanisms for authorization available for API Gateway and AWS AppSync. I explain the different approaches for public or private endpoints and show how to use IAM to control access to internal or private API consumers. I walk through how to use the Amplify CLI to create an Amazon Cognito user pool.

This well-architected question will be continued in a future post where I continue using the Amplify CLI to add a GraphQL API. I will explain how to view JSON Web Tokens (JWT) claims, and how to use Cognito identity pools to grant temporary access to AWS services. I will also show how to use API keys and API Gateway usage plans for rate limiting and throttling requests.

Improve VPN Network Performance of AWS Hybrid Cloud with Global Accelerator

Post Syndicated from Anandprasanna Gaitonde original https://aws.amazon.com/blogs/architecture/improve-vpn-network-performance-of-aws-hybrid-cloud-with-global-accelerator/

Introduction

Connecting on-premises data centers to AWS using AWS Site-to-Site VPN to support distributed applications is a common practice. With business expansion and acquisitions, your company’s on-premises IT footprint may grow into various geographies, with these multiple sites comprising on-premises data centers and co-location facilities. AWS Site-to-Site VPN supports throughput of up to 1.25 Gbps, although the actual throughput can be lower for VPN connections that are in a different geolocation from the AWS Region. This is because the internet path between them has to traverse multiple networks. For globally distributed applications that interact with other applications and components located on-premises, these VPN connections can impact performance and user experience.

This blog post provides an architectural approach to improving the performance of such globally distributed applications. We’ll explain an architecture that utilizes AWS Global Accelerator to create highly performant connectivity in terms of latency and bandwidth for VPN connections that originate from distant geographies around the world. Using this architecture, you can optimize your inter-application traffic between remote sites and your AWS environment, which can lead to better application performance and customer experience.

Distributed application architecture in a hybrid cloud using VPN

The above figure shows a pictorial representation of a customer’s existing IT footprint spread across several locations in the U.S., Europe, and the Asia Pacific (APAC), while the AWS environment is set up in the us-east-1 Region. In this use case, a business application hosted in AWS has the following dependencies on remote data centers and is also accessed by remote corporate users:

  1. Communication with an application hosted in a data center in the EU region
  2. Communication with a data center in the US where corporate users access the AWS application over VPN
  3. Integration with local API based service in the APAC region

Site-to-Site VPN from a remote site to an AWS environment provides secure connectivity for this inter-application traffic, as well as traffic from users to the application. Sites closer to the us-east-1 region may see reasonably good network performance and latency. However, sites that are geographically remote may experience higher latencies and not-so-reliable network performance due to the number of network hops spanning multiple networks and possible congestion. In addition, varying network paths through the Internet backbone can also lead to increased latencies. This impacts the overall application performance, which can lead to an unsatisfactory customer experience.

Optimizing application performance with Accelerated VPN connections

The above diagram shows the business application hosted in a multi-VPC architecture on AWS comprising a production VPC and a sandbox VPC, typical of customer environments. These VPCs are interconnected using AWS Transit Gateway, and the VPN connections from the three remote sites terminate at AWS Transit Gateway as VPN attachments.

To improve the user experience for the application, the VPN attachments to AWS Transit Gateway are enabled with a feature called Accelerated Site-to-Site VPN. With this feature enabled, AWS Global Accelerator routes traffic from an on-premises network to the AWS Edge location closest to your customer gateway. It uses the AWS global network to route traffic through the AWS global backbone from the closest Edge location, thereby ensuring the traffic remains on the optimum network path. This translates into faster response times, increased throughput, and a better user experience, as described in this blog post about better performance for internet traffic with AWS Global Accelerator.

The Accelerated Site-to-Site VPN feature is enabled by creating accelerators that allow you to associate two Anycast static IPs from the Edge network. (Anycast is a network addressing and routing method that attributes a single IP address to multiple endpoints in a network.) These static IP addresses act as a fixed entry point to the VPN tunnel endpoints. This improves the availability and performance of your applications that need to interface with remote sites for their functionality. The above diagram shows three Edge locations, each one corresponding to the accelerators for each of the VPN connections. Since AWS Transit Gateway allows connectivity to multiple VPCs in your AWS environment, the benefit of improved network performance is extended to applications and workloads in VPCs connected to the transit gateway. This architecture scales as business demands and workloads continue to grow on AWS.

Configuring your VPN connections for the Acceleration

To make changes to your existing VPN, consider the following for enabling the acceleration:

  • If your current existing VPN connections are terminating on a VPN Gateway, you will need to create an AWS Transit Gateway and create VPC attachments from the application VPC to the Transit Gateway.
  • Existing VPN connections on Transit Gateway can’t be modified to take advantage of the acceleration, so you will need to tear down existing connections and set up new ones in the AWS console as shown below. Then, configure your customer gateway device to use the new Site-to-Site VPN connection and delete the old Site-to-Site VPN connection.

Create VPN connection

For more information and steps, see Creating a transit gateway VPN attachment.
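
If you prefer the AWS CLI to the console, an accelerated Site-to-Site VPN connection attached to a transit gateway can be created roughly as follows. The customer gateway and transit gateway IDs are placeholders, and acceleration can only be enabled when the connection is created.

# Create a new accelerated Site-to-Site VPN connection on the transit gateway
# (IDs are placeholders). Acceleration cannot be added to an existing connection.
aws ec2 create-vpn-connection \
  --type ipsec.1 \
  --customer-gateway-id cgw-0abc1234567890def \
  --transit-gateway-id tgw-0abc1234567890def \
  --options EnableAcceleration=true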

Accelerated VPN connections use two VPN tunnels per connection like a regular Site-to-Site VPN connection. For accelerated VPN connections, each tunnel uses a separate accelerator and a separate pool of IP addresses for the tunnel endpoint IP addresses. The IP addresses for the two VPN tunnels are selected from two separate network zones. This ensures high availability for your VPN connections and can handle any network disruptions within a particular zone. If an Edge location fails, the customer gateway can reinitiate the VPN tunnel to the same IP address and get connected to the nearest available Edge location, making it resilient. These are the outside IP addresses to which the customer gateway will connect, as shown below:

Outside IP addresses to which customer gateway will connect

Considerations

Accelerated VPN functionality provides benefits to architectures involved in communicating with remote data centers and on-premises locations, but there are some considerations to keep in mind:

  • Additional charges are involved due to the use of Global Accelerator when acceleration is enabled. Performance testing should be done to evaluate the benefit it provides to your application.
  • Don’t enable accelerated VPN when the customer gateway for your VPN connection is also in an AWS environment since that traffic already traverses through the AWS backbone.
  • Applications that require consistent network performance and a dedicated private connection should consider moving to AWS Direct Connect.

From those remote data centers, you can use the Global Accelerator Speed Comparison tool to compare Global Accelerator download speeds from the AWS Region where your application resides against direct internet downloads. Note that while the tool uses TCP, the VPN uses the UDP protocol, so this is not an exact performance test of a VPN connection. However, it will give you a reasonable indication of the performance improvement for your VPN.

Summary

As you start adopting the cloud and migrating workloads to the AWS platform, you’ll realize the inherent benefits of scalability, high availability, and security to create fault-tolerant and production-grade applications. During this transition, you will have hybrid cloud environments utilizing VPN connectivity. Accelerated Site-to-Site VPN connections can provide you with performance improvements for your application traffic. This is a good alternative until your traffic demands and architecture considerations mandate the use of a dedicated network path using AWS Direct Connect from your remote locations to AWS.

 

Using Amazon EFS for AWS Lambda in your serverless applications

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/

Serverless applications are event-driven, using ephemeral compute functions to integrate services and transform data. While AWS Lambda includes a 512-MB temporary file system for your code, this is an ephemeral scratch resource not intended for durable storage.

Amazon EFS is a fully managed, elastic, shared file system designed to be consumed by other AWS services, such as Lambda. With the release of Amazon EFS for Lambda, you can now easily share data across function invocations. You can also read large reference data files, and write function output to a persistent and shared store. There is no additional charge for using file systems from your Lambda function within the same VPC.

EFS for Lambda makes it simpler to use a serverless architecture to implement many common workloads. It opens new capabilities, such as building and importing large code libraries directly into your Lambda functions. Since the code is loaded dynamically, you can also ensure that the latest version of these libraries is always used by every new execution environment. For appending to existing files, EFS is also a preferred option to using Amazon S3.

This blog post shows how to enable EFS for Lambda in your AWS account, and walks through some common use-cases.

Capabilities and behaviors of Lambda with EFS

EFS is built to scale on demand to petabytes of data, growing and shrinking automatically as files are written and deleted. When used with Lambda, your code has low-latency access to a file system where data is persisted after the function terminates.

EFS is a highly reliable NFS-based regional service, with all data stored durably across multiple Availability Zones. It is cost-optimized because there are no provisioning requirements and no purchase commitments. It uses built-in lifecycle management to optimize between an SSD-based performance class and an infrequent access class that offers 92% lower cost.

EFS offers two performance modes – General Purpose and Max I/O. General Purpose is suitable for most Lambda workloads, providing lower operational latency and higher performance for individual files.

You also choose between two throughput modes – bursting and provisioned. The bursting mode uses a credit system to determine when a file system can burst. With bursting, your throughput is calculated based upon the amount of data you are storing. Provisioned throughput is useful when you need more throughput than the bursting mode provides. Total throughput available is divided across the number of concurrent Lambda invocations.
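
For reference, a file system with explicit performance and throughput modes could be created with a CLI call roughly like the one below; the provisioned throughput figure and tag are only examples.

# Create a file system with general purpose performance and provisioned
# throughput (the 10 MiB/s value is just an illustration).
aws efs create-file-system \
  --performance-mode generalPurpose \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 10 \
  --tags Key=Name,Value=lambda-shared-efs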

The Lambda service mounts EFS file systems when the execution environment is prepared. This adds minimal latency when the function is invoked for the first time, often within hundreds of milliseconds. When the execution environment is already warm from previous invocations, the EFS mount is already available.

EFS can be used with Provisioned Concurrency for Lambda. When the reserved capacity is prepared, the Lambda service also configures and mounts the EFS file system. Since Provisioned Concurrency executes any initialization code, any libraries or packages consumed from EFS at that point are downloaded. In this use-case, it’s recommended to use provisioned throughput when configuring EFS.

The EFS file system is shared across Lambda functions as it scales up the number of concurrent executions. As files are written by one instance of a Lambda function, all other instances can access and modify this data, depending upon the access point permissions. The EFS file system scales with your Lambda functions, supporting up to 25,000 concurrent connections.

Creating an EFS file system

Configuring EFS for Lambda is straightforward. I show how to do this in the AWS Management Console, but you can also use the AWS CLI, AWS SDK, AWS Serverless Application Model (AWS SAM), and AWS CloudFormation. EFS file systems are always created within a customer VPC, so Lambda functions using the EFS file system must all reside in the same VPC.

To create an EFS file system:

  1. Navigate to the EFS console.
  2. Choose Create File System.
    EFS: Create File System
  3. On the Configure network access page, select your preferred VPC. Only resources within this VPC can access this EFS file system. Accept the default mount targets, and choose Next Step.
  4. On Configure file system settings, you can choose to enable encryption of data at rest. Review this setting, then accept the other defaults and choose Next Step. This uses bursting mode instead of provisioned throughput.
  5. On the Configure client access page, choose Add access point.
    EFS: Add access point
  6. Enter the following parameters. This configuration creates a file system with open read/write permissions – read more about settings to secure your access points. Choose Next Step.
    EFS: Access points
  7. On the Review and create page, check your settings and choose Create File System.
  8. In the EFS console, you see the new file system and its configuration. Wait until the Mount target state changes to Available before proceeding to the next steps.

Alternatively, you can use CloudFormation to create the EFS access point. With the AWS::EFS::AccessPoint resource, the preceding configuration is defined as follows:

  AccessPointResource:
    Type: 'AWS::EFS::AccessPoint'
    Properties:
      FileSystemId: !Ref FileSystemResource
      PosixUser:
        Uid: "1000"
        Gid: "1000"
      RootDirectory:
        CreationInfo:
          OwnerGid: "1000"
          OwnerUid: "1000"
          Permissions: "0777"
        Path: "/lambda"

For more information, see the example setup template in the code repository.

Working with AWS Cloud9 and Amazon EC2

You can mount EFS access points on Amazon EC2 instances. This can be useful for browsing file system contents and downloading files from other locations. The EFS console shows customized mount instructions directly under each created file system:

EFS customized mount instructions

The instance must have access to the same security group and reside in the same VPC as the EFS file system. After connecting via SSH to the EC2 instance, you mount the EFS mount target to a directory. You can also mount EFS in AWS Cloud9 instances using the terminal window.
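
On an Amazon Linux instance, the mount typically looks something like the following; the file system and access point IDs are placeholders, and the amazon-efs-utils package provides the efs mount helper.

# Install the EFS mount helper, then mount the access point with TLS
# (file system and access point IDs are placeholders).
sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
sudo mount -t efs -o tls,accesspoint=fsap-0123456789abcdef0 \
  fs-0123456789abcdef0:/ /mnt/efs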

Any files you write into the EFS file system are available to any Lambda functions using the same EFS file system. Similarly, any files written by Lambda functions are available to the EC2 instance.

Sharing large code packages with Lambda

EFS is useful for sharing software packages or binaries that are otherwise too large for Lambda layers. You can copy these to EFS and have Lambda use these packages as if they were installed in the Lambda deployment package.

For example, on EFS you can install Puppeteer, which runs a headless Chromium browser, using the following script run on an EC2 instance or AWS Cloud9 terminal:

  mkdir node && cd node
  npm init -y
  npm i puppeteer --save

Building packages in EC2 for EFS

You can then use this package from a Lambda function connected to this folder in the EFS file system. You include the Puppeteer package with the mount path in the require declaration:

const puppeteer = require ('/mnt/efs/node/node_modules/puppeteer')

In Node.js, to avoid changing declarations manually, you can add the EFS mount path to the Node.js module search path by using app-module-path. Lambda functions support a range of other runtimes, including Python, Java, and Go. Many other runtimes offer similar ways to add the EFS path to the list of default package locations.

There is an important difference between using packages in EFS compared with Lambda layers. When you use Lambda layers to include packages, these are downloaded to an immutable code package. Any changes to the underlying layer do not affect existing functions published using that layer.

Since EFS is a dynamic binding, any changes or upgrades to packages are available immediately to the Lambda function when the execution environment is prepared. This means you can output a build process to an EFS mount, and immediately consume any new versions of the build from a Lambda function.

Configuring AWS Lambda to use EFS

Lambda functions that access EFS must run from within a VPC. Read this guide to learn more about setting up Lambda functions to access resources from a VPC. There are also sample CloudFormation templates you can use to configure private and public VPC access.

The execution role for the Lambda function must provide access to the VPC and EFS. For development and testing purposes, this post uses the AWSLambdaVPCAccessExecutionRole and AmazonElasticFileSystemClientFullAccess managed policies in IAM. For production systems, you should use more restrictive policies to control access to EFS resources.
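
For a quick test setup, these managed policies can be attached to an existing execution role with the AWS CLI; the role name is a placeholder, and as noted above, production roles should be scoped down.

# Attach the VPC access and EFS client managed policies to an existing Lambda
# execution role (role name is a placeholder).
aws iam attach-role-policy \
  --role-name my-lambda-efs-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole

aws iam attach-role-policy \
  --role-name my-lambda-efs-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonElasticFileSystemClientFullAccess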

Once your Lambda function is configured to use a VPC, next configure EFS in Lambda:

  1. Navigate to the Lambda console and select your function from the list.
  2. Scroll down to the File system panel, and choose Add file system.
    EFS: Add file system
  3. In the File system configuration:
  • From the EFS file system dropdown, select the required file system. From the Access point dropdown, choose the required EFS access point.
  • In the Local mount path, enter the path your Lambda function uses to access this resource. Enter an absolute path.
  • Choose Save.
    EFS: Add file system

The File system panel now shows the configuration of the EFS mount, and the function is ready to use EFS. Alternatively, you can use an AWS Serverless Application Model (SAM) template to add the EFS configuration to a function resource:

AWSTemplateFormatVersion: '2010-09-09'
Resources:
  MyLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
	...
      FileSystemConfigs:
      - Arn: arn:aws:elasticfilesystem:us-east-1:xxxxxx:accesspoint/fsap-123abcdef12abcdef
        LocalMountPath: /mnt/efs

To learn more, see the SAM documentation on this feature.

Example applications

You can view and download these examples from this GitHub repository. To deploy, follow the instructions in the repo’s README.md file.

1. Processing large video files

The first example uses EFS to process a 60-minute MP4 video and create screenshots for each second of the recording. It uses the FFmpeg Linux package to process the video. After copying the MP4 to the EFS file location, invoke the Lambda function to create a series of JPG frames. This uses the following code to execute FFmpeg and pass the EFS mount path and input file parameters:

const os = require('os')

const inputFile = process.env.INPUT_FILE
const efsPath = process.env.EFS_PATH

const { exec } = require('child_process')

const execPromise = async (command) => {
	console.log(command)
	return new Promise((resolve, reject) => {
		const ls = exec(command, function (error, stdout, stderr) {
		  if (error) {
		    console.log('Error: ', error)
		    reject(error)
		  }
		  console.log('stdout: ', stdout);
		  console.log('stderr: ' ,stderr);
		})
		
		ls.on('exit', function (code) {
		  console.log('Finished: ', code);
		  resolve()
		})
	})
}

// The Lambda handler
exports.handler = async function (eventObject, context) {
	await execPromise(`/opt/bin/ffmpeg -loglevel error -i ${efsPath}/${inputFile} -s 240x135 -vf fps=1 ${efsPath}/%d.jpg`)
}

In this example, the process writes more than 2000 individual JPG files back to the EFS file system during a single invocation:

Console output from sample application

2. Archiving large numbers of files

Using the output from the first application, the second example creates a single archive file from the JPG files. The code uses the Node.js archiver package for processing:

const outputFile = process.env.OUTPUT_FILE
const efsPath = process.env.EFS_PATH

const fs = require('fs')
const archiver = require('archiver')

// The Lambda handler
exports.handler = function (event) {

  const output = fs.createWriteStream(`${efsPath}/${outputFile}`)
  const archive = archiver('zip', {
    zlib: { level: 9 } // Sets the compression level.
  })
  
  output.on('close', function() {
    console.log(archive.pointer() + ' total bytes')
  })
  
  output.on('end', function() {
    console.log('Data has been drained')
  })
  
  archive.pipe(output)  

  // append files from a glob pattern
  archive.glob(`${efsPath}/*.jpg`)
  archive.finalize()
}

After executing this Lambda function, the resulting ZIP file is written back to the EFS file system:

Console output from second sample application.

3. Unzipping archives with a large number of files

The last example shows how to unzip an archive containing many files. This uses the Node.js unzipper package for processing:

const inputFile = process.env.INPUT_FILE
const efsPath = process.env.EFS_PATH
const destinationDir = process.env.DESTINATION_DIR

const fs = require('fs')
const unzipper = require('unzipper')

// The Lambda handler
exports.handler = function (event) {

  fs.createReadStream(`${efsPath}/${inputFile}`)
    .pipe(unzipper.Extract({ path: `${efsPath}/${destinationDir}` }))

}

Once this Lambda function is executed, the archive is unzipped into a destination directory in the EFS file system. This example shows the screenshots unzipped into the frames subdirectory:

Console output from third sample application.

Conclusion

EFS for Lambda allows you to share data across function invocations, read large reference data files, and write function output to a persistent and shared store. After configuring EFS, you provide the Lambda function with an access point ARN, allowing you to read and write to this file system. Lambda securely connects the function instances to the EFS mount targets in the same Availability Zone and subnet.

EFS opens a range of potential new use-cases for Lambda. In this post, I show how this enables you to access large code packages and binaries, and process large numbers of files. You can interact with the file system via EC2 or AWS Cloud9 and pass information to and from your Lambda functions.

EFS for Lambda is supported at launch in APN Partner solutions, including Epsagon, Lumigo, Datadog, HashiCorp Terraform, and Pulumi. To learn more about how to use EFS for Lambda, see the AWS News Blog post and read the documentation.

How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink

Post Syndicated from Robert L. Cossin original https://aws.amazon.com/blogs/big-data/how-goldman-sachs-builds-cross-account-connectivity-to-their-amazon-msk-clusters-with-aws-privatelink/

This guest post presents patterns for accessing an Amazon Managed Streaming for Apache Kafka cluster across your AWS account or Amazon Virtual Private Cloud (Amazon VPC) boundaries using AWS PrivateLink. In addition, the post discusses the pattern that the Transaction Banking team at Goldman Sachs (TxB) chose for their cross-account access, the reasons behind their decision, and how TxB satisfies its security requirements with Amazon MSK. Using Goldman Sachs’s implementation as a use case, this post aims to provide you with general guidance that you can use when implementing an Amazon MSK environment.

Overview

Amazon MSK is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data. When you create an MSK cluster, the cluster resources are available to participants within the same Amazon VPC. This allows you to launch the cluster within specific subnets of the VPC, associate it with security groups, and attach IP addresses from your VPC’s address space through elastic network interfaces (ENIs). Network traffic between clients and the cluster stays within the AWS network, with internet access to the cluster not possible by default.

You may need to allow clients access to an MSK cluster in a different VPC within the same or a different AWS account. You have options such as VPC peering or a transit gateway that allow for resources in either VPC to communicate with each other as if they’re within the same network. For more information about access options, see Accessing an Amazon MSK Cluster.

Although these options are valid, this post focuses on a different approach, which uses AWS PrivateLink. Therefore, before we dive deep into the actual patterns, let’s briefly discuss when AWS PrivateLink is a more appropriate strategy for cross-account and cross-VPC access.

VPC peering, illustrated below, is a bidirectional networking connection between two VPCs that enables you to route traffic between them using private IPv4 addresses or IPv6 addresses.

VPC peering is more suited for environments that have a high degree of trust between the parties that are peering their VPCs. This is because, after a VPC peering connection is established, the two VPCs can have broad access to each other, with resources in either VPC capable of initiating a connection. You’re responsible for implementing fine-grained network access controls with security groups to make sure that only specific resources intended to be reachable are accessible between the peered VPCs.

You can only establish VPC peering connections across VPCs that have non-overlapping CIDRs. This can pose a challenge when you need to peer VPCs with overlapping CIDRs, such as when peering across accounts from different organizations.

Additionally, if you’re running at scale, you can have hundreds of Amazon VPCs, and VPC peering has a limit of 125 peering connections to a single Amazon VPC. You can use a network hub like transit gateway, which, although highly scalable in enabling you to connect thousands of Amazon VPCs, requires similar bidirectional trust and non-overlapping CIDRs as VPC peering.

In contrast, AWS PrivateLink provides fine-grained network access control to specific resources in a VPC instead of all resources by default, and is therefore more suited for environments that want to follow a lower trust model approach, thus reducing their risk surface. The following diagram shows a service provider VPC that has a service running on Amazon Elastic Compute Cloud (Amazon EC2) instances, fronted by a Network Load Balancer (NLB). The service provider creates a configuration called a VPC endpoint service in the service provider VPC, pointing to the NLB. You can share this endpoint service with another Amazon VPC (service consumer VPC), which can use an interface VPC endpoint powered by AWS PrivateLink to connect to the service. The service consumers use this interface endpoint to reach the end application or service directly.

AWS PrivateLink makes sure that the connections initiated to a specific set of network resources are unidirectional—the connection can only originate from the service consumer VPC and flow into the service provider VPC and not the other way around. Outside of the network resources backed by the interface endpoint, no other resources in the service provider VPC get exposed. AWS PrivateLink allows for VPC CIDR ranges to overlap, and it can relatively scale better because thousands of Amazon VPCs can consume each service.

VPC peering and AWS PrivateLink are therefore two connectivity options suited for different trust models and use cases.

Transaction Banking’s micro-account strategy

An AWS account is a strong isolation boundary that provides both access control and reduced blast radius for issues that may occur due to deployment and configuration errors. This strong isolation is possible because you need to deliberately and proactively configure flows that cross an account boundary. TxB designed a strategy that moves each of their systems into its own AWS account, each of which is called a TxB micro-account. This strategy allows TxB to minimize the chances of a misconfiguration exposing multiple systems. For more information about TxB micro-accounts, see the video AWS re:Invent 2018: Policy Verification and Enforcement at Scale with AWS on YouTube.

To further complement the strong gains realized due to a TxB micro-account segmentation, TxB chose AWS PrivateLink for cross-account and cross-VPC access of their systems. AWS PrivateLink allows TxB service providers to expose their services as an endpoint service and use whitelisting to explicitly configure which other AWS accounts can create interface endpoints to these services. This also allows for fine-grained control of the access patterns for each service. The endpoint service definition only allows access to resources attached to the NLBs and thereby makes it easy to understand the scope of access overall. The one-way initiation of connection from a service consumer to a service provider makes sure that all connectivity is controlled on a point-to-point basis.  Furthermore, AWS PrivateLink allows the CIDR blocks of VPCs to overlap between the TxB micro-accounts. Thus the use of AWS PrivateLink sets TxB up for future growth as a part of their default setup, because thousands of TxB micro-account VPCs can consume each service if needed.

MSK broker access patterns using AWS PrivateLink

As a part of their micro-account strategy, TxB runs an MSK cluster in its own dedicated AWS account, and clients that interact with this cluster are in their respective micro-accounts. Considering this setup and the preference to use AWS PrivateLink for cross-account connectivity, TxB evaluated the following two patterns for broker access across accounts.

Pattern 1: Front each MSK broker with a unique dedicated interface endpoint

In this pattern, each MSK broker is fronted with a unique dedicated NLB in the TxB MSK account hosting the MSK cluster. The TxB MSK account contains an endpoint service for every NLB and is shared with the client account. The client account contains interface endpoints corresponding to the endpoint services. Finally, DNS entries identical to the broker DNS names point to the respective interface endpoint. The following diagram illustrates this pattern in the US East (Ohio) Region.

High-level flow

After setup, clients from their own accounts talk to the brokers using their provisioned default DNS names as follows:

  1. The client resolves the broker DNS name to the interface endpoint IP address inside the client VPC.
  2. The client initiates a TCP connection to the interface endpoint IP over port 9094.
  3. With AWS PrivateLink technology, this TCP connection is routed to the dedicated NLB setup for the respective broker listening on the same port within the TxB MSK account.
  4. The NLB routes the connection to the single broker IP registered behind it on TCP port 9094.

High-level setup

The setup steps in this section are shown for the US East (Ohio) Region; please modify them if you are using another Region. In the TxB MSK account, complete the following:

  1. Create a target group with target type as IP, protocol TCP, port 9094, and in the same VPC as the MSK cluster.
    • Register the MSK broker as a target by its IP address.
  2. Create an NLB with a listener of TCP port 9094 and forwarding to the target group created in the previous step.
    • Enable the NLB for the same AZ and subnet as the MSK broker it fronts.
  3. Create an endpoint service configuration for each NLB that requires acceptance and grant permissions to the client account so it can create a connection to this endpoint service. A CLI sketch of these provider-side steps follows this list.
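
A minimal AWS CLI sketch of these provider-side steps for a single broker could look like this; every ID, ARN, and IP address below is a placeholder.

# Target group for one broker (placeholder VPC ID and broker IP).
aws elbv2 create-target-group --name msk-b1-tg --protocol TCP --port 9094 \
  --target-type ip --vpc-id vpc-0abc1234567890def
aws elbv2 register-targets --target-group-arn <b1-target-group-arn> \
  --targets Id=10.0.1.25

# Internal NLB in the broker's subnet, with a TCP 9094 listener.
aws elbv2 create-load-balancer --name msk-b1-nlb --type network \
  --scheme internal --subnets subnet-0aaa1111bbb22222c
aws elbv2 create-listener --load-balancer-arn <b1-nlb-arn> \
  --protocol TCP --port 9094 \
  --default-actions Type=forward,TargetGroupArn=<b1-target-group-arn>

# Endpoint service for this NLB, shared with the client account
# (placeholder account ID).
aws ec2 create-vpc-endpoint-service-configuration --acceptance-required \
  --network-load-balancer-arns <b1-nlb-arn>
aws ec2 modify-vpc-endpoint-service-permissions \
  --service-id <endpoint-service-id> \
  --add-allowed-principals arn:aws:iam::111122223333:root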

In the client account, complete the following:

  1. Create an interface endpoint in the same VPC the client is in (this connection request needs to be accepted within the TxB MSK account).
  2. Create a Route 53 private hosted zone, with the domain name kafka.us-east-2.amazonaws.com, and associate it with the same VPC as the clients are in.
  3. Create A-Alias records identical to the broker DNS names to avoid any TLS handshake failures and point them to the interface endpoints of the respective brokers. A CLI sketch of these client-side steps follows this list.
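
A minimal AWS CLI sketch of these client-side steps could look like this; the endpoint service name, VPC, subnet, and security group IDs are placeholders.

# Interface endpoint to the shared endpoint service (placeholder IDs).
aws ec2 create-vpc-endpoint --vpc-endpoint-type Interface \
  --vpc-id vpc-0abc1234567890def \
  --service-name com.amazonaws.vpce.us-east-2.vpce-svc-0123456789abcdef0 \
  --subnet-ids subnet-0aaa1111bbb22222c \
  --security-group-ids sg-0abc1234567890def

# Private hosted zone matching the broker domain, associated with the client VPC.
aws route53 create-hosted-zone --name kafka.us-east-2.amazonaws.com \
  --caller-reference msk-phz-$(date +%s) \
  --vpc VPCRegion=us-east-2,VPCId=vpc-0abc1234567890def

# Alias records for each broker DNS name are then added with
# change-resource-record-sets, pointing at the interface endpoint.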

Pattern 2: Front all MSK brokers with a single shared interface endpoint

In this second pattern, all brokers in the cluster are fronted with a single unique NLB that has cross-zone load balancing enabled. You make this possible by modifying each MSK broker’s advertised.listeners config to advertise a unique port. You create a unique NLB listener-target group pair for each broker and a single shared listener-target group pair for all brokers. You create an endpoint service configuration for this single NLB and share it with the client account. In the client account, you create an interface endpoint corresponding to the endpoint service. Finally, you create DNS entries identical to the broker DNS names that point to the single interface. The following diagram illustrates this pattern in the US East (Ohio) Region.

High-level flow

After setup, clients from their own accounts talk to the brokers using their provisioned default DNS names as follows:

  1. The client resolves the broker DNS name to the interface endpoint IP address inside the client VPC.
  2. The client initiates a TCP connection to the interface endpoint over port 9094.
  3. The NLB listener within the TxB MSK account on port 9094 receives the connection.
  4. The NLB listener’s corresponding target group load balances the request to one of the brokers registered to it (Broker 1). In response, Broker 1 sends back the advertised DNS name and port (9001) to the client.
  5. The client resolves the broker endpoint address again to the interface endpoint IP and initiates a connection to the same interface endpoint over TCP port 9001.
  6. This connection is routed to the NLB listener for TCP port 9001.
  7. This NLB listener’s corresponding target group is configured to receive the traffic on TCP port 9094, and forwards the request on the same port to the only registered target, Broker 1.

High-level setup

The setup steps in this section are shown for the US East (Ohio) Region; adjust the Region-specific values if you use another Region. In the TxB MSK account, complete the following steps (a CLI sketch of the listener setup follows the list):

  1. Modify the port that each MSK broker advertises by running the following command against each running broker. The example shows changing the advertised port on broker b-1 to 9001. For each broker you run the command against, change the values of bootstrap-server, entity-name, CLIENT_SECURE, REPLICATION, and REPLICATION_SECURE. Note that when you modify the REPLICATION and REPLICATION_SECURE values, -internal must be appended to the broker name, and the ports 9093 and 9095 shown below must not be changed.
    ./kafka-configs.sh \
    --bootstrap-server b-1.exampleClusterName.abcde.c2.kafka.us-east-2.amazonaws.com:9094 \
    --entity-type brokers \
    --entity-name 1 \
    --alter \
    --command-config kafka_2.12-2.2.1/bin/client.properties \
    --add-config advertised.listeners=[\
    CLIENT_SECURE://b-1.exampleClusterName.abcde.c2.kafka.us-east-2.amazonaws.com:9001,\
    REPLICATION://b-1-internal.exampleClusterName.abcde.c2.kafka.us-east-2.amazonaws.com:9093,\
    REPLICATION_SECURE://b-1-internal.exampleClusterName.abcde.c2.kafka.us-east-2.amazonaws.com:9095]

  2. Create a target group with target type as IP, protocol TCP, port 9094, and in the same VPC as the MSK cluster. The preceding diagram represents this as B-ALL.
    • Register all MSK brokers to B-ALL as targets by their IP addresses.
  3. Create target groups dedicated for each broker (B1, B2) with the same properties as B-ALL.
    • Register the respective MSK broker to each target group by its IP address.
  4. Perform the same steps for additional brokers if needed, creating a unique listener-target group pair corresponding to the advertised port of each broker.
  5. Create an NLB that is enabled for the same subnets that the MSK brokers are in and with cross-zone load balancing enabled.
    • Create a TCP listener for every broker’s advertised port (9001, 9002) that forwards to the corresponding target group you created (B1, B2).
    • Create a special TCP listener 9094 that forwards to the B-ALL target group.
  6. Create an endpoint service configuration for the NLB that requires acceptance and grant permissions to the client account to create a connection to this endpoint service.
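As a sketch of the listener wiring described above (ARNs are placeholders), the shared and per-broker listeners on the single NLB could be created like this:

# Enable cross-zone load balancing so a client in any AZ can reach any broker
aws elbv2 modify-load-balancer-attributes \
    --load-balancer-arn <shared-nlb-arn> \
    --attributes Key=load_balancing.cross_zone.enabled,Value=true

# Shared listener: TCP 9094 forwards to B-ALL (all brokers registered on 9094)
aws elbv2 create-listener \
    --load-balancer-arn <shared-nlb-arn> \
    --protocol TCP --port 9094 \
    --default-actions Type=forward,TargetGroupArn=<b-all-target-group-arn>

# Broker 1 listener: its advertised port 9001 forwards to B1 (which still targets port 9094)
aws elbv2 create-listener \
    --load-balancer-arn <shared-nlb-arn> \
    --protocol TCP --port 9001 \
    --default-actions Type=forward,TargetGroupArn=<b1-target-group-arn>

# Broker 2 listener: advertised port 9002 forwards to B2
aws elbv2 create-listener \
    --load-balancer-arn <shared-nlb-arn> \
    --protocol TCP --port 9002 \
    --default-actions Type=forward,TargetGroupArn=<b2-target-group-arn>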

In the client account, complete the following:

  1. Create an interface endpoint in the same VPC the client is in (this connection request needs to be accepted within the TxB MSK account).
  2. Create a Route 53 private hosted zone, with the domain name kafka.us-east-2.amazonaws.com and associate it with the same VPC as the client is in.
  3. Under this hosted zone, create A-Alias records identical to the broker DNS names to avoid any TLS handshake failures, and point them to the interface endpoint.

Both of these patterns use TLS on TCP port 9094 to talk to the MSK brokers. If your security posture allows plaintext communication between the clients and brokers, these patterns apply in that scenario as well, using TCP port 9092.

With both of these patterns, if Amazon MSK detects a broker failure, it mitigates the failure by replacing the unhealthy broker with a new one. In addition, the new MSK broker retains the same IP address and has the same Kafka properties, such as any modified advertised.listener configuration.

Amazon MSK allows clients to communicate with the service on TCP ports 9092, 9094, and 2181. As a byproduct of modifying advertised.listeners in Pattern 2, clients are automatically asked to speak with the brokers on the advertised port. If clients in the same account as Amazon MSK need to access the brokers, you should create a new Route 53 hosted zone in the Amazon MSK account with identical broker DNS names pointing to the NLB DNS name. The Route 53 record sets override the MSK broker DNS and route all traffic to the brokers through the NLB.

Transaction Banking’s MSK broker access pattern

For broker access across TxB micro-accounts, TxB chose Pattern 1, where one interface endpoint per broker is exposed to the client account. TxB streamlined this overall process by automating the creation of the endpoint service within the TxB MSK account and the interface endpoints within the client accounts without any manual intervention.

At the time of cluster creation, the bootstrap broker configuration is retrieved by calling the Amazon MSK APIs and stored in AWS Systems Manager Parameter Store in the client account so that it can be retrieved on application startup. This keeps clients agnostic of the Kafka brokers’ DNS names, even though the brokers are launched in a completely different account.
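A simplified sketch of that automation with the AWS CLI could look like the following. The cluster ARN and parameter name are placeholders, and in TxB’s setup the first command runs in the MSK account while the parameter is written to and read from the client account.

# Retrieve the TLS bootstrap broker string for the cluster (TxB MSK account)
BOOTSTRAP=$(aws kafka get-bootstrap-brokers \
    --cluster-arn arn:aws:kafka:us-east-2:111122223333:cluster/exampleClusterName/abcd1234-ef56-ab78-cd90-example \
    --query BootstrapBrokerStringTls --output text)

# Store it in Parameter Store for applications to read at startup (client account)
aws ssm put-parameter \
    --name /txb/msk/bootstrap-brokers \
    --type String \
    --value "$BOOTSTRAP" \
    --overwrite

# On application startup, read the brokers without hard-coding DNS names from another account
aws ssm get-parameter \
    --name /txb/msk/bootstrap-brokers \
    --query Parameter.Value --output text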

A key driver for TxB choosing Pattern 1 is that it avoids having to modify a broker property like the advertised port. Pattern 2 creates the need for TxB to track which broker is advertising which port and to make sure new brokers aren’t reusing the same port. This adds the overhead of modifying and tracking the advertised port of new brokers as they are launched, and of creating a corresponding listener-target group pair for each of these brokers. TxB avoided this additional overhead by choosing Pattern 1.

On the other hand, Pattern 1 requires the creation of additional dedicated NLBs and interface endpoint connections when more brokers are added to the cluster. TxB limits this management overhead through automation, which requires additional engineering effort.

Also, using Pattern 1 costs more compared to Pattern 2, because each broker in the cluster has a dedicated NLB and an interface endpoint. For a single broker, it costs $37.80 per month to keep the end-to-end connectivity infrastructure up. The breakdown of the monthly connectivity costs is as follows:

  • NLB running cost – 1 NLB x $0.0225 x 720 hours/month = $16.20/month
  • 1 VPC endpoint spread across three AZs – 1 VPCE x 3 ENIs x $0.01 x 720 hours/month = $21.60/month

Additional charges for NLB capacity used and AWS PrivateLink data processed apply. For more information about pricing, see Elastic Load Balancing pricing and AWS PrivateLink pricing.

To summarize, Pattern 1 is best applicable when:

  • You want to minimize the management overhead associated with modifying broker properties, such as advertised port
  • You have automation that takes care of adding and removing infrastructure when new brokers are created or destroyed
  • Simplified and uniform deployments are primary drivers, with cost as a secondary concern

Transaction Banking’s security requirements for Amazon MSK

The TxB micro-account provides a strong application isolation boundary, and accessing MSK brokers over AWS PrivateLink with Pattern 1 allows for tightly controlled connection flows between these TxB micro-accounts. TxB further builds on this foundation through additional infrastructure and data protection controls available in Amazon MSK. For more information, see Security in Amazon Managed Streaming for Apache Kafka.

The following are the core security tenets that TxB’s internal security team requires for using Amazon MSK:

  • Encryption at rest using Customer Master Key (CMK) – TxB uses the Amazon MSK managed offering of encryption at rest. Amazon MSK integrates with AWS Key Management Service (AWS KMS) to offer transparent server-side encryption to always encrypt your data at rest. When you create an MSK cluster, you can specify the AWS KMS CMK that AWS KMS uses to generate data keys that encrypt your data at rest. For more information, see Using CMKs and data keys.
  • Encryption in transit – Amazon MSK uses TLS 1.2 for encryption in transit. TxB makes client-broker encryption and encryption between the MSK brokers mandatory.
  • Client authentication with TLS – Amazon MSK uses AWS Certificate Manager Private Certificate Authority (ACM PCA) for client authentication. The ACM PCA can either be a root Certificate Authority (CA) or a subordinate CA. If it’s a root CA, you need to install a self-signed certificate. If it’s a subordinate CA, you can choose its parent to be an ACM PCA root, a subordinate CA, or an external CA. This external CA can be your own CA that issues the certificate and becomes part of the certificate chain when installed as the ACM PCA certificate. TxB takes advantage of this capability and uses certificates signed by ACM PCA that are distributed to the client accounts.
  • Authorization using Kafka Access Control Lists (ACLs) – Amazon MSK allows you to use the Distinguished Name of a client’s TLS certificates as the principal of the Kafka ACL to authorize client requests. To enable Kafka ACLs, you must first have client authentication using TLS enabled. TxB uses the Kafka Admin API to create Kafka ACLs for each topic using the certificate names of the certificates deployed on the consumer and producer client instances. For more information, see Apache Kafka ACLs. (A sketch of an equivalent ACL follows this list.)
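TxB creates these ACLs through the Kafka Admin API; as an illustration only, an equivalent ACL created with the Kafka CLI might look like the following, assuming a Kafka version whose kafka-acls.sh supports --bootstrap-server. The Distinguished Name, topic, and consumer group are placeholders.

./kafka-acls.sh \
    --bootstrap-server b-1.exampleClusterName.abcde.c2.kafka.us-east-2.amazonaws.com:9094 \
    --command-config client.properties \
    --add \
    --allow-principal "User:CN=payments-consumer.clients.example.internal" \
    --operation Read \
    --topic payments \
    --group payments-consumer-group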

Conclusion

This post illustrated how the Transaction Banking team at Goldman Sachs approaches an application isolation boundary through the TxB micro-account strategy and how AWS PrivateLink complements this strategy.  Additionally, this post discussed how the TxB team builds connectivity to their MSK clusters across TxB micro-accounts and how Amazon MSK takes the undifferentiated heavy lifting away from TxB by allowing them to achieve their core security requirements. You can leverage this post as a reference to build a similar approach when implementing an Amazon MSK environment.

 


About the Authors

Robert L. Cossin is a Vice President at Goldman Sachs in New York. Rob joined Goldman Sachs in 2004 and has worked on many projects within the firm’s cash and securities flows. Most recently, Rob is a technical architect on the Transaction Banking team, focusing on cloud enablement and security.

 

 

 

Harsha W. Sharma is a Solutions Architect with AWS in New York. Harsha joined AWS in 2016 and works with Global Financial Services customers to design and develop architectures on AWS, and support their journey on the cloud.

 

 

New – Amazon Simple Email Service (SES) for VPC Endpoints

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-simple-email-service-ses-for-vpc-endpoints/

Although chat and messaging applications have become popular, email has retained its place as a ubiquitous channel with the highest return on investment (ROI) because of its low barrier to entry, affordability, and ability to target specific recipients. To ensure that an organization’s marketing and transactional messages reach the end customer in a timely manner and drive deeper engagement, you need to partner with a mature and trusted email service provider that has built specialized expertise in delivering email at scale.

Amazon Simple Email Service (SES) has been a trustworthy, flexible, and affordable email service for developers and digital marketers since 2011. Amazon SES is a reliable, cost-effective service for businesses of all sizes that use email to keep in contact with their customers. Many businesses operate in highly regulated industries with strict security policies, so we have enhanced the security and compliance features in Amazon SES, such as letting you configure DKIM using your own RSA key pair, supporting HIPAA eligibility and FIPS 140-2 compliant endpoints, and expanding to additional Regions.

Today, I am pleased to announce that customers can now connect directly from an Amazon Virtual Private Cloud (VPC) to Amazon SES through a VPC Endpoint, powered by AWS PrivateLink, in a secure and scalable manner. You can now access Amazon SES through your VPC without requiring an Internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. When you use an interface VPC Endpoint, communication between your VPC and Amazon SES APIs stays within the Amazon network, adding increased security.

With this launch, traffic to Amazon SES does not transit the public internet and never leaves the Amazon network, so you can connect your VPC to Amazon SES securely without imposing availability risks or bandwidth constraints on your network traffic. You can also centralize Amazon SES across your multi-account infrastructure and provide it as a service to your accounts without needing an Internet gateway.

Amazon SES for VPC Endpoints – Getting Started
If you want to test sending emails from an EC2 instance in your default VPC, create a security group in the EC2 console with inbound rules that allow SMTP traffic (for example, TCP port 465) from the private IP address of your instance.

To create the VPC Endpoint for Amazon SES, follow the Creating an Interface Endpoint procedure in the VPC console, select the com.amazonaws.region.email-smtp service name, and attach the security group that you just created.
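If you prefer the AWS CLI, a minimal sketch of the same endpoint creation could look like this; the Region matches the example below, the VPC, subnet, and security group IDs are placeholders, and the private DNS flag assumes the service supports private DNS so the default email-smtp host name resolves to the endpoint inside your VPC.

aws ec2 create-vpc-endpoint \
    --vpc-endpoint-type Interface \
    --vpc-id vpc-0123456789abcdef0 \
    --service-name com.amazonaws.ap-southeast-2.email-smtp \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --private-dns-enabled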

After your endpoint becomes available, you can SSH to your EC2 instance and use the openssl command to test the connection or send email through the newly created endpoint. You interact with the SMTP interface in the same way from your operating system’s command line.

$ openssl s_client -crlf -quiet -starttls smtp -connect email-smtp.ap-southeast-2.amazonaws.com:465
...
depth=2 C = US, O = Amazon, CN = Amazon Root CA 1
verify return:1
depth=1 C = US, O = Amazon, OU = Server CA 1B, CN = Amazon
verify return:1
depth=0 CN = email-smtp.ap-southeast-2.amazonaws.com
verify return:1
...
220 email-smtp.amazonaws.com ESMTP SimpleEmailService-d-ZIFLXXX 
HELO email-smtp.amazonaws.com
...
250 Ok

Note that VPC Endpoints currently do not support cross-Region requests. Ensure that you create your endpoint in the same Region in which you plan to issue your API calls to Amazon SES.

Now Available!
Amazon SES for VPC Endpoints is generally available and you can use it in all Regions where Amazon SES is available. There is no additional charge for using Amazon SES for VPC Endpoints. Take a look at the product page and the documentation to learn more. Please send feedback to the AWS forum for Amazon SES or through your usual AWS Support contacts.

Channy;

Use AWS Firewall Manager and VPC security groups to protect your applications hosted on EC2 instances

Post Syndicated from Kaustubh Phatak original https://aws.amazon.com/blogs/security/use-aws-firewall-manager-vpc-security-groups-to-protect-applications-hosted-on-ec2-instances/

You can use AWS Firewall Manager to centrally configure and manage Amazon Virtual Private Cloud (Amazon VPC) security groups across all your AWS accounts. This post will take you through the step-by-step instructions to apply common security group rules, audit your security groups, and detect unused and redundant rules in your security groups across your AWS environment.

In this post, I’ll show you how to create and enforce a master set of security group rules by using a common security group policy, while still allowing developers to deploy and manage application-specific security group rules. In the example below, you’ll create security group rules that allow SSH access only from the public IP address of the bastion host, and you’ll set an audit policy that prohibits any security group rules allowing SSH access (port 22) from everywhere.

When you use Firewall Manager to centrally apply a common security group, you can do things such as ensure that all Application Load Balancers only talk to Amazon CloudFront, allow the Secure Shell (SSH) protocol only from specific IP ranges, or give system administrators access to a central database.

In many organizations, developers write their own security group rules for their applications. However, if you’re a security administrator, you want to audit the security group rules so you’ll know when a security group is misconfigured. Using an audit security group policy, you can set guardrails on which security group rules can or cannot be created across your organization. For example, you could allow security group rules only on ports 10-1000, or specify that you do not allow security group rules on port 23.

As an administrator, you also want to simplify operations by detecting unused and redundant security groups across your AWS accounts. You can use a managed audit policy to help identify unused and redundant security groups.

If you haven’t used these services before, here’s a quick overview:

  1. AWS Firewall Manager is a security management service that allows you to centrally configure and manage firewall rules across your accounts and applications in AWS Organizations by using AWS Config in the background. Using AWS Firewall Manager, you can easily roll out AWS WAF rules, create AWS Shield Advanced protections, and enforce security group policies for your Amazon Elastic Compute Cloud (Amazon EC2) and elastic network interface resource types in Amazon VPCs.
  2. VPC security groups act as a virtual, stateful firewall for your Amazon Elastic Compute Cloud (Amazon EC2) instance to control inbound and outbound traffic. You can specify separate rules for inbound and outbound traffic, and instances associated with a security group can’t talk to each other unless you add rules allowing it.

After you put the master set of security group rules in place, you’ll get notification of all non-compliant changes made by the developers. You can take remediation action if necessary using an audit security group policy. In this post, you’ll also set up a usage security group policy, so that you can flag unused security groups and merge redundant security groups for simpler administration.

Prerequisites

AWS Firewall Manager has the following prerequisites:

  • AWS Organizations: Your organization must be using AWS Organizations to manage your accounts, and All Features must be enabled. For more information, see Creating an Organization and Enabling All Features in Your Organization.
  • An administrator AWS Account: You must designate one of the AWS accounts in your organization as the administrator for Firewall Manager. This gives the account permission to deploy AWS WAF rules across the organization.
  • AWS Config: You must enable AWS Config for all of the accounts in your organization, so that AWS Firewall Manager can detect newly created resources. To enable AWS Config for all of the accounts in your organization, you can use the Enable AWS Config template on the StackSets Sample Templates page. For more information, see Getting Started with AWS Config.

Note: You’ll be charged $100 per policy per month. In the solution in this post, you’ll create three policies. In addition, AWS Config charges also apply. For more information, see AWS Firewall Manager pricing and AWS Config pricing.

Overview

The diagram below illustrates the following steps:

  1. Complete the prerequisites that were outlined in the prerequisites section above.
  2. Create a primary security group under AWS Firewall Manager. This is a VPC security group that gets replicated as a new security group to every resource within the policy scope.
  3. In AWS Firewall Manager, create policies that can be applied to individual application security groups by mapping them to specific application name/value tags. The policies you create will result in the generation of individual new security groups.
  4. Application developers can add app-specific security group rules to the security groups created in the previous step.

 

Figure 1: Overview of solution

Create a common security group policy

You’ll begin by creating a common security group policy to push primary security group rules across all accounts.

  1. Sign in to the AWS Management Console using the AWS Firewall Manager administrator account that you set up in the prerequisites, and then open the Firewall Manager console.
  2. In the navigation pane, under AWS Firewall Manager, choose Security policies.
  3. Using the Filter menu, select the AWS Region where your application is hosted and choose Create policy. In my example, I choose US West (Oregon).
  4. For Policy type, choose Security group.
  5. For Security group policy type, choose Common security groups, then choose Next.
  6. Enter a policy name. In my example, I’ve named my policy Test_Common_Policy.
  7. Policy rules allow you to choose how the security groups in this policy are applied and maintained. For this tutorial, choose Apply the primary security groups to every resource within the policy scope and leave the other options unchecked. You can also choose to apply only one of these options. Note that if you select both check boxes, a local user won’t be able to modify the replicated security groups or add additional security groups.
  8. Choose Add primary security group to see all security groups in your account in your specified AWS Region. Select any one of your existing security groups, or create a new security group.
  9. (Optional) If you choose to create a new security group, you’ll be taken to the VPC dashboard where you can create your primary security group by following the Creating a Security Group documentation. In the primary security group, add the following rules (a CLI sketch follows this list):
    1. For Ingress Rules, choose Allow access on Port 22 from 203.0.113.1/32.
    2. For Egress Rules, choose Allow all traffic on all ports.
  10. After you select the primary security group, choose Add security group.
  11. For Policy action, for this example, choose Apply policy rules and identify resources that are non-compliant but do not auto remediate. By selecting this option, Firewall Manager will notify you of any non-compliant security groups, but will not auto-remediate. Choose Next.
  12. For Policy scope, select the following:
    1. For AWS accounts included in this policy, choose All accounts under my organization.
    2. For Resource Type to apply this policy, choose EC2 instances.
    3. For Criteria to select the resources to protect, choose Include only resources that have the specified tags.
    4. For Key, enter Env.
    5. For Value, enter Prod.

    Choose Next.

  13. Review the security policy, then choose Create policy.
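For reference, the primary security group and the rules from step 9 could also be created ahead of time with the AWS CLI along these lines; the VPC ID and group name are placeholders, and egress to all destinations is already allowed by default on a new security group.

aws ec2 create-security-group \
    --group-name fms-primary-sg \
    --description "Primary security group distributed by Firewall Manager" \
    --vpc-id vpc-0123456789abcdef0

# Ingress: allow SSH (port 22) only from the bastion host's public IP
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 \
    --cidr 203.0.113.1/32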

 

Figure 2: Summary of Common Security Group policy

The security policy reviews all the EC2 instances in your child accounts in your specified AWS Region and adds the primary security group to the primary network interface of the Amazon EC2 instances. All primary interfaces of Amazon EC2 instances created in the future will also have this primary security group. If developers remove the rules of the primary security group, you’re notified when Firewall Manager marks the resource as non-compliant. You can then remediate by changing the security policy action to Apply policy rules and auto remediate any non-compliant resources, which removes the non-compliant security group rules. Alternatively, you can review the non-compliant resources, log in to the affected AWS account, and take remediation action manually.

Create an audit security group policy

Now, you’ll create an audit security group policy to enforce the guardrails. You’ll create a security group rule that allows port 22 access from an allowed IP subnet of 203.0.113.1/32 according to the security team’s recommendations.

  1. In the AWS Management Console, select AWS WAF and AWS Shield.
  2. In the navigation pane, under AWS Firewall Manager, choose Security policies.
  3. In the Filter, select the AWS Region where your application is hosted and choose Create policy. In my example, I will choose US West (Oregon).
  4. For Policy type, choose Security group. For Security group policy type, choose Auditing and enforcement guidelines for security group rules, then choose Next.
  5. Enter a policy name. In my example, I’ve named my policy Test_Audit_Policy.
  6. For Policy rules, select Allow any rules defined in audit security group.
  7. Choose Add audit security group to see all security groups in your account in your specified AWS Region. You can select a security group, or create a new security group.
  8. (Optional) If you choose to create a new security group, you’ll be taken to VPC dashboard where you can create your primary security group by following the Creating a Security Group documentation. In the audit security group, add the following:
    1. For Ingress Rules, choose Allow access on Port 22 from 203.0.113.1/32.
    2. For Egress Rules, choose Allow all traffic on all ports.
  9. After you select the audit security group, choose Add security group.
  10. For Policy action, you can only select Apply policy rules and identify resources that are non-compliant but do not auto remediate. By selecting this option, Firewall Manager will notify you of any non-compliant security groups, but will not auto-remediate. Choose Next.
  11. For Policy scope, select the following:
    1. For AWS accounts included in this policy, choose All accounts under my organization.
    2. For Resource type to apply this policy, choose Security groups.
    3. For Criteria to select the resources to protect, choose Include only resources that have the specified tags.
    4. For Key, enter Env.
    5. For Value, enter Prod.

    Choose Next.

  12. Review the security policy and choose Create policy.

 

Figure 3: Summary of Audit Security Group policy

The security policy audits all the security groups in your child accounts in your specified AWS Region and only allows security group ingress rules that permit port 22 access from 203.0.113.1/32. All security groups created in the future will also have this restriction. If Firewall Manager detects a security group that allows port 22 access from any source other than 203.0.113.1/32, you’re notified when Firewall Manager marks the resource as non-compliant. You can then remediate by changing the security policy action to Apply policy rules and auto remediate any non-compliant resources, which removes the non-compliant security group rules. Alternatively, you can review the non-compliant resources, log in to the affected AWS account, and take remediation action manually.
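If you want to check compliance from the command line instead of the console, the Firewall Manager APIs expose the same information. For example, run the following from the Firewall Manager administrator account; the policy ID and member account ID are placeholders.

# List member-account compliance status for the audit policy
aws fms list-compliance-status \
    --policy-id a1b2c3d4-5678-90ab-cdef-EXAMPLE11111

# Show which resources in a specific member account violate the policy
aws fms get-compliance-detail \
    --policy-id a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 \
    --member-account 111122223333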

Create a usage security group policy

Lastly, you’ll create a usage security group policy to remove unused security groups, and to merge redundant security groups.

  1. In the AWS Management Console, select AWS WAF and Shield.
  2. In the navigation pane, under AWS Firewall Manager, choose Security policies. In the Filter, select the AWS Region where your application is hosted and choose Create policy. In my example, I am choosing US West (Oregon).
  3. For Policy type, choose Security group. For Security group policy type, choose Auditing and cleanup of unused and redundant security groups. Choose Next.
  4. Enter a policy name. In my example, I’ve named my policy Test_Usage_Policy.
  5. For Policy rules, select both the options: Security groups within this policy scope should be used by at least one resource and Security groups within this policy scope should not have similar content.
  6. For Policy action, select Apply policy rules and identify resources that are non-compliant but do not auto remediate. Choose Next.
  7. For Policy scope, select the following:
    1. For AWS accounts included in this policy, choose All accounts under my organization.
    2. For Resource type to apply this policy, choose Security groups.
    3. For Criteria to select the resources to protect, choose Include only resources that have the specified tags.
    4. For Key, enter Env.
    5. For Value, enter Prod.

    Choose Next.

  8. A pop-up warning message will appear. Select Exclude Firewall Manager admin account from the policy scope, so that security groups in the administrator account are not affected.
  9. Review the security policy and choose Create policy.

 

Figure 4: Summary of Usage Security Group policy

The security policy reviews all the security groups in your child accounts in your specified AWS Region and checks whether any security groups are not associated with a resource, and whether any security groups have redundant (overlapping) rules. All security groups created in the future are also checked. If Firewall Manager detects a security group that isn’t associated with any resource or that has rules overlapping with another security group, you’re notified when Firewall Manager marks the resource as non-compliant. You can then remediate by changing the security policy action to Apply policy rules and auto remediate any non-compliant resources; unused security groups are then removed and redundant security groups are merged into one. Alternatively, you can review the non-compliant resources, log in to the affected AWS account, and take remediation action manually.

Conclusion

In this post, you learned how you can create AWS Firewall Manager rules using the console. Using both VPC security groups and AWS Firewall Manager, you created a deployment strategy that enables the developers in your organization to maintain a security mindset and begin coding security group rules, while at the same time ensuring that all applications are still protected by a set of security group rules defined by your organization’s security team. In addition, you have reduced the likelihood of misconfigured or overly permissive security groups, as well as the operational burden, by simplifying the security groups created in all your member accounts.

For further reading, see AWS Firewall Manager Update – Support for VPC Security Groups.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Firewall Manager forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Kaustubh Phatak

Kaustubh is a Cloud Support Engineer II at AWS. On a daily basis, he provides solutions for customers’ cloud architecture questions related to Networking, Security, and DevOps domain. Outside of the office, Kaustubh likes to play cricket, ping-pong, and soccer. He is also an avid console gamer.

Using VPC Sharing for a Cost-Effective Multi-Account Microservice Architecture

Post Syndicated from Anandprasanna Gaitonde original https://aws.amazon.com/blogs/architecture/using-vpc-sharing-for-a-cost-effective-multi-account-microservice-architecture/

Introduction

Many cloud-native organizations building modern applications have adopted a microservice architecture because of its flexibility, performance, and scalability. Even customers with legacy and monolithic application stacks are embarking on an application modernization journey and opting for this type of architecture. A microservice architecture allows applications to be composed of several loosely coupled discrete services that are independently deployable, scalable, and maintainable. These applications can comprise a large number of microservices, which often span multiple business units within an organization. These customers typically have a multi-account AWS environment with each AWS account belonging to an individual business unit. Their microservice implementations reside in the Virtual Private Clouds (VPCs) of their respective AWS accounts. You can set up a multi-account AWS environment incorporating best practices using AWS Landing Zone or AWS Control Tower.

This type of multi-account, multi-VPC architecture provides a good boundary and isolation for individual microservices and achieves a highly available, scalable, and secure architecture. However, for microservices that require a high degree of interconnectivity and are within the same trust boundaries, you can use other AWS capabilities to optimize cost and network management complexity.

This blog presents a cost-effective approach that requires less VPC management while still using separate accounts for billing and access control. This approach does not sacrifice scalability, high availability, fault tolerance, and security. To achieve a similar microservice architecture, you can share a VPC across AWS accounts using AWS Resource Access Manager (AWS RAM) and Network Load Balancer (NLB) support in a shared Amazon Virtual Private Cloud (VPC). This allows multiple microservices to coexist in the same VPC, even though they are developed by different business units.

Microservices architecture in a multi-VPC approach

In this architecture, microservices deployed across multiple VPCs use privately exposed endpoints for better security posture instead of going over the internet. This requires the customers to enable inter-VPC communication using the various networking capabilities of AWS as shown below:

microservices deployed across multiple VPCs use privately exposed endpoints

In the above reference architecture, we created a VPC in Account A, which is hosting the front end of the application across a fleet of Amazon Elastic Compute Cloud (Amazon EC2) instances using an AWS Auto Scaling group. For simplicity, we’ve illustrated a single public and private subnet for the application front end. In reality, this would span multiple subnets across multiple Availability Zones (AZs) to support a highly available and fault-tolerant configuration.

To ensure security, the application must communicate privately to microservices mS1 and mS2 deployed in VPC of Account B and Account C respectively. For high availability, these microservices are also implemented using a fleet of Amazon EC2 instances with the Auto Scaling group spanning across multiple subnets/availability zones. For high-performance load balancing, they are fronted by a Network Load Balancer.

While this architecture shows an implementation using Amazon EC2, it can also use containerized services deployed using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). These microservices may have interdependencies and invoke each other’s APIs to service the requests of the application layer. This application-to-microservice and microservice-to-microservice communication can be achieved using the following connectivity options:

When only a few VPC interconnections are required, Amazon VPC peering or AWS PrivateLink may be viable options. For a higher number of VPC interconnections, we recommend AWS Transit Gateway for better manageability of connections and routing through a centralized resource. However, depending on the amount of traffic, this can introduce significant costs to your architecture.

Alternative approach to microservice architecture using Network Load Balancers in a shared VPC

This alternative architecture pattern allows your individual microservice teams to continue to own the AWS resources that host their microservice implementations, but to deploy them in a shared VPC owned by the central account, eliminating the need for inter-VPC network connections. You can share Amazon VPCs to use the implicit routing within a VPC for applications that require a high degree of interconnectivity and are within the same trust boundaries.

This architecture uses AWS RAM, which allows you to share the VPC Subnets from AWS Account A to participating AWS accounts within your AWS organization. When the subnets are shared, participant AWS accounts (Account B and Account C) can see the shared subnets in their own environment. They can then deploy their Amazon EC2 instances in those subnets. This is depicted in the diagram where the visibility of the shared subnets (SS1 and SS2) is extended to the participating accounts (Account B and Account C).

You can also deploy the NLB in these shared subnets. Then, each participant account owns all the AWS resources for their microservice stack, but it’s deployed in the VPC of Account A.

This allows your individual microservice teams to maintain control over load balancer configurations and Auto Scaling policies based on their specific microservices’ needs. At the same time, by using AWS RAM they can effectively use the existing VPC environment of Account A.

This architecture presents several benefits over the multi-VPC architecture discussed earlier:

  • You can deploy the entire application, including the individual microservices, into a single shared VPC. This is while still allowing individual microservice teams control over their AWS resources deployed in that VPC.
  • Since the entire architecture now resides in a single VPC, it doesn’t require other networking connectivity features. It can rely on intra-VPC traffic for communication between the application (API) layer and microservices.
  • This leads to reduction in cost of the architecture. While the AWS RAM functionality is free of charge, this also reduces the data transfer and per-connection costs incurred by other options such as VPC peering, AWS PrivateLink, and AWS Transit Gateway.
  • This maintains the isolation across the individual microservices and the application layer.  Participants can’t view, modify, or delete resources that belong to others or the VPC owner.
  • This also leads to effective utilization of your VPC CIDR block resources.
  • Since multiple subnets belonging to different Availability Zones are shared, the application and the individual microservices continue to take advantage of scalability, availability, and fault tolerance.

The following illustration shows how you can configure AWS RAM to set up the VPC subnet resource shares between owner Account A and participating Account B. The example below shows the sharing of private subnet SS1 using this method:


Accounts A and B Resource Share
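A CLI sketch of the same share, run from the VPC owner (Account A), could look like the following; the subnet ARN and the participant account ID are placeholders.

# Share private subnet SS1 from Account A with participant Account B
aws ram create-resource-share \
    --name shared-vpc-subnets \
    --resource-arns arn:aws:ec2:us-east-1:111111111111:subnet/subnet-0123456789abcdef0 \
    --principals 222222222222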

Once this subnet is shared, the participating Account B can launch its Network Load Balancer of its microservice ms1 in the shared VPC subnet as shown below:

Account B can launch its Network Load Balancer of its microservice ms1 in the shared VPC subnet

While this architecture has many advantages, there are important considerations:

  • This style of architecture is suitable when you are certain that the number of microservices is small enough to coexist in a single VPC without depleting the CIDR block of the shared subnets of the VPC.
  • If the traffic between these microservices is insignificant, the cost benefit of this architecture over other options may not be substantial. This is due to the effect of traffic flow on data transfer cost.

Conclusion

AWS Cloud provides several options to build a microservices architecture. It is important to look at the characteristics of your application to determine which architectural choices to opt for. AWS RAM and the ability to deploy AWS resources (including Network Load Balancers) in a shared VPC help you eliminate inter-VPC traffic and the associated networking costs, without sacrificing high availability, scalability, fault tolerance, or security for your application.

Top 10 Architecture Blog Posts of 2019

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/top-10-architecture-blog-posts-of-2019/

As we wind our way toward 2020, I want to take a moment to first thank you, our readers, for spending time on our blog. We grew our audience quite a bit this year and the credit goes to our hard-working Solutions Architects and other blog post writers. Below are the top 10 Architecture blog posts written in 2019.

#10: How to Architect APIs for Scale and Security

by George Mao

George Mao, a Specialist Solutions Architect at AWS, focuses on serverless computing and has FIVE posts in the top ten this year. Way to go, George!

This post was the first in a series that focused on best practices and concepts you should be familiar with when you architect APIs for your applications.

Read George’s post.

#9: From One to Many: Evolving VPC Guidance

by Androski Spicer

Since its inception, the Amazon Virtual Private Cloud (VPC) has acted as the embodiment of security and privacy for customers who are looking to run their applications in a controlled, private, secure, and isolated environment.

This logically isolated space has evolved, and in its evolution has increased the avenues that customers can take to create and manage multi-tenant environments with multiple integration points for access to resources on-premises.

Read Androski’s post.

#8: Things to Consider When You Build REST APIs with Amazon API Gateway

by George Mao


This post dives deeper into the things an API architect or developer should consider when building REST APIs with Amazon API Gateway.

Read George’s post.

#7: How to Design Your Serverless Apps for Massive Scale

by George Mao


Serverless is one of the hottest design patterns in the cloud today, allowing you to focus on building and innovating, rather than worrying about the heavy lifting of server and OS operations. In this series of posts, we’ll discuss topics that you should consider when designing your serverless architectures. First, we’ll look at architectural patterns designed to achieve massive scale with serverless.

Read George’s post.

#6: Best Practices for Developing on AWS Lambda

by George Mao

RDS instance: When to VPC enable a Lambda function

One of the benefits of using Lambda is that you don’t have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your AWS Lambda functions. Take advantage of this architecture with the tips in this post.

Read George’s post.

#5: Stream Amazon CloudWatch Logs to a Centralized Account for Audit and Analysis

by David Bailey

Figure 1 - Initial Landing Zone logging account resources

A key component of enterprise multi-account environments is logging. Centralized logging provides a single point of access to all salient logs generated across accounts and regions, and is critical for auditing, security and compliance. While some customers use the built-in ability to push Amazon CloudWatch Logs directly into Amazon Elasticsearch Service for analysis, others would prefer to move all logs into a centralized Amazon Simple Storage Service (Amazon S3) bucket location for access by several custom and third-party tools. In this blog post, David Bailey will show you how to forward existing and any new CloudWatch Logs log groups created in the future to a cross-account centralized logging Amazon S3 bucket.

Read David’s post.

#4: Updates to Serverless Architectural Patterns and Best Practices

by Drew Dennis

Drew wrote this post at about the halfway point between re:Invent 2018 and re:Invent 2019, where he revisited some of the recent serverless announcements we’ve made. These are all complementary to the patterns discussed in the re:Invent architecture track’s Serverless Architectural Patterns and Best Practices session.

Read Drew’s post.

#3: Understanding the Different Ways to Invoke Lambda Functions

by George Mao

Invoking Lambda

In George’s first post of this series (#7 on this list), he talked about general design patterns to enable massive scale with serverless applications. In this post, he’ll review the different ways you can invoke Lambda functions and what you should be aware of with each invocation model.

Read George’s post.

#2: Using API Gateway as a Single Entry Point for Web Applications and API Microservices

by Anandprasanna Gaitonde and Mohit Malik

In this post, Anand and Mohit talk about a reference architecture that allows API Gateway to act as single entry point for external-facing, API-based microservices and web applications across multiple external customers by leveraging a different subdomain for each one.

Read Anand’s and Mohit’s post.

#1: 10 Things Serverless Architects Should Know

by Justin Pirtle

Building on the first three parts of the AWS Lambda scaling and best practices series where you learned how to design serverless apps for massive scale, AWS Lambda’s different invocation models, and best practices for developing with AWS Lambda, Justin invited you to take your serverless knowledge to the next level by reviewing 10 topics to deepen your serverless skills.

Read Justin’s post.

Thank You

Thanks again to all our readers and blog post writers. We look forward to learning and building amazing things together in the coming year.


Coming soon: Updated Lambda states lifecycle for VPC networking

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/coming-soon-updated-lambda-states-lifecycle-for-vpc-networking/

On November 27, we announced that AWS Lambda now includes additional attributes in the function information returned by several Lambda API actions to better communicate the current “state” of your functions when they are being created or updated. In our post “Tracking the state of AWS Lambda functions”, we covered the various states your Lambda function can be in, the conditions that lead to them, and how the Lambda service transitions the function through those states.

Our first feature using the function states lifecycle is a change to the recently announced improved VPC networking for AWS Lambda functions. As stated in the announcement post, Lambda creates the ENIs required for your function to connect to your VPCs, which can take 60–90 seconds to complete. We are updating this operation to explicitly place the function into a Pending state while pre-creating the required elastic network interface resources, and transitioning to an Active state after that process is completed. By doing this, we can use the lifecycle to complete the creation of these resources, and then reduce inconsistent invokes after the create/update has completed.

Most customers experience no impact from this change except for fewer long cold-starts due to network resource creation. As a reminder, any invocations or other API actions that operate on the function will fail during the time before the function is Active. To better assist you in adopting this behavior, we are rolling it out for VPC-configured functions in a phased manner. This post provides further details about the timelines, and about approaches to either test the change before it is 100% live or delay it for your functions using a delay mechanism.

Changes to function create and update

On function create

During creation of new functions configured for VPC, your function remains in the Pending state until all VPC resources are created. You are not able to invoke the function or take any other Lambda API actions against it. After successful completion of the creation of these resources, your function transitions automatically to the Active state and is available for invokes and Lambda API actions. If the network resources fail to create then your function is placed in a Failed state.

On function update

During the update of functions configured for VPC, if there are any modifications to the VPC configuration, the function remains in the Active state, but shows in the InProgress status until all VPC resources are updated. During this time, any invokes go to the previous function code and configuration. After successful completion, the function LastUpdateStatus transitions automatically to Successful and all new invokes use the newly updated code and configuration. If the network resources fail to be created/updated then the LastUpdateStatus shows Failed, but the previous code and configuration remains in the Active state.

It’s important to note that creation or update of VPC resources can take between 60 and 90 seconds to complete.

Change timeframe

As a reminder, all functions today show an Active state only. We are rolling out this change to create resources during the Pending state over a multiple phase period starting with the Begin Testing phase today, December 16, 2019. The phases allow you to update tooling for deploying and managing Lambda functions to account for this change. By the end of the update timeline, all accounts transition to using this new VPC resource create/update Lambda lifecycle.

Update timeline

December 16, 2019 – Begin Testing: You can now begin testing and updating any deployment or management tools you have to account for the upcoming lifecycle change. You can also use this time to update your function configuration to delay the change until the Delayed Update phase.

January 20, 2020 – General Update: All customers without the delayed update configuration begin seeing functions transition as described above under “On function create” and “On function update”.

February 17, 2020 – Delayed Update: The delay mechanism expires and customers now see the new VPC resource lifecycle applied during function create or update.

March 2, 2020 – Update End: All functions now have the new VPC resource lifecycle applied during function create or update.

Opt-in and delayed update configurations

Starting today, we are providing a mechanism for an opt-in, to allow you to update and test your tools and developer workflow processes for this change. We are also providing a mechanism to delay this change until the end of the Delayed Update phase. If you configure your functions for VPC and use the delayed update mechanism after the start of the General Update, your functions continue to experience a delayed first invocation due to VPC resource creation.

This mechanism operates on a function-by-function basis, so you can test and experiment individually without impacting your whole account. Once the General Update phase begins, all functions in an account that do not have the delayed update mechanism in place see the new lifecycle for their functions.

Both mechanisms work by adding a special string to the “Description” parameter of your Lambda functions. This string can be added as a prefix or suffix, or be the entire contents of the field.

To opt in:

aws:states:opt-in

To delay the update:

aws:states:opt-out

NOTE: Delay configuration mechanism has no impact after the Delayed Update phase ends.
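If you manage functions with the AWS CLI or scripts rather than the console, the same opt-in can be applied by updating the description; the function name below is a placeholder, and this example sets the token as the entire contents of the field.

aws lambda update-function-configuration \
    --function-name my-vpc-function \
    --description "aws:states:opt-in"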

Here is how this looks in the console:

  1. I add the opt-in configuration to my function’s Description.

    Opt-in in Description

  2. When I choose Save at the top, I see the update begin. During this time, I am blocked from executing tests, updating my code, and making some configuration changes against the function.

    Function updating

  3. After the update completes, I can once again run tests and other console commands.

    Function update successful

Once the opt-in is set for a function, then updates on that function go through the update flow shown above. If I don’t change my function’s VPC configuration, then updates to my function transition almost instantly to the Successful update status.

With this in place, you can now test your development workflow ahead of the General Update phase. Download the latest CLI (version 1.16.291 or greater) or SDKs in order to see function state and related attribute information.

Conclusion

With functions states, you can have better clarity on how the resources required by your Lambda function are being created. This change does not impact the way that functions are invoked or how your code is executed. While this is a minor change to when resources are created for your Lambda function, the result is even better consistency of performance. Combined with the original announcement of improved VPC networking for Lambda, you experience better consistency for invokes, greatly reduced cold-starts, and fewer network resources created for your functions.

re:Invent 2019: Introducing the Amazon Builders’ Library (Part I)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-i/

Today, I’m going to tell you about a new site we launched at re:Invent, the Amazon Builders’ Library, a collection of living articles covering topics across architecture, software delivery, and operations. You get to peek under the hood of how Amazon architects, releases, and operates the software underpinning Amazon.com and AWS.

Want to know how Amazon.com does what it does? This is for you. In this two-part series (the next one coming December 23), I’ll highlight some of the best architecture articles written by Amazon’s senior technical leaders and engineers.

Avoiding insurmountable queue backlogs

In queueing theory, the behavior of queues when they are short is relatively uninteresting. After all, when a queue is short, everyone is happy. It’s only when the queue is backlogged, when the line to an event goes out the door and around the corner, that people start thinking about throughput and prioritization.

In this article, I discuss strategies we use at Amazon to deal with queue backlog scenarios – design approaches we take to drain queues quickly and to prioritize workloads. Most importantly, I describe how to prevent queue backlogs from building up in the first place. In the first half, I describe scenarios that lead to backlogs, and in the second half, I describe many approaches used at Amazon to avoid backlogs or deal with them gracefully.

Read the full article by David Yanacek – Principal Engineer

Timeouts, retries, and backoff with jitter

Whenever one service or system calls another, failures can happen. These failures can come from a variety of factors. They include servers, networks, load balancers, software, operating systems, or even mistakes from system operators. We design our systems to reduce the probability of failure, but it is impossible to build systems that never fail. So at Amazon, we design our systems to tolerate and reduce the probability of failure, and to avoid magnifying a small percentage of failures into a complete outage. To build resilient systems, we employ three essential tools: timeouts, retries, and backoff.

Read the full article by Marc Brooker, Senior Principal Engineer

Challenges with distributed systems

The moment we added our second server, distributed systems became the way of life at Amazon. When I started at Amazon in 1999, we had so few servers that we could give some of them recognizable names like “fishy” or “online-01”. However, even in 1999, distributed computing was not easy. Then as now, challenges with distributed systems involved latency, scaling, understanding networking APIs, marshalling and unmarshalling data, and the complexity of algorithms such as Paxos. As the systems quickly grew larger and more distributed, what had been theoretical edge cases turned into regular occurrences.

Developing distributed utility computing services, such as reliable long-distance telephone networks, or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems. Independent failures and nondeterminism cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s impossible always to know whether something failed.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Static stability using Availability Zones

At Amazon, the services we build must meet extremely high availability targets. This means that we need to think carefully about the dependencies that our systems take. We design our systems to stay resilient even when those dependencies are impaired. In this article, we’ll define a pattern that we use called static stability to achieve this level of resilience. We’ll show you how we apply this concept to Availability Zones, a key infrastructure building block in AWS and therefore a bedrock dependency on which all of our services are built.

Read the full article by Becky Weiss, Senior Principal Engineer, and Mike Furr, Principal Engineer

Check back in two weeks for more expert articles on the architecture practices behind how Amazon does what it does.

New for AWS Transit Gateway – Build Global Networks and Centralize Monitoring Using Network Manager

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-transit-gateway-build-global-networks-and-centralize-monitoring-using-network-manager/

As your company grows and gets the benefits of a cloud-based infrastructure, your on-premises sites like offices and stores increasingly need high performance private connectivity to AWS and to other sites at a reasonable cost. Growing your network is hard, because traditional branch networks based on leased lines are costly, and they suffer from the same lack of elasticity and agility as traditional data centers.

At the same time, it becomes increasingly complex to manage and monitor a global network that is spread across AWS regions and on-premises sites. You need to stitch together data from these diverse locations. This results in an inconsistent operational experience, increased costs and efforts, and missed insights from the lack of visibility across different technologies.

Today, we want to make it easier to build, manage, and monitor global networks with the following new capabilities for AWS Transit Gateway:

  • Transit Gateway Inter-Region Peering
  • Accelerated Site-to-Site VPN
  • AWS Transit Gateway Network Manager

These new networking capabilities enable you to optimize your network using AWS’s global backbone, and to centrally visualize and monitor your global network. More specifically:

  • Inter-Region Peering and Accelerated VPN improve application performance by leveraging the AWS Global Network. In this way, you can reduce the number of leased lines required to operate your network, optimizing your cost and improving agility. Transit Gateway Inter-Region Peering sends inter-Region traffic privately over AWS’s global network backbone (a rough code sketch of setting up such a peering attachment follows this list). Accelerated VPN uses AWS Global Accelerator to route VPN traffic from remote locations through the closest AWS edge location to improve connection performance.
  • Network Manager reduces the operational complexity of managing a global network across AWS and on-premises. With Network Manager, you set up a global view of your private network simply by registering your Transit Gateways and on-premises resources. Your global network can then be visualized and monitored via a centralized operational dashboard.
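As a rough illustration of the first point, here is a sketch of requesting and accepting a Transit Gateway Inter-Region Peering attachment with boto3. The transit gateway IDs, account ID, and Regions are placeholders, and once the attachment becomes available you still need to add static routes to your transit gateway route tables that point at it.

```python
import boto3

# Requester side: a transit gateway in us-west-2 asks to peer with one in eu-west-1.
ec2_us = boto3.client("ec2", region_name="us-west-2")
response = ec2_us.create_transit_gateway_peering_attachment(
    TransitGatewayId="tgw-0requesterexample",      # placeholder ID
    PeerTransitGatewayId="tgw-0accepterexample",   # placeholder ID
    PeerAccountId="123456789012",                  # placeholder account
    PeerRegion="eu-west-1",
)
attachment_id = response["TransitGatewayPeeringAttachment"]["TransitGatewayAttachmentId"]

# Accepter side: the peer account/Region accepts the pending attachment.
ec2_eu = boto3.client("ec2", region_name="eu-west-1")
ec2_eu.accept_transit_gateway_peering_attachment(
    TransitGatewayAttachmentId=attachment_id
)
```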

These features allow you to optimize connectivity from on-premises sites to AWS and also between on-premises sites, by routing traffic through Transit Gateways and the AWS Global Network, and centrally managing through Network Manager.

Visualizing Your Global Network
In the Network Manager console, which you can reach from the Transit Gateways section of the Amazon Virtual Private Cloud console, you get an overview of your global networks. Each global network includes AWS and on-premises resources. Specifically, it provides a central point of management for your AWS Transit Gateways, your physical devices and sites connected to the Transit Gateways via Site-to-Site VPN Connections, and AWS Direct Connect locations attached to the Transit Gateways.

For example, this is the Geographic view of a global network covering North America and Europe with 5 Transit Gateways in 3 AWS Regions, 80 VPCs, 50 VPNs, 1 Direct Connect location, and 16 on-premises sites with 50 devices:

As I zoom in on the map, I get a description of what these nodes represent, for example whether they are AWS Regions, Direct Connect locations, or branch offices.

I can select any node in the map to get more information. For example, I select the US West (Oregon) AWS Region to see the details of the two Transit Gateways I am using there, including the state of all the VPCs and VPNs handled by the selected Transit Gateway.

Selecting a site, I get a centralized view with the status of the VPN connections, including site metadata such as address, location, and description. For example, here are the details of the Colorado branch offices.

In the Topology panel, I see the logical relationships between all the resources in my network. On the left here is the entire topology of my global network; on the right, the detail of the European part. Connection status is indicated by color in the topology view.

Selecting any node in the topology map displays details specific to the resource type (Transit Gateway, VPC, customer gateway, and so on) including links to the corresponding service in the AWS console to get more information and configure the resource.

Monitoring Your Global Network
Network Manager uses Amazon CloudWatch, which collects raw data and processes it into readable, near real-time metrics for data in/out, packets dropped, and VPN connection status.

These statistics are kept for 15 months, so that you can access historical information and gain a better perspective on how your network is performing. You can also set alarms that watch for certain thresholds, and send notifications or take actions when those thresholds are met.
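For example, a minimal sketch of such an alarm with boto3 follows. It assumes the AWS/TransitGateway CloudWatch namespace and its BytesIn metric; the transit gateway ID and the SNS topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

# Alarm when traffic into the transit gateway exceeds roughly 10 GB
# within a 5-minute period; notifications go to a placeholder SNS topic.
cloudwatch.put_metric_alarm(
    AlarmName="tgw-ireland-high-bytes-in",
    Namespace="AWS/TransitGateway",          # assumed Transit Gateway metric namespace
    MetricName="BytesIn",
    Dimensions=[{"Name": "TransitGateway", "Value": "tgw-0irelandexample"}],
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=10 * 1024 ** 3,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:network-alerts"],
)
```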

For example, these are the last 12 hours of Monitoring for the Transit Gateway in Europe (Ireland).

In the global network view, you have a single point of view of all events affecting your network, simplifying root cause analysis in case of issues. Clicking on any of the messages in the console will take you to a more detailed view in the Events tab.

Your global network events are also delivered by CloudWatch Events. Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams. To process the same events, you can also use the additional capabilities offered by Amazon EventBridge.

Network Manager sends the following types of events:

  • Topology changes, for example when a VPN connection is created for a transit gateway.
  • Routing updates, such as when a route is deleted in a transit gateway route table.
  • Status updates, for example in case a VPN tunnel’s BGP session goes down.
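As a sketch of routing these events to a notification channel, the rule below matches Network Manager events and forwards them to an SNS topic with boto3; the `aws.networkmanager` event source name, the Region, and the topic ARN are assumptions for illustration.

```python
import json

import boto3

events = boto3.client("events", region_name="us-west-2")

# Match any event emitted by Network Manager (topology, routing, status updates).
# The "aws.networkmanager" source and the SNS topic ARN are illustrative.
events.put_rule(
    Name="network-manager-events",
    EventPattern=json.dumps({"source": ["aws.networkmanager"]}),
    State="ENABLED",
)

# Deliver matched events to an existing SNS topic for the operations team.
events.put_targets(
    Rule="network-manager-events",
    Targets=[{
        "Id": "ops-topic",
        "Arn": "arn:aws:sns:us-west-2:123456789012:network-events",
    }],
)
```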

Configuring Your Global Network
To get your on-premises resources included in the visualizations and monitoring described above, you need to provide Network Manager with information about your on-premises devices, sites, and links. You also need to associate devices with the customer gateways they host for VPN connections.
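If you are doing this programmatically rather than in the console, a rough sketch with boto3 might look like the following. All IDs, ARNs, and site details are placeholders, and the us-west-2 endpoint reflects the assumption that the Network Manager API is homed in that Region.

```python
import boto3

nm = boto3.client("networkmanager", region_name="us-west-2")

# 1. Create a global network and register an existing transit gateway with it.
gn = nm.create_global_network(Description="Corporate global network")
gn_id = gn["GlobalNetwork"]["GlobalNetworkId"]
nm.register_transit_gateway(
    GlobalNetworkId=gn_id,
    TransitGatewayArn="arn:aws:ec2:us-west-2:123456789012:transit-gateway/tgw-0example",
)

# 2. Describe an on-premises site and a device located there.
site = nm.create_site(
    GlobalNetworkId=gn_id,
    Description="Colorado branch office",
    Location={"Address": "Denver, CO", "Latitude": "39.74", "Longitude": "-104.99"},
)
device = nm.create_device(
    GlobalNetworkId=gn_id,
    SiteId=site["Site"]["SiteId"],
    Description="Branch edge router",
)

# 3. Associate the device with the customer gateway it hosts for its VPN connection.
nm.associate_customer_gateway(
    GlobalNetworkId=gn_id,
    DeviceId=device["Device"]["DeviceId"],
    CustomerGatewayArn="arn:aws:ec2:us-west-2:123456789012:customer-gateway/cgw-0example",
)
```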

Our software-defined wide area network (SD-WAN) partners, such as Cisco, Aruba, Silver Peak, and Aviatrix, have configured their SD-WAN devices to connect with Transit Gateway Network Manager in only a few clicks. Their SD-WANs also define the on-premises devices, sites, and links automatically in Network Manager. SD-WAN integrations enable you to include your on-premises network in the Network Manager global dashboard view without requiring you to input the information manually.

Available Now
AWS Transit Gateway Network Manager is a global service available for Transit Gateways in the following regions: US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Europe (London), Europe (Paris), Asia Pacific (Singapore), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Sydney), Asia Pacific (Mumbai), Canada (Central), South America (São Paulo).

There is no additional cost for using Network Manager. You pay for the network resources you use, like Transit Gateways, VPNs, and so on. Here you can find more information on pricing for VPN and Transit Gateway.

You can learn more in the documentation for Network Manager, Inter-Region Peering, and Accelerated VPN.

With these new features, you can take advantage of the performance of our AWS Global Network, and simplify network management and monitoring across your AWS and on-premises resources.

Danilo