We’re sharing an update to the Logical Separation on AWS: Moving Beyond Physical Isolation in the Era of Cloud Computing whitepaper to help customers take advantage of the security and innovation benefits of logical separation in the cloud. This paper discusses a multi-pronged approach (identity management, network security, serverless and container services, host and instance features, logging, and encryption) to building logical security mechanisms that meet and often exceed the security results of physical separation of resources and other on-premises security approaches. Public sector and commercial organizations worldwide can use these mechanisms to migrate sensitive workloads to the cloud more confidently, without the need for physically dedicated infrastructure.
Amazon Web Services (AWS) addresses the concerns driving physical separation requirements through the logical security capabilities we provide customers and the security controls we have in place to protect customer data. The strength of that isolation combined with the automation and flexibility that the isolation provides is on par with or better than the security controls seen in traditional physically separated environments.
The paper also highlights a U.S. Department of Defense (DoD) use case demonstrating how the AWS logical separation capabilities met the intent behind a DoD requirement for dedicated, physically isolated infrastructure for its most sensitive unclassified workloads.
If you have questions or want to learn more, contact your account executive or contact AWS Support. If you have feedback about this post, submit comments in the Comments section below.
Connecting on-premises data centers to AWS using AWS Site-to-Site VPN to support distributed applications is a common practice. With business expansion and acquisitions, your company’s on-premises IT footprint may grow into various geographies, with these multiple sites comprising on-premises data centers and co-location facilities. AWS Site-to-Site VPN supports throughput up to 1.25 Gbps, although the actual throughput can be lower for VPN connections that originate in a different geolocation from the AWS Region. This is because the internet path between them has to traverse multiple networks. For globally distributed applications that interact with other applications and components located on-premises, these VPN connections can impact performance and user experience.
This blog post provides an architectural approach to improving the performance of such globally distributed applications. We’ll explain an architecture that utilizes AWS Global Accelerator to create highly performant connectivity in terms of latency and bandwidth for VPN connections that originate from distant geographies around the world. Using this architecture, you can optimize your inter-application traffic between remote sites and your AWS environment, which can lead to better application performance and customer experience.
Distributed application architecture in a hybrid cloud using VPN
The above figure shows a customer’s existing IT footprint spread across several locations in the U.S., Europe, and Asia Pacific (APAC), while the AWS environment is set up in the us-east-1 Region. In this use case, a business application hosted in AWS has the following dependencies on remote data centers and is also accessed by remote corporate users:
Communication with an application hosted in a data center in the EU region
Communication with a data center in the US where corporate users access the AWS application over VPN
Integration with a local API-based service in the APAC region
Site-to-Site VPN from a remote site to an AWS environment provides secure connectivity for this inter-application traffic, as well as traffic from users to the application. Sites closer to the us-east-1 Region may see reasonably good network performance and latency. However, sites that are geographically remote may experience higher latencies and less reliable network performance due to the number of network hops spanning multiple networks and possible congestion. In addition, varying network paths through the internet backbone can also lead to increased latencies. This impacts the overall application performance, which can lead to an unsatisfactory customer experience.
Optimizing application performance with Accelerated VPN connections
The above diagram shows the business application hosted in a multi-VPC architecture on AWS comprising a production VPC and a sandbox VPC, typical of customer environments. These VPCs are interconnected using AWS Transit Gateway, and the VPN connections from the three remote sites terminate at AWS Transit Gateway as VPN attachments.
To improve the user experience for the application, VPN attachments to AWS Transit Gateway are enabled with a feature called Accelerated Site-to-Site VPN. With this feature enabled, AWS Global Accelerator routes traffic from an on-premises network to the AWS Edge location closest to your customer gateway. It uses the AWS global network to route traffic through the AWS global backbone from the closest Edge location, thereby ensuring the traffic remains on the optimum network path. This translates into faster response times, increased throughput, and a better user experience, as described in this blog post about better performance for internet traffic with AWS Global Accelerator.
The Accelerated Site-to-Site VPN feature is enabled by creating accelerators that allow you to associate two Anycast static IPs from the Edge network. (Anycast is a network addressing and routing method that attributes a single IP address to multiple endpoints in a network.) These static IP addresses act as a fixed entry point to the VPN tunnel endpoints. This improves the availability and performance of your applications that need to interface with remote sites for their functionality. The above diagram shows three Edge locations, each one corresponding to the accelerators for each of the VPN connections. Since AWS Transit Gateway allows connectivity to multiple VPCs in your AWS environment, the benefit of improved network performance is extended to applications and workloads in VPCs connected to the transit gateway. This architecture scales as business demands and workloads continue to grow on AWS.
Configuring your VPN connections for acceleration
To enable acceleration on your existing VPN, consider the following:
If your existing VPN connections terminate on a virtual private gateway, you will need to create an AWS Transit Gateway and create VPC attachments from the application VPC to the Transit Gateway.
Existing VPN connections on Transit Gateway can’t be modified to take advantage of the acceleration, so you will need to tear down existing connections and set up new ones in the AWS console as shown below. Then, configure your customer gateway device to use the new Site-to-Site VPN connection and delete the old Site-to-Site VPN connection.
Accelerated VPN connections use two VPN tunnels per connection like a regular Site-to-Site VPN connection. For accelerated VPN connections, each tunnel uses a separate accelerator and a separate pool of IP addresses for the tunnel endpoint IP addresses. The IP addresses for the two VPN tunnels are selected from two separate network zones. This ensures high availability for your VPN connections and can handle any network disruptions within a particular zone. If an Edge location fails, the customer gateway can reinitiate the VPN tunnel to the same IP address and get connected to the nearest available Edge location, making it resilient. These are the outside IP addresses to which the customer gateway will connect, as shown below:
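As a rough CLI counterpart to the console steps, the sketch below creates a new accelerated Site-to-Site VPN connection on a transit gateway and then lists the outside IP addresses of its tunnels. The function names and the IDs in the usage comment are hypothetical placeholders; `EnableAcceleration` is the connection option that turns acceleration on at creation time.

```shell
#!/bin/sh
# Sketch only: function names and the IDs in the usage example are placeholders.

# Create a new Site-to-Site VPN connection on a transit gateway with
# acceleration enabled, and print the new VPN connection ID.
create_accelerated_vpn() {
    cgw_id=$1   # customer gateway ID
    tgw_id=$2   # transit gateway ID
    aws ec2 create-vpn-connection \
        --type ipsec.1 \
        --customer-gateway-id "$cgw_id" \
        --transit-gateway-id "$tgw_id" \
        --options EnableAcceleration=true \
        --query "VpnConnection.VpnConnectionId" \
        --output text
}

# Print the outside IP addresses of the tunnels of a VPN connection.
# These are the addresses the customer gateway device connects to.
list_tunnel_outside_ips() {
    vpn_id=$1
    aws ec2 describe-vpn-connections \
        --vpn-connection-ids "$vpn_id" \
        --query "VpnConnections[].Options.TunnelOptions[].OutsideIpAddress" \
        --output text
}

# Usage (requires AWS credentials and real IDs):
#   VPN_ID=$(create_accelerated_vpn cgw-0123456789abcdef0 tgw-0123456789abcdef0)
#   list_tunnel_outside_ips "$VPN_ID"
```

Because acceleration cannot be switched on for an existing connection, the sketch always creates a new one; the old connection still has to be deleted separately once the customer gateway device has been reconfigured.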
Considerations
Accelerated VPN functionality provides benefits to architectures involved in communicating with remote data centers and on-premises locations, but there are some considerations to keep in mind:
Additional charges are involved due to the use of Global Accelerator when acceleration is enabled. Performance testing should be done to evaluate the benefit it provides to your application.
Don’t enable accelerated VPN when the customer gateway for your VPN connection is also in an AWS environment since that traffic already traverses through the AWS backbone.
Applications that require a consistent network performance and a dedicated private connection should consider moving to AWS Direct Connect.
You can run the Global Accelerator Speed Comparison tool from your remote data centers against the AWS Region where your application resides to compare Global Accelerator download speeds with direct internet downloads. Note that the tool uses TCP while the VPN uses UDP, so it is not a performance test of a VPN connection. However, it will give you a reasonable indication of the performance improvement for your VPN.
Summary
As you start adopting the cloud and migrating workloads to the AWS platform, you’ll realize the inherent benefits of scalability, high availability, and security to create fault-tolerant and production-grade applications. During this transition, you will have hybrid cloud environments utilizing VPN connectivity. Accelerated Site-to-Site VPN connections can provide you with performance improvements for your application traffic. This is a good alternative until your traffic demands and architecture considerations mandate the use of a dedicated network path using AWS Direct Connect from your remote locations to AWS.
Many cloud-native organizations building modern applications have adopted a microservice architecture because of its flexibility, performance, and scalability. Even customers with legacy and monolithic application stacks are embarking on an application modernization journey and opting for this type of architecture. A microservice architecture allows applications to be composed of several loosely coupled, discrete services that are independently deployable, scalable, and maintainable. These applications can comprise a large number of microservices, which often span multiple business units within an organization. These customers typically have a multi-account AWS environment with each AWS account belonging to an individual business unit. Their microservice implementations reside in the virtual private clouds (VPCs) of their respective AWS accounts. You can set up a multi-account AWS environment incorporating best practices using AWS Landing Zone or AWS Control Tower.
This type of multi-account, multi-VPC architecture provides a good boundary and isolation for individual microservices and achieves a highly available, scalable, and secure architecture. However, for microservices that require a high degree of interconnectivity and are within the same trust boundaries, you can use other AWS capabilities to optimize cost and network management complexity.
This blog post presents a cost-effective approach that requires less VPC management while still using separate accounts for billing and access control. This approach does not sacrifice scalability, high availability, fault tolerance, or security. To achieve a similar microservice architecture, you can share a VPC across AWS accounts using AWS Resource Access Manager (AWS RAM) and Network Load Balancer (NLB) support in a shared Amazon Virtual Private Cloud (VPC). This allows multiple microservices to coexist in the same VPC, even though they are developed by different business units.
Microservices architecture in a multi-VPC approach
In this architecture, microservices deployed across multiple VPCs use privately exposed endpoints for better security posture instead of going over the internet. This requires the customers to enable inter-VPC communication using the various networking capabilities of AWS as shown below:
In the above reference architecture, we created a VPC in Account A, which is hosting the front end of the application across a fleet of Amazon Elastic Compute Cloud (Amazon EC2) instances using an AWS Auto Scaling group. For simplicity, we’ve illustrated a single public and private subnet for the application front end. In reality, this spans across multiple subnets across multiple Availability Zones (AZ) to support a highly available and fault-tolerant configuration.
To ensure security, the application must communicate privately with microservices mS1 and mS2 deployed in the VPCs of Account B and Account C respectively. For high availability, these microservices are also implemented using a fleet of Amazon EC2 instances with the Auto Scaling group spanning multiple subnets and Availability Zones. For high-performance load balancing, they are fronted by a Network Load Balancer.
While this architecture shows an implementation using Amazon EC2, it can also use containerized services deployed using Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). These microservices may have interdependencies and invoke each other’s APIs to service the requests of the application layer. This application-to-mS and mS-to-mS communication can be achieved using the following possible connectivity options:
When only a few VPC interconnections are required, Amazon VPC peering and AWS PrivateLink may be viable options. For a higher number of VPC interconnections, we recommend AWS Transit Gateway for better manageability of connections and routing through a centralized resource. However, depending on the amount of traffic, this can introduce significant costs to your architecture.
Alternative approach to microservice architecture using Network Load Balancers in a shared VPC
The above architecture pattern allows your individual microservice teams to continue to own their AWS resources that host their microservice implementation. But they can deploy them in a shared VPC owned by the central account, eliminating the need for inter-VPC network connections. You can share Amazon VPCs to use the implicit routing within a VPC for applications that require a high degree of interconnectivity and are within the same trust boundaries.
This architecture uses AWS RAM, which allows you to share the VPC Subnets from AWS Account A to participating AWS accounts within your AWS organization. When the subnets are shared, participant AWS accounts (Account B and Account C) can see the shared subnets in their own environment. They can then deploy their Amazon EC2 instances in those subnets. This is depicted in the diagram where the visibility of the shared subnets (SS1 and SS2) is extended to the participating accounts (Account B and Account C).
You can also deploy the NLB in these shared subnets. Then, each participant account owns all the AWS resources for their microservice stack, but it’s deployed in the VPC of Account A.
This allows your individual microservice teams to maintain control over load balancer configurations and Auto Scaling policies based on their specific microservices’ needs. At the same time, using AWS RAM, they are able to effectively use the existing VPC environment of Account A.
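A minimal sketch of the sharing step, run from owner Account A, might look like the following. The function name, the share name, the subnet ARN, and the account ID in the usage comment are all hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: run with credentials for the VPC owner account (Account A).
# The function name and every identifier in the usage example are placeholders.

# Share one subnet with a participant account through AWS RAM and print
# the ARN of the new resource share.
share_subnet() {
    share_name=$1            # name for the resource share
    subnet_arn=$2            # ARN of the subnet to share (for example SS1)
    participant_account=$3   # 12-digit ID of the participant account
    aws ram create-resource-share \
        --name "$share_name" \
        --resource-arns "$subnet_arn" \
        --principals "$participant_account" \
        --query "resourceShare.resourceShareArn" \
        --output text
}

# Usage:
#   share_subnet ss1-share \
#       arn:aws:ec2:us-east-1:111111111111:subnet/subnet-0123456789abcdef0 \
#       222222222222
```

Once the share is in place, the participant account can reference the shared subnet ID when it launches its own resources.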
This architecture presents several benefits over the multi-VPC architecture discussed earlier:
You can deploy the entire application, including the individual microservices, into a single shared VPC while still allowing individual microservice teams control over their AWS resources deployed in that VPC.
Since the entire architecture now resides in a single VPC, it doesn’t require other networking connectivity features. It can rely on intra-VPC traffic for communication between the application (API) layer and microservices.
This leads to a reduction in the cost of the architecture. While the AWS RAM functionality is free of charge, this also reduces the data transfer and per-connection costs incurred by other options such as VPC peering, AWS PrivateLink, and AWS Transit Gateway.
This maintains the isolation across the individual microservices and the application layer. Participants can’t view, modify, or delete resources that belong to others or the VPC owner.
This also leads to effective utilization of your VPC CIDR block resources.
Since multiple subnets belonging to different Availability Zones are shared, the application and individual microservices continue to take advantage of scalability, availability, and fault tolerance.
The following illustration shows how you can configure AWS RAM to set up the VPC subnet resource shares between owner Account A and participating Account B. The example below shows the sharing of private subnet SS1 using this method:
Once this subnet is shared, the participating Account B can launch the Network Load Balancer for its microservice mS1 in the shared VPC subnet as shown below:
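From the CLI, the same step could be sketched as follows, run with credentials for participant Account B. The function name, the load balancer name, and the subnet ID are hypothetical placeholders; the internal scheme is an assumption that matches the privately exposed endpoints described earlier.

```shell
#!/bin/sh
# Sketch only: run with credentials for the participant account (Account B).
# The function name and the identifiers in the usage example are placeholders.

# Create an internal Network Load Balancer in one or more shared subnets
# and print its ARN.
create_internal_nlb() {
    nlb_name=$1
    shift   # the remaining arguments are the shared subnet IDs
    aws elbv2 create-load-balancer \
        --name "$nlb_name" \
        --type network \
        --scheme internal \
        --subnets "$@" \
        --query "LoadBalancers[0].LoadBalancerArn" \
        --output text
}

# Usage:
#   create_internal_nlb ms1-nlb subnet-0123456789abcdef0
```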
While this architecture has many advantages, there are important considerations:
This style of architecture is suitable when you are certain that the number of microservices is small enough to coexist in a single VPC without depleting the CIDR block of the shared subnets of the VPC.
If the traffic between these microservices is insignificant, then the cost benefit of this architecture over other options may not be substantial. This is due to the effect of traffic flow on data transfer cost.
Conclusion
AWS Cloud provides several options to build a microservices architecture. It is important to look at the characteristics of your application to determine which architectural choices to opt for. AWS RAM and the ability to deploy AWS resources (including Network Load Balancers) in a shared VPC help you eliminate inter-VPC traffic and associated networking costs, without sacrificing high availability, scalability, fault tolerance, or security for your application.
When I was delivering the Architecting on AWS class, customers often asked me how to configure an Amazon Virtual Private Cloud to enforce the same network security policies in the cloud as they have on-premises. For example, to scan all ingress traffic with an Intrusion Detection System (IDS) appliance or to use the same firewall in the cloud as on-premises. Until today, the only answer I could provide was to route all traffic back from their VPC to an on-premises appliance or firewall in order to inspect the traffic with their usual networking gear before routing it back to the cloud. This is obviously not an ideal configuration because it adds latency and complexity.
Today, we announce new VPC networking routing primitives that allow you to route all incoming and outgoing traffic to/from an Internet Gateway (IGW) or Virtual Private Gateway (VGW) to a specific Amazon Elastic Compute Cloud (EC2) instance’s Elastic Network Interface. It means you can now configure your Virtual Private Cloud to send all traffic to an EC2 instance before the traffic reaches your business workloads. The instance typically runs network security tools to inspect or to block suspicious network traffic (such as IDS/IPS or Firewall) or to perform any other network traffic inspection before relaying the traffic to other EC2 instances.
How Does it Work?
To learn how it works, I wrote this CDK script to create a VPC with two public subnets: one subnet for the appliance and one subnet for a business application. The script launches two EC2 instances with public IP addresses, one in each subnet. The script creates the architecture below:
This is a regular VPC: the subnets have routing tables that point to the Internet Gateway, and the traffic flows in and out as expected. The application instance hosts a static website that is accessible from any browser. You can retrieve the application public DNS name from the EC2 Console (for your convenience, I also included the CLI version in the comments of the CDK script).
Configure Routing
To configure routing, you need to know the VPC ID, the ENI ID of the ENI attached to the appliance instance, and the Internet Gateway ID. Assuming you created the infrastructure using the CDK script I provided, here are the commands I use to find these three IDs (be sure to adjust to the AWS Region you use):
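One way to look up the three IDs is sketched below. It assumes the appliance instance carries the Name tag `appliance` (mirroring the `application` tag used later in this post), and it wraps the lookups in a hypothetical `find_ids` function so that nothing runs until you call it. The default Region is only an example.

```shell
#!/bin/sh
# Sketch only: the Name tag value, the default Region, and the function
# name are assumptions for illustration.
AWS_REGION=${AWS_REGION:-us-west-2}           # adjust to the Region you use
APPLIANCE_NAME=${APPLIANCE_NAME:-appliance}   # assumed Name tag of the appliance

# Set VPC_ID, ENI_ID and IGW_ID for the appliance instance and its VPC.
find_ids() {
    # VPC that hosts the appliance instance
    VPC_ID=$(aws ec2 describe-instances \
        --region "$AWS_REGION" \
        --filters "Name=tag:Name,Values=$APPLIANCE_NAME" \
                  "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].VpcId" \
        --output text)

    # primary network interface of the appliance instance
    ENI_ID=$(aws ec2 describe-instances \
        --region "$AWS_REGION" \
        --filters "Name=tag:Name,Values=$APPLIANCE_NAME" \
                  "Name=instance-state-name,Values=running" \
        --query "Reservations[].Instances[].NetworkInterfaces[0].NetworkInterfaceId" \
        --output text)

    # Internet Gateway attached to that VPC
    IGW_ID=$(aws ec2 describe-internet-gateways \
        --region "$AWS_REGION" \
        --filters "Name=attachment.vpc-id,Values=$VPC_ID" \
        --query "InternetGateways[].InternetGatewayId" \
        --output text)
}

# Usage (requires AWS credentials):
#   find_ids && echo "$VPC_ID $ENI_ID $IGW_ID"
```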
To route all incoming traffic through my appliance, I create a routing table for the Internet Gateway and I attach a rule to direct all traffic to the EC2 instance Elastic Network Interface (ENI):
# create a new routing table for the Internet Gateway
ROUTE_TABLE_ID=$(aws ec2 create-route-table \
    --region "$AWS_REGION" \
    --vpc-id "$VPC_ID" \
    --query "RouteTable.RouteTableId" \
    --output text)
# create a route for 10.0.1.0/24 pointing to the appliance ENI
aws ec2 create-route \
    --region "$AWS_REGION" \
    --route-table-id "$ROUTE_TABLE_ID" \
    --destination-cidr-block 10.0.1.0/24 \
    --network-interface-id "$ENI_ID"
# associate the routing table to the Internet Gateway
aws ec2 associate-route-table \
    --region "$AWS_REGION" \
    --route-table-id "$ROUTE_TABLE_ID" \
    --gateway-id "$IGW_ID"
Alternatively, I can use the VPC Console under the new Edge Associations tab.
To route all application outgoing traffic through the appliance, I replace the default route for the application subnet to point to the appliance’s ENI:
# find the subnet of the application instance
SUBNET_ID=$(aws ec2 describe-instances \
    --region "$AWS_REGION" \
    --query "Reservations[].Instances[] | [?Tags[?Key=='Name' && Value=='application']].NetworkInterfaces[].SubnetId" \
    --output text)
# find the routing table associated with that subnet
ROUTING_TABLE=$(aws ec2 describe-route-tables \
    --region "$AWS_REGION" \
    --query "RouteTables[?VpcId=='${VPC_ID}'] | [?Associations[?SubnetId=='${SUBNET_ID}']].RouteTableId" \
    --output text)
# delete the existing default route (the one pointing to the internet gateway)
aws ec2 delete-route \
    --region "$AWS_REGION" \
    --route-table-id "$ROUTING_TABLE" \
    --destination-cidr-block 0.0.0.0/0
# create a default route pointing to the appliance's ENI
aws ec2 create-route \
    --region "$AWS_REGION" \
    --route-table-id "$ROUTING_TABLE" \
    --destination-cidr-block 0.0.0.0/0 \
    --network-interface-id "$ENI_ID"
# associate the routing table with the application subnet
aws ec2 associate-route-table \
    --region "$AWS_REGION" \
    --route-table-id "$ROUTING_TABLE" \
    --subnet-id "$SUBNET_ID"
Alternatively, I can use the VPC Console. Within the correct routing table, I select the Routes tab and click Edit routes to replace the default route (the one pointing to 0.0.0.0/0) to target the appliance’s ENI.
Now I have the routing configuration in place. The new routing looks like:
Configure the Appliance Instance
Finally, I configure the appliance instance to forward all traffic it receives. Your software appliance usually does that for you, so no extra step is required when you use AWS Marketplace appliances. When using a plain Linux instance, two extra steps are required:
1. Connect to the EC2 appliance instance and configure IP traffic forwarding in the kernel:
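A minimal sketch of both steps follows. It assumes the second step is disabling the source/destination check on the appliance instance, which is what the verification paragraph further down refers to. The function names and the instance ID in the usage comment are hypothetical; the first function is meant to run on the appliance itself, the second wherever the AWS CLI is configured.

```shell
#!/bin/sh
# Sketch only: function names and the instance ID are placeholders.
AWS_REGION=${AWS_REGION:-us-west-2}   # adjust to the Region you use

# Step 1 (run on the appliance instance): enable IP forwarding in the
# kernel. The setting does not persist across reboots.
enable_ip_forwarding() {
    sudo sysctl -w net.ipv4.ip_forward=1
    sudo sysctl -w net.ipv6.conf.all.forwarding=1
}

# Step 2 (run where the AWS CLI is configured): let the instance accept
# traffic whose destination is not the instance itself, by disabling the
# source/destination check.
disable_source_dest_check() {
    instance_id=$1
    aws ec2 modify-instance-attribute \
        --region "$AWS_REGION" \
        --instance-id "$instance_id" \
        --no-source-dest-check
}

# Usage:
#   enable_ip_forwarding                           # on the appliance
#   disable_source_dest_check i-0123456789abcdef0  # from your workstation
```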
Now, the appliance is ready to forward traffic to the other EC2 instances. You can test this by pointing your browser (or using `cURL`) to the application instance.
To verify the traffic is really flowing through the appliance, you can enable source/destination check on the instance again (use the --source-dest-check parameter with the modify-instance-attribute CLI command above). The traffic is blocked when Source/Destination check is enabled.
Cleanup
Should you use the CDK script I provided for this article, be sure to run cdk destroy when finished. This ensures you are not billed for the two EC2 instances I use for this demo. As I modified routing tables behind the back of AWS CloudFormation, I need to manually delete the routing tables, the subnet, and the VPC. The easiest way is to navigate to the VPC Console, select the VPC, and click Actions => Delete VPC. The console deletes all components in the correct order. You might need to wait 5-10 minutes after the end of cdk destroy before the console is able to delete the VPC.
From our Partners
During the beta test of these new routing capabilities, we granted early access to a collection of AWS partners. They provided us with tons of helpful feedback. Here are some of the blog posts that they wrote in order to share their experiences (I am updating this article with links as they are published):
128 Technology
Aviatrix
Checkpoint
Cisco
Citrix
FireEye
Fortinet
HashiCorp
IBM Security
Lastline
Netscout
Palo Alto Networks
ShieldX Networks
Sophos
Trend Micro
Valtix
Vectra AI
Versa Networks
Availability
There is no additional cost to use Virtual Private Cloud ingress routing. It is available in all Regions (including AWS GovCloud (US-West)) and you can start using it today.
You can learn more about gateway route tables in the updated VPC documentation.
What are the appliances you are going to use with this new VPC routing capability?
Amazon Web Services (AWS) serves more than a million private and public sector organizations all over the world from its extensive and expanding global infrastructure.
As in other countries, organizations all around New Zealand are using AWS to change the way they operate. For example, Xero, a Wellington-based online accountancy software vendor, now serves customers in more than 100 countries, while the Department of Conservation provides its end users with virtual desktops running in Amazon WorkSpaces.
New Zealand doesn’t currently have a dedicated AWS Region. Geographically, the closest is Asia Pacific (Sydney), which is 2,000 kilometers (km) away, across a deep sea. While customers rely on AWS for business-critical workloads, they are well-served by New Zealand’s international connectivity.
To connect to Amazon’s network, our New Zealand customers have a range of options:
Public internet endpoints
Managed or software Virtual Private Networks (VPN)
AWS Direct Connect (DX)
All rely on the extensive internet infrastructure connecting New Zealand to the world.
International Connectivity
The vast majority of internet traffic is carried over physical cables, while the percentage of traffic moving over satellite or wireless links is small by comparison.
Historically, cables were funded and managed by consortia of telecommunication providers. More recently, large infrastructure and service providers like AWS have contributed to or are building their own cable networks.
There are currently about 400 submarine cables in service globally. Modern submarine cables are fiber-optic, run for thousands of kilometers, and are protected by steel strands, plastic sheathing, copper, and a chemical water barrier. Over that distance, the signal can weaken—or attenuate—so signal repeaters are installed approximately every 50km to mitigate attenuation. Repeaters are powered by a charge running over the copper sheathing in the cable.
An example of submarine cable composition. Source: Wikimedia Commons
For most of their run, these cables are about as thick as a standard garden hose. They are thicker, however, closer to shore and in areas where there’s a greater risk of damage by fishing nets, boat anchors, etc.
Cables can—and do—break, but redundancy is built into the network. According to Telegeography, there are 100 submarine cable faults globally every year. However, most faults don’t impact users meaningfully.
Southern Cross B: Takapuna (Auckland, New Zealand) -> Spencer Beach (Hawaii, USA) – 1.2 Tbps
A map of major submarine cables connecting to New Zealand. Source: submarinecablemap.com
The four cables combined currently deliver 66 Tbps of available capacity. The Southern Cross NEXT cable is due to come online in 2020, which will add another 72 Tbps. These are, of course, potential capacities; it’s likely the “lit” capacity—the proportion of the cables’ overall capacity that is actually in use—is much lower.
Connecting to AWS from New Zealand
While understanding the physical infrastructure is important, in practice these details are not shared with customers. Connectivity options are instead evaluated on the basis of partner and AWS offerings, which include connectivity.
Customers connect to AWS in three main ways: over public endpoints, via site-to-site VPNs, and via Direct Connect (DX), all typically provided by partners.
Public Internet Endpoints
Customers can connect to public endpoints for AWS services over the public internet. Some services, like Amazon CloudFront, Amazon API Gateway, and Amazon WorkSpaces are generally used in this way.
Many organizations use a VPN to connect to AWS. It’s the simplest and lowest cost entry point to expose resources deployed in private ranges in an Amazon VPC. Amazon VPC allows customers to provision a logically isolated network segment, with fine-grained control of IP ranges, filtering rules, and routing.
AWS offers a managed site-to-site VPN service, which creates secure, redundant Internet Protocol Security (IPSec) VPNs, and also handles maintenance and high-availability while integrating with Amazon CloudWatch for robust monitoring.
If using an AWS managed VPN, the AWS endpoints have publicly routable IPs. They can be connected to over the public internet or via a Public Virtual Interface over DX (outlined below).
Customers can also deploy VPN appliances onto Amazon Elastic Compute Cloud (EC2) instances running in their VPC. These may be self-managed or provided by AWS Marketplace sellers.
AWS also offers AWS Client VPN, for direct user access to AWS resources.
AWS Direct Connect
While connectivity over the internet is secure and flexible, it has one major disadvantage: it’s unpredictable. By design, traffic traversing the internet can take any path to reach its destination. Most of the time it works but occasionally routing conditions may reduce capacity or increase latency.
DX connections are either 1 or 10 gigabits per second (Gbps). This capacity is dedicated to the customer; it isn’t shared, as other network users are never routed over the connection. This means customers can rely on consistent latency and bandwidth. The DX per-gigabyte data transfer cost is lower than that of other egress mechanisms. For customers transferring large volumes of data, DX may be more cost effective than other means of connectivity.
Customers may publish their own 802.1Q Virtual Local Area Network (VLAN) tags across the DX, and advertise routes via Border Gateway Protocol (BGP). A dedicated connection supports up to 50 private or public virtual interfaces. New Zealand does not have a physical point-of-presence for DX, so users must procure connectivity to our Sydney Region. Many AWS Partner Network (APN) members in New Zealand offer this connectivity.
For customers who don’t want or need to manage VLANs to AWS, or who prefer 1 Gbps or smaller links, APN partners offer hosted connections or hosted virtual interfaces. For more detail, please review our AWS Direct Connect Partners page.
Performance
There are physical limits to latency dictated by the speed of light and the medium through which optical signals travel. Southern Cross publishes latency statistics, and it sees one-way latency of approximately 11 milliseconds (ms) over the 2,276 km Alexandria to Whenuapai link. Doubling that gives a round-trip time of about 22 ms.
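To see where a figure of that size comes from, here is a back-of-the-envelope sketch of propagation delay over fiber. It assumes light travels through the fiber at the vacuum speed of light divided by a refractive index of about 1.47, which is a typical textbook value and not a number published for this cable. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: the refractive index of 1.47 is an assumed typical value
# for optical fiber, not a figure from the cable operator.

# One-way propagation delay in milliseconds for a fiber path of the given
# length in kilometers.
fiber_latency_ms() {
    awk -v km="$1" 'BEGIN {
        c = 299792.458   # speed of light in vacuum, km/s
        n = 1.47         # assumed refractive index of the fiber
        printf "%.1f\n", km / (c / n) * 1000
    }'
}

one_way=$(fiber_latency_ms 2276)
round_trip=$(awk -v ms="$one_way" 'BEGIN { printf "%.1f\n", ms * 2 }')
echo "one-way: ${one_way} ms, round trip: ${round_trip} ms"
```

Under that assumption the estimate lands close to the published one-way figure, which suggests that most of the latency on this link is simply distance.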
In practice, we see customers achieving round-trip times from user workstations to Sydney in approximately 30-50 ms, assuming fair-weather internet conditions or DX links. Latency in Auckland (the largest city) tends to be on the lower end of that spectrum, while the rest of the country tends towards the higher end.
Bandwidth constraints are more often dictated by client hardware, but AWS and our partners offer up to 10 Gbps links, or smaller as required. For customers that require more than 10 Gbps over a single link, AWS supports Link Aggregation Groups (LAG).
As outlined above, there are a range of ways for customers to adopt AWS via secure, reliable, and performant networks. To discuss your use case, please contact an AWS Solutions Architect.
Since its inception, the Amazon Virtual Private Cloud (VPC) has acted as the embodiment of security and privacy for customers who are looking to run their applications in a controlled, private, secure, and isolated environment.
This logically isolated space has evolved, and in its evolution has increased the avenues that customers can take to create and manage multi-tenant environments with multiple integration points for access to resources on-premises.
This blog is a two-part series that begins with a look at the Amazon VPC as a single unit of networking in the AWS Cloud but eventually takes you to a world in which simplified architectures for establishing a global network of VPCs are possible.
From One VPC: Single Unit of Networking
To be successful with the AWS Virtual Private Cloud you first have to define success for today and what success might look like as your organization’s adoption of the AWS cloud increases and matures. In essence, your VPCs should be designed to satisfy the needs of your applications today and must be scalable to accommodate future needs.
Classless Inter-Domain Routing (CIDR) notation is used to denote the size of your VPC. AWS allows you to specify a CIDR block between /16 and /28. The largest, /16, provides you with 65,536 IP addresses and the smallest allowed CIDR block, /28, provides you with 16 IP addresses. Note that the first four IP addresses and the last IP address in each subnet CIDR block are not available for you to use, and cannot be assigned to an instance.
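The sizing arithmetic above can be checked with Python's standard ipaddress module. This is a small sketch: the helper name and the 10.0.0.0 example ranges are illustrative choices, and the five reserved addresses per subnet come straight from the text.

```python
import ipaddress

RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def subnet_capacity(cidr: str) -> tuple[int, int]:
    """Return (total addresses, addresses assignable to instances)."""
    network = ipaddress.ip_network(cidr)
    total = network.num_addresses
    return total, total - RESERVED_PER_SUBNET


for cidr in ("10.0.0.0/16", "10.0.0.0/28"):
    total, usable = subnet_capacity(cidr)
    print(f"{cidr}: {total} addresses, {usable} usable")
```

A /28 subnet therefore leaves only 11 assignable addresses, which is worth keeping in mind when sizing small subnets.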
AWS VPC supports both IPv4 and IPv6. It is required that you specify an IPv4 CIDR range when creating a VPC. Specifying an IPv6 range is optional.
Customers can specify ANY IPv4 address space for their VPC. This includes but is not limited to RFC 1918 addresses.
After creating your VPC, you divide it into subnets. In an AWS VPC, subnets are not isolation boundaries around your application. Rather, they are containers for routing policies.
Isolation is achieved by attaching an AWS Security Group (SG) to the EC2 instances that host your application. SGs are stateful firewalls, meaning that connections are tracked to ensure return traffic is allowed. They control inbound and outbound access to the elastic network interfaces that are attached to an EC2 instance. These should be tightly configured, only allowing access as needed.
As a best practice, subnets should be created in categories. There are two main categories: public subnets and private subnets. At minimum they should be designed as outlined in the below diagrams for IPv4 and IPv6 subnet design.
Recommended IPv4 subnet design pattern
Recommended IPv6 subnet design pattern
Subnet types are denoted by the ability or inability of applications and users on the internet to directly initiate access to infrastructure within a subnet.
Public Subnets
Public subnets are attached to a route table that has a default route to the Internet via an Internet gateway.
Resources in a public subnet can have a public IP or Elastic IP (EIP) that has a NAT to the Elastic Network Interface (ENI) of the virtual machines or containers that host your application(s). This is a one-to-one NAT that is performed by the Internet gateway.
Illustration of public subnet access path to the Internet through the Internet Gateway (IGW)
Private Subnets
A private subnet contains infrastructure that isn’t directly accessible from the Internet. Unlike the public subnet, this infrastructure only has private IPs.
Infrastructure in a private subnet gains access to resources or users on the Internet through a NAT infrastructure of sorts.
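The public/private distinction comes down to where a subnet's default route points. The sketch below models a route table as a plain dictionary of destination to target ID and classifies the subnet from its IPv4 default route. The route table contents and resource IDs are hypothetical, and the sketch only recognizes Internet gateway (igw-) and NAT gateway (nat-) targets; it does not cover NAT instances or IPv6 routes.

```python
def classify_subnet(routes: dict[str, str]) -> str:
    """Classify a subnet from its route table (destination -> target ID)."""
    default_target = routes.get("0.0.0.0/0", "")
    if default_target.startswith("igw-"):
        return "public"
    if default_target.startswith("nat-"):
        return "private (outbound Internet via NAT gateway)"
    return "private (no Internet route)"


# Hypothetical route tables for three subnets in a 10.0.0.0/16 VPC.
web_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0example"}
app_routes = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0example"}
db_routes = {"10.0.0.0/16": "local"}

for name, routes in (("web", web_routes), ("app", app_routes), ("db", db_routes)):
    print(name, "->", classify_subnet(routes))
```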
AWS natively provides NAT capability through the use of the NAT Gateway service. Customers can also create NAT instances that they manage or leverage third-party NAT appliances from the AWS Marketplace.
In most scenarios, it is recommended to use the AWS NAT Gateway as it is highly available (in a single Availability Zone) and is provided as a managed service by AWS. It supports 5 Gbps of bandwidth per NAT gateway and automatically scales up to 45 Gbps.
An AWS NAT gateway’s high availability is confined to a single Availability Zone. For high availability across AZs, it is recommended to have a minimum of two NAT gateways (in different AZs). This allows you to switch to an available NAT gateway in the event that one should become unavailable.
This approach allows you to zone your Internet traffic, reducing cross Availability Zone connections to the Internet. More details on NAT gateway are available here.
Illustration of an environment with a single NAT Gateway (NAT-GW)
Illustration of high availability with multiple NAT Gateways (NAT-GW) attached to their own route table
Illustration of the failure of one NAT Gateway and the fail over to an available NAT Gateway by the manual changing of the default route next hop in private subnet A route table
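The manual failover described in the caption above amounts to rewriting the default route of the affected route table. The following sketch models that step on in-memory dictionaries only; the route table names and NAT gateway IDs are hypothetical. In a real environment the equivalent change would be made through the EC2 route-replacement API rather than on a local data structure.

```python
DEFAULT_ROUTE = "0.0.0.0/0"


def fail_over(route_tables, failed_nat, standby_nat):
    """Point every default route that uses failed_nat at standby_nat.

    Returns the names of the route tables that were changed.
    """
    changed = []
    for name, routes in route_tables.items():
        if routes.get(DEFAULT_ROUTE) == failed_nat:
            routes[DEFAULT_ROUTE] = standby_nat
            changed.append(name)
    return changed


# Each private subnet normally uses the NAT gateway in its own AZ.
route_tables = {
    "private-subnet-a": {DEFAULT_ROUTE: "nat-aaaa"},
    "private-subnet-b": {DEFAULT_ROUTE: "nat-bbbb"},
}

changed = fail_over(route_tables, failed_nat="nat-aaaa", standby_nat="nat-bbbb")
print(changed, route_tables)
```

After the failover, traffic from private subnet A crosses Availability Zones to reach the Internet, so the route should be switched back once the original NAT gateway is healthy again.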
AWS-allocated IPv6 addresses are Global Unicast Addresses by default. That said, you can privatize these subnets by using an Egress-Only Internet Gateway (E-IGW) instead of a regular Internet gateway. E-IGWs are purpose-built to prevent users and applications on the Internet from initiating access to infrastructure in your IPv6 subnet(s).
Illustration of internet access for hybrid IPv6 subnets through an Egress-Only Internet Gateway (E-IGW)
Applications hosted on instances living within a private subnet can have different access needs. Some require access to the Internet while others require access to databases, applications, and users that are on-premises. For this type of access, AWS provides two avenues: the Virtual Gateway and the Transit Gateway. The Virtual Gateway can only support a single VPC at a time, while the Transit Gateway is built to simplify the interconnectivity of tens to hundreds of VPCs and then aggregate their connectivity to resources on-premises. Given that we are looking at the VPC as a single unit of networking, all diagrams below contain illustrations of the Virtual Gateway, which acts as a WAN concentrator for your VPC.
Illustration of private subnets connecting to data center via a Virtual Gateway (VGW)
Illustration of private subnets connecting to Data Center via a VGW
Illustration of private subnets connecting to Data Center using AWS Direct Connect as primary and IPsec as backup
The above diagram illustrates a WAN connection between a VGW attached to a VPC and a customer’s data center.
AWS Site-to-Site VPN configuration leverages IPsec, with each connection providing two redundant IPsec tunnels. AWS supports both static routing and dynamic routing (through the use of BGP).
BGP is recommended, as it allows dynamic route advertisement, high availability through failure detection, and fail over between tunnels in addition to decreased management complexity.
VPC Endpoints: Gateway & Interface Endpoints
Applications running inside your subnet(s) may need to connect to AWS public services (like Amazon S3, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon API Gateway, etc.) or to applications in another VPC that lives in another account. For example, you may have a database in one account that you would like to expose to applications that live in a completely different account and subnet.
For these scenarios you have the option to leverage an Amazon VPC Endpoint.
There are two types of VPC Endpoints: Gateway Endpoints and Interface Endpoints.
Gateway Endpoints only support Amazon S3 and Amazon DynamoDB. Upon creation, a gateway is added to your specified route table(s) and acts as the destination for all requests to the service it is created for.
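To see why traffic for the service follows the gateway endpoint route rather than the default route, recall that a route table selects the most specific matching route. The sketch below illustrates that longest-prefix selection with the standard ipaddress module. The route table is hypothetical: a real gateway endpoint route uses an AWS-managed prefix list as its destination, and 198.51.100.0/24 (a documentation range) only stands in for it here.

```python
import ipaddress


def next_hop(route_table, destination_ip):
    """Return the target of the most specific route that matches the IP."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in route_table.items()
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # The longest prefix (most specific route) wins.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


route_table = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "nat-aaaa",
    "198.51.100.0/24": "vpce-s3",  # hypothetical stand-in for the S3 prefix list
}

print(next_hop(route_table, "198.51.100.7"))  # service traffic
print(next_hop(route_table, "203.0.113.9"))   # other Internet traffic
```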
Interface Endpoints differ significantly and can only be created for services that are powered by AWS PrivateLink.
Upon creation, AWS creates an interface endpoint consisting of one or more Elastic Network Interfaces (ENIs). Each AZ can support one interface endpoint ENI. This acts as a point of entry for all traffic destined to a specific PrivateLink service.
When an interface endpoint is created, associated DNS entries are created that point to the endpoint and each ENI that the endpoint contains. To access the PrivateLink service you must send your request to one of these hostnames.
As illustrated below, ensure the Private DNS feature is enabled for AWS public and Marketplace services:
Since interface endpoints leverage ENIs, customers can use cloud techniques they are already familiar with. The interface endpoint can be configured with a restrictive security group. These endpoints can also be easily accessed from both inside and outside the VPC. Access from outside a VPC can be accomplished through Direct Connect and VPN.
Illustration of a solution that leverages an interface and gateway endpoint
Customers can also create AWS Endpoint services for their applications or services running on-premises. This allows access to these services via an interface endpoint which can be extended to other VPCs (even if the VPCs themselves do not have Direct Connect configured).
VPC Sharing
At re:Invent 2018, AWS launched VPC sharing, a feature that helps customers control VPC sprawl by decoupling the boundary of an AWS account from the underlying VPC network that supports its infrastructure.
VPC sharing uses AWS Resource Access Manager (RAM) to share subnets across accounts within the same AWS organization.
VPC sharing is defined as:
VPC sharing allows customers to centralize the management of the network, its IP space, and the access paths to resources external to the VPC. This centralization and reuse (of VPC components such as NAT Gateway and Direct Connect connections) reduces the cost of managing and maintaining the environment.
Great, but there are times when a customer needs to build networks with multiple VPCs in and across AWS regions. How should this be done and what are the best practices?
This post is contributed by Mahmoud ElZayet | Specialist SA – Dev Tech, AWS
Modern application development processes enable organizations to improve speed and quality continually. In this innovative culture, small, autonomous teams own the entire application life cycle. While such nimble, autonomous teams speed product delivery, they can also impose costs on compliance, quality assurance, and code deployment infrastructures.
Standardized tooling and application release code helps share best practices across teams, reduce duplicated code, speed on-boarding, create consistent governance, and prevent resource over-provisioning.
Overview
In this post, I show you how to use AWS Service Catalog to provide standardized and automated deployment blueprints. This helps accelerate and improve your product teams’ application release workflows on Amazon ECS. Follow my instructions to create a sample blueprint that your product teams can use to release containerized applications on ECS. You can also apply the blueprint concept to other technologies, such as serverless or Amazon EC2–based deployments.
The sample templates and scripts provided here are for demonstration purposes and should not be used “as-is” in your production environment. After you become familiar with these resources, create customized versions for your production environment, taking account of in-house tools and team skills, as well as all applicable standards and restrictions.
Prerequisites
To use this solution, you need the following resources:
Example Corp. has various product teams that develop applications and services on AWS. Example Corp. teams have expressed interest in deploying their containerized applications managed by AWS Fargate on ECS. As part of Example Corp’s central tooling team, you want to enable teams to quickly release their applications on Fargate. However, you also want to make sure that they comply with all best practices and governance requirements.
For convenience, I also assume that you have supplied product teams working on the same domain, application, or project with a shared AWS account for service deployment. Using this account, they all deploy to the same ECS cluster.
In this scenario, you can author and provide these teams with a shared deployment blueprint on ECS Fargate. Using AWS Service Catalog, you can share the blueprint with teams as follows:
Every time that a product team wants to release a new containerized application on ECS, they retrieve a new AWS Service Catalog ECS blueprint product. This enables them to obtain the required infrastructure, permissions, and tools. As a prerequisite, the ECS blueprint requires building blocks such as a git repository or an AWS CodeBuild project. Again, you can acquire those blocks through another AWS Service Catalog product.
The product team completes the ECS blueprint’s required parameters, such as the desired number of ECS tasks and application name. As an administrator, you can constrain the value of some parameters such as the VPC and the cluster name. For more information, see AWS Service Catalog Template Constraints.
The ECS blueprint product deploys all the required ECS resources, configured according to best practices. You can also use the AWS Cloud Development Kit (CDK) to maintain and provision pre-defined constructs for your infrastructure.
A standardized CI/CD pipeline is also generated, enabling your product teams to publish their application to ECS automatically. Ideally, this pipeline should have all stages, practices, security checks, and standards required for application release. Product teams must still author application code, create a Dockerfile, build specifications, run automated tests and deployment scripts, and complete other tasks required for application release.
The ECS blueprint can be continually updated based on organization-wide feedback and to support new use cases. Your product team can always access the latest version through AWS Service Catalog. I recommend retaining multiple, customizable blueprints for various technologies.
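The parameter constraint mentioned in the steps above can be expressed as an AWS Service Catalog template constraint, which is a JSON document of rules and assertions. The following sketch builds one in Python. The rule name and description are hypothetical, and the choice to pin ClusterName to example-corp-ecs-cluster is an assumption based on the cluster name used later in this walkthrough, not the constraint the sample solution necessarily ships with.

```python
import json


def allowed_value_rule(parameter, allowed_values, description):
    """Build a rule asserting that a parameter is one of allowed_values."""
    return {
        "Assertions": [
            {
                "Assert": {"Fn::Contains": [allowed_values, {"Ref": parameter}]},
                "AssertDescription": description,
            }
        ]
    }


constraint = {
    "Rules": {
        # Hypothetical rule name; the parameter name comes from the blueprint.
        "ClusterNameRule": allowed_value_rule(
            "ClusterName",
            ["example-corp-ecs-cluster"],
            "Teams must deploy to the central ECS cluster",
        )
    }
}

print(json.dumps(constraint, indent=2))
```

The printed JSON is the kind of document an administrator would attach to the product as a template constraint, so that product teams cannot launch the blueprint against a different cluster.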
For simplicity’s sake, my explanation envisions your environment as consisting of one AWS account. In practice, you can use IAM controls to segregate teams’ access to each other’s resources, even when they share an account. However, I recommend having at least two AWS accounts, one for testing and one for production purposes.
To see an example framework that helps deploy your AWS Service Catalog products to multiple accounts, see AWS Deployment Framework (ADF). This framework can also help you create cross-account pipelines that cater to different product teams’ needs, even when these teams deploy to the same technology stack.
To set up shared deployment blueprints for your product teams, follow the steps outlined in the following sections.
Set up the environment
In this section, I explain how to create a central ECS cluster in the appropriate VPC where teams can deploy their containers. I provide an AWS CloudFormation template to help you set up these resources. This template also creates an IAM role to be used by AWS Service Catalog later.
To run the CloudFormation template:
1. Use a git client to clone the following GitHub repository to a local directory. This will be the directory where you will run all the subsequent AWS CLI commands.
2. Using the AWS CLI, run the following commands. Replace <Application_Name> with a lowercase string with no spaces representing the application or microservice that your product team plans to release—for example, myapp.
4. In case of error, use the describe-stack-events CLI command or review error details on the console.
5. When the stack creation reads CREATE_COMPLETE, run the following command, and make a note of the output values in an editor of your choice. You need this information for a later step:
6. Run the following commands to copy those CloudFormation templates to Amazon S3. Replace <Template_Bucket_Name> with the template bucket output value you just copied into your editor of choice:
In this section, I show you how to create two AWS Service Catalog products for teams to use in publishing their containerized app:
Core Build Tools
ECS Fargate Deployment Blueprint
To create an AWS Service Catalog portfolio that includes these products:
1. Using the AWS CLI, run the following command, replacing <Application_Name> with the application name you defined earlier and replacing <Template_Bucket_Name> with the template bucket output value you copied into your editor of choice:
3. In case of error, use the describe-stack-events CLI command or check error details in the console.
Your AWS Service Catalog configuration should now be ready.
Test product teams experience
In this section, I show you how to use IAM roles to impersonate a product team member and simulate their first experience of containerized application deployment.
Assume team role
To assume the role that you created during the environment setup step:
1. In the AWS Management Console, follow the instructions in Switching a Role.
For Account, enter the account ID used in the sample solution. To learn more about how to find an AWS account ID, see Your AWS Account ID and Its Alias.
For Role, enter <Application_Name>-product-team-role, where <Application_Name> is the same application name you defined in the Set up the environment section.
(Optional) For Display name, enter a custom session value.
You are now logged in as a member of the product team.
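Several resource names in this walkthrough are derived from the application name, which must be a lowercase string with no spaces. The sketch below validates a name against that rule and derives the names that appear in this post. The helper functions are illustrative and are not part of the sample repository.

```python
def is_valid_application_name(name: str) -> bool:
    """True if name is non-empty, lowercase, and contains no whitespace."""
    return bool(name) and name == name.lower() and not any(c.isspace() for c in name)


def resource_names(app_name: str) -> dict:
    """Derive the names this walkthrough builds from the application name."""
    if not is_valid_application_name(app_name):
        raise ValueError(f"invalid application name: {app_name!r}")
    return {
        "team_role": f"{app_name}-product-team-role",
        "build_tools_product": f"{app_name}-build-tools",
        "pipeline": f"{app_name}-ecs-fargate-pipeline",
    }


print(resource_names("myapp"))
```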
Provision core build product
Next, provision the core build tools for your blueprint:
In the Service Catalog console, you should now see the two products created earlier listed under Products.
Select the first product, Core Build Tools.
Choose LAUNCH PRODUCT.
Name the product something such as <Application_Name>-build-tools, replacing <Application_Name> with the name previously defined for your application.
Provide the same application name you defined previously.
Leave the ContainerBuild parameter default setting as yes, as you are building a container requiring a container repository and its associated permissions.
Choose NEXT three times, then choose LAUNCH.
Under Events, watch the Status property. Keep refreshing until the status reads Succeeded. In case of failure, choose the URL value next to the key CloudformationStackARN. This choice takes you to the CloudFormation console, where you can find more information on the errors.
Now you have the following build tools created along with the required permissions:
AWS CodeCommit repository to store your code
CodeBuild project to build your container image and test your application code
Amazon ECR repository to store your container images
Amazon S3 bucket to store your build and release artifacts
Provision ECS Fargate deployment blueprint
In the Service Catalog console, follow the same steps to deploy the blueprint for ECS deployment. Here are the product provisioning details:
For the parameters Subnet1, Subnet2, and VpcId, enter the output values you copied earlier into your editor of choice in the Set up the environment section.
For other parameters, enter the following:
ApplicationName: The same application name you defined previously.
ClusterName: Enter the value example-corp-ecs-cluster, which is the name chosen in the template for the central cluster.
Leave the DesiredCount and LaunchType parameters at their default values.
After the blueprint product creation completes, you should have an ECS service with a sample task definition for your product team. The build tools created earlier include the permissions required for deploying to the ECS service. Also, a CI/CD pipeline has been created to guide your product teams as they publish their application to the ECS service. Ideally, this pipeline should have all stages, practices, security checks, and standards required for application release.
Product teams still have to author application code, create a Dockerfile, build specifications, run automated tests and deployment scripts, and perform other tasks required for application release. The blueprint product can provide wiki links to reference examples for these steps, or access to pre-provisioned sample pipelines.
Test your pipeline
Now, upload a sample app to test your pipeline:
Log in with the product team role.
In the CodeCommit console, select the repository with the application name that you defined in the environment setup section.
Scroll down, choose Add file, Create file.
Paste the following in the page editor, which is a script to build the container image and push it to the ECR repository:
6. For Author name and Email address, enter your name and your preferred email address for the commit. Although optional, the addition of a commit message is a good practice.
7. Choose Commit changes.
8. Repeat the same steps for the Dockerfile. The sample Dockerfile creates a straightforward PHP application. Typically, you add your application content to that image.
File name: Dockerfile
File content:
FROM ubuntu:12.04
# Install dependencies
RUN apt-get update -y
RUN apt-get install -y git curl apache2 php5 libapache2-mod-php5 php5-mcrypt php5-mysql
# Configure apache
RUN a2enmod rewrite
RUN chown -R www-data:www-data /var/www
ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2
EXPOSE 80
CMD ["/usr/sbin/apache2", "-D", "FOREGROUND"]
Your pipeline should now be ready to run successfully. Although you can list all current pipelines in the Region, you can only describe and modify pipelines that have a prefix matching your application name. To confirm:
In the AWS CodePipeline console, select the pipeline <Application_Name>-ecs-fargate-pipeline.
The pipeline should now be running.
Because you performed two commits to the repository from the console, you must wait for the second run to complete before successful deployment to ECS Fargate.
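The prefix-based restriction described above (list all pipelines, but describe and modify only those whose names start with your application name) is typically implemented with an IAM policy that uses a wildcard resource ARN. The sketch below builds such a policy document. It is an assumption about how this could be done, since the policy used by the sample template is not shown in this post; the Region and account ID are placeholders.

```python
import json


def pipeline_policy(app_name, region, account_id):
    """Build an IAM policy limiting pipeline changes to an app-name prefix."""
    prefix_arn = f"arn:aws:codepipeline:{region}:{account_id}:{app_name}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing cannot be scoped to individual pipelines.
                "Effect": "Allow",
                "Action": "codepipeline:ListPipelines",
                "Resource": "*",
            },
            {
                # Describe and modify only pipelines whose name starts with app_name.
                "Effect": "Allow",
                "Action": ["codepipeline:GetPipeline", "codepipeline:UpdatePipeline"],
                "Resource": prefix_arn,
            },
        ],
    }


policy = pipeline_policy("myapp", "us-east-1", "111122223333")  # placeholder values
print(json.dumps(policy, indent=2))
```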
Clean up
To clean up the environment, run the following commands in the AWS CLI, replacing <Application_Name> with your application name, <Account_Id> with your AWS Account ID with no hyphens and <Template_Bucket_Name> with the template bucket output value you copied into your editor of choice:
In this post, I showed you how to design and build ECS Fargate deployment blueprints. I explained how these accelerate and standardize the release of containerized applications on AWS. Your product teams can keep getting the latest standards and coded best practices through those automated blueprints.
As always, AWS welcomes feedback. Please submit comments or questions below.
If your organization is using software as a service (SaaS), your data is likely stored and protected by the SaaS provider. However, depending on the type of data that your organization stores and the compliance requirements that it must meet, you might need more control over how the encryption keys are stored, protected, and used. In this post, I’ll show you two options for deploying and managing your own CloudHSM cluster to secure your keys, while still allowing trusted third-party SaaS providers to securely access your HSM cluster in order to perform cryptographic operations. You can also use this architecture when you want to share your keys with another business unit or with an application that’s running in a separate AWS account.
AWS CloudHSM is one of several cryptography services provided by AWS to help you secure your data and keys in the AWS cloud. AWS CloudHSM provides single-tenant HSMs based on third-party FIPS 140-2 Level 3 validated hardware, under your control, in your Amazon Virtual Private Cloud (Amazon VPC). You can generate and use keys on your HSM using CloudHSM command line tools or standards-compliant C, Java, and OpenSSL SDKs.
A related, more widely used service is AWS Key Management Service (KMS). KMS is generally easier to use, cheaper to operate, and is natively integrated with most AWS services. However, there are some use cases for which you may choose to rely on CloudHSM to meet your security and compliance requirements.
Solution Overview
There are two ways you can set up your VPC and CloudHSM clusters to allow trusted third-party SaaS providers to use the HSM cluster for cryptographic operations. The first option is to use VPC peering to allow traffic to flow between the SaaS provider’s HSM client VPC and your CloudHSM VPC, and to utilize a custom application to harness the HSM.
The second option is to use KMS to manage the keys, specifying a custom key store to generate and store the keys. AWS KMS supports custom key stores backed by AWS CloudHSM clusters. When you create an AWS KMS customer master key (CMK) in a custom key store, AWS KMS generates and stores non-extractable key material for the CMK in an AWS CloudHSM cluster that you own and manage.
Decision Criteria: VPC Peering vs Custom Key Store
The right solution for you will depend on factors like your VPC configuration, security requirements, network setup, and the type of cryptographic operations you need. The following table provides a high-level summary of how these two options compare. Later in this post, I’ll go over both options in detail and explain the design considerations you need to be aware of before deploying the solution in your environment.
Technical Considerations
Solution
VPC Peering
Custom Keystore
Are you able to peer or connect your HSM VPC with your SaaS provider?
Is your SaaS provider sensitive to costs from KMS usage in their AWS account?
Does your SaaS provider need to encrypt your data directly with the Master Key?
Does your application rely on a PKCS#11-compliant or JCE-compliant SDK?
Does your SaaS provider need to use the keys in AWS services?
Do you need to log all key usage activities when SaaS providers use your HSM keys?
Option 1: VPC Peering
Figure 1: Architecture diagram showing VPC peering between the SaaS provider’s HSM client VPC and the customer’s HSM VPC
Figure 1 shows how you can deploy a CloudHSM cluster in a dedicated HSM VPC and peer this HSM VPC with your service provider’s VPC to allow them to access the HSM cluster through the client/application. I recommend that you deploy the CloudHSM cluster in a separate HSM VPC to limit the scope of resources running in that VPC. Since VPC peering is not transitive, service providers will not have access to any resources in your application VPCs or any other VPCs that are peered with the HSM VPC.
It’s possible to leverage the HSM cluster for other purposes and applications, but you should be aware of the potential drawbacks before you do. This approach could make it harder for you to find non-overlapping CIDR ranges for use with your SaaS provider. It would also mean that your SaaS provider could accidentally overwrite HSM account credentials or lock out your HSMs, causing an availability issue for your other applications. For these reasons, I recommend that you dedicate a CloudHSM cluster for use with your SaaS providers and use small VPC and subnet sizes, like /27, so that you’re not wasting IP space and it’s easier to find non-overlapping IP addresses with your SaaS provider.
If you’re using VPC peering, your HSM VPC CIDR cannot overlap with your SaaS provider’s VPC. Deploying the HSM cluster in a separate VPC gives you flexibility in selecting a suitable CIDR range that is non-overlapping with the service provider since you don’t have to worry about your other applications. Also, since you’re only hosting the HSM Cluster in this VPC, you can choose a CIDR range that is relatively small.
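Choosing a small, non-overlapping range for the HSM VPC can be automated. The following sketch uses the standard ipaddress module to find the first /27 block inside a candidate range that does not overlap any CIDR already in use. All of the ranges shown are hypothetical examples.

```python
import ipaddress


def first_free_subnet(candidate_range, in_use, new_prefix=27):
    """Return the first new_prefix-sized block in candidate_range that
    overlaps none of the CIDRs in in_use, or None if there is none."""
    used = [ipaddress.ip_network(cidr) for cidr in in_use]
    candidates = ipaddress.ip_network(candidate_range).subnets(new_prefix=new_prefix)
    for subnet in candidates:
        if not any(subnet.overlaps(other) for other in used):
            return subnet
    return None


provider_cidrs = ["10.0.0.0/16", "10.1.0.0/26"]  # hypothetical SaaS provider ranges
hsm_vpc_cidr = first_free_subnet("10.1.0.0/24", provider_cidrs)
print(hsm_vpc_cidr)
```

Because the HSM VPC hosts only the cluster, a block this small is usually enough, and small blocks are much easier to fit around a provider's existing address plan.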
Design considerations
Here are additional considerations to think about when deploying this solution in your environment:
VPC peering allows resources in either VPC to communicate with each other as long as security groups, NACLs, and routing allow for it. In order to improve security, place only resources that are meant to be shared in the VPC, and secure communication at the port/protocol level by using security groups.
If you decide to revoke the SaaS provider’s access to your CloudHSM, you have two choices:
At the network layer, you can remove connectivity by deleting the VPC peering or by modifying the CloudHSM security groups to disallow the SaaS provider’s CIDR ranges.
Alternatively, you can log in to the CloudHSM as Crypto Officer (CO) and change the password or delete the crypto user (CU) that the SaaS provider is using.
If you’re deploying CloudHSM across multiple accounts or VPCs within your organization, you can also use AWS Transit Gateway to connect the CloudHSM VPC to your application VPCs. Transit Gateway is ideal when you have multiple application VPCs that need CloudHSM access, as it easily scales and you don’t have to worry about the VPC peering limits or the number of peering connections to manage.
If you’re the SaaS provider, and you have multiple clients who might be interested in this solution, you must make sure that no customer’s IP space overlaps with yours. You must also make sure that each customer’s HSM VPC doesn’t overlap with any of the others. One solution is to dedicate one VPC per customer, to keep the client/application dedicated to that customer, and to peer this VPC with your application VPC. This reduces the overlapping CIDR dependency among all your customers.
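For a SaaS provider with many customers, the overlap requirement in the last consideration becomes a pairwise check across every VPC that would share a routing domain. The sketch below reports each conflicting pair; the VPC names and CIDR blocks are hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_pairs(vpc_cidrs):
    """Return sorted pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    )


vpcs = {
    "provider-app": "10.0.0.0/16",
    "customer-a-hsm": "10.1.0.0/27",
    "customer-b-hsm": "10.1.0.0/27",   # same range as customer A
    "customer-c-hsm": "10.0.8.0/27",   # falls inside the provider's range
}

conflicts = overlapping_pairs(vpcs)
print(conflicts)
```

Dedicating one client VPC per customer, as suggested above, shrinks this problem: each customer's HSM VPC then only has to avoid the single client VPC it is peered with.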
Option 2: Custom Key Store
As the AWS KMS documentation explains, KMS supports custom key stores backed by AWS CloudHSM clusters. When you create an AWS KMS customer master key (CMK) in a custom key store, AWS KMS generates and stores non-extractable key material for the CMK in an AWS CloudHSM cluster that you own and manage. When you use a CMK in a custom key store, the cryptographic operations are performed in the HSMs in the cluster. This feature combines the convenience and widespread integration of AWS KMS with the added control of an AWS CloudHSM cluster in your AWS account. This option allows you to keep your master key in the CloudHSM cluster but allows your SaaS provider to use your master key securely by using KMS.
Each custom key store is associated with an AWS CloudHSM cluster in your AWS account. When you connect the custom key store to its cluster, AWS KMS creates the network infrastructure to support the connection. Then it logs into the key AWS CloudHSM client in the cluster using the credentials of a dedicated crypto user in the cluster. All of this is automatically set up, with no need to peer VPCs or connect to your SaaS provider’s VPC.
You create and manage your custom key stores in AWS KMS, and you create and manage your HSM clusters in AWS CloudHSM. When you create CMKs in an AWS KMS custom key store, you view and manage the CMKs in AWS KMS. But you can also view and manage their key material in AWS CloudHSM, just as you would do for other keys in the cluster.
The following diagram shows how some keys can be located in a CloudHSM cluster but be visible through AWS KMS. These are the keys that AWS KMS can use for crypto operations performed through KMS.
Figure 2: High level overview of KMS custom key store
While this option eliminates many of the networking components you need to set up for Option 1, it does limit the type of cryptographic operations that your SaaS provider can perform. Since the SaaS provider doesn’t have direct access to CloudHSM, the crypto operations are limited to the encrypt and decrypt operations supported by KMS, and your SaaS provider must use KMS APIs for all of their operations. This is easy if they’re using AWS services which use KMS already, but if they’re performing operations within their application before storing the data in AWS storage services, this approach could be challenging, because KMS doesn’t support all the same types of cryptographic operations that CloudHSM supports.
Figure 3 illustrates the various components that make up a custom key store and shows how a CloudHSM cluster can connect to KMS to create a customer controlled key store.
Figure 3: A cluster of two CloudHSM instances is connected to KMS to create a customer controlled key store
Design Considerations
Note that when using a custom key store, you’re creating a kmsuser CU account in your AWS CloudHSM cluster and providing the kmsuser account credentials to AWS KMS.
This option requires your service provider to be able to use KMS as the key management option within their application. Because your SaaS provider cannot communicate directly with the CloudHSM cluster, they must instead use KMS APIs to encrypt the data. If your SaaS provider is encrypting within their application without using KMS, this option may not work for you.
When deploying a custom key store, you must not only control access to the CloudHSM cluster, you must also control access to AWS KMS.
I recommend dedicating an AWS account to the CloudHSM cluster and custom key store, as this simplifies setup. For more information, please refer to Controlling Access to Your Custom Key Store.
Network architecture that is not supported by CloudHSM
Figure 4: Diagram showing the network anti-pattern for deploying CloudHSM
Figure 4 shows various networking technologies, like AWS PrivateLink, Network Address Translation (NAT), and AWS Load Balancers, that cannot be used with CloudHSM when placed between the CloudHSM cluster and the client/application. All of these methods mask the real IPs of the HSM cluster nodes from the client, which breaks the communication between the CloudHSM client and the HSMs.
When the CloudHSM client successfully connects to the HSM cluster, it downloads a list of HSM IP addresses which is then stored and used for subsequent connections. When one of the HSM nodes is unavailable, the client/application will automatically try the IP address of the HSM nodes it knows about. When HSMs are added or removed from the cluster, the client is automatically reconfigured. Since the client relies on a current list of IP addresses to transparently handle high availability and failover within the cluster, masking the real IP address of the HSM node thus breaks the communication between the cluster and the client.
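To make the failover behavior concrete, here is a minimal Python sketch of a client that caches node IP addresses and tries each one in turn. It is a toy model only, not the CloudHSM client's actual implementation; the class name, the `reachable` set, and all IP addresses are hypothetical.

```python
class FailoverClient:
    """Toy model of a client that caches the cluster's node IP addresses."""

    def __init__(self, node_ips):
        # List learned from the cluster on the first successful connection.
        self.node_ips = list(node_ips)

    def connect(self, reachable):
        """Return the first cached IP that is directly reachable, else None."""
        for ip in self.node_ips:
            if ip in reachable:
                return ip
        return None


# The cluster reports its real node addresses to the client (hypothetical values).
client = FailoverClient(["10.0.1.10", "10.0.2.10"])

# Direct routing: the first node is down, so the client fails over to the second.
direct = client.connect(reachable={"10.0.2.10"})

# Behind NAT or a load balancer only the front-end address is reachable,
# so none of the cached node addresses can be used.
masked = client.connect(reachable={"192.0.2.50"})

print(direct, masked)
```

In this model the masked case fails because the addresses the client cached are not the addresses it can actually reach, which is the situation the anti-pattern creates.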
In this blog post, I’ve shown you two options for deploying CloudHSM to store your key material while allowing your SaaS provider to access and use those keys on your behalf. This allows you to remain in control of your encryption keys and use a SaaS solution without compromising security.
It’s important to understand the security requirements, network setup, and type of cryptographic operation for each approach, and to choose the option that aligns the best with your goals. As a best practice, it’s also important to understand how to secure your CloudHSM and KMS deployment and to use necessary role-based access control with minimum privilege. Read more about AWS KMS Best Practices and CloudHSM Best Practices.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Key Management Service discussion forum.
This post is contributed by Tony Pujals | Senior Developer Advocate, AWS
AWS recently increased the number of elastic network interfaces available when you run tasks on Amazon ECS. Use the account setting called awsvpcTrunking. If you use the Amazon EC2 launch type and task networking (awsvpc network mode), you can now run more tasks on an instance—5 to 17 times as many—as you did before.
As more of you embrace microservices architectures, you deploy increasing numbers of smaller tasks. AWS now offers you the option of more efficient packing per instance, potentially resulting in smaller clusters and associated savings.
Overview
To manage your own cluster of EC2 instances, use the EC2 launch type. Use task networking to run ECS tasks using the same networking properties as if tasks were distinct EC2 instances.
Task networking offers several benefits. Every task launched with awsvpc network mode has its own attached network interface, a primary private IP address, and an internal DNS hostname. This simplifies container networking and gives you more control over how tasks communicate, both with each other and with other services within their virtual private clouds (VPCs).
Task networking also lets you take advantage of other EC2 networking features like VPC Flow Logs. This feature lets you monitor traffic to and from tasks. It also provides greater security control for containers, allowing you to use security groups and network monitoring tools at a more granular level within tasks. For more information, see Introducing Cloud Native Networking for Amazon ECS Containers.
However, if you run container tasks on EC2 instances with task networking, you can face a networking limit. This might surprise you, particularly when an instance has plenty of free CPU and memory. The limit reflects the number of network interfaces available to support awsvpc network mode per container instance.
Raise network interface density limits with trunking
The good news is that AWS raised network interface density limits by implementing a networking feature on ECS called “trunking.” This is a technique for multiplexing data over a shared communication link.
If you’re migrating to microservices using AWS App Mesh, you should optimize network interface density. App Mesh requires awsvpc networking to provide routing control and visibility over an ever-expanding array of running tasks. In this context, increased network interface density might save money.
By opting for network interface trunking, you should see a significant increase in capacity—from 5 to 17 times more than the previous limit. For more information on the new task limits per container instance, see Supported Amazon EC2 Instance Types.
Applications with tasks not hitting CPU or memory limits also benefit from this feature through the more cost-effective “bin packing” of container instances.
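As a rough illustration of the bin-packing effect, the following Python sketch computes how many tasks fit on one instance when the network interface limit is counted alongside CPU and memory. All numbers are hypothetical assumptions chosen for the example (an instance with 2048 CPU units and 4096 MiB, tasks of 128 CPU units and 256 MiB, and interface limits of 2 without trunking and 10 with trunking); check Supported Amazon EC2 Instance Types for real limits.

```python
import math


def tasks_per_instance(inst_cpu, inst_mem, task_cpu, task_mem, eni_limit):
    """Tasks that fit on one instance: the tightest of CPU, memory, and ENI limits."""
    return min(inst_cpu // task_cpu, inst_mem // task_mem, eni_limit)


def instances_needed(total_tasks, per_instance):
    """Number of instances required to place all tasks."""
    return math.ceil(total_tasks / per_instance)


# Hypothetical instance and task sizes; ENI limits are assumptions.
without_trunking = tasks_per_instance(2048, 4096, 128, 256, eni_limit=2)
with_trunking = tasks_per_instance(2048, 4096, 128, 256, eni_limit=10)

print(instances_needed(100, without_trunking))  # instances for 100 tasks, no trunking
print(instances_needed(100, with_trunking))     # instances for 100 tasks, trunking
```

With these assumed numbers, CPU and memory would allow 16 tasks per instance, so the interface limit is the binding constraint in both cases and raising it directly reduces the instance count.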
Trunking is an opt-in feature
AWS chose to make the trunking feature opt-in due to the following factors:
Instance registration: While normal instance registration is straightforward, trunking increases the number of asynchronous instance registration steps that can potentially fail. Any such failures might add extra seconds to launch time.
Available IP addresses: The “trunk” belongs to the same subnet in which the instance’s primary network interface originates. This effectively reduces the available IP addresses and potentially the ability to scale out on other EC2 instances sharing the same subnet. The trunk consumes an IP address. With a trunk attached, there are two assigned IP addresses per instance, one for the primary interface and one for the trunk.
Differing customer preferences and infrastructure: If you have high CPU or memory workloads, you might not benefit from trunking. Or, you may not want awsvpc networking.
Consequently, AWS leaves it to you to decide if you want to use this feature. AWS might revisit this decision in the future, based on customer feedback. For now, your account roles or users must opt in to the awsvpcTrunking account setting to gain the benefits of increased task density per container instance.
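The IP address consideration can be estimated with simple arithmetic. The sketch below is a simplification that assumes every task's network interface is placed in the same subnet as its instance, that each interface uses exactly one private IP address, and that AWS reserves five addresses in each subnet. The subnet CIDR and task counts are hypothetical.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # addresses AWS reserves in every subnet


def usable_ips(cidr):
    """Addresses in the subnet that can be assigned to network interfaces."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def ips_per_instance(tasks, trunking):
    """Primary interface, plus the trunk if enabled, plus one address per task."""
    return 1 + (1 if trunking else 0) + tasks


def max_instances(cidr, tasks, trunking):
    """Fully loaded instances the subnet can hold under these assumptions."""
    return usable_ips(cidr) // ips_per_instance(tasks, trunking)


print(usable_ips("10.0.0.0/24"))
print(max_instances("10.0.0.0/24", tasks=10, trunking=True))
print(max_instances("10.0.0.0/24", tasks=2, trunking=False))
```

The trunk itself costs only one extra address per instance; the larger effect on a subnet comes from the higher task count that trunking makes possible.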
Enable trunking
Enable the ECS elastic network interface trunking feature to increase the number of network interfaces that can be attached to supported EC2 container instance types. You must meet the following prerequisites before you can launch a container instance with the increased network interface limits:
Your account must have the AWSServiceRoleForECS service-linked role for ECS.
You must opt into the awsvpcTrunking account setting.
Make sure that a service-linked role exists for ECS
A service-linked role is a unique type of IAM role linked to an AWS service (such as ECS). This role lets you delegate the permissions necessary to call other AWS services on your behalf. Because ECS is a service that manages resources on your behalf, you need this role to proceed.
In most cases, you won’t have to create a service-linked role. If you created or updated an ECS cluster, ECS likely created the service-linked role for you.
You can confirm that your service-linked role exists by using the AWS CLI to look up the AWSServiceRoleForECS role in IAM.
Your account, IAM user, or role must opt in to the awsvpcTrunking account setting. Select this setting using the AWS CLI or the ECS console. You can opt in for an account by making awsvpcTrunking its default setting. Or, you can enable this setting for the role associated with the instance profile with which the instance launches. For instructions, see Account Settings.
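Both prerequisites map to AWS CLI commands. The following Python sketch only builds the command lines; the `run` helper executes them and assumes the AWS CLI is installed and configured with suitable permissions. The helper names and the example role ARN are hypothetical.

```python
import subprocess


def check_role_cmd(role_name="AWSServiceRoleForECS"):
    """Command that looks up the ECS service-linked role in IAM."""
    return ["aws", "iam", "get-role", "--role-name", role_name]


def opt_in_cmd(principal_arn=None):
    """Opt in the whole account by default, or a single IAM user or role."""
    if principal_arn is None:
        return ["aws", "ecs", "put-account-setting-default",
                "--name", "awsvpcTrunking", "--value", "enabled"]
    return ["aws", "ecs", "put-account-setting",
            "--name", "awsvpcTrunking", "--value", "enabled",
            "--principal-arn", principal_arn]


def run(cmd):
    """Execute a command and return its output (needs the AWS CLI; not called here)."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical ARN of the role attached to the instance profile.
role_arn = "arn:aws:iam::123456789012:role/ecsInstanceRole"
print(" ".join(check_role_cmd()))
print(" ".join(opt_in_cmd()))
print(" ".join(opt_in_cmd(role_arn)))
```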
Other considerations
After completing the prerequisites described in the preceding sections, launch a new container instance with increased network interface limits using one of the supported EC2 instance types.
Keep the following in mind:
It’s available with the latest variant of the ECS-optimized AMI.
It only affects creation of new container instances after opting into awsvpcTrunking.
It only affects tasks created with awsvpc network mode and EC2 launch type. Tasks created with the AWS Fargate launch type always have a dedicated network interface, no matter how many you launch.
If you seek to optimize the usage of your EC2 container instances for clusters that you manage, enable the increased network interface density feature with awsvpcTrunking. By following the steps outlined in this post, you can launch tasks using significantly fewer EC2 instances. This is especially useful if you embrace a microservices architecture, with its increasing numbers of lighter tasks.
Hopefully, you found this post informative and the proposed solution intriguing. As always, AWS welcomes all feedback and comments.
In our previous post we discussed the various ways you can invoke AWS Lambda functions. In this post, we’ll provide some tips and best practices you can use when building your AWS Lambda functions.
One of the benefits of using Lambda is that you don’t have to worry about server and infrastructure management. This means AWS will handle the heavy lifting needed to execute your Lambda functions. You should take advantage of this architecture with the tips below.
Tip #1: When to VPC-Enable a Lambda Function
Lambda functions always operate from an AWS-owned VPC. By default, your function can make network requests to any public internet address, including any of the public AWS APIs. For example, your function can interact with Amazon DynamoDB APIs to PutItem or Query for records. You should only enable your functions for VPC access when you need to interact with a private resource located in a private subnet. An RDS instance is a good example.
Once your function is VPC-enabled, all network traffic from your function is subject to the routing rules of your VPC/Subnet. If your function needs to interact with a public resource, you will need a route through a NAT gateway in a public subnet.
Tip #2: Deploy Common Code to a Lambda Layer (e.g., the AWS SDK)
If you intend to reuse code in more than one function, consider creating a Layer and deploying it there. A great candidate would be a logging package that your team is required to standardize on. Another great example is the AWS SDK. AWS includes the AWS SDK for Node.js and Python functions (and updates the SDK periodically). However, you should bundle your own SDK and pin your functions to a version of the SDK you have tested.
Tip #3: Watch Your Package Size and Dependencies
Lambda functions require you to package all needed dependencies (or attach a Layer) — the bigger your deployment package, the slower your function will cold-start. Remove all unnecessary items, such as documentation and unused libraries. If you are using Java functions with the AWS SDK, only bundle the module(s) that you actually need to use — not the entire SDK.
Tip #4: Monitor Your Concurrency (and Set Alarms)
Our first post in this series talked about how concurrency can affect your downstream systems. Because Lambda functions can scale extremely quickly, you should have controls in place to notify you when concurrency spikes. A good idea is to deploy a CloudWatch alarm that notifies your team when function metrics such as ConcurrentExecutions or Invocations exceed your threshold. You should also create an AWS Budget so you can monitor costs on a daily basis. Here is a great example of how to set up automated cost controls.
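As a minimal sketch of such an alarm, the function below assembles the parameters for a CloudWatch alarm on the ConcurrentExecutions metric. The keys follow the CloudWatch PutMetricAlarm API, so the dictionary could be passed to boto3's `put_metric_alarm` if boto3 is available. The alarm name, threshold, period, and SNS topic ARN are hypothetical values, and the function name `concurrency_alarm_params` is an assumption made for this example.

```python
def concurrency_alarm_params(threshold, sns_topic_arn, function_name=None):
    """Build PutMetricAlarm parameters for a Lambda concurrency alarm."""
    params = {
        "AlarmName": "lambda-concurrency-spike",  # hypothetical name
        "Namespace": "AWS/Lambda",
        "MetricName": "ConcurrentExecutions",
        "Statistic": "Maximum",
        "Period": 60,                 # seconds per datapoint (assumed)
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # topic that notifies the team
    }
    if function_name is not None:
        # Scope the alarm to one function instead of the whole account.
        params["Dimensions"] = [{"Name": "FunctionName", "Value": function_name}]
    return params


# Hypothetical topic ARN and threshold.
topic = "arn:aws:sns:us-east-1:123456789012:ops-alerts"
account_alarm = concurrency_alarm_params(500, topic)
function_alarm = concurrency_alarm_params(100, topic, function_name="checkout")
print(account_alarm["MetricName"], function_alarm["Dimensions"])
```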
Tip #5: Over-Provision Memory (in some use cases) but Not Function Timeout
Lambda allocates compute power in proportion to the memory you allocate to your function. This means you can over-provision memory to run your functions faster and potentially reduce your costs. You should benchmark your use case to determine where the breakeven point is for running faster and using more memory vs. running slower and using less memory.
However, we recommend that you do not over-provision your function timeout settings. Always understand your code performance and set a function timeout accordingly. Over-provisioning the function timeout often results in Lambda functions running longer than expected and in unexpected costs.
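The memory breakeven can be estimated from benchmark results, because the compute charge is proportional to memory multiplied by duration (GB-seconds). The sketch below ignores the per-request charge; the price per GB-second and the benchmark durations are assumptions for illustration, so substitute your own measurements and the current price for your Region.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed price; check current pricing


def invocation_cost(memory_mb, duration_ms, price=PRICE_PER_GB_SECOND):
    """Compute charge for one invocation, excluding the per-request fee."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price


# Hypothetical benchmark: memory setting (MB) -> measured duration (ms).
benchmark = {128: 1000, 512: 200, 1024: 150}

costs = {mem: invocation_cost(mem, ms) for mem, ms in benchmark.items()}
cheapest = min(costs, key=costs.get)
print(cheapest)
```

In this made-up benchmark, 512 MB is both faster and cheaper than 128 MB, while 1024 MB is faster still but costs more per invocation.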
About the Author
George Mao is a Specialist Solutions Architect at Amazon Web Services, focused on the Serverless platform. George is responsible for helping customers design and operate Serverless applications using services like Lambda, API Gateway, Cognito, and DynamoDB. He is a regular speaker at AWS Summits, re:Invent, and various tech events. George is a software engineer and enjoys contributing to open source projects, delivering technical presentations at technology events, and working with customers to design their applications in the Cloud. George holds a Bachelor of Computer Science and Masters of IT from Virginia Tech.
This post is contributed by Mani Chandrasekaran | Solutions Architect, AWS
Customers often want to run container-based applications in a private subnet inside a virtual private cloud (VPC), where there is no direct connectivity from the outside world to these applications. This is a secure way to run applications that should not be directly exposed to the internet.
AWS Fargate is a compute engine for Amazon ECS that enables you to run containers without having to manage servers or clusters. With Fargate on Amazon ECS, you don’t have to provision, configure, and scale clusters of virtual machines to run containers.
Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale. The API Gateway private integration makes it simple to expose your HTTP and HTTPS resources behind a virtual private cloud (VPC) with Amazon VPC private endpoints. This allows access by clients outside of the VPC without exposing the resources to the internet.
This post shows how API Gateway can be used to expose an application running on Fargate in a private subnet in a VPC using API Gateway private integration through AWS PrivateLink. With the API Gateway private integration, you can enable access to HTTP and HTTPS resources in a VPC without detailed knowledge of private network configurations or technology-specific appliances.
Architecture
You deploy a simple NGINX application running on Fargate within a private subnet as a first step, and then expose this NGINX application to the internet using the API.
As shown in the architecture in the following diagram, you create a VPC with two private subnets and two public subnets. To enable the Fargate tasks to download Docker images from Amazon ECR, you deploy two network address translation (NAT) gateways in the public subnets.
You also deploy a container application, NGINX, as an ECS service with one or more Fargate tasks running inside the private subnets. You provision an internal Network Load Balancer in the VPC private subnets and target the ECS service running as Fargate tasks. This is provisioned using an AWS CloudFormation template (link provided later in this post).
The integration between API Gateway and the Network Load Balancer inside the private subnet uses an API Gateway VpcLink resource. The VpcLink encapsulates connections between the API and targeted VPC resources when the application is hosted on Fargate. You set up an API with the private integration by creating a VpcLink that targets the Network Load Balancer and then using the VpcLink as an integration endpoint.
Deployment
Here are the steps to deploy this solution:
Deploy an application on Fargate.
Set up an API Gateway private integration.
Deploy and test the API.
Clean up resources to avoid incurring future charges.
Step 1 — Deploy an application on AWS Fargate
I’ve created an AWS CloudFormation template to make it easier for you to get started.
In the AWS Management Console, deploy the CloudFormation template in an AWS Region where Fargate and API Gateway are available.
On the Create stack page, specify the parameters specific to your environment. Or, use the default parameters, which deploy an NGINX Docker image as a Fargate task in an ECS cluster across two Availability Zones.
When the process is finished, the status changes to CREATE_COMPLETE and the details of the Network Load Balancer, VPC, subnets, and ECS cluster name appear on the Outputs tab.
Step 2 — Set up an API Gateway private integration
1. Create a VpcLink in API Gateway with the ARN of the Network Load Balancer that you provisioned. Make sure that you specify the correct endpoint URL and Region based on the AWS Region that you selected for the CloudFormation template. Run the following command:
Find the ID value of the RestApi in the returned result. In this example, it is qc83xxxx. Use this ID to finish the operations on the API, including methods and integrations setup.
4. In this example, you create an API with only a GET method on the root resource (/) and integrate the method with the VpcLink.
Set up the GET / method. First, get the identifier of the root resource (/):
For a private integration, you must set connection-type to VPC_LINK and set connection-id to the VpcLink identifier, alnXXYY in this example. The URI parameter is not used to route requests to your endpoint, but is used to set the host header and for certificate validation.
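The integration settings described above can be collected as follows. This is a sketch under stated assumptions: the keys follow the API Gateway PutIntegration parameters as exposed by boto3's `put_integration`, the HTTP_PROXY integration type is an assumption (the post does not say which type it used), and the resource ID and URI are hypothetical placeholders. The API ID and VpcLink ID are the example values from this post.

```python
def private_integration_params(rest_api_id, resource_id, vpc_link_id, uri):
    """Build PutIntegration parameters for GET / through a VpcLink."""
    return {
        "restApiId": rest_api_id,
        "resourceId": resource_id,
        "httpMethod": "GET",
        "type": "HTTP_PROXY",            # assumed integration type
        "integrationHttpMethod": "GET",
        "uri": uri,                      # sets the host header; not used for routing
        "connectionType": "VPC_LINK",
        "connectionId": vpc_link_id,
    }


params = private_integration_params(
    rest_api_id="qc83xxxx",              # example RestApi ID from this post
    resource_id="abc123",                # hypothetical root resource ID
    vpc_link_id="alnXXYY",               # example VpcLink ID from this post
    uri="http://my-nlb.example.com/",    # hypothetical load balancer URI
)
print(params["connectionType"], params["connectionId"])
```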
Step 3 — Deploy and test the API
To test the API, run the following command to deploy the API:
Test the APIs with tools such as Postman or the curl command. To call a deployed API, you must submit requests to the URL for the API Gateway component service for API execution, known as execute-api.
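For reference, the execute-api URL of a deployed REST API is built from the API ID, the Region, and the stage name. The helper below is a small sketch; the Region and stage name are hypothetical, and the API ID is the example value used earlier in this post.

```python
def invoke_url(rest_api_id, region, stage, path="/"):
    """Return the execute-api URL for a deployed REST API stage."""
    return f"https://{rest_api_id}.execute-api.{region}.amazonaws.com/{stage}{path}"


# Hypothetical Region and stage name.
url = invoke_url("qc83xxxx", "us-east-1", "test")
print(url)
```

You can paste the resulting URL into Postman or pass it to curl.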
2. To delete the Fargate-related resources created in CloudFormation, in the console, choose Delete Stack.
Conclusion
API Gateway private endpoints enable use cases for building private API–based services running on Fargate inside your own VPCs. You can take advantage of advanced features of API Gateway, such as custom authorizers, Amazon Cognito User Pools integration, usage tiers, throttling, deployment canaries, and API keys. At the same time, you can make sure the APIs or applications running in Fargate are not exposed to the internet.
You can now share a single AWS Directory Service for Microsoft Active Directory (also known as an AWS Managed Microsoft AD) with multiple AWS accounts within an AWS Region. This capability makes it easier and more cost-effective for you to manage directory-aware workloads from a single directory across accounts and Amazon Virtual Private Clouds (Amazon VPC). Instead of needing to manually domain join your Amazon Elastic Compute Cloud instances (EC2 instances) or create one directory per account and VPC, you can use your directory from any AWS account and from any VPC within an AWS Region.
In this post, I show you how to launch two EC2 instances, each in a separate Amazon VPC within the same AWS account (the directory consumer account), and then seamlessly domain-join both instances to a directory in another account (the directory owner account). You’ll accomplish this in four steps:
Create an AWS Managed Microsoft AD directory.
Establish networking connectivity between VPCs.
Share the directory with the directory consumer account.
Launch Amazon EC2 instances and seamlessly domain join to the directory.
Solution architecture
The following diagram shows the steps you’ll follow to use a single AWS Managed Microsoft AD in multiple accounts. Note that when you complete Step 3, AWS Managed Microsoft AD will create a shared directory in the directory consumer account. The shared directory contains the metadata that enables the EC2 seamless domain join to locate the directory in the directory owner account. Note that there are additional charges for directory sharing.
Step 1: Create an AWS Managed Microsoft AD directory
First, follow the steps to create an AWS Managed Microsoft AD directory in your directory owner AWS account and Amazon VPC. In the examples I use throughout this post, my domain name is example.com, but remember to replace this with your own domain name.
When you create your directory, you’ll have the option in Step 3: Choose VPC and subnets to choose the subnets in which to deploy your domain controllers. AWS Managed Microsoft AD ensures that you select subnets from different Availability Zones. In my example, I have no subnet preference, so I choose No Preference from the Subnets drop-down list.
Figure 2: Selecting Subnet preference
Select Next to review your configuration, and then select Create directory. It can take 20-45 minutes for the directory creation process to finish. While AWS Managed Microsoft AD creates the directory, you can move on to the next step.
Step 2: Establish networking connectivity between VPCs
To domain join your Amazon EC2 instances to your directory, you need to establish networking connectivity between the VPCs. There are multiple methods of establishing networking connectivity between two VPCs. In this post, I’ll show you how to use Amazon VPC peering by performing the following steps:
Create one VPC peering connection between the directory owner VPC-0 and directory consumer VPC-1, then create another connection between the directory owner VPC-0 and directory consumer VPC-2. For reference, here are my own VPC details:
VPC | CIDR block
Directory owner VPC-0 | 172.31.0.0/16
Directory consumer VPC-1 | 10.0.0.0/16
Directory consumer VPC-2 | 10.100.0.0/16
Enable traffic routing between the peered VPCs by adding a route to your VPC route table that points to the VPC peering connection to route traffic to the other VPC in the peering connection. I’ve configured my directory owner VPC-0 route table by adding the following VPC peering connections:
Destination | Target
172.31.0.0/16 | Local
10.0.0.0/16 | pcx-0
10.100.0.0/16 | pcx-1
Configure each of the directory consumer VPC route tables by adding the peering connection with the directory owner VPC-0. If you want, you can also create and attach an Internet Gateway to your directory consumer VPCs. This enables the SSM Agent on instances in the directory consumer VPCs to reach AWS Systems Manager, which performs the domain join. Each directory consumer route table needs a route for the directory owner CIDR block (172.31.0.0/16 in my example) that targets the corresponding peering connection.
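To check how the directory owner VPC-0 route table resolves a destination, here is a small Python sketch that picks the most specific matching route. The routes are the ones listed for VPC-0 in this post; the function name and the destination addresses are hypothetical, and the lookup is a simplified model of VPC routing.

```python
import ipaddress

# Directory owner VPC-0 route table from this post.
OWNER_ROUTES = {
    "172.31.0.0/16": "local",
    "10.0.0.0/16": "pcx-0",
    "10.100.0.0/16": "pcx-1",
}


def next_hop(routes, dest_ip):
    """Return the target of the most specific route matching dest_ip, or None."""
    addr = ipaddress.ip_address(dest_ip)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix wins when several routes match.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop(OWNER_ROUTES, "10.0.5.20"))    # an address in VPC-1
print(next_hop(OWNER_ROUTES, "10.100.1.1"))   # an address in VPC-2
print(next_hop(OWNER_ROUTES, "172.31.9.9"))   # an address in VPC-0
```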
Step 3: Share the directory with the directory consumer account
Now that your networking is in place, you must make your directory visible to the directory consumer account. You can accomplish this by sharing your directory with the directory consumer account. Directory sharing works at the account level, which also makes the directory visible to all VPCs within the directory consumer account.
AWS Organizations makes it easier to share the directory within your organization because you can browse and validate the directory consumer accounts. To use this option, your organization must have all features enabled, and your directory must be in the organization master account. This method of sharing simplifies your setup because it doesn’t require the directory consumer accounts to accept your directory sharing request.
Handshake enables directory sharing when you aren’t using AWS Organizations. The handshake method requires the directory consumer account to accept the directory sharing request.
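The difference between the two sharing methods can be sketched as a request builder. The keys follow the AWS Directory Service ShareDirectory API as exposed by boto3's `share_directory`; the helper names, directory ID, and account ID are hypothetical.

```python
def share_request(directory_id, consumer_account_id, in_organization):
    """Build ShareDirectory parameters for one directory consumer account."""
    return {
        "DirectoryId": directory_id,
        "ShareTarget": {"Id": consumer_account_id, "Type": "ACCOUNT"},
        # Use ORGANIZATIONS inside your organization, HANDSHAKE otherwise.
        "ShareMethod": "ORGANIZATIONS" if in_organization else "HANDSHAKE",
    }


def needs_acceptance(request):
    """The handshake method requires the consumer account to accept the request."""
    return request["ShareMethod"] == "HANDSHAKE"


# Hypothetical directory and account IDs.
org_share = share_request("d-1234567890", "111122223333", in_organization=True)
handshake_share = share_request("d-1234567890", "444455556666", in_organization=False)
print(needs_acceptance(org_share), needs_acceptance(handshake_share))
```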
In my example, I’ll walk you through the steps to use AWS Organizations to share a directory:
Open the AWS Management Console, then select Directory Service and select the directory you want to share (in my case, example.com). Select the Actions button, and then the Share directory option.
Select Share this directory with AWS accounts inside your organization, then choose the Enable Access to AWS Organizations button. This allows your AWS account to list all accounts in your Organizations in the AWS Directory Service console.
Select your directory consumer account (in my example, Consumer Example) from the Organization accounts browser, then select the Add button.
Figure 3: Select the account and then select “Add”
You should now be able to see your directory consumer account in the Selected Accounts table. Select the Share button to share your directory with that account:
Figure 4: Selected accounts and the “Share” button
To share your directory with multiple directory consumer accounts, you can repeat steps 3 and 4 for each account.
When you’re finished sharing, AWS Managed Microsoft AD will create a shared directory in each directory consumer account. The shared directory contains the metadata to locate the directory in the directory owner account. Each shared directory has a unique identifier (Shared directory ID). After you’ve shared your directory, you can find your shared directory IDs in the Scale & Share tab in the AWS Directory Service console. In my example, AWS Managed Microsoft AD created the shared directory ID d-90673f8d56 in the Consumer Example account:
Figure 5: Confirmation notification about successful sharing
You can see the shared directory details in your directory consumer account by opening the AWS Management Console, choosing Directory Service, selecting the Directories shared with me option in the left menu, and then choosing the appropriate Shared directory ID link:
Figure 6: Shared account details example
Step 4: Launch Amazon EC2 instances and seamlessly domain join to the directory
Now that you’ve established the networking between your VPCs and shared the directory, you’re ready to launch EC2 instances in your directory consumer VPCs and seamlessly domain join to your directory. In my example, I use the Amazon EC2 console but you can also use AWS Systems Manager.
Follow the prompts of the Amazon EC2 launch instance wizard to select a Windows Server instance type. When you reach Step 3: Configure Instance Details, select the shared directory that locates your domain in the directory owner account. (I’ve chosen d-926726739b, which will locate the domain example.com.) Then select the EC2DomainJoin IAM role. Choose the Review and Launch button, and then the Launch button on the following screen.
Figure 7: The “Review and Launch” button
Now that you’ve joined your Amazon EC2 instance to the domain, you can log into your instance using a Remote Desktop Protocol (RDP) client with the credentials from your AD user account.
You can then install and run AD-aware workloads such as Microsoft SharePoint on the instance, and the application will use your directory. To launch your second instance, just repeat Step 4: Launch Amazon EC2 instances and seamlessly domain join to the directory, selecting the VPC-2 instead of VPC-1. This makes it easier and quicker for you to deploy and manage EC2 instances using the credentials from a single AWS Managed Microsoft AD directory across multiple accounts and VPCs.
Summary
In this blog post, I demonstrated how to seamlessly domain join Amazon EC2 instances from multiple accounts and VPCs to a single AWS Managed Microsoft AD directory. By sharing the directory with multiple accounts, you can simplify the management and deployment of directory-aware workloads on Amazon EC2 instances. This eliminates the need to manually domain join the instances or create one directory per account and VPC. In addition, with AWS Managed Microsoft AD and AWS Systems Manager, you can automate your Amazon EC2 deployments and seamlessly domain join to your single directory from any account and VPC, without writing PowerShell code that calls the AWS Command Line Interface or APIs.
One of the biggest trends in application development today is the use of APIs to power the backend technologies supporting a product. Increasingly, mobile, IoT, and web applications and internal services talk to each other and to application frontends through an API.
Alongside this trend of building API-powered applications is the move to a microservices application design pattern. A larger application is represented by many smaller application components, which also typically communicate via APIs. Companies of all kinds, from startups to enterprises, are adopting APIs and microservices together. The number of tools required to manage APIs at scale, securely, and with minimal operational overhead is growing as well.
Today, we’re excited to announce the launch of Amazon API Gateway private endpoints. This has been one of the most heavily requested features for this service. We believe this is going to make creating and managing private APIs even easier.
API Gateway overview
When API Gateway first launched, it came with what are now known as edge-optimized endpoints. These publicly facing endpoints came fronted with Amazon CloudFront, a global content delivery network with over 100 points of presence today.
Edge-optimized endpoints helped you reduce latency to clients accessing your API on the internet from anywhere; typically, mobile, IoT, or web-based applications. Behind API Gateway, you could back your API with a number of options for backend technologies: AWS Lambda, Amazon EC2, Elastic Load Balancing products such as Application Load Balancers or Classic Load Balancers, Amazon DynamoDB, Amazon Kinesis, or any publicly available HTTPS-based endpoint.
In February 2016, AWS launched the ability for AWS Lambda functions to access resources inside of an Amazon VPC. With this launch, you could build API-based services that did not require a publicly available endpoint. They could still interact with private services, such as databases, inside your VPC.
In November 2017, API Gateway launched regional API endpoints, which are publicly available endpoints without any preconfigured CDN in front of them. Regional endpoints are great for helping to reduce request latency when API requests originate from the same Region as your REST API. You can also configure your own CDN distribution, which allows you to protect your public APIs with AWS WAF, for example. With regional endpoints, nothing changed about the backend technologies supported.
At re:Invent 2017, we announced endpoint integrations inside a private VPC. With this capability, you can now have your backend running on EC2 be private inside your VPC without the need for a publicly accessible IP address or load balancer. Beyond that, you can also now use API Gateway to front APIs hosted by backends that exist privately in your own data centers, using AWS Direct Connect links to your VPC. Private integrations were made possible via VPC Link and Network Load Balancers, which support backends such as EC2 instances, Auto Scaling groups, and Amazon ECS using the Fargate launch type.
Today’s launch solves one of the missing pieces of the puzzle, which is the ability to have private API endpoints inside your own VPC. With this new feature, you can still use API Gateway features, while securely exposing REST APIs only to the other services and resources inside your VPC, or those connected via Direct Connect to your own data centers.
Here’s how this works.
API Gateway private endpoints are made possible via AWS PrivateLink interface VPC endpoints. Interface endpoints work by creating elastic network interfaces in subnets that you define inside your VPC. Those network interfaces then provide access to services running in other VPCs, or to AWS services such as API Gateway. When configuring your interface endpoints, you specify which service traffic should go through them. When using private DNS, all traffic to that service is directed to the interface endpoint instead of through a default route, such as through a NAT gateway or public IP address.
API Gateway as a fully managed service runs its infrastructure in its own VPCs. When you interface with API Gateway publicly accessible endpoints, it is done through public networks. When they’re configured as private, the public networks are not made available to route your API. Instead, your API can only be accessed using the interface endpoints that you have configured.
Some things to note:
Because you configure the subnets in which your endpoints are made available, you control the availability of the access to your API Gateway hosted APIs. Make sure that you provide multiple interfaces in your VPC. In the above diagram, there is one endpoint in each subnet in each Availability Zone for which the VPC is configured.
Each endpoint is an elastic network interface configured in your VPC that has security groups configured. Network ACLs apply to the network interface as well.
This VPC will have two private and two public subnets, one of each in an AZ, as seen in the CloudFormation Designer.
Name the stack “PrivateAPIDemo”.
Set the Environment to “Demo”. This has no real effect beyond tagging and naming certain resources accordingly.
Choose Next.
On the Options page, leave all of the defaults and choose Next.
On the Review page, choose Create. It takes just a few moments for all of the resources in this template to be created.
After the VPC has a status of “CREATE_COMPLETE”, choose Outputs and make note of the values for VpcId, both public and private subnets 1 and 2, and the endpoint security group.
Make sure that you are in the same Region in which you just created the above stack.
In the left navigation pane, choose Endpoints, Create Endpoint.
For Service category, keep it set to “AWS Services”.
For Service Name, set it to “com.amazonaws.{region}.execute-api”.
For VPC, select the one created earlier.
For Subnets, select the two private labeled subnets from this VPC created earlier, one in each Availability Zone. You can find them labeled as “privateSubnet01” and “privateSubnet02”.
For Enable Private DNS Name, keep it checked as Enabled for this endpoint.
For Security Group, select the group named “EndpointSG”. It allows for HTTPS access to the endpoint for the entire VPC IP address range.
Choose Create Endpoint.
Creating the endpoint takes a few moments to go through all of the interface endpoint lifecycle steps. You need the DNS names later so note them now.
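The same console settings can also be expressed as parameters to the EC2 `create_vpc_endpoint` API. The following is a minimal sketch, not part of the original walkthrough: the helper name and the placeholder IDs are assumptions, and the actual API call is left as a comment so that the block runs without AWS credentials.

```python
def execute_api_endpoint_params(region, vpc_id, subnet_ids, security_group_id):
    """Build keyword arguments for EC2's create_vpc_endpoint call.

    Mirrors the console choices above: an interface endpoint for the
    execute-api service, placed in the private subnets, with private DNS on.
    """
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.execute-api",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": [security_group_id],
        "PrivateDnsEnabled": True,
    }


# Placeholder IDs (assumptions); use the values from your stack's Outputs tab.
params = execute_api_endpoint_params(
    region="us-east-1",
    vpc_id="vpc-EXAMPLE",
    subnet_ids=["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
    security_group_id="sg-EXAMPLE",
)
print(params["ServiceName"])

# With boto3 installed and credentials configured, you could then call:
#   boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**params)
```

Keeping the parameters in a plain dictionary makes it easy to review them before any resource is created.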
Open the API Gateway console in the same Region as the VPC and private endpoint.
Choose Create API, Example API.
For Endpoint Type, choose Private.
Choose Import.
Before deploying the API, create a resource policy to allow access to the API from inside the VPC.
In the left navigation pane, choose Resource Policy.
Choose Source VPC Whitelist from the three examples possible.
Replace {{vpceID}} with the ID of your VPC endpoint.
Choose Save.
In the left navigation pane, select the new API and choose Actions, Deploy API.
Choose [New Stage].
Name the stage demo.
Choose Deploy.
Your API is now fully deployed and available from inside your VPC. Next, test to confirm that it’s working.
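For reference, a resource policy that limits invocation to a single VPC endpoint can be sketched as follows. This is an illustration under assumptions, not a copy of the console's example: the function name is hypothetical, and the policy shape (a Deny for traffic that does not arrive through the endpoint, followed by a general Allow) is one common way to write it using the `aws:sourceVpce` condition key.

```python
import json


def vpce_only_policy(vpce_id):
    """Return a resource policy that only admits calls arriving through
    the given VPC endpoint (vpce_id plays the role of {{vpceID}})."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject anything that did not come through the endpoint.
                "Effect": "Deny",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
                "Condition": {
                    "StringNotEquals": {"aws:sourceVpce": vpce_id}
                },
            },
            {
                # Allow everything that was not rejected above.
                "Effect": "Allow",
                "Principal": "*",
                "Action": "execute-api:Invoke",
                "Resource": "execute-api:/*",
            },
        ],
    }


# "vpce-EXAMPLE" is a placeholder; use your own endpoint ID.
policy_json = json.dumps(vpce_only_policy("vpce-EXAMPLE"), indent=2)
print(policy_json)
```

Remember that a change to the resource policy only takes effect after the API is deployed again.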
Test the API
To emphasize the “privateness” of this API, test it from a resource that only lives inside your VPC and has no direct network access to it, in the traditional networking sense.
Launch a Lambda function inside the VPC, with no public access. To show its ability to hit the private API endpoint, invoke it using the console. The function is launched inside the private subnets inside the VPC without access to a NAT gateway, which would be required for any internet access. This works because Lambda functions are invoked using the service API, not any direct network access to the function’s underlying resources inside your VPC.
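To make the idea concrete, here is a minimal Python sketch of a function that calls the private API through the VPC endpoint. It is not the code from the template: the event keys and function names are assumptions, and it relies on sending the request to the endpoint's DNS name while passing the API's own hostname in the Host header.

```python
import json
import urllib.request


def build_private_api_request(vpce_dns, api_dns, stage, path):
    """Build a GET request aimed at the VPC endpoint, with the API's own
    hostname in the Host header so API Gateway can route it."""
    url = f"https://{vpce_dns}/{stage}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Host": api_dns}, method="GET")


def handler(event, context):
    # Hypothetical event keys; the walkthrough's template takes these
    # values as stack parameters instead.
    request = build_private_api_request(
        event["vpceDns"], event["apiDns"], "demo", "/pets"
    )
    # This call only succeeds from inside the VPC that hosts the endpoint.
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating the request construction from the network call keeps the routing logic easy to check without any connectivity.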
To create a Lambda function using CloudFormation, choose Launch stack.
All the code for this function is located inside of the template and the template creates just three resources, as shown in the diagram from Designer:
A Lambda function
An IAM role
A VPC security group
Name the template LambdaTester, or something easy to remember.
For the first parameter, enter a DNS name from your VPC endpoint. These can be found in the Amazon VPC console under Endpoints. For this example, use the endpoints that start with “vpce”. These are the private DNS names for them.
For the API Gateway endpoint DNS, see the dashboard for your API Gateway API and copy the URL from the top of the page. Use just the endpoint DNS, not the “https://” or “/demo/” at the end.
Select the same value for Environment as you did earlier in creating your VPC.
Choose Next.
Leave all options as the default values and choose Next.
Select the check box next to I acknowledge that… and choose Create.
When your stack reaches the “CREATE_COMPLETE” state, choose Resources.
To go to the Lambda console for this function, choose the Physical ID of the AWS::Lambda::Function resource.
Note: If you chose a different environment than “Demo” for this example, modify the line “path: ‘/demo/pets’,” to the appropriate value.
Choose Test in the top right of the Lambda console. You are prompted to create a test event to pass the function. Because you don’t need to pass anything here for the function to call the internal API, you can create a blank payload or leave the default as shown. Choose Save.
Choose Test again. This invokes the function and passes in the payload that you just saved. It takes just a few moments for the new function’s environment to spin to life and to call the code configured for it. You should now see the results of the API call to the PetStore API.
The JSON returned is from your API Gateway powered private API endpoint. Visit the API Gateway console to see activity on the dashboard and confirm again that this API was called by the Lambda function, as in the following screenshot:
Cleanup
Cleaning up from this demo requires a few simple steps:
Delete the stack for your Lambda function.
Delete the VPC endpoint.
Delete the API Gateway API.
Delete the VPC stack that you created first.
Conclusion
API Gateway private endpoints enable use cases for building private API–based services inside your own VPCs. You can now keep both the frontend to your API (API Gateway) and the backend service (Lambda, EC2, ECS, etc.) private inside your VPC. Or you can reach them from networks connected through Direct Connect, without the need to expose them to the internet in any way. All of this without the need to manage the infrastructure that powers the API gateway itself!
You can continue to use the advanced features of API Gateway such as custom authorizers, Amazon Cognito User Pools integration, usage tiers, throttling, deployment canaries, and API keys.
We believe that this feature greatly simplifies the growth of API-based microservices. We look forward to your feedback here, on social media, or in the AWS forums.
Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!
AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PT – Episode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.
Compute
Containers
June 25, 2018 | 09:00 AM – 09:45 AM PT – Running Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS, including how to set up masters, networking, and security, and how to add auto scaling to your cluster.
June 19, 2018 | 11:00 AM – 11:45 AM PT – Launch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new
June 21, 2018 | 01:00 PM – 01:45 PM PT – Enabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.
June 28, 2018 | 01:00 PM – 01:45 PM PT – Fireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.
IoT
June 27, 2018 | 11:00 AM – 11:45 AM PT – AWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.
Mobile
June 25, 2018 | 11:00 AM – 11:45 AM PT – Drive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.
June 26, 2018 | 11:00 AM – 11:45 AM PT – Deep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connect your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PT – Changing the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PT – Big Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.
Amazon QuickSight is a fully managed cloud business intelligence system that gives you Fast & Easy to Use Business Analytics for Big Data. QuickSight makes business analytics available to organizations of all shapes and sizes, with the ability to access data that is stored in your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, flat files in S3, and (via connectors) data stored in on-premises MySQL, PostgreSQL, and SQL Server databases. QuickSight scales to accommodate tens, hundreds, or thousands of users per organization.
Today we are launching a new, session-based pricing option for QuickSight, along with additional region support and other important new features. Let’s take a look at each one:
Pay-per-Session Pricing
Our customers are making great use of QuickSight and take full advantage of the power it gives them to connect to data sources, create reports, and explore visualizations.
However, not everyone in an organization needs or wants such powerful authoring capabilities. Having access to curated data in dashboards and being able to interact with the data by drilling down, filtering, or slicing-and-dicing is more than adequate for their needs. Subscribing them to a monthly or annual plan can be seen as an unwarranted expense, so a lot of such casual users end up not having access to interactive data or BI.
In order to allow customers to provide all of their users with interactive dashboards and reports, the Enterprise Edition of Amazon QuickSight now allows Reader access to dashboards on a Pay-per-Session basis. QuickSight users are now classified as Admins, Authors, or Readers, with distinct capabilities and prices:
Authors have access to the full power of QuickSight; they can establish database connections, upload new data, create ad hoc visualizations, and publish dashboards, all for $9 per month (Standard Edition) or $18 per month (Enterprise Edition).
Readers can view dashboards, slice and dice data using drill downs, filters and on-screen controls, and download data in CSV format, all within the secure QuickSight environment. Readers pay $0.30 for 30 minutes of access, with a monthly maximum of $5 per reader.
Admins have all authoring capabilities, and can manage users and purchase SPICE capacity in the account.
The QuickSight admin now has the ability to set the desired option (Author or Reader) when they invite members of their organization to use QuickSight. They can extend Reader invites to their entire user base without incurring any up-front or monthly costs, paying only for the actual usage.
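As a quick worked example of the Reader pricing, the monthly charge can be modeled as the per-session price times the number of sessions, capped at the monthly maximum. This sketch assumes that every 30-minute session is billed as one unit at $0.30; the function and constant names are illustrative only.

```python
SESSION_PRICE_CENTS = 30   # $0.30 per 30-minute session
MONTHLY_CAP_CENTS = 500    # $5 maximum per reader per month


def reader_monthly_cost_cents(sessions):
    """Return one reader's monthly charge in cents for a session count."""
    if sessions < 0:
        raise ValueError("sessions must not be negative")
    return min(sessions * SESSION_PRICE_CENTS, MONTHLY_CAP_CENTS)


for sessions in (4, 16, 17, 40):
    cost = reader_monthly_cost_cents(sessions) / 100
    print(f"{sessions} sessions: ${cost:.2f}")
```

Under these assumptions the cap is reached at the 17th session, so a heavy reader never costs more than $5 in a month.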
A New Region
QuickSight is now available in the Asia Pacific (Tokyo) Region:
The UI is in English, with a localized version in the works.
Hourly Data Refresh
Enterprise Edition SPICE data sets can now be set to refresh as frequently as every hour. In the past, each data set could be refreshed up to 5 times a day. To learn more, read Refreshing Imported Data.
Access to Data in Private VPCs
This feature was launched in preview form late last year, and is now available in production form to users of the Enterprise Edition. As I noted at the time, you can use it to implement secure, private communication with data sources that do not have public connectivity, including on-premises data in Teradata or SQL Server, accessed over an AWS Direct Connect link. To learn more, read Working with AWS VPC.
Parameters with On-Screen Controls
QuickSight dashboards can now include parameters that are set using on-screen dropdown, text box, numeric slider or date picker controls. The default value for each parameter can be set based on the user name (QuickSight calls this a dynamic default). You could, for example, set an appropriate default based on each user’s office location, department, or sales territory. Here’s an example:
URL Actions for Linked Dashboards
You can now connect your QuickSight dashboards to external applications by defining URL actions on visuals. The actions can include parameters, and become available in the Details menu for the visual. URL actions are defined like this:
You can use this feature to link QuickSight dashboards to third party applications (e.g. Salesforce) or to your own internal applications. Read Custom URL Actions to learn how to use this feature.
Dashboard Sharing
You can now share QuickSight dashboards across every user in an account.
Larger SPICE Tables
The per-data set limit for SPICE tables has been raised from 10 GB to 25 GB.
Upgrade to Enterprise Edition
The QuickSight administrator can now upgrade an account from Standard Edition to Enterprise Edition with a click. This enables provisioning of Readers with pay-per-session pricing, private VPC access, row-level security for dashboards and data sets, and hourly refresh of data sets. Enterprise Edition pricing applies after the upgrade.
Available Now
Everything I listed above is available now and you can start using it today!
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
Column name – Column description
RateCodeID – Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount – Represents the time-and-distance fare calculated by the meter.
TripDistance – Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Functions state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda function.
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
tripdata.csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. After this point, you have two different datasets, both aggregated by rate_code. Finally, the job uses the rate_code field to join two datasets and output the entire rate code status in a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from first Spark job and represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
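The logic of the two jobs can be illustrated without Spark. The following plain-Python sketch is not the code in spark-taxi.jar: it uses a tiny made-up sample, assumes the CSV header uses the column names from the table above, and simply shows the two aggregations followed by the join on the rate code.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows for illustration only (not real TLC data).
SAMPLE = """RateCodeID,FareAmount,TripDistance
1,10.0,2.5
1,7.5,1.5
2,52.0,17.0
"""


def total_by_rate_code(rows, column):
    """Sum one numeric column per rate code."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["RateCodeID"]] += float(row[column])
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# First job: total trip distance for each rate code.
miles_per_rate = total_by_rate_code(rows, "TripDistance")

# Second job: total fare per rate code, joined with the first job's output.
fare_per_rate = total_by_rate_code(rows, "FareAmount")
rate_code_status = [
    {
        "rate_code_id": code,
        "total_distance": miles_per_rate[code],
        "total_fare_amount": fare_per_rate[code],
    }
    for code in sorted(miles_per_rate)
    if code in fare_per_rate
]
print(rate_code_status)
```

In the real pipeline the first result is written to Parquet and read back by the second job, which is what creates the dependency between them.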
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples of how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, change localhost to the hostname of the EMR master node. Port 8998 must be open to that remote instance through the security group.
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as shown following:
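A request body for Livy's POST /batches call can be sketched in Python as follows. The `file`, `className`, and `args` fields are part of the Livy batch API; the helper name, the class name `com.example.MilesPerRateCode`, and the bucket placeholder are assumptions made for illustration.

```python
import json


def livy_batch_payload(root_path, class_name):
    """Build the JSON body for POST /batches.

    The jar location follows the layout used in this post, and the S3
    root path is handed to the Spark application through args.
    """
    return {
        "file": f"{root_path}/emr-step-functions/spark-taxi.jar",
        "className": class_name,  # hypothetical class name
        "args": [root_path],
    }


payload = livy_batch_payload("s3://<<your-bucket>>", "com.example.MilesPerRateCode")
body = json.dumps(payload)
print(body)

# The body would be sent to http://<EMR master hostname>:8998/batches
# with the header Content-Type: application/json.
```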
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors, so your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a state machine that uses different types of states to build the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state. So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRateCode, which simply submits a Spark job:
"MilesPerRate Job": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
  "ResultPath": "$.jobId",
  "Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state. The Next field tells the state machine which state to go to next.
The following code section shows the Lambda function, which is used to submit the MilesPerRateCode job. It uses the Python requests library to submit a POST against the Livy endpoint hosted on Amazon EMR and passes the required parameters in JSON format through payload. It then parses the response, grabs id from the response, and returns it.
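A minimal sketch of such a submit function is shown here. It is an approximation, not the author's code: it uses `urllib` from the standard library instead of requests so that it runs without extra packages, and the Livy hostname and Spark class name are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical endpoint; in practice this is the EMR master node on port 8998.
LIVY_BATCHES_URL = "http://emr-master.example.internal:8998/batches"


def post_json(url, payload):
    """POST a JSON document and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def lambda_handler(event, context, post=post_json):
    """Submit the MilesPerRateCode job and return the ID assigned by Livy.

    The `post` parameter exists only so the HTTP call can be replaced in
    tests; Lambda itself passes just event and context.
    """
    root_path = event["rootPath"]
    payload = {
        "file": f"{root_path}/emr-step-functions/spark-taxi.jar",
        "className": "com.example.MilesPerRateCode",  # hypothetical
        "args": [root_path],
    }
    response = post(LIVY_BATCHES_URL, payload)
    return response["id"]
```

Because the Task state sets ResultPath to $.jobId, the returned ID is added to the state input under the jobId key for the later states to use.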
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Here is the Task state in the state machine that checks the Spark job status:
Just like other states, the preceding Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. The following is the Lambda function that checks the Spark job status based on a given session ID:
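A status-check function along these lines might look like the following sketch. It assumes that the ID captured earlier is available in the event as jobId and that the job was submitted through POST /batches, so its state can be read from GET /batches/{id}. The hostname is a hypothetical placeholder, and `urllib` is used in place of requests.

```python
import json
import urllib.request

# Hypothetical endpoint; in practice this is the EMR master node on port 8998.
LIVY_BATCHES_URL = "http://emr-master.example.internal:8998/batches"


def get_json(url):
    """GET a URL and return the decoded JSON response."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def lambda_handler(event, context, get=get_json):
    """Return the current Livy state of a job, for example success or dead.

    The `get` parameter exists only so the HTTP call can be replaced in
    tests; Lambda itself passes just event and context.
    """
    job_id = event["jobId"]
    response = get(f"{LIVY_BATCHES_URL}/{job_id}")
    return response["state"]
```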
In the Choice state, it checks the Spark job status value, compares it with a predefined state status, and transitions the state based on the result. For example, if the status is success, it moves to the next state (RateCodeStatus Job), and if it is dead, it moves to the MilesPerRate job failed state.
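The wait, check, and branch pattern can be sketched as a Python dictionary that serializes to Amazon States Language. This is an illustration rather than the template's definition: the names of the status-check and Choice states, the wait time, and the Lambda ARN placeholder are assumptions, while the wait, next-job, and failed state names follow the ones used in this post.

```python
import json


def status_check_states(job_name, next_on_success, wait_seconds=30):
    """Build Wait -> Task -> Choice states that poll a Spark job's status."""
    wait_name = f"Wait for {job_name} job to complete"
    check_name = f"Query {job_name} job status"  # hypothetical state name
    choice_name = f"{job_name} job complete?"    # hypothetical state name
    failed_name = f"{job_name} job failed"
    return {
        wait_name: {"Type": "Wait", "Seconds": wait_seconds, "Next": check_name},
        check_name: {
            "Type": "Task",
            # Placeholder ARN for the status-check Lambda function.
            "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:STATUS_FUNCTION",
            "ResultPath": "$.jobStatus",
            "Next": choice_name,
        },
        choice_name: {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.jobStatus", "StringEquals": "success",
                 "Next": next_on_success},
                {"Variable": "$.jobStatus", "StringEquals": "dead",
                 "Next": failed_name},
            ],
            # Any other status means the job is still running: wait again.
            "Default": wait_name,
        },
        failed_name: {
            "Type": "Fail",
            "Error": f"{job_name}Failed",
            "Cause": f"The {job_name} Spark job did not finish successfully",
        },
    }


states = status_check_states("MilesPerRate", "RateCodeStatus Job")
print(json.dumps(states, indent=2))
```

The Default branch is what turns the three states into a polling loop: anything other than success or dead sends the execution back to the Wait state.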
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters during initiation.
Parameter – Description
ClusterSubnetID – The subnet where the Amazon EMR cluster is deployed and Lambda is configured to talk to this subnet.
KeyName – The name of the existing EC2 key pair to access the Amazon EMR cluster.
VPCID – The ID of the virtual private cloud (VPC) where the EMR cluster is deployed and Lambda is configured to talk to this VPC.
S3RootPath – The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. And if you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following list of AWS resources.
Logical ID (Resource Type) – Description
StepFunctionsStateExecutionRole (IAM role) – IAM role to execute the state machine and have a trust relationship with the states service.
SparkETLStateMachine (AWS Step Functions state machine) – State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup (Amazon EC2 security group) – Security group that is used for the Lambda function to call the Livy API.
RateCodeStatusJobSubmitFunction (AWS Lambda function) – Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction (AWS Lambda function) – Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction (AWS Lambda function) – Lambda function to check the Spark job status.
LambdaStateMachineRole (IAM role) – IAM role for all Lambda functions to use the Lambda trust relationship.
EMRCluster (Amazon EMR cluster) – EMR cluster where Livy is running and where the job is placed.
During the AWS CloudFormation deployment phase, it sets up S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, whereas spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in the AWS account s3://tm-app-demos for the S3 root path.
If the CloudFormation template completed successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
  "rootPath": "s3://tm-app-demos"
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
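The same execution can be started programmatically. The sketch below assumes a boto3 Step Functions client is passed in, and the state machine ARN is a placeholder; the function only wraps the `start_execution` call with the JSON input described above.

```python
import json


def start_etl_execution(sfn_client, state_machine_arn, root_path):
    """Start the Spark ETL state machine with rootPath as its input.

    sfn_client is expected to be a Step Functions client, for example
    boto3.client("stepfunctions").
    """
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps({"rootPath": root_path}),
    )


# Example usage (requires boto3 and AWS credentials; the ARN is a placeholder):
#   import boto3
#   start_etl_execution(
#       boto3.client("stepfunctions"),
#       "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:Spark-ETL-State-Machine",
#       "s3://tm-app-demos",
#   )
```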
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to the <<rootPath>>/emr-step-functions/rate-code-status/ folder. It also transitions to the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
The following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine for the second time, it fails because the S3 output path already exists. The state machine turns red and stops at MilesPerRate job failed. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assign that Lambda function to a Fail state. That way, when any part of your state machine fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
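As a rough sketch of the last idea, a Parallel state can be generated from a mapping of branch names to Lambda ARNs. The branch names and ARNs below are placeholders, and each branch is reduced to a single Task for brevity; a real branch would also include the wait and status-check states.

```python
import json


def parallel_submit_state(branch_tasks, next_state):
    """Build a Parallel state with one single-Task branch per Spark job."""
    return {
        "Type": "Parallel",
        "Branches": [
            {
                "StartAt": name,
                "States": {name: {"Type": "Task", "Resource": arn, "End": True}},
            }
            for name, arn in branch_tasks.items()
        ],
        "Next": next_state,
    }


# Placeholder job names and ARNs (assumptions for illustration).
state = parallel_submit_state(
    {
        "Job A": "arn:aws:lambda:REGION:ACCOUNT_ID:function:JOB_A_SUBMIT",
        "Job B": "arn:aws:lambda:REGION:ACCOUNT_ID:function:JOB_B_SUBMIT",
    },
    next_state="All jobs submitted",
)
print(json.dumps(state, indent=2))
```

The Parallel state moves on to its Next state only after every branch has finished.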
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrog Artifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository and a JFrog Artifactory host, and would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormation template to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the Node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
The custom action definition describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, of version ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
The custom job worker polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
{
    "category": "Deploy",
    "configurationProperties": [
        {
            "name": "TypeOfArtifact",
            "required": true,
            "key": true,
            "secret": false,
            "description": "Package type, ex. npm for node packages",
            "type": "String"
        },
        {
            "name": "RepoKey",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Name of the repository in which this artifact should be stored"
        },
        {
            "name": "UserName",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Username for authenticating with the repository"
        },
        {
            "name": "Password",
            "required": true,
            "key": true,
            "secret": true,
            "type": "String",
            "description": "Password for authenticating with the repository"
        },
        {
            "name": "EmailAddress",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Email address used to authenticate with the repository"
        },
        {
            "name": "ArtifactoryHost",
            "required": true,
            "key": true,
            "secret": false,
            "type": "String",
            "description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
        }
    ],
    "provider": "Artifactory",
    "version": "1",
    "settings": {
        "entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
    },
    "inputArtifactDetails": {
        "maximumCount": 5,
        "minimumCount": 1
    },
    "outputArtifactDetails": {
        "maximumCount": 5,
        "minimumCount": 0
    }
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, or Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
Artifactory host name or IP address.
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
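Before registering a definition like the one above, it can help to sanity-check it. The following is a minimal sketch, not part of the original worker: check_action_definition and register_action are names I made up, and the set of keys treated as required reflects what this example expects rather than an official rule. The register_action function assumes the JSON keys map directly onto the parameters of the boto3 create_custom_action_type call, which is the case for the definition shown in this post.

```python
import json

# Keys this example expects to find in the definition (an assumption of the sketch).
EXPECTED_KEYS = {
    "category", "provider", "version",
    "configurationProperties", "inputArtifactDetails", "outputArtifactDetails",
}
VALID_CATEGORIES = {"Source", "Build", "Deploy", "Test", "Invoke", "Approval"}


def check_action_definition(definition):
    """Return a list of problems found in a custom action definition dict."""
    problems = []
    missing = EXPECTED_KEYS - set(definition)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if definition.get("category") not in VALID_CATEGORIES:
        problems.append("unknown category: %r" % definition.get("category"))
    for details_key in ("inputArtifactDetails", "outputArtifactDetails"):
        details = definition.get(details_key, {})
        if details.get("minimumCount", 0) > details.get("maximumCount", 0):
            problems.append(details_key + ": minimumCount exceeds maximumCount")
    return problems


def register_action(definition_path):
    """Create the custom action type from a JSON file (needs AWS credentials)."""
    import boto3  # imported here so the checks above run without the AWS SDK

    with open(definition_path) as f:
        definition = json.load(f)
    problems = check_action_definition(definition)
    if problems:
        raise ValueError("; ".join(problems))
    return boto3.client("codepipeline").create_custom_action_type(**definition)


example = {
    "category": "Deploy",
    "provider": "Artifactory",
    "version": "1",
    "configurationProperties": [],
    "inputArtifactDetails": {"minimumCount": 1, "maximumCount": 5},
    "outputArtifactDetails": {"minimumCount": 0, "maximumCount": 5},
}
print(check_action_definition(example))
```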
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
Use the user-defined Artifactory user name and password to request a temporary API key from Artifactory.
To avoid having to write the password to a file, use that temporary API key and user name to authenticate with the NPM repository.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm --enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
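import time

import boto3
from botocore.exceptions import ClientError

# CodePipeline client shared by the worker functions
codepipeline = boto3.client('codepipeline')

def action_type():
    # Must match the category, provider, and version of the custom action definition
    ActionType = {
        'category': 'Deploy',
        'owner': 'Custom',
        'provider': 'Artifactory',
        'version': '1'}
    return(ActionType)

def poll_for_jobs():
    try:
        artifactory_action_type = action_type()
        print(artifactory_action_type)
        jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        # Keep polling every 10 seconds until a job appears in the queue
        while not jobs['jobs']:
            time.sleep(10)
            jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
            if jobs['jobs']:
                print('Job found')
        return jobs['jobs'][0]
    except ClientError as e:
        print("Received an error: %s" % str(e))
        raise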
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
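To make the shape of that job concrete, here is a small sketch that pulls the values the worker needs out of a single job. The unpack_job helper and the sample values are hypothetical; the nested key names follow the PollForJobs response structure as I understand it, so verify them against the API reference before relying on them.

```python
def unpack_job(job):
    """Extract the fields the worker needs from one PollForJobs job entry."""
    data = job["data"]
    # This sketch only looks at the first input artifact.
    s3_location = data["inputArtifacts"][0]["location"]["s3Location"]
    credentials = data["artifactCredentials"]
    return {
        "job_id": job["id"],
        "nonce": job["nonce"],
        "bucket": s3_location["bucketName"],
        "object_key": s3_location["objectKey"],
        "access_key": credentials["accessKeyId"],
        "secret_key": credentials["secretAccessKey"],
        "session_token": credentials["sessionToken"],
        "configuration": data["actionConfiguration"]["configuration"],
    }


# Made-up job entry with the same nesting as a real response.
sample_job = {
    "id": "example-job-id",
    "nonce": "3",
    "data": {
        "actionConfiguration": {
            "configuration": {"RepoKey": "npm", "TypeOfArtifact": "npm"}
        },
        "inputArtifacts": [{
            "name": "BuildOutput",
            "location": {
                "type": "S3",
                "s3Location": {
                    "bucketName": "example-artifact-bucket",
                    "objectKey": "example-pipeline/BuildOutput/abc123.zip",
                },
            },
        }],
        "artifactCredentials": {
            "accessKeyId": "EXAMPLEKEYID",
            "secretAccessKey": "example-secret",
            "sessionToken": "example-token",
        },
    },
}

details = unpack_job(sample_job)
print(details["bucket"], details["object_key"])
```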
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
    try:
        print('Acknowledging job')
        result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
        return result
    except Exception as e:
        print("Received an error when trying to acknowledge the job: %s" % str(e))
        raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
import os
import tempfile

import boto3
import botocore.client
from boto3.session import Session
from botocore.exceptions import ClientError

def get_bucket_location(bucketName, init_client):
    # LocationConstraint is empty for buckets in us-east-1
    region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
    if not region:
        region = 'us-east-1'
    return region

def get_s3_artifact(bucketName, objectKey, ak, sk, st):
    init_s3 = boto3.client('s3')
    region = get_bucket_location(bucketName, init_s3)
    # Use the temporary credentials that CodePipeline supplied with the job
    session = Session(aws_access_key_id=ak,
                      aws_secret_access_key=sk,
                      aws_session_token=st)
    s3 = session.resource('s3',
                          region_name=region,
                          config=botocore.client.Config(signature_version='s3v4'))
    try:
        tempdirname = tempfile.mkdtemp()
    except OSError as e:
        # tempdirname is not set if mkdtemp fails, so report the error itself
        print('Could not create a temp directory: %s' % str(e))
        raise
    bucket = s3.Bucket(bucketName)
    obj = bucket.Object(objectKey)
    filename = tempdirname + '/' + objectKey
    try:
        # Recreate the object key's directory structure inside the temp directory
        if os.path.dirname(objectKey):
            directory = os.path.dirname(filename)
            os.makedirs(directory)
        print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
        with open(filename, 'wb') as data:
            obj.download_fileobj(data)
    except ClientError as e:
        print('Downloading the object and writing the file to disk raised this error: ' + str(e))
        raise
    return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
def unzip_codepipeline_artifact(artifact, origtmpdir):
    # create a new temp directory
    # Unzip artifact into new directory
    try:
        newtempdir = tempfile.mkdtemp()
        print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
        zip_ref = zipfile.ZipFile(artifact, 'r')
        zip_ref.extractall(newtempdir)
        zip_ref.close()
        shutil.rmtree(origtmpdir)
        return(os.listdir(newtempdir), newtempdir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            shutil.rmtree(newtempdir)
            raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user’s home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish --registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
import os
import subprocess

import requests

def push_to_npm(configuration, artifact_list, temp_dir, jobId):
    reponame = configuration['RepoKey']
    art_type = configuration['TypeOfArtifact']
    print("Putting artifact into NPM repository " + reponame)
    token, hostname, username = gen_artifactory_auth_token(configuration)
    npmconfigfile = create_npmconfig_file(configuration, username, token)
    url = hostname + '/artifactory/api/' + art_type + '/' + reponame
    print("Changing directory to " + str(temp_dir))
    os.chdir(temp_dir)
    try:
        print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
        print("Sending artifact to Artifactory NPM registry URL: " + url)
        subprocess.call(["npm", "config", "set", "registry", url])
        req = subprocess.call(["npm", "publish", "--registry", url])
        print("Return code from npm publish: " + str(req))
        # A non-zero return code from npm publish means the publish failed
        if req != 0:
            err_msg = "npm ERR! Received non OK response while sending response to Artifactory. Return code from npm publish: " + str(req)
            signal_failure(jobId, err_msg)
        else:
            signal_success(jobId)
    except requests.exceptions.RequestException as e:
        # Three placeholders to match the three values being formatted
        print("Received an error when trying to commit artifact %s to repository %s: %s" % (str(art_type), str(configuration['RepoKey']), str(e)))
        raise
    return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
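The signaling step can be sketched as follows. This is an illustration rather than the code from npm_job_worker.py: report_publish_result is a name I made up, and it takes the client as an argument so that it can be tried without AWS credentials. RecordingClient is a stand-in for boto3.client('codepipeline') that only records the calls it receives; with a real client, the same two methods (put_job_success_result and put_job_failure_result) report the outcome to CodePipeline.

```python
class RecordingClient:
    """Stand-in for boto3.client('codepipeline') that records calls for a dry run."""

    def __init__(self):
        self.calls = []

    def put_job_success_result(self, **kwargs):
        self.calls.append(("success", kwargs))

    def put_job_failure_result(self, **kwargs):
        self.calls.append(("failure", kwargs))


def report_publish_result(client, job_id, return_code):
    """Tell CodePipeline whether npm publish succeeded (return code 0) or failed."""
    if return_code == 0:
        client.put_job_success_result(jobId=job_id)
    else:
        client.put_job_failure_result(
            jobId=job_id,
            failureDetails={
                "type": "JobFailed",
                "message": "npm publish exited with return code %d" % return_code,
            },
        )


client = RecordingClient()
report_publish_result(client, "example-job-id", 0)  # publish succeeded
report_publish_result(client, "example-job-id", 1)  # publish failed
print(client.calls)
```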
For the custom worker code, see npm_job_worker.py in the GitHub repository.
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
From my custom worker output logs, I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https://artifactoryhost.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ node_example@1.0.3
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I’ve shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow. I’ve shown you how to create a custom action in CodePipeline and how to create a custom worker that works in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.
This post is courtesy of Jeff Levine, Solutions Architect for Amazon Web Services.
Amazon Linux 2 is the next generation of Amazon Linux, a Linux server operating system from Amazon Web Services (AWS). Amazon Linux 2 offers a high-performance Linux environment suitable for organizations of all sizes. It supports applications ranging from small websites to enterprise-class, mission-critical platforms.
Amazon Linux 2 includes support for the LAMP (Linux/Apache/MariaDB/PHP) stack, one of the most popular platforms for deploying websites. To secure data in transit to such websites and prevent eavesdropping, organizations commonly use Secure Sockets Layer/Transport Layer Security (SSL/TLS), which relies on certificates to provide encryption. The LAMP stack provided by Amazon Linux 2 includes a self-signed SSL/TLS certificate. Such certificates may be fine for internal usage but are not acceptable when attestation by a certificate authority is required.
In this post, I discuss how to extend the capabilities of Amazon Linux 2 by installing Let’s Encrypt, a certificate authority provided by the Internet Security Research Group. Let’s Encrypt offers basic SSL/TLS certificates for DNS hosts at no charge that you can use to add encryption-in-transit to a single web server. For commercial or multi-server configurations, you should consider AWS Certificate Manager and Elastic Load Balancing.
Let’s Encrypt also requires the certbot package, which you install from EPEL, the Extra Packages for Enterprise Linux collection. Although EPEL is not included with Amazon Linux 2, I show how you can install it from the Fedora Project.
Walkthrough
At a high level, you perform the following tasks for this walkthrough:
Provision a VPC, Amazon Linux 2 instance, and LAMP stack.
Install and enable the EPEL repository.
Install and configure Let’s Encrypt.
Validate the installation.
Clean up.
Prerequisites and costs
To follow along with this walkthrough, you need the following:
Accept all other default values, including those for storage.
Create a new security group and accept the default rule that allows TCP port 22 (SSH) from everywhere (0.0.0.0/0 in IPv4). For the purposes of this walkthrough, permitting access from all IP addresses is reasonable. In a production environment, you should restrict access to specific addresses.
Allocate and associate an Elastic IP address to the server when it enters the running state.
Respond “Y” to all requests for approval to install the software.
Step 3: Install and configure Let’s Encrypt
If you are no longer connected to the Amazon Linux 2 instance, connect to it at the Elastic IP address that you just created.
Install certbot, the Let’s Encrypt client to be used to obtain an SSL/TLS certificate and install it into Apache.
sudo yum install python2-certbot-apache.noarch
Respond “Y” to all requests for approval to install the software. If you see a message appear about SELinux, you can safely ignore it. This is a known issue with the latest version of certbot.
Create a DNS “A record” that maps a host name to the Elastic IP address. For this post, assume that the name of the host is lamp.example.com. If you are hosting your DNS in Amazon Route 53, do this by creating the appropriate record set.
After the “A record” has propagated, browse to lamp.example.com. The Apache test page should appear. If the page does not appear, use a tool such as nslookup on your workstation to confirm that the DNS record has been properly configured.
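If you prefer a scripted check over nslookup, the following minimal sketch compares what a host name resolves to with the address you expect. The resolves_to helper is a name I made up, and it only checks the first IPv4 address returned by your resolver.

```python
import socket


def resolves_to(host_name, expected_ip):
    """Return True if host_name currently resolves to expected_ip (IPv4 only)."""
    try:
        return socket.gethostbyname(host_name) == expected_ip
    except socket.gaierror:
        # The name does not resolve yet, for example because the record
        # has not propagated to the resolver you are using.
        return False


# Example call, with your own host name and Elastic IP address substituted:
# resolves_to("lamp.example.com", "203.0.113.10")
```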
You are now ready to install Let’s Encrypt. Let’s Encrypt does the following:
Confirms that you have control over the DNS domain being used, by having you create a DNS TXT record using the value that it provides.
Obtains an SSL/TLS certificate.
Modifies the Apache-related scripts to use the SSL/TLS certificate and redirects users browsing the site in HTTP mode to HTTPS mode.
Use the following command to run certbot, which obtains the certificate and installs it into Apache:
sudo certbot -i apache -a manual \
--preferred-challenges dns -d lamp.example.com
The options have the following meanings:
-i apache: Use the Apache installer.
-a manual: Authenticate domain ownership manually.
--preferred-challenges dns: Use DNS TXT records for the authentication challenge.
-d lamp.example.com: Specify the domain for the SSL/TLS certificate.
You are prompted for the following information:
E-mail address for renewals? Enter an email address for certificate renewals.
Accept the terms of service? Respond as appropriate.
Send your e-mail address to the EFF? Respond as appropriate.
Log your current IP address? Respond as appropriate.
You are prompted to deploy a DNS TXT record with the name “_acme-challenge.lamp.example.com” with the supplied value, as shown below.
After you enter the record, wait until the TXT record propagates. To look up the TXT record to confirm the deployment, use the nslookup command in a separate command window, as shown below. Remember to use the set ty=txt command before entering the TXT record.
You are prompted to select a virtual host. There is only one, so choose 1.
The final prompt asks whether to redirect HTTP traffic to HTTPS. To perform the redirection, choose 2.
That completes the configuration of Let’s Encrypt.
Step 4: Validate the installation
Browse to the http://lamp.example.com site. You are redirected to the SSL/TLS page https://lamp.example.com.
To look at the encryption information, use the appropriate actions within your browser. For example, in Firefox, you can open the padlock and traverse the menus. In the encryption technical details, you can see from the “Connection Encrypted” line that traffic to the website is now encrypted using TLS 1.2.
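The same check can be scripted. The sketch below uses Python's standard ssl module to connect to a host and report the protocol version that was negotiated. The function names are my own, and the call requires network access to a host with a certificate that your system trusts, so it is shown here without being run.

```python
import socket
import ssl


def negotiated_tls_version(host, port=443, timeout=10):
    """Connect to host and return the negotiated protocol, such as 'TLSv1.2'."""
    context = ssl.create_default_context()  # verifies the certificate chain and host name
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


def is_modern_tls(version):
    """Return True for the protocol versions this sketch treats as acceptable."""
    return version in ("TLSv1.2", "TLSv1.3")


# Example call, with your own host name substituted:
# is_modern_tls(negotiated_tls_version("lamp.example.com"))
```

Because the default context verifies the certificate, a successful connection also suggests that the certificate chain and host name are valid, not just that encryption is in use.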
Security note: As of the time of publication, this website also supports TLS 1.0. I recommend that you disable this protocol because of some known vulnerabilities associated with it. To do this:
Edit the file /etc/letsencrypt/options-ssl-apache.conf.
Look for the line beginning with SSLProtocol and change it to the following:
SSLProtocol all -SSLv2 -SSLv3 -TLSv1
Save the file. After you make changes to this file, Let’s Encrypt no longer automatically updates it. Periodically check your log files for recommended updates to this file.
Restart the httpd server with the following command:
sudo service httpd restart
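If you manage several servers, the edit to the SSLProtocol line can be automated. The following sketch rewrites any active SSLProtocol directive in the text of a configuration file. The harden_ssl_protocol helper is hypothetical, and the sample configuration is made up for the example; on a real server you would read and write /etc/letsencrypt/options-ssl-apache.conf with root privileges and then restart httpd.

```python
import re

HARDENED_LINE = "SSLProtocol all -SSLv2 -SSLv3 -TLSv1"

# Matches an active (uncommented) SSLProtocol directive, keeping its indentation.
SSL_PROTOCOL_PATTERN = re.compile(r"^([ \t]*)SSLProtocol\b.*$", re.MULTILINE)


def harden_ssl_protocol(conf_text):
    """Return conf_text with every SSLProtocol directive replaced by HARDENED_LINE."""
    return SSL_PROTOCOL_PATTERN.sub(lambda m: m.group(1) + HARDENED_LINE, conf_text)


sample_conf = (
    "SSLEngine on\n"
    "SSLProtocol all -SSLv2 -SSLv3\n"
    "SSLHonorCipherOrder on\n"
)
print(harden_ssl_protocol(sample_conf))
```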
Step 5: Cleanup
Use the following steps to avoid incurring any further costs.
Terminate the Amazon Linux 2 instance that you created.
Release the Elastic IP address that you allocated.
Revert any DNS changes that you made, including the A and TXT records.
Conclusion
Amazon Linux 2 is an excellent option for hosting websites through the LAMP stack provided by the Amazon-Linux-Extras feature. You can then enhance the security of the Apache web server by installing EPEL and Let’s Encrypt. Let’s Encrypt provisions an SSL/TLS certificate, optionally installs it for you on the Apache server, and enables data-in-transit encryption. You can get started with Amazon Linux 2 in just a few clicks.
The EU’s General Data Protection Regulation (GDPR) describes data processor and data controller roles, and some customers and AWS Partner Network (APN) partners are asking how this affects the long-established AWS Shared Responsibility Model. I wanted to take some time to help folks understand shared responsibilities for us and for our customers in context of the GDPR.
How does the AWS Shared Responsibility Model change under GDPR? The short answer – it doesn’t. AWS is responsible for securing the underlying infrastructure that supports the cloud and the services provided; while customers and APN partners, acting either as data controllers or data processors, are responsible for any personal data they put in the cloud. The shared responsibility model illustrates the various responsibilities of AWS and our customers and APN partners, and the same separation of responsibility applies under the GDPR.
AWS responsibilities as a data processor
The GDPR does introduce specific regulation and responsibilities regarding data controllers and processors. When any AWS customer uses our services to process personal data, the controller is usually the AWS customer (and sometimes it is the AWS customer’s customer). However, in all of these cases, AWS is always the data processor in relation to this activity. This is because the customer is directing the processing of data through its interaction with the AWS service controls, and AWS is only executing customer directions. As a data processor, AWS is responsible for protecting the global infrastructure that runs all of our services. Controllers using AWS maintain control over data hosted on this infrastructure, including the security configuration controls for handling end-user content and personal data. Protecting this infrastructure is our number one priority, and we invest heavily in third-party auditors to test our security controls and make any issues they find available to our customer base through AWS Artifact. Our ISO 27018 report is a good example, as it tests security controls that focus on protection of personal data in particular.
AWS has an increased responsibility for our managed services. Examples of managed services include Amazon DynamoDB, Amazon RDS, Amazon Redshift, Amazon Elastic MapReduce, and Amazon WorkSpaces. These services provide the scalability and flexibility of cloud-based resources with less operational overhead because we handle basic security tasks like guest operating system (OS) and database patching, firewall configuration, and disaster recovery. For most managed services, you only configure logical access controls and protect account credentials, while maintaining control and responsibility of any personal data.
Customer and APN partner responsibilities as data controllers — and how AWS Services can help
Our customers can act as data controllers or data processors within their AWS environment. As a data controller, the services you use may determine how you configure those services to help meet your GDPR compliance needs. For example, AWS Services that are classified as Infrastructure as a Service (IaaS), such as Amazon EC2, Amazon VPC, and Amazon S3, are under your control and require you to perform all routine security configuration and management that would be necessary no matter where the servers were located. With Amazon EC2 instances, you are responsible for managing: guest OS (including updates and security patches), application software or utilities installed on the instances, and the configuration of the AWS-provided firewall (called a security group).
To help you realize data protection by design principles under the GDPR when using our infrastructure, we recommend you protect AWS account credentials and set up individual user accounts with AWS Identity and Access Management (IAM) so that each user is only given the permissions necessary to fulfill their job duties. We also recommend using multi-factor authentication (MFA) with each account, requiring the use of SSL/TLS to communicate with AWS resources, setting up API/user activity logging with AWS CloudTrail, and using AWS encryption solutions, along with all default security controls within AWS Services. You can also use advanced managed security services, such as Amazon Macie, which assists in discovering and securing personal data stored in Amazon S3.
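As one concrete example of requiring SSL/TLS, the sketch below builds an Amazon S3 bucket policy that denies any request not sent over an encrypted connection, using the aws:SecureTransport condition key. The function name and the bucket name are placeholders invented for this example, and the policy is a starting point rather than a complete GDPR control.

```python
import json


def require_tls_bucket_policy(bucket_name):
    """Return a bucket policy that denies requests made without SSL/TLS."""
    bucket_arn = "arn:aws:s3:::" + bucket_name
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [bucket_arn, bucket_arn + "/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


policy = require_tls_bucket_policy("example-personal-data-bucket")
print(json.dumps(policy, indent=2))
```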
For more information, you can download the AWS Security Best Practices whitepaper or visit the AWS Security Resources or GDPR Center webpages. In addition to our solutions and services, AWS APN partners can provide hundreds of tools and features to help you meet your security objectives, ranging from network security and configuration management to access control and data encryption.
Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything, from AWS service logs such as AWS CloudTrail log files, Amazon VPC Flow Logs, and Application Load Balancer logs to IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.
To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.
Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.
What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.
Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.
Solution overview
Here is how the solution is laid out:
The following sections walk you through each of these steps to set up the pipeline.
1. Define the schema
When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. Incoming events often contain only some of the expected fields, depending on what each producer sends. A typical approach is to normalize the schema in a batch ETL job so that you end up with a consistent schema that is easy to understand and query, but the batch process introduces latency. To avoid that latency, Kinesis Data Firehose requires the schema to be defined in advance.
To see the available columns and structures, see Amazon Connect Agent Event Streams. For simplicity, I opted to make all the columns of type String rather than create the nested structures, but you can define the nested structures if you prefer.
The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.
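A minimal sketch of such a statement is shown here. It assumes the table is named kfhconnectblog (the name used by the queries later in this post) and that every top-level field of an Amazon Connect agent event is stored as a string. The column list is an assumption based on the agent event data model, and the bucket and prefix are placeholders, so adjust both to match your environment.

```sql
-- Sketch only: the column list is assumed from the agent event data model,
-- and 's3://your_bucket/kfhconnectblog/' is a placeholder location.
CREATE EXTERNAL TABLE default.kfhconnectblog (
  awsaccountid          string,
  agentarn              string,
  currentagentsnapshot  string,  -- nested JSON kept as a string
  eventid               string,
  eventtimestamp        string,  -- ISO 8601 timestamp, parsed at query time
  eventtype             string,
  instancearn           string,
  previousagentsnapshot string,  -- nested JSON kept as a string
  version               string
)
STORED AS PARQUET
LOCATION 's3://your_bucket/kfhconnectblog/';
```

Keeping the two snapshot columns as strings means the queries in Step 5 parse them with JSON functions at read time.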
That’s all you have to do to prepare the schema for Kinesis Data Firehose.
2. Define the data streams
Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events. Open the Kinesis Data Streams console and create two streams. You can configure them with only one shard each because you don’t have a lot of data right now.
3. Define the Kinesis Data Firehose delivery stream for Parquet
Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.
As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table name (4) that you created in Step 1. Choose Next.
To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash (/) so that Data Firehose creates the date partitions inside that prefix.
On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.
Compression using Snappy is automatically enabled for both Parquet and ORC. You can change the compression algorithm by using the Kinesis Data Firehose API to update the OutputFormatConfiguration.
Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.
Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.
4. Set up the Amazon Connect contact center
After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.
After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.
At this point, your pipeline is complete. Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.
So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After the agent is created, Agent One can get to work: they log in to their console and set their status to Available so that they are ready to receive calls.
To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.
5. Analyze the data with Athena
Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though the documentation describes them as complex structures. I did that simply to show how flexibly Athena can parse JSON data at query time. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.
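As a rough sketch of that alternative, the following statement declares the current agent snapshot as a nested structure instead of a string. The table name kfhconnectblog_nested and the S3 location are hypothetical, and the struct lists only the fields that the queries in this post read through JSON paths. The real event carries more fields, so check the Amazon Connect Agent Event Streams documentation before relying on this layout.

```sql
-- Hypothetical table: the name, location, and field subset are assumptions.
-- Only currentagentsnapshot is nested here to keep the sketch short.
CREATE EXTERNAL TABLE default.kfhconnectblog_nested (
  eventid               string,
  eventtimestamp        string,
  eventtype             string,
  currentagentsnapshot  struct<
    agentstatus: struct<name: string, starttimestamp: string>,
    configuration: struct<
      firstname: string,
      lastname: string,
      username: string,
      routingprofile: struct<
        defaultoutboundqueue: struct<name: string>,
        inboundqueues: array<struct<name: string>>
      >
    >
  >,
  previousagentsnapshot string
)
STORED AS PARQUET
LOCATION 's3://your_bucket/kfhconnectblog_nested/';
```

With a layout like this, a field is reached with dot notation, for example currentagentsnapshot.configuration.username, rather than with json_extract_scalar. The queries that follow use the all-strings table.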
Let’s run the first query to see which agents have logged into the system.
The query might look complex, but it’s fairly straightforward:
WITH dataset AS (
  SELECT
    from_iso8601_timestamp(eventtimestamp) AS event_ts,
    eventtype,
    -- CURRENT STATE
    json_extract_scalar(
      currentagentsnapshot,
      '$.agentstatus.name') AS current_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        currentagentsnapshot,
        '$.agentstatus.starttimestamp')) AS current_starttimestamp,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.firstname') AS current_firstname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.lastname') AS current_lastname,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.username') AS current_username,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS current_outboundqueue,
    json_extract_scalar(
      currentagentsnapshot,
      '$.configuration.routingprofile.inboundqueues[0].name') AS current_inboundqueue,
    -- PREVIOUS STATE
    json_extract_scalar(
      previousagentsnapshot,
      '$.agentstatus.name') AS prev_status,
    from_iso8601_timestamp(
      json_extract_scalar(
        previousagentsnapshot,
        '$.agentstatus.starttimestamp')) AS prev_starttimestamp,
    json_extract_scalar(
      previousagentsnapshot,
      '$.configuration.firstname') AS prev_firstname,
    json_extract_scalar(
      previousagentsnapshot,
      '$.configuration.lastname') AS prev_lastname,
    json_extract_scalar(
      previousagentsnapshot,
      '$.configuration.username') AS prev_username,
    json_extract_scalar(
      previousagentsnapshot,
      '$.configuration.routingprofile.defaultoutboundqueue.name') AS prev_outboundqueue,
    json_extract_scalar(
      previousagentsnapshot,
      '$.configuration.routingprofile.inboundqueues[0].name') AS prev_inboundqueue
  FROM kfhconnectblog
  WHERE eventtype <> 'HEART_BEAT'
)
SELECT
  current_status AS status,
  current_username AS username,
  event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC
The query output looks something like this:
Here is another query that shows the sessions each agent engaged in. It tells us whether the calls were incoming or outgoing, whether they were completed, and where calls were missed or failed.
WITH src AS (
  SELECT
    eventid,
    json_extract_scalar(currentagentsnapshot, '$.configuration.username') AS username,
    cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) AS c,
    cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) AS p
  FROM kfhconnectblog
),
src2 AS (
  -- Pair up the current and previous contact arrays element by element
  SELECT *
  FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
  SELECT
    eventid,
    username,
    json_extract_scalar(c_item, '$.contactid') AS c_contactid,
    json_extract_scalar(c_item, '$.channel') AS c_channel,
    json_extract_scalar(c_item, '$.initiationmethod') AS c_direction,
    json_extract_scalar(c_item, '$.queue.name') AS c_queue,
    json_extract_scalar(c_item, '$.state') AS c_state,
    from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) AS c_ts,
    json_extract_scalar(p_item, '$.contactid') AS p_contactid,
    json_extract_scalar(p_item, '$.channel') AS p_channel,
    json_extract_scalar(p_item, '$.initiationmethod') AS p_direction,
    json_extract_scalar(p_item, '$.queue.name') AS p_queue,
    json_extract_scalar(p_item, '$.state') AS p_state,
    from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) AS p_ts
  FROM src2
)
SELECT
  username,
  c_channel AS channel,
  c_direction AS direction,
  p_state AS prev_state,
  c_state AS current_state,
  c_ts AS current_ts,
  c_contactid AS id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC
The query output looks similar to the following:
6. Analyze the data with Amazon Redshift Spectrum
With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.
Here is a simple query that reads the same data from Amazon Redshift. Note that to do this, you first need to create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.
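A sketch of that prerequisite step might look like the following. It assumes the external schema is named default_schema (the name the query uses) and maps to the default Data Catalog database from Step 1. The IAM role ARN and the Region are placeholders: the role must be attached to your cluster and allowed to read the Data Catalog and the S3 bucket.

```sql
-- Sketch only: the role ARN and Region are placeholders for your own values.
CREATE EXTERNAL SCHEMA default_schema
FROM DATA CATALOG
DATABASE 'default'
REGION 'us-east-1'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole';
```

With the external schema in place, the table is addressable as default_schema.kfhconnectblog, which is how the query refers to it: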
SELECT
  eventtype,
  json_extract_path_text(currentagentsnapshot, 'agentstatus', 'name') AS current_status,
  json_extract_path_text(currentagentsnapshot, 'configuration', 'firstname') AS current_firstname,
  json_extract_path_text(currentagentsnapshot, 'configuration', 'lastname') AS current_lastname,
  json_extract_path_text(
    currentagentsnapshot,
    'configuration', 'routingprofile', 'defaultoutboundqueue', 'name') AS current_outboundqueue
FROM default_schema.kfhconnectblog
The following shows the query output:
Summary
In this post, I showed you how to use Kinesis Data Firehose to ingest data and convert it to a columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This feature enables the level of cost and performance optimization that you need when storing and analyzing large amounts of data, and it is equally important if you are investing in building data lakes on AWS.
Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics and business intelligence needs. Roy is a big Manchester United fan who enjoys cheering his team on and hanging out with his family.