Zero Trust architectures: An AWS perspective

Post Syndicated from Mark Ryland original https://aws.amazon.com/blogs/security/zero-trust-architectures-an-aws-perspective/

Our mission at Amazon Web Services (AWS) is to innovate on behalf of our customers so they have less and less work to do when building, deploying, and rapidly iterating on secure systems. From a security perspective, our customers seek answers to the ongoing question What are the optimal patterns to ensure the right level of confidentiality, integrity, and availability of my systems and data while increasing speed and agility? Increasingly, customers are asking specifically about how security architectural patterns that fall under the banner of Zero Trust architecture or Zero Trust networking might help answer this question.

Given the surge in interest in technology that uses the Zero Trust label, as well as the variety of concepts and models that come under the Zero Trust umbrella, we’d like to provide our perspective. We’ll share our definition and guiding principles for Zero Trust, and then explore the larger subdomains that have emerged under that banner. We’ll also talk about how AWS has woven these principles into the fabric of the AWS cloud since its earliest days, as well as into many recent developments. Finally, we’ll review how AWS can help you on your own Zero Trust journey, focusing on the underlying security objectives that matter most to our customers. Technological approaches rise and fall, but underlying security objectives tend to be relatively stable over time. (A good summary of some of those can be found in the Design Principles of the AWS Well-Architected Framework.)

Definition and guiding principles for Zero Trust

Let’s start out with a general definition. Zero Trust is a conceptual model and an associated set of mechanisms that focus on providing security controls around digital assets that do not solely or fundamentally depend on traditional network controls or network perimeters. The zero in Zero Trust fundamentally refers to diminishing—possibly to zero!—the trust historically created by an actor’s location within a traditional network, whether we think of the actor as a person or a software component. In a Zero Trust world, network-centric trust models are augmented or replaced by other techniques—which we can describe generally as identity-centric controls—to provide equal or better security mechanisms than we had in place previously. Better security mechanisms should be understood broadly to include attributes such as greater usability and flexibility, even if the overall security posture remains the same. Let’s consider more details and possible approaches along the two dimensions.

One dimension is the network. Do we achieve Zero Trust by allowing all network packets to flow between all hosts or endpoints, but implement all security controls above the network layer? Or do we break our systems down into smaller logical components and implement much tighter network segments or packet-level controls—so-called micro-segments or micro-perimeters? Do we add some kind of gateway or proxy technology that enforces a new kind of trust boundary? Do we still use VPN technology for network isolation but make it more dynamic and hidden from the user experience, so that users don’t even notice that network boundaries are being created and torn down as needed? Or some combination of these techniques?

The other dimension is identity and access management. Are we talking about human actors with their PCs, tablets, and phones trying to access web applications? Or are we talking about machine-to-machine, software-to-software communication, where all requests are authenticated and authorized using other kinds of techniques? Or perhaps we’re thinking of some combination of the two. For example, certain security-relevant properties or attributes of the user’s situation—strength of authentication, device type, ownership, posture assessment, health, network location, and others—are propagated to and through the software systems with which the user is interacting, and alter their access dynamically.

Thus, as we start to look more closely at Zero Trust, we can immediately see the possibility of confusion—because many different topics and concepts are implicated—but also a clear indication of opportunities to build better, more flexible, and more secure software systems. What are some of the principles that can help guide us through both the confusion and the opportunities?

Our first guiding principle for Zero Trust is that while the conceptual model decreases reliance on network location, the role of network controls and perimeters remains important to the overall security architecture. In other words, the best security doesn’t come from making a binary choice between identity-centric and network-centric tools, but rather by using both effectively in combination with each other. Identity-centric controls, such as the AWS SigV4 request signing process, which is used to interact with AWS API endpoints, uniquely authenticate and authorize each and every signed API request, and provide very fine-grained access controls. However, network-centric tools such as Amazon Virtual Private Cloud (Amazon VPC), security groups, AWS PrivateLink, and VPC endpoints are straightforward to understand and use, filter unnecessary noise out of the system, and provide excellent guardrails within which identity-centric controls can operate. Ideally, these two kinds of controls should not only coexist, they should be aware of and augment one another. For example, VPC endpoints provide the ability to attach a policy that allows you to write and enforce identity-centric rules at a logical network boundary—in that case, the private network exit from your Amazon VPC on the way to a nearby AWS service endpoint.

Our second guiding principle for Zero Trust is that it can mean different things in different contexts. Arguably one of the key reasons for the ambiguity surrounding Zero Trust is that the term encompasses many different use cases which share only the fundamental technical concept of diminishing the security relevance of a network location or boundary. Yet those use cases differ substantially in what they’re trying to achieve for the organization. As we noted above, common examples of Zero Trust goals range from ensuring workforce agility and mobility—using browsers and mobile apps and the internet to access business systems and applications—to the creation of carefully segmented micro-service architectures inside of new cloud-based applications. By focusing on a specific problem that we’re trying to solve, and approaching it with fresh eyes and new tools, we can avoid getting mired in low-value discussions around whether a new approach to a security challenge is really—or to what degree it is—an application of the Zero Trust concept.

Our third guiding principle is that Zero Trust concepts must be applied in accordance with the organizational value of the system and data being protected. Over time, the application of the Zero Trust conceptual model and associated mechanisms will continue to improve defense in depth, and continue to make security controls we already have work better through the increased visibility and software-defined nature of the cloud. Applied well, the tenets of Zero Trust can significantly raise the security bar, especially for critical workloads. However, if applied in strict orthodoxy, Zero Trust methods can limit the incorporation of more traditional technologies into upgraded or new systems, and stifle innovation by overly taxing organizations where the benefits aren’t commensurate with the effort. For many business systems, network controls and network perimeters will continue to be important and usually adequate controls for a long time, perhaps forever. We believe it’s best to think of Zero Trust concepts as additive to existing security controls and concepts, rather than as replacements.

Examples of Zero Trust principles and capabilities at work today within the AWS cloud

The most prominent example of Zero Trust in AWS is how millions of customers typically interact with AWS every day using the AWS Management Console or securely calling AWS APIs over a diverse set of public and private networks. Whether called via the console, the AWS Command Line Interface (AWS CLI), or software written to the AWS APIs, ultimately all of these methods of interaction reach a set of web services with endpoints that are reachable from the internet. There is absolutely nothing about the security of the AWS API infrastructure that depends on network reachability. Each one of these signed API requests is authenticated and authorized every single time at rates of millions upon millions of requests per second globally. Our customers do so confidently; knowing that the cryptographic strength of the underlying Transport Layer Security (TLS) protocol—augmented by the AWS Signature v4 signing process—properly secures these requests without any regard to the trustworthiness of the underlying network. Interestingly, the use of cloud-based APIs is rarely—if ever—mentioned in Zero Trust discussions. Perhaps this is because AWS led the way with this approach to securing APIs from the start, such that it is now assumed to be a basic part of every cloud security story.

Similarly, but perhaps not as well understood, when individual AWS services need to call each other to operate and deliver their service capabilities, they rely on the same mechanisms that you use as a customer. You can see this in action in the form of service-linked roles. For example, when AWS Auto Scaling determines that it needs to call the Amazon Elastic Compute Cloud (Amazon EC2) API to create or terminate an EC2 instance in your account, the AWS Auto Scaling service assumes the service-linked role you’ve provided in your account, receives the resulting AWS short-term credentials, and uses these credentials to sign requests using the SigV4 process to the appropriate EC2 APIs. On the receiving end, AWS Identity and Access Management (IAM) authenticates and authorizes the incoming calls for EC2. In other words, even though they’re both AWS services, AWS Auto Scaling and EC2 have no inherent trust, network or otherwise, of one another and use strong identity-centric controls as the basis of the security model between the two services as they operate on your behalf. You, the customer, have full visibility into both the privileges that you’re granting to one service, as well as an AWS CloudTrail record of the use of those privileges.

Other great examples of Zero Trust capabilities in the AWS portfolio can be found in the IoT Service. When we launched AWS IoT Core we made a strategic decision—against the prevailing industry norms at the time—to always require TLS network encryption and modern client authentication, including certificate-based mutual TLS, when connecting IoT devices to service endpoints. We subsequently added TLS support to FreeRTOS, enabling modern, secure communication to an entire class of small CPU and small memory devices that were previously assumed to not be capable of it. With AWS IoT Greengrass, we pioneered a way of working with existing no-security devices using a remote gateway that relied on local network presence but also was able to run AWS Lambda functions to validate security and provide a secure proxy to the cloud. These examples highlight where adherence to AWS security standards brought key foundational components of Zero Trust to a technology domain where vast amounts of unauthenticated, unencrypted network messaging over the open internet was previously the norm.

How AWS can help you on your Zero Trust journey

To help you on your own Zero Trust journey, there are a number of AWS cloud-specific identity and networking capabilities that provide core Zero Trust building blocks as standard features. AWS services provide this functionality via simple API calls, without you needing to build, maintain, or operate any infrastructure or additional software components. To help best frame the conversation, we’ll consider these capabilities against the backdrop of three distinct use cases:

Authorizing specific flows between components to eliminate unneeded lateral network mobility.
Enabling friction-free access to internal applications for your workforce.
Securing digital transformation projects such as IoT.

Our first use case focuses mainly on machine-to-machine communications—authorizing specific flows between components to help eliminate lateral network mobility risk. Otherwise put, if two components don’t need to talk to one another across the network, they shouldn’t be able to, even if these systems happen to exist within the same network or network segment. This greatly reduces the overall surface area of the connected systems and eliminates unneeded pathways, particularly those that lead to sensitive data. Within this use case, our discussion should begin with security groups, which have been a part of Amazon EC2 since its earliest days. Security groups provide highly dynamic, software-defined network micro-perimeters for both north-south and east-west traffic. Security group assignments occur automatically as resources come and go, and rules in one security group can reference one another by ID, either within the same Amazon VPC or across larger peered networks in the same or different regions. These properties allow security groups to act as a kind of identity system in which group membership becomes a relevant property for determining whether or not to permit particular network flows. This helps enable you to author extremely granular rules without the associated operational burden of keeping them up-to-date as membership in a group ebbs and flows. Similarly, PrivateLink provides an extremely useful building block in the general space of micro-perimeters and micro-segmentation. Using PrivateLink, a load-balanced endpoint can be exposed as a narrow, one-way gateway between two VPCs, with tight identity-based controls determining who can access the gateway and where incoming packets can land. Initiating network connections in the other direction isn’t allowed at all, and the VPCs don’t even need to have routes between one another. Thousands of customers use PrivateLink today as a fundamental building block of a secure micro-services architecture, as well as secure and private access to PaaS and SaaS services from their suppliers.

Going back to our discussion about AWS APIs, the AWS SigV4 signature process for authenticating and authorizing API requests is no longer just for AWS services. You can achieve the same kind of hardened interface approach using the Amazon API Gateway service, which allows software interfaces to be securely available on the open internet. API Gateway provides distributed denial of service (DDoS) protection, rate limiting, and AWS IAM support as one of several authorization options. When you choose AWS IAM authorization, you author standard IAM policies that define who can call your API and where they can call it from, using the full expressiveness of the IAM policy language. Callers sign their requests using their AWS credentials, typically delivered in the form of IAM roles attached to compute resources, and IAM uniquely authenticates and authorizes every single call to your API according to those policies. With one step, your API is protected behind the massively scaled, super performant, globally available IAM service that protects AWS APIs—with nothing for you to manage or maintain. Calls from the API Gateway front-end to your back-end implementation are secured by mutual TLS, so you’re assured that only API Gateway is able to invoke the back-end implementation. With this strong identity-centric control in place, you have two choices. You can safely place your back-end implementation on the public network, or add the VPC integration model such that the API Gateway call to your back-end implementation running inside of your VPC is protected by an identity-centric control (mutual TLS) and a network-centric control (private connectivity from API Gateway to your code). The security achieved by these feature combinations, arguably only possible in the cloud, makes discussions of east-west concerns seem underwhelming and rooted in constraints of the past.

Our second use case, enabling friction-free access to internal applications for your workforce, is all about improving workforce mobility without compromising security. Traditionally these applications have existed behind a strong VPN front door. However, VPNs can be expensive to scale and aren’t necessarily compatible with the full array of mobile devices that the modern workforce demands. The objective in this case is to make the locks on the individual applications so good that you can eliminate the VPN-based front door. To achieve this, our customers have told us that they want a range of technical solutions to choose from according to their industry, risk tolerance, developer maturity, and other factors. At one end of the spectrum, we have many customers who prefer to use desktop as a service—Amazon Workspaces—or application as a service—Amazon AppStream 2.0—models to provide a powerful and flexible pixel proxy approach to Zero Trust. Traditional security controls are applied to those intermediary virtual devices, and then any user with a PC, tablet, or HTML5 client can reach those virtualized desktops or applications over the internet—or behind additional network controls and perimeters, if they so desire—to provide a rich, desktop-like experience without having to worry about the security of the final device in the hands of the user. Similarly, customers have asked for a better way to access their enterprise applications securely from mobile phones without deploying mobile device management or other such often cumbersome and expensive technologies. To meet that requirement, we launched Amazon WorkLink, providing a secure proxy service that renders complex web applications in the AWS cloud. Amazon WorkLink streams only pixels—and a very minimal amount of JavaScript for interactivity—to mobile phones. No sensitive enterprise data is ever stored or cached on the mobile device.

At the other end of the spectrum, we have customers who want to connect their internal web applications directly to the internet. For these customers, the combination of AWS Shield, AWS WAF, and Application Load Balancer with OpenID Connect (OIDC) authentication provides a fully managed identity-aware network protection stack. Shield provides managed DDoS protection services that provide always-on detection and automatic inline mitigations that minimize application downtime and latency. AWS WAF is a web application firewall that lets you monitor and protect web requests before they reach your infrastructure using your desired combination of rule groups provided by AWS, the AWS Marketplace, or your own custom ones. By enabling authentication in Application Load Balancer—beyond the normal load balancing capabilities—you can directly integrate with your existing identity provider (IdP) to offload the work of authenticating users, and to leverage the existing capabilities within your IdP—such as strong authentication, device posture assessment, conditional access, and policy enforcement. Using this combination, your internal custom applications quickly become just as flexible as SaaS applications, allowing your workforce to enjoy the same work-anywhere flexibility as SaaS while unifying your application portfolio under a common security model powered by modern identity standards.

Our third use case—securing digital transformation projects such as IoT—is markedly different from the first two. Consider a connected vehicle, relaying a critical stream of instrumentation over mobile networks and the internet into a cloud based analytics environment for processing and insights. These workloads have always existed entirely outside the traditional enterprise network, and require a security model that accounts for that situation. The family of AWS IoT services provides scalable solutions for issuing unique device identities to every device in your fleet, and then using those identities and their associated access control policies to securely control how they communicate and interact with the cloud. The security of these devices can be easily monitored and maintained with AWS IoT Device Defender, over-the-air software updates, and even entire operating system upgrades—now built in to FreeRTOS—to keep devices safe and secure over time. Moving forward, as more and more IT workloads move closer to the edge to minimize latency and improve user experiences, the prevalence of this use case will continue to expand, even if it isn’t applicable to your business today.

It’s still Day 1

We hope this post has helped communicate our vision for Zero Trust, and highlighted how we believe that our underlying security principles and advancing capabilities represent a bar-raising security model both for the AWS cloud and for the environments that our customers build on top of our services.

At Amazon we obsess over customers and their needs, so our job is never done. We have lots more capabilities we want to build, and lots more guidance still to offer. We look forward to your feedback and to continuing the journey together—reflecting the words and core vision of our founder, Jeff Bezos: “It’s still Day 1.”

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Noise

Zero Trust architectures: An AWS perspective

Definition and guiding principles for Zero Trust

Examples of Zero Trust principles and capabilities at work today within the AWS cloud

How AWS can help you on your Zero Trust journey

It’s still Day 1

The collective thoughts of the interwebz