Tag Archives: on-premises

Orchestrate big data jobs on on-premises clusters with AWS Step Functions

Post Syndicated from Göksel SARIKAYA original https://aws.amazon.com/blogs/big-data/orchestrate-big-data-jobs-on-on-premises-clusters-with-aws-step-functions/

Customers with specific needs to run big data compute jobs on an on-premises infrastructure often require a scalable orchestration solution. For large-scale distributed compute clusters, the orchestration of jobs must be scalable to maximize their utilization, while at the same time remain resilient to any failures to prevent blocking the ever-growing influx of data and jobs. Moreover, on-premises compute resources can’t be extended on demand, therefore, the jobs may be competing for the same resources with different priorities.

This post showcases serverless building blocks for orchestrating big data jobs using AWS Step Functions, AWS Lambda, and Amazon DynamoDB with a focus on reliability, maintainability, and monitoring. In this solution, Step Functions enables thousands of workflows to run parallel. Additionally, Lambda provides flexibility implementing arbitrary interfaces to the on-premises infrastructure and its compute resources. With additional steps in the orchestration, the solution also allows operations to monitor thousands of parallel jobs in a visual interface for better debugging.


The proposed serverless solution consists of the following main components:

  • Job trigger – Requests new compute jobs to run on the on-premises cluster. For simplicity, in this architecture we assume that the trigger is a client calling Step Functions directly. However, you could extend this to include Amazon API Gateway to create a job API to interface with the orchestration solution or a rule engine to trigger jobs when relevant data becomes available.
  • Job manager – This Step Functions workflow runs once per compute job, with multiple workflows running in parallel. It tracks the status of a job from queueing, scheduling, running, retrying, all the way to its completion. Ideally, a job can be scheduled immediately, but workflows can run for days if a job is very low priority and compute resources are sparse. The job manager delegates the decision when or where to run the job to the job queue manager. Communication to the on-premises cluster is abstracted through a Lambda adapter.
  • Job queue manager – Maintains a queue of all jobs. With the given job properties (for example based on priority), the job queue manager decides the running time of jobs, and the cluster on which they run. To illustrate the concept, the architecture considers real-time information on the resource utilization of the compute clusters (memory, CPU) for scheduling. However, you could apply different scheduling algorithms as required given the flexibility of Lambda.
  • On-premises compute cluster – Provides the computing resources, data nodes, and tools to run compute jobs.

The following diagram illustrates the solution architecture.

Solution Architecture

The main process of the solution consists of seven steps:

  1. The job trigger runs a new Step Functions workflow to run a compute job on premises and provides the necessary information (such as priority and required resources).
  2. The job manager creates a new record in DynamoDB to add the job to the queue of the job queue manager, and the workflow waits for the job queue manager to call back.
  3. Amazon EventBridge triggers a scheduled Lambda function in the job queue manager periodically (for example, every 5 minutes), decoupled from the job requests.
  4. The job scheduler Lambda function retrieves real-time information from cluster metrics to see whether jobs can be scheduled at this point in time.
  5. The job scheduler function fetches all queued jobs from DynamoDB and tries to schedule as many of those jobs as possible to available compute resources based on priority, as well as memory and CPU demands.
  6. For each job that can be scheduled to the compute cluster, the job scheduler function communicates back to the job manager to signal the workflow to continue and that the job can be run.
  7. The job manager communicates with the on-premises cluster through the compute cluster adapter Lambda function to run the job, track its status periodically, and retry in case of errors.

On-premises compute cluster

In this post, we assume the on-premises compute cluster offers interfaces to interact with the compute resources. For example, customers could run a Spark compute cluster on premises that allows the following basic interactions through an API:

  • Upload and trigger a compute job on a cluster (for example, upload a Spark JAR file and submit)
  • Get the status of a compute job (such as running, stopped, or error)
  • Get error output in case of failures in the compute job (for example, the job failed due to access denied)

In addition, we assume the cluster can provide metrics on its current utilization. For example, the cluster could provide Prometheus metrics as aggregates over all resources within a compute cluster:

  • Memory utilization (for example, 2 TB with 80% utilization)
  • CPU utilization (for example, 5,000 cores with 50% utilization)

We use the terminology introduced here for the example in this post. Depending on the capabilities of the on-premises cluster, you can adjust these concepts. For example, the compute cluster could use Kubernetes or SLURM instead of Spark.

Job manager

The job manager is responsible for communicating with on-premises clusters to trigger big data jobs and query their status. It’s a Step Functions state machine that consists of three steps, as illustrated in the following figure.

The first step is JobQueueRequest, which makes a request to the job queue manager component and waits for the callback. When the job queue manager sends OK to the waiting step with a callback pattern, the second step StartJobRun runs.

The StartJobRun step communicates with the on-premises environment (for example, via HTTP post to a REST API endpoint) to trigger an on-premises job.

The third step GetJobStatus queries the job status from the on-premises cluster. If the job status is InProgress, the state machine waits for a configured time. When the Wait state is over, it returns to the GetJobStatus step to query the job status again in a loop. When the job returns a successful state from the on-premises cluster, the state machine completes its cycle with a Success state. If the job fails with a timeout or with an error, the state machine completes its cycle with a Fail state.

The following screenshot shows the details of the state machine on the Step Functions console.

Jpob Manager Step Function Inputs

Job queue manager

The job queue manager is responsible for managing job queues based on job priorities and cluster utilization. It consists of DynamoDB, Lambda, and EventBridge.

The JobQueue table keeps data of waiting jobs, including jobId as the primary key, priority as the sort key, needed memory and CPU consumptions, callbackId, and timestamp information. You can add further information to the table dynamically if required by the scheduling algorithm.

The following screenshot shows the attribute details of the JobQueue table.

EventBridge triggers the job scheduler Lambda function on a regular bases in a configured interval. First, the job scheduler function gets waiting jobs data from the JobQueue table in DynamoDB. Then it establishes a connection with the on-premises cluster to fetch cluster metrics such as memory and CPU utilization. Based on this information, the function decides which jobs are ready to be triggered on the on-premises cluster.

The scheduling algorithm proposed here follows a simple concept to maximize resource utilization, while respecting the job priority. Essentially, for an on-premises cluster (we could potentially have multiple in different geographies), the job scheduler Lambda function builds a queue of jobs according to their priority, while allocating the first job in the queue to compute resources on the cluster. If enough resources are available, the scheduler moves to the next job in the queue and repeats.

Due to the flexibility of Lambda functions, you can tailor the scheduling algorithm for a specific use case. Cluster scheduling algorithms are still an open research topic with different optimization goals, such as throughput, data location, fairness, deadlines, and more.

Get started

In this section, we provide a starting point for the solution described in this post. The steps walk you through creating a Step Functions state machine with the appropriate template, and the necessary Lambda and DynamoDB interactions to create the job manager and job queue manager building blocks. Example code for the Lambda functions is excluded from this post, because the communication with the on-premises cluster to trigger jobs can vary depending on your on-premises interface.

  1. On the Step Functions console, choose State machines.
  2. Choose Create state machine.
  3. Select Run a sample project.
  4. Select Job Poller.
    Job Poller State Machine Template
  5. Scroll down to see the sample projects, which are defined using Amazon States Language (ASL).
  6. Review the example definition, then choose Next.Job Manager Step Functions Template
  7. Choose Deploy resources.
    Deployment can take up to 10 minutes.Step Functions Deploy Resources
    The deployment creates the state machine that is responsible for job management. After you deploy the resources, you need to edit the sample ASL code to add the extra JobQueueRequest step in the state machine.
  8. Select the created state machine.
  9. Choose Edit to add ARNs of the three Lambda functions to make a request in the job queue manager (Job Queue Request), to submit a job to the on-premises cluster (Submit Job), and to poll the status of the jobs (Get Job Status).Job Manager Step Functions Definition
    Now you’re ready to create the job queue manager.
  10. On the DynamoDB console, create a table for storing job metadata.
  11. On the EventBridge console, create a scheduled rule that triggers the Lambda function at a configured interval.
  12. On the Lambda console, create the function that communicates with the on-premises cluster to fetch cluster metrics. It also gets jobs from the DynamoDB table to retrieve information including job priorities, required memory, and CPU to run the job on the on-premises cluster.


This solution uses Step Functions to track all jobs until completion, and therefore the Step Functions quotas must be considered for potential use cases. Mainly, a workflow can run for a maximum of 1 year (cannot be increased) and by default 1 million parallel runs can run in a single account (can be increased to millions). See Quotas for further details.


This post described how to orchestrate big data jobs running in parallel on on-premises clusters with a Step Functions workflow. To learn more about how to use Step Functions workflows for serverless orchestration, visit Serverless Land.

About the Authors

Göksel Sarikaya is a Senior Cloud Application Architect at AWS Professional Services. He enables customers to design scalable, high-performance, and cost effective applications using the AWS Cloud. He helps them to be more flexible and competitive during their digital transformation journey.

Nicolas Jacob Baer is a Senior Cloud Application Architect with a strong focus on data engineering and machine learning, based in Switzerland. He works closely with enterprise customers to design data platforms and build advanced analytics/ml use-cases.

Shukhrat Khodjaev is a Senior Engagement Manager at AWS ProServe, based out of Berlin. He focuses on delivering engagements in the field of Big Data and AI/ML that enable AWS customers to uncover and to maximize their value through efficient use of data.

Everything you wanted to know about trusts with AWS Managed Microsoft AD

Post Syndicated from Jeremy Girven original https://aws.amazon.com/blogs/security/everything-you-wanted-to-know-about-trusts-with-aws-managed-microsoft-ad/

Many Amazon Web Services (AWS) customers use Active Directory to centralize user authentication and authorization for a variety of applications and services. For these customers, Active Directory is a critical piece of their IT infrastructure. AWS offers AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD, to provide a highly available and resilient Active Directory service.

One of the most common AWS Managed Microsoft AD use cases is for customers who need to integrate their on-premises Active Directory domain or forest with AWS services like Amazon Relational Database Service (Amazon RDS), Amazon FSx, Amazon WorkSpaces, and other AWS applications and services. This type of integration can require a trust relationship. When it comes to trusts, there are some common misconceptions about what happens and doesn’t happen when a trust is created.

In this post, I’m going to dive deep into various aspects of Active Directory trusts and debunk some common myths along the way. This post will cover the following areas:

Starting with Kerberos

The first part of understanding how trusts work is to understand how authentication flows across a trust, particularly with Kerberos. Kerberos is a subject that, on the surface, is simple enough, but can quickly become much more complex. This post isn’t going to go into detail about Kerberos in Microsoft Windows. If you wish to look further into the topic, see the Microsoft Kerberos documentation. In this post, I’m just going to give you an overview of how Kerberos authentication works across trusts.

Figure 1: Kerberos authentication across trusts

Figure 1: Kerberos authentication across trusts

If you only remember one thing about Kerberos and trust, it should be referrals. Let’s look at the workflow in Figure 1, which shows a user from Domain A who is logged into a computer in Domain A and wants to access an Amazon FSx file share in Domain B. For simplicity’s sake, I’ll say there is a two-way trust between Domains A and B.

Note: When a trust is integrated with AWS Managed Microsoft AD, you need to enable Kerberos preauthentication for accounts that traverse the trusts. Disabling Kerberos preauthentication isn’t recommended, because a malicious user can directly send dummy requests for authentication. The key distribution center (KDC) will return an encrypted Ticket-Granting Ticket (TGT), which the malicious user can brute force offline. See Kerberos Pre-Authentication: Why It Should Not Be Disabled for more details.

The steps of the Kerberos authentication process over trusts are as follows:

1. Kerberos authentication service request (KRB_AS_REQ): The client contacts the authentication service (AS) of the KDC (which is running on a domain controller) for Domain A, which the client is a member of, for a short-lived ticket called a Ticket-Granting Ticket (TGT). The default lifetime of the TGT is 10 hours. For Windows clients this happens at logon, but Linux clients might need to run a kinit command.

2. Kerberos authentication service response (KRB_AS_REP): The AS constructs the TGT and creates a session key that the client can use to encrypt communication with the ticket-granting service (TGS). At the time that the client receives the TGT, the client has not been granted access to any resources, even to resources on the local computer.

3. Kerberos ticket-granting service request (KRB_TGS_REQ): The user’s Kerberos client sends a KRB_TGS_REQ message to a local KDC in Domain A, specifying [email protected] as the target. The Kerberos client compares the location with its own workstation’s domain. Because these values are different, the client sets a flag in the KDC Options field of the KRB_TGS_REQ message for NAME_CANONICALIZE, which indicates to the KDC that the server might be in another realm (domain).

4. Kerberos ticket-granting service response (KRB_TGS_REP): The user’s local KDC (for Domain A) receives the KRB_TGS_REQ and sends back a TGT referral ticket for Domain B. The TGT is issued for the next intervening domain along the shortest path to Domain B. The TGT also has a referral flag set, so that the KDC will be informed that the KRB_TGS_REQ is coming from another realm. This flag also tells the KDC to fill in the Transited Realms field. The referral ticket is encrypted with the interdomain key that is decrypted by Domain B’s TGS.

Note: When a trust is established between domains or forests, an interdomain key based on the trust password becomes available for authenticating KDC functions and is used to encrypt and decrypt Kerberos tickets.

5. Kerberos ticket-granting service request (KRB_TGS_REQ): The user’s Kerberos client sends a KRB_TGS_REQ along with the TGT it received from the Domain A KDC to a KDC in Domain B.

6. Kerberos ticket-granting service response (KRB_TGS_REP): The TGS in Domain B examines the TGT and the authenticator. If these are acceptable, the TGS creates a service ticket. The client’s identity is taken from the TGT and copied to the service ticket. Then the ticket is sent to the client.

For more details on the authenticator, see How the Kerberos Version 5 Authentication Protocol Works.

7. Application server service request (KRB_TGS_REQ): After the client has the service ticket, the client sends the ticket and a new authenticator to the target server, requesting access. The server will decrypt the ticket, validate the authenticator, and (for Windows services), create an access token for the user based on the SIDs in the ticket.

8. Application server service response (KRB_TGS_REP): Optionally, the client might request that the target server verify its own identity. This is called mutual authentication. If mutual authentication is requested, the target server takes the client computer’s timestamp from the authenticator, encrypts it with the session key the TGS provided for client-target server messages, and sends it to the client.

The basics of trust transitivity, direction, and types

Let’s start off by defining a trust. Active Directory trusts are a relationship between domains, which makes it possible for users in one domain to be authenticated by a domain controller in the other domain. Authenticated users, if given proper permissions, can access resources in the other domain.

Active Directory Domain Services supports four types of trusts: External (Domain), Forest, Realm, and Shortcut. Out of those four types of trusts, AWS Managed Microsoft AD supports the External (Domain) and Forest trust types. I’ll focus on External (Domain) and Forest trust types for this post.

Transitivity: What is it?

Before I dive into the types of trusts, it’s important to understand the concept of transitivity in trusts. A trust that is transitive allows authentication to flow through other domains (Child and Trees) in the trusted forests or domains. In contrast, a non-transitive trust is a point-to-point trust that allows authentication to flow exclusively between the trusted domains.

Figure 2: Forest trusts between the Example.local and Example.com forests

Figure 2: Forest trusts between the Example.local and Example.com forests

Don’t worry about the trust types at this point, because I’ll cover those shortly. The example in Figure 2 shows a Forest trust between Example.com and Example.local. The Example.local forest has a child domain named Child. With a transitive trust, users from the Example.local and Child.Example.local domain can be authenticated to resources in the Example.com domain.

If Figure 2 has an External trust, only users from Example.local can be authenticated to resources in the Example.com domain. Users from Child.Example.local cannot traverse the trust to access resources in the Example.com domain.

Trust direction

Two-way trusts are bidirectional trusts that allow authentication referrals from either side of the trust to give users access resources in either domain or forest. If you look in the Active Directory Domains and Trusts area of the Microsoft Management Console (MMC), which provides consoles to manage the hardware, software, and network components of Microsoft Windows operating system, you can see both an incoming and an outgoing trust for the trusted domain.

One-way trusts are a single-direction trust that allows authentication referrals from one side of the trust only. A one-way trust is either outgoing or incoming, but not both (that would be a two-way trust).

  • An outgoing trust allows users from the trusted domain (Example.com) to authenticate in this domain (Example.local).
  • An incoming trust allows users from this domain (Example.local) to authenticate in the trusted domain (Example.com).


Figure 3: One-way trust direction

Figure 3: One-way trust direction

Let’s use a diagram to further explain this concept. Figure 3 shows a one-way trust between Example.com and Example.local. This an outgoing trust from Example.com and an incoming trust on Example.local. Users from Example.local can authenticate and, if given proper permissions, access resources in Example.com. Users from Example.com cannot access or authenticate to resources in Example.local.

Trust types

In this section of the post, I’ll examine the various types of Active Directory trusts and their capabilities.

External trusts

This trust type is used to share resources between two domains. These can be individual domains within or external to a forest. Think of this as a point-to-point trust between two domains. See Understanding When to Create an External Trust for more details on this trust type.

  • Transitivity: Non-transitive
  • Direction: One-way or two-way
  • Authentication types: NTLM Only* (Kerberos is possible with caveats; see the Microsoft Windows Server documentation for details)
  • AWS Managed Microsoft AD support: Yes

Forest trusts

This trust type is used to share resources between two forests. This is the preferred trust model, because it works fully with Kerberos without any caveats. See Understanding When to Create a Forest Trust for more details.

  • Transitivity: Transitive
  • Direction: One-way or two-way
  • Authentication types: Kerberos and NTLM
  • AWS Managed Microsoft AD support: Yes

Realm trusts

This trust type is used to form a trust relationship between a non-Windows Kerberos realm and an Active Directory domain. See Understanding When to Create a Realm Trust for more details.

  • Transitivity: Non-transitive or transitive
  • Direction: One-way or two-way
  • Authentication types: Kerberos Only
  • AWS Managed Microsoft AD support: No

Shortcut trusts

This trust type is used to shorten the authentication path between domains within complex forests. See Understanding When to Create a Shortcut Trust for more details.

  • Transitivity: Transitive
  • Direction: One-way or two-way
  • Authentication types: Kerberos and NTLM
  • AWS Managed Microsoft AD support: No

User Principal Name suffixes

The default User Principal Name (UPN) suffix for a user account is the Domain Name System (DNS) domain name of the domain where the user account resides. In AWS Managed Microsoft AD and self-managed AD, alternative UPN suffixes are added to simplify administration and user logon processes by providing a single UPN suffix for all users. The UPN suffix is used within the Active Directory forest, and is not required to be a valid DNS domain name. See Adding User Principal Name Suffixes for the process to add UPN suffixes to a forest.

For example, if your domain is Example.local but you want your users to sign in with what appears to be another domain name (such as ExampleSuffix.local), you would need to add a new UPN suffix to the domain. Figure 4 shows a user being created with an alternate UPN suffix.

Figure 4: UPN selection on object creation

Figure 4: UPN selection on object creation

If you’re logged into a Windows system, you can use the whoami /upn command to see the UPN of the current user.

Forest trusts and name suffix routing

Name suffix routing manages how authentication requests are routed across forest trusts. A unique name suffix is a name suffix within a forest, such as a UPN suffix or DNS forest or domain tree name, that isn’t subordinate to any other name suffix. For example, the DNS forest name Example.com is a unique name suffix within the example.com forest.

All names that are subordinate to unique name suffixes are routed implicitly. For example, if your forest root is named Example.local, authentication requests for all child domains of Example.local (Child.Example.local) will be routed because the child domains are subordinate to the Example.local name suffix. If you want to exclude members of a child domain from authenticating in the specified forest, you can disable name suffix routing for that name. You can also disable routing for the forest name itself, if necessary. With domain trees and additional UPN suffixes, name suffix routing by default is disabled and must be enabled if those suffixes are to be able to traverse the trust.

Note: In AWS Managed Microsoft AD, customers don’t have the ability to create or modify trusts by using the native Microsoft tools. If you need a name suffix route enabled for your trust, open a support case with Premium Support.

A couple of diagrams will make it easier to digest this information. Figure 5 shows the trust configuration. There is a one-way outgoing forest trust from Example.com to Example.local. Example.local has a UPN suffix named ExampleSuffix.local added to it. Example.local also has a child domain named Child and a tree domain named ExampleTree.local. By default, users in Example.local and Child.Example.local will be able to authenticate to resources in Example.com. Users in the ExampleTree.local domain will not be able to authenticate to resources in Example.com, unless the name suffix route for ExampleTree.local is enabled on the trust object in Example.com.

Figure 5: Multi-domain and suffix forest with a trust

Figure 5: Multi-domain and suffix forest with a trust

Figure 6 is from the trust properties dialog from the Example.com forest of a trust between Example.com and Example.local. As you can see, *.example.local is enabled. But the custom UPN suffix ExampleSuffix.local and the tree domain ExampleTree.local are disabled by default.

Figure 6: Example.local trusts details

Figure 6: Example.local trusts details

Selective authentication

With AWS Managed Microsoft AD and self-managed AD, you have the option of configuring Selective Authentication. This option restricts authentication access over a trust to only the users in a trusted domain or forest who have been explicitly given authentication permissions to computer objects that reside in the trusting domain or forest.

When you use domain or forest-wide authentication, depending on the trust direction, users can authenticate across the trust. Authentication by itself doesn’t provide access—users have to be delegated permissions to access resources. When Selective Authentication is enabled, you must set the Allowed to Authenticate permission on each computer object the trusted user will be accessing, in addition to any other permissions that are required to access the computer object.

While Selective Authentication is a way to provide additional hardening of trusts, it requires a significant amount of planning and delegation, because you have to set the Allowed to Authenticate permission on all computer objects that are being accessed. It can also make troubleshooting permissions and trust issues more difficult.

For more details on Selective Authentication, see Selective Authentication and Configuring Selective Authentication Settings in the Microsoft documentation.

SID filtering

I won’t spend a lot of time on the subject of SID filtering, since this feature is enabled in AWS Managed Microsoft AD and can’t be disabled. SID filtering prevents malicious users who have domain or enterprise administrator level access in a trusted forest from granting elevated user rights to a trusting forest. It does this by preventing misuse of the attributes containing SIDs on security principals in the trusted forest. For example, a malicious user with administrative credentials located in a trusted forest could, through various means, obtain the SID information of a domain or enterprise admin in the trusting forest. After obtaining the SID of an administrator from the trusting forest, a malicious user with administrative credentials can add that SID to the SID history attribute of a security principal in the trusted forest and attempt to gain full access to the trusting forest and the resources within it.

Keeping SID filtering disabled on your on-premises domain can open your domain up to risks from malicious users. We understand that during a domain migration, you may need to disable it to allow an object’s SID from the original domain to be used during the migration. But in AWS Managed Microsoft AD, this filtering cannot be disabled. See SID Filtering for more details.

Network ports that are required to create trusts

The following network ports are required to be open between domain controllers on both domains or forests prior to attempting to create a trust. Note, the Security Group used by your AWS Managed Microsoft AD directory already has these inbound ports open. You will need to adjust the outbound rules of the Security Group to let it communicate with the to be trusted domains or forests. The following table is based on Microsoft’s recommendations. Depending on your use case, some of these ports might not need to be opened. For example, if LDAP over SSL isn’t configured, then TCP 636 isn’t needed.

Port Protocol Service
53 TCP and UDP DNS
88 TCP and UDP Kerberos
123 UDP Windows Time
135 TCP Remote Procedure Call (RPC)
389 TCP and UDP Lightweight Directory Access Protocol (LDAP)
445 TCP Server Message Block (SMB)
464 TCP and UDP Kerberos Password Change
636 TCP LDAP over SSL
3268 TCP LDAP Global Catalog (GC)
3269 TCP LDAP GC over SSL
49152–65535 TCP and UDP RPC

Trust creation process overview

AWS Managed Microsoft AD is based on Windows Server Active Directory Domain Services, which means that Active Directory trusts function the same way they do with self-managed Active Directory. The only difference is how the trust is created. You use the AWS Management Console or APIs to create the trust for the AWS Managed Microsoft AD side. This process has been documented thoroughly in the AWS Directory Service Administration Guide, so I won’t go into detail on the steps.

The high-level overview of the process is:

  1. Ensure that network and DNS name resolution is available and functional between the domains.
  2. Create the trust on the on-premises Active Directory.
  3. Complete the trust on the AWS Managed Microsoft AD in the AWS Directory Service console.

Common trust scenarios with AWS Managed Microsoft AD

When you create trust between an on-premises domain and AWS Managed Microsoft AD, there are some items to take into consideration that will help you decide what direction of trust you need to deploy. In this post, I’ll cover a couple of the most common scenarios.

All scenarios: Selecting a trust type

Let’s start with the choice between a Forest or External trust. We generally recommend using a Forest trust type. The reason for that is that Forest trusts fully support Kerberos without any caveats. With that said, if you have a specific requirement to implement an External trust, you can do so—just be aware of these caveats.

Scenario 1: Use AWS Managed Microsoft AD as a resource forest for Amazon RDS, Amazon FSx for Windows File Server, or Amazon EC2 instances

In this scenario, you might want to use AWS Managed Microsoft AD as a resource forest for Amazon RDS, Amazon FSx for Windows File Server, or Amazon Elastic Compute Cloud (Amazon EC2). AWS Managed Microsoft AD is going to be a resource domain, and user accounts will reside on the on-premises side of the trust and need to be able to access the resources in the AWS Managed Microsoft AD side of the trust.

In this scenario, the AWS applications (Amazon RDS, Amazon FSx for Windows File Server, or Amazon EC2) don’t require a two-way trust to function, because they are natively integrated with Active Directory. This tells you that you only need authentication to flow one way. This scenario requires a one-way incoming trust on the on-premises domain and one-way outgoing trusts on the AWS Managed Microsoft AD domain. Figure 7 demonstrates this.

Figure 7: A one-way trust

Figure 7: A one-way trust

Scenario 2: Use AWS Managed Microsoft AD as a resource forest for all other supported AWS applications

In this scenario, you want to use AWS Managed Microsoft AD as a resource domain for all other supported AWS applications that aren’t included in Scenario 1. As the previous scenario stated, AWS Managed Microsoft AD will be a resource domain, and the user accounts will reside on the on-premises side of the trust and need to be able to access the resources in the AWS Managed Microsoft AD.

In this scenario, AWS applications (Amazon Chime, Amazon Connect, Amazon QuickSight, AWS Single Sign-On, Amazon WorkDocs, Amazon WorkMail, Amazon WorkSpaces, AWS Client VPN, AWS Management Console, and AWS Transfer Family) need to be able to look up objects from the on-premises domain in order for them to function. This tells you that authentication needs to flow both ways. This scenario requires a two-way trust between the on-premises and AWS Managed Microsoft AD domains. Figure 8 demonstrates this.

Figure 8: A two-way trust

Figure 8: A two-way trust

Common trust myths and misconceptions

I have had many conversations with customers concerning trusts between their on-premises domain and their AWS Managed Microsoft AD domain. These are some of the common myths and misconceptions we’ve come across in our conversations.

Trusts synchronize objects between each domain.

This is false. A trust between domains or forests acts as a bridge that allows validated authentication requests, in the form of Kerberos or NTLM traffic, to travel between domains or forests. Objects are not synchronized between the domains or forests. Only the trust password is synchronized, which is used for Kerberos.

My password is passed over the trust when authenticating.

This is false. As I showed earlier in the Starting with Kerberos section, when authenticating across trusts, the user’s password is not passed between domains. The only things passed between domains are the Ticket Granting Service (TGS) requests and responses, which are generated in real time, are single use, and expire within hours.

A one-way trust allows bidirectional authentication.

This is false. One-way trusts allow authentications to traverse in one direction only. Users or objects from the trusted domain are able to authenticate and, if they are delegated, to access resources in the trusting domain. Users in the trusting domain can’t authenticate into the trusted domain, and aren’t granted permissions to access resources. Let’s say there is an Amazon FSx file system in Example.local and a one-way trust between Example.com (outgoing trust direction) and Example.local (incoming trust direction). A user in Example.com can’t be delegated permission to the Amazon FSx file system Example.local with the current trust configuration. That’s the nature of a one-way trust.

Trusts are inherently insecure by default.

This is false, although an improperly configured trust can increase your risk and exposure. Trusts by themselves do very little to increase an Active Directory’s attack surface. You should always use best practices when creating a trust to minimize risk. For example, a trust without a purpose should be removed. You should disable the SID History unless you’re in the process of migrating domains. See Security Considerations for Trusts for more guidance on securing trusts.

Users in the trusted domain are granted permissions to my domain when a trust is created.

This is false. By default, with two-way trusts, objects have read-only permission to Active Directory in both directions. Objects are not delegated permissions or access to resources or servers by default. For example, if you want a user to log into a computer in another domain, you first must delegate the user access to the resource in the other domain. Without that delegation, the user won’t be able to access the resource.

Troubleshooting trusts

Based on our experience working with many customers, the vast majority of trust configuration issues are either DNS resolution or networking connectivity errors. These are some troubleshooting steps to help you resolve any of these common issues:

  • Check whether you allowed outbound networking traffic on the AWS Managed Microsoft AD. See Step 1: Set up your environment for trusts to learn how to find your directory’s security group and how to modify it.
  • If the DNS server or the network for your on-premises domain uses a public (non-RFC 1918) IP address space, follow these steps:
    1. In the AWS Directory Service console, go to the IP routing section for your directory, choose Actions, and then choose Add route.
    2. Enter the IP address block of your DNS server or on-premises network using CIDR format, for example

      This step isn’t necessary if both your DNS server and your on-premises network are using RFC 1918 private IP address spaces.

  • After you verify the security group and check whether any applicable routes are required, launch a Windows Server instance and join it to the AWS Managed Microsoft AD directory. See Step 3: Deploy an EC2 instance to manage your AWS Managed Microsoft AD to learn how to do this. Once the instance is launched, do the following:
    • Run the following PowerShell command to test DNS connectivity:
      Resolve-DnsName -Name 'example.local' -DnsOnly
  • You should also look through the message explanations in the Trust creation status reasons guide in the AWS Directory Service documentation.

Summary of AWS Managed Microsoft AD trust considerations

In this blog post, I covered Kerberos authentication over Active Directory trusts and provided details on what Active Directory trusts are and how they function. Here’s a quick list of items that you should consider when you plan trust creation with AWS Managed Microsoft AD:

  • Ensure that you have a network connection and the appropriate network ports opened between both domains. Note, it is recommended all Active Directory traffic occur over private network connection like a VPN or Direct Connect.
  • Ensure that DNS resolution is working on both sides of the trust.
  • Decide whether you will implement selective authentication. If it will be used, plan your Active Directory access control list (ACL) delegation strategy before implementation.
  • As of this blog’s publication, keep in mind that AWS Managed Microsoft AD currently supports Forest trusts and External trusts only.
  • Ensure that Kerberos preauthentication is enabled for all objects that traverse trusts with AWS Managed Microsoft AD.
  • If you find that you need a name suffix route enabled for your trust, open a support case with AWS Support, requesting that the name suffix route be enabled.
  • Finally, review Security Considerations for Trusts: Domain and Forest Trusts for additional considerations for trust configuration.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Directory Service forum.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.


Jeremy Girven

Jeremy is a solutions architect specializing in Microsoft workloads on AWS. He has over 15 years’ experience with Microsoft Active Directory and over 23 years of industry experience. One of his fun projects is using SSM to automate the Active Directory build processes in AWS. To see more, check out the Active Directory AWS QuickStart.

AWS Online Tech Talks – June 2018

Post Syndicated from Devin Watson original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-june-2018/

AWS Online Tech Talks – June 2018

Join us this month to learn about AWS services and solutions. New this month, we have a fireside chat with the GM of Amazon WorkSpaces and our 2nd episode of the “How to re:Invent” series. We’ll also cover best practices, deep dives, use cases and more! Join us and register today!

Note – All sessions are free and in Pacific Time.

Tech talks featured this month:


Analytics & Big Data

June 18, 2018 | 11:00 AM – 11:45 AM PTGet Started with Real-Time Streaming Data in Under 5 Minutes – Learn how to use Amazon Kinesis to capture, store, and analyze streaming data in real-time including IoT device data, VPC flow logs, and clickstream data.
June 20, 2018 | 11:00 AM – 11:45 AM PT – Insights For Everyone – Deploying Data across your Organization – Learn how to deploy data at scale using AWS Analytics and QuickSight’s new reader role and usage based pricing.


AWS re:Invent
June 13, 2018 | 05:00 PM – 05:30 PM PTEpisode 2: AWS re:Invent Breakout Content Secret Sauce – Hear from one of our own AWS content experts as we dive deep into the re:Invent content strategy and how we maintain a high bar.

June 25, 2018 | 01:00 PM – 01:45 PM PTAccelerating Containerized Workloads with Amazon EC2 Spot Instances – Learn how to efficiently deploy containerized workloads and easily manage clusters at any scale at a fraction of the cost with Spot Instances.

June 26, 2018 | 01:00 PM – 01:45 PM PTEnsuring Your Windows Server Workloads Are Well-Architected – Get the benefits, best practices and tools on running your Microsoft Workloads on AWS leveraging a well-architected approach.


June 25, 2018 | 09:00 AM – 09:45 AM PTRunning Kubernetes on AWS – Learn about the basics of running Kubernetes on AWS including how setup masters, networking, security, and add auto-scaling to your cluster.



June 18, 2018 | 01:00 PM – 01:45 PM PTOracle to Amazon Aurora Migration, Step by Step – Learn how to migrate your Oracle database to Amazon Aurora.

June 20, 2018 | 09:00 AM – 09:45 AM PTSet Up a CI/CD Pipeline for Deploying Containers Using the AWS Developer Tools – Learn how to set up a CI/CD pipeline for deploying containers using the AWS Developer Tools.


Enterprise & Hybrid
June 18, 2018 | 09:00 AM – 09:45 AM PTDe-risking Enterprise Migration with AWS Managed Services – Learn how enterprise customers are de-risking cloud adoption with AWS Managed Services.

June 19, 2018 | 11:00 AM – 11:45 AM PTLaunch AWS Faster using Automated Landing Zones – Learn how the AWS Landing Zone can automate the set up of best practice baselines when setting up new


AWS Environments

June 21, 2018 | 11:00 AM – 11:45 AM PTLeading Your Team Through a Cloud Transformation – Learn how you can help lead your organization through a cloud transformation.

June 21, 2018 | 01:00 PM – 01:45 PM PTEnabling New Retail Customer Experiences with Big Data – Learn how AWS can help retailers realize actual value from their big data and deliver on differentiated retail customer experiences.

June 28, 2018 | 01:00 PM – 01:45 PM PTFireside Chat: End User Collaboration on AWS – Learn how End User Compute services can help you deliver access to desktops and applications anywhere, anytime, using any device.

June 27, 2018 | 11:00 AM – 11:45 AM PTAWS IoT in the Connected Home – Learn how to use AWS IoT to build innovative Connected Home products.


Machine Learning

June 19, 2018 | 09:00 AM – 09:45 AM PTIntegrating Amazon SageMaker into your Enterprise – Learn how to integrate Amazon SageMaker and other AWS Services within an Enterprise environment.

June 21, 2018 | 09:00 AM – 09:45 AM PTBuilding Text Analytics Applications on AWS using Amazon Comprehend – Learn how you can unlock the value of your unstructured data with NLP-based text analytics.


Management Tools

June 20, 2018 | 01:00 PM – 01:45 PM PTOptimizing Application Performance and Costs with Auto Scaling – Learn how selecting the right scaling option can help optimize application performance and costs.


June 25, 2018 | 11:00 AM – 11:45 AM PTDrive User Engagement with Amazon Pinpoint – Learn how Amazon Pinpoint simplifies and streamlines effective user engagement.


Security, Identity & Compliance

June 26, 2018 | 09:00 AM – 09:45 AM PTUnderstanding AWS Secrets Manager – Learn how AWS Secrets Manager helps you rotate and manage access to secrets centrally.
June 28, 2018 | 09:00 AM – 09:45 AM PTUsing Amazon Inspector to Discover Potential Security Issues – See how Amazon Inspector can be used to discover security issues of your instances.



June 19, 2018 | 01:00 PM – 01:45 PM PTProductionize Serverless Application Building and Deployments with AWS SAM – Learn expert tips and techniques for building and deploying serverless applications at scale with AWS SAM.



June 26, 2018 | 11:00 AM – 11:45 AM PTDeep Dive: Hybrid Cloud Storage with AWS Storage Gateway – Learn how you can reduce your on-premises infrastructure by using the AWS Storage Gateway to connecting your applications to the scalable and reliable AWS storage services.
June 27, 2018 | 01:00 PM – 01:45 PM PTChanging the Game: Extending Compute Capabilities to the Edge – Discover how to change the game for IIoT and edge analytics applications with AWS Snowball Edge plus enhanced Compute instances.
June 28, 2018 | 11:00 AM – 11:45 AM PTBig Data and Analytics Workloads on Amazon EFS – Get best practices and deployment advice for running big data and analytics workloads on Amazon EFS.

New – Pay-per-Session Pricing for Amazon QuickSight, Another Region, and Lots More

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-pay-per-session-pricing-for-amazon-quicksight-another-region-and-lots-more/

Amazon QuickSight is a fully managed cloud business intelligence system that gives you Fast & Easy to Use Business Analytics for Big Data. QuickSight makes business analytics available to organizations of all shapes and sizes, with the ability to access data that is stored in your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, flat files in S3, and (via connectors) data stored in on-premises MySQL, PostgreSQL, and SQL Server databases. QuickSight scales to accommodate tens, hundreds, or thousands of users per organization.

Today we are launching a new, session-based pricing option for QuickSight, along with additional region support and other important new features. Let’s take a look at each one:

Pay-per-Session Pricing
Our customers are making great use of QuickSight and take full advantage of the power it gives them to connect to data sources, create reports, and and explore visualizations.

However, not everyone in an organization needs or wants such powerful authoring capabilities. Having access to curated data in dashboards and being able to interact with the data by drilling down, filtering, or slicing-and-dicing is more than adequate for their needs. Subscribing them to a monthly or annual plan can be seen as an unwarranted expense, so a lot of such casual users end up not having access to interactive data or BI.

In order to allow customers to provide all of their users with interactive dashboards and reports, the Enterprise Edition of Amazon QuickSight now allows Reader access to dashboards on a Pay-per-Session basis. QuickSight users are now classified as Admins, Authors, or Readers, with distinct capabilities and prices:

Authors have access to the full power of QuickSight; they can establish database connections, upload new data, create ad hoc visualizations, and publish dashboards, all for $9 per month (Standard Edition) or $18 per month (Enterprise Edition).

Readers can view dashboards, slice and dice data using drill downs, filters and on-screen controls, and download data in CSV format, all within the secure QuickSight environment. Readers pay $0.30 for 30 minutes of access, with a monthly maximum of $5 per reader.

Admins have all authoring capabilities, and can manage users and purchase SPICE capacity in the account. The QuickSight admin now has the ability to set the desired option (Author or Reader) when they invite members of their organization to use QuickSight. They can extend Reader invites to their entire user base without incurring any up-front or monthly costs, paying only for the actual usage.

To learn more, visit the QuickSight Pricing page.

A New Region
QuickSight is now available in the Asia Pacific (Tokyo) Region:

The UI is in English, with a localized version in the works.

Hourly Data Refresh
Enterprise Edition SPICE data sets can now be set to refresh as frequently as every hour. In the past, each data set could be refreshed up to 5 times a day. To learn more, read Refreshing Imported Data.

Access to Data in Private VPCs
This feature was launched in preview form late last year, and is now available in production form to users of the Enterprise Edition. As I noted at the time, you can use it to implement secure, private communication with data sources that do not have public connectivity, including on-premises data in Teradata or SQL Server, accessed over an AWS Direct Connect link. To learn more, read Working with AWS VPC.

Parameters with On-Screen Controls
QuickSight dashboards can now include parameters that are set using on-screen dropdown, text box, numeric slider or date picker controls. The default value for each parameter can be set based on the user name (QuickSight calls this a dynamic default). You could, for example, set an appropriate default based on each user’s office location, department, or sales territory. Here’s an example:

To learn more, read about Parameters in QuickSight.

URL Actions for Linked Dashboards
You can now connect your QuickSight dashboards to external applications by defining URL actions on visuals. The actions can include parameters, and become available in the Details menu for the visual. URL actions are defined like this:

You can use this feature to link QuickSight dashboards to third party applications (e.g. Salesforce) or to your own internal applications. Read Custom URL Actions to learn how to use this feature.

Dashboard Sharing
You can now share QuickSight dashboards across every user in an account.

Larger SPICE Tables
The per-data set limit for SPICE tables has been raised from 10 GB to 25 GB.

Upgrade to Enterprise Edition
The QuickSight administrator can now upgrade an account from Standard Edition to Enterprise Edition with a click. This enables provisioning of Readers with pay-per-session pricing, private VPC access, row-level security for dashboards and data sets, and hourly refresh of data sets. Enterprise Edition pricing applies after the upgrade.

Available Now
Everything I listed above is available now and you can start using it today!

You can try QuickSight for 60 days at no charge, and you can also attend our June 20th Webinar.



How AWS Meets a Physical Separation Requirement with a Logical Separation Approach

Post Syndicated from Min Hyun original https://aws.amazon.com/blogs/security/how-aws-meets-a-physical-separation-requirement-with-a-logical-separation-approach/

We have a new resource available to help you meet a requirement for physically-separated infrastructure using logical separation in the AWS cloud. Our latest guide, Logical Separation: An Evaluation of the U.S. Department of Defense Cloud Security Requirements for Sensitive Workloads outlines how AWS meets the U.S. Department of Defense’s (DoD) stringent physical separation requirement by pioneering a three-pronged logical separation approach that leverages virtualization, encryption, and deploying compute to dedicated hardware.

This guide will help you understand logical separation in the cloud and demonstrates its advantages over a traditional physical separation model. Embracing this approach can help organizations confidently meet or exceed security requirements found in traditional on-premises environments, while also providing increased security control and flexibility.

Logical Separation is the second guide in the AWS Government Handbook Series, which examines cybersecurity policy initiatives and identifies best practices.

If you have questions or want to learn more, contact your account executive or AWS Support.

Creating a 1.3 Million vCPU Grid on AWS using EC2 Spot Instances and TIBCO GridServer

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/creating-a-1-3-million-vcpu-grid-on-aws-using-ec2-spot-instances-and-tibco-gridserver/

Many of my colleagues are fortunate to be able to spend a good part of their day sitting down with and listening to our customers, doing their best to understand ways that we can better meet their business and technology needs. This information is treated with extreme care and is used to drive the roadmap for new services and new features.

AWS customers in the financial services industry (often abbreviated as FSI) are looking ahead to the Fundamental Review of Trading Book (FRTB) regulations that will come in to effect between 2019 and 2021. Among other things, these regulations mandate a new approach to the “value at risk” calculations that each financial institution must perform in the four hour time window after trading ends in New York and begins in Tokyo. Today, our customers report this mission-critical calculation consumes on the order of 200,000 vCPUs, growing to between 400K and 800K vCPUs in order to meet the FRTB regulations. While there’s still some debate about the magnitude and frequency with which they’ll need to run this expanded calculation, the overall direction is clear.

Building a Big Grid
In order to make sure that we are ready to help our FSI customers meet these new regulations, we worked with TIBCO to set up and run a proof of concept grid in the AWS Cloud. The periodic nature of the calculation, along with the amount of processing power and storage needed to run it to completion within four hours, make it a great fit for an environment where a vast amount of cost-effective compute power is available on an on-demand basis.

Our customers are already using the TIBCO GridServer on-premises and want to use it in the cloud. This product is designed to run grids at enterprise scale. It runs apps in a virtualized fashion, and accepts requests for resources, dynamically provisioning them on an as-needed basis. The cloud version supports Amazon Linux as well as the PostgreSQL-compatible edition of Amazon Aurora.

Working together with TIBCO, we set out to create a grid that was substantially larger than the current high-end prediction of 800K vCPUs, adding a 50% safety factor and then rounding up to reach 1.3 million vCPUs (5x the size of the largest on-premises grid). With that target in mind, the account limits were raised as follows:

  • Spot Instance Limit – 120,000
  • EBS Volume Limit – 120,000
  • EBS Capacity Limit – 2 PB

If you plan to create a grid of this size, you should also bring your friendly local AWS Solutions Architect into the loop as early as possible. They will review your plans, provide you with architecture guidance, and help you to schedule your run.

Running the Grid
We hit the Go button and launched the grid, watching as it bid for and obtained Spot Instances, each of which booted, initialized, and joined the grid within two minutes. The test workload used the Strata open source analytics & market risk library from OpenGamma and was set up with their assistance.

The grid grew to 61,299 Spot Instances (1.3 million vCPUs drawn from 34 instance types spanning 3 generations of EC2 hardware) as planned, with just 1,937 instances reclaimed and automatically replaced during the run, and cost $30,000 per hour to run, at an average hourly cost of $0.078 per vCPU. If the same instances had been used in On-Demand form, the hourly cost to run the grid would have been approximately $93,000.

Despite the scale of the grid, prices for the EC2 instances did not move during the bidding process. This is due to the overall size of the AWS Cloud and the smooth price change model that we launched late last year.

To give you a sense of the compute power, we computed that this grid would have taken the #1 position on the TOP 500 supercomputer list in November 2007 by a considerable margin, and the #2 position in June 2008. Today, it would occupy position #360 on the list.

I hope that you enjoyed this AWS success story, and that it gives you an idea of the scale that you can achieve in the cloud!


10 visualizations to try in Amazon QuickSight with sample data

Post Syndicated from Karthik Kumar Odapally original https://aws.amazon.com/blogs/big-data/10-visualizations-to-try-in-amazon-quicksight-with-sample-data/

If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect yuor data, perform advanced analysis and access the results from any web browser or mobile device.

The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.

Which data sources does Amazon QuickSight support?

At the time of publication, you can use the following data methods:

  • Connect to AWS data sources, including:
    • Amazon RDS
    • Amazon Aurora
    • Amazon Redshift
    • Amazon Athena
    • Amazon S3
  • Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
  • Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
  • Import data from SaaS applications like Salesforce and Snowflake
  • Use big data processing engines like Spark and Presto

This list is constantly growing. For more information, see Supported Data Sources.

Answers in instants

SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.

Typical Amazon QuickSight workflow

When you create an analysis, the typical workflow is as follows:

  1. Connect to a data source, and then create a new dataset or choose an existing dataset.
  2. (Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
  3. Create a new analysis.
  4. Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
  5. (Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
  6. (Optional) Add more visuals to the analysis.
  7. (Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
  8. (Optional) Publish the analysis as a dashboard to share insights with other users.

The following graphic illustrates a typical Amazon QuickSight workflow.

Visualizations created in Amazon QuickSight with sample datasets

Visualizations for a data analyst

Source:  https://data.worldbank.org/

Download and Resources:  https://datacatalog.worldbank.org/dataset/world-development-indicators

Data catalog:  The World Bank invests into multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.

The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.

The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). Also, you can maneuver over the graph to get detailed statistics at a glance.

Visualizations for a trading analyst

Source:  Deutsche Börse Public Dataset (DBG PDS)

Download and resources:  https://aws.amazon.com/public-datasets/deutsche-boerse-pds/

Data catalog:  The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.

The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.

The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.

Visualizations for a data scientist

Source:  https://catalog.data.gov/

Download and resources:  https://catalog.data.gov/dataset/road-weather-information-stations-788f8

Data catalog:  Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.

The following graph shows the present max air temperature in Seattle from different RWI station sensors.

The following graph shows the minimum temperature of the road surface at different times, which helps predicts road conditions at a particular time of the year.

Visualizations for a data engineer

Source:  https://www.kaggle.com/

Download and resources:  https://www.kaggle.com/datasnaek/youtube-new/data

Data catalog:  Kaggle has come up with a platform where people can donate open datasets. Data engineers and other community members can have open access to these datasets and can contribute to the open data movement. They have more than 350 datasets in total, with more than 200 as featured datasets. It has a few interesting datasets on the platform that are not present at other places, and it’s a platform to connect with other data enthusiasts.

The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.

The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.

Visualizations for a business user

Source:  New York Taxi Data

Download and resources:  https://data.cityofnewyork.us/Transportation/2016-Green-Taxi-Trip-Data/hvrh-b6nb

Data catalog: NYC Open data hosts some very popular open data sets for all New Yorkers. This platform allows you to get involved in dive deep into the data set to pull some useful visualizations. 2016 Green taxi trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

The following graph presents maximum fare amount grouped by the passenger count during a period of time during a day. This can be further expanded to follow through different day of the month based on the business need.

The following graph shows the NewYork taxi data from January 2016, showing the dip in the number of taxis ridden on January 23, 2016 across all types of taxis.

A quick search for that date and location shows you the following news report:


Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!


Additional Reading

If you found this post useful, be sure to check out Amazon QuickSight Adds Support for Combo Charts and Row-Level Security and Visualize AWS Cloudtrail Logs Using AWS Glue and Amazon QuickSight.

Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.




Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wild life of most divine national parks around the United States alongside his wife.





How to centralize DNS management in a multi-account environment

Post Syndicated from Mahmoud Matouk original https://aws.amazon.com/blogs/security/how-to-centralize-dns-management-in-a-multi-account-environment/

In a multi-account environment where you require connectivity between accounts, and perhaps connectivity between cloud and on-premises workloads, the demand for a robust Domain Name Service (DNS) that’s capable of name resolution across all connected environments will be high.

The most common solution is to implement local DNS in each account and use conditional forwarders for DNS resolutions outside of this account. While this solution might be efficient for a single-account environment, it becomes complex in a multi-account environment.

In this post, I will provide a solution to implement central DNS for multiple accounts. This solution reduces the number of DNS servers and forwarders needed to implement cross-account domain resolution. I will show you how to configure this solution in four steps:

  1. Set up your Central DNS account.
  2. Set up each participating account.
  3. Create Route53 associations.
  4. Configure on-premises DNS (if applicable).

Solution overview

In this solution, you use AWS Directory Service for Microsoft Active Directory (AWS Managed Microsoft AD) as a DNS service in a dedicated account in a Virtual Private Cloud (DNS-VPC).

The DNS service included in AWS Managed Microsoft AD uses conditional forwarders to forward domain resolution to either Amazon Route 53 (for domains in the awscloud.com zone) or to on-premises DNS servers (for domains in the example.com zone). You’ll use AWS Managed Microsoft AD as the primary DNS server for other application accounts in the multi-account environment (participating accounts).

A participating account is any application account that hosts a VPC and uses the centralized AWS Managed Microsoft AD as the primary DNS server for that VPC. Each participating account has a private, hosted zone with a unique zone name to represent this account (for example, business_unit.awscloud.com).

You associate the DNS-VPC with the unique hosted zone in each of the participating accounts, this allows AWS Managed Microsoft AD to use Route 53 to resolve all registered domains in private, hosted zones in participating accounts.

The following diagram shows how the various services work together:

Diagram showing the relationship between all the various services

Figure 1: Diagram showing the relationship between all the various services


In this diagram, all VPCs in participating accounts use Dynamic Host Configuration Protocol (DHCP) option sets. The option sets configure EC2 instances to use the centralized AWS Managed Microsoft AD in DNS-VPC as their default DNS Server. You also configure AWS Managed Microsoft AD to use conditional forwarders to send domain queries to Route53 or on-premises DNS servers based on query zone. For domain resolution across accounts to work, we associate DNS-VPC with each hosted zone in participating accounts.

If, for example, server.pa1.awscloud.com needs to resolve addresses in the pa3.awscloud.com domain, the sequence shown in the following diagram happens:

How domain resolution across accounts works

Figure 2: How domain resolution across accounts works


  • 1.1: server.pa1.awscloud.com sends domain name lookup to default DNS server for the name server.pa3.awscloud.com. The request is forwarded to the DNS server defined in the DHCP option set (AWS Managed Microsoft AD in DNS-VPC).
  • 1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
  • 1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.

Similarly, if server.example.com needs to resolve server.pa3.awscloud.com, the following happens:

  • 2.1: server.example.com sends domain name lookup to on-premise DNS server for the name server.pa3.awscloud.com.
  • 2.2: on-premise DNS server using conditional forwarder forwards domain lookup to AWS Managed Microsoft AD in DNS-VPC.
  • 1.2: AWS Managed Microsoft AD forwards name resolution to Route53 because it’s in the awscloud.com zone.
  • 1.3: Route53 resolves the name to the IP address of server.pa3.awscloud.com because DNS-VPC is associated with the private hosted zone pa3.awscloud.com.

Step 1: Set up a centralized DNS account

In previous AWS Security Blog posts, Drew Dennis covered a couple of options for establishing DNS resolution between on-premises networks and Amazon VPC. In this post, he showed how you can use AWS Managed Microsoft AD (provisioned with AWS Directory Service) to provide DNS resolution with forwarding capabilities.

To set up a centralized DNS account, you can follow the same steps in Drew’s post to create AWS Managed Microsoft AD and configure the forwarders to send DNS queries for awscloud.com to default, VPC-provided DNS and to forward example.com queries to the on-premise DNS server.

Here are a few considerations while setting up central DNS:

  • The VPC that hosts AWS Managed Microsoft AD (DNS-VPC) will be associated with all private hosted zones in participating accounts.
  • To be able to resolve domain names across AWS and on-premises, connectivity through Direct Connect or VPN must be in place.

Step 2: Set up participating accounts

The steps I suggest in this section should be applied individually in each application account that’s participating in central DNS resolution.

  1. Create the VPC(s) that will host your resources in participating account.
  2. Create VPC Peering between local VPC(s) in each participating account and DNS-VPC.
  3. Create a private hosted zone in Route 53. Hosted zone domain names must be unique across all accounts. In the diagram above, we used pa1.awscloud.com / pa2.awscloud.com / pa3.awscloud.com. You could also use a combination of environment and business unit: for example, you could use pa1.dev.awscloud.com to achieve uniqueness.
  4. Associate VPC(s) in each participating account with the local private hosted zone.

The next step is to change the default DNS servers on each VPC using DHCP option set:

  1. Follow these steps to create a new DHCP option set. Make sure in the DNS Servers to put the private IP addresses of the two AWS Managed Microsoft AD servers that were created in DNS-VPC:
    The "Create DHCP options set" dialog box

    Figure 3: The “Create DHCP options set” dialog box


  2. Follow these steps to assign the DHCP option set to your VPC(s) in participating account.

Step 3: Associate DNS-VPC with private hosted zones in each participating account

The next steps will associate DNS-VPC with the private, hosted zone in each participating account. This allows instances in DNS-VPC to resolve domain records created in these hosted zones. If you need them, here are more details on associating a private, hosted zone with VPC on a different account.

  1. In each participating account, create the authorization using the private hosted zone ID from the previous step, the region, and the VPC ID that you want to associate (DNS-VPC).
    aws route53 create-vpc-association-authorization –hosted-zone-id <hosted-zone-id> –vpc VPCRegion=<region>,VPCId=<vpc-id>
  2. In the centralized DNS account, associate DNS-VPC with the hosted zone in each participating account.
    aws route53 associate-vpc-with-hosted-zone –hosted-zone-id <hosted-zone-id> –vpc VPCRegion=<region>,VPCId=<vpc-id>

After completing these steps, AWS Managed Microsoft AD in the centralized DNS account should be able to resolve domain records in the private, hosted zone in each participating account.

Step 4: Setting up on-premises DNS servers

This step is necessary if you would like to resolve AWS private domains from on-premises servers and this task comes down to configuring forwarders on-premise to forward DNS queries to AWS Managed Microsoft AD in DNS-VPC for all domains in the awscloud.com zone.

The steps to implement conditional forwarders vary by DNS product. Follow your product’s documentation to complete this configuration.


I introduced a simplified solution to implement central DNS resolution in a multi-account environment that could be also extended to support DNS resolution between on-premise resources and AWS. This can help reduce operations effort and the number of resources needed to implement cross-account domain resolution.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the AWS Directory Service forum or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

AWS Online Tech Talks – April & Early May 2018

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-april-early-may-2018/

We have several upcoming tech talks in the month of April and early May. Come join us to learn about AWS services and solution offerings. We’ll have AWS experts online to help answer questions in real-time. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

April & early May — 2018 Schedule


April 30, 2018 | 01:00 PM – 01:45 PM PTBest Practices for Running Amazon EC2 Spot Instances with Amazon EMR (300) – Learn about the best practices for scaling big data workloads as well as process, store, and analyze big data securely and cost effectively with Amazon EMR and Amazon EC2 Spot Instances.

May 1, 2018 | 01:00 PM – 01:45 PM PTHow to Bring Microsoft Apps to AWS (300) – Learn more about how to save significant money by bringing your Microsoft workloads to AWS.

May 2, 2018 | 01:00 PM – 01:45 PM PTDeep Dive on Amazon EC2 Accelerated Computing (300) – Get a technical deep dive on how AWS’ GPU and FGPA-based compute services can help you to optimize and accelerate your ML/DL and HPC workloads in the cloud.


April 23, 2018 | 11:00 AM – 11:45 AM PTNew Features for Building Powerful Containerized Microservices on AWS (300) – Learn about how this new feature works and how you can start using it to build and run modern, containerized applications on AWS.


April 23, 2018 | 01:00 PM – 01:45 PM PTElastiCache: Deep Dive Best Practices and Usage Patterns (200) – Learn about Redis-compatible in-memory data store and cache with Amazon ElastiCache.

April 25, 2018 | 01:00 PM – 01:45 PM PTIntro to Open Source Databases on AWS (200) – Learn how to tap the benefits of open source databases on AWS without the administrative hassle.


April 25, 2018 | 09:00 AM – 09:45 AM PTDebug your Container and Serverless Applications with AWS X-Ray in 5 Minutes (300) – Learn how AWS X-Ray makes debugging your Container and Serverless applications fun.

Enterprise & Hybrid

April 23, 2018 | 09:00 AM – 09:45 AM PTAn Overview of Best Practices of Large-Scale Migrations (300) – Learn about the tools and best practices on how to migrate to AWS at scale.

April 24, 2018 | 11:00 AM – 11:45 AM PTDeploy your Desktops and Apps on AWS (300) – Learn how to deploy your desktops and apps on AWS with Amazon WorkSpaces and Amazon AppStream 2.0


May 2, 2018 | 11:00 AM – 11:45 AM PTHow to Easily and Securely Connect Devices to AWS IoT (200) – Learn how to easily and securely connect devices to the cloud and reliably scale to billions of devices and trillions of messages with AWS IoT.

Machine Learning

April 24, 2018 | 09:00 AM – 09:45 AM PT Automate for Efficiency with Amazon Transcribe and Amazon Translate (200) – Learn how you can increase the efficiency and reach your operations with Amazon Translate and Amazon Transcribe.

April 26, 2018 | 09:00 AM – 09:45 AM PT Perform Machine Learning at the IoT Edge using AWS Greengrass and Amazon Sagemaker (200) – Learn more about developing machine learning applications for the IoT edge.


April 30, 2018 | 11:00 AM – 11:45 AM PTOffline GraphQL Apps with AWS AppSync (300) – Come learn how to enable real-time and offline data in your applications with GraphQL using AWS AppSync.


May 2, 2018 | 09:00 AM – 09:45 AM PT Taking Serverless to the Edge (300) – Learn how to run your code closer to your end users in a serverless fashion. Also, David Von Lehman from Aerobatic will discuss how they used [email protected] to reduce latency and cloud costs for their customer’s websites.

Security, Identity & Compliance

April 30, 2018 | 09:00 AM – 09:45 AM PTAmazon GuardDuty – Let’s Attack My Account! (300) – Amazon GuardDuty Test Drive – Practical steps on generating test findings.

May 3, 2018 | 09:00 AM – 09:45 AM PTProtect Your Game Servers from DDoS Attacks (200) – Learn how to use the new AWS Shield Advanced for EC2 to protect your internet-facing game servers against network layer DDoS attacks and application layer attacks of all kinds.


April 24, 2018 | 01:00 PM – 01:45 PM PTTips and Tricks for Building and Deploying Serverless Apps In Minutes (200) – Learn how to build and deploy apps in minutes.


May 1, 2018 | 11:00 AM – 11:45 AM PTBuilding Data Lakes That Cost Less and Deliver Results Faster (300) – Learn how Amazon S3 Select And Amazon Glacier Select increase application performance by up to 400% and reduce total cost of ownership by extending your data lake into cost-effective archive storage.

May 3, 2018 | 11:00 AM – 11:45 AM PTIntegrating On-Premises Vendors with AWS for Backup (300) – Learn how to work with AWS and technology partners to build backup & restore solutions for your on-premises, hybrid, and cloud native environments.

New – Encryption of Data in Transit for Amazon EFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-encryption-of-data-in-transit-for-amazon-efs/

Amazon Elastic File System was designed to be the file system of choice for cloud-native applications that require shared access to file-based storage. We launched EFS in mid-2016 and have added several important features since then including on-premises access via Direct Connect and encryption of data at rest. We have also made EFS available in additional AWS Regions, most recently US West (Northern California). As was the case with EFS itself, these enhancements were made in response to customer feedback, and reflect our desire to serve an ever-widening customer base.

Encryption in Transit
Today we are making EFS even more useful with the addition of support for encryption of data in transit. When used in conjunction with the existing support for encryption of data at rest, you now have the ability to protect your stored files using a defense-in-depth security strategy.

In order to make it easy for you to implement encryption in transit, we are also releasing an EFS mount helper. The helper (available in source code and RPM form) takes care of setting up a TLS tunnel to EFS, and also allows you to mount file systems by ID. The two features are independent; you can use the helper to mount file systems by ID even if you don’t make use of encryption in transit. The helper also supplies a recommended set of default options to the actual mount command.

Setting up Encryption
I start by installing the EFS mount helper on my Amazon Linux instance:

$ sudo yum install -y amazon-efs-utils

Next, I visit the EFS Console and capture the file system ID:

Then I specify the ID (and the TLS option) to mount the file system:

$ sudo mount -t efs fs-92758f7b -o tls /mnt/efs

And that’s it! The encryption is transparent and has an almost negligible impact on data transfer speed.

Available Now
You can start using encryption in transit today in all AWS Regions where EFS is available.

The mount helper is available for Amazon Linux. If you are running another distribution of Linux you will need to clone the GitHub repo and build your own RPM, as described in the README.


Amazon S3 Update: New Storage Class and General Availability of S3 Select

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-s3-update-new-storage-class-general-availability-of-s3-select/

I’ve got two big pieces of news for anyone who stores and retrieves data in Amazon Simple Storage Service (S3):

New S3 One Zone-IA Storage Class – This new storage class is 20% less expensive than the existing Standard-IA storage class. It is designed to be used to store data that does not need the extra level of protection provided by geographic redundancy.

General Availability of S3 Select – This unique retrieval option lets you retrieve subsets of data from S3 objects using simple SQL expressions, with the possibility for a 400% performance improvement in the process.

Let’s take a look at both!

S3 One Zone-IA (Infrequent Access) Storage Class
This new storage class stores data in a single AWS Availability Zone and is designed to provide eleven 9’s (99.99999999%) of data durability, just like the other S3 storage classes. Unlike those other classes, it is not designed to be resilient to the physical loss of an AZ due to major event such as an earthquake or a flood, and data could be lost in the unlikely event that an AZ is destroyed. S3 One Zone-IA storage gives you a lower cost option for secondary backups of on-premises data and for data that can be easily re-created. You can also use it as the target of S3 Cross-Region Replication from another AWS region.

You can specify the use of S3 One Zone-IA storage when you upload a new object to S3:

You can also make use of it as part of an S3 lifecycle rule:

You can set up a lifecycle rule that moves previous versions of an object to S3 One Zone-IA after 30 or more days:

And you can modify the storage class of an existing object:

You can also manage storage classes using the S3 API, CLI, and CloudFormation templates.

The S3 One Zone-IA storage class can be used in all public AWS regions. As I noted earlier, pricing is 20% lower than for the S3 Standard-IA storage class (see the S3 Pricing page for more info). There’s a 30 day minimum retention period, and a 128 KB minimum object size.

General Availability of S3 Select
Randall wrote a detailed introduction to S3 Select last year and showed you how you can use it to retrieve selected data from within S3 objects. During the preview we added support for server-side encryption and the ability to run queries from the S3 Console.

I used a CSV file of airport codes to exercise the new console functionality:

This file contains listings for over 9100 airports, so it makes for useful test data but it definitely does not test the limits of S3 Select in any way. I select the file, open the More menu, and choose Select from:

The console sets the file format and compression according to the file name and the encryption status. I set delimiter and click Show file preview to verify that my settings are correct. Then I click Next to proceed:

I type SQL expressions in the SQL editor and click Run SQL to issue the query:


I can also issue queries from the AWS SDKs. I initiate the select operation:

s3 = boto3.client('s3', region_name='us-west-2')

r = s3.select_object_content(
        Expression="select * from s3object s where s.\"Country (Name)\" like '%United States%'",
        InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}},
        OutputSerialization = {'CSV': {}},

And then I process the stream of results:

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print("Stats details bytesProcessed: ")

S3 Select is available in all public regions and you can start using it today. Pricing is based on the amount of data scanned and the amount of data returned.


How to Delegate Administration of Your AWS Managed Microsoft AD Directory to Your On-Premises Active Directory Users

Post Syndicated from Vijay Sharma original https://aws.amazon.com/blogs/security/how-to-delegate-administration-of-your-aws-managed-microsoft-ad-directory-to-your-on-premises-active-directory-users/

You can now enable your on-premises users administer your AWS Directory Service for Microsoft Active Directory, also known as AWS Managed Microsoft AD. Using an Active Directory (AD) trust and the new AWS delegated AD security groups, you can grant administrative permissions to your on-premises users by managing group membership in your on-premises AD directory. This simplifies how you manage who can perform administration. It also makes it easier for your administrators because they can sign in to their existing workstation with their on-premises AD credential to administer your AWS Managed Microsoft AD.

AWS created new domain local AD security groups (AWS delegated groups) in your AWS Managed Microsoft AD directory. Each AWS delegated group has unique AD administrative permissions. Users that are members in the new AWS delegated groups get permissions to perform administrative tasks, such as add users, configure fine-grained password policies and enable Microsoft enterprise Certificate Authority. Because the AWS delegated groups are domain local in scope, you can use them through an AD Trust to your on-premises AD. This eliminates the requirement to create and use separate identities to administer your AWS Managed Microsoft AD. Instead, by adding selected on-premises users to desired AWS delegated groups, you can grant your administrators some or all of the permissions. You can simplify this even further by adding on-premises AD security groups to the AWS delegated groups. This enables you to add and remove users from your on-premises AD security group so that they can manage administrative permissions in your AWS Managed Microsoft AD.

In this blog post, I will show you how to delegate permissions to your on-premises users to perform an administrative task–configuring fine-grained password policies–in your AWS Managed Microsoft AD directory. You can follow the steps in this post to delegate other administrative permissions, such as configuring group Managed Service Accounts and Kerberos constrained delegation, to your on-premises users.


Until now, AWS Managed Microsoft AD delegated administrative permissions for your directory by creating AD security groups in your Organization Unit (OU) and authorizing these AWS delegated groups for common administrative activities. The admin user in your directory created user accounts within your OU, and granted these users permissions to administer your directory by adding them to one or more of these AWS delegated groups.

However, if you used your AWS Managed Microsoft AD with a trust to an on-premises AD forest, you couldn’t add users from your on-premises directory to these AWS delegated groups. This is because AWS created the AWS delegated groups with global scope, which restricts adding users from another forest. This necessitated that you create different user accounts in AWS Managed Microsoft AD for the purpose of administration. As a result, AD administrators typically had to remember additional credentials for AWS Managed Microsoft AD.

To address this, AWS created new AWS delegated groups with domain local scope in a separate OU called AWS Delegated Groups. These new AWS delegated groups with domain local scope are more flexible and permit adding users and groups from other domains and forests. This allows your admin user to delegate your on-premises users and groups administrative permissions to your AWS Managed Microsoft AD directory.

Note: If you already have an existing AWS Managed Microsoft AD directory containing the original AWS delegated groups with global scope, AWS preserved the original AWS delegated groups in the event you are currently using them with identities in AWS Managed Microsoft AD. AWS recommends that you transition to use the new AWS delegated groups with domain local scope. All newly created AWS Managed Microsoft AD directories have the new AWS delegated groups with domain local scope only.

Now, I will show you the steps to delegate administrative permissions to your on-premises users and groups to configure fine-grained password policies in your AWS Managed Microsoft AD directory.


For this post, I assume you are familiar with AD security groups and how security group scope rules work. I also assume you are familiar with AD trusts.

The instructions in this blog post require you to have the following components running:

Solution overview

I will now show you how to manage which on-premises users have delegated permissions to administer your directory by efficiently using on-premises AD security groups to manage these permissions. I will do this by:

  1. Adding on-premises groups to an AWS delegated group. In this step, you sign in to management instance connected to AWS Managed Microsoft AD directory as admin user and add on-premises groups to AWS delegated groups.
  2. Administer your AWS Managed Microsoft AD directory as on-premises user. In this step, you sign in to a workstation connected to your on-premises AD using your on-premises credentials and administer your AWS Managed Microsoft AD directory.

For the purpose of this blog, I already have an on-premises AD directory (in this case, on-premises.com). I also created an AWS Managed Microsoft AD directory (in this case, corp.example.com) that I use with Amazon RDS for SQL Server. To enable Integrated Windows Authentication to my on-premises.com domain, I established a one-way outgoing trust from my AWS Managed Microsoft AD directory to my on-premises AD directory. To administer my AWS Managed Microsoft AD, I created an Amazon EC2 for Windows Server instance (in this case, Cloud Management). I also have an on-premises workstation (in this case, On-premises Management), that is connected to my on-premises AD directory.

The following diagram represents the relationships between the on-premises AD and the AWS Managed Microsoft AD directory.

The left side represents the AWS Cloud containing AWS Managed Microsoft AD directory. I connected the directory to the on-premises AD directory via a 1-way forest trust relationship. When AWS created my AWS Managed Microsoft AD directory, AWS created a group called AWS Delegated Fine Grained Password Policy Administrators that has permissions to configure fine-grained password policies in AWS Managed Microsoft AD.

The right side of the diagram represents the on-premises AD directory. I created a global AD security group called On-premises fine grained password policy admins and I configured it so all members can manage fine grained password policies in my on-premises AD. I have two administrators in my company, John and Richard, who I added as members of On-premises fine grained password policy admins. I want to enable John and Richard to also manage fine grained password policies in my AWS Managed Microsoft AD.

While I could add John and Richard to the AWS Delegated Fine Grained Password Policy Administrators individually, I want a more efficient way to delegate and remove permissions for on-premises users to manage fine grained password policies in my AWS Managed Microsoft AD. In fact, I want to assign permissions to the same people that manage password policies in my on-premises directory.

Diagram showing delegation of administrative permissions to on-premises users

To do this, I will:

  1. As admin user, add the On-premises fine grained password policy admins as member of the AWS Delegated Fine Grained Password Policy Administrators security group from my Cloud Management machine.
  2. Manage who can administer password policies in my AWS Managed Microsoft AD directory by adding and removing users as members of the On-premises fine grained password policy admins. Doing so enables me to perform all my delegation work in my on-premises directory without the need to use a remote desktop protocol (RDP) session to my Cloud Management instance. In this case, Richard, who is a member of On-premises fine grained password policy admins group can now administer AWS Managed Microsoft AD directory from On-premises Management workstation.

Although I’m showing a specific case using fine grained password policy delegation, you can do this with any of the new AWS delegated groups and your on-premises groups and users.

Let’s get started.

Step 1 – Add on-premises groups to AWS delegated groups

In this step, open an RDP session to the Cloud Management instance and sign in as the admin user in your AWS Managed Microsoft AD directory. Then, add your users and groups from your on-premises AD to AWS delegated groups in AWS Managed Microsoft AD directory. In this example, I do the following:

  1. Sign in to the Cloud Management instance with the user name admin and the password that you set for the admin user when you created your directory.
  2. Open the Microsoft Windows Server Manager and navigate to Tools > Active Directory Users and Computers.
  3. Switch to the tree view and navigate to corp.example.com > AWS Delegated Groups. Right-click AWS Delegated Fine Grained Password Policy Administrators and select Properties.
  4. In the AWS Delegated Fine Grained Password Policy window, switch to Members tab and choose Add.
  5. In the Select Users, Contacts, Computers, Service Accounts, or Groups window, choose Locations.
  6. In the Locations window, select on-premises.com domain and choose OK.
  7. In the Enter the object names to select box, enter on-premises fine grained password policy admins and choose Check Names.
  8. Because I have a 1-way trust from AWS Managed Microsoft AD to my on-premises AD, Windows prompts me to enter credentials for an on-premises user account that has permissions to complete the search. If I had a 2-way trust and the admin account in my AWS Managed Microsoft AD has permissions to read my on-premises directory, Windows will not prompt me.In the Windows Security window, enter the credentials for an account with permissions for on-premises.com and choose OK.
  9. Click OK to add On-premises fine grained password policy admins group as a member of the AWS Delegated Fine Grained Password Policy Administrators group in your AWS Managed Microsoft AD directory.

At this point, any user that is a member of On-premises fine grained password policy admins group has permissions to manage password policies in your AWS Managed Microsoft AD directory.

Step 2 – Administer your AWS Managed Microsoft AD as on-premises user

Any member of the on-premises group(s) that you added to an AWS delegated group inherited the permissions of the AWS delegated group.

In this example, Richard signs in to the On-premises Management instance. Because Richard inherited permissions from Delegated Fine Grained Password Policy Administrators, he can now administer fine grained password policies in the AWS Managed Microsoft AD directory using on-premises credentials.

  1. Sign in to the On-premises Management instance as Richard.
  2. Open the Microsoft Windows Server Manager and navigate to Tools > Active Directory Users and Computers.
  3. Switch to the tree view, right-click Active Directory Users and Computers, and then select Change Domain.
  4. In the Change Domain window, enter corp.example.com, and then choose OK.
  5. You’ll be connected to your AWS Managed Microsoft AD domain:

Richard can now administer the password policies. Because John is also a member of the AWS delegated group, John can also perform password policy administration the same way.

In future, if Richard moves to another division within the company and you hire Judy as a replacement for Richard, you can simply remove Richard from On-premises fine grained password policy admins group and add Judy to this group. Richard will no longer have administrative permissions, while Judy can now administer password policies for your AWS Managed Microsoft AD directory.


We’ve tried to make it easier for you to administer your AWS Managed Microsoft AD directory by creating AWS delegated groups with domain local scope. You can add your on-premises AD groups to the AWS delegated groups. You can then control who can administer your directory by managing group membership in your on-premises AD directory. Your administrators can sign in to their existing on-premises workstations using their on-premises credentials and administer your AWS Managed Microsoft AD directory. I encourage you to explore the new AWS delegated security groups by using Active Directory Users and Computers from the management instance for your AWS Managed Microsoft AD. To learn more about AWS Directory Service, see the AWS Directory Service home page. If you have questions, please post them on the Directory Service forum. If you have comments about this post, submit them in the “Comments” section below.


Best Practices for Running Apache Kafka on AWS

Post Syndicated from Prasad Alle original https://aws.amazon.com/blogs/big-data/best-practices-for-running-apache-kafka-on-aws/

This post was written in partnership with Intuit to share learnings, best practices, and recommendations for running an Apache Kafka cluster on AWS. Thanks to Vaishak Suresh and his colleagues at Intuit for their contribution and support.

Intuit, in their own words: Intuit, a leading enterprise customer for AWS, is a creator of business and financial management solutions. For more information on how Intuit partners with AWS, see our previous blog post, Real-time Stream Processing Using Apache Spark Streaming and Apache Kafka on AWS. Apache Kafka is an open-source, distributed streaming platform that enables you to build real-time streaming applications.

The best practices described in this post are based on our experience in running and operating large-scale Kafka clusters on AWS for more than two years. Our intent for this post is to help AWS customers who are currently running Kafka on AWS, and also customers who are considering migrating on-premises Kafka deployments to AWS.

AWS offers Amazon Kinesis Data Streams, a Kafka alternative that is fully managed.

Running your Kafka deployment on Amazon EC2 provides a high performance, scalable solution for ingesting streaming data. AWS offers many different instance types and storage option combinations for Kafka deployments. However, given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case.

In this blog post, we cover the following aspects of running Kafka clusters on AWS:

  • Deployment considerations and patterns
  • Storage options
  • Instance types
  • Networking
  • Upgrades
  • Performance tuning
  • Monitoring
  • Security
  • Backup and restore

Note: While implementing Kafka clusters in a production environment, make sure also to consider factors like your number of messages, message size, monitoring, failure handling, and any operational issues.

Deployment considerations and patterns

In this section, we discuss various deployment options available for Kafka on AWS, along with pros and cons of each option. A successful deployment starts with thoughtful consideration of these options. Considering availability, consistency, and operational overhead of the deployment helps when choosing the right option.

Single AWS Region, Three Availability Zones, All Active

One typical deployment pattern (all active) is in a single AWS Region with three Availability Zones (AZs). One Kafka cluster is deployed in each AZ along with Apache ZooKeeper and Kafka producer and consumer instances as shown in the illustration following.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers and Kafka cluster are deployed on each AZ.
  • Data is distributed evenly across three Kafka clusters by using Elastic Load Balancer.
  • Kafka consumers aggregate data from all three Kafka clusters.

Kafka cluster failover occurs this way:

  • Mark down all Kafka producers
  • Stop consumers
  • Debug and restack Kafka
  • Restart consumers
  • Restart Kafka producers

Following are the pros and cons of this pattern.

Pros Cons
  • Highly available
  • Can sustain the failure of two AZs
  • No message loss during failover
  • Simple deployment


  • Very high operational overhead:
    • All changes need to be deployed three times, one for each Kafka cluster
    • Maintaining and monitoring three Kafka clusters
    • Maintaining and monitoring three consumer clusters

A restart is required for patching and upgrading brokers in a Kafka cluster. In this approach, a rolling upgrade is done separately for each cluster.

Single Region, Three Availability Zones, Active-Standby

Another typical deployment pattern (active-standby) is in a single AWS Region with a single Kafka cluster and Kafka brokers and Zookeepers distributed across three AZs. Another similar Kafka cluster acts as a standby as shown in the illustration following. You can use Kafka mirroring with MirrorMaker to replicate messages between any two clusters.

In this pattern, this is the Kafka cluster deployment:

  • Kafka producers are deployed on all three AZs.
  • Only one Kafka cluster is deployed across three AZs (active).
  • ZooKeeper instances are deployed on each AZ.
  • Brokers are spread evenly across all three AZs.
  • Kafka consumers can be deployed across all three AZs.
  • Standby Kafka producers and a Multi-AZ Kafka cluster are part of the deployment.

Kafka cluster failover occurs this way:

  • Switch traffic to standby Kafka producers cluster and Kafka cluster.
  • Restart consumers to consume from standby Kafka cluster.

Following are the pros and cons of this pattern.

Pros Cons
  • Less operational overhead when compared to the first option
  • Only one Kafka cluster to manage and consume data from
  • Can handle single AZ failures without activating a standby Kafka cluster
  • Added latency due to cross-AZ data transfer among Kafka brokers
  • For Kafka versions before 0.10, replicas for topic partitions have to be assigned so they’re distributed to the brokers on different AZs (rack-awareness)
  • The cluster can become unavailable in case of a network glitch, where ZooKeeper does not see Kafka brokers
  • Possibility of in-transit message loss during failover

Intuit recommends using a single Kafka cluster in one AWS Region, with brokers distributing across three AZs (single region, three AZs). This approach offers stronger fault tolerance than otherwise, because a failed AZ won’t cause Kafka downtime.

Storage options

There are two storage options for file storage in Amazon EC2:

Ephemeral storage is local to the Amazon EC2 instance. It can provide high IOPS based on the instance type. On the other hand, Amazon EBS volumes offer higher resiliency and you can configure IOPS based on your storage needs. EBS volumes also offer some distinct advantages in terms of recovery time. Your choice of storage is closely related to the type of workload supported by your Kafka cluster.

Kafka provides built-in fault tolerance by replicating data partitions across a configurable number of instances. If a broker fails, you can recover it by fetching all the data from other brokers in the cluster that host the other replicas. Depending on the size of the data transfer, it can affect recovery process and network traffic. These in turn eventually affect the cluster’s performance.

The following table contrasts the benefits of using an instance store versus using EBS for storage.

Instance store EBS
  • Instance storage is recommended for large- and medium-sized Kafka clusters. For a large cluster, read/write traffic is distributed across a high number of brokers, so the loss of a broker has less of an impact. However, for smaller clusters, a quick recovery for the failed node is important, but a failed broker takes longer and requires more network traffic for a smaller Kafka cluster.
  • Storage-optimized instances like h1, i3, and d2 are an ideal choice for distributed applications like Kafka.


  • The primary advantage of using EBS in a Kafka deployment is that it significantly reduces data-transfer traffic when a broker fails or must be replaced. The replacement broker joins the cluster much faster.
  • Data stored on EBS is persisted in case of an instance failure or termination. The broker’s data stored on an EBS volume remains intact, and you can mount the EBS volume to a new EC2 instance. Most of the replicated data for the replacement broker is already available in the EBS volume and need not be copied over the network from another broker. Only the changes made after the original broker failure need to be transferred across the network. That makes this process much faster.



Intuit chose EBS because of their frequent instance restacking requirements and also other benefits provided by EBS.

Generally, Kafka deployments use a replication factor of three. EBS offers replication within their service, so Intuit chose a replication factor of two instead of three.

Instance types

The choice of instance types is generally driven by the type of storage required for your streaming applications on a Kafka cluster. If your application requires ephemeral storage, h1, i3, and d2 instances are your best option.

Intuit used r3.xlarge instances for their brokers and r3.large for ZooKeeper, with ST1 (throughput optimized HDD) EBS for their Kafka cluster.

Here are sample benchmark numbers from Intuit tests.

Configuration Broker bytes (MB/s)
  • r3.xlarge
  • ST1 EBS
  • 12 brokers
  • 12 partitions


Aggregate 346.9

If you need EBS storage, then AWS has a newer-generation r4 instance. The r4 instance is superior to R3 in many ways:

  • It has a faster processor (Broadwell).
  • EBS is optimized by default.
  • It features networking based on Elastic Network Adapter (ENA), with up to 10 Gbps on smaller sizes.
  • It costs 20 percent less than R3.

Note: It’s always best practice to check for the latest changes in instance types.


The network plays a very important role in a distributed system like Kafka. A fast and reliable network ensures that nodes can communicate with each other easily. The available network throughput controls the maximum amount of traffic that Kafka can handle. Network throughput, combined with disk storage, is often the governing factor for cluster sizing.

If you expect your cluster to receive high read/write traffic, select an instance type that offers 10-Gb/s performance.

In addition, choose an option that keeps interbroker network traffic on the private subnet, because this approach allows clients to connect to the brokers. Communication between brokers and clients uses the same network interface and port. For more details, see the documentation about IP addressing for EC2 instances.

If you are deploying in more than one AWS Region, you can connect the two VPCs in the two AWS Regions using cross-region VPC peering. However, be aware of the networking costs associated with cross-AZ deployments.


Kafka has a history of not being backward compatible, but its support of backward compatibility is getting better. During a Kafka upgrade, you should keep your producer and consumer clients on a version equal to or lower than the version you are upgrading from. After the upgrade is finished, you can start using a new protocol version and any new features it supports. There are three upgrade approaches available, discussed following.

Rolling or in-place upgrade

In a rolling or in-place upgrade scenario, upgrade one Kafka broker at a time. Take into consideration the recommendations for doing rolling restarts to avoid downtime for end users.

Downtime upgrade

If you can afford the downtime, you can take your entire cluster down, upgrade each Kafka broker, and then restart the cluster.

Blue/green upgrade

Intuit followed the blue/green deployment model for their workloads, as described following.

If you can afford to create a separate Kafka cluster and upgrade it, we highly recommend the blue/green upgrade scenario. In this scenario, we recommend that you keep your clusters up-to-date with the latest Kafka version. For additional details on Kafka version upgrades or more details, see the Kafka upgrade documentation.

The following illustration shows a blue/green upgrade.

In this scenario, the upgrade plan works like this:

  • Create a new Kafka cluster on AWS.
  • Create a new Kafka producers stack to point to the new Kafka cluster.
  • Create topics on the new Kafka cluster.
  • Test the green deployment end to end (sanity check).
  • Using Amazon Route 53, change the new Kafka producers stack on AWS to point to the new green Kafka environment that you have created.

The roll-back plan works like this:

  • Switch Amazon Route 53 to the old Kafka producers stack on AWS to point to the old Kafka environment.

For additional details on blue/green deployment architecture using Kafka, see the re:Invent presentation Leveraging the Cloud with a Blue-Green Deployment Architecture.

Performance tuning

You can tune Kafka performance in multiple dimensions. Following are some best practices for performance tuning.

 These are some general performance tuning techniques:

  • If throughput is less than network capacity, try the following:
    • Add more threads
    • Increase batch size
    • Add more producer instances
    • Add more partitions
  • To improve latency when acks =-1, increase your num.replica.fetches value.
  • For cross-AZ data transfer, tune your buffer settings for sockets and for OS TCP.
  • Make sure that num.io.threads is greater than the number of disks dedicated for Kafka.
  • Adjust num.network.threads based on the number of producers plus the number of consumers plus the replication factor.
  • Your message size affects your network bandwidth. To get higher performance from a Kafka cluster, select an instance type that offers 10 Gb/s performance.

For Java and JVM tuning, try the following:

  • Minimize GC pauses by using the Oracle JDK, which uses the new G1 garbage-first collector.
  • Try to keep the Kafka heap size below 4 GB.


Knowing whether a Kafka cluster is working correctly in a production environment is critical. Sometimes, just knowing that the cluster is up is enough, but Kafka applications have many moving parts to monitor. In fact, it can easily become confusing to understand what’s important to watch and what you can set aside. Items to monitor range from simple metrics about the overall rate of traffic, to producers, consumers, brokers, controller, ZooKeeper, topics, partitions, messages, and so on.

For monitoring, Intuit used several tools, including Newrelec, Wavefront, Amazon CloudWatch, and AWS CloudTrail. Our recommended monitoring approach follows.

For system metrics, we recommend that you monitor:

  • CPU load
  • Network metrics
  • File handle usage
  • Disk space
  • Disk I/O performance
  • Garbage collection
  • ZooKeeper

For producers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg

For consumers, we recommend that you monitor:

  • Batch-size-avg
  • Compression-rate-avg
  • Waiting-threads
  • Buffer-available-bytes
  • Record-queue-time-max
  • Record-send-rate
  • Records-per-request-avg


Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved. Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper. The following tells you more about the Intuit approach. For details on Kafka security not covered in this section, see the Kafka documentation.

Encryption at rest

For EBS-backed EC2 instances, you can enable encryption at rest by using Amazon EBS volumes with encryption enabled. Amazon EBS uses AWS Key Management Service (AWS KMS) for encryption. For more details, see Amazon EBS Encryption in the EBS documentation. For instance store–backed EC2 instances, you can enable encryption at rest by using Amazon EC2 instance store encryption.

Encryption in transit

Kafka uses TLS for client and internode communications.


Authentication of connections to brokers from clients (producers and consumers) to other brokers and tools uses either Secure Sockets Layer (SSL) or Simple Authentication and Security Layer (SASL).

Kafka supports Kerberos authentication. If you already have a Kerberos server, you can add Kafka to your current configuration.


In Kafka, authorization is pluggable and integration with external authorization services is supported.

Backup and restore

The type of storage used in your deployment dictates your backup and restore strategy.

The best way to back up a Kafka cluster based on instance storage is to set up a second cluster and replicate messages using MirrorMaker. Kafka’s mirroring feature makes it possible to maintain a replica of an existing Kafka cluster. Depending on your setup and requirements, your backup cluster might be in the same AWS Region as your main cluster or in a different one.

For EBS-based deployments, you can enable automatic snapshots of EBS volumes to back up volumes. You can easily create new EBS volumes from these snapshots to restore. We recommend storing backup files in Amazon S3.

For more information on how to back up in Kafka, see the Kafka documentation.


In this post, we discussed several patterns for running Kafka in the AWS Cloud. AWS also provides an alternative managed solution with Amazon Kinesis Data Streams, there are no servers to manage or scaling cliffs to worry about, you can scale the size of your streaming pipeline in seconds without downtime, data replication across availability zones is automatic, you benefit from security out of the box, Kinesis Data Streams is tightly integrated with a wide variety of AWS services like Lambda, Redshift, Elasticsearch and it supports open source frameworks like Storm, Spark, Flink, and more. You may refer to kafka-kinesis connector.

If you have questions or suggestions, please comment below.

Additional Reading

If you found this post useful, be sure to check out Implement Serverless Log Analytics Using Amazon Kinesis Analytics and Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics.

About the Author

Prasad Alle is a Senior Big Data Consultant with AWS Professional Services. He spends his time leading and building scalable, reliable Big data, Machine learning, Artificial Intelligence and IoT solutions for AWS Enterprise and Strategic customers. His interests extend to various technologies such as Advanced Edge Computing, Machine learning at Edge. In his spare time, he enjoys spending time with his family.