Security updates have been issued by CentOS (procps, xmlrpc, and xmlrpc3), Debian (batik, prosody, redmine, wireshark, and zookeeper), Fedora (jasper, kernel, poppler, and xmlrpc), Mageia (git and wireshark), Red Hat (rh-java-common-xmlrpc), Slackware (git), SUSE (bzr, dpdk-thunderxdpdk, and ocaml), and Ubuntu (exempi).
Security updates have been issued by Debian (kernel, procps, and tiff), Fedora (ca-certificates, chromium, and git), Mageia (kernel, kernel-linus, kernel-tmb, and libvirt), openSUSE (chromium and xen), Oracle (procps, xmlrpc, and xmlrpc3), Red Hat (xmlrpc and xmlrpc3), Scientific Linux (procps, xmlrpc, and xmlrpc3), SUSE (HA kernel modules and kernel), and Ubuntu (libytnef and python-oslo.middleware).
Today I’m excited to announce built-in authentication support in Application Load Balancers (ALB). ALB can now securely authenticate users as they access applications, letting developers eliminate the code they have to write to support authentication and offload the responsibility of authentication from the backend. The team built a great live example where you can try out the authentication functionality.
Identity-based security is a crucial component of modern applications, and as customers continue to move mission-critical applications into the cloud, developers are asked to write the same authentication code again and again. Enterprises want to use their on-premises identities with their cloud applications. Web developers want to use federated identities from social networks to allow their users to sign in. ALB’s new authentication action provides authentication through social Identity Providers (IdPs) like Google, Facebook, and Amazon through Amazon Cognito. It also natively integrates with any OpenID Connect protocol compliant IdP, providing secure authentication and a single sign-on experience across your applications.
How Does ALB Authentication Work?
Authentication is a complicated topic, and our readers may have differing levels of expertise with it. I want to cover a few key concepts to make sure we’re all on the same page. If you’re already an authentication expert and you just want to see how ALB authentication works, feel free to skip to the next section!
Authentication verifies identity.
Authorization verifies permissions, the things an identity is allowed to do.
OpenID Connect (OIDC) is a simple identity, or authentication, layer built on top of the OAuth 2.0 protocol. The OIDC specification document is pretty well written and worth a casual read.
Identity Providers (IdPs) manage identity information and provide authentication services. ALB supports any OIDC compliant IdP and you can use a service like Amazon Cognito or Auth0 to aggregate different identities from various IdPs like Active Directory, LDAP, Google, Facebook, Amazon, or others deployed in AWS or on premises.
When we get away from the terminology for a bit, all of this boils down to figuring out who a user is and what they’re allowed to do. Doing this securely and efficiently is hard. Traditionally, enterprises have used a protocol called SAML with their IdPs to provide a single sign-on (SSO) experience for their internal users. SAML is XML heavy, and modern applications have started using OIDC, which uses JSON to share claims. Developers can use SAML in ALB with Amazon Cognito’s SAML support. Web app or mobile developers typically use federated identities via social IdPs like Facebook, Amazon, or Google, which, conveniently, are also supported by Amazon Cognito.
ALB authentication works by defining an authentication action in a listener rule. The ALB’s authentication action checks whether a session cookie exists on incoming requests and whether it is valid. If the session cookie is set and valid, the ALB routes the request to the target group with the X-AMZN-OIDC-* headers set. The headers contain identity information in JSON Web Token (JWT) format that a backend can use to identify a user. If the session cookie is missing or invalid, the ALB follows the OIDC protocol and issues an HTTP 302 redirect to the identity provider. The protocol is a lot to unpack and is covered more thoroughly in the documentation for those curious.
ALB Authentication Walkthrough
I have a simple Python Flask app in an Amazon ECS cluster running on AWS Fargate. The containers are in a target group that an ALB routes to. I want to make sure users of my application are logged in before accessing the authenticated portions of my application. First, I’ll navigate to the ALB in the console and edit the rules.
I want to make sure all access to /account* endpoints is authenticated, so I’ll add a new rule with a condition to match those endpoints.
Now, I’ll add a new rule and create an Authenticate action in that rule.
I’ll have ALB create a new Amazon Cognito user pool for me by providing some configuration details.
After creating the Amazon Cognito pool, I can make some additional configuration changes in the advanced settings.
I can change the default cookie name, adjust the timeout, adjust the scope, and choose the action for unauthenticated requests.
I can pick Deny to serve a 401 for all unauthenticated requests, or I can pick Allow, which passes unauthenticated requests through to the application. This is useful for Single Page Apps (SPAs). For now, I’ll choose Authenticate, which prompts the IdP, in this case Amazon Cognito, to authenticate the user and reload the existing page.
Now I’ll add a forwarding action for my target group and save the rule.
Over on the Facebook side I just need to add my Amazon Cognito User Pool Domain to the whitelisted OAuth redirect URLs.
I would follow similar steps for other authentication providers.
Now, when I navigate to an authenticated page my Fargate containers receive the originating request with the X-Amzn-Oidc-* headers set by ALB. Using the information in those headers (claims-data, identity, access-token) my application can implement authorization.
All of this was possible without having to write a single line of code to deal with each of the IdPs. However, it’s still important for the implementing applications to verify the signature on the JWT header to ensure the request hasn’t been tampered with.
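As a rough illustration of that verification step, here is a minimal Python sketch using the PyJWT and requests libraries. The regional public-key endpoint and the ES256 algorithm reflect ALB's documented behavior, but treat the exact URL format as an assumption to confirm against the current documentation:

# Minimal sketch: validating the ALB-signed x-amzn-oidc-data header in a Python backend.
# Assumes the PyJWT (with crypto extras) and requests libraries.
import jwt        # pip install pyjwt[crypto]
import requests

REGION = "us-east-1"  # placeholder; use the Region your ALB runs in

def decode_alb_claims(encoded_jwt: str) -> dict:
    # The JWT header carries the key ID (kid) of the key ALB used to sign the token
    kid = jwt.get_unverified_header(encoded_jwt)["kid"]
    # ALB publishes its public keys (PEM) at a regional endpoint, keyed by kid
    public_key = requests.get(
        "https://public-keys.auth.elb.{}.amazonaws.com/{}".format(REGION, kid),
        timeout=5,
    ).text
    # Verify the signature and return the identity claims (sub, email, and so on)
    return jwt.decode(encoded_jwt, public_key, algorithms=["ES256"])

# Example use inside a Flask view:
#   claims = decode_alb_claims(request.headers["x-amzn-oidc-data"])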
Additional Resources
Of course, everything we’ve seen today is also available in the API and AWS Command Line Interface (CLI). You can find additional information on the feature in the documentation. This feature is provided at no additional charge.
With authentication built in to ALB, developers can focus on building their applications instead of rebuilding authentication for every application, all while maintaining the scale, availability, and reliability of ALB. I think this feature is a pretty big deal, and I can’t wait to see what customers build with it. Let us know what you think of this feature in the comments or on Twitter!
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
RateCodeID – Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount – Represents the time-and-distance fare calculated by the meter.
TripDistance – Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Functions state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda function.
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
tripdata.csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. At this point, you have two different datasets, both aggregated by rate_code. Finally, the job uses the rate_code field to join the two datasets and outputs the entire rate code status to a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from the first Spark job; represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
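To make the two jobs' logic concrete, here is a rough, illustrative PySpark sketch of the same transformations. It is not the code packaged in spark-taxi.jar, and the input column names are taken from the table above (the actual file's headers may differ):

# Illustrative PySpark sketch of the two jobs; not the code in spark-taxi.jar.
import sys
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def main(root_path):
    spark = SparkSession.builder.appName("taxi-etl-sketch").getOrCreate()
    trips = (spark.read.option("header", "true").option("inferSchema", "true")
             .csv(root_path + "/emr-step-functions/input/tripdata.csv"))

    # Job 1 (MilesPerRateCode): total trip distance per rate code, written as Parquet
    miles_per_rate = (trips.groupBy(F.col("RateCodeID").alias("rate_code"))
                      .agg(F.sum("TripDistance").alias("total_distance")))
    miles_per_rate.write.parquet(root_path + "/emr-step-functions/miles-per-rate")

    # Job 2 (RateCodeStatus): total fare per rate code, joined with Job 1's output
    fares = (trips.groupBy(F.col("RateCodeID").alias("rate_code"))
             .agg(F.sum("FareAmount").alias("total_fare_amount")))
    (miles_per_rate.join(fares, "rate_code")
     .withColumnRenamed("rate_code", "rate_code_id")
     .coalesce(1)
     .write.option("header", "true")
     .csv(root_path + "/emr-step-functions/rate-code-status"))

    spark.stop()

if __name__ == "__main__":
    main(sys.argv[1])  # rootPath, for example s3://<your-bucket>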
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples of how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, just change localhost to the EMR hostname, as in the following (port 8998 must be open to that remote instance through the security group):
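curl http://<EMR-master-public-DNS>:8998/sessions

Here, <EMR-master-public-DNS> is a placeholder for your cluster's master public DNS name.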
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as shown following:
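For example, an illustrative batch submission; the bucket, jar path, class name, and hostname below are placeholders:

curl -s -X POST -H "Content-Type: application/json" \
  -d '{"file":"s3://<your-bucket>/emr-step-functions/spark-taxi.jar","className":"MilesPerRateCode","args":["s3://<your-bucket>"]}' \
  http://<EMR-master-public-DNS>:8998/batches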
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors, so your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a state machine using different types of states to build the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state. So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRate Job, which simply submits a Spark job:
"MilesPerRate Job": {
"Type": "Task",
"Resource":"arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
"ResultPath": "$.jobId",
"Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state.
The Lambda function that submits the MilesPerRateCode job uses the Python requests library to submit a POST request against the Livy endpoint hosted on Amazon EMR, passing the required parameters in JSON format through the payload. It then parses the response, grabs the id from it, and returns it. The Next field in the Task state tells the state machine which state to go to next.
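A minimal sketch of such a submit function; the environment variable name, the class name, and the overall structure are assumptions rather than the post's original code:

# Minimal sketch of the job-submission Lambda function (names are assumptions).
import json
import os
import requests

# Placeholder: the Livy endpoint, for example http://<EMR-master-private-DNS>:8998
LIVY_ENDPOINT = os.environ["LIVY_ENDPOINT"]

def lambda_handler(event, context):
    # Build the Livy batch request from the rootPath passed into the state machine
    payload = {
        "file": event["rootPath"] + "/emr-step-functions/spark-taxi.jar",
        "className": "MilesPerRateCode",   # illustrative class name
        "args": [event["rootPath"]],
    }
    response = requests.post(
        LIVY_ENDPOINT + "/batches",
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
        timeout=30,
    )
    response.raise_for_status()
    # Return the Livy batch (session) id; ResultPath $.jobId stores it for later states
    return response.json()["id"]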
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Here is the Task state in the state machine that checks the Spark job status:
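(A sketch following the pattern of the MilesPerRate Job state above; the state name, Lambda function name, and Next target are assumptions.)

"Query MilesPerRate job status": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:us-east-1:xxxxxx:function:blog-spark-job-status-function",
  "ResultPath": "$.jobStatus",
  "Next": "Is MilesPerRate job finished?"
}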
Just like other states, the preceding Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. That Lambda function checks the Spark job status based on a given session ID.
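A minimal sketch of such a status-check function, using Livy's GET /batches/{id}/state call; the environment variable name is an assumption:

# Minimal sketch of the status-check Lambda function.
import os
import requests

# Placeholder: the Livy endpoint, for example http://<EMR-master-private-DNS>:8998
LIVY_ENDPOINT = os.environ["LIVY_ENDPOINT"]

def lambda_handler(event, context):
    job_id = event["jobId"]  # stored by the submit state via ResultPath
    response = requests.get(
        "{}/batches/{}/state".format(LIVY_ENDPOINT, job_id), timeout=30
    )
    response.raise_for_status()
    # Livy reports states such as "starting", "running", "success", or "dead"
    return response.json()["state"]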
The Choice state checks the Spark job status value, compares it with a predefined status, and transitions based on the result. For example, if the status is success, it moves to the next state (the RateCodeStatus Job state), and if the status is dead, it moves to the MilesPerRate job failed state.
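An illustrative Choice state that implements this rule; the state names are assumptions chosen to match the sketches above:

"Is MilesPerRate job finished?": {
  "Type": "Choice",
  "Choices": [
    {
      "Variable": "$.jobStatus",
      "StringEquals": "success",
      "Next": "RateCodeStatus Job"
    },
    {
      "Variable": "$.jobStatus",
      "StringEquals": "dead",
      "Next": "MilesPerRate job failed"
    }
  ],
  "Default": "Wait for MilesPerRate job to complete"
}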
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters during initiation.
ClusterSubnetID – The subnet where the Amazon EMR cluster is deployed; Lambda is configured to talk to this subnet.
KeyName – The name of the existing EC2 key pair used to access the Amazon EMR cluster.
VPCID – The ID of the virtual private cloud (VPC) where the EMR cluster is deployed; Lambda is configured to talk to this VPC.
S3RootPath – The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. If you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following list of AWS resources.
StepFunctionsStateExecutionRole (IAM role) – IAM role to execute the state machine, with a trust relationship with the Step Functions (states) service.
SparkETLStateMachine (AWS Step Functions state machine) – State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup (Amazon EC2 security group) – Security group that the Lambda functions use to call the Livy API.
RateCodeStatusJobSubmitFunction (AWS Lambda function) – Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction (AWS Lambda function) – Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction (AWS Lambda function) – Lambda function to check the Spark job status.
LambdaStateMachineRole (IAM role) – IAM role for all Lambda functions, with a trust relationship with the Lambda service.
EMRCluster (Amazon EMR cluster) – EMR cluster where Livy is running and where the jobs are submitted.
During deployment, AWS CloudFormation sets up the S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, and spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in my AWS account, s3://tm-app-demos, as the S3 root path.
If the CloudFormation template completes successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
"rootPath": "s3://tm-app-demos"
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to the <<rootPath>>/emr-step-functions/rate-code-status/ path. It also transitions to the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
The following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine for the second time, it fails because the S3 output path already exists. The state machine turns red and stops at the MilesPerRate job failed state. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assign that Lambda function to a Fail state. That way, when any part of your state machine fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
As a serverless computing platform that supports Java 8 runtime, AWS Lambda makes it easy to run any type of Java function simply by uploading a JAR file. To help define not only a Lambda serverless application but also Amazon API Gateway, Amazon DynamoDB, and other related services, the AWS Serverless Application Model (SAM) allows developers to use a simple AWS CloudFormation template.
AWS provides the AWS Toolkit for Eclipse, which supports both Lambda and SAM. AWS also gives customers an easy way to create Lambda functions and SAM applications in Java using the AWS Command Line Interface (AWS CLI). After you build a JAR file, all you have to do is run the aws cloudformation package and aws cloudformation deploy commands.
To consolidate these steps, customers can use Maven archetypes. An archetype is a predefined project template that makes getting started on a new function exceptionally simple.
In this post, I introduce a Maven archetype that allows you to create a skeleton of an AWS SAM project for a Java function. Using this archetype, you can generate a sample Java code example and an accompanying SAM template, and deploy it on AWS Lambda with a single Maven command.
Prerequisites
Make sure that the following software is installed on your workstation:
Java
Maven
AWS CLI
(Optional) AWS SAM CLI
Install Archetype
After you’ve set up those packages, install the archetype with the following commands:
git clone https://github.com/awslabs/aws-serverless-java-archetype
cd aws-serverless-java-archetype
mvn install
These are one-time operations, so you don’t have to run them for every new package. If you’d like, you can add the archetype to your company’s Maven repository so that other developers can use it later.
With those packages installed, you’re ready to develop your new Lambda function.
Start a project
Now that you have the archetype installed, generate a new project from it:
cd /path/to/project_home
mvn archetype:generate \
-DarchetypeGroupId=com.amazonaws.serverless.archetypes \
-DarchetypeArtifactId=aws-serverless-java-archetype \
-DarchetypeVersion=1.0.0 \
-DarchetypeRepository=local \ # Forcing to use local maven repository
-DinteractiveMode=false \ # For batch mode
# You can also specify properties below interactively if you omit the line for batch mode
-DgroupId=YOUR_GROUP_ID \
-DartifactId=YOUR_ARTIFACT_ID \
-Dversion=YOUR_VERSION \
-DclassName=YOUR_CLASSNAME
You should have a directory called YOUR_ARTIFACT_ID that contains the files and folders shown below:
The sample code is a working example. If you have installed the SAM CLI, you can invoke the function locally with the following command:
cd YOUR_ARTIFACT_ID
mvn -P invoke verify
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- maven-jar-plugin:3.0.2:jar (default-jar) @ foo ---
[INFO] Building jar: /private/tmp/foo/target/foo-1.0.jar
[INFO]
[INFO] --- maven-shade-plugin:3.1.0:shade (shade) @ foo ---
[INFO] Including com.amazonaws:aws-lambda-java-core:jar:1.2.0 in the shaded jar.
[INFO] Replacing /private/tmp/foo/target/lambda.jar with /private/tmp/foo/target/foo-1.0-shaded.jar
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-local-invoke) @ foo ---
2018/04/06 16:34:35 Successfully parsed template.yaml
2018/04/06 16:34:35 Connected to Docker 1.37
2018/04/06 16:34:35 Fetching lambci/lambda:java8 image for java8 runtime...
java8: Pulling from lambci/lambda
Digest: sha256:14df0a5914d000e15753d739612a506ddb8fa89eaa28dcceff5497d9df2cf7aa
Status: Image is up to date for lambci/lambda:java8
2018/04/06 16:34:37 Invoking Package.Example::handleRequest (java8)
2018/04/06 16:34:37 Decompressing /tmp/foo/target/lambda.jar
2018/04/06 16:34:37 Mounting /private/var/folders/x5/ldp7c38545v9x5dg_zmkr5kxmpdprx/T/aws-sam-local-1523000077594231063 as /var/task:ro inside runtime container
START RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74 Version: $LATEST
Log output: Greeting is 'Hello Tim Wagner.'
END RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74
REPORT RequestId: a6ae19fe-b1b0-41e2-80bc-68a40d094d74 Duration: 96.60 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 7 MB
{"greetings":"Hello Tim Wagner."}
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 10.452 s
[INFO] Finished at: 2018-04-06T16:34:40+09:00
[INFO] ------------------------------------------------------------------------
This Maven goal invokes sam local invoke -e event.json, so you can see the sample output greeting Tim Wagner.
To deploy this application to AWS, you need an Amazon S3 bucket to upload your package. You can use the following command to create a bucket if you want:
aws s3 mb s3://YOUR_BUCKET --region YOUR_REGION
Now, you can deploy your application with just one command:
mvn deploy \
-DawsRegion=YOUR_REGION \
-Ds3Bucket=YOUR_BUCKET \
-DstackName=YOUR_STACK
[INFO] Scanning for projects...
[INFO]
[INFO] ---------------------------< com.riywo:foo >----------------------------
[INFO] Building foo 1.0
[INFO] --------------------------------[ jar ]---------------------------------
...
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-package) @ foo ---
Uploading to aws-serverless-java/com.riywo:foo:1.0/924732f1f8e4705c87e26ef77b080b47 11657 / 11657.0 (100.00%)
Successfully packaged artifacts and wrote output template to file target/sam.yaml.
Execute the following command to deploy the packaged template
aws cloudformation deploy --template-file /private/tmp/foo/target/sam.yaml --stack-name <YOUR STACK NAME>
[INFO]
[INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @ foo ---
[INFO] Skipping artifact deployment
[INFO]
[INFO] --- exec-maven-plugin:1.6.0:exec (sam-deploy) @ foo ---
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - archetype
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 37.176 s
[INFO] Finished at: 2018-04-06T16:41:02+09:00
[INFO] ------------------------------------------------------------------------
Maven automatically creates a shaded JAR file, uploads it to your S3 bucket, packages template.yaml into target/sam.yaml, and creates or updates the CloudFormation stack.
To customize the process, modify the pom.xml file. For example, to avoid typing values for awsRegion, s3Bucket, or stackName, write them inside pom.xml and check it in to your VCS. Afterward, you and the rest of your team can deploy the function by typing just the following command:
mvn deploy
Options
The Lambda Java 8 runtime supports several handler types: POJO, simple type, and stream. The default option for this archetype is the POJO style, which requires request and response classes; the archetype generates them for you by default. If you want to use another handler type, set the handlerType property as shown below:
## POJO type (default)
mvn archetype:generate \
...
-DhandlerType=pojo
## Simple type - String
mvn archetype:generate \
...
-DhandlerType=simple
## Stream type
mvn archetype:generate \
...
-DhandlerType=stream
Also, the Lambda Java 8 runtime supports two logging options: Log4j 2 and LambdaLogger. This archetype generates a LambdaLogger implementation by default, but you can use Log4j 2 if you want.
If you use LambdaLogger, you can delete ./src/main/resources/log4j2.xml. See the documentation for more details.
Conclusion
So, what’s next? Develop your Lambda function locally and type mvn deploy!
With this archetype code example, available in the GitHub repo, you should be able to deploy Lambda functions for Java 8 in a snap. If you have any questions or comments, please submit them below or leave them on GitHub.
This post courtesy of Massimiliano Angelino, AWS Solutions Architect
Different enterprise systems—ERP, CRM, BI, HR, etc.—need to exchange information but normally cannot do that natively because they are from different vendors. Enterprises have tried multiple ways to integrate heterogeneous systems, generally referred to as enterprise application integration (EAI).
Modern EAI systems are based on a message-oriented middleware (MoM), also known as enterprise service bus (ESB). An ESB provides data communication via a message bus, on top of which it also provides components to orchestrate, route, translate, and monitor the data exchange. Communication with the ESB is done via adapters or connectors provided by the ESB. In this way, the different applications do not have to have specific knowledge of the technology used to provide the integration.
Amazon MQ used with Apache Camel is an open-source alternative to commercial ESBs. With the launch of Amazon MQ, integration between on-premises applications and cloud services becomes much simpler. Amazon MQ provides a managed message broker service currently supporting Apache ActiveMQ 5.15.0.
In this post, I show how a simple integration between Amazon MQ and other AWS services can be achieved by using Apache Camel.
Apache Camel provides built-in connectors for integration with a wide variety of AWS services such as Amazon MQ, Amazon SQS, Amazon SNS, Amazon SWF, Amazon S3, AWS Lambda, Amazon DynamoDB, AWS Elastic Beanstalk, and Amazon Kinesis Streams. It also provides a broad range of other connectors including Cassandra, JDBC, Spark, and even Facebook and Slack.
EAI system architecture
Different applications use different data formats, hence the need for a translation/transformation service. Such services can be provided to or from a common “normalized” format, or specifically between two applications.
The use of normalized formats simplifies the integration process when multiple applications need to share the same data, because the number of conversions needed is N (one per application) rather than one per pair of applications. This comes at the cost of a more complex adaptation to a common format, which must cover the needs of all the different applications, current and future.
Another characteristic of an EAI system is the support of distributed transactions to ensure data consistency across multiple applications.
EAI system architecture is normally composed of the following components:
A centralized broker that handles security, access control, and data communications. Amazon MQ provides these features through the support of multiple transport protocols (AMQP, OpenWire, MQTT, WebSocket), security (all communications are encrypted via SSL), and granular access control per destination.
An independent data model, also known as the canonical data model. XML is the de facto standard for the data representation.
Connectors/agents that allow the applications to communicate with the broker.
A system model to allow a standardized way for all components to interface with the EAI. Java Message Service (JMS) and Windows Communication Foundation (WCF) are standard APIs to interact with constructs such as queues and topics to implement the different messaging patterns.
Walkthrough
This solution walks you through the following steps:
Creating the broker
Writing a simple application
Adding the dependencies
Triaging files into S3
Writing the Camel route
Sending files to the AMQP queue
Setting up AMQP
Testing the code
Creating the broker
To create a new broker, log in to your AWS account and choose Amazon MQ. Amazon MQ is currently available in six AWS Regions:
US East (N. Virginia)
US East (Ohio)
US West (Oregon)
EU (Ireland)
EU (Frankfurt)
Asia Pacific (Sydney)
Make sure that you have selected one of these Regions.
The master user name and password are used to access the monitoring console of the broker and can be also used to authenticate when connecting the clients to the broker. I recommend creating separate users, without console access, to authenticate the clients to the broker, after the broker has been created.
For this example, create a single broker without failover. If your application requires a higher availability level, check the Create standby in a different zone check box. If the principal broker instance fails, the standby takes over in seconds. To make the client aware of the standby, use the failover:// protocol in the connection configuration, pointing to both broker endpoints.
Leave the other settings as is. The broker takes a few minutes to be created. After it’s done, you can see the list of endpoints available for the different protocols.
After the broker has been created, modify the security group to add the allowed ports and sources for access.
For this example, you need access to the ActiveMQ admin page and to AMQP. Open up ports 8162 and 5671 to the public address of your laptop.
You can also create a new user for programmatic access to the broker. In the Users section, choose Create User and add a new user named sdk.
Writing a simple application
The complete code for this walkthrough is available from the aws-amazonmq-apachecamel-sample GitHub repo. Clone the repository on your local machine to have the fully functional example. The rest of this post offers step-by-step instructions to build this solution.
To write the application, use Apache Maven and the archetypes provided by the Camel project. If you do not have Apache Maven installed on your machine, you can follow the instructions at Installing Apache Maven.
From a terminal, run the following command:
mvn archetype:generate
You get a list of archetypes. Type camel to filter the list to the Camel-related archetypes. In this case, use the java8 archetype (camel-archetype-java8) and follow the prompts.
Maven now generates the skeleton code in a folder named after the artifactId. In this case:
camel-aws-simple
Next, test that the environment is configured correctly to run Camel. At the prompt, run the following commands:
cd camel-aws-simple
mvn install
mvn exec:java
You should see a log appearing in the console, printing the following:
[INFO] --- exec-maven-plugin:1.6.0:java (default-cli) @ camel-aws-test ---
[ com.angmas.MainApp.main()] DefaultCamelContext INFO Apache Camel 2.20.1 (CamelContext: camel-1) is starting
[ com.angmas.MainApp.main()] ManagedManagementStrategy INFO JMX is enabled
[ com.angmas.MainApp.main()] DefaultTypeConverter INFO Type converters loaded (core: 192, classpath: 0)
[ com.angmas.MainApp.main()] DefaultCamelContext INFO StreamCaching is not in use. If using streams then its recommended to enable stream caching. See more details at http://camel.apache.org/stream-caching.html
[ com.angmas.MainApp.main()] DefaultCamelContext INFO Route: route1 started and consuming from: timer://simple?period=1000
[ com.angmas.MainApp.main()] DefaultCamelContext INFO Total 1 routes, of which 1 are started
[ com.angmas.MainApp.main()] DefaultCamelContext INFO Apache Camel 2.20.1 (CamelContext: camel-1) started in 0.419 seconds
[-1) thread #2 - timer://simple] route1 INFO Got a String body
[-1) thread #2 - timer://simple] route1 INFO Got an Integer body
[-1) thread #2 - timer://simple] route1 INFO Got a Double body
[-1) thread #2 - timer://simple] route1 INFO Got a String body
[-1) thread #2 - timer://simple] route1 INFO Got an Integer body
[-1) thread #2 - timer://simple] route1 INFO Got a Double body
[-1) thread #2 - timer://simple] route1 INFO Got a String body
[-1) thread #2 - timer://simple] route1 INFO Got an Integer body
[-1) thread #2 - timer://simple] route1 INFO Got a Double body
Adding the dependencies
Now that you have verified that the sample works, modify it to add the dependencies needed to interface with Amazon MQ/ActiveMQ and AWS.
For the following steps, you can use a normal text editor, such as vi, Sublime Text, or Visual Studio Code. Or, open the maven project in an IDE such as Eclipse or IntelliJ IDEA.
Open pom.xml and add the following lines inside the <dependencies> tag:
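(The exact list is in the sample repository; a minimal sketch of the typical dependencies is shown here, assuming the Camel version matches the 2.20.1 used elsewhere in this walkthrough. The camel-aws component generally pulls in the AWS SDK modules it needs transitively.)

<!-- Sketch: versions are assumptions; align them with your project's Camel version -->
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-amqp</artifactId>
    <version>2.20.1</version>
</dependency>
<dependency>
    <groupId>org.apache.camel</groupId>
    <artifactId>camel-aws</artifactId>
    <version>2.20.1</version>
</dependency>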
The camel-aws component takes care of interfacing with the supported AWS services without requiring any in-depth knowledge of the AWS Java SDK. For more information, see Camel Components for Amazon Web Services.
Triaging files into S3
Write a Camel route that receives files as message payloads from a queue and writes them to an S3 bucket, using different prefixes depending on the file extension.
Because the broker that you created is exposed via a public IP address, you can execute the code from anywhere with an internet connection that allows communication on the specific ports. In this example, run the code from your own laptop. A broker can also be created without a public IP address, in which case it is only accessible from inside the VPC in which it was created, or from any peered VPC or network connected via a virtual gateway (VPN or AWS Direct Connect).
First, look at the code created by Maven. The archetype you chose created a standalone Camel context run via the helper org.apache.camel.main.Main class. This provides an easy way to run Camel routes from an IDE or the command line without needing to deploy them inside a container. Apache Camel can also be run as an OSGi module, or as a Spring or Spring Boot bean.
package com.angmas;
import org.apache.camel.main.Main;
/**
* A Camel Application
*/
public class MainApp {
/**
* A main() so you can easily run these routing rules in your IDE
*/
public static void main(String... args) throws Exception {
Main main = new Main();
main.addRouteBuilder(new MyRouteBuilder());
main.run(args);
}
}
The main method instantiates the Camel Main helper class and the routes, and runs the Camel application. The MyRouteBuilder class creates a route using Java DSL. It is also possible to define routes in Spring XML and load them dynamically in the code.
public void configure() {
// this sample sets a random body then performs content-based
// routing on the message using method references
from("timer:simple?period=1000")
.process()
.message(m -> m.setHeader("index", index++ % 3))
.transform()
.message(this::randomBody)
.choice()
.when()
.body(String.class::isInstance)
.log("Got a String body")
.when()
.body(Integer.class::isInstance)
.log("Got an Integer body")
.when()
.body(Double.class::isInstance)
.log("Got a Double body")
.otherwise()
.log("Other type message");
}
Writing the Camel route
Replace the existing route with one that fetches messages from Amazon MQ over AMQP and routes the content to an S3 bucket with different key prefixes depending on the file name extension. The new route does the following:
Reads messages from the AMQP queue named filequeue.
Processes the message and sets a new ext header using the setExtensionHeader method (see below).
Checks the value of the ext header and writes the body of the message as an object in an S3 bucket, using different key prefixes and retaining the original name of the file.
The Amazon S3 component is configured with the bucket name, and a reference to an S3 client (amazonS3client=#s3Client) that you added to the Camel registry in the Main method of the app. Adding the object to the Camel registry allows Camel to find the object at runtime. Even though you could pass the region, accessKey, and secretKey parameters directly in the component URI, this way is more secure. It can make use of EC2 instance roles, so that you never need to pass the secrets.
Sending files to the AMQP queue
To send the files to the AMQP queue for testing, add another Camel route. In a real scenario, the messages to the AMQP queue are generated by another client. You are going to create a new route builder, but you could also add this route inside the existing MyRouteBuilder.
package com.angmas;
import org.apache.camel.builder.RouteBuilder;
/**
* A Camel Java8 DSL Router
*/
public class MessageProducerBuilder extends RouteBuilder {
/**
* Configure the Camel routing rules using Java code...
*/
public void configure() {
from("file://input?delete=false&noop=true")
.log("Content ${body} ${headers.CamelFileName}")
.to("amqp:filequeue");
}
}
The code reads files from the input folder in the working directory and publishes them to the queue. The route builder is added in the main class:
By default, Camel tries to connect to a local AMQP broker. Configure it to connect to your Amazon MQ broker.
Create an AMQPConnectionDetails object that is configured to connect to Amazon MQ broker with SSL and pass the user name and password that you set on the broker. Adding the object to the Camel registry allows Camel to find the object at runtime and use it as the default connection to AMQP.
public class MainApp {
public static String BROKER_URL = System.getenv("BROKER_URL");
public static String AMQP_URL = "amqps://"+BROKER_URL+":5671";
public static String BROKER_USERNAME = System.getenv("BROKER_USERNAME");
public static String BROKER_PASSWORD = System.getenv("BROKER_PASSWORD");
/**
* A main() so you can easily run these routing rules in your IDE
*/
public static void main(String... args) throws Exception {
Main main = new Main();
main.bind("amqp", getAMQPconnection());
main.bind("s3Client", AmazonS3ClientBuilder.standard().withRegion(Regions.US_EAST_1).build());
main.addRouteBuilder(new MyRouteBuilder());
main.addRouteBuilder(new MessageProducerBuilder());
main.run(args);
}
public static AMQPConnectionDetails getAMQPconnection() {
return new AMQPConnectionDetails(AMQP_URL, BROKER_USERNAME, BROKER_PASSWORD);
}
}
The AMQP_URL uses the amqps scheme, which indicates that you are using SSL. You then add the connection details to the registry with main.bind("amqp", getAMQPconnection()), and Camel finds the object by matching its class type.
Testing the code
Create an input folder in the project root, and create a few files with different extensions, such as txt, html, and csv.
Set the different environment variables required by the code, either in the shell or in your IDE as execution configuration.
If you are running the example from an EC2 instance, ensure that the EC2 instance role has read permission on the S3 bucket.
If you are running this on your laptop, ensure that you have configured the AWS credentials in the environment, for example, by using the aws configure command.
From the command line, execute the code:
mvn exec:java
If you are using an IDE, execute the main class. Camel outputs logging information and you should see messages listing the content and names of the files in the input folder.
Keep adding some more files to the input folder. You see that they are triaged in S3 a few seconds later. You can open the S3 console to check that they have been created.
To stop Camel, press CTRL+C in the shell.
Conclusion
In this post, I showed you how to create a publicly accessible Amazon MQ broker, and how to use Apache Camel to easily integrate AWS services with the broker. In the example, you created a Camel route that reads messages containing files from the AMQP queue and triages them by file extension into an S3 bucket.
Camel supports several components and provides blueprints for several enterprise integration patterns. Used in combination with the Amazon MQ, it provides a powerful and flexible solution to extend traditional enterprise solutions to the AWS Cloud, and integrate them seamlessly with cloud-native services, such as Amazon S3, Amazon SNS, Amazon SQS, Amazon CloudWatch, and AWS Lambda.
To learn more, see the Amazon MQ website. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.
Security updates have been issued by Arch Linux (firefox, libvorbis, and ntp), Debian (curl, firefox-esr, gitlab, libvorbis, libvorbisidec, openjdk-8, and uwsgi), Fedora (firefox, ImageMagick, kernel, and mailman), Gentoo (adobe-flash, jabberd2, oracle-jdk-bin, and plasma-workspace), Mageia (bugzilla, kernel, leptonica, libtiff, libvorbis, microcode, python-pycrypto, SDL_image, shadow-utils, sharutils, and xerces-c), openSUSE (exempi, firefox, GraphicsMagick, libid3tag, libraw, mariadb, php5, postgresql95, SDL2, SDL2_image, ucode-intel, and xmltooling), Red Hat (firefox), Slackware (firefox and libvorbis), SUSE (microcode_ctl and ucode-intel), and Ubuntu (firefox and php5, php7.0, php7.1).
Today we’d like to walk you through federated sign-in to AWS Identity and Access Management (IAM) using Active Directory (AD) and Active Directory Federation Services (ADFS). With IAM, you can centrally manage users, security credentials such as access keys, and permissions that control which resources users can access. Customers have the option of creating users and group objects within IAM, or they can use a third-party federation service to grant external directory users access to AWS resources. To streamline the administration of user access in AWS, organizations can use a federated solution with an external directory, allowing them to minimize administrative overhead. Benefits of this approach include leveraging existing passwords and password policies, roles, and groups. This guide provides a walkthrough of how to automate the federation setup across multiple accounts/roles with an Active Directory backing identity store. It establishes the minimum baseline for the authentication architecture, including the initial IdP deployment and the elements needed for federation.
ADFS Federated Authentication Process
The following describes the process a user will follow to authenticate to AWS using Active Directory and ADFS as the identity provider and identity brokers:
Corporate user accesses the corporate Active Directory Federation Services portal sign-in page and provides Active Directory authentication credentials.
AD FS authenticates the user against Active Directory.
Active Directory returns the user’s information, including AD group membership information.
AD FS dynamically builds ARNs by using Active Directory group memberships for the IAM roles and user attributes for the AWS account IDs, and sends a signed assertion to the user’s browser with a redirect to post the assertion to AWS STS.
Temporary security credentials are returned using the STS AssumeRoleWithSAML API (see the sketch after this list).
The user is authenticated and provided access to the AWS management console.
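The flow above is the browser-based console experience, but the same STS call can be made programmatically to obtain temporary credentials. A minimal boto3 sketch, assuming you have already obtained a base64-encoded SAML assertion from AD FS (the ARNs are placeholders):

# Minimal sketch: exchanging an AD FS SAML assertion for temporary AWS credentials.
import boto3

def credentials_from_saml(saml_assertion: str) -> dict:
    sts = boto3.client("sts")
    response = sts.assume_role_with_saml(
        RoleArn="arn:aws:iam::123456789012:role/ADFS-PowerUsers",        # placeholder
        PrincipalArn="arn:aws:iam::123456789012:saml-provider/ADFS",     # placeholder
        SAMLAssertion=saml_assertion,
        DurationSeconds=3600,
    )
    # Contains AccessKeyId, SecretAccessKey, SessionToken, and Expiration
    return response["Credentials"]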
Configuration Steps
Configuration requires setup in the identity provider store (for example, Active Directory), the identity broker (for example, Active Directory Federation Services), and AWS. It is possible to configure AWS to federate authentication using a variety of third-party SAML 2.0 compliant identity providers; more information can be found here.
AWS Configuration
The configuration steps outlined in this document can be completed to enable federated access to multiple AWS accounts, facilitating a single sign on process across a multi-account AWS environment. Access can also be provided to multiple roles in each AWS account. The roles available to a user are based on their group memberships in the identity provider (IdP). In a multi-role and/or multi-account scenario, role assumption requires the user to select the account and role they wish to assume during the authentication process.
Identity Provider
A SAML 2.0 identity provider is an IAM resource that describes an identity provider (IdP) service that supports the SAML 2.0 (Security Assertion Markup Language 2.0) standard. AWS SAML identity provider configurations can be used to establish trust between AWS and SAML-compatible identity providers, such as Shibboleth or Microsoft Active Directory Federation Services. These enable users in an organization to access AWS resources using existing credentials from the identity provider.
A SAML identity provider can be configured using the AWS console by completing the following steps.
2. Select SAML for the provider type. Enter a provider name of your choosing (this will become the logical name used in the identity provider ARN). Lastly, download the FederationMetadata.xml file from your ADFS server to your client system (https://yourADFSserverFQDN/FederationMetadata/2007-06/FederationMetadata.xml). Click “Choose File” and upload it to AWS.
3. Click “Next Step” and then verify the information you have entered. Click “Create” to complete the AWS identity provider configuration process.
IAM Role Naming Convention for User Access
Once the AWS identity provider configuration is complete, it is necessary to create the roles in AWS that federated users can assume via SAML 2.0. An IAM role is an AWS identity with permission policies that determine what the identity can and cannot do in AWS. In a federated authentication scenario, users (as defined in the IdP) assume an AWS role during the sign-in process. A role should be defined for each access delineation that you wish to define. For example, create a role for each line of business (LOB), or each function within a LOB. Each role will then be assigned a set of policies that define what privileges the users who will be assuming that role will have.
The following steps detail how to create a single role. These steps should be completed multiple times to enable assumption of different roles within AWS, as required.
2. Select “SAML” as the trusted entity type. Click Next Step.
3. Select your previously created identity provider. Click Next: Permissions.
4. The next step requires selection of policies that represent the desired permissions the user should obtain in AWS, once they have authenticated and successfully assumed the role. This can be either a custom policy or preferably an AWS managed policy. AWS recommends leveraging existing AWS access policies for job functions for common levels of access. For example, the “Billing” AWS Managed policy should be utilized to provide financial analyst access to AWS billing and cost information.
5. Provide a name for your role. All roles should be created with the prefix ADFS-<rolename> to simplify the identification of roles in AWS that are accessed through the federated authentication process. Next, click “Create Role”.
Active Directory Configuration
Determining how you will create and delineate your AD groups and IAM roles in AWS is crucial to how you secure access to your account and manage resources. SAML assertions to the AWS environment, and the respective IAM role access, are managed through regular expression (regex) matching between your on-premises AD group names and AWS IAM roles.
One approach for creating AD groups that uniquely identify the AWS IAM role mapping is to select a common group naming convention. For example, your AD groups could start with an identifier such as AWS-, which distinguishes your AWS groups from others within the organization. Next, include the 12-digit AWS account number. Finally, add the matching role name within the AWS account. Here is an example:
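AWS-123456789012-ADFS-PowerUsers (the account ID and role name here are placeholders; substitute your own 12-digit account ID and an ADFS-prefixed role that exists in that account).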
You should do this for each role and corresponding AWS account you wish to support with federated access. Users in Active Directory can subsequently be added to the groups, providing the ability to assume access to the corresponding roles in AWS. If a user is associated with multiple Active Directory groups and AWS accounts, they will see a list of roles by AWS account and will have the option to choose which role to assume. A user will not be able to assume more than one role at a time, but has the ability to switch between them as needed.
Note: Microsoft imposes a limit on the number of groups a user can be a member of (approximately 1,015 groups) due to the size limit for the access token that is created for each security principal. This limitation, however, is not affected by how the groups may or may not be nested.
Active Directory Federation Services Configuration
ADFS federation occurs with the participation of two parties: the identity or claims provider (in this case, the owner of the identity repository, Active Directory) and the relying party, which is another application that wishes to outsource authentication to the identity provider; in this case, the AWS Security Token Service (STS). The relying party is a federation partner that is represented by a relying party trust in the federation service.
Relying Party
You can configure a new relying party in Active Directory Federation Services by doing the following.
1. From the ADFS Management Console, right-click ADFS and select Add Relying Party Trust.
2. In the Add Relying Party Trust Wizard, click Start.
3. Check Import data about the relying party published online or on a local network, enter https://signin.aws.amazon.com/static/saml-metadata.xml, and then click Next. The metadata XML file is a standard SAML metadata document that describes AWS as a relying party.
Note: SAML federations use metadata documents to maintain information about the public keys and certificates that each party utilizes. At run time, each member of the federation can then use this information to validate that the cryptographic elements of the distributed transactions come from the expected actors and haven’t been tampered with. Since these metadata documents do not contain any sensitive cryptographic material, AWS publishes federation metadata at https://signin.aws.amazon.com/static/saml-metadata.xml
4. Set the display name for the relying party and then click Next.
5. Do not enable or configure the multi-factor authentication (MFA) settings at this time. Click Next.
6. Select “Permit all users to access this relying party” and click Next.
7. Review your settings and then click Next.
8. Choose Close on the Finish page to complete the Add Relying Party Trust Wizard. AWS is now configured as a relying party.
Custom Claim Rules
Microsoft Active Directory Federation Services (AD FS) uses Claims Rule Language to issue and transform claims between claims providers and relying parties. A claim is information about a user from a trusted source. The trusted source is asserting that the information is true, and that source has authenticated the user in some manner. The claims provider is the source of the claim. This can be information pulled from an attribute store such as Active Directory (AD). The relying party is the destination for the claims, in this case AWS.
AD FS provides administrators with the option to define custom rules that they can use to determine the behavior of identity claims with the claim rule language. The Active Directory Federation Services (AD FS) claim rule language acts as the administrative building block to help manage the behavior of incoming and outgoing claims. There are four claim rules that need to be created to effectively enable Active Directory users to assume roles in AWS based on group membership in Active Directory.
Right-click the relying party (in this case Amazon Web Services) and then click Edit Claim Rules.
Here are the steps used to create the claim rules for NameId, RoleSessionName, Get AD Groups and Roles.
1. NameId
a) In the Edit Claim Rules for <relying party> dialog box, click Add Rule.
b) Select Transform an Incoming Claim and then click Next.
c) Use the following settings:
i) Claim rule name: NameId
ii) Incoming claim type: Windows Account Name
iii) Outgoing claim type: Name ID
iv) Outgoing name ID format: Persistent Identifier
v) Pass through all claim values: checked
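2. RoleSessionName
a) Click Add Rule.
b) In the Claim rule template list, select Send Claims Using a Custom Rule and then click Next.
c) For Claim Rule Name, enter RoleSessionName, and then in Custom rule, enter a rule that issues the https://aws.amazon.com/SAML/Attributes/RoleSessionName claim; this value becomes the federated user’s session name in the AWS Management Console and in CloudTrail. A sketch, assuming the session name is derived from the Windows account name (a common choice, though your environment may use another attribute such as the email address):
Rule language: c:[Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/windowsaccountname", Issuer == "AD AUTHORITY"] => issue(Type = "https://aws.amazon.com/SAML/Attributes/RoleSessionName", Issuer = c.Issuer, OriginalIssuer = c.OriginalIssuer, Value = c.Value, ValueType = c.ValueType);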
3. Get AD Groups
a) Click Add Rule.
b) In the Claim rule template list, select Send Claims Using a Custom Rule and then click Next.
c) For Claim Rule Name, enter Get AD Groups, and then in Custom rule, enter the following:
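A sketch of this rule, following the standard tokenGroups query used in AWS’s AD FS guidance (adjust if your environment queries group membership differently):
Rule language: c:[Type == "http://schemas.microsoft.com/ws/2008/06/identity/claims/windowsaccountname", Issuer == "AD AUTHORITY"] => add(store = "Active Directory", types = ("http://temp/variable"), query = ";tokenGroups;{0}", param = c.Value);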
This custom rule uses a script in the claim rule language that retrieves all the groups the authenticated user is a member of and places them into a temporary claim named http://temp/variable. Think of this as a variable you can access later.
Note: Ensure there’s no trailing whitespace to avoid unexpected results.
4. Role Attributes
a) Unlike the NameId and RoleSessionName rules, sending role attributes is done with custom rules. This is done by retrieving all the authenticated user’s AD groups and then matching the groups that start with AWS- to IAM roles of a similar name. I used the names of these groups to create the Amazon Resource Names (ARNs) of IAM roles in my AWS account. Sending role attributes requires two custom rules: the first rule (Get AD Groups, above) retrieves all the authenticated user’s AD group memberships, and the second (Roles) performs the transformation to the roles claim.
i) Click Add Rule.
ii) In the Claim rule template list, select Send Claims Using a Custom Rule and then click Next.
iii) For Claim Rule Name, enter Roles, and then in Custom rule, enter the following:
Rule language: c:[Type == "http://temp/variable", Value =~ "(?i)^AWS-([\d]{12})"] => issue(Type = "https://aws.amazon.com/SAML/Attributes/Role", Value = RegExReplace(c.Value, "AWS-([\d]{12})-", "arn:aws:iam::$1:saml-provider/idp1,arn:aws:iam::$1:role/"));
This custom rule uses regular expressions to transform each of the group memberships of the form AWS-<Account Number>-<Role Name> into the combined IAM role ARN and SAML provider ARN format that AWS expects in the Role attribute.
Note: In the example rule language above idp1 represents the logical name given to the SAML identity provider in the AWS identity provider setup. Please change this based on the logical name you chose in the IAM console for your identity provider.
Adjusting Session Duration
By default, the temporary credentials issued for SAML federation are valid for one hour. Depending on your organization’s security stance, you may wish to adjust this. You can allow your federated users to work in the AWS Management Console for up to 12 hours. This can be accomplished by adding another claim rule to your ADFS configuration. To add the rule, do the following:
1. Access ADFS Management Tool on your ADFS Server.
2. Choose Relying Party Trusts, then select your AWS Relying Party configuration.
3. Choose Edit Claim Rules.
4. Choose Add Rule to configure a new rule, and then choose Send claims using a custom rule. Finally, choose Next.
5. Name your Rule “Session Duration” and add the following rule syntax.
6. Adjust the value of 28800 seconds (8 hours) as appropriate.
Rule language: => issue(Type = "https://aws.amazon.com/SAML/Attributes/SessionDuration", Value = "28800");
Note: AD FS 2012 R2 and AD FS 2016 tokens have a sixty-minute validity period by default. This value is configurable on a per-relying party trust basis. In addition to adding the “Session Duration” claim rule, you will also need to update the security token created by AD FS. To update this value, run the following command:
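A sketch of that command, assuming the relying party trust is named Amazon Web Services (substitute the display name you chose earlier) and an eight-hour token lifetime:
Set-AdfsRelyingPartyTrust -TargetName "Amazon Web Services" -TokenLifetime 480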
The -TokenLifetime parameter determines the lifetime in minutes. In this example, we set the lifetime to 480 minutes (eight hours).
These are the main settings related to session lifetimes and user authentication. Once updated, any new console session your federated users initiate will be valid for the duration specified in the SessionDuration claim.
API/CLI Access
Access to the AWS API and command-line tools with federated credentials can be accomplished using the techniques in the following blog article:
This will enable your users to access your AWS environment using their domain credentials through the AWS CLI or one of the AWS SDKs.
Conclusion
In this post, I’ve shown you how to provide identity federation, and thus SSO, to the AWS Management Console for multiple accounts using SAML assertions. With this approach, the AWS Security Token Service (STS) provides temporary credentials (via SAML) for the user to assume a role that has specific permissions attached and that they are entitled to use, as denoted by AD group membership, rather than providing long-term access credentials to the AWS resources. By adopting this model, you will have a secure and robust IAM approach for accessing AWS resources that aligns with AWS security best practices.
Security updates have been issued by Debian (xmltooling), Fedora (mbedtls), openSUSE (freexl), Oracle (quagga and ruby), Red Hat (.NET Core, quagga, and ruby), Scientific Linux (quagga and ruby), SUSE (glibc), and Ubuntu (libreoffice).
Security updates have been issued by Arch Linux (mbedtls), CentOS (gcab and java-1.7.0-openjdk), Debian (drupal7, lucene-solr, wavpack, and xmltooling), Fedora (dnsmasq, gcab, gimp, golang, knot-resolver, ldns, libsamplerate, mingw-OpenEXR, mingw-poppler, python-crypto, qt5-qtwebengine, sblim-sfcb, systemd, unbound, and wavpack), Mageia (ioquake3, TiMidity++, tomcat, tomcat-native, and wireshark), openSUSE (systemd and zziplib), Red Hat (erlang and openstack-nova and python-novaclient), and SUSE (kernel).
On his blog, Patrick Uiterwijk writes about Fedora packaging and how the distribution works to ensure its users get valid updates. Packages are signed, but repository metadata is not (yet); other mechanisms are in place to keep users from being served outdated repository data (and thus missing important security updates). “However, when a significant security issue is announced and we have repositories that include fixes for this issue, we have an ‘Emergency’ button. When we press that button, we tell our servers to immediately regard every older repomd.xml checksum as outdated.
This means that when we press this button, every mirror that does not have the very latest repository data will be regarded as outdated, so that our users get the security patches as soon as possible. This does mean that for a period of time only the master mirrors are trusted until other mirrors sync their data, but we prefer this solution over delaying getting important fixes out to our users and making them vulnerable to attackers in the meantime.”
This post courtesy of Greg Share, AWS Solutions Architect
Many organizations, particularly enterprises, rely on message brokers to connect and coordinate different systems. Message brokers enable distributed applications to communicate with one another, serving as the technological backbone for their IT environment, and ultimately their business services. Applications depend on messaging to work.
In many cases, those organizations have started to build new applications in AWS or “lift and shift” existing ones. In some cases, there are applications, such as mainframe systems, that are too costly to migrate. In these scenarios, those on-premises applications still need to interact with cloud-based components.
Amazon MQ is a managed message broker service for ActiveMQ that enables organizations to send messages between applications in the cloud and on premises, to enable hybrid environments and application modernization. For example, you can invoke AWS Lambda from queues and topics managed by Amazon MQ brokers to integrate legacy systems with serverless architectures. ActiveMQ is an open-source message broker written in Java that is packaged with clients in multiple languages, the Java Message Service (JMS) client being one example.
This post shows how you can use Amazon MQ to integrate on-premises and cloud environments using the network of brokers feature of ActiveMQ. It provides configuration parameters for a one-way (duplex set to false) connection for the flow of messages from an on-premises ActiveMQ message broker to Amazon MQ.
ActiveMQ and the network of brokers
First, look at queues within ActiveMQ and then at the network of brokers as a mechanism to distribute messages.
The network of brokers behaves differently from models such as physical networks. The key consideration is that the production (sending) of a message is disconnected from the consumption of that message. Think of the delivery of a parcel: the parcel is sent by the supplier (producer) to the end customer (consumer). The path it took to get there is of little concern to the customer, as long as the parcel arrives.
The same logic can be applied to the network of brokers. Here’s how you start with a simple message sent to a queue and build toward a network of brokers. Before looking at setting up a hybrid connection, I discuss how a broker processes messages in a simple scenario.
When a message is sent from a producer to a queue on a broker, the following steps occur:
A message is sent to a queue from the producer.
The broker persists this in its store or journal.
At this point, an acknowledgement (ACK) is sent to the producer from the broker.
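As a point of reference, a minimal JMS producer for this flow might look like the following sketch; the broker URL and the queue name (TestingQ) are illustrative placeholders:
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class SimpleProducer {
    public static void main(String[] args) throws Exception {
        // Connect to the broker (placeholder URL for a local on-premises broker)
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer = session.createProducer(session.createQueue("TestingQ"));
        // For a persistent message, send() returns after the broker has stored it and sent its ACK
        producer.send(session.createTextMessage("Hello from the on-premises producer"));
        connection.close();
    }
}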
When a consumer looks to consume the message from that same queue, the following steps occur:
The message listener (consumer) calls the broker, which creates a subscription to the queue.
Messages are fetched from the message store and sent to the consumer.
The consumer acknowledges that the message has been received before processing it.
Upon receiving the ACK, the broker sets the message as having been consumed. By default, this deletes it from the queue.
You can set the consumer to ACK after processing by setting up transaction management or by handling it manually using Session.CLIENT_ACKNOWLEDGE.
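For example, a consumer that defers the ACK until after processing by using Session.CLIENT_ACKNOWLEDGE might look like this sketch (broker URL and queue name are again placeholders):
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageConsumer;
import javax.jms.Session;
import org.apache.activemq.ActiveMQConnectionFactory;

public class ClientAckConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();
        // CLIENT_ACKNOWLEDGE defers the ACK until the application calls acknowledge()
        Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("TestingQ"));
        Message message = consumer.receive(5000); // wait up to 5 seconds for a message
        if (message != null) {
            // ... process the message here ...
            message.acknowledge(); // only now does the broker mark the message as consumed
        }
        connection.close();
    }
}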
Static propagation
I now introduce the concept of static propagation with the network of brokers as the mechanism for message transfer from on-premises brokers to Amazon MQ. Static propagation refers to message propagation that occurs in the absence of subscription information. In this case, the objective is to transfer messages arriving at your selected on-premises broker to the Amazon MQ broker for consumption within the cloud environment.
After you configure static propagation with a network of brokers, the following occurs:
The on-premises broker receives a message from a producer for a specific queue.
The on-premises broker sends (statically propagates) the message to the Amazon MQ broker.
The Amazon MQ broker sends an acknowledgement to the on-premises broker, which marks the message as having been consumed.
Amazon MQ holds the message in its queue ready for consumption.
A consumer connects to Amazon MQ broker, subscribes to the queue in which the message resides, and receives the message.
Amazon MQ broker marks the message as having been consumed.
To create the Amazon MQ broker that receives messages from your on-premises broker, open the Amazon MQ console, choose Create a broker, give the broker a name, and then configure the following settings.
For Broker instance type, choose your instance size:
– mq.t2.micro
– mq.m4.large
For Deployment mode, choose one of the following:
– Single-instance broker for development and test implementations (recommended)
– Active/standby broker for high availability in production environments
Scroll down and enter your user name and password.
Expand Advanced Settings.
For VPC, Subnet, and Security Group, pick the values for the resources in which your broker will reside.
For Public Accessibility, choose Yes, as connectivity is internet-based. Another option would be to use private connectivity between your on-premises network and the VPC, an example being an AWS Direct Connect or VPN connection. In that case, you could set Public Accessibility to No.
For Maintenance, leave the default value, No preference.
Choose Create Broker. Wait several minutes for the broker to be created.
After creation is complete, you see your broker listed.
For connectivity to work, you must configure the security group where Amazon MQ resides. For this post, I focus on the OpenWire protocol.
For OpenWire connectivity, allow port 61617 access for Amazon MQ from your on-premises ActiveMQ broker source IP address. For alternate protocols, see the Amazon MQ broker configuration information for the ports required:
Configuring the network of brokers with static propagation occurs on the on-premises broker by applying changes to the following file: <activemq install directory>/conf/activemq.xml
Network connector
This is the first configuration item required to enable a network of brokers. It is only required on the on-premises broker, which initiates and creates the connection with Amazon MQ. This connection, after it’s established, enables the flow of messages in either direction between the on-premises broker and Amazon MQ. The focus of this post is the uni-directional flow of messages from the on-premises broker to Amazon MQ.
The default activemq.xml file does not include the network connector configuration. Add this with the networkConnector element. In this scenario, edit the on-premises broker activemq.xml file to include the following information between <systemUsage> and <transportConnectors>:
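A sketch of that configuration follows; the connector name, endpoint, credentials, and queue name are placeholders to replace with your own values, and the networkTTL shown is illustrative (the Amazon MQ OpenWire endpoint is available in the Amazon MQ console):
<networkConnectors>
  <!-- One-way (duplex="false") bridge from the on-premises broker to Amazon MQ -->
  <networkConnector name="Q:OnPremBroker->AmazonMQ"
      duplex="false"
      networkTTL="2"
      userName="MyUserName"
      password="MyPassword"
      uri="static:(ssl://b-1234a5b6-78cd-901e-2fgh-3i45j6k178l9-1.mq.us-east-1.amazonaws.com:61617)">
    <staticallyIncludedDestinations>
      <queue physicalName="TestingQ"/>
    </staticallyIncludedDestinations>
  </networkConnector>
</networkConnectors>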
The following attributes are the most important elements when configuring your on-premises broker.
name – Name of the network bridge. In this case, it specifies two things:
That this connection relates to an ActiveMQ queue (Q) as opposed to a topic (T), for reference purposes.
The source broker and target broker.
duplex – Setting this to false ensures that messages traverse uni-directionally from the on-premises broker to Amazon MQ.
uri – Specifies the remote endpoint to which to connect for message transfer. In this case, it is an OpenWire endpoint on your Amazon MQ broker. This information can be obtained from the Amazon MQ console or via the API.
userName and password – The same user name and password configured when creating the Amazon MQ broker, and used to access the Amazon MQ ActiveMQ console.
networkTTL – Number of brokers in the network through which messages and subscriptions can pass. Leave this setting at the current value, if it is already included in your broker connection.
staticallyIncludedDestinations > queue physicalName – The destination ActiveMQ queue for which messages are destined. This is the queue that is propagated from the on-premises broker to the Amazon MQ broker for message consumption.
After the network connector is configured, you must restart the ActiveMQ service on the on-premises broker for the changes to be applied.
Verify the configuration
There are a number of places within the ActiveMQ console of your on-premises and Amazon MQ brokers to browse to verify that the configuration is correct and the connection has been established.
On-premises broker
Launch the ActiveMQ console of your on-premises broker and navigate to Network. You should see an active network bridge similar to the following:
This identifies that the connection between your on-premises broker and your Amazon MQ broker is up and running.
Now navigate to Connections and scroll to the bottom of the page. Under the Network Connectors subsection, you should see a connector labeled with the name value that you provided in the activemq.xml configuration file. You should see an entry similar to:
Amazon MQ broker
Launch the ActiveMQ console of your Amazon MQ broker and navigate to Connections. Scroll to the Connections openwire subsection and you should see a connection that references the name value that you provided in the activemq.xml configuration file. You should see an entry similar to:
If you configured the uri for AMQP, STOMP, MQTT, or WSS as opposed to OpenWire, you would see this connection under the corresponding section of the Connections page.
Testing your message flow
The setup described outlines a way for messages produced on premises to be propagated to the cloud for consumption in the cloud. This section provides steps on verifying the message flow.
Verify that the queue has been created
After you specify this queue name (TestingQ) as the staticallyIncludedDestinations > queue physicalName value and your ActiveMQ service starts, you see the following on your on-premises ActiveMQ console Queues page.
As you can see, no messages have been sent but you have one consumer listed. If you then choose Active Consumers under the Views column, you see Active Consumers for TestingQ.
This is telling you that your Amazon MQ broker is a consumer of your on-premises broker for the testing queue.
Produce and send a message to the on-premises broker
Now, produce a message on an on-premises producer and send it to the queue named TestingQ on your on-premises broker. If you navigate back to the Queues page of your on-premises ActiveMQ console, you see that the Messages Enqueued and Messages Dequeued counts for your TestingQ queue have changed:
What this means is that the message originating from the on-premises producer has traversed the on-premises broker and propagated immediately to the Amazon MQ broker. At this point, the message is no longer available for consumption from the on-premises broker.
If you access the ActiveMQ console of your Amazon MQ broker and navigate to the Queues page, you see the following for the TestingQ queue:
This means that the message originally sent to your on-premises broker has traversed the unidirectional network bridge of the network of brokers, and is ready to be consumed from your Amazon MQ broker. The indicator is the Number of Pending Messages column.
Consume the message from an Amazon MQ broker
Connect to the Amazon MQ TestingQ queue from a consumer within the AWS Cloud environment for message consumption. Log on to the ActiveMQ console of your Amazon MQ broker and navigate to the Queues page:
As you can see, the Number of Pending Messages column figure has changed to 0 as that message has been consumed.
This diagram outlines the message lifecycle from the on-premises producer to the on-premises broker, traversing the hybrid connection between the on-premises broker and Amazon MQ, and finally consumption within the AWS Cloud.
Conclusion
This post focused on an ActiveMQ-specific scenario for transferring messages within an ActiveMQ queue from an on-premises broker to Amazon MQ.
For other on-premises brokers, such as IBM MQ, another approach would be to run an ActiveMQ broker on premises and use JMS bridging to IBM MQ, while using the approach in this post to forward messages to Amazon MQ. Yet another approach would be to use Apache Camel for more sophisticated routing.
I hope that you have found this example of hybrid messaging between an on-premises environment and the AWS Cloud to be useful. Many customers are already using on-premises ActiveMQ brokers, and this is a great use case to enable hybrid cloud scenarios.
To learn more, see the Amazon MQ website and Developer Guide. You can try Amazon MQ for free with the AWS Free Tier, which includes up to 750 hours of a single-instance mq.t2.micro broker and up to 1 GB of storage per month for one year.
Security updates have been issued by Debian (jackson-databind, leptonlib, libvorbis, python-crypto, and xen), Fedora (apache-commons-email, ca-certificates, libreoffice, libxml2, mujs, p7zip, python-django, sox, and torbrowser-launcher), openSUSE (libreoffice), SUSE (libreoffice), and Ubuntu (advancecomp, erlang, and freetype).
Security updates have been issued by Arch Linux (go, go-pie, and plasma-workspace), Debian (audacity, exim4, libreoffice, librsvg, ruby-omniauth, tomcat-native, and uwsgi), Fedora (tomcat-native), Gentoo (virtualbox), Mageia (kernel), openSUSE (freetype2, ghostscript, jhead, and libxml2), and SUSE (freetype2 and kernel).
Security updates have been issued by Arch Linux (dnsmasq, libmupdf, mupdf, mupdf-gl, mupdf-tools, and zathura-pdf-mupdf), CentOS (kernel), Debian (smarty3, thunderbird, and unbound), Fedora (bind, bind-dyndb-ldap, coreutils, curl, dnsmasq, dnsperf, gcab, java-1.8.0-openjdk, libxml2, mongodb, poco, rubygem-rack-protection, transmission, unbound, and wireshark), Red Hat (collectd, erlang, and openstack-nova), SUSE (bind), and Ubuntu (clamav and webkit2gtk).
Security updates have been issued by CentOS (bind), Debian (openocd), Mageia (unbound), Oracle (bind and microcode_ctl), Red Hat (bind, java-1.6.0-sun, libvirt, and qemu-kvm), Scientific Linux (bind), SUSE (kernel and perl-XML-LibXML), and Ubuntu (gimp, intel-microcode, mysql-5.5, mysql-5.7, and openssh).
Security updates have been issued by Debian (bind9, couchdb, lucene-solr, mysql-5.5, openocd, and php5), Mageia (gdk-pixbuf2.0, golang, and mariadb), openSUSE (curl, gd, ImageMagick, lxterminal, ncurses, newsbeuter, perl-XML-LibXML, and xmltooling), Oracle (kernel), and SUSE (xmltooling).
Security updates have been issued by Arch Linux (bind, irssi, nrpe, perl-xml-libxml, and transmission-cli), CentOS (java-1.8.0-openjdk), Debian (awstats, libgd2, mysql-5.5, rsync, smarty3, and transmission), Fedora (keycloak-httpd-client-install and rootsh), and Red Hat (java-1.7.0-oracle and java-1.8.0-oracle).
As might be guessed, a fair number of these updates are for the kernel and microcode changes to mitigate Meltdown and Spectre. More undoubtedly coming over the next weeks.
Security updates have been issued by CentOS (kernel, linux-firmware, and microcode_ctl), Debian (imagemagick), Fedora (kernel, libvirt, and python33), Mageia (curl, gdm, gnome-shell, libexif, libxml2, libxml2, perl-XML-LibXML, perl, swftools, and systemd), openSUSE (kernel-firmware), Oracle (kernel), Red Hat (kernel, kernel-rt, linux-firmware, and microcode_ctl), Scientific Linux (kernel, linux-firmware, and microcode_ctl), SUSE (ImageMagick, java-1_7_0-openjdk, kernel, kernel-firmware, microcode_ctl, qemu, and ucode-intel), and Ubuntu (apport, dnsmasq, and webkit2gtk).