Tag Archives: auditing

Yuki Chan – Automated Penetration Testing Tool

Post Syndicated from Darknet original https://www.darknet.org.uk/2017/10/yuki-chan-automated-penetration-testing-tool/?utm_source=rss&utm_medium=social&utm_campaign=darknetfeed

Yuki Chan – Automated Penetration Testing Tool

Yuki Chan is an Automated Penetration Testing Tool that carries out a whole range of standard security auditing tasks automatically. It’s highly recommended to use this tool within Kali Linux OS as it already contains all the dependencies.

This tool is only designed for Linux OS so if you are not using Linux OS it won’t be much use, but if you have Android Smartphone or Tablet you can run this tool via Termux or GNURoot Debian.

Read the rest of Yuki Chan – Automated Penetration Testing Tool now! Only available at Darknet.

AWS Hot Startups – September 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-september-2017/

As consumers continue to demand faster, simpler, and more on-the-go services, FinTech companies are responding with ever more innovative solutions to fit everyone’s needs and to improve customer experience. This month, we are excited to feature the following startups—all of whom are disrupting traditional financial services in unique ways:

  • Acorns – allowing customers to invest spare change automatically.
  • Bondlinc – improving the bond trading experience for clients, financial institutions, and private banks.
  • Lenda – reimagining homeownership with a secure and streamlined online service.

Acorns (Irvine, CA)

Driven by the belief that anyone can grow wealth, Acorns is relentlessly pursuing ways to help make that happen. Currently the fastest-growing micro-investing app in the U.S., Acorns takes mere minutes to get started and is currently helping over 2.2 million people grow their wealth. And unlike other FinTech apps, Acorns is focused on helping America’s middle class – namely the 182 million citizens who make less than $100,000 per year – and looking after their financial best interests.

Acorns is able to help their customers effortlessly invest their money, little by little, by offering ETF portfolios put together by Dr. Harry Markowitz, a Nobel Laureate in economic sciences. They also offer a range of services, including “Round-Ups,” whereby customers can automatically invest spare change from every day purchases, and “Recurring Investments,” through which customers can set up automatic transfers of just $5 per week into their portfolio. Additionally, Found Money, Acorns’ earning platform, can help anyone spend smarter as the company connects customers to brands like Lyft, Airbnb, and Skillshare, who then automatically invest in customers’ Acorns account.

The Acorns platform runs entirely on AWS, allowing them to deliver a secure and scalable cloud-based experience. By utilizing AWS, Acorns is able to offer an exceptional customer experience and fulfill its core mission. Acorns uses Terraform to manage services such as Amazon EC2 Container Service, Amazon CloudFront, and Amazon S3. They also use Amazon RDS and Amazon Redshift for data storage, and Amazon Glacier to manage document retention.

Acorns is hiring! Be sure to check out their careers page if you are interested.

Bondlinc (Singapore)

Eng Keong, Founder and CEO of Bondlinc, has long wanted to standardize, improve, and automate the traditional workflows that revolve around bond trading. As a former trader at BNP Paribas and Jefferies & Company, E.K. – as Keong is known – had personally seen how manual processes led to information bottlenecks in over-the-counter practices. This drove him, along with future Bondlinc CTO Vincent Caldeira, to start a new service that maximizes efficiency, information distribution, and accessibility for both clients and bankers in the bond market.

Currently, bond trading requires banks to spend a significant amount of resources retrieving data from expensive and restricted institutional sources, performing suitability checks, and attaching required documentation before presenting all relevant information to clients – usually by email. Bankers are often overwhelmed by these time-consuming tasks, which means clients don’t always get proper access to time-sensitive bond information and pricing. Bondlinc bridges this gap between banks and clients by providing a variety of solutions, including easy access to basic bond information and analytics, updates of new issues and relevant news, consolidated management of your portfolio, and a chat function between banker and client. By making the bond market much more accessible to clients, Bondlinc is taking private banking to the next level, while improving efficiency of the banks as well.

As a startup running on AWS since inception, Bondlinc has built and operated its SaaS product by leveraging Amazon EC2, Amazon S3, Elastic Load Balancing, and Amazon RDS across multiple Availability Zones to provide its customers (namely, financial institutions) a highly available and seamlessly scalable product distribution platform. Bondlinc also makes extensive use of Amazon CloudWatch, AWS CloudTrail, and Amazon SNS to meet the stringent operational monitoring, auditing, compliance, and governance requirements of its customers. Bondlinc is currently experimenting with Amazon Lex to build a conversational interface into its mobile application via a chat-bot that provides trading assistance services.

To see how Bondlinc works, request a demo at Bondlinc.com.

Lenda (San Francisco, CA)

Lenda is a digital mortgage company founded by seasoned FinTech entrepreneur Jason van den Brand. Jason wanted to create a smarter, simpler, and more streamlined system for people to either get a mortgage or refinance their homes. With Lenda, customers can find out if they are pre-approved for loans, and receive accurate, real-time mortgage rate quotes from industry-experienced home loan advisors. Lenda’s advisors support customers through the loan process by providing financial advice and guidance for a seamless experience.

Lenda’s innovative platform allows borrowers to complete their home loans online from start to finish. Through a savvy combination of being a direct lender with proprietary technology, Lenda has simplified the mortgage application process to save customers time and money. With an interactive dashboard, customers know exactly where they are in the mortgage process and can manage all of their documents in one place. The company recently received its Series A funding of $5.25 million, and van den Brand shared that most of the capital investment will be used to improve Lenda’s technology and fulfill the company’s mission, which is to reimagine homeownership, starting with home loans.

AWS allows Lenda to scale its business while providing a secure, easy-to-use system for a faster home loan approval process. Currently, Lenda uses Amazon S3, Amazon EC2, Amazon CloudFront, Amazon Redshift, and Amazon WorkSpaces.

Visit Lenda.com to find out more.

Thanks for reading and see you in October for another round of hot startups!

-Tina

Now Use AWS IAM to Delete a Service-Linked Role When You No Longer Require an AWS Service to Perform Actions on Your Behalf

Post Syndicated from Ujjwal Pugalia original https://aws.amazon.com/blogs/security/now-use-aws-iam-to-delete-a-service-linked-role-when-you-no-longer-require-an-aws-service-to-perform-actions-on-your-behalf/

Earlier this year, AWS Identity and Access Management (IAM) introduced service-linked roles, which provide you an easy and secure way to delegate permissions to AWS services. Each service-linked role delegates permissions to an AWS service, which is called its linked service. Service-linked roles help with monitoring and auditing requirements by providing a transparent way to understand all actions performed on your behalf because AWS CloudTrail logs all actions performed by the linked service using service-linked roles. For information about which services support service-linked roles, see AWS Services That Work with IAM. Over time, more AWS services will support service-linked roles.

Today, IAM added support for the deletion of service-linked roles through the IAM console and the IAM API/CLI. This means you now can revoke permissions from the linked service to create and manage AWS resources in your account. When you delete a service-linked role, the linked service no longer has the permissions to perform actions on your behalf. To ensure your AWS services continue to function as expected when you delete a service-linked role, IAM validates that you no longer have resources that require the service-linked role to function properly. This prevents you from inadvertently revoking permissions required by an AWS service to manage your existing AWS resources and helps you maintain your resources in a consistent state. If there are any resources in your account that require the service-linked role, you will receive an error when you attempt to delete the service-linked role, and the service-linked role will remain in your account. If you do not have any resources that require the service-linked role, you can delete the service-linked role and IAM will remove the service-linked role from your account.

In this blog post, I show how to delete a service-linked role by using the IAM console. To learn more about how to delete service-linked roles by using the IAM API/CLI, see the DeleteServiceLinkedRole API documentation.

Note: The IAM console does not currently support service-linked role deletion for Amazon Lex, but you can delete your service-linked role by using the Amazon Lex console. To learn more, see Service Permissions.

How to delete a service-linked role by using the IAM console

If you no longer need to use an AWS service that uses a service-linked role, you can remove permissions from that service by deleting the service-linked role through the IAM console. To delete a service-linked role, you must have permissions for the iam:DeleteServiceLinkedRole action. For example, the following IAM policy grants the permission to delete service-linked roles used by Amazon Redshift. To learn more about working with IAM policies, see Working with Policies.

{ 
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowDeletionOfServiceLinkedRolesForRedshift",
            "Effect": "Allow",
            "Action": ["iam:DeleteServiceLinkedRole"],
            "Resource": ["arn:aws:iam::*:role/aws-service-role/redshift.amazonaws.com/AWSServiceRoleForRedshift*"]
	 }
    ]
}

To delete a service-linked role by using the IAM console:

  1. Navigate to the IAM console and choose Roles from the navigation pane.

Screenshot of the Roles page in the IAM console

  1. Choose the service-linked role you want to delete and then choose Delete role. In this example, I choose the  AWSServiceRoleForRedshift service-linked role.

Screenshot of the AWSServiceRoleForRedshift service-linked role

  1. A dialog box asks you to confirm that you want to delete the service-linked role you have chosen. In the Last activity column, you can see when the AWS service last used the service-linked role, which tells you when the linked service last used the service-linked role to perform an action on your behalf. If you want to continue to delete the service-linked role, choose Yes, delete to delete the service-linked role.

Screenshot of the "Delete role" window

  1. IAM then checks whether you have any resources that require the service-linked role you are trying to delete. While IAM checks, you will see the status message, Deletion in progress, below the role name. Screenshot showing "Deletion in progress"
  1. If no resources require the service-linked role, IAM deletes the role from your account and displays a success message on the console.

Screenshot of the success message

  1. If there are AWS resources that require the service-linked role you are trying to delete, you will see the status message, Deletion failed, below the role name.

Screenshot showing the "Deletion failed"

  1. If you choose View details, you will see a message that explains the deletion failed because there are resources that use the service-linked role.
    Screenshot showing details about why the role deletion failed
  2. Choose View Resources to view the Amazon Resource Names (ARNs) of the first five resources that require the service-linked role. You can delete the service-linked role only after you delete all resources that require the service-linked role. In this example, only one resource requires the service-linked role.

Conclusion

Service-linked roles make it easier for you to delegate permissions to AWS services to create and manage AWS resources on your behalf and to understand all actions the service will perform on your behalf. If you no longer need to use an AWS service that uses a service-linked role, you can remove permissions from that service by deleting the service-linked role through the IAM console. However, before you delete a service-linked role, you must delete all the resources associated with that role to ensure that your resources remain in a consistent state.

If you have any questions, submit a comment in the “Comments” section below. If you need help working with service-linked roles, start a new thread on the IAM forum or contact AWS Support.

– Ujjwal

Prime Day 2017 – Powered by AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/prime-day-2017-powered-by-aws/

The third annual Prime Day set another round of records for global orders, topping Black Friday and Cyber Monday, making it the biggest day in Amazon retail history. Over the course of the 30 hour event, tens of millions of Prime members purchased things like Echo Dots, Fire tablets, programmable pressure cookers, espresso machines, rechargeable batteries, and much more! July 11th also set a record for the number of new Prime memberships, as people signed up in order to take advantage of hundreds of thousands of deals. Amazon customers shopped online and made heavy use of the Amazon App, with mobile orders more than doubling from last Prime Day.

Powered by AWS
Last year I told you about How AWS Powered Amazon’s Biggest Day Ever, and shared what the team had learned with regard to preparation, automation, monitoring, and thinking big. All of those lessons still apply and you can read that post to learn more. Preparation for this year’s Prime Day (which started just days after Prime Day 2016 wrapped up) started by collecting and sharing best practices and identifying areas for improvement, proceeding to implementation and stress testing as the big day approached. Two of the best practices involve auditing and GameDay:

Auditing – This is a formal way for us to track preparations, identify risks, and to track progress against our objectives. Each team must respond to a series of detailed technical and operational questions that are designed to help them determine their readiness. On the technical side, questions could revolve around time to recovery after a database failure, including the all-important check of the TTL (time to live) for the CNAME. Operational questions address schedules for on-call personnel, points of contact, and ownership of services & instances.

GameDay – This practice (which I believe originated with former Amazonian Jesse Robbins), is intended to validate all of the capacity planning & preparation and to verify that all of the necessary operational practices are in place and work as expected. It introduces simulated failures and helps to train the team to identify and quickly resolve issues, building muscle memory in the process. It also tests failover and recovery capabilities, and can expose latent defects that are lurking under the covers. GameDays help teams to understand scaling drivers (page views, orders, and so forth) and gives them an opportunity to test their scaling practices. To learn more, read Resilience Engineering: Learning to Embrace Failure or watch the video: GameDay: Creating Resiliency Through Destruction.

Prime Day 2017 Metrics
So, how did we do this year?

The AWS teams checked their dashboards and log files, and were happy to share their metrics with me. Here are a few of the most interesting ones:

Block Storage – Use of Amazon Elastic Block Store (EBS) grew by 40% year-over-year, with aggregate data transfer jumping to 52 petabytes (a 50% increase) for the day and total I/O requests rising to 835 million (a 30% increase). The team told me that they loved the elasticity of EBS, and that they were able to ramp down on capacity after Prime Day concluded instead of being stuck with it.

NoSQL Database – Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB let them meet needs of Prime Day without breaking a sweat.

Stack Creation – Nearly 31,000 AWS CloudFormation stacks were created for Prime Day in order to bring additional AWS resources on line.

API Usage – AWS CloudTrail processed over 50 billion events and tracked more than 419 billion calls to various AWS APIs, all in support of Prime Day.

Configuration TrackingAWS Config generated over 14 million Configuration items for AWS resources.

You Can Do It
Running an event that is as large, complex, and mission-critical as Prime Day takes a lot of planning. If you have an event of this type in mind, please take a look at our new Infrastructure Event Readiness white paper. Inside, you will learn how to design and provision your applications to smoothly handle planned scaling events such as product launches or seasonal traffic spikes, with sections on automation, resiliency, cost optimization, event management, and more.

Jeff;

 

Implement Continuous Integration and Delivery of Apache Spark Applications using AWS

Post Syndicated from Luis Caro Perez original https://aws.amazon.com/blogs/big-data/implement-continuous-integration-and-delivery-of-apache-spark-applications-using-aws/

When you develop Apache Spark–based applications, you might face some additional challenges when dealing with continuous integration and deployment pipelines, such as the following common issues:

  • Applications must be tested on real clusters using automation tools (live test)
  • Any user or developer must be able to easily deploy and use different versions of both the application and infrastructure to be able to debug, experiment on, and test different functionality.
  • Infrastructure needs to be evaluated and tested along with the application that uses it.

In this post, we walk you through a solution that implements a continuous integration and deployment pipeline supported by AWS services. The pipeline offers the following workflow:

  • Deploy the application to a QA stage after a commit is performed to the source code.
  • Perform a unit test using Spark local mode.
  • Deploy to a dynamically provisioned Amazon EMR cluster and test the Spark application on it
  • Update the application as an AWS Service Catalog product version, allowing a user to deploy any version (commit) of the application on demand.

Solution overview

The following diagram shows the pipeline workflow.

The solution uses AWS CodePipeline, which allows users to orchestrate and automate the build, test, and deploy stages for application source code. The solution consists of a pipeline that contains the following stages:

  • Source: Both the Spark application source code in addition to the AWS CloudFormation template file for deploying the application are committed to version control. In this example, we use AWS CodeCommit. For an example of the application source code, see zip. 
  • Build: In this stage, you use Apache Maven both to generate the application .jar binaries and to execute all of the application unit tests that end with *Spec.scala. In this example, we use AWS CodeBuild, which runs the unit tests given that they are designed to use Spark local mode.
  • QADeploy: In this stage, the .jar file built previously is deployed using the CloudFormation template included with the application source code. All the resources are created in this stage, such as networks, EMR clusters, and so on. 
  • LiveTest: In this stage, you use Apache Maven to execute all the application tests that end with *SpecLive.scala. The tests submit EMR steps to the cluster created as part of the QADeploy step. The tests verify that the steps ran successfully and their results. 
  • LiveTestApproval: This stage is included in case a pipeline administrator approval is required to deploy the application to the next stages. The pipeline pauses in this stage until an administrator manually approves the release. 
  • QACleanup: In this stage, you use an AWS Lambda function to delete the CloudFormation template deployed as part of the QADeploy stage. The function does not affect any resources other than those deployed as part of the QADeploy stage. 
  • DeployProduct: In this stage, you use a Lambda function that creates or updates an AWS Service Catalog product and portfolio. Every time the pipeline releases a change to the application, the AWS Service Catalog product gets a new version, with the commit of the change as the version description. 

Try it out!

Use the provided sample template to get started using this solution. This template creates the pipeline described earlier with all of its stages. It performs an initial commit of the sample Spark application in order to trigger the first release change. To deploy the template, use the following AWS CLI command:

aws cloudformation create-stack  --template-url https://s3.amazonaws.com/aws-bigdata-blog/artifacts/sparkAppDemoForPipeline/emrSparkpipeline.yaml --stack-name emr-spark-pipeline --capabilities CAPABILITY_NAMED_IAM

After the template finishes creating resources, you see the pipeline name on the stack Outputs tab. After that, open the AWS CodePipeline console and select the newly created pipeline.

After a couple of minutes, AWS CodePipeline detects the initial commit applied by the CloudFormation stack and starts the first release.

You can watch how the pipeline goes through the Build, QADeploy, and LiveTest stages until it finally reaches the LiveTestApproval stage.

At this point, you can check the results of the test in the log files of the Build and LiveTest stage jobs on AWS CodeBuild. If you check the CloudFormation console, you see that a new template has been deployed as part of the QADeploy stage.

You can also visit the EMR console and view how the LiveTest stage submitted steps to the EMR cluster.

After performing the review, manually approve the revision on the LiveTestApproval stage by using the AWS CodePipeline console.

After the revision is approved, the pipeline proceeds to use a Lambda function that destroys the resources deployed on the QAdeploy stage. Finally, it creates or updates a product and portfolio in AWS Service Catalog. After the final stage of the pipeline is complete, you can check that the product is created successfully on the AWS Service Catalog console.

You can check the product versions and notice that the first version is the initial commit performed by the CloudFormation template.

You can proceed to share the created portfolio with any users in your AWS account and allow them to deploy any version of the Spark application. You can also perform a commit on the AWS CodeCommit repository. The pipeline is triggered automatically and repeats the pipeline execution to deploy a new version of the product.

To destroy all of the resources created by the stack, make sure all the deployed stacks using AWS Service Catalog or the QAdeploy stage are destroyed. Then, destroy the pipeline template using the following AWS CLI command:

 

aws cloudformation delete-stack --stack-name emr-spark-pipeline

Conclusion

You can use the sample template and Spark application shared in this post and adapt them for the specific needs of your own application. The pipeline can have as many stages as needed and it can be used to automatically deploy to AWS Service Catalog or a production environment using CloudFormation.

If you have questions or suggestions, please comment below.


Additional Reading

Learn how to implement authorization and auditing on Amazon EMR using Apache Ranger.

 


About the Authors

Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.

 

 

Samuel Schmidt is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improving the value of their solutions when using AWS.

 

 

 

AWS Summit New York – Summary of Announcements

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-summit-new-york-summary-of-announcements/

Whew – what a week! Tara, Randall, Ana, and I have been working around the clock to create blog posts for the announcements that we made at the AWS Summit in New York. Here’s a summary to help you to get started:

Amazon Macie – This new service helps you to discover, classify, and secure content at scale. Powered by machine learning and making use of Natural Language Processing (NLP), Macie looks for patterns and alerts you to suspicious behavior, and can help you with governance, compliance, and auditing. You can read Tara’s post to see how to put Macie to work; you select the buckets of interest, customize the classification settings, and review the results in the Macie Dashboard.

AWS GlueRandall’s post (with deluxe animated GIFs) introduces you to this new extract, transform, and load (ETL) service. Glue is serverless and fully managed, As you can see from the post, Glue crawls your data, infers schemas, and generates ETL scripts in Python. You define jobs that move data from place to place, with a wide selection of transforms, each expressed as code and stored in human-readable form. Glue uses Development Endpoints and notebooks to provide you with a testing environment for the scripts you build. We also announced that Amazon Athena now integrates with Amazon Glue, as does Apache Spark and Hive on Amazon EMR.

AWS Migration Hub – This new service will help you to migrate your application portfolio to AWS. My post outlines the major steps and shows you how the Migration Hub accelerates, tracks,and simplifies your migration effort. You can begin with a discovery step, or you can jump right in and migrate directly. Migration Hub integrates with tools from our migration partners and builds upon the Server Migration Service and the Database Migration Service.

CloudHSM Update – We made a major upgrade to AWS CloudHSM, making the benefits of hardware-based key management available to a wider audience. The service is offered on a pay-as-you-go basis, and is fully managed. It is open and standards compliant, with support for multiple APIs, programming languages, and cryptography extensions. CloudHSM is an integral part of AWS and can be accessed from the AWS Management Console, AWS Command Line Interface (CLI), and through API calls. Read my post to learn more and to see how to set up a CloudHSM cluster.

Managed Rules to Secure S3 Buckets – We added two new rules to AWS Config that will help you to secure your S3 buckets. The s3-bucket-public-write-prohibited rule identifies buckets that have public write access and the s3-bucket-public-read-prohibited rule identifies buckets that have global read access. As I noted in my post, you can run these rules in response to configuration changes or on a schedule. The rules make use of some leading-edge constraint solving techniques, as part of a larger effort to use automated formal reasoning about AWS.

CloudTrail for All Customers – Tara’s post revealed that AWS CloudTrail is now available and enabled by default for all AWS customers. As a bonus, Tara reviewed the principal benefits of CloudTrail and showed you how to review your event history and to deep-dive on a single event. She also showed you how to create a second trail, for use with CloudWatch CloudWatch Events.

Encryption of Data at Rest for EFS – When you create a new file system, you now have the option to select a key that will be used to encrypt the contents of the files on the file system. The encryption is done using an industry-standard AES-256 algorithm. My post shows you how to select a key and to verify that it is being used.

Watch the Keynote
My colleagues Adrian Cockcroft and Matt Wood talked about these services and others on the stage, and also invited some AWS customers to share their stories. Here’s the video:

Jeff;

 

Announcing the New Customer Compliance Center

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/announcing-the-new-customer-compliance-center/

AWS has the longest running, most effective, and most customer-obsessed compliance program in the cloud market. We have always centered our program around customers, obtaining the certifications needed to provide our customers with the proper level of validated transparency in order to enable them to certify their own AWS workloads [download .pdf of AWS certifications]. We also offer a rich suite of embedded compliance tooling, enabling customers and partners to more effectively manage security controls and in turn provide evidence of effective control operation to their auditors. Along with our customers and partners, we have the largest, most diverse, and most comprehensive compliance footprint in the industry.

Enabling customers is a core part of the AWS DNA. Today, in the spirit of that pedigree, I’m happy to announce we’ve launched a new AWS Customer Compliance Center. This center is focused on the security and compliance of our customers on AWS. You can learn from other customer experiences and discover how your peers have solved the compliance, governance, and audit challenges present in today’s regulatory environment. You can also access our industry-first cloud Auditor Learning Path via the customer center. These online university learning resources are logical learning paths, specifically designed for security, compliance and audit professionals, allowing you to build on the IT skills you have to move your environment to the next generation of audit and security assurance. As we engage with our security and compliance customer colleagues on this topic, we will continue to update and improve upon the existing resource and publish new enablers in the coming months.

We are excited to continue to work with our customers on moving from the old-guard manual audit world to the new cloud-enabled, automated, “secure and compliant by default” model we’ve been leading over the past few years.

– Chad Woolf, AWS Security & Compliance

Ghost Phisher – Phishing Attack Tool With GUI

Post Syndicated from Darknet original http://feedproxy.google.com/~r/darknethackers/~3/mogKZIEOkns/

Ghost Phisher is a Wireless and Ethernet security auditing and phishing attack tool written using the Python Programming Language and the Python Qt GUI library, the program is able to emulate access points and deploy. The tool comes with a fake DNS server, fake DHCP server, fake HTTP server and also has an integrated area […]

The post Ghost…

Read the full post at darknet.org.uk

New Information in the AWS IAM Console Helps You Follow IAM Best Practices

Post Syndicated from Rob Moncur original https://aws.amazon.com/blogs/security/newly-updated-features-in-the-aws-iam-console-help-you-adhere-to-iam-best-practices/

Today, we added new information to the Users section of the AWS Identity and Access Management (IAM) console to make it easier for you to follow IAM best practices. With this new information, you can more easily monitor users’ activity in your AWS account and identify access keys and passwords that you should rotate regularly. You can also better audit users’ MFA device usage and keep track of their group memberships. In this post, I show how you can use this new information to help you follow IAM best practices.

Monitor activity in your AWS account

The IAM best practice, monitor activity in your AWS account, encourages you to monitor user activity in your AWS account by using services such as AWS CloudTrail and AWS Config. In addition to monitoring usage in your AWS account, you should be aware of inactive users so that you can remove them from your account. By only retaining necessary users, you can help maintain the security of your AWS account.

To help you find users that are inactive, we added three new columns to the IAM user table: Last activity, Console last sign-in, and Access key last used.
Screenshot showing three new columns in the IAM user table

  1. Last activity – This column tells you how long it has been since the user has either signed in to the AWS Management Console or accessed AWS programmatically with their access keys. Use this column to find users who might be inactive, and consider removing them from your AWS account.
  2. Console last sign-in – This column displays the time since the user’s most recent console sign-in. Consider removing passwords from users who are not signing in to the console.
  3. Access key last used – This column displays the time since a user last used access keys. Use this column to find any access keys that are not being used, and deactivate or remove them.

Rotate credentials regularly

The IAM best practice, rotate credentials regularly, recommends that all users in your AWS account change passwords and access keys regularly. With this practice, if a password or access key is compromised without your knowledge, you can limit how long the credentials can be used to access your resources. To help your management efforts, we added three new columns to the IAM user table: Access key age, Password age, and Access key ID.

Screenshot showing three new columns in the IAM user table

  1. Access key age – This column shows how many days it has been since the oldest active access key was created for a user. With this information, you can audit access keys easily across all your users and identify the access keys that may need to be rotated.

Based on the number of days since the access key has been rotated, a green, yellow, or red icon is displayed. To see the corresponding time frame for each icon, pause your mouse pointer on the Access key age column heading to see the tooltip, as shown in the following screenshot.

Icons showing days since the oldest active access key was created

  1. Password age – This column shows the number of days since a user last changed their password. With this information, you can audit password rotation and identify users who have not changed their password recently. The easiest way to make sure that your users are rotating their password often is to establish an account password policy that requires users to change their password after a specified time period.
  2. Access key ID – This column displays the access key IDs for users and the current status (Active/Inactive) of those access key IDs. This column makes it easier for you to locate and see the state of access keys for each user, which is useful for auditing. To find a specific access key ID, use the search box above the table.

Enable MFA for privileged users

Another IAM best practice is to enable multi-factor authentication (MFA) for privileged IAM users. With MFA, users have a device that generates a unique authentication code (a one-time password [OTP]). Users must provide both their normal credentials (such as their user name and password) and the OTP when signing in.

To help you see if MFA has been enabled for your users, we’ve improved the MFA column to show you if MFA is enabled and which type of MFA (hardware, virtual, or SMS) is enabled for each user, where applicable.

Screenshot showing the improved "MFA" column

Use groups to assign permissions to IAM users

Instead of defining permissions for individual IAM users, it’s usually more convenient to create groups that relate to job functions (such as administrators, developers, and accountants), define the relevant permissions for each group, and then assign IAM users to those groups. All the users in an IAM group inherit the permissions assigned to the group. This way, if you need to modify permissions, you can make the change once for everyone in a group instead of making the change one time for each user. As people move around in your company, you can change the group membership of the IAM user.

To better understand which groups your users belong to, we’ve made updates:

  1. Groups – This column now lists the groups of which a user is a member. This information makes it easier to understand and compare multiple users’ permissions at once.
  2. Group count – This column shows the number of groups to which each user belongs.Screenshot showing the updated "Groups" and "Group count" columns

Customize your view

Choosing which columns you see in the User table is easy to do. When you click the button with the gear icon in the upper right corner of the table, you can choose the columns you want to see, as shown in the following screenshots.

Screenshot showing gear icon  Screenshot of "Manage columns" dialog box

Conclusion

We made these improvements to the Users section of the IAM console to make it easier for you to follow IAM best practices in your AWS account. Following these best practices can help you improve the security of your AWS resources and make your account easier to manage.

If you have comments about this post, submit them in the “Comments” section below. If you have questions or suggestions, please start a new thread on the IAM forum.

– Rob

A Man-in-the-Middle Attack against a Password Reset System

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/07/a_man-in-the-mi.html

This is nice work: “The Password Reset MitM Attack,” by Nethanel Gelerntor, Senia Kalma, Bar Magnezi, and Hen Porcilan:

Abstract: We present the password reset MitM (PRMitM) attack and show how it can be used to take over user accounts. The PRMitM attack exploits the similarity of the registration and password reset processes to launch a man in the middle (MitM) attack at the application level. The attacker initiates a password reset process with a website and forwards every challenge to the victim who either wishes to register in the attacking site or to access a particular resource on it.

The attack has several variants, including exploitation of a password reset process that relies on the victim’s mobile phone, using either SMS or phone call. We evaluated the PRMitM attacks on Google and Facebook users in several experiments, and found that their password reset process is vulnerable to the PRMitM attack. Other websites and some popular mobile applications are vulnerable as well.

Although solutions seem trivial in some cases, our experiments show that the straightforward solutions are not as effective as expected. We designed and evaluated two secure password reset processes and evaluated them on users of Google and Facebook. Our results indicate a significant improvement in the security. Since millions of accounts are currently vulnerable to the PRMitM attack, we also present a list of recommendations for implementing and auditing the password reset process.

Password resets have long been a weak security link.

BoingBoing Post.

Analysis of Top-N DynamoDB Objects using Amazon Athena and Amazon QuickSight

Post Syndicated from Rendy Oka original https://aws.amazon.com/blogs/big-data/analysis-of-top-n-dynamodb-objects-using-amazon-athena-and-amazon-quicksight/

If you run an operation that continuously generates a large amount of data, you may want to know what kind of data is being inserted by your application. The ability to analyze data intake quickly can be very valuable for business units, such as operations and marketing. For many operations, it’s important to see what is driving the business at any particular moment. For retail companies, for example, understanding which products are currently popular can aid in planning for future growth. Similarly, for PR companies, understanding the impact of an advertising campaign can help them market their products more effectively.

This post covers an architecture that helps you analyze your streaming data. You’ll build a solution using Amazon DynamoDB Streams, AWS Lambda, Amazon Kinesis Firehose, and Amazon Athena to analyze data intake at a frequency that you choose. And because this is a serverless architecture, you can use all of the services here without the need to provision or manage servers.

The data source

You’ll collect a random sampling of tweets via Twitter’s API and store a variety of attributes in your DynamoDB table, such as: Twitter handle, tweet ID, hashtags, location, and Time-To-Live (TTL) value.

In DynamoDB, the primary key is used as an input to an internal hash function. The output from this function determines the partition in which the data will be stored. When using a combination of primary key and sort key as a DynamoDB schema, you need to make sure that no single partition key contains many more objects than the other partition keys because this can cause partition level throttling. For the demonstration in this blog, the Twitter handle will be the primary key and the tweet ID will be the sort key. This allows you to group and sort tweets from each user.

To help you get started, I have written a script that pulls a live Twitter stream that you can use to generate your data. All you need to do is provide your own Twitter Apps credentials, and it should generate the data immediately. Alternatively, I have also provided a script that you can use to generate random Tweets with little effort.

You can find both scripts in the Github repository:

https://github.com/awslabs/aws-blog-dynamodb-analysis

There are some modules that you may need to install to run these scripts. You can find them in Python’s module repository:

To get your own Twitter credentials, go to https://www.twitter.com/ and sign up for a free account, if you don’t already have one. After your account is set up, go to https://apps.twitter.com/. On the main landing page, choose the Create New App button. After the application is created, go to Keys and Access Tokens to get your credentials to use the Twitter API. You’ll need to generate Customer Tokens/Secret and Access Token/Secret. All four keys will be used to authenticate your request.

Architecture overview

Before we begin, let’s take a look at the overall flow of information will look like, from data ingestion into DynamoDB to visualization of results in Amazon QuickSight.

As illustrated in the architecture diagram above, any changes made to the items in DynamoDB will be captured and processed using DynamoDB Streams. Next, a Lambda function will be invoked by a trigger that is configured to respond to events in DynamoDB Streams. The Lambda function processes the data prior to pushing to Amazon Kinesis Firehose, which will output to Amazon S3. Finally, you use Amazon Athena to analyze the streaming data landing in Amazon S3. The result can be explored and visualized in Amazon QuickSight for your company’s business analytics.

You’ll need to implement your custom Lambda function to help transform the raw <key, value> data stored in DynamoDB to a JSON format for Athena to digest, but I can help you with a sample code that you are free to modify.

Implementation

In the following sections, I’ll walk through how you can set up the architecture discussed earlier.

Create your DynamoDB table

First, let’s create a DynamoDB table and enable DynamoDB Streams. This will enable data to be copied out of this table. From the console, use the user_id as the partition key and tweet_id as the sort key:

After the table is ready, you can enable DynamoDB Streams. This process operates asynchronously, so there is no performance impact on the table when you enable this feature. The easiest way to manage DynamoDB Streams is also through the DynamoDB console.

In the Overview tab of your newly created table, click Manage Stream. In the window, choose the information that will be written to the stream whenever data in the table is added or modified. In this example, you can choose either New image or New and old images.

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html

Configure Kinesis Firehose

Before creating the Lambda function, you need to configure Kinesis Firehose delivery stream so that it’s ready to accept data from Lambda. Open the Firehose console and choose Create Firehose Delivery Stream. From here, choose S3 as the destination and use the following to information to configure the resource. Note the Delivery stream name because you will use it in the next step.

For more details on this process, check out our documentation:

http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html#console-to-s3

Create your Lambda function

Now that Kinesis Firehose is ready to accept data, you can create your Lambda function.

From the AWS Lambda console, choose the Create a Lambda function button and use the Blank Function. Enter a name and description, and choose Python 2.7 as the Runtime. Note your Lambda function name because you’ll need it in the next step.

In the Lambda function code field, you can paste the script that I have written for this purpose. All this function needs is the name of your Firehose stream name set as an environment variable.

import boto3
import json
import os

# Initiate Firehose client
firehose_client = boto3.client('firehose')

def lambda_handler(event, context):
    records = []
    batch   = []
    try :
        for record in event['Records']:
            tweet = {}
            t_stats = '{ "table_name":"%s", "user_id":"%s", "tweet_id":"%s", "approx_post_time":"%d" }\n' \
                      % ( record['eventSourceARN'].split('/')[1], \
                          record['dynamodb']['Keys']['user_id']['S'], \
                          record['dynamodb']['Keys']['tweet_id']['N'], \
                          int(record['dynamodb']['ApproximateCreationDateTime']) )
            tweet["Data"] = t_stats
            records.append(tweet)
        batch.append(records)
        res = firehose_client.put_record_batch(
            DeliveryStreamName = os.environ['firehose_stream_name'],
            Records = batch[0]
        )
        return 'Successfully processed {} records.'.format(len(event['Records']))
    except Exception :
        pass

The handler should be set to lambda_function.lambda_handler and you can use the existing lambda_dynamodb_streams role that’s been created by default.

Enable DynamoDB trigger and start collecting data

Everything is ready to go. Open your table using the DynamoDB console and go to the Triggers tab. Select the Create trigger drop down list and choose Existing Lambda function. In the pop-up window, select the function that you just created, and choose the Create button.

At this point, you can start collecting data with the Python script that I’ve provided. The first one will create a script that will pull public Twitter data and the other will generate fake tweets using Lorem Ipsum text.

Configure Amazon Athena to read the data

Next, you will configure Amazon Athena so that it can read the data Kinesis Firehose outputs to Amazon S3 and allow you to analyze the data as needed. You can connect to Athena directly from the Athena console, and you can establish a connection using JDBC or the Athena API. In this example, I’m going to demonstrate what this looks like on the Athena console.

First, create a new database and a new table. You can do this by running the following two queries. The first query creates a new database:

CREATE DATABASE IF NOT EXISTS ddbtablestats

And the second query creates a new table:

CREATE EXTERNAL TABLE IF NOT EXISTS ddbtablestats.twitterfeed (
    `table_name` string,
    `user_id` string,
    `tweet_id` bigint,
    `approx_post_time` timestamp 
) PARTITIONED BY (
    year string,
    month string,
    day string,
    hour string 
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('serialization.format' = '1')
LOCATION 's3://myBucket/dynamodb/streams/transactions/'

Note that this table is created using partitions. Partitioning separates your data into logical parts based on certain criteria, such as date, location, language, etc. This allows Athena to selectively pull your data without needing to process the entire data set. This effectively minimizes the query execution time, and it also allows you to have greater control over the data that you want to query.

After the query has completed, you should be able to see the table in the left side pane of the Athena dashboard.

After the database and table have been created, execute the ALTER TABLE query to populate the partitions in your table. Replace the date with the current date when the script was executed.

ALTER TABLE ddbtablestats.TwitterFeed ADD IF NOT EXISTS
PARTITION (year='2017',month='05',day='17',hour='01') location 's3://myBucket/dynamodb/streams/transactions/2017/05/17/01/'

Using the Athena console, you’ll need to manually populate each partition for each additional partition that you’d like to analyze, however you can programmatically automate this process by using the JDBC driver or any AWS SDK of your choice.

For more information on partitioning in Athena, check out our documentation:

http://docs.aws.amazon.com/athena/latest/ug/partitions.html

Querying the data in Amazon Athena

This is it! Let’s run this query to see the top 10 most active Twitter users in the last 24 hours. You can do this from the Athena console:

SELECT user_id, COUNT(DISTINCT tweet_id) tweets FROM ddbTableStats.TwitterFeed
WHERE year='2017' AND month='05' AND day='17'
GROUP BY user_id
ORDER BY tweets DESC
LIMIT 10

The result should look similar to the following:

Linking Athena to Amazon QuickSight

Finally, to make this data available to a larger audience, let’s visualize this data in Amazon QuickSight. Amazon QuickSight provides native connectivity to AWS data sources such as Amazon Redshift, Amazon RDS, and Amazon Athena. Amazon QuickSight can also connect to on-premises databases, Excel, or CSV files, and it can connect to cloud data sources such as Salesforce.com. For this solution, we will connect Amazon QuickSight to the Athena table we just created.

Amazon QuickSight has a free tier that provides 1 user and 1GB of SPICE (Superfast Parallel In-memory Calculated Engine) capacity free. So you can sign up and use QuickSight free of charge.

When you are signing up for Amazon QuickSight, ensure that you grant permissions for QuickSight to connect to Athena and the S3 bucket where the data is stored.

After you’ve signed up, navigate to the new analysis button, and choose new data set, and then select the Athena data source option. Create a new name for your data source and proceed to the next prompt. At this point, you should see the Athena table you created earlier.

Choose the option to import the data to SPICE for a quicker analysis. SPICE is an in-memory optimized calculation engine that is designed for quick data visualization through parallel processing. SPICE also enables you to refresh your data sets at a regular interval or on-demand as you want.

In the dialog box, confirm this data set creation, and you’ll arrive on the landing page where you can start building your graph. The X-axis will represent the user_id and the Value will be used to represent the SUM total of the tweets from each user.

The Amazon QuickSight report looks like this:

Through this visualization, I can easily see that there are 3 users that tweeted over 20 times that day and that the majority of the users have fewer than 10 tweets that day. I can also set up a scheduled refresh of my SPICE dataset so that I have a dashboard that is regularly updated with the latest data.

Closing thoughts

Here are the benefits that you can gain from using this architecture:

  1. You can optimize the design of your DynamoDB schema that follows AWS best practice recommendations.
  1. You can run analysis and data intelligence in order to understand the current customer demands for your business.
  1. You can store incremental backup for future auditing.

The flexibility of our AWS services invites you to create and design the ideal workflow for your production at any scale, and, as always, if you ever need some guidance, don’t hesitate to reach out to us.I  hope this has been helpful to you! Please leave any questions and comments below.

 


Additional Reading

Learn how to analyze VPC Flow Logs with Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight.


About the Author

Rendy Oka is a Big Data Support Engineer for Amazon Web Services. He provides consultations and architectural designs and partners with the TAMs, Solution Architects, and AWS product teams to help develop solutions for our customers. He is also a team lead for the big data support team in Seattle. Rendy has traveled to dozens of countries around the world and takes every opportunity to experience the local culture wherever he goes

 

 

 

 

Perform More Productive Audits of Your AWS Resources by Using the New AWS Auditor Learning Path

Post Syndicated from Jodi Scrofani original https://aws.amazon.com/blogs/security/perform-more-productive-audits-of-your-aws-resources-by-using-the-new-aws-auditor-learning-path/

Auditing image

AWS customers in highly regulated industries such as financial services and healthcare tend to undergo frequent security audits. To help make these audits more productive, AWS has released the AWS Auditor Learning Path. This set of online and in-person classes provides foundational and advanced education about implementing security in the AWS Cloud and using AWS tools to gather the information necessary to audit an AWS environment. The Learning Path also includes a set of self-paced labs to help you gain hands-on experience for auditing your use of AWS services.

After completing the AWS Auditor Learning Path, you should have an understanding of how your IT department consumes AWS services and be able to more effectively engage with your compliance and security teams. The Learning Path is specifically designed for:

  • Auditing executives
  • Field auditors
  • Specialized internal auditors

To get started today, see the AWS Auditor Learning Path.

– Jodi

Introducing the Self-Service Business Associate Addendum

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/introducing-the-self-service-business-associate-addendum/

HIPAA logo

Today, we made available a new feature in AWS Artifact (our auditing and compliance portal) that enables you to review, accept, and track the status of your Business Associate Addendum (BAA). With this new feature, you can accept the terms of a BAA online, and instantly designate an AWS account as a “HIPAA Account” for use with protected health information (PHI) under the U.S. Health Insurance Portability and Accountability Act (HIPAA). In addition, you can sign in to AWS Artifact to confirm that your account is designated as a HIPAA Account, and review the terms of the BAA for that account. If you are no longer using a designated HIPAA Account in connection with PHI, you can remove that designation using the AWS Artifact interface.

Today’s release addresses two key customer needs in particular: (1) the need to enter into a BAA quickly, and (2) the need to easily track and control whether an AWS account is designated as a HIPAA Account under a BAA.

The BAA is the first specialized industry agreement that AWS is making available online. We chose to launch with the BAA as a commitment to AWS customer organizations who are reinventing the way healthcare is researched and delivered with the cloud. Many AWS customers have great stories to tell as we work together to use technology to advance the healthcare industry.

If you already have a BAA with AWS, or if you are considering designing or migrating a new solution that will create, receive, maintain, or transmit PHI on AWS, you can use AWS Artifact to manage your HIPAA Accounts today. As with all AWS Artifact features, there are no additional fees for using AWS Artifact to review, accept, and manage BAAs online.

– Chad

NSA Document Outlining Russian Attempts to Hack Voter Rolls

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/06/nsa_document_ou.html

This week brought new public evidence about Russian interference in the 2016 election. On Monday, the Intercept published a top-secret National Security Agency document describing Russian hacking attempts against the US election system. While the attacks seem more exploratory than operational ­– and there’s no evidence that they had any actual effect ­– they further illustrate the real threats and vulnerabilities facing our elections, and they point to solutions.

The document describes how the GRU, Russia’s military intelligence agency, attacked a company called VR Systems that, according to its website, provides software to manage voter rolls in eight states. The August 2016 attack was successful, and the attackers used the information they stole from the company’s network to launch targeted attacks against 122 local election officials on October 27, 12 days before the election.

That is where the NSA’s analysis ends. We don’t know whether those 122 targeted attacks were successful, or what their effects were if so. We don’t know whether other election software companies besides VR Systems were targeted, or what the GRU’s overall plan was — if it had one. Certainly, there are ways to disrupt voting by interfering with the voter registration process or voter rolls. But there was no indication on Election Day that people found their names removed from the system, or their address changed, or anything else that would have had an effect — anywhere in the country, let alone in the eight states where VR Systems is deployed. (There were Election Day problems with the voting rolls in Durham, NC ­– one of the states that VR Systems supports ­– but they seem like conventional errors and not malicious action.)

And 12 days before the election (with early voting already well underway in many jurisdictions) seems far too late to start an operation like that. That is why these attacks feel exploratory to me, rather than part of an operational attack. The Russians were seeing how far they could get, and keeping those accesses in their pocket for potential future use.

Presumably, this document was intended for the Justice Department, including the FBI, which would be the proper agency to continue looking into these hacks. We don’t know what happened next, if anything. VR Systems isn’t commenting, and the names of the local election officials targeted did not appear in the NSA document.

So while this document isn’t much of a smoking gun, it’s yet more evidence of widespread Russian attempts to interfere last year.

The document was, allegedly, sent to the Intercept anonymously. An NSA contractor, Reality Leigh Winner, was arrested Saturday and charged with mishandling classified information. The speed with which the government identified her serves as a caution to anyone wanting to leak official US secrets.

The Intercept sent a scan of the document to another source during its reporting. That scan showed a crease in the original document, which implied that someone had printed the document and then carried it out of some secure location. The second source, according to the FBI’s affidavit against Winner, passed it on to the NSA. From there, NSA investigators were able to look at their records and determine that only six people had printed out the document. (The government may also have been able to track the printout through secret dots that identified the printer.) Winner was the only one of those six who had been in e-mail contact with the Intercept. It is unclear whether the e-mail evidence was from Winner’s NSA account or her personal account, but in either case, it’s incredibly sloppy tradecraft.

With President Trump’s election, the issue of Russian interference in last year’s campaign has become highly politicized. Reports like the one from the Office of the Director of National Intelligence in January have been criticized by partisan supporters of the White House. It’s interesting that this document was reported by the Intercept, which has been historically skeptical about claims of Russian interference. (I was quoted in their story, and they showed me a copy of the NSA document before it was published.) The leaker was even praised by WikiLeaks founder Julian Assange, who up until now has been traditionally critical of allegations of Russian election interference.

This demonstrates the power of source documents. It’s easy to discount a Justice Department official or a summary report. A detailed NSA document is much more convincing. Right now, there’s a federal suit to force the ODNI to release the entire January report, not just the unclassified summary. These efforts are vital.

This hack will certainly come up at the Senate hearing where former FBI director James B. Comey is scheduled to testify Thursday. Last year, there were several stories about voter databases being targeted by Russia. Last August, the FBI confirmed that the Russians successfully hacked voter databases in Illinois and Arizona. And a month later, an unnamed Department of Homeland Security official said that the Russians targeted voter databases in 20 states. Again, we don’t know of anything that came of these hacks, but expect Comey to be asked about them. Unfortunately, any details he does know are almost certainly classified, and won’t be revealed in open testimony.

But more important than any of this, we need to better secure our election systems going forward. We have significant vulnerabilities in our voting machines, our voter rolls and registration process, and the vote tabulation systems after the polls close. In January, DHS designated our voting systems as critical national infrastructure, but so far that has been entirely for show. In the United States, we don’t have a single integrated election. We have 50-plus individual elections, each with its own rules and its own regulatory authorities. Federal standards that mandate voter-verified paper ballots and post-election auditing would go a long way to secure our voting system. These attacks demonstrate that we need to secure the voter rolls, as well.

Democratic elections serve two purposes. The first is to elect the winner. But the second is to convince the loser. After the votes are all counted, everyone needs to trust that the election was fair and the results accurate. Attacks against our election system, even if they are ultimately ineffective, undermine that trust and ­– by extension ­– our democracy. Yes, fixing this will be expensive. Yes, it will require federal action in what’s historically been state-run systems. But as a country, we have no other option.

This essay previously appeared in the Washington Post.

Building High-Throughput Genomics Batch Workflows on AWS: Workflow Layer (Part 4 of 4)

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/building-high-throughput-genomics-batch-workflows-on-aws-workflow-layer-part-4-of-4/

Aaron Friedman is a Healthcare and Life Sciences Partner Solutions Architect at AWS

Angel Pizarro is a Scientific Computing Technical Business Development Manager at AWS

This post is the fourth in a series on how to build a genomics workflow on AWS. In Part 1, we introduced a general architecture, shown below, and highlighted the three common layers in a batch workflow:

  • Job
  • Batch
  • Workflow

In Part 2, you built a Docker container for each job that needed to run as part of your workflow, and stored them in Amazon ECR.

In Part 3, you tackled the batch layer and built a scalable, elastic, and easily maintainable batch engine using AWS Batch. This solution took care of dynamically scaling your compute resources in response to the number of runnable jobs in your job queue length as well as managed job placement.

In part 4, you build out the workflow layer of your solution using AWS Step Functions and AWS Lambda. You then run an end-to-end genomic analysis―specifically known as exome secondary analysis―for many times at a cost of less than $1 per exome.

Step Functions makes it easy to coordinate the components of your applications using visual workflows. Building applications from individual components that each perform a single function lets you scale and change your workflow quickly. You can use the graphical console to arrange and visualize the components of your application as a series of steps, which simplify building and running multi-step applications. You can change and add steps without writing code, so you can easily evolve your application and innovate faster.

An added benefit of using Step Functions to define your workflows is that the state machines you create are immutable. While you can delete a state machine, you cannot alter it after it is created. For regulated workloads where auditing is important, you can be assured that state machines you used in production cannot be altered.

In this blog post, you will create a Lambda state machine to orchestrate your batch workflow. For more information on how to create a basic state machine, please see this Step Functions tutorial.

All code related to this blog series can be found in the associated GitHub repository here.

Build a state machine building block

To skip the following steps, we have provided an AWS CloudFormation template that can deploy your Step Functions state machine. You can use this in combination with the setup you did in part 3 to quickly set up the environment in which to run your analysis.

The state machine is composed of smaller state machines that submit a job to AWS Batch, and then poll and check its execution.

The steps in this building block state machine are as follows:

  1. A job is submitted.
    Each analytical module/job has its own Lambda function for submission and calls the batchSubmitJob Lambda function that you built in the previous blog post. You will build these specialized Lambda functions in the following section.
  2. The state machine queries the AWS Batch API for the job status.
    This is also a Lambda function.
  3. The job status is checked to see if the job has completed.
    If the job status equals SUCCESS, proceed to log the final job status. If the job status equals FAILED, end the execution of the state machine. In all other cases, wait 30 seconds and go back to Step 2.

Here is the JSON representing this state machine.

{
  "Comment": "A simple example that submits a Job to AWS Batch",
  "StartAt": "SubmitJob",
  "States": {
    "SubmitJob": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>::function:batchSubmitJob",
      "Next": "GetJobStatus"
    },
    "GetJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "Next": "CheckJobStatus",
      "InputPath": "$",
      "ResultPath": "$.status"
    },
    "CheckJobStatus": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.status",
          "StringEquals": "FAILED",
          "End": true
        },
        {
          "Variable": "$.status",
          "StringEquals": "SUCCEEDED",
          "Next": "GetFinalJobStatus"
        }
      ],
      "Default": "Wait30Seconds"
    },
    "Wait30Seconds": {
      "Type": "Wait",
      "Seconds": 30,
      "Next": "GetJobStatus"
    },
    "GetFinalJobStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:<account-id>:function:batchGetJobStatus",
      "End": true
    }
  }
}

Building the Lambda functions for the state machine

You need two basic Lambda functions for this state machine. The first one submits a job to AWS Batch and the second checks the status of the AWS Batch job that was submitted.

In AWS Step Functions, you specify an input as JSON that is read into your state machine. Each state receives the aggregate of the steps immediately preceding it, and you can specify which components a state passes on to its children. Because you are using Lambda functions to execute tasks, one of the easiest routes to take is to modify the input JSON, represented as a Python dictionary, within the Lambda function and return the entire dictionary back for the next state to consume.

Building the batchSubmitIsaacJob Lambda function

For Step 1 above, you need a Lambda function for each of the steps in your analysis workflow. As you created a generic Lambda function in the previous post to submit a batch job (batchSubmitJob), you can use that function as the basis for the specialized functions you’ll include in this state machine. Here is such a Lambda function for the Isaac aligner.

from __future__ import print_function

import boto3
import json
import traceback

lambda_client = boto3.client('lambda')



def lambda_handler(event, context):
    try:
        # Generate output put
        bam_s3_path = '/'.join([event['resultsS3Path'], event['sampleId'], 'bam/'])

        depends_on = event['dependsOn'] if 'dependsOn' in event else []

        # Generate run command
        command = [
            '--bam_s3_folder_path', bam_s3_path,
            '--fastq1_s3_path', event['fastq1S3Path'],
            '--fastq2_s3_path', event['fastq2S3Path'],
            '--reference_s3_path', event['isaac']['referenceS3Path'],
            '--working_dir', event['workingDir']
        ]

        if 'cmdArgs' in event['isaac']:
            command.extend(['--cmd_args', event['isaac']['cmdArgs']])
        if 'memory' in event['isaac']:
            command.extend(['--memory', event['isaac']['memory']])

        # Submit Payload
        response = lambda_client.invoke(
            FunctionName='batchSubmitJob',
            InvocationType='RequestResponse',
            LogType='Tail',
            Payload=json.dumps(dict(
                dependsOn=depends_on,
                containerOverrides={
                    'command': command,
                },
                jobDefinition=event['isaac']['jobDefinition'],
                jobName='-'.join(['isaac', event['sampleId']]),
                jobQueue=event['isaac']['jobQueue']
            )))

        response_payload = response['Payload'].read()

        # Update event
        event['bamS3Path'] = bam_s3_path
        event['jobId'] = json.loads(response_payload)['jobId']
        
        return event
    except Exception as e:
        traceback.print_exc()
        raise e

In the Lambda console, create a Python 2.7 Lambda function named batchSubmitIsaacJob and paste in the above code. Use the LambdaBatchExecutionRole that you created in the previous post. For more information, see Step 2.1: Create a Hello World Lambda Function.

This Lambda function reads in the inputs passed to the state machine it is part of, formats the data for the batchSubmitJob Lambda function, invokes that Lambda function, and then modifies the event dictionary to pass onto the subsequent states. You can repeat these for each of the other tools, which can be found in the tools//lambda/lambda_function.py script in the GitHub repo.

Building the batchGetJobStatus Lambda function

For Step 2 above, the process queries the AWS Batch DescribeJobs API action with jobId to identify the state that the job is in. You can put this into a Lambda function to integrate it with Step Functions.

In the Lambda console, create a new Python 2.7 function with the LambdaBatchExecutionRole IAM role. Name your function batchGetJobStatus and paste in the following code. This is similar to the batch-get-job-python27 Lambda blueprint.

from __future__ import print_function

import boto3
import json

print('Loading function')

batch_client = boto3.client('batch')

def lambda_handler(event, context):
    # Log the received event
    print("Received event: " + json.dumps(event, indent=2))
    # Get jobId from the event
    job_id = event['jobId']

    try:
        response = batch_client.describe_jobs(
            jobs=[job_id]
        )
        job_status = response['jobs'][0]['status']
        return job_status
    except Exception as e:
        print(e)
        message = 'Error getting Batch Job status'
        print(message)
        raise Exception(message)

Structuring state machine input

You have structured the state machine input so that general file references are included at the top-level of the JSON object, and any job-specific items are contained within a nested JSON object. At a high level, this is what the input structure looks like:

{
        "general_field_1": "value1",
        "general_field_2": "value2",
        "general_field_3": "value3",
        "job1": {},
        "job2": {},
        "job3": {}
}

Building the full state machine

By chaining these state machine components together, you can quickly build flexible workflows that can process genomes in multiple ways. The development of the larger state machine that defines the entire workflow uses four of the above building blocks. You use the Lambda functions that you built in the previous section. Rename each building block submission to match the tool name.

We have provided a CloudFormation template to deploy your state machine and the associated IAM roles. In the CloudFormation console, select Create Stack, choose your template (deploy_state_machine.yaml), and enter in the ARNs for the Lambda functions you created.

Continue through the rest of the steps and deploy your stack. Be sure to check the box next to "I acknowledge that AWS CloudFormation might create IAM resources."

Once the CloudFormation stack is finished deploying, you should see the following image of your state machine.

In short, you first submit a job for Isaac, which is the aligner you are using for the analysis. Next, you use parallel state to split your output from "GetFinalIsaacJobStatus" and send it to both your variant calling step, Strelka, and your QC step, Samtools Stats. These then are run in parallel and you annotate the results from your Strelka step with snpEff.

Putting it all together

Now that you have built all of the components for a genomics secondary analysis workflow, test the entire process.

We have provided sequences from an Illumina sequencer that cover a region of the genome known as the exome. Most of the positions in the genome that we have currently associated with disease or human traits reside in this region, which is 1–2% of the entire genome. The workflow that you have built works for both analyzing an exome, as well as an entire genome.

Additionally, we have provided prebuilt reference genomes for Isaac, located at:

s3://aws-batch-genomics-resources/reference/

If you are interested, we have provided a script that sets up all of that data. To execute that script, run the following command on a large EC2 instance:

make reference REGISTRY=<your-ecr-registry>

Indexing and preparing this reference takes many hours on a large-memory EC2 instance. Be careful about the costs involved and note that the data is available through the prebuilt reference genomes.

Starting the execution

In a previous section, you established a provenance for the JSON that is fed into your state machine. For ease, we have auto-populated the input JSON for you to the state machine. You can also find this in the GitHub repo under workflow/test.input.json:

{
  "fastq1S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_1.fastq.gz",
  "fastq2S3Path": "s3://aws-batch-genomics-resources/fastq/SRR1919605_2.fastq.gz",
  "referenceS3Path": "s3://aws-batch-genomics-resources/reference/hg38.fa",
  "resultsS3Path": "s3://<bucket>/genomic-workflow/results",
  "sampleId": "NA12878_states_1",
  "workingDir": "/scratch",
  "isaac": {
    "jobDefinition": "isaac-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/highPriority-myenv",
    "referenceS3Path": "s3://aws-batch-genomics-resources/reference/isaac/"
  },
  "samtoolsStats": {
    "jobDefinition": "samtools_stats-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/lowPriority-myenv"
  },
  "strelka": {
    "jobDefinition": "strelka-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/highPriority-myenv",
    "cmdArgs": " --exome "
  },
  "snpEff": {
    "jobDefinition": "snpeff-myenv:1",
    "jobQueue": "arn:aws:batch:us-east-1:<account-id>:job-queue/lowPriority-myenv",
    "cmdArgs": " -t hg38 "
  }
}

You are now at the stage to run your full genomic analysis. Copy the above to a new text file, change paths and ARNs to the ones that you created previously, and save your JSON input as input.states.json.

In the CLI, execute the following command. You need the ARN of the state machine that you created in the previous post:

aws stepfunctions start-execution --state-machine-arn <your-state-machine-arn> --input file://input.states.json

Your analysis has now started. By using Spot Instances with AWS Batch, you can quickly scale out your workflows while concurrently optimizing for cost. While this is not guaranteed, most executions of the workflows presented here should cost under $1 for a full analysis.

Monitoring the execution

The output from the above CLI command gives you the ARN that describes the specific execution. Copy that and navigate to the Step Functions console. Select the state machine that you created previously and paste the ARN into the search bar.

The screen shows information about your specific execution. On the left, you see where your execution currently is in the workflow.

In the following screenshot, you can see that your workflow has successfully completed the alignment job and moved onto the subsequent steps, which are variant calling and generating quality information about your sample.

You can also navigate to the AWS Batch console and see that progress of all of your jobs reflected there as well.

Finally, after your workflow has completed successfully, check out the S3 path to which you wrote all of your files. If you run a ls –recursive command on the S3 results path, specified in the input to your state machine execution, you should see something similar to the following:

2017-05-02 13:46:32 6475144340 genomic-workflow/results/NA12878_run1/bam/sorted.bam
2017-05-02 13:46:34    7552576 genomic-workflow/results/NA12878_run1/bam/sorted.bam.bai
2017-05-02 13:46:32         45 genomic-workflow/results/NA12878_run1/bam/sorted.bam.md5
2017-05-02 13:53:20      68769 genomic-workflow/results/NA12878_run1/stats/bam_stats.dat
2017-05-02 14:05:12        100 genomic-workflow/results/NA12878_run1/vcf/stats/runStats.tsv
2017-05-02 14:05:12        359 genomic-workflow/results/NA12878_run1/vcf/stats/runStats.xml
2017-05-02 14:05:12  507577928 genomic-workflow/results/NA12878_run1/vcf/variants/genome.S1.vcf.gz
2017-05-02 14:05:12     723144 genomic-workflow/results/NA12878_run1/vcf/variants/genome.S1.vcf.gz.tbi
2017-05-02 14:05:12  507577928 genomic-workflow/results/NA12878_run1/vcf/variants/genome.vcf.gz
2017-05-02 14:05:12     723144 genomic-workflow/results/NA12878_run1/vcf/variants/genome.vcf.gz.tbi
2017-05-02 14:05:12   30783484 genomic-workflow/results/NA12878_run1/vcf/variants/variants.vcf.gz
2017-05-02 14:05:12    1566596 genomic-workflow/results/NA12878_run1/vcf/variants/variants.vcf.gz.tbi

Modifications to the workflow

You have now built and run your genomics workflow. While diving deep into modifications to this architecture are beyond the scope of these posts, we wanted to leave you with several suggestions of how you might modify this workflow to satisfy additional business requirements.

  • Job tracking with Amazon DynamoDB
    In many cases, such as if you are offering Genomics-as-a-Service, you might want to track the state of your jobs with DynamoDB to get fine-grained records of how your jobs are running. This way, you can easily identify the cost of individual jobs and workflows that you run.
  • Resuming from failure
    Both AWS Batch and Step Functions natively support job retries and can cover many of the standard cases where a job might be interrupted. There may be cases, however, where your workflow might fail in a way that is unpredictable. In this case, you can use custom error handling with AWS Step Functions to build out a workflow that is even more resilient. Also, you can build in fail states into your state machine to fail at any point, such as if a batch job fails after a certain number of retries.
  • Invoking Step Functions from Amazon API Gateway
    You can use API Gateway to build an API that acts as a "front door" to Step Functions. You can create a POST method that contains the input JSON to feed into the state machine you built. For more information, see the Implementing Serverless Manual Approval Steps in AWS Step Functions and Amazon API Gateway blog post.

Conclusion

While the approach we have demonstrated in this series has been focused on genomics, it is important to note that this can be generalized to nearly any high-throughput batch workload. We hope that you have found the information useful and that it can serve as a jump-start to building your own batch workloads on AWS with native AWS services.

For more information about how AWS can enable your genomics workloads, be sure to check out the AWS Genomics page.

Other posts in this four-part series:

Please leave any questions and comments below.

Building a Secure Cross-Account Continuous Delivery Pipeline

Post Syndicated from Anuj Sharma original https://aws.amazon.com/blogs/devops/aws-building-a-secure-cross-account-continuous-delivery-pipeline/

Most organizations create multiple AWS accounts because they provide the highest level of resource and security isolation. In this blog post, I will discuss how to use cross account AWS Identity and Access Management (IAM) access to orchestrate continuous integration and continuous deployment.

Do I need multiple accounts?

If you answer “yes” to any of the following questions you should consider creating more AWS accounts:

  • Does your business require administrative isolation between workloads? Administrative isolation by account is the most straightforward way to grant independent administrative groups different levels of administrative control over AWS resources based on workload, development lifecycle, business unit (BU), or data sensitivity.
  • Does your business require limited visibility and discoverability of workloads? Accounts provide a natural boundary for visibility and discoverability. Workloads cannot be accessed or viewed unless an administrator of the account enables access to users managed in another account.
  • Does your business require isolation to minimize blast radius? Separate accounts help define boundaries and provide natural blast-radius isolation to limit the impact of a critical event such as a security breach, an unavailable AWS Region or Availability Zone, account suspensions, and so on.
  • Does your business require a particular workload to operate within AWS service limits without impacting the limits of another workload? You can use AWS account service limits to impose restrictions on a business unit, development team, or project. For example, if you create an AWS account for a project group, you can limit the number of Amazon Elastic Compute Cloud (Amazon EC2) or high performance computing (HPC) instances that can be launched by the account.
  • Does your business require strong isolation of recovery or auditing data? If regulatory requirements require you to control access and visibility to auditing data, you can isolate the data in an account separate from the one where you run your workloads (for example, by writing AWS CloudTrail logs to a different account).
  • Do your workloads depend on specific instance reservations to support high availability (HA) or disaster recovery (DR) capacity requirements? Reserved Instances (RIs) ensure reserved capacity for services such as Amazon EC2 and Amazon Relational Database Service (Amazon RDS) at the individual account level.

Use case

The identities in this use case are set up as follows:

  • DevAccount

Developers check the code into an AWS CodeCommit repository. It stores all the repositories as a single source of truth for application code. Developers have full control over this account. This account is usually used as a sandbox for developers.

  • ToolsAccount

A central location for all the tools related to the organization, including continuous delivery/deployment services such as AWS CodePipeline and AWS CodeBuild. Developers have limited/read-only access in this account. The Operations team has more control.

  • TestAccount

Applications using the CI/CD orchestration for test purposes are deployed from this account. Developers and the Operations team have limited/read-only access in this account.

  • ProdAccount

Applications using the CI/CD orchestration tested in the ToolsAccount are deployed to production from this account. Developers and the Operations team have limited/read-only access in this account.

Solution

In this solution, we will check in sample code for an AWS Lambda function in the Dev account. This will trigger the pipeline (created in AWS CodePipeline) and run the build using AWS CodeBuild in the Tools account. The pipeline will then deploy the Lambda function to the Test and Prod accounts.

 

Setup

  1. Clone this repository. It contains the AWS CloudFormation templates that we will use in this walkthrough.
git clone https://github.com/awslabs/aws-refarch-cross-account-pipeline.git
  1. Follow the instructions in the repository README to push the sample AWS Lambda application to an AWS CodeCommit repository in the Dev account.
  2. Install the AWS Command Line Interface as described here. To prepare your access keys or assume-role to make calls to AWS, configure the AWS CLI as described here.

Walkthrough

Note: Follow the steps in the order they’re written. Otherwise, the resources might not be created correctly.

  1. In the Tools account, deploy this CloudFormation template. It will create the customer master keys (CMK) in AWS Key Management Service (AWS KMS), grant access to Dev, Test, and Prod accounts to use these keys, and create an Amazon S3 bucket to hold artifacts from AWS CodePipeline.
aws cloudformation deploy --stack-name pre-reqs \
--template-file ToolsAcct/pre-reqs.yaml --parameter-overrides \
DevAccount=ENTER_DEV_ACCT TestAccount=ENTER_TEST_ACCT \
ProductionAccount=ENTER_PROD_ACCT

In the output section of the CloudFormation console, make a note of the Amazon Resource Number (ARN) of the CMK and the S3 bucket name. You will need them in the next step.

  1. In the Dev account, which hosts the AWS CodeCommit repository, deploy this CloudFormation template. This template will create the IAM roles, which will later be assumed by the pipeline running in the Tools account. Enter the AWS account number for the Tools account and the CMK ARN.
aws cloudformation deploy --stack-name toolsacct-codepipeline-role \
--template-file DevAccount/toolsacct-codepipeline-codecommit.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides ToolsAccount=ENTER_TOOLS_ACCT CMKARN=FROM_1st_Step
  1. In the Test and Prod accounts where you will deploy the Lambda code, execute this CloudFormation template. This template creates IAM roles, which will later be assumed by the pipeline to create, deploy, and update the sample AWS Lambda function through CloudFormation.
aws cloudformation deploy --stack-name toolsacct-codepipeline-cloudformation-role \
--template-file TestAccount/toolsacct-codepipeline-cloudformation-deployer.yaml \
--capabilities CAPABILITY_NAMED_IAM \
--parameter-overrides ToolsAccount=ENTER_TOOLS_ACCT CMKARN=FROM_1st_STEP  \
S3Bucket=FROM_1st_STEP
  1. In the Tools account, which hosts AWS CodePipeline, execute this CloudFormation template. This creates a pipeline, but does not add permissions for the cross accounts (Dev, Test, and Prod).
aws cloudformation deploy --stack-name sample-lambda-pipeline \
--template-file ToolsAcct/code-pipeline.yaml \
--parameter-overrides DevAccount=ENTER_DEV_ACCT TestAccount=ENTER_TEST_ACCT \
ProductionAccount=ENTER_PROD_ACCT CMKARN=FROM_1st_STEP \
S3Bucket=FROM_1st_STEP--capabilities CAPABILITY_NAMED_IAM
  1. In the Tools account, execute this CloudFormation template, which give access to the role created in step 4. This role will be assumed by AWS CodeBuild to decrypt artifacts in the S3 bucket. This is the same template that was used in step 1, but with different parameters.
aws cloudformation deploy --stack-name pre-reqs \
--template-file ToolsAcct/pre-reqs.yaml \
--parameter-overrides CodeBuildCondition=true
  1. In the Tools account, execute this CloudFormation template, which will do the following:
    1. Add the IAM role created in step 2. This role is used by AWS CodePipeline in the Tools account for checking out code from the AWS CodeCommit repository in the Dev account.
    2. Add the IAM role created in step 3. This role is used by AWS CodePipeline in the Tools account for deploying the code package to the Test and Prod accounts.
aws cloudformation deploy --stack-name sample-lambda-pipeline \
--template-file ToolsAcct/code-pipeline.yaml \
--parameter-overrides CrossAccountCondition=true \
--capabilities CAPABILITY_NAMED_IAM

What did we just do?

  1. The pipeline created in step 4 and updated in step 6 checks out code from the AWS CodeCommit repository. It uses the IAM role created in step 2. The IAM role created in step 4 has permissions to assume the role created in step 2. This role will be assumed by AWS CodeBuild to decrypt artifacts in the S3 bucket, as described in step 5.
  2. The IAM role created in step 2 has permission to check out code. See here.
  3. The IAM role created in step 2 also has permission to upload the checked-out code to the S3 bucket created in step 1. It uses the KMS keys created in step 1 for server-side encryption.
  4. Upon successfully checking out the code, AWS CodePipeline triggers AWS CodeBuild. The AWS CodeBuild project created in step 4 is configured to use the CMK created in step 1 for cryptography operations. See here. The AWS CodeBuild role is created later in step 4. In step 5, access is granted to the AWS CodeBuild role to allow the use of the CMK for cryptography.
  5. AWS CodeBuild uses pip to install any libraries for the sample Lambda function. It also executes the aws cloudformation package command to create a Lambda function deployment package, uploads the package to the specified S3 bucket, and adds a reference to the uploaded package to the CloudFormation template. See here.
  6. Using the role created in step 3, AWS CodePipeline executes the transformed CloudFormation template (received as an output from AWS CodeBuild) in the Test account. The AWS CodePipeline role created in step 4 has permissions to assume the IAM role created in step 3, as described in step 5.
  7. The IAM role assumed by AWS CodePipeline passes the role to an IAM role that can be assumed by CloudFormation. AWS CloudFormation creates and updates the Lambda function using the code that was built and uploaded by AWS CodeBuild.

This is what the pipeline looks like using the sample code:

Conclusion

Creating multiple AWS accounts provides the highest degree of isolation and is appropriate for a number of use cases. However, keeping a centralized account to orchestrate continuous delivery and deployment using AWS CodePipeline and AWS CodeBuild eliminates the need to duplicate the delivery pipeline. You can secure the pipeline through the use of cross account IAM roles and the encryption of artifacts using AWS KMS. For more information, see Providing Access to an IAM User in Another AWS Account That You Own in the IAM User Guide.

References

AWS Big Data Blog Month in Review: April 2017

Post Syndicated from Derek Young original https://aws.amazon.com/blogs/big-data/aws-big-data-blog-month-in-review-april-2017/

Another month of big data solutions on the Big Data Blog. Please take a look at our summaries below and learn, comment, and share. Thank you for reading!

NEW POSTS

Amazon QuickSight Spring Announcement: KPI Charts, Export to CSV, AD Connector, and More! 
In this blog post, we share a number of new features and enhancements in Amazon Quicksight. You can now create key performance indicator (KPI) charts, define custom ranges when importing Microsoft Excel spreadsheets, export data to comma separated value (CSV) format, and create aggregate filters for SPICE data sets. In the Enterprise Edition, we added an additional option to connect to your on-premises Active Directory using AD Connector. 

Securely Analyze Data from Another AWS Account with EMRFS
Sometimes, data to be analyzed is spread across buckets owned by different accounts. In order to ensure data security, appropriate credentials management needs to be in place. This is especially true for large enterprises storing data in different Amazon S3 buckets for different departments. This post shows how you can use a custom credentials provider to access S3 objects that cannot be accessed by the default credentials provider of EMRFS.

Querying OpenStreetMap with Amazon Athena
This post explains how anyone can use Amazon Athena to quickly query publicly available OSM data stored in Amazon S3 (updated weekly) as an AWS Public Dataset. Imagine that you work for an NGO interested in improving knowledge of and access to health centers in Africa. You might want to know what’s already been mapped, to facilitate the production of maps of surrounding villages, and to determine where infrastructure investments are likely to be most effective.

Build a Real-time Stream Processing Pipeline with Apache Flink on AWS
This post outlines a reference architecture for a consistent, scalable, and reliable stream processing pipeline that is based on Apache Flink using Amazon EMR, Amazon Kinesis, and Amazon Elasticsearch Service. An AWSLabs GitHub repository provides the artifacts that are required to explore the reference architecture in action. Resources include a producer application that ingests sample data into an Amazon Kinesis stream and a Flink program that analyses the data in real time and sends the result to Amazon ES for visualization.

Manage Query Workloads with Query Monitoring Rules in Amazon Redshift
Amazon Redshift is a powerful, fully managed data warehouse that can offer significantly increased performance and lower cost in the cloud. However, queries which hog cluster resources (rogue queries) can affect your experience. In this post, you learn how query monitoring rules can help spot and act against such queries. This, in turn, can help you to perform smooth business operations in supporting mixed workloads to maximize cluster performance and throughput.

Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail
In this post, we announce support for AWS CloudTrail in Amazon QuickSight, which allows logging of QuickSight events across an AWS account. Whether you have an enterprise setting or a small team scenario, this integration will allow QuickSight administrators to accurately answer questions such as who last changed an analysis, or who has connected to sensitive data. With CloudTrail, administrators have better governance, auditing and risk management of their QuickSight usage.

Near Zero Downtime Migration from MySQL to DynamoDB
This post introduces two methods of seamlessly migrating data from MySQL to DynamoDB, minimizing downtime and converting the MySQL key design into one more suitable for NoSQL.


Want to learn more about Big Data or Streaming Data? Check out our Big Data and Streaming data educational pages.

Leave a comment below to let us know what big data topics you’d like to see next on the AWS Big Data Blog.

Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/amazon-quicksight-now-supports-logging-with-aws-cloudtrail/

QuickSight diagram

Amazon QuickSight democratizes business intelligence, making it easier and cheaper for you to provide advanced business analytics capabilities to everyone in your organization. Amazon QuickSight also enables you to understand your business better and helps you make data-driven decisions more quickly. However, determining who has access to which data in your organization can still be an administrative challenge.

Today, we are happy to announce that Amazon QuickSight now supports AWS CloudTrail, which enables you to log Amazon QuickSight events across your account. Amazon QuickSight administrators can now quickly and accurately answer questions such as who changed an analysis last or who connected to a sensitive database. CloudTrail support in Amazon QuickSight gives your administrators better governance, auditing, and risk management of your company’s Amazon QuickSight usage.

To learn more, see the full AWS Big Data Blog post.

– Craig

Amazon QuickSight Now Supports Audit Logging with AWS CloudTrail

Post Syndicated from Jose Kunnackal original https://aws.amazon.com/blogs/big-data/amazon-quicksight-now-supports-audit-logging-with-cloudtrail/

We launched Amazon QuickSight to democratize BI. Our goal is to make it easier and cheaper to roll out advanced business analytics capabilities to everyone in an organization. Overall, this enables better understanding of business, and allows faster data-driven decisions in an organization. In the past, the ability to share data presented an administrative challenge – that of knowing who has access to what data. Solving this problem ensures compliance with policies, and also provides an opportunity for businesses to see how employees use data to drive crucial decisions.

Today, we are happy to announce support for AWS CloudTrail in Amazon QuickSight, which allows logging of QuickSight events across an AWS account. Whether you have an enterprise setting or a small team scenario, this integration will allow QuickSight administrators to accurately answer questions such as who last changed an analysis, or who has connected to sensitive data. With CloudTrail, administrators have better governance, auditing and risk management of their QuickSight usage

You can get started with CloudTrail with just a few clicks. Any AWS account that is enabled for CloudTrail will automatically see QuickSight activity included in the CloudTrail logs. When enabled, CloudTrail starts logging events including:

  • Account subscribe/unsubscribe
  • Data source create/update/delete
  • Data set create/update/delete
  • Analysis create/access/update/delete
  • Dashboard create/access/update/delete
  • SPICE capacity purchases
  • User subscription purchases

A full list of all supported events can be found in the QuickSight documentation.

With CloudTrail logging enabled, you can easily track QuickSight activities in your account, starting with the question of who signed up for the service.

Beyond sign up, the QuickSight CloudTrail integration logs details of both data access and user management actions within the account. For example, attempts to connect to an external data source (below).

Or, sharing of a dataset:

Or even a simple dashboard access (below):

Through these details, administrators can track the origins of malicious activity, better understand user activity, or simply ensure that QuickSight activity is in compliance with policy. As with all CloudTrail data, the log is hosted in an S3 bucket configured by the AWS Account administrator. QuickSight users will not have access to this bucket, unless permitted by the administrator.

Finally, what better way to understand access and usage than to visualize the data in QuickSight! Using the native Amazon Athena integration, you can follow the steps here to pull the CloudTrail data from S3 into Athena. You can choose to query your data using Athena, or to bring data into SPICE for fast visualizations with QuickSight.

Here, I chose to pull data into SPICE, using a custom SQL statement to pull in only the specific fields that I was interested in.

Once QuickSight ingests the data into SPICE, I can analyze it using a variety of visualization types and filter capabilities.

First, I created a visual to display all the events recorded in this AWS account. As an administrator, I can view account activity at a glance, quickly identifying anything out of the ordinary. For example, if there are spikes in connection attempts, as the administrator, I might want to look into the specific user attempting to connect, and then try to understand whether this is malicious activity, or simply a case of an expired password.

To add the ability to view activity by username, I added the username field as a drill-down option in the QuickSight field wells.

This immediately makes the hierarchy available in the visual, allowing me to view the users who performed a specific activity.

It can also be useful to look at usage patterns. This allows administrators to better understand user behavior. You can see when users are just consuming dashboards and analyses, when they connect to specific data sources, when they purchase SPICE capacity, and so on. This allows you to detect hotspots in user behavior for abnormal activity, or to save costs by identifying inactive user accounts.

With Athena and QuickSight, you can get rich insights from your CloudTrail data in a few minutes!

AWS CloudTrail logging for QuickSight is available for both Standard and Enterprise Editions, across all supported QuickSight regions. Standard CloudTrail pricing applies for events generated via QuickSight.

Learn more

To learn more about these capabilities and start using them in your dashboards, check out the QuickSight User Guide.

Stay engaged

If you have questions and suggestions, you can post them on the QuickSight Discussion Forum.

Not a QuickSight user

Click here to get started for FREE.