Managing multi-tenant APIs using Amazon API Gateway

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/managing-multi-tenant-apis-using-amazon-api-gateway/

This post is written by Satish Mane, Solutions Architect.

Many ISVs provide platforms as a service in a multi-tenant environment. You can achieve multi-tenancy by partitioning the platform based on customer identifiers such as customer ID or account ID. The architecture for multi-tenant environments is often composed of authentication, authorization, a service layer, queues, and databases.

The primary focus of these architectures is to simplify the addition of more features. The multi-tenant design pattern has opened up new challenges and opportunities for software vendors thanks to microservice architectures gaining popularity. The challenge in a multi-tenant environment is that excessive load by a single customer, because of many requests to an API, can affect the entire platform.

This blog post looks at how to protect and monetize multi-tenant APIs using Amazon API Gateway. It describes a multi-tenant architecture design pattern based on a custom tenant ID to onboard customers. A tenant in a multi-tenant platform represents a customer whose users share common access to the platform, while each individual user holds specific permissions.

Overview

This example protects multi-tenant platform REST APIs using Amazon Cognito, Amazon API Gateway, and AWS Lambda.

In the following sections, you learn how to use the API Gateway’s usage plans to protect and productize multi-tenant platforms. Usage plans enable throttling of excessive API requests and apply an API usage quota policy. The user authenticates using Amazon Cognito to get a JSON Web Token (JWT) that is passed to API Gateway for authorization.

The multi-tenant platform that exposes REST APIs has clients such as a mobile app, a web application, and API clients that consume the REST APIs. This post focuses on protecting REST APIs with Amazon Cognito as the security layer for authenticating users and issuing tokens using OpenID Connect. The token contains the customer identity information, such as the tenant ID to which the users belong. API Gateway throttles the requests from a tenant only after that tenant exceeds the limits defined in its usage plan.

Architecture

This architecture shows the flow of user requests:

  1. The client application sends a request to Amazon Cognito using the /oauth/authorize or /login API. Amazon Cognito authenticates the user credentials.
  2. Amazon Cognito redirects using an authorization code grant and prompts the user to enter credentials. After authentication, it returns the authorization code.
  3. The client application then passes the authorization code to Amazon Cognito to obtain a JWT.
  4. Upon successful authentication, Amazon Cognito returns JWTs: access_token, id_token, and refresh_token. The access and ID tokens store information about the granted permissions, including the tenant ID to which the user belongs.
  5. The client application invokes the REST API that is deployed in API Gateway. The API request passes the JWT as a bearer token in the API request Authorization header.
  6. Since the tenant ID is encoded inside the signed JWT, the Lambda authorizer function validates and decodes the token, and extracts the tenant ID from it.
  7. The Lambda token authorizer function returns an IAM policy along with tenant ID from the decoded token to which a user belongs.
  8. The application’s REST API is configured with usage plans against a custom API key, which is the tenant ID in API Gateway. API Gateway evaluates the IAM policy and looks up the usage plan using the API key. It throttles API requests if the number of requests exceeds the throttle or quota limits in the usage plan.
  9. If the number of API requests is within the limit, then API Gateway sends requests to the downstream application REST API. This could be deployed using containers, Lambda, or an Amazon EC2 instance.
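
Steps 6 to 8 above can be sketched as a Lambda token authorizer. This is a minimal illustration in Python, not the code from the example repository. It assumes the tenant ID is carried in a `custom:tenant_id` claim (the attribute name used when creating the user later in this post) and that the API key source is set to AUTHORIZER, so that API Gateway reads the key from the `usageIdentifierKey` field of the authorizer response. The helper name `decode_jwt_payload` is hypothetical, and signature verification is deliberately left out here; a real authorizer must verify the token signature against the user pool's published keys before trusting any claim.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def handler(event, context):
    """Token authorizer: allow the call and report the tenant ID as API key."""
    # TOKEN authorizers receive the Authorization header value in this field.
    token = event["authorizationToken"].split(" ")[-1]
    # Assumption: the signature has already been verified at this point.
    claims = decode_jwt_payload(token)
    tenant_id = claims["custom:tenant_id"]
    return {
        "principalId": claims.get("sub", "unknown"),
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {
                    "Action": "execute-api:Invoke",
                    "Effect": "Allow",
                    "Resource": event["methodArn"],
                }
            ],
        },
        # API Gateway uses this value to look up the tenant's usage plan.
        "usageIdentifierKey": tenant_id,
    }
```

Returning the tenant ID in `usageIdentifierKey` is what lets API Gateway apply per-tenant limits without the client ever sending an API key itself.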

Customer (tenant) onboarding

There are multiple ways to set up multi-tenant applications. You can either create tenant-specific pools or add tenant ID as a custom attribute in each user profile. This blog uses the latter approach. The tenant ID is added to the JWT after successful authentication.

Since the tenant ID is used as an API key in API Gateway, it must be at least 20 characters long. You can define the structure of the tenant ID, such as <customer id>-<random string>. As part of tenant onboarding, you can automate configuring the API key and usage plans in API Gateway using CDK APIs. Here, you configure the API key and usage plan as part of the solution deployment itself.
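
As an illustration of the <customer id>-<random string> structure, the following Python sketch pads the random part so the result always meets the 20-character minimum. The function name `make_tenant_id`, the alphanumeric alphabet, and the choice of at least 8 random characters are assumptions made for this example, not part of the original solution.

```python
import secrets
import string

MIN_API_KEY_LENGTH = 20  # minimum tenant ID length stated in this post
ALPHABET = string.ascii_letters + string.digits


def make_tenant_id(customer_id: str, min_length: int = MIN_API_KEY_LENGTH) -> str:
    """Build '<customer id>-<random string>' with at least min_length characters."""
    prefix = f"{customer_id}-"
    # Always add at least 8 random characters, more if the prefix is short.
    random_length = max(8, min_length - len(prefix))
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(random_length))
    return prefix + suffix


if __name__ == "__main__":
    print(make_tenant_id("acme"))
```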

Authentication and authorization

You need a user pool and an application client enabled with the authorization code grant for authenticating users. API Gateway can verify JWT OAuth tokens against a single Amazon Cognito user pool. To get tenant information (the tenant ID), use a custom Lambda authorizer function in API Gateway to verify the token, extract the tenant ID, and return it to API Gateway.
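
Part of verifying the token is checking its claims once the payload has been decoded. The sketch below shows one way to do that in Python, under stated assumptions: the token is a Cognito ID token (`token_use` equal to `id`), the issuer follows the `https://cognito-idp.<region>.amazonaws.com/<pool id>` pattern, and the tenant ID lives in `custom:tenant_id`. The function name `validate_claims` is hypothetical. Signature verification against the user pool's JSON Web Key Set is a separate, mandatory step that is not shown because it needs a cryptography library.

```python
import time


def validate_claims(claims: dict, region: str, user_pool_id: str, now=None) -> str:
    """Check basic claims of a decoded Cognito ID token and return the tenant ID.

    Raises ValueError if a check fails. Does NOT verify the signature.
    """
    now = time.time() if now is None else now
    expected_issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    if claims.get("iss") != expected_issuer:
        raise ValueError("token was not issued by the expected user pool")
    if claims.get("token_use") != "id":
        raise ValueError("expected an ID token")
    if float(claims.get("exp", 0)) <= now:
        raise ValueError("token has expired")
    tenant_id = claims.get("custom:tenant_id")
    if not tenant_id:
        raise ValueError("token carries no tenant ID")
    return tenant_id
```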

API Gateway usage plans

API Gateway supports the usage plan feature for REST APIs only. This solution uses a MOCK integration as the integration point. You can use a usage plan to set the throttle and quota limits that are associated with API keys. API keys can be generated, or you can use a custom key. To enforce usage plans for each tenant separately, use the tenant ID as a prefix to a uniquely generated value to prepare the custom API key.
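
To build intuition for how a throttle rate, a burst limit, and a quota interact, here is a simplified Python model of a per-tenant limiter based on a token bucket. It is a teaching sketch only and does not reproduce API Gateway's internal behavior. The class name `TenantLimiter` is hypothetical, time is passed in explicitly so the example is deterministic, and the quota period never resets.

```python
class TenantLimiter:
    """Toy model: token bucket (rate + burst) combined with a request quota."""

    def __init__(self, rate: float, burst: int, quota: int):
        self.rate = rate            # tokens added per second (throttle rate)
        self.burst = burst          # bucket capacity (burst limit)
        self.quota = quota          # total requests allowed in the quota period
        self.tokens = float(burst)  # bucket starts full
        self.used = 0               # requests counted against the quota
        self.last = 0.0             # time of the previous call, in seconds

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` is accepted."""
        # Refill the bucket for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.used >= self.quota or self.tokens < 1:
            return False  # corresponds to an HTTP 429 response
        self.tokens -= 1
        self.used += 1
        return True


if __name__ == "__main__":
    # Same thresholds as the test section of this post: 5/day, 10 rps, burst 2.
    limiter = TenantLimiter(rate=10, burst=2, quota=5)
    print([limiter.allow(now=float(t)) for t in range(6)])
```

With requests spaced one second apart, the bucket is always full again before the next call, so only the quota matters and the sixth request is rejected. Sending three requests at the same instant would instead exhaust the burst of two.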

Configure API Gateway to integrate API key and Usage plan

You need to enable the REST API to use the API key and set the API key source to AUTHORIZER. There are two ways to accept API keys for every incoming request. You can supply the key in the incoming request header or via a custom authorizer Lambda function. This example uses a custom authorizer Lambda function to retrieve the API key, which is extracted from the JWT received through an incoming API request. Customers only pass signed JWTs in the request Authorization header. These steps are automated using the AWS CDK.

Pre-requisites

Deploying the example

The example source code is available on GitHub. To deploy and configure the solution:

  1. Clone the repository to your local machine.
    git clone https://github.com/aws-samples/api-gateway-usage-policy-based-api-protection
  2. Prepare the deployment package.
    cdk synth
    npm run build
    npm install --prefix aws-usage-policy-stack/lambda/src
  3. Configure the user pool in Amazon Cognito.
    npx cdk deploy CognitoStack
  4. Open the AWS Management Console and navigate to Amazon Cognito. Choose Manage user pool and select your user pool. Note down the pool ID under general settings.
  5. Create a user with a tenant ID.
    aws cognito-idp admin-create-user --user-pool-id <REPLACE WITH COGNITO POOL ID> --username <REPLACE WITH USERNAME> \
    --user-attributes Name="given_name",Value="<REPLACE WITH FIRST NAME>" Name="family_name",Value="<REPLACE WITH LAST NAME>" Name="custom:tenant_id",Value="<REPLACE WITH CUSTOMER ID>" \
    --temporary-password change1t
    
  6. To simplify testing the OAuth flow, use https://openidconnect.net/. In the configuration, set the discovery document URL to the user pool's well-known OpenID configuration URI.
    https://cognito-idp.<REPLACE WITH AWS REGION>.amazonaws.com/<REPLACE WITH COGNITO POOL ID>/.well-known/openid-configuration
  7. Test the OAuth flow with https://openidconnect.net/ to fetch the JWT ID token. Save the token in a text editor for later use.
  8. Open aws-usage-policy-stack/app.ts in an IDE and replace “NOT_DEFINED” with the 20-character long tenant ID from the previous section.
  9. Configure the user pool in API Gateway and create the Lambda function:
    npx cdk deploy ApigatewayStack
  10. After successfully deploying the API Gateway stack, open the AWS Management Console and select API Gateway. Locate ProductRestApi in the name column and note its ID.

Testing the example

Test the example using the following curl command. API Gateway throttles the requests to the deployed API based on the defined limits and quotas. The following thresholds are preset: an API quota limit of 5 requests/day, a throttle limit of 10 requests/second, and a burst limit of 2 requests.

To simulate the scenario and start throttling requests:

  1. Open a terminal window.
  2. Install the curl utility if necessary.
  3. Run the following command six times after replacing placeholders with the correct values.
    curl -H "Authorization: Bearer <REPLACE WITH ID_TOKEN received in step 7 of Deploy Amazon Cognito Resources>" -X GET https://<REPLACE WITH REST API ID noted in step 10 of Deploy Amazon API Gateway resources>.execute-api.eu-west-1.amazonaws.com/dev/products

You receive the message {"message": "Limit Exceeded"} after you run the command for the sixth time. To repeat the tests, navigate to the API Gateway console. Change the quota limits in the usage plan and run the preceding command again. You can monitor HTTP 429 responses (Limit Exceeded) in the API Gateway dashboard.

Any changes to usage plan limits do not need redeployment of the API in API Gateway. You can change limits dynamically. Changes take a few seconds to become effective.

Cleaning up

To avoid incurring future charges, clean up the resources created. To delete the CDK stack, use the following command. Since there are multiple stacks, you must explicitly specify the name of the stacks.

cdk destroy CognitoStack ApigatewayStack

Conclusion

This post shows how to use the API Gateway usage plan feature to protect multi-tenant APIs from excessive request loads, and how to offer those APIs as a product with customer-specific usage quotas.

To learn more about Amazon API Gateway, refer to Amazon API Gateway documentation. For more serverless learning resources, visit Serverless Land.

How to Secure App Development in the Cloud, With Tips From Gartner

Post Syndicated from Ben Austin original https://blog.rapid7.com/2022/06/22/how-to-secure-app-development-in-the-cloud-with-tips-from-gartner/

Building applications in the cloud has been great for development speed and scalability, but it can sometimes feel more like a sustained migraine for security teams. How do you keep your cloud applications safe without resorting to a dizzying patchwork of overlapping tools and dispersed services?

Gartner® research on “Innovation Insight for Cloud-Native Application Protection Platforms” breaks down the core capabilities required to effectively reduce risk in your cloud environment, and how they might come together into a single solution or ecosystem to relieve your security headaches.

You can read the full report here. But if you’re tight for time, or just want to get a preview first, we’ve got you covered in this post.

At a high level, here’s what Gartner found in its research into cloud-native application protection platforms (CNAPP):

  • “To support [digital] initiatives, developers have embraced cloud-native application development, typically combining microservices-based architectures built using containers, assembled in DevOps-style development pipelines, deployed into programmatic cloud infrastructure and orchestrated at runtime using Kubernetes and maintained with an immutable infrastructure mindset. This shift creates significant challenges in securing these applications.”
  • “The unique characteristics of cloud-native applications makes them impossible to secure without a complex set of overlapping tools spanning development and production,” including infrastructure as code (IaC) scanning, cloud workload protection platforms (CWPP), cloud infrastructure entitlement management (CIEM), cloud security posture management (CSPM), and container management.
  • “Understanding and addressing the real risk of cloud-native applications requires advanced analytics combining siloed views of application risk, open-source component risk, cloud infrastructure risk, and runtime workload risk.”

Gartner also has a few recommendations for how to handle this new security paradigm:

  • “Implement an integrated security approach that covers the entire life cycle of cloud-native applications, starting in development and extending into production.”
  • “Integrate security into the developer’s toolchain so that security testing is automated as code is created and moves through the development pipeline, reducing the friction of adoption.”
  • “[Security and risk management] leaders should evaluate emerging cloud-native application protection platforms that provide a complete life cycle approach for security.”

Basically, securing app development in the cloud effectively is going to require tools that let you consolidate core security functions, get a clear view of your environment (and the risks it may contain), and empower your developers to incorporate security into the development pipeline.

So, what’s our take?

CNAPP represents the next evolution of cloud security through the unification of previously siloed feature sets or solutions. In previous years, just having tools that did one or more of these core functions provided by separate vendors was “good enough.” But over time, as cloud security programs across enterprises continued to scale and mature, it became clear that the dispersed nature of these tools made it extremely difficult, if not impossible, to get a true understanding of risk across complex cloud environments and make meaningful progress in operationalizing cloud security.

CNAPP is essentially a mindset that can save organizations from having to deploy a new set of technologies. It’s the idea that teams need a consolidated view of the different risks in their environment at the infrastructure, workload, orchestration, or API level, as well as unified workflows and automation capabilities to effectively mitigate those risks.

The reality today, however, is that very few vendors can actually live up to the high bar that Gartner has set with CNAPP. The capabilities shown on the diagram above are extremely wide-ranging and span across multiple teams (DevSecOps and more) within an organization.

CNAPP is about more than just identifying a shopping list of capabilities that your security team needs. When considering how to build out a program to protect cloud-native applications, security teams should focus on driving toward a set of outcomes they hope to achieve. Gartner doesn’t define these outcomes in their CNAPP report, but based on our experience working with some of the most sophisticated cloud and application security teams in the world, some of those desired outcomes may include:

  1. An up-to-date, easily maintainable inventory of all infrastructure, workloads, and apps that make up your organization’s entire cloud footprint
  2. Centralized reporting on risk across the full application stack, including open-source and third-party components
  3. Ongoing, real-time monitoring of suspicious or malicious activity at both the application and infrastructure levels
  4. Integration into the development team’s CI/CD pipeline in order to prevent risks at scale before code is deployed
  5. Automated workflows, both for notification and remediation, to detect and respond to threats as quickly as possible, with minimal human intervention

Each team’s list of outcomes will vary slightly depending on operational maturity, compliance requirements, size and complexity of the cloud environment, and what types of applications they are protecting. Keeping these five outcomes top of mind while evaluating solutions will help your team build from a solid foundation and avoid simply checking boxes off a long list of capabilities.

CNAPP may be a mindset shift first and foremost – but at the end of the day, the capabilities needed to achieve this more holistic approach to cloud and application security have to live somewhere within your technology stack. A unified platform that supports all these needs can help break down unnecessary silos and make it easier to contextualize your security data across the entire cloud infrastructure.

GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Gartner, Innovation Insight for Cloud-Native Application Protection Platforms, by Neil MacDonald, Charlie Winckless, 25 August 2021

Security updates for Wednesday

Post Syndicated from original https://lwn.net/Articles/898605/

Security updates have been issued by Debian (exo and ntfs-3g), Fedora (collectd, golang-github-cli-gh, grub2, qemu, and xen), Red Hat (httpd:2.4, kernel, and postgresql), SUSE (drbd, fwupdate, neomutt, and trivy), and Ubuntu (apache2, openssl, openssl1.0, and qemu).

Verify Apple devices with no installed software

Post Syndicated from Kenny Johnson original https://blog.cloudflare.com/private-attestation-token-device-posture/

One of the foundations of Zero Trust is determining if a user’s device is “healthy” — that it has its operating system up-to-date with the latest security patches, that it’s not jailbroken, that it doesn’t have malware installed, and so on. Traditionally, determining this has required installing software directly onto a user’s device.

Earlier this month, Cloudflare participated in the announcement of an open source standard called a Private Attestation Token. Device manufacturers who support the standard can now supply a Private Attestation Token with any request made by one of their devices. On the IT administration side, Private Attestation Tokens mean that security teams can verify a user’s device before they access a sensitive application, without the need to install any software or collect a user’s device data.

At WWDC 2022, Apple announced Private Attestation Tokens. Today, we’re announcing that Cloudflare Access will support verifying a Private Attestation Token. This means that security teams that rely on Cloudflare Access can verify a user’s Apple device before they access a sensitive application, with no additional software required.

Determining a “healthy” device

There are many solutions on the market that help security teams determine if a device is “healthy” and corporately managed. What the majority of these solutions have in common is that they require software to be installed directly on the user’s machine. This comes with challenges associated with client software including compatibility issues, version management, and end user support. Many companies have dedicated Mobile Device Management (MDM) tools to manage the software installed on employee machines.

MDM is a proven model, but it is also a challenge to manage — taking a dedicated team in many cases. What’s more, installing client or MDM software is not always possible for contractors, vendors or employees using personal machines. Security teams have to resort to VDI or VPN solutions for external users to securely access corporate applications.

How Private Attestation Tokens verify a device

Private Attestation Tokens leverage the Privacy Pass Protocol, which Cloudflare authored with major device manufacturers, to attest to a device’s health and integrity.

In order for Private Attestation Tokens to work, four parties agree to work in concert with a common framework to generate and exchange anonymous, unforgeable tokens. Without all four parties in the process, PATs won’t work.

  1. An Origin. A website, application, or API that receives requests from a client. When a website receives a request to their origin, the origin must know to look for and request a token from the client making the request. For Cloudflare customers, Cloudflare acts as the origin (on behalf of customers) and handles the requesting and processing of tokens.
  2. A Client. Whatever tool the visitor is using to attempt to access the Origin. This will usually be a web browser or mobile application. In our example, let’s say the client is a mobile Safari Browser.
  3. An Attester. The Attester is who the client asks to prove something (e.g., that a mobile device has a valid IMEI) before a token can be issued. In our example below, the Attester is Apple, the device vendor.
  4. An Issuer. The Issuer is the only one in the process that actually generates, or issues, a token. The Attester makes an API call to whatever Issuer the Origin has chosen to trust, instructing the Issuer to produce a token. In our case, Cloudflare will also be the Issuer.

We are then able to rely on the attestation from the device manufacturer as a form of validation that a device is in a “healthy” enough state to be allowed access to a sensitive application.

Checking device health without client software

Private Attestation Tokens do not require any additional software to be installed on the user’s device. This is because device health and validity are attested directly by the manufacturer of the device’s operating system, in this case Apple.

This means that a security team can use Cloudflare Access and Private Attestation Tokens to verify if a user is accessing from a “healthy” Apple device before allowing access to a sensitive corporate application. Some checks as part of the attestation include:

  • Is the device on the latest OS version?
  • Is the device jailbroken?
  • Is the window that is attempting to log in currently in focus?
  • And much more.

Over time, we are working with other device manufacturers to expand device support and what is verified as part of the device attestation process. The attributes that are attested will also continue to expand over time, which means the device verification in Access will only strengthen.

In the next few months, we will move Private Attestation Token support in Cloudflare Access to a closed beta. The first version will work for iOS devices and support will expand from there. The only change required will be an updated Access policy; no software will need to be installed. If you would like to be part of the beta program, sign up here today!

How to augment or replace your VPN with Cloudflare

Post Syndicated from Michael Keane original https://blog.cloudflare.com/how-to-augment-or-replace-your-vpn/

“Never trust, always verify.”

Almost everyone we speak to these days understands and agrees with this fundamental principle of Zero Trust. So what’s stopping folks? The biggest gripe we hear: they simply aren’t sure where to start. Security tools and network infrastructure have often been in place for years, and a murky implementation journey involving applications that people rely on to do their work every day can feel intimidating.

While there’s no universal answer, several of our customers have agreed that offloading key applications from their traditional VPN to a cloud-native Zero Trust Network Access (ZTNA) solution like Cloudflare Access is a great place to start—providing an approachable, meaningful upgrade for their business.

In fact, Gartner predicted that “by 2025, at least 70% of new remote access deployments will be served predominantly by ZTNA as opposed to VPN services, up from less than 10% at the end of 2021.”1 By prioritizing a ZTNA project, IT and Security executives can better shield their business from attacks like ransomware while simultaneously improving their employees’ daily workflows. The trade-off between security and user experience is an outmoded view of the world; organizations can truly improve both if they go down the ZTNA route.

You can get started here with Cloudflare Access for free, and in this guide we’ll show you why, and how.

Why nobody likes their VPN

The network-level access and default trust granted by VPNs create avoidable security gaps by inviting the possibility of lateral movement within your network. Attackers may enter your network through a less-sensitive entry point after stealing credentials, and then traverse to find more business-critical information to exploit. In the face of rising attacks, the threat here is too real—and the path to mitigate is too within reach—to ignore.

Meanwhile, VPN performance feels stuck in the 90s… and not in a fun, nostalgic way. Employees suffer through slow and unreliable connections that simply weren’t built for today’s scale of remote access. In the age of the “Great Reshuffle” and the current recruiting landscape, providing subpar experiences for teams based on legacy tech doesn’t have a great ROI. And when IT/security practitioners have plenty of other job opportunities readily available, they may not want to put up with manual, avoidable tasks born from an outdated technology stack. From both security and usability angles, moving toward VPN replacement is well worth the pursuit.

Make least-privilege access the default

Instead of authenticating a user and providing access to everything on your corporate network, a ZTNA implementation or “software-defined perimeter” authorizes access per resource, effectively eliminating the potential for lateral movement. Each access attempt is evaluated against Zero Trust rules based on identity, device posture, geolocation, and other contextual information. Users are continuously re-evaluated as context changes, and all events are logged to help improve visibility across all types of applications.

As co-founder of Udaan, Amod Malviya, noted, “VPNs are frustrating and lead to countless wasted cycles for employees and the IT staff supporting them. Furthermore, conventional VPNs can lull people into a false sense of security. With Cloudflare Access, we have a far more reliable, intuitive, secure solution that operates on a per user, per access basis. I think of it as Authentication 2.0, even 3.0.”

Better security and user experience haven’t always co-existed, but the fundamental architecture of ZTNA really does improve both compared to legacy VPNs. Whether your users are accessing Office 365 or your custom, on-prem HR app, every login experience is treated the same. With Zero Trust rules being checked behind the scenes, suddenly every app feels like a SaaS app to your end users. Like our friends at OneTrust said when they implemented ZTNA, “employees can connect to the tools they need, so simply teams don’t even know Cloudflare is powering the backend. It just works.”

Assembling a ZTNA project plan

VPNs are so entrenched in an organization’s infrastructure that fully replacing one may take a considerable amount of time, depending on the total number of users and applications served. However, there still is significant business value in making incremental progress. You can migrate away from your VPN at your own pace and let ZTNA and your VPN co-exist for some time, but it is important to at least get started.

Consider which one or two applications behind your VPN would be most valuable for a ZTNA pilot, like one with known complaints or numerous IT support tickets associated with it. Otherwise, consider internal apps that are heavily used or are visited by particularly critical or high-risk users. If you have any upcoming hardware upgrades or license renewals planned for your VPN(s), apps behind the accompanying infrastructure may also be a sensible fit for a modernization pilot.

As you start to plan your project, it’s important to involve the right stakeholders. For your ZTNA pilot, your core team should at minimum involve an identity admin and/or admin who manages internal apps used by employees, plus a network admin who understands your organization’s traffic flow as it relates to your VPN. These perspectives will help to holistically consider the implications of your project rollout, especially if the scope feels dynamic.

Executing a transition plan for a pilot app

Step 1: Connect your internal app to Cloudflare’s network
The Zero Trust dashboard guides you through a few simple steps to set up our app connector, no virtual machines required. Within minutes, you can create a tunnel for your application traffic and route it based on public hostnames or your private network routes. The dashboard will provide a string of commands to copy and paste into your command line to facilitate initial routing configurations. From there, Cloudflare will manage your configuration automatically.

A pilot web app may be the most straightforward place to start here, but you can also extend to SSH, VNC, RDP, or internal IPs and hostnames through the same workflow. With your tunnel up and running, you’ve created the means through which your users will securely access your resources and have essentially eliminated the potential for lateral movement within your network. Your application is not visible to the public Internet, significantly reducing your attack surface.

Step 2: Integrate identity and endpoint protection
Cloudflare Access acts as an aggregation layer for your existing security tools. With support for over a dozen identity providers (IdPs) like Okta, Microsoft Azure AD, Ping Identity, or OneLogin, you can link multiple simultaneous IdPs or separate tenants from one IdP. This can be particularly useful for companies undergoing mergers or acquisitions or perhaps going through compliance updates, e.g. incorporating a separate FedRAMP tenant.

In a ZTNA implementation, this linkage lets both tools play to their strengths. The IdP houses user stores and performs the identity authentication check, while Cloudflare Access controls the broader Zero Trust rules that ultimately decide access permissions to a broad range of resources.

Similarly, admins can integrate common endpoint protection providers like Crowdstrike, SentinelOne, Tanium or VMware Carbon Black to incorporate device posture into Zero Trust rulesets. Access decisions can incorporate device posture risk scores for tighter granularity.

You might find shortcut approaches to this step if you plan on using simpler authentication like one-time pins or social identity providers with external users like partners or contractors. As you mature your ZTNA rollout, you can incorporate additional IdPs or endpoint protection providers at any time without altering your fundamental setup. Each integration only adds to your source list of contextual signals at your disposal.

Step 3: Configure Zero Trust rules
Depending on your assurance levels for each app, you can customize your Zero Trust policies to appropriately restrict access to authorized users using contextual signals. For example, a low-risk app may simply require email addresses ending in “@company.com” and a successful SMS or email multifactor authentication (MFA) prompt. Higher risk apps could require hard token MFA specifically, plus a device posture check or other custom validation check using external APIs.
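
The two example policies above can be modeled as a small rule check. The Python sketch below is purely illustrative and is not Cloudflare's API or policy syntax. The names `AccessRequest`, `AppPolicy`, and `is_allowed` are hypothetical, and it assumes that a low-risk app also accepts the stronger hard token method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    email: str
    mfa_method: str          # "sms", "email", "hard_token", or "" if none
    device_posture_ok: bool  # result of a device posture check


@dataclass(frozen=True)
class AppPolicy:
    email_domain: str
    allowed_mfa: frozenset
    require_device_posture: bool = False


def is_allowed(policy: AppPolicy, request: AccessRequest) -> bool:
    """Every rule must pass for this one resource; nothing is trusted by default."""
    return (
        request.email.lower().endswith("@" + policy.email_domain)
        and request.mfa_method in policy.allowed_mfa
        and (request.device_posture_ok or not policy.require_device_posture)
    )


# Low-risk app: company email plus SMS or email MFA (hard token also accepted).
LOW_RISK = AppPolicy("company.com", frozenset({"sms", "email", "hard_token"}))
# Higher-risk app: hard token MFA specifically, plus a device posture check.
HIGH_RISK = AppPolicy("company.com", frozenset({"hard_token"}), require_device_posture=True)

if __name__ == "__main__":
    user = AccessRequest("ana@company.com", "sms", device_posture_ok=True)
    print(is_allowed(LOW_RISK, user), is_allowed(HIGH_RISK, user))
```

Because each app has its own policy object, the same user can be allowed into the low-risk app and denied the higher-risk one within the same session.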

MFA in particular can be difficult to implement with legacy on-prem apps natively using traditional single sign-on tools. Using Cloudflare Access as a reverse proxy helps provide an aggregation layer to simplify rollout of MFA to all your resources, no matter where they live.

Step 4: Test clientless access right away
After connecting an app to Cloudflare and configuring your desired level of authorization rules, end users in most cases can test web, SSH, or VNC access without using a device client. With no downloads or mobile device management (MDM) rollouts required, this can help accelerate ZTNA adoption for key apps and be particularly useful for enabling third-party access.

Note that a device client can still be used to unlock other use cases like protecting SMB or thick client applications, verifying device posture, or enabling private routing. Cloudflare Access can handle any arbitrary L4-7 TCP or UDP traffic, and through bridges to WAN-as-a-service it can offload VPN use cases like ICMP or server-to-client initiated protocol traffic like VoIP as well.

How to augment or replace your VPN with Cloudflare

At this stage for the pilot app, you are up and running with ZTNA! Top priority apps can be offloaded from your VPN one at a time at any pace that feels comfortable to help modernize your access security. Still, augmenting and fully replacing a VPN are two very different things.

Moving toward full VPN replacement

While a few top resource candidates for VPN offloading might be clear for your company, the total scope could be overwhelming, with potentially thousands of internal IPs and domains to consider. You can configure the local domain fallback entries within Cloudflare Access to point to your internal DNS resolver for selected internal hostnames. This can help you more efficiently disseminate access to resources made available over your Intranet.

It can also be difficult for admins to granularly understand the full reach of their current VPN usage. Potential visibility issues aside, the full scope of applications and users may be in dynamic flux especially at large organizations. You can use the private network discovery report within Cloudflare Access to passively vet the state of traffic on your network over time. For discovered apps requiring more protection, Access workflows help you tighten Zero Trust rules as needed.

Both of these capabilities can help reduce anxiety around fully retiring a VPN. By starting to build your private network on top of Cloudflare’s network, you’re bringing your organization closer to achieving Zero Trust security.

The business impact our customers are seeing

Offloading applications from your VPN and moving toward ZTNA can have measurable benefits for your business even in the short term. Many of our customers speak to improvements in their IT team’s efficiency, onboarding new employees faster and spending less time on access-related help tickets. For example, after implementing Cloudflare Access, eTeacher Group reduced its employee onboarding time by 60%, helping all teams get up to speed faster.

Even if you plan to co-exist with your VPN alongside a slower modernization cadence, you can still track IT tickets for the specific apps you’ve transitioned to ZTNA to help quantify the impact. Are overall ticket numbers down? Did time to resolve decrease? Over time, you can also partner with HR for qualitative feedback through employee engagement surveys. Are employees feeling empowered with their current toolset? Do they feel their productivity has improved or complaints have been addressed?

Of course, improvements to security posture also help mitigate the risk of expensive data breaches and their lingering, damaging effects to brand reputation. Pinpointing narrow cause-and-effect relationships for the cost benefits of each small improvement may feel more art than science here, with too many variables to count. Still, reducing reliance on your VPN is a great step toward reducing your attack surface and contributes to your macro return on investment, however long your full Zero Trust journey may last.

Start the clock toward replacing your VPN

Our obsession with product simplicity has helped many of our customers sunset their VPNs already, and we can’t wait to do more.

You can get started here with Cloudflare Access for free to begin augmenting your VPN. Follow the steps outlined above with your prioritized ZTNA test cases, and for a sense of broader timing you can create your own Zero Trust roadmap as well to figure out what project should come next.

For a full summary of Cloudflare One Week and what’s new, tune in to our recap webinar.

___

1. Nat Smith, Mark Wah, Christian Canales. (2022, April 08). Emerging Technologies: Adoption Growth Insights for Zero Trust Network Access. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

Introducing Private Network Discovery

Post Syndicated from Abe Carryl original https://blog.cloudflare.com/introducing-network-discovery/

With Cloudflare One, building your private network on Cloudflare is easy. What is not so easy is maintaining the security of your private network over time. Resources are constantly being spun up and down, and users are added and removed daily, which makes the network painful to manage.

That’s why today we’re opening a closed beta for our new Zero Trust network discovery tool. With Private Network Discovery, our Zero Trust platform will now start passively cataloging both the resources being accessed and the users who are accessing them without any additional configuration required. No third-party tools, commands, or clicks necessary.

To get started, sign up for early access to the closed beta and gain instant visibility into your network today. If you’re interested in learning more about how it works and what else we will be launching in the future for general availability, keep scrolling.

One of the most laborious aspects of migrating to Zero Trust is replicating the security policies which are active within your network today. Even if you do have a point-in-time understanding of your environment, networks are constantly evolving with new resources being spun up dynamically for various operations. This results in a constant cycle to discover and secure applications which creates an endless backlog of due diligence for security teams.

That’s why we built Private Network Discovery. With Private Network Discovery, organizations can easily gain complete visibility into the users and applications that live on their network without any additional effort on their part. Simply connect your private network to Cloudflare, and we will surface any unique traffic we discover on your network so that you can seamlessly translate it into Cloudflare Access applications.

Building your private network on Cloudflare

Building out a private network has two primary components: the infrastructure side, and the client side.

The infrastructure side of the equation is powered by Cloudflare Tunnel, which simply connects your infrastructure (whether that be a single application, many applications, or an entire network segment) to Cloudflare. This is made possible by running a simple command-line daemon in your environment to establish multiple secure, outbound-only links to Cloudflare. Simply put, Tunnel is what connects your network to Cloudflare.

On the other side of this equation, you need your end users to be able to easily connect to Cloudflare and, more importantly, your network. This connection is handled by our robust device agent, Cloudflare WARP. This agent can be rolled out to your entire organization in just a few minutes using your in-house MDM tooling, and it establishes a secure connection from your users’ devices to the Cloudflare network.

Now that we have your infrastructure and your users connected to Cloudflare, it becomes easy to tag your applications and layer on Zero Trust security controls to verify both identity and device-centric rules for each and every request on your network.

How it works

As we mentioned earlier, we built this feature to help your team gain visibility into your network by passively cataloging unique traffic destined for an RFC 1918 or RFC 4193 address space. By design, this tool operates in an observability mode whereby all applications are surfaced, but are tagged with a base state of “Unreviewed.”
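For readers who want to see which destinations fall inside those address spaces, the following Python sketch checks an address against the RFC 1918 IPv4 ranges and the RFC 4193 unique local IPv6 range. It only illustrates the address ranges involved. The function name is hypothetical and this is not how the discovery tool itself is implemented.

```python
import ipaddress

# Private address space named in the post.
PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),      # RFC 1918
    ipaddress.ip_network("172.16.0.0/12"),   # RFC 1918
    ipaddress.ip_network("192.168.0.0/16"),  # RFC 1918
    ipaddress.ip_network("fc00::/7"),        # RFC 4193 unique local addresses
]


def is_private_destination(address: str) -> bool:
    """Return True if the address lies in RFC 1918 or RFC 4193 space."""
    ip = ipaddress.ip_address(address)
    # Compare only against networks of the same IP version.
    return any(ip.version == net.version and ip in net for net in PRIVATE_RANGES)
```

With this check, `10.0.0.1` and `fd12:3456:789a::1` count as private destinations, while a public resolver address such as `8.8.8.8` does not.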

The Network Discovery tool surfaces all origins within your network, defined as any unique IP address, port, or protocol. You can review the details of any given origin and then create a Cloudflare Access application to control access to that origin. It’s also worth noting that Access applications may be composed of more than one origin.

Let’s take, for example, a privately hosted video conferencing service, Jitsi. I’m using this example as our team actually uses this service internally to test our new features before pushing them into production. In this scenario, we know that our self-hosted instance of Jitsi lives at 10.0.0.1:443. However, as this is a video conferencing application, it communicates on both tcp:10.0.0.1:443 and udp:10.0.0.1:10000. Here we would select one origin and assign it an application name.

As a note, during the closed beta you will not be able to view this application in the Cloudflare Access application table. For now, these application names will only be reflected in the discovered origins table of the Private Network Discovery report. You will see them reflected in the Application name column exclusively. However, when this feature goes into general availability you’ll find all the applications you have created under Zero Trust > Access > Applications as well.

After you have assigned an application name and added your first origin, tcp:10.0.0.1:443, you can then follow the same pattern to add the other origin, udp:10.0.0.1:10000, as well. This allows you to create logical groupings of origins to create a more accurate representation of the resources on your network.
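The grouping of origins into an application can be pictured with a small Python sketch. The `protocol:ip:port` string format follows the Jitsi example above, while the `Origin` type and the `group_origins` helper are hypothetical names used only for illustration. The sketch handles IPv4 origins only.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Origin(NamedTuple):
    protocol: str
    ip: str
    port: int


def parse_origin(text: str) -> Origin:
    """Parse an origin written as 'protocol:ip:port' (IPv4 only in this sketch)."""
    protocol, ip, port = text.split(":")
    return Origin(protocol.lower(), ip, int(port))


def group_origins(assignments: Iterable[Tuple[str, str]]) -> Dict[str, Set[Origin]]:
    """Group (application name, origin string) pairs into applications."""
    applications: Dict[str, Set[Origin]] = defaultdict(set)
    for app_name, origin_text in assignments:
        applications[app_name].add(parse_origin(origin_text))
    return dict(applications)


apps = group_origins([
    ("Jitsi", "tcp:10.0.0.1:443"),
    ("Jitsi", "udp:10.0.0.1:10000"),
])
```

Here `apps["Jitsi"]` holds both origins, mirroring how one application name can cover the TCP and UDP endpoints of the same service.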

By creating an application, our Network Discovery tool will automatically update the status of these individual origins from “Unreviewed” to “In-Review.” This will allow your team to easily track the origin’s status. From there, you can drill further down to review the number of unique users accessing a particular origin as well as the total number of requests each user has made. This will help equip your team with the information it needs to create identity and device-driven Zero Trust policies. Once your team is comfortable with a given application’s usage, you can then manually update the status of a given application to be either “Approved” or “Unapproved”.

What’s next

Our closed beta launch is just the beginning. While the closed beta release supports creating friendly names for your private network applications, those names do not currently appear in the Cloudflare Zero Trust policy builder.

As we move towards general availability, our top priority will be making it easier to secure your private network based on what is surfaced by the Private Network Discovery tool. With the general availability launch, you will be able to create Access applications directly from your Private Network Discovery report, reference your private network applications in Cloudflare Access and create Zero Trust security policies for those applications, all in one singular workflow.

As you can see, we have exciting plans for this tool and will continue investing in Private Network Discovery in the future. If you’re interested in gaining access to the closed beta, sign up here and be among the first users to try it out!

Cloudflare recognized by Microsoft as a Security Software Innovator

Post Syndicated from Abhi Das original https://blog.cloudflare.com/cloudflare-recognized-by-microsoft-as-a-security-software-innovator/

Recently, Microsoft announced the winners for the 2022 Microsoft Security Excellence Awards, a prestigious classification in the Microsoft partner community. We are honored to announce that Cloudflare has won the Security Software Innovator award. This award recognized Cloudflare’s innovative approach to Zero Trust and Security solutions. Our transformative technology in collaboration with Microsoft provides world-class joint solutions for our mutual customers.

Microsoft Security Excellence Awards

The third annual Microsoft Security awards celebrated finalists in 10 categories spanning security, compliance, and identity. Microsoft unveiled the winners of the Microsoft Security Partner Awards, voted on by a group of industry veterans, on June 6, 2022.

Through this award, Microsoft recognizes Cloudflare’s approach to constantly deliver the most innovative solutions for joint customers. Together with Microsoft, we have supported thousands of customers including many of the largest Fortune 500 companies on their Zero Trust journey, enabling customers to simply and easily support their security needs with faster performance.

Cloudflare has built deep integrations with Microsoft to help organizations take the next step in their Zero Trust journey. These integrations empower organizations to make customer implementations operationally efficient while delivering a seamless user experience. Currently, all our mutual customers benefit from several integrations across Microsoft 365 and Azure to secure web applications and safeguard employees with identity and device protections. Working with Microsoft has been critical in helping our customers on their Zero Trust journey. It is a complex undertaking that Cloudflare has been simplifying through our easy-to-adopt product portfolio, such as Cloudflare One, managed through a single pane of glass.

We want to thank Microsoft for its continued collaboration with Cloudflare. We are committed to serving our joint customers as we expand our integrations across Microsoft’s suite of products and continuously innovate against the latest threats.

“Partners are critical to solving customers’ constantly evolving security challenges and threat landscape. The close collaboration and deep integrations between Cloudflare and Microsoft ensure our joint customers are equipped with innovative technologies that are seamlessly integrated to address their security challenges. We are pleased to recognize Cloudflare with the Security Software Innovator Award at this year’s Microsoft Security Excellence Awards.”
– Ann Johnson, Corporate Vice President of Security, Compliance, Identity, and Management, Business Development at Microsoft.

Not only a must-have: Zero Trust requires constant innovation

Perimeter based security models are breaking under pressure

The rapid transition to remote work and the rise of SaaS applications have disrupted how businesses need to think about protecting their networks. Organizations historically protected their sensitive applications and networks by building a “castle-and-moat”, piecing together disparate point solutions for each defensive layer.

Comprehensive solutions require a layered defensive architecture for Internet security (DNS and HTTPS filtering), endpoint and data protection (Remote Browser Isolation and Data-Loss Prevention), SaaS app security (CASB), and private network connections for users both in and away from the office. This model is difficult to implement and manage, and it doesn’t scale in the modern workplace, where users and applications reside anywhere that is connected to the Internet.

Why Zero Trust is a must-have:

  1. Apps can now live anywhere: on-prem, in the cloud, or as SaaS
  2. Employees can access those resources from anywhere
  3. Attacks are constantly getting more sophisticated
  4. The Internet is the new ‘office’, replacing the ‘castle-and-moat’ model

Current world of how applications are deployed and accessed

Cloudflare One protects any application or network for users everywhere by running our full suite of products across our global network, which is present in more than 270 cities around the world:

  • Protect any self-hosted or SaaS application with Access.
  • Inspect and protect Internet access with Gateway.
  • Isolate sensitive applications and high-risk browsing with Browser Isolation.
  • Protect against data loss with CASB and DLP controls.

Finally, any device, office or network can be protected by Cloudflare One by connecting to our closest point of presence via our Roaming Agent (WARP) or via tunneled or direct connectivity.

Our current integrations with Microsoft within the context of a request flow

Looking forward to continuing this journey as the world around us changes constantly

This is the first year that Microsoft has had a Security Software Innovator award category, and we’re extremely proud to have won. Cloudflare is committed to delivering next-generation, innovative Zero Trust solutions to our customers. If you are interested in our Cloudflare One suite, please reach out. Also, if you are interested in partnering with our Zero Trust solutions, fill out the form here.

Symbiote Backdoor in Linux

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2022/06/symbiote-backdoor-in-linux.html

Interesting:

What makes Symbiote different from other Linux malware that we usually come across, is that it needs to infect other running processes to inflict damage on infected machines. Instead of being a standalone executable file that is run to infect a machine, it is a shared object (SO) library that is loaded into all running processes using LD_PRELOAD (T1574.006), and parasitically infects the machine. Once it has infected all the running processes, it provides the threat actor with rootkit functionality, the ability to harvest credentials, and remote access capability.

News article:

Researchers have unearthed a discovery that doesn’t occur all that often in the realm of malware: a mature, never-before-seen Linux backdoor that uses novel evasion techniques to conceal its presence on infected servers, in some cases even with a forensic investigation.

No public attribution yet.

So far, there’s no evidence of infections in the wild, only malware samples found online. It’s unlikely this malware is widely active at the moment, but with stealth this robust, how can we be sure?

Handy Tips #32: Deploying Zabbix in the Azure cloud platform

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/handy-tips-32-deploying-zabbix-in-the-azure-cloud-platform/21355/

Deploy your Zabbix servers and proxies in the Azure cloud.

There are many use cases where deploying your Zabbix server or Zabbix proxies in the cloud can reduce costs, provide an additional layer of security and redundancy, and improve the available management toolset.

Deploy your Zabbix instance in the Azure cloud with the official Zabbix cloud images:

  • Cloud images are available for the latest Zabbix server and proxy versions
  • Deploy a fresh Zabbix instance in 5 minutes

  • Dynamically scale the cloud resources
  • Select the deployment options based on your budget

Check out the video to learn how to deploy Zabbix in the Microsoft Azure cloud platform:

How to deploy Zabbix in the Azure cloud platform:

  1. Navigate to the Zabbix Cloud Images page
  2. Select the Microsoft Azure vendor and Zabbix server cloud image
  3. Press the Get it now button and press Continue in the next window
  4. On the deployment page press the Create button
  5. Provide the virtual machine name, resource group, and region
  6. Specify the administrator account settings
  7. Provide the disk, network, tag, and advanced settings
  8. Verify the provided settings
  9. Press Create to begin deploying the virtual machine
  10. For public key authentication: download and store the private key
  11. Once the deployment is complete, press the Go to resource button
  12. Save your public IP address and connect to it via SSH
  13. Save the initial frontend username and password
  14. Use the public IP address to connect to your Zabbix frontend
  15. Log in with the saved username and password

Tips and best practices
  • The default SSH user is called azureuser
  • Remember to store your SSH private key in a secure location
  • You can access the Zabbix database by using the root user
  • The password for the MySQL database root user is stored in the /root/.my.cnf configuration file
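If you want a script to pick up those database credentials rather than copying them by hand, a short Python sketch along the following lines may help. It assumes the file follows the usual MySQL option-file layout, with a `[client]` section containing `user` and `password` keys. The exact contents of the file on the Zabbix image are not described in this post, so check them first. The function names are hypothetical.

```python
import configparser
from typing import Optional, Tuple


def _unquote(value: Optional[str]) -> Optional[str]:
    """Drop surrounding whitespace and quotes, which option files allow."""
    return value.strip().strip("'\"") if value else value


def parse_mysql_client_credentials(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (user, password) from the [client] section of option-file text."""
    parser = configparser.ConfigParser(allow_no_value=True, interpolation=None)
    parser.read_string(text)
    client = parser["client"]  # assumption: credentials live under [client]
    return _unquote(client.get("user")), _unquote(client.get("password"))


def read_mysql_client_credentials(path: str = "/root/.my.cnf"):
    """Read the file from disk; on the VM this needs root privileges."""
    with open(path, encoding="utf-8") as handle:
        return parse_mysql_client_credentials(handle.read())
```

Keep the returned password out of logs and shell history, for the same reason the SSH private key should be stored securely.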

Feeling overwhelmed with deploying and managing your Zabbix instance?
Check out the Zabbix certified specialist courses, where under the guidance of a Zabbix certified trainer, you will learn how to deploy, configure and manage your Zabbix instance.

The names of the new Astro Pi computers get revealed

Post Syndicated from Sam Duffy original https://www.raspberrypi.org/blog/new-astro-pi-computer-names-mission-zero-2021-22/

We and our collaborators at ESA Education are excited to announce that 17,168 programs written by young people from 26 countries have been successfully deployed on board the International Space Station (ISS) for the European Astro Pi Challenge 2021/22. And we can finally reveal the names of the two new and upgraded Astro Pi computers that Astro Pi participants have chosen.

The mark 2 Astro Pi units spin in microgravity on the International Space Station.
Young people participating in this year’s Astro Pi Mission Zero had the chance to help name these two upgraded Astro Pi computers, which we sent to the ISS in December.

Astro Pi is more popular than ever with young people

A record number of 28,126 young people took part across both missions in the Astro Pi Challenge 2021/22. In addition to the 299 Mission Space Lab teams who achieved flight status with the code they wrote for their scientific experiments this year, young people wrote 16,869 Mission Zero programs that were run on the new Astro Pi computers. This is an amazing 84% increase compared to Mission Zero last year.

Mission Zero is perfect for beginner coders: participants follow our step-by-step instructions and write a simple program for the Astro Pis. The program takes a humidity reading on board the ISS and displays it for the astronauts. Participants can also include code to display their own unique message on the Astro Pi LED displays. Mission Zero teams are very inventive, and the young people made great use of the Astro Pis’ LED display to create pixel art:

Pixel art coded by young people in Astro Pi Mission Zero.
Examples of pixel art images designed by Mission Zero 2021/22 teams for the Astro Pis’ LED displays.
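For a feel of what such a beginner program looks like, here is a minimal Python sketch of the idea: show a greeting, take a humidity reading, and display it. It is not the official step-by-step Mission Zero program. The function names and message text are made up for illustration, and it assumes the `sense_hat` library's `get_humidity()` and `show_message()` methods, which need a Sense HAT or an Astro Pi to run.

```python
def humidity_message(humidity_percent: float) -> str:
    """Format a humidity reading for the LED display."""
    return f"Humidity: {humidity_percent:.1f}%"


def run_mission_zero(sense, greeting: str = "Hello ISS!") -> float:
    """Show a personal greeting, then the current humidity reading.

    `sense` is any object offering get_humidity() and show_message(),
    such as a sense_hat.SenseHat instance.
    """
    sense.show_message(greeting)
    reading = sense.get_humidity()
    sense.show_message(humidity_message(reading))
    return reading


# On real hardware (assumption: the sense_hat package is installed):
#     from sense_hat import SenseHat
#     run_mission_zero(SenseHat())
```

Passing the display object in as a parameter keeps the logic easy to try out on a laptop with a stand-in object before running it on the real hardware.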

Every Mission Zero participant receives a unique certificate showing exactly where the ISS was on its orbital path when their program was run.

The new Astro Pi computers’ names

This year, the deployment of all the Mission Zero and Mission Space Lab programs was overseen by ESA astronaut Matthias Maurer. But before he could do that, he first had an extra special task: unpacking and assembling the brand-new Astro Pi units in microgravity.

Matthias catching Astro Pis in microgravity.

The two original Astro Pis, named Ed and Izzy, travelled to the ISS back in 2015 as part of Tim Peake’s Principia mission. Since then, these two special Raspberry Pi computers have run programs written by more than 54,000 young people. They have done an amazing job and will return to Earth later in 2022.

This year’s European Astro Pi Challenge is the first to use the two all-new Astro Pi computers, which we sent up to the ISS in December 2021. They are packed with special features, widening young people’s possibilities for new Mission Space Lab experiments. Running this year’s 17,168 programs was the new Astro Pis’ first task. 

Two Astro Pi units on board the International Space Station.
The two new Astro Pi computers on board the ISS

All young people taking part in Mission Zero this year had a once-in-a-lifetime opportunity: they got to suggest and vote for the names of the two new Astro Pi computers. We received nearly 7,000 name suggestions.

ESA astronaut Matthias Maurer has recorded a special message for all Astro Pi participants, revealing that the new Astro Pi computers will be named in honour of two inspirational European scientists (drum roll…): Nikola Tesla and Marie Curie!

The Astro Pi unit equipped with a Raspberry Pi High Quality Camera that is sensitive to near-infrared light is now called Nikola, and the Astro Pi unit with a visible-light sensitive High Quality Camera is now called Marie.

Marie Curie was born in Poland in 1867 and was the first person ever to win two Nobel Prizes, in Physics and Chemistry, for her contributions to pioneering work on radioactivity and the treatment of cancer. Nikola Tesla was born in Croatia in 1856, and his innovations in electrical engineering included alternating current (vital for transmitting electricity over long distances) and the induction motor.

Marie Curie and Nikola Tesla’s work continues to impact all of our lives today, and we are delighted that this year’s Astro Pi participants have democratically chosen their names for the new Astro Pi computers.

Sign up for news about the next Astro Pi Challenge

The European Astro Pi Challenge will be back again in September 2022. Subscribe to the Astro Pi newsletter on the Astro Pi website to be the first to hear when the 2022/23 missions have lift off! 

Use the AWS Glue connector to read and write Apache Iceberg tables with ACID transactions and perform time travel

Post Syndicated from Tomohiro Tanaka original https://aws.amazon.com/blogs/big-data/use-the-aws-glue-connector-to-read-and-write-apache-iceberg-tables-with-acid-transactions-and-perform-time-travel/

Nowadays, many customers have built their data lakes as the core of their data analytic systems. In a typical use case of data lakes, many concurrent queries run to retrieve consistent snapshots of business insights by aggregating query results. A large volume of data constantly comes from different data sources into the data lakes. There is also a common demand to reflect the changes occurring in the data sources into the data lakes. This means that not only inserts but also updates and deletes need to be replicated into the data lakes.

Apache Iceberg provides ACID transactions on your data lakes, which lets concurrent queries add or delete records in isolation from existing queries while readers continue to see a consistent view of the data. Iceberg is an open table format designed for large analytic workloads on huge datasets. You can perform ACID transactions against your data lakes by using simple SQL expressions. It also enables time travel, rollback, hidden partitioning, and schema evolution changes, such as adding, dropping, renaming, updating, and reordering columns.

AWS Glue is one of the key elements to building data lakes. It extracts data from multiple sources and ingests your data to your data lake built on Amazon Simple Storage Service (Amazon S3) using both batch and streaming jobs. To expand the accessibility of your AWS Glue extract, transform, and load (ETL) jobs to Iceberg, AWS Glue provides an Apache Iceberg connector. The connector allows you to build Iceberg tables on your data lakes and run Iceberg operations such as ACID transactions, time travel, rollbacks, and so on from your AWS Glue ETL jobs.

In this post, we give an overview of how to set up the Iceberg connector for AWS Glue and configure the relevant resources to use Iceberg with AWS Glue jobs. We also demonstrate how to run typical Iceberg operations on AWS Glue interactive sessions with an example use case.

Apache Iceberg connector for AWS Glue

With the Apache Iceberg connector for AWS Glue, you can take advantage of the following Iceberg capabilities:

  • Basic operations on Iceberg tables – This includes creating Iceberg tables in the AWS Glue Data Catalog and inserting, updating, and deleting records with ACID transactions in the Iceberg tables
  • Inserting and updating records – You can run UPSERT (update and insert) queries for your Iceberg table
  • Time travel on Iceberg tables – You can read a specific version of an Iceberg table from table snapshots that Iceberg manages
  • Rollback of table versions – You can revert an Iceberg table back to a specific version of the table
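To give a rough idea of what the upsert and rollback operations in the list above look like in Spark SQL, the Python sketch below builds the statement text. The helper names and the catalog, database, table, and column names in the usage note are hypothetical. The `MERGE INTO` form and the `rollback_to_snapshot` procedure come from the Iceberg Spark documentation, and they assume the Iceberg SQL extensions are enabled in the Spark session.

```python
def upsert_statement(target: str, source_view: str, key: str) -> str:
    """Build an Iceberg MERGE INTO (upsert) statement keyed on one column."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


def rollback_statement(catalog: str, table: str, snapshot_id: int) -> str:
    """Build a call to Iceberg's rollback_to_snapshot stored procedure."""
    return (
        f"CALL {catalog}.system.rollback_to_snapshot"
        f"('{table}', {int(snapshot_id)})"
    )
```

In a notebook, the resulting strings would be passed to `spark.sql(...)`, for example `spark.sql(upsert_statement("glue_catalog.demo_db.reviews", "updates", "review_id"))`. A time travel read of a given snapshot would typically use the DataFrame reader with the `snapshot-id` option instead of SQL in this Iceberg version.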

Iceberg offers additional useful capabilities such as hidden partitioning; schema evolution with add, drop, update, and rename support; automatic data compaction; and more. For more details about Iceberg, refer to the Apache Iceberg documentation.

Next, we demonstrate how the Apache Iceberg connector for AWS Glue works for each Iceberg capability based on an example use case.

Overview of example customer scenario

Let’s assume that an ecommerce company sells products on its online platform. Customers can buy products and write reviews for each product. Customers can add, update, or delete their reviews at any time. The customer reviews are an important source for analyzing customer sentiment and business trends.

In this scenario, we have the following teams in our organization:

  • Data engineering team – Responsible for building and managing data platforms.
  • Data analyst team – Responsible for analyzing customer reviews and creating business reports. This team queries the reviews daily, creates a business intelligence (BI) report, and shares it with the sales team.
  • Customer support team – Responsible for replying to customer inquiries. This team queries the reviews when they get inquiries about the reviews.

Our solution has the following requirements:

  • Query scalability is important because the website is huge.
  • Individual customer reviews can be added, updated, and deleted.
  • The data analyst team needs to use both notebooks and ad hoc queries for their analysis.
  • The customer support team sometimes needs to view the history of the customer reviews.
  • Customer reviews can always be added, updated, and deleted, even while one of the teams is querying the reviews for analysis. This means that any result in a query isn’t affected by uncommitted customer review write operations.
  • Any changes in customer reviews that are made by the organization’s various teams need to be reflected in BI reports and query results.

In this post, we build a data lake of customer review data on top of Amazon S3. To meet these requirements, we introduce Apache Iceberg to enable adding, updating, and deleting records; ACID transactions; and time travel queries. We also use an AWS Glue Studio notebook to integrate and query the data at scale. First, we set up the connector so we can create an AWS Glue connection for Iceberg.

Set up the Apache Iceberg connector and create the Iceberg connection

To use Apache Iceberg with AWS Glue jobs, we first set up the Apache Iceberg connector for AWS Glue. In this section, we subscribe to the connector and create an AWS Glue connection for it. Complete the following steps:

  1. Navigate to the Apache Iceberg connector for AWS Glue page in AWS Marketplace.
  2. Choose Continue to Subscribe.
  3. Review the information under Terms and Conditions, and choose Accept Terms to continue.
  4. When the subscription is complete, choose Continue to Configuration.
  5. For Fulfillment option, choose Glue 3.0. (1.0 and 2.0 are also available options.)
  6. For Software version, choose the latest software version.

As of this writing, 0.12.0-2 is the latest version of the Apache Iceberg connector for AWS Glue.

  7. Choose Continue to Launch.
  8. Choose Usage instructions.
  9. Choose Activate the Glue connector from AWS Glue Studio.

You’re redirected to AWS Glue Studio.

  10. For Name, enter a name for your connection (for example, iceberg-connection).
  11. Choose Create connection and activate connector.

A message appears that the connection was successfully added, and the connection is now visible on the AWS Glue Studio console.

Configure resources and permissions

We use a provided AWS CloudFormation template to set up Iceberg configuration for AWS Glue. AWS CloudFormation creates the following resources:

  • An S3 bucket to store an Iceberg configuration file and actual data
  • An AWS Lambda function to generate an Iceberg configuration file based on parameters provided by a user for the CloudFormation template, and to clean up the resources created through this post
  • AWS Identity and Access Management (IAM) roles and policies with necessary permissions
  • An AWS Glue database in the Data Catalog to register Iceberg tables

To deploy the CloudFormation template, complete the following steps:

  1. Choose Launch Stack.
  2. For DynamoDBTableName, enter a name for an Amazon DynamoDB table that is created automatically when AWS Glue creates an Iceberg table.

This table is used for an AWS Glue job to obtain a commit lock to avoid concurrently modifying records in Iceberg tables. For more details about commit locking, refer to DynamoDB for Commit Locking. Note that you shouldn’t specify the name of an existing table.

  3. For IcebergDatabaseName, enter a name for the AWS Glue database that is created in the Data Catalog and used for registering Iceberg tables.
  4. Choose Next.
  5. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  6. Choose Create stack.

Start an AWS Glue Studio notebook to use Apache Iceberg

After you launch the CloudFormation stack, you create an AWS Glue Studio notebook to perform Iceberg operations. Complete the following steps:

  1. Download the Jupyter notebook file.
  2. On the AWS Glue console, choose Jobs in the navigation pane.
  3. Under Create job, select Jupyter Notebook.
  4. Select Upload and edit an existing notebook and upload iceberg-with-glue.ipynb.
  5. Choose Create.
  6. For Job name, enter a name.
  7. For IAM role, choose IcebergConnectorGlueJobRole, which was created via the CloudFormation template.
  8. Choose Start notebook job.

The process takes a few minutes to complete, after which you can see an AWS Glue Studio notebook view.

  9. Choose Save to save the notebook.

Set up the Iceberg configuration

To set up the Iceberg configuration, complete the following steps:

  1. Run the following cells with multiple options (magics). Note that you set your connection name for the %connections magic in the cell.

For more information, refer to Configuring AWS Glue Interactive Sessions for Jupyter and AWS Glue Studio notebooks.

A message Session <session-id> has been created appears when your AWS Glue Studio notebook is ready.

In the last cell in this section, you load your Iceberg configuration, which you specified when launching the CloudFormation stack. The Iceberg configuration includes a warehouse path for the actual Iceberg data, a DynamoDB table name for commit locking, a database name for your Iceberg tables, and more.

To load the configuration, set the S3 bucket name that was created via the CloudFormation stack.

  2. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  3. Choose the stack you created.
  4. On the Outputs tab, copy the S3 bucket name.
  5. Set the S3 bucket name as the S3_BUCKET parameter in your notebook.
  6. Run the cell and load the Iceberg configuration that you set.

Initialize the job with Iceberg configurations

In this section, we continue running cells to initialize a SparkSession.

  1. Set an Iceberg warehouse path and a DynamoDB table name for Iceberg commit locking from the user_config parameter.
  2. Initialize a SparkSession by setting the Iceberg configurations.
  3. With the SparkSession object, create SparkContext and GlueContext objects.

The following screenshot shows the relevant section in the notebook.

We provide the details of each parameter that you configure for the SparkSession in the appendix of this post.
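
The notebook cell itself is not reproduced in this text, so the following is only a minimal sketch of how the settings listed in the appendix could be assembled. The function names build_iceberg_spark_conf and apply_conf and the catalog name glue_catalog are assumptions made for illustration; the bucket path and DynamoDB table name are the placeholder values that appear elsewhere in this post.

```python
def build_iceberg_spark_conf(catalog, warehouse_path, dynamodb_table):
    """Return the Spark settings listed in the appendix as a plain dict."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse_path,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.lock-impl": "org.apache.iceberg.aws.glue.DynamoLockManager",
        f"{prefix}.lock.table": dynamodb_table,
        "spark.sql.extensions": (
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
        ),
        "spark.sql.session.timeZone": "UTC",
    }


def apply_conf(builder, conf):
    """Apply each key/value pair to a SparkSession.builder-style object."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Placeholder values; the notebook reads the real ones from user_config.
conf = build_iceberg_spark_conf(
    catalog="glue_catalog",  # hypothetical catalog name
    warehouse_path="s3://your-bucket/data/",
    dynamodb_table="myGlueLockTable",
)
for key, value in conf.items():
    print(f"{key} = {value}")

# In the notebook, a session could then be created along these lines:
# from pyspark.sql import SparkSession
# spark = apply_conf(SparkSession.builder, conf).getOrCreate()
```

Keeping the settings in a dict makes it easy to print and review them before the session starts, and the same dict could be turned into AWS Glue job parameters instead.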

For this post, we demonstrate setting the Spark configuration for Iceberg. You can also set the configuration as AWS Glue job parameters. For more information, refer to the Usage Information section in the Iceberg connector product page.

Use case walkthrough

To walk through our use case, we use two tables: acr_iceberg and acr_iceberg_report. The table acr_iceberg contains the customer review data. The table acr_iceberg_report contains BI analysis results based on the customer review data. All changes to acr_iceberg also impact acr_iceberg_report. The table acr_iceberg_report needs to be updated daily, right before sharing business reports with stakeholders.

To demonstrate this use case, we walk through the following typical steps:

  1. A data engineering team registers the acr_iceberg and acr_iceberg_report tables in the Glue Data Catalog.
  2. Customers (ecommerce users) add reviews to products in the Industrial_Supplies category. These reviews are added to the Iceberg table.
  3. A customer requests to update their reviews. We simulate updating the customer review in the acr_iceberg table.
  4. We reflect the updated review from acr_iceberg in acr_iceberg_report.
  5. At the customer’s request, we revert the updated review in the customer review table acr_iceberg, and reflect the reversion in acr_iceberg_report.

1. Create Iceberg tables of customer reviews and BI reports

In this step, the data engineering team creates the acr_iceberg Iceberg table for customer reviews data (based on the Amazon Customer Reviews Dataset), and the team creates the acr_iceberg_report Iceberg table for BI reports.

Create the acr_iceberg table for customer reviews

The following code first reads the Amazon customer reviews, which are stored in a public S3 bucket. It then creates an Iceberg table of the customer reviews and loads these reviews into your specified S3 bucket (created via the CloudFormation stack). Note that the script loads only part of the dataset to keep the load time short.

# Loading the dataset and creating an Iceberg table. This will take about 3-5 minutes.
spark.read \
    .option('basePath', INPUT_BASE_PATH) \
    .parquet(*INPUT_CATEGORIES) \
    .writeTo(f'{CATALOG}.{DATABASE}.{TABLE}') \
    .tableProperty('format-version', '2') \
    .create()

Regarding the tableProperty parameter, we specify format version 2 to make the table version compatible with Amazon Athena. For more information about Athena support for Iceberg tables, refer to Considerations and limitations. To learn more about the difference between Iceberg table versions 1 and 2, refer to Appendix E: Format version changes.

Let’s run the following cells. Running the second cell takes around 3–5 minutes.

After you run the cells, the acr_iceberg table is available in your specified database in the Glue Data Catalog.

You can also see the actual data and metadata of the Iceberg table in the S3 bucket that is created through the CloudFormation stack. Iceberg creates the table and writes actual data and relevant metadata that includes table schema, table version information, and so on. See the following objects in your S3 bucket:

$ aws s3 ls 's3://your-bucket/data/' --recursive
YYYY-MM-dd hh:mm:ss   83616660 data/iceberg_blog_default.db/acr_iceberg/data/00000-44-c2983230-c43a-4f4a-9b89-1f7c13e59645-00001.parquet
YYYY-MM-dd hh:mm:ss   83247771 
...
YYYY-MM-dd hh:mm:ss       5134 data/iceberg_blog_default.db/acr_iceberg/metadata/00000-bc5d3ea2-280f-4e28-a71f-4c2b749ed637.metadata.json
YYYY-MM-dd hh:mm:ss     116950 data/iceberg_blog_default.db/acr_iceberg/metadata/411308cd-1f4d-4535-9444-f6b56a56697f-m0.avro
YYYY-MM-dd hh:mm:ss       3821 data/iceberg_blog_default.db/acr_iceberg/metadata/snap-6122957686233868728-1-411308cd-1f4d-4535-9444-f6b56a56697f.avro

The job tries to create a DynamoDB table, which you specified in the CloudFormation stack (in the following screenshot, its name is myGlueLockTable), if it doesn’t exist already. As we discussed earlier, the DynamoDB table is used for commit locking for Iceberg tables.

Create the acr_iceberg_report Iceberg table for BI reports

The data engineering team also creates the acr_iceberg_report table for BI reports in the Glue Data Catalog. This table initially has the following records.

comment_count avg_star product_category
1240 4.20729367860598 Camera
95 4.80167540490342 Industrial_Supplies
663 3.80123467540571 PC

To create the table, run the following cell.

The two Iceberg tables have been created. Let’s check the acr_iceberg table records by running a query.

Determine the average star rating for each product category by querying the Iceberg table

You can see the Iceberg table records by using a SELECT statement. In this section, we run an ad hoc query against the acr_iceberg table to simulate viewing current BI report data.

Run the following cell in the notebook to get the aggregated number of customer comments and mean star rating for each product_category.

The cell output has the following results.

Another way to query Iceberg tables is to use Amazon Athena (to use Athena with Iceberg tables, you need to set up the Iceberg environment) or Amazon EMR.
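
The aggregation described in this section can be sketched as a plain SQL string. This is an illustration rather than the notebook's actual cell: the helper build_report_query and the catalog name glue_catalog are assumptions, while the database name, table name, and column names are the ones used in this post.

```python
def build_report_query(catalog, database, table="acr_iceberg"):
    """Build a query returning comment count and mean star rating per category."""
    return (
        "SELECT count(*) AS comment_count, "
        "avg(star_rating) AS avg_star, "
        "product_category "
        f"FROM {catalog}.{database}.{table} "
        "GROUP BY product_category"
    )


# glue_catalog is a hypothetical catalog name.
query = build_report_query("glue_catalog", "iceberg_blog_default")
print(query)

# In the notebook, the query would be run with something like:
# spark.sql(query).show()
```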

2. Add customer reviews in the Iceberg table

In this section, customers add comments for some products in the Industrial_Supplies product category, and we add these comments to the acr_iceberg table. To demonstrate this scenario, we create a Spark DataFrame based on the following new customer reviews and then add them to the table with an INSERT statement.

marketplace | customer_id | review_id | product_id | product_parent | product_title | star_rating | helpful_votes | total_votes | vine | verified_purchase | review_headline | review_body | review_date | year | product_category
US | 12345689 | ISB35E4556F144 | I00EDBY7X8 | 989172340 | plastic containers | 5 | 0 | 0 | N | Y | Five Stars | Great product! | 2022-02-01 | 2022 | Industrial_Supplies
US | 78901234 | IS4392CD4C3C4 | I00D7JFOPC | 952000001 | battery tester | 3 | 0 | 0 | N | Y | nice one, but it broke some days later | nope | 2022-02-01 | 2022 | Industrial_Supplies
US | 12345123 | IS97B103F8B24C | I002LHA74O | 818426953 | spray bottle | 2 | 1 | 1 | N | N | Two Stars | the bottle isn’t as big as pictured. | 2022-02-01 | 2022 | Industrial_Supplies
US | 23000093 | ISAB4268D46F3X | I00ARPLCGY | 562945918 | 3d printer | 5 | 3 | 3 | N | Y | Super great | very useful | 2022-02-01 | 2022 | Industrial_Supplies
US | 89874312 | ISAB4268137V2Y | I80ARDQCY | 564669018 | circuit board | 4 | 0 | 0 | Y | Y | Great, but a little bit expensive | you should buy this, but note the price | 2022-02-01 | 2022 | Industrial_Supplies

Run the following cells in the notebook to insert the customer comments into the Iceberg table. The process takes about 1 minute.

Run the next cell to confirm that the Industrial_Supplies product category now appears in the results with a comment_count of 5.

3. Update a customer review in the Iceberg table

In the previous section, we added new customer reviews to the acr_iceberg Iceberg table. In this section, a customer requests an update of their review. Specifically, customer 78901234 requests the following update of the review ID IS4392CD4C3C4.

  • Change star_rating from 3 to 5
  • Update review_headline from “nice one, but it broke some days later” to “very good”

To apply the update, run the following cell, which issues an UPDATE query.
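
As a rough sketch of what such an UPDATE statement could look like (not the notebook's actual cell), the following builds the SQL as a string. The helper build_update_query and the catalog name glue_catalog are assumptions, as is the choice to quote customer_id and review_id as strings.

```python
def build_update_query(catalog, database, customer_id, review_id,
                       star_rating, review_headline):
    """Build an UPDATE statement for one review in the acr_iceberg table."""
    # Double any single quotes so the headline is a valid SQL string literal.
    safe_headline = review_headline.replace("'", "''")
    return (
        f"UPDATE {catalog}.{database}.acr_iceberg "
        f"SET star_rating = {int(star_rating)}, "
        f"review_headline = '{safe_headline}' "
        f"WHERE customer_id = '{customer_id}' "
        f"AND review_id = '{review_id}'"
    )


# glue_catalog is a hypothetical catalog name.
update_query = build_update_query(
    catalog="glue_catalog",
    database="iceberg_blog_default",
    customer_id="78901234",
    review_id="IS4392CD4C3C4",
    star_rating=5,
    review_headline="very good",
)
print(update_query)

# In the notebook, the statement would be run with something like:
# spark.sql(update_query)
```

Filtering on both customer_id and review_id keeps the update limited to the single review the customer asked to change.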

To review the updated record, run the next cell.

Also, when you run this cell for the reporting table, you can see the updated avg_star column value for the Industrial_Supplies product category. Specifically, the avg_star value has been updated from 3.8 to 4.2 as a result of the star_rating changing from 3 to 5.
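
The change from 3.8 to 4.2 can be checked by hand. The short sketch below uses the star_rating values of the five new Industrial_Supplies reviews listed in section 2 and recomputes the mean before and after the update; it only illustrates the arithmetic and does not touch the Iceberg table.

```python
from statistics import mean

# star_rating of the five new Industrial_Supplies reviews, keyed by review_id
ratings = {
    "ISB35E4556F144": 5,
    "IS4392CD4C3C4": 3,
    "IS97B103F8B24C": 2,
    "ISAB4268D46F3X": 5,
    "ISAB4268137V2Y": 4,
}

avg_before = mean(list(ratings.values()))  # (5 + 3 + 2 + 5 + 4) / 5

# Customer 78901234 changes the rating of review IS4392CD4C3C4 from 3 to 5.
ratings["IS4392CD4C3C4"] = 5
avg_after = mean(list(ratings.values()))   # (5 + 5 + 2 + 5 + 4) / 5

print(f"avg_star before update: {avg_before:.1f}")
print(f"avg_star after update:  {avg_after:.1f}")
```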

4. Reflect changes in the customer reviews table in the BI report table with a MERGE INTO query

In this section, we reflect the changes in the acr_iceberg table into the BI report table acr_iceberg_report. To do so, we run the MERGE INTO query and combine the two tables based on the condition of the product_category column in each table. This query works as follows:

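  • When a product_category value exists in both tables, the query updates the existing record in acr_iceberg_report: comment_count becomes the sum of the two values, and avg_star becomes their average
  • When a product_category value has no match in acr_iceberg_report, the query inserts it as a new record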

This MERGE INTO operation is also referred to as an UPSERT (update and insert).

Run the following cell to reflect the update of customer reviews in the acr_iceberg table into the acr_iceberg_report BI table.

After the MERGE INTO query is complete, you can see the updated acr_iceberg_report table by running the following cell.

The MERGE INTO query performed the following changes:

  • In the Camera, Industrial_Supplies, and PC product categories, each comment_count is the sum of the initial value in the acr_iceberg_report table and the value in the aggregated table. For example, in the Industrial_Supplies product category row, the comment_count 100 is calculated as 95 (in the initial version of acr_iceberg_report) + 5 (in the aggregated table).
  • In addition to comment_count, the avg_star in the Camera, Industrial_Supplies, and PC product category rows is computed by averaging the avg_star value in acr_iceberg_report and the avg_star value in the aggregated table.
  • In other product categories, each comment_count and avg_star is the same as each value in the aggregated table, which means that each value in the aggregated table is inserted into the acr_iceberg_report table.
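
The merge rules described above can be sketched in two ways: as a MERGE INTO statement, and as a small pure-Python simulation of the same arithmetic. Both are illustrations, not the notebook's actual cell. The template placeholders, the function merge_report, and the category Hypothetical_Category are made up for this example; the Industrial_Supplies numbers come from this post.

```python
import math

# Sketch of a MERGE INTO statement with the semantics described in this section.
# {catalog}, {database}, and {source_view} are placeholders to fill in.
MERGE_SQL_TEMPLATE = """
MERGE INTO {catalog}.{database}.acr_iceberg_report AS t
USING {source_view} AS s
ON t.product_category = s.product_category
WHEN MATCHED THEN UPDATE SET
    t.comment_count = t.comment_count + s.comment_count,
    t.avg_star = (t.avg_star + s.avg_star) / 2
WHEN NOT MATCHED THEN INSERT *
"""


def merge_report(report, aggregated):
    """Mimic the merge rules on dicts keyed by product_category."""
    merged = {category: dict(row) for category, row in report.items()}
    for category, row in aggregated.items():
        if category in merged:
            # Matched: add the counts and average the two avg_star values.
            target = merged[category]
            target["comment_count"] += row["comment_count"]
            target["avg_star"] = (target["avg_star"] + row["avg_star"]) / 2
        else:
            # Not matched: insert the aggregated row unchanged.
            merged[category] = dict(row)
    return merged


report = {
    "Industrial_Supplies": {"comment_count": 95, "avg_star": 4.80167540490342},
}
aggregated = {
    "Industrial_Supplies": {"comment_count": 5, "avg_star": 4.2},
    "Hypothetical_Category": {"comment_count": 10, "avg_star": 4.0},
}

merged = merge_report(report, aggregated)
print(merged["Industrial_Supplies"]["comment_count"])  # 95 + 5 = 100
print(merged["Industrial_Supplies"]["avg_star"])
print(merged["Hypothetical_Category"])
```

Note that averaging two averages this way is not weighted by comment_count; the simulation simply follows the behavior this section describes.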

5. Roll back the Iceberg tables and reflect changes in the BI report table

In this section, the customer who requested the update of the review now requests to revert the updated review.

Iceberg keeps a version of a table for each operation performed on it. We can see information about each table version by inspecting the table, and we can also time travel or roll back a table to an older version.

To complete the customer request to revert the updated review, we need to revert the table version of acr_iceberg to the earlier version when we first added the reviews. Additionally, we need to update the acr_iceberg_report table to reflect the rollback of the acr_iceberg table version. Specifically, we need to perform the following three steps to complete these operations:

  1. Check the history of table changes of acr_iceberg and acr_iceberg_report to get each table snapshot.
  2. Roll back acr_iceberg to the version in which we first inserted the records, and roll back acr_iceberg_report to its initial version to discard the merged customer review update.
  3. Merge the acr_iceberg table with the acr_iceberg_report table again.

Get the metadata of each report table

As a first step, we check table versions by inspecting the table. Run the following cells.

Now you can see the following table versions in acr_iceberg and acr_iceberg_report:

  • acr_iceberg has three versions:
    • The oldest one is the initial version of this table, which shows the append operation
    • The second oldest one is the record insertion, which shows the append operation
    • The latest one is the update, which shows the overwrite operation
  • acr_iceberg_report has two versions:
    • The oldest one is the initial version of this table, which shows the append operation
    • The other one is from the MERGE INTO query in the previous section, which shows the overwrite operation
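
A version list like the one above can be read from Iceberg's snapshots metadata table. The sketch below builds such a query as a string; the helper build_snapshots_query and the catalog name glue_catalog are assumptions made for illustration, and the notebook's own inspection cells may differ.

```python
def build_snapshots_query(catalog, database, table):
    """Build a query listing a table's snapshots, oldest first."""
    return (
        "SELECT committed_at, snapshot_id, parent_id, operation "
        f"FROM {catalog}.{database}.{table}.snapshots "
        "ORDER BY committed_at"
    )


# glue_catalog is a hypothetical catalog name.
snapshot_queries = {
    table: build_snapshots_query("glue_catalog", "iceberg_blog_default", table)
    for table in ("acr_iceberg", "acr_iceberg_report")
}
for table, query in snapshot_queries.items():
    print(f"-- {table}")
    print(query)

# In the notebook, each query would be run with something like:
# spark.sql(query).show(truncate=False)
```

The operation column is where the append and overwrite values mentioned above would show up, and snapshot_id is the value needed for a rollback.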

As shown in the following screenshot, we roll back acr_iceberg to the version created by the record insertion, based on the customer’s revert request. We also roll back acr_iceberg_report to its initial version to discard the MERGE INTO operation from the previous section.

Roll back the acr_iceberg and acr_iceberg_report tables

Based on your snapshot IDs, you can roll back each table version:

  • For acr_iceberg, use the second-oldest snapshot_id (in this example, 5440744662350048750) and replace <Type snapshot_id in ace_iceberg table> in the following cell with this snapshot_id.
  • For the acr_iceberg_report table, use the initial snapshot_id (in this example, 7958428388396549892) and replace <Type snaphost_id in ace_iceberg_report table> in the following cell with this snapshot_id.

After you specify the snapshot_id for each rollback query, run the following cells.
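
One way to express such a rollback is Iceberg's rollback_to_snapshot Spark procedure. The sketch below builds the CALL statements with the example snapshot IDs from this post. The helper build_rollback_call and the catalog name glue_catalog are assumptions, and the notebook's own cells may be written differently.

```python
def build_rollback_call(catalog, database, table, snapshot_id):
    """Build a CALL statement that rolls a table back to the given snapshot."""
    return (
        f"CALL {catalog}.system.rollback_to_snapshot("
        f"'{database}.{table}', {int(snapshot_id)})"
    )


# Example snapshot IDs from this post; yours will differ.
# glue_catalog is a hypothetical catalog name.
rollback_calls = [
    build_rollback_call(
        "glue_catalog", "iceberg_blog_default", "acr_iceberg", 5440744662350048750
    ),
    build_rollback_call(
        "glue_catalog", "iceberg_blog_default", "acr_iceberg_report", 7958428388396549892
    ),
]
for call in rollback_calls:
    print(call)

# In the notebook, each statement would be run with something like:
# spark.sql(call).show()
```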

When this step is complete, you can see the previous and current snapshot IDs of each table.

Each Iceberg table has now been reverted to the specified version.

Reflect changes in acr_iceberg into acr_iceberg_report again

We reflect the acr_iceberg table reversion into the current acr_iceberg_report table. To complete this, run the following cell.

After you rerun the MERGE INTO query, run the following cell to see the new table records. When we compare the table records, we observe that the avg_star value for Industrial_Supplies is lower than it was after the previous merge.

You have now reflected the customer’s request to revert their updated review in the BI report table. Specifically, you can see the updated avg_star record in the Industrial_Supplies product category.

Clean up

To clean up all resources that you created, delete the CloudFormation stack.

Conclusion

In this post, we walked through using the Apache Iceberg connector with AWS Glue ETL jobs. We created an Iceberg table built on Amazon S3, and ran queries such as reading the Iceberg table data, inserting a record, merging two tables, and time travel.

The operations for the Iceberg table that we demonstrated in this post aren’t all of the operations Iceberg supports. Refer to the Apache Iceberg documentation for information about more operations.

Appendix: Spark configurations to use Apache Iceberg on AWS Glue

As we mentioned earlier, the notebook sets up a Spark configuration to integrate Iceberg with AWS Glue. The following table shows what each parameter defines.

Spark configuration key | Value | Description
spark.sql.catalog.{CATALOG} | org.apache.iceberg.spark.SparkCatalog | Specifies a Spark catalog interface that communicates with Iceberg tables.
spark.sql.catalog.{CATALOG}.warehouse | {WAREHOUSE_PATH} | A warehouse path for jobs to write Iceberg metadata and actual data.
spark.sql.catalog.{CATALOG}.catalog-impl | org.apache.iceberg.aws.glue.GlueCatalog | The implementation of the Spark catalog class to communicate between Iceberg tables and the AWS Glue Data Catalog.
spark.sql.catalog.{CATALOG}.io-impl | org.apache.iceberg.aws.s3.S3FileIO | Used for Iceberg to communicate with Amazon S3.
spark.sql.catalog.{CATALOG}.lock-impl | org.apache.iceberg.aws.glue.DynamoLockManager | Used for Iceberg to manage table locks.
spark.sql.catalog.{CATALOG}.lock.table | {DYNAMODB_TABLE} | A DynamoDB table name to store table locks.
spark.sql.extensions | org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions | The implementation that enables Spark to run Iceberg-specific SQL commands.
spark.sql.session.timeZone | UTC | Sets the time zone of the Spark environment to UTC for further Iceberg time travel queries. The epoch time is in the UTC time zone.

About the Author

Tomohiro Tanaka is a Cloud Support Engineer at Amazon Web Services. He builds Glue connectors such as Apache Iceberg connector and TPC-DS connector. He’s passionate about helping customers build data lakes using ETL workloads. In his free time, he also enjoys coffee breaks with his colleagues and making coffee at home.

New AWS whitepaper: AWS User Guide to Financial Services Regulations and Guidelines in New Zealand

Post Syndicated from Julian Busic original https://aws.amazon.com/blogs/security/new-aws-whitepaper-aws-user-guide-to-financial-services-regulations-and-guidelines-in-new-zealand/

Amazon Web Services (AWS) has released a new whitepaper to help financial services customers in New Zealand accelerate their use of the AWS Cloud.

The new AWS User Guide to Financial Services Regulations and Guidelines in New Zealand—along with the existing AWS Workbook for the RBNZ’s Guidance on Cyber Resilience—continues our efforts to help AWS customers navigate the regulatory expectations of the Reserve Bank of New Zealand (RBNZ) in a shared responsibility environment.

This whitepaper is intended for RBNZ-regulated institutions that are looking to run material workloads in the AWS Cloud, and is particularly useful for leadership, security, risk, and compliance teams that need to understand RBNZ requirements and guidance.

The whitepaper summarizes RBNZ requirements and guidance related to outsourcing, cyber resilience, and the cloud. It also gives RBNZ-regulated institutions information they can use to commence their due diligence and assess how to implement the appropriate programs for their use of AWS cloud services.

This document joins existing guides for other jurisdictions in the Asia Pacific region, such as Australia, India, Singapore, and Hong Kong. As the regulatory environment continues to evolve, we’ll provide further updates on the AWS Security Blog and the AWS Compliance page. You can find more information on cloud-related regulatory compliance at the AWS Compliance Center. You can also reach out to your AWS account manager for help finding the resources you need.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.


Author

Julian Busic

Julian is a Security Solutions Architect with a focus on regulatory engagement. He works with our customers, their regulators, and AWS teams to help customers raise the bar on secure cloud adoption and usage. Julian has over 15 years of experience working in risk and technology across the financial services industry in Australia and New Zealand.
