Introducing Point in Time queries and SQL/PPL support in Amazon OpenSearch Serverless

2024-11-19 Jagadish Kumar

Post Syndicated from Jagadish Kumar original https://aws.amazon.com/blogs/big-data/introducing-point-in-time-queries-and-sql-ppl-support-in-amazon-opensearch-serverless/

Today we announced support for three new features for Amazon OpenSearch Serverless: Point in Time (PIT) search, which enables you to maintain stable sorting for deep pagination in the presence of updates, and Piped Processing Language (PPL) and Structured Query Language (SQL), which give you new ways to query your data. Querying with SQL or PPL is useful if you’re already familiar with the language or want to integrate your domain with an application that uses them.

OpenSearch Serverless is a powerful and scalable search and analytics engine that enables you to store, search, and analyze large volumes of data while reducing the burden of manual infrastructure provisioning and scaling as you ingest, analyze, and visualize your time series and search data, simplifying data management and enabling you to derive actionable insights from data. The vector engine for OpenSearch Serverless also makes it easy for you to build modern machine learning (ML) augmented search experiences and generative artificial intelligence (generative AI) applications without needing to manage the underlying vector database infrastructure.

PIT search

Point in Time (PIT) search lets you run different queries against a dataset that’s fixed in time. Typically, when you run the same query on the same index at different points in time, you receive different results because documents are constantly indexed, updated, and deleted. With PIT, you can query against a state of your dataset for a point in time. Although OpenSearch still supports other ways of paginating results, PIT search provides superior capabilities and performance because it isn’t bound to a query and supports consistent pagination. When you create a PIT for a set of indexes, OpenSearch creates contexts to access data at that point in time and when you use a query with a PIT ID, it searches the contexts that are frozen in time to provide consistent results.

Using PIT involves the following high-level steps:

Create a PIT.
Run search queries with a PIT ID and use the search_after parameter for the next page of results.
Close the PIT.

Create a PIT

When you create a PIT, OpenSearch Serverless provides a PIT ID, which you can use to run multiple queries on the frozen dataset. Even though the indexes continue to ingest data and modify or delete documents, the PIT references the data that hasn’t changed since the PIT creation.

Run a search query with the PIT ID

PIT search isn’t bound to a query, so you can run different queries on the same dataset, which is frozen in time.

When you run a query with a PIT ID, you can use the search_after parameter to retrieve the next page of results. This gives you control over the order of documents in the pages of results.

The following response contains the first 100 documents that match the query. To get the next set of documents, you can run the same query with the last document’s sort values as the search_after parameter, keeping the same sort and pit.id. You can use the optional keep_alive parameter to extend the PIT time.

Close the PIT

When your queries on the dataset are complete, you can delete the PIT using the DELETE operation. PITs automatically expire after the keep_alive duration.

Considerations and limitations

Keep in mind the following limitations when using this feature:

Search slicing is not supported in OpenSearch Serverless
PIT list segment is not supported
The total number of open PITs are restricted to 300 per collection that share the same AWS Key Management Service (AWS KMS) key

SQL and PPL support

OpenSearch Serverless provides a primary query interface called query DSL that you can use to search your data. Query DSL is a flexible language with a JSON interface. In addition to DSL, you can now extract insights out of OpenSearch Serverless using the familiar SQL query syntax.

You can use the SQL and PPL API, the /plugins/_sql and /plugins/_ppl endpoints respectively, to search the data. You can use aggregations, group by, and where clauses to investigate your data and read your data as JSON documents or CSV tables, so you have the flexibility to use the format that works best for you. By default, queries return data in JDBC format. You can specify the response format as JDBC, standard OpenSearch JSON, CSV, or raw.

Use the /plugins/_sql endpoint to send SQL queries to the SQL plugin, as shown in the following example.

Besides basic filtering and aggregation, OpenSearch SQL also supports complex queries, such as querying semi-structured data, set operations, sub-queries and limited JOINs. Beyond the standard functions, OpenSearch functions are provided for better analytics and visualization.

For PPL queries, use the /plugins/_ppl endpoint to send queries to the SQL plugin.

Considerations and limitations

Keep in mind the following:

Query Workbench is not supported for SQL and PPL queries
The SQL and PPL CLI is supported and can be used to issue SQL and PPL queries
DELETE statements are not supported
SQL plugin data sources are not supported
The SQL query stats API is not supported

Summary

In this post, we discussed new features in OpenSearch Serverless. PIT is a useful feature when you need to maintain a consistent view of your data for pagination during search operations. SQL in OpenSearch Service bridges the gap between traditional relational database concepts and the flexibility of OpenSearch’s document-oriented data storage. You can send SQL and PPL queries to the _sql and _ppl endpoints, respectively, and use aggregations, group by, and where clauses to analyze their data.

For more information, refer to :

About the Authors

Jagadish Kumar (Jag) is a Senior Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Frank Dattalo is a Software Engineer with Amazon OpenSearch Service. He focuses on the search and plugin experience in Amazon OpenSearch Serverless. He has an extensive background in search, data ingestion, and AI/ML. In his free time, he likes to explore Seattle’s coffee landscape.

Milav Shah is an Engineering Leader with Amazon OpenSearch Service. He focuses on the search experience for OpenSearch customers. He has extensive experience building highly scalable solutions in databases, real-time streaming, and distributed computing. He also possesses functional domain expertise in verticals like Internet of Things, fraud protection, gaming, and ML/AI. In his free time, he likes to ride his bicycle, hike, and play chess.

The serverless attendee’s guide to AWS re:Invent 2024

2024-11-19 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/the-serverless-attendees-guide-to-aws-reinvent-2024/

AWS re:Invent 2024 offers an extensive selection of serverless and application integration content.

AWS re:Invent Banner

For detailed descriptions and schedule, visit the AWS re:Invent Session Catalog.

Join AWS serverless experts and community members at the AWS Modern Apps and Open Source Zone in the AWS Expo Village. This serves as a hub for serverless discussions at re:Invent. While you are there, enjoy a free coffee and learn about serverless architectures at the Serverlesspresso booth. There are two this year, another one at the Certificate Lounge. The AWS Expo Village also includes Serverless and Serverless Containers booths.

Don’t have a ticket yet? Join us in Las Vegas from November 28-December 2, 2022 by registering for re:Invent 2024.

This guide organizes the sessions into categories to help you find the content this is most relevant to you.

Session Types

Breakout Sessions are lecture-style presentations covering architecture, best practices, and deep dives into AWS services.
Workshops are 2-hour hands-on sessions where you work through tasks in AWS accounts using AWS services. Laptops are required and AWS credits are provided.
Chalk Talks are highly interactive 60-minute sessions with smaller audiences, focused on technical deep dives with whiteboards for architectural discussions.
Builders’ Sessions are 60-minute small-group sessions led by an AWS expert who guides you through a technical problem using AWS services.
Code Talks are 60-minute live coding sessions where AWS experts show how to build solutions using AWS services.

Leadership session: Nick Coult, Usman Khalid, Kathleen deValk

SVS211: Celebrating 10 years of pioneering serverless and containers – Breakout.
- Explore how serverless has evolved to help organizations drive the highest performance, availability, and security at low costs.

Getting started sessions

Are you new to serverless or taking your first steps? Hear from AWS experts and customers on best practices and strategies for building serverless workloads. Get hands-on with services by attending a workshop or builders session. Create the next great “to do” app or add a new customer experience for a theme park.

SVS202: Thinking serverless – Chalk Talk
- Learn how to approach building solutions with a serverless mindset by breaking down business problems into serverless building blocks.
SVS205: Building a serverless web application for a theme park – Workshop
- Learn how to build a complete serverless web application for a theme park called Innovator Island.
SVS201: Getting started with serverless patterns – Workshop
- Learn how to recognize and apply common serverless patterns by building production-ready code for a serverless application.
SVS204: Write less code: Building applications with a serverless mindset – Builders Session
- Get more value by using built-in integrations between AWS services through configuration rather than writing glue code.
SVS207: Effectively model costs for your serverless applications – Chalk Talk
- Gain insights into modeling the cost of serverless applications on AWS by considering request loads, payload sizes, and service pricing.
API201: The AWS Step Functions workshop – Workshop
- Learn about the features of AWS Step Functions through hands-on interactive modules.
API204: Building event-driven architectures – Workshop
- Learn about the basics of event-driven design using examples involving Amazon SNS, Amazon SQS, AWS Lambda, Amazon EventBridge, and more.
API205: Unlock the power of an exceptional serverless developer experience – Code Talk
- Learn how to accelerate your serverless development with AWS tools, including Amazon Q Developer integrated into IDEs.
SEG209: Getting started building serverless SaaS architectures
- Discover how to build your first serverless application, and learn how to handle multi-tenant architectures for SaaS applications.

Understanding serverless architectures

SVS208: Balance consistency and developer freedom with platform engineering – Breakout
- Learn how platform teams can provide opinionated security, cost, observability, reliability, and sustainability patterns while maintaining developer flexibility.
SVS209: Containers or serverless functions: A path for cloud-native success – Breakout
- Explore the fundamental differences between containers and serverless functions through real-world scenarios and insights into choosing the right approach.
OPN301: Level up your serverless applications with Powertools for AWS Lambda – Workshop
- Learn why Powertools for AWS Lambda can be the developer toolkit of choice for serverless workloads.
DEV341: From single to multi-tenant: Scaling a mission-critical serverless app
- Explore how to transition a mission-critical application from a single-tenant to a multi-tenant architecture
DEV337: Zero to production serverless in 8 weeks
- Hear about a real-world project journey, from concept to production in only eight weeks. Expect practical insights, mistakes, tips, and how using the right technologies and development process can deliver results fast.

Building event-driven applications

API204: Building event-driven architectures – Workshop
- Learn about the basics of event-driven design using examples involving Amazon SNS, Amazon SQS, AWS Lambda, Amazon EventBridge, and more.
API206: How event-driven architectures can go wrong and how to fix them – Chalk Talk
- Explore common event-driven pitfalls including YOLO events, god events, observability soup, event loops, and surprise bills.
DEV321: Choosing the right serverless compute services
- Learn when to use AWS serverless compute services like AWS Lambda and Amazon ECS on AWS Fargate and how to integrate them into your application architectures.
API307: Event-driven architectures at scale: Manage millions of events – Breakout
- Discover proven patterns for building high-scale event-driven systems that can be effectively managed across a distributed organization with Amazon EventBridge.
SVS206: Building an event sourcing system using AWS serverless technologies – Chalk Talk
- Explore strategies for building effective event sourcing architectures using AWS serverless technologies to store application state as an append-only event log.
COP408: Coding for serverless observability
- Join this code talk to learn best practices for collecting signals from your serverless applications. Dive deep into techniques to effectively instrument your applications to provide you with optimal observability.

Incorporating orchestration

API201: The AWS Step Functions workshop – Workshop
- Learn about the features of AWS Step Functions through hands-on interactive modules.
API203: Building common orchestrated workflows with AWS Step Functions – Builders Session
- Build three orchestrated workflows, including streamlined data processing with Distributed Map state, external system integration using callback, and implementing the saga pattern.
API207: Optimize data processing with built-in AWS Step Functions features – Chalk Talk
- Learn to optimize your serverless data processing workflows at scale using AWS Step Functions features, including intrinsic functions and Distributed Map state.
API402: Building advanced workflows with AWS Step Functions – Breakout
- Learn how you can use generative AI to generate state machines automatically from textual descriptions and chat with your workflow to optimize it.

Understanding integration patterns

API208: Building an integration strategy for the future – Breakout
- Boost productivity and create better customer experiences by building a modern integration strategy using AWS application, data, and file integration services.
API306: Integration patterns for distributed systems – Breakout
- Learn about common design trade-offs for distributed systems and how to navigate them with design patterns, illustrated with real-world examples.
API311: Application integration for platform builders – Breakout
- Explore the implementation of application integration using serverless components in enterprise environments.

Building APIs and frontends

SVS203: Create your first API from scratch with OpenAPI and Amazon API Gateway – Builders Session
- Learn how to design and provision complete APIs using infrastructure as code following the OpenAPI specification.
API303: Building modern API architectures: Which front door should I use? – Chalk Talk
- Explore options for building modern APIs including REST, GraphQL, and real-time APIs along with their benefits and drawbacks.
API304: Building rate-limited solutions on AWS – Chalk Talk
- Learn some of the best ways to build rate limiting into your systems for improved reliability.
API305: Asynchronous frontends: Building seamless event-driven experiences – Breakout
- Explore patterns to enable asynchronous, event-driven integrations with the frontend designed for architects and frontend, backend, and full-stack engineers.

Diving deep into advanced topics

SVS401: Best practices for serverless developers – Breakout
- Discover architectural best practices, optimizations, and useful shortcuts for building production-ready serverless workloads.
SVS403: From serverful to serverless Java – Workshop
- Learn how to bring your traditional Java Spring application to AWS Lambda with minimal effort and iteratively apply optimizations.
SVS406: Scale streaming workloads with AWS Lambda – Chalk Talk
- Learn how to implement parallel processing techniques for ordered and unordered use cases to address throughput limitations in streaming data processing.

Processing data

SVS404: Building serverless distributed data processing workloads – Workshop
- Learn how serverless technologies like AWS Step Functions and AWS Lambda can help you simplify management and scaling of distributed data processing.
API401: Multi-tenant Amazon SQS queues: Mitigating noisy neighbors – Chalk Talk
- Explore advanced strategies for managing multi-tenant Amazon SQS queues and effective mitigation techniques, including shuffle sharding and overflow queues.
SVS321: AWS Lambda and Apache Kafka for real-time data processing applications – Breakout
- Gain practical insights into building scalable, serverless data processing applications by integrating AWS Lambda with Apache Kafka.

Incorporating generative AI

API209: Generative AI at scale: Serverless workflows for enterprise-ready apps – Workshop
- Learn to build enterprise-ready, scalable generative AI applications that can scale from serving 100 to 100,000 users.
API310: Build a meeting summarization solution with generative AI & serverless – Code Talk
- See live coding of a serverless application for producing meeting summaries with generative AI using Amazon Transcribe and Amazon Bedrock, orchestrated with AWS Step Functions.
SVS319: Unlock the power of generative AI with AWS Serverless – Breakout
- Learn to harness AWS Serverless to build robust, cost-effective generative AI applications. Explore using AWS Step Functions to orchestrate complex AI workflows.
SVS325: Secure access to enterprise generative AI with serverless AI gateway – Chalk Talk
- Explore how to architect a serverless AI gateway on AWS to securely integrate and consume large language models from multiple providers.

Additional resources

For social activities see the Unofficial list of AWS re:Invent Conference and Vendor Parties.

If you are attending re:Invent, connect at our AWS Modern Apps and Open Source Zone in the AWS Expo Village. The AWS Expo Village also includes Serverless and Serverless Containers booths.

If you can not join us in-person, breakout sessions will be available via our YouTube channel after the event.

We look forward to seeing you at re:Invent 2024! For more serverless learning resources, visit Serverless Land.

[$] Book review: Run Your Own Mail Server

2024-11-19 jzb

Post Syndicated from jzb original https://lwn.net/Articles/998153/

The most common piece of advice given to users who ask about
running their own mail server is don’t. Setting up
and securing a mail server in 2024 is not for the faint of heart, nor
for anyone without copious spare time. Spammers want to flood inboxes
with ads for questionable supplements, attackers want to abuse servers
to send spam (or worse), and getting the big providers to accept mail
from small servers is a constant uphill battle. Michael W. Lucas,
however, encourages users to thumb their nose at the “Email
Empire“, and declare email independence. His self-published book,
Run Your Own Mail
Server, provides a manual (and manifesto) for users who are
interested in the challenge.

Introducing Amazon MWAA micro environments for Apache Airflow

2024-11-19 Hernan Garcia

Post Syndicated from Hernan Garcia original https://aws.amazon.com/blogs/big-data/introducing-amazon-mwaa-micro-environments-for-apache-airflow/

Amazon Managed Workflows for Apache Airflow (Amazon MWAA), is a managed Apache Airflow service used to extract business insights across an organization by combining, enriching, and transforming data through a series of tasks called a workflow. It enhances infrastructure security and availability while reducing operational overhead.

Today, we’re excited to announce mw1.micro, the latest addition to Amazon MWAA environment classes. This offering is designed to provide an even more cost-effective solution for running Airflow environments in the cloud. With mw1.micro, we’re bringing the power of Amazon MWAA to teams who require a lightweight environment without compromising on essential features. In this post, we’ll explore mw1.micro characteristics, key benefits, ideal use cases, and how you can set up an Amazon MWAA environment based on this new environment class.

Customers maintain multiple MWAA environments to separate development stages, optimize resources, manage versions, enhance security, ensure redundancy, customize settings, improve scalability, and facilitate experimentation. This approach offers greater flexibility and control over workflow management. These organizations often maintain multiple AWS accounts for development, testing, and production stages, leading to increased complexity and cost. The traditional approach of using full-sized Amazon MWAA environments for development and testing can also be expensive, especially for teams working on smaller projects or proof-of-concept initiatives. Additionally, customers adopting a federated deployment model find it challenging to provide isolated environments for different teams or departments, and at the same time optimize cost. The introduction of mw1.micro addresses these pain points by offering an option that enables a more efficient resource utilization and significant cost savings.

The micro environment class

The mw1.micro configuration provides a balanced set of resources suitable for small-scale data processing and orchestration tasks. The class allocates 1 vCPU and 3GB of RAM for a scheduler/worker hybrid container. Similarly, the web server is equipped with 1 vCPU and 3 GB RAM configuration. The Amazon Elastic Container Service (Amazon ECS) tasks launched in the environment use AWS Fargate platform version 1.4.0, increasing ephemeral task storage to 20 GB.

mw1.micro environments support up to three concurrent tasks, making it ideal for sequential or lightly parallelized workflows. Additionally, it can accommodate up to 25 DAGs, providing ample capacity for organizing and managing various data pipelines and processes. This micro environment is particularly well-suited for development, testing, or small production workloads where resource optimization and cost-efficiency are primary concerns.

The following table summarizes the environment capabilities of mw1.micro.

Class/Resources	Scheduler and Worker vCPU/RAM	Web Server vCPU/RAM	Concurrent Tasks	DAG Capacity
mw1.micro	1 vCPU / 3GB	1 vCPU / 3GB	3	Up to 25

For mw1.micro, we maintain the general architecture of Amazon MWAA, and combine the Airflow scheduler and worker into a single container. For this reason, mw1.micro uses only two AWS Fargate tasks, one scheduler/worker hybrid, and one web server. The following diagram illustrates the environment architecture.

Another important change is that the meta database will now use a t4g.medium Amazon Aurora PostgreSQL-Compatible Edition instance powered by AWS Graviton2. With the Graviton2 family of processors, you get compute, storage, and networking improvements, and the reduction of your carbon footprint offered by the AWS family of processors.

Supported features

mw1.micro maintains Amazon MWAA and Airflow key functionalities that developers currently rely on:

You can set up a public or private web server, allowing you to control access to your Airflow UI as needed
You can add custom plugins and requirements, enabling you to extend Airflow’s capabilities and manage dependencies effortlessly
Startup scripts can be used to perform initialization tasks, making sure your environment is configured precisely to your specifications
The Airflow UI is fully functional, providing the same intuitive interface for managing and monitoring your workflows
It has the same networking features as other Amazon MWAA environment classes, such as custom URLs and shared virtual private cloud (VPC) support
Scheduler and worker logs remain separate in their respective Amazon CloudWatch log groups, providing ease of monitoring and troubleshooting

Considerations

The architectural decisions behind mw1.micro reflect a balance between functionality and cost-effectiveness. Here are the constraints the limited resources in mw1.micro brings:

The scheduler and worker are combined into a single Fargate task. Only a single scheduler/worker container is supported.
micro consists of a single Fargate task for the web server. The maximum number of web servers is 1.
The number of concurrent Airflow tasks in the worker (worker_autoscale) can be set to a maximum value of 3.

Pricing and availability

Amazon MWAA pricing dimensions remains unchanged, and you only pay for what you use:

The environment class
Metadata database storage consumed

Metadata database storage pricing remains the same. Refer to Amazon Managed Workflows for Apache Airflow Pricing for rates and more details.

Observe Amazon MWAA performance

When you start using the new environment class, it’s important to understand its behavior for maintaining optimal operation and identifying potential capacity issues. It’s essential to monitor key metrics such as metadata database memory usage, and CPU utilization of the worker/scheduler hybrid container. We recommend following the guidance described in Introducing container, database, and queue utilization metrics for Amazon MWAA to better understand the state of your environments, and get insights to right-size your resources.

Set up a new micro environment in Amazon MWAA

You can set up an Amazon MWAA micro environment in your account and preferred AWS Region using the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you’re adopting infrastructure as code (IaC), you can automate the setup using AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform scripts.

The Amazon MWAA micro environment class is available today in all Regions where Amazon MWAA is currently available.

Conclusion

In this post, we announced the availability of the new micro environment class in Amazon MWAA. This offering addresses the needs of teams working on smaller projects, proof-of-concept initiatives, or those requiring isolated environments for different departments. By providing a lightweight yet feature-rich solution, mw1.micro enables organizations to achieve substantial cost savings without compromising on essential functionalities.

As you explore the possibilities of mw1.micro, remember to monitor its performance using the recommended metrics to maintain optimal operation. With its availability across all Regions where Amazon MWAA is offered, your teams can now use the power of Airflow in a more streamlined and economical manner, opening up new opportunities for efficient data pipeline management and orchestration in the cloud.

For additional details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

About the Authors

Hernan Garcia is a Senior Solutions Architect at AWS based in the Netherlands. He works in the financial services industry, supporting enterprises in their cloud adoption. He is passionate about serverless technologies, security, and compliance. He enjoys spending time with family and friends, and trying out new dishes from different cuisines.

Sriharsh Adari is a Senior Solutions Architect at AWS, where he helps customers work backward from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers on data platform transformations across industry verticals. His core area of expertise includes technology strategy, data analytics, and data science. In his spare time, he enjoys playing sports, watching TV shows, and playing Tabla.

Important changes to CloudTrail events for AWS IAM Identity Center

2024-11-19 Arthur Mnev

Post Syndicated from Arthur Mnev original https://aws.amazon.com/blogs/security/modifications-to-aws-cloudtrail-event-data-of-iam-identity-center/

AWS IAM Identity Center is streamlining its AWS CloudTrail events by including only essential fields that are necessary for workflows like audit and incident response. This change simplifies user identification in CloudTrail, addressing customer feedback. It also enhances correlation between IAM Identity Center users and external directory services, such as Okta Universal Directory or Microsoft Active Directory.

Effective January 13, 2025, IAM Identity Center will stop emitting userName and principalId fields under the user identity element in CloudTrail events. These fields will be excluded from the CloudTrail events that are initiated when users sign in to IAM Identity Center, use the AWS access portal, and access AWS accounts through the AWS CLI. Instead, IAM Identity Center now emits user ID and Identity Store Amazon Resource Name (ARN) fields to replace the userName and principalId fields, simplifying user identification. IAM Identity Center CloudTrail events will also specify IdentityCenterUser as the identity type instead of Unknown, providing a clear identifier for users. Additionally, IAM Identity Center will omit the value of a group’s displayName in CloudTrail events when you create or update a group. You can access group attributes, such as displayName, by using the Identity Store DescribeGroup API operation for authorized workflows.

We recommend that you update your workflows that process the userName, principalId, userIdentity type, or group displayName fields in CloudTrail events for IAM Identity Center before these changes take effect on January 13, 2025. This blog post provides guidance for these updates.

How to prepare your workflows for the upcoming changes to IAM Identity Center user identification in CloudTrail

To simplify user identification, IAM Identity Center is making changes to the user identity element for its CloudTrail events. Based on these changes, you can update your workflows to link CloudTrail events to a specific user, associate users with their external directories, and track user activity within the same session. The updated user identity element for a sample CloudTrail event is shared at the end of this section.

IAM Identity Center will update the userIdentity type for CloudTrail events that are emitted when users sign in, use the AWS access portal, and access AWS accounts through the AWS CLI. For authenticated users, the userIdentity type will change from Unknown to IdentityCenterUser. For unauthenticated users, the userIdentity type will remain Unknown. We recommend that you update your workflows to accept both values.

To identify the user linked to a CloudTrail event, IAM Identity Center now emits userId and identityStoreArn fields to replace the userName and principalId fields. The userId is a unique and immutable user identifier that IAM Identity Center assigns to every user in the Identity Store, its native directory referenced by the identityStoreArn. These new fields enhance user identification and action tracking in CloudTrail and are present in the CloudTrail entries where the userIdentity type is IdentityCenterUser. For an example of the user identity element with the new fields and the describe-user CLI command to retrieve user attributes using the user ID and Identity Store ARN, see the Identifying the user and session in IAM Identity Center user-initiated CloudTrail events section of the IAM Identity Center User Guide.

Among other user attributes, you can use the describe-user CLI command to retrieve the external ID associated with a user in the Identity Store. You can use the external ID to associate Identity Store users with their external directories. The external ID maps the user to an immutable user identifier in their external directory, such as Microsoft Active Directory or Okta Universal Directory.

Note: IAM Identity Center doesn’t emit an external ID in CloudTrail. You need access to the Identity Store to retrieve an external ID based on the userId and identityStoreArn fields in CloudTrail.

If you have access to the CloudTrail events but not the Identity Store, you can use the UserName field emitted under the additionalEventData element to correlate your users with their external directories. This field represents the username that the user authenticates or federates with when signing in to IAM Identity Center. For more details, see the Correlating users between IAM Identity Center and external directories section of the IAM Identity Center User Guide.

Notes:

When the identity source is the AWS Directory Service, the UserName value logged in the additionalEventData element in CloudTrail is equal to the username that the user enters during authentication. For example, a user who has the username [email protected], can authenticate with anyuser, [email protected], or company.com\anyuser, and in each case the entered value is emitted in CloudTrail respectively.

For a sign-in failure caused by incorrect username input, IAM Identity Center emits the UserName field in its CloudTrail event as a fixed-text value of HIDDEN_DUE_TO_SECURITY_REASONS. This is because the username value input by the user in such a scenario could contain sensitive information, such as a user’s password.

To track user activity within the same session, IAM Identity Center now emits the credentialId field in CloudTrail events for user actions that take place in the AWS access portal or that use the AWS CLI. The credentialId field contains the AWS access portal session ID for a user, to help you track user actions during their session.

The following table shows a CloudTrail event example that illustrates the fields, highlighted in yellow, that will change on January 13, 2025. IAM Identity Center recently started emitting userId, identityStoreArn, credentialId, and UserName in the additional event data for its CloudTrail events. Therefore, this example considers them as existing fields.

Before the upcoming changes

"eventName": "CredentialChallenge",
"eventSource": "signin.amazonaws.com",
"userIdentity": {
  "type": "Unknown",
  "userName": "anyuser",
  "accountId": "123456789012",
  "principalId": "123456789012",
  "onBehalfOf": {
    "userId": "a11111-1111-1111-11a1-111aa111aa11",
    "identityStoreArn": "arn:aws:identitystore::111111111:identitystore/d-111111a1a"
  },
  "credentialId": "1111a111111111a1a11111a1a[…]"
},
"additionalEventData": {
    "CredentialType": "PASSWORD",
    "UserName": "anyuser"
}

After the upcoming changes

"eventName": "CredentialChallenge",
"eventSource": "signin.amazonaws.com",
"userIdentity": {
  "type": "IdentityCenterUser",
  "accountId": "123456789012",
  "onBehalfOf": {
    "userId": "a11111-1111-1111-11a1-111aa111aa11",
    "identityStoreArn": "arn:aws:identitystore::111111111:identitystore/d-111111a1a"
  },
  "credentialId": "1111a111111111a1a11111a1a[…]"
},
"additionalEventData": {
    "CredentialType": "PASSWORD",
    "UserName": "anyuser"
}

How to prepare your workflows for the upcoming changes to IAM Identity Center group management events in CloudTrail

Your workflows that require access to group attributes, such as displayName, can retrieve them by using the Identity Store DescribeGroup API operation. Beginning January 13, 2025, IAM Identity Center will replace the displayName value in the administrative CloudTrail events for CreateGroup and UpdateGroup with a fixed text value of HIDDEN_DUE_TO_SECURITY_REASONS. This update restricts access to the group displayName only to workflows that are authorized to access group attributes in the Identity Store.

The following table shows a CloudTrail event example that illustrates the upcoming change in the displayName field, which is highlighted in yellow.

Before the upcoming changes

"eventName": "CreateGroup",
"eventSource": "sso-directory.amazonaws.com",
"userIdentity": {
  "type": "AssumedRole",
  "userName": "GroupManagerRole",
  "accountId": "123456789012",
  "principalId": "123456789012"
}
…
"group": {
    "groupId": "11a1a111-1111-1010-aaa1-01111a1111a0",
    "displayName": "PowerUserGroup",
    "groupAttributes": {
        "description": {
            "stringValue": "HIDDEN_DUE_TO_SECURITY_REASONS"
        }
    }
}

After the upcoming changes

"eventName": "CreateGroup",
"eventSource": "sso-directory.amazonaws.com",
"userIdentity": {
  "type": "AssumedRole",
  "userName": "GroupManagerRole",
  "accountId": "123456789012",
  "principalId": "123456789012"
}
…
"group": {
    "groupId": "11a1a111-1111-1010-aaa1-01111a1111a0",
    "displayName": "HIDDEN_DUE_TO_SECURITY_REASONS",
    "groupAttributes": {
        "description": {
            "stringValue": "HIDDEN_DUE_TO_SECURITY_REASONS"
        }
    }
}

Gain a deeper understanding of the specific CloudTrail events impacted by the changes

Earlier in this post, we said that IAM Identity Center emits the relevant CloudTrail events when users sign in to IAM Identity Center, use the AWS access portal, and access AWS accounts through the AWS CLI, or when administrators create and update groups. These CloudTrail events belong to four event groups that the IAM Identity Center User Guide refers to as AWS access portal, OIDC, Sign-in, and Identity Store events. The following list provides more details about the use cases that lead to the emission of these CloudTrail events:

The AWS access Portal events cover sign-in and sign-out from the AWS access portal, as well as the retrieval of a user’s account and application assignments, which are necessary to display the portal. IAM Identity Center also emits these events when configuring AWS CLI or IDE toolkits for access to AWS accounts as an IAM Identity Center user.
The relevant OpenID Connect (OIDC) event is CreateToken. IAM Identity Center emits this event when starting a session for an authenticated user (for example, to access assigned AWS accounts through AWS CLI or IDE toolkits).
The Sign-in events cover password-based and federated authentication, as well as multi-factor authentication (MFA).
The relevant Identity Store events include the end-user management of MFA devices inside the AWS access portal and the two administrative Identity Store events, CreateGroup and UpdateGroup.

Note that some of the API operations behind the CloudTrail events in scope are also available as AWS CLI commands:

The two tables in this section provide a detailed record of the changes and their relation to CloudTrail events.

The following table lists the changes to fields emitted by IAM Identity Center and the relevant CloudTrail events.

Changes	AWS access portal (Use of the portal)	OIDC (Sign-in to IAM Identity Center through AWS CLI and IDE toolkits)	Sign-in (authentication, including MFA, federation)	Identity Store (MFA device and group management)
Available as of January 13, 2025
Exclusion of userName from the `userIdentity` element for authenticated users	Yes	Yes, limited to the `CreateToken` event	Yes	Yes, limited to MFA management in the AWS access portal
Exclusion of `principalId` from the `userIdentity` element	Yes	Yes, limited to the `CreateToken` event	Yes	Yes, limited to MFA management in the AWS access portal
Modified `userIdentity`’s type value from `Unknown` to `IdentityCenterUser`	Yes	Yes, limited to the `CreateToken` event	Yes, limited to successful authentications	Yes, limited to MFA management in the AWS access portal
Exclusion of the group `displayName` value from the `requestParameters` and `responseElements` elements	No	No	No	Yes, limited to administrative `CreateGroup` and `UpdateGroup` events
Exclusion of the UserName (in the `additionalEventData` element) a user keys in on failed authentication attempts	No	No	Yes, limited to the `CredentialChallenge` event	No
Available as of October 2024
Addition of the `onBehalfOf` element with `userId` and `identityStoreArn`, and `credentialId` in the `userIdentity` element	Yes	Yes, limited to the `CreateToken` event	Yes, limited to successful authentications	Yes, limited to MFA management in the AWS access portal
Addition of `UserName` in `additionalEventData` element	No	No	Yes, limited to `CredentialChallenge` and `UserAuthentication` events in specific cases	No

The following table summarizes the relevant IAM Identity Center CloudTrail event groups, event sources, and event names.

Event group	Source	Event names
AWS access portal	`sso.amazonaws.com`	`Authenticate Federate ListAccountRoles ListAccounts ListApplications ListProfilesForApplication GetRoleCredentials Logout`
OIDC	`sso.amazonaws.com`	`CreateToken`
Sign-in	`signin.amazon.com`	`CredentialChallenge CredentialVerification UserAuthentication`
Identity Store	`sso-directory.amazonaws.com` or `identitystore.amazonaws.com`	`ListMfaDevicesForUser DeleteMfaDeviceForUser UpdateMfaDeviceForUser StartWebAuthnDeviceRegistration StartVirtualMfaDeviceRegistration CompleteWebAuthnDeviceRegistration CompleteVirtualMfaDeviceRegistration CreateGroup UpdateGroup`

Conclusion

In this post, we reviewed several important upcoming and recently completed changes to CloudTrail events that IAM Identity Center emits. We recommend that you update your CloudTrail based workflows before January 13, 2025 if they rely on the userName, principalId, or type fields in the CloudTrail user identity element when users sign in to IAM Identity Center, use the AWS access portal, access AWS accounts through the AWS CLI, or set a group’s displayName field in group management administrative events. AWS has recently introduced the fields userId, identityStoreArn, and credentialId in the CloudTrail user identity element to help you complete your updates.

Please contact your AWS account team or AWS support if you need additional assistance.

Rapid7 Recognized for ‘Excellence in Workplace Health and Wellbeing’ at the Belfast Telegraph IT Awards

2024-11-19 Rapid7

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/11/19/rapid7-recognized-for-excellence-in-workplace-health-and-wellbeing-at-the-belfast-telegraph-it-awards/

Rapid7 Recognized for ‘Excellence in Workplace Health and Wellbeing’ at the Belfast Telegraph IT Awards

On Friday, November 15th, Rapid7 was awarded ‘Excellence in Workplace Health and Wellbeing’ at the Belfast Telegraph IT Awards. This award recognizes technology companies in Belfast that prioritize employee well-being.

At Rapid7, we believe that the best ideas and solutions come from diverse, multi-faceted teams. By supporting our people with programs that enhance their well-being and quality of life, we create an environment where they can continue to have rewarding career experiences and make an incredible impact on our business. Our programs go beyond just taking care of people when they are sick. Instead, we look to increase their overall quality of life with unique initiatives and offerings that support both physical and mental health and wellness.

Our award submission was broken down into three key areas where we offer unique benefits that make us leaders in our field. These areas included benefit offerings, physical health and well-being, and mental health and well-being.

Benefit Offerings

Rapid7 is proud to offer unique and competitive benefits to employees and their families. One example is our neurodiversity coverage. Employees at Rapid7, and their family members, have access to specialists for evaluations, screenings, and treatment programs. Appointments and services that would otherwise take months or years are able to happen within weeks.

As part of our health benefit program, once a year, our company participates in a global health and well-being challenge. This is not your typical ‘steps’ challenge, but instead a comprehensive initiative encompassing physical activity, meditation, and mindfulness, designed to build connections across Rapid7 teams.

Physical health and well-being

Our cycle-to-work scheme allows employees to set aside a salary sacrifice to purchase a new bicycle. There is no maximum limit so our employees are often able to select high-end models at an affordable rate. Employees drop in and out of the program as they wish, and this year we have 16 employees saving up to get their new bikes.

For those who prefer a gym or fitness classes, our Chichester street office building is equipped with a full service gym featuring cardio and weight training equipment, as well as a yoga and group fitness studio. The fitness studio has a variety of virtual program on demand, many of which can be completed in just 20 minutes, making it easy for employees to fit in a quick break during their day.

Mental health and wellbeing

AwareNI is a local organization that we’ve been proud to partner with. We participate in their mood matters program, bringing mental health awareness and training to employees across Rapid7. However, what is most unique is our on-site mental health first aiders. We partner with AwareNI to train employees to be on-site mental health first aiders, giving employees a resource in the office to go to if they are experiencing a mental health crisis. As mental health first aiders, these employees are equipped with the skills and knowledge to guide and support colleagues experiencing a mental health-related crisis.

At Rapid7, we are on a mission to create a secure digital world for our customers, our industry, and our communities. We show up every day to keep our 11,000+ customers around the world protected from the latest threats. This requires us to build a dynamic workplace where innovation and collaboration thrive. Taking care of our people is a critical first step, and we’re honored to have been recognized as a leader in this space. To learn more, please visit our careers site at careers.rapid7.com.

Integrate custom applications with AWS Lake Formation – Part 1

2024-11-19 Stefano Sandona

Post Syndicated from Stefano Sandona original https://aws.amazon.com/blogs/big-data/integrate-custom-applications-with-aws-lake-formation-part-1/

AWS Lake Formation makes it straightforward to centrally govern, secure, and globally share data for analytics and machine learning (ML).

With Lake Formation, you can centralize data security and governance using the AWS Glue Data Catalog, letting you manage metadata and data permissions in one place with familiar database-style features. It also delivers fine-grained data access control, so you can make sure users have access to the right data down to the row and column level.

Lake Formation also makes it straightforward to share data internally across your organization and externally, which lets you create a data mesh or meet other data sharing needs with no data movement.

Additionally, because Lake Formation tracks data interactions by role and user, it provides comprehensive data access auditing to verify the right data was accessed by the right users at the right time.

In this two-part series, we show how to integrate custom applications or data processing engines with Lake Formation using the third-party services integration feature.

In this post, we dive deep into the required Lake Formation and AWS Glue APIs. We walk through the steps to enforce Lake Formation policies within custom data applications. As an example, we present a sample Lake Formation integrated application implemented using AWS Lambda.

The second part of the series introduces a sample web application built with AWS Amplify. This web application showcases how to use the custom data processing engine implemented in the first post.

By the end of this series, you will have a comprehensive understanding of how to extend the capabilities of Lake Formation by building and integrating your own custom data processing components.

Integrate an external application

The process of integrating a third-party application with Lake Formation is described in detail in How Lake Formation application integration works.

In this section, we dive deeper into the steps required to establish trust between Lake Formation and an external application, the API operations that are involved, and the AWS Identity and Access Management (IAM) permissions that must be set up to enable the integration.

Lake Formation application integration external data filtering

In Lake Formation, it’s possible to control which third-party engines or applications are allowed to read and filter data in Amazon Simple Storage Service (Amazon S3) locations registered with Lake Formation.

To do so, you can navigate to the Application integration settings page on the Lake Formation console and enable Allow external engines to filter data in Amazon S3 locations registered with Lake Formation, specifying the AWS account IDs from where third-party engines are allowed to access locations registered with Lake Formation. In addition, you have to specify the allowed session tag values to identify trusted requests. We discuss in later sections how these tags are used.

LakeFormation Application integration

Lake Formation application integration involved AWS APIs

The following is a list of the main AWS APIs needed to integrate an application with Lake Formation:

sts:AssumeRole – Returns a set of temporary security credentials that you can use to access AWS resources.
glue:GetUnfilteredTableMetadata – Allows a third-party analytical engine to retrieve unfiltered table metadata from the Data Catalog.
glue:GetUnfilteredPartitionsMetadata – Retrieves partition metadata from the Data Catalog that contains unfiltered metadata.
lakeformation:GetTemporaryGlueTableCredentials – Allows a caller in a secure environment to assume a role with permission to access Amazon S3. To vend such credentials, Lake Formation assumes the role associated with a registered location, for example an S3 bucket, with a scope down policy that restricts the access to a single prefix.
lakeformation:GetTemporaryGluePartitionCredentials – This API is identical to GetTemporaryTableCredentials except that it’s used when the target Data Catalog resource is of type Partition. Lake Formation restricts the permission of the vended credentials with the same scope down policy that restricts access to a single Amazon S3 prefix.

Later in this post, we present a sample architecture illustrating how you can use these APIs.

External application and IAM roles to access data

For an external application to access resources in an Lake Formation environment, it needs to run under an IAM principal (user or role) with the appropriate credentials. Let’s consider a scenario where the external application runs under the IAM role MyApplicationRole that is part of the AWS account 123456789012.

In Lake Formation, you have granted access to various tables and databases to two specific IAM roles:

AccessRole1
AccessRole2

To enable MyApplicationRole to access the resources that have been granted to AccessRole1 and AccessRole2, you need to configure the trust relationships for these access roles. Specifically, you need to configure the following:

Allow MyApplicationRole to assume each of the access roles (AccessRole1 and AccessRole2) using the sts:AssumeRole
Allow MyApplicationRole to tag the assumed session with a specific tag, which is required by Lake Formation. The tag key should be LakeFormationAuthorizedCaller, and the value should match one of the session tag values specified in the Application integration settings page on the Lake Formation console (for example, “application1“).

The following code is an example of the trust relationships configuration for an access role (AccessRole1 or AccessRole2):

[
    {
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/MyApplicationRole"
        },
        "Action": "sts:AssumeRole"
    },
    {
        "Effect": "Allow",
        "Principal": {
            "AWS": "arn:aws:iam::123456789012:role/MyApplicationRole"
        },
        "Action": "sts:TagSession",
        "Condition": {
            "StringEquals": {
                "aws:RequestTag/LakeFormationAuthorizedCaller": "application1"
            }
        }
    }
]

Additionally, the data access IAM roles (AccessRole1 and AccessRole2) must have the following IAM permissions assigned in order to read Lake Formation protected tables:

{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "LakeFormationManagedAccess",
        "Effect": "Allow",
        "Action": [
            "lakeformation:GetDataAccess",
            "glue:GetTable",
            "glue:GetTables",
            "glue:GetDatabase",
            "glue:GetDatabases",
            "glue:GetPartition",
            "glue:GetPartitions"
        ],
        "Resource": "*"
    }
}

Solution overview

For our solution, Lambda serves as our external trusted engine and application integrated with Lake Formation. This example is provided in order to understand and see in action the access flow and the Lake Formation API responses. Because it’s based on a single Lambda function, it’s not meant to be used in production settings or with high volumes of data.

Moreover, the Lambda based engine has been configured to support a limited set of data files (CSV, Parquet, and JSON), a limited set of table configurations (no nested data), and a limited set of table operations (SELECT only). Due to these limitations, the application should not be used for arbitrary tests.

In this post, we provide instructions on how to deploy a sample API application integrated with Lake Formation that implements the solution architecture. The core of the API is implemented with a Python Lambda function. We also show how to test the function with Lambda tests. In the second post in this series, we provide instructions on how to deploy a web frontend application that integrates with this Lambda function.

Access flow for unpartitioned tables

The following diagram summarizes the access flow when accessing unpartitioned tables.

Solution Architecture - Unpartitioned tables

The workflow consists of the following steps:

User A (authenticated with Amazon Cognito or other equivalent systems) sends a request to the application API endpoint, requesting access to a specific table inside a specific database.
The API endpoint, created with AWS AppSync, handles the request, invoking a Lambda function.
The function checks which IAM data access role the user is mapped to. For simplicity, the example uses a static hardcoded mapping (mappings={ "user1": "lf-app-access-role-1", "user2": "lf-app-access-role-2"}).
The function invokes the sts:AssumeRole API to assume the user-related IAM data access role (lf-app-access-role-1AccessRole1). The AssumeRole operation is performed with the tag LakeFormationAuthorizedCaller, having as its value one of the session tag values specified when configuring the application integration settings in Lake Formation (for example, {'Key': 'LakeFormationAuthorizedCaller','Value': 'application1'}). The API returns a set of temporary credentials, which we refer to as StsCredentials1.
Using StsCredentials1, the function invokes the glue:GetUnfilteredTableMetadata API, passing the requested database and table name. The API returns information like table location, a list of authorized columns, and data filters, if defined.
Using StsCredentials1, the function invokes the lakeformation:GetTemporaryGlueTableCredentials API, passing the requested database and table name, the type of requested access (SELECT), and CELL_FILTER_PERMISSION as the supported permission types (because the Lambda function implements logic to apply row-level filters). The API returns a set of temporary Amazon S3 credentials, which we refer to as S3Credentials1.
Using S3Credentials1, the function lists the S3 files stored in the table location S3 prefix and downloads them.
The retrieved Amazon S3 data is filtered to remove those columns and rows that the user is not allowed access to (authorized columns and row filters were retrieved in Step 5) and authorized data is returned to the user.

Access flow for partitioned tables

The following diagram summarizes the access flow when accessing partitioned tables.

Solution Architecture - Partitioned tables

The steps involved are almost identical to the ones presented for partitioned tables, with the following changes:

After invoking the glue:GetUnfilteredTableMetadata API (Step 5) and identifying the table as partitioned, the Lambda function invokes the glue:GetUnfilteredPartitionsMetadata API using StsCredentials1 (Step 6). The API returns, in addition to other information, the list of partition values and locations.
For each partition, the function performs the following actions:
- Invokes the lakeformation:GetTemporaryGluePartitionCredentials API (Step 7), passing the requested database and table name, the partition value, the type of requested access (SELECT), and CELL_FILTER_PERMISSION as the supported permissions type (because the Lambda function implements logic to apply row-level filters). The API returns a set of temporary Amazon S3 credentials, which we refer to as S3CredentialsPartitionX.
- Uses S3CredentialsPartitionX to list the partition location S3 files and download them (Step 8).
The function appends the retrieved data.
Before the Lambda function returns the results to the user (Step 9), the retrieved Amazon S3 data is filtered to remove those columns and rows that the user is not allowed access to (authorized columns and row filters were retrieved in Step 5).

Prerequisites

The following prerequisites are needed to deploy and test the solution:

Lake Formation should be enabled in the AWS Region where the sample application will be deployed
The steps must be run with an IAM principal with sufficient permissions to create the needed resources, including Lake Formation databases and tables

Deploy solution resources with AWS CloudFormation

We create the solution resources using AWS CloudFormation. The provided CloudFormation template creates the following resources:

One S3 bucket to store table data (lf-app-data-<account-id>)
Two IAM roles, which will be mapped to client users and their associated Lake Formation permission policies (lf-app-access-role-1 and lf-app-access-role-2)
Two IAM roles used for the two created Lambda functions (lf-app-lambda-datalake-population-role and lf-app-lambda-role)
One AWS Glue database (lf-app-entities) with two AWS Glue tables, one unpartitioned (users_tbl) and one partitioned (users_partitioned_tbl)
One Lambda function used to populate the data lake data (lf-app-lambda-datalake-population)
One Lambda function used for the Lake Formation integrated application (lf-app-lambda-engine)
One IAM role used by Lake Formation to access the table data and perform credentials vending (lf-app-datalake-location-role)
One Lake Formation data lake location (s3://lf-app-data-<account-id>/datasets) associated with the IAM role created for credentials vending (lf-app-datalake-location-role)
One Lake Formation data filter (lf-app-filter-1)
One Lake Formation tag (key: sensitive, values: true or false)
Tag associations to tag the created unpartitioned AWS Glue table (users_tbl) columns with the created tag

To launch the stack and provision your resources, complete the following steps:

Download the code zip bundle for the Lambda function used for the Lake Formation integrated application (lf-integrated-app.zip).
Download the code zip bundle for the Lambda function used to populate the data lake data (datalake-population-function.zip).
Upload the zip bundles to an existing S3 bucket location (for example, s3://mybucket/myfolder1/myfolder2/lf-integrated-app.zip and s3://mybucket/myfolder1/myfolder2/datalake-population-function.zip)
Choose Launch Stack.

This automatically launches AWS CloudFormation in your AWS account with a template. Make sure that you create the stack in your intended Region.

Choose Next to move to the Specify stack details section
For Parameters, provide the following parameters:
1. For powertoolsLogLevel, specify how verbose the Lambda function logger should be, from the most verbose to the least verbose (no logs). For this post, we choose DEBUG.
2. For s3DeploymentBucketName, enter the name of the S3 bucket containing the Lambda functions’ code zip bundles. For this post, we use mybucket.
3. For s3KeyLambdaDataPopulationCode, enter the Amazon S3 location containing the code zip bundle for the Lambda function used to populate the data lake data (datalake-population-function.zip). For example, myfolder1/myfolder2/datalake-population-function.zip.
4. For s3KeyLambdaEngineCode, enter the Amazon S3 location containing the code zip bundle for the Lambda function used for the Lake Formation integrated application (lf-integrated-app.zip). For example, myfolder1/myfolder2/lf-integrated-app.zip.
Choose Next.

Cloudformation Create Stack with properties

Add additional AWS tags if required.
Choose Next.
Acknowledge the final requirements.
Choose Create stack.

Enable the Lake Formation application integration

Complete the following steps to enable the Lake Formation application integration:

On the Lake Formation console, choose Application integration settings in the navigation pane.
Enable Allow external engines to filter data in Amazon S3 locations registered with Lake Formation.
For Session tag values, choose application1.
For AWS account IDs, enter the current AWS account ID.
Choose Save.

LakeFormation Application integration

Enforce Lake Formation permissions

The CloudFormation stack created one database named lf-app-entities with two tables named users_tbl and users_partitioned_tbl.

To be sure you’re using Lake Formation permissions, you should confirm that you don’t have any grants set up on those tables for the principal IAMAllowedPrincipals. The IAMAllowedPrincipals group includes any IAM users and roles that are allowed access to your Data Catalog resources by your IAM policies, and it’s used to maintain backward compatibility with AWS Glue.

To confirm Lake Formations permissions are enforced, navigate to the Lake Formation console and choose Data lake permissions in the navigation pane. Filter permissions by Database=lf-app-entities and remove all the permissions given to the principal IAMAllowedPrincipals.

For more details on IAMAllowedPrincipals and backward compatibility with AWS Glue, refer to Changing the default security settings for your data lake.

Check the created Lake Formation resources and permissions

The CloudFormation stack created two IAM roles—lf-app-access-role-1 and lf-app-access-role-2—and assigned them different permissions on the users_tbl (unpartitioned) and users_partitioned_tbl (partitioned) tables. The specific Lake Formation grants are summarized in the following table.

IAM Roles	lf-app-entities (Database)
	users _tbl (Table)	_tbl _partitioned_tbl (Table)
`lf-app-access-role-1`	No access	Read access on columns `uid`, `state`, and `city` for all the records. Read access to all columns except for `address` only on rows with value `state=united kingdom`.
`lf-app-access-role-2`	Read access on columns with the tag `sensitive = false`	Read access to all columns and rows.

To better understand the full permissions setup, you should review the CloudFormation created Lake Formation resources and permissions. On the Lake Formation console, complete the following steps:

Review the data filters:
1. Choose Data filters in the navigation pane.
2. Inspect the lf-app-filter-1
Review the tags:
1. Choose LF-Tags and permissions in the navigation pane.
2. Inspect the sensitive
Review the tag associations:
1. Choose Tables in the navigation pane.
2. Choose the users_tbl
3. Inspect the LF-Tags associated to the different columns in the Schema
Review the Lake Formation permissions:
1. Choose Data lake permissions in the navigation pane.
2. Filter by Principal = lf-app-access-role-1 and inspect the assigned permissions.
3. Filter by Principal = lf-app-access-role-2 and inspect the assigned permissions.

Test the Lambda function

The Lambda function created by the CloudFormation template accepts JSON objects as input events. The JSON events have the following structure:

 {
  "identity": {
    "username": "XXX"
  },
  "fieldName": "YYY",
  "arguments": {
    "AA": "BB",
    ...
  }
}

Although the identity field is always needed in order to identify the called identity, depending on the requested operation (fieldName), different arguments should be provided. The following table lists these arguments.

Operation	Description	Needed Arguments	Output
`getDbs`	List databases	No arguments needed	List of databases the user has access to
`getTablesByDb`	List tables	`db: <db_name>`	List of tables inside a database the user has access to
`getUnfilteredTableMetadata`	Return the table metadata	`db: <db_name>` `table: <table_name>`	Returns the output of the glue:GetUnfilteredTableMetadata API
`getUnfilteredPartitionsMetadata`	Return the table partitions metadata	`db: <db_name>` `table: <table_name>`	Returns the output of the glue:GetUnfilteredPartitionsMetadata API
`getTableData`	Get table data	`db: <db_name>` `table: <table_name>` `noOfRecs: N` (number of records to pull) `nonNullRowsOnly: true/false` (`true` to filter out records with all null values)	`location`: Table location `authorizedData`: records of the table the user has access to `allColumns`: All the columns of the table (returned only for demonstration and comparison purposes) `allData`: All the records of the table without any filtering (returned only for demonstration and comparison purposes) `cellFilters`: Lake Formation filters (applied to `allData` to return `authorizedData`) `authorizedColumns`: Columns to which the user has access to (projection applied to `allData` to return `authorizedData`)

To test the Lambda function, you can create some sample Lambda test events. Complete the following steps:

On the Lambda console, choose Functions on the navigation pane.
Choose the lf-app-lambda-engine
On the Test tab, select Create new event.
For Event JSON, enter a valid JSON (we provide some sample JSON events).
Choose Test.

Creata Lambda Test

Check the test results (JSON response).

Lambda Test Result

The following are some sample test events you can try to see how different identities can access different sets of information.

user1	user2
`{ "identity": { "username": "user1" }, "fieldName": "getDbs" }`	`{ "identity": { "username": "user2" }, "fieldName": "getDbs" }`
`{ "identity": { "username": "user1" }, "fieldName": "getTablesByDb", "arguments": { "db": "lf-app-entities" } }`	`{ "identity": { "username": "user2" }, "fieldName": "getTablesByDb", "arguments": { "db": "lf-app-entities" } }`
`{ "identity": { "username": "user1" }, "fieldName": "getUnfilteredTableMetadata", "arguments": { "db": "lf-app-entities", "table": "users_tbl" } }`	`{ "identity": { "username": "user2" }, "fieldName": "getUnfilteredTableMetadata", "arguments": { "db": "lf-app-entities", "table": "users_tbl" } }`
`{ "identity": { "username": "user1" }, "fieldName": "getUnfilteredTableMetadata", "arguments": { "db": "lf-app-entities", "table": "users_partitioned_tbl" } }`	`{ "identity": { "username": "user2" }, "fieldName": "getUnfilteredTableMetadata", "arguments": { "db": "lf-app-entities", "table": "users_partitioned_tbl" } }`
`{ "identity": { "username": "user1" }, "fieldName": "getUnfilteredPartitionsMetadata", "arguments": { "db": "lf-app-entities", "table": "users_tbl" } }`	`{ "identity": { "username": "user2" }, "fieldName": "getUnfilteredPartitionsMetadata", "arguments": { "db": "lf-app-entities", "table": "users_tbl" } }`
`{ "identity": { "username": "user1" }, "fieldName": "getUnfilteredPartitionsMetadata", "arguments": { "db": "lf-app-entities", "table": "users_partitioned_tbl" } }`	`{ "identity": { "username": "user2" }, "fieldName": "getUnfilteredPartitionsMetadata", "arguments": { "db": "lf-app-entities", "table": "users_partitioned_tbl" } }`
`{ "identity": { "username": "user1" }, "fieldName": "getTableData", "arguments": { "db": "lf-app-entities", "table": "users_tbl", "noOfRecs": 10, "nonNullRowsOnly": true } }`	`{ "identity": { "username": "user2" }, "fieldName": "getTableData", "arguments": { "db": "lf-app-entities", "table": "users_tbl", "noOfRecs": 10, "nonNullRowsOnly": true } }`
`{ "identity": { "username": "user1" }, "fieldName": "getTableData", "arguments": { "db": "lf-app-entities", "table": "users_partitioned_tbl", "noOfRecs": 10, "nonNullRowsOnly": true } }`	`{ "identity": { "username": "user2" }, "fieldName": "getTableData", "arguments": { "db": "lf-app-entities", "table": "users_partitioned_tbl", "noOfRecs": 10, "nonNullRowsOnly": true } }`

As an example, in the following test, we request users_partitioned_tbl table data in the context of user1:

{
  "identity": {
    "username": "user1"
  },
  "fieldName": "getTableData",
  "arguments": {
    "db": "lf-app-entities",
    "table": "users_partitioned_tbl",
    "noOfRecs": 10,
    "nonNullRowsOnly": true
  }
}

The following is the related API response:

{
  "database": "lf-app-entities",
  "name": "users_partitioned_tbl",
  "location": "s3://lf-app-data-123456789012/datasets/lf-app-entities/users_partitioned/",
  "authorizedColumns": [
    {
      "Name": "born_year",
      "Type": "string"
    },
    {
      "Name": "city",
      "Type": "string"
    },
    {
      "Name": "name",
      "Type": "string"
    },
    {
      "Name": "state",
      "Type": "string"
    },
    {
      "Name": "surname",
      "Type": "string"
    },
    {
      "Name": "uid",
      "Type": "int"
    }
  ],
  "authorizedData": [
    [
      "1980",
      "bristol",
      "emily",
      "united kingdom",
      "brown",
      4
    ],
    [
      "1980",
      "vancouver",
      "<FILTEREDCELL>",
      "canada",
      "<FILTEREDCELL>",
      5
    ],
    [
      "1980",
      "madrid",
      "<FILTEREDCELL>",
      "spain",
      "<FILTEREDCELL>",
      6
    ],
    [
      "1980",
      "mexico city",
      "<FILTEREDCELL>",
      "mexico",
      "<FILTEREDCELL>",
      10
    ],
    [
      "1980",
      "zurich",
      "<FILTEREDCELL>",
      "switzerland",
      "<FILTEREDCELL>",
      11
    ],
    [
      "1980",
      "buenos aires",
      "<FILTEREDCELL>",
      "argentina",
      "<FILTEREDCELL>",
      12
    ],
    [
      "1990",
      "london",
      "john",
      "united kingdom",
      "pike",
      1
    ],
    [
      "1990",
      "milan",
      "<FILTEREDCELL>",
      "italy",
      "<FILTEREDCELL>",
      2
    ],
    [
      "1990",
      "berlin",
      "<FILTEREDCELL>",
      "germany",
      "<FILTEREDCELL>",
      3
    ],
    [
      "1990",
      "munich",
      "<FILTEREDCELL>",
      "germany",
      "<FILTEREDCELL>",
      7
    ]
  ],
  "allColumns": [
    {
      "Name": "address",
      "Type": "string"
    },
    {
      "Name": "born_year",
      "Type": "string"
    },
    {
      "Name": "city",
      "Type": "string"
    },
    {
      "Name": "name",
      "Type": "string"
    },
    {
      "Name": "state",
      "Type": "string"
    },
    {
      "Name": "surname",
      "Type": "string"
    },
    {
      "Name": "uid",
      "Type": "int"
    }
  ],
  "allData": [
    [
      "beautiful avenue 123",
      "1980",
      "bristol",
      "emily",
      "united kingdom",
      "brown",
      4
    ],
    [
      "lake street 45",
      "1980",
      "vancouver",
      "david",
      "canada",
      "lee",
      5
    ],
    [
      "plaza principal 6",
      "1980",
      "madrid",
      "sophia",
      "spain",
      "luz",
      6
    ],
    [
      "avenida de arboles 40",
      "1980",
      "mexico city",
      "olivia",
      "mexico",
      "garcia",
      10
    ],
    [
      "pflanzenstrasse 34",
      "1980",
      "zurich",
      "lucas",
      "switzerland",
      "fischer",
      11
    ],
    [
      "avenida de luces 456",
      "1980",
      "buenos aires",
      "isabella",
      "argentina",
      "afortunado",
      12
    ],
    [
      "hidden road 78",
      "1990",
      "london",
      "john",
      "united kingdom",
      "pike",
      1
    ],
    [
      "via degli alberi 56A",
      "1990",
      "milan",
      "mario",
      "italy",
      "rossi",
      2
    ],
    [
      "green road 90",
      "1990",
      "berlin",
      "july",
      "germany",
      "finn",
      3
    ],
    [
      "parkstrasse 789",
      "1990",
      "munich",
      "oliver",
      "germany",
      "schmidt",
      7
    ]
  ],
  "filteredCellPh": "<FILTEREDCELL>",
  "cellFilters": [
    {
      "ColumnName": "born_year",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "city",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "name",
      "RowFilterExpression": "state='united kingdom'"
    },
    {
      "ColumnName": "state",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "surname",
      "RowFilterExpression": "state='united kingdom'"
    },
    {
      "ColumnName": "uid",
      "RowFilterExpression": "TRUE"
    }
  ]
}

To troubleshoot the Lambda function, you can navigate to the Monitoring tab, choose View CloudWatch logs, and inspect the latest log stream.

Clean up

If you plan to explore Part 2 of this series, you can skip this part, because you will need the resources created here. You can refer to this section at the end of your testing.

Complete the following steps to remove the resources you created following this post and avoid incurring additional costs:

On the AWS CloudFormation console, choose Stacks in the navigation pane.
Choose the stack you created and choose Delete.

Additional considerations

In the proposed architecture, Lake Formation permissions were granted to specific IAM data access roles that requesting users (for example, the identity field) were mapped to. Another possibility is to assign permissions in Lake Formation to SAML users and groups and then work with the AssumeDecoratedRoleWithSAML API.

Conclusion

In the first part of this series, we explored how to integrate custom applications and data processing engines with Lake Formation. We delved into the required configuration, APIs, and steps to enforce Lake Formation policies within custom data applications. As an example, we presented a sample Lake Formation integrated application built on Lambda.

The information provided in this post can serve as a foundation for developing your own custom applications or data processing engines that need to operate on an Lake Formation protected data lake.

Refer to the second part of this series to see how to build a sample web application that uses the Lambda based Lake Formation application.

About the Authors

Stefano Sandona Picture Stefano Sandonà is a Senior Big Data Specialist Solution Architect at AWS. Passionate about data, distributed systems, and security, he helps customers worldwide architect high-performance, efficient, and secure data platforms.

Francesco Marelli Picture Francesco Marelli is a Principal Solutions Architect at AWS. He specializes in the design, implementation, and optimization of large-scale data platforms. Francesco leads the AWS Solution Architect (SA) analytics team in Italy. He loves sharing his professional knowledge and is a frequent speaker at AWS events. Francesco is also passionate about music.

Integrate custom applications with AWS Lake Formation – Part 2

2024-11-19 Stefano Sandona

Post Syndicated from Stefano Sandona original https://aws.amazon.com/blogs/big-data/integrate-custom-applications-with-aws-lake-formation-part-2/

In the first part of this series, we demonstrated how to implement an engine that uses the capabilities of AWS Lake Formation to integrate third-party applications. This engine was built using an AWS Lambda Python function.

In this post, we explore how to deploy a fully functional web client application, built with JavaScript/React through AWS Amplify (Gen 1), that uses the same Lambda function as the backend. The provisioned web application provides a user-friendly and intuitive way to view the Lake Formation policies that have been enforced.

For the purposes of this post, we use a local machine based on MacOS and Visual Studio Code as our integrated development environment (IDE), but you could use your preferred development environment and IDE.

Solution overview

AWS AppSync creates serverless GraphQL and pub/sub APIs that simplify application development through a single endpoint to securely query, update, or publish data.

GraphQL is a data language to enable client apps to fetch, change, and subscribe to data from servers. In a GraphQL query, the client specifies how the data is to be structured when it’s returned by the server. This makes it possible for the client to query only for the data it needs, in the format that it needs it in.

Amplify streamlines full-stack app development. With its libraries, CLI, and services, you can connect your frontend to the cloud for authentication, storage, APIs, and more. Amplify provides libraries for popular web and mobile frameworks, like JavaScript, Flutter, Swift, and React.

Prerequisites

The web application that we deploy depends on the Lambda function that was deployed in the first post of this series. Make sure the function is already deployed and working in your account.

Install and configure the AWS CLI

The AWS Command Line Interface (AWS CLI) is an open source tool that enables you to interact with AWS services using commands in your command line shell. To install and configure the AWS CLI, see Getting started with the AWS CLI.

Install and configure the Amplify CLI

To install and configure the Amplify CLI, see Set up Amplify CLI. Your development machine must have the following installed:

Node.js v14.x or later
npm v6.14.4 or later
git v2.14.1 or later

Create the application

We create a JavaScript application using the React framework.

In the terminal, enter the following command:

npm create vite@latest

Enter a name for your project (we use lfappblog), choose React for the framework, and choose JavaScript for the variant.

You can now run the next steps, ignore any warning messages. Don’t run the npm run dev command yet.

Enter the following command:

cd lfappblog && npm install

You should now see the directory structure shown in the following screenshot.

You can now test the newly created application by running the following command:

npm run dev

By default, the application is available on port 5173 on your local machine.

The base application is shown in the workspace browser.

You can close the browser window and then the test web server by entering the following in the terminal: q + enter

Set up and configure Amplify for the application

To set up Amplify for the application, complete the following steps:

Run the following command in the application directory to initialize Amplify:

amplify init

Refer to the following screenshot for all the options required. Make sure to change the value of Distribution Directory Path to dist. The command creates and runs the required AWS CloudFormation template to create the backend environment in your AWS account.

amplify init command and output - animated

amplify init command and output

Install the node modules required by the application with the following command:

npm install aws-amplify \
@aws-amplify/ui-react \
ace-builds \
file-loader \
@cloudscape-design/components @cloudscape-design/global-styles

npm install for required packages command and output

The output of this command will vary depending on the packages already installed on your development machine.

Add Amplify authentication

Amplify can implement authentication with Amazon Cognito user pools. You run this step before adding the function and the Amplify API capabilities so that the user pool created can be set as the authentication mechanism for the API, otherwise it would default to the API key and further modifications would be required.

Run the following command and accept all the defaults:

amplify add auth

amplify add auth command and output - animated

amplify add auth command and output

Add the Amplify API

The application backend is based on a GraphQL API with resolvers implemented as a Python Lambda function. The API feature of Amplify can create the required resources for GraphQL APIs based on AWS AppSync (default) or REST APIs based on Amazon API Gateway.

Run the following command to add and initialize the GraphQL API:

amplify add api

Make sure to set Blank Schema as the schema template (a full schema is provided as part of this post; further instructions are provided in the next sections).
Make sure to select Authorization modes and then Amazon Cognito User Pool.

amplify add api command and output - animated

amplify add api command and output

Add Amplify hosting

Amplify can host applications using either the Amplify console or Amazon CloudFront and Amazon Simple Storage Service (Amazon S3) with the option to have manual or continuous deployment. For simplicity, we use the Hosting with Amplify Console and Manual Deployment options.

Run the following command:

amplify add hosting

amplify add hosting command and output - animated

Copy and configure the GraphQL API schema

You’re now ready to copy and configure the GraphQL schema file and update it with the current Lambda function name.

Run the following commands:

export PROJ_NAME=lfappblog
aws s3 cp s3://aws-blogs-artifacts-public/BDB-3934/schema.graphql \
~/${PROJ_NAME}/amplify/backend/api/${PROJ_NAME}/schema.graphql

In the schema.graphql file, you can see that the lf-app-lambda-engine function is set as the data source for the GraphQL queries.

schema.graphql file content

Copy and configure the AWS AppSync resolver template

AWS AppSync uses templates to preprocess the request payload from the client before it’s sent to the backend and postprocess the response payload from the backend before it’s sent to the client. The application requires a modified template to correctly process custom backend error messages.

Run the following commands:

export PROJ_NAME=lfappblog
aws s3 cp s3://aws-blogs-artifacts-public/BDB-3934/InvokeLfAppLambdaEngineLambdaDataSource.res.vtl \
~/${PROJ_NAME}/amplify/backend/api/${PROJ_NAME}/resolvers/

In the InvokeLfAppLambdaEngineLambdaDataSource.res.vtl file, you can inspect the .vtl resolver definition.

InvokeLfAppLambdaEngineLambdaDataSource.res.vtl file content

Copy the application client code

As last step, copy the application client code:

export PROJ_NAME=lfappblog
aws s3 cp s3://aws-blogs-artifacts-public/BDB-3934/App.jsx \
~/${PROJ_NAME}/src/App.jsx

You can now open App.jsx to inspect it.

Publish the full application

From the project directory, run the following command to verify all resources are ready to be created on AWS:

amplify status

amplify status command and output

Run the following command to publish the full application:

amplify publish

This will take several minutes to complete. Accept all defaults apart from Enter maximum statement depth [increase from default if your schema is deeply nested], which must be set to 5.

amplify publish command and output - animated

amplify publish command and output

All the resources are now deployed on AWS and ready for use.

Use the application

You can start using the application from the Amplify hosted domain.

Run the following command to retrieve the application URL:

amplify status

amplify status command and output

At first access, the application shows the Amazon Cognito login page.

Choose Create Account and create a user with user name user1 (this is mapped in the application to the role lf-app-access-role-1 for which we created Lake Formation permissions in the first post).

Enter the confirmation code that you received through email and choose Sign In.

When you’re logged in, you can start interacting with the application.

Application starting screen

Controls

The application offers several controls:

Database – You can select a database registered with Lake Formation with the Describe permission.

Application database control

Table – You can choose a table with Select permission.

Application Table and Number of Records controls

Number of records – This indicates the number of records (between 5–40) to display on the Data Because this is a sample application, no pagination was implemented in the backend.
Row type – Enable this option to display only rows that have at least one cell with authorized data. If all cells in a row are unauthorized and checkbox is selected, the row is not displayed.

Outputs

The application has four outputs, organized in tabs.

Unfiltered Table Metadata

This tab displays the response of the AWS Glue API GetUnfilteredTableMetadata policies for the selected table. The following is an example of the content:

{
  "Table": {
    "Name": "users_tbl",
    "DatabaseName": "lf-app-entities",
    "CreateTime": "2024-07-10T10:00:26+00:00",
    "UpdateTime": "2024-07-10T11:41:36+00:00",
    "Retention": 0,
    "StorageDescriptor": {
      "Columns": [
        {
          "Name": "uid",
          "Type": "int"
        },
        {
          "Name": "name",
          "Type": "string"
        },
        {
          "Name": "surname",
          "Type": "string"
        },
        {
          "Name": "state",
          "Type": "string"
        },
        {
          "Name": "city",
          "Type": "string"
        },
        {
          "Name": "address",
          "Type": "string"
        }
      ],
      "Location": "s3://lf-app-data-123456789012/datasets/lf-app-entities/users/",
      "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
      "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
      "Compressed": false,
      "NumberOfBuckets": 0,
      "SerdeInfo": {
        "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
        "Parameters": {
          "field.delim": ","
        }
      },
      "SortColumns": [],
      "StoredAsSubDirectories": false
    },
    "PartitionKeys": [],
    "TableType": "EXTERNAL_TABLE",
    "Parameters": {
      "classification": "csv"
    },
    "CreatedBy": "arn:aws:sts::123456789012:assumed-role/Admin/fmarelli",
    "IsRegisteredWithLakeFormation": true,
    "CatalogId": "123456789012",
    "VersionId": "1"
  },
  "AuthorizedColumns": [
    "city",
    "state",
    "uid"
  ],
  "IsRegisteredWithLakeFormation": true,
  "CellFilters": [
    {
      "ColumnName": "city",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "state",
      "RowFilterExpression": "TRUE"
    },
    {
      "ColumnName": "uid",
      "RowFilterExpression": "TRUE"
    }
  ],
  "ResourceArn": "arn:aws:glue:us-east-1:123456789012:table/lf-app-entities/users"
}

Unfiltered Partitions Metadata

This tab displays the response of the AWS Glue API GetUnfileteredPartitionsMetadata policies for the selected table. The following is an example of the content:

{
  "UnfilteredPartitions": [
    {
      "Partition": {
        "Values": [
          "1991"
        ],
        "DatabaseName": "lf-app-entities",
        "TableName": "users_partitioned_tbl",
        "CreationTime": "2024-07-10T11:34:32+00:00",
        "LastAccessTime": "1970-01-01T00:00:00+00:00",
        "StorageDescriptor": {
          "Columns": [
            {
              "Name": "uid",
              "Type": "int"
            },
            {
              "Name": "name",
              "Type": "string"
            },
            {
              "Name": "surname",
              "Type": "string"
            },
            {
              "Name": "state",
              "Type": "string"
            },
            {
              "Name": "city",
              "Type": "string"
            },
            {
              "Name": "address",
              "Type": "string"
            }
          ],
          "Location": "s3://lf-app-data-123456789012/datasets/lf-app-entities/users_partitioned/born_year=1991",
          "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
          "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "Compressed": false,
          "NumberOfBuckets": 0,
          "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "Parameters": {
              "field.delim": ","
            }
          },
          "BucketColumns": [],
          "SortColumns": [],
          "Parameters": {},
          "StoredAsSubDirectories": false
        },
        "CatalogId": "123456789012"
      },
      "AuthorizedColumns": [
        "address",
        "city",
        "name",
        "state",
        "surname",
        "uid"
      ],
      "IsRegisteredWithLakeFormation": true
    },
    {
      "Partition": {
        "Values": [
          "1990"
        ],
        "DatabaseName": "lf-app-entities",
        "TableName": "users_partitioned_tbl",
        "CreationTime": "2024-07-10T11:34:32+00:00",
        "LastAccessTime": "1970-01-01T00:00:00+00:00",
        "StorageDescriptor": {
          "Columns": [
            {
              "Name": "uid",
              "Type": "int"
            },
            {
              "Name": "name",
              "Type": "string"
            },
            {
              "Name": "surname",
              "Type": "string"
            },
            {
              "Name": "state",
              "Type": "string"
            },
            {
              "Name": "city",
              "Type": "string"
            },
            {
              "Name": "address",
              "Type": "string"
            }
          ],
          "Location": "s3://lf-app-data-123456789012/datasets/lf-app-entities/users_partitioned/born_year=1990",
          "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
          "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
          "Compressed": false,
          "NumberOfBuckets": 0,
          "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
            "Parameters": {
              "field.delim": ","
            }
          },
          "BucketColumns": [],
          "SortColumns": [],
          "Parameters": {},
          "StoredAsSubDirectories": false
        },
        "CatalogId": "123456789012"
      },
      "AuthorizedColumns": [
        "address",
        "city",
        "name",
        "state",
        "surname",
        "uid"
      ],
      "IsRegisteredWithLakeFormation": true
    }
  ]
}

Authorized Data

This tab displays a table that shows the columns, rows, and cells that the user is authorized to access.

Application Authorized Data tab

A cell is marked as Unauthorized if the user has no permissions to access its contents, according to the cell filter definition. You can choose the unauthorized cell to view the relevant cell filter condition.

Application Authorized Data tab cell pop up example

In this example, the user can’t access the value of column surname in the first row because for the row, state is canada, but the cell can only be accessed when state=’united kingdom’.

If the Only rows with authorized data control is unchecked, rows with all cells set to Unauthorized are also displayed.

All Data

This tab contains a table that contains all the rows and columns in the table (the unfiltered data). This is useful for comparison with authorized data to understand how cell filters are applied to the unfiltered data.

Application All Data tab

Test Lake Formation permissions

Log out of the application and go to the Amazon Cognito login form, choose Create Account, and create a new user with called user2 (this is mapped in the application to the role lf-app-access-role-2 that we created Lake Formation permissions for in the first post). Get table data and metadata for this user to see how Lake Formation permissions are enforced and so the two users can see different data (on the Authorized Data tab).

The following screenshot shows that the Lake Formation permissions we created grant access to the following data (all rows, all columns) of table users_partitioned_tbl to user2 (mapped to lf-app-access-role-2).

Application Authorized Data tab for user2 on table users_partitioned_tbl

The following screenshot shows that the Lake Formation permissions we created grant access to the following data (all rows, but only city, state, and uid columns) of table users_tbl to user2 (mapped to lf-app-access-role-2).

Application Authorized Data tab for user2 on table users_partitioned

Considerations for the GraphQL API

You can use the AWS AppSync GraphQL API deployed in this post for other applications; the responses of the GetUnfilteredTableMetadata and GetUnfileteredPartitionsMetadata AWS Glue APIs were fully mapped in the GraphQL schema. You can use the Queries page on the AWS AppSync console to run the queries; this is based on GraphiQL.

AWS AppSync Queries page

You can use the following object to define the query variables:

{ 
  "db": "lf-app-entities",
  "table": "users_partitioned_tbl",
  "noOfRecs": 30,
  "nonNullRowsOnly": true
}

The following code shows the queries available with input parameters and all fields defined in the schema as output:

  query GetDbs {
    getDbs {
      catalogId
      name
      description
    }
  }

  query GetTablesByDb($db: String!) {
    getTablesByDb(db: $db) {
      Name
      DatabaseName
      Location
      IsPartitioned
    }
  }
  
  query GetTableData(
    $db: String!
    $table: String!
    $noOfRecs: Int
    $nonNullRowsOnly: Boolean!
  ) {
    getTableData(
      db: $db
      table: $table
      noOfRecs: $noOfRecs
      nonNullRowsOnly: $nonNullRowsOnly
    ) {
      database
      name
      location
      authorizedColumns {
        Name
        Type
      }
      authorizedData
      allColumns {
        Name
        Type
      }
      allData
      filteredCellPh
      cellFilters {
        ColumnName
        RowFilterExpression
      }
    }
  }

  query GetUnfilteredTableMetadata($db: String!, $table: String!) {
    getUnfilteredTableMetadata(db: $db, table: $table) {
      JsonResp
      ApiResp {
        Table {
          Name
          DatabaseName
          Description
          Owner
          CreateTime
          UpdateTime
          LastAccessTime
          LastAnalyzedTime
          Retention
          StorageDescriptor {
            Columns {
              Name
              Type
              Comment
            }
            Location
            AdditionalLocations
            InputFormat
            OutputFormat
            Compressed
            NumberOfBuckets
            SerdeInfo {
              Name
              SerializationLibrary
            }
            BucketColumns
            SortColumns {
              Column
              SortOrder
            }
            Parameters {
              Name
              Value
            }
            SkewedInfo {
              SkewedColumnNames
              SkewedColumnValues
            }
            StoredAsSubDirectories
            SchemaReference {
              SchemaVersionId
              SchemaVersionNumber
            }
          }
          PartitionKeys {
            Name
            Type
            Comment
            Parameters {
              Name
              Value
            }
          }
          ViewOriginalText
          ViewExpandedText
          TableType
          Parameters {
            Name
            Value
          }
          CreatedBy
          IsRegisteredWithLakeFormation
          TargetTable {
            CatalogId
            DatabaseName
            Name
            Region
          }
          CatalogId
          VersionId
          FederatedTable {
            Identifier
            DatabaseIdentifier
            ConnectionName
          }
          ViewDefinition {
            IsProtected
            Definer
            SubObjects
            Representations {
              Dialect
              DialectVersion
              ViewOriginalText
              ViewExpandedText
              ValidationConnection
              IsStale
            }
          }
          IsMultiDialectView
        }
        AuthorizedColumns
        IsRegisteredWithLakeFormation
        CellFilters {
          ColumnName
          RowFilterExpression
        }
        QueryAuthorizationId
        IsMultiDialectView
        ResourceArn
        IsProtected
        Permissions
        RowFilter
      }
    }
  }

  query GetUnfilteredPartitionsMetadata($db: String!, $table: String!) {
    getUnfilteredPartitionsMetadata(db: $db, table: $table) {
      JsonResp
      ApiResp {
        Partition {
          Values
          DatabaseName
          TableName
          CreationTime
          LastAccessTime
          StorageDescriptor {
            Columns {
              Name
              Type
              Comment
            }
            Location
            AdditionalLocations
            InputFormat
            OutputFormat
            Compressed
            NumberOfBuckets
            SerdeInfo {
              Name
              SerializationLibrary
            }
            BucketColumns
            SortColumns {
              Column
              SortOrder
            }
            Parameters {
              Name
              Value
            }
            SkewedInfo {
              SkewedColumnNames
              SkewedColumnValues
            }
            StoredAsSubDirectories
            SchemaReference {
              SchemaVersionId
              SchemaVersionNumber
            }
          }
          Parameters {
            Name
            Value
          }
          LastAnalyzedTime
          CatalogId
        }
        AuthorizedColumns
        IsRegisteredWithLakeFormation
      }
    }
  }

Clean up

To remove the resources created in this post, run the following command:

amplify delete

amplify delete command and output

Refer to Part 1 to clean up the resources created in the first part of this series.

Conclusion

In this post, we showed how to implement a web application that uses a GraphQL API implemented with AWS AppSync and Lambda as the backend for a web application integrated with Lake Formation. You should now have a comprehensive understanding of how to extend the capabilities of Lake Formation by building and integrating your own custom data processing applications.

Try out this solution for yourself, and share your feedback and questions in the comments.

About the Authors

Manage access controls in generative AI-powered search applications using Amazon OpenSearch Service and Amazon Cognito

2024-11-19 Karim Akhnoukh

Post Syndicated from Karim Akhnoukh original https://aws.amazon.com/blogs/big-data/manage-access-controls-in-generative-ai-powered-search-applications-using-amazon-opensearch-service-and-aws-cognito/

Organizations of all sizes and types are using generative AI to create products and solutions. A common adoption pattern is to introduce document search tools to internal teams, especially advanced document searches based on semantic search. In semantic search, documents are stored as vectors, a numeric representation of the document content, in a vector database such as Amazon OpenSearch Service, and are retrieved by performing similarity search with a vector representation of the search query.

In a real-world scenario, organizations want to make sure their users access only documents they are entitled to access. They are looking for a reliable and scalable solution to implement robust access controls to make sure these documents are only accessible to individuals who have a legitimate business need and the appropriate level of authorization. The permission mechanism has to be secure, built on top of built-in security features, and scalable for manageability when the user base scales out. Maintaining proper access controls for these sensitive assets is paramount, because unauthorized access could lead to severe consequences, such as data breaches, compliance violations, and reputational damage.

In this post, we show you how to manage user access to enterprise documents in generative AI-powered tools according to the access you assign to each persona.

Common use cases

The following are industry-specific use cases for document access management across different departments:

In R&D and engineering, access to product design documents evolves from restricted to broader as development progresses
HR maintains open access to general policies while limiting access to sensitive employee information
Finance and accounting documents require varying levels of access for auditing and executive decision-making
Sales and marketing teams carefully manage customer data and strategies, implementing tiered access for different roles and departments

These examples demonstrate the need for dynamic, role-based access control to balance information sharing with confidentiality in various business contexts.

Solution overview

By combining the powerful vector search capabilities of OpenSearch Service with the access control features provided by Amazon Cognito, this solution enables organizations to manage access controls based on custom user attributes and document metadata.

This approach simplifies the management of access rights, making sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. Following this approach, you can manage the access to your organization’s documents at scale. The following diagram depicts the solution architecture.

The solution workflow consists of the following steps:

The user accesses a smart search portal and lands on a web interface deployed on AWS Amplify.
The user authenticates through an Amazon Cognito user pool and an access token is returned to the client. This access token will be used to retrieve the key pair custom attributes assigned to the user. In our case, we created two custom attributes (custom:department and custom:access_level).
For each user query, an API is invoked on Amazon API Gateway to process the request. Each invocation includes the user access token in the header.
The API is integrated with AWS Lambda, which processes the user query and generates the answers based on available documents and user access using retrieval augmented generation (RAG). The process starts by creating a vector based on the question (embedding) by invoking the embedding model.
A query is sent to OpenSearch Service that includes the following:
1. The embedding vector generated.
2. User custom attributes retrieved by Lambda based on their access token, by calling the Amazon Cognito GetUser API.
3. The query relies on the support of an efficient k-NN filter in OpenSearch Service to perform the search.
Pre-filtered documents that relate to the user query are included in the prompt of the large language model (LLM) that summarizes the answer. Then, Lambda replies back to the web interface with the LLM completion (reply).
If the user’s access needs to be modified (assigned attributes), an API call is made through API Gateway to a Lambda function that processes the request to add or update the custom attributes’ value for a specific user.
New attributes are reflected in the user’s profile in Amazon Cognito.

Our solution is implemented and wrapped within AWS Cloud Development Kit (AWS CDK) stacks, which are available in the GitHub repo.

Our sample documents assume a fictional manufacturing company called Unicorn Robotics Factory, which develops robotic unicorns. The dataset contains over 900 documents that are a mix of engineering, roadmap, and business reporting documents. The following is an example of a document’s content:

**CONFIDENTIAL - UNICORNS ROBOTICS INTERNAL DOCUMENT**

**Project: "Galactic Unicorn"**

Unicorns Robotics is proud to announce the development of our latest project, the "Galactic Unicorn". 
This top-secret project aims to create a robotic unicorn that can travel through space and time, bringing magic and joy to children and adults alike.....

The associated metadata file for this document consists of the following:

{ "department": "research", "access_level": "confidential" }

Our solution in the GitHub repo takes care of loading the documents with associated metadata tags. For illustration purposes, we used the following mapping for the users and document access.

This solution is meant to delegate access management to the application tier, to simplify the implementation of use cases like generative AI-powered document search tools. However, if your use case requires a stricter approach to control document access, like multi-tenant environments or field-level security, you might want to use the fine-grained access control feature in OpenSearch Service. In our solution, we manage the access on the document level according to the assigned metadata.

Prerequisites

To deploy the solution, you need the following prerequisites:

An AWS account. If you don’t already have an AWS account, you can create one.
Your access to the AWS account must have AWS Identity and Access Management (IAM) permissions to launch AWS CloudFormation templates that create IAM roles.
The AWS Command Line Interface (AWS CLI) installed.
node.js and npm installed for the frontend.
Docker installed.
The AWS CDK configured. For more information, see Getting started with the AWS CDK.
In case of LLM inference based on Amazon SageMaker, a sufficient service limit to deploy an ml.g5.12xlarge instance for the SageMaker endpoint. If needed, you can initiate a quota increase request. Refer to Service Quotas for more details.

Deploy the solution

To deploy the solution to your AWS account, refer to the Readme file in our GitHub repo.

Query documents with different personas

Now let’s test the application using different personas. In this example, we use the same users with their corresponding custom attributes as illustrated in the solution overview.

To start, let’s log in using the researcher account and run the search around a confidential document.

We ask, “What is the projected profit margin of the Galactic Unicorn project?” and get the result as shown in the following screenshot.

The question invokes a query to OpenSearch Service using the custom attributes assigned to the researcher. The following code illustrates how the query is structured:

for attr, values in user_attributes.items():
        must_conditions.append(
            {
                "bool": {
                    "should": [{"term": {attr: value}} for value in values],
                    "minimum_should_match": 1,
                }
            }
        )

query = {
        "size": 5,
        "query": {
            "knn": {
                "doc_embedding": {
                    "vector": query_vector,
                    "k": 10,
                    "filter": {"bool": {"must": must_conditions}},
                }
            }
        },
    }

Let’s sign out and log in again with an engineer profile to test the same query. Based on the assigned attributes and document metadata, the result should look like that in the following screenshot.

If you tried to query some support documents, you will get the desired answer, as shown in the following screenshot.

Modify user access

As depicted in the solution diagram, we’ve added a feature in the web interface to allow you to modify user access, which you could use to perform further tests. To do so, log in as a tool admin and choose Manage Attributes. Then modify the custom attribute value for a given user, as shown in the following screenshot.

Clean up

When deleting a stack, most resources will be deleted upon stack deletion, but that’s not the case for all resources. The Amazon Simple Storage Service (Amazon S3) bucket, Amazon Cognito user pool, and OpenSearch Service domain will be retained by default. However, our AWS CDK code altered this default behavior by setting the RemovalPolicy to DESTROY for the mentioned resources. If you want to retain them, you can adjust the RemovalPolicy in the AWS CDK code for the different resources.

You can use the following command to clean up the resources deployed to your AWS account:

make destroy

Conclusion

This post illustrated how to build a document search RAG solution that makes sure only authorized users can access and interact with specific documents based on their roles, departments, and other relevant attributes. It combines OpenSearch Service and Amazon Cognito custom attributes to make a tag-based access control mechanism that makes it straightforward to manage at scale.

For demonstration purposes, the following points weren’t included in the AWS CDK code. However, they’re still applicable and you might want to work on them before deploying for production purposes:

OpenSearch Service best practices, such as instance sizing and using primary nodes
Advanced document chunking strategies for RAG implementations, such as recursive or semantic chunking

About the Authors

Karim Akhnoukh is a Solutions Architect at AWS working with manufacturing customers in Germany. He is passionate about applying machine learning and generative AI to solve customers’ business challenges. Besides work, he enjoys playing sports, aimless walks, and good quality coffee.

Ahmed Ewis is a Senior Solutions Architect at AWS GenAI Labs. He helps customers build generative AI-based solutions to solve business problems. When not collaborating with customers, he enjoys playing with his kids and cooking.

Fortune Hui is a Solutions Architect at AWS Hong Kong, working with conglomerate customers. He helps customers and partners build big data platform and generative AI applications. In his free time, he plays badminton and enjoys whisky.

Fing Agent: Raspberry Pi Network Security Made Easy!

2024-11-19 Crosstalk Solutions

Post Syndicated from Crosstalk Solutions original https://www.youtube.com/watch?v=3HBm8KfcJFU

Harper Steele | Will & Harper | Talks at Google

2024-11-19 Talks at Google

Post Syndicated from Talks at Google original https://www.youtube.com/watch?v=h5NhCUFup1U

HPE Slingshot 400 Brings a Liquid Cooled 51.2T Switch and 400Gbps Networking

2024-11-19 Rohit Kumar

Post Syndicated from Rohit Kumar original https://www.servethehome.com/hpe-slingshot-400-brings-a-liquid-cooled-51-2t-switch-and-400gbps-networking/

At SC24 we saw the liquid-cooled 51.2T HPE Slingshot 400 switch that will double the performance of HPE’s HPC interconnect

The post HPE Slingshot 400 Brings a Liquid Cooled 51.2T Switch and 400Gbps Networking appeared first on ServeTheHome.

The #1 Cause of G.A.S. for Photographers…

2024-11-19 Matt Granger

Post Syndicated from Matt Granger original https://www.youtube.com/watch?v=1qGzukI7dxU

Holiday Gift Guide 2024

2024-11-19 Yev

Post Syndicated from Yev original https://www.backblaze.com/blog/holiday-gift-guide-2024/

A decorative image showing a gift floating in the ether, waiting to be given.

Ah, the holidays. They can be both fun and stressful for many of us—not least because we have to make so many decisions around gift giving. The plethora of cyber sales and new products on the market make this a great time to try out new products yourself or buy some for the loved people in your lives. To that end, I’ve followed tradition and asked some of my fellow Backblazers to submit their gift ideas for this year and we’ve compiled them into this 2024 gift giving list!

Reading, (w)righting, (a)rithmatic

Kindle Paperwhite

For the reader in your life, the Kindle Paperwhite is a great way to keep reading regardless of the conditions. It uses e-ink technology to avoid glare and allows for reading in both high and low light conditions!

ReMarkable Paper Pro

This is one of the coolest devices out there, and I can personally say that I desperately want someone to gift me one (although it’s a bit on the pricey side, so I get it). This uses similar e-ink technology as the Paperwhite and allows you to take notes using your own handwriting—which (at least in my case when I write in cursive) helps me remember things!

Tekfun LCD Writing and Doodle Tablet

Mixing writing and artistry, this doodle tablet is great for kiddos that are with you in a restaurant or long-haul flight. In fact, I just bought one of these for my nephew!

Klein Bottle

Some people are math people. I don’t get it, but this Klein Bottle is basically an inside joke to math nerds (something about Möbius loops) and can serve as a lovely table prop for the math lover in your life.

Sounds good!

HyperX Cloud Alpha Wireless Gaming Headset

For the gamer in your life, these are some of the best headphones on the market. They’re wireless, they sound great, and the microphone picks up all of the positive (I’m sure) feedback for teammates.

Apple AirPods 4

The latest and greatest from Apple. These come with a new chip that helps with transparency mode so you can cross the street with confidence.

Google Pixel Buds Pro 2

A product image of the Pixel Buds Pro 2.

If you’re an Android user (raises hand) you may want something built for your phone and the latest Pro version of the Pixel Buds are here for you. These come with little nubs for snug fit, meaning you can run or lift or meander without worrying that your tiny buds will fall out.

Experiences

Sur La Table Cooking Classes

A screenshot form the Sur La Table website that features their holiday cooking classes.

Who doesn’t want to impress loved ones with delicious dishes? If you have a Sur La Table near you, these are great.

Airbnb Experiences

If you’re a fan of traveling you’ve likely stayed at an AirBnB. But even if you’re more of a homebase person, their Experiences tab is worth checking out. From cooking to glass blowing to cow cuddle therapy (yes, that’s a thing).

Odds and ends

theFube Fidget Cube

I have one of these in my car and, let me tell you, as a person who used to bite their nails out of boredom (gross, I know), it helps keep my mind and compulsions at bay! Highly recommend.

Under Desk Foot Rest

Many of us spend our day sitting, sometimes standing at a desk. Even when you’re sitting, though, ergonomics are a big deal and having an adjustable foot rest can help you get into a position that feels solid and sustainable for long stretches of time. I recommend getting up and stretching every now and again, but if you’re like me and have a bad back, these are great.

Leatherman ARC

Whether you’re outdoors or indoors, sometimes you need to cut or saw or screw or open or pry or file something, and for that, these Leatherman tools cannot be beaten.

Give the gift of Backblaze

And, of course, we’d be remiss if we didn’t remind you that Backblaze Computer Backup makes a great gift. Help your family and friends experience the sweet, sweet peace of mind that comes from a good backup strategy and make sure they never lose a file again. Bonus: you don’t even have to go to the store to get it.

A decorative showing a gift covered in Backblaze logos.

Go forth and gift!

We hope this guide sparked some ideas and simplified some choices. We love hearing about what folks are excited about, so feel free to give us some more good options in the comments below.

The post Holiday Gift Guide 2024 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Incus 6.7 released

2024-11-19 corbet

Post Syndicated from corbet original https://lwn.net/Articles/998756/

Version 6.7 of the Incus container-management system (forked from LXD) has
been released. “This is another one of those pretty well rounded
releases with new features and improvements for everyone“. New
features include automatic cluster rebalancing, DHCP improvements, and more.

Security updates for Tuesday

2024-11-19 corbet

Post Syndicated from corbet original https://lwn.net/Articles/998755/

Security updates have been issued by AlmaLinux (.NET 9.0, bcc, bluez, bpftrace, bubblewrap, flatpak, buildah, cockpit, containernetworking-plugins, cups, cyrus-imapd, edk2, expat, firefox, fontforge, gnome-shell, gnome-shell-extensions, grafana, grafana-pcp, gtk3, httpd, iperf3, jose, krb5, libgcrypt, libsoup, libvirt, libvpx, lldpd, microcode_ctl, mingw-glib2, mod_auth_openidc, nano, NetworkManager, oci-seccomp-bpf-hook, openexr, osbuild-composer, pcp, podman, poppler, postfix, python-dns, python-jinja2, python-jwcrypto, python3.11, python3.11-PyMySQL, python3.11-urllib3, python3.12, python3.12-PyMySQL, python3.12-urllib3, python3.9, qemu-kvm, runc, skopeo, squid, thunderbird, toolbox, tpm2-tools, vim, webkit2gtk3, xorg-x11-server, and xorg-x11-server-Xwayland), Fedora (lemonldap-ng and mingw-expat), SUSE (bea-stax, xstream, expat, httpcomponents-client, httpcomponents-core, kernel, SUSE Manager Client Tools, SUSE Manager Proxy, Retail Branch Server 4.3, SUSE Manager Salt Bundle, SUSE Manager Server 4.3, and SUSE Manager Server 5.0), and Ubuntu (curl, glib2.0, and webkit2gtk).

Accelerate Mean Time to Exposure Remediation Across Hybrid Environments with Remediation Hub

2024-11-19 Pauline Logan

Post Syndicated from Pauline Logan original https://blog.rapid7.com/2024/11/19/accelerate-mean-time-to-exposure-remediation-across-hybrid-environments-with-remediation-hub/

Accelerate Mean Time to Exposure Remediation Across Hybrid Environments with Remediation Hub

As organizations continue to scale their digital infrastructure, the volume of vulnerabilities and exposures grows at an overwhelming pace. Security teams often find themselves inundated with alerts and risk signals, unable to remediate every issue within their environment. They often struggle to keep pace with the dynamic nature of threats, and existing tools were not built to address the complexity of modern IT environments.

With limited time and resources, trying to address every potential vulnerability is not feasible. This reality has driven the need for prioritization—teams must focus on the vulnerabilities that present the highest risks to their organization, based on factors like attacker behaviors, real-world threat intelligence, and exploitability.

Meet Remediation Hub, Your New Home for Exposure Prioritization and Remediation

Accelerate Mean Time to Exposure Remediation Across Hybrid Environments with Remediation Hub

Rapid7’s Remediation Hub, our newest addition to the Exposure Command platform, is designed to address this exact challenge. Remediation Hub automatically prioritizes various risk signals across your hybrid environment and suggests the actions your team can take that would have the largest impact on reducing your overall risk posture.

The solution leverages foundational visibility from Surface Command, which presents a comprehensive view of your attack surface, combining both external scanning and cyber asset management to provide a dynamic inventory and topology map of every asset across your environment. Underpinned by a powerful graph database, the platform allows teams to visualize the entire attack surface and understand the interconnected relationships between assets, ensuring that teams are guided to take action on the risks that are not only likely to be exploited but could also have the broadest blast radius.

Remediation Hub considers factors like public accessibility, reachability, and the presence of downstream controls (like a firewall, for instance) when prioritizing vulnerabilities. The platform’s Active Risk incorporates real-world threat intelligence from Rapid7 Labs and our open source community to provide clarity into what CVEs are being actively exploited in the wild, which could provide insight into which exposures across your environment attackers are likely to target.

Along with insight into the impacted assets, teams are also provided step-by-step guidance on how to implement the suggested fix, with many actions available as native automation workflows.

Proactive Exposure Management: Moving from Reactive to Proactive

By taking a more targeted, intelligence-driven approach to remediation, security teams can move from reactive to proactive exposure management, ultimately making their organizations more resilient to attacks and accelerating the time it takes to both detect and remediate exposures that pop up across their environments.

To learn more and experience a self-guided Product Tour, click here.

What’s coming next?

In the next post, we’ll walk you through how users can leverage Remediation Hub when responding to an Emergent Threat, including gathering available information about a zero-day, building an understanding of your exposure, along with step-by-step remediation guidance.

We’ll also, of course, continue to bring additional enhancements to Remediation Hub geared toward making it easier for teams to more effectively collaborate with stakeholders across the organization to prioritize and remediate exposures across their hybrid environments. So be sure to stay tuned here for more posts with those updates. As always, we welcome customer feedback and would love to hear from you! Your input helps us tailor our product roadmap, based on your priorities and business needs.

Why Italy Sells So Much Spyware

2024-11-19 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/why-italy-sells-so-much-spyware.html

Interesting analysis:

Although much attention is given to sophisticated, zero-click spyware developed by companies like Israel’s NSO Group, the Italian spyware marketplace has been able to operate relatively under the radar by specializing in cheaper tools. According to an Italian Ministry of Justice document, as of December 2022 law enforcement in the country could rent spyware for €150 a day, regardless of which vendor they used, and without the large acquisition costs which would normally be prohibitive.

As a result, thousands of spyware operations have been carried out by Italian authorities in recent years, according to a report from Riccardo Coluccini, a respected Italian journalist who specializes in covering spyware and hacking.

Italian spyware is cheaper and easier to use, which makes it more widely used. And Italian companies have been in this market for a long time.

The Worst Mistake in the History of the Human Race

2024-11-19 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=qTO8VJPwgpM

Ada Computer Science: What have we learnt so far

2024-11-19 Ben Durbin

Post Syndicated from Ben Durbin original https://www.raspberrypi.org/blog/ada-computer-science-what-have-we-learnt-so-far/

It’s been over a year since we launched Ada Computer Science, and we continue to see the numbers of students and teachers using the platform all around the world grow. Our recent year in review shared some of the key developments we’ve made since launching, many of which are a direct result of feedback from our community.

Today, we are publishing an impact report that includes some of this feedback, along with what users are saying about the impact Ada Computer Science is having.

Computer science students at a desktop computer in a classroom.

Evaluating Ada Computer Science

Ada Computer Science is a free learning platform for computer science students and teachers. It provides high-quality, online learning materials to use in the classroom, for homework, and for revision. Our experienced team has created resources that cover every topic in the leading GCSE and A level computer science specifications.

From May to July 2024, we invited users to provide feedback via an online survey, and we got responses from 163 students and 27 teachers. To explore the feedback further, we also conducted in-depth interviews with three computer science teachers in September 2024.

How is Ada being used?

The most common ways students use Ada Computer Science — as reported by more than two thirds of respondents — is for revision and/or to complete work set by their teacher. Similarly, teachers most commonly said that they direct students to use Ada outside the classroom.

“I recommend my students use Ada Computer Science as their main textbook.” — Teacher

What is users’ experience of using Ada?

Most respondents agreed or strongly agreed that Ada is useful for learning (82%) and high quality (79%).

“Ada Computer Science has been very effective for independent revision, I like how it provides hints and pointers if you answer a question incorrectly.” — Student

Ada users were generally positive about their overall experience of the platform and using it to find the information they were looking for.

“Ada is one of the best for hitting the nail on the head. They’ve really got it in tune with the depth that exam boards want.” — Ian Robinson, computer science teacher (St Alban’s Catholic High School, UK)

What impact is Ada having?

Around half of the teachers agreed that Ada had reduced their workload and/or increased their subject knowledge. Across all respondents, teachers estimated that the average weekly time saving was 1 hour 8 minutes.

Additionally, 81% of students agreed that as a result of using Ada, they had become better at understanding computer science concepts. Other benefits were reported too, with most students agreeing that they had become better problem-solvers, for example.

“I love Ada! It is an extremely helpful resource… The content featured is very comprehensive and detailed, and the visual guides… are particularly helpful to aid my understanding.” — Student

Future developments

Since receiving this feedback, we have already released updated site navigation and new question finder designs. In 2025, we are planning improvements to the markbook (for example, giving teachers an overview of the assignments they’ve set) and to how assignments can be created.

If you’d like to read more about the findings, there’s a full report for you to download. Thank you to everyone who took the time to take part — we really value your feedback!

The post Ada Computer Science: What have we learnt so far appeared first on Raspberry Pi Foundation.