Automate data loading from your database into Amazon Redshift using AWS Database Migration Service (DMS), AWS Step Functions, and the Redshift Data API

Post Syndicated from Ritesh Sinha original https://aws.amazon.com/blogs/big-data/automate-data-loading-from-your-database-into-amazon-redshift-using-aws-database-migration-service-dms-aws-step-functions-and-the-redshift-data-api/

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing ETL (extract, transform, and load), business intelligence (BI), and reporting tools. Tens of thousands of customers use Amazon Redshift to process exabytes of data per day and power analytics workloads such as BI, predictive analytics, and real-time streaming analytics.

As more and more data is generated, collected, processed, and stored in many different systems, making the data available to end-users at the right place and time is an important aspect of a data warehouse implementation. A fully automated and highly scalable ETL process helps minimize the operational effort that you must invest in managing regular ETL pipelines. It also provides timely refreshes of data in your data warehouse.

You can approach the data integration process in two ways:

  • Full load – This method involves completely reloading all the data within a specific data warehouse table or dataset
  • Incremental load – This method focuses on updating or adding only the changed or new data to the existing dataset in a data warehouse
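
The difference between the two approaches can be sketched in a few lines of Python. This is a minimal illustration only: the dictionary stands in for a warehouse table, and the function names and sample rows are hypothetical, not part of any AWS API.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str]  # (key, value); a hypothetical two-column row


def full_load(source_rows: Iterable[Row]) -> Dict[str, str]:
    """Rebuild the target from scratch; whatever was there before is discarded."""
    return dict(source_rows)


def incremental_load(target: Dict[str, str], changed_rows: Iterable[Row]) -> Dict[str, str]:
    """Apply only new or changed rows on top of the existing target."""
    merged = dict(target)
    merged.update(changed_rows)
    return merged


if __name__ == "__main__":
    target = {"p1": "bolt", "p2": "nut"}
    source = [("p1", "bolt v2"), ("p3", "washer")]
    # Full load: p2 disappears because it is no longer in the source.
    print(full_load(source))
    # Incremental load: p2 survives because only the changes are applied.
    print(incremental_load(target, source))
```

A full load needs no change tracking at the source, which is why it suits the kind of data this post targets.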

This post discusses how to automate ingestion of source data that changes completely and has no way to track the changes. This is useful for customers who want to use this data in Amazon Redshift; some examples of such data are products and bills of materials without tracking details at the source.

We show how to build an automatic extract and load process from various relational database systems into a data warehouse for full load only. A full load is performed from SQL Server to Amazon Redshift using AWS Database Migration Service (AWS DMS). When Amazon EventBridge receives a full load completion notification from AWS DMS, ETL processes are run on Amazon Redshift to process data. AWS Step Functions is used to orchestrate this ETL pipeline. Alternatively, you could use Amazon Managed Workflows for Apache Airflow (Amazon MWAA), a managed orchestration service for Apache Airflow that makes it straightforward to set up and operate end-to-end data pipelines in the cloud.

Solution overview

The workflow consists of the following steps:

  1. The solution uses an AWS DMS migration task that replicates the full load dataset from the configured SQL Server source to a target Redshift cluster in a staging area.
  2. AWS DMS publishes the REPLICATION_TASK_STOPPED event to EventBridge when the replication task is complete, which invokes an EventBridge rule.
  3. EventBridge routes the event to a Step Functions state machine.
  4. The state machine calls a Redshift stored procedure through the Redshift Data API, which loads the dataset from the staging area to the target production tables. With this API, you can also access Redshift data with web-based service applications, including AWS Lambda.

The following architecture diagram highlights the end-to-end solution using AWS services.

In the following sections, we demonstrate how to create the full load AWS DMS task, configure the ETL orchestration on Amazon Redshift, create the EventBridge rule, and test the solution.

Prerequisites

To complete this walkthrough, you must have the following prerequisites:

  • An AWS account
  • A SQL Server database configured as a replication source for AWS DMS
  • A Redshift cluster to serve as the target database
  • An AWS DMS replication instance to migrate data from source to target
  • A source endpoint pointing to the SQL Server database
  • A target endpoint pointing to the Redshift cluster

Create the full load AWS DMS task

Complete the following steps to set up your migration task:

  1. On the AWS DMS console, choose Database migration tasks in the navigation pane.
  2. Choose Create task.
  3. For Task identifier, enter a name for your task, such as dms-full-dump-task.
  4. Choose your replication instance.
  5. Choose your source endpoint.
  6. Choose your target endpoint.
  7. For Migration type, choose Migrate existing data.

  8. In the Table mapping section, under Selection rules, choose Add new selection rule.
  9. For Schema, choose Enter a schema.
  10. For Schema name, enter a name (for example, dms_sample).
  11. Keep the remaining settings as default and choose Create task.
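
If you prefer to script the task rather than use the console, the same settings can be expressed as parameters for the AWS SDK. The following is a sketch under assumptions: the helper name is hypothetical, the ARNs are placeholders you would replace with your own, and the selection rule includes every table in the schema (the `%` wildcard), which may be broader than what you want.

```python
import json


def build_full_load_task_params(task_id, source_arn, target_arn, instance_arn, schema_name):
    """Return keyword arguments suitable for boto3's dms.create_replication_task."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                # "%" is the DMS wildcard: include every table in the schema.
                "object-locator": {"schema-name": schema_name, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        # "full-load" is the API value for the console option "Migrate existing data".
        "MigrationType": "full-load",
        # The API expects the table mappings as a JSON string, not a dict.
        "TableMappings": json.dumps(table_mappings),
    }


if __name__ == "__main__":
    params = build_full_load_task_params(
        "dms-full-dump-task",
        "arn:aws:dms:...:endpoint:SOURCE",   # placeholder ARN
        "arn:aws:dms:...:endpoint:TARGET",   # placeholder ARN
        "arn:aws:dms:...:rep:INSTANCE",      # placeholder ARN
        "dms_sample",
    )
    print(json.dumps(params, indent=2))
```

With valid ARNs, the result could be passed as `boto3.client("dms").create_replication_task(**params)`.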

The following screenshot shows your completed task on the AWS DMS console.

Create Redshift tables

Create the following tables on the Redshift cluster using the Redshift query editor:

  • dbo.dim_cust – Stores customer attributes:
CREATE TABLE dbo.dim_cust (
cust_key integer ENCODE az64,
cust_id character varying(10) ENCODE lzo,
cust_name character varying(100) ENCODE lzo,
cust_city character varying(50) ENCODE lzo,
cust_rev_flg character varying(1) ENCODE lzo
)

DISTSTYLE AUTO;
  • dbo.fact_sales – Stores customer sales transactions:
CREATE TABLE dbo.fact_sales (
order_number character varying(20) ENCODE lzo,
cust_key integer ENCODE az64,
order_amt numeric(18,2) ENCODE az64
)

DISTSTYLE AUTO;
  • dbo.fact_sales_stg – Stores daily customer incremental sales transactions:
CREATE TABLE dbo.fact_sales_stg (
order_number character varying(20) ENCODE lzo,
cust_id character varying(10) ENCODE lzo,
order_amt numeric(18,2) ENCODE az64
)

DISTSTYLE AUTO;

Use the following INSERT statements to load sample data into the sales staging table:

-- cust_id values match dbo.dim_cust.cust_id (100, 101, 102) so that the join in sp_load_fact_sales returns rows
insert into dbo.fact_sales_stg(order_number,cust_id,order_amt) values ('100','100',200);
insert into dbo.fact_sales_stg(order_number,cust_id,order_amt) values ('101','100',300);
insert into dbo.fact_sales_stg(order_number,cust_id,order_amt) values ('102','101',25);
insert into dbo.fact_sales_stg(order_number,cust_id,order_amt) values ('103','101',35);
insert into dbo.fact_sales_stg(order_number,cust_id,order_amt) values ('104','102',80);
insert into dbo.fact_sales_stg(order_number,cust_id,order_amt) values ('105','102',45);

Create the stored procedures

In the Redshift query editor, create the following stored procedures to process customer and sales transaction data:

  • sp_load_cust_dim() – This procedure populates the customer dimension. For simplicity, this example truncates the table, inserts three sample customers, and sets cust_rev_flg rather than comparing against customer data in staging:
CREATE OR REPLACE PROCEDURE dbo.sp_load_cust_dim()
LANGUAGE plpgsql
AS $$
BEGIN
truncate table dbo.dim_cust;
insert into dbo.dim_cust(cust_key,cust_id,cust_name,cust_city) values (1,100,'abc','chicago');
insert into dbo.dim_cust(cust_key,cust_id,cust_name,cust_city) values (2,101,'xyz','dallas');
insert into dbo.dim_cust(cust_key,cust_id,cust_name,cust_city) values (3,102,'yrt','new york');
update dbo.dim_cust
set cust_rev_flg=case when cust_city='new york' then 'Y' else 'N' end
where cust_rev_flg is null;
END;
$$
  • sp_load_fact_sales() – This procedure transforms the order data in staging by joining it with the customer dimension on cust_id, and loads the result, with the cust_key from the dimension, into the final sales fact table:
CREATE OR REPLACE PROCEDURE dbo.sp_load_fact_sales()
LANGUAGE plpgsql
AS $$
BEGIN
--Process Fact Sales
insert into dbo.fact_sales
select
sales_fct.order_number,
cust.cust_key as cust_key,
sales_fct.order_amt
from dbo.fact_sales_stg sales_fct
--join to customer dim
inner join (select * from dbo.dim_cust) cust on sales_fct.cust_id=cust.cust_id;
END;
$$
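
To see what the two procedures produce without a Redshift cluster, the same logic can be dry-run locally. The sketch below uses an in-memory SQLite database as a stand-in for Redshift (so types and encodings differ), and the staging rows are hypothetical: three of them carry a cust_id that exists in the dimension and one deliberately does not.

```python
import sqlite3


def load_fact_sales_demo():
    """Mimic sp_load_cust_dim and sp_load_fact_sales on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript(
        """
        CREATE TABLE dim_cust (cust_key INTEGER, cust_id TEXT, cust_name TEXT,
                               cust_city TEXT, cust_rev_flg TEXT);
        CREATE TABLE fact_sales_stg (order_number TEXT, cust_id TEXT, order_amt REAL);
        CREATE TABLE fact_sales (order_number TEXT, cust_key INTEGER, order_amt REAL);
        """
    )
    # Dimension rows and flag update, as in sp_load_cust_dim.
    cur.executemany(
        "INSERT INTO dim_cust (cust_key, cust_id, cust_name, cust_city) VALUES (?, ?, ?, ?)",
        [(1, "100", "abc", "chicago"), (2, "101", "xyz", "dallas"), (3, "102", "yrt", "new york")],
    )
    cur.execute(
        "UPDATE dim_cust SET cust_rev_flg = CASE WHEN cust_city = 'new york' THEN 'Y' ELSE 'N' END "
        "WHERE cust_rev_flg IS NULL"
    )
    # Hypothetical staging rows; the last one has no matching customer.
    cur.executemany(
        "INSERT INTO fact_sales_stg VALUES (?, ?, ?)",
        [("100", "100", 200.0), ("102", "101", 25.0), ("104", "102", 80.0), ("999", "1", 5.0)],
    )
    # Same join as sp_load_fact_sales: swap the business key for the surrogate key.
    cur.execute(
        "INSERT INTO fact_sales "
        "SELECT s.order_number, c.cust_key, s.order_amt "
        "FROM fact_sales_stg s INNER JOIN dim_cust c ON s.cust_id = c.cust_id"
    )
    rows = cur.execute(
        "SELECT order_number, cust_key, order_amt FROM fact_sales ORDER BY order_number"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in load_fact_sales_demo():
        print(row)
```

Note that the inner join silently drops staging rows whose cust_id has no match in dbo.dim_cust, so it is worth checking that the staging keys and dimension keys line up before relying on the fact table.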

Create the Step Functions state machine

Complete the following steps to create the state machine redshift-elt-load-customer-sales. This state machine is invoked as soon as the AWS DMS full load task for the customer table is complete.

  1. On the Step Functions console, choose State machines in the navigation pane.
  2. Choose Create state machine.
  3. For Template, choose Blank.
  4. On the Actions dropdown menu, choose Import definition to import the workflow definition of the state machine.

  5. Open your preferred text editor and save the following code in a file with the ASL extension (for example, redshift-elt-load-customer-sales.ASL). Replace the ClusterIdentifier and SecretArn values with your Redshift cluster ID and the ARN of the secret for your Redshift cluster.
{
"Comment": "State Machine to process ETL for Customer Sales Transactions",
"StartAt": "Load_Customer_Dim",
"States": {
"Load_Customer_Dim": {
"Type": "Task",
"Parameters": {
"ClusterIdentifier": "redshiftcluster-abcd",
"Database": "dev",
"Sql": "call dbo.sp_load_cust_dim()",
"SecretArn": "arn:aws:secretsmanager:us-west-2:xxx:secret:rs-cluster-secret-abcd"
},
"Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement",
"Next": "Wait on Load_Customer_Dim"
},
"Wait on Load_Customer_Dim": {
"Type": "Wait",
"Seconds": 30,
"Next": "Check_Status_Load_Customer_Dim"
},

"Check_Status_Load_Customer_Dim": {
"Type": "Task",
"Next": "Choice",
"Parameters": {
"Id.$": "$.Id"
},

"Resource": "arn:aws:states:::aws-sdk:redshiftdata:describeStatement"
},

"Choice": {
"Type": "Choice",
"Choices": [
{
"Not": {
"Variable": "$.Status",
"StringEquals": "FINISHED"
},
"Next": "Wait on Load_Customer_Dim"
}
],
"Default": "Load_Sales_Fact"
},
"Load_Sales_Fact": {
"Type": "Task",
"End": true,
"Parameters": {
"ClusterIdentifier": "redshiftcluster-abcd",
"Database": "dev",
"Sql": "call dbo.sp_load_fact_sales()",
"SecretArn": "arn:aws:secretsmanager:us-west-2:xxx:secret:rs-cluster-secret-abcd"
},

"Resource": "arn:aws:states:::aws-sdk:redshiftdata:executeStatement"
}
}
}
  6. Choose Choose file and upload the ASL file to create a new state machine.

  7. For State machine name, enter a name for the state machine (for example, redshift-elt-load-customer-sales).
  8. Choose Create.
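
Before uploading the file, it can help to confirm that it parses as JSON and that every transition points to a state that exists. The following is a rough sanity check, not a full Amazon States Language validator; the function name is hypothetical and only the Next, Choices, and Default fields are inspected.

```python
import json


def check_transitions(definition_text):
    """Parse an ASL definition and return a list of problems with state transitions."""
    definition = json.loads(definition_text)  # raises ValueError on malformed JSON
    states = definition["States"]
    problems = []
    if definition["StartAt"] not in states:
        problems.append(f"StartAt points to unknown state {definition['StartAt']!r}")
    for name, state in states.items():
        targets = []
        if "Next" in state:
            targets.append(state["Next"])
        if state.get("Type") == "Choice":
            # Each choice rule has its own Next; Default is the fallback branch.
            targets.extend(rule["Next"] for rule in state.get("Choices", []))
            if "Default" in state:
                targets.append(state["Default"])
        for target in targets:
            if target not in states:
                problems.append(f"{name!r} points to unknown state {target!r}")
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as handle:
        issues = check_transitions(handle.read())
    print("\n".join(issues) if issues else "All transitions resolve.")
```

Stray typographic quotation marks pasted from a web page are a common cause of the parse step failing.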

After the successful creation of the state machine, you can verify the details as shown in the following screenshot.

The following diagram illustrates the state machine workflow.

The state machine includes the following steps:

  • Load_Customer_Dim – Performs the following actions:
    • Passes the stored procedure sp_load_cust_dim to the Data API’s executeStatement action to run in the Redshift cluster and load the customer dimension
    • Sends the identifier of the SQL statement back to the state machine
  • Wait on Load_Customer_Dim – Waits for 30 seconds
  • Check_Status_Load_Customer_Dim – Invokes the Data API’s describeStatement action to get the status of the SQL statement
  • Choice – Routes the next step of the ETL workflow depending on that status:
    • FINISHED – Moves to Load_Sales_Fact, which passes the stored procedure sp_load_fact_sales to executeStatement to run in the Redshift cluster; the procedure loads fact sales from staging and populates the corresponding keys from the customer dimension
    • All other statuses (including FAILED and ABORTED, because the definition checks only for FINISHED) – Goes back to the Wait on Load_Customer_Dim step to wait for the SQL statement to finish
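
The submit, wait, check, and route cycle above can also be sketched as ordinary Python against the Redshift Data API. This is an illustration under assumptions rather than the state machine itself: the function and the FakeDataApi class are hypothetical, the client is injected so the loop can be exercised without AWS, and, unlike the definition above, the sketch stops polling when the statement reports FAILED or ABORTED.

```python
import time

TERMINAL_FAILURES = {"FAILED", "ABORTED"}


def run_statement_and_wait(client, sql, cluster_id, database, secret_arn,
                           poll_seconds=30, sleep=time.sleep):
    """Submit SQL through the Redshift Data API and poll until it finishes."""
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, SecretArn=secret_arn, Sql=sql
    )["Id"]
    while True:
        sleep(poll_seconds)  # corresponds to the Wait state
        status = client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":  # corresponds to the Choice state
            return statement_id
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"Statement {statement_id} ended with status {status}")


class FakeDataApi:
    """Hypothetical stand-in for boto3.client('redshift-data'), for local dry runs."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.sql_seen = []

    def execute_statement(self, **kwargs):
        self.sql_seen.append(kwargs["Sql"])
        return {"Id": f"stmt-{len(self.sql_seen)}"}

    def describe_statement(self, Id):
        return {"Id": Id, "Status": next(self._statuses)}


if __name__ == "__main__":
    fake = FakeDataApi(["STARTED", "FINISHED", "FINISHED"])
    for call in ("call dbo.sp_load_cust_dim()", "call dbo.sp_load_fact_sales()"):
        done = run_statement_and_wait(fake, call, "my-cluster", "dev", "my-secret-arn",
                                      sleep=lambda seconds: None)
        print(call, "->", done)
```

Against a real cluster, `boto3.client("redshift-data")` could be passed in place of the fake. Treating FAILED and ABORTED as terminal is a design choice of this sketch; a similar extra branch in the Choice state would keep the workflow from polling indefinitely after a failed statement.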

The state machine redshift-elt-load-customer-sales loads the dim_cust and fact_sales tables, reading from fact_sales_stg, when invoked by the EventBridge rule.

As an optional step, you can set up event-based notifications on completion of the state machine to invoke any downstream actions, such as Amazon Simple Notification Service (Amazon SNS) or further ETL processes.

Create an EventBridge rule

EventBridge sends event notifications to the Step Functions state machine when the full load is complete. You can also turn event notifications on or off in EventBridge.

Complete the following steps to create the EventBridge rule:

  1. On the EventBridge console, in the navigation pane, choose Rules.
  2. Choose Create rule.
  3. For Name, enter a name (for example, dms-test).
  4. Optionally, enter a description for the rule.
  5. For Event bus, choose the event bus to associate with this rule. If you want this rule to match events that come from your account, select AWS default event bus. When an AWS service in your account emits an event, it always goes to your account’s default event bus.
  6. For Rule type, choose Rule with an event pattern.
  7. Choose Next.
  8. For Event source, choose AWS events or EventBridge partner events.
  9. For Method, select Use pattern form.
  10. For Event source, choose AWS services.
  11. For AWS service, choose Database Migration Service.
  12. For Event type, choose All Events.
  13. For Event pattern, enter the following JSON expression, which looks for the REPLICATION_TASK_STOPPED status for the AWS DMS task:
{
"source": ["aws.dms"],
"detail": {
"eventId": ["DMS-EVENT-0079"],
"eventType": ["REPLICATION_TASK_STOPPED"],
"detailMessage": ["Stop Reason FULL_LOAD_ONLY_FINISHED"],
"type": ["REPLICATION_TASK"],
"category": ["StateChange"]
}
}

  14. For Target type, choose AWS service.
  15. For AWS service, choose Step Functions state machine.
  16. For State machine name, enter redshift-elt-load-customer-sales.
  17. Choose Create rule.
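
To build intuition for how the event pattern in this rule selects events, the following sketch reimplements the simplest part of the matching logic: each array lists the acceptable values for a field, and nested objects are matched field by field. It is an approximation covering exact-value matching only (EventBridge supports more operators), and the sample event is hypothetical rather than a captured AWS DMS event.

```python
def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event.

    Handles only exact-value lists and nested objects, a small subset of
    what EventBridge event patterns support.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:  # a list means "any of these values"
            return False
    return True


PATTERN = {
    "source": ["aws.dms"],
    "detail": {
        "eventId": ["DMS-EVENT-0079"],
        "eventType": ["REPLICATION_TASK_STOPPED"],
        "detailMessage": ["Stop Reason FULL_LOAD_ONLY_FINISHED"],
        "type": ["REPLICATION_TASK"],
        "category": ["StateChange"],
    },
}

if __name__ == "__main__":
    # Hypothetical event; fields not named in the pattern are ignored.
    sample_event = {
        "source": "aws.dms",
        "region": "us-west-2",
        "detail": {
            "eventId": "DMS-EVENT-0079",
            "eventType": "REPLICATION_TASK_STOPPED",
            "detailMessage": "Stop Reason FULL_LOAD_ONLY_FINISHED",
            "type": "REPLICATION_TASK",
            "category": "StateChange",
        },
    }
    print(matches(PATTERN, sample_event))
```

Because detailMessage is matched exactly, a task that stops for any other reason does not start the state machine.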

The following screenshot shows the details of the rule created for this post.

Test the solution

Run the task and wait for the workload to complete. This workflow moves the full volume data from the source database to the Redshift cluster.

The following screenshot shows the load statistics for the customer table full load.

AWS DMS provides notifications when an AWS DMS event occurs, for example, when a full load completes or a replication task stops.

After the full load is complete, AWS DMS sends events to the default event bus for your account. The following screenshot shows an example of invoking the target Step Functions state machine using the rule you created.

We configured the Step Functions state machine as a target in EventBridge. This enables EventBridge to invoke the Step Functions workflow in response to the completion of an AWS DMS full load task.

Validate the state machine orchestration

When the entire customer sales data pipeline is complete, you may go through the entire event history for the Step Functions state machine, as shown in the following screenshots.

Limitations

The Data API and Step Functions AWS SDK integration offers a robust mechanism to build highly distributed ETL applications with minimal developer overhead. Consider the following limitations when using the Data API and Step Functions:

Clean up

To avoid incurring future charges, delete the Redshift cluster, AWS DMS full load task, AWS DMS replication instance, and Step Functions state machine that you created as part of this post.

Conclusion

In this post, we demonstrated how to build an ETL orchestration for full loads from operational data stores using the Redshift Data API, EventBridge, Step Functions with AWS SDK integration, and Redshift stored procedures.

To learn more about the Data API, see Using the Amazon Redshift Data API to interact with Amazon Redshift clusters and Using the Amazon Redshift Data API.


About the authors

Ritesh Kumar Sinha is an Analytics Specialist Solutions Architect based out of San Francisco. He has helped customers build scalable data warehousing and big data solutions for over 16 years. He loves to design and build efficient end-to-end solutions on AWS. In his spare time, he loves reading, walking, and doing yoga.

Praveen Kadipikonda is a Senior Analytics Specialist Solutions Architect at AWS based out of Dallas. He helps customers build efficient, performant, and scalable analytic solutions. He has worked with building databases and data warehouse solutions for over 15 years.

Jagadish Kumar (Jag) is a Senior Specialist Solutions Architect at AWS focused on Amazon OpenSearch Service. He is deeply passionate about Data Architecture and helps customers build analytics solutions at scale on AWS.

Serverless ICYMI Q2 2024

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q2-2024/

Welcome to the 26th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Calendar

EDA Day – London 2024

The AWS Serverless DA team hosted the third Event-Driven Architecture (EDA) Day in London on May 14th. This event brought together prominent figures in the event-driven architecture community, AWS, and customer speakers.

EDA Day covered 13 sessions, 2 workshops, and a Q&A panel. David Boyne was the keynote speaker with a talk titled “Complexity is the Gotcha of Event-Driven Architecture”. AWS speakers included Matthew Meckes, Natasha Wright, Julian Wood, Gillian Armstrong, Josh Kahn, Veda Raman, and Uma Ramadoss. There was also an impressive lineup of guest speakers: Daniele Frasca, David Anderson, Ryan Cormack, Sarah Hamilton, Sheen Brisals, Marcin Sodkiewicz, and Ben Ellerby.

Videos are available on YouTube

EDA Day London

The future of Serverless

There has been a lot of talk about the future of serverless, with this year being the 10th anniversary of AWS Lambda. Eric Johnson addresses the topic in his ServerlessDays Milan keynote, “Now serverless is all grown up, what’s next”.

AWS Lambda

AWS launched Lambda support for Ruby 3.3, the latest Ruby release. The Ruby 3.3 runtime is based on the new Amazon Linux 2023 runtime and provides access to the latest Ruby language features.

There is a new guide on how to retrieve data about Lambda functions that use a deprecated runtime.

Learn how to run code after returning a response from an AWS Lambda function. This post shows how to return a synchronous function response as soon as possible, yet also perform additional asynchronous work after you send the response. For example, you may store data in a database or send information to a logging system.

See how you can use the circuit-breaker pattern with Lambda extensions and Amazon DynamoDB. The circuit breaker pattern can help prevent cascading failures and improve overall system stability.

Circuit-breaker pattern

Lambda functions now scale up to 12X faster in the AWS GovCloud (US) Regions.

Powertools for AWS Lambda (Python) adds support for Agents for Amazon Bedrock.

The AWS SDK for JavaScript v2 enters maintenance mode on September 8, 2024 and reaches end-of-support on September 8, 2025.

Amazon CloudWatch Logs introduced Live Tail streaming CLI support.

Amazon ECS and AWS Fargate

You can now secure Amazon Elastic Container Service (Amazon ECS) workloads on AWS Fargate with customer managed keys (CMKs). Once you add your keys to AWS Key Management Service (AWS KMS), you can use these to encrypt the underlying ephemeral storage of an Amazon ECS task on AWS Fargate.

Windows containers on AWS Fargate now start faster, by up to 42% for Windows Server 2022 Core. AWS has optimized the Windows Server AMIs, introduced EC2 fast launch with pre-provisioned snapshots, and reduced network latency.

Amazon ECS Service Connect is a networking capability to simplify service discovery, connectivity, and traffic observability for Amazon ECS. You can now proactively scale Amazon ECS services by using custom metrics.

ECS Service Connect custom metrics

AWS Step Functions

The AWS Step Functions TestState API allows you to test individual states independently and to integrate testing into your preferred development workflows. Learn how to accelerate workflow development to iterate faster.

Step Functions TestState API

Amazon EventBridge

Amazon EventBridge Pipes now supports event delivery through AWS PrivateLink. You can send events from an event source located in an Amazon Virtual Private Cloud (VPC) to a Pipes target without traversing the public internet.

Amazon Timestream for LiveAnalytics is now an EventBridge Pipes target. Timestream for LiveAnalytics is a fast, scalable, purpose-built time series database that makes it easy to store and analyze trillions of time series data points per day.

EventBridge has a new console dashboard which provides a centralized view of your resources, metrics, and quotas. The console has an improved Learn page and other console enhancements. When using the CloudFormation template export for Pipes, you can also generate the IAM role. There is a new Rules tab in the Event Bus detail page, and the monitoring tab in the Rule detail page now includes additional metrics.

EventBridge Scheduler has some new API request metrics for improved observability.

Generative AI

Amazon Bedrock is a fully managed Generative AI service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API. Bedrock now supports new models, including Anthropic’s Claude 3.5, AI21 Labs’ Jamba-Instruct, and Amazon Titan Text Premier.

The new Bedrock Converse API provides a consistent way to invoke Amazon Bedrock models and simplifies multi-turn conversations. There is also a JavaScript tutorial to walk you through sending requests to the Converse API using the JavaScript SDK.

Amazon Q Developer is now generally available. Amazon Q Developer, part of the Amazon Q family, is a generative AI–powered assistant for software development. Amazon Q is available in the AWS Management Console and as an integrated development environment (IDE) extension for Visual Studio Code, Visual Studio, and JetBrains IDEs. Amazon Q Developer has knowledge of your AWS account resources and can help understand your costs.

Amazon Q list Lambda functions

You can use Amazon Q Developer to develop code features and transform code to upgrade Java applications. Amazon Q Developer also offers inline completions in the command line. For more information, see Reimagining software development with the Amazon Q Developer Agent.

Amazon Q code features

Knowledge Bases for Amazon Bedrock now lets you configure Guardrails and inference parameters, and offers observability logs.

Storage and data

Amazon S3 no longer charges for several HTTP error codes if initiated from outside your individual AWS account or AWS Organization.

You can automatically detect malware in new object uploads to S3 with Amazon GuardDuty.

Amazon Elastic File System (Amazon EFS) now supports up to 1.5 GiB/s of throughput per client, a 3x increase over the previous limit of 500 MiB/s.

Discover architectural patterns for real-time analytics using Amazon Kinesis Data Streams in part 1 and part 2 and see how to optimize write throughput.

Amazon API Gateway

Amazon API Gateway now allows you to increase the integration timeout beyond the prior limit of 29 seconds. You can raise the integration timeout for Regional and private REST APIs, but this might require a reduction in your account-level throttle quota limit. This launch can help with workloads that require longer timeouts, such as Generative AI use cases with Large Language Models (LLMs).

You can also now use Amazon Verified Permissions to secure API Gateway REST APIs when using an OpenID Connect (OIDC) compliant identity provider. You can now control access based on user attributes and group memberships, without writing code.

AWS AppSync

You can now invoke your AWS AppSync data sources in an event-driven manner. Previously, you could only invoke Lambda functions synchronously from AWS AppSync. AWS AppSync can now trigger Lambda functions in Event mode, asynchronously decoupling the API response from the Lambda invocation, which helps with long-running operations.

AWS AppSync now passes application request headers to Lambda custom authorizer functions. You can make authorization decisions based on the value of the authorization header, and the value of other headers that were sent with the request from the application client.

Learn best practices for AWS AppSync GraphQL APIs. See how to optimize the security, performance, coding standards, and deployment of your AWS AppSync API. AWS AppSync also has increased quotas and new metrics.

AWS Amplify

AWS Amplify Gen 2 is now generally available. This now provides a code-first developer experience for building full-stack apps using TypeScript. Amplify Gen 2 allows you to express app requirements like the data models, business logic, and authorization rules in TypeScript.

AWS Amplify Gen2

Amplify has a new experience for file storage. This post explores using Lambda to create serverless functions for Amplify using TypeScript. There are also new team environment workflows.

Serverless blog posts

April

May

June

Serverless container blog posts

April

May

June

Serverless Office Hours

April

May

June

Containers from the Couch

April

May

FooBar Serverless

April

February

June

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on X (formerly Twitter) to see the latest news, follow conversations, and interact with the team.

And finally, visit the Serverless Land and Containers on AWS websites for all your serverless and serverless container needs.

Should the non-Orthodox care about the Bulgarian Orthodox Church?

Post Syndicated from Светла Енчева original https://www.toest.bg/tryabva-li-nepravoslavnite-da-se-interesuvat-ot-bpts/

At the end of last week, the Bulgarian Orthodox Church (BOC) elected its patriarch, for the second time since 1989. The election became a leading topic and was covered in detail by the media. When it became clear that the new patriarch would be Daniil of Vidin, stormy reactions followed. The reason was that he has repeatedly expressed pro-Russian positions.

But shouldn't only believing and practicing Orthodox Christians have the right to an opinion on the matter? If you are not interested in football, what does it matter to you whether Bulgaria's team has qualified for the European Championship, or who becomes European football champion?

If Church and state were truly separate, as the constitution provides, if religion were not used for political ends, and if the BOC did not influence the lives of those who do not belong to it, then Orthodox Christians would indeed have the greatest right to criticize it. But is that the case?

The role of the BOC in political and public life

Besides being a religious event, the election of Patriarch Daniil was also a political one. First and foremost, because it is a stake in the struggle over Bulgaria's foreign policy orientation. The first liturgy he took part in was attended by a number of pro-Russian politicians, for example Rumen Petkov and former president Georgi Parvanov, as well as the Russian ambassador Eleonora Mitrofanova.

The BOC matters greatly for foreign policy because of its ever-growing role in domestic politics and public life over the last quarter century. In 2001, when Simeon Saxe-Coburg-Gotha came to power, the patriarch entered the National Assembly for its opening for the first time. And, with few exceptions, he has had a reserved place there since. Which brings us to the present day, when the president congratulates the newly elected head of the BOC with the words “God save Bulgaria”, and his enthronement is even marked with military honors.

Orthodox-flavored rhetoric, which practically no politician opposes, also serves to deliver political messages and to justify political decisions. The opening of the previous, 49th National Assembly, for example, was accompanied by something like a mini soap opera, because it fell in Holy Week. And just about everyone saw fit to draw a connection between the Passion of Christ and political life in Bulgaria: from the president, through GERB chairman Boyko Borisov, to the then chief prosecutor Ivan Geshev.

A large part of the media also obligingly contributes to the BOC's dominance in public life. If there is an Orthodox holiday or an important church event, nearly the entire news bulletins of the leading TV channels are filled with it. Even media that strive to uphold journalistic standards sometimes lose critical distance and forget their mission to explain what is happening. And they start speaking in church language, for example saying that Daniil of Vidin was “elevated to the dignity of archimandrite” and “received episcopal chirotony”. Instead of simply saying “became an archimandrite” and “was ordained a bishop”.

For the BOC, however, its current substantial role in public and political life is not enough. Among the first words of the newly elected Patriarch Daniil were that the school subject of religion and Orthodoxy should become compulsory. And those who are not Orthodox could study other religions, ethics, or philosophy. At present religion is taught as an elective, while ethics and philosophy have been part of the high school curriculum since the early 1990s. So it is not clear what exactly the change the patriarch is asking for would consist of. What is clear is only that he wants it to be compulsory.

Religion's interference with the right to private life

Orthodoxy also affects the personal lives even of people who are not Orthodox. The media never miss a chance to remind us when, according to the Orthodox calendar, we must celebrate, mourn, humble ourselves, or ask forgiveness. And that we must honor our deceased loved ones precisely on Zadushnitsa (All Souls' Day), and not on some date that matters to us personally.

But while failing to observe these rituals is perceived as a moral transgression, or at the very least as social inadequacy, there are areas in which religion is allowed, by the highest possible places in the state, to decide for us how to live our lives. The position of the BOC, as well as of other Christian denominations, is taken into account in the legislative process and also serves as an argument in rulings of the supreme courts.

And although the other Christian denominations are especially vocal when it comes to restricting the right to private life, no one would pay attention to them if they were not acting in unison with the BOC, the nationalist parties, and Russian propaganda.

Even if you are not religious, the state can forbid you to marry on the grounds that this contradicts the traditions and religion of Bulgaria. More precisely, that it contradicts the Constitution, which, according to its binding interpretations, is in line with religion and tradition (regardless of what the Constitution itself says). You fall under this scenario if you love a person of the same sex. Again in the name of religion and tradition, the state will not allow you any other form of registered partnership with the person you love either, even though according to the European Court of Human Rights it is obliged to do so, at least for those who married abroad.

Religion and tradition also have a say in what name and what sex are written in your identity documents. Not in your baptismal certificate, but in your civil documents. If you belong to the lucky majority whose members identify in accordance with their genitals, you have no problem. But if you have the outward characteristics of a man and feel like a woman, or vice versa, you are not traditional enough or religiously acceptable. And you are condemned to lifelong harassment and ridicule whenever someone looks first at your ID card and then at you (or the other way around).

Many properties, no taxes

But let us suppose that you are heterosexual, you are not transgender, and you are not interested in politics, neither domestic nor foreign. Even then, the BOC affects your life. Through the money in your pocket.

How many properties the Church owns, no mere mortal knows. In all likelihood they are worth billions, as a bTV investigation shows. It is not just churches, chapels, and monasteries. The BOC owns, for example, luxury shops and restaurants in the very center of Sofia, plots in quite a few of the capital's neighborhoods, and residential buildings. And it is building itself an entire complex in the Manastirski Livadi district.

The Church pays no taxes not only on its properties but also on its profits from rents and sales, even though in theory it owes tax on profit from commercial activity. Because it interprets them as... donations from believers. And the state does not even set out to check this interpretation against the law. Thus (possibly) millions never reach the budget. And the profits go who knows where.

It is evident, however, where they do not go. They do not go toward repairing churches and monasteries in a wretched state, for whose rescue donations are collected. Still less do they go toward helping those in need. The BOC does not seem to realize that the Church should also have a social function, as Christian churches do in many places around the world.

The BOC pays no taxes to the state, but the state does pay the Church. For priests' salaries, and even for building churches. In the bTV investigation, Goran Blagoev states that in the 2024 state budget the line item for the Church is BGN 38.4 million. He raises the question of the Church's missing social work and asks: “Should the state pour money into an institution that does not fulfill its public function?”

Blagoev was the host of the program “Vyara i obshtestvo” (Faith and Society) on BNT, but as a result of his critical stance toward the BOC the program was taken off the air. This, incidentally, is yet another case in which the state sides with the Church rather than with its citizens.

Is Bulgaria still secular?

In 2017 the journalist Tatyana Vaksberg asked: “Wasn’t Bulgaria secular?” Seven years later, that question looks more and more like a taboo. It is not raised (as Vaksberg noted as far back as ten years ago). If someone is rash enough to ask it, silence follows. At the same time, the role of religion in public life is growing. That is, not of Orthodoxy in itself, but of certain political interpretations of it.

This is offensive to some believers too, who see their religion being abused. The non-religious and those who profess religions other than Christianity, however, show an enviable tolerance for the progressive narrowing of the secular space in public life. Or rather, indifference.

The merging of state and Church in Bulgaria is a problem not only for LGBTI+ people, who cannot marry or change their legal gender. One way or another, it is a problem for everyone. And it will keep growing. Except for those who pocket the profit. Both financial and political.

Up North: Islands of Wool amid the Waves of the Atlantic (Part One)

Post Syndicated from Светла Стоянова original https://www.toest.bg/na-sever-ostrovi-ot-vulna-sred-vulnitie-na-atlantika-purva-chast/

We arrived in pitch darkness, and a wet, chilly wind lashed us by way of welcome. We pitched the tent on a sodden meadow and listened to the uneven drumming of the drops on the canvas. Once I was dry and inside the warm sleeping bag, the rain sounded like a gentle lullaby. In the morning, as soon as I pulled the zipper of the tent door, a ray of sunlight flashed in my eyes and for the first time I realized where we were. Before me stretched the blue ocean, and out of it had risen several bright green islands, as steep as mountains. Northern fulmars glided smoothly down, almost touched the surface of the water, skimmed centimeters above it, and rose again with the same grace. In the distance, gusts of wind left their traces on the sea, as if something invisible were licking the water and scattering it into thousands of tiny droplets in the air. I was here, I felt the wet grass on my fingers, and I knew it was no vision.

Twice I have had the good fortune to visit the Faroe Islands and to camp on six of them in two different seasons. But do not picture exotic destinations like the Canary Islands or the Caribbean; think rather of an archipelago battered by the North Atlantic wind somewhere between Norway, Scotland, and Iceland. The eighteen islands have varied terrain: some have steep green meadows and sheer basalt cliffs, while others are cut by fjords, rocky bays, and black beaches, tirelessly exposed to the polishing of the salty ocean. Even so, almost all of them are inhabited. By thousands of sheep, people, wild hares, and birds, all of them at the mercy of the weather and the ocean.

The name of the islands means “islands of the sheep”, and rightly so, because to this day the animals outnumber the population almost two to one.

The sheep graze outdoors all year round, since the islands have no natural predators, and their wool protects them from the cold and wind thanks to its high content of lanolin, a greasy secretion on the wool fibers that makes it water-resistant. It is one of the warmest wools in the world, and an old local saying goes: Ull er Føroya gull (Wool is Faroese gold).

Faroese sheep in May © Светла Стоянова

For centuries, skilled Faroese hands processed it and then knitted warm sweaters with characteristic local patterns, thick socks, and special mittens with two thumbs each. The knitwear was invaluable to the local population, but it also served as a key export and as barter for salt, sugar, coffee, and other goods. When the men went out fishing, their wives stayed at home to spin and knit, as well as to mend clothes that were already a little worn.

Characteristically, the more differently colored strands are worked in, the warmer and more water-resistant the sweater is, which is why devising ever more intricate patterns has always been a favorite pastime.

Knitting a Faroese sweater with Icelandic wool © Светла Стоянова

Mittens with two thumbs were also widespread, so that once one side wore thin from hauling fishing nets, the mitten could be turned around and given a second life. Bookshops today sell catalogues that collect all the Faroese stitches and patterns ever found, which continue to inspire knitters young and old.

Unfortunately, however, the wool sold there nowadays does not come from Faroese sheep but most often from Shetland, New Zealand, Icelandic, Peruvian, and other sheep. And even where Faroese wool can be found, it is often blended with those already mentioned, in unknown proportions. After conversations with more than one local shepherd, I learned that every year the thousands of kilograms of shorn wool, which has superb qualities, are thrown away or simply burned. The explanation: there was no money in wool.

Faroese sheep and sheer cliffs © Светла Стоянова

But how does “Faroese gold” turn to ashes?

There were no machines to spin it, and no people to take it on… I was told that some enthusiasts try sending it to Great Britain or Germany to be spun into yarn and then bringing it back to sell at home, but of course this round trip makes the final product so expensive that very few can afford it.

Sheep on the Faroe Islands live almost like wild animals, because they are left to themselves outdoors all year. They impress with their ability to cope in extreme conditions just to reach the freshest green grass: they climb dizzying crags, descend into steep canyons, and squeeze into unimaginable hiding places. One shepherd told me that they round them up three times a year: in winter, to give them medicine and extra hay; in summer, to shear them; and in autumn, to choose which ones to keep, slaughter, or select for breeding.

The flocks are in fact collective, and the sheep are apportioned according to how much land each person owns. This means that almost every Faroese has a certain share of sheep. Next to the houses there are still hjallur, small wooden sheds for drying meat and fish, which are very well ventilated so that the sea wind dries and naturally salts the meat and fish. Everyone has their own stock of meat, and it can almost never be found in the supermarkets. Instead, the shops are full of lamb from New Zealand or Iceland, which is shocking given the number of local sheep. The same goes for dried fish, most often imported, again from Iceland, even though fishing is a main livelihood for the Faroe Islands too. All this makes me hope that the traditions are still alive and that people provide their own food and so have no need to buy it.

One day in the village of Gásadalur we came across a note from a local who was selling the typical skerpikjøt, dried and naturally salted lamb that has hung in the aforementioned hjallur for at least six months. I knocked on the green wooden door of the typical Faroese house, and a bearded Faroese man opened it and led us to his meat-drying shed. There we picked out a leg of lamb, and he explained that it tastes best in thin slices on a slice of bread with butter. We took his advice, and before long we were cutting hesitantly from the dried leg. The taste was intense, like that of a mold-ripened cheese, and at the same time addictive.

A typical Faroese house © Светла Стоянова

Other passionate lovers of especially good flavor, however, go to extremes, and here I will tell you about two encounters with them.

We set off along an unmarked mountain path that climbed gradually to a rocky ridge, beyond which came a steep descent. We started down carefully, while the view ahead combined sharp-peaked mountains, waterfalls pouring straight into the sea, and a modest bay with a lush meadow, just right for our tent. All of it was constantly being wrapped in wads of cloud and racing mists. We had probably covered half the way down when we spotted three men climbing the slope. They warned us that the path farther on was extremely difficult and treacherous, so we sensibly turned around and quickly climbed back up the steep slope we had just come down. It turned out that the three Faroese were heading home this way after leaving two buffaloes in the valley with the lush meadow that we too had had our eye on. “The grass down there makes the tastiest meat!” one of them said with a wink and a sly smile. They had walked all day with the buffaloes, left them there, and were now scaling the last rocks of the crag with their remaining strength. Evidently it was worth the effort! Once we reached the ridge again, they showed us another, more roundabout way to the meadow coveted by us and by the buffaloes. We met no cloven-hoofed animals, but we came upon a secret hunting hut that had sheltered hunters of wild hares for years on end. Inside was a notebook in which the locals had described their hunting adventures, and I was pleasantly surprised to find that I understood most of what was written, since Faroese is extremely close to Icelandic. It turned out that the hares had been brought to the islands from Norway in the 19th century as an additional source of food. The imported hares, which in the Norwegian mountains changed their coat to pure white in winter, changed their winter coat to bluish gray with a white tail, the better to blend into the less snowy winters of the islands.

The steep descent to the meadow with the buffaloes © Светла Стоянова

From another local shepherd we learned that every year his sons took the sheep by boat to the most remote and steepest part of one of the islands. The place was overgrown with fragrant angelica, known in Bulgaria as pishtyalka, which the sheep adored, and its distinctive scent then lingered in the flavor of the lamb. So each spring the two sons hauled the 40-kilogram sheep up to the spot and brought them back in autumn weighing 80 kilograms, and all the carrying of the willful sheep on and off the boat was quite risky. Every time, the shepherd went on, his wife was scared to death for her two sons, since boats and ships had been wrecked more than once in the restless waters there.

Besides sheep, the country’s economy also relies on fish. The geographic location makes for excellent catches, and the Faroese have been skilled fishermen for centuries. Atlantic cod, mackerel, and herring predominate, along with monkfish, Atlantic halibut, and others. Fish is a staple of the Faroese diet, and there are still people who catch it shortly before dinner.

Today many of the calmer bays also have salmon farms. The salmon are transported, still alive, to factories where they are cut up, packed, and shipped around the world. In these farms, however, diseases and parasites spread easily among the fish, and harmful chemicals are used as well; together these pollute the waters outside the farms and damage the marine ecosystem as a whole.

In recent years, seaweed harvesting and the making of seaweed products have been growing steadily. Seaweed absorbs energy from the sun and feeds on carbon dioxide, so it serves as a purification plant for the ocean and at the same time has countless uses: as a meat substitute, in dietary supplements, as animal feed, for making bioplastic packaging, and even as biofuel. There is a future in seaweed, and more and more useful applications are being found in medicine, cosmetics, and the food industry, provided it is approached with a measure of inventiveness.

Subjected to the changeable northern climate, dotted with sheep, and surrounded by fish and seaweed, the Faroe Islands are becoming increasingly popular as a tourist destination too. The two ways to reach them are by ship or by a flight to the only airport, which has a single runway. On the ground, the country has excellent roads, an impressive number of inter-island tunnels, modern ferries, and even helicopters as public transport. But more on that in the next part, because the travel between the islands and the encounters with the locals continue…

The rain falls as if through a sieve and drizzles on our wind-tousled hair. We walk through the low plants and now and then exchange glances with the sheep, which curiously lift their heads from the cropped grass. The adults greet us with a “baa” each, and the lambs snuggled up to them look on with big eyes, as if drawn in a children’s book. As soon as we pass, they start suckling from their mothers and energetically wagging their tails.

(To be continued.)

[$] Eliminating indirect calls for security modules

Post Syndicated from corbet original https://lwn.net/Articles/979683/

Like many kernel subsystems, the Linux security module (LSM) subsystem
makes extensive use of indirect function calls. Those calls, however, are
increasingly problematic, and the pressure to remove them has been growing.
The good news is that there is a patch
series
from KP Singh that accomplishes that goal. Its progress into
the mainline has been slow — this change was first proposed
by Brendan Jackman and Paul Renauld in 2020 — and this work has been caught
up in some wider controversies along the way, but it should be close to
being ready.

Takeaways From The Take Command Summit: Navigating Modern SOC Challenges

Post Syndicated from Emma Burdett original https://blog.rapid7.com/2024/07/02/takeaways-from-the-take-command-summit-navigating-modern-soc-challenges/

At our recent Take Command summit, experts delved into the pressing challenges faced by SOC teams. With 2,365 more data breaches in 2023 than in 2022 (74% of which were a direct result of cyber attacks), the need for robust security operations has never been greater.

Key takeaways from the 25-minute panel:

  1. Emphasizing Proactive Defense: SOC teams must prioritize proactive threat detection and intelligence gathering to stay ahead of evolving cyber threats.
  2. Enhancing Response Times: Reducing incident response times is crucial for mitigating the impact of security breaches and minimizing damage.
  3. Leveraging Advanced Tools: Utilizing advanced threat detection technologies, such as AI and machine learning, can significantly improve the ability to identify and respond to sophisticated attacks.

Key Quote:

“The increasing use of native tools by threat actors means they can stay hidden longer, complicating our detection efforts.”  – Lonnie Best, Detection & Response Services Manager, Rapid7.

The evolving threat landscape requires SOC teams to enhance detection capabilities and streamline operations. To dive deeper into these insights, click through to watch the full discussion.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/980393/

Security updates have been issued by AlmaLinux (httpd:2.4/httpd), Arch Linux (openssh), Fedora (cups, emacs, and python-urllib3), Gentoo (OpenSSH), Mageia (ffmpeg, gdb, openssl, python-idna, and python-imageio), Red Hat (golang and kernel), SUSE (booth, libreoffice, openssl-1_1-livepatches, podman, python-arcomplete, python-Fabric, python-PyGithub, python-antlr4-python3-runtime, python-avro, python-chardet, python-distro, python-docker, python-fakeredis, python-fixedint, pyth, python-Js2Py, python310, python39, and squid), and Ubuntu (cups and netplan.io).

Spotting Phishing Emails: Essential Tips from Nebosystems

Post Syndicated from Editor original https://nebosystems.eu/spotting-phishing-emails-essential-tips/

Welcome to Nebosystems! We are excited to announce the release of our first video on YouTube: “How to Identify Phishing Emails.” In today’s digital age, protecting yourself from cyber threats is more important than ever. Phishing emails are one of the most common and dangerous types of cyber attacks, but with the right knowledge, you can easily spot and avoid them.

In this video, you’ll learn:

  • How to recognize suspicious sender information.
  • How to identify urgent or threatening language.
  • Tips for checking suspicious links and attachments.
  • The importance of watching for spelling and grammar mistakes.
  • Why legitimate companies will never ask for sensitive information via email.

Stay safe online by following these essential tips. If you find this video helpful, don’t forget to like and subscribe to our YouTube channel for more cybersecurity tips and tricks!

Watch the video now:

For more comprehensive protection, explore our Information Security Services designed to safeguard your digital assets and ensure your business remains secure. Learn more about our services here: Information Security Services.

Stay informed with the latest updates and insights on cybersecurity by visiting our blog: Nebosystems Blog.

For any inquiries or further information, feel free to contact us.

Supermicro AS-1115SV-WTNRT Review 1U AMD EPYC Siena Server

Post Syndicated from Patrick Kennedy original https://www.servethehome.com/supermicro-as-1115sv-wtnrt-review-1u-amd-epyc-siena-broadcom-server/

In our Supermicro AS-1115SV-WTNRT review, we see how this 1U AMD EPYC 8004 platform provides lots of NVMe and expansion due to its WIO design

The post Supermicro AS-1115SV-WTNRT Review 1U AMD EPYC Siena Server appeared first on ServeTheHome.

Scientific Linux 7 reaches end of life

Post Syndicated from corbet original https://lwn.net/Articles/980312/

While the end of support for CentOS 7, which happened on June 30, is
significant, it is also worth taking a moment to reflect on the end of
Scientific Linux 7, which has also just occurred. Scientific Linux
was once a popular RHEL rebuild supported by Fermilab, CERN, DESY, and ETH
Zurich. Development of Scientific Linux stopped with SL7, with the labs
switching to CentOS thereafter, but the SL7 release was supported through
to the bitter end. Thanks are due to all who built and supported
Scientific Linux; you provided a useful and stable platform for many years.

Introducing self-managed data sources for Amazon OpenSearch Ingestion

Post Syndicated from Muthu Pitchaimani original https://aws.amazon.com/blogs/big-data/introducing-self-managed-data-sources-for-amazon-opensearch-ingestion/

Enterprise customers increasingly adopt Amazon OpenSearch Ingestion (OSI) to bring data into Amazon OpenSearch Service for various use cases. These include petabyte-scale log analytics, real-time streaming, security analytics, and searching semi-structured key-value or document data. OSI makes it simple, with straightforward integrations, to ingest data from many AWS services, including Amazon DynamoDB, Amazon Simple Storage Service (Amazon S3), Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon DocumentDB (with MongoDB compatibility).

Today we are announcing support for ingesting data from self-managed OpenSearch/Elasticsearch and Apache Kafka clusters. These sources can either be on Amazon Elastic Compute Cloud (Amazon EC2) or on-premises environments.

In this post, we outline the steps to get started with these sources.

Solution overview

OSI supports the AWS Cloud Development Kit (AWS CDK), AWS CloudFormation, the AWS Command Line Interface (AWS CLI), Terraform, AWS APIs, and the AWS Management Console to deploy pipelines. In this post, we use the console to demonstrate how to create a self-managed Kafka pipeline.

Prerequisites

To make sure OSI can connect and read data successfully, the following conditions should be met:

  • Network connectivity to data sources – OSI is generally deployed in a public network, such as the internet, or in a virtual private cloud (VPC). OSI deployed in a customer VPC is able to access data sources in the same or a different VPC, and on the internet with an attached internet gateway. If your data sources are in another VPC, common methods for network connectivity include direct VPC peering, using a transit gateway, or using customer managed VPC endpoints powered by AWS PrivateLink. If your data sources are in your corporate data center or another on-premises environment, common methods for network connectivity include AWS Direct Connect and using a network hub like a transit gateway. The following diagram shows a sample configuration of OSI running in a VPC and using Amazon OpenSearch Service as a sink. OSI runs in a service VPC and creates an elastic network interface (ENI) in the customer VPC. For self-managed data sources, these ENIs are used for reading data from the on-premises environment. OSI creates a VPC endpoint in the service VPC to send data to the sink.
  • Name resolution for data sources – OSI uses an Amazon Route 53 resolver. This resolver automatically answers queries to names local to a VPC, public domain names on the internet, and records hosted in private hosted zones. If you’re using a private hosted zone, make sure you have a DHCP option set enabled and attached to the VPC, using AmazonProvidedDNS as the domain name server. For more information, see Work with DHCP option sets. Additionally, you can use resolver inbound and outbound endpoints if you need complex resolution schemes with conditions that are beyond a simple private hosted zone.
  • Certificate verification for data source names – OSI supports only SASL_SSL for transport for the Apache Kafka source. Within SASL, Amazon OpenSearch Service supports most authentication mechanisms, such as PLAIN, SCRAM, IAM, GSSAPI, and others. When using SASL_SSL, make sure you have access to the certificates needed for OSI to authenticate. For self-managed OpenSearch data sources, make sure verifiable certificates are installed on the clusters. Amazon OpenSearch Service doesn’t support insecure communication between OSI and OpenSearch. Certificate verification cannot be turned off. In particular, the “insecure” configuration option is not supported.
  • Access to AWS Secrets Manager – OSI uses AWS Secrets Manager to retrieve credentials and certificates needed to communicate with self-managed data sources. For more information, see Create and manage secrets with AWS Secrets Manager.
  • IAM role for pipelines – You need an AWS Identity and Access Management (IAM) pipeline role to write to data sinks. For more information, see Identity and Access Management for Amazon OpenSearch Ingestion.

Create a pipeline with self-managed Kafka as a source

After you complete the prerequisites, you’re ready to create a pipeline for your data source. Complete the following steps:

  1. On the OpenSearch Service console, choose Pipelines under Ingestion in the navigation pane.
  2. Choose Create pipeline.
  3. Choose Streaming under Use case in the navigation pane.
  4. Select Self managed Apache Kafka under Ingestion pipeline blueprints and choose Select blueprint.

This will populate a sample configuration for this pipeline.

  5. Provide a name for this pipeline and choose the appropriate pipeline capacity.
  6. Under Pipeline configuration, provide your pipeline configuration in YAML format. The following code snippet shows a sample configuration in YAML for SASL_SSL authentication:
    version: '2'
    kafka-pipeline:
      source:
        kafka:
          acknowledgments: true
          bootstrap_servers:
            - 'node-0.example.com:9092'
          encryption:
            type: "ssl"
            certificate: '${{aws_secrets:kafka-cert}}'

          authentication:
            sasl:
              plain:
                username: '${{aws_secrets:secrets:username}}'
                password: '${{aws_secrets:secrets:password}}'
          topics:
            - name: on-prem-topic
              group_id: osi-group-1
      processor:
        - grok:
            match:
              message:
                - '%{COMMONAPACHELOG}'
        - date:
            destination: '@timestamp'
            from_time_received: true
      sink:
        - opensearch:
            hosts: ["https://search-domain-12345567890.us-east-1.es.amazonaws.com"]
            aws:
              region: us-east-1
              sts_role_arn: 'arn:aws:iam::123456789012:role/pipeline-role'
            index: "on-prem-kafka-index"
    extension:
      aws:
        secrets:
          kafka-cert:
            secret_id: kafka-cert
            region: us-east-1
            sts_role_arn: 'arn:aws:iam::123456789012:role/pipeline-role'
          secrets:
            secret_id: secrets
            region: us-east-1
            sts_role_arn: 'arn:aws:iam::123456789012:role/pipeline-role'

  7. Choose Validate pipeline and confirm there are no errors.
  8. Under Network configuration, choose Public access or VPC access. (For this post, we choose VPC access.)
  9. If you chose VPC access, specify your VPC, subnets, and an appropriate security group so OSI can reach the outgoing ports for the data source.
  10. Under VPC attachment options, select Attach to VPC and choose an appropriate CIDR range.

OSI resources are created in a service VPC managed by AWS that is separate from the VPC you chose in the last step. This selection allows you to configure what CIDR ranges OSI should use inside this service VPC. The choice exists so you can make sure there is no address collision between the CIDR ranges in your VPC that is attached to your on-premises network and this service VPC. Many pipelines in your account can share the same CIDR ranges for this service VPC.

  11. Specify any optional tags and log publishing options, then choose Next.
  12. Review the configuration and choose Create pipeline.

You can monitor the pipeline creation and any log messages in the Amazon CloudWatch Logs log group you specified. Your pipeline should now be successfully created. For more information about how to provision capacity for the performance of this pipeline, see the section Recommended Compute Units (OCUs) for the MSK pipeline in Introducing Amazon MSK as a source for Amazon OpenSearch Ingestion.
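
The VPC attachment step asks for a CIDR range that does not collide with ranges already used in your VPC or on-premises network. As a rough sketch of checking that before you create the pipeline, the following Python uses only the standard `ipaddress` module. The function name and every CIDR value are illustrative assumptions, not values from the console or from OSI.

```python
import ipaddress


def overlapping_ranges(candidate, existing):
    """Return the CIDR blocks in `existing` that overlap the candidate block."""
    cand = ipaddress.ip_network(candidate, strict=True)
    return [
        cidr
        for cidr in existing
        if cand.overlaps(ipaddress.ip_network(cidr, strict=True))
    ]


# Hypothetical address plan: customer VPC plus on-premises ranges.
in_use = ["10.0.0.0/16", "172.16.0.0/12", "192.168.10.0/24"]

for candidate in ["10.0.128.0/24", "10.99.20.0/24"]:
    clashes = overlapping_ranges(candidate, in_use)
    status = "collides with " + ", ".join(clashes) if clashes else "no collision"
    print(f"{candidate}: {status}")
```

A candidate that reports no collision here is only a starting point; the list of ranges in use has to come from your own network inventory.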

Create a pipeline with self-managed OpenSearch as a source

The steps for creating a pipeline for self-managed OpenSearch are similar to the steps for creating one for Kafka. During the blueprint selection, choose Data Migration under Use case and select Self managed OpenSearch/Elasticsearch. OpenSearch Ingestion can source data from all versions of OpenSearch and from Elasticsearch version 7.0 to version 7.10.

The following blueprint shows a sample configuration YAML for this data source:

version: "2"
opensearch-migration-pipeline:
  source:
    opensearch:
      acknowledgments: true
      hosts: [ "https://node-0.example.com:9200" ]
      username: "${{aws_secrets:secret:username}}"
      password: "${{aws_secrets:secret:password}}"
      indices:
        include:
          - index_name_regex: "opensearch_dashboards_sample_data*"
        exclude:
          - index_name_regex: '\..*'
  sink:
    - opensearch:
        hosts: [ "https://search-domain-12345567890.us-east-1.es.amazonaws.com" ]
        aws:
          sts_role_arn: "arn:aws:iam::123456789012:role/pipeline-role"
          region: "us-east-1"
        index: "on-prem-os"
extension:
  aws:
    secrets:
      secret:
        secret_id: "self-managed-os-credentials"
        region: "us-east-1"
        sts_role_arn: "arn:aws:iam::123456789012:role/pipeline-role"
        refresh_interval: PT1H
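
Both sample configurations refer to secrets through `${{aws_secrets:<name>}}` or `${{aws_secrets:<name>:<key>}}` placeholders, and each name has to match an entry under `extension.aws.secrets`. The following is a minimal local sketch of checking that match before you paste a configuration into the console. It assumes the placeholder grammar is exactly what the samples show, and it takes the defined names as a plain list that you read off the `extension` block by hand; the function names are hypothetical and this is not part of OSI.

```python
import re

# Matches ${{aws_secrets:<name>}} and ${{aws_secrets:<name>:<key>}} (assumed grammar).
PLACEHOLDER = re.compile(
    r"\$\{\{aws_secrets:([A-Za-z0-9_-]+)(?::([A-Za-z0-9_-]+))?\}\}"
)


def referenced_secrets(config_text):
    """Return the set of secret names used in aws_secrets placeholders."""
    return {match.group(1) for match in PLACEHOLDER.finditer(config_text)}


def undefined_secrets(config_text, defined_names):
    """Return referenced names that are missing from the defined names."""
    return referenced_secrets(config_text) - set(defined_names)


sample = """
username: "${{aws_secrets:secret:username}}"
password: "${{aws_secrets:secret:password}}"
certificate: '${{aws_secrets:kafka-cert}}'
"""

print(sorted(referenced_secrets(sample)))           # ['kafka-cert', 'secret']
print(sorted(undefined_secrets(sample, ["secret"])))  # ['kafka-cert']
```

The Validate pipeline step in the console remains the authoritative check; this only catches a mistyped secret name early.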

Considerations for self-managed OpenSearch data source

Certificates installed on the OpenSearch cluster need to be verifiable for OSI to connect to this data source before reading data. Insecure connections are currently not supported.
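
One way to get an early signal on whether a cluster certificate is verifiable is a strict TLS handshake from a machine that can reach the cluster. The sketch below uses Python's standard `ssl` and `socket` modules. The function names, the default port, and the optional CA bundle path are assumptions for illustration, and the trust store that OSI itself uses may differ from the one on your machine.

```python
import socket
import ssl


def strict_tls_context(ca_file=None):
    """Build a client context that verifies the certificate chain and host name."""
    context = ssl.create_default_context(cafile=ca_file)
    # create_default_context already enables both; set explicitly for clarity.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def certificate_is_verifiable(host, port=9200, ca_file=None, timeout=5.0):
    """Return (True, "") if the TLS handshake verifies, else (False, reason)."""
    context = strict_tls_context(ca_file)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return True, ""
    except (ssl.SSLError, OSError) as exc:
        return False, str(exc)


# Example call with a hypothetical host (not executed here):
# ok, reason = certificate_is_verifiable("node-0.example.com", 9200)
```

A failure here usually points to a self-signed certificate, an incomplete chain, or a host name that does not match the certificate.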

After you’re connected, make sure the cluster has sufficient read bandwidth to allow OSI to read data. Use the Min and Max OCU settings to limit OSI read bandwidth consumption. Your read bandwidth will vary depending on data volume, number of indexes, and provisioned OCU capacity. Start small and increase the number of OCUs to balance available bandwidth against acceptable migration time.

This source is typically meant for one-time migration of data and not as continuous ingestion to keep data in sync between data sources and sinks.

OpenSearch Service domains support remote reindexing, but that consumes resources in your domains. Using OSI will move this compute out of the domain, and OSI can achieve significantly higher bandwidth than remote reindexing, thereby resulting in faster migration times.

OSI doesn’t support deferred replay or traffic recording today; refer to Migration Assistant for Amazon OpenSearch Service if your migration needs those capabilities.

Conclusion

In this post, we introduced self-managed sources for OpenSearch Ingestion that enable you to ingest data from corporate data centers or other on-premises environments. OSI also supports various other data sources and integrations. Refer to Working with Amazon OpenSearch Ingestion pipeline integrations to learn about these other data sources.


About the Authors

Muthu Pitchaimani is a Search Specialist with Amazon OpenSearch Service. He builds large-scale search applications and solutions. Muthu is interested in the topics of networking and security, and is based out of Austin, Texas.

Arjun Nambiar is a Product Manager with Amazon OpenSearch Service. He focuses on ingestion technologies that enable ingesting data from a wide variety of sources into Amazon OpenSearch Service at scale. Arjun is interested in large-scale distributed systems and cloud-centered technologies, and is based out of Seattle, Washington.

Upcoming Book on AI and Democracy

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/upcoming-book-on-ai-and-democracy.html

If you’ve been reading my blog, you’ve noticed that I have written a lot about AI and democracy, mostly with my co-author Nathan Sanders. I am pleased to announce that we’re writing a book on the topic.

This isn’t a book about deep fakes, or misinformation. This is a book about what happens when AI writes laws, adjudicates disputes, audits bureaucratic actions, assists in political strategy, and advises citizens on what candidates and issues to support. It’s a book that tries to look into what an AI-assisted democratic system might look like, and then at how to best ensure that we make use of the good parts while avoiding the bad parts.

This is what I talked about in my RSA Conference speech last month, which you can both watch and read. (You can also read earlier attempts at this idea.)

The book will be published by MIT Press sometime in fall 2025, with an open-access digital version available a year after that. (It really can’t be published earlier. Nothing published this year will rise above the noise of the US presidential election, and anything published next spring will have to go to press without knowing the results of that election.)

Right now, the organization of the book is in six parts:

AI-Assisted Politicians
AI-Assisted Legislators
The AI-Assisted Administration
The AI-Assisted Legal System
AI-Assisted Citizens
Getting the Future We Want

It’s too early to share a more detailed table of contents, but I would like help thinking about titles. Below is my current list of brainstorming ideas: both titles and subtitles. Please mix and match, or suggest your own in the comments. No idea is too far afield, because anything can spark more ideas.

Titles:

AI and Democracy
Democracy with AI
Democracy after AI
Democratia ex Machina
Democracy ex Machina
E Pluribus, Machina
Democracy and the Machines
Democracy with Machines
Building Democracy with Machines
Democracy in the Loop
We the People + AI
Artificial Democracy
AI Enhanced Democracy
The State of AI
Citizen AI

Trusting the Bots
Trusting the Computer
Trusting the Machine

The End of the Beginning
Sharing Power
Better Run
Speed, Scale, Scope, and Sophistication
The New Model of Governance
Model Citizen
Artificial Individualism

Subtitles:

How AI Upsets the Power Balances of Democracy
Twenty (or So) Ways AI will Change Democracy
Reimagining Democracy for the Age of AI
Who Wins and Loses
How Democracy Thrives in an AI-Enhanced World
Ensuring that AI Enhances Democracy and Doesn’t Destroy It
How AI Will Change Politics, Legislating, Bureaucracy, Courtrooms, and Citizens
AI’s Transformation of Government, Citizenship, and Everything In-Between
Remaking Democracy, from Voting to Legislating to Waiting in Line
How to Make Democracy Work for People in an AI Future
How AI Will Totally Reshape Democracies and Democratic Institutions
Who Wins and Loses when AI Governs
How to Win and Not Lose With AI as a Partner
AI’s Transformation of Democracy, for Better and for Worse
How AI Can Improve Society and Not Destroy It
How AI Can Improve Society and Not Subvert It
Of the People, for the People, with a Whole lot of AI
How AI Will Reshape Democracy
How the AI Revolution Will Reshape Democracy

Combinations:

Imagining a Thriving Democracy in the Age of AI: How Technology Enhances Democratic Ideals and Nurtures a Society that Serves its People

Making Model Citizens: How to Put AI to Use to Help Democracy
Modeling Citizenship: Who Wins and Who Loses when AI Transforms Democracy
A Model for Government: Democracy with AI, and How to Make it Work for Us

AI of, By, and for the People: How Artificial Intelligence will reshape Democracy
The (AI) Political Revolution: Speed, Scale, Scope, Sophistication, and our Democracy
Speed, Scale, Scope, Sophistication: The AI Democratic Revolution
The Artificial Political Revolution: X Ways AI will Change Democracy…Forever

EDITED TO ADD (7/10): More options:

The Silicon Realignment: The Future of Political Power in a Digital World
Political Machines
EveryTHING is political

Amazon MWAA best practices for managing Python dependencies

Post Syndicated from Mike Ellis original https://aws.amazon.com/blogs/big-data/amazon-mwaa-best-practices-for-managing-python-dependencies/

Customers with data engineers and data scientists are using Amazon Managed Workflows for Apache Airflow (Amazon MWAA) as a central orchestration platform for running data pipelines and machine learning (ML) workloads. To support these pipelines, they often require additional Python packages, such as Apache Airflow Providers. For example, a pipeline may require the Snowflake provider package for interacting with a Snowflake warehouse, or the Kubernetes provider package for provisioning Kubernetes workloads. As a result, they need to manage these Python dependencies efficiently and reliably, providing compatibility with each other and the base Apache Airflow installation.

Python includes the tool pip to handle package installations. To install a package, you add the name to a special file named requirements.txt. The pip install command instructs pip to read the contents of your requirements file, determine dependencies, and install the packages. Amazon MWAA runs the pip install command using this requirements.txt file during initial environment startup and subsequent updates. For more information, see How it works.

Creating a reproducible and stable requirements file is key for reducing pip installation and DAG errors. Additionally, this defined set of requirements provides consistency across nodes in an Amazon MWAA environment. This is most important during worker auto scaling, where additional worker nodes are provisioned and having different dependencies could lead to inconsistencies and task failures. Additionally, this strategy promotes consistency across different Amazon MWAA environments, such as dev, qa, and prod.

This post describes best practices for managing your requirements file in your Amazon MWAA environment. It defines the steps needed to determine your required packages and package versions, create and verify your requirements.txt file with package versions, and package your dependencies.

Best practices

The following sections describe the best practices for managing Python dependencies.

Specify package versions in the requirements.txt file

When creating a Python requirements.txt file, you can specify just the package name, or the package name and a specific version. Adding a package without version information instructs the pip installer to download and install the latest available version, subject to compatibility with other installed packages and any constraints. The package versions selected during environment creation may be different than the version selected during an auto scaling event later on. This version change can create package conflicts leading to pip install errors. Even if the updated package installs properly, code changes in the package can affect task behavior, leading to inconsistencies in output. To avoid these risks, it’s best practice to add the version number to each package in your requirements.txt file.
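As a rough illustration of this practice, the following sketch scans the text of a requirements file and reports any package line that lacks an exact == pin. The function name find_unpinned and the regular expression are assumptions made for this example; they are not part of pip or Amazon MWAA, and the pattern only covers the simple name==version form used in this post.

```python
import re

# Matches "name==version", optionally with extras such as "[http]".
# Assumption: only the simple pin form used in this post is handled.
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._,-]+\])?==[^\s=]+$")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        # Skip blank lines and pip options such as --constraint.
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


sample = """\
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt"
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-dbt-cloud[http]
"""
print(find_unpinned(sample))  # ['apache-airflow-providers-dbt-cloud[http]']
```

A check like this could run in a CI step before the requirements file is uploaded, so an unpinned package is caught before it reaches an environment.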

Use the constraints file for your Apache Airflow version

A constraints file contains the packages, with versions, verified to be compatible with your Apache Airflow version. This file adds an additional validation layer to prevent package conflicts. Because the constraints file plays such an important role in preventing conflicts, beginning with Apache Airflow v2.7.2 on Amazon MWAA, your requirements file must include a --constraint statement. If a --constraint statement is not supplied, Amazon MWAA will specify a compatible constraints file for you.

Constraint files are available for each Airflow version and Python version combination. The URLs have the following form:

https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt
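To make the substitution concrete, here is a minimal sketch that fills in the two placeholders. The helper name constraints_url is hypothetical; the URL pattern itself is the one shown above.

```python
def constraints_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints file URL for an Airflow/Python version pair."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


# Apache Airflow v2.8.1 with Python 3.11, the combination used in this post.
print(constraints_url("2.8.1", "3.11"))
```

For the versions used in this post, this produces the same URL that appears in the --constraint statement of the requirements files later in the walkthrough.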

The official Apache Airflow constraints are guidelines, and if your workflows require newer versions of a provider package, you may need to modify your constraints file and include it in your DAG folder. When doing so, the best practices outlined in this post become even more important to guard against package conflicts.

Create a .zip archive of all dependencies

Creating a .zip file containing the packages in your requirements file and specifying this as the package repository source makes sure the exact same wheel files are used during your initial environment setup and subsequent node configurations. The pip installer will use these local files for installation rather than connecting to the external PyPI repository.

Test the requirements.txt file and dependency .zip file

Testing your requirements file before release to production is key to avoiding installation and DAG errors. It's best practice to test both locally, with the MWAA local runner, and in a dev or staging Amazon MWAA environment before deploying to production. You can use continuous integration and delivery (CI/CD) deployment strategies to perform the requirements and package installation testing, as described in Automating a DAG deployment with Amazon Managed Workflows for Apache Airflow.

Solution overview

This solution uses the MWAA local runner, an open source utility that replicates an Amazon MWAA environment locally. You use the local runner to build and validate your requirements file, and package the dependencies. In this example, you install the snowflake and dbt-cloud provider packages. You then use the MWAA local runner and a constraints file to determine the exact version of each package compatible with Apache Airflow. With this information, you then update the requirements file, pinning each package to a version, and retest the installation. When you have a successful installation, you package your dependencies and test in a non-production Amazon MWAA environment.

This walkthrough uses MWAA local runner v2.8.1 and covers the following steps:

  1. Download and build the MWAA local runner.
  2. Create and test a requirements file with package versions.
  3. Package dependencies.
  4. Deploy the requirements file and dependencies to a non-production Amazon MWAA environment.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Set up the MWAA local runner

First, you download the MWAA local runner version matching your target MWAA environment, then you build the image.

Complete the following steps to configure the local runner:

  1. Clone the MWAA local runner repository with the following command:
    git clone git@github.com:aws/aws-mwaa-local-runner.git -b v2.8.1

  2. With Docker running, build the container with the following command:
    cd aws-mwaa-local-runner
    ./mwaa-local-env build-image

Create and test a requirements file with package versions

Building a versioned requirements file makes sure all Amazon MWAA components have the same package versions installed. To determine the compatible versions for each package, you start with a constraints file and an un-versioned requirements file, allowing pip to resolve the dependencies. Then you create your versioned requirements file from pip’s installation output.

The following diagram illustrates this workflow.

Requirements file testing process

To build an initial requirements file, complete the following steps:

  1. In your MWAA local runner directory, open requirements/requirements.txt in your preferred editor.

The default requirements file will look similar to the following:

--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt"
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-mysql==5.5.1
  2. Replace the existing packages with the following package list:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt"
apache-airflow-providers-snowflake
apache-airflow-providers-dbt-cloud[http]
  3. Save requirements.txt.
  4. In a terminal, run the following command to generate the pip install output:
./mwaa-local-env test-requirements

test-requirements runs pip install, which handles resolving the compatible package versions. Using a constraints file makes sure the selected packages are compatible with your Airflow version. The output will look similar to the following:

Successfully installed apache-airflow-providers-dbt-cloud-3.5.1 apache-airflow-providers-snowflake-5.2.1 pyOpenSSL-23.3.0 snowflake-connector-python-3.6.0 snowflake-sqlalchemy-1.5.1 sortedcontainers-2.4.0

The message beginning with Successfully installed is the output of interest. This shows which dependencies, and their specific version, pip installed. You use this list to create your final versioned requirements file.

Your output will also contain Requirement already satisfied messages for packages already available in the base Amazon MWAA environment. You do not add these packages to your requirements.txt file.
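If you want to avoid copying versions by hand, the conversion from pip's output to pinned requirement lines can be sketched as follows. The function name pins_from_pip_output is hypothetical, and the sketch assumes each item on the Successfully installed line has the form name-version with no hyphen inside the version, as in the output shown above.

```python
def pins_from_pip_output(pip_output: str) -> list[str]:
    """Turn pip's 'Successfully installed ...' line into name==version pins."""
    prefix = "Successfully installed "
    pins = []
    for line in pip_output.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # ignore 'Requirement already satisfied' and other lines
        for item in line[len(prefix):].split():
            # Assumption: the version is everything after the last hyphen.
            name, version = item.rsplit("-", 1)
            pins.append(f"{name}=={version}")
    return pins


output = (
    "Requirement already satisfied: example-package in /usr/local/lib\n"
    "Successfully installed apache-airflow-providers-dbt-cloud-3.5.1 "
    "apache-airflow-providers-snowflake-5.2.1 pyOpenSSL-23.3.0\n"
)
for pin in pins_from_pip_output(output):
    print(pin)
```

Note that pip's output does not repeat extras such as [http], so you would still add those back to the relevant line yourself (for example, apache-airflow-providers-dbt-cloud[http]==3.5.1).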

  5. Update the requirements file with the list of versioned packages from the test-requirements command. The updated file will look similar to the following code:
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt"
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-dbt-cloud[http]==3.5.1
pyOpenSSL==23.3.0
snowflake-connector-python==3.6.0
snowflake-sqlalchemy==1.5.1
sortedcontainers==2.4.0

Next, you test the updated requirements file to confirm no conflicts exist.

  6. Rerun the test-requirements command:
./mwaa-local-env test-requirements

A successful test will not produce any errors. If you encounter dependency conflicts, return to the previous step and update the requirements file with additional packages, or package versions, based on pip’s output.

Package dependencies

If your Amazon MWAA environment has a private webserver, you must package your dependencies into a .zip file, upload the file to your S3 bucket, and specify the package location in your Amazon MWAA instance configuration. Because a private webserver can’t access the PyPI repository through the internet, pip will install the dependencies from the .zip file.

If you’re using a public webserver configuration, you also benefit from a static .zip file, which makes sure the package information remains unchanged until it is explicitly rebuilt.

This process uses the versioned requirements file created in the previous section and the package-requirements feature in the MWAA local runner.

To package your dependencies, complete the following steps:

  1. In a terminal, navigate to the directory where you installed the local runner.
  2. Download the constraints file for your Python version and your version of Apache Airflow and place it in the plugins directory. For this post, we use Python 3.11 and Apache Airflow v2.8.1:
curl -o plugins/constraints-2.8.1-3.11.txt https://raw.githubusercontent.com/apache/airflow/constraints-2.8.1/constraints-3.11.txt
  3. In your requirements file, update the constraints URL to the local downloaded file.

The --constraint statement instructs pip to compare the package versions in your requirements.txt file to the allowed version in the constraints file. Downloading a specific constraints file to your plugins directory enables you to control the constraint file location and contents.

The updated requirements file will look like the following code:

--constraint "/usr/local/airflow/plugins/constraints-2.8.1-3.11.txt"
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-dbt-cloud[http]==3.5.1
pyOpenSSL==23.3.0
snowflake-connector-python==3.6.0
snowflake-sqlalchemy==1.5.1
sortedcontainers==2.4.0
  4. Run the following command to create the .zip file:
./mwaa-local-env package-requirements

package-requirements creates an updated requirements file named packaged_requirements.txt and zips all dependencies into plugins.zip. The updated requirements file looks like the following code:

--find-links /usr/local/airflow/plugins
--no-index
--constraint "/usr/local/airflow/plugins/constraints-2.8.1-3.11.txt"
apache-airflow-providers-snowflake==5.2.1
apache-airflow-providers-dbt-cloud[http]==3.5.1
pyOpenSSL==23.3.0
snowflake-connector-python==3.6.0
snowflake-sqlalchemy==1.5.1
sortedcontainers==2.4.0

Note the reference to the local constraints file and the plugins directory. The --find-links statement instructs pip to install packages from /usr/local/airflow/plugins rather than the public PyPI repository.
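To show the shape of the packaged file, the following sketch prepends the two offline-install options to a pinned requirements file. It is only an illustration: the package-requirements command is what actually generates packaged_requirements.txt and collects the wheel files, and the helper name packaged_requirements is an assumption made for this example.

```python
def packaged_requirements(
    requirements_text: str,
    plugins_dir: str = "/usr/local/airflow/plugins",
) -> str:
    """Prepend offline-install options to a pinned requirements file."""
    # --find-links points pip at the local wheels; --no-index disables PyPI.
    header = [f"--find-links {plugins_dir}", "--no-index"]
    body = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Drop blanks and any options already present so reruns are stable.
        if not line or line.startswith(("--find-links", "--no-index")):
            continue
        body.append(line)
    return "\n".join(header + body) + "\n"


pinned = """\
--constraint "/usr/local/airflow/plugins/constraints-2.8.1-3.11.txt"
apache-airflow-providers-snowflake==5.2.1
"""
print(packaged_requirements(pinned), end="")
```

Because --no-index stops pip from contacting PyPI, every pinned package and its dependencies must be present as a wheel under the --find-links directory, which is why the .zip archive and the requirements file are built together.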

Deploy the requirements file

After you achieve an error-free requirements installation and package your dependencies, you’re ready to deploy the assets to a non-production Amazon MWAA environment. Even when verifying and testing requirements with the MWAA local runner, it’s best practice to deploy and test the changes in a non-prod Amazon MWAA environment before deploying to production. For more information about creating a CI/CD pipeline to test changes, refer to Deploying to Amazon Managed Workflows for Apache Airflow.

To deploy your changes, complete the following steps:

  1. Upload your requirements.txt file and plugins.zip file to your Amazon MWAA environment’s S3 bucket.

For instructions on specifying a requirements.txt version, refer to Specifying the requirements.txt version on the Amazon MWAA console. For instructions on specifying a plugins.zip file, refer to Installing custom plugins on your environment.

The Amazon MWAA environment will update and install the packages in your plugins.zip file.

After the update is complete, verify the provider package installation in the Apache Airflow UI.

  2. Access the Apache Airflow UI in Amazon MWAA.
  3. From the Apache Airflow menu bar, choose Admin, then Providers.

The list of providers, and their versions, is shown in a table. In this example, the page reflects the installation of apache-airflow-providers-dbt-cloud version 3.5.1 and apache-airflow-providers-snowflake version 5.2.1. This list only contains the provider packages installed, not all supporting Python packages. Provider packages that are part of the base Apache Airflow installation will also appear in the list. The following image is an example of the package list; note the apache-airflow-providers-dbt-cloud and apache-airflow-providers-snowflake packages and their versions.

Airflow UI with installed packages

To verify all package installations, view the results in Amazon CloudWatch Logs. Amazon MWAA creates a log stream for the requirements installation and the stream contains the pip install output. For instructions, refer to Viewing logs for your requirements.txt.

A successful installation results in the following message:

Successfully installed apache-airflow-providers-dbt-cloud-3.5.1 apache-airflow-providers-snowflake-5.2.1 pyOpenSSL-23.3.0 snowflake-connector-python-3.6.0 snowflake-sqlalchemy-1.5.1 sortedcontainers-2.4.0

If you encounter any installation errors, you should determine the package conflict, update the requirements file, run the local runner test, re-package the plugins, and deploy the updated files.

Clean up

If you created an Amazon MWAA environment specifically for this post, delete the environment and S3 objects to avoid incurring additional charges.

Conclusion

In this post, we discussed several best practices for managing Python dependencies in Amazon MWAA and how to use the MWAA local runner to implement these practices. These best practices reduce DAG and pip installation errors in your Amazon MWAA environment. For additional details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


About the Author


Mike Ellis is a Technical Account Manager at AWS and an Amazon MWAA specialist. In addition to assisting customers with Amazon MWAA, he contributes to the Apache Airflow open source project.

Amazon DataZone enhances data discovery with advanced search filtering

Post Syndicated from Chaitanya Vejendla original https://aws.amazon.com/blogs/big-data/amazon-datazone-enhances-data-discovery-with-advanced-search-filtering/

Amazon DataZone, a fully managed data management service, helps organizations catalog, discover, analyze, share, and govern data between data producers and consumers. We are excited to announce the introduction of advanced search filtering capabilities in the Amazon DataZone business data catalog.

With the improved rendering of glossary terms, you can now navigate large sets of terms with ease in an expandable and collapsible hierarchy, reducing the time and effort required to locate specific data assets. The introduction of logical operators (AND and OR) for filtering allows for more precise searches, enabling you to combine multiple criteria in a way that best suits your needs. The descriptive summary of search criteria helps users keep track of their applied filters, making it simple to adjust search parameters on the fly.

In this post, we discuss how these new search filtering capabilities enhance the user experience and boost the accuracy of search results, facilitating the ability to find data quickly.

Challenges

Many of our customers manage vast numbers of data assets within the Amazon DataZone catalog for discoverability. Data producers tag these assets with business glossary terms to classify and enhance discovery. For example, data assets owned by a particular department can be tagged with the glossary term for that department, like “Marketing.”

Data consumers searching for the right data assets use faceted search with various criteria, including business glossary terms, and apply filters to refine their search results. However, finding the right data assets can be challenging, especially when it involves combining multiple filters. Customers wanted more flexibility and precision in their search capabilities, such as:

  • A more intuitive way to navigate through extensive lists of glossary terms
  • The ability to apply more nuanced search logic to refine search results with greater precision
  • A summary of applied filters to effortlessly review and adjust search criteria

New features in Amazon DataZone

With the latest release, Amazon DataZone now supports features that enhance search flexibility and accuracy:

  • Improved rendering of glossary terms – Glossary terms are now displayed in a hierarchical view, providing a more organized structure. You can navigate and select from long lists of glossary terms presented in an expandable and collapsible hierarchy within the search facets. For instance, a data scientist can quickly find specific customer demographic data without sifting through an overwhelming flat list.
  • Logical operators for refined search – You can now choose logical operators to refine your search results, offering greater control and precision. For example, a financial analyst preparing a report on investment performance can use AND logic to combine criteria like investment type and region to pinpoint the exact data needed, or use OR logic to broaden the search to include any investments that meet either criterion.
  • Summary of search criteria – A descriptive summary of applied search filters is now provided, allowing you to review and manage your search criteria with ease. For example, a project manager can quickly adjust filters to find project-related assets matching specific phases or statuses.

These enhancements enable you to better understand the relationships between different search facets, enhancing the overall search experience and making it effortless to find the right data assets.

Use case overview

To demonstrate these search enhancements, we set up a new Amazon DataZone domain with two projects:

  • Marketing project – Publishes campaign-related data assets from the Marketing department. These data assets have been tagged with relevant business glossary terms corresponding to marketing.
  • Sales project – Publishes sales-related datasets from the Sales department. These data assets have been tagged with relevant business glossary terms corresponding to sales.

The following screenshots show examples of the different tagged assets.

In the following sections, we demonstrate the improvements in the user search experience for this use case.

Improved rendering of glossary terms

As a data consumer, you want to discover data assets using the faceted search capability within Amazon DataZone.

The search result panel has been enhanced to display glossaries and glossary terms in a hierarchical fashion. This allows you to expand and collapse sections for a more intuitive search experience.

For example, if you want to find product sales data assets from the Corporate Sales department, you can select the appropriate term within the glossary. The selection criteria and the corresponding result list show a total of 18 data assets, as shown in the following screenshot.

Next, if you want to further refine your search to focus only on the product category of Smartphones, you can do so.

Because OR is the default logical operator for your search within the glossary terms, it lists all the assets that are either part of Corporate Sales or tagged with Smartphones.

Logical operators for refined search

You now have the flexibility to change the default operator to AND to list only those data assets that are part of Corporate Sales and tagged with Smartphones, narrowing down the result set.

Additionally, you can further filter based on the asset type by selecting the available options. When you select Glue Table as your asset type, it defaults to the AND condition across the glossary terms and the asset type filter, thereby showing the data assets that satisfy all the filter conditions.

You also have the flexibility to change the operator to OR across these filters, yielding a more exhaustive list of data assets.
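The difference between the two operators can be illustrated with a small sketch. This is not the Amazon DataZone API; the asset names, tags, and the search function are hypothetical and only model the OR and AND semantics described above.

```python
# Hypothetical assets mapped to the glossary terms they are tagged with.
assets = {
    "sales_orders": {"Corporate Sales"},
    "smartphone_sales": {"Corporate Sales", "Smartphones"},
    "smartphone_reviews": {"Smartphones"},
}


def search(assets: dict, terms: list, operator: str = "OR") -> list:
    """Return asset names whose tags match any (OR) or all (AND) terms."""
    wanted = set(terms)
    if operator == "AND":
        # Every selected term must be present on the asset.
        return sorted(name for name, tags in assets.items() if wanted <= tags)
    if operator == "OR":
        # At least one selected term must be present on the asset.
        return sorted(name for name, tags in assets.items() if wanted & tags)
    raise ValueError(f"unsupported operator: {operator}")


terms = ["Corporate Sales", "Smartphones"]
print(search(assets, terms, "OR"))   # all three assets
print(search(assets, terms, "AND"))  # ['smartphone_sales']
```

In this model, OR returns the union of the assets matching each term, whereas AND returns only their intersection, which is why switching to AND narrows the result set.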

Summary of search criteria

As we showed in the preceding screenshots, the results also display a summary of the filters you applied for the search. This enables you to review and better manage your search criteria.

Conclusion

This post demonstrated new Amazon DataZone search enhancement features that streamline data discovery for a more intuitive user experience. These enhancements are designed to empower data consumers within organizations to make more informed decisions, faster. By streamlining the search process and making it more intuitive, Amazon DataZone continues to support the growing needs of data-driven businesses, helping you unlock the full potential of your data assets.

For more information about Amazon DataZone and to get started, refer to the Amazon DataZone User Guide.


About the authors

Chaitanya Vejendla is a Senior Solutions Architect specializing in DataLake & Analytics, working primarily for the Healthcare and Life Sciences industry division at AWS. Chaitanya helps life sciences organizations and healthcare companies develop modern data strategies and deploy data governance and analytical applications, electronic medical records, devices, and AI/ML-based applications, while educating customers about how to build secure, scalable, and cost-effective AWS solutions. His expertise spans data analytics, data governance, AI, ML, big data, and healthcare-related technologies.

Ramesh H Singh is a Senior Product Manager Technical (External Services) at AWS in Seattle, Washington, currently with the Amazon DataZone team. He is passionate about building high-performance ML/AI and analytics products that enable enterprise customers to achieve their critical goals using cutting-edge technology.

Rishabh Asthana is a Front-end Engineer at AWS, working with the Amazon DataZone team based in New York City, USA.

Somdeb Bhattacharjee is an Enterprise Solutions Architect based out of New York, USA, focused on helping customers on their cloud journey. He is interested in databases, big data, and analytics.