Rapid7 completes IRAP PROTECTED assessment for Insight Platform solutions

Post Syndicated from Rapid7 original https://blog.rapid7.com/2024/07/08/rapid7-completes-irap-protected-assessment-for-insight-platform-solutions/

Exciting news from Australia!

Rapid7 has successfully completed an Information Security Registered Assessors Program (IRAP) assessment to PROTECTED Level for several of our Insight Platform solutions.

What is IRAP?

An IRAP assessment is an independent assessment of the implementation, appropriateness, and effectiveness of a system’s security controls. Achieving IRAP PROTECTED status means Australian Government agencies requiring PROTECTED level controls can access our industry-leading, practitioner-first security solutions. Meeting this status further strengthens our position as a trusted partner for Australian government organizations seeking to enhance their cybersecurity posture.

Rapid7 is one of the only vendors to be IRAP-assessed across what we consider a consolidated cybersecurity operation. This places us in a unique position to supply services across federal, state, and local government in Australia. It provides our government customers with the confidence that we have the right governance and controls in place for our own business in order to deliver that service effectively for our customers, specifically covering:

  • Vulnerability management on traditional infrastructure
  • Endpoints
  • The secure implementation of web applications
  • Detection and response to alerts or threats
  • The ability to securely automate workflows

Why is being IRAP PROTECTED important?

Being IRAP-assessed demonstrates our commitment to providing secure and reliable information security services for Government Systems, Cloud Service Providers, Cloud Services, and Information and Communications Technology (ICT) Systems, and more widely to our Australian customers.

Importantly, it highlights how we take the shared responsibility model extremely seriously. It also shows we’re protecting our customers’ information and data across their traditional infrastructure and in the cloud.

Which solutions are approved?

Solutions assessed and approved for PROTECTED Level include InsightIDR (detection and response), InsightVM (vulnerability management), InsightAppSec (application security), and InsightConnect (orchestration and automation). These solutions provide a comprehensive security platform to help government agencies tackle the challenges of today’s evolving cybersecurity landscape.

The successful completion of the IRAP assessment at the PROTECTED level demonstrates our commitment to supporting Australian government customers. It means they have access to a comprehensive security platform necessary to tackle the ever-evolving challenges of today’s cybersecurity landscape.

As more government agencies migrate to hybrid cloud environments, we can help them better manage the growing complexity of identifying and securing the attack surface.

As attackers become increasingly sophisticated, better armed, and faster, the IRAP assessment is yet another string in our cybersecurity bow, showcasing our potential to support Australian Government agencies and more widely, our customers.

How EchoStar ingests terabytes of data daily across its 5G Open RAN network in near real-time using Amazon Redshift Serverless Streaming Ingestion

Post Syndicated from Balaram Mathukumilli original https://aws.amazon.com/blogs/big-data/how-echostar-ingests-terabytes-of-data-daily-across-its-5g-open-ran-network-in-near-real-time-using-amazon-redshift-serverless-streaming-ingestion/

This post was co-written with Balaram Mathukumilli, Viswanatha Vellaboyana and Keerthi Kambam from DISH Wireless, a wholly owned subsidiary of EchoStar.

EchoStar, a connectivity company providing television entertainment, wireless communications, and award-winning technology to residential and business customers throughout the US, deployed the first standalone, cloud-native Open RAN 5G network on AWS public cloud.

Amazon Redshift Serverless is a fully managed, scalable cloud data warehouse that accelerates your time to insights with fast, simple, and secure analytics at scale. Amazon Redshift data sharing allows you to share data within and across organizations, AWS Regions, and even third-party providers, without moving or copying the data. Additionally, it allows you to use multiple warehouses of different types and sizes for extract, transform, and load (ETL) jobs so you can tune your warehouses based on your write workloads’ price-performance needs.

You can use the Amazon Redshift Streaming Ingestion capability to update your analytics data warehouse in near real time. Redshift Streaming Ingestion simplifies data pipelines by letting you create materialized views directly on top of data streams. With this capability in Amazon Redshift, you can use SQL to connect to and directly ingest data from data streams, such as Amazon Kinesis Data Streams or Amazon Managed Streaming for Apache Kafka (Amazon MSK), and pull data directly to Amazon Redshift.

EchoStar uses Redshift Streaming Ingestion to ingest over 10 TB of data daily from more than 150 MSK topics in near real time across its Open RAN 5G network. This post provides an overview of real-time data analysis with Amazon Redshift and how EchoStar uses it to ingest hundreds of megabytes per second. As data sources and volumes grew across its network, EchoStar migrated from a single Redshift Serverless workgroup to a multi-warehouse architecture with live data sharing. This resulted in improved performance for ingesting and analyzing their rapidly growing data.
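As a rough sanity check on these figures, the following sketch converts the daily volume into an average sustained rate. It assumes decimal units (1 TB = 10^12 bytes) and an even spread over the day and across topics, neither of which the post states. Real traffic is likely burstier, so peak rates would sit above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average sustained rate in MB/s, assuming traffic is spread evenly over a day."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


daily_bytes = 10 * 1e12   # 10 TB per day, decimal units (assumption)
topic_count = 150         # "more than 150 MSK topics"; 150 used as a round figure

total_rate = average_rate_mb_per_s(daily_bytes)
per_topic_rate = total_rate / topic_count

print(f"{total_rate:.1f} MB/s in total, {per_topic_rate:.2f} MB/s per topic on average")
```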

“By adopting the strategy of ‘parse and transform later,’ and establishing an Amazon Redshift data warehouse farm with a multi-cluster architecture, we leveraged the power of Amazon Redshift for direct streaming ingestion and data sharing.

“This innovative approach improved our data latency, reducing it from two–three days to an average of 37 seconds. Additionally, we achieved better scalability, with Amazon Redshift direct streaming ingestion supporting over 150 MSK topics.”

—Sandeep Kulkarni, VP, Software Engineering & Head of Wireless OSS Platforms at EchoStar

EchoStar use case

EchoStar needed to provide near real-time access to 5G network performance data for downstream consumers and interactive analytics applications. This data is sourced from the 5G network EMS observability infrastructure and is streamed in near real-time using AWS services like AWS Lambda and AWS Step Functions. The streaming data produced many small files, ranging from bytes to kilobytes. To efficiently integrate this data, a messaging system like Amazon MSK was required.

EchoStar was processing over 150 MSK topics from their messaging system, with each topic containing around 1 billion rows of data per day. This resulted in an average total data volume of 10 TB per day. To use this data, EchoStar needed to visualize it, perform spatial analysis, join it with third-party data sources, develop end-user applications, and use the insights to make near real-time improvements to their terrestrial 5G network. EchoStar needed a solution that does the following:

  • Optimize parsing and loading of over 150 MSK topics to enable downstream workloads to run simultaneously without impacting each other
  • Allow hundreds of queries to run in parallel with desired query throughput
  • Seamlessly scale capacity with the increase in user base and maintain cost-efficiency

Solution overview

EchoStar migrated from a single Redshift Serverless workgroup to a multi-warehouse Amazon Redshift architecture in partnership with AWS. The new architecture enables workload isolation by separating streaming ingestion and ETL jobs from analytics workloads across multiple Redshift compute instances. At the same time, it provides live data sharing using a single copy of the data between the data warehouses. This architecture takes advantage of AWS capabilities to scale Redshift streaming ingestion jobs and isolate workloads while maintaining data access.

The following diagram shows the high-level end-to-end serverless architecture and overall data pipeline.

Architecture Diagram

The solution consists of the following key components:

  • Primary ETL Redshift Serverless workgroup – A primary ETL producer workgroup of size 392 RPU
  • Secondary Redshift Serverless workgroups – Additional producer workgroups of varying sizes to distribute and scale near real-time data ingestion from over 150 MSK topics based on price-performance requirements
  • Consumer Redshift Serverless workgroup – A consumer workgroup instance to run analytics using Tableau

To efficiently load multiple MSK topics into Redshift Serverless in parallel, we first identified the topics with the highest data volumes in order to determine the appropriate sizing for secondary workgroups.

We began by sizing the system to a Redshift Serverless workgroup of 64 RPU. Then we onboarded a small number of MSK topics, creating the related streaming materialized views. We incrementally added more materialized views, evaluating overall ingestion cost, performance, and latency needs within a single workgroup. This initial benchmarking gave us a solid baseline to onboard the remaining MSK topics across multiple workgroups.
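One way to turn such a baseline into a placement plan is to estimate each topic's throughput and pack topics into workgroups under a per-workgroup budget. The sketch below is an illustration only: the topic names, rates, and the 50 MB/s budget are made up, and the largest-first, first-fit heuristic is a stand-in rather than the method EchoStar describes.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    records_per_sec: float
    avg_record_bytes: float

    @property
    def mb_per_sec(self) -> float:
        # Throughput estimate from two of the sizing inputs: rate x record size.
        return self.records_per_sec * self.avg_record_bytes / 1e6


@dataclass
class Workgroup:
    budget_mb_per_sec: float
    topics: list = field(default_factory=list)

    @property
    def load(self) -> float:
        return sum(t.mb_per_sec for t in self.topics)


def assign(topics, budget_mb_per_sec):
    """Place the heaviest topics first, opening a new workgroup when none has room."""
    groups = []
    for topic in sorted(topics, key=lambda t: t.mb_per_sec, reverse=True):
        for group in groups:
            if group.load + topic.mb_per_sec <= budget_mb_per_sec:
                group.topics.append(topic)
                break
        else:
            # No existing workgroup fits; a topic above the budget gets its own.
            groups.append(Workgroup(budget_mb_per_sec, [topic]))
    return groups


# Hypothetical topics and budget, for illustration only.
topics = [
    Topic("topic_a", 20000, 1500),  # 30 MB/s
    Topic("topic_b", 10000, 2500),  # 25 MB/s
    Topic("topic_c", 5000, 4000),   # 20 MB/s
    Topic("topic_d", 2000, 500),    # 1 MB/s
]
plan = assign(topics, budget_mb_per_sec=50.0)
for number, group in enumerate(plan, start=1):
    print(number, [t.name for t in group.topics], f"{group.load:.0f} MB/s")
```

A real plan would also weigh the desired latency SLA and the price of each workgroup size, which this sketch ignores.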

In addition to a multi-warehouse approach and workgroup sizing, we optimized such large-scale data volume ingestion with an average latency of 37 seconds by splitting ingestion jobs into two steps:

  • Streaming materialized views – Use JSON_PARSE to ingest data from MSK topics in Amazon Redshift
  • Flattening materialized views – Shred and perform transformations as a second step, reading data from the respective streaming materialized view

The following diagram depicts the high-level approach.

MSK to Redshift

Best practices

In this section, we share some of the best practices we observed while implementing this solution:

  • We performed an initial Redshift Serverless workgroup sizing based on three key factors:
    • Number of records per second per MSK topic
    • Average record size per MSK topic
    • Desired latency SLA
  • Additionally, we created only one streaming materialized view for a given MSK topic. Creation of multiple materialized views per MSK topic can slow down the ingestion performance because each materialized view becomes a consumer for that topic and shares the Amazon MSK bandwidth for that topic.
  • While defining the streaming materialized view, we avoided using JSON_EXTRACT_PATH_TEXT to pre-shred data, because json_extract_path_text operates on the data row by row, which significantly impacts ingestion throughput. Instead, we adopted JSON_PARSE with the CAN_JSON_PARSE function to ingest data from the stream at lowest latency and to guard against errors. The following is a sample SQL query we used for the MSK topics (the actual data source names have been masked due to security reasons):
CREATE MATERIALIZED VIEW <source-name>_streaming_mvw AUTO REFRESH YES AS
SELECT
    kafka_partition,
    kafka_offset,
    refresh_time,
    case when CAN_JSON_PARSE(kafka_value) = true then JSON_PARSE(kafka_value) end as Kafka_Data,
    case when CAN_JSON_PARSE(kafka_value) = false then kafka_value end as Invalid_Data
FROM
    external_<source-name>."<source-name>_mvw";
  • We kept the streaming materialized views simple and moved all transformations like unnesting, aggregation, and case expressions to a later step as flattening materialized views. The following is a sample SQL query we used to flatten data by reading the streaming materialized views created in the previous step (the actual data source and column names have been masked due to security reasons):
CREATE MATERIALIZED VIEW <source-name>_flatten_mvw AUTO REFRESH NO AS
SELECT
    kafka_data."<column1>" :: integer as "<column1>",
    kafka_data."<column2>" :: integer as "<column2>",
    kafka_data."<column3>" :: bigint as "<column3>",
    … 
    …
    …
    …
FROM
    <source-name>_streaming_mvw;
  • The streaming materialized views were set to auto refresh so that they can continuously ingest data into Amazon Redshift from MSK topics.
  • The flattening materialized views were set to manual refresh based on SLA requirements using Amazon Managed Workflows for Apache Airflow (Amazon MWAA).
  • We skipped defining any sort key in the streaming materialized views to further accelerate the ingestion speed.
  • Lastly, we used SYS_MV_REFRESH_HISTORY and SYS_STREAM_SCAN_STATES system views to monitor the streaming ingestion refreshes and latencies.
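With well over a hundred topics, the streaming and flattening statements shown earlier in this list are natural candidates for templating. The following sketch renders both from a source name. The function names, the example source name, and the column list are hypothetical, and the SQL simply mirrors the samples in this post rather than any EchoStar tooling.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    """Reject anything that is not a plain identifier before it reaches SQL."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def streaming_mv_sql(source: str) -> str:
    """Step 1: land raw Kafka records, keeping unparseable payloads in a side column."""
    s = _check(source)
    return "\n".join([
        f"CREATE MATERIALIZED VIEW {s}_streaming_mvw AUTO REFRESH YES AS",
        "SELECT",
        "    kafka_partition,",
        "    kafka_offset,",
        "    refresh_time,",
        "    case when CAN_JSON_PARSE(kafka_value) = true then JSON_PARSE(kafka_value) end as Kafka_Data,",
        "    case when CAN_JSON_PARSE(kafka_value) = false then kafka_value end as Invalid_Data",
        "FROM",
        f'    external_{s}."{s}_mvw";',
    ])


def flatten_mv_sql(source: str, columns: dict) -> str:
    """Step 2: shred and cast fields, reading from the streaming materialized view."""
    s = _check(source)
    select_list = ",\n".join(
        f'    kafka_data."{_check(col)}" :: {_check(sql_type)} as "{col}"'
        for col, sql_type in columns.items()
    )
    return "\n".join([
        f"CREATE MATERIALIZED VIEW {s}_flatten_mvw AUTO REFRESH NO AS",
        "SELECT",
        select_list,
        "FROM",
        f"    {s}_streaming_mvw;",
    ])


# Hypothetical source and columns, for illustration only.
print(streaming_mv_sql("demo_source"))
print(flatten_mv_sql("demo_source", {"cell_id": "integer", "rx_bytes": "bigint"}))
```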

For more information about best practices and monitoring techniques, refer to Best practices to implement near-real-time analytics using Amazon Redshift Streaming Ingestion with Amazon MSK.

Results

EchoStar saw improvements with this solution in both performance and scalability across their 5G Open RAN network.

Performance

By isolating and scaling Redshift Streaming Ingestion refreshes across multiple Redshift Serverless workgroups, EchoStar met their latency SLA requirements. We used the following SQL query to measure latencies:

WITH curr_qry as (
    SELECT
        mv_name,
        cast(partition_id as int) as partition_id,
        max(query_id) as current_query_id
    FROM
        sys_stream_scan_states
    GROUP BY
        mv_name,
        cast(partition_id as int)
)
SELECT
    strm.mv_name,
    tmp.partition_id,
    min(datediff(second, stream_record_time_max, record_time)) as min_latency_in_secs,
    max(datediff(second, stream_record_time_min, record_time)) as max_latency_in_secs
FROM
    sys_stream_scan_states strm,
    curr_qry tmp
WHERE
    strm.query_id = tmp.current_query_id
    and strm.mv_name = tmp.mv_name
    and strm.partition_id = tmp.partition_id
GROUP BY 1,2
ORDER BY 1,2;

When we further aggregate the preceding query to only the mv_name level (removing partition_id, which uniquely identifies a partition in an MSK topic), we find the average daily performance results we achieved on a Redshift Serverless workgroup size of 64 RPU as shown in the following chart. (The actual materialized view names have been hashed for security reasons because they map to an external vendor name and data source.)

S.No. stream_name_hash min_latency_secs max_latency_secs avg_records_per_day
1 e022b6d13d83faff02748d3762013c 1 6 186,395,805
2 a8cc0770bb055a87bbb3d37933fc01 1 6 186,720,769
3 19413c1fc8fd6f8e5f5ae009515ffb 2 4 5,858,356
4 732c2e0b3eb76c070415416c09ffe0 3 27 12,494,175
5 8b4e1ffad42bf77114ab86c2ea91d6 3 4 149,927,136
6 70e627d11eba592153d0f08708c0de 5 5 121,819
7 e15713d6b0abae2b8f6cd1d2663d94 5 31 148,768,006
8 234eb3af376b43a525b7c6bf6f8880 6 64 45,666
9 38e97a2f06bcc57595ab88eb8bec57 7 100 45,666
10 4c345f2f24a201779f43bd585e53ba 9 12 101,934,969
11 a3b4f6e7159d9b69fd4c4b8c5edd06 10 14 36,508,696
12 87190a106e0889a8c18d93a3faafeb 13 69 14,050,727
13 b1388bad6fc98c67748cc11ef2ad35 25 118 509
14 cf8642fccc7229106c451ea33dd64d 28 66 13,442,254
15 c3b2137c271d1ccac084c09531dfcd 29 74 12,515,495
16 68676fc1072f753136e6e992705a4d 29 69 59,565
17 0ab3087353bff28e952cd25f5720f4 37 71 12,775,822
18 e6b7f10ea43ae12724fec3e0e3205c 39 83 2,964,715
19 93e2d6e0063de948cc6ce2fb5578f2 45 45 1,969,271
20 88cba4fffafd085c12b5d0a01d0b84 46 47 12,513,768
21 d0408eae66121d10487e562bd481b9 48 57 12,525,221
22 de552412b4244386a23b4761f877ce 52 52 7,254,633
23 9480a1a4444250a0bc7a3ed67eebf3 58 96 12,522,882
24 db5bd3aa8e1e7519139d2dc09a89a7 60 103 12,518,688
25 e6541f290bd377087cdfdc2007a200 71 83 176,346,585
26 6f519c71c6a8a6311f2525f38c233d 78 115 100,073,438
27 3974238e6aff40f15c2e3b6224ef68 79 82 12,770,856
28 7f356f281fc481976b51af3d76c151 79 96 75,077
29 e2e8e02c7c0f68f8d44f650cd91be2 92 99 12,525,210
30 3555e0aa0630a128dede84e1f8420a 97 105 8,901,014
31 7f4727981a6ba1c808a31bd2789f3a 108 110 11,599,385

All 31 materialized views running and refreshing concurrently and continuously show a minimum latency of 1 second and a maximum latency of 118 seconds over the last 7 days, meeting EchoStar’s SLA requirements.
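The roll-up from partition level to materialized view level described earlier can be done in SQL, or client-side once the per-partition rows have been fetched. A minimal Python sketch of the client-side variant follows. The rows are made-up sample values, not EchoStar data, and the tuple layout (mv_name, partition_id, min latency, max latency) simply mirrors the output columns of the latency query.

```python
def rollup_by_view(rows):
    """Collapse per-partition latencies to one (min, max) pair per materialized view."""
    result = {}
    for mv_name, _partition_id, min_latency, max_latency in rows:
        current = result.get(mv_name)
        if current is None:
            result[mv_name] = (min_latency, max_latency)
        else:
            result[mv_name] = (
                min(current[0], min_latency),
                max(current[1], max_latency),
            )
    return result


# Hypothetical per-partition rows: (mv_name, partition_id, min_secs, max_secs)
sample_rows = [
    ("view_a", 0, 2, 5),
    ("view_a", 1, 1, 9),
    ("view_b", 0, 30, 41),
]
summary = rollup_by_view(sample_rows)
for name, (low, high) in sorted(summary.items()):
    print(name, low, high)
```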

Scalability

With this multi-warehouse architecture built on Redshift data sharing, EchoStar can now quickly scale their Redshift compute resources on demand to onboard the remaining 150 MSK topics. In addition, as their data sources and MSK topics increase further, they can quickly add additional Redshift Serverless workgroups (for example, another Redshift Serverless 128 RPU workgroup) to meet their desired SLA requirements.

Conclusion

By using the scalability of Amazon Redshift and a multi-warehouse architecture with data sharing, EchoStar delivers near real-time access to over 150 million rows of data across over 150 MSK topics, totaling 10 TB ingested daily, to their users.

This split multi-producer/consumer model of Amazon Redshift can bring benefits to many workloads that have similar performance characteristics as EchoStar’s warehouse. With this pattern, you can scale your workload to meet SLAs while optimizing for price and performance. Please reach out to your AWS Account Team to engage an AWS specialist for additional help or for a proof of concept.


About the authors

Balaram Mathukumilli is Director, Enterprise Data Services at DISH Wireless. He is deeply passionate about Data and Analytics solutions. With 20+ years of experience in Enterprise and Cloud transformation, he has worked across domains such as PayTV, Media Sales, Marketing and Wireless. Balaram works closely with the business partners to identify data needs, data sources, determine data governance, develop data infrastructure, build data analytics capabilities, and foster a data-driven culture to ensure their data assets are properly managed, used effectively, and are secure.

Viswanatha Vellaboyana, a Solutions Architect at DISH Wireless, is deeply passionate about Data and Analytics solutions. With 20 years of experience in enterprise and cloud transformation, he has worked across domains such as Media, Media Sales, Communication, and Health Insurance. He collaborates with enterprise clients, guiding them in architecting, building, and scaling applications to achieve their desired business outcomes.

Keerthi Kambam is a Senior Engineer at DISH Network specializing in AWS Services. She builds scalable data engineering and analytical solutions for DISH customer-facing applications. She is passionate about solving complex data challenges with cloud solutions.

Raks Khare is a Senior Analytics Specialist Solutions Architect at AWS based out of Pennsylvania. He helps customers across varying industries and regions architect data analytics solutions at scale on the AWS platform. Outside of work, he likes exploring new travel and food destinations and spending quality time with his family.

Adi Eswar has been a core member of the AI/ML and Analytics Specialist team, leading the customer experience for customers' existing workloads and key initiatives as part of the Analytics Customer Experience Program and Redshift enablement for AWS-TELCO customers. He spends his free time exploring new food, cultures, national parks and museums with his family.

Shirin Bhambhani is a Senior Solutions Architect at AWS. She works with customers to build solutions and accelerate their cloud migration journey. She enjoys simplifying customer experiences on AWS.

Vinayak Rao is a Senior Customer Solutions Manager at AWS. He collaborates with customers, partners, and internal AWS teams to drive customer success, delivery of technical solutions, and cloud adoption.

Amazon DataZone introduces OpenLineage-compatible data lineage visualization in preview

Post Syndicated from Leonardo Gomez original https://aws.amazon.com/blogs/big-data/amazon-datazone-introduces-openlineage-compatible-data-lineage-visualization-in-preview/

We are excited to announce the preview of API-driven, OpenLineage-compatible data lineage in Amazon DataZone to help you capture, store, and visualize lineage of data movement and transformations of data assets on Amazon DataZone.

With the Amazon DataZone OpenLineage-compatible API, domain administrators and data producers can capture and store lineage events beyond what is available in Amazon DataZone, including transformations in Amazon Simple Storage Service (Amazon S3), AWS Glue, and other AWS services. This provides a comprehensive view for data consumers browsing in Amazon DataZone, who can gain confidence in an asset’s origin, and for data producers, who can assess the impact of changes to an asset by understanding its usage.

In this post, we discuss the latest features of data lineage in Amazon DataZone, its compatibility with OpenLineage, and how to get started capturing lineage from other services such as AWS Glue, Amazon Redshift, and Amazon Managed Workflows for Apache Airflow (Amazon MWAA) into Amazon DataZone through the API.

Why it matters to have data lineage

Data lineage gives you an overarching view into data assets, allowing you to see the origin of objects and their chain of connections. Data lineage enables tracking the movement of data over time, providing a clear understanding of where the data originated, how it has changed, and its ultimate destination within the data pipeline. With transparency around data origination, data consumers gain trust that the data is correct for their use case. Data lineage information is captured at levels such as tables, columns, and jobs, allowing you to conduct impact analysis and respond to data issues because, for example, you can see how one field impacts downstream sources. This equips you to make well-informed decisions before committing changes and avoid unwanted changes downstream.

Data lineage in Amazon DataZone is an API-driven, OpenLineage-compatible feature that helps you capture and visualize lineage events from OpenLineage-enabled systems or through an API, to trace data origins, track transformations, and view cross-organizational data consumption. The lineage visualized includes activities inside the Amazon DataZone business data catalog. Lineage captures the cataloged assets and the subscribers to those assets, as well as activities outside the business data catalog that are captured programmatically using the API.

Additionally, Amazon DataZone versions lineage with each event, enabling you to visualize lineage at any point in time or compare transformations across an asset’s or job’s history. This historical lineage provides a deeper understanding of how data has evolved, which is essential for troubleshooting, auditing, and enforcing the integrity of data assets.

The following screenshot shows an example lineage graph visualized with the Amazon DataZone data catalog.

Introduction to OpenLineage compatible data lineage

The need to capture data lineage consistently across various analytical services and combine them into a unified object model is key in uncovering insights from the lineage artifact. OpenLineage is an open source project that offers a framework to collect and analyze lineage. It also offers a reference implementation of an object model to persist metadata, along with integrations to major data and analytics tools.

The following are key concepts in OpenLineage:

  • Lineage events – OpenLineage captures lineage information through a series of events. An event is anything that represents a specific operation performed on the data that occurs in a data pipeline, such as data ingestion, transformation, or data consumption.
  • Lineage entities – Entities in OpenLineage represent the various data objects involved in the lineage process, such as datasets and tables.
  • Lineage runs – A lineage run represents a specific run of a data pipeline or a job, encompassing multiple lineage events and entities.
  • Lineage form types – Form types, or facets, provide additional metadata or context about lineage entities or events, enabling richer and more descriptive lineage information. OpenLineage offers facets for runs, jobs, and datasets, with the option to build custom facets.
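To make these concepts concrete, the following sketch assembles a minimal OpenLineage run event as a Python dictionary: one event, tied to one run of one job, with an input and an output dataset as the entities. The namespace, job and dataset names, and producer URI are hypothetical, the spec version in schemaURL is an assumption, and facets are left out for brevity.

```python
import json
import uuid
from datetime import datetime, timezone


def run_event(job_name: str, input_table: str, output_table: str) -> dict:
    """Build a minimal OpenLineage-style run event (no facets)."""
    namespace = "example-namespace"  # hypothetical namespace
    return {
        "eventType": "COMPLETE",                       # lineage event
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},           # lineage run
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": input_table}],    # entities
        "outputs": [{"namespace": namespace, "name": output_table}],  # entities
        "producer": "https://example.com/my-lineage-producer",        # hypothetical
        # Spec version is an assumption; use the one your tooling targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


event = run_event("load_inventory", "raw.inventory", "curated.inventory")
print(json.dumps(event, indent=2))
```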

The Amazon DataZone data lineage API is OpenLineage compatible and extends OpenLineage’s functionality by providing a materialization endpoint to persist the lineage outputs in an extensible object model. OpenLineage offers integrations for certain sources, and integration of these sources with Amazon DataZone is straightforward because the Amazon DataZone data lineage API understands the format and translates to the lineage data model.

The following diagram illustrates an example of the Amazon DataZone lineage data model.

In Amazon DataZone, every lineage node represents an underlying resource—there is a 1:1 mapping of the lineage node with a logical or physical resource such as table, view, or asset. The nodes represent a specific job with a specific run, or a node for a table or asset, and one node for a subscription target.

Each version of a node captures what happened to the underlying resource at that specific timestamp. In Amazon DataZone, lineage not only shares the story of data movement outside it, but it also represents the lineage of activities inside Amazon DataZone, such as asset creation, curation, publishing, and subscription.
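One way to picture this versioning is as an append-only history per node that can be queried by timestamp. The sketch below is a conceptual model only, with hypothetical class and field names. It does not reflect how Amazon DataZone stores lineage internally.

```python
import bisect
from datetime import datetime


class NodeHistory:
    """Versions of one lineage node, kept sorted by timestamp."""

    def __init__(self):
        self._times = []
        self._states = []

    def record(self, at: datetime, state: dict) -> None:
        # Insert in timestamp order so late-arriving events land in the right place.
        index = bisect.bisect_right(self._times, at)
        self._times.insert(index, at)
        self._states.insert(index, state)

    def as_of(self, at: datetime):
        """Return the latest version at or before the given time, or None."""
        index = bisect.bisect_right(self._times, at)
        return self._states[index - 1] if index else None


# Hypothetical history of a single asset node.
history = NodeHistory()
history.record(datetime(2024, 7, 1), {"stage": "inventory"})
history.record(datetime(2024, 7, 5), {"stage": "published"})

print(history.as_of(datetime(2024, 7, 3)))
print(history.as_of(datetime(2024, 6, 1)))
```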

To hydrate the lineage model in Amazon DataZone, two types of lineage are captured:

  • Lineage activities inside Amazon DataZone – This includes assets added to the catalog and published, and then details about the subscriptions are captured automatically. When you’re in the producer project context (for example, if the project you’ve selected is the owning project of the asset you are browsing and you’re a member of that project), you will see two states of the dataset node:
    • The inventory asset type node defines the asset in the catalog that is in an unpublished stage. Other users can’t subscribe to the inventory asset. To learn more, refer to Creating inventory and published data in Amazon DataZone.
    • The published asset type represents the actual asset that is discoverable by data users across the organization. This is the asset type that can be subscribed by other project members. If you are a consumer and not part of the producing project of that asset, you will only see the published asset node.
  • Lineage activities outside of Amazon DataZone can be captured programmatically using the PostLineageEvent API. With these events captured either upstream or downstream of cataloged assets, data producers and consumers get a comprehensive view of data movement to check the origin of data or its consumption. We discuss how to use the API to capture lineage events later in this post.
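As a hedged illustration of the programmatic path, the sketch below serializes an OpenLineage-style event and hands it to a client's post_lineage_event call. The parameter names (domainIdentifier, event) follow the boto3 DataZone client as best understood here, so check the current SDK reference before relying on them. The domain ID, the event contents, and the DryRunClient stand-in are hypothetical, and the stand-in exists only so the example runs without AWS credentials.

```python
import json


def post_event(client, domain_id: str, event: dict):
    """Serialize the event to JSON bytes and post it to the given domain."""
    payload = json.dumps(event).encode("utf-8")
    return client.post_lineage_event(domainIdentifier=domain_id, event=payload)


class DryRunClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def post_lineage_event(self, **kwargs):
        self.calls.append(kwargs)
        return {}


# Minimal illustrative event; every value here is made up.
event = {
    "eventType": "COMPLETE",
    "eventTime": "2024-07-08T00:00:00+00:00",
    "run": {"runId": "00000000-0000-0000-0000-000000000001"},
    "job": {"namespace": "example-namespace", "name": "load_inventory"},
    "inputs": [],
    "outputs": [{"namespace": "example-namespace", "name": "curated.inventory"}],
    "producer": "https://example.com/my-lineage-producer",
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
}

client = DryRunClient()
post_event(client, "dzd_example", event)
print(client.calls[0]["domainIdentifier"], len(client.calls[0]["event"]), "bytes")
```

With real credentials, the stand-in would be replaced by boto3.client("datazone", region_name="us-east-1"), assuming boto3 is installed and the caller has permission to post lineage events to the domain.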

There are two different types of lineage nodes available in Amazon DataZone:

  • Dataset node – In Amazon DataZone, lineage visualizes nodes that represent tables and views. Depending on the context of the project, the producers will be able to view both the inventory and published asset, whereas consumers can only view the published asset. When you first open the lineage tab on the asset details page, the cataloged dataset node will be the starting point for lineage graph traversal upstream or downstream. Dataset nodes include lineage nodes automated from Amazon DataZone and custom lineage nodes:
    • Automated dataset nodes – These nodes include information about AWS Glue or Amazon Redshift assets published in the Amazon DataZone catalog. They’re automatically generated and include a corresponding AWS Glue or Amazon Redshift icon within the node.
    • Custom dataset nodes – These nodes include information about assets that are not published in the Amazon DataZone catalog. They’re created manually by domain administrators (producers) and are represented by a default custom asset icon within the node. These are essentially custom lineage nodes created using the OpenLineage event format.
  • Job (run) node – This node captures the details of the job, which represents the latest run of a particular job and its run details. This node also captures multiple runs of the job and can be viewed on the History tab of the node details. Node details are made visible when you choose the icon.

Visualizing lineage in Amazon DataZone

Amazon DataZone offers a comprehensive experience for data producers and consumers. The asset details page provides a graphical representation of lineage, making it straightforward to visualize data relationships upstream or downstream. The asset details page provides the following capabilities to navigate the graph:

  • Column-level lineage – You can expand column-level lineage when available in dataset nodes. This automatically shows relationships with upstream or downstream dataset nodes if source column information is available.
  • Column search – If the dataset has more than 10 columns, the node presents pagination to navigate to columns not initially presented. To quickly view a particular column, you can search on the dataset node that lists just the searched column.
  • View dataset nodes only – If you want to filter out the job nodes, you can choose the Open view control icon in the graph viewer and toggle the Display dataset nodes only option. This will remove all the job nodes from the graph and let you navigate just the dataset nodes.
  • Details pane – Each lineage node captures and displays the following details:
    • Every dataset node has three tabs: Lineage info, Schema, and History. The History tab lists the different versions of lineage event captured for that node.
    • The job node has a details pane to display job details with the tabs Job info and History. The details pane also captures queries or expressions run as part of the job.
  • Version tabs – All lineage nodes in Amazon DataZone data lineage will have versioning, captured as history, based on lineage events captured. You can view lineage at a selected timestamp that opens a new tab on the lineage page to help compare or contrast between the different timestamps.

The following screenshot shows an example of data lineage visualization.

You can experience the visualization with sample data by choosing Preview on the Lineage tab and choosing the Try sample lineage link. This opens a new browser tab with sample data to test and learn about the feature with or without a guided tour, as shown in the following screenshot.

Solution overview

Now that we understand the capabilities of the new data lineage feature in Amazon DataZone, let’s explore how you can get started in capturing lineage from AWS Glue tables and ETL (extract, transform, and load) jobs, Amazon Redshift, and Amazon MWAA.

The getting started scripts are also available in Amazon DataZone’s new GitHub repository.

Prerequisites

For this walkthrough, you should have the following prerequisites:

If the AWS account you use to follow this post uses AWS Lake Formation to manage permissions on the AWS Glue Data Catalog, make sure that you log in as a user with access to create databases and tables. For more information, refer to Implicit Lake Formation permissions.

Launch the CloudFormation stack

To create your resources for this use case using AWS CloudFormation, complete the following steps:

  1. Launch the CloudFormation stack in us-east-1:
  2. For Stack name, enter a name for your stack.
  3. Choose Next.
  4. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  5. Choose Create stack.

Wait for the stack formation to finish provisioning the resources. When you see the CREATE_COMPLETE status, you can proceed to the next steps.

Capture lineage from AWS Glue tables

For this example, we use CloudShell, which is a browser-based shell, to run the commands necessary to harvest lineage metadata from AWS Glue tables. Complete the following steps:

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Select the AWSomeRetailCrawler crawler created by the CloudFormation template.
  3. Choose Run.

When the crawler is complete, you’ll see a Succeeded status.

Now let’s harvest the lineage metadata using CloudShell.

  1. Download the extract_glue_crawler_lineage.py file.
  2. On the Amazon DataZone console, open CloudShell.
  3. On the Actions menu, choose Upload file.
  4. Upload the extract_glue_crawler_lineage.py file.
  5. Run the following commands:
    sudo yum -y install python3
    python3 -m venv env
    . env/bin/activate
    pip install boto3

You should get the following results.

  1. After all the libraries and dependencies are configured, run the following command to harvest the lineage metadata from the inventory table:
    python extract_glue_crawler_lineage.py -d awsome_retail_db -t inventory -r us-east-1 -i dzd_Your_domain

  2. The script asks for verification of the settings provided; enter Yes.

You should receive a notification indicating that the script ran successfully.

After you capture the lineage information from the Inventory table, complete the following steps to run the data source.

  1. On the Amazon DataZone data portal, open the Sales project.
  2. On the Data tab, choose Data sources in the navigation pane.
  3. Select your data source job and choose Run.

For this example, we had already created a data source job called SalesDLDataSourceV2 that points to the awsome_retail_db database. To learn more about how to create data source jobs, refer to Create and run an Amazon DataZone data source for the AWS Glue Data Catalog.

After the job runs successfully, you should see a confirmation message.

Now let’s view the lineage diagram generated by Amazon DataZone.

  1. On the Data inventory tab, choose the Inventory table.
  2. On the Inventory asset page, choose the new Lineage tab.

On the Lineage tab, you can see that Amazon DataZone created three nodes:

  • Job / Job run – This is based on the AWS Glue crawler used to harvest the asset technical metadata
  • Dataset – This is based on the S3 object that contains the data related to this asset
  • Table – This is the AWS Glue table created by the crawler

If you choose the Dataset node, Amazon DataZone offers information about the S3 object used to create the asset.

Capture data lineage for AWS Glue ETL jobs

In the previous section, we covered how to generate a data lineage diagram on top of a data asset. Now let’s see how we can create one for an AWS Glue job.

The CloudFormation template that we launched earlier created an AWS Glue job called Inventory_Insights. This job gets data from the Inventory table and creates a new table called Inventory_Insights with the aggregated data of the total products available in all the stores.

The CloudFormation template also copied the openlineage-spark_2.12-1.9.1.jar file to the S3 bucket created for this post. This file is necessary to generate lineage metadata from the AWS Glue job. We use version 1.9.1, which is compatible with AWS Glue 3.0, the version used to create the AWS Glue job for this post. If you’re using a different version of AWS Glue, you need to download the corresponding OpenLineage Spark plugin file that matches your AWS Glue version.

The OpenLineage Spark plugin is not able to extract data lineage from AWS Glue Spark jobs that use AWS Glue DynamicFrames. Use Spark SQL DataFrames instead.

  1. Download the extract_glue_spark_lineage.py file.
  2. On the Amazon DataZone console, open CloudShell.
  3. On the Actions menu, choose Upload file.
  4. Upload the extract_glue_spark_lineage.py file.
  5. On the CloudShell console, run the following command (if your CloudShell session expired, you can open a new session):
    python extract_glue_spark_lineage.py --region "us-east-1" --domain-identifier 'dzd_Your Domain'

  6. Confirm the information shown by the script by entering yes.

You will see the following message; this means that the script is ready to capture the lineage metadata as soon as you run the AWS Glue job.

Now let’s run the AWS Glue job created by the CloudFormation template.

  1. On the AWS Glue console, choose ETL jobs in the navigation pane.
  2. Select the Inventory_Insights job and choose Run job.

On the Job details tab, you will notice that the job has the following configuration:

  • Key --conf with value spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener --conf spark.openlineage.transport.type=console --conf spark.openlineage.facets.custom_environment_variables=[AWS_DEFAULT_REGION;GLUE_VERSION;GLUE_COMMAND_CRITERIA;GLUE_PYTHON_VERSION;]
  • Key --user-jars-first with value true
  • Dependent JARs path set as the S3 path s3://{your bucket}/lib/openlineage-spark_2.12-1.9.1.jar
  • The AWS Glue version set as 3.0
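
If you create the job programmatically rather than through the console, the configuration listed above could be expressed as job parameters. The following Python sketch only builds the parameter dictionary; the function name is hypothetical, the listener is registered through the standard Spark property spark.extraListeners, and mapping the Dependent JARs path to the --extra-jars parameter is an assumption about how the console field is stored.

```python
def openlineage_glue_arguments(bucket, jar="openlineage-spark_2.12-1.9.1.jar"):
    """Build AWS Glue job parameters that attach the OpenLineage Spark listener.

    `bucket` is the S3 bucket that holds the plugin JAR under lib/.
    """
    env_vars = "AWS_DEFAULT_REGION;GLUE_VERSION;GLUE_COMMAND_CRITERIA;GLUE_PYTHON_VERSION;"
    # A Glue job has a single --conf key, so further Spark properties are
    # chained inside its value with additional "--conf" tokens.
    conf = " --conf ".join([
        "spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener",
        "spark.openlineage.transport.type=console",
        f"spark.openlineage.facets.custom_environment_variables=[{env_vars}]",
    ])
    return {
        "--conf": conf,
        "--user-jars-first": "true",
        # Assumption: "Dependent JARs path" corresponds to --extra-jars.
        "--extra-jars": f"s3://{bucket}/lib/{jar}",
    }
```

The resulting dictionary could then be passed as DefaultArguments when creating or updating the job with the AWS SDK, together with GlueVersion set to 3.0.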

During the run of the job, you will see the following output on the CloudShell console.

This means that the script has successfully harvested the lineage metadata from the AWS Glue job.

Now let’s create an AWS Glue table based on the data created by the AWS Glue job. For this example, we use an AWS Glue crawler.

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Select the AWSomeRetailCrawler crawler created by the CloudFormation template and choose Run.

When the crawler is complete, you will see the following message.

Now let’s open the Amazon DataZone portal to see how the diagram is represented in Amazon DataZone.

  1. On the Amazon DataZone portal, choose the Sales project.
  2. On the Data tab, choose Inventory data in the navigation pane.
  3. Choose the inventory insights asset.

On the Lineage tab, you can see the diagram created by Amazon DataZone. It shows three nodes:

  • The AWS Glue crawler used to create the AWS Glue table
  • The AWS Glue table created by the crawler
  • The Amazon DataZone cataloged asset

  4. To see the lineage information about the AWS Glue job that you ran to create the inventory_insights table, choose the arrows icon on the left side of the diagram.

Now you can see the full lineage diagram for the Inventory_insights table.

  1. Choose the blue arrow icon in the inventory node to the left of the diagram.

You can see the evolution of the columns and the transformations that they had.

When you choose any of the nodes that are part of the diagram, you can see more details. For example, the inventory_insights node shows the following information.

Capture lineage from Amazon Redshift

Let’s explore how to generate a lineage diagram from Amazon Redshift. In this example, we use AWS Cloud9 because it allows us to configure the connection to the virtual private cloud (VPC) where our Redshift cluster resides. For more information about AWS Cloud9, refer to the AWS Cloud9 User Guide.

The CloudFormation template included as part of this post doesn’t cover the creation of a Redshift cluster or the creation of the tables used in this section. To learn more about how to create a Redshift cluster, see Step 1: Create a sample Amazon Redshift cluster. We use the following query to create the tables needed for this section of the post:

CREATE SCHEMA market;

CREATE TABLE market.retail_sales (
  id BIGINT PRIMARY KEY,
  name CHARACTER VARYING NOT NULL
);

CREATE TABLE market.online_sales (
  id BIGINT PRIMARY KEY,
  name CHARACTER VARYING NOT NULL
);

/* Important to insert some data in the table */
INSERT INTO market.retail_sales
VALUES (123, 'item1');

INSERT INTO market.online_sales
VALUES (234, 'item2');

CREATE TABLE market.sales AS
SELECT id, name FROM market.retail_sales
UNION ALL
SELECT id, name FROM market.online_sales;

Remember to add the IP address of your AWS Cloud9 environment to the security group with access to the Redshift cluster.

  1. Download the requirements.txt and extract_redshift_lineage.py files.
  2. On the File menu, choose Upload Local Files.
  3. Upload the requirements.txt and extract_redshift_lineage.py files.
  4. Run the following commands:
    # Install Python 
    sudo yum -y install python3
    
    # dependency set up 
    python3 -m venv env 
    . env/bin/activate
    
    pip install -r requirements.txt

You should be able to see the following messages.

  1. To set the AWS credentials, run the following command:
    export AWS_ACCESS_KEY_ID=<<Your Access Key>>
    export AWS_SECRET_ACCESS_KEY=<<Your Secret Access Key>>
    export AWS_SESSION_TOKEN=<<Your Session Token>>

  2. Run the extract_redshift_lineage.py script to harvest the metadata necessary to generate the lineage diagram:
    python extract_redshift_lineage.py \
     -r region \
     -i dzd_your_dz_domain_id \
     -n your-redshift-cluster-endpoint \
     -t your-rs-port \
     -d your-database \
     -s the-starting-date

  3. Next, you will be prompted to enter the user name and password for the connection to your Amazon Redshift database.
  4. When you receive a confirmation message, enter yes.

If the configuration was done correctly, you will see the following confirmation message.

Now let’s see how the diagram was created in Amazon DataZone.

  1. On the Amazon DataZone data portal, open the Sales project.
  2. On the Data tab, choose Data sources.
  3. Run the data source job.

For this post, we already created a data source job called Sales_DW_Enviroment-default-datasource to add the Redshift data source to our Amazon DataZone project. To learn how to create a data source job, refer to Create and run an Amazon DataZone data source for Amazon Redshift.

After you run the job, you’ll see the following confirmation message.

  1. On the Data tab, choose Inventory data in the navigation pane.
  2. Choose the total_sales asset.
  3. Choose the Lineage tab.

Amazon DataZone creates a three-node lineage diagram for the total sales table; you can choose any node to view its details.

  1. Choose the arrows icon next to the Job / Job run node to view a more complete lineage diagram.
  2. Choose the Job / Job run node.

The Job Info section shows the query that was used to create the total sales table.

Capture lineage from Amazon MWAA

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. Amazon MWAA is a managed service for Airflow that lets you use your current Airflow platform to orchestrate your workflows. OpenLineage supports integration with Airflow 2.6.3 using the openlineage-airflow package, and the same can be enabled on Amazon MWAA as a plugin. Once enabled, the plugin converts Airflow metadata to OpenLineage events, which are consumable by DataZone.PostLineageEvent.

The following diagram shows the setup required in Amazon MWAA to capture data lineage using OpenLineage and publish it to Amazon DataZone.

The workflow uses an Amazon MWAA DAG to invoke a data pipeline. The process is as follows:

  1. The openlineage-airflow plugin is configured on Amazon MWAA as a lineage backend. Metadata about the DAG run is passed to the plugin, which converts it into OpenLineage format.
  2. The lineage information collected is written to the Amazon CloudWatch log group of the Amazon MWAA environment.
  3. A helper function captures the lineage information from the log file and publishes it to Amazon DataZone using the PostLineageEvent API.
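
To make the last step more concrete, the following Python sketch shows what publishing one OpenLineage run event could look like. It is not the helper function used in this post: the function names, the producer URI, and the OpenLineage schema version in schemaURL are illustrative assumptions. The only AWS call is post_lineage_event on a boto3 DataZone client, which is passed in so the sketch can be exercised without AWS credentials.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_namespace, job_name, inputs, outputs, event_type="COMPLETE"):
    """Return a minimal OpenLineage run event as a dict.

    `inputs` and `outputs` are lists of (namespace, name) pairs.
    """
    def dataset(namespace, name):
        return {"namespace": namespace, "name": name}

    return {
        "eventType": event_type,
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [dataset(ns, n) for ns, n in inputs],
        "outputs": [dataset(ns, n) for ns, n in outputs],
        # Hypothetical producer URI; identifies the code that emitted the event.
        "producer": "https://example.com/lineage-helper",
        # Assumed spec version; use the one that matches your OpenLineage client.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def publish(datazone_client, domain_id, event):
    """Send one event to Amazon DataZone through the PostLineageEvent API."""
    return datazone_client.post_lineage_event(
        domainIdentifier=domain_id,
        event=json.dumps(event).encode("utf-8"),
    )

# Usage (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("datazone", region_name="us-east-1")
#   publish(client, "dzd_your_domain_id", build_run_event(...))
```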

The example used in the post uses Amazon MWAA version 2.6.3 and OpenLineage plugin version 1.4.1. For other Airflow versions supported by OpenLineage, refer to Supported Airflow versions.

Configure the OpenLineage plugin on Amazon MWAA to capture lineage

When harvesting lineage using OpenLineage, you need to set up a Transport configuration, which tells OpenLineage where to emit events, for example to the console or to an HTTP endpoint. You can use ConsoleTransport, which logs the OpenLineage events in the Amazon MWAA task CloudWatch log group. A helper function can then publish them to Amazon DataZone.

Specify the following in the requirements.txt file added to the S3 bucket configured for Amazon MWAA:

openlineage-airflow==1.4.1

In the Airflow logging configuration section under the MWAA configuration for the Airflow environment, enable Airflow task logs with log level INFO. The following screenshot shows a sample configuration.

A successful configuration will add a plugin to Airflow, which can be verified from the Airflow UI by choosing Plugins on the Admin menu.

In this post, we use a sample DAG to hydrate data to Redshift tables. The following screenshot shows the DAG in graph view.

Run the DAG and upon successful completion of a run, open the Amazon MWAA task CloudWatch log group for your Airflow environment (airflow-env_name-task) and filter based on the expression console.py to select events emitted by OpenLineage. The following screenshot shows the results.

Publish lineage to Amazon DataZone

Now that you have the lineage events emitted to CloudWatch, the next step is to publish them to Amazon DataZone to associate them to a data asset and visualize them on the business data catalog.

  1. Download the files requirements.txt and extract_airflow_lineage.py, and gather environment details such as the AWS Region, Amazon MWAA environment name, and Amazon DataZone domain ID.
  2. The Amazon MWAA environment name can be obtained from the Amazon MWAA console.
  3. The Amazon DataZone domain ID can be obtained from Amazon DataZone service console or from the Amazon DataZone portal.
  4. Navigate to CloudShell and choose Upload files on the Actions menu to upload the files requirements.txt and extract_airflow_lineage.py.

  5. After the files are uploaded, run the following script to filter lineage events from the Airflow task logs and publish them to Amazon DataZone:
    # Set up virtual env and install dependencies
    python -m venv env
    . env/bin/activate
    pip install -r requirements.txt
    
    # run the script
    python extract_airflow_lineage.py \
      --region us-east-1 \
      --domain-identifier your_domain_identifier \
      --airflow-environment-name your_airflow_environment_name

The extract_airflow_lineage.py script filters the lineage events from the Amazon MWAA task log group and publishes the lineage to the specified domain within Amazon DataZone.
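
The filtering step can be pictured with a short Python sketch. This is not the code of the script itself. It assumes that each OpenLineage event appears as a JSON object at the end of a log message that contains the console.py marker, which matches the filter expression used earlier, and the exact log line layout in the usage example is an assumption.

```python
import json


def extract_lineage_events(log_messages, marker="console.py"):
    """Return the OpenLineage events found in Airflow task log messages."""
    decoder = json.JSONDecoder()
    events = []
    for message in log_messages:
        marker_at = message.find(marker)
        if marker_at == -1:
            continue  # not emitted by the OpenLineage console transport
        # The JSON payload starts at the first "{" after the marker.
        start = message.find("{", marker_at)
        if start == -1:
            continue
        try:
            payload, _ = decoder.raw_decode(message, start)
        except ValueError:
            continue  # the text after the marker was not valid JSON
        if isinstance(payload, dict) and "eventType" in payload:
            events.append(payload)
    return events
```

In a real run, the messages would come from the airflow-env_name-task log group in CloudWatch Logs, and each returned event would be serialized and sent to Amazon DataZone.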

Visualize lineage on Amazon DataZone

After the lineage is published to DataZone, open your DataZone project, navigate to the Data tab, and choose a data asset that was accessed by the Amazon MWAA DAG. In this case, it is a subscribed asset.

Navigate to the Lineage tab to visualize the lineage published to Amazon DataZone.

Choose a node to look at additional lineage metadata. In the following screenshot, we can observe the producer of the lineage has been marked as airflow.

Conclusion

In this post, we introduced the preview of data lineage in Amazon DataZone, explained how it works, and showed how you can capture lineage events from AWS Glue, Amazon Redshift, and Amazon MWAA to be visualized as part of the asset browsing experience.

To learn more about Amazon DataZone and how to get started, refer to the Getting started guide. Check out the YouTube playlist for some of the latest demos of Amazon DataZone and short descriptions of the capabilities available.


About the Authors

Leonardo Gomez is a Principal Analytics Specialist at AWS, with over a decade of experience in data management. Specializing in data governance, he assists customers worldwide in maximizing their data’s potential while promoting data democratization. Connect with him on LinkedIn.

Priya Tiruthani is a Senior Technical Product Manager with Amazon DataZone at AWS. She focuses on improving data discovery and curation required for data analytics. She is passionate about building innovative products to simplify customers’ end-to-end data journey, especially around data governance and analytics. Outside of work, she enjoys being outdoors to hike, capture nature’s beauty, and recently play pickleball.

Ron Kyker is a Principal Engineer with Amazon DataZone at AWS, where he helps drive innovation, solve complex problems, and set the bar for engineering excellence for his team. Outside of work, he enjoys board gaming with friends and family, movies, and wine tasting.

Srinivasan Kuppusamy is a Senior Cloud Architect – Data at AWS ProServe, where he helps customers solve their business problems using the power of AWS Cloud technology. His areas of interests are data and analytics, data governance, and AI/ML.

Amazon Managed Service for Apache Flink now supports Apache Flink version 1.19

Post Syndicated from Francisco Morillo original https://aws.amazon.com/blogs/big-data/amazon-managed-service-for-apache-flink-now-supports-apache-flink-version-1-19/

Apache Flink is an open source distributed processing engine, offering powerful programming interfaces for both stream and batch processing, with first-class support for stateful processing and event time semantics. Apache Flink supports multiple programming languages (Java, Python, Scala, and SQL) and multiple APIs with different levels of abstraction, which can be used interchangeably in the same application.

Amazon Managed Service for Apache Flink offers a fully managed, serverless experience in running Apache Flink applications and now supports Apache Flink 1.19.1, the latest stable version of Apache Flink at the time of writing. AWS led the community release of the version 1.19.1, which introduces a number of bug fixes over version 1.19.0, released in March 2024.

In this post, we discuss some of the interesting new features and configuration changes available for Managed Service for Apache Flink introduced with this new release. In every Apache Flink release, there are exciting new experimental features. However, in this post, we are going to focus on the features most accessible to the user with this release.

Connectors

With the release of version 1.19.1, the Apache Flink community also released new connector versions for the 1.19 runtime. Starting from 1.16, Apache Flink introduced a new connector version numbering, following the pattern <connector-version>-<flink-version>. It’s recommended to use connectors for the runtime version you are using. Refer to Using Apache Flink connectors to stay updated on any future changes regarding connector versions and compatibility.
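
As a small illustration of the numbering pattern, the following Python sketch splits a connector version string into its two parts and checks it against a runtime version. The function names and the version strings in the usage note are illustrative assumptions, not an official compatibility check.

```python
def split_connector_version(version):
    """Split '<connector-version>-<flink-version>' into its two parts."""
    connector, separator, flink = version.rpartition("-")
    if not separator or not connector or not flink:
        raise ValueError(f"not in <connector-version>-<flink-version> form: {version!r}")
    return connector, flink


def matches_runtime(version, runtime):
    """True if the connector was built for the given Flink runtime (for example 1.19.1)."""
    _, flink = split_connector_version(version)
    return runtime == flink or runtime.startswith(flink + ".")
```

For example, split_connector_version("4.3.0-1.19") yields the connector version 4.3.0 and the Flink version 1.19, so that connector would match a 1.19.1 runtime.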

SQL

Apache Flink 1.19 brings new features and improvements, particularly in the SQL API. These enhancements are designed to provide more flexibility, better performance, and ease of use for developers working with Flink’s SQL API. In this section, we delve into some of the most notable SQL enhancements introduced in this release.

State TTL per operator

Configuring state TTL at the operator level was introduced in Apache Flink 1.18 but wasn’t easily accessible to the end-user. To modify an operator TTL, you had to export the plan at development time, modify it manually, and force Apache Flink to use the edited plan instead of generating a new one when the application starts. The new features added to Flink SQL in 1.19 simplify this process by allowing TTL configurations directly through SQL hints, eliminating the need for JSON plan manipulation.

The following code shows examples of how to use SQL hints to set state TTL:

-- State TTL for Joins
SELECT /*+ STATE_TTL('Orders' = '1d', 'Customers' = '20d') */ 
  *
FROM Orders 
LEFT OUTER JOIN Customers 
  ON Orders.o_custkey = Customers.c_custkey;

-- State TTL for Aggregations
SELECT /*+ STATE_TTL('o' = '1d') */ 
  o_orderkey, SUM(o_totalprice) AS revenue 
FROM Orders AS o 
GROUP BY o_orderkey;

Session window table-valued functions

Windows are at the heart of processing infinite streams in Apache Flink, splitting the stream into finite buckets for computations. Before 1.19, Apache Flink provided the following types of window table-value functions (TVFs):

  • Tumble windows – Fixed-size, non-overlapping windows
  • Hop windows – Fixed-size, overlapping windows with a specified hop interval
  • Cumulate windows – Increasingly larger windows that start at the same point but grow over time

Apache Flink 1.19 enhances its SQL capabilities by supporting session window TVFs in streaming mode, allowing more sophisticated and flexible windowing operations directly within SQL queries. Applications can now create dynamic windows that group elements based on session gaps. The following code shows an example:

-- Session window with partition keys
SELECT 
  * 
FROM TABLE(
  SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES));

-- Apply aggregation on the session windowed table with partition keys
SELECT 
  window_start, window_end, item, SUM(price) AS total_price
FROM TABLE(
  SESSION(TABLE Bid PARTITION BY item, DESCRIPTOR(bidtime), INTERVAL '5' MINUTES))
GROUP BY item, window_start, window_end;
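
To show what the aggregation above computes, here is a plain Python sketch of session windowing, independent of Flink. It groups bids per item, closes a session when the gap between consecutive bids is longer than the configured interval, and sums the prices per session. The function name is hypothetical, and the treatment of a gap exactly equal to the interval is an assumption of the sketch.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def session_windows(events, gap):
    """Group (key, timestamp, price) events into per-key session windows.

    Returns a list of (key, window_start, window_end, total_price), where
    window_end is the last event time in the session plus the gap.
    """
    by_key = defaultdict(list)
    for key, ts, price in events:
        by_key[key].append((ts, price))

    results = []
    for key in sorted(by_key):
        rows = sorted(by_key[key])
        start = last = rows[0][0]
        total = rows[0][1]
        for ts, price in rows[1:]:
            if ts - last > gap:
                # Inactivity longer than the gap closes the current session.
                results.append((key, start, last + gap, total))
                start, total = ts, 0
            total += price
            last = ts
        results.append((key, start, last + gap, total))
    return results
```

With a 5-minute gap, two bids on the same item at 12:00 and 12:02 fall into one session that ends at 12:07, while a third bid at 12:10 starts a new session.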

Mini-batch optimization for regular joins

When using the Table API or SQL, regular joins—standard equi-joins like a table SQL join, where time is not a factor—may induce a considerable overhead for the state backend, especially when using RocksDB.

Normally, Apache Flink processes standard joins one record at a time, looking up the state for a matching record in the other side of the join, updating the state with the input record, and emitting the resulting record. This may add considerable pressure on RocksDB, with multiple reads and writes for each record.

Apache Flink 1.19 introduces the ability to use mini-batch processing with equi-joins (FLIP-415). When enabled, Apache Flink will process regular joins not one record at a time, but in small batches, substantially reducing the pressure on the RocksDB state backend. Mini-batching adds some latency, which is controllable by the user. See, for example, the following SQL code (embedded in Java):

TableConfig tableConfig = tableEnv.getConfig();
tableConfig.set("table.exec.mini-batch.enabled", "true");
tableConfig.set("table.exec.mini-batch.allow-latency", "5s");
tableConfig.set("table.exec.mini-batch.size", "5000");

tableEnv.executeSql("CREATE TEMPORARY VIEW ab AS " +
  "SELECT a.id as a_id, a.a_content, b.id as b_id, b.b_content " +
  "FROM a LEFT JOIN b ON a.id = b.id");

With this configuration, Apache Flink will buffer up to 5,000 records or up to 5 seconds, whichever comes first, before processing the join for the entire mini-batch.
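
The "whichever comes first" rule can be sketched in plain Python. This is a conceptual model of the trigger only, not how Apache Flink implements mini-batching internally; the class name and the injectable clock are assumptions made so the behavior is easy to test.

```python
import time


class MiniBatchBuffer:
    """Collects records and hands them to `process` in small batches."""

    def __init__(self, max_size, max_latency_s, process, clock=time.monotonic):
        self._max_size = max_size
        self._max_latency_s = max_latency_s
        self._process = process
        self._clock = clock
        self._records = []
        self._first_at = None

    def add(self, record):
        if not self._records:
            # The latency bound is measured from the first buffered record.
            self._first_at = self._clock()
        self._records.append(record)
        self._flush_if_due()

    def tick(self):
        """Call periodically so the latency bound holds without new input."""
        self._flush_if_due()

    def _flush_if_due(self):
        if not self._records:
            return
        full = len(self._records) >= self._max_size
        stale = self._clock() - self._first_at >= self._max_latency_s
        if full or stale:
            batch, self._records = self._records, []
            self._process(batch)
```

A buffer created with max_size=5000 and max_latency_s=5 mirrors the settings above: a join operator would receive at most 5,000 records per call, and no record would wait longer than about 5 seconds.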

In Apache Flink 1.19, mini-batching only works for regular joins, not windowed or temporal joins. Mini-batching is disabled by default, and you have to explicitly enable it and set the batch size and latency for Flink to use it. Also, mini-batch settings are global and apply to all regular joins in your application. At the time of writing, it’s not possible to set mini-batching per join statement.

AsyncScalarFunction

Before version 1.19, an important limitation of SQL and the Table API, compared to the Java DataStream API, was the lack of asynchronous I/O support. Any request to an external system, for example a database, a REST API, or an AWS API call using the AWS SDK, was synchronous and blocking. An Apache Flink subtask waited for the response before completing the processing of a record and proceeding to the next one. In practice, the round-trip latency of each request was added to the processing latency of each record. Apache Flink’s Async I/O API removes this limitation, but it’s only available for the DataStream API in Java. Until version 1.19, there was no simple, efficient workaround in SQL, the Table API, or Python.

Apache Flink 1.19 introduces the new AsyncScalarFunction, a user-defined function (UDF) that can be implemented using non-blocking calls to the external system, to support use cases similar to asynchronous I/O in SQL and the Table API.

This new type of UDF is only available in streaming mode. At the moment, it only supports ordered output. DataStream Async I/O also supports unordered output, which may further reduce latency when strict ordering isn’t required.
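
The difference between blocking calls and ordered asynchronous calls can be illustrated with Python's asyncio. This sketch does not use any Flink API; the lookup function is a hypothetical stand-in for a non-blocking call to an external system. Requests are started together, so their latencies overlap, but results are collected in input order, which corresponds to ordered output.

```python
import asyncio


async def lookup(key, delay_s):
    """Stand-in for a non-blocking request to an external system."""
    await asyncio.sleep(delay_s)
    return f"{key}-enriched"


async def enrich_ordered(requests):
    """Start all requests concurrently and return results in input order."""
    tasks = [asyncio.create_task(lookup(key, delay)) for key, delay in requests]
    # Awaiting the tasks in submission order preserves record order, even
    # when a later request finishes before an earlier one.
    return [await task for task in tasks]


def run_demo():
    return asyncio.run(enrich_ordered([("a", 0.03), ("b", 0.01), ("c", 0.02)]))
```

Here the total wait is close to the slowest single request rather than the sum of all three, while the output order still matches the input order.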

Python 3.11 support

Python 3.11 is now supported, and Python 3.7 support has been completely removed (FLINK-33029). Managed Service for Apache Flink currently uses the Python 3.11 runtime to run PyFlink applications. Python 3.11 is a bugfix only version of the runtime. Python 3.11 introduced several performance improvements and bug fixes, but no API breaking changes.

Performance improvements: Dynamic checkpoint interval

Apache Flink 1.19 significantly improves checkpoint behavior. An application can now adjust its checkpointing interval dynamically, based on whether the source is processing backlog data (FLIP-309).

In Apache Flink 1.19, you can now specify different checkpointing intervals based on whether a source operator is processing backlog data. This flexibility optimizes job performance by reducing checkpoint frequency during backlog phases, enhancing overall throughput. Extending checkpoint intervals allows Apache Flink to prioritize processing throughput over frequent state snapshots, thereby improving efficiency and performance.

To enable it, you need to define the execution.checkpointing.interval parameter for regular intervals and execution.checkpointing.interval-during-backlog to specify a longer interval when sources report processing backlog.

For example, if you want to run checkpoints every 60 seconds during normal processing, but extend to 10 minutes during the processing of backlogs, you can set the following:

  • execution.checkpointing.interval = 60s
  • execution.checkpointing.interval-during-backlog = 10m

In Amazon Managed Service for Apache Flink, the default checkpointing interval is configured by the application configuration (60 seconds by default). You don’t need to set the configuration parameter. To set a longer checkpointing interval during backlog processing, you can raise a support case to modify execution.checkpointing.interval-during-backlog. See Modifiable Flink configuration properties for further details about modifying Apache Flink configurations.

At the time of writing, dynamic checkpointing intervals are only supported by Apache Kafka source and FileSystem source connectors. If you use any other source connector, intervals during backlog are ignored, and Apache Flink runs a checkpoint at the default interval during backlog processing.
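
The selection rule described in this section can be summarized in a few lines of Python. This is a simplified model rather than Flink code: the function and parameter names are hypothetical, and intervals are expressed in seconds.

```python
def effective_checkpoint_interval(interval_s, backlog_interval_s,
                                  processing_backlog, source_supports_backlog=True):
    """Pick the checkpoint interval that applies at a given moment.

    interval_s          -> execution.checkpointing.interval
    backlog_interval_s  -> execution.checkpointing.interval-during-backlog,
                           or None when it is not configured
    """
    use_backlog_interval = (
        processing_backlog
        and source_supports_backlog      # the source must report backlog status
        and backlog_interval_s is not None
    )
    return backlog_interval_s if use_backlog_interval else interval_s
```

With the example values above (60 seconds and 10 minutes), the function returns 600 while a supported source reports backlog and 60 in every other case.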

In Apache Flink, checkpoints are always injected in the flow from the sources. This feature only involves source connectors. The sink connectors you use in your application don’t affect this feature. For a deep dive into the Apache Flink checkpoint mechanism, see Optimize checkpointing in your Amazon Managed Service for Apache Flink applications with buffer debloating and unaligned checkpoints.

More troubleshooting information: Job initialization and checkpoint traces

With FLIP-384, Apache Flink 1.19 introduces trace reporters, which show checkpointing and job initialization traces. As of 1.19, this trace information can be sent to the logs using Slf4j. In Managed Service for Apache Flink, this is now enabled by default. You can find checkpoint and job initialization details in Amazon CloudWatch Logs, with the other logs from the application.

Checkpoint traces contain valuable information about each checkpoint. You can find similar information on the Apache Flink Dashboard, but only for the latest checkpoints and only while the application is running. Conversely, in the logs, you can find the full history of checkpoints. The following is an example of a checkpoint trace:

SimpleSpan{
  scope=org.apache.flink.runtime.checkpoint.CheckpointStatsTracker, 
  name=Checkpoint, 
  startTsMillis=1718779769305, 
  endTsMillis=1718779769542, 
  attributes={
    jobId=1b418a2404cbcf47ef89071f83f2dff9, 
    checkpointId=9774, 
    checkpointStatus=COMPLETED, 
    fullSize=9585, 
    checkpointedSize=9585
  }
}

Job initialization traces are generated when the job starts and recovers the state from a checkpoint or savepoint. You can find valuable statistics you can’t normally find elsewhere, including the Apache Flink Dashboard. The following is an example of a job initialization trace:

SimpleSpan{
  scope=org.apache.flink.runtime.checkpoint.CheckpointStatsTracker,
  name=JobInitialization,
  startTsMillis=1718781201463,
  endTsMillis=1718781409657,
  attributes={
    maxReadOutputDataDurationMs=89,
    initializationStatus=COMPLETED,
    fullSize=26167879378,
    sumMailboxStartDurationMs=621,
    sumGateRestoreDurationMs=29,
    sumDownloadStateDurationMs=199482,
    sumRestoredStateSizeBytes.LOCAL_MEMORY=46764,
    checkpointId=270,
    sumRestoredStateSizeBytes.REMOTE=26167832614,
    maxDownloadStateDurationMs=199482,
    sumReadOutputDataDurationMs=90,
    maxRestoredStateSizeBytes.REMOTE=26167832614,
    maxInitializeStateDurationMs=201122,
    sumInitializeStateDurationMs=201241,
    jobId=8edb291c9f1c91c088db51b48de42308,
    maxGateRestoreDurationMs=22,
    maxMailboxStartDurationMs=391,
    maxRestoredStateSizeBytes.LOCAL_MEMORY=46764
  }
}

Checkpoint and job initialization traces are logged at INFO level. You can find them in CloudWatch Logs only if you configure a logging level of INFO or DEBUG in your Managed Service for Apache Flink application.
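
Because the traces are plain text in the logs, a small parser can turn them into numbers you can chart or alert on. The following Python sketch assumes the SimpleSpan layout shown in the examples above, with key=value pairs separated by commas; the function names are hypothetical, and real log lines may carry extra prefixes that need to be stripped first.

```python
import re

# key=value pairs; keys may contain dots (for example
# sumRestoredStateSizeBytes.REMOTE), values end at a comma, space, or brace.
_FIELD = re.compile(r"([\w.]+)=([^,\s{}]+)")


def parse_span(text):
    """Parse a SimpleSpan trace into a flat dict, converting integers."""
    return {
        key: int(value) if value.isdigit() else value
        for key, value in _FIELD.findall(text)
    }


def span_duration_ms(fields):
    """Duration of the traced operation in milliseconds."""
    return fields["endTsMillis"] - fields["startTsMillis"]
```

Applied to the two examples above, the checkpoint took 237 ms, and the job initialization took 208,194 ms, most of which was spent downloading state (maxDownloadStateDurationMs=199482).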

Managed Service for Apache Flink behavior change

As a fully managed service, Managed Service for Apache Flink controls some runtime configuration parameters to guarantee the stability of your application. For details about the Apache Flink settings that can be modified, see Apache Flink settings.

With the 1.19 runtime, if you programmatically modify a configuration parameter that is directly controlled by Managed Service for Apache Flink, you receive an explicit ProgramInvocationException when the application starts, explaining what parameter is causing the problem and preventing the application from starting. With runtime 1.18 or earlier, changes to parameters controlled by the managed service were silently ignored.

To learn more about how Managed Service for Apache Flink handles configuration changes in runtime 1.19 or later, refer to FlinkRuntimeException: “Not allowed configuration change(s) were detected”.

Conclusion

In this post, we explored some of the new relevant features and configuration changes introduced with Apache Flink 1.19, now supported by Managed Service for Apache Flink. This latest version brings numerous enhancements aimed at improving performance, flexibility, and usability for developers working with Apache Flink.

With the support of Apache Flink 1.19, Managed Service for Apache Flink now supports the latest released Apache Flink version. We have seen some of the interesting new features available for Flink SQL and PyFlink.

You can find more details about recent releases from the Apache Flink blog and release notes:

If you’re new to Apache Flink, we recommend our guide to choosing the right API and language and following the getting started guide to start using Managed Service for Apache Flink.

If you’re already running an application in Managed Service for Apache Flink, you can safely upgrade it in-place to the new 1.19 runtime.


About the Authors

Francisco Morillo is a Streaming Solutions Architect at AWS, specializing in real-time analytics architectures. With over five years in the streaming data space, Francisco has worked as a data analyst for startups and as a big data engineer for consultancies, building streaming data pipelines. He has deep expertise in Amazon Managed Streaming for Apache Kafka (Amazon MSK) and Amazon Managed Service for Apache Flink. Francisco collaborates closely with AWS customers to build scalable streaming data solutions and advanced streaming data lakes, ensuring seamless data processing and real-time insights.

Lorenzo Nicora works as Senior Streaming Solution Architect at AWS, helping customers across EMEA. He has been building cloud-centered, data-intensive systems for over 25 years, working in the finance industry both through consultancies and for FinTech product companies. He has leveraged open-source technologies extensively and contributed to several projects, including Apache Flink.

[$] Giving bootloaders the boot with nmbl

Post Syndicated from jzb original https://lwn.net/Articles/979789/

At DevConf.cz 2024, Marta Lewandowska gave a talk to discuss a new approach for booting Linux systems, “No more boot loader: Please use the kernel instead”. The talk, available on YouTube, introduced a new project called nmbl (for “no more bootloader”, pronounced “nimble”). The idea is to replace bootloaders (e.g., GNU GRUB) with a Unified Kernel Image (UKI) that removes the need for a separate bootloader altogether. It is early days for nmbl; currently the project is only being tested for use with virtual machines, but the idea is compelling. If successful, nmbl could offer security, performance, and maintenance benefits compared to GRUB and other separate bootloaders.

On the CSRB’s Non-Investigation of the SolarWinds Attack

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/07/on-the-csrbs-non-investigation-of-the-solarwinds-attack.html

ProPublica has a long investigative article on how the Cyber Safety Review Board failed to investigate the SolarWinds attack, and specifically Microsoft’s culpability, even though they were directed by President Biden to do so.

Implementing multi-Region failover for Amazon API Gateway

Post Syndicated from Chris Munns original https://aws.amazon.com/blogs/compute/implementing-multi-region-failover-for-amazon-api-gateway/

This post is written by Marcos Ortiz, Principal AWS Solutions Architect and Khubyar Behramsha, Sr. AWS Solutions Architect.

In this post, you learn how organizations can evolve from a single-Region API Gateway architecture to a multi-Region one, using a reliable failover mechanism without dependencies on AWS control plane operations. An AWS Well-Architected best practice is to rely on the data plane and not the control plane during recovery. Failover controls should work with no dependencies on the primary Region. This pattern shows how to independently fail over discrete services deployed behind a shared public API. Additionally, there is a walkthrough on how to deploy and test the proposed architecture, using our open-source code available on GitHub.

For many organizations, running services behind a Regional Amazon API Gateway endpoint, aligned to AWS Well-Architected best practices, offers the right balance of resilience, simplicity, and affordability. However, depending on business criticality, regulatory requirements, or disaster recovery objectives, some organizations must deploy their APIs using a multi-Region architecture.

When dealing with business-critical applications, organizations often want full control over how and when to trigger a failover. A manually triggered failover allows for dependencies to be failed over in a specific order. Failover actions follow the chain of approvals needed, which helps prevent failing over to an unprepared replica or other flapping issues caused by intermittent disruptions. While the failover action or trigger has a human-in-the-loop component, the recommendation is for all subsequent actions to be automated as much as possible. This approach gives application owners control over the failover process, including the ability to trigger the failover in cases of intermittent issues.

Overview

One common approach for customers is to deploy a public Regional API with a custom domain name, providing more intuitive URLs for their users. The backend uses API mappings to connect multiple API stages to a custom domain. This approach allows service owners to deploy their services independently while sharing the same top-level API domain name. Here is a typical architecture that follows this pattern:

Regional endpoint with mapping


However, when trying to evolve this to a multi-Region architecture, organizations often struggle to fail over each service independently. If the preceding architecture is deployed in two Regions as-is, it becomes an all-or-nothing scenario, where organizations must either fail over all the services behind API Gateway or none.

Evolving to a multi-Region architecture

To enable each team to manage and fail over their services independently, you can implement this new approach for a multi-Region architecture. Each service has its own subdomain, using API Gateway HTTP integrations to route the request to a given service. This gives the service APIs the flexibility to be failed over independently, or all at once, with the shared public API.

Multi-Region architecture


This is the request flow:

  1. Users access a specific service through the public shared API domain name using a URL suffix. For instance, to access service1, the end user would send a request to http://example.com/service1.
  2. Amazon Route 53 has the top-level domain, example.com, registered with a primary and a secondary failover record. It routes the request to the API Gateway external API endpoint in the primary Region (us-east-1).
  3. API Gateway uses an HTTP integration to forward the request to service1 at https://service1.example.com.
  4. Amazon Route 53 has the domain service1.example.com registered with a primary and a secondary failover record. It routes the request to the API Gateway service1 API Regional endpoint in the primary Region (us-east-1) when healthy and routes to the service1 API Regional endpoint in the secondary Region (us-west-2) when unhealthy.
  5. Represents the primary route for service1 configured in Amazon Route 53.
  6. Represents the secondary route for service1 configured in Amazon Route 53.

This solution requires deploying each service API in both the primary (us-east-1) and secondary (us-west-2) Regions. Both Regions use the same custom domain configuration. For the primary Region, primary DNS records for each service point to the Regional API Gateway distribution endpoint. In the secondary Region, secondary DNS records for each service point to the Regional API Gateway distribution endpoint in the secondary Region.

Route 53 records
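As a rough illustration of the record layout described above, the following Python sketch builds the kind of change batch that the Route 53 ChangeResourceRecordSets API accepts for a primary and secondary failover alias pair. It only constructs the payload and does not call AWS. The endpoint DNS names, hosted zone IDs, and health check IDs are placeholders, and tying each record to a health check is an assumption about how the sample stacks wire up the routing controls, not something confirmed by the post.

```python
import json


def failover_record(name, role, alias_dns_name, alias_zone_id, health_check_id):
    """Build one UPSERT change for a failover alias record.

    role must be "PRIMARY" or "SECONDARY".
    """
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    return {
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": name,
            "Type": "A",
            # SetIdentifier distinguishes the two records that share a name.
            "SetIdentifier": f"{name}-{role.lower()}",
            "Failover": role,
            "HealthCheckId": health_check_id,
            "AliasTarget": {
                "HostedZoneId": alias_zone_id,
                "DNSName": alias_dns_name,
                "EvaluateTargetHealth": False,
            },
        },
    }


def failover_change_batch(name, primary, secondary):
    """Pair a primary and a secondary record for the same service domain."""
    return {
        "Changes": [
            failover_record(name, "PRIMARY", **primary),
            failover_record(name, "SECONDARY", **secondary),
        ]
    }


# All values below are hypothetical placeholders.
batch = failover_change_batch(
    "service1.example.com",
    primary={
        "alias_dns_name": "d-primary.execute-api.us-east-1.amazonaws.com",
        "alias_zone_id": "ZPRIMARYPLACEHOLDER",
        "health_check_id": "hc-primary-placeholder",
    },
    secondary={
        "alias_dns_name": "d-secondary.execute-api.us-west-2.amazonaws.com",
        "alias_zone_id": "ZSECONDARYPLACEHOLDER",
        "health_check_id": "hc-secondary-placeholder",
    },
)
print(json.dumps(batch, indent=2))
```

Creating these records is itself a control plane operation, so it belongs to deployment time. The failover path later only flips health check state and never edits the records.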


Active-passive manual failover

The example provided here enables a reliable failover mechanism that does not rely on the Amazon Route 53 control plane. It uses Amazon Route 53 Application Recovery Controller (Route 53 ARC), which provides a cluster with five Regional endpoints across five different AWS Regions. The failover process uses these endpoints, instead of manually editing Amazon Route 53 DNS records, which is a control plane operation. The routing controls in Route 53 ARC fail over traffic from the primary Region to the secondary one.

Route 53 ARC routing controls
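Because the cluster exposes five Regional endpoints, a client can try them in turn until one accepts the request. The sketch below shows that retry pattern in Python under simplifying assumptions: `send` is a hypothetical caller-supplied function that performs the actual update against one endpoint and raises on failure, and the endpoint names are placeholders rather than real cluster URLs.

```python
import random


def update_with_failover(endpoints, send, entries):
    """Try each cluster endpoint in random order until one accepts the update.

    send(endpoint, entries) is a hypothetical callable that raises on failure.
    Returns (endpoint_used, response).
    """
    errors = []
    # Shuffle a copy so no single endpoint is always hit first.
    for endpoint in random.sample(endpoints, k=len(endpoints)):
        try:
            return endpoint, send(endpoint, entries)
        except Exception as exc:
            errors.append((endpoint, exc))
    raise RuntimeError(f"all {len(endpoints)} cluster endpoints failed: {errors}")


def demo_send(endpoint, entries):
    """Stand-in sender: pretend only one endpoint is reachable."""
    if endpoint != "endpoint-3":
        raise ConnectionError(f"{endpoint} unreachable")
    return {"accepted": len(entries)}


demo_endpoints = [f"endpoint-{i}" for i in range(1, 6)]  # placeholders
used, response = update_with_failover(demo_endpoints, demo_send, entries=[{}, {}])
print(used, response)
```

Injecting `send` keeps the retry logic independent of any SDK, so the same loop could wrap an SDK call or an AWS CLI invocation.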


Routing controls are on-off switches that enable you to redirect client traffic from one instance of your workload to another. Traffic re-routing is the result of setting associated DNS health checks as healthy or unhealthy.

Route 53 ARC toggles


Deploying the sample application

Pre-requisites

  1. A public domain (example.com) registered with Amazon Route 53. Follow the instructions here on how to register a domain and the instructions here to configure Amazon Route 53 as your DNS service.
  2. An AWS Certificate Manager certificate (*.example.com) for your domain name in both the primary and secondary Regions where you plan to deploy the sample APIs.

Deploy the Amazon Route 53 ARC stack

Deploy the Amazon Route 53 ARC stack first, which creates a cluster and the routing controls that enable you to fail over the APIs.

Follow the detailed instructions here to deploy the Amazon Route 53 Application Recovery Controller (ARC) stack.

Deploy the Service1 API both in the primary and secondary Regions

This deploys an API Gateway Regional endpoint in each Region, which calls an AWS Lambda function to return the service name and the current AWS Region serving the request:

{"service": "service1", "region": "us-east-1"}

This is the code for the Lambda function:

import json
import os

def lambda_handler(event, context):
    # Return the service name and the Region that served the request.
    return {
        "statusCode": 200,
        "body": json.dumps({
            "service": "service1",
            "region": os.environ['AWS_REGION'],
        }),
    }

Follow the detailed instructions here to deploy the service1 stack.

Deploy the Service2 API both in the primary and secondary Regions

This stack is similar to service1, but has a different domain name and returns service2 as the service name:

{"service": "service2", "region": "us-east-1"}

Follow the detailed instructions here to deploy the service2 stack.

Deploy the shared public API both in the primary and secondary Regions

This step configures HTTP endpoints so that when you call example.com/service1 or example.com/service2, it routes the request to the respective public DNS records you have set up for service1 and service2.

Follow the detailed instructions here to deploy the external API stack.

Failover tests

To test the deployed example, modify then run the provided test script:

  1. Update lines 3–5 in the test.sh file to reference the domain name you configured for your APIs.
  2. Provide execute permissions and run the script:
chmod +x ./test.sh
./test.sh

This script sends an HTTP request to each one of your three endpoints every 5 seconds. You can then use Amazon Route 53 ARC to fail over your services independently and see the responses served from different Regions.
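For readers who prefer Python, the polling behavior described above can be approximated as follows. This is a hypothetical stand-in for illustration, not the test.sh script from the repository. The URLs are placeholders, and the `fetch` and `sleep` parameters exist only so the loop can be exercised without network access.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=5):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(urls, rounds, interval=5.0, fetch=fetch_json, sleep=time.sleep):
    """Request every URL once per round and report which Region answered."""
    results = []
    for i in range(rounds):
        for url in urls:
            try:
                body = fetch(url)
                line = f"{url} -> {body.get('service')} served from {body.get('region')}"
            except Exception as exc:
                # Keep polling even if one endpoint is failing over.
                line = f"{url} -> error: {exc}"
            print(line)
            results.append(line)
        if i < rounds - 1:
            sleep(interval)
    return results


# Example call (placeholder URLs, replace with your own domain):
# poll(["https://example.com/service1", "https://example.com/service2"], rounds=10)
```

Errors are reported rather than raised so that a brief failure during a Region switch shows up in the output instead of stopping the loop.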

Initially, all services are routing traffic to the us-east-1 Region:

Initial routing


With the following command, you update two routing controls for service1, setting the primary Region (us-east-1) health check state to off, and the secondary Region (us-west-2) health check state to on:

aws route53-recovery-cluster update-routing-control-states \
 --update-routing-control-state-entries \
 '[{"RoutingControlArn":"arn:aws:route53-recovery-control::111122223333:controlpanel/0123456bbbbbbb0123456bbbbbb0123456/routingcontrol/abcdefg1234567","RoutingControlState":"On"},
{"RoutingControlArn":"arn:aws:route53-recovery-control::111122223333:controlpanel/0123456bbbbbbb0123456bbbbbb0123456/routingcontrol/hijklmnop987654321","RoutingControlState":"Off"}]' \
 --region ap-southeast-2 \
 --endpoint-url https://abcd1234.route53-recovery-cluster.ap-southeast-2.amazonaws.com/v1

After a few seconds, the script terminal shows that service1 is now routing traffic to us-west-2, while the other services are still routing traffic to the us-east-1 Region.

Flipping service1 to backup Region
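The JSON passed to --update-routing-control-state-entries in the failover command always flips two controls in opposite directions, so it can be generated rather than hand-edited. Here is a minimal Python sketch, assuming you know which routing control ARN belongs to each Region. The ARNs below are shortened placeholders, not real identifiers.

```python
import json


def failover_entries(primary_arn, secondary_arn, to_secondary=True):
    """Build routing control state entries for a failover or a failback.

    to_secondary=True  -> primary Off, secondary On (fail over)
    to_secondary=False -> primary On, secondary Off (fail back)
    """
    return [
        {
            "RoutingControlArn": primary_arn,
            "RoutingControlState": "Off" if to_secondary else "On",
        },
        {
            "RoutingControlArn": secondary_arn,
            "RoutingControlState": "On" if to_secondary else "Off",
        },
    ]


# Placeholder ARNs for illustration only.
PRIMARY = "arn:aws:route53-recovery-control::111122223333:controlpanel/EXAMPLE/routingcontrol/PRIMARY"
SECONDARY = "arn:aws:route53-recovery-control::111122223333:controlpanel/EXAMPLE/routingcontrol/SECONDARY"

# The printed JSON can be pasted as the value of
# --update-routing-control-state-entries in the CLI command.
print(json.dumps(failover_entries(PRIMARY, SECONDARY, to_secondary=True)))
```

Sending both entries in a single call keeps the two switches moving together, which helps avoid a window in which both Regions are marked off.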


To fail back service1 to the us-east-1 Region, run this command, now setting the service1 primary Region (us-east-1) health check state to on, and the secondary Region (us-west-2) health check state to off:

aws route53-recovery-cluster update-routing-control-states \
 --update-routing-control-state-entries \
 '[{"RoutingControlArn":"arn:aws:route53-recovery-control::111122223333:controlpanel/0123456bbbbbbb0123456bbbbbb0123456/routingcontrol/abcdefg1234567","RoutingControlState":"Off"},
{"RoutingControlArn":"arn:aws:route53-recovery-control::111122223333:controlpanel/0123456bbbbbbb0123456bbbbbb0123456/routingcontrol/hijklmnop987654321","RoutingControlState":"On"}]' \
 --region ap-southeast-2 \
 --endpoint-url https://abcd1234.route53-recovery-cluster.ap-southeast-2.amazonaws.com/v1

After a few seconds, the script terminal shows that service1 is now routing traffic to the us-east-1 Region again, like the other services.

Routing recovery


Cleaning up

After you are finished, follow the cleanup instructions on GitHub.

Conclusion

This solution helps put the control back in the hands of the teams managing critical workloads using API Gateway. By decoupling the frontend and backend, this solution gives organizations granular control over failover at the service level using Amazon Route 53 ARC to remove dependencies on control plane actions.

The pattern outlined also reduces the impact to consumers of the service as it allows you to use the same public API and top-level domain when moving from a single-Region to a multi-Region architecture.

For more resilience learning, visit AWS Architecture Blog – Resilience.

For more serverless learning, visit Serverless Land.

AWS Weekly Roundup: Amazon S3 Access Grants, AWS Lambda, European Sovereign Cloud Region, and more (July 8, 2024).

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-amazon-s3-access-grants-aws-lambda-european-sovereign-cloud-region-and-more-july-8-2024/

I counted only 21 AWS news items since last Monday, most of them being Regional expansions of existing services and capabilities. I hope you enjoyed a relatively quiet week, because this one will be busier.

This week, we’re welcoming our customers and partners at the Jacob Javits Convention Center for the AWS Summit New York on Wednesday, July 10. I can tell you there is a stream of announcements coming, if I judge by the number of AWS News Blog posts ready to be published.

I am writing these lines just before packing my bag to attend the AWS Community Day in Douala, Cameroon next Saturday. I can’t wait to meet our customers and partners, students, and the whole AWS community there.

But for now, let’s look at last week’s new announcements.

Last week’s launches
Here are the launches that got my attention.

Amazon Simple Storage Service (Amazon S3) Access Grants now integrate with Amazon SageMaker and open source Python frameworks – Amazon S3 Access Grants maps identities in directories such as Active Directory, or AWS Identity and Access Management (IAM) principals, to datasets in S3. The integration with Amazon SageMaker Studio helps you map identities to your machine learning (ML) datasets in S3. The integration with the AWS SDK for Python (Boto3) plugin replaces any custom code required to manage data permissions, so you can use S3 Access Grants in open source Python frameworks such as Django, TensorFlow, NumPy, Pandas, and more.

AWS Lambda introduces new controls to make it easier to search, filter, and aggregate Lambda function logs – You can now capture your Lambda logs in JSON structured format without bringing your own logging libraries. You can also control the log level (for example, ERROR, DEBUG, or INFO) of your Lambda logs without making any code changes. Lastly, you can choose the Amazon CloudWatch log group to which Lambda sends your logs.

Amazon DataZone introduces fine-grained access control – Amazon DataZone has introduced fine-grained access control, providing data owners granular control over their data at row and column levels. You use Amazon DataZone to catalog, discover, analyze, share, and govern data at scale across organizational boundaries with governance and access controls. Data owners can now restrict access to specific records of data instead of granting access to an entire dataset.

AWS Direct Connect offers native 400 Gbps dedicated connections at select locations – AWS Direct Connect provides private, high-bandwidth connectivity between AWS and your data center, office, or colocation facility. Native 400 Gbps connections provide higher bandwidth without the operational overhead of managing multiple 100 Gbps connections in a link aggregation group. The increased capacity delivered by 400 Gbps connections is particularly beneficial to applications that transfer large-scale datasets, such as for ML and large language model (LLM) training or advanced driver assistance systems for autonomous vehicles.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items that you might find interesting:

The list of services available at launch in the upcoming AWS European Sovereign Cloud Region is available – We shared the list of AWS services that will be initially available in the new AWS European Sovereign Cloud Region. The list has no surprises. Services for security, networking, storage, computing, containers, artificial intelligence (AI), and serverless will be available at launch. We are building the AWS European Sovereign Cloud to offer public sector organizations and customers in highly regulated industries further choice to help them meet their unique digital sovereignty requirements, as well as stringent data residency, operational autonomy, and resiliency requirements. This is an investment of 7.8 billion euros (approximately $8.46 billion). The new Region will be available by the end of 2025.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Summits – Join free online and in-person events that bring the cloud computing community together to connect, collaborate, and learn about AWS. To learn more about future AWS Summit events, visit the AWS Summit page. Register in your nearest city: New York (July 10), Bogotá (July 18), and Taipei (July 23–24).

AWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are in Cameroon (July 13), Aotearoa (August 15), and Nigeria (August 24).

Browse all upcoming AWS-led in-person and virtual events and developer-focused events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— seb

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

French elections: political cyber attacks and Internet traffic shifts

Post Syndicated from João Tomé original https://blog.cloudflare.com/2024-french-elections-political-cyber-attacks-and-internet-traffic-shifts


The 2024 French legislative election runoff on July 7 yielded surprising results compared to the first round on June 30, with the New Popular Front (NPF) gaining the most seats, followed by French President Macron’s Ensemble party, and the National Rally. Coalition negotiations will follow. In this post, we examine the ongoing online attacks against French political parties and how initial election predictions at 20:00 local time led to a noticeable drop in France’s Internet traffic.

This blog post is part of a series tracking the numerous elections of 2024. We have covered elections in South Africa, India, Iceland, Mexico, the European Union, the UK and also the 2024 US presidential debate. We also continuously update our election report on Cloudflare Radar.

Let’s start with the attacks, and then move on to the Internet traffic trends.

Political parties under attack

As we highlighted last week, the first round of the French elections saw specific DDoS (Distributed Denial of Service) attacks targeting French political party websites. While online attacks are common and not always election-related, recent activities in France, the Netherlands, and the UK confirm that DDoS attacks frequently target political parties during election periods.

Two French political parties were attacked shortly before the first round of elections, and a third party was targeted on June 30. This third party, indicated in green on the chart below, faced attacks on the evening of June 29. Several attempts were thwarted by Cloudflare throughout election day, from 10:00 to 23:00 UTC (12:00 to 01:00 local time). The most intense attack occurred at 19:00 UTC (21:00 local time), reaching nearly 40,000 requests per second, with a total of 620 million DDoS requests recorded on that day (June 29).

Our data indicates that the most significant attack Cloudflare intercepted targeted a party shown in yellow on the chart above. The party had already been attacked on June 23, 2024, and this subsequent attack happened on July 3 at 21:36 UTC (23:36 local time), lasting four minutes and peaking at 151,000 requests per second (rps), making it the second-largest attack we’ve observed on political parties recently. This was comparable in intensity and duration to another attack on a UK political party right after their election.

On the runoff election day, July 7, the party represented by the blue line was again a target, having been attacked previously on June 24, 27, and 29. The most severe of these occurred on June 27, with attacks reaching 118,000 rps during a day that totaled 610 million daily DDoS requests. On July 7, the attacks resumed, with the first starting at 09:55 UTC (11:55 local time) and continuing sporadically until 23:18 UTC (01:18 local time on July 8). The peak of these attacks came at 11:40 UTC (13:40 local time), reaching 96,000 rps.

While these rates may seem small to Cloudflare, they can be devastating for websites not well-protected against such high levels of traffic. DDoS attacks not only overwhelm systems but also serve, if successful, as a distraction for IT teams while attackers attempt other types of breaches.

Exit polls came with a 20:00 Internet traffic dip

Each election brings its own unique circumstances. For instance, the UK’s snap election took place on Thursday, July 4, 2024, aligning with Britain’s tradition of weekday elections. In contrast, France and many other countries hold elections on weekends, typically Sundays.

During the first round of the French elections on June 30, morning traffic was lower than the previous week and rose in the afternoon. The runoff, a week later, displayed a different pattern. Morning traffic remained stable compared to June 30, but it saw a significant decrease in the afternoon, especially after 17:30 local time. Polling stations in major cities closed at 20:00. At this time, TV media began broadcasting the first results, causing a 16% drop in traffic compared to the previous week. This trend, where traffic dips as initial results are announced, is also seen in other elections, like the UK’s.

Traffic shifts during voting day, compared to the previous week, are more revealing when viewed in detail. The map and table below summarize the traffic changes observed at the state level within France, when voting closed and initial results predictions were revealed on TV at around 20:00 local time. This was the moment when, from Cloudflare’s data perspective, attention was diverted from online use.

(Source: Cloudflare; created with Datawrapper)

The table below shows the drops in traffic on July 7, at 20:00 local time, compared to the previous week.

State                         Change in traffic
Bourgogne-Franche-Comté       -19%
Grand Est                     -19%
Brittany                      -15%
Auvergne-Rhône-Alpes          -15%
Corsica                       -14%
Occitanie                     -11%
Nouvelle-Aquitaine            -11%
Normandy                      -10%
Île-de-France                 -10%
Hauts-de-France               -9%
Pays de la Loire              -8%
Provence-Alpes-Côte d’Azur    -7%
Centre-Val de Loire           -6%

On election day in France, Internet traffic decreased most significantly in the regions of Bourgogne-Franche-Comté and Grand Est, both in the eastern part of the country and both experiencing a 19% drop. When comparing these regions to the Île-de-France region, where Paris is located, we see a smaller traffic decrease, at 10%. In the south, in regions like Provence-Alpes-Côte d’Azur, the drop was even less pronounced, at 7%.

Mobile device usage

Also notable was the increase in mobile device request traffic share during both election days, driving the share to levels higher than usual. Over the past month, mobile device traffic share on Sundays typically ranged from 53% to 54%. However, it rose to 57% on the first election day, June 30, and increased further to 58% on the runoff day, July 7, 2024. Mobile device traffic share was especially elevated from 11:00 to 22:00 local time on these days.

DNS trends: news outlets bring results

Switching focus to domain trends, our 1.1.1.1 resolver DNS data reveals a targeted impact from the French elections, allowing for a comparison between the two election days. Analyzing French news media outlets, DNS traffic in France was significantly higher on the first election day, June 30, with a 250% increase at 20:00 local time compared to the previous week. This was 6% higher than on the runoff day, July 7.

For French TV domains, the situation reversed during the runoff on July 7, showing 31% more DNS traffic at 20:00 local time than in the first round. On June 30, DNS traffic at that time was already 274% higher than the previous week, but the increase on July 7 was even more significant, at 391% compared to June 23, 2024—the Sunday before the two election days.

For microblogging social media in France, traffic was higher during the two election days, peaking in the first round. When the polls closed at 20:00 local time on June 30, traffic surged 38% compared to June 23, 2024. On July 7, runoff day, traffic increased by 32% at 20:00 local time compared to June 23, but was 4% lower than on June 30.
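The comparisons between the two election days follow from the week-over-week figures, because both increases are measured against the same June 23 baseline. The short Python sketch below reproduces that arithmetic. The function name is ours, and the inputs are the rounded percentages quoted in the text.

```python
def relative_change(increase_a_pct, increase_b_pct):
    """Percent change from level A to level B, where both levels are
    expressed as percentage increases over the same baseline."""
    level_a = 1 + increase_a_pct / 100
    level_b = 1 + increase_b_pct / 100
    return (level_b / level_a - 1) * 100


# TV domains: +274% on June 30 versus +391% on July 7.
tv = relative_change(274, 391)
# Microblogging: +38% on June 30 versus +32% on July 7.
micro = relative_change(38, 32)

print(round(tv))     # 31, matching "31% more DNS traffic"
print(round(micro))  # -4, matching "4% lower than on June 30"
```

Subtracting the percentages directly (391 minus 274, or 32 minus 38) would give the wrong answer, since each figure is relative to the baseline rather than to the other day.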

Conclusion: keeping track of elections

In France, more attention was diverted from the Internet during the decisive runoff election day than in the first round, with a noticeable dip in traffic when TV stations announced predicted results at 20:00 local time.

If you want to follow more trends and insights about the Internet and elections in particular, you can check Cloudflare Radar, and more specifically our new 2024 Elections Insights report, which will be updated as elections take place throughout the year.

Since last week, we’ve updated our trends to include last-minute voting during the elections in Iran on June 28, 2024, the suspension of mobile Internet in Mauritania following protests after the presidential elections on June 29, 2024, and the UK election.

Security updates for Monday

Post Syndicated from jake original https://lwn.net/Articles/981119/

Security updates have been issued by AlmaLinux (openssh), Debian (krb5), Fedora (yt-dlp), Gentoo (firefox, KDE Plasma Workspaces, Stellarium, thunderbird, and X.Org X11 library), Mageia (python-js2py and znc), Oracle (389-ds, c-ares, container-tools, cups, go-toolset, httpd:2.4/httpd, iperf3, kernel, less, libreoffice, libuv, nghttp2, openldap, openssh, python-idna, python-jinja2, python-pillow, python3, python3.11-PyMySQL, and xmlrpc-c), Red Hat (kernel, kernel-rt, openssh, and virt:rhel and virt-devel:rhel modules), and SUSE (go1.21, go1.22, krb5, kubevirt, virt-api-container, virt-controller-container, virt-exportproxy-container, virt-exportserver-container, virt-handler-container, virt-launcher-container, virt-libguestfs-t, netty3, opera, and python-urllib3).

Centrally manage VPC network ACL rules to block unwanted traffic using AWS Firewall Manager

Post Syndicated from Bryan Van Hook original https://aws.amazon.com/blogs/security/centrally-manage-vpc-network-acl-rules-to-block-unwanted-traffic-using-aws-firewall-manager/

Amazon Virtual Private Cloud (Amazon VPC) provides two options for controlling network traffic: network access control lists (ACLs) and security groups. A network ACL defines inbound and outbound rules that allow or deny traffic based on protocol, IP address range, and port range. Security groups determine which inbound and outbound traffic is allowed on a network interface, but they cannot explicitly deny traffic like a network ACL can. Every VPC subnet is associated with a network ACL that ultimately determines which traffic can enter or leave the subnet, even if a security group allows it. Network ACLs provide a layer of network controls that augment your security groups.

There are situations when you might want to deny specific sources or destinations within the range of network traffic allowed by security groups. For example, you want to deny inbound traffic from malicious sources on the internet, or you want to deny outbound traffic to ports or protocols used by exploits or malware. Security group rules can only control what traffic is allowed. If you want to deny specific traffic within the range of allowed traffic from security groups, you need to use network ACL rules. If you want to deny specific types of traffic in many VPCs, you need to update each network ACL associated with subnets in each of those VPCs. We heard from customers that implementing a baseline of common network ACL rules can be challenging to manage across many Amazon Web Services (AWS) accounts, so we expanded AWS Firewall Manager capabilities to make this easier.

AWS Firewall Manager network ACL security policies allow you to centrally manage network ACL rules for VPC subnets across AWS accounts in your organization. The following sections demonstrate how you can use network ACL policies to manage common network ACL rules that deny inbound and outbound traffic.

Deny inbound traffic using a network ACL security policy

If you have not already set up a Firewall Manager administrator account, see Firewall Manager prerequisites. Note that network ACL policies require your AWS Config configuration recorder to include the AWS::EC2::NetworkAcl and AWS::EC2::Subnet resource types.

Let’s review an example of how you can now use Firewall Manager to centrally manage a network ACL rule that denies inbound traffic from a public source IP range.

To deny inbound traffic:

  1. Sign in to your Firewall Manager delegated administrator account, open the AWS Management Console, and go to Firewall Manager.
  2. In the navigation pane, under AWS Firewall Manager, select Security policies.
  3. On the Filter menu, select the AWS Region where your VPC subnets are defined, and choose Create policy. In this example, we select US East (N. Virginia).
  4. Under Policy details, select Network ACL, and then choose Next.

    Figure 1: Network ACL policy type and Region


  5. On Policy name, enter a Policy name and Policy description.

    Figure 2: Network ACL policy name and description


  6. In the Network ACL policy rules section, select the Inbound rules tab.
  7. In the First rules section, choose Add rules.

    Figure 3: Add rules in the First rules section


  8. In the Inbound rules window, choose Add inbound rules.

    Figure 4: Add inbound rules


  9. For Inbound rules, choose the following:
    1. For Type, select All traffic.
    2. For Protocol, select All.
    3. For Port range, select All.
    4. For Source, enter an IP address range that you want to deny. In this example, we use 192.0.2.0/24.
    5. For Action, select Deny.
    6. Choose Add Rules.

    Figure 5: Configure a network ACL inbound rule


  10. In Network ACL policy rules, under First rules, review the deny rule.

    Figure 6: Review the inbound deny rule


  11. Under Policy action, select the following:
    1. Select Auto remediate any noncompliant resources.
    2. Under Force Remediation, select Force remediate first rules. Firewall Manager compares your existing network ACL rules with rules defined in the policy. A conflict exists if a policy rule has the opposite action of an existing rule and overlaps with the existing rule’s protocol, address range, or port range. In these cases, Firewall Manager will not remediate the network ACL unless you enable force remediation.

    Figure 7: Configure the policy action


  12. Choose Next.
  13. Under Policy scope, select the following:
    1. Under AWS accounts this policy applies to, select the scope of accounts that apply. In this example, we include all accounts.
    2. Under Resource type, select Subnet.
    3. Under Resources, select the scope of resources that apply. In this example, we only include subnets that have a particular tag.

    Figure 8: Configure the policy scope


  14. Enable resource cleanup if you want Firewall Manager to remove the rules it added to network ACLs associated with subnets that are no longer in scope. To enable cleanup, select Automatically remove protections from resources that leave the policy scope, and choose Next.

    Figure 9: Enable resource cleanup

  15. Under Configure policy tags, define the tags you want to associate with your policy, and then choose Next.
  16. Under Review and create policy, choose Next.
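
The console steps above can also be approximated as an API payload. The following is a minimal sketch, assuming the field names of the Firewall Manager PutPolicy API as we understand them (NETWORK_ACL_COMMON, NetworkAclEntrySet, FirstEntries); verify them against the API reference before use. The policy name and the tag key and value are hypothetical placeholders.

```python
import ipaddress
import json


def build_inbound_deny_policy(source_cidr, tag_key, tag_value):
    """Build a policy payload that denies all inbound traffic from source_cidr."""
    # Fail early on a malformed CIDR instead of sending it to the API.
    ipaddress.ip_network(source_cidr)

    deny_entry = {
        "Protocol": "-1",         # -1 means all protocols (Type: All traffic)
        "CidrBlock": source_cidr,
        "RuleAction": "deny",
        "Egress": False,          # False marks an inbound rule
    }
    return {
        "PolicyName": "deny-inbound-example",  # hypothetical name
        "SecurityServicePolicyData": {
            "Type": "NETWORK_ACL_COMMON",
            "PolicyOption": {
                "NetworkAclCommonPolicy": {
                    "NetworkAclEntrySet": {
                        "FirstEntries": [deny_entry],
                        "ForceRemediateForFirstEntries": True,
                        "ForceRemediateForLastEntries": False,
                    }
                }
            },
        },
        "ResourceType": "AWS::EC2::Subnet",
        # Only subnets carrying this tag are in scope.
        "ResourceTags": [{"Key": tag_key, "Value": tag_value}],
        "ExcludeResourceTags": False,
        "RemediationEnabled": True,              # auto remediate
        "DeleteUnusedFMManagedResources": True,  # resource cleanup
    }


policy = build_inbound_deny_policy("192.0.2.0/24", "env", "prod")
print(json.dumps(policy, indent=2))
```

With credentials for the delegated administrator account, a payload like this would typically be passed to `boto3.client("fms").put_policy(Policy=policy)`.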

Before creating the Firewall Manager policy, the subnet is associated with a default network ACL, as shown in Figure 10.

Figure 10: Default network ACL rules before the subnet is in scope

As shown in Figure 11, the subnet is now associated with a network ACL managed by Firewall Manager. The original Allow rule has been preserved and moved to priority 5,000. The Deny rule has been added with priority 1.

Figure 11: Inbound rules in network ACL managed by Firewall Manager
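
Network ACL rules are evaluated in ascending rule-number order, and the first matching rule decides the outcome. The sketch below models the two inbound rules described above to show why the deny rule at priority 1 takes effect before the preserved allow rule at 5,000. The function name and rule tuples are illustrative only.

```python
import ipaddress


def evaluate_inbound(rules, source_ip):
    """Return the action of the lowest-numbered rule whose CIDR contains source_ip."""
    addr = ipaddress.ip_address(source_ip)
    # Sorting the tuples orders them by rule number, lowest first.
    for number, cidr, action in sorted(rules):
        if addr in ipaddress.ip_network(cidr):
            return action
    # No rule matched: the implicit final rule of a network ACL denies the packet.
    return "deny"


rules = [
    (5000, "0.0.0.0/0", "allow"),   # original allow rule, moved by Firewall Manager
    (1, "192.0.2.0/24", "deny"),    # first rule from the policy
]

print(evaluate_inbound(rules, "192.0.2.10"))    # deny
print(evaluate_inbound(rules, "198.51.100.7"))  # allow
```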

Deny outbound traffic using a network ACL security policy

You can also use Firewall Manager to implement outbound network ACL rules to deny the use of ports used by malware or software vulnerabilities. In this example, we’re blocking the use of LDAP port 389.

  1. Sign in to your Firewall Manager delegated administrator account and open the Firewall Manager console.
  2. In the navigation pane, under AWS Firewall Manager, select Security policies.
  3. On the Filter menu, select the AWS Region where your VPC subnets are defined, and choose Create policy. In this example, we select US East (N. Virginia).
  4. Under Policy details, select Network ACL, and then choose Next.
  5. Enter a Policy name and Policy description.
  6. In the Network ACL policy rules section, select the Outbound rules tab.
  7. In the First rules section, choose Add rules.

    Figure 12: Add rules in the First rules section

  8. Under Outbound rules, choose Add outbound rules.
  9. In Outbound rules, select the following:
    1. For Type, select LDAP (389).
    2. For Destination, enter 0.0.0.0/0.
    3. For Action, select Deny.
    4. Choose Add Rules.

    Figure 13: Configure a network ACL outbound rule

  10. On the Network ACL policy rules page, under First rules, review the deny rule.

    Figure 14: Review the outbound deny rule

  11. Under Policy action, select the following:
    1. Select Auto remediate any noncompliant resources.
    2. Under Force Remediation, select Force remediate first rules, and then choose Next.

    Figure 15: Configure the policy action

  12. Under Policy scope, choose the following:
    1. Under AWS accounts this policy applies to, select the scope of accounts that apply. In this example, we include all accounts by selecting Include all accounts under my organization.
    2. Under Resource type, select Subnet.
    3. Under Resources, select the scope of resources that apply. In this example, we select Include only subnets that have all the specified resource tags.

    Figure 16: Configure the policy scope

  13. On Resource cleanup, enable resource cleanup if you want Firewall Manager to remove rules it added to network ACLs associated with subnets that are no longer in scope. To enable resource cleanup, select Automatically remove protections from resources that leave the policy scope, and then choose Next.

    Figure 17: Enable resource cleanup

  14. Under Configure policy tags, define the tags you want to associate with your policy, and then choose Next.
  15. Under Review and create policy, choose Next.
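
For the outbound case, the rule entry itself is the part that changes. As a rough sketch, the LDAP (389) console type corresponds to TCP (protocol number 6) on port 389, with the egress flag set. The field names again assume the NetworkAclEntry shape of the Firewall Manager API and should be checked against the API reference; the helper name is hypothetical.

```python
def ldap_egress_deny_entry(destination_cidr="0.0.0.0/0"):
    """Build an outbound first-rule entry that denies TCP port 389 (LDAP)."""
    return {
        "Protocol": "6",                        # IANA protocol number for TCP
        "PortRange": {"From": 389, "To": 389},  # single port: LDAP
        "CidrBlock": destination_cidr,
        "RuleAction": "deny",
        "Egress": True,                         # True marks an outbound rule
    }


entry = ldap_egress_deny_entry()
print(entry)
```

An entry like this would go in the FirstEntries list of the policy's NetworkAclEntrySet, in the same position an inbound entry would occupy.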

Before creating the Firewall Manager policy, the subnet is associated with a network ACL that already contains rules with priority 100 and 101, as shown in Figure 18.

Figure 18: Rules in original network ACL

As shown in Figure 19, the subnet is now associated with a network ACL managed by Firewall Manager. The original rules have been preserved and moved to priority 5,000 and 5,100. The Deny rule for LDAP has been added with priority 1.

Figure 19: Outbound rules in network ACL managed by Firewall Manager
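
The renumbering seen in both examples (100 to 5,000 in the first, 100 and 101 to 5,000 and 5,100 in the second) is consistent with existing rules being renumbered in their original order, starting at 5,000 in steps of 100. That scheme is an inference from the figures described here, not documented behavior, so treat the following as an illustration only.

```python
def renumber_existing_rules(rule_numbers, start=5000, step=100):
    """Map existing rule numbers to new ones, preserving their relative order.

    The start and step values are assumptions inferred from the examples in this post.
    """
    return {old: start + i * step for i, old in enumerate(sorted(rule_numbers))}


print(renumber_existing_rules([100, 101]))  # {100: 5000, 101: 5100}
```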

Working with network ACLs managed by Firewall Manager

Firewall Manager network ACL policies allow you to manage up to 5 inbound and 5 outbound rules. Network ACLs can support a total of 20 inbound rules and 20 outbound rules by default. This limit can be increased up to 40 inbound rules and 40 outbound rules, but network performance might be impacted. Consider AWS Network Firewall if you need support for more rules and a broader set of features.
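
To make the arithmetic concrete, here is a small sketch that checks whether a subnet's existing rules plus the policy's rules fit within the per-direction quota. The limits come from the paragraph above; the function name is hypothetical, and the sketch does not model how any particular rule counts toward the quota.

```python
FMS_MAX_RULES_PER_DIRECTION = 5    # rules a policy can manage per direction
NACL_DEFAULT_QUOTA = 20            # default rules per direction
NACL_MAX_QUOTA = 40                # highest quota that can be requested


def remaining_rule_capacity(existing_rules, policy_rules, nacl_quota=NACL_DEFAULT_QUOTA):
    """Return how many rule slots remain in one direction (negative means over quota)."""
    if policy_rules > FMS_MAX_RULES_PER_DIRECTION:
        raise ValueError("a policy manages at most 5 rules per direction")
    if nacl_quota > NACL_MAX_QUOTA:
        raise ValueError("the quota cannot be raised above 40 rules per direction")
    return nacl_quota - existing_rules - policy_rules


print(remaining_rule_capacity(existing_rules=12, policy_rules=5))  # 3
print(remaining_rule_capacity(existing_rules=18, policy_rules=5))  # -3
```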

To diagnose overly restrictive network ACL rules, analyze your VPC flow logs with Amazon Athena. For details, see Querying Amazon VPC flow logs.
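
For a quick look without Athena, rejected traffic can also be filtered locally. This sketch is a swapped-in technique, not the Athena approach: it parses flow log records in the default (version 2) format and keeps those with a REJECT action. The sample records are made up for illustration, and a REJECT can come from a security group as well as a network ACL.

```python
# Field order of the default VPC flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def parse_record(line):
    """Split one default-format flow log line into a dict keyed by field name."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError("not a default-format flow log record")
    return dict(zip(FIELDS, values))


def rejected_records(lines, dstport=None):
    """Return REJECT records, optionally limited to one destination port."""
    matches = []
    for line in lines:
        record = parse_record(line)
        if record["action"] != "REJECT":
            continue
        if dstport is not None and record["dstport"] != str(dstport):
            continue
        matches.append(record)
    return matches


# Illustrative sample data, not real traffic.
sample = [
    "2 123456789012 eni-0abc 10.0.1.5 203.0.113.9 49152 389 6 3 180 1720000000 1720000060 REJECT OK",
    "2 123456789012 eni-0abc 10.0.1.5 203.0.113.9 49153 443 6 9 5400 1720000000 1720000060 ACCEPT OK",
]

for record in rejected_records(sample, dstport=389):
    print(record["srcaddr"], "->", record["dstaddr"], "port", record["dstport"])
```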

AWS accounts that are in scope of your Firewall Manager policy might have identities with permission to modify network ACLs created by Firewall Manager. You can use a service control policy (SCP) to deny AWS Identity and Access Management (IAM) actions that modify network ACLs if you want to make sure that they are exclusively managed by Firewall Manager. Firewall Manager uses service-linked roles, which are not restricted by SCPs. The following example SCP denies network ACL updates without restricting Firewall Manager:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyNaclUpdateExceptFMS",
      "Effect": "Deny",
      "Action": [
        "ec2:CreateNetworkAclEntry",
        "ec2:DeleteNetworkAclEntry",
        "ec2:ReplaceNetworkAclAssociation",
        "ec2:ReplaceNetworkAclEntry"
      ],
      "Resource": "*"
    }
  ]
}
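
An SCP like the one above can be created and attached through AWS Organizations. The following sketch takes the client as a parameter so it can be exercised without AWS access; it relies on the Organizations `create_policy` and `attach_policy` operations. The policy name, description, and target ID are hypothetical.

```python
import json

# Same statement as the example SCP above.
SCP_DOCUMENT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyNaclUpdateExceptFMS",
            "Effect": "Deny",
            "Action": [
                "ec2:CreateNetworkAclEntry",
                "ec2:DeleteNetworkAclEntry",
                "ec2:ReplaceNetworkAclAssociation",
                "ec2:ReplaceNetworkAclEntry",
            ],
            "Resource": "*",
        }
    ],
}


def create_and_attach_scp(org_client, target_id, name="DenyNaclUpdates"):
    """Create the SCP, attach it to a root, OU, or account, and return the policy ID."""
    response = org_client.create_policy(
        Content=json.dumps(SCP_DOCUMENT),
        Description="Deny network ACL changes made outside Firewall Manager",
        Name=name,
        Type="SERVICE_CONTROL_POLICY",
    )
    policy_id = response["Policy"]["PolicySummary"]["Id"]
    org_client.attach_policy(PolicyId=policy_id, TargetId=target_id)
    return policy_id


# Example call from the management account (target ID is a placeholder):
#   import boto3
#   create_and_attach_scp(boto3.client("organizations"), "ou-examplerootid-exampleouid")
```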

Summary

Prior to AWS Firewall Manager network ACL security policies, you had to implement your own process to orchestrate updates to network ACLs across VPC subnets in your organization in AWS Organizations. AWS Firewall Manager network ACL security policies allow you to centrally define common network ACL rules that are automatically applied to VPC subnets across your organization, even as you add new accounts and resources. In this post, we demonstrated how you can use network ACL policies in a variety of scenarios, such as blocking ingress from malicious sources and blocking egress to destinations used by malware and exploits. You can also use network ACL policies to implement an allow list. For example, you might only want to allow egress to your on-premises network.

To get started, explore network ACL security policies in the Firewall Manager console. For more information, see the AWS Firewall Manager Developer Guide and send feedback to AWS re:Post for AWS Firewall Manager or through your AWS support team.

Bryan Van Hook
Bryan is a Senior Security Solutions Architect at AWS. He has over 25 years of experience in software engineering, cloud operations, and internet security. He spends most of his time helping customers gain the most value from native AWS security services. Outside of his day job, Bryan can be found playing tabletop games and acoustic guitar.

Jesse Lepich
Jesse is a Senior Security Solutions Architect at AWS based in Lake St. Louis, Missouri, focused on helping customers implement native AWS security services. Outside of cloud security, his interests include relaxing with family, barefoot waterskiing, snowboarding and snow skiing, surfing, boating and sailing, and mountain climbing.

The readers have the floor

Post Syndicated from Тоест original https://www.toest.bg/imat-dumata-chitatelite-2024/

Dear readers,

Тоест is a media outlet that always actively seeks dialogue, and our readers' opinions are of crucial importance to us. We would be sincerely grateful if you could spare 10 minutes to answer a few questions in the survey below. With it, we would like to learn more about how you experience what you read, what you think of Тоест and of its presence on social media, and how you feel about financially supporting independent media. The survey is anonymous and very easy to complete.

Thank you in advance for the time you take to share your thoughts with us. Every answer is valuable and will help us continue to develop and improve Тоест.

Kernel prepatch 6.10-rc7

Post Syndicated from corbet original https://lwn.net/Articles/981020/

The 6.10-rc7 kernel prepatch is out for
testing.

Things remain calm, although I do suspect that part of it is that
it’s been the July 4th week in the US, and a lot of Europe is
starting to go away on summer vacation.

But hey, let’s not look a gift horse too closely in the
mouth. Maybe it’s really just that 6.10 is shaping up well. Right?
RIGHT?