Happy New Year! AWS Weekly Roundup – January 8, 2024

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/happy-new-year-aws-weekly-roundup-january-8-2024/

Happy New Year! Cloud technologies, machine learning, and generative AI have become more accessible, impacting nearly every aspect of our lives. Amazon CTO Dr. Werner Vogels offers four tech predictions for 2024 and beyond:

  • Generative AI becomes culturally aware
  • FemTech finally takes off
  • AI assistants redefine developer productivity
  • Education evolves to match the speed of technology

Read how these technology trends will converge to help solve some of society’s most difficult problems. Download the Werner Vogels’ Tech Predictions for 2024 and Beyond ebook or read Werner’s All Things Distributed blog.

AWS re:Invent 2023To hear insights from AWS and industry thought leaders, grow your skills, and get inspired, watch AWS re:Invent 2023 videos on demand for keynotes, innovation talks, breakout sessions, and AWS Hero guide playlists.

Launches from the last few weeks
Since our last week in review on December 18, 2023, I’d like to highlight some launches from year end, as well as last week:

New AWS Canada West (Calgary) Region – We are opening a new and second Region and in Canada, AWS Canada West (Calgary). At the end of 2023, AWS had 33 AWS Regions and 105 Availability Zones (AZs) globally. We preannounced 12 additional AZs in four future Regions in Malaysia, New Zealand, Thailand, and the AWS European Sovereign Cloud. We will share more information on these Regions in 2024. Please stay tuned.

DNS over HTTPS in Amazon Route 53 Resolver – You can use the DNS over HTTPS (DoH) protocol for both inbound and outbound Route 53 Resolver endpoints. As the name suggests, DoH supports HTTP or HTTP/2 over TLS to encrypt the data exchanged for Domain Name System (DNS) resolutions.

Automatic enrollment to Amazon RDS Extended Support – Your MySQL 5.7 and PostgreSQL 11 database instances running on Amazon Aurora and Amazon RDS will be automatically enrolled into Amazon RDS Extended Support starting on February 29, 2024. You can have more control over when you want to upgrade the major version of your database after the community end of life (EoL).

New Amazon CloudWatch Network Monitor – This is a new feature of Amazon CloudWatch that helps monitor network availability and performance between AWS and your on-premises environments. Network Monitor needs zero manual instrumentation and gives you access to real-time network visibility to proactively and quickly identify issues within the AWS network and your own hybrid environment. For more information, read Monitor hybrid connectivity with Amazon CloudWatch Network Monitor.

Amazon Aurora PostgreSQL integrations with Amazon Bedrock – You can use two methods to integrate Aurora PostgreSQL databases with Amazon Bedrock to power generative AI applications. You can use the SQL query with Aurora ML integration with Amazon Bedrock and Aurora vector store with Knowledge Bases for Amazon Bedrock for Retrieval Augmented Generation (RAG).

New WordPress setup on Amazon Lightsail – Set up your WordPress website on Amazon Lightsail with the new workflow to eliminate complexity and time spent configuring your website. The workflow allows you to complete all the necessary steps, including setting up a Secure Sockets Layer (SSL) certificate to secure your website with HTTPS.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items that you may find interesting in the new year:

Book recommendations for AWS customer executives – Plan for the new year and catch up on what others are doing and thinking. AWS Enterprise Strategy team recommends what books are most important for our AWS customer executives to read.

Best practices for scaling AWS CDK adoption with Platform Engineering – A recent evolution in DevOps is the introduction of platform engineering teams to build services, toolchains, and documentation to support workload teams. This blog post introduces strategies and best practices for accelerating CDK adoption within your organization. You can learn how to scale the lessons learned from the pilot project across your organization through platform engineering.

High performance running HPC applications on AWS Graviton instances – When running the Parallel Lattice Boltzmann Solver (Palabos) on Amazon EC2 Hpc7g instances to solve computational fluid dynamics (CFD) problems, performance increased by up to 70% and price performance was up to 3x better than on the previous generation of Graviton instances.

The new AWS open source newsletter, #181 – Check up on all the latest open source content, which this week includes AWS Amplify, Amazon Corretto, dbt, Apache Flink, Karpenter, LangChain, Pinecone, and more.

Upcoming AWS Events
Check your calendars and sign up for these AWS events in the new year:

AWS at CES 2024 (January 9-12) – AWS will be representing some of the latest cloud services and solutions that are purpose built for the automotive, mobility, transportation, and manufacturing industries. Join us to learn about the latest cloud capabilities across generative AI, software define vehicles, product engineering, sustainability, new digital customer experiences, connected mobility, autonomous driving, and so much more in Amazon Experience Area.

APJ Builders Online Series (January 18) – This online conference is designed for you to learn core AWS concepts, and step-by-step architectural best practices, including demonstrations to help you get started and accelerate your success on AWS.

You can browse all upcoming AWS-led in-person and virtual events, and developer-focused events such as AWS DevDay.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Looking beyond code coverage with Amazon CodeWhisperer

Post Syndicated from Saurabh Kumar original https://aws.amazon.com/blogs/devops/looking-beyond-code-coverage-with-amazon-codewhisperer/

Code coverage is a code quality metric leveraging unit tests. Coming up with test cases with every combination of parameters requires developer’s time, which is already scarce. Developers’ focus is (mis)directed at just meeting the coverage threshold. In doing so, quality of code may be compromised and resulting code may still result in unexpected outcomes.

In this blog, we will walk you through a Java application and demonstrate how to look beyond code coverage by leveraging Amazon CodeWhisperer, an AI coding companion, for generating a combination of test cases, including boundary conditions which are often overlooked due to time and resource constraints. Taking this approach, you can improve code quality as well as improve your productivity.

Prerequisites

  1. Create an AWS Account. If you don’t already have an AWS account, you’ll need to create one in order to follow the steps outlined to AWS Management Console. For this, it is recommended to create a new user.
  2. To set up CodeWhisperer for your development environment, follow the setup instructions AWS Toolkit.
  3. Download and set up Java: Install the Java SE Development Kit.
  4. Install Maven.
  5. You can use any IDE of your choice such as Eclipse, Spring Tools, VS Code or IntelliJ. For this blog, we have used IntelliJ community edition which you can download from here. You can use IntelliJ Ultimate if you have a license for it.
  6. Clone repo: Looking beyond code coverage with CodeWhisperer.
  7. Import cloned project in IntelliJ following importing-a-project guide.

Following the above steps, you would have setup the project locally. Figure1 below shows how the initial project should look after you have imported it as maven project in your IDE:

Figure1. initial Java project

Figure1. initial Java project

The following screen recording shows how you can use CodeWhisperer to generate code for ‘add’ method using the comment “//method for adding two numbers”.

ScreenRecording 1. Initial application code generation

ScreenRecording 1. Initial application code generation

Now let’s generate one simple test case using CodeWhisperer as shown in the ScreenRecording 2. First test case generation:

ScreenRecording 2. First test case generation

ScreenRecording 2. First test case generation

Let’s run the test case with code coverage. In figure2. “First test with code coverage”, you can see that we have achieved 100% coverage on the Calculator class. If we just go by the coverage, we can conclude that the code is ready.

Figure2. First test with code coverage

Figure2. First test with code coverage

Just one unit test is not sufficient to ensure quality and side-effect-free code. Next, you will see how CodeWhisperer can assist in generating additional test cases. As soon as you begin typing a comment like, “// Test,” it provides suggestions for new test cases, such as “// test with one negative number,” “// test with two negative numbers,” “Test with one zero number,” etc. This feature, as shown in the ScreenRecording 3 titled ‘Generating additional test cases’ below, makes the task of generating a variety of test cases easier and enables developers to create more tests in a shorter amount of time.

ScreenRecording 3. Generating additional test cases

ScreenRecording 3. Generating additional test cases

So far, we have generated test cases with different arguments and still have 100% code coverage. Let’s now turn our focus towards safety of the code and think about different arguments that can lead to unexpected outcomes. Each argument should fall within the range of -2,147,483,648 (the minimum value of type int) to 2,147,483,647 (the maximum value of type int). Let’s use CodeWhisperer to generate additional test cases to challenge code safety as shown in the screen recording below:

ScreenRecording 4. Generating test for boundary conditions

ScreenRecording 4. Generating test for boundary conditions

Here, CodeWhisperer first generated a test case which adds 1 to maximum value of integer. We have also added a statement to print the result to console so that we can see the actual value ‘add’ method returns when this use case is executed. Point to note here: the generated test case is expecting the output to be the MINIMUM value of type int. Upon execution, the test case prints something unexpected. Because of the way signed integer operations work, adding 1 to max int results in the min value.

Let’s consider this in more practical terms. Imagine using the ‘add’ method in a banking system, where every time a customer deposits money into their bank account, you add up the recent deposit to calculate the final amount in their account. Now, imagine a customer’s reaction when they find their balance to be negative after depositing $2.14 billion, and they now owe huge overdraft charges.

This example demonstrates that even code with 100% coverage has unexpected side effects. The focus should be identifying combinations of parameters which can potentially produce unexpected outcomes so that code can be corrected before it manifests this behavior in production.

Now, let’s use CodeWhisperer to generate another test case that could create an unexpected result: “add -1 to the minimum value of ‘int’”. Again, adding -1 to the minimum int value results in the MAXIMUM value. Using the same example as above, a customer would be more than happy to notice that they still have money in their bank account, even after withdrawing $2.14 billion.

Again, the point is that developers should focus on ensuring that the code doesn’t have unwanted, unexpected consequences, rather than chasing a coverage target.

Now that we have seen that the add methods runs into integer overflow in certain conditions, let’s improve the code using CodeWhisperer using comment “check a and b for integer overflow”:

ScreenRecording 5. Code improvement- overflow check

ScreenRecording 5. Code improvement- overflow check

After adding the safety checks, the test cases are not resulting in unexpected outcomes and resulting in ArithmaticException as shown in the above screen recording. However, the test cases are failing, and failing test cases can interrupt the CI/CD pipeline. So, let’s refactor these test cases to expect this runtime exception and pass the test case as shown in the screen recording below.

ScreenRecording 6. Test case improvement- overflow checks

ScreenRecording 6. Test case improvement- overflow checks

Having rerun the test cases with coverage, you can see that the test cases are not only passing, but also have 100% code coverage.

For this blog, the majority of the code and its corresponding test cases are generated by CodeWhisperer, an AI coding companion. This tool enables us to enhance code by easily exploring libraries. In our example, this led us to the ‘Math.addExact’ method, which provides checks for boundary conditions relevant to our task. Let’s refactor the code to utilize this method, as shown below in Figure 3 final code.

Figure 3. Final code

Figure 3. Final code

If we rerun the test suite with coverage, we find that all the test cases are passing and coverage is also maintained at 100%.

Figure 4. Final tests with coverage

Figure 4. Final tests with coverage

Conclusion:

Through this blog post, we have demonstrated that high code coverage alone does not guarantee high quality code. Tools like Amazon CodeWhisperer can boost developer productivity by generating code and as well as corresponding test suite, including boundary conditions. This frees up developers to concentrate on business logic and to learn new frameworks and libraries, thereby resulting in overall improvement in quality and safety of code.

While our example focused on Java, this concept applies to other programming languages as well. Checkout the complete list of programming languages and IDEs supported by CodeWhisperer in the FAQs.

Try out CodeWhisperer Individual Tier for free to see how it can help you write high-quality code more efficiently using CodeWhisperer getting started guides.

Happy coding!

      
Saurabh Kumar's picture

Saurabh Kumar

Saurabh Kumar is a Solutions Architect at AWS based out of Raleigh, NC. He is passionate about helping customers solve their business challenges and technical problems from migration to modernization and optimization. Outside of work, he likes gardening and spending time with his family.

Bineesh Ravindran's picture

Bineesh Ravindran

Bineesh is Solutions Architect at Amazon Webservices (AWS) who is passionate about technology and love to help customers solve problems. Bineesh has over 20 years of experience in designing and implementing enterprise applications. He works with AWS partners and customers to provide them with architectural guidance for building scalable architecture and execute strategies to drive adoption of AWS services. When he’s not working, he enjoys biking, aqua scaping and playing badminton.

Architectural patterns for real-time analytics using Amazon Kinesis Data Streams, part 1

Post Syndicated from Raghavarao Sodabathina original https://aws.amazon.com/blogs/big-data/architectural-patterns-for-real-time-analytics-using-amazon-kinesis-data-streams-part-1/

We’re living in the age of real-time data and insights, driven by low-latency data streaming applications. Today, everyone expects a personalized experience in any application, and organizations are constantly innovating to increase their speed of business operation and decision making. The volume of time-sensitive data produced is increasing rapidly, with different formats of data being introduced across new businesses and customer use cases. Therefore, it is critical for organizations to embrace a low-latency, scalable, and reliable data streaming infrastructure to deliver real-time business applications and better customer experiences.

This is the first post to a blog series that offers common architectural patterns in building real-time data streaming infrastructures using Kinesis Data Streams for a wide range of use cases. It aims to provide a framework to create low-latency streaming applications on the AWS Cloud using Amazon Kinesis Data Streams and AWS purpose-built data analytics services.

In this post, we will review the common architectural patterns of two use cases: Time Series Data Analysis and Event Driven Microservices. In the subsequent post in our series, we will explore the architectural patterns in building streaming pipelines for real-time BI dashboards, contact center agent, ledger data, personalized real-time recommendation, log analytics, IoT data, Change Data Capture, and real-time marketing data. All these architecture patterns are integrated with Amazon Kinesis Data Streams.

Real-time streaming with Kinesis Data Streams

Amazon Kinesis Data Streams is a cloud-native, serverless streaming data service that makes it easy to capture, process, and store real-time data at any scale. With Kinesis Data Streams, you can collect and process hundreds of gigabytes of data per second from hundreds of thousands of sources, allowing you to easily write applications that process information in real-time. The collected data is available in milliseconds to allow real-time analytics use cases, such as real-time dashboards, real-time anomaly detection, and dynamic pricing. By default, the data within the Kinesis Data Stream is stored for 24 hours with an option to increase the data retention to 365 days. If customers want to process the same data in real-time with multiple applications, then they can use the Enhanced Fan-Out (EFO) feature. Prior to this feature, every application consuming data from the stream shared the 2MB/second/shard output. By configuring stream consumers to use enhanced fan-out, each data consumer receives dedicated 2MB/second pipe of read throughput per shard to further reduce the latency in data retrieval.

For high availability and durability, Kinesis Data Streams achieves high durability by synchronously replicating the streamed data across three Availability Zones in an AWS Region and gives you the option to retain data for up to 365 days. For security, Kinesis Data Streams provide server-side encryption so you can meet strict data management requirements by encrypting your data at rest and Amazon Virtual Private Cloud (VPC) interface endpoints to keep traffic between your Amazon VPC and Kinesis Data Streams private.

Kinesis Data Streams has native integrations with other AWS services such as AWS Glue and Amazon EventBridge to build real-time streaming applications on AWS. Refer to Amazon Kinesis Data Streams integrations for additional details.

Modern data streaming architecture with Kinesis Data Streams

A modern streaming data architecture with Kinesis Data Streams can be designed as a stack of five logical layers; each layer is composed of multiple purpose-built components that address specific requirements, as illustrated in the following diagram:

The architecture consists of the following key components:

  • Streaming sources – Your source of streaming data includes data sources like clickstream data, sensors, social media, Internet of Things (IoT) devices, log files generated by using your web and mobile applications, and mobile devices that generate semi-structured and unstructured data as continuous streams at high velocity.
  • Stream ingestion – The stream ingestion layer is responsible for ingesting data into the stream storage layer. It provides the ability to collect data from tens of thousands of data sources and ingest in real time. You can use the Kinesis SDK for ingesting streaming data through APIs, the Kinesis Producer Library for building high-performance and long-running streaming producers, or a Kinesis agent for collecting a set of files and ingesting them into Kinesis Data Streams. In addition, you can use many pre-build integrations such as AWS Database Migration Service (AWS DMS), Amazon DynamoDB, and AWS IoT Core to ingest data in a no-code fashion. You can also ingest data from third-party platforms such as Apache Spark and Apache Kafka Connect
  • Stream storage – Kinesis Data Streams offer two modes to support the data throughput: On-Demand and Provisioned. On-Demand mode, now the default choice, can elastically scale to absorb variable throughputs, so that customers do not need to worry about capacity management and pay by data throughput. The On-Demand mode automatically scales up 2x the stream capacity over its historic maximum data ingestion to provide sufficient capacity for unexpected spikes in data ingestion. Alternatively, customers who want granular control over stream resources can use the Provisioned mode and proactively scale up and down the number of Shards to meet their throughput requirements. Additionally, Kinesis Data Streams can store streaming data up to 24 hours by default, but can extend to 7 days or 365 days depending upon use cases. Multiple applications can consume the same stream.
  • Stream processing – The stream processing layer is responsible for transforming data into a consumable state through data validation, cleanup, normalization, transformation, and enrichment. The streaming records are read in the order they are produced, allowing for real-time analytics, building event-driven applications or streaming ETL (extract, transform, and load). You can use Amazon Managed Service for Apache Flink for complex stream data processing, AWS Lambda for stateless stream data processing, and AWS Glue & Amazon EMR for near-real-time compute. You can also build customized consumer applications with Kinesis Consumer Library, which will take care of many complex tasks associated with distributed computing.
  • Destination – The destination layer is like a purpose-built destination depending on your use case. You can stream data directly to Amazon Redshift for data warehousing and Amazon EventBridge for building event-driven applications. You can also use Amazon Kinesis Data Firehose for streaming integration where you can light stream processing with AWS Lambda, and then deliver processed streaming into destinations like Amazon S3 data lake, OpenSearch Service for operational analytics, a Redshift data warehouse, No-SQL databases like Amazon DynamoDB, and relational databases like Amazon RDS to consume real-time streams into business applications. The destination can be an event-driven application for real-time dashboards, automatic decisions based on processed streaming data, real-time altering, and more.

Real-time analytics architecture for time series

Time series data is a sequence of data points recorded over a time interval for measuring events that change over time. Examples are stock prices over time, webpage clickstreams, and device logs over time. Customers can use time series data to monitor changes over time, so that they can detect anomalies, identify patterns, and analyze how certain variables are influenced over time. Time series data is typically generated from multiple sources in high volumes, and it needs to be cost-effectively collected in near real time.

Typically, there are three primary goals that customers want to achieve in processing time-series data:

  • Gain insights real-time into system performance and detect anomalies
  • Understand end-user behavior to track trends and query/build visualizations from these insights
  • Have a durable storage solution to ingest and store both archival and frequently accessed data.

With Kinesis Data Streams, customers can continuously capture terabytes of time series data from thousands of sources for cleaning, enrichment, storage, analysis, and visualization.

The following architecture pattern illustrates how real time analytics can be achieved for Time Series data with Kinesis Data Streams:

Build a serverless streaming data pipeline for time series data

The workflow steps are as follows:

  1. Data Ingestion & Storage – Kinesis Data Streams can continuously capture and store terabytes of data from thousands of sources.
  2. Stream Processing – An application created with Amazon Managed Service for Apache Flink can read the records from the data stream to detect and clean any errors in the time series data and enrich the data with specific metadata to optimize operational analytics. Using a data stream in the middle provides the advantage of using the time series data in other processes and solutions at the same time. A Lambda function is then invoked with these events, and can perform time series calculations in memory.
  3. Destinations – After cleaning and enrichment, the processed time series data can be streamed to Amazon Timestream database for real-time dashboarding and analysis, or stored in databases such as DynamoDB for end-user query. The raw data can be streamed to Amazon S3 for archiving.
  4. Visualization & Gain insights – Customers can query, visualize, and create alerts using Amazon Managed Service for Grafana. Grafana supports data sources that are storage backends for time series data. To access your data from Timestream, you need to install the Timestream plugin for Grafana. End-users can query data from the DynamoDB table with Amazon API Gateway acting as a proxy.

Refer to Near Real-Time Processing with Amazon Kinesis, Amazon Timestream, and Grafana showcasing a serverless streaming pipeline to process and store device telemetry IoT data into a time series optimized data store such as Amazon Timestream.

Enriching & replaying data in real time for event-sourcing microservices

Microservices are an architectural and organizational approach to software development where software is composed of small independent services that communicate over well-defined APIs. When building event-driven microservices, customers want to achieve 1. high scalability to handle the volume of incoming events and 2. reliability of event processing and maintain system functionality in the face of failures.

Customers utilize microservice architecture patterns to accelerate innovation and time-to-market for new features, because it makes applications easier to scale and faster to develop. However, it is challenging to enrich and replay the data in a network call to another microservice because it can impact the reliability of the application and make it difficult to debug and trace errors. To solve this problem, event-sourcing is an effective design pattern that centralizes historic records of all state changes for enrichment and replay, and decouples read from write workloads. Customers can use Kinesis Data Streams as the centralized event store for event-sourcing microservices, because KDS can 1/ handle gigabytes of data throughput per second per stream and stream the data in milliseconds, to meet the requirement on high scalability and near real-time latency, 2/ integrate with Flink and S3 for data enrichment and achieving while being completely decoupled from the microservices, and 3/ allow retry and asynchronous read in a later time, because KDS retains the data record for a default of 24 hours, and optionally up to 365 days.

The following architectural pattern is a generic illustration of how Kinesis Data Streams can be used for Event-Sourcing Microservices:

The steps in the workflow are as follows:

  1. Data Ingestion and Storage – You can aggregate the input from your microservices to your Kinesis Data Streams for storage.
  2. Stream processing Apache Flink Stateful Functions simplifies building distributed stateful event-driven applications. It can receive the events from an input Kinesis data stream and route the resulting stream to an output data stream. You can create a stateful functions cluster with Apache Flink based on your application business logic.
  3. State snapshot in Amazon S3 – You can store the state snapshot in Amazon S3 for tracking.
  4. Output streams – The output streams can be consumed through Lambda remote functions through HTTP/gRPC protocol through API Gateway.
  5. Lambda remote functions – Lambda functions can act as microservices for various application and business logic to serve business applications and mobile apps.

To learn how other customers built their event-based microservices with Kinesis Data Streams, refer to the following:

Key considerations and best practices

The following are considerations and best practices to keep in mind:

  • Data discovery should be your first step in building modern data streaming applications. You must define the business value and then identify your streaming data sources and user personas to achieve the desired business outcomes.
  • Choose your streaming data ingestion tool based on your steaming data source. For example, you can use the Kinesis SDK for ingesting streaming data through APIs, the Kinesis Producer Library for building high-performance and long-running streaming producers, a Kinesis agent for collecting a set of files and ingesting them into Kinesis Data Streams, AWS DMS for CDC streaming use cases, and AWS IoT Core for ingesting IoT device data into Kinesis Data Streams. You can ingest streaming data directly into Amazon Redshift to build low-latency streaming applications. You can also use third-party libraries like Apache Spark and Apache Kafka to ingest streaming data into Kinesis Data Streams.
  • You need to choose your streaming data processing services based on your specific use case and business requirements. For example, you can use Amazon Kinesis Managed Service for Apache Flink for advanced streaming use cases with multiple streaming destinations and complex stateful stream processing or if you want to monitor business metrics in real time (such as every hour). Lambda is good for event-based and stateless processing. You can use Amazon EMR for streaming data processing to use your favorite open source big data frameworks. AWS Glue is good for near-real-time streaming data processing for use cases such as streaming ETL.
  • Kinesis Data Streams on-demand mode charges by usage and automatically scales up resource capacity, so it’s good for spiky streaming workloads and hands-free maintenance. Provisioned mode charges by capacity and requires proactive capacity management, so it’s good for predictable streaming workloads.
  • You can use the Kinesis Shared Calculator to calculate the number of shards needed for provisioned mode. You don’t need to be concerned about shards with on-demand mode.
  • When granting permissions, you decide who is getting what permissions to which Kinesis Data Streams resources. You enable specific actions that you want to allow on those resources. Therefore, you should grant only the permissions that are required to perform a task. You can also encrypt the data at rest by using a KMS customer managed key (CMK).
  • You can update the retention period via the Kinesis Data Streams console or by using the IncreaseStreamRetentionPeriod and the DecreaseStreamRetentionPeriod operations based on your specific use cases.
  • Kinesis Data Streams supports resharding. The recommended API for this function is UpdateShardCount, which allows you to modify the number of shards in your stream to adapt to changes in the rate of data flow through the stream. The resharding APIs (Split and Merge) are typically used to handle hot shards.

Conclusion

This post demonstrated various architectural patterns for building low-latency streaming applications with Kinesis Data Streams. You can build your own low-latency steaming applications with Kinesis Data Streams using the information in this post.

For detailed architectural patterns, refer to the following resources:

If you want to build a data vision and strategy, check out the AWS Data-Driven Everything (D2E) program.


About the Authors

Raghavarao Sodabathina is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and cloud security. He engages with customers to create innovative solutions that address customer business problems and to accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.

Hang Zuo is a Senior Product Manager on the Amazon Kinesis Data Streams team at Amazon Web Services. He is passionate about developing intuitive product experiences that solve complex customer problems and enable customers to achieve their business goals.

Shwetha Radhakrishnan is a Solutions Architect for AWS with a focus in Data Analytics. She has been building solutions that drive cloud adoption and help organizations make data-driven decisions within the public sector. Outside of work, she loves dancing, spending time with friends and family, and traveling.

Brittany Ly is a Solutions Architect at AWS. She is focused on helping enterprise customers with their cloud adoption and modernization journey and has an interest in the security and analytics field. Outside of work, she loves to spend time with her dog and play pickleball.

Джентълмените с късмет. Ротацията продължава

Post Syndicated from Емилия Милчева original https://www.toest.bg/dzentulmenite-s-kusmet-rotatsiyata-produlzhava/

Джентълмените с късмет. Ротацията продължава

Мюретата са част от политическата игра. Когато общественото внимание и енергия упорито биват насочвани към конкретна тема, обикновено става въпрос за договаряне далеч от камерите – и за съвсем друго. Ето че технологията на ротацията на премиера на ПП–ДБ акад. Николай Денков с настоящата вицепремиерка Мария Габриел от ГЕРБ измести темата за смените на министри и далеч по-важната – за механизма за номинации за членове на регулаторите. 

След като е налице желанието на съдружниците в управлението то да продължи, значи ротацията ще се осъществи. Останалото е несъществено от гледна точка на обществения интерес.

Смяна през март ще има, правителството с премиер Мария Габриел ще е с мандата на ГЕРБ–СДС. Двама-трима министри ще бъдат сменени. Асен Василев остава министър на финансите, вероятно ще е и вицепремиер. Под въпрос е дали настоящият министър-председател акад. Николай Денков ще е вицепремиер – освен министър на образованието и науката. Наред с това ще се състои и смяната на председателя на 49-тото НС Росен Желязков от ГЕРБ с Никола Минчев от „Продължаваме промяната“.

Можеше да е другояче

Всичко можеше да е по-честно и обществено приемливо, ако беше подписано коалиционно споразумение, където фигурира и механизмът на ротация, както го направиха в Румъния Националнолибералната и Социалдемократическата партия, със срок до провеждането на следващите редовни парламентарни избори. (Третият партньор, подписал споразумението – Демократичният съюз на унгарците в Румъния, напусна коалицията.) Документът предвижда ротация на премиера и размяна на няколко министерства между двете партии. Но в България ПП–ДБ и ГЕРБ–СДС заложиха на джентълменска дума, упорито избягвайки да подпишат документ, легитимиращ партньорството им. 

Лидерът на ГЕРБ Бойко Борисов искаше коалиционно споразумение. От ПП–ДБ упорстваха за някакъв Механизъм за гарантиране на реформаторската програма на правителството, първа точка в който отново бяха назначенията в регулаторите. Нито едното бе възприето, нито другото, но след като управленският съюз укрепва, „коалиция“ звучи далеч по-добре от обидното „сглобка“ или ироничното „не-коалиция“. А и внася известна прозрачност. 

Бездруго вече няма защо да се крият. 

Но пък заради ДПС „узаконяването“ на връзката става някак… комплицирано – хем за управление са се разбрали две коалиции, хем управляващото мнозинство в действителност е на три крака.

Черно на бяло – ДПС го няма, но е във властта

Какво имаме дотук? Декларация за национално отговорно управление на ПП–ДБ и ГЕРБ–СДС от май миналата година, в която са записани няколко принципа освен известните цели за Шенген и еврозоната. Ротация „за последователни периоди“ от по 9 месеца (с неуточнен брой). Излъчване на правителство с втория мандат. Задължението да бъдат подготвени и приети промени в Конституцията, включително в структурата на прокуратурата и ВСС. „Механизъм за предварително съгласуване между парламентарните групи на номинациите за регулаторните органи, избирани от парламента, с оглед гарантиране на най-добрия подбор на личности с високи професионални и морални качества.“ 

В края на първата и началото на новата „бременност“ с власт такъв механизъм още не е подготвен или поне не е обществено известен, и то в годината, в която предстои обновяването на регулаторите и на съдебните кадровици. 

Налице е и управленска програма на правителството „Денков–Габриел“ от 146 страници плюс законодателни приоритети. Стоп. Законодателните приоритети, обявени през септември миналата година, са договорени обаче не само между ПП–ДБ и ГЕРБ–СДС, но и с участието на ДПС. Конституционните промени бяха приети и с гласовете на Движението, което е вкарано и при назначенията в регулаторите. Но въпреки че ДПС са част от конституционното и евроатлантическото мнозинство, изобщо от мнозинството зад правителството, джентълменската дума е само между ПП–ДБ и ГЕРБ–СДС. 

Логично би било и едно (eвентуално) коалиционно споразумение да е между тях – но къде тогава остава ДПС! Партията на Доган отсъства от декларацията с принципи за управление, където фигурира механизмът за назначения. Допускайки я до квота, двете коалиции в действителност нарушават декларирани принципи, качвайки на автостоп третия партньор във властта. 

В ПП–ДБ не обичат думата „квота“, убеждавайки, че ще се излъчват безспорни професионалисти с обществена подкрепа и доверие. Някак трудно е за вярване при повече от десетилетие негативна селекция в съдебната система, констатирана от експерти и анализатори и целенасочено осъществявана от ГЕРБ и ДПС къде с помощта, къде с мълчаливото съгласие на БСП. Първо, че отнякъде ще се изнамерят такива кадри, второ – че ще бъдат допуснати в системата. 

Вторият кабинет – с мандата на първите

Този път се очаква правителството, което ще бъде гласувано през март, да е с мандата на ГЕРБ–СДС – политическата сила, първа на парламентарните избори на 2 април. И Борисов, и премиерът Николай Денков отбелязаха, че така ротацията на премиерите ще е най-лесна. Това е и компенсация за ГЕРБ, които отказаха първия мандат след вота, за да се сглоби кабинет с втория – на ПП–ДБ. Сега, в четвъртия триместър управление, получават и своето правителство. 

В крайна сметка няма значение кой е носителят на мандата, след като има зад гърба си същото мнозинство – ГЕРБ–СДС, ПП–ДБ, ДПС, и това мнозинство ще диктува какво да се приеме и какво да се отхвърли. 

Включването на ДПС помогна за приемане на конституционните промени. Но се оказа най-удобно за ГЕРБ. Двете политически сили често се обединяваха, проваляйки достатъчно смислени предложения, за да защитят свои бизнес интереси. Например изобщо отхвърлиха каквито и да било промени при търговете за онколекарства и така беше запазена досегашната практика НЗОК да плаща на различните болници лекарства с до 20 пъти разлика в цените. 

Отрязаха предложението и частните болници да обявяват обществени поръчки, защото настоящата ситуация води до същото – оскъпяване на едни и същи лекарства, тъй като повечето частни лечебни заведения имат свои дистрибуторски фирми, а НЗОК плаща и на едните, и на другите с обществен ресурс. А в самото начало при разпределянето на парламентарните комисии ГЕРБ отстъпиха част от своите председателски места на ДПС.

Тандемът ГЕРБ–ДПС има отработено стиковане и в тройното мнозинство представлява един друг политически сбор, на който допреди година се противопоставяха „Продължаваме промяната“ и „Демократична България“. 

Технологията не е тайна

Технологията за ротацията е като при оставка на правителството и преди дни конституционалисти обясниха как ще се случи. Премиерът подава оставка, което означава оставка и на правителството, и започва въртележката с мандатите. Президентът връчва мандата на лице, посочено от ГЕРБ–СДС, в случая Мария Габриел, и следват номинациите на министрите. До началото на март обаче трите политически сили – да, и ДПС също – ще трябва да се разберат кои министри да бъдат сменени и кои ще са новите персони за тези овакантени позиции. 

Но на първо време в дневния ред на парламента и управляващото мнозинство е изборът на двама конституционни съдии, забавен повече от две години. Според сайта Lex.bg най-спряганите имена са на проф. Екатерина Михайлова и на настоящия правосъден министър доц. Атанас Славов.

Така че коалицията ще продължи да управлява и през 2024 г. Само някакъв черен лебед би попречил на второто 9-месечие. Но понякога и черните лебеди са мюрета.

Security updates for Monday

Post Syndicated from jake original https://lwn.net/Articles/957146/

Security updates have been issued by Debian (exim4), Fedora (chromium, perl-Spreadsheet-ParseExcel, python-aiohttp, python-pysqueezebox, and tinyxml), Gentoo (Apache Batik, Eclipse Mosquitto, firefox, R, Synapse, and util-linux), Mageia (libssh2 and putty), Red Hat (squid), SUSE (libxkbcommon), and Ubuntu (gnutls28).

An overview of Cloudflare’s logging pipeline

Post Syndicated from Colin Douch http://blog.cloudflare.com/author/colin/ original https://blog.cloudflare.com/an-overview-of-cloudflares-logging-pipeline


One of the roles of Cloudflare’s Observability Platform team is managing the operation, improvement, and maintenance of our internal logging pipelines. These pipelines are used to ship debugging logs from every service across Cloudflare’s infrastructure into a centralised location, allowing our engineers to operate and debug their services in near real time. In this post, we’re going to go over what that looks like, how we achieve high availability, and how we meet our Service Level Objectives (SLOs) while shipping close to a million log lines per second.

Logging itself is a simple concept. Virtually every programmer has written a Hello, World! program at some point. Printing something to the console like that is logging, whether intentional or not.

Logging pipelines have been around since the beginning of computing itself. Starting with putting string lines in a file, or simply in memory, our industry quickly outgrew the concept of each machine in the network having its own logs. To centralise logging, and to provide scaling beyond a single machine, we invented protocols such as the BSD Syslog Protocol to provide a method for individual machines to send logs over the network to a collector, providing a single pane of glass for logs over an entire set of machines.

Our logging infrastructure at Cloudflare is a bit more complicated, but still builds on these foundational principles.

The beginning

Logs at Cloudflare start the same as any other, with a println. Generally systems don’t call println directly however, they outsource that logic to a logging library. Systems at Cloudflare use various logging libraries such as Go’s zerolog, C++’s KJ_LOG, or Rusts log, however anything that is able to print lines to a program’s stdout/stderr streams is compatible with our pipeline. This offers our engineers the greatest flexibility in choosing tools that work for them and their teams.

Because we use systemd for most of our service management, these stdout/stderr streams are generally piped into systemd-journald which handles the local machine logs. With its RateLimitBurst and RateLimitInterval configurations, this gives us a simple knob to control the output of any given service on a machine. This has given our logging pipeline the colloquial name of the “journal pipeline”, however as we will see, our pipeline has expanded far beyond just journald logs.

Syslog-NG

While journald provides us a method to collect logs on every machine, logging onto each machine individually is impractical for debugging large scale services. To this end, the next step of our pipeline is syslog-ng. Syslog-ng is a daemon that implements the aforementioned BSD syslog protocol. In our case, it reads logs from journald, and applies another layer of rate limiting. It then applies rewriting rules to add common fields, such as the name of the machine that emitted the log, the name of the data center the machine is in, and the state of the data center that the machine is in. It then wraps the log in a JSON wrapper and forwards it to our Core data centers.

journald itself has an interesting feature that makes it difficult for some of our use cases – it guarantees a global ordering of every log on a machine. While this is convenient for the single node case, it imposes the limitation that journald is single-threaded. This means that for our heavier workloads, where every millisecond of delay counts, we provide a more direct path into our pipeline. In particular, we offer a Unix Domain Socket that syslog-ng listens on. This socket operates as a separate source of logs into the same pipeline that the journald logs follow, but allows greater throughput by eschewing the need for a global ordering that journald enforces. Logging in this manner is a bit more involved than outputting logs to the stdout streams, as services have to have a pipe created for them and then manually open that socket to write to. As such, this is generally reserved for services that need it, and don’t mind the management overhead it requires.

log-x

Our logging pipeline is a critical service at Cloudflare. Any potential delays or missing data can cause downstream effects that may hinder or even prevent the resolving of customer facing incidents. Because of this strict requirement, we have to offer redundancy in our pipeline. This is where the operation we call “log-x” comes into play.

We operate two main core data centers. One in the United States, and one in Europe. From each machine, we ship logs to both of these data centers. We call these endpoints log-a, and log-b. The log-a and log-b receivers will insert the logs into a Kafka topic for later consumption. By duplicating the data to two different locations, we achieve a level of redundancy that can handle the failure of either data center.

The next problem we encounter is that we have many data centers all around the world, which at any time due to changing Internet conditions may become disconnected from one, or both core data centers. If the data center is disconnected for long enough we may end up in a situation where we drop logs to either the log-a or log-b receivers. This would result in an incomplete view of logs from one data center and is unacceptable; Log-x was designed to alleviate this problem. In the event that syslog-ng fails to send logs to either log-a or log-b, it will actually send the log twice to the available receiver. This second copy will be marked as actually destined for the other log-x receiver. When a log-x receiver receives such a log, it will insert it into a different Kafka queue, known as the Dead Letter Queue (DLQ). We then use Kafka Mirror Maker to sync this DLQ across to the data center that was inaccessible. With this logic log-x allows us to maintain a full copy of all the logs in each core data center, regardless of any transient failures from any of our data centers.

Kafka

When logs arrive in the core data centers, we buffer them in a Kafka queue. This provides a few benefits. Firstly, it means that any consumers of the logs can be added without any changes – they only need to register with Kafka as a consumer group on the logs topic. Secondly, it allows us to tolerate transient failures of the consumers without losing any data. Because the Kafka clusters in the core data centers are much larger than any single machine, Kafka allows us to tolerate up to eight hours of total outage for our consumers without losing any data. This has proven to be enough to recover without data loss from all but the largest of incidents.

When it comes to partitioning our Kafka data, we have an interesting dilemma. Rather arbitrarily, the syslog protocol only supports timestamps up to microseconds. For our faster log emitters, this means that the syslog protocol cannot guarantee ordering with timestamps alone. To work around this limitation, we partition our logs using a key made up of both the host, and the service name. Because Kafka guarantees ordering within a partition, this means that any logs from a service on a machine are guaranteed to be ordered between themselves. Unfortunately, because logs from a service can have vastly different rates between different machines, this can result in unbalanced Kafka partitions. We have an ongoing project to move towards Open Telemetry Logs to combat this.

Onward to storage

With the logs in Kafka, we can proceed to insert them into a more long term storage. For storage, we operate two backends. An ElasticSearch/Logstash/Kibana (ELK) stack, and a Clickhouse cluster.

For ElasticSearch, we split our cluster of 90 nodes into a few types. The first being “master”

nodes. These nodes act as the ElasticSearch masters, and coordinate insertions into the cluster. We then have “data” nodes that handle the actual insertion and storage. Finally, we have the “HTTP” nodes that handle HTTP queries. Traditionally in an ElasticSearch cluster, all the data nodes will also handle HTTP queries, however because of the size of our cluster and shards we have found that designating only a few nodes to handle HTTP requests greatly reduces our query times by allowing us to take advantage of aggressive caching.

On the Clickhouse side, we operate a ten node Clickhouse cluster that stores our service logs. We are in the process of migrating this to be our primary storage, but at the moment it provides an alternative interface into the same logs that ElasticSearch provides, allowing our Engineers to use either Lucene through the ELK stack, or SQL and Bash scripts through the Clickhouse interface.

What’s next?

As Cloudflare continues to grow, our demands on our Observability systems, and our logging pipeline in particular continue to grow with it. This means that we’re always thinking ahead to what will allow us to scale and improve the experience for our engineers. On the horizon, we have a number of projects to further that goal including:

  • Increasing our multi-tenancy capabilities with better resource insights for our engineers
  • Migrating our syslog-ng pipeline towards Open Telemetry
  • Tail sampling rather than our probabilistic head sampling we have at the moment
  • Better balancing for our Kafka clusters

If you’re interested in working with logging at Cloudflare, then reach out – we’re hiring!

Building a Partner Program: The Zabbix Advantage

Post Syndicated from Michael Kammer original https://blog.zabbix.com/building-a-partner-program-the-zabbix-advantage/27164/

At Zabbix, our emphasis on high performance, functionality, and reliability has led to the creation of one of the most popular monitoring solutions on the market. It’s so popular, in fact, that we get near-constant requests for Zabbix professional consulting, advice, support, and training from almost every corner of the world.

That’s why we created the Zabbix Partner Program. Our partner program was designed with one goal in mind – to get our services to the widest possible audience of qualified buyers by allowing customers to purchase them through a network of verified Zabbix partners as well as from Zabbix directly.

Our partners create high value for thousands of customers who would not otherwise enjoy access to Zabbix services by providing complete localization in terms of linguistic and cultural compatibility, availability across time zones, in-person access, and flexibility around currencies and payments.

To do that as effectively as possible, we’ve divided our partners into 3 categories:

Resellers. These are companies that promote and resell Zabbix services. Their job is to locate leads, present and promote Zabbix products and services, consult the leads regarding their ideal solutions, and arrange the contracts. At that point, Zabbix steps in and provides the services. Resellers are a great resource for customers who are limited by local regulations when it comes to buying Zabbix services in their local currency or from companies registered in their own country.

Certified Partners. Certified partners can also promote and resell Zabbix services, but they’re also officially authorized to deliver selected Zabbix services and solutions in their local languages. The ease of access and a common language allows certified partners to stay in close contact with customers. They can also sell their own value-adding services alongside Zabbix services.

Premium Partners. A premium partner has the same authorization as certified partners, but premium partner status is reserved for partners with the highest expertise and experience. Premium partners can participate in highly sophisticated Zabbix implementation, integration, and support projects.

Building a winning partner program has taught us a few things about the process, so without further ado, we’d like to share 6 best practices that we adhere to when it comes to cultivating and expanding our network of partners.

Set realistic goals

Years of running a partner program have taught us that success is impossible without clearly defined goals and success metrics. Setting firm, realistic goals for a program is the only way to measure its effectiveness and ROI. After a few quarters, it should be possible to compare performance to goals and see whether changes need to be made.

Accordingly, we make sure that Zabbix executives, sales teams, and partners are aware that getting a new program up and running (or making changes to an existing program) takes time. Expecting instant results is not realistic – we’ve learned that a ramp-up period of a few months is usually reasonable.

Make expectations clear

Nothing kills momentum faster than confusion. That’s why it’s important to make sure that partners have a solid understanding of everything that’s being asked of them. We’ve learned to give partners concise goals and objectives so that everyone is on the same page. We also create annual business plans for all three partnership programs, review them quarterly, and reward success.

Having the same KPIs as partners is also important. When different metrics for success exist, we run the risk of our partners being less enthusiastic about taking actions that will increase the success of Zabbix but may do less for them. In our experience, it’s better to build partnerships around a joint success target so that when partners win, we win.

Support your partners

At Zabbix, supporting our partners means providing outstanding sales, marketing, and technical support, all of which shows that we’re invested in their success as much as our own. Our partnership team helps partners with all presales-related questions, organizes demo calls, manages the deal registration to protect partner deals, patriciates in joint calls with customers, and helps with all possible legal questions and certifications.

Apart from day-to-day pre-sales support, we organize and participate in joint Zabbix marketing events of different formats together with our partners. These meetups, meetings, conferences, and external events organized by other vendors around the globe are designed to spread the word about Zabbix solutions and services while helping our partners generate new leads. During these events, our partners demonstrate their recent use-cases and serve as experts for the rest of the partner network and the wider Zabbix community.

Build Trust

Trust is the foundation of all partnerships, and we find that our partners trust us because we deliver the support and tools they need to be successful. It’s why we work hard to keep our partners updated with product developments and industry trends, and we continuously educate them on how to sell and overcome roadblocks.

We even allow some of our partners to conduct official Zabbix trainings, provided they have a certified trainer available. When an existing partner wants to become a training partner, we discuss their needs and plan their training certification together.

Measure and monitor

Whether launching a new program or scaling up an existing one, measuring the right key performance indicators (KPIs) can mean the difference between growth and chaos. If a business doesn’t know what to measure and optimize for their partner program, they won’t know what to improve if growth stalls out, and you’ll struggle to explain how partnerships contribute value.

It’s impossible to get far on the road to success without measuring progress along the way. That’s why we review goals and metrics with our partners every quarter, assess what’s working well and what’s missing the mark, and adapt and adjust if needed. We’ve learned not to change things up too often, but we’re always open to making tweaks that will amplify success.

Communicate effectively

One of the most important ingredients of any successful partner program is communication. It’s essential to keep partners informed about new products, promotions, and other important updates. That involves knowing the audience and understanding what each partner type and their respective employees are interested in and when.

A cornerstone of the Zabbix Partner Program is our ability to actively listen to our partners’ feedback. Our experience is that getting ahead of issues and concerns strengthens relationships, maintains trust, and guarantees that our partners feel supported and valued.

Conclusion

Becoming a Zabbix Partner is an ideal way to get recognized by potential customers and increase the visibility of your business, while also getting a leg up on your competitors by using technical support according to a professional service-level agreement.

In addition, you can count on discounts on all Zabbix services, the ability to access pre-sale consulting services, and participation in joint marketing events.

To find out more about our partner program and sign up, visit the Zabbix Partners page.

The post Building a Partner Program: The Zabbix Advantage appeared first on Zabbix Blog.

Second Interdisciplinary Workshop on Reimagining Democracy

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/01/second-interdisciplinary-workshop-on-reimagining-democracy.html

Last month, I convened the Second Interdisciplinary Workshop on Reimagining Democracy (IWORD 2023) at the Harvard Kennedy School Ash Center. As with IWORD 2022, the goal was to bring together a diverse set of thinkers and practitioners to talk about how democracy might be reimagined for the twenty-first century.

My thinking is very broad here. Modern democracy was invented in the mid-eighteenth century, using mid-eighteenth-century technology. Were democracy to be invented from scratch today, with today’s technologies, it would look very different. Representation would look different. Adjudication would look different. Resource allocation and reallocation would look different. Everything would look different, because we would have much more powerful technology to build on and no legacy systems to worry about.

Such speculation is not realistic, of course, but it’s still valuable. Everyone seems to be talking about ways to reform our existing systems. That’s critically important, but it’s also myopic. It represents a hill-climbing strategy of continuous improvements. We also need to think about discontinuous changes that you can’t easily get to from here; otherwise, we’ll be forever stuck at local maxima.

I wrote about the philosophy more in this essay about IWORD 2022. IWORD 2023 was equally fantastic, easily the most intellectually stimulating two days of my year. The event is like that; the format results in a firehose of interesting.

Summaries of all the talks are in the first set of comments below. (You can read a similar summary of IWORD 2022 here.) Thank you to the Ash Center and the Belfer Center at Harvard Kennedy School, and the Knight Foundation, for the funding to make this possible.

Next year, I hope to take the workshop out of Harvard and somewhere else. I would like it to live on for as long as it is valuable.

Now, I really want to explain the format in detail, because it works so well.

I used a workshop format I and others invented for another interdisciplinary workshop: Security and Human Behavior, or SHB. It’s a two-day event. Each day has four ninety-minute panels. Each panel has six speakers, each of whom presents for ten minutes. Then there are thirty minutes of questions and comments from the audience. Breaks and meals round out the day.

The workshop is limited to forty-eight attendees, which means that everyone is on a panel. This is important: every attendee is a speaker. And attendees commit to being there for the whole workshop; no giving your talk and then leaving. This makes for a very collaborative environment. The short presentations means that no one can get too deep into details or jargon. This is important for an interdisciplinary event. Everyone is interesting for ten minutes.

The final piece of the workshop is the social events. We have a night-before opening reception, a conference dinner after the first day, and a final closing reception after the second day. Good food is essential.

Honestly, it’s great but it’s also it’s exhausting. Everybody is interesting for ten minutes. There’s no down time to zone out or check email. And even though a shorter event would be easier to deal with, the numbers all fit together in a way that’s hard to change. A one-day event means only twenty-four attendees/speakers, and that’s not a critical mass. More people per panel doesn’t work. Not everyone speaking creates a speaker/audience hierarchy, which I want to avoid. And a three-day, slower-paced event is too long. I’ve thought about it long and hard; the format I’m using is optimal.

The 6.7 kernel has been released

Post Syndicated from corbet original https://lwn.net/Articles/957098/

Linus has released the 6.7 kernel.

End result: 6.7 is (in number of commits: over 17k non-merge
commits, with 1k+ merges) one of the largest kernel releases we’ve
ever had, but the extra rc8 week was purely due to timing with the
holidays, not about any difficulties with the larger release.

Some of the headline features in this release are:
the removal of support for the Itanium
architecture,
the first part of the futex2 API,
futex
support
in io_uring,
the BPF exceptions mechanism,
the bcachefs filesystem,
the TCP
authentication option
,
the kernel samepage merging smart
scan mode
, and
networking
support
for the Landlock security module.
See the LWN merge-window summaries (part 1,
part 2) and the (in-progress) KernelNewbies 6.7 page for
more information.