How AWS powered Prime Day 2024 for record-breaking sales

Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/how-aws-powered-prime-day-2024-for-record-breaking-sales/

The last Amazon Prime Day 2024 (July 17-18) was Amazon’s biggest Prime Day shopping event ever, with record sales and more items sold during the two-day event than any previous Prime Day event. Prime members shopped for millions of deals and saved billions across more than 35 categories globally.

I live in South Korea, but luckily I was staying in Seattle to attend the AWS Heroes Summit during Prime Day 2024. I signed up for a Prime membership and used Rufus, my new AI-powered conversational shopping assistant, to search for items quickly and easily. Prime members in the U.S. like me chose to consolidate their deliveries on millions of orders during Prime Day, saving an estimated 10 million trips. This consolidation results in lower carbon emissions on average.

We know from Jeff’s annual blog post that AWS runs the Amazon website and mobile app that makes these short-term, large scale global events feasible. (check out his 2016, 2017, 2019, 2020, 2021, 2022, and 2023 posts for a look back). Today I want to share top numbers from AWS that made my amazing shopping experience possible.

Prime Day 2024 – all the numbers
Here are some of the most interesting and/or mind-blowing metrics:

Amazon EC2 – Since many of Amazon.com services such as Rufus and Search use AWS artificial intelligence (AI) chips under the hood, Amazon deployed a cluster of over 80,000 Inferentia and Trainium chips for Prime Day. During Prime Day 2024, Amazon used over 250K AWS Graviton chips to power more than 5,800 distinct Amazon.com services (double that of 2023).

Amazon EBS – In support of Prime Day, Amazon provisioned 264 PiB of Amazon EBS storage in 2024, a 62 percent increase compared to 2023. When compared to the day before Prime Day 2024, Amazon.com performance on Amazon EBS jumped by 5.6 trillion read/write I/O operations during the event, or an increase of 64 percent compared to Prime Day 2023. Also, when compared to the day before Prime Day 2024, Amazon.com transferred an incremental 444 petabytes of data during the event, or an increase of 81 percent compared to Prime Day 2023.

Amazon Aurora – On Prime Day, 6,311 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed more than 376 billion transactions, stored 2,978 terabytes of data, and transferred 913 terabytes of data.

Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made tens of trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 146 million requests per second.

Amazon ElastiCache – ElastiCache served more than quadrillion requests on a single day with a peak of over 1 trillion requests per minute.

Amazon QuickSight – Over the course of Prime Day 2024, one Amazon QuickSight dashboard used by Prime Day teams saw 107K unique hits, 1300+ unique visitors, and delivered over 1.6M queries.

Amazon SageMaker – SageMaker processed more than 145B inference requests during Prime Day.

Amazon Simple Email Service (Amazon SES) – SES sent 30 percent more emails for Amazon.com during Prime Day 2024 vs 2023, delivering 99.23 percent of those emails to customers.

Amazon GuardDuty – During Prime Day 2024, Amazon GuardDuty monitored nearly 6 trillion log events per hour, a 31.9% increase from the previous year’s Prime Day.

AWS CloudTrail – CloudTrail processed over 976 billion events in support of Prime Day 2024.

Amazon CloudFront – CloudFront handled a peak load of over 500 million HTTP requests per minute, for a total of over 1.3 trillion HTTP requests during Prime Day 2024, a 30 percent increase in total requests compared to Prime Day 2023.

Prepare to Scale
As Jeff noted in every year, rigorous preparation is key to the success of Prime Day and our other large-scale events. For example, 733 AWS Fault Injection Service experiments were run to test resilience and ensure Amazon.com remains highly available on Prime Day.

If you are preparing for a similar business-critical events, product launches, and migrations, I strongly recommend that you take advantage of newly-branded AWS Countdown, a support program designed for your project lifecycle to assess operational readiness, identify and mitigate risks, and plan capacity, using proven playbooks developed by AWS experts. For example, with additional help from AWS Countdown, Legal Zoom successfully migrated 450 servers with minimal issues and continues to leverage AWS Countdown Premium to streamline and expedite the launch of SaaS applications.

We look forward to seeing what other records will be broken next year!

Channy & Jeff;

Use AWS Glue to streamline SFTP data processing

Post Syndicated from Seun Akinyosoye original https://aws.amazon.com/blogs/big-data/use-aws-glue-to-streamline-sftp-data-processing/

In today’s data-driven world, seamless integration and transformation of data across diverse sources into actionable insights is paramount. AWS Glue is a serverless data integration service that helps analytics users to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development. With AWS Glue, you can discover and connect to hundreds of diverse data sources and manage your data in a centralized data catalog. It enables you to visually create, run, and monitor extract, transform, and load (ETL) pipelines to load data into your data lakes.

In this blog post, we explore how to use the SFTP Connector for AWS Glue from the AWS Marketplace to efficiently process data from Secure File Transfer Protocol (SFTP) servers into Amazon Simple Storage Service (Amazon S3), further empowering your data analytics and insights.

Introducing the SFTP connector for AWS Glue

The SFTP connector for AWS Glue simplifies the process of connecting AWS Glue jobs to extract data from SFTP storage and to load data into SFTP storage. This connector provides comprehensive access to SFTP storage, facilitating cloud ETL processes for operational reporting, backup and disaster recovery, data governance, and more.

Solution overview

In this example, you use AWS Glue Studio to connect to an SFTP server, then enrich that data and upload it to Amazon S3. The SFTP connector is used to manage the connection to the SFTP server. You will load the event data from the SFTP site, join it to the venue data stored on Amazon S3, apply transformations, and store the data in Amazon S3. The event and venue files are from the TICKIT dataset.

The TICKIT dataset tracks sales activity for the fictional TICKIT website, where users buy and sell tickets online for sporting events, shows, and concerts. In this dataset, analysts can identify ticket movement over time, success rates for sellers, and best-selling events, venues, and seasons.

For this example, you use AWS Glue Studio to develop a visual ETL pipeline. This pipeline will read data from an SFTP server, perform transformations, and then load the transformed data into Amazon S3. The following diagram illustrates this architecture.

solution overview

By the end of this post, your visual ETL job will resemble the following screenshot.

final solution

Prerequisites

For this solution, you need the following:

  • Subscribe to the SFTP Connector for AWS Glue in the AWS Marketplace.
  • Access to an SFTP server with permissions to upload and download data.
    • If the SFTP server is hosted on Amazon Elastic Compute Cloud (Amazon EC2), we recommend that the network communication between the SFTP server and the AWS Glue job happens within the virtual private cloud (VPC) as pictured in the preceding architecture diagram. Running your Glue job within a VPC and security group will be discussed further in the steps to create the AWS Glue job.
    • If the SFTP server is hosted within your on-premises network, we recommend that the network communication between the SFTP server and the Glue job happens through VPN or AWS DirectConnect.
  • Access to an S3 bucket or the permissions to create an S3 bucket. We recommend that you connect to that bucket using a gateway endpoint. This will allow you to connect to your S3 bucket directly from your VPC. If you need to create an S3 bucket to store the results, complete the following steps:
    1. On the Amazon S3 console, choose Buckets in the navigation pane.
    2. Choose Create bucket.
    3. For Name, enter a globally unique name for your bucket; for example, tickit-use1-<accountnumber>.
    4. Choose Create bucket.
    5. For this demonstration, create a folder with the name tickit in your S3 bucket.
    6. Create the gateway endpoint.
  • Create an AWS Identity and Access Management (IAM) role for the AWS Glue ETL job. You must specify an IAM role for the job to use. The role must grant access to all resources used by the job, including Amazon S3 (for any sources, targets, scripts, and temporary directories) and AWS Secrets Manager. For instructions, see Configure an IAM role for your ETL job.

Load dataset to SFTP site

Load the allevents_pipe.txt file and venue_pipe.txt file from the TICKIT dataset to your SFTP server.

Store SFTP server sign-in credentials

An AWS Glue connection is a Data Catalog object that stores connection information, such as URI strings and location to credentials that are stored in a Secrets Manager secret.

To store the SFTP server username and password in Secrets Manager, complete the following steps:

  1. On the Secrets Manager console, choose Secrets in the navigation pane.
  2. Choose Store a new secret.
  3. Select Other type of secret.
  4. Enter host as Secret key and your SFTP server’s IP address (for example, 153.47.122) as the Secret value, then choose Add row.
  5. Enter the username as Secret key and your SFTP username as Secret value, then choose Add row.
  6. Enter password as Secret key and your SFTP password as Secret value, then choose Add row.
  7. Enter keyS3Uri as Secret Key and the Amazon S3 location of your SFTP secret key file as Secret value

Note: Secret Value is the full S3 path where the SFTP server key file is stored. For example:s3://sftp-bucket-johndoe123/id_rsa.

  1. For Secret name, enter a descriptive name, then choose Next.
  2. Choose Next to move to the review step, then choose Store.

secret value

Create a connection to the SFTP server in AWS Glue

Complete the following steps to create your connection to the SFTP server.

  1. On the AWS Glue console, under Data Catalog in the navigation pane, choose Connections.

creating sftp connection from marketplace

  1. Select the SFTP connector for AWS Glue 4.0. Then choose Create connection.

using sftp connector

  1. Enter a name for the connection and then, under Connection access, choose the Secrets Manager secret you created for you SFTP server credentials.

finishing sftp connection

Create a connection to the VPC in AWS Glue

A data connection is used to establish network connectivity between the VPC and the AWS Glue job. To create the VPC connection, complete the following steps.

  1. On the AWS Glue console page, click on Data Connections location on the left side menu.
  2. Click the Create connection button in the Connections panel.

creating connection for VPC

  1. Select Network

choosing network option

  1. Select the VPC, Subnet, and Security Group that your SFTP server resides in. Click Next.

choosing vpc, subnet, sg for connection

  1. Name the connection SFTP VPC Connect and then click

Deploy the solution

Now that we completed the prerequisites, we are going to setup the AWS Glue Studio job for this solution. We will create a glue studio job, add events and venue data from the SFTP server, carry out data transformations and load transformed data to s3.

Create your AWS Glue Studio job:

  1. On the AWS Glue console, under ETL Jobs in the navigation pane, choose Visual ETL.
  2. Select Visual ETL in the central pane.
  3. Choose the pencil icon to enter a name for your job.
  4. Choose the Job details tab.

choosing job details

  1. Scroll down to and select Advanced properties and expand.
  2. Scroll to Connections and select SFTP VPC Connect.

choosing sftp vpc connection

  1. Choose Visual to go back to the workflow editor page.

Add the events data from the SFTP server as your first data set:

  1. Choose Add nodes and select SFTP Connector for AWS Glue 4.0 on the Sources
  2. Enter the following for Data source properties for:
    1. Connection: Select the connection to the SFTP server that you created in Create the connection to the SFTP server in AWS Glue.
    2. Enter the following key-value pairs:
Key Value
header false
path /files (this should be the path to the event file in your SFTP server)
fileFormat csv
delimiter |

glue studio job configuration

Rename the columns of the Event dataset:

  1. Choose Add nodes and choose Change Schema on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Rename Event data.
    2. For Node parents, select SFTP Connector for AWS Glue 4.0.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: eventid
      2. col1: e_venueid
      3. col2: catid
      4. col3: dateid
      5. col4: eventname
      6. col5: starttime

transforming event data

Add the venue_pipe.txt file from the SFTP site:

  1. Choose Add nodes and choose SFTP Connector for AWS Glue 4.0 on the Sources
  2. Enter the following for Data source properties for:
    1. Connection: Select the connection to the SFTP server that you created in Create the connection to the SFTP server in AWS Glue.
    2. Enter the following key-value pairs:
Key Value
header false
path /files (this should be the path to the venue file in your SFTP site)
fileFormat csv
delimiter |

Rename the columns of the venue dataset:

  1. Choose Add nodes and choose Change Schema on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Rename Venue data.
    2. For Node parents, select Venue.
    3. In the Change Schema section, map the source keys to the target keys:
      1. col0: venueid
      2. col1: venuename
      3. col2: venuecity
      4. col3: venuestate
      5. col4: venueseats

transforming venue data

Join the venue and event datasets.

  1. Choose Add nodes and choose Join on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Join.
    2. For Node parents, select Rename Venue data and Rename Event data.
    3. For Join type¸ select Inner join.
    4. For Join conditions, select venueid for Rename Venue data and e_venueid for Rename Event data.

transform join venue and event

Drop the duplicate field:

  1. Choose Add nodes and choose Drop Fields on the Transforms
  2. Enter the following transform properties:
    1. For Name, enter Drop Fields.
    2. For Node parents, select Join.
    3. In the DropFields section, select e_venueid.

drop field transform

Load the data into your S3 bucket:

  1. Choose Add nodes and choose Amazon S3 from the Sources
  2. Enter the following transform properties:
    1. For Node parents, select Drop Fields.
    2. For Format, select CSV.
    3. For Compression Type, select None.
    4. For S3 Target Location, choose your S3 bucket and enter your desired file name followed by a slash (/).

loading data to s3 target

You can now save and run your AWS Glue visual ETL Job. Run the job and then go to the Runs tab to monitor its progress. After the job has completed, the Run status will change to Succeeded. The data will be in the target S3 bucket.

completed job

Clean up

To avoid incurring additional charges caused by resources created as part of this post, make sure you delete the items created in the AWS Account for this post:

  • Delete the Secrets Manager key created for the SFTP connector . credentials.
  • Delete the SFTP connector.
  • Unsubscribe from the SFTP Connector in AWS Marketplace.
  • Delete the data loaded to the Amazon S3 bucket and the bucket.
  • Delete the AWS Glue visual ETL job.

Conclusion

In this blog post, we demonstrated how to use the SFTP connector for AWS Glue to streamline the processing of data from SFTP servers into Amazon S3. This integration plays a pivotal role in enhancing your data analytics capabilities by offering an efficient and straightforward method to bring together disparate data sources. Whether your goal is to analyze SFTP server data for actionable insights, bolster your reporting mechanisms, or enrich your business intelligence tools, this connector ensures a more streamlined and cost-effective approach to achieving your data objectives.

For further details on the SFTP connector, see the SFTP Connector for Glue documentation.


About the Authors

Sean Bjurstrom is a Technical Account Manager in ISV accounts at Amazon Web Services, where he specializes in Analytics technologies and draws on his background in consulting to support customers on their analytics and cloud journeys. Sean is passionate about helping businesses harness the power of data to drive innovation and growth. Outside of work, he enjoys running and has participated in several marathons.

Seun Akinyosoye is a Sr. Technical Account Manager supporting public sector customer at Amazon Web Services. Seun has a background in analytics, data engineering which he uses to help customers achieve their outcomes and goals. Outside of work Seun enjoys spending time with his family, reading, traveling and supporting his favorite sports teams.

Vinod Jayendra is a Enterprise Support Lead in ISV accounts at Amazon Web Services, where he helps customers in solving their architectural, operational, and cost optimization challenges. With a particular focus on Serverless technologies, he draws from his extensive background in application development to deliver top-tier solutions. Beyond work, he finds joy in quality family time, embarking on biking adventures, and coaching youth sports team.

Kamen Sharlandjiev is a Sr. Big Data and ETL Solutions Architect, MWAA and AWS Glue ETL expert. He’s on a mission to make life easier for customers who are facing complex data integration and orchestration challenges. His secret weapon? Fully managed AWS services that can get the job done with minimal effort. Follow Kamen on LinkedIn to keep up to date with the latest MWAA and AWS Glue features and news!

Chris Scull is a Solutions Architect dealing in orchestration tools and modern cloud technologies. With two years of experience at AWS, Chris has developed an interest in Amazon Managed Workflows for Apache Airflow, which allows for efficient data processing and workflow management. Additionally, he is passionate about exploring the capabilities of GenAI with Bedrock, a platform for building generative AI applications on AWS.

Shengjie Luo is a Big data architect of Amazon Cloud Technology professional service team. Responsible for solutions consulting, architecture and delivery of AWS based data warehouse and data lake, and good at server-less computing, data migration, cloud data integration, data warehouse planning, data service architecture design and implementation.

Qiushuang Feng is a Solutions Architect at AWS, responsible for Enterprise customers’ technical architecture design, consulting, and design optimization on AWS Cloud services. Before joining AWS, Qiushuang worked in IT companies such as IBM and Oracle, and accumulated rich practical experience in development and analytics.

The ultimate guide to developer happiness

Post Syndicated from Jeimy Ruiz original https://github.blog/engineering/engineering-principles/the-ultimate-guide-to-developer-happiness/


In today’s rapidly evolving landscape, where AI is reshaping industries and transforming workflows, the role of developers has never been more critical. As business leaders, fostering an environment where developers feel valued, motivated, and empowered is essential to harnessing their full potential and keeping your business profitable and innovative.

In this blog post, we’ll explore actionable tips and strategies to supercharge developer happiness, ensuring your team remains productive, engaged, and ahead of the AI curve. We’ll walk you through ways to secure your code with AI, how to increase productivity with a strong developer experience, and, of course, invite you to join us at GitHub Universe 2024 to see the very best of the latest AI tooling in action.

Boost productivity with a great developer experience

Developer experience is more than just a buzzword—it’s a critical factor in driving productivity and collaboration within software development teams. A seamless developer experience allows developers to get into the flow state more easily, where their productivity and creativity can peak. This flow state—characterized by uninterrupted concentration and a deep sense of involvement in the task—is crucial for tackling complex coding challenges.

This work environment needs to be built intentionally, and the research backs it up. Developers who carve out time for deep work enjoy 50% more productivity, while those that get work they find engaging are 30% more productive.

How does this impact businesses? Well, because a developer that can significantly reduce their context-switching and mental load can also produce code faster and at a higher quality.

When developers understand their code, they’re 42% more productive. When developers are able to get faster turnaround times, they are 20% more innovative. These are tangible, individual benefits that in turn directly impact the output of developer teams.

Now is the time for leaders to invest in creating a great developer experience. By prioritizing the developer experience, you’re setting your team up to harness the full potential of the latest AI and platform engineering advances, ensuring your business stays ahead of the curve. Curious to learn more? Then dive into how a great developer experience fuels productivity with our latest research.

Use AI to secure your code

Historically, developers and security teams have found themselves at odds due to competing business goals. Shifting security left incorporates security earlier in the software development lifecycle, but in practice it has primarily shifted responsibility to developers without necessarily giving them the required expertise.

This, combined with the context switching inherent in development work, makes addressing security concerns particularly challenging. With AI, developers now have powerful tools at their disposal to enhance code security. AI can:

  • Improve detection rates
  • Provide near-instant fixes with context
  • Enable application security (AppSec) at scale

These three improvements make it easier for developers to integrate robust security measures without sacrificing productivity, and transform the relationship between developers and security teams into a collaborative partnership.

Introducing a new security tool doesn’t have to be a daunting task either. By following a few simple steps, organizations can ensure a smooth transition and broad adoption.

  • Document the tool’s features and usage to set the foundation and set realistic expectations to help align goals across teams.
  • Recognize and celebrate successes to showcase the value of the new tool.
  • Adopt a go-with-the-flow approach and organize hackathons to further drive engagement and interest.
  • Listen to developer feedback continuously improve and refine security practices.

AI-powered security tools not only enhance the efficiency and effectiveness of AppSec, but also empower developers to take a proactive role in securing their code. This shift not only improves overall security posture, but also fosters a culture of shared responsibility and continuous learning, ultimately leading to more secure and resilient applications.

See exactly why security should be built into the developer workflow. 👇

Customize your LLMs

Organizations that take AI a step further and customize their AI tools are poised to lead the pack.

Large language models (LLMs) are trained on vast amounts of text data and can perform a variety of natural language processing tasks like translation, summarization, question-answering, and text generation. Customizing a pre-trained LLM goes beyond mere training—it involves adapting the model to perform specific tasks relevant to the organization’s needs. This level of customization helps developers maintain their flow state and significantly boost productivity and efficiency.

Customization techniques like retrieval-augmented generation (RAG), in-context learning, and fine-tuning enable LLMs to deliver more accurate and contextually appropriate responses:

  • RAG combines retrieval-based and generation-based approaches in natural language processing. It enhances LLMs by integrating information retrieval techniques, where relevant documents or snippets are retrieved from a vector database to assist in generating more accurate and contextually appropriate responses. This approach allows the model to access and utilize external knowledge, making the generated output more informed and relevant to the user’s query.
  • In-context learning refers to a model’s ability to adapt and respond to new tasks or inputs based on the context provided in the input prompt without requiring additional training. The model leverages its pre-trained knowledge and the context given in the input to perform tasks effectively.
  • Fine-tuning, on the other hand, is a process in which an LLM is further trained on a specific dataset to adapt it to a particular task or domain. During fine-tuning, the model’s parameters are adjusted based on the new dataset, which typically involves supervised learning with labeled data. This process allows the model to specialize and improve its performance on specific tasks, (such as text classification, question answering, or machine translation), by leveraging the general knowledge acquired during its initial pre-training phase.

By implementing these customization strategies, businesses can unlock the full potential of their AI tools. Customized LLMs not only improve developer productivity—they also enhance the quality and relevance of AI-generated content.

Learn how to customize GitHub Copilot in this guide.

Prepare your repository for teamwork

Fostering collaboration doesn’t just make software development faster, it also helps teams build better products and boost job satisfaction. By making your repository as collaborative as possible, you’ll optimize success. This includes focusing on:

  • Repository settings: properly configuring repository settings to control visibility, access, and contribution workflows lays the foundation for collaboration.
  • Repository contents: including essential files like README.md, LICENSE.md, CONTRIBUTING.md, CODEOWNERS, and CODE_OF_CONDUCT.md helps collaborators understand the project, its purpose, and how to contribute.
  • Automation and checks: implementing automation tools such as linters, continuous integration (CI), and continuous deployment (CD) pipelines streamlines the development process, ensures code quality, and enables immediate feedback.
  • Security practices: enforcing role-based access control, managing secrets securely, and scanning code for vulnerabilities can foster trust and protect the project from vulnerabilities.
  • Issue templates: providing structured issue templates guides contributors in providing necessary information and context when reporting bugs.
  • Community engagement: engaging with the project’s community through meetups, project blogs, discussions, and other channels fosters belonging and builds relationships.

Invest in your team’s learning opportunities

When you signal to your team that you value their career growth and exposure to learning opportunities, it can boost happiness and job satisfaction, leading to increased productivity, collaboration, and better problem solving.

Encouraging your developer teams to attend conferences like GitHub Universe 2024 is a strategic investment in their professional growth and your business’ success. Our global developer event provides an unparalleled platform for the best in software development to gather and expand their knowledge, stay updated on the latest AI-powered tools, and bring fresh ideas back to their teams.

Here are a few highlights of what you and your team can expect:

  • Help your developers get in the flow and stay there with sessions, demos, panels, and more on the powerful tools and techniques that enhance productivity and satisfaction.
  • Connect with other technical leaders to share experiences, challenges, and best practices. Expand your network with valuable industry contacts.
  • Get a first look at GitHub’s product roadmap and see how upcoming features and enhancements can help you stay ahead in a competitive landscape.
  • Gain technical skills with GitHub certifications and workshops designed to enhance your expertise in a rapidly evolving industry.
  • Learn the latest on GitHub Copilot and stay ahead with the latest coding practices and techniques.

Get your tickets today. You can take advantage of our group discount and get four tickets for the price of three. (That’s a 25% savings!)

If you’re flying solo, you can also use our Early Bird discount and save 20% off one in-person ticket, only until September 3.

Reach new levels of creativity and efficiency

Incorporating these five business strategies can transform your development process and increase developer happiness. By investing in these areas, you empower your team, foster a culture of continuous learning, and position your organization for success in the rapidly evolving tech landscape.

The post The ultimate guide to developer happiness appeared first on The GitHub Blog.

[$] Changes coming in PostgreSQL 17

Post Syndicated from daroc original https://lwn.net/Articles/984599/

The

PostgreSQL
project has

released
beta
versions of PostgreSQL 17 containing several interesting security and usability
improvements, alongside the usual performance improvements and bug fixes. If the
release proceeds according to the usual timeline, the full release of version 17
is expected in September or October.
The most important changes are in what PostgreSQL does when a database
supervisor has their credentials revoked, and added
support for incremental database backups.

Lix makes its second release

Post Syndicated from daroc original https://lwn.net/Articles/985484/

Lix, the fork of Nix that LWN covered in July, has made its second release since forking. This one includes substantial changes to the backend code, including removing a dependency on Bison, and getting a change to the Nix language back upstream.

The general theme of Lix 2.91 is to perform another wave of
refactorings and design improvements in preparation for our evolution
plans.

Nevertheless, there are a few exciting user facing changes[.]

Introducing HTTP request traffic insights on Cloudflare Radar

Post Syndicated from David Belson original https://blog.cloudflare.com/http-requests-on-cloudflare-radar


Historically, traffic graphs on Cloudflare Radar have displayed two metrics: total traffic and HTTP traffic. These graphs show normalized traffic volumes measured in bytes, derived from aggregated NetFlow data. (NetFlow is a protocol used to collect metadata about IP traffic flows traversing network devices.) Today, we’re adding an additional metric that reflects the number of HTTP requests, normalized over the same time period. By comparing bytes with requests, readers can gain additional insights into traffic patterns and user behavior. Below, we review how this new data has been incorporated into Radar, and explore HTTP request traffic in more detail.

Note that while we refer to “HTTP request traffic” in this post and on Radar, the term encompasses requests made in the clear over HTTP and over encrypted connections using HTTPS – the latter accounts for ~95% of all requests to Cloudflare during July 2024.

New and updated graphs

Graphs including HTTP request-based traffic data have been added to the Overview and Traffic sections on Cloudflare Radar. On the Overview page, the “Traffic trends” graph now includes a drop-down selector at the upper right, where you can choose between “Total & HTTP bytes” and “HTTP requests & bytes”. We explore the distinction between these further in the following sections.

The default “Total & HTTP bytes” selection displays a time series graph, showing total bytes and HTTP bytes traffic over time, as Radar has done for several years now.

Selecting “HTTP requests & bytes” from the dropdown switches the view to a time series graph that HTTP requests traffic and HTTP bytes traffic over time. In both graphs, users can click on a metric in the legend to deselect it and remove it from the graph. These (de)selections are maintained when a user chooses to download or save a graph.

In addition, we’ve added a “Protocols” summary next to the graph that shows the share of bytes over the selected time period that HTTP accounts for, and the remaining aggregate share associated with the protocols used by other non-HTTP Cloudflare services (such as DNS, WARP, etc.). For most locations or ASNs, HTTP traffic will comprise the majority share of bytes-based traffic.

On Radar’s Traffic page, we have added the HTTP requests metric to the “Traffic volume” graph at the top of the page, allowing you to see how request volume has changed during the selected time period as compared to the previous period, in addition to the changes in the bytes-based metrics.

A new standalone request-based “HTTP traffic” graph was also added to the Traffic page, just below the bytes-based “Traffic trends” graph. This new graph shows normalized HTTP request traffic volume across the selected time period, and by default, also compares it with the previous time period.

Similar to other Radar graphs, these new HTTP request-based graphs can also be downloaded, copied to the clipboard, or embedded in other websites – just click on the share icon.

As always, the underlying data is also available through the Radar API. The “HTTP requests Time Series” API endpoint returns normalized HTTP request time series data across the specified time period for the requested location or autonomous system (ASN).

What is HTTP request traffic?

An HTTP GET request is a message sent from a client (such as your web browser) to a web server (such as one operated by Cloudflare), asking for a particular resource (file). In addition to returning the requested resource, which could range from a single-pixel GIF accounting for just a few bytes, to an API call that returns a few kilobytes of data, to a multi-gigabyte software package, the Web server also returns a set of headers, which can include information about the content type, the last time the resource was modified, cookie information, cacheability, and more. While GET requests account for the overwhelming majority of HTTP request traffic, such traffic also includes other HTTP request methods including HEAD, POST, PUT, and more.

Cloudflare temporarily logs HTTP requests received by our network, including associated header information and “metadata” about the request, such as the bot score computed for the request and the associated cache status. Request logs for a customer’s web properties are available for them to download, and after processing and analysis, this data is also presented in the Analytics section of the Cloudflare dashboard. The HTTP request data now available on Radar is based on a sample of this log data, aggregated across Cloudflare’s global customer base.

The value of request-based traffic insights

Cloudflare Radar already has HTTP data, so why add more? One key reason for analyzing and including HTTP request traffic is resilience. Having multiple sources of truth with respect to HTTP traffic allows us to ​​better and more quickly distinguish between real events (such as an Internet disruption in a given country or network) and data pipeline issues.

While bytes-based metrics provide a reasonable proxy into human (user) behavior, especially with respect to activity surrounding Internet disruptions, request-based metrics provide an even better perspective. A lot of HTTP traffic involves relatively small responses – especially API traffic, which now accounts for 60% of all traffic. Furthermore, response sizes can vary widely, ranging from a single-pixel GIF accounting for just a few bytes, to an API call that returns a few kilobytes of data, to a multi-gigabyte software package

To that end, the scope of user activity may be insufficiently reflected by a bytes-based metric, or buried in the noise, whereas request activity provides a cleaner signal and a more direct proxy for user activity. This is especially important as we examine the restoration of connectivity after an Internet disruption, attempting to ascertain when activity has returned to “expected” pre-disruption levels.

Finally, incorporating request-based traffic insights into Radar is simply extending the way that the data is already being used on the site. All of the graphs, maps, and tables presented on Radar’s Adoption & Usage page, are based on analysis of HTTP request traffic, making use of information contained within request headers (such as HTTP version or user agent) or characteristics of the underlying connection (such as IP version).

Bytes vs requests – what’s the difference?

The current “HTTP traffic” view aggregates the bytes associated with HTTP requests to Cloudflare’s content delivery (CDN) services from the selected location or autonomous system (ASN). “Total traffic” aggregates this HTTP traffic along with the traffic associated with other Cloudflare services, including our 1.1.1.1 DNS resolver, authoritative DNS, WARP, and Spectrum, among others. (While Spectrum, WARP, and 1.1.1.1 also carry HTTP traffic, the share of HTTP traffic carried by these services is opaque to Radar, and isn’t accounted for as part of the HTTP traffic calculations.)

The bytes associated with a given request include the size of the request, the size of the headers associated with the response, and the size of the response itself. As noted above, the size of a file returned in response to a request can vary widely, depending on what was requested. The shape of the HTTP requests and HTTP bytes lines may be quite similar, but the potential variability in response sizes (in aggregate) can cause the lines to diverge, sometimes significantly so. For example, if an application regularly makes background requests to check for updates, the availability and subsequent download of a large file containing a software update would cause a spike in the HTTP bytes line, while the HTTP requests pattern remained consistent. 

As another example, consider the graph below, capturing HTTP requests and bytes traffic trends for Portugal during the first week of August. HTTP bytes traffic initially grows each day between 06:00 and 09:00 UTC (07:00 – 10:00 local summer time), increases much more slowly until around 19:00 UTC (20:00 local summer time), and then increases rapidly before peaking around 21:00 UTC (22:00 local time). This suggests that content consumed during the workday is lighter in terms of bytes (such as API traffic, as discussed above), while evening traffic is more byte-heavy (possibly due to increased consumption of media content). In contrast, after starting to increase around 06:00 UTC (07:00 local summer time), request traffic generally sees three successively higher peaks each day – occurring around 10:00, 14:00, and 21:00 UTC respectively (11:00, 15:00, and 22:00 local summer time). These peaks are most pronounced on weekdays, but are still apparent on weekend days as well, suggesting regular patterns of user activity at those times.

It is important to remember that in looking at the “HTTP requests & bytes” graphs on Radar that they are showing two different metrics, and as such, only their shape over time is comparable, not their relative sizes. (As both metrics are normalized on a 0 to 1 (Max) scale, the lines on the graph are scaled relative to the maximum normalized value of each metric, including the previous period.)

Conclusion

The addition of HTTP request metrics to Cloudflare Radar brings additional visibility to traffic trends at a global, location, and network level, complementing the existing bytes-based HTTP traffic metrics. Derived from traffic to customer web properties, these new metrics can be found on Radar’s Overview and Traffic pages.

In addition to HTTP traffic trends, visit Cloudflare Radar for additional insights around Internet disruptions, routing issues, attacks, domain popularity, and Internet quality. Follow us on social media at @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via email.

How to Build Your Own LLM with Backblaze B2 + Jupyter Notebook

Post Syndicated from Pat Patterson original https://www.backblaze.com/blog/how-to-build-your-own-llm-with-backblaze-b2-jupyter-notebook/

A decorative image showing a computer, a cloud, and a building.

Last month, Backblaze Principal Storage Cloud Storyteller, Andy Klein, and I presented a webinar, Leveraging your Cloud Storage Data in AI/ML Apps and Services, in which we explored the various ways AI/ML applications use and store data. In addition to sharing insights from our customers who leverage Backblaze B2 Cloud Object Storage for their AI/ML needs, we also demonstrated a simple AI use case: a retrieval-augmented generation (RAG) chatbot answering questions based on custom context downloaded from a private Backblaze B2 Bucket. After the webinar, I posted the demo source code to a GitHub repository: https://github.com/backblaze-b2-samples/ai-rag-examples

In this blog post, I’ll recap a little of the webinar, and explain how you can use the demo source code as a basis for your own experiments with RAG and large language models (LLMs).

But why, Pat?

You might be asking yourself, why build a chatbot when multiple online services, such as ChatGPT, are just a click away? It’s a good question. The main motivation is privacy. Suppose you want your chatbot to use confidential data in generating its responses. By running an LLM on your own machine, whether on-premises or on a virtual machine (VM) under your control in a public cloud provider’s infrastructure, you eliminate the risks of that data surfacing in response to a question from a user outside your organization.

In the webinar, I showed two Python apps running on a GPU-equipped VM at Vultr, one of Backblaze’s compute partners. The first app used the GPT4All Python SDK to create a very simple conversational chatbot running a local instance of a large language model (LLM), which it used in answering general questions. Here’s an example from the webinar:

Ask me a question: What were the causes of the First World War?

The causes of the First World War are complex and varied, but some key factors include nationalism, imperialism, militarism, alliances between nations, and the assassination of Archduke Franz Ferdinand. Nationalism led to tensions between ethnic groups and their respective governments, while imperialism saw countries competing for territories and resources around the world. Militarism also played a role as many European powers built up large armies and navies in preparation for potential conflicts. The complex web of alliances between nations meant that any conflict could quickly escalate into a latIer war, and the assassination of Archduke Franz Ferdinand by a Serbian nationalist was the spark that ignited the powder keg of tensions already present in Europe at the time.

Now, I’m not a history scholar, but that looks like a pretty good answer to me! (History scholars, you are welcome to correct me.)

The second app used the Langchain framework to implement a more elaborate chatbot, again running on my own machine at Vultr, that used PDF data downloaded from a private bucket in Backblaze B2 as context for answering questions. As much as I love our webinar attendees, I didn’t want to share genuinely confidential data with them, so I used our Backblaze B2 Cloud Storage documentation as context. The chatbot was configured to use that context, and only that context, in answering questions. From the webinar:

Ask me a question about Backblaze 82: What's the difference between the master application key and a standard application key?

The master application key provides complete access to your account with all capabilities, access to all buckets, and has no file prefix restrictions or expiration. On the other hand, a standard application key is limited to the level of access that a user needs and can be specific to a bucket.

Ask me a question about Backblaze B2: What were the causes of the First World War?

The exact cause of the First World War is not mentioned in these documents.

The chatbot provides a comprehensive, accurate answer to the question on Backblaze application keys, but doesn’t answer the question on the causes of the First World War, since it was configured to use only the supplied context in generating its response.

During the webinar’s question-and-answer session, an attendee posed an excellent question: “Can you ask [the chatbot] follow-up questions where it can use previous discussions to build a proper answer based on content?” I responded, “Yes, absolutely; I’ll extend the demo to do exactly that before I post it to GitHub.” What follows are instructions for building a simple RAG chatbot, and then extending it to include message history.

Building a simple RAG chatbot

After the webinar, I rewrote both demo apps as Jupyter notebooks, which allowed me to add commentary to the code. I’ll provide you with edited highlights here, but you can find all of the details in the RAG demo notebook.

The first section of the notebook focuses on downloading PDF data from the private Backblaze B2 Bucket into a vector database, a storage mechanism particularly well suited for use with RAG. This process involves retrieving each PDF, splitting it into uniformly sized segments, and loading the segments into the database. The database stores each segment as a vector with many dimensions—we’re talking hundreds, or even thousands. The vector database can then vectorize a new piece of text—say a question from a user—and very quickly retrieve a list of matching segments.

Since this process can take significant time—about four minutes on my MacBook Pro M1 for the 225 PDF files I used, totaling 58MB of data—the notebook also shows you how to archive the resulting vector data to Backblaze B2 for safekeeping and retrieve it when running the chatbot later.

The vector database provides a “retriever” interface that takes a string as input, performs a similarity search on the vectors in the database, and outputs a list of matching documents. Given the vector database, it’s easy to obtain its retriever:

retriever = vectorstore.as_retriever()

The prompt template I used in the webinar provides the basic instructions for the LLM: use this context to answer the user’s question, and don’t go making things up!

prompt_template = """Use the following pieces of context to answer the question at the end. 
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
    {context}
    
    Question: {question}
    Helpful Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

The RAG demo app creates a local instance of an LLM, using GPT4All with Nous Hermes 2 Mistral DPO, a fast chat-based model. Here’s an abbreviated version of the code:

model = GPT4All(
    model='Nous-Hermes-2-Mistral-7B-DPO.Q4_0.gguf',
    max_tokens=4096,
    device='gpu'
)

LangChain, as its name suggests, allows you to combine these components into a chain that can accept the user’s question and generate a response.

chain = (
        {"context": retriever, "question": RunnablePassthrough()}
        | prompt
        | model
        | StrOutputParser()
)

As mentioned above, the retriever takes the user’s question as input and returns a list of matching documents. The user’s question is also passed through the first step, and, in the second step, the prompt template combines the context with the user’s question to form the input to the LLM. If we were to peek inside the chain as it was processing the question about application keys, the prompt’s output would look something like this:

Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

<Text of first matching document>

<Text of second matching document>

Question: What's the difference between the master application key and a standard application key?

Helpful Answer:

This is the basis of RAG: building an LLM prompt that contains the information required to generate an answer, then using the LLM to distill that prompt into an answer. The final step of the chain transforms the data structure emitted by the LLM into a simple string for display.

Now that we have a chain, we can ask it a question. Again, abbreviated from the sample code:

question = 'What is the difference between the master application key and a standard application key?'
answer = chain.invoke(question)

Adding message history to the simple RAG chatbot

The first step of extending the chatbot is to give the LLM new instructions, similar to its previous prompt template, but including the message history:

prompt_template = """Use the following pieces of context and the message history to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
    
Context: {context}
    
History: {history}
    
Question: {question}

Helpful Answer:"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question", "history"]
)

The chain must be modified slightly to accommodate the message history:

chain = (
    {
        "context": (
                itemgetter("question")
                | retriever
        ),
        "question": itemgetter("question"),
        "history": itemgetter("history")
    }
    | prompt
    | model
    | StrOutputParser()
)

Now, we define a very simple in-memory message store that uses a session_id parameter to manage multiple simultaneous conversations:

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

LangChain provides a wrapper, RunnableWithMessageHistory, that combines the message store with the above chain to create a new chain with message history capability:

with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)

Now we can feed a series of related questions into the new chain:

questions = [
    'What is the difference between the master application key and a standard application key?',
    'Which one would I use to work with a single bucket?',
    'Can you tell me anything more about this topic?'
]

for question in questions:
    print(f'\n{question}\n')
    answer = with_message_history.invoke(
        {"question": question},
        config={"configurable": {"session_id": "abc123"}},
    )
    print(f'{answer}\n')

I have to admit, I was pleasantly surprised by the results:

What is the difference between the master application key and a standard application key?

A master application key grants broad access privileges, while a standard application key is limited to the level of access that a user needs.

Which one would I use to work with a single bucket?

You would use a standard application key to work with a single bucket as it has limited access and only grants permissions needed for specific tasks, unlike the master application key which provides broad access privileges.

Can you tell me anything more about this topic?

Sure! The master application key is typically used by developers during development or testing phases to grant full access to all resources in a Backblaze B2 account, while the standard application key provides limited permissions and should be used for production environments where security is paramount.

Processing this series of questions on my MacBook Pro M1 with no GPU-acceleration took three minutes and 25 seconds, and just 52 seconds with its 16-core GPU. For comparison, I spun up a VM at Ori, another Backblaze partner offering GPU VM instances, with an Nvidia L4 Tensor Core GPU and 24GB of VRAM. The only code change required was to set the LLM device to ‘cuda’ to select the Nvidia GPU. The Ori VM answered those same questions in just 18 seconds.

An image of an Nvidia L4 Tensor Core GPU
The Nvidia L4 Tensor Core GPU: not much to look at, but crazy-fast AI inference!

Go forth and experiment

One of the reasons I refactored the demo apps was that notebooks allow an interactive, experimental approach. You can run the code in a cell, make a change, then re-run it to see the outcome. The RAG demo repository includes instructions for running the notebooks, and both the GPT4All and LangChain SDKs can run LLMs on machines with or without a GPU. Use the code as a starting point for your own exploration of AI, and let us know how you get on in the comments!

The post How to Build Your Own LLM with Backblaze B2 + Jupyter Notebook appeared first on Backblaze Blog | Cloud Storage & Cloud Backup

Incus 6.4 released

Post Syndicated from corbet original https://lwn.net/Articles/985482/

Version 6.4 of the Incus container manager is out.

This release builds upon the recently added OCI support from Incus
6.3, making it even easier to run application containers. It also
adds a number of useful new features for clustered and larger
environments with more control on the virtual CPU used when live
migrating VMs and finer grained resource constraints within
projects.

See this
announcement
for details.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/985481/

Security updates have been issued by Debian (kernel and roundcube), Fedora (microcode_ctl, pypy, python2.7, and python3.6), Oracle (389-ds-base, httpd, kernel, kernel-container, and linux-firmware), Red Hat (kernel-rt), SUSE (firefox, kubernetes1.23, libqt5-qtbase, openssl-1_1, python-gunicorn, python-Twisted, python-urllib3, and qt6-base), and Ubuntu (linux-aws-5.15, linux-gkeop-5.15, linux-ibm, linux-ibm-5.15, linux-raspi, linux-azure, linux-azure-5.15, linux-azure-fde, linux-azure-fde-5.15, linux-oem-6.8, linux-oracle-5.15, and qemu).

Introducing HTTP request traffic insights on Cloudflare Radar

Post Syndicated from David Belson original https://blog.cloudflare.com/http-requests-on-cloudflare-radar


Historically, traffic graphs on Cloudflare Radar have displayed two metrics: total traffic and HTTP traffic. These graphs show normalized traffic volumes measured in bytes, derived from aggregated NetFlow data. (NetFlow is a protocol used to collect metadata about IP traffic flows traversing network devices.) Today, we’re adding another metric that reflects the number of HTTP requests, normalized over the same time period. By comparing bytes with requests, readers can gain additional insights into traffic patterns and user behavior. Below, we review how this new data has been incorporated into Radar, and explore HTTP request traffic in more detail.  

Note that while we refer to “HTTP request traffic” in this post and on Radar, the term encompasses requests made in the clear over HTTP and over encrypted connections using HTTPS – the latter accounts for ~95% of all requests to Cloudflare during July 2024.

New and updated graphs

Graphs including HTTP request-based traffic data have been added to the Overview and Traffic sections on Cloudflare Radar. On the Overview page, the “Traffic trends” graph now includes a drop-down selector at the upper right, where you can choose between “Total & HTTP bytes” and “HTTP requests & bytes”. We explore the distinction between these further in the following sections.

The default “Total & HTTP bytes” selection displays a time series graph, showing total bytes and HTTP bytes traffic over time, as Radar has done for several years now.

Selecting “HTTP requests & bytes” from the dropdown switches the view to a time series graph that HTTP requests traffic and HTTP bytes traffic over time. In both graphs, users can click on a metric in the legend to deselect it and remove it from the graph. These (de)selections are maintained when a user chooses to download or save a graph.

In addition, we’ve added a “Protocols” summary next to the graph that shows the share of bytes over the selected time period that HTTP accounts for, and the remaining aggregate share associated with the protocols used by other non-HTTP Cloudflare services (such as DNS, WARP, etc.). For most locations or ASNs, HTTP traffic will comprise the majority share of bytes-based traffic.

On Radar’s Traffic page, we have added the HTTP requests metric to the “Traffic volume” graph at the top of the page, allowing you to see how request volume has changed during the selected time period as compared to the previous period, in addition to the changes in the bytes-based metrics.

A new standalone request-based “HTTP traffic” graph was also added to the Traffic page, just below the bytes-based “Traffic trends” graph. This new graph shows normalized HTTP request traffic volume across the selected time period, and by default, also compares it with the previous time period.

Similar to other Radar graphs, these new HTTP request-based graphs can also be downloaded, copied to the clipboard, or embedded in other websites – just click on the share icon.

As always, the underlying data is also available through the Radar API. The “HTTP requests Time Series” API endpoint returns normalized HTTP request time series data across the specified time period for the requested location or autonomous system (ASN).

What is HTTP request traffic?

An HTTP GET request is a message sent from a client (such as your web browser) to a web server (such as one operated by Cloudflare), asking for a particular resource (file). In addition to returning the requested resource, which could range from a single-pixel GIF accounting for just a few bytes, to an API call that returns a few kilobytes of data, to a multi-gigabyte software package, the Web server also returns a set of headers, which can include information about the content type, the last time the resource was modified, cookie information, cacheability, and more. While GET requests account for the overwhelming majority of HTTP request traffic, such traffic also includes other HTTP request methods including HEAD, POST, PUT, and more.

Cloudflare temporarily logs HTTP requests received by our network, including associated header information and “metadata” about the request, such as the bot score computed for the request and the associated cache status. Request logs for a customer’s web properties are available for them to download, and after processing and analysis, this data is also presented in the Analytics section of the Cloudflare dashboard. The HTTP request data now available on Radar is based on a sample of this log data, aggregated across Cloudflare’s global customer base.

The value of request-based traffic insights

Cloudflare Radar already has HTTP data, so why add more? One key reason for analyzing and including HTTP request traffic is resilience. Having multiple sources of truth with respect to HTTP traffic allows us to ​​better and more quickly distinguish between real events (such as an Internet disruption in a given country or network) and data pipeline issues.

While bytes-based metrics provide a reasonable proxy into human (user) behavior, especially with respect to activity surrounding Internet disruptions, request-based metrics provide an even better perspective. A lot of HTTP traffic involves relatively small responses – especially API traffic, which now accounts for 60% of all traffic. Furthermore, response sizes can vary widely, ranging from a single-pixel GIF accounting for just a few bytes, to an API call that returns a few kilobytes of data, to a multi-gigabyte software package

To that end, the scope of user activity may be insufficiently reflected by a bytes-based metric, or buried in the noise, whereas request activity provides a cleaner signal and a more direct proxy for user activity. This is especially important as we examine the restoration of connectivity after an Internet disruption, attempting to ascertain when activity has returned to “expected” pre-disruption levels.

Finally, incorporating request-based traffic insights into Radar is simply extending the way that the data is already being used on the site. All the graphs, maps, and tables presented on Radar’s Adoption & Usage page, are based on analysis of HTTP request traffic, making use of information contained within request headers (such as HTTP version or user agent) or characteristics of the underlying connection (such as IP version).

Bytes vs requests – what’s the difference?

The current “HTTP traffic” view aggregates the bytes associated with HTTP requests to Cloudflare’s content delivery (CDN) services from the selected location or autonomous system (ASN). “Total traffic” aggregates this HTTP traffic along with the traffic associated with other Cloudflare services, including our 1.1.1.1 DNS resolver, authoritative DNS, WARP, and Spectrum, among others. (While Spectrum, WARP, and 1.1.1.1 also carry HTTP traffic, the share of HTTP traffic carried by these services is opaque to Radar, and isn’t accounted for as part of the HTTP traffic calculations.)

The bytes associated with a given request include the size of the request, the size of the headers associated with the response, and the size of the response itself. As noted above, the size of a file returned in response to a request can vary widely, depending on what was requested. The shape of the HTTP requests and HTTP bytes lines may be quite similar, but the potential variability in response sizes (in aggregate) can cause the lines to diverge, sometimes significantly so. For example, if an application regularly makes background requests to check for updates, the availability and subsequent download of a large file containing a software update would cause a spike in the HTTP bytes line, while the HTTP requests pattern remained consistent.

As another example, consider the graph below, capturing HTTP requests and bytes traffic trends for Portugal during the first week of August. HTTP bytes traffic initially grows each day between 06:00 and 09:00 UTC (07:00 – 10:00 local summer time), increases much more slowly until around 19:00 UTC (20:00 local summer time), and then increases rapidly before peaking around 21:00 UTC (22:00 local time). This suggests that content consumed during the workday is lighter in terms of bytes (such as API traffic, as discussed above), while evening traffic is more byte-heavy (possibly due to increased consumption of media content). In contrast, after starting to increase around 06:00 UTC (07:00 local summer time), request traffic generally sees three successively higher peaks each day – occurring around 10:00, 14:00, and 21:00 UTC respectively (11:00, 15:00, and 22:00 local summer time). These peaks are most pronounced on weekdays, but are still apparent on weekend days as well, suggesting regular patterns of user activity at those times.

It is important to remember that in looking at the “HTTP requests & bytes” graphs on Radar that they are showing two different metrics, and as such, only their shape over time is comparable, not their relative sizes. (As both metrics are normalized on a 0 to 1 (Max) scale, the lines on the graph are scaled relative to the maximum normalized value of each metric, including the previous period.)

Conclusion

The addition of HTTP request metrics to Cloudflare Radar brings additional visibility to traffic trends at a global, location, and network level, complementing the existing bytes-based HTTP traffic metrics. Derived from traffic to customer web properties, these new metrics can be found on Radar’s Overview and Traffic pages.
In addition to HTTP traffic trends, visit Cloudflare Radar for additional insights around Internet disruptions, routing issues, attacks, domain popularity, and Internet quality. Follow us on social media at @CloudflareRadar (X), noc.social/@cloudflareradar (Mastodon), and radar.cloudflare.com (Bluesky), or contact us via email.

On the Voynich Manuscript

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/08/on-the-voynich-manuscript.html

Really interesting article on the ancient-manuscript scholars who are applying their techniques to the Voynich Manuscript.

No one has been able to understand the writing yet, but there are some new understandings:

Davis presented her findings at the medieval-studies conference and published them in 2020 in the journal Manuscript Studies. She had hardly solved the Voynich, but she’d opened it to new kinds of investigation. If five scribes had come together to write it, the manuscript was probably the work of a community, rather than of a single deranged mind or con artist. Why the community used its own language, or code, remains a mystery. Whether it was a cloister of alchemists, or mad monks, or a group like the medieval Béguines—a secluded order of Christian women—required more study. But the marks of frequent use signaled that the manuscript served some routine, perhaps daily function.

Davis’s work brought like-minded scholars out of hiding. In just the past few years, a Yale linguist named Claire Bowern had begun performing sophisticated analyses of the text, building on the efforts of earlier scholars and on methods Bowern had used with undocumented Indigenous languages in Australia. At the University of Malta, computer scientists were figuring out how to analyze the Voynich with tools for natural-language processing. Researchers found that the manuscript’s roughly 38,000 words—and 9,000-word vocabulary—had many of the statistical hallmarks of actual language. The Voynich’s most common word, whatever it meant, appeared roughly twice as often as the second-most-common word and three times as often as the third-commonest, and so on—a touchstone of natural language known as Zipf’s law. The mix of word lengths and the ratio of unique words to total words were similarly language-like. Certain words, moreover, seemed to follow one another in predictable order, a possible sign of grammar.

Finally, each of the text’s sections—as defined by the drawings of plants, stars, bathing women, and so on—had different sets of overrepresented words, just as one would expect in a real book whose chapters focused on different subjects.

Spelling was the chief aberration. The Voynich alphabet—if that’s what it was—appeared to have a conventional 20-odd letters. But compared with known languages, too many of those letters repeated in the same order, both within words and across neighboring words, like a children’s rhyme. In some places, the spellings of adjacent words so converged that a single word repeated two or three times in a row. A rough English equivalent might be something akin to “She sells sea shells by the sea shore.” Another possibility, Bowern told me, was something like pig Latin, or the Yiddishism—known as “shm-reduplication”—that begets phrases such as fancy shmancy and rules shmules.

Rising together: honoring Cloudflare’s outstanding partners

Post Syndicated from Tom Evans original https://blog.cloudflare.com/partner-award-winners-2023


We’re thrilled to announce Cloudflare’s worldwide 2023 Channel Partner Award winners! Partners are crucial to Cloudflare’s success, extending the solutions and support that customers need to control application complexity, reduce cyber risk, and cut costs, all with a high level of customer satisfaction.

PowerUP Partners First

This year, we again received CRN’s highest accolade of a 5-star ranking for our Partner Program. Through our expanded Cloudflare PowerUP Partner Program, we’re ensuring Cloudflare’s partnerships and alliances continue delivering strong results to joint customers across sectors worldwide. We’re focused on making it easier for our partners to work with us and grow their business with us. The Cloudflare team is all about helping partners:

  • Be innovative by transforming how customers connect, protect, and build with Cloudflare security, speed, programmability, and resilience.

  • Increase profitability by growing revenue and delivering more value at scale to rapidly grow business and expand reach.

  • Accelerate GTM by benefiting from sales and marketing support, streamlined processes, and transparent pricing to close deals quickly.

From comprehensive training through Cloudflare University to expert support across departments, partners are equipped to drive digital transformation and modernize IT infrastructures for their customers in a competitive market.

Leaders Who Understand the Power of Partnership

It’s been a thrilling start to my tenure as Cloudflare’s Chief Partner Officer to watch remarkable growth and partner success. Our team has amplified opportunities, especially in the rapidly expanding area of secure access service edge (SASE), and our channel strategy has already demonstrated impressive results.

The overwhelmingly positive feedback from our partners underscores the strength of Cloudflare’s technology and our dedication to serving our partners. With substantial investments in the partner community, streamlined processes, and a focus on AI integration, together we are poised to drive significant growth and innovation.

Congratulations to Our Partners’ Outstanding Contributions and Achievements

This year’s Cloudflare partner award winners have exemplified excellence and innovation in collaborating with Cloudflare. Their dedication and success highlight the transformative potential of the channel and our collaboration.

Americas Partner Awards


Technology Services Distributor of the Year: 
AVANT Communications
Honors the top-performing technology services distributor that has best represented Cloudflare and enabled partners to secure sales and growth revenue streams. 

Partner of the Year: GuidePoint Security
Honors the top-performing partner that has demonstrated phenomenal sales achievement in 2023. 

Growth Partners of the Year:  CDW and Defy Security
Honors the partners who made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

Technical Excellence Award (Pre-Sales):  Adapture
Honors the partner company who demonstrated great knowledge and expertise in leading the customer’s Cloudflare pre-sales and proof of concept (POC) experience. 

Partner SE Champion of the Year: Nyron Samaroo (CDW Canada) & Deepika Nath (Kyndryl)
Honors the individual partner Sales Engineers (SEs) who have demonstrated depth of knowledge and expertise in Cloudflare solutions and went above and beyond in delivering the Cloudflare experience for our joint customers.

Global Systems Integrator (GSI) Partner of the Year: Accenture
Honors the top-performing GSI partner.

Latin America Awards  


Technology Services Distributor of the Year: 
TD SYNNEX (LATAM) 
Honors the top-performing technology services distributor that has best represented Cloudflare and enabled partners to secure sales and growth revenue streams. 

Partner of the Year:
Honors the partner who, although new to the Cloudflare Partner Network in 2023, has already made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

  • IntegraTEC (LATAM)

  • NeoSecure by SEK (Nola/Sola) 

  • Cipher (Brazil) 

  • Xenergix (Mexico) 

Certification Champion of the Year: Tripla  (Brazil)
This award honors the partner whose teams earned the highest total number of Cloudflare certifications during 2023. 

APJC Partner Awards


Distributor of the Year:
  Dicker Data Limited
Honors the top performing distributor who has best represented Cloudflare and enabled partners to secure sales and growth revenue streams.

Service Delivery Partner of the Year: Master Concept (Hong Kong) Ltd.
Honors the top-performing services solution provider.

Partner of the Year:  Centcloud Technologies Limited
Honors the partner who, although new to the Cloudflare Partner Network in 2023, has already made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

Customer Win of the Year: Megazone Cloud Corporation 
Honors the outstanding achievement of a partner who secured a significant customer deal through exceptional collaboration and innovation.

New Partner Win of the Year: Techdirect Pte Ltd
Honors the partner who has brought in the largest, most strategic deal and deployed a comprehensive end-to-end security, performance, and reliability solution to a customer.

Most Valuable Player of the Year: Omni Intelligent Services, Inc.
Honors top partner achievers who not only provided stellar service to our joint customers but also built new business value by tapping into the power of networks, relationships, and ecosystems.

Technical Excellence Award (Pre-Sales): Airowire Networks PVT LTD
Honors the partner company whose SEs demonstrated great knowledge and expertise in leading the customer’s Cloudflare pre-sales and POC experience.

Marketing Champion of the Year: Softdebut Co.,Ltd
Honors the partner company that demonstrated outstanding collaboration and business outcomes in marketing Cloudflare solutions.

Partner SE Champion of the Year: David Woon (Kordia Limited)
Honors the individual partner SEs who have demonstrated depth of knowledge and expertise in Cloudflare solutions and went above and beyond in delivering the Cloudflare experience for our joint customers.

Rising Star Award: The Missing Link Security Pty Ltd
Honors individual partner representatives who, although new to our collaboration, have already made a significant, positive contribution both to our partnership and to driving outcomes for our customers.

Growth Partner of the Year: NTT Australia Pty Ltd
Honors the partner who made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

EMEA Partner Awards


Distributor of the Year:
V-Valley advanced Solutions España SAU
Honors the top-performing distributor who has best represented Cloudflare and enabled partners to secure sales and growth revenue streams.

MSP of the Year: Orange Cyberdefense France
Honors the top-performing managed services solutions provider.

GSI of the Year: Eviden France SAS
Honors the top-performing GSI partner.

Partner of the Year: Liquid C2
Honors the top-performing partner that has demonstrated phenomenal sales achievement in 2023.

New Partner of the Year: Focus Group and Smartflare
Honors the partners who, although new to the Cloudflare Partner Network in 2023, have already made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

Customer Win of the Year: Liquid C2
Honors the outstanding achievement of a partner who secured a major customer deal through exceptional collaboration and innovation.

Rising Star Award: Copy Cat Group and Cloudhop
Honors individual partner representatives who, although new to our collaboration, have already made a significant, positive contribution both to our partnership and to driving outcomes for our customers.

Most Valuable Player of the Year: Nanosek
Honors the top partner achiever who not only provided stellar service to our joint customers but also built new business value by tapping into the power of networks, relationships, and ecosystems.

Technical Excellence Award (Pre-Sales):
Honors the partner company whose SEs demonstrated great knowledge and expertise in leading the customer’s Cloudflare pre-sales and POC experience.

  • Jean-Baptiste Voron (Eviden France SAS)

  • Ganesh the Awesome (Globaldots)

  • Martin Campos (Orange Cyberdefense)

Partner SE Champion of the Year: Ivan Rudnytskyi (Bakotech s.r.o.)
Honors the individual partner SE who demonstrated depth of knowledge and expertise in Cloudflare solutions and went above and beyond in delivering the Cloudflare experience for our joint customers.

Certification Champion of the Year: Kaemi GmbH
This award honors the partner whose teams earned the highest total number of Cloudflare certifications during 2023.

Marketing Champion of the Year: Infinigate Deutschland GmbH and Alter Way
Honors partner companies who have demonstrated outstanding collaboration and business outcomes in marketing Cloudflare solutions.

To learn more about the Cloudflare PowerUP Partner Program, please check out the resources below:

Rising together: honoring Cloudflare’s outstanding partners

Post Syndicated from Tom Evans original https://blog.cloudflare.com/partner-award-winners-2023

This post is also available in 简体中文, 繁體中文, 日本語 and 한국어.

We’re thrilled to announce Cloudflare’s worldwide 2023 Channel Partner Award winners! Partners are crucial to Cloudflare’s success, extending the solutions and support that customers need to control application complexity, reduce cyber risk, and cut costs, all with a high level of customer satisfaction.

PowerUP partners first

This year, we again received CRN’s highest accolade of a 5-star ranking for our Partner Program. Through our expanded Cloudflare PowerUP Partner Program, we’re ensuring Cloudflare’s partnerships and alliances continue delivering strong results to joint customers across sectors worldwide. We’re focused on making it easier for our partners to work with us and grow their business with us. The Cloudflare team is all about helping partners:

  • Be innovative by transforming how customers connect, protect, and build with Cloudflare security, speed, programmability, and resilience.
  • Increase profitability by growing revenue and delivering more value at scale to rapidly grow business and expand reach.
  • Accelerate GTM by benefiting from sales and marketing support, streamlined processes, and transparent pricing to close deals quickly.

From comprehensive training through Cloudflare University to expert support across departments, partners are equipped to drive digital transformation and modernize IT infrastructures for their customers in a competitive market.

Leaders who understand the power of partnership

It’s been a thrilling start to my tenure as Cloudflare’s Chief Partner Officer to watch remarkable growth and partner success. Our team has amplified opportunities, especially in the rapidly expanding area of secure access service edge (SASE), and our channel strategy has already demonstrated impressive results.

The overwhelmingly positive feedback from our partners underscores the strength of Cloudflare’s technology and our dedication to serving our partners. With substantial investments in the partner community, streamlined processes, and a focus on AI integration, together we are poised to drive significant growth and innovation.

Congratulations to our partners’ outstanding contributions and achievements

This year’s Cloudflare partner award winners have exemplified excellence and innovation in collaborating with Cloudflare. Their dedication and success highlight the transformative potential of the channel and our collaboration.

Americas partner awards

Technology Services Distributor of the Year: AVANT Communications
Honors the top-performing technology services distributor that has best represented Cloudflare and enabled partners to secure sales and growth revenue streams.

Partner of the Year: GuidePoint Security
Honors the top-performing partner that has demonstrated phenomenal sales achievement in 2023.

Growth Partners of the Year: CDW and Defy Security
Honors the partners who made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

Technical Excellence Award (Pre-Sales): Adapture
Honors the partner company who demonstrated great knowledge and expertise in leading the customer’s Cloudflare pre-sales and proof of concept (POC) experience.

Partner SE Champion of the Year: Nyron Samaroo (CDW Canada) & Deepika Nath (Kyndryl)
Honors the individual partner Sales Engineers (SEs) who have demonstrated depth of knowledge and expertise in Cloudflare solutions and went above and beyond in delivering the Cloudflare experience for our joint customers.

Global Systems Integrator (GSI) Partner of the Year: Accenture
Honors the top-performing GSI partner.

Latin America awards

Technology Services Distributor of the Year: TD SYNNEX (LATAM)
Honors the top-performing technology services distributor that has best represented Cloudflare and enabled partners to secure sales and growth revenue streams.

Partner of the Year:
Honors the partner who, although new to the Cloudflare Partner Network in 2023, has already made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

  • IntegraTEC (LATAM)
  • NeoSecure by SEK (Nola/Sola)
  • Cipher (Brazil)
  • Xenergix (Mexico)

Certification Champion of the Year: Tripla  (Brazil)
This award honors the partner whose teams earned the highest total number of Cloudflare certifications during 2023.

APJC partner awards

Distributor of the Year:  Dicker Data Limited
Honors the top performing distributor who has best represented Cloudflare and enabled partners to secure sales and growth revenue streams.

Service Delivery Partner of the Year: Master Concept (Hong Kong) Ltd.
Honors the top-performing services solution provider.

Partner of the Year:  Centcloud Technologies Limited
Honors the partner who, although new to the Cloudflare Partner Network in 2023, has already made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

Customer Win of the Year: Megazone Cloud Corporation
Honors the outstanding achievement of a partner who secured a significant customer deal through exceptional collaboration and innovation.

New Partner Win of the Year: Techdirect Pte Ltd
Honors the partner who has brought in the largest, most strategic deal and deployed a comprehensive end-to-end security, performance, and reliability solution to a customer.

Most Valuable Player of the Year: Omni Intelligent Services, Inc.
Honors top partner achievers who not only provided stellar service to our joint customers but also built new business value by tapping into the power of networks, relationships, and ecosystems.

Technical Excellence Award (Pre-Sales): Airowire Networks PVT LTD
Honors the partner company whose SEs demonstrated great knowledge and expertise in leading the customer’s Cloudflare pre-sales and POC experience.

Marketing Champion of the Year: Softdebut Co., Ltd
Honors the partner company that demonstrated outstanding collaboration and business outcomes in marketing Cloudflare solutions.

Partner SE Champion of the Year: David Woon (Kordia Limited)
Honors the individual partner SEs who have demonstrated depth of knowledge and expertise in Cloudflare solutions and went above and beyond in delivering the Cloudflare experience for our joint customers.

Rising Star Award: The Missing Link Security Pty Ltd
Honors individual partner representatives who, although new to our collaboration, have already made a significant, positive contribution both to our partnership and to driving outcomes for our customers.

Growth Partner of the Year: NTT Australia Pty Ltd
Honors the partner who made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

EMEA partner awards

Distributor of the Year: V-Valley advanced Solutions España SAU
Honors the top-performing distributor who has best represented Cloudflare and enabled partners to secure sales and growth revenue streams.

MSP of the Year: Orange Cyberdefense France
Honors the top-performing managed services solutions provider.

GSI of the Year: Eviden France SAS
Honors the top-performing GSI partner.

Partner of the Year: Liquid C2
Honors the top-performing partner that has demonstrated phenomenal sales achievement in 2023.

New Partner of the Year: Focus Group and Smartflare
Honors the partners who, although new to the Cloudflare Partner Network in 2023, have already made substantial investments to grow our shared business, achieving not only full certification compliance but also exceeding revenue targets.

Customer Win of the Year: Liquid C2
Honors the outstanding achievement of a partner who secured a major customer deal through exceptional collaboration and innovation.

Rising Star Award: Copy Cat Group and Cloudhop
Honors individual partner representatives who, although new to our collaboration, have already made a significant, positive contribution both to our partnership and to driving outcomes for our customers.

Most Valuable Player of the Year: Nanosek
Honors the top partner achiever who not only provided stellar service to our joint customers but also built new business value by tapping into the power of networks, relationships, and ecosystems.

Technical Excellence Award (Pre-Sales):
Honors the partner company whose SEs demonstrated great knowledge and expertise in leading the customer’s Cloudflare pre-sales and POC experience.

  • Jean-Baptiste Voron (Eviden France SAS)
  • Ganesh the Awesome (Globaldots)
  • Martin Campos (Orange Cyberdefense)

Partner SE Champion of the Year: Ivan Rudnytskyi (Bakotech s.r.o.)
Honors the individual partner SE who demonstrated depth of knowledge and expertise in Cloudflare solutions and went above and beyond in delivering the Cloudflare experience for our joint customers.

Certification Champion of the Year: Kaemi GmbH
This award honors the partner whose teams earned the highest total number of Cloudflare certifications during 2023.

Marketing Champion of the Year: Infinigate Deutschland GmbH and Alter Way
Honors partner companies who have demonstrated outstanding collaboration and business outcomes in marketing Cloudflare solutions.

To learn more about the Cloudflare PowerUP Partner Program, please check out the resources below:

Automate Amazon Redshift Advisor recommendations with email alerts using an API

Post Syndicated from Ranjan Burman original https://aws.amazon.com/blogs/big-data/automate-amazon-redshift-advisor-recommendations-with-email-alerts-using-an-api/

Amazon Redshift is a fast, scalable, secure, and fully managed cloud data warehouse that enables you to analyze your data at scale. Amazon Redshift now allows you to programmatically access Amazon Redshift Advisor recommendations through an API, enabling you to integrate recommendations about how to improve your provisioned cluster performance into your own applications.

Amazon Redshift Advisor offers recommendations about optimizing your Redshift cluster performance and helps you save on operating costs. Advisor develops its customized recommendations by analyzing performance and usage metrics for your cluster and displays recommendations that should have a significant impact on performance and operations. Now, with the ability to programmatically access these recommendations through the ListRecommendations API, you can make recommendations available to implement on-demand or automatically through your own internal applications and tools without the need to access the Amazon Redshift console.

In this post, we show you how to use the ListRecommendations API to set up email notifications for Advisor recommendations on your Redshift cluster. These recommendations, such as identifying tables that should be vacuumed to sort the data or finding table columns that are candidates for compression, can help improve performance and save costs.

How to access Redshift Advisor recommendations

To access Advisor recommendations on the Amazon Redshift console, choose Advisor in the navigation pane. You can expand each recommendation to see more details, and sort and group recommendations.

You can also use the ListRecommendations API to automate receiving the Advisor recommendations and programmatically implement them. The API returns a list of recommended actions that can be parsed and implemented. The API and SDKs also enable you to set up workflows to use Advisor programmatically for automated optimizations. These automated periodic checks of Advisor using cron scheduling along with implementing the changes can help you keep Redshift clusters optimized automatically without manual intervention.

You can also use the list-recommendations command in the AWS Command Line Interface (AWS CLI) to invoke the Advisor recommendations from the command line and automate the workflow through scripts.

Solution overview

The following diagram illustrates the solution architecture.

The solution workflow consists of the following steps:

  1. An Amazon EventBridge schedule invokes an AWS Lambda function to retrieve Advisor recommendations.
  2. Advisor generates recommendations that are accessible through an API.
  3. Optionally, this solution stores the recommendations in an Amazon Simple Storage Service (Amazon S3) bucket.
  4. Amazon Simple Notification Service (Amazon SNS) automatically sends notifications to end-users.

Prerequisites

To deploy this solution, you should have the following:

Deploy the solution

Complete the following steps to deploy the solution:

  1. Choose Launch Stack.
    Launch Cloudformation Stack
  1. For Stack name, enter a name for the stack, for example, blog-redshift-advisor-recommendations.
  2. For SnsTopicArn, enter the SNS topic Amazon Resource Name (ARN) for receiving the email alerts.
  3. For ClusterIdentifier, enter your Redshift cluster name if you want to receive Advisor notifications for a particular cluster. If you leave it blank, you will receive notifications for all Redshift provisioned clusters in your account.
  4. For S3Bucket, enter the S3 bucket name to store the detailed Advisor recommendations in a JSON file. If you leave it blank, this step will be skipped.
  5. For ScheduleExpression, enter the frequency in cron format to receive Advisor recommendation alerts. For this post, we want to receive alerts every Sunday at 14:00 UTC, so we enter cron(0 14 ? * SUN *).

Make sure to provide the correct cron time expression when deploying the CloudFormation stack to avoid any failures.

  1. Keep all options as default under Configure Stack options and choose Next.
  2. Review the settings, select the acknowledge check box, and create the stack.

If the CloudFormation stack fails for any reason, refer to Troubleshooting CloudFormation.

After the CloudFormation template is deployed, it will create the following resources:

Workflow details

Let’s take a closer look at the Lambda function and the complete workflow.

The input values provided for SnsTopicArn, ClusterIdentifier, and S3Bucket in the CloudFormation stack creation are set as environmental variables in the Lambda function. If the ClusterIdentifier parameter is None, then it will invoke the ListRecommendations API to generate Advisor recommendations for all the clusters within the account (same AWS Region). Otherwise, it will pass the ClusterIdentifier value and generate Advisor recommendations only for the given cluster. If the input parameter S3Bucket is provided, the solution creates a folder named RedshiftAdvisorRecommendations and generates the Advisor recommendations file in JSON format within it. If a value for S3Bucket isn’t provided, this step will be skipped.

Next, the function will summarize recommendations by each provisioned cluster (for all clusters in the account or a single cluster, depending on your settings) based on the impact on performance and cost as HIGH, MEDIUM, and LOW categories. An SNS notification email will be sent to the subscribers with the summarized recommendations.

SQL commands are included as part of the Advisor’s recommended action. RecommendedActionType-SQL summarizes the number of SQL actions that can be applied using SQL commands.

If there are no recommendations available for any cluster, the SNS notification email will be sent notifying there are no Advisor recommendations.

An EventBridge rule is created to invoke the Lambda function based on the frequency you provided in the stack parameters. By default, it’s scheduled to run weekly each Sunday at 14:00 UTC.

The following is a screenshot of a sample SNS notification email.

Clean up

We recommend deleting the CloudFormation stack if you aren’t going to continue using the solution. This will avoid incurring any additional costs from the resources created as part of the solution.

Conclusion

In this post, we discussed how Redshift Advisor offers you specific recommendations to improve the performance of and decrease the operating costs for your Redshift cluster. We also showed you how to programmatically access these recommendations through an API and implement them on-demand or automatically using your own internal tools without having access to the Amazon Redshift console.

By integrating these recommendations into your workflows, you can make informed decisions and implement best practices to optimize the performance and costs of your Redshift clusters, ultimately enhancing the overall efficiency and productivity of your data processing operations.

We encourage you to try out this automated solution to access Advisor recommendations programmatically. If you have any feedback or questions, please leave them in the comments.


About the authors

Ranjan Burman is an Analytics Specialist Solutions Architect at AWS. He specializes in Amazon Redshift and helps customers build scalable analytical solutions. He has more than 16 years of experience in different database and data warehousing technologies. He is passionate about automating and solving customer problems with cloud solutions.

Nita Shah is a Senior Analytics Specialist Solutions Architect at AWS based out of New York. She has been building data warehouse solutions for over 20 years and specializes in Amazon Redshift. She is focused on helping customers design and build enterprise-scale well-architected analytics and decision support platforms.


Vamsi Bhadriraju
is a Data Architect at AWS. He works closely with enterprise customers to build data lakes and analytical applications on the AWS Cloud.

Sumant Nemmani is a Senior Technical Product Manager at AWS. He is focused on helping customers of Amazon Redshift benefit from features that use machine learning and intelligent mechanisms to enable the service to self-tune and optimize itself, ensuring Redshift remains price-performant as they scale their usage.

The collective thoughts of the interwebz