The GCC project has been working to support compiling to BPF
for some time. José Marchesi and David Faust spoke in an extended session at the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit
about how that work has been going, and what is left for GCC to be on par with
LLVM with regard to BPF support. They also related tentative plans for how
GCC BPF support would be maintained in the future.
AWS recently announced that Apache Flink is generally available for Amazon EMR on Amazon Elastic Kubernetes Service (EKS). Apache Flink is a scalable, reliable, and efficient data processing framework that handles real-time streaming and batch workloads (but is most commonly used for real-time streaming). Amazon EMR on EKS is a deployment option for Amazon EMR that allows you to run open source big data frameworks such as Apache Spark and Flink on Amazon Elastic Kubernetes Service (Amazon EKS) clusters with the EMR runtime. With the addition of Flink support in EMR on EKS, you can now run your Flink applications on Amazon EKS using the EMR runtime and benefit from both services to deploy, scale, and operate Flink applications more efficiently and securely.
In this post, we introduce the features of EMR on EKS with Apache Flink, discuss their benefits, and highlight how to get started.
EMR on EKS for data workloads
AWS customers deploying large-scale data workloads are adopting the EMR runtime with Amazon EKS as the underlying orchestrator to benefit from complementary features. This also enables multi-tenancy and lets data engineers and data scientists focus on building data applications while the platform engineering and site reliability engineering (SRE) teams manage the infrastructure. Some key benefits of Amazon EKS for these customers are:
The AWS-managed control plane, which improves resiliency and removes undifferentiated heavy lifting
Features like multi-tenancy and role-based access control (RBAC), which allow you to build cost-efficient platforms and enforce organization-wide governance policies
The extensibility of Kubernetes, which allows you to install open source add-ons (observability, security, notebooks) to meet your specific needs
The EMR runtime offers the following benefits:
Takes care of the undifferentiated heavy lifting of managing installations, configuration, patching, and backups
Simplifies scaling
Optimizes performance and cost
Implements security and compliance by integrating with other AWS services and tools
Benefits of EMR on EKS with Apache Flink
The flexibility to choose instance types, price, and AWS Region and Availability Zone according to the workload specification is often the main driver of reliability, availability, and cost-optimization. Amazon EMR on EKS natively integrates tools and functionalities to enable these—and more.
Integration with existing tools and processes, such as continuous integration and continuous delivery (CI/CD), observability, and governance policies, helps unify the tools used and decreases the time to launch new services. Many customers already have these tools and processes for their Amazon EKS infrastructure, which you can now easily extend to your Flink applications running on EMR on EKS. If you’re interested in building your Kubernetes and Amazon EKS capabilities, we recommend using EKS Blueprints, which provides a starting place to compose complete EKS clusters that are bootstrapped with the operational software that is needed to deploy and operate workloads.
Another benefit of running Flink applications with Amazon EMR on EKS is improving your applications’ scalability. The volume and complexity of data processed by Flink apps can vary significantly based on factors like the time of the day, day of the week, seasonality, or being tied to a specific marketing campaign or other activity. This volatility makes customers trade off between over-provisioning, which leads to inefficient resource usage and higher costs, or under-provisioning, where you risk missing latency and throughput SLAs or even service outages. When running Flink applications with Amazon EMR on EKS, the Flink auto scaler will increase the applications’ parallelism based on the data being ingested, and Amazon EKS auto scaling with Karpenter or Cluster Autoscaler will scale the underlying capacity required to meet those demands. In addition to scaling up, Amazon EKS can also scale your applications down when the resources aren’t needed so your Flink apps are more cost-efficient.
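To make the scaling idea concrete, the following is a minimal sketch of how a target parallelism could be derived from the observed ingest rate. It is not the Flink auto scaler's actual algorithm. The function name, the per-subtask capacity, the utilization target, and the bounds are all illustrative assumptions.

```python
import math


def target_parallelism(ingest_rate, per_subtask_capacity,
                       target_utilization=0.7,
                       min_parallelism=1, max_parallelism=64):
    """Pick a parallelism that keeps each subtask near the target utilization.

    All parameters are hypothetical: rates are records per second, and the
    bounds stand in for whatever limits a real deployment would configure.
    """
    if per_subtask_capacity <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    # Subtasks needed so that each one runs at roughly the target utilization.
    needed = math.ceil(ingest_rate / (per_subtask_capacity * target_utilization))
    # Clamp to the configured bounds so the job never scales to zero or runs away.
    return max(min_parallelism, min(max_parallelism, needed))
```

In this sketch, a higher ingest rate raises the requested parallelism, and the cluster-level autoscaler (Karpenter or Cluster Autoscaler) would then be responsible for providing the capacity to schedule the extra TaskManager pods.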
Running EMR on EKS with Flink allows you to run multiple versions of Flink on the same cluster. With traditional Amazon Elastic Compute Cloud (Amazon EC2) instances, each version of Flink needs to run on its own virtual machine to avoid challenges with resource management or conflicting dependencies and environment variables. However, containerizing Flink applications allows you to isolate versions and avoid conflicting dependencies, and running them on Amazon EKS allows you to use Kubernetes as the unified resource manager. This means that you have the flexibility to choose which version of Flink is best suited for each job, and also improves your agility to upgrade a single job to the next version of Flink rather than having to upgrade an entire cluster, or spin up a dedicated EC2 instance for a different Flink version, which would increase your costs.
Key EMR on EKS differentiations
In this section, we discuss the key EMR on EKS differentiations.
Faster restart of the Flink job during scaling or failure recovery
This is enabled by task local recovery via Amazon Elastic Block Store (Amazon EBS) volumes and fine-grained recovery support in Adaptive Scheduler.
Task local recovery via EBS volumes for TaskManager pods is available with Amazon EMR 6.15.0 and higher. The default overlay mount comes with 10 GB, which is sufficient for jobs with smaller state. Jobs with large state can enable the automatic EBS volume mount option. The EBS volumes for the TaskManager pods are automatically created and mounted during pod creation and removed during pod deletion.
Fine-grained recovery support in the adaptive scheduler is available with Amazon EMR 6.15.0 and higher. When a task fails during its run, fine-grained recovery restarts only the pipeline-connected component of the failed task, instead of resetting the entire graph and triggering a complete rerun from the last completed checkpoint, which is more expensive than rerunning only the failed tasks. To enable fine-grained recovery, set the following configurations in your Flink configuration:
jobmanager.execution.failover-strategy: region
restart-strategy: exponential-delay or fixed-delay
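The idea of a pipeline-connected component can be illustrated with a small graph traversal. The following is a simplified sketch, not Flink's implementation: it assumes every edge passed in is a pipelined exchange, and the task names and the failover_region function are hypothetical.

```python
from collections import defaultdict, deque


def failover_region(pipelined_edges, failed_task):
    """Return the set of tasks connected to failed_task through pipelined edges."""
    # Treat edges as undirected: a failure affects both producers and consumers.
    neighbors = defaultdict(set)
    for upstream, downstream in pipelined_edges:
        neighbors[upstream].add(downstream)
        neighbors[downstream].add(upstream)

    # Breadth-first search from the failed task collects its connected component.
    region = {failed_task}
    queue = deque([failed_task])
    while queue:
        task = queue.popleft()
        for other in neighbors[task]:
            if other not in region:
                region.add(other)
                queue.append(other)
    return region


# Two independent pipelines: a failure in one leaves the other running.
edges = [("source-a", "map-a"), ("map-a", "sink-a"), ("source-b", "sink-b")]
print(sorted(failover_region(edges, "map-a")))
```

Only the tasks in the returned set would need to restart. Tasks in the other pipeline keep running, which is what makes the recovery cheaper than resetting the whole graph.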
Logging and monitoring support with customer managed keys
Monitoring and observability are key constructs of the AWS Well-Architected framework because they help you learn, measure, and adapt to operational changes. You can enable monitoring of launched Flink jobs while using EMR on EKS with Apache Flink. Amazon Managed Service for Prometheus is deployed automatically, if enabled while installing the Flink operator, and it helps analyze Prometheus metrics emitted for the Flink operator, job, and TaskManager.
You can use the Flink UI to monitor the health and performance of Flink jobs through a browser using port-forwarding. We have also enabled collection and archival of operator and application logs to Amazon Simple Storage Service (Amazon S3) or Amazon CloudWatch using a FluentD sidecar. This can be enabled through a monitoringConfiguration block in the deployment custom resource definition (CRD):
monitoringConfiguration:
  s3MonitoringConfiguration:
    logUri: S3 BUCKET
    encryptionKeyArn: CMK ARN FOR S3 BUCKET ENCRYPTION
  cloudWatchMonitoringConfiguration:
    logGroupName: LOG GROUP NAME
    logStreamNamePrefix: LOG GROUP STREAM PREFIX
  sideCarResources:
    limits:
      cpuLimit: 500m
      memoryLimit: 250Mi
  containerLogRotationConfiguration:
    rotationSize: 2Gb
    maxFilesToKeep: 10
Cost-optimization using Amazon EC2 Spot Instances
Amazon EC2 Spot Instances are an Amazon EC2 pricing option that provides steep discounts of up to 90% over On-Demand prices. It’s the preferred choice to run big data workloads because it helps improve throughput and optimize Amazon EC2 spend. Spot Instances are spare EC2 capacity and can be interrupted with notification if Amazon EC2 needs the capacity for On-Demand requests. Flink streaming jobs running on EMR on EKS can now respond to Spot Instance interruption, perform a just-in-time (JIT) checkpoint of the running jobs, and prevent scheduling further tasks on these Spot Instances. When restarting the job, not only will the job restart from the checkpoint, but a combined restart mechanism will provide a best-effort service to restart the job either after reaching target resource parallelism or the end of the current configured window. This can also prevent consecutive job restarts caused by Spot Instances stopping in a short interval and help reduce cost and improve performance.
To minimize the impact of Spot Instance interruptions, you should adopt Spot Instance best practices. The combined restart mechanism and JIT checkpoint is offered only in Adaptive Scheduler.
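The combined restart condition described above can be read as a simple either-or check. The following sketch only illustrates that reading. The function and parameter names are hypothetical, and the actual mechanism in EMR on EKS is not documented here.

```python
def ready_to_restart(available_slots, target_parallelism,
                     seconds_waited, window_seconds):
    """Return True once the job should restart from its checkpoint.

    Hypothetical reading of the combined restart mechanism: restart as soon as
    enough capacity is back, or when the configured window runs out,
    whichever comes first.
    """
    reached_target = available_slots >= target_parallelism
    window_expired = seconds_waited >= window_seconds
    return reached_target or window_expired
```

Waiting for one of the two conditions, rather than restarting on every interruption notice, is what would avoid consecutive restarts when several Spot Instances are reclaimed within a short interval.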
Integration with the AWS Glue Data Catalog as a metadata store for Flink applications
The AWS Glue Data Catalog is a centralized metadata repository for data assets across various data sources, and provides a unified interface to store and query information about data formats, schemas, and sources. Amazon EMR on EKS with Apache Flink releases 6.15.0 and higher support using the Data Catalog as a metadata store for streaming and batch SQL workflows. This further enables data understanding and makes sure that it is transformed correctly.
Integration with Amazon S3, enabling resiliency and operational efficiency
Amazon S3 is the preferred cloud object store for AWS customers to store not only data but also application JARs and scripts. EMR on EKS with Apache Flink can fetch application JARs and scripts (PyFlink) through deployment specification, which eliminates the need to build custom images in Flink’s Application Mode. When checkpointing on Amazon S3 is enabled, a managed state is persisted to provide consistent recovery in case of failures. Retrieval and storage of files using Amazon S3 is enabled by two different Flink connectors. We recommend using Presto S3 (s3p) for checkpointing and s3 or s3a for reading and writing files including JARs and scripts. See the following code:
...
spec:
  flinkConfiguration:
    taskmanager.numberOfTaskSlots: "2"
    state.checkpoints.dir: s3p://<BUCKET-NAME>/flink-checkpoint/
...
  job:
    jarURI: "s3://<S3-BUCKET>/scripts/pyflink.py" # Note, this will trigger the artifact download process
    entryClass: "org.apache.flink.client.python.PythonDriver"
...
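If you generate deployment specifications programmatically, the connector recommendation can be encoded as a small helper. The following is a sketch under stated assumptions: flink_spec is a hypothetical function, it produces only the fields shown in the snippet above, and the bucket names are placeholders.

```python
import json


def flink_spec(checkpoint_bucket, artifact_uri, task_slots=2):
    """Build the spec fragment shown above as a plain dictionary.

    Hypothetical helper: it uses the s3p scheme for checkpoints and accepts
    only s3 or s3a URIs for the job artifact.
    """
    if not artifact_uri.startswith(("s3://", "s3a://")):
        raise ValueError("artifact URI should use the s3 or s3a scheme")
    return {
        "flinkConfiguration": {
            "taskmanager.numberOfTaskSlots": str(task_slots),
            # Checkpoints go through the Presto S3 connector (s3p).
            "state.checkpoints.dir": f"s3p://{checkpoint_bucket}/flink-checkpoint/",
        },
        "job": {
            "jarURI": artifact_uri,
            "entryClass": "org.apache.flink.client.python.PythonDriver",
        },
    }


# Placeholder bucket and script names, for illustration only.
print(json.dumps(flink_spec("example-bucket",
                            "s3://example-bucket/scripts/pyflink.py"), indent=2))
```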
Role-based access control using IRSA
IAM Roles for Service Accounts (IRSA) is the recommended way to implement role-based access control (RBAC) for deploying and running applications on Amazon EKS. EMR on EKS with Apache Flink creates two roles (IRSA) by default for Flink operator and Flink jobs. The operator role is used for JobManager and Flink services, and the job role is used for TaskManagers and ConfigMaps. This helps limit the scope of AWS Identity and Access Management (IAM) permission to a service account, helps with credential isolation, and improves auditability.
Get started with EMR on EKS with Apache Flink
If you want to run a Flink application on recently launched EMR on EKS with Apache Flink, refer to Running Flink jobs with Amazon EMR on EKS, which provides step-by-step guidance to deploy, run, and monitor Flink jobs.
We have also created an IaC (Infrastructure as Code) template for EMR on EKS with Flink Streaming as part of Data on EKS (DoEKS), an open-source project aimed at streamlining and accelerating the process of building, deploying, and scaling data and ML workloads on Amazon Elastic Kubernetes Service (Amazon EKS). This template will help you provision an EMR on EKS with Flink cluster and evaluate the features described in this post. The template comes with best practices built in, so you can use it as a foundation for deploying EMR on EKS with Flink in your own environment if you decide to use it as part of your application.
Conclusion
In this post, we explored the features of the recently launched EMR on EKS with Flink to help you understand how you might run Flink workloads on a managed, scalable, resilient, and cost-optimized EMR on EKS cluster. If you are planning to run or explore Flink workloads on Kubernetes, consider running them on EMR on EKS with Apache Flink. Contact your AWS Solutions Architects, who can assist you along your innovation journey.
About the Authors
Kinnar Kumar Sen is a Sr. Solutions Architect at Amazon Web Services (AWS) focusing on Flexible Compute. As a part of the EC2 Flexible Compute team, he works with customers to guide them to the most elastic and efficient compute options that are suitable for their workload running on AWS. Kinnar has more than 15 years of industry experience working in research, consultancy, engineering, and architecture.
Alex Lines is a Principal Containers Specialist at AWS helping customers modernize their Data and ML applications on Amazon EKS.
Mengfei Wang is a Software Development Engineer specializing in building large-scale, robust software infrastructure to support big data demands on containers and Kubernetes within the EMR on EKS team. Beyond work, Mengfei is an enthusiastic snowboarder and a passionate home cook.
Jerry Zhang is a Software Development Manager in AWS EMR on EKS. His team focuses on helping AWS customers to solve their business problems using cutting-edge data analytics technology on AWS infrastructure.
Large language models (LLMs) such as Anthropic Claude and Amazon Titan have the potential to drive automation across various business processes by processing both structured and unstructured data. For example, financial analysts currently have to manually read and summarize lengthy regulatory filings and earnings transcripts in order to respond to Q&A on investment strategies. LLMs could automate the extraction and summarization of key information from these documents, enabling analysts to query the LLM and receive reliable summaries. This would allow analysts to process the documents to develop investment recommendations faster and more efficiently. Anthropic Claude and other LLMs on Amazon Bedrock can bring new levels of automation and insight across many business functions that involve both human expertise and access to knowledge spread across an organization’s databases and content repositories.
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
In this post, we show how to build a Q&A bot with RAG (Retrieval Augmented Generation). RAG uses data sources like Amazon Redshift and Amazon OpenSearch Service to retrieve documents that augment the LLM prompt. To get data from Amazon Redshift, we use Anthropic Claude 2.0 on Amazon Bedrock, summarizing the final response based on pre-defined prompt template libraries from LangChain. To get data from Amazon OpenSearch Service, we chunk the source data and convert the chunks to vectors using the Amazon Titan Text Embeddings model.
For client interaction we use Agent Tools based on ReAct. A ReAct prompt consists of few-shot task-solving trajectories, with human-written text reasoning traces and actions, as well as environment observations in response to actions. In this example, we use ReAct for zero-shot training to generate responses to fit in a pre-defined template. The additional information is concatenated as context with the original input prompt and fed to the text generator which produces the final output. This makes RAG adaptive for situations where facts could evolve over time.
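The concatenation step can be sketched in a few lines. This is a minimal illustration, not the notebook's code. The build_prompt function and the instruction wording are assumptions, and the Human/Assistant framing follows the prompt format used in the examples later in this post.

```python
def build_prompt(question, retrieved_passages):
    """Concatenate retrieved passages as context ahead of the user's question."""
    # Number each passage so the model (and a reader) can tell them apart.
    context = "\n\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(retrieved_passages, start=1)
    )
    return (
        "\n\nHuman: Use only the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
        "\n\nAssistant:"
    )
```

Because the context is rebuilt from the data stores on every request, an updated filing or stock price changes the answer without retraining the model.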
Solution overview
Our solution demonstrates how financial analysts can use generative artificial intelligence (AI) to adapt their investment recommendations based on financial reports and earnings transcripts, using RAG so that the LLMs generate factual content.
The hybrid architecture uses multiple databases and LLMs, with foundation models from Amazon Bedrock for data source identification, SQL generation, and text generation with results. In the following architecture, Steps 1 and 2 represent data ingestion to be done by data engineering in batch mode. Steps 3, 4, and 5 are the queries and response formation.
The following diagram shows a more detailed view of the Q&A processing chain. The user asks a question, and LangChain queries the Redshift and OpenSearch Service data stores for relevant information to build the prompt. It sends the prompt to the Anthropic Claude model on Amazon Bedrock, and returns the response.
The details of each step are as follows:
Populate the Amazon Redshift Serverless data warehouse with company stock information stored in Amazon Simple Storage Service (Amazon S3). Redshift Serverless is a fully functional data warehouse holding data tables maintained in real time.
Load the unstructured data from your S3 data lake to OpenSearch Service to create an index to store and perform semantic search. The LangChain library loads knowledge base documents, splits the documents into smaller chunks, and uses Amazon Titan to generate embeddings for chunks.
The client submits a question via an interface like a chatbot or website.
Multiple steps transform the user query, passed from an Amazon SageMaker notebook, into API calls to LLMs on Amazon Bedrock. LLM-based agents generate SQL from the text and then validate whether the query is relevant to the data warehouse tables. If it is, the query runs to extract information. The LangChain library also calls Amazon Titan embeddings to generate a vector for the user’s question, and then calls OpenSearch vector search to get similar documents.
LangChain calls the Anthropic Claude model on Amazon Bedrock with the additional, retrieved knowledge as context to generate an answer for the question. It returns the generated content to the client.
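The chunking and similarity-search steps above can be sketched without any external service. In the following illustration, toy_embed is a word-count stand-in for the Amazon Titan embeddings model, and an in-memory sort stands in for the OpenSearch vector index. All function names and the chunk sizes are assumptions.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into chunks of chunk_size words, sharing overlap words between neighbors."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    query_vector = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, toy_embed(c)),
                    reverse=True)
    return ranked[:k]
```

A real embedding model captures meaning rather than exact word overlap, but the flow is the same: embed the chunks once at ingestion, embed the question at query time, and return the nearest chunks as context.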
In this deployment, you will use Amazon Redshift Serverless, the Anthropic Claude 2.0 model on Amazon Bedrock, and the Amazon Titan Text Embeddings model. Overall spend for the deployment will be directly proportional to the number of input/output tokens for Amazon Bedrock models, knowledge base volume, usage hours, and so on.
To deploy the solution, you need two datasets: SEC Edgar Annual Financial Filings and Stock pricing data. To join these datasets for analysis, you need to choose Stock Symbol as the join key. The provided AWS CloudFormation template deploys the datasets required for this post, along with the SageMaker notebook.
Deploy the chat application using AWS CloudFormation
To deploy the resources, complete the following steps:
Deploy the following CloudFormation template to create your stack in the us-east-1 AWS Region. The stack will deploy an OpenSearch Service domain, Redshift Serverless endpoint, SageMaker notebook, and other services like VPC and IAM roles that you will use in this post. The template sets a default user name and password for the OpenSearch Service domain, and sets up a Redshift Serverless admin. You can choose to modify them or use the default values.
On the AWS CloudFormation console, navigate to the stack you created.
On the Outputs tab, choose the URL for SageMakerNotebookURL to open the notebook.
In Jupyter, choose semantic-search-with-amazon-opensearch, then blog, then the LLM-Based-Agent folder.
Open the notebook Generative AI with LLM based autonomous agents augmented with structured and unstructured data.ipynb.
Follow the instructions in the notebook and run the code sequentially.
Prepare the structured data in a Redshift database – Ingest the structured data into your Amazon Redshift Serverless table.
Query the unstructured data in OpenSearch Service with a vector search – Create a function to implement semantic search with OpenSearch Service. In OpenSearch Service, match the relevant company financial information to be used as context information to LLM. This is unstructured data augmentation to the LLM.
Query the structured data in Amazon Redshift with SQLDatabaseChain – Use the LangChain library LLM text to SQL to query company stock information stored in Amazon Redshift. The search result will be used as context information to the LLM.
Create an LLM-based ReAct agent augmented with data in OpenSearch Service and Amazon Redshift – Use the LangChain library to define a ReAct agent to judge whether the user query is stock- or investment-related. If the query is stock related, the agent will query the structured data in Amazon Redshift to get the stock symbol and stock price to augment context to the LLM. The agent also uses semantic search to retrieve relevant financial information from OpenSearch Service to augment context to the LLM.
Use the LLM-based agent to generate a final response based on the template used for zero-shot training – The following is a sample user flow for a stock price recommendation for the query, “Is ABC a good investment choice right now.”
Example questions and responses
In this section, we show three example questions and responses to test our chatbot.
Example 1: Historical data is available
In our first test, we explore how the bot responds to a question when historical data is available. We use the question, “Is [Company Name] a good investment choice right now?” Replace [Company Name] with a company you want to query.
This is a stock-related question. The company stock information is in Amazon Redshift and the financial statement information is in OpenSearch Service. The agent will run the following process:
Determine if this is a stock-related question.
Get the company name.
Get the stock symbol from Amazon Redshift.
Get the stock price from Amazon Redshift.
Use semantic search to get related information from 10-K financial filing data from OpenSearch Service.
response = zero_shot_agent("\n\nHuman: Is {company name} a good investment choice right now? \n\nAssistant:")
The output may look like the following:
Final Answer: Yes, {company name} appears to be a good investment choice right now based on the stable stock price, continued revenue and earnings growth, and dividend payments. I would recommend investing in {company name} stock at current levels.
You can view the final response from the complete chain in your notebook.
Example 2: Historical data is not available
In this next test, we see how the bot responds to a question when historical data is not available. We ask the question, “Is Amazon a good investment choice right now?”
This is a stock-related question. However, there is no Amazon stock price information in the Redshift table. Therefore, the bot will answer “I cannot provide stock analysis without stock price information.” The agent will run the following process:
Determine if this is a stock-related question.
Get the company name.
Get the stock symbol from Amazon Redshift.
Get the stock price from Amazon Redshift.
response = zero_shot_agent("\n\nHuman: Is Amazon a good investment choice right now? \n\nAssistant:")
The output looks like the following:
Final Answer: I cannot provide stock analysis without stock price information.
Example 3: Unrelated question and historical data is not available
For our third test, we see how the bot responds to an irrelevant question when historical data is not available. This is testing for hallucination. We use the question, “What is SageMaker?”
This is not a stock-related query. The agent will run the following process:
Determine if this is a stock-related question.
response = zero_shot_agent("\n\nHuman: What is SageMaker? \n\nAssistant:")
The output looks like the following:
Final Answer: What is SageMaker? is not a stock related query.
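The three examples above follow the same decision flow, which can be sketched with plain Python. In the actual solution an LLM-based ReAct agent makes these decisions. The keyword check, the answer function, and the lookup tables below are simplified, hypothetical stand-ins for the agent, Amazon Redshift, and OpenSearch Service.

```python
STOCK_KEYWORDS = ("stock", "invest", "share price")


def answer(question, company, symbols, prices, search):
    """Walk the agent's steps: classify, look up symbol and price, then retrieve context.

    symbols maps company name to ticker, prices maps ticker to price, and
    search is any callable that returns related filing text for a company.
    """
    # Step 1: decide whether the question is stock-related at all.
    if not any(keyword in question.lower() for keyword in STOCK_KEYWORDS):
        return f"{question} is not a stock related query."
    # Steps 2-4: resolve the symbol and the price (Amazon Redshift in the post).
    symbol = symbols.get(company)
    price = prices.get(symbol) if symbol else None
    if price is None:
        return "I cannot provide stock analysis without stock price information."
    # Step 5: retrieve supporting text (OpenSearch Service in the post).
    context = search(company)
    return f"{company} ({symbol}) last price {price}; context: {context}"
```

Refusing to answer when a lookup fails, instead of letting the model guess, is what keeps the second and third examples from producing hallucinated analysis.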
This was a simple RAG-based ReAct chat agent analyzing the corpus from different data stores. In a realistic scenario, you might choose to further enhance the response with restrictions or guardrails for input and output like filtering harsh words for robust input sanitization, output filtering, conversational flow control, and more. You may also want to explore the programmable guardrails to LLM-based conversational systems.
Clean up
To clean up your resources, delete the CloudFormation stack llm-based-agent.
Conclusion
In this post, you explored how LLMs play a part in answering user questions. You looked at a scenario for helping financial analysts. You could employ this methodology for other Q&A scenarios, like supporting insurance use cases, by quickly contextualizing claims data or customer interactions. You used a knowledge base of structured and unstructured data in a RAG approach, merging the data to create intelligent chatbots. You also learned how to use autonomous agents to help provide responses that are contextual and relevant to the customer data and limit irrelevant and inaccurate responses.
Leave your feedback and questions in the comments section.
Dhaval Shah is a Principal Solutions Architect with Amazon Web Services based out of New York, where he guides global financial services customers to build highly secure, scalable, reliable, and cost-efficient applications on the cloud. He brings over 20 years of technology experience in Software Development and Architecture, Data Engineering, and IT Management.
Soujanya Konka is a Senior Solutions Architect and Analytics specialist at AWS, focused on helping customers build their ideas on the cloud, with expertise in the design and implementation of data platforms. Before joining AWS, Soujanya had stints with companies such as HSBC and Cognizant.
Jon Handler is a Senior Principal Solutions Architect at Amazon Web Services based in Palo Alto, CA. Jon works closely with OpenSearch and Amazon OpenSearch Service, providing help and guidance to a broad range of customers who have search and log analytics workloads that they want to move to the AWS Cloud. Prior to joining AWS, Jon’s career as a software developer included 4 years of coding a large-scale, ecommerce search engine. Jon holds a Bachelor of the Arts from the University of Pennsylvania, and a Master of Science and a PhD in Computer Science and Artificial Intelligence from Northwestern University.
Jianwei Li is a Principal Analytics Specialist TAM at Amazon Web Services. Jianwei provides consulting services to help customers design and build modern data platforms. Jianwei has worked in the big data domain as a software developer, consultant, and tech leader.
Hrishikesh Karambelkar is a Principal Architect for Data and AIML with AWS Professional Services for Asia Pacific and Japan. He is proactively engaged with customers in the APJ region to enable enterprises in their digital transformation journey on AWS Cloud in the areas of generative AI, machine learning, data, and analytics. Previously, Hrishikesh authored books on enterprise search and big data, and co-authored research publications in the areas of enterprise search and AI-ML.
Welcome back to our exciting exploration of architectural patterns for real-time analytics with Amazon Kinesis Data Streams! In this fast-paced world, Kinesis Data Streams stands out as a versatile and robust solution to tackle a wide range of use cases with real-time data, from dashboarding to powering artificial intelligence (AI) applications. In this series, we streamline the process of identifying and applying the most suitable architecture for your business requirements, and help kickstart your system development efficiently with examples.
Now get ready as we embark on the second part of this series, where we focus on the AI applications with Kinesis Data Streams in three scenarios: real-time generative business intelligence (BI), real-time recommendation systems, and Internet of Things (IoT) data streaming and inferencing.
Real-time generative BI dashboards with Kinesis Data Streams, Amazon QuickSight, and Amazon Q
In today’s data-driven landscape, your organization likely possesses a vast amount of time-sensitive information that can be used to gain a competitive edge. The key to unlock the full potential of this real-time data lies in your ability to effectively make sense of it and transform it into actionable insights in real time. This is where real-time BI tools such as live dashboards come into play, assisting you with data aggregation, analysis, and visualization, therefore accelerating your decision-making process.
To help streamline this process and empower your team with real-time insights, Amazon has introduced Amazon Q in QuickSight. Amazon Q is a generative AI-powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your data. Amazon QuickSight is a fast, cloud-powered BI service that delivers insights.
With Amazon Q in QuickSight, you can use natural language prompts to build, discover, and share meaningful insights in seconds, creating context-aware data Q&A experiences and interactive data stories from the real-time data. For example, you can ask “Which products grew the most year-over-year?” and Amazon Q will automatically parse the questions to understand the intent, retrieve the corresponding data, and return the answer in the form of a number, chart, or table in QuickSight.
By using the architecture illustrated in the following figure, your organization can harness the power of streaming data and transform it into visually compelling and informative dashboards that provide real-time insights. With the power of natural language querying and automated insights at your fingertips, you’ll be well-equipped to make informed decisions and stay ahead in today’s competitive business landscape.
The steps in the workflow are as follows:
We use Amazon DynamoDB here as an example for the primary data store. Kinesis Data Streams can ingest data in real time from data stores such as DynamoDB to capture item-level changes in your table.
After capturing data to Kinesis Data Streams, you can ingest the data into analytic databases such as Amazon Redshift in near-real time. Amazon Redshift Streaming Ingestion simplifies data pipelines by letting you create materialized views directly on top of data streams. With this capability, you can use SQL (Structured Query Language) to connect to and directly ingest the data stream from Kinesis Data Streams to analyze and run complex analytical queries.
After the data is in Amazon Redshift, you can create a business report using QuickSight. Connectivity between a QuickSight dashboard and Amazon Redshift enables you to deliver visualization and insights. With the power of Amazon Q in QuickSight, you can quickly build and refine the analytics and visuals with natural language inputs.
For more details on how customers have built near real-time BI dashboards using Kinesis Data Streams, refer to the following:
Real-time recommendation systems with Kinesis Data Streams and Amazon Personalize
Imagine creating a user experience so personalized and engaging that your customers feel truly valued and appreciated. By using real-time data about user behavior, you can tailor each user’s experience to their unique preferences and needs, fostering a deep connection between your brand and your audience. You can achieve this by using Kinesis Data Streams and Amazon Personalize, a fully managed machine learning (ML) service that generates product and content recommendations for your users, instead of building your own recommendation engine from scratch.
With Kinesis Data Streams, your organization can effortlessly ingest user behavior data from millions of endpoints into a centralized data stream in real time. This allows recommendation engines such as Amazon Personalize to read from the centralized data stream and generate personalized recommendations for each user on the fly. Additionally, you could use enhanced fan-out to deliver dedicated throughput to your mission-critical consumers at even lower latency, further enhancing the responsiveness of your real-time recommendation system. The following figure illustrates a typical architecture for building real-time recommendations with Amazon Personalize.
After a campaign has been created, you can integrate calls to the campaign in your application. This is where calls to the GetRecommendations or GetPersonalizedRanking APIs are made to request near-real-time recommendations from Amazon Personalize. Your website or mobile application calls an AWS Lambda function over Amazon API Gateway to receive recommendations for your business apps.
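The following is a minimal sketch of what the Lambda-side call to GetRecommendations might look like. The client is passed in as a parameter so the function can be exercised locally with the stub shown here; in a real function you would likely pass `boto3.client("personalize-runtime")` instead. The campaign ARN, user ID, and item IDs are invented for illustration.

```python
# Sketch only: the stub client, ARN, and IDs are hypothetical.
from typing import Any, Dict, List


def recommend_item_ids(client: Any, campaign_arn: str, user_id: str,
                       num_results: int = 10) -> List[str]:
    """Call GetRecommendations and return only the recommended item IDs."""
    response = client.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=num_results,
    )
    return [item["itemId"] for item in response.get("itemList", [])]


class StubPersonalizeRuntime:
    """Local stand-in that mimics the shape of the GetRecommendations response."""

    def get_recommendations(self, campaignArn: str, userId: str,
                            numResults: int) -> Dict[str, Any]:
        catalog = ["item-a", "item-b", "item-c"]
        return {"itemList": [{"itemId": i} for i in catalog[:numResults]]}


print(recommend_item_ids(StubPersonalizeRuntime(), "arn:example:campaign", "user-1", 2))
```

Injecting the client keeps the handler logic testable without AWS credentials, which is a design choice of this sketch rather than anything the service requires.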
An event tracker provides an endpoint that allows you to stream interactions that occur in your application back to Amazon Personalize in near-real time. You do this by using the PutEvents API. You can build an event collection pipeline using API Gateway, Kinesis Data Streams, and Lambda to receive and forward interactions to Amazon Personalize. The event tracker performs two primary functions. First, it persists all streamed interactions so they will be incorporated into future retrainings of your model. This is also how Amazon Personalize cold starts new users. When a new user visits your site, Amazon Personalize will recommend popular items. After you stream in an event or two, Amazon Personalize immediately starts adjusting recommendations.
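To make the event collection pipeline more concrete, here is a hedged sketch of the Lambda step that reads records from Kinesis Data Streams and shapes them into PutEvents requests. Kinesis delivers record data to Lambda base64-encoded, which the code decodes. The JSON field names inside each record (`user_id`, `session_id`, `event_type`, `item_id`, `timestamp`) are assumptions about your own event format, not something Amazon Personalize prescribes.

```python
# Sketch only: the payload field names and tracking ID are hypothetical.
import base64
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_put_events_requests(kinesis_event: Dict[str, Any],
                              tracking_id: str) -> List[Dict[str, Any]]:
    """Turn a Lambda Kinesis event into keyword arguments for PutEvents calls."""
    requests = []
    for record in kinesis_event.get("Records", []):
        # Lambda receives Kinesis record data as a base64-encoded string.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        requests.append({
            "trackingId": tracking_id,
            "userId": payload["user_id"],
            "sessionId": payload["session_id"],
            "eventList": [{
                "eventType": payload["event_type"],
                "itemId": payload["item_id"],
                "sentAt": datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc),
            }],
        })
    return requests


sample_payload = {"user_id": "u1", "session_id": "s1", "event_type": "click",
                  "item_id": "i42", "timestamp": 1700000000}
sample_event = {"Records": [{"kinesis": {
    "data": base64.b64encode(json.dumps(sample_payload).encode()).decode()}}]}
sample_requests = build_put_events_requests(sample_event, "example-tracking-id")
print(sample_requests[0]["eventList"][0]["eventType"])
```

Each dictionary could then be passed to the `put_events` method of a `boto3.client("personalize-events")` client, for example `client.put_events(**request)`.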
To learn how other customers have built personalized recommendations using Kinesis Data Streams, refer to the following:
Real-time IoT data streaming and inferencing with AWS IoT Core and Amazon SageMaker
From office lights that automatically turn on as you enter the room to medical devices that monitor a patient’s health in real time, a proliferation of smart devices is making the world more automated and connected. In technical terms, IoT is the network of devices that connect with the internet and can exchange data with other devices and software systems. Many organizations increasingly rely on the real-time data from IoT devices, such as temperature sensors and medical equipment, to drive automation, analytics, and AI systems. It’s important to choose a robust streaming solution that can achieve very low latency and handle high data throughput to power real-time AI inferencing.
With Kinesis Data Streams, IoT data across millions of devices can simultaneously write to a centralized data stream. Alternatively, you can use AWS IoT Core to securely connect and easily manage the fleet of IoT devices, collect the IoT data, and then ingest to Kinesis Data Streams for real-time transformation, analytics, and event-driven microservices. Then, you can use integrated services such as Amazon SageMaker for real-time inference. The following diagram depicts the high-level streaming architecture with IoT sensor data.
The steps are as follows:
Data originates in IoT devices such as medical devices, car sensors, and industrial IoT sensors. This telemetry data is collected using AWS IoT Greengrass, an open source IoT edge runtime and cloud service that helps your devices collect and analyze data closer to where the data is generated.
Event data is ingested into the cloud using edge-to-cloud interface services such as AWS IoT Core, a managed cloud platform that connects, manages, and scales devices effortlessly and securely. You can also use AWS IoT SiteWise, a managed service that helps you collect, model, analyze, and visualize data from industrial equipment at scale. Alternatively, IoT devices could send data directly to Kinesis Data Streams.
AWS IoT Core can stream ingested data into Kinesis Data Streams.
The ingested data gets transformed and analyzed in near real time using Amazon Managed Service for Apache Flink. Stream data can further be enriched using lookup data hosted in a data warehouse such as Amazon Redshift. Managed Service for Apache Flink can persist streamed data into Amazon Redshift after the customer’s integration and stream aggregation (for example, 1 minute or 5 minutes). The results in Amazon Redshift can be used for further downstream BI reporting services, such as QuickSight. Managed Service for Apache Flink can also write to a Lambda function, which can invoke SageMaker models. After the ML model is trained and deployed in SageMaker, inferences are invoked in a microbatch using Lambda. Inferenced data is sent to Amazon OpenSearch Service to create personalized monitoring dashboards using OpenSearch Dashboards. The transformed IoT sensor data can be stored in DynamoDB. You can use AWS AppSync to provide near real-time data queries to API services for downstream applications. These enterprise applications can be mobile apps or business applications to track and monitor the IoT sensor data in near real time.
The streamed IoT data can be written to an Amazon Data Firehose delivery stream, which microbatches data into Amazon S3 for future analytics.
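The stream aggregation mentioned in the steps above (for example, over 1 minute or 5 minutes) is commonly implemented as a tumbling window. The sketch below shows the idea in plain Python rather than in the Apache Flink API: each reading is assigned to a fixed, non-overlapping window based on its timestamp, and values are averaged per device and window. The tuple layout and device names are assumptions made for this example.

```python
# Sketch only: plain-Python illustration of a tumbling window, not Flink code.
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Reading = Tuple[float, str, float]  # (epoch seconds, device ID, sensor value)


def tumbling_window_average(readings: Iterable[Reading],
                            window_seconds: int = 60) -> Dict[Tuple[int, str], float]:
    """Average sensor values per device within fixed, non-overlapping windows."""
    totals = defaultdict(lambda: [0.0, 0])  # (window start, device) -> [sum, count]
    for timestamp, device_id, value in readings:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        bucket = totals[(window_start, device_id)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


sample_readings = [(0, "sensor-a", 10.0), (30, "sensor-a", 20.0), (61, "sensor-a", 30.0)]
print(tumbling_window_average(sample_readings))
```

In a managed Flink application the framework handles windowing, late data, and state for you; this sketch only shows what the aggregation computes.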
To learn how other customers have built IoT device monitoring solutions using Kinesis Data Streams, refer to:
This post demonstrated additional architectural patterns for building low-latency AI applications with Kinesis Data Streams and its integrations with other AWS services. Customers looking to build generative BI, recommendation systems, and IoT data streaming and inferencing can refer to these patterns as a starting point for designing their cloud architecture. We will continue to add new architectural patterns in future posts of this series.
For detailed architectural patterns, refer to the following resources:
If you want to build a data vision and strategy, check out the AWS Data-Driven Everything (D2E) program.
About the Authors
Raghavarao Sodabathina is a Principal Solutions Architect at AWS, focusing on Data Analytics, AI/ML, and cloud security. He engages with customers to create innovative solutions that address customer business problems and to accelerate the adoption of AWS services. In his spare time, Raghavarao enjoys spending time with his family, reading books, and watching movies.
Hang Zuo is a Senior Product Manager on the Amazon Kinesis Data Streams team at Amazon Web Services. He is passionate about developing intuitive product experiences that solve complex customer problems and enable customers to achieve their business goals.
Shwetha Radhakrishnan is a Solutions Architect for AWS with a focus in Data Analytics. She has been building solutions that drive cloud adoption and help organizations make data-driven decisions within the public sector. Outside of work, she loves dancing, spending time with friends and family, and traveling.
Brittany Ly is a Solutions Architect at AWS. She is focused on helping enterprise customers with their cloud adoption and modernization journey and has an interest in the security and analytics field. Outside of work, she loves to spend time with her dog and play pickleball.
Depending on your industry, you may need to install and run video surveillance. And once you have footage, you might be required to store it for a set period of days, months, or even years. This leads to the question: Where are you supposed to keep it all?
Not all storage systems are created equal, so it’s important to weigh the benefits and drawbacks of each option before making a decision. In some cases, government and industry regulations will require you to use a certain type of storage system. Ultimately, you will benefit from knowing how the system functions, what risks are involved, and how to select a technology provider.
This article will help you consider the pros and cons of on-premises, cloud, and hybrid storage systems. As you read, keep in mind that the amount of storage you need for your enterprise will depend on the number of cameras you have, the quality of the video footage, the length of time you are required to retain the footage, and various other factors.
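To get a first feel for how those factors combine, here is a back-of-the-envelope estimate in Python. It assumes every camera records at a constant bitrate, which real cameras with motion detection or variable-bitrate encoding will not do, so treat the result as a rough upper bound. The example figures (16 cameras, 4 Mbps, 30 days) are invented for illustration.

```python
# Sketch only: assumes a constant bitrate per camera and decimal terabytes.

def estimate_storage_tb(cameras: int, bitrate_mbps: float, retention_days: int,
                        hours_per_day: float = 24.0) -> float:
    """Estimate the storage needed, in terabytes (1 TB = 10**12 bytes)."""
    bytes_per_second = bitrate_mbps * 1_000_000 / 8  # megabits/s to bytes/s
    seconds_recorded = hours_per_day * 3600 * retention_days
    total_bytes = cameras * bytes_per_second * seconds_recorded
    return total_bytes / 1e12


# 16 cameras at 4 Mbps, recording around the clock, kept for 30 days.
print(round(estimate_storage_tb(cameras=16, bitrate_mbps=4, retention_days=30), 3))
```

With these inputs, each camera produces about 43.2 GB per day, so the fleet needs roughly 20.7 TB for a 30-day retention period.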
First Things First: Your Backup Strategy
No matter how or where you store your video surveillance footage, the most important thing you should do is establish a backup strategy that follows the 3-2-1 backup approach. That means you should have three copies of your data on two different media with one stored off-site. In this post, we’ll weigh the pros and cons of whether you keep that off-site copy stored at an off-site location like, say, an Iron Mountain storage facility, a remote office, or data center, or whether you keep that off-site copy in the cloud.
You might think we’re biased as a cloud provider. Of course, we’d love it if you choose to keep your backups with Backblaze! But the main thing we want to emphasize is that you should have a backup plan for your video surveillance footage (or any data, really!) whether it includes Backblaze or not. And, because you have to store one of those copies off-site, it’s miles easier (pun intended) to store in the cloud than to physically drive or mail hard drives to a secondary location.
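If it helps to see the 3-2-1 rule spelled out, the short Python sketch below checks a list of copies against it: at least three copies, on at least two kinds of media, with at least one off-site. The copy names and media labels are made up for the example.

```python
# Sketch only: copy names and media labels are hypothetical.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class BackupCopy:
    name: str
    media: str     # for example "nas", "tape", or "cloud"
    offsite: bool  # True if the copy lives away from the primary site


def satisfies_3_2_1(copies: Iterable[BackupCopy]) -> bool:
    """Return True if the copies meet the 3-2-1 backup approach."""
    copies = list(copies)
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    BackupCopy("recorder disk", "local-disk", offsite=False),
    BackupCopy("office NAS", "nas", offsite=False),
    BackupCopy("cloud bucket", "cloud", offsite=True),
]
print(satisfies_3_2_1(plan))
```

Dropping the cloud copy from this plan would fail the check twice over: only two copies would remain, and neither would be off-site.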
What Is On-Premises Storage?
Storing video footage on-premises means your data is stored on physical media—that is, servers, network attached storage (NAS), storage area network (SAN), LTO tape (linear tape open), etc.—in a physical location on your premises. We’ll talk about two forms of on-premises storage as they pertain to video footage: NAS and SAN.
Are NAS Devices Good for Storing Video Footage?
NAS devices have a large data storage capacity that provides file-based data storage services to other devices on a network. Usually, they also have a client or web portal interface, as well as services like QNAP’s Hybrid Backup Sync or Synology’s Hyper Backup to help manage your files.
One of the benefits of NAS is that it’s easy to set up and use, and you can upgrade internal drives over time. The main drawback when it comes to storing video surveillance footage is that its storage capacity is limited. Even if you buy a bigger device than you need right now, eventually you’ll run out of space and need to buy more, especially if you’re storing large amounts of video surveillance footage.
Is a SAN Good for Storing Video Footage?
On the other end of the spectrum, SANs are engineered for high-performance and mission-critical applications. They function by connecting multiple storage devices, such as disk arrays or tape libraries, to a dedicated network that is separate from the main local area network (LAN).
SANs offer high-speed data access, which is critical for handling large video streams from multiple cameras, and allow for seamless scalability. As video surveillance systems grow, SANs can accommodate additional cameras and storage without disrupting ongoing operations. They also provide enhanced data security by isolating block-level storage within the operating system layer, to protect against failures and unauthorized access. Managing SANs can be a bit complex, necessitating skilled administrators familiar with SAN architecture. Additionally, implementing SANs incurs upfront expenses for hardware, software, and expertise, while their reliance on centralized controllers poses a risk of impacting multiple cameras in case of failure.
What Is Cloud Storage?
Cloud storage enables you to securely store data and files in an off-site location. You can access this data through the public internet.
When you transfer data off-site for storage, the cloud storage provider (CSP) hosts, secures, manages, and maintains the servers and associated infrastructure, ensuring that you have seamless access to your data whenever you need it.
What Are the Benefits of Cloud Storage for Video Surveillance Footage?
Scalability: Cloud storage services allow you to dynamically adjust capacity as your video surveillance data volumes fluctuate.
Avoid capital expenses (CapEx): By leveraging cloud storage for video surveillance, your organization benefits from paying for storage technology and capacity as a service, rather than incurring the capital expenses associated with constructing and upkeeping in-house storage networks. As data volumes grow over time, your costs may increase, but there’s no need to overprovision storage networks in anticipation of future data expansion.
Security: Cloud surveillance systems enhance data security through unique user accounts and data encryption, which ensure that only authorized personnel can access the footage. This controlled access minimizes the risk of unauthorized viewing or tampering.
Accessibility: Cloud storage relies on an internet or network connection so authorized users can access surveillance footage remotely from anywhere using smart devices or web browsers. Whether you’re at the office, traveling, or even at home, you can review camera feeds without being physically present on-site. Keep in mind if the connection is lost or disrupted, access to video footage becomes challenging. This dependency can impact real-time monitoring and retrieval of critical data.
Just like our other storage strategies, there are drawbacks to cloud storage. For example, it relies on a stable internet connection. Video surveillance files are large, even when you apply compression techniques, which means that they take time and proper network connections to upload. So, if your internet connection goes down, it takes longer to get data properly stored or backed up than it would with other file types. That means you may not have real-time access to your data, or (in the worst cases) that you potentially risk file corruption if you don’t have a robust enough local storage infrastructure.
Similarly, businesses should evaluate the privacy and data ownership concerns. Storing video footage in the cloud means entrusting sensitive data to a third-party service provider. Make sure that your CSP meets or exceeds all regulatory or compliance requirements, like SOC 2 or ISO 27001, before you store data on their platforms.
All things considered, cloud storage offers scalability, ease of access, fine-tuned file control, and minimal maintenance, which are essential when dealing with the complexities of storing video surveillance footage.
Direct-to-Cloud Video Surveillance
Some companies choose to transfer video surveillance off-site to the cloud for backup purposes, while others push video footage directly to the cloud as a primary storage location, especially as there are several camera models and video surveillance solutions that are designed to easily push footage directly to cloud storage. When you’re choosing video surveillance hardware, it’s worth looking into whether they have this functionality, and if so, how much control you have over setting your storage destination to optimize costs.
And, if you’re using cloud storage as the primary storage for video footage, a multi-cloud setup can be used to ensure the primary copy in the cloud is backed up. A multi-cloud setup involves using multiple cloud service providers simultaneously, so if your video surveillance platform stores footage in its own cloud, you can still set up a workflow that backs up to a different CSP. For backup and archive purposes, organizations can distribute their data across different clouds to enhance reliability, reduce risk, create geographic diversity in storage locations for disaster recovery purposes, and comply with data retention policies. This approach ensures data availability even if one cloud provider experiences issues.
What Is Hybrid Cloud Storage?
Hybrid cloud storage combines elements from both public clouds and private clouds (typically on-premises systems). It’s essentially a unified management approach where an integrated infrastructure enables seamless movement of workloads and data between the private and public clouds.
Using a hybrid cloud for video surveillance makes sense for lots of use cases, including backup and archive. Let’s talk about how.
Backup: To deploy a hybrid approach for a video surveillance backup use case, you’d store all of your video surveillance footage in your on-premises systems, then store your backups in the cloud. Many NAS devices, for example, come with on-board backup utilities that allow you to store backups of your video surveillance footage directly in the cloud. You could also use third-party backup software to automatically back up your systems to the cloud. This hybrid approach gives you fast access to your footage via your on-premises storage, while protecting it with cloud backups.
Archive: To deploy a hybrid approach for a video surveillance archive use case, you’d store recent live recordings of your video surveillance footage on-premises. After a recurring cutoff date—whether in days or months—you then move old footage to a public cloud. This hybrid system allows you to access recent footage quickly while archiving older footage, particularly if you have retention requirements for compliance or cyber insurance purposes. If done right, this system can help your company comply with both short- and long-term industry requirements.
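As a simple sketch of the archive workflow described above, the Python below finds footage files older than a cutoff so they can be handed to whatever upload tool you use. It relies on each file's modification time as a stand-in for recording date, which is an assumption; your recorder may encode dates in file names or a database instead. The file names and the 30-day cutoff are invented for the demo.

```python
# Sketch only: uses file modification time as a proxy for recording date.
import os
import tempfile
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def footage_to_archive(root: Path, max_age_days: int,
                       now: Optional[float] = None) -> List[Path]:
    """List files under root last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path for path in root.rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


# Demo with two empty placeholder clips, one backdated by 40 days.
with tempfile.TemporaryDirectory() as tmp:
    footage_root = Path(tmp)
    old_clip = footage_root / "cam1_old.mp4"
    new_clip = footage_root / "cam1_new.mp4"
    old_clip.write_bytes(b"")
    new_clip.write_bytes(b"")
    forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
    os.utime(old_clip, (forty_days_ago, forty_days_ago))
    demo_result = [p.name for p in footage_to_archive(footage_root, max_age_days=30)]

print(demo_result)
```

A scheduled job could run a selection like this daily, upload the listed files to cloud storage, and delete the local copies only after the upload is verified.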
For a more in-depth look at hybrid cloud storage, check out our blog on hybrid cloud.
Is Hybrid Cloud Good for Video Surveillance Footage?
Leveraging hybrid cloud storage provides a dual advantage for video surveillance: swift local access to your video surveillance footage while simultaneously safeguarding it through off-site backups or off-loading it through a cloud archive. This strategic approach allows you to harness the strengths of both public and private clouds. Moreover, it offers enhanced scalability and flexibility compared to traditional on-premises solutions.
However, it’s essential to note that implementing a private cloud system can be cost-intensive. It necessitates budgeting for hardware acquisitions and replacements over time. Additionally, you’ll likely need to allocate resources for dedicated staff to maintain servers and backup strategies.
The Verdict: Which Type of Storage Is Best for Video Surveillance?
Choosing the right video surveillance storage solution is a critical decision for any organization. On-premises, cloud, and hybrid cloud each have their merits and drawbacks. While on-premises solutions offer large data storage capacity that is easy to set up and use, they require significant infrastructure investment. Cloud storage provides data accessibility and scales seamlessly while optimizing cost-effectiveness. Hybrid cloud provides both rapid local access to your video surveillance footage and secure off-site backups.
Ultimately, the choice depends on your specific needs, budget, and long-term strategy. Consider the trade-offs carefully to ensure seamless and reliable video storage for your surveillance system.
In part two of this series, we’ll walk through an example to show you how to use Security Lake and other AWS services and tools to drive an incident to resolution.
NIST SP 800-61 describes a set of steps you use to resolve an incident. These include preparation (Stage 1), detection and analysis (Stage 2), containment, eradication and recovery (Stage 3), and finally post-incident activities (Stage 4).
Figure 1 shows the workflow of incident response defined by NIST SP 800-61. The response flows from Stage 1 through Stage 4, with Stages 2 and 3 often being an iterative process. We will discuss the value of Security Lake at each stage of the NIST incident response handling process, with a focus on preparation, detection, and analysis.
Preparation helps you ensure that tools, processes, and people are prepared for incident response. In some cases, preparation can also help you identify systems, networks, and applications that might not be sufficiently secure. For example, you might determine you need certain system logs for incident response, but discover during preparation that those logs are not enabled.
Figure 2 shows how Security Lake can accelerate the preparation stage during the incident response process. Through native integration with various security data sources from both AWS services and third-party tools, Security Lake simplifies the integration and concentration of security data, which also facilitates training and rehearsal for incident response.
Figure 2: Amazon Security Lake data consolidation for IR preparation
Some challenges in the preparation stage include the following:
Insufficient incident response planning, training, and rehearsal – Time constraints or insufficient resources can slow down preparation.
Complexity of system integration and data sources – An increasing number of security data sources and integration points require additional integration effort, or increase the risk that some log sources are not integrated.
Centralized log repository for mixed environments – Customers with both on-premises and cloud infrastructure told us that consolidating logs for those mixed environments was a challenge.
Security Lake can help you deal with these challenges in the following ways:
Simplify system integration with security data normalization
Security Lake provides a central repository to store your security log data from various data sources with less integration effort.
Streamline data consolidation across mixed environments
Security Lake supports multiple log sources, including AWS native services and custom sources, which include third-party partner solutions, other cloud platforms and your on-premises log sources. For example, see this blog post to learn how to ingest Microsoft Azure activity logs into Security Lake.
Facilitate IR planning and testing
Security Lake reduces the undifferentiated heavy lifting needed to get security data into tooling so teams spend less time on configuration and data extract, transform, and load (ETL) work and more time on preparedness.
With a purpose-built security data lake and data retention policies that you define, security teams can integrate data-driven decision making into their planning and testing, answering questions such as “which incident handling capabilities do we prioritize?” and running Well-Architected game days.
Stages 2 and 3: Detection and Analysis, Containment, Eradication and Recovery
The Detection and Analysis stage (Stage 2) should lead you to understand the immediate cause of the incident and what steps need to be taken to contain it. Once contained, it’s critical to fully eradicate the issue. These steps form Stage 3 of the incident response cycle. You want to ensure that those malicious artifacts or exploits are removed from systems and verify that the impacted service has recovered from the incident.
Figure 3 shows how Security Lake can enable effective detection and analysis. Doing so enables teams to quickly contain, eradicate, and recover from the incident. Security Lake natively integrates with other AWS analytics services, such as Amazon Athena, Amazon QuickSight, and Amazon OpenSearch Service, which makes it easier for your security team to generate insights on the nature of the incident and to take relevant remediation steps.
Figure 3: Amazon Security Lake accelerates IR Detection and Analysis, Containment, Eradication, and Recovery
Common challenges present in stages 2 and 3 include the following:
Challenges generating insights from disparate data sources
Inability to generate insights from security data means teams are less likely to discover an incident, as opposed to having the breach revealed to them by a third party (such as a threat actor).
Breaches disclosed by a threat actor might involve higher costs than incidents discovered by the impacted organizations themselves, because typically the unintended access has progressed for longer and impacted more resources and data than if the impacted organization discovered it sooner.
Inconsistency of data visibility and data siloing
Security log data silos may slow IR data analysis because it’s challenging to gather and correlate the necessary information to understand the full scope and impact of an incident. This can lead to delays in identifying the root cause, assessing the damage, and taking remediation steps.
Data silos might also mean additional permissions management overhead for administrators.
Disparate data sources add barriers to adopting new technology, such as AI-driven security analytics tools
AI-driven security analysis requires a large amount of security data from various data sources, which might be in disparate formats. Without a centralized security data repository, you might need to make additional effort to ingest and normalize data for model training.
Security Lake offers native support for log ingestion for a range of AWS security services, including AWS CloudTrail, AWS Security Hub, and VPC Flow Logs. Additionally, you can configure Security Lake to ingest external sources. This helps enrich findings and alerts.
Security Lake addresses the preceding challenges as follows:
Unleash security detection capability by centralizing detection data
With a purpose-built security data lake with a standard object schema, organizations can centrally access their security data—AWS and third-party—using the same set of IR tools. This can help you investigate incidents that involve multiple resources and complex timelines, which could require access logs, network logs, and other security findings. For example, use Amazon Athena to query all your security data. You can also build a centralized security finding dashboard with Amazon QuickSight.
Reduce management burden
With Security Lake, permissions complexity is reduced. You use the same access controls in AWS Identity and Access Management (IAM) to make sure that only the right people and systems have access to sensitive security data.
See this blog post for more details on generating machine learning insights for Security Lake data by using Amazon SageMaker.
Stage 4: Post-Incident Activity
Continuous improvement helps customers to further develop their IR capabilities. Teams should integrate lessons learned into their tools, policies, and processes. You decide on lifecycle policies for your security data. You can then retroactively review event data for insight and to support lessons learned. You can also share security telemetry at levels of granularity you define. Your organization can then establish distributed data views for forensic purposes and other purposes, while enforcing least privilege for data governance.
Figure 4 shows how Security Lake can accelerate the post-incident activity stage during the incident response process. Security Lake natively integrates with AWS Organizations to enable data sharing across various OUs within the organization, which further unleashes the power of machine learning to automatically create insights for incident response.
Figure 4: Security Lake accelerates post-incident activity
Having covered some advantages of working with your data in Security Lake, we will now demonstrate best practices for getting Security Lake set up.
Setting up for success with Security Lake
Most of the customers we work with run multiple AWS accounts, usually with AWS Organizations. With that in mind, we’re going to show you how to set up Security Lake and related tooling in line with guidance in the AWS Security Reference Architecture (AWS SRA). The AWS SRA provides guidance on how to deploy AWS security services in a multi-account environment. You will have one AWS account for security tooling and a different account to centralize log storage. You’ll run Security Lake in this log storage account.
If you just want to use Security Lake in a standalone account, follow these instructions.
Set up Security Lake in your logging account
Most of the instructions we link to in this section describe the process using either the console or AWS CLI tools. Where necessary, we’ve described the console experience for illustrative purposes.
The AmazonSecurityLakeAdministrator AWS managed IAM policy grants the permissions needed to set up Security Lake and related services. Note that you may want to further refine permissions, or remove that managed policy after Security Lake and the related services are set up and running.
To set up Security Lake in your logging account
Note down the AWS account number that will be your delegated administrator account. This will be your centralized archive logs account. In the AWS Management Console, sign in to your Organizations management account and set up delegated administration for Security Lake.
Sign in to the delegated administrator account, go to the Security Lake console, and choose Get started. Then follow these instructions from the Security Lake User Guide. While you’re setting this up, note the following specific guidance (this will make it easier to follow the second blog post in this series):
Define source objective: For Sources to ingest, we recommend that you select Ingest the AWS default sources. However, if you want to include S3 data events, you’ll need to select Ingest specific AWS sources and then select CloudTrail – S3 data events. Note that we use these events for responding to the incident in blog post part 2, when we really drill down into user activity.
Figure 5 shows the configuration of sources to ingest in Security Lake.
Figure 5: Sources to ingest in Security Lake
We recommend leaving the other settings on this page as they are.
Define target objective: We recommend that you choose Add rollup Region and add multiple AWS Regions to a designated rollup Region. The rollup Region is the one to which you will consolidate logs. The contributing Region is the one that will contribute logs to the rollup Region.
Figure 6 shows how to select the rollup regions.
Figure 6: Select rollup Regions
You now have Security Lake enabled, and in the background, additional services such as AWS Lake Formation and AWS Glue have been configured to organize your Security Lake data.
Subscribers are specific to a Region, so you want to make sure that you set up your subscriber in the same Region as your rollup Region.
You will also set up an External ID. This is a value you define, and it’s used by the IAM role to prevent the confused deputy problem. Note that the subscriber will be your security tooling account.
You will select Lake Formation for Data access, which will create shares in AWS Resource Access Manager (AWS RAM) that will be shared with the account that you specified in Subscriber credentials.
If you’ve already set up Security Lake at some time in the past, you should select Specific log and event sources and confirm the source and version you want the subscriber to access. If it’s a new implementation, we recommend using version 2.0 or greater.
There’s a note in the console that says the subscribing account will need to accept the RAM resource shares. However, if you’re using AWS Organizations, you don’t need to do that; the resource share will already list a status of Active when you select Shared with me >> Resource shares in the RAM console of the subscriber (security tooling) account.
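For readers unfamiliar with how an external ID guards against the confused deputy problem, the sketch below builds the general shape of an IAM role trust policy that only allows a role to be assumed when the caller supplies the agreed external ID. Security Lake creates and manages the relevant role for you, so this is purely illustrative of the mechanism; the account ID and external ID values are placeholders.

```python
# Sketch only: illustrates the external ID mechanism; values are hypothetical.
import json
from typing import Any, Dict


def trust_policy_with_external_id(subscriber_account_id: str,
                                  external_id: str) -> Dict[str, Any]:
    """Build a trust policy that requires a matching external ID on AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{subscriber_account_id}:root"},
            "Action": "sts:AssumeRole",
            # The request is denied unless the caller passes this exact value.
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }


example_policy = trust_policy_with_external_id("111122223333", "example-external-id")
print(json.dumps(example_policy, indent=2))
```

Because the external ID is a value only you and the subscriber know, a third party that merely knows the role ARN cannot trick a service into assuming the role on its behalf.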
Note: If you prefer a visual guide, you can refer to this video to set up Security Lake in AWS Organizations.
Set up Amazon Athena and AWS Lake Formation in the security tooling account
Go to the Lake Formation console in the security tooling account and follow the instructions to create resource links for the shared Security Lake tables. You’ll most likely use the Default database and will see your tables there. The table names in that database start with amazon_security_lake_table. You should expect to see about eight tables there.
Figure 7 shows the shared tables in the Lake Formation service console.
Figure 7: Shared tables in Lake Formation
You will need to create resource links for each table, as described in the instructions from the Lake Formation Developer Guide.
Figure 8 shows the resource link creation process.
Figure 8: Creating resource links
Next, go to Amazon Athena in the same Region. If Athena is not set up, follow the instructions to get it set up for SQL queries. Note that you won’t need to create a database—you’re going to use the “default” database that already exists. Select it from the Database drop-down menu in the Query editor view.
In the Tables section, you should see all your Security Lake tables (represented by whatever names you gave them when you created the resource links in step 1, earlier).
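Once the tables are visible, you can also query them programmatically. The sketch below starts an Athena query against one of the resource-linked tables in the default database. The client is passed in so the function can be tried locally with the stub shown; in practice you would likely pass `boto3.client("athena")`. The table name and the S3 output location are hypothetical, so substitute the names you gave your resource links and your own results bucket.

```python
# Sketch only: the table name, output bucket, and stub client are hypothetical.
from typing import Any, Dict


def start_security_lake_query(athena_client: Any, table: str, output_location: str,
                              database: str = "default", limit: int = 10) -> str:
    """Start an Athena query that samples rows from a Security Lake table."""
    query = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class StubAthena:
    """Local stand-in that records the request instead of calling AWS."""

    def __init__(self) -> None:
        self.last_request: Dict[str, Any] = {}

    def start_query_execution(self, **kwargs: Any) -> Dict[str, str]:
        self.last_request = kwargs
        return {"QueryExecutionId": "example-execution-id"}


stub = StubAthena()
execution_id = start_security_lake_query(
    stub, "amazon_security_lake_table_example", "s3://example-athena-results/")
print(execution_id, stub.last_request["QueryString"])
```

Athena runs queries asynchronously, so a real caller would poll for completion and then fetch the results using the returned execution ID.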
Get your incident response playbooks ready
Incident response playbooks are an important tool that enables responders to work more effectively and consistently, and helps the organization resolve incidents more quickly. We’ve created some ready-to-go templates to get you started. You can further customize these templates to meet your needs. In part two of this post, you’ll be using the Unintended Data Access to an Amazon Simple Storage Service (Amazon S3) bucket playbook to resolve an incident. You can download that playbook so that you’re ready to follow it to get that incident resolved.
Conclusion
This is the first post in a two-part series about accelerating security incident response with Security Lake. We highlighted common challenges that decelerate customers’ incident responses across the stages outlined by NIST SP 800-61 and how Security Lake can help you address those challenges. We also showed you how to set up Security Lake and related services for incident response.
In the second part of this series, we’ll walk through a specific security incident—unintended data access—and share prescriptive guidance on using Security Lake to accelerate your incident response process.
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
Over the past six and a half years, Toest has established itself as a reputable media outlet that provides a high-quality, reliable, and analytical reading of public events in Bulgaria.
Since its founding in early 2018, Toest has been funded solely by its readers. The strong support of our loyal audience so far gives us confidence that Bulgaria has a sufficient critical mass of active citizens who, through their small but regular monthly donations, can fully secure the outlet's journalistic independence.
Over the years we have relied on organic growth, but we need to reach a larger audience and broaden reader support so that we can offer even richer and more varied content.
Thanks to targeted funding from the Central European Media Program of the international organizations ZINC and IREX, we are now about to build a clear strategy for expanding and engaging our audience, secure a good marketing budget, and receive know-how and mentoring from international experts in media marketing and communications.
In the meantime, we are looking for a marketing specialist to develop Toest together with us.
If you
are a proactive, dynamic, and results-oriented person,
love challenges and have the mindset to keep learning,
have original ideas and imagination and readily step outside the established marketing clichés (which are often inapplicable to media like ours),
actively follow the news and public events in Bulgaria and abroad,
believe, as we do, in democratic values and in the power of alert citizens to change our society for the better,
have a strong desire and motivation to work in an independent media outlet,
… then you are exactly who we are looking for!
See our job offer and join our wonderful, dedicated, and close-knit team!
What else you need to have:
a bachelor's degree in marketing, communications, or another field in economics or the humanities;
excellent written and spoken English;
practical experience in marketing, communications, or copywriting, and in the strategic planning and execution of campaigns (candidates with experience in the news or publishing industry have an advantage);
confidence in using digital marketing tools and platforms, social media management software, and email marketing services;
strong analytical skills and strategic and creative thinking, with great attention to detail and experience in data-driven decision making;
excellent communication and teamwork skills;
the ability to write and express yourself clearly and correctly, with a feel for the nuances of language;
the ability to manage several campaigns at once and to meet deadlines.
Main responsibilities:
developing and executing integrated marketing campaigns that align with the organization's goals and with market trends;
managing and optimizing digital marketing channels, including social networks (through organic and paid campaigns) and email marketing, in order to expand the audience and increase its engagement;
analyzing market trends and reader preferences to identify growth opportunities and improve the editorial strategy;
working closely with the editorial team to ensure consistent messaging and brand positioning across all platforms;
tracking and reporting campaign performance, using data analysis in decision making and in adjusting the strategy;
communicating with the audience and building a community;
communicating with external partners, vendors, and marketing agencies to ensure the success of campaigns and the effective use of the budget;
constantly thinking about how to improve the Toest brand and its market position.
What we offer:
a base net salary of 2,500 BGN under an employment contract, supplemented by all taxes and health and pension contributions due by law;
a bonus of 10% of all new revenue resulting from the marketing, communications, and community-building efforts;
training and mentoring from international experts in media marketing and communications;
flexible and remote working arrangements;
work in a dynamic environment with an exceptionally dedicated team.
Organization: Toest Public Benefit Foundation
Location: remote
Commitment: full-time
The iomap
block-mapping abstraction is being used by more filesystems, in part
because of its support for large folios. But there are some challenges in
adopting iomap, which was the topic of a discussion led by Ritesh Harjani
in a combined storage and filesystem session at the 2024 Linux Storage,
Filesystem, Memory Management, and BPF Summit. One of the main trouble
spots is how to handle metadata, which is not an area that iomap has been aimed
at.
In the final session in the memory-management track of the 2024 Linux Storage,
Filesystem, Memory-Management and BPF Summit, the exhausted group of
developers looked one more time at the use of huge pages and the associated
problem of memory fragmentation. At its worst, this problem can make huge
pages harder (and more expensive) to allocate. Luis Chamberlain, who ran
the session, felt that people were worried about this problem, but that
there was little data on how severe it truly is.
A longstanding tradition in the memory-management track of the Linux Storage,
Filesystem, Memory-Management and BPF Summit is a session with
maintainer Andrew Morton to discuss the overall state of the community and
the development process. The 2024 gathering upheld that tradition toward
the end of the final day of the event. It seems that Morton and the
assembled developers were all happy with how memory-management work is
going, but there is always room for improvement.
Security updates have been issued by Debian (less), Mageia (chromium-browser-stable), SUSE (apache2, java-1_8_0-openj9, kernel, libqt5-qtnetworkauth, and openssl-3), and Ubuntu (netatalk and python-cryptography).
A full conference pass is $1,099. Register today with the code flashsale150 to receive a limited time $150 discount, while supplies last.
We’re counting down to AWS re:Inforce, our annual cloud security event! We are thrilled to invite security enthusiasts and builders to join us in Philadelphia, PA, from June 10–12 for an immersive two-and-a-half-day journey into cloud security learning. This year, we’ve expanded the event by half a day to give you more opportunities to delve into the latest security trends and technologies. At AWS re:Inforce, you’ll have the chance to explore the breadth of the Amazon Web Services (AWS) security landscape, learn how to operationalize security services, and enhance your skills and confidence in cloud security to improve your organization’s security posture. As an attendee, you will have access to over 250 sessions across multiple topic tracks, including data protection; identity and access management; threat detection and incident response; network and infrastructure security; generative AI; governance, risk, and compliance; and application security. Plus, get ready to be inspired by our lineup of customer speakers, who will share their firsthand experiences of innovating securely on AWS.
In this post, we’ll provide an overview of the key sessions that include lecture-style presentations featuring real-world use cases from our customers, as well as the interactive small-group sessions led by AWS experts that guide you through practical problems and solutions.
The threat detection and incident response track is designed to demonstrate how to detect and respond to security risks to help protect workloads at scale. AWS experts and customers will present key topics such as threat detection, vulnerability management, cloud security posture management, threat intelligence, operationalization of AWS security services, container security, effective security investigation, incident response best practices, and strengthening security through the use of generative AI and securing generative AI workloads.
Breakout sessions, chalk talks, and lightning talks
TDR201 | Breakout session | How NatWest uses AWS services to manage vulnerabilities at scale
As organizations move to the cloud, rapid change is the new normal. Safeguarding against potential security threats demands continuous monitoring of cloud resources and code that are constantly evolving. In this session, NatWest shares best practices for monitoring their AWS environment for software and configuration vulnerabilities at scale using AWS security services like Amazon Inspector and AWS Security Hub. Learn how security teams can automate the identification and prioritization of critical security insights to manage alert fatigue and swiftly collaborate with application teams for remediation.
TDR301 | Breakout session | Developing an autonomous framework with Security Lake & Torc Robotics
Security teams are increasingly seeking autonomy in their security operations. Amazon Security Lake is a powerful solution that allows organizations to centralize their security data across AWS accounts and Regions. In this session, learn how Security Lake simplifies centralizing and operationalizing security data. Then, hear from Torc Robotics, a leading autonomous trucking company, as they share their experience and best practices for using Security Lake to establish an autonomous security framework.
TDR302 | Breakout session | Detecting and responding to threats in generative AI workloads
While generative AI is an emerging technology, many of the same services and concepts can be used for threat detection and incident response. In this session, learn how you can build out threat detection and incident response capabilities for a generative AI workload that uses Amazon Bedrock. Find out how to effectively monitor this workload using Amazon Bedrock, Amazon GuardDuty, and AWS Security Hub. The session also covers best practices for responding to and remediating security issues that may come up.
TDR303 | Breakout session | Innovations in AWS detection and response services
In this session, learn about the latest advancements and recent AWS launches in the field of detection and response. This session focuses on use cases like threat detection, workload protection, automated and continual vulnerability management, centralized monitoring, continuous cloud security posture management, unified security data management, and discovery and protection of workloads and data. Through these use cases, gain a deeper understanding of how you can seamlessly integrate AWS detection and response services to help protect your workloads at scale, enhance your security posture, and streamline security operations across your entire AWS environment.
TDR304 | Breakout session | Explore cloud workload protection with GuardDuty, feat. Booking.com
Monitoring your workloads at runtime allows you to detect unexpected activity sooner, before it escalates to broader business-impacting security issues. Amazon GuardDuty Runtime Monitoring offers fully managed threat detection that gives you end-to-end visibility across your AWS environment. GuardDuty’s unique detection capabilities are guided by AWS’s visibility into the cloud threat landscape. In this session, learn why AWS built the Runtime Monitoring feature and how it works. Also discover how Booking.com used GuardDuty for runtime protection, supporting their mission to make it easier for everyone to experience the world.
TDR305 | Breakout session | Cyber threat intelligence sharing on AWS
Real-time, contextual, and comprehensive visibility into security issues is essential for resilience in any organization. In this session, join the Australian Cyber Security Centre (ACSC) as they present their Cyber Threat Intelligence Sharing (CTIS) program, built on AWS. With the aim to improve the cyber resilience of the Australian community and help make Australia the most secure place to connect online, the ACSC protects Australia from thousands of threats every day. Learn the technical fundamentals that can help you apply best practices for real-time, bidirectional sharing of threat intelligence across all sectors.
TDR331 | Chalk talk | Unlock OCSF: Turn raw logs into insights with generative AI
So, you have security data stored using the Open Cybersecurity Schema Framework (OCSF). Now what? In this chalk talk, learn how to use AWS analytics tools to mine data stored using the OCSF and leverage generative AI to consume insights. Discover how services such as Amazon Athena, Amazon Q in QuickSight, and Amazon Bedrock can extract, process, and visualize security insights from OCSF data. Gain practical skills to identify trends, detect anomalies, and transform your OCSF data into actionable security intelligence that can help your organization respond more effectively to cybersecurity threats.
TDR332 | Chalk talk | Anatomy of a ransomware event targeting data within AWS
Ransomware events can interrupt operations and cost governments, nonprofits, and businesses billions of dollars. Early detection and automated responses are important mechanisms that can help mitigate your organization’s exposure. In this chalk talk, learn about the anatomy of a ransomware event targeting data within AWS including detection, response, and recovery. Explore the AWS services and features that you can use to protect against ransomware events in your environment, and learn how you can investigate possible ransomware events if they occur.
TDR333 | Chalk talk | Implementing AWS security best practices: Insights and strategies
Have you ever wondered if you are using AWS security services such as Amazon GuardDuty, AWS Security Hub, AWS WAF, and others to the best of their ability? Do you want to dive deep into common use cases to better operationalize AWS security services through insights developed via thousands of deployments? In this chalk talk, learn tips and tricks from AWS experts who have spent years talking to users and documenting guidance outlining AWS security services best practices.
TDR334 | Chalk talk | Unlock your security superpowers with generative AI
Generative AI can accelerate and streamline the process of security analysis and response, enhancing the impact of your security operations team. Its unique ability to combine natural language processing with large existing knowledge bases and agent-based architectures that can interact with your data and systems makes it an ideal tool for augmenting security teams during and after an event. In this chalk talk, explore how generative AI will shape the future of the SOC and lead to new capabilities in incident response and cloud security posture management.
TDR431 | Chalk talk | Harnessing generative AI for investigation and remediation
To help businesses move faster and deliver security outcomes, modern security teams need to identify opportunities to automate and simplify their workflows. One way of doing so is through generative AI. Join this chalk talk to learn how to identify use cases where generative AI can help with investigating, prioritizing, and remediating findings from Amazon GuardDuty, Amazon Inspector, and AWS Security Hub. Then find out how you can develop architectures from these use cases, implement them, and evaluate their effectiveness. The talk offers tenets for generative AI and security that can help you safely use generative AI to reduce cognitive load and increase focus on novel, high-value opportunities.
TDR432 | Chalk talk | New tactics and techniques for proactive threat detection
This insightful chalk talk is led by the AWS Customer Incident Response Team (CIRT), the team responsible for swiftly responding to security events on the customer side of the AWS Shared Responsibility Model. Discover the latest trends in threat tactics and techniques observed by the CIRT, along with effective detection and mitigation strategies. Gain valuable insights into emerging threats and learn how to safeguard your organization’s AWS environment against evolving security risks.
TDR433 | Chalk talk | Incident response for multi-account and federated environments
In this chalk talk, AWS security experts guide you through the lifecycle of a compromise involving federation and third-party identity providers. Learn how AWS detects unauthorized access and which approaches can help you respond to complex situations involving organizations with multiple accounts. Discover insights into how you can contain and recover from security events and discuss strong IAM policies, appropriately restrictive service control policies, and resource termination for security event containment. Also, learn how to build resiliency in an environment with IAM permission refinement, organizational strategy, detective controls, chain of custody, and IR break-glass models.
TDR227 | Lightning talk | How Razorpay scales threat detection using AWS
Discover how Razorpay, a leading payment aggregator solution provider authorized by the Reserve Bank of India, efficiently manages millions of business transactions per minute through automated security operations using AWS security services. Join this lightning talk to explore how Razorpay’s security operations team uses AWS Security Hub, Amazon GuardDuty, and Amazon Inspector to monitor their critical workloads on AWS. Learn how they orchestrate complex workflows, automate responses to security events, and reduce the time from detection to remediation.
TDR321 | Lightning talk | Scaling incident response with AWS developer tools
In incident response, speed matters. Responding to incidents at scale can be challenging as the number of resources in your AWS accounts increases. In this lightning talk, learn how to use SDKs and the AWS Command Line Interface (AWS CLI) to rapidly run commands across your estate so you can quickly retrieve data, identify issues, and resolve security-related problems.
TDR322 | Lightning talk | How Snap Inc. secures its services with Amazon GuardDuty
In this lightning talk, discover how Snap Inc. established a secure multi-tenant compute platform on AWS and mitigated security challenges within shared Kubernetes clusters. Snap uses Amazon GuardDuty and the OSS tool Falco for runtime protection across build time, deployment time, and runtime phases. Explore Snap’s techniques for facilitating one-time cluster access through AWS IAM Identity Center. Find out how Snap has implemented isolation strategies between internal tenants using the Pod Security Standards (PSS) and network policies enforced by the Amazon VPC Container Network Interface (CNI) plugin.
TDR326 | Lightning talk | Streamlining security auditing with generative AI
For identifying and responding to security-related events, collecting and analyzing logs is only the first step. Beyond this initial phase, you need to utilize tools and services to parse through logs, understand baseline behaviors, identify anomalies, and create automated responses based on the type of event. In this lightning talk, learn how to effectively parse security logs, identify anomalies, and receive response runbooks that you can implement within your environment.
Interactive sessions (builders’ sessions, code talks, and workshops)
TDR351 | Builders’ session | Accelerating incident remediation with IR playbooks & Amazon Detective
In this builders’ session, learn how to investigate incidents more effectively and discover root cause with Amazon Detective. Amazon Detective provides finding-group summaries by using generative AI to automatically analyze finding groups. Insights in natural language then help you accelerate security investigations. Find out how you can create your own incident response playbooks and test them by handling multi-event security issues.
TDR352 | Builders’ session | How to automate containment and forensics for Amazon EC2
Automated Forensics Orchestrator for Amazon EC2 deploys a mechanism that uses AWS services to orchestrate and automate key digital forensics processes and activities for Amazon EC2 instances in the event of a potential security issue being detected. In this builders’ session, learn how to deploy and scale this self-service AWS solution. Explore the prerequisites, learn how to customize it for your environment, and experience forensic analysis on live artifacts to identify what potential unauthorized users could do in your environment.
TDR353 | Builders’ session | Preventing top misconfigurations associated with security events
Have you ever wondered how you can prevent top misconfigurations that could lead to a security event? Join this builders’ session, where the AWS Customer Incident Response Team (CIRT) reviews some of the most commonly observed misconfigurations that can lead to security events. Then learn how to build mechanisms using AWS Security Hub and other AWS services that can help detect and prevent these issues.
TDR354 | Builders’ session | Insights in your inbox: Build email reporting with AWS Security Hub
AWS Security Hub provides you with a comprehensive view of the security state of your AWS resources by collecting security data from across AWS accounts, AWS Regions, and AWS services. In this builders’ session, learn how to set up a customizable and automated summary email that distills security posture information, insights, and critical findings from Security Hub. Get hands-on with the Security Hub console and discover easy-to-implement code examples that you can use in your own organization to drive security improvements.
TDR355 | Builders’ session | Detecting ransomware and suspicious activity in Amazon RDS
In this builders’ session, acquire skills that can help you detect and respond to threats targeting AWS databases. Using services such as AWS Cloud9 and AWS CloudFormation, simulate real-world intrusions on Amazon RDS and Amazon Aurora and use Amazon Athena to detect unauthorized activities. The session also covers strategies from the AWS Customer Incident Response Team (CIRT) for rapid incident response and configuring essential security settings to enhance your database defenses. The session provides practical experience in configuring audit logging and enabling termination protection to ensure robust database security measures.
TDR451 | Builders’ session | Create a generative AI runbook to resolve security findings
Generative AI has the potential to accelerate and streamline security analysis, response, and recovery, enhancing the effectiveness of human engagement. In this builders’ session, learn how to use Amazon SageMaker notebooks and Amazon Bedrock to quickly resolve security findings in your AWS account. You rely on runbooks for the day-to-day operations, maintenance, and troubleshooting of AWS services. With generative AI, you can gain deeper insights into security findings and take the necessary actions to streamline security analysis and response.
TDR441 | Code talk | How to use generative AI to gain insights in Amazon Security Lake
In this code talk, explore how you can use generative AI to gather enhanced security insights within Amazon Security Lake by integrating Amazon SageMaker Studio and Amazon Bedrock. Learn how AI-powered analytics can help rapidly identify and respond to security threats. By using large language models (LLMs) within Amazon Bedrock to process natural language queries and auto-generate SQL queries, you can expedite security investigations, focusing on relevant data sources within Security Lake. The talk includes a threat analysis exercise to demonstrate the effectiveness of LLMs in addressing various security queries. Learn how you can streamline security operations and gain actionable insights to strengthen your security posture and mitigate risks effectively within AWS environments.
TDR442 | Code talk | Security testing, the practical way
Join this code talk for a practical demonstration of how to test security capabilities within AWS. The talk can help you evaluate and quantify your detection and response effectiveness against key metrics like mean time to detect and mean time to resolution. Explore testing techniques that use open source tools alongside AWS services such as Amazon GuardDuty and AWS WAF. Gain insights into testing your security configurations in your environment and uncover best practices tailored to your testing scenarios. This talk equips you with actionable strategies to enhance your security posture and establish robust defense mechanisms within your AWS environment.
TDR443 | Code talk | How to conduct incident response in your Amazon EKS environment
Join this code talk to gain insights from both adversaries’ and defenders’ perspectives as AWS experts simulate a live security incident within an application across multiple Amazon EKS clusters, invoking an alert in Amazon GuardDuty. Witness the incident response process as experts demonstrate detection, containment, and recovery procedures in near real time. Through this immersive experience, learn how you can effectively respond to and recover from Amazon EKS-specific incidents, and gain valuable insights into incident handling within cloud environments. Don’t miss this opportunity to enhance your incident response capabilities and learn how to more effectively safeguard your AWS infrastructure.
TDR444 | Code talk | Identity forensics in the realm of short-term credentials
AWS Security Token Service (AWS STS) is a common way for users to access AWS services and allows you to utilize role chaining for navigating AWS accounts. When investigating security incidents, understanding the history and potential impact is crucial. Examining a single session is often insufficient because the initial abused credential may be different from the one that precipitated the investigation, and other tokens might be generated. Also, a single session investigation may not encompass all permissions that the adversary controls, due to trust relationships between the roles. In this code talk, learn how you can construct identity forensics capabilities using Amazon Detective and create a custom graph database using Amazon Neptune.
TDR371-R | Workshop | Threat detection and response on AWS
Join AWS experts for an immersive threat detection and response workshop using Amazon GuardDuty, Amazon Inspector, AWS Security Hub, and Amazon Detective. This workshop simulates security events for different types of resources and behaviors and illustrates both manual and automated responses with AWS Lambda. Dive in and learn how to improve your security posture by operationalizing threat detection and response on AWS.
TDR372-R | Workshop | Container threat detection and response with AWS security services
Join AWS experts for an immersive container security workshop using AWS threat detection and response services. This workshop simulates scenarios and security events that may arise while using Amazon ECS and Amazon EKS. The workshop also demonstrates how to use different AWS security services to detect and respond to potential security threats, as well as suggesting how you can improve your security practices. Dive in and learn how to improve your security posture when running workloads on AWS container orchestration services.
TDR373-R | Workshop | Vulnerability management with Amazon Inspector and Jenkins
Join AWS experts for an immersive vulnerability management workshop using Amazon Inspector and Jenkins for continuous integration and continuous delivery (CI/CD). This workshop takes you through approaches to vulnerability management with Amazon Inspector for EC2 instances, container images residing in Amazon ECR and within CI/CD tools, and AWS Lambda functions. Explore the integration of Amazon Inspector with Jenkins, and learn how to operationalize vulnerability management on AWS.
Browse the full re:Inforce catalog to learn more about sessions in other tracks, plus gamified learning, innovation sessions, partner sessions, and labs.
Our comprehensive track content is designed to help arm you with the knowledge and skills needed to securely manage your workloads and applications on AWS. Don’t miss out on the opportunity to stay updated with the latest best practices in threat detection and incident response. Join us in Philadelphia for re:Inforce 2024 by registering today. We can’t wait to welcome you!
Quantum computers are probably coming, though we don’t know when, and when they arrive, they will most likely be able to break our standard public-key cryptography algorithms. In anticipation of this possibility, cryptographers have been working on quantum-resistant public-key algorithms. The National Institute of Standards and Technology (NIST) has been hosting a competition since 2017, and there already are several proposed standards. Most of these are based on lattice problems.
The mathematics of lattice cryptography revolves around combining sets of vectors (that’s the lattice) in a multi-dimensional space. These lattices are filled with multi-dimensional periodicities. The hard problem that’s used in cryptography is to find the shortest periodicity in a large, random-looking lattice. This can be turned into a public-key cryptosystem in a variety of different ways. Research has been ongoing since 1996, and there has been some really great work since then, including many practical public-key algorithms.
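To make the "shortest periodicity" idea concrete, here is a toy sketch that brute-forces the shortest nonzero vector of a two-dimensional integer lattice. It is only an illustration: the function name, the example basis, and the search bound are assumptions chosen for the demo, and exhaustive search like this is hopeless in the hundreds of dimensions that real schemes use, which is exactly why the problem is considered hard.

```python
from itertools import product


def shortest_vector(basis, bound=10):
    """Brute-force the shortest nonzero vector of a 2D integer lattice.

    `basis` is a pair of integer vectors. Every lattice point is an integer
    combination x*b1 + y*b2; only coefficients in [-bound, bound] are tried.
    """
    (a1, a2), (b1, b2) = basis
    best, best_sq = None, None
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if x == 0 and y == 0:
            continue  # the zero vector does not count
        v = (x * a1 + y * b1, x * a2 + y * b2)
        sq = v[0] ** 2 + v[1] ** 2  # squared length keeps everything in integers
        if best_sq is None or sq < best_sq:
            best, best_sq = v, sq
    return best, best_sq


# A "bad" basis made of long, nearly parallel vectors. It generates the same
# lattice as the short basis (3, 1), (1, 4), but that is not obvious by eye.
bad_basis = ((19, 21), (15, 16))
vector, squared_length = shortest_vector(bad_basis)
print(vector, squared_length)
```

The two long basis vectors hide a much shorter one, (3, 1) or its negative, which is the integer combination -3·(19, 21) + 4·(15, 16).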
On April 10, Yilei Chen from Tsinghua University in Beijing posted a paper describing a new quantum attack on that shortest-vector lattice problem. It’s a very dense mathematical paper, 63 pages long, and my guess is that only a few cryptographers are able to understand all of its details. (I was not one of them.) But the conclusion was pretty devastating, breaking essentially all of the lattice-based fully homomorphic encryption schemes and coming significantly closer to attacks against the recently proposed (and NIST-approved) lattice key-exchange and signature schemes.
However, there was a small but critical mistake in the paper, on the bottom of page 37. It was independently discovered by Hongxun Wu from Berkeley and Thomas Vidick from the Weizmann Institute in Israel eight days later. The attack algorithm in its current form doesn’t work.
This was discussed last week at the Cryptographers’ Panel at the RSA Conference. Adi Shamir, the “S” in RSA and a 2002 recipient of ACM’s A.M. Turing Award, described the result as psychologically significant because it shows that there is still a lot to be discovered about quantum cryptanalysis of lattice-based algorithms. Craig Gentry, inventor of the first fully homomorphic encryption scheme using lattices, was less impressed, basically saying that a nonworking attack doesn’t change anything.
I tend to agree with Shamir. There have been decades of unsuccessful research into breaking lattice-based systems with classical computers; there has been much less research into quantum cryptanalysis. While Chen’s work doesn’t provide a new security bound, it illustrates that there are significant, unexplored research areas in the construction of efficient quantum attacks on lattice-based cryptosystems. These lattices are periodic structures with some hidden periodicities. Finding a different (one-dimensional) hidden periodicity is exactly what enabled Peter Shor to break the RSA algorithm in polynomial time on a quantum computer. There are certainly more results to be discovered. This is the kind of paper that galvanizes research, and I am excited to see what the next couple of years of research will bring.
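A small sketch shows why a hidden one-dimensional periodicity is enough to factor an RSA-style modulus. The period is found here by classical brute force, which is exactly the step a quantum computer would speed up; only the arithmetic that follows is representative of Shor’s approach. The function names and the tiny modulus are chosen for illustration.

```python
from math import gcd

def multiplicative_order(a, n):
    """Smallest r > 0 with a**r == 1 (mod n), found by brute force."""
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        r += 1
    return r

def factor_from_period(a, n):
    """Try to split n using the period of a**k mod n; None if a is unlucky."""
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None                  # odd period: pick another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None                  # trivial square root: pick another a
    return gcd(half - 1, n), gcd(half + 1, n)

print(multiplicative_order(7, 15))   # prints 4
print(factor_from_period(7, 15))     # prints (3, 5)
```

The brute-force loop takes time exponential in the bit length of the modulus, which is why this is harmless on a classical machine.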
To be fair, there are lots of difficulties in making any quantum attack work—even in theory.
Breaking lattice-based cryptography with a quantum computer seems to require orders of magnitude more qubits than breaking RSA, because the key size is much larger and processing it requires more quantum storage. Consequently, testing an algorithm like Chen’s is completely infeasible with current technology. However, the error was mathematical in nature and did not require any experimentation. Chen’s algorithm consisted of nine different steps; the first eight prepared a particular quantum state, and the ninth step was supposed to exploit it. The mistake was in step nine; Chen believed that his wave function was periodic when in fact it was not.
Should NIST be doing anything differently now in its post-quantum cryptography standardization process? The answer is no. They are doing a great job in selecting new algorithms and should not delay anything because of this new research. And users of cryptography should not delay in implementing the new NIST algorithms.
But imagine how different this essay would be were that mistake not yet discovered. If anything, this work emphasizes the need for systems to be crypto-agile: to be able to easily swap algorithms in and out as research continues. It also argues for using hybrid cryptography (multiple algorithms where the security rests on the strongest) where possible, as in TLS.
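One common way to get the "security rests on the strongest" property is to feed the shared secrets from two independent key exchanges into a single key-derivation function. The sketch below assumes the two secrets have already been produced elsewhere (say, one by a classical exchange and one by a lattice-based one) and combines them with HKDF-SHA256 as specified in RFC 5869. The function names, the label, and the length-prefix encoding are illustrative assumptions, not the exact construction used in TLS.

```python
import hashlib
import hmac

def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()        # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:                                   # expand step
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def hybrid_secret(classical_ss: bytes, pq_ss: bytes) -> bytes:
    """Derive one session key from two independently established secrets."""
    # Length-prefix each input so the concatenation is unambiguous.
    ikm = (len(classical_ss).to_bytes(2, "big") + classical_ss
           + len(pq_ss).to_bytes(2, "big") + pq_ss)
    return hkdf_sha256(ikm, salt=b"\x00" * 32, info=b"hybrid-demo")

# Placeholder secrets; a real system would obtain these from key exchanges.
key = hybrid_secret(b"classical-shared-secret", b"post-quantum-shared-secret")
print(len(key))   # prints 32
```

An attacker who recovers only one of the two input secrets still faces an unknown input to the KDF, so the derived key stays secret as long as either exchange holds.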
And—one last point—hooray for peer review. A researcher proposed a new result, and reviewers quickly found a fatal flaw in the work. Efforts to repair the flaw are ongoing. We complain about peer review a lot, but here it worked exactly the way it was supposed to.
On 22 May 2024, we announced that we are intending to list the Foundation’s commercial subsidiary, Raspberry Pi Ltd, on the Main Market of the London Stock Exchange. This is called an Initial Public Offering (IPO).
The IPO process is — quite rightly — highly regulated, and information about the company and the potential listing can be found on the Investor Portal on Raspberry Pi Ltd’s website. If that’s what you’re looking for, head there.
In this blog post, I want to explain what an IPO of Raspberry Pi Ltd would mean for the Raspberry Pi Foundation.
A tale of two Raspberry Pis
The Raspberry Pi Foundation was founded in 2008 as a UK-based educational charity. Our co-founders wanted to inspire more young people to explore the joys of coding and creating with technology, with the goal of increasing both the number and diversity of kids choosing to study computer science and engineering.
Their idea was to create a low-cost, programmable computer that could rekindle some of the excitement sparked in young minds at the start of the personal computing revolution by platforms like the BBC Micro and ZX Spectrum (incidentally also invented in Cambridge, UK).
Raspberry Pi Ltd was incorporated in 2012 as the commercial subsidiary of the Foundation and is responsible for all aspects of design, production, and distribution of Raspberry Pi computers and associated technologies. It has always been a commercial company, albeit one that was initially wholly owned by a charity.
It’s fairly common for UK charities to have subsidiaries that handle their commercial activities. Guidance from the regulator, the Charity Commission, explains that it helps protect the charity’s assets and ensures that the charity benefits from tax relief on profits that are generated from commercial activities and used to advance the charity’s objectives.
So Raspberry Pi has pretty much always been a tale of two organisations: the Foundation, which is a charity, and Raspberry Pi Ltd, which is a commercial company. While we are legally and practically separate organisations, we are united by a mission to democratise computing, and by a set of values that reflect the community of makers, engineers, and educators that have always been such a central part of the Raspberry Pi story.
Computing for everybody
In the years since the launch of the first Raspberry Pi computer in 2012, Raspberry Pi Ltd has continued to innovate and expand its range of products, evolving into a leading provider of high-performance, single-board computers and associated technologies for industrial and embedded uses, as well as for enthusiasts and educators, in markets worldwide. For more information on the company and all it has achieved, you should take a look at the Investor Portal.
For me, one of the most important things about a Raspberry Pi computer is that kids are learning to code on the same platform that is used by the world’s leading engineers and scientists. It’s not a toy, although it is a lot of fun.
Crucially, the commitment to low-cost computing that was at the heart of Raspberry Pi’s founding ethos remains unchanged and has been enshrined in a legally binding agreement between the Foundation and the company. This means that Raspberry Pi will always produce low-cost, general-purpose computers that can be used for teaching and learning.
Over that same period, the Foundation has innovated and expanded its educational products and learning experiences to the point where we are now widely recognised as one of the world’s leading contributors to the democratisation of computing education.
We create curricula and classroom resources that are used in schools all over the globe, covering everything from basic digital skills to computer science and AI literacy. We provide high-quality professional development for teachers and we build software tools that reduce barriers, save time, and improve learning outcomes. We also support the world’s largest network of free coding clubs and inspire young people to get creative with tech through showcases and challenges. All of this is completely free for teachers and students wherever they are in the world.
We are also advancing the field of computing education through undertaking original research and translating evidence of what works into practice.
Importantly, the Foundation is device- and platform-agnostic. That means that, while Raspberry Pi computers make a huge contribution to our educational mission, you don’t need to use a Raspberry Pi computer to engage with our learning experiences and resources.
The next stage of growth and impact
The proposed IPO is all about securing the next stage of growth and impact for both the Foundation and the commercial company.
To date, Raspberry Pi Ltd has donated nearly $50m from its profits to the Foundation. We have used that money, combined with over $60m in funding from philanthropy, sponsorship, and contracts for educational services, to advance our educational mission.
As the company has continued to grow, it has needed working capital and funding to invest in innovation and product development. Over the past few years that has mainly come from retained profits. Listing Raspberry Pi Ltd on a public market will enable the company to raise additional capital through issuing new shares, which will lead to broader reach, greater impact, and ultimately more value being created for the benefit of all shareholders, including the Foundation.
From the Foundation’s perspective, an IPO provides us with the ability to sell some of our shares to raise money to finance a sustainable expansion of our educational activities. Put simply, instead of receiving a share of the company’s profits each year, we will convert some of our shareholding into an endowment that we will use to fund our educational programmes.
What happens after the IPO?
Assuming we proceed with the IPO, what is now Raspberry Pi Ltd will become a public company that trades its shares on the Main Market of the London Stock Exchange.
The Foundation will remain a significant shareholder and we will continue to share the Raspberry Pi brand. We will be involved in decision making on the same basis as all other shareholders. Our goal will be to support the company to be as successful as possible in its mission to make computing accessible and affordable for everybody.
The Foundation will use any funds that we raise through the sale of shares at the IPO — or subsequently — to advance our ambitious global strategy to enable every young person to realise their full potential through the power of computing and digital technologies.
Partnership will continue to be at the heart of our strategy and we will work closely with businesses, foundations, and governments to ensure that our work reaches as many teachers and young people as possible. Our ambition is that around 50% of our activities will be funded from the endowment and 50% through partnerships and donations, enabling us to reach many more teachers and students by combining our resources and expertise with those of the many partners who share our mission.
Creating a lasting legacy
Whatever happens with the IPO, Raspberry Pi has already had a huge impact on the world. It’s been an enormous privilege to be part of the journey so far, and I am hugely excited about the potential of this next phase.
I want to pay tribute to all of our co-founders for setting us off on this great adventure, and particularly to Jack Lang, who very sadly passed away earlier this month. Jack made an exceptional and unique contribution to the Raspberry Pi story, and he deserves to go down in history as one of the most significant figures in computing education in the UK. I know he would have shared my excitement about this next chapter in the Raspberry Pi story.
With the pace of technological advances in fields like AI, our mission has never been more vital. We have the potential to positively impact the lives of tens of millions of young people who might otherwise miss out on the opportunity to change the world for the better through technology.
Change happens at an increasingly rapid and intense pace in the hyperconnected world we live in. This affects consumer relationships, forcing retailers to find more efficient ways of attracting customers. Linx, a company under the StoneCo group and a technology specialist for retail, understands this and has been using Zabbix to provide a better experience for their customers since 2017.
With extensive operations in over 20 retail segments and a portfolio of more than 180 solutions, Linx serves both small entrepreneurs and large retailers, offering the largest retail ecosystem in Latin America. In 2018, a presentation by the company at the Zabbix Conference Latin America stood out as a practical demonstration of how Zabbix served as a key tool in Linx’s business. Keep reading to find out how Zabbix has stayed central to the company’s strategy ever since.
The challenge
It all began in 2017 when Linx, understanding the importance of a stable IT environment, real-time data collection, and a quality customer journey, faced a challenge. They needed to transform their current Network Operation Center (NOC) into a structure aligned with the business that generated real value for customers. This required multiple actions, including:
Migrating the physical operation and monitoring structure from Porto Alegre-RS to São Paulo-SP
Reviewing the monitoring structure to generate indicators focused on customer success and experience
Replacing their existing monitoring tool with a more flexible one to keep up with new challenges and guarantee technical investment
Managing the mix of technologies used in the environment with a significant need for hybrid, cloud, and on-premises monitoring
Making this new structure the main provider of information for the rest of the company
Implementing a new NOC structure aligned with the business that would also generate value for customers
Speeding up the response time whenever incidents and anomalies were detected
To meet the business challenges and needs, Linx partnered with Unirede Inteligência em TI, a Zabbix Premium Partner in Brazil, to support the validation and subsequent implementation project. At that time, Linx specialized in retail, offering both on-premises and SaaS (Software as a Service) solutions along with a data and cloud services platform focused on customer experience.
The solution
Linx needed a robust and flexible tool that could meet business needs and share data with multiple teams.
“Linx chose Zabbix because it is an open-source, flexible platform capable of monitoring various technologies with different collection methods. Having a specialized partner to support deliveries, in this case Unirede, also impacted the decision.”
Gabriel Pedroso, CEO of Unirede Inteligência em TI
Linx’s existing monitoring structure focused only on network infrastructure, much like a traditional NOC. A new objective arose – to understand the customer journey across all products and service segments. Linx needed to be able to map what kind of retail experience was being promoted to customers, predict peak moments, and gain insight into other behaviors that could influence the customer experience and compromise transactions.
This led to the creation of the xCenter, a structure that would become Linx’s experience monitoring center, expanding the view of servers, communication links, memory usage, and other infrastructure details.
“The xCenter goes beyond assets to establish the monitoring of complex solutions in hybrid environments (on-premises and cloud), delivering mobility, relevant content, and flexibility. Everything is oriented towards customer success.”
Nelson Lima, Coordinator of Linx’s xCenter
During the project’s initial phase, there was a mindset shift among the company’s teams, resulting in cooperation towards the project’s success and a renewed focus on the customer experience.
The first discoveries in Linx’s network were also made, along with adjustments and improvements in existing monitoring, including business metrics and the use of dashboards for visual management. In the same year (2017), xCenter operations began, delivering the first monitoring results.
By 2018, gains were evident: teams no longer thought exclusively about servers, storage, and network assets, but rather about services, transactions, and delivery excellence. That year, the project evolved into what was called “Partiu Cloud,” introducing the first Azure cloud environment monitoring. This took place alongside the ongoing evolution of existing monitoring and the democratization of information generated by the structure for decision-making. Examples of business services monitored at that time included NFCe, TEF, POS, sales reports, fiscal coupon issuance time, POS synchronization, and SaaS delivery.
In 2019, the focus shifted to business, with assessments, structuring, and development of metrics for Linx’s products. By then, Linx’s sales force was already using availability and performance information in their service routine, accessing product dashboards from mobile devices and using the data in their pitches to customers.
A significant update occurred in 2021 with the upgrade to Zabbix 5.0 LTS, which increasingly supported the delivery of connectivity services to customers alongside the already delivered platform and software solutions.
2022 was marked by significant variations in the tool’s use – integration with Jira software, integration with Linx’s AD for secure login (SSO), and integrations with other communications tools. The tool was also customized, involving script execution and remote commands for specific collections.
2023 brought a new focus on automation, using integrations with other tools to automate functions in Zabbix. For example, Microsoft Power Automate was able to facilitate daily operational processes and improve the use of media types. The entire journey was a continuous process of adaptation and improvement, showing an increased focus on business metrics and customer experience.
“Zabbix enabled the democratization of fundamental business data and information across different company areas. In other words, monitoring has become a strategic element for Linx’s business.”
Gabriel Pedroso, CEO of Unirede Inteligência em TI
The results
After the cultural shift and Zabbix implementation, the following strategic and operational gains were observed:
Improved efficiency and agility. Problems are detected and resolved faster, with incidents directly handled by Zabbix through integrations that automate actions.
Additional integrations. Zabbix’s flexibility and scalability allow for integration with other structural components, supporting operational efficiency.
More customer-oriented features and a renewed customer focus. Using Zabbix enabled a focus on customer success through a change in culture and business perspective, with features reviewed from a user experience perspective.
Improved scalability. This came about thanks to Zabbix’s capacity to support large loads, perform distributed data collection, and expand as needed.
Better market alignment. Aligning operations and monitoring with market needs created specific panels and monitoring for major national retail dates, such as Black Friday and Christmas.
Better sales pitches. Sales pitches are now based on data and performance monitored by xCenter. The sales team presents real monitoring data as a pitch for service quality.
Consolidation. Integrating all monitoring in a single tool eliminates the need for multiple tools and integrations.
An investment guarantee. Zabbix is now seen as a tool that accompanies Linx through future challenges, negating the need for tool changes.
A democratization of information. xCenter provides information to the entire company, improving decision-making in the process.
What’s next?
The journey to success is never-ending, and the following steps demonstrate Linx’s ongoing commitment to improvement and innovation in monitoring and the future of managing customer experiences for the services provided by the company.
Upgrading to Zabbix 6.0 LTS. This upgrade is a crucial step to keep the platform updated with the latest features and security improvements.
Integrating with WhatsApp. Linx aims to improve communication and notification flexibility by offering alerts and updates via the popular messaging app.
Application Performance Monitoring (APM). This involves a renewed focus on enhancing application monitoring, which is essential for maintaining and optimizing system performance. This improvement is planned in Zabbix’s roadmap.
Business layer evolution. Enhancing the monitoring and analysis of business indicators for better data-based decision-making.
Enhancing availability metrics. Improving the precision and relevance of metrics related to service and product availability.
Evolving predictive metrics. Implementing and refining metrics that can predict trends or future problems, allowing proactive actions.
In conclusion
When taken together with Unirede’s support of retail monitoring, Linx’s success story with Zabbix is a true inspiration. With the help of Zabbix, Linx has turned data into information and information into action in order to prove the excellence of its services. Throughout its growth trajectory, the company has managed to turn challenges into opportunities, using Zabbix to optimize operations and focus on what really matters – satisfying their customers.
To find out more about what Zabbix can do across a variety of industries, feel free to visit our website or request a demo.