Post Syndicated from Explosm.net original https://explosm.net/comics/the-spectrum
New Cyanide and Happiness Comic
Post Syndicated from Thomas Peitz (@thomaspeitz) original https://prometheus.io/blog/2024/11/19/yace-joining-prometheus-community/
Yet Another Cloudwatch Exporter (YACE) has officially joined the Prometheus community! This move will make it more accessible to users and open new opportunities for contributors to enhance and maintain the project. There’s also a blog post from Cristian Greco’s point of view.
When I first started YACE, I had no idea it would grow to this scale. At the time, I was working with InVision AG (not to be confused with the design app), a company focused on workforce management software. They fully supported me in open-sourcing the tool, and with the help of my teammate Kai Forsthövel, YACE was brought to life.
Our first commit was back in 2018, with one of our primary goals being to make CloudWatch metrics easy to scale and automatically detect what to measure, all while keeping the user experience simple and intuitive. InVision AG was scaling their infrastructure up and down due to machine learning workloads and we needed something that detects new infrastructure easily. This focus on simplicity has remained a core priority. From that point on, YACE began to find its audience.
As YACE expanded, so did the support around it. One pivotal moment was when Cristian Greco from Grafana Labs reached out. I was feeling overwhelmed and hardly keeping up when Cristian stepped in, simply asking where he could help. He quickly became the main releaser and led Grafana Labs’ contributions to YACE, a turning point that made a huge impact on the project. Along with an incredible community of contributors from all over the world, they elevated YACE beyond what I could have achieved alone, shaping it into a truly global tool. YACE is no longer just my project or InVision’s; it belongs to the community.
I am immensely grateful to every developer, tester, and user who has contributed to YACE’s success. This journey has shown me the power of community and open source collaboration. But we’re not done yet.
It’s time to take YACE even further, into the heart of the Prometheus ecosystem. Making YACE the official Amazon CloudWatch exporter for Prometheus will make it easier to use and more accessible for everyone. With ongoing support from Grafana Labs and my commitment to refining the user experience, we’ll ensure YACE becomes an intuitive tool that anyone can use effortlessly.
Try out YACE (Yet Another CloudWatch Exporter) by following our step-by-step Installation Guide.
You can explore various configuration examples here to get started with monitoring specific AWS services.
Our goal is to enable easy auto-discovery across all AWS services, making it simple to monitor any dynamic infrastructure.
Post Syndicated from Shllomi Ezra original https://aws.amazon.com/blogs/architecture/know-before-you-go-aws-reinvent-2024-cloud-resilience/
With AWS re:Invent 2024 just weeks away, the excitement is building and we’re looking forward to seeing you all soon. If you’re attending re:Invent with the goal of improving your organization’s cloud resilience operations, we will be offering valuable insights, best practices, and fun activities to improve your cloud resilience expertise.
This year, we’re offering more than 100 resilience breakout sessions, workshops, chalk talks, builders’ sessions, and code talks. Find the complete list in the re:Invent 2024 session catalog and filter by “Resilience” in the area of interest field.
In this post, we highlight must-see sessions for those building resilient applications and architectures on AWS. Reserved seating is now open, so act quickly to claim your seat. Be sure to also check out the vertical-specific re:Invent guides.
Our recommendations are divided into three topics to help you choose the sessions most relevant to your business: resilience fundamentals, advanced resilience patterns, and resilience for customers operating in regulated industries.
Cloud resilience refers to the ability for an application to resist or recover from disruptions, including those related to infrastructure, dependent services, misconfigurations, transient network issues, and load spikes. Cloud resilience also plays a critical role in an organization’s broader business resilience strategy, including the ability to meet digital sovereignty requirements. Resilient applications are those built with high availability—the percentage of time the application is available for use—and also those with a disaster recovery or continuity of operations plan in place.
Join us as we explore the strategies, tools, and mindsets that enable organizations to thrive in the face of uncertainty. These sessions cover conceptual overviews and demos of AWS cloud resilience services.
Failing without flailing: Lessons we learned at AWS the hard way (ARC333)
At AWS, we’ve learned that building resilient services requires more than just designing for high availability. In this session, AWS operational leaders are back for more insights on how to mitigate impact when, not if, the unexpected happens. Hear a few short stories collected from 18 years of operational excellence, with practical advice on preparing for and mitigating failure.
Think big, build small: When to scale and when to simplify (ARC331)
Join this session to learn how to navigate the complexities of cloud architecture. Hear insights and guidance developed from working with successful AWS customers, including how to optimize for business value and agility. Discover the AWS approach to architectural tiers, engineering simplicity and reliability, and treating infrastructure as an investment.
Mastering resilience at every layer of the cake (ARC327)
Join this session to learn resilience at various levels, from platform to applications, using AWS services like AWS Resilience Hub, AWS Fault Injection Service, ARC, Amazon Elastic Disaster Recovery, and AWS Backup. You’ll leave with a mental model for resilience across these layers, and ready-to-use reference architectures and guidance. The session includes demos for a fun, lively experience.
Building resilient applications on AWS with Capital One (ARC334)
In this session, discover the patterns and principles of AWS resilience best practices. Then, hear Capital One showcase its next-generation design and deployment patterns that push the boundaries of resilient architectures and support its most critical business processes. Learn about the AWS services it uses, the trade-offs it must consider, and the decision matrix that guides developers to the right pattern for the right use case.
Data protection and resilience with AWS storage (STG301)
Join this session to dive deep on how AWS storage offers organizations defense-in-depth data protection and resilience for application data across recovery point and time objectives, helping mitigate risks with immutable solutions, restore testing, policy-based access controls, encryption, and auditing and reporting.
Building, operating, and testing resilient Multi-AZ applications (ARC303)
Join this workshop to get hands-on experience building, operating, and testing a resilient Multi-AZ application.
Building resilient architectures with observability (COP308)
Explore how to use AWS services, including AWS Resilience Hub, Amazon CloudWatch, and AWS Fault Injection Service, to build resilient and reliable cloud-based applications.
Building resilient and reliable applications in the cloud is critical for organizations running mission-critical workloads. Unexpected outages, latency spikes, or performance issues can have severe business impact. The sessions and workshops in this track explore advanced techniques and tools to help you proactively identify and address resilience weaknesses in your systems. Learn how to use chaos engineering, multi-Region architectures, and the latest AWS services and best practices to enhance the resilience and operational excellence of your cloud applications.
Chaos engineering: A proactive approach to system resilience (ARC326)
This session demonstrates the benefits of chaos engineering in action. Gain insights from BMW Group’s transformative journey, learning key lessons on scaling chaos engineering across the organization, and how BMW Group conducts large-scale chaos experiments in production, uncovering issues and fostering a culture of greater resilience and continuous improvement.
Try again: The tools and techniques behind resilient systems (ARC403)
Grand architectural theories are nice, but what makes systems resilient is in the details. Marc Brooker, VP and distinguished engineer, looks at some of the resiliency tools and techniques AWS uses in its systems. Marc rethinks, retries, breaks open circuit breakers, decodes erasure coding, and tackles the tail. Learn about formal methods and simulation, and how these tools help build faster code, faster.
Multi-Region or single Region? Considerations and architectures (ARC309)
Watch experts walk through and whiteboard architectures that take advantage of AWS services that support multi-Region capabilities, and discuss what a failover scenario would look like in real life. Leave with an understanding of what it takes to run a multi-Region architecture on AWS.
Best practices for creating multi-Region architectures on AWS (ARC323)
In this session, learn the two critical areas you’ll need to consider. First, explore different failover strategies and the trade-offs between them. Then, learn how to make the decision to initiate a cross-Region failover as well as what goes into the process. Lastly, hear from Samsung Account about their multi-Region application and how they think about these two critical areas.
Chaos engineering workshop (ARC322)
This workshop introduces AWS Fault Injection Service for running controlled resilience experiments to improve application performance, observability, and resilience. You must bring your laptop to participate.
Gen AI resilience: Chaos engineering with AWS Fault Injection Service (ARC305)
Learn how to construct a useful hypothesis backlog for generative AI applications and how to use AWS Fault Injection Service to run those experiments. You must bring your laptop to participate.
Building operational resilience in workloads using generative AI (SUP401)
Building operational resilience requires proactive identification and mitigation of risks. In this workshop, use AWS managed generative AI services in real-world scenarios to learn how to assess readiness, proactively improve your architecture, react quickly to events, troubleshoot issues, and implement effective observability practices. Also use AWS Countdown and the AWS Well-Architected Framework as the entry point reference frameworks to use generative AI services for operation. Through hands-on activities, learn strategies for debugging issues, detecting anomalies and incidents, and optimizing architectures to improve the resilience of your workloads. You must bring your laptop to participate.
In regulated industries like finance, healthcare, and telecom, resilient architecture is critical for compliance, security, and operational continuity. These sectors face strict regulations that demand robust data protection, disaster recovery, and uptime guarantees. A resilient architecture helps organizations maintain service availability, minimize downtime, and recover quickly from disruptions, safeguarding sensitive data and avoiding regulatory breaches. It also enables businesses to adapt to evolving regulations while delivering secure, uninterrupted services.
Fidelity Investments: Building for mission-critical resilience (FSI318)
This session explores the transformation of Fidelity Investments’s trade processing platform on AWS and the critical role resiliency plays in preserving operational integrity.
Service event replay: Stress-testing your architecture’s resilience (FSI314)
Learn how to assess the resiliency of your own architectures and develop strategies to strengthen your response and recovery capabilities.
Scaling multi-tenant SaaS with a cell-based architecture (ARC402)
In this workshop, see how cell-based architectures provide you with new ways to group, deploy, scale, and operate your multi-tenant workloads. Also see how this approach influences the tiering, scaling, and resilience profile of your SaaS architecture. You must bring your laptop to participate.
Advanced cross-Region DR patterns on AWS (ARC401)
Join this hands-on workshop to explore a resilient, cloud-centered architecture that surpasses the stringent availability and recovery regulations for financial markets utility providers. You must bring your laptop to participate.
Throughout the re:Invent week, if you have any questions or suggestions for the AWS Cloud Resilience team, drop by the Cloud Resilience kiosk at the AWS Village in the 2024 re:Invent Expo (the Venetian).
Post Syndicated from aostan original https://aws.amazon.com/blogs/compute/using-zonal-shift-with-amazon-ec2-auto-scaling/
This post is written by Michael Haken, Senior Principal Solutions Architect, AWS
Today, we’re announcing support for zonal shift in Amazon EC2 Auto Scaling. Zonal shift allows you to rapidly recover from application impairments in a single Availability Zone (AZ) impacting your Auto Scaling Group (ASG) resources. In this post, we describe how performing an ASG zonal shift fits into a multi-AZ resilience strategy and considerations for using the feature with different architectures.
Using multiple AZs is an architectural best practice for building resilient applications on AWS. Deploying your application across multiple AZs makes your applications more available, fault tolerant, and scalable. EC2 Auto Scaling enables you to further enhance your application’s availability and fault tolerance by dynamically scaling your Amazon Elastic Compute Cloud (Amazon EC2) instances across multiple AZs and replacing them when they’re unhealthy.
AZs in AWS represent a fault isolation boundary, meaning that failures from various sources are contained to a single AZ, whether caused by a bad deployment, networking issues, power loss, or operator error. In 2023, we launched zonal shift, part of Amazon Application Recovery Controller (ARC), which allows you to rapidly recover from application impairments in a single AZ by shifting traffic at your Elastic Load Balancing (ELB) load balancer.
Zonal shift for EC2 Auto Scaling enhances this capability for users who have already implemented recovery patterns for single AZ impairments. It also provides recovery capabilities for architectures that aren’t load balanced by allowing you to prevent new instance launches in a specified AZ. Without zonal shift, when EC2 Auto Scaling detects consistent launch failures in an AZ, the service tries to launch instances in other AZs configured for the ASG. However, certain conditions, like gray failures, can cause post-launch problems in a single AZ that EC2 Auto Scaling doesn’t detect. For example, successfully launched instances in a single AZ might experience elevated error rates when downloading their configuration files from Amazon S3 over a zonal Amazon Virtual Private Cloud (Amazon VPC) interface endpoint. The instances then can’t correctly configure their application software, and they respond to requests with errors. Alternatively, the single-AZ impairment could cause the instance to fail its health checks after provisioning. This causes EC2 Auto Scaling to constantly recycle instances in the impaired AZ, leading to the application running with less capacity than desired.
Although you might choose to perform a zonal shift at your load balancer to mitigate the impact caused by the event, new instances can still be launched in the impacted AZ and don’t receive incoming requests. Even if your application architecture doesn’t use load balancers, zonal shift for EC2 Auto Scaling can help you recover from single-AZ impairments by allowing you to prevent instance launches in the impaired AZ.
To use zonal shift on your ASG, you need to configure it with an AvailabilityZoneImpairmentPolicy parameter either when you create a new ASG or update an existing one. This parameter has two options: ZonalShiftEnabled, which enables or disables the ability to perform zonal shifts, and ImpairedZoneHealthCheckBehavior, which lets you choose between ignoring or replacing instances identified as unhealthy by EC2 Auto Scaling. First, we look at how you can use zonal shift with a standalone ASG architecture.
This architecture uses a standalone ASG without being integrated with an ELB load balancer. Workloads with a standalone ASG commonly perform event driven work such as generating load against a target based on a schedule or processing messages from a queue. The architecture in the following figure uses an ASG that reads messages from an Amazon Simple Queue Service (Amazon SQS) queue, performs some processing on the message data, and writes the results into an Amazon Aurora database. The instances communicate with Amazon SQS using a VPC endpoint in each AZ. Each message varies in size, thus the instances use a heartbeat pattern to update the message visibility timeout until they finish processing it. EC2 Auto Scaling scales instances based on the queue depth, which helps make sure that messages are processed in a timely manner.
Figure 1: EC2 instances deployed across three AZs that process messages from an SQS queue
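The heartbeat pattern mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code behind the architecture in this post: the helper name process_with_heartbeat, its parameters, and the default interval and timeout values are hypothetical, and sqs is assumed to be a boto3 SQS client (or any object that exposes change_message_visibility with the same keyword arguments).

```python
import threading


def process_with_heartbeat(sqs, queue_url, receipt_handle, work,
                           interval_s=10.0, visibility_timeout_s=30):
    """Run work() while periodically extending a message's visibility timeout.

    Hypothetical helper: `sqs` is assumed to be a boto3 SQS client or a
    compatible object; the default interval and timeout are illustrative.
    """
    done = threading.Event()

    def heartbeat():
        # Event.wait() returns False on timeout and True once `done` is set,
        # so the timeout is extended every interval until work() finishes.
        while not done.wait(interval_s):
            sqs.change_message_visibility(
                QueueUrl=queue_url,
                ReceiptHandle=receipt_handle,
                VisibilityTimeout=visibility_timeout_s,
            )

    worker = threading.Thread(target=heartbeat, daemon=True)
    worker.start()
    try:
        return work()
    finally:
        # Stop the heartbeat whether work() succeeded or raised.
        done.set()
        worker.join()
```

A heartbeat like this only signals that the worker is still running, not that it is making progress, so a slow or failing instance can keep a message hidden from healthier instances.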
Say that a networking degradation causes instances in AZ 1 to experience elevated error rates when attempting to write to the Aurora database, resulting in a 2x increase in the p50 processing latency. The instances in AZ 1 continue to heartbeat until they time out, keeping the message hidden and preventing other healthy instances from taking over the work. As a result, the queue depth grows and EC2 Auto Scaling deploys a new instance, as shown in the following figure.
Figure 2: EC2 Auto Scaling launches a new instance in AZ 1 in response to the queue depth growing
The new instance lands in AZ 1 and experiences the same problem as the other instance, thus it can’t decrease the queue depth and processing latency. Instead, it exacerbates the issue by consuming additional messages that aren’t successfully processed. The instances in AZ 1 never appeared unhealthy, thus EC2 Auto Scaling didn’t take any actions to replace them. To mitigate this problem, you can start a zonal shift for your ASG. This makes sure that any future instance launches only happen in AZ 2 or AZ 3, as shown in the following figure.
Figure 3: After the zonal shift new instances are only launched in AZ 2 and AZ 3 by EC2 Auto Scaling
You have the option to mark the instances as unhealthy using the SetInstanceHealth API to force EC2 Auto Scaling to replace these instances to prevent them from continuing to contribute to additional latency and errors. Changing the instance health state is considered a mutating change and relies on the EC2 Auto Scaling control plane. Therefore, you should avoid making this a critical step in your recovery plan. When you are confident that the impairment has abated, you can cancel the zonal shift, which causes EC2 Auto Scaling to automatically rebalance capacity across your AZs.
In this section we observe how to use zonal shift with an ASG that is serving traffic from an ELB. We also examine how the ImpairedZoneHealthCheckBehavior affects recovery in this situation. In this architecture, the instances in the ASG read data from the database when they receive HTTP requests from the ELB, as shown in the following figure.
Figure 4: A three-tier application deployed in three AZs using an ALB, ASG, and Aurora database
In this scenario, the instances in AZ 1 start experiencing increased latency with their EBS volumes causing them to respond to requests with errors and fail their EC2 instance status checks. Initially, to mitigate the impact, you can start a zonal shift at your load balancer to prevent your users from receiving errors. Then, you can initiate a zonal shift for your ASG to prevent new capacity from being launched into the AZ that isn’t receiving traffic.
If the ASG’s ImpairedZoneHealthCheckBehavior is set to IgnoreUnhealthy, then the instances in AZ 1 that are failing their health checks aren’t terminated by EC2 Auto Scaling, as shown in the following figure. This can be helpful if you’re pre-scaled to handle the loss of an AZ’s worth of capacity by not causing EC2 Auto Scaling to attempt to launch additional instances. It can also make recovery safer by leaving capacity in the AZ, thus when you end your load balancer zonal shift after the impairment ends, the AZ can immediately start receiving traffic again.
Figure 5: Performing a zonal shift on the ALB and ASG, choosing to ignore unhealthy instances in the ASG
Alternatively, you can set the option to ReplaceUnhealthy. Now, instances that are found to be unhealthy by EC2 Auto Scaling are replaced. This option can be helpful if you aren’t pre-scaled to handle the loss of capacity. EC2 Auto Scaling launches new instances into the remaining AZs to bring the ASG back to its desired capacity, as shown in the following figure. However, this approach also has a tradeoff: launching new instances isn’t guaranteed to be successful, thus you might experience delays in acquiring new capacity.
Figure 6: Performing a zonal shift on the ALB and ASG, this time replacing unhealthy instances in the remaining AZs
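The two health check behaviors described above map onto the AvailabilityZoneImpairmentPolicy parameter. The following is a small sketch of how that structure could be assembled and validated before it is passed to an SDK call. The helper build_impairment_policy and the group name in the comment are hypothetical; the parameter and option names are the ones used in this post.

```python
VALID_BEHAVIORS = ("IgnoreUnhealthy", "ReplaceUnhealthy")


def build_impairment_policy(zonal_shift_enabled, health_check_behavior):
    """Assemble an AvailabilityZoneImpairmentPolicy structure (hypothetical helper)."""
    if health_check_behavior not in VALID_BEHAVIORS:
        raise ValueError(
            f"health_check_behavior must be one of {VALID_BEHAVIORS}, "
            f"got {health_check_behavior!r}"
        )
    return {
        "ZonalShiftEnabled": bool(zonal_shift_enabled),
        "ImpairedZoneHealthCheckBehavior": health_check_behavior,
    }


# Pre-scaled workload: keep the impaired AZ's instances in place during the shift.
policy = build_impairment_policy(True, "IgnoreUnhealthy")

# Assumed usage with boto3 (needs AWS credentials and an existing group):
#   boto3.client("autoscaling").update_auto_scaling_group(
#       AutoScalingGroupName="my-asg",  # hypothetical name
#       AvailabilityZoneImpairmentPolicy=policy,
#   )
```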
In both situations you must consider whether you have cross-zone load balancing enabled or disabled. When cross-zone load balancing is enabled, each instance, regardless of its AZ, receives an approximately equal share of the traffic. This means that you can end your zonal shift for both your load balancer and ASG at the same time safely. As EC2 Auto Scaling rebalances your instances across each enabled AZ, they receive the same percentage of traffic.
If cross-zone load balancing is disabled, then each AZ receives an equal percentage of the traffic, regardless of how many instances are in the AZ. If you’ve chosen to replace unhealthy instances, or if your ASG has scaled during the event, then the capacity across your AZs could have become imbalanced. When you end your load balancer zonal shift and EC2 Auto Scaling begins to rebalance your capacity, you could end up in a situation shown in the following figure, where a single or small number of instances gets an overwhelming portion of the load.
Figure 7: A three-tier architecture with an imbalance of capacity among its three AZs
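To make the overload risk concrete, here is a simplified model of how much traffic each instance receives under the two load balancing modes. The function name and the instance counts are hypothetical, and the model assumes perfectly even distribution and that AZs without instances receive no traffic.

```python
def per_instance_share(instances_per_az, cross_zone_enabled):
    """Return the fraction of total traffic one instance in each AZ receives.

    Simplified, hypothetical model: traffic is split perfectly evenly and
    AZs with zero instances are skipped.
    """
    active = {az: n for az, n in instances_per_az.items() if n > 0}
    if cross_zone_enabled:
        # Every instance gets the same share, regardless of its AZ.
        total = sum(active.values())
        return {az: 1 / total for az in active}
    # Cross-zone disabled: each AZ gets an equal slice, split among its instances.
    return {az: 1 / (len(active) * n) for az, n in active.items()}


# Imbalanced capacity after an event: one instance in az1, eleven elsewhere.
imbalanced = {"az1": 1, "az2": 6, "az3": 5}
shares_enabled = per_instance_share(imbalanced, cross_zone_enabled=True)
shares_disabled = per_instance_share(imbalanced, cross_zone_enabled=False)
```

In this example, the single instance in az1 handles one third of all traffic when cross-zone load balancing is disabled, compared with one twelfth when it is enabled.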
This imbalance can present an overload risk, thus you must specify the --skip-zonal-shift-validation parameter when you enable zonal shift to acknowledge that you understand the risk. However, you can help prevent overload from occurring due to imbalance by using the load balancer’s target_group_health.dns_failover.minimum_healthy_targets.count option and specifying the number of instances that should be present in the AZ. If you’re using three AZs and your desired capacity is 12, then you should set the value to four (which represents one third of the ASG’s total capacity). This prevents traffic from being routed to the AZ until there is enough healthy capacity there to handle the load. You may need to dynamically adjust this number as the ASG scales over time. The minimum count you set in the past may not be the right minimum count today.
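The arithmetic behind that value can be sketched as a small helper, which may be useful if you recalculate it as the ASG scales. The function name is hypothetical, and rounding up when the capacity doesn’t divide evenly is an assumption of this sketch rather than guidance from the service documentation.

```python
def minimum_healthy_targets(desired_capacity, az_count):
    """Per-AZ share of the desired capacity, rounded up (hypothetical helper)."""
    if az_count <= 0:
        raise ValueError("az_count must be positive")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (desired_capacity + az_count - 1) // az_count


# The example from the text: three AZs and a desired capacity of 12.
count = minimum_healthy_targets(12, 3)
```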
As a set of best practices, we recommend that you:
With this configuration, you can also safely use zonal autoshift. When zonal autoshift is enabled, AWS automatically starts and ends the zonal shift on your behalf whenever the AWS telemetry indicates there is an impairment affecting a single AZ. This can be used in conjunction with zonal autoshift for your ELB load balancer. If you are not using zonal autoshift, then you can still use the EventBridge observer notifications to inform your zonal shift decisions or start automated processes. Refer to the EC2 Auto Scaling zonal shift documentation for more details on the full set of best practices when using zonal shift.
In this post we showed you the benefits of using zonal shift with your Amazon EC2 Auto Scaling Groups as part of enhancing your resilience in multi-AZ architectures. We explored several scenarios where zonal shift can be used, and reviewed best practices for using zonal shift safely and effectively. To get started using zonal shift with your ASGs, refer to the documentation.
Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/streamline-container-application-networking-with-native-amazon-ecs-support-in-amazon-vpc-lattice/
Since its launch, Amazon VPC Lattice has streamlined complex networking tasks. As a result, my perspective on how to build and connect modern, multi-service applications has changed. As my colleague Danilo wrote in his post announcing the general availability of VPC Lattice:
“By using VPC Lattice, you can focus on your application logic and improve productivity and deployment flexibility with consistent support for instances, containers, and serverless computing.”
Today, we’re announcing Amazon VPC Lattice built-in support for Amazon Elastic Container Service (Amazon ECS). With this new built-in integration, Amazon ECS services can now be directly associated with VPC Lattice target groups without the need for intermediate load balancers.
Here’s a quick look at how you can find Amazon VPC Lattice integration while creating an Amazon ECS service:

The Amazon VPC Lattice integration with Amazon ECS works by registering and deregistering IP addresses from ECS tasks within a service as targets in a VPC Lattice target group. As ECS tasks for the service are launched, Amazon ECS will automatically register those tasks to the VPC Lattice target group.
Furthermore, if ECS tasks fail VPC Lattice health checks, Amazon ECS will automatically replace the tasks. Also, if any task is terminated or scales down, it’s removed from the target group.
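Conceptually, the registration behavior described above resembles a reconciliation between the IP addresses of running tasks and the targets currently in the target group. The following sketch only models that idea; it is not how Amazon ECS implements the integration, and the function name and sample addresses are hypothetical.

```python
def reconcile_targets(running_task_ips, registered_ips):
    """Return (to_register, to_deregister) so targets match running tasks.

    Conceptual model only; names and behavior are illustrative assumptions.
    """
    running = set(running_task_ips)
    registered = set(registered_ips)
    to_register = sorted(running - registered)    # new tasks not yet targets
    to_deregister = sorted(registered - running)  # targets whose task is gone
    return to_register, to_deregister


to_register, to_deregister = reconcile_targets(
    running_task_ips=["10.0.1.5", "10.0.2.7"],
    registered_ips=["10.0.2.7", "10.0.3.9"],
)
```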
Using the Amazon VPC Lattice integration
Let me walk you through how to use this new integration. In the following demo, I will deploy a simple application server running as an ECS service and configure the integration with VPC Lattice. Then, I’ll test the application server by connecting to the VPC Lattice domain name without having to configure additional load balancers on Amazon ECS.
Before I can start with this integration, I need to make sure Amazon ECS will have the required permissions to register and deregister targets into VPC Lattice. To learn more, please visit the Amazon ECS infrastructure IAM role documentation page.
To use the integration with VPC Lattice, I need to define a task definition with at least one container and one port mapping. This is an example of my task definition.
{
    "containerDefinitions": [
        {
            "name": "webserver",
            "image": "public.ecr.aws/ecs-sample-image/amazon-ecs-sample:latest",
            "cpu": 0,
            "portMappings": [
                {
                    "name": "web-80-tcp",
                    "containerPort": 80,
                    "hostPort": 80,
                    "protocol": "tcp",
                    "appProtocol": "http"
                }
            ],
            ...
            *redacted for brevity*
}
Then, I navigate to my ECS cluster and choose Create.

Next, I need to select the task definition and assign the service name.

In the VPC Lattice integration section, I choose Turn on VPC Lattice to start configuring the target group for VPC Lattice. I don’t need to specify a load balancer because I’ll use VPC Lattice. By default, VPC Lattice will use a round-robin routing algorithm to route requests to healthy targets.

Now, I can start defining the integration for my ECS service in VPC Lattice. First, I select the infrastructure role for Amazon ECS. Then, I need to select the virtual private cloud (VPC) where I want my service to run. After that, I need to define the Target groups that will receive traffic. After I’m done configuring the service with VPC Lattice integration, I create this service.

After a few minutes, I have my ECS service ready. I navigate to the service and choose Configuration and networking. If I scroll down to the VPC Lattice section, I can see the VPC Lattice target group created.

To get more information on this target group, I select the target group name, which will redirect me to the VPC Lattice target group page. Here, I can see that Amazon ECS successfully registered the IP address of the running task.

Now, I need to create a VPC Lattice service and service network. My preference is always to create the VPC Lattice service first and then associate it with the VPC Lattice service network later on. So, let’s do that.
I choose Services under the VPC Lattice section and choose Create service.

I fill in all the details required to create a VPC Lattice service and choose Next.

Then, I add a listener, and for the Forward to target group on the Listener default action, I select the newly created target group.

On the next page, because I’m going to create the VPC Lattice service network later, I skip this step and choose Next, review the configurations, and create the service.

With the VPC Lattice service created, it’s time to create a VPC Lattice service network. I navigate to Service networks under the VPC Lattice section and choose Create service network.

First, I fill in the VPC Lattice service network name.

Then, on the Service associations page, I select the service that I have created.

I associate this service network to my VPC as well as the security group.

To keep this demo simple, I set None for the Auth type. However, I highly recommend that you read how you can use IAM to manage access to VPC Lattice. Then, I choose Create service network.

At this stage, we have everything set up for this integration. My VPC Lattice service network is now associated with my VPC Lattice service and my VPC.

With everything set up, I copy the Domain name from my VPC Lattice service page.

Then, to access the service, I log in to the instance in the same VPC and call the service by using the domain name from VPC Lattice.
[ec2-user@ ~]$ curl http://service-a-XYZ.XYZ.vpc-lattice-svcs.XYZ.on.aws
"Hello there! I'm Amazon ECS."
One thing to note is if you’re not receiving traffic to your Amazon ECS workloads, check the security groups as described in the Control traffic in VPC Lattice using security groups documentation page.
I’m personally excited about this integration because it unlocks various possibilities while streamlining application architectures and improving overall system reliability. Now that all AWS compute types are inherently supported in VPC Lattice, I can unify services across all my ECS clusters, AWS accounts, and VPCs.
Things to know
Here are a couple of important points to note:
Try this new capability of Amazon VPC Lattice today and see how it can streamline your container application communication running on Amazon ECS.
Happy building!
Post Syndicated from Explosm.net original https://explosm.net/comics/dark-chocolate
New Cyanide and Happiness Comic
Post Syndicated from Channy Yun (윤석찬) original https://aws.amazon.com/blogs/aws/aws-lambda-snapstart-for-python-and-net-functions-is-now-generally-available/
Today, we’re announcing the general availability of AWS Lambda SnapStart for Python and .NET functions that delivers faster function startup performance, from several seconds to as low as sub-second, typically with minimal or no code changes in Python, C#, F#, and PowerShell.
On November 28, 2022, we introduced Lambda SnapStart for Java functions to improve startup performance by up to 10 times. With Lambda SnapStart, you can reduce outlier latencies that come from initializing functions, without having to provision resources or spend time implementing complex performance optimizations.
Lambda SnapStart works by caching and reusing the snapshotted memory and disk state of any one-time initialization code, or code that runs only the first time a Lambda function is invoked. Lambda takes a Firecracker microVM snapshot of the memory and disk state of the initialized execution environment, encrypts the snapshot, and caches it for low-latency access.

When you invoke the function version for the first time, and as the invocations scale up, Lambda resumes new execution environments from the cached snapshot instead of initializing them from scratch, improving startup latency. Lambda SnapStart makes it easy to build highly scalable and responsive applications in Python and .NET using AWS Lambda.
For Python functions, startup latency from initialization code can be several seconds long. This can occur when loading dependencies (such as LangChain, NumPy, Pandas, and DuckDB) or using frameworks (such as Flask or Django). Many functions also perform machine learning (ML) inference using Lambda, and need to load ML models during initialization, a process that can take tens of seconds depending on the size of the model used. Using Lambda SnapStart can reduce startup latency from several seconds to as low as sub-second for these scenarios.
For .NET functions, we expect most use cases to benefit because .NET just-in-time (JIT) compilation takes up to several seconds. Latency variability associated with initialization of Lambda functions has been a long-standing barrier for customers to use .NET for AWS Lambda. SnapStart enables functions to resume quickly by caching a snapshot of their memory and disk state. Therefore, most .NET functions will experience significant improvement in latency variability with Lambda SnapStart.
Getting started with Lambda SnapStart for Python and .NET
To get started, you can use the AWS Management Console, AWS Command Line Interface (AWS CLI) or AWS SDKs to activate, update, and delete SnapStart for Python and .NET functions.
On the AWS Lambda console, go to the Functions page and choose your function to use Lambda SnapStart. Select Configuration, choose General configuration, and then choose Edit. You can see SnapStart settings on the Edit basic settings page.

You can activate SnapStart for Lambda functions that use the Python 3.12 and higher, and .NET 8 and higher, managed runtimes. Choose Published versions and then choose Save.
When you publish a new version of your function, Lambda initializes your code, creates a snapshot of the initialized execution environment, and then caches the snapshot for low-latency access. You can invoke the function to confirm activation of SnapStart.
Here is an AWS CLI command to update the function configuration by running the update-function-configuration command with the --snap-start option.
aws lambda update-function-configuration \
--function-name lambda-python-snapstart-test \
--snap-start ApplyOn=PublishedVersions
Publish a function version with the publish-version command.
aws lambda publish-version \
--function-name lambda-python-snapstart-test
Confirm that SnapStart is activated for the function version by running the get-function-configuration command and specifying the version number.
aws lambda get-function-configuration \
--function-name lambda-python-snapstart-test:1
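The same three steps can also be scripted with the AWS SDK for Python. The following is a minimal sketch, not an official sample: the helper name enable_snapstart is hypothetical, and it assumes a boto3 Lambda client is passed in. It waits for the configuration update to finish before publishing, because the update is applied asynchronously.

```python
def enable_snapstart(client, function_name):
    """Turn on SnapStart, publish a version, and return that version's configuration.

    `client` is expected to behave like boto3.client("lambda").
    """
    client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # Wait until the configuration update has been applied before publishing.
    client.get_waiter("function_updated").wait(FunctionName=function_name)

    # Publishing a version is what triggers snapshot creation.
    version = client.publish_version(FunctionName=function_name)["Version"]

    # Fetch the configuration of the version that was just published.
    return client.get_function_configuration(
        FunctionName=function_name,
        Qualifier=version,
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   config = enable_snapstart(boto3.client("lambda"), "lambda-python-snapstart-test")
```

Passing the client in as a parameter keeps the helper easy to exercise with a stub before pointing it at a real account.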
If the response shows that OptimizationStatus is On and State is Active, then SnapStart is activated, and a snapshot is available for the specified function version.
"SnapStart": {
"ApplyOn": "PublishedVersions",
"OptimizationStatus": "On"
},
"State": "Active",
To learn more about activating, updating, and deleting a snapshot with AWS SDKs, AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and AWS Cloud Development Kit (AWS CDK), visit Activating and managing Lambda SnapStart in the AWS Lambda Developer Guide.
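The status check described above (OptimizationStatus is On and State is Active) can be expressed as a small helper. This is only a sketch: the function name snapstart_ready is hypothetical, and it assumes the get-function-configuration response has already been parsed into a Python dictionary.

```python
def snapstart_ready(config):
    """Return True if the response says SnapStart is on and the version is active."""
    snap_start = config.get("SnapStart", {})
    return (
        snap_start.get("OptimizationStatus") == "On"
        and config.get("State") == "Active"
    )


# Example using the fields from the response fragment shown earlier.
response = {
    "SnapStart": {"ApplyOn": "PublishedVersions", "OptimizationStatus": "On"},
    "State": "Active",
}
print(snapstart_ready(response))
```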
Runtime hooks
You can use runtime hooks to run code before Lambda creates a snapshot or after Lambda resumes a function from a snapshot. Runtime hooks are useful to perform cleanup or resource release operations, dynamically update configuration or other metadata, integrate with external services or systems (such as sending notifications or updating external state), or fine-tune your function’s startup sequence, such as by preloading dependencies.
Python runtime hooks are available as part of the open source Snapshot Restore for Python library, which is included in the Python managed runtime. This library provides two decorators: @register_before_snapshot to run before Lambda creates a snapshot and @register_after_restore to run when Lambda resumes a function from a snapshot. To learn more, visit Lambda SnapStart runtime hooks for Python in the AWS Lambda Developer Guide.
Here is an example Python handler to show how to run code before checkpointing and after restoring:
from snapshot_restore_py import register_before_snapshot, register_after_restore

def lambda_handler(event, context):
    # handler code
    pass

@register_before_snapshot
def before_checkpoint():
    # Logic to be executed before taking snapshots
    pass

@register_after_restore
def after_restore():
    # Logic to be executed after restore
    pass
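As one possible use of these hooks, the sketch below clears a cache before the snapshot is taken and regenerates a per-environment identifier after restore, so that environments resumed from the same snapshot do not share it. The state dictionary and its keys are hypothetical, and the no-op fallback decorators are an assumption added only so the file can be imported where the library is not installed.

```python
import uuid

try:
    # Included in the Lambda Python managed runtime.
    from snapshot_restore_py import register_before_snapshot, register_after_restore
except ImportError:
    # Local fallback for illustration only: decorators that do nothing.
    def register_before_snapshot(func):
        return func

    def register_after_restore(func):
        return func

# Values created during initialization are captured in the snapshot, so every
# environment resumed from it would otherwise start with the same contents.
state = {"instance_id": str(uuid.uuid4()), "cache": {"warm": True}}


@register_before_snapshot
def before_checkpoint():
    # Drop data that should not be frozen into the snapshot.
    state["cache"].clear()


@register_after_restore
def after_restore():
    # Regenerate values that must be unique per execution environment.
    state["instance_id"] = str(uuid.uuid4())


def lambda_handler(event, context):
    return {"instance_id": state["instance_id"]}
```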
You can also use .NET runtime hooks available as part of the Amazon.Lambda.Core package (version 2.5 or later) from NuGet. This library provides two methods: RegisterBeforeSnapshot() to run before snapshot creation and RegisterAfterRestore() to run after resuming a function from a snapshot. To learn more, visit Lambda SnapStart runtime hooks for .NET in the AWS Lambda Developer Guide.
Here is an example C# handler to show how to run code before checkpointing and after restoring:
using System.Threading.Tasks;
using Amazon.Lambda.APIGatewayEvents;
using Amazon.Lambda.Core;

public class SampleClass
{
    public SampleClass()
    {
        // Register the hooks once, when the class is constructed during initialization
        Amazon.Lambda.Core.SnapshotRestore.RegisterBeforeSnapshot(BeforeCheckpoint);
        Amazon.Lambda.Core.SnapshotRestore.RegisterAfterRestore(AfterRestore);
    }

    private ValueTask BeforeCheckpoint()
    {
        // Add logic to be executed before taking the snapshot
        return ValueTask.CompletedTask;
    }

    private ValueTask AfterRestore()
    {
        // Add logic to be executed after restoring the snapshot
        return ValueTask.CompletedTask;
    }

    public APIGatewayProxyResponse FunctionHandler(APIGatewayProxyRequest request, ILambdaContext context)
    {
        // INSERT business logic
        return new APIGatewayProxyResponse
        {
            StatusCode = 200
        };
    }
}
To learn how to implement runtime hooks for your preferred runtime, visit Implement code before or after Lambda function snapshots in the AWS Lambda Developer Guide.
Things to know
Here are some things that you should know about Lambda SnapStart:
Now available
AWS Lambda SnapStart for Python and .NET functions is available today in US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) AWS Regions.
With the Python and .NET managed runtimes, there are two types of SnapStart charges: the cost of caching a snapshot per function version that you publish with SnapStart enabled, and the cost of restoration each time a function instance is restored from a snapshot. So, delete unused function versions to reduce your SnapStart cache costs. To learn more, visit the AWS Lambda pricing page.
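To see why deleting unused versions helps, here is a deliberately simplified cost sketch. The function name and both rate parameters are hypothetical placeholders, not real prices, and actual charges depend on details described on the pricing page. It only models the two charge types named above: a caching cost for each published version and a cost for each restore.

```python
def snapstart_cost(published_versions, restores, cache_rate_per_version, rate_per_restore):
    """Estimate SnapStart charges from the two cost components.

    Both rates are placeholders supplied by the caller, not real AWS prices.
    """
    cache_cost = published_versions * cache_rate_per_version
    restore_cost = restores * rate_per_restore
    return cache_cost + restore_cost


# Placeholder rates, chosen only to illustrate the shape of the calculation.
with_unused = snapstart_cost(5, 1000, cache_rate_per_version=0.5, rate_per_restore=0.001)
after_cleanup = snapstart_cost(2, 1000, cache_rate_per_version=0.5, rate_per_restore=0.001)
print(with_unused > after_cleanup)
```

In this model the restore cost is unchanged by the cleanup, since only the number of cached versions drops.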
Give Lambda SnapStart for Python and .NET a try in the AWS Lambda console. To learn more, visit the Lambda SnapStart page and send feedback through AWS re:Post for AWS Lambda or your usual AWS Support contacts.
— Channy
Post Syndicated from jzb original https://lwn.net/Articles/998615/
The FreeBSD Foundation has announced
the release of a security
audit report conducted by security firm Synacktiv. The audit uncovered
a number of vulnerabilities:
Most of these vulnerabilities have been addressed through official FreeBSD
Project security advisories, which offer detailed information
about each vulnerability, its impact, and the measures implemented to
improve the security of FreeBSD systems. […] The audit uncovered 27 vulnerabilities and issues within various
FreeBSD subsystems. 7 issues were not exploitable and were robustness
or code quality improvements rather than immediate security concerns.
Post Syndicated from Donnie Prakoso original https://aws.amazon.com/blogs/aws/build-and-modify-apps-using-natural-language-with-aws-app-studio-now-generally-available/
Announced as preview in July, AWS App Studio is a generative AI-powered application development service that enables users to create applications using natural language, without the need for professional software development skills. In that post, I covered how AWS App Studio helps you build secure, scalable applications and eliminates operational overhead by fully managing each application.
App Studio empowers a new set of builders to create business applications. Whether you are an IT Project Manager, Data Engineer, Enterprise Architect, or Solution Architect, simply describe your requirements in natural language and, within minutes, App Studio generates fully functional applications complete with multipage UIs, data models, and custom business logic.
Today, we’re excited to announce that AWS App Studio is now generally available in the US West (Oregon) and Europe (Ireland) AWS Regions.
Building on feedback from the preview, we are introducing several new features to enhance your app building experience:
Modify your applications with natural language
During the preview period, customers shared with us that they enjoy and appreciate generating fully functional applications using natural language prompts. However, the development journey usually doesn’t stop there, and they asked if they could extend or modify their apps using natural language.
Now, with App Studio, you can modify your applications using natural language. After you’ve generated your applications, you can now describe your desired changes and the assistant will propose updates for you to review. Upon confirmation, it will automatically make the change. This feature makes it even faster and easier to customize your application.
Let’s see how it works in my IT inventory management application that I built with App Studio.
With this new feature, I can chat with the assistant to modify my applications.

To modify my application, I can provide a prompt to add another feature to my app. In this case, I need to add another text input for the web URL to get details of requested hardware, and I need another text area to store notes.

The generative AI assistant will then process my input and provide a proposal. I can review this proposal and select Confirm to proceed.

Then, the assistant will automatically add the components and modify my application.

Add intelligence to your app with a new generative AI component
We’re also introducing a new component to make it even easier to add generative AI capabilities such as text summarization, content generation, and file analysis to your applications.
There are two ways to use this feature. First, with my canvas open, I can select the Gen AI component and drag and drop it onto the canvas. Then, while selecting the component, I can use the assistant to customize it.

Another way is to use the assistant directly. Let’s say I need a feature to analyze repair notes and provide a summary to make it easier for me to review. I can type what I need in the chat box or use the suggested prompts.

Then, the assistant will process my input and provide a proposal. I can review the proposal and select Confirm to proceed.

App Studio will automatically add the required components. On the canvas, I see there’s a button that triggers an automation. If I need to change the underlying prompt, I can select the link that will redirect me to the respective automation.

Under the hood, the Gen AI component is powered by a new action step called Gen AI Prompt. This new component provides an easy way to modify the prompt and input parameters to customize the output generated by the large language model (LLM).

Here’s my published app with the newly added generative AI feature to summarize repair notes.

Generate and add custom business logic with natural language
I can also use the assistant to help me add custom business logic with JavaScript in my automation.
Let’s say that I need custom business logic to calculate repair duration and notify my stakeholders through email. Here’s the multi-step automation that I created. To add the custom logic to my automation, I choose the JavaScript component and then drag and drop it into the right spot.

Next, I need to select the action and, in the Properties panel, I select the Expand editor icon.

With this feature, I can now generate JavaScript code with natural language. Here, I provide a prompt and App Studio generates the source code for me along with comments. This generated source code provides a foundation that I can customize to suit my requirements.

Next, I need to add the Send Email action into my automation to complete the flow.

Customize your app’s theme and style
Now, you can customize the look and feel of your application with App themes. With this feature, you can change the appearance of your application to Light mode or Dark mode. Additionally, you can specify custom colors for your app to match your company’s brand. To enable this feature, you need to turn on the Customize toggle.

Available today
Start building secure, intelligent, and scalable business applications with App Studio today. It’s free to build, and you’ll receive a 60-day (250 user hour) free trial.
Learn more about all these features and others in the AWS App Studio documentation and join the conversation in the #aws-app-studio channel in the AWS Developers Slack workspace.
Happy building,
— Donnie
Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-lambda-turns-ten-the-first-decade-of-serverless-innovation/
I have a very vague memory of a 2013-era meeting with my then-colleague Tim Wagner. The term serverless did not exist, but we chatted about various ways to allow developers to focus on code instead of on infrastructure. At some point I recall throwing my arms skyward and indicating that it would be cool to simply toss the code into the air and have the cloud grab, store, and run it. After many more such meetings, Tim wrote a PRFAQ proposing that we build a platform that did just that, and in 2014 I was able to announce AWS Lambda – Run Code in the Cloud.
From Startup to Enterprise
It is often the case that startups, with no installed base to worry about and the need to innovate, are the first to take a chance on something new such as Lambda. While that certainly did happen, I was pleasantly surprised to find that established companies—up to and including enterprises—were just as quick to jump in. After a bit of experimentation, they quickly found ways to build event-driven applications that supported critical internal use cases. I took this as an early indicator that Lambda would be a success. It was easy to see how quickly our customers felt a new sense of empowerment: they could move from idea to implementation, and from there to realizing business value, more quickly than ever, while still building their systems in a scalable and composable way.
Today, over 1.5 million Lambda users collectively make tens of trillions of function invocations per month. These customers use Lambda for file processing, stream processing (in conjunction with Amazon Kinesis and/or Amazon MSK), web applications, IoT backends, mobile backends (often using Amazon API Gateway and AWS Amplify as well) and to support and power many other use cases.
The First Decade of Serverless Innovation
Let’s roll back the calendar and take a look at a few of the more significant Lambda launches of the past decade:
2014 – The preview launch of AWS Lambda ahead of AWS re:Invent 2014 with support for Node.js and the ability to respond to event triggers from S3 buckets, DynamoDB tables, and Kinesis streams.
2015 – General availability, use of Amazon Simple Notification Service (Amazon SNS) notifications as triggers, and support for Lambda functions written in Java.
2016 – Support for DynamoDB Streams, support for Python, and an increase in the function duration to 5 minutes (this was later raised to 15 minutes). Access to resources in a VPC, the power to call Lambda functions from Amazon Aurora stored procedures, environment variables, and the Serverless Application Model. This year also saw the introduction of Step Functions, which gave you the power to compose multiple Lambda functions to build more complex applications.
2017 – Support for AWS X-Ray, launches of AWS SAM Local and the Serverless Application Repository.
2018 – Support for Amazon SQS as an event trigger, the power to Extend AWS CloudFormation with Lambda-powered macros, and the ability to write your Lambda functions in any programming language.
2019 – Support for provisioned concurrency to give you additional control over performance.
2020 – Access to Savings Plans to save up to 17%, the ability for Lambda functions to access a shared file system, support for AWS PrivateLink to access your functions over a private network, code signing, billing at 1 ms granularity, functions that can use up to 10 GB of memory and 6 vCPUs, and support for container images.
2021 – Amazon S3 Object Lambda to let you process data as it is being retrieved from S3, AWS Lambda Extensions, support for running Lambda functions on Graviton processors.
2022 – Support for up to 10 GB of ephemeral storage per function invocation, HTTPS endpoints for Lambda functions, and Lambda SnapStart to make function invocation faster and more predictable.
2023 – Amazon S3 Object Lambda support for CloudFront, response streaming, and 12x faster function scaling when handling an unpredictable volume of requests.
2024 – New controls to make it easier to capture and search your Lambda function logs, SnapStart support for Java functions that use the ARM64 architecture, recursive loop detection, a new console editor based on VS Code, and an enhanced local IDE experience. The last two launches were designed to improve the developer experience by making it easier to build, test, debug, and deploy Lambda functions.
Again, this is just a subset of what we have launched. If you want to find even more launches, check out the Lambda category tag and search the What’s New for Lambda.
The Next Decade of Serverless
From the start, the vision for serverless has been about helping developers to move from idea to business value more quickly. With that in mind, here are a couple of trends that seem clear to me as I look at Lambda’s direction over the first decade:
Default Choice – The serverless model is definitely here to stay, and will likely become the default operating model over time.
Continued Shift Toward Composability – Over time I can see that serverless applications will continue to make increasing use of reusable, prebuilt components. Aided by AI-powered development tools, a lot of new code will focus on connecting existing components together in new and powerful ways. This will also boost consistency and reliability across applications.
Automated, AI-Optimized Infrastructure Management – We have already seen that Lambda reduces the amount of time and effort needed for managing infrastructure. Going forward, I can see that Machine Learning and other forms of AI will help to optimize costs and performance by allocating resources optimally with minimal human intervention. Applications will run on infrastructure that is automated, self-healing, and fault tolerant.
Extensibility and Integration – As a consequence of the two previous items, applications should be able to grow and adapt to changing conditions more easily than ever.
Security – Automated infrastructure management, real-time monitoring and other forms of threat detection, and AI-assisted remediation will work together to make serverless applications even more secure.
Some Lambda Resources
If you are already using Lambda to build serverless apps, great! If you are ready to get started, here are some resources to help out:
Serverless Training – Enroll in the free Serverless Learning Plan to learn about serverless concepts, common patterns, and best practices. Read the Serverless Ramp-Up Guide, and look at our extensive (in both topic and language) selection of digital training courses and in-person classroom training:
Case Studies – Review the AWS Serverless Customer Success stories to learn how AWS customers are building and innovating with Lambda and other serverless technologies.
re:Invent 2024 Sessions – Browse the re:Invent 2024 Session Catalog to find nearly 200 sessions focused on Serverless Compute & Containers:
Podcast – Listen to Episode 137 (AWS Lambda: A Decade of Transformation) of the AWS Developers Podcast to hear Marc Brooker and Julian Wood discuss the origins, evolution, and impact of Lambda.
New Books – Take a peek at some of the newest books on serverless development and architecture:
I hope that you have enjoyed this not-so-brief look at the past, present, and future of AWS Lambda. Leave me a comment and let me know what you think!
— Jeff;
Post Syndicated from Patrick Kennedy original https://www.servethehome.com/el-capitan-towers-above-the-top500-in-a-big-hpe-and-amd-win/
The newest Top500 list is out, and the former #1 supercomputer, Frontier, has been dethroned. In this list, the Intel-powered Aurora supercomputer passed 1EF, but then El Capitan rose to take the #1 spot. This is a big win for HPE and AMD delivering a system at over 2 exaflops of FP64 performance. El […]
The post El Capitan Towers Above the Top500 in a Big HPE and AMD Win appeared first on ServeTheHome.
Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-buildercards-at-reinvent-2024-aws-community-day-amazon-bedrock-vector-databases-and-more-nov-18-2024/
This week, we wrapped up the final Latin America Amazon Web Services (AWS) Community Days of 2024 in Brazil, with multiple parallel events taking place. In Goiânia, we had Marcelo Palladino, senior developer advocate, and Marcelo Paiva, AWS Community Builder, as keynote speakers. Florianópolis featured Ana Cunha, senior developer advocate, and in Santiago de Chile, I had the honor to share the stage with Rossana Suarez, AWS Container Hero, as keynote speakers. These events, organized by communities for communities, provide opportunities to network, learn something new, and immerse yourself in the community. In a community, everyone grows together, and no one is left behind.
AWS Lambda, the service that introduced me to AWS and remains my favorite, celebrates its 10th anniversary. Born from customer needs, it revolutionized cloud computing by allowing code execution without server management. Since its inception, documented in this LinkedIn post by Dr. Werner Vogels, Chief Technology Officer at Amazon.com, through the original PR/FAQ document, the service has grown significantly, introducing features such as 1ms billing precision and support for 10GB memory. Thank you AWS Lambda, here’s to many more anniversaries.
Amazon invests $110 million to support AI research at universities using Trainium chips. The initiative provides computing resources using AWS Trainium chips, enabling researchers to develop new AI architectures and machine learning innovations that will be open-sourced for broader advancement. Check out the LinkedIn post by Matt Garman, CEO at AWS.
Last week’s launches
AWS BuilderCards second edition at re:Invent 2024 – Jeff Barr announced the launch of the second edition of AWS BuilderCards at re:Invent 2024. It includes improvements to the design and game mechanics, plus a new add-on pack on generative AI. Over 15,000 sets have been distributed at previous events, with excellent user feedback. They’ll be available for online purchase after re:Invent.

Amazon EventBridge announces up to 94% improvement in end-to-end latency for Event Buses – Amazon EventBridge has improved end-to-end latency for Event Buses by up to 94%, reducing latency at P99 from 2235.23ms (measured in January 2023) to 129.33ms (measured in August 2024). This enhancement enables faster processing for time-sensitive applications such as fraud detection, industrial automation, and gaming across all AWS Regions where Amazon EventBridge is available, including the AWS GovCloud (US) Regions, at no additional cost to you.
Introducing resource control policies (RCPs), a new type of authorization policy in AWS Organizations – Resource control policies (RCPs) are a new type of authorization policy in AWS Organizations. RCPs allow centralized control over maximum permissions granted to resources, complementing service control policies (SCPs) that control permissions for principals. RCPs can restrict external access to resources like Amazon Simple Storage Service (Amazon S3) buckets, enforcing a data perimeter across the organization.

Replicate changes from databases to Apache Iceberg tables using Amazon Data Firehose (in preview) – A new preview capability in Amazon Data Firehose captures and replicates database changes to Apache Iceberg tables on Amazon S3. This feature supports PostgreSQL and MySQL databases, providing a simple solution to stream database updates without impacting performance. It automatically handles data partitioning and schema evolution, eliminating the need for complex ETL processes.
Amazon S3 now supports up to 1 million buckets per AWS account – Amazon S3 has increased its default bucket quota from 100 to 10,000 per AWS account. Customers can now request increases up to 1 million buckets. The first 2,000 buckets are free, with a small monthly fee applying thereafter for additional buckets.
Amazon Keyspaces (for Apache Cassandra) reduces prices by up to 75% – Amazon Keyspaces (for Apache Cassandra) announces significant price reductions of up to 75%. The service reduces on-demand mode pricing by up to 56% for single-region and 65% for multi-region usage. Time-to-live (TTL) delete prices are also reduced by 75%.
Centrally managing root access for customers using AWS Organizations – AWS Identity and Access Management (IAM) launches a new capability for centrally managing root access in AWS Organizations. This feature allows security teams to remove long-term root credentials from member accounts and use temporary, task-scoped root sessions for specific actions. The solution enhances security by eliminating permanent root credentials while maintaining the ability to perform necessary privileged operations.

Amazon DynamoDB reduces prices for on-demand throughput and global tables – Amazon DynamoDB announces significant price reductions, cutting on-demand mode throughput costs by 50% and global tables by up to 67%. Multi-region replicated writes now match single-region pricing. These changes make on-demand mode the recommended choice for most DynamoDB workloads.
Amazon Q Developer plugins for Datadog and Wiz now generally available – Amazon Q Developer now offers plugins for Datadog and Wiz services, allowing users to access these partners’ features directly through the AWS Console. Users can query information using natural language commands like @datadog or @wiz to get real-time updates and security insights.
Other AWS blog posts
Here are some additional projects and blog posts that you might find interesting:
Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart – This powerful 8.1 billion parameter model enables high-quality, photorealistic image generation from text prompts. Customers can seamlessly deploy and use the model in Amazon SageMaker JumpStart, benefiting from Amazon SageMaker security and machine learning operations (MLOps) capabilities.
Transcribe, translate, and summarize live streams in your browser with AWS AI and generative AI services – This blog post explains how we developed a Chrome extension that uses AI services to enhance live streaming experiences. The extension uses Amazon Transcribe, Amazon Translate, and Amazon Bedrock to provide real-time transcription, translation, and summarization of live streams directly in the browser. It supports over 50 languages for transcription and 75 for translation, making content globally accessible.
Simplify automotive damage processing with Amazon Bedrock and vector databases – This blog post presents a solution combining Amazon Bedrock and vector databases to streamline automotive damage assessment. The system uses AI to analyze vehicle damage images, provide cost estimates, and match with similar cases from existing datasets. It uses Anthropic’s Claude 3 and Amazon Titan Multimodal Embeddings for efficient, accurate processing.
Revolutionize trip planning with Amazon Bedrock and Amazon Location Service – Amazon Bedrock and Amazon OpenSearch Service vector databases combine to automate automotive damage assessment, using AI to analyze images and match them with historical data for accurate repair estimates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:
AWS Community Days – Join community-led conferences featuring technical discussions, workshops, and hands-on labs driven by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are scheduled for November 23 in Indonesia, and on December 14 in Kochi, India.
Browse all upcoming AWS-led in-person and virtual events and developer-focused events.
Create your AWS Builder ID and reserve your alias. Builder ID is a universal login credential that gives users access to AWS tools and resources, including over 600 free training courses, community features, and developer tools such as Amazon Q Developer beyond the AWS Management Console.
That’s all for this week. Check back next Monday for another Weekly Roundup!
Thanks to Odina Jacobs for the AWS Community Chile photo.
— Eli
This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!
Post Syndicated from Talks at Google original https://www.youtube.com/watch?v=BJGO3BPYYos
Post Syndicated from corbet original https://lwn.net/Articles/997959/
Linus Torvalds released
the 6.12 kernel on November 17, as expected. This development
cycle, the last for 2024, brought 13,344 non-merge changesets into the
mainline kernel; that made it a relatively slow cycle from this
perspective, but 6.12 includes a long list of significant new features.
The time has come to look at where those changes came from, and to look at
the year-long LTS cycle as well.
Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2024/11/most-of-2023s-top-exploited-vulnerabilities-were-zero-days.html
Zero-day vulnerabilities are more commonly used, according to the Five Eyes:
Key Findings
In 2023, malicious cyber actors exploited more zero-day vulnerabilities to compromise enterprise networks compared to 2022, allowing them to conduct cyber operations against higher-priority targets. In 2023, the majority of the most frequently exploited vulnerabilities were initially exploited as a zero-day, which is an increase from 2022, when less than half of the top exploited vulnerabilities were exploited as a zero-day.
Malicious cyber actors continue to have the most success exploiting vulnerabilities within two years after public disclosure of the vulnerability. The utility of these vulnerabilities declines over time as more systems are patched or replaced. Malicious cyber actors find less utility from zero-day exploits when international cybersecurity efforts reduce the lifespan of zero-day vulnerabilities.
Post Syndicated from Mikayla Wyman original https://blog.rapid7.com/2024/11/18/unlock-24-7-soc-coverage-rapid7-mxdr-now-supports-with-microsoft-security-products/

In today’s complex threat landscape, organizations need every advantage at their disposal to stay secure–starting with maximizing the tools they already have within their ecosystem. With the launch of Rapid7 MXDR’s SOC support for key Microsoft security products, we’re making it possible for organizations to layer security defenses and amplify outcomes by combining their existing Microsoft telemetry with the 24×7 coverage, broad security ecosystem telemetry and in-depth expertise of Rapid7’s MXDR service.
By connecting directly to key Microsoft event sources—like Microsoft O365, Defender for Cloud, Defender for Endpoint, Defender for Vulnerability Management, Defender for Identity, and Entra Identity—MXDR amplifies detection, visibility, and response capabilities across the technology you rely on, without needing additional infrastructure or complex setups. From uncovering hidden threats to responding to incidents faster, this integration leverages Microsoft’s event data to help security teams achieve 24×7 comprehensive Microsoft coverage throughout their tool stack.
Organizations of every size can now harness the best of both worlds: the familiarity and depth of their Microsoft environment and the advanced detection, correlation, automation, and forensic response capabilities of Rapid7’s MXDR service.
Microsoft tools are foundational in many organizations’ tech stacks, and help teams collect security-critical data that can enhance threat detection and incident response. Without an integrated technology stack and 24×7 SOC triage, investigation, and response coverage across the Microsoft tools that teams already rely on, normalizing inputs and pinpointing real signs of attacker behavior can be nearly impossible for teams of all sizes.
By supporting Microsoft event sources as a layer on top of native telemetry provided through the Rapid7 Detection Engine, we’re making it easier for security teams to correlate data across their environment from key areas in their Microsoft toolset.
Teams can now customize their Rapid7 MXDR support to cover triage, investigation, and response to threats across key Microsoft Security tools, including:
By incorporating support for Microsoft security tools, Rapid7 MXDR maximizes your existing Microsoft investment, helping your security team stay agile and resilient in the face of an ever-evolving threat landscape.
We’re on a mission for our MDR service to bring unified visibility to the attack surface and comprehensive defense capabilities to your security program. By extending 24×7 expert SOC coverage to Microsoft Security tools, we’re bringing:
As we extend our MXDR service with more comprehensive coverage to meet security teams where they are, we’re excited to partner with you to secure your extended ecosystem. If you’re a Rapid7 MDR customer, reach out to your account team to learn more about our extended coverage. If you’re not a Rapid7 MDR customer yet, request a demo here.
Post Syndicated from Danny Cortegaca original https://aws.amazon.com/blogs/security/threat-modeling-your-generative-ai-workload-to-evaluate-security-risk/
As generative AI models become increasingly integrated into business applications, it’s crucial to evaluate the potential security risks they introduce. At AWS re:Invent 2023, we presented on this topic, helping hundreds of customers maintain high-velocity decision-making for adopting new technologies securely. Customers who attended this session were able to better understand our recommended approach for qualifying security risk and maintaining a high security bar for the applications they build. In this blog post, we’ll revisit the key steps for conducting effective threat modeling on generative AI workloads, along with additional best practices and examples, including some typical deliverables and outcomes you should look for across each stage. Throughout this post we will link to specific examples that we created with the AWS Threat Composer tool. Threat Composer is an open source AWS tool you can use to document your threat model, available at no additional cost.
This post covers a practical approach for threat modeling a generative AI workload and assumes you know the basics of threat modeling. If you want to get an overview on threat modeling, we recommend that you check out this blog post. In addition, this post is part of a larger series on the security and compliance considerations of generative AI.
Each new technology comes with its own learning curve when it comes to identifying and mitigating the unique security risks it presents. The adoption of generative AI into workloads is no different. These workloads, specifically those that use large language models (LLMs), introduce new security challenges because they can generate highly customized and non-deterministic outputs based on user prompts, which creates the possibility of misuse or abuse. In addition, a generative AI workload often relies on access to large and customized data sets, frequently internal data sources that might contain sensitive information.
Although working with LLMs is a relatively new practice and has some unique and nuanced security risks and impacts, it’s crucial to remember that LLMs are only one portion of a larger workload. It’s important to apply the threat modeling approach to all parts of the system, taking into account well-known threats such as injections or the compromise of credentials. Part 1 of the Securing generative AI AWS blog series, An introduction to the Generative AI Security Scoping Matrix, provides a great overview of what those nuances are, and how the risks differ depending on how you make use of LLMs in your organization.
As a quick refresher, threat modeling is a structured approach to identifying, understanding, addressing, and communicating the security risks in a given system or application. It is a fundamental element of the design phase that allows you to identify and implement appropriate mitigations and make fundamental security decisions as early as possible.
At AWS, threat modeling is a required input to initiating our Application Security (AppSec) process for the builder teams at AWS, and our builder teams get support from a Security Guardian to build threat models for their features or services.
A useful way of structuring the approach to threat modeling, created by expert Adam Shostack, involves answering four key questions. We’ll look into each one and how to apply them to your generative AI workload.
The first question, “What are we building?”, aims to get a detailed understanding of your business context and application architecture. The detail that you’re looking for should already be captured as part of the comprehensive system documentation created by the builders of your generative AI solution. By starting from this documentation, you can streamline the threat modeling process and focus on identifying potential threats and vulnerabilities, rather than on re-creating foundational system knowledge.
At a minimum, builders should capture the key components of the solution, including data flows, assumptions, and design decisions. This lays the groundwork for identifying potential threats. Key elements to document are the following:
Figure 1 shows how Threat Composer allows you to input information about the application in the Application Information, Architecture, Dataflow, and Assumptions sections.
Figure 1: Threat Composer dataflow diagram view for a generative AI chatbot example
For the second question, “What can go wrong?”, you identify possible threats to your application using the context and information you gathered for the previous question. To help you identify possible threats, make use of existing repositories of knowledge, especially those related to the new technologies you are adopting. These often have tangible examples that you can apply to your application. Useful resources are the OWASP Top 10 for LLMs, the MITRE ATLAS framework, and the AI Risk Repository. You can also use a structured framework such as STRIDE to aid you in your thinking. Use the information you received from the “What are we building?” question and apply the most relevant STRIDE categories to your thinking. For example, if your application hosts critical data that the business has no risk appetite for losing, then you might think about the various Information Disclosure threats first.
You can write and document these possible threats to your application in the form of threat statements. Threat statements are a way to maintain consistency and conciseness when you document your threat. At AWS, we adhere to a threat grammar which follows the syntax:
A [threat source] with [prerequisites] can [threat action] which leads to [threat impact], negatively impacting [impacted assets].
This threat grammar structure helps you to maintain consistency and allows you to iteratively write useful threat statements. As shown in Figure 2, Threat Composer provides you with this structure for new threat statements and includes examples to assist you.
Figure 2: Threat Composer threat statement builder
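The threat grammar above can be treated as a fill-in template. The following is a minimal sketch, not Threat Composer's own data model: the `ThreatStatement` class, its field names, and the example values are all hypothetical and chosen only to mirror the five bracketed slots in the grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatStatement:
    """One threat, split into the five slots of the threat grammar."""

    threat_source: str
    prerequisites: str
    threat_action: str
    threat_impact: str
    impacted_assets: str

    def render(self) -> str:
        # Assemble the sentence in the same order as the grammar:
        # A [source] with [prerequisites] can [action] which leads to
        # [impact], negatively impacting [assets].
        return (
            f"A {self.threat_source} with {self.prerequisites} can "
            f"{self.threat_action} which leads to {self.threat_impact}, "
            f"negatively impacting {self.impacted_assets}."
        )


# Hypothetical example values, invented for illustration only.
example = ThreatStatement(
    threat_source="threat actor",
    prerequisites="access to the public-facing chat interface",
    threat_action="inject prompts that override the system instructions",
    threat_impact="the model returning data it should not disclose",
    impacted_assets="the confidentiality of customer records",
)

print(example.render())
```

Keeping each slot as a separate field, rather than storing one free-text sentence, makes it easier to spot statements where a slot was left vague or empty.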
Once you go through the process of creating threat statements, you will have a summary of “what can go wrong.” You can then define attack steps, as an analysis of “how it can go wrong.” It’s not always necessary to define attack steps for each threat statement because there are many ways a threat might actually happen. Going through the exercise of identifying and documenting a few different threat mechanisms can help to get specific mitigations that you can associate with each attack step for a more effective defense-in-depth approach.
Threat Composer gives you the ability to add metadata to your threat statements. Customers who have adopted this option into their workflows most commonly use the STRIDE category and Priority metadata tags. Those customers can quickly track which threats are the highest priority and which STRIDE category they correspond to. Figure 3 shows how you can document threat statements alongside their associated metadata in Threat Composer.
Figure 3: Threat Composer sample genAI chatbot application – threat view
By systematically considering what can go wrong, and how, you can uncover a range of possible threats. Let’s explore some of the example deliverables that can emerge from this process:
These are some example threat statements for an application that is interacting with an LLM component:
These are some example attack steps that demonstrate how the preceding threat statements could occur:
Now that you’ve identified some possible threats, consider which controls would be appropriate to help mitigate the risks associated with those threats. This decision will be driven by your business context and the asset in question. Your organizational policies will also influence prioritization of controls: Some organizations might choose to prioritize the control that impacts the highest number of threats, while others might choose to start with the control that impacts the threats that are deemed the highest risk (by likelihood and impact).
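As a rough illustration of ranking threats "by likelihood and impact", the sketch below multiplies two ordinal scores. This is one common convention, not a method the post prescribes: the 1 to 5 scales, the multiplication, the `Threat` class, and the threat IDs are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Threat:
    threat_id: str   # hypothetical identifier
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def risk(self) -> int:
        # Simple likelihood x impact score; other scoring schemes exist.
        return self.likelihood * self.impact


def rank_by_risk(threats: List[Threat]) -> List[Threat]:
    """Return threats ordered from highest to lowest risk score."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)


threats = [
    Threat("T-001", likelihood=2, impact=5),
    Threat("T-002", likelihood=4, impact=4),
    Threat("T-003", likelihood=3, impact=1),
]

for threat in rank_by_risk(threats):
    print(threat.threat_id, threat.risk)
```

An organization that prefers the other strategy mentioned above would rank controls by how many threats they touch instead of ranking threats by score.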
For each identified threat, define specific mitigation strategies. This could include input sanitization, output validation, access controls, and more. Ideally, at a minimum, you want at least one preventative control and one detective control associated with each threat. The same resources that are linked to in the What can go wrong? section are also highly useful for identifying relevant controls. For example, the MITRE ATLAS has a dedicated section for mitigations.
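The "at least one preventative and one detective control per threat" guideline can be checked mechanically once threats and controls are recorded as data. The sketch below assumes a hypothetical representation (plain dictionaries keyed by made-up IDs); it is not how Threat Composer stores this information.

```python
from typing import Dict, List, Set

# The two control kinds the guideline asks for on every threat.
REQUIRED_KINDS: Set[str] = {"preventative", "detective"}


def missing_control_kinds(
    threat_controls: Dict[str, List[str]],
    control_kinds: Dict[str, str],
) -> Dict[str, Set[str]]:
    """Map each under-covered threat ID to the control kinds it lacks."""
    gaps: Dict[str, Set[str]] = {}
    for threat_id, control_ids in threat_controls.items():
        present = {control_kinds[control_id] for control_id in control_ids}
        missing = REQUIRED_KINDS - present
        if missing:
            gaps[threat_id] = missing
    return gaps


# Hypothetical controls and threats, for illustration only.
control_kinds = {
    "M-001": "preventative",  # e.g. input sanitization
    "M-002": "detective",     # e.g. logging and alerting
    "M-003": "preventative",  # e.g. least-privilege access control
}
threat_controls = {
    "T-001": ["M-001", "M-002"],
    "T-002": ["M-003"],
    "T-003": [],
}

gaps = missing_control_kinds(threat_controls, control_kinds)
print(gaps)
```

Here `T-001` is fully covered, `T-002` lacks a detective control, and `T-003` lacks both kinds.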
Note: You might find that as you identify mitigations for your threats, you start to see duplication in your controls. For example, least-privilege access control might be associated with almost all of your threats. This duplication can also help you to prioritize. If a single control appears in 90% of your threat mitigations, the effective implementation of that control will help to drive down risk across each of those threats.
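The duplication described in the note can be measured directly. The following sketch counts, for each mitigation, the share of threats it is linked to; the function name, the dictionary layout, and the IDs are hypothetical and used only for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def control_coverage(
    threat_mitigations: Dict[str, List[str]],
) -> List[Tuple[str, float]]:
    """Return (mitigation ID, fraction of threats covered), highest first."""
    total = len(threat_mitigations)
    # Use set() so a mitigation listed twice on one threat counts once.
    counts = Counter(
        mitigation_id
        for mitigation_ids in threat_mitigations.values()
        for mitigation_id in set(mitigation_ids)
    )
    return [(mid, n / total) for mid, n in counts.most_common()]


# Hypothetical threat-to-mitigation links.
threat_mitigations = {
    "T-001": ["M-001", "M-002"],
    "T-002": ["M-001"],
    "T-003": ["M-001", "M-003"],
    "T-004": ["M-002"],
}

for mitigation_id, share in control_coverage(threat_mitigations):
    print(f"{mitigation_id}: {share:.0%} of threats")
```

In this made-up data set, `M-001` appears on three of four threats, so implementing it well would reduce risk across most of the model.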
Associated with each threat, you should have a list of mitigations, each with a unique identifier to ease lookups and reusability later on. Example mitigations with identifiers include the following:
For more information on relevant security controls for your workload, we recommend that you read Part 3 of our Securing generative AI series: Applying relevant security controls.
Figure 4 shows some completed example threat statements in Threat Composer, with mitigations linked to each.
Figure 4: Completed threat statements with metadata and linked mitigations
After answering the first three questions, you have your completed threat model. The documentation should contain your data flow diagrams (DFDs), threat statements, attack steps (optional), and mitigations.
For a more detailed example, including a visual dashboard that shows a breakdown of a threat summary, see the full GenAI chatbot example in Threat Composer.
A threat model is a living document. This post has discussed how creating a threat model helps you to identify technical controls for threats, but it’s also important to consider the non-technical benefits that the process of threat modeling provides.
For your final activity, you should validate both elements of the threat modeling activity.
Validate the effectiveness of the identified mitigation: Some of the mitigations you identify might be new, and some you might already have had in place. Regardless, it’s important to continuously test and verify that your security measures are working as intended. This could involve penetration testing or automated security scans. At AWS, threat models serve as inputs to automated test cases to be embedded in the pipeline. The threats defined are also used to define the scope of the penetration testing, to confirm whether those threats have been mitigated sufficiently.
Validate the effectiveness of the process: Threat modeling is fundamentally a human activity. It requires interaction across your business, builder teams, and security functions. Those closest to the creation and operations of the application should own the threat model document and revisit it often, with support from their security team (or Security Guardian equivalent). How often this is done will depend on your organizational policies and the criticality of the workload, though it is important to define triggers that will initiate a review of the threat model. Example triggers can include threat intelligence updates, new features that significantly change data flows, or new features that impact security-related aspects of the system (such as authentication or authorization, or logging). Validating your process periodically is especially important when you adopt new technologies because the threat landscape for these evolves faster than usual.
Performing a retrospective on the threat modeling process is also a good way to work through and discuss what worked well, what didn’t work well, and what changes you will commit to the next time the threat model is revisited.
These are some example outputs for this step of the process:
In this blog post, we explored a practical and proactive approach to threat modeling for generative AI workloads. The key steps we covered provide a structured framework for conducting effective threat modeling, from understanding the business context and application architecture to identifying potential threats, defining mitigation strategies, and validating the overall effectiveness of the process.
By following this approach, organizations can better equip themselves to maintain a high security bar as they adopt generative AI technologies. The threat modeling process not only helps to mitigate known risks, but also fosters a culture of security-mindedness that is crucial for organizations to adopt. This can help your organization to unlock the full potential of these powerful technologies while maintaining the security and privacy of your systems and data.
Want to look deeper into additional areas of generative AI security? Check out the other posts in the Securing Generative AI series:
Post Syndicated from jake original https://lwn.net/Articles/998570/
Security updates have been issued by AlmaLinux (binutils, libsoup, squid:4, tigervnc, and webkit2gtk3), Debian (icinga2, postgresql-13, postgresql-15, smarty3, symfony, thunderbird, and waitress), Fedora (dotnet9.0, ghostscript, microcode_ctl, php-bartlett-PHP-CompatInfo, python-waitress, and webkitgtk), Gentoo (Perl, Pillow, and X.Org X server, XWayland), Oracle (binutils, cups-filters, giflib, squid, and webkit2gtk3), Red Hat (webkit2gtk3), SUSE (ansible-core, apache2, gio-branding-upstream, icinga2, kernel-devel, libnghttp2-14, libsoup-2_4-1, libsoup-3_0-0, libvirt, nodejs-electron, postgresql13, postgresql16, python39, rclone, thunderbird, ucode-intel-20241112, and wget), and Ubuntu (python-asyncssh and tomcat9).
Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=hodLBnww1Os
Post Syndicated from Боян Юруков original https://yurukov.net/blog/2024/otvoreno-tablo/
Here is another series of events following an otherwise trivial case that illustrates a bigger problem in the oversight of, and response to, reports about the urban environment. You can read a similar account about the landscaping of a new building. I reported a comparable case almost four years ago, and that one at least got covered up somehow.
First day of school. I get a message from visitors from abroad staying in the neighborhood, who noticed an open panel on a transformer substation. They ask how this is allowed and who is supposed to fix it. I find it next to the 105th Secondary School (105-то СУ) in Dianabad, right on the route that dozens of students walk every day. A child could easily wander over and touch something inside. I file a report through call.sofia with photos. Two days pass without any reaction, so I file a report directly with Electrohold, who promise to take action.


Ten days pass after the report. I go by to see how they fixed it, and it is indeed closed... with a rock. A cable still sticks out of the panel, tied directly to exposed live plates and running through the tree branches to the nearby restaurant. I file a second report in the hope that something will happen.



Two weeks after my first report, the mayor of the Izgrev district writes to Electrohold about both reports, asking for the panel to be repaired.
A new letter to Electrohold from the district mayor.
I went by again to see what had happened. Branches have been piled on top of and in front of the panel. It is not locked or secured against access or water. The cable still sticks out in the direction of the restaurant. Although the dry branches make the panel a little harder to reach, the chance that they catch fire creates a much greater risk, especially if someone tries to put the fire out with water.



The post "A small example of how the lack of oversight makes the urban environment dangerous" first appeared on Блогът на Юруков (Yurukov's Blog).