GPS As a Key Distribution Platform

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2026/06/gps-as-a-key-distribution-platform.html

This is interesting:

The U.S. military has likely been quietly broadcasting codes for its global encryption network using public GPS for nearly 20 years, turning each satellite into a hidden “numbers station,” according to Steven Murdoch…

That means every device that uses GPS has been receiving hidden government information for years, and nobody outside the military knew it until now.

[…]

Murdoch discovered that this particular sentinel was transmitted by all 31 operational satellites within a window of a few hours on May 26, 2011, potentially heralding the activation of a new operational system. He confirmed that this timeline coincided with the rollout of the military’s Over-the-Air Distribution (OTAD) and the Over-the-Air Rekeying (OTAR) by cross-referencing declassified documents, including a 2015 presentation about the dates of the operation.

“There was a perfect match between the timeline and that presentation and the change points that were automatically identified from the data,” Murdoch said. “That was the smoking gun that made me think: This is what it’s for.”

These automated systems replaced the cumbersome manual distribution of cryptographic keying material, allowing military GPS receivers around the world to be rekeyed remotely through satellite broadcasts rather than through onsite procedures.

Asahi Linux warns users not to upgrade to macOS 27 beta

Post Syndicated from jzb original https://lwn.net/Articles/1077209/

The Asahi Linux project, which brings Linux support to Apple Arm-based
Macs, has warned its users not to upgrade to the macOS 27 “Golden Gate”
beta.

Apple has changed how the boot picker and Startup Disk applications
detect valid OS boot volumes. When using either from macOS 27, your
Asahi partition will not be visible! We believe this to be a bug, and
have filed a report (FB22994760).

If you have already upgraded to the beta and noticed that your
Asahi partition has disappeared, do not stress. Your Asahi partition
is still there, and you have not lost any data.

The Asahi Linux installer has been patched to prevent use with
macOS 27 for now, but any users already bitten by the change will
need to use macOS 26 to restore access to Asahi Linux.

[$] BPF loop verification with scalar evolution

Post Syndicated from daroc original https://lwn.net/Articles/1076121/

The BPF verifier has, in the course of wrestling with the difficult problem of
statically analyzing loops, grown special support for many kinds of loops over its
history, but its fundamental approach to simple for loops has not changed. When it
encounters a loop, it evaluates it, iteration by iteration, until reaching an exit
condition: a process that can cause the verifier to mistakenly hit the limit on the
number of allowed instructions where a better implementation would not. Eduard
Zingerman spoke at the 2026 Linux Storage, Filesystem, Memory-Management, and BPF
Summit about his in-progress work on improving the verifier’s treatment of loops,
especially nested loops.
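To make the contrast concrete, here is a minimal Python sketch. It is not the verifier's code and not Zingerman's patches, and the article excerpt does not describe his design. `simulate` walks a counted loop one iteration at a time and charges a hypothetical per-iteration instruction cost against a budget, the way a naive analysis would. `trip_count` instead represents the induction variable as a scalar-evolution style add-recurrence {start, +, step} (the representation LLVM's ScalarEvolution uses) and computes the iteration count in closed form. All names, costs, and budgets are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddRec:
    """Add-recurrence {start, +, step}: the induction variable's value at iteration k."""
    start: int
    step: int

    def at(self, k: int) -> int:
        return self.start + k * self.step


def trip_count(iv: AddRec, bound: int) -> int:
    """Closed-form iteration count of `for (i = start; i < bound; i += step)`."""
    if iv.step <= 0:
        raise ValueError("this sketch only handles positive steps")
    if iv.start >= bound:
        return 0
    return -(-(bound - iv.start) // iv.step)  # ceiling division


def simulate(iv: AddRec, bound: int, insns_per_iter: int, budget: int) -> int:
    """Naive analysis: step through every iteration, charging instructions to a budget."""
    used = 0
    k = 0
    while iv.at(k) < bound:
        used += insns_per_iter
        if used > budget:
            raise RuntimeError(f"budget of {budget} instructions exceeded at iteration {k}")
        k += 1
    return k


# Nested loops: the closed form multiplies trip counts instead of walking them.
outer = trip_count(AddRec(0, 1), 1_000)
inner = trip_count(AddRec(0, 1), 1_000)
print(outer * inner)  # prints 1000000
```

With a hypothetical cost of 5 instructions per inner-loop body and a budget of 1,000,000, walking this 1000-by-1000 nest iteration by iteration would exhaust the budget, while the closed form needs only a few arithmetic operations per loop.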

Rapid7 Gains Access To Anthropic’s Project Glasswing To Explore Frontier AI For Cybersecurity

Post Syndicated from Wade Woolwine original https://www.rapid7.com/blog/post/ai-rapid7-accesses-anthropics-project-glasswing-exploring-frontier-artificial-cybersecurity-intelligence

Wade Woolwine is Senior Director, Product Security at Rapid7.

Rapid7 is excited to join Anthropic’s Project Glasswing, which includes access to Claude Mythos Preview, giving our teams the opportunity to explore how frontier AI can support legitimate, internal defensive security workflows led by experienced security practitioners. Anthropic has now expanded Project Glasswing from its initial cohort to a broader group of organizations, underscoring how quickly this conversation is moving from model capability to industry readiness. 

This access comes at a critical moment for security operations. Attackers are moving faster, attack surfaces are expanding, and fragmented security data makes it harder for teams to correlate context and respond at scale. The industry is entering a period where powerful frontier AI models with advanced cyber capabilities require new operating norms, stronger safeguards, and better infrastructure for how vulnerabilities are verified, disclosed, fixed, and deployed.

Frontier AI will raise expectations for how quickly security teams can understand risk, make decisions, and prove that action has reduced exposure. Rapid7 has already been tracking what Project Glasswing means for security leaders: faster discovery is only part of the story, and the real test is how defenders handle everything that follows, from prioritization and remediation to validation, detection, and response. Rapid7’s involvement gives us another opportunity to help shape how advanced LLMs are evaluated and applied to real defensive security work.

The organizations best positioned to benefit from frontier AI will be those that pair advanced models with trusted security context, expert oversight, and mature operational workflows. That is the lens Rapid7 is bringing to our internal exploration of Claude Mythos Preview, and it reflects the same principle that guides our broader AI strategy: advanced technology delivers the most value when grounded in security expertise, operational context, and measurable outcomes.

Exploring Claude Mythos Preview inside Rapid7

In the first week of access, Claude Mythos Preview has already given our researchers, security engineers, and analysts another way to explore how frontier AI can strengthen the security workflows we already rely on. Our use is internal and practitioner-led, with a focus on learning where these models can create defensive value, where human expertise remains essential, and where responsible guardrails are required.

Cybersecurity impact depends on more than model capability. A model may help identify a potential vulnerability and confirm exploitability, but reducing risk requires deeper operational work: understanding affected systems, mapping business context, prioritizing remediation, validating the fix, and ensuring detection coverage is in place. Anthropic’s latest Project Glasswing update reinforces that same shift: as AI makes discovery faster, the next challenge becomes helping the industry scale verification, disclosure, fixing, and deployment.

For more than 25 years, Rapid7 has helped organizations understand risk in real environments and take action against it. Access to Project Glasswing gives us another way to explore how LLMs can support that mission.

How Rapid7 is using Claude Mythos Preview internally

Our initial exploration is focused on internal defensive use cases that can help strengthen our product security, improve our research, and create better security outcomes overall. The goal is to understand how frontier AI can support highly specialized security work while helping us evaluate these capabilities with the discipline and caution they require.

In product security, we are exploring how Claude Mythos Preview can support assessment of our code and infrastructure, helping identify potential vulnerabilities, weaknesses, or risky patterns that traditional product security tools may miss. Used responsibly, this type of workflow can help engineering and product security teams reduce risk earlier in the development lifecycle.

We are also evaluating how frontier AI can support vulnerability validation and exploitation analysis in authorized environments. This includes exploring how models can help researchers reason across unfamiliar code, validate severity, build safe proof-of-concept exploit paths, and translate findings into practical remediation guidance.

Our work also includes zero-day research and frontier model evaluation. As models become more capable, security teams need a clear view of where they perform well, where they struggle, and how their outputs should be governed. Evaluating these models against vulnerability discovery and exploitation tasks helps Rapid7 understand their practical value, limitations, and safeguards.

We are also applying frontier AI to red-teaming, detection, and response research. As AI becomes more embedded in enterprise systems and security operations, it also needs to be tested adversarially. Frontier models can help practitioners explore attack paths, challenge assumptions, enrich investigations, reduce noise, and support faster decisions when paired with the right telemetry and human judgment.

Why frontier AI needs cybersecurity expertise

The industry conversation around frontier AI often starts with what models can find, especially as they become more capable at reasoning across large codebases and surfacing potential flaws. However, security teams reduce risk by knowing which findings matter, acting on them quickly, and proving that exposure has been reduced. As we’ve written before, the challenge is turning faster discovery into faster action, which requires teams to understand their environment well enough to apply emerging models with intent.

That is why expertise matters. AI can help accelerate parts of the workflow, but security impact comes from connecting discovery to validation, remediation, detection, and response. Without that connection, faster discovery can create more volume for teams that are already stretched. With the right context and operating model, it can help defenders move earlier and with more confidence.

This is the lens Rapid7 brings to Project Glasswing. Our teams are exploring these capabilities as practitioners who understand the real-world pressures customers face: incomplete asset visibility, fragmented ownership, growing vulnerability backlogs, expanding identity and cloud risk, and alert volumes that can outpace human-only workflows.

From frontier AI adoption to preemptive security

Rapid7’s broader strategy is focused on helping organizations move toward preemptive security, where exposure management and detection and response work together to disrupt attackers before risk becomes impact. As AI accelerates both attacker activity and defender workflows, security teams need more than faster vulnerability discovery. They need rich contextual prioritization, trusted AI-driven decision making, and mitigations beyond patching so they can prioritize, validate, and respond at speed and scale.

The next phase of cybersecurity will require speed, scale, and consistency across the entire security lifecycle. The industry challenge is expanding from finding vulnerabilities to the harder operational work of verifying, disclosing, fixing, and deploying remediations. While vulnerability and alert volumes will increase, cyber resilience depends on what happens both before and after discovery. In a reality where vulnerabilities can be exploited or chained together quickly, teams need the ability to prioritize exposures that have real impact, investigate quickly with full context, and keep operating in the face of disruption.

Preemptive security also means mitigation must extend beyond patching. Timely patching at scale is not always practical, so security teams need the ability to intercept and disrupt exploit paths through virtual patching, controls management, and rapid response actions. That is why Rapid7 is approaching frontier AI through the lens of preemptive security. Our AI foundation is built on two pillars: unified security data with shared operational context across exposures, assets, identities, behavior, and activity; and transparent AI decisions that are validated by experts and governed by policy-driven workflows.

Access to Claude Mythos Preview is another step in exploring how LLMs can help security teams move earlier, act faster, and build more resilient programs without losing the human expertise and accountability that effective security requires. Anthropic also unveiled Fable 5 today, its first publicly available Mythos-class model, which will only further underscore the importance of having an integrated, AI-ready security plan that can turn this new benchmark of visibility into meaningful security improvement.

Security updates for Tuesday

Post Syndicated from jzb original https://lwn.net/Articles/1077163/

Security updates have been issued by AlmaLinux (bind and libyang), Debian (keystone and openssl), Fedora (mingw-objfw, objfw, sentencepiece, and tailscale), Mageia (packagekit and suricata), Oracle (bind, bind9.16, go-toolset:ol8, ImageMagick, kernel, samba, and vim), SUSE (apache-commons-lang3, apache-commons-text, apache-commons-configuration2, apache-commons-cli, apache-commons-io, apache-commons-codec, avahi, busybox, chromedriver, chromium, csync2, firewalld, frr, gleam, helm, kernel-devel, keybase-client, libmozjs-140-0, libopenvswitch-3_7-0, libsoup, memcached, mutt, openjpeg2, ovmf, perl-HTML-Parser, perl-Net-CIDR-Set, perl-Protocol-HTTP2, postgresql-jdbc, postgresql17, python-CairoSVG, python-Flask, python-pip, python-pyOpenSSL, python-python-multipart, python-Twisted, python-urllib3, python-urllib3_1, python-uv, python311, rsync, tomcat, and tree-sitter), and Ubuntu (alsa-lib, cups, inetutils, isc-kea, jpeg-xl, libnet-cidr-lite-perl, netatalk, netty, nginx, node-shell-quote, php-twig, pillow, poppler, rsync, strongswan, systemd, and transmission).

Linux App Summit 2026 (Heise)

Post Syndicated from corbet original https://lwn.net/Articles/1077084/

Heise is carrying a report from the Linux App Summit, held in Berlin in May.

The slightly more than a dozen talks were symbolically framed
between the opening keynote by systemd creator Lennart Poettering
and the closing talk by Jorge Castro, initiator of the Universal
Blue project, from which the modern Linux systems Bluefin and
Bazzite emerged. Both Castro and Poettering call for a fundamental
rethink of how Linux operating systems are delivered but pursue
different approaches.

Exploring AI Integration in Zabbix with Gemini and WebMCP

Post Syndicated from Cesar Caceres original https://blog.zabbix.com/exploring-ai-integration-in-zabbix-with-gemini-and-webmcp/33050/

When I first started working with Zabbix in banking and telecommunications over a decade ago, the workflow was always the same: something breaks, an alert fires, you open the dashboard, you diagnose, you fix. Every step required a human sitting in front of a screen reading charts and making decisions.

Then AI came along, and I started asking a simple question. What if I could just talk to my infrastructure and get answers? That question led me down a path from Telegram bots to WhatsApp integrations, and then from chatbots with custom modules to a full mobile application on the Google Play Store.

Along the way, I discovered that the real challenge is not connecting AI to Zabbix – it is defining how they should communicate. That is where protocols like MCP and WebMCP come in, and why they matter for anyone working in infrastructure monitoring today.

Phase 1: Just let me ask a question

The first thing I wanted was simple – to ask about my infrastructure in natural language and get a useful answer. Not parse JSON, not read raw metrics, just ask.

My early integrations used Telegram and WhatsApp as the interface. The AI (initially custom modules, later Gemini) would receive a question like “What alerts do I have right now?”, query the Zabbix API, and respond in plain language. It worked, but it was limited – the AI could only answer what I had explicitly programmed it to answer.

Phase 2: MCP gives AI a standard way to talk to Zabbix

The Model Context Protocol (MCP), developed by Anthropic, solves a fundamental problem: how do you give an AI model structured access to external tools and data sources without reinventing the wheel every time?

Before MCP, every AI-to-Zabbix integration was custom. You wrote a script, parsed the API response, and formatted it for the model. If you wanted to switch from one AI provider to another, you started over. MCP standardizes this. You build an MCP server once, and any compatible AI client (Claude Desktop, Gemini CLI, or others) can use it.

The Zabbix community has already embraced this. There are now multiple open source MCP servers for Zabbix available on GitHub. You can request things like:

  • “Show me all unacknowledged problems with severity High or above”
  • “Create a maintenance window for host db-01 for 2 hours”
  • “What changed in the last 24 hours?”

Best of all, you can do it all through natural language and through a standardized protocol.
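As an illustration of what an MCP server does behind the first request above, here is a minimal Python sketch that turns "unacknowledged problems with severity High or above" into a Zabbix API JSON-RPC call to `problem.get`. It assumes Zabbix's standard severity scale (0 = not classified through 5 = disaster), an API token sent as a Bearer header (supported in recent Zabbix versions), and a hypothetical server URL. It is not taken from any particular MCP server.

```python
import json
import urllib.request

# Zabbix trigger severities: 0 not classified, 1 information, 2 warning,
# 3 average, 4 high, 5 disaster.
HIGH = 4


def build_problem_query(min_severity: int, request_id: int = 1) -> dict:
    """JSON-RPC 2.0 body for problem.get: unacknowledged problems at or above min_severity."""
    return {
        "jsonrpc": "2.0",
        "method": "problem.get",
        "params": {
            "output": "extend",
            "acknowledged": False,
            "severities": list(range(min_severity, 6)),
            "sortfield": ["eventid"],
            "sortorder": "DESC",
        },
        "id": request_id,
    }


def call_zabbix(api_url: str, token: str, payload: dict) -> list:
    """POST the request and return its "result" list (raises on a JSON-RPC error)."""
    req = urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]


# Hypothetical usage (needs a reachable Zabbix server and a valid API token):
# problems = call_zabbix("https://zabbix.example.com/api_jsonrpc.php", "<token>",
#                        build_problem_query(HIGH))
```

Keeping payload construction separate from the HTTP call makes it easy to inspect or log exactly what the model asked for before anything is sent.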

In my own environment, I set up a WebMCP server that connects a FastAPI backend to the Zabbix API, exposing structured endpoints for hosts, alerts, and problems. The server runs 24/7 alongside my Zabbix instance on a dedicated Proxmox node.

With a simple query to the WebMCP server, I can retrieve the full list of monitored hosts, check active problems, view recent alerts with their severity levels, and get a usage summary – all through clean, structured JSON responses that any AI client can consume.

The WebMCP server exposes structured endpoints for health monitoring, usage tracking, and Zabbix data.
A live query to the WebMCP server returning real Zabbix alerts in structured JSON.
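The "clean, structured JSON" step can be sketched as a small normalization function. This is a hypothetical helper, not the author's code: it assumes records shaped like Zabbix `problem.get` output with `output: extend` (string-valued `eventid`, `name`, `severity`, `clock`, and `acknowledged`), and it maps numeric severities to names so an AI client does not need to know Zabbix's scale.

```python
from collections import Counter
from datetime import datetime, timezone

SEVERITY_NAMES = {
    "0": "not_classified",
    "1": "information",
    "2": "warning",
    "3": "average",
    "4": "high",
    "5": "disaster",
}


def normalize_problems(raw: list) -> dict:
    """Reduce raw problem.get records to a compact, model-friendly summary."""
    problems = [
        {
            "event_id": p["eventid"],
            "name": p["name"],
            "severity": SEVERITY_NAMES.get(str(p["severity"]), "unknown"),
            # Zabbix reports `clock` as Unix seconds; emit ISO 8601 in UTC.
            "since": datetime.fromtimestamp(int(p["clock"]), tz=timezone.utc).isoformat(),
            "acknowledged": str(p.get("acknowledged", "0")) == "1",
        }
        for p in raw
    ]
    counts = Counter(p["severity"] for p in problems)
    return {"total": len(problems), "by_severity": dict(counts), "problems": problems}
```

A backend endpoint would then return `normalize_problems(result)` serialized as JSON, giving every AI client the same shape regardless of which model consumes it.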

Phase 3: WebMCP becomes the interface

Looking ahead, WebMCP is a proposed browser standard (co-created by engineers at Google and Microsoft) that lets websites declare their capabilities as structured tools that AI agents can call directly in the browser.

Think about what this means for Zabbix. Today, the Zabbix frontend is a web application that humans navigate: click on hosts, drill into triggers, check graphs, acknowledge problems. An AI agent trying to use the Zabbix frontend would have to take screenshots, interpret the UI, and guess where to click, which is slow, fragile, and expensive.

With WebMCP, the Zabbix frontend could declare: “Here is a tool called get_active_problems. It needs a severity filter. Call it and I will return structured results.” The AI agent calls the function, gets clean data, and acts on it. No screenshots, no DOM scraping, no guessing.

The key differences from traditional MCP:

  • WebMCP runs inside the browser tab, not on a separate server. No additional infrastructure to deploy.
  • It inherits the user’s existing session: the same SSO, the same cookies, the same role-based permissions. No separate auth layer.
  • Tools are contextual: on a problems page, the agent sees problem-related tools. On a host configuration page, it sees host tools.
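To show the idea of tools declared with a schema and exposed per page, here is a Python sketch of a contextual tool registry. It is not the WebMCP browser API, which is JavaScript and still experimental. `Tool`, `ToolRegistry`, the page names, and the `get_active_problems` handler are hypothetical, and the in-memory `PROBLEMS` list stands in for data the frontend would already have loaded for the user's session.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict            # JSON-Schema-style description of the arguments
    handler: Callable[..., Any]


@dataclass
class ToolRegistry:
    by_page: dict = field(default_factory=dict)  # page name -> {tool name: Tool}

    def register(self, page: str, tool: Tool) -> None:
        self.by_page.setdefault(page, {})[tool.name] = tool

    def list_tools(self, page: str) -> list:
        """What an agent on this page would discover: names, descriptions, schemas."""
        return [
            {"name": t.name, "description": t.description, "inputSchema": t.input_schema}
            for t in self.by_page.get(page, {}).values()
        ]

    def call(self, page: str, name: str, **args: Any) -> Any:
        """Invoke a tool only if it is declared on this page and required args are present."""
        tools = self.by_page.get(page, {})
        if name not in tools:
            raise KeyError(f"tool {name!r} is not available on page {page!r}")
        required = tools[name].input_schema.get("required", [])
        missing = [a for a in required if a not in args]
        if missing:
            raise ValueError(f"missing required arguments: {missing}")
        return tools[name].handler(**args)


# Hypothetical data the problems page already holds for the logged-in user.
PROBLEMS = [{"name": "Disk full on db-01", "severity": 5}, {"name": "Ping loss", "severity": 2}]

registry = ToolRegistry()
registry.register("problems", Tool(
    name="get_active_problems",
    description="Return active problems at or above a minimum severity (0-5).",
    input_schema={
        "type": "object",
        "properties": {"min_severity": {"type": "integer", "minimum": 0, "maximum": 5}},
        "required": ["min_severity"],
    },
    handler=lambda min_severity: [p for p in PROBLEMS if p["severity"] >= min_severity],
))
```

On the `problems` page, `registry.call("problems", "get_active_problems", min_severity=4)` returns only the disk-full problem. On a `hosts` page, `list_tools` returns an empty list because no tool is registered there, which mirrors the contextual behavior described above.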

Chrome 146 already ships WebMCP experimentally, and a broader stable release in Chrome is expected by the end of 2026.

To explore this concept in practice, I set up a WebMCP server in my environment, connected to my Zabbix instance.

The server exposes Zabbix data through a browser-based interface, allowing agents to query hosts, alerts, and problems directly from the browser tab.

The server itself is monitored by Zabbix, so I can track its resource consumption and ensure it does not impact the rest of the infrastructure, closing the loop between the tool and the platform it extends.

A WebMCP demo page displaying live Zabbix alerts fetched through the browser-based backend.

Why this matters for mobile monitoring

Today, if you want AI-assisted Zabbix monitoring on your phone, you need a dedicated app that connects to the Zabbix API, handles authentication, processes data, and presents it through an AI layer. That is what I built. It works, but it requires significant development effort.

WebMCP opens a different path. Imagine opening the Zabbix frontend in your mobile browser and having an AI assistant that can interact with it natively – no app required, no separate server, just the browser and the protocol. The assistant inherits your Zabbix session, sees only what your user role permits, and can help you triage incidents, assign tasks, and generate reports, all through the same web interface you already use.

We are not there yet. WebMCP is still in early preview, and the Zabbix frontend needs to implement the protocol. But the architectural direction is clear. The web is becoming agent-ready, and monitoring tools will benefit enormously from this shift.

The practical roadmap

If you work with Zabbix and want to start integrating AI today, here is how I see the progression:

  • Right now: Use MCP servers to connect AI assistants to the Zabbix API. The open-source options are mature, support Zabbix 7.x (and experimentally 8.0), and work with multiple AI clients. Start with read-only mode to explore safely.
  • Near term: Build purpose-specific integrations. Whether it is a mobile app, a chatbot, or a custom dashboard, the Zabbix API combined with models like Gemini or Claude can deliver real value: AI-generated weekly reports, intelligent alert triage, and natural language infrastructure queries.
  • Coming soon: Keep an eye on WebMCP. As it matures and browsers ship stable support, it will become the lowest-friction way to add AI capabilities to any web-based monitoring tool. The sites that become agent-ready first will have a compounding advantage.
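For the "start with read-only mode" advice, here is one simple way an integration might enforce it, sketched under assumptions. Zabbix API method names follow an `object.action` pattern (for example `problem.get` or `maintenance.create`), so a hypothetical guard can allow only read methods before forwarding a model-generated request. Real MCP servers may implement their read-only switch differently, so check the one you use.

```python
READ_ONLY_SUFFIXES = (".get",)
READ_ONLY_EXTRA = {"apiinfo.version"}  # read-only call that does not end in .get


class ReadOnlyViolation(PermissionError):
    """Raised when a request would modify Zabbix while read-only mode is on."""


def check_read_only(payload: dict, read_only: bool = True) -> dict:
    """Reject any JSON-RPC request whose method could change Zabbix state."""
    method = payload.get("method", "")
    if read_only and not (method.endswith(READ_ONLY_SUFFIXES) or method in READ_ONLY_EXTRA):
        raise ReadOnlyViolation(f"{method!r} is blocked in read-only mode")
    return payload
```

With this guard in place, a request like the db-01 maintenance window (`maintenance.create`) is rejected until read-only mode is deliberately switched off, while queries such as `problem.get` pass through unchanged.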

Closing thoughts

The infrastructure monitoring world is at an inflection point. We have been watching dashboards and reading alerts for decades. Now the protocols that will let our infrastructure genuinely talk back to us are emerging: MCP for backend integrations and WebMCP for browser-native interactions.

If you are still running a Zabbix version older than 7.0, this is the year to migrate. Older versions are losing support, and the newer API capabilities in 7.0+ are what make these AI integrations possible. Zabbix offers certification programs through Zabbix Academy, and their partner network can assist with migrations.

The post Exploring AI Integration in Zabbix with Gemini and WebMCP appeared first on Zabbix Blog.

Defend against frontier cyber models: Cloudflare’s architecture as customer zero

Post Syndicated from Rohit Chenna Reddy original https://blog.cloudflare.com/frontier-model-defense/

A few weeks ago, we wrote about Project Glasswing and what we observed when we pointed cyber frontier models at our own code. Since then, we’ve seen that the part of the post that has resonated most deeply is the argument that the architecture around the vulnerability matters more than the speed of the patch.

In the conversations we’ve had with CISOs and security teams since, the questions have been consistent: what does our architecture actually look like, what should we monitor for, where do we start, and how can Cloudflare help?

Before getting into the details: the architecture below is built almost entirely from Cloudflare’s own products, because Cloudflare security is customer zero for the security products we build. The Cloudflare stack already exists in front of our code, employees, and customer-facing applications. If you’re a Cloudflare customer, every layer below is available to you today. If you’re not, the principles still apply to whatever stack you’ve built.

What a cyber frontier model actually changes

In the previous post, we showed how a cyber frontier model like Mythos changes the attacker’s timeline. It can find vulnerabilities, reason through exploit chains, and generate working proofs faster than earlier models. Models like Mythos do not change the shape of an intrusion (reconnaissance, initial access, lateral movement, persistence, and exfiltration still have to happen); the difference is in the speed and scale. When pointed at the open web, a model can find and hit low-hanging fruit quickly. Against a hardened target, it still has to probe and adapt, and it often produces more noise than a careful human operator would.

Discovery, exploit chain construction, and proof-of-concept generation used to be the gating constraints on producing a working attack. A frontier model handles all three in a fraction of the time. Work that used to be slow and methodical is now fast and indiscriminate.

While AI is accelerating how fast developer teams at Cloudflare and many other companies can ship code, the security team’s work has not compressed the same way. An attacker only needs one opening to get in, while security teams need to find and close them all. Writing a fix, testing it for regressions, and shipping it without breaking the code around it has constraints that AI doesn’t remove. We learned this the hard way when we let an AI coding assistant write its own patches against our own bugs, as we described at the end of the previous post. Some of those patches fixed the original bug while quietly breaking something else the code depended on.

As these models become more competent and capable, our main focus from a threat standpoint comes down to three things. Each one shapes the architecture we walk through in the rest of this post.

  • The first is the speed of discovery. Frontier models make it easier to search large bodies of public code, including the open-source libraries that many companies depend on. That does not mean every bug in a library is exploitable, or that library bugs are where most vulnerabilities live. Exploitability still depends on how the code is used, whether attacker-controlled input can reach the vulnerable path, and the protections that sit around it. But widely used open-source libraries and frameworks give attackers a shared surface to study at scale. When a real, reachable vulnerability exists there, a model can help find it, reason about possible exploit paths, and generate proof-of-concept variants faster than maintainers and defenders can review every downstream use. The gap between when an attacker discovers a vulnerability and when defenders learn it exists is what worries us most. If you are not running these models against your own code, it is safe to assume someone else is.

  • The second is exploit volume and adaptation. A model can produce thousands of variations of a single exploit and run reconnaissance at the same scale. All that volume gives an attacker an advantage, but it won’t necessarily get them past signature-based detections. Many of those iterations will have the same underlying signature, so a rule that catches the first one will catch the rest. Adaptation is how they will get past signature-based detections. Ask a model to show you a SQL injection, and it will return a textbook example. Tell it there is a WAF in the way, and it will start probing, learning what gets blocked, and rewriting the payload until it can slip past the rule blocking it.

  • The third is the impact when a vulnerability is inevitably exploited. No architecture catches everything. After the vulnerability is exploited, the question we ask ourselves is: where can the attacker get to with one identity, one path, or one credential, before something else stops them? If the answer is “anywhere they want,” the vulnerability was never the problem. The architecture around the vulnerability was.
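A toy Python illustration of the second point, not Cloudflare's detection logic: one case-insensitive, whitespace-tolerant regular expression catches many mechanical variants of a UNION-based SQL injection, but a payload rewritten to use an inline SQL comment (`/**/`) in place of whitespace slips past it. The rule and payloads are hypothetical.

```python
import re

# One naive signature: UNION, whitespace, SELECT, in any letter case.
RULE = re.compile(r"union\s+select", re.IGNORECASE)

# "Volume": surface variations of the same underlying payload.
variants = [
    "1 UNION SELECT password FROM users",
    "1 union select password from users",
    "1 UnIoN   SeLeCt password FROM users",
    "1 UNION\tSELECT password FROM users",
]

# "Adaptation": same intent, rewritten after seeing the rule block it.
adapted = "1 UNION/**/SELECT password FROM users"

caught = [v for v in variants if RULE.search(v)]
print(len(caught), "of", len(variants), "variants blocked")  # prints: 4 of 4 variants blocked
print("adapted payload blocked:", bool(RULE.search(adapted)))  # prints: adapted payload blocked: False
```

Patching the rule to also accept comments only restarts the loop: the model probes again and finds the next gap, which is why the defenses described below lean on scoring and positive validation rather than ever-longer signature lists.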

Cloudflare’s superpower: visibility

We see roughly a fifth of the world’s web traffic and that traffic tells us, in real time, which payloads are mutating, which patterns are picking up, and where attacker tooling is moving next. Two teams turn that visibility into defense.

First is Cloudforce One, our threat intelligence, research, and operations team, which sits within the Cloudflare security organization. They turn what we see across the network into insights the rest of the stack can act on: tracked adversaries, emerging campaigns, and indicators of compromise (IOCs). The hard part of this work was never knowing what is malicious — it was the delay in mitigation. Knowledge of a new threat normally has to travel from a threat report, into a feed, and then into a company’s defense before it can be used to block anything. Attackers have learned to move faster than that. Our network closes that gap: Cloudflare customers can now use Cloudforce One threat intelligence directly within the WAF to block high-risk traffic.

Second is the team that owns the WAF engine that does the actual detecting: the managed rulesets that run in front of our own properties and are available to every Cloudflare customer, the machine learning behind WAF Attack Score, and the relationships that sometimes let us ship a rule before a CVE is publicly disclosed. The team is globally distributed and moves fast, releasing rules within hours of an attack’s proof-of-concept becoming known. Once a detection is deployed, it reaches our entire network, along with every Cloudflare customer, in under 30 seconds. React2Shell is a recent example: a managed WAF rule was protecting our own properties, and everyone else’s on Cloudflare, hours before the official advisory was published.

The scoring layer, the defenses we put in front of the application, and the containment around the vulnerability all build on what these two teams see. 

Scores over signatures

Signature-based defenses were built for a world where novel exploits were scarce and variations took weeks. Cloudflare’s traditional SLA from a fresh proof-of-concept to a live, deployed rule has been 12 hours. With the advent of frontier models, this is not good enough anymore. Detections need to be in place before a CVE is discovered. This is why we layer ML-based detection in front of the traditional signature-based WAF.

The model is trained on a large body of past attack traffic, and it catches new variants of vulnerabilities before they’re publicly known. A novel SQL injection or remote code execution chain is almost always a rearrangement of attack shapes the model has seen before, even when the specific exploit is brand new. We run the model on every request and assign it a WAF Attack Score between 1 and 99 based on how closely the request resembles those underlying shapes, rather than on whether it matches a list of known-bad signatures. The lower the score, the more aggressively we treat the request. That score determines whether we let the request through. We apply a similar scoring methodology to AI prompts with AI Security for Apps: rather than check each prompt against a list of known malicious prompts, we score how closely a prompt resembles an actual attack.
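How a score, rather than a signature match, drives the decision can be sketched as a simple threshold policy. The bands below are assumptions modeled on the attack, likely-attack, likely-clean, and clean classes that Cloudflare documents for WAF attack score; verify the exact cut-offs in the current documentation and tune the actions to your own traffic. This is not how Cloudflare evaluates requests internally.

```python
def action_for_attack_score(score: int) -> str:
    """Map a 1-99 attack score to an action; lower scores are treated more aggressively."""
    if not 1 <= score <= 99:
        raise ValueError("attack score must be between 1 and 99")
    if score <= 20:      # assumed "attack" band
        return "block"
    if score <= 50:      # assumed "likely attack" band
        return "challenge"
    if score <= 80:      # assumed "likely clean" band
        return "log"
    return "allow"       # assumed "clean" band
```

Because the input is a similarity score rather than a pattern, a rewritten payload that still resembles known attack shapes lands in the same low band even if no existing signature matches it.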

The architecture around the vulnerability

Those capabilities only matter once they’re stacked in front of an application, and the first layer in our defense-in-depth approach is the WAF. Anything that matches a known-bad pattern gets dropped before it reaches the application, which clears the bulk of the obvious traffic and lets the more specialized layers below focus on what’s left.

On the API surface, we run a positive security model through API Shield. Instead of trying to anticipate every bad request, we describe what a valid request to each API looks like, either from the API’s own definition or learned from our real traffic, and anything that doesn’t fit doesn’t get through. This neutralizes the advantage of frontier AI models: because we only permit validated traffic, generating thousands of new attack variations fails to bypass the system.
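A minimal Python sketch of the positive security model: a hypothetical allowlist describes each valid operation (method, path, and expected field types), and anything that does not fit is rejected, however the payload was generated. API Shield itself works from API schemas or learned traffic; the `SCHEMA` format and `is_valid_request` helper here are illustrative only, and query values are assumed to be already parsed into typed values.

```python
from typing import Any

# Hypothetical description of what valid requests look like, per operation.
SCHEMA = {
    ("GET", "/api/orders"): {"query": {"page": int}},
    ("POST", "/api/orders"): {"body": {"item_id": int, "quantity": int}},
}


def fields_match(data: dict, spec: dict) -> bool:
    """Exactly the declared fields, each with exactly the declared type."""
    if set(data) != set(spec):
        return False
    return all(type(data[k]) is t for k, t in spec.items())


def is_valid_request(method: str, path: str, query: dict, body: Any) -> bool:
    """Allow only requests that match a declared operation; everything else is rejected."""
    spec = SCHEMA.get((method.upper(), path))
    if spec is None:
        return False                      # unknown endpoint: not allowed
    if not fields_match(query, spec.get("query", {})):
        return False
    if "body" in spec:
        return isinstance(body, dict) and fields_match(body, spec["body"])
    return body is None                   # operation declares no body
```

The check never asks whether a payload looks malicious. A string where an integer is expected, an extra field, or an undeclared path fails validation, so thousands of generated variants of an injection all fail in the same way.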


Cloudflare’s layered architecture

Bot Management catches probing traffic on our network before frontier models can build a map. It scores every request on how likely it is to be automated, using the same signals across our whole network: how the client behaves, whether it looks like a real browser, and whether the connection matches a known-bad pattern. An attack only lands if it can find a soft spot. 

Zero Trust Network Access is used for every internal application. The implicit trust of being inside the network is replaced with explicit per-request identity and policy for every employee accessing every tool. The value of this was clear when one of our engineers shipped a misconfigured tool. A flat network would have exposed everything on the same segment, but in our deployment, the exposure stopped at the tool itself. We built Require Access Protection afterwards so newly deployed or misconfigured applications can’t be reachable before an access policy is in place.
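The behavior described for Require Access Protection, where an application is unreachable until it has a policy, amounts to default-deny evaluation on every request. Here is a hypothetical Python sketch; the policy shape, application names, and group names are invented for illustration and are not Cloudflare Access's configuration model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identity:
    email: str
    groups: frozenset


# Hypothetical per-application policies: which identity-provider groups may reach each app.
POLICIES = {
    "wiki.internal": {"allowed_groups": {"employees"}},
    "billing-admin.internal": {"allowed_groups": {"finance"}},
}


def authorize(app: str, user: Optional[Identity]) -> bool:
    """Evaluate every request on its own; no identity or no policy means no access."""
    if user is None:
        return False                 # no authenticated identity on this request
    policy = POLICIES.get(app)
    if policy is None:
        return False                 # newly deployed or misconfigured app: deny by default
    return bool(policy["allowed_groups"] & user.groups)
```

Network location plays no part in the decision, so a misconfigured tool exposes nothing beyond itself, and a new tool with no policy is unreachable rather than open.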

IdP Federation makes that secure-by-default posture easier to keep consistent across every Cloudflare account, which becomes even more necessary when more people are shipping internal tools quickly. Instead of asking each team to wire up SSO separately, we configure our identity provider (IdP) once and share it across the organization. New accounts get SSO automatically, recipient-side IdP connections are read-only, and Access policies in each account still evaluate the resulting identity as part of the normal request flow.

MCP Server Portal gives teams a controlled way to connect AI agents to enterprise systems. Agents access MCP servers that are centrally managed through a single portal, with every action logged. That way when an agent acts on someone’s behalf, we know what it did, what it touched, and whether it should have been allowed to. The full picture of how we built it is in our post on enterprise MCP.
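A central portal that only forwards agent tool calls to approved servers, and records every attempt, can be sketched as follows. The ToolPortal class, server names, and log fields are hypothetical; the sketch does not implement the MCP protocol or Cloudflare's MCP Server Portal. It only shows the allowlist-plus-audit-log pattern described above.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolPortal:
    """Hypothetical central gate for agent tool calls: allowlisted servers, every call logged."""

    approved_servers: set[str]
    audit_log: list[dict[str, Any]] = field(default_factory=list)

    def authorize_call(self, agent: str, on_behalf_of: str, server: str,
                       tool: str, args: dict[str, Any]) -> bool:
        allowed = server in self.approved_servers
        # Log denied attempts too: they show what an agent tried to touch.
        self.audit_log.append({
            "ts": time.time(),
            "agent": agent,
            "on_behalf_of": on_behalf_of,
            "server": server,
            "tool": tool,
            "args": args,
            "allowed": allowed,
        })
        return allowed


if __name__ == "__main__":
    portal = ToolPortal(approved_servers={"tickets.mcp.internal.example"})
    portal.authorize_call("triage-bot", "alice@example.com",
                          "tickets.mcp.internal.example", "search", {"q": "login errors"})
    portal.authorize_call("triage-bot", "alice@example.com",
                          "unknown.mcp.example", "export_all", {})
    for entry in portal.audit_log:
        print(json.dumps(entry))
```

Recording the human the agent acted for alongside the agent itself is what lets the log answer "whether it should have been allowed to" after the fact.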

AI Gateway runs in front of our internal AI tools the same way AI Security for Apps runs in front of customer-facing AI features, with the same scoring and the same visibility. Inside the company, the visibility piece is more useful than the blocking, because we needed to see what engineers were actually building before we could write meaningful policy on it.

Where your teams can start 

Frontier models can help attackers find vulnerabilities, adapt payloads, and move faster, but they still have to pass through the layered defense you deploy in front of your application. That is where teams should start:

  • Put inspection in front of public applications.

  • Define what valid API traffic looks like.

  • Use bot detection to limit automated probing.

  • Require identity and access policy before any internal tool is reachable.

For AI and agentic systems:

  • Route model traffic through a gateway.

  • Keep agents connected through approved MCP servers.

  • Log what they do. 

The goal is to make sure that when one layer misses, the next layer limits what the attacker can see, reach, or change.

That is the point of the architecture around the vulnerability: to limit the scope of an attack. The vulnerability may be what starts the attack, but the architecture determines how far it can go.

How do we know this approach works?

Plenty of security stacks look impenetrable on a whiteboard but fall over in practice. That is why we test ours continuously, both at the perimeter and inside our environment, with our red team involved across both.

At the perimeter, frontier models are one tool we use to probe our application security stack the way an adaptive attacker would. These models sit alongside the rest of our red team and detection workflows, including manual testing, threat intelligence, observed traffic patterns, proof-of-concept analysis, and signals from our own network. Together, those inputs help us decide where to aim testing: newly launched products, recently changed surfaces, and the paths an attacker is most likely to probe first. The most important part is the process that follows. When something gets through, we identify the gap, use the right mix of tools to understand it, write the rule or mitigation, ship the update, and test again to make sure the gap is closed.

Inside the environment, our red team starts from the assumption that the perimeter has already failed. They look at what has changed, where sensitive systems carry risk, and whether one compromised identity, path, or credential can reach farther than it should. When we change the architecture based on what they find, they run the scenario again against the new version to confirm the gap is actually closed.

We confirm that this architecture is working by continuously testing its behavior during failures, rather than relying on the perfection of individual layers.

If your team is working on the same problems and would like to compare notes, reach out to us at [email protected].

AWS Weekly Roundup: BYOM for Amazon RDS for SQL Server, AWS IoT Device SDK for Swift, and more (June 8, 2026)

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-byom-for-amazon-rds-for-sql-server-aws-iot-device-sdk-for-swift-and-more-june-8-2026/

This week, the AWS IoT Device SDK for Swift reached general availability. As a member of the Swift Server Workgroup (SSWG), I was especially interested in this one. The SDK brings production-ready MQTT 5 connectivity, Device Shadow, Jobs, and fleet provisioning to Swift developers on macOS, iOS, tvOS, and Linux.

Swift on IoT and edge devices, an AI-generated illustration

I’m curious to see what you will build with it. Swift on the server has matured over the past few years, and now it reaches IoT devices too. This connects to a broader trend of running Swift at the edge. WendyOS, for example, is an open-source operating system for physical AI that offers first-class Swift support for deploying apps to NVIDIA Jetson and Raspberry Pi hardware. Between server-side Swift, IoT, and edge computing, the language is showing up in places that would have surprised most people a few years ago.

Now, let’s get into this week’s AWS news.

Headlines
Amazon RDS for SQL Server supports Bring Your Own Media — Customers who migrate SQL Server applications from on-premises environments can now reuse their existing Microsoft SQL Server licenses, including Software Assurance, through Microsoft’s License Mobility program on Amazon RDS. BYOM is integrated with AWS License Manager for tracking license usage and compliance. Read more.

Amazon Cognito now supports multi-Region replication — You can now synchronize user and machine identity data, including credentials, user pool configurations, and federation setups, to a secondary user pool in a standby Region in near real-time. In the event of a disruption in the primary Region, signed-in users continue accessing their applications without re-authenticating, and registered users can sign in with their existing credentials. Multi-Region replication is available as an add-on for user pools in Essentials or Plus feature tiers across 16 Regions. Read more.

GPT-5.5, GPT-5.4, and Codex from OpenAI are now generally available on Amazon Bedrock — You can now use GPT-5.5 and GPT-5.4 in production workloads on Amazon Bedrock and build with Codex for AI-powered software development, with the same security, governance, and operational controls you already use across AWS. GPT-5.5 is the most capable model from OpenAI, excelling at agentic coding, data analysis, and multi-step autonomous tasks. Codex is available through the Codex App, the Codex CLI, and IDE integrations with Visual Studio Code, JetBrains, and Xcode. Pricing matches OpenAI first-party rates, and usage counts toward existing AWS commitments. Read more.

Last week’s launches
Here are some launches and updates from this past week that caught my attention:

For a full list of AWS announcements, be sure to keep an eye on the What’s New with AWS page.

Upcoming AWS events
Learn more about AWS, browse and join upcoming AWS-led in-person and virtual events, startup events, and developer-focused events as well as AWS Summits and AWS Community Days. Join the AWS Builder Center to connect with builders, share solutions, and access content that supports your development.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— seb

ICYMI: May 2026 @AWS Security

Post Syndicated from Rodolfo Brenes original https://aws.amazon.com/blogs/security/icymi-may-2026-aws-security/

Read all about the latest AWS security features, compliance updates, and hands-on resources in our new, monthly digest posts. You’ll find expert blog posts, new service capabilities, code samples, and workshops.

AWS Security Blog posts

This month’s AWS Security Blog posts covered AI security, network protection, identity management, compliance frameworks, and supply chain security. Read on for practical guidance on securing agentic AI workflows, filtering network traffic by category, defending against supply chain attacks, and more.

AI Security

Security posture improvement in the AI era
Author: Celeste Bishop | Published: May 1, 2026
Learn to use the Security Health Improvement Program (SHIP) to strengthen security fundamentals across 10 core use cases for confident AI adoption.

Enabling AI sovereignty on AWS
Author: Stéphane Israël | Published: May 12, 2026
Learn how AWS delivers control and choice across the AI stack to help customers meet digital and AI sovereignty requirements.

The AWS AI Security Framework: Securing AI with the right controls, at the right layers, at the right phases
Authors: Riggs Goodman III, Christopher Rae | Published: May 15, 2026
A structured framework that helps security leaders align the right security controls to the right AI use case, at the right layer, at the right deployment phase.

Why Policy in Amazon Bedrock AgentCore chose Cedar for securing agentic workflows
Authors: Liana Hadarean, Jean-Baptiste Tristan | Published: May 20, 2026
Learn how Cedar’s deterministic authorization, automated reasoning, and formal verification capabilities secure agentic AI tool invocations through Amazon Bedrock AgentCore Gateway.

Infrastructure security

Securing open proxies in your AWS environment
Author: Dodd Mitchell | Published: May 4, 2026
Learn to identify and secure open proxies in your AWS environment to prevent abuse, protect your IP reputation, and control costs.

Introducing AI traffic analysis dashboards for AWS WAF
Authors: Christopher Jen, Eitav Arditti, Kaustubh Phatak | Published: May 5, 2026
A new dashboard providing visibility into AI bot and agent activity including bot identification, intent classification, and access pattern analysis.

Simplifying policy management with URL and Domain Category filtering on AWS Network Firewall
Authors: Lawton Pittenger, Sofía Aluma-Santos, Eric Fortenbery, Mostafa Elkhouly | Published: May 28, 2026
Learn to use AWS Network Firewall’s URL and domain category filtering to control access to website categories like AI services, manage exceptions for approved domains, and monitor traffic patterns with Amazon CloudWatch Logs Insights.

Why and how to migrate to a Transit Gateway-attached AWS Network Firewall
Authors: Frank Phillis, Lawton Pittenger | Published: May 28, 2026
Learn to migrate your centralized AWS Network Firewall deployment to an AWS Transit Gateway-attached model, eliminating the inspection Amazon VPC and enabling flexible cost allocation.

    Identity

    Regional routing for AWS access portals: Implementing custom vanity domains for IAM Identity Center
    Authors: Georgi Baghdasaryan, Laura Reith, Sowjanya Rajavaram | Published: May 14, 2026
    Learn to build a custom vanity domain with latency-based routing and automated failover for IAM Identity Center multi-Region access portals.

    Automating identity lifecycle and security with AWS Directory Service APIs
    Authors: Ali Alzand, Kevin Sookhan | Published: May 21, 2026
    Learn to use the new AWS Directory Service Data APIs with Amazon GuardDuty and AWS Step Functions to automate identity lifecycle management and respond to security threats.

    Governance and compliance

    Announcing the ISO 31000:2018 Risk Management on AWS compliance guide
    Authors: Jesse McMahan, Akanksha Chaturvedi, Mayur Jadhav, Juan Rodriguez, Sana Rahman | Published: May 1, 2026
    A compliance guide providing practical guidance for establishing a risk management program using ISO 31000:2018 principles in AWS environments.

    New compliance guide available: ISO/IEC 42001:2023 on AWS
    Authors: Abdul Javid, Amber Welch, Muhammad Sharief, Jonathan Jenkyn, Satish Uppalapati | Published: May 6, 2026
    A compliance guide providing practical guidance for designing and operating an Artificial Intelligence Management System (AIMS) using AWS services.

    Introducing the updated AWS User Guide to Governance, Risk, and Compliance for Responsible AI Adoption
    Authors: Krish De, Stephen James Martin, Brenda Fong, Kelvin Leung | Published: May 13, 2026
    An updated guide providing FSI customers practical considerations for responsible AI adoption across governance, risk management, compliance, data management, and AI agent management.

    Governing infrastructure as code using pattern-based policy as code
    Authors: Guptaji Teegela, Paul Keastead | Published: May 19, 2026
    Learn to use Open Policy Agent (OPA) in CI/CD pipelines to validate AWS infrastructure changes before deployment using recurring control patterns.

    Import historical data from AWS CloudTrail Lake to Amazon CloudWatch
    Authors: Isaiah Salinas, Erik Weber | Published: May 6, 2026
    Learn to import historical data from AWS CloudTrail Lake into Amazon CloudWatch for centralized log analysis.

    Data protection

    Automating post-quantum cryptography readiness using AWS Config
    Author: Pravin Nair | Published: May 14, 2026
    Learn to use the PQC Readiness Scanner to inventory your ALB, NLB, and Amazon API Gateway endpoints and continuously monitor their TLS configurations for post-quantum cryptography readiness.

    Threat detection and response

    Detecting and preventing crypto mining in your AWS environment
    Authors: Jason Palmer, Nadia Mahmood | Published: May 13, 2026
    Learn to use Amazon GuardDuty to identify and mitigate cryptocurrency mining threats in your AWS environment with a multi-layered defense strategy.

    Well-architected best practices for software supply chain security
    Authors: Trevor Schiavone, Desiree Brunner | Published: May 26, 2026
    Learn to apply AWS Well-Architected Framework security best practices to defend against software supply chain attacks like Shai-Hulud using temporary credentials, centralized dependency management, artifact signing, and continuous scanning.

    AWS Security Hub Extended: Why enterprise security products should sell themselves
    Author: Michael Fuller | Published: May 20, 2026
    A thought leadership piece on how AWS Security Hub Extended enables frictionless, pay-as-you-go adoption of curated partner security solutions alongside AWS-native services.

    Application Security

    Five ways to use Kiro and Amazon Q to strengthen your security posture
    Author: Roger Nem | Published: May 5, 2026
    Learn to use Kiro and Amazon Q Developer for security finding triage, infrastructure remediation, security reviews, and service control policies (SCP) development.

    AWS Security Agent full repository code scanning feature now available in preview
    Authors: Ayush Singh, Daniele Bonadiman | Published: May 12, 2026
    Learn to use AWS Security Agent’s full repository code review to perform deep, context-aware security analysis of your entire code base.

    Training and enablement

    Complimentary virtual training: Get hands-on with AWS Security services
    Author: Ashley Nelson | Published: May 11, 2026
    Security Activation Days are free 3–6 hour virtual workshops providing hands-on practice with AWS security services guided by specialists.

    May Security Bulletins

    Investigations of reported security vulnerabilities affecting Amazon and AWS services, software, and products.

    AWS Samples

    This month brings 8 new AWS samples spanning application security, data protection, infrastructure security, governance, and AI security. From AI-powered security agents on Amazon Bedrock AgentCore to centralized AWS Config monitoring at scale, these repositories help you implement security best practices across your AWS environment.

    Application Security

    Schedule AWS Security Agent penetration test
    Learn to deploy an AWS CloudFormation template that uses Amazon EventBridge and AWS Step Functions to schedule recurring AWS Security Agent penetration tests with Amazon Simple Notification Service (SNS) notifications on completion.

    Security review assistant
    Learn to deploy a multi-agent system on Amazon Bedrock AgentCore that automates Deliverable Security Reviews by combining architecture analysis, IaC code review, ASH vulnerability scanning, and compliance assessment into a single pipeline.

    AWS Security Agent Recorder
    Learn to use a cross-browser extension that records the unique domains your web app contacts and auto-fills them into the AWS Security Agent penetration test configuration.

    Data Protection

    KMS access audit
    Learn to resolve and report who can use your AWS Key Management Service (KMS) keys across IAM policies, key policies, and grants, with IAM Identity Center resolution to identify the humans behind SSO roles.

    Infrastructure security

    Building a conversational AI agent for AWS WAF analysis with AgentCore
    Learn to deploy an AI-powered agent using Amazon Bedrock AgentCore and Strands SDK that investigates AWS WAF security incidents, detects bypasses, and generates security reports through natural language.

    Governance

    Centralized AWS Config CI monitoring with Amazon CloudWatch
    Learn to centrally monitor AWS Config Configuration Item recording across all accounts in an AWS Organization using CloudWatch Cross-Account Observability, with dashboards showing top resource types, per-account volume, and conformance pack compliance.

    CloudFormation Guard security analyzer
    Learn to deploy an AI agent powered by Amazon Bedrock AgentCore that scans CloudFormation resource documentation, identifies security-critical properties with risk levels, and generates ready-to-use cfn-guard 3.x rules for your CI/CD pipeline.

    AI Security

    Guarded user-controlled attested runtime deployment (Guardian Platform)
    Learn to deploy LLM models securely in consumer AWS accounts while protecting model weights using AWS Nitro TPM attestation, KMS envelope encryption, and Zero Operator Access with immutable AMIs.

    AWS Labs

    This month brings 1 new AWS Labs repository focused on governance, helping research institutions deploy secure, tagged infrastructure with self-service access and multi-account controls.

    ResearchStack on AWS
    Learn to deploy research computing infrastructure on AWS in minutes — Amazon EC2, S3, EFS, Amazon SageMaker AI, and ParallelCluster — with built-in security, cost tracking, and governance using CloudFormation templates and optional AWS Service Catalog.

    Conclusion

    May 2026 shows AI security maturing from model-level controls to full-stack protection of agentic workflows. The posts and samples provide patterns for policy-based authorization with Cedar, network traffic filtering by category, and cross-account compliance monitoring. The security bulletins address vulnerabilities in SDKs, drivers, and developer tooling. Each resource includes deployment steps or runnable code so you can validate it in your own environment before adopting it. Subscribe to the AWS Security Blog RSS feed to receive updates as they are published, and revisit this digest monthly for a consolidated view of what changed and what to act on.

    If you have feedback about this post, submit comments in the Comments section below.


    Rodolfo Brenes

    Rodolfo is a Principal Solutions Architect focused on Cloud Governance and Compliance. With over 18 years of experience, he currently leads a technical field community in AWS helping customers scale and improve their security and governance frameworks. Besides work, Rodolfo enjoys video games, playing with his four cats, and won’t say no to a good outdoor adventure.

    Anna Brinkmann

    Anna has 18 years of experience in the technical content space and has spent the last 6 years managing the AWS Security Blog. Outside of work, she enjoys spending time with her family.

    Unlock cost savings with incremental snapshot billing for Amazon Redshift Serverless and Amazon Redshift RG

    Post Syndicated from Nidhi Nayak original https://aws.amazon.com/blogs/big-data/unlock-cost-savings-with-incremental-snapshot-billing-for-amazon-redshift-serverless-and-amazon-redshift-rg/

    Amazon Redshift customers rely heavily on snapshots, which are point-in-time backups of their data, for disaster recovery, compliance retention, and data portability across AWS Regions. Amazon Redshift supports two types of snapshots: automated and manual. For provisioned clusters, automated snapshots are enabled by default and retained for up to 35 days; manual snapshots persist until you delete them. For serverless workgroups, Amazon Redshift automatically creates recovery points that are retained for 24 hours, and you can also create manual snapshots with a configurable retention period. For details on snapshot creation and backup storage pricing, refer to Amazon Redshift pricing.

    Starting June 8, 2026, Amazon Redshift is introducing an incremental snapshot billing model for Amazon Redshift Serverless and Amazon Redshift RG (provisioned instances powered by AWS Graviton). With this enhancement, you pay only for the unique data blocks across your active manual snapshots within your account. This delivers significant cost savings for customers who have multiple snapshots that contain largely identical data blocks.

    In this post, you will learn how the new incremental snapshot billing model works, the customer use cases it addresses, and how it helps you optimize costs while improving your Recovery Point Objective (RPO).

    Incremental snapshot billing

    With this new billing model, Amazon Redshift bills manual snapshots based on unique data blocks. When you take multiple manual snapshots of the same workgroup or cluster, much of the data remains unchanged between snapshots. The billing model recognizes this overlap and charges only for the unique data blocks across your active snapshots. Data that has not changed between snapshots is counted once.

    Consider a 10 TB data warehouse with three manual snapshots:

    • Snapshot 1 (Day 1): Full backup, 10 TB of unique data blocks
    • Snapshot 2 (seconds later): Nothing changed, shares data blocks with Snapshot 1, no additional charge
    • Snapshot 3 (two days later): 1 TB of new unique data blocks created from changes
    • Total billed: 11 TB of unique data blocks

    In this example, you pay for the 10 TB of unique data blocks in Snapshot 1 plus the 1 TB of new unique data blocks in Snapshot 3. Snapshot 2 shares all of its blocks with Snapshot 1, so it adds no cost. The total billed is therefore 11 TB of unique data blocks.
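The deduplication rule can be modeled by treating each snapshot as a set of block identifiers and billing the size of their union. This Python sketch reproduces the three-snapshot example above; the one-block-per-TB granularity and the block names are simplifications for illustration, since real Amazon Redshift blocks are far smaller.

```python
def billed_tb(snapshots: dict[str, set[str]], block_tb: float) -> float:
    """Bill each distinct block once, no matter how many snapshots reference it."""
    unique_blocks: set[str] = set().union(*snapshots.values())
    return len(unique_blocks) * block_tb


def full_copy_tb(snapshots: dict[str, set[str]], block_tb: float) -> float:
    """What the same snapshots would occupy if each were stored as a full copy."""
    return sum(len(blocks) for blocks in snapshots.values()) * block_tb


if __name__ == "__main__":
    BLOCK_TB = 1.0  # toy granularity: one "block" per TB
    day1 = {f"block-{i}" for i in range(10)}          # Snapshot 1: 10 TB
    snap2 = set(day1)                                  # Snapshot 2: nothing changed
    snap3 = (day1 - {"block-0"}) | {"block-0-v2"}      # Snapshot 3: 1 TB rewritten as new blocks
    snaps = {"snapshot-1": day1, "snapshot-2": snap2, "snapshot-3": snap3}
    print(billed_tb(snaps, BLOCK_TB))     # 11.0
    print(full_copy_tb(snaps, BLOCK_TB))  # 30.0
```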

    Key billing model details

    With the new incremental snapshot billing model, you are charged only for the unique data blocks at the existing snapshot rates. Following are the key details of the new feature:

    • Scope: Amazon Redshift Serverless and Amazon Redshift RG instances. Amazon Redshift RA3 instances retain the current tiered S3 billing.
    • Rate: Based on the existing snapshot pricing for your Region.
    • Deduplication level: Account-level for Amazon Redshift Serverless and RG.
    • Automated snapshots: Unchanged, and still available at no additional cost (up to 35 days for provisioned clusters, 24 hours for Serverless).
    • Existing snapshots: Automatically transition to the incremental snapshot billing model. No action required.

    This model is especially valuable for customers who need backup retention beyond the automated snapshot windows available at no additional cost. Serverless customers who need backups beyond 24 hours can now take manual snapshots knowing they pay only for unique data blocks, making extended retention more practical and affordable.

    Benefits

    With the incremental snapshot billing model, customers can adopt stronger data protection strategies at optimized costs:

    Compliance-driven long-term retention

    Regulated industries (financial services, healthcare, government, and life sciences) must often retain backups for 90 days to 5+ years. Since this billing model charges only for unique data blocks, retention policies become significantly more affordable as snapshots accumulate.

    How this feature helps: You can now maintain backup retention (90-day, 1-year, 7-year) on Amazon Redshift Serverless and RG at optimized cost. A 10 TB warehouse with a 5% daily change rate (0.5 TB per day) that retains 90 days of daily snapshots pays for about 54.5 TB of unique data blocks across all snapshots (10 TB + 89 × 0.5 TB), rather than the 900 TB that 90 full copies would occupy.

    Disaster recovery with better Recovery Point and Time Objectives (RPO/RTO)

    Many customers want more frequent snapshots (hourly instead of daily) for tighter recovery objectives. Because each additional snapshot is billed only for its new unique data blocks, frequent backups are practical and affordable.

    How this feature helps: You can take hourly snapshots where each one adds only ~0.2% in new unique data (assuming 5% daily change rate). More snapshots mean more recovery points and less data loss in a failure scenario, all at optimized cost.

    Cross-Region disaster recovery at lower cost

    Snapshots copied to another Region for disaster recovery are also billed based on unique data blocks. Organizations maintaining multi-Region disaster recovery (DR) strategies pay in proportion to actual data changes, making geographic redundancy affordable.

    How this feature helps: If you are running active-passive or active-active multi-Region architectures, you can copy snapshots across Regions more frequently, improving cross-Region RPO while keeping DR costs proportional to actual data changes rather than full dataset size.

    Affordable extended backups

    With the incremental snapshot billing model, extended manual backups are more affordable regardless of workload size. Even shorter retention policies (7-day, 14-day) cost in proportion to actual data changes, improving data protection posture across the board.

    How this feature helps: Customers no longer need to choose between data protection and budget. This billing model helps make extended retention cost effective for workloads of varying sizes.

    Pricing example

    For example, suppose you have an Amazon Redshift Serverless workgroup with 10 TB of active data in US East (Ohio). You take daily manual snapshots with 7-day retention, and your data changes at 5% per day (0.5 TB/day).

    Component                                       Calculation                                 Monthly cost
    Active data                                     10 TB × 1,024 GB/TB × $0.023 per GB-month   $235.52
    Unique snapshot blocks (after deduplication)    13 TB × 1,024 GB/TB × $0.023 per GB-month   $306.18
    Total                                                                                       $541.70

    Because blocks shared across snapshots are counted only once, you pay for 13 TB of unique snapshot data (the 10 TB in the first snapshot plus 6 × 0.5 TB of daily changes) rather than the 70 TB that seven full daily copies would occupy.
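The pricing example can be reproduced with a short calculation. The sketch assumes, as the example appears to, that the first snapshot holds all 10 TB and each later daily snapshot adds exactly one day's changes as new blocks; the $0.023 per GB-month rate is taken from the example and should be checked against current pricing for your Region. Running it reproduces the $235.52, $306.18, and $541.70 figures in the table.

```python
GB_PER_TB = 1024
RATE_USD_PER_GB_MONTH = 0.023  # rate used in the example; check current pricing for your Region


def unique_snapshot_tb(active_tb: float, daily_change: float, snapshots_retained: int) -> float:
    """First snapshot holds every block; each later daily snapshot adds only that day's changes.

    Assumes each day's changes are written as new blocks that no other retained snapshot holds.
    """
    if snapshots_retained < 1:
        return 0.0
    return active_tb + (snapshots_retained - 1) * active_tb * daily_change


def monthly_cost(tb: float) -> float:
    return tb * GB_PER_TB * RATE_USD_PER_GB_MONTH


if __name__ == "__main__":
    active = 10.0
    snaps_tb = unique_snapshot_tb(active, daily_change=0.05, snapshots_retained=7)
    print(f"unique snapshot data: {snaps_tb:.1f} TB (vs {7 * active:.0f} TB as full copies)")
    print(f"active data:     ${monthly_cost(active):.2f}")
    print(f"snapshot blocks: ${monthly_cost(snaps_tb):.2f}")
    print(f"total:           ${monthly_cost(active) + monthly_cost(snaps_tb):.2f}")
```

Changing `snapshots_retained` or `daily_change` shows how the bill scales with retention and churn rather than with the number of full copies.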

    Compounding savings on Amazon Redshift RG

    If you are evaluating migrating from RA3 to RG, the savings stack significantly. Some of the compounding savings on RG include:

    • RG instances are priced at a 30% discount compared to RA3 instances.
    • Reserved Instance (RI) pricing is available for RG, providing further compute savings.
    • Incremental snapshot billing removes duplicate charges for blocks shared across snapshots.
    • Data lake queries are included in RG compute pricing, thereby avoiding the per-terabyte scanning charges of Amazon Redshift Spectrum.

    Combined, these options can reduce costs on RG by more than 30% in aggregate compared with RA3. You can lock in RI pricing on RG clusters for predictable, long-term savings on top of the incremental snapshot benefit.

    Getting started

    No action is required on your end. Your existing manual snapshots automatically transition to the incremental snapshot billing model on June 8, 2026.

    To maximize the benefit:

    1. Review your current snapshot usage in the AWS Billing and Cost Management console.
    2. Increase snapshot frequency. More frequent snapshots now cost proportionally less since each additional snapshot only adds its unique data blocks to your bill.
    3. Extend retention policies. Compliance-driven retention (90-day, 1-year, 7-year) is now significantly more affordable.
    4. Evaluate RA3 to RG migration. When assessing a move from RA3, factor in RG's 30% compute discount and its eligibility for Reserved Instances.
    5. Explore Serverless. The enhanced billing model makes Serverless a cost-effective option for customers who need backup retention beyond the 24-hour automated recovery point window.

    Conclusion

    The incremental snapshot billing model for Amazon Redshift Serverless and Amazon Redshift RG charges only for unique data blocks across your snapshots. This supports more frequent snapshots for better disaster recovery, affordable long-term compliance retention, and a compelling path to Amazon Redshift Serverless adoption. Combined with Amazon Redshift RG’s 30% compute discount and Reserved Instances, this delivers meaningful total cost savings across your entire Amazon Redshift spend.

    Review your snapshot strategy today and share your feedback on AWS re:Post. For full pricing details, visit the Amazon Redshift pricing page.


    About the authors

    Nidhi Nayak
    Nidhi is a Senior Technical Account Manager with AWS who helps enterprise customers build scalable, high-performance cloud applications and optimize cloud operations. With over a decade of experience in data analytics, Nidhi currently focuses on Redshift and generative AI integration with Redshift.

    Raza Hafeez
    Raza is a Senior Product Manager, Technical at Amazon Redshift. He has 15+ years of experience building and optimizing enterprise data warehouses and is passionate about making cloud analytics accessible and cost-effective for customers of all sizes.

    Sushmita Barthakur
    Sushmita is a Senior Data Solutions Architect at AWS who helps strategic customers architect their data workloads on AWS. With a background in data analytics, she has extensive experience helping customers architect and build enterprise data lakes, ETL workloads, data warehouses, and data analytics solutions, both on premises and in the cloud. Sushmita is based in Florida and enjoys traveling, reading, and playing tennis.

    Amy Huang
    Amy is a Senior Financial Analyst at AWS and a CPA with over 7 years of progressive experience across Strategic Finance, Banking, and Auditing. She specializes in pricing, financial modeling and valuation, and data-driven analysis. Outside of work, she enjoys yoga and hiking.

    Critical Zcash Vulnerability Found and Fixed

    Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2026/06/critical-zcash-vulnerability-found-and-fixed.html

    If you’re a user—owner?—of this cryptocurrency, this is important:

    On May 29, the security researcher Taylor Hornby found a critical vulnerability in Zcash Orchard privacy pool using Claude Opus 4.8. The Zcash team hired Hornby specifically to look for this kind of issue. He found one fast enough to be embarrassing.

    The Orchard pool is the newest and most advanced shielded transaction system in the cryptocurrency Zcash. Introduced in 2022, it allows users to send and receive ZEC while keeping transaction details private. It uses zero-knowledge proofs to validate transactions without revealing amounts or participants. The bug: a specific check that was supposed to validate transaction inputs wasn’t actually enforcing the rules it appeared to enforce. An attacker could have exploited the flaw to feed false inputs into that check and generate ZEC from nothing, with the zero-knowledge proof system blessing the fraudulent transaction as valid.

    It’s fixed; that’s the good news. The bad news is that there’s no way of knowing if anyone exploited the vulnerability to steal money. And this fragility is the fundamental problem that makes blockchain such a bad idea.

    Critical Check Point VPN Zero-Day Exploited in the Wild (CVE-2026-50751)

    Post Syndicated from Rapid7 original https://www.rapid7.com/blog/post/etr-critical-check-point-vpn-zero-day-exploited-in-the-wild-cve-2026-50751

    Overview

    On June 8, 2026, Check Point published a security advisory for CVE-2026-50751, a critical authentication bypass vulnerability affecting Check Point Remote Access VPN, Mobile Access, and Spark Firewall products. The vulnerability affects deployments configured to use the deprecated IKEv1 key exchange protocol where gateways accept legacy Remote Access clients and do not require a machine certificate for connections.

    CVE-2026-50751, classified as improper authentication (CWE-287), has a CVSS score of 9.3. The vulnerability stems from a logic flow weakness in how Remote Access and Mobile Access components validate certificates during IKEv1 key exchange; successful exploitation allows an unauthenticated attacker to establish a VPN session without providing valid credentials. Per the vendor, additional post-authentication activity is required to access internal resources or escalate privileges.

    Check Point has indicated that CVE-2026-50751 is being actively exploited in the wild, with observed activity dating back to May 7, 2026, and an increase in early June. The vendor characterizes the campaign as limited in scope, affecting several dozen organizations. Check Point has linked at least one incident to a Qilin ransomware affiliate, an attribution it makes with medium confidence.

    Separately, during its investigation Check Point identified a related vulnerability, CVE-2026-50752 (CVSS 7.4), in the same IKEv1 code path that could enable a man-in-the-middle attack against site-to-site VPN tunnels under certain configurations. No exploitation of CVE-2026-50752 has been observed.

    Check Point VPN products have been targeted by zero-day vulnerabilities in the past. In May 2024, CVE-2024-24919, a high-severity information disclosure vulnerability in Check Point Quantum Security Gateways, was exploited in the wild and subsequently added to the CISA Known Exploited Vulnerabilities (KEV) catalog. Organizations running affected Check Point products are urged to apply the available hotfixes and follow the vendor guidance to remediate these issues.

    Mitigation guidance

    Check Point has released hotfixes to remediate CVE-2026-50751. Affected organizations should apply the available updates on an emergency basis, without waiting for a regular patch cycle to occur.

    The following products and versions are affected (Remote Access VPN, Mobile Access / SSL VPN, Spark Firewall):

    • R80.20.X (End of Support)

    • R80.40 (End of Support)

    • R81 (End of Support)

    • R81.10 (End of Support)

    • R81.10.X

    • R81.20

    • R82

    • R82.00.X

    • R82.10

    Notably, four of the nine affected version branches (R80.20.X, R80.40, R81, R81.10) have reached End of Support. Organizations still running these versions should prioritize migration to a supported release.
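To see which gateways fall into each group, a small inventory check can compare branch names against the list above. The following is a minimal Node.js sketch, assuming a hypothetical inventory of `{ name, branch }` records and exact matching on the branch strings as printed. It is a triage aid only: anything it reports as "not listed" still deserves a check against the vendor advisory.

```javascript
// Affected branches as listed above (Remote Access VPN, Mobile Access / SSL VPN,
// Spark Firewall). Exact string matching is a simplifying assumption.
const AFFECTED_BRANCHES = new Map([
  ["R80.20.X", { endOfSupport: true }],
  ["R80.40", { endOfSupport: true }],
  ["R81", { endOfSupport: true }],
  ["R81.10", { endOfSupport: true }],
  ["R81.10.X", { endOfSupport: false }],
  ["R81.20", { endOfSupport: false }],
  ["R82", { endOfSupport: false }],
  ["R82.00.X", { endOfSupport: false }],
  ["R82.10", { endOfSupport: false }],
]);

// Classify each gateway in a hypothetical inventory of { name, branch } records.
function classifyGateways(inventory) {
  return inventory.map(({ name, branch }) => {
    const entry = AFFECTED_BRANCHES.get(branch);
    if (!entry) {
      return { name, branch, status: "not listed" };
    }
    // End-of-support branches are affected and should also be queued for migration.
    const status = entry.endOfSupport
      ? "affected, end of support: prioritize migration"
      : "affected: apply hotfix";
    return { name, branch, status };
  });
}

const report = classifyGateways([
  { name: "gw-paris", branch: "R81.10" },
  { name: "gw-ohio", branch: "R82.10" },
  { name: "gw-lab", branch: "R77.30" },
]);
console.log(report);
```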

    For organizations unable to immediately apply the hotfix, Check Point has provided the following alternative mitigations:

    • Remove support for the legacy remote access client

    • Configure global properties for Remote Access VPN authentication to IKEv2 only

    • Set machine certificate authentication as mandatory

    • Enable IPS and download the latest signatures
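Each of the first three mitigations works by breaking one of the conditions named at the start of this post: IKEv1 in use, legacy Remote Access clients accepted, and no machine certificate required. The sketch below models that reasoning with hypothetical setting names that are not real Check Point configuration keys. The IPS signature step is a detection layer and is not modeled.

```javascript
// Hypothetical gateway settings; field names are illustrative, not Check Point keys.
function isExposed(settings) {
  // Per the advisory summary, exposure requires all three conditions at once.
  return settings.ikev1Enabled && settings.legacyClientsAllowed && !settings.machineCertRequired;
}

// Apply one alternative mitigation to a copy of the settings (the input is not mutated).
function applyMitigation(settings, mitigation) {
  const next = { ...settings };
  switch (mitigation) {
    case "remove-legacy-client":
      next.legacyClientsAllowed = false;
      break;
    case "ikev2-only":
      next.ikev1Enabled = false;
      break;
    case "require-machine-cert":
      next.machineCertRequired = true;
      break;
    default:
      throw new Error(`Unknown mitigation: ${mitigation}`);
  }
  return next;
}

const current = { ikev1Enabled: true, legacyClientsAllowed: true, machineCertRequired: false };
console.log(isExposed(current)); // true
for (const m of ["remove-legacy-client", "ikev2-only", "require-machine-cert"]) {
  console.log(m, isExposed(applyMitigation(current, m))); // false for each
}
```

Any single mitigation is enough to break the chain in this model, which is why they are offered as alternatives rather than a combined checklist.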

    Rapid7 strongly recommends looking for signs of compromise even after the hotfix has been applied. Per Check Point’s advisory, incident response teams should prioritize forensic log audits and configuration reviews starting from May 7, 2026, the earliest known date of exploitation.

    For the latest mitigation guidance, please refer to the vendor advisory.

    Rapid7 customers

    Exposure Command, InsightVM, and Nexpose

    Exposure Command, InsightVM, and Nexpose customers can assess exposure to CVE-2026-50751 with a vulnerability check expected to be available in the June 9 content release.

    Indicators of compromise

    Check Point has published the following indicators associated with the CVE-2026-50751 exploitation campaign. The attacker infrastructure consists of VPS hosts from several providers (Kaupo Cloud HK, Shock Hosting, Vultr Holdings), and Check Point notes that in some cases, the VPS region matched the geography of the targeted organization.

    IP addresses:

    • 45.77.149[.]152

    • 209.182.225[.]136

    • 38.60.157[.]139

    • 162.33.177[.]101

    • 45.76.26[.]42

    • 144.208.127[.]155

    • 38.54.88[.]201

    • 38.54.107[.]167

    • 66.42.99[.]200
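Before matching these addresses against firewall or VPN logs, they have to be "refanged", since the `[.]` notation only keeps them from being clickable. The following sketch assumes logs have already been parsed into hypothetical `{ timestamp, srcIp }` records. It keeps every match and flags those dated before May 7, 2026: a hit earlier than the known start of the campaign is worth escalating, not discarding.

```javascript
// Defanged IOCs as published above.
const DEFANGED_IPS = [
  "45.77.149[.]152", "209.182.225[.]136", "38.60.157[.]139",
  "162.33.177[.]101", "45.76.26[.]42", "144.208.127[.]155",
  "38.54.88[.]201", "38.54.107[.]167", "66.42.99[.]200",
];
// Refang: "45.77.149[.]152" becomes "45.77.149.152".
const IOC_IPS = new Set(DEFANGED_IPS.map((ip) => ip.split("[.]").join(".")));

// Earliest known exploitation date per the advisory (interpreted here as UTC).
const KNOWN_START = Date.parse("2026-05-07T00:00:00Z");

// Return every log entry whose source IP is an IOC, flagging pre-campaign hits.
function findIocHits(entries) {
  return entries
    .filter((e) => IOC_IPS.has(e.srcIp))
    .map((e) => ({ ...e, beforeKnownStart: Date.parse(e.timestamp) < KNOWN_START }));
}

const hits = findIocHits([
  { timestamp: "2026-06-02T10:15:00Z", srcIp: "45.76.26.42" },
  { timestamp: "2026-06-02T10:16:00Z", srcIp: "203.0.113.9" },
  { timestamp: "2026-04-30T08:00:00Z", srcIp: "66.42.99.200" },
]);
console.log(hits);
```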

    File hashes (MD5):

    • 52fda5c1b9704544f32ee98d9060e689

    • 51d39aa39478beeac94f2d12f682ecce

    Check Point observed post-exploitation attempts to retrieve ELF payloads from attacker-controlled servers, and identified ties to the Qilin ransomware operation based on binary analysis. For the full and most current list of IOCs, please refer to the vendor advisory.
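Recovered binaries, such as ELF payloads pulled from attacker servers, can be compared against the published MD5 values with Node's built-in `crypto` module. This is a minimal sketch assuming a flat directory of collected files; `scanDirectory` is a hypothetical helper name. A hash match confirms a known sample; a miss does not prove a file is benign.

```javascript
const { createHash } = require("crypto");
const fs = require("fs");
const path = require("path");

// MD5 values published above.
const IOC_MD5 = new Set([
  "52fda5c1b9704544f32ee98d9060e689",
  "51d39aa39478beeac94f2d12f682ecce",
]);

function md5Hex(buffer) {
  return createHash("md5").update(buffer).digest("hex");
}

function isKnownPayload(buffer, iocSet = IOC_MD5) {
  return iocSet.has(md5Hex(buffer));
}

// Hash every regular file in one directory (non-recursive) and return matching paths.
function scanDirectory(dir, iocSet = IOC_MD5) {
  return fs.readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => path.join(dir, entry.name))
    .filter((file) => isKnownPayload(fs.readFileSync(file), iocSet));
}
```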

    Updates

    • June 8, 2026: Initial publication.

    Integrating Event Source Mappings with AWS Lambda tenant isolation mode

    Post Syndicated from Anton Aleksandrov original https://aws.amazon.com/blogs/compute/integrating-event-source-mappings-with-aws-lambda-tenant-isolation-mode/

    Building event-driven multi-tenant SaaS applications typically requires compute isolation between tenants to prevent data leakage, maintain security boundaries, and ensure compliance. Traditionally, you had to choose between two approaches: sharing execution environments across tenants (risking cross-tenant contamination of in-memory state) or managing separate Lambda functions per tenant (which introduces operational overhead, increases costs, and complicates deployments). Both approaches required you to make trade-offs between security, operational complexity, and cost efficiency.

    AWS Lambda tenant isolation mode with Event Source Mappings addresses this trade-off. This approach reduces operational complexity, improves your security posture, and removes the need to manage separate functions per tenant, all while maintaining strict compute-level isolation boundaries. You can now build event-driven architectures using services like Amazon SQS and Amazon EventBridge where each tenant’s workloads run in dedicated execution environments, but you manage only a single Lambda function.

    In this post, you’ll learn how to propagate tenant identity from event payloads, implement IAM permissions for tenant-isolated invocations, apply validation strategies to verify tenant context, and use a lightweight routing mechanism that invokes tenant-isolated backends. Complete sample code demonstrating this pattern is available in the AWS samples repository.

    Understanding Lambda tenant isolation mode

    AWS Lambda tenant isolation mode extends Lambda’s execution model by introducing tenant-aware routing of invocations. Instead of reusing execution environments across all invocations of a function, Lambda associates each execution environment with a specific tenant identifier. When a new request is received, Lambda routes it to an existing environment for that specific tenant or creates a new one if none exists.

    Figure 1. Using Lambda tenant isolation mode for compute isolation
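The routing rule described above can be illustrated with a toy model: environments are pooled per tenant, an idle environment belonging to the same tenant is reused, and otherwise a new one is created. This is a conceptual sketch of the stated behavior, not how Lambda implements routing; the class and method names are hypothetical.

```javascript
// Toy model only: pools of execution environments keyed by tenant ID.
class TenantAwareRouter {
  constructor() {
    this.pools = new Map(); // tenantId -> array of environments
    this.nextId = 1;
  }

  acquire(tenantId) {
    if (!tenantId) throw new Error("tenant ID is required");
    const pool = this.pools.get(tenantId) ?? [];
    this.pools.set(tenantId, pool);
    // Reuse an idle environment for this tenant if there is one.
    let env = pool.find((e) => !e.busy);
    if (!env) {
      // No idle environment for this tenant (a cold start): create a new one.
      env = { id: `env-${this.nextId++}`, tenantId, busy: false };
      pool.push(env);
    }
    env.busy = true;
    return env;
  }

  release(env) {
    env.busy = false;
  }
}
```

An environment never moves between pools, which is the property that makes per-tenant caching safe.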

    Tenant-aware routing simplifies how you build multi-tenant SaaS systems, while maintaining isolation boundaries at the compute level. Execution environments are never shared across tenants but are still reused within the same tenant for maximum efficiency. That means you can safely cache tenant-specific configurations, such as feature flags or database connection strings, without adding isolation logic manually in your code.
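As an illustration of safe caching, the following sketch keeps a tenant's configuration in module scope, which persists for the lifetime of an execution environment. `loadTenantConfig` is a hypothetical stand-in for a real lookup, and the handler is declared with `const` rather than `export const` so the snippet runs as a plain Node.js script. The tenant check is defense in depth; under tenant isolation mode it should never fire.

```javascript
// Module scope persists across invocations within one execution environment.
let cachedTenantId = null;
let cachedConfig = null;
let loadCount = 0; // counts loads so the caching is observable in this demo

// Hypothetical loader; a real function might read a parameter store or database.
async function loadTenantConfig(tenantId) {
  loadCount += 1;
  return { tenantId, featureFlags: { newCheckout: tenantId === "TenantA" } };
}

const handler = async (event, context) => {
  const tenantId = context.tenantId;
  if (cachedConfig === null) {
    // First invocation in this environment: load and cache this tenant's configuration.
    cachedConfig = await loadTenantConfig(tenantId);
    cachedTenantId = tenantId;
  } else if (cachedTenantId !== tenantId) {
    // Defense in depth: tenant isolation mode should make this unreachable.
    throw new Error(`Environment bound to ${cachedTenantId} received ${tenantId}`);
  }
  return { tenantId, newCheckout: cachedConfig.featureFlags.newCheckout };
};
```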

    To use tenant isolation mode, every invocation must include a tenant ID parameter. For synchronous, direct invocations, such as those originating from Amazon API Gateway or the AWS SDKs, you pass it using the X-Amz-Tenant-Id header, as described in the launch blog and service documentation. The Lambda service uses this header to route the invocation to tenant-specific execution environments. Inside your function handler, the tenant ID is available through the context.tenantId property, so you can implement tenant-aware logic.

    export const handler = async (event, context) => {
        // Set by Lambda from the tenant ID supplied with the invocation
        const tenantId = context.tenantId;
    
        // Tenant-specific business logic here
        console.log(`Processing request for tenant: ${tenantId}`);
    };

    Figure 2. Accessing tenant ID from function handler.

    When using API Gateway, you can extract the tenant ID value from incoming request metadata, such as HTTP headers, path parameters, query parameters, or JWT claims, and map it directly to the downstream X-Amz-Tenant-Id header in the API Gateway integration request configuration. See the launch blog for detailed guidance.

    This model works well for direct, synchronous invocations. However, many serverless applications rely on event-driven patterns, where Lambda is invoked through Event Source Mappings.

    Using tenant isolation mode with event sources

    Many serverless applications use event-driven architectures built on services like Amazon SQS, Amazon EventBridge, Amazon Kinesis, or Amazon DynamoDB Streams. In these cases, Lambda is invoked by an Event Source Mapping (ESM), which polls the event source and invokes your function when new events arrive.

    With these services, you’ll commonly find the tenant identity embedded in the event payload or metadata – for example, in an SQS message body or EventBridge event detail. Each event source has its own payload schema. The following example payloads for SQS and EventBridge each carry the tenant identity in a tenantId field.

    SQS message body:

    {
        "tenantId": "TenantA",
        "orderId": "ord-12345",
        "eventType": "ORDER_PLACED",
        "payload": { ... }
    }

    EventBridge event detail:

    {
        "source": "com.myapp.orders",
        "detail-type": "OrderPlaced",
        "detail": {
            "tenantId": "TenantA",
            "orderId": "ord-12345"
        }
    }

    However, event sources don’t provide a built-in mechanism to map message properties to HTTP headers. As a result, if you try to invoke a function with tenant isolation mode enabled directly from an event source mapping, it fails because the tenant ID isn’t propagated as the X-Amz-Tenant-Id header. The following section describes how to address this and integrate ESMs with tenant-isolated Lambda functions.
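Before any routing can happen, the tenant ID has to be read out of each payload. A minimal helper for the two payload shapes shown earlier might look like the following. It assumes the field is named `tenantId`, as in those examples, and returns `null` for malformed or missing values so the caller can reject the message.

```javascript
// Read the tenant ID from an SQS message body or an EventBridge event detail.
function extractTenantId(source, message) {
  switch (source) {
    case "sqs": {
      // SQS delivers the body as a string, so it has to be parsed first.
      let body;
      try {
        body = JSON.parse(message.body);
      } catch {
        return null; // malformed body: let the caller treat it as invalid
      }
      return typeof body?.tenantId === "string" ? body.tenantId : null;
    }
    case "eventbridge": {
      const id = message.detail?.tenantId;
      return typeof id === "string" ? id : null;
    }
    default:
      throw new Error(`Unsupported source: ${source}`);
  }
}
```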

    Propagating tenant identity with Event Source Mappings

    To propagate tenant identity from ESM messages, you can introduce a routing component – a lightweight Lambda function that sits between the event source and your tenant-isolated backend function. Your routing function receives events from the ESM, extracts the tenant ID from each message, and invokes your backend function using the Lambda Invoke API, passing the required X-Amz-Tenant-Id header. See the following diagram for an example architecture using SQS ESM.


    Figure 3. Propagating tenant ID from SQS messages to Lambda with tenant isolation mode enabled

    You don’t need to enable tenant isolation mode on the routing function itself – it acts as a stateless dispatcher. Your multi-tenant backend function, which contains your core business logic, runs with tenant isolation mode enabled and receives properly scoped, tenant-aware invocations. This pattern keeps tenant isolation at the backend layer while preserving a shared event ingestion model.

    The following example illustrates a routing function that processes incoming SQS messages, reads the tenant ID from each message, and invokes your backend function with the appropriate tenant context. This example assumes MessageGroupId is used to carry the tenant identifier, which keeps messages from the same tenant in order as they are delivered to the routing function when you’re using FIFO queues.

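    import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';
    
    // Create the client once per execution environment so it is reused across invocations
    const lambdaClient = new LambdaClient({});
    // Backend function name, assumed here to be supplied through an environment variable
    const BACKEND_FUNCTION_NAME = process.env.BACKEND_FUNCTION_NAME;
    
    export const handler = async (event) => {
        for (const record of event.Records) {
            const body = record.body;
            // In this example, MessageGroupId carries the tenant identifier (FIFO queues)
            const messageGroupId = record.attributes?.MessageGroupId;
    
            if (!messageGroupId) {
                throw new Error(`Missing MessageGroupId on SQS message ${record.messageId}`);
            }
    
            const command = new InvokeCommand({
                FunctionName: BACKEND_FUNCTION_NAME,
                InvocationType: 'Event',   // asynchronous invocation
                TenantId: messageGroupId,  // routes to this tenant's execution environments
                Payload: Buffer.from(body)
            });
    
            await lambdaClient.send(command);
        }
    };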

    Figure 4. Routing SQS messages to a Lambda function with tenant isolation mode enabled

    The following example illustrates how you can achieve the same routing functionality when processing EventBridge events.

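    import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';
    
    const lambdaClient = new LambdaClient({});
    // Backend function name, assumed here to be supplied through an environment variable
    const BACKEND_FUNCTION_NAME = process.env.BACKEND_FUNCTION_NAME;
    
    export const handler = async (event) => {
        // The tenant ID is carried in the EventBridge event detail
        const tenantId = event.detail?.tenantId;
    
        if (!tenantId) {
            throw new Error(`Missing tenantId in EventBridge event: ${JSON.stringify(event)}`);
        }
    
        const command = new InvokeCommand({
            FunctionName: BACKEND_FUNCTION_NAME,
            InvocationType: 'Event',
            TenantId: tenantId,
            Payload: Buffer.from(JSON.stringify(event.detail)),
        });
    
        await lambdaClient.send(command);
    };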

    Figure 5. Routing EventBridge events to a Lambda function with tenant isolation mode enabled

    IAM permissions

    Your routing function’s execution role needs permission to:

    1. Poll the event source: You can apply this policy either to your function execution role or as a resource policy on the event source itself.
    2. Invoke the downstream backend function: Additionally, your router function requires the lambda:InvokeFunction permission scoped to your backend function ARN.

    The following example execution role policy allows the router function to poll an SQS queue:

    {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes"
            ],
            "Resource": "arn:aws:sqs:us-east-1:123456789012:my-queue"
        }]
    }

    The following example execution role policy allows the router function to invoke the backend function:

    {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-backend-function"
        }]
    }

    Figure 6. IAM permissions used for implementing the tenant ID router function mechanism.

    Best practices and considerations

    When implementing the pattern described in this post, keep these important considerations in mind regarding validation, scaling, and overall system design.

    Validate tenant identity before invocation. Because tenant identity comes from event payloads, you shouldn’t automatically assume it’s trustworthy. Here’s how to protect your system:

    • Validate incoming payloads and reject messages with missing, malformed, or unauthorized tenant IDs at the routing layer before invoking your backend function
    • Maintain an authoritative tenant registry and validate incoming tenant IDs against it
    • Use dead-letter queues (DLQs) on your SQS queues to capture messages that fail validation for investigation and replay
    • When using EventBridge Pipes, use the enrichment step to validate or normalize tenant IDs before they reach your routing function
    • Enable partial batch response for applicable ESMs, such as SQS, so your routing function can report individual message failures without failing the entire batch
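Several of these practices can be combined in the routing function. The sketch below validates each SQS message's tenant ID against a format rule and a registry, then returns failures in the SQS partial batch response shape (`{ batchItemFailures: [{ itemIdentifier }] }`), which Lambda honors when the event source mapping has `ReportBatchItemFailures` enabled. The pattern, registry contents, and injected `invokeBackend` function are hypothetical. Because this example reads the tenant from MessageGroupId on a FIFO queue, it stops at the first failure and reports all later messages too, which preserves ordering.

```javascript
// Hypothetical format rule and registry; replace with your authoritative tenant store.
const TENANT_ID_PATTERN = /^[A-Za-z0-9_-]{1,64}$/;
const TENANT_REGISTRY = new Set(["TenantA", "TenantB"]);

function isValidTenantId(tenantId) {
  return typeof tenantId === "string" &&
    TENANT_ID_PATTERN.test(tenantId) &&
    TENANT_REGISTRY.has(tenantId);
}

// invokeBackend(tenantId, body) is injected so the sketch runs without AWS;
// in a real router it would wrap an InvokeCommand call.
async function routeBatch(event, invokeBackend) {
  const batchItemFailures = [];
  let stopped = false;
  for (const record of event.Records) {
    if (stopped) {
      // FIFO: after one failure, report every later message too so order is kept.
      batchItemFailures.push({ itemIdentifier: record.messageId });
      continue;
    }
    const tenantId = record.attributes?.MessageGroupId;
    try {
      if (!isValidTenantId(tenantId)) {
        throw new Error(`Rejected tenant ID: ${tenantId}`);
      }
      await invokeBackend(tenantId, record.body);
    } catch (err) {
      // Reported messages return to the queue and reach the DLQ after maxReceiveCount.
      console.error(`Message ${record.messageId} failed: ${err.message}`);
      batchItemFailures.push({ itemIdentifier: record.messageId });
      stopped = true;
    }
  }
  return { batchItemFailures };
}
```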

    Plan for scaling considerations. Tenant isolation mode creates separate execution environments per tenant. This can increase the number of cold starts compared to shared environments. Each tenant consumes concurrency independently, so monitor your usage and request quota increases as your tenant base grows.
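To get a feel for the difference, a common estimate is concurrency ≈ average requests per second × average duration in seconds. Because environments are never shared across tenants, each tenant has to be rounded up to whole environments separately. The sketch below applies that estimate under simplifying assumptions (steady load, no bursts); it is a rough planning aid, not a quota calculation.

```javascript
// Rough planning estimate: concurrency ≈ requests per second × average duration (s).
// With tenant isolation, each tenant rounds up to whole environments on its own;
// a shared pool rounds up only once over the combined load.
function estimateEnvironments(tenants, avgDurationSeconds) {
  const isolated = tenants.reduce(
    (sum, t) => sum + Math.ceil(t.requestsPerSecond * avgDurationSeconds), 0);
  const totalRps = tenants.reduce((sum, t) => sum + t.requestsPerSecond, 0);
  const shared = Math.ceil(totalRps * avgDurationSeconds);
  return { shared, isolated };
}

// Example: 50 tenants at 0.5 requests/second each, 200 ms average duration.
const tenants = Array.from({ length: 50 }, () => ({ requestsPerSecond: 0.5 }));
console.log(estimateEnvironments(tenants, 0.2)); // { shared: 5, isolated: 50 }
```

Many low-traffic tenants are where the gap is largest, because each one holds at least one environment of its own whenever it is active.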

    Optimize the routing function. Your routing function introduces an additional invocation segment. Use asynchronous invocation (InvocationType: 'Event') to reduce idle waiting time and size your function accordingly.
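If the routing function awaits each invocation in turn, one slow tenant delays everyone else in the batch. One option, sketched below under the assumption that tenant IDs have already been extracted into hypothetical `{ tenantId, payload }` records, is to dispatch tenants concurrently while keeping each tenant's calls in their original order. This preserves the order in which the router issues calls; with asynchronous (`'Event'`) invocation, Lambda does not guarantee the order in which the backend processes them.

```javascript
// Group records by tenant, then run tenants concurrently while each tenant's
// records are dispatched one after another in their original order.
// invokeBackend(tenantId, payload) is injected (hypothetical); in a router it
// would wrap an asynchronous InvokeCommand call.
async function dispatchByTenant(records, invokeBackend) {
  const byTenant = new Map();
  for (const record of records) {
    if (!byTenant.has(record.tenantId)) byTenant.set(record.tenantId, []);
    byTenant.get(record.tenantId).push(record);
  }
  const tenantIds = [...byTenant.keys()];
  const results = await Promise.allSettled(
    tenantIds.map(async (tenantId) => {
      for (const record of byTenant.get(tenantId)) {
        await invokeBackend(tenantId, record.payload);
      }
    })
  );
  // Return tenants whose chain stopped on an error so the caller can report them.
  return tenantIds.filter((_, i) => results[i].status === "rejected");
}
```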

    Understand permission boundaries. Tenants share your backend function’s execution role. If you need fine-grained per-tenant permissions, consider propagating tenant-scoped credentials (for example, using AWS STS AssumeRole) from the upstream segment.

    Sample code

    A complete, deployable sample project demonstrating this pattern – including SQS routing functions, a tenant-isolated backend function, and AWS SAM infrastructure – is available in this GitHub repository. Follow the instructions in README.md to provision the sample project in your account.

    Conclusion

    Lambda tenant isolation mode introduces cross-tenant compute isolation for your multi-tenant SaaS applications by routing each invocation to a tenant-specific execution environment. When you combine this with event-driven architectures built on services like SQS, EventBridge, and Kinesis, the routing function pattern described in this post allows you to propagate tenant identity from event payloads and invoke your tenant-isolated backend with the correct context.

    This approach extends tenant isolation mode to your asynchronous workloads without changing your core business logic. You retain per-tenant execution environment isolation while continuing to use Lambda’s native event source integrations, scaling model, and operational tooling. Together, these patterns provide you with a practical foundation for building secure, scalable, event-driven multi-tenant SaaS applications on AWS.

    Next steps: Consider extending this pattern to other event sources like Kinesis Data Streams or DynamoDB Streams. You can also explore combining this approach with AWS Step Functions for orchestrating complex multi-tenant workflows while maintaining tenant isolation boundaries.


    Operationalizing AWS security: A maturity roadmap

    Post Syndicated from Joseph Sadler original https://aws.amazon.com/blogs/security/operationalizing-aws-security-a-maturity-roadmap/

    Enabling security tooling is the starting point. Making it operational—where findings drive decisions, response times are measurable, and your security posture improves week over week—is where most organizations struggle.

    This blog post provides a phased maturity roadmap for organizations that have already enabled AWS Security Hub and Amazon GuardDuty. These two services form the foundation of a cloud-centered security operations capability on AWS. Security Hub provides centralized security posture management and aggregates findings from multiple AWS security services, while GuardDuty provides intelligent threat detection by continuously monitoring for malicious activity and unauthorized behavior. For any production or enterprise AWS environment, having both services enabled across all accounts and AWS Regions is a baseline expectation. They aren’t optional add-ons: effective security operations require both the ability to detect threats and the ability to understand your overall security posture. If you haven’t yet enabled them, the Security Hub documentation and GuardDuty documentation provide setup guidance, including multi-account deployment with AWS Organizations.

    Customers consistently tell us that while individual AWS security service documentation is thorough, what’s missing is a consolidated operational playbook—one resource that ties the services together into a working security operations practice with clear phases, progression criteria, and an operational cadence. That’s the gap this post fills. Rather than covering how each feature works (the documentation does that well), this post focuses on when and why to use each capability, and how to build the organizational habits that make them effective.

    What follows is a six-phase roadmap for moving from "these services are active" to "these services are driving our security operations." Each phase builds on the previous one, and each is designed to deliver tangible, measurable improvement.

    Phase 0: Assess your current state

    Goal: Understand what’s working before changing anything.

    Estimated timeline: 1–2 weeks

    Move to Phase 1 when: You have a documented current-state assessment covering all the following items.

    Before introducing new processes or automation, establish a clear picture of the current environment. This assessment informs every decision that follows.

    Actions:

    • Findings inventory: Review existing active GuardDuty findings to determine how many there are, the severity distribution, and how old the oldest findings are. A large backlog of untouched HIGH or CRITICAL findings that have been sitting for weeks is a strong signal about where to focus first.
    • Security Hub score baseline: Determine your current compliance score against AWS Foundational Security Best Practices (FSBP) and the CIS AWS Foundations Benchmark. Check which standards are enabled; if multiple standards are enabled, look for overlapping standards (which create noise) or unused standards.
    • Multi-account and multi-Region check: Look to see if GuardDuty is enabled in every account and every Region, or only in Regions with active workloads. Threat actors frequently operate in Regions that organizations don’t actively monitor. Also check to see if Security Hub aggregation is configured with a delegated administrator account or if each account is being managed independently.
    • Integration check: Determine whether GuardDuty findings are flowing into Security Hub and whether Amazon Inspector and Amazon Macie are enabled and feeding findings in. Without integration, Security Hub might be surfacing only its own compliance checks.
    • Notification check: Check whether an Amazon EventBridge rule is configured for notifications and, if so, how findings are routed and to whom. Determine whether notifications are sent using an Amazon Simple Notification Service (Amazon SNS) topic or a chat channel integration. Without a clear notification and response workflow, findings can accumulate silently in the console with no one looking at them.
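Parts of this inventory can be scripted once the findings are exported. The sketch below assumes active findings have been pulled from Security Hub in the AWS Security Finding Format (ASFF) and uses only `Severity.Label`, `CreatedAt`, and `RecordState`. The 14-day cutoff for "sitting for weeks" is an arbitrary example value.

```javascript
const SEVERITIES = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL"];
const DAY_MS = 24 * 60 * 60 * 1000;

// Count active findings by severity, find the oldest age in days, and count
// HIGH/CRITICAL findings older than staleDays.
function summarizeBacklog(findings, now = new Date(), staleDays = 14) {
  const counts = Object.fromEntries(SEVERITIES.map((s) => [s, 0]));
  let oldestAgeDays = 0;
  let staleHighOrCritical = 0;
  for (const f of findings) {
    if (f.RecordState !== "ACTIVE") continue;
    const label = f.Severity?.Label;
    if (Object.prototype.hasOwnProperty.call(counts, label)) counts[label] += 1;
    const ageDays = Math.floor((now.getTime() - Date.parse(f.CreatedAt)) / DAY_MS);
    oldestAgeDays = Math.max(oldestAgeDays, ageDays);
    if ((label === "HIGH" || label === "CRITICAL") && ageDays >= staleDays) {
      staleHighOrCritical += 1;
    }
  }
  return { counts, oldestAgeDays, staleHighOrCritical };
}
```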

    Deliverable: A one-page current state assessment that identifies what’s enabled, what’s flowing where, who’s looking at it, and what’s in the existing backlog.

    Phase 1: Reduce the noise

    Goal: Make the signal meaningful before asking anyone to act on it.

    Estimated timeline: 2–3 weeks

    Move to Phase 2 when: Remaining findings represent items requiring real decisions, compliance scores reflect actual posture, and you can articulate why every suppression rule and disabled control exists.

    This is the single most important phase. If this step is skipped in favor of jumping straight to automation, the result is automated chaos. Alert fatigue is the primary reason security tooling is ignored, and addressing it first is what makes everything that follows sustainable.

    GuardDuty tuning:

    • Create suppression rules for known-benign findings. The goal is to suppress activity you’ve already evaluated and accepted—such as expected traffic from corporate egress IPs (based on trusted IP lists), internal tools that trigger DNS-based findings, or internet-facing resources that naturally receive port scanning. The principle: if you’ve investigated a finding and it’s expected, suppress it so your team can focus on what matters.
    • Triage every active HIGH and CRITICAL finding into three categories: needs immediate investigation (real threat, not yet reviewed); true positive, already addressed (archive using workflow status); or false positive or expected behavior (create a suppression rule). Every finding must be categorized into one of these three states.
    • Review GuardDuty protection plans and enable any that are relevant but not yet active. Organizations that enabled GuardDuty years ago might not have activated protection plans released since then (such as Runtime Monitoring, Malware Protection, RDS Protection, and Lambda Protection). Evaluate each against your workload profile and enable what applies.

    Security Hub tuning:

    • Disable controls that aren’t relevant to the environment. This is the highest-value quick win. If a service isn’t in use, disable its controls. If a control is addressed by an alternative solution, disable it. A 47% compliance score where half the failures are irrelevant trains teams to ignore the dashboard entirely. See the Security Hub controls reference for the full list.
    • Choose a primary standard. AWS Foundational Security Best Practices is a strong default. The CIS AWS Foundations Benchmark adds value when there’s a specific compliance mandate. Avoid enabling PCI DSS or NIST 800-53 standards unless there’s a reporting requirement—they add significant volume without proportional signal for most organizations.
    • Configure cross-Region aggregation to the delegated administrator account if not already in place. A single aggregated view eliminates the need to check findings across multiple Regional consoles.
    • Use the workflow status field operationally. Findings should progress from NEW to NOTIFIED to RESOLVED or SUPPRESSED. If everything remains in NEW indefinitely, the system carries no operational meaning.
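The workflow progression above can be enforced as a team convention in tooling. Security Hub itself lets you set any workflow status, so the following sketch is a policy check rather than an API rule. It also flags findings that have stayed in NEW beyond a chosen number of days; the threshold is an example.

```javascript
// Team convention for workflow moves: NEW -> NOTIFIED -> RESOLVED or SUPPRESSED.
const ALLOWED_TRANSITIONS = {
  NEW: ["NOTIFIED"],
  NOTIFIED: ["RESOLVED", "SUPPRESSED"],
  RESOLVED: [],
  SUPPRESSED: [],
};

function canTransition(from, to) {
  const next = Object.prototype.hasOwnProperty.call(ALLOWED_TRANSITIONS, from)
    ? ALLOWED_TRANSITIONS[from]
    : [];
  return next.includes(to);
}

// Return ASFF-shaped findings (Workflow.Status, CreatedAt) still in NEW
// more than maxAgeDays after creation.
function staleNewFindings(findings, maxAgeDays, now = new Date()) {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return findings.filter(
    (f) => f.Workflow?.Status === "NEW" && Date.parse(f.CreatedAt) < cutoff
  );
}
```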

    Deliverable: A tuned environment where remaining findings represent items that require real decisions. Compliance scores should now reflect your organization’s actual security posture rather than noise.

    Phase 2: Build the notification and routing layer

    Goal: Get the right findings to the right people at the right time.

    Estimated timeline: 2–3 weeks

    Move to Phase 3 when: CRITICAL and HIGH findings reach the security team within minutes, MEDIUM findings create tracked tickets, and notifications include enriched context. No action is taken until a person or an automation is informed that something needs attention.

    Architecture: Security Hub → EventBridge rule → routing logic → destination

    Tiered notification strategy:

    Severity | Action | Destination | Response SLA
    CRITICAL | Page on-call immediately | PagerDuty or Opsgenie | 15 minutes
    HIGH | Alert security team channel | Slack or Teams channel and ticket creation | 4 hours
    MEDIUM | Create ticket for review | Jira or ServiceNow | 48 hours
    LOW or INFORMATIONAL | Batch digest | Weekly email summary or dashboard review | Next review cycle
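The tiers above translate naturally into a lookup table plus a due-time calculation. This sketch treats the last column as a response SLA measured from detection and models "next review cycle" as having no fixed deadline; both choices, and the function name, are assumptions.

```javascript
// Tiers from the table above; slaMinutes is the response target in minutes.
const TIERS = {
  CRITICAL: { action: "page on-call", destination: "PagerDuty or Opsgenie", slaMinutes: 15 },
  HIGH: { action: "alert channel and open ticket", destination: "Slack or Teams", slaMinutes: 4 * 60 },
  MEDIUM: { action: "create ticket", destination: "Jira or ServiceNow", slaMinutes: 48 * 60 },
  LOW: { action: "weekly digest", destination: "email or dashboard", slaMinutes: null },
  INFORMATIONAL: { action: "weekly digest", destination: "email or dashboard", slaMinutes: null },
};

function routeBySeverity(severityLabel, detectedAt) {
  if (!Object.prototype.hasOwnProperty.call(TIERS, severityLabel)) {
    throw new Error(`Unknown severity label: ${severityLabel}`);
  }
  const tier = TIERS[severityLabel];
  // No fixed deadline for digest tiers; otherwise detection time plus the SLA.
  const dueBy = tier.slaMinutes === null
    ? null
    : new Date(new Date(detectedAt).getTime() + tier.slaMinutes * 60 * 1000);
  return { ...tier, dueBy };
}
```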

    Key design decisions:

    • Route from Security Hub, not individual services. Because findings from GuardDuty, Inspector, Macie, and Security Hub compliance checks all aggregate in Security Hub, build your EventBridge rules there for centralized management.
    • Create a fast path for the most dangerous finding types. Certain GuardDuty findings, particularly those involving credential exfiltration, cryptocurrency activity, trojans, and active compromises, warrant a separate, faster routing path that bypasses normal triage. Identify these based on your threat model and the GuardDuty finding types reference.
    • Enrich notifications before delivery. A raw JSON finding in a chat channel provides little actionable context. Use an AWS Lambda function to format notifications with the information responders need: account alias, Region, Amazon Resource Name (ARN), finding type, severity, a console deep link, and a plain-language description. The Security Hub CloudWatch Events integration guide describes the event format.
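The fast path and the enrichment step can live in the same Lambda function. The sketch below uses an example fast-path list built from GuardDuty finding type prefixes (`CryptoCurrency:`, `Trojan:`, `Backdoor:`) plus the credential-exfiltration finding name; tune it to your own threat model. The input is a simplified, hypothetical finding object that you would map from the Security Hub event, and `buildConsoleLink` is a caller-supplied function because console URL formats are not covered here.

```javascript
// Example fast-path list; adjust using the GuardDuty finding types reference.
const FAST_PATH_PREFIXES = ["CryptoCurrency:", "Trojan:", "Backdoor:"];
const FAST_PATH_SUBSTRINGS = ["InstanceCredentialExfiltration"];

function isFastPath(findingType) {
  return FAST_PATH_PREFIXES.some((p) => findingType.startsWith(p)) ||
    FAST_PATH_SUBSTRINGS.some((s) => findingType.includes(s));
}

// Build a readable message from a simplified finding object (field names assumed).
function formatNotification(finding, { accountAliases = {}, buildConsoleLink }) {
  const alias = accountAliases[finding.accountId] ?? finding.accountId;
  const marker = isFastPath(finding.findingType) ? "[FAST PATH] " : "";
  return [
    `${marker}${finding.severityLabel}: ${finding.findingType}`,
    `Account: ${alias} (${finding.accountId})`,
    `Region: ${finding.region}`,
    `Resource: ${finding.resourceArn}`,
    `What happened: ${finding.description}`,
    `Console: ${buildConsoleLink(finding)}`,
  ].join("\n");
}
```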

    Deliverable: A working notification pipeline where CRITICAL and HIGH findings reach the security team within minutes, MEDIUM findings create tracked work items, and LOW and INFORMATIONAL findings are batched for periodic review.

    Phase 3: Build automated remediation for high-confidence findings

    Goal: For findings where the correct response is deterministic, remove the human from the loop.

    Estimated timeline: 3–4 weeks

    Move to Phase 4 when: At least 3–5 high-confidence finding types have automated responses deployed with audit trails, and the team has established a process for evaluating new auto-remediation candidates.

    The guiding principle: Only auto-remediate when all three conditions are met: the finding is high-confidence, the response is deterministic, and the blast radius of the automated action is limited. Automated remediation must not create the risk of a production outage.

    Decision framework:

    Confidence level | High – no false positive risk | Medium – context-dependent | Low – requires investigation
    Response complexity | Single, well-defined action | Multiple steps or judgment calls | Requires forensic analysis
    Blast radius | Limited to one resource | Could affect dependent services | Production-wide impact
    Rollback difficulty | Straightforward to reverse | Moderate effort to reverse | Difficult or impossible to reverse
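One way to make the framework repeatable is to rate each factor by column (0 for the left column, 1 for the middle, 2 for the right) and derive a handling mode from the ratings. The mapping in this sketch is an assumption consistent with the guiding principle above: auto-remediate only when every factor is in the left column, treat any right-column rating as manual investigation, and otherwise notify for human-approved action.

```javascript
// Ratings by table column: 0 = left, 1 = middle, 2 = right.
const CRITERIA = ["confidence", "responseComplexity", "blastRadius", "rollbackDifficulty"];

function remediationMode(ratings) {
  for (const c of CRITERIA) {
    if (![0, 1, 2].includes(ratings[c])) throw new Error(`Missing or invalid rating for ${c}`);
  }
  const values = CRITERIA.map((c) => ratings[c]);
  if (values.every((v) => v === 0)) return "auto-remediate";
  if (values.some((v) => v === 2)) return "manual investigation";
  return "notify for approval";
}

// Example: re-enabling S3 Block Public Access on one bucket.
console.log(remediationMode({ confidence: 0, responseComplexity: 0, blastRadius: 0, rollbackDifficulty: 0 }));
```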

    Common auto-remediation categories:

    • Instance isolation for confirmed compromise findings (cryptocurrency mining, malware, and trojans): Replace the security group, snapshot volumes for forensics, and notify.
    • Credential revocation for confirmed credential compromise: Attach deny-all policies, revoke sessions, and deactivate access keys as appropriate to the credential type.
    • Compliance drift correction for deterministic misconfigurations: Re-enable Amazon Simple Storage Service (Amazon S3) Block Public Access, revoke overly permissive security group rules, and re-enable AWS CloudTrail logging.
    • Notification-only escalation for findings that require human judgment before action: Amazon Elastic Block Store (Amazon EBS) encryption gaps (require migration) and access key rotation (requires coordination with the key owner).

    For implementation, AWS provides Security Hub Automated Response and Remediation (SHARR), a solution that includes pre-built remediation playbooks deployed as AWS Step Functions workflows triggered by EventBridge. This is a strong starting point—evaluate the provided playbooks, enable the ones that fit, and extend with custom remediations as needed.

    Note: For findings that recur because the environment lacks preventive guardrails, the best long-term response is often a service control policy (SCP) that prevents the misconfiguration from occurring in the first place. Phase 5 covers this preventive controls layer.

    Deliverable: A library of automated and semi-automated remediation runbooks with full audit trails, and a documented decision framework the team uses to evaluate new auto-remediation candidates.

    Phase 4: Build the operational rhythm

    Goal: Turn security findings management into a sustained organizational practice, not a one-time cleanup.

    Estimated timeline: 4–6 weeks to establish, then ongoing

    Move to Phase 5 when: The weekly cadence has been running consistently for at least 8 weeks, monthly metrics show positive trends, and the first quarterly review has been completed.

    This is where many organizations stall, and it’s the most important phase in the entire roadmap. The technology is working, the notifications are flowing, automated remediations are firing, but there’s no organizational habit built around it. Without this phase, everything you’ve built in Phases 0–3 will gradually decay. Suppression rules will go stale, new team members won’t know the system exists, and findings will start accumulating again. The operational rhythm is what converts a security tooling deployment into a security operations practice.

    Weekly security review (30 minutes)

    Attendees: Security team lead, cloud platform team representative, rotating engineering lead from an application team

    Why the rotating engineering lead matters: Security findings don’t exist in a vacuum; they’re generated by workloads that engineering teams own. Rotating an engineering representative through this meeting accomplishes three things: it builds security awareness across the organization, ensures findings are routed to people with the context to resolve them, and creates organizational accountability beyond the security team.

    Agenda template:

    Time | Topic | Owner | Output
    5 minutes | Compliance score trend – Review Security Hub scores by account and standard. Is the trend improving, declining, or flat? If declining, why? | Security lead | Identified regression areas
    5 minutes | Critical and high findings review – Walk through new HIGH and CRITICAL GuardDuty findings from the past week. Are there any that need immediate escalation? | Security lead | Escalation actions assigned
    10 minutes | Top five failing controls – Identify the five Security Hub controls with the most failures. Assign an owner and a target date for each. | Platform lead | Owners and dates documented
    5 minutes | Automation review – Did any auto-remediations fire this week? Did they work correctly? Were there any false triggers? | Security lead | Automation adjustments queued
    5 minutes | Tuning decisions – Are new suppression rules needed based on this week's findings? Are any new finding types candidates for auto-remediation? | All | Tuning backlog updated
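    The "Top five failing controls" item is essentially a counting exercise. The following Python sketch assumes you have already exported control findings from Security Hub as AWS Security Finding Format (ASFF) dictionaries. It reads only `Compliance.SecurityControlId` and `Compliance.Status` and does not filter on record state or workflow status, which you would likely want to add in practice. The function name and the sample control IDs are illustrative.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_failing_controls(findings: Iterable[Mapping], n: int = 5) -> List[Tuple[str, int]]:
    """Return up to n (control ID, failed finding count) pairs, most failures first."""
    failures = Counter()
    for finding in findings:
        compliance = finding.get("Compliance", {})
        control_id = compliance.get("SecurityControlId")
        # Count only FAILED results that carry a control ID.
        if control_id and compliance.get("Status") == "FAILED":
            failures[control_id] += 1
    # most_common sorts by count; controls with equal counts keep first-seen order.
    return failures.most_common(n)


if __name__ == "__main__":
    # Illustrative sample data, not real account output.
    sample = [
        {"Compliance": {"SecurityControlId": "S3.8", "Status": "FAILED"}},
        {"Compliance": {"SecurityControlId": "S3.8", "Status": "FAILED"}},
        {"Compliance": {"SecurityControlId": "EC2.19", "Status": "FAILED"}},
        {"Compliance": {"SecurityControlId": "IAM.6", "Status": "PASSED"}},
    ]
    for control, count in top_failing_controls(sample):
        print(f"{control}: {count} failed findings")
```

    Each pair in the result maps directly to a row in the meeting's running document, where the owner and target date are recorded.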

    Running the meeting effectively:

    • Keep a running document (such as a wiki page or shared document) that captures decisions and action items week over week. This becomes your institutional memory.
    • If the compliance score hasn’t moved in over 3 weeks, that’s a signal. Either the assigned work isn’t happening, or the remaining findings are genuinely difficult to address. Both need to be discussed.
    • Track action items from previous weeks. A review that generates action items but never follows up on them will lose credibility and attendance quickly.
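    To make the agenda's "improving, declining, or flat?" question and the "hasn't moved in over 3 weeks" signal mechanical rather than a judgment call, you could classify a series of weekly compliance scores as in this sketch. The 0.5-point tolerance is an assumption, and "over 3 weeks" is interpreted here as three weekly intervals (four data points); tune both to how noisy your scores are.

```python
from typing import Sequence


def score_trend(scores: Sequence[float], tolerance: float = 0.5) -> str:
    """Classify the latest week-over-week change in a compliance score (oldest score first)."""
    if len(scores) < 2:
        return "insufficient data"
    delta = scores[-1] - scores[-2]
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "declining"
    return "flat"


def is_stalled(scores: Sequence[float], weeks: int = 3, tolerance: float = 0.5) -> bool:
    """True if the score stayed within `tolerance` points across the last `weeks` weekly intervals."""
    if len(scores) < weeks + 1:
        return False  # not enough history to judge
    window = scores[-(weeks + 1):]
    return max(window) - min(window) <= tolerance


# Example: weekly Security Hub scores (percent), oldest first; values are illustrative.
history = [62.0, 68.5, 71.0, 71.2, 71.1, 71.0]
print(score_trend(history), is_stalled(history))  # flat True
```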

    Escalation procedures

    Define clear escalation paths before they’re needed:

    Trigger | Action | Timeline
    CRITICAL finding not acknowledged within the SLA | Auto-escalate to security team manager | 15 minutes after SLA breach
    HIGH finding not resolved within the SLA | Escalate to finding owner's manager | 4 hours after SLA breach
    Compliance score drops more than 5 points in a week | Escalate to cloud platform team lead for investigation | Next business day
    Auto-remediation failure | Page security on-call | Immediate
    New finding type not covered by existing runbooks | Add to weekly review agenda for triage and runbook development | Next weekly review
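    The two SLA-based rows in this table can be evaluated automatically. The post doesn't specify SLA durations, so the 1-hour acknowledgment SLA for CRITICAL findings and the 7-day resolution SLA for HIGH findings in this sketch are placeholders. The `Finding` class is a hypothetical, simplified record, not a GuardDuty or Security Hub SDK type. Only the post-breach delays (15 minutes and 4 hours) come from the table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# ASSUMPTION: placeholder SLAs; the post does not define these durations.
CRITICAL_ACK_SLA = timedelta(hours=1)
HIGH_RESOLVE_SLA = timedelta(days=7)

# Post-breach delays taken from the escalation table.
CRITICAL_ESCALATION_DELAY = timedelta(minutes=15)
HIGH_ESCALATION_DELAY = timedelta(hours=4)


@dataclass
class Finding:
    """Hypothetical, simplified finding record (not an AWS SDK type)."""
    severity: str
    created_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None


def escalation_target(finding: Finding, now: datetime) -> Optional[str]:
    """Return whom to escalate to at `now`, or None if no SLA escalation is due."""
    if finding.severity == "CRITICAL" and finding.acknowledged_at is None:
        if now >= finding.created_at + CRITICAL_ACK_SLA + CRITICAL_ESCALATION_DELAY:
            return "security team manager"
    elif finding.severity == "HIGH" and finding.resolved_at is None:
        if now >= finding.created_at + HIGH_RESOLVE_SLA + HIGH_ESCALATION_DELAY:
            return "finding owner's manager"
    return None


opened = datetime(2026, 6, 1, 9, 0)
# 80 minutes after creation: past the 1-hour SLA plus the 15-minute delay.
print(escalation_target(Finding("CRITICAL", opened), opened + timedelta(minutes=80)))  # security team manager
```

    Running a check like this on a schedule keeps the escalation path independent of anyone remembering to look at the queue.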

    Monthly metrics report

    Compile these metrics monthly and review them with security and engineering leadership. The goal is to tell a story about whether the organization’s security posture is improving, stable, or degrading, and why.

    Metric | Question it answers | Target
    Mean time to acknowledge (MTTA) for CRITICAL findings | Are findings being seen promptly? | Decreasing month over month
    Mean time to resolve (MTTR) for CRITICAL and HIGH findings | Are findings being acted on? | Decreasing month over month
    Security Hub compliance score by standard, by account | What is the posture trend over time? | Increasing month over month
    Number of active GuardDuty findings by severity | Is the backlog growing or shrinking? | Decreasing for HIGH and CRITICAL
    Findings auto-remediated compared to manually resolved | Is automation delivering value? | Auto-remediation ratio increasing
    Number of suppressed findings (with quarterly justification review) | Is noise being managed, or are problems being hidden? | Stable or decreasing
    New findings introduced compared to resolved this month | Is the organization getting ahead or falling behind? | More findings resolved than introduced
    SLA adherence rate by severity | Are response commitments being met? | More than 95% for CRITICAL and more than 90% for HIGH
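    MTTA, MTTR, and SLA adherence are straightforward to compute once you record when each finding was created, acknowledged, and resolved. The `TrackedFinding` record in this sketch is hypothetical; populate it from wherever your team records those timestamps (for example, your ticketing system). One design choice worth noting: findings that are still open but already past their SLA count as misses, so an aging backlog can't inflate the adherence rate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class TrackedFinding:
    """Hypothetical record populated from your ticketing or notification system."""
    severity: str
    created_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None


def _mean_hours(deltas: List[timedelta]) -> Optional[float]:
    """Average duration in hours, or None when there is nothing to average."""
    return mean([d.total_seconds() / 3600 for d in deltas]) if deltas else None


def mtta_hours(findings: List[TrackedFinding], severity: str) -> Optional[float]:
    """Mean time to acknowledge, in hours, over acknowledged findings of `severity`."""
    return _mean_hours([f.acknowledged_at - f.created_at for f in findings
                        if f.severity == severity and f.acknowledged_at is not None])


def mttr_hours(findings: List[TrackedFinding], severity: str) -> Optional[float]:
    """Mean time to resolve, in hours, over resolved findings of `severity`."""
    return _mean_hours([f.resolved_at - f.created_at for f in findings
                        if f.severity == severity and f.resolved_at is not None])


def sla_adherence(findings: List[TrackedFinding], severity: str,
                  sla: timedelta, now: datetime) -> Optional[float]:
    """Fraction of findings that met `sla`; open findings already past it count as misses."""
    met = missed = 0
    for f in findings:
        if f.severity != severity:
            continue
        if f.resolved_at is not None:
            if f.resolved_at - f.created_at <= sla:
                met += 1
            else:
                missed += 1
        elif now - f.created_at > sla:
            missed += 1
        # Open findings still within their SLA are not counted yet.
    total = met + missed
    return met / total if total else None
```

    Comparing the `sla_adherence` result against the table's targets (above 0.95 for CRITICAL, above 0.90 for HIGH) gives a simple pass/fail line for the monthly report.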

    Building the dashboard: Use Amazon CloudWatch dashboards for real-time operational visibility, or Amazon QuickSight connected to Security Hub findings through Amazon Security Lake for historical trend analysis and executive reporting. The dashboard should be visible to, and regularly viewed by, everyone in the weekly review, not locked away in a security team tool.

    Quarterly reviews

    The quarterly review is a deeper inspection of the system itself: not just the findings, but the machinery that processes them.

    Quarterly review checklist:

    • Suppression rules audit: Review every active suppression rule to determine whether the underlying condition is still present and the suppression is still justified. Document the review outcome for each rule.
    • Disabled controls audit: Review every disabled Security Hub control. Check that the justification is still valid and whether the environment has changed (for example, a service that wasn't in use is now in use).
    • Automation audit: Review the AWS Identity and Access Management (IAM) roles used by remediation functions and verify that they follow least privilege. Review execution logs for any anomalies or failures that weren't caught.
    • New capabilities review: Evaluate GuardDuty protection plans and Security Hub controls released during the quarter. AWS releases new detection and compliance capabilities regularly; if you're not reviewing them quarterly, you're accumulating blind spots.
    • Process effectiveness review: Determine whether the weekly meeting is well attended, whether action items are being completed, and whether SLAs are being met. If attendance, action item completion, or SLA compliance isn't where it should be, explore structural changes to address the gaps.
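    Part of the suppression rules audit can be turned into an automatic reminder. This sketch assumes you keep a justification and a last-reviewed date for each rule yourself (for example, in the running document from the weekly review), and it approximates "quarterly" as 90 days. The `SuppressionRule` record and the sample rule names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional

# ASSUMPTION: "quarterly" approximated as 90 days.
REVIEW_INTERVAL = timedelta(days=90)


@dataclass
class SuppressionRule:
    """Hypothetical record of a suppression rule and its review history."""
    name: str
    justification: str
    last_reviewed: Optional[date]


def rules_needing_review(rules: Iterable[SuppressionRule], today: date) -> List[str]:
    """Names of rules with no documented justification or no review within REVIEW_INTERVAL."""
    overdue = []
    for rule in rules:
        missing_justification = not rule.justification.strip()
        stale = rule.last_reviewed is None or today - rule.last_reviewed > REVIEW_INTERVAL
        if missing_justification or stale:
            overdue.append(rule.name)
    return overdue


if __name__ == "__main__":
    rules = [
        SuppressionRule("scanner-ips", "Approved vulnerability scanner", date(2026, 5, 1)),
        SuppressionRule("old-bastion", "Bastion host access", date(2025, 12, 1)),
    ]
    print(rules_needing_review(rules, today=date(2026, 7, 1)))  # ['old-bastion']
```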

    Operational maturity scoring

    Use this rubric to assess the maturity of your operational rhythm itself. Score each dimension 1–3 and use the total to track progress over time.

    Dimension | Score 1 | Score 2 | Score 3
    Review cadence | One-time reviews when someone remembers | Weekly review happens, but attendance is inconsistent | Weekly review is consistently attended with documented outcomes
    Metrics tracking | No metrics captured | Metrics are collected monthly but not acted on | Metrics drive decisions, and declining trends trigger specific actions
    Finding ownership | Findings sit in a queue with no owner | Findings are assigned to teams, but SLAs aren't tracked | Every finding has an owner, SLAs are tracked, and breaches are escalated
    Automation management | Set-and-forget automations | Automation logs are reviewed periodically | Automation is reviewed weekly, and new candidates are evaluated continuously
    Tuning lifecycle | Suppression rules are created but never reviewed | Annual review of suppressions and disabled controls | Quarterly reviews with documented justification for every rule
    Cross-team engagement | Security team works in isolation | Platform team participates | Engineering teams actively participate and own remediation

    Scoring (revisit quarterly):

    • Beginning: 6–9
    • Established: 10–14
    • Optimized: 15–18
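    The rubric's arithmetic is simple, but encoding it keeps quarterly scoring consistent between reviewers. This illustrative sketch rejects scores outside 1–3, totals the six dimensions (so the total always falls between 6 and 18), and maps the result onto the bands above.

```python
from typing import Dict, Tuple

DIMENSIONS = (
    "Review cadence",
    "Metrics tracking",
    "Finding ownership",
    "Automation management",
    "Tuning lifecycle",
    "Cross-team engagement",
)


def maturity_level(scores: Dict[str, int]) -> Tuple[int, str]:
    """Validate per-dimension scores (1-3), total them, and map the total to a band."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    for dimension in DIMENSIONS:
        if scores[dimension] not in (1, 2, 3):
            raise ValueError(f"{dimension}: score must be 1, 2, or 3")
    total = sum(scores[d] for d in DIMENSIONS)  # always between 6 and 18
    if total <= 9:
        return total, "Beginning"
    if total <= 14:
        return total, "Established"
    return total, "Optimized"


print(maturity_level(dict(zip(DIMENSIONS, [3, 2, 2, 2, 1, 2]))))  # (12, 'Established')
```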

    Deliverable: A documented operational cadence with clear ownership (consider a RACI matrix), metrics dashboards, escalation procedures, and a continuous improvement loop. The cadence should survive team member turnover—if it depends on one person remembering to run it, it’s not yet operational.

    Phase 5: Mature the architecture

    Goal: Fill remaining gaps and build toward a comprehensive security operations capability.

    Estimated timeline: Ongoing. Prioritize based on organizational risk profile and compliance requirements.

    • Amazon Inspector integration: Enable Amazon Inspector for Amazon Elastic Compute Cloud (Amazon EC2) instances, Lambda functions, and Amazon Elastic Container Registry (Amazon ECR) container images. Findings flow into Security Hub automatically, adding vulnerability management alongside threat detection and posture management. Prioritize this if you have Amazon EC2 or container workloads without an existing vulnerability scanning solution.
    • Amazon Macie: Enable Amazon Macie for S3 buckets containing potentially sensitive data. Particularly important for organizations with compliance requirements around personally identifiable information (PII), protected health information (PHI), or Payment Card Industry (PCI) data. Configure automated sensitive data discovery and route findings to Security Hub.
    • Amazon Security Lake: Amazon Security Lake centralizes security-relevant logs in the Open Cybersecurity Schema Framework (OCSF) format for long-term retention, forensic investigation, and threat hunting. This is valuable when you need historical analysis beyond the Security Hub retention window, or when feeding a third-party Security Information and Event Management (SIEM) tool.
    • Preventive controls layer: Convert recurring detective findings into preventive policies. Use service control policies (SCPs) to prevent disabling GuardDuty, Security Hub, and CloudTrail; apply IAM permission boundaries to developer roles; deploy AWS WAF on public endpoints; and use AWS Network Firewall for VPC traffic inspection. The pattern is to make recurring misconfigurations impossible to introduce.
    • Detective controls expansion: Use AWS IAM Access Analyzer for external access and unused access findings, AWS CloudTrail Lake for long-term queryable audit logs, and AWS Config custom rules for organization-specific compliance checks.
    • Incident response readiness: Have incident response playbooks referencing specific GuardDuty finding types, pre-built forensics infrastructure (isolated VPC, forensic AMIs, and pre-configured IAM roles), regular tabletop exercises, and AWS CloudFormation templates to deploy isolation infrastructure on demand. See the AWS Security Incident Response Guide for a comprehensive framework.
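    As one illustration of the preventive layer, the following Python sketch builds a service control policy document that denies a handful of actions used to turn off GuardDuty, Security Hub, and CloudTrail. The action list is deliberately minimal rather than exhaustive, and the exempted role name `SecurityBreakGlass` is hypothetical. Check the actions against the IAM service authorization reference and test in a non-production organizational unit before attaching anything broadly.

```python
import json

# Hypothetical exemption: a break-glass role controlled by the security team.
EXEMPT_ROLE_ARN = "arn:aws:iam::*:role/SecurityBreakGlass"

# Minimal, non-exhaustive set of actions that disable or detach these services.
PROTECTED_ACTIONS = [
    "guardduty:DeleteDetector",
    "guardduty:DisassociateFromAdministratorAccount",
    "securityhub:DisableSecurityHub",
    "securityhub:DisassociateFromAdministratorAccount",
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
]


def build_protect_security_services_scp(exempt_role_arn: str = EXEMPT_ROLE_ARN) -> dict:
    """Return an SCP document denying PROTECTED_ACTIONS to everyone except the exempt role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDisablingSecurityServices",
                "Effect": "Deny",
                "Action": PROTECTED_ACTIONS,
                "Resource": "*",
                "Condition": {"ArnNotLike": {"aws:PrincipalArn": exempt_role_arn}},
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would paste into, or deploy as, an SCP.
    print(json.dumps(build_protect_security_services_scp(), indent=2))
```

    Generating the document from code rather than hand-editing JSON makes it easier to keep the action list in version control alongside the remediation runbooks.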

    Conclusion

    In this post, I provided a six-phase roadmap for operationalizing Security Hub and GuardDuty and showed that operationalization isn't a single project but a progression. Phase 0 and Phase 1 can typically be completed in 3–5 weeks and deliver immediate clarity. Phases 2 and 3 build the response infrastructure that turns findings into action over the following 5–7 weeks. Phase 4 is what makes everything sustainable, and it's where you should invest the most attention. Phase 5 then expands the aperture from Security Hub and GuardDuty into a comprehensive security operations capability.

    If you take away only one thing from this post, run the Phase 0 assessment this week. That single deliverable tells you exactly where to focus next. Use the following self-assessment checklist to identify your current phase, then focus on the next one. A tuned environment with working notifications and a weekly review cadence is dramatically more effective than a fully featured but neglected deployment. Start where you are, reduce the noise, build the habits, and iterate. To learn more, explore the AWS Security Hub User Guide, the Amazon GuardDuty User Guide, and the AWS Security Incident Response Guide. If you've implemented a similar operational cadence, or have questions about any phase, share your experience in the comments.

    Self-assessment checklist

    Phase 0
    • We know how many active GuardDuty findings exist across all accounts
    • We know our current Security Hub compliance score
    • We know whether GuardDuty is enabled in every account and region
    • We know who (if anyone) is reviewing findings today

    Phase 1
    • GuardDuty suppression rules exist for known-benign activity
    • Irrelevant Security Hub controls have been disabled with documented justification
    • All active HIGH and CRITICAL findings have been triaged
    • Security Hub compliance scores reflect actual posture, not noise

    Phase 2
    • HIGH and CRITICAL findings generate real-time notifications to the security team
    • MEDIUM findings automatically create tracked work items
    • Notifications include enriched context (account alias, resource ARN, and console link)

    Phase 3
    • At least three high-confidence finding types trigger automated remediation
    • Auto-remediation actions have full audit trails
    • Remediation runbooks are documented and version-controlled

    Phase 4
    • A weekly security review meeting occurs with defined attendees and agenda
    • MTTA and MTTR are tracked monthly for CRITICAL and HIGH findings
    • Suppression rules and disabled controls are reviewed quarterly
    • Security metrics trend positively over the past 3 months

    Phase 5
    • Amazon Inspector, Macie, or Security Lake are integrated
    • Preventive controls (SCPs, permission boundaries) address recurring findings
    • Incident response playbooks exist and are tested through tabletop exercises
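    If you want to track the self-assessment over time, you could record each phase's answers as a list of booleans, one per checklist item, and treat the first phase that isn't fully satisfied as the one to work on next. This sketch assumes phases are meant to be completed in order, so a gap in an earlier phase takes precedence over progress in later ones; the function name is illustrative.

```python
from typing import Mapping, Optional, Sequence

# Item counts per phase, matching the self-assessment checklist.
ITEMS_PER_PHASE = {0: 4, 1: 4, 2: 3, 3: 3, 4: 4, 5: 3}


def next_phase_to_work_on(answers: Mapping[int, Sequence[bool]]) -> Optional[int]:
    """Return the first phase whose checklist isn't fully satisfied, or None if all are."""
    for phase in sorted(ITEMS_PER_PHASE):
        checks = list(answers.get(phase, []))
        if len(checks) > ITEMS_PER_PHASE[phase]:
            raise ValueError(f"phase {phase} has only {ITEMS_PER_PHASE[phase]} items")
        # Missing answers and unchecked items both leave the phase incomplete.
        if len(checks) < ITEMS_PER_PHASE[phase] or not all(checks):
            return phase
    return None


# Example: Phases 0 and 1 fully met, one Phase 2 item still open.
print(next_phase_to_work_on({0: [True] * 4, 1: [True] * 4, 2: [True, False, True]}))  # 2
```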

    If you have feedback about this post, submit comments in the Comments section below.


    Joseph Sadler

    Joseph is a Senior Solutions Architect on the Worldwide Public Sector team at AWS, specializing in cybersecurity and machine learning. With public and private sector experience, he has expertise in cloud security, artificial intelligence, threat detection, and incident response. His diverse background helps him architect robust, secure solutions that use cutting-edge technologies to safeguard mission-critical systems.
