Baryon Asymmetry

2026-06-24 xkcd.com

Post Syndicated from xkcd.com original https://xkcd.com/3263/

Wait, what do you mean, 'dark matter'? It's not dark, it interacts with high-energy gamma rays ... right? Oh jeez, did I forget to make it interact?

This is why Tesla FSD fails

2026-06-24 digiblur DIY

Post Syndicated from digiblur DIY original https://www.youtube.com/shorts/-X9KCh_l8Ks

What One Message to My Kids Was Really Saying

2026-06-23 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/shorts/_ED0ti2BHyM

The post-quantum EO is an important milestone. Now it’s time to get to work

2026-06-23 Sharon Goldberg

Post Syndicated from Sharon Goldberg original https://blog.cloudflare.com/post-quantum-eo-2026/

On June 22, 2026, President Trump signed Executive Order 14409, “Securing the Nation Against Advanced Cryptographic Attacks.” The order sets a December 31, 2030, deadline for federal agencies to transition their most sensitive systems to post-quantum encryption, and a December 31, 2031, deadline for post-quantum authentication. The EO also directs federal contractors to comply with post-quantum Federal Information Processing Standards (FIPS) by the end of 2030.

We welcome this executive order. The U.S. government has a long track record of using federal leadership and procurement to drive adoption of new technologies across the broader industry. We’ve seen this work with IPv6, with routing security and the Resource Public Key Infrastructure (RPKI), and with DNSSEC, and we’re glad to see this tradition continue with post-quantum cryptography.

The EO is especially important at this moment because the timeline for Q-Day, the day that quantum computers can break the public-key cryptography used across the Internet, has been accelerated. In April 2026, Cloudflare moved our own target for full post-quantum security to 2029, following research breakthroughs from Google and Oratomic. This EO updates guidance from 2024, when the National Institute of Standards and Technology (NIST) stated that the classical public key cryptography used across the Internet (namely RSA and Elliptic Curve Cryptography, which can be broken once powerful quantum computers become available) should be deprecated by 2030 and disallowed by 2035.

The Internet’s transition to post-quantum encryption is well underway, while the transition to post-quantum authentication has only just begun. Today, over two-thirds of browser traffic to Cloudflare’s network is protected with post-quantum encryption, and most of our products support post-quantum key agreement. Our SASE platform, Cloudflare One, provides post-quantum encryption across all major on-ramps and off-ramps, including TLS, MASQUE, and IPsec. We’ve recently started deploying post-quantum authentication and aim to be fully post-quantum secure by 2029. The EO is an excellent foundation and builds on work from the previous two Administrations. We’ve been doing the work the EO is asking federal agencies to do since 2019, we have some thoughts on what the order gets right, we see opportunities for the Office of Management and Budget (OMB) to strengthen and facilitate cost-effective agency migration, and we provide a roadmap for how organizations and agencies can advance their transition most effectively.

The EO’s requirements for federal systems

The bulk of the EO’s binding requirements are aimed at two categories of federal systems: High Value Assets (HVAs) and high impact systems. HVAs are federal information or systems designated by OMB as the government’s crown jewels: systems whose compromise would significantly affect national security, foreign relations, or public confidence. These include databases that hold millions of federal employee records, systems that process classified intelligence, or platforms that manage federal financial transactions. Meanwhile, high impact systems are those where confidentiality, integrity, or availability is rated “high” under FIPS 199, meaning a breach could cause severe harm including loss of life, major financial damage, or significant degradation of an agency’s ability to carry out its mission.

The EO has the power to bind federal agencies, but not other organizations (i.e., critical infrastructure, state, local, tribal and territorial governments, academia, civil society). That’s why the EO only gives these deadlines to federal agencies:

Date	Requirement
July 2026	Each federal agency head identifies a PQC migration lead and provides their name and contact details to OMB and the National Cyber Director.
September 2026	OMB issues guidance requiring each agency to: (1) review their inventory of HVAs and high impact systems; (2) plan for PQC migration; and (3) submit that plan to OMB and the National Cyber Director.
December 2030	All HVAs and high impact systems must be transitioned to PQC for key establishment.
December 2031	All HVAs and high impact systems must be transitioned to PQC for digital signatures.

National Security Systems are explicitly excluded from these deadlines. They are on a separate, classified track managed by the NSA with deadlines between 2030 and 2033 already set in 2022.

Two migrations: encryption and authentication. Both should begin now.

The EO splits the PQC migration into two phases: post-quantum key establishment (encryption) by 2030, and post-quantum digital signatures and certificates (authentication) by 2031. This accurately reflects the availability of post-quantum encryption across the Internet today. Our own deadline for full post-quantum readiness (including authentication) is 2029, but we are amongst the earliest adopters in the industry.

We are also happy to see the EO focusing on NIST-standardized post-quantum cryptographic algorithms and not Quantum Key Distribution (QKD), since QKD does not operate at Internet scale due to its need for specialized hardware and dedicated physical links between sender and receiver.

Now let’s have a deeper look at the two migrations called for and required in the EO: post-quantum encryption and post-quantum authentication.

Post-quantum encryption is needed today to stop harvest-now-decrypt-later attacks, where an adversary collects encrypted traffic today and decrypts it later once quantum computers are powerful enough. Post-quantum encryption is especially valuable for organizations handling data that will still have value to adversaries 3-10 years from now, like government agencies, banks, healthcare organizations, defense contractors, and telecom providers.

Post-quantum authentication stops an adversary that has a quantum computer from forging certificates to impersonate servers, generating malicious code signatures, or gaining unauthorized access to systems. Post-quantum authentication is needed only after Q-Day risk materializes, because it stops attacks that are possible only once a cryptographically-relevant quantum computer (CRQC) exists.

It’s important to put the migration timelines in context with advancements in quantum computing. In addition to yesterday’s EO on post-quantum security, President Trump also signed an EO to accelerate deployment and commercialization of quantum computing, sensing, and networking. The fact that the EO sets a 2031 deadline for post-quantum authentication tells us something important: the U.S. government believes there is a non-negligible chance that a CRQC could be operational around that time.

What about the state of these two technologies? The migration to post-quantum authentication is a bigger challenge than post-quantum encryption for a few reasons, including:

Post-quantum ML-DSA digital signatures are larger than classic digital signatures, which could have an impact on performance of some systems, for instance in short-lived TLS connections. That’s why we are working with Google Chrome on Merkle Tree Certificates to solve the performance problem for TLS.
The dependency chain for post-quantum authentication is longer, requiring coordinated upgrades across clients, servers, certificate authorities, certificate transparency logs, root stores, and browsers.
There is only limited ecosystem deployment of post-quantum authentication so far, as compared to the much broader deployment of post-quantum encryption.

It is interesting that the EO sets a one-year gap between the encryption and authentication deadlines. One extra year of calendar time is tight, so this work cannot proceed sequentially. The ecosystem needs to start working on both of these targets concurrently, or we will miss this 2031 deadline.

Cryptographic deployment across the Internet cannot happen without standards developed by the Internet Engineering Task Force (IETF). They are working to transition their protocols to post-quantum cryptography. The TLS community is ahead, with the IETF PLANTS working group making good progress on post-quantum certificates for TLS. There is much work to do here and we look forward to supporting the IETF in its efforts.

Supply chain pressure that helps everyone

The EO includes requirements for federal contractors, which may turn out to be the most impactful part of the EO.

Namely, the FAR Council must publish proposed rules requiring “covered contractors” to comply with NIST FIPS incorporating PQC algorithms by December 31, 2030 (Sec. 6(c)). The FAR Council must also publish proposed rules requiring contractors to implement vulnerability disclosure programs that cover cryptographic vulnerabilities (Sec. 6(d)). These proposed rules need to go through notice-and-comment rulemaking, but the EO has a December 31, 2030 target which is still important. This deadline is one year earlier than federal agencies are required to complete their post-quantum authentication migration, so that federal contractors will be ready before agencies hit their own deadlines.

Federal agencies can only migrate to PQC if the products they buy support PQC. To put this into practice, CISA released its Product Categories for Technologies That Use Post-Quantum Cryptography Standards, drawing a clear line between technologies where PQC is already “widely available” versus those still “transitioning.” The “widely available” list includes cloud platforms (IaaS, PaaS), web browsers and servers, chat and messaging software, and endpoint security products like full disk encryption. For these categories, CISA’s guidance is clear: organizations should procure only PQC-capable products. The “transitioning” list, where PQC is not yet widely available, includes networking hardware (routers, firewalls, switches), identity and access management systems (HSMs, certificate authorities, identity providers), email servers and clients, and database systems.

By telling contractors their products must be PQC-compliant by 2030, and directing agencies to immediately favor PQC-capable vendors in mature markets, the federal framework forces the vendor ecosystem to ship PQC-capable products on a fixed timeline. Products that vendors build to federal requirements will end up used by hospitals, banks, universities, and small businesses, which makes PQC support more broadly available. Cloudflare is among the many vendors subject to these requirements, and because networking software and cloud services are already designated by CISA as widely available PQC categories, we’ve already shipped post-quantum encryption across most of our products at no extra cost.

Critical infrastructure and PQ for everyone

The EO also speaks to critical infrastructure: energy, financial services, water, transportation, telecommunications, healthcare, and other systems whose failure would have a serious or significant impact on the country. While the EO has no hard migration deadline for critical infrastructure owners and operators, the EO directs certain federal agencies to “assist” critical infrastructure owners and operators with their PQC migration plans (Sec. 5(a)).

While the EO focuses mostly on federal agencies and critical infrastructure in the U.S., post-quantum cryptography is important to every Internet-connected individual and organization. Harvest-now-decrypt-later attacks are a risk today. And after Q-Day, the risk of unauthorized access by an adversary armed with a quantum computer will impact any organization, big or small. When we launched free universal SSL in 2014, our CEO Matthew Prince wrote:

Having cutting-edge encryption may not seem important to a small blog, but it is critical to advancing the encrypted-by-default future of the Internet. Every byte, however seemingly mundane, that flows encrypted across the Internet makes it more difficult for those who wish to intercept, throttle, or censor the web.

We feel the same way about post-quantum cryptography. That’s why every post-quantum upgrade we build is available to all customers, on every plan, at no additional cost.

Opportunities for OMB’s implementation guidance

The EO sets the direction, and now OMB has 90 days to provide important clarifications and operational guidance to achieve the most effective PQC migration across federal agencies (Sec. 4(b)). Based on what we’ve learned from our own PQC migration, here are a few elements that we suggest that guidance should include:

Define what it means to “transition.” The EO requires agencies to “transition” their systems to PQC, but it never defines what “transition” means. Does it mean the system supports PQC algorithms? That it prefers them? Or that classical cryptography has been disabled entirely?

These are very different security postures. A system that supports ML-KEM but still allows a classical-only TLS handshake is vulnerable to downgrade attacks. An adversary capable of intercepting traffic could force the connection back to classical key exchange. The system would have “transitioned” to PQC in name, but still be vulnerable to the same quantum attacks the order is trying to prevent.

History is instructive. When SSLv3 was deprecated after the POODLE attack in 2014, servers kept SSLv3 enabled for backwards compatibility, allowing attackers to force connections to downgrade and then exploit SSLv3’s weaknesses. It took years for the ecosystem to actually turn SSLv3 off. To avoid repeating this pattern, we need a clear definition of “done” that includes disabling quantum-vulnerable cryptography to prevent downgrades.

Crypto agility: Crypto agility is the ability to swap cryptographic algorithms without re-architecting your systems. The EO mandates migrating to specific NIST crypto standards, but says nothing about building systems that can swap cryptographic algorithms if these algorithms need to change in the future. Crypto agility doesn’t mean supporting every algorithm at once. It means building systems so that when the community converges on a better algorithm in the future, the upgrade is a configuration change, not a re-architecture. The OMB should include this in its guidance.

CBOM or quantum impact inventory? The EO directs CISA and NIST to publish guidance on the minimum elements for a cryptographic bill of materials (CBOM) within 270 days (Sec. 5(d)). A CBOM is an inventory of the cryptographic algorithms, protocols, and implementations used in a given hardware or software product, similar to a software bill of materials (SBOM).

In theory, CBOMs are a good idea. In practice, we’d caution against treating exhaustive cryptographic inventories as a prerequisite for action. A detailed CBOM of every algorithm in every library in every product takes a long time to produce, it can take federal agencies an entire procurement cycle of discovery tooling and consulting, and it potentially becomes stale by the time the inventory is complete. Also, a CBOM doesn’t list systems that should be using cryptography but are not. And a CBOM lists keys without an understanding of their purpose, making them less useful for organizations trying to understand the risk associated with a quantum-vulnerable key.

We think that a quantum impact inventory is a more productive framing. What would be the impact if the system or its data is compromised? How likely is that to happen? What measures can be taken to mitigate the risk, whether a drop-in replacement, a software update, or a compensating control like tunneling traffic over bulk post-quantum connection or isolating it from the Internet? How feasible is each option and what dependency chain does it create? Identifying these informs where to take action first. You can fill in the details of a full CBOM over time if that makes sense for your organization, but you should start by discovering your most exposed and impactful systems.

Making post-quantum cryptography affordable to all. True national resilience fails if post-quantum cryptography is treated as a gated luxury rather than a universal baseline. OMB policy must resist vendor lock-in or toll booths that leave underfunded critical infrastructure behind or increase technical debt at federal agencies.

What to do now: don’t wait for 2030

You do not have to wait for 2030 or an exhaustive cryptographic inventory to start your migration. History has shown that updating cryptography is hard and can take a long time; other organizations should start sorting out their migrations as well. So as we wait for OMB guidance for federal agencies, here’s what we recommend for all organizations:

Protect your Internet traffic now. Start with traffic that crosses the public Internet, because that is the easiest for adversaries to harvest now and the most immediately at risk. If your web traffic flows through Cloudflare, your connections are largely protected with post-quantum encryption. If your enterprise network uses Cloudflare One, your private network traffic is also protected. If your provider doesn’t support post-quantum encryption, switch to one that does. Even if the individual applications running inside your network haven’t been upgraded yet, start tunneling your traffic through post-quantum encrypted infrastructure to protect it in bulk, even if individual systems are not yet inventoried and upgraded.

Update procurement. Make “post-quantum encryption by default, at no additional cost, with a clear roadmap for post-quantum authentication and crypto agility” a requirement in every technology procurement. If your vendor charges extra for post-quantum security or doesn’t have a roadmap or plan, ask why or find another vendor.

Quantum impact inventory. For traffic that stays inside your private network perimeter and is not exposed to the public Internet, the harvest-now-decrypt-later risk is lower because an adversary would need to be on your network to capture it. But you still need to know what cryptography your internal systems use, so you can plan your migration. Use a quantum impact inventory as a tool to prioritize your efforts, for example focusing on systems or connections that handle sensitive data or are exposed on the public Internet.

Plan for authentication now. The 2031 deadline for post-quantum authentication will come faster than you think. Start identifying your long-lived keys, root certificates, and code-signing infrastructure. These are the highest-priority targets for a quantum attacker, and they have the longest dependency chains to upgrade. Now is a great time to update your software libraries and automate certificate provisioning even if post-quantum certificates are not yet available in your ecosystem. And make sure your vendors are planning to be ready for the looming post-quantum authentication deadline.

Aligning policy and international standards

At the same time, work should also start now on aligning global government policy with international standards. We were glad to see that Section 5(b) directs the State Department to engage foreign governments and industry groups to encourage adoption of NIST-standardized PQC algorithms.

Here’s why this matters. Cryptography migrations cannot be run in a vacuum, with each country operating within its own borders. A TLS connection between a U.S. person and a server abroad only works if both ends negotiate the same cryptography. NIST has been running open international cryptographic competitions for decades. The AES competition (1997-2001) produced the encryption standard used across the Internet today, selecting a cipher designed by Belgian cryptographers. The SHA-3 competition (2007-2012) produced the latest hash standard, selecting an algorithm designed by a Belgian-Italian team. The PQC competition (2016-2024) followed the same open model: anyone could submit, anyone could analyze, and the winning algorithms were designed by international teams. ML-KEM, the key agreement standard now being deployed across the Internet, was created largely by European cryptographers. These are open, internationally vetted algorithms. NIST organized the competitions, but the results belong to the global cryptographic community.

The risk ahead is fragmentation. If different jurisdictions mandate different algorithms, the result is cipher bloat and increased attack surface: more code to write, test, and audit, more surface for downgrade attacks, and slower deployment for everyone. We’ve seen this happen firsthand in IPsec, where the lack of an interoperable standard led vendors to ship proprietary PQ key agreement algorithms that couldn’t interoperate, delaying the migration by years. The TLS community went the opposite way, converging on a single hybrid key agreement (X25519MLKEM768), and deployment followed quickly.

We are big fans of NIST, and especially its leadership in vetting standards globally and standardizing cryptography worldwide. We encourage the Trump Administration to work with Congress to ensure that NIST has appropriate resources, staffing, and tooling to meet current and emerging deliverables in this EO and others, like America’s AI Action Plan.

We’d like to see State Department-led engagement drive real alignment: adoption of the same NIST algorithms across allied nations, alignment on timelines, and mutual recognition of cryptographic algorithms and modules. The Internet is one network, and its cryptography should be one standard.

Speeding up CMVP

As a final note, the EO directs NIST to revise the processes used by the Cryptographic Module Validation Program (CMVP) to accelerate validations of cryptographic modules (Sec. 6(b)). Having bumped up against the CMVP program for years, we are extremely happy to see this in the order.

CMVP exists for a good reason. Federal agencies and their contractors need a way to verify that the cryptography inside a product actually does what it claims: that AES is implemented correctly or that random number generators have enough entropy. CMVP has been tuned for a steady state where cryptography doesn’t change much.

Going forward, CMVP needs to be adjusted to accept the realities of the impending migration. We welcome the FedRAMP update stream that allows updated modules to be used immediately before final validation. This allows faster adoption of post-quantum cryptography, and correction of implementation errors that were missed in validation. Similar allowances for CMVP are essential.

Go forth and PQ all the things

This post-quantum EO is a meaningful step. It sets real deadlines and creates supply chain pressure that will accelerate adoption across the industry.

For organizations starting their own migration, we suggest you start by protecting your public Internet traffic along with updates to your procurement requirements, followed by a quantum impact inventory to figure out where to focus next. Do not let cryptography inventory slow you down from deploying post-quantum encryption across your most sensitive systems immediately.

Cryptographic deployment across the Internet depends on standards developed by the IETF. The TLS community is further along, but there is lots more work to do across other protocol communities, and we look forward to supporting those efforts.

Let us go forth and PQ all the things, quickly and together. Free TLS helped encrypt the web. Free post-quantum cryptography will help secure it for what comes next.

You can get started now on Cloudflare by visiting our PQC page.

Why SIEM is Moving Toward Unified Security Operations: Rapid7 Named a Major Player in IDC MarketScape

2026-06-23 Rapid7

Post Syndicated from Rapid7 original https://www.rapid7.com/blog/post/dr-siem-moving-toward-unified-security-operations-rapid7-named-idc-marketscape-major-player

Rapid7 has been named a Major Player in the IDC MarketScape: Worldwide SIEM 2026 Vendor Assessment (#US54126826, June 2026).

This is the first IDC SIEM MarketScape to bring the enterprise and SMB markets into a single evaluation, and we believe it arrives at a time when the way teams buy and run a SOC is changing quickly. Security teams are no longer evaluating detection and response in isolation. They want their threat data, automation, and view of the attack surface working together, rather than spread across a stack of disconnected tools.

We believe Incident Command reflects that shift by bringing threat data, automation, and attack surface context into one platform instead of leaving teams to work across disconnected tools. It also speaks to a broader change in security operations, where context matters more, speed matters more, and teams need a clearer path from alert to action. That same direction runs through Rapid7’s wider point of view on preemptive security: exposure, detection, and response work better when they inform each other through shared context, AI, and human expertise.

Incident Command brings detection, response, and exposure context together

Incident Command brings SIEM, SOAR, attack surface management, and threat intelligence together on a shared data model. That gives analysts access to asset risk, vulnerability data, and exposure context during an investigation, so they can understand whether a detection affects a high-risk, internet-facing asset without having to jump between separate products.

According to the IDC MarketScape, “Incident Command is a strong fit for midmarket to enterprise organizations that want a fully integrated security operations platform with predictable costs.”

The teams we talk to are tired of stitching tools together and dealing with surprise ingestion bills. They want fewer blind spots, faster investigations, and a clearer answer to what is urgent and what to do next. Incident Command addresses that by bringing exposure context, threat intelligence, and response automation into the SIEM workflow, helping teams investigate faster and act with more clarity. For organizations looking for additional managed coverage, Rapid7 MDR is available as a separate offering. As attacks move faster and environments become harder to manage, security operations work better when exposure, threat, and response data are connected through an open platform that gives teams the context they need to move with more speed and clarity.

AI and automation, pressure-tested by a global SOC

Many vendors talk about AI in the SOC. For customers, the more important question is how those capabilities are developed, tested, and refined so they are useful in real investigations rather than just sounding good in a product story. We believe the IDC MarketScape called out what that means in Rapid7’s case:

“AI models and automation capabilities are tested in the MDR SOC before release to product customers, providing a feedback loop between managed service outcomes and product development that organizations without their own MDR equivalent cannot replicate.”

Our MDR analysts work real incidents across thousands of customer environments every day. The detections, triage models, and automation that come out of that work are tested against live attacks before they reach product customers. That feedback loop helps make the AI Engine more useful in practice by handling repetitive work such as classifying alerts, compiling evidence, and surfacing next steps, while analysts spend their time on the decisions that actually require human judgment. That balance also reflects Rapid7’s broader platform story: AI-powered, backed by human expertise.

What we believe this IDC MarketScape recognition says about the future of SIEM

The 2026 IDC MarketScape is a useful signal of where the market is heading. Organizations are looking for platforms where exposure and detection inform each other instead of living in separate systems, and where AI helps teams move faster without removing the human judgment needed to make the right call. We believe that is very much in line with the platform Rapid7 has been building through Incident Command and the wider Command Platform story. We’ll continue investing in the AI Engine, deeper attack surface context, and the integrations customers rely on. The goal remains straightforward: help defenders move faster to keep their environment safe, investigate with more context, and respond with machine speed and confidence.

Want to see Incident Command in action? Request a demo or explore the packages built to meet your team where it is.

Prime Day’s BEST Deals – Price Tracked and Tested!

2026-06-23 The Hook Up

Post Syndicated from The Hook Up original https://www.youtube.com/watch?v=Ckqq3bDtUsA

Feral Hogs #lastweektonight

2026-06-23 LastWeekTonight

Post Syndicated from LastWeekTonight original https://www.youtube.com/shorts/fJsPS28_-gQ

Autonomous troubleshooting for Medallion Architecture with AWS DevOps Agent and Apache Spark Troubleshooting Agent

2026-06-23 Mohammad Sabeel

Post Syndicated from Mohammad Sabeel original https://aws.amazon.com/blogs/big-data/autonomous-troubleshooting-for-medallion-architecture-with-aws-devops-agent-and-apache-spark-troubleshooting-agent/

Every minute of data processing pipeline downtime delays business decisions, stalls downstream analytics, drives revenue loss, and erodes stakeholder confidence. Teams that run Medallion Architecture pipelines—a common data lakehouse pattern where data flows through bronze, silver, and gold layers with increasing quality—face cascading failures that impact revenue-critical reporting and machine learning workloads. As you scale these multi-stage pipelines with Amazon Managed Workflows for Apache Airflow (MWAA), AWS Glue, and Amazon Redshift, troubleshooting failures becomes increasingly complex. When a mission-critical job fails, an engineer must sift through gigabytes of logs across interconnected systems. This means spending hours on incident investigations, examining execution timelines and resource metrics, and cross-referencing findings with Amazon CloudWatch and recent deployment changes to find the root cause. This requires deep familiarity with the underlying technologies, expertise not every team member has. When the right engineer is unavailable during off-hours, pipeline downtime extends and downstream consumers wait. The cycle of detect, investigate, fix, and repeat is costly and entirely reactive. A proactive operational model moves issue identification upstream, catching and addressing problems before they disrupt your data pipelines.

In this post, we show you how to diagnose multi-layer Medallion Architecture pipeline failures in minutes using AWS DevOps Agent with Apache Spark Troubleshooting Agent integrated as an MCP server.

What is AWS DevOps Agent and Apache Spark Troubleshooting Agent?

AWS DevOps Agent is an autonomous investigation agent powered by AI that automatically diagnoses operational issues across your AWS environment. When a failure occurs, the agent independently gathers evidence from logs, metrics, and configurations across interconnected services, identifies the root cause, and delivers actionable remediation steps, all without human intervention. It integrates with your existing workflows through webhooks and delivers findings directly to communication channels like Slack. With AWS DevOps Agent, you can replace the reactive cycle of detect, investigate, fix, and repeat with autonomous, proactive troubleshooting. The agent acts as your always-on, on-call engineer, starting its investigation the moment a failure occurs, whether during business hours or in the middle of the night.

Apache Spark Troubleshooting Agent is an AI-powered, fully managed Model Context Protocol (MCP) server that data engineers can use to diagnose Spark application failures across Amazon EMR, AWS Glue, and Amazon SageMaker AI Notebooks using natural language. It automatically correlates Spark History Server data, distributed executor logs, and configuration patterns to identify root causes and deliver actionable recommendations. This removes hours of manual investigation across multiple consoles and log files.

Use case

The following sections walk through a common Medallion Architecture failure scenario and show how autonomous troubleshooting resolves it.

The scenario

Consider this scenario: a gold layer AWS Glue job fails with “Missing data for not-null field.” The logs don’t reveal the actual problem. The root cause is a subtle data quality issue introduced upstream in the silver layer, a job that succeeded without errors. Without autonomous troubleshooting, you would manually trace data lineage across Amazon Simple Storage Service (Amazon S3), Amazon Redshift, and multiple AWS Glue job logs to find the source.

The solution

When integrated with the Apache Spark Troubleshooting Agent, AWS DevOps Agent identifies the gold layer Amazon Redshift write failure, traces it back to silver layer data corruption, and provides detailed root causes and actionable recommendations. The investigation typically completes within 3 to 5 minutes.

Solution overview

The following diagram shows the Medallion Architecture data flow across bronze, silver, and gold layers.

Medallion Architecture data flow showing the bronze layer in Amazon S3, the silver layer in Amazon S3 and Amazon Redshift, and the gold layer in Amazon Redshift, with Amazon MWAA orchestrating AWS Glue jobs and AWS DevOps Agent investigating failures

The architecture flow includes the following steps:

Amazon MWAA triggers the Medallion pipeline directed acyclic graph (DAG), orchestrating three AWS Glue jobs sequentially: bronze layer, silver layer, and gold layer.
The bronze layer job generates 50,000 synthetic ecommerce order records and writes raw Parquet files to Amazon S3.
The silver layer job reads bronze data from Amazon S3, applies transformations, and writes the results to two destinations in parallel: Amazon S3, and Amazon Redshift (filtered, cleaned, and augmented data in the silver_ecommerce table). This job silently introduces data corruption in approximately 8 percent of total_amount values.
The gold layer job reads from the Amazon Redshift silver_ecommerce table, performs aggregation, and attempts to write business-level aggregates back to the Amazon Redshift gold_ecommerce_summary table. If upstream data corruption introduces NULL values, this job fails with “Missing data for not-null field” because those NULL values violate the NOT NULL constraint.
When the gold layer job enters a FAILED state, Amazon EventBridge captures the AWS Glue Job State Change event and invokes an AWS Lambda function. The Lambda function retrieves webhook credentials from AWS Secrets Manager, constructs an HMAC-signed event payload containing the job name, run ID, and error details, and sends it to AWS DevOps Agent.
AWS DevOps Agent receives the HTTP POST request to the webhook and starts an autonomous investigation. It authenticates with Amazon Cognito using the OAuth 2.0 client credentials flow, then sends an MCP request through Amazon Bedrock AgentCore Gateway. The AgentCore Gateway invokes a Signature Version 4 (SigV4) Proxy Lambda, which signs the request and forwards it to the Apache Spark Troubleshooting Agent MCP Server. The MCP Server analyzes Spark event logs, executor metrics, and error stack traces for the failed gold job.
AWS DevOps Agent delivers the investigation to your configured Slack channel. The delivery includes root cause analysis, upstream data lineage back to the silver layer corruption, and step-by-step remediation recommendations.

Walkthrough

In the following sections, you deploy a three-layer Medallion Architecture pipeline that processes ecommerce order data. Complete the steps to get started with autonomous troubleshooting using AWS DevOps Agent.

Prerequisites

Before you begin, verify that you have the following:

An AWS account. Your AWS Identity and Access Management (IAM) user or role must have the following permissions:
- iam:CreateRole, iam:AttachRolePolicy, iam:PutRolePolicy
- lambda:CreateFunction, lambda:AddPermission
- glue:CreateJob, glue:StartJobRun
- redshift:CreateCluster, redshift:GetClusterCredentials
- airflow:CreateEnvironment
- events:PutRule, events:PutTargets
- sqs:CreateQueue
- secretsmanager:CreateSecret
- kms:CreateKey
- ec2:CreateVpc, ec2:CreateSubnet, ec2:CreateSecurityGroup
- cloudformation:CreateStack, cloudformation:DescribeStacks
- Alternatively, you can use the AdministratorAccess managed policy for simplicity in a dev/test environment.
AWS Command Line Interface (AWS CLI) version 2.30.0 or later, installed and configured with appropriate credentials.
(Optional) A Slack workspace if you want investigation results delivered to a channel.

Set up AWS DevOps Agent

In this section, you configure AWS DevOps Agent to receive and investigate pipeline failure events. This involves three tasks: creating an Agent Space (your investigation workspace), optionally connecting a Slack channel for notifications, and generating a webhook endpoint that your pipeline uses to send failure alerts to the agent.

Create an Agent Space

Open the AWS DevOps Agent console.
Choose Create Agent Space.
Enter a name (for example, medallion-troubleshooting).
Choose Create.

Connect Slack integration (optional)

If you use Slack for internal communication, you can configure it to receive investigation results.

In the AWS DevOps Agent console, go to Agent Spaces, select medallion-troubleshooting and then Communications.
Choose Add integration and choose Slack.
Choose Next to allow AWS DevOps Agent to access your Slack workspace, and choose Allow.
Provide the Slack workspace and the Channel ID where you want investigation results delivered, then choose Next.
Enter the following command in your channel chat to complete the integration: /invite @AWS DevOps Agent.
- While running this command, when prompted, choose the correct region where the Agent Space is provisioned.

Create a webhook

In your Agent Space, go to Webhooks.
Choose Add webhook and choose Next on the two following pages.
Choose Generate URL and secret key, and give the webhook a name (for example, medallion-failure-webhook).
After creation, copy and save the Webhook URL (HTTPS endpoint) and Secret Key. You can also choose Download .csv to save this information to a secure location. Select the checkbox labeled I’ve saved and stored my URL and secret key, then choose Add.

Note the Webhook URL and Secret Key for later. You provide them as parameters when you create the AWS CloudFormation stack.

Deploy the AWS CloudFormation stack

The AWS CloudFormation template deploys the full Medallion Architecture pipeline. This includes an Amazon Virtual Private Cloud (Amazon VPC) with private subnets, an Amazon Redshift cluster (ra3.xlplus, single-node), and three AWS Glue jobs. It also creates an Amazon MWAA environment, Amazon EventBridge rules, AWS Lambda functions, and an AgentCore Gateway with Amazon Cognito OAuth authentication.

You can deploy the stack using one of two methods. Use Option A if you prefer a visual, guided experience through the AWS Management Console. Use Option B if you prefer working from the command line or need to integrate the deployment into a script or automation workflow.

Before you start, download the CloudFormation template from GitHub.

Option A: AWS Management Console (recommended)

Open the AWS CloudFormation console and choose Create stack → With existing resources (import resources) or Upload a template file.
Choose Choose file, select the downloaded blog-medallion-stack.yaml, then choose Next.
For Stack name, enter medallion-troubleshooting.
Fill in the parameters:
- For WebhookUrl, enter your AWS DevOps Agent webhook URL (from Agent Space settings).
- For WebhookSecret, enter the webhook secret for authentication.
Choose Next, select I acknowledge that AWS CloudFormation might create IAM resources with custom names, then choose Submit.

Option B: AWS CLI

aws cloudformation create-stack \
    --stack-name medallion-troubleshooting \
    --template-body file://blog-medallion-stack.yaml \
    --parameters \
        ParameterKey=WebhookUrl,ParameterValue=<YOUR-WEBHOOK-URL> \
        ParameterKey=WebhookSecret,ParameterValue=<YOUR-WEBHOOK-SECRET> \
    --capabilities <CAPABILITY_NAMED_IAM> \
    --region <YOUR-REGION>

Replace the placeholder values:

YOUR-WEBHOOK-URL – Your AWS DevOps Agent webhook URL (from Agent Space settings).
YOUR-WEBHOOK-SECRET – The webhook secret for authentication.
YOUR-REGION – The AWS Region.

Wait for the stack status to show CREATE_COMPLETE. In our testing, this took approximately 30–40 minutes.

Retrieve Amazon Cognito client credentials

After the stack is deployed, it creates an Amazon Cognito user pool with an OAuth 2.0 client for AWS DevOps Agent authentication. Retrieve the client secret using the command below. The --user-pool-id and CognitoClientId needs to be copied from the stack outputs.

aws cognito-idp describe-user-pool-client \
    --user-pool-id <UserPoolId-from-outputs> \
    --client-id <CognitoClientId-from-outputs> \
    --query UserPoolClient.ClientSecret \
    --output text --region <YOUR-REGION>

Replace YOUR-REGION with the actual AWS Region value, and save this value for the MCP Server registration in the following step.

Register the Spark Troubleshooting MCP Server

The Spark Troubleshooting MCP Server gives AWS DevOps Agent the ability to analyze Apache Spark event logs, executor metrics, and error stack traces from your AWS Glue jobs. By registering this server, you connect the agent to the diagnostic tooling it needs to autonomously investigate pipeline failures.

To register the MCP Server in AWS DevOps Agent, complete the following steps:

In the AWS DevOps Agent console, go to Agent Spaces, select medallion-troubleshooting and then Capabilities.
In the MCP Servers section, choose Add or Add Source.
Find New MCP Server Registration and choose Register.
For Name, enter sparkagent.
For Endpoint URL, enter the AgentCoreGatewayUrl value from the stack outputs.
For Description, enter Apache Spark Troubleshooting MCP Server via AgentCore Gateway.
Leave Enable Dynamic Client Registration cleared.
Leave Connect to endpoint using a private connection cleared, then choose Next.
Under Authorization Flow, select OAuth Client Credentials, and choose Next.
For Client ID, enter the CognitoClientId value from the stack outputs.
For Client Secret, enter the value you retrieved in the preceding step.
For Exchange URL, enter the CognitoTokenEndpoint value from the stack outputs.
For Add Scope, enter <stack-name>-mcp-proxy/invoke. For example, medallion-troubleshooting-mcp-proxy/invoke.
Choose Next, review your configuration, and choose Add.
Once you choose Add, on the following screen, click on the checkbox next to the spark___analyze_spark_workload. This is the root cause analysis tool which provides detailed troubleshooting for failed Apache Spark workloads.
Choose Save as a last step. You will see the MCP Server associated successfully message on the top.

See AWS DevOps Agent in action

Now that you have completed the prerequisites, you can see AWS DevOps Agent in action. Go to the Amazon MWAA Airflow Environments UI and click on Open Airflow UI under Airflow UI. It will open in a new browser tab. In the Airflow console, locate and manually trigger the medallion_architecture_pipeline DAG.

Amazon MWAA Airflow console showing the medallion_architecture_pipeline DAG with the Trigger DAG action selected

Amazon MWAA Airflow UI showing the medallion_architecture_pipeline DAG with bronze, silver, and gold tasks listed sequentially

The DAG runs three AWS Glue jobs sequentially:

Bronze layer – This job generates 50,000 ecommerce order records and writes them to Amazon S3 as Parquet files.
Silver layer – This job applies transformations and loads the results to both Amazon S3 and Amazon Redshift. It also silently injects approximately 8 percent of total_amount values with $ prefix strings, introducing hidden data corruption.
Gold layer – This job reads from Amazon Redshift, casts total_amount to numeric (producing NULL values for the $-prefixed strings), and attempts to write aggregated results to the Amazon Redshift target table. It fails because the NULL values violate the NOT NULL constraint on revenue_total.

Amazon MWAA DAG run showing the bronze task succeeded, the silver task succeeded, and the gold task failed

With the components deployed and connected, the autonomous troubleshooting pipeline is ready to respond to failures. In this walkthrough, the silver layer job deliberately introduces data corruption to simulate a real-world data quality issue. This causes the gold layer job to fail, giving you the opportunity to see how AWS DevOps Agent responds.

As soon as the gold layer job fails, AWS DevOps Agent starts an autonomous investigation and uses the Apache Spark Troubleshooting MCP Server where needed.

Go to the AWS DevOps Management console and choose the medallion-troubleshooting under Agent Spaces. Next, select the Operator Access button. This will redirect you to Operator Console where you will see that the incident investigation automatically started in 1-2 minutes post Gold layer job failure.

After the investigation completes, AWS DevOps Agent presents its findings within the incident analysis. The results are organized into two sections.

Root cause identified by AWS DevOps Agent

The agent identifies the underlying cause of the failure, tracing the gold layer write error back to data corruption introduced in the upstream silver layer AWS Glue job.

Root cause analysis from AWS DevOps Agent showing the gold layer write error traced back to silver layer data corruption

Mitigation plan generated by AWS DevOps Agent

On choosing Generate Mitigation Plan, the agent provides step-by-step remediation recommendations to resolve the issue and prevent recurrence.

Mitigation plan from AWS DevOps Agent listing remediation steps to fix the silver layer data corruption and prevent recurrence

AWS DevOps Agent sends a notification to Slack

Slack channel showing the AWS DevOps Agent investigation summary with root cause identification and upstream data lineage trace

Typically, within 3–5 minutes, the agent delivers a detailed investigation in Slack that includes root cause identification, upstream data lineage tracking, and an actionable recommendation.

You have deployed an autonomous troubleshooting pipeline for Medallion Architecture data pipelines. The pipeline runs using AWS Glue, Amazon Redshift, and Amazon MWAA, with AWS DevOps Agent providing autonomous investigation. The agent traced a gold layer Amazon Redshift write failure back to a silver layer data quality issue. This type of diagnosis would typically require hours of manual investigation by an engineer with deep expertise in Apache Spark, Amazon Redshift, and data pipeline architecture. AWS DevOps Agent completed it autonomously within minutes.

If you need human assistance, you can use the Ask for human support feature within AWS DevOps Agent to open a case with AWS Support, automatically populated with relevant investigation context.

Enhanced investigations with AWS DevOps Agent Skills

AWS DevOps Agent autonomously investigates failures out of the box. You can enhance its diagnostic depth using Skills, a feature that provides the agent with domain-specific guidance tailored to your environment.

For Medallion Architecture pipelines, you can create Skills that instruct the agent to check for data type mismatches between pipeline layers when Amazon Redshift COPY errors occur, cross-reference silver layer data quality metrics with gold layer aggregation failures, or follow your internal runbook for escalating data quality issues to the upstream data engineering team.

To configure Skills, go to your Agent Space in the AWS DevOps Agent console and choose the Skills tab.

Clean up

To avoid incurring future charges, delete the resources you created during this walkthrough promptly after you finish testing.

To clean up resources, complete the following steps:

Deregister the MCP Server. In the AWS DevOps Agent console, go to your Agent Space and choose the Capabilities tab. In the MCP Servers section, choose the sparkagent server, then choose Deregister.
Delete the webhook. In your Agent Space, go to the Webhooks tab. Choose the medallion-failure-webhook, then choose Delete.
Empty the Amazon S3 buckets. Open the Amazon S3 console. Locate the buckets created by the stack (their names start with medallion-troubleshooting). For each bucket, choose Empty, enter permanently delete to confirm, and choose Empty.
Delete the AWS CloudFormation stack. Open the AWS CloudFormation console. Choose the medallion-troubleshooting stack, then choose Delete. Alternatively, run the following command:

aws cloudformation delete-stack \
    --stack-name medallion-troubleshooting \
    --region <your-region>

Wait for the stack deletion to complete.

Delete any retained Amazon S3 buckets. Some Amazon S3 buckets might have a DeletionPolicy of Retain and aren’t automatically deleted with the stack. Return to the Amazon S3 console, locate any remaining buckets created by the stack, empty them using the process in the preceding step, and then choose Delete for each bucket.

Conclusion

In this post, you deployed an autonomous troubleshooting pipeline for Medallion Architecture data pipelines using AWS Glue, Amazon Redshift, Amazon MWAA, and AWS DevOps Agent. The agent traced a gold layer Amazon Redshift write failure back to a silver layer data quality issue—a diagnosis that would typically require hours of manual investigation by an engineer with deep expertise across multiple services.

As your data pipelines grow in complexity, so does the challenge of diagnosing failures that span multiple layers and services. AWS DevOps Agent reduces your mean time to resolution by autonomously investigating incidents the moment they occur, whether during business hours or at 2 AM. Your on-call engineers spend less time sifting through logs and more time building reliable data infrastructure. By shifting from reactive firefighting to autonomous, proactive troubleshooting, you can improve pipeline reliability, protect downstream analytics and machine learning workloads, and maintain stakeholder confidence in your data platform.

To learn how to structure Agent Spaces for investigation accuracy, scope resource access, and use infrastructure as code to streamline deployment, see Best practices for deploying AWS DevOps Agent in production. To learn how to evaluate and choose the right lakehouse pattern for your needs, see Navigating architectural choices for a lakehouse using Amazon SageMaker. For more about Apache Spark Troubleshooting Agent, see Introducing the Apache Spark Troubleshooting Agent for Amazon EMR and AWS Glue.

Next steps

Now that you have set up autonomous troubleshooting for your Medallion Architecture pipeline, consider exploring the following:

Escalate to AWS Support directly from an investigation. If the agent’s findings require human assistance, you can use the Ask for human support feature within AWS DevOps Agent. This opens a case with AWS Support that is automatically populated with the relevant investigation context, which reduces the time spent describing the issue. For more information, see Getting help from AWS Support through AWS DevOps Agent.
Enhance investigations with Skills. Create custom Skills to give the agent domain-specific guidance tailored to your environment.
Learn more about AWS DevOps Agent best practices, choosing the right lakehouse pattern, and the Apache Spark Troubleshooting Agent.
Optimize your AWS DevOps Agent deployment. Learn how to structure Agent Spaces for investigation accuracy, scope resource access, and use infrastructure as code to streamline deployment. See Best practices for deploying AWS DevOps Agent in production.
Choose the right lakehouse architecture. Evaluate and compare lakehouse patterns to find the best fit for your data platform. See Navigating architectural choices for a lakehouse using Amazon SageMaker.
Explore the Apache Spark Troubleshooting Agent. Learn more about the diagnostic capabilities the agent uses to analyze Spark event logs, executor metrics, and error stack traces. See Introducing the Apache Spark Troubleshooting Agent for Amazon EMR and AWS Glue.

About the authors

[$] KASAN for JIT-compiled BPF code

2026-06-23 daroc

Post Syndicated from daroc original https://lwn.net/Articles/1077740/

Alexis Lothoré has been working to add support for the kernel’s memory-access
checker,

KASAN, to just-in-time-compiled BPF code. He spoke about that work at
the 2026

Linux Storage, Filesystem, Memory-Management, and BPF Summit.
KASAN support is needed, he said, to help catch bugs in the BPF just-in-time (JIT)
compiler. KASAN is a great tool for catching memory-management problems in the
kernel, but only in code that can be monitored by it.

Getting your SMS short code production-ready with AWS End User Messaging

2026-06-23 Harshvardhan Chunawala

Post Syndicated from Harshvardhan Chunawala original https://aws.amazon.com/blogs/messaging-and-targeting/getting-your-sms-short-code-production-ready-with-aws-end-user-messaging/

Getting your Short Message Service (SMS) short code production-ready requires you to configure the Amazon Web Services (AWS) infrastructure that controls how your messages are sent, monitored, and protected. You have provisioned your short code, and it is active on carrier networks. In this post, we walk through that setup using AWS End User Messaging SMS, covering 12 configuration steps from compliance through phased traffic migration. Total estimated time is 2 to 4 hours of configuration plus 1 to 3 business days for limit increase approvals.mess

The guide to SMS short codes with AWS End User Messaging covers the application and registration process up through provisioning. This post picks up from that point and provides an operational readiness walkthrough that takes you from “Active” status to confidently sending your first production message, including a final validation step to confirm readiness.

The following diagram shows the end-to-end message flow and event routing architecture covered in this walkthrough.

End-to-end SMS short code architecture showing message flow from sender through AWS End User Messaging SMS to carriers and recipient handsets, with event routing to Amazon CloudWatch, Amazon Simple Notification Service (Amazon SNS), and Amazon Data Firehose destinations

Prerequisites

You need the following to follow along with this walkthrough:

An AWS account with access to the AWS End User Messaging SMS console.
A short code with Active status in the AWS Management Console (carrier provisioning finished).
Permissions to create AWS Identity and Access Management (IAM) roles, Amazon CloudWatch Log Groups, and Amazon Simple Notification Service (Amazon SNS) topics.
AWS Command Line Interface (AWS CLI) v2 or an AWS SDK installed and configured.
Your approved registration documentation, including the service name, keyword responses, and message templates submitted to carriers.

Step 1: Verify your short code is active and delivering

Navigate to the AWS End User Messaging SMS console, choose Phone numbers, and locate your provisioned short code. Confirm that the status shows Active, then send a test message to a phone number you control using the SendTextMessage API or the console test feature. Verify delivery on your handset.

Carrier-side activation can take up to 24 to 48 hours to fully propagate across all networks after provisioning finishes. If the console shows Active but your test message does not arrive, submit a support case so the team can verify propagation status with the carrier.

You can also verify using the AWS CLI:

aws pinpoint-sms-voice-v2 send-text-message \
    --destination-phone-number "+15555550100" \
    --origination-identity "12345" \
    --message-body "Test message from short code" \
    --message-type TRANSACTIONAL \
    --configuration-set-name "prod-otp-shortcode"
# Replace +15555550100 with your test phone number, 12345 with your short
# code, and prod-otp-shortcode with your configuration set name from Step 3.

Step 2: Configure keywords and verify message compliance

US carriers require every short code to respond to HELP and STOP keywords. You defined these during your registration, and this step confirms they are configured correctly in your account.

In the SMS console, choose Phone numbers, select your short code, and choose the Keywords tab. Verify that STOP returns the opt-out response you submitted during registration, and that HELP returns your support contact response (which must include a phone number or email). Add any custom keywords your use case requires, such as YES for double opt-in confirmation flows. You can manage keywords programmatically using the PutKeyword API.

To add or update a keyword programmatically:

aws pinpoint-sms-voice-v2 put-keyword \
    --origination-identity "12345" \
    --keyword "YES" \
    --keyword-message "You have confirmed your subscription to Acme Health Alerts. Msg&data rates may apply. Reply STOP to opt out." \
    --keyword-action AUTOMATIC_RESPONSE
# Replace 12345 with your short code, YES with your custom keyword, and the
# keyword-message text with your approved response.

To verify your current keyword configuration:

aws pinpoint-sms-voice-v2 describe-keywords \
    --origination-identity "12345"
# Replace 12345 with your short code.

Beyond keyword configuration, carrier compliance does not end at registration approval. The content you send in production must stay aligned with what carriers reviewed and approved. Here is what to keep consistent.

Use the exact brand or program name from your approved registration across all keyword responses, confirmation messages, and outbound templates. If carriers approved your registration under “Acme Health Alerts,” every message your short code sends should reference that name. Mixing variations creates inconsistencies that auditors flag during reviews. For example, do not use the company name in one message and the product name in another.

Your HELP, STOP, and confirmation responses must match the templates submitted during registration. Do not add or remove opt-out language, change frequency disclosures, or alter customer care contact details post-approval without updating the registration through a support case. If your organization operates multiple domains, use the domain documented in the registration. For example, you might have one domain for the application and another for marketing. Carrier reviewers cross-reference message content, opt-in screenshots, and privacy policy URLs with what was submitted.

Humans conduct carrier reviews, and message content that is concise and limited to the essentials is reviewed consistently. All messages must remain under 160 characters.

Step 3: Create a configuration set with event destinations

A configuration set controls where your SMS delivery events are streamed and which event types are captured. Without one, you are limited to the basic events that AWS End User Messaging SMS sends to Amazon EventBridge by default. These default events omit recipient details and full carrier response context.

Create a configuration set with a descriptive name such as prod-otp-shortcode or marketing-sc-us. Then create at least one event destination. The three main options are Amazon CloudWatch Logs (for operational monitoring and alarming), Amazon SNS (for real-time event fanout to downstream systems), and Amazon Data Firehose (for durable archival and analytics).

Amazon Data Firehose typically delivers to an Amazon Simple Storage Service (Amazon S3) bucket, where you can query delivery history using Amazon Athena for compliance audits or delivery pattern analysis.

# Create the configuration set
aws pinpoint-sms-voice-v2 create-configuration-set \
    --configuration-set-name "prod-otp-shortcode"

# Add a CloudWatch Logs event destination
aws pinpoint-sms-voice-v2 create-event-destination \
    --configuration-set-name "prod-otp-shortcode" \
    --event-destination-name "otp-delivery-logs" \
    --matching-event-types TEXT_DELIVERED TEXT_FAILED TEXT_QUEUED TEXT_CARRIER_UNREACHABLE TEXT_TTL_EXPIRED \
    --cloud-watch-logs-destination '{
        "IamRoleArn": "arn:aws:iam::123456789012:role/SMSEventsToCloudWatch",
        "LogGroupArn": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/sms/prod-otp-shortcode"
    }'
# Replace prod-otp-shortcode with your configuration set name, otp-delivery-logs
# with a descriptive destination name, and the ARN values with your IAM role ARN
# (must have logs:PutLogEvents permission) and CloudWatch Log Group ARN.

Important: When sending messages with SendTextMessage, always specify your ConfigurationSetName parameter so events route to the appropriate destination.

Required event types

Event type	Description
`TEXT_DELIVERED`	Message successfully delivered to recipient handset.
`TEXT_FAILED`	Message delivery failed.
`TEXT_QUEUED`	Message accepted and queued for delivery.
`TEXT_CARRIER_UNREACHABLE`	Carrier network unreachable.
`TEXT_TTL_EXPIRED`	Message expired before delivery.

For a detailed walkthrough of configuration sets including multi-tenant architectures, see How to send SMS using configuration sets with AWS End User Messaging.

Step 4: Create a phone pool and associate your short code

A pool is a logical container that groups origination identities and controls routing behavior. Creating one gives you deterministic control over which number sends your messages and how opt-outs are enforced.

# Create the pool
aws pinpoint-sms-voice-v2 create-pool \
    --origination-identity "12345" \
    --iso-country-code "US" \
    --message-type TRANSACTIONAL

# Disable shared routes so only your short code is used
aws pinpoint-sms-voice-v2 update-pool \
    --pool-id "pool-1234567890abcdef0" \
    --shared-routes-enabled false
# Replace 12345 with your short code, US with your destination country code,
# and pool-1234567890abcdef0 with the Pool ID returned by create-pool.

Configuration parameters

Parameter	Recommended value	Rationale
Pool name	`us-otp-pool`	Descriptive, environment-prefixed.
`SharedRoutesEnabled`	`False`	Prevents fallback to shared routes; only your short code is used.
Opt-out list	Associate one	Manages opt-out state per use case.
`IsoCountryCode`	`US`	Restricts to destination country your short code serves.

If you operate multiple use cases on separate short codes, create a dedicated pool for each. For example, use one short code for one-time password (OTP) traffic and another for transactional notifications. This isolation means a recipient opting out of marketing messages does not lose access to authentication codes.

Step 5: Request your throughput increase

Short codes start at a default of 100 messages per second (MPS). If your production volume will exceed this, request an increase before your launch date rather than after traffic is flowing.

Create a case in the AWS Support Center, choose Service limit increase, then choose End User Messaging SMS. Provide your short code phone number, requested MPS, use case description, and expected peak volume. Allow 1 to 3 business days for processing.

To estimate your required MPS:

Required MPS = (Peak hourly volume / 3,600) x 2

Short codes support scaling to thousands of MPS, so start with a value that covers your expected peak and request further increases as traffic grows.

Step 6: Request a spending limit increase

AWS accounts have a default monthly SMS spending limit. To keep delivery uninterrupted at your expected volume, request an increase that accommodates your projected monthly spend before you begin sending.

Create a support case under Service limit increase > End User Messaging SMS > Account Spend Threshold. Provide your estimated monthly spend, use case description, and website URL.

For details, see Requesting increases to your monthly SMS spending quota.

Step 7: Restrict destination countries

If your short code serves a single country (US-only, for example), restrict sending to that country. This protects your account from artificially inflated traffic (SMS pumping). In pumping attacks, messages are routed to international premium-rate numbers, generating significant charges.

In the SMS console, navigate to Account settings, then choose Countries and keep only the countries you intend to send to. The pool-level IsoCountryCode restriction from Step 4 provides an additional enforcement layer at the sending path. Combining account-level country restrictions with pool-level country codes gives you two independent controls that both must be satisfied before a message is sent internationally.

For a detailed walkthrough on SMS fraud prevention controls, see Defending against SMS pumping: new AWS features to help combat artificially inflated traffic.

Step 8: Set up monitoring and alarms

With event destinations configured in Step 3, build proactive alerting that surfaces delivery trends before they affect your end users.

Recommended alarms

Alarm	Metric / Source	Threshold
Delivery success rate	CloudWatch SMS metrics	Alert when below 95%.
Spend threshold	CloudWatch billing metric	Alert at 80% of monthly limit.
Delivery failures	Amazon EventBridge rule on `TEXT_FAILED`	Route to Amazon SNS topic or AWS Lambda.
Carrier unreachable	Amazon EventBridge rule on `TEXT_CARRIER_UNREACHABLE`	Route to Amazon SNS topic or AWS Lambda.

Build a CloudWatch dashboard showing messages sent per minute, success versus failure breakdown, and spend accumulation over time.

You can also configure Amazon EventBridge to notify you of registration status changes. AWS End User Messaging SMS publishes events for statuses including REQUIRES_UPDATES, REVIEWING, and PROVISIONING, which is useful if a carrier requests changes during a proactive audit after your short code is already active.

For metric details, see Monitoring SMS activity with Amazon CloudWatch.

Step 9: Track OTP verification success (if applicable)

If your short code delivers OTP or two-factor authentication (2FA) codes, track end-to-end verification success in addition to carrier delivery receipts. A “delivered” status at the carrier level does not confirm the end user received and entered the code.

Tracking verification rates gives you insight into latency patterns when codes expire before arrival, geographic delivery trends, and opportunities to improve conversion. Some use cases involve asynchronous processing where several minutes of computation occur before the SMS is sent. For these, measure the full round-trip from the triggering action to message delivery. This separates application-side latency from carrier-side delivery latency.

For implementation guidance, see Track OTP success with AWS End User Messaging SMS feedback.

Step 10: Set up cost visibility

SMS costs include AWS charges plus per-message carrier surcharges. Setting up cost visibility from day one lets you track spend trends, catch anomalies early, and optimize over time.

Start by activating AWS Cost Explorer and creating a cost allocation tag for your SMS workload. Then configure an AWS Budget with threshold alerts. For example, you might notify at 80% of projected monthly spend. This gives you advance warning of unexpected cost increases, whether from traffic spikes, retry loops, or blocked-country leakage.

Step 11: Plan your traffic migration

A phased rollout validates delivery performance at each stage before you increase volume.

Start with a canary phase (Day 1 to 3) where you route 5 to 10% of traffic to the short code and monitor delivery rates, latency, and event logs. Move to a ramp phase (Day 3 to 7) at 50%, validating throughput and carrier-level delivery across your recipient base. Finish the full migration (Day 7+) at 100%. Decommission your previous origination identity only after confirming stability for at least 48 hours.

Step 12: Validate production readiness and send

Before declaring your short code production-ready, run through the following validation checks:

Confirm your CloudWatch dashboard shows events flowing for TEXT_DELIVERED and TEXT_FAILED (from Step 3).
Send a test message that triggers your STOP keyword. Verify the correct opt-out response is returned and the phone number appears in your opt-out list.
Send a test message that triggers your HELP keyword. Verify the response matches your approved registration.
Check your MPS quota in the support case response (from Step 5). Confirm it matches or exceeds your calculated peak.
Review your country restrictions (from Step 7). Attempt to send a message to a blocked country and confirm it is rejected.
Verify your CloudWatch alarm fires by temporarily lowering the threshold, or by checking that the alarm state is not INSUFFICIENT_DATA.

After all six checks pass, you are ready to begin your phased migration (Step 11) and scale to full production traffic. At this point, your short code is configured, monitored, compliant, and protected.

Automate with a validation script

You can use an AI coding assistant such as Kiro to generate a validation script tailored to your environment. Try a prompt like: “Write a boto3 script that validates my SMS short code is production-ready by checking Active status, HELP/STOP keywords, configuration set existence, and pool association using the pinpoint-sms-voice-v2 client.”

Refine the prompt with specifics from the following reference implementation, such as exact API names, filter parameters, and output format, to match your requirements.

The following script is an example of what that output looks like:

import boto3
import sys

SHORT_CODE = "12345"  # TODO: Replace with your short code (e.g., "67890")
POOL_ID = "pool-1234567890abcdef0"  # TODO: Replace with your pool ID from Step 4
CONFIG_SET_NAME = "prod-otp-shortcode"  # TODO: Replace with your configuration set name from Step 3

client = boto3.client("pinpoint-sms-voice-v2")

# Note: For accounts with many resources, implement NextToken pagination
# on describe_* calls. This script assumes results fit in a single page.


def check_short_code_active():
    """Step 1: Verify short code is Active."""
    response = client.describe_phone_numbers(
        Filters=[
            {"Name": "status", "Values": ["ACTIVE"]},
            {"Name": "number-type", "Values": ["SHORT_CODE"]}
        ]
    )
    numbers = [
        n for n in response["PhoneNumbers"]
        if n["PhoneNumber"] == SHORT_CODE
    ]
    assert len(numbers) > 0, f"Short code {SHORT_CODE} not found or not Active"
    print(f"[PASS] Short code {SHORT_CODE} is Active")


def check_keywords_configured():
    """Step 2: Verify HELP and STOP keywords exist."""
    response = client.describe_keywords(OriginationIdentity=SHORT_CODE)
    keyword_names = [kw["Keyword"].upper() for kw in response["Keywords"]]
    assert "STOP" in keyword_names, "STOP keyword not configured"
    assert "HELP" in keyword_names, "HELP keyword not configured"
    print("[PASS] HELP and STOP keywords configured")


def check_configuration_set():
    """Step 3: Verify configuration set exists."""
    response = client.describe_configuration_sets(
        ConfigurationSetNames=[CONFIG_SET_NAME]
    )
    assert len(response["ConfigurationSets"]) > 0, f"Configuration set {CONFIG_SET_NAME} not found"
    print(f"[PASS] Configuration set '{CONFIG_SET_NAME}' exists")


def check_pool_association():
    """Step 4: Verify pool exists and short code is associated to it."""
    response = client.describe_pools(PoolIds=[POOL_ID])
    assert len(response["Pools"]) > 0, f"Pool {POOL_ID} not found"

    # Verify short code is associated to the pool
    assoc_response = client.list_pool_origination_identities(PoolId=POOL_ID)
    identities = [
        oi["OriginationIdentity"]
        for oi in assoc_response["OriginationIdentities"]
    ]
    assert any(SHORT_CODE in oi for oi in identities), \
        f"Short code {SHORT_CODE} not associated with pool {POOL_ID}"
    print(f"[PASS] Pool '{POOL_ID}' exists and short code is associated")


if __name__ == "__main__":
    checks = [
        check_short_code_active,
        check_keywords_configured,
        check_configuration_set,
        check_pool_association,
    ]
    for check in checks:
        try:
            check()
        except Exception as e:
            print(f"[FAIL] {check.__doc__} - {e}")
            sys.exit(1)
    print("\nAll validation checks passed. Ready for production traffic.")

Cleaning up

If you created test resources while following this walkthrough, you can delete them through the AWS End User Messaging SMS console or with the API to avoid confusion with your production configuration. This includes a test configuration set, test pool, or test event destinations used for validation. Do not delete your production configuration set, pool, or keyword settings.

If you requested a test-level MPS increase or spending limit for validation, update these to your production values through a new support case before going live.

Quick reference checklist

Step	Action	Key API / Service
1	Verify short code is Active and test delivery	`SendTextMessage`
2	Configure keywords and verify message compliance	`PutKeyword`
3	Create configuration set with event destinations	`CreateConfigurationSet`
4	Create pool and associate short code	`CreatePool`, `AssociateOriginationIdentity`
5	Request MPS increase for expected throughput	AWS Support
6	Request spending limit increase	AWS Support
7	Restrict destination countries	Console / `UpdateAccount`
8	Set up CloudWatch alarms and dashboards	Amazon CloudWatch
9	Track OTP verification success (if applicable)	SMS Feedback events
10	Set up cost visibility	AWS Cost Explorer, AWS Budgets
11	Plan phased traffic migration	Application-level routing
12	Validate production readiness and send	All of the preceding

Conclusion

In this post, we walked through how to configure a newly provisioned SMS short code for production use with AWS End User Messaging SMS. The 12 steps cover keyword verification, message compliance, event monitoring, throughput planning, country restrictions, cost visibility, phased traffic migration, and a final production validation.

You can adapt the sequence to your specific use case and volume profile. For the full registration and application process, see A guide to SMS short codes with AWS End User Messaging. To start configuring, navigate to the AWS End User Messaging SMS console. For the full API reference, see the AWS End User Messaging SMS documentation.

About the author

Editing Flow State: Avoid Distractions #StayCreative

2026-06-23 Matt Granger

Post Syndicated from Matt Granger original https://www.youtube.com/shorts/L_uoRaF2TzM

10 Years Later – Brexit is a Catastrophe

2026-06-23 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/watch?v=wOc1cxnDR_s

Sunsetting Tor 0.4.8

2026-06-23 jzb

Post Syndicated from jzb original https://lwn.net/Articles/1079119/

The Tor Project has announced
that it is planning to actively stop supporting Tor 0.4.8 and earlier
C Tor versions soon.

Usually, we try not to break existing releases, even if they are
unsupported, unless we have a pretty good reason. In this case, we
have several reasons. […]

The most important reason is this: in 0.4.9, we have made some
former fields in our directory data obsolete — specifically, TAP
onion keys and family
lines. Removing these fields will let us save a great deal of
client directory bandwidth for everyone. This, in turn, will make all
Tor clients bootstrap a little faster, especially those on slow
connections. But when we remove these fields, clients and relays
running earlier versions of Tor will no longer work, since they expect
the TAP onion keys to be present. Therefore, in order to deliver
improved performance faster, we need to accelerate the date on which
0.4.8 will stop working.

The target sunset date is currently September 1,
2026, after which any version prior to Tor 0.4.9 will cease to work on
the network. The first stable release in the 0.4.9.x series was
announced
in February 2026, and the Tor 0.4.8.x series reached end of life on
June 1.

Security updates for Tuesday

2026-06-23 jzb

Post Syndicated from jzb original https://lwn.net/Articles/1079083/

Security updates have been issued by Debian (ffmpeg), Fedora (erlang, ffmpeg, prometheus, python-scrapy, python3-docs, python3.14, thorvg, tigervnc, and vips), Mageia (mumble and sslh), Oracle (389-ds:1.4, dracut, firefox, hplip, kernel, openssh, postgresql:15, redis:6, and uek-kernel), Red Hat (delve, gvisor-tap-vsock, nginx, nginx:1.24, nginx:1.26, osbuild-composer, podman, rhc, skopeo, and yggdrasil), SUSE (containerized-data-importer, graphite2, kernel, libarchive, openssh, openssh-askpass-gnome, openvswitch, openvswitch3, postfix, python-lxml, python-nltk, python-python-multipart, python-urllib3, rmt-server, terraform-provider-local, terraform-provider-null, and util-linux), and Ubuntu (google-guest-agent, haproxy, libxml2, linux-azure, linux-intel-iotg-5.15, linux-lowlatency, linux-lowlatency-hwe-5.15, linux-oracle-5.15, mysql-8.0, mysql-8.4, and nginx).

THG Podcast: Counterfactuals – Cannae

2026-06-23 The History Guy: History Deserves to Be Remembered

Post Syndicated from The History Guy: History Deserves to Be Remembered original https://www.youtube.com/watch?v=tmrtNJPrvn4

Anthropic’s Fable 5 Model Jailbroken Within Days

2026-06-23 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2026/06/anthropics-fable-5-model-jailbroken-within-days.html

Fable 5 is the supposed safe version of Anthropic’s Mythos Preview, with guardrails to ensure that it can’t be used to create cyberattacks.

Well, that restriction was bypassed within days.

How to design and present clear computing lessons

2026-06-23 Sean Sayers

Post Syndicated from Sean Sayers original https://www.raspberrypi.org/blog/how-to-design-and-present-clear-computing-lessons-mayers-principles/

Learning something new requires effort. Learners take in new information by listening and observing. When a lot of information is presented at once in a lesson, that can create too much cognitive load for learners — a barrier to understanding and engagement.

To help you design and deliver great computing lessons, we’ve written two new Pedagogy Quick Reads focused on Mayer’s Principles of Multimedia Learning. These research-backed principles give you practical strategies to lower your students’ unnecessary cognitive load during lessons, leading to better learning outcomes.

In this blog, we introduce the two new Quick Reads (Designing multimedia for clarity and Designing multimedia for understanding), which you can download for free to:

Find practical tips for how you can apply Mayer’s Principles to your lessons
Read a summary of the research behind them

The blog also includes some examples for how to apply the principles in your computing lessons.

If you’d like an introduction to the idea of cognitive load, you can find the Quick Read about cognitive load theory here.

In a computing classroom, a girl looks at a computer screen.

What are Mayer’s Principles?

Mayer’s Principles of Multimedia Learning are practical principles that will help you create clearer resources and present information in a way that avoids unnecessary cognitive load for your learners.

Mayer’s Principles are based on three related facts:

You can present information to learners in auditory form (e.g. spoken explanations) and visual form (e.g. written text, diagrams)
There are limits on how much new information people can take in at the same time
Teaching materials that are not well-structured can cause too much cognitive load, which negatively affects learning

Designing lessons for clarity

Our first new Quick Read focuses on the following Mayer’s Principles for making your lessons as clear as possible, so that learners can connect the information they see and hear in real time.

Make all the information you include coherent, meaning that it is directly relevant to the learning objectives and does not distract learners’ attention
Guide your learners’ attention by using signals such as arrows, bold text, colour, or auditory cues
Avoid redundant information, such as a slide with a diagram and a paragraph explaining the diagram, or a slide that you speak about without adding new, complementing information
Present related words and visuals in the same space, e.g. place your text labels, or explanations directly adjacent to diagrams, images or code segments they describe
Present related words and visuals at the same time, e.g. by pairing narration with imagery

Designing lessons for understanding

Our second new Quick Read shares three Mayer’s Principles for how you can structure your lesson delivery to support your learners’ understanding:

Structure lessons or demonstrations into clear, manageable stages or segments, rather than presenting the information all at once
When you start a new topic, begin with some pre-training by introducing key terms, components, or goals and how they relate
When you present diagrams, flowcharts, or code examples, explain this visual information using the other modality, meaning spoken narration, instead of using paragraphs of text

Applying Mayer’s Principles to your computing lessons

We suggest you consider implementing Mayer’s Principles when you next design new lessons or want to adapt materials that you reuse regularly.

Here are some ideas on how you use both sets of principles in common computing teaching scenarios.

Live coding and code walkthroughs

When displaying a new Python script or Scratch project, avoid adding long, written paragraphs of commentary to explain the code. Instead, place short text annotations or sub-goal labels directly next to the relevant lines or blocks. As you run through the code, use your pointer or live typing to guide your learners’ focus (signalling) and explain in words how the program works at the same time.

Starting a new topic such as networking

Before students move to a new topic, for example networking, consider what words or concepts your class needs to be familiar with. Allocate a few minutes at the start of your lesson for pre-training to introduce terms like LAN or bandwidth and how they relate to the lesson.

Consider how your lesson can be divided into stages to allow for better understanding (segmenting). Each stage should build on the previous one and feed into the next one. For example, when you explain how data moves across a network, you can introduce each step separately before combining them all into a complete model of a network.

Consider how you display visual information to your class. Ensuring related diagrams and labels appear close together, only include relevant materials and no decoration on your slides (coherence), and avoid simply reading out words on the slide identical forms of information (redundancy).

Supporting multilingual learners with Mayer’s Principles

Mayer’s Principles are even more important for educators teaching multilingual learners or non-native speakers. When learners need to work harder to understand the language, poor lesson design can slow down their learning significantly.

Mayer’s Principles can help you with this challenge:

Applying the coherence and redundancy principles will allow you to make your explanations and slides as clear and concise as possible
Using signaling will mean you help learners to follow along and know what is most important
Presenting diagrams that illustrate computing concepts clearly will help your multilingual learners understand your spoken explanation much more easily (modality)

Intentional design for lasting understanding

By intentionally designing and presenting lessons to give the right amount of information in the clearest way, you make it easier for your students to focus and build a lasting understanding of computing concepts. When your lesson materials align with how our brains process information, learners can build stronger mental models and approach independent learning activities with greater confidence.

Read our new Quick Reads to find out more and discover the research behind Mayer’s Principles:

Download the ‘Designing multimedia for clarity’ Quick Read

Download the ‘Designing multimedia for understanding’ Quick Read

The post How to design and present clear computing lessons appeared first on Raspberry Pi Foundation.

Toward More Controllable AI Video Editing: An Early Research Exploration at Netflix

2026-06-23 Netflix Technology Blog

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/toward-more-controllable-ai-video-editing-an-early-research-exploration-at-netflix-eb8160ed60a2

By Zhuoning Yuan, Ta-Ying Cheng, Benjamin Klein, Bahareh Azarnoush

Introduction

At Netflix, we build technology to help storytellers bring their creative visions to life and to help members discover the stories they love.

To connect stories with diverse audiences around the world, we produce promotional assets, including trailers, teasers, and social short‑form videos, that build on and elevate the original footage. Through close collaboration with the teams crafting these assets, we identified a recurring gap in current tools. Transforming raw footage into a polished final asset often requires complex edits like seamlessly adding new visual elements, patching or replacing backgrounds, or removing unwanted objects without breaking the scene’s physical continuity. These tasks typically demand hours of specialized manual editing work. While recent generative video editing models show promise, they often struggle to preserve the integrity of the source footage. Many methods regenerate every pixel to make an edit, which can fail to isolate changes and inadvertently alter elements that should remain untouched. To execute these tasks effectively, artists need tools that empower them to dictate exactly what changes and how it changes.

Our research goal is to make this process easier for artists. We’re deliberate about where and how AI is applied, ensuring that the technology always serves the creative intent. That principle drives our recent work: exploring the benefits of generative AI in ways that protect and expand creative choice, and keeping artists in precise control of their final vision. Recent advancements in AI video editing have demonstrated impressive capabilities in streamlining complex manual editing workflows, but key challenges remain before they can reliably support professional use:

Unintended edits: When editing a specific element in a video clip, many methods regenerate the entire video, which can inadvertently alter identity, performance, and other elements like objects, backgrounds, or critical scene details.

Left: input video. Right: output from Ditto using the prompt “change the background to a winding coastal highway in California,” which completely changes the scene.

Unnatural physics: When removing objects, many methods focus only on erasing the target while ignoring the scene’s physical continuity. This can lead to inconsistent motion and implausible interactions, making the results look unnatural.

Left: the green mask denotes the target to be removed. Right: output from Gen-Omnimatte where the target was removed, but the physical continuity of the scene was ignored — the pool float shouldn’t move if there’s no interaction with it.

Today, we’re sharing two research explorations that aim to address these challenges. We believe this work can help advance the field in a way that’s both meaningful and responsible:

Vera: a layered video diffusion model. Vera generates only what needs to change as separate edit layers while leaving the rest of the video untouched, preserving the identities, performances, and other details from the source footage exactly as filmed.
VOID: a video inpainting model for video object and interaction deletion. VOID performs physically plausible inpainting in complex scenes: it doesn’t just remove an object, but also reconstructs the scene as if the object was never there.

Along with this blog post, we’re also publicly releasing the research papers that detail the algorithmic innovations behind Vera and VOID. We hope these publications will enable other researchers to experiment with these ideas, build upon our findings, and further advance the field.

Vera: A Layered Video Diffusion Model

Existing video editing models regenerate the entire clip, coupling the intended edit with regions that should remain unchanged. This increases the risk of altering details of the original footage. To tackle this challenge, we introduce Vera, a novel layered video diffusion framework for content-preserving video editing.

Teaser for Vera (disclaimer: This is a research prototype, not an official product).

Inference Pipeline

Given a source video and a text editing instruction, Vera jointly generates an edit layer and an alpha matte. These layers are then seamlessly composed with the original footage to produce the final edited result. By design, Vera supports complex tasks such as object addition and background change, while ensuring that the pixels outside the edited regions from the source video remain perfectly intact.

Training Data

One of the main challenges in developing Vera was the lack of suitable training data. Since no public dataset provides the high-quality layered data we need (clean input, alpha matte, edit layer, composite video), we built our own. Using a combination of existing open-source videos and human annotation, we constructed a layered video dataset with a total of 486k frames at 832×480 resolution. We organized it into three subsets of increasing complexity:

Synthetic Composites: Clips with high-quality foreground alpha mattes are composited over diverse, automatically generated backgrounds. This subset provides strong and reliable supervision for alpha matting in object addition and background change tasks.
Realistic Single-Object Videos: Real-world clips are processed through segmentation, matting, background inpainting/generation, and human quality filtering. This subset increases scene diversity and camera motion, improving composition quality across both tasks.
Realistic Multi-Object Videos with Effects: This extends the previous subset by isolating individual objects with curated alpha mattes, including their associated effects such as shadows and reflections. This subset improves compositing and editing in more complex, dynamic scenes.

Model Architecture

Beyond data, model design is another key challenge. The three target outputs Vera generates — an edit layer (decoupled creative edits), an alpha matte layer (a grayscale mask that depends on the edit content and scene interactions such as occlusions), and a composite layer (natural footage) — have substantially different distributions. In practice, using a single shared architecture to reconcile these differences proved data-inefficient. To address this, Vera uses a MoT (Mixture-of-Transformers) design. Instead of a single DiT, we use three separate DiTs, one for each output:

Each DiT maintains its own QKV projections and FFN weights, but we concatenate the output tokens from all three branches and then pass it to joint self-attention. This enables cross-layer interaction while allowing each branch to specialize.
All three DiTs are initialized from the same pretrained T2V base model. We add two additional patch-embedding layers for the input video and an optional mask video. Source-video tokens are added to the composite tokens, while mask tokens are added to the noisy alpha tokens.
All layers share the same RoPE (Rotary Positional Encoding). We also add zero-initialized learnable embeddings to the alpha and composite tokens to help the model distinguish between layers.

Architecture of Vera compared to other methods. We train two Vera variants: 1.3B and 14B parameters.

Evaluations and Results

To evaluate Vera, we curated a benchmark of test video-prompt pairs: 72 for object addition and 69 for background change, using open-source videos. The test set spans a range of difficulty, including slow and fast motions, various camera motions, single and multiple objects, and both simple and complex scenes. We evaluated the performance across three complementary dimensions:

Content Preservation: Measures whether regions outside the targeted edit remain strictly unaltered, evaluated using pixel-level and perceptual similarity.
Instruction Compliance: Measures how faithfully the edited video executes the text prompt.
Video Quality: Assesses the temporal coherence and per-frame spatial quality of the final edited video.

In our results, both Vera-1.3B and Vera-14B significantly outperform existing baselines on content preservation, while maintaining similar video quality and instruction compliance performance compared to strongest baselines (please see the paper for full results).

Qualitative comparisons between Vera and baselines (please see more examples on Vera’s project website).

To complement automated metrics, we ran a human preference study comparing Vera against five baselines. We collaborated with 19 creative reviewers who evaluated 512 video trials in total. In each trial, reviewers were shown randomized side-by-side comparisons between the Vera model and a baseline model. The human consensus strongly aligned with our quantitative findings: Vera-1.3B was preferred over all baselines for content preservation and instruction compliance. Furthermore, reviewers rated Vera’s video quality as comparable to baselines on background change tasks, and noted a clear advantage for Vera on object addition tasks.

User study on test set: Vera-1.3B vs. five strong baselines.

VOID: Video Object and Interaction Deletion

Existing video object removal methods excel at inpainting content “behind” the object and correcting appearance-level artifacts such as shadows and reflections. However, when the removed object has more significant interactions — such as collisions with other objects — current models fail to correct them and produce implausible results. To address this, we present VOID, a video object removal framework designed to perform physically-plausible inpainting in these complex scenarios.

Teaser for VOID (disclaimer: This is a research prototype, not an official product).

A Two-Pass Inference Pipeline

Given an input video, the user clicks on an object to remove. A VLM-based reasoning pipeline then analyzes the scene to identify other regions that will be causally affected, e.g., objects that will fall, collide, or change trajectory. This physical reasoning is encoded into a quadmask to guide the diffusion model:

First Pass: VOID takes the video and the quadmasks as input and generates a physically plausible counterfactual video in which the object — and its interactions — are removed.
Second Pass: Smaller video diffusion models occasionally suffer from “object morphing” when generating moving objects. If VOID detects this failure mode, it triggers a second pass that re-runs inference using flow-warped noise derived from the first pass, stabilizing the object’s shape along its newly synthesized trajectory.

Overview of VOID’s two-pass inference pipeline.

Training Data

We built on top of the Kubric simulation engine and the HUMOTO human motion capture dataset to generate synthetic counterfactual video pairs along with their corresponding quadmasks. Specifically, the counterfactual videos are generated by re-simulating the exact scene from the original video, but with the target object(s) or human removed. This resimulation creates an alternate outcome based on strict laws of physics. For example, if a person holding a lamp is removed from the scene, the simulation ensures the lamp obeys gravity and falls to the ground. The quadmasks then capture the removed object (black), the affected regions (grey), their overlaps (dark grey), and the unchanged parts of the scene (white).

Model Training

During model training for VOID, we introduce two improvements over prior work: (i) quadmask conditioning, which explicitly identifies regions in each frame that may change after the object is removed, and (ii) a second-pass video appearance refiner that reduces artifacts such as unwanted object morphing. VOID is finally trained on the CogVideoX-Fun-V1.5–5b-InP backbone with Gen-Omnimatte’s checkpoint and fine-tuned for video inpainting with interaction-aware quadmask conditioning.

Evaluations and Results

Experiments across both synthetic and real data demonstrate that VOID preserves consistent scene dynamics far better than prior video object removal methods (please see the paper for full results). VOID successfully maintains object structure and produces plausible motion over time across a wide variety of real-world cases. By contrast, results from both open- and closed-source baselines consistently exhibit physically inaccurate artifacts. For instance, baselines generate water splashes without human impact (see top row of the figure below) or show spinning tops being disrupted without the presence of interacting hands.

Comparison of VOID with other strong baselines (please see more examples on VOID’s project website).

To complement our quantitative evaluation, we conducted a user study with 25 creative reviewers to measure the perceptual realism and physical plausibility of our counterfactual edits. Each participant was randomly assigned 5 out of 75 real-world scenarios, resulting in 125 total comparisons. For each video, participants viewed the original input alongside the outputs of VOID and six baselines (seven models total) in a randomized order. Participants were asked to select the video that best reflected how the scene should realistically appear after the object was removed, factoring in visual quality, temporal consistency, blending, the realism of scene evolution, and the absence of artifacts. VOID was selected 64.8% of the time, substantially outperforming all baseline models.

User study on real-world test examples: VOID vs. six baselines.

Looking Ahead

Applying AI in ways that serve both member and creator needs is core to our research philosophy, and these projects reflect that approach. While Vera and VOID show promising early results, reaching production-ready quality will require addressing several limitations we encountered. For example, Vera struggles with some complex effects such as lightning or smoke due to the limited training data, and in some cases, it fails to keep background motion fully consistent with the input camera movement. Despite the various generalization capabilities VOID exhibits, we still observe domain gaps. For instance, it cannot handle videos with unusual camera angles or shots captured very close to the target object, and it currently has constraints on supported video length and resolution.

These limitations motivate continued investment in this line of research. Vera and VOID are important early efforts toward making complex video editing more controllable and accessible for artists. For this work, we used publicly available datasets with additional annotation efforts for experiments, and we hope that sharing our research will encourage the broader community to build on these ideas and advance them further.

Toward More Controllable AI Video Editing: An Early Research Exploration at Netflix was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

What Would a Post-Birthright-Citizenship Country Look Like?

2026-06-23 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/shorts/HHB41KAMH5s

What Could Happen if the AI Bubble Pops?

2026-06-23 The Atlantic

Post Syndicated from The Atlantic original https://www.youtube.com/shorts/MkfCypCS5Ps

The EO’s requirements for federal systems

Two migrations: encryption and authentication. Both should begin now.

Supply chain pressure that helps everyone

Critical infrastructure and PQ for everyone

Opportunities for OMB’s implementation guidance

What to do now: don’t wait for 2030

Aligning policy and international standards

Speeding up CMVP

Go forth and PQ all the things

Rapid7 has been named a Major Player in the IDC MarketScape: Worldwide SIEM 2026 Vendor Assessment (#US54126826, June 2026).

Incident Command brings detection, response, and exposure context together

AI and automation, pressure-tested by a global SOC

What we believe this IDC MarketScape recognition says about the future of SIEM

What is AWS DevOps Agent and Apache Spark Troubleshooting Agent?

Use case

The scenario

The solution

Solution overview

Walkthrough

Prerequisites

Set up AWS DevOps Agent

Create an Agent Space

Connect Slack integration (optional)

Create a webhook

Deploy the AWS CloudFormation stack

Retrieve Amazon Cognito client credentials

Register the Spark Troubleshooting MCP Server

See AWS DevOps Agent in action

Root cause identified by AWS DevOps Agent

Mitigation plan generated by AWS DevOps Agent

AWS DevOps Agent sends a notification to Slack

Enhanced investigations with AWS DevOps Agent Skills

Clean up

Conclusion

Next steps

About the authors

Prerequisites

Step 1: Verify your short code is active and delivering

Step 2: Configure keywords and verify message compliance

Step 3: Create a configuration set with event destinations

Required event types

Step 4: Create a phone pool and associate your short code

Configuration parameters

Step 5: Request your throughput increase

Step 6: Request a spending limit increase

Step 7: Restrict destination countries

Step 8: Set up monitoring and alarms

Recommended alarms

Step 9: Track OTP verification success (if applicable)

Step 10: Set up cost visibility

Step 11: Plan your traffic migration

Step 12: Validate production readiness and send

Automate with a validation script

Cleaning up

Quick reference checklist

Conclusion

Related posts

About the author

What are Mayer’s Principles?

Designing lessons for clarity

Designing lessons for understanding

Applying Mayer’s Principles to your computing lessons

Live coding and code walkthroughs

Starting a new topic such as networking

Supporting multilingual learners with Mayer’s Principles

Intentional design for lasting understanding

Introduction

Vera: A Layered Video Diffusion Model

Inference Pipeline

Training Data

Model Architecture

Evaluations and Results

VOID: Video Object and Interaction Deletion

A Two-Pass Inference Pipeline

Training Data

Model Training

Evaluations and Results

Looking Ahead

The collective thoughts of the interwebz