Tag Archives: Uncategorized

AWS’s Well-Architected Framework Transformed by Amazon Q Developer

2025-04-30 Fahim Sajjad

Post Syndicated from Fahim Sajjad original https://aws.amazon.com/blogs/devops/awss-well-architected-framework-transformed-by-amazon-q-developer/

In the rapidly evolving landscape of cloud computing, developers, and architects face unprecedented challenges. These challenges include designing, implementing, and maintaining robust cloud infrastructures. The AWS Well-Architected Framework is the gold standard for building secure, efficient, and optimized cloud solutions. Traditionally, complying with this framework required deep expertise and manual analysis.

Now, Amazon Q Developer changes this paradigm. It introduces intelligent, context-aware recommendations. This framework is built on six pillars: operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability. Without careful attention to these foundational elements, organizations risk developing systems that fall short of both their business objectives and technical requirements, potentially compromising long-term scalability and performance.

In this blog, we will explore how Amazon Q Developer can help across the six pillars of the Well-Architected Framework.

Simplifying the AWS Well-Architected Framework with Amazon Q Developer

In this section, we will explore practical examples across the following pillars:

Operational excellence pillar with Amazon Q Developer
Security pillar with Amazon Q Developer
Reliability pillar with Amazon Q Developer
Performance efficiency pillar with Amazon Q Developer
Cost optimization pillar with Amazon Q Developer
Sustainability pillar with Amazon Q Developer

Operational Excellence pillar with Amazon Q Developer

Operational Excellence is a key pillar of the AWS Well-Architected Framework. It guides teams in running efficient workloads and monitoring systems. The pillar emphasizes continuous improvement to deliver business value. Amazon Q Developer enhances operations through AI-powered assistance for infrastructure as code, automated testing, and documentation. The service automatically creates runbooks and suggests safe infrastructure changes. It analyzes your environment and offers recommendations based on AWS best practices. Teams can implement operational excellence with reduced manual effort and fewer potential errors.

Let’s ask Amazon Q Developer in the console how it can help me improve operational excellence in my AWS Infrastructure: “How can I improve the Operational Excellence of my AWS infrastructure?”

Amazon Q Developer analyzes your prompt and generates comprehensive recommendations:

Amazon Q chat interface showing a response about best practices for improving AWS infrastructure operational excellence, with a list of recommendations including implementing IaC and automating deployments.

Figure 1: Prompting Amazon Q about Operational Excellence

Based on Amazon Q Developer’s guidance, Infrastructure as Code (IaC) is recommended for managing our AWS resources. Let’s open Visual Studio Code IDE and see how Amazon Q Developer Chat can help us implement IaC. We’ll create a CloudFormation template for a resilient web application by asking Amazon Q Developer to generate a template that includes an Application Load Balancer, EC2 instances, and an RDS database: “Generate a CloudFormation template for a highly available web application with an Application Load Balancer, EC2 instances, and an RDS database. Include best practices for operational excellence”

Figure 2: Amazon Q Developer generating a CloudFormation template for a highly available web application architecture.

It also explains the template and lists the best practices for operational excellence that were followed:

Figure 3: Amazon Q explaining Generated template and Best Practices

Amazon Q Developer automates documentation to track and approve infrastructure changes effectively. It performs automatic code reviews to check quality and identify security vulnerabilities. The service detects code duplication and guides developers in making small, predictable changes. When issues arise, Amazon Q Developer quickly investigates operational problems across AWS environments. This rapid troubleshooting helps businesses maintain high application availability.

Security pillar with Amazon Q Developer

Cloud security at AWS is the highest priority. The AWS Well-Architected Framework’s Security Pillar provides a comprehensive approach to implementing robust protective measures. Even though traditionally security has been an afterthought in development often sacrificed for speed and automation, Amazon Q Developer transforms this by enabling security checks at every stage of the software development lifecycle. By embedding continuous security validation, you can significantly reduce vulnerabilities in production environments. This shifts security from reactive to proactive, ensuring your cloud applications are not just functional, but fundamentally secure.

Amazon Q Developer can streamline this process serving as an intelligent security assistant for AWS environments. It can help create robust Identity and Access Management (IAM) policies, including role-based access controls, user permissions, and multi-factor authentication. For data protection, Amazon Q Developer supports encryption strategies, key management, and secure backup procedures. Additionally, Amazon Q Developer guides users in infrastructure protection planning through secure network architectures and VPC segmentation, offering comprehensive support across critical security areas.

Amazon Q Developer enhances security beyond basic configurations. It helps set up advanced monitoring solutions using CloudWatch and CloudTrail. The service creates intelligent security alerts and automates incident response mechanisms. It protects your AWS environment against emerging threats through security scanning. Amazon Q Developer identifies potential vulnerabilities in your infrastructure. These capabilities align with AWS Well-Architected Framework security best practices. To illustrate Amazon Q Developer’s practical application in enhancing workload security, let’s consider implementing VPC flow logs for improved network monitoring in our web application CloudFormation file that we created before: “How can we implement VPC flow logs for better network monitoring?”

Figure 4: Prompting Amazon Q about implementing VPC flow logs for better network monitoring

We can also ask Amazon Q Developer to check if there are any security best practices that is missing in our code. “What IAM security best practices are missing?”

Figure 5: Prompting Amazon Q about implementing VPC flow logs for better network monitoring.

These examples demonstrates how developers can leverage Amazon Q Developer to bolster their security posture effectively.

Reliability pillar with Amazon Q Developer

Reliability is a critical pillar of the AWS Well-Architected Framework. It extends beyond maintaining system uptime. A reliable architecture must handle unexpected disruptions and recover gracefully. Amazon Q Developer brings AI-powered intelligence to reliability engineering. It helps organizations design resilient, self-healing cloud infrastructures. The service anticipates potential failures and suggests mitigation strategies before business operations are affected.

Amazon Q Developer guides users in implementing AWS reliability best practices. It helps architect resilient systems through multi-AZ deployments and auto-scaling configurations. The service assists in setting up automated recovery procedures and fault-tolerant systems. It streamlines the configuration of health checks, load balancers, and redundant components. Amazon Q Developer enables effective monitoring through CloudWatch alarms and automated failover mechanisms. It supports infrastructure as code implementation with proper testing procedures. The service helps establish cross-region redundancy and appropriate service quotas. These features ensure systems maintain high availability and recover quickly from failures.

By interacting with Amazon Q Developer in the AWS Management Console, you can receive intelligent recommendations for improving your infrastructure’s reliability by asking Amazon Q Developer: “Can you provide recommendations to eliminate single point of failures?”

Figure 6: Prompting Amazon Q about reliabilityPerformance Efficiency pillar with Amazon Q Developer

Performance can determine an application’s success in cloud computing. The Performance Efficiency pillar guides organizations in maximizing their computational resources. Amazon Q Developer enhances this pillar through AI-powered recommendations. It helps organizations design and optimize their cloud infrastructure more effectively.

Amazon Q Developer uses machine learning to deliver advanced performance insights. It recommends architectural improvements like serverless adoption and optimal service configurations. The AI assistant suggests effective caching strategies and data processing optimizations. It analyzes system metrics and infrastructure patterns to guide improvements. Development teams can enhance system performance and reduce computational overhead. Amazon Q Developer helps create adaptive architectures that respond effectively to changing workload demands.

For example, let’s say you’re an IT Professional with a monolithic three-tier web application on AWS and wanted to get recommendation on performance efficiency. The IT Professional could open a new Amazon Q Developer chat in the AWS Management Console, and enter a prompt such as: “Based on my current monolithic application on AWS, what are some things I should do as it relates to the performance efficiency pillar of the Well-Architected Framework?”

Figure 7: Prompting Amazon Q about performance efficiency

As shown above in figure 7, Amazon Q Developer made multiple recommendations based on the monolith application in the prompt. Amazon Q Developer makes recommendations such as using Serverless options and breaking the application into microservices, which is a design principal in the performance efficiency pillar of the Well-Architected Framework.

Cost Optimization pillar with Amazon Q Developer

Amazon Q Developer has the ability to give general recommendations for Cost Optimization based on the Well-Architected Framework. For example, let’s say an IT Professional wants to get more information about ways they can generally optimize their compute costs in AWS. The IT Professional could open a new Amazon Q chat in the AWS Management Console, and enter a prompt such as: “What are some ways I can cost optimize my compute infrastructure I have running in AWS based on the cost optimization pillar of the Well Architected Framework?”

Figure 8: Prompting Amazon Q about cost optimization

As shown above in figure 8, Amazon Q Developer was able to make recommendations based on the Well-architected framework to help the developer cost optimize their compute through reserved instances and savings plans, auto scaling, rightsizing, and more, while also providing links to resources to help dive in further.

Additionally, Amazon Q Developer has revolutionized AWS cost analysis by introducing natural language processing capabilities directly integrated with AWS Cost Explorer. This innovative feature allows users to gain deep insights into cloud spending through simple, conversational queries, enabling professionals to understand complex cost structures, identify spending trends, and forecast future expenses with unprecedented ease. By transforming technical cost data into actionable insights, Amazon Q Developer empowers organizations to make more informed financial decisions about their cloud infrastructure.

For comprehensive details and specific use case, please refer to the full blog post: Analyzing your AWS Cost Explorer data with Amazon Q Developer: Now Generally Available.

Sustainability pillar with Amazon Q Developer

Sustainability represents a critical emerging pillar of the AWS Well-Architected Framework, focusing on minimizing the environmental impact of cloud computing infrastructure and operations. Amazon Q Developer introduces AI-powered capabilities that help organizations optimize their cloud resources to reduce carbon footprint, improve energy efficiency, and align technological strategies with environmental responsibility.

Through intelligent analysis and context-aware recommendations, Amazon Q Developer enables teams to make more sustainable architectural decisions. The AI assistant can provide insights into resource optimization, suggesting ways to reduce unnecessary compute power, recommend more energy-efficient service configurations, and help developers understand the environmental implications of their architectural choices. By leveraging machine learning and comprehensive AWS infrastructure knowledge, Amazon Q Developer empowers organizations to not only meet their technological requirements but also contribute to broader environmental sustainability goals in cloud computing.

In the below example, you can see how you can ask Amazon Q Developer to help meet your company’s sustainability goals by asking Amazon Q: “How can I review my sustainability objectives on AWS?”

Figure 9: Prompting Amazon Q about sustainability

As you can see above Amazon Q Developer generated recommendations showing how you can review your sustainability objectives.

Now let’s take the first recommendation: Carbon Footprint tool as an example and ask Amazon Q Developer in the console to ask a follow up question We will be using the following prompt to generate response: “How do I view my carbon footprint on AWS?”

Figure 10: Prompting Amazon Q about carbon footprint

Conclusion

Amazon Q Developer represents a pivotal moment in cloud computing, transforming the AWS Well-Architected Framework from a static set of guidelines to a dynamic, intelligent system of continuous improvement. By integrating advanced AI capabilities across operational excellence, security, reliability, performance efficiency, cost optimization, and sustainability, this innovative tool democratizes sophisticated cloud architecture strategies for organizations of all sizes. The true power of Amazon Q Developer lies not just in its ability to provide recommendations, but in its capacity to learn, adapt, and evolve alongside your infrastructure, bridging the gap between complex technical knowledge and actionable insights. As cloud technologies continue to advance, AI-powered tools like Amazon Q Developer will become increasingly essential, signifying a fundamental shift in how we approach cloud infrastructure: proactively, intelligently, and with a holistic understanding of technological and business requirements.

To get started with Amazon Q Developer in the AWS console, check out the documentation on chatting with Amazon Q Developer in AWS Console Home.

Migrating a CDK v1 Application to CDK v2 with Amazon Q Developer

2025-04-30 Dr. Rahul Sharad Gaikwad

Post Syndicated from Dr. Rahul Sharad Gaikwad original https://aws.amazon.com/blogs/devops/migrating-a-cdk-v1-application-to-cdk-v2-with-amazon-q-developer/

Introduction:

AWS Cloud Development Kit (AWS CDK) is an open-source software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation. As of June 1, 2023, AWS CDK version 1 is no longer supported. To avoid the potential issues that come with using an outdated version and to take advantage of the latest features and improvements, we highly recommend upgrading to AWS CDK version 2.

Amazon Q Developer, a generative AI-powered assistant for software development, enhances the efficiency of software development teams. It facilitates the creation of deployment-ready infrastructure as code (IaC) for AWS CloudFormation, AWS CDK, and Terraform. By using Amazon Q, developers can accelerate IaC development, enhance code quality, and decrease the likelihood of configuration errors.

This post demonstrates how Amazon Q Developer helps in upgrading the existing AWS CDK v1 application to AWS CDK v2.

Prerequisites

An AWS Builder ID or an AWS IAM Identity Center login controlled by your organization
A supported IDE, like Visual Studio Code
The AWS Toolkit IDE extension
Authenticate and Connect
Nodejs
AWS CDK v1
AWS CDK v2

Planning

In this blog post, I will explore a code example where I have created a VPC, Subnets, and an ECS Fargate cluster using AWS CDK version 1. I will then explain how you can use Amazon Q to transform the code from CDK v1 to CDK v2.

1. In order to initiate this process, I have begun by asking Amazon Q Developer for the necessary steps to migrate from CDK version 1 to version 2, which are outlined below.

Can you provide the steps to migrate from cdk version 1 to version 2?

Amazon Q Developer outlining the comprehensive process to upgrade AWS CDK applications from version 1 to version 2.

2. In the above screenshot Amazon Q Developer outlined several steps we can take to make the necessary changes. The first step is to update the dependencies. If I need guidance on how to update the dependencies, I can ask the Amazon Q Developer again for help by asking the steps regarding updating dependencies as below .

Can you provide the steps to update dependencies?

Amazon Q Developer offering detailed, AI-powered guidance to upgrade project dependencies by analyzing the existing codebase, identifying outdated or deprecated libraries and frameworks, and recommending precise updates to ensure compatibility with newer language versions.

3. After updating the dependencies, the next step is to update the import statements. To get guidance on how to update the import statements, I can ask the Amazon Q Developer assistant again for help by asking the steps regarding how to import statements as shown below.

@workspace Can you provide the steps to update import statements?

Amazon Q Developer advises on updating import statements by analyzing the current code context and guiding developers to replace legacy or outdated import paths with the latest.

In the above screenshot if you have noticed I have added @workspace before the question which automatically includes the most relevant chunks of my workspace code as context.

4. If any errors occur while updating the code as recommended by Amazon Q Developer, I can use Amazon Q Developer to debug the issue and provide the needed inputs to resolve it.

Amazon Q Developer diagnosing issues by analyzing error messages and AWS resource states, providing natural language explanations of root causes such as permission errors and misconfigurations.

5. Once I have finished the required steps, I can deploy the application using version 2 of the AWS CDK by running the cdk deploy command.

Deployment of the updated AWS CDK version 2 application, involving synthesizing CDK stacks to generate CloudFormation templates and deployment artifacts, bootstrapping the AWS environment to provision necessary resources.

6. In addition to its other capabilities, Amazon Q offers code review functionality. To initiate a code review, simply select Amazon Q and use the /review command. I’ll then have the option to review either the active files or the entire open workspace. Select your preference, and Amazon Q will analyze your project and provide comprehensive review results.

Amazon Q Developer performs comprehensive code analysis by reviewing your entire codebase or real-time code as you write, identifying security vulnerabilities, code quality issues, and deployment risks.

7. Amazon Q Developer can also generate documentation, including README files. To create documentation, select Amazon Q and enter the /doc command. Amazon Q will automatically generate a README file for your project. I can then review the generated documentation, accept the changes, or provide specific instructions for further modifications.

Amazon Q Developer automatically generates a comprehensive README file for the entire project by analyzing the codebase, project structure, and dependencies within the selected folder in the IDE.

Conclusion

In this blog, I demonstrated how Amazon Q Developer can simplify and accelerate the upgrade process from AWS CDK version 1 to version 2, ensuring your cloud infrastructure remains secure, efficient, and aligned with the latest AWS innovations. AWS CDK v2 offers a streamlined, consolidated library with improved performance and ongoing support, making infrastructure management easier and more reliable.

By leveraging Amazon Q Developer, a generative AI-powered assistant, teams can automate Infrastructure as Code development, enhance code quality, and minimize configuration errors. Together, these tools empower development teams to confidently modernize and scale their AWS environments, turning the upgrade process into a seamless opportunity for innovation and growth.

Resources

To learn more about Amazon Q Developer, see the following resources:

To learn more about the AWS CDK, see the following resources:

About the authors:

WhatsApp Case Against NSO Group Progressing

2025-04-30 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/whatsapp-case-against-nso-group-progressing.html

Meta is suing NSO Group, basically claiming that the latter hacks WhatsApp and not just WhatsApp users. We have a procedural ruling:

Under the order, NSO Group is prohibited from presenting evidence about its customers’ identities, implying the targeted WhatsApp users are suspected or actual criminals, or alleging that WhatsApp had insufficient security protections.

[…]

In making her ruling, Northern District of California Judge Phyllis Hamilton said NSO Group undercut its arguments to use evidence about its customers with contradictory statements.

“Defendants cannot claim, on the one hand, that its intent is to help its clients fight terrorism and child exploitation, and on the other hand say that it has nothing to do with what its client does with the technology, other than advice and support,” she wrote. “Additionally, there is no evidence as to the specific kinds of crimes or security threats that its clients actually investigate and none with respect to the attacks at issue.”

I have written about the issues at play in this case.

Applying Security Engineering to Prompt Injection Security

2025-04-29 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/applying-security-engineering-to-prompt-injection-security.html

This seems like an important advance in LLM security against prompt injection:

Google DeepMind has unveiled CaMeL (CApabilities for MachinE Learning), a new approach to stopping prompt-injection attacks that abandons the failed strategy of having AI models police themselves. Instead, CaMeL treats language models as fundamentally untrusted components within a secure software framework, creating clear boundaries between user commands and potentially malicious content.

[…]

To understand CaMeL, you need to understand that prompt injections happen when AI systems can’t distinguish between legitimate user commands and malicious instructions hidden in content they’re processing.

[…]

While CaMeL does use multiple AI models (a privileged LLM and a quarantined LLM), what makes it innovative isn’t reducing the number of models but fundamentally changing the security architecture. Rather than expecting AI to detect attacks, CaMeL implements established security engineering principles like capability-based access control and data flow tracking to create boundaries that remain effective even if an AI component is compromised.

Research paper. Good analysis by Simon Willison.

I wrote about the problem of LLMs intermingling the data and control paths here.

Windscribe Acquitted on Charges of Not Collecting Users’ Data

2025-04-28 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/windscribe-acquitted-on-charges-of-not-collecting-users-data.html

The company doesn’t keep logs, so couldn’t turn over data:

Windscribe, a globally used privacy-first VPN service, announced today that its founder, Yegor Sak, has been fully acquitted by a court in Athens, Greece, following a two-year legal battle in which Sak was personally charged in connection with an alleged internet offence by an unknown user of the service.

The case centred around a Windscribe-owned server in Finland that was allegedly used to breach a system in Greece. Greek authorities, in cooperation with INTERPOL, traced the IP address to Windscribe’s infrastructure and, unlike standard international procedures, proceeded to initiate criminal proceedings against Sak himself, rather than pursuing information through standard corporate channels.

Friday Squid Blogging: Squid Facts on Your Phone

2025-04-26 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/friday-squid-blogging-squid-facts-on-your-phone.html

Text “SQUID” to 1-833-SCI-TEXT for daily squid facts. The website has merch.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Cryptocurrency Thefts Get Physical

2025-04-25 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/cryptocurrency-thefts-get-physical.html

Long story of a $250 million cryptocurrency theft that, in a complicated chain events, resulted in a pretty brutal kidnapping.

New Linux Rootkit

2025-04-24 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/new-linux-rootkit.html

Interesting:

The company has released a working rootkit called “Curing” that uses io_uring, a feature built into the Linux kernel, to stealthily perform malicious activities without being caught by many of the detection solutions currently on the market.

At the heart of the issue is the heavy reliance on monitoring system calls, which has become the go-to method for many cybersecurity vendors. The problem? Attackers can completely sidestep these monitored calls by leaning on io_uring instead. This clever method could let bad actors quietly make network connections or tamper with files without triggering the usual alarms.

Here’s the code.

Note the self-serving nature of this announcement: ARMO, the company that released the research and code, has a product that it claims blocks this kind of attack.

Amazon introduces SWE-PolyBench, a multilingual benchmark for AI Coding Agents

2025-04-23 Christian Bock

Post Syndicated from Christian Bock original https://aws.amazon.com/blogs/devops/amazon-introduces-swe-polybench-a-multi-lingual-benchmark-for-ai-coding-agents/

Coding agents powered by large language models have shown impressive capabilities in software engineering tasks, but evaluating their performance across diverse programming languages and real-world scenarios remains challenging. This led to a recent explosion in benchmark creation to assess the coding effectiveness of said systems in controlled environments. In particular, SWE-Bench which measures the performance of systems in the context of GitHub issues has spurred the development of capable coding agents resulting in over 50 leaderboard submissions, thereby becoming the de-facto standard for coding agent benchmarking. Despite its significant impact as a pioneering benchmark, SWE-Bench, and in particular its “verified” subset, also shows some limitations. It contains only Python repositories, the majority of tasks are bug fixes, and at over 45% of all tasks, the Django repository is significantly over-represented.

Today, Amazon introduces SWE-PolyBench, the first industry benchmark to evaluate AI coding agents’ ability to navigate and understand complex codebases, introducing rich metrics to advance AI performance in real-world scenarios. SWE-PolyBench contains over 2,000 curated issues in four languages. In addition, it contains a stratified subset of 500 issues (SWE-PolyBench500) for the purpose of rapid experimentation. SWE-PolyBench evaluates the performance of AI coding agents through a comprehensive set of metrics: pass rates across different programming languages and task complexity levels, along with precision and recall measurements for code/file context identification. These evaluation metrics can help the community address challenges in understanding how well AI coding agents can navigate through and comprehend complex codebases

The leaderboard is accessible here. The SWE-PolyBench dataset is available on Hugging Face and the paper at arxiv. Evaluations can be run using the SWE-PolyBench codebase.

Below, we describe the key features, characteristics, and the creation process of our dataset alongside the new evaluation metrics, and performance of open source agents from our experiments.

Key features of SWE-PolyBench at a glance

Multi-Language Support: Java (165 tasks), JavaScript (1017 tasks), TypeScript (729 tasks), and Python (199 tasks).
Extensive Dataset: 2110 instances from 21 repositories ranging from web frameworks to code editors and ML tools, on the same scale as SWE-Bench full with more repository.
Task Variety: Includes bug fixes, feature requests, and code refactoring.
Faster Experimentation: SWE-PolyBench500 is a stratified subset for efficient experimentation.
Leaderboard: A leaderboard with a rich set of metrics for transparent benchmarking.

Building a comprehensive dataset

The creation of SWE-PolyBench involved a data collection and filtering process designed to ensure the quality and relevance of the benchmark tasks. SWE-Bench, a benchmark for Python code generation, evaluates agents on real-world programming tasks by utilizing GitHub issues and their corresponding code and test modifications. We extended the SWE-Bench data acquisition pipeline to support 3 additional languages besides Python and used it to gather and process coding challenges from real-world repositories as shown in Figure 1.

A flowchart diagram showing a software development process. It starts with an issue (#3039) and pull request (#3147) on the left, goes through a metadata filter in the middle, then splits into a runtime setup and testing phase on the right. The testing phase feeds into a test-based filter at the end. The diagram includes icons for programming languages like JavaScript, TypeScript, Python, and Java.

Figure 1: Overview of the SWE-PolyBench data generation pipeline, illustrating the process of collecting, filtering, and validating coding tasks.

The data acquisition pipeline collects pull requests (PRs) that close issues from popular repositories across Java, JavaScript, TypeScript, and Python. These PRs undergo filtering and are set up in containerized environments for consistent test execution. The process categorizes tests as fail-to-pass (F2P) or pass-to-pass (P2P) based on their outcomes before and after patch application. Only PRs with at least one F2P test are included in the final dataset, ensuring that each task represents a meaningful coding challenge. This streamlined approach results in a dataset that closely mimics real-world coding scenarios, providing a robust foundation for evaluating AI coding assistants.

Dataset characteristics

When constructing SWE-PolyBench, we aimed to collect GitHub issues that represent diverse programming scenarios: issues involving modifications across multiple code files and spanning different task categories (such as bug fixes, feature requests, and refactoring). Tables 1 and 2 provide descriptive statistics on the composition and complexity of SWE-PolyBench full (PB) and SWE-PolyBench500 (PB500). To offer a point of reference, we compare these statistics with those of SWE-Bench (SWE) and SWE-Bench verified (SWEv). Tasks in SWE-PolyBench require on average more files to be modified and more nodes to be changed, which indicates that they have higher complexity and are closer to tasks in real-world projects. The distribution of tasks is also more diverse, in particular for SWE-PolyBench500.

A comparison table showing statistics for different software benchmarks (SWE-PolyBench, SWE-PolyBench500, SWE-Bench, and SWE-Bench verified). The table has two main sections: Modified Files showing average changes across programming languages (Python, Java, JavaScript, TypeScript), and Task Category distribution showing percentages for Bug Fix, Feature Request, Refactoring, and Miscellaneous tasks

New evaluation metrics

To comprehensively evaluate AI coding assistants, SWE-Polybench introduces multiple new metrics in addition to the pass rate. The pass rate is the proportion of tasks successfully solved as measured by the generated patch passing all relevant tests. It is the primary metric for assessing coding agent performance, but it doesn’t provide a complete picture of an agent’s capabilities. In particular, it doesn’t give much information on an agent’s ability to navigate and understand complex code repositories. SWE-PolyBench introduces a new set of metrics based on Concrete Syntax Tree (CST) node analysis and the established file-level localization metric:

File-level Localization: assesses the agent’s ability to identify the correct files that need to be modified within a repository. Let us assume that we would need to modify file.py to solve our problem. If our coding agent implements a change in any other file, it would receive a file retrieval score of 0.
CST Node-level Retrieval: evaluates the agent’s ability to identify specific code structures that require changes. It uses the Concrete Syntax Tree (CST) representation of the code to measure how accurately the agent can locate the exact functions or classes that need modification.

A side-by-side comparison showing two Git version control diffs. Each diff shows a line being removed (in red, prefixed with '-') where my_var equals 3, and a line being added (in green, prefixed with '+') where my_var equals 2. Above the diffs are connected dots in different colors (green, pink, blue, and yellow) representing Git commit history visualization.

Figure 2: Illustration of CST node changes.

In Figure 2, we see a change in class node A materialized by a change in its initialization function on the left path starting from the file node. In contrast to the first change, the change in class B is considered a function node change as it doesn’t impact class construction.

Let us assume the change that would solve our problem is the change in the __init__ function. If our coding agent implements the change in my_func, it receives both a class and function node retrieval score of 0.

By combining pass rate assessment with both file-level and CST node-level retrieval metrics, SWE-PolyBench offers a detailed evaluation of AI coding assistants’ capabilities in real-world scenarios. This approach provides deeper insights into how well agents navigate and comprehend complex codebases, going beyond simple task completion to assess their understanding of code structure and organization.

Performance of open-source coding agents

Key Findings

Language Proficiency: Python is the strongest language for all agents, likely due to its prevalence in training data and existing benchmarks.
Complexity Challenges: Performance degrades as task complexity increases, particularly when modifications to 3 or more files are required.
Task Specialization: Different agents show strengths in various task categories (bug fixes, feature requests, refactoring).
Context Importance: The informativeness of problem statements impacts success rates across all agents (refer to Figure 5 of the appendix paper for details about this analysis).

Many existing open-source agents are designed primarily for Python. Adapting them to work for all four languages of SWE-PolyBench required adjusting test execution commands, modifying parsing mechanisms, and adapting containerization strategies for each language. We adapted and evaluated three open-source agents on SWE-PolyBench. The aforementioned adjustments are reflected by the added “-PB” suffix to the original agent names.

Two radar charts comparing three AI models: Aider-PB Sonnet 3.5, Agentless-PB Sonnet 3.5, and SWE-agent-PB Sonnet 3.5. The left chart shows performance across programming languages (Java, JavaScript, TypeScript, Python). The right chart displays performance in different coding styles (Functional only, Single Function, All, Mixed, No nodes, Single Class, Class only). Each model is represented by a different colored line, with Aider-PB generally showing the highest performance across categories.

Figure 3: Performance of coding agents across programming languages and task complexities, highlighting strengths and areas for improvement.

Figure 3 provides a visual representation of agent performance across different dimensions:

Language Proficiency: The left side of the chart shows that all three agents perform best in Python, with significantly lower pass rates in other languages. This highlights the current bias towards Python in many coding agents and their underlying large language models.
Task Complexity: The right side of the chart illustrates how performance degrades as task complexity increases. Agents show higher pass rates for tasks involving single class or function changes, but struggle with tasks requiring modifications to multiple classes or functions and in instances where both class and function changes are required.

This comprehensive view of agent performance underscores the value of SWE-PolyBench in identifying specific strengths and weaknesses of different coding assistants, paving the way for targeted improvements in future iterations.

In addition to these insights, the evaluation revealed interesting patterns across different task categories as shown in Table 2. The performance data across bug fixes, feature requests, and refactoring tasks reveals varying strengths among AI coding assistants. The performance on bug fixing tasks is relatively consistent. There is more variability between different agents and between multiple runs of a given agent for feature request tasks and refactoring tasks.

Table 3 showing average pass rates with standard error by task category for three agents: Agentless-PB, SWE-Agent-PB, and Aider-PB. The task categories are Bug Fix, Feature Request, and Refactoring. Aider-PB has the highest pass rates for Bug Fix (13.8) and Feature Request (15.1), while SWE-Agent-PB leads in Refactoring (16.1). Standard errors are provided for each value.

Join the SWE-PolyBench community

SWE-PolyBench and its evaluation framework are publicly available. This open approach invites the global developer community to build upon this work and advance the field of AI-assisted software engineering. As coding agents continue to evolve, benchmarks like SWE-PolyBench play a crucial role in ensuring they can meet the diverse needs of real-world software development across multiple programming languages and task types.

Explore SWE-PolyBench today and contribute to the future of AI-powered software engineering!

Resources

Authors

Regulating AI Behavior with a Hypervisor

2025-04-23 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/regulating-ai-behavior-with-a-hypervisor.html

Interesting research: “Guillotine: Hypervisors for Isolating Malicious AIs.”

Abstract:As AI models become more embedded in critical sectors like finance, healthcare, and the military, their inscrutable behavior poses ever-greater risks to society. To mitigate this risk, we propose Guillotine, a hypervisor architecture for sandboxing powerful AI models—models that, by accident or malice, can generate existential threats to humanity. Although Guillotine borrows some well-known virtualization techniques, Guillotine must also introduce fundamentally new isolation mechanisms to handle the unique threat model posed by existential-risk AIs. For example, a rogue AI may try to introspect upon hypervisor software or the underlying hardware substrate to enable later subversion of that control plane; thus, a Guillotine hypervisor requires careful co-design of the hypervisor software and the CPUs, RAM, NIC, and storage devices that support the hypervisor software, to thwart side channel leakage and more generally eliminate mechanisms for AI to exploit reflection-based vulnerabilities. Beyond such isolation at the software, network, and microarchitectural layers, a Guillotine hypervisor must also provide physical fail-safes more commonly associated with nuclear power plants, avionic platforms, and other types of mission critical systems. Physical fail-safes, e.g., involving electromechanical disconnection of network cables, or the flooding of a datacenter which holds a rogue AI, provide defense in depth if software, network, and microarchitectural isolation is compromised and a rogue AI must be temporarily shut down or permanently destroyed.

The basic idea is that many of the AI safety policies proposed by the AI community lack robust technical enforcement mechanisms. The worry is that, as models get smarter, they will be able to avoid those safety policies. The paper proposes a set technical enforcement mechanisms that could work against these malicious AIs.

Android Improves Its Security

2025-04-22 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/android-improves-its-security.html

Android phones will soon reboot themselves after sitting idle for three days. iPhones have had this feature for a while; it’s nice to see Google add it to their phones.

Friday Squid Blogging: Live Colossal Squid Filmed

2025-04-19 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/friday-squid-blogging-live-colossal-squid-filmed.html

A live colossal squid was filmed for the first time in the ocean. It’s only a juvenile: a foot long.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Accelerate large-scale modernization of .NET, mainframe, and VMware workloads using Amazon Q Developer

2025-04-16 Krishna Parab

Post Syndicated from Krishna Parab original https://aws.amazon.com/blogs/devops/accelerate-large-scale-modernization-of-net-mainframe-and-vmware-workloads-using-amazon-q-developer/

Software runs the world – not just the new software applications built in modern languages and deployed on the most optimized cloud infrastructure, but also legacy software built over years and barely understood by the teams that inherit them. These legacy applications may have snowballed into monolithic blocks or may be fragmented across siloed on-premises infrastructure. The significant maintenance, security, and compliance challenges caused can create lasting implications for business performance and competitiveness. Therefore, transformation of legacy applications using modern languages, new frameworks, and cloud services has become an organizational imperative.

Application modernization challenges

Modernization of software applications is a long and painful journey – requiring large teams of developers, domain experts, and consultants who first need to understand the application landscape, devise strategic modernization plans, and then tactically implement the plans in phases, typically over a span of many years. This process is linear, slow, and complex. Traditional labor-intensive modernization approaches incur significant costs and take years to leverage new cloud technologies and innovations for business-critical applications.

Generative AI can help with intelligent automation, domain expertise, and scalability to transform modernization journeys.

Introducing Amazon Q Developer transformation capabilities

Q Developer transformation capabilities powered by LLMs and domain-expert agents support human-agent interaction via an IDE experience for individual developers and a web experience for multifunctional teams.

Amazon Q Developer transformation capabilities

Amazon Q Developer, the most capable generative AI–powered assistant for software development, is now the first generative AI-powered assistant for large-scale modernization and migration of .NET, mainframe, and VMware workloads. This extends Q Developer’s transformation capabilities for Java upgrades launched in April 2024 to new types of workloads. Q Developer combines both foundational models and specialized tools based on AI and automated reasoning via autonomous agents that tackle workload-specific modernization steps spanning analysis, planning, and implementation.

Multifunctional teams, including consultants, IT experts, workload domain experts, and developers, can use a unified web experience to offload transformation tasks to Amazon Q Developer agents and transform hundreds of workloads at a time. The agents can port .NET Framework to cross-platform Linux-ready .NET, modernize COBOL applications on mainframes to Java applications on AWS, or virtualized workloads on VMware to scalable workloads on EC2. The modernization teams engage with Q Developer using natural language and share transformation objectives, code repositories, and context. Q Developer agents analyze artifacts like code segments, dependencies, and integrations, applying expertise from prior modernizations. They propose customized plans tailored to codebases, resource utilization, and objectives. The teams can then review, adjust, and approve the plans with iterative engagement with the agents. After the plans are approved, the agents implement the transformation keeping the modernization teams updated on milestones completed and blockers needing human guidance. The transformation journey is an interactive process between the modernization team and Q Developer, with modernization team maintaining control and visibility over the transformation.

Human team members interact with Q Developer generative AI agents using natural language chat.

Natural language chat with Q Developer AI agents

Faster, scalable, and better modernization

Amazon Q Developer enhances transformation in three primary ways – acceleration, scalability, and quality.

Amazon Q Developer automates and accelerates complex, multi-step processes. Agents conduct assessment and discovery of legacy artifacts to build documentation and dependency maps that improve the understanding of source assets. Most large-scale modernization projects are done in waves that need to be carefully planned. The agents develop modernization wave plans based on source dependencies, stated project goals, and teams can review and approve the plans. Thereafter, the goal-seeking autonomous agents handle implementation complexities to execute the plans. Customers using Amazon Q Developer can modernize Windows .NET applications to Linux up to four times faster than traditional methods and help customers realize up to 40% savings in licensing costs. Migration Planning for the sequence to transform monolith z/OS COBOL application code that takes months to accomplish with human subject matter experts, Amazon Q Developer generates in minutes. Q Developer agents convert on-premises VMware network configurations into modern AWS equivalents in hours vs. the weeks required with traditional manual approaches. The shorter time spent on manual modernization means more freedom for your team to focus on innovation.

Modernization has traditionally been a linear journey with multiple steps and dependencies on cross-functional teams with limited mechanisms for collaboration. This limits teams’ ability to tackle large-scale projects. Amazon Q Developer addresses the challenges by task parallelization and web-based collaboration. Multiple generative AI agents work simultaneously on tasks. Large monolithic applications can be decomposed along business functions like engineering, marketing, sales applications, and transformed in parallel. A unified web-based experience for large-scale transformation means multi-functional team members can collaborate with the autonomous agents, and review and approve key decisions in one place, enabling teams to execute larger and more complex projects in a given time.

Finally, the quality of transformation manifested in functional equivalence, security, and resilience of modernized applications determines the business outcomes like project ROI and operational performance. To ensure transformation quality, you need expertise in languages and frameworks like COBOL, Java, .NET; specialized steps like code base analysis, monolith decomposition, code refactoring, network translation; and domains like mainframe, virtualization, and cloud. You may not have the requisite expertise in your team. That is where Amazon Q Developer can help. Q Developer agents are trained with specific domain expertise to identify code dependencies and frameworks, replace deprecated code, upgrade to new language frameworks, incorporate security best practices, and validate upgraded workloads using workload-tailored plans. Your team can examine the agents’ recommendations, make informed decisions, and guide the modernization journey towards better outcomes like enhanced security, compliance, and performance.

Q Developer supports modernization of .NET Framework applications to cross-platform .NET applications, mainframe-based COBOL applications to Java applications on AWS, on-premises VMware workloads to workloads on EC2, and Java v8/11/17 to Java17/21.

Workloads supported by Amazon Q Developer transformation capabilities

Next steps

Amazon Q Developer transformation capabilities are now available in preview. To learn more, please visit Q Developer web page featuring short demo videos and documentation that can get you started. Read the AWS News blogs that walk you through the unified web experience and IDE experience. Dive deeper into the transformation of specific workloads by reading the workload-specific blogs related to transformation of .NET, mainframe, and VMware workloads.

About the author:

CVE Program Almost Unfunded

2025-04-16 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/cve-program-almost-unfunded.html

Mitre’s CVE’s program—which provides common naming and other informational resources about cybersecurity vulnerabilities—was about to be cancelled, as the US Department of Homeland Security failed to renew the contact. It was funded for eleven more months at the last minute.

This is a big deal. The CVE program is one of those pieces of common infrastructure that everyone benefits from. Losing it will bring us back to a world where there’s no single way to talk about vulnerabilities. It’s kind of crazy to think that the US government might damage its own security in this way—but I suppose no crazier than any of the other ways the US is working against its own interests right now.

Sasha Romanosky, senior policy researcher at the Rand Corporation, branded the end to the CVE program as “tragic,” a sentiment echoed by many cybersecurity and CVE experts reached for comment.

“CVE naming and assignment to software packages and versions are the foundation upon which the software vulnerability ecosystem is based,” Romanosky said. “Without it, we can’t track newly discovered vulnerabilities. We can’t score their severity or predict their exploitation. And we certainly wouldn’t be able to make the best decisions regarding patching them.”

Ben Edwards, principal research scientist at Bitsight, told CSO, “My reaction is sadness and disappointment. This is a valuable resource that should absolutely be funded, and not renewing the contract is a mistake.”

He added “I am hopeful any interruption is brief and that if the contract fails to be renewed, other stakeholders within the ecosystem can pick up where MITRE left off. The federated framework and openness of the system make this possible, but it’ll be a rocky road if operations do need to shift to another entity.”

Slopsquatting

2025-04-15 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/slopsquatting.html

As AI coding assistants invent nonexistent software libraries to download and use, enterprising attackers create and upload libraries with those names—laced with malware, of course.

EDITED TO ADD (1/22): Research paper. Slashdot thread.

Upcoming Speaking Engagements

2025-04-14 B. Schneier

Post Syndicated from B. Schneier original https://www.schneier.com/blog/archives/2025/04/upcoming-speaking-engagements-45.html

This is a current list of where and when I am scheduled to speak:

I’m giving an online talk on AI and trust for the Weizenbaum Institute on April 24, 2025 at 2:00 PM CEST (8:00 AM ET).

The list is maintained on this page.

China Sort of Admits to Being Behind Volt Typhoon

2025-04-14 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/china-sort-of-admits-to-being-behind-volt-typhoon.html

The Wall Street Journal has the story:

Chinese officials acknowledged in a secret December meeting that Beijing was behind a widespread series of alarming cyberattacks on U.S. infrastructure, according to people familiar with the matter, underscoring how hostilities between the two superpowers are continuing to escalate.

The Chinese delegation linked years of intrusions into computer networks at U.S. ports, water utilities, airports and other targets, to increasing U.S. policy support for Taiwan, the people, who declined to be named, said.

The admission wasn’t explicit:

The Chinese official’s remarks at the December meeting were indirect and somewhat ambiguous, but most of the American delegation in the room interpreted it as a tacit admission and a warning to the U.S. about Taiwan, a former U.S. official familiar with the meeting said.

No surprise.

Announcing the European region for Amazon Q Developer

2025-04-14 Brian Beach

Post Syndicated from Brian Beach original https://aws.amazon.com/blogs/devops/amazon-q-developer-european-region/

As I sat down to write this post, my daughter called from the top of the Eiffel Tower on a trip with her high school class. While she excitedly pointed her camera toward the Parisian skyline, I was struck by how technology has transformed our concept of distance. Her world, at eighteen, is infinitely more connected than the one I knew at her age. I couldn’t help but smile at the timing of this call, because today Amazon Q Developer is expanding to Europe.

The launch of Amazon Q Developer Pro Tier in the Frankfurt (eu-central-1) region marks a significant milestone for our European customers, addressing two critical needs: data residency and performance optimization. For organizations that need to meet EU data residency requirements, the ability to store customer content within EU boundaries can help provide the assurances they require. Beyond compliance, this regional presence brings performance benefits. European customers will experience reduced latency in their interactions with Amazon Q Developer, as requests are processed closer to home. This proximity not only improves response times but also enhances the overall development experience, making real-time interactions with Amazon Q Developer more fluid and natural.

Amazon Q Developer Pro tier users now have the choice of creating a profile in N. Virginia (us-east-1) or Frankfurt (eu-central-1). Associated data – including customizations – is stored in this region. While data is stored in Frankfurt, Amazon Q utilizes cross-region inferencing to optimize request processing. At launch, this includes Frankfurt, Ireland, Paris and Stockholm, as shown in the following image.

A map of Western Europe showing connections between four cities: Frankfurt (shown as a central hub with concentric circles) connected by curved orange lines to Stockholm (Sweden), Ireland, and Paris (France). The map has a dark background with countries shown in gray.

Finally, it is important to note that certain operations, such as querying AWS resources in other regions (e.g. “List my S3 buckets in Tokyo”), will naturally involve cross-region calls regardless of your Q Developer profile’s location.

The Frankfurt region includes all GA features except the command line and the ability to chat with Support. You can read more in the Amazon Q Developer User Guide. We invite you to experience these new capabilities by upgrading to the Pro tier and selecting Frankfurt as your region during profile creation. Get started with Amazon Q Developer, and share your feedback with us as we continue to expand our global presence.

Friday Squid Blogging: Squid and Efficient Solar Tech

2025-04-11 Bruce Schneier

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2025/04/friday-squid-blogging-squid-and-efficient-solar-tech.html

Researchers are trying to use squid color-changing biochemistry for solar tech.

This appears to be new and related research to a 2019 squid post.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Simplifying the AWS Well-Architected Framework with Amazon Q Developer

Operational Excellence pillar with Amazon Q Developer

Security pillar with Amazon Q Developer

Reliability pillar with Amazon Q Developer

Cost Optimization pillar with Amazon Q Developer

Sustainability pillar with Amazon Q Developer

Conclusion

Introduction:

Prerequisites

Planning

Conclusion

Resources

Key features of SWE-PolyBench at a glance

Building a comprehensive dataset

Dataset characteristics

New evaluation metrics

Performance of open-source coding agents

Key Findings

Join the SWE-PolyBench community

Resources

Authors

Application modernization challenges

Introducing Amazon Q Developer transformation capabilities

Faster, scalable, and better modernization

Next steps

The collective thoughts of the interwebz