Tag Archives: provable security

An unexpected discovery: Automated reasoning often makes systems more efficient and easier to maintain

2024-10-17 Byron Cook

Post Syndicated from Byron Cook original https://aws.amazon.com/blogs/security/an-unexpected-discovery-automated-reasoning-often-makes-systems-more-efficient-and-easier-to-maintain/

During a recent visit to the Defense Advanced Research Projects Agency (DARPA), I mentioned a trend that piqued their interest: Over the last 10 years of applying automated reasoning at Amazon Web Services (AWS), we’ve found that formally verified code is often more performant than the unverified code it replaces.

The reason is that the bug fixes we make during the process of formal verification often positively impact the code’s runtime. Automated reasoning also gives our builders confidence to explore additional optimizations that improve system performance even further. We’ve found that formally verified code is easier to update, modify, and operate, leading to fewer late-night log analysis and debugging sessions. In this post, I’ll share three examples that came up during my discussions with DARPA.

Automated reasoning: The basics

At AWS, we strive to build services that are simple and intuitive for our customers. Underneath that simplicity lie vast, complex distributed systems that process billions of requests every second. Verifying the correctness of these complex systems is a significant challenge. Our production services are in a constant state of evolution as we introduce new features, redesign components, enhance security, and optimize performance. Many of these changes are complex themselves, and must be made without impacting the security or resilience of AWS or our customers.

Design reviews, code audits, stress testing, and fault injection are all invaluable tools we use regularly, and always will. However, we’ve found that we need to supplement these techniques in order to confirm correctness in many cases. Subtle bugs can still escape detection, particularly in large-scale, fault-tolerant architectures. And some issues might even be rooted in the original system design, rather than implementation flaws. As our services have grown in scale and complexity, we’ve had to supplement traditional testing approaches with more powerful techniques based on math and logic. This is where the branch of artificial intelligence (AI) called automated reasoning comes into play.

While traditional testing focuses on validating system behavior under specific scenarios, automated reasoning aims to use logic to verify system behavior under any possible scenario. In even a moderately complex system, it would take an intractably large amount of time to reproduce every combination of possible states and parameters that may occur. With automated reasoning, it’s possible to achieve the same effect quickly and efficiently by computing a logical proof of the correctness of the system.

Using automated reasoning requires our builders to have a different mindset. Instead of trying to think about all possible input scenarios and how they might go wrong, we define how the system should work and identify the conditions that must be met in order for it to behave correctly. Then we can verify that those conditions are true by using mathematical proof. In other words, we can verify that the system is correct.

Automated reasoning views a system’s specification and implementation in mathematics, then applies algorithmic approaches to verify that the mathematical representation of the system satisfies the specification. By encoding our systems as mathematical systems and reasoning about them using formal logic, automated reasoning allows us to efficiently and authoritatively answer critical questions about the systems’ future behavior. What can the system do? What will it do? What can it never do? Automated reasoning can help answer these questions for even the most complex, large-scale, and potentially unbounded systems—scenarios that are impossible to exhaustively validate through traditional testing alone.

Does automated reasoning allow us to achieve perfection? No, because it still depends on certain assumptions about the correct behavior of the components of a system and the relationship between the system and the model of its environment. For example, the model of a system might incorrectly assume that underlying components such as compilers and processors don’t have any bugs (although it is possible to formally verify those components as well). That said, automated reasoning allows us to achieve higher confidence in correctness than is possible by using traditional software development and testing methods.

Faster development

Automated reasoning is not just for mathematicians and scientists. Our Amazon Simple Storage Service (Amazon S3) engineers use automated reasoning every day to prevent bugs. Behind the simple interface of S3 is one of the world’s largest and most complex distributed systems, holding 400 trillion objects, exabytes of data, and regularly processing over 150 million requests per second. S3 is composed of many subsystems that are distributed systems in their own right, many consisting of tens of thousands of machines. New features are being added all the time, while S3 is under heavy use by our customers.

A key component of S3 is the S3 index subsystem, an object metadata store that enables fast data lookups. This component contains a very large, complex data structure and intricate, optimized algorithms. Because the algorithms are difficult for humans to get right at S3 scale, and because we can’t afford errors in S3 lookups, we made new improvements on a cadence of about once per quarter, due to the extreme care and extensive testing required to confidently make a change.

S3 is a well-built and well-tested system built on 15 years of experience. However, there was a bug in the S3 index subsystem for which we couldn’t determine the root cause for some time. The system was able to automatically recover from the exception, so its presence didn’t impact the behavior of the system. Still, we were not satisfied.

Why was this bug around so long? Distributed systems like S3 have a large number of components, each with their own corner cases, and a number of corner cases happen at the same time. In the case of S3, which has over 300 microservices, the number of potential combinations of these corner cases is enormous. It’s not possible for developers to think through each of these corner cases, even when they have evidence the bug exists and ideas about its root cause—never mind all of the possible combinations of corner cases.

This complexity drove us to look at how we could use automated reasoning to explore the possible states and errors that might be hidden in those states. By building a formal specification of the system, we were able to find the bug and prove the absence of further bugs of its type. Using automated reasoning also gave us the confidence to ship updates and improvements every one to two months rather than just three to four times a year.

Faster code

The correctness of the AWS Identity and Access Management (IAM) service is foundational to the security of our customers’ workloads. Across millions of customers, thousands of resource types, and hundreds of AWS services, every API call—every single request to AWS—is processed by the IAM authorization engine. That’s over 1.2 billion requests per second. This is some of the most security-critical and highly scaled software in AWS.

Before any change at AWS goes into production, we need an extremely high degree of confidence that the system remains secure and correct. Using automated reasoning, we can prove that our systems adhere to specific security properties, under an exhaustive number of circumstances. We call this provable security. Not only has automated reasoning enabled us to provide provable security assurance to our customers, it gives us the ability to deliver functionality, security, and optimization at scale.

Like S3, IAM has evolved over 15 years into a time-tested and trusted system. But we wanted to raise the bar further. We built a formal specification that captures the behavior of the existing IAM authorization engine, codified its policy evaluation principles into provable theorems, and used automated reasoning to build a new and more efficient implementation. Earlier this year, we deployed the new proved-correct authorization engine —and no one noticed. Automated reasoning allowed us to seamlessly replace one of the most critical pieces of AWS infrastructure, the authorization engine, with a proved-correct equivalent.

With the specification and proofs in place, we could safely and aggressively optimize the code with a high degree of confidence. At the massive scale of IAM, every microsecond of performance improvement translates into a better customer experience and better cost optimization for AWS. We optimized string matching, removed unnecessary memory allocation and redundant computations, strengthened security, and improved scalability. After every change, we re-ran our proofs to confirm that the system was still operating correctly.

The optimized IAM authorization engine is now 50% faster than its predecessor. We simply would not have been able to make these types of impactful optimizations with such confidence if we didn’t use automated reasoning. For a deeper look at how we did this, see this AWS re:Inforce session.

Faster deployment (of faster code)

Most secure online transactions are protected by encryption. For example, the RSA encryption algorithm protects data by generating two keys: one to encrypt the data, and one to decrypt it. These keys enable secure data transmission as well as secure digital signatures. In the context of encryption, correctness and performance are both essential—a bug in an encryption algorithm can be disastrous.

As AWS customers move their workloads to AWS Graviton, the benefits of optimizing cryptography for the ARM instruction set increase. But optimizing encryption for better performance is complex, which makes it difficult to verify that modified encryption algorithms are behaving properly. Before we started to use automated reasoning, optimizations to cryptography libraries often required months-long reviews to achieve confidence for release into production.

Enter the power of automated reasoning: formal verification made RSA faster, and faster to deploy. We are seeing similar improvements when we apply automated reasoning to elliptic curve cryptography.

The formation of a virtuous cycle

Over the last decade, we’ve increasingly applied automated reasoning techniques within AWS to prove the correctness of our cloud infrastructure and services. We routinely use these methods not only to verify correctness, but also to enhance security and reliability and minimize design flaws. Automated reasoning can be used to create a precise, testable model of a system, which we can use to quickly verify that changes are safe—or learn they are unsafe without causing harm in production.

We can answer critical questions about our infrastructure to detect misconfigurations that might expose data. We can help stop subtle but serious bugs from reaching production that we would not have found with other techniques. We can make bold performance optimizations that we would not have dared attempt without model checking. Automated reasoning provides rigorous mathematical assurance that critical systems behave as expected.

AWS is the first and only cloud provider to use automated reasoning at this scale. As adoption of automated reasoning tools increases, it becomes easier for us to justify ever-larger investments into improving the usability and scalability of automated reasoning tools. The easier it is to use the automated reasoning tools and the more powerful they become, the more adoption we’ve observed. The more we’re able to prove correctness of our cloud infrastructure, the more compelling the cloud is to security-obsessed customers. And, as the examples in this post illustrate, not only are we able to increase security assurance, we are delivering higher performant code to customers faster, translating into cost savings that we can eventually pass on to customers.

My prediction is that we’re in the beginning of an era in which critical properties like security, compliance, availability, durability, and safety can be proved automatically for large-scale cloud architectures. From preventing potential issues with AI hallucinations to analyzing hypervisors, cryptography, and distributed systems, having sound mathematical reasoning at our foundations and continuously analyzing what we build sets Amazon apart.

Learn more

Learn more about automated reasoning on the Amazon Science blog.
Understand how AWS applies automated reasoning to provide provable security.
If you’re interested in an internship with the AWS Automated Reasoning Group, let us know.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

AWS Security Profile – Cryptography Edition: Panos Kampanakis, Principal Security Engineer

2023-04-18 Roger Park

Post Syndicated from Roger Park original https://aws.amazon.com/blogs/security/aws-security-profile-panos-kampanakis/

In the AWS Security Profile — Cryptography Edition series, we interview Amazon Web Services (AWS) thought leaders who help keep our customers safe and secure. This interview features Panos Kampanakis, Principal Security Engineer, AWS Cryptography. Panos shares thoughts on data protection, cloud security, post-quantum cryptography, and more.

What do you do in your current role and how long have you been at AWS?

I have been with AWS for two years. I started as a Technical Program Manager in AWS Cryptography, where I led some AWS Cryptography projects related to cryptographic libraries and FIPS, but I’m currently working as a Principal Security Engineer on a team that focuses on applied cryptography, research, and cryptographic software. I also participate in standardization efforts in the security space, especially in cryptographic applications. It’s a very active space that can consume as much time as you have to offer.

How did you get started in the data protection/ cryptography space? What about it piqued your interest?

I always found cybersecurity fascinating. The idea of proactively focusing on security and enabling engineers to protect their assets against malicious activity was exciting. After working in organizations that deal with network security, application security, vulnerability management, and security information sharing, I found myself going back to what I did in graduate school: applied cryptography.

Cryptography is a constantly evolving, fundamental area of security that requires breadth of technical knowledge and understanding of mathematics. It provides a challenging environment for those that like to constantly learn. Cryptography is so critical to the security and privacy of data and assets that it is top of mind for the private and public sector worldwide.

How do you explain your job to your non-tech friends?

I usually tell them that my work focuses on protecting digital assets, information, and the internet from malicious actors. With cybersecurity incidents constantly in the news, it’s an easy picture to paint. Some of my non-technical friends still joke that I work as a security guard!

What makes cryptography exciting to you?

Cryptography is fundamental to security. It’s critical for the protection of data and many other secure information use cases. It combines deep mathematical topics, data information, practical performance challenges that threaten deployments at scale, compliance with various requirements, and subtle potential security issues. It’s certainly a challenging space that keeps evolving. Post-quantum or privacy preserving cryptography are examples of areas that have gained a lot of attention recently and have been consistently growing.

Given the consistent evolution of security in general, this is an important and impactful space where you can work on challenging topics. Additionally, working in cryptography, you are surrounded by intelligent people who you can learn from.

AWS has invested in the migration to post-quantum cryptography by contributing to post-quantum key agreement and post-quantum signature schemes to protect the confidentiality, integrity, and authenticity of customer data. What should customers do to prepare for post-quantum cryptography?

There are a few things that customers can do while waiting for the ratification of the new quantum-safe algorithms and their deployment. For example, you can inventory the use of asymmetric cryptography in your applications and software. Admittedly, this is not a simple task, but with proper subject matter expertise and instrumentation where necessary, you can identify where you’re using quantum-vulnerable algorithms in order to prioritize the uses. AWS is doing this exercise to have a prioritized plan for the upcoming migration.

You can also study and experiment with the potential impact of these new algorithms in critical use cases. There have been many studies on transport protocols like TLS, virtual private networks (VPNs), Secure Shell (SSH), and QUIC, but organizations might have unique uses that haven’t been accounted for yet. For example, a firm that specializes in document signing might require efficient signature methods with small size constraints, so deploying Dilithium, NIST’s preferred quantum-safe signature, could come at a cost. Evaluating its impact and performance implications would be important. If you write your own crypto software, you can also strive for algorithm agility, which would allow you to swap in new algorithms when they become available.

More importantly, you should push your vendors, your hardware suppliers, the software and open-source community, and cloud providers to adjust and enable their solutions to become quantum-safe in the near future.

What’s been the most dramatic change you’ve seen in the data protection and post-quantum cryptography landscape?

The transition from typical cryptographic algorithms to ones that can operate on encrypted data is an important shift in the last decade. This is a field that’s still seeing great development. It’s interesting how the power of data has brought forward a whole new area of being able to operate on encrypted information so that we can benefit from the analytics. For more information on the work that AWS is doing in this space, see Cryptographic Computing.

In terms of post-quantum cryptography, it’s exciting to see how an important potential risk brought a community from academia, industry, and research together to collaborate and bring new schemes to life. It’s also interesting how existing cryptography has reached optimal efficiency levels that the new cryptographic primitives sometimes cannot meet, which pushes the industry to reconsider some of our uses. Sometimes the industry might overestimate the potential impact of quantum computing to technology, but I don’t believe we should disregard the effect of heavier algorithms on performance, our carbon footprint, energy consumption, and cost. We ought to aim for efficient solutions that don’t undermine security.

Where do you see post-quantum cryptography heading in the future?

Post-quantum cryptography has received a lot of attention, and a transition is about to start ramping up after we have ratified algorithms. Although it’s sometimes considered a Herculian effort, some use cases can transition smoothly.

AWS and other industry peers and researchers have already evaluated some post-quantum migration strategies. With proper prioritization and focus, we can address some of the most important applications and gradually transition the rest. There might be some applications that will have no clear path to a post-quantum future, but most will. At AWS, we are committed to making the transitions necessary to protect our customer data against future threats.

What are you currently working on that you look forward to sharing with customers’?

I’m currently focused on bringing post-quantum algorithms to our customers’ cryptographic use cases. I’m looking into the challenges that this upcoming migration will bring and participating in standards and industry collaborations that will hopefully enable a simpler transition for everyone.

I also engage on various topics with our cryptographic libraries teams (for example, AWS-LC and s2n-tls). We build these libraries with security and performance in mind, and they are used in software across AWS.

Additionally, I work with some AWS service teams to help enable compliance with various cryptographic requirements and regulations.

Is there something you wish customers would ask you about more often?

I wish customers asked more often about provable security and how to integrate such solutions in their software. This is a fascinating field that can prevent serious issues where cryptography can go wrong. It’s a complicated topic. I would like for customers to become more aware of the importance of provable security especially in open-source software before adopting it in their solutions. Using provably secure software that is designed for performance and compliance with crypto requirements is beneficial to everyone.

I also wish customers asked more about why AWS made certain choices when deploying new mechanisms. In areas of active research, it’s often simpler to experimentally build a proof-of-concept of a new mechanism and test and prove its performance in a controlled benchmark scenario. On the other hand, it’s usually not trivial to deploy new solutions at scale (especially given the size and technological breadth of AWS), to help ensure backwards compatibility, commit to supporting these solutions in the long run, and make sure they’re suitable for various uses. I wish I had more opportunities to go over with customers the effort that goes into vetting and deploying new mechanisms at scale.

You have frequently contributed to cybersecurity publications, what is your favorite recent article and why?

I’m excited about a vision paper that I co-authored with Tancrède Lepoint called Do we need to change some things? Open questions posed by the upcoming post-quantum migration to existing standards and deployments. We are presenting this paper at the Security Standardisation Research Conference 2023. The paper discussed some open questions posed by the upcoming post-quantum transition. It also proposed some standards updates and research topics on cryptographic issues that we haven’t addressed yet.

How about outside of work—any hobbies?

I used to play basketball when I was younger, but I no longer have time. I spend most of my time with my family and little toddlers who have infinite amounts of energy. When I find an opportunity, I like reading books and short stories or watching quality films.