Modernizing a familiar approach to REST APIs, with PostgreSQL and Cloudflare Workers

Post Syndicated from Kristian Freeman original https://blog.cloudflare.com/modernizing-a-familiar-approach-to-rest-apis-with-postgresql-and-cloudflare-workers/

Postgres is a ubiquitous open-source database technology. It contains a vast number of features and offers rock-solid reliability. It’s also one of the most popular SQL database tools in the industry. As the industry builds “modern” developer experience tools—real-time and highly interactive—Postgres has also served as a great foundation. Projects like Hasura, which offers a real-time GraphQL engine, and Supabase, an open-source Firebase alternative, use Postgres under the hood. This makes Postgres a technology that every developer should know, and consider using in their applications.

For many developers, REST APIs serve as the primary way we interact with our data. Language-specific libraries like pg allow developers to connect with Postgres in their code, and directly interact with their databases. Yet in almost every case, developers reinvent the wheel, building the same connection logic on an app-by-app basis.

Many developers building applications with Cloudflare Workers, our serverless functions platform, have asked how they can use Postgres in Workers functions. Today, we’re releasing a new tutorial for Workers that shows how to connect to Postgres inside Workers functions. Built on PostgREST, you’ll write a REST API that communicates directly with your database, on the edge.

This means that you can entirely build applications on Cloudflare’s edge — using Workers as a performant and globally-distributed API, and Cloudflare Pages, our Jamstack deployment platform, as the host for your frontend user interface. With Workers, you can add new API endpoints and handle authentication in front of your database without needing to alter your Postgres configuration. With features like Workers KV and Durable Objects, Workers can provide globally-distributed caching in front of your Postgres database. Features like WebSockets can be used to build real-time interactions for your applications, without having to migrate from Postgres to a new database-as-a-service platform.

PostgREST is an open-source tool that generates a standards-compliant REST API for your Postgres databases. Many growing database-as-a-service startups like Retool and Supabase use PostgREST under the hood. PostgREST is fast and has great defaults, allowing you to access your Postgres data using standard REST conventions.

It’s great to be able to access your database directly from Workers, but do you really want to expose your database directly to the public Internet? Luckily, Cloudflare has a solution for this, and it works great with PostgREST: Cloudflare Tunnel. Cloudflare Tunnel is one of my personal favorite products at Cloudflare. It creates a secure tunnel between your local server and the Cloudflare network. We want to expose our PostgREST endpoint, without making our entire database available on the public internet. Cloudflare Tunnel allows us to do that securely.

By using PostgREST with Postgres, we can build REST API-based applications. In particular, it’s an excellent fit for Cloudflare Workers, our serverless function platform. Workers is a great place to build REST APIs. With the open-source JavaScript library postgrest-js, we can interact with a PostgREST endpoint from inside our Workers function, using simple JS-based primitives.
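
As a rough sketch of how this fits together (the PostgREST URL is a hypothetical placeholder for a Tunnel-backed endpoint, and the table name is illustrative), a Workers function using postgrest-js might look something like this:

// Minimal Workers sketch (module syntax) querying PostgREST via postgrest-js.
// https://postgrest.example.com is a placeholder for your Cloudflare Tunnel hostname.
import { PostgrestClient } from '@supabase/postgrest-js'

const client = new PostgrestClient('https://postgrest.example.com')

export default {
  async fetch(request) {
    // Query the "users" table through PostgREST instead of opening a raw database connection.
    const { data, error } = await client.from('users').select('*').limit(10)
    if (error) {
      return new Response(JSON.stringify({ error: error.message }), { status: 500 })
    }
    return new Response(JSON.stringify(data), {
      headers: { 'Content-Type': 'application/json' },
    })
  },
}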

By the way — if you haven’t built a REST API with Workers yet, check out our free video course with Egghead: “Building a Serverless API with Cloudflare Workers”.

Scaling applications built on Postgres is an incredibly common problem that developers face. Often, this means duplicating your Postgres database and distributing reads between your primary database, and a fleet of “read replicas”. With PostgREST and Workers, we can begin to explore a different approach to solving the scaling problem. Workers’ unique architecture allows us to deploy hyper-performant functions in front of Postgres databases. With tools like Workers KV and Durable Objects, exposed in Workers as basic JavaScript APIs, we can build intelligent caches for our databases, without sacrificing performance or developer experience.

If you’d like to learn more about building REST APIs in Cloudflare Workers using PostgREST, check out our new tutorial! We’ve also provided two open-source libraries to help you get started. cloudflare/postgres-postgrest-cloudflared-example helps you set up a Cloudflare Tunnel-backed Postgres + PostgREST endpoint. postgrest-worker-example is an example of using postgrest-js inside of Cloudflare Workers, to build REST APIs with your Postgres databases.

With postgrest-js, you can build dynamic queries and request data from your database using the JS primitives you know and love:

// Get all users with at least 100 followers
// (client is a PostgrestClient instance pointing at your PostgREST endpoint)
const { data: users, error } = await client
  .from('users')
  .select('*')
  .gte('followers', 100)

You can also join our Cloudflare Developers Discord community! Learn more about what you can build with Cloudflare Workers, and meet our wonderful community of developers from around the world. Get your invite link here.

Meet Laura Kampf: Wood and metalworker

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/meet-laura-kampf-wood-and-metalworker/

Laura Kampf, the Köln-based wood and metalworker with a mild tiny house and Leatherman obsession, sat down (virtually) with Alex Bate to talk about prison tattoo machines, avoiding your nightmares, and why aggressive hip-hop and horror movies inspire her weekly project builds.

Smudo the workshop dog was also there, which seems to be becoming a recurring and pleasant feature of HackSpace magazine interviews.

In five years, Laura has uploaded over 200 videos to YouTube

Alex: Your videos feel very unique in how they’re produced. It feels as though we’re in the workshop watching you get on with your day. That you’d be doing this regardless of whether the camera was there or not.

Laura: Yeah, that’s absolutely it. I mean, I document it for YouTube because I’m aware that this is the only place for me. And the documentation, that’s the work part, like setting up the camera, thinking about the story. But the physical work of building something, that’s a form of meditation. That’s just my happy place. And I know I have to document my work because I have to do something to make a living, right? I can’t just play. So YouTube is my work. But making is just, it’s just what I do, and I feel more and more that this is the only place for me.

And this is probably how musicians feel when they are performing on stage. You know, this – being in my shop, I feel so comfortable. And I feel so good. I don’t have that anywhere else. 

Subscribe to Laura Kampf on YouTube

I remember seeing that you went to design school. Is that where your journey as a maker started, or does creativity run in your family? I know your brother is creative (instagram.com/zooburger), but what about your parents?

My brother is super-creative, but my parents, not so much. My grandfather was an engineer. So I think it kind of skipped a generation.

In design school, there was a project where we had to build something out of everyday objects. And it was for us, the designers, to get away from the computers and just do something with our hands. I built a tattoo machine, like a prison-style tattoo machine. And I was hooked. I remember coming home and I was so moved by the whole thing. Even though the machine looks terrible, everything fell into place. 

Because all my life, people were telling me you need to find this one thing that you’re really good at and then just keep doing that. I think it’s also a German thing, you know, like, be perfect at one thing, and then you’ll be the best in your field. And I could never focus on one thing. But with building this tattoo machine, there were so many different things coming together.

I had all this interest in so many different fields and I could use them for the project – I enjoyed drawing fonts and learned how to do old-school tattoo lettering, and I could do a little bit of electronics to hook up a switch. All these things, I thought it was super-interesting. It was the first time I could just use little bits of everything I knew to make something that was really cool, and I was hooked after that.

Laura works mostly with wood and metal in her weekly videos

Have you tattooed yourself with the tattoo machine?

I wanted to and then, thank God, because I was really young, it’s very likely that I would have done that, a tattoo artist came by and I showed him the machine, and he was like, “Don’t do it. It’s running way too fast. You will make mincemeat out of your skin”. But I bought pig’s legs and pig’s ears and tattooed them. I couldn’t eat pig for probably two years after that. It was so warm, and tattooing the piece for a couple of hours, the fat was running out of it – it was disgusting.

It’s interesting that if you look at the stuff that you’re making now, the anchor point that started all of this is a prison tattoo machine.

Looking back, I remember the little things that I made; when I showed them to people, they just didn’t show the same excitement for them as I did. And it was such a disappointment until I realised that no, the stuff I was making was really bad. That’s why no one was excited, because I didn’t know what I was doing. Once I got better and better, and especially with YouTube and talking to the community – well, I’m preaching to the choir here; everyone knows making is fantastic, and we have a very focused, niche community – and they get it.

All good workshops need a dog

Do you feel a bit of a sense of responsibility being a woman in this community, being queer in this community? Two things that make you a minority in this field. Do you feel that affects your work at all?

I didn’t to begin with, I have to say. In the beginning, I felt more that it wasn’t about me, it was about the things that I make, and my sexuality and my gender don’t play a role in this. I don’t think about my sexuality all day long; I don’t think about the fact I’m a girl all day long, so why would it be in my videos? But I have to say that I changed my mind about these things. Because visibility is really important.

I had this really weird experience at the 10 Maker event a few years ago. I was wearing this T-shirt I got for free on one Christopher Street Day, it says ‘Gay Okay’. I love that shirt; it’s a really nice fit. I went to get some groceries with Brett from Skull and Spade and Hassan from HABU, and there was this girl, maybe eleven or twelve years old, and she saw me wearing that shirt, hanging out with regular dudes, doing regular stuff in a regular supermarket, and her jaw dropped. We were in the countryside, you don’t see things like rainbow flags there. And I could tell she’s maybe gay too, and it was so good for her to see that. There’s nothing different about you – you can still hang out with guys, you can still, you know, go shopping and all these things. That’s when I realised, institutions like Christopher Street Day are so important, but it’s also important to just have it integrated into regular stuff, not just special occasions. Today’s International Women’s Day? Well, we need to celebrate girls every day; every day you need to celebrate these things.

So, I kind of made it a habit to have rainbow flags in my videos. Not every video, and never super-obvious, but in the background, when I talk to the camera sometimes. I do wear my Gay Okay shirt every once in a while. I don’t want to make it a point because people like to put you in drawers. And, once you’re the queer maker, you’re the queer maker, and that’s all people want to talk about. And I don’t want that because I still think, at the end of the day, it’s about the things I build and not about me and my sexuality and gender. But, yeah, to just sprinkle it in every once in a while, I think it’s very important.

I don’t get much negativity about this. I was surprised, pleasantly so, obviously, but yeah, a couple of days ago, I wore my Gay Okay t-shirt in my Instagram Stories, and people applauded me for it, and that’s really interesting. I would never have thought that.

It might not look like it now, but this will become a pub on wheels

Do you get much trolling at all? Or are you spared from it?

I think, at the beginning of my YouTube career, I was growing really fast and really, like, exponentially. And I had a couple of videos that went viral, like the beer bike, that went outside of the community. For those viral videos, you get negativity. They don’t know who you are, they don’t know the context, they don’t know what I’m doing. That’s why I hate having viral videos. It brings in the worst. I like to be in this little lake, surrounded by my followers.

A few people have said that, actually. That it’s the worst. It’s the thing everybody aims for and then, when you get there, you wish you weren’t.

Yeah, they take you out of context. Those people, they see one of my videos, they don’t know that I’m building something. And that’s another interesting thing that your community learns about you. They know I’ve built something every week for the past six years. It can’t be the Holy Grail every freaking week. Sometimes it’s bad, but it’s stuff that I did that week – it’s documentation.

When I was a kid, I remember my mind was blown that The Simpsons had a different intro every episode. Something different happens every time. I couldn’t believe that, and how much work went into it. I think it primed me for being a weekly creator.

The tattoo machine that started it all

It’s impressive. There aren’t a lot of makers releasing weekly videos, and many that do are releasing build videos in weekly parts. And you just come along and go ta-da!

Haha, but not every video is a good idea. Some of them are really bad ideas. But that’s my privilege, you know, that I can still do that. Because I have to, otherwise there wouldn’t be a video, and I love that because the pressure helps me keep going. And the process is the same. It doesn’t matter if you’re building a tiny house or a scratch post for a cat. The process is me, being in the shop, listening to podcasts, listening to music, enjoying my tools, playing with the material – it’s all the same, it doesn’t really matter. 

So, the public bench stuff that I’ve been doing lately, I get so many questions like, “Oh no, how could you leave the bench” and, like, I don’t give a damn about the bench. It’s not the bench, it’s the process that I enjoy. I could literally throw everything that I build away – I could throw it in the trash right away. I wouldn’t mind. I’m so focused on the process.

I was going to ask you about the bench, because it was recently vandalised and so you made another one. Most people would probably just not, would just raise their hands in defeat and leave it. But you just made it again. 

I was expecting it to break eventually. And, to be honest, I was kinda hoping for it because I wanted to do it again. And, this time, I’m actually hoping for it to get broken again because I want to do it again.

Laura’s bike frame cup holder

I may be making this up, but I’m sure you once mentioned that it’s illegal to sell furniture in Germany unless you’re registered. Is that right?

Yeah, it’s a very broad description of this, but the craftsmanship in Germany is of a very high standard, right? At least we like to think so. So, if you want to be a carpenter, you’re first an apprentice for three years or so, then you can be a carpenter and work under a master carpenter. If you want to educate other apprentices, or if you want to sell certain furniture, I think chairs is one of them, then you have to be a master. And it’s the same for every field. I think the most plausible is electricians. If you are not a master electrician, you cannot, say, make a lamp and sell it.

But my interest is so general. I wanted to make lamps, but the notion of designing a lamp that’s made out of wood and then obviously has electricity in it, it’s just impossible. 

I spoke to the TÜV and asked them, if I design a lamp and want to sell it in a store, how do I do it. And I would have to get it checked by their institution, which is a couple of hundred euros, and get a certificate. But I would have to do this for the next lamp design, and the next. And that makes them so expensive. I can’t sell a lamp for 150 euros if it costs me more than that to get it checked. I’m not interested in mass production, I want to make one-off pieces.

I had already quit my job when I discovered this and remember having a big knot in my stomach thinking, ‘what do I do?’, and YouTube was the answer. 

Could you not use YouTube as a way of selling lamps? It’s not a lamp, it’s a video prop?

Yeah, there are loopholes – this is not a lamp, this is art. But, when I quit my job to become a self-employed lamp seller, I really only quit my job because I hated working for other people, not because it was my dream to sell furniture and lamps. I didn’t know YouTube really existed as a thing for me, and once I figured out people were actually making money off this, I was like, OK, I need to get a camera, I need to give this a try. Because that would be better than building stuff to sell it. I wasn’t interested in selling stuff. I don’t want clients. I don’t want that pressure from anyone else except me, so YouTube worked out perfectly for me. 

How to build a tattoo machine from scratch – one of Laura’s most popular videos

The job you quit was as a Display Artist for Urban Outfitters, if I remember correctly? Designing displays within a store. That sounded like a brilliant job.

It was. It was a great job, but it wasn’t for me. It was probably the perfect job, but I am not a good employee. I was asked a couple of years ago if I would do a talk about my career and how I made this job for myself and followed my dreams, blah, blah. I don’t like ‘follow your dreams’. It was the other way around. I avoided my nightmares. That’s how I got here. I never dreamed of this, I didn’t know this existed. So, I think avoiding your nightmares is much more efficient than following your dreams.

With your design school background, when you create something, how much of that project is art over functionality? Dovetails versus pocket holes for instance. 

It’s more, and this is hard to explain, but I have this internal measuring unit of how much work should go into a project. I know how much time I can put into a project, and there’s this bucket of work I’ve put into it, and depending on how full the bucket is determines how the project looks and whether I use pocket holes or dovetails, for example.

You work a lot with wood and with metal, as well as a few other materials, all of which require their own set of skills. Where have you learned all your techniques?

All YouTube. That’s the cool thing. It’s all full circle. There are some things – I had a couple of jobs where I learned some skills. I worked as a flight case builder for three years, just filling those black flight cases. Which sounds very, very trivial. It’s not, though. It’s crazy. You have to work so precisely, otherwise the catches won’t close and all these things, and everything is building boxes. So I learned a bunch of stuff there. It was my Karate Kid apprenticeship. But a lot of it is YouTube. I remember watching Jimmy DiResta – I saw his TV show online, and then I watched a bunch of his videos without realising it was the same guy. Eventually, I noticed he had a weekly schedule and a podcast, and it was all exactly what I needed to see and hear. Right when I quit my job and I couldn’t sell lamps, there were these people telling me that they do this for a living. It was perfect timing. I feel like I’m a second-generation YouTuber and they’re the first.

Laura’s cargo bike

As well as those makers, what else influences your work?

I like to listen to a lot of hip-hop, like super-aggressive hip-hop that is the complete opposite of me and has nothing to do with my world. And I like to watch horror movies, super-scary and bloody horror movies. I like to explore the opposite of what I have. A view into a completely different world. The Fantasy Filmfest is a huge inspiration for me. These movies that go right to DVD; they don’t go into the big theatres. I like to think about how they got made? How did they think of that? That’s the biggest inspiration. And, with hip-hop, the personas, and why they feel like they do, and how do they come up with those lines. They’re in their own universe, they have their own rules. I just love that. It’s how I feel when I’m building stuff. I’m telling myself a story that I don’t know the ending of. I don’t like to make sketches, I don’t like to know if it works. If I see someone else had the idea and did a full video about it, I don’t even want to do the idea anymore. I want to have that unknown. This is the idea, this is the stuff you have, now try to make it happen.

Is there anything still on the list? Projects you still want to work on?

I don’t know if you saw it on Instagram, but I bought a Multicar. It’s so good. It’s the slowest car ever; it is painfully slow – 45 kilometres an hour and that’s it. But it has torque; you can tow pretty much everything with it. So my plan is to take the world’s smallest pub that I built a couple of months ago and put it on the back of the Multicar.

Something is holding me back at the moment, though. I have all the parts, I should be able to do it, but I don’t know what it is. I experience that quite often – I have an idea, and everything should be good to go, but I’m not doing it. And then, eventually, it turns out I wasn’t sure about the colour, or something else that was missing that I didn’t know at the time. So I don’t push it. But that’s the project I’m looking forward to.

Do you think you’ll ever just get to the point where you’re going to stop doing weekly videos? Or is this you for life?

I don’t know. Like, that’s the one thing that I’m really scared of, like, what happens when I get sick? Because at the beginning of the year, I hired my best friend. So now we’re both relying on my mental and physical health. So I think it’s a good idea to broaden stuff and have more income streams. I love doing the TV stuff [Laura recently started presenting a new TV show], because whenever I’m working with the TV people, I think, like, oh man, I love YouTube. And, when I do too much YouTube, I start really looking forward to working with actual professionals again. It’s a cool balance. I kind of hope that I can keep doing this. You know, I think it’s really cool. And as I said, there’s no other place for me. Where would I go?

Clever keyring with screwdriver

You have YouTube, you have TV, you have your podcast, and you sell merchandise. Is there anything left?

I think I would like to actually have a couple of products now. Some of the furniture I’m building, if you look at it outside the context of ‘this is just what I built this week’ and it only being the product of seven days’ worth of work, I think some of those ideas aren’t that bad. And if you put some more work into them, they could be pieces that would sell. But I would want someone who takes the prototypes and does the whole production for me. I’m not interested in all that. But I think it would be cool to have a line of plywood furniture.

So we won’t be seeing a run of the Laura Kampf bench across Köln?

A newspaper interviewed me about the bench. And for the interview, they also approached the city saying, hey, wouldn’t you want to work with her? And you know, maybe collaborate on this because this might be a cool thing. And they said that they don’t have the personnel. But honestly, if they would have done it, that would have made it so boring. Working with somebody in an office telling me where the broken benches are so I can go and fix them. That’s a job.

Laura’s beer bike

Is there anything you’ve ever made that you haven’t wanted to share? A build just for you?

Until I hit publish, I feel like that every week. It feels like I’m just making it for myself. I talk very positively about YouTube, and that’s genuinely how I feel about it, but sometimes it’s really hard on me because I’ll work seven days on a video, think it’s the best thing I ever did, and it makes me so happy. I’ll edit it for hours, sink all this time into it, all this energy, and then the video tanks, and it kinda ruins it for me. I’m in a super-good mood right now because the bench video did so well, and people understand what I’m trying to say. But, there are other cases where it doesn’t work as well, and where I feel like I’ve dropped the ball and couldn’t get my excitement across. And that’s always super-disappointing because I’m always excited about the stuff I make; I always have some angle I find super-interesting, otherwise, I’m not motivated to do it. And, when the video tanks, it makes me feel like I lost the opportunity to spread that excitement, to spread that motivation, and that feels like I wasted my time. And that’s the downside of YouTube. 

I mean, I think every creator takes it in a different way. And you need to find a way to deal with this, and it’s really important to talk about it. This is my dream job and I can do whatever I want to do as long as I don’t drop the ball. I hired my friend, so now I can’t drop the ball for the both of us.

Laura appeared in our video “How do you define ‘maker’?”

Laura Kampf produces a video every Sunday on her YouTube channel. You can also follow her on Instagram and, for any German-speaking readers, her podcast – Raabe & Kampf – with friend and journalist Melanie Raabe can be found wherever you listen to podcasts. 

HackSpace magazine issue 45 out NOW!

Each month, HackSpace magazine brings you the best projects, tips, tricks and tutorials from the makersphere. You can get it from the Raspberry Pi Press online store or your local newsagents.

HackSpace magazine issue 45 front cover

As always, every issue is free to download from the HackSpace magazine website.

The post Meet Laura Kampf: Wood and metalworker appeared first on Raspberry Pi.

Using VPC Endpoints in Multi-Region Architectures with Route 53 Resolver

Post Syndicated from Michael Haken original https://aws.amazon.com/blogs/architecture/using-vpc-endpoints-in-multi-region-architectures-with-route-53-resolver/

Many customers are building multi-Region architectures on AWS. They might want to bring their systems closer to their end users, support disaster recovery (DR), or comply with data sovereignty requirements. Often, these architectures use Amazon Virtual Private Cloud (VPC) to host resources like Amazon EC2 instances, Amazon Relational Database Service (RDS) databases, and AWS Lambda functions. Typically, these VPCs are also connected using VPC peering or AWS Transit Gateway.

Within these VPC networks, customers also use AWS PrivateLink to deploy VPC endpoints. These endpoints provide private connectivity between VPCs and AWS services. They also support endpoint policies that allow customers to implement guardrails. As an example, customers frequently use endpoint policies to ensure that only IAM principals in their AWS Organization are accessing resources from their networks.

The challenge some customers have faced is that VPC endpoints can only be used to access resources in the same Region as the endpoint. For example, an Amazon Simple Storage Service (S3) VPC endpoint deployed in us-east-1 can only be used to access S3 buckets also located in us-east-1. To access a bucket in us-east-2, that traffic has to traverse the public internet. Ideally, customers want to keep this traffic within their private network and apply VPC endpoint policies, regardless of the Region where the resource is located.

Amazon Route 53 Resolver to the rescue

One of the ways we can solve this problem is with Amazon Route 53 Resolver. Route 53 Resolver provides inbound and outbound DNS services in a VPC. It allows you to resolve domain names for AWS resources in the Region where the resolver endpoint is deployed. It also allows you to forward DNS requests to other DNS servers based on rules you define. To consistently apply VPC endpoint policies to all traffic, we use Route 53 Resolver to steer traffic to VPC endpoints in each Region.

Figure 1. A multi-Region architecture with Route 53 Resolver and S3 endpoints

In this example shown in Figure 1, we have a workload that operates in us-east-1. It must access Amazon S3 buckets in us-east-2 and us-west-2. There is a VPC in each Region that is connected via VPC peering to the one in us-east-1. We’ve also deployed an inbound and outbound Route 53 Resolver endpoint in each VPC.

Finally, we also have Amazon S3 interface VPC endpoints in each VPC. These provide their own unique DNS names. They can be resolved to private IP addresses using the VPC-provided DNS (the VPC’s .2 address or 169.254.169.253) or the inbound resolver IP addresses.

When the EC2 instance accesses a bucket in us-east-1, the Route 53 Resolver endpoint resolves the DNS name to the private IP address of the VPC endpoint. However, without an outbound rule, a DNS query for a bucket in another Region like us-east-2 would resolve to the public IP address of the S3 service. To solve this, we’re going to add four outbound rules to the resolver in us-east-1.

  • us-west-2.amazonaws.com
  • us-west-2.vpce.amazonaws.com
  • us-east-2.amazonaws.com
  • us-east-2.vpce.amazonaws.com

These rules will forward the DNS request to the appropriate inbound Route 53 Resolver in the peered VPC. When there isn’t a VPC endpoint deployed for a service, the resolver will use its automatically created recursive rule to return the public IP address. Let’s look at how this works in Figure 2.
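
If you manage these rules programmatically, a rough sketch with the AWS SDK for JavaScript v3 might look like the following; the resolver endpoint ID, target IP addresses, and VPC ID are placeholders, and the same pattern repeats for the other three domains in the list above:

// Sketch: create one outbound forwarding rule and associate it with the workload VPC in us-east-1.
import {
  Route53ResolverClient,
  CreateResolverRuleCommand,
  AssociateResolverRuleCommand,
} from '@aws-sdk/client-route53resolver'

const resolver = new Route53ResolverClient({ region: 'us-east-1' })

// Forward queries for us-east-2 VPC endpoint names to the inbound resolver in us-east-2.
const { ResolverRule } = await resolver.send(new CreateResolverRuleCommand({
  CreatorRequestId: `forward-us-east-2-vpce-${Date.now()}`, // idempotency token
  Name: 'forward-us-east-2-vpce',
  RuleType: 'FORWARD',
  DomainName: 'us-east-2.vpce.amazonaws.com',
  ResolverEndpointId: 'rslvr-out-EXAMPLE11111', // outbound endpoint in us-east-1
  TargetIps: [
    { Ip: '10.1.0.10', Port: 53 }, // inbound resolver IPs in the peered us-east-2 VPC
    { Ip: '10.1.1.10', Port: 53 },
  ],
}))

// Associate the rule with the workload VPC so its DNS queries start using it.
await resolver.send(new AssociateResolverRuleCommand({
  ResolverRuleId: ResolverRule.Id,
  VPCId: 'vpc-0example1234567890',
}))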

Figure 2. The workflow of resolving an out-of-Region S3 DNS name

  1. The EC2 instance runs a command to list a bucket in us-east-2. The DNS request first goes to the local Route 53 Resolver endpoint in us-east-1.
  2. The Route 53 Resolver in us-east-1 has an outbound rule matching the bucket’s domain name. This forwards all DNS queries for the domain us-east-2.vpce.amazonaws.com to the inbound Route 53 Resolver in us-east-2.
  3. The Route 53 Resolver in us-east-2 responds with the private IP address of the S3 interface VPC endpoint in its VPC. This is then returned to the EC2 instance.
  4. The EC2 instance sends the request to the S3 interface VPC endpoint in us-east-2.

This pattern can be easily extended to support any Region that your organization uses. Add additional VPCs in those Regions to host the Route 53 Resolver endpoints and VPC endpoints. Then, add additional outbound resolver rules for those Regions. You can also support additional AWS services by deploying VPC endpoints for them in each peered VPC that hosts the inbound Route 53 Resolver endpoint.

This architecture can be extended to provide a centralized capability to your entire business instead of supporting a single workload in a VPC. We’ll look at that next.

Scaling cross-Region VPC endpoints with Route 53 Resolver

In Figure 3, each Region has a centralized HTTP proxy fleet. This is located in a dedicated VPC with AWS service VPC endpoints and a Route 53 Resolver endpoint. Each workload VPC in the same Region connects to this VPC over Transit Gateway. All instances send their HTTP traffic to the proxies. The proxies manage resolving domain names and forwarding the traffic to the correct Region. Here, each Route 53 Resolver supports inbound DNS requests from other VPCs. It also has outbound rules to forward requests to the appropriate Region. Let’s walk through how this solution works.

Figure 3. Using Route 53 Resolver endpoints with central HTTP proxies

  1. The EC2 instance in us-east-1 runs a command to list a bucket in us-east-2. The HTTP request is sent to the proxy fleet in the same Region.
  2. The proxy fleet attempts to resolve the domain name of the bucket in us-east-2. The Route 53 Resolver in us-east-1 has an outbound rule for the domain us-east-2.vpce.amazonaws.com. This rule forwards the DNS query to the inbound Route 53 Resolver in us-east-2. The Route 53 Resolver in us-east-2 responds with the private IP address of the S3 interface endpoint in its VPC.
  3. The proxy server sends the request to the S3 interface endpoint in us-east-2 over the Transit Gateway connection. VPC endpoint policies are consistently applied to the request.

This solution (Figure 3) scales the previous implementation (Figure 2) to support multiple workloads across all of the in-use Regions. And it does this without duplicating VPC endpoints in every VPC.

If your environment doesn’t use HTTP proxies, you could alternatively deploy Route 53 Resolver outbound endpoints in each workload VPC. In this case, you have two options. The outbound rules can forward the DNS requests directly to the cross-Region inbound resolver, as in Figure 2. Or, there can be a single outbound rule to forward the DNS requests to a central inbound resolver in the same Region (see Figure 3). The first option reduces dependencies on a centralized service. The second option reduces the management overhead of creating and updating outbound rules.

Conclusion

Customers want a straightforward way to use VPC endpoints and endpoint policies for all Regions uniformly and consistently. Route 53 Resolver provides a solution using DNS. This ensures that requests to AWS services that support VPC endpoints stay within the VPC network, regardless of their Region.

Check out the documentation for Route 53 Resolver to learn more about how you can use DNS to simplify using VPC endpoints in multi-Region architectures.

Linux Kernel Security Done Right (Google Security Blog)

Post Syndicated from original https://lwn.net/Articles/865102/rss

Over on the Google Security Blog, Kees Cook describes his vision for approaches to assuring kernel security in a more collaborative way. He sees a number of areas where companies could work together to make it easier for everyone to use recent kernels rather than redundantly backporting fixes to older kernel versions. It will take more engineers working on things like testing and its infrastructure, security tool development, toolchain improvements for security, and boosting the number of kernel maintainers:

Long-term Linux robustness depends on developers, but especially on effective kernel maintainers. Although there is effort in the industry to train new developers, this has been traditionally justified only by the “feature driven” jobs they can get. But focusing only on product timelines ultimately leads Linux into the Tragedy of the Commons. Expanding the number of maintainers can avoid it. Luckily the “pipeline” for new maintainers is straightforward.

Maintainers are built not only from their depth of knowledge of a subsystem’s technology, but also from their experience with mentorship of other developers and code review. Training new reviewers must become the norm, motivated by making upstream review part of the job. Today’s reviewers become tomorrow’s maintainers. If each major kernel subsystem gained four more dedicated maintainers, we could double productivity.

PetitPotam: Novel Attack Chain Can Fully Compromise Windows Domains Running AD CS

Post Syndicated from Caitlin Condon original https://blog.rapid7.com/2021/08/03/petitpotam-novel-attack-chain-can-fully-compromise-windows-domains-running-ad-cs/

Late last month (July 2021), security researcher Topotam published a proof-of-concept (PoC) implementation of a novel NTLM relay attack christened “PetitPotam.” The technique used in the PoC allows a remote, unauthenticated attacker to completely take over a Windows domain with the Active Directory Certificate Service (AD CS) running — including domain controllers.

PetitPotam works by abusing Microsoft’s Encrypting File System Remote Protocol (MS-EFSRPC) to trick one Windows host into authenticating to another over LSARPC on TCP port 445. Successful exploitation means that the target server will perform NTLM authentication to an arbitrary server, allowing an attacker who is able to leverage the technique to do… pretty much anything they want with a Windows domain (e.g., deploy ransomware, create nefarious new group policies, and so on). The folks over at SANS ISC have a great write-up here.

According to Microsoft’s ADV210003 advisory, Windows users are potentially vulnerable to this attack if they are using Active Directory Certificate Services (AD CS) with any of the following services:

  • Certificate Authority Web Enrollment
  • Certificate Enrollment Web Service

NTLM relay attacks aren’t new — they’ve been around for decades. However, a few things make PetitPotam and its variants of higher interest than your more run-of-the-mill NTLM relay attack. As noted above, remote attackers don’t need credentials to make this thing work, but more importantly, there’s no user interaction required to coerce a target domain controller to authenticate to a threat actor’s server. Not only is this easier to do — it’s faster (though admittedly, well-known tools like Mimikatz are also extremely effective for gathering domain administrator-level service accounts). PetitPotam is the latest attack vector to underscore the fundamental fragility of the Active Directory privilege model.

Microsoft released an advisory with a series of updates in response to community concern about the attack — which, as they point out, is “a classic NTLM relay attack” that abuses intended functionality. Users concerned about the PetitPotam attack should review Microsoft’s guidance on mitigating NTLM relay attacks against Active Directory Certificate Services in KB5005413. Since it looks like Microsoft will not issue an official fix for this vector, community researchers have added PetitPotam to a running list of “won’t fix” exploitable conditions in Microsoft products.

The PetitPotam PoC is already popular with red teams and community researchers. We expect that interest to increase as Black Hat brings further scrutiny to Active Directory Certificate Services attack surface area.

Mitigation Guidance

In general, to prevent NTLM relay attacks on networks with NTLM enabled, domain administrators should ensure that services that permit NTLM authentication make use of protections such as Extended Protection for Authentication (EPA) coupled with “Require SSL” for affected virtual sites, or signing features such as SMB signing. Implementing “Require SSL” is a critical step: Without it, EPA is ineffective.

As an NTLM relay attack, PetitPotam takes advantage of servers on which Active Directory Certificate Services (AD CS) is not configured with the protections mentioned above. Microsoft’s KB5005413: Mitigating NTLM Relay Attacks on Active Directory Certificate Services (AD CS) emphasizes that the primary mitigation for PetitPotam consists of three configuration changes (and an IIS restart). In addition to primary mitigations, Microsoft also recommends disabling NTLM authentication where possible, starting with domain controllers.

In this order, KB5005413 recommends:

  • Disabling NTLM Authentication on Windows domain controllers. Documentation on doing this can be found here.
  • Disabling NTLM on any AD CS Servers in your domain using the group policy Network security: Restrict NTLM: Incoming NTLM traffic. For step-by-step directions, see KB5005413.
  • Disabling NTLM for Internet Information Services (IIS) on AD CS Servers in your domain running the “Certificate Authority Web Enrollment” or “Certificate Enrollment Web Service” services.

While not included in Microsoft’s official guidance, community researchers have tested using NETSH RPC filtering to block PetitPotam attacks with apparent success. Rapid7 research teams have not verified this behavior, but it may be an option for blocking the attack vector without negatively impacting local EFS functionality.

Rapid7 Customers

We are investigating approaches for adding assessment capabilities to InsightVM and Nexpose to determine exposure to PetitPotam relay attacks.

[$] New features in Neovim 0.5

Post Syndicated from original https://lwn.net/Articles/864712/rss

Neovim 0.5, the fifth major version of the Neovim editor, which descends from the venerable vi editor by way of Vim, was released on July 2. This release is the culmination of almost two years of work, and it comes with some major features that aim to modernize the editing experience significantly. Highlights include native support for the Language Server Protocol (LSP), which enables advanced editing features for a wide variety of languages, improvements to its Lua APIs for configuration and plugins, and better syntax highlighting using Tree-sitter. Overall, the 0.5 release is a solid upgrade for the editor; the improvements should please the existing fan base and potentially draw in new users and contributors to the project.

Building well-architected serverless applications: Building in resiliency – part 1

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-well-architected-serverless-applications-building-in-resiliency-part-1/

This series of blog posts uses the AWS Well-Architected Tool with the Serverless Lens to help customers build and operate applications using best practices. In each post, I address the serverless-specific questions identified by the Serverless Lens along with the recommended best practices. See the introduction post for a table of contents and explanation of the example application.

Reliability question REL2: How do you build resiliency into your serverless application?

Evaluate scaling mechanisms for serverless and non-serverless resources to meet customer demand. Build resiliency into your workload so that your serverless application can withstand the partial and intermittent failures across components that may only surface in production.

Required practice: Manage transaction, partial, and intermittent failures

Whenever one service or system calls another, there is a chance that failures can happen. Services or systems often don’t fail as a single unit, but rather suffer partial or transient failures. Applications should be designed to handle component failures as part of the architecture. The system should be designed to detect failure and, ideally, automatically heal itself.

Transaction failures can occur when a component is unavailable or under high load. Partial failures can occur when a percentage of requests succeeds, including during batch processing. Intermittent failures might occur when a request fails for a short period of time due to network or other transient issues.

AWS serverless services, including AWS Lambda, are fault-tolerant and designed to handle failures. If a service invokes a Lambda function and there is a service disruption, Lambda invokes the function in a different Availability Zone.

When you invoke a function directly, you determine the strategy for handling errors. You can retry, send the event to a destination or queue for debugging, or ignore the error. Clients such as the AWS Command Line Interface (CLI) and the AWS SDK retry on client timeouts, throttling errors (429), and other errors that are not caused by a bad request.

When you invoke a function indirectly, you must be aware of the retry behavior of the invoker and any service that the request encounters along the way. For more information, see “Error handling and automatic retries in AWS Lambda”. You can configure Maximum Retry Attempts and Maximum Event Age for asynchronous invocations.

When reading from Amazon Kinesis Data Streams and Amazon DynamoDB Streams, Lambda retries the entire batch of items. Retries continue until the records expire or exceed the maximum age that you configure on the event source mapping. You can also configure the event source mapping to split a failed batch into two batches. Retrying with smaller batches isolates bad records and works around timeout issues.

Partial failures can occur in non-atomic operations. PutRecords for Kinesis and BatchWriteItem for DynamoDB return a successful response if at least one record is ingested successfully. Always inspect the response when using such operations and programmatically deal with partial failures.
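
As a hedged sketch (the table name and item shape are illustrative), handling UnprocessedItems from a BatchWriteItem call with the AWS SDK for JavaScript v3 could look like this:

// Sketch: retry any items that BatchWriteItem reports as unprocessed.
import { DynamoDBClient, BatchWriteItemCommand } from '@aws-sdk/client-dynamodb'

const dynamodb = new DynamoDBClient({})

async function writeBatch(requestItems) {
  let unprocessed = requestItems
  // BatchWriteItem can return a successful response while leaving some items unprocessed,
  // so inspect UnprocessedItems and retry the leftovers (ideally with backoff).
  while (unprocessed && Object.keys(unprocessed).length > 0) {
    const response = await dynamodb.send(
      new BatchWriteItemCommand({ RequestItems: unprocessed })
    )
    unprocessed = response.UnprocessedItems
  }
}

await writeBatch({
  'orders-table': [
    { PutRequest: { Item: { id: { S: 'order-123' }, total: { N: '42' } } } },
  ],
})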

Use exponential backoff with jitter

The simplest technique for dealing with failures in a networked environment is to retry calls until they succeed. This technique increases the reliability of the application and reduces operational costs for the developer.

However, it is not always safe to retry. A retry can further increase the load on the system being called if the system is already failing due to an overload. To avoid this problem, use backoff. Instead of retrying immediately and aggressively, the client waits some amount of time between tries. The most common pattern is an exponential backoff, which uses exponentially longer wait times between retries. This is typically capped to a maximum delay and number of retries.

If all backoff retries are still happening at the same time, this can still overload a system or cause contention. To avoid this problem, use jitter. Jitter adds some amount of randomness to the backoff to spread the retries around in time. This can help prevent large bursts by spreading out the rate when clients connect. For more information see the Amazon Builders’ Library article “Timeouts, retries, and backoff with jitter” and AWS Architecture blog post “Exponential Backoff And Jitter”.

Exponential backoff and jitter
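
As an illustration, here is a minimal sketch of capped exponential backoff with full jitter around an arbitrary async call; the attempt count and delay values are illustrative:

// Sketch: retry an async call with capped exponential backoff and full jitter.
async function retryWithBackoff(fn, maxAttempts = 5, baseDelayMs = 100, maxDelayMs = 5000) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err
      // Exponential backoff capped at maxDelayMs, with full jitter to spread retries out in time.
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt)
      const delay = Math.random() * cap
      await new Promise((resolve) => setTimeout(resolve, delay))
    }
  }
}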

When your application responds to callers in fail-fast scenarios and when performance is degraded, inform the caller via headers or metadata when they can retry.

Each AWS SDK implements automatic retry logic including exponential backoff. For downstream calls, you can adjust AWS and third-party SDK retries, backoffs, TCP, and HTTP timeouts. This helps you decide when to stop retrying. For more information, see the documentation and troubleshooting steps for Lambda and the AWS SDK.

Use a dead-letter queue mechanism to retain, investigate and retry failed transactions

There are a number of ways to handle message failures including destinations and dead-letter queues.

You can configure Lambda to send records of asynchronous invocations to another destination service. These include Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Lambda, and Amazon EventBridge. You can configure separate destinations for events that fail processing and events that are successfully processed. The invocation record contains details about the event, the response, and the reason that the record was sent.

The following example shows a function that sends a record of a successful invocation to an EventBridge event bus. When an event fails all processing attempts, Lambda sends an invocation record to an SQS queue. It includes the function’s response in the invocation record.

AWS Lambda destinations for asynchronous invocation
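
As a sketch of how such destinations could be configured with the AWS SDK for JavaScript v3 (the function name and destination ARNs are placeholders):

// Sketch: route successful asynchronous invocations to EventBridge and failures to SQS.
import { LambdaClient, PutFunctionEventInvokeConfigCommand } from '@aws-sdk/client-lambda'

const lambda = new LambdaClient({})

await lambda.send(new PutFunctionEventInvokeConfigCommand({
  FunctionName: 'process-booking',
  MaximumRetryAttempts: 2,
  MaximumEventAgeInSeconds: 3600,
  DestinationConfig: {
    // Invocation records for successful events go to an EventBridge event bus.
    OnSuccess: { Destination: 'arn:aws:events:us-east-1:123456789012:event-bus/bookings' },
    // Events that exhaust all retries go to an SQS queue for investigation.
    OnFailure: { Destination: 'arn:aws:sqs:us-east-1:123456789012:bookings-failed' },
  },
}))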

SNS, SQS, Lambda, and EventBridge support dead-letter queues (DLQs). DLQs make your applications more resilient and durable by storing messages or events that can’t be processed correctly into a dedicated SQS queue. This helps you debug your application by isolating the problematic messages to determine why their processing failed. Once you have resolved the issue, re-process the failed message. For more information, see “When should I use a dead-letter queue?” There is an example serverless application to redrive the messages from an SQS DLQ back to its source SQS queue.

For Lambda, DLQs provide an alternative to a failure destination. Using Lambda destinations is preferable for asynchronous invocations.

Good practice: Orchestrate long-running transactions

Long-running transactions can be processed by one or multiple components. Consider implementing the saga pattern using state machines for these types of transactions.

The saga pattern coordinates transactions between multiple microservices as part of a state machine. Each service that performs a transaction publishes an event to trigger the next transaction in the saga. This continues until the transaction chain is complete. If a transaction fails, the saga orchestrates a series of compensating transactions that undo the changes made by the preceding transactions.

This is preferable to handling complex or long-running transactions within application code. State machines prevent cascading failures and avoid tightly coupling components with orchestrating logic and business logic.

Use a state machine to visualize distributed transactions, and to separate business logic from orchestration logic.

AWS Step Functions lets you coordinate multiple AWS services into serverless workflows via state machines. Within Step Functions, you can set separate retries, backoff rates, max attempts, intervals, and timeouts. These are set for every step of your state machine using a declarative language.

In the serverless airline example used in this series, Step Functions is used to orchestrate the Booking microservice. The ProcessBooking state machine handles all the necessary steps to create bookings, including payment.

Booking service Step Functions state machine

The state machine uses a combination of service integrations using DynamoDB, SQS, and Lambda functions to coordinate transactions and handle failures.

For example, the Reserve Booking task invokes a Lambda function. The task has retry and error handling configured as part of the task definition.

"Reserve Booking": {
	"Type": "Task",
	"Resource": "${ReserveBooking.Arn}",
	"TimeoutSeconds": 5,
	"Retry": [
		{
			"ErrorEquals": [
				"BookingReservationException"
			],
			"IntervalSeconds": 1,
			"BackoffRate": 2,
			"MaxAttempts": 2
		}
	],
	"Catch": [
		{
			"ErrorEquals": [
				"States.ALL"
			],
			"ResultPath": "$.bookingError",
			"Next": "Cancel Booking"
		}
	],
	"ResultPath": "$.bookingId",
	"Next": "Collect Payment"
},

Step Functions supports direct service integrations, including DynamoDB. The Reserve Flight task directly updates the flightTable without requiring a Lambda function.

"Reserve Flight": {
	"Type": "Task",
	"Resource": "arn:aws:states:::dynamodb:updateItem",
	"Parameters": {
		"TableName.$": "$.flightTable",
		"Key": {
			"id": {
				"S.$": "$.outboundFlightId"
			}
		},
		"UpdateExpression": "SET seatCapacity = seatCapacity - :dec",
		"ExpressionAttributeValues": {
			":dec": {
				"N": "1"
			},
			":noSeat": {
				"N": "0"
			}
		},
		"ConditionExpression": "seatCapacity > :noSeat"
	},

By default, when a state reports an error, Step Functions causes the execution to fail entirely.

Utilize dead-letter queues in response to failed state machine executions

Any state within the Step Functions workflow can encounter runtime errors. These include state machine definition issues, task failures such as Lambda function exceptions, or transient issues such as network connectivity issues. For more information, see “Error handling in Step Functions”.

Use the Step Functions service integration with SQS to send failed transactions to a DLQ as the final step. This adds a higher level of durability within your state machines.

For example, the airline Notify Failed Booking final task catches failed states from four previous steps. It sends the results to the Booking DLQ.

Booking service Step Functions DLQ

The message includes the output of the previous failed states for further troubleshooting.

"Booking DLQ": {
	"Type": "Task",
	"Resource": "arn:aws:states:::sqs:sendMessage",
	"Parameters": {
		"QueueUrl": "${BookingsDLQ}",
		"MessageBody.$": "$"
	},
	"ResultPath": "$.deadLetterQueue",
	"Next": "Booking Failed"
},

The Step Functions documentation has more information on calling SQS.

Conclusion

Build resiliency into your workloads. This makes sure that your application can withstand partial and intermittent failures across components that may only surface in production.

In this post, I cover managing failures using retries, exponential backoff, and jitter. I explain how DLQs can isolate failed messages. I show how to use state machines to orchestrate long-running transactions rather than handling these in application code.

This well-architected question continues in part 2, where I look at managing duplicate and unwanted events with idempotency and an event schema. I also cover how to handle scaling at burst rates by managing account limits, and show relevant metrics to evaluate.

For more serverless learning resources, visit Serverless Land.

Secure connectivity patterns to access Amazon MSK across AWS Regions

Post Syndicated from Sam Mokhtari original https://aws.amazon.com/blogs/big-data/secure-connectivity-patterns-to-access-amazon-msk-across-aws-regions/

AWS customers often segment their workloads across accounts and Amazon Virtual Private Cloud (Amazon VPC) to streamline access management while being able to expand their footprint. As a result, in some scenarios you, as an AWS customer, need to make an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster accessible to Apache Kafka clients not only in the same Amazon VPC as the cluster but also in a remote Amazon VPC. A guest post by Goldman Sachs presented cross-account connectivity patterns to an MSK cluster using AWS PrivateLink. Inspired by the work of Goldman Sachs, this post demonstrates additional connectivity patterns that can support both cross-account and cross-Region connectivity to an MSK cluster. We also developed sample code that supports the automation of the creation of resources for the connectivity pattern based on AWS PrivateLink.

Overview

Amazon MSK makes it easy to run Apache Kafka clusters on AWS. It’s a fully managed streaming service that automatically configures and maintains Apache Kafka clusters and Apache ZooKeeper nodes for you. Amazon MSK lets you focus on building your streaming solutions and supports familiar Apache Kafka ecosystem tools (such as MirrorMaker, Kafka Connect, and Kafka Streams), and helps avoid the challenges of managing the Apache Kafka infrastructure and operations.

If you have workloads segmented across several VPCs and AWS accounts, there may be scenarios in which you need to make an Amazon MSK cluster accessible to Apache Kafka clients across VPCs. To provide secure connections between resources across multiple VPCs, AWS provides several networking constructs. Let’s get familiar with these before discussing the different connectivity patterns:

  • Amazon VPC peering is the simplest networking construct that enables bidirectional connectivity between two VPCs. You can use this connection type to enable VPCs across accounts and AWS Regions to communicate with each other using private IP addresses.
  • AWS Transit Gateway provides a highly available and scalable design for connecting VPCs. Unlike VPC peering that can go cross-Region, AWS Transit Gateway is a regional service, but you can use inter-Region peering between transit gateways to route traffic across Regions.

AWS PrivateLink is an AWS networking service that provides private access to a specific service instead of all resources within a VPC and without traversing the public internet. You can use this service to expose your own application in a VPC to other users or applications in another VPC via an AWS PrivateLink-powered service (referred to as an endpoint service). Other AWS principals can then create a connection from their VPC to your endpoint service using an interface VPC endpoint.
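
As a rough sketch, a consumer account could connect to such an endpoint service by creating an interface VPC endpoint with the AWS SDK for JavaScript v3; the service name, VPC, subnet, and security group IDs below are placeholders:

// Sketch: create an interface VPC endpoint that connects to a provider's endpoint service.
import { EC2Client, CreateVpcEndpointCommand } from '@aws-sdk/client-ec2'

const ec2 = new EC2Client({ region: 'us-east-1' })

await ec2.send(new CreateVpcEndpointCommand({
  VpcEndpointType: 'Interface',
  VpcId: 'vpc-0consumer1234567890',
  // Service name of the AWS PrivateLink-powered endpoint service exposed by the provider account.
  ServiceName: 'com.amazonaws.vpce.us-east-1.vpce-svc-0example1234567890',
  SubnetIds: ['subnet-0example1a', 'subnet-0example1b'],
  SecurityGroupIds: ['sg-0example123'],
}))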

Amazon MSK networking

When you create an MSK cluster, either via the AWS Management Console or AWS Command Line Interface (AWS CLI), it’s deployed into a managed VPC with brokers in private subnets (one per Availability Zone) as shown in the following diagram. Amazon MSK also creates the Apache ZooKeeper nodes in the same private subnets.

The brokers in the cluster are made accessible to clients in the customer VPC through elastic network interfaces (ENIs) that appear in the customer account. The security groups on the ENIs dictate the source and type of ingress and egress traffic allowed on the brokers.

IP addresses from the customer VPC are attached to the ENIs, and all network traffic stays within the AWS network and is not accessible to the internet.

Connections between clients and an MSK cluster are always private.

This blog demonstrates four connectivity patterns to securely access an MSK cluster from a remote VPC. The following list summarizes these patterns and their key characteristics. Each pattern aligns with the networking constructs discussed earlier.

  • VPC peering
    • Bandwidth: Limited by instance network performance and flow limits
    • Pricing: Data transfer charge (free if the data transfer stays within the same Availability Zone)
    • Scalability: Recommended for a smaller number of VPCs
  • AWS Transit Gateway
    • Bandwidth: Up to 50 Gbps (burst) per Availability Zone per VPC connection
    • Pricing: Data transfer charge + hourly charge per attachment
    • Scalability: No limit on the number of VPCs
  • AWS PrivateLink with a single NLB
    • Bandwidth: 10 Gbps per AZ
    • Pricing: Data transfer charge + interface endpoint charge + Network Load Balancer charge
    • Scalability: No limit on the number of VPCs
  • AWS PrivateLink with multiple NLBs
    • Bandwidth: 10 Gbps per AZ
    • Pricing: Data transfer charge + interface endpoint charge + Network Load Balancer charge
    • Scalability: No limit on the number of VPCs

Let’s explore these connectivity options in more detail.

VPC peering

To access an MSK cluster from a remote VPC, the first option is to create a peering connection between the two VPCs.

Let’s say you use Account A to provision an MSK cluster in us-east-1 Region, as shown in the following diagram. Now, you have an Apache Kafka client in the customer VPC in Account B that needs to access this MSK cluster. To enable this connectivity, you just need to create a peering connection between the VPC in Account A and the VPC in Account B. You should also consider implementing fine-grained network access controls with security groups to make sure that only specific resources are accessible between the peered VPCs.
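If you prefer to script this setup, the following boto3 sketch shows the general shape: request the peering connection from Account A, accept it from Account B, and add a route on each side. The VPC IDs, account ID, route table IDs, and CIDR blocks are placeholders, and you still need to allow the client traffic in the MSK cluster’s security group. For the cross-Region case, you would also pass PeerRegion when requesting the connection.

# Sketch: peer the MSK cluster VPC (Account A) with a Kafka client VPC (Account B).
# All IDs and CIDRs below are placeholders.
import boto3

ec2_a = boto3.client("ec2", region_name="us-east-1")  # session with Account A credentials
ec2_b = boto3.client("ec2", region_name="us-east-1")  # session with Account B credentials

# 1. Request the peering connection from Account A to Account B's VPC.
peering = ec2_a.create_vpc_peering_connection(
    VpcId="vpc-aaaa1111",        # VPC hosting the MSK cluster (Account A)
    PeerVpcId="vpc-bbbb2222",    # VPC hosting the Kafka clients (Account B)
    PeerOwnerId="222222222222",  # Account B's AWS account ID
)
pcx_id = peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]

# 2. Accept the request from Account B.
ec2_b.accept_vpc_peering_connection(VpcPeeringConnectionId=pcx_id)

# 3. Route each VPC's CIDR through the peering connection.
ec2_a.create_route(RouteTableId="rtb-aaaa1111",
                   DestinationCidrBlock="10.1.0.0/16",  # Account B VPC CIDR
                   VpcPeeringConnectionId=pcx_id)
ec2_b.create_route(RouteTableId="rtb-bbbb2222",
                   DestinationCidrBlock="10.0.0.0/16",  # Account A VPC CIDR
                   VpcPeeringConnectionId=pcx_id)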

Because VPC peering works across Regions, you can extend this architecture to provide access to Apache Kafka clients in another Region. As shown in the following diagram, to provide access to Kafka clients in the VPC of Account C, you just need to create another peering connection between the VPC in Account C with the VPC in Account A. The same networking principles apply to make sure only specific resources are reachable. In the following diagram, a solid line indicates a direct connection from the Kafka client to MSK cluster, whereas a dotted line indicates a connection flowing via VPC peering.

VPC peering has the following benefits:

  • Simplest connectivity option.
  • Low latency.
  • No bandwidth limits from the peering connection itself (throughput is limited only by instance network performance and flow limits).
  • Lower overall cost compared to other VPC-to-VPC connectivity options.

However, it has some drawbacks:

  • VPC peering doesn’t support transitive peering, which means that only directly peered VPCs can communicate with each other.
  • You can’t use this connectivity pattern when there are overlapping IPv4 or IPv6 CIDR blocks in the VPCs.
  • Managing access can become challenging as the number of peered VPCs grows.

You can use VPC peering when the number of VPCs to be peered is less than 10.

AWS Transit Gateway

AWS Transit Gateway can provide scalable connectivity to MSK clusters. The following diagram demonstrates how to use this service to provide connectivity to an MSK cluster. Let’s again consider a VPC in Account A running an MSK cluster, and an Apache Kafka client in a remote VPC in Account B that needs to connect to this MSK cluster. You set up AWS Transit Gateway to connect these VPCs and use route tables on the transit gateway to control the routing.

To extend this architecture to support access from a VPC in another Region, you need to use another transit gateway because this service can’t span Regions. In other words, for the Apache Kafka client in Account C in us-west-2 to connect to the MSK cluster, you need to peer another transit gateway in us-west-2 with the transit gateway in us-east-1 and work with the route tables to manage access to the MSK cluster. If you need to connect another account in us-west-2, you don’t need an additional transit gateway. The Apache Kafka clients in the new account (Account D) simply require a connection to the existing transit gateway in us-west-2 and the appropriate route tables.
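A boto3 sketch of the cross-Region part looks roughly like the following. The transit gateway IDs, subnets, account ID, and CIDR are placeholders, and the route table associations and propagations are omitted; inter-Region peering between transit gateways supports only static routes, which is why the last call adds one explicitly.

# Sketch: attach the MSK cluster VPC in us-east-1 and peer a transit gateway
# in us-west-2 with the one in us-east-1. IDs and CIDRs are placeholders.
import boto3

ec2_east = boto3.client("ec2", region_name="us-east-1")
ec2_west = boto3.client("ec2", region_name="us-west-2")

# Attach the MSK cluster VPC to the transit gateway in us-east-1.
ec2_east.create_transit_gateway_vpc_attachment(
    TransitGatewayId="tgw-east1111",
    VpcId="vpc-msk1111",
    SubnetIds=["subnet-a", "subnet-b", "subnet-c"],
)

# Request inter-Region peering from the us-west-2 transit gateway.
peering = ec2_west.create_transit_gateway_peering_attachment(
    TransitGatewayId="tgw-west2222",
    PeerTransitGatewayId="tgw-east1111",
    PeerAccountId="111111111111",   # account that owns the us-east-1 transit gateway
    PeerRegion="us-east-1",
)
attachment_id = peering["TransitGatewayPeeringAttachment"]["TransitGatewayAttachmentId"]

# Accept the peering attachment on the us-east-1 side.
ec2_east.accept_transit_gateway_peering_attachment(TransitGatewayAttachmentId=attachment_id)

# Static route in us-west-2 pointing the MSK VPC CIDR at the peering attachment.
ec2_west.create_transit_gateway_route(
    DestinationCidrBlock="10.0.0.0/16",               # MSK cluster VPC CIDR (us-east-1)
    TransitGatewayRouteTableId="tgw-rtb-west2222",
    TransitGatewayAttachmentId=attachment_id,
)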

The hub and spoke model for AWS Transit Gateway simplifies management at scale because VPCs only need to connect to one transit gateway per Region to gain access to the MSK cluster in the attached VPCs. However, this setup has some drawbacks:

  • Unlike VPC peering in which you only pay for data transfer charges, Transit Gateway has an hourly charge per attachment in addition to the data transfer fee.
  • This connectivity pattern doesn’t support transitive routing.
  • Unlike VPC peering, Transit Gateway adds an extra hop between VPCs, which can introduce additional latency.
  • The maximum bandwidth (burst) per Availability Zone per VPC connection is 50 Gbps.

You can use AWS Transit Gateway when you need to provide scalable access to the MSK cluster.

AWS PrivateLink

To provide private, unidirectional access from an Apache Kafka client to an MSK cluster across VPCs, you can use AWS PrivateLink. This also eliminates the need to expose the entire VPC or subnet and prevents issues like having to deal with overlapping CIDR blocks between the VPC that hosts the MSK cluster ENIs and the remote Apache Kafka client VPC.

Let’s do a quick recap of the architecture as explained in the blog post How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink.

Let’s assume Account A has a VPC with three private subnets and an MSK cluster with three broker nodes in a 3-AZ deployment. There are three ENIs, one per subnet, representing the broker nodes; each ENI gets a private IPv4 address from its subnet’s CIDR block and an MSK broker DNS endpoint. To expose the MSK cluster in Account A to other accounts via AWS PrivateLink, you have to create a VPC endpoint service in Account A. The VPC endpoint service requires the entity, in this case the MSK cluster, to be fronted by a Network Load Balancer (NLB).

You can choose from two patterns using AWS PrivateLink to provide cross-account access to Amazon MSK: with a single NLB or multiple NLBs.

AWS PrivateLink connectivity pattern with a single NLB

The following diagram illustrates access to an MSK cluster via an AWS PrivateLink connectivity pattern with a single NLB.

In this pattern, you have a single dedicated internal NLB in Account A. The NLB has a separate listener for each MSK broker. Because this pattern has a single NLB endpoint, each listener needs to listen on a unique port. In the preceding diagram, the ports are depicted as 8443, 8444, and 8445. Correspondingly, each listener has a unique target group, each of which has a single registered target: the IP address of an MSK broker ENI. Because these ports differ from the advertised listeners defined in the MSK cluster for each broker node, the advertised listeners configuration for each broker node in the cluster must be updated. Additionally, one target group has all the broker ENI IPs as targets and a corresponding listener (on port 9094), which means a request arriving at the NLB on port 9094 can be routed to any of the MSK brokers.
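To make that layout concrete, here is a rough boto3 sketch of the listener and target group wiring for the single-NLB pattern. The broker ENI IPs, subnets, VPC ID, and names are placeholders, and health checks, TLS settings, and the advertised.listeners change on the cluster itself are left out.

# Sketch: one unique front-end port per broker, plus a shared port-9094 listener
# whose target group contains all broker ENI IPs. IDs and IPs are placeholders.
import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")
broker_ips = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]   # MSK broker ENI private IPs

nlb = elbv2.create_load_balancer(
    Name="msk-privatelink-nlb", Type="network", Scheme="internal",
    Subnets=["subnet-a", "subnet-b", "subnet-c"],
)
nlb_arn = nlb["LoadBalancers"][0]["LoadBalancerArn"]

# One target group and one listener per broker, on unique ports 8443/8444/8445.
for port, ip in zip([8443, 8444, 8445], broker_ips):
    tg = elbv2.create_target_group(
        Name=f"msk-broker-{port}", Protocol="TCP", Port=9094,   # brokers still listen on 9094
        VpcId="vpc-aaaa1111", TargetType="ip",
    )
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]
    elbv2.register_targets(TargetGroupArn=tg_arn, Targets=[{"Id": ip, "Port": 9094}])
    elbv2.create_listener(
        LoadBalancerArn=nlb_arn, Protocol="TCP", Port=port,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )

# Shared target group on port 9094 containing every broker.
shared = elbv2.create_target_group(
    Name="msk-brokers-all", Protocol="TCP", Port=9094,
    VpcId="vpc-aaaa1111", TargetType="ip",
)
shared_arn = shared["TargetGroups"][0]["TargetGroupArn"]
elbv2.register_targets(
    TargetGroupArn=shared_arn,
    Targets=[{"Id": ip, "Port": 9094} for ip in broker_ips],
)
elbv2.create_listener(
    LoadBalancerArn=nlb_arn, Protocol="TCP", Port=9094,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": shared_arn}],
)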

In Account B, you need to create a corresponding VPC endpoint for the VPC endpoint service in Account A. Apache Kafka clients in Account B can then connect to the MSK cluster in Account A by directing their requests to the VPC endpoint. For Transport Layer Security (TLS) to work, you also need an Amazon Route 53 private hosted zone with the domain name kafka.<region of the amazon msk cluster>.amazonaws.com, with alias resource record sets for each of the broker endpoints pointing to the VPC endpoint in Account B.
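The endpoint service, interface endpoint, and private hosted zone pieces could be wired together roughly as follows. The NLB ARN, VPC and subnet IDs, and broker hostname are placeholders; accepting the endpoint connection in Account A and waiting for the endpoint to become available are omitted, and while the post uses alias records, this sketch uses a plain CNAME for simplicity.

# Sketch: expose the NLB as an endpoint service (Account A), consume it from
# Account B, and map a broker hostname to the endpoint in a private hosted zone.
import time
import boto3

ec2_a = boto3.client("ec2", region_name="us-east-1")    # Account A
ec2_b = boto3.client("ec2", region_name="us-east-1")    # Account B
route53_b = boto3.client("route53")                     # Account B

# Account A: create the VPC endpoint service fronted by the NLB.
svc = ec2_a.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=["arn:aws:elasticloadbalancing:us-east-1:111111111111:loadbalancer/net/msk-privatelink-nlb/abc123"],
    AcceptanceRequired=True,
)
service_name = svc["ServiceConfiguration"]["ServiceName"]

# Account B: create an interface VPC endpoint for that service.
# (Connection acceptance and availability polling are not shown.)
endpoint = ec2_b.create_vpc_endpoint(
    VpcEndpointType="Interface",
    ServiceName=service_name,
    VpcId="vpc-bbbb2222",
    SubnetIds=["subnet-x", "subnet-y", "subnet-z"],
    SecurityGroupIds=["sg-kafka-clients"],
)
endpoint_dns = endpoint["VpcEndpoint"]["DnsEntries"][0]["DnsName"]

# Account B: private hosted zone so broker hostnames resolve to the endpoint.
zone = route53_b.create_hosted_zone(
    Name="kafka.us-east-1.amazonaws.com",
    CallerReference=str(time.time()),
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-bbbb2222"},
    HostedZoneConfig={"PrivateZone": True},
)
route53_b.change_resource_record_sets(
    HostedZoneId=zone["HostedZone"]["Id"],
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "b-1.mskcluster.abcd12.c2.kafka.us-east-1.amazonaws.com",  # placeholder; one record per broker
            "Type": "CNAME",
            "TTL": 300,
            "ResourceRecords": [{"Value": endpoint_dns}],
        },
    }]},
)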

In this pattern, for the Apache Kafka clients local to the VPC with the Amazon MSK broker ENIs in Account A to connect to the MSK cluster, you need to set up a Route 53 private hosted zone, similar to Account B, with alias resource record sets for each of the broker endpoints pointing to the NLB endpoint. This is because the ports in the advertised.listener configuration have been changed for the brokers and the default Amazon MSK broker endpoints won’t work.

To extend this connectivity pattern and provide access to Apache Kafka clients in a remote Region, you need to create a peering connection (which can be via VPC peering or AWS Transit Gateway) between the VPC in Account B and the VPC in the remote Region. The same networking principles apply to make sure only specific intended resources are reachable.

AWS PrivateLink connectivity pattern with multiple NLBs

In the second pattern, you don’t share one VPC endpoint service or NLB across multiple MSK brokers. Instead, you have an independent set for each broker. Each NLB has only one listener, listening on the same port (9094), for requests to its Amazon MSK broker. Correspondingly, you have a separate VPC endpoint service for each NLB and each broker. Just like in the first pattern, in Account B you need a Route 53 private hosted zone to alias broker DNS endpoints to VPC endpoints; in this case, they’re aliased to their own specific VPC endpoints.

This pattern has the advantage of not having to modify the advertised listeners configuration in the MSK cluster. However, there is an additional cost of deploying more NLBs, one for each broker. Furthermore, in this pattern, Apache Kafka clients that are local to the VPC with the MSK broker ENIs in Account A can connect to the cluster as usual with no additional setup needed. The following diagram illustrates this setup.

To extend this connectivity pattern and provide access to Apache Kafka clients in a remote Region, you need to create a peering connection between the VPC in Account B and the VPC in the remote Region.

You can use the sample code provided on GitHub to set up the AWS PrivateLink connectivity pattern with multiple NLBs for an MSK cluster. The intent of the code is to automate the creation of the multiple resources involved instead of wiring them up manually.

These patterns have the following benefits:

  • They are scalable solutions and do not limit the number of consumer VPCs.
  • AWS PrivateLink allows for VPC CIDR ranges to overlap.
  • You don’t need path definitions or a route table (access is granted only to the MSK cluster), which makes it easier to manage.

 The drawbacks are as follows:

  • The VPC endpoint and service must be in the same Region.
  • The VPC endpoints support IPv4 traffic only.
  • The endpoints can’t be transferred from one VPC to another.

You can use either connectivity pattern when you need your solution to scale to a large number of Amazon VPCs that can consume each service. You can also use either pattern when the cluster and client VPCs have overlapping IP addresses and when you want to restrict access to only the MSK cluster instead of the VPC itself. The single NLB pattern adds some complexity to the architecture because you need to maintain an additional target group and listener that has all brokers registered, as well as keep the advertised.listeners property up to date. You can offset that complexity with the multiple NLB pattern, but at an additional cost for the increased number of NLBs.

Conclusion

In this post, we explored different secure connectivity patterns to access an MSK cluster from a remote VPC. We also discussed the advantages, challenges, and limitations of each connectivity pattern. You can use this post as guidance to help you identify an appropriate connectivity pattern to address your requirements for accessing an MSK cluster. You can also use a combination of connectivity patterns to address your use case.

References

To read more about the solutions that inspired this post, see How Goldman Sachs builds cross-account connectivity to their Amazon MSK clusters with AWS PrivateLink and the webinar Cross-Account Connectivity Options for Amazon MSK.


About the Authors

Dr. Sam Mokhtari is a Senior Solutions Architect in AWS. His main area of depth is data and analytics, and he has published more than 30 influential articles in this field. He is also a respected data and analytics advisor who led several large-scale implementation projects across different industries including energy, health, telecom, and transport.

Pooja Chikkala is a Solutions Architect in AWS. Big data and analytics is her area of interest. She has 13 years of experience leading large-scale engineering projects with expertise in designing and managing both on-premises and cloud-based infrastructures.

Rajeev Chakrabarti is a Principal Developer Advocate with the Amazon MSK team. He has worked for many years in the big data and data streaming space. Before joining the Amazon MSK team, he was a Streaming Specialist SA helping customers build streaming pipelines.

Imtiaz (Taz) Sayed is the WW Tech Leader for Analytics at AWS. He enjoys engaging with the community on all things data and analytics, and can be reached at IN.

Effective data lakes using AWS Lake Formation, Part 5: Securing data lakes with row-level access control

Post Syndicated from Noritaka Sekiyama original https://aws.amazon.com/blogs/big-data/effective-data-lakes-using-aws-lake-formation-part-5-secure-data-lakes-with-row-level-access-control/

Increasingly, customers are looking at data lakes as a core part of their strategy to democratize data access across the organization. Data lakes enable you to handle petabytes and exabytes of data coming from a multitude of sources in varying formats, and give users the ability to access it from their choice of analytics and machine learning tools. Fine-grained access controls are needed to ensure data is protected and access is granted only to those who require it.

AWS Lake Formation is a fully managed service that helps you build, secure, and manage data lakes, and provide access control for data in the data lake. Lake Formation row-level permissions allow you to restrict access to specific rows based on data compliance and governance policies. Lake Formation also provides centralized auditing and compliance reporting by identifying which principals accessed what data, when, and through which services.

Effective data lakes using AWS Lake Formation

This post demonstrates how row-level access controls work in Lake Formation, and how to set them up.

If you have large fact tables storing billions of records, you need a way to enable different users and teams to access only the data they’re allowed to see. Row-level access control is a simple and performant way to protect data, while giving users access to the data they need to perform their job. In the retail industry for instance, you may want individual departments to only see their own transactions, but allow regional managers access to transactions from every department.

Traditionally you can achieve row-level access control in a data lake through two common approaches:

  • Duplicate the data, redact sensitive information, and grant coarse-grained permissions on the redacted dataset
  • Load data into a database or a data warehouse, create a view with a WHERE clause to select only specific records, and grant permission on the resulting view

These solutions work well when dealing with a small number of tables, principals, and permissions. However, they make it difficult to audit and maintain because access controls are spread across multiple systems and methods. To make it easier to manage and enforce fine-grained access controls in a data lake, we announced a preview of Lake Formation row-level access controls. With this preview feature, you can create row-level filters and attach them to tables to restrict access to data for AWS Identity and Access Management (IAM) and SAMLv2 federated identities.

How data filters work for row-level security

Granting permissions on a table with row-level security (row filtering) restricts access to only specific rows in the table. The filtering is based on the values of one or more columns. For example, a salesperson analyzing sales opportunities should only be allowed to see those opportunities in their assigned territory and not others. We can define row-level filters to restrict access where the value of the territory column matches the assigned territory of the user.

With row-level security, we introduced the concept of data filters. Data filters make it simpler to manage and assign a large number of fine-grained permissions. You can specify the row filter expression using the WHERE clause syntax described in the PartiQL dialect.

Example use case

In this post, a fictional ecommerce company sells many different products, like books, videos, and toys. Customers can leave reviews and star ratings for each product, so other customers can make informed decisions about what they should buy. We use the Amazon Customer Reviews Dataset, which includes different products and customer reviews.

To illustrate the different roles and responsibilities of a data owner and a data consumer, we assume two personas: a data lake administrator and a data analyst. The administrator is responsible for setting up the data lake, creating data filters, and granting permissions to data analysts. Data analysts residing in different countries (for our use case, the US and Japan) can only analyze product reviews for customers located in their own country and, for compliance reasons, shouldn’t be able to see data for customers located in other countries. We have two data analysts: one responsible for the US marketplace and another for the Japanese marketplace. Each analyst uses Amazon Athena to analyze customer reviews for their specific marketplace only.

Set up resources with AWS CloudFormation

This post includes an AWS CloudFormation template for a quick setup. You can review and customize it to suit your needs.

The CloudFormation template generates the following resources:

  • An AWS Lambda function (for Lambda-backed AWS CloudFormation custom resources). We use the function to copy sample data files from the public S3 bucket to your Amazon Simple Storage Service (Amazon S3) bucket.
  • An S3 bucket to serve as our data lake.
  • IAM users and policies:
    • DataLakeAdmin
    • DataAnalystUS
    • DataAnalystJP
  • An AWS Glue Data Catalog database, table, and partition.
  • Lake Formation data lake settings and permissions.

When following the steps in this section, use either us-east-1 or us-west-2 Regions (where the preview functionality is currently available).

Before launching the CloudFormation template, you need to ensure that you have disabled Use only IAM access control for new databases/tables by following these steps:

  1. Sign in to the Lake Formation console in the us-east-1 or us-west-2 Region.
  2. Under Data catalog, choose Settings.
  3. Deselect Use only IAM access control for new databases and Use only IAM access control for new tables in new databases.
  4. Choose Save.

To launch the CloudFormation stack, complete the following steps:

  1. Sign in to the CloudFormation console in the same Region.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For DatalakeAdminUserName and DatalakeAdminUserPassword, enter the user name and password you want for the data lake admin IAM user.
  5. For DataAnalystUsUserName and DataAnalystUsUserPassword, enter the user name and password you want for the data analyst user who is responsible for the US marketplace.
  6. For DataAnalystJpUserName and DataAnalystJpUserPassword, enter the user name and password you want for the data analyst user who is responsible for the Japanese marketplace.
  7. For DataLakeBucketName, enter the name of your data lake bucket.
  8. For DatabaseName and TableName, leave as the default.
  9. Choose Next.
  10. On the next page, choose Next.
  11. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  12. Choose Create.

Stack creation can take about 1 minute.

Query without data filters

After you set up the environment, you can query the product reviews table. Let’s first query the table without row-level access controls to make sure we can see the data. If you’re running queries in Athena for the first time, you need to configure the query result location.

Sign in to the Athena console using the DatalakeAdmin user, and run the following query:

SELECT * 
FROM lakeformation_tutorial_row_security.amazon_reviews
LIMIT 10

The following screenshot shows the query result. This table has only one partition, product_category=Video, so each record is a review comment for a video product.

Let’s run an aggregation query to retrieve the total number of records per marketplace:

SELECT marketplace, count(*) as total_count
FROM lakeformation_tutorial_row_security.amazon_reviews
GROUP BY marketplace

The following screenshot shows the query result. The marketplace column has five different values. In the subsequent steps, we set up row-based filters using the marketplace column.

Set up data filters

Let’s start by creating two different data filters, one for the analyst responsible for the US marketplace, and another for the one responsible for the Japanese marketplace. Then we grant the users their respective permissions.

Create a filter for the US marketplace data

Let’s first set up a filter for the US marketplace data.

  1. As the DatalakeAdmin user, open the Lake Formation console.
  2. Choose Data filters.
  3. Choose Create new filter.
  4. For Data filter name, enter amazon_reviews_US.
  5. For Target database, choose the database lakeformation_tutorial_row_security.
  6. For Target table, choose the table amazon_reviews.
  7. For Column-level access, leave as the default.
  8. For Row filter expression, enter marketplace='US'.
  9. Choose Create filter.

Create a filter for the Japanese marketplace data

Let’s create another data filter to restrict access to the Japanese marketplace data.

  1. On the Data filters page, choose Create new filter.
  2. For Data filter name, enter amazon_reviews_JP.
  3. For Target database, choose the database lakeformation_tutorial_row_security.
  4. For Target table, choose the table amazon_reviews.
  5. For Column-level access, leave as the default.
  6. For Row filter expression, enter marketplace='JP'.
  7. Choose Create filter.
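The same two filters can also be created programmatically. The boto3 sketch below assumes the database and table created by the CloudFormation template; because row-level security was in preview at the time of writing, the exact API names and request shapes in your SDK version may differ.

# Sketch (hedged): create the US and JP row filters via the Lake Formation API.
# The account ID is a placeholder; the preview API surface may differ.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

for name, expression in [("amazon_reviews_US", "marketplace='US'"),
                         ("amazon_reviews_JP", "marketplace='JP'")]:
    lf.create_data_cells_filter(
        TableData={
            "TableCatalogId": "111122223333",   # your AWS account ID
            "DatabaseName": "lakeformation_tutorial_row_security",
            "TableName": "amazon_reviews",
            "Name": name,
            "RowFilter": {"FilterExpression": expression},
            "ColumnWildcard": {},               # all columns (default column-level access)
        }
    )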

Grant permissions to the US data analyst

Now we have two data filters. Next, we need to grant permissions using these data filters to our analysts. We start by granting permissions to the DataAnalystUS user.

  1. On the Data permissions page, choose Grant.
  2. For Principals, choose IAM users and roles, and choose the user DataAnalystUS.
  3. For Policy tags or catalog resources, choose Named data catalog resources.
  4. For Database, choose the database lakeformation_tutorial_row_security.
  5. For Table, choose the table amazon_reviews.
  6. For Table permissions, select Select.
  7. For Data permissions, select Advanced cell-level filters.
  8. Select the filter amazon_reviews_US.
  9. Choose Grant.

The following screenshot shows the available data filters you can attach to a table when configuring permissions.

Grant permissions to the Japanese data analyst

Next, complete the following steps to configure permissions for the user DataAnalystJP:

  1. On the Data permissions page, choose Grant.
  2. For Principals, choose IAM users and roles, and choose the user DataAnalystJP.
  3. For Policy tags or catalog resources, choose Named data catalog resources.
  4. For Database, choose the database lakeformation_tutorial_row_security.
  5. For Table, choose the table amazon_reviews.
  6. For Table permissions, select Select.
  7. For Data permissions, select Advanced cell-level filters.
  8. Select the filter amazon_reviews_JP.
  9. Choose Grant.
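The grants can be scripted as well. The following boto3 sketch assumes the IAM user ARNs created by the CloudFormation template; as with the filter creation above, the preview API surface may differ from what your SDK exposes.

# Sketch (hedged): grant SELECT on the table through each data filter.
# The account ID and user ARNs are placeholders.
import boto3

lf = boto3.client("lakeformation", region_name="us-east-1")

grants = [
    ("arn:aws:iam::111122223333:user/DataAnalystUS", "amazon_reviews_US"),
    ("arn:aws:iam::111122223333:user/DataAnalystJP", "amazon_reviews_JP"),
]
for principal_arn, filter_name in grants:
    lf.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn},
        Resource={
            "DataCellsFilter": {
                "TableCatalogId": "111122223333",
                "DatabaseName": "lakeformation_tutorial_row_security",
                "TableName": "amazon_reviews",
                "Name": filter_name,
            }
        },
        Permissions=["SELECT"],
    )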

Query with data filters

With the data filters attached to the product reviews table, we’re ready to run some queries and see how permissions are enforced by Lake Formation. Because row-level security is in preview as of this writing, we need to create a special Athena workgroup named AmazonAthenaLakeFormationPreview, and switch to using it. For more information, see Managing Workgroups.

Sign in to the Athena console using the DataAnalystUS user and switch to the AmazonAthenaLakeFormationPreview workgroup. Run the following query to retrieve a few records, which are filtered based on the row-level permissions we defined:

SELECT * 
FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
LIMIT 10

Note the prefix of lakeformation. before the database name; this is required for the preview only.

The following screenshot shows the query result.

Similarly, run a query to count the total number of records per marketplace:

SELECT marketplace, count(*) as total_count
FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
GROUP BY marketplace 

The following screenshot shows the query result. Only the marketplace US shows in the results. This is because our user is only allowed to see rows where the marketplace column value is equal to US.

Switch to the DataAnalystJP user and run the same query:

SELECT * 
FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
LIMIT 10

The following screenshot shows the query result. All of the records belong to the JP marketplace.

Run the query to count the total number of records per marketplace:

SELECT marketplace, count(*) as total_count
FROM lakeformation.lakeformation_tutorial_row_security.amazon_reviews
GROUP BY marketplace

The following screenshot shows the query result. Again, only the row belonging to the JP marketplace is returned.

Clean up

Now to the final step, cleaning up the resources.

  1. Delete the CloudFormation stack.
  2. Delete the Athena workgroup AmazonAthenaLakeFormationPreview.

Conclusion

In this post, we covered how row-level security in Lake Formation enables you to control data access without needing to duplicate it or manage complicated alternatives such as views. We demonstrated how Lake Formation data filters can make creating, managing, and enforcing row-level permissions simple and easy.

When you want to grant permissions on specific cells, you can include or exclude columns in the data filters in addition to the row filter expression. You can learn more about cell filters in Part 4: Implementing cell-level and row-level security.

You can get started with Lake Formation today by visiting the AWS Lake Formation product page. If you want to try out row-level security, as well as the other exciting new features like ACID transactions and acceleration currently available for preview in the US East (N. Virginia) and the US West (Oregon) Regions, sign up for the preview.


About the Authors

Noritaka Sekiyama is a Senior Big Data Architect on the AWS Glue and AWS Lake Formation team. He has 11 years of experience working in the software industry. Based in Tokyo, Japan, he is responsible for implementing software artifacts, building libraries, troubleshooting complex issues and helping guide customer architectures.

Sanjay Srivastava is a Principal Product Manager for AWS Lake Formation. He is passionate about building products, in particular products that help customers get more out of their data. During his spare time, he loves to spend time with his family and engage in outdoor activities including hiking, running, and gardening.

Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs

Post Syndicated from Hareesh Singireddy original https://aws.amazon.com/blogs/architecture/expiring-amazon-s3-objects-based-on-last-accessed-date-to-decrease-costs/

Organizations are using Amazon Simple Storage Service (S3) for building their data lakes, websites, mobile applications, and enterprise applications. As the number of objects within your S3 bucket increases, you may want to move older objects into lower-cost tiers of Amazon S3. In some cases you may want to delete the objects altogether to further reduce S3 storage costs. A common practice is to use S3 Lifecycle rules to achieve this. These rules can be applied to objects based on their creation date. In certain situations, you may want to keep objects available that are still being accessed, but transition or delete objects that are no longer in use.

In this post, we will demonstrate how you can create custom object expiry rules for Amazon S3 based on the last accessed date of the object. We will first walk through the various features used within the workflow, followed by an architecture diagram outlining the process flow.

Amazon S3 server access logging

S3 Server access logging provides detailed records of the requests that are made to objects in Amazon S3 buckets. Amazon S3 periodically collects access log records, consolidates the records in log files, and then uploads log files to your target bucket as log objects. Each log record consists of information such as bucket name, the operation in the request, and the time at which the request was received. S3 Server Access Log Format provides more details about the format of the log file.
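If server access logging isn’t already enabled on the source bucket, a short boto3 call like the following turns it on. The bucket names and prefix are placeholders, and the target bucket must separately grant the S3 logging service permission to write to it.

# Sketch: enable server access logging on the source bucket.
# Bucket names and the prefix are placeholders; target-bucket permissions are not shown.
import boto3

s3 = boto3.client("s3")
s3.put_bucket_logging(
    Bucket="my-source-bucket",
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "my-access-logs-bucket",
            "TargetPrefix": "source-bucket-logs/",
        }
    },
)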

Amazon S3 inventory

Amazon S3 inventory provides a list of your objects and the corresponding metadata on a daily or weekly basis, for an S3 bucket or a shared prefix. The inventory lists are stored as a comma-separated value (CSV) file compressed with GZIP, as an Apache optimized row columnar (ORC) file compressed with ZLIB, or as an Apache Parquet file compressed with Snappy.

Amazon S3 Lifecycle

Amazon S3 Lifecycle policies help you manage your objects through two types of actions, Transition and Expiration. In the architecture shown following in Figure 1, we create an S3 Lifecycle configuration rule that expires objects after ‘x’ days. It has a filter for an object tag of “delete=True”. You can configure the value of ‘x’ based on your requirements.
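A minimal boto3 version of that rule might look like the following, with the bucket name and the 30-day window standing in for your own bucket and value of ‘x’.

# Sketch: expire objects tagged delete=True after 'x' days (30 here as a placeholder).
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-source-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-unaccessed-objects",
                "Status": "Enabled",
                "Filter": {"Tag": {"Key": "delete", "Value": "True"}},
                "Expiration": {"Days": 30},   # 'x' days
            }
        ]
    },
)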

If you are using an S3 bucket to store short-lived objects with unknown access patterns, you might want to keep the objects that are still being accessed, but delete the rest. This will let you retain objects in your S3 bucket even after their expiry date as per the S3 Lifecycle rules, while saving you costs by deleting objects that are not needed anymore. The following diagram shows an architecture that considers the last accessed date of the object before deleting S3 objects.

Figure 1. Object expiry architecture flow

Figure 1. Object expiry architecture flow

This architecture uses native S3 features mentioned earlier in combination with other AWS services to achieve the desired outcome.

Here is the architecture flow:

  1. The S3 server access logs capture S3 object requests. These are generated and stored in the target S3 bucket.
  2. An S3 inventory report is generated for the source bucket daily. It is written to the S3 inventory target bucket.
  3. An Amazon EventBridge rule is configured that will initiate an AWS Lambda function once a day, or as desired.
  4. The Lambda function initiates an S3 Batch Operations job to tag the objects in the source bucket that must be expired, using the following logic:
    • Capture the number of days (x) configuration from the S3 Lifecycle configuration.
    • Run an Amazon Athena query that will get the list of objects from the S3 inventory report and server access logs. Create a delta list with objects that were created earlier than ‘x’ days, but not accessed during that time.
    • Write a manifest file with the list of these objects to an S3 bucket.
    • Create an S3 Batch Operations job that will tag all objects in the manifest file with a tag of “delete=True” (a sketch of this step follows the list).
  5. The Lifecycle rule on the source S3 bucket will expire all objects that were created prior to ‘x’ days. They will have the tag given via the S3 batch operation of “delete=True”.
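Inside the Lambda function, the tagging step (step 4) could be sketched roughly as follows. The account ID, role ARN, manifest location, and ETag are placeholders, and the Athena query that produces the manifest is not shown.

# Sketch of step 4: create an S3 Batch Operations job that tags every object
# listed in the manifest with delete=True. ARNs, ETag, and account ID are placeholders.
import uuid
import boto3

s3control = boto3.client("s3control", region_name="us-east-1")

s3control.create_job(
    AccountId="111122223333",
    ConfirmationRequired=False,
    Priority=10,
    RoleArn="arn:aws:iam::111122223333:role/s3-batch-tagging-role",
    ClientRequestToken=str(uuid.uuid4()),
    Operation={
        "S3PutObjectTagging": {
            "TagSet": [{"Key": "delete", "Value": "True"}]
        }
    },
    Manifest={
        "Spec": {
            "Format": "S3BatchOperations_CSV_20180820",
            "Fields": ["Bucket", "Key"],
        },
        "Location": {
            "ObjectArn": "arn:aws:s3:::my-manifest-bucket/expired-objects-manifest.csv",
            "ETag": "manifest-object-etag",
        },
    },
    Report={"Enabled": False},
)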

The preceding architecture is built for fault tolerance. If a particular run fails, all the objects that must be expired will be picked up during the next run. You can configure error handling and automatic retries in your Lambda function. An Amazon Simple Notification Service (SNS) topic will send out a notification in the event of a failure.

Cost considerations

S3 server access logs, S3 inventory lists, and manifest files can accumulate many objects over time. We recommend you configure an S3 Lifecycle policy on the target bucket to periodically delete older objects. Although following the guidelines in this post can decrease some of your costs, S3 requests, S3 inventory, S3 Object Tagging, and Lifecycle transitions also have costs associated with them. Additional details can be found on the S3 pricing page.

Amazon Athena charges you based on the amount of data scanned by each query. But Amazon S3 inventory can also output files in Apache ORC or Apache Parquet format, which can reduce the amount of data scanned by Athena. The Athena pricing page would be helpful to review.

AWS Lambda has a free usage tier of 1M free requests per month and 400,000 GB-seconds of compute time per month. However, you are charged based on the number of requests, the amount of memory allocated, and the runtime duration of the function. See more at the Lambda pricing page.

Conclusion

In this blog post, we showed how you can create a custom process to delete objects from your S3 bucket based on the last time the object was accessed. You can use this architecture to customize your object transitions, clean up your S3 buckets for any unnecessary objects, and keep your S3 buckets cost-effective. This architecture can also be used on versioned S3 buckets with some minor modifications.

We hope you found this blog post useful and welcome your feedback!


Backblaze Drive Stats for Q2 2021

Post Syndicated from original https://www.backblaze.com/blog/backblaze-drive-stats-for-q2-2021/

As of June 30, 2021, Backblaze had 181,464 drives spread across four data centers on two continents. Of that number, there were 3,298 boot drives and 178,166 data drives. The boot drives consisted of 1,607 hard drives and 1,691 SSDs. This report will review the quarterly and lifetime failure rates for our data drives, and we’ll compare the failure rates of our HDD and SSD boot drives. Along the way, we’ll share our observations of and insights into the data presented and, as always, we look forward to your comments below.

Q2 2021 Hard Drive Failure Rates

At the end of June 2021, Backblaze was monitoring 178,166 hard drives used to store data. For our evaluation, we removed from consideration 231 drives which were used for either testing purposes or as drive models for which we did not have at least 60 drives. This leaves us with 177,935 hard drives for the Q2 2021 quarterly report, as shown below.

Notes and Observations on the Q2 2021 Stats

The data for all of the drives in our data centers, including the 231 drives not included in the list above, is available for download on the Hard Drive Test Data webpage.

Zero Failures

Three drive models recorded zero failures during Q2; let’s take a look at each.

  • 6TB Seagate (ST6000DX000): The average age of these drives is over six years (74 months) and with one failure over the last year, this drive is aging quite well. The low number of drives (886) and drive days (80,626) means there is some variability in the failure rate, but the lifetime failure rate of 0.92% is solid.
  • 12TB HGST (HUH721212ALE600): These drives reside in our Dell storage servers in our Amsterdam data center. After recording a quarterly high of five failures last quarter, they are back on track with zero failures this quarter and a lifetime failure rate of 0.41%.
  • 16TB Western Digital (WUH721816ALE6L0): These drives have only been installed for three months, but no failures in 624 drives is a great start.

Honorable Mention

Three drive models recorded one drive failure during the quarter. They vary widely in age.

  • On the young side, with an average age of five months, the 16TB Toshiba (MG08ACA16TEY) had its first drive failure out of 1,430 drives installed.
  • At the other end of the age spectrum, one of our 4TB Toshiba (MD04ABA400V) drives finally failed, the first failure since Q4 of 2018.
  • In the middle of the age spectrum with an average of 40.7 months, the 8TB HGST drives (HUH728080ALE600) also had just one failure this past quarter.

Outliers

Two drive models had an annualized failure rate (AFR) above 4%; let’s take a closer look.

  • The 4TB Toshiba (MD04ABA400V) had an AFR of 4.07% for Q2 2021, but as noted above, that was with one drive failure. Drive models with low drive days in a given period are subject to wide swings in the AFR. In this case, one less failure during the quarter would result in an AFR of 0% and one more failure would result in an AFR of over 8.1% (see the short calculation after this list).
  • The 14TB Seagate (ST14000NM0138) drives have an AFR of 5.55% for Q2 2021. These Seagate drives along with 14TB Toshiba drives (MG07ACA14TEY) were installed in Dell storage servers deployed in our U.S. West region about six months ago. We are actively working with Dell to determine the root cause of this elevated failure rate and expect to follow up on this topic in the next quarterly drive stats report.
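The swing described for the 4TB Toshiba drives is just the annualized failure rate arithmetic: failures divided by drive days, scaled to a year. The short sketch below reproduces the numbers; the drive-day figure is an approximation inferred from the reported 4.07% rate, not a value taken from the report.

# Back-of-the-envelope AFR: failures / drive_days, annualized and expressed as a percentage.
# ~8,970 drive days is inferred from the reported 4.07% with one failure; it is an approximation.
def afr(failures: int, drive_days: float) -> float:
    return failures / drive_days * 365 * 100

drive_days = 8_970
for failures in (0, 1, 2):
    print(f"{failures} failure(s): AFR = {afr(failures, drive_days):.2f}%")
# 0 failure(s): AFR = 0.00%
# 1 failure(s): AFR = 4.07%
# 2 failure(s): AFR = 8.14%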

Overall AFR

The quarterly AFR for all the drives jumped up to 1.01% from 0.85% in Q1 2021 and 0.81% one year ago in Q2 2020. This jump ended a downward trend over the past year. The increase is within our confidence interval, but bears watching going forward.

HDDs vs. SSDs, a Follow-up

In our Q1 2021 report, we took an initial look at comparing our HDD and SSD boot drives, both for Q1 and lifetime timeframes. As we stated at the time, a numbers-to-numbers comparison was suspect as each type of drive was at a different point in its life cycle. The average age of the HDD drives was 49.63 months, while the SSDs’ average age was 12.66 months. As a reminder, the HDD and SSD boot drives perform the same functions, which include booting the storage servers and performing reads, writes, and deletes of daily log files and other temporary files.

To create a more accurate comparison, we took the HDD boot drives that were in use at the end of Q4 2020 and went back in time to see where their average age and cumulative drive days would be similar to those same attributes for the SSDs at the end of Q4 2020. We found that at the end of Q4 2015 the attributes were the closest.

Let’s start with the HDD boot drives that were active at the end of Q4 2020.

Next, we’ll look at the SSD boot drives that were active at the end of Q4 2020.

Finally, let’s look at the lifetime attributes of the HDD drives active in Q4 2020 as they were back in Q4 2015.

To summarize, when we control using the same drive models, the same average drive age, and a similar number of drive days, HDD and SSD drives failure rates compare as follows:

While the failure rate for our HDD boot drives is nearly two times higher than the SSD boot drives, it is not the nearly 10 times failure rate we saw in the Q1 2021 report when we compared the two types of drives at different points in their lifecycle.

Predicting the Future?

What happened to the HDD boot drives from 2016 to 2020 as their lifetime AFR rose from 1.54% in Q4 2015 to 6.26% in Q4 2020? The chart below shows the lifetime AFR for the HDD boot drives from 2014 through 2020.

As the graph shows, beginning in 2018 the HDD boot drive failures accelerated. This continued in 2019 and 2020 even as the number of HDD boot drives started to decrease when failed HDD boot drives were replaced with SSD boot drives. As the average age of the HDD boot drive fleet increased, so did the failure rate. This makes sense and is borne out by the data. This raises a couple of questions:

  • Will the SSD drives begin failing at higher rates as they get older?
  • How will the SSD failure rates going forward compare to what we have observed with the HDD boot drives?

We’ll continue to track and report on SSDs versus HDDs based on our data.

Lifetime Hard Drive Stats

The chart below shows the lifetime AFR of all the hard drive models in production as of June 30, 2021.

Notes and Observations on the Lifetime Stats

The lifetime AFR for all of the drives in our farm continues to decrease. The 1.45% AFR is the lowest recorded value since we started back in 2013. The drive population spans drive models from 4TB to 16TB and varies in average age from three months (WDC 16TB) to over six years (Seagate 6TB).

Our best performing drive models in our environment by drive size are listed in the table below.

Notes:

  1. The WDC 16TB drive, model: WUH721816ALE6L0, does not appear to be available in the U.S. through retail channels at this time.
  2. Status is based on what is stated on the website. Further investigation may be required to ensure you are purchasing a new drive versus a refurbished drive marked as new.
  3. The source and price were as of 7/30/2021.
  4. In searching for the Toshiba 16TB drive, model: MG08ACA16TEY, you may find model: MG08ACA16TE for much less ($399.00 or less). These are not the same drive and we have no information on the latter model. The MG08ACA16TEY includes the Sanitize Instant Erase feature.

The Drive Stats Data

The complete data set used to create the information used in this review is available on our Hard Drive Test Data page. You can download and use this data for free for your own purpose. All we ask are three things: 1) you cite Backblaze as the source if you use the data, 2) you accept that you are solely responsible for how you use the data, and 3) you do not sell this data to anyone; it is free.

If you just want the summarized data used to create the tables and charts in this blog post, you can download the ZIP file containing the CSV files for each chart.

Good luck and let us know if you find anything interesting.

The post Backblaze Drive Stats for Q2 2021 appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Security updates for Tuesday

Post Syndicated from original https://lwn.net/Articles/865029/rss

Security updates have been issued by Arch Linux (chromium, nodejs, nodejs-lts-erbium, and nodejs-lts-fermium), Debian (pyxdg, shiro, and vlc), openSUSE (qemu), Oracle (lasso), Red Hat (glibc, lasso, rh-php73-php, rh-varnish6-varnish, and varnish:6), Scientific Linux (lasso), SUSE (dbus-1, lasso, python-Pillow, and qemu), and Ubuntu (exiv2, gnutls28, and qpdf).

The Ransomware Task Force: A New Approach to Fighting Ransomware

Post Syndicated from Jen Ellis original https://blog.rapid7.com/2021/08/03/the-ransomware-task-force-a-new-approach-to-fighting-ransomware/

The Ransomware Task Force: A New Approach to Fighting Ransomware

In the past few months, we’ve seen ransomware attacks shut down healthcare across Ireland, fuel delivery across parts of the US, and meat processing across Australia, Canada, and the US. We’ve seen ransom demands in the tens of millions of dollars. We’re also seeing trends like ransomware-as-a-service and double or triple extortion continue to rise. It’s clear that ransomware attacks are increasing in frequency, breadth, sophistication, scale, and impact.

Recognizing this, the Institute for Security and Technology put together a comprehensive Ransomware Task Force (RTF) to identify new approaches to shift the dynamics of ransomware and reduce opportunities for attackers. The Ransomware Task Force involved more than 60 participants representing a wide range of expertise and experience, including from multiple governments, law enforcement, civil society and public policy nonprofits, and security advancement groups. From the private sector, organizations of all sizes participated, including many that have experienced ransomware attacks firsthand or that are involved in dealing with the fallout, such as cybersecurity companies, law firms, and cyber insurers. Rapid7 was among those that participated — I was one of the co-chairs, and my amazing colleagues, Bob Rudis, Tod Beardsley, and Scott King participated as well.

From the outset, the intent of the Task Force was to look at the issue holistically and come up with a comprehensive set of recommendations to deter and disrupt ransomware attackers, thereby helping organizations prepare for and respond to attacks at scale. Recognizing the scale and severity of the issue — and the need for systemic and societal responses — our target audience was policymakers and government leaders.

The Task Force recognized that ransomware is not a new topic, and we had no desire to rehash previous efforts. Instead, we sought to learn from them and, where appropriate, amplify and extend them, supporting the next period of growth on this thorny issue. Ransomware’s reach and impact are increasing, which has a serious impact on society. The effects are only likely to worsen without significant action from governments and other leaders.

Key recommendations

The final report issued by the Task Force makes 48 recommendations, broken into actions to deter, disrupt, prepare for, and respond to ransomware attacks. The recommendations are designed to work in concert with each other, though we recognize there are a large number of them, and many will take time to implement. In reality, though, there truly is no silver bullet for addressing ransomware, no one thing that will magically solve this problem. If we want to shift the dynamics in a meaningful way that makes it harder for attackers to succeed, we need to make adjustments in a range of areas. It’s also worth noting that the Task Force’s goal was to provide recommendations to government and other leaders, not to provide tactical, technical guidance.

Given there are 48 recommendations, and they are well set out in the report, I won’t go over them now. I’ll just highlight a few of the big themes and, where relevant, what’s happened since the launch of the report.

Make it a top priority

One of the biggest challenges we face with any discussion around cybercrime is that it’s often viewed as a niche technical problem, not as a broad societal issue. This has made it harder to get the required attention and investment in solutions. The Task Force called for senior political leaders to recognize ransomware for what it is: a national security issue and a major threat to our ways of life (Action 1.2.5, page 26). We also called for a whole-of-government approach whereby leaders would engage various stakeholders across the government to help ensure necessary action is taking place collaboratively across the board (Actions 1.2.1 and 1.2.2, page 23).

One possible silver lining of the recent attacks against critical infrastructure is that they’ve helped establish this level of priority. In the US, we’ve seen various parts of the government start to take action: Congress has held hearings and proposed legislation; the Department of Justice has given ransomware investigations similar status to those for terrorism; the Department of Homeland Security has issued new cybersecurity guidelines for pipelines; the White House issued a memo to urge the private sector to take steps to protect against ransomware; and even President Biden has talked about ransomware in press conferences and with other world leaders.

Global action for a global problem

To take meaningful action to reduce ransomware attacks, we must acknowledge the geopolitical aspects. Firstly, the issue affects countries all around the world. Governments taking action should do so in coordination and cooperation in order to amplify the impact and hit attackers on multiple fronts at once (Actions 1.1.1 – 1.1.4, 1.2.6, pages 21-22, 26).

Secondly, and perhaps more crucially, one of the main advantages for attackers is the existence of nations that provide safe havens, because they’re either unwilling or unable to prosecute cybercriminals. This also makes it much harder for other countries to prosecute these criminals, and as such, ransomware attackers rarely seem to fear consequences for their actions.

The Task Force recommended that governments work together to tackle the issue of safe havens and adopt key practices to protect their citizens — or help them better protect themselves (Actions 1.3.1 and 1.3.2, page 27).

We’ve already seen some progress in this regard, as ransomware was raised at the recent G7 Summit, and the resulting communique included the following commitment from members:

“We also commit to work together to urgently address the escalating shared threat from criminal ransomware networks. We call on all states to urgently identify and disrupt ransomware criminal networks operating from within their borders, and hold those networks accountable for their actions.”

It will be interesting to see whether and how the G7 members will follow through on this commitment. I hope they’ll take action, build momentum, and recruit participation from other nations.

Reducing paths to revenue

As mentioned above, we’re seeing attackers demand higher and higher ransoms, which likely attracts other criminals to enter the market. Hopefully, the opposite is also true; if we reduce the opportunity to make money from ransomware, the number of attacks will decrease.

This rationale, coupled with discomfort over the idea of ransom payments being used to fund other types of organized crime — including human trafficking, child exploitation, and weapons trafficking — resulted in a great deal of discussion around the notion of banning ransom payments.

While the Task Force agreed that payments should be discouraged, the idea of a legal prohibition was challenging. Given the lack of real risk or friction for attackers, it’s likely that if payments were outlawed, attackers wouldn’t simply give up. Rather, they’d first play a game of chicken against victims, focusing on the organizations least likely to resist paying — namely providers of critical functions that can’t be disrupted without profound impact on society, or small-to-medium businesses that aren’t financially able to prepare for and weather an attack.

Given the concerns over these practicalities, the Task Force did not recommend banning payments. Rather, we looked at alternative ways of reducing the ease with which attackers realize a profit. There are two main paths to this: reducing the likelihood of victims making a payment, and making it technically harder for attackers to get their payment.

In terms of making victims think twice before making a payment, the RTF recommended a few measures:

  • Requiring the disclosure of payments (Action 4.2.4, page 46): This will help to build greater understanding of what is happening in the attack landscape and may enable law enforcement to build more information on attackers, or even recapture payments.
  • Requiring organizations to conduct cost-benefit analysis prior to making payments (Action 4.3.1 and 4.3.2, pages 47 and 48): This will encourage organizations to look into alternative options for resolution — for example, turning to the No More Ransom Project to seek decryption keys.
  • Creating a fund to assist certain organizations in recovery (Action 4.1.2, page 43): Often, organizations say the cost of recovery significantly outsizes that of the ransom, leaving them no choice but to give into their attacker’s demands. For qualifying organizations, this fund would rebalance the scales and give them a pragmatic alternative to paying the ransom.

On the other track — disrupting the system that facilitates the payment of ransoms — the RTF recommended that cryptocurrency exchanges, kiosks, and over-the-counter trading desks be required to comply with existing laws, such as Know Your Customer (KYC), Anti-Money Laundering (AML), and Combatting Financing of Terrorism (CFT) (Action 2.1.2, pages 29 and 30).

Better preparation, better response

During the explorations of the Task Force, it became apparent that part of the reason ransomware attacks are so successful is that many organizations don’t truly understand the threat, believe it’s relevant to them, or understand how to protect themselves. We repeatedly heard that, while there is a lot of information on ransomware, it’s overwhelming and often unhelpful. Many organizations don’t know what to focus on, and guidance may be oversimplified, overcomplicated, or insufficient.

With this in mind, one of our top recommendations was for the development of a ransomware framework that would cover measures for both preparing for and responding to attacks (Action 3.1.1, pages 35 and 36). The framework would need to be pragmatic, actionable, and address varying levels of sophistication and capability (Action 3.1.2, page 36). And because one of our main themes was around international cooperation, we also recommended there be a single source of truth adopted and promoted by multiple governments around the world. In fact, we recommended the framework be developed through both international and public-private collaboration. It should also be kept up to date to react to evolving ransomware attack trends.

Creating the framework is a lift, but it’s only part of the battle — you can’t drive adoption if you don’t also tackle the lack of awareness and understanding. As such, we also recommend that governments run high-profile awareness campaigns, partnering with organizations with reach into audiences that aren’t being well addressed today (Actions 3.2.1 and 3.2.2, pages 37 and 38). For example, many governments have toolkits or content aimed at small-to-medium businesses, but most leaders of these organizations seem largely unaware of the risk — until someone they know personally is hit by an attack.

The path forward

Unfortunately, ransomware continues to dominate headlines and harm organizations around the world. As a result, many governments are paying a great deal of attention to this issue and looking for solutions. I’m relieved to say the Ransomware Task Force’s report and recommendations have seen a fair bit of interest and support. For us, the next challenge is to keep the momentum going and help governments translate interest into action.

In the meantime, my colleagues at Rapid7 and I will continue to try to help our customers and community prepare for and respond to attacks. We’re working on some other content to help people better understand the dynamics of the issue, as well as the steps they can take to protect themselves or get involved in broader response efforts.

Look out for our series of blogs on different aspects of ransomware, and in the meantime, check out our interviews with ransomware experts on our Security Nation podcast. You can also check out my talk and Q&A on the Ransomware Task Force at Black Hat, or as part of Rapid7’s Virtual Vegas, which includes a Ransomware (un)Happy Hour — bring your ransomware war stories, lessons learned, or questions.


Durable Objects: Easy, Fast, Correct — Choose three.

Post Syndicated from Kenton Varda original https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three/

Durable Objects: Easy, Fast, Correct — Choose three.

Durable Objects: Easy, Fast, Correct — Choose three.

Storage in distributed systems is surprisingly hard to get right. Distributed databases and consensus are well-known to be extremely hard to build. But, application code isn’t necessarily easy either. There are many ways in which apps that use databases can have subtle timing bugs that could result in inconsistent results, or even data loss. Worse, these problems can be very hard to test for, as they’ll often manifest only under heavy load, or only after a sudden machine failure.

Up until recently, Durable Objects were no exception. A Durable Object is a special kind of Cloudflare Worker that has access to persistent storage and processes requests in one of Cloudflare’s points of presence. Each Object has its own private storage, accessible through a classical key/value storage API. Like any classical database API, this storage API had to be used carefully to avoid possible race conditions and data loss, especially when performance mattered. And like any classical database API, many apps got it wrong.

However, rather than fix the apps, we decided to fix the model. Last month, we rolled out deep changes to the Durable Objects runtime such that many applications which previously contained subtle race conditions are now correct by default, and many that were previously slow are now fast. Developers can now write their code in an intuitive way, and have it work. No changes at all are needed to your code in order to take advantage of these new features.

So, let me tell you about what changed…

Background: Durable Objects are Single-Threaded

To understand what changed, it’s necessary to first understand Durable Objects. For a full introduction, see the Durable Objects announcement blog post.

The most important point is: Each Durable Object runs in exactly one location, in one single thread, at a time. Each object has its own private on-disk storage. This is a very different situation from a typical database, where many clients may be accessing the same data. In Durable Objects, any particular piece of data belongs to exactly one thread at a time.

Because a single Durable Object is single-threaded, it’s possible, and even encouraged, to keep state and perform synchronization in memory. This is, indeed, the killer feature of Durable Objects. With classical databases, in-memory state is extremely difficult to keep synchronized between all database clients. But with Durable Objects, since each piece of data belongs to a specific thread, this synchronization is easy.
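
To make this concrete, here is a minimal sketch (ours, not from the original post; the class and field names are illustrative) of a Durable Object that keeps a counter purely in memory. Because only one instance ever runs, on one thread, the increment needs no locks or coordination between clients:

// Illustrative sketch only -- names are made up, and this.hits is not
// persisted, so it resets if the object is ever evicted or restarted.
export class HitCounter {
  constructor(state, env) {
    this.state = state;
    this.hits = 0;   // in-memory state, private to this single instance
  }

  async fetch(request) {
    // Single-threaded: no other request can interleave between these lines.
    this.hits++;
    return new Response(`hits so far: ${this.hits}`);
  }
}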

However, interacting with the disk is still an I/O (input/output) operation, which means that each operation returns a Promise which you must await. As we’ll see, this re-introduces some of the synchronization difficulties that we were trying to avoid. However, it turns out, we can solve these difficulties within the system itself, without bothering application developers.

An Example

Consider this code:

// Used to be slow and racy -- but not anymore!
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}

At first glance, this seems like reasonable code that returns a unique number each time it is called (incrementing each time).

Unfortunately, before now, this code had two problems:

  1. It had a subtle race condition (even though Durable Objects are single-threaded!).
  2. It was kind of slow.

The Race Condition

A race condition occurs when two operations running concurrently might interfere with each other in a way that makes them behave incorrectly. Race conditions are commonly associated with code that uses multiple threads.

JavaScript, however, famously does not use threads. Instead, it uses event-driven programming, with callbacks. It’s not possible for two pieces of JavaScript code to be running "at the same time" in the same isolate (and Durable Objects promises that no other isolate could possibly be accessing the same storage). Does that mean that race conditions aren’t a problem in JavaScript, the way they are in multi-threaded apps?

Unfortunately, it does not. The problem is, the code above is an async function, containing two await statements. Each time await is used, execution pauses, waiting for the specified Promise to complete.

In the meantime, though, other code can run! For example, the Durable Object might receive two requests at the same time. If each of them calls getUniqueNumber(), then the two calls might be interleaved. Each time one call performs an await, execution may switch to the other call. So, the two calls might end up looking like this:

Request 1 timeline                                Request 2 timeline

async function getUniqueNumber() {
  let val = await this.storage.get("counter");
                                                  async function getUniqueNumber() {
                                                    let val = await this.storage.get("counter");
                                                    await this.storage.put("counter", val + 1);
  await this.storage.put("counter", val + 1);
  return val;
}
                                                    return val;
                                                  }

There’s a big problem here: Both of these two calls will call get("counter") before either of them calls put("counter", val + 1). That means, both of them will return the same value!

This problem is especially bad because it only happens when multiple requests are being handled at the same time — and even then, only sometimes. It is very hard to test for this kind of problem, and everything might seem just fine when the application is deployed, as long as it isn’t getting too much traffic. But one day, when a lot of visitors try to use the same object at the same time, all of a sudden getUniqueNumber() starts returning duplicates!

The Slowness

To add insult to injury, getUniqueNumber() was (until recently) pretty slow. The problem is, it has to do two round trips to storage — a get() and a put(). The get() might typically take a couple milliseconds. The put(), however, will take much longer, probably tens of milliseconds.

Why is put() so slow? Because we don’t want to lose data. The worst thing an application can do is tell the user that their action was successful when it wasn’t. If, for some reason, a write cannot be completed, then it’s imperative that the application presents an error to the user, so that the user knows that something is wrong and they’ll have to try again or look for a fix.

In order to make sure an application does not prematurely report success to the user, await put() has to make sure it doesn’t return until the data is actually safe on disk. Disks are slow, so this might take a while.

But that’s not all. Disks can fail. In order for the data to be really safe, we have to write the same data on multiple disks, in multiple machines. That means we have to wait for some network traffic.

But that’s still not all. What if a meteor were to come out of the sky and land on a Cloudflare data center, completely destroying it? Or, more likely, what if the power or network connection failed? We don’t want a user’s data to be lost in this case, or even temporarily become unavailable. Therefore, Durable Object data is replicated to multiple Cloudflare locations. This requires communicating across long distances before any write can be confirmed. There is little we can do to make this faster, the speed of light being what it is.

A call to getUniqueNumber() will therefore always take tens of milliseconds. If an application calls it multiple times, awaiting each call before beginning the next, it can easily become very slow very quickly. Or, at least, that was the case before our recent changes.

The Wrong Fixes

There are several ways that an application could fix these problems, but all of them have their own issues.

Transactions?

Many databases offer "transactions". A transaction allows an application to make sure some operation completes "atomically", with no interference from concurrent operations.

The Durable Objects storage API has always supported transactions. We could use them to fix our getUniqueNumber() implementation like so:

// No more race condition... but slow and complicated.
async function getUniqueNumber() {
  let val;
  await this.storage.transaction(async (txn) => {
    val = await txn.get("counter");
    await txn.put("counter", val + 1);
  });
  return val;
}

This fixes our race condition. Now, if getUniqueNumber() is called multiple times concurrently such that the storage operations interleave, the system will detect the problem. One of the concurrent calls will be chosen to be the "winner", and will complete normally. The other calls will be canceled and retried, so that they can see the value written by the first call.

This fixes our problems! But, at some cost:

  • getUniqueNumber() is now even slower than it was before. The difference typically won’t be huge, but setting up a transaction does require some additional coordination in the database. Of course, if the transaction needs to be retried, then it may end up being much slower. And retries will tend to happen more when load gets high… the worst possible time.
  • Speaking of retries, many developers might not realize that the transaction callback can be called multiple times. It’s difficult to test for this, since retries will only happen when concurrent operations cause conflicts. The problem is especially acute when the application is trying to synchronize not just on-disk state, but also in-memory state — if the transaction callback modifies in-memory state, it must be careful to ensure that its changes are idempotent. The need for idempotency may not be top of mind for most developers, and tests won’t catch the problem, making it very easy to end up deploying buggy code (see the sketch after this list).
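
To illustrate that last pitfall, here is a hypothetical variant (ours, not the original post’s) of the transaction that also bumps an in-memory counter. The callback is not idempotent, so a retry silently skews the in-memory value:

// Hypothetical sketch: a non-idempotent transaction callback.
async function getUniqueNumber() {
  let val;
  await this.storage.transaction(async (txn) => {
    // BUG: this line runs again on every retry, so this.attempts can end up
    // larger than the number of successful calls.
    this.attempts = (this.attempts || 0) + 1;
    val = await txn.get("counter");
    await txn.put("counter", val + 1);
  });
  return val;
}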

So we solved our problem, but we did it with a foot-gun. If we keep using the foot-gun, we’re probably going to shoot our own feet eventually.

Is there another way?

In-memory caching?

Durable Objects’ superpower is their in-memory state. Each object has only one active instance at any particular time. All requests sent to that object are handled by that same instance. That means, you can store some state in memory.

// Much faster! But (used to be) wrong.
async function getUniqueNumber() {
  if (this.val === undefined) {
    this.val = await this.storage.get("counter");
  }

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);
  return result;
}

This code is MUCH faster than the previous implementation, because it stores the value in memory. In fact, after the function runs once, further calls won’t wait for any I/O at all — they will return immediately. This is because by caching the value in memory, we avoid waiting for a get() (except for the first time), and we don’t wait for the put() either, trusting that it will complete asynchronously later on.

Returning immediately also means that there’s no opportunity for concurrency, so the calls that return immediately will always return unique numbers! This means that not only is this implementation faster than our original implementation, it is also more correct. This is only possible because the Durable Objects platform guarantees that there will only be one instance, and therefore only one copy of this.val.

Unfortunately, there are two problems with this code:

  • We still have a race condition on initialization. If the first two calls to getUniqueNumber() happen to occur at about the same time, then initialization will be performed multiple times. The second call will likely clobber what the first call did, and the two calls will end up returning the same number. We could solve this problem by making initialization more complicated — the first call could create an initialization promise, and other concurrent calls could wait on it, so that initialization really only happens once. But this creates even deeper complexity: What if initialization fails for some reason? The object could be placed in a permanently broken state. It’s possible to get this right, but it’s surprisingly tricky (see the sketch after this list).
  • Because we don’t wait for the put() to report success, it’s possible that it could be silently lost. For example, if the machine hosting the Durable Object suffered a sudden power failure, then the Durable Object would be transferred to some other machine. When it starts up there, calls to getUniqueNumber() might return numbers that had already been returned under the old instance before it failed, because the put()s hadn’t actually completed before the failure occurred. But if we await the put(), then our function becomes slow again, and creates more opportunities for race conditions (e.g. in the calling code).
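
Here is a rough sketch (ours, not the original post’s) of the "initialization promise" approach from the first bullet. It closes the initialization race, but notice how much machinery it takes, and it still does nothing about the un-awaited put() from the second bullet:

// Sketch of the initialization-promise workaround (illustrative only).
async function getUniqueNumber() {
  if (!this.initPromise) {
    // All concurrent callers share one promise, so the get() happens once.
    this.initPromise = this.storage.get("counter").then((v) => {
      this.val = v;
    }).catch((err) => {
      // If initialization fails, forget the promise so a later call can
      // retry instead of leaving the object permanently broken.
      this.initPromise = undefined;
      throw err;
    });
  }
  await this.initPromise;

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);  // still unconfirmed -- see bullet 2
  return result;
}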

Our answer: Make it automatic

When looking at this, we had two options:

  1. Try to carefully document these problems and educate developers about them, so that they could write code that does the right thing.
  2. Change the system so that naturally-written code just does the right thing by default — and runs quickly.

We chose option 2. We accomplished this in three parts.

Part 1: Input Gates

Let’s go back to our original example. Can we make this example "just work", even in the face of concurrent requests?

// Can this "just work" please?
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}

It turns out we can! We create a new rule:

Input gates: While a storage operation is executing, no events shall be delivered to the object except for storage completion events. Any other events will be deferred until such a time as the object is no longer executing JavaScript code and is no longer waiting for any storage operations. We say that these events are waiting for the "input gate" to open.

If we do this, then our storage operations above are no longer an opportunity for concurrency. Our concurrent requests now look like this:

Request 1 timeline                                Request 2 timeline

async function getUniqueNumber() {
  let val = await this.storage.get("counter");
                                                  // Request 2 delivery is blocked because
                                                  // request 1 is waiting for storage.
  await this.storage.put("counter", val + 1);
                                                  // Request 2 delivery is blocked because
                                                  // request 1 is waiting for storage.
  return val;
}
                                                  async function getUniqueNumber() {
                                                    let val = await this.storage.get("counter");
                                                    await this.storage.put("counter", val + 1);
                                                    return val;
                                                  }

The two calls return unique numbers, as expected. Hooray! (Unfortunately, we did it by delaying the second request, creating latency and reducing throughput — but we’ll address that in part 3, below.)

Note that our rule does not preclude making multiple concurrent requests to storage at the same time. You can still say:

let promise1 = this.storage.get("foo");
let promise2 = this.storage.put("bar", 123);
await promise1;
frob();
await promise2;

Here, the get() and put() execute concurrently. Moreover, the call to frob() may execute before the put() has completed (but strictly after the get() completes, since we awaited that promise). However, no other event — such as receiving a new request — can unexpectedly happen in the meantime.

On the other hand, the rule protects you not just against concurrent incoming requests, but also concurrent responses to outgoing requests. For example, say you have:

async function task1() {
  await fetch("https://example.com/api1");
  return await this.getUniqueNumber();
}
async function task2() {
  await fetch("https://example.com/api2");
  return await this.getUniqueNumber();
}
let promise1 = task1();
let promise2 = task2();
let val1 = await promise1;
let val2 = await promise2;

This code launches two fetch() calls concurrently. After each fetch completes, getUniqueNumber() is invoked. Could the two calls interfere with each other?

No, they will not. The completion of a fetch() is itself a kind of event. Our rule states that such events cannot be delivered while storage events are in progress. When the first of the two fetches returns, the app calls getUniqueNumber(), which starts performing some storage operations. If the second fetch() also returns while these storage operations are still outstanding, that return will be deferred until after the storage operations are done. Once again, our code ends up correct!

At this point, the async programming experts in the audience are probably starting to feel like something is fishy here. Indeed, there is a catch. What if we do:

// Still a problem even with input gates.
let promise1 = getUniqueNumber();
let promise2 = getUniqueNumber();
let val1 = await promise1;
let val2 = await promise2;

In this case, there is, in fact, a problem. Two calls to getUniqueNumber() are initiated by the same event. The application does not await the first call before starting the second, so the two calls end up running concurrently. Our special rule doesn’t protect us here, because there is no incoming event that can be deferred between when the two calls are made. From the system’s point of view, there’s no way to distinguish this code from code which legitimately decided to perform two storage operations in parallel.

As such, in this case, the two calls to getUniqueNumber() will interfere with each other. However, this problem is far less likely to come about by accident, and is far easier to catch in testing. This bug is deterministic, not caused by the unpredictable timing of network events. We consider this an acceptable caveat in order to solve the larger problem posed by concurrent requests.

Part 2: Output Gates

Let’s go back to our in-memory caching example. Can we make it work?

// Can we make this "just work"?
async function getUniqueNumber() {
  if (this.val === undefined) {
    this.val = await this.storage.get("counter");
  }

  let result = this.val;
  ++this.val;
  this.storage.put("counter", this.val);
  return result;
}

With input gates (part 1), we’ve solved one of the two problems this code had: the race condition of initialization. We no longer need to worry that two requests will call this at the same time, leading this.val to be initialized twice.

However, the problem with not awaiting the put() is still there. If we don’t await it, then we could lose data. If we do await it, then the call is slow.

We make another new rule:

Output gates: When a storage write operation is in progress, any new outgoing network messages will be held back until the write has completed. We say that these messages are waiting for the "output gate" to open. If the write ultimately fails, the outgoing network messages will be discarded and replaced with errors, while the Durable Object will be shut down and restarted from scratch.

With this rule, we no longer have to await the result of put(). Our code can happily continue executing and just assume the put() will succeed. If the put() doesn’t succeed, then anything the application does here will never be observable to the rest of the world anyway. For example, if the app prematurely sends a response to the user saying that the operation succeeded, this response will not actually be delivered until after the put() completes successfully. So, by the time the user receives the message, it is no longer "premature"! In the very rare event that the write operation fails, the user will not receive the premature confirmation at all.

Note that output gates apply not only to responses sent back to a client, but also to new outgoing requests made with fetch() — those requests will be delayed from being sent until all prior writes are confirmed. So, once again, it is impossible for anything else in the world to observe a premature confirmation.
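
As a hypothetical illustration (the handler name and key are ours), a handler can now write without awaiting and respond immediately, and the output gate guarantees the client never observes that response unless the write actually lands:

// Illustrative sketch: the Response below is held by the output gate until
// the un-awaited put() is confirmed durable. If the write fails, the client
// receives an error instead of a premature "saved".
async function handleRequest(request) {
  this.storage.put("lastSeen", Date.now());  // not awaited
  return new Response("saved");
}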

With this change, our getUniqueNumber() implementation with in-memory caching is now fully correct, while retaining most of its speed advantage over the non-caching implementation. Except for the very first call, the application will never be blocked waiting for getUniqueNumber() to finish. The final response from the app to the client will be delayed pending write confirmation, but that write can be performed in parallel with any writes the application performs after getUniqueNumber() completes.

Part 3: Automatic in-memory caching

Our in-memory caching example now works great. But, it’s still a little bit complicated and unnatural to write. Let’s go back to our original, simple code one more time… can we make it fast by default?

// Can we make this not just work, but just work FAST?
async function getUniqueNumber() {
  let val = await this.storage.get("counter");
  await this.storage.put("counter", val + 1);
  return val;
}

The answer to this part is a classic one: we can add automatic caching to the storage layer, just like most operating systems do for disk storage.

We have rolled out an in-memory caching layer for Durable Objects. This layer keeps up to several megabytes worth of data directly in memory in the process where the object runs.

When a get() requests a key that is in cache, the operation returns immediately, without even context-switching out of the thread and isolate where the object is hosted. If the key is not in cache, then a storage request will still be needed, but reads complete relatively quickly.

Better yet, put() requests now always complete "instantaneously". A put() simply writes to cache. We rely on output gates ("part 2", above) to prevent the premature confirmation of writes to any external party. Writes will be coalesced (even if you await them), so that the output gate waits only for O(1) network round trips of latency, not O(n).
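
For example (a sketch of the behavior described above, with made-up keys), even a loop of awaited writes now completes against the cache and is flushed as a batch, so the eventual wait at the output gate is a constant number of round trips rather than one per key:

// Each await returns almost immediately because put() writes to the
// in-memory cache; the durable flush is coalesced behind the output gate.
for (let i = 0; i < 100; i++) {
  await this.storage.put(`key${i}`, i);
}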

Moreover, because get() and put() now complete instantly in most or all cases, the negative impact of input gates on throughput is largely mitigated, because the gate now spends relatively little time blocked.

With Durable Objects’ built-in caching, our simple code is now just as fast as the version that manually implemented in-memory caching. Combined with input and output gates, our code is now simple, fast, and correct, all at the same time.

Bonus Correctness

Our caching layer provides some bonus consistency guarantees, in addition to performance.

First, writes are automatically coalesced. That is, if you perform multiple put() or delete() operations without awaiting them or anything else in between, then the operations are automatically grouped together and stored atomically. In the case of a sudden power failure, after coming back up, either all of the writes will have been stored, or none of them will. For example:

// Move a value from "foo" to "bar".
let val = await this.storage.get("foo");

this.storage.delete("foo");
this.storage.put("bar", val);
// There's no possibility of data loss, because the delete() and the
// following put() are automatically coalesced into one atomic
// operation. This is true as long as you do not `await` anything
// in between.

Second, the API is also able to provide stronger ordering guarantees for reads. Previously, overlapping storage operations did not have guaranteed ordering. For example, if you issued a get() and a put() on the same key at the same time (without awaiting one before starting the other), then it was not deterministic whether the get() might return the value written by the put() — regardless of the ordering of the statements in your code. The caching layer fixes this. Now, operations are performed in exactly the order in which they were initiated, regardless of when they complete.
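
A short illustration of the new ordering guarantee (the key and values are hypothetical):

// The get() below is initiated before the put(), so it is now guaranteed to
// observe the old value; previously, this outcome was not deterministic.
let readPromise = this.storage.get("color");
this.storage.put("color", "blue");
let oldColor = await readPromise;  // resolves to the value from before the put()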

These two features eliminate more subtle bugs that might otherwise be hard to catch in testing, so that you don’t have to be a database expert to write code that works.

Optional Bypass

We expect gates and caching will be a win in the vast majority of use cases, but not always. In some use cases, concurrency won’t lead to any problems, and so blocking it may be a loss. Sometimes, the application is OK with prematurely confirming writes in order to minimize latency. And sometimes, caching may just waste memory because the same keys are not frequently accessed.

For those cases, we offer explicit bypasses:

this.storage.get("foo", {allowConcurrency: true, noCache: true});
this.storage.put("foo", "bar", {allowUnconfirmed: true, noCache: true});

Developers who have taken the time to think carefully about these issues can use these flags to tune performance to their specific needs. For those who don’t want to think about it, the defaults should work well.

Conclusion

Concurrency is hard. It doesn’t matter if you’re a novice or an expert: even experts regularly get it wrong. It’s difficult to think about all the ways that concurrent operations might overlap to corrupt your application state.

The traditional answer has been to make applications stateless, and defer all concurrency control to the database layer using transactions. However, transactions are slow, which is a big reason why so many web applications today take hundreds of milliseconds or more to respond to basic actions.

Durable Objects are all about state. By keeping state in memory in addition to on disk, and directing requests for the same data to be coordinated through the same instance, we can make applications much faster. But until recently, this was extremely tricky to get right.

With input gates, output gates, and caching, code written in the most intuitive way now "just works", and runs fast. This means you can focus on building your application, without wasting time optimizing I/O performance and debugging obscure race conditions.
