Bringing a unified developer experience to Cloudflare Workers and Pages

Post Syndicated from Nevi Shah original http://blog.cloudflare.com/pages-and-workers-are-converging-into-one-experience/

Today, we’re thrilled to announce that Pages and Workers will be joining forces into one singular product experience!

We’ve all been there. In a surge of creativity, you visualize in your head the application you want to build so clearly with the pieces all fitting together – maybe a server-side rendered frontend and an SQLite database for your backend. You head to your computer with the wheels spinning. You know you can build it, you just need the right tools. You log in to your Cloudflare dashboard, but then you’re faced with an incredibly difficult decision:

Cloudflare Workers or Pages?

Both seem so similar at a glance but also different in the details, so which one is going to make your idea become a reality? What if you choose the wrong one? What are the tradeoffs between the two? These are questions our users should never have to think about, but the reality is, they often do. Speaking with our wide community of users and customers, we hear it ourselves! Decision paralysis hits hard when choosing between Pages and Workers with both products made to build out serverless applications.

In short, we don’t want this for our users, especially when you’re on the verge of a great idea (no, a big idea). That’s why we’re excited to show off the first milestone towards bringing together the best of both beloved products, Workers and Pages, into one powerful development platform! This is the beginning of the journey towards a shared fate between the two products, so we wanted to take the opportunity to tell you why we are doing this, what you can use today, and what’s next.

More on the “why”

The relationship between Pages and Workers has always been intertwined. Up until today, we always looked at the two as siblings — each having their own distinct characteristics but both allowing their respective users to build rich and powerful applications. Each product targeted its own set of use cases.

Workers first started as a way to extend our CDN and then expanded into a highly configurable general purpose compute platform. Pages first started as a static web hosting product that expanded into Jamstack territory. Over time, Pages began acquiring more of Workers' powerful compute features, while Workers began adopting the rich developer features introduced by Pages. The lines between these two products blurred, making it difficult for our users to understand the differences and pick the right product for their application needs.

We know we can do better to help alleviate this decision paralysis and help you move fast throughout your development experience.

Cool, but what do you mean?

Instead of being forced to make tradeoffs between these two products, we want to bring you the best of both worlds: a single development platform that has both powerful compute and superfast static asset hosting – that seamlessly integrates with our portfolio of storage products like R2, Queues, D1, and others, and provides you with rich tooling like CI/CD, git-ops workflows, live previews, and flexible environment configurations.

All the details in one place

Today, a lot of our developers use both Pages and Workers to build pieces of their applications. However, they still live in separate parts of the Cloudflare dashboard and don’t always translate from one to the other, making it difficult to combine and keep track of your app’s stack. While we’re still vision-boarding the look and feel, we’re planning a world where users have the ability to manage all of their applications in one central place.

No more scrambling all over the dashboard to find the pieces of your application – you’ll have all the information you need about a project right at your fingertips.

Primitives

With Pages and Workers converging, we’ll also be redefining the concept of a “project”, introducing a new blank canvas of possibilities to plug and play. Within a project, you will be able to add (1) static assets, (2) serverless functions (Workers), (3) resources, or (4) any combination of these.

To unlock the full potential of your application, we’re exploring project capabilities that allow you to auto-provision and directly integrate with resources like KV, Durable Objects, R2 and D1. With the possibility of all of these primitives on a project, more importantly, you'll be able to safely perform rollbacks and previews, as we'll keep the versions of your assets, functions and resources in sync with every deployment. No need to worry about any of them becoming stale on your next deployment.

Deployments

One of Pages’ most notable qualities is its git-ops centered deployments. In our converged world, you’ll be able to optionally connect, build and deploy git repos that contain any combination of static assets, serverless functions and bindings to resources, as well as take advantage of the same high-performance CI system that exists in Pages today.

As with Pages, you will be able to preview deployments of your project with unique URLs protected by Cloudflare Access, available in your PRs or via a Wrangler command. Because we know that great ideas take lots of vetting before the big release, we’ll also have a first-class concept of environments to enable testing in different setups.

Local development

Arguably one of the most important parts to consider is our local development story in a post-converged world. This developer experience should be no different from how we’re converging the products. In the future, as you work with our Wrangler CLI, you can expect a unified and predictable set of commands to use on your project – e.g. a simple wrangler dev and wrangler deploy. Using a configuration file that applies to your entire project along with all of its components, you can have the confidence that your command will act on the entire project – not just pieces of it!

What are the benefits?

With Workers and Pages converging, we’re not just unlocking all the golden developer features of each product into one development platform. We’re bringing all the performance, cost and load benefits too. This includes:

  • Super low latency with globally distributed static assets and compute on our network, which is just 50ms away from 95% of the world’s Internet-connected population.
  • Free egress and also free static asset hosting.
  • Standards-based JavaScript runtime with seamless compatibility across the packages and libraries you're already familiar with.

Seamless migrations for all

If you’re already a Pages or Workers user and are starting to get nervous about what this means for your existing projects – never fear. As we build out this merged architecture, seamless migration is our top priority and the North Star for every step on the way to a unified development platform. Existing projects on both Pages and Workers will continue to work without users needing to lift a finger. Instead, you'll see more and more features become available to enrich your existing projects and workflows, regardless of the product you started with.

What’s new today?

We’ll be working over the next year to converge Pages and Workers into one singular experience, blending not only the products themselves but also our product, engineering and design teams behind the scenes.

While we can’t wait to welcome you to the new converged world, this change unfortunately won’t happen overnight. We’re planning to hit some big but incremental milestones over the next few quarters to ensure a smooth transition into convergence, and this Developer Week, we’re excited to take our first step toward convergence. In the dashboard, things might feel a bit different!

Get started together

We’re combining the onboarding experience for Pages and Workers into one flow, so you’ll notice some changes on our dashboard when you’re creating a project. We’re slowly bringing the two products closer together by unifying the creation flow, letting you create either a Pages project or a Worker from one screen.

Go faster with templates

We understand the classic developer urge to immediately get your hands dirty and hit the ground running on your big vision. We’re making it easier than ever to go from an idea to an application that’s live on the Cloudflare network. In a couple of clicks, you can deploy a starter template, ranging from a simple Hello World Worker to a ChatGPT plugin. In the future, we plan to offer Pages templates in our dashboard, allowing you to automatically create a new repo and deploy starter full-stack apps with a couple of clicks.

Your favorite full stack frameworks at your fingertips

We're not stopping with static templates or our dashboard either. Bringing the framework of your choice doesn't mean you have to leave behind the tools you already know and love. If you’re itching to see just what we mean when we say “deploy with your favorite full-stack framework” or “check out the power of Workers”, simply execute:

npm create cloudflare@latest

from your terminal and enjoy the ride! This new CLI experience integrates with the CLIs of some of our first-class, solidly supported full-stack frameworks like Angular, Next, Qwik and Remix, giving you full control of how you create new projects. From this tool you can also deploy a variety of Workers using our powerful starter templates, with a wizard-like experience.

One singular place to find all of your applications

We’re taking one step closer to a unified experience by merging the Pages and Workers project list dashboards together. Once you’ve deployed your application, you’ll notice all of your Pages and Workers on one page, so you don’t have to navigate to different parts of your dashboard. Track your usage analytics for Workers / Pages Functions in one spot. In the future, these cards won’t be identifiable as Pages and Workers – just “projects” with a combination of assets, functions and resources!

What’s next?

As we begin executing, you’ll notice the two products slowly becoming more and more similar as we unlock features for each platform, such as git integration for your Workers and a config file for your Pages projects, until they’re ready to be one!

Keep an eye out on Twitter to hear about the newest capabilities and more on what’s to come in every milestone.

Have thoughts?

Of course, we wouldn’t be able to build an amazing platform without first listening to the voice of our community. In fact, we’ve put together a survey to collect more information about our users and receive input on what you’d like to see. If you have a few minutes, you can fill it out or reach out to us on the Cloudflare Developers Discord or Twitter @CloudflareDev.

Modernizing the toolbox for Cloudflare Pages builds

Post Syndicated from Greg Brimble original http://blog.cloudflare.com/moderizing-cloudflare-pages-builds-toolbox/

Cloudflare Pages launched over two years ago in December 2020, and since then, we have grown Pages to build millions of deployments for developers. In May 2022, to support developers with more complex requirements, we opened up Pages to empower developers to create deployments using their own build environments — but that wasn't the end of our journey. Ultimately, we want to be able to allow anyone to use our build platform and take advantage of the git integration we offer. You should be able to connect your repository and have it just work on Cloudflare Pages.

Today, we're introducing a new beta version of our build system (a.k.a. "build image") which brings the default set of tools and languages up-to-date, and sets the stage for future improvements to builds on Cloudflare Pages. We now support the latest versions of Node.js, Python, Hugo and many more, putting you on the best path for any new projects that you undertake. Existing projects will continue to use the current build system, but everyone will be able to opt in to this upgrade.

New defaults, new possibilities

The Cloudflare Pages build system has been updated to not only support new versions of your favorite languages and tools, but to also include new versions by default. The versions of 2020 are no longer relevant for the majority of today's projects, and as such, we're bumping these to their more modern equivalents:

  • Node.js' default is being increased from 12.18.0 to 18.16.0,
  • Python 2.7.18 and 3.10.5 are both now available by default,
  • Ruby's default is being increased from 2.7.1 to 3.2.2,
  • Yarn's default is being increased from 1.22.4 to 3.5.1,
  • And we're adding pnpm with a default version of 8.2.0.

These are just some of the headlines — check out our documentation for the full list of changes.

We're aware that these new defaults constitute a breaking change for anyone using a project without pinning their versions with an environment variable or version file. That's why we're making this new build system opt-in for existing projects. You'll be able to stay on the existing system without breaking your builds. If you do decide to adventure with us, we make it easy to test out the new system in your preview environments before rolling out to production.
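
To illustrate why unpinned projects are the ones affected, here is a minimal sketch of how a tool version might be resolved. The defaults are the ones quoted in this post; the `v1`/`v2` labels, the `resolveVersion` function, and the precedence order (environment variable, then version file, then default) are assumptions made for illustration, not a description of the actual build system.

```typescript
// Hypothetical labels for the existing ("v1") and new beta ("v2") build systems.
type BuildSystem = "v1" | "v2";

// Default versions as quoted in this post; pnpm has no default on the old system.
const DEFAULTS: Record<BuildSystem, Record<string, string>> = {
  v1: { node: "12.18.0", ruby: "2.7.1", yarn: "1.22.4" },
  v2: { node: "18.16.0", ruby: "3.2.2", yarn: "3.5.1", pnpm: "8.2.0" },
};

interface PinSources {
  envVar?: string;      // value of a version environment variable, if set
  versionFile?: string; // contents of a version file in the repository, if present
}

// Assumed precedence: environment variable, then version file, then the default.
function resolveVersion(
  tool: string,
  system: BuildSystem,
  pins: PinSources = {},
): string | undefined {
  const fromEnv = pins.envVar?.trim();
  if (fromEnv) return fromEnv;
  const fromFile = pins.versionFile?.trim();
  if (fromFile) return fromFile;
  return DEFAULTS[system][tool];
}

// An unpinned project changes Node.js major version when it opts in...
console.log(resolveVersion("node", "v1")); // 12.18.0
console.log(resolveVersion("node", "v2")); // 18.16.0
// ...while a pinned project resolves to the same version on both systems.
console.log(resolveVersion("node", "v1", { versionFile: "16.20.0\n" })); // 16.20.0
console.log(resolveVersion("node", "v2", { versionFile: "16.20.0\n" })); // 16.20.0
```

Under these assumptions, pinning a version before opting in is what keeps a build's toolchain stable across the switch.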

Additionally, we're now making your builds more reproducible by taking advantage of lockfiles with many package managers. npm ci and yarn --pure-lockfile are now used ahead of your build command in this new version of the build system.
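
As a rough sketch of the idea, the install step can be chosen from whichever lockfile a repository contains. The two commands come from this post and the lockfile names are the standard ones for npm and Yarn; the `installCommand` function and the choice to check for `yarn.lock` first are hypothetical.

```typescript
// Pick a lockfile-respecting install command from the files at the repository root.
// Returns null when no known lockfile is present.
function installCommand(files: string[]): string | null {
  const present = new Set(files);
  // Assumption: if both lockfiles exist, Yarn takes precedence.
  if (present.has("yarn.lock")) return "yarn --pure-lockfile";
  if (present.has("package-lock.json")) return "npm ci";
  return null;
}

console.log(installCommand(["package.json", "package-lock.json"])); // npm ci
console.log(installCommand(["package.json", "yarn.lock"]));         // yarn --pure-lockfile
console.log(installCommand(["package.json"]));                      // null
```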

For new projects, these updated defaults and added support for pnpm and Yarn 3 mean that more projects will just work immediately without any undue setup, tweaking, or configuration. Today, we're launching this update as a beta, but we will be quickly promoting it to general availability once we're satisfied with its stability. Once it does graduate, new projects will use this updated build system by default.

We know that this update has been a long-standing request from our users (we thank you for your patience!) but part of this rollout is ensuring that we are now in a better position to make regular updates to Cloudflare Pages' build system. You can expect these default languages and tools to now keep pace with the rapid rate of change seen in the world of web development.

We very much welcome your continued feedback as we know that new tools can quickly appear on the scene, and old ones can just as quickly drop off. As ever, our Discord server is the best place to engage with the community and Pages team. We’re excited to hear your thoughts and suggestions.

Our modular and scalable architecture

Powering this updated build system is a new architecture that we've been working on behind-the-scenes. We're no strangers to sweeping changes of our build infrastructure: we've done a lot of work to grow and scale our infrastructure. Moving beyond purely static site hosting with Pages Functions brought a new wave of users, and as we explore convergence with Workers, we expect even more developers to rely on our git integrations and CI builds. Our new architecture is being rolled out without any changes affecting users, so unless you're interested in the technical nitty-gritty, feel free to stop reading!

The biggest change we're making with our architecture is its modularity. Previously, we were using Kubernetes to run a monolithic container which was responsible for everything for the build. Within the same image, we'd stream our build logs, clone the git repository, install any custom versions of languages and tools, install a project's dependencies, run the user's build command, and upload all the assets of the build. This was a lot of work for one container! It meant that our system tooling had to be compatible with versions in the user's space and therefore new default versions were a massive change to make. This is a big part of why it took us so long to be able to update the build system for our users.

In the new architecture, we've broken these steps down into multiple separate containers. We make use of Kubernetes' init containers feature and instead of one monolithic container, we have three that execute sequentially:

  1. clone a user's git repository,
  2. install any custom versions of languages and tools, install a project's dependencies, run the user's build command, and
  3. upload all the assets of a build.

We use a shared volume to give the build a persistent workspace to use between containers, but now there is clear isolation between system stages (cloning a repository and uploading assets) and user stages (running code that the user is responsible for). We no longer need to worry about conflicting versions, and we've created an additional layer of security by isolating a user's control to a separate environment.
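
The staged design can be modeled in a few lines. This is only a simulation under stated assumptions: a `Map` stands in for the shared volume, plain functions stand in for the three containers, and all names (`Stage`, `runBuild`, the file paths) are hypothetical.

```typescript
// A stage is either owned by the platform ("system") or runs user-controlled code ("user").
type StageKind = "system" | "user";

interface Stage {
  name: string;
  kind: StageKind;
  run(workspace: Map<string, string>): void;
}

const stages: Stage[] = [
  {
    name: "clone",
    kind: "system",
    run: (ws) => {
      ws.set("repo/index.md", "# Hello");
    },
  },
  {
    name: "build",
    kind: "user",
    run: (ws) => {
      // The user stage only sees what earlier stages left in the workspace.
      const src = ws.get("repo/index.md") ?? "";
      ws.set("dist/index.html", `<h1>${src.replace(/^# /, "")}</h1>`);
    },
  },
  {
    name: "upload",
    kind: "system",
    run: (ws) => {
      const assets = Array.from(ws.keys()).filter((k) => k.startsWith("dist/"));
      ws.set("uploaded", assets.join(","));
    },
  },
];

// Run the stages strictly in order against one shared workspace.
function runBuild(pipeline: Stage[]): { workspace: Map<string, string>; order: string[] } {
  const workspace = new Map<string, string>(); // stands in for the shared volume
  const order: string[] = [];
  for (const stage of pipeline) {
    stage.run(workspace);
    order.push(`${stage.kind}:${stage.name}`);
  }
  return { workspace, order };
}

const result = runBuild(stages);
console.log(result.order.join(" -> ")); // system:clone -> user:build -> system:upload
console.log(result.workspace.get("uploaded")); // dist/index.html
```

Because each stage only communicates through the workspace, the tooling used by the system stages never has to share an environment with the versions the user stage installs.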

We're also aligning the final stage, the one responsible for uploading static assets, with the same APIs that Wrangler uses for Direct Upload projects. This reduces our maintenance burden going forward since we'll only need to consider one way of uploading assets and creating deployments. As we consolidate, we're exploring ways to make these APIs even faster and more reliable.

Logging out

You might have noticed that we haven't yet talked about how we're continuing to stream build logs. Arguably, this was one of the most challenging pieces to work out. When everything ran in a single container, we were able to simply latch directly into the stdout of our various stages and pipe them through to a Durable Object which could communicate with the Cloudflare dashboard.

By introducing this new isolation between containers, we had to get a bit more inventive. After prototyping a number of approaches, we've found one that we like. We run a separate, global log collector container inside Kubernetes which is responsible for collating logs from a build, and passing them through to that same Durable Object infrastructure. The one caveat is that the logs now need to be annotated with which build they are coming from, since one global log collector container accepts logs from multiple builds. A Worker in front of the Durable Object is responsible for reading the annotation and delegating to the relevant build's Durable Object instance.
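
The routing step can be sketched as follows. This is an in-memory model rather than real Worker or Durable Object code: `BuildLogRouter` plays the role of the Worker, each per-build array plays the role of a Durable Object instance, and the `buildId` field name for the annotation is an assumption.

```typescript
// A log line as emitted by the global collector, annotated with its build.
interface AnnotatedLog {
  buildId: string;
  line: string;
}

// Reads the annotation and delegates each line to that build's own log stream.
class BuildLogRouter {
  private readonly perBuild = new Map<string, string[]>();

  route(entry: AnnotatedLog): void {
    const lines = this.perBuild.get(entry.buildId) ?? [];
    lines.push(entry.line);
    this.perBuild.set(entry.buildId, lines);
  }

  logsFor(buildId: string): string[] {
    return this.perBuild.get(buildId) ?? [];
  }
}

// Logs from two builds arrive interleaved at the single collector.
const router = new BuildLogRouter();
const incoming: AnnotatedLog[] = [
  { buildId: "build-a", line: "Cloning repository" },
  { buildId: "build-b", line: "Cloning repository" },
  { buildId: "build-a", line: "Installing dependencies" },
  { buildId: "build-b", line: "Build failed" },
];
for (const entry of incoming) router.route(entry);

console.log(router.logsFor("build-a")); // [ 'Cloning repository', 'Installing dependencies' ]
console.log(router.logsFor("build-b")); // [ 'Cloning repository', 'Build failed' ]
```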

Caching in

With this new modular architecture, we plan to integrate a feature we've been teasing for a while: build caching. Today, when you run a build in Cloudflare Pages, we start fresh every time. This works, but it's inefficient.

Very often, only small changes are actually made to your website between deployments: you might tweak some text on your homepage, or add a new blog post; but rarely does the core foundation of your site actually change between deployments. With build caching, we can reuse some of the work from earlier builds to speed up subsequent builds. We'll offer a best-effort storage mechanism that allows you to persist and restore files between builds. You'll soon be able to cache dependencies, as well as the build output itself if your framework supports it, resulting in considerably faster builds and a tighter feedback loop from push to deploy.

This is possible because our new modular design has clear divides between the stages where we'd want to restore and cache files.
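
Since build caching has not shipped yet, the following is only a sketch of the general technique: restore before the user stage, save after it. Keying the cache on a hash of the lockfile is an assumption, as are the `BuildCache` class and `build` function; the hash is a small FNV-1a-style function over UTF-16 code units, chosen so the example needs no imports.

```typescript
// FNV-1a-style non-cryptographic hash, used here only to derive a cache key.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

type Files = Map<string, string>;

// Best-effort storage: restore() may miss, and callers must cope with that.
class BuildCache {
  private readonly store = new Map<string, Files>();

  restore(key: string): Files | undefined {
    const hit = this.store.get(key);
    return hit ? new Map(hit) : undefined;
  }

  save(key: string, files: Files): void {
    this.store.set(key, new Map(files));
  }
}

// Returns whether dependencies had to be installed from scratch.
function build(lockfile: string, cache: BuildCache): { installed: boolean } {
  const key = `deps-${fnv1a(lockfile)}`;
  if (cache.restore(key)) {
    return { installed: false }; // cache hit: skip the install step
  }
  // Cache miss: pretend to install, then persist the result for the next build.
  cache.save(key, new Map([["node_modules/.marker", "installed"]]));
  return { installed: true };
}

const cache = new BuildCache();
console.log(build("left-pad@1.3.0", cache).installed); // true  (first build)
console.log(build("left-pad@1.3.0", cache).installed); // false (unchanged lockfile)
console.log(build("left-pad@1.3.1", cache).installed); // true  (lockfile changed)
```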

Start building

We're excited about the improvements that this new modular architecture will afford the Pages team, but we're even more excited for how this will result in faster and more scalable builds for our users. This architecture transition is rolling out behind the scenes, but the updated beta build system with new languages and tools is available to try today. Navigate to your Pages project settings in the Cloudflare Dashboard to opt in.

Let us know if you have any feedback on the Discord server, and stay tuned for more information about build caching in upcoming posts on this blog. Later today (Wednesday, May 17th, 2023), the Pages team will be hosting a Q&A session to talk about this announcement on Discord at 17:30 UTC.

Making Cloudflare the best place for your web applications

Post Syndicated from Igor Minar original http://blog.cloudflare.com/making-cloudflare-for-web/

Hey web developers! We are about to shake things up a bit here at Cloudflare and wanted to give you a heads-up, so that you know what we are doing and where we are going. You might know Cloudflare as one of the best places to come to when you need to protect, speed up, or scale your web application, but increasingly Cloudflare is also becoming the best place to deploy and run your application!

Why deploy your application to Cloudflare? Two simple reasons. First, it removes lots of hassle of managing many separate systems and allows you to develop, deploy, monitor, and tune your application all in one place. Second, by deploying to Cloudflare directly, there is so much more we can do to optimize your application and get it to the hands, ears, or eyes of your users more quickly and smoothly.

So what’s changing? Quite a bit, actually. I’m not going to bore you with rehashing all the details as my most-awesome colleagues have written separate blog posts with all the details, but here is a high level rundown.

Cloudflare Workers + Pages = awesome development platform

Cloudflare Pages and Workers are merging into a single unified development and application hosting platform that offers:

  • Super low latency globally: your static assets and compute are less than 50ms away from 95% of the world’s Internet-connected population.
  • Free egress including free static asset hosting.
  • Standards-based JavaScript and WASM runtime that already serves over 10 million requests per second at peak globally.
  • Access to powerful features like R2 (object storage with an S3-compatible API), low-latency globally replicated KV storage, Queues, D1 database, and many more.
  • Support for GitOps and CI/CD workflows and preview environments to boost development velocity.
  • … and so much more.

While mathematically proven to be wrong, we stubbornly believe that 1+1=3, and in this case this translates to Cloudflare Pages + Workers = way more than the sum of the parts. In fact, it’s an awesome foundation for a one-of-a-kind development platform that we are thrilled to be building for you.

We started this product convergence journey a few quarters ago, and early on agreed upon not leaving any of the existing applications behind. Instead, we’ll be bringing them over to this new world. Today we are ready to start sharing the incremental results, with so much more to come over the upcoming quarters. Want to know more? My colleague Nevi posted lots of spicy details in her blog post.

Smart Placement for Workers takes us beyond the edge!

Smart placement is, to put it simply, revolutionary for Cloudflare. It enables a new compute paradigm on our platform, unmatched by any other application hosting providers today. Do you have a typical full-stack application built with one of the many popular web frameworks? This feature is for you! And it works with both Workers and Pages!

Previously, we always executed all applications at the “edge” of our global network, meaning as close to the user as possible. With smart placement, we intelligently determine the best location within our network where the compute (your application) should run. We do this by observing your application’s behavior and what other network resources or endpoints the application interacts with. We then transparently spawn your application at an optimal location, usually close to where your data is stored, and route the incoming requests via our network to this location.

Smart placement enables applications to run near the data they need to get stuff done. This is especially powerful for applications that interact with databases, object stores, or other backend endpoints, particularly if these are centralized and not globally distributed.

Your users’ or clients’ requests still enter our lightning fast network in one of our 285+ datacenters in the world, close to their current location, but instead of spawning the application right there, we route the request to the optimal datacenter, the one that is near the data or backend system the application talks to.

This doesn’t mean that compute at the edge is not cool anymore! It is! There are still many use-cases where running your application at the edge makes sense, and smart placement will determine this scenario and keep the application at the edge if that’s the right place for it to be. A/B testing, localization, asset serving, and others are use-cases that should almost always happen at the edge.
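
To make the trade-off concrete, here is a back-of-the-envelope sketch. It is not Cloudflare's placement algorithm: it assumes a request costs one user round trip plus a number of sequential backend round trips, and all names and latency figures are invented for illustration.

```typescript
interface Candidate {
  name: string;
  userRttMs: number;    // round trip between the user and this location
  backendRttMs: number; // round trip between this location and the backend
}

// Network time for one request that makes `backendCalls` sequential backend round trips.
function totalLatencyMs(c: Candidate, backendCalls: number): number {
  return c.userRttMs + backendCalls * c.backendRttMs;
}

// Pick the candidate with the lowest total; expects a non-empty list.
function bestLocation(candidates: Candidate[], backendCalls: number): Candidate {
  return candidates.reduce((best, c) =>
    totalLatencyMs(c, backendCalls) < totalLatencyMs(best, backendCalls) ? c : best,
  );
}

const candidates: Candidate[] = [
  { name: "edge-near-user", userRttMs: 10, backendRttMs: 120 },
  { name: "near-database", userRttMs: 130, backendRttMs: 2 },
];

// No backend calls (e.g. asset serving): 10ms vs 130ms, so the edge wins.
console.log(bestLocation(candidates, 0).name); // edge-near-user
// Three database queries per request: 370ms vs 136ms, so running near the data wins.
console.log(bestLocation(candidates, 3).name); // near-database
```

The more round trips an application makes to a centralized backend, the more it gains from running next to that backend instead of next to the user.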

Sounds interesting? Check out this visual demo and read up on Smart Placement in a blog post from my colleague Tanushree to get started.

Develop locally or in the browser!

We continue to deliver on our goal to build the best development environment integrated directly into our lightning fast and globally distributed application platform. We’re launching Wrangler v3, with complete support for a local-by-default development workflow. Powered by workerd, the open-source Cloudflare Workers JavaScript runtime, this change reduces development server startup time by 10x and script reload times by 60x, boosting your productivity and keeping you in the flow longer.

In the dashboard, we're introducing an upgraded and far more powerful online editor powered by VSCode – you can now finally edit multiple JavaScript modules in your browser, get an accurate edge preview of your code, friendly error pages, and type checking!

Finally, in both our dashboard editor and Wrangler, we've updated our workerd-customized Chrome DevTools to the latest version, providing even greater debugging and profiling capabilities, wherever you choose to work.

This is just the first wave of improvements to our development tooling, and you’ll see us iterating in this space over the next few quarters. In the meantime, check out in-depth posts from Adam, Brendan, and Samuel with all the Wrangler v3 details and VSCode and dash editor improvements.

Increased memory, CPU, and application size limits and simplified pricing!

In the age of AI, WASM, and powerful full-stack applications, we’ve noticed that developers are hitting our current resource limits with increased frequency. We want to be a place where these applications thrive and developers are empowered to build bigger and more sophisticated applications. Therefore, within the next week we’ll be increasing the application size limit (JavaScript/WASM bundle size) to 10MB (after gzip) and the startup latency limit (script compile time) from 200ms to 400ms.

To further empower developers, we’re thinking about how to unify and simplify our billing model to make our pricing more straightforward, and increase limits such as memory limits by introducing tiers. Stay tuned for more information on these!

With these changes developers can build cooler apps and operate them for less! Cool, right?!?

Pages CI now with a modern build image!

The wait is finally over! Pages now uses a modern build image to power the CI and integrated build system. With this improvement you can finally use recent versions of Node.js, pnpm, and many other tools used by developers today.

While delivering this improvement, we made it much easier for us to keep things up to date in the future, but also unlocked new features like build caching!

The updates are available to all new projects by default, while existing projects can opt in to newer defaults. Sounds like your cup of coffee? Read on in this blog post by Greg.

Enough already, let’s get started! …with your framework of choice and C3!

In addition to being a CDN and a place to deploy your Worker applications, Cloudflare is now also becoming the best place to run your full-stack web applications. This includes all full-stack web frameworks like Angular, Astro, Next, Nuxt, Qwik, Remix, Solid, Svelte, Vue, and others.

Our overall mission is to help build a better Internet, and my team’s contribution to this mission is to enable developers, but really just about anyone, to go from an idea to a deployed application in no time.

To enable developers to turn their ideas into deployed applications quickly and without any hassle we’ve built two things.

First, we partnered with many web framework authors to build new or improve existing adapters for all the popular JavaScript web frameworks. These adapters ensure that your application runs on our platform in the most efficient way, while having access to all the capabilities and features of our platform.

These adapters include the highly requested Next.js adapter, which we’ve just overhauled to be production-ready and are launching as 1.0.0 today! In partnership with the respective teams, we’ve built brand-new adapters for Angular and Qwik, while improving those for Astro, Nuxt, Solid, and a few others.

Second, we developed a brand new sassy CLI we call C3 (short for create-cloudflare CLI), a sibling to our existing Wrangler CLI. If you are a developer who lives your life in the terminal or local editors like VSCode, then this CLI is your single entry-point to the Cloudflare universe.

Run the C3 command, and we’ll get you started. You pick your framework of choice, and we hand control over to the CLI of the chosen framework, as we don’t want to stand between you and the hard-working framework authors who craft the experience for their framework. A minute or so later, once all npm dependencies are installed, you get a URL from us with your application deployed. That’s it. From an idea to a URL that you can share with friends almost instantly! Boom.

The best place for your web applications

So to recap: our first-class support for full-stack web frameworks, the low latency and cost-effectiveness of our platform, smart placement that automagically runs the backend of a full-stack web application in the optimal location, and all the remaining significant improvements in our developer tooling together make Cloudflare THE best place to build and host web applications. This is our contribution to our mission to build a better Internet and push the Web forward.

We aspire to be the place people turn to when they want to get business done, or when they just want to be creative, explore ideas and have fun. It’s a long journey, and we’ve got a lot of interesting challenges ahead of us. Your input will be critical in guiding us. We are all thrilled to have the opportunity to be part of it and give it our best shot. You can join this journey too, and get started today:

npm create cloudflare my-first-app

Improved local development with wrangler and workerd, Developer Week

Post Syndicated from Brendan Coll original http://blog.cloudflare.com/wrangler3/

For over a year now, we’ve been working to improve the Workers local development experience. Our goal has been to improve parity between users' local and production environments. This is important because it provides developers with a fully-controllable and easy-to-debug local testing environment, which leads to increased developer efficiency and confidence.

To start, we integrated Miniflare, a fully-local simulator for Workers, directly into Wrangler, the Workers CLI. This allowed users to develop locally with Wrangler by running wrangler dev --local. Compared to the wrangler dev default, which relied on remote resources, this represented a significant step forward in local development. As good as it was, it couldn’t leverage the actual Workers runtime, which led to some inconsistencies and behavior mismatches.

Last November, we announced the experimental version of Miniflare v3, powered by the newly open-sourced workerd runtime, the same runtime used by Cloudflare Workers. Since then, we’ve continued to improve upon that experience both in terms of accuracy with the real runtime and in cross-platform compatibility.

As a result of all this work, we are proud to announce the release of Wrangler v3 – the first version of Wrangler with local-by-default development.

A new default for Wrangler

Starting with Wrangler v3, running wrangler dev uses Miniflare v3 to run your Worker locally. This local development environment is effectively as accurate as a production Workers environment, letting you test every aspect of your application before deploying. It provides the same runtime and bindings, but has its own simulators for KV, R2, D1, Cache and Queues. Because you’re running everything on your machine, you won’t be billed for operations on KV namespaces or R2 buckets during development, and you can try out paid features like Durable Objects for free.

In addition to a more accurate developer experience, you should notice better performance. Compared to remote mode, we’re seeing a 10x reduction in startup times and a 60x reduction in script reload times with the new local-first implementation. This massive reduction in reload times drastically improves developer velocity!

Remote development isn’t going anywhere. We recognise many developers still prefer to test against real data, or want to test Cloudflare services like image resizing that aren’t implemented locally yet. To run wrangler dev on Cloudflare’s network, just like previous versions, use the new --remote flag.

Deprecating Miniflare v2

If you use Miniflare directly, there are two things to know when updating from v2 to v3. First, if you’ve been using Miniflare’s CLI, you’ll need to switch to wrangler dev, because Miniflare v3 no longer includes a CLI. Second, if you’re using Miniflare’s API, upgrade to miniflare@3 and follow the migration guide.

How we built Miniflare v3

Miniflare v3 is now built using workerd, the open-source Cloudflare Workers runtime. As workerd is a server-first runtime, every configuration defines at least one socket to listen on. Each socket is configured with a service, which can be an external server, a disk directory or, most importantly for us, a Worker! To start a workerd server running a Worker, create a worker.capnp file as shown below, run npx workerd serve worker.capnp and visit http://localhost:8080 in your browser:

using Workerd = import "/workerd/workerd.capnp";


const helloConfig :Workerd.Config = (
 services = [
   ( name = "hello-worker", worker = .helloWorker )
 ],
 sockets = [
   ( name = "hello-socket", address = "*:8080", http = (), service = "hello-worker" )
 ]
);


const helloWorker :Workerd.Worker = (
 modules = [
   ( name = "worker.mjs",
     esModule =
       `export default {
       `  async fetch(request, env, ctx) {
       `    return new Response("Hello from workerd! 👋");
       `  }
       `}
   )
 ],
 compatibilityDate = "2023-04-04",
);

If you’re interested in what else workerd can do, check out the other samples. Whilst workerd provides the runtime and bindings, it doesn’t provide the underlying implementations for the other products in the Developer Platform. This is where Miniflare comes in! It provides simulators for KV, R2, D1, Queues and the Cache API.

[Diagram: Miniflare v3 architecture]

Building a flexible storage system

As you can see from the diagram above, most of Miniflare’s job is now providing different interfaces for data storage. In Miniflare v2, we used a custom key-value store to back these, but this had a few limitations. For Miniflare v3, we’re now using the industry-standard SQLite, with a separate blob store for KV values, R2 objects, and cached responses. Using SQLite gives us much more flexibility in the queries we can run, allowing us to support future unreleased storage solutions. 👀

A separate blob store allows us to provide efficient, ranged, streamed access to data. Blobs have unguessable identifiers, can be deleted, but are otherwise immutable. These properties make it possible to perform atomic updates with the SQLite database. No other operations can interact with the blob until it's committed to SQLite, because the ID is not guessable, and we don't allow listing blobs. For more details on the rationale behind this, check out the original GitHub discussion.
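The write path described above can be sketched in a few lines. This is a minimal illustration in Python, not Miniflare's actual implementation: the `BlobStore` class, the `entries` table and the `put`/`get` helpers are hypothetical names chosen for this sketch. It shows a blob being written under a random ID first and only becoming reachable once that ID is committed to SQLite.

```python
import os
import secrets
import sqlite3
import tempfile


class BlobStore:
    """Immutable blobs on disk, addressed by unguessable random IDs (hypothetical)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        blob_id = secrets.token_hex(32)  # 256 random bits, so the ID is not guessable
        # Mode "xb" fails if the file already exists: blobs are never overwritten.
        with open(os.path.join(self.root, blob_id), "xb") as f:
            f.write(data)
        return blob_id

    def get(self, blob_id: str, start: int = 0, length: int = -1) -> bytes:
        # Ranged read: seek to the offset and read only the requested bytes.
        with open(os.path.join(self.root, blob_id), "rb") as f:
            f.seek(start)
            return f.read(length)

    def delete(self, blob_id: str) -> None:
        os.remove(os.path.join(self.root, blob_id))


def put(db, blobs, key, value):
    """Write the blob first, then publish its ID in a single SQLite transaction."""
    new_id = blobs.put(value)  # not reachable yet: no reader knows this ID
    row = db.execute("SELECT blob_id FROM entries WHERE key = ?", (key,)).fetchone()
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO entries (key, blob_id) VALUES (?, ?)",
            (key, new_id),
        )
    if row is not None:
        blobs.delete(row[0])  # the old blob is no longer referenced


def get(db, blobs, key, start=0, length=-1):
    row = db.execute("SELECT blob_id FROM entries WHERE key = ?", (key,)).fetchone()
    return None if row is None else blobs.get(row[0], start, length)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, blob_id TEXT NOT NULL)")
blobs = BlobStore(tempfile.mkdtemp())

put(db, blobs, "greeting", b"hello world")
put(db, blobs, "greeting", b"hello workerd")  # replaces the value atomically
```

Because readers only ever learn a blob ID from a committed row, a half-written or orphaned blob is never observable in this sketch; at worst it wastes disk space until it is cleaned up.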

Running unit tests inside Workers

One of Miniflare’s primary goals is to provide a great local testing experience. Miniflare v2 provided custom environments for popular Node.js testing frameworks that allowed you to run your tests inside the Miniflare sandbox. This meant you could import and call any function using Workers runtime APIs in your tests. You weren’t restricted to integration tests that just send and receive HTTP requests. In addition, these environments provide per-test isolated storage, automatically undoing any changes made at the end of each test.
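To make the idea of per-test isolated storage concrete, here is a minimal sketch in Python. It is an illustration only and assumes nothing about how Miniflare implements it: `IsolatedStorage` and its `isolated()` helper are hypothetical names, and a plain dictionary stands in for real storage.

```python
from contextlib import contextmanager


class IsolatedStorage:
    """A toy key-value store whose state can be snapshotted and restored."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    @contextmanager
    def isolated(self):
        snapshot = dict(self._data)  # shallow copy taken before the test body runs
        try:
            yield self
        finally:
            self._data = snapshot  # discard every write made inside the block


storage = IsolatedStorage()
storage.put("seed", "shared fixture")

with storage.isolated() as s:
    s.put("temp", "only visible inside this test")
    inside = s.get("temp")

after = storage.get("temp")  # the write was undone when the block exited
```

A test framework integration would wrap each test in the equivalent of `isolated()`, so tests can write freely without affecting each other.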

In Miniflare v2, these environments were relatively simple to implement. We’d already reimplemented Workers Runtime APIs in a Node.js environment, and could inject them using Jest and Vitest’s APIs into the global scope.

For Miniflare v3, this is much trickier. The runtime APIs are implemented in a separate workerd process, and you can’t reference JavaScript classes across a process boundary. So we needed a new approach…

Many test frameworks like Vitest use Node’s built-in worker_threads module for running tests in parallel. This module spawns new operating system threads running Node.js and provides a MessageChannel interface for communicating between them. What if instead of spawning a new OS thread, we spawned a new workerd process, and used WebSockets for communication between the Node.js host process and the workerd “thread”?

We have a proof of concept using Vitest showing this approach can work in practice. Existing Vitest IDE integrations and the Vitest UI continue to work without any additional work. We aren’t quite ready to release this yet, but will be working on improving it over the next few months. Importantly, the workerd “thread” needs access to Node.js built-in modules, which we recently started rolling out support for.

Running on every platform

We want developers to have this great local testing experience, regardless of which operating system they’re using. Before it was open-sourced, the Cloudflare Workers runtime was designed to run only on Linux. For Miniflare v3, we needed to add support for macOS and Windows too. macOS and Linux are both Unix-based, making porting between them relatively straightforward. Windows on the other hand is an entirely different beast… 😬

The workerd runtime uses KJ, an alternative C++ base library, which is already cross-platform. We’d also migrated to the Bazel build system, which has good Windows support, in preparation for open-sourcing the runtime. When compiling our C++ code for Windows, we use LLVM's MSVC-compatible compiler driver clang-cl, as opposed to using Microsoft’s Visual C++ compiler directly. This enables us to use the "same" compiler frontend on Linux, macOS, and Windows, massively reducing the effort required to compile workerd on Windows. Notably, this provides proper support for #pragma once when using symlinked virtual includes produced by Bazel, __atomic_* functions, a standards-compliant preprocessor, GNU statement expressions used by some KJ macros, and understanding of the .c++ extension by default. After switching out Unix API calls for their Windows equivalents using #if _WIN32 preprocessor directives, and fixing a bunch of segmentation faults caused by execution order differences, we were finally able to get workerd running on Windows! No WSL or Docker required! 🎉

Let us know what you think!

Wrangler v3 is now generally available! Upgrade by running npm install --save-dev wrangler@3 in your project. Then run npx wrangler dev to try out the new local development experience powered by Miniflare v3 and the open-source Workers runtime. Let us know what you think in the #wrangler channel on the Cloudflare Developers Discord, and please open a GitHub issue if you hit any unexpected behavior.

[$] High-granularity mappings for huge pages

Post Syndicated from original https://lwn.net/Articles/931773/

The use of huge pages can make memory management more efficient in a number of ways, but it can also impose costs in the form of internal fragmentation and I/O amplification. At the 2023 Linux Storage, Filesystem, Memory-Management and BPF Summit, James Houghton ran a session on a scheme to get the best of both worlds: using huge pages while maintaining base-page mappings within them.

Security updates for Wednesday

Post Syndicated from original https://lwn.net/Articles/932130/

Security updates have been issued by Debian (netatalk), Mageia (connman, firefox/nss/rootcerts, freeimage, golang, indent, kernel, python-django, python-pillow, and thunderbird), Red Hat (apr-util, firefox, java-1.8.0-ibm, libreswan, and thunderbird), SUSE (conmon, curl, java-11-openjdk, and libheif), and Ubuntu (libwebp, linux, linux-aws, linux-aws-5.15, linux-azure, linux-azure-5.15, linux-azure-fde, linux-azure-fde-5.15, linux-hwe-5.15, linux-ibm, linux-kvm, linux-lowlatency, linux-lowlatency-hwe-5.15, linux-oracle, linux, linux-aws, linux-aws-hwe, linux-kvm, linux, linux-aws, linux-azure, linux-azure-5.19, linux-kvm, linux-lowlatency, linux-raspi, node-eventsource, and openjdk-8, openjdk-lts, openjdk-17, openjdk-20).

Microsoft Secure Boot Bug

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2023/05/microsoft-secure-boot-bug.html

Microsoft is currently patching a zero-day Secure-Boot bug.

The BlackLotus bootkit is the first-known real-world malware that can bypass Secure Boot protections, allowing for the execution of malicious code before your PC begins loading Windows and its many security protections. Secure Boot has been enabled by default for over a decade on most Windows PCs sold by companies like Dell, Lenovo, HP, Acer, and others. PCs running Windows 11 must have it enabled to meet the software’s system requirements.

Microsoft says that the vulnerability can be exploited by an attacker with either physical access to a system or administrator rights on a system. It can affect physical PCs and virtual machines with Secure Boot enabled.

That’s important. This is a nasty vulnerability, but it takes some work to exploit it.

The problem with the patch is that it breaks backwards compatibility: “…once the fixes have been enabled, your PC will no longer be able to boot from older bootable media that doesn’t include the fixes.”

And:

Not wanting to suddenly render any users’ systems unbootable, Microsoft will be rolling the update out in phases over the next few months. The initial version of the patch requires substantial user intervention to enable—you first need to install May’s security updates, then use a five-step process to manually apply and verify a pair of “revocation files” that update your system’s hidden EFI boot partition and your registry. These will make it so that older, vulnerable versions of the bootloader will no longer be trusted by PCs.

A second update will follow in July that won’t enable the patch by default but will make it easier to enable. A third update in “first quarter 2024” will enable the fix by default and render older boot media unbootable on all patched Windows PCs. Microsoft says it is “looking for opportunities to accelerate this schedule,” though it’s unclear what that would entail.

So it’ll be almost a year before this is completely fixed.

Why the Istanbul Convention is back on the agenda

Post Syndicated from Светла Енчева original https://www.toest.bg/zashto-istanbulskata-konventsiya-otnovo-e-na-dneven-red/

The European Parliament voted for the EU's accession to the Council of Europe Convention on Preventing and Combating Violence against Women and Domestic Violence, better known as the Istanbul Convention (IC). Parliament ratified it on 10 March 2023. It follows that individual elements of it become binding on the member states. Whether because of GERB's current efforts to present itself as a genuinely European party or for some other reason, the EP's ratification of the Convention did not cause a scandal in Bulgaria. It did not even become memorable news. It mainly served as an occasion for the somewhat forgotten VMRO MEP Angel Dzhambazki to remind people of his existence by getting airtime on one of the most-watched television channels.

What exactly did the EP ratify?

Back in 2016 the European Commission recommended that the EU accede to the Convention, but this did not happen because of resistance from six member states, Bulgaria among them. In 2021 the Court of Justice of the EU issued a ruling under which the EU can ratify the IC by qualified majority, but in such a way that only certain aspects of it are binding on the member states. The vote of 10 March implements precisely that ruling.

The thematic areas of the IC put up for ratification were split into two separate votes. This was necessary because, under the Treaty on the Functioning of the EU, certain exceptions apply to Ireland and Denmark that make voting on some of the proposals pointless for them. And the IC is being ratified precisely so that it is aligned with that treaty.

The first vote concerns not so much the member states as the functioning of the European institutions. It covers the application of the IC to the Staff Regulations of Officials of the EU and the Conditions of Employment of Other Servants of the Union. In other words, gender-based violence must not be tolerated in the work of the EU administration.

The second vote (which does not include Ireland and Denmark) concerns judicial cooperation in criminal matters, as well as asylum.

As for criminal law, the point is that member states must mutually recognise the sentences and court decisions they issue in the area covered by the IC, and the police authorities of those countries must cooperate. To that end, the EP and the European Council may establish minimum rules to be observed, taking into account the different legal traditions of the individual states. The two European institutions may also adopt measures to encourage member states to carry out prevention.

On asylum, the EP agreed that there should be uniform standards for granting refugee status, including in cases where asylum applications are based on gender. In addition, people who have sought asylum on such grounds must not be forcibly returned to a country where they are at risk.

What does this ratification mean for Bulgaria?

Following the EP's ratification of the IC, Bulgarian MEPs (and, of course, all the others) and other staff of the European institutions will not be able to inflict, with impunity, violence on their female colleagues and subordinates on the grounds that they are women. Not that such behaviour was acceptable until now, but it will now also be impermissible by virtue of the Convention.

If someone is convicted of domestic violence or violence against a woman in any member state, they will be regarded as convicted in Bulgaria too. Those convicted in Bulgaria, in turn, will be recognised as such throughout the EU (with the caveat that this excludes Ireland and Denmark). The same applies to people covered by protection measures against such violence. If someone who is wanted in, say, Sweden for harassing his ex-girlfriend comes on holiday to Sunny Beach, Bulgarian police officers will have to assist their Swedish colleagues in apprehending him.

If a woman from Afghanistan or Iran applies for asylum in Bulgaria on the grounds that in her country she has no right to education, to work or to free expression, Bulgaria will have to take her reasons seriously. The same goes, for example, for women pressured into forced marriages, subjected to genital cutting or threatened with so-called honour killings.

How did the Bulgarian MEPs vote?

In both votes related to the ratification of the IC, the Bulgarian MEPs voted the same way. Six were in favour: Atidzhe Alieva-Veli, Ilhan Kyuchyuk and Iskra Mihaylova from the Renew Europe parliamentary group (representing DPS), Sergei Stanishev and Elena Yoncheva from the Socialists and Democrats group (elected from BSP but currently in conflict with the party under Korneliya Ninova's leadership), and Radan Kanev from the European People's Party (DSB).

Two abstained: Asim Ademov and Andrey Novakov from the European People's Party (GERB).

Two again voted against ratification: Angel Dzhambazki from the European Conservatives and Reformists group (VMRO-BND) and Alexander Yordanov from the EPP (SDS). The latter, incidentally, initially appeared on the list of abstentions in both votes and then corrected his vote to "against".

The remaining 7 of Bulgaria's 17 MEPs did not take part in the vote, although some of them attended the sitting that day and voted on other matters.

What are the next steps, and what is the drama around the IC?

Following the EP's ratification of the IC, Bulgaria is expected to comply with the specific aspects of the Convention that become binding on individual countries. There is, however, no requirement for countries to harmonise their legislation in this area, still less to change their own legal traditions.

Nor is Bulgaria being compelled to ratify the IC. MEPs have repeatedly called on Bulgaria, as well as Czechia, Hungary, Latvia, Lithuania and Slovakia, to ratify the Convention so that women in these countries can receive the full protection it provides. The decision on whether that happens, however, rests solely with the individual countries.

The Council of Europe Convention on Preventing and Combating Violence against Women and Domestic Violence was not ratified by Bulgaria and the other countries listed above following a targeted campaign to demonise it, whose roots can be traced both to the Russian propaganda machine and to certain Christian fundamentalist circles.

A central argument against the IC became the distortion of the concept of "gender", which covers those dimensions of sex that cannot be reduced to biology (for example, why until the 1940s pink was associated with masculinity and blue with femininity, whereas today it is the reverse). But because there is also the concept of "gender identity" (whether a person's identity matches the sex assigned to them at birth), which relates to trans people, the insinuation is spread that everything to do with gender boils down to that.

In addition, this propaganda language equates gender identity with sexual orientation, to say nothing of finer distinctions such as those between transgender, transsexuality and third gender. That is why homosexual and bisexual people also began to be called "genders". And organisations fighting violence against women were branded as "wanting to legalise the third gender".

Is it worth women continuing to be killed over this?

Article 4 of the IC does indeed mention the concepts of gender identity and sexual orientation, in order to state that the Convention also applies to people with these protected characteristics, along with many others. But let us not forget that it is devoted to combating violence against women and domestic violence. In that sense, all the howling about how ratifying it will legalise the "third gender" effectively amounts to one thing: better that women go on being beaten and killed than that the protection they receive should extend to LGBTI people too.

In Bulgaria, incidentally, the rights of LGBTI people are also protected by the Protection against Discrimination Act, which prohibits discrimination on the basis of sexual orientation and of sex, with the ground "sex" also covering cases of sex change. How well that law works is another matter.

And while Bulgaria proudly and patriotically refuses to ratify the Council of Europe Convention on Preventing and Combating Violence against Women and Domestic Violence because it is fighting "gender", a woman is killed by a current or former husband or partner every week or two.

AWS completes the 2023 Cyber Essentials Plus certification and NHS Data Security and Protection Toolkit assessment

Post Syndicated from Tariro Dongo original https://aws.amazon.com/blogs/security/aws-completes-the-2023-cyber-essentials-plus-certification-and-nhs-data-security-and-protection-toolkit-assessment/

Amazon Web Services (AWS) is pleased to announce the successful completion of the United Kingdom Cyber Essentials Plus certification and the National Health Service Data Security and Protection Toolkit (NHS DSPT) assessment. The Cyber Essentials Plus certificate and NHS DSPT assessment are valid for one year until March 28, 2024, and June 30, 2024, respectively.

Cyber Essentials Plus is a UK Government-backed, industry-supported certification scheme intended to help organizations demonstrate organizational cyber security against common cyber attacks. An independent third-party auditor certified by the Information Assurance for Small and Medium Enterprises (IASME) completed the audit. The scope of our Cyber Essentials Plus certificate covers AWS Europe (London), AWS Europe (Ireland), and AWS Europe (Frankfurt) Regions.

The NHS DSPT is a self-assessment that organizations use to measure their performance against data security and information governance requirements. The UK Department of Health and Social Care sets these requirements.

When customers move to the AWS Cloud, AWS is responsible for protecting the global infrastructure that runs our services offered in the AWS Cloud. AWS customers are the data controllers for patient health and care data, and are responsible for anything they put in the cloud or connect to the cloud. For more information, see the AWS Shared Security Responsibility Model.

AWS status is available on the AWS Cyber Essentials Plus compliance page, the NHS DSPT portal, and through AWS Artifact. AWS Artifact is a self-service portal for on-demand access to AWS compliance reports. Sign in to AWS Artifact in the AWS Management Console, or learn more at Getting Started with AWS Artifact.

As always, we value your feedback and questions. Reach out to the AWS Compliance team through the Contact Us page. If you have feedback about this post, submit a comment in the Comments section below. To learn more about our other compliance and security programs, see AWS Compliance Programs.

Want more AWS Security news? Follow us on Twitter.

Tariro Dongo

Tariro is a Security Assurance Program Manager at AWS, based in London. Tari is responsible for third-party and customer audits, attestations, certifications, and assessments across EMEA. Previously, Tari worked in security assurance and technology risk in the big four and financial services industry over the last 12 years.

Jennifer Park

Jennifer is a Security Assurance Program Manager at AWS, based in New York. She is responsible for third-party and customer audits, attestations and certifications across EMEA. Jennifer graduated from Boston College and has just under one year of experience in Security Assurance.

Share and query encrypted data in AWS Clean Rooms

Post Syndicated from Jonathan Herzog original https://aws.amazon.com/blogs/security/share-and-query-encrypted-data-in-aws-clean-rooms/

In this post, we’d like to introduce you to the cryptographic computing feature of AWS Clean Rooms. With AWS Clean Rooms, customers can run collaborative data-query sessions on sensitive data sets that live in different AWS accounts, and can do so without having to share, aggregate, or replicate the data. When customers also use the cryptographic computing feature, their data remains cryptographically protected even while it is being processed by an AWS Clean Rooms collaboration.

Where would AWS Clean Rooms be useful? Consider a scenario where two different insurance companies want to identify duplicate claims so they can identify potential fraud. This would be simple if they could compare their claims with each other, but they might not be able to do so due to privacy constraints.

Alternatively, consider an advertising network and a client that want to measure the effectiveness of an advertising campaign. To that end, they would like to know how many of the people who saw the campaign (exposures) went on to make a purchase from the client (purchasers). However, confidentiality concerns might prevent the advertising network from sharing its list of exposures with the client or prevent the client from sharing its list of purchasers with the advertising network.

As these examples show, there can be many situations in which different organizations want to collaborate on a joint analysis of their pooled data, but cannot share their individual datasets directly. One solution to this problem is a data clean room, which is a service trusted by a collaboration’s participants to do the following:

  • Hold the data of individual parties
  • Enforce access-control rules that collaborators specify regarding their data
  • Perform analyses over the pooled data

To serve customers with these needs, AWS recently launched a new data clean-room service called AWS Clean Rooms. This service provides AWS customers with a way to collaboratively analyze data (stored in other AWS services as SQL tables) without having to replicate the data, move the data outside of the AWS Cloud, or allow their collaborators to see the data itself.

Additionally, AWS Clean Rooms provides a feature that gives customers even more control over their data: cryptographic computing. This feature allows AWS Clean Rooms to operate over data that customers encrypt themselves and that the service cannot actually read. Specifically, customers can use this feature to select which portions of their data should be encrypted and to encrypt that data themselves. Collaborators can nevertheless continue to analyze that data as if it were in the clear, even though it remains encrypted while it is being processed in AWS Clean Rooms collaborations. In this way, customers can use AWS Clean Rooms to securely collaborate on data they may not have been able to share due to internal policies or regulations.

Cryptographic computing

Using the cryptographic computing feature of AWS Clean Rooms involves these steps:

  • Users create AWS Clean Rooms collaborations and set collaboration-wide encryption settings. They then invite collaborators to support the analysis process.
  • Outside of AWS Clean Rooms, those collaborators agree on a shared secret: a common, secret, cryptographic key.
  • Collaborators individually encrypt their tables outside of the AWS Cloud (typically on their own premises) using the shared secret, the collaboration ID of the intended collaboration, and the Cryptographic Computing for Clean Rooms (C3R) encryption client (which AWS provides as an open-source package). Collaborators then provide the encrypted tables to AWS Clean Rooms, just as they would have provided plaintext tables.
  • Collaborators continue to use AWS Clean Rooms for their data analysis. They impose access-control rules on their tables, submit SQL queries over the tables in the collaboration, and retrieve results.
  • These results might contain encrypted columns, and so collaborators decrypt the results by using the shared secret and the C3R encryption client.

As a result, data that enters AWS Clean Rooms in encrypted format will remain encrypted from input tables to intermediate values to result sets. AWS Clean Rooms will be unable to decrypt or read the data even while performing the queries.

Note: For those interested in the academic aspects of this process, the cryptographic computing feature of AWS Clean Rooms is based on server-aided private set intersection (PSI). Server-aided PSI allows two or more participants to submit sets of values to a server and learn which elements are found in all sets, but without (1) allowing the participants to learn anything about the other (non-shared) elements, or (2) allowing the server to learn anything about the underlying data (aside from the degrees to which the sets overlap). PSI is just one example of the field of cryptographic computing, which provides a variety of new methods by which encrypted data can be processed for various purposes and without decryption. These techniques allow our customers to use the scale and power of AWS systems on data that AWS will not be able to read. See our Cryptographic Computing webpage for more about our work in this area.

Let’s dive deeper into each new step in the process for using cryptographic computing in AWS Clean Rooms.

Key agreement. Each collaboration needs its own shared secret: a secure cryptographic secret (of at least 256 bits). Customers sometimes have a regulatory need to maintain ownership of their encryption keys. Therefore, the cryptographic computing feature supports the case where customers generate, distribute, and store their collaboration’s secret themselves. In this way, customers’ encryption keys are never stored on an AWS system.
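As a rough illustration of the size requirement, a 256-bit secret can be generated locally with a cryptographically secure random source. The sketch below uses Python's standard library; the base64 encoding is an assumption made here for easy storage and exchange, not a statement about the format the C3R client expects.

```python
import base64
import secrets

SECRET_BYTES = 32  # 32 bytes = 256 bits

# Generated on the customer's own machine; it is never sent to AWS.
shared_secret = secrets.token_bytes(SECRET_BYTES)

# Text form for handing to collaborators over a secure channel (assumed format).
encoded_secret = base64.b64encode(shared_secret).decode("ascii")
```

How the secret is then distributed and stored is up to the collaborators, which is what lets them retain ownership of the key.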

Encryption. AWS Clean Rooms allows table owners to control how tables are encrypted on a column-by-column basis. In particular, each column in an encrypted table will be one of three types: cleartext, sealed, or fingerprint. These types map directly to both how columns are used in queries and how they are protected with cryptography, described as follows:

  • Cleartext columns are not cryptographically processed at all. They are copied to encrypted tables verbatim, and can be used anywhere in a SQL query.
  • Sealed columns are encrypted. The encryption scheme used (AES-GCM) is randomized, meaning that encrypting the same value multiple times yields different ciphertexts each time. This helps prevent the statistical analysis of these columns, but also means that these columns cannot be used in JOIN clauses. They can be used in SELECT clauses, however, which allows them to appear in query results.
  • Fingerprint columns are hashed using the Hash-based Message Authentication Code (HMAC) algorithm. There is no way to decrypt these values, and therefore no reason for them to appear in the SELECT clause of a query. They can, however, be used in JOIN clauses: HMAC will map a given value to the same fingerprint every time, meaning that JOINs will be able to unify common values across different fingerprint columns.
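The behavior of fingerprint columns can be sketched with Python's standard `hmac` module. This is an illustration of the general idea only, not the C3R client's code: the choice of SHA-256 as the underlying hash, the `fingerprint` helper and the sample email addresses are all assumptions made for this example. Sealed columns are not shown because AES-GCM is not available in the Python standard library.

```python
import hashlib
import hmac
import secrets

# Agreed by the collaborators outside of AWS Clean Rooms.
shared_secret = secrets.token_bytes(32)


def fingerprint(value: str, key: bytes) -> str:
    """Deterministic keyed hash: the same value and key always give the same output."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Each party fingerprints its own column locally before uploading it.
exposures = ["alice@example.com", "bob@example.com", "carol@example.com"]
purchasers = ["carol@example.com", "dave@example.com", "alice@example.com"]

exposure_fps = {fingerprint(v, shared_secret) for v in exposures}
purchaser_fps = {fingerprint(v, shared_secret) for v in purchasers}

# A JOIN on the fingerprint columns only needs equality, which survives hashing.
overlap = exposure_fps & purchaser_fps
```

The service can count or match the overlapping fingerprints without the key, but it cannot turn a fingerprint back into an email address.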

Encryption settings. This last point—that fingerprint values will always map a given plaintext value to the same fingerprint—might give pause to some readers. If this is true, won’t the encrypted table be vulnerable to statistical analysis? That is absolutely correct: it will. For this reason, users might wish to set collaboration-wide encryption settings to control these forms of analysis.

To see how statistical analysis might be a concern, imagine a table where one fingerprint column is named US_State. In this case, a simple frequency analysis will reverse-engineer the plaintext values relatively quickly: the most common fingerprint is almost certain to be “California”, followed by “Texas”, “Florida”, and so on. Also, imagine that the same table has another fingerprint column called US_City, and that a given fingerprint appears in both columns. In that case, the fingerprint in question is almost certain to be “New York”. If a row has a fingerprint in the US_City column but a NULL in the US_State column, furthermore, it’s very likely that the fingerprint is for “District of Columbia”. And finally, imagine that the table has a cleartext column called Time_Zone. In this case, values of “HST” (Hawaii standard time) or “AKST” (Alaska standard time) reveal the value in the US_State column regardless of the cryptography.
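
The frequency analysis described above can be sketched in a few lines of Python. The column contents, the key, and the use of HMAC-SHA256 are assumptions chosen for illustration; the point is only that ranking fingerprints by frequency can line them up with publicly known rankings.

```python
import hashlib
import hmac
from collections import Counter


def fingerprint(secret: bytes, value: str) -> str:
    """Deterministic keyed hash of a value (hex-encoded for readability)."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"\x01" * 32  # unknown to the attacker in this scenario

# Hypothetical US_State column with a skewed distribution.
plaintext_column = ["California"] * 5 + ["Texas"] * 3 + ["Florida"] * 2 + ["Wyoming"]
encrypted_column = [fingerprint(secret, state) for state in plaintext_column]

# The attacker sees only fingerprints, but can rank them by frequency and
# line them up against publicly known state populations.
ranked_fingerprints = [fp for fp, _ in Counter(encrypted_column).most_common()]
public_ranking = ["California", "Texas", "Florida", "Wyoming"]
guesses = dict(zip(ranked_fingerprints, public_ranking))
```

In this toy dataset every guess is correct even though the attacker never learns the key, which is the concern that the collaboration-wide encryption settings are meant to address.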

Not all datasets will be vulnerable to these kinds of statistical analysis, but some will. Only customers can determine which types of analysis may reveal their data and which may not. Because of this, the cryptographic computing feature allows the customer to decide which protections will be needed. At the time of collaboration creation, that is, the creator of the AWS Clean Rooms collaboration can configure the following collaboration-wide encryption settings:

  • Whether or not fingerprint columns can contain duplicate plaintext values (addressing the “California” example)
  • Whether or not fingerprint columns with different names should fingerprint values in the same way (addressing the “New York” example)
  • Whether or not NULL values in the plaintext table should be left as NULL in the encrypted table (addressing the “District of Columbia” example)
  • Whether or not encrypted tables should be allowed to have cleartext columns at all (addressing the time zone example)

Security is maximized when all of these options are set to “no,” but each “no” will limit the queries that C3R will be able to support. For example, the choice of whether or not encrypted tables should be allowed to have cleartext columns will determine which WHERE clauses will be supported: if cleartext columns are not allowed, then the Time_Zone column must be cryptographically processed, meaning that the clause WHERE Time_Zone = 'EST' will not act as intended. There might be reasons to set these options to “yes” in order to enable a wider variety of queries, which we discuss in the Query behavior section later in this post.

Decryption. AWS Clean Rooms will write query results to an Amazon Simple Storage Service (Amazon S3) bucket. The recipient copies these results from the bucket to some on-premises storage and then runs the C3R encryption client. The client will find encrypted elements of the output and decrypt them. Note that the client can only decrypt elements from sealed columns. If the output contains elements from a fingerprint column, the client will warn you, but will also leave these elements untouched, as cryptographic fingerprints can’t be decrypted.

Having finished our overview, let’s return to the discussion regarding how encryption can affect the behavior of queries.

Query behavior

Implicit in the discussion so far is something worth calling out explicitly: AWS Clean Rooms runs queries over the data that is provided to it. If the data given to AWS Clean Rooms is encrypted, therefore, queries will be run on the ciphertexts and not the plaintexts. This will not affect the results returned, so long as the columns are used for their intended purposes:

  • Fingerprint columns are used in JOIN clauses
  • Sealed columns are used in SELECT clauses

(Cleartext columns can be used anywhere.) Queries might produce unexpected results, however, if the columns are used outside of their intended purposes:

  • Sometimes queries will fail when they would have succeeded on the plaintext. For example, ciphertexts and fingerprints will be string values, even if the original plaintext values were another type. Therefore, SUM() or AVG() calls on fingerprint or sealed columns will yield errors even if the corresponding plaintext columns were numeric.
  • Sometimes queries will omit results that would have been found by querying the plaintext. For example, attempting to JOIN on sealed columns will yield empty result sets: no two ciphertexts will be the same, even if they encrypt the same plaintext value. (Also, performing a JOIN on fingerprint columns with different names will exhibit the same behavior, if the collaboration-wide encryption settings specified that fingerprint columns of different names should fingerprint values differently.)
  • Sometimes results will include rows that would not be found by querying the plaintext. As mentioned, ciphertexts and fingerprints will be string values (specifically, base64 encodings of random-looking bytes). This means that a substring filter such as WHERE US_State LIKE '%CA%' will match some ciphertexts or fingerprints even when it would not match the plaintext.

To avoid these issues, fingerprint and sealed columns should only be used for their intended purposes (JOIN and SELECT clauses, respectively).
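
The first and third failure modes above can be reproduced outside of SQL with a short Python sketch. It again assumes HMAC-SHA256 with base64 output, and the `fingerprint` helper and the sample column are hypothetical; it is meant to show the behavior of string-valued fingerprints, not the behavior of AWS Clean Rooms itself.

```python
import base64
import hashlib
import hmac


def fingerprint(secret: bytes, value: str) -> str:
    """Deterministic keyed hash of a value, base64-encoded."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


secret = b"\x02" * 32  # placeholder key for the demo only

# A numeric plaintext column becomes a column of strings once fingerprinted.
amounts = [str(n) for n in range(5000)]
fingerprints = [fingerprint(secret, a) for a in amounts]

# Arithmetic that worked on the plaintext now fails: sum() cannot add strings.
try:
    sum(fingerprints)
    sum_failed = False
except TypeError:
    sum_failed = True

# Substring filters match on the base64 text, not on the plaintext. None of
# the plaintext values contains "CA", yet some fingerprints are likely to.
false_matches = [a for a, fp in zip(amounts, fingerprints) if "CA" in fp]
```

Any entry in `false_matches` is a row that a substring filter would return even though its plaintext has nothing to do with the search term.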

Conclusion

In this blog post, you have learned how AWS Clean Rooms can help you harness the power of AWS services to query and analyze your most-sensitive data. By using cryptographic computing, you can work with collaborators to perform joint analyses over pooled data without sharing your “raw” data with each other—or with AWS. If you believe that you can benefit from cryptographic computing (in AWS Clean Rooms or elsewhere), we’d like to hear from you. Please contact us with any questions or feedback. Also, we invite you to learn more about AWS Clean Rooms (including its use of cryptographic computing). Finally, the C3R client is open source, and can be downloaded from its GitHub page.

 
Jonathan Herzog

Jonathan is a Principal Security Engineer in AWS Cryptography and has worked in cryptography for 25 years. He received his PhD in crypto from MIT, and has developed cryptographic systems for the US Air Force, the National Security Agency, Akamai Technologies, and (now) Amazon.

Addressing GitHub’s recent availability issues

Post Syndicated from Mike Hanley original https://github.blog/2023-05-16-addressing-githubs-recent-availability-issues/

Last week, GitHub experienced several availability incidents, both long running and shorter duration. We have since mitigated these incidents and all systems are now operating normally. The root causes for these incidents were unrelated but in aggregate, they negatively impacted the services that organizations and developers trust GitHub to deliver. This is neither acceptable nor the standard we hold ourselves to. We took immediate and direct action to remedy the situation, and we want to be very transparent about what caused these incidents and what we’re doing to mitigate similar issues in the future. Read on for more details.

May 9 Git database incident

Date: May 9, 2023
Incident: Git Databases degraded due to configuration change
Impact: 8 of 10 main services degraded

Details:

On May 9, we had an incident that caused 8 of the 10 services on the status portal to be impacted by a major (status red) outage. The majority of downtime lasted just over an hour. During that hour-long period, many services could not read newly-written Git data, causing widespread failures. Following this outage, there was an extended timeline for post-incident recovery of some pull request and push data.

This incident was triggered by a configuration change to the internal service serving Git data. The change was intended to prevent connection saturation, and had been previously introduced successfully elsewhere in the Git backend.

Shortly after the rollout began, the cluster experienced a failover. We reverted the config change and attempted a rollback within a few minutes, but the rollback failed due to an internal infrastructure error.

Once we completed a gradual failover, write operations were restored to the database and broad impact ended. Additional time was needed to get Git data, website-visible contents, and pull requests consistent for pushes received during the outage to achieve a full resolution.

Figure: Git Push Error Rate. Plot of error rates over time: at around 11:30, rates rise from zero to about 30,000. The rate continues to fluctuate between 25,000 and 35,000 until around 12:30, at which point it falls back to zero.

May 10 GitHub App auth token incident

Date: May 10, 2023
Incident: GitHub App authentication token issuance degradation due to load
Impact: 6 of 10 main services degraded

Details:

On May 10, the database cluster serving GitHub App auth tokens saw a 7x increase in write latency for GitHub App permissions (status yellow). The failure rate of these auth token requests was 8-15% for the majority of this incident, but did peak at 76% for a short time.

Figure: Total Latency. Line plot of latency over time, showing a jump from zero to fluctuate around '3e14' from 12:30 on Wednesday, May 10 until midnight on Thursday, May 11. Peak latency spiked close to '1e15' 5 times in that period.
Figure: Fetch Latency. Line plot of latency over time, showing a jump from zero to '25T' at 12:00 on Wednesday, May 10, followed by another jump further up to '60T' at 17:00, then a drop back down to zero at midnight on Thursday, May 11. The line shows a peak latency of 75T at 21:00 on May 10.

We determined that an API for managing GitHub App permissions had an inefficient implementation. When invoked under specific circumstances, it results in very large writes and a timeout failure. This API was invoked by a new caller that retried on timeouts, triggering the incident. While working to identify root cause, improve the data access pattern, and address the source of the new call pattern, we also took steps to reduce load from both internal and external paths, reducing impact to critical paths like GitHub Actions workflows. After recovery, we re-enabled all suspended sources before statusing green.
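
To illustrate why a caller that retries on timeouts can amplify load, and one common way to bound that amplification, here is a generic Python sketch of retries with a capped attempt count and exponential backoff with jitter. This is not GitHub's implementation; the function name, parameters, and default values are hypothetical.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry operation on TimeoutError, with a capped number of attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up instead of adding more load
            # Exponential backoff with full jitter spreads retries out in time.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

An unbounded retry loop turns each expensive, timing-out request into many, whereas a cap on attempts limits the multiplier that any single caller can apply to a struggling backend.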

As a temporary measure while we update the backing data model to avoid this pattern entirely, we are updating the API to check for the shift in installation state and to fail the request if it would trigger these large writes.

Beyond the problem with the query performance, much of our observability is optimized for identifying high-volume patterns, not low-volume high-cost ones, which made it difficult to identify the specific circumstances that were causing degraded cluster health. Moving forward, we are prioritizing work to apply the experiences of our investigations during this incident to ensure we have quick and clear answers for similar cases in the future.

May 11 Git database incident

Date: May 11, 2023
Incident: Git database degraded due to loss of read replicas
Impact: 8 of 10 main services degraded

Details:

On May 11, a database cluster serving Git data crashed, triggering an automated failover. The failover of the primary was successful, but in this instance read replicas were not attached. The primary cannot handle full read/write load, so an average of 15% of requests for Git data failed or were slow, with peak impact of 26% at the start of the incident. We mitigated this by reattaching the read replicas and the core scenarios recovered. Similar to the May 9 incident, additional work was required to recover pull request push updates, but we were eventually able to achieve full resolution.

Beyond the immediate mitigation work, the top workstreams underway are focused on determining and resolving what caused the cluster to crash and why the failure didn’t leave the cluster in a good state. We want to clarify that the team was already working to understand and address a previous cluster crash as part of a repair item from a different recent incident. This failover replica failure is new.

Figure: Git Operation success rate. Line plot of successful operations over time, showing a typical value around 2.5 million. The plot displays a drop to around 1.5 million operations at 13:30, followed by a steady increase back to 2.5 million, normalizing at 14:00.
Figure: Git Operation error rate. Line plot of error rate over time, showing a roughly inverted trend to the success rate plot. The error rate spiked from zero to 200,000 at 13:30, then continued to rise past 400,000 until around 13:40 at which point it began to steadily decrease back down to zero, normalizing at 13:50.

Why did these incidents impact other GitHub services?

We expect our services to be as resilient as possible to failure. Failure in a distributed system is inevitable, but it shouldn’t result in significant outages across multiple services. We saw widespread degradation in all three of these incidents. In the Git database incidents, Git reads and writes are at the core of many GitHub scenarios, so increased latency and failures left GitHub Actions workflows unable to pull data and pull requests unable to update.

In the GitHub Apps incident, the impact on token issuance also affected GitHub features that rely on tokens for operation. This is the source of each GITHUB_TOKEN in GitHub Actions, as well as the tokens used to give GitHub Codespaces access to your repositories. They’re also how access to private GitHub Pages is secured. When token issuance fails, GitHub Actions and GitHub Codespaces are unable to access the data they need to run, and fail to launch as a result.

What actions are we taking?

  1. We are carefully reviewing our internal processes and making adjustments to ensure changes are always deployed safely moving forward. Not all of these incidents were caused by production changes, but we recognize this as an area of improvement.
  2. In addition to the standard post-incident analysis and review, we are analyzing the breadth of impact these incidents had across services to identify where we can reduce the impact of future similar failures.
  3. We are working to improve observability of high-cost, low-volume query patterns and general ability to diagnose and mitigate this class of issue quickly.
  4. We are addressing the Git database crash that has caused more than one incident at this point. This work was already in progress and we will continue to prioritize it.
  5. We are addressing the database failover issues to ensure that failovers always recover fully without intervention.

As part of our commitment to transparency, we publish summaries of all incidents that result in degraded performance of GitHub services in our monthly availability report. Given the scope and duration of these recent incidents we felt it was important to address them with the community now. The May report will include these incidents and any further detail we have on them, along with a general update on progress towards increasing the availability of GitHub. We are deeply committed to improving site reliability moving forward and will continue to hold ourselves accountable for delivering on that commitment.

Maria Stepanova: “The train of history is entering a dark tunnel once again”

Post Syndicated from Radio Svoboda original https://www.toest.bg/maria-stepanova-vlakut-na-istoriyata-otnovo-navliza-v-tumen-tunel/

“The unconditionality with which she insists on a poetic perception of the world”: this is how the jury of the Leipzig Book Prize for European Understanding describes Maria Stepanova’s contribution. The prize was awarded in April of this year. The jury also notes Stepanova’s ability, “by peering in her works to the very bottom, to give hope.” What can a poet do today, in a situation where words can change nothing and where, on the other hand, it feels as if there are no words left? Keep speech free, liberated from control and dictate. “We,” the people who are trying to remain subjects rather than objects of history, have become stuck in something like a historical funnel, Stepanova believes. And that is exactly why Mandelstam and Kharms seem so relevant to us. Although this is less a cause for admiration than a sign that the future “has not arrived” for us.

Maria Stepanova answers questions from Radio Svoboda.

– Over the previous 30 years, society in Russia (well, not all of it, let us say a minority), albeit in leaps, in zigzags, hesitantly, was moving forward, and this feeling of “movement along the path of progress” was nonetheless a support for our motivation to live. After the start of the aggression we found ourselves thrown back to the starting point, to a notional 1984, if not straight to 1937. What lesson is this for all of us, this latest tumble back into the “circle of history”?

– At the end of the 80s and the beginning of the 90s it really did seem that we were getting out, that we had gotten out of the previous catastrophic corridor: oh, what a life is beginning! And then, all of a sudden, “Attention, the doors are closing,” and the train of history enters a dark tunnel again, right after the brightly lit station. The doors are bolted once more; all the efforts, all the hopes, and the enormous number of private choices that each of us made have no meaning whatsoever. We are all passing into darkness, and later, one day, someone who is lucky will come out into the light and thank fate for their luck. But this very state of subjectlessness (the choice is made not by us but by something called History, universal chaos, the cosmos) and this movement in a spiral, in a circle, or back and forth after the pendulum is in fact deeply humiliating for a human being. It turns out that we decide nothing. But how can that be, didn’t we think of ourselves as independent units? It is just that the basis of all our decisions was the necessity of peace, resistance to the very idea of war with anyone at all, and it seemed to us that this question had already been settled irrevocably by previous generations, that this at least we need not worry about, never again, right?

Today that confidence looks naive to us. Once, when Nadezhda Yakovlevna was complaining about how unhappy she was, Mandelstam said to her: “And did anyone promise you that you would be happy?”

Who promised us that we would live happily and peacefully,

as if evil did not exist, as if there were no misfortune in the world? All this time misfortunes and catastrophes were happening, but to others. Iraq, Syria, Afghanistan, Rwanda… But some had the feeling that if they shut their eyes and stayed at home, misfortune would pass them by. It did not pass us by.

– A train, a carriage, is a closed space that you cannot jump out of while it is moving. With the awareness that there will be no more stops. “Our train is leaving for Auschwitz, today and every day”: I recall a line from Galich.

– In Berlin, a stone’s throw from where we are talking, is Grunewald Gleis 17, a memorial at the point from which the transports were sent to the extermination camps, mostly carrying Jews. It is a good memorial: it does not try to become an art object. A clean, unalloyed territory of memory. On the platform the dates are inscribed: when exactly the trains departed, where to, how many people per month these trains carried away. They kept setting off literally until April 1945, when by our notions everything was already clear. And the count was no longer in the thousands, nor in the hundreds, only some fifteen people; after all, only a few weeks remained until the end of the war, bombs were raining down on Berlin, did they have nothing else to do but supply the concentration camps? Yet those people were loaded into the wagons and sent to their deaths, in spite of everything. At that moment you understand that these trains were run, controlled, by people. No doubt it would be easier for all of us now to regard those trains as some impersonal, subjectless force. Unfortunately, that is not so. Both then and now, all of this is done by people.

– During those previous 30 years it also seemed to us that “history had ended”: everything worst in the history of humanity had already happened and we were living in post-history. In fact, your book “In Memory of Memory” was a confirmation and even a symbol of that feeling. Was this projection mistaken?

– Indeed, we (what is this “we”? let us assume that those who have worked on history or taken an interest in it, in the past, in the culture of memory, make up a wide circle of people) proceeded from the conviction that we exist, to some extent, in the future, in a post-catastrophic situation. And that the only task toward which all of humanity’s intellectual powers should be directed is to try to take stock of the past, of history with its catastrophes: to preserve, to remember, not to forget, to memorialize everything that is left. Everything must be put in order, explained, catalogued: with monuments erected, with carefully considered rituals that are needed in order to take leave of history in the proper way. And of course we were all writing our books about the past; what else would we write about? I will never forget how I once sat on the jury of a literary prize, and my duties included reading about 80 books of prose. Close to 70 of them were about the past. I understand the authors very well; I am the same.

Only it turns out that we have thrown all our strength into love for the past and have lost our trust in the future.

By the way, this is so not only in Russia. I sometimes teach in Europe or in America. And sometimes I ask my students (it is a kind of game) to name a relatively recent film, some blockbuster, that offers us a good version of the future. Not necessarily radiantly utopian, but acceptable, unfrightening, bearable. And at this point my students fall silent; they have nothing to say. Finally, and this happens very rarely, someone raises a hand and offers the unerring “Back to the Future.” “Back to the Future”! A film made in 1985. This means that for a full 30 years all the machines of human imagination, the whole super-industry of cinema, could not or would not offer us a positive version of the future. Instead, every year a full assortment of dystopias reaches the screens: political, economic, social, ecological. That is, children not yet hatched from the egg already know well that the future is something frightening, whereas the past may be scary too, but at least we know what it was. And it seems to them that it is safer than the future, that they can learn the rules so that the next train of history does not run them over. And so it turns out that we are racing at insane speed in the opposite direction, trying to return to this “history,” which is naturally fictional, invented; we have no other at our disposal. What it will be like depends entirely on the sources we draw from.

– And in any case it is an improved version of the past.

– A doctored one. One way or another, personally tinted. In any case we are dealing with a fantasy about the past, individual or collective. And so today we see how a fantasy on motifs from the past (produced, I would say, by one man and his entourage after all, and replicated “on the scale of the whole country”) is being realized. The February invasion of Ukraine had, how shall I put it, personalistic features from the very beginning. This is Putin’s fantasy, which he has the strength, means, and opportunity to carry out not in a computer game, not in some mad pre-dawn dream, but on the territory of Europe at the cost of thousands of human destinies.

– “Insists on a poetic perception of the world” is the wording with which you were awarded the Leipzig Book Prize for European Understanding in 2023. Returning once more to your metaphor of the train: is it easier for the poet, the individual in concentrated form, to break free of the iron vise of history with the help of language? To rise like a balloon above the speeding train?

– It is well known that poems are a kind of refuge and in some situations can save. You read a poem and it settles in you. Or is it the other way round, and you settle in it? If you know enough poems by heart, you can hide in them. And wait it out, rest, catch your breath. You can also share a poem, like a tent or a food ration, with someone else. All of this is true, even though it sounds beautiful and lofty. But unfortunately we also know that poems do not at all save the people who write them. That is exactly what I was thinking about when I was writing my speech for the Leipzig prize, and I remembered Kharms:

Poems should be written in such a way that if you hurl them at a window, the glass breaks.

That is the plain truth. Kharms wrote this in 1940, and at the beginning of 1942 he died of starvation in a prison hospital…

– And Walter Benjamin, at roughly the same time, swallows poison.

– And Tsvetaeva hangs herself in Yelabuga. And we could continue this series. As Auden said:

Poetry makes nothing happen.

Poems do not make things happen. They change nothing. It is a conversation with oneself inside a closed space. That is why, when I think about poems, I also constantly think about this: the extent to which they are not enough now. But there is another side to this question. The main function of poems, apart from the lovely chirping in rhyme and without rhyme, is to shape language. Poems speak in the language of the near or distant future. The poems that Mandelstam wrote during his exile in Voronezh seemed so obscure to his contemporaries that they had to make an effort to understand what he meant. And he would answer:

I think in omitted links.

In the end, 80 years later, today’s senior-school student reads Mandelstam and understands everything, or almost everything. In 80 years the text has not ceased to be contemporary, but in addition it has suddenly become comprehensible: in that time the language in which the texts were spoken has managed to become common to all. With respect to language, poems perform a hygienic function. Hygiene is by no means reducible to preserving a linguistic norm handed down by some boss from above (“coffee shall be called such-and-such”) and to looking down on those who drink that coffee in the wrong form. Hygiene is the opposite: it is when poems shake language up, freeing additional corridors. Additional options. And in this sense

today’s poems work against Putin, against today’s authorities in Russia.

Against the numbness and deadening that they bring into the language. Poems, as long as they are being written, point to the possibility of something else: different, not like the present.

– I also recall Akhmatova’s words: “but we will preserve you, Russian speech.” To preserve free, uncensored Russian speech today is an entirely worthy task for the intellectual.

– Yes, but I would also note: this is a task not only for the intellectual with a Russian passport in their pocket. Because what looms before all of us today is the notorious “cancellation of Russian culture.” A cancellation, of course, not in the literal sense with which the pro-Putin mass media love to frighten us (it is their favorite topic), but a mental one, which will happen unnoticed and objectively through the deadening and corrosion of the language. The only thing we can set against this is to separate the Russian language, Russian culture, from today’s regime, from the state. Today, thanks to the new wave of people who have left Russia, the language is already detaching from a particular territory and turning into the language of a diaspora, and this “departed” Russian ought to acquire new features. This is good: no one can impose norms or rules of behavior on it from outside, conserve it, or display it in a crystal coffin in a mausoleum. It seems to me that “preserving Russian speech” can only amount to its emancipation, at long last. Let our speech live wherever it wants, let it mix with other tongues. Let it whistle, chirp, produce neologisms; let it be Anglicized, Germanized, Arabized. That is what happens with languages that want to stay alive. And this will not be the usual imposition, it will not be expansion or invasion; this will be… a wandering Russian. Out for a walk around the world. One of the many world languages.

… Once I was in the company of some Swiss people; they were amusing themselves by each pronouncing one and the same phrase in their own dialect. It is impossible to imagine that this happens within a single language. Such richness, such difference, and such a wonderful ability of people not to turn this into a problem, a big deal. It is laughable, wonderful, and it exists. The capacity for diversity, for difference, is simply given to people from the cradle. It is the same in Italy, say. In Germany there is a Berlin dialect, there is a Cologne one, there is Bavarian; there are newspapers published in these dialects, there are plays written in dialect. And this does not in the least hinder the existence of so-called High German. Why did nothing like this happen with us?

Why, in Russia, on this enormous territory, do dialects practically not exist?

In our country a single linguistic norm was imperatively established (on the model of the French language reform at the end of the 18th century), and afterwards the Russian empire, and then the Soviet one, regarded this unified language as their priority. And we have inherited this norm, and along with it the need, when choosing between developing the language and conserving it, to choose the latter. The homogeneous, unchangeable, invariant Russian, which recognizes no differences between regions, peoples, cities. “That is how it must be.” It seems to me that our notorious unfreedom has its roots here too, for example. And this leads to devastating consequences.

– The reproaches regarding the imperial character of Russian culture are in a certain sense fair. It really was created to a large extent by people with colonial experience. But if we look for a completely unstained figure, Kharms suddenly comes up again. It must be said that now, strange as it may be, Kharms is heard even more often than before the war. As if everyone had clutched at him as a last salvation or justification. And the whole chilling absurdity of what is happening is also most easily expressed with the help of Kharms. It is telling: today it is precisely Kharms on the one hand, and Beckett and Kafka on the other, who alone help one not to go mad.

– Well said. I think about this in a slightly different aspect: perhaps the point is that Kharms and Vvedensky are our contemporaries to a far greater degree than their older contemporary Akhmatova. Or than even their younger contemporary Brodsky. This is not about comparative merits. It is rather about the situation in which we have suddenly found ourselves. Mandelstam has an essay, “The End of the Novel,” in which he writes that fiction no longer holds weight, meaning, or interest for the reader. Because in times of catastrophe and mass death we understood very well that a person no longer determines their own biography. Biography becomes typical and, as we used to put it in the 90s, “gets lost in the rounding.” Private biography no longer exists. And since it does not exist, there is no plot construction either. There is no author standing behind the decision whether Madame Bovary dies or not. There is no author disposing of Natasha Rostova’s future. And therefore, according to Mandelstam, the important genre turns out to be what we today call nonfiction. The literature of real life, where what matters is not the plot but something else. What matters is that this once really happened to someone. This is one of the ways of dealing with the whirlpool of history.

The second way is what Beckett and Kafka (and Kharms, yes) do. We find ourselves in a situation where all connections between objects are accidental or arbitrary. We may suppose that some external will stands behind them. But we are unable to decipher its logic. And in this sense world literature, beginning with the First World War, has been occupied with roughly one and the same thing. Yes, here an explosion goes off (now this word sounds quite different; now it is not a metaphor but something more real than real); some movement occurs, a shift of the historical axis. Everything freezes, hangs in the air. And the only thing one can work with now is fragments. With the elements of the previous system, which are damaged, scattered, disconnected. We know that they were once part of a whole, and that whole will never again be what it was. But from the pieces of the whole another system can be reassembled, in which the old elements will be present under different conditions and in different connections. In Russian literature, it was probably Kharms and Vvedensky who worked on this most precisely in poetry, and in prose Dobychin and, in an entirely different way, Platonov.

The bad thing, however, is that everything written by these authors stubbornly refuses to stop being topical. It is customary to admire this, but here is what follows: it turns out that nothing new has happened to us since then. Let us compare this with English-language literature of the twentieth century. Say, with Eliot, Yeats, Cummings. Great people, wonderful, brilliant poems, but we understand well that these poems were not written now; there is a distance between them and us. And that is the right feeling. But with Russian poems one gets

the feeling that we are in the very same cultural aquarium, the one from 100 years ago.

Where, over a cup of tea and talk about destinies, we suddenly recall a line from Mandelstam, and everyone has the feeling that it was written yesterday and there is nothing to add. Because nothing has changed. And this unovercome, untraversed past is something quite frightening and sad.

– The critic Igor Gulin writes: "Maria Stepanova speaks not in her own name but in the name of other people's voices"; critics date this poetic turn of yours to approximately 2014. But now it seems to me that we have ended up in a situation where we will once again have to shift into another register: mournful silence. To speak in the name of muteness itself, the muteness of shame. Is this possible in poetry?

– I am not sure it is possible. At least for me. It seems to me that thinking people, people who write in Russian and are in one way or another connected with Russia, now feel one thing: we have lost the ability to represent only ourselves. To speak in our own name and only in our own name. Because you, I, all of us are now first and foremost Russians, regardless of our citizenship, our ethnic origin, our language; even if "we" moved to Germany or the States back in 1978. That changes nothing. From now on we are all part of the multitude of people who committed, who did this.

I, for example, was born in 1972. I caught the Pioneer assemblies, the speeches, the political attestations, and the admission into the Komsomol, that whole comic ballet. And from an early age the logic of non-joining was very close to me, even too close.

I do not join anything; I am not a member of any party. And I am not a representative of anything at all, neither of a majority nor of a minority. I speak only in my own name.

Brodsky sums this up very precisely in his Nobel lecture, which begins with the words:

For a lover of solitude, for a person who all his life has preferred it to any public role…

Well, and so on.

… If art teaches anything, it is precisely the privateness of human existence.

This has always been very understandable to me. And it is more my psychosomatics than a political self-definition. But the result was exactly this: I never meant to represent either Russian literature or, God forbid, Russian statehood, or any imaginary fellowship. I defended the possibility of speaking in my own name. In my own name and about my own things. And for a certain number of years I managed to. Now I no longer have that possibility, and I think I never will again. Whatever I write now (I may write about butterflies, about windmills, about anything at all), it will in any case be perceived as a text created in the language of the people who did this. And one has to live with that somehow. That is, I could easily point a finger: over there are our "bad people", whereas I deserve praise, I thought and acted correctly, so now I am part of another, enlightened fellowship and can freely, with no burden at all, condemn those people as others, outsiders. No, unfortunately that is not how it is. So I will have to take on a share of the common guilt, and somehow live and work with it. If I manage. And to try to find new principles of existence in language.

How?… I don't know; I am not writing poems now. I don't know whether this is muteness. Muteness is Paul Celan, for example. Muteness is a way of passing through the unspeakable. And of making silence the defining thing in the text you are writing. But we cannot compare ourselves to Celan, can we? A poet from Ukraine can be compared to Celan.

And we turned out to be part of the multitude whose language became an instrument of violence.

And that is a completely different muteness. And a different horror. And what are we to do with it? A poet from Russia cannot describe Bucha. Because that would be appropriation, the use of someone else's suffering in order to do what, to write a text? The most we can do is mention Bucha, and I cannot really imagine how exactly that could be done. Last summer I wrote some poems, but I am not entirely sure they are poems. Perhaps they have some meaning as testimony, as an inscription on a wall. But the problem I ran into when I was writing them was the problematic nature of their agency. Of the point from which the speech actually originates: the answer to the question of what this "I" that speaks is. I have a poem that begins with

while we slept, we were bombing Kharkiv.

Here is the divided, bipolar "we", which at one and the same time is ashamed and kills, wants to sink into the ground and eats shashlik in the sunshine by its dacha. And all of this happens simultaneously. A situation that it is unclear how to deal with. For now it is unclear to me.

– At the same time, an enormous quantity of talented and merciless Ukrainian poems has appeared online, simply in the air. How do you assess them? As a chronology of pain?

– It seems to me that this is incredibly important, and I follow them, because Ukrainian poets now have something that we do not have. And cannot have. We are in a state of absolute uncertainty, in a state of self-cancellation. Nobody even needs to cancel us deliberately; we are our own cancellation office. Mandelstam, again, believed (and this is interesting to reflect on) that "poetry is the consciousness of being right". Whether that is so or not, none of us now has that consciousness of being right. We have only the consciousness of our own wrongness. And what can be made of this feeling, and whether anything can be made of it at all, is an open question. Poetry in Ukraine has always occupied a special place. And now it finds itself entirely at the point where every written line, every text acquires the force of final testimony. But the matter does not end there. Great poems are being written in Ukrainian now. Right now I am reading the texts that Marianna Kiyanovska has written over the past few months. They are astonishing poems; they will be read in a hundred, in two hundred years; this language is for the ages. But this is possible because these poems have a listener, have audibility. Poetry in Ukraine has occupied a more important place from the very beginning. Not to mention huge cultural figures such as Serhiy Zhadan, for example, who fills stadiums. In Russia the audience of a poetry evening is 25 people, with more luck 50. That is, the resonance is utterly incomparable. And while we assume that a poem, once heard or read, will form some spaces of the future in the reader's consciousness, Ukraine is realizing that future of its own right now.

As for me, I am not writing poems now, as I have already said; I am slowly finishing the novel that I began writing at the start of the pandemic and abandoned several times, because the wheel of history kept swerving so sharply that I had to reassemble some foundations, some base on which I was trying to set this construction. I think this text in some way helps me hold myself together. That is an astonishing capacity of prose. Because with poems you are in any case forced to wait for some invisible airy form to take shape in your head, which you must fill with words like a puzzle, develop like photographic film. Whereas in prose the element of will still matters. You know that this thing must and can be written. And you enter it as you would another room, and in that room time in any case stops. This is time alone with the text, the language, and with some other person's story that only very loosely touches your own. This possibility is a great gift, I must say. All my life I believed I would write poems exclusively. But it turns out that prose works by other laws, and for a person whose existence from time to time causes sharp pain, prose becomes a little window that can be opened a crack.

– Finally, a question about the online publication Colta: do you intend to revive it on another platform? To reboot it, to resuscitate it?

– To exist in the mode in which we worked until February 2022 is impossible for obvious ethical reasons. We cannot work as if nothing had happened. Colta and its predecessor OpenSpace were conceived as daily publications about culture, dealing with the current cultural process. Books, exhibitions: year after year we followed this. Now, when Russia is continuing its aggression, it would be extremely strange to describe Moscow or Petersburg concerts or the "Nonfiction" exhibition. Not to mention the financial side: for a very long time we existed on reader donations and partnership programs. Now both are, for various reasons, difficult. We are trying to work in a semi-volunteer mode; we are developing short special projects, a certain number of pieces united by a common theme. Not long ago we published a cycle of pieces about the future, in which various clever people (ranging from Elena Fanailova to Oxana Timofeeva) reflect on whether Russia has a more or less tolerable future, and what it might be. I myself am possessed by rather gloomy thoughts on this subject, so I had hopes for this survey: what if someone proposed a version that would inspire me?

No positive forecasts at all.

Interestingly, the only author whose text can be called somewhat optimistic is Kirill Medvedev, who is now in Moscow and is involved in various forms of activism.

Well, and besides, I would very much like our site to manage to hold on and keep working across borders, despite the already emerging tendency toward a division between those who emigrated and those who stayed. I would not like us to reproduce the schism, the logic of the split between Soviet and émigré literature. It lasted so many years and in the end helped nothing. Unlike in that situation, we at least have the tools for the dialogue to continue, so that the same texts can be read both in Russia and outside it. And it seems to me that working to keep the conversation going is necessary.

Translation from Russian of the interview for Radio Svoboda: Zdravka Petrova


Maria Stepanova (b. 1972, Moscow) is a Russian poet, writer, and essayist, and editor-in-chief of the online arts and culture publication Colta.ru. Her philosophical novel-essay "In Memory of Memory" (Bulgarian translation by Zdravka Petrova, Janet 45, 2019) has received the most prestigious Russian literary awards: Yasnaya Polyana (2018), Bolshaya Kniga 2018 (the highest Russian literary honor), and NOS ("Novaya Slovesnost", 2019).

How Zoom implemented streaming log ingestion and efficient GDPR deletes using Apache Hudi on Amazon EMR

Post Syndicated from Sekar Srinivasan original https://aws.amazon.com/blogs/big-data/how-zoom-implemented-streaming-log-ingestion-and-efficient-gdpr-deletes-using-apache-hudi-on-amazon-emr/

In today’s digital age, logging is a critical aspect of application development and management, but efficiently managing logs while complying with data protection regulations can be a significant challenge. Zoom, in collaboration with the AWS Data Lab team, developed an innovative architecture to overcome these challenges and streamline their logging and record deletion processes. In this post, we explore the architecture and the benefits it provides for Zoom and its users.

Application log challenges: Data management and compliance

Application logs are an essential component of any application; they provide valuable information about the usage and performance of the system. These logs are used for a variety of purposes, such as debugging, auditing, performance monitoring, business intelligence, system maintenance, and security. However, although these application logs are necessary for maintaining and improving the application, they also pose an interesting challenge. These application logs may contain personally identifiable data, such as user names, email addresses, IP addresses, and browsing history, which creates a data privacy concern.

Laws such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) require organizations to retain application logs for a specific period of time. The exact length of time required for data storage varies depending on the specific regulation and the type of data being stored. The reason for these data retention periods is to ensure that companies aren’t keeping personal data longer than necessary, which could increase the risk of data breaches and other security incidents. This also helps ensure that companies aren’t using personal data for purposes other than those for which it was collected, which could be a violation of privacy laws. These laws also give individuals the right to request the deletion of their personal data, also known as the “right to be forgotten.” Individuals have the right to have their personal data erased, without undue delay.

So, on one hand, organizations need to collect application log data to ensure the proper functioning of their services, and keep the data for a specific period of time. But on the other hand, they may receive requests from individuals to delete their personal data from the logs. This creates a balancing act for organizations because they must comply with both data retention and data deletion requirements.

This issue becomes increasingly challenging for larger organizations that operate in multiple countries and states, because each country and state may have their own rules and regulations regarding data retention and deletion. For example, the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada and the Australian Privacy Act in Australia are similar laws to GDPR, but they may have different retention periods or different exceptions. Therefore, organizations big or small must navigate this complex landscape of data retention and deletion requirements, while also ensuring that they are in compliance with all applicable laws and regulations.

Zoom’s initial architecture

During the COVID-19 pandemic, the use of Zoom skyrocketed as more and more people were asked to work and attend classes from home. The company had to rapidly scale its services to accommodate the surge and worked with AWS to deploy capacity across most Regions globally. With a sudden, large increase in the number of application endpoints, they had to rapidly evolve their log analytics architecture and worked with the AWS Data Lab team to quickly prototype and deploy an architecture for their compliance use case.

At Zoom, the data ingestion throughput and performance needs are very stringent. Data had to be ingested from several thousand application endpoints that produced over 30 million messages every minute, resulting in over 100 TB of log data per day. The existing ingestion pipeline consisted of writing the data to Apache Hadoop HDFS storage through Apache Kafka first and then running daily jobs to move the data to persistent storage. This took several hours while also slowing the ingestion and creating the potential for data loss. Scaling the architecture was also an issue because HDFS data would have to be moved around whenever nodes were added or removed. Furthermore, transactional semantics on billions of records were necessary to help meet compliance-related data delete requests, and the existing architecture of daily batch jobs was operationally inefficient.
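To get a feel for the scale described above, the quoted figures can be turned into per-day and per-message numbers. This is only a back-of-the-envelope sketch: both figures are stated as lower bounds ("over"), and a decimal terabyte (10^12 bytes) is assumed, so the average message size is a rough estimate rather than a measured value.

```python
# Back-of-the-envelope check of the volumes quoted in the post.
MESSAGES_PER_MINUTE = 30_000_000     # "over 30 million messages every minute"
LOG_BYTES_PER_DAY = 100 * 10**12     # "over 100 TB" per day, decimal TB assumed

MINUTES_PER_DAY = 24 * 60

messages_per_day = MESSAGES_PER_MINUTE * MINUTES_PER_DAY
messages_per_second = MESSAGES_PER_MINUTE // 60
avg_bytes_per_message = LOG_BYTES_PER_DAY / messages_per_day

print(f"{messages_per_day:,} messages per day")
print(f"{messages_per_second:,} messages per second")
print(f"about {avg_bytes_per_message:,.0f} bytes per message on average")
```

Under these assumptions the pipeline handles tens of billions of messages per day, each a couple of kilobytes on average.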

It was at this time, through conversations with the AWS account team, that the AWS Data Lab team got involved to assist in building a solution for Zoom’s hyper-scale.

Solution overview

The AWS Data Lab offers accelerated, joint engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data, analytics, artificial intelligence (AI), machine learning (ML), serverless, and container modernization initiatives. The Data Lab has three offerings: the Build Lab, the Design Lab, and Resident Architect. During the Build and Design Labs, AWS Data Lab Solutions Architects and AWS experts supported Zoom specifically by providing prescriptive architectural guidance, sharing best practices, building a working prototype, and removing technical roadblocks to help meet their production needs.

Zoom and the AWS team (collectively referred to as “the team” going forward) identified two major workflows for data ingestion and deletion.

Data ingestion workflow

The following diagram illustrates the data ingestion workflow.

Data Ingestion Workflow

To prototype this workflow, the team needed to quickly populate millions of Kafka messages in the dev/test environment. To expedite the process, we (the team) opted to use Amazon Managed Streaming for Apache Kafka (Amazon MSK), which makes it simple to ingest and process streaming data in real time, and we were up and running in under a day.

To generate test data that resembled production data, the AWS Data Lab team created a custom Python script that evenly populated over 1.2 billion messages across several Kafka partitions. To match the production setup in the development account, we had to increase the cloud quota limit via a support ticket.
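The script itself is not shown in the post. As a minimal sketch of the idea, test messages can be spread evenly by cycling through partition numbers round-robin. The message shape, the `generate_messages` helper, and the small counts used here are all hypothetical; a real script would hand each pair to a Kafka producer with an explicit partition number instead of counting them.

```python
import json
from collections import Counter
from itertools import cycle


def generate_messages(total, num_partitions):
    """Yield (partition, payload) pairs, assigning partitions round-robin."""
    partitions = cycle(range(num_partitions))
    for seq, partition in zip(range(total), partitions):
        # Hypothetical payload: one message wrapping a small array of log entries.
        payload = json.dumps(
            {"seq": seq, "logs": [{"level": "INFO", "text": f"event-{seq}"}]}
        )
        yield partition, payload


# Count how many messages each partition would receive (small numbers for illustration).
counts = Counter(p for p, _ in generate_messages(total=10_000, num_partitions=400))
print(len(counts), min(counts.values()), max(counts.values()))
```

Because the total is a multiple of the partition count, every partition receives the same number of messages, which is the "evenly populated" property the test data needed.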

We used Amazon MSK and the Spark Structured Streaming capability in Amazon EMR to ingest and process the incoming Kafka messages with high throughput and low latency. Specifically, we inserted the data from the source into EMR clusters at a maximum incoming rate of 150 million Kafka messages every 5 minutes, with each Kafka message holding 7–25 log data records.
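The rates in this paragraph can be converted into per-second figures and into log records per batch. The sketch below uses only the numbers quoted above (150 million messages per 5 minutes, 7 to 25 records per message); the variable names are illustrative.

```python
MESSAGES_PER_BATCH = 150_000_000   # maximum incoming messages per batch
BATCH_SECONDS = 5 * 60             # one batch every 5 minutes
RECORDS_PER_MESSAGE = (7, 25)      # each Kafka message holds 7-25 log records

messages_per_second = MESSAGES_PER_BATCH // BATCH_SECONDS
records_per_batch = tuple(n * MESSAGES_PER_BATCH for n in RECORDS_PER_MESSAGE)
records_per_second = tuple(n * messages_per_second for n in RECORDS_PER_MESSAGE)

print(f"{messages_per_second:,} messages per second")
print(f"{records_per_batch[0]:,} to {records_per_batch[1]:,} records per batch")
print(f"{records_per_second[0]:,} to {records_per_second[1]:,} records per second")
```

So each 5-minute batch expands into roughly one to four billion individual log records once the nested arrays are flattened.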

To store the data, we chose to use Apache Hudi as the table format. We opted for Hudi because it’s an open-source data management framework that provides record-level insert, update, and delete capabilities on top of an immutable storage layer like Amazon Simple Storage Service (Amazon S3). Additionally, Hudi is optimized for handling large datasets and works well with Spark Structured Streaming, which was already being used at Zoom.

After 150 million messages were buffered, we processed the messages using Spark Structured Streaming on Amazon EMR and wrote the data into Amazon S3 in Apache Hudi-compatible format every 5 minutes. We first flattened the message array, creating a single record from the nested array of messages. Then we added a unique key, known as the Hudi record key, to each message. This key allows Hudi to perform record-level insert, update, and delete operations on the data. We also extracted the field values, including the Hudi partition keys, from incoming messages.
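The flattening and keying steps can be sketched in plain Python. This is an illustration, not Zoom's Spark job: the field names (`message_id`, `logs`, `ts`), the hash-based record key, and the date-based partition key are assumptions made for the example.

```python
import hashlib


def flatten(kafka_message):
    """Turn one Kafka message holding an array of log entries into flat records."""
    # Fields shared by every entry in the message (everything except the array).
    envelope = {k: v for k, v in kafka_message.items() if k != "logs"}
    records = []
    for position, entry in enumerate(kafka_message["logs"]):
        record = {**envelope, **entry}
        # Hypothetical record key: hash of the message id plus the entry's position.
        raw_key = f"{kafka_message['message_id']}:{position}"
        record["record_key"] = hashlib.sha256(raw_key.encode()).hexdigest()
        # Hypothetical partition key: calendar date taken from the event timestamp.
        record["partition_date"] = entry["ts"][:10]
        records.append(record)
    return records


message = {
    "message_id": "m-001",
    "host": "app-17",
    "logs": [
        {"ts": "2023-01-05T10:00:00Z", "level": "INFO", "text": "started"},
        {"ts": "2023-01-05T10:00:01Z", "level": "WARN", "text": "slow call"},
    ],
}
flat = flatten(message)
for record in flat:
    print(record["record_key"][:12], record["partition_date"], record["text"])
```

Whatever key scheme is used, it has to be unique per record and reproducible, because later delete requests locate records by that key.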

This architecture allowed end-users to query the data stored in Amazon S3 using Amazon Athena with the AWS Glue Data Catalog or using Apache Hive and Presto.

Data deletion workflow

The following diagram illustrates the data deletion workflow.

Data Deletion Workflow

Our architecture allowed for efficient data deletions. To help comply with the customer-initiated data retention policy for GDPR deletes, scheduled jobs ran daily to identify the data to be deleted in batch mode.

We then spun up a transient EMR cluster to run the GDPR upsert job to delete the records. The data was stored in Amazon S3 in Hudi format, and Hudi’s built-in index allowed us to efficiently delete records using bloom filters and file ranges. Because only those files that contained the record keys needed to be read and rewritten, it only took about 1–2 minutes to delete 1,000 records out of the 1 billion records, which had previously taken hours to complete as entire partitions were read.
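The pruning idea behind this speedup can be shown with a toy model: each data file keeps its minimum and maximum record key plus a bloom filter of its keys, and a delete only opens files whose range and filter both admit a requested key. This is a simplified sketch of the technique, not Hudi's implementation; the class names, filter size, and keys are hypothetical.

```python
import hashlib


class BloomFilter:
    """Tiny bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class DataFile:
    def __init__(self, name, keys):
        self.name = name
        self.keys = set(keys)
        self.min_key, self.max_key = min(keys), max(keys)
        self.bloom = BloomFilter()
        for key in keys:
            self.bloom.add(key)


def candidate_files(files, keys_to_delete):
    """Keep only files whose key range and bloom filter admit a requested key."""
    hits = []
    for f in files:
        in_range = [k for k in keys_to_delete if f.min_key <= k <= f.max_key]
        if any(f.bloom.might_contain(k) for k in in_range):
            hits.append(f)
    return hits


def delete_keys(files, keys_to_delete):
    """Rewrite only the candidate files that really hold a requested key."""
    rewritten = []
    for f in candidate_files(files, keys_to_delete):
        remaining = f.keys - set(keys_to_delete)
        if remaining != f.keys:  # guards against bloom filter false positives
            f.keys = remaining
            rewritten.append(f.name)
    return rewritten


files = [
    DataFile("f1", ["k001", "k002", "k003"]),
    DataFile("f2", ["k100", "k101"]),
    DataFile("f3", ["k200", "k250"]),
]
rewritten = delete_keys(files, ["k101"])
print(rewritten)
```

Only `f2` is read and rewritten here; the other files are skipped on the range check alone, which mirrors why a small delete no longer needs to scan entire partitions.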

Overall, our solution enabled efficient deletion of data, which provided an additional layer of data security that was critical for Zoom, in light of its GDPR requirements.

Architecting to optimize scale, performance, and cost

In this section, we share the following strategies Zoom took to optimize scale, performance, and cost:

  • Optimizing ingestion
  • Optimizing throughput and Amazon EMR utilization
  • Decoupling ingestion and GDPR deletion using EMRFS
  • Efficient deletes with Apache Hudi
  • Optimizing for low-latency reads with Apache Hudi
  • Monitoring

Optimizing ingestion

To keep the storage in Kafka lean and optimal, as well as to get a real-time view of data, we created a Spark job to read incoming Kafka messages in batches of 150 million messages and wrote to Amazon S3 in Hudi-compatible format every 5 minutes. Even during the initial stages of the iteration, when we hadn’t started scaling and tuning yet, we were able to successfully load all Kafka messages consistently under 2.5 minutes using the Amazon EMR runtime for Apache Spark.

Optimizing throughput and Amazon EMR utilization

We launched a cost-optimized EMR cluster and switched from uniform instance groups to using EMR instance fleets. We chose instance fleets because we needed the flexibility to use Spot Instances for task nodes and wanted to diversify the risk of running out of capacity for a specific instance type in our Availability Zone.

We started experimenting with test runs by first changing the number of Kafka partitions from 400 to 1,000, and then changing the number of task nodes and instance types. Based on the results of the run, the AWS team came up with the recommendation to use Amazon EMR with three core nodes (r5.16xlarge (64 vCPUs each)) and 18 task nodes using Spot fleet instances (a combination of r5.16xlarge (64 vCPUs), r5.12xlarge (48 vCPUs), r5.8xlarge (32 vCPUs)). These recommendations helped Zoom to reduce their Amazon EMR costs by more than 80% while meeting their desired performance goals of ingesting 150 million Kafka messages under 5 minutes.

Decoupling ingestion and GDPR deletion using EMRFS

A well-known benefit of separation of storage and compute is that you can scale the two independently. But a not-so-obvious advantage is that you can decouple continuous workloads from sporadic workloads. Previously, data was stored in HDFS. Resource-intensive GDPR delete jobs and data movement jobs would compete for resources with the stream ingestion, causing a backlog of more than 5 hours in upstream Kafka clusters, which was close to filling up the Kafka storage (which only had 6 hours of data retention) and potentially causing data loss. Offloading data from HDFS to Amazon S3 allowed us the freedom to launch independent transient EMR clusters on demand to perform data deletion, helping to ensure that the ongoing data ingestion from Kafka into Amazon EMR is not starved for resources. This enabled the system to ingest data every 5 minutes and complete each Spark Streaming read in 2–3 minutes. Another side effect of using EMRFS is a cost-optimized cluster, because we removed reliance on Amazon Elastic Block Store (Amazon EBS) volumes for over 300 TB of storage that was used for three copies (including two replicas) of HDFS data. We now pay for only one copy of the data in Amazon S3, which is designed for 11 nines (99.999999999%) of durability and is relatively inexpensive storage.

Efficient deletes with Apache Hudi

What about the conflict between ingest writes and GDPR deletes when running concurrently? This is where the power of Apache Hudi stands out.

Apache Hudi provides a table format for data lakes with transactional semantics that enables the separation of ingestion workloads and updates when run concurrently. The system was able to consistently delete 1,000 records in less than a minute. There were some limitations in concurrent writes in Apache Hudi 0.7.0, but the Amazon EMR team quickly addressed this by back-porting Apache Hudi 0.8.0, which supports optimistic concurrency control, to the current (at the time of the AWS Data Lab collaboration) Amazon EMR 6.4 release. This allowed for a quick transition to the new version with minimal testing. Storing the data in Amazon S3 in Hudi format also enabled us to query it directly and quickly using Athena, without having to spin up a cluster to run ad hoc queries, as well as to query the data using Presto, Trino, and Hive. The decoupling of the storage and compute layers provided the flexibility to not only query data across different EMR clusters, but also delete data using a completely independent transient cluster.
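Optimistic concurrency control lets writers proceed without holding locks while they work, then checks at commit time whether another writer changed the same files in the meantime. The toy model below illustrates that check only; it is not Hudi's code, and the `Timeline` class, writer names, and file paths are hypothetical.

```python
class ConflictError(Exception):
    """Raised when two writers changed the same file concurrently."""


class Timeline:
    """Toy commit log; a real table format also guards this check with a lock."""

    def __init__(self):
        self.commits = []  # list of (writer_name, frozenset_of_files)

    def begin(self):
        # Remember how many commits existed when the writer started.
        return len(self.commits)

    def try_commit(self, writer, started_at, files_touched):
        files_touched = frozenset(files_touched)
        # Compare only against commits that landed after this writer began.
        for other_writer, other_files in self.commits[started_at:]:
            overlap = files_touched & other_files
            if overlap:
                raise ConflictError(
                    f"{writer} conflicts with {other_writer} on {sorted(overlap)}"
                )
        self.commits.append((writer, files_touched))


timeline = Timeline()
ingest_start = timeline.begin()
delete_start = timeline.begin()
late_delete_start = timeline.begin()

# Ingestion adds a new file; the delete job rewrites an older one: no overlap.
timeline.try_commit("ingest", ingest_start, {"dt=2023-01-05/new-1"})
timeline.try_commit("gdpr-delete", delete_start, {"dt=2023-01-01/file-7"})

# A second delete that started from the same snapshot and rewrites file-7 must retry.
try:
    timeline.try_commit("gdpr-delete-2", late_delete_start, {"dt=2023-01-01/file-7"})
    conflict_detected = False
except ConflictError as err:
    conflict_detected = True
    print(err)
```

The streaming ingest and the delete job commit independently because they touch different files; only the overlapping writer is rejected and has to retry.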

Optimizing for low-latency reads with Apache Hudi

To optimize for low-latency reads with Apache Hudi, we needed to address the issue of too many small files being created within Amazon S3 due to the continuous streaming of data into the data lake.

We utilized Apache Hudi’s features to tune file sizes for optimal querying. Specifically, we reduced the degree of parallelism in Hudi from the default value of 1,500 to a lower number. Parallelism here refers to the number of parallel tasks used to write data to Hudi; by reducing it, we were able to create larger files that were more optimal for querying.
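The link between write parallelism and file size can be illustrated with simple arithmetic. The model below is deliberately crude and rests on assumptions not stated in the post: each write task produces one file per batch, and a batch holds about 350 GB (roughly 100 TB per day spread over 288 five-minute batches, ignoring compression and table partitioning). Real file sizes will differ.

```python
BATCH_BYTES = 350 * 10**9   # assumed data volume per 5-minute batch


def avg_file_bytes(parallelism, batch_bytes=BATCH_BYTES):
    """Average file size if every write task emits exactly one file."""
    return batch_bytes / parallelism


for parallelism in (1500, 1000):
    size_mb = avg_file_bytes(parallelism) / 10**6
    print(f"parallelism={parallelism}: about {size_mb:.0f} MB per file")
```

Under this model, lowering parallelism from 1,500 to 1,000 raises the average file size by half, which is the direction of the effect described above.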

Because we needed to optimize for high-volume streaming ingestion, we chose to implement the merge on read table type (instead of copy on write) for our workload. This table type allowed us to quickly ingest the incoming data into delta files in row format (Avro) and asynchronously compact the delta files into columnar Parquet files for fast reads. To do this, we ran the Hudi compaction job in the background. Compaction is the process of merging row-based delta files to produce new versions of columnar files. Because the compaction job would use additional compute resources, we adjusted the degree of parallelism for insertion to a lower value of 1,000 to account for the additional resource usage. This adjustment allowed us to create larger files without sacrificing performance throughput.
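The choices described above map onto Hudi write options. The dictionary below is a hedged sketch rather than Zoom's actual configuration: the table name and field names are hypothetical, and the option keys should be checked against the Hudi configuration reference for the version in use.

```python
# Illustrative Hudi write options; values marked hypothetical are not from the post.
hudi_options = {
    "hoodie.table.name": "app_logs",                                   # hypothetical
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",             # instead of copy on write
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "record_key",           # hypothetical field
    "hoodie.datasource.write.partitionpath.field": "partition_date",   # hypothetical field
    "hoodie.datasource.write.precombine.field": "ts",                  # hypothetical field
    # Lowered from the default of 1,500 mentioned in the post.
    "hoodie.insert.shuffle.parallelism": "1000",
    "hoodie.upsert.shuffle.parallelism": "1000",
    # Compact delta files in the background rather than inline with each write.
    "hoodie.compact.inline": "false",
    "hoodie.datasource.compaction.async.enable": "true",
}

for key, value in sorted(hudi_options.items()):
    print(f"{key}={value}")
```

In a Spark job, a dictionary like this would typically be passed to the writer with `.format("hudi").options(**hudi_options)`.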

Overall, our approach to optimizing for low-latency reads with Apache Hudi allowed us to better manage file sizes and improve the overall performance of our data lake.

Monitoring

The team monitored MSK clusters with Prometheus (an open-source monitoring tool). Additionally, we showcased how to monitor Spark streaming jobs using Amazon CloudWatch metrics. For more information, refer to Monitor Spark streaming applications on Amazon EMR.

Outcomes

The collaboration between Zoom and the AWS Data Lab demonstrated significant improvements in data ingestion, processing, storage, and deletion using an architecture with Amazon EMR and Apache Hudi. One key benefit of the architecture was a reduction in infrastructure costs, which was achieved through the use of cloud-native technologies and the efficient management of data storage. Another benefit was an improvement in data management capabilities.

We showed that the costs of EMR clusters can be reduced by about 82% while bringing the storage costs down by about 90% compared to the prior HDFS-based architecture. All of this while making the data available in the data lake within 5 minutes of ingestion from the source. We also demonstrated that data deletions from a data lake containing multiple petabytes of data can be performed much more efficiently. With our optimized approach, we were able to delete approximately 1,000 records in just 1–2 minutes, as compared to the previously required 3 hours or more.

Conclusion

In conclusion, the log analytics process, which involves collecting, processing, storing, analyzing, and deleting log data from various sources such as servers, applications, and devices, is critical to aid organizations in working to meet their service resiliency, security, performance monitoring, troubleshooting, and compliance needs, such as GDPR.

This post shared what Zoom and the AWS Data Lab team have accomplished together to solve critical data pipeline challenges, and Zoom has extended the solution further to optimize extract, transform, and load (ETL) jobs and resource efficiency. However, you can also use the architecture patterns presented here to quickly build cost-effective and scalable solutions for other use cases. Please reach out to your AWS team for more information or contact Sales.


About the Authors

Sekar Srinivasan is a Sr. Specialist Solutions Architect at AWS focused on Big Data and Analytics. Sekar has over 20 years of experience working with data. He is passionate about helping customers build scalable solutions, modernize their architecture, and generate insights from their data. In his spare time he likes to work on non-profit projects focused on underprivileged children’s education.

Chandra Dhandapani is a Senior Solutions Architect at AWS, where he specializes in creating solutions for customers in Analytics, AI/ML, and Databases. He has a lot of experience in building and scaling applications across different industries including Healthcare and Fintech. Outside of work, he is an avid traveler and enjoys sports, reading, and entertainment.

Amit Kumar Agrawal is a Senior Solutions Architect at AWS, based in the San Francisco Bay Area. He works with large strategic ISV customers to architect cloud solutions that address their business challenges. During his free time he enjoys exploring the outdoors with his family.

Viral Shah is an Analytics Sales Specialist who has worked with AWS for 5 years, helping customers to be successful in their data journey. He has over 20 years of experience working with enterprise customers and startups, primarily in the data and database space. He loves to travel and spend quality time with his family.

Improve power utility operational efficiency using smart sensor data and Amazon QuickSight

Post Syndicated from Bin Qiu original https://aws.amazon.com/blogs/big-data/improve-power-utility-operational-efficiency-using-smart-sensor-data-and-amazon-quicksight/

This blog post is co-written with Steve Alexander at PG&E.

In today’s rapidly changing energy landscape, power disturbances cost businesses millions of dollars due to service interruptions and power quality issues. Large utility territories make it difficult to detect and locate faults when power outages occur, leading to longer restoration times, recurring outages, and unhappy customers. Although it’s complex and expensive to modernize distribution networks, many utilities choose to invest their capital in smart sensor technologies. These smart sensors are installed in selected locations on distribution networks to monitor various disturbances, such as momentary and permanent outages, line disturbances, and voltage sags and surges. The sensors provide analysts with fault waveforms and alerts in addition to graphical representation of regular loads. Different communication infrastructure types, such as mesh network and cellular, can be used to send load information on a pre-defined schedule or event data in real time to the backend servers residing in the Utility Data Network (UDN).

In this series of posts, we walk you through how we use Amazon QuickSight, a serverless, fully managed, business intelligence (BI) service that enables data-driven decision making at scale. QuickSight meets varying analytics needs with modern interactive dashboards, paginated reports, natural language queries, ML-insights, and embedded analytics, from one unified service.

In this first post of the series, we show you how data collected from smart sensors is used for building automated dashboards using QuickSight to help distribution network engineers manage, maintain and troubleshoot smart sensors and perform advanced analytics to support business decision making.

Current challenges in power utility operations

To have a comprehensive monitoring coverage of the distribution networks, utilities normally deploy hundreds, if not thousands, of smart sensors. Similar to any other equipment or device, smart sensors could encounter different issues, such as having defective parts, wearing out over time, becoming obsolete due to technological advances, or suffering loss of communication due to power outages or low cellular signal coverage. Managing such a large number of devices can be challenging.

Furthermore, depending on the use case, utilities normally apply sensor technologies from different vendors. Solutions from different vendors can vary in areas such as data protocols, formats, native connectors, and communication media, which further increases the complexity of managing these smart sensors.

To effectively solve smart sensor management issues and improve operational efficiency, distribution engineers need a BI application that is simple to use and has a powerful data processing and analytics engine. QuickSight provides an ideal solution to meet these business needs.

Solution overview

The following highly simplified architectural diagram illustrates the smart sensor data collection and processing. Smart sensors send data via cellular communication based on a predefined schedule or triggered by real-time events. Data collection and processing are handled by a third-party smart sensor manufacturer application residing in Amazon Virtual Private Cloud (Amazon VPC) private subnets behind a Network Load Balancer. Amazon Kinesis Data Streams interacts with the third-party application through a native connection and conducts necessary data transformation in real time, and Amazon Kinesis Data Firehose stores the data in Amazon Simple Storage Service (Amazon S3) buckets. The AWS Glue Data Catalog contains the table definitions for the smart sensor data sources stored in the S3 buckets. Amazon Athena runs queries using a variety of SQL statements on data stored in Amazon S3, and QuickSight is used for business intelligence and data visualization.

After the smart sensor’s data is collected and stored in Amazon S3 and is accessible via Athena, we can focus on building the following QuickSight dashboards for distribution network engineers:

  • Sensor status dashboard – Analyze and monitor the status of smart sensors
  • Distribution network events dashboard – Analyze the operational information of the distribution networks

Prerequisites

This solution requires an active AWS account with the permission to create and modify AWS Identity and Access Management (IAM) roles along with the following services enabled:

  • Athena
  • AWS Glue
  • Kinesis Data Firehose
  • Kinesis Data Streams
  • Network Load Balancer
  • QuickSight
  • Amazon S3
  • Amazon VPC

Additionally, data collection and data processing are functional blocks of the third-party smart sensor manufacturer application. The smart sensor application solution must be already deployed in the same AWS account and Region that you will use for the dashboards.

This solution uses QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine) storage to improve dashboard performance.

Sensor status dashboard

When hundreds or thousands of line sensors are installed, it’s critical for distribution engineers to understand the status of all smart sensors on a regular basis and fix issues to ensure smart sensors provide real-time information for operator decision-making. Assuming a utility has 5,000 smart sensors installed, even if only 1% of the sensors have communication issues (a realistic scenario based on utility experience), distribution engineers need to check and troubleshoot 50 sensors per day on average. The smart sensor communication losses could be caused by low cellular signal strength, low power supply, or planned or unplanned outages. If it takes 10 minutes to analyze one sensor, it will cost the engineering team around 500 minutes per day just to analyze the questionable smart sensors.
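The workload estimate above is simple arithmetic, and it can be handy to rerun it with your own fleet size. The following is a minimal sketch; the function name and parameters are our own illustration and not part of any AWS or vendor tooling.

```python
def daily_triage_minutes(total_sensors, issue_rate, minutes_per_sensor):
    """Return (sensors needing attention per day, total analysis minutes per day)."""
    # Number of sensors expected to show communication issues on a given day.
    flagged = round(total_sensors * issue_rate)
    return flagged, flagged * minutes_per_sensor


# Figures from the scenario in the text: 5,000 sensors, 1% with issues, 10 minutes each.
flagged, minutes = daily_triage_minutes(5000, 0.01, 10)
print(f"{flagged} sensors, {minutes} minutes (about {minutes / 60:.1f} hours) per day")
```

With the figures from the text, this works out to 50 sensors and 500 minutes, or a little over 8 hours of engineering time per day.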

Rather than checking smart sensor information from different applications or systems to find answers, a sensor status dashboard solves this problem by aggregating status statistics across all sensors by different attributes, including sensor location, communication status, and distributions in different regions, substations, and circuits.
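To make the aggregation idea concrete, here is a minimal sketch of the kind of roll-up the dashboard performs, written in plain Python. The record layout, field names, and status labels are hypothetical and chosen only for illustration; in the actual solution, QuickSight computes these aggregates over the Athena dataset.

```python
from collections import Counter, defaultdict

# Hypothetical sample records; the field names are illustrative, not the vendor schema.
sensors = [
    {"sensor_id": "S-001", "substation": "Sub-A", "circuit": "C-1", "status": "healthy"},
    {"sensor_id": "S-002", "substation": "Sub-A", "circuit": "C-1", "status": "healthy"},
    {"sensor_id": "S-003", "substation": "Sub-A", "circuit": "C-1", "status": "comm_issue"},
    {"sensor_id": "S-004", "substation": "Sub-B", "circuit": "C-2", "status": "healthy"},
    {"sensor_id": "S-005", "substation": "Sub-B", "circuit": "C-2", "status": "comm_issue"},
]


def summarize_status(records):
    """Count sensors by status for each (substation, circuit) pair."""
    summary = defaultdict(Counter)
    for record in records:
        key = (record["substation"], record["circuit"])
        summary[key][record["status"]] += 1
    return {key: dict(counts) for key, counts in summary.items()}


def overall_status(records):
    """Count sensors by status across the whole fleet."""
    return dict(Counter(record["status"] for record in records))


for (substation, circuit), counts in summarize_status(sensors).items():
    print(substation, circuit, counts)
print("Fleet:", overall_status(sensors))
```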

In the following sensor status dashboard, a hypothetical utility has 102 smart sensors (each location needs three sensors for phases A, B, and C) deployed in five substations and six circuits. During normal operations, smart sensors report load data every 5–15 minutes, and the event data (different fault events) could come at any time depending on the circuit situation.

Multiple panes are designed to help distribution engineers answer critical questions on smart sensors and facilitate troubleshooting in case communication issues happen to smart sensors:

  • Summary – The top summary pane provides a quick glance of the smart sensor statistics, such as number of substations, circuits, smart sensors with good communications, or smart sensors that have communication issues.
  • Smart Sensor Status By Location – This pane shows the geographical distribution of all the smart sensors. Different colors are used to indicate smart sensor operational status. In this case, four of the sensors have communication issues, which are shown in red on the map. The operator can identify the questionable sensors, zoom in, and determine the actual location of these sensors. When operators select the questionable smart sensors, the geo-map can automatically focus on these smart sensors as well.
  • Sensor Status By Substation and Circuit – This pane gives operators a glance of smart sensors by substation and circuit, such as number of healthy smart sensors and number of sensors with communication issues.
  • Unhealthy Sensor Details – This pane provides information about questionable smart sensor data.
  • Cellular Communication Signal Strength Distribution – Smart sensors transmit data to the cloud using cellular communication. If the signal strength drops to the range of -100 dBm to -109 dBm or below (considered a poor signal of 1 to 2 bars), the signal might be too weak for the sensor to transmit data. Distribution lines also provide power to the smart sensors, so if the line current is lower than 5–10 amps, the sensor may not have enough power to transmit data either. Therefore, cellular signal strength and circuit load provide critical information for operators to narrow down the potential root causes of smart sensor communication loss. The Cellular Communication Signal Strength Distribution pane provides this information. Red dots represent smart sensors with either very low signal strength or very low circuit load, orange dots show moderate signal strength and circuit load, and green dots are the sensors with both strong signal strength and large circuit load.
  • Smart Sensor Health Status Trend – Although real-time information is important to understand the smart sensors’ status live, it’s critical to learn the health trend of smart sensors as well. The Smart Sensor Health Status Trend pane provides a pattern showing whether the overall operations of the smart sensor are better or worse by week or day. Operators can choose the time range, substation, or circuit to learn more granular information.
  • Sensor Distribution by Substation and Sensor Distribution by Circuit – These panes help the operator learn the smart sensor deployment distribution information.
  • Smart Sensor List – This sensor detail pane provides comprehensive information of the smart sensors in a tabular view in case the operator wants to search or sort sensors by detail information.
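The red, orange, and green coloring in the Cellular Communication Signal Strength Distribution pane can be thought of as a simple rule over two measurements. The following is a minimal sketch under stated assumptions: the -100 dBm and 5–10 amp figures come from the description above, while the -85 dBm "strong signal" cutoff, the exact boundary handling, and the function name are our own illustrative choices, not values from the dashboard.

```python
POOR_SIGNAL_DBM = -100.0    # from the text: below roughly -100 dBm the signal is poor
STRONG_SIGNAL_DBM = -85.0   # assumption: cutoff for a "strong" signal, for illustration only
LOW_CURRENT_AMPS = 5.0      # from the text: lower end of the 5-10 amp range
AMPLE_CURRENT_AMPS = 10.0   # from the text: upper end of the 5-10 amp range


def classify_sensor(signal_dbm, line_current_amps):
    """Return 'red', 'orange', or 'green' for one sensor reading."""
    # Red: either measurement alone is low enough to explain a communication loss.
    if signal_dbm < POOR_SIGNAL_DBM or line_current_amps < LOW_CURRENT_AMPS:
        return "red"
    # Green: strong signal and ample line current at the same time.
    if signal_dbm >= STRONG_SIGNAL_DBM and line_current_amps >= AMPLE_CURRENT_AMPS:
        return "green"
    # Everything in between is treated as moderate.
    return "orange"


readings = [(-105.0, 20.0), (-70.0, 3.0), (-92.0, 8.0), (-70.0, 25.0)]
for signal, current in readings:
    print(signal, current, classify_sensor(signal, current))
```

A rule like this could also be expressed as a calculated field on the dataset; the point here is only that a red dot indicates either a weak signal or a low circuit load, which tells the operator where to look first.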

With aggregated smart sensor data (geo location, cellular signal strength, distributed circuit power flow), operators can quickly identify problematic sensors and narrow down the possible root causes. This approach can significantly reduce the time spent on sensor maintenance and troubleshooting, potentially by 90% or more.

In future posts in this series, we’ll show you how to use the paginated reports function to generate daily reports to improve the operational efficiency even more. The communication pane also shows the smart sensor distribution using a bar chart, and provides insights of smart sensor deployment information based on region, division, substation, and circuit.

Distribution network events dashboard

Smart sensors measure and provide the operational information of the distribution networks. This information is critical for operators to understand the circuit running status and the distribution of different events, such as permanent outages, momentary outages, line disturbance, or voltage sags and swells. QuickSight helps operators quickly configure different views, insights, and calculations on smart sensor information.

When an operator specifies a time range, QuickSight is able to provide smart sensor statistics on various metrics, such as the following:

  • Total number of events compared to a previous time frame
  • Distribution of events across selected regions, substations, or circuits
  • Distribution of events by region, substation, or circuit
  • Distribution of events by event type such as permanent or momentary faults

This information can help operators determine the areas or fault types of interest and study more detailed information. It can also help operators identify the substations or circuits with the most events and take proactive actions to fix any existing or hidden issues. The trend information can also be used to validate the equipment repair or circuit enhancement works.
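The statistics listed above can be sketched in a few lines of Python to show what is being computed for a selected time range. The event records, field names, and event type labels below are hypothetical; in the dashboard itself, QuickSight derives these figures from the Athena-backed dataset.

```python
from collections import Counter
from datetime import datetime


def event_stats(events, start, end):
    """Summarize events in [start, end) and compare with the preceding window of equal length."""
    previous_start = start - (end - start)
    current = [e for e in events if start <= e["time"] < end]
    previous = [e for e in events if previous_start <= e["time"] < start]
    return {
        "total": len(current),
        "previous_total": len(previous),
        "change": len(current) - len(previous),
        "by_type": dict(Counter(e["type"] for e in current)),
        "by_circuit": dict(Counter(e["circuit"] for e in current)),
    }


# Hypothetical sample events for illustration only.
events = [
    {"time": datetime(2023, 5, 1, 8, 0), "type": "momentary_outage", "circuit": "C-1"},
    {"time": datetime(2023, 5, 2, 9, 30), "type": "permanent_outage", "circuit": "C-2"},
    {"time": datetime(2023, 5, 9, 14, 15), "type": "momentary_outage", "circuit": "C-1"},
    {"time": datetime(2023, 5, 10, 3, 45), "type": "voltage_sag", "circuit": "C-1"},
    {"time": datetime(2023, 5, 12, 22, 5), "type": "momentary_outage", "circuit": "C-3"},
]

stats = event_stats(events, datetime(2023, 5, 8), datetime(2023, 5, 15))
print(stats)
```

Comparing against the preceding window of equal length is one reasonable reading of "compared to a previous time frame"; other baselines, such as the same week in the prior year, would work the same way.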

Conclusion

Many utilities today are experiencing increased integration of distributed energy resources (DERs), such as solar photovoltaic, and power electronics loads such as variable speed drive and electric vehicle battery chargers. However, the existing grid wasn’t originally designed to coordinate these DERs, which can cause hidden issues on the existing networks. A large number of smart sensors are widely used to monitor the distribution networks to improve grid resiliency and stability.

In this post, we showed how QuickSight can help power utility distribution network engineers or operators visualize smart sensor status in real time and troubleshoot smart sensor issues. We discussed out-of-the-box QuickSight features such as its rich suite of visualizations, analytical functions and calculations, in-memory data engine, and scalability, which can greatly reduce the time, cost, and effort of managing a large number of smart sensors and fixing problems early.

Smart sensors are the eyes and ears of utility distribution networks. With QuickSight BI functions, operators can quickly and easily create circuit event dashboards; search, sort, filter, and analyze different mission-critical events; and help engineers take early action when certain abnormalities occur on the distribution networks.

In the following posts in this series, we’ll show you how to use QuickSight to generate daily paginated reports and use advanced features such as natural language processing to conduct advanced search and analytics functions.


About the Authors

Bin Qiu is a Global Partner Solutions Architect focusing on ER&I at AWS. He has more than 20 years’ experience in the energy and power industries, designing, leading, and building different smart grid projects, such as distributed energy resources, microgrid, AI/ML implementation for resource optimization, IoT smart sensor application for equipment predictive maintenance, EV car and grid integration, and more. Bin is passionate about helping utilities achieve digital and sustainability transformations.

Steve Alexander is a Senior Manager, IT Products at PG&E. He leads product teams building wildfire prevention and risk mitigation data products. Recent work has been focused on integrating data from various sources including weather, asset data, sensors, and dynamic protective devices to improve situational awareness and decision-making. Steve has over 20 years of experience with data systems and cutting-edge IT research and development, and is passionate about applying creative thinking in technical domains.

Karthik Tharmarajan is a Senior Specialist Solutions Architect for Amazon QuickSight. Karthik has over 15 years of experience implementing enterprise business intelligence (BI) solutions and specializes in integration of BI solutions with business applications and enabling data-driven decisions.

Ranjan Banerji is a Principal Partner Solutions Architect at AWS focused on the power and utilities vertical. Ranjan has been at AWS for 5 years, first on the Department of Defense (DoD) team, helping the branches of the DoD migrate and build new systems on AWS while meeting security and compliance requirements, and now supporting the power and utilities team. Ranjan’s expertise ranges from serverless architecture to security and compliance for regulated industries. Ranjan has over 25 years of experience building and designing systems for the DoD, federal agencies, and the energy and financial industries.
