Tag Archives: апи

Practical API Design at Netflix, Part 2: Protobuf FieldMask for Mutation Operations

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/practical-api-design-at-netflix-part-2-protobuf-fieldmask-for-mutation-operations-2e75e1d230e4

By Ricky Gardiner, Alex Borysov

Background

In our previous post, we discussed how we utilize FieldMask as a solution when designing our APIs so that consumers can request the data they need when fetched via gRPC. In this blog post we will continue to cover how Netflix Studio Engineering uses FieldMask for mutation operations such as update and remove.

Example: Netflix Studio Production

Money Heist (La casa de papel) / Netflix

Previously we outlined what a Production is and how the Production Service makes gRPC calls to other microservices such as the Schedule Service and Script Service to retrieve schedules and scripts (aka screenplay) for a particular production such as La Casa De Papel. We can take that model and showcase how we can mutate particular fields on a production.

Mutating Production Details

Let’s say we want to update the format field from LIVE_ACTION to HYBRID as our production has added some animated elements. A naive way for us to solve this is to add an updateProductionFormatRequest method and gRPC endpoint just to update the productionFormat:

This allows us to update the production format for a particular production but what if we then want to update other fields such as titleor even multiple fields such as productionFormat, schedule, etc? Building on top of this we could just implement an update method for every field: one for Production format, another for title and so on:

This can become unmanageable when maintaining our APIs due to the number of fields on the Production. What if we want to update more than one field and do it atomically in a single RPC? Creating additional methods for various combinations of fields will lead to an explosion of mutation APIs. This solution is not scalable.

Instead of trying to create every single combination possible, another solution could be to have an UpdateProduction endpoint that requires all fields from the consumer:

The issue with this solution is two-fold as the consumer must know and provide every single required field in a Production even if they just want to update one field such as the format. The other issue is that since a Production has many fields the request payload can become quite large particularly if the production has schedule or scripts information.

What if, instead of all the fields, we send only the fields we actually want to update, and leave all other fields unset? In our example, we would only set the production format field (and ID to reference the production):

This could work if we never need to remove or blank out any fields. But what if we want to remove the value of the title field? Again, we can introduce one-off methods like RemoveProductionTitle, but as discussed above, this solution does not scale well. What if we want to remove a value of a nested field such as the planned launch date field from the schedule? We would end up adding remove RPCs for every individual nullable sub-field.

Utilizing FieldMask for Mutations

Instead of numerous RPCs or requiring a large payload, we can utilize a FieldMask for all our mutations. The FieldMask will list all of the fields we would like to explicitly update. First, let’s update our proto file to add in the UpdateProductionRequest, which will contain the data we want to update from a production, and a FieldMask of what should be updated:

Now, we can use a FieldMask to make mutations. We can update the format by creating a FieldMask for the format field by using the FieldMaskUtil.fromStringList() utility method which constructs a FieldMask for a list of field paths in a certain type. In this case, we will have one type, but will build upon this example later:

Since our FieldMask only specifies the format field that will be the only field that is updated even if we provide more data in ProductionUpdateOperation. It becomes easier to add or remove more fields to our FieldMask by modifying the paths. Data that is provided in the payload but not added in a path of a FieldMask will not be updated and simply ignored in the operation. But, if we omit a value it will perform a remove mutation on that field. Let’s modify our example above to showcase this and update the format but remove the planned launch date, which is a nested field on the ProductionSchedule as “schedule.planned_launch_date”:

In this example, we are performing both update and remove mutations as we have added “format” and “schedule.planned_launch_date” paths to our FieldMask. When we provide this in our payload these fields will be updated to the new values, but when building our payload we are only providing the format and omitting the schedule.planned_launch_date. Omitting this from the payload but having it defined in our FieldMask will function as a remove mutation:

Empty / Missing Field Mask

When a field mask is unset or has no paths, the update operation applies to all the payload fields. This means the caller must send the whole payload or, as mentioned above, any unset fields will be removed.

This convention has an implication on schema evolution: when a new field is added to the message, all the consumers must start sending its value on the update operation or it will get removed.

Suppose we want to add a new field: production budget. We will extend both the Production message, and ProductionUpdateOperation:

If there is a consumer that doesn’t know about this new field or hasn’t updated client stubs yet, it can accidentally null the budget field out by not sending the FieldMask in the update request.

To avoid this issue, the producer should consider requiring the field mask for all the update operations. Another option would be to implement a versioning protocol: force all callers to send their version numbers and implement custom logic to skip fields not present in the old version.

Bella Ciao

In this blog post series, we have gone over how we use FieldMask at Netflix and how it can be a practical and scalable solution when designing your APIs.

API designers should aim for simplicity, but make their APIs open for extension and evolution. It’s often not easy to keep APIs simple and future-proof. Utilizing FieldMask in APIs helps us achieve both simplicity and flexibility.


Practical API Design at Netflix, Part 2: Protobuf FieldMask for Mutation Operations was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Maintaining Zabbix API token via JavaScript

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/maintaining-zabbix-api-token-via-javascript/15561/

In this blog post, we will talk about maintaining and storing the Zabbix API session key in an automated fashion. The blog post builds upon the Close problem automatically via Zabbix API subject and can be used as extra configuration for this particular use-case. The blog post also shares a great example of synthetic monitoring by way of JavaScript preprocessing – how to emulate a scenario in an automated fashion and get alerted in case of any problems.

Prerequisites

First, let us create the Zabbix API user and user macros where we will store our username, password, Zabbix URL and the API session key.

1) Open “Administration” => “Users”. Create a new user ‘api’ with password ‘zabbix’. At the permissions tab set User Type “Zabbix Super Admin”.

2) Go to “Administration” => “General” => “Macros”. Configure base characteristics:

      {$Z_API_PHP} = http://demo.zabbix.demo/api_jsonrpc.php
     {$Z_API_USER} = api
 {$Z_API_PASSWORD} = zabbix
{$Z_API_SESSIONID} =

It’s OK to leave {$Z_API_SESSIONID} empty for now.

3) Let’s check if the Zabbix backend server can reach the Zabbix frontend server. Make sure that you are logged into the Zabbix backend server by looking up the zabbix_server process:

ps auxww | grep "[z]abbix_server.*conf"

Ensure that we can reach the Zabbix frontend by curling the Zabbix frontend server from the Zabbix backend server:

curl -s "http://demo.zabbix.demo/index.php" | grep Zabbix

4) Download template “Check and repair Zabbix API token” and import it in your instance.

5) Create a new host with an Agent interface and link the template. The IP address of the host does not matter. The template will use an agentless check to do the monitoring, it will use an “HTTP agent” item.

How it works

Our goal for today is to figure out a way to keep the Zabbix API authentication token up to date in a user macro. This way we can reuse the macro repeatedly for items, action operations and scripts that require for us to use the Zabbix API. We need to ensure that even if the token changes, the macro gets automatically updated with the new token value! Let’s try and understand each step of the underlying workflow required for us to achieve this goal.

The first component of our workflow is the “Validate session key raw” item. This is an HTTP agent item that performs a POST request with an arbitrary method – proxy.get in this case, but we could have used ANY other method. We simply want to check if an arbitrary Zabbix API call can be executed with the current {$Z_API_SESSIONID} macro value.

The second part of the workflow is the “Repair session key” dependent item. This item utilizes the JavaScript preprocessing step with custom JavaScript code to check the values obtained by the previous item and generate a new authentication token if that is necessary.

The third item – “Status”, is another dependent item that uses regular expression preprocessing steps to check for different error messages or status codes in the value of the “Validate session key raw” item. Most of the triggers defined in this template will react to the values obtained by this item.

 

Below you can see the full underlying workflow:

Code-wise, the magic is implemented with the following JavaScript code snippet:

if (value.match(/Session terminated/)) {

var req = new CurlHttpRequest();

// Zabbix API
var json_rpc='{$Z_API_PHP}'

// lib curl header
req.AddHeader('Content-Type: application/json');

// First request to get authentication token
var token =  JSON.parse(req.Post(json_rpc,
'{"jsonrpc":"2.0","method":"user.login","params":{"user":"{$Z_API_USER}","password":"{$Z_API_PASSWORD}"},"id":1,"auth":null}'
));

// If authentication was unsuccessful
if ( token.error )
{
// Login name or password is incorrect
return 32500;
}

else {
// Update the macro

// Get the global macro ID
// We cannot plot here a very native Zabbix macro because it will be automatically expanded
// Must use a workaround to distinguish a dollar sign from the actual macro name and merge with '+'
var id = JSON.parse(req.Post(json_rpc,
'{"jsonrpc":"2.0","method":"usermacro.get","params":{"output":["globalmacroid"],"globalmacro":true,"filter":{"macro":"{$'+'Z_API_SESSIONID'+'}"}},"auth":"'+token.result+'","id":1}'
)).result[0].globalmacroid;

// This line contains a keyword '+value+' which will grab exactly the previous value outside this JavaScript snippet
var overwrite = JSON.parse(req.Post(json_rpc,
'{"jsonrpc":"2.0","method":"usermacro.updateglobal","params":{"globalmacroid":"'+id+'","value":"'+token.result+'"},"auth":"'+token.result+'","id":1}'
));

// Return the id (an integer) of the macro, which was updated
return overwrite.result.globalmacroids[0];
}

} else {
return 0;
}

Throughout the JavaScript code, we are extracting a value from one step and using that as an input for the next step.

After the data has been collected, the values are analyzed by 4 triggers. 3 out of these 4 triggers prints a misconfiguration problem that requires a human investigation. We also have a “repairing Zabbix API session key in background” title, which is the main trigger that indicates a token has been expired and repair automatically.

And that’s it – now any integration that requires a Zabbix API authentication token can receive the current token value by referencing the user macro that we created in the article. We’ve ensured that the token is always going to stay up to date and in case of any issues, we will receive an alert from Zabbix!

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/practical-api-design-at-netflix-part-1-using-protobuf-fieldmask-35cfdc606518

By Alex Borysov, Ricky Gardiner

Background

At Netflix, we heavily use gRPC for the purpose of backend to backend communication. When we process a request it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, some fields can require remote calls to other services. Remote calls are never free; they impose extra latency, increase probability of an error, and consume network bandwidth. How can we understand which fields the caller doesn’t need to be supplied in the response, so we can avoid making unnecessary computations and remove calls? With GraphQL this comes out of the box through the use of field selectors. In the JSON:API standard a similar technique is known as Sparse Fieldsets. How can we achieve a similar functionality when designing our gRPC APIs? The solution we use within the Netflix Studio Engineering is protobuf FieldMask.

Money Heist (La casa de papel) / Netflix

Protobuf FieldMask

Protocol Buffers, or simply protobuf, is a data serialization mechanism. By default, gRPC uses protobuf as its IDL (interface definition language) and data serialization protocol.

FieldMask is a protobuf message. There are a number of utilities and conventions on how to use this message when it is present in an RPC request. A FieldMask message contains a single field named paths, which is used to specify fields that should be returned by a read operation or modified by an update operation.

Example: Netflix Studio Production

Money Heist (La casa de papel) / Netflix

Let’s assume there is a Production service that manages Studio Content Productions (in the film and TV industry, the term production refers to the process of making a movie, not the environment to run a software).

GetProduction returns a Production message by its unique ID. A production contains multiple fields such as: title, format, schedule dates, scripts aka screenplay, budgets, episodes, etc, but let’s keep this example simple and focus on filtering out schedule dates and scripts when requesting a production.

Reading Production Details

Let’s say we want to get production information for a particular production such as “La Casa De Papel” using the GetProduction API. While a production has many fields, some of these fields are returned from other services such as schedule from the Schedule service, or scripts from the Script service.

The Production service will make RPCs to Schedule and Script services every time GetProduction is called, even if clients ignore the schedule and scripts fields in the response. As mentioned above, remote calls are not free. If the service knows which fields are important for the caller, it can make an informed decision about making expensive calls, starting resource-heavy computations, and/or calling the database. In this example, if the caller only needs production title and production format, the Production service can avoid making remote calls to Schedule and Script services.

Additionally, requesting a large number of fields can make the response payload massive. This can become an issue for some applications, for example, on mobile devices with limited network bandwidth. In these cases it is a good practice for consumers to request only the fields they need.

Money Heist (La casa de papel) / Netflix

A naïve way of solving these problems can be adding additional request parameters, such as includeSchedule and includeScripts:

This approach requires adding a custom includeXXX field for every expensive response field and doesn’t work well for nested fields. It also increases the complexity of the request, ultimately making maintenance and support more challenging.

Add FieldMask to the Request Message

Instead of creating one-off “include” fields, API designers can add field_mask field to the request message:

Consumers can set paths for the fields they expect to receive in the response. If a consumer is only interested in production titles and format, they can set a FieldMask with paths “title” and “format”:

Masking fields

Please note, even though code samples in this blog post are written in Java, demonstrated concepts apply to any other language supported by protocol buffers.

If consumers only need a title and an email of the last person who updated the schedule, they can set a different field mask:

By convention, if a FieldMask is not present in the request, all fields should be returned.

Protobuf Field Names vs Field Numbers

You might notice that paths in the FieldMask are specified using field names, whereas on the wire, encoded protocol buffers messages contain only field numbers, not field names. This (alongside some other techniques like ZigZag encoding for signed types) makes protobuf messages space-efficient.

To understand the difference between field numbers and field names, let’s take a detailed look at how protobuf encodes and decodes messages.

Our protobuf message definition (.proto file) contains Production message with five fields. Every field has a type, name, and number.

When the protobuf compiler (protoc) compiles this message definition, it creates the code in the language of your choice (Java in our example). This generated code contains classes for defined messages, together with message and field descriptors. Descriptors contain all the information needed to encode and decode a message into its binary format. For example, they contain field numbers, names, types. Message producer uses descriptors to convert a message to its wire format. For efficiency, the binary message contains only field number-value pairs. Field names are not included. When a consumer receives the message, it decodes the byte stream into an object (for example, Java object) by referencing the compiled message definitions.

As mentioned above, FieldMask lists field names, not numbers. Here at Netflix we are using field numbers and convert them to field names using FieldMaskUtil.fromFieldNumbers() utility method. This method utilizes the compiled message definitions to convert field numbers to field names and creates a FieldMask.

However, there is an easy-to-overlook limitation: using FieldMask can limit your ability to rename message fields. Renaming a message field is generally considered a safe operation, because, as described above, the field name is not sent on the wire, it is derived using the field number on the consumer side. With FieldMask, field names are sent in the message payload (in the paths field value) and become significant.

Suppose we want to rename the field title to title_name and publish version 2.0 of our message definition:

In this chart, the producer (server) utilizes new descriptors, with field number 2 named title_name. The binary message sent over the wire contains the field number and its value. The consumer still uses the original descriptors, where the field number 2 is called title. It is still able to decode the message by field number.

This works well if the consumer doesn’t use FieldMask to request the field. If the consumer makes a call with the “title” path in the FieldMask field, the producer will not be able to find this field. The producer doesn’t have a field named title in its descriptors, so it doesn’t know the consumer asked for field number 2.

As we see, if a field is renamed, the backend should be able to support new and old field names until all the callers migrate to the new field name (backward compatibility issue).

There are multiple ways to deal with this limitation:

  • Never rename fields when FieldMask is used. This is the simplest solution, but it’s not always possible
  • Require the backend to support all the old field names. This solves the backward compatibility issue but requires extra code on the backend to keep track of all historical field names
  • Deprecate old and create a new field instead of renaming. In our example, we would create the title_name field number 6. This option has some advantages over the previous one: it allows the producer to keep using generated descriptors instead of custom converters; also, deprecating a field makes it more prominent on the consumer side

Regardless of the solution, it is important to remember that FieldMask makes field names an integral part of your API contract.

Using FieldMask on the Producer (Server) Side

On the producer (server) side, unnecessary fields can be removed from the response payload using the FieldMaskUtil.merge() method (lines ##8 and 9):

If the server code also needs to know which fields are requested in order to avoid making external calls, database queries or expensive computations, this information can be obtained from the FieldMask paths field:

This code calls the makeExpensiveCallToScheduleServicemethod (line #21) only if the schedule field is requested. Let’s explore this code sample in more detail.

(1) The SCHEDULE_FIELD_NAME constant contains the name of the field. This code sample uses message type Descriptor and FieldDescriptor to lookup field name by field number. The difference between protobuf field names and field numbers is described in the Protobuf Field Names vs Field Numbers section above.

(2) FieldMaskUtil.normalize() returns FieldMask with alphabetically sorted and deduplicated field paths (aka canonical form).

(3) Expression (lines ##14 – 17) that yields the scheduleFieldRequestedvalue takes a stream of FieldMask paths, maps it to a stream of top-level fields, and returns true if top-level fields contain the value of the SCHEDULE_FIELD_NAME constant.

(4) ProductionSchedule is retrieved only if scheduleFieldRequested is true.

If you end up using FieldMask for different messages and fields, consider creating reusable utility helper methods. For example, a method that returns all top-level fields based on FieldMask and FieldDescriptor, a method to return if a field is present in a FieldMask, etc.

Ship Pre-built FieldMasks

Some access patterns can be more common than others. If multiple consumers are interested in the same subset of fields, API producers can ship client libraries with FieldMask pre-built for the most frequently used field combinations.

Providing pre-built field masks simplifies API usage for the most common scenarios and leaves consumers the flexibility to build their own field masks for more specific use-cases.

Limitations

  • Using FieldMask can limit your ability to rename message fields (described in the Protobuf Field Names vs Field Numbers section)
  • Repeated fields are only allowed in the last position of a path string. This means you cannot select (mask) individual sub-fields in a message inside a list. This can change in the foreseeable future, as a recently approved Google API Improvement Proposal AIP-161 Field masks includes support for wildcards on repeated fields.

Bella Ciao

Protobuf FieldMask is a simple, yet powerful concept. It can help make APIs more robust and service implementations more efficient.

This blog post covered how and why it is used at Netflix Studio Engineering for APIs that read the data. Part 2 will shed light on using FieldMask for update and remove operations.


Practical API Design at Netflix, Part 1: Using Protobuf FieldMask was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Тука е така или с АПИ на семинар

Post Syndicated from Боян Юруков original https://yurukov.net/blog/2021/api-seminar/

Бях решил да не се занимавам с тази случка, но заради това, което стана накрая реших, че си го заслужават.

Взех си няколко дни отпуска и в предпоследния ден в хотела ни нямаше почти хора. Тогава забелязах, че пристигат доста коли, сред които поне четири на Агенция Пътна инфраструктура както и няколко прескъпи джипа. Досетих се, че явно ще имат конференция или семинар или нещо подобно.

Така и се оказа. Напълниха конферентната стая с 80-тина души за 2-3 часа, след което се отправиха към басейна. В началото не бях сигурен кое ме раздразни – че трошаха яко държавна пара за 5-звезден хотел, че изведнъж всичко се напълни на 100% или че дори не си правиха труда да изглеждат, че са на официално събитие.

Бързо обаче размислих и се примирих – можеше да се напълни от други гости, а и все пак първи ден им е и може да имат друга програма за следващите. Дори фактът, че харчат толкова за пет звезден хотел в условията на сериозни скандали за разточителство на агенцията отметнах с това, че бледнее пред милиардите, които са раздали без обосновка, поръчка или дори строеж. А и все пак е форма на бонус към служители на държавна работа, която ако не се вкарат в някаква схема не може да се отрече, че е доста неблагодарна. Всеки офис и организация имат нужда от някаква форма на team-building, особено след последната година и въпреки всички резерви, които имах към тях. Отметнах всичко с ръка.

Гледах да не им обръщам внимание, но не можех да избягам от разговорите им. Явно не им пукаше много кой ги чува. Темите започнаха от злободневни – колко било скъпо в Гърция сега, колко скочили цените, един се жалваше колко малко от хората в агенцията са се ваксинирали, а никой не носел маска никъде в офисите или на конференцията сега.

После минаха на клюки кой какъв страничен бизнес имал, кой шеф успял да си доведе любовницата в хотела и кой не успял, защото я бил уредил в администрацията, а тука били само архитектура. После кой шеф каква кола под 6 цифрени суми не поглеждал и прочие. Общо взето се бяха разделили на три групи: едни дето се правеха на тежкари (предимно мъже, но и две жени), младите, които активно се наслаждаваха на почивката и „лелките и чичковци“, които също се радваха на басейна, но видимо бяха уморени от цялата драма.

Тежкарите изглеждаха доста притеснени през цялото време и все висяха на телефона. Единият много настоя точно до мен да крещи как са посмели да пуснат нещо електронно, като трябвало да си стои затворено в папка в бюрото му. Други двама постоянно дваха инструкции какво да се подписва от тяхно име „все едно са там“. Едната увещаваше, че трябвало да изкарат, че е била на комисия за одобрение на някакъв проект с още двама, които обръщаха коктейли на бара, а служителят в офиса трябвало с подписите на тримата да подпише протокола, защото „всичко сме се разбрали вече, така трябва да мине“. Имаше разговори и за очакваните бонуси и как нищо не можело да се спре, защото „то вече уредено и подписано всичко“.

Всички тези подхвърляния и спорадични изпускания показваха една добре смазана машина, която явно работи дори и от сауната и няма особени притеснения кой какво ще чуе. Общо взето смесица от обикновени хора, които просто искат да си свършат работата предвид ситуацията, в която са поставени от други хора, които раздават стотици милиони с един подпис и не им пука, защото няма кой да ги съди.

Гледах да страня от тях, защото все пак исках да си почина от тези глупости. Не смятах, че което и да е от подвърлянията или гръмогласните заповеди е толкова конкретно, за да си заслужава внимание, а и общата картинка е ясна на всички. Причината обаче да напиша всичко това беше черешката на тортата накрая.

Точно когато освобождавах стаята във фоайето бяха седнали няколко от АПИ и обсъждаха сметките. Напред-назад сновеше служителка на хотела притеснено. Темата беше как трябва да бъде изготвена фактурата и кое как да бъде вписано в касовата бележка. За малкото време, в което чаках собствената си сметка, на три пъти чух вариации на изказване, че хотелът иска да спази закона и трябва да фактурира всичко както е консумирано. От своя страна заобиколилите я държавни служители АПИ отчетливо заплашиха, че ще им изпратят данъчните на проверка и ще „има много неприятни ситуации“, ако не променят сметката.

Всички бяха с много ехидни усмивки, които не се сдържах да снимам. Имам право на това, защото макар някои да бяха по бански, всички бяха формално на работа и обсъждаха неща свързани с разход на бюджетни средства. Скривам лицата им, но с радост ще ги предоставя на отговорната институция, ако някой все има интерес. Скривам и служителката на хотела, защото искрено я съжалих на какъв натиск и заплахи е подложена от държавни служители. Надявам се да не се е поддала, но надали. Така работи АПИ и доста други институции.

ЯЯсно тук е, че е дори да се сменят шапките на тези организации, схемите, връзките и проблемите вътре остават. Разбиването им ще доведе неизменно до масово напускане на хора и криза в конкретната администрация. Т.е. ще стане по-зле преди да стане по-добре. Никой не изглеждаше притеснен от смяната на двама шефове за два месеца, но явно бяха притеснени какво следва.

И ние като данъкоплатци и гласоподаватели сме притеснени какво следва и ключов момент тук е не само каква култура на работа, говорене, поведение и нетърпимост ще бъде наложена на най-високо ниво в управлението, но и да има прокуратура и съд, които да преследват както такива дребни документални измами, така и милиардите подарени на фирми близки до ГЕРБ, ДПС и каквото друг открият. За първото – виждаме някакъв шанс. За прокуратурата ще трябва доста работа и борба.

The post Тука е така или с АПИ на семинар first appeared on Блогът на Юруков.

Announcing Rollbacks and API Access for Pages

Post Syndicated from David Song original https://blog.cloudflare.com/rollbacks-and-api-access-for-pages/

Announcing Rollbacks and API Access for Pages

Announcing Rollbacks and API Access for Pages

A couple of months ago, we announced the general availability of Cloudflare Pages: the easiest way to host and collaboratively develop websites on Cloudflare’s global network. It’s been amazing to see over 20,000 incredible sites built by users and hear your feedback. Since then, we’ve released user-requested features like URL redirects, web analytics, and Access integration.

We’ve been listening to your feedback and today we announce two new features: rollbacks and the Pages API. Deployment rollbacks allow you to host production-level code on Pages without needing to stress about broken builds resulting in website downtime. The API empowers you to create custom functionality and better integrate Pages with your development workflows. Now, it’s even easier to use Pages for production hosting.

Rollbacks

You can now rollback your production website to a previous working deployment with just a click of a button. This is especially useful when you want to quickly undo a new deployment for troubleshooting. Before, developers would have to push another deployment and then wait for the build to finish updating production. Now, you can restore a working version within a few moments by rolling back to a previous working build.

To rollback to a previous build, just click the “Rollback to this deployment” button on either the deployments list menu or on a specific deployment page.

Announcing Rollbacks and API Access for Pages

API Access

The Pages API exposes endpoints for you to easily create automations and to integrate Pages within your development workflow. Refer to the API documentation for a full breakdown of the object types and endpoints. To get started, navigate to the Cloudflare API Tokens page and copy your “Global API Key”. Now, you can authenticate and make requests to the API using your email and auth key in the request headers.

For example, here is an API request to get all projects on an account.

Request (example)

curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account_id}/pages/projects" \
     -H "X-Auth-Email: {email}" \
     -H "X-Auth-Key: {auth_key}"

Response (example)

{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "name": "NextJS Blog",
    "id": "7b162ea7-7367-4d67-bcde-1160995d5",
    "created_on": "2017-01-01T00:00:00Z",
    "subdomain": "helloworld.pages.dev",
    "domains": [
      "customdomain.com",
      "customdomain.org"
    ],
    "source": {
      "type": "github",
      "config": {
        "owner": "cloudflare",
        "repo_name": "ninjakittens",
        "production_branch": "main",
        "pr_comments_enabled": true,
        "deployments_enabled": true
      }
    },
    "build_config": {
      "build_command": "npm run build",
      "destination_dir": "build",
      "root_dir": "/",
      "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
      "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
    },
    "deployment_configs": {
      "preview": {
        "env_vars": {
          "BUILD_VERSION": {
            "value": "3.3"
          }
        }
      },
      "production": {
        "env_vars": {
          "BUILD_VERSION": {
            "value": "3.3"
          }
        }
      }
    },
    "latest_deployment": {
      "id": "f64788e9-fccd-4d4a-a28a-cb84f88f6",
      "short_id": "f64788e9",
      "project_id": "7b162ea7-7367-4d67-bcde-1160995d5",
      "project_name": "ninjakittens",
      "environment": "preview",
      "url": "https://f64788e9.ninjakittens.pages.dev",
      "created_on": "2021-03-09T00:55:03.923456Z",
      "modified_on": "2021-03-09T00:58:59.045655",
      "aliases": [
        "https://branchname.projectname.pages.dev"
      ],
      "latest_stage": {
        "name": "deploy",
        "started_on": "2021-03-09T00:55:03.923456Z",
        "ended_on": "2021-03-09T00:58:59.045655",
        "status": "success"
      },
      "env_vars": {
        "BUILD_VERSION": {
          "value": "3.3"
        },
        "ENV": {
          "value": "STAGING"
        }
      },
      "deployment_trigger": {
        "type": "ad_hoc",
        "metadata": {
          "branch": "main",
          "commit_hash": "ad9ccd918a81025731e10e40267e11273a263421",
          "commit_message": "Update index.html"
        }
      },
      "stages": [
        {
          "name": "queued",
          "started_on": "2021-06-03T15:38:15.608194Z",
          "ended_on": "2021-06-03T15:39:03.134378Z",
          "status": "active"
        },
        {
          "name": "initialize",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "clone_repo",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "build",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "deploy",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        }
      ],
      "build_config": {
        "build_command": "npm run build",
        "destination_dir": "build",
        "root_dir": "/",
        "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
        "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
      },
      "source": {
        "type": "github",
        "config": {
          "owner": "cloudflare",
          "repo_name": "ninjakittens",
          "production_branch": "main",
          "pr_comments_enabled": true,
          "deployments_enabled": true
        }
      }
    },
    "canonical_deployment": {
      "id": "f64788e9-fccd-4d4a-a28a-cb84f88f6",
      "short_id": "f64788e9",
      "project_id": "7b162ea7-7367-4d67-bcde-1160995d5",
      "project_name": "ninjakittens",
      "environment": "preview",
      "url": "https://f64788e9.ninjakittens.pages.dev",
      "created_on": "2021-03-09T00:55:03.923456Z",
      "modified_on": "2021-03-09T00:58:59.045655",
      "aliases": [
        "https://branchname.projectname.pages.dev"
      ],
      "latest_stage": {
        "name": "deploy",
        "started_on": "2021-03-09T00:55:03.923456Z",
        "ended_on": "2021-03-09T00:58:59.045655",
        "status": "success"
      },
      "env_vars": {
        "BUILD_VERSION": {
          "value": "3.3"
        },
        "ENV": {
          "value": "STAGING"
        }
      },
      "deployment_trigger": {
        "type": "ad_hoc",
        "metadata": {
          "branch": "main",
          "commit_hash": "ad9ccd918a81025731e10e40267e11273a263421",
          "commit_message": "Update index.html"
        }
      },
      "stages": [
        {
          "name": "queued",
          "started_on": "2021-06-03T15:38:15.608194Z",
          "ended_on": "2021-06-03T15:39:03.134378Z",
          "status": "active"
        },
        {
          "name": "initialize",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "clone_repo",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "build",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "deploy",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        }
      ],
      "build_config": {
        "build_command": "npm run build",
        "destination_dir": "build",
        "root_dir": "/",
        "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
        "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
      },
      "source": {
        "type": "github",
        "config": {
          "owner": "cloudflare",
          "repo_name": "ninjakittens",
          "production_branch": "main",
          "pr_comments_enabled": true,
          "deployments_enabled": true
        }
      }
    }
  },
  "result_info": {
    "page": 1,
    "per_page": 100,
    "count": 1,
    "total_count": 1
  }
}

Here’s another quick example using the API to rollback to a previous deployment:

Request (example)

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/pages/projects/{project_name}/deployments/{deployment_id}/rollback" \
     -H "X-Auth-Email: {email}" \
     -H "X-Auth-Key: {auth_key"

Response (example)

{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "id": "f64788e9-fccd-4d4a-a28a-cb84f88f6",
    "short_id": "f64788e9",
    "project_id": "7b162ea7-7367-4d67-bcde-1160995d5",
    "project_name": "ninjakittens",
    "environment": "preview",
    "url": "https://f64788e9.ninjakittens.pages.dev",
    "created_on": "2021-03-09T00:55:03.923456Z",
    "modified_on": "2021-03-09T00:58:59.045655",
    "aliases": [
      "https://branchname.projectname.pages.dev"
    ],
    "latest_stage": {
      "name": "deploy",
      "started_on": "2021-03-09T00:55:03.923456Z",
      "ended_on": "2021-03-09T00:58:59.045655",
      "status": "success"
    },
    "env_vars": {
      "BUILD_VERSION": {
        "value": "3.3"
      },
      "ENV": {
        "value": "STAGING"
      }
    },
    "deployment_trigger": {
      "type": "ad_hoc",
      "metadata": {
        "branch": "main",
        "commit_hash": "ad9ccd918a81025731e10e40267e11273a263421",
        "commit_message": "Update index.html"
      }
    },
    "stages": [
      {
        "name": "queued",
        "started_on": "2021-06-03T15:38:15.608194Z",
        "ended_on": "2021-06-03T15:39:03.134378Z",
        "status": "active"
      },
      {
        "name": "initialize",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      },
      {
        "name": "clone_repo",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      },
      {
        "name": "build",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      },
      {
        "name": "deploy",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      }
    ],
    "build_config": {
      "build_command": "npm run build",
      "destination_dir": "build",
      "root_dir": "/",
      "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
      "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
    },
    "source": {
      "type": "github",
      "config": {
        "owner": "cloudflare",
        "repo_name": "ninjakittens",
        "production_branch": "main",
        "pr_comments_enabled": true,
        "deployments_enabled": true
      }
    }
  }
}

Try out an API request with one of your projects by replacing {account_id}, {deployment_id},{email}, and {auth_key}. You can find your account_id in the URL address bar by navigating to the Cloudflare Dashboard. (Ex: 41643ed677c7c7gba4x463c4zdb9563c).

Refer to the API documentation for a full breakdown of the object types and endpoints.

Using the Pages API on Workers

The Pages API is even more powerful and simple to use with workers.new. If you haven’t used Workers before, feel free to go through the getting started guide to learn more. However, you’ll need to have set up a Pages project to follow along. Next, you can copy and paste this template for the new worker. Then, customize the values such as {account_id}, {project_name}, {auth_key}, and {your_email}.

const endpoint = "https://api.cloudflare.com/client/v4/accounts/{account_id}/pages/projects/{project_name}/deployments";
const email = "{your_email}";
addEventListener("scheduled", (event) => {
  event.waitUntil(handleScheduled(event.scheduledTime));
});
async function handleScheduled(request) {
  const init = {
    method: "POST",
    headers: {
      "content-type": "application/json;charset=UTF-8",
      "X-Auth-Email": email,
      "X-Auth-Key": API_KEY,
      //We recommend you store API keys as secrets using the Workers dashboard or using Wrangler as documented here https://developers.cloudflare.com/workers/cli-wrangler/commands#secret
    },
  };
  const response = await fetch(endpoint, init);
  return new Response(200);
}

Announcing Rollbacks and API Access for Pages

To finish configuring the script, click the back arrow near the top left of the window and click on the settings tab. Then, set an environment variable “API_KEY” with the value of your Cloudflare Global key and click “Encrypt” and then “Save”.

The script just makes a POST request to the deployments’ endpoint to trigger a new build. Click “Quick edit” to go back to the code editor to finish testing the script. You can test out your configuration and make a request by clicking on the “Trigger scheduled event” button in the “Schedule” tab near the tabs saying “HTTP” and “Preview”. You should see a new queued build on your Project through the Pages dashboard. Now, you can click “Save and Deploy” to publish your work. Finally, back to the worker settings page by clicking the back arrow near the top left of the window.

All that’s left to do is set a cron trigger to periodically run this Worker on the “Triggers tab”. Click on “Add Cron Trigger”.

Announcing Rollbacks and API Access for Pages

Next, we can input “0 * * * *” to trigger the build every hour.

Announcing Rollbacks and API Access for Pages

Finally, click save and your automation using the Pages API will trigger a new build every hour.

2. Deleting old deployments after a week: Pages hosts and serves all project deployments on preview links. Suppose you want to keep your project relatively private and prevent access to old deployments. You can use the API to delete deployments after a month so that they are no longer public online. This is easy to do on Workers using Cron Triggers.

3. Sharing project information: Imagine you are working on a development team using Pages to build our websites. You probably want an easy way to share deployment preview links and build status without having to share Cloudflare accounts. Using the API, you can easily share project information, including deployment status and preview links, and serve this content as HTML from a Cloudflare Worker.

Find the code snippets for all three examples here.

Conclusion

We will continue making the API more powerful with features such as supporting prebuilt deployments in the future. We are excited to see what you build with the API and hope you enjoy using rollbacks. At Cloudflare, we are committed to building the best developer experience on Pages, and we always appreciate hearing your feedback. Come chat with us and share more feedback on the Workers Discord (We have a dedicated #pages-help channel!).

Building fine-grained authorization using Amazon Cognito, API Gateway, and IAM

Post Syndicated from Artem Lovan original https://aws.amazon.com/blogs/security/building-fine-grained-authorization-using-amazon-cognito-api-gateway-and-iam/

June 5, 2021: We’ve updated Figure 1: User request flow.


Authorizing functionality of an application based on group membership is a best practice. If you’re building APIs with Amazon API Gateway and you need fine-grained access control for your users, you can use Amazon Cognito. Amazon Cognito allows you to use groups to create a collection of users, which is often done to set the permissions for those users. In this post, I show you how to build fine-grained authorization to protect your APIs using Amazon Cognito, API Gateway, and AWS Identity and Access Management (IAM).

As a developer, you’re building a customer-facing application where your users are going to log into your web or mobile application, and as such you will be exposing your APIs through API Gateway with upstream services. The APIs could be deployed on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, or Elastic Load Balancing where each of these options will forward the request to your Amazon Elastic Compute Cloud (Amazon EC2) instances. Additionally, you can use on-premises services that are connected to your Amazon Web Services (AWS) environment over an AWS VPN or AWS Direct Connect. It’s important to have fine-grained controls for each API endpoint and HTTP method. For instance, the user should be allowed to make a GET request to an endpoint, but should not be allowed to make a POST request to the same endpoint. As a best practice, you should assign users to groups and use group membership to allow or deny access to your API services.

Solution overview

In this blog post, you learn how to use an Amazon Cognito user pool as a user directory and let users authenticate and acquire the JSON Web Token (JWT) to pass to the API Gateway. The JWT is used to identify what group the user belongs to, as mapping a group to an IAM policy will display the access rights the group is granted.

Note: The solution works similarly if Amazon Cognito would be federating users with an external identity provider (IdP)—such as Ping, Active Directory, or Okta—instead of being an IdP itself. To learn more, see Adding User Pool Sign-in Through a Third Party. Additionally, if you want to use groups from an external IdP to grant access, Role-based access control using Amazon Cognito and an external identity provider outlines how to do so.

The following figure shows the basic architecture and information flow for user requests.

Figure 1: User request flow

Figure 1: User request flow

Let’s go through the request flow to understand what happens at each step, as shown in Figure 1:

  1. A user logs in and acquires an Amazon Cognito JWT ID token, access token, and refresh token. To learn more about each token, see using tokens with user pools.
  2. A RestAPI request is made and a bearer token—in this solution, an access token—is passed in the headers.
  3. API Gateway forwards the request to a Lambda authorizer—also known as a custom authorizer.
  4. The Lambda authorizer verifies the Amazon Cognito JWT using the Amazon Cognito public key. On initial Lambda invocation, the public key is downloaded from Amazon Cognito and cached. Subsequent invocations will use the public key from the cache.
  5. The Lambda authorizer looks up the Amazon Cognito group that the user belongs to in the JWT and does a lookup in Amazon DynamoDB to get the policy that’s mapped to the group.
  6. Lambda returns the policy and—optionally—context to API Gateway. The context is a map containing key-value pairs that you can pass to the upstream service. It can be additional information about the user, the service, or anything that provides additional information to the upstream service.
  7. The API Gateway policy engine evaluates the policy.

    Note: Lambda isn’t responsible for understanding and evaluating the policy. That responsibility falls on the native capabilities of API Gateway.

  8. The request is forwarded to the service.

Note: To further optimize Lambda authorizer, the authorization policy can be cached or disabled, depending on your needs. By enabling cache, you could improve the performance as the authorization policy will be returned from the cache whenever there is a cache key match. To learn more, see Configure a Lambda authorizer using the API Gateway console.

Let’s have a closer look at the following example policy that is stored as part of an item in DynamoDB.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"PetStore-API",
         "Effect":"Allow",
         "Action":"execute-api:Invoke",
         "Resource":[
            "arn:aws:execute-api:*:*:*/*/*/petstore/v1/*",
            "arn:aws:execute-api:*:*:*/*/GET/petstore/v2/status"
         ],
         "Condition":{
            "IpAddress":{
               "aws:SourceIp":[
                  "192.0.2.0/24",
                  "198.51.100.0/24"
               ]
            }
         }
      }
   ]
}

Based on this example policy, the user is allowed to make calls to the petstore API. For version v1, the user can make requests to any verb and any path, which is expressed by an asterisk (*). For v2, the user is only allowed to make a GET request for path /status. To learn more about how the policies work, see Output from an Amazon API Gateway Lambda authorizer.

Getting started

For this solution, you need the following prerequisites:

  • The AWS Command Line Interface (CLI) installed and configured for use.
  • Python 3.6 or later, to package Python code for Lambda

    Note: We recommend that you use a virtual environment or virtualenvwrapper to isolate the solution from the rest of your Python environment.

  • An IAM role or user with enough permissions to create Amazon Cognito User Pool, IAM Role, Lambda, IAM Policy, API Gateway and DynamoDB table.
  • The GitHub repository for the solution. You can download it, or you can use the following Git command to download it from your terminal.

    Note: This sample code should be used to test out the solution and is not intended to be used in production account.

     $ git clone https://github.com/aws-samples/amazon-cognito-api-gateway.git
     $ cd amazon-cognito-api-gateway
    

    Use the following command to package the Python code for deployment to Lambda.

     $ bash ./helper.sh package-lambda-functions
     …
     Successfully completed packaging files.
    

To implement this reference architecture, you will be utilizing the following services:

Note: This solution was tested in the us-east-1, us-east-2, us-west-2, ap-southeast-1, and ap-southeast-2 Regions. Before selecting a Region, verify that the necessary services—Amazon Cognito, API Gateway, and Lambda—are available in those Regions.

Let’s review each service, and how those will be used, before creating the resources for this solution.

Amazon Cognito user pool

A user pool is a user directory in Amazon Cognito. With a user pool, your users can log in to your web or mobile app through Amazon Cognito. You use the Amazon Cognito user directory directly, as this sample solution creates an Amazon Cognito user. However, your users can also log in through social IdPs, OpenID Connect (OIDC), and SAML IdPs.

Lambda as backing API service

Initially, you create a Lambda function that serves your APIs. API Gateway forwards all requests to the Lambda function to serve up the requests.

An API Gateway instance and integration with Lambda

Next, you create an API Gateway instance and integrate it with the Lambda function you created. This API Gateway instance serves as an entry point for the upstream service. The following bash command below creates an Amazon Cognito user pool, a Lambda function, and an API Gateway instance. The command then configures proxy integration with Lambda and deploys an API Gateway stage.

Deploy the sample solution

From within the directory where you downloaded the sample code from GitHub, run the following command to generate a random Amazon Cognito user password and create the resources described in the previous section.

 $ bash ./helper.sh cf-create-stack-gen-password
 ...
 Successfully created CloudFormation stack.

When the command is complete, it returns a message confirming successful stack creation.

Validate Amazon Cognito user creation

To validate that an Amazon Cognito user has been created successfully, run the following command to open the Amazon Cognito UI in your browser and then log in with your credentials.

Note: When you run this command, it returns the user name and password that you should use to log in.

 $ bash ./helper.sh open-cognito-ui
  Opening Cognito UI. Please use following credentials to login:
  Username: cognitouser
  Password: xxxxxxxx

Alternatively, you can open the CloudFormation stack and get the Amazon Cognito hosted UI URL from the stack outputs. The URL is the value assigned to the CognitoHostedUiUrl variable.

Figure 2: CloudFormation Outputs - CognitoHostedUiUrl

Figure 2: CloudFormation Outputs – CognitoHostedUiUrl

Validate Amazon Cognito JWT upon login

Since we haven’t installed a web application that would respond to the redirect request, Amazon Cognito will redirect to localhost, which might look like an error. The key aspect is that after a successful log in, there is a URL similar to the following in the navigation bar of your browser:

http://localhost/#id_token=eyJraWQiOiJicVhMYWFlaTl4aUhzTnY3W...

Test the API configuration

Before you protect the API with Amazon Cognito so that only authorized users can access it, let’s verify that the configuration is correct and the API is served by API Gateway. The following command makes a curl request to API Gateway to retrieve data from the API service.

 $ bash ./helper.sh curl-api
{"pets":[{"id":1,"name":"Birds"},{"id":2,"name":"Cats"},{"id":3,"name":"Dogs"},{"id":4,"name":"Fish"}]}

The expected result is that the response will be a list of pets. In this case, the setup is correct: API Gateway is serving the API.

Protect the API

To protect your API, the following is required:

  1. DynamoDB to store the policy that will be evaluated by the API Gateway to make an authorization decision.
  2. A Lambda function to verify the user’s access token and look up the policy in DynamoDB.

Let’s review all the services before creating the resources.

Lambda authorizer

A Lambda authorizer is an API Gateway feature that uses a Lambda function to control access to an API. You use a Lambda authorizer to implement a custom authorization scheme that uses a bearer token authentication strategy. When a client makes a request to one of the API operations, the API Gateway calls the Lambda authorizer. The Lambda authorizer takes the identity of the caller as input and returns an IAM policy as the output. The output is the policy that is returned in DynamoDB and evaluated by the API Gateway. If there is no policy mapped to the caller identity, Lambda will generate a deny policy and request will be denied.

DynamoDB table

DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. This is ideal for this use case to ensure that the Lambda authorizer can quickly process the bearer token, look up the policy, and return it to API Gateway. To learn more, see Control access for invoking an API.

The final step is to create the DynamoDB table for the Lambda authorizer to look up the policy, which is mapped to an Amazon Cognito group.

Figure 3 illustrates an item in DynamoDB. Key attributes are:

  • Group, which is used to look up the policy.
  • Policy, which is returned to API Gateway to evaluate the policy.

 

Figure 3: DynamoDB item

Figure 3: DynamoDB item

Based on this policy, the user that is part of the Amazon Cognito group pet-veterinarian is allowed to make API requests to endpoints https://<domain>/<api-gateway-stage>/petstore/v1/* and https://<domain>/<api-gateway-stage>/petstore/v2/status for GET requests only.

Update and create resources

Run the following command to update existing resources and create a Lambda authorizer and DynamoDB table.

 $ bash ./helper.sh cf-update-stack
Successfully updated CloudFormation stack.

Test the custom authorizer setup

Begin your testing with the following request, which doesn’t include an access token.

$ bash ./helper.sh curl-api
{"message":"Unauthorized"}

The request is denied with the message Unauthorized. At this point, the Amazon API Gateway expects a header named Authorization (case sensitive) in the request. If there’s no authorization header, the request is denied before it reaches the lambda authorizer. This is a way to filter out requests that don’t include required information.

Use the following command for the next test. In this test, you pass the required header but the token is invalid because it wasn’t issued by Amazon Cognito but is a simple JWT-format token stored in ./helper.sh. To learn more about how to decode and validate a JWT, see decode and verify an Amazon Cognito JSON token.

$ bash ./helper.sh curl-api-invalid-token
{"Message":"User is not authorized to access this resource"}

This time the message is different. The Lambda authorizer received the request and identified the token as invalid and responded with the message User is not authorized to access this resource.

To make a successful request to the protected API, your code will need to perform the following steps:

  1. Use a user name and password to authenticate against your Amazon Cognito user pool.
  2. Acquire the tokens (id token, access token, and refresh token).
  3. Make an HTTPS (TLS) request to API Gateway and pass the access token in the headers.

Before the request is forwarded to the API service, API Gateway receives the request and passes it to the Lambda authorizer. The authorizer performs the following steps. If any of the steps fail, the request is denied.

  1. Retrieve the public keys from Amazon Cognito.
  2. Cache the public keys so the Lambda authorizer doesn’t have to make additional calls to Amazon Cognito as long as the Lambda execution environment isn’t shut down.
  3. Use public keys to verify the access token.
  4. Look up the policy in DynamoDB.
  5. Return the policy to API Gateway.

The access token has claims such as Amazon Cognito assigned groups, user name, token use, and others, as shown in the following example (some fields removed).

{
    "sub": "00000000-0000-0000-0000-0000000000000000",
    "cognito:groups": [
        "pet-veterinarian"
    ],
...
    "token_use": "access",
    "scope": "openid email",
    "username": "cognitouser"
}

Finally, let’s programmatically log in to Amazon Cognito UI, acquire a valid access token, and make a request to API Gateway. Run the following command to call the protected API.

$ bash ./helper.sh curl-protected-api
{"pets":[{"id":1,"name":"Birds"},{"id":2,"name":"Cats"},{"id":3,"name":"Dogs"},{"id":4,"name":"Fish"}]}

This time, you receive a response with data from the API service. Let’s examine the steps that the example code performed:

  1. Lambda authorizer validates the access token.
  2. Lambda authorizer looks up the policy in DynamoDB based on the group name that was retrieved from the access token.
  3. Lambda authorizer passes the IAM policy back to API Gateway.
  4. API Gateway evaluates the IAM policy and the final effect is an allow.
  5. API Gateway forwards the request to Lambda.
  6. Lambda returns the response.

Let’s continue to test our policy from Figure 3. In the policy document, arn:aws:execute-api:*:*:*/*/GET/petstore/v2/status is the only endpoint for version V2, which means requests to endpoint /GET/petstore/v2/pets should be denied. Run the following command to test this.

 $ bash ./helper.sh curl-protected-api-not-allowed-endpoint
{"Message":"User is not authorized to access this resource"}

Note: Now that you understand fine grained access control using Cognito user pool, API Gateway and lambda function, and you have finished testing it out, you can run the following command to clean up all the resources associated with this solution:

 $ bash ./helper.sh cf-delete-stack

Advanced IAM policies to further control your API

With IAM, you can create advanced policies to further refine access to your APIs. You can learn more about condition keys that can be used in API Gateway, their use in an IAM policy with conditions, and how policy evaluation logic determines whether to allow or deny a request.

Summary

In this post, you learned how IAM and Amazon Cognito can be used to provide fine-grained access control for your API behind API Gateway. You can use this approach to transparently apply fine-grained control to your API, without having to modify the code in your API, and create advanced policies by using IAM condition keys.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Artem Lovan

Artem is a Senior Solutions Architect based in New York. He helps customers architect and optimize applications on AWS. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.

Summarize devices that are not reachable

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/summarize-devices-that-are-not-reachable/13219/

In this lab, we will list all devices which are not reachable by a monitoring tool. This is good when we want to improve the overall monitoring experience and decrease the size queue (metrics which has not been arrived at the instance).

Tools required for the job: Access to a database server or a Windows computer with PowerShell

To summarize devices that are not reachable at the moment we can use a database query. Tested and works on 4.0, 5.0, on MySQL and PostgreSQL:

SELECT hosts.host,
       interface.ip,
       interface.dns,
       interface.useip,
       CASE interface.type
           WHEN 1 THEN 'ZBX'
           WHEN 2 THEN 'SNMP'
           WHEN 3 THEN 'IPMI'
           WHEN 4 THEN 'JMX'
       END AS "type",
       hosts.error
FROM hosts
JOIN interface ON interface.hostid=hosts.hostid
WHERE hosts.available=2
  AND interface.main=1
  AND hosts.status=0;

A very similar (but not exactly the same) outcome can be obtained via Windows PowerShell by contacting Zabbix API. Try this snippet:

$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "application/json")
$url = 'http://192.168.1.101/api_jsonrpc.php'
$user = 'api'
$password = 'zabbix'

# authorization
$key = Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body "
{
    `"jsonrpc`": `"2.0`",
    `"method`": `"user.login`",
    `"params`": {
        `"user`": `"$user`",
        `"password`": `"$password`"
    },
    `"id`": 1
}
" | foreach { $_.result }
echo $key

# filter out unreachable Agent, SNMP, JMX, IPMI hosts
Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body "
{
    `"jsonrpc`": `"2.0`",
    `"method`": `"host.get`",
    `"params`": {
        `"output`": [`"interfaces`",`"host`",`"proxy_hostid`",`"disable_until`",`"lastaccess`",`"errors_from`",`"error`"],
        `"selectInterfaces`": `"extend`",
        `"filter`": {`"available`": `"2`",`"status`":`"0`"}
    },
    `"auth`": `"$key`",
    `"id`": 1
}
" | foreach { $_.result }  | foreach { $_.interfaces } | Out-GridView

# log out
Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body "
{
    `"jsonrpc`": `"2.0`",
    `"method`": `"user.logout`",
    `"params`": [],
    `"id`": 1,
    `"auth`": `"$key`"
}
"

Set a valid credential (URL, username, password) on the top of the code before executing it.

The benefit of PowerShell here is that we can use some on-the-fly filtering:

What is the exact meaning of the field ‘type’ we can understand by looking on the previous database query:

       CASE interface.type
           WHEN 1 THEN 'ZBX'
           WHEN 2 THEN 'SNMP'
           WHEN 3 THEN 'IPMI'
           WHEN 4 THEN 'JMX'
       END AS "type",

On Windows PowerShell, it is possible to download the unreachable hosts directly to CSV file. To do that, in the code above, we need to change:

Out-GridView

to

Export-Csv c:\temp\unavailable.hosts.csv

Alright, this was the knowledge bit today. Let’s keep Zabbixing!

A Byzantine failure in the real world

Post Syndicated from Tom Lianza original https://blog.cloudflare.com/a-byzantine-failure-in-the-real-world/

A Byzantine failure in the real world

An analysis of the Cloudflare API availability incident on 2020-11-02

When we review design documents at Cloudflare, we are always on the lookout for Single Points of Failure (SPOFs). Eliminating these is a necessary step in architecting a system you can be confident in. Ironically, when you’re designing a system with built-in redundancy, you spend most of your time thinking about how well it functions when that redundancy is lost.

On November 2, 2020, Cloudflare had an incident that impacted the availability of the API and dashboard for six hours and 33 minutes. During this incident, the success rate for queries to our API periodically dipped as low as 75%, and the dashboard experience was as much as 80 times slower than normal. While Cloudflare’s edge is massively distributed across the world (and kept working without a hitch), Cloudflare’s control plane (API & dashboard) is made up of a large number of microservices that are redundant across two regions. For most services, the databases backing those microservices are only writable in one region at a time.

Each of Cloudflare’s control plane data centers has multiple racks of servers. Each of those racks has two switches that operate as a pair—both are normally active, but either can handle the load if the other fails. Cloudflare survives rack-level failures by spreading the most critical services across racks. Every piece of hardware has two or more power supplies with different power feeds. Every server that stores critical data uses RAID 10 redundant disks or storage systems that replicate data across at least three machines in different racks, or both. Redundancy at each layer is something we review and require. So—how could things go wrong?

In this post we present a timeline of what happened, and how a difficult failure mode known as a Byzantine fault played a role in a cascading series of events.

2020-11-02 14:43 UTC: Partial Switch Failure

At 14:43, a network switch started misbehaving. Alerts began firing about the switch being unreachable to pings. The device was in a partially operating state: network control plane protocols such as LACP and BGP remained operational, while others, such as vPC, were not. The vPC link is used to synchronize ports across multiple switches, so that they appear as one large, aggregated switch to servers connected to them. At the same time, the data plane (or forwarding plane) was not processing and forwarding all the packets received from connected devices.

This failure scenario is completely invisible to the connected nodes, as each server only sees an issue for some of its traffic due to the load-balancing nature of LACP. Had the switch failed fully, all traffic would have failed over to the peer switch, as the connected links would’ve simply gone down, and the ports would’ve dropped out of the forwarding LACP bundles.

Six minutes later, the switch recovered without human intervention. But this odd failure mode led to further problems that lasted long after the switch had returned to normal operation.

2020-11-02 14:44 UTC: etcd Errors begin

The rack with the misbehaving switch included one server in our etcd cluster. We use etcd heavily in our core data centers whenever we need strongly consistent data storage that’s reliable across multiple nodes.

In the event that the cluster leader fails, etcd uses the RAFT protocol to maintain consistency and establish consensus to promote a new leader. In the RAFT protocol, cluster members are assumed to be either available or unavailable, and to provide accurate information or none at all. This works fine when a machine crashes, but is not always able to handle situations where different members of the cluster have conflicting information.

In this particular situation:

  • Network traffic between node 1 (in the affected rack) and node 3 (the leader) was being sent through the switch in the degraded state,
  • Network traffic between node 1 and node 2 were going through its working peer, and
  • Network traffic between node 2 and node 3 was unaffected.

This caused cluster members to have conflicting views of reality, known in distributed systems theory as a Byzantine fault. As a consequence of this conflicting information, node 1 repeatedly initiated leader elections, voting for itself, while node 2 repeatedly voted for node 3, which it could still connect to. This resulted in ties that did not promote a leader node 1 could reach. RAFT leader elections are disruptive, blocking all writes until they’re resolved, so this made the cluster read-only until the faulty switch recovered and node 1 could once again reach node 3.

A Byzantine failure in the real world

2020-11-02 14:45 UTC: Database system promotes a new primary database

Cloudflare’s control plane services use relational databases hosted across multiple clusters within a data center. Each cluster is configured for high availability. The cluster setup includes a primary database, a synchronous replica, and one or more asynchronous replicas. This setup allows redundancy within a data center. For cross-datacenter redundancy, a similar high availability secondary cluster is set up and replicated in a geographically dispersed data center for disaster recovery. The cluster management system leverages etcd for cluster member discovery and coordination.

When etcd became read-only, two clusters were unable to communicate that they had a healthy primary database. This triggered the automatic promotion of a synchronous database replica to become the new primary. This process happened automatically and without error or data loss.

There was a defect in our cluster management system that requires a rebuild of all database replicas when a new primary database is promoted. So, although the new primary database was available instantly, the replicas would take considerable time to become available, depending on the size of the database. For one of the clusters, service was restored quickly. Synchronous and asynchronous database replicas were rebuilt and started replicating successfully from primary, and the impact was minimal.

For the other cluster, however, performant operation of that database required a replica to be online. Because this database handles authentication for API calls and dashboard activities, it takes a lot of reads, and one replica was heavily utilized to spare the primary the load. When this failover happened and no replicas were available, the primary was overloaded, as it had to take all of the load. This is when the main impact started.

Reduce Load, Leverage Redundancy

At this point we saw that our primary authentication database was overwhelmed and began shedding load from it. We dialed back the rate at which we push SSL certificates to the edge, send emails, and other features, to give it space to handle the additional load. Unfortunately, because of its size, we knew it would take several hours for a replica to be fully rebuilt.

A silver lining here is that every database cluster in our primary data center also has online replicas in our secondary data center. Those replicas are not part of the local failover process, and were online and available throughout the incident. The process of steering read-queries to those replicas was not yet automated, so we manually diverted API traffic that could leverage those read replicas to the secondary data center. This substantially improved our API availability.

The Dashboard

The Cloudflare dashboard, like most web applications, has the notion of a user session. When user sessions are created (each time a user logs in) we perform some database operations and keep data in a Redis cluster for the duration of that user’s session. Unlike our API calls, our user sessions cannot currently be moved across the ocean without disruption. As we took actions to improve the availability of our API calls, we were unfortunately making the user experience on the dashboard worse.

This is an area of the system that is currently designed to be able to fail over across data centers in the event of a disaster, but has not yet been designed to work in both data centers at the same time. After a first period in which users on the dashboard became increasingly frustrated, we failed the authentication calls fully back to our primary data center, and kept working on our primary database to ensure we could provide the best service levels possible in that degraded state.

2020-11-02 21:20 UTC Database Replica Rebuilt

The instant the first database replica rebuilt, it put itself back into service, and performance resumed to normal levels. We re-ramped all of the services that had been turned down, so all asynchronous processing could catch up, and after a period of monitoring marked the end of the incident.

Redundant Points of Failure

The cascade of failures in this incident was interesting because each system, on its face, had redundancy. Moreover, no system fully failed—each entered a degraded state. That combination meant the chain of events that transpired was considerably harder to model and anticipate. It was frustrating yet reassuring that some of the possible failure modes were already being addressed.

A team was already working on fixing the limitation that requires a database replica rebuild upon promotion. Our user sessions system was inflexible in scenarios where we’d like to steer traffic around, and redesigning that was already in progress.

This incident also led us to revisit the configuration parameters we put in place for things that auto-remediate. In previous years, promoting a database replica to primary took far longer than we liked, so getting that process automated and able to trigger on a minute’s notice was a point of pride. At the same time, for at least one of our databases, the cure may be worse than the disease, and in fact we may not want to invoke the promotion process so quickly. Immediately after this incident we adjusted that configuration accordingly.

Byzantine Fault Tolerance (BFT) is a hot research topic. Solutions have been known since 1982, but have had to choose between a variety of engineering tradeoffs, including security, performance, and algorithmic simplicity. Most general-purpose cluster management systems choose to forgo BFT entirely and use protocols based on PAXOS, or simplifications of PAXOS such as RAFT, that perform better and are easier to understand than BFT consensus protocols. In many cases, a simple protocol that is known to be vulnerable to a rare failure mode is safer than a complex protocol that is difficult to implement correctly or debug.

The first uses of BFT consensus were in safety-critical systems such as aircraft and spacecraft controls. These systems typically have hard real time latency constraints that require tightly coupling consensus with application logic in ways that make these implementations unsuitable for general-purpose services like etcd. Contemporary research on BFT consensus is mostly focused on applications that cross trust boundaries, which need to protect against malicious cluster members as well as malfunctioning cluster members. These designs are more suitable for implementing general-purpose services such as etcd, and we look forward to collaborating with researchers and the open source community to make them suitable for production cluster management.

We are very sorry for the difficulty the outage caused, and are continuing to improve as our systems grow. We’ve since fixed the bug in our cluster management system, and are continuing to tune each of the systems involved in this incident to be more resilient to failures of their dependencies.  If you’re interested in helping solve these problems at scale, please visit cloudflare.com/careers.

Close problem automatically via Zabbix API

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/close-problem-automatically-via-zabbix-api/12461/

Today we are talking about a use case when it’s impossible to find a proper way to write a recovery expression for the Zabbix trigger. In other words, we know how to identify problems. But there is no good way to detect when the problem is gone.

This mostly relates to a huge environment, for example:

  • Got one log file. There are hundreds of patterns inside. We respect all of them. We need them
  • SNMP trap item (snmptrap.fallback) with different patterns being written inside

In these situations, the trigger is most likely configured to “Event generation mode: Multiple.” This practically means: when a “problematic metric” hits the instance, it will open +1 additional problem.

Goal:
I just need to receive an email about the record, then close the event.

As a workaround (let’s call it a solution here), we can define an action which will:

  1. contact an API endpoint
  2. manually acknowledge the event and close it

The biggest reason why this functionality is possible is that: when an event hits the action, the operation actually knows the event ID of the problem. The macro {EVENT.ID} saves the day.

To solve the problem, we need to install API characteristics at the global level:

     {$Z_API_PHP}=http://127.0.0.1/api_jsonrpc.php
    {$Z_API_USER}=api
{$Z_API_PASSWORD}=zabbix

NOTE
‘http://127.0.0.1/api_jsonrpc.php’ means the frontend server runs on the same server as systemd:zabbix-server. If it is not the case, we need to plot a front-end address of Zabbix GUI + add ‘api_jsonrpc.php’.

We will have 2 actions. The first one will deliver a notification to email:

After 1 minute, a second action will close the event:

This is a full bash snippet we must put inside. No need to change anything. It works with copy and paste:

url={$Z_API_PHP}
    user={$Z_API_USER}
password={$Z_API_PASSWORD}

# authorization
auth=$(curl -sk -X POST -H "Content-Type: application/json" -d "
{
	\"jsonrpc\": \"2.0\",
	\"method\": \"user.login\",
	\"params\": {
		\"user\": \"$user\",
		\"password\": \"$password\"
	},
	\"id\": 1,
	\"auth\": null
}
" $url | \
grep -E -o "([0-9a-f]{32,32})")

# acknowledge and close event
curl -sk -X POST -H "Content-Type: application/json" -d "
{
	\"jsonrpc\": \"2.0\",
	\"method\": \"event.acknowledge\",
	\"params\": {
		\"eventids\": \"{EVENT.ID}\",
		\"action\": 1,
		\"message\": \"Problem resolved.\"
	},
	\"auth\": \"$auth\",
	\"id\": 1
}" $url

# close api key
curl -sk -X POST -H "Content-Type: application/json" -d "
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"user.logout\",
    \"params\": [],
    \"id\": 1,
    \"auth\": \"$auth\"
}
" $url

Zabbix API scripting via curl and jq

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/zabbix-api-scripting-via-curl-and-jq/12434/

In this lab we will use a bash environment and utilities ‘curl’ and ‘jq’ to perform Zabbix API calls, do some scripting.

‘curl’ is a tool to exchange JSON messages over HTTP/HTTPS.
‘jq’ utility helps to locate and extract specific elements in output.

To follow the lab we need to install ‘jq’:

# On CentOS7/RHEL7:
yum install epel-release && yum install jq

# On CentOS8/RHEL8:
dnf install jq

# On Ubuntu/Debian:
apt install jq

# On any 64-bit Linux platform:
curl -skL "https://github.com/stedolan/jq/releases/download/jq1.5/jq-linux64" -o /usr/bin/jq && chmod +x /usr/bin/jq

Obtaining an authorization token

In order to operate with API calls we need to:

  • Define an API endpoint. this is an URL, a PHP file which is designed to accept requests
  • Obtain an authorization token

If you tend to execute API calls from frontend server then most likelly.

url=http://127.0.0.1/api_jsonrpc.php
# or:
url=http://127.0.0.1/zabbix/api_jsonrpc.php

It’s required to set the URL variable to jump to the next step. Test if you have it configured:

echo $url

Any API call needs to be used via authorization token. To put one token in variable use the command:

auth=$(curl -s -X POST -H 'Content-Type: application/json-rpc' \
-d '
{"jsonrpc":"2.0","method":"user.login","params":
{"user":"api","password":"zabbix"},
"id":1,"auth":null}
' $url | \
jq -r .result
)

Note
Notice there is user ‘api’ with password ‘zabbix’. This is a dedicated user for API calls.

Check if you have a session key. It should be 32 character HEX string:

echo $auth

Though process

1) visit the documentation page and pick an API flavor for example alert.get:

{
"jsonrpc": "2.0",
"method": "alert.get",
"params": {
	"output": "extend",
	"actionids": "3"
},
"auth": "038e1d7b1735c6a5436ee9eae095879e",
"id": 1
}

2) Let’s use our favorite text editor and build in Find&Replace functionality to escape all double quotes:

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"038e1d7b1735c6a5436ee9eae095879e\",
\"id\": 1
}

NOTE
Don’t ever think to do this process manually by hand!

3) Replace session key 038e1d7b1735c6a5436ee9eae095879e with our variable $auth

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"$auth\",
\"id\": 1
}

4) Now let’s encapsulate the API command with curl:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"$auth\",
\"id\": 1
}

" $url

By executing the previous command, it should already print a JSON content in response.
To make the output more beautiful we can pipe it to jq .:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"$auth\",
\"id\": 1
}

" $url | jq .

Wrap everything together in one file

This is ready to use the snippet:

#!/bin/bash

# 1. set connection details
url=http://127.0.0.1/api_jsonrpc.php
user=api
password=zabbix

# 2. get authorization token
auth=$(curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
 \"jsonrpc\": \"2.0\",
 \"method\": \"user.login\",
 \"params\": {
  \"user\": \"$user\",
  \"password\": \"$password\"
 },
 \"id\": 1,
 \"auth\": null
}
" $url | \
jq -r '.result'
)

# 3. show triggers in problem state
curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
 \"jsonrpc\": \"2.0\",
    \"method\": \"trigger.get\",
    \"params\": {
        \"output\": \"extend\",
        \"selectHosts\": \"extend\",
        \"filter\": {
            \"value\": 1
        },
        \"sortfield\": \"priority\",
        \"sortorder\": \"DESC\"
    },
    \"auth\": \"$auth\",
    \"id\": 1
}
" $url | \
jq -r '.result'

# 4. logout user
curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"user.logout\",
    \"params\": [],
    \"id\": 1,
    \"auth\": \"$auth\"
}
" $url

Conveniences

We can use https://jsonpathfinder.com/ to identify what should be the path to extract an element.

For example, to list all Zabbix proxies we will use and API call:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"proxy.get\",
    \"params\": {
        \"output\": [\"host\"]
    },
    \"auth\": \"$auth\",
    \"id\": 1
} 
" $url

It may print content like:

{"jsonrpc":"2.0","result":[{"host":"broceni","proxyid":"10387"},{"host":"mysql8mon","proxyid":"12066"},{"host":"riga","proxyid":"12585"}],"id":1}

Inside JSONPathFinder by using a mouse click at the right panel, we can locate a sample element what we need to extract:

It suggests a path ‘x.result[1].host’. This means to extract all elements we can remove the number and use ‘.result[].host’ like this:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"proxy.get\",
    \"params\": {
        \"output\": [\"host\"]
    },
    \"auth\": \"$auth\",
    \"id\": 1
} 
" $url | jq -r '.result[].host'

Now it prints only the proxy titles:

broceni
mysql8mon
riga

That is it for today. Bye.

Zabbix API calls through Postman

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/zabbix-api-calls-through-postman/12198/

Zabbix API calls can be used through the graphical user interface (GUI), no need to jump to scripting. An application to perform API calls is called Postman.

Benefits:

  • Available on Windows, Linux, or MAC
  • Save/synchronize your collection with Google account
  • Can copy and paste examples from the official documentation page

Let’s go to basic steps on how to perform API calls:

1st step – Grab API method user.login and use a dedicated username and password to obtain and session token:

{
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "user": "api",
        "password": "zabbix"
    },
    "id": 1
}

This is how it looks in Postman:

NOTE
We recommend using a dedicated user for API calls. For example, a user called “api”. Make sure the user type has been chosen as “Zabbix Super Admin” so through this user we can access any type of information.

2nd step – Use API method trigger.get to list all triggers in the problem state:

{
    "jsonrpc": "2.0",
    "method": "trigger.get",
    "params": {
        "output": [
            "triggerid",
            "description",
            "priority"
        ],
        "filter": {
            "value": 1
        },
        "sortfield": "priority",
        "sortorder": "DESC"
    },
    "auth": "<session key>",
    "id": 1
}

Replace “<session key>” inside the API snippet to make it work. Then click “Send” button. It will list all triggers in the problem state on the right side:

Postman conveniences – Environments

Environments are “a must” if you:

  • Have a separate test, development, and production Zabbix instance
  • Plan to migrate Zabbix to next version (4.0 to 5.0) so it’s better to test all API calls beforehand

On the top right corner, there is a button Manage Environments. Let’s click it.

Now Create an environment:

Each environment must consist of url and auth key:

Now we have one definition prod. Can close window with [X]:

In order to work with your new environment, select a newly created profile prod. It’s required to substitute Zabbix API endpoint with {{url}} and plot {{auth}} to serve as a dynamic authorization key:

NOTE
Every time we notice an API procedure does not work anymore, all we need to do is to enter Manage environments section and install a new session tokken..

Topic in video format:
https://youtu.be/B14tsDUasG8?t=2513

Automated Origin CA for Kubernetes

Post Syndicated from Terin Stock original https://blog.cloudflare.com/automated-origin-ca-for-kubernetes/

Automated Origin CA for Kubernetes

Automated Origin CA for Kubernetes

In 2016, we launched the Cloudflare Origin CA, a certificate authority optimized for making it easy to secure the connection between Cloudflare and an origin server. Running our own CA has allowed us to support fast issuance and renewal, simple and effective revocation, and wildcard certificates for our users.

Out of the box, managing TLS certificates and keys within Kubernetes can be challenging and error prone. The secret resources have to be constructed correctly, as components expect secrets with specific fields. Some forms of domain verification require manually rotating secrets to pass. Once you’re successful, don’t forget to renew before the certificate expires!

cert-manager is a project to fill this operational gap, providing Kubernetes resources that manage the lifecycle of a certificate. Today we’re releasing origin-ca-issuer, an extension to cert-manager integrating with Cloudflare Origin CA to easily create and renew certificates for your account’s domains.

Origin CA Integration

Creating an Issuer

After installing cert-manager and origin-ca-issuer, you can create an OriginIssuer resource. This resource creates a binding between cert-manager and the Cloudflare API for an account. Different issuers may be connected to different Cloudflare accounts in the same Kubernetes cluster.

apiVersion: cert-manager.k8s.cloudflare.com/v1
kind: OriginIssuer
metadata:
  name: prod-issuer
  namespace: default
spec:
  signatureType: OriginECC
  auth:
    serviceKeyRef:
      name: service-key
      key: key
      ```

This creates a new OriginIssuer named “prod-issuer” that issues certificates using ECDSA signatures, and the secret “service-key” in the same namespace is used to authenticate to the Cloudflare API.

Signing an Origin CA Certificate

After creating an OriginIssuer, we can now create a Certificate with cert-manager. This defines the domains, including wildcards, that the certificate should be issued for, how long the certificate should be valid, and when cert-manager should renew the certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  # The secret name where cert-manager
  # should store the signed certificate.
  secretName: example-com-tls
  dnsNames:
    - example.com
  # Duration of the certificate.
  duration: 168h
  # Renew a day before the certificate expiration.
  renewBefore: 24h
  # Reference the Origin CA Issuer you created above,
  # which must be in the same namespace.
  issuerRef:
    group: cert-manager.k8s.cloudflare.com
    kind: OriginIssuer
    name: prod-issuer

Once created, cert-manager begins managing the lifecycle of this certificate, including creating the key material, crafting a certificate signature request (CSR), and constructing a certificate request that will be processed by the origin-ca-issuer.

When signed by the Cloudflare API, the certificate will be made available, along with the private key, in the Kubernetes secret specified within the secretName field. You’ll be able to use this certificate on servers proxied behind Cloudflare.

Extra: Ingress Support

If you’re using an Ingress controller, you can use cert-manager’s Ingress support to automatically manage Certificate resources based on your Ingress resource.

apiVersion: networking/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/issuer: prod-issuer
    cert-manager.io/issuer-kind: OriginIssuer
    cert-manager.io/issuer-group: cert-manager.k8s.cloudflare.com
  name: example
  namespace: default
spec:
  rules:
    - host: example.com
      http:
        paths:
          - backend:
              serviceName: examplesvc
              servicePort: 80
            path: /
  tls:
    # specifying a host in the TLS section will tell cert-manager 
    # what DNS SANs should be on the created certificate.
    - hosts:
        - example.com
      # cert-manager will create this secret
      secretName: example-tls

Building an External cert-manager Issuer

An external cert-manager issuer is a specialized Kubernetes controller. There’s no direct communication between cert-manager and external issuers at all; this means that you can use any existing tools and best practices for developing controllers to develop an external issuer.

We’ve decided to use the excellent controller-runtime project to build origin-ca-issuer, running two reconciliation controllers.

Automated Origin CA for Kubernetes

OriginIssuer Controller

The OriginIssuer controller watches for creation and modification of OriginIssuer custom resources. The controllers create a Cloudflare API client using the details and credentials referenced. This client API instance will later be used to sign certificates through the API. The controller will periodically retry to create an API client; once it is successful, it updates the OriginIssuer’s status to be ready.

CertificateRequest Controller

The CertificateRequest controller watches for the creation and modification of cert-manager’s CertificateRequest resources. These resources are created automatically by cert-manager as needed during a certificate’s lifecycle.

The controller looks for Certificate Requests that reference a known OriginIssuer, this reference is copied by cert-manager from the origin Certificate resource, and ignores all resources that do not match. The controller then verifies the OriginIssuer is in the ready state, before transforming the certificate request into an API request using the previously created clients.

On a successful response, the signed certificate is added to the certificate request, and which cert-manager will use to create or update the secret resource. On an unsuccessful request, the controller will periodically retry.

Learn More

Up-to-date documentation and complete installation instructions can be found in our GitHub repository. Feedback and contributions are greatly appreciated. If you’re interested in Kubernetes at Cloudflare, including building controllers like these, we’re hiring.

Add Watermarks to your Cloudflare Stream Video Uploads

Post Syndicated from Rachel Chen original https://blog.cloudflare.com/add-watermarks-to-your-cloudflare-stream-video-uploads/

Add Watermarks to your Cloudflare Stream Video Uploads

Add Watermarks to your Cloudflare Stream Video Uploads

Since the launch of Cloudflare Stream, our customers have been asking for a programmatic way to add watermarks to their videos. We built the Watermarks API to support a wide range of use cases: from customers who simply want to tell Stream “can you put this watermark image to the top right of my video?” to customers with more detailed asks such as “can you put this watermark image in a way it doesn’t take up more than 10% of the original video and with 20% opacity?” All that and more is now available at no additional cost through the Watermarks API.

What is Cloudflare Stream?

Cloudflare Stream provides out-of-the-box video infrastructure so developers can bring their app ideas to market faster. While building a video streaming app, developers must ask themselves questions like

  • Where do we store the videos affordably?
  • How do we encode the videos to support users with varying Internet speeds?
  • How do we maintain our video pipeline in the long term?”

Cloudflare Stream is a single product that handles video encoding, storage, delivery and presentation (with the Stream Player.) Stream lets developers launch their ideas faster while having the confidence the video infrastructure will scale with their app’s growth.

How the Watermark API works

The Watermark API lets you add a watermark to a video at the time of uploading. It consists of two new features to the Stream API:

  • A new /stream/watermarks endpoint that lets you create watermark profiles and returns a uid, a unique identifier for each watermark profile
  • Support for a watermark object containing the uid of the watermark profile that can be passed at the time of upload

Step 1: Creating a Watermark Profile

A watermark profile describes the nature of the watermark, including the image to use as a watermark and properties such as its positioning, padding and scale.

Add Watermarks to your Cloudflare Stream Video Uploads

In this example, we are going to create a watermark profile that places the Cloudflare logo to the lower left of the video:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/stream/watermarks \
  --header 'content-type: application/json' \
  --header 'x-auth-email: $CLOUDFLARE_EMAIL \
  --header 'x-auth-key: $CLOUDFLARE_KEY \
  --data '{
  "url": "https://storage.googleapis.com/zaid-test/Watermarks%20Demo/cf-icon.png",
  "name": "Cloudflare Icon",
  "opacity": 0.5,
  "padding": 0.05,
  "scale": 0.1,
  "position": "lowerLeft"
}'

The response contains information about the watermark profile, including a uid that we will use in the next step

{
  "result": {
    "uid": "a85d289c2e3f82701103620d16cd2408",
    "size": 9165,
    "height": 504,
    "width": 600,
    "created": "2020-09-03T20:43:56.337486Z",
    "downloadedFrom": "REDACTED_VIDEO_URL",
    "name": "Cloudflare Icon",
    "opacity": 0.5,
    "padding": 0.05,
    "scale": 0.1,
    "position": "lowerLeft"
  },
  "success": true,
  "errors": [],
  "messages": []
}

Step 2: Apply the Watermark

We’ve created the watermark and are ready to use it. Below is a screengrab from the Built For This commercial. It contains no watermark:

Add Watermarks to your Cloudflare Stream Video Uploads

We are going to upload the commercial and request Stream to add the logo from the previous step as a watermark:

curl --request POST \
  --url https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/stream/copy \
  --header 'content-type: application/json' \
  --header 'x-auth-email: $EMAIL \
  --header 'x-auth-key: $AUTH_KEY' \
  --data '{
  "url": "https://storage.googleapis.com/zaid-test/Watermarks%20Demo/The%20Internet%20was%20BuiltForThis.mp4",
  "watermark": {
    "uid": "a85d289c2e3f82701103620d16cd2408"
  }
}'

Step 3: Your video, now with a watermark!

You’re done! You can watch the video with a watermark:



What’s next

Read the detailed Watermark API docs covering different use cases.

In future iterations, we plan to add support for animated watermarks. Additionally, we want to add Watermark support to the Stream Dashboard so you have a UI to manage and add watermarks.

Introducing GitHub’s OpenAPI Description

Post Syndicated from Marc-Andre Giroux original https://github.blog/2020-07-27-introducing-githubs-openapi-description/

The GitHub REST API has been through three major revisions since it was first released, only a month after the site was launched. We often receive feedback that our REST API is an inspiration to many for design, and that it’s an industry reference for what an API should look like. Today, we’re excited to announce an improvement to how developers can interact with the API. GitHub has open sourced an OpenAPI description of the REST API.

OpenAPI

The OpenAPI specification is a programming language agnostic standard that lets providers describe the interface of their HTTP APIs. This allows both humans and machines to discover the capabilities of an API without needing to first read documentation or understand the implementation. OpenAPI is a widely adopted industry standard and GitHub is proud to be part of the community and help push the standard forward.

Try it Out

The GitHub OpenAPI description contains more than 600 operations exposed in our API. For visual exploration of the API, you can load the description as a Postman Collection. Programmatically, the description can be used to generate mock servers, test suites, and bindings for languages not supported by Octokit.

The description is provided under two formats. The bundled version is preferred for most use cases as it makes use of OpenAPI components for reuse and readability. For tooling that has poor support for inline references to components, we also provide a fully dereferenced version.

Active Development

The description is currently in beta. Describing a 12-year-old API is no easy task. We’ve built this description using a mix of existing JSON schemas, documented examples, contract testing, and love. We expect to make the description even more complete and accurate as we go forward and as OpenAPI becomes central to our developer experience — internally and externally.

Quarterly releases of the description are available for GitHub Enterprise Server and GitHub Private Instances, with versions like v2.21. More frequent updates to the description will be available for GitHub.com.

How Can I Contribute?

We’re always looking to make our OpenAPI description more complete and accurate as well as making it easier to consume. If you’d like to help contribute to the description, check out our contributing guide. If something is not working for you, please file an Issue on the repository.

Building a complete OpenAPI description for the GitHub API was no easy task and could not have been possible without a great team. Thanks to Gregor Martynus for his initial work on describing the API, the Docs Engineering team for their amazing work around OpenAPI and documentation, Will Roden for his help validating the description with octo-go, as well as the folks at Redoc.ly who helped along the way.

Learn more about our REST API OpenAPI Description

*  The OpenAPI Initiative logo is a trademark of The Linux Foundation

Backblaze B2 and the S3 Compatible API on Cloudflare

Post Syndicated from Tim Obezuk original https://blog.cloudflare.com/backblaze-b2-and-the-s3-compatible-api-on-cloudflare/

Backblaze B2 and the S3 Compatible API on Cloudflare

In May 2020, Backblaze, a founding Bandwidth Alliance partner announced S3 compatible APIs for their B2 Cloud Storage service. As a refresher, the Bandwidth Alliance is a group of forward-thinking cloud and networking companies that are committed to discounting or waiving data transfer fees for shared customers. Backblaze has been a proud partner since 2018. We are excited to see Backblaze introduce a new level of compatibility in their Cloud Storage service.

History of the S3 API

First let’s dive into the history of the S3 API and why it’s important for Cloudflare users.

Prior to 2006, before the mass migration to the Cloud, if you wanted to store content for your company you needed to build your own expensive and highly available storage platform that was large enough to store all your existing content with enough growth headroom for your business. AWS launched to help eliminate this model by renting their physical computing and storage infrastructure.

Amazon Simple Storage Service (S3) led the market by offering a scalable and resilient tool for storing unlimited amounts of data without building it yourself. It could be integrated into any application but there was one catch: you couldn’t use any existing standard such as WebDAV, FTP or SMB: your application needed to interface with Amazon’s bespoke S3 API.

Fast forward to 2020 and the storage provider landscape has become highly competitive with many providers capable of providing petabyte (and exabyte) scale content storage at extremely low cost-per-gigabyte. However, Amazon S3 has remained a dominant player despite heavy competition and not being the most cost-effective player.

The broad adoption of the S3 API by developers in their codebases and internal systems has transformed the S3 API into what WebDAV promised us to be: de facto standard HTTP File Storage API.

Engineering costs of changing storage providers

With many code bases and legacy applications being entrenched in the S3 API, the process to switch to a more cost-effective storage provider is not so easy. Companies need to consider the cost of engineer time programming a new storage API while also physically moving their data.

This engineering overhead has led many storage providers to natively support the S3 API, leveling the playing field and allowing companies to focus on picking the most cost-effective provider.

First-mile bandwidth costs and the Bandwidth Alliance

Cloudflare caches content in Points of Presence located in more than 200 cities around the world. This cached content is then handed to your Internet service provider (ISP) over low cost and often free Internet exchange connections in the same facility using mutual fibre optic cables. This cost saving is fairly well understood as the benefit of content delivery networks and has become highly commoditized.

What is less well understood is the first-mile cost of moving data from a storage provider to the content delivery network. Typically storage providers expect traffic to route via the Internet and will charge the consumer per-gigabyte of data transmitted. This is not the case for Cloudflare as we also share facilities and mutual fibre optic cables with many storage providers.

Backblaze B2 and the S3 Compatible API on Cloudflare

These shared interconnects created an opportunity to waive the cost of first-mile bandwidth between Cloudflare and many providers and is what prompted us to create the Bandwidth Alliance.

Media and entertainment companies serving user-generated content have a continuous supply of new content being moved over the first-mile from the storage provider to the content delivery network. The first-mile bandwidth cost adds up and using a Bandwidth Alliance partner such Backblaze can entirely eliminate it.

Using the S3 API in Cloudflare Workers

The Solutions Engineering team at Cloudflare is tasked with providing strategic technical guidance for our enterprise customers.

It’s not uncommon for developers to connect Cloudflare’s global network directly to their storage provider and directly serve content such as live and on-demand video without an intermediate web server.

For security purposes engineers typically use Cloudflare Workers to sign each uncached request using the S3 API. Cloudflare Workers allows anyone to deploy code to our global network of over 200+ Points of Presence in seconds and is built on top of Service Workers.

We’ve tested Backblaze B2’s S3 Compatible API in Cloudflare Workers using the same code tested for Amazon S3 buckets and it works perfectly by changing the target endpoint.

Creating a S3 Compatible Worker script

Here’s how it is done using Cloudflare Worker’s CLI tool Wrangler:

Generate a new project in Wrangler using a template intended for use with Amazon S3:

wrangler generate <projectname> https://github.com/obezuk/worker-signed-s3-template

This template uses aws4fetch. A fast, lightweight implementation of an S3 Compatible signing library that is commonly used in Service Worker environments like Cloudflare Workers.

The template creates an index.js file with a standard request signing implementation:

import { AwsClient } from 'aws4fetch'

const aws = new AwsClient({
    "accessKeyId": AWS_ACCESS_KEY_ID,
    "secretAccessKey": AWS_SECRET_ACCESS_KEY,
    "region": AWS_DEFAULT_REGION
});

addEventListener('fetch', function(event) {
    event.respondWith(handleRequest(event.request))
});

async function handleRequest(request) {
    var url = new URL(request.url);
    url.hostname = AWS_S3_BUCKET;
    var signedRequest = await aws.sign(url);
    return await fetch(signedRequest, { "cf": { "cacheEverything": true } });
}

Environment Variables

Modify your wrangler.toml file to use your Backblaze B2 API Key ID and Secret:

[env.dev]
vars = { AWS_ACCESS_KEY_ID = "<BACKBLAZE B2 keyId>", 
AWS_SECRET_ACCESS_KEY = "<BACKBLAZE B2 secret>", 
AWS_DEFAULT_REGION = "", 
AWS_S3_BUCKET = "<BACKBLAZE B2 bucketName>.<BACKBLAZE B2 S3 Endpoint>"}

AWS_S3_BUCKET environment variable will be the combination of your bucket name, period and S3 Endpoint. For a Backblaze B2 Bucket named example-bucket and S3 Endpoint s3.us-west-002.backblazeb2.com use example-bucket.s3.us-west-002.backblazeb2.com

AWS_DEFAULT_REGION environment variable is interpreted from your S3 Endpoint. I use us-west-002.

We recommend using Secret Environment variables to store your AWS_SECRET_ACCESS_KEY content when using this script in production.

Preview your Cloudflare Worker

Next run wrangler preview --env dev to enter a preview window of your Worker script. My bucket contained a static website containing adaptive streaming video content stored in a Backblaze B2 bucket.

Backblaze B2 and the S3 Compatible API on Cloudflare
Note: We permit caching of third party video content only for enterprise domains. Free/Pro/Biz users wanting to serve video content via Cloudflare may use Stream which delivers an end-to-end video delivery service.

Backblaze B2’s compatibility for the S3 API is an exciting update that has made their storage platform highly compatible with existing code bases and legacy systems. And, as a special offer to Cloudflare blog readers, Backblaze will pay the migration costs for transferring your data from S3 to Backblaze B2 (click here for more detail). With the cost of migration covered and compatibility for your existing workflows, it is now easier than ever to switch to a Bandwidth Alliance partner and save on first-mile costs. By doing so, you can slash your cloud bills, gain flexibility, and make no compromises to your performance.

To learn more, join us on May 14th for a webinar focused on getting you ultra fast worldwide content delivery.

Ready for changes with Hexagonal Architecture

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/ready-for-changes-with-hexagonal-architecture-b315ec967749

by Damir Svrtan and Sergii Makagon

As the production of Netflix Originals grows each year, so does our need to build apps that enable efficiency throughout the entire creative process. Our wider Studio Engineering Organization has built more than 30 apps that help content progress from pitch (aka screenplay) to playback: ranging from script content acquisition, deal negotiations and vendor management to scheduling, streamlining production workflows, and so on.

Highly integrated from the start

About a year ago, our Studio Workflows team started working on a new app that crosses multiple domains of the business. We had an interesting challenge on our hands: we needed to build the core of our app from scratch, but we also needed data that existed in many different systems.

Some of the data points we needed, such as data about movies, production dates, employees, and shooting locations, were distributed across many services implementing various protocols: gRPC, JSON API, GraphQL and more. Existing data was crucial to the behavior and business logic of our application. We needed to be highly integrated from the start.

Swappable data sources

One of the early applications for bringing visibility into our productions was built as a monolith. The monolith allowed for rapid development and quick changes while the knowledge of the space was non-existent. At one point, more than 30 developers were working on it, and it had well over 300 database tables.

Over time applications evolved from broad service offerings towards being highly specialized. This resulted in a decision to decompose the monolith to specific services. This decision was not geared by performance issues — but with setting boundaries around all of these different domains and enabling dedicated teams to develop domain-specific services independently.

Large amounts of the data we needed for the new app were still provided by the monolith, but we knew that the monolith would be broken up at some point. We were not sure about the timing of the breakup, but we knew that it was inevitable, and we needed to be prepared.

Thus, we could leverage some of the data from the monolith at first as it was still the source of truth, but be prepared to swap those data sources to new microservices as soon as they came online.

Leveraging Hexagonal Architecture

We needed to support the ability to swap data sources without impacting business logic, so we knew we needed to keep them decoupled. We decided to build our app based on principles behind Hexagonal Architecture and Uncle Bob’s Clean Architecture.

The idea of Hexagonal Architecture is to put inputs and outputs at the edges of our design. Business logic should not depend on whether we expose a REST or a GraphQL API, and it should not depend on where we get data from — a database, a microservice API exposed via gRPC or REST, or just a simple CSV file.

The pattern allows us to isolate the core logic of our application from outside concerns. Having our core logic isolated means we can easily change data source details without a significant impact or major code rewrites to the codebase.

One of the main advantages we also saw in having an app with clear boundaries is our testing strategy — the majority of our tests can verify our business logic without relying on protocols that can easily change.

Defining the core concepts

Leveraged from the Hexagonal Architecture, the three main concepts that define our business logic are Entities, Repositories, and Interactors.

  • Entities are the domain objects (e.g., a Movie or a Shooting Location) — they have no knowledge of where they’re stored (unlike Active Record in Ruby on Rails or the Java Persistence API).
  • Repositories are the interfaces to getting entities as well as creating and changing them. They keep a list of methods that are used to communicate with data sources and return a single entity or a list of entities. (e.g. UserRepository)
  • Interactors are classes that orchestrate and perform domain actions — think of Service Objects or Use Case Objects. They implement complex business rules and validation logic specific to a domain action (e.g., onboarding a production)

With these three main types of objects, we are able to define business logic without any knowledge or care where the data is kept and how business logic is triggered. Outside of the business logic are the Data Sources and the Transport Layer:

  • Data Sources are adapters to different storage implementations.
    A data source might be an adapter to a SQL database (an Active Record class in Rails or JPA in Java), an elastic search adapter, REST API, or even an adapter to something simple such as a CSV file or a Hash. A data source implements methods defined on the repository and stores the implementation of fetching and pushing the data.
  • Transport Layer can trigger an interactor to perform business logic. We treat it as an input for our system. The most common transport layer for microservices is the HTTP API Layer and a set of controllers that handle requests. By having business logic extracted into interactors, we are not coupled to a particular transport layer or controller implementation. Interactors can be triggered not only by a controller, but also by an event, a cron job, or from the command line.
The dependency graph in Hexagonal Architecture goes inward.

With a traditional layered architecture, we would have all of our dependencies point in one direction, each layer above depending on the layer below. The transport layer would depend on the interactors, the interactors would depend on the persistence layer.

In Hexagonal Architecture all dependencies point inward — our core business logic does not know anything about the transport layer or the data sources. Still, the transport layer knows how to use interactors, and the data sources know how to conform to the repository interface.

With this, we are prepared for the inevitable changes to other Studio systems, and whenever that needs to happen, the task of swapping data sources is easy to accomplish.

Swapping data sources

The need to swap data sources came earlier than we expected — we suddenly hit a read constraint with the monolith and needed to switch a certain read for one entity to a newer microservice exposed over a GraphQL aggregation layer. Both the microservice and the monolith were kept in sync and had the same data, reading from one service or the other produced the same results.

We managed to transfer reads from a JSON API to a GraphQL data source within 2 hours.

The main reason we were able to pull it off so fast was due to the Hexagonal architecture. We didn’t let any persistence specifics leak into our business logic. We created a GraphQL data source that implemented the repository interface. A simple one-line change was all we needed to start reading from a different data source.

With a proper abstraction it was easy to change data sources

At that point, we knew that Hexagonal Architecture worked for us.

The great part about a one-line change is that it mitigates risks to the release. It is very easy to rollback in the case that a downstream microservice failed on initial deployment. This as well enables us to decouple deployment and activation, as we can decide which data source to use through configuration.

Hiding data source details

One of the great advantages of this architecture is that we are able to encapsulate data source implementation details. We ran into a case where we needed an API call that did not yet exist — a service had an API to fetch a single resource but did not have bulk fetch implemented. After talking with the team providing the API, we realized this endpoint would take some time to deliver. So we decided to move forward with another solution to solve the problem while this endpoint was being built.

We defined a repository method that would grab multiple resources given multiple record identifiers — and the initial implementation of that method on the data source sent multiple concurrent calls to the downstream service. We knew this was a temporary solution and that the second take at the data source implementation was to use the bulk API once implemented.

Our business logic doesn’t need to be aware of specific data source limitations.

A design like this enabled us to move forward with meeting the business needs without accruing much technical debt or the need to change any business logic afterward.

Testing strategy

When we started experimenting with Hexagonal Architecture, we knew we needed to come up with a testing strategy. We knew that a prerequisite to great development velocity was to have a test suite that is reliable and super fast. We didn’t think of it as a nice to have, but a must-have.

We decided to test our app at three different layers:

  • We test our interactors, where the core of our business logic lives but is independent of any type of persistence or transportation. We leverage dependency injection and mock any kind of repository interaction. This is where our business logic is tested in detail, and these are the tests we strive to have most of.
  • We test our data sources to determine if they integrate correctly with other services, whether they conform to the repository interface, and check how they behave upon errors. We try to minimize the amount of these tests.
  • We have integration specs that go through the whole stack, from our Transport / API layer, through the interactors, repositories, data sources, and hit downstream services. These specs test whether we “wired” everything correctly. If a data source is an external API, we hit that endpoint and record the responses (and store them in git), allowing our test suite to run fast on every subsequent invocation. We don’t do extensive test coverage on this layer — usually just one success scenario and one failure scenario per domain action.

We don’t test our repositories as they are simple interfaces that data sources implement, and we rarely test our entities as they are plain objects with attributes defined. We test entities if they have additional methods (without touching the persistence layer).

We have room for improvement, such as not pinging any of the services we rely on but relying 100% on contract testing. With a test suite written in the above manner, we manage to run around 3000 specs in 100 seconds on a single process.

It’s lovely to work with a test suite that can easily be run on any machine, and our development team can work on their daily features without disruption.

Delaying decisions

We are in a great position when it comes to swapping data sources to different microservices. One of the key benefits is that we can delay some of the decisions about whether and how we want to store data internal to our application. Based on the feature’s use case, we even have the flexibility to determine the type of data store — whether it be Relational or Documents.

Uncle Bob said it great:

The purpose of a good architecture is to delay decisions. Why? Because when we delay a decision, we have more information when it comes time to make it.

At the beginning of a project, we have the least amount of information about the system we are building. We should not lock ourselves into an architecture with uninformed decisions leading to a project paradox.

The decisions we made make sense for our needs now and have enabled us to move fast. The best part of Hexagonal Architecture is that it keeps our application flexible for future requirements to come.


Ready for changes with Hexagonal Architecture was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Introducing Secrets and Environment Variables to Cloudflare Workers

Post Syndicated from John Donmoyer original https://blog.cloudflare.com/workers-secrets-environment/

Introducing Secrets and Environment  Variables to Cloudflare Workers

Introducing Secrets and Environment  Variables to Cloudflare Workers

The Workers team here at Cloudflare has been hard at work shipping a bunch of new features in the last year and we’ve seen some amazing things built with the tools we’ve provided. However, as my uncle once said, with great serverless platform growth comes great responsibility.

One of the ways we can help is by ensuring that deploying and maintaining your Workers scripts is a low risk endeavor. Rotating a set of API keys shouldn’t require risking downtime through code edits and redeployments and in some cases it may not make sense for the developer writing the script to know the actual API key value at all. To help tackle this problem, we’re releasing Secrets and Environment Variables to the Wrangler CLI and Workers Dashboard.

Supporting secrets

As we started to design support for secrets in Workers we had a sense that this was already a big concern for a lot of our users but we wanted to learn about all of the use cases to ensure we were building the right thing. We headed to the community forums, twitter, and the inbox of Louis Grace, business development representative extraordinaire, for some anecdotes about Secrets usage. We also sent out a survey to our existing users to learn about use cases and pain points.

We learned that even though there was already a way to store secrets without exposing them via Workers KV, the solution was not very intuitive, nor did it meet all the needs of our users. Many users didn’t even know we had an interim solution in place. Recognizing that we were not the first platform to encounter this problem, we surveyed the existing landscape of Platform as a Service offerings to get a better sense for what our users would expect of us.

Deciding on a solution

One of the first things we found was that not all environment variables are created equal. While the simplest use case for having a defined environment variable may be storing a piece of text that can be updated no matter where it is referenced in a script, sometimes those variables may have higher stakes associated with them. If you’re storing an API key that controls access to an important system, you may not want to allow anyone with dashboard access to see it, maybe not even the developers themselves.

With this in mind, we had to ensure the feature covered two different use cases: one for storing variables in plain text where you could see the variable being referenced and make edits to it and another where the variable would be encrypted as soon as you save it, never to be seen again. This way, we were able to serve both needs of our users, side by side, without one compromising for the other.

Testing our prototypes

Once we had a fairly good idea of what we wanted to build, we built some prototypes and rough implementations in staging environments so we would be able to perform some usability testing. We wrangled up some developers and observed them as they performed a series of tasks where they were asked to add some secrets and plain-text environment variables, reference them in one of their Workers, and bind their Worker to a Worker KV namespace.

Along the way we also asked questions to understand the developer’s professional background, familiarity with the product, and the use cases they’ve had for using Workers in the past along with any paint points they experienced.

While we were testing the new dashboard interface we also began testing the usability of the Wrangler CLI. We had Wrangler users perform the same tasks as the Workers dashboard users to help us find out if users are expecting different things out of their command-line tooling.

Findings and fixes

Through our testing we were able to make a number of changes before the final release. Some of the smaller changes included things like adjusting the behavior of form fields to ensure users knew which variable would be associated with each value. We also made larger changes like electing to separate the KV namespace bindings from the other environment variables as a way to emphasize that KV namespace bindings are not the keys and values themselves but a reference to a namespace where those keys are stored.

Cina, one of our engineers, put together a proposal to align some of our terminology with the terms that our developers were naturally using to describe their workflow. In Wrangler users were accustomed to referencing their KV namespaces by adding a KV namespace binding so when they came to the Workers dashboard interface and saw a field called “KV Variables” they were often confused, thinking they were adding keys and values to the namespace itself instead of establishing a variable that could be used to reference the namespace. As a fix, we decided to call it a “KV namespace binding” throughout the experience.

Try it out

Environment variables are available now with the Wrangler CLI and in the Workers Dashboard so go ahead and give them a shot today!

Introducing Secrets and Environment  Variables to Cloudflare Workers
Adding a secret with Wrangler
Introducing Secrets and Environment  Variables to Cloudflare Workers
Managing environment variables and KV bindings in the Workers Dashboard

As we continue to build out the Workers platform we’d love to hear from you. Let us know if you’re interested in participating in user research or just have something to say as we’d love to hear from you.

How we used our new GraphQL Analytics API to build Firewall Analytics

Post Syndicated from Nick Downie original https://blog.cloudflare.com/how-we-used-our-new-graphql-api-to-build-firewall-analytics/

How we used our new GraphQL Analytics API to build Firewall Analytics

How we used our new GraphQL Analytics API to build Firewall Analytics

Firewall Analytics is the first product in the Cloudflare dashboard to utilize the new GraphQL Analytics API. All Cloudflare dashboard products are built using the same public APIs that we provide to our customers, allowing us to understand the challenges they face when interfacing with our APIs. This parity helps us build and shape our products, most recently the new GraphQL Analytics API that we’re thrilled to release today.

By defining the data we want, along with the response format, our GraphQL Analytics API has enabled us to prototype new functionality and iterate quickly from our beta user feedback. It is helping us deliver more insightful analytics tools within the Cloudflare dashboard to our customers.

Our user research and testing for Firewall Analytics surfaced common use cases in our customers’ workflow:

  • Identifying spikes in firewall activity over time
  • Understanding the common attributes of threats
  • Drilling down into granular details of an individual event to identify potential false positives

We can address all of these use cases using our new GraphQL Analytics API.

GraphQL Basics

Before we look into how to address each of these use cases, let’s take a look at the format of a GraphQL query and how our schema is structured.

A GraphQL query is comprised of a structured set of fields, for which the server provides corresponding values in its response. The schema defines which fields are available and their type. You can find more information about the GraphQL query syntax and format in the official GraphQL documentation.

To run some GraphQL queries, we recommend downloading a GraphQL client, such as GraphiQL, to explore our schema and run some queries. You can find documentation on getting started with this in our developer docs.

At the top level of the schema is the viewer field. This represents the top level node of the user running the query. Within this, we can query the zones field to find zones the current user has access to, providing a filter argument, with a zoneTag of the identifier of the zone we’d like narrow down to.

{
  viewer {
    zones(filter: { zoneTag: "YOUR_ZONE_ID" }) {
      # Here is where we'll query our firewall events
    }
  }
}

Now that we have a query that finds our zone, we can start querying the firewall events which have occurred in that zone, to help solve some of the use cases we’ve identified.

Visualising spikes in firewall activity

It’s important for customers to be able to visualise and understand anomalies and spikes in their firewall activity, as these could indicate an attack or be the result of a misconfiguration.

Plotting events in a timeseries chart, by their respective action, provides users with a visual overview of the trend of their firewall events.

Within the zones field in the query we’ve created earlier, we can query our firewall event aggregates using the firewallEventsAdaptiveGroups field, providing arguments to limit the count of groups, a filter for the date range we’re looking for (combined with any user-entered filters), and a list of fields to order by; in this case, just the datetimeHour field that we’re grouping by.

Within the zones field in the query we created earlier, we can further query our firewall event aggregates using the firewallEventsAdaptiveGroups field and providing arguments for:

  • A limit for the count of groups
  • A filter for the date range we’re looking for (combined with any user-entered filters)
  • A list of fields to orderBy (in this case, just the datetimeHour field that we’re grouping by).

By adding the dimensions field, we’re querying for groups of firewall events, aggregated by the fields nested within dimensions. In this case, our query includes the action and datetimeHour fields, meaning the response will be groups of firewall events which share the same action, and fall within the same hour. We also add a count field, to get a numeric count of how many events fall within each group.

query FirewallEventsByTime($zoneTag: string, $filter: FirewallEventsAdaptiveGroupsFilter_InputObject) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      firewallEventsAdaptiveGroups(
        limit: 576
        filter: $filter
        orderBy: [datetimeHour_DESC]
      ) {
        count
        dimensions {
          action
          datetimeHour
        }
      }
    }
  }
}

Note – Each of our groups queries require a limit to be set. A firewall event can have one of 8 possible actions, and we are querying over a 72 hour period. At most, we’ll end up with 567 groups, so we can set that as the limit for our query.

This query would return a response in the following format:

{
  "viewer": {
    "zones": [
      {
        "firewallEventsAdaptiveGroups": [
          {
            "count": 5,
            "dimensions": {
              "action": "jschallenge",
              "datetimeHour": "2019-09-12T18:00:00Z"
            }
          }
          ...
        ]
      }
    ]
  }
}

We can then take these groups and plot each as a point on a time series chart. Mapping over the firewallEventsAdaptiveGroups array, we can use the group’s count property on the y-axis for our chart, then use the nested fields within the dimensions object, using action as unique series and the datetimeHour as the time stamp on the x-axis.

How we used our new GraphQL Analytics API to build Firewall Analytics

Top Ns

After identifying a spike in activity, our next step is to highlight events with commonality in their attributes. For example, if a certain IP address or individual user agent is causing many firewall events, this could be a sign of an individual attacker, or could be surfacing a false positive.

Similarly to before, we can query aggregate groups of firewall events using the firewallEventsAdaptiveGroups field. However, in this case, instead of supplying action and datetimeHour to the group’s dimensions, we can add individual fields that we want to find common groups of.

By ordering by descending count, we’ll retrieve groups with the highest commonality first, limiting to the top 5 of each. We can add a single field nested within dimensions to group by it. For example, adding clientIP will give five groups with the IP addresses causing the most events.

We can also add a firewallEventsAdaptiveGroups field with no nested dimensions. This will create a single group which allows us to find the total count of events matching our filter.

query FirewallEventsTopNs($zoneTag: string, $filter: FirewallEventsAdaptiveGroupsFilter_InputObject) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      topIPs: firewallEventsAdaptiveGroups(
        limit: 5
        filter: $filter
        orderBy: [count_DESC]
      ) {
        count
        dimensions {
          clientIP
        }
      }
      topUserAgents: firewallEventsAdaptiveGroups(
        limit: 5
        filter: $filter
        orderBy: [count_DESC]
      ) {
        count
        dimensions {
          userAgent
        }
      }
      total: firewallEventsAdaptiveGroups(
        limit: 1
        filter: $filter
      ) {
        count
      }
    }
  }
}

Note – we can add the firewallEventsAdaptiveGroups field multiple times within a single query, each aliased differently. This allows us to fetch multiple different groupings by different fields, or with no groupings at all. In this case, getting a list of top IP addresses, top user agents, and the total events.

How we used our new GraphQL Analytics API to build Firewall Analytics

We can then reference each of these aliases in the UI, mapping over their respective groups to render each row with its count, and a bar which represents the proportion of total events, showing the proportion of all events each row equates to.

Are these firewall events false positives?

After users have identified spikes, anomalies and common attributes, we wanted to surface more information as to whether these have been caused by malicious traffic, or are false positives.

To do this, we wanted to provide additional context on the events themselves, rather than just counts. We can do this by querying the firewallEventsAdaptive field for these events.

Our GraphQL schema uses the same filter format for both the aggregate firewallEventsAdaptiveGroups field and the raw firewallEventsAdaptive field. This allows us to use the same filters to fetch the individual events which summate to the counts and aggregates in the visualisations above.

query FirewallEventsList($zoneTag: string, $filter: FirewallEventsAdaptiveFilter_InputObject) {
  viewer {
    zones(filter: { zoneTag: $zoneTag }) {
      firewallEventsAdaptive(
        filter: $filter
        limit: 10
        orderBy: [datetime_DESC]
      ) {
        action
        clientAsn
        clientCountryName
        clientIP
        clientRequestPath
        clientRequestQuery
        datetime
        rayName
        source
        userAgent
      }
    }
  }
}

How we used our new GraphQL Analytics API to build Firewall Analytics

Once we have our individual events, we can render all of the individual fields we’ve requested, providing users the additional context on event they need to determine whether this is a false positive or not.

That’s how we used our new GraphQL Analytics API to build Firewall Analytics, helping solve some of our customers most common security workflow use cases. We’re excited to see what you build with it, and the problems you can help tackle.

You can find out how to get started querying our GraphQL Analytics API using GraphiQL in our developer documentation, or learn more about writing GraphQL queries on the official GraphQL Foundation documentation.

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Post Syndicated from Filipp Nisenzoun original https://blog.cloudflare.com/introducing-the-graphql-analytics-api-exactly-the-data-you-need-all-in-one-place/

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Today we’re excited to announce a powerful and flexible new way to explore your Cloudflare metrics and logs, with an API conforming to the industry-standard GraphQL specification. With our new GraphQL Analytics API, all of your performance, security, and reliability data is available from one endpoint, and you can select exactly what you need, whether it’s one metric for one domain or multiple metrics aggregated for all of your domains. You can ask questions like “How many cached bytes have been returned for these three domains?” Or, “How many requests have all the domains under my account received?” Or even, “What effect did changing my firewall rule an hour ago have on the responses my users were seeing?”

The GraphQL standard also has strong community resources, from extensive documentation to front-end clients, making it easy to start creating simple queries and progress to building your own sophisticated analytics dashboards.

From many APIs…

Providing insights has always been a core part of Cloudflare’s offering. After all, by using Cloudflare, you’re relying on us for key parts of your infrastructure, and so we need to make sure you have the data to manage, monitor, and troubleshoot your website, app, or service. Over time, we developed a few key data APIs, including ones providing information regarding your domain’s traffic, DNS queries, and firewall events. This multi-API approach was acceptable while we had only a few products, but we started to run into some challenges as we added more products and analytics. We couldn’t expect users to adopt a new analytics API every time they started using a new product. In fact, some of the customers and partners that were relying on many of our products were already becoming confused by the various APIs.

Following the multi-API approach was also affecting how quickly we could develop new analytics within the Cloudflare dashboard, which is used by more people for data exploration than our APIs. Each time we built a new product, our product engineering teams had to implement a corresponding analytics API, which our user interface engineering team then had to learn to use. This process could take up to several months for each new set of analytics dashboards.

…to one

Our new GraphQL Analytics API solves these problems by providing access to all Cloudflare analytics. It offers a standard, flexible syntax for describing exactly the data you need and provides predictable, matching responses. This approach makes it an ideal tool for:

  1. Data exploration. You can think of it as a way to query your own virtual data warehouse, full of metrics and logs regarding the performance, security, and reliability of your Internet property.
  2. Building amazing dashboards, which allow for flexible filtering, sorting, and drilling down or rolling up. Creating these kinds of dashboards would normally require paying thousands of dollars for a specialized analytics tool. You get them as part of our product and can customize them for yourself using the API.

In a companion post that was also published today, my colleague Nick discusses using the GraphQL Analytics API to build dashboards. So, in this post, I’ll focus on examples of how you can use the API to explore your data. To make the queries, I’ll be using GraphiQL, a popular open-source querying tool that takes advantage of GraphQL’s capabilities.

Introspection: what data is available?

The first thing you may be wondering: if the GraphQL Analytics API offers access to so much data, how do I figure out what exactly is available, and how I can ask for it? GraphQL makes this easy by offering “introspection,” meaning you can query the API itself to see the available data sets, the fields and their types, and the operations you can perform. GraphiQL uses this functionality to provide a “Documentation Explorer,” query auto-completion, and syntax validation. For example, here is how I can see all the data sets available for a zone (domain):

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

If I’m writing a query, and I’m interested in data on firewall events, auto-complete will help me quickly find relevant data sets and fields:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Querying: examples of questions you can ask

Let’s say you’ve made a major product announcement and expect a surge in requests to your blog, your application, and several other zones (domains) under your account. You can check if this surge materializes by asking for the requests aggregated under your account, in the 30 minutes after your announcement post, broken down by the minute:

{
 viewer { 
   accounts (filter: {accountTag: $accountTag}) {
     httpRequests1mGroups(limit: 30, filter: {datetime_geq: "2019-09-16T20:00:00Z", datetime_lt: "2019-09-16T20:30:00Z"}, orderBy: [datetimeMinute_ASC]) {
	  dimensions {
		datetimeMinute
	  }
	  sum {
		requests
	  }
	}
   }
 }
}

Here is the first part of the response, showing requests for your account, by the minute:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Now, let’s say you want to compare the traffic coming to your blog versus your marketing site over the last hour. You can do this in one query, asking for the number of requests to each zone:

{
 viewer {
   zones(filter: {zoneTag_in: [$zoneTag1, $zoneTag2]}) {
     httpRequests1hGroups(limit: 2, filter: {datetime_geq: "2019-09-16T20:00:00Z",
datetime_lt: "2019-09-16T21:00:00Z"}) {
       sum {
         requests
       }
     }
   }
 }
}

Here is the response:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

Finally, let’s say you’re seeing an increase in error responses. Could this be correlated to an attack? You can look at error codes and firewall events over the last 15 minutes, for example:

{
 viewer {
   zones(filter: {zoneTag: $zoneTag}) {
     httpRequests1mGroups (limit: 100,
filter: {datetime_geq: "2019-09-16T21:00:00Z",
datetime_lt: "2019-09-16T21:15:00Z"}) {
       sum {
         responseStatusMap {
           edgeResponseStatus
           requests
         }
       }
     }
    firewallEventsAdaptiveGroups (limit: 100,
filter: {datetime_geq: "2019-09-16T21:00:00Z",
datetime_lt: "2019-09-16T21:15:00Z"}) {
       dimensions {
         action
       }
       count
     }
    }
  }
}

Notice that, in this query, we’re looking at multiple datasets at once, using a common zone identifier to “join” them. Here are the results:

Introducing the GraphQL Analytics API: exactly the data you need, all in one place

By examining both data sets in parallel, we can see a correlation: 31 requests were “dropped” or blocked by the Firewall, which is exactly the same as the number of “403” responses. So, the 403 responses were a result of Firewall actions.

Try it today

To learn more about the GraphQL Analytics API and start exploring your Cloudflare data, follow the “Getting started” guide in our developer documentation, which also has details regarding the current data sets and time periods available. We’ll be adding more data sets over time, so take advantage of the introspection feature to see the latest available.

Finally, to make way for the new API, the Zone Analytics API is now deprecated and will be sunset on May 31, 2020. The data that Zone Analytics provides is available from the GraphQL Analytics API. If you’re currently using the API directly, please follow our migration guide to change your API calls. If you get your analytics using the Cloudflare dashboard or our Datadog integration, you don’t need to take any action.

One more thing….

In the API examples above, if you find it helpful to get analytics aggregated for all the domains under your account, we have something else you may like: a brand new Analytics dashboard (in beta) that provides this same information. If your account has many zones, the dashboard is helpful for knowing summary information on metrics such as requests, bandwidth, cache rate, and error rate. Give it a try and let us know what you think using the feedback link above the new dashboard.

What’s new with Workers KV?

Post Syndicated from Steve Klabnik original https://blog.cloudflare.com/whats-new-with-workers-kv/

What’s new with Workers KV?

What’s new with Workers KV?

The Storage team here at Cloudflare shipped Workers KV, our global, low-latency, key-value store, earlier this year. As people have started using it, we’ve gotten some feature requests, and have shipped some new features in response! In this post, we’ll talk about some of these use cases and how these new features enable them.

New KV APIs

We’ve shipped some new APIs, both via api.cloudflare.com, as well as inside of a Worker. The first one provides the ability to upload and delete more than one key/value pair at once. Given that Workers KV is great for read-heavy, write-light workloads, a common pattern when getting started with KV is to write a bunch of data via the API, and then read that data from within a Worker. You can now do these bulk uploads without needing a separate API call for every key/value pair. This feature is available via api.cloudflare.com, but is not yet available from within a Worker.

For example, say we’re using KV to redirect legacy URLs to their new homes. We have a list of URLs to redirect, and where they should redirect to. We can turn this list into JSON that looks like this:

[
  {
    "key": "/old/post/1",
    "value": "/new-post-slug-1"
  },
  {
    "key": "/old/post/2",
    "value": "/new-post-slug-2"
  }
]

And then POST this JSON to the new bulk endpoint, /storage/kv/namespaces/:namespace_id/bulk. This will add both key/value pairs to our namespace.

Likewise, if we wanted to drop support for these redirects, we could issue a DELETE that has this body:

[
    "/old/post/1",
    "/old/post/2"
]

to /storage/kv/namespaces/:namespace_id/bulk, and we’d delete both key/value pairs in a single call to the API.

The bulk upload API has one more trick up its sleeve: not all data is a string. For example, you may have an image as a value, which is just a bag of bytes. if you need to write some binary data, you’ll have to base64 the value’s contents so that it’s valid JSON. You’ll also need to set one more key:

[
  {
    "key": "profile-picture",
    "value": "aGVsbG8gd29ybGQ=",
    "base64": true
  }
]

Workers KV will decode the value from base64, and then store the resulting bytes.

Beyond bulk upload and delete, we’ve also given you the ability to list all of the keys you’ve stored in any of your namespaces, from both the API and within a Worker. For example, if you wrote a blog powered by Workers + Workers KV, you might have each blog post stored as a key/value pair in a namespace called “contents”. Most blogs have some sort of “index” page that lists all of the posts that you can read. To create this page, we need to get a listing of all of the keys, since each key corresponds to a given post. We could do this from within a Worker by calling list() on our namespace binding:

const value = await contents.list()

But what we get back isn’t only a list of keys. The object looks like this:

{
  keys: [
    { name: "Title 1” },
    { name: "Title 2” }
  ],
  list_complete: false,
  cursor: "6Ck1la0VxJ0djhidm1MdX2FyD"
}

We’ll talk about this “cursor” stuff in a second, but if we wanted to get the list of titles, we’d have to iterate over the keys property, and pull out the names:

const keyNames = value.keys.map(e => e.name)

keyNames would be an array of strings:

[“Title 1”, “Title 2”, “Title 3”, “Title 4”, “Title 5”]

We could take keyNames and those titles to build our page.

So what’s up with the list_complete and cursor properties? Well, imagine that we’ve been a very prolific blogger, and we’ve now written thousands of posts. The list API is paginated, meaning that it will only return the first thousand keys. To see if there are more pages available, you can check the list_complete property. If it is false, you can use the cursor to fetch another page of results. The value of cursor is an opaque token that you pass to another call to list:

const value = await NAMESPACE.list()
const cursor = value.cursor
const next_value = await NAMESAPACE.list({"cursor": cursor})

This will give us another page of results, and we can repeat this process until list_complete is true.

Listing keys has one more trick up its sleeve: you can also return only keys that have a certain prefix. Imagine we want to have a list of posts, but only the posts that were made in October of 2019. While Workers KV is only a key/value store, we can use the prefix functionality to do interesting things by filtering the list. In our original implementation, we had stored the titles of keys only:

  • Title 1
  • Title 2

We could change this to include the date in YYYY-MM-DD format, with a colon separating the two:

  • 2019-09-01:Title 1
  • 2019-10-15:Title 2

We can now ask for a list of all posts made in 2019:

const value = await NAMESAPCE.list({"prefix": "2019"})

Or a list of all posts made in October of 2019:

const value = await NAMESAPCE.list({"prefix": "2019-10"})

These calls will only return keys with the given prefix, which in our case, corresponds to a date. This technique can let you group keys together in interesting ways. We’re looking forward to seeing what you all do with this new functionality!

Relaxing limits

For various reasons, there are a few hard limits with what you can do with Workers KV. We’ve decided to raise some of these limits, which expands what you can do.

The first is the limit of the number of namespaces any account could have. This was previously set at 20, but some of you have made a lot of namespaces! We’ve decided to relax this limit to 100 instead. This means you can create five times the number of namespaces you previously could.

Additionally, we had a two megabyte maximum size for values. We’ve increased the limit for values to ten megabytes. With the release of Workers Sites, folks are keeping things like images inside of Workers KV, and two megabytes felt a bit cramped. While Workers KV is not a great fit for truly large values, ten megabytes gives you the ability to store larger images easily. As an example, a 4k monitor has a native resolution of 4096 x 2160 pixels. If we had an image at this resolution as a lossless PNG, for example, it would be just over five megabytes in size.

KV browser

Finally, you may have noticed that there’s now a KV browser in the dashboard! Needing to type out a cURL command just to see what’s in your namespace was a real pain, and so we’ve given you the ability to check out the contents of your namespaces right on the web. When you look at a namespace, you’ll also see a table of keys and values:

What’s new with Workers KV?

The browser has grown with a bunch of useful features since it initially shipped. You can not only see your keys and values, but also add new ones:

What’s new with Workers KV?

edit existing ones:

What’s new with Workers KV?

…and even upload files!

What’s new with Workers KV?

You can also download them:

What’s new with Workers KV?

As we ship new features in Workers KV, we’ll be expanding the browser to include them too.

Wrangler integration

The Workers Developer Experience team has also been shipping some features related to Workers KV. Specifically, you can fully interact with your namespaces and the key/value pairs inside of them.

For example, my personal website is running on Workers Sites. I have a Wrangler project named “website” to manage it. If I wanted to add another namespace, I could do this:

$ wrangler kv:namespace create new_namespace
Creating namespace with title "website-new_namespace"
Success: WorkersKvNamespace {
    id: "<id>",
    title: "website-new_namespace",
}

Add the following to your wrangler.toml:

kv-namespaces = [
    { binding = "new_namespace", id = "<id>" }
]

I’ve redacted the namespace IDs here, but Wrangler let me know that the creation was successful, and provided me with the configuration I need to put in my wrangler.toml. Once I’ve done that, I can add new key/value pairs:

$ wrangler kv:key put "hello" "world" --binding new_namespace
Success

And read it back out again:

> wrangler kv:key get "hello" --binding new_namespace
world

If you’d like to learn more about the design of these features, “How we design features for Wrangler, the Cloudflare Workers CLI” discusses them in depth.

More to come

The Storage team is working hard at improving Workers KV, and we’ll keep shipping new stuff every so often. Our updates will be more regular in the future. If there’s something you’d particularly like to see, please reach out!