Tag Archives: апи

Streamlining evidence collection with AWS Audit Manager

Post Syndicated from Nicholas Parks original https://aws.amazon.com/blogs/security/streamlining-evidence-collection-with-aws-audit-manager/

In this post, we will show you how to deploy a solution into your Amazon Web Services (AWS) account that enables you to simply attach manual evidence to controls using AWS Audit Manager. Making evidence-collection as seamless as possible minimizes audit fatigue and helps you maintain a strong compliance posture.

As an AWS customer, you can use APIs to deliver high quality software at a rapid pace. If you have compliance-focused teams that rely on manual, ticket-based processes, you might find it difficult to document audit changes as those changes increase in velocity and volume.

As your organization works to meet audit and regulatory obligations, you can save time by incorporating audit compliance processes into a DevOps model. You can use modern services like Audit Manager to make this easier. Audit Manager automates evidence collection and generates reports, which helps reduce manual auditing efforts and enables you to scale your cloud auditing capabilities along with your business.

AWS Audit Manager uses services such as AWS Security Hub, AWS Config, and AWS CloudTrail to automatically collect and organize evidence, such as resource configuration snapshots, user activity, and compliance check results. However, for controls represented in your software or processes without an AWS service-specific metric to gather, you need to manually create and provide documentation as evidence to demonstrate that you have established organizational processes to maintain compliance. The solution in this blog post streamlines these types of activities.

Solution architecture

This solution creates an HTTPS API endpoint, which allows integration with other software development lifecycle (SDLC) solutions, IT service management (ITSM) products, and clinical trial management system (CTMS) solutions that capture trial process change amendment documentation (in the case of pharmaceutical companies that use AWS to build robust pharmacovigilance solutions). The endpoint can also be a backend microservice to an application that allows contract research organization (CRO) investigators to add their compliance supporting documentation.

In this solution’s current form, you can submit an evidence file payload along with the assessment and control details to the API, and the solution will tie all the information together for the audit report. This post and solution are directed towards engineering teams who are looking for a way to accelerate evidence collection. To maximize the effectiveness of this solution, your engineering team will also need to collaborate with cross-functional groups, such as audit and business stakeholders, to design a process and service that constructs and sends the message(s) to the API and to scale out usage across the organization.

To download the code for this solution, and the configuration that enables you to set up auto-ingestion of manual evidence, see the aws-audit-manager-manual-evidence-automation GitHub repository.

Architecture overview

In this solution, you use AWS Serverless Application Model (AWS SAM) templates to build the solution and deploy to your AWS account. See Figure 1 for an illustration of the high-level architecture.

Figure 1. The architecture of the AWS Audit Manager automation solution

The SAM template creates resources that support the following workflow:

  1. A client can call an Amazon API Gateway endpoint by sending a request that includes assessment details and the evidence payload.
  2. An AWS Lambda function implements the API to handle the request.
  3. The Lambda function uploads the evidence to an Amazon Simple Storage Service (Amazon S3) bucket (3a) and uses AWS Key Management Service (AWS KMS) to encrypt the data (3b).
  4. The Lambda function also initializes the AWS Step Functions workflow.
  5. Within the Step Functions workflow, a Standard Workflow calls two Lambda functions. The first looks for a matching control within an assessment, and the second updates the control within the assessment with the evidence (a sketch of this call is shown after the list).
  6. When the Step Functions workflow concludes, it sends a notification for success or failure to subscribers of an Amazon Simple Notification Service (Amazon SNS) topic.
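
To make step 5 concrete, here is a minimal Python (boto3) sketch of the Audit Manager call the second Lambda function could use to attach the uploaded file as manual evidence. The IDs and the S3 path are placeholders resolved by earlier steps, and the actual Lambda implementation in the repository may differ.

import boto3

auditmanager = boto3.client("auditmanager")

def attach_manual_evidence(assessment_id, control_set_id, control_id, s3_uri):
    """Attach an evidence file that was already uploaded to S3 to a control."""
    return auditmanager.batch_import_evidence_to_assessment_control(
        assessmentId=assessment_id,
        controlSetId=control_set_id,
        controlId=control_id,
        manualEvidence=[{"s3ResourcePath": s3_uri}],
    )

# Hypothetical usage; the IDs come from the lookup function, the path from step 3a:
# attach_manual_evidence("<assessment-id>", "<control-set-id>", "<control-id>",
#                        "s3://<DOC-EXAMPLE-BUCKET>/evidence/change-record.pdf")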

Deploy the solution

The project available in the aws-audit-manager-manual-evidence-automation GitHub repository contains source code and supporting files for a serverless application you can deploy with the AWS SAM command line interface (CLI). It includes the following files and folders:

  • src: Code for the application’s Lambda implementations of the Step Functions workflow. It also includes a Step Functions definition file.
  • template.yml: A template that defines the application’s AWS resources.

Resources for this project are defined in the template.yml file. You can update the template to add AWS resources through the same deployment process that updates your application code.

Prerequisites

This solution assumes the following:

  1. AWS Audit Manager is enabled.
  2. You have already created an assessment in AWS Audit Manager.
  3. You have the necessary tools to use the AWS SAM CLI (see details in the table that follows).

For more information about setting up Audit Manager and selecting a framework, see Getting started with Audit Manager in the blog post AWS Audit Manager Simplifies Audit Preparation.

The AWS SAM CLI is an extension of the AWS CLI that adds functionality for building and testing Lambda applications. The AWS SAM CLI uses Docker to run your functions in an Amazon Linux environment that matches Lambda. It can also emulate your application’s build environment and API.

To use the AWS SAM CLI, you need the following tools:

  • AWS SAM CLI: Install the AWS SAM CLI
  • Node.js: Install Node.js 14, including the npm package management tool
  • Docker: Install Docker Community Edition

To deploy the solution

  1. Open your terminal and use the following command to create a folder to clone the project into, then navigate to that folder. Be sure to replace <FolderName> with your own value.

    mkdir Desktop/<FolderName> && cd $_

  2. Clone the project into the folder you just created by using the following command.

    git clone https://github.com/aws-samples/aws-audit-manager-manual-evidence-automation.git

  3. Navigate into the newly created project folder by using the following command.

    cd aws-audit-manager-manual-evidence-automation

  4. In the AWS SAM shell, use the following command to build the source of your application.

    sam build

  5. In the AWS SAM shell, use the following command to package and deploy your application to AWS. Be sure to replace <DOC-EXAMPLE-BUCKET> with your own unique S3 bucket name.

    sam deploy --guided --parameter-overrides paramBucketName=<DOC-EXAMPLE-BUCKET>

  6. When prompted, enter the AWS Region where AWS Audit Manager was configured. For the rest of the prompts, leave the default values.
  7. To activate the IAM authentication feature for API Gateway, override the default value by passing the following parameter override during deployment.

    paramUseIAMwithGateway=AWS_IAM

To test the deployed solution

After you deploy the solution, run an invocation like the one below for an assessment (using curl). Be sure to replace <YOURAPIENDPOINT> and <AWS REGION> with your own values.

curl --location --request POST \
'https://<YOURAPIENDPOINT>.execute-api.<AWS REGION>.amazonaws.com/Prod' \
--header 'x-api-key: ' \
--form 'payload=@"<PATH TO FILE>"' \
--form 'AssessmentName="GxP21cfr11"' \
--form 'ControlSetName="General requirements"' \
--form 'ControlIdName="11.100(a)"'

Check to see that your file is correctly attached to the control for your assessment.

Form-data interface parameters

The API implements a form-data interface that expects four parameters:

  1. AssessmentName: The name for the assessment in Audit Manager. In this example, the AssessmentName is GxP21cfr11.
  2. ControlSetName: The display name for a control set within an assessment. In this example, the ControlSetName is General requirements.
  3. ControlIdName: The name of a particular control within a control set. In this example, the ControlIdName is 11.100(a).
  4. payload: The file representing the evidence to be uploaded.

As a refresher of Audit Manager concepts, evidence is collected for a particular control. Controls are grouped into control sets. Control sets can be grouped into a particular framework. The assessment is considered an implementation, or an instance, of the framework. For more information, see AWS Audit Manager concepts and terminology.
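
For illustration, here is a hedged Python sketch of how a client could send the same multipart request programmatically, mirroring the curl example above. It assumes the endpoint placeholder is filled in and that IAM authentication is not enabled (with AWS_IAM enabled, the request would additionally need to be SigV4-signed).

import requests

API_URL = "https://<YOURAPIENDPOINT>.execute-api.<AWS REGION>.amazonaws.com/Prod"  # placeholder

def submit_evidence(file_path):
    """POST the evidence file and assessment details as multipart form-data."""
    with open(file_path, "rb") as evidence:
        response = requests.post(
            API_URL,
            headers={"x-api-key": ""},  # populate if an API key is configured
            files={"payload": evidence},
            data={
                "AssessmentName": "GxP21cfr11",
                "ControlSetName": "General requirements",
                "ControlIdName": "11.100(a)",
            },
            timeout=30,
        )
    response.raise_for_status()
    return response.text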

To clean up the deployed solution

To clean up the solution, use the following commands to delete the AWS CloudFormation stack and your S3 bucket. Be sure to replace <YourStackId> and <DOC-EXAMPLE-BUCKET> with your own values.

aws cloudformation delete-stack --stack-name <YourStackId>
aws s3 rb s3://<DOC-EXAMPLE-BUCKET> --force

Conclusion

This solution provides a way to allow for better coordination between your software delivery organization and compliance professionals. This allows your organization to continuously deliver new updates without overwhelming your security professionals with manual audit review tasks.

Next steps

There are various ways to extend this solution.

  1. Update the API Lambda implementation to be a webhook for your favorite software development lifecycle (SDLC) or IT service management (ITSM) solution.
  2. Modify the steps within the Step Functions state machine to more closely match your unique compliance processes.
  3. Use AWS CodePipeline to start Step Functions state machines natively, or integrate a variation of this solution with any continuous compliance workflow that you have.

Learn more about AWS Audit Manager, DevOps, and AWS for Health, and start building!

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Nicholas Parks

Nicholas has been using AWS since 2010 across various enterprise verticals including healthcare, life sciences, financial, retail, and telecommunications. Nicholas focuses on modernizations in pursuit of new revenue as well as application migrations. He specializes in Lean, DevOps cultural change, and Continuous Delivery.

Brian Tang

Brian Tang is an AWS Solutions Architect based out of Boston, MA. He has 10 years of experience helping enterprise customers across a wide range of industries complete digital transformations by migrating business-critical workloads to the cloud. His core interests include DevOps and serverless-based solutions. Outside of work, he loves rock climbing and playing guitar.

Exclusive documents on #APIGate: Those implicated in the in-house deals have been reappointed to head Avtomagistrali and the Road Infrastructure Agency (API)

Post Syndicated from Николай Марченко original https://bivol.bg/%D0%B7%D0%B0%D0%BC%D0%B5%D1%81%D0%B5%D0%BD%D0%B8-%D0%B2-%D0%B8%D0%BD%D1%85%D0%B0%D1%83%D1%81%D0%B8%D1%82%D0%B5-%D1%81%D0%B0-%D0%BF%D1%80%D0%B5%D0%BD%D0%B0%D0%B7%D0%BD%D0%B0%D1%87%D0%B5%D0%BD%D0%B8.html

Friday, 4 February 2022


Deputy Prime Minister and Minister of Regional Development Grozdan Karadzhov, from the “There Is Such a People” quota, has appointed people implicated in “in-house” deals for landslide stabilization and the Hemus motorway to the management board…

Cloudflare Partner Program Now Supports SASE & Zero Trust Managed Services

Post Syndicated from Matthew Harrell original https://blog.cloudflare.com/zero-trust-managed-services/

The importance of the Cloudflare Partner Network was on full display in 2021, with record-level partner growth, and we are aiming even higher in 2022. We’ve been listening to our partners and working to constantly strengthen our ability to deliver value for businesses of all types. One area we identified where we could do better is a program to support “service partners” that want to wrap managed and professional services around Cloudflare products. Today, we are excited to announce the next evolution of the Cloudflare Channel and Alliances Partner Program, specifically enabling partners that provide services around Cloudflare products to build recurring revenue streams as they equip businesses of all sizes and types with Cloudflare’s leading Zero Trust and SASE solutions.

Core to enabling Services Partners are some exciting enhancements:

  • New Program Paths
  • New Managed Services Partner (MSP) Accreditation
  • New Support & Go-To-Market Motions

New Program Paths

We have seen a 29% increase in ransom DDoS attacks over the past year and a 175% increase just last quarter. Partners continue to be on the front lines helping mitigate and prevent disruption from these events as they extend our services. Our goal for 2022 is to arm our partners with the tools to address these increased threats to their customers’ infrastructure and applications, like Office365. We know the painstaking efforts that Services Partners make to build compelling offerings and provide the best customer outcomes. To that end, we’ve enhanced the Cloudflare Channel and Alliances Partner Program to not only focus on extending security, performance, and reliability to more customers, but provide a services-first approach for partners. We are committing to the growth of MSP & MSSP partners who share Cloudflare’s mission of helping to build a better Internet.

We’ve been building out the Cloudflare Partner Network for years, working alongside businesses of all sizes and types, including our worldwide system integrator partners like Rackspace, as well as partners across EMEA and APAC such as Blazeclan, and Kingsoft in China, to name a few. Working with enthusiastic, targeted partners has given us the opportunity to deliver a robust yet tailored program that supports our partners’ ever-evolving needs, from network transformations and application modernization to Zero Trust solutions and onwards.

“Rackspace offers expert consultative services and 24×7 managed support based on Cloudflare’s web application security, secure Internet access, gateway and browsing technologies. With Rackspace Elastic Engineering for Security and Cloudflare’s cutting-edge cloud solutions, we’ll be able to help our customers fast-forward to a modern, Zero Trust approach to securing their teams, applications and infrastructure.”
Gary Alterson, Vice President of Security Services at Rackspace Technology

“We are excited to participate in Cloudflare’s new Services Partner Program.  Cloudflare provides the advanced cloud technology at the center of digital business models.  As a Cloudflare MSP Elite partner, we are able to leverage their innovative cloud solutions with embedded, network edge security to give our customers single-pane-of-glass reporting and policy management across internet applications.”
Veeraj Thaploo, CTO, Blazeclan technologies

Advanced Managed Services Partners

The entry point for partners that are providing managed IT services for small to midsize organizations who are fanatical about providing the best customer experience by leveraging Cloudflare’s extensive platform. Advanced MSP Partners will have access to tools that will help embed Cloudflare into their services offerings to deliver unique, outcome-based SASE and Zero Trust solutions.

Elite Managed Services Partners

Reserved for global partners that possess resource-intense engagement in IT consulting, advisory, and large scale services in the Large Enterprise segment. Elite MSP Partners will have access to dedicated technical and sales resources as well as a named Partner Success Manager to guide growth and business development around their Cloudflare-based services practice. In addition, these partners will have access to executive leadership and marketing programs to develop joint go-to-market functions.

Services-Only Partnerships

Invite-only for partners to deliver general professional services, highly technical services such as Application Modernization via Cloudflare Workers development, Implementation and Migration Services, and Security Assessments.

New Managed Services Partner Accreditation (AMSP)

Helping build a better Internet isn’t simple. Providing managed services for a wide range of customer types and sizes requires a targeted but ever-evolving approach. To support the new Services Program, we’ve introduced a new Certification course that provides the foundational tools for creating the Cloudflare customer experience. Course highlights include best practices for services design, implementation, and providing ongoing technical and relationship management. Services Partners provide customers with a seamless experience from implementation to configuration to ongoing services; here are some of the highlights:

  • New Enablement path for MSP Partners – L1/ L2 Support + Account Management
  • New Customer Success Enablement path for Services Partners
  • New Specialist Accreditation for Cloudflare Zero Trust

New Support & Go-To-Market Motions

Technical Services Managers & Partner Success

Regional resources dedicated to helping partners develop, drive, and expand their Cloudflare-based services offerings, specifically centered around SASE and Zero Trust.

Tenant API Provisioning

Our Partner Platform was built to be extensible, and we’re excited to announce that all of our partners can support the provisioning and deployment of a leading Zero Trust suite of products.

MSP-only Licensing Plan

MSPs and MSSPs deliver finished goods and need cost-efficient ways to deliver customer outcomes. We’ve heard this from our partners, and we are developing a licensing plan that enables MSPs to sell their services powered by the Cloudflare One platform.

Transform your Customer’s Network

We’ve been working diligently to build the pilot program throughout this past year, and are eager to work with the next cohort of service partners that want to help build a better Internet for customers. We’re just getting started in providing a tailored and targeted foundation for the Cloudflare Partner Ecosystem.

Cloudflare Services Partner Offering Menu

Come join us!

During the initial launch phase, we are working with a select group of partners, and are looking forward to expanding and growing with even more partnerships. Our goal is to enable partners that are creating cloud-based security solutions and that want to join us, to further our mission to help build a better Internet. If you believe your organization would be a great fit for this program, we would love to chat with you.

More Information:

Become a Partner: Partner Portal or reach out to [email protected].

Zabbix meets television – Clever use of Zabbix features by Wolfgang Alper / Zabbix Summit Online 2021

Post Syndicated from Wolfgang Alper original https://blog.zabbix.com/zabbix-meets-television-clever-use-of-zabbix-features-by-wolfgang-alper-zabbix-summit-online-2021/19181/

TV broadcasting infrastructures have seen many great paradigm shifts over the years. From TV to live streaming – the underlying architecture consists of many moving parts supplied by different vendors and solutions. Any potential problems can cause critical downtimes, which are simply not acceptable. Let’s look at how Zabbix fits right into such a dynamic and ever-changing environment.

The full recording of the speech is available on the official Zabbix Youtube channel.

In this post, I will talk about how Zabbix is used in ZDF – Zweites Deutsches Fernsehen (Second German Television). I will specifically focus on the most unique and interesting use cases, and I hope that you will be able to use this knowledge in your next project.

ZDF – Some history

Before we move on with our unique use cases, I would like to introduce you to the history of ZDF. This will help you understand the scope and the potential complexity and scale of the underlying systems and company policies.

  • In 1961, the federal states established a central non-profit television broadcaster – Zweites Deutsches Fernsehen
  • On April 1, 1963, ZDF officially went on air, reaching 61 percent of television viewers
  • On the Internet, a selection of programs is offered via live stream or video-on-demand through the ZDFmediathek, which has been in existence since 2001
  • Since February 2013, ZDF has been broadcasting its programs around the clock as an internet live stream
  • As of today, ZDF is one of the largest public broadcasters in Europe with permanent bureaus worldwide and is also present on various platforms like Youtube, Facebook, etc.

Here we can see that over the years, ZDF has made some major leaps – from a television broadcaster with a majority share of viewers, to offering an on-demand video service, to running 24/7 internet live streams. ZDF has also scaled up its presence across multiple digital platforms as well as its physical presence all over the globe.

Integrating Zabbix with an external infrastructure monitoring system

In our first use case, we will cover integrating Zabbix with an external infrastructure monitoring system. As opposed to monitoring IT metrics like hard drive space, memory usage, or CPU loads – this external system is responsible for monitoring devices like power generators, transmission stations, and other similar components. The idea was to pass the states of these components to Zabbix. This way, Zabbix would serve as a central “Umbrella” monitoring system.

In addition, the components that are monitored by the external system have states and severities, but the severities are not static and can vary depending on the monitored component. What this means is that each component could generate problems of varying severities. We had to figure out a way to assign the correct severities to each of the external components. Our approach was split into multiple steps:

  • Use Zabbix built-in HTTP check to get LLD discovery data
    • The external monitoring system provides an API, which we can use to obtain the necessary LLD information by using the HTTP checks
    • Zabbix-sender was used for testing since the HTTP items support receiving data from it
  • Use Zabbix built-in HTTP check as a collector to obtain the component status metrics
  • Define item prototypes as dependent items to extract data from the collector item
  • Create “smart” trigger prototypes to respect severity information from the LLD data

The JSON below is an example of the LLD data that we are receiving from the external monitoring systems. In addition to component names, descriptions, and categories, we are also providing the severity information. The severities that have a value of -1 are not used, while other severities are cross-checked with the status value retrieved from the returned metrics:

{
  "{#NAME}": "generator-secondary",
  "{#DISPLAYNAME}": "Secondary power generator",
  "{#DESCRIPTION}": "Secondary emergency power generator",
  "{#CATEGORY}": "Powersupply",
  "{#PRIORITY.INFORMATION}": -1,
  "{#PRIORITY.WARNING}": -1,
  "{#PRIORITY.AVERAGE}": -1,
  "{#PRIORITY.HIGH}": 1,
  "{#PRIORITY.DISASTER}": 2
}

Below we can see the returned metrics – the component name and its current status. For example, the value status = 1 corresponds to {#PRIORITY.HIGH} in the LLD JSON data.

"generator-primary": {
"status": 0,
"message": "Generator is healthy."
},
"generator-secondary": {
"status": 1,
"message": "Generator is not working properly."
},

We can see that the first generator returns status = 0, which means that the generator is healthy and there are no problems, while the secondary generator is currently not working properly – status = 1 and should generate a problem with severity High.

Below we can see how the item prototypes are created for each of the components – one item prototype collects the message information, while the other collects the current status of the component. We use JSONPath preprocessing to obtain these values from our master item.

As for the trigger prototypes – we have defined a trigger prototype for each of the trigger severities. The trigger prototypes will then create triggers depending on the information contained in the LLD macros for a given component.

As you can see, the trigger expressions are also quite simple – each trigger simply checks if the last received component status matches the specific trigger threshold status value.

The resulting metrics provide us both the status value and the component status message. As we can see, the triggers are also generating problems with dynamic severities.

Improving the solution with LLD overrides

The solution works – but we can do better! You might have already guessed the underlying issue with this approach: our LLD rule creates triggers for every severity, even if it isn’t used. The threshold value for these unused triggers will use value -1, which we will never receive, so the unused triggers will always stay in the OK state. Effectively – we have created 5 trigger definitions, while in our example, we require only 2 triggers.

How can we resolve this? Thankfully, Zabbix provides just the right tool for the job – LLD Overrides! We have created 5 overrides on our discovery rule – one for each severity:

In the override conditions, we will specify that if the value contained in the priority LLD macros is equal to -1, we will not be discovering the trigger of the specific severity.

The final result looks much cleaner – now we have only two trigger definitions instead of five. 

 

This is a good example of how we can use LLD together with master items obtaining data from external APIs and also improve the LLD logic by using LLD overrides.

“Sphinx” application monitoring using Graylog REST API

For our second example, we will be monitoring the Sphinx application by using the Graylog REST API. Graylog is a log management tool that we use for log collection – it is not used for any kind of alerting. We also have an application called Sphinx, which consists of three components – a Web component, an App component, and a WCF Gateway component. Our goal here is to:

  • Use Zabbix for evaluating error messages related to Sphinx from Graylog
  • Monitor the number of errors in user-defined time intervals for different components and alert when a threshold is exceeded
  • Analyze the incoming error messages and prepare them for a user-friendly output sorted by error types

The main challenges posed by this use-case are:

  • How to obtain Sphinx component information from Graylog
  • How to handle certificate problems (DH_KEY_TOO_SMALL / Diffie-Hellman key) due to an outdated version of the installed Graylog server
  • How to sort the error messages coming in “Free form” without explicit error types

Collecting the data from Graylog

Since the Graylog application used in the current scenario was outdated, we had to work around the certificate issues by using the Zabbix external check item type. Once again, we will be using master and dependent item logic – we will create three master items (one for each component) and retrieve the component data. All additional information will be retrieved by the dependent items so as not to cause extra performance impact by flooding the Graylog API endpoint. The data itself was parsed and sorted by using Javascript preprocessing. The dependent item prototypes are used here to create the items for the obtained stats and the data used for visualizing each error type on a user-friendly dashboard.

Let’s take a look at the detailed workflow for this use case:

  • An External check for scanning the Graylog stream Sphinx App Raw
  • A dependent item which analyzes and filters the raw data by using preprocessing Sphinx App Raw Filtered
  • This dependent item is used as a master item for our LLD Sphinx App Error LLD
  • The same dependent item is also used as a master item for our item prototypes – Sphinx App Error count and Sphinx App Error List

Effectively this means that we perform only a single call to the Graylog API, and all of the heavy lifting is done by the dependent item in the middle of our workflow.
The following workflow is used to obtain the information only about the App component – remember, we have two other components where this will have to be implemented – Web and Gateway.

In total, we will have three master items, one for each of the application components:

They will use the following shell script to execute the REST API call to the Graylog API:

graylog2zabbix.sh[{$GRAYLOG_USERNAME},{$GRAYLOG_PASSWORD},{HOST.CONN},{$GRAYLOG_PORT},search/universal/relative?query=name%3Asphinx-app%20AND%20stage%3Aproduction%20AND%20level%3A(ERROR%20OR%20FATAL)&range=1800&limit=50&filter=streams%3A60000a8c1c09f9862279966e&fields=name%2Clevel%2Cmessage&decorate=true]
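
For readers less familiar with the Graylog API, the following Python sketch approximates the REST call that this script wraps. The /api path prefix and basic authentication are assumptions that depend on the Graylog version, and the workaround for the weak Diffie-Hellman key (for example, lowering the OpenSSL security level) is omitted, so treat it as an illustration rather than a replacement for the external check.

import requests

def graylog_search(host, port, username, password):
    """Query Graylog's relative-time universal search for Sphinx app errors."""
    url = f"https://{host}:{port}/api/search/universal/relative"  # '/api' prefix assumed
    params = {
        "query": "name:sphinx-app AND stage:production AND level:(ERROR OR FATAL)",
        "range": 1800,  # look back 30 minutes
        "limit": 50,
        "filter": "streams:60000a8c1c09f9862279966e",
        "fields": "name,level,message",
        "decorate": "true",
    }
    response = requests.get(url, params=params, auth=(username, password), timeout=10)
    response.raise_for_status()
    return response.json()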

The data that we obtain this way is extremely hard to work with without any additional processing. It very much looks like a set of regular log entries – this complicates the execution of any kind of logic in reaction to receiving this kind of data:

For this reason, we have created a dependent item, which uses preprocessing to filter and sort this data. The dependent item preprocessing is responsible for:

  • Analyzing the error messages
  • Defining the error type
  • Sorting the raw data so that we can work with it more easily

We have defined two preprocessing steps to process this data. We have the JSONPath preprocessing step to select the message from the response and a Javascript preprocessing script that does the heavy lifting. You can see the Javascript script below. It uses Regex and performs data preparation and sorting. In the last line, you can see that the data is transformed back into JSON, so we can work with it down the line by using the JSONpath preprocessing steps for our dependent items.
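
The actual preprocessing runs as Javascript inside Zabbix, but the grouping idea can be illustrated with a short Python sketch; the regular expression and the fallback type name here are hypothetical stand-ins rather than the production rules.

import json
import re

def group_by_error_type(messages):
    """Group raw log messages under a derived error type and return JSON."""
    grouped = {}
    for message in messages:
        match = re.search(r"\b([A-Za-z]+(?:Exception|Error|Timeout))\b", message)
        error_type = match.group(1) if match else "Unclassified"
        grouped.setdefault(error_type, []).append(message)
    # Returned as JSON so dependent items can extract values with JSONPath
    return json.dumps(grouped)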

Below we can see the result. The data stream has been sorted and arranged by error types, which you can see on the left-hand side. All of the logged messages are now children that belong to one of these error types.

We have also created  3 LLD rules – one for each component. These LLD rules create items for each error type for each component. To achieve this, there is also some additional JSONPath and Javascript preprocessing done on the LLD rule itself:

The end result is a dashboard that uses the collected information to display the error count per component. Attached to the graph, we can see some additional details regarding the log messages related to the detected errors.

Monitoring of TV broadcast trucks

I would like to finish up this post by talking about a completely different use case – monitoring of TV broadcast trucks!

In comparison to the previous use cases – the goals and challenges here are quite unique. We are interested in a completely different set of metrics and have to utilize a different approach to obtain them. Our goals are:

  • Monitor several metrics from different systems used in the TV broadcast truck
  • Monitor the communication availability and quality between the broadcast truck and the transmitting station
  • Only monitor the broadcast truck when it is in use

One of the main challenges for this use case is avoiding false alarms. How can we avoid false positives if a broadcast truck can be put into operation at any time without notifying the monitoring team? The end goal is to monitor the truck when it’s in use and stop monitoring it when it’s not in use.

  • Each broadcast truck is represented by a host in Zabbix – this way, we can easily put it into maintenance
  • A control host is used to monitor the connection states of all broadcasting trucks
  • We decided on creating a middleware application that would be able to implement start/stop monitoring logic
    • This was achieved by switching the maintenance on/off by using the Zabbix API
  • A specific application in the broadcasting truck then tells Zabbix how long to monitor it and when to enable the maintenance for the said truck

Below we can see the truck monitoring workflow. The truck control host gets the status for each truck to decide when to start monitoring the truck. The middleware then starts/stops the monitoring of a truck by using Zabbix API to control the maintenance periods for the trucks. Once a truck is in service, it also passes the monitoring duration to the middleware, so the middleware can decide when the monitoring of a specific truck should be turned off.
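
To make the middleware’s role more concrete, here is a minimal Python sketch of the kind of Zabbix API (JSON-RPC) calls it could issue to toggle a truck’s maintenance window. The endpoint, token, and exact maintenance parameters are assumptions (they vary between Zabbix versions), so this is a sketch of the approach rather than the production middleware.

import time
import requests

ZABBIX_API = "https://zabbix.example.com/api_jsonrpc.php"  # hypothetical endpoint
API_TOKEN = "replace-with-api-token"                       # hypothetical API token

def zabbix_call(method, params):
    """Minimal JSON-RPC helper; the 'auth' field works on Zabbix 5.x/6.x."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params,
               "auth": API_TOKEN, "id": 1}
    response = requests.post(ZABBIX_API, json=payload, timeout=10)
    response.raise_for_status()
    body = response.json()
    if "error" in body:
        raise RuntimeError(body["error"])
    return body["result"]

def start_monitoring(maintenance_id):
    """Truck goes on air: close its maintenance window so problems alert again."""
    now = int(time.time())
    return zabbix_call("maintenance.update",
                       {"maintenanceid": maintenance_id,
                        "active_since": now - 3600, "active_till": now})

def stop_monitoring(maintenance_id):
    """Monitoring duration elapsed: re-open the maintenance window."""
    now = int(time.time())
    return zabbix_call("maintenance.update",
                       {"maintenanceid": maintenance_id,
                        "active_since": now, "active_till": now + 365 * 86400})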

Next, let’s look at the truck control workflow from the Zabbix side.

  • Each broadcast truck is represented by a single trigger on the control host
    • The trigger actions forward the information that the truck maintenance period should be disabled to the middleware
  • Middleware uses the Zabbix API to disable the maintenance for the specific truck
  • The truck is now monitored
  • The truck forwards the Monitoring duration to the middleware
  • Once the monitoring duration is over, the middleware enables the maintenance for the specific truck

Finally, the trucks are displayed on a map which can be placed on our dashboards. The map displays whether the truck is in maintenance (not active) and whether it has any problems. This way, we can easily monitor our broadcast truck fleet.

From gathering data from external systems to performing complex data transformations with preprocessing and monitoring our whole fleet of broadcast trucks – I hope you found these use cases useful and were able to learn a thing or two about the flexibility of different Zabbix features!

The post Zabbix meets television – Clever use of Zabbix features by Wolfgang Alper / Zabbix Summit Online 2021 appeared first on Zabbix Blog.

Landscape of API Traffic

Post Syndicated from Daniele Molteni original https://blog.cloudflare.com/landscape-of-api-traffic/

In recent years we have witnessed an explosion of Internet-connected applications. Whether it is a new mobile app to find your soulmate, the latest wearable to monitor your vitals, or an industrial solution to detect corrosion, our life is becoming packed with connected systems.

How is the Internet changing because of this shift? This blog provides an overview of how Internet traffic is evolving as Application Programming Interfaces (APIs) have taken centre stage among communication technologies. With help from the Cloudflare Radar team, we have harnessed the data from our global network to provide this snapshot of global APIs in 2021.

The huge growth in API traffic comes at a time when Cloudflare has been introducing new technologies that protect applications from nascent threats and vulnerabilities. The release of API Shield with API Discovery, Schema Validation, mTLS and API Abuse Detection has provided customers with a set of tools designed to protect their applications and data based on how APIs work and their challenges.

We are also witnessing increased adoption of new protocols. Among encryption protocols, for example, TLS v1.3 has become the most used protocol for APIs on Cloudflare while, for transport protocols, we saw an uptake of QUIC and gRPC (Cloudflare support announced in 2018 and 2020 respectively).

In the following sections we will quantify the growth of APIs and identify key industries affected by this shift. We will also look at the data to better understand the source and type of traffic we see on our network including how much malicious traffic our security systems block.

Why is API use exploding?

By working closely with our customers and observing the broader trends and data across our network in application security, we have identified three main trends behind API adoption: how applications are built is changing, API-first businesses are thriving, and finally machine-to-machine and human-to-machine communication is evolving.

During the last decade, APIs became popular because they allowed developers to separate backend and frontend, thus creating applications with better user experience. The Jamstack architecture is the most recent trend highlighting this movement, where technologies such as JavaScript, APIs and markup are being used to create responsive and high-performance applications. The growth of microservices and serverless architectures is another driver behind using efficient HTTP-powered application interfaces.

APIs are also enabling companies to innovate their business models. Across many industries there is a trend of modularizing complex processes by integrating self-contained workflows and operations. The product has become the service delivered via APIs, allowing companies to scale and monetize their new capabilities. Financial Services is a prime example where a monolithic industry with vertically integrated service providers is giving way to a more fragmented landscape. The new Open Banking standard (PSD2) is an example of how small companies can provide modular financial services that can be easily integrated into larger applications. Companies like TrueLayer have productized APIs, allowing e-commerce organizations to onboard new sellers to a marketplace within seconds or to deliver more efficient payment options for their customers. A similar shift is happening in the logistics industry as well, where Shippo allows the same e-commerce companies to integrate with services to initiate deliveries, print labels, track goods and streamline the returns process. And of course, everything is powered by APIs.

Finally, the increase of connected devices such as wearables, sensors and robots are driving more APIs, but another aspect of this is the way manual and repetitive tasks are being automated. Infrastructure-as-Code is an example of relying on APIs to replace manual processes that have been used to manage Internet Infrastructure in the past. Cloudflare is itself a product of this trend as our solutions allow customers to use services like Terraform to configure how their infrastructure should work with our products.

Labelling traffic

The data presented in the following paragraphs is based on the total traffic proxied by Cloudflare and traffic is classified according to the Content-Type header generated in the response phase. Only requests returning a 200 response were included in the analysis except for the analysis in the ‘Security’ section where other error codes were included. Traffic generated by identified bots is not included.

When looking at trends, we compare data from the first week of February 2021 to the first week of December 2021. We chose these dates to compare how traffic changed over the year but excluding January which is affected by the holiday season.

Specifically, API traffic is labelled based on responses with types equal application/json, application/xml, and text/xml, while Web accounts for text/html, application/x-javascript, application/javascript, text/css, and text/javascript. Requests categorised as Text are text/plain; Binary are application/octet-stream; Media includes all image types, video and audio.

Finally, Other catches everything that doesn’t clearly fall into the labels above, which includes empty and unknown. Part of this traffic might be API and the categorisation might be missing due to the client or server not adding a Content-Type header.
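
As a simplified illustration of these labelling rules (not the actual pipeline, which classifies proxied traffic at scale), a classifier along these lines maps a response Content-Type header to one of the categories above:

API_TYPES = {"application/json", "application/xml", "text/xml"}
WEB_TYPES = {"text/html", "application/x-javascript", "application/javascript",
             "text/css", "text/javascript"}

def classify(content_type):
    """Map a response Content-Type header to one of the traffic labels."""
    base_type = (content_type or "").split(";")[0].strip().lower()
    if base_type in API_TYPES:
        return "API"
    if base_type in WEB_TYPES:
        return "Web"
    if base_type == "text/plain":
        return "Text"
    if base_type == "application/octet-stream":
        return "Binary"
    if base_type.startswith(("image/", "video/", "audio/")):
        return "Media"
    return "Other"  # includes empty and unknown Content-Type values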

API use in 2021

We begin by examining the current state of API traffic at our global network and the types of content served. During the first week of December 2021, API calls represented 54% of total requests, up from 52% during the first week of February 2021.

When looking at individual data types, API was by far the fastest-growing data type (+21%) while Web only grew by 10%. Media (such as images and videos) grew just shy of 15%, while Binary was the only traffic type that, in aggregate, experienced a reduction (of 6%).

In summary, APIs have been one of the drivers of the traffic growth experienced by the Cloudflare network in 2021. APIs account for more than half of the total traffic generated by end users and connected devices, and they’re growing twice as fast as traditional web traffic.

New industries are contributing to this increase

We analysed where this growth comes from in terms of industry and application types. When looking at the total volume of API traffic, unsurprisingly the general Internet and Software industry accounts for almost 40% of total API traffic in 2021. The second-largest industry in terms of size is Cryptocurrency (7% of API traffic) followed by Banking and Retail (6% and 5% of API traffic respectively).

The following chart orders industries according to their API traffic growth. Banking, Retail and Financial Services have experienced the largest year-on-year growth with 70%, 51% and 50% increases since February 2021, respectively.

The growth of Banking and Financial Services traffic is aligned with the trends we have observed anecdotally in the sector. The industry has seen the entrance of a number of new platforms that aggregate accounts from different providers, streamline transactions, or allow investing directly from apps, all of which rely heavily on APIs. The new “challenger banks” movement is an example where newer startups are offering captivating mobile services based on APIs while putting pressure on larger institutions to modernise their infrastructure and applications.

A closer look at the API characteristics

Generally speaking, a RESTful API request is a call to invoke a function. It includes the address of a specific resource (the endpoint) and the action you want to perform on that resource (method). A payload might be present to carry additional data and HTTP headers might be populated to add information about the origin of the call, what software is requesting data, requisite authentication credentials, etc. The method (or verb) expresses the action you want to perform, such as retrieve information (GET) or update information (POST).

It’s useful to understand the composition and origin of API traffic, such as the most commonly used methods, the most common protocol used to encode the payload, or what service generates traffic (like Web, mobile apps, or IoT). This information will help us identify the macro source of vulnerabilities and design and deploy the best tools to protect traffic.

Methods

The vast majority of API traffic is the result of POST or GET requests (98% of all requests). POST itself accounts for 53.4% of all requests and GET 44.4%. Generally speaking, GET tends to transfer sensitive data in the HTTP request header, query and in the response body, while POST typically transfers data in the request header and body. While many security tools apply to both of these types of calls, this distinction can be useful when deploying tools such as API Schema Validation (request and response) or Data Loss Prevention/Sensitive Data Detection (response), both launched by Cloudflare in March 2021.

Payload encoding review

API payloads encode data using different rules and languages that are commonly referred to as transport protocols. When looking at the breakdown between two of the most common protocols, JSON has by far the largest number of requests (~97%), while XML has a smaller share of requests but still carries the heaviest payloads. In the following figure, JSON and XML are compared in terms of response sizes. XML is the most verbose protocol and the one handling the largest payloads, while JSON is more compact and results in smaller payloads.

Since we have started supporting gRPC (September 2020), we have seen a steady increase in gRPC traffic and many customers we speak with are in the planning stages of migrating from JSON to gRPC, or designing translation layers at the edge from external JSON callers to internal gRPC services.

Source of API traffic

We can look at the HTTP request headers to better understand the origin and intended use of the API. The User-Agent header allows us to identify what type of client made the call, and we can divide it into three broader groups: “browser”, “non-browser” and “unknown” (which indicates that the User-Agent header was not set).

About 38% of API calls are made by browsers as part of a web application built on top of backend APIs. Here, the browser loads an HTML page and populates dynamic fields by generating AJAX API calls against the backend service. This paradigm has become the de-facto standard as it provides an effective way to build dynamic yet flexible Web applications.

The next 56% comes from non-browsers, including mobile apps and IoT devices with a long tail of different types (wearables, connected sport equipment, gaming platforms and more). Finally, approximately 6% are “unknown” and since well-behaving browsers and tools like curl send a User-Agent by default, one could attribute much of this unknown to programmatic or automated tools, some of which could be malicious.

Encryption

A key aspect of securing APIs against snooping and tampering is encrypting the session. Clients use SSL/TLS to authenticate the server they are connecting with, for example, by making sure it is truly their cryptocurrency vendor. The benefit of transport layer encryption is that after handshaking, all application protocol bytes are encrypted, providing both confidentiality and integrity assurances.

Cloudflare launched the latest version of TLS (v1.3) in September 2016, and it was enabled by default on some properties in May 2018. When looking at API traffic today, TLS v1.3 is the most adopted protocol with 55.9% of traffic using it. The vulnerable v1.0  and v1.1 were deprecated in March 2021 and their use has virtually disappeared.

Transport security protocol, December 2021:

  • TLS 1.3: 55.9%
  • TLS 1.2: 32.7%
  • QUIC: 8.4%
  • None: 2.8%
  • TLS 1.0: 0.3%

The protocol that is growing fastest is QUIC. While QUIC can be used to carry many types of application protocols, Cloudflare has so far focused on HTTP/3, the mapping of HTTP over IETF QUIC. We started supporting draft versions of QUIC in 2018 and when QUIC version 1 was published as RFC 9000 in May 2021, we enabled it for everyone the next day. QUIC uses the TLS 1.3 handshake but has its own mechanism for protecting and securing packets. Looking at HTTP-based API traffic, we see HTTP/3 going from less than 3% in early February 2021 to more than 8% in December 2021. This growth broadly aligns with the publication of RFC 9000 and with HTTP/3 support being stabilized and enabled in a range of client implementations over that period.

Mutual TLS, which is often used for mobile or IoT devices, accounts for 0.3% of total API traffic. Since we released the first version of mTLS in 2017, we’ve seen a growing number of inquiries from users across all Cloudflare plans, as we have recently made it easier for customers to start using mTLS with Cloudflare API Shield. Customers can now use the Cloudflare dashboard to issue and manage certificates with one click, avoiding all the complexity of having to manage a Private Key Infrastructure and root certificates themselves.

Finally, unencrypted traffic can provide a great opportunity for attackers to access plain communications. The total unencrypted API traffic dropped from 4.6% of total requests in early 2021 to 2.6% in December 2021. This represents a significant step forward in establishing basic security for all API connections.

Security

Given the huge amount of traffic that Cloudflare handles every second, we can look for trends in blocked traffic and identify common patterns in threats or attacks.

When looking at the Cloudflare security systems, an HTML request is twice as likely to be blocked as an API request. Successful response codes (200, 201, 301 and 302) account for 91% of HTML and 97% of API requests, while 4XX error codes (like 400, 403, 404) are generated for 2.8% of API calls as opposed to 7% of HTML. Calls returning 5XX codes (such as Internal Server Error, Bad Gateway, Service Unavailable) are almost nonexistent for APIs (less than 0.2% of calls), while they account for almost 2% of HTML requests.

The relatively larger volume of unmitigated API requests can be explained by the automated nature of APIs; for example, many API calls may be generated to render a page that would otherwise require a single HTML request. Malicious or malformed requests are therefore diluted in a larger volume of calls generated by well-behaving automated systems.

We can further analyse the frequency of specific error codes to get a sense of what the most frequent malformed (and possibly malicious) requests are. In the following figure, we plot the share of a particular error code when compared to all 4XXs.

We can identify three groups of issues all equally likely (excluding the more obvious “404 Not Found” case): “400 Bad Request” (like malformed, invalid request), “429 Too Many Requests” (“Rate Limiting”), and the combination of Authentication and Authorization issues (“403 Forbidden” and “401 Unauthorized”). Those codes are followed by a long tail of other errors, including “422 Unprocessable Entity”, “409 Conflict”, and “402 Payment Required”.

This analysis confirms that the most common attacks rely on sending non-compliant requests, brute force efforts (24% of generated 4XXs are related to rate limiting), and accessing resources with invalid authentication or permission.

We can further analyse the reason why calls were blocked (especially relative to the 400s codes) by looking at what triggered the Cloudflare WAF. The OWASP and the Cloudflare Managed Ruleset are tools that scan incoming traffic looking for fingerprints of known vulnerabilities (such as SQLi, XSS, etc.) and they can provide context on what attack was detected.

A portion of the blocked traffic has triggered a managed rule for which we can identify the threat category. Although a malicious request can match multiple categories, the WAF assigns it to the first threat that is identified. User-Agent anomaly is the most common reason why traffic is blocked. This is usually triggered by a missing or malformed User-Agent header, capturing requests that do not provide enough credible information on what type of client has sent the request. The next most common threat is cross-site scripting. After these two categories, there is a long tail of other anomalies that were identified.

Conclusions

More than one out of two requests we process is an API call, and industries such as Banking, Retail and Financial Services are leading in terms of adoption and growth.

Furthermore, API calls are growing twice as fast as HTML traffic, making it an ideal candidate for new security solutions aimed at protecting customer data.

Maximum redirects, minimum effort: Announcing Bulk Redirects

Post Syndicated from Sam Marsh original https://blog.cloudflare.com/maximum-redirects-minimum-effort-announcing-bulk-redirects/

The Internet is a dynamic place. Websites are constantly changing as technologies and business practices evolve. What was front-page news is quickly moved into a sub-directory. To ensure website visitors continue to see the correct webpage even if it has been moved, administrators often implement URL redirects.

A URL redirect is a mapping from one location on the Internet to another, effectively telling the visitor’s browser that the location of the page has changed, and where they can now find it. This is achieved by providing a virtual ‘link’ between the content’s original and new location.

URL Redirects have typically been implemented as Page Rules within Cloudflare, up to a maximum of 125 URL redirects per zone. This limitation meant customers with a need for more URL redirects had to implement alternative solutions such as Cloudflare Workers to achieve their goals.

To simplify the management and implementation of URL redirects at scale we have created Bulk Redirects. Bulk Redirects is a new product that allows an administrator to upload and enable hundreds of thousands of URL redirects within minutes, without having to write a single line of code.

We’ve moved!

Mail forwarding is a product offered by postal services such as USPS and Royal Mail that allows you to continue to receive letters and parcels even if they are sent to an address where you no longer reside.

The postal services achieve this by effectively maintaining a register of your new location and your old location. This allows the systems to detect ‘this letter is for Sam Marsh at address A, but he now lives at address B, therefore send the mail there’.

Photo by Christian Lue on Unsplash

This problem can be solved by manually updating my bank, online shops, etc. and having them send the parcels and letters directly to my new address. However, that assumes I know of every person and business who has my address. And it also relies on those people to remember I moved address and make updates on their side. For example, Grandma Marsh might have forgotten about my new address — I’ve moved a lot — and she may send my birthday card to my old address. Or all those Christmas cards from people who I don’t speak with regularly. Those will go to my old address also.  To solve this, I can use mail forwarding to ensure I still receive my cards and other mail, even though I no longer live at that address.

URL redirects are the Internet equivalent of mail forwarding.

URL redirects are effectively a table with two columns; what traffic am I looking for, and where should I send that traffic to? This mapping allows an administrator to define “whenever visitors go to https://www.cloudflare.com/bots I want to redirect them to the new location https://www.cloudflare.com/pg-lp/bot-mitigation-fight-mode”.

With this technology, our sales and marketing teams can use the vanity URL all across the Internet, safe in the knowledge that should the backend systems change they won’t need to go to all the places this URL has been posted and update it. Instead, the intermediary system that handles the URL redirects can be updated. One location. Not thousands.

Why use URL redirects?

URL redirects are used to solve a number of use cases. One such common use case is to use URL redirects to force all visitors to connect to the website over a secure HTTPS connection, instead of via plain HTTP, to improve security. It’s such a common use case we created a toggle in the Cloudflare dashboard, “Always use HTTPS”, which redirects all HTTP requests to HTTPS when enabled.

URL redirects are also used for vanity domains and hyperlinks. In these scenarios, URL redirects are deployed to provide a mapping of short, user-friendly URLs to long, server-friendly URLs. Not only are shorter URLs more memorable, but they also score better from an SEO perspective. According to Backlinko, ‘Excessively long URLs may hurt a page’s search engine visibility. In fact, several industry studies have found that short URLs tend to have a slight edge in Google’s search results.’

Photo by Hal Gatewood on Unsplash

Another use case is where a company may have a local domain for each of their markets, which they want to redirect back to the main website, e.g., redirect www.example.fr and www.example.de to www.example.com/eu/fr and www.example.com/eu/de, respectively.

This also covers company acquisition, where a company is acquired and the acquiring company wants to redirect hyperlinks to the relevant pages on their own website, e.g., redirect www.example.com to www.companyB.com/portfolio/example.

Finally, one of the most common use cases for URL redirects is to maintain uptime during a website migration. As companies migrate their websites from one platform to another, or one domain to another, URL redirects ensure visitors continue to see the correct content. Without these URL redirects, hyperlinks in emails, blogs, marketing brochures, etc. would fail to load, potentially costing the business revenue in lost sales and brand damage. For example, www.example.com/products/golf/product-goes-here would redirect to the new website at products.example.com/golf/product-goes-here.

How are URL redirects implemented today?

Ensuring these URL redirects are executed correctly is often the job of the reverse proxy — a server which sits between the client and the origin whose job is, amongst many others, to re-route received traffic to the correct destination.

For example, when using NGINX, a popular web server, the administrator would have a line in the config similar to the one below to implement a URL redirect:

`rewrite ^/oldpage$ http://www.example.com/newpage permanent;`

Historically, these web servers were located physically within a company’s data center. Administrators then had full control over the URLs received, and could create the redirect rules as and when needed.

As the world rapidly migrates on-premise applications and solutions to the cloud, administrators can find themselves in a situation where they can no longer do what they previously could. Not being responsible for the origin has a number of benefits, but it also comes with drawbacks such as lack of ‘control’. Previously, an administrator could quickly add a few config lines to the web server in front of their ecommerce platform. Moving to an online hosted platform makes this much more difficult to do.

As such, administrators have moved to platforms like Cloudflare where functionality such as URL redirects can be implemented in the cloud without the need to have administrator access to the origin.

The first way to implement a URL Redirect in Cloudflare is via a Forwarding URL Page Rule. Users can create a Page Rule which matches on a specific URL and redirects matching traffic to another specific URL, along with a status code — either a permanent redirect (301) or a temporary redirect (302):

Another method is to use Cloudflare Workers to implement URL redirects, either individually or as a map. For example, the code below is used to create a URL redirect map which runs when the Worker is invoked:

const redirectMap = new Map([
 ["/bulk1", "https://" + externalHostname + "/redirect2"],
 ["/bulk2", "https://" + externalHostname + "/redirect3"],
 ["/bulk3", "https://" + externalHostname + "/redirect4"],
 ["/bulk4", "https://google.com"],
])

This snippet is taken from the Cloudflare Workers examples library and can be used to scale beyond the 125 URL redirect limit of Page Rules. However, it does require the administrator to be comfortable working with code and correctly configuring their Cloudflare Workers.
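
For context, the rest of such a Worker is roughly the following sketch: it looks up the request path in redirectMap and issues a permanent redirect on a match, otherwise passing the request through to the origin (externalHostname is assumed to be defined elsewhere in the script):

async function handleRequest(request) {
  const requestURL = new URL(request.url);
  // Look the request path up in the redirect map defined above
  const location = redirectMap.get(requestURL.pathname);
  if (location) {
    // Match found: return a 301 to the mapped destination
    return Response.redirect(location, 301);
  }
  // No redirect configured for this path: fetch from the origin as usual
  return fetch(request);
}

addEventListener("fetch", event => {
  event.respondWith(handleRequest(event.request));
});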

Introducing: Bulk Redirects

Speaking with Cloudflare users about URL redirects and their experience with our product offerings, “Give me a product which lets me upload thousands of URL redirects to Cloudflare via a GUI” was a very common request. Customers we interviewed typically wanted a simple way to upload a list of ‘from,to,response code’ without having to write a single line of code. And that’s what we are announcing today.

Bulk Redirects is now available for all Cloudflare plans. It is an account/organization-level product capable of supporting hundreds of thousands of URL redirects, all configured via the dashboard without having to write a single line of code.

The system is implemented in two parts. The first part is the Bulk Redirect List. This is effectively the redirect map, or ‘edge dictionary’, where users can upload their URL redirects:

Each URL redirect within the list contains three main elements. The first two elements are Source URL (the URL we are looking for) and Target URL (the URL we are going to redirect matching traffic to).

There is also the Status code. This is the ‘type’ of redirect. In addition to 301 (Moved Permanently) and 302 (Moved Temporarily) redirects, we have added support for the newer 307 (Temporary Redirect) and 308 (Permanent Redirect) redirect status codes.

We have also added support for specifying destination ports within the Target URL field, allowing URL redirects to non-standard ports, e.g., “Target URL: www.example.com:8443”.

If you have many URL redirects, you can upload them via a CSV file.

There are also four additional parameters available for each individual URL redirect.

Firstly, we have added two options to replace the ambiguity and confusion caused by the use of asterisks as wildcards. Take this source URL as an example: *.example.com/a/b. Would you expect www.example.com/a/b to match? How about example.com/a/b, or www.example.com/path*? Asterisks used as wildcards cause confusion and misunderstanding, and also increase the cost of implementation and maintenance from an engineering perspective. Therefore, we are not implementing them in Bulk Redirects.

Instead, we have added two discrete options: Include subdomains and Subpath matching. The Include subdomains option, once enabled, will match all subdomains to the left of the domain portion of the URL as well as the domain specified. For example, if there is a URL redirect with a source URL of example.com/a then traffic to b.example.com/a and c.b.example.com/a will also be redirected.

The Subpath matching option focuses on the opposite end of the URL. If this option is enabled, the redirect applies to the URL as well as all its subpaths. For example, if we have a URL redirect on www.example.com/foo with subpath matching enabled, we will match on that specific URL as well as all subpaths, e.g., www.example.com/foo/a, www.example.com/foo/a/, www.example.com/foo/a/b/c, etc., but not www.example.com/foobar.

These options provide a tremendous amount of flexibility and granularity for each URL redirect. However, for most use cases only the source URL, target URL, and status code options will need to be set.

Secondly, we have added two options relating to retaining portions of the original HTTP request: Preserve path suffix and Preserve query string. If subpath matching is enabled, Preserve path suffix can be used to copy the URI path from the originally requested URL and add it to the destination URL. For example, if there is a URL redirect of Source URL: example.co.uk, Target URL: www.example.com/a, then requests to example.co.uk/target will be redirected to www.example.com/a/target with both options enabled. Preserve query string can be used independently of the other options, and carries forward the URI query from the originally requested URL to the new URL.
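
As a mental model only (this is not Cloudflare’s implementation, and the rule field names below are made up for illustration), the way those two options compose the final destination can be sketched like this:

function buildDestination(requestUrl, rule) {
  const request = new URL(requestUrl);           // e.g. https://example.co.uk/target?sort=asc
  const destination = new URL(rule.targetUrl);   // e.g. https://www.example.com/a

  if (rule.subpathMatching && rule.preservePathSuffix) {
    // Copy the part of the requested path that follows the matched source path
    const suffix = request.pathname.slice(rule.sourcePath.length).replace(/^\/+/, "");
    if (suffix) {
      destination.pathname = destination.pathname.replace(/\/+$/, "") + "/" + suffix;
    }
  }

  if (rule.preserveQueryString) {
    // Carry the original query string over to the new URL
    destination.search = request.search;
  }

  return destination.toString();
  // Example from the text: a request to example.co.uk/target with target URL
  // www.example.com/a and both options enabled yields www.example.com/a/target
}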

Lists by themselves do not provide any redirection; they are simply the ‘lookup table’. To enable them we need to reference them via a Bulk Redirect Rule.

The rules themselves are very simple. By default, the user experience is to provide a name for the rule, a description, and select the Bulk Redirect List that should be invoked.

For users who require more granularity and control there are additional settings available under the Advanced options toggle. Within this section there are two editable sections:  Expression and Key.

The first field, Expression, specifies the conditions that must be met in order for the rule to run. By default, all URL redirects of the specified list will apply.

The second field, Key, is closely related to the expression. The key is used in combination with the specified list to select the URL redirect to apply. The field used for the key should always be the same as the field used in the expression, i.e., the key should be http.request.full_uri if the field in the expression is http.request.full_uri, or conversely, the key should be raw.http.request.full_uri if the field in the expression is raw.http.request.full_uri.

There are two main use cases for modifying these settings. Firstly, users can edit these options to increase specificity in the trigger, e.g., ip.src.country == "GB" and http.request.full_uri in $redirect_list. This is useful for ensuring Bulk Redirect Lists are only applied when a visitor comes from specific countries, subnets, or ASNs — or also only applying a URL redirect list if the visitor is a verified bot, or the bot score is >35.

Secondly, users can edit these options to amend the URL being matched and used as a lookup in the given list, i.e., the user may choose to have URL redirects in their list(s) specifically for URLs that would be normalized, e.g., URLs containing specific percent-encoding.  To ensure these URL redirects still trigger, the settings in Advanced options should be used to edit the expression and key to use the raw.http.request.full_uri field instead.

Automating via the API

Another way to manage bulk redirects is via our API. Customers wishing to automate the addition of bulk redirects can use the API to either simply add URL redirects to an existing list, or automate the entire workflow — creating a list, adding URL redirects to the list, and enabling the list via a new redirect rule.

There are three main calls when creating bulk redirects via the API:

  1. Create the redirect list
  2. Load with URL redirects
  3. Enable via a rule (You will also need to create the ruleset if doing this for the first time).

For step 1, first create a mass redirect list via the API call:

curl --location --request POST 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/rules/lists' \
--header 'X-Auth-Email: <EMAIL_ADDRESS>' \
--header 'Content-Type: application/json' \
--header 'X-Auth-Key: <API_KEY>' \
--data-raw '{
 "name": "my_redirect_list_2",
 "description": "My redirect list 2",
 "kind": "redirect"
}'

The output will look similar to:

{
  "result": {
    "id": "499b94da726d4dbc9ce6bf6c96ef8b6a",
    "name": "my_redirect_list_2",
    "description": "My redirect list 2",
    "kind": "redirect",
    "num_items": 0,
    "num_referencing_filters": 0,
    "created_on": "2021-12-04T06:43:43Z",
    "modified_on": "2021-12-04T06:43:43Z"
  },
  "success": true,
  "errors": [],
  "messages": []
}

Capture the value of “id”, as this is the list id we will then add URL redirects to.

Next, in step 2 we will add URL redirects to our newly created list by executing a POST call to the id we captured previously – with our URL redirects in the body:

curl --location --request POST 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/rules/lists/<LIST_ID>/items' \
--header 'X-Auth-Email: <EMAIL_ADDRESS>' \
--header 'Content-Type: application/json' \
--header 'X-Auth-Key: <API_KEY>' \
--data-raw '[
 {
   "redirect": {
     "source_url": "www.example.com/a",
     "target_url": "https://www.example.net/a"
   }
 },
 {
   "redirect": {
     "source_url": "www.example.com/b",
     "target_url": "https://www.example.net/a/b",
     "status_code": 307,
     "include_subdomains": true
   }
 },
 {
   "redirect": {
     "source_url": "www.example.com/c",
     "target_url": "www.example.net/c",
     "status_code": 307,
     "include_subdomains": true
   }   
 }
]'

The output will look similar to:

{
  "result": {
    "operation_id": "491ab6411acf4a12a6c72df1385b095a"
  },
  "success": true,
  "errors": [],
  "messages": []
}

In step 3 we enable this list by creating a new mass redirect rule within the mass redirect account-level ruleset.

Note, if this is the first time you are creating a redirect rule you will need to use a different API call to create the ruleset. See the documentation here for more details. All subsequent updates to the rulesets are made by calls similar to below.

Firstly, we need to find our account-level ruleset’s ID. To do this we need to get a list of all account-level rulesets and look for the ruleset with the phase http_request_redirect:

curl --location --request GET 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/rulesets' \
--header 'X-Auth-Email: <EMAIL_ADDRESS>' \
--header 'Content-Type: application/json' \
--header 'X-Auth-Key: <API_KEY>'

The output will look similar to:

{
   "result": [
       {
           "id": "efb7b8c949ac4650a09736fc376e9aee",
           "name": "Cloudflare Managed Ruleset",
           "description": "Created by the Cloudflare security team, this ruleset is designed to provide fast and effective protection for all your applications. It is frequently updated to cover new vulnerabilities and reduce false positives.",
           "source": "firewall_managed",
           "kind": "managed",
           "version": "34",
           "last_updated": "2021-10-25T18:33:27.512161Z",
           "phase": "http_request_firewall_managed"
       },
       {
           "id": "4814384a9e5d4991b9815dcfc25d2f1f",
           "name": "Cloudflare OWASP Core Ruleset",
           "description": "Cloudflare's implementation of the Open Web Application Security Project (OWASP) ModSecurity Core Rule Set. We routinely monitor for updates from OWASP based on the latest version available from the official code repository",
           "source": "firewall_managed",
           "kind": "managed",
           "version": "33",
           "last_updated": "2021-10-25T18:33:29.023088Z",
           "phase": "http_request_firewall_managed"
       },
       {
           "id": "5ff4477e596448749d67da859230ac3d",
           "name": "My redirect ruleset",
           "description": "",
           "kind": "root",
           "version": "1",
           "last_updated": "2021-12-04T06:32:58.058744Z",
           "phase": "http_request_redirect"
       }
   ],
   "success": true,
   "errors": [],
   "messages": []
}

Our redirect ruleset is at the bottom of the output. Next we will add our new bulk redirect rule to this ruleset:

curl --location --request PUT 'https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/rulesets/<RULESET_ID>' \
--header 'X-Auth-Email: <EMAIL_ADDRESS>' \
--header 'Content-Type: application/json' \
--header 'X-Auth-Key: <API_KEY>' \
--data-raw '{
  "rules": [
    {
      "expression": "http.request.full_uri in $my_redirect_list_2",
      "description": "Bulk Redirect rule 2",
      "action": "redirect",
      "action_parameters": {
        "from_list": {
          "name": "my_redirect_list_2",
          "key": "http.request.full_uri"
        }
      }
    }
  ]
}'

The output will look similar to:

{
  "result": {
    "id": "5ff4477e596448749d67da859230ac3d",
    "name": "My redirect ruleset",
    "description": "",
    "kind": "root",
    "version": "2",
    "rules": [
      {
        "id": "615cf6ac24c04f439138fdc16bd20535",
        "version": "1",
        "action": "redirect",
        "action_parameters": {
          "from_list": {
            "name": "my_redirect_list_2",
            "key": "http.request.full_uri"
          }
        },
        "expression": "http.request.full_uri in $my_redirect_list",
        "description": "Bulk Redirect rule 2",
        "last_updated": "2021-12-04T07:04:16.701379Z",
        "ref": "615cf6ac24c04f439138fdc16bd20535",
        "enabled": true
      }
    ],
    "last_updated": "2021-12-04T07:04:16.701379Z",
    "phase": "http_request_redirect"
  },
  "success": true,
  "errors": [],
  "messages": []
}

With those API calls executed, our new list is created, loaded with URL redirects and enabled by the bulk redirect rule. Visitors to the URLs specified in our list will now be redirected appropriately.
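
If you would rather script this end to end instead of issuing the curl commands by hand, the three steps can be chained together. The following is a sketch only (JavaScript, assuming Node 18+ for built-in fetch); it reuses the same endpoints and payloads shown above, with the account ID, credentials, ruleset ID, and redirect entries as placeholders:

const API = "https://api.cloudflare.com/client/v4";
const HEADERS = {
  "X-Auth-Email": "<EMAIL_ADDRESS>",
  "X-Auth-Key": "<API_KEY>",
  "Content-Type": "application/json",
};

// Small wrapper around fetch that fails loudly on API errors
async function cf(method, path, body) {
  const response = await fetch(API + path, {
    method,
    headers: HEADERS,
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  const json = await response.json();
  if (!json.success) throw new Error(JSON.stringify(json.errors));
  return json.result;
}

async function createBulkRedirects(accountId, rulesetId) {
  // Step 1: create the redirect list
  const list = await cf("POST", `/accounts/${accountId}/rules/lists`, {
    name: "my_redirect_list_2",
    description: "My redirect list 2",
    kind: "redirect",
  });

  // Step 2: load the list with URL redirects
  await cf("POST", `/accounts/${accountId}/rules/lists/${list.id}/items`, [
    { redirect: { source_url: "www.example.com/a", target_url: "https://www.example.net/a" } },
  ]);

  // Step 3: enable the list via a bulk redirect rule in the account-level ruleset
  await cf("PUT", `/accounts/${accountId}/rulesets/${rulesetId}`, {
    rules: [
      {
        expression: "http.request.full_uri in $my_redirect_list_2",
        description: "Bulk Redirect rule 2",
        action: "redirect",
        action_parameters: {
          from_list: { name: "my_redirect_list_2", key: "http.request.full_uri" },
        },
      },
    ],
  });
}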

Account-level benefits

One of the driving forces behind this product is the desire to make life easier for those customers with a large number of zones on Cloudflare. For these customers, URL redirects are a pain point when using Page Rules, as they need to navigate into each zone and configure URL redirects one at a time. This doesn’t scale very well.

Bulk Redirects add real value for customers in this situation. Instead of having to navigate into 400 zones and create one or two Page Rules for each, an administrator can now create and upload a single Bulk Redirect List, which contains all the URL redirects for the zones under management.

This means that if the customer simply wants 399 of those 400 zones to redirect to the “primary zone”, they can create a bulk redirect list with 399 entries, all pointing to example.com, and enable the Subpath matching and Include subdomains options on each. This vastly simplifies the management of the estate.
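
For illustration, the list payload for that scenario could be generated rather than typed out by hand. The zone names below are hypothetical; source_url, target_url, status_code and include_subdomains match the API examples above, while subpath_matching is assumed to be the API counterpart of the dashboard option:

// Sketch: build the list-items payload for many zones redirecting to one primary zone
const zones = ["www.zone1.example", "www.zone2.example" /* ... 399 zones in total ... */];

const items = zones.map(zone => ({
  redirect: {
    source_url: zone + "/",
    target_url: "https://example.com",
    status_code: 301,
    include_subdomains: true,
    subpath_matching: true,
  },
}));

// `items` can then be POSTed to /rules/lists/<LIST_ID>/items as shown earlier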

The same premise also applies to SSL for SaaS customers. For example, if example.com has 20 custom hostnames in their zone, customers can now create a Bulk Redirect List and Rule for each custom hostname, grouping each customer’s URL redirects into their own logical buckets.

Bulk Redirects is a game changer for companies with a large number of zones and customers under management.

Allowances

Bulk Redirects are available for all accounts. The packaging model for Bulk Redirects closely resembles that of “IP Lists”. Accounts are entitled to a set number of Edge Rules (from which “Bulk Redirect Rules” draws down), Bulk Redirect Lists, and URL Redirects depending on the highest Cloudflare plan within their account.

Feature                                     | Enterprise | Business | Pro | Free
Edge Rules (for use of Bulk Redirect Rules) | 50+        | 15       | 15  | 15
Bulk Redirect Lists                         | 25+        | 5        | 5   | 5
URL Redirects                               | 10,000+    | 500      | 500 | 20

For example, an account with ten zones, all on the Free plan, would be entitled to 15 Edge Rules, 5 Bulk Redirect Lists, and 20 URL Redirects that can be stored within those lists.

An account with one Pro zone and 2 Free plan zones would be entitled to 15 Edge Rules, 5 Bulk Redirect Lists, and 500 URL Redirects that can be stored within those lists.

Enterprise customers have a default of 10,000 URL Redirects to be used across 25 lists. However, these numbers are negotiable on enquiry.

Planned enhancements

We intend to make a number of incremental improvements to the product in the coming months, specifically to the list experience to allow for the editing of URL redirects and also for searching within lists.

In the near future we intend to bring to market a product to fulfill the other common request for URL redirects, and deliver ‘Dynamic URL Redirects’. Whilst Bulk Redirects supports hundreds of thousands of URL redirects, those URL redirects are relatively prescriptive — from a.com/b to b.com/a, for example.  There is still a requirement for supporting more complex, rich URL redirects, e.g., device-specific URL redirects, country-specific URL redirects, URL redirects that allow regular expressions in their target URL, and so forth. We aspire to offer a full range of functionality to support as many use cases as possible.

Try it now

Bulk Redirects can be used to improve operations, simplify complex configurations, and ease website management, amongst many other use cases.  Try out Bulk Redirects yourself today.

Advanced Zabbix API – 5 API use cases to improve your API workflows

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/advanced-zabbix-api-5-api-use-cases-to-improve-your-api-workfows/16801/

As your monitoring infrastructures evolve, you might hit a point when there’s no avoiding using the Zabbix API. The Zabbix API can be used to automate a particular part of your day-to-day workflow, troubleshoot your monitoring or to simply analyze or get statistics about a specific set of entities.

In this blog post, we will take a look at some of the more advanced API methods and specific method parameters and learn how they can be used to improve your API workflows.
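
The examples below show only the JSON-RPC payloads. If you prefer scripting these calls rather than pasting payloads into an API client, a minimal helper might look like the following sketch (the URL and the auth token are placeholders; it assumes an environment with fetch available, such as Node 18+):

const ZABBIX_URL = "https://zabbix.example.com/api_jsonrpc.php";
const AUTH_TOKEN = "xxxxxx"; // session key or named API token

// Send a single JSON-RPC request to the Zabbix API and return its result
async function zabbixCall(method, params) {
  const response = await fetch(ZABBIX_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", method, params, auth: AUTH_TOKEN, id: 1 }),
  });
  const data = await response.json();
  if (data.error) throw new Error(JSON.stringify(data.error));
  return data.result;
}

// Example (simplified): count Disaster-severity events
// const count = await zabbixCall("event.get", { countOutput: true, severities: "5" });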

1. Count entities with countOutput

Let’s start with gathering some statistics. Let’s say you have to count the number of some matching entities; here we can use the countOutput parameter. For a more advanced use case, what if we have to count the number of events for some time period? Let’s combine countOutput with time_from and time_till (in unixtime) and get the number of events created during the month of November that have the Disaster severity:

{
  "jsonrpc": "2.0",
  "method": "event.get",
  "params": {
    "output": "extend",
    "time_from": "1635717600",
    "time_till": "1638223200",
    "severities": "5",
    "countOutput": "true"
  },
  "auth": "xxxxxx",
  "id": 1
}

2. Use API to perform Configuration export/import

Next, let’s take a look at how we can use the configuration.export method to export one of our templates in YAML:

{
  "jsonrpc": "2.0",
  "method": "configuration.export",
  "params": {
    "options": {
      "templates": [
        "10001"
      ]
    },
    "format": "yaml"
  },
  "auth": "xxxxxx",
  "id": 1
}

Now let’s copy and paste the result of the export and import the template into another environment. It’s extremely important to remember that for this method to work exactly as we intend, we need to include the parameters that specify the behavior of the particular entities contained in the configuration string, such as items, value maps, templates, etc. For example, if I exclude the templates parameter here, no templates will be imported.

{
  "jsonrpc": "2.0",
  "method": "configuration.import",
  "params": {
    "format": "yaml",
    "rules": {
      "valueMaps": {
        "createMissing": true,
        "updateExisting": true
      },
      "items": {
        "createMissing": true,
        "updateExisting": true,
        "deleteMissing": true
      },
      "templates": {
        "createMissing": true,
        "updateExisting": true
      },
      "templateLinkage": {
        "createMissing": true
      }
    },
    "source": "zabbix_export:\n version: '5.4'\n date: '2021-11-13T09:31:29Z'\n groups:\n -\n uuid: 846977d1dfed4968bc5f8bdb363285bc\n name: 'Templates/Operating systems'\n templates:\n -\n uuid: e2307c94f1744af7a8f1f458a67af424\n template: 'Linux by Zabbix agent active'\n name: 'Linux by Zabbix agent active'\n 
    ...
  },
  "auth": "xxxxxx",
  "id": 1
}

3. Expand trigger functions and macros with expand parameters

Using trigger.get to obtain information about a particular set of triggers is a relatively common practice. One particular caveat that we have to consider is that, by default, macros in the trigger name, expression, or description are not expanded. To expand the available macros we need to use the expand parameters:

{
  "jsonrpc": "2.0",
  "method": "trigger.get",
  "params": {
    "triggerids": "18135",
    "output": "extend",
    "expandExpression": "1",
    "selectFunctions": "extend"
  },
  "auth": "xxxxxx",
  "id": 1
}

4. Obtaining additional LLD information for a discovered item

If we wish to display additional LLD information for a discovered entity, in this case an item, we can use the selectDiscoveryRule and selectItemDiscovery parameters.
While selectDiscoveryRule will provide the ID of the LLD rule that created the item, selectItemDiscovery can point us at the parent item prototype ID from which the item was created, the last discovery time, the item prototype key, and more.

The example below will return the item details and will also provide the LLD rule and Item prototype IDs, the time when the lost item will be deleted and the last time the item was discovered:

{
  "jsonrpc": "2.0",
  "method": "item.get",
  "params": {
    "itemids": "36717",
    "selectDiscoveryRule": "1",
    "selectItemDiscovery": ["lastcheck","ts_delete","parent_itemid"]
  },
  "auth": "xxxxxx",
  "id": 1
}

5. Searching through the matched entities with search parameters

The Zabbix API provides a couple of standard parameters for performing a search. With the search parameter, we can search string or text fields and try to find objects based on a single entry or multiple entries. The searchByAny parameter extends the search: if you set it to true, the search will match ANY of the criteria in the search array, instead of trying to find an entity that matches ALL of them (the default behavior).

The following API call will find items that match agent and Zabbix keys on a particular template:

{
  "jsonrpc": "2.0",
  "method": "item.get",
  "params": {
    "output": "extend",
    "templateids": "10001",
    "search": {
      "key_": ["agent.", "zabbix"]
    },
    "searchByAny": "true",
    "sortfield": "name"
  },
  "auth": "xxxxxx",
  "id": 1
}

Feel free to take the above examples and adapt them to fit your use case; you should be able to implement them in your environment quite easily. There are many other use cases that we might potentially cover down the line. If you have a specific API use case that you wish for us to cover, feel free to leave a comment under this post and we just might cover it in one of the upcoming blog posts!

Simplifying Zabbix API workflows with named Zabbix API tokens

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/simplifying-zabbix-api-workflows-with-named-zabbix-api-tokens/16653/

Zabbix API enables you to collect any and all information from your Zabbix instance by using a multitude of API methods. You can even utilize Zabbix API calls in your HTTP items. For example, this can be used to monitor the number of particular sets of metrics and visualize their growth over time. With named Zabbix API tokens, such use cases are a lot more simple to implement.

Before Zabbix 5.4 we had to perform the user.login API call to obtain the authentication token. Once the user session was closed, we had to log in again, obtain a new authentication token and use it in the subsequent API calls.

With the pre-defined named Zabbix API tokens, you don’t have to constantly check if the authentication token needs to be updated. Starting from Zabbix 5.4 you can simply create a new named Zabbix API token with an expiration date and use it in your API calls.

Creating a new named Zabbix API token

The Zabbix API token creation process is extremely simple. All you have to do is navigate to Administration – General – API tokens and create a new API token. The named API tokens are created for a particular user and can have an optional expiration date and time – otherwise, the tokens are defined without an expiry date.

You can create a named API token in the API tokens section, under Administration – General

Once the token has been created, make sure to store it somewhere safe, since you won’t be able to recover it afterward. If the token is lost, you will have to recreate it.

Make sure to store the auth token!

Don’t forget that when defining a role for a particular API user, we can restrict which API methods this user has access to.

Simplifying API tasks with the named API token

There are many different use cases where you could implement Zabbix API calls to collect some additional information. For this blog post, I will create an HTTP item that uses the item.get API call to monitor the number of unsupported items.

To achieve that, I will create an HTTP item on a host (This can be the default Zabbix server host or a host dedicated to collecting metrics via Zabbix API calls) and provide the API call in the request body. Since the named API token now provides a static authentication token until it expires, I can simply use it in my API call without the need to constantly keep it updated.

An HTTP agent item that uses a Zabbix API call in its request body

{
  "jsonrpc": "2.0",
  "method": "item.get",
  "params": {
    "countOutput": "1",
    "filter": {
      "state": "1"
    }
  },
  "id": 2,
  "auth": "b72be8cf163438aacc5afa40a112155e307c3548ae63bd97b87ff4e98b1f7657"
}

HTTP item request body, which returns a count of unsupported items

I will also use regular expression preprocessing to obtain the numeric value from the API call result – otherwise, we won’t be able to graph our value or calculate trends for it.

Regular expression preprocessing step to obtain a numeric value from our Zabbix API call result

Utilizing Zabbix API scripts in Actions

In one of our previous blog posts, we covered resolving problems automatically with the event.acknowledge API method. The logic defined in the blog post was quite complex since we needed to keep an eye out for the authentication tokens and use a custom script to keep them up to date. With named Zabbix API tokens, this use case is a lot more simple.

All I have to do is create an Action operation script containing my API call and pass it to an action operation.

Action operation script that invokes Zabbix event.acknowledge API method

curl -sk -X POST -H "Content-Type: application/json" -d "
{
\"jsonrpc\": \"2.0\",
\"method\": \"event.acknowledge\",
\"params\": {
\"eventids\": \"{EVENT.ID}\",
\"action\": 1,
\"message\": \"Problem resolved.\"
},
\"auth\": \"<Place your authentication token here>",
\"id\": 2
}" <Place your Zabbix API endpoint URL here>

Problem remediation script example

Now my problems will get closed automatically after the time period which I have defined in my action.

Action operation which runs our event.acknowledge Zabbix API script

These are but a few examples of what we can now achieve by using named API tokens. A lot of information can be obtained and filtered in a unique way via the Zabbix API, thus providing you with a granular analysis of your monitored environment. If you have recently upgraded to Zabbix 5.4 or plan to upgrade to Zabbix 6.0 LTS in the future, I would recommend implementing named Zabbix API tokens to simplify your day-to-day workflow and consider the possibilities that this new feature opens up for you.

If you have any questions or if you wish to share your particular use case for data collection or task automation with Zabbix API – feel free to share them in the comments section below!

Practical API Design at Netflix, Part 2: Protobuf FieldMask for Mutation Operations

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/practical-api-design-at-netflix-part-2-protobuf-fieldmask-for-mutation-operations-2e75e1d230e4

By Ricky Gardiner, Alex Borysov

Background

In our previous post, we discussed how we utilize FieldMask as a solution when designing our APIs so that consumers can request the data they need when fetched via gRPC. In this blog post we will continue to cover how Netflix Studio Engineering uses FieldMask for mutation operations such as update and remove.

Example: Netflix Studio Production

Money Heist (La casa de papel) / Netflix

Previously we outlined what a Production is and how the Production Service makes gRPC calls to other microservices such as the Schedule Service and Script Service to retrieve schedules and scripts (aka screenplay) for a particular production such as La Casa De Papel. We can take that model and showcase how we can mutate particular fields on a production.

Mutating Production Details

Let’s say we want to update the format field from LIVE_ACTION to HYBRID as our production has added some animated elements. A naive way for us to solve this is to add an updateProductionFormatRequest method and gRPC endpoint just to update the productionFormat:

This allows us to update the production format for a particular production, but what if we then want to update other fields such as title, or even multiple fields such as productionFormat, schedule, etc.? Building on top of this, we could just implement an update method for every field: one for Production format, another for title, and so on:

This can become unmanageable when maintaining our APIs due to the number of fields on the Production. What if we want to update more than one field and do it atomically in a single RPC? Creating additional methods for various combinations of fields will lead to an explosion of mutation APIs. This solution is not scalable.

Instead of trying to create every single combination possible, another solution could be to have an UpdateProduction endpoint that requires all fields from the consumer:

The issue with this solution is two-fold. First, the consumer must know and provide every single required field in a Production, even if they just want to update one field such as the format. Second, since a Production has many fields, the request payload can become quite large, particularly if the production has schedule or script information.

What if, instead of all the fields, we send only the fields we actually want to update, and leave all other fields unset? In our example, we would only set the production format field (and ID to reference the production):

This could work if we never need to remove or blank out any fields. But what if we want to remove the value of the title field? Again, we can introduce one-off methods like RemoveProductionTitle, but as discussed above, this solution does not scale well. What if we want to remove a value of a nested field such as the planned launch date field from the schedule? We would end up adding remove RPCs for every individual nullable sub-field.

Utilizing FieldMask for Mutations

Instead of numerous RPCs or requiring a large payload, we can utilize a FieldMask for all our mutations. The FieldMask will list all of the fields we would like to explicitly update. First, let’s update our proto file to add in the UpdateProductionRequest, which will contain the data we want to update from a production, and a FieldMask of what should be updated:

Now, we can use a FieldMask to make mutations. We can update the format by creating a FieldMask for the format field by using the FieldMaskUtil.fromStringList() utility method which constructs a FieldMask for a list of field paths in a certain type. In this case, we will have one type, but will build upon this example later:

Since our FieldMask only specifies the format field, that will be the only field updated, even if we provide more data in ProductionUpdateOperation. It becomes easy to add or remove fields from our FieldMask by modifying the paths. Data that is provided in the payload but not listed in a FieldMask path will not be updated and is simply ignored in the operation. But if we omit a value for a field that is listed, a remove mutation is performed on that field. Let’s modify our example above to showcase this and update the format but remove the planned launch date, which is a nested field on the ProductionSchedule addressed as “schedule.planned_launch_date”:

In this example, we are performing both update and remove mutations as we have added “format” and “schedule.planned_launch_date” paths to our FieldMask. When we provide this in our payload these fields will be updated to the new values, but when building our payload we are only providing the format and omitting the schedule.planned_launch_date. Omitting this from the payload but having it defined in our FieldMask will function as a remove mutation:

Empty / Missing Field Mask

When a field mask is unset or has no paths, the update operation applies to all the payload fields. This means the caller must send the whole payload or, as mentioned above, any unset fields will be removed.

This convention has an implication on schema evolution: when a new field is added to the message, all the consumers must start sending its value on the update operation or it will get removed.

Suppose we want to add a new field: production budget. We will extend both the Production message, and ProductionUpdateOperation:

If there is a consumer that doesn’t know about this new field or hasn’t updated client stubs yet, it can accidentally null the budget field out by not sending the FieldMask in the update request.

To avoid this issue, the producer should consider requiring the field mask for all the update operations. Another option would be to implement a versioning protocol: force all callers to send their version numbers and implement custom logic to skip fields not present in the old version.

Bella Ciao

In this blog post series, we have gone over how we use FieldMask at Netflix and how it can be a practical and scalable solution when designing your APIs.

API designers should aim for simplicity, but make their APIs open for extension and evolution. It’s often not easy to keep APIs simple and future-proof. Utilizing FieldMask in APIs helps us achieve both simplicity and flexibility.


Practical API Design at Netflix, Part 2: Protobuf FieldMask for Mutation Operations was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Maintaining Zabbix API token via JavaScript

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/maintaining-zabbix-api-token-via-javascript/15561/

In this blog post, we will talk about maintaining and storing the Zabbix API session key in an automated fashion. The blog post builds upon the Close problem automatically via Zabbix API subject and can be used as extra configuration for this particular use-case. The blog post also shares a great example of synthetic monitoring by way of JavaScript preprocessing – how to emulate a scenario in an automated fashion and get alerted in case of any problems.

Prerequisites

First, let us create the Zabbix API user and user macros where we will store our username, password, Zabbix URL and the API session key.

1) Open “Administration” => “Users”. Create a new user ‘api’ with password ‘zabbix’. At the permissions tab set User Type “Zabbix Super Admin”.

2) Go to “Administration” => “General” => “Macros”. Configure base characteristics:

      {$Z_API_PHP} = http://demo.zabbix.demo/api_jsonrpc.php
     {$Z_API_USER} = api
 {$Z_API_PASSWORD} = zabbix
{$Z_API_SESSIONID} =

It’s OK to leave {$Z_API_SESSIONID} empty for now.

3) Let’s check if the Zabbix backend server can reach the Zabbix frontend server. Make sure that you are logged into the Zabbix backend server by looking up the zabbix_server process:

ps auxww | grep "[z]abbix_server.*conf"

Ensure that we can reach the Zabbix frontend by curling the Zabbix frontend server from the Zabbix backend server:

curl -s "http://demo.zabbix.demo/index.php" | grep Zabbix

4) Download template “Check and repair Zabbix API token” and import it in your instance.

5) Create a new host with an Agent interface and link the template. The IP address of the host does not matter. The template will use an agentless check to do the monitoring, it will use an “HTTP agent” item.

How it works

Our goal for today is to figure out a way to keep the Zabbix API authentication token up to date in a user macro. This way we can reuse the macro repeatedly for items, action operations and scripts that require for us to use the Zabbix API. We need to ensure that even if the token changes, the macro gets automatically updated with the new token value! Let’s try and understand each step of the underlying workflow required for us to achieve this goal.

The first component of our workflow is the “Validate session key raw” item. This is an HTTP agent item that performs a POST request with an arbitrary method – proxy.get in this case, but we could have used ANY other method. We simply want to check if an arbitrary Zabbix API call can be executed with the current {$Z_API_SESSIONID} macro value.

The second part of the workflow is the “Repair session key” dependent item. This item utilizes the JavaScript preprocessing step with custom JavaScript code to check the values obtained by the previous item and generate a new authentication token if that is necessary.

The third item – “Status”, is another dependent item that uses regular expression preprocessing steps to check for different error messages or status codes in the value of the “Validate session key raw” item. Most of the triggers defined in this template will react to the values obtained by this item.

 

Below you can see the full underlying workflow:

Code-wise, the magic is implemented with the following JavaScript code snippet:

if (value.match(/Session terminated/)) {

    var req = new CurlHttpRequest();

    // Zabbix API endpoint
    var json_rpc = '{$Z_API_PHP}';

    // libcurl header
    req.AddHeader('Content-Type: application/json');

    // First request: obtain an authentication token
    var token = JSON.parse(req.Post(json_rpc,
        '{"jsonrpc":"2.0","method":"user.login","params":{"user":"{$Z_API_USER}","password":"{$Z_API_PASSWORD}"},"id":1,"auth":null}'
    ));

    // If authentication was unsuccessful
    if (token.error) {
        // Login name or password is incorrect
        return 32500;
    }
    else {
        // Update the macro

        // Get the global macro ID
        // We cannot write the macro natively here because it would be automatically expanded
        // Must use a workaround to distinguish a dollar sign from the actual macro name and merge with '+'
        var id = JSON.parse(req.Post(json_rpc,
            '{"jsonrpc":"2.0","method":"usermacro.get","params":{"output":["globalmacroid"],"globalmacro":true,"filter":{"macro":"{$'+'Z_API_SESSIONID'+'}"}},"auth":"'+token.result+'","id":1}'
        )).result[0].globalmacroid;

        // Overwrite the global macro with the freshly obtained session key
        var overwrite = JSON.parse(req.Post(json_rpc,
            '{"jsonrpc":"2.0","method":"usermacro.updateglobal","params":{"globalmacroid":"'+id+'","value":"'+token.result+'"},"auth":"'+token.result+'","id":1}'
        ));

        // Return the id (an integer) of the macro which was updated
        return overwrite.result.globalmacroids[0];
    }

} else {
    return 0;
}

Throughout the JavaScript code, we are extracting a value from one step and using that as an input for the next step.

After the data has been collected, the values are analyzed by four triggers. Three of these four triggers report a misconfiguration problem that requires human investigation. The fourth, “Repairing Zabbix API session key in background”, is the main trigger; it indicates that the token has expired and is being repaired automatically.

And that’s it – now any integration that requires a Zabbix API authentication token can receive the current token value by referencing the user macro that we created in the article. We’ve ensured that the token is always going to stay up to date and in case of any issues, we will receive an alert from Zabbix!

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/practical-api-design-at-netflix-part-1-using-protobuf-fieldmask-35cfdc606518

By Alex Borysov, Ricky Gardiner

Background

At Netflix, we heavily use gRPC for the purpose of backend to backend communication. When we process a request it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, some fields can require remote calls to other services. Remote calls are never free; they impose extra latency, increase probability of an error, and consume network bandwidth. How can we understand which fields the caller doesn’t need to be supplied in the response, so we can avoid making unnecessary computations and remote calls? With GraphQL this comes out of the box through the use of field selectors. In the JSON:API standard a similar technique is known as Sparse Fieldsets. How can we achieve a similar functionality when designing our gRPC APIs? The solution we use within the Netflix Studio Engineering is protobuf FieldMask.

Money Heist (La casa de papel) / Netflix

Protobuf FieldMask

Protocol Buffers, or simply protobuf, is a data serialization mechanism. By default, gRPC uses protobuf as its IDL (interface definition language) and data serialization protocol.

FieldMask is a protobuf message. There are a number of utilities and conventions on how to use this message when it is present in an RPC request. A FieldMask message contains a single field named paths, which is used to specify fields that should be returned by a read operation or modified by an update operation.

Example: Netflix Studio Production

Money Heist (La casa de papel) / Netflix

Let’s assume there is a Production service that manages Studio Content Productions (in the film and TV industry, the term production refers to the process of making a movie, not the environment in which software runs).

GetProduction returns a Production message by its unique ID. A production contains multiple fields such as: title, format, schedule dates, scripts aka screenplay, budgets, episodes, etc, but let’s keep this example simple and focus on filtering out schedule dates and scripts when requesting a production.

Reading Production Details

Let’s say we want to get production information for a particular production such as “La Casa De Papel” using the GetProduction API. While a production has many fields, some of these fields are returned from other services such as schedule from the Schedule service, or scripts from the Script service.

The Production service will make RPCs to Schedule and Script services every time GetProduction is called, even if clients ignore the schedule and scripts fields in the response. As mentioned above, remote calls are not free. If the service knows which fields are important for the caller, it can make an informed decision about making expensive calls, starting resource-heavy computations, and/or calling the database. In this example, if the caller only needs production title and production format, the Production service can avoid making remote calls to Schedule and Script services.

Additionally, requesting a large number of fields can make the response payload massive. This can become an issue for some applications, for example, on mobile devices with limited network bandwidth. In these cases it is a good practice for consumers to request only the fields they need.

Money Heist (La casa de papel) / Netflix

A naïve way of solving these problems can be adding additional request parameters, such as includeSchedule and includeScripts:

This approach requires adding a custom includeXXX field for every expensive response field and doesn’t work well for nested fields. It also increases the complexity of the request, ultimately making maintenance and support more challenging.

Add FieldMask to the Request Message

Instead of creating one-off “include” fields, API designers can add field_mask field to the request message:

Consumers can set paths for the fields they expect to receive in the response. If a consumer is only interested in production titles and format, they can set a FieldMask with paths “title” and “format”:

Masking fields

Please note, even though code samples in this blog post are written in Java, demonstrated concepts apply to any other language supported by protocol buffers.

If consumers only need a title and an email of the last person who updated the schedule, they can set a different field mask:

By convention, if a FieldMask is not present in the request, all fields should be returned.

Protobuf Field Names vs Field Numbers

You might notice that paths in the FieldMask are specified using field names, whereas on the wire, encoded protocol buffers messages contain only field numbers, not field names. This (alongside some other techniques like ZigZag encoding for signed types) makes protobuf messages space-efficient.

To understand the difference between field numbers and field names, let’s take a detailed look at how protobuf encodes and decodes messages.

Our protobuf message definition (.proto file) contains Production message with five fields. Every field has a type, name, and number.

When the protobuf compiler (protoc) compiles this message definition, it creates the code in the language of your choice (Java in our example). This generated code contains classes for defined messages, together with message and field descriptors. Descriptors contain all the information needed to encode and decode a message into its binary format. For example, they contain field numbers, names, types. Message producer uses descriptors to convert a message to its wire format. For efficiency, the binary message contains only field number-value pairs. Field names are not included. When a consumer receives the message, it decodes the byte stream into an object (for example, Java object) by referencing the compiled message definitions.

As mentioned above, FieldMask lists field names, not numbers. Here at Netflix we are using field numbers and convert them to field names using FieldMaskUtil.fromFieldNumbers() utility method. This method utilizes the compiled message definitions to convert field numbers to field names and creates a FieldMask.

However, there is an easy-to-overlook limitation: using FieldMask can limit your ability to rename message fields. Renaming a message field is generally considered a safe operation, because, as described above, the field name is not sent on the wire, it is derived using the field number on the consumer side. With FieldMask, field names are sent in the message payload (in the paths field value) and become significant.

Suppose we want to rename the field title to title_name and publish version 2.0 of our message definition:

In this chart, the producer (server) utilizes new descriptors, with field number 2 named title_name. The binary message sent over the wire contains the field number and its value. The consumer still uses the original descriptors, where the field number 2 is called title. It is still able to decode the message by field number.

This works well if the consumer doesn’t use FieldMask to request the field. If the consumer makes a call with the “title” path in the FieldMask field, the producer will not be able to find this field. The producer doesn’t have a field named title in its descriptors, so it doesn’t know the consumer asked for field number 2.

As we see, if a field is renamed, the backend should be able to support new and old field names until all the callers migrate to the new field name (backward compatibility issue).

There are multiple ways to deal with this limitation:

  • Never rename fields when FieldMask is used. This is the simplest solution, but it’s not always possible
  • Require the backend to support all the old field names. This solves the backward compatibility issue but requires extra code on the backend to keep track of all historical field names
  • Deprecate old and create a new field instead of renaming. In our example, we would create the title_name field number 6. This option has some advantages over the previous one: it allows the producer to keep using generated descriptors instead of custom converters; also, deprecating a field makes it more prominent on the consumer side

Regardless of the solution, it is important to remember that FieldMask makes field names an integral part of your API contract.

Using FieldMask on the Producer (Server) Side

On the producer (server) side, unnecessary fields can be removed from the response payload using the FieldMaskUtil.merge() method (lines ##8 and 9):

If the server code also needs to know which fields are requested in order to avoid making external calls, database queries or expensive computations, this information can be obtained from the FieldMask paths field:

This code calls the makeExpensiveCallToScheduleService method (line #21) only if the schedule field is requested. Let’s explore this code sample in more detail.

(1) The SCHEDULE_FIELD_NAME constant contains the name of the field. This code sample uses message type Descriptor and FieldDescriptor to lookup field name by field number. The difference between protobuf field names and field numbers is described in the Protobuf Field Names vs Field Numbers section above.

(2) FieldMaskUtil.normalize() returns FieldMask with alphabetically sorted and deduplicated field paths (aka canonical form).

(3) The expression (lines ##14 – 17) that yields the scheduleFieldRequested value takes a stream of FieldMask paths, maps it to a stream of top-level fields, and returns true if the top-level fields contain the value of the SCHEDULE_FIELD_NAME constant.

(4) ProductionSchedule is retrieved only if scheduleFieldRequested is true.

If you end up using FieldMask for different messages and fields, consider creating reusable utility helper methods. For example, a method that returns all top-level fields based on FieldMask and FieldDescriptor, a method to return if a field is present in a FieldMask, etc.

Ship Pre-built FieldMasks

Some access patterns can be more common than others. If multiple consumers are interested in the same subset of fields, API producers can ship client libraries with FieldMask pre-built for the most frequently used field combinations.

Providing pre-built field masks simplifies API usage for the most common scenarios and leaves consumers the flexibility to build their own field masks for more specific use-cases.

Limitations

  • Using FieldMask can limit your ability to rename message fields (described in the Protobuf Field Names vs Field Numbers section)
  • Repeated fields are only allowed in the last position of a path string. This means you cannot select (mask) individual sub-fields in a message inside a list. This can change in the foreseeable future, as a recently approved Google API Improvement Proposal AIP-161 Field masks includes support for wildcards on repeated fields.

Bella Ciao

Protobuf FieldMask is a simple, yet powerful concept. It can help make APIs more robust and service implementations more efficient.

This blog post covered how and why it is used at Netflix Studio Engineering for APIs that read the data. Part 2 will shed light on using FieldMask for update and remove operations.


Practical API Design at Netflix, Part 1: Using Protobuf FieldMask was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

That’s how things are here, or at a seminar with АПИ

Post Syndicated from original https://yurukov.net/blog/2021/api-seminar/

Бях решил да не се занимавам с тази случка, но заради това, което стана накрая реших, че си го заслужават.

Взех си няколко дни отпуска и в предпоследния ден в хотела ни нямаше почти хора. Тогава забелязах, че пристигат доста коли, сред които поне четири на Агенция Пътна инфраструктура както и няколко прескъпи джипа. Досетих се, че явно ще имат конференция или семинар или нещо подобно.

Така и се оказа. Напълниха конферентната стая с 80-тина души за 2-3 часа, след което се отправиха към басейна. В началото не бях сигурен кое ме раздразни – че трошаха яко държавна пара за 5-звезден хотел, че изведнъж всичко се напълни на 100% или че дори не си правиха труда да изглеждат, че са на официално събитие.

Бързо обаче размислих и се примирих – можеше да се напълни от други гости, а и все пак първи ден им е и може да имат друга програма за следващите. Дори фактът, че харчат толкова за пет звезден хотел в условията на сериозни скандали за разточителство на агенцията отметнах с това, че бледнее пред милиардите, които са раздали без обосновка, поръчка или дори строеж. А и все пак е форма на бонус към служители на държавна работа, която ако не се вкарат в някаква схема не може да се отрече, че е доста неблагодарна. Всеки офис и организация имат нужда от някаква форма на team-building, особено след последната година и въпреки всички резерви, които имах към тях. Отметнах всичко с ръка.

Гледах да не им обръщам внимание, но не можех да избягам от разговорите им. Явно не им пукаше много кой ги чува. Темите започнаха от злободневни – колко било скъпо в Гърция сега, колко скочили цените, един се жалваше колко малко от хората в агенцията са се ваксинирали, а никой не носел маска никъде в офисите или на конференцията сега.

После минаха на клюки кой какъв страничен бизнес имал, кой шеф успял да си доведе любовницата в хотела и кой не успял, защото я бил уредил в администрацията, а тука били само архитектура. После кой шеф каква кола под 6 цифрени суми не поглеждал и прочие. Общо взето се бяха разделили на три групи: едни дето се правеха на тежкари (предимно мъже, но и две жени), младите, които активно се наслаждаваха на почивката и „лелките и чичковци“, които също се радваха на басейна, но видимо бяха уморени от цялата драма.

Тежкарите изглеждаха доста притеснени през цялото време и все висяха на телефона. Единият много настоя точно до мен да крещи как са посмели да пуснат нещо електронно, като трябвало да си стои затворено в папка в бюрото му. Други двама постоянно дваха инструкции какво да се подписва от тяхно име „все едно са там“. Едната увещаваше, че трябвало да изкарат, че е била на комисия за одобрение на някакъв проект с още двама, които обръщаха коктейли на бара, а служителят в офиса трябвало с подписите на тримата да подпише протокола, защото „всичко сме се разбрали вече, така трябва да мине“. Имаше разговори и за очакваните бонуси и как нищо не можело да се спре, защото „то вече уредено и подписано всичко“.

Всички тези подхвърляния и спорадични изпускания показваха една добре смазана машина, която явно работи дори и от сауната и няма особени притеснения кой какво ще чуе. Общо взето смесица от обикновени хора, които просто искат да си свършат работата предвид ситуацията, в която са поставени от други хора, които раздават стотици милиони с един подпис и не им пука, защото няма кой да ги съди.

Гледах да страня от тях, защото все пак исках да си почина от тези глупости. Не смятах, че което и да е от подвърлянията или гръмогласните заповеди е толкова конкретно, за да си заслужава внимание, а и общата картинка е ясна на всички. Причината обаче да напиша всичко това беше черешката на тортата накрая.

Точно когато освобождавах стаята във фоайето бяха седнали няколко от АПИ и обсъждаха сметките. Напред-назад сновеше служителка на хотела притеснено. Темата беше как трябва да бъде изготвена фактурата и кое как да бъде вписано в касовата бележка. За малкото време, в което чаках собствената си сметка, на три пъти чух вариации на изказване, че хотелът иска да спази закона и трябва да фактурира всичко както е консумирано. От своя страна заобиколилите я държавни служители АПИ отчетливо заплашиха, че ще им изпратят данъчните на проверка и ще „има много неприятни ситуации“, ако не променят сметката.

They all wore very smug grins, which I couldn't resist photographing. I have the right to do so, because even though some of them were in swimsuits, they were all formally at work and discussing matters related to the spending of public funds. I am hiding their faces, but I will gladly provide the photos to the responsible institution if anyone is still interested. I am also hiding the hotel employee, because I sincerely felt sorry for the pressure and threats she was subjected to by civil servants. I hope she didn't give in, but she probably did. That is how АПИ and quite a few other institutions work.

What is clear here is that even if the people at the top of these organizations are replaced, the schemes, the connections, and the problems inside remain. Breaking them up will inevitably lead to a mass exodus of staff and a crisis in the particular administration. In other words, it will get worse before it gets better. No one seemed worried about two directors being replaced within two months, but they were clearly worried about what comes next.

We as taxpayers and voters are also worried about what comes next, and the key point here is not only what culture of work, speech, behavior, and intolerance of such practices will be imposed at the highest level of government, but also whether there will be a prosecution service and courts that pursue both petty documentary fraud like this and the billions gifted to companies close to ГЕРБ, ДПС, and whatever else is uncovered. For the former, we see some chance. For the prosecution, it will take a lot of work and struggle.

The post Тука е така или с АПИ на семинар first appeared on Блогът на Юруков.

Announcing Rollbacks and API Access for Pages

Post Syndicated from David Song original https://blog.cloudflare.com/rollbacks-and-api-access-for-pages/

A couple of months ago, we announced the general availability of Cloudflare Pages: the easiest way to host and collaboratively develop websites on Cloudflare’s global network. It’s been amazing to see over 20,000 incredible sites built by users and hear your feedback. Since then, we’ve released user-requested features like URL redirects, web analytics, and Access integration.

We’ve been listening to your feedback and today we announce two new features: rollbacks and the Pages API. Deployment rollbacks allow you to host production-level code on Pages without needing to stress about broken builds resulting in website downtime. The API empowers you to create custom functionality and better integrate Pages with your development workflows. Now, it’s even easier to use Pages for production hosting.

Rollbacks

You can now roll back your production website to a previous working deployment with just a click of a button. This is especially useful when you want to quickly undo a new deployment for troubleshooting. Before, developers would have to push another deployment and then wait for the build to finish updating production. Now, you can restore a working version within a few moments by rolling back to a previous working build.

To roll back to a previous build, just click the “Rollback to this deployment” button either in the deployments list menu or on a specific deployment page.

API Access

The Pages API exposes endpoints for you to easily create automations and to integrate Pages within your development workflow. Refer to the API documentation for a full breakdown of the object types and endpoints. To get started, navigate to the Cloudflare API Tokens page and copy your “Global API Key”. Now, you can authenticate and make requests to the API using your email and auth key in the request headers.

For example, here is an API request to get all projects on an account.

Request (example)

curl -X GET "https://api.cloudflare.com/client/v4/accounts/{account_id}/pages/projects" \
     -H "X-Auth-Email: {email}" \
     -H "X-Auth-Key: {auth_key}"

Response (example)

{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "name": "NextJS Blog",
    "id": "7b162ea7-7367-4d67-bcde-1160995d5",
    "created_on": "2017-01-01T00:00:00Z",
    "subdomain": "helloworld.pages.dev",
    "domains": [
      "customdomain.com",
      "customdomain.org"
    ],
    "source": {
      "type": "github",
      "config": {
        "owner": "cloudflare",
        "repo_name": "ninjakittens",
        "production_branch": "main",
        "pr_comments_enabled": true,
        "deployments_enabled": true
      }
    },
    "build_config": {
      "build_command": "npm run build",
      "destination_dir": "build",
      "root_dir": "/",
      "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
      "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
    },
    "deployment_configs": {
      "preview": {
        "env_vars": {
          "BUILD_VERSION": {
            "value": "3.3"
          }
        }
      },
      "production": {
        "env_vars": {
          "BUILD_VERSION": {
            "value": "3.3"
          }
        }
      }
    },
    "latest_deployment": {
      "id": "f64788e9-fccd-4d4a-a28a-cb84f88f6",
      "short_id": "f64788e9",
      "project_id": "7b162ea7-7367-4d67-bcde-1160995d5",
      "project_name": "ninjakittens",
      "environment": "preview",
      "url": "https://f64788e9.ninjakittens.pages.dev",
      "created_on": "2021-03-09T00:55:03.923456Z",
      "modified_on": "2021-03-09T00:58:59.045655",
      "aliases": [
        "https://branchname.projectname.pages.dev"
      ],
      "latest_stage": {
        "name": "deploy",
        "started_on": "2021-03-09T00:55:03.923456Z",
        "ended_on": "2021-03-09T00:58:59.045655",
        "status": "success"
      },
      "env_vars": {
        "BUILD_VERSION": {
          "value": "3.3"
        },
        "ENV": {
          "value": "STAGING"
        }
      },
      "deployment_trigger": {
        "type": "ad_hoc",
        "metadata": {
          "branch": "main",
          "commit_hash": "ad9ccd918a81025731e10e40267e11273a263421",
          "commit_message": "Update index.html"
        }
      },
      "stages": [
        {
          "name": "queued",
          "started_on": "2021-06-03T15:38:15.608194Z",
          "ended_on": "2021-06-03T15:39:03.134378Z",
          "status": "active"
        },
        {
          "name": "initialize",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "clone_repo",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "build",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "deploy",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        }
      ],
      "build_config": {
        "build_command": "npm run build",
        "destination_dir": "build",
        "root_dir": "/",
        "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
        "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
      },
      "source": {
        "type": "github",
        "config": {
          "owner": "cloudflare",
          "repo_name": "ninjakittens",
          "production_branch": "main",
          "pr_comments_enabled": true,
          "deployments_enabled": true
        }
      }
    },
    "canonical_deployment": {
      "id": "f64788e9-fccd-4d4a-a28a-cb84f88f6",
      "short_id": "f64788e9",
      "project_id": "7b162ea7-7367-4d67-bcde-1160995d5",
      "project_name": "ninjakittens",
      "environment": "preview",
      "url": "https://f64788e9.ninjakittens.pages.dev",
      "created_on": "2021-03-09T00:55:03.923456Z",
      "modified_on": "2021-03-09T00:58:59.045655",
      "aliases": [
        "https://branchname.projectname.pages.dev"
      ],
      "latest_stage": {
        "name": "deploy",
        "started_on": "2021-03-09T00:55:03.923456Z",
        "ended_on": "2021-03-09T00:58:59.045655",
        "status": "success"
      },
      "env_vars": {
        "BUILD_VERSION": {
          "value": "3.3"
        },
        "ENV": {
          "value": "STAGING"
        }
      },
      "deployment_trigger": {
        "type": "ad_hoc",
        "metadata": {
          "branch": "main",
          "commit_hash": "ad9ccd918a81025731e10e40267e11273a263421",
          "commit_message": "Update index.html"
        }
      },
      "stages": [
        {
          "name": "queued",
          "started_on": "2021-06-03T15:38:15.608194Z",
          "ended_on": "2021-06-03T15:39:03.134378Z",
          "status": "active"
        },
        {
          "name": "initialize",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "clone_repo",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "build",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        },
        {
          "name": "deploy",
          "started_on": null,
          "ended_on": null,
          "status": "idle"
        }
      ],
      "build_config": {
        "build_command": "npm run build",
        "destination_dir": "build",
        "root_dir": "/",
        "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
        "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
      },
      "source": {
        "type": "github",
        "config": {
          "owner": "cloudflare",
          "repo_name": "ninjakittens",
          "production_branch": "main",
          "pr_comments_enabled": true,
          "deployments_enabled": true
        }
      }
    }
  },
  "result_info": {
    "page": 1,
    "per_page": 100,
    "count": 1,
    "total_count": 1
  }
}

Here’s another quick example using the API to rollback to a previous deployment:

Request (example)

curl -X POST "https://api.cloudflare.com/client/v4/accounts/{account_id}/pages/projects/{project_name}/deployments/{deployment_id}/rollback" \
     -H "X-Auth-Email: {email}" \
     -H "X-Auth-Key: {auth_key"

Response (example)

{
  "success": true,
  "errors": [],
  "messages": [],
  "result": {
    "id": "f64788e9-fccd-4d4a-a28a-cb84f88f6",
    "short_id": "f64788e9",
    "project_id": "7b162ea7-7367-4d67-bcde-1160995d5",
    "project_name": "ninjakittens",
    "environment": "preview",
    "url": "https://f64788e9.ninjakittens.pages.dev",
    "created_on": "2021-03-09T00:55:03.923456Z",
    "modified_on": "2021-03-09T00:58:59.045655",
    "aliases": [
      "https://branchname.projectname.pages.dev"
    ],
    "latest_stage": {
      "name": "deploy",
      "started_on": "2021-03-09T00:55:03.923456Z",
      "ended_on": "2021-03-09T00:58:59.045655",
      "status": "success"
    },
    "env_vars": {
      "BUILD_VERSION": {
        "value": "3.3"
      },
      "ENV": {
        "value": "STAGING"
      }
    },
    "deployment_trigger": {
      "type": "ad_hoc",
      "metadata": {
        "branch": "main",
        "commit_hash": "ad9ccd918a81025731e10e40267e11273a263421",
        "commit_message": "Update index.html"
      }
    },
    "stages": [
      {
        "name": "queued",
        "started_on": "2021-06-03T15:38:15.608194Z",
        "ended_on": "2021-06-03T15:39:03.134378Z",
        "status": "active"
      },
      {
        "name": "initialize",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      },
      {
        "name": "clone_repo",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      },
      {
        "name": "build",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      },
      {
        "name": "deploy",
        "started_on": null,
        "ended_on": null,
        "status": "idle"
      }
    ],
    "build_config": {
      "build_command": "npm run build",
      "destination_dir": "build",
      "root_dir": "/",
      "web_analytics_tag": "cee1c73f6e4743d0b5e6bb1a0bcaabcc",
      "web_analytics_token": "021e1057c18547eca7b79f2516f06o7x"
    },
    "source": {
      "type": "github",
      "config": {
        "owner": "cloudflare",
        "repo_name": "ninjakittens",
        "production_branch": "main",
        "pr_comments_enabled": true,
        "deployments_enabled": true
      }
    }
  }
}

Try out an API request with one of your projects by replacing {account_id}, {deployment_id}, {email}, and {auth_key}. You can find your account_id in the URL address bar after navigating to the Cloudflare Dashboard (for example: 41643ed677c7c7gba4x463c4zdb9563c).

Refer to the API documentation for a full breakdown of the object types and endpoints.

Using the Pages API on Workers

The Pages API is even more powerful and simple to use with workers.new. If you haven’t used Workers before, feel free to go through the getting started guide to learn more. However, you’ll need to have set up a Pages project to follow along. Next, you can copy and paste this template for the new worker. Then, customize the values such as {account_id}, {project_name}, {auth_key}, and {your_email}.

const endpoint = "https://api.cloudflare.com/client/v4/accounts/{account_id}/pages/projects/{project_name}/deployments";
const email = "{your_email}";

addEventListener("scheduled", (event) => {
  event.waitUntil(handleScheduled(event.scheduledTime));
});

async function handleScheduled(scheduledTime) {
  const init = {
    method: "POST",
    headers: {
      "content-type": "application/json;charset=UTF-8",
      "X-Auth-Email": email,
      // We recommend storing API keys as secrets using the Workers dashboard or Wrangler,
      // as documented here: https://developers.cloudflare.com/workers/cli-wrangler/commands#secret
      "X-Auth-Key": API_KEY,
    },
  };
  // POST to the deployments endpoint to trigger a new build
  const response = await fetch(endpoint, init);
  return response;
}

To finish configuring the script, click the back arrow near the top left of the window and click on the settings tab. Then, set an environment variable “API_KEY” with the value of your Cloudflare Global key and click “Encrypt” and then “Save”.

The script just makes a POST request to the deployments endpoint to trigger a new build. Click “Quick edit” to go back to the code editor and finish testing the script. You can test your configuration and make a request by clicking the “Trigger scheduled event” button in the “Schedule” tab, next to the tabs labeled “HTTP” and “Preview”. You should see a new queued build for your project in the Pages dashboard. Now, you can click “Save and Deploy” to publish your work. Finally, go back to the worker settings page by clicking the back arrow near the top left of the window.

All that’s left to do is set a cron trigger to periodically run this Worker, on the “Triggers” tab. Click on “Add Cron Trigger”.

Next, we can input “0 * * * *” to trigger the build every hour.

Finally, click save and your automation using the Pages API will trigger a new build every hour.

2. Deleting old deployments after a week: Pages hosts and serves all project deployments on preview links. Suppose you want to keep your project relatively private and prevent access to old deployments. You can use the API to delete deployments after a week so that they are no longer public online. This is easy to do on Workers using Cron Triggers (see the sketch after this list).

3. Sharing project information: Imagine you are working on a development team using Pages to build your websites. You probably want an easy way to share deployment preview links and build status without having to share Cloudflare accounts. Using the API, you can easily share project information, including deployment status and preview links, and serve this content as HTML from a Cloudflare Worker.
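
As a rough illustration of the second idea, the deployments endpoint used above can be combined with a per-deployment DELETE call. The following is only a sketch run from a shell rather than from a Worker; the one-week cutoff, the environment variables, and the use of GNU date and jq are assumptions for the example, not part of the official snippets.

#!/bin/bash
# Sketch only: assumes ACCOUNT_ID, PROJECT_NAME, EMAIL and AUTH_KEY are set,
# and that jq and GNU date are installed. Deletes deployments older than 7 days.
base="https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/pages/projects/${PROJECT_NAME}/deployments"
cutoff=$(date -u -d '7 days ago' +%s)

curl -s -X GET "$base" \
     -H "X-Auth-Email: ${EMAIL}" \
     -H "X-Auth-Key: ${AUTH_KEY}" |
jq -r '.result[] | "\(.id) \(.created_on)"' |
while read -r id created_on; do
  created=$(date -u -d "${created_on%%.*}" +%s)   # drop fractional seconds before parsing
  if [ "$created" -lt "$cutoff" ]; then
    echo "Deleting deployment $id (created $created_on)"
    curl -s -X DELETE "$base/$id" \
         -H "X-Auth-Email: ${EMAIL}" \
         -H "X-Auth-Key: ${AUTH_KEY}" > /dev/null
  fi
done

For the third idea, the project response shown earlier already contains the fields a simple status page would surface, such as result.latest_deployment.url and result.latest_deployment.latest_stage.status.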

Find the code snippets for all three examples here.

Conclusion

We will continue making the API more powerful with features such as supporting prebuilt deployments in the future. We are excited to see what you build with the API and hope you enjoy using rollbacks. At Cloudflare, we are committed to building the best developer experience on Pages, and we always appreciate hearing your feedback. Come chat with us and share more feedback on the Workers Discord (We have a dedicated #pages-help channel!).

Building fine-grained authorization using Amazon Cognito, API Gateway, and IAM

Post Syndicated from Artem Lovan original https://aws.amazon.com/blogs/security/building-fine-grained-authorization-using-amazon-cognito-api-gateway-and-iam/

June 5, 2021: We’ve updated Figure 1: User request flow.


Authorizing functionality of an application based on group membership is a best practice. If you’re building APIs with Amazon API Gateway and you need fine-grained access control for your users, you can use Amazon Cognito. Amazon Cognito allows you to use groups to create a collection of users, which is often done to set the permissions for those users. In this post, I show you how to build fine-grained authorization to protect your APIs using Amazon Cognito, API Gateway, and AWS Identity and Access Management (IAM).

As a developer, you’re building a customer-facing application where your users are going to log into your web or mobile application, and as such you will be exposing your APIs through API Gateway with upstream services. The APIs could be deployed on Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, or Elastic Load Balancing where each of these options will forward the request to your Amazon Elastic Compute Cloud (Amazon EC2) instances. Additionally, you can use on-premises services that are connected to your Amazon Web Services (AWS) environment over an AWS VPN or AWS Direct Connect. It’s important to have fine-grained controls for each API endpoint and HTTP method. For instance, the user should be allowed to make a GET request to an endpoint, but should not be allowed to make a POST request to the same endpoint. As a best practice, you should assign users to groups and use group membership to allow or deny access to your API services.

Solution overview

In this blog post, you learn how to use an Amazon Cognito user pool as a user directory and let users authenticate and acquire the JSON Web Token (JWT) to pass to the API Gateway. The JWT is used to identify the group the user belongs to; each group is mapped to an IAM policy that defines the access rights granted to that group.

Note: The solution works similarly if Amazon Cognito would be federating users with an external identity provider (IdP)—such as Ping, Active Directory, or Okta—instead of being an IdP itself. To learn more, see Adding User Pool Sign-in Through a Third Party. Additionally, if you want to use groups from an external IdP to grant access, Role-based access control using Amazon Cognito and an external identity provider outlines how to do so.

The following figure shows the basic architecture and information flow for user requests.

Figure 1: User request flow

Let’s go through the request flow to understand what happens at each step, as shown in Figure 1:

  1. A user logs in and acquires an Amazon Cognito JWT ID token, access token, and refresh token. To learn more about each token, see using tokens with user pools.
  2. A RestAPI request is made and a bearer token—in this solution, an access token—is passed in the headers.
  3. API Gateway forwards the request to a Lambda authorizer—also known as a custom authorizer.
  4. The Lambda authorizer verifies the Amazon Cognito JWT using the Amazon Cognito public key. On initial Lambda invocation, the public key is downloaded from Amazon Cognito and cached. Subsequent invocations will use the public key from the cache.
  5. The Lambda authorizer looks up the Amazon Cognito group that the user belongs to in the JWT and does a lookup in Amazon DynamoDB to get the policy that’s mapped to the group.
  6. Lambda returns the policy and—optionally—context to API Gateway. The context is a map containing key-value pairs that you can pass to the upstream service. It can be additional information about the user, the service, or anything that provides additional information to the upstream service.
  7. The API Gateway policy engine evaluates the policy.

    Note: Lambda isn’t responsible for understanding and evaluating the policy. That responsibility falls on the native capabilities of API Gateway.

  8. The request is forwarded to the service.

Note: To further optimize Lambda authorizer, the authorization policy can be cached or disabled, depending on your needs. By enabling cache, you could improve the performance as the authorization policy will be returned from the cache whenever there is a cache key match. To learn more, see Configure a Lambda authorizer using the API Gateway console.

Let’s have a closer look at the following example policy that is stored as part of an item in DynamoDB.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Sid":"PetStore-API",
         "Effect":"Allow",
         "Action":"execute-api:Invoke",
         "Resource":[
            "arn:aws:execute-api:*:*:*/*/*/petstore/v1/*",
            "arn:aws:execute-api:*:*:*/*/GET/petstore/v2/status"
         ],
         "Condition":{
            "IpAddress":{
               "aws:SourceIp":[
                  "192.0.2.0/24",
                  "198.51.100.0/24"
               ]
            }
         }
      }
   ]
}

Based on this example policy, the user is allowed to make calls to the petstore API. For version v1, the user can make requests to any verb and any path, which is expressed by an asterisk (*). For v2, the user is only allowed to make a GET request for path /status. To learn more about how the policies work, see Output from an Amazon API Gateway Lambda authorizer.

Getting started

For this solution, you need the following prerequisites:

  • The AWS Command Line Interface (CLI) installed and configured for use.
  • Python 3.6 or later, to package Python code for Lambda

    Note: We recommend that you use a virtual environment or virtualenvwrapper to isolate the solution from the rest of your Python environment.

  • An IAM role or user with enough permissions to create Amazon Cognito User Pool, IAM Role, Lambda, IAM Policy, API Gateway and DynamoDB table.
  • The GitHub repository for the solution. You can download it, or you can use the following Git command to download it from your terminal.

    Note: This sample code should be used to test out the solution and is not intended to be used in a production account.

     $ git clone https://github.com/aws-samples/amazon-cognito-api-gateway.git
     $ cd amazon-cognito-api-gateway
    

    Use the following command to package the Python code for deployment to Lambda.

     $ bash ./helper.sh package-lambda-functions
     …
     Successfully completed packaging files.
    

To implement this reference architecture, you will be utilizing the services described in the following sections.

Note: This solution was tested in the us-east-1, us-east-2, us-west-2, ap-southeast-1, and ap-southeast-2 Regions. Before selecting a Region, verify that the necessary services—Amazon Cognito, API Gateway, and Lambda—are available in those Regions.

Let’s review each service, and how those will be used, before creating the resources for this solution.

Amazon Cognito user pool

A user pool is a user directory in Amazon Cognito. With a user pool, your users can log in to your web or mobile app through Amazon Cognito. You use the Amazon Cognito user directory directly, as this sample solution creates an Amazon Cognito user. However, your users can also log in through social IdPs, OpenID Connect (OIDC), and SAML IdPs.

Lambda as backing API service

Initially, you create a Lambda function that serves your APIs. API Gateway forwards all requests to the Lambda function to serve up the requests.

An API Gateway instance and integration with Lambda

Next, you create an API Gateway instance and integrate it with the Lambda function you created. This API Gateway instance serves as an entry point for the upstream service. The following bash command creates an Amazon Cognito user pool, a Lambda function, and an API Gateway instance. The command then configures proxy integration with Lambda and deploys an API Gateway stage.

Deploy the sample solution

From within the directory where you downloaded the sample code from GitHub, run the following command to generate a random Amazon Cognito user password and create the resources described in the previous section.

 $ bash ./helper.sh cf-create-stack-gen-password
 ...
 Successfully created CloudFormation stack.

When the command is complete, it returns a message confirming successful stack creation.

Validate Amazon Cognito user creation

To validate that an Amazon Cognito user has been created successfully, run the following command to open the Amazon Cognito UI in your browser and then log in with your credentials.

Note: When you run this command, it returns the user name and password that you should use to log in.

 $ bash ./helper.sh open-cognito-ui
  Opening Cognito UI. Please use following credentials to login:
  Username: cognitouser
  Password: xxxxxxxx

Alternatively, you can open the CloudFormation stack and get the Amazon Cognito hosted UI URL from the stack outputs. The URL is the value assigned to the CognitoHostedUiUrl variable.

Figure 2: CloudFormation Outputs – CognitoHostedUiUrl

Validate Amazon Cognito JWT upon login

Since we haven’t installed a web application that would respond to the redirect request, Amazon Cognito will redirect to localhost, which might look like an error. The key aspect is that after a successful log in, there is a URL similar to the following in the navigation bar of your browser:

http://localhost/#id_token=eyJraWQiOiJicVhMYWFlaTl4aUhzTnY3W...

Test the API configuration

Before you protect the API with Amazon Cognito so that only authorized users can access it, let’s verify that the configuration is correct and the API is served by API Gateway. The following command makes a curl request to API Gateway to retrieve data from the API service.

 $ bash ./helper.sh curl-api
{"pets":[{"id":1,"name":"Birds"},{"id":2,"name":"Cats"},{"id":3,"name":"Dogs"},{"id":4,"name":"Fish"}]}

The expected result is that the response will be a list of pets. In this case, the setup is correct: API Gateway is serving the API.

Protect the API

To protect your API, the following is required:

  1. DynamoDB to store the policy that will be evaluated by the API Gateway to make an authorization decision.
  2. A Lambda function to verify the user’s access token and look up the policy in DynamoDB.

Let’s review all the services before creating the resources.

Lambda authorizer

A Lambda authorizer is an API Gateway feature that uses a Lambda function to control access to an API. You use a Lambda authorizer to implement a custom authorization scheme that uses a bearer token authentication strategy. When a client makes a request to one of the API operations, API Gateway calls the Lambda authorizer. The Lambda authorizer takes the identity of the caller as input and returns an IAM policy as the output. The output is the policy that is stored in DynamoDB and evaluated by API Gateway. If there is no policy mapped to the caller identity, Lambda generates a deny policy and the request is denied.

DynamoDB table

DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. This is ideal for this use case to ensure that the Lambda authorizer can quickly process the bearer token, look up the policy, and return it to API Gateway. To learn more, see Control access for invoking an API.

The final step is to create the DynamoDB table for the Lambda authorizer to look up the policy, which is mapped to an Amazon Cognito group.

Figure 3 illustrates an item in DynamoDB. Key attributes are:

  • Group, which is used to look up the policy.
  • Policy, which is returned to API Gateway to evaluate the policy.

Figure 3: DynamoDB item

Based on this policy, the user that is part of the Amazon Cognito group pet-veterinarian is allowed to make API requests to endpoints https://<domain>/<api-gateway-stage>/petstore/v1/* and https://<domain>/<api-gateway-stage>/petstore/v2/status for GET requests only.
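
If you want to inspect the stored item from the command line, a lookup along the following lines can help. This is only a sketch: the table name and attribute names used here (group-policy-table, group, policy) are illustrative assumptions, so check the deployed CloudFormation stack for the real ones.

# Sketch only: table and attribute names are assumptions; check the CloudFormation stack for the actual values
aws dynamodb get-item \
    --table-name group-policy-table \
    --key '{"group": {"S": "pet-veterinarian"}}' \
    --query 'Item.policy.S' \
    --output text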

Update and create resources

Run the following command to update existing resources and create a Lambda authorizer and DynamoDB table.

 $ bash ./helper.sh cf-update-stack
Successfully updated CloudFormation stack.

Test the custom authorizer setup

Begin your testing with the following request, which doesn’t include an access token.

$ bash ./helper.sh curl-api
{"message":"Unauthorized"}

The request is denied with the message Unauthorized. At this point, the Amazon API Gateway expects a header named Authorization (case sensitive) in the request. If there’s no authorization header, the request is denied before it reaches the Lambda authorizer. This is a way to filter out requests that don’t include the required information.

Use the following command for the next test. In this test, you pass the required header but the token is invalid because it wasn’t issued by Amazon Cognito but is a simple JWT-format token stored in ./helper.sh. To learn more about how to decode and validate a JWT, see decode and verify an Amazon Cognito JSON token.

$ bash ./helper.sh curl-api-invalid-token
{"Message":"User is not authorized to access this resource"}

This time the message is different. The Lambda authorizer received the request and identified the token as invalid and responded with the message User is not authorized to access this resource.

To make a successful request to the protected API, your code will need to perform the following steps (a minimal sketch follows the list):

  1. Use a user name and password to authenticate against your Amazon Cognito user pool.
  2. Acquire the tokens (id token, access token, and refresh token).
  3. Make an HTTPS (TLS) request to API Gateway and pass the access token in the headers.
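
The following is a minimal sketch of those three steps from a shell, assuming the user pool app client has the USER_PASSWORD_AUTH flow enabled and that CLIENT_ID, USERNAME, PASSWORD, and API_URL have been set beforehand; the repository’s helper.sh (curl-protected-api) performs the equivalent flow for you.

# Sketch only: the app client must allow USER_PASSWORD_AUTH;
# CLIENT_ID, USERNAME, PASSWORD and API_URL are assumed to be set.
ACCESS_TOKEN=$(aws cognito-idp initiate-auth \
    --auth-flow USER_PASSWORD_AUTH \
    --client-id "$CLIENT_ID" \
    --auth-parameters USERNAME="$USERNAME",PASSWORD="$PASSWORD" \
    --query 'AuthenticationResult.AccessToken' \
    --output text)

# Pass the access token in the Authorization header (case sensitive, as noted earlier)
curl -s -H "Authorization: $ACCESS_TOKEN" "$API_URL/petstore/v1/pets"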

Before the request is forwarded to the API service, API Gateway receives the request and passes it to the Lambda authorizer. The authorizer performs the following steps. If any of the steps fail, the request is denied.

  1. Retrieve the public keys from Amazon Cognito.
  2. Cache the public keys so the Lambda authorizer doesn’t have to make additional calls to Amazon Cognito as long as the Lambda execution environment isn’t shut down.
  3. Use public keys to verify the access token.
  4. Look up the policy in DynamoDB.
  5. Return the policy to API Gateway.

The access token has claims such as Amazon Cognito assigned groups, user name, token use, and others, as shown in the following example (some fields removed).

{
    "sub": "00000000-0000-0000-0000-0000000000000000",
    "cognito:groups": [
        "pet-veterinarian"
    ],
...
    "token_use": "access",
    "scope": "openid email",
    "username": "cognitouser"
}

Finally, let’s programmatically log in to Amazon Cognito UI, acquire a valid access token, and make a request to API Gateway. Run the following command to call the protected API.

$ bash ./helper.sh curl-protected-api
{"pets":[{"id":1,"name":"Birds"},{"id":2,"name":"Cats"},{"id":3,"name":"Dogs"},{"id":4,"name":"Fish"}]}

This time, you receive a response with data from the API service. Let’s examine the steps that the example code performed:

  1. Lambda authorizer validates the access token.
  2. Lambda authorizer looks up the policy in DynamoDB based on the group name that was retrieved from the access token.
  3. Lambda authorizer passes the IAM policy back to API Gateway.
  4. API Gateway evaluates the IAM policy and the final effect is an allow.
  5. API Gateway forwards the request to Lambda.
  6. Lambda returns the response.

Let’s continue to test our policy from Figure 3. In the policy document, arn:aws:execute-api:*:*:*/*/GET/petstore/v2/status is the only endpoint for version V2, which means requests to endpoint /GET/petstore/v2/pets should be denied. Run the following command to test this.

 $ bash ./helper.sh curl-protected-api-not-allowed-endpoint
{"Message":"User is not authorized to access this resource"}

Note: Now that you understand fine-grained access control using an Amazon Cognito user pool, API Gateway, and a Lambda function, and you have finished testing it out, you can run the following command to clean up all the resources associated with this solution:

 $ bash ./helper.sh cf-delete-stack

Advanced IAM policies to further control your API

With IAM, you can create advanced policies to further refine access to your APIs. You can learn more about condition keys that can be used in API Gateway, their use in an IAM policy with conditions, and how policy evaluation logic determines whether to allow or deny a request.

Summary

In this post, you learned how IAM and Amazon Cognito can be used to provide fine-grained access control for your API behind API Gateway. You can use this approach to transparently apply fine-grained control to your API, without having to modify the code in your API, and create advanced policies by using IAM condition keys.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on the Amazon Cognito forum or contact AWS Support.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Artem Lovan

Artem is a Senior Solutions Architect based in New York. He helps customers architect and optimize applications on AWS. He has been involved in IT at many levels, including infrastructure, networking, security, DevOps, and software development.

Summarize devices that are not reachable

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/summarize-devices-that-are-not-reachable/13219/

In this lab, we will list all devices which are not reachable by the monitoring tool. This is useful when we want to improve the overall monitoring experience and decrease the size of the queue (metrics which have not arrived at the instance).

Tools required for the job: Access to a database server or a Windows computer with PowerShell

To summarize devices that are not reachable at the moment, we can use a database query. Tested and works on Zabbix 4.0 and 5.0, on both MySQL and PostgreSQL:

SELECT hosts.host,
       interface.ip,
       interface.dns,
       interface.useip,
       CASE interface.type
           WHEN 1 THEN 'ZBX'
           WHEN 2 THEN 'SNMP'
           WHEN 3 THEN 'IPMI'
           WHEN 4 THEN 'JMX'
       END AS "type",
       hosts.error
FROM hosts
JOIN interface ON interface.hostid=hosts.hostid
WHERE hosts.available=2
  AND interface.main=1
  AND hosts.status=0;
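
If you prefer to run the query non-interactively, something like the following works; the database name, user, and file name here are assumptions based on a default Zabbix installation.

# Assuming the query above is saved as unreachable.sql and a default 'zabbix' database and user
mysql -u zabbix -p zabbix < unreachable.sql

# PostgreSQL equivalent, under the same assumptions
psql -U zabbix -d zabbix -f unreachable.sql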

A very similar (but not exactly the same) outcome can be obtained via Windows PowerShell by contacting the Zabbix API. Try this snippet:

$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Content-Type", "application/json")
$url = 'http://192.168.1.101/api_jsonrpc.php'
$user = 'api'
$password = 'zabbix'

# authorization
$key = Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body "
{
    `"jsonrpc`": `"2.0`",
    `"method`": `"user.login`",
    `"params`": {
        `"user`": `"$user`",
        `"password`": `"$password`"
    },
    `"id`": 1
}
" | foreach { $_.result }
echo $key

# filter out unreachable Agent, SNMP, JMX, IPMI hosts
Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body "
{
    `"jsonrpc`": `"2.0`",
    `"method`": `"host.get`",
    `"params`": {
        `"output`": [`"interfaces`",`"host`",`"proxy_hostid`",`"disable_until`",`"lastaccess`",`"errors_from`",`"error`"],
        `"selectInterfaces`": `"extend`",
        `"filter`": {`"available`": `"2`",`"status`":`"0`"}
    },
    `"auth`": `"$key`",
    `"id`": 1
}
" | foreach { $_.result }  | foreach { $_.interfaces } | Out-GridView

# log out
Invoke-RestMethod $url -Method 'POST' -Headers $headers -Body "
{
    `"jsonrpc`": `"2.0`",
    `"method`": `"user.logout`",
    `"params`": [],
    `"id`": 1,
    `"auth`": `"$key`"
}
"

Set valid credentials (URL, username, password) at the top of the code before executing it.

The benefit of PowerShell here is that we can use some on-the-fly filtering:

We can understand the exact meaning of the field ‘type’ by looking at the previous database query:

       CASE interface.type
           WHEN 1 THEN 'ZBX'
           WHEN 2 THEN 'SNMP'
           WHEN 3 THEN 'IPMI'
           WHEN 4 THEN 'JMX'
       END AS "type",

On Windows PowerShell, it is possible to export the unreachable hosts directly to a CSV file. To do that, in the code above, we need to change:

Out-GridView

to

Export-Csv c:\temp\unavailable.hosts.csv

Alright, this was the knowledge bit today. Let’s keep Zabbixing!

A Byzantine failure in the real world

Post Syndicated from Tom Lianza original https://blog.cloudflare.com/a-byzantine-failure-in-the-real-world/

An analysis of the Cloudflare API availability incident on 2020-11-02

When we review design documents at Cloudflare, we are always on the lookout for Single Points of Failure (SPOFs). Eliminating these is a necessary step in architecting a system you can be confident in. Ironically, when you’re designing a system with built-in redundancy, you spend most of your time thinking about how well it functions when that redundancy is lost.

On November 2, 2020, Cloudflare had an incident that impacted the availability of the API and dashboard for six hours and 33 minutes. During this incident, the success rate for queries to our API periodically dipped as low as 75%, and the dashboard experience was as much as 80 times slower than normal. While Cloudflare’s edge is massively distributed across the world (and kept working without a hitch), Cloudflare’s control plane (API & dashboard) is made up of a large number of microservices that are redundant across two regions. For most services, the databases backing those microservices are only writable in one region at a time.

Each of Cloudflare’s control plane data centers has multiple racks of servers. Each of those racks has two switches that operate as a pair—both are normally active, but either can handle the load if the other fails. Cloudflare survives rack-level failures by spreading the most critical services across racks. Every piece of hardware has two or more power supplies with different power feeds. Every server that stores critical data uses RAID 10 redundant disks or storage systems that replicate data across at least three machines in different racks, or both. Redundancy at each layer is something we review and require. So—how could things go wrong?

In this post we present a timeline of what happened, and how a difficult failure mode known as a Byzantine fault played a role in a cascading series of events.

2020-11-02 14:43 UTC: Partial Switch Failure

At 14:43, a network switch started misbehaving. Alerts began firing about the switch being unreachable to pings. The device was in a partially operating state: network control plane protocols such as LACP and BGP remained operational, while others, such as vPC, were not. The vPC link is used to synchronize ports across multiple switches, so that they appear as one large, aggregated switch to servers connected to them. At the same time, the data plane (or forwarding plane) was not processing and forwarding all the packets received from connected devices.

This failure scenario is completely invisible to the connected nodes, as each server only sees an issue for some of its traffic due to the load-balancing nature of LACP. Had the switch failed fully, all traffic would have failed over to the peer switch, as the connected links would’ve simply gone down, and the ports would’ve dropped out of the forwarding LACP bundles.

Six minutes later, the switch recovered without human intervention. But this odd failure mode led to further problems that lasted long after the switch had returned to normal operation.

2020-11-02 14:44 UTC: etcd Errors begin

The rack with the misbehaving switch included one server in our etcd cluster. We use etcd heavily in our core data centers whenever we need strongly consistent data storage that’s reliable across multiple nodes.

In the event that the cluster leader fails, etcd uses the RAFT protocol to maintain consistency and establish consensus to promote a new leader. In the RAFT protocol, cluster members are assumed to be either available or unavailable, and to provide accurate information or none at all. This works fine when a machine crashes, but is not always able to handle situations where different members of the cluster have conflicting information.

In this particular situation:

  • Network traffic between node 1 (in the affected rack) and node 3 (the leader) was being sent through the switch in the degraded state,
  • Network traffic between node 1 and node 2 were going through its working peer, and
  • Network traffic between node 2 and node 3 was unaffected.

This caused cluster members to have conflicting views of reality, known in distributed systems theory as a Byzantine fault. As a consequence of this conflicting information, node 1 repeatedly initiated leader elections, voting for itself, while node 2 repeatedly voted for node 3, which it could still connect to. This resulted in ties that did not promote a leader node 1 could reach. RAFT leader elections are disruptive, blocking all writes until they’re resolved, so this made the cluster read-only until the faulty switch recovered and node 1 could once again reach node 3.

2020-11-02 14:45 UTC: Database system promotes a new primary database

Cloudflare’s control plane services use relational databases hosted across multiple clusters within a data center. Each cluster is configured for high availability. The cluster setup includes a primary database, a synchronous replica, and one or more asynchronous replicas. This setup allows redundancy within a data center. For cross-datacenter redundancy, a similar high availability secondary cluster is set up and replicated in a geographically dispersed data center for disaster recovery. The cluster management system leverages etcd for cluster member discovery and coordination.

When etcd became read-only, two clusters were unable to communicate that they had a healthy primary database. This triggered the automatic promotion of a synchronous database replica to become the new primary. This process happened automatically and without error or data loss.

There was a defect in our cluster management system that requires a rebuild of all database replicas when a new primary database is promoted. So, although the new primary database was available instantly, the replicas would take considerable time to become available, depending on the size of the database. For one of the clusters, service was restored quickly. Synchronous and asynchronous database replicas were rebuilt and started replicating successfully from primary, and the impact was minimal.

For the other cluster, however, performant operation of that database required a replica to be online. Because this database handles authentication for API calls and dashboard activities, it takes a lot of reads, and one replica was heavily utilized to spare the primary the load. When this failover happened and no replicas were available, the primary was overloaded, as it had to take all of the load. This is when the main impact started.

Reduce Load, Leverage Redundancy

At this point we saw that our primary authentication database was overwhelmed and began shedding load from it. We dialed back the rate at which we push SSL certificates to the edge, send emails, and other features, to give it space to handle the additional load. Unfortunately, because of its size, we knew it would take several hours for a replica to be fully rebuilt.

A silver lining here is that every database cluster in our primary data center also has online replicas in our secondary data center. Those replicas are not part of the local failover process, and were online and available throughout the incident. The process of steering read-queries to those replicas was not yet automated, so we manually diverted API traffic that could leverage those read replicas to the secondary data center. This substantially improved our API availability.

The Dashboard

The Cloudflare dashboard, like most web applications, has the notion of a user session. When user sessions are created (each time a user logs in) we perform some database operations and keep data in a Redis cluster for the duration of that user’s session. Unlike our API calls, our user sessions cannot currently be moved across the ocean without disruption. As we took actions to improve the availability of our API calls, we were unfortunately making the user experience on the dashboard worse.

This is an area of the system that is currently designed to be able to fail over across data centers in the event of a disaster, but has not yet been designed to work in both data centers at the same time. After a first period in which users on the dashboard became increasingly frustrated, we failed the authentication calls fully back to our primary data center, and kept working on our primary database to ensure we could provide the best service levels possible in that degraded state.

2020-11-02 21:20 UTC Database Replica Rebuilt

The instant the first database replica rebuilt, it put itself back into service, and performance resumed to normal levels. We re-ramped all of the services that had been turned down, so all asynchronous processing could catch up, and after a period of monitoring marked the end of the incident.

Redundant Points of Failure

The cascade of failures in this incident was interesting because each system, on its face, had redundancy. Moreover, no system fully failed—each entered a degraded state. That combination meant the chain of events that transpired was considerably harder to model and anticipate. It was frustrating yet reassuring that some of the possible failure modes were already being addressed.

A team was already working on fixing the limitation that requires a database replica rebuild upon promotion. Our user sessions system was inflexible in scenarios where we’d like to steer traffic around, and redesigning that was already in progress.

This incident also led us to revisit the configuration parameters we put in place for things that auto-remediate. In previous years, promoting a database replica to primary took far longer than we liked, so getting that process automated and able to trigger on a minute’s notice was a point of pride. At the same time, for at least one of our databases, the cure may be worse than the disease, and in fact we may not want to invoke the promotion process so quickly. Immediately after this incident we adjusted that configuration accordingly.

Byzantine Fault Tolerance (BFT) is a hot research topic. Solutions have been known since 1982, but have had to choose between a variety of engineering tradeoffs, including security, performance, and algorithmic simplicity. Most general-purpose cluster management systems choose to forgo BFT entirely and use protocols based on PAXOS, or simplifications of PAXOS such as RAFT, that perform better and are easier to understand than BFT consensus protocols. In many cases, a simple protocol that is known to be vulnerable to a rare failure mode is safer than a complex protocol that is difficult to implement correctly or debug.

The first uses of BFT consensus were in safety-critical systems such as aircraft and spacecraft controls. These systems typically have hard real time latency constraints that require tightly coupling consensus with application logic in ways that make these implementations unsuitable for general-purpose services like etcd. Contemporary research on BFT consensus is mostly focused on applications that cross trust boundaries, which need to protect against malicious cluster members as well as malfunctioning cluster members. These designs are more suitable for implementing general-purpose services such as etcd, and we look forward to collaborating with researchers and the open source community to make them suitable for production cluster management.

We are very sorry for the difficulty the outage caused, and are continuing to improve as our systems grow. We’ve since fixed the bug in our cluster management system, and are continuing to tune each of the systems involved in this incident to be more resilient to failures of their dependencies.  If you’re interested in helping solve these problems at scale, please visit cloudflare.com/careers.

Close problem automatically via Zabbix API

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/close-problem-automatically-via-zabbix-api/12461/

Today we are talking about a use case where it’s impossible to find a proper way to write a recovery expression for a Zabbix trigger. In other words, we know how to identify problems, but there is no good way to detect when the problem is gone.

This mostly relates to huge environments, for example:

  • One log file with hundreds of patterns inside – we care about all of them and need them all
  • An SNMP trap item (snmptrap.fallback) with different patterns being written into it

In these situations, the trigger is most likely configured with “Event generation mode: Multiple”. In practice this means that every time a “problematic metric” hits the instance, it opens one additional problem.

Goal:
I just need to receive an email about the record, then close the event.

As a workaround (let’s call it a solution here), we can define an action which will:

  1. contact an API endpoint
  2. manually acknowledge the event and close it

This functionality is possible because, when an event hits the action, the operation actually knows the event ID of the problem. The macro {EVENT.ID} saves the day.

To solve the problem, we need to define the API connection details as global macros:

     {$Z_API_PHP}=http://127.0.0.1/api_jsonrpc.php
    {$Z_API_USER}=api
{$Z_API_PASSWORD}=zabbix

NOTE
‘http://127.0.0.1/api_jsonrpc.php’ means the frontend runs on the same server as systemd:zabbix-server. If that is not the case, use the front-end address of the Zabbix GUI and append ‘api_jsonrpc.php’.
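
Before wiring this into an action, it can be worth checking that the endpoint answers at all. The apiinfo.version method requires no authentication, so a quick probe could look like this (same URL as in the {$Z_API_PHP} macro):

curl -sk -X POST -H "Content-Type: application/json" -d '
{
    "jsonrpc": "2.0",
    "method": "apiinfo.version",
    "params": [],
    "id": 1
}' http://127.0.0.1/api_jsonrpc.php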

We will have 2 actions. The first one will deliver a notification to email:

After 1 minute, a second action will close the event:

This is the full bash snippet we must put inside the operation. No need to change anything – it works with copy and paste:

     url={$Z_API_PHP}
    user={$Z_API_USER}
password={$Z_API_PASSWORD}

# authorization
auth=$(curl -sk -X POST -H "Content-Type: application/json" -d "
{
	\"jsonrpc\": \"2.0\",
	\"method\": \"user.login\",
	\"params\": {
		\"user\": \"$user\",
		\"password\": \"$password\"
	},
	\"id\": 1,
	\"auth\": null
}
" $url | \
grep -E -o "([0-9a-f]{32,32})")

# acknowledge and close event
curl -sk -X POST -H "Content-Type: application/json" -d "
{
	\"jsonrpc\": \"2.0\",
	\"method\": \"event.acknowledge\",
	\"params\": {
		\"eventids\": \"{EVENT.ID}\",
		\"action\": 1,
		\"message\": \"Problem resolved.\"
	},
	\"auth\": \"$auth\",
	\"id\": 1
}" $url

# close api key
curl -sk -X POST -H "Content-Type: application/json" -d "
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"user.logout\",
    \"params\": [],
    \"id\": 1,
    \"auth\": \"$auth\"
}
" $url

Zabbix API scripting via curl and jq

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/zabbix-api-scripting-via-curl-and-jq/12434/

In this lab we will use a bash environment and the utilities ‘curl’ and ‘jq’ to perform Zabbix API calls and do some scripting.

‘curl’ is a tool to exchange JSON messages over HTTP/HTTPS.
The ‘jq’ utility helps to locate and extract specific elements in the output.

To follow the lab we need to install ‘jq’:

# On CentOS7/RHEL7:
yum install epel-release && yum install jq

# On CentOS8/RHEL8:
dnf install jq

# On Ubuntu/Debian:
apt install jq

# On any 64-bit Linux platform:
curl -skL "https://github.com/stedolan/jq/releases/download/jq-1.5/jq-linux64" -o /usr/bin/jq && chmod +x /usr/bin/jq

Obtaining an authorization token

In order to operate with API calls we need to:

  • Define an API endpoint. this is an URL, a PHP file which is designed to accept requests
  • Obtain an authorization token

If you execute API calls from the frontend server, then the endpoint is most likely one of the following:

url=http://127.0.0.1/api_jsonrpc.php
# or:
url=http://127.0.0.1/zabbix/api_jsonrpc.php

The url variable must be set before jumping to the next step. Test that you have it configured:

echo $url

Any API call needs to be made with an authorization token. To put the token in a variable, use the command:

auth=$(curl -s -X POST -H 'Content-Type: application/json-rpc' \
-d '
{"jsonrpc":"2.0","method":"user.login","params":
{"user":"api","password":"zabbix"},
"id":1,"auth":null}
' $url | \
jq -r .result
)

Note
Notice there is user ‘api’ with password ‘zabbix’. This is a dedicated user for API calls.

Check if you have a session key. It should be a 32-character HEX string:

echo $auth

Thought process

1) Visit the documentation page and pick an API method, for example alert.get:

{
"jsonrpc": "2.0",
"method": "alert.get",
"params": {
	"output": "extend",
	"actionids": "3"
},
"auth": "038e1d7b1735c6a5436ee9eae095879e",
"id": 1
}

2) Let’s use our favorite text editor and its built-in Find & Replace functionality to escape all double quotes:

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"038e1d7b1735c6a5436ee9eae095879e\",
\"id\": 1
}

NOTE
Don’t even think about doing this process manually by hand!

3) Replace session key 038e1d7b1735c6a5436ee9eae095879e with our variable $auth

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"$auth\",
\"id\": 1
}

4) Now let’s encapsulate the API command with curl:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"$auth\",
\"id\": 1
}

" $url

Executing the previous command should already print JSON content in response.
To make the output more readable, we can pipe it to jq .:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \

{
\"jsonrpc\": \"2.0\",
\"method\": \"alert.get\",
\"params\": {
	\"output\": \"extend\",
	\"actionids\": \"3\"
},
\"auth\": \"$auth\",
\"id\": 1
}

" $url | jq .

Wrap everything together in one file

This is a ready-to-use snippet:

#!/bin/bash

# 1. set connection details
url=http://127.0.0.1/api_jsonrpc.php
user=api
password=zabbix

# 2. get authorization token
auth=$(curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
 \"jsonrpc\": \"2.0\",
 \"method\": \"user.login\",
 \"params\": {
  \"user\": \"$user\",
  \"password\": \"$password\"
 },
 \"id\": 1,
 \"auth\": null
}
" $url | \
jq -r '.result'
)

# 3. show triggers in problem state
curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
 \"jsonrpc\": \"2.0\",
    \"method\": \"trigger.get\",
    \"params\": {
        \"output\": \"extend\",
        \"selectHosts\": \"extend\",
        \"filter\": {
            \"value\": 1
        },
        \"sortfield\": \"priority\",
        \"sortorder\": \"DESC\"
    },
    \"auth\": \"$auth\",
    \"id\": 1
}
" $url | \
jq -r '.result'

# 4. logout user
curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"user.logout\",
    \"params\": [],
    \"id\": 1,
    \"auth\": \"$auth\"
}
" $url

Conveniences

We can use https://jsonpathfinder.com/ to identify the path needed to extract an element.

For example, to list all Zabbix proxies we will use an API call:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"proxy.get\",
    \"params\": {
        \"output\": [\"host\"]
    },
    \"auth\": \"$auth\",
    \"id\": 1
} 
" $url

It may print content like:

{"jsonrpc":"2.0","result":[{"host":"broceni","proxyid":"10387"},{"host":"mysql8mon","proxyid":"12066"},{"host":"riga","proxyid":"12585"}],"id":1}

Inside JSONPathFinder, by clicking in the right panel, we can locate a sample element that we need to extract:

It suggests the path ‘x.result[1].host’. This means that to extract all elements we can remove the number and use ‘.result[].host’ like this:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"proxy.get\",
    \"params\": {
        \"output\": [\"host\"]
    },
    \"auth\": \"$auth\",
    \"id\": 1
} 
" $url | jq -r '.result[].host'

Now it prints only the proxy titles:

broceni
mysql8mon
riga
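
Along the same lines, jq can reshape the extracted data further. As a small sketch based on the same proxy.get call, this prints each proxy name together with its proxyid as tab-separated values:

curl -s -X POST \
-H 'Content-Type: application/json-rpc' \
-d " \
{
    \"jsonrpc\": \"2.0\",
    \"method\": \"proxy.get\",
    \"params\": {
        \"output\": [\"host\", \"proxyid\"]
    },
    \"auth\": \"$auth\",
    \"id\": 1
}
" $url | jq -r '.result[] | [.host, .proxyid] | @tsv'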

That is it for today. Bye.

Zabbix API calls through Postman

Post Syndicated from Aigars Kadiķis original https://blog.zabbix.com/zabbix-api-calls-through-postman/12198/

Zabbix API calls can also be performed through a graphical user interface (GUI), with no need to jump into scripting. The application we will use to perform API calls is called Postman.

Benefits:

  • Available on Windows, Linux, or macOS
  • Save/synchronize your collections with your Google account
  • Copy and paste examples straight from the official documentation page

Let’s go through the basic steps of performing API calls:

1st step – Take the API method user.login and use a dedicated username and password to obtain a session token:

{
    "jsonrpc": "2.0",
    "method": "user.login",
    "params": {
        "user": "api",
        "password": "zabbix"
    },
    "id": 1
}

This is how it looks in Postman:
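
If the credentials are accepted, the response body contains the session token in the result field. An illustrative example (reusing the sample token from earlier; yours will differ):

{
    "jsonrpc": "2.0",
    "result": "038e1d7b1735c6a5436ee9eae095879e",
    "id": 1
}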

NOTE
We recommend using a dedicated user for API calls, for example a user called “api”. Make sure the user type is set to “Zabbix Super Admin” so that this user can access any type of information.

2nd step – Use API method trigger.get to list all triggers in the problem state:

{
    "jsonrpc": "2.0",
    "method": "trigger.get",
    "params": {
        "output": [
            "triggerid",
            "description",
            "priority"
        ],
        "filter": {
            "value": 1
        },
        "sortfield": "priority",
        "sortorder": "DESC"
    },
    "auth": "<session key>",
    "id": 1
}

Replace “<session key>” inside the API snippet to make it work, then click the “Send” button. All triggers in the problem state will be listed on the right side:

Postman conveniences – Environments

Environments are “a must” if you:

  • Have separate test, development, and production Zabbix instances
  • Plan to migrate Zabbix to the next version (for example, 4.0 to 5.0) and want to test all API calls beforehand

In the top right corner there is a Manage Environments button. Let’s click it.

Now create an environment:

Each environment must define a url and an auth key:

Now we have one definition, prod. We can close the window with [X]:

To work with the new environment, select the newly created profile prod. Substitute the Zabbix API endpoint with {{url}} and use {{auth}} as a dynamic authorization key:
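
For illustration, this is a sketch of how the earlier trigger.get request body might look once the environment variables are in use; the hard-coded session key is simply replaced by {{auth}}:

{
    "jsonrpc": "2.0",
    "method": "trigger.get",
    "params": {
        "output": ["triggerid", "description", "priority"],
        "filter": { "value": 1 },
        "sortfield": "priority",
        "sortorder": "DESC"
    },
    "auth": "{{auth}}",
    "id": 1
}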

NOTE
Whenever we notice that an API procedure no longer works, all we need to do is open the Manage Environments section and set a new session token.

Topic in video format:
https://youtu.be/B14tsDUasG8?t=2513

Automated Origin CA for Kubernetes

Post Syndicated from Terin Stock original https://blog.cloudflare.com/automated-origin-ca-for-kubernetes/

In 2016, we launched the Cloudflare Origin CA, a certificate authority optimized for making it easy to secure the connection between Cloudflare and an origin server. Running our own CA has allowed us to support fast issuance and renewal, simple and effective revocation, and wildcard certificates for our users.

Out of the box, managing TLS certificates and keys within Kubernetes can be challenging and error prone. The secret resources have to be constructed correctly, as components expect secrets with specific fields. Some forms of domain verification require manually rotating secrets to pass. Once you’re successful, don’t forget to renew before the certificate expires!

cert-manager is a project to fill this operational gap, providing Kubernetes resources that manage the lifecycle of a certificate. Today we’re releasing origin-ca-issuer, an extension to cert-manager integrating with Cloudflare Origin CA to easily create and renew certificates for your account’s domains.

Origin CA Integration

Creating an Issuer

After installing cert-manager and origin-ca-issuer, you can create an OriginIssuer resource. This resource creates a binding between cert-manager and the Cloudflare API for an account. Different issuers may be connected to different Cloudflare accounts in the same Kubernetes cluster.

apiVersion: cert-manager.k8s.cloudflare.com/v1
kind: OriginIssuer
metadata:
  name: prod-issuer
  namespace: default
spec:
  signatureType: OriginECC
  auth:
    serviceKeyRef:
      name: service-key
      key: key

This creates a new OriginIssuer named “prod-issuer” that issues certificates using ECDSA signatures, and the secret “service-key” in the same namespace is used to authenticate to the Cloudflare API.
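
The referenced “service-key” secret is not created by the issuer itself. As a minimal sketch, it could be created with kubectl; the value below is a placeholder for your own Cloudflare Origin CA service key:

# create the "service-key" secret referenced by the OriginIssuer above
# (replace the value with your real Origin CA service key)
kubectl create secret generic service-key \
  --namespace default \
  --from-literal=key='v1.0-REPLACE-WITH-YOUR-ORIGIN-CA-SERVICE-KEY'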

Signing an Origin CA Certificate

After creating an OriginIssuer, we can now create a Certificate with cert-manager. This defines the domains, including wildcards, that the certificate should be issued for, how long the certificate should be valid, and when cert-manager should renew the certificate.

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-com
  namespace: default
spec:
  # The secret name where cert-manager
  # should store the signed certificate.
  secretName: example-com-tls
  dnsNames:
    - example.com
  # Duration of the certificate.
  duration: 168h
  # Renew a day before the certificate expiration.
  renewBefore: 24h
  # Reference the Origin CA Issuer you created above,
  # which must be in the same namespace.
  issuerRef:
    group: cert-manager.k8s.cloudflare.com
    kind: OriginIssuer
    name: prod-issuer

Once created, cert-manager begins managing the lifecycle of this certificate, including creating the key material, crafting a certificate signing request (CSR), and constructing a certificate request that will be processed by the origin-ca-issuer.

When signed by the Cloudflare API, the certificate will be made available, along with the private key, in the Kubernetes secret specified within the secretName field. You’ll be able to use this certificate on servers proxied behind Cloudflare.
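
As a quick way to verify the result, here is a sketch assuming the Certificate above has been issued and that openssl is available on your workstation:

# check that cert-manager reports the certificate as Ready
kubectl get certificate example-com --namespace default

# decode and inspect the issued certificate stored in the secret
kubectl get secret example-com-tls --namespace default \
  -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -dates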

Extra: Ingress Support

If you’re using an Ingress controller, you can use cert-manager’s Ingress support to automatically manage Certificate resources based on your Ingress resource.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/issuer: prod-issuer
    cert-manager.io/issuer-kind: OriginIssuer
    cert-manager.io/issuer-group: cert-manager.k8s.cloudflare.com
  name: example
  namespace: default
spec:
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: examplesvc
                port:
                  number: 80
  tls:
    # specifying a host in the TLS section will tell cert-manager
    # what DNS SANs should be on the created certificate.
    - hosts:
        - example.com
      # cert-manager will create this secret
      secretName: example-tls

Building an External cert-manager Issuer

An external cert-manager issuer is a specialized Kubernetes controller. There’s no direct communication between cert-manager and external issuers at all; this means that you can use any existing tools and best practices for developing controllers to develop an external issuer.

We’ve decided to use the excellent controller-runtime project to build origin-ca-issuer, running two reconciliation controllers.

OriginIssuer Controller

The OriginIssuer controller watches for the creation and modification of OriginIssuer custom resources. The controller creates a Cloudflare API client using the referenced details and credentials. This API client instance will later be used to sign certificates through the API. The controller will periodically retry creating the API client; once it succeeds, it updates the OriginIssuer’s status to ready.

CertificateRequest Controller

The CertificateRequest controller watches for the creation and modification of cert-manager’s CertificateRequest resources. These resources are created automatically by cert-manager as needed during a certificate’s lifecycle.

The controller looks for CertificateRequests that reference a known OriginIssuer (this reference is copied by cert-manager from the originating Certificate resource) and ignores all resources that do not match. The controller then verifies that the OriginIssuer is in the ready state before transforming the certificate request into an API request using the previously created client.

On a successful response, the signed certificate is added to the certificate request, which cert-manager will then use to create or update the secret resource. On an unsuccessful request, the controller will periodically retry.

Learn More

Up-to-date documentation and complete installation instructions can be found in our GitHub repository. Feedback and contributions are greatly appreciated. If you’re interested in Kubernetes at Cloudflare, including building controllers like these, we’re hiring.