Tag Archives: How-to

Creating a User Activity Dashboard for Amazon CodeWhisperer

Post Syndicated from David Ernst original https://aws.amazon.com/blogs/devops/creating-a-user-activity-dashboard-for-amazon-codewhisperer/

Maximizing the value of enterprise software tools requires an understanding of who is using them and how. As we have worked with builders rolling out Amazon CodeWhisperer across their enterprises, identifying usage patterns has been critical.

This blog post is a result of that work. It builds on the Introducing Amazon CodeWhisperer Dashboard blog and Amazon CloudWatch metrics, and enables customers to build dashboards to support their rollouts. Note that these features are only available in the CodeWhisperer Professional plan.

Organizations have leveraged the existing Amazon CodeWhisperer Dashboard to gain insights into developer usage. This blog explores how we can supplement the existing dashboard with detailed user analytics. Identifying leading contributors has accelerated tool usage and adoption within organizations, and acknowledging and incentivizing those adopters can accelerate broader adoption.

The architecture diagram outlines a streamlined process for tracking and analyzing Amazon CodeWhisperer usage events. It begins with logging these events in CodeWhisperer and AWS CloudTrail and then forwarding them to Amazon CloudWatch Logs. Configuring AWS CloudTrail involves using Amazon S3 for storage and AWS Key Management Service (KMS) for log encryption. An AWS Lambda function sifts through the logs, extracting user activity information. The findings are then displayed on a CloudWatch dashboard, visually representing active and inactive users, and showing how an organization can dive into CodeWhisperer usage. This blog also introduces an AWS CloudFormation template that simplifies the setup process, including creating the CloudTrail trail with an S3 bucket, a KMS key, and the Lambda function. The template also configures AWS IAM permissions, ensuring the Lambda function has the access rights it needs to interact with other AWS services.

Configuring CloudTrail for CodeWhisperer User Tracking

This section details the process for monitoring user interactions while using Amazon CodeWhisperer. The aim is to utilize AWS CloudTrail to record instances where users receive code suggestions from CodeWhisperer. This involves setting up a new CloudTrail trail tailored to log events related to these interactions. By accomplishing this, you lay a foundational framework for capturing detailed user activity data, which is crucial for the subsequent steps of analyzing and visualizing this data through a custom AWS Lambda function and an Amazon CloudWatch dashboard.

Setup CloudTrail for CodeWhisperer

1. Navigate to AWS CloudTrail Service.

2. Create Trail

3. Choose Trail Attributes

a. Click on Create Trail

b. Provide a Trail Name, for example, “cwspr-preprod-cloudtrail”

c. Choose Enable for all accounts in my organization

d. Choose Create a new Amazon S3 bucket to configure the Storage Location

e. For Trail log bucket and folder, note down the generated unique trail bucket name so that you can view the logs later.

f. Check Enabled to encrypt log files with SSE-KMS encryption

g. Enter an AWS Key Management Service alias for log file SSE-KMS encryption, for example, “cwspr-preprod-cloudtrail”

h. Select Enabled for CloudWatch Logs

i. Select New

j. Copy the given CloudWatch Logs log group name; you will need this for testing the Lambda function in a future step.

k. Provide a Role Name, for example, “CloudTrailRole-cwspr-preprod-cloudtrail”

l. Click Next.

This image depicts how to choose the trail attributes within CloudTrail for CodeWhisperer User Tracking.

4. Choose Log Events

a. Check “Management events” and “Data events”

b. Under Management events, keep the default options under API activity, Read and Write

c. Under Data event, choose CodeWhisperer for Data event type

d. Keep the default Log all events under Log selector template

e. Click Next

f. Review and click Create Trail

This image depicts how to choose the log events for CloudTrail for CodeWhisperer User Tracking.

Please note: the logs need to be collected in the account in which the trail is enabled, whether that is the management account or a member account.

Gathering Application ARN for CodeWhisperer application

Step 1: Access AWS IAM Identity Center

1. Locate and click on the Services dropdown menu at the top of the console.

2. Search for and select IAM Identity Center (SSO) from the list of services.

Step 2: Find the Application ARN for CodeWhisperer application

1. In the IAM Identity Center dashboard, click on Application Assignments -> Applications in the left-side navigation pane.

2. Locate the application with Service as CodeWhisperer and click on it

An image displays where you can find the Application in IAM Identity Center.

3. Copy the Application ARN and store it in a secure place. You will need this ARN to configure your Lambda function’s JSON event.

An image shows where you will find the Application ARN after you click on your AWS managed application.

User Activity Analysis in CodeWhisperer with AWS Lambda

This section focuses on creating and testing our custom AWS Lambda function, which was explicitly designed to analyze user activity within an Amazon CodeWhisperer environment. This function is critical in extracting, processing, and organizing user activity data. It starts by retrieving detailed logs from CloudWatch containing CodeWhisperer user activity, then cross-references this data with the membership details obtained from the AWS Identity Center. This allows the function to categorize users into active and inactive groups based on their engagement within a specified time frame.

The Lambda function’s capability extends to fetching and structuring detailed user information, including names, display names, and email addresses. It then sorts and compiles these details into a comprehensive HTML output. This output highlights the CodeWhisperer usage in an organization.
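
The full function is linked in the next section; to make the flow easier to follow, here is a heavily condensed, illustrative sketch of the same idea. It is not the original code: the Logs Insights filter, the field that carries the user ID, and the HTML layout are assumptions, pagination and error handling are omitted, and the cross-referencing against the CodeWhisperer application’s group membership (the codewhisperer_application_arn input) is left out for brevity.

import time
import boto3

logs = boto3.client("logs")
sso_admin = boto3.client("sso-admin")
identity_store = boto3.client("identitystore")

# Assumed filter: the exact event fields depend on how CodeWhisperer
# data events appear in your CloudTrail logs.
QUERY = (
    "fields @timestamp "
    "| filter eventSource = 'codewhisperer.amazonaws.com' "
    "| stats count(*) by userIdentity.onBehalfOf.userId"
)

def to_epoch(value):
    # input format "YYYY-MM-DD HH:MM:SS", interpreted as local time
    return int(time.mktime(time.strptime(value, "%Y-%m-%d %H:%M:%S")))

def lambda_handler(event, context):
    # run a CloudWatch Logs Insights query over the CloudTrail log group
    query = logs.start_query(
        logGroupName=event["log_group_name"],
        startTime=to_epoch(event["start_date"]),
        endTime=to_epoch(event["end_date"]),
        queryString=QUERY,
    )
    results = logs.get_query_results(queryId=query["queryId"])
    while results["status"] in ("Scheduled", "Running"):
        time.sleep(2)
        results = logs.get_query_results(queryId=query["queryId"])

    active_ids = {field["value"] for row in results["results"] for field in row
                  if field["field"] == "userIdentity.onBehalfOf.userId"}

    # all users known to IAM Identity Center (pagination omitted)
    identity_store_id = sso_admin.list_instances()["Instances"][0]["IdentityStoreId"]
    users = identity_store.list_users(IdentityStoreId=identity_store_id)["Users"]

    active = [u for u in users if u["UserId"] in active_ids]
    inactive = [u for u in users if u["UserId"] not in active_ids]

    rows = "".join(f"<tr><td>{u.get('DisplayName', u['UserName'])}</td></tr>"
                   for u in active + inactive)
    return f"<h3>Active: {len(active)} / Inactive: {len(inactive)}</h3><table>{rows}</table>"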

Creating and Configuring Your AWS Lambda Function

1. Navigate to the Lambda service.

2. Click on Create function.

3. Choose Author from scratch.

4. Enter a Function name, for example, “AmazonCodeWhispererUserActivity”.

5. Choose Python 3.11 as the Runtime.

6. Click on ‘Create function’ to create your new Lambda function.

7. Access the Function: After creating your Lambda function, you will be directed to the function’s dashboard. If not, navigate to the Lambda service, find your function “AmazonCodeWhispererUserActivity”, and click on it.

8. Copy and paste your Python code into the inline code editor on the function’s dashboard.

9. Click ‘Deploy’ to save and deploy your code to the Lambda function. The Lambda function zip file can be found here.

10. You have now successfully created and configured an AWS Lambda function with our Python code.

This image depicts how to configure your AWS Lambda function for tracking user activity in CodeWhisperer.

Updating the Execution Role for Your AWS Lambda Function

After you’ve created your Lambda function, you need to ensure it has the appropriate permissions to interact with other AWS services like CloudWatch Logs and AWS Identity Store. Here’s how you can update the IAM role permissions:

Locate the Execution Role:

1. Open Your Lambda Function’s Dashboard in the AWS Management Console.

2. Click on the ‘Configuration’ tab located near the top of the dashboard.

3. Under General configuration, set the Timeout to 15 minutes (the default is 3 seconds).

4. Select the ‘Permissions’ menu on the left side of the Configuration page.

5. Find the ‘Execution role’ section on the Permissions page.

6. Click on the Role Name to open the IAM (Identity and Access Management) role associated with your Lambda function.

7. In the IAM role dashboard, click on the Policy Name under the Permissions policies.

8. Edit the existing policy: Replace the policy with the following JSON.

9. Save the changes to the policy.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Action":[
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:StartQuery",
            "logs:GetQueryResults",
            "sso:ListInstances",
            "identitystore:DescribeUser",
            "identitystore:ListUsers",
            "identitystore:ListGroupMemberships"
         ],
         "Resource":"*",
         "Effect":"Allow"
      },
      {
         "Action":[
            "cloudtrail:DescribeTrails",
            "cloudtrail:GetTrailStatus"
         ],
         "Resource":"*",
         "Effect":"Allow"
      }
   ]
}

Your AWS Lambda function now has the necessary permissions to execute and interact with CloudWatch Logs and the AWS Identity Store.

This image depicts the permissions after the Lambda policies are updated.

Testing Lambda Function with custom input

1. Go to your Lambda function’s dashboard.

2. Locate the Test button near the top right corner.

3. Click on Test. This opens a dialog for configuring a new test event.

4. In the dialog, you’ll see an option to create a new test event. If it’s your first test, you’ll be prompted automatically to create a new event.

5. For Event name, enter a descriptive name for your test, such as “TestEvent”.

6. In the event code area, replace the existing JSON with your specific input:

{
"log_group_name": "{Insert Log Group Name}",
"start_date": "{Insert Start Date}",
"end_date": "{Insert End Date}",
"codewhisperer_application_arn": "{Insert Codewhisperer Application ARN}"
}

7. This JSON structure includes:

a. log_group_name: The name of the log group in CloudWatch Logs.

b. start_date: The start date and time for the query, formatted as “YYYY-MM-DD HH:MM:SS”.

c. end_date: The end date and time for the query, formatted as “YYYY-MM-DD HH:MM:SS”.

d. codewhisperer_application_arn: The ARN of the CodeWhisperer application in the AWS Identity Store.

8. Click on Save to store this test configuration.

This image depicts an example of creating a test event for the Lambda function with example JSON parameters entered.

9. With the test event selected, click on the Test button again to execute the function with this event.

10. The function will run, and you’ll see the execution result at the top of the page. This includes execution status, logs, and output.

11. Check the Execution result section to see if the function executed successfully.

This image depicts what a test case that successfully executed looks like.
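
If you prefer to run the same test from code instead of the console, the payload can also be passed to the function with boto3. This is an optional sketch; it assumes your AWS credentials are configured, uses the function name created earlier, and the placeholder values must be replaced just like in the console test event.

import json
import boto3

lambda_client = boto3.client("lambda")

# same structure as the console test event; replace the placeholders
payload = {
    "log_group_name": "{Insert Log Group Name}",
    "start_date": "2024-01-01 00:00:00",
    "end_date": "2024-01-31 23:59:59",
    "codewhisperer_application_arn": "{Insert CodeWhisperer Application ARN}",
}

response = lambda_client.invoke(
    FunctionName="AmazonCodeWhispererUserActivity",
    Payload=json.dumps(payload).encode("utf-8"),
)
print(response["Payload"].read().decode("utf-8"))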

Visualizing CodeWhisperer User Activity with Amazon CloudWatch Dashboard

This section focuses on effectively visualizing the data processed by our AWS Lambda function using a CloudWatch dashboard. This part of the guide provides a step-by-step approach to creating a “CodeWhispererUserActivity” dashboard within CloudWatch. It details how to add a custom widget to display the results from the Lambda Function. The process includes configuring the widget with the Lambda function’s ARN and the necessary JSON parameters.

1. Open the AWS Management Console and navigate to the CloudWatch service.

2. Create a New Dashboard: Click on ‘Dashboards’ in the left-hand navigation pane, then click on ‘Create dashboard’. Name your dashboard “CodeWhispererUserActivity” and click ‘Create Dashboard’.

3. Select “Other Content Types”, choose “Custom Widget”, and then click ‘Next’.

4. Configure the Lambda Function Widget: Enter your Lambda function’s ARN (Amazon Resource Name) or use the dropdown menu to find and select your “AmazonCodeWhispererUserActivity” function. Then, add the JSON Parameters.

5. Click ‘Add Widget’. The dashboard will update to include your new widget and will run the Lambda function to retrieve initial data.

6. Customize Your Dashboard: Arrange the dashboard by dragging and resizing widgets for optimal organization and visibility. Adjust the time range and refresh settings as needed to suit your monitoring requirements.

7. Save the Dashboard Configuration: After setting up and customizing your dashboard, click ‘Save dashboard’ to preserve your layout and settings.

This image depicts what the dashboard looks like. It showcases active users and inactive users, with first name, last name, display name, and email.

CloudFormation Deployment for the CodeWhisperer Dashboard

The blog post concludes with a detailed AWS CloudFormation template designed to automate the setup of the necessary infrastructure for the Amazon CodeWhisperer User Activity Dashboard. This template provisions AWS resources, streamlining the deployment process. It includes the configuration of AWS CloudTrail for tracking user interactions, setting up CloudWatch Logs for logging and monitoring, and creating an AWS Lambda function for analyzing user activity data. Additionally, the template defines the required IAM roles and permissions, ensuring the Lambda function has access to the needed AWS services and resources.

The blog post also provides a JSON configuration for the CloudWatch dashboard. This is because, at the time of writing, AWS CloudFormation does not natively support the creation and configuration of CloudWatch dashboards. Therefore, the JSON configuration is necessary to manually set up the dashboard in CloudWatch, allowing users to visualize the processed data from the Lambda function. Here is the CloudFormation template.

Create a CloudWatch Dashboard and import the JSON below.

{
   "widgets":[
      {
         "height":19,
         "width":7,
         "y":0,
         "x":0,
         "type":"custom",
         "properties":{
            "endpoint":"{Insert ARN of Lambda Function}",
            "updateOn":{
               "refresh":true,
               "resize":true,
               "timeRange":true
            },
            "params":{
               "log_group_name":"{Insert Log Group Name}",
               "start_date":"{Insert Start Date}",
               "end_date":"{Insert End Date}",
               "identity_store_id":"{Insert Identity Store ID}",
               "group_id":"{Insert Group ID}"
            }
         }
      }
   ]
}

Conclusion

In this blog, we detail a comprehensive process for establishing a user activity dashboard for Amazon CodeWhisperer to deliver data to support an enterprise rollout. The journey begins with setting up AWS CloudTrail to log user interactions with CodeWhisperer. This foundational step ensures the capture of detailed activity events, which is vital for our subsequent analysis. We then construct a tailored AWS Lambda function to sift through the CloudTrail logs and create a dashboard in Amazon CloudWatch. This dashboard serves as a central platform for displaying the user data from our Lambda function in an accessible, user-friendly format.

You can reference the existing CodeWhisperer dashboard for additional insights. The Amazon CodeWhisperer Dashboard offers a comprehensive view summarizing valuable data about how your developers use the service.

Overall, this dashboard empowers you to track, understand, and influence the adoption and effective use of Amazon CodeWhisperer in your organization, optimizing the tool’s deployment and fostering a culture of informed, data-driven usage.

About the authors:

David Ernst

David Ernst is an AWS Sr. Solution Architect with a DevOps and Generative AI background, leveraging over 20 years of IT experience to drive transformational change for AWS’s customers. Passionate about leading teams and fostering a culture of continuous improvement, David excels in architecting and managing cloud-based solutions, emphasizing automation, infrastructure as code, and continuous integration/delivery.

Riya Dani

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.

Vikrant Dhir

Vikrant Dhir is an AWS Solutions Architect helping systemically important financial services institutions innovate on AWS. He specializes in containers and container security and helps customers build and run enterprise-grade Kubernetes clusters using Amazon Elastic Kubernetes Service (EKS). He is an avid programmer proficient in a number of languages such as Java, NodeJS and Terraform.

Extending Zabbix: the power of scripting

Post Syndicated from Giedrius Stasiulionis original https://blog.zabbix.com/extending-zabbix-the-power-of-scripting/27401/

Scripts can extend Zabbix in many different ways. If you know your way around a CLI, you will be able to extend your monitoring capabilities and streamline workflows related to most Zabbix components.

What I like about Zabbix is that it is a very flexible and powerful tool right out of the box. It has many different ways to collect, evaluate and visualize data, all implemented natively and ready to use.

However, in more complex environments or custom use cases, you will inevitably face situations when something can’t be collected (or displayed) in the way that you want. Luckily, Zabbix is flexible even here! It provides you with ways to apply your knowledge and imagination so that even the most custom monitoring scenarios can be covered. Even though Zabbix is an open-source tool, in this article I will talk about extending it without changing its code, but rather by applying something on top, with the help of scripting. I will guide you through some examples, which will hopefully pique your curiosity, and maybe you will find them interesting enough to experiment and create something similar for yourself.

Although the first idea that comes to one’s mind when talking about scripts in Zabbix is most likely data collection, it is not the only place where scripts can help. So I will divide these examples and ideas into three subcategories:

  • Data collection
  • Zabbix internals
  • Visualization

Data collection

First things first. Data collection is the starting point for any kind of monitoring. There are multiple ways to collect data in “custom” ways, but the easiest one is to use the UserParameter capabilities. The basics are very nicely covered by the official documentation and other sources, e.g. this video by Dmitry Lambert, so I will skip the “Hello World” part and provide some more advanced ideas that might be useful to consider. Also, the provided examples use common scripting themes/scenarios, and you can find many similar solutions in the community, so this may serve better as a reminder or a showcase for someone who has never created any custom items before.

Data collection: DB checks

There is a lot of good information on how to set up DB checks for Zabbix, so this is just a reminder that one of the ways to do it is via custom scripts. I have personally done it for various databases: MySQL, Oracle, PostgreSQL, OpenEdge Progress. The thing is, ODBC is not always a great or permitted way to go, since security restrictions might be in place and you can’t get direct access to the DB from just anywhere you want. Or you may want to transform the retrieved data in ways that are complex and could hardly be covered by preprocessing. Then you have to rely on the Zabbix agent running those queries, either from the localhost where the DB resides or from some other place that is allowed to connect to your DB. Here is an example of how you can do it for PostgreSQL:

#!/bin/bash

my_dir="$(dirname ${0})"
conf_file="${my_dir}/sms_queue.conf"

[[ ! -f $conf_file ]] && echo -1 && exit 1

. ${conf_file}

export PGPASSWORD="${db_pass}"

query="SELECT COUNT(*) FROM sms WHERE sms.status IN ('retriable', 'validated');"

psql -h "${db_host}" -p "${db_port}" -U "${db_user}" -d "${db}" -c "${query}" -At 2>/dev/null

[[ $? -ne 0 ]] && echo -1 && exit 1

exit 0

Now what’s left is to feed the output of this script into Zabbix via a UserParameter, for example as shown below. A similar approach can be applied to Oracle (via sqlplus) or MySQL.
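
For illustration, the agent-side configuration for the PostgreSQL example above could be a single line like the following (the item key and the script path are hypothetical and should match wherever you place the script):

UserParameter=sms.queue.count,/etc/zabbix/scripts/sms_queue.sh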

Data collection: log delay statistics

I once faced a situation where some graphs that are based on log data started having gaps. It meant something was wrong either with data collection (the Zabbix agent) or with the data not being there at the moment of collection (so nothing to collect). A quick check suggested it was the latter, but I needed to prove it somehow.

Since these log lines had creation timestamps, it was a logical step to try to measure how much they differ from the “current time” of reading. And this is how I came up with the following custom script to implement this idea.

First of all, we need to read the file, say once each minute. We are talking about a log with several hundred thousand lines per minute, so this script should be made efficient. It should read the file in portions created between two script runs. I have explained such reading in detail here, so we will not focus on it now.

Next, the script greps only the timestamps from each line and immediately counts the number of unique lines with the same timestamp (to a precision of seconds). That is where it becomes fast – it doesn’t need to analyze each and every line individually, because it can analyze already grouped content!

Finally, the delay is calculated based on the difference between “now” and the collected timestamps, and those counters are exactly what is then passed to Zabbix.

#!/bin/bash

my_log="${1}"

my_project="${my_log##*\/}"
my_project="${my_project%%.log}"

me="$(basename ${0})"
my_dir="/tmp/log_delays/${my_project}"

[[ ! -d ${my_dir} ]] && mkdir -p ${my_dir}

# only one instance of this script at single point of time
# this makes sure you don't damage temp files

me_running="${my_dir}/${me}.running"

# allow only one process
# but make it more sophisticated:
# script is being run each minute
# if .running file is here for more than 10 minutes, something is wrong
# delete .running and try to run once again

[[ -f $me_running && $(($(date +%s)-$(stat -c %Y $me_running))) -lt 600 ]] && exit 1

touch $me_running

[[ "${my_log}" == "" || ! -f "${my_log}" ]] && exit 1

log_read="${my_dir}/${me}.read"

# get current file size in bytes

current_size=$(wc -c < "${my_log}")

# remember how many bytes you have now for next read
# when run for first time, you don't know the previous

[[ ! -f "${log_read}" ]] && echo "${current_size}" > "${log_read}"

bytes_read=$(cat "${log_read}")
echo "${current_size}" > "${log_read}"

# if rotated, let's read from the beginning

if [[ ${bytes_read} -gt ${current_size} ]]; then
  bytes_read=0
fi



# get the portion

now=$(date +%s)

delay_1_min=0
delay_5_min=0
delay_10_min=0
delay_30_min=0
delay_45_min=0
delay_60_min=0
delay_rest=0

while read line; do

  [[ ${line} == "" ]] && continue

  line=(${line})

  ts=$(date -d "${line[1]}+00:00" +%s)

  delay=$((now-ts))

  if [[ ${delay} -lt 60 ]]; then
    delay_1_min=$((${delay_1_min}+${line[0]}))
  elif [[ ${delay} -lt 300 ]]; then
    delay_5_min=$((${delay_5_min}+${line[0]}))
  elif [[ ${delay} -lt 600 ]]; then
    delay_10_min=$((${delay_10_min}+${line[0]}))
  elif [[ ${delay} -lt 1800 ]]; then
    delay_30_min=$((${delay_30_min}+${line[0]}))
  elif [[ ${delay} -lt 2700 ]]; then
    delay_45_min=$((${delay_45_min}+${line[0]}))
  elif [[ ${delay} -lt 3600 ]]; then
    delay_60_min=$((${delay_60_min}+${line[0]}))
  else
    delay_rest=$((${delay_rest}+${line[0]}))
  fi

done <<< "$(tail -c +$((bytes_read+1)) "${my_log}" | head -c $((current_size-bytes_read)) | grep -Po "(?<=timestamp\":\")(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?=\.)" | sort | uniq -c | sort -k1nr)"

echo "delay_1_min=${delay_1_min}
delay_5_min=${delay_5_min}
delay_10_min=${delay_10_min}
delay_30_min=${delay_30_min}
delay_45_min=${delay_45_min}
delay_60_min=${delay_60_min}
delay_rest=${delay_rest}"



rm -f "${me_running}"

exit 0

Now, on the Zabbix side, there is an item running this script and 7 dependent items representing the degree of delay. Since there are many logs for which this data is collected, it is all put into an LLD rule based on the contents of a specific directory:

vfs.dir.get[/var/log/logs,".*log$",,,,,1000]

This LLD then provides two macros:

And item prototypes will look like:

Those dependent items have one simple preprocessing step which takes the needed number out of the script output:

So the final result is a nice graph in the dashboard, showing exactly when delays appear and to what degree:

So, as you can see, it is relatively easy to collect just about any data you wish once you know how. As these examples show, it might be something more complex, but it can also be just a simple one-liner – in any case, it should be obvious that the possibilities are endless when talking about scripts in data collection. If something is executable from the CLI and has valuable output, go ahead and collect it!

Zabbix internals

Another area where scripts can be really useful is adjusting how Zabbix behaves or controlling this behavior automatically. In this case, we will employ the Zabbix API, since it is designed exactly for such purposes.

Zabbix internals: automatically disabling problematic item

In our environment, we have many logs to be analyzed. And some of them sometimes go crazy – something that we intend to catch starts appearing there too often and requires attention – typically we would have to adjust the regexp, temporarily suppress some patterns, and inform the responsible teams about overly extensive logging. If you don’t (or can’t) react quickly, it might kill Zabbix – the history write cache starts filling up. So what we do is automatically detect the item that received the most values during the most recent short period of time and automatically disable it.

First of all, there are two items – one measuring the history write cache and the other extracting the top item in the given table:

[root@linux ~]# zabbix_agentd -t zabbix.db.max[history_log,30] 2>/dev/null
zabbix.db.max[history_log,30] [t|463 1997050]
[root@linux ~]#

The first number here is the number of values gathered during the provided period, the second one is the item ID. The UserParameter behind this item looks like this:

[root@linux ~]# grep zabbix.db.max /etc/zabbix/zabbix_agentd.d/userparameter_mysql.conf
UserParameter=zabbix.db.max[*],HOME=/etc/zabbix mysql -BN -e "USE zabbix; SELECT count(*), itemid FROM $1 WHERE clock >= unix_timestamp(NOW() - INTERVAL $2 MINUTE) GROUP BY itemid ORDER BY count(*) DESC LIMIT 1;"
[root@linux ~]#

And now, relying on the history write cache item values showing us a drop, we construct a trigger:

And as a last step, this trigger invokes an action, which runs a script that disables the item with the given ID with the help of the Zabbix API, method “item.update”. A minimal sketch of such a script is shown below.
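
The original post does not include the script itself, so the following is only an illustrative sketch (written here with the zabbix_utils library that is introduced later on this page; the URL, the token, and the way the item ID is passed to the script are assumptions):

import sys
from zabbix_utils import ZabbixAPI

# the offending item ID is passed by the Zabbix action as the first argument
itemid = sys.argv[1]

api = ZabbixAPI(url="https://zabbix.example.local")
api.login(token="xxxxxxxx")

# status=1 marks the item as disabled
api.item.update(itemid=itemid, status=1)

api.logout()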

Now we are able to avoid unexpected behavior of our data sources affecting Zabbix performance, all done automatically – thanks to the scripts!

Zabbix internals: add host to group via frontend scripts

Zabbix maintenance mode is a great feature allowing us to reduce noise or avoid some false positive alerts once a specific host is known to have issues. At some point, we found it would be convenient to be able to add (or remove) a specific host to (or from) maintenance directly from the “Problems” window. And that is possible and achieved via a frontend script, again with the help of the Zabbix API, this time with the methods “host.get”, “hostgroup.get”, “hostgroup.massadd” and “hostgroup.massremove”. A rough sketch of the idea follows.
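
The post does not show the actual script, but the core of it could look roughly like this (again only a sketch using the zabbix_utils library; the host name, group name, URL and token are placeholders, and the reverse operation would call hostgroup.massremove with the same parameters):

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="https://zabbix.example.local")
api.login(token="xxxxxxxx")

# resolve the host selected in the Problems window and the group covered by the maintenance
host = api.host.get(filter={"host": "problematic-host"}, output=["hostid"])[0]
group = api.hostgroup.get(filter={"name": "Maintenance"}, output=["groupid"])[0]

# put the host into the maintenance group
api.hostgroup.massadd(
    groups=[{"groupid": group["groupid"]}],
    hosts=[{"hostid": host["hostid"]}]
)

api.logout()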

Data visualization

Zabbix has many different widgets that are able to cover various ways of displaying your collected data. But in some cases, you might find yourself missing some small “something” which would allow your dashboards to shine even more – at least I constantly face it. Starting from version 6.4, Zabbix allows you to create your own widgets, but it might not be such a straightforward procedure if you have little or no programming experience. However, you can employ two already existing widgets in order to customize your dashboard look in a pretty easy way.

Data visualization: URL widget

The first example is done using the URL widget. You can feed just about any content into it, so if you have any web development skills, you can easily create something that looks like a custom widget. Here is an example: I need a clock, but not the one already provided by Zabbix as a separate clock widget – I want a digital clock, and I also want this clock to have a section that displays the employee on duty now and in the upcoming shift. So, with a little bit of HTML, CSS and JavaScript / AJAX, I have this:

With styles properly chosen, such content can be smoothly integrated into dashboards, along with other widgets.

Data visualization: plain text widget with HTML formatting

Another useful widget which is often overlooked is the “Plain text” widget – in combination with the following parameters:

It becomes a very powerful tool to display nicely formatted data snapshots. A simple yet very good example here would be to display some content that requires a human-readable structure – a table.

So again, integration with other dashboard widgets is smooth – with just some custom HTML / CSS around your data, you wrap it into something that looks like a brand new “table” widget. Isn’t it awesome? And you are of course not limited to tables… Just use your imagination!

Conclusion

Although I personally prefer bash as the first option to solve things, there is no big difference regarding which scripting or programming language you choose when extending Zabbix in these ways. Just use whatever you feel most comfortable with.

I hope the examples shown here have inspired you in some way. Happy scripting!

The post Extending Zabbix: the power of scripting appeared first on Zabbix Blog.

Simplify data streaming ingestion for analytics using Amazon MSK and Amazon Redshift

Post Syndicated from Sebastian Vlad original https://aws.amazon.com/blogs/big-data/simplify-data-streaming-ingestion-for-analytics-using-amazon-msk-and-amazon-redshift/

Towards the end of 2022, AWS announced the general availability of real-time streaming ingestion to Amazon Redshift for Amazon Kinesis Data Streams and Amazon Managed Streaming for Apache Kafka (Amazon MSK), eliminating the need to stage streaming data in Amazon Simple Storage Service (Amazon S3) before ingesting it into Amazon Redshift.

Streaming ingestion from Amazon MSK into Amazon Redshift represents a cutting-edge approach to real-time data processing and analysis. Amazon MSK serves as a highly scalable and fully managed service for Apache Kafka, allowing for seamless collection and processing of vast streams of data. Integrating streaming data into Amazon Redshift brings immense value by enabling organizations to harness the potential of real-time analytics and data-driven decision-making.

This integration enables you to achieve low latency, measured in seconds, while ingesting hundreds of megabytes of streaming data per second into Amazon Redshift. At the same time, this integration helps make sure that the most up-to-date information is readily available for analysis. Because the integration doesn’t require staging data in Amazon S3, Amazon Redshift can ingest streaming data at a lower latency and without intermediary storage cost.

You can configure Amazon Redshift streaming ingestion on a Redshift cluster using SQL statements to authenticate and connect to an MSK topic. This solution is an excellent option for data engineers who are looking to simplify data pipelines and reduce operational cost.

In this post, we provide a complete overview of how to configure Amazon Redshift streaming ingestion from Amazon MSK.

Solution overview

The following architecture diagram describes the AWS services and features you will be using.

architecture diagram describing the AWS services and features you will be using

The workflow includes the following steps:

  1. You start with configuring an Amazon MSK Connect source connector, to create an MSK topic, generate mock data, and write it to the MSK topic. For this post, we work with mock customer data.
  2. The next step is to connect to a Redshift cluster using the Query Editor v2.
  3. Finally, you configure an external schema and create a materialized view in Amazon Redshift, to consume the data from the MSK topic. This solution does not rely on an MSK Connect sink connector to export the data from Amazon MSK to Amazon Redshift.

The following solution architecture diagram describes in more detail the configuration and integration of the AWS services you will be using.
solution architecture diagram describing in more detail the configuration and integration of the AWS services you will be using
The workflow includes the following steps:

  1. You deploy an MSK Connect source connector, an MSK cluster, and a Redshift cluster within the private subnets on a VPC.
  2. The MSK Connect source connector uses granular permissions defined in an AWS Identity and Access Management (IAM) in-line policy attached to an IAM role, which allows the source connector to perform actions on the MSK cluster.
  3. The MSK Connect source connector logs are captured and sent to an Amazon CloudWatch log group.
  4. The MSK cluster uses a custom MSK cluster configuration, allowing the MSK Connect connector to create topics on the MSK cluster.
  5. The MSK cluster logs are captured and sent to an Amazon CloudWatch log group.
  6. The Redshift cluster uses granular permissions defined in an IAM in-line policy attached to an IAM role, which allows the Redshift cluster to perform actions on the MSK cluster.
  7. You can use the Query Editor v2 to connect to the Redshift cluster.

Prerequisites

To simplify the provisioning and configuration of the prerequisite resources, you can use the following AWS CloudFormation template:

Complete the following steps when launching the stack:

  1. For Stack name, enter a meaningful name for the stack, for example, prerequisites.
  2. Choose Next.
  3. Choose Next.
  4. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  5. Choose Submit.

The CloudFormation stack creates the following resources:

  • A VPC custom-vpc, created across three Availability Zones, with three public subnets and three private subnets:
    • The public subnets are associated with a public route table, and outbound traffic is directed to an internet gateway.
    • The private subnets are associated with a private route table, and outbound traffic is sent to a NAT gateway.
  • An internet gateway attached to the Amazon VPC.
  • A NAT gateway that is associated with an elastic IP and is deployed in one of the public subnets.
  • Three security groups:
    • msk-connect-sg, which will be later associated with the MSK Connect connector.
    • redshift-sg, which will be later associated with the Redshift cluster.
    • msk-cluster-sg, which will be later associated with the MSK cluster. It allows inbound traffic from msk-connect-sg, and redshift-sg.
  • Two CloudWatch log groups:
    • msk-connect-logs, to be used for the MSK Connect logs.
    • msk-cluster-logs, to be used for the MSK cluster logs.
  • Two IAM Roles:
    • msk-connect-role, which includes granular IAM permissions for MSK Connect.
    • redshift-role, which includes granular IAM permissions for Amazon Redshift.
  • A custom MSK cluster configuration, allowing the MSK Connect connector to create topics on the MSK cluster.
  • An MSK cluster, with three brokers deployed across the three private subnets of custom-vpc. The msk-cluster-sg security group and the custom-msk-cluster-configuration configuration are applied to the MSK cluster. The broker logs are delivered to the msk-cluster-logs CloudWatch log group.
  • A Redshift cluster subnet group, which is using the three private subnets of custom-vpc.
  • A Redshift cluster, with one single node deployed in a private subnet within the Redshift cluster subnet group. The redshift-sg security group and redshift-role IAM role are applied to the Redshift cluster.

Create an MSK Connect custom plugin

For this post, we use an Amazon MSK data generator deployed in MSK Connect, to generate mock customer data, and write it to an MSK topic.

Complete the following steps:

  1. Download the Amazon MSK data generator JAR file with dependencies from GitHub.
    awslabs github page for downloading the jar file of the amazon msk data generator
  2. Upload the JAR file into an S3 bucket in your AWS account.
    amazon s3 console image showing the uploaded jar file in an s3 bucket
  3. On the Amazon MSK console, choose Custom plugins under MSK Connect in the navigation pane.
  4. Choose Create custom plugin.
  5. Choose Browse S3, search for the Amazon MSK data generator JAR file you uploaded to Amazon S3, then choose Choose.
  6. For Custom plugin name, enter msk-datagen-plugin.
  7. Choose Create custom plugin.

When the custom plugin is created, you will see that its status is Active, and you can move to the next step.
amazon msk console showing the msk connect custom plugin being successfully created

Create an MSK Connect connector

Complete the following steps to create your connector:

  1. On the Amazon MSK console, choose Connectors under MSK Connect in the navigation pane.
  2. Choose Create connector.
  3. For Custom plugin type, choose Use existing plugin.
  4. Select msk-datagen-plugin, then choose Next.
  5. For Connector name, enter msk-datagen-connector.
  6. For Cluster type, choose Self-managed Apache Kafka cluster.
  7. For VPC, choose custom-vpc.
  8. For Subnet 1, choose the private subnet within your first Availability Zone.

For the custom-vpc created by the CloudFormation template, we are using odd CIDR ranges for public subnets, and even CIDR ranges for the private subnets:

    • The CIDRs for the public subnets are 10.10.1.0/24, 10.10.3.0/24, and 10.10.5.0/24
    • The CIDRs for the private subnets are 10.10.2.0/24, 10.10.4.0/24, and 10.10.6.0/24
  9. For Subnet 2, select the private subnet within your second Availability Zone.
  10. For Subnet 3, select the private subnet within your third Availability Zone.
  11. For Bootstrap servers, enter the list of bootstrap servers for TLS authentication of your MSK cluster.

To retrieve the bootstrap servers for your MSK cluster, navigate to the Amazon MSK console, choose Clusters, choose msk-cluster, then choose View client information. Copy the TLS values for the bootstrap servers.

  12. For Security groups, choose Use specific security groups with access to this cluster, and choose msk-connect-sg.
  13. For Connector configuration, replace the default settings with the following:
connector.class=com.amazonaws.mskdatagen.GeneratorSourceConnector
tasks.max=2
genkp.customer.with=#{Code.isbn10}
genv.customer.name.with=#{Name.full_name}
genv.customer.gender.with=#{Demographic.sex}
genv.customer.favorite_beer.with=#{Beer.name}
genv.customer.state.with=#{Address.state}
genkp.order.with=#{Code.isbn10}
genv.order.product_id.with=#{number.number_between '101','109'}
genv.order.quantity.with=#{number.number_between '1','5'}
genv.order.customer_id.matching=customer.key
global.throttle.ms=2000
global.history.records.max=1000
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
  14. For Connector capacity, choose Provisioned.
  15. For MCU count per worker, choose 1.
  16. For Number of workers, choose 1.
  17. For Worker configuration, choose Use the MSK default configuration.
  18. For Access permissions, choose msk-connect-role.
  19. Choose Next.
  20. For Encryption, select TLS encrypted traffic.
  21. Choose Next.
  22. For Log delivery, choose Deliver to Amazon CloudWatch Logs.
  23. Choose Browse, select msk-connect-logs, and choose Choose.
  24. Choose Next.
  25. Review and choose Create connector.

After the custom connector is created, you will see that its status is Running, and you can move to the next step.
amazon msk console showing the msk connect connector being successfully created

Configure Amazon Redshift streaming ingestion for Amazon MSK

Complete the following steps to set up streaming ingestion:

  1. Connect to your Redshift cluster using Query Editor v2, and authenticate with the database user name awsuser, and password Awsuser123.
  2. Create an external schema from Amazon MSK using the following SQL statement.

In the following code, enter the values for the redshift-role IAM role, and the msk-cluster cluster ARN.

CREATE EXTERNAL SCHEMA msk_external_schema
FROM MSK
IAM_ROLE '<insert your redshift-role arn>'
AUTHENTICATION iam
CLUSTER_ARN '<insert your msk-cluster arn>';
  3. Choose Run to run the SQL statement.

redshift query editor v2 showing the SQL statement used to create an external schema from amazon msk

  4. Create a materialized view using the following SQL statement:
CREATE MATERIALIZED VIEW msk_mview AUTO REFRESH YES AS
SELECT
    "kafka_partition",
    "kafka_offset",
    "kafka_timestamp_type",
    "kafka_timestamp",
    "kafka_key",
    JSON_PARSE(kafka_value) as Data,
    "kafka_headers"
FROM
    "dev"."msk_external_schema"."customer"
  5. Choose Run to run the SQL statement.

redshift query editor v2 showing the SQL statement used to create a materialized view

  6. You can now query the materialized view using the following SQL statement:
select * from msk_mview LIMIT 100;
  7. Choose Run to run the SQL statement.

redshift query editor v2 showing the SQL statement used to query the materialized view

  8. To monitor the progress of records loaded via streaming ingestion, you can take advantage of the SYS_STREAM_SCAN_STATES monitoring view using the following SQL statement:
select * from SYS_STREAM_SCAN_STATES;
  9. Choose Run to run the SQL statement.

redshift query editor v2 showing the SQL statement used to query the sys stream scan states monitoring view

  10. To monitor errors encountered on records loaded via streaming ingestion, you can take advantage of the SYS_STREAM_SCAN_ERRORS monitoring view using the following SQL statement:
select * from SYS_STREAM_SCAN_ERRORS;
  11. Choose Run to run the SQL statement.

redshift query editor v2 showing the SQL statement used to query the sys stream scan errors monitoring view

Clean up

After following along, if you no longer need the resources you created, delete them in the following order to prevent incurring additional charges:

  1. Delete the MSK Connect connector msk-datagen-connector.
  2. Delete the MSK Connect plugin msk-datagen-plugin.
  3. Delete the Amazon MSK data generator JAR file you downloaded, and delete the S3 bucket you created.
  4. After you delete your MSK Connect connector, you can delete the CloudFormation template. All the resources created by the CloudFormation template will be automatically deleted from your AWS account.

Conclusion

In this post, we demonstrated how to configure Amazon Redshift streaming ingestion from Amazon MSK, with a focus on privacy and security.

The combination of Amazon MSK’s ability to handle high-throughput data streams with the robust analytical capabilities of Amazon Redshift empowers businesses to derive actionable insights promptly. This real-time data integration enhances the agility and responsiveness of organizations in understanding changing data trends, customer behaviors, and operational patterns. It allows for timely and informed decision-making, thereby gaining a competitive edge in today’s dynamic business landscape.

This solution is also applicable for customers that are looking to use Amazon MSK Serverless and Amazon Redshift Serverless.

We hope this post was a good opportunity to learn more about AWS service integration and configuration. Let us know your feedback in the comments section.


About the authors

Sebastian Vlad is a Senior Partner Solutions Architect with Amazon Web Services, with a passion for data and analytics solutions and customer success. Sebastian works with enterprise customers to help them design and build modern, secure, and scalable solutions to achieve their business outcomes.

Sharad Pai is a Lead Technical Consultant at AWS. He specializes in streaming analytics and helps customers build scalable solutions using Amazon MSK and Amazon Kinesis. He has over 16 years of industry experience and is currently working with media customers who are hosting live streaming platforms on AWS, managing peak concurrency of over 50 million. Prior to joining AWS, Sharad’s career as a lead software developer included 9 years of coding, working with open source technologies like JavaScript, Python, and PHP.

HPC Monitoring: Transitioning from Nagios and Ganglia to Zabbix 6

Post Syndicated from Mark Vilensky original https://blog.zabbix.com/hpc-monitoring-transitioning-from-nagios-and-ganglia-to-zabbix-6/27313/

My name is Mark Vilensky, and I’m currently the Scientific Computing Manager at the Weizmann Institute of Science in Rehovot, Israel. I’ve been working in High-Performance Computing (HPC) for the past 15 years.

Our base is at the Chemistry Faculty at the Weizmann Institute, where our HPC activities follow a traditional path — extensive number crunching, classical calculations, and a repertoire that includes handling differential equations. Over the years, we’ve embraced a spectrum of technologies, even working with actual supercomputers like the SGI Altix.

Our setup

As of now, our system boasts nearly 600 compute nodes, collectively wielding about 25,000 cores. The interconnect is Infiniband, and for management, provisioning, and monitoring, we rely on Ethernet. Our storage infrastructure is IBM GPFS on DDN hardware, and job submissions are facilitated through PBS Professional.

We use VMware for the system management. Surprisingly, the team managing this extensive system comprises only three individuals. The hardware landscape features HPE, Dell, and Lenovo servers.

The path to Zabbix

Recent challenges have surfaced in the monitoring domain, prompting considerations for an upgrade to Red Hat 8 or a comparable distribution. Our existing monitoring framework involved Nagios and Ganglia, but they had some severe limitations — Nagios’ lack of scalability and Ganglia’s Python 2 compatibility issues have become apparent.

Exploring alternatives led us to Zabbix, a platform not commonly encountered in supercomputing conferences but embraced by the community. Fortunately, we found a great YouTube channel by Dmitry Lambert that not only gives some recipes for doing things but also provides the overview required for planning, sizing, and avoiding future troubles.

Our Zabbix setup resides in a modest VM, sporting 16 CPUs, 32 GB RAM, and three Ethernet interfaces, all operating within the Rocky 8.7 environment. The database relies on PostgreSQL 14 and TimescaleDB version 2.8, with slight adjustments to the default configurations for history and trend settings.

Getting the job done

The stability of our Zabbix system has been noteworthy, showcasing its ability to automate tasks, particularly in scenarios where nodes are taken offline, prompting Zabbix to initiate maintenance cycles automatically. Beyond conventional monitoring, we’ve tapped into Zabbix’s capabilities for external scripts, querying the PBS server and GPFS server, and even managing specific hardware anomalies.

The Zabbix dashboard has emerged as a comprehensive tool, offering a differentiated approach through host groups. These groups categorize our hosts, differentiating between CPU compute nodes, GPU compute nodes, and infrastructure nodes, allowing tailored alerts based on node types.

Alerting and visualization

Our alerting strategy involves receiving email alerts only for significant disasters, a conscious effort to avoid alert fatigue. The presentation emphasizes the nuanced differences in monitoring compute nodes versus infrastructure nodes, focusing on availability and potential job performance issues for the former and services, memory, and memory leaks for the latter.

The power of visual representations is underscored, with the utilization of heat maps offering quick insights into the cluster’s performance.

Final thoughts

In conclusion, our journey with Zabbix has not only delivered stability and automation but has also provided invaluable insights for optimizing resource utilization. I’d like to express my special appreciation for Andrei Vasilev, a member of our team whose efforts have been instrumental in making the transition to Zabbix.

The post HPC Monitoring: Transitioning from Nagios and Ganglia to Zabbix 6 appeared first on Zabbix Blog.

Introducing zabbix_utils – the official Python library for Zabbix API

Post Syndicated from Aleksandr Iantsen original https://blog.zabbix.com/python-zabbix-utils/27056/

Zabbix is a flexible and universal monitoring solution that integrates with a wide variety of different systems right out of the box. Despite actively expanding the list of natively supported systems for integration (via templates or webhook integrations), there may still be a need to integrate with custom systems and services that are not yet supported. In such cases, a library taking care of implementing interaction protocols with the Zabbix API, Zabbix server/proxy, or Agent/Agent2 becomes extremely useful. Given that Python is widely adopted among DevOps and SRE engineers as well as server administrators, we decided to release a library for this programming language first.

We are pleased to introduce zabbix_utils – a Python library for seamless interaction with Zabbix API, Zabbix server/proxy, and Zabbix Agent/Agent2. Of course, there are popular community solutions for working with these Zabbix components in Python. Keeping this fact in mind, we have tried to consolidate popular issues and cases along with our experience to develop as convenient a tool as possible. Furthermore, we made sure that transitioning to the tool is as straightforward and clear as possible. Thanks to official support, you can be confident that the current version of the library is compatible with the latest Zabbix release.

In this article, we will introduce you to the main capabilities of the library and provide examples of how to use it with Zabbix components.

Usage Scenarios

The zabbix_utils library can be used in the following scenarios, but is not limited to them:

  • Zabbix automation
  • Integration with third-party systems
  • Custom monitoring solutions
  • Data export (hosts, templates, problems, etc.)
  • Integration into your Python application for Zabbix monitoring support
  • Anything else that comes to mind

You can use zabbix_utils for automating Zabbix tasks, such as scripting the automatic monitoring setup of your IT infrastructure objects. This can involve using ZabbixAPI for the direct management of Zabbix objects, Sender for sending values to hosts, and Getter for gathering data from Agents. We will discuss Sender and Getter in more detail later in this article.

For example, let’s imagine you have an infrastructure consisting of different branches. Each server or workstation is deployed from an image with an automatically configured Zabbix Agent and each branch is monitored by a Zabbix proxy since it has an isolated network. Your custom service or script can fetch a list of this equipment from your CMDB system, along with any additional information. It can then use this data to create hosts in Zabbix and link the necessary templates using ZabbixAPI based on the received information. If the information from CMDB is insufficient, you can request data directly from the configured Zabbix Agent using Getter and then use this information for further configuration and decision-making during setup. Another part of your script can access AD to get a list of branch users to update the list of users in Zabbix through the API and assign them the appropriate permissions and roles based on information from AD or CMDB (e.g., editing rights for server owners).
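
As a rough illustration of that scenario, the sketch below creates hosts from a hypothetical CMDB export and links a template to them. The group and template names, the host list, the connection details, and the agent interface settings are all placeholders; proxy assignment, error handling, and pagination are omitted.

from zabbix_utils import ZabbixAPI

# hypothetical inventory pulled from a CMDB system
cmdb_hosts = [
    {"name": "branch1-srv01", "ip": "10.1.0.11"},
    {"name": "branch1-srv02", "ip": "10.1.0.12"},
]

api = ZabbixAPI(url="https://zabbix.example.local")
api.login(token="xxxxxxxx")

group = api.hostgroup.get(filter={"name": "Branch servers"}, output=["groupid"])[0]
template = api.template.get(filter={"host": "Linux by Zabbix agent"}, output=["templateid"])[0]

for entry in cmdb_hosts:
    api.host.create(
        host=entry["name"],
        groups=[{"groupid": group["groupid"]}],
        templates=[{"templateid": template["templateid"]}],
        interfaces=[{
            "type": 1,      # agent interface
            "main": 1,
            "useip": 1,
            "ip": entry["ip"],
            "dns": "",
            "port": "10050"
        }]
    )

api.logout()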

Another use case of the library may be when you regularly export templates from Zabbix for subsequent import into a version control system. You can also establish a mechanism for loading changes and rolling back to previous versions of templates. Here a variety of other use cases can also be implemented – it’s all up to your requirements and the creative usage of the library.
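
A minimal sketch of such an export could look like the following: it pulls every template from a given host group (the group ID and connection details are placeholders) and writes each one to a YAML file that can then be committed to your version control system.

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="https://zabbix.example.local")
api.login(token="xxxxxxxx")

# export all templates belonging to a given host group, one YAML file per template
templates = api.template.get(groupids=["12"], output=["templateid", "host"])

for template in templates:
    exported = api.configuration.export(
        format="yaml",
        options={"templates": [template["templateid"]]}
    )
    with open(f"{template['host']}.yaml", "w") as yaml_file:
        yaml_file.write(exported)

api.logout()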

Of course, if you are a developer and there is a requirement to implement Zabbix monitoring support for your custom system or tool, you can implement sending data describing any events generated by your custom system/tool to Zabbix using Sender.

Installation and Configuration

To begin with, you need to install the zabbix_utils library. You can do this in two main ways:

  • By using pip:
~$ pip install zabbix_utils
  • By cloning from GitHub:
~$ git clone https://github.com/zabbix/python-zabbix-utils
~$ cd python-zabbix-utils/
~$ python setup.py install

No additional configuration is required. But you can specify values for the following environment variables: ZABBIX_URL, ZABBIX_TOKEN, ZABBIX_USER, ZABBIX_PASSWORD if you need. These use cases are described in more detail below.

Working with Zabbix API

To work with Zabbix API, it is necessary to import the ZabbixAPI class from the zabbix_utils library:

from zabbix_utils import ZabbixAPI

If you are using one of the existing popular community libraries, in most cases, it will be sufficient to simply replace the ZabbixAPI import statement with an import from our library.

At that point you need to create an instance of the ZabbixAPI class. There are several usage scenarios:

  • Use preset values of environment variables, i.e., without passing any parameters to ZabbixAPI:
~$ export ZABBIX_URL="https://zabbix.example.local"
~$ export ZABBIX_USER="Admin"
~$ export ZABBIX_PASSWORD="zabbix"
from zabbix_utils import ZabbixAPI


api = ZabbixAPI()
  • Pass only the Zabbix API address as input, which can be specified as either the server IP/FQDN address or DNS name (in this case, the HTTP protocol will be used) or as a URL, and the authentication data should still be specified as values for environment variables:
~$ export ZABBIX_USER="Admin"
~$ export ZABBIX_PASSWORD="zabbix"
from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="127.0.0.1")
  • Pass only the Zabbix API address to ZabbixAPI, as in the example above, and pass the authentication data later using the login() method:
from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="127.0.0.1")
api.login(user="Admin", password="zabbix")
  • Pass all parameters at once when creating an instance of ZabbixAPI; in this case, there is no need to subsequently call login():
from zabbix_utils import ZabbixAPI

api = ZabbixAPI(
    url="127.0.0.1",
    user="Admin",
    password="zabbix"
)

The ZabbixAPI class supports working with various Zabbix versions, automatically checking the API version during initialization. You can also work with the Zabbix API version as an object as follows:

from zabbix_utils import ZabbixAPI

api = ZabbixAPI()

# ZabbixAPI version field
ver = api.version
print(type(ver).__name__, ver) # APIVersion 6.0.24

# Method to get ZabbixAPI version
ver = api.api_version()
print(type(ver).__name__, ver) # APIVersion 6.0.24

# Additional methods
print(ver.major)    # 6.0
print(ver.minor)    # 24
print(ver.is_lts()) # True

As a result, you will get an APIVersion object that has major and minor fields returning the respective major and minor parts of the current version, as well as the is_lts() method, returning True if the current version is LTS (Long Term Support), and False otherwise. The APIVersion object can also be compared to a version represented as a string or a float number:

# Version comparison
print(ver < 6.4)      # True
print(ver != 6.0)     # False
print(ver != "6.0.5") # True

If the username and password (or, starting from Zabbix 5.4, a token instead of login/password) are not set as environment variable values or during the initialization of ZabbixAPI, then it is necessary to call the login() method for authentication:

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="127.0.0.1")
api.login(token="xxxxxxxx")

After authentication, you can make any API requests described for all supported versions in the Zabbix documentation.

The format for calling API methods looks like this:

api_instance.zabbix_object.method(parameters)

For example:

api.host.get()

After completing all the necessary API requests, call logout() if authentication was done using a login and password:

api.logout()
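Putting these pieces together, here is a minimal end-to-end sketch. The URL, credentials, and the host filter passed to host.get() are placeholder values for illustration; adjust them to your environment:

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="https://zabbix.example.local")
api.login(user="Admin", password="zabbix")

# Request only the fields we need; the filter below is illustrative
hosts = api.host.get(
    output=["hostid", "name"],
    filter={"status": 0}
)

for host in hosts:
    print(host["hostid"], host["name"])

# Log out, since authentication was done with a login and password
api.logout()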

More examples of usage can be found here.

Sending Values to Zabbix Server/Proxy

There is often a need to send values to Zabbix Trapper. For this purpose, the zabbix_sender utility is provided. However, if your service or script sending this data is written in Python, calling an external utility may not be very convenient. Therefore, we have developed the Sender, which will help you send values to Zabbix server or proxy one by one or in groups. To work with Sender, you need to import it as follows:

from zabbix_utils import Sender

After that, you can send a single value:

from zabbix_utils import Sender

sender = Sender(server='127.0.0.1', port=10051)
resp = sender.send_value('example_host', 'example.key', 50, 1702511920)

Alternatively, you can put them into a group for simultaneous sending, for which you need to additionally import ItemValue:

from zabbix_utils import ItemValue, Sender


items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'Test value'),
    ItemValue('host2', 'item.key1', -1, 1702511920),
    ItemValue('host3', 'item.key1', '{"msg":"Test value"}'),
    ItemValue('host2', 'item.key1', 0, 1702511920, 100)
]

sender = Sender('127.0.0.1', 10051)
response = sender.send(items)

For cases when you need to send more values than Zabbix Trapper can accept at one time, there is an option for fragmented sending, i.e., sequential sending in separate fragments (chunks). By default, the chunk size is set to 250 values. For example, if you pass 400 values to the send() method, they will be sent in two stages: 250 values first, and the remaining 150 after a response is received. To change the chunk size, specify your value for the chunk_size parameter when initializing Sender:

from zabbix_utils import ItemValue, Sender


items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'Test value'),
    ItemValue('host2', 'item.key1', -1, 1702511920),
    ItemValue('host3', 'item.key1', '{"msg":"Test value"}'),
    ItemValue('host2', 'item.key1', 0, 1702511920, 100)
]

sender = Sender('127.0.0.1', 10051, chunk_size=2)
response = sender.send(items)

In the example above, the chunk size is set to 2. So, 5 values passed will be sent in three requests of two, two, and one value, respectively.

If your server has multiple network interfaces, and values need to be sent from a specific one, the Sender provides the option to specify a source_ip for the sent values:

from zabbix_utils import Sender

sender = Sender(
    server='zabbix.example.local',
    port=10051,
    source_ip='10.10.7.1'
)
resp = sender.send_value('example_host', 'example.key', 50, 1702511920)

It also supports reading connection parameters from the Zabbix Agent/Agent2 configuration file. To do this, set the use_config flag, after which it is not necessary to pass connection parameters when creating an instance of Sender:

from zabbix_utils import Sender

sender = Sender(
    use_config=True,
    config_path='/etc/zabbix/zabbix_agent2.conf'
)
response = sender.send_value('example_host', 'example.key', 50, 1702511920)

Since the Zabbix Agent/Agent2 configuration file can specify one or even several Zabbix clusters consisting of multiple Zabbix server instances, Sender will send data to the first available server of each cluster listed in the ServerActive parameter. If the ServerActive parameter is not specified in the configuration file, the server address from the Server parameter is used with the standard Zabbix Trapper port, 10051.

By default, Sender returns the aggregated result of sending across all clusters. But it is possible to get more detailed information about the results of sending for each chunk and each cluster:

print(response)
# {"processed": 2, "failed": 0, "total": 2, "time": "0.000108", "chunk": 2}

if response.failed == 0:
    print(f"Value sent successfully in {response.time}")
else:
    print(response.details)
    # {
    #     127.0.0.1:10051: [
    #         {
    #             "processed": 1,
    #             "failed": 0,
    #             "total": 1,
    #             "time": "0.000051",
    #             "chunk": 1
    #         }
    #     ],
    #     zabbix.example.local:10051: [
    #         {
    #             "processed": 1,
    #             "failed": 0,
    #             "total": 1,
    #             "time": "0.000057",
    #             "chunk": 1
    #         }
    #     ]
    # }
    for node, chunks in response.details.items():
        for resp in chunks:
            print(f"processed {resp.processed} of {resp.total} at {node.address}:{node.port}")
            # processed 1 of 1 at 127.0.0.1:10051
            # processed 1 of 1 at zabbix.example.local:10051

More usage examples can be found here.

Getting values from Zabbix Agent/Agent2 by item key

Sometimes it can also be useful to directly retrieve values from the Zabbix Agent. To assist with this task, zabbix_utils provides the Getter. It performs the same function as the zabbix_get utility, allowing you to work natively within Python code. Getter is straightforward to use; just import it, create an instance by passing the Zabbix Agent’s address and port, and then call the get() method, providing the data item key for the value you want to retrieve:

from zabbix_utils import Getter

agent = Getter('10.8.54.32', 10050)
resp = agent.get('system.uname')

In cases where your server has multiple network interfaces, and requests need to be sent from a specific one, you can specify the source_ip for the Agent connection:

from zabbix_utils import Getter

agent = Getter(
    host='zabbix.example.local',
    port=10050,
    source_ip='10.10.7.1'
)
resp = agent.get('system.uname')

The response from the Zabbix Agent will be processed by the library and returned as an object of the AgentResponse class:

print(resp)
# {
#     "error": null,
#     "raw": "Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64",
#     "value": "Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64"
# }

print(resp.error)
# None

print(resp.value)
# Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64

More usage examples can be found here.

Conclusions

The zabbix_utils library for Python allows you to take full advantage of monitoring using Zabbix, without limiting yourself to the integrations available out of the box. It can be valuable for both DevOps and SRE engineers, as well as Python developers looking to implement monitoring support for their system using Zabbix.

In the next article, we will thoroughly explore integration with an external service using this library to demonstrate the capabilities of zabbix_utils more comprehensively.

Questions

Q: Which Agent versions are supported for Getter?

A: Supported versions of Zabbix Agents are the same as Zabbix API versions, as specified in the readme file. Our goal is to create a library with full support for all Zabbix components of the same version.

Q: Does Getter support Agent encryption?

A: Encryption support is not yet built into Sender and Getter, but you can implement your own socket wrapper using third-party libraries for both classes.

from zabbix_utils import Sender

def psk_wrapper(sock, tls):
    # Implement the TLS PSK wrapping of the socket here
    # (for example, using a third-party TLS PSK library);
    # the returned socket is then used by Sender.
    return sock

sender = Sender(
    server='zabbix.example.local',
    port=10051,
    socket_wrapper=psk_wrapper
)

More examples can be found here.

Q: Is it possible to set a timeout value for Getter?

A: The response timeout value can be set for the Getter, as well as for ZabbixAPI and Sender. In all cases, the timeout defines how long to wait for a response to a request.

# Example of setting a timeout for Sender
sender = Sender(server='127.0.0.1', port=10051, timeout=30)

# Example of setting a timeout for Getter
agent = Getter(host='127.0.0.1', port=10050, timeout=30)

Q: Is parallel (asynchronous) mode supported?

A: Currently, the library does not include asynchronous classes and methods, but we plan to develop asynchronous versions of ZabbixAPI and Sender.

Q: Is it possible to specify multiple servers when sending through Sender without specifying a configuration file (for working with an HA cluster)?

A: Yes, it’s possible, as shown in the following example:

from zabbix_utils import Sender


zabbix_clusters = [
    [
        'zabbix.cluster1.node1',
        'zabbix.cluster1.node2:10051'
    ],
    [
        'zabbix.cluster2.node1:10051',
        'zabbix.cluster2.node2:20051',
        'zabbix.cluster2.node3'
    ]
]

sender = Sender(clusters=zabbix_clusters)
response = sender.send_value('example_host', 'example.key', 10, 1702511922)

print(response)
# {"processed": 2, "failed": 0, "total": 2, "time": "0.000103", "chunk": 2}

print(response.details)
# {
#     "zabbix.cluster1.node1:10051": [
#         {
#             "processed": 1,
#             "failed": 0,
#             "total": 1,
#             "time": "0.000050",
#             "chunk": 1
#         }
#     ],
#     "zabbix.cluster2.node2:20051": [
#         {
#             "processed": 1,
#             "failed": 0,
#             "total": 1,
#             "time": "0.000053",
#             "chunk": 1
#         }
#     ]
# }

The post Introducing zabbix_utils – the official Python library for Zabbix API appeared first on Zabbix Blog.

Improving SNMP monitoring performance with bulk SNMP data collection

Post Syndicated from Arturs Lontons original https://blog.zabbix.com/improving-snmp-monitoring-performance-with-bulk-snmp-data-collection/27231/

Zabbix 6.4 introduced major improvements to SNMP monitoring, especially when it comes to collecting large numbers of metrics from a single device. This is done by utilizing master-dependent item logic and combining it with low-level discovery and newly introduced preprocessing rules. This blog post will cover the drawbacks of the legacy SNMP monitoring approach, the benefits of the new approach, and the steps required to deploy bulk SNMP metric collection.

The legacy SNMP monitoring approach – potential pitfalls

Let’s take a look at the SNMP monitoring logic that all of us are used to. For our example here, we will look at network interface discovery on a network switch.

To start off, we create a low-level discovery rule. In the discovery rule, we specify which low-level discovery macros are collected from which OIDs. This way, we create multiple low-level discovery macro and OID pairs. Zabbix then goes through the list of indexes at the end of the specified OIDs and matches the collected values to low-level discovery macros. Zabbix also collects the list of discovered indexes for the specified OIDs and automatically matches them with the {#SNMPINDEX} low-level discovery macros.

An example of regular SNMP discovery key:

discovery[{#IFOPERSTATUS},1.3.6.1.2.1.2.2.1.8,{#IFADMINSTATUS},1.3.6.1.2.1.2.2.1.7,{#IFALIAS},1.3.6.1.2.1.31.1.1.1.18,{#IFNAME},1.3.6.1.2.1.31.1.1.1.1,{#IFDESCR},1.3.6.1.2.1.2.2.1.2,{#IFTYPE},1.3.6.1.2.1.2.2.1.3]
An example of regular SNMP low-level discovery rule

The collected low-level discovery data will look something like this:

[
{
"{#SNMPINDEX}":"3",
"{#IFOPERSTATUS}":"2",
"{#IFADMINSTATUS}":"1",
"{#IFALIAS}":"",
"{#IFNAME}":"3",
"{#IFDESCR}":"3",
"{#IFTYPE}":"6"
},
{
"{#SNMPINDEX}":"4",
"{#IFOPERSTATUS}":"2",
"{#IFADMINSTATUS}":"1",
"{#IFALIAS}":"",
"{#IFNAME}":"4",
"{#IFDESCR}":"4",
"{#IFTYPE}":"6"
},
{
"{#SNMPINDEX}":"5",
"{#IFOPERSTATUS}":"2",
"{#IFADMINSTATUS}":"1",
"{#IFALIAS}":"",
"{#IFNAME}":"5",
"{#IFDESCR}":"5",
"{#IFTYPE}":"6"
},
{
"{#SNMPINDEX}":"6",
"{#IFOPERSTATUS}":"2",
"{#IFADMINSTATUS}":"1",
"{#IFALIAS}":"",
"{#IFNAME}":"6",
"{#IFDESCR}":"6",
"{#IFTYPE}":"6"
},
{
"{#SNMPINDEX}":"7",
"{#IFOPERSTATUS}":"2",
"{#IFADMINSTATUS}":"1",
"{#IFALIAS}":"",
"{#IFNAME}":"7",
"{#IFDESCR}":"7",
"{#IFTYPE}":"6"
}
]

Once the low-level discovery rule is created, we move on to creating item prototypes.

Items created based on this item prototype will collect metrics from the OIDs specified in the SNMP OID field and will create an item per index ({#SNMPINDEX} macro) collected by the low-level discovery rule. Note that the item type is SNMP agent – each discovered item will be a regular SNMP item, polling the device and collecting metrics based on the item OID.

Now, imagine we have hundreds of interfaces and we’re polling a variety of metrics at a rapid interval for each interface. If our device has older or slower hardware, this can cause an issue where the device simply cannot process that many requests. To resolve this, a better way to collect SNMP metrics is required.

Bulk data collection with master – dependent items

Before we move on to the improved SNMP metric collection approach, we need to first take a look at how master-dependent item bulk metric collection and low-level discovery logic are implemented in Zabbix.

  • First, we create a master item, which collects both the metrics and low-level discovery information in a single go.
  • Next, we create a low-level discovery rule of type dependent item and point at the master item created in the previous step. At this point, we need to either ensure that the data collected by the master item is formatted in JSON or convert the data to JSON by using preprocessing.
  • Once we have ensured that our data is JSON-formatted, we can use the LLD macros tab to populate our low-level discovery macro values via JSONPath. Note: SNMP low-level discovery with bulk metric collection uses a different approach, designed specifically for SNMP checks.
  • Finally, we create item prototypes of type dependent item and once again point them at the master item created in the first step (Remember – our master item contains not only low-level discovery information, but also all of the required metrics). Here we use JSONPath preprocessing together with low-level discovery macros to specify which values should be collected. Remember that low-level discovery macros will be resolved to their values for each of the items created from the item prototype.

Improving SNMP monitoring performance with bulk metric collection

The SNMP bulk metric collection and discovery logic is very similar to what is discussed in the previous section, but it is more tailored to SNMP nuances.

Here, to avoid excessive polling, a new walk[] item has been introduced. The item utilizes GetBulk requests with SNMPv2 and v3 interfaces and GetNext for SNMPv1 interfaces to collect SNMP data. GetBulk requests perform much better by design: a GetBulk request retrieves the values of all instances at the end of the OID tree in a single go, instead of issuing an individual Get request for each instance.

To utilize this in Zabbix, first we have to create a walk[] master item, specifying the list of OIDs from which to collect values. The retrieved values will be used in both low-level discovery (e.g.: interface names) and items created from low-level discovery item prototypes (e.g.: incoming and outgoing traffic).

Two new preprocessing steps have been introduced to facilitate SNMP bulk data collection:

  • SNMP walk to JSON is used to specify the OIDs from which the low-level discovery macros will be populated with their values
  • SNMP walk value is used in the item prototypes to specify the OID from which the item value will be collected

The workflow for SNMP bulk data collection can be described in the following steps:

  • Create a master walk[] item containing the required OIDs
  • Create a low-level discovery rule of type dependent item which depends on the walk[] master item
  • Define low-level discovery macros by using the SNMP walk to JSON preprocessing step
  • Create item prototypes of type dependent item which depend on the walk[] master item, and use the SNMP walk value preprocessing step to specify which OID should be used for value collection

Monitoring interface traffic with bulk SNMP data collection

Let’s take a look at a simple example which you can use as a starting point for implementing bulk SNMP metric collection for your devices. In the following example we will create a master walk[] item, a dependent low-level discovery rule to discover network interfaces, and dependent item prototypes for incoming and outgoing traffic.

Creating the master item

We will start by creating the walk[] SNMP agent master item. The name and the key of the item can be specified arbitrarily. What’s important here is the OID field, where we specify the comma-separated list of OIDs whose instance values will be collected.

walk[1.3.6.1.2.1.31.1.1.1.6,1.3.6.1.2.1.31.1.1.1.10,1.3.6.1.2.1.31.1.1.1.1,1.3.6.1.2.1.2.2.1.2,1.3.6.1.2.1.2.2.1.3]

The walk[] item will collect values from the following OIDs:

  • 1.3.6.1.2.1.31.1.1.1.6 – Incoming traffic
  • 1.3.6.1.2.1.31.1.1.1.10 – Outgoing traffic
  • 1.3.6.1.2.1.31.1.1.1.1 – Interface names
  • 1.3.6.1.2.1.2.2.1.2 – Interface descriptions
  • 1.3.6.1.2.1.2.2.1.3 – Interface types

SNMP bulk metric collection master walk[] item

Here we can see the resulting values collected by this item:

Note: For readability, the output has been truncated and some of the interfaces have been left out.

.1.3.6.1.2.1.2.2.1.2.102 = STRING: DEFAULT_VLAN
.1.3.6.1.2.1.2.2.1.2.104 = STRING: VLAN3
.1.3.6.1.2.1.2.2.1.2.105 = STRING: VLAN4
.1.3.6.1.2.1.2.2.1.2.106 = STRING: VLAN5
.1.3.6.1.2.1.2.2.1.2.4324 = STRING: Switch loopback interface
.1.3.6.1.2.1.2.2.1.3.102 = INTEGER: 53
.1.3.6.1.2.1.2.2.1.3.104 = INTEGER: 53
.1.3.6.1.2.1.2.2.1.3.105 = INTEGER: 53
.1.3.6.1.2.1.2.2.1.3.106 = INTEGER: 53
.1.3.6.1.2.1.2.2.1.3.4324 = INTEGER: 24
.1.3.6.1.2.1.31.1.1.1.1.102 = STRING: DEFAULT_VLAN
.1.3.6.1.2.1.31.1.1.1.1.104 = STRING: VLAN3
.1.3.6.1.2.1.31.1.1.1.1.105 = STRING: VLAN4
.1.3.6.1.2.1.31.1.1.1.1.106 = STRING: VLAN5
.1.3.6.1.2.1.31.1.1.1.1.4324 = STRING: lo0
.1.3.6.1.2.1.31.1.1.1.10.102 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.10.104 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.10.105 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.10.106 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.10.4324 = Counter64: 12073
.1.3.6.1.2.1.31.1.1.1.6.102 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.6.104 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.6.105 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.6.106 = Counter64: 0
.1.3.6.1.2.1.31.1.1.1.6.4324 = Counter64: 12457

By looking at these values we can confirm that the item collects values required for both the low-level discovery rule (interface name, type, and description) and the items created from item prototypes (incoming/outgoing traffic).

Creating the low-level discovery rule

As our next step, we will create a dependent low-level discovery rule which will discover interfaces based on the data from the master walk[] item.

Interface discovery dependent low-level discovery rule

The most important part of configuring the low-level discovery rule lies in defining the SNMP walk to JSON preprocessing step. Here we can assign low-level discovery macros to OIDs. For our example, we will assign the {#IFNAME} macro to the OID containing the values of interface names:

Field name: {#IFNAME}
OID prefix: 1.3.6.1.2.1.31.1.1.1.1
Dependent low-level discovery rule preprocessing steps
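Based on the walk[] output shown earlier, the discovery data produced by this preprocessing step would look roughly like the following (illustrative only – the exact set of rows depends on your device):

[
{
"{#SNMPINDEX}":"102",
"{#IFNAME}":"DEFAULT_VLAN"
},
{
"{#SNMPINDEX}":"104",
"{#IFNAME}":"VLAN3"
},
{
"{#SNMPINDEX}":"4324",
"{#IFNAME}":"lo0"
}
]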

The name and the key of the dependent item can be specified arbitrarily.

Creating item prototypes

Finally, let’s create two dependent item prototypes to collect traffic data from our master item.

Here we will provide an arbitrary name and key containing low-level discovery macros. On items created from the item prototypes, the macros will resolve to the collected values, giving each item a unique name and key.

Note: The {#SNMPINDEX} macro is automatically collected by the low-level discovery rule and contains the indexes from the OIDs specified in the SNMP walk to JSON preprocessing step.

The final step in creating the item prototype is using the SNMP walk value preprocessing step to define which value will be collected by the item. We will also append the {#SNMPINDEX} macro at the end of the OID. This way, each item created from the prototype will collect data from a unique OID corresponding to the correct object instance.

Incoming traffic item prototype

Incoming traffic item prototype preprocessing step:

SNMP walk value: 1.3.6.1.2.1.31.1.1.1.6.{#SNMPINDEX}
Incoming traffic item preprocessing steps

 

Outgoing traffic item prototype

Outgoing traffic item prototype preprocessing step:

SNMP walk value: 1.3.6.1.2.1.31.1.1.1.10.{#SNMPINDEX}
Outgoing traffic item preprocessing steps

Note: Since the collected traffic values are counter values (always increasing), the Change per second preprocessing step is required to collect the traffic per second values.

Note: Since the values are collected in bytes, we will use the Custom multiplier preprocessing step with a multiplier of 8 to convert bytes to bits.

Final notes

And we’re done! Now all we have to do is wait until the master item update interval kicks in and we should see our items getting discovered by the low-level discovery rule.

Items created from the item prototypes

After we have confirmed that our interfaces are getting discovered and the items are collecting metrics from the master item, we should also implement the Discard unchanged with heartbeat preprocessing step on our low-level discovery rule. This way, the low-level discovery rule will not try to discover new entities in situations where we’re getting the same set of interfaces over and over again from our master item. This in turn improves the overall performance of internal low-level discovery processes.

Discard unchanged with heartbeat preprocessing on the low-level discovery rule

Note that the master item collects more interface parameters than just the interface name – the interface description and type are also available. To use this data, we would have to add additional fields in the low-level discovery rule SNMP walk to JSON preprocessing step and assign low-level discovery macros to the corresponding OIDs containing this information. Once that is done, we can use the new macros in the item prototype to provide additional information in the item name or key, or to filter the discovered interfaces based on this information (e.g., only discover interfaces of a particular type).

If you have any questions, comments, or suggestions regarding a topic you wish to see covered next in our blog, don’t hesitate to leave a comment below!

The post Improving SNMP monitoring performance with bulk SNMP data collection appeared first on Zabbix Blog.

Getting started with Projen and AWS CDK

Post Syndicated from Michael Tran original https://aws.amazon.com/blogs/devops/getting-started-with-projen-and-aws-cdk/

In the modern world of cloud computing, Infrastructure as Code (IaC) has become a vital practice for deploying and managing cloud resources. AWS Cloud Development Kit (AWS CDK) is a popular open-source framework that allows developers to define cloud resources using familiar programming languages. A related open source tool called Projen is a powerful project generator that simplifies the management of complex software configurations. In this post, we’ll explore how to get started with Projen and AWS CDK, and discuss the pros and cons of using Projen.

What is Projen?

Building modern and high quality software requires a large number of tools and configuration files to handle tasks like linting, testing, and automating releases. Each tool has its own configuration interface, such as JSON or YAML, and a unique syntax, increasing maintenance complexity.

When starting a new project, you rarely start from scratch, but more often use a scaffolding tool (for instance, create-react-app) to generate a new project structure. A large amount of configuration is created on your behalf, and you take ownership of those files. Moreover, there is a large number of project generation tools, with new ones created almost every day.

Projen is a project generator that helps developers to efficiently manage project configuration files and build high quality software. It allows you to define your project structure and configuration in code, making it easier to maintain and share across different environments and projects.

Out of the box, Projen supports multiple project types like AWS CDK construct libraries, react applications, Java projects, and Python projects. New project types can be added by contributors, and projects can be developed in multiple languages. Projen uses the jsii library, which allows us to write APIs once and generate libraries in several languages. Moreover, Projen provides a single interface, the projenrc file, to manage the configuration of your entire project!

The diagram below provides an overview of the deployment process of AWS cloud resources using Projen:

Projen Overview of Deployment process of AWS Resources

 

  1. In this example, Projen can be used to generate a new project, for instance, a new CDK TypeScript application.
  2. Developers define their infrastructure and application code using AWS CDK resources. To modify the project configuration, developers use the projenrc file instead of directly editing files like package.json.
  3. The project is synthesized to produce an AWS CloudFormation template.
  4. The CloudFormation template is deployed in an AWS account and provisions AWS cloud resources.

Diagram 1 – Projen packaged features: Projen helps get your project started and allows you to focus on coding instead of worrying about the other project variables. It comes out of the box with linting, unit tests and code coverage, and a number of GitHub Actions for release, versioning, and dependency management.

Pros and Cons of using Projen

Pros

  1. Consistency: Projen ensures consistency across different projects by allowing you to define standard project templates. You don’t need to use different project generators, only Projen.
  2. Version Control: Since project configuration is defined in code, it can be version-controlled, making it easier to track changes and collaborate with others.
  3. Extensibility: Projen supports various plugins and extensions, allowing you to customize the project configuration to fit your specific needs.
  4. Integration with AWS CDK: Projen provides seamless integration with AWS CDK, simplifying the process of defining and deploying cloud resources.
  5. Polyglot CDK constructs library: Build once, run in multiple runtimes. Projen can convert and publish a CDK Construct developed in TypeScript to Java (Maven) and Python (PYPI) with JSII support.
  6. API documentation: Generate API documentation from code comments if you are building a CDK construct.

Cons

  1. Microsoft Windows support. There are a number of open issues about Projen not completely working with the Windows environment (https://github.com/projen/projen/issues/2427 and https://github.com/projen/projen/issues/498).
  2. The framework, Projen, is very opinionated with a lot of assumptions on architecture, best practices and conventions.
  3. Projen is still not GA, with the version at the time of this writing at v0.77.5.

Walkthrough

Step 1: Set up prerequisites

  • An AWS account
  • Download and install Node
  • Install yarn
  • AWS CLI : configure your credentials
  • Deploying stacks with the AWS CDK requires dedicated Amazon S3 buckets and other containers to be available to AWS CloudFormation during deployment (More information).

Note: Projen doesn’t need to be installed globally. You will be using npx to run Projen which takes care of all required setup steps. npx is a tool for running npm packages that:

  • live inside of a local node_modules folder
  • are not installed globally.

npx comes bundled with npm version 5.2+

Step 2: Create a New Projen Project

You can create a new Projen project using the following command:

mkdir test_project && cd test_project
npx projen new awscdk-app-ts

This command creates a new TypeScript project with AWS CDK support. The exhaustive list of supported project types is available through the official documentation: Projen.io, or by running the npx projen new command without a project type. It also supports npx projen new awscdk-construct to create a reusable construct which can then be published to other package managers.

The created project structure should be as follows:

test_project
| .github/
| .projen/
| src/
| test/
| .eslintrc
| .gitattributes
| .gitignore
| .mergify.yml
| .npmignore
| .projenrc.js
| cdk.json
| LICENSE
| package.json
| README.md
| tsconfig.dev.json
| yarn.lock

Projen generated a new project including:

  • Initialization of an empty git repository, with the associated GitHub workflow files to build and upgrade the project. The release workflow can be customized with projen tasks.
  • .projenrc.js is the main configuration file for the project
  • tasks.json file for integration with Visual Studio Code
  • src folder containing an empty CDK stack
  • License and README files
  • A projen configuration file: projenrc.js
  • package.json contains functional metadata about the project like name, versions and dependencies.
  • .gitignore, .gitattributes file to manage your files with git.
  • .eslintrc for identifying and reporting patterns in JavaScript.
  • .npmignore to keep files out of the published npm package.
  • .mergify.yml for managing the pull requests.
  • tsconfig.json to configure the compiler options

Most of the generated files include a disclaimer:

# ~~ Generated by projen. To modify, edit .projenrc.js and run "npx projen".

Projen’s power lies in its single configuration file, .projenrc.js. By editing this file, you can manage your project’s lint rules, dependencies, .gitignore, and more. Projen will propagate your changes across all generated files, simplifying and unifying dependency management across your projects.

Projen generated files are considered implementation details and are not meant to be edited manually. If you do make manual changes, they will be overwritten the next time you run npx projen.

To edit your project configuration, simply edit .projenrc.js and then run npx projen to synthesize again. For more information on the Projen API, please see the documentation: http://projen.io/api/API.html.

Projen uses the .projenrc.js file’s configuration to instantiate a new AwsCdkTypeScriptApp with some basic metadata: the project name, CDK version, and the default release branch. Additional APIs are available for this project type to customize it (for instance, to add runtime dependencies).

Let’s try to modify a property and see how Projen reacts. As an example, let’s update the project name in projenrc.js :

name: 'test_project_2',

and then run the npx projen command:

npx projen

Once done, you can see that the project name was updated in the package.json file.

Step 3: Define AWS CDK Resources

Inside your Projen project, you can define AWS CDK resources using familiar programming languages like TypeScript. Here’s an example of defining an Amazon Simple Storage Service (Amazon S3) bucket:

1. Navigate to your main.ts file in the src/ directory
2. Modify the imports at the top of the file as follows:

import { App, CfnOutput, Stack, StackProps } from 'aws-cdk-lib';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { Construct } from 'constructs';

3. Replace line 9 “// define resources here…” with the code below:

const bucket = new s3.Bucket(this, 'MyBucket', {
  versioned: true,
});

new CfnOutput(this, 'TestBucket', { value: bucket.bucketArn });

Step 4: Synthesize and Deploy

Next we will bootstrap our application. Run the following in a terminal:

$ npx cdk bootstrap

Once you’ve defined your resources, you can synthesize a cloud assembly, which includes a CloudFormation template (or many depending on the application) using:

$ npx projen build

npx projen build will perform several actions:

  1. Build the application
  2. Synthesize the CloudFormation template
  3. Run tests and linter

The synth() method of Projen performs the actual synthesizing (and updating) of all configuration files managed by Projen. This is achieved by deleting all Projen-managed files (if there are any), and then re-synthesizing them based on the latest configuration specified by the user.

You can find an exhaustive list of the available npx projen commands in .projen/tasks.json. You can also use the projen API project.addTask to add a new task to perform any custom action you need! Tasks are a project-level feature to define a project command system backed by shell scripts.

Deploy the CDK application:

$ npx projen deploy

Projen will use the cdk deploy command to deploy the CloudFormation stack in the configured AWS account by creating and executing a change set based on the template generated by CDK synthesis. The output of the step above should look as follows:

deploy | cdk deploy

✨ Synthesis time: 3.28s

toto-dev: start: Building 387a3a724050aec67aa083b74c69485b08a876f038078ec7ea1018c7131f4605:263905523351-us-east-1
toto-dev: success: Built 387a3a724050aec67aa083b74c69485b08a876f038078ec7ea1018c7131f4605:263905523351-us-east-1
toto-dev: start: Publishing 387a3a724050aec67aa083b74c69485b08a876f038078ec7ea1018c7131f4605:263905523351-us-east-1
toto-dev: success: Published 387a3a724050aec67aa083b74c69485b08a876f038078ec7ea1018c7131f4605:263905523351-us-east-1
toto-dev: deploying... [1/1]
toto-dev: creating CloudFormation changeset...

✅ testproject-dev

✨ Deployment time: 33.48s

Outputs:
testproject-dev.TestBucket = arn:aws:s3:::testproject-dev-mybucketf68f3ff0-1xy2f0vk0ve4r
Stack ARN:
arn:aws:cloudformation:us-east-1:263905523351:stack/testproject-dev/007e7b20-48df-11ee-b38d-0aa3a92c162d

✨ Total time: 36.76s

The application was successfully deployed in the configured AWS account! Also, the Amazon Resource Name (ARN) of the S3 bucket created is available through the CloudFormation stack Outputs tab, and displayed in your terminal under the ‘Outputs’ section.

Clean up

Delete CloudFormation Stack

To clean up the resources created in this section of the workshop, navigate to the CloudFormation console and delete the stack created. You can also perform the same task programmatically:

$ npx projen destroy

Which should produce the following output:

destroy | cdk destroy
Are you sure you want to delete: testproject-dev (y/n)? y
testproject-dev: destroying... [1/1]

✅ testproject-dev: destroyed

Delete S3 Buckets

The S3 bucket will not be deleted since its retention policy was set to RETAIN. Navigate to the S3 console and delete the created bucket. If you added files to that bucket, you will need to empty it before deletion. See the Deleting a bucket documentation for more information.
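Because the bucket was created with versioning enabled, emptying it means deleting every object version and delete marker before the bucket itself can be removed. A small boto3 sketch of that cleanup, using the bucket name from the stack output as a placeholder, might look like this:

import boto3

s3 = boto3.resource("s3")
# Placeholder bucket name; use the name from the TestBucket stack output
bucket = s3.Bucket("testproject-dev-mybucketf68f3ff0-1xy2f0vk0ve4r")

# Delete all object versions and delete markers, then the bucket itself
bucket.object_versions.delete()
bucket.delete()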

Conclusion

Projen and AWS CDK together provide a powerful combination for managing cloud resources and project configuration. By leveraging Projen, you can ensure consistency, version control, and extensibility across your projects. The integration with AWS CDK allows you to define and deploy cloud resources using familiar programming languages, making the entire process more developer-friendly.

Whether you’re a seasoned cloud developer or just getting started, Projen and AWS CDK offer a streamlined approach to cloud resource management. Give it a try and experience the benefits of Infrastructure as Code with the flexibility and power of modern development tools.

Alain Krok

Alain Krok is a Senior Solutions Architect with a passion for emerging technologies. His past experience includes designing and implementing IIoT solutions for the oil and gas industry and working on robotics projects. He enjoys pushing the limits and indulging in extreme sports when he is not designing software.

 

Dinesh Sajwan

Dinesh Sajwan is a Senior Solutions Architect. His passion for emerging technologies allows him to stay on the cutting edge and identify new ways to apply the latest advancements to solve even the most complex business problems. His diverse expertise and enthusiasm for both technology and adventure position him as a uniquely creative problem-solver.

Michael Tran

Michael Tran is a Sr. Solutions Architect with Prototyping Acceleration team at Amazon Web Services. He provides technical guidance and helps customers innovate by showing the art of the possible on AWS. He specializes in building prototypes in the AI/ML space. You can contact him @Mike_Trann on Twitter.

Handling Bounces and Complaints

Post Syndicated from Tyler Holmes original https://aws.amazon.com/blogs/messaging-and-targeting/handling-bounces-and-complaints/

As you may have seen in Jeff Barr’s blog post or in an announcement, Amazon Simple Email Service (Amazon SES) now provides bounce and complaint notifications via Amazon Simple Notification Service (Amazon SNS). You can refer to the Amazon SES Developer Guide or Jeff’s post to learn how to set up this feature. In this post, we will show you how you might manage your email list using the information you get in the Amazon SNS notifications.

Background

Amazon SES assigns a unique message ID to each email that you successfully submit to send. When Amazon SES receives a bounce or complaint message from an ISP, we forward the feedback message to you. The format of bounce and complaint messages varies between ISPs, but Amazon SES interprets these messages and, if you choose to set up Amazon SNS topics for them, categorizes them into JSON objects.

Scenario

Let’s assume you use Amazon SES to send monthly product announcements to a list of email addresses. You store the list in a database and send one email per recipient through Amazon SES. You review bounces and complaints once each day, manually interpret the bounce messages in the incoming email, and update the list. You would like to automate this process using Amazon SNS notifications with a scheduled task.

Solution

To implement this solution, we will use separate Amazon SNS topics for bounces and complaints to isolate the notification channels from each other and manage them separately. Also, since the bounce and complaint handler will not run 24/7, we need these notifications to persist until the application processes them. Amazon SNS integrates with Amazon Simple Queue Service (Amazon SQS), which is a durable messaging technology that allows us to persist these notifications. We will configure each Amazon SNS topic to publish to separate SQS queues. When our application runs, it will process queued notifications and update the email list. We have provided sample C# code below.

Configuration

Set up the following AWS components to handle bounce notifications:

  1. Create an Amazon SQS queue named ses-bounces-queue.
  2. Create an Amazon SNS topic named ses-bounces-topic.
  3. Configure the Amazon SNS topic to publish to the SQS queue.
  4. Configure Amazon SES to publish bounce notifications using ses-bounces-topic to ses-bounces-queue.

Set up the following AWS components to handle complaint notifications:

  1. Create an Amazon SQS queue named ses-complaints-queue.
  2. Create an Amazon SNS topic named ses-complaints-topic.
  3. Configure the Amazon SNS topic to publish to the SQS queue.
  4. Configure Amazon SES to publish complaint notifications using ses-complaints-topic to ses-complaints-queue.

Ensure that IAM policies are in place so that Amazon SNS has access to publish to the appropriate SQS queues.

Bounce Processing

Amazon SES will categorize your hard bounces into two types: permanent and transient. A permanent bounce indicates that you should never send to that recipient again. A transient bounce indicates that the recipient’s ISP is not accepting messages for that particular recipient at that time and you can retry delivery in the future. The amount of time you should wait before resending to the address that generated the transient bounce depends on the transient bounce type. Certain transient bounces require manual intervention before the message can be delivered (e.g., message too large or content error). If the bounce type is undetermined, you should manually review the bounce and act accordingly.

You will need to define some classes to simplify bounce notification parsing from JSON into .NET objects. We will use the open-source JSON.NET library.

/// <summary>Represents the bounce or complaint notification stored in Amazon SQS.</summary>
class AmazonSqsNotification
{
    public string Type { get; set; }
    public string Message { get; set; }
}

/// <summary>Represents an Amazon SES bounce notification.</summary>
class AmazonSesBounceNotification
{
    public string NotificationType { get; set; }
    public AmazonSesBounce Bounce { get; set; }
}
/// <summary>Represents meta data for the bounce notification from Amazon SES.</summary>
class AmazonSesBounce
{
    public string BounceType { get; set; }
    public string BounceSubType { get; set; }
    public DateTime Timestamp { get; set; }
    public List<AmazonSesBouncedRecipient> BouncedRecipients { get; set; }
}
/// <summary>Represents the email address of recipients that bounced
/// when sending from Amazon SES.</summary>
class AmazonSesBouncedRecipient
{
    public string EmailAddress { get; set; }
}

Sample code to handle bounces:

/// <summary>Process bounces received from Amazon SES via Amazon SQS.</summary>
/// <param name="response">The response from the Amazon SQS bounces queue 
/// to a ReceiveMessage request. This object contains the Amazon SES  
/// bounce notification.</param> 
private static void ProcessQueuedBounce(ReceiveMessageResponse response)
{
    int messages = response.ReceiveMessageResult.Message.Count;
 
    if (messages > 0)
    {
        foreach (var m in response.ReceiveMessageResult.Message)
        {
            // First, convert the Amazon SNS message into a JSON object.
            var notification = Newtonsoft.Json.JsonConvert.DeserializeObject<AmazonSqsNotification>(m.Body);
 
            // Now access the Amazon SES bounce notification.
            var bounce = Newtonsoft.Json.JsonConvert.DeserializeObject<AmazonSesBounceNotification>(notification.Message);
 
            switch (bounce.Bounce.BounceType)
            {
                case "Transient":
                    // Per our sample organizational policy, we will remove all recipients 
                    // that generate an AttachmentRejected bounce from our mailing list.
                    // Other bounces will be reviewed manually.
                    switch (bounce.Bounce.BounceSubType)
                    {
                        case "AttachmentRejected":
                            foreach (var recipient in bounce.Bounce.BouncedRecipients)
                            {
                                RemoveFromMailingList(recipient.EmailAddress);
                            }
                            break;
                        default:
                            ManuallyReviewBounce(bounce);
                            break;
                    }
                    break;
                default:
                    // Remove all recipients that generated a permanent bounce 
                    // or an unknown bounce.
                    foreach (var recipient in bounce.Bounce.BouncedRecipients)
                    {
                        RemoveFromMailingList(recipient.EmailAddress);
                    }
                    break;
            }
        }
    }
}
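If your list-management job is written in Python rather than .NET, a roughly equivalent polling loop using boto3 might look like the following sketch. The queue URL and the remove_from_mailing_list() helper are placeholders, and the branching mirrors the sample policy in the C# code above:

import json
import boto3

sqs = boto3.client("sqs")
# Placeholder queue URL for ses-bounces-queue
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/111122223333/ses-bounces-queue"

def remove_from_mailing_list(address):
    # Placeholder: remove the address from your own datastore
    print(f"Removing {address} from the mailing list")

def process_queued_bounces():
    response = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10)
    for message in response.get("Messages", []):
        # The SQS body is the Amazon SNS envelope; its Message field holds the SES bounce notification
        notification = json.loads(message["Body"])
        bounce = json.loads(notification["Message"])["bounce"]

        if bounce["bounceType"] == "Transient" and bounce["bounceSubType"] != "AttachmentRejected":
            # Per the sample policy, other transient bounces are reviewed manually
            print("Review this transient bounce manually:", bounce)
        else:
            # Permanent, undetermined, or AttachmentRejected bounces: remove the recipients
            for recipient in bounce["bouncedRecipients"]:
                remove_from_mailing_list(recipient["emailAddress"])

        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])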

Complaint Processing

A complaint indicates the recipient does not want the email that you sent them. When we receive a complaint, we want to remove the recipient addresses from our list. Again, define some objects to simplify parsing complaint notifications from JSON to .NET objects.

/// <summary>Represents an Amazon SES complaint notification.</summary>
class AmazonSesComplaintNotification
{
    public string NotificationType { get; set; }
    public AmazonSesComplaint Complaint { get; set; }
}
/// <summary>Represents the email address of individual recipients that complained 
/// to Amazon SES.</summary>
class AmazonSesComplainedRecipient
{
    public string EmailAddress { get; set; }
}
/// <summary>Represents meta data for the complaint notification from Amazon SES.</summary>
class AmazonSesComplaint
{
    public List<AmazonSesComplainedRecipient> ComplainedRecipients { get; set; }
    public DateTime Timestamp { get; set; }
    public string MessageId { get; set; }
}

Sample code to handle complaints:

/// <summary>Process complaints received from Amazon SES via Amazon SQS.</summary>
/// <param name="response">The response from the Amazon SQS complaint queue 
/// to a ReceiveMessage request. This object contains the Amazon SES 
/// complaint notification.</param>
private static void ProcessQueuedComplaint(ReceiveMessageResponse response)
{
    int messages = response.ReceiveMessageResult.Message.Count;
 
    if (messages > 0)
    {
        foreach (var message in response.ReceiveMessageResult.Message)
        {
            // First, convert the Amazon SNS message into a JSON object.
            var notification = Newtonsoft.Json.JsonConvert.DeserializeObject<AmazonSqsNotification>(message.Body);
 
            // Now access the Amazon SES complaint notification.
            var complaint = Newtonsoft.Json.JsonConvert.DeserializeObject<AmazonSesComplaintNotification>(notification.Message);
 
            foreach (var recipient in complaint.Complaint.ComplainedRecipients)
            {
                // Remove the email address that complained from our mailing list.
                RemoveFromMailingList(recipient.EmailAddress);
            }
        }
    }
}

Final Thoughts

We hope that you now have the basic information on how to use bounce and complaint notifications. For more information, please review our API reference and Developer Guide; they describe all actions, error codes, and restrictions that apply to Amazon SES.

If you have comments or feedback about this feature, please post them on the Amazon SES forums. We actively monitor the forum and frequently engage with customers. Happy sending with Amazon SES!

Monitor Apache Spark applications on Amazon EMR with Amazon Cloudwatch

Post Syndicated from Le Clue Lubbe original https://aws.amazon.com/blogs/big-data/monitor-apache-spark-applications-on-amazon-emr-with-amazon-cloudwatch/

To improve a Spark application’s efficiency, it’s essential to monitor its performance and behavior. In this post, we demonstrate how to publish detailed Spark metrics from Amazon EMR to Amazon CloudWatch. This will give you the ability to identify bottlenecks while optimizing resource utilization.

CloudWatch provides a robust, scalable, and cost-effective monitoring solution for AWS resources and applications, with powerful customization options and seamless integration with other AWS services. By default, Amazon EMR sends basic metrics to CloudWatch to track the activity and health of a cluster. Spark’s configurable metrics system allows metrics to be collected in a variety of sinks, including HTTP, JMX, and CSV files, but additional configuration is required to enable Spark to publish metrics to CloudWatch.

Solution overview

This solution includes Spark configuration to send metrics to a custom sink. The custom sink collects only the metrics defined in a Metricfilter.json file. It utilizes the CloudWatch agent to publish the metrics to a custom CloudWatch namespace. The included bootstrap action script installs and configures the CloudWatch agent and the metric library on the Amazon Elastic Compute Cloud (Amazon EC2) EMR instances. A CloudWatch dashboard can provide instant insight into the performance of an application.

The following diagram illustrates the solution architecture and workflow.

architectural diagram illustrating the solution overview

The workflow includes the following steps:

  1. Users start a Spark EMR job, creating a step on the EMR cluster. With Apache Spark, the workload is distributed across the different nodes of the EMR cluster.
  2. In each node (EC2 instance) of the cluster, a Spark library captures and pushes metric data to a CloudWatch agent, which aggregates the metric data before pushing them to CloudWatch every 30 seconds.
  3. Users can view the metrics accessing the custom namespace on the CloudWatch console.

We provide an AWS CloudFormation template in this post as a general guide. The template demonstrates how to configure a CloudWatch agent on Amazon EMR to push Spark metrics to CloudWatch. You can review and customize it as needed to include your Amazon EMR security configurations. As a best practice, we recommend including your Amazon EMR security configurations in the template to encrypt data in transit.

You should also be aware that some of the resources deployed by this stack incur costs when they remain in use. Additionally, EMR metrics don’t incur CloudWatch costs. However, custom metrics incur charges based on CloudWatch metrics pricing. For more information, see Amazon CloudWatch Pricing.

In the next sections, we go through the following steps:

  1. Create and upload the metrics library, installation script, and filter definition to an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Use the CloudFormation template to create the required resources, including an IAM role, an IAM instance profile, an EMR cluster, and a CloudWatch dashboard.
  3. Monitor the Spark metrics on the CloudWatch console.

Prerequisites

This post assumes that you have the following:

  • An AWS account.
  • An S3 bucket for storing the bootstrap script, library, and metric filter definition.
  • A VPC created in Amazon Virtual Private Cloud (Amazon VPC), where your EMR cluster will be launched.
  • Default IAM service roles, which grant Amazon EMR permissions to AWS services and resources. You can create these roles with the aws emr create-default-roles command in the AWS Command Line Interface (AWS CLI).
  • An optional EC2 key pair, if you plan to connect to your cluster through SSH rather than Session Manager, a capability of AWS Systems Manager.

Define the required metrics

To avoid sending unnecessary data to CloudWatch, our solution implements a metric filter. Review the Spark documentation to get acquainted with the namespaces and their associated metrics. Determine which metrics are relevant to your specific application and performance goals. Different applications may require different metrics to monitor, depending on the workload, data processing requirements, and optimization objectives. The metric names you’d like to monitor should be defined in the Metricfilter.json file, along with their associated namespaces.

We have created an example Metricfilter.json definition, which includes capturing metrics related to data I/O, garbage collection, memory and CPU pressure, and Spark job, stage, and task metrics.

Note that certain metrics are not available in all Spark release versions (for example, appStatus was introduced in Spark 3.0).

Create and upload the required files to an S3 bucket

For more information, see Uploading objects and Installing and running the CloudWatch agent on your servers.

To create and upload the required files, complete the following steps:

  1. On the Amazon S3 console, choose your S3 bucket.
  2. On the Objects tab, choose Upload.
  3. Choose Add files, then choose the Metricfilter.json, installer.sh, and examplejob.sh files.
  4. Additionally, upload the emr-custom-cw-sink-0.0.1.jar metrics library file that corresponds to the Amazon EMR release version you will be using:
    1. EMR-6.x.x
    2. EMR-5.x.x
  5. Choose Upload, and take note of the S3 URIs for the files.

Provision resources with the CloudFormation template

Choose Launch Stack to launch a CloudFormation stack in your account and deploy the template:


This template creates an IAM role, IAM instance profile, EMR cluster, and CloudWatch dashboard. The cluster starts a basic Spark example application. You will be billed for the AWS resources used if you create a stack from this template.

The CloudFormation wizard will ask you to modify or provide these parameters:

  • InstanceType – The type of instance for all instance groups. The default is m5.2xlarge.
  • InstanceCountCore – The number of instances in the core instance group. The default is 4.
  • EMRReleaseLabel – The Amazon EMR release label you want to use. The default is emr-6.9.0.
  • BootstrapScriptPath – The S3 path of the installer.sh installation bootstrap script that you copied earlier.
  • MetricFilterPath – The S3 path of your Metricfilter.json definition that you copied earlier.
  • MetricsLibraryPath – The S3 path of your CloudWatch emr-custom-cw-sink-0.0.1.jar library that you copied earlier.
  • CloudWatchNamespace – The name of the custom CloudWatch namespace to be used.
  • SparkDemoApplicationPath – The S3 path of your examplejob.sh script that you copied earlier.
  • Subnet – The EC2 subnet where the cluster launches. You must provide this parameter.
  • EC2KeyPairName – An optional EC2 key pair for connecting to cluster nodes, as an alternative to Session Manager.

View the metrics

After the CloudFormation stack deploys successfully, the example job starts automatically and takes approximately 15 minutes to complete. On the CloudWatch console, choose Dashboards in the navigation pane. Then filter the list by the prefix SparkMonitoring.

The example dashboard includes information on the cluster and an overview of the Spark jobs, stages, and tasks. Metrics are also available under a custom namespace starting with EMRCustomSparkCloudWatchSink.

CloudWatch dashboard summary section

Memory, CPU, I/O, and additional task distribution metrics are also included.

CloudWatch dashboard executors

Finally, detailed Java garbage collection metrics are available per executor.

CloudWatch dashboard garbage-collection
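Besides the dashboard, you can verify programmatically that metrics are arriving by listing what has been published to your custom namespace with boto3. The namespace below is a placeholder; use the value you passed in the CloudWatchNamespace parameter:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Placeholder namespace; substitute your CloudWatchNamespace stack parameter value
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="EMRCustomSparkCloudWatchSink"):
    for metric in page["Metrics"]:
        print(metric["MetricName"], metric["Dimensions"])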

Clean up

To avoid future charges in your account, delete the resources you created in this walkthrough. The EMR cluster will incur charges as long as the cluster is active, so stop it when you’re done. Complete the following steps:

  1. On the CloudFormation console, in the navigation pane, choose Stacks.
  2. Choose the stack you launched (EMR-CloudWatch-Demo), then choose Delete.
  3. Empty the S3 bucket you created.
  4. Delete the S3 bucket you created.

Conclusion

Now that you have completed the steps in this walkthrough, the CloudWatch agent is running on your cluster hosts and configured to push Spark metrics to CloudWatch. With this feature, you can effectively monitor the health and performance of your Spark jobs running on Amazon EMR, detecting critical issues in real time and identifying root causes quickly.

You can package and deploy this solution through a CloudFormation template like this example template, which creates the IAM instance profile role, CloudWatch dashboard, and EMR cluster. The source code for the library is available on GitHub for customization.

To take this further, consider using these metrics in CloudWatch alarms. You could collect them with other alarms into a composite alarm or configure alarm actions such as sending Amazon Simple Notification Service (Amazon SNS) notifications to trigger event-driven processes such as AWS Lambda functions.


About the Author

Le Clue Lubbe is a Principal Engineer at AWS. He works with our largest enterprise customers to solve some of their most complex technical problems. He drives broad solutions through innovation to impact and improve the life of our customers.

How to Connect Your On-Premises Active Directory to AWS Using AD Connector

Post Syndicated from Jeremy Cowan original https://aws.amazon.com/blogs/security/how-to-connect-your-on-premises-active-directory-to-aws-using-ad-connector/

August 17, 2023: We updated the instructions and screenshots in this post to align with changes to the AWS Management Console.

April 25, 2023: We’ve updated this blog post to include more security learning resources.


AD Connector is designed to give you an easy way to establish a trusted relationship between your Active Directory and AWS. When AD Connector is configured, the trust allows you to:

  • Sign in to AWS applications such as Amazon WorkSpaces, Amazon WorkDocs, and Amazon WorkMail by using your Active Directory credentials.
  • Seamlessly join Windows instances to your Active Directory domain either through the Amazon EC2 launch wizard or programmatically through the EC2 Simple System Manager (SSM) API.
  • Provide federated sign-in to the AWS Management Console by mapping Active Directory identities to AWS Identity and Access Management (IAM) roles.

AD Connector cannot be used with your custom applications, as it is only used for secure AWS integration for the three use-cases mentioned above. Custom applications relying on your on-premises Active Directory should communicate with your domain controllers directly or utilize AWS Managed Microsoft AD rather than integrating with AD Connector. To learn more about which AWS Directory Service solution works best for your organization, see the service documentation.

With AD Connector, you can streamline identity management by extending your user identities from Active Directory. It also enables you to reuse your existing Active Directory security policies such as password expiration, password history, and account lockout policies. Also, your users will no longer need to remember yet another user name and password combination. Since AD Connector doesn’t rely on complex directory synchronization technologies or Active Directory Federation Services (AD FS), you can forego the added cost and complexity of hosting a SAML-based federation infrastructure. In sum, AD Connector helps foster a hybrid environment by allowing you to leverage your existing on-premises investments to control different facets of AWS.

This blog post will show you how AD Connector works as well as walk through how to enable federated console access, assign users to roles, and seamlessly join an EC2 instance to an Active Directory domain.

AD Connector – Under the Hood

AD Connector is a dual Availability Zone proxy service that connects AWS apps to your on-premises directory. AD Connector forwards sign-in requests to your Active Directory domain controllers for authentication and provides the ability for applications to query the directory for data. When you configure AD Connector, you provide it with service account credentials that are securely stored by AWS. This account is used by AWS to enable seamless domain join, single sign-on (SSO), and AWS Applications (WorkSpaces, WorkDocs, and WorkMail) functionality. Given AD Connector’s role as a proxy, it does not store or cache user credentials. Rather, authentication, lookup, and management requests are handled by your Active Directory.

In order to create an AD Connector, you must also provide a pair of DNS IP addresses during setup. These are used by AD Connector to retrieve Service (SRV) DNS records to locate the nearest domain controllers to route requests to. The AD connector proxy instances use an algorithm similar to the Active Directory domain controller locator process to decide which domain controllers to connect to for LDAP and Kerberos requests.

For authentication to AWS applications and the AWS Management Console, you can configure an access URL from the AWS Directory Service console. This access URL is in the format of https://<alias>.awsapps.com and provides a publicly accessible sign-in page. You can visit https://<alias>.awsapps.com/workdocs to sign in to WorkDocs, and https://<alias>.awsapps.com/console to sign in to the AWS Management Console. The following image shows the sign-in page for the AWS Management Console.

Figure 1: Login

For added security you can enable multi-factor authentication (MFA) for AD Connector, but you’ll need to have an existing RADIUS infrastructure in your on-premises network set up to leverage this feature. See AD Connector – Multi-factor Authentication Prerequisites for more information about requirements and configuration. With MFA enabled with AD Connector, the sign-in page hosted at your access URL will prompt users for an MFA code in addition to their standard sign-in credentials.

AD Connector comes in two sizes: small and large. A large AD Connector runs on more powerful compute resources and is more expensive than a small AD Connector. Depending on the volume of traffic to be proxied by AD Connector, you’ll want to select the appropriate size for your needs.

Figure 2: Directory size

AD Connector is highly available, meaning underlying hosts are deployed across multiple Availability Zones in the Region where you deploy it. In the event of host-level failure, Directory Service will promptly replace failed hosts. Directory Service also applies performance and security updates automatically to AD Connector.

The following diagram illustrates the authentication flow and network path when you enable AWS Management Console access:

  1. A user opens the secure custom sign-in page and supplies their Active Directory user name and password.
  2. The authentication request is sent over SSL to AD Connector.
  3. AD Connector performs LDAP authentication to Active Directory.

    Note: AD Connector locates the nearest domain controllers by querying the SRV DNS records for the domain.

  4. After the user has been authenticated, AD Connector calls the STS AssumeRole method to get temporary security credentials for that user. Using those temporary security credentials, AD Connector constructs a sign-in URL that users use to access the console.

    Note: If a user is mapped to multiple roles, the user will be presented with a choice at sign-in as to which role they want to assume. The user session is valid for 1 hour.

    Figure 3: Authentication flow and network path

Before getting started with configuring AD Connector for federated AWS Management Console access, be sure you’ve read and understand the prerequisites for AD Connector. For example, as shown in Figure 3 there must be a VPN or Direct Connect circuit in place between your VPC and your on-premises environment. Your domain also has to be running at Windows 2003 functional level or later. Also, various ports have to be opened between your VPC and your on-premises environment to allow AD Connector to communicate with your on-premises directory.

Configuring AD Connector for federated AWS Management Console access

Enable console access

To allow users to sign in with their Active Directory credentials, you need to explicitly enable console access. You can do this by opening the Directory Service console and clicking the Directory ID name (Figure 4).

This opens the Directory Details page, where you can enable the directory for AWS Management Console access as described in the following steps.

Figure 4: Directories

Choose the Application management tab as seen in Figure 5.

Figure 5: Application Management

Scroll down to AWS Management Console as shown in Figure 6, and choose Enable from the Actions dropdown list.

Figure 6: Enable console access

After enabling console access, you’re ready to start configuring roles and associating Active Directory users and groups with those roles.

Follow these steps to create a new role. When you create a new role through the Directory Service console, AD Connector automatically adds a trust relationship to Directory Service. The following code example shows the IAM trust policy for the role, after a role is created.

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Sid": "",
       "Effect": "Allow",
       "Principal": {
         "Service": "ds.amazonaws.com"
       },
       "Action": "sts:AssumeRole",
       "Condition": {
         "StringEquals": {
           "sts:externalid": "482242153642"
         }
       }
     }
   ]
}

Assign users to roles

Now that AD Connector is configured and you’ve created a role, your next job is to assign users or groups to those IAM roles. Role mapping governs which resources a user can access within AWS. To do this, complete the following steps:

  1. Open the Directory Service console and navigate to the AWS Management Console section.
  2. In the search bar, type the name of the role you just created.
  3. Select the role that you just created by choosing the name under the IAM role field.
  4. Choose Add, and enter a name to search for the users or groups to assign to this role.
  5. Choose Add, and the user or group is now assigned to the role.

When you’re finished, you should see the name of the user or group along with the corresponding ID for that object. You can also use this list to remove users or groups from the role. The next time the user signs in to the AWS Management Console from the custom sign-in page, they will be signed in under the role you assigned (EC2ReadOnly in this example).

Seamlessly join an instance to an Active Directory domain

Another advantage to using AD Connector is the ability to seamlessly join Windows (EC2) instances to your Active Directory domain. This allows you to join a Windows Server to the domain while the instance is being provisioned instead of using a script or doing it manually. This section of this blog post will explain the steps necessary to enable this feature in your environment and how the service works.

Step 1: Create a role

Until recently you had to manually create an IAM policy to allow an EC2 instance to access SSM, an AWS service that allows you to configure Windows instances while they’re running and on first launch. Now, there’s a managed policy called AmazonEC2RoleforSSM that you can use instead. The role you are about to create will be assigned to an EC2 instance when it’s provisioned, which will grant it permission to access the SSM service.

To create the role:

  1. Open the IAM console.
  2. Click Roles in the navigation pane.
  3. Click Create Role.
  4. Type a name for your role in the Role Name field.
  5. Under AWS Service Roles, select Amazon EC2 and then click Select.
  6. On the Attach Policy page, select AmazonEC2RoleforSSM and then click Next Step.
  7. On the Review page, click Create Role.

If you click the role you created, you’ll see a trust policy for EC2, which looks like the following code example.

{
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "",
         "Effect": "Allow",
         "Principal": {
           "Service": "ec2.amazonaws.com"
         },
         "Action": "sts:AssumeRole"
       }
     ]
}
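
If you prefer to script this step, the console workflow can be approximated with the AWS CLI. The following is a minimal sketch; the role and instance profile names are placeholders, and it assumes the trust policy above is saved locally as ec2-trust.json.

aws iam create-role \
  --role-name EC2DomainJoin \
  --assume-role-policy-document file://ec2-trust.json

aws iam attach-role-policy \
  --role-name EC2DomainJoin \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

# EC2 instances use an instance profile that wraps the role
aws iam create-instance-profile --instance-profile-name EC2DomainJoin
aws iam add-role-to-instance-profile \
  --instance-profile-name EC2DomainJoin \
  --role-name EC2DomainJoin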

Step 2: Create a new Windows instance from the EC2 console

With this role in place, you can now join a Windows instance to your domain via the EC2 launch wizard. For a detailed explanation about how to do this, see Joining a Domain Using the Amazon EC2 Launch Wizard.

If you’re instantiating a new instance from the API, however, you will need to create an SSM configuration document and upload it to the SSM service beforehand. We’ll step through that process next.

Note: The instance will require internet access to communicate with the SSM service.

Figure 7: Configure instance details

When you create a new Windows instance from the EC2 launch wizard as shown in Figure 7, the wizard automatically creates the SSM configuration document from the information stored in AD Connector. Presently, the EC2 launch wizard doesn’t allow you to specify which organizational unit (OU) you want to deploy the member server into.

Step 3: Create an SSM document (for seamlessly joining a server to the domain through the AWS API)

If you want to provision new Windows instances from the AWS CLI or API or you want to specify the target OU for your instances, you will need to create an SSM configuration document. The configuration document is a JSON file that contains various parameters used to configure your instances. The following code example is a configuration document for joining a domain.

{
	"schemaVersion": "1.0",
	"description": "Sample configuration to join an instance to a domain",
	"runtimeConfig": {
	   "aws:domainJoin": {
	       "properties": {
	          "directoryId": "d-1234567890",
	          "directoryName": "test.example.com",
	          "directoryOU": "OU=test,DC=example,DC=com",
	          "dnsIpAddresses": [
	             "198.51.100.1",
	             "198.51.100.2"
	          ]
	       }
	   }
	}
}

In this configuration document:

  • directoryId is the ID for the AD Connector you created earlier.
  • directoryName is the name of the domain (for example, test.example.com).
  • directoryOU is the OU for the domain.
  • dnsIpAddresses are the IP addresses for the DNS servers you specified when you created the AD Connector.

For additional information, see aws:domainJoin. When you’re finished creating the file, save it as a JSON file.

Note: The name of the file has to be at least 1 character and at most 64 characters in length.

Step 4: Upload the configuration document to SSM

This step requires that the user have permission to use SSM to configure an instance. If you don’t have a policy that includes these rights, create a new policy by using the following JSON, and assign it to an IAM user or group.

{
   "Version": "2012-10-17",
   "Statement": [
     {
       "Effect": "Allow",
       "Action": "ssm:*",
       "Resource": "*"
     }
   ]
}

After you’ve signed in as a user that has the SSM IAM policy you created, run the following command from the AWS CLI.

aws ssm create-document --content file://path/to/myconfigfile.json --name "My_Custom_Config_File"

Note: On Linux/Mac systems, you need to add a “/” at the beginning of the path (for example, file:///Users/username/temp).

This command uploads the configuration document you created to the SSM service, allowing you to reference it when creating a new Windows instance from either the AWS CLI or the EC2 launch wizard.
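
For example, one way to apply the uploaded document to a specific running instance is to create an SSM association for it. The following is a sketch only; the document name matches the example above, and the instance ID is a placeholder.

aws ssm create-association \
  --name "My_Custom_Config_File" \
  --targets "Key=InstanceIds,Values=i-0123456789abcdef0"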

Conclusion

This blog post has shown you how you can simplify account management by federating with your Active Directory for AWS Management Console access. The post also explored how you can enable hybrid IT by using AD Connector to seamlessly join Windows instances to your Active Directory domain. Armed with this information you can create a trust between your Active Directory and AWS. In addition, you now have a quick and simple way to enable single sign-on without needing to replicate identities or deploy additional infrastructure on premises.

We’d love to hear more about how you are using Directory Service, and welcome any feedback about how we can improve the experience. You can post comments below, or visit the Directory Service forum to post comments and questions.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a new thread on AWS re:Post for AWS Directory Service or contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Jeremy Cowan

Jeremy is a Specialist Solutions Architect for containers at AWS, although his family thinks he sells “cloud space”. Prior to joining AWS, Jeremy worked for several large software vendors, including VMware, Microsoft, and IBM. When he’s not working, you can usually find him on a trail in the wilderness, far away from technology.

Bright Dike

Bright is a Solutions Architect with Amazon Web Services. He works with AWS customers and partners to provide guidance assessing and improving their security posture, as well as executing automated remediation techniques. His domains are threat detection, incident response, and AWS Security Hub.

David Selberg

David is an Enterprise Solutions Architect at AWS who is passionate about helping customers build Well-Architected solutions on the AWS cloud. With a background in cybersecurity, David loves to dive deep on security topics when he’s not creating technical content like the “All Things AWS” Twitch series.

Abhra Sinha

Abhra is a Toronto-based Enterprise Solutions Architect at AWS. Abhra enjoys being a trusted advisor to customers, working closely with them to solve their technical challenges and help build a secure scalable architecture on AWS. In his spare time, he enjoys photography and exploring new restaurants.

How to Receive Alerts When Your IAM Configuration Changes

Post Syndicated from Dylan Souvage original https://aws.amazon.com/blogs/security/how-to-receive-alerts-when-your-iam-configuration-changes/

July 27, 2023: This post was originally published February 5, 2015, and received a major update July 31, 2023.


As an Amazon Web Services (AWS) administrator, it’s crucial for you to implement robust protective controls to maintain your security configuration. Employing a detective control mechanism to monitor changes to the configuration serves as an additional safeguard in case the primary protective controls fail. Although some changes are expected, you might want to review unexpected changes or changes made by a privileged user. AWS Identity and Access Management (IAM) is a service that primarily helps manage access to AWS services and resources securely. It does provide detailed logs of its activity, but it doesn’t inherently provide real-time alerts or notifications. Fortunately, you can use a combination of AWS CloudTrail, Amazon EventBridge, and Amazon Simple Notification Service (Amazon SNS) to alert you when changes are made to your IAM configuration. In this blog post, we walk you through how to set up EventBridge to initiate SNS notifications for IAM configuration changes. You can also have SNS push messages directly to ticketing or tracking services, such as Jira, Service Now, or your preferred method of receiving notifications, but that is not discussed here.

In any AWS environment, many activities can take place at every moment. CloudTrail records IAM activities, EventBridge filters and routes event data, and Amazon SNS provides notification functionality. This post will guide you through identifying and setting alerts for IAM changes, modifications in authentication and authorization configurations, and more. The power is in your hands to make sure you’re notified of the events you deem most critical to your environment. Here’s a quick overview of how you can invoke a response, shown in Figure 1.

Figure 1: Simple architecture diagram of actors and resources in your account and the process for sending notifications through IAM, CloudTrail, EventBridge, and SNS.

Log IAM changes with CloudTrail

Before we dive into implementation, let’s briefly understand the function of AWS CloudTrail. It records and logs activity within your AWS environment, tracking actions such as IAM role creation, deletion, or modification, thereby offering an audit trail of changes.

With this in mind, we’ll discuss the first step in tracking IAM changes: establishing a log for each modification. In this section, we’ll guide you through using CloudTrail to create these pivotal logs.

For an in-depth understanding of CloudTrail, refer to the AWS CloudTrail User Guide.

In this post, you’re going to start by creating a CloudTrail trail with the Management events type selected, and read and write API activity selected. If you already have a CloudTrail trail set up with those attributes, you can use that CloudTrail trail instead.

To create a CloudTrail log

  1. Open the AWS Management Console and select CloudTrail, and then choose Dashboard.
  2. In the CloudTrail dashboard, choose Create Trail.
    Figure 2: Use the CloudTrail dashboard to create a trail

  3. In the Trail name field, enter a display name for your trail and then select Create a new S3 bucket. Leave the default settings for the remaining trail attributes.
    Figure 3: Set the trail name and storage location

  4. Under Event type, select Management events. Under API activity, select Read and Write.
  5. Choose Next.
    Figure 4: Choose which events to log
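
If you prefer to script the trail creation, an equivalent AWS CLI sketch follows. The trail and bucket names are placeholders, the S3 bucket must already exist with a bucket policy that allows CloudTrail to write to it, and a new trail logs read and write management events by default.

aws cloudtrail create-trail \
  --name iam-activity-trail \
  --s3-bucket-name amzn-s3-demo-cloudtrail-bucket

aws cloudtrail start-logging --name iam-activity-trail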

Set up notifications with Amazon SNS

Amazon SNS is a managed service that provides message delivery from publishers to subscribers. It works by allowing publishers to communicate asynchronously with subscribers by sending messages to a topic, which is a logical access point and communication channel. Subscribers can receive these messages using supported endpoint types, including email, which you will use in the blog example today.

For further reading on Amazon SNS, refer to the Amazon SNS Developer Guide.

Now that you’ve set up CloudTrail to log IAM changes, the next step is to establish a mechanism to notify you about these changes in real time.

To set up notifications

  1. Open the Amazon SNS console and choose Topics.
  2. Create a new topic. Under Type, select Standard and enter a name for your topic. Keep the defaults for the rest of the options, and then choose Create topic.
    Figure 5: Select Standard as the topic type

  3. Navigate to your topic in the topic dashboard, choose the Subscriptions tab, and then choose Create subscription.
    Figure 6: Choose Create subscription

  4. For Topic ARN, select the topic you created previously, then under Protocol, select Email and enter the email address you want the alerts to be sent to.
    Figure 7: Select the topic ARN and add an endpoint to send notifications to

  5. After your subscription is created, go to the mailbox you designated to receive notifications and check for a verification email from the service. Open the email and select Confirm subscription to verify the email address and complete setup.
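
The same topic and subscription can be created from the AWS CLI, as in the following sketch; the topic name, account ID, and email address are placeholders, and the subscription still has to be confirmed from the verification email.

aws sns create-topic --name iam-change-alerts

aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:111122223333:iam-change-alerts \
  --protocol email \
  --notification-endpoint alerts@example.com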

Initiate events with EventBridge

Amazon EventBridge is a serverless service that uses events to connect application components. EventBridge receives an event (an indicator of a change in environment) and applies a rule to route the event to a target. Rules match events to targets based on either the structure of the event, called an event pattern, or on a schedule.

Events that come to EventBridge are associated with an event bus. Rules are tied to a single event bus, so they can only be applied to events on that event bus. Your account has a default event bus that receives events from AWS services, and you can create custom event buses to send or receive events from a different account or AWS Region.

For a more comprehensive understanding of EventBridge, refer to the Amazon EventBridge User Guide.

In this part of our post, you’ll use EventBridge to devise a rule for initiating SNS notifications based on IAM configuration changes.

To create an EventBridge rule

  1. Go to the EventBridge console and select EventBridge Rule, and then choose Create rule.
    Figure 8: Use the EventBridge console to create a rule

  2. Enter a name for your rule, keep the defaults for the rest of rule details, and then choose Next.
    Figure 9: Rule detail screen

  3. Under Target 1, select AWS service.
  4. In the dropdown list for Select a target, select SNS topic, select the topic you created previously, and then choose Next.
    Figure 10: Target with target type of AWS service and target topic of SNS topic selected

  5. Under Event source, select AWS events or EventBridge partner events.
    Figure 11: Event pattern with AWS events or EventBridge partner events selected

  6. Under Event pattern, verify that you have the following selected.
    1. For Event source, select AWS services.
    2. For AWS service, select IAM.
    3. For Event type, select AWS API Call via CloudTrail.
    4. Select the radio button for Any operation.
    Figure 12: Event pattern details selected
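
With those selections, the rule's underlying event pattern should look roughly like the following JSON, which you can confirm in the rule's JSON view:

{
  "source": ["aws.iam"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["iam.amazonaws.com"]
  }
}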

Now that you’ve set up EventBridge to monitor IAM changes, test it by creating a new user or adding a new policy to an IAM role and see if you receive an email notification.

Centralize EventBridge alerts by using cross-account alerts

If you have multiple accounts, you should consider using AWS Organizations. (For a deep dive into best practices for using AWS Organizations, we recommend reading this AWS blog post.)

By standardizing the implementation to channel alerts from across accounts to a primary AWS notification account, you can use a multi-account EventBridge architecture. This allows aggregation of notifications across your accounts through sender and receiver accounts. Figure 13 shows how this works. Separate member accounts within an AWS organizational unit (OU) have the same mechanism for monitoring changes and sending notifications as discussed earlier, but send notifications through an EventBridge instance in another account.

Figure 13: Multi-account EventBridge architecture aggregating notifications between two AWS member accounts to a primary management account

You can read more and see the implementation and deep dive of the multi-account EventBridge solution on the AWS samples GitHub, and you can also read more about sending and receiving Amazon EventBridge notifications between accounts.

Monitor calls to IAM

In this blog post example, you monitor calls to IAM.

The filter pattern you selected while setting up EventBridge matches CloudTrail events for calls to the IAM service. Calls to IAM have a CloudTrail eventSource of iam.amazonaws.com, so IAM API calls will match this pattern. You will find this simple default filter pattern useful if you have minimal IAM activity in your account or to test this example. However, as your account activity grows, you’ll likely receive more notifications than you need. This is when filtering only the relevant events becomes essential to prioritize your responses. Effectively managing your filter preferences allows you to focus on events of significance and maintain control as your AWS environment grows.

Monitor changes to IAM

If you’re interested only in changes to your IAM configuration, you can modify the event pattern you used to set up IAM notifications in EventBridge by adding an eventName filter pattern, shown following.

"eventName": [
      "Add*",
      "Attach*",
      "Change*",
      "Create*",
      "Deactivate*",
      "Delete*",
      "Detach*",
      "Enable*",
      "Put*",
      "Remove*",
      "Set*",
      "Update*",
      "Upload*"
    ]

This filter pattern will only match events from the IAM service that begin with Add, Attach, Change, Create, Deactivate, Delete, Detach, Enable, Put, Remove, Set, Update, or Upload. For more information about APIs matching these patterns, see the IAM API Reference.
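
Assembled with the base pattern from the previous section, the complete event pattern described in this post would look like the following sketch:

{
  "source": ["aws.iam"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["iam.amazonaws.com"],
    "eventName": [
      "Add*",
      "Attach*",
      "Change*",
      "Create*",
      "Deactivate*",
      "Delete*",
      "Detach*",
      "Enable*",
      "Put*",
      "Remove*",
      "Set*",
      "Update*",
      "Upload*"
    ]
  }
}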

To edit the filter pattern to monitor only changes to IAM

  1. Open the EventBridge console, navigate to the Event pattern, and choose Edit pattern.
    Figure 14: Modifying the event pattern

  2. Add the eventName filter pattern from above to your event pattern.
    Figure 15: Use the JSON editor to add the eventName filter pattern

Monitor changes to authentication and authorization configuration

Monitoring changes to authentication (security credentials) and authorization (policy) configurations is critical, because it can alert you to potential security vulnerabilities or breaches. For instance, unauthorized changes to security credentials or policies could indicate malicious activity, such as an attempt to gain unauthorized access to your AWS resources. If you’re only interested in these types of changes, use the preceding steps to implement the following filter pattern.

    "eventName": [
      "Put*Policy",
      "Attach*",
      "Detach*",
      "Create*",
      "Update*",
      "Upload*",
      "Delete*",
      "Remove*",
      "Set*"
    ]

This filter pattern matches calls to IAM that modify policy or create, update, upload, and delete IAM elements.

Conclusion

Monitoring IAM security configuration changes allows you another layer of defense against the unexpected. Balancing productivity and security, you might grant a user broad permissions in order to facilitate their work, such as exploring new AWS services. Although preventive measures are crucial, they can potentially restrict necessary actions. For example, a developer may need to modify an IAM role for their task, an alteration that could pose a security risk. This change, while essential for their work, may be undesirable from a security standpoint. Thus, it’s critical to have monitoring systems alongside preventive measures, allowing necessary actions while maintaining security.

Create an event rule for IAM events that are important to you and have a response plan ready. You can refer to Security best practices in IAM for further reading on this topic.

If you have questions or feedback about this or any other IAM topic, please visit the IAM re:Post forum. You can also read about the multi-account EventBridge solution on the AWS samples GitHub and learn more about sending and receiving Amazon EventBridge notifications between accounts.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Dylan Souvage

Dylan is a Solutions Architect based in Toronto, Canada. Dylan loves working with customers to understand their business and enable them in their cloud journey. In his spare time, he enjoys martial arts, sports, anime, and traveling to warm, sunny places to spend time with his friends and family.

Abhra Sinha

Abhra is a Toronto-based Enterprise Solutions Architect at AWS. Abhra enjoys being a trusted advisor to customers, working closely with them to solve their technical challenges and help build a secure, scalable architecture on AWS. In his spare time, he enjoys photography and exploring new restaurants.

Deploy container applications in a multicloud environment using Amazon CodeCatalyst

Post Syndicated from Pawan Shrivastava original https://aws.amazon.com/blogs/devops/deploy-container-applications-in-a-multicloud-environment-using-amazon-codecatalyst/

In the previous post of this blog series, we saw how organizations can deploy workloads to virtual machines (VMs) in a hybrid and multicloud environment. This post shows how organizations can address the requirement of deploying containers, and containerized applications to hybrid and multicloud platforms using Amazon CodeCatalyst. CodeCatalyst is an integrated DevOps service which enables development teams to collaborate on code, and build, test, and deploy applications with continuous integration and continuous delivery (CI/CD) tools.

One prominent scenario where multicloud container deployment is useful is when organizations want to leverage AWS’ broadest and deepest set of Artificial Intelligence (AI) and Machine Learning (ML) capabilities by developing and training AI/ML models in AWS using Amazon SageMaker, and deploying the model package to a Kubernetes platform on other cloud platforms, such as Azure Kubernetes Service (AKS) for inference. As shown in this workshop for operationalizing the machine learning pipeline, we can train an AI/ML model, push it to Amazon Elastic Container Registry (ECR) as an image, and later deploy the model as a container application.

Scenario description

The solution described in the post covers the following steps:

  • Set up the Amazon CodeCatalyst environment.
  • Create a Dockerfile along with a manifest for the application, and a repository in Amazon ECR.
  • Create an Azure service principal which has permissions to deploy resources to Azure Kubernetes Service (AKS), and store the credentials securely in Amazon CodeCatalyst secret.
  • Create a CodeCatalyst workflow to build, test, and deploy the containerized application to the AKS cluster using GitHub Actions.

The architecture diagram for the scenario is shown in Figure 1.

Figure 1 – Solution Architecture

Solution Walkthrough

This section shows how to set up the environment, and deploy an HTML application to an AKS cluster.

Set up Amazon ECR and the GitHub code repository

Create a new Amazon ECR repository and a code repository. In this case we’re using GitHub as the repository, but you can create a source repository in CodeCatalyst or you can choose to link an existing source repository hosted by another service if that service is supported by an installed extension. Then follow the application and Docker image creation steps outlined in Step 1 in the environment creation process in exposing Multiple Applications on Amazon EKS. Create a file named manifest.yaml as shown, and map the “image” parameter to the URL of the Amazon ECR repository created above.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: multicloud-container-deployment-app
  labels:
    app: multicloud-container-deployment-app
spec:
  selector:
    matchLabels:
      app: multicloud-container-deployment-app
  replicas: 2
  template:
    metadata:
      labels:
        app: multicloud-container-deployment-app
    spec:
      nodeSelector:
        "beta.kubernetes.io/os": linux
      containers:
      - name: ecs-web-page-container
        image: <aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/<my_repository>
        imagePullPolicy: Always
        ports:
            - containerPort: 80
        resources:
          limits:
            memory: "100Mi"
            cpu: "200m"
      imagePullSecrets:
          - name: ecrsecret
---
apiVersion: v1
kind: Service
metadata:
  name: multicloud-container-deployment-service
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: multicloud-container-deployment-app

Push the files to the GitHub code repository. The multicloud-container-app GitHub repository should look similar to Figure 2 below.

Figure 2 – Files in Github repository

Configure Azure Kubernetes Service (AKS) cluster to pull private images from ECR repository

To allow your AKS cluster to pull Docker images from the private ECR repository, run the following command; this setup is required by the azure/k8s-deploy GitHub Action in the CI/CD workflow. The command authenticates to the Amazon ECR registry with aws ecr get-login-password. Run it in a shell where the AWS CLI is configured and which is connected to the AKS cluster. It creates a secret called ecrsecret, which is used to pull an image from the private ECR repository.

kubectl create secret docker-registry ecrsecret \
 --docker-server=<aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/<my_repository> \
 --docker-username=AWS \
 --docker-password=$(aws ecr get-login-password --region us-west-2)

Provide the ECR URI in the --docker-server parameter.

CodeCatalyst setup

Follow these steps to set up CodeCatalyst environment:

Configure access to the AKS cluster

In this solution, we use three GitHub Actions (azure/login, azure/aks-set-context, and azure/k8s-deploy) to log in to Azure, set the AKS cluster context, and deploy the manifest file to the AKS cluster, respectively. For the GitHub Actions to access the Azure environment, they require credentials associated with an Azure Service Principal.

Service Principals in Azure are identified by the CLIENT_ID, CLIENT_SECRET, SUBSCRIPTION_ID, and TENANT_ID properties. Create the Service Principal by running the following command in the Azure Cloud Shell:

az ad sp create-for-rbac \
    --name "ghActionHTMLapplication" \
    --scope /subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP> \
    --role Contributor \
    --sdk-auth

The command generates a JSON output (shown in Figure 3), which is stored in a CodeCatalyst secret called AZURE_CREDENTIALS. This credential is used by the azure/login GitHub Action.

Figure 3 – JSON output
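
For reference, the --sdk-auth output is a JSON object along the following lines; the values are redacted here, and the real output also includes several additional endpoint fields that are omitted.

{
  "clientId": "<CLIENT_ID>",
  "clientSecret": "<CLIENT_SECRET>",
  "subscriptionId": "<SUBSCRIPTION_ID>",
  "tenantId": "<TENANT_ID>"
}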

Configure secrets inside CodeCatalyst Project

Create three secrets, CLUSTER_NAME (name of the AKS cluster), RESOURCE_GROUP (name of the Azure resource group), and AZURE_CREDENTIALS (described in the previous step), as described in the working with secrets documentation. The secrets are shown in Figure 4.

Figure 4 – CodeCatalyst Secrets

CodeCatalyst CI/CD Workflow

To create a new CodeCatalyst workflow, select CI/CD from the navigation on the left and select Workflows (1). Then, select Create workflow (2), leave the default options, and select Create (3) as shown in Figure 5.

Figure 5 – Create CodeCatalyst CI/CD workflow

Add “Push to Amazon ECR” Action

Add the Push to Amazon ECR action, and configure the environment where you created the ECR repository as shown in Figure 6. Refer to adding an action to learn how to add a CodeCatalyst action.

Figure 6 – Create ‘Push to ECR’ Action

Select the Configuration tab and specify the configurations as shown in Figure 7.

Figure 7 – Configure ‘Push to ECR’ Action

Configure the Deploy action

1. Add a GitHub action for deploying to AKS as shown in Figure 8.

Figure 8 – Github action to deploy to AKS

2. Configure the GitHub action from the configurations tab by adding the following snippet to the GitHub Actions YAML property:

- name: Install Azure CLI
  run: pip install azure-cli
- name: Azure login
  id: login
  uses: azure/login@v1
  with:
    creds: ${Secrets.AZURE_CREDENTIALS}
- name: Set AKS context
  id: set-context
  uses: azure/aks-set-context@v3
  with:
    resource-group: ${Secrets.RESOURCE_GROUP}
    cluster-name: ${Secrets.CLUSTER_NAME}
- name: Setup kubectl
  id: install-kubectl
  uses: azure/setup-kubectl@v3
- name: Deploy to AKS
  id: deploy-aks
  uses: Azure/k8s-deploy@v4
  with:
    namespace: default
    manifests: manifest.yaml
    pull-images: true

Figure 9 – Github action configuration

3. The workflow is now ready and can be validated by choosing ‘Validate’ and then saved to the repository by choosing ‘Commit’.
We have implemented an automated CI/CD workflow that builds the container image of the application (refer to Figure 10), pushes the image to ECR, and deploys the application to the AKS cluster. This CI/CD workflow is triggered when application code is pushed to the repository.

Figure 10 – Automated CI/CD workflow

Test the deployment

When the HTML application runs, Kubernetes exposes the application using a public facing load balancer. To find the external IP of the load balancer, connect to the AKS cluster and run the following command:

kubectl get service multicloud-container-deployment-service

The output of the above command should look like the image in Figure 11.

Figure 11 – Output of kubectl get service

Paste the External IP into a browser to see the running HTML application as shown in Figure 12.

Figure 12 – Application running in AKS

Cleanup

If you have been following along with the workflow described in the post, delete the resources you deployed so you do not continue to incur charges. First, delete the Amazon ECR repository using the AWS console. Second, delete the project from CodeCatalyst by navigating to Project settings and choosing Delete project; because there’s no cost associated with the CodeCatalyst project itself, you can also choose to keep using it. Finally, if you deployed the application on a new AKS cluster, delete the cluster from the Azure console. If you deployed the application to an existing AKS cluster, run the following commands to delete the application resources.

kubectl delete deployment multicloud-container-deployment-app
kubectl delete services multicloud-container-deployment-service

Conclusion

In summary, this post showed how Amazon CodeCatalyst can help organizations deploy containerized workloads in a hybrid and multicloud environment. It demonstrated in detail how to set up and configure Amazon CodeCatalyst to deploy a containerized application to Azure Kubernetes Service using a CodeCatalyst workflow and GitHub Actions. Learn more and get started with your Amazon CodeCatalyst journey!

If you have any questions or feedback, leave them in the comments section.

About the Authors

Pawan Shrivastava

Pawan Shrivastava is a Partner Solution Architect at AWS in the WWPS team. He focuses on working with partners to provide technical guidance on AWS, collaborating with them to understand their technical requirements, and designing solutions to meet their specific needs. Pawan is passionate about DevOps, automation, and CI/CD pipelines. He enjoys watching MMA, playing cricket, and working out in the gym.

Brent Van Wynsberge

Brent Van Wynsberge is a Solutions Architect at AWS supporting enterprise customers. He accelerates the cloud adoption journey for organizations by aligning technical objectives to business outcomes and strategic goals, and defining them where needed. Brent is an IoT enthusiast, specifically in the application of IoT in manufacturing; he is also interested in DevOps, data analytics, and containers.

Amandeep Bajwa

Amandeep Bajwa is a Senior Solutions Architect at AWS supporting Financial Services enterprises. He helps organizations achieve their business outcomes by identifying the appropriate cloud transformation strategy based on industry trends, and organizational priorities. Some of the areas Amandeep consults on are cloud migration, cloud strategy (including hybrid & multicloud), digital transformation, data & analytics, and technology in general.

Brian Beach

Brian Beach has over 20 years of experience as a Developer and Architect. He is currently a Principal Solutions Architect at Amazon Web Services. He holds a Computer Engineering degree from NYU Poly and an MBA from Rutgers Business School. He is the author of “Pro PowerShell for Amazon Web Services” from Apress. He is a regular author and has spoken at numerous events. Brian lives in North Carolina with his wife and three kids.

Automate secure access to Amazon MWAA environments using existing OpenID Connect single-sign-on authentication and authorization

Post Syndicated from Ajay Vohra original https://aws.amazon.com/blogs/big-data/automate-secure-access-to-amazon-mwaa-environments-using-existing-openid-connect-single-sign-on-authentication-and-authorization/

Customers use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run Apache Airflow at scale in the cloud. They want to use their existing login solutions developed using OpenID Connect (OIDC) providers with Amazon MWAA; this allows them to provide a uniform authentication and single sign-on (SSO) experience using their adopted identity providers (IdP) across AWS services. For ease of use for end-users of Amazon MWAA, organizations configure a custom domain endpoint to their Apache Airflow UI endpoint. For teams operating and managing multiple Amazon MWAA environments, securing and customizing each environment is a repetitive but necessary task. Automation through infrastructure as code (IaC) can alleviate this heavy lifting to achieve consistency at scale.

This post describes how you can integrate your organization’s existing OIDC-based IdPs with Amazon MWAA to grant secure access to your existing Amazon MWAA environments. Furthermore, you can use the solution to provision new Amazon MWAA environments with the built-in OIDC-based IdP integrations. This approach allows you to securely provide access to your new or existing Amazon MWAA environments without requiring AWS credentials for end-users.

Overview of Amazon MWAA environments

Managing multiple user names and passwords can be difficult—this is where SSO authentication and authorization comes in. OIDC is a widely used standard for SSO, and it’s possible to use OIDC SSO authentication and authorization to access Apache Airflow UI across multiple Amazon MWAA environments.

When you provision an Amazon MWAA environment, you can choose public or private Apache Airflow UI access mode. Private access mode is typically used by customers that require restricting access from only within their virtual private cloud (VPC). When you use public access mode, the access to the Apache Airflow UI is available from the internet, in the same way as an AWS Management Console page. Internet access is needed when access is required outside of a corporate network.

Regardless of the access mode, authorization to the Apache Airflow UI in Amazon MWAA is integrated with AWS Identity and Access Management (IAM). All requests made to the Apache Airflow UI need to have valid AWS session credentials with an assumed IAM role that has permissions to access the corresponding Apache Airflow environment. For more details on the permissions policies needed to access the Apache Airflow UI, refer to Apache Airflow UI access policy: AmazonMWAAWebServerAccess.

Different user personas such as developers, data scientists, system operators, or architects in your organization may need access to the Apache Airflow UI. In some organizations, not all employees have access to the AWS console. It’s fairly common that employees who don’t have AWS credentials may also need access to the Apache Airflow UI that Amazon MWAA exposes.

In addition, many organizations have multiple Amazon MWAA environments. It’s common to have an Amazon MWAA environment setup per application or team. Each of these Amazon MWAA environments can be run in different deployment environments like development, staging, and production. For large organizations, you can easily envision a scenario where there is a need to manage multiple Amazon MWAA environments. Organizations need to provide secure access to all of their Amazon MWAA environments using their existing OIDC provider.

Solution Overview

The solution architecture integrates an existing OIDC provider to provide authentication for accessing the Amazon MWAA Apache Airflow UI. This allows users to log in to the Apache Airflow UI using their OIDC credentials. From a system perspective, this means that Amazon MWAA can integrate with an existing OIDC provider rather than having to create and manage an isolated user authentication and authorization through IAM internally.

The solution architecture relies on an Application Load Balancer (ALB) setup with a fully qualified domain name (FQDN) with public (internet) or private access. This ALB provides SSO access to multiple Amazon MWAA environments. The user-agent (web browser) call flow for accessing an Apache Airflow UI console to the target Amazon MWAA environment includes the following steps:

  1. The user-agent resolves the ALB domain name from the Domain Name System (DNS) resolver.
  2. The user-agent sends a login request to the ALB path /aws_mwaa/aws-console-sso with a set of query parameters populated. The request uses the required parameters mwaa_env and rbac_role as placeholders for the target Amazon MWAA environment and the Apache Airflow role-based access control (RBAC) role, respectively.
  3. Once it receives the request, the ALB redirects the user-agent to the OIDC IdP authentication endpoint. The user-agent authenticates with the OIDC IdP with the existing user name and password.
  4. If user authentication is successful, the OIDC IdP redirects the user-agent back to the configured ALB with a redirect_url with the authorization code included in the URL.
  5. The ALB uses the authorization code received to obtain the access_token and OpenID JWT token with openid email scope from the OIDC IdP. It then forwards the login request to the Amazon MWAA authenticator AWS Lambda function with the JWT token included in the request header in the x-amzn-oidc-data parameter.
  6. The Lambda function verifies the JWT token found in the request header using ALB public keys. The function subsequently authorizes the authenticated user for the requested mwaa_env and rbac_role stored in an Amazon DynamoDB table. The use of DynamoDB for authorization here is optional; the Lambda code function is_allowed can be customized to use other authorization mechanisms.
  7. The Amazon MWAA authenticator Lambda function redirects the user-agent to the Apache Airflow UI console in the requested Amazon MWAA environment with the login token in the redirect URL. Additionally, the function provides the logout functionality.

Amazon MWAA public network access mode

For the Amazon MWAA environments configured with public access mode, the user agent uses public routing over the internet to connect to the ALB hosted in a public subnet.

The following diagram illustrates the solution architecture with a numbered call flow sequence for internet network reachability.

Amazon MWAA public network access mode architecture diagram

Amazon MWAA private network access mode

For Amazon MWAA environments configured with private access mode, the user agent uses private routing over a dedicated AWS Direct Connect or AWS Client VPN to connect to the ALB hosted in a private subnet.

The following diagram shows the solution architecture for Client VPN network reachability.

Amazon MWAA private network access mode architecture diagram

Automation through infrastructure as code

To make setting up this solution easier, we have released a pre-built solution that automates the tasks involved. The solution has been built using the AWS Cloud Development Kit (AWS CDK) using the Python programming language. The solution is available in our GitHub repository and helps you achieve the following:

  • Set up a secure ALB to provide OIDC-based SSO to your existing Amazon MWAA environment with default Apache Airflow Admin role-based access.
  • Create new Amazon MWAA environments along with an ALB and an authenticator Lambda function that provides OIDC-based SSO support. With the customization provided, you can define the number of Amazon MWAA environments to create. Additionally, you can customize the type of Amazon MWAA environments created, including defining the hosting VPC configuration, environment name, Apache Airflow UI access mode, environment class, auto scaling, and logging configurations.

The solution offers a number of customization options, which can be specified in the cdk.context.json file. Follow the setup instructions to complete the integration to your existing Amazon MWAA environments or create new Amazon MWAA environments with SSO enabled. The setup process creates an ALB with an HTTPS listener that provides the user access endpoint. You have the option to define the type of ALB that you need. You can define whether your ALB will be public facing (internet accessible) or private facing (only accessible within the VPC). It is recommended to use a private ALB with your new or existing Amazon MWAA environments configured using private UI access mode.

The following sections describe the specific implementation steps and customization options for each use case.

Prerequisites

Before you continue with the installation steps, make sure you have completed all prerequisites and run the setup-venv script as outlined within the README.md file of the GitHub repository.

Integrate to a single existing Amazon MWAA environment

If you’re integrating with a single existing Amazon MWAA environment, follow the guides in the Quick start section. You must specify the same ALB VPC as that of your existing Amazon MWAA VPC. You can specify the default Apache Airflow RBAC role that all users will assume. The ALB with an HTTPS listener is configured within your existing Amazon MWAA VPC.

Integrate to multiple existing Amazon MWAA environments

To connect to multiple existing Amazon MWAA environments, specify only the Amazon MWAA environment name in the JSON file. The setup process will create a new VPC with subnets hosting the ALB and the listener. You must define the CIDR range for this ALB VPC such that it doesn’t overlap with the VPC CIDR range of your existing Amazon MWAA VPCs.

When the setup steps are complete, implement the post-deployment configuration steps. This includes adding the ALB CNAME record to the Amazon Route 53 DNS domain.

For integrating with Amazon MWAA environments configured using private access mode, there are additional steps that need to be configured. These include configuring VPC peering and subnet routes between the new ALB VPC and the existing Amazon MWAA VPC. Additionally, you need to configure network connectivity from your user-agent to the private ALB endpoint resolved by your DNS domain.

Create new Amazon MWAA environments

You can configure the new Amazon MWAA environments you want to provision through this solution. The cdk.context.json file defines a dictionary entry in the MwaaEnvironments array. Configure the details that you need for each of the Amazon MWAA environments. The setup process creates an ALB VPC, ALB with an HTTPS listener, Lambda authorizer function, DynamoDB table, and respective Amazon MWAA VPCs and Amazon MWAA environments in them. Furthermore, it creates the VPC peering connection between the ALB VPC and the Amazon MWAA VPC.

If you want to create Amazon MWAA environments with private access mode, the ALB VPC CIDR range specified must not overlap with the Amazon MWAA VPC CIDR range. This is required for the automatic peering connection to succeed. It can take 20–30 minutes for each Amazon MWAA environment to finish creating.

When the environment creation processes are complete, run the post-deployment configuration steps. One of the steps here is to add authorization records to the created DynamoDB table for your users. You need to define the Apache Airflow rbac_role for each of your end-users, which the Lambda authorizer function matches to provide the requisite access.
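
As a sketch of what such a record could look like, the following AWS CLI call writes one item. The table name and the attribute names (email, mwaa_env, rbac_role) are illustrative only and must match what the deployed authorizer Lambda function actually expects.

aws dynamodb put-item \
  --table-name mwaa-sso-authorization \
  --item '{
    "email": {"S": "jane.doe@example.com"},
    "mwaa_env": {"S": "Env1"},
    "rbac_role": {"S": "Admin"}
  }'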

Verify access

Once you’ve completed the post-deployment steps, you can log in to the URL using your ALB FQDN. For example, if your ALB FQDN is alb-sso-mwaa.example.com, you can log in to your target Amazon MWAA environment, named Env1, assuming a specific Apache Airflow RBAC role (such as Admin), using the following URL: https://alb-sso-mwaa.example.com/aws_mwaa/aws-console-sso?mwaa_env=Env1&rbac_role=Admin. For the Amazon MWAA environments that this solution created, you need to have appropriate Apache Airflow rbac_role entries in your DynamoDB table.

The solution also provides a logout feature. To log out from an Apache Airflow console, use the normal Apache Airflow console logout. To log out from the ALB, you can, for example, use the URL https://alb-sso-mwaa.example.com/logout.

Clean up

Follow the readme documented steps in the section Destroy CDK stacks in the GitHub repo, which shows how to clean up the artifacts created via the AWS CDK deployments. Remember to revert any manual configurations, like VPC peering connections, that you might have made after the deployments.

Conclusion

This post provided a solution to integrate your organization’s OIDC-based IdPs with Amazon MWAA to grant secure access to multiple Amazon MWAA environments. We walked through the solution that solves this problem using infrastructure as code. This solution allows different end-user personas in your organization to access the Amazon MWAA Apache Airflow UI using OIDC SSO.

To use the solution for your own environments, refer to Application load balancer single-sign-on for Amazon MWAA. For additional code examples on Amazon MWAA, refer to Amazon MWAA code examples.


About the Authors

Ajay Vohra is a Principal Prototyping Architect specializing in perception machine learning for autonomous vehicle development. Prior to Amazon, Ajay worked in the area of massively parallel grid-computing for financial risk modeling.

Jaswanth Kumar is a customer-obsessed Cloud Application Architect at AWS in NY. Jaswanth excels in application refactoring and migration, with expertise in containers and serverless solutions, coupled with a Master’s degree in Applied Computer Science.

Aneel Murari is a Sr. Serverless Specialist Solution Architect at AWS based in the Washington, D.C. area. He has over 18 years of software development and architecture experience and holds a graduate degree in Computer Science. Aneel helps AWS customers orchestrate their workflows on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) in a secure, cost-effective, and performance-optimized manner.

Parnab Basak is a Solutions Architect and a Serverless Specialist at AWS. He specializes in creating new solutions that are cloud native using modern software development practices like serverless, DevOps, and analytics. Parnab works closely in the analytics and integration services space helping customers adopt AWS services for their workflow orchestration needs.

Forward Zabbix Events to Event-Driven Ansible and Automate your Workflows

Post Syndicated from Aleksandr Kotsegubov original https://blog.zabbix.com/forward-zabbix-events-to-event-driven-ansible-and-automate-your-workflows/25893/

Zabbix is highly regarded for its ability to integrate with a variety of systems right out of the box. That list of systems has recently been expanded with the addition of Event-Driven Ansible. Bringing Zabbix and Event-Driven Ansible together lets you completely automate your IT processes, with Zabbix being the source of events and Ansible serving as the executor. This article will explore in detail how to send events from Zabbix to Event-Driven Ansible.

What is Event-Driven Ansible?

Currently available in developer preview, Event-Driven Ansible is an event-based automation solution that automatically matches each new event to the conditions you specified. This eliminates routine tasks and lets you spend your time on more important issues. And because it’s a fully automated system, it doesn’t get sick, take lunch breaks, or go on vacation – by working around the clock, it can speed up important IT processes.

Sending an event from Zabbix to Event-Driven Ansible

From the Zabbix side, the implementation is a media type that uses a webhook – a tool that’s already familiar to most users. This solution allows you to take advantage of the flexibility of setting up alerts from Zabbix using actions. This media type is delivered to Zabbix out of the box, and if your installation doesn’t have it, you can import it yourself from our integrations page.

On the Event-Driven Ansible side, the webhook plugin from the ansible.eda standard collection is used. If your system doesn’t have this collection, you can get it by running the following command:

ansible-galaxy collection install ansible.eda

Let’s look at the process of sending events in more detail with the diagram below.

From the Zabbix side:
  1. An event is created in Zabbix.

  2. The Zabbix server checks the created event according to the conditions in the actions. If all the conditions in an action configured to send an event to Event-Driven Ansible are met, the next step (running the operations configured in the action) is executed. 

  3. Sending through the “Event-Driven Ansible” media type is configured as an operation. The address specified in the service user's “Event-Driven Ansible” media entry is used as the destination.

  4. The media type script processes all the information about the event, generates a JSON payload, and sends it to Event-Driven Ansible (a simplified sketch of this step is shown after the next list).

From the Ansible side:
  1. An event sent from Zabbix arrives at the specified address and port. The webhook plugin listens on this port.

  2. After receiving an event, the ansible-rulebook starts checking its conditions in order to find a match between the received event and its set of rules.

  3. If the conditions for any of the rules match the incoming event, then the ansible-rulebook performs the specified action. It can be either a single command or a playbook launch.
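To make step 4 on the Zabbix side more concrete, here is a minimal sketch of what such a webhook media type script looks like in Zabbix's JavaScript engine. This is not the shipped “Event-Driven Ansible” media type (which handles more fields and error cases); the “/endpoint” path and the payload field names are assumptions chosen to match the JSON examples later in this article.

// Hedged sketch of a Zabbix webhook media type script (not the shipped one).
// The endpoint path and payload fields are assumptions for illustration only.
var params = JSON.parse(value);   // parameters resolved from macros in the media type configuration

var payload = {
    host: params.host_name,
    host_groups: params.host_groups,
    event_tags: params.event_tags,
    operation_data: params.operation_data
};

var request = new HttpRequest();
request.addHeader('Content-Type: application/json');

// params.endpoint would carry the "xxx.xxx.xxx.xxx:5001" value from the service user's media entry
var response = request.post('http://' + params.endpoint + '/endpoint', JSON.stringify(payload));

if (request.getStatus() !== 200) {
    throw 'Failed to send event to Event-Driven Ansible: ' + response;
}
return 'OK';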

Let’s look at the setup process from each side.

Sending events from Zabbix

Setting up sending alerts is described in detail on the Zabbix – Ansible integration page. Here are the basic steps:

  1. Import the media type of the required version if it is not present in your system.

  2. Create a service user. Select “Event-Driven Ansible” as the media and specify the address of your server and the port on which the webhook plugin will listen as the destination, in the format xxx.xxx.xxx.xxx:port. This article will use the value 5001 as the port. This value will be needed again when configuring the ansible-rulebook.

  3. Configure an action to send notifications. As an operation, specify sending via “Event-Driven Ansible.” Specify the service user created in the previous step as the recipient.

Receiving events in Event-Driven Ansible

First things first – you need to have an eda-server installed. You can find detailed installation and configuration instructions here.

After installing eda-server, you can create your first ansible-rulebook. To do this, create a file with the “yml” extension. Call it zabbix-test.yml and put the following code in it:

---
- name: Zabbix test rulebook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5001
  rules:
    - name: debug
      condition: event.payload is defined
      action:
        debug:

Ansible-rulebook, as you may have noticed, uses the YAML format. In this case, it has 4 parameters – name, hosts, sources, and rules.

Name and hosts parameters

The first 2 parameters are typical for Ansible users. The name parameter contains the name of the ansible-rulebook. The hosts parameter specifies which hosts the ansible-rulebook applies to. Hosts are usually listed in the inventory file. You can learn more about the inventory file in the Ansible documentation. The most interesting options are sources and rules, so let’s take a closer look at them.

Sources parameter

The sources parameter specifies the origin of events for the ansible-rulebook. In this case, the ansible.eda.webhook plugin is specified as the event source. This means that after the ansible-rulebook starts, the webhook plugin starts listening on the port to receive events. This also means that it needs 2 parameters to work:

  1. The “host” parameter – a value of 0.0.0.0 is used to receive events from all addresses.
  2. The “port” parameter – with 5001 as the value. The plugin will accept all incoming messages received on this particular port. The value of the port parameter must match the port you specified when creating the service user in Zabbix.

Rules parameter

The rules parameter contains a set of rules with conditions for matching with an incoming event. If the condition matches the received event, then the action specified in the actions section will be performed. Since this ansible-rulebook is only for reference, it is enough to specify only one rule. For simplicity, you can use event.payload is defined as a condition. This simple condition means that the rule will check for the presence of the “event.payload” field in the incoming event. When you specify debug in the action, ansible-rulebook will show you the full text of the received event. With debug you can also understand which fields will be passed in the event and set the conditions you need.

The name, hosts, and sources parameters only describe the rulebook and its event source. In our case, the webhook plugin will always be the event source, so these parameters will not change, and in all the following examples they are skipped. Only the value of the rules parameter will be shown.

To start your ansible-rulebook you can use the command:

ansible-rulebook --rulebook /path/to/your/rulebook/zabbix-test.yml --verbose

The line “Waiting for events” in the output indicates that the ansible-rulebook has successfully loaded and is ready to receive events.

Examples 

Ansible-rulebook provides a wide variety of opportunities for handling incoming events. We will look into some of the possible conditions and scenarios for using ansible-rulebook, but please remember that a more detailed list of all supported conditions and examples can be found on the official documentation page. For a general understanding of the principles of working with ansible-rulebook, please read the documentation.

Let’s see how to build conditions for precise event filtering in more detail with a few examples.

Example #1

You need to run a playbook to change the NGINX configuration at the Berlin office when you receive an event from Zabbix. The host is in three groups:

  1. Linux servers
  2. Web servers
  3. Berlin.

And it has 3 tags:

  1. target: nginx
  2. class: software
  3. component: configuration.

You can see all these parameters in the diagram below:

On the left side you can see a host with configured monitoring. To determine whether an event belongs to a given rule, you will work with two fields – host groups and tags. These parameters will be used to determine whether the event belongs to the required server and configuration. According to the diagram, all event data is sent to the media type script to generate and send JSON. On the Ansible side, the webhook receives an event with JSON from Zabbix and passes it to the ansible-rulebook to check the conditions. If the event matches all the conditions, the ansible-rulebook starts the specified action. In this case, it’s the start of the playbook.

In accordance with the specified settings for host groups and tags, the event will contain information as in the block below. However, only two fields from the output are needed – “host_groups” and “event_tags.”

{
    ...,
    "host_groups": [
        "Berlin",
        "Linux servers",
        "Web servers"],
    "event_tags": {
        "class": ["os"],
        "component": ["configuration"],
        "target": ["nginx"]},
    ...
}

Search by host groups

First, you need to determine that the host is a web server. You can see this from the presence of the “Web servers” group on the host in the diagram above. The second thing you can determine from the diagram is that the host also has the “Berlin” group and therefore belongs to the office in Berlin. To filter the event on the Event-Driven Ansible side, you need to build a condition that checks for the presence of two host groups in the received event – “Web servers” and “Berlin.” The “host_groups” field in the resulting JSON is a list, which means that you can use the is select construct to find an element in the list.

Search by tag value

The third condition checks whether this event relates to configuration. You can tell this from the fact that the event has a “component” tag with a value of “configuration.” However, the event_tags field in the resulting JSON is worth looking at in more detail. It is a dictionary containing tag names as keys, and because of that, you can refer to each tag separately on the Ansible side. What’s more, each tag will always contain a list of tag values, as tag names can be duplicated with different values. To search by the value of a tag, you can refer to a specific tag and use the is select construction to locate an element in the list.

To solve this example, specify the following rules block in ansible-rulebook:

  rules:
    - name: Run playbook for office in Berlin
      condition: >-
        event.payload.host_groups is select("==","Web servers") and
        event.payload.host_groups is select("==","Berlin") and
        event.payload.event_tags.component is select("==","configuration")
      action:
        run_playbook:
          name: deploy-nginx-berlin.yaml

Solution

The condition field contains 3 elements, and you can see all conditions on the right side of the diagram. In all three cases, you can use the is select construct and check if the required element is in the list.

The first two conditions check for the presence of the required host groups in the list of groups in “event.payload.host_groups.” In the diagram, you can see with a green dotted line how the first two conditions correspond to groups on the host in Zabbix. According to the condition of the example, this host must belong to both required groups, meaning that you need to set the logical operation and between the two conditions.

In the last condition, the event_tags field is a dictionary. Therefore, you can refer to the tag by specifying its name in the “event.payload.event_tags.component” path and check for the presence of “configuration” among the tag values. In the diagram, you can see the relationship between the last condition and the tags on the host with a dotted line.

Since all three conditions must match according to the condition of the example, you once again need to put the logical operation and between them.

Action block

Let’s analyze the action block. If all three conditions match, the ansible-rulebook will perform the specified action. In this case, that means the launch of the playbook using the run_playbook construct. Next, the name block contains the name of the playbook to run: deploy-nginx-berlin.yaml.

Example #2

Here is an example using the standard template Docker by Zabbix agent 2. For events raised by the trigger “Container {#NAME}: Container has been stopped with error code”, the administrator has additionally configured an action to send them to Event-Driven Ansible. Let’s assume that when the container “internal_portal” stops with exit code “137”, its restart requires preparation, with the logic of that preparation specified in a playbook.

There are more details in the diagram above. On the left side, you can see a host with configured monitoring. The event from the example will have many parameters, but you will work with two – operational data and all tags of this event. According to the general concept, all this data will go into the media type script, which will generate JSON for sending to Event-Driven Ansible. On the Ansible side, the ansible-rulebook checks the received event for compliance with the specified conditions. If the event matches all the conditions, the ansible-rulebook starts the specified action, in this case, the start of the playbook.

In the block below you can see part of the JSON to send to Event-Driven Ansible. To solve the task, you need to be concerned only with two fields from the entire output: “event_tags” and “operation_data”:

{
    ...,
    "event_tags": {
        "class": ["software"],
        "component": ["system"],
        "container": ["/internal_portal"],
        "scope": ["availability"],
        "target": ["docker"]},
    "operation_data": "Exit code: 137",
    ...
}
Search by tag value

The first step is to determine that the event belongs to the required container. Its name is displayed in the “container” tag, so you need to add a condition to search for the name of the container “/internal_portal” in the tag. However, as discussed in the previous example, the event_tags field in the resulting JSON is a dictionary containing tag names as keys. By referring to the key to a specific tag, you can get a list of its values. Since tags can be repeated with different values, you can get all the values of this tag by key in the received JSON, and this field will always be a list. Therefore, to search by value, you can always refer to a specific tag and use the is select construction.

Search by operational data field

The second step is to check the exit code. According to the trigger settings, this information is displayed in the operational data and passed to Event-Driven Ansible in the “operation_data” field. This field is a string, and you need to check with a regular expression if this field contains the value “Exit code: 137.” On the ansible-rulebook side, the is regex construct will be used to search for a regular expression.

To solve this example, specify the following rules block in ansible-rulebook:

  rules:
    - name: Run playbook for container "internal_portal"
      condition: >-
        event.payload.event_tags.container is select("==","/internal_portal") and
        event.payload.operation_data is regex("Exit code.*137")
      action:
        run_playbook:
          name: restart_internal_portal.yaml

Solution

In the first condition, the event_tags field is a dictionary and you are referring to a specific tag, so the final path contains the tag name: “event.payload.event_tags.container.” Next, using the is select construct, the list of tag values is checked. This allows you to verify that the required “internal_portal” container is present as a value of the tag. If you refer to the diagram, you can see the green dotted line showing the relationship between the condition in the ansible-rulebook and the tags in the event on the Zabbix side.

In the second condition, access the event.payload.operation_data field using the is regex construct and the regular expression “Exit code.*137.” This way you check for the presence of the status “137” as a value. In the diagram, the green dotted line also shows the link between this condition on the ansible-rulebook side and the operational data of the event in Zabbix.

Since both conditions must match, you can specify the and logical operation between the conditions.

Action block

Taking a look at the action block, if both conditions match, the ansible-rulebook will perform the specified action. In this case, it’s the launch of the playbook using the run_playbook construct. Next, the name block contains the name of the playbook to run: restart_internal_portal.yaml.

Conclusion

It’s clear that both tools (and especially their interconnected work) are great for implementing automation. Zabbix is a powerful monitoring solution, and Ansible is a great orchestration software. Both of these tools complement each other, creating an excellent tandem that takes on all routine tasks. This article has shown how to send events from Zabbix to Event-Driven Ansible and how to configure it on each side, and it has also proven that it’s not as difficult as it might initially seem. But remember – we’ve only looked at the simplest examples. The rest depends only on your imagination.

Questions

Q: How can I get the full list of fields in an event?

A: The best way is to make an ansible-rulebook with action “debug” and condition “event.payload is defined.” In this case, all events from Zabbix will be displayed. This example is described in the section “Receiving Events in Event-Driven Ansible.”

Q: Does the list of sent fields depend on the situation?

A: No. The list of fields in the sent event is always the same. If there are no objects in the event, the field will be empty. The case with tags is a good example – the tags may not be present in the event, but the “tags” field will still be sent.

Q: What events can be sent from Zabbix to Event-Driven Ansible?

A: In the current version (Zabbix 6.4), only trigger-based events and problems can be sent.

Q: Is it possible to use the values of received events in the ansible-playbook?

A: Yes. On the ansible-playbook side, you can get values using the ansible_eda namespace. To access the values in an event, you need to specify ansible_eda.event.

For example, to display all the details of an event, you can use:

  tasks:
    - debug:
        msg: "{{ ansible_eda.event }}"

To get the name of the container from example #2 of this article, you can use the following code:

  tasks:
    - debug:
        msg: "{{ ansible_eda.event.payload.event_tags.container }}"

The post Forward Zabbix Events to Event-Driven Ansible and Automate your Workflows appeared first on Zabbix Blog.

10 ways to build applications faster with Amazon CodeWhisperer

Post Syndicated from Kris Schultz original https://aws.amazon.com/blogs/devops/10-ways-to-build-applications-faster-with-amazon-codewhisperer/

Amazon CodeWhisperer is a powerful generative AI tool that gives me coding superpowers. Ever since I have incorporated CodeWhisperer into my workflow, I have become faster, smarter, and even more delighted when building applications. However, learning to use any generative AI tool effectively requires a beginner’s mindset and a willingness to embrace new ways of working.

Best practices for tapping into CodeWhisperer’s power are still emerging. But, as an early explorer, I’ve discovered several techniques that have allowed me to get the most out of this amazing tool. In this article, I’m excited to share these techniques with you, using practical examples to illustrate just how CodeWhisperer can enhance your programming workflow. I’ll explore:

  • Typing less
  • Generating functions
  • Generating classes
  • Implementing algorithms
  • Writing unit tests
  • Creating sample data
  • Simplifying regular expressions
  • Learning third-party code libraries faster
  • Documenting code

Before we begin

If you would like to try these techniques for yourself, you will need to use a code editor with the AWS Toolkit extension installed. VS Code, AWS Cloud9, and most editors from JetBrains will work. Refer to the CodeWhisperer “Getting Started” resources for setup instructions.

CodeWhisperer will present suggestions automatically as you type. If you aren’t presented with a suggestion, you can always manually trigger a suggestion using the Option + C (Mac) or Alt + C (Windows) shortcut. CodeWhisperer will also sometimes present you with multiple suggestions to choose from. You can press the → and ← keys to cycle through all available suggestions.

The suggestions CodeWhisperer offers are non-deterministic, which means you may receive slightly different suggestions than the ones shown in this article. If you receive a suggestion that doesn’t suit your needs, try cycling through all available suggestions using the arrow keys or try rephrasing any code comment prompts slightly. Even punctuation changes can produce fresh suggestions.

The examples in this article are written in JavaScript and TypeScript. However, the techniques I’ll demonstrate are equally suited to many of the programming languages supported by CodeWhisperer. Now, onto the examples!

Typing less

At the most basic level, CodeWhisperer can be used as a supercharged auto-complete, significantly reducing your typing when writing individual lines or blocks of code. To use CodeWhisperer in this way, just write code as you normally would. CodeWhisperer will make automatic suggestions as you type. Accept suggestions when they’re helpful, and ignore suggestions when they’re not. In my experience, using CodeWhisperer in this way alone reduces my typing by 50% or more.

Generating functions

Automatically generating function bodies from signatures

Skilled programmers use clear, descriptive names for their functions and parameters. If you follow this practice in your own programming, CodeWhisperer will have all the context it needs to implement many functions for you.

After I enter the function signature below, CodeWhisperer generates the function body for me.

User input (JavaScript):

function standardDeviation(numbers)

Result:

function standardDeviation(numbers) {
  let mean = numbers.reduce((a, b) => a + b) / numbers.length;
  let variance =
    numbers.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / numbers.length;
  return Math.sqrt(variance);
}

Generating functions from comments

Sometimes a good function signature alone isn’t enough to convey the desired behavior of a function. In those cases, you can use a detailed code comment as a prompt to CodeWhisperer. I recommend using your language’s doc comment delimiter when using this technique.

User Input (JavaScript):

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */

Result:

/**
 * Given a list that contains some numbers and strings, format them into a string
 * in which the numbers are prepended with "#" and the strings are wrapped in
 * double quotes.
 */
function formatList(list) {
  return list
    .map((item) => {
      if (typeof item === "number") {
        return `#${item}`;
      } else {
        return `"${item}"`;
      }
    })
    .join(", ");
}

Generating classes

You can generate full or partial class implementations from a single code comment. The comment should describe the key properties and behaviors of the class. Simple descriptions are often all that is required.

User input (TypeScript):

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */

Result:

After I write the above block comment, CodeWhisperer provides multiple suggestions, including the following implementation, complete with documentation comments for each method!

/**
 * A folder has a name, children, and an optional parent. A child can't be added
 * twice.
 */
export class Folder {
  name: string;
  children: Folder[];
  parent?: Folder;

  constructor(name: string, parent?: Folder) {
    this.name = name;
    this.children = [];
    this.parent = parent;
  }

  /**
   * Add a child to the folder.
   * @param child The child to add.
   * @returns True if the child was added, false if it was already present.
   */
  addChild(child: Folder): boolean {
    if (this.children.includes(child)) {
      return false;
    }
    this.children.push(child);
    child.parent = this;
    return true;
  }

  /**
   * Remove a child from the folder.
   * @param child The child to remove.
   * @returns True if the child was removed, false if it wasn't present.
   */
  removeChild(child: Folder): boolean {
    const index = this.children.indexOf(child);
    if (index === -1) {
      return false;
    }
    this.children.splice(index, 1);
    child.parent = undefined;
    return true;
  }
}

Implementing algorithms

CodeWhisperer enables you to implement well-known algorithms with ease. Simply tell CodeWhisperer which algorithm you want to apply to a particular coding problem, and it will oblige.

User input (JavaScript):

const actors = [ ... ];

// Sort the actors by age using the merge sort algorithm.

Result:

const actors = [ ... ];

// Sort the actors by age using the merge sort algorithm.
function mergeSort(arr) {
  if (arr.length < 2) {
    return arr;
  }

  const middle = Math.floor(arr.length / 2);
  const left = arr.slice(0, middle);
  const right = arr.slice(middle);

  return merge(mergeSort(left), mergeSort(right));
}

function merge(left, right) {
  const result = [];

  while (left.length && right.length) {
    if (left[0].age < right[0].age) {
      result.push(left.shift());
    } else {
      result.push(right.shift());
    }
  }

  while (left.length) {
    result.push(left.shift());
  }

  while (right.length) {
    result.push(right.shift());
  }

  return result;
}

console.log(mergeSort(actors));

Writing unit tests

CodeWhisperer accelerates the implementation of unit tests. Not only is it able to implement the logic for most of the explicit unit tests I ask for, but it often suggests unit tests for edge cases that hadn’t even occurred to me.

When I want CodeWhisperer to implement a unit test, I start with a declaration of the condition I’m testing and the expected result as in the following example.

User input (JavaScript):

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate"

Result:

describe("Folder.addChild()", () => {
  it("when adding an existing folder should not add duplicate", () => {
    const folder = new Folder("root");
    const child = new Folder("child");
    folder.addChild(child);
    folder.addChild(child);
    expect(folder.children.length).toBe(1);
  });
});

After you’ve written a test or two, CodeWhisperer is likely to suggest more tests, even without explicit prompting.

Tip: You’ll notice that my test description includes 1) the function I’m testing, 2) the condition I’m testing, and 3) what I expect the result to be. Following this three-part pattern is a great way to ensure CodeWhisperer has the context it needs to implement high quality test logic.

Creating sample data

One of the delightful discoveries I’ve made is that CodeWhisperer can generate dummy data—or really any code that has a repeating pattern. As an illustration, for the following list I entered only the first “user” object as an example, and CodeWhisperer helpfully continued the pattern.

User input (JavaScript):

const fakeUsers = [
  { name: "User 1", id: "user1" },

Result:

const fakeUsers = [
  { name: "User 1", id: "user1" },
  { name: "User 2", id: "user2" },
  { name: "User 3", id: "user3" },  
  { name: "User 4", id: "user4" },
  { name: "User 5", id: "user5" },
  { name: "User 6", id: "user6" }
]

It even works for generating random but plausible values like the names seen below.

User input (JavaScript):

const fakeUsers = [
  { name: "Sally", id: "user1" },

Result:

const fakeUsers = [
  { name: "Sally", id: "user1" },
  { name: "Bob", id: "user2" },
  { name: "Joe", id: "user3" },
  { name: "Jane", id: "user4" },
  { name: "Jill", id: "user5" },
  { name: "Jack", id: "user6" }
]

Simplifying regular expressions

While CodeWhisperer can’t translate natural language into custom regular expressions (yet), it does know a lot of commonly used regular expressions. This saves you time by putting these regular expressions at your fingertips whenever you need them.

Let’s look at one notoriously complex but common regular expression.

User input (JavaScript):

/**
 * Function that validates whether a string is a valid email address.
 */

Result:

/**
 * Function that validates whether a string is a valid email address.
 */
function validateEmail(email: string): boolean {
  const re =
    /^(([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
  return re.test(String(email).toLowerCase());
}

Learning third-party code libraries faster

CodeWhisperer helps accelerate my learning when I’m using an unfamiliar library, module, or framework. In the example below, I needed to write a utility function that would take a gzip compressed value coming from an API and translate it into a value that the rest of my code could understand. I knew the “pako” library would help me, but I had no idea how to use it. I just prompted CodeWhisperer with an import statement and a comment describing my goal.

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */

Result:

import pako from "pako";

/**
 * Accepts a base64-encoded, gzip-compressed JSON string and returns
 * a parsed JSON object.
 */
export function decodeGzipJson(base64String) {
  const compressed = atob(base64String);
  const uncompressed = pako.inflate(compressed, { to: "string" });
  return JSON.parse(uncompressed);
}

Documenting code

CodeWhisperer is capable of generating docstrings and comments for the code it generates, as well as for your existing code. For example, let’s say I want CodeWhisperer to document the matches() method of this FavoritesFilter TypeScript class I’ve implemented (I’ve omitted some implementation details for brevity).

class FavoritesFilter implements IAssetFilter {
  ...
  matches(asset: Asset): boolean {
    ...
  }
}

I can just type a doc comment delimiter (/** */) immediately above the method name and CodeWhisperer will generate the body of the doc comment for me.

Note: When using CodeWhisperer in this way you may have to manually trigger a suggestion using Option + C (Mac) or Alt + C (Windows).

class FavoritesFilter implements IAssetFilter {
  ...
  /**
   * Determines whether the asset matches the filter.
   */
  matches(asset: Asset): boolean {
    ...
  }
}

Conclusion

I hope the techniques above inspire ideas for how CodeWhisperer can make you a more productive coder. Install CodeWhisperer today to start using these time-saving techniques in your own projects. These examples only scratch the surface. As additional creative minds start applying CodeWhisperer to their daily workflows, I’m sure new techniques and best practices will continue to emerge. If you discover a novel approach that you find useful, post a comment to share what you’ve discovered. Perhaps your technique will make it into a future article and help others in the CodeWhisperer community enhance their superpowers.

Kris Schultz (he/him)

Kris Schultz has spent over 25 years bringing engaging user experiences to life by combining emerging technologies with world class design. In his role as 3D Specialist Solutions Architect, Kris helps customers leverage AWS services to power 3D applications of all sorts.

Kubernetes monitoring with Zabbix – Part 3: Extracting Prometheus metrics with Zabbix preprocessing

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/kubernetes-monitoring-with-zabbix-part-3-extracting-prometheus-metrics-with-zabbix-preprocessing/25639/

In the previous Kubernetes monitoring blog post, we explored the functionality provided by the Kubernetes integration in Zabbix and discussed use cases for monitoring and alerting to events in a cluster, such as changes in replicas or CPU pressure.

In the final part of this series on monitoring Kubernetes with Zabbix, we will show how the Kubernetes integration uses Prometheus to parse data from kube-state-metrics and how users can leverage this functionality to monitor the many cloud-native applications that expose Prometheus metrics by default.

Want to see Kubernetes monitoring in action? Watch Part 3 of our Kubernetes monitoring video guide.

Prometheus Data Model

Prometheus is an open-source toolkit for monitoring and alerting created at SoundCloud. Prometheus was the second hosted project to join the Cloud Native Computing Foundation in 2016, after Kubernetes. As such, users of Kubernetes have adopted Prometheus extensively.

Lines in the Prometheus exposition format may or may not begin with a pound sign (#). Lines beginning with a pound sign carry metadata that includes help text and type information. The remaining lines start with the metric name, with optional labels specified, followed by the value, and optionally conclude with a timestamp. If a timestamp is absent, it is assumed to be equal to the time of collection.

http_requests_total{job="nginx",instance="10.0.0.1:443"} 15 1677507349983

Using Prometheus with Kubernetes Monitoring

Let’s start with an example from the kube-state-metrics endpoint, installed in the first part of this series. Below is the output for the /metrics endpoint used by the Kubernetes integration, showing the metric kube_job_created. Each metric has help text followed by a line starting with that metric name, labels describing each job, and creation time as the sample value.

# HELP kube_job_created Unix creation timestamp
# TYPE kube_job_created gauge
kube_job_created{namespace="jdoe",job_name="supportreport-supportreport-27956880"} 1.6774128e+09
kube_job_created{namespace="default",job_name="core-backup-data-default-0-27957840"} 1.6774704e+09
kube_job_created{namespace="default",job_name="core-backup-data-default-1-27956280"} 1.6773768e+09
kube_job_created{namespace="jdoe",job_name="activetrials-activetrials-27958380"} 1.6775028e+09
kube_job_created{namespace="default",job_name="core-cache-tags-27900015"} 1.6740009e+09
kube_job_created{namespace="default",job_name="core-cleanup-pipes-27954860"} 1.6772916e+09
kube_job_created{namespace="jdoe",job_name="salesreport-salesreport-27954060"} 1.6772436e+09
kube_job_created{namespace="default",job_name="core-correlation-cron-1671562914"} 1.671562914e+09
kube_job_created{namespace="jtroy",job_name="jtroy-clickhouse-default-0-maintenance-27613440"} 1.6568064e+09
kube_job_created{namespace="default",job_name="core-backup-data-default-0-27956880"} 1.6774128e+09
kube_job_created{namespace="default",job_name="core-cleanup-sessions-27896445"} 1.6737867e+09
kube_job_created{namespace="default",job_name="report-image-findings-report-27937095"} 1.6762257e+09
kube_job_created{namespace="jdoe",job_name="salesreport-salesreport-27933900"} 1.676034e+09
kube_job_created{namespace="default",job_name="core-cache-tags-27899775"} 1.6739865e+09
kube_job_created{namespace="ssmith",job_name="test-auto-merger"} 1.653574763e+09
kube_job_created{namespace="default",job_name="report-image-findings-report-1650569984"} 1.650569984e+09
kube_job_created{namespace="ssmith",job_name="auto-merger-and-mailer-auto-merger-and-mailer-27952200"} 1.677132e+09
kube_job_created{namespace="default",job_name="core-create-pipes-pxc-user"} 1.673279381e+09
kube_job_created{namespace="jdoe",job_name="activetrials-activetrials-1640610000"} 1.640610005e+09
kube_job_created{namespace="jdoe",job_name="salesreport-salesreport-27943980"} 1.6766388e+09
kube_job_created{namespace="default",job_name="core-cache-accounting-map-27958085"} 1.6774851e+09

Zabbix collects data from this endpoint with the “Get state metrics” item. The item uses the Script item type to retrieve data from the /metrics endpoint. Dependent items are then created that use a “Prometheus pattern” preprocessing step to extract the data relevant to each dependent item.
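As an illustration only, a stripped-down Script item that retrieves Prometheus-formatted text over HTTP could look like the sketch below. The macro names and the bearer-token handling are assumptions; the actual script shipped in the official Kubernetes templates is considerably more complete.

// Hedged sketch of a Zabbix Script item fetching Prometheus-formatted metrics.
// Macro and parameter names are assumptions, not those of the official template.
var params = JSON.parse(value);   // e.g. {"url": "{$KUBE.STATE.METRICS.URL}", "token": "{$KUBE.API.TOKEN}"}

var request = new HttpRequest();
if (params.token) {
    request.addHeader('Authorization: Bearer ' + params.token);
}

var metrics = request.get(params.url);

if (request.getStatus() !== 200) {
    throw 'Failed to fetch metrics: HTTP ' + request.getStatus();
}
return metrics;   // dependent items parse this text with "Prometheus pattern" preprocessing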

Prometheus and Out-Of-The-Box Templates

Zabbix also offers many templates for applications that expose Prometheus metrics, including etcd. Etcd is a distributed key-value store that uses a simple HTTP interface. Many cloud applications use etcd, including Kubernetes. Following is a description of how to set up an etcd “host” using the built-in etcd template.

A new host is created called “Etcd Application” with an agent interface specified that provides the location of the application API. The interface port does not matter because a macro sets the port. The “Etcd by HTTP” template is attached to the host.

The “Get node metrics” item is the master item that collects Prometheus metrics. Testing this item shows that it returns Prometheus formatted metrics. The master item creates many dependent items that parse the Prometheus metrics. In the dependent item, “Maximum open file descriptors,” the maximum number of open file descriptors is obtained by adding the “Prometheus pattern” preprocessing step. This metric is available with the metric name process_max_fds.

Custom Prometheus Templates

 

While it is convenient when Zabbix has a template for the application you want to monitor, creating a new template for an application that exposes a /metrics endpoint but does not have an associated template is easy.

One such application is Argo CD. Argo CD is a GitOps continuous delivery tool for Kubernetes. An “application” represents each deployment in Kubernetes. Argo CD uses Git to keep applications in sync.

Argo CD exposes a Prometheus metrics endpoint that can be used to monitor the application. The Argo CD documentation site includes information about available metrics.

In Argo CD, the metrics service is available at the argocd-metrics service. Following is a demonstration of creating an Argo CD template that collects Prometheus metrics. Install Argo CD in a cluster with a Zabbix proxy installed before starting. To do this, follow the Argo CD “Getting Started” guide.

Create a new template called “Argo CD by HTTP” in the “Templates/Applications” group. Add three macros to the template. Set {$ARGO.METRICS.SERVICE.PORT} to the default of 8082. Set {$ARGO.METRICS.API.PATH} to “/metrics.” Set the last macro, {$ARGO.METRICS.SCHEME}, to the default of “http.”

Open the template and click “Items -> Create item.” Name this item “Get Application Metrics” and give it the “HTTP agent” type. Set the key to argocd.get_metrics with a “Text” information type. Set the URL to {$ARGO.METRICS.SCHEME}://{HOST.CONN}:{$ARGO.METRICS.SERVICE.PORT}/metrics. Set the History storage period to “Do not keep history.”

Create a new host to represent Argo. Go to “Hosts -> Create host”. Name the host “Argo CD Application” and assign the newly created template. Define an interface and set the DNS name to the name of the metrics service, including the namespace, if the Argo CD deployment is not in the same namespace as the Zabbix proxy deployment. Connect to DNS and leave the port as the default because the template does not use this value. Like in the etcd template, a macro sets the port. Set the proxy to the proxy located in the cluster. In most cases, the macros do not need to be updated.

Click “Test -> Get value and test” to test the item. Prometheus metrics are returned, including a metric called argocd_app_info. This metric collects the status of the applications in Argo. We can collect all deployed applications with a discovery rule.

Navigate to the Argo CD template and click “Discovery rules -> Create discovery rule.” Call the rule “Discover Applications.” The type should be “Dependent item” because it depends on the metrics collection item. Set the master item to the “Get Application Metrics” item. The key will be argocd.applications.discovery. Go to the preprocessing tab and add a new step called “Prometheus to JSON.” The preprocessing step will convert the application data to JSON, which will look like the example below.

[{"name":"argocd_app_info","value":"1","line_raw":"argocd_app_info{dest_namespace=\"monitoring\",dest_server=\"https://kubernetes.default.svc\",health_status=\"Healthy\",name=\"guestbook\",namespace=\"argocd\",operation=\"\",project=\"default\",repo=\"https://github.com/argoproj/argocd-example-apps\",sync_status=\"Synced\"} 1","labels":{"dest_namespace":"monitoring","dest_server":"https://kubernetes.default.svc","health_status":"Healthy","name":"guestbook","namespace":"argocd","operation":"","project":"default","repo":"https://github.com/argoproj/argocd-example-apps","sync_status":"Synced"},"type":"gauge","help":"Information about application."}]

Set the parameters to “argocd_app_info” to gather all metrics with that name. Under “LLD Macros”, set three macros. {#NAME} is set to the .labels.name key, {#NAMESPACE} is set to the .labels.dest_namespace key, and {#SERVER} is set to .labels.dest_server.

Let us create some item prototypes. Click “Create item prototype” and name it “{#NAME}: Health Status.” Set it as a dependent item with a key of argocd.applications[{#NAME}].health. The type of information will be “Character.” Set the master item to “Get Application Metrics.”

In preprocessing, add a Prometheus pattern step with parameters argocd_app_info{name="{#NAME}"}. Use “label” and set the label to health_status. Add a second step to “Discard unchanged with heartbeat” with the heartbeat set to 2h.

Clone the prototype to create another item called “{#NAME}: Sync status.” Change the key to argocd.applications.sync[{#NAME}]. Under “Preprocessing” change the label to sync_status.

Now, when viewing “Latest Data” the sync and health status are available for each discovered application.

Conclusion

We have shown how Zabbix templates, such as the Kubernetes template, and the etcd template utilize Prometheus patterns to extract metric data. We have also created templates for new applications that expose Prometheus data. Because of the adoption of Prometheus in Kubernetes and cloud-native applications, Zabbix benefits by parsing this data so that Zabbix can monitor Kubernetes and cloud-native applications.

I hope you enjoyed this series on monitoring Kubernetes and cloud-native applications with Zabbix. Good luck on your monitoring journey as you learn to monitor with Zabbix in a containerized world.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group. She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few. As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group:

The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT. Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries. They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.

Just-in-Time user provisioning explained

Post Syndicated from Evgeny Yurchenko original https://blog.zabbix.com/just-in-time-user-provisioning-explained/25515/

Zabbix 6.4 finally brings a long-awaited feature called “Just-In-Time user provisioning”. The LDAP/SAML user provisioning paragraph in “What’s new in Zabbix 6.4” is very brief and cannot (not that I am saying it should) convey the excitement this really game-changing feature deserves. This blog post was born to address two points:

  • explain in more detail why it is a “game-changing” feature
  • walk through its configuration – it is very flexible, and as often happens, flexibility brings complexity and sometimes confusion about how to not only get it working but also get the most out of it

NOTE: I am talking about LDAP in this blog post but SAML works exactly the same way so you can easily apply this article to SAML JIT user provisioning configuration.

Old times (before 6.4)

Let’s have a quick reminder of how it worked before Zabbix 6.4. The obvious problem here is that a user must be pre-created in Zabbix to be able to log in using LDAP. The database user records do not have any field indicating that the user will be authenticated via LDAP; it’s just that the users’ passwords stored in the database are ignored. Instead, Zabbix goes to an LDAP server to verify whether:

  • a user with a given username exists
  • user provided the correct password

no other attributes configured for the user on the LDAP server side are taken into account.

So when Zabbix is used by many users and groups, user management becomes a non-trivial task as new people join different teams (or leave).

Zabbix 6.4 with JIT user provisioning enabled

Now let’s take a look at what is happening in Zabbix 6.4 (a very simplified picture). The picture depicts what happens when the memberOf method is selected for Group Configuration (more on that later). Now, when Zabbix gets a username and password from the login form, it goes to the LDAP server and gets all the information available for this user, including his/her LDAP group membership and e-mail address. Obviously, it gets all that only if the correct (from the LDAP server’s perspective) username and password were provided. Then Zabbix goes through the pre-configured mapping that defines which LDAP group’s users go to which Zabbix user group. If at least one match is found, a user is created in the Zabbix database, belonging to a Zabbix user group and having a Zabbix user role according to the configured “match”. So far sounds pretty simple, right? Now let’s go into detail about how all this should be configured.

LDAP server data

To experiment with the feature I built a Docker container which is a fully functional LDAP server with some pre-configured data, you can easily spin it up using this image. Start the container this way:

docker run -p 3389:389 -p 6636:636 --name openldap-server --detach bgmot42/openldap-server:0.1.1

To visually see the LDAP server data (and add your own configuration like users and groups), you can start this standard container:

docker run -p 8081:80 -p 4443:443 --name phpldapadmin --hostname phpldapadmin --link openldap-server:ldap-host --env PHPLDAPADMIN_LDAP_HOSTS=ldap-host --detach osixia/phpldapadmin:0.9.0

Now you can access this LDAP server via https://<ip_address>:4443 (or any other port you configure to access this Docker container), click Login, enter “cn=admin,dc=example,dc=org” in the Login DN field and “password” in the Password field, then click Authenticate. You should see the following structure of the LDAP server (the picture shows the ‘zabbix-admins’ group configuration). For convenience, all users in this container are configured with the word “password” as their password.

General LDAP authentication configuration in Zabbix

No surprises here, you need to enable LDAP authentication, just a couple of additions here:

  • You must provide a Deprovisioned users group. This group must literally be “disabled”, otherwise you won’t be able to select it here. This is the Zabbix user group into which all “de-provisioned” users will be put, effectively disabling their access to Zabbix.
  • The Enable JIT provisioning check-box, which obviously needs to be checked for this feature to work.

Next comes the already familiar interface for configuring an LDAP server and search parameters; this picture depicts how we actually fill in these parameters according to the data in our LDAP server:

A “special” Distinguished Name (DN), cn=ldap_search,dc=example,dc=org, is used for searching, i.e. Zabbix uses this DN to connect to the LDAP server, and of course when you connect to the LDAP server you need to be authenticated – this is why you need to provide the Bind password. This DN should have access to the sub-tree in the LDAP data hierarchy where all your users are configured. In our case, all the users are configured “under” ou=Users,dc=example,dc=org; this DN is called the base DN and is used by Zabbix as a “starting point” for searching.
Note: technically it is possible to bind to the LDAP server anonymously, without providing a password, but this is a huge security hole, as the whole users sub-tree becomes available for anonymous (unauthenticated) search, i.e. effectively exposed to any LDAP client that can connect to the LDAP server over TCP. The LDAP server we deployed previously in the Docker container does not provide this functionality.

Group configuration method “memberOf”

All users in our LDAP server have a memberOf attribute which defines what LDAP groups each user belongs to, e.g. if you perform an LDAP query for the user1 user, you'll see that its memberOf attribute has this value:
memberOf: cn=zabbix-admins,ou=Group,dc=example,dc=org
Note that your real LDAP server may use a completely different LDAP attribute to provide users’ group membership, and of course, you can easily configure what attribute to use when searching for a user’s LDAP groups by putting it into the User group membership attribute field:

In the picture above we are telling Zabbix to use the memberOf attribute to extract the DN defining the user’s group membership (in this case it is cn=zabbix-admins,ou=Group,dc=example,dc=org) and take only the cn attribute from that DN (in this case it is zabbix-admins) to use in searching for a match in the User group mapping rules. Then we define as many mapping rules as we want. In the picture above we have two rules:

  • All users belonging to zabbix-users LDAP group will be created in Zabbix as members of Zabbix users group with User role
  • All users belonging to zabbix-admins LDAP group will be created in Zabbix as members of Zabbix administrators group with Super admin role

Group configuration method “groupOfNames”

There is another method of finding users’ group membership, called “groupOfNames.” It is not as efficient as the “memberOf” method but can provide much more flexibility if needed. Here Zabbix is not querying the LDAP server for a user; instead, it searches for LDAP groups based on a given criterion (filter). It’s easier to explain with pictures depicting an example:

Firstly, we define the LDAP “sub-tree” where Zabbix will be searching for LDAP groups – note ou=Group,dc=example,dc=org in the Group base DN field. Then, in the Group name attribute field, we define what attribute to use when searching for a match in the mapping rules (in this case we take cn, i.e. only zabbix-admins from the full DN cn=zabbix-admins,ou=Group,dc=example,dc=org). Each LDAP group in our LDAP server has a member attribute that lists all users belonging to this LDAP group (look at the right picture), so we put member in the Group member attribute field. Each user’s DN will help us construct the Group filter field. Now pay attention: the Reference attribute field defines which LDAP user attribute Zabbix will use in the Group filter, i.e. %{ref} will be replaced with the value of this attribute (here we are talking about the user’s attributes – we have already authenticated this user, i.e. got all its attributes from the LDAP server). To sum up what I’ve said above, Zabbix:

  1. Authenticates the user with the entered Username and Password against the LDAP server, getting all the user’s LDAP attributes
  2. Uses the Reference attribute and Group filter fields to construct a filter (when user1 logs in, the filter will be (member=uid=user1,ou=Users,dc=example,dc=org))
  3. Performs an LDAP query to get all LDAP groups whose member attribute (configured in the Group member attribute field) matches the filter constructed in step 2
  4. Goes through all LDAP groups received in step 3, picks the cn attribute (configured in the Group name attribute field), and finds a match in the User group mapping rules

Looks a bit complicated but all you really need to know is the structure of your LDAP data.

Demo time

Finally let’s see what happens when user1 belonging to zabbix-admins LDAP group and user3 belonging to zabbix-users LDAP group log in:

That’s it. Happy JIT user provisioning!

Kubernetes monitoring with Zabbix – Part 2: Understanding the discovered resources

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/kubernetes-monitoring-with-zabbix-part-2-understanding-the-discovered-resources/25476/

In the previous blog post, we installed the Zabbix Agent Helm Chart and set up official Kubernetes templates to monitor a cluster in Zabbix. In this edition, part 2, we will explore the functionality provided by the Kubernetes integration in Zabbix and discuss use cases for monitoring and alerting on events in a cluster. (This post assumes that the Kubernetes integration has been set up in at least one cluster using the helm chart and provided templates.)

Want to see Kubernetes monitoring in action? Watch Part 2 of our Kubernetes monitoring video guide.

Node and Component Discovery

Following integration setup, the templates will discover control plane components, each node, and the kubelet associated with it using the Kubernetes API via a “Script” item type.

Note:

In the last blog post, I showed a managed EKS cluster. Control plane components cannot be discovered in an EKS cluster because AWS does not make them directly available through the API. For the sake of demonstrating the full capabilities of the integration, this post will use screenshots depicting a cluster that was created using the kubeadm utility.

In the latest version of Zabbix (6.2 at the time of writing), control plane components are discovered via node labels added only for clusters created with kubeadm. Depending on your setup, you may be able to add the same node labels to your own control plane nodes or modify the template to use your specific labels.

This example cluster has 4 worker nodes and 1 master node. The control plane runs entirely on the master node.

Zabbix’s “Low-Level Discovery” is the backbone of the Kubernetes integration. Zabbix discovers each node and creates two hosts to represent it in the cluster. The first host attaches the “Linux by Zabbix Agent” template to it, and the second attaches a custom Kubelet template called “Kubernetes Kubelet by HTTP.” Zabbix also creates items for most standard objects like pods, deployments, replicasets, jobs, cronjobs, etc.

Node and Kubernetes Performance Metrics

In this example, there are four discovered worker nodes with the “Linux by Zabbix Agent” template attached to them. The template will provide metrics about the machines running in the cluster.

Each worker host’s “System performance” dashboard shows system load, CPU usage, and memory usage metrics.

Zabbix will also collect Kubernetes-specific metrics related to the nodes. “Latest Data” for the Kubernetes Nodes host shows metrics such as the Allocatable CPU available to pods and the node’s memory capacity.

Alerts are generated for events such as the allocation of too much CPU. This could indicate that capacity should be increased, assuming that the memory and CPU limits set on the pod label are accurate.

The Kubernetes integration also monitors object states. As a best practice, any tool used to monitor Kubernetes should be monitoring and alerting critical status changes within the cluster. The image above shows the triggers related to the health of a pod. There are also triggers when certain conditions are detected by the nodes, like memory or CPU pressure.

Zabbix discovers objects like pods, deployments, and Replicasets, and triggers on object states.  For example, pods that are not up or deployments that do not have the correct number of replicas up.

In this example, a cluster is running a Kubernetes dashboard deployment with 3 replicas. By running the following command, we can see that all 3 replicas are up. Under “Latest Data,” Zabbix shows those 3 replicas available out of the 3 desired.

kubectl get deployment kubernetes-dashboard



To mimic a pod crashing, the pod is edited to use an invalid image tag.

kubectl edit pod <pod name>

The image tag is changed to “invalid.tag,” which is not available for the image. This causes the pod to fail because it can no longer pull the image. The output now shows that one pod is no longer ready.

Looking at the data in Zabbix, the number of available replicas is now only 2, while the number of unavailable replicas is 1.

On the problems page, there are two new problems. Both alerted that there is a mismatch between the number of replicas for the dashboard and the number of desired replicas.

Changing the tag back to a valid one should cause those problems to be resolved.

The Kubernetes templates offer many metrics and triggers, including most provided by Prometheus and Alert Manager. With some Zabbix experience and the ability to navigate kube-state-metrics and Kubernetes APIs, creating new items is possible.

What’s Next?

Unlike most APIs, which return data in JSON format, the kube-state-metrics API uses the Prometheus data model to supply metrics.
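As an illustration, a few lines of kube-state-metrics output for the dashboard deployment might look like the following. The metric names come from the kube-state-metrics project; the label values are specific to this example:

# HELP kube_deployment_status_replicas_available The number of available replicas per deployment.
# TYPE kube_deployment_status_replicas_available gauge
kube_deployment_status_replicas_available{namespace="kubernetes-dashboard",deployment="kubernetes-dashboard"} 3
kube_deployment_spec_replicas{namespace="kubernetes-dashboard",deployment="kubernetes-dashboard"} 3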

As you get comfortable with Kubernetes monitoring in Zabbix, you may want to parse your own metrics from kube-state-metrics and create new items.

In the next part of this series, we will learn how to monitor applications with Prometheus in Zabbix.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group.  She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few.  As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group: The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT.  Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries.  They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.

How to write a webhook for Zabbix

Post Syndicated from Andrey Biba original https://blog.zabbix.com/how-to-write-a-webhook-for-zabbix/25298/

As you know, a picture is worth a thousand words. Therefore, I would like to share the process of creating a webhook from scratch. In this article, we will walk through the creation process step by step – starting with studying the target service with which Zabbix will integrate and finishing with tests for sending events from Zabbix. Although it may seem complicated, writing your own integrations is not so difficult.

Preparation

First, we need to decide what we want to see as a result of the webhook. In most cases, the services to which we will send events are divided into 2 types:

  • Messengers to which you can send messages. For example, Telegram, Slack, Discord, etc.
  • Service Desks where you can open, close, and update tickets. For example, Jira, Redmine, ServiceNow, etc.

In both cases, the principle of creating a webhook is the same; the only difference is the complexity of one type compared to the other.

In this article, I will describe the process of creating a webhook for messengers – and specifically for Line messenger.

After we have decided on the type, we need to find out whether the service supports API requests and, if it does, what is required to use them. Usually, the services you want to integrate Zabbix with have reasonably detailed documentation about the API methods they support. By the way, Zabbix also has its own API, which is documented in detail.

After we are done studying the Line documentation, we find out that messages are sent using the POST method to the https://api.line.me/v2/bot/message/push endpoint, using the Line bot token in the request header for authorization and passing a specially formatted JSON in the request body with the content of the message. Confused? No problem. Let’s take a closer look.

HTTP requests

The operation of the API is based on HTTP requests, which are executed with parameters provided by the developers of this API.

Several types of HTTP requests are used more often than others:

  • GET – is perhaps the most common one that all of us encounter on a daily basis. This request only retrieves data. For example, your browser used a GET request to fetch the article you are currently reading from the web server.
  • POST – is a request that sends data to a resource. This is exactly the case when we want to pass something to the service using API requests.
  • PUT – is much less common than the previous 2, but no less important. This request replaces the values in a resource.

These are not all HTTP request methods, but these three will suffice for a general introduction.

We are done with methods. Let’s move on to the endpoint.

An endpoint is a permanent address of a resource via which we transfer, receive, or change data. In this case, https://api.line.me/v2/bot/message/push is the endpoint that accepts POST requests to send messages.

So, the method and the endpoint are defined. What’s next?

Generally, any HTTP request consists of:

  1. URL
  2. Method
  3. Headers
  4. Body
HTTP request structure

We have already dealt with the first two, but the headers and the request body remain.

Headers usually contain service information that allows a request to be processed correctly. For example, the Content-Type: application/json header implies that our request body should be interpreted as a JSON object. Also, quite often, authorization information is passed in the headers. As in the case of Line, the Authorization: Bearer {channel access token} header contains the authorization token of the bot on behalf of which messages will be sent.

The request body usually contains the information we want to pass on to the service. In our case, this will be the subject and body of the event in Zabbix.

Checking the service API

The documentation is good, but it is necessary to check that everything we read works exactly how it is documented. It is not uncommon that a service can be developed faster than the documentation can keep up with it. So field testing never hurts. Excluding unexpected behavior will significantly reduce the time spent searching for problems.

I recommend using Postman to work with API requests – a handy tool that saves time. But for this article, we will use cURL due to its prevalence and ease of use.

I will not describe the process of creating the Line Bot API token because this is not directly related to the article. However, for those interested in this process, I will leave a link here.

As we have already found out, the request type will be POST, the endpoint URL is https://api.line.me/v2/bot/message/push, and additional headers must be passed: Content-Type: application/json, which specifies the type of data to be sent (in our case, JSON), and Authorization: Bearer {token value}. The messages themselves are in JSON format. For example, I used 2 messages – “Hello, world1” and “Hello, world2”. As a result, I got the following request:

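A minimal sketch of that request with cURL might look like this (the channel access token and the recipient ID are placeholders you would replace with your own values):

curl -X POST https://api.line.me/v2/bot/message/push \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer {channel access token}' \
  -d '{
        "to": "<user or group ID>",
        "messages": [
          {"type": "text", "text": "Hello, world1"},
          {"type": "text", "text": "Hello, world2"}
        ]
      }'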
After executing the request, we got the expected result: the 2 messages from the request body arrived in the messenger.

Excellent! So half of the work has already been done: there is a ready-made request that works in manual mode and successfully sends messages to Line. The only thing left is to put the necessary information in the right places and automate the process using JS and Zabbix.

Integration with Zabbix

After successfully completing the tests, go to Zabbix, create a new notification method in the Administration section, select the webhook type, and name it Line.

For webhook integrations with external services, Zabbix uses the built-in Duktape JavaScript engine. Parameters are passed to the script and are used to build the logic of the webhook. The script can also return tags, which will be assigned to the event. This is usually necessary for integrations with service desks, in order to be able to update the status of tickets.

Let’s take a closer look at the webhook setup interface.

The Media type section contains the general settings for the new media type:

  • Name – Name of the media type.
  • Type – The type of media type. There are 4 types: email, SMS, webhook, and script.
  • Parameters – This is a list of variables passed to the code. All necessary data can be passed through parameters: event id, event type, trigger severity, event source, etc. You can specify macros and text values in parameters. The parameters are passed as a JSON string, accessible through the built-in variable value.
  • Script – JS script describing the logic of the webhook.
  • Timeout – The time after which the script will be terminated.
  • Process tags – If this option is enabled, the webhook will support generating tags for events sent using this hook.
  • Include event menu entry – This option makes the Menu Entry Name and Menu Entry URL fields available for use.
  • Menu entry name – The text displayed in the event dropdown menu for the Menu entry URL submitted using this hook.
  • Menu entry URL – A link to an external resource in the event menu.
  • Description – A text field that contains a description of the notification method.
  • Enabled – An option that allows enabling or disabling the media type.

The Message templates section contains templates that are used by webhook to send alerts. Each template contains:

  • Message type – The event type to which the message will apply. For example, Problem – when the trigger fires and Problem recovery – when the problem is resolved.
  • Subject – The headline of the message.
  • Message – A message template that contains useful information about the event. For example, event time, date, event name, host name, etc.

The Options section contains additional options:

  • Concurrent sessions – The number of concurrent sessions to send an alert.
  • Attempts – The number of retries in case of send failure.
  • Attempt interval – The interval between attempts to send an alert in case of failure.

When writing your own webhook, you can take an existing one as a basis – Zabbix has more than thirty ready-made webhook solutions of varying complexity. All basic functions are usually repeated from hook to hook with little or no change at all, as are the parameters passed to them.

Let’s set the following parameters:

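The parameter names below are the ones referenced by the script later in this article. Treat this as a sketch of how parameters map to macros rather than the exact set used in the official integration:

alert_message: {ALERT.MESSAGE}
alert_subject: {ALERT.SUBJECT}
bot_token: <your Line bot token>
event_id: {EVENT.ID}
event_nseverity: {EVENT.NSEVERITY}
event_source: {EVENT.SOURCE}
event_update_status: {EVENT.UPDATE.STATUS}
event_value: {EVENT.VALUE}
send_to: {ALERT.SENDTO}
trigger_description: {TRIGGER.DESCRIPTION}
trigger_id: {TRIGGER.ID}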
It is convenient to set parameter values with macros. A macro is a variable in Zabbix that contains a specific value. Macros allow you to optimize and automate your work. They can be used in various places, such as triggers, filters, alerts, and so on.

A little more about each macro, in order to understand why each of them is needed:

  • {ALERT.SUBJECT} – The subject of the event message. This value is taken from the Subject field of the corresponding Message template type.
  • {ALERT.MESSAGE} – The event message body. This value is taken from the Message field of the corresponding Message template type.
  • {EVENT.ID} – The event ID in Zabbix. It can be used to generate a link to the event.
  • {EVENT.NSEVERITY} – The numerical definition of the event’s severity from 0-5. We will use this to change the message in case of different severity.
  • {EVENT.SOURCE} – The event source. Needed to handle events correctly. In most cases, we are interested in triggers; this corresponds to source value 0.
  • {EVENT.UPDATE.STATUS} – Returns 1 if it is an update event. For example, in case of acknowledge operations or a change in severity.
  • {EVENT.VALUE} – The event state. 0 for recovery and 1 for the problem.
  • {ALERT.SENDTO} – The field from the media type assigned to the user. It returns the ID of the user or group in Line to which the message should be sent.
  • {TRIGGER.DESCRIPTION} – A macro that will be expanded if the event source is a trigger. Returns the description of the trigger.
  • {TRIGGER.ID} – The trigger ID. Required to generate a link to the event in Zabbix.

Webhooks can use other macros if needed. A list of all macros can be viewed on the documentation page. Be careful – not all macros can be used in webhooks.

Writing the script

Before writing the script, let’s define the main things the webhook will need to do:

  • describe the logic for sending messages
  • handle possible errors
  • log information for debugging

I will not describe the entire code in order not to repeat the same type of blocks and concentrate only on important aspects.

To send messages, let’s write a function that will accept messages and params variables. We got the following function:

function sendMessage(messages, params) {
    // Declaring variables
    var response,
        request = new HttpRequest();

    // Adding the required headers to the request
    request.addHeader('Content-Type: application/json');
    request.addHeader('Authorization: Bearer ' + params.bot_token);

    // Forming the request that will send the message
    response = request.post('https://api.line.me/v2/bot/message/push', JSON.stringify({
        "to": params.send_to,
        "messages": messages
    }));

    // If the response is different from 200 (OK), return an error with the content of the response
    if (request.getStatus() !== 200) {
        throw "API request failed: " + response;
    }
}

Of course, this is not a reference function, and depending on the requirements for the request may differ. There may be other required headers and a different request body. In some cases, it may be necessary to add an additional step to obtain authorization data through another API request.

In this case, the request to send a message returns an empty {} object, so it makes no sense to return it from the function. But for example, when sending a message to Telegram, an object with data about this message is returned. If you pass this data to tags, you can write logic that will change the already sent message – for example, in case of closing or updating the problem.

Now let’s describe a function that will accept webhook parameters and validate their values. In the example, we will not describe all the conditions because they are of the same type:

function validateParams(params) {
    // Checking that the bot_token parameter is a string and not empty
    if (typeof params.bot_token !== 'string' || params.bot_token.trim() === '') {
        throw 'Field "bot_token" cannot be empty';
    }

    // Checking that the event_source parameter is only a number from 0-3
    if ([0, 1, 2, 3].indexOf(parseInt(params.event_source)) === -1) {
        throw 'Incorrect "event_source" parameter given: "' + params.event_source + '".\nMust be 0-3.';
    }

    // If an event of type "Discovery" or "Autoregistration" set event_value 1, 
    // which means "Problem", and we will process these events same as problems
    if (params.event_source === '1' || params.event_source === '2') {
        params.event_value = '1';
    }

    ...

    // Checking that trigger_id is a number when the event source is a trigger
    if (isNaN(params.trigger_id) && params.event_source === '0') {
        throw 'field "trigger_id" is not a number';
    }
}

As you can see from the code, in most cases these are simple checks that allow you to avoid errors associated with the input data. Validation is necessary because there is no guarantee that the expected value will be in the parameter.

The main block of code is placed inside the try…catch block in order to correctly handle errors:

try {
    // Declaring the params variable and writing the webhook parameters to it
    var params = JSON.parse(value);

    // Calling the validation function and passing parameters to it for verification
    validateParams(params);

    // If the event is a trigger and it is in the problem status, compose the message body
    if (params.event_source === '0' && params.event_value === '1') {
        var line_message = [
            {
                "type": "text",
                "text": params.alert_subject + 'nn' +
                    params.alert_message + 'n' + params.trigger_description
            }
        ];
    }

    ...

    // Sending a composed message
    sendMessage(line_message, params);

    // Returning OK so that the webhook understands that the script has completed with OK status
    return 'OK';
}
catch (err) {
    // Adding a log function so in case of problems you can see the error in the Zabbix server console
    Zabbix.log(4, '[ Line Webhook ] Line notification failed : ' + err);

    // In case of an error, return it from the webhook
    throw 'Line notification failed : ' + err;
}

Here we assign parameter values to the params variable, then validate them using the validateParams() function, describe the main conditions for generating a message, and send this message to the messenger. At the same time, the try…catch block allows you to catch all errors, log them to Zabbix and return them in a readable form to the user in the web interface.

For writing webhooks in Zabbix, there is a guideline dedicated to this topic. Please read this information because it will help you write better code and avoid common mistakes.

Testing

After we’ve finished with the webhook script, it’s time to test how our code works. To do this, Zabbix provides a function to send test messages. Go to Administration -> Media types, find Line, and click on the Test button opposite it. In the window that appears, fill in all the fields with the necessary data and press the Test button. Check the messenger and see that the message came through with the data we specified in the test.
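As a rough example, the test dialog could be filled in with values like these (all of them are illustrative; the token and recipient ID must be real for the message to arrive):

bot_token: <your Line bot token>
send_to: <user or group ID>
event_source: 0
event_value: 1
event_nseverity: 4
event_update_status: 0
alert_subject: Problem: CPU utilization is too high
alert_message: Problem started at 10:00:00 on 2022.10.01 on host web-01
trigger_id: 12345
trigger_description: CPU utilization on web-01 has exceeded 90%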

Ready-made Line integration can be found in the Zabbix git repository and in all recent Zabbix instance builds.

Troubleshooting

Of course, everything in the article looks like I did it on the first attempt and did not encounter a single error or problem. Naturally, this is not the case in practice. Working with any new product involves research and development. How can you catch errors and, most importantly, understand the problem?

Well, as I wrote earlier – read the documentation and test all requests before writing code. At this stage, it is easiest to catch all the problems. The response to the HTTP request will explicitly describe the error. For example, if you make a mistake in the request body and send an object with incorrect values, the service will return the body with an error description and the response status 400 (Bad request).

There are several options for debugging in case of errors that may occur when writing a webhook script:

  • Focus on the errors displayed when the notification method is executed. For example, if you mistyped or used the wrong name for a function or variable.
  • Include logging in the code for displaying service information. For example, while you are in the script development stage, the result of the function can be logged using the Zabbix.log() function. Zabbix supports 6 debug levels (0-5), which can be set in this function. Usually, webhooks use level 4, which contains information for debugging.
  • Use the zabbix_js utility. You can transfer a file with a script and parameters to it. You can read more about it here.
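For example, a call along these lines runs the webhook script locally with a test parameter string and a 10-second timeout (the file name and parameter values are illustrative):

zabbix_js -s line_webhook.js -p '{"bot_token": "<token>", "send_to": "<user or group ID>", "event_source": "0", "event_value": "1", "alert_subject": "Test subject", "alert_message": "Test message", "trigger_id": "12345", "trigger_description": "Test trigger"}' -t 10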

Conclusion

I hope this article has helped you better understand how webhooks work in Zabbix and highlighted the basic steps for creating, diagnosing, and preparing to write your integration. The Zabbix community is constantly adding custom templates and media types. I expect that after reading this article, more people will be interested in creating their own webhooks and sharing them with the community. We appreciate any contribution to the development and expansion of the base of integration solutions.

Questions

Q: I don’t know JS, but I know other languages. Is native support of other languages planned in Zabbix, such as Python?

A: For now, there are no such plans.

Q: Are there any restrictions with writing a JS script for a webhook?

A: Yes, there are. The built-in Duktape engine is used to execute the code, and it does not have all the functionality that is available in the latest JS releases. Therefore, I recommend that you read the documentation of this engine and the built-in objects to learn more about the available methods.

Monitoring Kubernetes with Zabbix

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/monitoring-kubernetes-with-zabbix/25055/

There are many options available for monitoring Kubernetes and cloud-native applications. In this multi-part blog series, we’ll explore how to use Zabbix to monitor a Kubernetes cluster and understand the metrics generated within Zabbix. We’ll also learn how to exploit Prometheus endpoints exposed by applications to monitor application-specific metrics.

Want to see Kubernetes monitoring in action? Watch the step-by-step Zabbix Kubernetes monitoring configuration and deployment guide.

Why Choose Zabbix to Monitor Kubernetes?

Before choosing Zabbix as a Kubernetes monitoring tool, we asked ourselves, “why would we choose to use Zabbix rather than Prometheus, Grafana, and alertmanager?” After all, they have become the standard monitoring tools in the cloud ecosystem. We decided that our minimum criteria for Zabbix would be that it was just as effective as Prometheus for monitoring both Kubernetes and cloud-native applications.

Through our discovery process, we concluded that Zabbix meets (and exceeds) this minimum requirement. Zabbix provides metrics and triggers for Kubernetes similar to those of Prometheus, Alertmanager, and Grafana, as both rely on the same backend tools to collect them. However, Zabbix can do this in one product while still maintaining flexibility and allowing you to monitor pretty much anything you can write code to collect. Regarding application monitoring, Zabbix can transform Prometheus metrics fed to it by Prometheus exporters and endpoints. In addition, because Zabbix can make calls to any HTTP endpoint, it can monitor applications that do not have a dedicated Prometheus endpoint, unlike Prometheus.

The Zabbix Helm Chart

Zabbix monitors Kubernetes by collecting metrics exposed via the Kubernetes API and kube-state-metrics. The components necessary to monitor a cluster are installed within the cluster using this helm chart provided by Zabbix. The helm chart includes the Zabbix agent installed as a daemon set and is used to monitor local resources and applications on each node. A Zabbix proxy is also installed to collect monitoring data and transfer it to the external Zabbix server.

Only the Zabbix proxy needs access to the Zabbix server, while the agents can send data to the proxy installed in the same namespace as each agent. A cluster role allows Zabbix to access resources in the cluster via the Kubernetes API. While the cluster role could be modified to restrict privileges given to Zabbix, this will result in some items becoming unsupported. We recommend keeping this the same if you want to get the most out of Kubernetes monitoring with Zabbix.

The Zabbix helm chart installs the kube-state-metrics project as a dependency. You may already be familiar with this project under the Kubernetes organization, which generates Prometheus format metrics based on the current state of the Kubernetes resources. In addition, if you have experience using Prometheus to monitor a cluster, you may already have this installed. If that is the case, you can point to this deployment rather than installing another one.

In this tutorial, we will install kube-state-metrics via the Zabbix helm chart.

For more information on skipping this step, refer to the values file in the Zabbix Kubernetes helm chart.

Installing the Zabbix Helm Chart

Now that we’ve explained how the Zabbix helm chart works, let’s go ahead and install it. In this example, we will assume that you have a running Zabbix 6.0 (or higher) instance that is reachable from the cluster you wish to monitor. I am running a 6.0 instance in a different cluster than the one we want to monitor. The server is reachable via the DNS name mdeforest.zabbix.atsgroup.io with a non-standard port of 31103.

We will start by installing the latest Zabbix helm chart. I recommend visiting zabbix.com/integrations/kubernetes to get any sources that may be referred to in this tutorial. There you will find a link to the Zabbix helm chart and templates. For the most part, we will follow the steps outlined in the readme.


Using a terminal window, I am going to make sure the active cluster is set to the cluster that I want to monitor:

kubectl config use-context <cluster context name>

I’m then going to add the Zabbix chart repo to my local helm repository:

helm repo add zabbix-chart-6.0 https://cdn.zabbix.com/zabbix/integrations/kubernetes-helm/6.0/

If you’re running Zabbix 6.2 or newer, change the references to 6.0 in this command to 6.2.

Depending on your circumstances, you will need to set a few values for the installation. In most cases, you only need to set a few environment variables for the Zabbix agent and the proxy. The complete list of values and environment variables is available in the helm chart repo, alongside the agent and proxy images on Docker Hub.

In this case, I’m setting the passive server environment variable for the agent to allow any IP to connect. For the proxy, I am setting the server host that is reachable from the proxy, alongside the non-standard port. I’ve also set some variables related to cache size. The right values may depend on your cluster size, so you may need to experiment to find them.

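As a sketch, a values file along those lines could look like the following. The top-level keys and environment variable names are assumptions based on the chart’s documented options and the standard Zabbix container images, so double-check them against the chart’s values file before using this:

zabbixProxy:
  env:
    # Zabbix server reachable from the proxy (example host and non-standard port from above)
    - name: ZBX_SERVER_HOST
      value: "mdeforest.zabbix.atsgroup.io"
    - name: ZBX_SERVER_PORT
      value: "31103"
    # Cache size may need tuning for larger clusters
    - name: ZBX_CACHESIZE
      value: "128M"
zabbixAgent:
  env:
    # Allow passive checks from any address
    - name: ZBX_PASSIVESERVERS
      value: "0.0.0.0/0"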
Now that I have the values file ready, I’m ready to install the chart. So, we’ll use the following command. Of course, the chart path might vary depending on what version of the chart you’re using.

helm install -f </path/to/values/file> [-n <namespace>] zabbix zabbix-chart-6.0/zabbix-helm-chart

You can also optionally add a namespace. You must wait until everything is running, which I’ll check with the following:

watch kubectl get pods

Now that everything is installed, we’re ready to set up hosts in Zabbix that will be associated with the cluster. The last step before we have all the information we need is to obtain the token created for the service account installed with the helm chart. We’ll get it by running the next command against the secret created for that service account:

kubectl get secret zabbix-service-account -o jsonpath='{.data.token}' | base64 -d

This will get the secret created for the service account and grab just the token from that, which is passed to the base64 utility to decode it. Be sure to copy that value somewhere because you’ll need it for later.

You’ll also need the Kubernetes API endpoint. In most cases, you’ll use the proxy installed rather than the server directly or a proxy outside the cluster. If this is the case, you can use the service DNS for the API. We should be able to reach it by pointing to https://kubernetes.default.svc.cluster.local:443/api.

If this is not the case, you can use the output from the command:

kubectl cluster-info

Now, let’s head over to the Zabbix UI. All the templates we need are shipped in Zabbix 6. If for some reason, you can’t find them, they are available for download and import by visiting the integrations page that I pointed out earlier on the Zabbix site.

Adding the Proxy

We will add our proxy by heading to Administration -> Proxies:

  1. Click Create Proxy. Because this is an active proxy by default, we only need to specify the proxy name. If you didn’t make any changes to the helm chart, this should default to zabbix-proxy. If you’d like to name this differently, you can change the ZBX_HOSTNAME environment variable for the proxy in the helm chart. We’re going to leave it as the default for now. Enter this name and then click “Add.” After a few minutes, the proxy’s last seen time will update, showing that it has connected.
  2. Create a host group for the hosts related to Kubernetes. For this example, we’ll call it Kubernetes.
  3. Head to the host page under configuration and click Create Host. The first host will collect metrics related to monitoring Kubernetes nodes, and we’ll discover nodes and create new hosts using Zabbix low-level discovery.
  4. Give this host the name Kubernetes Nodes. We’ll also assign this host to the Kubernetes host group we created and attach the template Kubernetes nodes by HTTP.
  5. Change the line “Monitored by proxy” to the proxy created earlier, called zabbix-proxy.
  6. Click the Macros tab and select “Inherited and host macros.” You should be able to see all the macros that can be set to influence what is monitored in your cluster. In this case, we need to change the first two macros. The first, {$KUBE.API.ENDPOINT.URL}, should be set to the Kubernetes API endpoint. In our case, we can set it to the address mentioned earlier: https://kubernetes.default.svc.cluster.local:443/api. Next, the token macro should be set to the value we retrieved earlier from the command line.
  7. Click Add. After a few minutes, you should start seeing data on the Latest data page and new hosts on the host page representing each node.

Creating an Additional Host

Now let’s create another host that will represent the metrics available via the Kubernetes API and the kube-state-metrics endpoint.

  1. Click Create Host again, name this host Kubernetes Cluster State, and add it to the Kubernetes group again.
  2. Let’s also attach the Kubernetes Cluster State by HTTP template. Again, we’re going to choose the proxy that we created earlier.
  3. In the Macros section, change {$KUBE.API.URL} to the same value we used before, but this time leave off the /api at the end: https://kubernetes.default.svc.cluster.local:443. Be sure to set the token as we did before.
  4. Assuming nothing else was changed in the installation of the helm chart, we can now add that host.

After a few minutes, you should receive metrics related to the cluster state, including hosts representing the kubelet on each node.

What’s Next?

Now you’re all set to start monitoring your Kubernetes cluster in Zabbix! Give it a try, and let us know your thoughts in the comments.

In the next blog post, we’ll look at what you can do with your newly monitored cluster and how to get the most out of it.

If you’d like help with any of this, ATS has advanced monitoring, orchestration, and automation skills to make this process a snap. Set up a 15-minute call with our team to go through any questions you have.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group.  She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few.  As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group: The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT.  Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries.  They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.