Tag Archives: monitoring

Blending Zabbix and AI with Tomáš Heřmánek

Post Syndicated from Michael Kammer original https://blog.zabbix.com/blending-zabbix-and-ai-with-tomas-hermanek/28832/

Zabbix Summit 2024 is only a few days away, which means that it’s time for the last of our interviews with Summit speakers. Our final chat this year is with Tomáš Heřmánek, the CEO and Founder of initMAX s.r.o. We asked him about his beginnings in the tech industry, how he got started with Zabbix, and how AI will change the game for monitoring in general and Zabbix in particular.

Please tell us a bit about yourself and the journey that led you to initMAX.

My journey in the IT field started with small ISPs and later took a significant leap into the world of Linux and application management, where the need for effective monitoring became evident. I worked for a company that prioritized high-quality open-source solutions, and it was during this time that we adopted Zabbix version 1.8 as a replacement for Nagios, which we found to be inflexible. Shortly after our deployment, Zabbix 2.0 was released. It introduced JMX monitoring, which was crucial for us. Since then, Zabbix has been our go-to solution for monitoring.

I set a personal goal to master this outstanding monitoring system and participated in the first official Zabbix training in the Czech Republic, where I earned my initial certifications as a Zabbix Specialist and Professional on version 3.0. The training experience drew me deeper into the world of Zabbix, especially after meeting a burgeoning group of enthusiasts in the country. I felt compelled to give back to the community that had supported me.

How long have you been using Zabbix? What kind of Zabbix-related tasks does your team tackle on a daily basis?

When I started my own company, becoming a Zabbix partner was a natural choice. To further contribute to the community, I pursued the Expert and Trainer certifications. It was the most challenging 14 days of my life, but it was worth it. For anyone serious about Zabbix, I highly recommend participating in official training sessions and actively engaging with the community through forums, local groups, Telegram, WhatsApp, and blogs. This commitment to supporting and strengthening the community is also why we created our own wiki, which is accessible to everyone without restrictions.

Can you give us a few clues about what we can expect to hear during your Zabbix Summit presentation?

This year, I have prepared a demonstration for the Zabbix Summit showcasing how we integrate AI into our operations, including various modifications to the web interface that allow us to automate and streamline routine tasks. Besides showcasing these innovations, we will also be making some parts of our work available to the public. The main focus of my presentation will be on problem identification, automating the creation of preprocessing steps, and using a chatbot for creating hosts, reading configurations, and making modifications. Essentially, it’s a smart assistant and guide all in one.

The final section, which we find the most challenging, deals with automated event correlation and the creation of a topology, from which correlations are partially derived and evaluated. We are using the new Zabbix 7.0 feature – root cause and symptoms – for visualization in Zabbix. Our goal is to showcase not only the capabilities of Zabbix in combination with AI, but also to contribute back to the community by sharing some of these developments freely.

In your experience, does Zabbix lend itself easily to enhancement via AI?

AI is something that truly fascinates us and is currently shaping the world. From our experience, we believe that the possibilities are limited only by our imagination. In the future, I can envision AI autonomously discovering elements that need to be monitored, integrating them into Zabbix, and configuring everything necessary for effective monitoring.

What changes do you think AI will bring to the world of monitoring in general over the next decade or so?

I foresee a shift in our roles, moving away from traditional IT tasks towards a focus on idea generation, control, and the customization of artificial intelligence. As AI continues to evolve, it will not only enhance automation but also empower us to explore and implement innovative solutions more effectively.

The post Blending Zabbix and AI with Tomáš Heřmánek appeared first on Zabbix Blog.

Exploring Telemetry Events in Amazon Q Developer

Post Syndicated from David Ernst original https://aws.amazon.com/blogs/devops/exploring-telemetry-events-in-amazon-q-developer/

As organizations increasingly adopt Amazon Q Developer, understanding how developers use it is essential. Diving into specific telemetry events and user-level data clarifies how users interact with Amazon Q Developer, offering insights into feature usage and developer behaviors. This granular view, accessible through logs, is vital for identifying trends, optimizing performance, and enhancing the overall developer experience. This blog highlights the key telemetry events logged by Amazon Q Developer and shows how to explore this data to gain insights.

To help you get started, the following sections will walk through several practical examples that showcase how to extract meaningful insights from AWS CloudTrail. By reviewing the logs, organizations can track usage patterns, identify top users, and empower them to train and mentor other developers, ultimately fostering broader adoption and engagement across teams.

Although the examples here focus on Amazon Athena for querying logs, the methods can be adapted to integrate with other tools like Splunk or Datadog for further analysis. Through this exploration, readers will learn how to query the log data to better understand how Amazon Q Developer is used within their organization.

Solution Overview 

[Architecture diagram: Amazon Q Developer logs from the IDE and terminal are captured in AWS CloudTrail, stored in Amazon S3, and queried with Amazon Athena to analyze feature usage, including in-line code suggestions, chat interactions, and security scanning events.]

This solution leverages Amazon Q Developer’s logs from the Integrated Development Environment (IDE) and terminal, captured in AWS CloudTrail. The logs will be queried directly using Amazon Athena from Amazon Simple Storage Service (Amazon S3) to analyze feature usage, such as in-line code suggestions, chat interactions, and security scanning events.

Analyzing Telemetry Events in Amazon Q Developer

Amazon Athena is used to query the CloudTrail logs directly to analyze this data. By utilizing Athena, queries can be run on existing CloudTrail records, making it simple to extract insights from the data in its current format.

Ensure that CloudTrail is set up to log the data events:

  1. Navigate to the AWS CloudTrail Console.
  2. Edit an Existing Trail:
    • If you have a trail, verify it is configured to log data events for Amazon CodeWhisperer.
    • Note: As of 4/30/24, CodeWhisperer has been renamed to Amazon Q Developer. All the functionality previously provided by CodeWhisperer is now part of Amazon Q Developer. However, for consistency, the original API names have been retained. 
  3. Click on your existing trail in CloudTrail. Find the Data Events section and click edit.
    • For CodeWhisperer:
      • Data event type: CodeWhisperer
      • Log selector template: Log all events
  4. Save your changes.
  5. Note your “Trail log location.” This S3 bucket will be used in our Athena setup.

If you don’t have an existing trail, follow the instructions in the AWS CloudTrail User Guide to set up a new trail.

Below is a screenshot of the data events addition:

[Screenshot: configuring CloudTrail data events for CodeWhisperer, with the "Log all events" log selector template selected.]
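If you prefer to script this configuration rather than use the console, the same data event selection can be applied with boto3. The following is only a sketch: the trail name is a placeholder, and AWS::CodeWhisperer::Profile is the CloudTrail resource type corresponding to the CodeWhisperer data events described above.

import boto3

cloudtrail = boto3.client('cloudtrail')

# Log all CodeWhisperer (Amazon Q Developer) data events on an existing trail.
# 'my-existing-trail' is a placeholder; replace it with your trail name.
cloudtrail.put_event_selectors(
    TrailName='my-existing-trail',
    AdvancedEventSelectors=[
        {
            'Name': 'Log all Amazon Q Developer data events',
            'FieldSelectors': [
                {'Field': 'eventCategory', 'Equals': ['Data']},
                {'Field': 'resources.type', 'Equals': ['AWS::CodeWhisperer::Profile']},
            ],
        }
    ],
)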

Steps to Create an Athena Table from CloudTrail Logs: This step aims to turn CloudTrail events into a queryable Athena table.

 1. Navigate to the AWS Management Console > Athena > Editor.

 2. Click on the plus to create a query tab.

 3. Run the following query to create a database and table. Be sure to update the LOCATION to your S3 bucket.

-- Step 1: Create a new database (if it doesn't exist)
CREATE DATABASE IF NOT EXISTS amazon_q_metrics;

-- Step 2: Create the external table explicitly within the new database
CREATE EXTERNAL TABLE amazon_q_metrics.cloudtrail_logs (
    userIdentity STRUCT<
        accountId: STRING,
        onBehalfOf: STRUCT<
            userId: STRING,
            identityStoreArn: STRING
        >
    >,
    eventTime STRING,
    eventSource STRING,
    eventName STRING,
    requestParameters STRING,
    requestId STRING,
    eventId STRING,
    resources ARRAY<STRUCT<
        arn: STRING,
        accountId: STRING,
        type: STRING
    >>,
    recipientAccountId STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://{Insert Bucket Name from CloudTrail}/'
TBLPROPERTIES ('classification'='cloudtrail');

 4. Click Run

 5. Run a quick query to view the data.

SELECT 
    eventTime,
    userIdentity.onBehalfOf.userId AS user_id,
    eventName,
    requestParameters
FROM 
    amazon_q_metrics.cloudtrail_logs AS logs
WHERE 
    eventName = 'SendTelemetryEvent'
LIMIT 10;

This section explains the significance of the telemetry events captured in the requestParameters field. The query above surfaces the key fields and their data, offering insights into how users interact with various features of Amazon Q Developer.

Query Breakdown:

  1. eventTime: This field captures the time the event was recorded, providing insights into when specific user interactions took place.
  2. userIdentity.onBehalfOf.userId: This extracts the userId of the user. This is critical for attributing interactions to the correct user, which will be covered in more detail later in the blog.
  3. eventName: The query is filtered on SendTelemetryEvent. Telemetry events are triggered when the user interacts with particular features or when a developer uses the service.
  4. requestParameters: The requestParameters field is crucial because it holds the details of the telemetry events. Its contents vary with the type of interaction and the feature the developer uses, such as the programming language involved, the completion type, or the code modifications made.

In the context of the SendTelemetryEvent, various telemetry events are captured in the requestParameters field of CloudTrail logs. These events provide insights into user interactions, overall usage, and the effectiveness of Amazon Q Developer’s suggestions. Here are the key telemetry events along with their descriptions:

  1. UserTriggerDecisionEvent
    • Description: This event is triggered when a user interacts with a suggestion made by Amazon Q Developer. It captures whether the suggestion was accepted or rejected, along with relevant metadata.
    • Key Fields:
      • completionType: Whether the completion was a block or a line.
      • suggestionState: Whether the user accepted, rejected, or discarded the suggestion.
      • programmingLanguage: The programming language associated with the suggestion.
      • generatedLine: The number of lines generated by the suggestion.
  2. CodeScanEvent
    • Description: This event is logged when a code scan is performed. It helps track the scope and result of the scan, providing insights into security and code quality checks.
    • Key Fields:
      • codeAnalysisScope: Whether the scan was performed at the file level or the project level.
      • programmingLanguage: The language being scanned.
  3. CodeScanRemediationsEvent
    • Description: This event captures user interactions with Amazon Q Developer’s remediation suggestions, such as applying fixes or viewing issue details.
    • Key Fields:
      • CodeScanRemediationsEventType: The type of remediation action taken (e.g., viewing details or applying a fix).
      • includesFix: A boolean indicating whether the user applied a fix.
  4. ChatAddMessageEvent
    • Description: This event is triggered when a new message is added to an ongoing chat conversation. It captures the user’s intent, which refers to the purpose or goal the user is trying to achieve with the chat message. Intents include actions such as suggesting alternate implementations of the code, applying common best practices, or improving the quality or performance of the code.
    • Key Fields:
      • conversationId: The unique identifier for the conversation.
      • messageId: The unique identifier for the chat message.
      • userIntent: The user’s intent, such as improving code or explaining code.
      • programmingLanguage: The language related to the chat message.
  5. ChatInteractWithMessageEvent
    • Description: This event captures when users interact with chat messages, such as copying code snippets, clicking links, or hovering over references.
    • Key Fields:
      • interactionType: The type of interaction (e.g., copy, hover, click).
      • interactionTarget: The target of the interaction (e.g., a code snippet or a link).
      • acceptedCharacterCount: The number of characters from the message that were accepted.
      • acceptedSnippetHasReference: A boolean indicating if the accepted snippet included a reference.
  6. TerminalUserInteractionEvent
    • Description: This event logs user interactions with terminal commands or completions in the terminal environment.
    • Key Fields:
      • terminalUserInteractionEventType: The type of interaction (e.g., terminal translation or code completion).
      • isCompletionAccepted: A boolean indicating whether the completion was accepted by the user.
      • terminal: The terminal environment in which the interaction occurred.
      • shell: The shell used for the interaction (e.g., Bash, Zsh).

For a full exploration of all event types and their detailed fields, you can refer to the official schema reference for Amazon Q Developer.

Telemetry events are key to understanding how users engage with Amazon Q Developer. They track interactions such as code completion, security scans, and chat-based suggestions. Analyzing the data in the requestParameters field helps reveal usage patterns and behaviors that offer valuable insights.

By exploring events such as UserTriggerDecisionEvent, ChatAddMessageEvent, TerminalUserInteractionEvent, and others in the schema, organizations can assess the effectiveness of Amazon Q Developer and identify areas for improvement.

Example Queries for Analyzing Developer Engagement

To gain deeper insights into how developers interact with Amazon Q Developer, the following queries can help analyze key telemetry data from CloudTrail logs. These queries track in-line code suggestions, chat interactions, and code-scanning activities. By running these queries, you can uncover valuable metrics such as the frequency of accepted suggestions, the types of chat interactions, and the programming languages most frequently scanned. This analysis helps paint a clear picture of developer engagement and usage patterns, guiding efforts to enhance productivity.

These four examples only cover a sample set of the available telemetry events, but they serve as a starting point for further exploration of Amazon Q Developer’s capabilities.

Query 1: Analyzing Accepted In-Line Code Suggestions

SELECT 
    eventTime,
    userIdentity.onBehalfOf.userId AS user_id,
    eventName,
    json_extract_scalar(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') AS suggestionState,
    json_extract_scalar(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.completionType') AS completionType
FROM 
    amazon_q_metrics.cloudtrail_logs
WHERE 
    eventName = 'SendTelemetryEvent'
    AND json_extract(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent') IS NOT NULL
    AND json_extract_scalar(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') = 'ACCEPT';

Use Case: This use case focuses on how developers interact with in-line code suggestions by analyzing accepted snippets. It helps identify which users are accepting suggestions, the type of snippets being accepted (blocks or lines), and the programming languages involved. Understanding these patterns can reveal how well Amazon Q Developer aligns with the developers’ expectations.

Query Explanation: The query retrieves the event time, user ID, event name, suggestion state (filtered to show only ACCEPT), and completion type. Metrics such as totalGeneratedLinesBlockAccept and totalGeneratedLinesLineAccept, as well as rejected or discarded suggestions, are not included, but the results give a picture of which developers are using the service for in-line code suggestions and the lines or blocks they have accepted. Additionally, the programming language field can be extracted to see which languages are used during these interactions, as shown in the sketch below.
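For instance (assuming the userTriggerDecisionEvent payload nests the language under programmingLanguage.languageName, mirroring the codeScanEvent structure used in Query 3), the SELECT list can be extended like this:

SELECT 
    eventTime,
    userIdentity.onBehalfOf.userId AS user_id,
    json_extract_scalar(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.programmingLanguage.languageName') AS programmingLanguage,
    json_extract_scalar(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') AS suggestionState
FROM 
    amazon_q_metrics.cloudtrail_logs
WHERE 
    eventName = 'SendTelemetryEvent'
    AND json_extract_scalar(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') = 'ACCEPT';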

Query 2: Analyzing Chat Interactions

SELECT 
    userIdentity.onBehalfOf.userId AS userId,
    json_extract_scalar(requestParameters, '$.telemetryEvent.chatInteractWithMessageEvent.interactionType') AS interactionType,
    COUNT(*) AS eventCount
FROM 
    amazon_q_metrics.cloudtrail_logs
WHERE 
    eventName = 'SendTelemetryEvent'
    AND json_extract(requestParameters, '$.telemetryEvent.chatInteractWithMessageEvent') IS NOT NULL
GROUP BY 
    userIdentity.onBehalfOf.userId,
    json_extract_scalar(requestParameters, '$.telemetryEvent.chatInteractWithMessageEvent.interactionType')
ORDER BY 
    eventCount DESC;

Use Case: This use case looks at how developers use chat options like upvoting, downvoting, and copying code snippets. Understanding the chat usage patterns shows which interactions are most used and how developers engage with Amazon Q Developer chat. As an organization, this insight can help support other developers in successfully leveraging this feature.

Query Explanation: The query provides insights into chat interactions within Amazon Q Developer by retrieving user IDs, interaction types, and event counts. This query aggregates data based on the interactionType field within chatInteractWithMessageEvent, showcasing various user actions such as UPVOTE, DOWNVOTE, INSERT_AT_CURSOR, COPY_SNIPPET, COPY, CLICK_LINK, CLICK_BODY_LINK, CLICK_FOLLOW_UP, and HOVER_REFERENCE.

This analysis highlights how users engage with the chat feature, offering a view of interaction patterns. By focusing on the interactionType field, you can better understand how developers make use of Amazon Q Developer chat.

Query 3: Analyzing Code Scanning Jobs Across Programming Languages

SELECT 
    userIdentity.onBehalfOf.userId AS userId,
    json_extract_scalar(requestParameters, '$.telemetryEvent.codeScanEvent.programmingLanguage.languageName') AS programmingLanguage,
    COUNT(json_extract_scalar(requestParameters, '$.telemetryEvent.codeScanEvent.codeScanJobId')) AS jobCount
FROM 
    amazon_q_metrics.cloudtrail_logs
WHERE 
    eventName = 'SendTelemetryEvent'
    AND json_extract(requestParameters, '$.telemetryEvent.codeScanEvent') IS NOT NULL
GROUP BY 
    userIdentity.onBehalfOf.userId,
    json_extract_scalar(requestParameters, '$.telemetryEvent.codeScanEvent.programmingLanguage.languageName')
ORDER BY 
    jobCount DESC;

Use Case: Amazon Q Developer includes security scanning, and this section helps determine how the security scanning feature is being used across different users and programming languages within the organization. Understanding these trends provides valuable insights into which users actively perform security scans and the specific languages targeted for these scans.

Query Explanation: The query provides insights into the distribution of code scanning jobs across different programming languages in Amazon Q Developer. It retrieves user IDs and the count of code-scanning jobs by programming language. This analysis focuses on the CodeScanEvent, aggregating data to show the total number of jobs executed per language.

By summing up the number of code scanning jobs per programming language, this query helps to understand which languages are most frequently analyzed. It provides a view of how users are leveraging the code-scanning feature. This can be useful for identifying trends in language usage and optimizing code-scanning practices.

Query 4: Analyzing User Activity Across Features

SELECT 
    userIdentity.onBehalfOf.userId AS user_id,
    COUNT(DISTINCT CASE 
        WHEN json_extract(requestParameters, '$.telemetryEvent.userTriggerDecisionEvent') IS NOT NULL 
        THEN eventId END) AS inline_suggestions_count,
    COUNT(DISTINCT CASE 
        WHEN json_extract(requestParameters, '$.telemetryEvent.chatInteractWithMessageEvent') IS NOT NULL 
        THEN eventId END) AS chat_interactions_count,
    COUNT(DISTINCT CASE 
        WHEN json_extract(requestParameters, '$.telemetryEvent.codeScanEvent') IS NOT NULL 
        THEN eventId END) AS security_scans_count,
    COUNT(DISTINCT CASE 
        WHEN json_extract(requestParameters, '$.telemetryEvent.terminalUserInteractionEvent') IS NOT NULL 
        THEN eventId END) AS terminal_interactions_count
FROM 
    amazon_q_metrics.cloudtrail_logs
WHERE 
    eventName = 'SendTelemetryEvent'
GROUP BY 
    userIdentity.onBehalfOf.userId

Use Case: This use case looks at how developers use Amazon Q Developer across different features: in-line code suggestions, chat interactions, security scans, and terminal interactions. By tracking usage, organizations can see overall engagement and identify areas where developers may need more support or training. This helps optimize the use of Amazon Q Developer and lets teams get the most out of the tool.

Query Explanation: This query ties together the events from the prior queries and adds terminal interactions for a more complete overall picture. It provides a comprehensive view of user activity within Amazon Q Developer by tracking the number of in-line code suggestions, chat interactions, security scans, and terminal interactions performed by each user. By analyzing these events, organizations can gain a better understanding of how developers are using these key features.

By summing up the interactions for each feature, this query helps identify which users are most active in each category, offering insights into usage patterns and areas where additional training or support may be needed.
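The same pattern extends to the events not covered by these four queries. As one more sketch (assuming the userIntent field sits directly under chatAddMessageEvent, as described in the event list above), the following counts chat messages by intent per user:

SELECT 
    userIdentity.onBehalfOf.userId AS user_id,
    json_extract_scalar(requestParameters, '$.telemetryEvent.chatAddMessageEvent.userIntent') AS userIntent,
    COUNT(*) AS messageCount
FROM 
    amazon_q_metrics.cloudtrail_logs
WHERE 
    eventName = 'SendTelemetryEvent'
    AND json_extract(requestParameters, '$.telemetryEvent.chatAddMessageEvent') IS NOT NULL
GROUP BY 
    userIdentity.onBehalfOf.userId,
    json_extract_scalar(requestParameters, '$.telemetryEvent.chatAddMessageEvent.userIntent')
ORDER BY 
    messageCount DESC;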

Enhancing Metrics with Display Names and Usernames

The previous queries exposed userId as a field; however, many customers would prefer to see a user alias (such as a username or display name). The following section illustrates how to enhance these metrics by augmenting user IDs with display names and usernames from AWS IAM Identity Center, providing more human-readable results.

In this example, the export is run locally to enhance user metrics with IAM Identity Center for simplicity. This method works well for demonstrating how to access and work with the data, but it provides a static snapshot of the users at the time of export. In a production environment, an automated solution would be preferable to capture newly added users continuously. For the purposes of this blog, this straightforward approach is used to focus on data access.

To proceed, install Python 3.8+ and Boto3, and configure AWS credentials via the CLI. Then, run the following Python script locally to export the data:

import boto3
import csv

# Replace this with the region of your IAM Identity Center instance
RegionName = 'us-east-1'

# Client creation
idstoreclient = boto3.client('identitystore', region_name=RegionName)
ssoadminclient = boto3.client('sso-admin', region_name=RegionName)

# Look up the Identity Center instance and its identity store
Instances = ssoadminclient.list_instances().get('Instances')
InstanceARN = Instances[0].get('InstanceArn')
IdentityStoreId = Instances[0].get('IdentityStoreId')

# Query all users, following pagination tokens as needed
UserDigestList = []
ListUserResponse = idstoreclient.list_users(IdentityStoreId=IdentityStoreId)
UserDigestList.extend([[user['DisplayName'], user['UserName'], user['UserId']]
                       for user in ListUserResponse['Users']])
NextToken = ListUserResponse.get('NextToken')
while NextToken is not None:
    ListUserResponse = idstoreclient.list_users(IdentityStoreId=IdentityStoreId, NextToken=NextToken)
    UserDigestList.extend([[user['DisplayName'], user['UserName'], user['UserId']]
                           for user in ListUserResponse['Users']])
    NextToken = ListUserResponse.get('NextToken')

# Write the query results to IDCUserInfo.csv (newline='' avoids blank rows on Windows)
with open('IDCUserInfo.csv', 'w', newline='') as CSVFile:
    CSVWriter = csv.writer(CSVFile, quoting=csv.QUOTE_ALL)
    CSVWriter.writerow(['DisplayName', 'UserName', 'UserId'])
    for UserRow in UserDigestList:
        CSVWriter.writerow(UserRow)

This script will query the IAM Identity Center for all users and write the results to a CSV file, including DisplayName, UserName, and UserId. After generating the CSV file, upload it to an S3 bucket. Please make note of this S3 location.
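If you would rather script the upload as well, a single boto3 call does it. The bucket name and key below are placeholders:

import boto3

# Placeholder bucket and key; use a location you will reference in the Athena table
boto3.client('s3').upload_file('IDCUserInfo.csv', 'your-bucket-name', 'idc-user-data/IDCUserInfo.csv')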

Steps to Create an Athena Table from the above CSV output: Create a table in Athena to join the existing table with the user details.

 1. Navigate to the AWS Management Console > Athena > Editor.

 2. Click on the plus to create a query tab.

 3. Run the following query to create our table. Be sure to update the LOCATION to your S3 bucket.

CREATE EXTERNAL TABLE amazon_q_metrics.user_data (
    DisplayName STRING,
    UserName STRING,
    UserId STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   'separatorChar' = ',',
   'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION 's3://{Update to your S3 object location}/'  -- Path containing CSV file
TBLPROPERTIES ('skip.header.line.count'='1');

 4. Click Run

 5. Now, let’s run a quick query to verify the data in the new table.

SELECT * FROM amazon_q_metrics.user_data LIMIT 10;

The first query creates an external table in Athena from user data stored in a CSV file in S3. The user_data table has three fields: DisplayName, UserName, and UserId. To ensure correct parsing of the CSV, separatorChar is set to a comma and quoteChar to a double quote. Additionally, the TBLPROPERTIES ('skip.header.line.count'='1') setting skips the header row in the CSV file, ensuring that column names aren’t treated as data.

The user_data table holds key details: DisplayName (full name), UserName (username), and UserId (unique identifier). This table will be joined with the cloudtrail_logs table using the userId field from the onBehalfOf struct, enriching the interaction logs with human-readable user names and display names instead of user IDs.

In the previous analysis of in-line code suggestions, the focus was on retrieving key metrics related to user interactions with Amazon Q Developer. The query below follows a similar structure but now includes a join with the user_data table to enrich insights with additional user details such as DisplayName and Username.

To include a join with the user_data table in the query, it is necessary to define a shared key between the cloudtrail_logs and user_data tables. For this example, userId will be used.

SELECT 
    logs.eventTime,
    user_data.displayname,  -- Additional field from user_data table
    user_data.username,     -- Additional field from user_data table
    json_extract_scalar(logs.requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') AS suggestionState,
    json_extract_scalar(logs.requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.completionType') AS completionType
FROM 
    amazon_q_metrics.cloudtrail_logs AS logs  -- Specified database for cloudtrail_logs
JOIN 
    amazon_q_metrics.user_data  -- Specified database for user_data
ON 
    logs.userIdentity.onBehalfOf.userId = user_data.userid
WHERE 
    logs.eventName = 'SendTelemetryEvent'
    AND json_extract_scalar(logs.requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') = 'ACCEPT';

This approach allows for a deeper analysis by integrating user-specific information with the telemetry data, helping you better understand how different user roles interact with the in-line suggestions and other features of Amazon Q Developer.
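Building on the same join, the data can also be rolled up per person. As one possible variation using the same tables and join key, the following counts accepted in-line suggestions by display name:

SELECT 
    user_data.displayname,
    user_data.username,
    COUNT(*) AS accepted_suggestions
FROM 
    amazon_q_metrics.cloudtrail_logs AS logs
JOIN 
    amazon_q_metrics.user_data
ON 
    logs.userIdentity.onBehalfOf.userId = user_data.userid
WHERE 
    logs.eventName = 'SendTelemetryEvent'
    AND json_extract_scalar(logs.requestParameters, '$.telemetryEvent.userTriggerDecisionEvent.suggestionState') = 'ACCEPT'
GROUP BY 
    user_data.displayname,
    user_data.username
ORDER BY 
    accepted_suggestions DESC;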

Cleanup

If you have been following along with this workflow, it is important to clean up the resources to avoid unnecessary charges. You can perform the cleanup by running the following query in the Amazon Athena console:

-- Step 1: Drop the tables
DROP TABLE IF EXISTS amazon_q_metrics.cloudtrail_logs;
DROP TABLE IF EXISTS amazon_q_metrics.user_data;

-- Step 2: Drop the database after the tables are removed
DROP DATABASE IF EXISTS amazon_q_metrics CASCADE;

This query removes both the cloudtrail_logs and user_data tables, followed by the amazon_q_metrics database.

Remove the S3 objects used to store the CloudTrail logs and user data by navigating to the S3 console, selecting the relevant buckets or objects, and choosing “Delete.”

If a new CloudTrail trail was created, consider deleting it to stop further logging. For instructions, see Deleting a Trail. If an existing trail was used, remove the CodeWhisperer data events to prevent continued logging of those events.

Conclusion

By tapping into Amazon Q Developer’s logging capabilities, organizations can unlock detailed insights that drive better decision-making and boost developer productivity. The ability to analyze user-level interactions provides a deeper understanding of how the service is used.

Now that you have these insights, the next step is leveraging them to drive improvements. For example, organizations can use this data to identify opportunities for Proof of Concepts (PoCs) and pilot programs that further demonstrate the value of Amazon Q Developer. By focusing on areas where engagement is high, you can support the most engaged developers as champions to advocate for the tool across the organization, driving broader adoption.

The true potential of these insights lies in the “art of the possible.” With the data provided, it is up to you to explore how to query or visualize it further. Whether you’re examining metrics for in-line code suggestions, interactions, or security scanning, this foundational analysis is just the beginning.

As Amazon Q Developer continues to evolve, staying updated with emerging telemetry events is crucial for maintaining visibility into the available metrics. You can do this by regularly visiting the official Amazon Q Developer documentation and the Amazon Q Developer changelog to stay up to date with the latest information and insights.

About the authors:

David Ernst

David Ernst is an AWS Sr. Solution Architect with a DevOps and Generative AI background, leveraging over 20 years of IT experience to drive transformational change for AWS’s customers. Passionate about leading teams and fostering a culture of continuous improvement, David excels in architecting and managing cloud-based solutions, emphasizing automation, infrastructure as code, and continuous integration/delivery.

Joe Miller

Joseph Miller is an AWS Software Engineer working to illuminate Q usage insights. He specializes in Distributed Systems and Big Data applications. Joseph is passionate about high-performance distributed computing and is proficient in C++, Java, and Python. In his free time, he skis and rock climbs.

My Zabbix is down, now what? Restoring Zabbix functionality

Post Syndicated from Aurea Araujo original https://blog.zabbix.com/my-zabbix-is-down-now-what/28776/

We’ve all been in a situation in which Zabbix was somehow unavailable. It can happen for a variety of reasons, and our goal is always to help you get everything back up and running as quickly as possible. In this blog post, we’ll show you what to do in the event of a Zabbix failure, and we’ll also go into detail about how to work with the Zabbix technical support team to resolve more complex issues.

Step by step: Understanding why Zabbix is unavailable

When Zabbix becomes unavailable, it’s important to follow a few key steps to try to resolve the problem as quickly as possible.

  1. Check the service status. First, verify whether your Zabbix service is truly inactive. You can do this by accessing the machine where Zabbix is installed and checking the service status using a command like systemctl status zabbix-server on Linux.
  2. Analyze the Zabbix logs. Check the Zabbix logs for any error messages or clues about what may have caused the failure.
  3. Restart the service. If the Zabbix service has stopped, try restarting it using the appropriate command for your operating system. For example, on Linux, you can use sudo systemctl restart zabbix-server.
  4. Check the database connectivity. Zabbix uses a database to store data and Zabbix server configurations. Make sure that the database is accessible and functioning properly. You can test database connectivity using tools like ping or telnet.
  5. Check your available disk space. Verify that there is available disk space on the machine where Zabbix is installed. A lack of disk space is a common cause of system failures.
  6. Evaluate dependencies. Make sure all Zabbix dependencies are installed and working correctly. This includes libraries, services, and any other software required for Zabbix to function.

If the problem persists after carrying out these steps, it may be necessary to refer to the official Zabbix documentation, seek help from the official Zabbix forum, or contact the Zabbix technical support team, depending on the severity and urgency of the situation.

Making the most of a Zabbix technical support contract

If you or your company have a Zabbix technical support contract, access to our global team of technical experts is guaranteed. This is an ideal option for resolving more complex or urgent issues. Here are a few steps you can follow when contacting the Zabbix technical support team:

  1. Gather all important information. Before contacting the Zabbix technical support team, gather all relevant information about the issue you’re facing. This can include error messages, logs, screenshots, and any steps you’ve already taken to try to resolve the issue.
  2. Open a ticket with the Zabbix technical support team. Contact Zabbix technical support by opening a ticket on the Zabbix Support System. Provide all the information gathered in the previous step to help the technicians understand the problem and find a solution as quickly as possible.
  3. Explain exactly how Zabbix crashed. When describing the problem, be as precise and detailed as possible. Include information such as the Zabbix version you are using, your operating system, your network configuration, and any other relevant details that might help our team diagnose the issue.
  4. Be available to follow up on the ticket. Once you’ve opened a ticket, be available to provide additional information or clarify any questions the support technicians may have. This will help speed up the problem resolution process.
  5. Follow the Zabbix technical support team’s recommendations. After receiving recommendations, follow them carefully and test to see whether they resolve the issue. If the problem persists or if new issues arise, inform the Zabbix technical support team immediately so they can continue assisting you.

A Zabbix technical support subscription gives you access to a team of Zabbix experts who can help you configure and troubleshoot your Zabbix environment. Check out the benefits of each type of subscription on the Zabbix website and make sure you have all the support you need to keep your monitoring fully operational.

 

The post My Zabbix is down, now what? Restoring Zabbix functionality appeared first on Zabbix Blog.

Monitoring MariaDB Clusters and MaxScale with Anders Karlsson

Post Syndicated from Michael Kammer original https://blog.zabbix.com/monitoring-mariadb-clusters-and-maxscale-with-anders-karlsson/28718/

The heart and soul of a Zabbix Summit is the wide range of expert speakers who show up each year to share their experience, knowledge, and discoveries. Accordingly, we’re continuing our series of interviews with Summit 2024 speakers by having a chat with MariaDB Sales Engineer Anders Karlsson. He’ll grace our stage at Summit 2024 to talk about his four decades of work experience and share how he uses a variety of Zabbix features to monitor MariaDB clusters and MariaDB MaxScale.

Please tell us a bit about yourself and the journey that led you to MariaDB.

I have been working with databases nearly all of my professional life, which is more than 40 years by now. My first IT job was as a system administrator on a development system for telco equipment running UNIX on a PDP-11/70. This was fun, and I got to use UNIX very early (in the early 1980s). I was also there at the start of the Internet, emailing through UUCP to the US and then through what was then the Internet.

Following that, I joined another Telco company, which used a rather unknown database technology called Oracle (version 4.1.4). When this company moved their operations from Stockholm (where I lived) to Luxembourg, I decided to leave and look for other opportunities. I heard that Oracle was looking for people and I got a job there as a support engineer. At Oracle I soon got involved with lots of things beyond Tech Support – I was a trainer, a consultant, and eventually a sales engineer.

I left Oracle in the early 1990’s to join a small application development company as a developer, but this really wasn’t for me, so I soon left and joined Informix instead. I was at Informix until 1996 or so and then I worked for some other small companies around the end of the millennium. Next, I joined forces with a couple of old friends to develop a database solution. This wasn’t very successful, and I still needed a job.

I first ended up with TimesTen before they ran out of luck. After a year or so of freelancing, I was approached by an old friend from the Informix days who was now the sales manager for MySQL in Scandinavia. I joined MySQL in 2004 as a sales engineer and was there until Oracle took over. I then worked for a small Swedish startup for a couple of years, but I missed sales engineering, so when I got an offer to join MariaDB in 2012 I said yes.

How long have you been using Zabbix? What kind of Zabbix tasks do you get up to on a daily basis?

I have known about Zabbix and used it occasionally for a while, but while preparing for Zabbix Summit 2024 I have gotten to use it “in anger” a bit more. There are pros and cons to it, but in general I like it. It does have a lot of “Open Source” feel to it, but that is not really an issue for me.

Can you give us a few clues about what we can expect to hear during your Zabbix Summit presentation?

I will focus on monitoring MariaDB clusters running Galera Cluster and the MariaDB MaxScale database proxy. Monitoring individual MariaDB servers is easy out of the box with Zabbix, but when you have a cluster you also have to monitor certain cluster-wide attributes. MariaDB MaxScale keeps detailed track of the state of each server in the cluster and of the cluster as a whole, and I will show how to pull cluster-wide data from MaxScale using the MaxScale REST/JSON API and how to use that to build triggers and graphs in Zabbix. I will finish up by doing a demo of this with MariaDB MaxScale and a Galera Cluster.
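As a minimal sketch of the kind of call involved (assuming MaxScale’s default REST port 8989 and its default admin credentials, both of which you would change in practice), pulling server states from the MaxScale REST API might look like this:

import requests

# Query MaxScale's REST API for the servers it monitors.
# Host, port, and credentials are assumptions based on MaxScale defaults.
resp = requests.get('http://maxscale-host:8989/v1/servers',
                    auth=('admin', 'mariadb'))
resp.raise_for_status()

for server in resp.json()['data']:
    # Each entry carries the server name and its cluster state,
    # e.g. "Master, Synced, Running" for a healthy Galera node
    print(server['id'], server['attributes']['state'])

From there, the same JSON can be fetched with a Zabbix HTTP agent item and picked apart with JSONPath preprocessing to feed triggers and graphs.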

What led you to the topic of Monitoring MariaDB Clusters and MariaDB MaxScale with Zabbix?

The main thing was that although there are community-provided Zabbix templates for MariaDB MaxScale, and Galera can be monitored largely by the Zabbix agent, using these typically does not provide as much in terms of cluster-wide monitoring as I would like. It’s important to know how the reads and writes are distributed, what the state of the database cluster is, etc.

How do you see the role of Zabbix in MariaDB in the near future? Are you planning to use it for any other new tasks?

My next goal is to see if I can write a blog for MariaDB on Zabbix monitoring with some emphasis on MariaDB MaxScale.

The post Monitoring MariaDB Clusters and MaxScale with Anders Karlsson appeared first on Zabbix Blog.

Providing Best-in-Class Security with Heather Diaz of fTLD Registry

Post Syndicated from Michael Kammer original https://blog.zabbix.com/providing-best-in-class-security-with-heather-diaz-of-ftld-registry/28585/

As the Vice President of Compliance and Policy at fTLD Registry Services, Heather Diaz is a security expert with over a decade of experience in ensuring the legal, compliance, and strategic alignment of the top-level domains .Bank and .Insurance. She is a compliance and ethics professional and leads the policy and security compliance functions at fTLD.

We sat down with her to learn more about how Zabbix makes her job easier, why she appreciates the inherent flexibility of our solutions, and how she works with our team to help make sure fTLD’s domains are as secure as they can possibly be.

Can you give us a bit of background on fTLD and what it does? What makes your business proposition stand out?

fTLD Registry is the domain authority for .Bank and .Insurance – the most trusted and exclusive domain extensions for banks, insurers, and producers. Our mission is to offer these industry-created and governed domains a shield against cyberattacks and fraud, delivering peace of mind with website and email security.

Since 2011, fTLD Registry has collaborated with experts in cybersecurity, domain security, and the banking and insurance sectors to develop Security Requirements that mitigate cyber threats such as phishing, spoofing, cybersquatting, and man-in-the-middle attacks.

Why is monitoring especially important for fTLD?

Security monitoring is a key value for .Bankers (banks who have switched to .Bank) and our .Insurance customers as well. They receive reporting from our customized Zabbix monitoring system whenever security vulnerabilities are detected. This ensures we provide proactive compliance security monitoring, which allows them to address any findings and keep their .Bank and .Insurance websites and email channels secure.

Are there any specific points you were looking to address with a new monitoring approach?

fTLD has continued to enhance our security requirements for .Bank and .Insurance to address new and evolving cybersecurity threats and provide more secure and trusted online interactions for the financial services sector and their customers. We do this by partnering with Zabbix’s security experts and engineers to make sure our security requirements and monitoring continue to provide best-in-class domain security for .Bank and .Insurance.

Can you please share any business or operational areas that have seen improvements since implementing Zabbix?

Our compliance area has enjoyed having time to engage with .Bank and .Insurance customers to educate them about how to address any security vulnerabilities, as the Zabbix system takes care of sending notifications and warnings to our customers. Not only that, the Zabbix system gives us a dashboard with easy-to-interpret metrics, the ability to generate ad-hoc reporting, and a number of important data elements integrated, such as customer contact information and domain status (e.g., live), so our team always has secure access to security monitoring data no matter where in the world we are working. Here are just some of the external interfaces, Agent2 plugins, and custom notifications we developed together with the Zabbix team.

External interfaces:

  • ICANN CZDS (to get a list of zones)
  • Whois (to get zone and registrar details)
  • CRM (to get a list of verification contacts)
  • Marketing system (to get a list of additional zone details)
  • Subdomain discoverer (to discover zone records)

Agent2 plugins:

  • DNSSEC plugin (for DNSSEC-related checks)
  • Nameservers plugin (to perform nameserver-related checks)
  • Certificate plugin (to validate TLS ciphers and certificates)
  • Port check plugin (to check what ports are open and verify the security of opened ports)
  • DMARC/SPF plugin (to check presence and validity of DMARC and SPF records)
  • Web redirect plugin (to check validity of HTTP headers and redirects)

Notifications:

  • Media types to send compliance reports

Is there anything else you’d like to share about Zabbix and our capabilities?

Zabbix is a great partner for security monitoring, as they’re willing to develop new features to provide a service that meets our exacting business requirements and their support is highly responsive. Most solutions come as they are. With Zabbix, we were able to customize and adapt their solution when new needs came up. My favorite feature is how we provide automated reporting to our customers and key stakeholders – it’s all automated and handled by the Zabbix platform.

The post Providing Best-in-Class Security with Heather Diaz of fTLD Registry appeared first on Zabbix Blog.

Reducing Alert Noise with Birol Yildiz

Post Syndicated from Michael Kammer original https://blog.zabbix.com/reducing-alert-noise-with-birol-yildiz/28643/

Zabbix Summit 2024 is almost here, and we’re giving you a sneak peek into what you can expect to see on our main stage this year via a series of short interviews with a few of the eminent speakers who will grace us with their presence. First up is Birol Yildiz, the CEO and Co-founder of ilert GmbH and a man who is deeply passionate about keeping alert noise and fatigue to a minimum.

Please tell us a bit about yourself and the journey that led you to ilert GmbH.

My journey in the tech industry began with a deep passion for creating solutions that simplify and improve the lives of IT professionals. Before co-founding ilert GmbH, I spent over a decade working in various IT roles, ranging from software development to operations. I noticed that while monitoring systems were becoming increasingly sophisticated, the process of alert management and incident response was lagging behind.

This gap inspired me to create ilert, a platform focused on bridging that divide by optimizing alerting processes and reducing response times. Our goal at ilert has always been to empower teams with the tools they need to stay ahead of incidents, ensuring that their systems run smoothly and efficiently.

How long have you been using Zabbix? What kind of Zabbix-related tasks are you involved in on a daily basis?

Zabbix has been an integral part of ilert since 2018, when we first developed one of our early integrations with the platform. Recognizing its popularity among our customer base, we enhanced this integration in 2020, transforming it into a native integration and solidifying our partnership with Zabbix as a technology partner. Since then, Zabbix has become one of the most popular integrations within ilert.

On a daily basis, my involvement with Zabbix includes overseeing the continued optimization of our integration, ensuring that it meets the evolving needs of our users. I work closely with our development and support teams to identify and implement improvements based on user feedback and the latest developments from Zabbix.

Can you give us a few clues about what we can expect to hear during your Zabbix Summit presentation?

Alert fatigue has long been a significant challenge for the DevOps community, often leading to decreased efficiency and increased stress among professionals. In my presentation, we will explore innovative strategies that leverage AI to mitigate alert noise.

I’ll be discussing how to maximize the efficiency of your incident response process by leveraging Zabbix with advanced alerting and on-call management tools like ilert. I’ll share insights on reducing alert fatigue, improving incident response times, and ensuring that critical alerts reach the right people at the right time.

This talk will be particularly valuable for DevOps engineers looking to optimize their alert management systems and reduce the cognitive load caused by alert fatigue. Zabbix administrators will find it insightful, especially if they are interested in integrating advanced AI techniques into their monitoring workflows to achieve better performance and reliability.

Moreover, AI and machine learning enthusiasts will gain practical knowledge about applying AI in IT monitoring and alerting, making this session a comprehensive resource for anyone looking to advance their alert management strategies.

Reducing alert noise is something that’s on almost everyone’s wish list, but was there any particular incident or aspect of your professional life that made you want to focus on this topic?

Absolutely. There was a specific incident early in my career that left a lasting impact on me. We were using a monitoring system that generated a significant number of alerts, most of which were non-critical. One weekend, a critical issue was buried in a flood of low-priority alerts, leading to a delayed response and significant downtime for the business.

This incident underscored the importance of not just having a monitoring system in place but ensuring that it was configured to minimize noise and prioritize what truly matters. That experience drove me to focus on creating solutions that help teams filter out the noise and respond quickly to what’s really important, which is a core principle behind ilert’s offerings.

Are there any other similar issues that you can envision tackling with Zabbix?

Yes, beyond reducing alert noise, there’s a lot of potential in enhancing the collaboration between teams during incidents. For example, automating incident communication and resolution processes is an area where I see great value. By integrating Zabbix with incident management platforms like ilert, teams can not only reduce noise but also streamline communication, ensuring that the right people are involved at the right time and that resolution steps are clear and actionable.

Another area is optimizing the way multiple on-call teams work together using Zabbix and incident response platforms like ilert. In many organizations, different teams are responsible for specific sets of host groups in Zabbix, and it’s crucial that each team only receives alerts for the services they are directly responsible for. These are just a few examples of how we can continue to evolve our approach to incident management in conjunction with Zabbix.

The post Reducing Alert Noise with Birol Yildiz appeared first on Zabbix Blog.

Monitoring Self-Service Markets with Zabbix and IoT

Post Syndicated from Aurea Araujo original https://blog.zabbix.com/monitoring-self-service-markets-with-zabbix-and-iot/28422/

QU!CK Scan & Go, a startup specializing in self-service markets, required a monitoring system that could allow a comprehensive view of operations. Read on to see how Zabbix provided them with a solution that positively impacted their operations as well as their finances.

The convenience of having access to an establishment supplying staple foods around the clock is the motivating factor behind the rise of QU!CK Scan & Go. Since 2021, QU!CK Scan & Go has been developing self-service mini market systems, available in residential complexes and corporate buildings.

Available 24 hours a day, 7 days a week, the technology developed by QU!CK Scan & Go allows markets to be open at all times, with 100% self-service. Customers select the products they want, confirm the price by scanning a barcode, and complete the purchase in their own app with a credit card or virtual wallet.

QU!CK Scan & Go was the first company in the self-service market segment to operate in Argentina. As of this writing, they have 25 self-service stores located in Argentina and 2 in the United States.

The challenge

With the rapid growth in their business, QU!CK Scan & Go needed to be able to easily visualize operations in order to handle environmental issues and avoid product loss due to external factors. In the event of a power outage, for instance, refrigerators and freezers will fail to function, a problem that may take considerable time and effort to fix.

This scenario isn’t an abstract hypothetical – power outages are a recurring issue in Argentina. In 2021 and 2022, the average length of a power outage was 5 hours. For freezers storing products such as ice cream, frozen processed foods, and other perishable items, that’s more than enough time for the products to thaw and become unusable, resulting in severe financial losses.

The solution

QU!CK Scan & Go’s search for a solution led them to Zabbix by way of CTL Information Technology, a Zabbix Certified Partner in Argentina. Juan Guido Camaño, CEO of CTL, immediately saw that Zabbix was the perfect fit for what QU!CK Scan & Go needed to monitor.

“Zabbix was our first, second and third choice, due to our extensive experience with the tool. We did not believe that there would be any better alternative.”

– Juan Guido Camaño, CEO of CTL

At the beginning of the implementation project, CTL identified all possible variables necessary for monitoring that should generate alarms in the case of an extraordinary event. These included:

  • Power outages
  • Internet connection status
  • Opened doors
  • Ambient and air conditioning temperatures
  • Refrigerator and freezer temperatures

In 2021 and 2022, the team at CTL carried out the proof of concept and the implementation of the tool in the first self-service markets, following a stage-by-stage plan.

First, they configured the Zabbix agent on the monitoring device. After that, they created a standard monitoring model to be used in all establishments, according to data collection and alarm triggering needs. The alarms were subsequently adjusted, with possible responses implemented according to each variable identified. At that point, data visualization was organized in an external system just for reviewing the integrated dashboards.

Thanks to the implementation of IoT devices to control the temperature and the opening and closing of doors, alerts are sent to Zabbix in the event of unusual activity, such as very high or low temperatures, doors opened without supervision, and refrigerator doors open longer than the stipulated time, among other issues.
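As an illustration of how such an alert can be expressed in Zabbix (the host name, item key, and threshold below are hypothetical), a trigger for a freezer running too warm might look like:

min(/QUICK-Store-01/freezer.temp,10m)>-15

Using min() over a 10-minute window means the trigger fires only when every reading in that window is above -15 °C, so a brief defrost cycle or door opening does not raise a false alarm.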

The results

Since the implementation of the Zabbix project in QU!CK Scan & Go’s self-service markets, a variety of benefits have been apparent, including:

  • Increased control of self-service establishments
  • Faster resolution of incidents
  • Improved visualization of operations
  • Increased availability of services

However, the biggest returns on investment were observed at the financial level. With power outage monitoring and quick corrective actions, losses of perishable products have decreased by 75%.

“Losses of refrigerated products ceased to be an issue due to constant monitoring and immediate alerts in case of incidents during power outages.” – Juan Guido Camaño, CEO of CTL

Additionally, with real-time visualization of operations and business monitoring, the profitability of refrigerated products during power outage incidents has increased by 100%. Currently, QU!CK Scan & Go is the leading company in the self-service market segment in Argentina in terms of turnover, with a rapidly increased brand value.

“In a 100% self-service business model, investments made in incident identification technologies have a direct impact on the company’s results.” – Marcos Acuña, QU!CK Scan & Go

What’s next

While successful, the Zabbix project carried out by CTL and QU!CK Scan & Go is far from finished. The implementation of Zabbix in the company is accelerating at the same rate that new establishments are opened, and the proposal is to continue expanding this monitoring project by completely migrating data visualization to Zabbix.

“Having already managed to ensure the availability of the services associated with QU!CK operations, we are now focusing on the continuous infrastructure optimization.” – Juan Guido Camaño, CEO of CTL

For QU!CK Scan & Go, Zabbix has become much more than an IT infrastructure monitoring provider. Our solutions have improved their business and brought added value to their brand.

“With Zabbix, the return on investment after opening a new location is achieved 50% faster than it used to be.” – Marcos Acuña, Founder of QU!CK Scan & Go

Our goal of promoting seamless services to the technology market together with our partners is most visible in situations like this one, when we’re able to go beyond basic monitoring and position Zabbix as a vital support service for strategic decision making. To find out more about what Zabbix can do for customers in the retail sector, visit us here.

The post Monitoring Self-Service Markets with Zabbix and IoT appeared first on Zabbix Blog.

Case Study: Enhancing Security with Zabbix and fTLD Registry

Post Syndicated from Michael Kammer original https://blog.zabbix.com/case-study-enhancing-security-with-zabbix-and-ftld-registry/28415/

A top-level domain (TLD) is the part of a URL that comes after the last dot in a domain name. While most are familiar with the first TLDs of .com, .net, and .org, there are more than 1,400 TLDs. fTLD Registry (fTLD) is a global coalition of banks, insurance companies, and financial services trade associations who ensure the .Bank and .Insurance TLDs are governed in the best interests of the financial sector and their customers.

The challenge

In 2011, fTLD was formed to secure and manage .Bank and .Insurance. Due to the high risk of fraud in the financial sector, keeping domains (websites and email) secure and out of the hands of malicious actors was paramount – and that can’t be done without close, careful security monitoring. Unfortunately, fTLD was initially dependent on a monitoring solution that required manual compliance work, which made it difficult to get actionable information to its customers and partners. When they began to seek out a replacement solution, fTLD realized that Zabbix promised exactly the features they required, which prompted them to make the switch.

The solution

For every domain in .Bank and .Insurance that meets minimum technical requirements, Zabbix’s system performs multiple security compliance checks. These checks cover a range of domain security features to ensure .Bank and .Insurance websites and email services have implemented a multi-layered domain defense by way of the Security Requirements required by fTLD. Specifically, Zabbix checks and monitors for:

  • Authoritative name servers, which guarantee that the name servers for .Bank and .Insurance websites have the required security features.
  • Enhanced DNS security, which involves the proper validation of DNS Security Extensions (DNSSEC) with strong cryptographic algorithms to prevent unauthorized changes to domain data and cyberattacks, including domain spoofing and domain hijacking.
  • Digital identity and robust encryption, which confirm TLS certificates and TLS version requirements for secure web connections and encrypt all communications for the safe and secure transmission of personal information and financial transactions.
  • Email security, which increases the deliverability of email and checks for the deployment of DMARC and SPF to protect against phishing and spoofing.
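
As an illustration of how checks like these can be expressed (the domain is hypothetical and these are standard Zabbix item keys shown for the sake of example, not fTLD’s actual configuration):

DMARC record: net.dns.record[,_dmarc.example.bank,TXT]
SPF record: net.dns.record[,example.bank,TXT]
TLS certificate: web.certificate.get[example.bank,443] (Zabbix agent 2)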

When Zabbix detects an issue, it automatically notifies involved parties, including the registrar and the customer using the domain. As a client, fTLD has access to all the security monitoring data via a custom dashboard. Zabbix puts critical compliance security monitoring information at fTLD’s fingertips, helping them make good on their promise of airtight security for banks, insurers, and producers and their customers through .Bank and .Insurance domains.

The results

Heather Diaz, Vice President, Compliance and Policy, leads the security function for fTLD and attests that:

“With Zabbix as a partner, we have peace of mind knowing that domain security is closely monitored. We can then focus on engaging with customers to help them get the full cyber benefits of using .Bank and .Insurance to protect their brand and their customer data.”

By entrusting Zabbix with security monitoring, fTLD has seen a variety of benefits, including:

  • Considerable growth in overall security compliance, as Zabbix monitoring has provided better, more accessible, and more reliable security information.
  • A tangible boost in productivity, thanks to automated customer and partner notifications.
  • A bird’s-eye view of stats across all domains as well as detailed information for individual domains.
  • Adaptive compliance security monitoring through daily checks, which help maintain a proactive defense against cyberattacks.
  • Security expertise from Zabbix, which helps ensure that fTLD’s Security Requirements reflect best practices and keep .Bank and .Insurance domains worthy of their customers’ well-placed trust.

In conclusion

fTLD is changing the way banks, insurers, and producers around the world interact with their customers by offering trusted, verified, more secure domains. They trust Zabbix to guarantee a multi-layered domain defense strategy by alerting fTLD and its customers to detected anomalies or security issues.

To learn more about what Zabbix can do for customers in banking and finance, visit us here.

The post Case Study: Enhancing Security with Zabbix and fTLD Registry appeared first on Zabbix Blog.

Monitoring Configuration Backups with Zabbix and Oxidized

Post Syndicated from Brian van Baekel original https://blog.zabbix.com/monitoring-configuration-backups-with-zabbix-and-oxidized/28260/

As a Zabbix partner, we help customers worldwide with all their Zabbix needs, ranging from building a simple template all the way to (massive) turn-key implementations, trainings, and support contracts. Quite often during projects, we get the question, “How about making configuration backups of our network equipment? We need this, as another tool was also capable of doing this!”

The answer is always the same: yes, but no. Yes, technically it is possible to store configuration backups in Zabbix, and it’s not even that hard to set up initially. However, you really should not want to, because Zabbix is not made for configuration backups, and you will run into its limitations within minutes. As you can imagine, customers are never happy with this answer, and some actively question where those limitations lie to see whether they would apply in their environment as well. So we simply set up an SSH agent item and get that config out:
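
A minimal sketch of such an item, with the credentials stored in user macros (the macro names and key parameters are assumed):

Type: SSH agent
Key: ssh.run[fortigate.config]
Authentication method: Password
User name: {$SSH_USER}
Password: {$SSH_PASS}
Executed script: show full-configuration
Update interval: 1h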

Voila! Once per hour Zabbix will log in to that device, execute the command ‘show full-configuration,’ and get back the result. Frankly, it just works. You check the Monitoring -> Latest data section of the host and see that there is data for this item. Problem solved, right?

No. As a matter of fact, this is where the problems start. Zabbix allows us to store up to 64KB of data in an item value. The above screenshot is of a (small) FortiGate firewall, and its config, stored as a text file, is just over 1.1MB. So Zabbix truncates the data, which renders the backup useless: a restore will never work. At the same time, Zabbix does not sanitize the output, so all secrets are left in it.

To make it even worse, there is no way to diff different config versions or revisions; that feature is simply not there. By this point, the customer is usually convinced that Zabbix is not the right tool, and the next question pops up: “Now what? How can we fix this?” This is where we can add real value, as we have a solution that is also rather affordable (free).

The solution is Oxidized, which is basically Rancid on steroids. This project started years ago and is released under the Apache 2.0 license. We found it by accident, started playing around with it, and never left it. The project is available on Github (https://github.com/ytti/oxidized) and written in Ruby. Incidentally, if you (or your company) have Ruby devs and want to give something back to the community, the Oxidized project is looking for extra maintainers!

At this point, we show our customers the GUI of Oxidized, which in our case involves just making backups of a few firewalls:

So we have the name, the model, and (in this case) just one group. The status shows whether the backup was successful, when the last update ran, and when the last change was detected. Under actions, we can fetch the full config file, look at previous revisions (and diff them), and trigger an ‘execute now’ option.

Looking at the versions, it’s simply showing this:

This is already giving us a nice idea of what is going on. We see the versions and dates at a glance, but the moment we check the diff option, we can easily see what was actually changed:

The perfect solution, except that it is not integrated with Zabbix. That means double administration and a lot of extra work, combined with the inevitable errors – devices not added, credential mismatches, connection errors, etc. Luckily, we can easily change the format of the above information from the GUI view to JSON by simply appending ‘.json’ to the URL:

http://<IP/DNS>:<PORT>/nodes.json

This will give the following output:

[
  {
    "name": "fw-mid-01",
    "full_name": "fw-mid-01",
    "ip": "192.168.4.254",
    "group": "default",
    "model": "FortiOS",
    "last": {
      "start": "2024-06-13 06:46:14 UTC",
      "end": "2024-06-13 06:46:54 UTC",
      "status": "no_connection",
      "time": 40.018852483
    },
    "vars": null,
    "mtime": "unknown",
    "status": "no_connection",
    "time": "2024-06-13 06:46:54 UTC"
  },
  {
    "name": "FW-HUNZE",
    "full_name": "FW-HUNZE",
    "ip": "192.168.0.254",
    "group": "default",
    "model": "FortiOS",
    "last": {
      "start": "2024-06-13 06:46:54 UTC",
      "end": "2024-06-13 06:47:04 UTC",
      "status": "success",
      "time": 10.029043912
    },
    "vars": null,
    "mtime": "2024-06-13 06:47:05 UTC",
    "status": "success",
    "time": "2024-06-13 06:47:04 UTC"
  }
]

As you might know, Zabbix is perfectly capable of parsing JSON and creating items and triggers from it. With a master item and dependent low-level discovery (https://blog.zabbix.com/low-level-discovery-with-dependent-items/13634/), within minutes you’ve got Oxidized making configuration backups while Zabbix monitors and alerts on their status:
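
A minimal sketch of that chain, assuming Oxidized’s REST API on its default port 8888 (the item keys, template name, and JSONPath expressions are illustrative):

Master item (HTTP agent): oxidized.nodes → http://oxidized.example.com:8888/nodes.json
Discovery rule (dependent item): LLD macro {#NODE} mapped to JSONPath $.name
Item prototype (dependent item): oxidized.status[{#NODE}]
    Preprocessing: JSONPath $.[?(@.name == '{#NODE}')].status.first()
Trigger prototype: last(/Oxidized/oxidized.status[{#NODE}])<>"success"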

At this point we’re getting close to a nice integration, but we haven’t overcome the double configuration management yet.

Oxidized can read its device list from multiple sources, including a CSV file, SQL (SQLite or MySQL), or HTTP. The easiest is a CSV file: just make sure you’ve got all the information in the correct columns and it works. An example:

Oxidized config:

source:
  default: csv
  csv:
    file: /var/lib/oxidized/router.db
    delimiter: !ruby/regexp /:/
    map:
      name: 0
      ip: 1
      model: 2
      username: 3
      password: 4
    vars_map:
      enable: 5

CSV file:

rtr01.local:192.168.1.1:ios:oxidized:5uP3R53cR3T:T0p53cR3t

Great. Now we have to maintain configuration in two places (Zabbix and Oxidized) and keep a cleartext username/password in a CSV file. What about using SQL as a source and letting Oxidized connect to the Zabbix database? From there we should be able to get the host name, but somehow we need the credentials as well. That’s not a default piece of information in Zabbix, but user macros can save us here.

So on our host we add 2 extra macros:

At the same time, we need to tell Oxidized what kind of device it is. There are multiple ways of doing this, obviously: a tag, a user macro, host groups, you name it. In this case, we place a tag on the host:

Now we make sure that Oxidized is only taking hosts with the tag ‘oxidized’ and extract from them the host name, IP address, model, username, and password:

+-----------+----------------+---------+------------+------------------------------+
| host      | ip             | model   | username   | password                     |
+-----------+----------------+---------+------------+------------------------------+
| fw-mid-01 | 192.168.4.254  | fortios | <redacted> | <redacted>                   |
| FW-HUNZE  | 192.168.0.254  | fortios | <redacted> | <redacted>                   |
+-----------+----------------+---------+------------+------------------------------+
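
The post above uses an SQL source pointed at the Zabbix database for this step. As an alternative sketch of the same mapping, the short Python script below pulls the tagged hosts via the Zabbix API and regenerates the CSV source instead. The API URL, token, macro names, and the convention of storing the Oxidized model in the tag value are all assumptions:

#!/usr/bin/env python3
"""Sketch: export Zabbix hosts tagged 'oxidized' into Oxidized's router.db."""
import csv
import requests

ZABBIX_API = "https://zabbix.example.com/api_jsonrpc.php"  # assumed URL
API_TOKEN = "replace-with-an-api-token"                    # assumed API token

def api_call(method, params):
    """Minimal JSON-RPC helper; token auth via HTTP header (Zabbix 6.4+)."""
    response = requests.post(
        ZABBIX_API,
        json={"jsonrpc": "2.0", "method": method, "params": params, "id": 1},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["result"]

# Fetch only hosts carrying the 'oxidized' tag, including interfaces,
# user macros (credentials), and tags (the tag value holds the model).
hosts = api_call("host.get", {
    "tags": [{"tag": "oxidized"}],
    "selectInterfaces": ["ip", "main"],
    "selectMacros": ["macro", "value"],
    "selectTags": "extend",
})

with open("/var/lib/oxidized/router.db", "w", newline="") as f:
    writer = csv.writer(f, delimiter=":")
    for host in hosts:
        ip = next((i["ip"] for i in host["interfaces"] if i["main"] == "1"), "")
        macros = {m["macro"]: m["value"] for m in host["macros"]}
        model = next(t["value"] for t in host["tags"] if t["tag"] == "oxidized")
        writer.writerow([
            host["host"], ip, model,
            macros.get("{$SSH_USER}", ""),  # assumed macro names
            macros.get("{$SSH_PASS}", ""),
        ])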

This way, we simply add our host in Zabbix, add the SSH credentials, and Oxidized will pick it up the next time a backup is scheduled. Zabbix will immediately start monitoring the status of those jobs and alert you if something fails.

This blog post is not meant as a complete integration write-up, but rather as a way to give some insight into how we as a partner operate in the field, taking advantage of the flexibility of the products we work with. This post should give you enough information to build it yourself, but of course we’re always available to help you or just build it as part of our consultancy offering.

 

The post Monitoring Configuration Backups with Zabbix and Oxidized appeared first on Zabbix Blog.

Zabbix 7.0 – Everything You Need to Know

Post Syndicated from Michael Kammer original https://blog.zabbix.com/zabbix-7-0-everything-you-need-to-know/28210/

After plenty of breathless anticipation, we’re proud to announce the release of the latest major Zabbix version – the new and improved Zabbix 7.0 LTS. This release is the direct result of user feedback and delivers a variety of improvements, including cloud-native Zabbix proxy scalability, website transaction monitoring, improved data collection speed and scalability, new dashboard widgets, major network discovery speed improvements, new templates and integrations, and more!

Without further ado, let’s take a whistle-stop tour of what you need to know:

Synthetic end-user web monitoring

Busy enterprises can now monitor multiple websites and applications by defining flexible multi-step browser-based scenarios. 7.0 LTS also makes it easy to capture screenshots of the current website state, collect and visualize website performance and availability metrics, extract, monitor, and analyze web application data, and get alerts when issues are discovered.

Zabbix proxy high availability and load balancing

When it’s time to expand, Zabbix 7.0 LTS makes it easy to scale your environment, guaranteeing 100% availability with automatic proxy load balancing and high availability features, including the ability to assign hosts to load-balanced proxy groups and to seamlessly grow the environment by deploying additional proxies.

Faster, more efficient Zabbix proxies

Zabbix proxy now fully supports in-memory data storage for collected metrics. Users can choose from Disk, Memory, and Hybrid proxy buffer modes, all of which are ideal for embedded hardware. In addition, memory mode enables the support of edge computing use cases. Users can expect 10-100x better proxy performance by switching to memory or hybrid modes, depending on allocated hardware.
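
As a sketch of how this can be enabled in zabbix_proxy.conf (the sizing values below are placeholders rather than recommendations):

# zabbix_proxy.conf (Zabbix 7.0)
ProxyBufferMode=hybrid        # disk (default), memory, or hybrid
ProxyMemoryBufferSize=128M    # shared memory reserved for collected data
ProxyMemoryBufferAge=600      # maximum age, in seconds, of data kept in memory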

Centralized control of data collection timeouts

Centralized control of data collection timeouts enables better support for metrics and custom checks that take longer to collect. Data collection timeouts can be defined per item type and overridden per proxy or at the individual item level. In addition, timeouts are now fully configurable in the Zabbix GUI or via the Zabbix API.

Faster and more scalable data collection

Synchronous poller processes have been replaced with asynchronous pollers, which improves the speed and scalability of metric polling, particularly for agent, SNMP, and HTTP checks. The next metric can now be polled without waiting for a response to the previously requested metric, and up to 1,000 concurrent checks are supported per poller process.
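
On the configuration side, this shows up as new poller process types in zabbix_server.conf; a sketch with placeholder process counts:

# zabbix_server.conf (Zabbix 7.0)
StartAgentPollers=5                # asynchronous Zabbix agent pollers
StartSNMPPollers=5                 # asynchronous SNMP pollers
StartHTTPAgentPollers=5            # asynchronous HTTP agent pollers
MaxConcurrentChecksPerPoller=1000  # concurrent checks per asynchronous poller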

New ways to visualize data

A variety of new dashboard widgets have been introduced, with the goal of giving users detailed information about their monitored metrics and infrastructure at a glance.

Dynamic dashboard widget navigation

Speaking of dashboard widgets, a new communication framework has also been introduced, enabling communication between widgets, allowing one widget to serve as a data source for other widgets, and dynamically updating the information displayed in a widget based on its data source.

Faster network discovery

Discovering services and hosts has never been easier, thanks to support for parallelization while performing network discovery. Concurrency support allows for massive improvements in network discovery speed and simplifies host and service discovery while scanning large network segments.

Better security via enterprise-grade multi-factor authentication

Out-of-the-box support for multi-factor authentication enables enterprise-grade security and added flexibility for configuring user authentication methods. Supported MFA methods include Time-based One-Time Password (TOTP) and Duo Universal Prompt authentication.

More flexible resource discovery and management

Low-level discovery has received a variety of improvements, which enable enhanced host configuration and management flexibility when discovering hosts in complex environments, such as VMware or Kubernetes.

New templates and integrations

In response to user demand, Zabbix 7.0 LTS comes pre-packaged with a range of new templates for the most popular vendors and cloud providers.

Zabbix 7.0 training updates

All Zabbix training materials have been updated based on the new functionalities that have been added to the product since Zabbix 6.0.

Everyone is welcome to sharpen their skills, but if you’re a Zabbix 6.0 Certified Specialist or Certified Professional you can master Zabbix 7.0 LTS in just one day with our Upgrade Courses. As a 7.0 Specialist, you’ll be able to automate user provisioning with the Just-in-time (JIT) feature, monitor websites with new synthetic end-user monitoring, leverage new visualization features, and enhance the speed and performance of your data collection.

The 7.0 Certified Professional course covers proxy group configuration with high availability and load balancing, improved proxy data collection, new SNMP bulk monitoring, and enhanced host discovery for VMware, Kubernetes, and Cloud infrastructures.

We’re also happy to organize private trainings for organizations of any size, so don’t hesitate to get in touch!

Upcoming 7.0 events

If you’re looking for more information regarding Zabbix 7.0, you’re in luck! You can tune in to the “What’s new in Zabbix 7.0” webinar on June 11 at 12 PM CST or June 12 at 10 AM EEST. If you’d prefer a more hands-on approach, the following workshops are also available:

• “Zabbix Proxy High-availability and Load Balancing” (June 18, 6 PM EEST)
• “New Web Monitoring Features in Zabbix 7.0” (June 20, 6 PM EEST)

While you’re at it, feel free to explore Zabbix 7.0 LTS webinars and workshops in other languages. You can also check out worldwide events related to Zabbix 7.0 LTS, including our free in-person meetup in Riga on June 19 and Zabbix Summit 2024 this fall. 

Ready to upgrade or migrate?

With a brand-new version out, there’s never been a better time to take advantage of our upgrade or migration services. Let our team take the risk out of migrating or upgrading to 7.0, giving you the latest version at a lower cost and with minimal disruption to your organization.

Need a consultation about the latest version?

Not sure about how to get the most out of Zabbix 7.0? Our expert consultants can answer any questions related to the architecture of your infrastructure, the implementation of a back-up strategy, and your capacity planning, while providing strategic advice on which 7.0 services are right for you.

Make your contribution as a translator

The Documentation 7.0 translation project is now live, which means that you can help localize Zabbix 7.0 documentation in multiple languages. Your efforts will help make Zabbix accessible to users around the globe, and you’ll also receive a reward for your contributions. The guidelines, which contain essential information about the project, are available here.

Useful links

To see what else is in store for the future, have a look at the Zabbix roadmap.

You can find the instructions and download the new version on the Download page.

Detailed, step-by-step upgrade instructions are available on our Upgrade procedure page.

Learn about new features and changes introduced in Zabbix 7.0 LTS by visiting the What’s new in Zabbix 7.0 page.

The What’s new documentation section provides a detailed description of the new features.

Take a look at the release notes to see the full list of new features and improvements.

 

The post Zabbix 7.0 – Everything You Need to Know appeared first on Zabbix Blog.

Free network flow monitoring for all enterprise customers

Post Syndicated from Chris Draper original https://blog.cloudflare.com/free-network-monitoring-for-enterprise


A key component of effective corporate network security is establishing end-to-end visibility across all traffic that flows through the network. Every network engineer needs a complete overview of their network traffic to confirm their security policies work, to identify new vulnerabilities, and to analyze any shifts in traffic behavior. Often, it’s difficult to build out effective network monitoring as teams struggle with problems like configuring and tuning data collection, managing storage costs, and analyzing traffic across multiple visibility tools.

Today, we’re excited to announce that a free version of Cloudflare’s network flow monitoring product, Magic Network Monitoring, is available to all Enterprise Customers. Every Enterprise Customer can configure Magic Network Monitoring and immediately improve their network visibility in as little as 30 minutes via our self-serve onboarding process.

Enterprise Customers can visit the Magic Network Monitoring product page, click “Talk to an expert”, and fill out the form. You’ll receive access within 24 hours of submitting the request. Over the next month, the free version of Magic Network Monitoring will be rolled out to all Enterprise Customers. The product will automatically be available by default without the need to submit a form.

How it works

Cloudflare customers can send their network flow data (either NetFlow or sFlow) from their routers to Cloudflare’s network edge.

Magic Network Monitoring will pick up this data, parse it, and instantly provide insights and analytics on your network traffic. These analytics include traffic volume over time in bytes and packets, top protocols, sources, destinations, ports, and TCP flags.

Dogfooding Magic Network Monitoring during the remediation of the Thanksgiving 2023 security incident

Let’s review a recent example of how Magic Network Monitoring improved Cloudflare’s own network security and traffic visibility during the Thanksgiving 2023 security incident. Our security team needed a lightweight method to identify malicious packet characteristics in our core data center traffic. We monitored for any network traffic sourced from or destined to a list of ASNs associated with the bad actor. Our security team set up Magic Network Monitoring and established visibility into our first core data center within 24 hours of the project kick-off. Today, Cloudflare continues to use Magic Network Monitoring to monitor for traffic related to bad actors and to provide real-time traffic analytics on more than 1 Tbps of core data center traffic.

Magic Network Monitoring – Traffic Analytics

Monitoring local network traffic from IoT devices

Magic Network Monitoring also improves visibility on any network traffic that doesn’t go through Cloudflare. Imagine that you’re a network engineer at ACME Corporation, and it’s your job to manage and troubleshoot IoT devices in a factory that are connected to the factory’s internal network. The traffic generated by these IoT devices doesn’t go through Cloudflare because it is destined to other devices and endpoints on the internal network. Nonetheless, you still need to establish network visibility into device traffic over time to monitor and troubleshoot the system.

To solve the problem, you configure a router or other network device to securely send encrypted traffic flow summaries to Cloudflare via an IPSec tunnel. Magic Network Monitoring parses the data, and instantly provides you with insights and analytics on your network traffic. Now, when an IoT device goes down, or a connection between IoT devices is unexpectedly blocked, you can analyze historical network traffic data in Magic Network Monitoring to speed up the troubleshooting process.

Monitoring cloud network traffic

As cloud networking becomes increasingly prevalent, it is essential for enterprises to invest in visibility across their cloud environments. Let’s say you’re responsible for monitoring and troubleshooting your corporation’s cloud network operations which are spread across multiple public cloud providers. You need to improve visibility into your cloud network traffic to analyze and troubleshoot any unexpected traffic patterns like configuration drift that leads to an exposed network port.

To improve traffic visibility across different cloud environments, you can export cloud traffic flow logs from any virtual device that supports NetFlow or sFlow to Cloudflare. In the future, we are building support for native cloud VPC flow logs in conjunction with Magic Cloud Networking. Cloudflare will parse this traffic flow data and provide alerts plus analytics across all your cloud environments in a single pane of glass on the Cloudflare dashboard.

Improve your security posture today in less than 30 minutes

If you’re an existing Enterprise customer, and you want to improve your corporate network security, you can get started right away. Visit the Magic Network Monitoring product page, click “Talk to an expert”, and fill out the form. You’ll receive access within 24 hours of submitting the request. You can begin the self-serve onboarding tutorial, and start monitoring your first batch of network traffic in less than 30 minutes.

Over the next month, the free version of Magic Network Monitoring will be rolled out to all Enterprise Customers. The product will be automatically available by default without the need to submit a form.

If you’re interested in becoming an Enterprise Customer, and have more questions about Magic Network Monitoring, you can talk with an expert. If you’re a free customer, and you’re interested in testing a limited beta of Magic Network Monitoring, you can fill out this form to request access.

Creating a User Activity Dashboard for Amazon CodeWhisperer

Post Syndicated from David Ernst original https://aws.amazon.com/blogs/devops/creating-a-user-activity-dashboard-for-amazon-codewhisperer/

Maximizing the value of enterprise software tools requires an understanding of who uses those tools and how they interact with them. As we have worked with builders rolling out Amazon CodeWhisperer to their enterprises, identifying usage patterns has been critical.

This blog post is a result of that work. It builds on the Introducing Amazon CodeWhisperer Dashboard blog post and Amazon CloudWatch metrics, and it enables customers to build dashboards to support their rollouts. Note that these features are only available in the CodeWhisperer Professional plan.

Organizations have leveraged the existing Amazon CodeWhisperer Dashboard to gain insights into developer usage. This blog explores how we can supplement the existing dashboard with detailed user analytics. Identifying leading contributors has accelerated tool usage and adoption within organizations, and acknowledging and incentivizing those adopters can drive even broader adoption.

The architecture diagram outlines a streamlined process for tracking and analyzing Amazon CodeWhisperer usage events. It begins with logging these events in CodeWhisperer and AWS CloudTrail and then forwarding them to Amazon CloudWatch Logs. Configuring AWS CloudTrail involves using Amazon S3 for storage and AWS Key Management Service (KMS) for log encryption. An AWS Lambda function then analyzes the logs, extracting user activity information, and the findings are displayed on a CloudWatch dashboard that visually represents active and inactive users. This blog also introduces an AWS CloudFormation template that simplifies the setup process, including creating the CloudTrail trail with an S3 bucket and KMS key and creating the Lambda function. The template also configures AWS IAM permissions, ensuring the Lambda function has the access rights it needs to interact with other AWS services.

Configuring CloudTrail for CodeWhisperer User Tracking

This section details the process for monitoring user interactions while using Amazon CodeWhisperer. The aim is to utilize AWS CloudTrail to record instances where users receive code suggestions from CodeWhisperer. This involves setting up a new CloudTrail trail tailored to log events related to these interactions. By accomplishing this, you lay a foundational framework for capturing detailed user activity data, which is crucial for the subsequent steps of analyzing and visualizing this data through a custom AWS Lambda function and an Amazon CloudWatch dashboard.

Setup CloudTrail for CodeWhisperer

1. Navigate to AWS CloudTrail Service.

2. Create Trail

3. Choose Trail Attributes

a. Click on Create Trail

b. Provide a Trail Name, for example, “cwspr-preprod-cloudtrail”

c. Choose Enable for all accounts in my organization

d. Choose Create a new Amazon S3 bucket to configure the Storage Location

e. For Trail log bucket and folder, note down the given unique trail bucket name in order to view the logs at a future point.

f. Check Enabled to encrypt log files with SSE-KMS encryption

g. Enter an AWS Key Management Service alias for log file SSE-KMS encryption, for example, “cwspr-preprod-cloudtrail”

h. Select Enabled for CloudWatch Logs

i. Select New

j. Copy the given CloudWatch Log group name; you will need this for testing the Lambda function in a future step.

k. Provide a Role Name, for example, “CloudTrailRole-cwspr-preprod-cloudtrail”

l. Click Next.

This image depicts how to choose the trail attributes within CloudTrail for CodeWhisperer User Tracking.

4. Choose Log Events

a. Check “Management events” and “Data events”

b. Under Management events, keep the default options under API activity, Read and Write

c. Under Data event, choose CodeWhisperer for Data event type

d. Keep the default Log all events under Log selector template

e. Click Next

f. Review and click Create Trail

This image depicts how to choose the log events for CloudTrail for CodeWhisperer User Tracking.

Please note: logs will only be collected from accounts in which the trail is enabled, whether that is the management account or member accounts.

Gathering Application ARN for CodeWhisperer application

Step 1: Access AWS IAM Identity Center

1. Locate and click on the Services dropdown menu at the top of the console.

2. Search for and select IAM Identity Center (SSO) from the list of services.

Step 2: Find the Application ARN for CodeWhisperer application

1. In the IAM Identity Center dashboard, click on Application Assignments -> Applications in the left-side navigation pane.

2. Locate the application with Service as CodeWhisperer and click on it.

An image displays where you can find the Application in IAM Identity Center.

3. Copy the Application ARN and store it in a secure place. You will need this ARN to configure your Lambda function’s JSON event.

An image shows where you will find the Application ARN after you click on your AWS managed application.

User Activity Analysis in CodeWhisperer with AWS Lambda

This section focuses on creating and testing our custom AWS Lambda function, which was explicitly designed to analyze user activity within an Amazon CodeWhisperer environment. This function is critical in extracting, processing, and organizing user activity data. It starts by retrieving detailed logs from CloudWatch containing CodeWhisperer user activity, then cross-references this data with the membership details obtained from the AWS Identity Center. This allows the function to categorize users into active and inactive groups based on their engagement within a specified time frame.

The Lambda function’s capability extends to fetching and structuring detailed user information, including names, display names, and email addresses. It then sorts and compiles these details into a comprehensive HTML output. This output highlights the CodeWhisperer usage in an organization.
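
As a rough sketch of that flow, the snippet below shows the general shape such a function can take. The Logs Insights query string and the CloudTrail field names here are assumptions made for illustration; the complete function linked in the steps below is the authoritative version.

import datetime
import time

import boto3


def lambda_handler(event, context):
    """Sketch only: summarize CodeWhisperer user activity from CloudTrail logs."""
    logs = boto3.client("logs", region_name=event["codewhisperer_region"])
    sso = boto3.client("sso-admin", region_name=event["identity_store_region"])
    idstore = boto3.client("identitystore", region_name=event["identity_store_region"])

    def to_epoch(stamp):  # "YYYY-MM-DD HH:MM:SS" -> epoch seconds
        return int(datetime.datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").timestamp())

    # Run a Logs Insights query over the CloudTrail log group. The filter and
    # the user ID field below are assumptions about the event format.
    query = logs.start_query(
        logGroupName=event["log_group_name"],
        startTime=to_epoch(event["start_date"]),
        endTime=to_epoch(event["end_date"]),
        queryString=(
            'filter eventSource = "codewhisperer.amazonaws.com"'
            " | stats count(*) by userIdentity.onBehalfOf.userId"
        ),
    )
    results = logs.get_query_results(queryId=query["queryId"])
    while results["status"] in ("Scheduled", "Running"):
        time.sleep(2)
        results = logs.get_query_results(queryId=query["queryId"])

    active_ids = set()
    for row in results["results"]:
        fields = {f["field"]: f["value"] for f in row}
        user_id = fields.get("userIdentity.onBehalfOf.userId")
        if user_id:
            active_ids.add(user_id)

    # Cross-reference with IAM Identity Center membership to find inactive users.
    identity_store_id = sso.list_instances()["Instances"][0]["IdentityStoreId"]
    users, token = [], None
    while True:
        kwargs = {"IdentityStoreId": identity_store_id}
        if token:
            kwargs["NextToken"] = token
        page = idstore.list_users(**kwargs)
        users.extend(page["Users"])
        token = page.get("NextToken")
        if token is None:
            break

    # The real function renders an HTML report; a plain summary is enough here.
    return {
        "active": sorted(u.get("DisplayName", u["UserName"]) for u in users
                         if u["UserId"] in active_ids),
        "inactive": sorted(u.get("DisplayName", u["UserName"]) for u in users
                           if u["UserId"] not in active_ids),
    }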

Creating and Configuring Your AWS Lambda Function

1. Navigate to the Lambda service.

2. Click on Create function.

3. Choose Author from scratch.

4. Enter a Function name, for example, “AmazonCodeWhispererUserActivity”.

5. Choose Python 3.11 as the Runtime.

6. Click on ‘Create function’ to create your new Lambda function.

7. Access the Function: After creating your Lambda function, you will be directed to the function’s dashboard. If not, navigate to the Lambda service, find your function “AmazonCodeWhispererUserActivity”, and click on it.

8. Copy and paste your Python code into the inline code editor on the function’s dashboard. The Lambda function code can be found here.

9. Click ‘Deploy’ to save and deploy your code to the Lambda function.

10. You have now successfully created and configured an AWS Lambda function with our Python code.

This image depicts how to configure your AWS Lambda function for tracking user activity in CodeWhisperer.

Updating the Execution Role for Your AWS Lambda Function

After you’ve created your Lambda function, you need to ensure it has the appropriate permissions to interact with other AWS services like CloudWatch Logs and AWS Identity Store. Here’s how you can update the IAM role permissions:

Locate the Execution Role:

1. Open Your Lambda Function’s Dashboard in the AWS Management Console.

2. Click on the ‘Configuration’ tab located near the top of the dashboard.

3. Set the Timeout setting to 15 minutes, up from the default of 3 seconds.

4. Select the ‘Permissions’ menu on the left side of the Configuration page.

5. Find the ‘Execution role’ section on the Permissions page.

6. Click on the Role Name to open the IAM (Identity and Access Management) role associated with your Lambda function.

7. In the IAM role dashboard, click on the Policy Name under the Permissions policies.

8. Edit the existing policy: Replace the policy with the following JSON.

9. Save the changes to the policy.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Action":[
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:StartQuery",
            "logs:GetQueryResults",
            "sso:ListInstances",
            "sso:ListApplicationAssignments",
            "identitystore:DescribeUser",
            "identitystore:ListUsers",
            "identitystore:ListGroupMemberships"
         ],
         "Resource":"*",
         "Effect":"Allow"
      },
      {
         "Action":[
            "cloudtrail:DescribeTrails",
            "cloudtrail:GetTrailStatus"
         ],
         "Resource":"*",
         "Effect":"Allow"
      }
   ]
}

Your AWS Lambda function now has the necessary permissions to execute and interact with CloudWatch Logs and AWS Identity Store.

This image depicts the permissions after the Lambda policies are updated.

Testing Lambda Function with custom input

1. Go to your Lambda function’s dashboard.

2. On the function’s dashboard, locate the Test button near the top right corner.

3. Click on Test. This opens a dialog for configuring a new test event.

4. In the dialog, you’ll see an option to create a new test event. If it’s your first test, you’ll be prompted automatically to create a new event.

5. For Event name, enter a descriptive name for your test, such as “TestEvent”.

6. In the event code area, replace the existing JSON with your specific input:

{
"log_group_name": "{Insert Log Group Name}",
"start_date": "{Insert Start Date}",
"end_date": "{Insert End Date}",
"codewhisperer_application_arn": "{Insert Codewhisperer Application ARN}", 
"identity_store_region": "{Insert Region}", 
"codewhisperer_region": "{Insert Region}"
}

7. This JSON structure includes:

a. log_group_name: The name of the log group in CloudWatch Logs.

b. start_date: The start date and time for the query, formatted as “YYYY-MM-DD HH:MM:SS”.

c. end_date: The end date and time for the query, formatted as “YYYY-MM-DD HH:MM:SS”.

d. codewhisperer_application_arn: The ARN of the CodeWhisperer application in the AWS Identity Store.

e. identity_store_region: The region of the AWS Identity Store.

f. codewhisperer_region: The region where Amazon CodeWhisperer is configured.

8. Click on Save to store this test configuration.

This image depicts an example of creating a test event for the Lambda function with example JSON parameters entered.

9. With the test event selected, click on the Test button again to execute the function with this event.

10. The function will run, and you’ll see the execution result at the top of the page. This includes execution status, logs, and output.

11. Check the Execution result section to see if the function executed successfully.

This image depicts what a test case that successfully executed looks like.

Visualizing CodeWhisperer User Activity with Amazon CloudWatch Dashboard

This section focuses on effectively visualizing the data processed by our AWS Lambda function using a CloudWatch dashboard. This part of the guide provides a step-by-step approach to creating a “CodeWhispererUserActivity” dashboard within CloudWatch. It details how to add a custom widget to display the results from the Lambda Function. The process includes configuring the widget with the Lambda function’s ARN and the necessary JSON parameters.

1. Navigate to the Amazon CloudWatch service from within the AWS Management Console.

2. Choose the ‘Dashboards’ option from the left-hand navigation panel.

3. Click on ‘Create dashboard’ and provide a name for your dashboard, for example: “CodeWhispererUserActivity”.

4. Click the ‘Create Dashboard’ button.

5. Select “Other Content Types” as your ‘Data sources types’ option before choosing “Custom Widget” for your ‘Widget Configuration’ and then click ‘Next’.

6. On the “Create a custom widget” page click the ‘Next’ button without making a selection from the dropdown.

7. On the ‘Create a custom widget’ page:

a. Enter your Lambda function’s ARN (Amazon Resource Name) or use the dropdown menu to find and select your “CodeWhispererUserActivity” function.

b. Add the JSON parameters that you provided in the test event, without including the start and end dates.

{
"log_group_name": "{Insert Log Group Name}",
"codewhisperer_application_arn": "{Insert Codewhisperer Application ARN}",
"identity_store_region": "{Insert identity Store Region}",
"codewhisperer_region": "{Insert Codewhisperer Region}"
}

This image depicts an example of creating a custom widget.

8. Click the ‘Add widget’ button. The dashboard will update to include your new widget and will run the Lambda function to retrieve initial data. You’ll need to click the “Execute them all” button in the upper banner to let CloudWatch run the initial Lambda retrieval.

This image depicts the execute them all button on the upper right of the screen.

9. Customize Your Dashboard: Arrange the dashboard by dragging and resizing widgets for optimal organization and visibility. Adjust the time range and refresh settings as needed to suit your monitoring requirements.

10. Save the Dashboard Configuration: After setting up and customizing your dashboard, click ‘Save dashboard’ to preserve your layout and settings.

This image depicts what the dashboard looks like. It showcases active users and inactive users, with first name, last name, display name, and email.

CloudFormation Deployment for the CodeWhisperer Dashboard

The blog post concludes with a detailed AWS CloudFormation template designed to automate the setup of the necessary infrastructure for the Amazon CodeWhisperer User Activity Dashboard. This template provisions AWS resources, streamlining the deployment process. It includes the configuration of AWS CloudTrail for tracking user interactions, setting up CloudWatch Logs for logging and monitoring, and creating an AWS Lambda function for analyzing user activity data. Additionally, the template defines the required IAM roles and permissions, ensuring the Lambda function has access to the needed AWS services and resources.

The blog post also provides a JSON configuration for the CloudWatch dashboard. This is because, at the time of writing, AWS CloudFormation does not natively support the creation and configuration of CloudWatch dashboards. Therefore, the JSON configuration is necessary to manually set up the dashboard in CloudWatch, allowing users to visualize the processed data from the Lambda function. The CloudFormation template can be found here.

Create a CloudWatch Dashboard and import the JSON below.

{
   "widgets":[
      {
         "height":30,
         "width":20,
         "y":0,
         "x":0,
         "type":"custom",
         "properties":{
            "endpoint":"{Insert ARN of Lambda Function}",
            "updateOn":{
               "refresh":true,
               "resize":true,
               "timeRange":true
            },
            "params":{
               "log_group_name":"{Insert Log Group Name}",
               "codewhisperer_application_arn":"{Insert Codewhisperer Application ARN}",
               "identity_store_region":"{Insert identity Store Region}",
               "codewhisperer_region":"{Insert Codewhisperer Region}"
            }
         }
      }
   ]
}

Conclusion

In this blog, we detail a comprehensive process for establishing a user activity dashboard for Amazon CodeWhisperer to deliver data that supports an enterprise rollout. The journey begins with setting up AWS CloudTrail to log user interactions with CodeWhisperer. This foundational step ensures the capture of detailed activity events, which is vital for our subsequent analysis. We then construct a tailored AWS Lambda function to sift through the CloudTrail logs, and finally create a dashboard in Amazon CloudWatch. This dashboard serves as a central platform for displaying the user data from our Lambda function in an accessible, user-friendly format.

You can reference the existing CodeWhisperer dashboard for additional insights. The Amazon CodeWhisperer Dashboard offers a view summarizing data about how your developers use the service.

Overall, this dashboard empowers you to track, understand, and influence the adoption and effective use of Amazon CodeWhisperer in your organizations, optimizing the tool’s deployment and fostering a culture of informed data-driven usage.

About the authors:

David Ernst

David Ernst is an AWS Sr. Solution Architect with a DevOps and Generative AI background, leveraging over 20 years of IT experience to drive transformational change for AWS’s customers. Passionate about leading teams and fostering a culture of continuous improvement, David excels in architecting and managing cloud-based solutions, emphasizing automation, infrastructure as code, and continuous integration/delivery.

Riya Dani

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.

Vikrant Dhir

Vikrant Dhir is a AWS Solutions Architect helping systemically important financial services institutions innovate on AWS. He specializes in Containers and Container Security and helps customers build and run enterprise grade Kubernetes Clusters using Amazon Elastic Kubernetes Service(EKS). He is an avid programmer proficient in a number of languages such as Java, NodeJS and Terraform.

How to develop an Amazon Security Lake POC

Post Syndicated from Anna McAbee original https://aws.amazon.com/blogs/security/how-to-develop-an-amazon-security-lake-poc/

You can use Amazon Security Lake to simplify log data collection and retention for Amazon Web Services (AWS) and non-AWS data sources. Making sure that you get the most out of your implementation requires proper planning.

In this post, we will show you how to plan and implement a proof of concept (POC) for Security Lake to help you determine the functionality and value of Security Lake in your environment, so that your team can confidently design and implement in production. We will walk you through the following steps:

  1. Understand the functionality and value of Security Lake
  2. Determine success criteria for the POC
  3. Define your Security Lake configuration
  4. Prepare for deployment
  5. Enable Security Lake
  6. Validate deployment

Understand the functionality of Security Lake

Figure 1 summarizes the main features of Security Lake and the context of how to use it:

Figure 1: Overview of Security Lake functionality

As shown in the figure, Security Lake ingests and normalizes logs from data sources such as AWS services, AWS Partner sources, and custom sources. Security Lake also manages the lifecycle, orchestration, and subscribers. Subscribers can be AWS services, such as Amazon Athena, or AWS Partner subscribers.

There are four primary functions that Security Lake provides:

  • Centralize visibility to your data from AWS environments, SaaS providers, on-premises, and other cloud data sources — You can collect log sources from AWS services such as AWS CloudTrail management events, Amazon Simple Storage Service (Amazon S3) data events, AWS Lambda data events, Amazon Route 53 Resolver logs, VPC Flow Logs, and AWS Security Hub findings, in addition to log sources from on-premises, other cloud services, SaaS applications, and custom sources. Security Lake automatically aggregates the security data across AWS Regions and accounts.
  • Normalize your security data to an open standard — Security Lake normalizes log sources in a common schema, the Open Security Schema Framework (OCSF), and stores them in compressed parquet files.
  • Use your preferred analytics tools to analyze your security data — You can use AWS tools, such as Athena and Amazon OpenSearch Service, or you can utilize external security tools to analyze the data in Security Lake.
  • Optimize and manage your security data for more efficient storage and query — Security Lake manages the lifecycle of your data with customizable retention settings with automated storage tiering to help provide more cost-effective storage.

Determine success criteria

By establishing success criteria, you can assess whether Security Lake has helped address the challenges that you are facing. Some example success criteria include:

  • I need to centrally set up and store AWS logs across my organization in AWS Organizations for multiple log sources.
  • I need to more efficiently collect VPC Flow Logs in my organization and analyze them in my security information and event management (SIEM) solution.
  • I want to use OpenSearch Service to replace my on-premises SIEM.
  • I want to collect AWS log sources and custom sources for machine learning with Amazon SageMaker.
  • I need to establish a dashboard in Amazon QuickSight to visualize my Security Hub findings and a custom log source data.

Review your success criteria to make sure that your goals are realistic given your timeframe and potential constraints that are specific to your organization. For example, do you have full control over the creation of AWS services that are deployed in an organization? Do you have resources that can dedicate time to implement and test? Is this time convenient for relevant stakeholders to evaluate the service?

The timeframe of your POC will depend on your answers to these questions.

Important: Security Lake has a 15-day free trial per account that you use from the time that you enable Security Lake. This is the best way to estimate the costs for each Region throughout the trial, which is an important consideration when you configure your POC.

Define your Security Lake configuration

After you establish your success criteria, you should define your desired Security Lake configuration. Some important decisions include the following:

  • Determine AWS log sources — Decide which AWS log sources to collect. For information about the available options, see Collecting data from AWS services.
  • Determine third-party log sources — Decide if you want to include non-AWS service logs as sources in your POC. For more information about your options, see Third-party integrations with Security Lake; the integrations listed as “Source” can send logs to Security Lake.

    Note: You can add third-party integrations after the POC or in a second phase of the POC. Pre-planning will be required to make sure that you can get these set up during the 15-day free trial. Third-party integrations usually take more time to set up than AWS service logs.

  • Select a delegated administrator – Identify which account will serve as the delegated administrator. Make sure that you have the appropriate permissions from the organization admin account to identify and enable the account that will be your Security Lake delegated administrator. This account will be the location for the S3 buckets with your security data and where you centrally configure Security Lake. The AWS Security Reference Architecture (AWS SRA) recommends that you use the AWS logging account for this purpose. In addition, make sure to review Important considerations for delegated Security Lake administrators.
  • Select accounts in scope — Define which accounts to collect data from. To get the most realistic estimate of the cost of Security Lake, enable all accounts across your organization during the free trial.
  • Determine analytics tool — Determine if you want to use native AWS analytics tools, such as Athena and OpenSearch Service, or an existing SIEM, where the SIEM is a subscriber to Security Lake.
  • Define log retention and Regions — Define your log retention requirements and Regional restrictions or considerations.

Prepare for deployment

After you determine your success criteria and your Security Lake configuration, you should have an idea of your stakeholders, desired state, and timeframe. Now you need to prepare for deployment. In this step, you should complete as much as possible before you deploy Security Lake. The following are some steps to take:

  • Create a project plan and timeline so that everyone involved understands what success looks like and what the scope and timeline are.
  • Define the relevant stakeholders and consumers of the Security Lake data. Some common stakeholders include security operations center (SOC) analysts, incident responders, security engineers, cloud engineers, finance, and others.
  • Define who is responsible, accountable, consulted, and informed during the deployment. Make sure that team members understand their roles.
  • Make sure that you have access in your management account to designate a delegated administrator. For further details, see IAM permissions required to designate the delegated administrator.
  • Consider other technical prerequisites that you need to accomplish. For example, if you need roles in addition to what Security Lake creates for custom extract, transform, and load (ETL) pipelines for custom sources, can you work with the team in charge of that process before the POC?

Enable Security Lake

The next step is to enable Security Lake in your environment and configure your sources and subscribers.

  1. Deploy Security Lake across the Regions, accounts, and AWS log sources that you previously defined.
  2. Configure custom sources that are in scope for your POC.
  3. Configure analytics tools in scope for your POC.

Validate deployment

The final step is to confirm that you have configured Security Lake and additional components, validate that everything is working as intended, and evaluate the solution against your success criteria.

  • Validate log collection — Verify that you are collecting the log sources that you configured. To do this, check the S3 buckets in the delegated administrator account for the logs; see the sketch after this list for one quick way to do that.
  • Validate analytics tool — Verify that you can analyze the log sources in your analytics tool of choice. If you don’t want to configure additional analytics tooling, you can use Athena, which is configured when you set up Security Lake. For sample Athena queries, see Amazon Security Lake Example Queries on GitHub and Security Lake queries in the documentation.
  • Obtain a cost estimate — In the Security Lake console, you can review a usage page to verify that the cost of Security Lake in your environment aligns with your expectations and budgets.
  • Assess success criteria — Determine if you achieved the success criteria that you defined at the beginning of the project.
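
One quick way to spot-check log delivery is a short boto3 listing of a Security Lake bucket. The bucket name below is a placeholder; copy the real name from the Security Lake console in your delegated administrator account.

import boto3

# Placeholder bucket name; Security Lake creates one per Region in the
# delegated administrator account. Copy the real name from the console.
BUCKET = "aws-security-data-lake-us-east-1-EXAMPLE"

s3 = boto3.client("s3")
response = s3.list_objects_v2(Bucket=BUCKET, MaxKeys=20)
for obj in response.get("Contents", []):
    # Keys are partitioned by source, Region, account, and date.
    print(obj["Key"], obj["Size"])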

Next steps

Next steps will largely depend on whether you decide to move forward with Security Lake.

  • Determine if you have the approval and budget to use Security Lake.
  • Expand to other data sources that can help you achieve more security outcomes for your business.
  • Configure S3 lifecycle policies to efficiently store logs long term based on your requirements.
  • Let other teams know that they can subscribe to Security Lake to use the log data for their own purposes. For example, a development team that gets access to CloudTrail through Security Lake can analyze the logs to understand the permissions needed for an application.

Conclusion

In this blog post, we showed you how to plan and implement a Security Lake POC. You learned how to do so through phases, including defining success criteria, configuring Security Lake, and validating that Security Lake meets your business needs.

This guide will help you, as a customer, run a successful proof of value (POV) with Security Lake. It guides you in assessing the value of the service and the factors to consider when deciding whether to implement its current features.

Further resources

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Anna McAbee

Anna is a Security Specialist Solutions Architect focused on threat detection and incident response at AWS. Before AWS, she worked as an AWS customer in financial services on both the offensive and defensive sides of security. Outside of work, Anna enjoys cheering on the Florida Gators football team, wine tasting, and traveling the world.

Marshall Jones

Marshall is a Worldwide Security Specialist Solutions Architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he is focused on helping enterprise AWS customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Marc Luescher

Marc is a Senior Solutions Architect helping enterprise customers be successful, focusing strongly on threat detection, incident response, and data protection. His background is in networking, security, and observability. Previously, he worked in technical architecture and security hands-on positions within the healthcare sector as an AWS customer. Outside of work, Marc enjoys his 3 dogs, 4 cats, and 20+ chickens.

Keeping Remote Teams Connected: The Zabbix Advantage

Post Syndicated from Michael Kammer original https://blog.zabbix.com/keeping-remote-teams-connected-the-zabbix-advantage/27551/

Remote teams may have exploded in popularity during the COVID-19 pandemic, but they’re not a phenomenon that’s likely to trend downward anytime soon. High-profile organizations like 3M, Dropbox, Shopify, and LinkedIn are continuing to enthusiastically embrace remote working, essentially making it the “default setting” for their employees.

The shift toward remote working is not without its challenges, however. Organizations of all sizes often have little time to set up the kind of networking infrastructure and efficient processes that make sure remote workers are just as connected and productive as their on-site counterparts. In this article, we’ll take a quick look at some of the most important network monitoring challenges that remote teams face and show how Zabbix can help you tackle them as efficiently as possible.

Infrastructure and connectivity issues

A remote network is essentially a grouping of multiple smaller network setups, each with its own set of variables that can affect performance. Differences in network and infrastructure quality across remote locations often drag down overall network performance, which in turn makes it a challenge to provide the high-speed communication needed to run the automation tools and software applications that remote employees and teams rely on.

By providing straightforward visibility into a network's connected devices and how data moves between them, Zabbix makes it easy to automatically compare data and spot any drop in network performance.

With Zabbix, you can easily keep an eye on network routers and switches – especially the up/down status of internet provider and uplink ports. You can also monitor network latency, error rates on ports, packet loss to important devices, and network utilization on important ports with net.if.in/net.if.out. Here are some example triggers:

High Network Utilization: avg(/Router ABC/net.if.in[eth0],5m)>80M
High Packet Loss: avg(/Router ABC/icmppingloss,5m)>5
High Latency: avg(/Router ABC/icmppingsec,5m)>0.1

What’s more, Zabbix allows you to create network maps with important network devices and real-time data, as well as dashboards with maps and single item/gauge widgets, all of which makes it far easier to achieve the uninterrupted connectivity that remote teams depend on.

Staying safe

Remote locations aren't islands that can be completely isolated from external traffic. Staying vigilant and doing everything possible to prevent data breaches is important, and taking advantage of strong encryption, network scanning tools, and firewalls is a good start. However, relying on a whole suite of separate security tools can make integrating and monitoring them more difficult.

With Zabbix, you can count on enterprise-grade security, including encrypted communication between components, a flexible user permission schema that can be easily applied to a distributed environment, and custom user roles with a granular set of permissions for different types of users.

Zabbix also provides native support for HTTP, LDAP, and SAML authentication (which gives you an additional layer of security and improves your user experience while working with Zabbix), the ability to restrict access to sensitive information by limiting which metrics can be collected in your environment, and the ability to track changes in your environment by utilizing the Audit log. It’s all designed to make sure that there are no compromises on the security of your data when you decide to go remote.

Scalability

As a remote organization grows and its distributed systems expand, a good monitoring solution needs to be able to grow along with it in order to prevent gaps in coverage while maintaining performance and reliability. Zabbix gives you limitless scalability in the form of Zabbix proxies, which act as independent intermediaries that collect performance and availability data on behalf of a Zabbix server. You can roll out new proxies as fast as you need them, and because Zabbix is free and open source, you don’t have to worry about additional licensing costs.

Zabbix proxies allow you to see at a glance what resources are being used on your network at any given moment, which is especially handy if, like most remote teams, you have tens or even hundreds of servers and network appliances to monitor. You can also execute remote commands in remote locations – either on the proxies themselves or on the agents monitored by a proxy – and multiple frontends can be deployed for load balancing as well as for improved security and connectivity. Proxy Docker containers and cloud options are available as well, enhancing flexibility and making Zabbix ideal for any organization that spans the globe (or aspires to).
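
For a sense of how little configuration a new proxy needs, here is a minimal sketch of an active-proxy zabbix_proxy.conf – the host names and database path are examples, not prescriptions:

# /etc/zabbix/zabbix_proxy.conf – a minimal active-proxy sketch (values are examples)
# ProxyMode 0 means active: the proxy initiates connections to the server
ProxyMode=0
Server=zabbix.example.com
# Hostname must match the proxy name configured in the Zabbix frontend
Hostname=proxy-branch-01
# SQLite keeps small proxies simple; MySQL and PostgreSQL are also supported
DBName=/var/lib/zabbix/zabbix_proxy.db

Once the proxy is registered in the frontend under the same name, hosts can simply be assigned to it for monitoring.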

Managing multiple solutions

The legacy software and systems you use were most likely designed to work in a traditional networking model. Remote working, as we’ve seen, presents a whole new range of challenges when it comes to compatibility and support.

We've created Zabbix to be as easy as possible to integrate with existing systems. You can easily monitor any operating system, cloud service, IP telephony service, Docker container, or web server/database backend. We provide out-of-the-box monitoring for the world's leading hardware and software vendors, and our extensively documented API makes it easy to create workflows and integrate with other systems. In addition, you can also integrate Zabbix with the most popular helpdesk, messaging, and ITSM systems, such as Slack, Jira, MS Teams, and many others.
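
To illustrate how little glue code an API integration needs, here is a minimal Python sketch that lists hosts and their interface addresses over the JSON-RPC API – the URL and token are placeholders you'd replace with your own:

# A minimal sketch of a Zabbix JSON-RPC API call (URL and token are placeholders)
import json
import urllib.request

ZABBIX_URL = "https://zabbix.example.com/api_jsonrpc.php"
API_TOKEN = "replace-with-a-token-generated-in-the-frontend"

def api_call(method, params):
    payload = {"jsonrpc": "2.0", "method": method, "params": params,
               "auth": API_TOKEN, "id": 1}
    request = urllib.request.Request(
        ZABBIX_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json-rpc"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["result"]

# Print every host along with the IP addresses of its interfaces
for host in api_call("host.get", {"output": ["host"], "selectInterfaces": ["ip"]}):
    print(host["host"], [i["ip"] for i in host["interfaces"]])

The same pattern extends to creating hosts, acknowledging events, or opening maintenance windows from a ticketing system.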

Not only that, Zabbix is designed to be the ideal monitoring solution for multi-tenant environments. It serves as a single pane of glass for your entire infrastructure, and it's easy to visualize everything that's happening on your network with unique maps, dashboards, and templates.

Conclusion

The days of large teams all working together under the same roof are a thing of the past – the remote working trend will only accelerate as technology improves and employees get more accustomed to working with colleagues across multiple locations. That’s why it’s of paramount importance to make sure your monitoring solution has the built-in flexibility and scalability to grow with your team and your business.

If you want to see for yourself how Zabbix can help you effectively monitor a globally distributed network, contact us.


The post Keeping Remote Teams Connected: The Zabbix Advantage appeared first on Zabbix Blog.

HPC Monitoring: Transitioning from Nagios and Ganglia to Zabbix 6

Post Syndicated from Mark Vilensky original https://blog.zabbix.com/hpc-monitoring-transitioning-from-nagios-and-ganglia-to-zabbix-6/27313/

My name is Mark Vilensky, and I’m currently the Scientific Computing Manager at the Weizmann Institute of Science in Rehovot, Israel. I’ve been working in High-Performance Computing (HPC) for the past 15 years.

Our base is at the Chemistry Faculty at the Weizmann Institute, where our HPC activities follow a traditional path — extensive number crunching, classical calculations, and a repertoire that includes handling differential equations. Over the years, we’ve embraced a spectrum of technologies, even working with actual supercomputers like the SGI Altix.

Our setup

As of now, our system boasts nearly 600 compute nodes, collectively wielding about 25,000 cores. The interconnect is InfiniBand, and for management, provisioning, and monitoring we rely on Ethernet. Our storage infrastructure is IBM GPFS on DDN hardware, and job submissions are facilitated through PBS Professional.

We use VMware for system management. Surprisingly, the team managing this extensive system comprises only three individuals. The hardware landscape features HPE, Dell, and Lenovo servers.

The path to Zabbix

Recent challenges have surfaced in the monitoring domain, prompting us to consider an upgrade to Red Hat 8 or a comparable distribution. Our existing monitoring framework involved Nagios and Ganglia, but both had severe limitations: Nagios' lack of scalability and Ganglia's Python 2 compatibility issues had become apparent.

Exploring alternatives led us to Zabbix, a platform not commonly encountered in supercomputing conferences but embraced by the community. Fortunately, we found a great YouTube channel by Dmitry Lambert that not only gives recipes for doing things but also provides the overview required for planning, sizing, and avoiding future troubles.

Our Zabbix setup resides in a modest VM, sporting 16 CPUs, 32 GB RAM, and three Ethernet interfaces, all operating within the Rocky 8.7 environment. The database relies on PostgreSQL 14 and TimescaleDB version 2.8, with slight adjustments to the default configurations for history and trend settings.

Getting the job done

The stability of our Zabbix system has been noteworthy, showcasing its ability to automate tasks, particularly in scenarios where nodes are taken offline, prompting Zabbix to initiate maintenance cycles automatically. Beyond conventional monitoring, we’ve tapped into Zabbix’s capabilities for external scripts, querying the PBS server and GPFS server, and even managing specific hardware anomalies.
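
As an illustration of the external script pattern (a sketch, not our exact production code), a check that feeds Zabbix the number of problematic PBS nodes can be just a few lines of Python:

#!/usr/bin/env python3
# Sketch of a Zabbix external check that counts down/offline PBS nodes.
# Assumes the PBS Professional CLI is available on the Zabbix server or proxy.
import subprocess

def count_unavailable_nodes():
    # "pbsnodes -l" lists nodes that are down, offline, or otherwise unavailable
    result = subprocess.run(
        ["pbsnodes", "-l"], capture_output=True, text=True, check=True
    )
    return len([line for line in result.stdout.splitlines() if line.strip()])

if __name__ == "__main__":
    # Zabbix takes whatever the script prints to stdout as the item value
    print(count_unavailable_nodes())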

The Zabbix dashboard has emerged as a comprehensive tool, with host groups providing a differentiated approach. These groups categorize our hosts into CPU compute nodes, GPU compute nodes, and infrastructure nodes, allowing tailored alerts based on node type.

Alerting and visualization

Our alerting strategy involves receiving email alerts only for significant disasters, a conscious effort to avoid alert fatigue. Monitoring compute nodes differs subtly from monitoring infrastructure nodes: for compute nodes we focus on availability and potential job performance issues, while for infrastructure nodes we watch services, memory, and memory leaks.

Visual representations are powerful, and heat maps in particular offer quick insights into the cluster's performance.

Final thoughts

In conclusion, our journey with Zabbix has not only delivered stability and automation but has also provided invaluable insights for optimizing resource utilization. I’d like to express my special appreciation for Andrei Vasilev, a member of our team whose efforts have been instrumental in making the transition to Zabbix.

The post HPC Monitoring: Transitioning from Nagios and Ganglia to Zabbix 6 appeared first on Zabbix Blog.

A Look Back at Zabbix Summit 2023

Post Syndicated from Michael Kammer original https://blog.zabbix.com/a-look-back-at-zabbix-summit-2023/26744/

Autumn in the Latvian capital of Riga is marked by a variety of traditions. The leaves fall, the rainy season arrives, the birds migrate, and IT professionals from around the world descend on the city for the annual Zabbix Summit.

On October 6 and 7, the Radisson Blu Hotel Latvija was packed with 450 delegates from 38 countries, all there for Zabbix Summit 2023, the 11th in-person version of Zabbix’s premier yearly event.

This year’s Summit was marked by presentations, partner activities, and moments of relaxation and celebration that will energize the Zabbix community and spark ideas that attendees will take home to every corner of the world.

If you couldn’t make it, here’s a little taste of how it felt to be there!

Zabbix Summit 2023 in numbers

The stage hosted 27 speakers from 17 different countries who gave 31 speeches, including both lectures and lightning talks. There were four workshops with deep dives into technical topics, conducted by the Zabbix technical team as well as our partners from Opensource ICT Solutions and IZI-IT. Summit attendees also enjoyed three parties designed to provide a relaxing experience and networking opportunities.

Zabbix Summit 2023 proudly featured 10 sponsors, all part of Zabbix’s official partner network. They included:

initMAX – Diamond Sponsor
IntelliTrend – Platinum Sponsor
IZI-IT – Platinum Sponsor
Quadrata – Platinum Sponsor
Allenta – Gold Sponsor
Metricio – Gold Sponsor
Opensource ICT Solutions – Gold Sponsor
Docomo Business – Gold Sponsor
SRA OSS – Silver Sponsor
Enthus – Lunch and coffee break sponsor

We’d also like to give a shout-out to our Zabbix Fans, who played a crucial role in supporting the Summit this year (as every year) with their attendance, merchandise purchases, and enthusiasm!

We’re grateful to everyone who played a role and helped us make Zabbix Summit 2023 happen!

Highlights from the main stage

This year we continued a Summit tradition and allowed our in-person audience, as well as those tuning in via livestream and YouTube, to ask questions during live Q&A sessions – a feature that made the proceedings more interactive and helped everyone feel more involved. The speeches were all fascinating and well received, but a few in particular stood out:

What the future holds for Zabbix

Zabbix CEO and Founder Alexei Vladishev kicked off the presentations on Day 1 with a keynote speech about his current plans for Zabbix’s development, including a detailed look at enhancements requested by users.

Avoiding alert fatigue

Bringing a less technical and more conceptual approach to addressing day-to-day data monitoring issues, Rihards Olups, SaaS Architect at Nokia, discussed alert fatigue and how science explains it. During his presentation, Rihards showed how an excess of alerts can negatively affect selective attention and shared his thoughts about how professionals can intervene to prevent problems.

Making Zabbix’s latest offerings accessible to everyone

Day 2 began with Zabbix Director of Business Development Sergey Sorokin focusing on new plans and offerings, including a subscription system for technical support, consulting services, and monitoring tailored for managed service providers.

Monitoring everything (and we do mean everything!)

Janne Pikkarainen, Lead Site Reliability Engineer at Forcepoint, provided detailed and entertaining insights into how he connects Zabbix to smart accessories and uses it to monitor aspects of his home, including the location of personal items, noise levels, and even the frequency of his daughter’s naps and cries.

Implementing ideas and design in MSP environments

In tackling the topic of data collection and analysis for service providers, Brian van Baekel, Zabbix Trainer at Opensource ICT Solutions, presented details on the development of projects focused on monitoring service providers. He also highlighted best practices for data collection in Zabbix Server, data storage, and presenting on the Zabbix Frontend.

Monitoring the London transportation system

A use case presented by Nathan Liefting, Zabbix Consultant and Trainer at Opensource ICT Solutions, and Adan Mohamed, DevOps Manager at Boldyn Networks, showed how Zabbix monitors the availability of the London Underground subway system. Data is collected from 136 “tube” stations in a high-level architecture and used to assess the availability of Wi-Fi networks, emergency connections, and other services.

Bringing the Olympics and World Cup to life with Zabbix

Marianna Portela, a Tech Lead at Globo in Brazil, shared her insights into how Zabbix supports Globo’s digital transformation and helps her monitor live event infrastructure at massive events like the Olympics and World Cup.

Don’t forget the fun part!

Zabbix Summits are renowned for their friendly, informal atmosphere, which is probably most clearly on display at our famous Summit parties.

Zabbix Summit 2023’s Welcome party was held at the Stargorod Riga brewery in the heart of Riga’s old town. It featured arm wrestling, a selection of delicious foods and beverages, and plenty of opportunities for Summit participants to get to know each other.

The Main party saw live music, dancing, quizzes, and other fun events take place within the historic confines of the Latvian Railway History Museum. The atmosphere, food, drinks, and good company all combined to create an event that nobody who attended will soon forget!

Last but not least, the Closing party at the Burzma food hall was a true celebration of the diversity of the global Zabbix community, with food and music from every country with a Zabbix presence as well as plenty of opportunities for Summit attendees to swap stories and exchange contact details.

Open door, open minds

The traditional Zabbix open-door day was held on Thursday, October 5, and while past Summits have typically seen around 50 visitors, we were proud to welcome closer to 100 this time around. Attendees could have a coffee with their favorite Zabbix employees, play a friendly game of foosball or table tennis, and get a behind-the-scenes look at where the magic happens.

Testify!

One new feature that made a big splash at this year’s Summit was the testimonial booth, which allowed Summit attendees to share their thoughts and experiences about Zabbix with the rest of our community. Sharing a testimonial or leaving a review allowed attendees to collect a piece of exclusive Zabbix Summit 2023 merchandise, and we went through a lot of it – the booth provided us with 28 filmed and 17 written testimonials about Zabbix products and services, far more than we anticipated.

Where to find the presentations

If you couldn’t attend but want to stay informed about what was discussed at the event (or if you’d just like to revisit the stage presentations), both days of recordings are available on Zabbix’s YouTube channel at the following links:

Streaming – Zabbix Summit Day 1

Streaming – Zabbix Summit Day 2

The graphics and texts of the presentations are also available for reference and download on the official event website.

We hope that Zabbix Summit 2023 was a time of valuable learning, connections, and idea exchange for everyone who attended or followed along through social media. If you’ve enjoyed the photos, you can see several more on our Instagram.

If you had an amazing time at Zabbix Summit 2023 (and we certainly hope you did), registration for Zabbix Summit 2024 is already open and Early Bird tickets are available.

See you next year!


The post A Look Back at Zabbix Summit 2023 appeared first on Zabbix Blog.

The Zabbix Advantage for Business

Post Syndicated from Michael Kammer original https://blog.zabbix.com/the-zabbix-advantage-for-business/26497/

CIOs and CITOs know all too well that a smoothly functioning network is the backbone of any business. Your network has to guarantee reliability, performance, and security. An unreliable network, by contrast, means damaged productivity, negative customer perceptions, and haphazard security. The solution is network monitoring, and in this post we’ll explore the reasons why Zabbix is the ideal monitoring solution for any business.

What is network monitoring?

Network monitoring is a critical IT process where all networking components (as well as key performance indicators like CPU utilization and network bandwidth) are constantly monitored to improve performance and eliminate bottlenecks. It provides real-time information that network administrators need to determine whether a network is running optimally.

Why Zabbix?

At Zabbix, we’re here to help you deliver for your customers, flawlessly and without interruptions. Our monitoring solution is 100% open source, available in over 20 languages, and able to collect an unlimited amount of data. Designed with enterprise requirements in mind, Zabbix provides a comprehensive, “single pane of glass” view of any size environment. Put simply, Zabbix allows you to monitor anything – from physical and virtual servers or containers to network infrastructure, applications, and cloud services.

What’s more, we offer a wide variety of additional professional services to go along with our solution, including:

  • Multiple technical support subscriptions that are tailored to the needs of your business
  • Certified training programs that are designed to help you master Zabbix under the guidance of top experts
  • A wide range of professional services, including template building, upgrades, consulting, and more

Keep reading to find out more about the difference Zabbix can make for your business.

The Zabbix advantage

IT teams are under enormous pressure to have their networks functioning perfectly 100% of the time, and with good reason. It’s simply not possible to run a business with a malfunctioning network. Here are 5 key reasons why you need to make network monitoring a top priority, and why Zabbix is the right answer for all of them.

Reliability

A network monitoring solution’s main reason for being is to show whether a device is working or not. Taking a proactive approach to maintaining a healthy network will keep tech support requests and downtime to an absolute minimum. Zabbix makes it easy to do so by automatically detecting problem states in your metric flow. Not only that, but our automated predictive functions can also help you react proactively. They do this by forecasting a value for early alerting and predicting the time left until you reach a problem threshold. Automation then allows you to remove additional inefficiencies.
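
To make the predictive part concrete, here is a hedged example of such a trigger – the host and filesystem names are illustrative:

Low disk space predicted: forecast(/File server/vfs.fs.size[/,pfree],1h,7d)<10

This takes the last hour of free-space data, projects it a week ahead, and fires if the projection drops below 10% free space – alerting you long before the disk actually fills up.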

Visibility

Having complete visibility of all your hardware and software assets allows you to easily monitor the health of your network. Zabbix lets businesses access metrics, issues, reports, and maps with a single click, allowing you to:

  • Analyze and correlate your metrics with easy-to-read graphs
  • Track your monitoring targets on an interactive geo-map
  • Display the statuses of your elements together with real-time data to get a detailed overview of your infrastructure on a Zabbix map
  • Generate scheduled PDF reports from any Zabbix dashboard
  • Extend the native Zabbix frontend functionality by developing your own frontend widgets and modules

Performance

By making it easy to monitor anything, Zabbix lets you know which parts of your network are being properly used, overused, or underused. This can help you uncover unnecessary costs that can be eliminated or identify a network component that needs upgrading.

Compliance

Today’s IT teams need to meet strict regulatory and protection standards in increasingly complex networks. Zabbix can spot changes in normal system behavior and unusual data flow. It can then either leverage multiple messaging channels to notify your team about anomalies or simply resolve any issues automatically.

Profitability

Zabbix has an extensive track record of making businesses more productive by saving network management time and lowering operating costs. Servers, for example, inevitably break down from time to time, so being able to relaunch quickly after a failure and minimize downtime is vital. By making sure your team is aware of any and all current and impending issues, Zabbix can reduce downtime and increase the productivity and efficiency of your business.

Zabbix across industries

Whatever field you’re in, there’s no substitute for consistent, problem-free service when it comes to gaining the trust and loyalty of customers. Zabbix has an extensive track record of helping clients in multiple industries achieve their goals.

Zabbix for healthcare

A typical hospital relies on tens of thousands of connected devices. Manually checking each one for anomalies simply isn’t practical. Establishing a stable service level is a vital issue in most industries, but in healthcare it’s literally a matter of life and death. With Zabbix, hospital IT teams receive potentially life-saving alerts if anything is out of the ordinary.

What’s more, Zabbix can monitor progress toward expected outcomes, providing up-to-the-minute statistics on data errors or IT system failures. Issues, response times, and potential bottlenecks are displayed in easy-to-read graphs and charts. This allows hospital staff to follow up on the presence or absence of problems.

Zabbix for banking and finance

Financial institutions of all sizes rely on their networks to maintain connectivity and productivity. By processing millions of checks per minute and considering very complex dependencies between different elements of infrastructure, Zabbix allows banks to proactively detect and resolve network problems before they turn into major business disruptions.

Zabbix is also designed to seamlessly connect distributed architecture, including remote offices, branches, and even individual ATMs. Some of our financial industry clients previously used up to 20 different monitoring tools. Each alert sent hundreds of emails to different people, making it impossible to effectively monitor the environment. Naturally, they found Zabbix’s ability to monitor many thousands of devices and “single pane of glass” view to be a significant upgrade.

Zabbix for education

In an age of digital course materials and resources, schools and universities can’t operate without functioning IT infrastructures. Our clients in education typically have heterogeneous infrastructures with thousands of servers and clients. They also possess all kinds of connected devices, dozens of different operating systems, multiple locations, and hundreds of IT staff.

Zabbix has proven itself to be a simple, cost-effective method of monitoring geographically distributed campuses and educational sites. We’ve done this by:

  • Providing early notification of possible viruses, worms, Trojan horses, and other malware
  • Monitoring IT systems for intellectual property (IP) protection purposes
  • Saving human resources by reducing manual work

Zabbix for government

Network monitoring is critical for government agencies, as downtime can bring a halt to vital public services. Our public-sector clients range from city-wide public transportation companies all the way up to entire prefectures. They use Zabbix to monitor the availability of utilities, transport, lighting, and many other public services.

In the process, Zabbix increases the effectiveness of budget expenditures by providing precise and accountable data on how public resources are used. This makes it easier to justify further expenditures. In most business software, agents are required for each monitored host and costs increase in proportion to the number of monitored hosts. By contrast, Zabbix is open source and the software itself is free of charge, resulting in anticipated cost reductions of up to 25% in many cases.

Zabbix for retail

Retail environments increasingly depend on network-connected equipment, particularly when it comes to warehouse monitoring and tracking SKUs (stock keeping units). Zabbix delivers an all-in-one tool to monitor different applications, metrics, processes, and equipment while providing a complete picture about the availability and performance of all the components that make a retail business successful. This makes it possible for retailers to easily automate store openings and closings, monitor cash machines, and keep track of access system log entries.

Not only that, the quantity and quality of information that Zabbix collects makes it easy for retailers to conduct a more accurate analysis of what is happening (or what may happen) and take preventive measures. Our retail clients find that having this level of control over their resources and services increases the confidence of their teams as well as their customers.

Zabbix for telecom

Internet, telephony, and television verticals require availability and consistency. The key to success is providing your services 24/7/365.

Zabbix makes this possible by providing full visibility of all network and customer devices, allowing operators to know about any outage before customers do and take the necessary actions. Some of our telecommunications clients are able to effortlessly monitor well over 100,000 devices with a single Zabbix server, helping them improve the customer experience and drive growth in the process.

Zabbix for aerospace

In the aerospace industry, timely data delivery and issue notification are the keys to safe operations. Aircraft depend on complex electronic systems that can diagnose the slightest deviations and make malfunctions known. Unfortunately, this is often in the form of either an indicator light on an instrument panel or a log message that is accessible only with specialized software or tools.

With Zabbix, all data transfers from the aircraft’s diagnostic system to the responsible employees can happen automatically. Error prioritization and escalation to further levels can also happen automatically if any aircraft has an ongoing issue that remains active for multiple days.

Conclusion

At Zabbix, our goal is a world without interruptions, powered by a world-class universal monitoring solution that’s available and affordable to any business. Our open-source software allows you to monitor your entire IT stack, no matter what size your infrastructure is or where it’s hosted.

That’s why government institutions across the globe as well as some of the world’s largest companies trust us with their network monitoring needs.

Get in touch with us to learn more and get started on the path to maximum efficiency and uptime today!


The post The Zabbix Advantage for Business appeared first on Zabbix Blog.

Monitoring the London Underground with Nathan Liefting and Adan Mohamed

Post Syndicated from Michael Kammer original https://blog.zabbix.com/monitoring-the-london-underground-with-nathan-liefting-and-adan-mohamed/26693/

With just a few days remaining until Zabbix Summit 2023, our series of speaker interviews draws to a close as we talk to Opensource ICT Solutions trainer and consultant Nathan Liefting about how he worked with Adan Mohamed of Boldyn Networks to monitor the London Underground with Zabbix.

Please tell us a bit about yourself and your work.

I’m a Zabbix trainer and consultant for Opensource ICT Solutions. You might also know me from the books Brian van Baekel and I wrote called “Zabbix IT Infrastructure Monitoring.”

How long have you been using Zabbix? What kind of daily Zabbix tasks are you involved in at your company?

My tasks are easy to explain – Zabbix, Zabbix, and some more Zabbix! Opensource ICT Solutions is one of the few companies that focus solely on Zabbix, so I get to work full time with the product, 40 hours a week. I build new environments, integrations, automations, and anything that you might need for your Zabbix environment.

Can you give us a sneak peek at what we can expect to hear during your Zabbix Summit speech?

Definitely! Adan from Boldyn Networks and I will be presenting you with a real use case for Zabbix monitoring. We’ll have a look at how Boldyn has brought broadband network connectivity to the London Underground tunnels and why it’s so important to monitor the equipment that makes that all possible. Of course, since this is THE Zabbix summit, we’ll also look at what the Zabbix setup looks like and share a pretty interesting use case for SNMP traps.

How and why did you come to the decision to use Zabbix as the monitoring solution for your use case?

Boldyn was looking for the best network monitoring solution for their project. Since we offer exactly that, we got to talking and we decided that our favorite open-source network monitoring tool was the way to go. Since then, we’ve been building amazing custom monitoring implementations together. The rest is history.

Can you mention some other noteworthy non-standard Zabbix monitoring use cases that you’ve worked on?

Definitely! Since I get to work on monitoring all day long, we’ve got a lot to choose from. I do most of my work in the Netherlands, the United Kingdom, and the United States, and all those markets are super exciting. We’re monitoring infrastructure that keeps planes flying safely and makes sure power grids are up and running, and now we’re also helping to keep people connected to the internet even when they go underground. If you ask me, it doesn’t get a lot more exciting than that – and that’s just the tip of the iceberg.

The post Monitoring the London Underground with Nathan Liefting and Adan Mohamed appeared first on Zabbix Blog.

Simplifying Digital Transformation with Marianna Portela

Post Syndicated from Michael Kammer original https://blog.zabbix.com/simplifying-digital-transformation-with-marianna-portela/26609/

To help everyone in our community get up to speed with Zabbix Summit speakers and their topics, we’re continuing our series of interviews and sitting down for a chat with Marianna Portela of Brazilian mass media conglomerate Globo. Read on to get a preview of her Summit speech topic and see how she uses Zabbix to bring massive live events to millions of users around the globe.

Please tell us a bit about yourself and your work.

I’m a tech lead at Globo, the largest media group in Latin America. It includes over-the-air broadcasting, television and film production, a pay television subscription service, streaming media, publishing, and online services.

How long have you been using Zabbix? What kind of daily Zabbix tasks are you involved in at your company?

I have been working at Globo for 15 years. I’ve been involved in monitoring for 11 of those years, and I’ve been using Zabbix for 10. I help monitor the applications that generate data for live events, and I use Zabbix to generate metrics that support decision-making related to better content delivery quality.

Can you name a few of the specific challenges that Zabbix has helped you solve?

Zabbix allows us to empower our users and supports our entire digital transformation – including many things related to Globoplay streaming. It also helps us monitor live event infrastructure, like the Olympics and World Cup. Previously, when there were technical issues during live events, we would try to figure out what happened after the fact, but no longer – Zabbix gives us a proactive analysis of potential occurrences within live production.

Can you give us a sneak peek at what we can expect to hear during your Zabbix Summit speech?

I'm planning to talk about how we use Zabbix to help ensure quality monitoring of live production – the part of Globo that deals with any type of live event and generates data for things like games. I'll introduce how we started with infrastructure monitoring and how the digital transformation at Globo began, specifically how we managed to enter new areas like content generation, especially live content. Then I'll discuss some specifics of how we monitor live event infrastructure.

The post Simplifying Digital Transformation with Marianna Portela appeared first on Zabbix Blog.

What is Server Monitoring? Everything You Need to Know

Post Syndicated from Michael Kammer original https://blog.zabbix.com/what-is-server-monitoring-everything-you-need-to-know/26617/

Servers are the foundation of a company’s IT infrastructure, and the cost of server downtime can include anything from days without system access to the loss of important business data. This can lead to operational issues, service outages, and steep repair costs.

Viewed against this backdrop, server monitoring is an investment with massive benefits for any organization. The latest generation of server monitoring tools makes it easier to assess server health and deal with any underlying issues as quickly and painlessly as possible.

What are servers, and how do they work?

Servers are computers (or applications) that provide software services to other computers or devices on a network. A server takes requests from client computers or devices and performs tasks in response – processing data, serving content, or performing calculations. Some servers are dedicated to hosting web services, which are software services offered on any computer connected to the internet.

What is server monitoring? Why does it matter?

Servers are some of the most important pieces of any company’s IT infrastructure. If a server is offline, running slowly, or experiencing outages, website performance will be affected and customers may decide to go elsewhere. If an internal file server is generating errors, important business data like accounting files or customer records could be compromised.

A server monitoring system is designed to watch your systems and provide a number of key metrics regarding their operation. In general, server monitoring software tests for accessibility (making sure that the server is alive and can be reached) and response time (guaranteeing that it is running fast enough to keep users happy). What’s more, it sends notifications about missing or corrupt files, security violations, and other issues.
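
Both kinds of check can be expressed as simple Zabbix triggers – the host names here are illustrative:

Server unreachable: nodata(/File server/agent.ping,5m)=1
Slow HTTP response: avg(/Web server/net.tcp.service.perf[http],5m)>1

The first fires when the agent stops reporting for five minutes; the second when the average HTTP service response time exceeds one second.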

Server monitoring is most often used for processing data in real time, but quality server monitoring is also predictive, letting users know when disks will reach capacity and whether memory or CPU utilization is about to be throttled. By evaluating historical data, it’s possible to find out if a server’s performance is degrading over time and even predict when a complete crash might occur.
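
As a hedged example of such a predictive check (the host name is illustrative), Zabbix's timeleft() function estimates when a threshold will be reached:

Disk fills within a week: timeleft(/File server/vfs.fs.size[/,pfree],1h,10)<1w

Based on the last hour of free-space data, this estimates how long until only 10% remains and alerts once that estimate drops below one week.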

How can server monitoring help businesses?

Here are a few of the most important business benefits of server monitoring:

Server monitoring tools give you a bird’s-eye view of your server’s health and performance

A quality server monitoring tool keeps IT administrators aware of metrics like CPU usage, RAM, disk space, and network bandwidth. This helps them to see when servers are slowing down or failing, allowing them to act before users are affected.

Server monitoring simplifies process automation

IT teams have long checklists when it comes to managing servers. They need to monitor hard disk space, keep an eye on infrastructure, schedule system backups, and update antivirus software. They also need to be able to foresee and solve critical events, while managing any disruptions.

A server monitoring tool helps IT professionals by automating all or many aspects of these jobs. It can show whether a backup was successful, if software is patched, and whether a server is in good condition. This allows IT teams to focus on tasks that benefit more from their involvement and expertise.

Server monitoring makes it easier to retain customers as well as employees

Acting quickly when servers develop issues (or even before) makes sure that employee workflows aren’t disrupted, allowing them to perform their duties, see results, and reach their goals. It also guarantees a positive customer experience by providing early notification of any issues.

Server monitoring keeps costs down

By automating processes and tasks (and freeing up time in the process) server monitoring systems make the most of resources and reduce costs. And by solving potential issues before they affect the organization, they help businesses avoid lost revenue from unfinished employee tasks, operational delays, and unfinished purchases.

What should you look for in a server monitoring solution?

Now that you’re sold on the benefits of server monitoring, you’ll want to choose the server monitoring solution that’s right for you. Here are a few capabilities to keep in mind:

Ease of use

Does the solution include an intuitive dashboard that makes it easy to monitor events and react to problems quickly? It should, and it should also allow you to make the most of the data it exports by providing graphs, reports, and integrations.

Customer support

Is it easy to contact support? How quickly do they respond? A quality server monitoring solution will provide a defined SLA and stick to it with no exceptions.

Breadth of coverage

A good solution will support all the server types (hardware, software, on-premises, cloud) that your enterprise uses. It should also be flexible enough to support any server types you may implement in the future.

Alert management

There are a few important questions to ask when it comes to alerts:

  • Does the solution include a dashboard or display that makes it easy to track events and react to problems quickly?
  • Is it easy to set up alerts via the configuration of thresholds that trigger them? How are alerts delivered?
  • Does the solution have a way to help you determine why a problem has occurred, instead of just telling you that something has gone wrong without context?

What are some best practices to keep in mind?

Here are a few best practices that will help you avoid the more common server monitoring pitfalls:

Proactively check for failures

Keep a sharp eye out for any issues that may affect your software or hardware. The tools included with a good monitoring solution can alert you to errors caused by a corrupted database (for example) and let you know if a security incident has left important services disabled.

Don’t forget your historical data

Server problems rarely occur in a vacuum, so look into the context of issues that emerge. You can do that by exploring metrics across a specific period, typically 30 to 90 days. For example, you may find that CPU temperature has increased within the past week, which may suggest a problem with a server cooling system.
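
In Zabbix, time-shifted aggregate functions turn this kind of week-over-week comparison into a one-line trigger. A sketch, assuming a hypothetical cpu.temp item supplied by your hardware template:

CPU running hotter than last week: avg(/DB server/cpu.temp,1h)>avg(/DB server/cpu.temp,1h:now-7d)+5

This compares the past hour's average temperature with the same window a week earlier and fires when it has risen by more than 5 degrees.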

Operate your hardware in line with recommended tolerance levels

File servers are commonly pushed to the limit, rarely getting a break. That’s why it’s important to monitor metrics like CPU utilization, RAM utilization, storage capacity usage, and CPU temperature. Check these metrics regularly to identify issues before it’s too late.
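
With the standard Zabbix agent, these metrics map onto well-known item keys, for example:

CPU utilization: system.cpu.util
Available memory (%): vm.memory.size[pavailable]
Used space on the root filesystem (%): vfs.fs.size[/,pused]

Temperature readings usually come from platform-specific sources instead – IPMI, SNMP, or the Linux agent's sensor[] key – so check what your hardware template provides.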

Keep track of alerts

Always monitor your alerts in real time as they occur and explore reliable ways to manage and prioritize them. When escalating an incident, make sure it goes to the right individual as soon as possible.

Use server monitoring data to plan short-term cloud capacity

Server monitoring systems can help you plan the right computing power for specific moments. If services become slower or users experience other problems with performance, an IT manager can assess the situation through the server monitor. They’ll then be able to allocate extra resources to solve the problem.

Take advantage of capacity planning

Data center workloads have almost doubled in the past 5 years, and servers have had to keep up with this ongoing change. Analyzing long-term server utilization trends can prepare you for future server requirements.

Go beyond asset management

With server monitoring, you can discover which systems are approaching the end of their lives and whether any assets have disappeared from your network. You can also let your server monitoring tool handle the heavy lifting for you when it comes to tracking physical hardware.

The Zabbix Advantage

Zabbix is designed to make server monitoring easy. Our solution allows you to track any possible server performance metrics and incidents, including server performance, availability, and configuration changes.

Intuitive dashboards, network graphs, and topology maps allow you to visualize server performance and availability, and our flexible alerting allows for multiple delivery methods and customized message content.

Not only that, our out-of-the-box templates come with preconfigured items, triggers, graphs, applications, screens, low-level discovery rules, and web scenarios – all designed to have you up and running in just a few minutes.

And because Zabbix is open-source, it’s not just affordable, it’s free. Contact us to find out more and enjoy the peace of mind that comes from knowing that your servers are under control.

FAQ

Why do we need server monitoring?

Server monitoring allows IT professionals to:

  • Monitor the responsiveness of a server
  • Know a server’s capacity, user load, and speed
  • Proactively detect and prevent any issues that might affect the server

Why do companies choose to monitor their servers?

Companies monitor servers so that they can:

  • Proactively identify any performance issues before they impact users
  • Understand a server’s system resource usage
  • Analyze a server for its reliability, availability, performance, security, etc.

How is server monitoring done?

Server monitoring tools constantly collect system data across an entire IT infrastructure, giving administrators a clear view of when certain metrics are above or below thresholds. They also automatically notify relevant parties if a critical system error is detected, allowing them to act in a timely manner to resolve issues.

What should you monitor on a server?

Key areas to monitor on a server include:

  • A server’s physical status
  • Server performance, including CPU utilization, memory resources, and disk activity
  • Server uptime
  • Page file usage
  • Context switches
  • Time synchronization
  • Process activity
  • Server capacity, user load, and speed

If I want to monitor a server, how easy is it to set things up?

Setting up a server monitoring tool is easy, provided you’ve taken into account these 5 steps:

  • Assess and create a monitoring plan
  • Discover how data can be collected
  • Define any and all metrics
  • Set up alerts
  • Have an established workflow

The post What is Server Monitoring? Everything You Need to Know appeared first on Zabbix Blog.