Tag Archives: news

Customize and Package Dependencies With Your Apache Spark Applications on Amazon EMR on Amazon EKS

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/customize-and-package-dependencies-with-your-apache-spark-applications-on-amazon-emr-on-amazon-eks/

Last AWS re:Invent, we announced the general availability of Amazon EMR on Amazon Elastic Kubernetes Service (Amazon EKS), a new deployment option for Amazon EMR that allows customers to automate the provisioning and management of Apache Spark on Amazon EKS.

With Amazon EMR on EKS, customers can deploy EMR applications on the same Amazon EKS cluster as other types of applications, which allows them to share resources and standardize on a single solution for operating and managing all their applications. Customers running Apache Spark on Kubernetes can migrate to EMR on EKS and take advantage of the performance-optimized runtime, integration with Amazon EMR Studio for interactive jobs, integration with Apache Airflow and AWS Step Functions for running pipelines, and Spark UI for debugging.

When customers submit jobs, EMR automatically packages the application into a container with the big data framework and provides prebuilt connectors for integrating with other AWS services. EMR then deploys the application on the EKS cluster and manages running the jobs, logging, and monitoring. If you currently run Apache Spark workloads and use Amazon EKS for other Kubernetes-based applications, you can use EMR on EKS to consolidate these on the same Amazon EKS cluster to improve resource utilization and simplify infrastructure management.

Developers who run containerized, big data analytical workloads told us they just want to point to an image and run it. Currently, EMR on EKS dynamically adds externally stored application dependencies during job submission.

Today, I am happy to announce customizable image support for Amazon EMR on EKS that allows customers to modify the Docker runtime image that runs their analytics application using Apache Spark on your EKS cluster.

With customizable images, you can create a container that contains both your application and its dependencies, based on the performance-optimized EMR Spark runtime, using your own continuous integration (CI) pipeline. This reduces the time to build the image and helps predicting container launches for a local development or test.

Now, data engineers and platform teams can create a base image, add their corporate standard libraries, and then store it in Amazon Elastic Container Registry (Amazon ECR). Data scientists can customize the image to include their application specific dependencies. The resulting immutable image can be vulnerability scanned, deployed to test and production environments. Developers can now simply point to the customized image and run it on EMR on EKS.

Customizable Runtime Images – Getting Started
To get started with customizable images, use the AWS Command Line Interface (AWS CLI) to perform these steps:

  1. Register your EKS cluster with Amazon EMR.
  2. Download the EMR-provided base images from Amazon ECR and modify the image with your application and libraries.
  3. Publish your customized image to a Docker registry such as Amazon ECR and then submit your job while referencing your image.

You can download one of the following base images. These images contain the Spark runtime that can be used to run batch workloads using the EMR Jobs API. Here is the latest full image list available.

Release Label Spark Hadoop Versions Base Image Tag
emr-5.32.0-latest Spark 2.4.7 + Hadoop 2.10.1 emr-5.32.0-20210129
emr-5.33-latest Spark 2.4.7-amzn-1 + Hadoop 2.10.1-amzn-1 emr-5.33.0-20210323
emr-6.2.0-latest Spark 3.0.1 + Hadoop 3.2.1 emr-6.2.0-20210129
emr-6.3-latest Spark 3.1.1-amzn-0 + Hadoop 3.2.1-amzn-3 emr-6.3.0:latest

These base images are located in an Amazon ECR repository in each AWS Region with an image URI that combines the ECR registry account, AWS Region code, and base image tag in the case of US East (N. Virginia) Region.

755674844232.dkr.ecr.us-east-1.amazonaws.com/spark/emr-5.32.0-20210129

Now, sign in to the Amazon ECR repository and pull the image into your local workspace. If you want to pull an image from a different AWS Region to reduce network latency, choose the different ECR repository that corresponds most closely to where you are pulling the image from US West (Oregon) Region.

$ aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 895885662937.dkr.ecr.us-west-2.amazonaws.com
$ docker pull 895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-5.32.0-20210129

Create a Dockerfile on your local workspace with the EMR-provided base image and add commands to customize the image. If the application requires custom Java SDK, Python, or R libraries, you can add them to the image directly, just as with other containerized applications.

The following example Docker commands are for a use case in which you want to install useful Python libraries such as Natural Language Processing (NLP) using Spark and Pandas.

FROM 895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-5.32.0-20210129
USER root
### Add customizations here ####
RUN pip3 install pyspark pandas spark-nlp // Install Python NLP Libraries
USER hadoop:hadoop

In another use case, as I mentioned, you can install a different version of Java (for example, Java 11):

FROM 895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-5.32.0-20210129
USER root
### Add customizations here ####
RUN yum install -y java-11-amazon-corretto // Install Java 11 and set home
ENV JAVA_HOME /usr/lib/jvm/java-11-amazon-corretto.x86_64
USER hadoop:hadoop

If you’re changing Java version to 11, then you also need to change Java Virtual Machine (JVM) options for Spark. Provide the following options in applicationConfiguration when you submit jobs. You need these options because Java 11 does not support some Java 8 JVM parameters.

"applicationConfiguration": [ 
  {
    "classification": "spark-defaults",
    "properties": {
        "spark.driver.defaultJavaOptions" : "
		    -XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70",
        "spark.executor.defaultJavaOptions" : "
		    -verbose:gc -Xlog:gc*::time -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
			-XX:OnOutOfMemoryError='kill -9 %p' -XX:MaxHeapFreeRatio=70 
			-XX:+IgnoreUnrecognizedVMOptions"
    }
  }
]

To use custom images with EMR on EKS, publish your customized image and submit a Spark workload in Amazon EMR on EKS using the available Spark parameters.

You can submit batch workloads using your customized Spark image. To submit batch workloads using the StartJobRun API or CLI, use the spark.kubernetes.container.image parameter.

$ aws emr-containers start-job-run \
    --virtual-cluster-id <enter-virtual-cluster-id> \
    --name sample-job-name \
    --execution-role-arn <enter-execution-role-arn> \
    --release-label <base-release-label> \ # Base EMR Release Label for the custom image
    --job-driver '{
        "sparkSubmitJobDriver": {
        "entryPoint": "local:///usr/lib/spark/examples/jars/spark-examples.jar",
        "entryPointArguments": ["1000"],
        "sparkSubmitParameters": [ "--class org.apache.spark.examples.SparkPi --conf spark.kubernetes.container.image=123456789012.dkr.ecr.us-west-2.amazonaws.com/emr5.32_custom"
		  ]
      }
  }'

Use the kubectl command to confirm the job is running your custom image.

$ kubectl get pod -n <namespace> | grep "driver" | awk '{print $1}'
Example output: k8dfb78cb-a2cc-4101-8837-f28befbadc92-1618856977200-driver

Get the image for the main container in the Driver pod (Uses jq).

$ kubectl get pod/<driver-pod-name> -n <namespace> -o json | jq '.spec.containers
| .[] | select(.name=="spark-kubernetes-driver") | .image '
Example output: 123456789012.dkr.ecr.us-west-2.amazonaws.com/emr5.32_custom

To view jobs in the Amazon EMR console, under EMR on EKS, choose Virtual clusters. From the list of virtual clusters, select the virtual cluster for which you want to view logs. On the Job runs table, select View logs to view the details of a job run.

Automating Your CI Process and Workflows
You can now customize an EMR-provided base image to include an application to simplify application development and management. With custom images, you can add the dependencies using your existing CI process, which allows you to create a single immutable image that contains the Spark application and all of its dependencies.

You can apply your existing development processes, such as vulnerability scans against your Amazon EMR image. You can also validate for correct file structure and runtime versions using the EMR validation tool, which can be run locally or integrated into your CI workflow.

The APIs for Amazon EMR on EKS are integrated with orchestration services like AWS Step Functions and AWS Managed Workflows for Apache Airflow (MWAA), allowing you to include EMR custom images in your automated workflows.

Now Available
You can now set up customizable images in all AWS Regions where Amazon EMR on EKS is available. There is no additional charge for custom images. To learn more, see the Amazon EMR on EKS Development Guide and a demo video how to build your own images for running Spark jobs on Amazon EMR on EKS.

You can send feedback to the AWS forum for Amazon EMR or through your usual AWS support contacts.

Channy

Introducing a Public Registry for AWS CloudFormation

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/introducing-a-public-registry-for-aws-cloudformation/

AWS CloudFormation and the AWS Cloud Development Kit (CDK) provide scalable and consistent provisioning of AWS resources (for example, compute infrastructure, monitoring tools, databases, and more). We’ve heard from many customers that they’d like to benefit from the same consistency and scalability when provisioning resources from AWS Partner Network (APN) members, third-party vendors, and open-source technologies, regardless of whether they are using CloudFormation templates or have adopted the CDK to define their cloud infrastructure.

I’m pleased to announce a new public registry for CloudFormation, providing a searchable collection of extensions – resource types or modules – published by AWS, APN partners, third parties, and the developer community. The registry makes it easy to discover and provision these extensions in your CloudFormation templates and CDK applications in the same manner you use AWS-provided resources. Using extensions, you no longer need to create and maintain custom provisioning logic for resource types from third-party vendors. And, you are able to use a single infrastructure as code tool, CloudFormation, to provision and manage AWS and third-party resources, further simplifying the infrastructure provisioning process (the CDK uses CloudFormation under the hood).

Launch Partners
We’re excited to be joined by over a dozen APN Partners for the launch of the registry, with more than 35 extensions available for you to use today. Blog posts and announcements from the APN Partners who collaborated on this launch, along with AWS Quick Starts, can be found below (some will be added in the next few days).

Registries and Resource Types
In 2019, CloudFormation launched support for private registries. These enabled registration and use of resource providers (Lambda functions) in your account, including providers from AWS and third-party vendors. After you registered a provider you could use resource types, comprised of custom provisioning logic, from the provider in your CloudFormation templates. Resource types were uploaded by providers to an Amazon Simple Storage Service (Amazon S3) bucket, and you used the types by referencing the relevant S3 URL. The public registry provides consistency in the sourcing of resource types and modules, and you no longer need to use a collection of Amazon Simple Storage Service (Amazon S3) buckets.

Third-party resource types in the public registry also integrate with drift detection. After creating a resource from a third-party resource type, CloudFormation will detect changes to the resource from its template configuration, known as configuration drift, just as it would with AWS resources. You can also use AWS Config to manage compliance for third-party resources consumed from the registry. The resource types are automatically tracked as Configuration items when you have configured AWS Config to record them, and used CloudFormation to create, update, and delete them. Whether the resource types you use are third-party or AWS resources, you can view configuration history for them, in addition to being able to write AWS Config rules to verify configuration best practices.

The public registry also supports Type Configuration, enabling you to configure third-party resource types with API keys and OAuth tokens per account and region. Once set, the configuration is stored securely and can be updated. This also provides a centralized way to configure third-party resource types.

Publishing Extensions to the Public Registry
Extension publishers must be verified as AWS Marketplace sellers, or as GitHub or BitBucket users, and extensions are validated against best practices. To publish extensions (resource types or modules) to the registry, you must first register in an AWS Region, using one of the mentioned account types.

After you’ve registered, you next publish your extension to a private registry in the same Region. Then, you need to test that the extension meets publishing requirements. For a resource type extension, this means it must pass all the contract tests defined for the type. Modules are subject to different requirements, and you can find more details in the documentation. With testing complete, you can publish your extension to the public registry for your Region. See the user guide for detailed information on publishing extensions.

Using Extensions in the Public Registry
I decided to try a couple of extensions related to Kubernetes, contributed by AWS Quick Starts, to make configuration changes to a cluster. Personally, I don’t have a great deal of experience with Kubernetes and its API so this was a great chance to examine how extensions could save me significant time and effort. During the process of writing this post I learned from others that using the Kubernetes API (the usual way to achieve the changes I had in mind) would normally involve effort even for those with more experience.

For this example I needed a Kubernetes cluster, so I followed this tutorial to set one up in Amazon Elastic Kubernetes Service (EKS), using the Managed nodes – Linux node type. With my cluster ready, I want to make two configuration changes.

First, I want to add a new namespace to the cluster. A namespace is a partitioning construct that lets me deploy the same set of resources to different namespaces in the same cluster without conflict thanks to the isolation namespaces provide. Second, I want to set up and use Helm, a package manager for Kubernetes. I’ll use Helm to install the kube-state-metrics package from the Prometheus helm-charts repository for gathering cluster metrics. While I can use CloudFormation to provision clusters and compute resources, previously, to perform these two configuration tasks, I’d have had to switch to the API or various bespoke tool chains. With the registry, and these two extensions, I can now do everything using CloudFormation (and of course, as I mentioned earlier, I could also use the extensions with the CDK, which I’ll show later).

Before using an extension, it needs to be activated in my account. While activation is easy to do for single accounts using the console, as we’ll see in a moment, if I were using AWS Organizations and wanted to activate various third-party extensions across my entire organization, or for a specific organization unit (OU), I could achieve this using Service-Managed StackSets in CloudFormation. Using the resource type AWS::CloudFormation::TypeActivation in a template submitted to a Service-Managed StackSet, I can target an entire Organization, or a particular OU, passing the Amazon Resource Name (ARN) identifying the third-party extension to be activated. Activation of extensions is also very easy to achieve (whether using AWS Organizations or not) using the CDK with just a few lines of code, again making use of the aforementioned TypeActivation resource type.

To activate the extensions, I head to the CloudFormation console and click Public extensions from the navigation bar. This takes me to the Registry:Public extensions home page, where I switch to viewing third party resource type extensions.

Viewing third-party types in the registry

The extensions I want are AWSQS::Kubernetes::Resource and AWSQS::Kubernetes::Helm. The Resource extension is used to apply a manifest describing configuration changes to a cluster. In my case, the manifest requests a namespace be created. Clicking the name of the AWSQS::Kubernetes::Resource extension takes me to a page where I can view schema, configuration details, and versions for the extension.

Viewing details of the Resource extension

What happens if you deactivate an extension you’re using, or an extension is withdrawn by the publisher? If you deactivate an extension a stack depends on, any resources created from that extension won’t be affected, but you’ll be unable to perform further stack operations, such as Read, Update, Delete, and List (these will fail until the extension is re-activated). Publishers must request their extensions be withdrawn from the registry (there is no “delete” API). If the request is granted, customers who activated the extension prior to withdrawal can still perform Create/Read/Update/Delete/List operations, using what is effectively a snapshot of the extension in their account.

Clicking Activate takes me to a page where I need to specify the ARN of an execution role that CloudFormation will assume when it runs the code behind the extension. I create a role following this user guide topic, but the basic trust relationship is below for reference.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "resources.cloudformation.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

I also add permissions for the resource types I’m using to my execution role. Details on the permissions needed for the types I chose can be found on GitHub, for Helm, and for Kubernetes (note the GitHub examples include the trust relationship too).

When activating an extension, I can elect to use the default name, which is how I will refer to the type in my templates or CDK applications, or I can enter a new name. The name chosen has to be unique within my account, so if I’ve enabled a version of an extension with its default name, and want to enable a different version, I must change the name. Once I’ve filled in the details, and chosen my versioning strategy (extensions use semantic versioning, and I can elect to accept automatic updates for minor version changes, or to “lock” to a specific version) clicking Activate extension completes the process.

Activating an extension from the registry

That completes the process for the first extension, and I follow the same steps for the AWSQS::Kubernetes::Helm extension. Navigating to Activated extensions I can view a list of all my enabled extensions.

Viewing the list of enabled extensions

I have one more set of permissions to update. Resource types make calls to the Kubernetes API on my behalf so I need to update the aws-auth ConfigMap for my cluster to reference the execution role I just used, otherwise the calls made by the resource types I’m using will fail. To do this, I run the command kubectl edit cm aws-auth -n kube-system at a command prompt. In the text editor that opens, I update the ConfigMap with a new group referencing my CfnRegistryExtensionExecRole, shown below (if you’re following along, be sure to change the account ID and role name to match yours).

apiVersion: v1
data:
  mapRoles: |
    - groups:
      - system:bootstrappers
      - system:nodes
      rolearn: arn:aws:iam::111122223333:role/myAmazonEKSNodeRole
      username: system:node:{{EC2PrivateDNSName}}
    - groups:
      - system:masters
      rolearn: arn:aws:iam::111122223333:role/CfnRegistryExtensionExecRole
      username: cfnresourcetypes
kind: ConfigMap
metadata:
  creationTimestamp: "2021-06-04T20:44:24Z"
  name: aws-auth
  namespace: kube-system
  resourceVersion: "6355"
  selfLink: /api/v1/namespaces/kube-system/configmaps/aws-auth
  uid: dc91bfa8-1663-45d0-8954-1e841913b324

Now I’m ready to use the extensions to configure my cluster with a new namespace, Helm, and the kube-state-metrics package. I create a CloudFormation template that uses the extensions, adding parameters for the elements I want to specify when creating a stack: the name of the cluster to update, and the namespace name. The properties for the KubeStateMetrics resource reference the package I want Helm to install.

AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  ClusterName:
    Type: String
  Namespace:
    Type: String
Resources:
  KubeStateMetrics:
    Type: AWSQS::Kubernetes::Helm
    Properties:
      ClusterID: !Ref ClusterName
      Name: kube-state-metrics
      Namespace: !GetAtt KubeNamespace.Name
      Repository: https://prometheus-community.github.io/helm-charts
      Chart: prometheus-community/kube-state-metrics
  KubeNamespace:
    Type: AWSQS::Kubernetes::Resource
    Properties:
      ClusterName: !Ref ClusterName
      Namespace: default
      Manifest: !Sub |
        apiVersion: v1
        kind: Namespace
        metadata:
          name: ${Namespace}
          labels:
            name: ${Namespace}

On the Stacks page of the CloudFormation console, I click Create stack, upload my template, and then give my stack a name and the values for my declared parameters.

Launching a stack with my activated extensions

I click Next to proceed through the rest of the wizard, leaving other settings at their default values, and then Create stack to complete the process.

Once stack creation is complete, I verify my changes using the kubectl command line tool. I first check that the new namespace, newsblog-sample-namespace, is present with the command kubectl get namespaces. I then run the kubectl get all --namespace newsblog-sample-namespace command to verify the kube-state-metrics package is installed.

Verifying the extensions applied by changes

Extensions can also be used with the AWS Cloud Development Kit. To wrap up this exploration of using the new registry, I’ve included an example below of a CDK application snippet in TypeScript that achieves the same effect, using the same extensions, as the YAML template I showed earlier (I could also have written this using any of the languages supported by the CDK – C#, Java, or Python).

import {Stack, Construct, CfnResource} from '@aws-cdk/core';
export class UnoStack extends Stack {
  constructor(scope: Construct, id: string) {
    super(scope, id);
    const clusterName = 'newsblog-cluster';
    const namespace = 'newsblog-sample-namespace';

    const kubeNamespace = new CfnResource(this, 'KubeNamespace', {
      type: 'AWSQS::Kubernetes::Resource',
      properties: {
        ClusterName: clusterName,
        Namespace: 'default',
        Manifest: this.toJsonString({
          apiVersion: 'v1',
          kind: 'Namespace',
          metadata: {
            name: namespace,
            labels: {
              name: namespace,
            }
          },
        }),
      },
    });
    
    new CfnResource(this, 'KubeStateMetrics', {
      type: 'AWSQS::Kubernetes::Helm',
      properties: {
        ClusterID: clusterName,
        Name: 'kube-state-metrics',
        Namespace: kubeNamespace.getAtt('Name').toString(),
        Repository: 'https://prometheus-community.github.io/helm-charts',
        Chart: 'prometheus-community/kube-state-metrics',
      },
    });
  }
};

As mentioned earlier in this post, I don’t have much experience with the Kubernetes API, and Kubernetes in general. However, by making use of the resource types in the public registry, in conjunction with CloudFormation, I was able to easily configure my cluster using a familiar environment, without needing to resort to the API or bespoke tool chains.

Get Started with the CloudFormation Public Registry
Pricing for the public registry is the same as for the existing registry and private resource types. There is no additional charge for using native AWS resource types; for third-party resource types you will incur charges based on the number of handler operations (add, delete, list, etc.) you run per month. For details, see the AWS CloudFormation Pricing page. The new public registry is available today in the US East (N. Virginia, Ohio), US West (Oregon, N. California), Canada (Central), Europe (Ireland, Frankfurt, London, Stockholm, Paris, Milan), Asia Pacific (Hong Kong, Mumbai, Osaka, Singapore, Sydney, Seoul, Tokyo), South America (Sao Paulo), Middle East (Bahrain), and Africa (Cape Town) AWS Regions.

For more information, see the AWS CloudFormation User Guide and User Guide for Extension Development, and start publishing or using extensions today!

— Steve

Migrate Your Workloads with the Graviton Challenge!

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/migrate-your-workloads-with-the-graviton-challenge/

Today, Dave Brown, VP of Amazon EC2 at AWS, announced the Graviton Challenge as part of his session on AWS silicon innovation at the Six Five Summit 2021. We invite you to take the Graviton Challenge and move your applications to run on AWS Graviton2. The challenge, intended for individual developers and small teams, is based on the experiences of customers who’ve already migrated. It provides a framework of eight, approximately four-hour chunks to prepare, port, optimize, and finally deploy your application onto Graviton2 instances. Getting your application running on Graviton2, and enjoying the improved price performance, aren’t the only rewards. There are prizes and swag for those who complete the challenge!

AWS Graviton2 is a custom-built processor from AWS that’s based on the Arm64 architecture. It’s supported by popular Linux operating systems including Amazon Linux 2, Red Hat Enterprise Linux, SUSE Linux Enterprise Server, and Ubuntu. Compared to fifth-generation x86-based Amazon Elastic Compute Cloud (Amazon EC2) instance types, Graviton2 instance types have a 20% lower cost. Overall, customers who have moved applications to Graviton2 typically see up to 40% better price performance for a broad range of workloads including application servers, container-based applications, microservices, caching fleets, data analytics, video encoding, electronic design automation, gaming, open-source databases, and more.

Before I dive in to talk more about the challenge, check out the fun introductory video below from Jeff Barr, Chief Evangelist, AWS and Dave Brown, Vice President, EC2. As Jeff mentions in the video: same exact workload, same or better performance, and up to 40% better price performance!

After you complete the challenge, we invite you to tell us about your adoption journey and enter the contest. If you post on social media with the hashtag #ITookTheGravitonChallenge, you’ll earn a t-shirt. To earn a hoodie, include a short video with your post.

To enter the competition, you’ll need to create a 5 to 10-minute video that describes your project and the application you migrated, any hurdles you needed to overcome, and the price performance benefits you realized.

All valid contest entries will each receive a $500 AWS credit (limited to 500 quantity). A panel of judges will evaluate the content entries and award additional prizes across six categories. All category winners will receive an AWS re:Invent 2021 conference pass, flight, and hotel for one company representative, and winners will be able to meet with senior members of the Graviton2 team at the conference. Here are additional category-specific prizes:

  • Best adoption – enterprise
    Based on the performance gains, total cost savings, number of instances the workload is running on, and time taken to migrate the workload (faster is better), for companies with over 1000 employees. The winner will also receive a chance to present at the conference.
  • Best adoption – small/medium business
    Based on the performance gains, total cost savings, number of instances the workload is running on, and time taken to migrate the workload (faster is better), for companies with 100-1000 employees. The winner will also receive a chance to present at the conference.
  • Best adoption – startup
    Based on the performance gains, total cost savings, number of instances the workload is running on, and time taken to migrate the workload (faster is better), for companies with fewer than 100 employees. The winner will also receive a chance to present at the conference.
  • Best new workload adoption
    Awarded to a workload that’s new to EC2 (migrated to Graviton2 from on-premises, or other cloud) based on the performance gains, total cost savings, number of instances the workload is running on, and time taken to migrate the workload (faster is better). The winner will also receive a chance to participate in a video or written case study.
  • Most impactful adoption
    Awarded to the workload with the biggest social impact based on details provided about what the workload/application does. Applications in this category are related to fields such as sustainability, healthcare and life sciences, conservation, learning/education, justice/equity. The winner will also receive a chance to participate in a video or written case study.
  • Most innovative adoption
    Applications in this category solve unique problems for their customers, address new use cases, or are groundbreaking. The award will be based on the workload description, price performance gains, and total cost savings. The winner will also receive a chance to participate in a video or written case study.

Competition submissions open on June 22 and close August 31. Winners will be announced on October 1 2021.

Identifying a workload to migrate
Now that you know what’s possible with Graviton2, you’re probably eager to get started and identify a workload to tackle as part of the challenge. The ideal workload is one that already runs on Linux and uses open-source components. This means you’ll have full access to the source code of every component and can easily make any required changes. If you don’t have an existing Linux workload that is entirely open-source based, you can, of course, move other workloads. A robust ecosystem of ISVs and AWS services already support Graviton2. However, if you are using software from a vendor that does not support Arm64/Graviton2, reach out to the Graviton Challenge Slack channel for support.

What’s involved in the challenge?
The challenge includes eight steps performed over four days (but you don’t have to do the challenge in four consecutive days). If you need assistance from Graviton2 experts, a dedicated Slack channel is available and you can sign up for emails containing helpful tips and guidance. In addition to support on Slack and supporting emails, you also get $25 AWS credit to cover the cost of the taking the challenge. Graviton2-based burstable T4g instances also have a free trial, available until December 31 2021, that can be used to qualify your workloads.

You can download the complete whitepaper can be downloaded from the Graviton Challenge page, but here is an outline of the process.

Day 1: Learn and explore
The first day you’ll learn about Graviton2 and then assess your selected workload. I recommend that you start by checking out the 2020 AWS re:Invent session, Deep dive on AWS Graviton2 processor-powered EC2 instances. The Getting Started with AWS Graviton GitHub repository will be a useful reference as you work through the challenge.

Assessment involves identifying the application’s dependencies and requirements. As with all preparatory work, the more thorough you are at this stage, the better positioned you are for success. So, don’t skimp on this task!

Day 2: Create a plan and start porting
On the second day, you’ll create a Graviton2 environment. You can use EC2 virtual machine instances with AWS-provided images or build your own custom images. Alternatively, you can go the container route, because both Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (EKS) support Graviton2-based instances.

After you have created your environment, you’ll bootstrap the application. The Getting Started Guide on GitHub contains language-specific getting started information. If your application uses Java, Python, Node.js, .NET, or other high-level languages, then it might run as-is or need minimal changes. Other languages like C, C++, or Go will need to be compiled for the 64-bit Arm architecture. For more information, see the guides on GitHub.

Day 3: Debug and optimize
Now that the application is running on a Graviton2 environment, it’s time to test and verify its functionality. When you have a fully functional application, you can test performance and compare it to x86-64 environments. If you don’t observe the expected performance, reach out to your account team, or get support on the Graviton Challenge Slack channel. We’re here to help analyze and resolve any potential performance gaps.

Day 4: Update infrastructure and start deployments
It’s shipping day! You’ll update your infrastructure to add Graviton2-based instances, and then start deploying. We recommend that you use canary or blue-green deployments so that a portion of your traffic is redirected to the new environments. When you’re comfortable, you can transition all traffic.

At this point, you can celebrate completing the challenge, publish a post on social media using the #ITookTheGravitonChallenge hashtag, let us know about your success, and consider entering the competition. Remember, entries for the competition are due by August 31, 2021.

Start the challenge today!
Now that you have some details about the challenge and rewards, it’s time to start your (migration) engines. Download the whitepaper from the Graviton Challenge landing page, familiarize yourself with the details, and off you go! And, if you do decide to enter the competition, good luck!

Footnote
In my role as a .NET Developer Advocate at AWS, I would be remiss if I failed to mention that this challenge is equally applicable to .NET applications using .NET Core or .NET 5 and later! In fact, .NET 5 includes ARM64-specific optimizations. For information about performance improvements my colleagues found for .NET applications running on AWS Graviton2, see the Powering .NET 5 with AWS Graviton2: Benchmarks blog post. There’s also a lab for .NET 5 on Graviton2. I invite you to check out the getting started material for .NET in the aws-graviton-getting-started GitHub repository and start migrating.

— Steve

Heads Up – AWS News Blog RSS Feed Change

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/heads-up-aws-news-blog-rss-feed-change/

TL;DR – If you are using the ancient Feedburner feed for this blog, please change to the new one (https://aws.amazon.com/blogs/aws/feed/) ASAP.

Back in the early days of AWS, I paid a visit to the Chicago headquarters of a cool startup called FeedBurner. My friend Matt Shobe demo’ed their new product to me and showed me how “burning” a raw RSS feed could add value to it in various ways. Upon my return to Seattle I promptly burned the RSS feed of the original (TypePad-hosted) version of this blog, and shared that feed for many years.

FeedBurner has served us well over the years, but its time has passed. The company was acquired many years ago and the new owners are now putting the product into maintenance mode.

While the existing feed will continue to work for the foreseeable future, I would like to encourage you to use the one directly generated by this blog instead. Simply update your feed reader to refer to https://aws.amazon.com/blogs/aws/feed/ .

Jeff;

In the Works – AWS Region in Tel Aviv, Israel

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-aws-region-in-tel-aviv-israel/

We launched three AWS Regions (Italy, South Africa, and Japan) in the last 12 months, and are working on regions in Australia, Indonesia, Spain, India, Switzerland, and the United Arab Emirates.

Tel Aviv, Israel in the Works
Today I am happy to announce that the AWS Israel (Tel Aviv) Region is in the works and will open in the first half of 2023. This region will have three Availability Zones and will give AWS customers in Israel the ability to run workloads and store data that must remain in-country.

There are 81 Availability Zones within 25 AWS Regions in operation today, with 21 more Availability Zones and seven announced regions (including this one) underway.

As is always the case with an AWS Region, each of the Availability Zones will be a fully isolated part of the AWS infrastructure. The AZs in this region will be connected together via high-bandwidth, low-latency network connections over dedicated, fully redundant metro fiber. This connectivity supports applications that need synchronous replication between AZs for availability or redundancy. You can take a peek at the AWS Global Infrastructure page to learn more about how we design and build regions and AZs.

AWS in Israel
I first visited Israel in 2013 and have been back several (but definitely not enough) times since then. I have spoken at several AWS Summits and visited many of early customers in the area. Today, AWS has the following resources on the ground in Israel:

Israel is also home to Annapurna Labs, an Amazon.com subsidiary that is responsible for developing much of the innovative hardware that powers AWS.

In addition, the government of Israel announced that it has selected AWS as the primary cloud provider for the Nimbus Project. As part of this project, government ministries and subsidiaries in Israel will use cloud computing to power a digital transformation and to provide new digital services for the citizens of Israel.

Stay Tuned
We’ll announce the opening of the AWS Israel (Tel Aviv) Region in a forthcoming blog post, so be sure to stay tuned!

Jeff;

 

What’s new in Zabbix 5.4

Post Syndicated from Alexei Vladishev original https://blog.zabbix.com/whats-new-in-zabbix-5-4/14603/

Zabbix 5.4.0 released on May 17, 2021 — a non-LTS release that will be supported only for 7-8 months, has already received a lot of attention from our users, our community, and our customers due to a number of very significant and long-anticipated improvements. Zabbix 5.4.0 release comes with scheduled PDF report generation, robust problem detection, advanced data aggregation, and other significant improvements.

Contents

I. Reporting and visualization (1:22)
II. More powerful and simple (9:04)
III. Breaking changes (37:59)
IV. Upgrade notes (40:12)
V. Questions & Answers (44:01)

Reporting and visualization

Unification of screens

In Zabbix 5.2, we introduced pre-defined views for Problems. By accessing Monitoring > Problems, you may create different views based on different filtering options, allowing you to filter problems by certain criteria, and then save this filter as a separate view. You can easily switch between views with one click, for instance, between ‘All problems‘, ‘Services‘ (i.e. service-related problems), or ‘High severity problems‘ in this example.

Pre-defined views for Monitoring > Problems

In Zabbix 5.4, we implemented the unification of screens and dashboards. This means that screens are not supported anymore. In Zabbix 5.2, we had Dashboard and Screens on the menu. In Zabbix 5.4, the Screens functionality was moved to Dashboards, where all screens and all dashboards are available, which makes the workflow much more simple and user-friendly.

New Dashboard menu time

This change affected global screens, as well as local screens, which we always had on a template level. Now, we have introduced the dashboards for templates in Configuration > Templates.

For instance, we have the template for Nginx performance monitoring, so in Configuration > Templates, we have a dashboard ‘Nginx performance’. In Monitoring > Hosts, we have two dashboards for the host (cdn.example.com in this example) — ‘Nginx performance’ and the second template that may have come from some other template.

Templates for Dashboards

By clicking Dashboards for this specific host, you can go to the Dashboards view of this host.

Dashboards view for a specific host

Here, you can quickly switch between templates available for the host at the moment, such as ‘Nginx performance’ and ‘System performance’ in this example to see some Linux OS-specific metrics, such as CPU load, Disk I/O, and so on.

Multi-page dashboards

Previously, we had a very nice feature in Zabbix — slideshows or screen slideshows. Since we have moved everything to Dashboards, we spent much time thinking about how to fit it into the Dashboard functionality and found a very good solution. We introduced Multi-page Dashboards – dashboards containing several pages.

Multi-page Dashboards

In this example, in the Zabbix ‘Server performance‘ dashboard, you can see CPU memory metrics, matrix or graphs related to network performance, or any other page that can be created by clicking Add page and defining the Dashboard page properties. Then you’ll see all the pages available on the top of the Dashboard and switch easily between them. You can run the slideshow with the slideshow controls available in the full-screen mode.

Slideshow mode

Scheduled PDF reports

Scheduled reporting was highly anticipated by our community members, our customers, and our users. In Zabbix 5.4, we introduced scheduled PDF reporting allowing you to define, generate, and send PDF reports straight to your inbox.

Scheduled PDF reporting

This new functionality provides some nice features, for instance:

  • Centralized management of reports, so that super admins can see what reports are generated by Zabbix and sent to different users.
  • Any dashboard can be converted to a PDF report and sent to your email box.
  • This functionality is accessible to all users though can be restricted by a new user role.

In addition, now you can determine that you need a report, for instance, with the previous week’s data every Monday at 7.00 am. All you need is to select the period for the report, and Zabbix will generate and email it to you or any other user. You can also select to receive reports daily, monthly, or yearly.

PDF reports can be scheduled daily, weekly, monthly, or yearly

There are many other configuration parameters for PDF reports. Still, the most important advantage of PDF reporting is that this functionality is accessible from Dashboards. You just click Dashboard in Monitoring > Dashboards and select the period for PDF reporting.

PDF reporting accessible from Dashboards

 

More powerful and simple

When we think about what features need to be implemented in Zabbix and in what direction we would like Zabbix to go, we always consider different functionalities and improvements aimed at improving Zabbix usability and making Zabbix a simpler monitoring solution, and on the other hand, more powerful and much more flexible. Zabbix 5.4 is not an exception. We have introduced a number of very significant improvements, which simplify monitoring and make Zabbix even more flexible.

Tags for items

Zabbix already supports tags for almost all essential objects, such as triggers, hosts, host prototypes, and templates. Tags are everywhere. In Zabbix 5.4, we introduced tags for items (metrics).

The item-level applications have now been replaced with tags, so applications are not supported anymore. We now use a much more flexible concept of named tags. This way you can have tags providing information and values while having as many tags as you want, which is much more flexible comparing to applications.

You will notice that in the configuration view of your item you now have an additional tab — Tags where you can see the defined tags, as well as their values.

Tags instead of applications

You don’t have to worry about your applications though. Your applications will be automatically converted to the tag “Application: <app name>” during the upgrade. For instance, the application ‘CPU’ will be converted to the tag “Application: <CPU>”. All information will be preserved during the upgrade.

Syntax

Another very interesting functionality, about which we have been thinking for several years, is a new syntax for trigger expressions. We now have unified syntax for everything in Zabbix. Let’s talk about why this is an important step forward.

In previous versions we used a special syntax for triggers, some functional syntax for calculated items, and a different syntax for aggregated items:

  • TRIGGERS: {host:key.func(params)}>0
  • CALCULATED: 100*last(“vfs.fs.size[/,free]/last(“vfs.fs.size[/,total]”)
  • AGGREGATE: grpsum[“MySQL Servers”,”vfs.fs.size[/,total]”,last]

This was quite confusing, and users had to remember the syntax or consult the documentation. In Zabbix 5.4, we introduced a new and, most importantly, unified syntax. Now we use exactly the same syntax for triggers, calculated items, and aggregated items. In addition, this new syntax is more functional, while the old one was more object-oriented.

  • TRIGGERS: func(/host/key, params)>0
  • CALCULATED: sum(/host/vfs.fs.size[*,free], 10m)
  • AGGREGATE: min(avg_foreach(/*/qps?[group=“PostgreSQL” and tag=“Env:Production”], 5m))

The new syntax has a number of advantages. The new unified syntax:

  • is much simpler and unified for everything: triggers, calculated and aggregated items;
  • supports absolute time periods, such as last day, previous hour, etc. So, now, it’s easy to calculate, for instance, the minimum for the previous hour or an average value for the previous or the current day;
  • is free from any limitations of the old syntax;
  • allows for powerful aggregations and selection of items by tags utilizing wildcards, etc.;
  • allows for a function to be applied to results of other functions: func1(func2(item));
  • allows for multiple items as function arguments: min(item1, item2);
  • supports calculated metrics for everything working around the old limitations.

New set of functions

We have also introduced new sets of functions:

https://www.zabbix.com/documentation/current/manual/appendix/functions

NOTE. If you think that something is missing or you’re not able to represent a problem condition using the new syntax or a new function, just let us know. 

API tokens

Finally, we have added the support of per-user named API tokens with expiry dates. We have introduced the new user role — access to API tokens, and now any user can generate a private API token in Zabbix for some specific use. All tokens can be managed globally by super admins with appropriate permissions. Now we have a very understandable way to define which users have permissions to operate with API tokens and which users don’t.

So, if you are an ordinary user, you may go to User settings, click API tokens, and you will see your tokens with the Name, optional expiry date, the time it was last accessed, and the status — Any, Enabled, or Disabled.

API token properties

API tokens can be created by any user with appropriate permissions. Super admins can select the user.

API tokens created by any user with permissions

NOTE. You can use the tokens for any integrations from the Zabbix API. You are not forced to use the username and password to start using the API, you just need to generate a token — copy the token to the clipboard (it will not be visible later), store it in a secure location, and then reuse it.

All tokens generated by different users can be managed by super admins with sufficient permissions to review the tokens.

API tokens managed globally by super admins

Easy-to-manage templates

In Zabbix, it is not that simple  to update templates. When you make some adjustments, for instance, create new items, new triggers, or add any other entity, and it is not  easy to make an update as you don’t quite understand the final impact of the changes.

In Zabbix 5.4, we introduced unique universal IDs for each template element, such as items, triggers, graphs, and so on, which help to perform template updates in a safe way.

Universal template IDs

In the example above, these universal IDs are contained in the templates in YAML format to monitor Memcached elements. The IDs are unique, and they are used to match an item, a trigger, etc. These IDs serve as the uniqueness criteria. Zabbix can easily understand what item we are trying to update, which items no longer exists, whether it is a new item or we are making an adjustment to an existing item.

The IDs also simplify working with templates. Now you can actually keep all your templates in a Git repository, for instance, in JSON or YAML format, and then push them to Zabbix by CI/CD pipeline using Zabbix API. So, as soon as you have made some adjustments in your template, your CI/CD system takes the latest version of the template, and using Zabbix API applies it in Zabbix. Such an approach really helps to look at your infrastructure as a code.

Better import

When importing a new template or a new version of the template, Zabbix now will show you any differences when comparing it to an existing template, as well as the changes that are going to be made in Zabbix.

Comparing existing and new templates

In this example, I have a new version of the Memcached template. I removed one tag – Application, and replaced it with two tags— Service (Memcached) and Class (Infrastructure). During import, I can see the difference between my existing template in Zabbix and the new template, so that I can easily review all of the changes. After pressing Import, these changes are applied to the configuration.

Scalability improvements

We have also introduced a few scalability improvements:

  • Zabbix Server and Proxy poller processes do not require database connection anymore.
  • In-memory cache for trend data, significant speedup for trend-related functions has been introduced.
  • Better parallel data processing on the Zabbix Server side for heavy loads. For instance, environments with 10,000 – 50 000 and more new values per second will benefit from the improved performance.
  • Graceful startup of Zabbix Server, which is very useful especially for instances with thousands or tens of thousands of proxies.

NOTE. When the server goes down, for instance, for maintenance or for an upgrade from one minor version to another, you have a down time in range of 30 seconds to a few minutes. When the Zabbix Server is back again, all proxies start pushing large amounts of data at the same time. So, it’s really important to maintain the stability and good health of the Zabbix Server at this moment. That’s why we have implemented a graceful startup for the Zabbix Server.

Universal global scripts

Another nice improvement simplifying the Zabbix setup — universal global scripts.

First, we introduced JavaScript Webhooks for global scripts for easy integration with third-party alerting and ticketing systems. Zabbix uses Webhooks for many different purposes — preprocessing, integration, data collection, etc. Now you can use it for global scripts.

Global scripts now can be used for everything — auto-remediation, alerting, integrations, and the manual execution on hosts from the Zabbix UI. Now, when you define a global script, you can also define that the script should be used, for instance, only for Action operations.

Universal global scripts’ parameters

In this example, this particular script will be based on a Webhook, it will accept parameters such as event ID, event severity, and the tags in the JSON format. This global script will open a ticket in ServiceNow. After the script has been defined you need to navigatte to Actions and define what operations should be executed. You can send a message to admins or just open a ticket in the service desk.

Defining Actions

It is a very simple, easy-to-use, and easy-to-understand feature simplifying configuration of actions.

Powerful value maps

In Zabbix 5.4, we have introduced some changes related to value maps. The value map is a simple way to convert, for instance, numeric values collected by Zabbix, into human-readable values.

A common example would be monitoring the state of a service, which returns the numbers zero or one, with zero meaning ‘Down‘ or one meaning ‘Up‘. In this case we can define a value map based on the exact value match. This is the current behavior.

In Zabbix 5.4, we extended it to support matching by ranges. So, you can define that if you receive the status between 0 and 127, you consider the service to be ‘Down‘ and if any other value — ‘Up‘, or vice versa.

We have also introduced matching by regular expression. Now, when you define value mapping, you just like a service state in my case you may specify if the value is in the range between 0 and 31 or 64 and 127, then it will be mapped to ‘Up‘ and any other value — ‘Down‘.

Value mapping by range

Value  maps for templates and hosts

We have introduced value on the template level and on the host level. This means that we do not support global value maps anymore.

Value maps were easy to maintain on the global level, but only for smaller installations. As soon as you have multiple templates and different teams working with a different set of templates, value maps become a nightmare to maintain on a global level. In addition, global value mapping hinders multi-tenancy support for any object that is linked to a value map.

Advantages of value maps on the template level:

  • Now we can deliver self-contained templates without any references to global objects. A template contains all information needed for monitoring: a set of items, set of triggers, graphs, template-level dashboards, and value maps. That means we don’t have any references to any global objects now, and templates have become truly independent. You can go to zabbix.com/integrations, download the template to monitor, for instance, for a Cisco device, and we can be absolutely sure that this template will work perfectly on the system as it isn’t linked to any global elements.
  • This new feature enables better support for multi-tenant environments.
  • We have also introduced new mass actions — mass-update operations for easier management of value maps on a template level.

You can Add, Update, Rename, Remove, or Remove all value maps.

Mass-update operations

Usability improvements

  • In Zabbix 5.4, we have introduced a number of usability improvements. The one visible straight away — the third-level menu for better navigation. The hidden features in Administration > GeneralAutoregistration, Housekeeping, Images, Icon mapping, Regular expressions, etc. were difficult to spot. The third level menu provides better visibility for these submenus.

Third-level menu

  • In Zabbix, there are some usability problems that sometimes you have to jump from one page to another, for instance, from a graph to Latest data, from Latest data to a host, etc. We have been thinking about improving usability to keep users focused on what they do. So, we have introduced modal windows for mass-update and import forms. It is a small, though important step in improving overall performance as when you select the list of hosts to do some mass-update operation, you are staying on this list and don’t have to go to a separate page.

Modal windows for mass-update and import forms

  • Another small, but useful feature — negated filtering for tags. Now, in Monitoring > Problems, you can, for instance, display all problems, which are not from your staging environment. You can define a condition, for instance, TagsEnv Does not equal Staging‘, and save it as a new view ‘Excluding staging‘.

Negated filtering by tags

Better support of XML preprocessing

Zabbix has been supporting XML XPath for four years. Now we decided to introduce another conversion — a native conversion from XML to JSON format.

Since most of the operations in Zabbix are JSON-based, it is a nice way to work with XML data — you convert your XML document to JSON as, for instance, the first preprocessing step, and then you work with this data as with any JSON document.

Better XML preprocessing

More improvements

There are also security-related changes and other changes related to real-time export.

  • Security-related:
    — Support of all SNMPv3 encryption protocols.
    — Unified error messages in case of unsuccessful login.
    — Disabled autocomplete for password fields in Zabbix UI.
  • Real-time export:
    — Information about event severity is included in real-time export files.
    — More granular configuration of information exported in real-time.
  • Also:
    — Support of VMWare cluster monitoring.
    — Support of filtering by the presence of LLD macro for low-level discovery.
    — Support of macro {ITEM.VALUETYPE} for notifications.
    — Support of service name lookup for Oracle for HA setups, so that Zabbix can switch from one node to another.
    — Support of NTLM authentication for JavaScript Webhooks.
    — Support of multiple JMX metrics having the same key on one host.
    — Increased size of memory available to JavaScript Webhooks and preprocessing.
    — CurlHttpRequest renamed to HttpRequest in Webhooks.
    — ‘Alias‘ renamed to ‘Username‘ in user configuration.
    — American English is a default language of Zabbix UI and also Zabbix documentation.

New integrations and templates

  • With any new Zabbix major version, we have new integrations and templates. Zabbix 5.4 is not an exception, and it comes with the new integrations with iTOP, VictorOps,  Rocket.Chat, Signal, Express.ms, and other solutions.
  • We have also introduced a new set of official templates for monitoring of APC UPS hardware, Hikvision cameras, etcd, Hadoop, Zookeeper, Kafka, AMQ, HashiCorp Vault, MS Sharepoint, MS Exchange, smartclt, Gitlab, Jenkins, Apache Ignite, and more applications and services.

New integrations and templates

To find out what Zabbix is capable of monitoring and what integrations are available with the third-party systems, you are welcome to visit zabbix.com/integrations, as the solution you want to monitor or a system you’d like to integrate Zabbix with might be already supported or exist as a community-supported solution.

Official solutions for monitoring and alerting

Recently, we introduced device-specific templates on our integration page, where you can see what devices are currently supported. For instance, we started with APC devices. Here, if you click the required device, you will go to the page with the template for the specific device.

Templates for hardware vendor devices

We are planning to increase the number of out-of-the-box supported vendor devices, and we’ll have a look at the Cisco, Juniper, F5, and some other vendors very soon.

NOTE. Zabbix is a free and open-source solution. We don’t have any closed source components. We are just as free as Linux and use exactly the same GPLv2 license. You can download Zabbix from zabbix.com/download and deploy it anywhere you want.

Deploy Zabbix on-premise

We support a range of the most popular operating systems. So, you may deploy Zabbix on MAC OS, Windows, Docker Containers, Docker environment, Cloud AWS, Azure, OpenStack, Digital Ocean, Google Cloud. Recently, we have introduced support of Linode.

Deploy in the cloud

To have your Zabbix instance running in a cloud, you don’t need to spend a lot. You can use Digital Ocean or Linode, and for about $5 per month, you may have Zabbix Server up and running with Zabbix UI, which will be capable of monitoring thousands of devices.

Breaking changes

  • Applications and screens are not supported and many related API methods are affected.
  • We don’t support global value maps anymore, so we have to switch mentally from supporting value maps globally to supporting value maps on a template level (preferred) or on a host level.
  • New syntax for trigger expressions and calculated metrics, which is easier to understand and use.
    —  Aggregate metrics are merged into calculated metrics.

Integration with Grafana

Following the release of Zabbix 5.4, our users quickly realized that our integration with Grafana was broken as we have introduced the aforementioned API changes. Since we do not support applications anymore, all API changes were documented just a few days before the official release of Zabbix 5.4.

Thanks to the swift reaction of Alexander Zobnin, maintainer of the Grafana plugin for Zabbix, this broken integration was fixed very quickly, and you now can easily and safely use Grafana with Zabbix 5.4.

Upgrade notes

  • Upgrade to Zabbix 5.4 from your existing 5.0, or 5.2, or 4.4, or 4.2 is easy as usual. You need to install new binaries for Zabbix Server and Zabbix Proxies, upgrade Zabbix UI, start the Zabbix Server, and the Zabbix Server will upgrade the structure of your database automatically.
  • All trigger expressions, calculated and aggregate metrics will be automatically converted to the new syntax.
  • All applications will be automatically converted to tags. For instance, “CPU” will be converted to the tag “Application:CPU”.
  • Global values maps will be moved to template and host level.
  • All screens will be automatically converted to Dashboards.

NOTE. The next LTS release, Zabbix 6.0, is expected by the end of 2021. Our development team is working on High Availability for the Zabbix Server, which will be available out of the box so that you could install the Zabbix Server and you run it in a high-availability cluster mode.

We also invest much time to improve the Business Service Monitoring (BSM) in Zabbix, which is related to the service tree and dependencies between services, business service monitoring SLA, and SLA reporting.

Zabbix roadmap

More information about the expected features is available at zabbix.com/roadmap. Here, you may click the version you’re interested in, for instance, 6.0, 6.2, 6.4, or 7.0 LTS. We are now keeping a dashboard of improvement, so you will immediately see any progress down the road of our 2.5-year plan.

Zabbix roadmap

Now, we maintain a dynamic dashboard for our roadmap displaying the progress of development of any feature. For instance, ‘Support of multi-tenancy for Services’ is marked with ‘In dev’, as it is under development and has been added to the roadmap recently.

In addition, this feature has a reference to the ZNXNEXT-59 issue, where more detailed information on the feature is available. ‘Top voted’ mark here means that the feature has been voted for by many Zabbix users and community members.

Questions & Answers

Question. Is migration from screens to dashboards automatic? How does that work?

Answer. All screens — global screens and template-level screens will be automatically converted to global dashboards and template-level dashboards. You don’t need to worry about that, all information will be preserved.

Question. Now you have dashboards delivered as reports. Do you plan to natively send these dashboards to some third-party integration, for instance, put it in some frame in a company’s  website?

Answer. If we have such plans, they should be on the roadmap. You can do it right now using some sort of fixed-frame technology, though it is an ugly way to integrate one solution with another. I am looking forward to adding the ability to include a widget or a dashboard into a third-party HTML page. I think it will be implemented in Zabbix sooner or later.

Question. Do you have any plans to have reports based on some other sections of Zabbix, such as inventory reports, or availability reports, and so on?

Answer. This functionality can evolve as a result of improving the visualization capabilities of Zabbix dashboards or more specifically by introducing additional widgets for different purposes, such as geographical maps, data table widgets, which are already planned for Zabbix 6.0, capacity planning, and widgets for all possible use cases. So, as soon as we have a rich set of widgets for dashboards, it will automatically put these values into PDF reports.

Some changes planned for Zabbix 6.0 are related to business service monitoring and as part of this development, we will also introduce new widgets made specifically for SLA reporting. So, we are developing in this direction — we are going to have more widgets, more widget flexibility, and maybe more visually appealing widgets in the future.

Question. Why did we implement the value maps in the way that we did? Why didn’t we implement it on three levels just as user macros global template host?

Answer. If we implemented it in three levels, the global value maps would still be there, and it would be very hard to manage value maps in this case. Suppose you have a global map defined, such as a service state. What should Zabbix do after you import a new template with the same value map service state, which is defined on a different level? Should it keep the value map service state on a template level or should it upgrade the global service state without creating the service state on the template level? This would introduce a new set of different problems and confusion in the end. So, we really need to keep templates independent to prevent those hidden dependencies on global objects such as value maps, especially for larger Zabbix deployments. Even one hundred templates in your setup might become a huge problem from the maintenance point of view.

Question. Why do we release so many intermediate versions? Why don’t we support versions 5.2 and 5.4 for two or three years?

Answer. The reason is very simple. We maintain backward and forward compatibility within one major release, and we guarantee that the database structure remains intact. So, if you install 5.4.0, everything within 5.4 (5.4.10, 5.4.2, 5.4.0) remains backward and forward compatible and the database structure remains the same. If we start introducing new features in minor releases, we will have to modify and extend the structure of the database, then minor versions of a major release will not be compatible anymore. I don’t think it is a good approach, and you will have no possibility to downgrade if a newer version of Zabbix doesn’t work the way you want. This approach is described in the document release cycle on our website and is has proved its feasibility over time.

We support 5.2 and 5.4 only for several months not to put additional load on our support team. We have two different types of releases — LTS releases with five-year support and non-LTS releases. If we started supporting everything, then at every given moment there will be 10 major Zabbix versions supported by our team. If a customer reports a problem to our support team, we have to fix this problem, for instance, create a patch, follow the QA procedure, test the solution very carefully, etc. Even Microsoft doesn’t maintain 10 major versions. So, we support just a few LTS releases at a time.

Question. Will you continue support of Red Hat 7 or CentOS 7 when Zabbix 6.0 is released? What about RHEL 7?

Answer. We dropped support of CentOS 7 for a good reason. We discovered that in the version of the software we need, some things, for instance, TLS encryption, are outdated in CentOS 7.0. There are also other unsupported dependencies, for example PHP and so on. In addition, we realized that in Zabbix 6.0 we will not support CentOS 7.0 anymore as Zabbix 6.0 will be supported for five years and we just won’t be able to support CentOS 7.0 for extra five years starting from the end of 2021. So, we could drop support of CentOS 7.0 starting from Zabbix 5.2 or keep it supported in Zabbix 5.2 or Zabbix 5.4 and drop it starting from Zabbix 6.0. We decided to drop support of CentOS 7.0 for Zabbix Server, Zabbix Proxy, and Zabbix UI immediately starting from Zabbix 5.2. In addition, we would have to rely on a third party repository to get the particular versions of software dependencies that Zabbix requires.

Now Open Third Availability Zone in the AWS China (Beijing) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-third-availability-zone-in-the-aws-china-beijing-region/

I made my first trip to China in late 2008. I was able to speak to developers and entrepreneurs and to get a sense of the then-nascent market for cloud computing. With over 900 million Internet users as of 2020 (according to a recent report from China Internet Network Information Center), China now has the largest user base in the world.

A limited preview of the China (Beijing) Region was launched in 2013 and brought to general availability in 2016. A year later the AWS China (Ningxia) Region launched. In order to comply with China’s legal and regulatory requirements, we collaborated with local Chinese partners. These partners have the proper telecom licenses to provide cloud services in Mainland China. Today, developers can deploy cloud-based applications inside of using the same APIs, protocols, services, and operational practices used by our customers in other parts of the world. This commonality has been particularly attractive to multinational companies that can take advantage of their existing AWS experience when they expand their cloud infrastructure into Mainland China.

Third Availability Zone in Beijing
Today I am happy to announce that we are adding a third Availability Zone (AZ) to the China (Beijing) Region operated by Sinnet in order to support the demands of our growing customer base in China. As is the case with all AWS Availability Zones, this one encompasses one or more discrete data centers in separate facilities, each with redundant power, networking, and connectivity. With this launch, both AWS Regions in China offer three AZs and allow customers to build applications that are scalable, fault-tolerant, and highly available.

AWS Customers in the Beijing Region
Many enterprise customers in China are using the China (Beijing) Region to support their digital transformations. For example:

Yantai Shinho was founded in 1992 and now manufactures 13 popular condiments. They now have a presence in over 100 countries and supply products that tens of millions of families enjoy. They are already using the region to support their front-end and big data efforts, and plan to make use of the additional architectural options made possible by the new AZ.

Kingdee International Software Group was founded in 1993 and now provides corporate management and cloud services for more than 6.8 million enterprises and government organizations. They now have over 8,000 employees AND are committed to changing the way that hundreds of millions of people work.

As I noted earlier, our multinational customers are using the AWS Regions in China to expand their global presence. Here are a few examples:

Australian independent software vendor Canva offers its design-on-demand application to 150 million active users in 190 countries. They launched their Chinese products in August 2018, and have since built in into a first-class design platform that includes tens of millions of high-resolution pictures, Chinese fonts, original templates, and more. Chinese users have already created over 50 million designs on the Canva.cn platform.

Swire is a 200 year old business group that spans the aviation, beverage, food, industrial, marine services, and property industries. Their Swire Coca-Cola division has the exclusive right to manufacture, market, and distribute Coca-Cola products in eleven Chinese provinces, the Shanghai Municipality, Hong Kong, Taiwan, and part of the Western United States — a total customer base of 728 million people. Swire Coca-Cola’s systems primarily operate in the China (Beijing) Region and will soon make use of the third AZ.

Finally, startups are using the region to power their fast-growing businesses:

CraiditX applies machine learning technology originally developed for search engines to the financial services industry. Established in 2015, they use behavioral language processing, natural language processing, neural networks, and integrated modeling to build risk management systems.

Founded in 2016, Momenta is a Chinese startup that is building a “brain” for autonomous vehicles. Powered by deep learning and data-driven path planning, they are working on autonomous driving for passenger vehicles and full autonomy for mobile service vehicles, all deployed in the China (Beijing) Region.

81 and 25
This launch raises the global AWS footprint to a total of 81 Availability Zones across 25 geographic regions, with plans to launch 18 additional Availability Zones and six more regions in Australia, India, Indonesia, Spain, Switzerland, and United Arab Emirates (UAE).

–Jeff, with lots of help from Lillian Shao;

PS – The operator and service provider for the AWS China (Beijing) Region is Beijing Sinnet Technology Co., Ltd. The operator and service provider for the AWS China (Ningxia) Region is Ningxia Western Cloud Data Technology Co., Ltd.

Introducing the newest AWS Heroes – June, 2021

Post Syndicated from Ross Barich original https://aws.amazon.com/blogs/aws/introducing-the-newest-aws-heroes-june-2021/

We at AWS continue to be impressed by the passion AWS enthusiasts have for knowledge sharing and supporting peer-to-peer learning in tech communities. A select few of the most influential and active community leaders in the world, who truly go above and beyond to create content and help others build better & faster on AWS, are recognized as AWS Heroes.

Today we are thrilled to introduce the newest AWS Heroes, including the first Heroes based in Perú and Ukraine:

Anahit Pogosova – Tampere, Finland

Data Hero Anahit Pogosova is a Lead Cloud Software Engineer at Solita. She has been architecting and building software solutions with various customers for over a decade. Anahit started working with monolithic on-prem software, but has since moved all the way to the cloud, nowadays focusing mostly on AWS Data and Serverless services. She has been particularly interested in the AWS Kinesis family and how it integrates with AWS Lambda. You can find Anahit speaking at various local and international events, such as AWS meetups, AWS Community Days, ServerlessDays, and Code Mesh. She also writes about AWS on Solita developers’ blog and has been a frequent guest on various podcasts.

Anurag Kale – Gothenburg, Sweden

Data Hero Anurag Kale is a Cloud Consultant at Cybercom Group. He has been using AWS professionally since 2017 and holds the AWS Solutions Architect – Associate certification. He is a co-organizer of the AWS User Group Pune; helping host and organize AWS Community Day Pune 2020 and AWS Community Day India 2020 – Virtual Edition. Anurag’s areas of interest include Amazon DynamoDB, relational databases, serverless data pipelines, data analytics, Infrastructure as Code, and sustainable cloud solutions. He is an active advocate of DynamoDB and Amazon Aurora, and has spoken at various national and international events such as AWS Community Day Nordics 2020 and various AWS Meetups.

Arseny Zinchenko – Kiev, Ukraine

Container Hero Arseny Zinchenko has over 15 years in IT, and currently works as a DevOps Team Lead and Data Security Officer at BetterMe Inc., a leading health & fitness mobile publisher. Since 2011 Arseny has used his blog to share expertise about DevOps, system administration, containerization, and cloud computing. Currently he is focused primary on Amazon Elastic Kubernetes Service (EKS) and security solutions provided by AWS. He is a member of the biggest Ukranian DevOps community, UrkOps, where he helps others to build their best with AWS and containers. where he helps others to build their best with AWS and containers. He also helps implement DevOps methodology in their organizations by using Amazon CloudFormation and AWS managed services like Amazon RDS, Amazon Aurora, and EKS.

Azmi Mengü – Istanbul, Turkey

Community Hero Azmi Mengü is a Sr. Software Engineer on the Infrastructure Team at Armut / HomeRun. He has over 5 years of AWS cloud development experience and has expertise in serverless, containers, data, and storage services. Since 2019, Azmi has been on the organizing committee of the Cloud and Serverless Turkey community. He co-organized and acted as a speaker at over 50 physical and online events during this time. He actively writes blog posts about developing serverless, container, and IaC technologies on AWS. Azmi also co-organized the first-ever ServerlessDays Istanbul in Turkey and AWS Community Day Turkey events.

Carlos Cortez – Lima, Perú

Community Hero Carlos Cortez is the founder and leader of AWS User Group Perú and Founder and CTO of CENNTI, which helps Peruvian companies in their difficult journey to the cloud and the development of Machine Learning solutions. The two biggest AWS events in Perú, AWS Community Day Lima 2019 and AWS UG Perú Conference in 2021, were organized by Carlos. He is the owner of the first AWS Podcast in Latam, “Imperio Cloud” and “Al día con AWS”. He loves to create content for emerging technologies, which is why he created DeepFridays to educate people in Reinforcement Learning.

Chris Miller – Santa Cruz, USA

Machine Learning Hero Chris Miller is an entrepreneur, inventor, and CEO of Cloud Brigade. After winning the 2019 AWS DeepRacer Summit race in Santa Clara, he founded the Santa Cruz DeepRacer Meetup group. Chris has worked with AWS AI/ML product teams with DeepLens and DeepRacer on projects including The Poopinator, and What’s in my Fridge. He prides himself on being a technical founder with experience across a broad range of disciplines, which has led to a lot of crazy projects in competitions and hackathons, such as an automated beer brewery, animatronic ventriloquist dummy, and his team even won a Cardboard Boat Race!

Gert Leenders – Brussels, Belgium

DevTools Hero Gert Leenders started his career as a developer in 2001. Eight years ago, his focus shifted entirely towards AWS. Today, he’s an AWS Cloud Solution Architect helping teams build and deploy cloud-native applications and manage their cloud infrastructure. On his blog, Gert emphasizes hidden gems in AWS developer tools and day-to-day topics for cloud engineers like logging, debugging, error handling and Infrastructure as Code. He also often shares code on GitHub.

Lei Wu – Beijing, China

Machine Learning Hero Lei Wu is head of the machine learning team at FreeWheel. He enjoys sharing technology with others, and he publishes many Chinese language tech blogs at infoQ, covering machine learning, big data, and distributed computing systems. Lei works hard to promote deep learning adoption with AWS services wherever he can, including talks at Spark Summit China, World Artificial Intelligence Conference, AWS Innovate AI/ML edition, and AWS re:Invent where he shared FreeWheel’s best practices on deep learning with Amazon SageMaker.

Hidetoshi Matsui – Hamamatsu, Japan

Serverless Hero Hidetoshi Matsui is a developer at Startup Technology Inc. and a member of the Japan AWS User Group (JAWS-UG). On “builders.flash,” a web magazine for developers run by AWS Japan, the articles he has contributed are among the most viewed pages on the site since 2020. His most impactful achievement is the construction of a distribution site for JAWS-UG’s largest event, JAWS DAYS 2021 re:Connect. He made full use of various AWS services to build a low-latency and scalable distribution system with a serverless architecture and smooth streaming video viewing for nearly 4000 participants.

Philipp Schmid – Nuremberg, Germany

Machine Learning Hero Philipp Schmid is a Machine Learning & Tech Lead at Hugging Face, working to democratize artificial intelligence through open source and open science. He has extensive experience in deep learning, deploying NLP models into production using AWS Lambda, and is an avid advocate of Amazon SageMaker to simplify machine learning, such as “Distributed Training: Train BART/T5 for Summarization using Transformers and Amazon SageMaker.” He loves to share his knowledge on AI and NLP at various meetups such as Data Science on AWS, and on his technical Blog.

Simone Merlini – Milan, Italy

Community Hero Simone Merlini is CEO and CTO at beSharp. In 2012 he co-founded the first AWS User Group in Italy, and he’s currently the organizer of the AWS User Group Milan. He’s also actively involved in the development of Leapp, an open-source project for managing and securing Cloud access in multi-account environments. Simone is also the editor in chief and writer for Proud2beCloud, a blog aimed to share highly specialized AWS knowledge to enable the adoption of cloud technologies.

Virginie Mathivet – Lyon, France

Machine Learning Hero Virginie Mathivet has been leading the DataSquad team at TeamWork since 2017, focused on Data and Artificial Intelligence. Their purpose is to make the most of their clients’ data, via Data Science or Data Engineering / Big Data, mainly on AWS. Virginie regularly participates in conferences and writes books and articles, both for the public (introduction to AI) and for an informed audience (technical subjects). She also campaigns for a better visibility of women in the digital industry and for diversity in the data professions. Her favorite cloud service? Amazon SageMaker of course!

Walid A. Shaari – Dhahran, Saudi Arabia

Container Hero Walid A. Shaari is the community lead for the Dammam Cloud-native AWS User Group, working closely with CNCF ambassadors, K8saraby, and AWS MENA community leaders to enable knowledge sharing, collaboration, and networking. He helped organize the first AWS Community Day – MENA 2020. Walid also maintains GitHub content for Certified Kubernetes Administrators (CKA) and Certified Kubernetes Security Specialists (CKS), and holds several active professional certifications: AWS Certified Associate Solutions Architect, Certified Kubernetes Administrator, Certified Kubernetes Application Developer, Red Hat Certified Architect level IV, and more.

 

 

 

 

If you’d like to learn more about the new Heroes, or connect with a Hero near you, please visit the AWS Hero website.

Ross;

Kinda a big announcement

Post Syndicated from Joel Spolsky original https://www.joelonsoftware.com/2021/06/02/kinda-a-big-announcement/

The other day I was talking to a young developer working on a code base with tons of COM code, and I told him that even before he was born, everyone knew that COM was already so deeply obsolete that it was impossible to find anyone who knew enough to work on it. And yet they still have this old COM code base, and they still have one old programmer holding onto their job by being the only human left on the planet with a brain big enough to manually manage multithreaded objects. I remember that COM was like Gödels Theorem: it seemed important, and you could understand it all long enough to pass an exam, but ultimately it is mostly just a demonstration of how far human intelligence can be made to stretch under extreme duress.

And, bubbeleh, if there is one thing we have learned, it’s that the things that make it easier on your brain are the things that matter.

Programming changes slowly. Really slowly.

Since I learned to code forty years ago, one thing that has mostly, mostly, changed about programming is that most developers no longer have to manage their own memory. Even getting that going took a long long time.

I took a few stupid years trying to be the CEO of a growing company during which I didn’t have time to code, and when I came back to web programming, after a break of about 10 years, I found Node, React, and other goodies, which are, don’t get me wrong, amazing? Really really great? But I also found that it took approximately the same amount of work to make a CRUD web app as it always has, and that there were some things (like handing a file upload, or centering) that were, shockingly, still just as randomly difficult as they were in VBScript twenty years ago.

Where are the flying cars?

The biggest problem is that developers of programming tools love to add things and hate to take things away. So things get harder and harder and more and more complex because there are more and more ways to do the same thing, each has pros and cons, and you are likely to spend as much time just figuring out which “rich text editor” to use as you are to implement it.

(Bill Gates, 1990: “How many f*cking programmers in this company are working on rich text editors?!”)

So, in this world of slow, gradual change and alleged improvement, one thing did change literally overnight, or, to be precise, on September 15, 2008, which was when Stack Overflow launched.

Six-to-eight weeks before that, Stack Overflow was only an idea. (*Actually Jeff started development in April). Six-to-eight weeks after that, it was a standard part of every developer’s toolkit: something they used every day. Something had changed about programming, and changed very fast: the way developers learned and got help and taught each other.

For many years, I was able to coast by telling delightful little stories about our incredible growth numbers, about the pay web site we made obsolete, and even about that one time when a Major Computer Book Publisher threatened to BURY US and launched their own Q&A platform, which turned out to be more of a Scoff Generator than a Q&A platform, but actually now almost anyone I talk to is too young to imagine The Days Before Stack Overflow, when the bookstore had an entire wall of Java and the way you picked a Rich Text Editor was going to Barnes and Noble and browsing through printed books for an hour, in the Rich Text Editor Component shelf.

Stack Overflow got to be pretty big as a business. The company grew faster than any individual’s skills at managing companies, especially mine, so a lot of the business team has changed over, and we now have a really world-class, experienced team that is doing much better than us founders. We’ve done incredible work building a recruiting platform for great developers, a “reach and relevance” platform for getting developers excited about your products, and, most importantly, Stack Overflow for Teams, which is growing so quickly that soon every developer in the world will be using the power of Stack Overflow to get help with their own code base, not just common languages and libraries.

And yeah, the one thing I made sure of was that everyone that came into the company understood exactly why Stack Overflow works, and what is important to the developers that it is by and for. So while we haven’t always been perfect, we have kept true to our mission, and the current leadership is just as committed to the vision of Stack Overflow as the founders are.

Today we’re pleased to announce that Stack Overflow is joining Prosus. Prosus is an investment and holding company, which means that the most important part of this announcement is that Stack Overflow will continue to operate independently, with the exact same team in place that has been operating it, according to the exact same plan and the exact same business practices. Don’t expect to see major changes or awkward “synergies”. The business of Stack Overflow will continue to focus on Reach and Relevance, and Stack Overflow for Teams. The entire company is staying in place: we just have different owners now.

This is, in some ways, the best possible outcome. Stack Overflow stays independent. The company has plenty of cash on hand to expand and deliver more features and fix the old broken ones. Right now, the biggest gating factor to how fast we can do this is just how fast we can hire excellent people.

I’ve been out of the day-to-day for a while now. Together with David Wilkinson, I’m helping to build HASH. HASH makes it easy to build powerful simulations and make better decisions. As we worked on that, we discovered that too much of the data that you might need to run simulations needs to be fixed up before you can use it. That’s because data is often published on the web, using page description languages that are more concerned with formatting and consumption by humans. They lack the structure to make the data they contain readily accessed programatically, so step one is miserable screen scraping and data cleanup. That’s where a lot of people give up.

We think we have an interesting way to fix this. If it works, we’ll change the web as quickly and completely as Stack Overflow changed programming. But it’s kind of ambitious and maybe a little too GRAND. If you are interested in joining that crazy journey, do reach out. The whole thing is going to be open source, so just hang on, and we’ll have something up on GitHub for you to play with.

See you soon!

Amazon Redshift ML Is Now Generally Available – Use SQL to Create Machine Learning Models and Make Predictions from Your Data

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/amazon-redshift-ml-is-now-generally-available-use-sql-to-create-machine-learning-models-and-make-predictions-from-your-data/

With Amazon Redshift, you can use SQL to query and combine exabytes of structured and semi-structured data across your data warehouse, operational databases, and data lake. Now that AQUA (Advanced Query Accelerator) is generally available, you can improve the performance of your queries by up to 10 times with no additional costs and no code changes. In fact, Amazon Redshift provides up to three times better price/performance than other cloud data warehouses.

But what if you want to go a step further and process this data to train machine learning (ML) models and use these models to generate insights from data in your warehouse? For example, to implement use cases such as forecasting revenue, predicting customer churn, and detecting anomalies? In the past, you would need to export the training data from Amazon Redshift to an Amazon Simple Storage Service (Amazon S3) bucket, and then configure and start a machine learning training process (for example, using Amazon SageMaker). This process required many different skills and usually more than one person to complete. Can we make it easier?

Today, Amazon Redshift ML is generally available to help you create, train, and deploy machine learning models directly from your Amazon Redshift cluster. To create a machine learning model, you use a simple SQL query to specify the data you want to use to train your model, and the output value you want to predict. For example, to create a model that predicts the success rate for your marketing activities, you define your inputs by selecting the columns (in one or more tables) that include customer profiles and results from previous marketing campaigns, and the output column you want to predict. In this example, the output column could be one that shows whether a customer has shown interest in a campaign.

After you run the SQL command to create the model, Redshift ML securely exports the specified data from Amazon Redshift to your S3 bucket and calls Amazon SageMaker Autopilot to prepare the data (pre-processing and feature engineering), select the appropriate pre-built algorithm, and apply the algorithm for model training. You can optionally specify the algorithm to use, for example XGBoost.

Architectural diagram.

Redshift ML handles all of the interactions between Amazon Redshift, S3, and SageMaker, including all the steps involved in training and compilation. When the model has been trained, Redshift ML uses Amazon SageMaker Neo to optimize the model for deployment and makes it available as a SQL function. You can use the SQL function to apply the machine learning model to your data in queries, reports, and dashboards.

Redshift ML now includes many new features that were not available during the preview, including Amazon Virtual Private Cloud (VPC) support. For example:

Architectural diagram.

  • You can also create SQL functions that use existing SageMaker endpoints to make predictions (remote inference). In this case, Redshift ML is batching calls to the endpoint to speed up processing.

Before looking into how to use these new capabilities in practice, let’s see the difference between Redshift ML and similar features in AWS databases and analytics services.

ML Feature Data Training
from SQL
Predictions
using SQL Functions
Amazon Redshift ML

Data warehouse

Federated relational databases

S3 data lake (with Redshift Spectrum)

Yes, using
Amazon SageMaker Autopilot
Yes, a model can be imported and executed inside the Amazon Redshift cluster, or invoked using a SageMaker endpoint.
Amazon Aurora ML Relational database
(compatible with MySQL or PostgreSQL)
No

Yes, using a SageMaker endpoint.

A native integration with Amazon Comprehend for sentiment analysis is also available.

Amazon Athena ML

S3 data lake

Other data sources can be used through Athena Federated Query.

No Yes, using a SageMaker endpoint.

Building a Machine Learning Model with Redshift ML
Let’s build a model that predicts if customers will accept or decline a marketing offer.

To manage the interactions with S3 and SageMaker, Redshift ML needs permissions to access those resources. I create an AWS Identity and Access Management (IAM) role as described in the documentation. I use RedshiftML for the role name. Note that the trust policy of the role allows both Amazon Redshift and SageMaker to assume the role to interact with other AWS services.

From the Amazon Redshift console, I create a cluster. In the cluster permissions, I associate the RedshiftML IAM role. When the cluster is available, I load the same dataset used in this super interesting blog post that my colleague Julien wrote when SageMaker Autopilot was announced.

The file I am using (bank-additional-full.csv) is in CSV format. Each line describes a direct marketing activity with a customer. The last column (y) describes the outcome of the activity (if the customer subscribed to a service that was marketed to them).

Here are the first few lines of the file. The first line contains the headers.

age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y 56,housemaid,married,basic.4y,no,no,no,telephone,may,mon,261,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
57,services,married,high.school,unknown,no,no,telephone,may,mon,149,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
37,services,married,high.school,no,yes,no,telephone,may,mon,226,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no
40,admin.,married,basic.6y,no,no,no,telephone,may,mon,151,1,999,0,nonexistent,1.1,93.994,-36.4,4.857,5191.0,no

I store the file in one of my S3 buckets. The S3 bucket is used to unload data and store SageMaker training artifacts.

Then, using the Amazon Redshift query editor in the console, I create a table to load the data.

CREATE TABLE direct_marketing (
	age DECIMAL NOT NULL, 
	job VARCHAR NOT NULL, 
	marital VARCHAR NOT NULL, 
	education VARCHAR NOT NULL, 
	credit_default VARCHAR NOT NULL, 
	housing VARCHAR NOT NULL, 
	loan VARCHAR NOT NULL, 
	contact VARCHAR NOT NULL, 
	month VARCHAR NOT NULL, 
	day_of_week VARCHAR NOT NULL, 
	duration DECIMAL NOT NULL, 
	campaign DECIMAL NOT NULL, 
	pdays DECIMAL NOT NULL, 
	previous DECIMAL NOT NULL, 
	poutcome VARCHAR NOT NULL, 
	emp_var_rate DECIMAL NOT NULL, 
	cons_price_idx DECIMAL NOT NULL, 
	cons_conf_idx DECIMAL NOT NULL, 
	euribor3m DECIMAL NOT NULL, 
	nr_employed DECIMAL NOT NULL, 
	y BOOLEAN NOT NULL
);

I load the data into the table using the COPY command. I can use the same IAM role I created earlier (RedshiftML) because I am using the same S3 bucket to import and export the data.

COPY direct_marketing 
FROM 's3://my-bucket/direct_marketing/bank-additional-full.csv' 
DELIMITER ',' IGNOREHEADER 1
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
REGION 'us-east-1';

Now, I create the model straight form the SQL interface using the new CREATE MODEL statement:

CREATE MODEL direct_marketing
FROM direct_marketing
TARGET y
FUNCTION predict_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
SETTINGS (
  S3_BUCKET 'my-bucket'
);

In this SQL command, I specify the parameters required to create the model:

  • FROM – I select all the rows in the direct_marketing table, but I can replace the name of the table with a nested query (see example below).
  • TARGET – This is the column that I want to predict (in this case, y).
  • FUNCTION – The name of the SQL function to make predictions.
  • IAM_ROLE – The IAM role assumed by Amazon Redshift and SageMaker to create, train, and deploy the model.
  • S3_BUCKET – The S3 bucket where the training data is temporarily stored, and where model artifacts are stored if you choose to retain a copy of them.

Here I am using a simple syntax for the CREATE MODEL statement. For more advanced users, other options are available, such as:

  • MODEL_TYPE – To use a specific model type for training, such as XGBoost or multilayer perceptron (MLP). If I don’t specify this parameter, SageMaker Autopilot selects the appropriate model class to use.
  • PROBLEM_TYPE – To define the type of problem to solve: regression, binary classification, or multiclass classification. If I don’t specify this parameter, the problem type is discovered during training, based on my data.
  • OBJECTIVE – The objective metric used to measure the quality of the model. This metric is optimized during training to provide the best estimate from data. If I don’t specify a metric, the default behavior is to use mean squared error (MSE) for regression, the F1 score for binary classification, and accuracy for multiclass classification. Other available options are F1Macro (to apply F1 scoring to multiclass classification) and area under the curve (AUC). More information on objective metrics is available in the SageMaker documentation.

Depending on the complexity of the model and the amount of data, it can take some time for the model to be available. I use the SHOW MODEL command to see when it is available:

SHOW MODEL direct_marketing

When I execute this command using the query editor in the console, I get the following output:

Console screenshot.

As expected, the model is currently in the TRAINING state.

When I created this model, I selected all the columns in the table as input parameters. I wonder what happens if I create a model that uses fewer input parameters? I am in the cloud and I am not slowed down by limited resources, so I create another model using a subset of the columns in the table:

CREATE MODEL simple_direct_marketing
FROM (
        SELECT age, job, marital, education, housing, contact, month, day_of_week, y
 	  FROM direct_marketing
)
TARGET y
FUNCTION predict_simple_direct_marketing
IAM_ROLE 'arn:aws:iam::123412341234:role/RedshiftML'
SETTINGS (
  S3_BUCKET 'my-bucket'
);

After some time, my first model is ready, and I get this output from SHOW MODEL. The actual output in the console is in multiple pages, I merged the results here to make it easier to follow:

Console screenshot.

From the output, I see that the model has been correctly recognized as BinaryClassification, and F1 has been selected as the objective. The F1 score is a metrics that considers both precision and recall. It returns a value between 1 (perfect precision and recall) and 0 (lowest possible score). The final score for the model (validation:f1) is 0.79. In this table I also find the name of the SQL function (predict_direct_marketing) that has been created for the model, its parameters and their types, and an estimation of the training costs.

When the second model is ready, I compare the F1 scores. The F1 score of the second model is lower (0.66) than the first one. However, with fewer parameters the SQL function is easier to apply to new data. As is often the case with machine learning, I have to find the right balance between complexity and usability.

Using Redshift ML to Make Predictions
Now that the two models are ready, I can make predictions using SQL functions. Using the first model, I check how many false positives (wrong positive predictions) and false negatives (wrong negative predictions) I get when applying the model on the same data used for training:

SELECT predict_direct_marketing, y, COUNT(*)
  FROM (SELECT predict_direct_marketing(
                   age, job, marital, education, credit_default, housing,
                   loan, contact, month, day_of_week, duration, campaign,
                   pdays, previous, poutcome, emp_var_rate, cons_price_idx,
                   cons_conf_idx, euribor3m, nr_employed), y
          FROM direct_marketing)
 GROUP BY predict_direct_marketing, y;

The result of the query shows that the model is better at predicting negative rather than positive outcomes. In fact, even if the number of true negatives is much bigger than true positives, there are much more false positives than false negatives. I added some comments in green and red to the following screenshot to clarify the meaning of the results.

Console screenshot.

Using the second model, I see how many customers might be interested in a marketing campaign. Ideally, I should run this query on new customer data, not the same data I used for training.

SELECT COUNT(*)
  FROM direct_marketing
 WHERE predict_simple_direct_marketing(
           age, job, marital, education, housing,
           contact, month, day_of_week) = true;

Wow, looking at the results, there are more than 7,000 prospects!

Console screenshot.

Availability and Pricing
Redshift ML is available today in the following AWS Regions: US East (Ohio), US East (N Virginia), US West (Oregon), US West (San Francisco), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (Paris), Europe (Stockholm), Asia Pacific (Hong Kong) Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and South America (São Paulo). For more information, see the AWS Regional Services list.

With Redshift ML, you pay only for what you use. When training a new model, you pay for the Amazon SageMaker Autopilot and S3 resources used by Redshift ML. When making predictions, there is no additional cost for models imported into your Amazon Redshift cluster, as in the example I used in this post.

Redshift ML also allows you to use existing Amazon SageMaker endpoints for inference. In that case, the usual SageMaker pricing for real-time inference applies. Here you can find a few tips on how to control your costs with Redshift ML.

To learn more, you can see this blog post from when Redshift ML was announced in preview and the documentation.

Start getting better insights from your data with Redshift ML.

Danilo

Getting Started with Amazon ECS Anywhere – Now Generally Available

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/getting-started-with-amazon-ecs-anywhere-now-generally-available/

Since Amazon Elastic Container Service (Amazon ECS) was launched in 2014, AWS has released other options for running Amazon ECS tasks outside of an AWS Region such as AWS Wavelength, an offering for mobile edge devices or AWS Outposts, a service that extends to customers’ environments using hardware owned and fully managed by AWS.

But some customers have applications that need to run on premises due to regulatory, latency, and data residency requirements or the desire to leverage existing infrastructure investments. In these cases, customers have to install, operate, and manage separate container orchestration software and need to use disparate tooling across their AWS and on-premises environments. Customers asked us for a way to manage their on-premises containers without this added complexity and cost.

Following Jeff’s preannouncement last year, I am happy to announce the general availability of Amazon ECS Anywhere, a new capability in Amazon ECS that enables customers to easily run and manage container-based applications on premises, including virtual machines (VMs), bare metal servers, and other customer-managed infrastructure.

With ECS Anywhere, you can run and manage containers on any customer-managed infrastructure using the same cloud-based, fully managed, and highly scalable container orchestration service you use in AWS today. You no longer need to prepare, run, update, or maintain your own container orchestrators on premises, making it easier to manage your hybrid environment and leverage the cloud for your infrastructure by installing simple agents.

ECS Anywhere provides consistent tooling and APIs for all container-based applications and the same Amazon ECS experience for cluster management, workload scheduling, and monitoring both in the cloud and on customer-managed infrastructure. You can now enjoy the benefits of reduced cost and complexity by running container workloads such as data processing at edge locations on your own hardware maintaining reduced latency, and in the cloud using a single, consistent container orchestrator.

Amazon ECS Anywhere – Getting Started
To get started with ECS Anywhere, register your on-premises servers or VMs (also referred to as External instances) in the ECS cluster. The AWS Systems Manager Agent, Amazon ECS container agent, and Docker must be installed on these external instances. Your external instances require an IAM role that permits them to communicate with AWS APIs. For more information, see Required IAM permissions in the ECS Developer Guide.

To create a cluster for ECS Anywhere, on the Create Cluster page in the ECS console, choose the Networking Only template. This option is for use with either AWS Fargate or external instance capacity. We recommend that you use the AWS Region that is geographically closest to the on-premises servers you want to register.

This creates an empty cluster to register external instances. On the ECS Instances tab, choose Register External Instances to get activation codes and an installation script.

On the Step 1: External instances activation details page, in Activation key duration (in days), enter the number of days the activation key should remain active. The activation key can be used for up to 1,000 activations. In Number of instances, enter the number of external instances you want to register to your cluster. In Instance role, enter the IAM role to associate with your external instances.

Choose Next step to get a registration command.

On the Step 2: Register external instances page, copy the registration command. Run this command on the external instances you want to register to your cluster.

Paste the registration command in your on-premise servers or VMs. Each external instance is then registered as an AWS Systems Manager managed instance, which is then registered to your Amazon ECS clusters.

Both x86_64 and ARM64 CPU architectures are supported. The following is a list of supported operating systems:

  • CentOS 7, CentOS 8
  • RHEL 7
  • Fedora 32, Fedora 33
  • openSUSE Tumbleweed
  • Ubuntu 18, Ubuntu 20
  • Debian 9, Debian 10
  • SUSE Enterprise Server 15

When the ECS agent has started and completed the registration, your external instance will appear on the ECS Instances tab.

You can also add your external instances to the existing cluster. In this case, you can see both Amazon EC2 instances and external instances are prefixed with mi-* together.

Now that the external instances are registered to your cluster, you are ready to create a task definition. Amazon ECS provides the requiresCompatibilities parameter to validate that the task definition is compatible with the the EXTERNAL launch type when creating your service or running your standalone task. The following is an example task definition:

{
	"requiresCompatibilities": [
		"EXTERNAL"
	],
	"containerDefinitions": [{
		"name": "nginx",
		"image": "public.ecr.aws/nginx/nginx:latest",
		"memory": 256,
		"cpu": 256,
		"essential": true,
		"portMappings": [{
			"containerPort": 80,
			"hostPort": 8080,
			"protocol": "tcp"
		}]
	}],
	"networkMode": "bridge",
	"family": "nginx"
}

You can create a task definition in the ECS console. In Task Definition, choose Create new task definition. For Launch type, choose EXTERNAL and then configure the task and container definitions to use external instances.

On the Tasks tab, choose Run new task. On the Run Task page, for Cluster, choose the cluster to run your task definition on. In Number of tasks, enter the number of copies of that task to run with the EXTERNAL launch type.

Or, on the Services tab, choose Create. Configure service lets you specify copies of your task definition to run and maintain in a cluster. To run your task in the registered external instance, for Launch type, choose EXTERNAL. When you choose this launch type, load balancers, tag propagation, and service discovery integration are not supported.

The tasks you run on your external instances must use the bridge, host, or none network modes. The awsvpc network mode isn’t supported. For more information about each network mode, see Choosing a network mode in the Amazon ECS Best Practices Guide.

Now you can run your tasks and associate a mix of EXTERNAL, FARGATE, and EC2 capacity provider types with the same ECS service and specify how you would like your tasks to be split across them.

Things to Know
Here are a couple of things to keep in mind:

Connectivity: In the event of loss of network connectivity between the ECS agent running on the on-premises servers and the ECS control plane in the AWS Region, existing ECS tasks will continue to run as usual. If tasks still have connectivity with other AWS services, they will continue to communicate with them for as long as the task role credentials are active. If a task launched as part of a service crashes or exits on its own, ECS will be unable to replace it until connectivity is restored.

Monitoring: With ECS Anywhere, you can get Amazon CloudWatch metrics for your clusters and services, use the CloudWatch Logs driver (awslogs) to get your containers’ logs, and access the ECS CloudWatch event stream to monitor your clusters’ events.

Networking: ECS external instances are optimized for running applications that generate outbound traffic or process data. If your application requires inbound traffic, such as a web service, you will need to employ a workaround to place these workloads behind a load balancer until the feature is supported natively. For more information, see Networking with ECS Anywhere.

Data Security: To help customers maintain data security, ECS Anywhere only sends back to the AWS Region metadata related to the state of the tasks or the state of the containers (whether they are running or not running, performance counters, and so on). This communication is authenticated and encrypted in transit through Transport Layer Security (TLS).

ECS Anywhere Partners
ECS Anywhere integrates with a variety of ECS Anywhere partners to help customers take advantage of ECS Anywhere and provide additional functionality for the feature. Here are some of the blog posts that our partners wrote to share their experiences and offerings. (I am updating this article with links as they are published.)

Now Available
Amazon ECS Anywhere is now available in all commercial regions except AWS China Regions where ECS is supported. With ECS Anywhere, there are no minimum fees or upfront commitments. You pay per instance hour for each managed ECS Anywhere task. ECS Anywhere free tier includes 2200 instance hours per month for six months per account for all regions. For more information, see the pricing page.

To learn more, see ECS Anywhere in the Amazon ECS Developer Guide. Please send feedback to the AWS forum for Amazon ECS or through your usual AWS Support contacts.

Get started with the Amazon ECS Anywhere today.

Channy

Update. Watch a cool demo of ECS Anywhere to operate a Raspberry Pi cluster at home office and read its deep-dive blog post.

Introducing Amazon Kinesis Data Analytics Studio – Quickly Interact with Streaming Data Using SQL, Python, or Scala

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-kinesis-data-analytics-studio-quickly-interact-with-streaming-data-using-sql-python-or-scala/

The best way to get timely insights and react quickly to new information you receive from your business and your applications is to analyze streaming data. This is data that must usually be processed sequentially and incrementally on a record-by-record basis or over sliding time windows, and can be used for a variety of analytics including correlations, aggregations, filtering, and sampling.

To make it easier to analyze streaming data, today we are pleased to introduce Amazon Kinesis Data Analytics Studio.

Now, from the Amazon Kinesis console you can select a Kinesis data stream and with a single click start a Kinesis Data Analytics Studio notebook powered by Apache Zeppelin and Apache Flink to interactively analyze data in the stream. Similarly, you can select a cluster in the Amazon Managed Streaming for Apache Kafka console to start a notebook to analyze data in Apache Kafka streams. You can also start a notebook from the Kinesis Data Analytics Studio console and connect to custom sources.

Architectural diagram.

In the notebook, you can interact with streaming data and get results in seconds using SQL queries and Python or Scala programs. When you are satisfied with your results, with a few clicks you can promote your code to a production stream processing application that runs reliably at scale with no additional development effort.

For new projects, we recommend that you use the new Kinesis Data Analytics Studio over Kinesis Data Analytics for SQL Applications. Kinesis Data Analytics Studio combines ease of use with advanced analytical capabilities, which makes it possible to build sophisticated stream processing applications in minutes. Let’s see how that works in practice.

Using Kinesis Data Analytics Studio to Analyze Streaming Data
I want to get a better understanding of the data sent by some sensors to a Kinesis data stream.

To simulate the workload, I use this random_data_generator.py Python script. You don’t need to know Python to use Kinesis Data Analytics Studio. In fact, I am going to use SQL in the following steps. Also, you can avoid any coding and use the Amazon Kinesis Data Generator user interface (UI) to send test data to Kinesis Data Streams or Kinesis Data Firehose. I am using a Python script to have finer control over the data that is being sent.

import datetime
import json
import random
import boto3

STREAM_NAME = "my-input-stream"


def get_random_data():
    current_temperature = round(10 + random.random() * 170, 2)
    if current_temperature > 160:
        status = "ERROR"
    elif current_temperature > 140 or random.randrange(1, 100) > 80:
        status = random.choice(["WARNING","ERROR"])
    else:
        status = "OK"
    return {
        'sensor_id': random.randrange(1, 100),
        'current_temperature': current_temperature,
        'status': status,
        'event_time': datetime.datetime.now().isoformat()
    }


def send_data(stream_name, kinesis_client):
    while True:
        data = get_random_data()
        partition_key = str(data["sensor_id"])
        print(data)
        kinesis_client.put_record(
            StreamName=stream_name,
            Data=json.dumps(data),
            PartitionKey=partition_key)


if __name__ == '__main__':
    kinesis_client = boto3.client('kinesis')
    send_data(STREAM_NAME, kinesis_client)

This script sends random records to my Kinesis data stream using JSON syntax. For example:

{'sensor_id': 77, 'current_temperature': 93.11, 'status': 'OK', 'event_time': '2021-05-19T11:20:00.978328'}
{'sensor_id': 47, 'current_temperature': 168.32, 'status': 'ERROR', 'event_time': '2021-05-19T11:20:01.110236'}
{'sensor_id': 9, 'current_temperature': 140.93, 'status': 'WARNING', 'event_time': '2021-05-19T11:20:01.243881'}
{'sensor_id': 27, 'current_temperature': 130.41, 'status': 'OK', 'event_time': '2021-05-19T11:20:01.371191'}

From the Kinesis console, I select a Kinesis data stream (my-input-stream) and choose Process data in real time from the Process drop-down. In this way, the stream is configured as a source for the notebook.

Console screenshot.

Then, in the following dialog box, I create an Apache Flink – Studio notebook.

I enter a name (my-notebook) and a description for the notebook. The AWS Identity and Access Management (IAM) permissions to read from the Kinesis data stream I selected earlier (my-input-stream) are automatically attached to the IAM role assumed by the notebook.

Console screenshot.

I choose Create to open the AWS Glue console and create an empty database. Back in the Kinesis Data Analytics Studio console, I refresh the list and select the new database. It will define the metadata for my sources and destinations. From here, I can also review the default Studio notebook settings. Then, I choose Create Studio notebook.

Console screenshot.

Now that the notebook has been created, I choose Run.

Console screenshot.

When the notebook is running, I choose Open in Apache Zeppelin to get access to the notebook and write code in SQL, Python, or Scala to interact with my streaming data and get insights in real time.

In the notebook, I create a new note and call it Sensors. Then, I create a sensor_data table describing the format of the data in the stream:

%flink.ssql

CREATE TABLE sensor_data (
    sensor_id INTEGER,
    current_temperature DOUBLE,
    status VARCHAR(6),
    event_time TIMESTAMP(3),
    WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
)
PARTITIONED BY (sensor_id)
WITH (
    'connector' = 'kinesis',
    'stream' = 'my-input-stream',
    'aws.region' = 'us-east-1',
    'scan.stream.initpos' = 'LATEST',
    'format' = 'json',
    'json.timestamp-format.standard' = 'ISO-8601'
)

The first line in the previous command tells to Apache Zeppelin to provide a stream SQL environment (%flink.ssql) for the Apache Flink interpreter. I can also interact with the streaming data using a batch SQL environment (%flink.bsql), or Python (%flink.pyflink) or Scala (%flink) code.

The first part of the CREATE TABLE statement is familiar to anyone who has used SQL with a database. A table is created to store the sensor data in the stream. The WATERMARK option is used to measure progress in the event time, as described in the Event Time and Watermarks section of the Apache Flink documentation.

The second part of the CREATE TABLE statement describes the connector used to receive data in the table (for example, kinesis or kafka), the name of the stream, the AWS Region, the overall data format of the stream (such as json or csv), and the syntax used for timestamps (in this case, ISO 8601). I can also choose the starting position to process the stream, I am using LATEST to read the most recent data first.

When the table is ready, I find it in the AWS Glue Data Catalog database I selected when I created the notebook:

Console screenshot.

Now I can run SQL queries on the sensor_data table and use sliding or tumbling windows to get a better understanding of what is happening with my sensors.

For an overview of the data in the stream, I start with a simple SELECT to get all the content of the sensor_data table:

%flink.ssql(type=update)

SELECT * FROM sensor_data;

This time the first line of the command has a parameter (type=update) so that the output of the SELECT, which is more than one row, is continuously updated when new data arrives.

On the terminal of my laptop, I start the random_data_generator.py script:

$ python3 random_data_generator.py

At first I see a table that contains the data as it comes. To get a better understanding, I select a bar graph view. Then, I group the results by status to see their average current_temperature, as shown here:

Notebook screenshot.

As expected by the way I am generating these results, I have different average temperatures depending on the status (OK, WARNING, or ERROR). The higher the temperature, the greater the probability that something is not working correctly with my sensors.

I can run the aggregated query explicitly using a SQL syntax. This time, I want the result computed on a sliding window of 1 minute with results updated every 10 seconds. To do so, I am using the HOP function in the GROUP BY section of the SELECT statement. To add the time to the output of the select, I use the HOP_ROWTIME function. For more information, see how group window aggregations work in the Apache Flink documentation.

%flink.ssql(type=update)

SELECT sensor_data.status,
       COUNT(*) AS num,
       AVG(sensor_data.current_temperature) AS avg_current_temperature,
       HOP_ROWTIME(event_time, INTERVAL '10' second, INTERVAL '1' minute) as hop_time
  FROM sensor_data
 GROUP BY HOP(event_time, INTERVAL '10' second, INTERVAL '1' minute), sensor_data.status;

This time, I look at the results in table format:

Notebook screenshot.

To send the result of the query to a destination stream, I create a table and connect the table to the stream. First, I need to give permissions to the notebook to write into the stream.

In the Kinesis Data Analytics Studio console, I select my-notebook. Then, in the Studio notebooks details section, I choose Edit IAM permissions. Here, I can configure the sources and destinations used by the notebook and the IAM role permissions are updated automatically.

Console screenshot.

In the Included destinations in IAM policy section, I choose the destination and select my-output-stream. I save changes and wait for the notebook to be updated. I am now ready to use the destination stream.

In the notebook, I create a sensor_state table connected to my-output-stream.

%flink.ssql

CREATE TABLE sensor_state (
    status VARCHAR(6),
    num INTEGER,
    avg_current_temperature DOUBLE,
    hop_time TIMESTAMP(3)
)
WITH (
'connector' = 'kinesis',
'stream' = 'my-output-stream',
'aws.region' = 'us-east-1',
'scan.stream.initpos' = 'LATEST',
'format' = 'json',
'json.timestamp-format.standard' = 'ISO-8601');

I now use this INSERT INTO statement to continuously insert the result of the select into the sensor_state table.

%flink.ssql(type=update)

INSERT INTO sensor_state
SELECT sensor_data.status,
    COUNT(*) AS num,
    AVG(sensor_data.current_temperature) AS avg_current_temperature,
    HOP_ROWTIME(event_time, INTERVAL '10' second, INTERVAL '1' minute) as hop_time
FROM sensor_data
GROUP BY HOP(event_time, INTERVAL '10' second, INTERVAL '1' minute), sensor_data.status;

The data is also sent to the destination Kinesis data stream (my-output-stream) so that it can be used by other applications. For example, the data in the destination stream can be used to update a real-time dashboard, or to monitor the behavior of my sensors after a software update.

I am satisfied with the result. I want to deploy this query and its output as a Kinesis Analytics application. To do so, I need to provide an S3 location to store the application executable.

In the configuration section of the console, I edit the Deploy as application configuration settings. There, I choose a destination bucket in the same region and save changes.

Console screenshot.

I wait for the notebook to be ready after the update. Then, I create a SensorsApp note in my notebook and copy the statements that I want to execute as part of the application. The tables have already been created, so I just copy the INSERT INTO statement above.

From the menu at the top right of my notebook, I choose Build SensorsApp and export to Amazon S3 and confirm the application name.

Notebook screenshot.

When the export is ready, I choose Deploy SensorsApp as Kinesis Analytics application in the same menu. After that, I fine-tune the configuration of the application. I set parallelism to 1 because I have only one shard in my input Kinesis data stream and not a lot of traffic. Then, I run the application, without having to write any code.

From the Kinesis Data Analytics applications console, I choose Open Apache Flink dashboard to get more information about the execution of my application.

Apache Flink console screenshot.

Availability and Pricing
You can use Amazon Kinesis Data Analytics Studio today in all AWS Regions where Kinesis Data Analytics is generally available. For more information, see the AWS Regional Services List.

In Kinesis Data Analytics Studio, we run the open-source versions of Apache Zeppelin and Apache Flink, and we contribute changes upstream. For example, we have contributed bug fixes for Apache Zeppelin, and we have contributed to AWS connectors for Apache Flink, such as those for Kinesis Data Streams and Kinesis Data Firehose. Also, we are working with the Apache Flink community to contribute availability improvements, including automatic classification of errors at runtime to understand whether errors are in user code or in application infrastructure.

With Kinesis Data Analytics Studio, you pay based on the average number of Kinesis Processing Units (KPU) per hour, including those used by your running notebooks. One KPU comprises 1 vCPU of compute, 4 GB of memory, and associated networking. You also pay for running application storage and durable application storage. For more information, see the Kinesis Data Analytics pricing page.

Start using Kinesis Data Analytics Studio today to get better insights from your streaming data.

Danilo

AWS Lambda Extensions Are Now Generally Available – Get Started with Your Favorite Operations Tools Today

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/getting-started-with-using-your-favorite-operational-tools-on-aws-lambda-extensions-are-now-generally-available/

In October 2020, we announced the preview of AWS Lambda extensions, which you can use to easily integrate Lambda functions with your favorite tools for monitoring, observability, security, and governance.

Today, I’m happy to announce the general availability of AWS Lambda Extensions which comes with new performance improvements and an expanded set of partners. As part of the GA release, we have enabled functions to send responses as soon as the function code is complete without waiting for the included extensions to finish. This enables extensions to perform activities like sending telemetry to a preferred destination after the function’s response has been returned. We also welcome extensions from new partners: Imperva, Instana, Sentry, Site24x7, and the AWS Distro for OpenTelemetry.

You can use Lambda extensions for use cases such as capturing diagnostic information before, during, and after function invocation; automatically instrumenting your code without needing code changes; fetching configuration settings or secrets before the function invocation; detecting and alerting on function activity through security agents; and sending telemetry to custom destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Kinesis, Amazon Elasticsearch Service directly and asynchronously from your Lambda functions.

Customers are drawn to the vision of Serverless. The reduced operational responsibility frees them up to focus on their business problems. To help customers monitor, observe, secure, and govern their functions, AWS Lambda provides native integrations for logs and metrics through Amazon CloudWatch, tracing through AWS X-Ray, tracking configuration changes through AWS Config, and recording API calls through AWS CloudTrail In addition, AWS Lambda partners provide tools for application management, API integration, deployment, monitoring, and security.

AWS Lambda extensions provide a simple way to extend the Lambda execution environment, which is where your function code is executed. AWS customers, partners, and the open source community can use the new Lambda Extensions API to build their own extensions, which are companion processes that augment the capabilities of Lambda functions. To learn how to build your own extensions, see the Building Extensions for AWS Lambda – In preview blog post. The post also includes information about changes to the Lambda lifecycle.

How AWS Lambda Extensions Works
AWS Lambda extensions are designed to be the easiest way to plug in the tools you use today without complex installation or configuration management. You can add tools to your functions using Lambda layers or include them in the image for functions deployed as container images.

Lambda extensions use the Extensions API to register for function and execution environment lifecycle events. In response to these events, extensions can start new processes or run logic. Lambda extensions can also use the Runtime Logs API to subscribe to a stream of the same logs that the Lambda service sends to Amazon CloudWatch directly from the Lambda execution environment. Lambda streams the logs to the extension, and the extension can then process, filter, and send the logs to any preferred destination.

Most customers will use Lambda extensions without needing to know about the capabilities of the Extensions API. You can just consume capabilities of an extension by configuring the options in your Lambda functions.

How to Use Lambda Extensions
You can install and manage extensions using the Lambda console, the AWS Command Line Interface (CLI), or infrastructure as code (IaC) services and tools such as AWS CloudFormation, AWS Serverless Application Model (AWS SAM), and Terraform.

To use Lambda extensions to integrate existing tools with your Lambda functions, choose your a Lambda function and on the Configuration tab, choose Monitoring and Operations tools.

On the Extensions page, you can find available extensions from AWS Lambda partners. Choose an extension to view its installation instructions.

AWS Lambda Extensions Partners
At this launch, Lambda extensions integrate with these AWS Lambda partners who have provided the following information to introduce their extensions. (I am updating this article with links as they are published.)

  • AppDynamics provides end-to-end transaction tracing for AWS Lambda. With the AppDynamics extension, it is no longer mandatory for developers to include the AppDynamics tracer as a dependency in their function code, making tracing transactions across hybrid architectures even simpler.
  • Coralogix is a log analytics and cloud security platform that empowers thousands of companies to improve security and accelerate software delivery, allowing you to get deep insights without paying for the noise. Coralogix can now read Lambda function logs and metrics directly, without using CloudWatch or Amazon S3, reducing the latency, and cost of observability.
  • The Datadog extension brings comprehensive, real-time visibility to your serverless applications. Combined with Datadog’s integration with AWS, you get metrics, traces, and logs to help you monitor, detect, and resolve issues at any scale. The Datadog extension makes it easier than ever to get telemetry from your serverless workloads.
  • The Dynatrace extension makes it even easier to bring AWS Lambda metrics and traces into the Dynatrace platform for intelligent observability and automatic root cause detection. Get comprehensive, end-to-end observability with the flip of a switch and no code changes.
  • Epsagon helps you monitor, troubleshoot, and lower the cost of your Lambda functions. Epsagon’s extension reduces the overhead of sending traces to the Epsagon service, with minimal performance impact to your function.
  • HashiCorp Vault allows you to secure, store, and tightly control access to your application’s secrets and sensitive data. With the Vault extension, you can now authenticate and securely retrieve dynamic secrets before your Lambda function is invoked.
  • Honeycomb is a powerful observability tool that helps you debug your entire production app stack. Honeycomb’s extension decreases the overhead, latency, and cost of sending events to the Honeycomb service, while increasing reliability.
  • Instana Enterprise Observability Platform ingests performance metrics, traces requests, and profiles processes to make observability work for the enterprise. The Instana Lambda extension offers modification-free, low latency tracing of Lambda functions backed by their real-time Enterprise Observability Platform.
  • Imperva Serverless Protection protects organizations from vulnerabilities created by misconfigured apps and code-level security risks in serverless computing environments. The Imperva extension enables customers to easily embed additional security in their DevOps processes for serverless applications without requiring any code changes, leading to faster time to market.
  • Lumigo provides a monitoring and observability platform for serverless and microservices applications. The Lumigo extension enables the new Lumigo Lambda Profiler to see a breakdown of function resources, including CPU, memory, and network metrics. Use the extension to receive actionable insights to reduce Lambda runtime duration and cost, fix bottlenecks, and increase efficiency.
  • Check Point CloudGuard provides full lifecycle security for serverless applications. The CloudGuard extension enables Function Self Protection data aggregation as an out-of-process extension, providing detection and alerting on application layer attacks.
  • New Relic enables you to efficiently monitor, troubleshoot, and optimize your Lambda functions. New Relic’s extension allows you send your Lambda service platform logs directly to New Relic’s unified observability platform, allowing you to quickly visualize data with minimal latency and cost.
  • Thundra provides an application debugging, observability and security platform for serverless, container and virtual machine (VM) workloads. The Thundra extension adds asynchronous telemetry reporting functionality to the Thundra agents, getting rid of network latency.
  • Splunk offers an enterprise-grade cloud monitoring solution for real-time full-stack visibility at scale. The Splunk extension provides a simplified runtime-independent interface to collect high-resolution observability data with minimal overhead. Monitor, manage, and optimize the performance and cost of your serverless applications with Splunk Observability solutions.
  • Sentry’s extension enables developers to monitor code health. From error tracking to performance monitoring, developers can see issues more clearly, solve them quicker, and continuously stay informed about the health of their applications, all without making code changes.
  • Site24x7 provides a performance monitoring solution for DevOps and IT operations. The Site24x7 extension enables real-time observability into your Lambda functions. It enables you to monitor critical Lambda metrics and function executions logs and optimize execution time and performance.
  • The Sumo Logic extension enables you to get instant visibility into the health and performance of your mission-critical applications using AWS Lambda. With this extension and Sumo Logic’s continuous intelligence platform, you can now ensure that all your Lambda functions are running as expected by analyzing function, platform, and extension logs to quickly identify and remediate errors and exceptions.

Here are Lambda extensions from AWS services:

  • AWS AppConfig helps you manage, store, and safely deploy application configurations to your hosts at runtime. The AWS AppConfig extension integrates Lambda and AWS AppConfig seamlessly. Lambda functions have simple access to external configuration settings quickly and easily. Developers can now dynamically change their Lambda function’s configuration safely using robust validation features.
  • Amazon CodeGuru Profiler helps developers improve application performance and reduce costs by pinpointing an application’s most expensive line of code. It provides recommendations for improving code to save money. The Lambda integration removes the need to change any code or redeploy packages.
  • Amazon CloudWatch Lambda Insights enables you to efficiently monitor, troubleshoot, and optimize Lambda functions. The Lambda Insights extension simplifies the collection, visualization, and investigation of detailed compute performance metrics, errors, and logs. You can more easily isolate and correlate performance problems to optimize your Lambda environments.
  • AWS Distro for OpenTelemetry is a secure, production-ready, AWS-supported distribution of the OpenTelemetry project. The Lambda extension runs the OpenTelemetry collector and enables functions to send trace data to AWS monitoring services such as AWS X-Ray and to any destination such as Honeycomb and Lightstep that supports OpenTelemetry Protocol (OTLP) using the OTLP exporter.

To get started with Lambda extensions, use the links provided to install these extensions.

Things to Know
Here are a couple of things to keep in mind:

Pricing: Extensions share the same billing model as Lambda functions and you are charged for compute time used in all phases of the Lambda lifecycle. For function invocations, you pay for requests served and the compute time used to run your code and all extensions, together, in 1ms increments. To learn more about billing for extensions, visit the Lambda FAQs page.

Performance: Lambda extensions might impact the performance of your function because they share resources such as CPU, memory, and storage with the function, and because extensions are initialized before function code. For example, if an extension performs compute-intensive operations, you might see your function’s execution duration increase because the extension and your function code share the same CPU resources.

Because Lambda uses allocates proportional CPU power based on the memory setting, you might see increased execution and initialization duration at lower memory settings as more processes compete for the same CPU resources. You can use CloudWatch metrics such as PostRuntimeExecutionDuration to measure the extra time the extension takes after the function execution and MaxMemoryUsed to measure the increase in memory used.

Available Now
The performance improvements announced as part of GA are currently in US East (N. Virginia), Europe (Ireland), and Europe (Milan) Regions. (Update. AWS Lambda Extensions are now generally available in all commercial regions.)

You can also build your own extensions. To learn how to build extensions, see the Lambda Extensions API in the AWS Lambda Developer Guide. You can send feedback through the AWS forum for AWS Lambda or through your usual AWS Support contacts.

Channy

Update. Watch a quick introductory video and a deep dive playlist about AWS Lambda Extensions for more information.

Mass Exploitation of Exchange Server Zero-Day CVEs: What You Need to Know

Post Syndicated from Caitlin Condon original https://blog.rapid7.com/2021/03/03/mass-exploitation-of-exchange-server-zero-day-cves-what-you-need-to-know/

Mass Exploitation of Exchange Server Zero-Day CVEs: What You Need to Know

On March 2, 2021, the Microsoft Threat Intelligence Center (MSTIC) released details on an active state-sponsored threat campaign exploiting four zero-day vulnerabilities in on-premises instances of Microsoft Exchange Server. MSTIC attributes this campaign to HAFNIUM, a group “assessed to be state-sponsored and operating out of China.”

Rapid7 detection and response teams have also observed increased threat activity against Microsoft Exchange Server since Feb. 27, 2021, and can confirm ongoing mass exploitation of vulnerable Exchange instances. Microsoft Exchange customers should apply the latest updates on an emergency basis and take immediate steps to harden their Exchange instances. We strongly recommend that organizations monitor closely for suspicious activity and indicators of compromise (IOCs) stemming from this campaign. Rapid7 has a comprehensive list of IOCs available here.

The actively exploited zero-day vulnerabilities disclosed in the MSTIC announcement as part of the HAFNIUM-attributed threat campaign are:

  • CVE-2021-26855 is a server-side request forgery (SSRF) vulnerability in Exchange that allows an attacker to send arbitrary HTTP requests and authenticate as the Exchange server.
  • CVE-2021-26857 is an insecure deserialization vulnerability in the Unified Messaging service. Insecure deserialization is where untrusted user-controllable data is deserialized by a program. Exploiting this vulnerability gives an attacker the ability to run code as SYSTEM on the Exchange server. This requires administrator permission or another vulnerability to exploit.
  • CVE-2021-26858 is a post-authentication arbitrary file write vulnerability in Exchange. If an attacker could authenticate with the Exchange server, they could use this vulnerability to write a file to any path on the server. They could authenticate by exploiting the CVE-2021-26855 SSRF vulnerability or by compromising a legitimate admin’s credentials.
  • CVE-2021-27065 is a post-authentication arbitrary file write vulnerability in Exchange. An attacker who can authenticate with the Exchange server can use this vulnerability to write a file to any path on the server. They could authenticate by exploiting the CVE-2021-26855 SSRF vulnerability or by compromising a legitimate admin’s credentials.

Also included in the out-of-band update were three additional remote code execution vulnerabilities in Microsoft Exchange. These additional vulnerabilities are not known to be part of the HAFNIUM-attributed threat campaign but should be remediated with the same urgency nonetheless:

Microsoft has released out-of-band patches for all seven vulnerabilities as of March 2, 2021. Security updates are available for the following specific versions of Exchange:

  • Exchange Server 2010 (for Service Pack 3—this is a Defense in Depth update)
  • Exchange Server 2013 (CU 23)
  • Exchange Server 2016 (CU 19, CU 18)
  • Exchange Server 2019 (CU 8, CU 7)

Exchange Online is not affected.

For Rapid7 customers

InsightVM and Nexpose customers can assess their exposure to these vulnerabilities with authenticated vulnerability checks. Customers will need to perform a console restart after consuming the content update in order to scan for these vulnerabilities.

InsightIDR will generate an alert if suspicious activity is detected in your environment. The Insight Agent must be installed on Exchange Servers to detect the attacker behaviors observed as part of this attack. If you have not already done so, install the Insight Agent on your Exchange Servers.

For individual vulnerability analysis, see AttackerKB.

Indiscriminate Exploitation of Microsoft Exchange Servers (CVE-2021-24085)

Post Syndicated from Andrew Christian original https://blog.rapid7.com/2021/03/02/indiscriminate-exploitation-of-microsoft-exchange-servers-cve-2021-24085/

Indiscriminate Exploitation of Microsoft Exchange Servers (CVE-2021-24085)

The following blog post was co-authored by Andrew Christian and Brendan Watters.

Beginning Feb. 27, 2021, Rapid7’s Managed Detection and Response (MDR) team has observed a notable increase in the automated exploitation of vulnerable Microsoft Exchange servers to upload a webshell granting attackers remote access. The suspected vulnerability being exploited is a cross-site request forgery (CSRF) vulnerability: The likeliest culprit is CVE-2021-24085, an Exchange Server spoofing vulnerability released as part of Microsoft’s February 2021 Patch Tuesday advisory, though other CVEs may also be at play (e.g., CVE-2021-26855, CVE-2021-26865, CVE-2021-26857).

The following China Chopper command was observed multiple times beginning Feb. 27 using the same DigitalOcean source IP (165.232.154.116):

cmd /c cd /d C:\inetpub\wwwroot\aspnet_client\system_web&net group "Exchange Organization administrators" administrator /del /domain&echo [S]&cd&echo [E]

Exchange or other systems administrators who see this command—or any other China Chopper command in the near future—should look for the following in IIS logs:

  • 165.232.154.116 (the source IP of the requests)
  • /ecp/y.js
  • /ecp/DDI/DDIService.svc/GetList

Indicators of compromise (IOCs) from the attacks we have observed are consistent with IOCs for publicly available exploit code targeting CVE-2021-24085 released by security researcher Steven Seeley last week, shortly before indiscriminate exploitation began. After initial exploitation, attackers drop an ASP eval webshell before (usually) executing procdump against lsass.exe in order to grab all the credentials from the box. It would also be possible to then clean some indicators of compromise from the affected machine[s]. We have included a section on CVE-2021-24085 exploitation at the end of this document.

Exchange servers are frequent, high-value attack targets whose patch rates often lag behind attacker capabilities. Rapid7 Labs has identified nearly 170,000 Exchange servers vulnerable to CVE-2021-24085 on the public internet:

Indiscriminate Exploitation of Microsoft Exchange Servers (CVE-2021-24085)

Rapid7 recommends that Exchange customers apply Microsoft’s February 2021 updates immediately. InsightVM and Nexpose customers can assess their exposure to CVE-2021-24085 and other February Patch Tuesday CVEs with vulnerability checks. InsightIDR provides existing coverage for this vulnerability via our out-of-the-box China Chopper Webshell Executing Commands detection, and will alert you about any suspicious activity. View this detection in the Attacker Tool section of the InsightIDR Detection Library.

CVE-2021-24085 exploit chain

As part of the PoC for CVE-2021-24085, the attacker will search for a specific token using a request to /ecp/DDI/DDIService.svc/GetList. If that request is successful, the PoC moves on to writing the desired token to the server’s filesystem with the request /ecp/DDI/DDIService.svc/SetObject. At that point, the token is available for downloading directly. The PoC uses a download request to /ecp/poc.png (though the name could be anything) and may be recorded in the IIS logs themselves attached to the IP of the initial attack.

Indicators of compromise would include the requests to both /ecp/DDI/DDIService.svc/GetList and /ecp/DDI/DDIService.svc/SetObject, especially if those requests were associated with an odd user agent string like python. Because the PoC utilizes aSetObject to write the token o the server’s filesystem in a world-readable location, it would be beneficial for incident responders to examine any files that were created around the time of the requests, as one of those files could be the access token and should be removed or placed in a secure location. It is also possible that responders could discover the file name in question by checking to see if the original attacker’s IP downloaded any files.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Multiple Unauthenticated Remote Code Control and Execution Vulnerabilities in Multiple Cisco Products

Post Syndicated from boB Rudis original https://blog.rapid7.com/2021/02/25/multiple-unauthenticated-remote-code-control-and-execution-vulnerabilities-in-multiple-cisco-products/

What’s up?

Multiple Unauthenticated Remote Code Control and Execution Vulnerabilities in Multiple Cisco Products

On Feb. 24, 2021, Cisco released many patches for multiple products, three of which require immediate attention by organizations if they are running affected systems and operating system/software configurations. They are detailed below:

Cisco ACI Multi-Site Orchestrator Application Services Engine Deployment Authentication Bypass Vulnerability (CVSSv3 Base 10; CVE-2021-1388)

Cisco Security Advisory

Cisco Multi-Site Orchestrator (MSO) is the product responsible for provisioning, health monitoring, and managing the full lifecycle of Cisco Application Centric Infrastructure (ACI) networking policies and tenant policies across all Cisco ACI sites organizations have deployed. It essentially has full control over every aspect of networking and network security. Furthermore, Cisco ACI can be integrated with and administratively control VMware vCenter Server, Microsoft System Center VMM [SCVMM], and OpenStack controller virtualization platform managers.

A weakness in an API endpoint of Cisco ACI MSO installed on the Application Services Engine could allow an unauthenticated, remote attacker to bypass authentication on an affected device. One or more API endpoints improperly validated API tokens and a successful exploit gives an unauthenticated, remote attacker full control over this powerful endpoint.

This vulnerability affects Cisco ACI Multi-Site Orchestrator (MSO) running a 3.0 release of software only when deployed on a Cisco Application Services Engine. Only version 3.0 (3m) is vulnerable.

Thankfully, this vulnerability was discovered internally, reducing the immediate likelihood of proof-of-concept exploits being available.

Organizations are encouraged to restrict API access to trusted, segmented networks and ensure this patch is applied within critical patch change windows.

Cisco Application Services Engine Unauthorized Access Vulnerabilities (CVSSv3 Base 9.8; CVE-2021-1393, CVE-2021-1396)

Cisco Security Advisory

CVE-2021-1393 allows unauthenticated, remote attackers access to a privileged service on affected devices. One service running on the ASE Data Network has insufficient access controls which can be exploited by attackers via specially crafted TCP requests. Successful exploits result in privileged device access enabling the running of containers and execution of any host-level commands.

CVE-2021-1396 allows unauthenticated, remote attackers access to a privileged service on affected devices. This, too, affects a service API with lax access controls on the Data Network. Successful exploitation results in significant information disclosure, creation of support-level artifacts on an isolated volume, and the ability to manipulate an undocumented subset of configuration settings.

The ASE Data Network provides the following services:

  • Cisco Application Services Engine Clustering
  • App to app communication
  • Access to the management network of the Cisco ACI fabric
  • All app-to-ACI fabric communications

The Data Network is not the same as the Management Network, thus segmentation is not an option for temporary mitigation.

These vulnerabilities affect Cisco ASE software released 1.1 (3d) and earlier.

Again, thankfully, this vulnerability was discovered internally, reducing the immediate likelihood of proof-of-concept exploits being available.

Organizations are encouraged to ensure this patch is applied within critical patch change windows.

Cisco NX-OS Software Unauthenticated Arbitrary File Actions Vulnerability (CVSSv3 Base 9.8; CVE-2021-1361)

Cisco Security Advisory

CVE-2021-1361 enables remote, unauthenticated attackers to create, modify, or delete arbitrary files with the privileges of the root user on Cisco Nexus 3000 and 9000 series switches in standalone NX-OS mode.

Cisco has provided more technical information on this critical vulnerability than they have for the previous two, disclosing that a service running on TCP port 9075 improperly listens and responds to external communication requests. Specially crafted TCP requests can result in sufficient permissions to perform a cadre of actions, including creating a local user account without administrators (or log collectors) knowing.

Organizations can use the following command line on standalone NX-OS Nexus 3000/9000’s to determine if this service is listening externally:

nexus# show sockets connection | include 9075
tcp LISTEN 0 32 * : 9075

Only Nexus 3000/9000 series switches in standalone NX-OS mode are affected.

Organizations are encouraged to restrict Cisco management and control plane access to trusted, segmented networks and use on-device access control lists (ACLs) to block external requests to TCP port 9075. Once mitigations are performed, organizations should ensure this patch is applied within critical patch change windows. However, please note that this vulnerability was discovered by an external, anonymous reporter, which likely means there is at least one individual/group outside of Cisco that knows how to exploit this weakness. Such information may affect patch prioritization decisions in some organizations.

Rapid7 will update this post as more information is provided or proof-of-concept exploits are discovered.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

VMware vCenter Server CVE-2021-21972 Remote Code Execution Vulnerability: What You Need to Know

Post Syndicated from boB Rudis original https://blog.rapid7.com/2021/02/24/vmware-vcenter-server-cve-2021-21972-remote-code-execution-vulnerability-what-you-need-to-know/

VMware vCenter Server CVE-2021-21972 Remote Code Execution Vulnerability: What You Need to Know

This blog post was co-authored by Bob Rudis and Caitlin Condon.

What’s up?

On Feb. 23, 2021, VMware published an advisory (VMSA-2021-0002) describing three weaknesses affecting VMware ESXi, VMware vCenter Server, and VMware Cloud Foundation.

Before digging into the individual vulnerabilities, it is vital that all organizations that use the HTML5 VMware vSphere Client, i.e., VMware vCenter Server (7.x before 7.0 U1c, 6.7 before 6.7 U3l and 6.5 before 6.5 U3n) and VMware Cloud Foundation (4.x before 4.2 and 3.x before 3.10.1.2) immediately restrict network access to those clients—especially if they are not segmented off on a management network—implement the mitigation noted below, and consider performing accelerated/immediate patching on those systems.

Vulnerability details and recommendations

CVE-2021-21972 is a critical (CVSSv3 base 9.8) unauthenticated remote code execution vulnerability in the HTML5 vSphere client. Any malicious actor with access to port 443 can exploit this weakness and execute commands with unrestricted privileges.

PT Swarm has provided a detailed walkthrough of this weakness and how to exploit it.

Rapid7 researchers have independently analyzed, tested, and confirmed the exploitability of this weakness and have provided a full technical analysis.

Proof-of-concept working exploits are beginning to appear on public code-sharing sites.

Organizations should restrict access to this plugin and patch affected systems immediately (i.e., not wait for standard patch change windows).

VMware has provided steps for a temporary mitigation, which involves disabling the plugin.

CVE-2021-21973 is an important (CVSSv3 base 8.8) heap-overflow-based remote code execution vulnerability in VMware ESXi OpenSLP. Attackers with same-segment network access to port 427 on affected systems may be able to use the heap-overflow weakness to perform remote code execution.

VMware has provided steps for a temporary mitigation, which involves disabling the SLP service on affected systems.

Rapid7 recommends applying the vendor-provided patches as soon as possible after performing the vendor-recommended mitigation.

CVE-2021-21974 is a moderate (CVSSv3 base 5.3) server-side request forgery vulnerability affecting the HTML5 vSphere Client. Attackers with access to port 443 of affected systems can use this weakness to gain access to underlying system information.

VMware has provided steps for a temporary mitigation, which involves disabling the plugin.

Since attackers will already be focusing on VMware systems due to the other high-severity weaknesses, Rapid7 recommends applying the vendor-provided patches as soon as possible after performing the vendor-recommended mitigation.

Attacker activity

Rapid7 Labs has not detected broad scanning for internet-facing VMware vCenter servers, but Bad Packets has reported that they’ve detected opportunistic scanning. We will continue to monitor Project Heisenberg for attacker activity and update this blog post as we have more information.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

New – Amazon Elastic Block Store Local Snapshots on AWS Outposts

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-elastic-block-store-local-snapshots-on-aws-outposts/

Today I am happy to announce that AWS Outposts customers can now make local snapshots of their Amazon Elastic Block Store (EBS) volumes, making it easy to meet data residency and local backup requirements. AWS Outposts is a fully managed service that extends AWS infrastructure, services, APIs, and tools to virtually any datacenter, co-location space, or on-premises facility for a truly consistent hybrid experience. Until now, Amazon EBS snapshots on Outposts were stored by default on Amazon Simple Storage Service (S3) in the AWS Region. If your Outpost is provisioned with Amazon S3 on Outposts, now you have the option to store your snapshots locally on your Outpost.

Customers use AWS Outposts to support applications that need to run on-premises due to low latency, local data processing, or data residency requirements. Customers looking to use AWS services in countries where no AWS Region exists today can opt to run their applications on Outposts. Sometimes data needs to remain in a particular country, state, or municipality for regulatory, contractual, or information security reasons. These customers need the data for snapshots and Amazon Machine Image (AMI) to be stored locally on Outposts to operate their applications. In addition, some of our customers could also see value for workloads that need low latency access to local backups.

EBS Local Snapshots on Outposts is a new capability that enables snapshots and AMI data to be stored locally on Amazon S3 on Outposts. Now you can create and manage EBS Local Snapshots on Outposts through the AWS Management Console, AWS Command Line Interface (CLI), and AWS SDKs. You can also continue to take snapshots of EBS volumes on Outposts, which are stored in S3 in the associated parent Region.

How to Get Started With EBS Local Snapshots on Outposts
To get started, visit the AWS Outposts Management Console to order an Outposts configuration that includes your selected EBS and Amazon S3 storage capacity (EBS snapshots use Amazon S3 on Outposts to store snapshots), or you can add S3 storage to your existing Outposts. EBS Local Snapshots are enabled on Outposts provisioned with Amazon S3 on Outposts.

To create a local EBS snapshot on Outposts, go to the EBS volume console and select the volume you want to create a snapshot from. Click the Actions button, then select Create Snapshot in the dropdown menu.

You can create a snapshot either in the AWS Region or your Outposts when you choose the Snapshot destination. The AWS Region snapshot uses Amazon S3 in the region and the AWS Outposts snapshot uses S3 storage on Outposts for storing the snapshots. Amazon S3 on Outposts is a new storage class, which is designed to durably and redundantly store data on Outposts. Note that due to its scale, Amazon S3 in a region offers higher durability than S3 on Outposts.

You can call CreateSnapshot with the outpost-arn parameter set to the Outposts ARN that uniquely identifies your installation. If data residency is not a concern, you can also get the CreateSnapshot API to create the snapshot in the parent AWS Region by specifying AWS Region as the destination.

$ aws ec2 create-snapshot \
     --volume-id vol-1234567890abcdef0 \
     --outpost-arn arn:aws:outposts:us-east-1:123456789012:outpost/op-1a2b3c \ 
	 --description "local snapshots in outpost"

You can also use commands for the AWS Command Line Interface (CLI) and AWS SDKs e.g. CreateSnapshots, DescribeSnapshot, CopySnapshot, and DeleteSnapshot to manage snapshots on Outposts, and use Amazon Data Lifecycle Manager to automate snapshots management on Outposts. All local snapshots on Outposts are Encrypted by Default (EBD).

You can set IAM policies for data residency of your snapshots. The policy example below will enforce data residency on the Outposts by denying CreateSnapshot(s) calls to create snapshots in the region from outpost volumes.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Deny",
         "Action":[
            "ec2:CreateSnapshot",
            "ec2:CreateSnapshots"
         ],
         "Resource":"arn:aws:ec2:us-west-2::snapshot/*",
         "Condition":{
            "StringEquals":{
               "ec2:SourceOutpostArn":"arn:aws:outposts:us-west-2:1234567890:outpost/op-1a2b3c"
            },
            "Null":{
               "ec2:OutpostArn":"true"
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "ec2:CreateSnapshot",
            "ec2:CreateSnapshots"
         ],
         "Resource":"*"
      }
   ]
}

You can audit your own data residency compliance by calling the DescribeSnapshots API that will return the snapshot’s storage location. All creation, update, and copy operations are logged in AWS CloudTrail audit logs.

You can copy AMI snapshots from the AWS Region to your Outposts and register them as AMI to launch your EC2 instances on Outposts.

Also, you can do this via simple AWS Command Line Interface (CLI) commands as follows:

$ aws ec2 copy-snapshot \
     --region us-west-2 \
     --source-region us-west-2 \
     --source-snapshot-id snap-1 \
     --destination-outpost-arn arn:aws:outposts:us-west-2:123456789012:outpost/op-1a2b3c \ 
	 --description "This is my copied snapshot."

Now you can register the snapshot as a local AMI for launching your EC2 instances on your Outposts.

$ aws ec2 register-image \
    --root-device-name /dev/sda1 \
    --block-device-mappings '[ \
       {"DeviceName": "/dev/sda1", "Ebs" :{"VolumeSize":100, "SnapshotId":"snap-1-copy"}}]'

You can also copy your regional AMIs to Outposts using the copy-image command. Specify the ID of the AMI to copy, the source Region, and the ARN of the destination Outpost.

$ aws ec2 copy-image \
       --source-region us-west-2 \
	   --source-image-id ami-1234567890abcdef0  \
	   --name "Local AMI copy"  \
	   --destination-outpost-arn arn:aws:outposts:us-west-2:123456789012:outpost/op-1a2b3c

Copying of local snapshots on Outposts to the parent AWS Region is not supported. In scenarios where data residency is required, you can only create local snapshots or copy snapshots from the parent Region. To ensure your data residency requirements are met on AWS Outposts, I recommend you refer to whitepapers such as AWS Policy Perspectives: Data Residency and Addressing Data Residency Requirements with AWS Outposts, and confirm and work closely with your compliance and security teams.

CloudEndure Migration and Disaster Recovery services, offered by AWS, allow customers to migrate or replicate workloads for recovery purposes into AWS from physical, virtual, or cloud-based sources. Up until now, if customers selected an Outposts device as a migration or recovery target, the snapshot data had to be copied to a public region before being copied back into the Outposts device. This led to increased cutover and recovery times, as well as other data transfer impacts.

With the newly launched availability of EBS Local Snapshots on Outposts, you can migrate, replicate and recover workloads from any sources directly into Outposts, or between Outposts devices, without requiring the EBS snapshot data to go through a public region, leading to lower latencies, greater performance, and reduced costs. Supported use cases related to Outposts for migration and disaster recovery include: from on-premises to Outposts, from public AWS Regions into Outposts, from Outposts into public AWS Regions, and between two Outposts devices. Learn more about CloudEndure Migration and CloudEndure Disaster Recovery.

Available Now
Amazon EBS Local Snapshots on AWS Outposts is available for all Outposts provisioned with S3 on Outposts. To learn more, take a look at the documentation. Please send feedback to the AWS Outposts team, your usual AWS support contacts, or Outposts partners.

Learn all the details about AWS Outposts and get started today.

Channy

Cisco Patches Recently Disclosed “sudo” Vulnerability (CVE-2021-3156) in Multiple Products

Post Syndicated from boB Rudis original https://blog.rapid7.com/2021/02/04/cisco-patches-recently-disclosed-sudo-vulnerability-cve-2021-3156-in-multiple-products/

Cisco Patches Recently Disclosed

While Punxsutawney Phil may have said we only have six more weeks of winter, the need to patch software and hardware weaknesses will, unfortunately, never end.

Cisco has released security updates to address vulnerabilities in most of their product portfolio, some of which may be exploited to gain full system/device control on certain devices, and one fixes the recently disclosed sudo input validation vulnerability. We discuss this vulnerability below, but there are many more lower-severity, or “valid administrator credentials-required” bugs on the Cisco Security Advisories page that all organizations who use Cisco products should review.

Getting back to RBAC

Cisco Patches Recently Disclosed

The “sudo” advisory is officially presented as “Sudo Privilege Escalation Vulnerability Affecting Cisco Products: January 2021” and affects pretty much every Cisco product that has a command line interface. It is a fix for the ubiquitous CVE-2021-3156 general sudo weakness.

According to the advisory, the vulnerability is due to “improper parsing of command line parameters that may result in a heap-based buffer overflow. An attacker could exploit this vulnerability by accessing a Unix shell on an affected device and then invoking the sudoedit command with crafted parameters or by executing a binary exploit.”

All commands invoked after exploiting this vulnerability will have root privileges.

This weakness will also enable lower-privileged users with access to Cisco devices to elevate their privileges, meaning you technically are out of compliance with any role-based access control requirement (which is in virtually every modern cybersecurity compliance framework).

Rapid7 strongly advises organizations to patch this weakness as soon as possible to stop attackers and curious users from taking control of your network, as well as ensuring you are able to continue checking ✅ this particular compliance box. Even though we mentioned it at the top of the post, don’t forget to check out the rest of the Cisco security advisories to see whether you need to address weaknesses in any of your other Cisco devices.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

SonicWall SNWLID-2021-0001 Zero-Day and SolarWinds’ 2021 CVE Trifecta: What You Need to Know

Post Syndicated from boB Rudis original https://blog.rapid7.com/2021/02/03/sonicwall-snwlid-2021-0001-zero-day-and-solarwinds-2021-cve-trifecta-what-you-need-to-know/

SonicWall SNWLID-2021-0001 Zero-Day and SolarWinds’ 2021 CVE Trifecta: What You Need to Know

Not content with the beating it laid down in January, 2021 continues to deliver with an unpatched zero-day exposure in some SonicWall appliances and three moderate-to-critical CVEs in SolarWinds software. We dig into the details below.

Urgent mitigations required for SonicWall SMA 100 Series appliances

On Jan. 22, 2021, SonicWall published an advisory and in-product notification that they had identified a coordinated attack on their internal systems by highly sophisticated threat actors exploiting probable zero-day vulnerabilities on certain SonicWall secure remote access products.

Specifically, they identified Secure Mobile Access (SMA) version 10.x running on the following physical SMA 100 appliances running firmware version 10x, as well as the SMA 500v virtual appliance:

  • SMA 200
  • SMA 210
  • SMA 400
  • SMA 410

On Jan. 31, 2021, NCC Group Research & Technology confirmed and demonstrated exploitability of a possible candidate for the vulnerability and detected indicators that attackers were exploiting this weakness.

On Feb. 3, 2021, SonicWall released a patch to firmware version SMA 10.2.0.5-29sv, which all impacted organizations should apply immediately.

SonicWall has recommended removing all SMA 100 Series appliances for SMA 500v virtual appliances from the internet until a patch is available. If this is not possible, organizations are strongly encouraged to perform the following steps:

  • Enable multi-factor authentication. SonicWall has indicated this is a “critical” step until the patch is available.
  • Reset user password for all SMA 100 appliances.
  • Configure the web application firewall on the SMA 100 series, which has been updated with rules to detect exploitation attempts (SonicWall indicates that this is normally a subscription-based software, but they have automatically provided 60-day complementary licenses to organizations affected by this vulnerability).

If it’s not possible to perform these steps, SonicWall recommends that organizations downgrade their SMA 100 Series appliances to firmware version 9.x. They do note that this will remove all settings and that the devices will need to be reconfigured from scratch.

Urgent patching required for SolarWinds Orion and Serv-U FTP products

On Feb. 3, 2021, Trustwave published a blog post providing details on two vulnerabilities in the SolarWinds Orion platform and a single vulnerability in the SolarWinds Serv-U FTP server for Windows.

The identified Orion platform weaknesses include:

  • CVE-2021-25274: Trustwave discovered that improper/malicious use of Microsoft Message Queue (MSMQ) could allow any remote, unprivileged attacker to execute arbitrary code in the highest privilege.
  • CVE-2021-25275: Trustwave discovered that credentials are stored insecurely, allowing any local user to take complete control over the SOLARWINDS_ORION database. This could lead to further information theft, and also enables attackers to add new admin-level users to all SolarWinds Orion platform products.

The identified SolarWinds Serv-U FTP server for Windows weakness enables any local user to create a file that can define a new Serv-U FTP admin account with full access to the C:\ drive, which will then give them access or replace any directory or file on the server.

Trustwave indicated they have private, proof-of-concept code that will be published on Feb. 9, 2021.

SolarWinds Orion Platform users can upgrade to version 2020.2.4. SolarWinds ServU-FTP users can upgrade to version 15.2.2 Hotfix 1.

Rapid7 vulnerability researchers have identified that after the Orion Platform patch is applied, there is a digital signature validation step performed on arrived messages so that messages having no signature or not signed with a per-installation certificate are not further processed. On the other hand, the MSMQ is still unauthenticated and allows anyone to send messages to it.

Rapid7 response

Rapid7 Labs is keeping a watchful eye on Project Heisenberg for indications of widespread inventory scans (attackers looking for potentially vulnerable systems) and will provide updates, as warranted, on any new developments.

Our InsightVM coverage team is currently evaluating options for detecting the presence of these vulnerabilities.

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.