Tag Archives: Kubernetes

Intelligent, automatic restarts for unhealthy Kafka consumers

Post Syndicated from Chris Shepherd original https://blog.cloudflare.com/intelligent-automatic-restarts-for-unhealthy-kafka-consumers/

Intelligent, automatic restarts for unhealthy Kafka consumers

Intelligent, automatic restarts for unhealthy Kafka consumers

At Cloudflare, we take steps to ensure we are resilient against failure at all levels of our infrastructure. This includes Kafka, which we use for critical workflows such as sending time-sensitive emails and alerts.

We learned a lot about keeping our applications that leverage Kafka healthy, so they can always be operational. Application health checks are notoriously hard to implement: What determines an application as healthy? How can we keep services operational at all times?

These can be implemented in many ways. We’ll talk about an approach that allows us to considerably reduce incidents with unhealthy applications while requiring less manual intervention.

Kafka at Cloudflare

Cloudflare is a big adopter of Kafka. We use Kafka as a way to decouple services due to its asynchronous nature and reliability. It allows different teams to work effectively without creating dependencies on one another. You can also read more about how other teams at Cloudflare use Kafka in this post.

Kafka is used to send and receive messages. Messages represent some kind of event like a credit card payment or details of a new user created in your platform. These messages can be represented in multiple ways: JSON, Protobuf, Avro and so on.

Kafka organises messages in topics. A topic is an ordered log of events in which each message is marked with a progressive offset. When an event is written by an external system, that is appended to the end of that topic. These events are not deleted from the topic by default (retention can be applied).

Intelligent, automatic restarts for unhealthy Kafka consumers

Topics are stored as log files on disk, which are finite in size. Partitions are a systematic way of breaking the one topic log file into many logs, each of which can be hosted on separate servers–enabling to scale topics.

Topics are managed by brokers–nodes in a Kafka cluster. These are responsible for writing new events to partitions, serving reads and replicating partitions among themselves.

Messages can be consumed by individual consumers or co-ordinated groups of consumers, known as consumer groups.

Consumers use a unique id (consumer id) that allows them to be identified by the broker as an application which is consuming from a specific topic.

Each topic can be read by an infinite number of different consumers, as long as they use a different id. Each consumer can replay the same messages as many times as they want.

When a consumer starts consuming from a topic, it will process all messages, starting from a selected offset, from each partition. With a consumer group, the partitions are divided amongst each consumer in the group. This division is determined by the consumer group leader. This leader will receive information about the other consumers in the group and will decide which consumers will receive messages from which partitions (partition strategy).

Intelligent, automatic restarts for unhealthy Kafka consumers

The offset of a consumer’s commit can demonstrate whether the consumer is working as expected. Committing a processed offset is the way a consumer and its consumer group report to the broker that they have processed a particular message.

Intelligent, automatic restarts for unhealthy Kafka consumers

A standard measurement of whether a consumer is processing fast enough is lag. We use this to measure how far behind the newest message we are. This tracks time elapsed between messages being written to and read from a topic. When a service is lagging behind, it means that the consumption is at a slower rate than new messages being produced.

Due to Cloudflare’s scale, message rates typically end up being very large and a lot of requests are time-sensitive so monitoring this is vital.

At Cloudflare, our applications using Kafka are deployed as microservices on Kubernetes.

Health checks for Kubernetes apps

Kubernetes uses probes to understand if a service is healthy and is ready to receive traffic or to run. When a liveness probe fails and the bounds for retrying are exceeded, Kubernetes restarts the services.

Intelligent, automatic restarts for unhealthy Kafka consumers

When a readiness probe fails and the bounds for retrying are exceeded, it stops sending HTTP traffic to the targeted pods. In the case of Kafka applications this is not relevant as they don’t run an http server. For this reason, we’ll cover only liveness checks.

A classic Kafka liveness check done on a consumer checks the status of the connection with the broker. It’s often best practice to keep these checks simple and perform some basic operations – in this case, something like listing topics. If, for any reason, this check fails consistently, for instance the broker returns a TLS error, Kubernetes terminates the service and starts a new pod of the same service, therefore forcing a new connection. Simple Kafka liveness checks do a good job of understanding when the connection with the broker is unhealthy.

Intelligent, automatic restarts for unhealthy Kafka consumers

Problems with Kafka health checks

Due to Cloudflare’s scale, a lot of our Kafka topics are divided into multiple partitions (in some cases this can be hundreds!) and in many cases the replica count of our consuming service doesn’t necessarily match the number of partitions on the Kafka topic. This can mean that in a lot of scenarios this simple approach to health checking is not quite enough!

Microservices that consume from Kafka topics are healthy if they are consuming and committing offsets at regular intervals when messages are being published to a topic. When such services are not committing offsets as expected, it means that the consumer is in a bad state, and it will start accumulating lag. An approach we often take is to manually terminate and restart the service in Kubernetes, this will cause a reconnection and rebalance.

Intelligent, automatic restarts for unhealthy Kafka consumers

When a consumer joins or leaves a consumer group, a rebalance is triggered and the consumer group leader must re-assign which consumers will read from which partitions.

When a rebalance happens, each consumer is notified to stop consuming. Some consumers might get their assigned partitions taken away and re-assigned to another consumer. We noticed when this happened within our library implementation; if the consumer doesn’t acknowledge this command, it will wait indefinitely for new messages to be consumed from a partition that it’s no longer assigned to, ultimately leading to a deadlock. Usually a manual restart of the faulty client-side app is needed to resume processing.

Intelligent health checks

As we were seeing consumers reporting as “healthy” but sitting idle, it occurred to us that maybe we were focusing on the wrong thing in our health checks. Just because the service is connected to the Kafka broker and can read from the topic, it does not mean the consumer is actively processing messages.

Therefore, we realised we should be focused on message ingestion, using the offset values to ensure that forward progress was being made.

The PagerDuty approach

PagerDuty wrote an excellent blog on this topic which we used as inspiration when coming up with our approach.

Their approach used the current (latest) offset and the committed offset values. The current offset signifies the last message that was sent to the topic, while the committed offset is the last message that was processed by the consumer.

Intelligent, automatic restarts for unhealthy Kafka consumers

Checking the consumer is moving forwards, by ensuring that the latest offset was changing (receiving new messages) and the committed offsets were changing as well (processing the new messages).

Therefore, the solution we came up with:

  • If we cannot read the current offset, fail liveness probe.
  • If we cannot read the committed offset, fail liveness probe.
  • If the committed offset == the current offset, pass liveness probe.
  • If the value for the committed offset has not changed since the last run of the health check, fail liveness probe.
Intelligent, automatic restarts for unhealthy Kafka consumers

To measure if the committed offset is changing, we need to store the value of the previous run, we do this using an in-memory map where partition number is the key. This means each instance of our service only has a view of the partitions it is currently consuming from and will run the health check for each.

Problems

When we first rolled out our smart health checks we started to notice cascading failures some time after release. After initial investigations we realised this was happening when a rebalance happens. It would initially affect one replica then quickly result in the others reporting as unhealthy.

What we observed was due to us storing the previous value of the committed offset in-memory, when a rebalance happens the service may get re-assigned a different partition. When this happened it meant our service was incorrectly assuming that the committed offset for that partition had not changed (as this specific replica was no longer updating the latest value), therefore it would start to report the service as unhealthy. The failing liveness probe would then cause it to restart which would in-turn trigger another rebalancing in Kafka causing other replicas to face the same issue.

Solution

To fix this issue we needed to ensure that each replica only kept track of the offsets for the partitions it was consuming from at that moment. Luckily, the Shopify Sarama library, which we use internally, has functionality to observe when a rebalancing happens. This meant we could use it to rebuild the in-memory map of offsets so that it would only include the relevant partition values.

This is handled by receiving the signal from the session context channel:

for {
  select {
  case message, ok := <-claim.Messages(): // <-- Message received

     // Store latest received offset in-memory
     offsetMap[message.Partition] = message.Offset


     // Handle message
     handleMessage(ctx, message)


     // Commit message offset
     session.MarkMessage(message, "")


  case <-session.Context().Done(): // <-- Rebalance happened

     // Remove rebalanced partition from in-memory map
     delete(offsetMap, claim.Partition())
  }
}

Verifying this solution was straightforward, we just needed to trigger a rebalance. To test this worked in all possible scenarios we spun up a single replica of a service consuming from multiple partitions, then proceeded to scale up the number of replicas until it matched the partition count, then scaled back down to a single replica. By doing this we verified that the health checks could safely handle new partitions being assigned as well as partitions being taken away.

Takeaways

Probes in Kubernetes are very easy to set up and can be a powerful tool to ensure your application is running as expected. Well implemented probes can often be the difference between engineers being called out to fix trivial issues (sometimes outside of working hours) and a service which is self-healing.

However, without proper thought, “dumb” health checks can also lead to a false sense of security that a service is running as expected even when it’s not. One thing we have learnt from this was to think more about the specific behaviour of the service and decide what being unhealthy means in each instance, instead of just ensuring that dependent services are connected.

Monitoring Kubernetes with Zabbix

Post Syndicated from Michaela DeForest original https://blog.zabbix.com/monitoring-kubernetes-with-zabbix/25055/

There are many options available for monitoring Kubernetes and cloud-native applications. In this multi-part blog series, we’ll explore how to use Zabbix to monitor a Kubernetes cluster and understand the metrics generated within Zabbix. We’ll also learn how to exploit Prometheus endpoints exposed by applications to monitor application-specific metrics.

Want to see Kubernetes monitoring in action? Watch the step-by-step Zabbix Kubernetes monitoring configuration and deployment guide.

Why Choose Zabbix to Monitor Kubernetes?

Before choosing Zabbix as a Kubernetes monitoring tool, we asked ourselves, “why would we choose to use Zabbix rather than Prometheus, Grafana, and alertmanager?” After all, they have become the standard monitoring tools in the cloud ecosystem. We decided that our minimum criteria for Zabbix would be that it was just as effective as Prometheus for monitoring both Kubernetes and cloud-native applications.

Through our discovery process, we concluded that Zabbix meets (and exceeds) this minimum requirement. Zabbix provides similar metrics and triggers as Prometheus, alert manager, and Grafana for Kubernetes, as they both use the same backend tools to do this. However, Zabbix can do this in one product while still maintaining flexibility and allowing you to monitor pretty much anything you can write code to collect. Regarding application monitoring, Zabbix can transform Prometheus metrics fed to it by Prometheus exporters and endpoints. In addition, because Zabbix can make calls to any HTTP endpoint, it can monitor applications that do not have a dedicated Prometheus endpoint, unlike Prometheus.

The Zabbix Helm Chart

Zabbix monitors Kubernetes by collecting metrics exposed via the Kubernetes API and kube-state-metrics. The components necessary to monitor a cluster are installed within the cluster using this helm chart provided by Zabbix. The helm chart includes the Zabbix agent installed as a daemon set and is used to monitor local resources and applications on each node. A Zabbix proxy is also installed to collect monitoring data and transfer it to the external Zabbix server.

Only the Zabbix proxy needs access to the Zabbix server, while the agents can send data to the proxy installed in the same namespace as each agent. A cluster role allows Zabbix to access resources in the cluster via the Kubernetes API. While the cluster role could be modified to restrict privileges given to Zabbix, this will result in some items becoming unsupported. We recommend keeping this the same if you want to get the most out of Kubernetes monitoring with Zabbix.

The Zabbix helm chart installs the kube-state-metrics project as a dependency. You may already be familiar with this project under the Kubernetes organization, which generates Prometheus format metrics based on the current state of the Kubernetes resources. In addition, if you have experience using Prometheus to monitor a cluster, you may already have this installed. If that is the case, you can point to this deployment rather than installing another one.

In this tutorial, we will install kube-state-metrics via the Zabbix helm chart.

For more information on skipping this step, refer to the values file in the Zabbix Kubernetes helm chart.

Installing the Zabbix Helm Chart

Now that we’ve explained how the Zabbix helm chart works, let’s go ahead and install it. In this example, we will assume that you have a running Zabbix 6.0 (or higher) instance that is reachable from the cluster you wish to monitor. I am running a 6.0 instance in a different cluster than the one we want to monitor. The server is reachable via the DNS name mdeforest.zabbix.atsgroup.io with a non-standard port of 31103.

We will start by installing the latest Zabbix helm chart. I recommend visiting zabbix.com/integrations/kubernetes to get any sources that may be referred to in this tutorial. There you will find a link to the Zabbix helm chart and templates. For the most part, we will follow the steps outlined in the readme.

 

Using a terminal window, I am going to make sure the active cluster is set to the cluster that I want to monitor:

kubectl config use-context <cluster context name>

I’m then going to add the Zabbix chart repo to my local helm repository:

helm repo add zabbix-chart-6.0 https://cdn.zabbix.com/zabbix/integrations/kubernetes-helm/6.0/

If you’re running Zabbix 6.2 or newer, change the references to 6.0 in this command to 6.2.

Depending on your circumstances, you will need to set a few values for the installation. In most cases, you only need to set a few environment variables for the Zabbix agent and the proxy. The complete list of values and environment variables is available in the helm chart repo, alongside the agent and proxy images on Docker Hub.

In this case, I’m setting the passive server environment variable for the agent to allow any IP to connect. For the proxy, I am setting the server host accessible from the proxy alongside the non-standard port. I’ve also set here some variables related to cache size. These variables may depend on your cluster size, so you may need to play around with them to find the correct values.

Now that I have the values file ready, I’m ready to install the chart. So, we’ll use the following command. Of course, the chart path might vary depending on what version of the chart you’re using.

helm install -f </path/to/values/file> [-n <namespace>] zabbix zabbix-chart-6.0/zabbix-helm-chart

You can also optionally add a namespace. You must wait until everything is running, so I’ll check just that with the following:

watch kubectl get pods

Now that everything is installed, we’re ready to set up hosts in Zabbix that will be associated with the cluster. The last step before we have all the information we need is to obtain the token created via the service account installed with the helm chart. We’ll get this by running the next command, which is the name of the service account that was created:

kubectl get secret -o jsonpath={.data.token} zabbix service-account | base64 -d

This will get the secret created for the service account and grab just the token from that, which is passed to the base64 utility to decode it. Be sure to copy that value somewhere because you’ll need it for later.

You’ll also need the Kubernetes API endpoint. In most cases, you’ll use the proxy installed rather than the server directly or a proxy outside the cluster. If this is the case, you can use the service DNS for the API. We should be able to reach it by pointing to https://kubernetes.default.svc.cluster.local:443/api.

If this is not the case, you can use the output from the command:

kubectl cluster-info

Now, let’s head over to the Zabbix UI. All the templates we need are shipped in Zabbix 6. If for some reason, you can’t find them, they are available for download and import by visiting the integrations page that I pointed out earlier on the Zabbix site.

Adding the Proxy

We will add our proxy by heading to Administration -> Proxies:

  1. Click Create Proxy. Because this is an active proxy by default, we only need to specify the proxy name. If you didn’t make any changes to the helm chart, this should default to zabbix-proxy. If you’d like to name this differently, you can change the environment variable zbx_hostname for the proxy in the helm chart. We’re going to leave it as the default for now. You’re going to enter this name and then click “Add.” After a few minutes, you’ll start to see that it says that the proxy has been seen.
  2. Create a Host Group to put hosts related to Kubernetes. For this example, let’s create one, which we’ll call Kubernetes.
  3. Head to the host page under configuration and click Create Host. The first host will collect metrics related to monitoring Kubernetes nodes, and we’ll discover nodes and create new hosts using Zabbix low-level discovery.
  4. Give this host the name Kubernetes Nodes. We’ll also assign this host to the Kubernetes host group we created and attach the template Kubernetes nodes by HTTP.
  5. Change the line “Monitored by proxy” to the proxy created earlier, called zabbix-proxy.
  6. Click the Macros tab and select “Inherited and host macros.” You should be able to see all the macros that may be set to influence what is monitored in your cluster. In this case, we need to change the first two macros. The first, {KUBE.API.ENDPOINT.URL}, should be set to the Kubernetes API endpoint. In our case, we can set it to what I mentioned earlier: default.svc.cluster.local:443/api. Next, the token should be set to the previously retrieved value from the command line.
  7. lick Add. After a few minutes, you should start seeing data on the latest data page and new hosts on the host page representing each node.

Creating an Additional Host

Now let’s create another host that will represent the metrics available via the Kubernetes API and the kube-state-metrics endpoint.

  1. Click Create Host again, name this host Kubernetes Cluster State, and add it to the Kubernetes group again.
  2. Let’s also attach the Kubernetes Cluster State template by HTTP. Again, we’re going to choose the proxy that we created earlier.
  3. In the Macro section, change the kube.api.url to the same thing we used before, but this time leave off the /api at the end. Simply: default.svc.cluster.local:443. Be sure to set the token as we did before.
  4. Assuming nothing else was changed in the installation of the helm chart, we can now add that host.

After a few minutes, you should receive metrics related to the cluster state, including hosts representing the kubelet on each node.

What’s Next?

Now you’re all set to start monitoring your Kubernetes cluster in Zabbix! Give it a try, and let us know your thoughts in the comments.

In the next blog post, we’ll look at what you can do with your newly monitored cluster and how to get the most out of it.

If you’d like help with any of this, ATS has advanced monitoring, orchestration, and automation skills to make this process a snap. Set up a 15-minute with our team to go through any questions you have.

About the Author

Michaela DeForest is a Platform Engineer for The ATS Group.  She is a Zabbix Certified Specialist on Zabbix 6.0 with additional areas of expertise, including Terraform, Amazon Web Services (AWS), Ansible, and Kubernetes, to name a few.  As ATS’s resident authority in DevOps, Michaela is critical in delivering cutting-edge solutions that help businesses improve efficiency, reduce errors, and achieve a faster ROI.

About ATS Group: The ATS Group provides a fully inclusive set of technology services and tools designed to innovate and transform IT.  Their systems integration, business resiliency, cloud enablement, infrastructure intelligence, and managed services help businesses of all sizes “get IT done.” With over 20 years in business, ATS has become the trusted advisor to nearly 500 customers across multiple industries.  They have built their reputation around honesty, integrity, and technical expertise unrivaled by the competition.

How to investigate and take action on security issues in Amazon EKS clusters with Amazon Detective – Part 2

Post Syndicated from Marshall Jones original https://aws.amazon.com/blogs/security/how-to-investigate-and-take-action-on-security-issues-in-amazon-eks-clusters-with-amazon-detective-part-2/

In part 1 of this of this two-part series, How to detect security issues in Amazon EKS cluster using Amazon GuardDuty, we walked through a real-world observed security issue in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster and saw how Amazon GuardDuty detected each phase by following MITRE ATT&CK tactics.

In this blog post, we’ll walk you through investigative techniques to use with Amazon Detective, paired with the GuardDuty EKS and malware findings from the security issue. After we have identified impacted resources through our investigation, we’ll provide example remediation tactics and preventative controls to address and help prevent security issues in EKS clusters.

Amazon Detective can help you investigate security issues and related resources in your account. Detective provides EKS coverage that you can enable within your accounts. When this coverage is enabled, Detective can help investigate and remediate potentially unauthorized EKS activity that results from misconfiguration of the control plane nodes or application. Although GuardDuty is not a prerequisite to enable Detective, it is recommended that you enable GuardDuty to enhance the visualization capabilities in Detective with GuardDuty findings.

Prerequisites

You must have the following services enabled in your AWS account to generate and investigate findings associated with EKS security events in a similar manner as outlined in this blog. If you do not have GuardDuty enabled, you can still investigate with Detective, but in a limited capacity.

Investigate with Amazon Detective

In the five phases we walked through in part 1, we discussed GuardDuty findings and MITRE ATT&CK tactics that can help you detect and understand each phase of the unauthorized activity, from the initial misconfiguration to the impact on our application when the EKS cluster is used for crypto mining.

The next recommended step is to investigate the EKS cluster and any associated resources. Amazon Detective can help you to investigate whether there was any other related unauthorized activity in the environment. We will walk through Detective capabilities for visualizing and gathering important information to effectively respond to the security issue. If you’re interested in creating detailed incident response playbooks for your security team to follow in your own environment, refer to these sample AWS incident response playbooks.

Depending on your scenario, there are various resources you can use to start your investigation, such as Security Hub findings, GuardDuty findings, related Kubernetes subjects, or an AWS account’s AWS CloudTrail activity. For our walkthrough, we’ll start our investigation from the GuardDuty finding and use the EKS cluster resource to pivot to the Detective console, as shown in Figure 7. Although we initially focus on the EKS cluster, you could start from any entities that are supported in the Detective behavior graph structure in the Amazon Detective User Guide. For example, we could start directly with the Kubernetes subject system:anonymous and find activity associated with the anonymous user.

Figure 7: Example Detective popup from GuardDuty finding for EKS cluster

Figure 7: Example Detective popup from GuardDuty finding for EKS cluster

We’ll now go over the information that you would need to gather from Detective in order to investigate the example security issue.

To investigate EKS cluster findings with Detective

  1. In the GuardDuty console, navigate to an individual finding and hover over Investigate with Detective. Choose one of the specific resources to start. In the image below, we selected the EKS cluster resource to investigate with Detective. You will need to gather some preliminary information about the IAM roles associated with the EKS cluster.
    • Questions: When was the cluster created? What IAM role created the cluster? What IAM role is assigned to the cluster?
    • Why it matters: If you are an incident responder, these details can potentially help you identify the owner of the cluster and help you determine what IAM principals are involved.
    • What next: Start looking into each IAM principal’s activity, as seen in CloudTrail, to investigate whether the IAM entity itself is potentially compromised or what other resources may have been impacted.
    Figure 8: Detective summary page for EKS cluster metadata details

    Figure 8: Detective summary page for EKS cluster metadata details

  2. Next, on the EKS cluster overview page, you can see the container details associated with the cluster.
    • Question: What are some of the other container details for the cluster? Does anything look out of the ordinary? Is it using a public image? Is it missing a network policy?
    • Why it matters: Based on the architecture related to this cluster, you might be able to use this information to determine whether there are unauthorized containers. The contents of unauthorized containers will depend on your organization but typically consist of public images or unauthorized RBAC, pod security policies, or network policy configurations. It’s important to keep in mind that when you look at data in Detective, the scope time is very important. When you pivot from a GuardDuty finding, the scope time will be set to the first time the GuardDuty finding was seen to the last time the finding was seen. The container details reflect the containers that were running during the selected scope time. Changing the scope time might change the containers that are listed in the table shown in Figure 9.
    • What next: Information found on this page can help to highlight unauthorized resources or configurations that will need to be remediated. You will also need to look at how these resources were initially created and if there are missing guardrails that should have been created during the provisioning of the cluster.
    Figure 9: Detective summary page for EKS container metadata details

    Figure 9: Detective summary page for EKS container metadata details

  3. Finally, you will see associated security findings with this specific EKS cluster, similar to Figure 10, at the bottom of the EKS cluster overview page in Detective.
    • Question: Are there any other security findings associated with this cluster that I previously was not aware of?
    • Why it matters: In our example scenario, we walked through the findings that were initially detected and the events that unfolded from those findings. After further investigation, you might see other findings that were not part of the original investigation. This can occur if your security team is only investigating specific findings or severity values. The finding for PrivilegeEscalation:Kubernetes/PrivilegedContainer informs you that a privileged container was launched on your Kubernetes cluster by using an image that has never before been used to launch privileged containers in your cluster. A privileged container has root level access to the host. The other finding, Persistence:Kubernetes/ContainerWithSensitiveMount, informs you that a container was launched with a configuration that included a sensitive host path with write access in the volumeMounts section. This makes the sensitive host path accessible and writable from inside the container. Any finding associated to the suspicious or compromised cluster is valuable because it provides additional insight into what the unauthorized entity was trying to accomplish after the initial detection.
    • What next: With Detective, you might want to continue your investigation by selecting each of these findings and reviewing all details related to the finding. Depending on the findings, you could bring in additional team members to help investigate further. For this example, we will move on to the next step.
    Figure 10: Example Detective summary of security findings associated with the EKS cluster

    Figure 10: Example Detective summary of security findings associated with the EKS cluster

  4. Shift from the EKS cluster overview section to the Kubernetes API activity section, similar to Figure 11 below. This will give you the opportunity to dig into the API activity associated with this cluster.
    1. Question: What other Kubernetes API activity was attempted from the cluster? Which API calls were successful? Which API calls failed? What was the unauthorized user trying to do?
    2. Why it matters: It’s important to determine which actions were successfully invoked by the unauthorized user so that appropriate remediation actions can be taken. You can look at trends of successful and failed API calls, and can even search by Subject, IP address, or Kubernetes API call.
    3. What next: You might want to look at all cluster role binding from days before the first GuardDuty finding was seen to determine if there was any other suspicious activity you should be investigating regarding the cluster.
    Figure 11: Example Detective summary page for Kubernetes API activity on the EKS cluster

    Figure 11: Example Detective summary page for Kubernetes API activity on the EKS cluster

  5. Next, you will want to look at the Newly observed Kubernetes API calls section, similar to Figure 12 below.
    • Question: What are some of the more recent Kubernetes API calls? What are they trying to access right now and are they successful? Do I need to start taking action for other resources outside of EKS?
    • Why it matters: This data shows Kubernetes subjects who were observed issuing API calls to this cluster for the first time during our scope time. Detective provides you this information by keeping a baseline of the activity associated with supported AWS resources. This can help you more quickly determine whether activity might be suspicious and worth looking into. In our example, we used the search functionality to look at API calls associated with the built-in Kubernetes secrets management. A common way to start your search is to see if an unauthorized user has successfully accessed any secrets, which can help you determine what information you might want to search in the overall API call volume section discussed in step 4.
    • What next: If the unauthorized user has successfully accessed any secret, those secrets should be marked as compromised, and they should be rotated immediately.
    Figure 12: Example Detective summary for newly observed Kubernetes API calls from the EKS cluster

    Figure 12: Example Detective summary for newly observed Kubernetes API calls from the EKS cluster

  6. You can also consider the following question when you look at the Newly observed Kubernetes API calls section.
    • Question: Has the IP address associated with the finding been communicating with any other resources in our environment, and if so, what are the details of that communication?
    • Why it matters: To answer this question, you can use Detective’s search functionality and the ability to use wild cards to search for IP addresses with the same first three octets. Also note that you can use CIDR notation to search, as well. Based on the results in the example in Figure 13, you can see that there are a number of related IP addresses associated with the environment. With this information, you now can look at the traffic associated with these different IPs and what resources they were communicating with.
    Figure 13: Example Detective results page from a query against IP addresses associated with the EKS cluster

    Figure 13: Example Detective results page from a query against IP addresses associated with the EKS cluster

  7. You can select one of the IP addresses in the search results to get more information related to it, similar to Figure 14 below.
    1. Question: What was the first time an IP address was observed in the environment? When was the last time it was observed?
    2. Why it matters: You can use this information to start isolating where unauthorized activity is coming from and what actions are being taken. You can also start creating a time series of unauthorized activity and scope.
    3. What next: You can repeat some of the previous investigation steps for each IP address, like looking at the different tabs to review New behavior, Resource interaction, and Kubernetes activity.
    Figure 14: Example Detective results page for specific IP address and associated metadata details

    Figure 14: Example Detective results page for specific IP address and associated metadata details

In summary, we began our investigation with a GuardDuty finding about an anonymous API request that was successful in using system:anonymous on one of our EKS clusters. We then used Detective to investigate and visualize activity associated with that EKS cluster, such as volume of successful or unsuccessful API requests, where and when those actions were attempted and other security findings associated with the resource. Once we have completed the investigation, we can confirm scope and impact of the security event and start moving towards taking action.

Remediation techniques for Amazon EKS

In this section, we will focus on how to remediate the security issue in our example. Your actions will vary based on your organization and the resources affected. It’s important to note that these actions will impact the EKS cluster and associated workloads, and should accordingly be performed by or coordinated with the cluster operator.

Before you take action on the EKS cluster, you will need to preserve forensic artifacts and evidence for the impacted EKS resources. The order of operations for these actions matters, because you want to get all the data from forensic artifacts in order to determine the overall impact to the resources affected. If you quarantine resources before you capture forensic artifacts, there is a risk that running processes will be interrupted or that the malware attempts to destroy resources that are valuable to a forensics investigation, to cover its tracks.

To preserve forensic evidence

  1. Enable termination protection on the impacted worker node and change the shutdown behavior to Stop.
  2. Label the offending pod or node with a label indicating that it is part of an active investigation.
  3. Cordon the worker node.
  4. Capture both volatile (temporary memory) and non-volatile (Amazon EBS snapshots) artifacts on the worker node.

Now that you have the forensic evidence, you can start to quarantine your EKS resources to restrict unauthorized network communication. The main objective is to prevent the affected EKS pods from communicating with internal resources or exfiltrating data externally.

To quarantine EKS resources

  1. Isolate the pod by creating a network policy that denies ingress and egress traffic to the pod.
  2. Attach a security group to the host and remove inbound and outbound rules. Take this action if you believe the underlying host has been compromised.

    Depending on existing inbound and outbound rules on the security group, the connections will either be tracked or untracked. Applying an isolation security group will drop untracked connections. For tracked connections, new connections with the host will not be allowed from the isolation security group, but existing tracked connections will not be interrupted.

    Important: This action will affect all containers running on the host.

  3. Attach a deny rule for the EKS resources in a network access control list (network ACL). Because network ACLs are stateless firewalls, all connections will be interrupted, whether they are tracked or untracked connections.

    Important: This action will affect all subnets using the network ACL and all resources within those subnets.

At this point, the affected EKS resources are quarantined, but the cluster is still configured to allow anonymous, unauthenticated access. You will need to remove all unauthorized permissions that were created or added.

To remove unauthorized permissions

  1. Update the RBAC configuration to remove system:anonymous access.
  2. Revoke temporary security credentials that are assigned to the pod or worker node, if necessary. You can also remove the IAM role associated with the EKS resources.

    Note: Removing IAM policies or attaching IAM policies to restrict permissions will affect the resources that are using the IAM role.

  3. Remove any unauthorized ClusterRoleBinding created by the system:anonymous user.
  4. Redeploy the compromised pod or workload resource.

The actions taken so far primarily target the EKS resource, but based on our Detective investigation, there are other actions you might need to take. Because secrets were involved that could be used outside of the EKS cluster, those secrets will need to be rotated wherever they are referenced. Detective will also suggest additional areas where you can investigate and remediate additional unauthorized activity in your AWS account.

It is important that your team go through game days or run-throughs for investigating and responding to different scenarios in order to make sure the team is prepared. You can run through the EKS security workshop to get your security team more familiar with remediation for EKS.

For more information about responding to EKS cluster related security issues, refer to GuardDuty EKS remediation in the GuardDuty User Guide and the EKS Best Practices Guide.

Preventative controls for EKS

This section covers several preventative controls that you can use to protect EKS clusters.

How can I prevent external access to the EKS cluster?

To help prevent external access to your EKS clusters, limit the exposure of your API server. You can achieve that in two ways:

  1. Set the API server endpoint access to Private. This will effectively forbid anyone outside of the VPC to send Kubernetes API requests to your EKS cluster.
  2. Set an IP address allow list for the EKS cluster public access endpoint.

How can I prevent giving admin access to the EKS cluster?

To help prevent an EKS cluster user from granting any type of access to anonymous or unauthenticated users, you can set up a ValidatingAdmissionWebhook. This is a special type of Kubernetes admission controller that can be configured in the Kubernetes API. (To learn how to build serverless admission webhooks, see the blog post Building serverless admission webhooks for Kubernetes with AWS SAM.)

The ValidatingAdmissionWebhook will deny a Kubernetes API request that matches all of the following checks:

  1. The request is creating or modifying a ClusterRoleBinding or RoleBinding.
  2. The subjects section contains either of the following:
    • The user system:anonymous
    • The group system:unauthenticated

How can I prevent malicious images from being deployed?

Now that you have set controls to prevent external access to the EKS cluster and prevent granting access to anonymous users, you can focus on preventing the deployment of potentially malicious images.

Malicious container images can have different origins, including:

  1. Images stored in public or unauthorized registries
  2. Images replacing the ones that are stored in authorized registries
  3. Authorized images that contain software with existing or newly discovered vulnerabilities

You can address these sources of malicious images by doing the following:

  1. Use admission controllers to verify that images meet your organization’s requirements, including for the image origin. You can also refer to this this blog post to implement a solution with a webhook and admission controllers.
  2. Enable tag immutability in your registry, a control that prevents an actor from maliciously replacing container images without changing the image’s tags. Additionally, you can enable an AWS Config rule to check tag immutability
  3. Configure another ValidatingAdmissionWebhook that will only accept images if they meet all of the following criteria.
    1. Images that come from approved registries.
    2. Images that pass the vulnerability scan during deployment time.
    3. Images that are signed by a trusted party. Amazon Elastic Container Registry (Amazon ECR) is working on a product enhancement to store image signatures. Currently, you can use an open-source cosign tool to verify and store image signatures.

      Note: These criteria can vary based on your use case and internal security and compliance standards.

The above controls will help prevent the deployment of a vulnerable, unauthorized, or potentially malicious container image.

How can I prevent lateral movement inside the cluster?

To prevent lateral movement inside the cluster, it is recommended to use network policies, as follows:

  • Enforce Kubernetes network policies to enforce ingress and egress controls within the cluster. You can implement these policies by following the steps in the Securing your cluster with network policies EKS workshop.

It’s important to note that you could use security groups for the same purpose, but pod security groups should only be used if the cluster is compromised and when you want to control the traffic between a pod and a resource that resides in the VPC, not inter-pod traffic.

In this section, we’ve reviewed different preventative controls that could have helped mitigate our example security incident. With the first preventative control, we could have prevented external actors from connecting to the API server. The second control could have prevented granting access to anonymous users. The third control could have prevented the deployment of an unauthorized or vulnerable container image. Finally, the fourth control could have helped limit the impact of the deployed vulnerable images to only the pods where the images were deployed, making it harder to laterally move to other pods in the cluster.

Conclusion

In this post, we walked you through how to investigate an EKS cluster related security issue with Amazon Detective. We also provided some recommended remediation and preventative controls to put in place for the EKS cluster specific security issues. When pairing GuardDuty’s ability for continuous threat detection and monitoring with Detective’s organization and visualization capabilities, you enable your security team to conduct faster and more effective investigation. By providing the security team the ability quickly view an organized set of data associated with security events within your AWS account, you reduce the overall Mean Time to Respond (MTTR).

Now that you understand the investigative capabilities with Detective, it’s time to try things out! It is important that you provide a mechanism for your security team to practice detection, investigation, and remediation techniques using security incident response simulations. By periodically running simulations, your security team will be prepared to quickly respond to possible security events. You can find more detailed incident response playbooks that can assist you in preparing for events in your environment, see these sample AWS incident response playbooks.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a thread on Amazon GuardDuty re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Marshall Jones

Marshall is a worldwide senior security specialist solutions architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he helps enterprise customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Jonathan Nguyen

Jonathan Nguyen

Jonathan is a shared delivery team senior security consultant at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Manuel Martinez Arizmendi

Manuel Martinez Arizmendi

Manuel works a Security Engineer at Amazon Detective providing new security investigation capabilities to AWS customers. Based on Boston,MA and originally from Madrid, Spain, when he’s not at work, he enjoys playing and watching soccer, playing videogames, and hanging out with his friends.

How to detect security issues in Amazon EKS clusters using Amazon GuardDuty – Part 1

Post Syndicated from Marshall Jones original https://aws.amazon.com/blogs/security/how-to-detect-security-issues-in-amazon-eks-clusters-using-amazon-guardduty-part-1/

In this two-part blog post, we’ll discuss how to detect and investigate security issues in an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Amazon GuardDuty and Amazon Detective.

Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that you can use to run and scale container workloads by using Kubernetes in the AWS Cloud, which can help increase the speed of deployment and portability of modern applications. Amazon EKS provides secure, managed Kubernetes clusters on the AWS control plane by default. Kubernetes configurations such as pod security policies, runtime security, and network policies and configurations are specific for your organization’s use-case and securing them adequately would be a customer’s responsibility within AWS’ shared responsibility model.

Amazon GuardDuty can help you continuously monitor and detect suspicious activity related to AWS resources in your account. GuardDuty for EKS protection is a feature that you can enable within your accounts. When this feature is enabled, GuardDuty can help detect potentially unauthorized EKS activity resulting from misconfiguration of the control plane nodes or application.

In this post, we’ll walk through the events leading up to a real-world security issue that occurred due to EKS cluster misconfiguration, discuss how those misconfigurations could be used by a malicious actor, and how Amazon GuardDuty monitors and identifies suspicious activity throughout the EKS security event. In part 2 of the post, we’ll cover Amazon Detective investigation capabilities, possible remediation techniques, and preventative controls for EKS cluster related security issues.

Prerequisites

You must have AWS GuardDuty enabled in your AWS account in order to monitor and generate findings associated with an EKS cluster related security issue in your environment.

EKS security issue walkthrough

Before jumping into the security issue, it is important to understand how the AWS shared responsibility model applies to the Amazon EKS managed service. AWS is responsible for the EKS managed Kubernetes control plane and the infrastructure to deliver EKS in a secure and reliable manner. You have the ability to configure EKS and how it interacts with other applications and services, where you are responsible for making sure that secure configurations are being used.

The following scenario is based on a real-world observed event, where a malicious actor used Kubernetes compromise tactics and techniques to expose and access an EKS cluster. We use this example to show how you can use AWS security services to identify and investigate each step of this security event. For a security event in your own environment, the order of operations and the investigative and remediation techniques used might be different. The scenario is broken down into the following phases and associated MITRE ATT&CK tactics:

  • Phase 1 – EKS cluster misconfiguration
  • Phase 2 (Discovery) – Discovery of vulnerable EKS clusters
  • Phase 3 (Initial Access) – Credential access to obtain Kubernetes secrets
  • Phase 4 (Persistence) – Impact to persist unauthorized access to the cluster
  • Phase 5 (Impact) – Impact to manipulate resources for unauthorized activity

Phase 1 – EKS cluster misconfiguration

By default, when you provision an EKS cluster, the API cluster endpoint is set to public, meaning that it can be accessed from the internet. Despite being accessible from the internet, the endpoint is still considered secure because it requires all API requests to be authenticated by AWS Identity and Access Management (IAM) and then authorized by Kubernetes role-based access control (RBAC). Also, the entity (user or role) that creates the EKS cluster is automatically granted system:masters permissions, which allows the entity to modify the EKS cluster’s RBAC configuration.

This example scenario starts with a developer who has access to administer EKS clusters in an AWS account. The developer wants to work from their home network and doesn’t want to connect to their enterprise VPN for IAM role federation. They configure an EKS cluster API without setting up the proper authentication and authorization components. Instead, the developer grants explicit access to the system:anonymous user in the cluster’s RBAC configuration. (Alternatively, an unauthorized RBAC configuration could be introduced into your environment after a developer unknowingly installs a malicious helm chart from the internet without reviewing or inspecting it first.)

In Kubernetes anonymous requests, unauthenticated and unrejected HTTP requests are treated as anonymous access and are identified as a system:anonymous user belonging to a system:unauthenticated group. This means that any entity on the internet can access the cluster and make API requests that are permitted by the role. There aren’t many legitimate use cases for this type of activity, because it’s considered a best practice to use RBAC instead. Anonymous requests are primarily used for setting up health endpoints and custom authentication.

By monitoring EKS audit logs, GuardDuty identifies this activity and generates the finding Policy:Kubernetes/AnonymousAccessGranted, as shown in Figure 1. This finding informs you that a user on your Kubernetes cluster successfully created a ClusterRoleBinding or RoleBinding to bind the user system:anonymous to a role. This action enables unauthenticated access to the API operations permitted by the role.

Figure 1: Example GuardDuty finding for Kubernetes anonymous access granted

Figure 1: Example GuardDuty finding for Kubernetes anonymous access granted

Phase 2 (Discovery) – Discovery of vulnerable EKS clusters

Port scanning is a method that malicious actors use to determine if resources are publicly exposed, with open ports and known vulnerabilities. As an increasing number of open-source tools allows users to search for endpoints connected to the internet, finding these endpoints has become even easier. Security teams can use these open-source tools to their advantage by proactively scanning for and identifying externally exposed resources in their organization.

This brings us to the discovery phase of our misconfigured EKS cluster. The discovery phase is defined by MITRE as follows: “Discovery consists of techniques an adversary may use to gain knowledge about the system and internal network. These techniques help adversaries observe the environment and orient themselves before deciding how to act.”

By granting system:anonymous access to the EKS cluster in our example, the developer allowed requests from any public unauthenticated source. This can result in external web crawlers probing the cluster API, which can often happen within seconds of the system:anonymous access being granted. GuardDuty identifies this activity and generates the finding Discovery:Kubernetes/SuccessfulAnonymousAccess, as shown in Figure 2. This finding informs you that an API operation to discover resources in a cluster was successfully invoked by the system:anonymous user. Remember, all API calls made by system:anonymous are unauthenticated, in addition to /healthz and /version calls that are always unauthenticated regardless of the user identity, and any entity can make use of this user within the EKS cluster.

In the screenshot, under the Action section in the finding details, you can see that the anonymous user made a get request to “/”. This is a generic request that is not specific to a Kubernetes cluster, which may indicate that the crawler is not specifically targeting Kubernetes clusters. You can further see that the Status code is 200, indicating that the request was successful. If this activity is malicious, then the actor is now aware that there is an exposed resource.

Figure 2: Example GuardDuty finding for Kubernetes successful anonymous access

Figure 2: Example GuardDuty finding for Kubernetes successful anonymous access

Phase 3 (Initial Access) – Credential access to obtain Kubernetes secrets

Next, in this phase, you might start observing more targeted API calls for establishing initial access from unauthorized users. MITRE defines initial access as “techniques that use various entry vectors to gain their initial foothold within a network. Techniques used to gain a foothold include targeted spearphishing and exploiting weaknesses on public-facing web servers. Footholds gained through initial access may allow for continued access, like valid accounts and use of external remote services, or may be limited-use due to changing passwords.”

In our example, the malicious actor has established initial access for the EKS cluster which is evident in the next GuardDuty finding, CredentialAccess:Kubernetes/SuccessfulAnonymousAccess, as shown in Figure 3. This finding informs you that an API call to access credentials or secrets was successfully invoked by the system:anonymous user. The observed API call is commonly associated with the credential access tactic where an adversary is attempting to collect passwords, usernames, and access keys for a Kubernetes cluster.

You can see that in this GuardDuty finding, in the Action section, the Request uri is targeted at a Kubernetes cluster, specifically /api/v1/namespaces/kube-system/secrets. This request seems to be targeting the secrets management capabilities that are built into Kubernetes. You can find more information about this secrets management capability in the Kubernetes documentation.

Figure 3: Example GuardDuty finding for Kubernetes successful credential access from anonymous user

Figure 3: Example GuardDuty finding for Kubernetes successful credential access from anonymous user

Phase 4 (Persistence) – Impact to persist unauthorized access to the cluster

The next phase of this scenario is likely to be an impact in the EKS cluster to enable persistence by the malicious actor. MITRE defines impact as “techniques that adversaries use to disrupt availability or compromise integrity by manipulating business and operational processes.” Following the MITRE definitions, “Persistence consists of techniques that adversaries use to keep access to systems across restarts, changed credentials, and other interruptions that could cut off their access. Techniques used for persistence include any access, action, or configuration changes that let them maintain their foothold on systems, such as replacing or hijacking legitimate code or adding startup code.”

In the GuardDuty finding Impact:Kubernetes/SuccessfulAnonymousAccess, shown in Figure 4, you can see the Kubernetes user details and Action sections that indicate that a successful Kubernetes API call was made to create a ClusterRoleBinding by the system:anonymous username. This finding informs you that a write API operation to tamper with resources was successfully invoked by the system:anonymous user. The observed API call is commonly associated with the impact stage of an attack, when an adversary is tampering with resources in your cluster. This activity shows that the system:anonymous user has now created their own role to enable persistent access the EKS cluster. If the user is malicious, they can now access the cluster even if access is removed in the RBAC configuration for the system:anonymous user.

Figure 4 Example GuardDuty finding for Kubernetes successful credential change by anonymous user

Figure 4 Example GuardDuty finding for Kubernetes successful credential change by anonymous user

Phase 5 (Impact) – Impact to manipulate resources for unauthorized activity

The fifth phase of this scenario is where the unauthorized user is likely to focus on impact techniques in order to use the access for malicious purpose. MITRE says of the impact phase: “Techniques used for impact can include destroying or tampering with data. In some cases, business processes can look fine, but may have been altered to benefit the adversaries’ goals. These techniques might be used by adversaries to follow through on their end goal or to provide cover for a confidentiality breach.” Typically, once a malicious actor has access into a system, they will introduce malware to the system to manipulate the compromised resource and possibly also other resources.

With the introduction of GuardDuty Malware Protection, when an Amazon Elastic Compute Cloud (Amazon EC2) or container-related GuardDuty finding that indicates potentially suspicious activity is generated, an agentless scan on the volumes will initiate and detect the presence of malware. Existing GuardDuty customers need to enable Malware Protection, and for new customers this feature is on by default when they enable GuardDuty for the first time. Malware Protection comes with a 30-day free trial for both existing and new GuardDuty customers. You can see a list of findings that initiates a malware scan in the GuardDuty User Guide.

In this example, the malicious actor now uses access to the cluster to perform unauthorized cryptocurrency mining. GuardDuty monitors the DNS requests from the EC2 instances used to host the EKS cluster. This allows GuardDuty to identify a DNS request made to a domain name associated with a cryptocurrency mining pool, and generate the finding CryptoCurrency:EC2/BitcoinTool.B!DNS, as shown in Figure 5.

Figure 5: Example GuardDuty finding for EC2 instance querying bitcoin domain name

Figure 5: Example GuardDuty finding for EC2 instance querying bitcoin domain name

Because this is an EC2 related GuardDuty finding and GuardDuty Malware Protection is enabled in the account, GuardDuty then conducts an agentless scan on the volumes of the EC2 instance to detect malware. If the scan results in a successful detection of one or more malicious files, another GuardDuty finding for Execution:EC2/MaliciousFile is generated, as shown in Figure 6.

Figure 6: Example GuardDuty finding for detection of a malicious file on EC2

Figure 6: Example GuardDuty finding for detection of a malicious file on EC2

The first GuardDuty finding detects crypto mining activity, while the proceeding malware protection finding provides context on the malware associated with this activity. This context is very valuable for the incident response process.

Conclusion

In this post, we walked you through each of the five phases where we outlined how an initial misconfiguration could result in a malicious actor gaining control of EKS resources within an AWS account and how GuardDuty is able to continually monitor and detect the progression of the security event. As previously stated, this is just one example where a misconfiguration in an EKS cluster could result in a security event.

Now that you have a good understanding of GuardDuty capabilities to continuously monitor and detect EKS security events, you will need to establish processes and procedures to enable your security team to investigate these events. You can enable Amazon Detective to help accelerate your security team’s mean time to respond (MTTR) by providing an efficient mechanism to analyze, investigate, and identify the root cause of security events. Follow along in part 2 of this series, How to investigate and take action on an Amazon EKS cluster related security issue with Amazon Detective, where we’ll cover techniques you can use with Amazon Detective to identify impacted EKS resources in your AWS account, possible remediation actions to take on the cluster, and preventative controls you can implement.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, start a thread on Amazon GuardDuty re:Post.

Want more AWS Security news? Follow us on Twitter.

Author

Marshall Jones

Marshall is a worldwide senior security specialist solutions architect at AWS. His background is in AWS consulting and security architecture, focused on a variety of security domains including edge, threat detection, and compliance. Today, he helps enterprise customers adopt and operationalize AWS security services to increase security effectiveness and reduce risk.

Jonathan Nguyen

Jonathan Nguyen

Jonathan is a shared delivery team senior security consultant at AWS. His background is in AWS security, with a focus on threat detection and incident response. He helps enterprise customers develop a comprehensive AWS security strategy, deploy security solutions at scale, and train customers on AWS security best practices.

Manuel Martinez Arizmendi

Manuel Martinez Arizmendi

Manuel works a Security Engineer at Amazon Detective providing new security investigation capabilities to AWS customers. Based on Boston,MA and originally from Madrid, Spain, when he’s not at work, he enjoys playing and watching soccer, playing videogames, and hanging out with his friends.

Design considerations for Amazon EMR on EKS in a multi-tenant Amazon EKS environment

Post Syndicated from Lotfi Mouhib original https://aws.amazon.com/blogs/big-data/design-considerations-for-amazon-emr-on-eks-in-a-multi-tenant-amazon-eks-environment/

Many AWS customers use Amazon Elastic Kubernetes Service (Amazon EKS) in order to take advantage of Kubernetes without the burden of managing the Kubernetes control plane. With Kubernetes, you can centrally manage your workloads and offer administrators a multi-tenant environment where they can create, update, scale, and secure workloads using a single API. Kubernetes also allows you to improve resource utilization, reduce cost, and simplify infrastructure management to support different application deployments. This model is beneficial for those running Apache Spark workloads, for several reasons. For example, it allows you to have multiple Spark environments running concurrently with different configurations and dependencies that are segregated from each other through Kubernetes multi-tenancy features. In addition, the same cluster can be used for various workloads like machine learning (ML), host applications, data streaming and thereby reducing operational overhead of managing multiple clusters.

AWS offers Amazon EMR on EKS, a managed service that enables you to run your Apache Spark workloads on Amazon EKS. This service uses the Amazon EMR runtime for Apache Spark, which increases the performance of your Spark jobs so that they run faster and cost less. When you run Spark jobs on EMR on EKS and not on self-managed Apache Spark on Kubernetes, you can take advantage of automated provisioning, scaling, faster runtimes, and the development and debugging tools that Amazon EMR provides

In this post, we show how to configure and run EMR on EKS in a multi-tenant EKS cluster that can used by your various teams. We tackle multi-tenancy through four topics: network, resource management, cost management, and security.

Concepts

Throughout this post, we use terminology that is either specific to EMR on EKS, Spark, or Kubernetes:

  • Multi-tenancy – Multi-tenancy in Kubernetes can come in three forms: hard multi-tenancy, soft multi-tenancy and sole multi-tenancy. Hard multi-tenancy means each business unit or group of applications gets a dedicated Kubernetes; there is no sharing of the control plane. This model is out of scope for this post. Soft multi-tenancy is where pods might share the same underlying compute resource (node) and are logically separated using Kubernetes constructs through namespaces, resource quotas, or network policies. A second way to achieve multi-tenancy in Kubernetes is to assign pods to specific nodes that are pre-provisioned and allocated to a specific team. In this case, we talk about sole multi-tenancy. Unless your security posture requires you to use hard or sole multi-tenancy, you would want to consider using soft multi-tenancy for the following reasons:
    • Soft multi-tenancy avoids underutilization of resources and waste of compute resources.
    • There is a limited number of managed node groups that can be used by Amazon EKS, so for large deployments, this limit can quickly become a limiting factor.
    • In sole multi-tenancy there is high chance of ghost nodes with no pods scheduled on them due to misconfiguration as we force pods into dedicated nodes with label, taints and tolerance and anti-affinity rules.
  • Namespace – Namespaces are core in Kubernetes and a pillar to implement soft multi-tenancy. With namespaces, you can divide the cluster into logical partitions. These partitions are then referenced in quotas, network policies, service accounts, and other constructs that help isolate environments in Kubernetes.
  • Virtual cluster – An EMR virtual cluster is mapped to a Kubernetes namespace that Amazon EMR is registered with. Amazon EMR uses virtual clusters to run jobs and host endpoints. Multiple virtual clusters can be backed by the same physical cluster. However, each virtual cluster maps to one namespace on an EKS cluster. Virtual clusters don’t create any active resources that contribute to your bill or require lifecycle management outside the service.
  • Pod template – In EMR on EKS, you can provide a pod template to control pod placement, or define a sidecar container. This pod template can be defined for executor pods and driver pods, and stored in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 locations are then submitted as part of the applicationConfiguration object that is part of configurationOverrides, as defined in the EMR on EKS job submission API.

Security considerations

In this section, we address security from different angles. We first discuss how to protect IAM role that is used for running the job. Then address how to protect secrets use in jobs and finally we discuss how you can protect data while it is processed by Spark.

IAM role protection

A job submitted to EMR on EKS needs an AWS Identity and Access Management (IAM) execution role to interact with AWS resources, for example with Amazon S3 to get data, with Amazon CloudWatch Logs to publish logs, or use an encryption key in AWS Key Management Service (AWS KMS). It’s a best practice in AWS to apply least privilege for IAM roles. In Amazon EKS, this is achieved through IRSA (IAM Role for Service Accounts). This mechanism allows a pod to assume an IAM role at the pod level and not at the node level, while using short-term credentials that are provided through the EKS OIDC.

IRSA creates a trust relationship between the EKS OIDC provider and the IAM role. This method allows only pods with a service account (annotated with an IAM role ARN) to assume a role that has a trust policy with the EKS OIDC provider. However, this isn’t enough, because it would allow any pod with a service account within the EKS cluster that is annotated with a role ARN to assume the execution role. This must be further scoped down using conditions on the role trust policy. This condition allows the assume role to happen only if the calling service account is the one used for running a job associated with the virtual cluster. The following code shows the structure of the condition to add to the trust policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": <OIDC provider ARN >
            },
            "Action": "sts:AssumeRoleWithWebIdentity"
            "Condition": { "StringLike": { “<OIDC_PROVIDER>:sub": "system:serviceaccount:<NAMESPACE>:emr-containers-sa-*-*-<AWS_ACCOUNT_ID>-<BASE36_ENCODED_ROLE_NAME>”} }
        }
    ]
}

To scope down the trust policy using the service account condition, you need to run the following the command with AWS CLI:

aws emr-containers update-role-trust-policy \
–cluster-name cluster \
–namespace namespace \
–role-name iam_role_name_for_job_execution

The command will the add the service account that will be used by the spark client, Jupyter Enterprise Gateway, Spark kernel, driver or executor. The service accounts name have the following structure emr-containers-sa-*-*-<AWS_ACCOUNT_ID>-<BASE36_ENCODED_ROLE_NAME>.

In addition to the role segregation offered by IRSA, we recommend blocking access to instance metadata because a pod can still inherit the rights of the instance profile assigned to the worker node. For more information about how you can block access to metadata, refer to Restrict access to the instance profile assigned to the worker node.

Secret protection

Sometime a Spark job needs to consume data stored in a database or from APIs. Most of the time, these are protected with a password or access key. The most common way to pass these secrets is through environment variables. However, in a multi-tenant environment, this means any user with access to the Kubernetes API can potentially access the secrets in the environment variables if this access isn’t scoped well to the namespaces the user has access to.

To overcome this challenge, we recommend using a Secrets store like AWS Secrets Manager that can be mounted through the Secret Store CSI Driver. The benefit of using Secrets Manager is the ability to use IRSA and allow only the role assumed by the pod access to the given secret, thereby improving your security posture. You can refer to the best practices guide for sample code showing the use of Secrets Manager with EMR on EKS.

Spark data encryption

When a Spark application is running, the driver and executors produce intermediate data. This data is written to the node local storage. Anyone who is able to exec into the pods would be able to read this data. Spark supports encryption of this data, and it can be enabled by passing --conf spark.io.encryption.enabled=true. Because this configuration adds performance penalty, we recommend enabling data encryption only for workloads that store and access highly sensitive data and in untrusted environments.

Network considerations

In this section we discuss how to manage networking within the cluster as well as outside the cluster. We first address how Spark handle cross executors and driver communication and how to secure it. Then we discuss how to restrict network traffic between pods in the EKS cluster and allow only traffic destined to EMR on EKS. Last, we discuss how to restrict traffic of executors and driver pods to external AWS service traffic using security groups.

Network encryption

The communication between the driver and executor uses RPC protocol and is not encrypted. Starting with Spark 3 in the Kubernetes backed cluster, Spark offers a mechanism to encrypt communication using AES encryption.

The driver generates a key and shares it with executors through the environment variable. Because the key is shared through the environment variable, potentially any user with access to the Kubernetes API (kubectl) can read the key. We recommend securing access so that only authorized users can have access to the EMR virtual cluster. In addition, you should set up Kubernetes role-based access control in such a way that the pod spec in the namespace where the EMR virtual cluster runs is granted to only a few selected service accounts. This method of passing secrets through the environment variable would change in the future with a proposal to use Kubernetes secrets.

To enable encryption, RPC authentication must also be enabled in your Spark configuration. To enable encryption in-transit in Spark, you should use the following parameters in your Spark config:

--conf spark.authenticate=true

--conf spark.network.crypto.enabled=true

Note that these are the minimal parameters to set; refer to Encryption from the complete list of parameters.

Additionally, applying encryption in Spark has a negative impact on processing speed. You should only apply it when there is a compliance or regulation need.

Securing Network traffic within the cluster

In Kubernetes, by default pods can communicate over the network across different namespaces in the same cluster. This behavior is not always desirable in a multi-tenant environment. In some instances, for example in regulated industries, to be compliant you want to enforce strict control over the network and send and receive traffic only from the namespace that you’re interacting with. For EMR on EKS, it would be the namespace associated to the EMR virtual cluster. Kubernetes offers constructs that allow you to implement network policies and define fine-grained control over the pod-to-pod communication. These policies are implemented by the CNI plugin; in Amazon EKS, the default plugin would be the VPC CNI. A policy is defined as follows and is applied with kubectl:

Kind: NetworkPolicy
metadata:
  name: default-np-ns1
  namespace: <EMR-VC-NAMESPACE>
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          nsname: <EMR-VC-NAMESPACE>

Network traffic outside the cluster

In Amazon EKS, when you deploy pods on Amazon Elastic Compute Cloud (Amazon EC2) instances, all the pods use the security group associated with the node. This can be an issue if your pods (executor pods) are accessing a data source (namely a database) that allows traffic based on the source security group. Database servers often restrict network access only from where they are expecting it. In the case of a multi-tenant EKS cluster, this means pods from other teams that shouldn’t have access to the database servers, would be able to send traffic to it.

To overcome this challenge, you can use security groups for pods. This feature allows you to assign a specific security group to your pods, thereby controlling the network traffic to your database server or data source. You can also refer to the best practices guide for a reference implementation.

Cost management and chargeback

In a multi-tenant environment, cost management is a critical subject. You have multiple users from various business units, and you need to be able to precisely chargeback the cost of the compute resource they have used. At the beginning of the post, we introduced three models of multi-tenancy in Amazon EKS: hard multi-tenancy, soft multi-tenancy, and sole multi-tenancy. Hard multi-tenancy is out of scope because the cost tracking is trivial; all the resources are dedicated to the team using the cluster, which is not the case for sole multi-tenancy and soft multi-tenancy. In the next sections, we discuss these two methods to track the cost for each of model.

Soft multi-tenancy

In a soft multi-tenant environment, you can perform chargeback to your data engineering teams based on the resources they consumed and not the nodes allocated. In this method, you use the namespaces associated with the EMR virtual cluster to track how much resources were used for processing jobs. The following diagram illustrates an example.

Diagram to illustrate soft multi-tenancy

Diagram -1 Soft multi-tenancy

Tracking resources based on the namespace isn’t an easy task because jobs are transient in nature and fluctuate in their duration. However, there are partner tools available that allow you to keep track of the resources used, such as Kubecost, CloudZero, Vantage, and many others. For instructions on using Kubecost on Amazon EKS, refer to this blog post on cost monitoring for EKS customers.

Sole multi-tenancy

For sole multi-tenancy, the chargeback is done at the instance (node) level. Each member on your team uses a specific set of nodes that are dedicated to it. These nodes aren’t always running, and are spun up using the Kubernetes auto scaling mechanism. The following diagram illustrates an example.

Diagram to illustrate Sole tenancy

Diagram -2 Sole tenancy

With sole multi-tenancy, you use a cost allocation tag, which is an AWS mechanism that allows you to track how much each resource has consumed. Although the method of sole multi-tenancy isn’t efficient in terms of resource utilization, it provides a simplified strategy for chargebacks. With the cost allocation tag, you can chargeback a team based on all the resources they used, like Amazon S3, Amazon DynamoDB, and other AWS resources. The chargeback mechanism based on the cost allocation tag can be augmented using the recently launched AWS Billing Conductor, which allows you to issue bills internally for your team.

Resource management

In this section, we discuss considerations regarding resource management in multi-tenant clusters. We briefly discuss topics like sharing resources graciously, setting guard rails on resource consumption, techniques for ensuring resources for time sensitive and/or critical jobs, meeting quick resource scaling requirements and finally cost optimization practices with node selectors.

Sharing resources

In a multi-tenant environment, the goal is to share resources like compute and memory for better resource utilization. However, this requires careful capacity management and resource allocation to make sure each tenant gets their fair share. In Kubernetes, resource allocation is controlled and enforced by using ResourceQuota and LimitRange. ResourceQuota limits resources on the namespace level, and LimitRange allows you to make sure that all the containers are submitted with a resource requirement and a limit. In this section, we demonstrate how a data engineer or Kubernetes administrator can set up ResourceQuota as a LimitRange configuration.

The administrator creates one ResourceQuota per namespace that provides constraints for aggregate resource consumption:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
  namespace: teamA
spec:
  hard:
    requests.cpu: "1000"
    requests.memory: 4000Gi
    limits.cpu: "2000"
    limits.memory: 6000Gi

For LimitRange, the administrator can review the following sample configuration. We recommend using default and defaultRequest to enforce the limit and request field on containers. Lastly, from a data engineer perspective while submitting the EMR on EKS jobs, you need to make sure the Spark parameters of resource requirements are within the range of the defined LimitRange. For example, in the following configuration, the request for spark.executor.cores=7 will fail because the max limit for CPU is 6 per container:

apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-min-max
  namespace: teamA
spec:
  limits:
  - max:
      cpu: "6"
    min:
      cpu: "100m"
    default:
      cpu: "500m"
    defaultRequest:
      cpu: "100m"
    type: Container

Priority-based resource allocation

Diagram Illustrates an example of resource allocation with priority

Diagram – 3 Illustrates an example of resource allocation with priority.

As all the EMR virtual clusters share the same EKS computing platform with limited resources, there will be scenarios in which you need to prioritize jobs in a sensitive timeline. In this case, high-priority jobs can utilize the resources and finish the job, whereas low-priority jobs that are running gets stopped and any new pods must wait in the queue. EMR on EKS can achieve this with the help of pod templates, where you specify a priority class for the given job.

When a pod priority is enabled, the Kubernetes scheduler orders pending pods by their priority and places them in the scheduling queue. As a result, the higher-priority pod may be scheduled sooner than pods with lower priority if its scheduling requirements are met. If this pod can’t be scheduled, the scheduler continues and tries to schedule other lower-priority pods.

The preemptionPolicy field on the PriorityClass defaults to PreemptLowerPriority, and the pods of that PriorityClass can preempt lower-priority pods. If preemptionPolicy is set to Never, pods of that PriorityClass are non-preempting. In other words, they can’t preempt any other pods. When lower-priority pods are preempted, the victim pods get a grace period to finish their work and exit. If the pod doesn’t exit within that grace period, that pod is stopped by the Kubernetes scheduler. Therefore, there is usually a time gap between the point when the scheduler preempts victim pods and the time that a higher-priority pod is scheduled. If you want to minimize this gap, you can set a deletion grace period of lower-priority pods to zero or a small number. You can do this by setting the terminationGracePeriodSeconds option in the victim Pod YAML.

See the following code samples for priority class:

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100
globalDefault: false
description: " High-priority Pods and for Driver Pods."

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 50
globalDefault: false
description: " Low-priority Pods."

One of the key considerations while templatizing the driver pods, especially for low-priority jobs, is to avoid the same low-priority class for both driver and executor. This will save the driver pods from getting evicted and lose the progress of all its executors in a resource congestion scenario. In this low-priority job example, we have used a high-priority class for driver pod templates and low-priority classes only for executor templates. This way, we can ensure the driver pods are safe during the eviction process of low-priority jobs. In this case, only executors will be evicted, and the driver can bring back the evicted executor pods as the resource becomes freed. See the following code:

apiVersion: v1
kind: Pod
spec:
  priorityClassName: "high-priority"
  nodeSelector:
    eks.amazonaws.com/capacityType: ON_DEMAND
  containers:
  - name: spark-kubernetes-driver # This will be interpreted as Spark driver container

apiVersion: v1
kind: Pod
spec:
  priorityClassName: "low-priority"
  nodeSelector:
    eks.amazonaws.com/capacityType: SPOT
  containers:
  - name: spark-kubernetes-executors # This will be interpreted as Spark executor container

Overprovisioning with priority

Diagram Illustrates an example of overprovisioning with priority

Diagram – 4 Illustrates an example of overprovisioning with priority.

As pods wait in a pending state due to resource availability, additional capacity can be added to the cluster with Amazon EKS auto scaling. The time it takes to scale the cluster by adding new nodes for deployment has to be considered for time-sensitive jobs. Overprovisioning is an option to mitigate the auto scaling delay using temporary pods with negative priority. These pods occupy space in the cluster. When pods with high priority are unschedulable, the temporary pods are preempted to make the room. This causes the auto scaler to scale out new nodes due to overprovisioning. Be aware that this is a trade-off because it adds higher cost while minimizing scheduling latency. For more information about overprovisioning best practices, refer to Overprovisioning.

Node selectors

EKS clusters can span multiple Availability Zones in a VPC. A Spark application whose driver and executor pods are distributed across multiple Availability Zones can incur inter- Availability Zone data transfer costs. To minimize or eliminate the data transfer cost, you should configure the job to run on a specific Availability Zone or even specific node type with the help of node labels. Amazon EKS places a set of default labels to identify capacity type (On-Demand or Spot Instance), Availability Zone, instance type, and more. In addition, we can use custom labels to meet workload-specific node affinity.

EMR on EKS allows you to choose specific nodes in two ways:

  • At the job level. Refer to EKS Node Placement for more details.
  • In the driver and executor level using pod templates.

When using pod templates, we recommend using on demand instances for driver pods. You can also consider including spot instances for executor pods for workloads that are tolerant of occasional periods when the target capacity is not completely available. Leveraging spot instances allow you to save cost for jobs that are not critical and can be terminated. Please refer Define a NodeSelector in PodTemplates.

Conclusion

In this post, we provided guidance on how to design and deploy EMR on EKS in a multi-tenant EKS environment through different lenses: network, security, cost management, and resource management. For any deployment, we recommend the following:

  • Use IRSA with a condition scoped on the EMR on EKS service account
  • Use a secret manager to store credentials and the Secret Store CSI Driver to access them in your Spark application
  • Use ResourceQuota and LimitRange to specify the resources that each of your data engineering teams can use and avoid compute resource abuse and starvation
  • Implement a network policy to segregate network traffic between pods

Lastly, if you are considering migrating your spark workload to EMR on EKS you can further learn about design patterns to manage Apache Spark workload in EMR on EKS in this blog and about migrating your EMR transient cluster to EMR on EKS in this blog.


About the Authors

author - lotfiLotfi Mouhib is a Senior Solutions Architect working for the Public Sector team with Amazon Web Services. He helps public sector customers across EMEA realize their ideas, build new services, and innovate for citizens. In his spare time, Lotfi enjoys cycling and running.

author - peter ajeebAjeeb Peter is a Senior Solutions Architect with Amazon Web Services based in Charlotte, North Carolina, where he guides global financial services customers to build highly secure, scalable, reliable, and cost-efficient applications on the cloud. He brings over 20 years of technology experience on Software Development, Architecture and Analytics from industries like finance and telecom.

Cloud Threat Detection: To Agent or Not to Agent?

Post Syndicated from Gadi Naor original https://blog.rapid7.com/2022/07/22/cloud-threat-detection-to-agent-or-not-to-agent/

Cloud Threat Detection: To Agent or Not to Agent?

The shift towards cloud and cloud-native application architectures represents an evolutionary step forward from older paradigms. The adoption of containers, Kubernetes, and serverless functions, along with the use of cloud-based infrastructure, introduces a new set of risks and security challenges — as well as new opportunities. These go well beyond application security and security posture management, spanning from the build phase all the way to the application run phase.

Three areas for cloud-native security

One particular area of focus for security defenses is actively security monitoring cloud-based applications and cloud workloads, often referred to as runtime security.

We can break down cloud-based runtime security into three main categories:

1. Cloud environment security

The cloud environment is where we provision the infrastructure and services to run our applications. Running applications often involves provisioning computing resources, networking, storage resources, and access credentials to external elements such as data, external services, or secrets. This is the foundation that our cloud applications are built on, and is a critical first step in ensuring their integrity.

2. Workload and workload orchestration security

Operating modern cloud-native applications often means leveraging a container orchestration platform. In recent years, Kubernetes has been the go-to application server vehicle. Leveraging application server infrastructure like Kubernetes requires attention from a risk and threat perspective. For example, Kubernetes credentials theft, or credential compromise as a result of application breach, can be detected through continuously analyzing the Kubernetes audit log. Another example would be the detection of malware that exploit inherent weaknesses in DNS protocol through network security analysis of the workload (Pod) communications.

3. Application security

If the cloud environment is our workload vehicle where we operate and run our workloads, and containerized workloads are our application vehicle, then OS processes are where our application logic runs. Cloud functions are another example of normally short-lived processes that carry our application logic. Protecting applications is a long-standing challenge on its own. This includes application API security, memory protection, data protection, network isolation, and control, and can be achieved using multiple techniques — some of which are only practically possible through the use of security agents.

Security agents defined

Security agents represent a specialized software deployed on an application workload or application endpoint to perform specific security functions, such as network monitoring, process-level monitoring, memory monitoring, file system monitoring, system API call monitoring, and memory monitoring. They may be involved in preventive actions, detection actions, or security forensics data collection actions.

For example, we can deploy security agents to virtual machines (cloud instances) and provide host-level security. We can use security agents for containerized environments like Kubernetes, where one security agent monitors and secures Kubernetes Pods, as well as the Kubernetes node itself. We can also have embedded security agents that monitor and secure serverless functions such as Lambda, or even security agents that provide process-level security and API-level security.

Agentless security is an approach that leverages security signals obtained via cloud APIs, such as network flows, DNS flows, cloud API access audit logs, or application API access logs. Collecting data from those security signals incurs a lower operational cost than agent-based security, but it can come with some limitations. For instance, in application security, the agentless approach has fewer security signals to analyze, and may not support some threat detection techniques such as process system call analysis.

Should I use agents to secure my cloud applications?

So should you be using agents, or not? The answer really boils down to how wide and deep a detection and protection fabric you want to cast, and how many skilled personnel are available to deploy and operate various security controls and respond to security incidents.

Agents provide a greater level of detail, and are generally your best bet when it comes to preemptive prevention of fine-grained policy-based controls such as network segmentation. However, they also require additional effort and overhead to manage the agents themselves with regular updates and configurations.

The agentless approach is excellent at correlating, segmenting, and analyzing data from various workloads, as it does not rely on sharing resources with the monitored workloads. That said, you’re going to sacrifice depth of coverage at certain layers of the stack as a trade-off to relatively lower operational overhead, because agentless approaches rely on cloud provider APIs, which are less granular than what host/workload or process-level agents can collect.

So to achieve comprehensive security and balance operational overhead, the recommendation is typically to leverage both technologies.

You’ll likely want to use an agentless approach to get fast and wide coverage of both risks and threats, or in places where agents can not be deployed, such as a hosted container environment like AWS Fargate or Cloud Functions. Another example would be to assess software vulnerability and detect persistent malware — which can be achieved using both technologies, but with different levels of time until detection.

Conversely, agents can be used in environments like Kubernetes where the operational overhead is relatively low, and the containerized workload granularity requires fine-grained and deeper security controls.

The decision of where to use an agent-based approach depends on what you’re trying to secure. If you’re looking to get real-time visibility into all of your cloud resources and workloads, establish a single source of “good” across your multiple cloud platforms, prioritize risk across your environments, and measure compliance against organizational and industry standards and best practices, an agentless approach like InsightCloudSec is a great choice.

Kubectl with Cloudflare Zero Trust

Post Syndicated from Terin Stock original https://blog.cloudflare.com/kubectl-with-zero-trust/

Kubectl with Cloudflare Zero Trust

Kubectl with Cloudflare Zero Trust

Cloudflare is a heavy user of Kubernetes for engineering workloads: it’s used to power the backend of our APIs, to handle batch-processing such as analytics aggregation and bot detection, and engineering tools such as our CI/CD pipelines. But between load balancers, API servers, etcd, ingresses, and pods, the surface area exposed by Kubernetes can be rather large.

In this post, we share a little bit about how our engineering team dogfoods Cloudflare Zero Trust to secure Kubernetes — and enables kubectl without proxies.

Our General Approach to Kubernetes Security

As part of our security measures, we heavily limit what can access our clusters over the network. Where a network service is exposed, we add additional protections, such as requiring Cloudflare Access authentication or Mutual TLS (or both) to access ingress resources.

These network restrictions include access to the cluster’s API server. Without access to this, engineers at Cloudflare would not be able to use tools like kubectl to introspect their team’s resources. While we believe Continuous Deployments and GitOps are best practices, allowing developers to use the Kubernetes API aids in troubleshooting and increasing developer velocity. Not having access would have been a deal breaker.

To satisfy our security requirements, we’re using Cloudflare Zero Trust, and we wanted to share how we’re using it, and the process that brought us here.

Before Zero Trust

In the world before Zero Trust, engineers could access the Kubernetes API by connecting to a VPN appliance. While this is common across the industry, and it does allow access to the API, it also dropped engineers as clients into the internal network: they had much more network access than necessary.

We weren’t happy with this situation, but it was the status quo for several years. At the beginning of 2020, we retired our VPN and thus the Kubernetes team needed to find another solution.

Kubernetes with Argo Tunnels

At the time we worked closely with the team developing Cloudflare Tunnels to add support for handling kubectl connections using Access and cloudflared tunnels.

While this worked for our engineering users, it was a significant hurdle to on-boarding new employees. Each Kubernetes cluster required its own tunnel connection from the engineer’s device, which made shuffling between clusters annoying. While kubectl supported connecting through SOCKS proxies, this support was not universal to all tools in the Kubernetes ecosystem.

We continued using this solution internally while we worked towards a better solution.

Kubernetes with Zero Trust

Since the launch of Cloudflare One, we’ve been dogfooding the Zero Trust agent in various configurations. At first we’d been using it to implement secure DNS with 1.1.1.1. As time went on, we began to use it to dogfood additional Zero Trust features.

We’re now leveraging the private network routing in Cloudflare Zero Trust to allow engineers to access the Kubernetes APIs without needing to setup cloudflared tunnels or configure kubectl and other Kubernetes ecosystem tools to use tunnels. This isn’t something specific to Cloudflare, you can do this for your team today!

Kubectl with Cloudflare Zero Trust

Configuring Zero Trust

We use a configuration management tool for our Zero Trust configuration to enable infrastructure-as-code, which we’ve adapted below. However, the same configuration can be achieved using the Cloudflare Zero Trust dashboard.

The first thing we need to do is create a new tunnel. This tunnel will be used to connect the Cloudflare edge network to the Kubernetes API. We run the tunnel endpoints within Kubernetes, using configuration shown later in this post.

resource "cloudflare_argo_tunnel" "k8s_zero_trust_tunnel" {
  account_id = var.account_id
  name       = "k8s_zero_trust_tunnel"
  secret     = var.tunnel_secret
}

The “tunnel_secret” secret should be a 32-byte random number, which you should temporarily save as we’ll reuse it later for the Kubernetes setup later.

After we’ve created the tunnel, we need to create the routes so the Cloudflare network knows what traffic to send through the tunnel.

resource "cloudflare_tunnel_route" "k8s_zero_trust_tunnel_ipv4" {
  account_id = var.account_id
  tunnel_id  = cloudflare_argo_tunnel.k8s_zero_trust_tunnel.id
  network    = "198.51.100.101/32"
  comment    = "Kubernetes API Server (IPv4)"
}
 
resource "cloudflare_tunnel_route" "k8s_zero_trust_tunnel_ipv6" {
  account_id = var.account_id
  tunnel_id  = cloudflare_argo_tunnel.k8s_zero_trust_tunnel.id
  network    = "2001:DB8::101/128"
  comment    = "Kubernetes API Server (IPv6)"
}

We support accessing the Kubernetes API via both IPv4 and IPv6 addresses, so we configure routes for both. If you’re connecting to your API server via a hostname, these IP addresses should match what is returned via a DNS lookup.

Next we’ll configure settings for Cloudflare Gateway so that it’s compatible with the API servers and clients.

resource "cloudflare_teams_list" "k8s_apiserver_ips" {
  account_id = var.account_id
  name       = "Kubernetes API IPs"
  type       = "IP"
  items      = ["198.51.100.101/32", "2001:DB8::101/128"]
}
 
resource "cloudflare_teams_rule" "k8s_apiserver_zero_trust_http" {
  account_id  = var.account_id
  name        = "Don't inspect Kubernetes API"
  description = "Allow connections from kubectl to API"
  precedence  = 10000
  action      = "off"
  enabled     = true
  filters     = ["http"]
  traffic     = format("any(http.conn.dst_ip[*] in $%s)", replace(cloudflare_teams_list.k8s_apiserver_ips.id, "-", ""))
}

As we use mutual TLS between clients and the API server, and not all the traffic between kubectl and the API servers are HTTP, we’ve disabled HTTP inspection for these connections.

You can pair these rules with additional Zero Trust rules, such as device attestation, session lifetimes, and user and group access policies to further customize your security.

Deploying Tunnels

Once you have your tunnels created and configured, you can deploy their endpoints into your network. We’ve chosen to deploy the tunnels as pods, as this allows us to use Kubernetes’s deployment strategies for rolling out upgrades and handling node failures.

apiVersion: v1
kind: ConfigMap
metadata:
  name: tunnel-zt
  namespace: example
  labels:
    tunnel: api-zt
data:
  config.yaml: |
    tunnel: 8e343b13-a087-48ea-825f-9783931ff2a5
    credentials-file: /opt/zt/creds/creds.json
    metrics: 0.0.0.0:8081
    warp-routing:
        enabled: true

This creates a Kubernetes ConfigMap with a basic configuration that enables WARP routing for the tunnel ID specified. You can get this tunnel ID from your configuration management system, the Cloudflare Zero Trust dashboard, or by running the following command from another device logged into the same account.

cloudflared tunnel list

Next, we’ll need to create a secret for our tunnel credentials. While you should use a secret management system, for simplicity we’ll create one directly here.

jq -cn --arg accountTag $CF_ACCOUNT_TAG \
       --arg tunnelID $CF_TUNNEL_ID \
       --arg tunnelName $CF_TUNNEL_NAME \
       --arg tunnelSecret $CF_TUNNEL_SECRET \
   '{AccountTag: $accountTag, TunnelID: $tunnelID, TunnelName: $tunnelName, TunnelSecret: $tunnelSecret}' | \
kubectl create secret generic -n example tunnel-creds --from-file=creds.json=/dev/stdin

This creates a new secret “tunnel-creds” in the “example” namespace with the credentials file the tunnel expects.

Now we can deploy the tunnels themselves. We deploy multiple replicas to ensure some are always available, even while nodes are being drained.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    tunnel: api-zt
  name: tunnel-api-zt
  namespace: example
spec:
  replicas: 3
  selector:
    matchLabels:
      tunnel: api-zt
  strategy:
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  template:
    metadata:
      labels:
        tunnel: api-zt
    spec:
      containers:
        - args:
            - tunnel
            - --config
            - /opt/zt/config/config.yaml
            - run
          env:
            - name: GOMAXPROCS
              value: "2"
            - name: TZ
              value: UTC
          image: cloudflare/cloudflared:2022.5.3
          livenessProbe:
            failureThreshold: 1
            httpGet:
              path: /ready
              port: 8081
            initialDelaySeconds: 10
            periodSeconds: 10
          name: tunnel
          ports:
            - containerPort: 8081
              name: http-metrics
          resources:
            limits:
              cpu: "1"
              memory: 100Mi
          volumeMounts:
            - mountPath: /opt/zt/config
              name: config
              readOnly: true
            - mountPath: /opt/zt/creds
              name: creds
              readOnly: true
      volumes:
        - secret:
            name: tunnel-creds
          name: creds
        - configMap:
            name: tunnel-api-zt
          name: config

Using Kubectl with Cloudflare Zero Trust

Kubectl with Cloudflare Zero Trust

After deploying the Cloudflare Zero Trust agent, members of your team can now access the Kubernetes API without needing to set up any special SOCKS tunnels!

kubectl version --short
Client Version: v1.24.1
Server Version: v1.24.1

What’s next?

If you try this out, send us your feedback! We’re continuing to improve Zero Trust for non-HTTP workflows.

Is Your Kubernetes Cluster Ready for Version 1.24?

Post Syndicated from Alon Berger original https://blog.rapid7.com/2022/05/03/is-your-kubernetes-cluster-ready-for-version-1-24/

Is Your Kubernetes Cluster Ready for Version 1.24?

Kubernetes rolled out Version 1.24 on May 3, 2022, as its first release of 2022. This version is packed with some notable improvements, as well as new and deprecated features. In this post, we will cover some of the more significant items on the list.

The Dockershim removal

The new release has caught the attention of most users, mainly due to the official removal of Dockershim, a built-in Container Runtime Interface (CRI) in the Kubelet codebase, which has been deprecated since v1.20.

Docker is essentially a user-friendly abstraction layer, created before Kubernetes was introduced. Docker isn’t compliant with CRI, which is why Dockershim was needed in the first place. However, upon discovering maintenance overhead and weak points involving Docker and containerd, it was decided to remove Docker completely, encouraging users to utilize other CRI-compliant runtimes.

Docker-produced images are still able to run with all other CRI compliant runtimes, as long as worker nodes are configured to support those runtimes and any node customizations are properly updated based on the environment and runtime requirements. The release team also published an FAQ article dedicated entirely to the Dockershim removal.

Better security with short-lived tokens

A major update in this release is the reduction of secret-based service account tokens. This is a big step toward improving the overall security of service account tokens, which until now remained valid as long as their respective service accounts lived.

Now, with a much shorter lifespan, these tokens are significantly less susceptible to security risks, preventing attackers from gaining access to the cluster and from leveraging multiple attack vectors such as privileged escalations and lateral movement.

Network Policy status

Network Policy resources are implemented differently by different Container Network Interface (CNI) providers and often apply certain features in a different order.

This can lead to a Network Policy not being honored by the current CNI provider — worst of all, without notifying the user about the situation.

In this version, a new subresource status is added that allows users to receive feedback on whether a NetworkPolicy and its features have been properly parsed and help them understand why a particular feature is not working.

This is another great example of how developers and operation teams can benefit from features like this one, alleviating the often involved pain with troubleshooting a Kubernetes network issue.

CSI volume health monitoring

Container Storage Interface (CSI) drivers can now load an external controller as a sidecar that will check for volume health, and they can also provide extra information in the NodeGetVolumeStats function that Kubelet already uses to gather information on the volumes.

In this version, the Volume Health information is exposed as kubelet VolumeStats metrics. The kubelet_volume_stats_health_status_abnormal metric will have a persistentvolumeclaim label with a value of “1” if the volume is unhealthy, or “0” otherwise.

Additional noteworthy changes in Kubernetes Version 1.24

A few other welcome changes include new features like implementing new changes to the kubelet agent, Kubernetes’ primary component that runs on each node. Dockershim-related CLI flags were removed due to its deprecation. Furthermore, the Dynamic Kubelet Configuration feature, which allows dynamic Kubelet configurations, has been officially removed in this version, after it was announced as deprecated in earlier versions. This removal aims to simplify code and to improve its reliability.

Furthermore, the newly added kubectl create token command allows easier creation and retrieval of tokens for the Kubernetes API access and control management, or SIG-Auth.

This new command significantly improves automation processes throughout the CI/CD pipelines and will accelerate roles-based access control (RBAC) policy changes as well as hardening TokenRequest endpoint validations.

Lastly, a useful added feature for cluster operators is to identify Windows pods at API admission level authoritatively. This can be crucial for managing Windows containers by applying better security policies and constraints based on the operating system.

The first release for 2022 mainly introduces improvements towards providing helpful feedback for users, reducing the attack surface and improving security posture all around. The official removal of Dockershim support will push organizations and users to adapt and align with infrastructure changes, moving forward with new technology developments in Kubernetes and the cloud in general.

Additional reading:

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Secret Management with HashiCorp Vault

Post Syndicated from Mitz Amano original https://blog.cloudflare.com/secret-management-with-hashicorp-vault/

Secret Management with HashiCorp Vault

Secret Management with HashiCorp Vault

Many applications these days require authentication to external systems with resources, such as users and passwords to access databases and service accounts to access cloud services, and so on. In such cases, private information, like passwords and keys, becomes necessary. It is essential to take extra care in managing such sensitive data. For example, if you write your AWS key information or password in a script for deployment and then push it to a Git repository, all users who can read it will also be able to access it, and you could be in trouble. Even if it’s an internal repository, you run the risk of a potential leak.

How we were managing secrets in the service

Before we talk about Vault, let’s take a look at how we’ve used to manage secrets.

Salt

We use SaltStack as a bare-metal configuration management tool. The core of the Salt ecosystem consists of two major components: the Salt Master and the Salt Minion. The configuration state is owned by Salt Master, and thousands of Salt Minions automatically install packages, generate configuration files, and start services to the node based on the state. The state may contain secrets, such as passwords and API keys. When we deploy secrets to the node, we encrypt plaintext using a Salt Master owned GPG key and fill an ASCII-armored secret into the state file. Once it is applied, the Salt Master decrypts the PGP message using its own key, then the Salt Minion retrieves rendered data from the Master.

Secret Management with HashiCorp Vault

Kubernetes

We were using Lockbox, a secure way to store your Kubernetes secrets offline. The secret is asymmetrically encrypted and can only be decrypted with the Lockbox Kubernetes controller. The controller synchronizes with Secret objects. A Secret generated from Lockbox will also be created in the corresponding namespace. Since namespaces have been assigned administrator privileges by each engineering team, ordinary users cannot read Secret objects.

Secret Management with HashiCorp Vault

Why these secrets management were insufficient

Prior to Vault, GnuPG and Lockbox were used in this way to encrypt and decrypt most secrets in the data center. Nevertheless, they were inadequate in certain cases:

  • Lack of scoping secrets: The secret data in ASCII-armor could only be decrypted by a specific node when the client read it. This was still not enough control. Salt owns a GPG key for each Salt Master, and Core services (k8s, Databases, Storage, Logging, Tracing, Monitoring, etc) are deployed to hundreds of Salt Minions by a few Salt Masters. Nodes are often reused as different services after repairing hardware failure, so we use the same GPG key to decrypt the secrets of various services. Therefore, having a GPG key for each service is complicated. Also, a specific secret is used only for a specific service. For example, an access key for object storage is needed to back up the repository. In previous configurations, the API key is decrypted by a common Salt Master, so there is a risk that the API key will be referenced by another service or for another purpose. It is impossible to scope secret access, as long as we use the same GPG key.

    Another case is Kubernetes. Namespace-scoped access control and API access restrictions by the RBAC model are excellent. And the etcd used by Kubernetes as storage is not encrypted by default, and the Secret object is also saved. We need to think about encryption-at-rest by a third party KMS, or how to prevent Secrets from being stored in etcd. In other words, it is also required to properly control access to the secret for the secret itself.

  • Rotation and static secret: Anyone who has access to the Salt Master GPG key can theoretically decrypt all current and future secrets. And as long as we have many secrets, it’s impossible to rotate the encryption of all the secrets. Current and future secret management requires a process for easy rotation and using dynamically generated secrets instead.
  • Session management: Users/Services with GPG keys can decrypt secrets at any time. So GPG secret decryption is like having no TTL. (You can set an expiration date against the GPG key, but it’s just metadata. If you try to encrypt a new secret, after the expiration date, you’ll get a warning, but you can decrypt the existing secret). A temporary session is required to limit access when not needed.
  • Audit: GPG doesn’t have a way to keep an audit trail. Audit trails help us to trace the event who/when/where read secrets. The audit trail should contain details including the date, time, and user information associated with the secret read (and login), which is required regardless of user or service.

HashiCorp Vault

Armed with our set of requirements, we chose HashiCorp Vault to make better secret management with a better security model.

  • Scoping secrets: When a client logs in, a Vault token is generated through the Auth method (backend). This token has a policy that defines access policies, so it is clear what the client can access the data after logging in.
  • Rotation and dynamic secret: Version-controlled static secret with KV V2 Secret Engine helps us to easily update/rollback secrets with a single request. In addition, dynamic secrets and credentials are available to eliminate manual rotation. Ideally, these are required to be short-lived and have frequent rotation. Service should have restricted access. These are essential to reduce the impact of an attack, but they are operationally difficult, and it is impossible to satisfy them without automation. Vault can solve this problem by allowing operators to provide dynamically generated credentials to their services. Vault manages the credential lifecycle and rotates and revokes it as needed.
  • Session management: Vault provides a login process to get the token and various auth methods are provided. It is possible to link with an Identity Provider and authenticate using JWT. Since the vault token has a TTL, it can be managed as a short-lived credential to access secrets.
  • Audit: Vault supports audit that records who accessed which Vault API, when, and from where.

We also built Vault clusters for HA, Reliability, and handling large numbers of requests.

  • Use Integrated Storage that every node in the Vault cluster has a duplicate copy of Vault’s data. A client can retrieve the same result from any node.
  • Performance Replication offers us the same result as any Vault clusters.
  • Requests from clients are routed from a single Service IP to one of the Clusters. Anycast routes incoming traffic to the nearest cluster that handles requests efficiently. If one cluster goes down, the request will be automatically routed to another available cluster.
Secret Management with HashiCorp Vault

Service integrations

Use the appropriate Auth backend and Secret Engine to integrate the Service and Vault that are responsible for each core component.

Salt

The configuration state is owned by Salt Master, and hundreds of Salt Minions automatically install packages, generate configuration files, and start services to the node based on the role. The state data may contain secrets, such as API keys, and Salt Minion retrieves them from Vault. Salt uses a JWT signed by the Salt Master to log in to the vault using the JWT Auth method.

Secret Management with HashiCorp Vault

Kubernetes

Kubernetes reads Vault secrets through an operator that synchronizes with Secret objects. The Kubernetes Auth method uses the Service Account token JWT to login, just like the JWT Auth method. This JWT contains the service account name, UID, and namespace. Vault can scope namespace based on dynamic policy.

Secret Management with HashiCorp Vault

Identity Provider – User login

Additionally, Vault can work with the Identity Provider through a delegated authorization method based on OAuth 2.0 so that users can get tokens with the right policies. The JWT issued by the Identity Provider contains the group or user ID to which it belongs, and this metadata can be used to assign a Vault policy.

Secret Management with HashiCorp Vault

Integrated ecosystem – Auth x Secret

Vault provides a plugin system for two major components: authentication (Auth method) and secret management (Secret Engine). Vault can enable the officially provided plugins and the custom plugins you can build. The Auth method provides authentication for obtaining a Vault token by various methods. As mentioned in the service integration example above, we mainly use JWT, OIDC, and Kubernetes for login. On the other hand, the secret engine provides secrets in various ways, such as KV for a static secret, PKI for certificate signing, issuing, etc.

And they have an ecosystem. Vault can easily integrate auth methods and secret engines with each other. For instance, if we add a DB dynamic credential secret engine, all existing platforms will instantly be supported, without needing to reinvent the wheel, on how they will auth to a separate service. Similarly, we can add a platform into the mix, and it would instantly have access to all the existing secret engines and their functionalities. Additionally, the Vault can perform permission to the arbitrary endpoint path provided by secret engines based on the authentication method and policies.

Wrap up

Vault integration for the core component is already ongoing and many GPG secrets have been migrated to Vault. We aim to make service integrations in our data centers, dynamic credentials, and improve CI/CD for Vault. Interested? We’re hiring for security platform engineering!

InsightCloudSec Supports the Recently Updated NSA/CISA Kubernetes Hardening Guide

Post Syndicated from Alon Berger original https://blog.rapid7.com/2022/04/14/insightcloudsec-supports-the-recently-updated-nsa-cisa-kubernetes-hardening-guide/

InsightCloudSec Supports the Recently Updated NSA/CISA Kubernetes Hardening Guide

The National Security Agency (NSA) and the Cybersecurity and Infrastructure Security Agency (CISA) recently updated their Kubernetes Hardening Guide, which was originally published in August 2021.

With the help and feedback received from numerous partners in the cybersecurity community, this guide outlines a strong line of action towards minimizing the chances of potential threats and vulnerabilities within Kubernetes deployments, while adhering to strict compliance requirements and recommendations.

The purpose of the Kubernetes hardening guide

This newly updated guide comes to the aid of multiple teams — including security, DevOps, system administrators, and developers — by focusing on the security challenges associated with setting up, monitoring, and maintaining a Kubernetes cluster. It brings together strategies to help organizations avoid misconfigurations and implement recommended hardening measures by highlighting three main sources of compromise:

  • Supply chain risks: These often occur during the container build cycle or infrastructure acquisition and are more challenging to mitigate.
  • Malicious threat actors: Attackers can exploit vulnerabilities and misconfigurations in components of the Kubernetes architecture, such as the control plane, worker nodes, or containerized applications.
  • Insider threats: These can be administrators, users, or cloud service providers, any of whom may have special access to the organization’s Kubernetes infrastructure.

“This guide focuses on security challenges and suggests hardening strategies for administrators of National Security Systems and critical infrastructure. Although this guide is tailored to National Security Systems and critical infrastructure organizations, NSA and CISA also encourage administrators of federal and state, local, tribal, and territorial (SLTT) government networks to implement the recommendations in this guide,” the authors state.

CIS Benchmarks vs. the Kubernetes Hardening Guide

For many practitioners, the Center for Internet Security (CIS) is the gold standard for security benchmarks; however, their benchmarks are not the only guidance available.

While the CIS is compliance gold, the CIS Benchmarks are very prescriptive and usually offer minimal explanations. In creating their own Kubernetes hardening guidelines, it appears that the NSA and CISA felt there was a need for a higher-level security resource that explained more of the challenges and rationale behind Kubernetes security. In this respect, the two work as perfect complements — you get strategies and rationale with the Kubernetes Hardening Guide and the extremely detailed prescriptive checks and controls enumerated by CIS.

In other words, CIS Benchmarks offer the exact checks you should use, along with recommended settings. The NSA and CISA guide supplements these by explaining challenges and recommendations, why they matter, and detailing how potential attackers look at the attack. In version 1.1, the updates include the latest hardening recommendations necessary to protect and defend against today’s threat actors.

Breaking down the updated guidance

As mentioned, the guide breaks down the Kubernetes threat model into three main sources: supply chain, malicious threat actors, and insider threats. This model reviews threats within the Kubernetes cluster and beyond its boundaries by including underlying infrastructure and surrounding workloads that Kubernetes does not manage.

Via a new compliance pack, InsightCloudSec supports and covers the main sources of compromise for a Kubernetes cluster, as mentioned in the guide. Below are the high-level points of concern, and additional examples of checks and insights, as provided by the InsightCloud Platform:

  • Supply chain: This is where attack vectors are more diverse and hard to tackle. An attacker might manipulate certain elements, services, and other product components. It is crucial to continuously monitor the entire container life cycle, from build to runtime. InsightCloudSec provides security checks to cover the supply chain level, including:

    • Checking that containers are retrieved from known and trusted registries/repositories
    • Checking for container runtime vulnerabilities
  • Kubernetes Pod security: Kubernetes Pods are often used as the attacker’s initial execution point. It is essential to have a strict security policy, in order to prevent or limit the impact of a successful compromise. Examples of relevant checks available in InsightCloudSec include:

    • Non-root containers and “rootless” container engines
      • Reject containers that execute as the root user or allow elevation to root.
      • Check K8s container configuration to use SecurityContext:runAsUser specifying a non-zero user or runAsUser.
      • Deny container features frequently exploited to break out, such as hostPID, hostIPC, hostNetwork, allowedHostPath.
    • Immutable container file systems
      • Where possible, run containers with immutable file systems.
      • Kubernetes administrators can mount secondary read/write file systems for specific directories where applications require write access.
    • Pod security enforcement
      • Harden applications against exploitation using security services such as SELinux®, AppArmor®, and secure computing mode (seccomp).
    • Protecting Pod service account tokens
      • Disable the secret token from being mounted by using the automountServiceAccountToken: false directive in the Pod’s YAML specification.
  • Network separation and hardening: Monitoring the Kubernetes cluster’s networking is key. It holds the communication among containers, Pods, services, and other external components. These resources are not isolated by default and therefore could lead to lateral movement or privilege escalations if not separated and encrypted properly. InsightCloudSec provides checks to validate that the relevant security policies are in place:

    • Namespaces
      • Set up network policies to isolate resources. Pods and services in different namespaces can still communicate with each other unless additional separation is enforced.
    • Network policies
      • Set up network policies to isolate resources. Pods and services in different namespaces can still communicate with each other unless additional separation is enforced.
    • Resource policies
      • Use resource requirements and limits.
    • Control plane hardening
      • Set up TLS encryption.
      • Configure control plane components to use authenticated, encrypted communications using Transport Layer Security (TLS) certificates.
      • Encrypt etcd at rest, and use a separate TLS certificate for communication.
      • Secure the etcd datastore with authentication and role-based access control (RBAC) policies. Set up TLS certificates to enforce Hypertext Transfer Protocol Secure (HTTPS) communication between the etcd server and API servers. Using a separate certificate authority (CA) for etcd may also be beneficial, as it trusts all certificates issued by the root CA by default.
    • Kubernetes Secrets
      • Place all credentials and sensitive information encrypted in Kubernetes Secrets rather than in configuration files
  • Authentication and authorization: Probably the primary mechanisms to leverage toward restricting access to cluster resources are authentication and authorization. There are several configurations that are supported but not enabled by default, such as RBAC controls. InsightCloudSec provides security checks that cover the activity of both users and service accounts, enabling faster detection of any unauthorized behavior:

    • Prohibit the addition of the service token by setting automaticServiceAccountToken or automaticServiceAccounttoken to false.
    • Anonymous requests should be disabled by passing the --anonymous-auth=false option to the API server.
    • Start the API server with the --authorizationmode=RBAC flag in the following command. Leaving authorization-mode flags, such as AlwaysAllow, in place allows all authorization requests, effectively disabling all authorization and limiting the ability to enforce least privilege for access.
  • Audit logging and threat detection: Kubernetes audit logs are a goldmine for security, capturing attributed activity in the cluster and making sure configurations are properly set. The security checks provided by InsightCloudSec ensure that the security audit tools are enabled. In order to keep track of any suspicious activity:

    • Check that the Kubernetes native audit logging configuration is enabled.
    • Check that seccomp: audit mode is enabled. The seccomp tool is disabled by default but can be used to limit a container’s system call abilities, thereby lowering the kernel’s attack surface. Seccomp can also log what calls are being made by using an audit profile.
  • Upgrading and application security practices: Security is an ongoing process, and it is vital to stay up to date with upgrades, updates, and patches not only in Kubernetes, but also in hypervisors, virtualization software, and other plugins. Furthermore, administrators need to make sure they uninstall old and unused components as well, in order to reduce the attack surface and risk of outdated tools. InsightCloudSec provides the checks required for such scenarios, including:

    • Promptly applying security patches and updates
    • Performing periodic vulnerability scans and penetration tests
    • Uninstalling and deleting unused components from the environment

Stay up to date with InsightCloudSec

Announcements like this catch the attention of the cybersecurity community, who want to take advantage of new functionalities and requirements in order to make sure their business is moving forward safely. However, this can often come with a hint of hesitation, as organizations need to ensure their services and settings are used properly and don’t introduce unintended consequences to their environment.

In order to help our customers to continuously stay aligned with the new guidelines, InsightCloudSec is already geared with a new compliance pack that provides additional coverage and support, based on insights that are introduced in the Kubernetes Hardening Guide.

Want to see InsightCloudSec in action? Check it out today.

Additional reading:

Why Security in Kubernetes Isn’t the Same as in Linux: Part 2

Post Syndicated from Sagi Rosenthal original https://blog.rapid7.com/2022/02/07/why-security-in-kubernetes-isnt-the-same-as-in-linux-part-2/

Why Security in Kubernetes Isn't the Same as in Linux: Part 2

Security for Kubernetes might not be quite the same as what you’re used to. In our previous article, we covered why security is so important in both Linux on-premises servers and cloud Kubernetes clusters. We also talked about 3 major aspects of Linux server security — processes, network, and file system — and how they correspond to Kubernetes. So today, we’ll talk more about the security concerns unique to Kubernetes.

Configurations

When trying to secure your infrastructure, you have to start by configuring it well. For example, this might mean disabling all unused features or using allow-policies wherever you can to keep your files, executables, or network available only to the intended entity. Both Linux servers and Kubernetes clusters have known vulnerabilities and recommendations.

One of the famous among these is the Center for Internet Security (CIS) recommendations, which are often used for compliance for insurance. Having a cloud security platform that can help implement these recommendations can be a major boon to your security.

API server

The Kubernetes API server is the admin panel, so to speak, of your cluster. In most deployments, this HTTP server is exposed to the internet. This means that a hacker that finds their way to the API server can have full control over your cluster.

Using the most strict authentication and authorization settings is highly recommended to prevent this. If you can set your cluster to private, with access only allowed from an internal network, you can sleep well at night. And just as with with configurations, you should be aware at all times of who (and what) can have access to which resources and operations in your cluster.

Audit log and other Kubernetes logs

In Kubernetes, there are additional attack vectors using the Kubernetes control plane itself that don’t exist in Linux server security. For example, an attack could call the Kubernetes API to load a new pod you didn’t want.

Kubernetes and cloud providers invest a lot of effort in preventing unauthorized users and machines from doing this. But there is always a chance that one of your employees gets hacked or a badly configured service account has too much power. Kubernetes logs all requests to its audit log so they can be investigated later in case of a breach. Additional logs include the kube-API log or etcd (resources DB) log.

Why Security in Kubernetes Isn't the Same as in Linux: Part 2

Container runtime

Container runtime is also a unique aspect of Kubernetes security. In Kubernetes, each node is actually a virtual Linux server running a container runtime daemon. A container runtime is responsible for managing the images and running and monitoring the containers, their storage and network provisioning, and more. You might be familiar with Docker as a container runtime. In reality, Docker is a company developing multiple container tools, and their container runtime is named containerd. Other container runtimes for Kubernetes include CRI-O, Rocket, and more.

Apart from a whole Linux server or virtual machine that uses its own single operating system, multiple containers are usually running over multiple operating systems that share the same host kernel. Although the operating systems of the containers are minimal, they may still have security holes. And the more holes the merrier for the attacker! Monitoring the container runtime activity can also yield a lot of information about what is going on in the node — what processes are running inside the container, any internal communication that might escape from network monitoring, the data being collected and created, and so on.

Right tools, lower risks

The unique interfaces and engines of Kubernetes can be an additional exposed surface in terms of security, especially when considering the complexity of the system. However, don’t forget that distribution and containerization add to security and help isolate potential malware.

Kubernetes may come with a few new risks to watch out for, but that’s no reason to be scared off. As long as you know what to look for, security for your Kubernetes clusters doesn’t have to be any harder than it was for your Linux servers. And there’s no need to go it alone — not when you can have handy tools like InsightCloudSec, Rapid7’s cloud-native security platform, at your side.

Additional reading

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Why Security in Kubernetes Isn’t the Same as in Linux: Part 1

Post Syndicated from Sagi Rosenthal original https://blog.rapid7.com/2022/01/27/why-security-in-kubernetes-isnt-the-same-as-in-linux-part-1/

Why Security in Kubernetes Isn't the Same as in Linux: Part 1

Kubernetes was first presented in 2014, and it almost entirely changed the way technological and even non-tech companies use infrastructure for running their applications. The Kubernetes platform still feels new and exciting — it has awesome features and can fit most use cases.

But hackers find the combination of new technology and user inexperience that’s just right for their malicious activity. Deploying your product on a Kubernetes cluster has a different security cost than on a traditional Linux server.

What are the risks of using Kubernetes?

The risks of a Kubernetes (K8s) deployment are actually the same as in traditional Linux servers. Most of them can be summed up to these 3 targets:

  • Denial of Service (DoS): These kinds of attacks want your service down. They can be caused in many different ways, including distributed denial of service (DDoS) attacks or SQL injections that erase your databases. As there is no direct profit to the attacker, these attacks are of most interest to malicious groups who disagree with your company values or products, or to your competitors.
Why Security in Kubernetes Isn't the Same as in Linux: Part 1
  • Information exfiltration: Another type of attack targets the information you hold. These attacks can collect your information, like your profits, source codes, names of employees, and so on. Or they can collect private data about your customers and users — who they are, their credit card numbers, health state, financial assets, and everything you know about them. None of this is data you want to be known outside the company.
  • Hardware hijack: A hardware hijack is any type of attack that runs a malicious code on your compute resources and causes them to operate in a way that you did not program or intend them to run. Most of these attacks are related to cryptocurrency. They typically either turn your CPU/GPU to Bitcoin miner or conduct a ransomware attack by encrypting your file system and requesting you to pay ransom (usually in Bitcoin) to unencrypt it. As the important point here for attackers is profit, not the identity of the victim, these attacks usually originate from bots or automatic scripts, rather than with dedicated special operations of malicious groups or individuals.

How can you defend yourself?

Securing deployments and identifying malicious activity on Kubernetes clusters is similar to how it’s done on traditional Linux servers. Most of the differences are in the implementation itself. But there are some distinctions worth mentioning. Let’s focus in on the operating system aspects of security.

Processes and system calls

“The system call is the fundamental interface between an application and the Linux kernel.” —Linux manual page

Linux has over 400 different system calls. These can be used for requesting to read a file, executing another program, sending a network message, and more. As you’ve probably guessed, these operations can be risky when used by unwanted programs.

The Linux Kernel has security mechanisms against malware, but it isn’t fully protected. So system calls may seem legitimate even when they aren’t. Tracking these system calls can give good insight on what a process does. In native Linux, it can be easy to track these down from a single point on the server. However, in K8s Linux nodes, the distribution, dynamics, and containerization makes this mission a very complex task.

Network security

The internet connection is your face for the customers, but it’s also the entry point for various malicious software into your infrastructure. Luckily, the big cloud providers and most of the internet-facing frameworks are well-protected against these attacks. But nothing is 100% safe. Moreover, some of the images you are using may contain security holes themselves. These can cause a malicious program to initiate from inside your cluster.

Tracking the network from the inside out can give you a lot of information on malicious activity. But you also have to consider the “east-to-west” traffic inside your infrastructure — the internal communication. In traditional Linux servers and VMs, you know exactly which microservices exist and define firewall services accordingly. However, in Kubernetes, the dynamic nature of the pods and resources makes it hard to track this traffic, so it can be difficult to find the network holes.

Why Security in Kubernetes Isn't the Same as in Linux: Part 1

File system

It may seem easy to detect new files and file changes in order to determine unwanted changes, but tracking and analyzing your whole file system can be a large, complex task in Linux servers. They can have terabytes of storage, and reading them — especially from a magnetic hard disk — isn’t fast enough to detect malicious activity when it happens. However, the containerization concept of Kubernetes can be to our advantage here, as container images are usually small, lightweight, and repetitive. Looking inside the containers files should have highly expected results.

More to come

This is one of two articles covering the detective resources that can help us identify unwanted activity in your Kubernetes clusters. In this first part, we saw that both Kubernetes and traditional Linux servers have vulnerabilities that originate in the processes, network, or file system. However, there are differences in how to monitor malicious activity in Kubernetes versus Linux. Some vulnerabilities may seem harder to defend in Kubernetes, but most of them are actually easier.

InsightCloudSec, Rapid7’s cloud-native security platform, covers these differences and ensures your on-premises server farm is secured.

The next article will explain further about the unique aspects of Kubernetes security that do not exist in traditional Linux servers. Stay tuned!

Additional reading

Kubernetes Guardrails: Bringing DevOps and Security Together on Cloud

Stay Ahead of Threats With Cloud Workload Protection

Make Room for Cloud Security in Your 2022 Budget

Time to Act: Bridging the Gap in Cloud Automation Adoption

NEVER MISS A BLOG

Get the latest stories, expertise, and news about security today.

Efficiently Scaling kOps clusters with Amazon EC2 Spot Instances

Post Syndicated from Pranaya Anshu original https://aws.amazon.com/blogs/compute/efficiently-scaling-kops-clusters-with-amazon-ec2-spot-instances/

This post is written by Carlos Manzanedo Rueda, WW SA Leader for EC2 Spot, and Brandon Wagner, Senior Software Development Engineer for EC2.

This post focuses on how you can leverage recently released tools to optimize your usage of Amazon EC2 Spot Instances on Kubernetes Operations (kOps) clusters. Spot Instances let you utilize unused capacity in the AWS cloud for up to 90% off compared to On-Demand prices, and they are a great fit for fault-tolerant, containerized applications. kOps is an open source project providing a cohesive toolset for provisioning, operating, and deleting Kubernetes clusters in the cloud.

Even with customers such as Snap Inc., Babylon Health, and Fidelity Investments telling us how Amazon Elastic Kubernetes Service (EKS) is essential for running their containerized workloads, we appreciate that there are scenarios where using Amazon EC2 instances and kOps are a viable alternative. At AWS, we understand “one size does not fit all.” While we encourage Kubernetes users to contribute their feedback to the AWS container roadmap so that we can improve our services, we also would like to reduce heavy lifting and simplify Spot best practices integration in kOps clusters.

To simplify the integration of Spot Instances in kOps clusters, in January of 2021 we introduced a new kops toolbox command: kops toolbox instance-selector. The utility is distributed as part of the standard kOps distribution. Moreover, it simplifies the creation of kOps Instance Groups by configuring them with full adherence to Spot Instances best practices.

Handling Spot interruption notifications in Kubernetes

Let’s quickly recap Spot best practices. Spot Instances perform exactly like any other EC2 Instances, except that in exchange for their discounted price, they can be interrupted with a two-minute warning when EC2 must reclaim capacity. Applications running on Spot can typically recover from transient interruptions by simply starting a new instance. Spot best practices involve measures such as diversifying into as many Spot capacity pools as possible, choosing the right Spot allocation strategy, and utilizing Spot integrated services. These handle the Spot Instances lifecycles for you. This blog post on handling Spot interruptions dives deeper into AWS’s EC2 Spot best practices.

In Kubernetes, to handle spot termination and rebalance recommendation events (both explained in this blog post on proactively managing Spot Instance lifecycle), we utilize the AWS open-source project AWS Node Termination Handler. We will be deploying the Node Termination Handler as a kOps managed addon, which simplifies its setup and configuration.

The Node Termination Handler ensures that the Kubernetes control plane responds appropriately to events that can make EC2 instances unavailable. It can be operated in two different modes: Instance Metadata Service (IMDS), deployed as a DaemonSet, or Queue Processor, deployed as a Deployment Controller. We recommend running it in Queue Processor mode. The Queue Processor controller continuously monitors an Amazon Simple Queue Service (SQS) queue for events received from Amazon EventBridge. This can lead to node termination in your cluster. When one of these events is received, the Node Termination Handler notifies the Kubernetes control plane to cordon and drain the node that is about to be interrupted. Then, the kubelet sends a SIGTERM signal to the Pods and containers running on the node. This lets your application proceed with a graceful termination – one of the recommended best practices of a Twelve-Factor App.

The kOps managed addon will let you configure the Node Termination Handler within your kOps cluster spec and, more importantly, manage provisioning the necessary infrastructure for you.

To deploy the AWS Node Termination Handler, we start by editing our cluster spec:

kops edit cluster --name ${KOPS_CLUSTER_NAME}

We append the nodeTerminationHandler configuration to the spec node:

spec:
  nodeTerminationHandler:
    enabled: true
    enableSQSTerminationDraining: true
    managedASGTag: "aws-node-termination-handler/managed"

Finally, we deploy the changes made to our cluster configuration:

kops update cluster --name ${KOPS_CLUSTER_NAME} –-state {KOPS_STATE_STORE} --yes --admin

${KOPS_CLUSTER_NAME} refers to the environment variable containing the cluster name, and ${KOPS_STATE_STORE} indicates the Amazon Simple Storage Service (S3) bucket – or kOps State Store – where kOps configuration is stored.

To check that your Node Termination Handler deployment was successful, you can execute:

kops get deployment aws-node-termination-handler -n kube-system

Instance Flexibility and Diversification

Diversification and selection of multiple instances types is essential to acquire and maintain Spot capacity, as well as to successfully replace interrupted instances with others from different pools. When running kOps on AWS, this is implemented by utilizing Amazon EC2 Auto Scaling. Amazon Auto Scaling group’s capacity-optimized allocation strategy ensures that Spot capacity is provisioned from the optimal pools, thereby reducing the chances of Spot terminations.

Simplifying adoption of Spot Best practices on kOps

Before the kops toolbox instance-selector, you would have to setup Spot best practices on kOps manually. This involved writing a stub file following the InstanceGroup specification and examples, and then implementing every best practice, including finding every pool that qualifies for our workload.

The new functionality in kops toolbox instance-selector simplifies InstanceGroup creation by moving the focus of kOps users and administrators from this manual configuration over to simply selecting the vCPUs and Memory requirements for their application (or a base instance type), and then letting kops toolbox instance-selector define the right configuration. Behind the scenes, it utilizes a library allowing it to plug into the feature-set of Amazon EC2 instance selector. At its core, ec2 instance selector helps you select compatible instance types for your application to run on. Utilize ec2 instance selector CLI or library when automating your configurations. In the case of kOps, the integration already comes in the kops toolbox.

For example, let’s say your cluster runs stateless, fault tolerant applications that are CPU/Memory bound and have a ratio of vCPU to Memory requiring at least 1vCPU : 4GB of RAM. You can run the following command in order to acquire cluster spot capacity:

kops toolbox instance-selector "spot-group-" \
  --usage-class spot --flexible --cluster-autoscaler \
  --vcpus-to-memory-ratio="1:4" \
  --ig-count 2

Let’s focus first on the command, and later cover its output. You can get a list of parameters and default values by running: kops toolbox instance-selector –help. A few default parameters weren’t passed in the command above, but they will be set to sane defaults, such as the maximum and minimum number of instances in the Instance Group. The parameter –flexible refers to our request to provide a group of flexible instance types spanning multiple generations.

Once you’ve defined the InstanceGroups, start them up by using the command:

kops update cluster \
–state=${KOPS_STATE_STORE} \
–name=${KOPS_CLUSTER_NAME} \
–yes –admin

The two commands above define and create a request for spot capacity from a flexible and diversified pool set, which meet the criteria to provide at least 4GB of RAM for each vCPU. The command creates not just one, but two node groups named “spot-group-1” and “spot-group-2” (–ig-count 2).

Now, let’s check the contents of the configuration file generated by kops toolbox instance-selector. To preview a configuration without making changes, add –dry-run –output yaml.

apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: "2020-08-11T10:22:16Z"
  labels:
    kops.k8s.io/cluster: spot-kops-cluster.k8s.local
  name: spot-group-1
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: "1"
    k8s.io/cluster-autoscaler/spot-kops-cluster.k8s.local: "1"
    kops.k8s.io/instance-selector: "1"
  image: 099720109477/ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-20200716
  machineType: m3.xlarge
  maxSize: 15
  minSize: 2
  mixedInstancesPolicy:
    instances:
    - m3.xlarge
    - m4.xlarge
    - m5.xlarge
    - m5a.xlarge
    - t2.xlarge
    - t3.xlarge
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: capacity-optimized
  nodeLabels:
    kops.k8s.io/instancegroup: spot-group-1
  role: Node
  subnets:
  - eu-west-1a
  - eu-west-1b
  - eu-west-1c
...

The configuration above lists one of the groups created by kops toolbox instance-selector in the previous example. The second group will have a very similar make-up and format, except that it will refer to instances such as: r3.xlarge, r4.xlarge, r5.xlarge, and r5a.xlarge in the mixedInstancesPolicy section. By defining the parameter –usage-class to Spot, the configuration created by kops toolbox instance-selector will add the tags identifying this Auto Scaling group as a Spot group. When the nodes are initialized, kOps controller will identify the nodes as Spot and add the label node-role.kubernetes.io/spot-worker=true. Therefore, at a later stage, we can apply placement logic to our cluster by using nodeSelector and affinity. The configuration above adheres to the definition of kOps support for mixed Instance Groups in AWS, and adds all of the right cloudLabels in order to integrate and implement not only with Spot best practices, but also with Cluster Autoscaler Auto-Discovery configuration best practices.

Kubernetes Cluster Autoscaler is a Kubernetes controller that dynamically adjusts the cluster size. According to a 2020 survey by Cloud Native Computing Foundation (CNCF), 70% of Kubernetes workloads plan to autoscale their stateless applications. Dynamically scaling applications and clusters is also a great practice for optimizing your system costs in situations where capacity is unnecessary, as well as for scaling out accordingly in order to meet business demands. If there are Pods that can’t be scheduled due to insufficient resources, then Cluster Autoscaler will issue a Scale-out action. When there are nodes in the cluster that have been under-utilized for a configurable period of time, Cluster Autoscaler will Scale-in the cluster, and even down-scale to 0 instances when applications don’t need to be run.

On Scale-out operations, Cluster Autoscaler evaluates a set of node groups. When Cluster Autoscaler runs on AWS, node groups are implemented by using Auto Scaling groups (referring to the same instance group as a kOps Instance Group). Therefore, to calculate the number of nodes to scale-out, Cluster Autoscaler assumes that every instance in a node group has the same number of vCPUs and memory size.

By creating two node groups, you apply two diversification levels. You diversify within each node group by using an Auto Scaling group with Mixed Instance Policies and capacity-optimized allocation strategy. Then, to increase the pool range you can leverage, you add more than one node group, while still adhering to the best practices required by Cluster Autoscaler.

While we’ve been focusing on Spot Instances, the parameter –usage-class can be utilized to get OnDemand instances instead of Spot. In the next example, let’s say we would like to get On-Demand capacity in order to train complex deep learning models that will take hours to run. To train our models, we need instances that have at least one GPU with 16GB of RAM on instances that have at least 32GB Ram and 8 vCPUs.

kops toolbox instance-selector "ondemand-gpu-group" \
  --gpus-min 1 --gpu-memory-total-min 16gb --memory-min 32gb --vcpus 8\
  --node-count-max 4 --node-count-min 4 --cpu-architecture amd64

The command above, followed by kops update cluster –state=${KOPS_STATE_STORE} –name=${KOPS_CLUSTER_NAME} –yes can be utilized to produce a configuration and create a nodegroup with the right requirements. This could be created at the start of the training procedure, and then – once the training is done and the capacity is no longer needed – you could automate the nodegroup removal with the following command:

kops delete instancegroup ondemand-gpu-group --name ${KOPS_CLUSTER_NAME} –yes

Conclusions

We believe the best way to run Kubernetes on AWS is by using Amazon EKS. However, scenarios may exist where kOps is utilized in AWS. By using the kOps managed add-on to install aws-node-termination-handler and kops toolbox instance-selector, it is easier than ever to apply Spot best practices to Kubernetes workloads on kOps, and cost-optimize fault-tolerant, stateless applications. These tools let kOps workloads gracefully terminate applications, as well as proactively handle the replacement of instances that are at an elevated risk of termination. kops toolbox instance-selector leverages Amazon ec2-instance-selector in order to simplify the creation of Instance Group configurations adhering to Spot Instances best practices, implementing instance type flexibility, and utilizing capacity-optimized allocation strategy.

By adhering to these best practices to reduce the frequency of Spot interruptions, we will optimize not only the cost, but also our Spot Instances selection. This will enable us to acquire capacity at a massive scale if necessary.

To start using the tools we have described, follow along this step-by-step tutorial. Also, head over to the kops toolbox documentation to learn more about the ways in which you can use it.

Continuous runtime security monitoring with AWS Security Hub and Falco

Post Syndicated from Rajarshi Das original https://aws.amazon.com/blogs/security/continuous-runtime-security-monitoring-with-aws-security-hub-and-falco/

Customers want a single and comprehensive view of the security posture of their workloads. Runtime security event monitoring is important to building secure, operationally excellent, and reliable workloads, especially in environments that run containers and container orchestration platforms. In this blog post, we show you how to use services such as AWS Security Hub and Falco, a Cloud Native Computing Foundation project, to build a continuous runtime security monitoring solution.

With the solution in place, you can collect runtime security findings from multiple AWS accounts running one or more workloads on AWS container orchestration platforms, such as Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS). The solution collates the findings across those accounts into a designated account where you can view the security posture across accounts and workloads.

 

Solution overview

Security Hub collects security findings from other AWS services using a standardized AWS Security Findings Format (ASFF). Falco provides the ability to detect security events at runtime for containers. Partner integrations like Falco are also available on Security Hub and use ASFF. Security Hub provides a custom integrations feature using ASFF to enable collection and aggregation of findings that are generated by custom security products.

The solution in this blog post uses AWS FireLens, Amazon CloudWatch Logs, and AWS Lambda to enrich logs from Falco and populate Security Hub.

Figure : Architecture diagram of continuous runtime security monitoring

Figure 1: Architecture diagram of continuous runtime security monitoring

Here’s how the solution works, as shown in Figure 1:

  1. An AWS account is running a workload on Amazon EKS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch logs using AWS FireLens.
    2. CloudWatch logs act as the source for FireLens and a trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon EKS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as a member account for Security Hub.
  2. Another AWS account is running a workload on Amazon ECS.
    1. Runtime security events detected by Falco for that workload are sent to CloudWatch logs using AWS FireLens.
    2. CloudWatch logs acts as the source for FireLens and a trigger for the Lambda function in the next step.
    3. The Lambda function transforms the logs into the ASFF. These findings can now be imported into Security Hub.
    4. The Security Hub instance that is running in the same account as the workload running on Amazon ECS stores and processes the findings provided by Lambda and provides the security posture to users of the account. This instance also acts as another member account for Security Hub.
  3. The designated Security Hub administrator account combines the findings generated by the two member accounts, and then provides a comprehensive view of security alerts and security posture across AWS accounts. If your workloads span multiple regions, Security Hub supports aggregating findings across Regions.

 

Prerequisites

For this walkthrough, you should have the following in place:

  1. Three AWS accounts.

    Note: We recommend three accounts so you can experience Security Hub’s support for a multi-account setup. However, you can use a single AWS account instead to host the Amazon ECS and Amazon EKS workloads, and send findings to Security Hub in the same account. If you are using a single account, skip the following account specific-guidance. If you are integrated with AWS Organizations, the designated Security Hub administrator account will automatically have access to the member accounts.

  2. Security Hub set up with an administrator account on one account.
  3. Security Hub set up with member accounts on two accounts: one account to host the Amazon EKS workload, and one account to host the Amazon ECS workload.
  4. Falco set up on the Amazon EKS and Amazon ECS clusters, with logs routed to CloudWatch Logs using FireLens. For instructions on how to do this, see:

    Important: Take note of the names of the CloudWatch Logs groups, as you will need them in the next section.

  5. AWS Cloud Development Kit (CDK) installed on the member accounts to deploy the solution that provides the custom integration between Falco and Security Hub.

 

Deploying the solution

In this section, you will learn how to deploy the solution and enable the CloudWatch Logs group. Enabling the CloudWatch Logs group is the trigger for running the Lambda function in both member accounts.

To deploy this solution in your own account

  1. Clone the aws-securityhub-falco-ecs-eks-integration GitHub repository by running the following command.
    $git clone https://github.com/aws-samples/aws-securityhub-falco-ecs-eks-integration
  2. Follow the instructions in the README file provided on GitHub to build and deploy the solution. Make sure that you deploy the solution to the accounts hosting the Amazon EKS and Amazon ECS clusters.
  3. Navigate to the AWS Lambda console and confirm that you see the newly created Lambda function. You will use this function in the next section.
Figure : Lambda function for Falco integration with Security Hub

Figure 2: Lambda function for Falco integration with Security Hub

To enable the CloudWatch Logs group

  1. In the AWS Management Console, select the Lambda function shown in Figure 2—AwsSecurityhubFalcoEcsEksln-lambdafunction—and then, on the Function overview screen, select + Add trigger.
  2. On the Add trigger screen, provide the following information and then select Add, as shown in Figure 3.
    • Trigger configuration – From the drop-down, select CloudWatch logs.
    • Log group – Choose the Log group you noted in Step 4 of the Prerequisites. In our setup, the log group for the Amazon ECS and Amazon EKS clusters, deployed in separate AWS accounts, was set with the same value (falco).
    • Filter name – Provide a name for the filter. In our example, we used the name falco.
    • Filter pattern – optional – Leave this field blank.
    Figure 3: Lambda function trigger - CloudWatch Log group

    Figure 3: Lambda function trigger – CloudWatch Log group

  3. Repeat these steps (as applicable) to set up the trigger for the Lambda function deployed in other accounts.

 

Testing the deployment

Now that you’ve deployed the solution, you will verify that it’s working.

With the default rules, Falco generates alerts for activities such as:

  • An attempt to write to a file below the /etc folder. The /etc folder contains important system configuration files.
  • An attempt to open a sensitive file (such as /etc/shadow) for reading.

To test your deployment, you will attempt to perform these activities to generate Falco alerts that are reported as Security Hub findings in the same account. Then you will review the findings.

To test the deployment in member account 1

  1. Run the following commands to trigger an alert in member account 1, which is running an Amazon EKS cluster. Replace <container_name> with your own value.
    kubectl exec -it <container_name> /bin/bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. To see the list of findings, log in to your Security Hub admin account and navigate to Security Hub > Findings. As shown in Figure 4, you will see the alerts generated by Falco, including the Falco-generated title, and the instance where the alert was triggered.

    Figure 4: Findings in Security Hub

    Figure 4: Findings in Security Hub

  3. To see more detail about a finding, check the box next to the finding. Figure 5 shows some of the details for the finding Read sensitive file untrusted.
    Figure 5: Sensitive file read finding - detail view

    Figure 5: Sensitive file read finding – detail view

    Figure 6 shows the Resources section of this finding, that includes the instance ID of the Amazon EKS cluster node. In our example this is the Amazon Elastic Compute Cloud (Amazon EC2) instance.

    Figure 6: Resource Detail in Security Hub finding

To test the deployment in member account 2

  1. Run the following commands to trigger a Falco alert in member account 2, which is running an Amazon ECS cluster. Replace <<container_id> with your own value.
    docker exec -it <container_id> bash
    touch /etc/5
    cat /etc/shadow > /dev/null
  2. As in the preceding example with member account 1, to view the findings related to this alert, navigate to your Security Hub admin account and select Findings.

To view the collated findings from both member accounts in Security Hub

  1. In the designated Security Hub administrator account, navigate to Security Hub > Findings. The findings from both member accounts are collated in the designated Security Hub administrator account. You can use this centralized account to view the security posture across accounts and workloads. Figure 7 shows two findings, one from each member account, viewable in the Single Pane of Glass administrator account.

    Figure 7: Write below /etc findings in a single view

    Figure 7: Write below /etc findings in a single view

  2. To see more information and a link to the corresponding member account where the finding was generated, check the box next to the finding. Figure 8 shows the account detail associated with a specific finding in member account 1.
    Figure 8: Write under /etc detail view in Security Hub admin account

    Figure 8: Write under /etc detail view in Security Hub admin account

    By centralizing and enriching the findings from Falco, you can take action more quickly or perform automated remediation on the impacted resources.

 

Cleaning up

To clean up this demo:

  1. Delete the CloudWatch Logs trigger from the Lambda functions that were created in the section To enable the CloudWatch Logs group.
  2. Delete the Lambda functions by deleting the CloudFormation stack, created in the section To deploy this solution in your own account.
  3. Delete the Amazon EKS and Amazon ECS clusters created as part of the Prerequisites.

 

Conclusion

In this post, you learned how to achieve multi-account continuous runtime security monitoring for container-based workloads running on Amazon EKS and Amazon ECS. This is achieved by creating a custom integration between Falco and Security Hub.

You can extend this solution in a number of ways. For example:

  • You can forward findings across accounts using a single source to security information and event management (SIEM) tools such as Splunk.
  • You can perform automated remediation activities based on the findings generated, using Lambda.

To learn more about managing a centralized Security Hub administrator account, see Managing administrator and member accounts. To learn more about working with ASFF, see AWS Security Finding Format (ASFF) in the documentation. To learn more about the Falco engine and rule structure, see the Falco documentation.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.

Rajarshi Das

Rajarshi Das

Rajarshi is a Solutions Architect at Amazon Web Services. He focuses on helping Public Sector customers accelerate their security and compliance certifications and authorizations by architecting secure and scalable solutions. Rajarshi holds 4 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialist.

Author

Adam Cerini

Adam is a Senior Solutions Architect with Amazon Web Services. He focuses on helping Public Sector customers architect scalable, secure, and cost effective systems. Adam holds 5 AWS certifications including AWS Certified Solutions Architect – Professional and AWS Certified Security – Specialist.

Stay Ahead of Threats With Cloud Workload Protection

Post Syndicated from Alon Berger original https://blog.rapid7.com/2021/12/10/stay-ahead-of-threats-with-cloud-workload-protection/

Stay Ahead of Threats With Cloud Workload Protection

When it comes to cloud-native applications, optimal security requires a modern, integrated, and automated approach that starts in development and extends to runtime protection. Cloud workload protection (CWP) helps make that goal possible by bringing major structural changes to software development and enhancing security across all processes.

Assessing workload risk in the cloud

Both the rise of cloud proliferation and the high speed of deployments can make distilling down the necessary cloud security controls an overwhelming challenge. Add to the mix the ever-evolving threat landscape, and the measures you take can literally make or break your cloud deployments, including the security of your workloads.

The increasing distribution and complexity of cloud-native applications across VMs, hosts, Kubernetes, and multiple vendors requires an end-to-end, consistent workload protection platform that unifies both CSPM and CWPP functionalities, thus enabling a holistic approach for protecting valuable assets in the cloud.

How Rapid7 is changing cloud workload protection

In order to get unmanaged risk under control, Rapid7 is on a mission to help drive cloud security forward, both within individual organizations and as an entire industry.

This is why Rapid7 recently introduced InsightCloudSec, an entire division dedicated solely to cloud security and all it encompasses.

In its most recent launch, InsightCloudSec brings forward a series of functionalities that bolsters our ability to help our customers protect their cloud workloads and deployments by providing a fully integrated, cloud-native security solution at scale. These improvements include:

  • Enhancing risk assessment of Kubernetes and containers
  • Enabling developers to scan code from the CLI on their machines
  • Expanding automation based on event-driven detections in multi-cloud environments
  • Providing unified visibility and robust context across multi-cloud environments
  • Automating workflows so organizations can gain maximum efficiency

3 keys to consolidating cloud risk assessment

In an effort to help this emerging market become more mainstream and easier to operationalize, we believe there are 3 main things that organizations need to be able to do when it comes to cloud security.

1. Shift left

Prevent problems before they happen by providing a single, consistent set of security checks throughout the CI/CD pipeline to uncover misconfigurations and policy violations without delaying deployment. Not only does this help solve issues at their root cause and prevent them from happening over and over again, but it also makes for a better working relationship between the security team and the DevOps organization that is trying to move fast and innovate. By shifting left, organizations save money, and security teams are able to give developers the information and tools they need to make the right decisions as early as possible, avoiding delays later in the deployment or operationalizing stages of the CI/CD pipeline.

2. Reduce noise

Security teams need more context and simpler insights so they can actually understand the top risks in their environment. By unifying visibility across the entire cloud footprint, normalizing the terminology across each different cloud environment, and then providing rich context about interconnected assets, security teams can vastly simplify risk assessment and decision-making across even the most complex cloud and container environments.

3. Automate workflows

Finally, the ephemeral nature and speed of change in cloud environments has outstripped the human capability to manage and remediate issues manually. This means organizations need to automate DevSecOps best practices by leveraging precise automation that speeds up remediation, reduces busywork, and allows the security team to focus on the bigger picture.

By bringing together enhanced risk assessment of Kubernetes and containers, shifting further left with a CLI integration, and expanding event-based detections into the cloud-native security platform, Rapid7 is making it easier for teams to consolidate visibility and maintain consistent controls across even the most complex cloud environments.

Stay ahead of security in the modern threat landscape by ensuring cloud security as an ongoing process, and reduce your attack surface by building the necessary security measures early in an application’s life cycle.

Kubernetes Guardrails: Bringing DevOps and Security Together on Cloud

Post Syndicated from Alon Berger original https://blog.rapid7.com/2021/12/06/kubernetes-guardrails-bringing-devops-and-security-together-on-cloud/

Kubernetes Guardrails: Bringing DevOps and Security Together on Cloud

Cloud and container technologies are being increasingly embraced by organizations around the globe because of the efficiency, superior visibility, and control they provide to DevOps and IT teams.

While DevOps teams see the benefits of cloud and container solutions, these tools create a learning curve for their security colleagues. Because of this, security teams often want to slow down adoption while they figure out a strategy for maintaining security and compliance in these new fast-moving environments.

Container and Kubernetes (K8s) environments are already fairly complex as it is, and layering multiple additional security tools into the mix makes it even more challenging from a management perspective. Organizations need to find a way to enable their DevOps teams to move quickly and take advantage of the benefits of containers and K8s, while staying within the parameters the security team needs to maintain compliance with organizational policy.

This challenge goes beyond technology. These teams need to find a solution that allows them to work together well, doesn’t over-complicate their working relationship, and lets both sides get what they want with minimal overhead.

A holistic approach to Kubernetes security

As an open-source container orchestration system for automating deployment, scaling, and management of containerized applications, Kubernetes is extremely powerful. However, organizations must carefully balance their eagerness to embrace the dynamic, self-service nature of Kubernetes with the real-life need to manage and mitigate security and compliance risk.

Rapid7’s recent introduction of InsightCloudSec intelligently unifies both CSPM and CWPP functionalities, thus enabling a holistic approach for protecting valuable assets in the cloud — one that includes Kubernetes and workload security.

Learn more about InsightCloudSec here

Built for DevOps, trusted by security

In retrospect, 2020 was a tipping point for the Kubernetes community, with a massive increase in adoption across the globe. Many companies, seeking an efficient, cost-effective way to make this huge shift to the cloud, turned to Kubernetes. But this in turn created a growing need to remove Kubernetes security blind spots. For this reason, we’ve introduced Kubernetes Guardrails.

With Kubernetes Security Guardrails, organizations are equipped with a multi-cluster vulnerability scanner that covers rich Kubernetes security best practices and compliance policies, such as CIS Benchmarks. As part of Rapid7’s InsightCloudSec solution, this new capability introduces a platform-based and easy-to-maintain solution for Kubernetes security that is deployed in minutes and is fully streamlined in the Kubernetes pipeline.

Securing Kubernetes with InsightCloudSec

Kubernetes Security Guardrails is the most comprehensive solution for all relevant Kubernetes security requirements, designed from a DevOps perspective with in-depth visibility for security teams.

InsightCloudSec is designed to be an agentless state machine, seamlessly applied to any computing environment — public cloud or private software-defined infrastructure.

InsightCloudSec continually interacts with the APIs to gather information about the state of the hosts and the Kubernetes clusters of interest. These hosts can be GCP, AWS, Azure, or a private data center that can expose infrastructure information via an API.

Integrated within minutes, the Kubernetes Guardrails functionality simplifies the security assessment for the entire Kubernetes environment and the CI/CD pipeline, while also creating baseline profiles for each cluster, and highlighting and scoring security risks, misconfigurations, and hygiene drifts.

Both DevOps and Security teams enjoy the continuous and dynamic analysis of their Kubernetes deployments, all while seamlessly complying with regulatory requirements for Kubernetes.

With Kubernetes Guardrails, Dev teams are able to create a snapshot of cluster risks, delivered with a detailed list of misconfigurations, while detecting real-time hygiene and conformance drifts for deployments running on any cloud environment. Some of the most common use cases include:

  • Kubernetes vulnerability scanning
  • Hunting misplaced secrets and excessive secret access
  • Workload hardening (from pod security to network policies)
  • Istio security and configuration best practices
  • Ingress controllers security
  • Kubernetes API server access privileges
  • Kubernetes operators best practices
  • RBAC controls and misconfigurations

Ready to drive cloud security forward?

Rapid7 is proud to introduce a Kubernetes security solution that encapsulates all-in-one capabilities and unmatched coverage for all things Kubernetes.

With a security-first approach and strict compliance adherence, Kubernetes Guardrails enable a better understanding and control over distributed projects, and help organizations maintain smooth business operations.

Want to learn more? Watch the on-demand webinar on InsightCloudSec and its Kubernetes protection.

Optimizing Apache Flink on Amazon EKS using Amazon EC2 Spot Instances

Post Syndicated from Emma White original https://aws.amazon.com/blogs/compute/optimizing-apache-flink-on-amazon-eks-using-amazon-ec2-spot-instances/

This post is written by Kinnar Sen, Senior EC2 Spot Specialist Solutions Architect

Apache Flink is a distributed data processing engine for stateful computations for both batch and stream data sources. Flink supports event time semantics for out-of-order events, exactly-once semantics, backpressure control, and optimized APIs. Flink has connectors for third-party data sources and AWS Services, such as Apache Kafka, Apache NiFi, Amazon Kinesis, and Amazon MSK. Flink can be used for Event Driven (Fraud Detection), Data Analytics (Ad-Hoc Analysis), and Data Pipeline (Continuous ETL) applications. Amazon Elastic Kubernetes Service (Amazon EKS) is the chosen deployment option for many AWS customers for Big Data frameworks such as Apache Spark and Apache Flink. Flink has native integration with Kubernetes allowing direct deployment and dynamic resource allocation.

In this post, I illustrate the deployment of scalable, highly available (HA), resilient, and cost optimized Flink application using Kubernetes via Amazon EKS and Amazon EC2 Spot Instances (Spot). Learn how to save money on big data streaming workloads by implementing this solution.

Overview

Amazon EC2 Spot Instances

Amazon EC2 Spot Instances let you take advantage of spare EC2 capacity in the AWS Cloud and are available at up to a 90% discount compared to On-Demand Instances. Spot Instances receive a two-minute warning when these instances are about to be reclaimed by Amazon EC2. There are many graceful ways to handle the interruption. Recently EC2 Instance rebalance recommendation has been added to send proactive notifications when a Spot Instance is at elevated risk of interruption. Spot Instances are a great way to scale up and increase throughput of Big Data workloads and has been adopted by many customers.

Apache Flink and Kubernetes

Apache Flink is an adaptable framework and it allows multiple deployment options and one of them being Kubernetes. Flink framework has a couple of key building blocks.

  • Job Client submits the job in form of a JobGraph to the Job Manager.
  • Job Manager plays the role of central work coordinator which distributes the job to the Task Managers.
  • Task Managers are the worker component, which runs the operators for source, transformations and sinks.
  • External components which are optional such as Resource Provider, HA Service Provider, Application Data Source, Sinks etc., and this varies with the deployment mode and options.

Image shows Flink application deployment architecture with Job Manager, Task Manager, Scheduler, Data Flow Graph, and client.

Flink supports different deployment (Resource Provider) modes when running on Kubernetes. In this blog we will use the Standalone Deployment mode, as we just want to showcase the functionality. We recommend first-time users however to deploy Flink on Kubernetes using the Native Kubernetes Deployment.

Flink can be run in different modes such as Session, Application, and Per-Job. The modes differ in cluster lifecycle, resource isolation and execution of the main() method. Flink can run jobs on Kubernetes via Application and Session Modes only.

  • Application Mode: This is a lightweight and scalable way to submit an application on Flink and is the preferred way to launch application as it supports better resource isolation. Resource isolation is achieved by running a cluster per job. Once the application shuts down all the Flink components are cleaned up.
  • Session Mode: This is a long running Kubernetes deployment of Flink. Multiple applications can be launched on a cluster and the applications competes for the resources. There may be multiple jobs running on a TaskManager in parallel. Its main advantage is that it saves time on spinning up a new Flink cluster for new jobs, however if one of the Task Managers fails it may impact all the jobs running on that.

Amazon EKS

Amazon EKS is a fully managed Kubernetes service. EKS supports creating and managing Spot Instances using Amazon EKS managed node groups following Spot best practices. This enables you to take advantage of the steep savings and scale that Spot Instances provide for interruptible workloads. EKS-managed node groups require less operational effort compared to using self-managed nodes. You can learn more in the blog “Amazon EKS now supports provisioning and managing EC2 Spot Instances in managed node groups.”

Apache Flink and Spot

Big Data frameworks like Spark and Flink are distributed to manage and process high volumes of data. Designed for failure, they can run on machines with different configurations, inherently resilient and flexible. Spot Instances can optimize runtimes by increasing throughput, while spending the same (or less). Flink can tolerate interruptions using restart and failover strategies.

Fault Tolerance

Fault tolerance is implemented in Flink with the help of check-pointing the state. Checkpoints allow Flink to recover state and positions in the streams. There are two per-requisites for check-pointing a persistent data source (Apache Kafka, Amazon Kinesis) which has the ability to replay data and a persistent distributed storage to store state (Amazon Simple Storage Service (Amazon S3), HDFS).

Cost Optimization

Job Manager and Task Manager are key building blocks of Flink. The Task Manager is the compute intensive part and Job Manager is the orchestrator. We would be running Task Manager on Spot Instances and Job Manager on On Demand Instances.

Scaling

Flink supports elastic scaling via Reactive Mode, Task Managers can be added/removed based on metrics monitored by an external service monitor like Horizontal Pod Autoscaling (HPA). When scaling up new pods would be added, if the cluster has resources they would be scheduled it not then they will go in pending state. Cluster Autoscaler (CA) detects pods in pending state and new nodes will be added by EC2 Auto Scaling. This is ideal with Spot Instances as it implements elastic scaling with higher throughput in a cost optimized way.

Tutorial: Running Flink applications in a cost optimized way

In this tutorial, I review steps, which help you launch cost optimized and resilient Flink workloads running on EKS via Application mode. The streaming application will read dummy Stock ticker prices send to an Amazon Kinesis Data Stream by Amazon Kinesis Data Generator, try to determine the highest price within a per-defined window, and output will be written onto Amazon S3 files.

Image shows Flink application pipeline with data flowing from Amazon Kinesis Data Generator to Kinesis Data Stream, processed in Apache Flink and output being written in Amazon S3

The configuration files can be found in this github location. To run the workload on Kubernetes, make sure you have eksctl and kubectl command line utilities installed on your computer or on an AWS Cloud9 environment. You can run this by using an AWS IAM user or role that has the Administrator Access policy attached to it, or check the minimum required permissions for using eksctl. The Spot node groups in the Amazon EKS cluster can be launched both in a managed or a self-managed way, in this post I use the EKS Managed node group for Spot Instances.

Steps

When we deploy Flink in Application Mode it runs as a single application. The cluster is exclusive for the job. We will be bundling the user code in the Flink image for that purpose and upload in Amazon Elastic Container Registry (Amazon ECR). Amazon ECR is a fully managed container registry that makes it easy to store, manage, share, and deploy your container images and artifacts anywhere.

1. Build the Amazon ECR Image

  • Login using the following cmd and don’t forget to replace the AWS_REGION and AWS_ACCOUNT_ID with your details.

aws ecr get-login-password --region ${AWS_REGION} | docker login --username AWS —password-stdin ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com

  • Create a repository

aws ecr create-repository --repository-name flink-demo --image-scanning-configuration scanOnPush=true —region ${AWS_REGION}

  • Build the Docker image:

Download the Docker file. I am using multistage docker build here. The sample code is from Github’s Amazon Kinesis Data Analytics Java examples. I modified the code to allow checkpointing and change the sliding window interval. Build and push the docker image using the following instructions.

docker build --tag flink-demo .

  • Tag and Push your image to Amazon ECR

docker tag flink-demo:latest ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/flink-demo:latest
docker push ${ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.
amazonaws.com/flink-demo:latest

2. Create Amazon S3/Amazon Kinesis Access Policy

First, I must create an access policy to allow the Flink application to read/write from Amazon fFS3 and read Kinesis data streams. Download the Amazon S3 policy file from here and modify the <<output folder>> to an Amazon S3 bucket which you have to create.

  • Run the following to create the policy. Note the ARN.

aws iam create-policy --policy-name flink-demo-policy --policy-document file://flink-demo-policy.json

3. Cluster and node groups deployment

  • Create an EKS cluster using the following command:

eksctl create cluster –name= flink-demo --node-private-networking --without-nodegroup --asg-access –region=<<AWS Region>>

The cluster takes approximately 15 minutes to launch.

  • Create the node group using the nodeGroup config file. I am using multiple nodeGroups of different sizes to adapt Spot best practice of diversification.  Replace the <<Policy ARN>> string using the ARN string from the previous step.

eksctl create nodegroup -f managedNodeGroups.yml

  • Download the Cluster Autoscaler and edit it to add the cluster-name (flink-demo)

curl -LO https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

4. Install the Cluster AutoScaler using the following command:

kubectl apply -f cluster-autoscaler-autodiscover.yaml

  • Using EKS Managed node groups requires significantly less operational effort compared to using self-managed node group and enables:
    • Auto enforcement of Spot best practices.
    • Spot Instance lifecycle management.
    • Auto labeling of Pods.
  • eksctl has integrated amazon-ec2-instance-selector to enable auto selection of instances based on the criteria passed. This has multiple benefits
    • ‘instance diversification’ is implemented by enabling multiple instance types selection in the node group which works well with CA
    • Reduces manual effort of selecting the instances.
  • We can create node group manifests using ‘dryrun’ and then create node groups using that.

eksctl create cluster --name flink-demo --instance-selector-vcpus=2 --instance-selector-memory=4 --dry-run

eksctl create node group -f managedNodeGroups.yml

5. Create service accounts for Flink

$ kubectl create serviceaccount flink-service-account
$ kubectl create clusterrolebinding flink-role-binding-flink --clusterrole=edit --serviceaccount=default:flink-service-account

6. Deploy Flink

This install folder here has all the YAML files required to deploy a standalone Flink cluster. Run the install.sh file. This will deploy the cluster with a JobManager, a pool of TaskManagers and a Service exposing JobManager’s ports.

  • This is a High-Availability(HA) deployment of Flink with the use of Kubernetes high availability service.
  • The JobManager runs on OnDemand and TaskManager on Spot. As the cluster is launched in Application Mode, if a node is interrupted only one job will be restarted.
  • Autoscaling is enabled by the use of ‘Reactive Mode’. Horizontal Pod Autoscaler is used to monitor the CPU load and scale accordingly.
  • Check-pointing is enabled which allows Flink to save state and be fault tolerant.

Image shows the Flink dashboard highlighting checkpoints for a job

7. Create Amazon Kinesis data stream and send dummy data      

Log in to AWS Management Console and create a Kinesis data stream name ‘ExampleInputStream’. Kinesis Data Generator is used to send data to the data stream. The template of the dummy data can be found here. Once this sends data the Flink application starts processing.

Image shows Amazon Kinesis Data Generator console sending data to Kinesis Data Strea

Observations

Spot Interruptions

If there is an interruption then the Flick application will be restarted using check-pointed data. The JobManager will restore the job as highlighted in the following log. The node will be replaced automatically by the Managed Node Group.

mage shows logs from a Flink job highlighting job restart using checkpoints.

One will be able to observe the graceful restart in the Flink UI.

Image shows the Flink dashboard highlighting job restart after failure.

AutoScaling

You can observe the elastic scaling using logs. The number of TaskManagers in the Flink UI will also reflect the scaling state.

Image shows kubectl output showing status of HPA during scale-out

Cleanup

If you are trying out the tutorial, run the following steps to make sure that you don’t encounter unwanted costs.

  • Run the delete.sh file.
  • Delete the EKS cluster and the node groups:
    • eksctl delete cluster --name flink-demo
  • Delete the Amazon S3 Access Policy:
    • aws iam delete-policy --policy-arn <<POLICY ARN>>
  • Delete the Amazon S3 Bucket:
    • aws s3 rb --force s3://<<S3_BUCKET>>
  • Delete the CloudFormation stack related to Kinesis Data Generator named ‘Kinesis-Data-Generator-Cognito-User’
  • Delete the Kinesis Data Stream.

Conclusion

In this blog, I demonstrated how you can run Flink workloads on a Kubernetes Cluster using Spot Instances, achieving scalability, resilience, and cost optimization. To cost optimize your Flink based big data workloads you should start thinking about using Amazon EKS and Spot Instances.

Have You Checked the New Kubernetes RBAC Swiss Army Knife?

Post Syndicated from Gadi Naor original https://blog.rapid7.com/2021/10/12/kubernetes-rbac-swiss-army-knife/

Have You Checked the New Kubernetes RBAC Swiss Army Knife?

Kubernetes Role-Based Access Control (RBAC) is a method of regulating access to computer or network resources based on the roles of individual users within your organization. RBAC authorization uses the rbac.authorization.k8s.io API group to drive authorization decisions, allowing you to dynamically configure policies through the Kubernetes API. This is all quite useful, but Kubernetes RBAC is often viewed as complex and not very user-friendly.

Introducing Your Swiss Army Knife for RBAC Controls

InsightCloudSec’s RBAC tool is an all-in-one open-source tool for analyzing Kubernetes RBAC policies and simplifying any complexities associated with Kubernetes RBAC.

InsightCloudSec’s RBAC tool significantly simplifies querying, analyzing, and generating RBAC policies. It is available as a standalone tool or as a kubectl Krew Plugin.

Visualize Cluster RBAC Policies and Usage

A Kubernetes RBAC command can be used to analyze cluster policies and how they are being used and generate a simple relationship graph.

Have You Checked the New Kubernetes RBAC Swiss Army Knife?

By default, rbac-tool viz will connect to the local cluster (pointed by kubeconfig) and create a RBAC graph of the actively running workload on all namespaces except kube-system.

Examples

# Scan the cluster pointed by the kubeconfig context 'myctx'
rbac-tool viz --cluster-context myctx

# Scan and create a PNG image from the graph
rbac-tool viz --outformat dot --exclude-namespaces=soemns && cat rbac.dot | dot -Tpng > rbac.png && google-chrome rbac.png

Analyze Risky RBAC Permission

The command rbac-tool analysis analyzes RBAC permissions and highlights overly permissive principals, risky permissions, or any specific permissions that are not desired by cluster operators.

The command allows the use of a custom analysis rule set, as well as the ability to define custom exceptions (global and per-rule), and can integrate into deployment tools such as GitOps and automation analysis tasks in order to detect undesired permission changes, unexpected drifts, or risky roles.

Examples

# Analyze the cluster pointed by the kubeconfig context 'myctx' with the internal analysis rule set
rbac-tool analysis --cluster-context myctx

Query Who Can Perform Certain Kubernetes API Actions

The command rbac-tool who-can enables operators to simply query which subjects/principals are allowed to perform a certain action based on the presently configured RBAC policies.

Examples

# Who can read ConfigMap resources
rbac-tool who-can get configmaps

# Who can watch Deployments
rbac-tool who-can watch deployments.apps

# Who can read the Kubernetes API endpoint /apis
rbac-tool who-can get /apis

# Who can read a secret resource by the name some-secret
rbac-tool who-can get secret/some-secret

A Flat and Simple View of RBAC Permissions

The command rbac-tool policy-rules aggregates the policies and relationships from the various RBAC resources, and provides a flat view of the allowed permissions for any given User/ServiceAccount/Group.

Examples

# List policy rules for system unauthenticated group
rbac-tool policy-rules -e '^system:unauth'

Output:

Have You Checked the New Kubernetes RBAC Swiss Army Knife?

Generate RBAC Policies Easily

Kubernetes RBAC lacks the notion of denying semantics, which means generating an RBAC policy that says “Allow everything except THIS” is not as straightforward as one would imagine.

Here are some examples that capture how rbac-tool generate can help:

  • Generate a ClusterRole policy that allows users to read everything except secrets and services
  • Generate a Role policy that allows create, update, get, list (read/write) everything except Secrets, Services, Ingresses, and NetworkPolicies
  • Generate a Role policy that allows create, update, get, list (read/write) everything except StatefulSets

Command Line Examples

Examples generated against Kubernetes cluster v1.16 deployed using KIND:

# Generate a ClusterRole policy that allows users to read everything except secrets and services
rbac-tool  gen  --deny-resources=secrets.,services. --allowed-verbs=get,list

# Generate a Role policy that allows create, update, get, list (read/write) everything except Secrets, Services, NetworkPolicies in core,Apps and networking.k8s.io API groups
rbac-tool  gen --generated-type=Role --deny-resources=secrets.,services.,networkpolicies.networking.k8s.io --allowed-verbs=* --allowed-groups=,extensions,apps,networking.k8s.i

# Generate a Role policy that allows create, update, get, list (read/write) everything except StatefulSets
rbac-tool  gen --generated-type=Role --deny-resources=apps.statefulsets --allowed-verbs=*

Example output

# Generate a Role policy that allows create, update, get, list (read/write) everything except Secrets, Services, NetworkPolicies in core,Apps & networking.k8s.io API groups
rbac-tool  gen --generated-type=Role --deny-resources=secrets.,services.,networkpolicies.networking.k8s.io --allowed-verbs=* --allowed-groups=,extensions,apps,networking.k8s.io

Output:

Have You Checked the New Kubernetes RBAC Swiss Army Knife?

Another useful policy generation command is rbac-tool auditgen, which can generate RBAC policy from Kubernetes audit events.

Conclusion

InsightCloudSec’s RBAC tool fills various gaps that exist in the Kubernetes native tools, and addresses common RBAC-related use cases. This RBAC tool is an all-in-one solution that helps practitioners to perform RBAC analysis, querying, and policy curation.

You’ve got your full Swiss army knife now—what are you waiting for?

Check out this link for more information and a step-by-side installation guide.

Rapid7 Introduces: Kubernetes Security Guardrails

Post Syndicated from Alon Berger original https://blog.rapid7.com/2021/07/26/rapid7-introduces-kubernetes-security-guardrails/

Rapid7 Introduces: Kubernetes Security Guardrails

Cloud and container technology provide tremendous flexibility, speed, and agility, so it’s not surprising that organizations around the globe are continuing to embrace cloud and container technology. Many organizations are using multiple tools to secure their often complex cloud and container environments, while struggling to maintain the flexibility, speed, and agility required to keep security intact.

Cloud Security Just Got Better!

In addition to acquiring DivvyCloud, a top-tier Cloud Security Posture Management (CSPM) platform in 2020, Rapid7 recently announced another successful acquisition— joining forces with Alcide, a leading Kubernetes security start-up that offers advanced Cloud Workload Protection Platform (CWPP) capabilities.

Rapid7 is taking the lead in the CSPM space by leveraging both DivvyCloud’s and Alcide’s capabilities and incorporating them into a single platform: InsightCloudSec, your one-stop shop for superior cloud security solutions.

Learn more about InsightCloudSec here

Built for DevOps, Trusted by Security

In retrospect, 2020 was a tipping point for the Kubernetes community, with a massive increase in adoption across the globe. Many companies, seeking an efficient, cost-effective way to make this huge shift to the cloud, turned to Kubernetes. But this in turn created a growing need to remove the Kubernetes security blind spots. It is for this reason that we are introducing Kubernetes Security Guardrails.

With Kubernetes Security Guardrails, organizations are equipped with a multi-cluster vulnerability scanner that covers rich Kubernetes security best practices and compliance policies, such as CIS Benchmarks. As part of Rapid7’s InsightCloudSec solution, this new ability introduces a platform-based and easy-to-maintain solution for Kubernetes security that is deployed in minutes and is fully streamlined in the Kubernetes pipeline.

Securing Kubernetes With InsightCloudSec

Kubernetes Security Guardrails is the most comprehensive solution for all relevant Kubernetes security requirements, designed from a DevOps perspective with in-depth visibility for security teams. Integrated within minutes, Kubernetes Guardrails simplifies the security assessment for the entire Kubernetes environment and the CI/CD pipeline while creating baseline profiles for each cluster, highlighting and scoring security risks, misconfigurations, and hygiene drifts.

Both DevOps and Security teams enjoy the continuous and dynamic analysis of their Kubernetes deployments, all while seamlessly complying with regulatory requirements for Kubernetes such as PCI, GDPR, and HIPAA.

With Kubernetes Guardrails, Dev teams are able to create a snapshot of cluster risks, delivered with a detailed list of misconfigurations, while detecting real-time hygiene and conformance drifts for deployments running on any cloud environment.

Some of the most common use cases include:

  • Kubernetes vulnerability scanning
  • Hunting misplaced secrets and excessive secret access
  • Workload hardening (from pod security to network policies)
  • Istio security and configuration best practices
  • Ingress controllers security
  • Kubernetes API server access privileges
  • Kubernetes operators best practices
  • RBAC controls and misconfigurations

Rapid7 proudly brings forward a Kubernetes security solution that encapsulates all-in-one capabilities with incomparable coverage for all things Kubernetes.

With a security-first approach and a strict compliance adherence, Kubernetes Guardrails enable a better understanding and control over distributed projects, and help organizations maintain smooth business operations.

Want to learn more? Register for the webinar on InsightCloudSec and its Kubernetes protection.

Automatic Remediation of Kubernetes Nodes

Post Syndicated from Andrew DeMaria original https://blog.cloudflare.com/automatic-remediation-of-kubernetes-nodes/

Automatic Remediation of Kubernetes Nodes

Automatic Remediation of Kubernetes Nodes

We use Kubernetes to run many of the diverse services that help us control Cloudflare’s edge. We have five geographically diverse clusters, with hundreds of nodes in our largest cluster. These clusters are self-managed on bare-metal machines which gives us a good amount of power and flexibility in the software and integrations with Kubernetes. However, it also means we don’t have a cloud provider to rely on for virtualizing or managing the nodes. This distinction becomes even more prominent when considering all the different reasons that nodes degrade. With self-managed bare-metal machines, the list of reasons that cause a node to become unhealthy include:

  • Hardware failures
  • Kernel-level software failures
  • Kubernetes cluster-level software failures
  • Degraded network communication
  • Software updates are required
  • Resource exhaustion1

Automatic Remediation of Kubernetes Nodes

Unhappy Nodes

We have plenty of examples of failures in the aforementioned categories, but one example has been particularly tedious to deal with. It starts with the following log line from the kernel:

unregister_netdevice: waiting for lo to become free. Usage count = 1

The issue is further observed with the number of network interfaces on the node owned by the Container Network Interface (CNI) plugin getting out of proportion with the number of running pods:

$ ip link | grep cali | wc -l
1088

This is unexpected as it shouldn’t exceed the maximum number of pods allowed on a node (we use the default limit of 110). While this issue is interesting and perhaps worthy of a whole separate blog, the short of it is that the Linux network interfaces owned by the CNI are not getting cleaned up after a pod terminates.

Some history on this can be read in a Docker GitHub issue. We found this seems to plague nodes with a longer uptime, and after rebooting the node it would be fine for about a month. However, with a significant number of nodes, this was happening multiple times per day. Each instance would need rebooting, which means going through our worker reboot procedure which looked like this:

  1. Cordon off the affected node to prevent new workloads from scheduling on it.
  2. Collect any diagnostic information for later investigation.
  3. Drain the node of current workloads.
  4. Reboot and wait for the node to come back.
  5. Verify the node is healthy.
  6. Re-enable scheduling of new workloads to the node.

While solving the underlying issue would be ideal, we needed a mitigation to avoid toil in the meantime — an automated node remediation process.

Existing Detection and Remediation Solutions

While not complicated, the manual remediation process outlined previously became tedious and distracting, as we had to reboot nodes multiple times a day. Some manual intervention is unavoidable, but for cases matching the following, we wanted automation:

  • Generic worker nodes
  • Software issues confined to a given node
  • Already researched and diagnosed issues

Limiting automatic remediation to generic worker nodes is important as there are other node types in our clusters where more care is required. For example, for control-plane nodes the process has to be augmented to check etcd cluster health and ensure proper redundancy for components servicing the Kubernetes API. We are also going to limit the problem space to known software issues confined to a node where we expect automatic remediation to be the right answer (as in our ballooning network interface problem). With that in mind, we took a look at existing solutions that we could use.

Node Problem Detector

Node problem detector is a daemon that runs on each node that detects problems and reports them to the Kubernetes API. It has a pluggable problem daemon system such that one can add their own logic for detecting issues with a node. Node problems are distinguished between temporary and permanent problems, with the latter being persisted as status conditions on the Kubernetes node resources.2

Draino and Cluster-Autoscaler

Draino as its name implies, drains nodes but does so based on Kubernetes node conditions. It is meant to be used with cluster-autoscaler which then can add or remove nodes via the cluster plugins to scale node groups.

Kured

Kured is a daemon that looks at the presence of a file on the node to initiate a drain, reboot and uncordon of the given node. It uses a locking mechanism via the Kubernetes API to ensure only a single node is acted upon at a time.

Cluster-API

The Kubernetes cluster-lifecycle SIG has been working on the cluster-api project to enable declaratively defining clusters to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. It has a concept of machine resources which back Kubernetes node resources and furthermore has a concept of machine health checks. Machine health checks use node conditions to determine unhealthy nodes and then the cluster-api provider is then delegated to replace that machine via create and delete operations.

Proof of Concept

Interestingly, with all the above except for Kured, there is a theme of pluggable components centered around Kubernetes node conditions. We wanted to see if we could build a proof of concept using the existing theme and solutions. For the existing solutions, draino with cluster-autoscaler didn’t make sense in a non-cloud environment like our bare-metal set up. The cluster-api health checks are interesting, however they require a more complete investment into the cluster-api project to really make sense. That left us with node-problem-detector and kured. Deploying node-problem-detector was simple, and we ended up testing a custom-plugin-monitor like the following:

apiVersion: v1
kind: ConfigMap
metadata:
  name: node-problem-detector-config
data:
  check_calico_interfaces.sh: |
    #!/bin/bash
    set -euo pipefail
    
    count=$(nsenter -n/proc/1/ns/net ip link | grep cali | wc -l)
    
    if (( $count > 150 )); then
      echo "Too many calico interfaces ($count)"
      exit 1
    else
      exit 0
    fi
  cali-monitor.json: |
    {
      "plugin": "custom",
      "pluginConfig": {
        "invoke_interval": "30s",
        "timeout": "5s",
        "max_output_length": 80,
        "concurrency": 3,
        "enable_message_change_based_condition_update": false
      },
      "source": "calico-custom-plugin-monitor",
      "metricsReporting": false,
      "conditions": [
        {
          "type": "NPDCalicoUnhealthy",
          "reason": "CalicoInterfaceCountOkay",
          "message": "Normal amount of interfaces"
        }
      ],
      "rules": [
        {
          "type": "permanent",
          "condition": "NPDCalicoUnhealthy",
          "reason": "TooManyCalicoInterfaces",
          "path": "/bin/bash",
          "args": [
            "/config/check_calico_interfaces.sh"
          ],
          "timeout": "3s"
        }
      ]
    }

Testing showed that when the condition became true, a condition would be updated on the associated Kubernetes node like so:

kubectl get node -o json worker1a | jq '.status.conditions[] | select(.type | test("^NPD"))'
{
  "lastHeartbeatTime": "2020-03-20T17:05:17Z",
  "lastTransitionTime": "2020-03-20T17:05:16Z",
  "message": "Too many calico interfaces (154)",
  "reason": "TooManyCalicoInterfaces",
  "status": "True",
  "type": "NPDCalicoUnhealthy"
}

With that in place, the actual remediation needed to happen. Kured seemed to do most everything we needed, except that it was looking at a file instead of Kubernetes node conditions. We hacked together a patch to change that and tested it successfully end to end — we had a working proof of concept!

Revisiting Problem Detection

While the above worked, we found that node-problem-detector was unwieldy because we were replicating our existing monitoring into shell scripts and node-problem-detector configuration. A 2017 blog post describes Cloudflare’s monitoring stack, although some things have changed since then. What hasn’t changed is our extensive usage of Prometheus and Alertmanager.

For the network interface issue and other issues we wanted to address, we already had the necessary exported metrics and alerting to go with them. Here is what our already existing alert looked like3:

- alert: CalicoTooManyInterfaces
  expr: sum(node_network_info{device=~"cali.*"}) by (node) >= 200
  for: 1h
  labels:
    priority: "5"
    notify: chat-sre-core chat-k8s

Note that we use a “notify” label to drive the routing logic in Alertmanager. However, that got us asking, could we just route this to a Kubernetes node condition instead?

Introducing Sciuro

Automatic Remediation of Kubernetes Nodes

Sciuro is our open-source replacement of node-problem-detector that has one simple job: synchronize Kubernetes node conditions with currently firing alerts in Alertmanager. Node problems can be defined with a more holistic view and using already existing exporters such as node exporter, cadvisor or mtail. It also doesn’t run on affected nodes which allows us to rely on out-of-band remediation techniques. Here is a high level diagram of how Sciuro works:

Automatic Remediation of Kubernetes Nodes

Starting from the top, nodes are scraped by Prometheus, which collects those metrics and fires relevant alerts to Alertmanager. Sciuro polls Alertmanager for alerts with a matching receiver, matches them with a corresponding node resource in the Kubernetes API and updates that node’s conditions accordingly.

In more detail, we can start by defining an alert in Prometheus like the following:

- alert: CalicoTooManyInterfacesEarly
  expr: sum(node_network_info{device=~"cali.*"}) by (node) >= 150
  labels:
    priority: "6"
    notify: node-condition-k8s

Note the two differences from the previous alert. First, we use a new name with a more sensitive trigger. The idea is that we want automatic node remediation to try fixing the node first as soon as possible, but if the problem worsens or automatic remediation is failing, humans will still get notified to act. The second difference is that instead of notifying chat rooms, we route to a target called “node-condition-k8s”.

Sciuro then comes into play, polling the Altertmanager API for alerts matching the “node-condition-k8s” receiver. The following shows the equivalent using amtool:

$ amtool alert query -r node-condition-k8s
Alertname                 	Starts At            	Summary                                                               	 
CalicoTooManyInterfacesEarly  2021-05-11 03:25:21 UTC  Kubernetes node worker1a has too many Calico interfaces  

We can also check the labels for this alert:

$ amtool alert query -r node-condition-k8s -o json | jq '.[] | .labels'
{
  "alertname": "CalicoTooManyInterfacesEarly",
  "cluster": "a.k8s",
  "instance": "worker1a",
  "node": "worker1a",
  "notify": "node-condition-k8s",
  "priority": "6",
  "prometheus": "k8s-a"
}

Note the node and instance labels which Sciuro will use for matching with the corresponding Kubernetes node. Sciuro uses the excellent controller-runtime to keep track of and update node sources in the Kubernetes API. We can observe the updated node condition on the status field via kubectl:

$ kubectl get node worker1a -o json | jq '.status.conditions[] | select(.type | test("^AlertManager"))'
{
  "lastHeartbeatTime": "2021-05-11T03:31:20Z",
  "lastTransitionTime": "2021-05-11T03:26:53Z",
  "message": "[P6] Kubernetes node worker1a has too many Calico interfaces",
  "reason": "AlertIsFiring",
  "status": "True",
  "type": "AlertManager_CalicoTooManyInterfacesEarly"
}

One important note is Sciuro added the AlertManager_ prefix to the node condition type to prevent conflicts with other node condition types. For example, DiskPressure, a kubelet managed condition, could also be an alert name. Sciuro will also properly update heartbeat and transition times to reflect when it first saw the alert and its last update. With node conditions synchronized by Sciuro, remediation can take place via one of the existing tools. As mentioned previously we are using a modified version of Kured for now.

We’re happy to announce that we’ve open sourced Sciuro, and it can be found on GitHub where you can read the code, find the deployment instructions, or open a Pull Request for changes.

Managing Node Uptime

While we began using automatic node remediation for obvious problems, we’ve expanded its purpose to additionally keep node uptime low. Low node uptime is desirable to further reduce drift on nodes, keep the node initialization process well-oiled, and encourage the best deployment practices on the Kubernetes clusters. To expand on the last point, services that are deployed with best practices and in a high availability fashion should see negligible impact when a single node leaves the cluster. However, services that are not deployed with best practices will most likely have problems especially if they rely on singleton pods. By draining nodes more frequently, it introduces regular chaos that encourages best practices. To enable this with automatic node remediation the following alert was defined:

- alert: WorkerUptimeTooHigh
  expr: |
    (
      (
        (
              max by(node) (kube_node_role{role="worker"})
            - on(node) group_left()
              (max by(node) (kube_node_role{role!="worker"}))
          or on(node)
            max by(node) (kube_node_role{role="worker"})
        ) == 1
      )
    * on(node) group_left()
      (
        (time() - node_boot_time_seconds) > (60 * 60 * 24 * 7)
      )
    )
  labels:
    priority: "9"
    notify: node-condition-k8s

There is a bit of juggling with the kube_node_roles metric in the above to isolate the alert to generic worker nodes, but at a high level it looks at node_boot_time_seconds, a metric from prometheus node_exporter. Again the notify label is configured to send to node conditions which kicks off the automatic node remediation. One further detail is the priority here is set to “9” which is of lower precedence than our other alerts. Note that the message field of the node condition is prefixed with the alert priority in brackets. This allows the remediation process to take priority into account when choosing which node to remediate first, which is important because Kured uses a lock to act on a single node at a time.

Wrapping Up

In the past 30 days, we’ve used the above automatic node remediation process to action 571 nodes. That has saved our humans a considerable amount of time. We’ve also been able to reduce the time to repair for some issues as automatic remediation can act at all times of the day and with a faster response time.

As mentioned before, we’re open sourcing Sciuro and its code can be found on GitHub. We’re open to issues, suggestions, and pull requests. We do have some ideas for future improvements. For Sciuro, we may look to reduce latency which is mainly due to polling and potentially add a push model from Altermanager although this isn’t a need we’ve had yet.  For the larger node remediation story, we hope to do an overhaul of the remediating component. As mentioned previously, we are currently using a fork of kured, but a future replacement component should include the following:

  • Use out-of-band management interfaces to be able to shut down and power on nodes without a functional operating system.
  • Move from decentralized architecture to a centralized one that can integrate more complicated logic. This might include being able to act on entire failure domains in parallel.
  • Handle specialized nodes such as masters or storage nodes.

Finally, we’re looking for more people passionate about Kubernetes to join our team. Come help us push Kubernetes to the next level to serve Cloudflare’s many needs!


1Exhaustion can be applied to hardware resources, kernel resources, or logical resources like the amount of logging being produced.
2Nearly all Kubernetes objects have spec and status fields. The status field is used to describe the current state of an object. For node resources, typically the kubelet manages a conditions field under the status field for reporting things like if the node is ready for servicing pods.
3The format of the following alert is documented on Prometheus Alerting Rules.